Pipe vs stdin redirection

$ foo | bar

Your standard Unix pipe. It connects foo's stdout with bar's stdin -- essentially, foo sends data to bar.

But what if bar takes ~1 second to start? Does foo output a single line and then wait for bar to consume it? Or does it write until it can't anymore and becomes "blocked" on I/O?

I had a realization the other day about something I should have known to begin with: the latter is true. Still, I wanted to see it in action:

$ yes | tee >(sleep 1) | wc -c
  • yes repeatedly writes the line 'y' (a y followed by a newline) to stdout
  • tee takes input and sends it to two places:
    • stdout (that is, the stdout of tee)
    • each file named on its command line; here that is the path bash substitutes for >(sleep 1), which is connected to sleep's stdin
  • sleep 1 will sleep for 1 second
  • wc -c will count the number of bytes on its stdin

The whole pipeline reads: Run yes (which outputs y as fast as possible) and send that data into tee. tee should read data as fast as possible, printing to its stdout as well as sending data to sleep. The sleep call won't do anything with its stdin -- it just sleeps for 1 second. wc -c reads the bytes which tee outputs.

Each of these commands (yes, tee, sleep, wc) runs in its own process and executes independently, though their stdins and stdouts are connected in a pipeline.
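One quick way to see that the stages run concurrently, rather than one after another, is to time a pipeline of sleeps. This is only a sketch: it relies on bash's SECONDS variable, which has whole-second resolution, so the reported value may be 1 or 2 rather than exactly 1.

```bash
#!/usr/bin/env bash
# If the three stages ran one after another, this would take about 3 seconds.
# Because the shell starts every stage of a pipeline at once, it takes about 1.
start=$SECONDS
sleep 1 | sleep 1 | sleep 1
elapsed=$(( SECONDS - start ))
echo "elapsed: ${elapsed}s"
```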

sleep, which never reads from its stdin, eventually exits, which closes the read end of its pipe. tee's write to that pipe then fails: by default tee receives SIGPIPE and is terminated. That in turn closes the read end of the pipe from yes, so yes is terminated by SIGPIPE the next time it writes.
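This exit chain can be observed through bash's PIPESTATUS array, which holds the exit status of every stage of the most recent pipeline. The sketch below swaps tee and sleep for head, purely to keep it short. It assumes SIGPIPE has its default disposition, in which case bash reports a process killed by it as 128 plus the signal number (13 on Linux).

```bash
#!/usr/bin/env bash
# head exits after one line, closing the read end of the pipe.
# The next write by yes then fails and yes is terminated.
yes | head -n 1 > /dev/null
statuses=("${PIPESTATUS[@]}")   # copy immediately; the next command overwrites it

yes_status=${statuses[0]}
head_status=${statuses[1]}
echo "yes exited with:  $yes_status"    # typically 141 (128 + SIGPIPE) on Linux
echo "head exited with: $head_status"
```

If SIGPIPE is being ignored in the environment the script runs in, yes sees a write error instead and exits with a different non-zero status.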

The concern here is that without some maximum pipe buffer size, yes could write data without bound and fill RAM or disk until the system slowed to a crawl or crashed.

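There is a size limit to the pipe buffer, and in our example wc reports 69632 bytes written before the writers are blocked. On Linux the default pipe capacity is 65536 bytes (64 KiB); the remaining 4096 bytes are most likely one last chunk that tee had already written to its stdout before it blocked on the full pipe to sleep. The capacity differs between systems and can even be changed for an individual pipe (on Linux, via fcntl with F_SETPIPE_SZ).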
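To probe the limit more directly, one option is to ask whether a given number of bytes can be written into a pipe that nobody reads from. The sketch below assumes GNU coreutils (timeout, and a sleep that accepts fractional seconds). fits_in_pipe is a hypothetical helper name, not a standard command.

```bash
#!/usr/bin/env bash
# fits_in_pipe N: succeed if N bytes can be written into a pipe whose
# reader never reads. (Hypothetical helper, for illustration only.)
fits_in_pipe() {
  # sleep holds the read end open but consumes nothing. If head blocks on a
  # full pipe, timeout kills it and exits with status 124.
  timeout 0.5 head -c "$1" /dev/zero | sleep 1
  [ "${PIPESTATUS[0]}" -eq 0 ]
}

for n in 4096 65536 65537; do
  if fits_in_pipe "$n"; then
    echo "$n bytes: fit in the pipe buffer"
  else
    echo "$n bytes: writer blocked"
  fi
done
```

On a Linux system with the default 64 KiB capacity, the first two sizes should fit and the last should block, but the exact boundary depends on the system.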

Regardless, just to prove a point, this number shouldn't change even if we give the whole pipeline more time to write:

$ yes | tee >(sleep 1) | wc -c

$ yes | tee >(sleep 2) | wc -c

But, what about a semantically equivalent Bash expression?

$ tee >(sleep 1) < <(yes) | wc -c
69632  # Spoilers!  
  • tee is the same as above, but this time uses the < operator to specify which fd to use as stdin
  • yes is the same as above, but...
    • <() syntax (process substitution) runs yes in the background and expands to a path, such as /dev/fd/63 on Linux, from which its output can be read
  • wc -c is the same as above

Here, the pipeline reads exactly the same with one small exception: yes writes to a temporary fd which tee uses as its input.

The questions to ask here are:

  1. Does process substitution (<()) make an in-RAM fd or is it potentially written to disk?
  2. Does it behave just like the pipe operator (|) or is it better to use it in certain scenarios?

I will have to dive into the system calls made by the two forms at a later point, but for now I am satisfied that they behave nearly identically, since the same number of bytes is written before yes becomes blocked.
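Short of tracing system calls, a first look at question 1 is possible from the shell itself. The sketch below prints the path that bash substitutes for <(...) and asks test -p whether it refers to a pipe or FIFO. The variable names are only illustrative.

```bash
#!/usr/bin/env bash
# Show what bash passes to a command in place of <(...).
subst_path=$(echo <(true))
echo "expanded to: $subst_path"     # e.g. /dev/fd/63 on Linux

# test -p succeeds when its argument is a pipe or named FIFO.
if [ -p <(true) ]; then
  subst_kind="pipe"
else
  subst_kind="something else"
fi
echo "kind: $subst_kind"
```

If this reports a pipe, the data is moving through the same kind of kernel pipe buffer that | uses rather than through a regular file on disk, which would be consistent with the identical byte counts above.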