$ foo | bar
Your standard Unix pipe. It connects foo's stdout to bar's stdin; essentially, foo sends data to bar.
But what if bar takes ~1 second to start? Does foo output a single line and then wait for bar to consume it? Or does it write until it can't anymore and becomes "blocked" on I/O?
I had a realization the other day about something I should have known to begin with. The latter scenario is the true one, but I wanted to see it in action:
$ yes | tee >(sleep 1) | wc -c
- yes repeatedly outputs a literal 'y' character to stdout
- tee takes input and sends it to two places:
  - stdout (that is, the next stage of the pipeline, wc -c)
  - another file descriptor (fd)
- sleep 1 will sleep for 1 second
- wc -c will count the number of bytes on its stdin
The whole pipeline reads: Run yes (which outputs y as fast as possible) and send that data into tee. tee should read data as fast as possible, printing to its stdout as well as sending data to sleep 1. The sleep call won't do anything with its stdin; it just sleeps for 1 second. wc -c reads the bytes which tee prints to its stdout and counts them.
Each of these commands (yes, tee, sleep, and wc) spawns its own process and executes independently, though their stdins and stdouts are connected in a pipeline.
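A quick way to convince yourself that the stages really do run side by side is to time a pipeline whose stages all sleep. This is only a sketch, assuming bash (for the SECONDS variable); the variable name elapsed is mine.

```bash
#!/usr/bin/env bash
# Sketch: every stage of a pipeline is started at once and runs concurrently.
SECONDS=0                      # bash's built-in elapsed-seconds counter
sleep 1 | sleep 1 | sleep 1    # three stages, each sleeping for 1 second
elapsed=$SECONDS
echo "pipeline took about ${elapsed}s"
```

If the stages ran one after another this would take about 3 seconds; since they start together it finishes in roughly 1.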
sleep, which doesn't read from its stdin, will eventually exit. The next time tee writes to that fd, it receives a signal (SIGPIPE) telling it that nothing is reading the other end anymore. tee will then exit, causing yes to exit in the same way, since nothing is reading its stdout either.
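The same chain reaction can be seen in miniature with a reader that quits early. A minimal sketch, assuming bash (PIPESTATUS is a bash array) and using head as a stand-in for the reader that goes away; it is not the exact pipeline above, and the variable names are mine.

```bash
#!/usr/bin/env bash
# Sketch: observe how a writer ends once its reader has gone away.
# head exits after one line; yes is then writing to a pipe with no reader.
yes | head -n 1 > /dev/null
statuses=("${PIPESTATUS[@]}")   # copy right away; the next command overwrites it

yes_status=${statuses[0]}
head_status=${statuses[1]}
echo "yes exited with $yes_status, head exited with $head_status"
# A status of 141 (128 + 13, where 13 is SIGPIPE) is what I would usually
# expect for yes on Linux; treat the exact number as system-dependent.
```

The point is only that yes does not exit cleanly by choice: it stops because the other end of its pipe has closed.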
The point is that if the pipe buffer had no maximum size, yes could write data without bound and either fill RAM or consume disk until resources are hogged or the system crashes. There is a size limit to the pipe buffer, and in our example the amount of data written before yes is blocked is 69632 bytes. This size varies between systems and can even differ per process.
Regardless, just to prove a point, this number shouldn't change even if we give the whole pipeline more time to write:
$ yes | tee >(sleep 1) | wc -c
69632
$ yes | tee >(sleep 2) | wc -c
69632
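To look at the pipe buffer by itself, without tee in the middle, here is a rough sketch that counts how many 4096-byte chunks a writer completes before it stalls on a reader that never reads. The chunk size, the temp file, and the variable names are my own choices, and the result depends on the system.

```bash
#!/usr/bin/env bash
# Sketch: estimate how much a writer can push into a pipe that nobody reads.
count_file=$(mktemp)
echo 0 > "$count_file"

# Writer: emit 4096-byte chunks and record each completed write in count_file.
# Reader: sleep 1 never reads its stdin, so the writer stalls once the pipe
# buffer is full and is stopped when sleep exits and the pipe closes.
{
  n=0
  while printf '%04096d' 0; do
    n=$((n + 1))
    echo "$n" > "$count_file"
  done
} 2>/dev/null | sleep 1

chunks=$(cat "$count_file")
rm -f "$count_file"
echo "writer stalled after $((chunks * 4096)) bytes"
```

On a Linux system with the default pipe capacity I would expect this to report 65536 bytes, but treat that number as an assumption to check rather than a given.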
But, what about a semantically equivalent Bash expression?
$ tee >(sleep 1) < <(yes) | wc -c
69632 # Spoilers!
- tee is the same as above, but this time uses the < operator to specify which fd to use as stdin
- yes is the same as above, but...
  - <() syntax essentially creates a temporary file descriptor
- wc -c is the same as above
Here, the pipeline reads exactly the same with one small exception: yes writes to a temporary fd which tee uses as its input.
The questions to ask here are:
- Does process substitution (<()) make an in-RAM fd or is it potentially written to disk?
- Does it behave just like the pipe operator (|) or is it better to use it in certain scenarios?
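One way to start on the first question is to look at what <(...) actually expands to. A minimal sketch, assuming bash on a system where process substitution is backed by /dev/fd or a named pipe (the two mechanisms the Bash manual describes); the variable names are mine.

```bash
#!/usr/bin/env bash
# Sketch: inspect what a process substitution expands to.
path=$(echo <(true))            # the expansion is just a file name
echo "process substitution expanded to: $path"

# test -p is true when the file is a pipe (FIFO).
if [ -p <(true) ]; then
  kind="pipe"
else
  kind="not a pipe"
fi
echo "file type: $kind"
```

On Linux I would expect a path like /dev/fd/63 and a file type of pipe, which suggests the data goes through a kernel pipe buffer, much like |, rather than a file on disk. That is a hint, not a full answer to the questions above.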
I will have to dive into the system calls made by the two forms at a later point, but for now I am satisfied that they behave nearly identically, on account of the same number of bytes being written before yes becomes blocked.