$ foo | bar
Your standard Unix pipe. It connects foo's stdout with bar's stdin; essentially, foo sends data to bar.
But what if bar takes ~1 second to start? Does foo output a single line and then wait for bar to consume it? Or does it write until it can't anymore and becomes "blocked" on I/O?
I had a realization the other day about something I should have known to begin with. The latter scenario is the true one, but I wanted to see it in action:
$ yes | tee >(sleep 1) | wc -c
- yes repeatedly outputs a literal 'y' character (followed by a newline) to stdout
- tee takes input and sends it to two places:
  - stdout (that is, the stdout of tee)
  - another file descriptor (fd)
- sleep 1 will sleep for 1 second
- wc -c will count the number of bytes on its stdin
The whole pipeline reads: Run yes (which outputs y as fast as possible) and send that data into tee. tee should read data as fast as possible, printing to its stdout as well as sending data to sleep. The sleep call won't do anything with its stdin; it just sleeps for 1 second. wc -c reads the bytes which tee outputs.
Each of these commands (yes, tee, sleep, wc) spawns its own process and executes independently, though their stdins and stdouts are connected in a pipeline.
sleep, which never reads from its stdin, will eventually exit. That closes the read end of the fd tee was instructed to write to, so tee's write to it fails and tee receives a SIGPIPE signal. tee will then exit, causing yes to exit in the same way since nothing is reading from its stdout fd anymore.
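The same chain reaction can be observed with a shorter pipeline. This is a minimal sketch, assuming Bash and GNU coreutils on Linux; head stands in for sleep here as "a reader that goes away", and the variable name statuses is just for illustration.

```bash
#!/usr/bin/env bash
# Assumes Bash (for PIPESTATUS) and GNU coreutils on Linux.

# head exits after one line, closing the read end of the pipe,
# much like sleep exiting in the pipeline above.
yes | head -n 1 > /dev/null

# PIPESTATUS must be copied immediately after the pipeline finishes.
statuses=("${PIPESTATUS[@]}")

# Typically 141 for yes: 128 + 13, where 13 is SIGPIPE's signal number.
echo "yes exited with:  ${statuses[0]}"
# head exits normally.
echo "head exited with: ${statuses[1]}"
```

If SIGPIPE is being ignored in the environment the script runs in, yes will usually report a "Broken pipe" error and exit with a different non-zero status instead.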
The point here is that if there weren't some maximum size for the pipe buffer, yes could write data without bound and either fill RAM or consume disk until resources are hogged or the system crashes.
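There is a size limit to the pipe buffer, and in our example wc reports 69632 bytes written before everything stalls. That figure is 65536 + 4096, which is consistent with the default 64 KiB pipe capacity on Linux plus one extra 4 KiB chunk that tee had most likely already passed along to wc before blocking on the full pipe to sleep. This size is different on varying systems and can even be changed for an individual pipe (on Linux, via fcntl with F_SETPIPE_SZ).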
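To look at the capacity of a single pipe in isolation, one option is to let a writer fill it while the reader deliberately waits. This is only a sketch, assuming Linux with GNU coreutils (timeout is not available everywhere); pipe_bytes is a name made up for the example.

```bash
#!/usr/bin/env bash
# Assumes Linux with GNU coreutils (for `timeout`).

# The reader sleeps for 2 seconds without reading, so yes fills the pipe
# and then blocks. timeout kills yes after 1 second, which closes the
# write end. wc -c then counts whatever was left sitting in the pipe.
pipe_bytes=$(timeout 1 yes | { sleep 2; wc -c; })

echo "bytes held in the pipe: $pipe_bytes"
```

On a stock Linux system this typically reports 65536 (64 KiB), but the number depends on the kernel and its configuration.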
Regardless, just to prove a point, this number shouldn't change even if we give the whole pipeline more time to write:
$ yes | tee >(sleep 1) | wc -c
69632
$ yes | tee >(sleep 2) | wc -c
69632
But, what about a semantically equivalent Bash expression?
$ tee >(sleep 1) < <(yes) | wc -c
69632 # Spoilers!
- tee is the same as above, but this time uses the < operator to specify which fd to use as stdin
- yes is the same as above, but...
  - <() syntax essentially creates a temporary file descriptor
- wc -c is the same as above
Here, the pipeline reads exactly the same with one small exception: yes writes to a temporary fd which tee uses as its input.
The questions to ask here are:
- Does process substitution (<()) make an in-RAM fd or is it potentially written to disk?
- Does it behave just like the pipe operator (|) or is it better to use it in certain scenarios?
I will have to dive into the system calls made by the two forms at a later point, but for now I am satisfied that they behave nearly identically, on account of the same number of bytes being written before yes becomes blocked.
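Short of tracing system calls, a quick check is to look at what Bash actually substitutes for <(). The sketch below assumes Bash on Linux, where the substitution usually shows up as a /dev/fd path; subst_path and kind are illustrative names. It hints at the answer to the first question without settling it.

```bash
#!/usr/bin/env bash
# Assumes Bash on Linux; other systems may use a named pipe under /tmp instead.

# The path Bash puts on the command line in place of <(...).
subst_path=$(echo <(true))
echo "substituted path: $subst_path"

# -p is true when the path refers to a FIFO/pipe rather than a regular file.
if [ -p <(true) ]; then
    kind="pipe"
else
    kind="not a pipe"
fi
echo "kind: $kind"
```

If this reports a pipe, that would fit with the identical byte counts above: the data would be going through the same kind of kernel pipe buffer that | uses, rather than through a file on disk.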