Whether out of habit or for readability, I often find myself writing shell recipes like:
$ cat file.txt | grep foo
$ grep foo < file.txt
In order to read a file and then search it for some text -- there's something about using cat and piping its output into another command that is easier for me to follow.
However, there is a price to be paid for such a pattern. But how expensive is it?
pv - monitor the progress of data through a pipe
This is a neat tool which you can insert between any piped expressions to see the flow of data between processes. It has a number of flags which can alter the output format, or even rate limit the flow of data between said processes!
$ cat file.txt | pv | grep foo
This would not only show the results from grep, but pv would also print a progress bar and the data transfer rate. So now we can measure the speed of various CLI recipes.
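As an aside, pv's --rate-limit (-L) flag is handy when you want to throttle a pipe rather than just observe it -- a quick sketch, capping the same recipe at roughly 1MiB/s:
$ cat file.txt | pv --rate-limit 1M | grep foo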
Slow to fast
- Using a Raspberry Pi 2b
- Transferring 1GiB
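Every recipe below caps the transfer with head --bytes $((1024**3)), which the shell expands to the exact byte count of 1GiB:
$ echo $((1024**3))
1073741824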
Not sure exactly why, but yes is pretty slow:
$ yes | head --bytes $((1024**3)) | pv | cat > /dev/null
1GiB 0:01:12 [14.2MiB/s]
Let's swap yes for /dev/zero to see if this is any faster:
$ cat /dev/zero | head --bytes $((1024**3)) | pv | cat > /dev/null
1GiB 0:00:07 [ 133MiB/s]
Wow. What if we remove the initial cat and have head read from /dev/zero directly?
$ head --bytes $((1024**3)) < /dev/zero | pv | cat > /dev/null
1GiB 0:00:04 [ 247MiB/s]
And if we remove the final cat and have pv write to /dev/null directly?
$ head --bytes $((1024**3)) < /dev/zero | pv > /dev/null
1GiB 0:00:03 [ 287MiB/s]
We've gone from 133MiB/s to 287MiB/s, a ~2x improvement just by removing the preceding and trailing cats -- each one was an extra process copying every byte through another pipe. Presumably, running pv on its own without any other processes will yield the best performance. However, since we're then unable to limit the amount of data to read, I will just manually end it after 10 seconds:
$ pv < /dev/zero > /dev/null
8.32GiB 0:00:10 [ 861MiB/s]
^C
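As an aside, a pv build that supports --size together with --stop-at-size could end the transfer on its own rather than needing the interrupt -- a sketch, assuming those flags are available:
$ pv --size $((1024**3)) --stop-at-size < /dev/zero > /dev/null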
(I suppose I could have utilized a RAM disk, but with the Raspberry Pi limited to 1GB of RAM, that's not really an option.)
So I should really start considering having tools read from sources and write to destinations directly, rather than using pipes liberally.
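Applied to the opening example, that means skipping cat entirely and letting grep open the file itself:
$ grep foo file.txt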