Whether out of habit or for clarity of reading, I often find myself writing shell recipes like:
$ cat file.txt | grep foo
rather than:
$ grep foo < file.txt
in order to read then search a file for some text. There's something about using cat and piping it into another command that is easier for me to follow.
However, there is a price to be paid for such a pattern. Just how expensive is it?
Enter pv, which its man page describes as a tool to "monitor the progress of data through a pipe."
This is a neat tool which you can insert between any piped expressions to see the flow of data between processes. It has a number of flags which can alter the output format, or even rate limit the flow of data between said processes!
$ cat file.txt | pv | grep foo
This would not only show the results from grep; pv would also print a progress bar and the data transfer rate. So now we can measure the speed of various CLI recipes.
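As a quick aside, here is a sketch of the rate-limiting behaviour mentioned above, using pv's --rate-limit flag (check pv(1) on your system for the exact flag and suffix syntax); it throttles the pipeline so you can watch the data flow in slow motion:
$ cat file.txt | pv --rate-limit 1M | grep foo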
Slow to fast
- Using a Raspberry Pi 2b
- Transferring 1GiB
I'm not sure exactly why, but yes is pretty slow:
$ yes | head --bytes $((1024**3)) | pv | cat > /dev/null
1GiB 0:01:12 [14.2MiB/s]
Let's swap yes for cat /dev/zero to see if this is any faster:
$ cat /dev/zero | head --bytes $((1024**3)) | pv | cat > /dev/null
1GiB 0:00:07 [ 133MiB/s]
Wow. What if we remove the initial cat and have head read from /dev/zero directly?
$ head --bytes $((1024**3)) < /dev/zero | pv | cat > /dev/null
1GiB 0:00:04 [ 247MiB/s]
And if we remove the final cat and have pv write to /dev/null directly?
$ head --bytes $((1024**3)) < /dev/zero | pv > /dev/null
1GiB 0:00:03 [ 287MiB/s]
We've gone from 133MiB/s to 287MiB/s, a ~2x improvement, just by removing the leading and trailing cat processes. Presumably, running pv on its own, with no other processes in the pipeline, will yield the best performance. However, without head there is nothing to limit the amount of data read, so I will just manually end it after 10 seconds:
$ pv < /dev/zero > /dev/null
8.32GiB 0:00:10 [ 861MiB/s]
^C
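(A small sketch, assuming GNU coreutils' timeout is installed: it can cut the run off after 10 seconds so nobody has to hover over Ctrl-C. pv prints its progress to stderr, so the stdout redirect doesn't hide it.)
$ timeout 10 pv < /dev/zero > /dev/null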
(I suppose I could have utilized a RAM disk, but with the Raspberry Pi limited to 1GB of RAM, that's not really an option.)
So I should really start considering having tools read from and write to their destinations directly, rather than using pipes quite so liberally.
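For the opening example, that just means letting grep open the file itself (or, if I still want a progress readout, handing the file straight to pv, which accepts filenames as arguments):
$ grep foo file.txt
$ pv file.txt | grep foo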