Wednesday, January 19, 2011

Count number of bytes piped from one process to another

I'm running a shell script that pipes data from one process to another:

process_a | process_b

Does anyone know a way to find out how many bytes were passed between the two programs? The only solution I can think of at the moment would be to write a small C program that reads from stdin, writes to stdout and counts all of the data transferred, storing the count in an environment variable, like:

process_a | count_bytes | process_b

Does anyone have a neater solution?
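
For concreteness, the count_bytes stage described above could be sketched as a shell function instead of a C program. This is only a sketch under stated assumptions: a child process cannot change its parent's environment, so the count is written to a file instead of an environment variable; the function leans on dd's status report and assumes it contains a line starting with "N bytes", as GNU dd prints (other dd builds may differ). The name count_bytes comes from the question; the temporary file names and the printf/cat stand-ins for process_a and process_b are hypothetical.

```shell
#!/bin/sh
# count_bytes FILE: copy stdin to stdout unchanged and write the number
# of bytes copied to FILE. Hypothetical helper, not an existing tool.
count_bytes() {
    # dd with no operands copies stdin to stdout and reports on stderr.
    # LC_ALL=C keeps the report untranslated so it can be parsed.
    LC_ALL=C dd 2>"$1.raw"
    # Assumption: the report has a line like "12 bytes copied, ...".
    awk '/bytes/ { print $1 }' "$1.raw" >"$1"
    rm -f "$1.raw"
}

count_file=$(mktemp)

# printf and cat stand in for process_a and process_b.
printf 'hello world\n' | count_bytes "$count_file" | cat >/dev/null

# The pipeline has finished, so the count can be read into a variable.
count=$(cat "$count_file")
rm -f "$count_file"
echo "$count"
```

Once the pipeline has finished, the value in $count can be handed to whatever process needs it.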

  • Use pv, the pipe viewer. It's a great tool. Once you know about it, you'll wonder how you ever lived without it.

    It can also show you a progress bar and the transfer speed.

    Simon Hodgson : In my searching I had come across this, but I need it to set a variable with the number of bytes transferred so that I can use it in another process.
  • process_a | tee >(process_b) | wc --bytes might work. You can then redirect wc's count to wherever you need it. If process_b outputs anything to stdout/stderr you will probably need to redirect this off somewhere, if only /dev/null.

    For a slightly contrived example:

    filestore:~# cat document.odt | tee >(dd of=/dev/null 2>/dev/null) | wc --bytes
    4295
    

    By way of explanation: tee lets you direct output to multiple files (plus stdout). The >() construct is bash's "process substitution", which in this case makes a process look like a write-only file, so tee can write to processes as well as files.

    Simon Hodgson : I like this solution. Sadly the shell I'm using (BusyBox) doesn't appear to support the >() notation, but it does provide a way of doing what I'm after.
    David Spillett : Aye, you need a pretty complete bash to have that feature. It is the sort of thing that isn't commonly used, so it gets stripped out of cut-down shells like BusyBox (even those that aim to be more-or-less bash compatible) in order to save space.
  • Pipe through dd. dd's default input is stdin and default output is stdout; when it finishes stdin/stdout I/O, it will report to stderr on how much data it transferred.

    If you want to capture the output of dd and the other programs already talk to stderr, then use another file descriptor. E.g.,

    $ exec 4>~/fred
    $ input-command | dd 2>&4 | output-command
    $ exec 4>&-
    
    Dennis Williamson : Couldn't you skip the `exec` and just output to the file directly? `input-command | dd 2>~/fred | output-command`
    Phil P : Uh, yes. I was apparently having one of "those" moments, sorry.
    From Phil P
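
For shells without >(), such as the BusyBox shell mentioned in the comments above, a named pipe can stand in for process substitution. The following is a minimal sketch, assuming mkfifo, mktemp, tee and wc are available in the build at hand; producer and consumer are hypothetical stand-ins for process_a and process_b.

```shell
#!/bin/sh
# Sketch: count the bytes flowing through a pipe without bash's >() syntax.
# producer and consumer are hypothetical stand-ins for process_a/process_b.
producer() { printf 'hello world\n'; }
consumer() { cat >/dev/null; }

tmpdir=$(mktemp -d)        # private scratch directory
mkfifo "$tmpdir/fifo"      # named pipe that receives tee's second copy

# Background reader: counts the bytes that arrive on the FIFO.
wc -c <"$tmpdir/fifo" >"$tmpdir/count" &

# tee writes one copy to the FIFO and passes the stream on unchanged.
producer | tee "$tmpdir/fifo" | consumer

wait                       # let wc finish writing its total
# Some wc implementations pad the number with spaces; strip them.
count=$(tr -d ' ' <"$tmpdir/count")
rm -r "$tmpdir"

echo "$count"
```

Because wc runs as a separate background job, the count ends up in an ordinary shell variable of the calling script, which is what the question asks for.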
