29 September 2006

BufferedReader vs InputStream

Java's IO system is really flexible for reading and writing data. If you keep NIO out of the equation, data from an input source typically originates from an InputStream. However, most tutorials and mentors recommend that you wrap this raw stream in something more appropriate for your needs. With text files, where you want to read the data in as Strings (XML falls into this category), you frequently see examples of an InputStream being wrapped in an InputStreamReader and then in a BufferedReader. So, do these abstraction layers for IO impose a performance penalty?

I reused my big file (USA.txt) and set up a benchmark run with a BufferedReader.

...
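File f = new File("USA.txt");
FileInputStream fin = new FileInputStream(f);
// InputStreamReader decodes bytes to chars using the platform default charset
BufferedReader in = new BufferedReader(new InputStreamReader(fin));
String s;
long start = System.currentTimeMillis();
// readLine() returns null at end of file
while ((s = in.readLine()) != null) {
    int x = s.length(); // a trivial amount of work per line
}
long finish = System.currentTimeMillis();
in.close(); // also closes the underlying FileInputStream
System.out.println(finish - start);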
...


The chart shows the performance of 100 runs of the code. It's pretty stable and hovers somewhere between 80 and 100 milliseconds.
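If you want to repeat the run yourself, here's a minimal, self-contained sketch of the loop above. The class name `ReaderBench`, the `sumLineLengths` helper and the explicit UTF-8 charset are assumptions made for illustration; this is not the harness that produced the chart, and the file path is whatever you pass as the first argument.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical harness: times a readLine() loop over a text file.
class ReaderBench {
    // Reads the file line by line and returns the total number of chars seen.
    // Line terminators are not counted, because readLine() strips them.
    static long sumLineLengths(File file) throws IOException {
        long total = 0;
        // try-with-resources closes the reader (and the stream under it)
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String s;
            while ((s = in.readLine()) != null) {
                total += s.length();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: USA.txt sits in the working directory unless a path is given
        File file = new File(args.length > 0 ? args[0] : "USA.txt");
        for (int run = 0; run < 100; run++) {
            long start = System.nanoTime();
            long total = sumLineLengths(file);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(elapsedMs + " ms, " + total + " chars");
        }
    }
}
```

Expect the first few runs to be slower than the rest while the JIT warms up and the OS pulls the file into its cache.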



Next step: get the InputStream up and running and compare. So, I went with a code block like so:

...
File f = new File("USA.txt");
InputStream in = new FileInputStream(f);
// buffSize configurable
byte[] buffer = new byte[buffSize];
int len;
long start = System.currentTimeMillis();
// read() returns -1 at end of stream (and 0 only for a zero-length buffer)
while ((len = in.read(buffer)) > 0) {
    // decodes just this chunk, using the platform default charset
    String s = new String(buffer, 0, len);
    int x = s.length();
}
long finish = System.currentTimeMillis();
in.close();
System.out.println(finish - start);
...
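One thing to watch with a loop like this: each chunk is turned into a String on its own, so a multi-byte character that straddles two reads gets mangled. That doesn't matter for a timing run over plain ASCII text, but it's part of what the Reader layer handles for you. Here's a small sketch of the difference; the class and method names are made up for the illustration, and it uses an in-memory byte array with UTF-8 rather than a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: per-chunk decoding vs. decoding through a Reader.
class ChunkDecoding {
    // Decodes every chunk independently, like the raw InputStream loop.
    static String decodePerChunk(byte[] data, int buffSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] buffer = new byte[buffSize];
        try (InputStream in = new ByteArrayInputStream(data)) {
            int len;
            while ((len = in.read(buffer)) != -1) {
                sb.append(new String(buffer, 0, len, StandardCharsets.UTF_8));
            }
        }
        return sb.toString();
    }

    // Lets InputStreamReader carry a partial character over to the next read.
    static String decodeWithReader(byte[] data, int buffSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[buffSize];
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int len;
            while ((len = in.read(buffer)) != -1) {
                sb.append(buffer, 0, len);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "h\u00e9llo"; // the e-acute takes two bytes in UTF-8
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        // A 2-byte buffer splits the e-acute across two reads
        System.out.println(text.equals(decodePerChunk(data, 2)));   // false
        System.out.println(text.equals(decodeWithReader(data, 2))); // true
    }
}
```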


One of the drawbacks of the InputStream is that you either read a byte at a time or supply a byte array to read into, which sets the read size. I started low, with a 16-byte array. I could have been cute here, but I opted for simplicity. Here's how the 16-byte reads from the InputStream stacked up against the BufferedReader:



Not so hot, but not so surprising either. With such a small buffer size, I was surely choking the read speeds. So, I decided to try the runs with increasing buffer sizes (32, 64, 256, 512, 1024, 2048 and 2086). And, here's what happened:



As we ratchet up the buffer size, we see performance gains. However, these gains steadily diminish. The most notable detail is that there's no point at which the InputStream read outperforms the BufferedReader. Behind the friendly methods that the BufferedReader provides, there's clearly a good deal of code that bridges the Reader to the InputStream, and it does its own buffering along the way. The clever parts that figure out the best way to read are hidden, and they work.
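To try the buffer-size comparison yourself, a sweep along these lines should do. Treat it as a sketch under a few assumptions: the `BufferSweep` class and `readAll` helper are hypothetical names, the sizes simply double from 16 to 2048 (which only approximates the list above), and the loop counts bytes without building Strings, so it isolates the read cost rather than reproducing the earlier numbers.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical harness: times raw InputStream reads at several buffer sizes.
class BufferSweep {
    // Reads the whole file through a raw InputStream with the given buffer
    // size and returns the number of bytes read.
    static long readAll(File file, int buffSize) throws IOException {
        if (buffSize <= 0) {
            // a zero-length buffer would make read() return 0 forever
            throw new IllegalArgumentException("buffSize must be positive");
        }
        byte[] buffer = new byte[buffSize];
        long total = 0;
        try (InputStream in = new FileInputStream(file)) {
            int len;
            while ((len = in.read(buffer)) != -1) {
                total += len;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: USA.txt sits in the working directory unless a path is given
        File file = new File(args.length > 0 ? args[0] : "USA.txt");
        for (int size = 16; size <= 2048; size *= 2) {
            long start = System.nanoTime();
            long bytes = readAll(file, size);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(size + "-byte buffer: " + elapsedMs + " ms for " + bytes + " bytes");
        }
    }
}
```

Since every size reads the same file, the later sizes benefit from the OS file cache; repeating the whole sweep a few times and ignoring the first pass gives a fairer picture.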

Spend some time reading through performance myths (from the pros), and one piece of advice shows up consistently: keep it simple. The VMs do a lot of hard work for you, so you can write code that is clean. And, clearly, this simplicity doesn't impose a performance burden here. Nifty.

Now, I wonder how much NIO helps...

7 comments:

  1. It does make sense

    BufferedReader buffers what it reads, meaning that it uses an internal storage area to hold more
    data than is requested, which helps to reduce the number of low-level I/O
    operations performed. The end result is faster, more efficient I/O.

    The same reasoning explains the improvement in the InputStream test you ran. Increasing the buffer size led to the same behaviour as described above.

    If you think about it, a buffered stream will read a whole buffer of
    information from a socket, file, serial port, etc. Then a program can
    read this information from the buffer... and the buffer will
    automagically refill itself as needed from the source. This is really
    useful if you're trying to suck data from a source as fast as
    possible... but there will be instances where you want to keep the
    data in the stream.

    Now, there is a difference between a BufferedInputStream and a BufferedReader:
    'Reader'-based classes read characters encoded in 8 or more bits, translating them into 16-bit Unicode,
    while 'InputStream'-based classes read 8-bit bytes.

  2. Anonymous 6:48 AM

    Great article thanks.

    Ali

  3. Anonymous 10:43 AM

    You are awesome man....this is some article....really helpful...

  4. Anonymous 9:46 PM

    You're allocating new strings in the while loop of the InputStream code. That can cause a significant performance overhead, because you have to create a bunch of new objects.

  5. great article,
    but what did you use for the benchmark?

  6. Thanks, It's great help to me. Good luck.

  7. Anonymous 2:19 PM

    great article.. thanks. it would be awesome if you could add the code as well here so we could test it ourselves...
