29 September 2006

BufferedReader vs InputStream

Java has an IO system that's really flexible for reading and writing data. If you keep NIO out of the equation, data from an input source typically originates from an InputStream. However, most tutorials and mentors recommend wrapping this raw stream into something more appropriate for your needs. With text files, when you want to inhale Strings (XML falls into this category), you frequently see examples of an InputStream being wrapped, layer by layer, into a BufferedReader. So, do these abstraction layers impose a penalty on performance?

I reused my big file (USA.txt) and set up a benchmark run with a BufferedReader.

...
File f = new File("USA.txt");
FileInputStream fin = new FileInputStream(f);
BufferedReader in = new BufferedReader(new InputStreamReader(fin));
String s;
long start = System.currentTimeMillis();
while ((s = in.readLine()) != null) {
    // touch the line so the read can't be skipped
    int x = s.length();
}
long finish = System.currentTimeMillis();
System.out.println(finish - start);
in.close();
...


The chart shows the performance of 100 runs of the code. It's pretty stable, hovering somewhere between 80 and 100 milliseconds.



Next step: get the InputStream up and running and compare. So, I went with a code block like so:

...
File f = new File("USA.txt");
InputStream in = new FileInputStream(f);
byte[] buffer = new byte[buffSize]; // buffSize configurable
int len;
long start = System.currentTimeMillis();
// read() returns -1 at end of stream; checking > 0 risks an early
// exit, since a zero-length read is technically legal
while ((len = in.read(buffer)) != -1) {
    String s = new String(buffer, 0, len);
    int x = s.length();
}
long finish = System.currentTimeMillis();
System.out.println(finish - start);
in.close();
...


One of the drawbacks to the raw InputStream is that you either read a single byte at a time or supply a byte array of some chosen size to read into.
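
For reference, the single-byte form looks like this (read() hands back one byte at a time as an int, or -1 at end of stream):

...
int b;
while ((b = in.read()) != -1) {
    // every byte pays for a full call into the stream
}
...

I started low, with a 16-byte array to read into. I could have been cute here, but I opted for simplicity. Here's how the 16-byte reads from the InputStream stacked up against the BufferedReader: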



Not so hot, but not so surprising either. With such a small buffer, I was surely choking the read speeds. So, I decided to try the runs with increasing buffer sizes (32, 64, 256, 512, 1024, 2048 and 4096), using a driver along the lines of the sketch below.
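
This is just the earlier read-and-time loop parameterized on the buffer size; the sizes array is my name for the list above:

...
int[] sizes = {32, 64, 256, 512, 1024, 2048, 4096};
for (int buffSize : sizes) {
    InputStream in = new FileInputStream(new File("USA.txt"));
    byte[] buffer = new byte[buffSize];
    int len;
    long start = System.currentTimeMillis();
    while ((len = in.read(buffer)) != -1) {
        String s = new String(buffer, 0, len);
        int x = s.length();
    }
    long finish = System.currentTimeMillis();
    System.out.println(buffSize + ": " + (finish - start));
    in.close();
}
...

And, here's what happened: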



As we ratchet up the buffer size, we see performance gains, but the gains steadily diminish. The most notable detail is that there's no point at which the raw InputStream reads outperform the BufferedReader. Behind the nice and friendly methods that the BufferedReader provides, there's clearly some smart code bridging the Reader to the InputStream: it keeps its own internal char buffer (8192 chars by default in Sun's implementation) and refills it with big reads from the underlying stream. The cute parts that figure out the best way to read are hidden, and working.
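
If you ever want control over that buffer, BufferedReader has a second constructor that takes an explicit size; a minimal sketch (8192 just restates the default):

...
BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream(new File("USA.txt"))),
        8192); // explicit buffer size, in chars
...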

Spend some time reading through performance myths (from the pros), and one constant shows up again and again: keep it simple. The VM does a lot of hard work for you, so you can write code that is clean. And clearly, that simplicity doesn't impose a performance burden here. Nifty.

Now, I wonder how much NIO helps...

28 September 2006

StringBuilder vs StringBuffer

When Java 1.5 came out, many of us were eager to get our hands on StringBuilder. As the non-thread-safe version of StringBuffer, one would imagine it would pack some heat on the performance end for cases where you don't have multi-threaded appends.
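
The difference is purely the locking: StringBuffer's methods are synchronized, StringBuilder's are not. A toy illustration of what that buys you (this is separate from the benchmark below): two threads appending to a shared StringBuilder can interleave badly, while a StringBuffer stays consistent.

...
final StringBuilder sb = new StringBuilder();
// final StringBuffer sb = new StringBuffer();
Runnable r = new Runnable() {
    public void run() {
        for (int i = 0; i < 10000; i++) {
            sb.append('x');
        }
    }
};
Thread t1 = new Thread(r);
Thread t2 = new Thread(r);
t1.start();
t2.start();
t1.join(); // join() throws InterruptedException; assume it's handled
t2.join();
// With StringBuilder, this often prints less than the expected 20000
// (or the run dies with an ArrayIndexOutOfBoundsException); with
// StringBuffer, it's always exact.
System.out.println(sb.length());
...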

To test the relative performance difference, I used a BufferedReader over a decently sized text file (622732 words, gathered by repeatedly pasting the Wikipedia article on the USA). Armed with data, I wrote and measured the following loop, which merely reads in the file and appends each line to either a StringBuffer or a StringBuilder.


...
for (int x = 0; x < 100; x++) {
    File f = new File("USA.txt");
    BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(f)));
    String line;
    StringBuffer sb = new StringBuffer();
    // StringBuilder sb = new StringBuilder();
    long start = System.currentTimeMillis();
    while ((line = in.readLine()) != null) {
        sb.append(line);
    }
    long mid = System.currentTimeMillis();
    String s = sb.toString();
    long done = System.currentTimeMillis();
    // append time, then toString time
    System.out.println((mid - start) + " " + (done - mid));
    in.close();
}
...


As you can see, I merely switched over to the StringBuilder constructor for the second set of runs. The measurement dumps out the total read-and-append time as well as the time taken by the toString call.



Clearly, they both perform along almost the same lines. At JavaOne last year, there was lots of talk about how well the Sun VM handles synchronization, and uncontended locks in particular are supposed to be cheap these days. This looks like proof: all of the appends here happen on a single thread, so the StringBuffer's locks are never contended.
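
If you want to poke at that claim directly, here's a toy micro-benchmark (the names Counter and incSync are just for illustration, and all the usual micro-benchmark caveats about warmup and dead code apply):

...
class Counter {
    private int n;
    synchronized void incSync() { n++; }
    void inc() { n++; }
    int value() { return n; }
}

Counter c = new Counter();
long start = System.currentTimeMillis();
for (int i = 0; i < 100000000; i++) {
    c.incSync(); // swap in c.inc() for the unsynchronized run
}
long finish = System.currentTimeMillis();
// print the counter too, so the loop can't be discarded outright
System.out.println((finish - start) + " " + c.value());
...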

But here's the dicey part. When we did a mass search-and-replace on our application (swapping StringBuffer for StringBuilder), we found a massive boost. So, were we dreaming?

I was convinced that my findings were bogus, and that the overhead imposed by the readLine call was drowning out a real difference in my measurement. I was wrong. I changed the loop to time only the appends, like so:


...
long read = 0;
while ((line = in.readLine()) != null) {
    long t1 = System.currentTimeMillis();
    sb.append(line);
    long t2 = System.currentTimeMillis();
    // accumulate only the time spent inside append
    read += (t2 - t1);
}
...


No difference. I'm happy about this, because we need not go hunting down every StringBuffer and switching it to StringBuilder like crazy. But I'm perplexed about what we saw earlier. So, I decided to execute the same test on a JRockit VM (1.5 spec). Here's what I got:



JRockit seems to jump in performance steps, almost as if the VM is adjusting to the code (most likely its adaptive optimizer recompiling the hot loop as it runs). Notice the step-like decrease in append times for both the buffer and the builder, and the toString is clearly faster than on the Sun VM. But there's something to be said for the sheer predictability of the Sun VM too.

So, my conclusion? Don't race to switch StringBuffer to StringBuilder; there doesn't seem to be a real, tangible difference in performance.
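
One footnote worth knowing: as of 1.5, javac itself compiles String concatenation with + into StringBuilder appends, so ordinary code gets the unsynchronized version for free. Roughly:

...
String b = "whatever"; // some value in scope
// What you write:
String s = "a" + b + "c";
// Roughly what a 1.5 javac emits:
String s2 = new StringBuilder().append("a").append(b).append("c").toString();
...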

26 September 2006

entry zero

While looking up the term recursion on Wikipedia, I found a nifty item: the Droste effect. M.C. Escher appears to have made use of the technique.