08 June 2009

Constant Time Add Operations

I had to write a paper recently, and I opted to analyze the seemingly trivial constant time add operations on two different Java list implementations. Guaranteed to bore you to death: List Add Operations.



Updated (18 June 2009) link above should work.

05 April 2009

insert performance variance

Disclaimer: I've done some research here based purely on measurement and assumed use cases. The performance discrepancies shown are on very large data volumes (1 million elements), so the findings may or may not be applicable.



Here's the premise under which I started searching for measurement:



  • I see Sorted collections, specifically TreeSet objects being used rampantly to maintain an ordered set of data

  • Typically, the ordering is comparison based (driven by the object implementing Comparable)
  • Often, the contents of the collection, once populated, are not tampered with

  • A frequent use of the contents of the collection is to iterate over it in sequence



Considering the above, it seems like we (I know I've done this) generally feel comfortable incurring some cost upon adding elements to a collection in order to be able to recover them in sorted fashion quickly. The specific cost of insert time is log(n) where n is the number of elements already in the Set. For a clean explanation about why the cost is log(n), read about Binary Search Trees.



Here's what my dilemma is though: add operations on lists cost very little (a constant for all practical purposes). And sorting on a list that can manage a swap operation quickly is likely to be very fast too. A simple QuickSort should theoretically be able to accomplish n(log(n)) performance if the data is reasonably well distributed.



So the question that I wanted to tackle was: For large enough data sets, would it be quicker to get a sorted collection by adding data to a simple list and then sorting it, or would it be quicker to work with a data structure that sorts with each insert? I decided to play with the ArrayList and TreeSet to dig into this.



Here's the ArrayList code:



ArrayList<Long> arrayList = new ArrayList<Long>();
Random r = new Random(1l);
long start = System.currentTimeMillis();
for (int x = 0; x < 1000000; x++) {
long l = r.nextLong();
arrayList.add(l);
if (arrayList.size() % 10000 == 0) {
long now = System.currentTimeMillis();
System.out.println(now - start);
}
}



Here's the TreeSet:



TreeSet<Long> treeSet = new TreeSet<Long>();
Random r = new Random(1l);
long start = System.currentTimeMillis();
for (int x=0; x<1000000; x++) {
long l = r.nextLong();
treeSet.add(l);
if (treeSet.size() %10000 == 0) {
long now = System.currentTimeMillis();
System.out.println(now - start);
}
}



Here's a graph of the performance:





Not surprisingly, the ArrayList is way quicker. Looking at the cumulative cost though, you get a better picture of the total time it takes to add a lot of data to a sorted vs unsorted collection:





Again, not terribly suprising. But, now let's try and equalize the game a bit. Let's transcribe the QuickSort Algorithm to code, and then run it on the ArrayList after we've added everything to it. Here's the QuickSort:




private void sort(List<Long> arr, int start, int end, int d) {
if (start < end) {
int pivot = start;
int newPivot = partition(arr, pivot, start, end);
sort(arr, start, newPivot - 1, ++d);
sort(arr, newPivot + 1, end, ++d);
}
}


private int partition(List<Long> arr, int pivot, int start, int end) {
long pivotValue = arr.get(pivot);
swap(arr, pivot, end);
int newPivot = start;
for (int x = start; x < end; x++) {
if (arr.get(x) <= pivotValue) {
swap(arr, x, newPivot);
newPivot++;
}
}
swap(arr, newPivot, end);
return newPivot;
}

/**

* Helper to manage a 3-op swap
*/

private void swap(List<Long> arr, int from, int to) {
long tmp = arr.get(from);
arr.set(from, arr.get(to));
arr.set(to, tmp);
}

public void sort(List<Long> arr) {
sort(arr, 0, arr.size() -1, 0);
}



And, tweaking the ArrayList insert code to leverage the sort:



ArrayList<Long> arrayList = new ArrayList<Long>();
Random r = new Random(1l);
QuickSort qs = new QuickSort();
long start = System.currentTimeMillis();
for (int x = 0; x < 1000000; x++) {
long l = r.nextLong();
arrayList.add(l);
if (arrayList.size() % 10000 == 0) {
long now = System.currentTimeMillis();
System.out.println(now - start);
}
}
qs.sort(arrayList);
long now = System.currentTimeMillis();
System.out.println(now - start);



Our performance numbers are:



Even with the separate sort, costing n(log(n)), the ArrayList won handsomely (a factor of approx 20%). Granted, the quicksort wasn't very general, but this does seem to raise a few questions. Perhaps for large volumes, this is a viable strategy?

Mailboxes and Dogs

From a few weeks back:



It appears that someone decided to drive over our mailbox. I'm not sure what the state of their vehicle is currently, but it can't be great. Bruce (our neighbor) has set about building a brand new mailbox (post included), and from what I can tell, there's no wood involved- just loads of metal.



Party on..



A few weeks back, when the weather was better, we had our neighbors over for brunch. Dogs are always welcome.

12 January 2009

Mowgli

Read this news article first: Omaha Woman Rescues Dogs from Streets of Mumbai



Mowgli is one of the four puppies from the trash bag, and he now lives with us:


06 August 2008

Radiohead rules

Like many other people, I saw that Radiohead had created an interesting music video for House of Cards with all sorts of visualizations. From what I read, they had sampled visual data using Lidar. Substantial portions of the data used in constructing the video are available (as is the music) to the general public. The House of Cards group on YouTube showcases how a number of people have re-run the data through various visual engines.



As you may notice, most of the videos on YouTube use some form of 3D engine to visualize the data. The data is structured fairly simply in the CSV files with x, y, and z coordinates, as well as an intensity level for each point.



Problem: I'm not adept (at all) with anything 3D based. So, I decided to see if I could kludge my way into trying to create a visualization of my own using what I know. Er.. All I know is Java and Graphics2D.



For rendering data, I just used a simple linear scale to position the x and y points. Each point was merely a filled circle. I plotted all points in white. The intensity value on each point drove the alpha values on each point. Lastly, I used the z value to control the size of the point- closer z values got larger points. Here's what I got for starters:





Some tuning in point size, and changes in colors to prevent it from looking too scary:





Anyone know an easy way to sequence some 1000 images into a decent video? I can't seem to get iMovie to have a delay between images of less than .3s, and the actual frames need to be rendered out at 30 frames per second.



*Follow up:* Meant the attach a larger copy that shows the patterns generated by the circle size and alpha variations:



04 August 2008

Broader thinking

Steph decided to take a horizontal sequence of pictures on her camera when we were up in the Rocky Mountain National Park. The goal was to stitch the images into a panorama. The last time I did this, was back in 2001, gluing together the Manhattan skyline (World Trade Center buildings included). That was done mostly with gimp, if I recall right.



Meet Hugin- a gem of a tool that does all the hard work needed to assemble a panoramic blend of pictures. The image below, cobbled from some 7 shots took around 3 minutes. Just be sure to preserve your Exif information as you get your pictures together since Hugin can use them to drive a massive amount of automation.





Hugin is an open source application, and it leverages the libpano suite. The value it provides is really outstanding. Moreover, the community behind Hugin seems to be adding value in raw photographic know-how on their blog: panospace. Kudos to a great app, and the community that powers it.

13 July 2008

Need bicycle porn?

Or a house? Look no further.

Lightning and a Moth

Some of the storms in early June had such a high density of lightning strikes that it turned out to be easy enough to catch a couple of strikes. My location left a lot to be desired, and I kind lacked the courage (or funds) to subject myself and the camera to the worst of the storm. Here's what I got:





Towards the latter part of June, Steph saw this moth and took some photos of it. Note the fur all over it, and the fern like antennae. Does anyone know kind of moth it is?



08 June 2008

Java 6 on OS X

This almost sneaked past me.. The most recent update humbly drops in Java SE 6:






It doesn't seem to replace Java 1.5 right away. You've got to fire up the Java Preferences app and fiddle with a few switches.





And then, presto:





And in case you need to add the new VM to some app (like Eclipse), the goldmine is at: /System/Library/Frameworks/JavaVM.framework/Versions.