Saturday, June 18, 2005

Power of hash

For the past two days, I was trying to improve the performance of a piece of code that I wrote, during which I learnt the power of hash in real numbers.

On a high level, the problem is about searching an ordered list of objects to figure out the position of the given input object in the list. There is no bounds on the number of objects stored in the list and also on the number of input objects. If the given input object is not found in the list, it also possible that given input object might be subset of any of the objects stored in the list. I was doing a linear search on the list (provided by the Java Collections API) which has a time complexity of O(n).

Profiling the code indicated the bottleneck and I was able to figure out the floppy logic. First I was using 'contains' API to check whether the given input object is found as such in the list and if found I use 'indexof' API to retrieve the location in the list. Thus both the operations were O(n). I updated the code and was able to cut down the time by 50%.

With this fix in place, still the performance was no where near what I targeted. A deep look at the problem revealed that for given 'n' input objects the time complexity of the linear search would result in O(n^2). After pondering a while on the plausible resolutions, I decided to change the data structure from ordered list to hashtable. In the hashtable the key being the object and value being the order that I need to generate. This improved the performance by leaps and bounds.

The following metric should reveal the power of hash in real numbers

No of objects in List: 10000
No of Input objects: 10000
Linear search timings: 255 secs
Hash based search: 25 secs

No comments: