Sunday, November 2, 2008

Overview of Hadoop

My friend Tom Wheeler has written an excellent article over at the OCI Java News Brief.

The article has not only an overview but also many reference links and some outstanding examples of Hadoop in the wild. Examples include the New York Times Machine project and a record-setting cluster with 4000 nodes.

As neat as this all is, I'm more excited about the notion of people solving problems in a different way, via the Map-Reduce strategy. Tom gives a counter-intuitive example of a word count algorithm: in the small, it is awkward and slow; in the large, it scales seamlessly to colossal proportions.

Back in university, a favourite class was the Theory of Computation. I recall the strategy of proving a problem was NP-Hard: if one can express it in terms of another problem that is known to be NP-Hard, then one could argue that if one can solve the original problem in polynomial time, then one could solve the set of NP problems in polynomial time (I'll let commenters distinguish between NP-Hard and NP-Complete and other subtleties). As an understated aside, solving this problem would be 'good for one's resume'.

Solving a problem with Map-Reduce isn't the same thing, but there is an abstract, psychic thread that seems to tie them together: the notion of thinking about problems in a certain mindset. Very cool stuff. I hope to find time to explore that further.

1 comment:

Mark Whybird said...

P vs NP and seeing a UI is Mac- or iPhone- like vs making it so.
(This comment reproduced here in full from my own blog entry)

So, I read this Code to Joy entry, which reminded me about P vs NP problems, and this from Daring Fireball about how you could tell at a glance in the 80s if an application’s user interface was Mac-like, and you can do the same now for iPhones, but it takes notable effort to actually make such an interface.

So, my observation is that there is a very close analogy here:
It is easy (a P problem) to say YES, this UI is iPhone-like, but it is hard and time-consuming (an NP problem) to create one.

That’s my insight for the day; I hope it is of some benefit to someone :-)