
Amazon supports Hadoop

(by Eyal Oren)

Amazon have long offered an on-demand infrastructure where you can scale CPUs or disk space very flexibly (pay-as-you-go). Hadoop is the Yahoo!-led open-source clone of Google's MapReduce, a simple system for processing data-driven jobs on a cluster: parallel programming for the common man. Using Hadoop, it's quite easy to put the power of clusters such as Amazon's to your advantage without knowing much about parallel or distributed programming. People have run Hadoop on Amazon before, but this involved setting everything up manually; now Amazon supports Hadoop directly.
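To make "parallel programming for the common man" a bit more concrete: in the MapReduce model the programmer writes only a map function and a reduce function, and the framework takes care of splitting the input, grouping intermediate results by key and running everything across the cluster. The sketch below is a hypothetical, single-process Python simulation of the classic word-count job. It does not use Hadoop's actual (Java) API, and the function names are made up for illustration; the dictionary stands in for the shuffle step that Hadoop would perform across machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_words(line: str) -> Iterator[tuple[str, int]]:
    # Map step (hypothetical name): emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> tuple[str, int]:
    # Reduce step (hypothetical name): sum all counts emitted for a single word.
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    # Shuffle step: group intermediate values by key. On a real cluster the
    # framework does this across machines; here a plain dict stands in for it.
    groups: dict[str, list[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_words(line):
            groups[key].append(value)
    # Apply the reducer to each key group, in sorted key order.
    return dict(reduce_counts(k, v) for k, v in sorted(groups.items()))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The end"]
    print(run_job(docs))
    # → {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

The point of the model is that `map_words` and `reduce_counts` know nothing about machines or parallelism; because each map call and each per-key reduce call is independent, a framework like Hadoop can spread them over as many nodes as you rent.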

Why is this interesting? First, in my own experience, Hadoop is the easiest platform for data-intensive parallel processing. If you have lots of data and not much experience with clusters (i.e., most Semantic Web people), the Amazon offer will get you a lot of bang for the buck. Second, Yahoo! sponsors most of Hadoop's development and uses it internally, but is apparently not interested in commercialising it. Amazon is an infrastructure provider (renting out machines), and Yahoo! apparently isn't (or just missed the boat). Yahoo!'s strategy seems to be to remove Google's competitive advantage (namely MapReduce) by turning it into commodity software, and they probably don't mind if Amazon makes good money in the meantime.
