
Posts Tagged ‘hadoop’

WebPIE wins the IEEE SCALE challenge!

Saturday, May 22nd, 2010

100 billion-triple reasoning also impresses the scalable computing crowd

The WebPIE team has been announced as the winners of the 3rd IEEE International Scalable Computing Challenge (SCALE 2010)!

The objective of the Scalable Computing Challenge, sponsored by the IEEE Computer Society Technical Committee on Scalable Computing (TCSC), is “to highlight and showcase real-world problem solving using computing that scales”.  SCALE 2010 was organised as part of CCGRID 2010, the 10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.

The competition involved submitting a paper, giving a presentation, running a live demo, and exhibiting the system at the demo/poster market.

Together with Niels Drost (from the VUA’s High Performance Distributed Computing group led by Prof. Henri Bal), LarKC team member Jacopo Urbani presented WebPIE, the very large scale Hadoop-based inference engine, running on 64 machines of the DAS-3 compute cluster. A live visualisation showed the RDF graph growing before the eyes of the audience as WebPIE raced through its inferences.

WebPIE SCALE demo screen shot

For LarKC, winning this prize in a strong field of top-ranking international finalists is all the more important because it shows that our results are relevant not only to the Semantic Web community: people who are native to scalable computing also appreciate the results from the LarKC project.

For more info on WebPIE, see the earlier blog entries here and here. A recent blog entry explains how to run WebPIE on your own datasets, using the Amazon Elastic Compute Cloud service.


Billion-triple reasoning? Now for everybody with a credit-card!

Wednesday, May 19th, 2010

Remember the WebPIE inference engine? WebPIE is the first inference engine that can do inference over 100 billion(!) triples, and it can compute the full OWL Horst closure of UniProt (1.5 billion triples) in just over 6 hours.

Of course, the catch was that you needed to run Hadoop on a 32-machine compute cluster to do this at home….

But now, the LarKC team behind the WebPIE engine (VU Amsterdam PhD student Jacopo Urbani, and postdoc Spyros Kotoulas) have made the entire WebPIE infrastructure available on Amazon’s Elastic Compute Cloud. The AWS image has the reasoner preinstalled so that you can perform inferences on your own datasets with almost no effort. All you need is a credit card to pay the bill!

The homepage of WebPIE contains detailed instructions on how to deploy your own Amazon reasoner on your own RDF graph (up to OWL Horst expressivity).

At Amazon’s current pricing, you can perform a whole lot of inferencing for a couple of hundred bucks (and often much less). LarKC hopes that this will bring very large scale inferencing within reach of anybody who’s interested.


LarKC team succeeds in extending MapReduce reasoning to OWL Horst

Monday, January 4th, 2010

Last October, the LarKC team at the VU Amsterdam was the first to use the MapReduce framework for distributed RDFS reasoning. In their ISWC 2009 paper, they reported some of the fastest RDFS inference rates seen at that date, at 4 million inferences per second on a 32-machine cluster.

At the time, some of the commentators said “great work, but it will never work for OWL”. The team has now proved their critics wrong. In a paper submitted to ESWC 2010 (currently under review), VU Amsterdam PhD student Jacopo Urbani has shown how to extend the MapReduce algorithms to the OWL Horst semantics. This allowed him and his team to compute the full OWL Horst closure of the UniProt dataset (1.5 billion triples) in just over 6 hours on 32 commodity machines from the DAS-3 cluster, outperforming the best results to date by a large margin.
(Besides this result, they also used their algorithm to break the 100 billion triple barrier for the first time ever; see the previous blog entry.)
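To give a flavour of what rule-based reasoning in the MapReduce style looks like, here is a toy, in-memory sketch of a single RDFS rule (rdfs9: if `s rdf:type C` and `C rdfs:subClassOf D`, then `s rdf:type D`). This is not the team's algorithm, which is described in their papers; the function names (`map_triple`, `reduce_class`, `apply_rdfs9_once`) and the `ex:` example data are made up for illustration, and a plain dictionary stands in for Hadoop's shuffle phase.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def map_triple(triple):
    """Map step: key each relevant triple by the class it should join on."""
    s, p, o = triple
    if p == RDF_TYPE:
        yield o, ("instance", s)       # s is an instance of class o
    elif p == SUBCLASS_OF:
        yield s, ("superclass", o)     # class s has superclass o


def reduce_class(cls, values):
    """Reduce step: pair every instance of cls with every superclass of cls."""
    instances = [v for tag, v in values if tag == "instance"]
    superclasses = [v for tag, v in values if tag == "superclass"]
    for inst in instances:
        for sup in superclasses:
            yield (inst, RDF_TYPE, sup)


def apply_rdfs9_once(triples):
    """Run one map/shuffle/reduce pass and return only the new triples."""
    groups = defaultdict(list)         # stands in for the shuffle phase
    for triple in triples:
        for key, value in map_triple(triple):
            groups[key].append(value)
    derived = set()
    for cls, values in groups.items():
        derived.update(reduce_class(cls, values))
    return derived - set(triples)


# Hypothetical example data.
triples = {
    ("ex:alice", RDF_TYPE, "ex:Student"),
    ("ex:Student", SUBCLASS_OF, "ex:Person"),
    ("ex:Person", SUBCLASS_OF, "ex:Agent"),
}
new_triples = apply_rdfs9_once(triples)
print(new_triples)
```

A single pass only derives that `ex:alice` is an `ex:Person`; reaching the full closure means feeding the derived triples back in and repeating until a pass produces nothing new.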


LarKC team first one to break the 100 billion triple barrier

Monday, January 4th, 2010


In a submission to ESWC 2010 (currently under review), members of the LarKC team at VU Amsterdam have been the first to compute the OWL Horst closure of 100 billion triples. This was done by designing an OWL Horst inference engine optimised for the Hadoop/MapReduce distributed computing platform. Using the widely adopted LUBM benchmark to generate 100 billion input triples, they deployed 64 commodity machines from the DAS-3 cluster to derive 47 billion additional inferences in just under 2 days. The same cluster can compute the closure of 10 billion LUBM triples in as little as 4 hours, which works out to roughly 60 times the throughput of the best-performing reasoner to date (BigOWLIM, taking 290 hours on 12 billion LUBM triples).
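Since the two runs use different input sizes, the "60 times" figure is best read as a ratio of throughputs (input triples processed per hour). A quick check using only the numbers quoted above; the variable names are ours:

```python
# Figures as quoted in the post.
webpie_triples, webpie_hours = 10e9, 4         # WebPIE on 10 billion LUBM triples
bigowlim_triples, bigowlim_hours = 12e9, 290   # BigOWLIM on 12 billion LUBM triples

# Throughput in input triples per hour.
webpie_rate = webpie_triples / webpie_hours
bigowlim_rate = bigowlim_triples / bigowlim_hours

speedup = webpie_rate / bigowlim_rate
print(f"WebPIE:   {webpie_rate:.3g} triples/hour")
print(f"BigOWLIM: {bigowlim_rate:.3g} triples/hour")
print(f"Speedup:  {speedup:.1f}x")
```

Comparing raw wall-clock times instead (290 hours against 4) would give a larger ratio, but that ignores the fact that the BigOWLIM run processed 12 billion triples rather than 10 billion.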


Amazon supports Hadoop

Friday, April 3rd, 2009

(by Eyal Oren)

Amazon have long offered an on-demand infrastructure where you can scale CPUs or disk space very flexibly (pay-as-you-go). Hadoop is the Yahoo!-led open-source clone of Google’s MapReduce, a simple system for processing data-driven jobs on a cluster: parallel programming for the common man. Using Hadoop, it’s quite easy to put the power of clusters such as Amazon’s to your advantage without knowing too much about parallel or distributed programming. People have run Hadoop stuff on Amazon but this involved manual setup; now Amazon supports Hadoop directly.
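For readers who have not seen the MapReduce model before, here is a minimal single-process sketch of the idea in plain Python. It does not use the Hadoop API; `run_mapreduce`, `count_mapper` and `count_reducer` are hypothetical names, and the grouping dictionary stands in for the shuffle that a real cluster performs across machines.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Apply mapper to every record, group the emitted values by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def count_mapper(line):
    """Emit (word, 1) for each word in a line of text."""
    for word in line.split():
        yield word.lower(), 1


def count_reducer(word, counts):
    """Sum the counts collected for one word."""
    return sum(counts)


lines = ["Hadoop runs MapReduce jobs", "MapReduce jobs scale out"]
word_counts = run_mapreduce(lines, count_mapper, count_reducer)
print(word_counts)
```

The programmer only writes the two small functions; the framework decides where the mapper and reducer calls run, which is why the same job can move from one machine to a cluster without being rewritten.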

Why is this interesting? First, in my own experience, Hadoop is the easiest platform for data-intensive parallel processing. If you have lots of data and not much experience with clusters (i.e. most Semantic Web people), the Amazon offer will get you a lot of bang for the buck. Secondly, Yahoo! sponsors most of Hadoop’s development and uses it internally, but is apparently not interested in commercialising it. Amazon is an infrastructure provider (renting out machines), and Yahoo! apparently isn’t (or just missed the boat). Yahoo!’s strategy seems to be to remove Google’s competitive advantage (namely MapReduce) by turning it into commodity software, and they probably don’t mind if Amazon makes good money in the meantime.