LarKC team first to break the 100 billion triple barrier

In a submission to ESWC 2010 (currently under review), members of the LarKC team at VU Amsterdam have been the first to compute the OWL Horst closure of 100 billion triples. They did this by designing an OWL Horst inference engine optimised for the Hadoop/MapReduce distributed computing platform. Using the widely adopted LUBM benchmark to generate 100 billion input triples, they deployed 64 commodity machines from the DAS-3 cluster to derive 47 billion additional inferences in just under 2 days. The same cluster can compute the closure of 10 billion LUBM triples in as little as 4 hours, a throughput roughly 60 times that of the best-performing reasoner to date (BigOWLIM, which takes 290 hours on 12 billion LUBM triples).
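The factor of 60 can be checked with a little arithmetic. The sketch below assumes the comparison is made on throughput (input triples processed per hour), which is one plausible reading given that the two runs used different input sizes; the function and variable names are illustrative only.

```python
# Figures quoted in the post: input size in billions of triples, runtime in hours.
MAPREDUCE_TRIPLES_BN, MAPREDUCE_HOURS = 10, 4    # MapReduce engine on 64 DAS-3 machines
BIGOWLIM_TRIPLES_BN, BIGOWLIM_HOURS = 12, 290    # BigOWLIM


def throughput(triples_bn: float, hours: float) -> float:
    """Return billions of input triples processed per hour."""
    return triples_bn / hours


# Assumption: the speed-up is the ratio of the two throughputs.
speedup = throughput(MAPREDUCE_TRIPLES_BN, MAPREDUCE_HOURS) / throughput(
    BIGOWLIM_TRIPLES_BN, BIGOWLIM_HOURS
)
print(f"{speedup:.1f}x")  # prints "60.4x"
```

Comparing raw runtimes alone (290 hours against 4 hours) would give about 72 times, but that ignores the fact that BigOWLIM processed 12 billion triples rather than 10 billion.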
Tags: Distributed computing, hadoop, mapreduce
January 22nd, 2010 at 4:46 pm
[…] huge knowledge bases in real time is not a straightforward task (though some people are achieving impressive results with enormous database sizes, real-time is still a different beast). So far, the strategy that […]
February 24th, 2010 at 1:21 pm
[…] The VUA team submitted the work led by Jacopo Urbani and Spyros Kotoulas on WebPIE, an inference engine that can perform at web scale. This MapReduce-based inference engine performs OWL Horst inference on datasets of up to 100 billion triples, as earlier reported here and here. […]
May 19th, 2010 at 8:13 pm
[…] the WebPIE inference engine? WebPIE is the first inference engine that can do inference over 100 billion(!) triples, and that can compute the full OWL Horst closure of Uniprot (1 billion triples) in just over 6 […]
May 22nd, 2010 at 11:19 pm
[…] more info on WebPIE, see the earlier blog entries here and here. A recent blog entry explains how to run WebPIE on your own datasets, using the Amazon […]