
LarKC Chinese Forum for Chinese Semantic Web Developers and Users

August 3rd, 2010

by Yi Zeng

Following the release of the LarKC Chinese Website (http://cn.larkc.eu/) and several Chinese documents related to LarKC (including a translated user manual, introduction paper, slides, etc.), the LarKC project now provides a LarKC Chinese Forum (http://www.w3china.org/larkc) for Chinese Semantic Web researchers, developers and users.

The forum is hosted on the W3China website (the most influential Chinese WWW developer website, devoted to promoting W3C-related technologies). We thank W3China for providing the special forum on their website. LarKC members are available to answer LarKC-related questions, and up-to-date LarKC news and documents will be shared through this forum.

In the meantime, LarKC is going to hold the 4th early adopters tutorial in Beijing on November 13th, 2010. We will collect questions and requirements through the forum and discuss them during the tutorial.

LarKC is very proud to be connected with Chinese WWW researchers, developers and practitioners. We are looking forward to meeting you on the LarKC Chinese Forum!

Will China Become a Semantic Web Superpower?

July 27th, 2010

by Zhisheng

The Chinese government has decided to make a big move towards the “Internet of Things”. That may make China a Semantic Web superpower in the coming few years. Recently

China’s ‘Internet Of Things’ To Become Semantic Web Superpower?

The LarKC Consortium is going to have a project meeting in Beijing in November 2010. During the meeting, LarKC will have the 4th early adopters workshop in China. That is considered to be an important dissemination activity in China for LarKC.
We expect the workshop to attract the attention of researchers and developers from Chinese universities and industry, and perhaps some officials from the Chinese government. Urban computing, with its stream processing and reasoning in the LarKC WP6 case study, is considered to provide a strong connection between Semantic Web technology and the Internet of Things.

LarKC Platform v1.1 released

July 6th, 2010

We are glad to announce that the LarKC Platform Release v1.1 is now available in our repository on http://larkc.sourceforge.net.

The redistributable package can be downloaded from our collaborative development environment, LarKC@SourceForge at:

http://sourceforge.net/projects/larkc/files/Release-1.1/larkc-release-1.1.zip/download (OS independent)

The source code belonging to the release can be checked out from SVN at:

https://larkc.svn.sourceforge.net/svnroot/larkc/tags/Release-1.1

The complete (updated) manual for both users and developers can be found at:

http://sourceforge.net/projects/larkc/files/Release-1.1/LarKC_PlatformManual_V1_1.pdf

If you need any support or want to give us any feedback, don’t hesitate to contact us at:

  • larkc-user-support@lists.sourceforge.net (if you want to use LarKC)
  • larkc-dev-support@lists.sourceforge.net (if you are, or want to become, a LarKC developer)

If you are interested in discussions around the LarKC Platform, don’t hesitate to participate in our forums at:

https://sourceforge.net/projects/larkc/forums/

We hope you enjoy LarKC and we are looking forward to your feedback!

The LarKC Platform development team

Twitter plans to support annotations. Could be an interesting new stream of structured data

June 26th, 2010

(By Jose Quesada) Twitter plans to support annotations. Since Facebook started supporting RDFa with their Open Graph protocol, it was just a matter of time before Twitter followed. What are annotations? From Gigaom:

In a nutshell, Annotations would allow developers (and Twitter itself, of course) to add additional information to a tweet — such as a string of text, a URL, a location tag or bits of data — without affecting its character count. In other words, such information would be metadata about the tweet or the user who posted it, and would be carried along as an additional payload as it traveled through the Twitter network. Apps and services could then collect that information and filter it or make sense of it.

It isn’t clear exactly how Annotations will be implemented, but that doesn’t matter, as long as they are published in some form. This is a gigantic nod towards linked data by one of the largest internet companies (others, such as Google and Facebook, already support RDFa).

In some ways, Annotations are like Facebook’s Open Graph protocol, which also adds metadata to the behavior of users. But they could also be Activity Streams, an extension to the Atom format to represent social objects (see slide 6).

There seems to be a lot of interest in the real-time web combined with linked data. Alex Passant won the scripting challenge at ESWC2010 with sparqlPuSH, which uses XMPP. And of course there’s C-SPARQL.

What this means is that now the three largest social web companies (Google, Twitter, Facebook) will all support linked data formats. It is hard to overstate the importance of this. As Bernard Lunn (excellent coverage) puts it:

When gorillas compete, everybody else wins. The logic of the market is increasing support for RDFa by Google, Facebook, Twitter and therefore everybody else.

That is a win for open standards and that is a win for all of us, who can publish RDFa and search RDFa and build tools that make publishing and searching RDFa easier.

Real-time Semantic Web with Twitter Annotations

Stream Reasoning : Where We Got So Far

June 2nd, 2010

The idea of Stream Reasoning originated at Politecnico di Milano in 2007, when Stefano Ceri and I were helping to write the LarKC project proposal. In the last three years, a lot of investigation has been done.

Davide Barbieri, Daniele Braga, Stefano Ceri, Michael Grossniklaus and I defined the notion of an RDF stream, together with an extension of SPARQL for continuous querying of RDF streams (i.e., our C-SPARQL). Most recently, we also investigated techniques for incremental reasoning on RDF streams. I was invited to give a keynote at NeFoRS 2010 and I thought I should use this opportunity to report how far Stream Reasoning research has come.
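
For readers who have not met continuous queries before, here is a minimal Python sketch of the general idea: an RDF stream is treated as a sequence of timestamped triples, and a query is re-evaluated over a sliding time window each time a triple arrives. This is only an illustration under my own assumptions, not C-SPARQL syntax or semantics; the `SlidingWindow` class, the `ex:` names and the 10-second width are all made up.

```python
from collections import deque


class SlidingWindow:
    """Keeps only the timestamped triples from the last `width` seconds."""

    def __init__(self, width):
        self.width = width
        self.items = deque()

    def push(self, timestamp, triple):
        self.items.append((timestamp, triple))
        # Evict triples that fell out of the window.
        # Assumes timestamps arrive in non-decreasing order.
        while self.items and self.items[0][0] <= timestamp - self.width:
            self.items.popleft()

    def count(self, predicate):
        # The "continuous query": count the triples in the window
        # that use the given predicate.
        return sum(1 for _, (_, p, _) in self.items if p == predicate)


# Hypothetical stream of (timestamp in seconds, triple) pairs.
stream = [
    (0, ("ex:car1", "ex:entered", "ex:street_a")),
    (5, ("ex:car2", "ex:entered", "ex:street_a")),
    (12, ("ex:car3", "ex:entered", "ex:street_a")),
    (21, ("ex:car4", "ex:entered", "ex:street_a")),
]

window = SlidingWindow(width=10)
counts = []
for ts, triple in stream:
    window.push(ts, triple)
    counts.append(window.count("ex:entered"))
print(counts)
```

The answer changes as old triples expire, which is the key difference from querying a static RDF graph.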

Click on the image hereafter to see the slides!


Thanks to Frank van Harmelen and Heiner Stuckenschmidt for helping to spread the Stream Reasoning concept.
There’s much more to come.

Keep an eye on http://streamreasoning.org

Linked Life Data 0.4.2 released

May 28th, 2010

The Linked Life Data (LLD) development team is pleased to announce that the 0.4.2 release is now online at  http://linkedlifedata.com

LLD is a semantic data integration platform for the biomedical domain that interlinks more than 20 popular biomedical data sources. The current release includes 4,193,400,044 statements, which interconnect 582,691,283 RDF resources.

Changes in the 0.4.2 release: 

  • Added LarKC Carcinogenesis research datasets
  • Added ChEBI dataset
  • Added Literature-derived Human Gene-Disease Network (LHGDN) dataset 
  • Integrated RelFinder application (still an alpha version):  visually trace paths in the graph of data http://linkedlifedata.com/relfinder 
  • Fixed multiple LLD data rendering issues 
  • Closed major security vulnerabilities (please request a username/password for all extended functionality beyond the SPARQL endpoint interface)


WebPIE wins the IEEE SCALE challenge!

May 22nd, 2010

100 billion-triple reasoning also impresses the scalable computing crowd

The WebPIE team has been announced as the winners of the 3rd IEEE International Scalable Computing Challenge (SCALE 2010)!

The objective of the Scalable Computing Challenge, sponsored by the IEEE Computer Society Technical Committee on Scalable Computing (TCSC), is “to highlight and showcase real-world problem solving using computing that scales”.  SCALE 2010 was organised as part of CCGRID 2010, the 10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.

The competition involved submitting a paper, giving a presentation, doing a demo, and showing the thing at the demo/poster market.

Together with Niels Drost (from the VUA’s High Performance Distributed Computing group led by Prof. Henri Bal), LarKC team member Jacopo Urbani presented WebPIE, the very large scale Hadoop-based inference engine, running on 64 machines of the DAS-3 compute cluster.  A live visualisation showed the RDF graph growing before the eyes of the audience as WebPIE was racing through its inferences.

WebPIE SCALE demo screen shot

For LarKC, winning this prize in a strong field of international and top-ranking finalists is all the more important because it shows that our results are not only relevant to the Semantic Web community, but also that people who are native to scalable computing appreciate the results from the LarKC project.

For more info on WebPIE, see the earlier blog entries here and here.  A recent blog entry explains how to run WebPIE on your own datasets, using the Amazon Elastic Compute Cloud service.


Billion-triple reasoning? Now for everybody with a credit-card!

May 19th, 2010

Remember the WebPIE inference engine? WebPIE is the first inference engine that can do inference over 100 billion (!) triples, and it can compute the full OWL Horst closure of Uniprot (1 billion triples) in just over 6 hours.

Of course, the catch was that you needed to run Hadoop on a 32 machine compute cluster to do this at home….

But now, the LarKC team behind the WebPIE engine (VU Amsterdam PhD student Jacopo Urbani, and postdoc Spyros Kotoulas) have made the entire WebPIE infrastructure available on Amazon’s Elastic Compute Cloud. The AWS image has the reasoner preinstalled so that you can perform inferences on your own datasets with almost no effort. All you need is a credit card to pay the bill!

The homepage of WebPIE contains detailed instructions on how to deploy your own Amazon reasoner on your own RDF graph (up to OWL Horst expressivity).

At the current pricing of Amazon, you can perform a whole lot of inferencing for a couple of hundred bucks (and often much less). LarKC hopes that this will bring very large scale inferencing within reach of anybody who’s interested.
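
If "computing the closure" sounds abstract, the following Python sketch may help. It applies just two RDFS rules (subclass transitivity and type inheritance), a small subset of what OWL Horst covers, and repeats them until nothing new is derived. It is a naive single-machine illustration under my own assumptions, not how WebPIE distributes the work over Hadoop; the `closure` function and the `ex:` names are made up.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def closure(triples):
    """Apply two RDFS rules until no new triple is derived (a fixpoint)."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # Only pairs of the form (s p o), (o subClassOf o2) matter.
                if o != s2 or p2 != SUBCLASS:
                    continue
                if p == SUBCLASS:
                    # subClassOf is transitive.
                    new.add((s, SUBCLASS, o2))
                elif p == TYPE:
                    # Instances of a class are instances of its superclasses.
                    new.add((s, TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


# Hypothetical input graph.
facts = {
    ("ex:protein1", TYPE, "ex:Enzyme"),
    ("ex:Enzyme", SUBCLASS, "ex:Protein"),
    ("ex:Protein", SUBCLASS, "ex:Molecule"),
}

closed = closure(facts)
for triple in sorted(closed - facts):
    print(triple)
```

The nested loop compares every triple with every other one on each pass, which is hopeless at billions of triples; that cost is the reason a distributed engine is needed at this scale.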


LarKC Platform Release V1.0 available

April 26th, 2010

We are glad to announce that the LarKC Platform Release V1.0 is now available in our repository, http://larkc.sourceforge.net. 

The redistributable package can be downloaded from our collaborative development environment, LarKC@SourceForge, at https://sourceforge.net/projects/larkc/files/

A complete manual for both users and developers can be found at https://sourceforge.net/projects/larkc/files/Release-1.0/LarKC_PlatformManual_V1_0.pdf

If you need any support or want to give us any feedback, don’t hesitate to contact us at:

or to participate in our forums at:

We hope you enjoy LarKC and we are looking forward to your feedback! 

The LarKC Platform development team 

LDSR Passes the Modigliani Test for Semantic Web

April 23rd, 2010

A week ago Richard MacManus published a post on ReadWriteWeb, “The Modigliani Test: The Semantic Web’s Tipping Point”. He essentially argues that linked data are not sufficiently linked. He wrote that “The tipping point for the long-awaited Semantic Web may be when you can query a set of data about someone not too famous, and get a long list of structured results in return”. Then he defined the “Modigliani Test” for the Semantic Web: he wants to be able to ask a search engine “tell me the locations of all the original paintings of Modigliani” and get back a large list of results.

I liked the post a lot because it spots an important problem and provides a clear example. And because I like Modigliani’s paintings … the nude ladies, in particular :-) So, I tried to comment on Richard’s post, but my comment did not appear publicly for three days, so I decided to post it here. Now let’s go to the subject.

Indeed, linked data are hard to query and use today. In a way, they are semi-structured, because getting useful information from LOD quite often requires a lot of effort to analyse and post-process them in order to get reasonable answers to structured queries.

I don’t believe there is a way to get this problem fixed for the entire linked data web. Still, we are developing an approach called reason-able views to the web of data - the idea of collecting, cleaning up, and indexing multiple datasets into a single semantic repository in a way that allows them to be queried and used in a reliable fashion. One can find the motivation here.

We have created a reason-able view called LDSR, which includes several of the central LOD datasets: DBPedia, Freebase, Geonames, UMBEL, Wordnet, and a few others. Another one, called LinkedLifeData, includes 20+ datasets related to life sciences. Both LDSR and LLD were developed in the LarKC project, as test cases for large-scale reasoning (reasoning with billions of linked data statements is fun, but it is a different subject).

Before going to Modigliani, I will start with Richard’s query about Bosch paintings in order to provide some background about LDSR. The query can be put in a slightly more readable form as follows:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?pl ?ml ?img ?l
WHERE {
  ?p skos:subject  <http://dbpedia.org/resource/Category:Hieronymus_Bosch_paintings>.
  ?p foaf:depiction ?img ;  rdfs:label ?pl . FILTER ( lang(?pl) = "en" ) .
  ?p dbpedia2:museum [ rdfs:label ?ml ]. FILTER ( lang(?ml) = "en" ) .
  ?p dbpedia2:city ?l .
}

One can execute this query at http://ldsr.ontotext.com/sparql and get the same results as in DBPedia, because the latter is part of LDSR. In fact, there are some duplicates, because in LDSR there are multiple English labels for the big museums. To deal with the problem of multiple labels, we have introduced “preferred labels” by means of postprocessing. Using them, the query can look a bit simpler and return more readable results:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbp-prop: <http://dbpedia.org/property/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ot: <http://www.ontotext.com/>
SELECT DISTINCT ?pl ?ml ?img ?cl ?prr
WHERE {
  ?p skos:subject <http://dbpedia.org/resource/Category:Hieronymus_Bosch_paintings> ;
     foaf:depiction ?img ;  ot:preferredLabel ?pl ;
     dbp-prop:museum [ ot:preferredLabel ?ml ] ;
     dbp-prop:city [ ot:preferredLabel ?cl ] ;
     ot:hasPageRank ?prr .
} ORDER BY DESC(?prr)

As one can see, we can also order the results by RDF Rank - a PageRank-like measure for the importance of each node in the RDF graph of LDSR. We believe that query and result readability and relevance ranking are very important when dealing with the web of data.
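
To give a feeling for what a PageRank-like measure over an RDF graph means, here is a minimal Python sketch. It treats every triple as a directed edge from subject to object and runs plain power iteration. This is an assumption made for illustration only: the actual RDF Rank computation is not described in this post and may well differ, and the `rank_nodes` function, the damping value and the `ex:` data are made up.

```python
def rank_nodes(triples, damping=0.85, iterations=50):
    """Power-iteration PageRank over (subject, predicate, object) triples."""
    nodes = sorted({s for s, _, o in triples} | {o for s, _, o in triples})
    out_links = {node: [] for node in nodes}
    for s, _, o in triples:
        out_links[s].append(o)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            # Nodes without outgoing edges spread their rank over all nodes.
            targets = out_links[node] or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: two paintings point to one museum, one to another.
triples = [
    ("ex:painting1", "ex:museum", "ex:museum_a"),
    ("ex:painting2", "ex:museum", "ex:museum_a"),
    ("ex:painting3", "ex:museum", "ex:museum_b"),
    ("ex:museum_a", "ex:city", "ex:city_x"),
    ("ex:museum_b", "ex:city", "ex:city_y"),
]

ranks = rank_nodes(triples)
for node, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{node}\t{score:.4f}")
```

In this toy graph the museum referenced by two paintings ends up ranked above the one referenced by a single painting, which is the kind of signal a query can then use in ORDER BY.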

And now, really getting to the point with Modigliani. Here follows the query that solves the test. When executed against LDSR, it returns cities where original paintings of Modigliani can be seen:

PREFIX fb: <http://rdf.freebase.com/ns/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-prop: <http://dbpedia.org/property/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ot: <http://www.ontotext.com/>
SELECT DISTINCT ?painting_l ?owner_l ?city_fb_con ?city_db_loc ?city_db_cit
WHERE {
  ?p fb:visual_art.artwork.artist dbpedia:Amedeo_Modigliani ;
     fb:visual_art.artwork.owners [ fb:visual_art.artwork_owner_relationship.owner ?ow ] ;
     ot:preferredLabel ?painting_l.
  ?ow ot:preferredLabel ?owner_l .
  OPTIONAL { ?ow fb:location.location.containedby [ ot:preferredLabel ?city_fb_con ] } .
  OPTIONAL { ?ow dbp-prop:location ?loc. ?loc rdf:type umbel-sc:City ; ot:preferredLabel ?city_db_loc }
  OPTIONAL { ?ow dbp-ont:city [ ot:preferredLabel ?city_db_cit ] }
}

A few comments:

  • the major credits here should go to the guys from Metaweb - without Freebase it would have been impossible to get the paintings of Modigliani;
  • technically, the test is not solved, because these are the locations of a few, but not *all*, Modigliani paintings, and not even a really long list of them;
  • getting the locations was tough … as you see, we needed to take them through three different patterns, from DBPedia and Freebase; UMBEL was necessary to filter out only those values of dbp-prop:location which are cities;
  • the query combines information from three datasets: DBPedia, Freebase and UMBEL. In this specific case, a federated SPARQL query, where each of the datasets is served from a different SPARQL end-point, is computationally feasible, because the constraints about the paintings from Freebase return a small number of results. Note, however, that most queries which span data from multiple datasets will not have this nice property; thus, evaluating them would require a setup like LDSR, where all the datasets are loaded in a single repository;
  • some of the data is noisy: Manhattan is linked through dbp-ont:city to some of the paintings … all the beauties of the linked data are presented :-)
  • there are no ranks; they can be linked like in the query for Bosch above, but it is not useful, because the ranks of all Modigliani paintings from FB appear to be zero :-( So, I preferred to keep the query simpler :-)
  • it took me more than one hour to compose this query :-(
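
The point about federated queries can be sketched in a few lines of Python. The example below is a toy bind-join under my own assumptions: two in-memory structures stand in for separate SPARQL end-points, the selective pattern is evaluated first, and one bound lookup is sent to the second source per intermediate result. The `federated_join` function and all `ex:` data are hypothetical and are not taken from LDSR.

```python
# In-memory stand-ins for two separate SPARQL end-points (made-up data).
PAINTING_OWNERS = [  # selective pattern: (painting, owner)
    ("ex:painting1", "ex:museum_a"),
    ("ex:painting2", "ex:museum_b"),
]
OWNER_CITIES = {  # large dataset: owner -> city
    "ex:museum_a": "ex:city_x",
    "ex:museum_b": "ex:city_y",
    "ex:museum_c": "ex:city_z",
}


def federated_join(selective_rows, lookup):
    """Bind-join: one bound lookup on the second source per first-source row."""
    requests = 0
    results = []
    for painting, owner in selective_rows:
        requests += 1  # stands for one remote call to the second end-point
        city = lookup.get(owner)
        if city is not None:
            results.append((painting, owner, city))
    return results, requests


rows, requests = federated_join(PAINTING_OWNERS, OWNER_CITIES)
print(rows)
print("remote calls:", requests)
```

The number of remote calls grows with the number of intermediate results, so this strategy only works when the first pattern is selective; otherwise loading everything into a single repository is the practical option.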

Finally, a few words about LDSR. The public version currently exposes the older versions of DBPedia and Freebase. We will update it with the spring versions soon. Also, LDSR is a work in progress - we constantly develop the UI, searching for handy metaphors and retrieval modalities for linked data. So, we didn’t spend much time on polishing and beautification - as a result it looks as geeky as it does. Still, we are trying to maintain high availability with respect to exploration and SPARQL queries.

In summary, LDSR can be seen as a search engine for part of the linked data web. It partly solves The Modigliani Test, although it is not able to take natural language queries yet :-). There is still a lot of work to be done, because we cannot expect wide usage of and interest in the Semantic Web if writing such a query takes more than an hour and a lot of technical knowledge. The good news is that we are making progress - half a year ago, making such a query would have required one day. And a year ago it would have been impossible.  I think I see the light at the end of the tunnel. Or am I hallucinating after one hour of crafting SPARQL queries, and this is just a funeral candle?  Is there anybody out there?  Please, comment!

Atanas Kiryakov

Ontotext