LDSR Passes the Modigliani Test for Semantic Web
A week ago Richard MacManus published a post on ReadWriteWeb, “The Modigliani Test: The Semantic Web’s Tipping Point”. He essentially argues that linked data are not sufficiently linked. He wrote that “The tipping point for the long-awaited Semantic Web may be when you can query a set of data about someone not too famous, and get a long list of structured results in return”. He then defined the “Modigliani Test” for the Semantic Web: he wants to be able to ask a search engine “tell me the locations of all the original paintings of Modigliani” and get back a long list of results.
I liked the post a lot because it spots an important problem and provides a clear example. And because I like Modigliani’s paintings… the nude ladies, in particular.
So I tried to comment on Richard’s post, but my comment did not appear publicly for three days, so I decided to post it here. Now, to the subject.
Indeed, linked data are hard to query and use today. In a way, they are only semi-structured: getting reasonable answers to structured queries from LOD quite often requires a lot of effort to analyse and post-process the data.
I don’t believe there is a way to fix this problem for the entire linked data web. Still, we are developing an approach called reason-able views to the web of data: the idea is to collect, clean up, and index multiple datasets in a single semantic repository, in a way that allows them to be queried and used reliably. One can find the motivation here. We have created a reason-able view called LDSR, which includes several of the central LOD datasets: DBPedia, Freebase, Geonames, UMBEL, Wordnet, and a few others. Another one, called LinkedLifeData (LLD), includes 20+ datasets related to the life sciences. Both LDSR and LLD were developed in the LarKC project as test cases for large-scale reasoning (reasoning with billions of linked data statements is fun, but that is a different subject).
Before getting to Modigliani, I will start with Richard’s query about Bosch paintings, to provide some background on LDSR. The query can be put in a slightly more readable form as follows:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?pl ?ml ?img ?l
WHERE {
?p skos:subject <http://dbpedia.org/resource/Category:Hieronymus_Bosch_paintings>.
?p foaf:depiction ?img ; rdfs:label ?pl . FILTER ( lang(?pl) = "en" ) .
?p dbpedia2:museum [ rdfs:label ?ml ]. FILTER ( lang(?ml) = "en" ) .
?p dbpedia2:city ?l .
}
If you execute this query at http://ldsr.ontotext.com/sparql, you will get the same results as in DBPedia, because the latter is part of LDSR. In fact, there are some duplicates, because LDSR contains multiple English labels for the big museums. To deal with the problem of multiple labels, we have introduced “preferred labels” by means of postprocessing. Using them, the query becomes a bit simpler and returns more readable results:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbp-prop: <http://dbpedia.org/property/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ot: <http://www.ontotext.com/>
SELECT DISTINCT ?pl ?ml ?img ?cl ?prr
WHERE {
?p skos:subject <http://dbpedia.org/resource/Category:Hieronymus_Bosch_paintings> ;
foaf:depiction ?img ; ot:preferredLabel ?pl ;
dbp-prop:museum [ ot:preferredLabel ?ml ] ;
dbp-prop:city [ ot:preferredLabel ?cl ] ;
ot:hasPageRank ?prr .
} ORDER BY DESC(?prr)
As one can see, we can also order the results by RDF Rank, a PageRank-like measure of the importance of each node in the RDF graph of LDSR. We believe that readable queries and results, as well as relevance ranking, are very important when dealing with the web of data.
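Since RDF Rank is computed for every node in the graph, not only for the paintings, the same mechanism can in principle rank other entities as well. The following is a sketch, assuming that museum nodes carry ot:hasPageRank just like the paintings in the query above; it lists the museums holding Bosch paintings, most important first. The LIMIT of 10 is an arbitrary choice for illustration.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dbp-prop: <http://dbpedia.org/property/>
PREFIX ot: <http://www.ontotext.com/>
# Museums holding Bosch paintings, ordered by the museum's own RDF Rank
# (assumes ot:hasPageRank is attached to museum nodes too)
SELECT DISTINCT ?ml ?mrr
WHERE {
?p skos:subject <http://dbpedia.org/resource/Category:Hieronymus_Bosch_paintings> ;
dbp-prop:museum ?m .
?m ot:preferredLabel ?ml ;
ot:hasPageRank ?mrr .
}
ORDER BY DESC(?mrr)
LIMIT 10
```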
And now, to the point: Modigliani. Here is the query that solves the test. When executed against LDSR, it returns the cities where original paintings by Modigliani can be seen:
PREFIX fb: <http://rdf.freebase.com/ns/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-prop: <http://dbpedia.org/property/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ot: <http://www.ontotext.com/>
SELECT DISTINCT ?painting_l ?owner_l ?city_fb_con ?city_db_loc ?city_db_cit
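WHERE {
# Modigliani's paintings and their owners (Freebase)
?p fb:visual_art.artwork.artist dbpedia:Amedeo_Modigliani ;
fb:visual_art.artwork.owners [ fb:visual_art.artwork_owner_relationship.owner ?ow ] ;
ot:preferredLabel ?painting_l .
?ow ot:preferredLabel ?owner_l .
# Location, pattern 1 (Freebase): what the owner is contained by
OPTIONAL { ?ow fb:location.location.containedby [ ot:preferredLabel ?city_fb_con ] } .
# Location, pattern 2 (DBPedia + UMBEL): dbp-prop:location values typed as cities
OPTIONAL { ?ow dbp-prop:location ?loc . ?loc rdf:type umbel-sc:City ; ot:preferredLabel ?city_db_loc }
# Location, pattern 3 (DBPedia ontology): the owner's city
OPTIONAL { ?ow dbp-ont:city [ ot:preferredLabel ?city_db_cit ] }
}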
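The three OPTIONAL patterns in this query return the location in three separate columns, many of which stay empty. As a sketch, assuming an endpoint that supports SPARQL 1.1 (BIND and COALESCE are not part of SPARQL 1.0, and it is not known whether the public LDSR endpoint accepts them), the three columns can be folded into a single ?city column. The order of preference inside COALESCE is an arbitrary choice made for illustration.

```sparql
PREFIX fb: <http://rdf.freebase.com/ns/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-prop: <http://dbpedia.org/property/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
PREFIX ot: <http://www.ontotext.com/>
SELECT DISTINCT ?painting_l ?owner_l ?city
WHERE {
?p fb:visual_art.artwork.artist dbpedia:Amedeo_Modigliani ;
fb:visual_art.artwork.owners [ fb:visual_art.artwork_owner_relationship.owner ?ow ] ;
ot:preferredLabel ?painting_l .
?ow ot:preferredLabel ?owner_l .
OPTIONAL { ?ow fb:location.location.containedby [ ot:preferredLabel ?city_fb_con ] }
OPTIONAL { ?ow dbp-prop:location ?loc . ?loc a umbel-sc:City ; ot:preferredLabel ?city_db_loc }
OPTIONAL { ?ow dbp-ont:city [ ot:preferredLabel ?city_db_cit ] }
# Keep the first location that is bound (illustrative preference order)
BIND (COALESCE(?city_db_cit, ?city_db_loc, ?city_fb_con) AS ?city)
}
ORDER BY ?painting_l
```

If none of the three patterns matches, COALESCE raises an error and BIND leaves ?city unbound, so such paintings still appear in the result, just without a city.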
A few comments:
- most of the credit here should go to the guys from Metaweb: without Freebase it would have been impossible to get the paintings of Modigliani;
- technically, the test is not solved, because these are the locations of a few, but not *all*, Modigliani paintings, and not even a really long list of them;
- getting the locations was tough: as you can see, we needed three different patterns, from DBPedia and Freebase; UMBEL was necessary to keep only those values of dbp-prop:location that are cities;
- the query combines information from three datasets: DBPedia, Freebase and UMBEL. In this specific case, a federated SPARQL query, where each of the datasets is served from a different SPARQL endpoint, is computationally feasible, because the constraints about the paintings from Freebase return a small number of results. Note, however, that most queries spanning multiple datasets will not have this nice property; evaluating them would require a setup like LDSR, where all the datasets are loaded in a single repository;
- some of the data is noisy: Manhattan is linked through dbp-ont:city to some of the paintings… all the beauties of linked data are on display;
- there are no ranks; they could be added as in the Bosch query above, but that is not useful, because the ranks of all Modigliani paintings from FB appear to be zero, so I preferred to keep the query simpler;
- it took me more than an hour to compose this query.
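The federated alternative discussed in the comments above can be sketched with the SERVICE keyword of SPARQL 1.1 Federated Query. This is a hypothetical setup, not how LDSR works: it assumes the query is sent to a hypothetical Freebase SPARQL endpoint, that owners there are already identified by DBPedia URIs (as they are inside LDSR), and it uses rdfs:label with a language filter, because ot:preferredLabel exists only in LDSR.

```sparql
PREFIX fb: <http://rdf.freebase.com/ns/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?p ?ow ?city
WHERE {
# Selective part, evaluated at the (hypothetical) Freebase endpoint:
# only a handful of Modigliani paintings and their owners come back
?p fb:visual_art.artwork.artist dbpedia:Amedeo_Modigliani ;
fb:visual_art.artwork.owners [ fb:visual_art.artwork_owner_relationship.owner ?ow ] .
# Remote part: the owner's city is looked up at DBPedia's public endpoint
# (assumes ?ow is a DBPedia URI, as inside LDSR)
SERVICE <http://dbpedia.org/sparql> {
?ow dbp-ont:city ?c .
?c rdfs:label ?city .
FILTER ( lang(?city) = "en" )
}
}
```

Whether this stays cheap depends on the engine: a bind join that ships the few ?ow values to DBPedia keeps the remote work small, while an engine that evaluates the SERVICE block on its own would pull every dbp-ont:city triple from DBPedia before joining.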
Finally, a few words about LDSR. The public version currently exposes the older versions of DBPedia and Freebase; we will update it with the spring versions soon. Also, LDSR is work in progress: we are constantly developing the UI, searching for handy metaphors and retrieval modalities for linked data. So we haven’t spent much time on polishing and beautification; as a result, it looks as geeky as it does. Still, we are trying to maintain high availability with respect to exploration and SPARQL queries.
In summary, LDSR can be seen as a search engine for part of the linked data web. It partly solves the Modigliani Test, although it cannot take natural language queries yet :-). There is still a lot of work to be done, because we cannot expect wide usage of and interest in the Semantic Web if writing such a query takes more than an hour and a lot of technical knowledge. The good news is that there is progress: half a year ago, making such a query would have taken a day, and a year ago it would have been impossible. I think I see the light at the end of the tunnel. Or am I hallucinating after an hour of crafting SPARQL queries, and this is just a funeral candle? Is there anybody out there? Please comment!
Atanas Kiryakov
Ontotext
April 26th, 2010 at 11:11 am
[…] is the Executive Director of Bulgarian Semantic Technology company Ontotext AD, did a great job of explaining his methodology and noting the issues he […]
April 26th, 2010 at 12:27 pm
Richard MacManus, the author of the original post at ReadWriteWeb (also founder and editor of RWW), made a second post commenting on my reply here:
http://www.readwriteweb.com/archives/the_modigliani_test_for_linked_data.php