The unreasonable effectiveness of fake controversies

New, Improved *Semantic* Web! :-) Image by dullhunk via Flickr

(by Frank van Harmelen)

The Halevy, Norvig & Pereira paper on “The Unreasonable Effectiveness of Data” (published in IEEE Intelligent Systems, and posted on the Google Blog) has been much discussed in recent days.

I had my finger on the trigger for a response when I stumbled across Stefano Mazzocchi’s blog, which expressed my opinion of the piece exactly: Halevy (first author) makes his case by creating a controversy that isn’t really there. He pits a symbolic/structural approach to semantics against a statistical approach, and makes it seem as if the two are mutually exclusive. Obviously that isn’t the case: it’s great if statistical analysis of humongous datasets can unearth important relationships, and I can see no reason why the results of such work could not then be used in structural/symbolic approaches. This is (potentially) a mutually beneficial relationship, not an antagonistic one.

As Mazzocchi rightly points out, it’s rather ironic that the entire Google empire is built on… guess what… analysing a structural/symbolic network (namely the HREF links between webpages), which they then very successfully combine with all kinds of statistical measures. If this combination of structural and statistical approaches works for Web 1.0, why suddenly create this fake controversy when we talk about Web 3.0?

The most fruitful way forward would be to investigate how the ace work that Halevy and colleagues are doing on statistical methods with huge datasets can be combined with approaches that exploit the explicit structure available in so many large datasets.

And as an aside: caricaturing the Semantic Web as being about “tagging web-pages” is resorting to a rhetorical device known as “setting up a straw man”. Never a strong sign. (I’m sure Alon and colleagues are familiar with Linked Open Data (LOD), but there is no mention of it in their paper…)

To finish up, here are some quotes from Stefano Mazzocchi’s excellent blog entry:

What upset me about that paper is not how they say “oh sure, structure is great, but look overhere: there is a goldmine in all the sand” (which is something I fully resonate with) but they phrased it as a fight, deterministic vs. statistical, trying to convince people that adding structure it not the way to go, it’s basically a global waste of research resources

….

Google uses all sort of techniques, statistical and not and they are very good at mixing them together, but that’s not what you get from the paper. What you get is a undertone of criticism for those who believe that what’s needed is a lot more explicit structure

….

this confrontational undertone is coming across at best as hypocrite and at worst as toxic, especially when coming from the research heads of an entity that so much benefited from non-statistical amplification of minor distributed increases in data structure.

Amen to that.

6 Responses to “The unreasonable effectiveness of fake controversies”

  1. Ivan Herman Says:

    Amen indeed.

    There is another fallacy in the paper, and a well-known one, namely equating the Semantic Web with (indeed, possibly expensive) ontologies. If I use Calais or Zemanta, which add URIs as tags drawn from DBPedia (or similar), I do Semantic Web, and I do not necessarily use complex ontologies. Ontologies are for the Semantic Web (when needed), but the Semantic Web is not for ontologies :-)

    I can fully understand that Google wants to exploit the formidable amount of data they harvest, and that therefore statistical approaches play an important role in their research activities. No problem with that. But generalizations and, primarily, unnecessary controversies are not what the Web community at large needs…

  2. Andraz Tori Says:

    Hmmm,

    I think the critique is at least partly valid. The most vocal proponents of the semantic web are not doing much to apply really interesting (statistical) methods to it.

    And the vast majority of (government) research resources go into structured approaches, even though they keep hitting the wall.

    People involved in LOD are great guys. But I think LOD has yet to deliver in terms of real utility. Can you name companies doing useful stuff with statistical methods over LOD?

    So the challenge is the following: the data is out there (thanks to the enormous efforts of semweb proponents). Now we need to create really useful services on top of it, and there’s not enough effort applied at the moment.

    bye
    Andraz Tori, Zemanta

  3. Arjen P. de Vries Says:

    Still wonder how the same paper can trigger such diverse reactions! I re-read it last night, and I still like it.

    For a position paper, it seems fine to me to put things a little provocatively; and the quote from the TBL SemWeb paper (“without needing artificial intelligence”) is correct, isn’t it?!

    They appropriately distinguish between the syntax & tools of the SW, and the next step of semantic interpretation, and just raise the question of how much the additional syntax obtained (syntax that hints at semantics) would help data analysis in comparison to not having that syntax. (At least, that is how I interpret it.) I think that is a valid question, and yet to be answered.

    Greetings

  4. The Unreasonable Effectiveness of Google « O’Really? Says:

    […] van Harmelen (2009) The Unreasonable Effectiveness of Fake Controversies LarKC […]

  5. Semantik Web Öldü mü? (Hayır, sadece garip kokuyor) | FZ Blogs Says:

    […] summarizes a point of view very nicely. A few critical observations on the topic can also be found in this blog entry and its comments […]

  6. » Hello World! LISyS Says:

    […] has only just begun (see e.g. the postings by Stefano Mazzocchi and Frank van Harmelen) and is expected […]