Posts Tagged ‘Google’

The unreasonable effectiveness of fake controversies

Wednesday, April 1st, 2009

(by Frank van Harmelen)

The Halevy, Norvig & Pereira paper on “The Unreasonable Effectiveness of Data” (published in IEEE Intelligent Systems, and posted on the Google Blog) has been much discussed in recent days.

I had my finger on the trigger for a response when I stumbled across Stefano Mazzocchi’s blog, which phrased my opinion about the piece exactly: Halevy (first author) makes his case by creating a controversy that isn’t really there. He pits a symbolic/structural approach to semantics against a statistical approach, and makes it seem as if the two are entirely mutually exclusive. Obviously that isn’t the case: it’s great if statistical analysis of humongous datasets can unearth important relationships, and I can see no reason why the results of such work could not then be used in structural/symbolic approaches. This is (potentially) a mutually beneficial relationship, not an antagonistic one.

As Mazzocchi rightly points out, it’s rather ironic that the entire Google empire is built on… guess what… analysing a structural/symbolic network (namely the HREF links between webpages), which they then very successfully combine with all kinds of statistical measures. If this combination of structural and statistical approaches works for Web 1.0, why suddenly create this fake controversy when we talk about Web 3.0?
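To make the point concrete, here is a minimal sketch of how a structural signal (a PageRank-style score computed from the links between pages) can be combined with a statistical one (how often the query terms occur in a page). It is an illustration only, not a description of how Google actually ranks pages: the toy link graph, the page texts, the function names and the choice to simply multiply the two scores are all assumptions made for this example.

```python
from collections import Counter

# Toy data (hypothetical): which page links to which, and the text of each page.
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
TEXTS = {
    "a": "semantic web data",
    "b": "semantic semantic web",
    "c": "semantic web links and structure",
    "d": "semantic semantic semantic",  # keyword-stuffed page that nobody links to
}


def pagerank(links, damping=0.85, iterations=50):
    """Structural signal: power-iteration PageRank over {page: [outlinks]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            targets = outlinks or pages  # a page without outlinks spreads its rank evenly
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def term_score(text, query):
    """Statistical signal: relative frequency of the query terms in the text."""
    words = text.lower().split()
    counts = Counter(words)
    return sum(counts[t] for t in query.lower().split()) / max(len(words), 1)


def combined_ranking(links, texts, query):
    """Rank pages by the product of both signals (an assumed, deliberately simple combination)."""
    structural = pagerank(links)
    scores = {p: structural[p] * term_score(texts[p], query) for p in links}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print(combined_ranking(LINKS, TEXTS, "semantic"))
```

On term frequency alone, the keyword-stuffed page "d" would come out on top for the query "semantic"; once the link structure is taken into account it drops to the bottom, because no other page links to it.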

The most fruitful way forward would be to investigate how the ace work that Halevy and colleagues are doing on statistical methods with huge datasets can be combined with approaches that exploit the explicit structure that is available in so many large datasets.

And as an aside: caricaturing the Semantic Web as being about “tagging web-pages” amounts to the rhetorical device known as “setting up a straw man”. Never a strong sign. I’m sure Alon and colleagues are familiar with LOD (Linked Open Data), but there is no mention of it in their paper…

To finish up, here are some quotes from Stefano Mazzocchi’s excellent blog entry:

What upset me about that paper is not how they say “oh sure, structure is great, but look overhere: there is a goldmine in all the sand” (which is something I fully resonate with) but they phrased it as a fight, deterministic vs. statistical, trying to convince people that adding structure it not the way to go, it’s basically a global waste of research resources

….

Google uses all sort of techniques, statistical and not and they are very good at mixing them together, but that’s not what you get from the paper. What you get is a undertone of criticism for those who believe that what’s needed is a lot more explicit structure

….

this confrontational undertone is coming across at best as hypocrite and at worst as toxic, especially when coming from the research heads of an entity that so much benefited from non-statistical amplification of minor distributed increases in data structure.

Amen to that.

Google’s use of semantics exaggerated

Wednesday, March 25th, 2009

(by Frank van Harmelen)

Some recent reports have been claiming that “Google Rolls out Semantic Search Capabilities”, referring to Google’s newly introduced “related searches” feature. However, the original Google blog entry where this feature was introduced doesn’t discuss the use of semantics at all (where I take “semantics” to mean “explicit (symbolic) representation of intended meaning”).

There are of course the by now well-known IDG News Service interview of October 2007 with Marissa Mayer, Google’s Vice President of Search Products & User Experience, and the hints dropped by Google’s CEO Eric Schmidt during Google’s fourth-quarter earnings conference call in January 2009, but so far these are no more than comments.

So, until we see more actual proof, reports of Google’s use of semantics seem to be rather premature.

Semantic advertising (again)

Wednesday, March 11th, 2009

(by Frank van Harmelen)

Google announced today that they will start to match advertisements with your personal interest profile, which they mine from the web pages you visit. You can manage your own interest profile using a special tool they will make available. See e.g. here and many other places on the Web today.

The next obvious step is of course to be a whole lot smarter about matching your profile with the ads (and there are some entertaining examples of why you would want to use semantics in advertising, i.e. of why the keyword model sometimes fails).

(On a side note, this week the professional magazine of the Dutch marketing-intelligence business published a feature interview on the Semantic Web and its influence on marketing. And it was their idea to do this…)

Exploring a ‘Deep Web’ That Google Can’t Grasp

Wednesday, February 25th, 2009

(by Frank van Harmelen)

From the New York Times this Monday:

One day last summer, Google’s search engine trundled quietly past a milestone. It added the one trillionth address to the list of Web pages it knows about. But as impossibly big as that number may seem, it represents only a fraction of the entire Web.

Beyond those trillion pages lies an even vaster Web of hidden data: financial information, shopping catalogs, flight schedules, medical research and all kinds of other material stored in databases that remain largely invisible to search engines.

The challenges that the major search engines face in penetrating this so-called Deep Web go a long way toward explaining why they still can’t provide satisfying answers to questions like “What’s the best fare from New York to London next Thursday?” The answers are readily available - if only the search engines knew how to find them.

(thanks to Mike Brodie for pointing to this one)
