First, Yahoo! returns different sets of results depending on whether you "quote" the query "sentiment analysis" or not; the set of Sponsored Links changes because no advertiser seems to be targeting the exact phrase.

At first sight, most of the organic results, sponsored links, and "Also try" related searches relate to financial market sentiment. Nice, and related to "sentiment analysis", but not what I am looking for. You're probably wondering how Yahoo! or anybody else could know. Well, I have been searching on the topic repeatedly for a while. Can't the engines learn from my behavior? Not yet, I guess.

The 3rd sponsored link, Biz360, seems interesting: a lead-generation landing page offering a white paper download titled "Use Competitive Intelligence Media Monitoring to Improve Your PR Positioning". Not about Internet sentiment analysis in the context of smart linguistic-based data mining, though. From briefly browsing the Web site, Biz360 looks more like a news clipping service.

The 4th sponsored link URL navigates to a 404 page. I have been seeing fewer and fewer of these, though, since Greg Notess started tracking 404 rates for the major engines in the '90s. The company, TrendIQ, is a good result, though. TrendIQ's home page does mention doing sentiment analysis studies. Interesting positioning: "Discovering Trends Hidden in the Internet". I'll have to get back to it and learn more. TrendIQ is then listed twice in the organic results, linking to deeper, more interesting pages, including "TrendIQ Sentiment Analysis" with a case study covering the 2004 elections in the United States and another.

The 1st organic result is not bad: it links to Wikipedia's "Sentiment" page, where you can disambiguate your query, including branching out to "sentiment analysis: automatic detection of opinions embodied in text or speech". The problem is that Wikipedia is still pretty light on the topic. On top of that, the actual "sentiment analysis" page is also listed a bit lower. That's a bit of Wikipedia overkill. Yahoo! has a couple of other results about Lillian Lee's paper "A Matter of Opinion: Sentiment Analysis and Business Intelligence".

The most interesting result is probably a white paper, "Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis", by Theresa Wilson, Janyce Wiebe from the Department of Computer Science at the University of Pittsburgh, and Paul Hoffmann. I will have to read that in more detail as well. Abstract: "This paper presents a new approach to phrase-level sentiment analysis that first determines whether an expression is neutral or polar and then disambiguates the polarity of the polar expressions. With this approach, the system is able to automatically identify the contextual polarity for a large subset of sentiment expressions, achieving results that are significantly better than baseline."
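
To make the two-step idea in that abstract concrete, here is a minimal Python sketch: first decide whether a word is neutral or polar, then settle the polarity of the polar ones from their context. This is a toy and not the authors' system. The tiny lexicon, the negation rule, and the function names (`is_polar`, `contextual_polarity`, `classify`) are all assumptions made up for illustration, and it works on single words rather than full expressions.

```python
# Toy two-step sketch, NOT the paper's classifier. The lexicon and the
# negation rule below are made-up assumptions for illustration only.
PRIOR_POLARITY = {"good": 1, "great": 1, "bad": -1, "terrible": -1}
NEGATORS = {"not", "never", "no"}


def is_polar(tokens, index):
    """Step 1: decide whether the word at `index` is polar or neutral."""
    return tokens[index] in PRIOR_POLARITY


def contextual_polarity(tokens, index):
    """Step 2: disambiguate the polarity of a polar word from its left context."""
    polarity = PRIOR_POLARITY[tokens[index]]
    # Assumption: a negator directly before the word flips its prior polarity.
    if index > 0 and tokens[index - 1] in NEGATORS:
        polarity = -polarity
    return "positive" if polarity > 0 else "negative"


def classify(sentence):
    """Return (word, label) pairs; label is neutral, positive, or negative."""
    tokens = sentence.lower().split()
    labels = []
    for i, token in enumerate(tokens):
        if not is_polar(tokens, i):
            labels.append((token, "neutral"))
        else:
            labels.append((token, contextual_polarity(tokens, i)))
    return labels
```

Calling `classify("The results are not good")` labels "good" as negative because of the preceding "not", while the other words come back neutral. A real system would presumably learn both steps from annotated data rather than rely on a hand-written lexicon.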

Overall, not bad, but far from really good results. Better up-front disambiguation features would have helped filter out financial topics and focus on data-mining-type sentiment analysis results.

Jason said...

Thanks--interesting series.

For the record, Biz360 does NLP-based data mining ... see http://www.biz360.com/solutions/pov.html for a few details.

Arnaud said...

Jason, you are right. Awesome correction. I checked out the link you mentioned (http://www.biz360.com/solutions/pov.html), and Point-of-View Sentiment, Biz360's underlying sentiment analysis technology, is very much what I was looking for. Very nice. I particularly like the example about Zune at http://marketiq.biz360.com, and I appreciate the conclusion drawn from the data mining exercise: knocking Apple off its throne "won’t be a quick hit, but rather a slow etching away of its market share". That is the kind of business intelligence Microsoft can use to readjust strategy and tactics around taking Zune to market. I trust Microsoft noted.

Arnaud said...

Jason, are you with Biz360? Can you ping me at arnaudfischer"at"hotmail.com? Talk to you soon.


chris2001 said...

Those old dead link stats by Greg Notess are interesting!

"Random Sampling from a Search Engine's Index" by Ziv Bar-Yossef and Maxim Gurevich has numbers for inaccessible pages (4xx HTTP return codes) in 2006:

Yahoo: 0.5%
MSN: 0.7%
Google: 2%
(see Figure 16)

Yahoo comes off best in this evaluation.

Arnaud Fischer - user-obsessed & competition-paranoid said...

Chris, awesome, thank you so much for posting this info. Very interesting. Yahoo! does not look bad at all. I am a little bit surprised by the size, with Yahoo! bigger than Google under this sampling methodology. MS is doing some things right as well, as far as the underlying infrastructure is concerned. Maybe this progress has not all reached front-end experiences and Comscore yet.

Chris, thank you!