"sentiment analysis" search on Google, ODP, Wikipedia

Search is the Internet OS. Before folks started talking about Web 2.0, the Semantic Web was all the rage. Sentiment analysis emerges from a cloud of intersecting disciplines: search, the proliferation of user-authored content, the wisdom of the crowd, information retrieval techniques, academic research, data mining, progress in machine learning, emerging computational linguistics such as semantic categorization applications, content from the tail, and of course statistics.

For brand managers in the private sector (pharmaceutical, consumer, ...), politicians, sociologists, Wall Street, and others interested in buzz monitoring, sentiment analysis unlocks actionable information that was out of reach until now and creates real capitalistic value: real-time brand, public image, and reputation monitoring; product-related early warning systems; detection of unfavorable rumors for risk management; and customer satisfaction indices.

I have been doing search for a while: Microsoft before MS knew how important search was, Enfish desktop search that curiously never made it, AltaVista before the Yahoo! acquisition, Yaga's distributed P2P digital content search and eCommerce platform (right on, but too early), local search at Infospace while local was hot, and now AOL Search with the FullView turnaround. It is sometimes still difficult to put things in perspective and really walk in our searchers' shoes in spite of all the focus groups, usability tests, and eye tracking studies, so I decided to go on a quest for a generic definition of "sentiment analysis" and good synonyms, simply because I care about it and wanted to feel how it really feels. The query is [sentiment analysis], a simple "informational" query as Andrei Broder would put it, as opposed to "navigational" or "transactional", and the right answer should be a collection of good links, maybe even a Onebox, FullView, Short cut, or SmartAnswer. My information need revolves around definitions, companies, technology, academic coverage, articles, and papers, and I am really not sure what to expect.

I started with ODP/DMOZ because it came up at work the other day. The Open Directory Project, with 5 million entries, is not particularly rich on the subject. One node [Top: Business: Investing: Derivatives: Options: Research and Analysis] is related if you reason from the content to the query, but not if you reason from the intent of the query to the content. What I am looking for is generic information about Internet "sentiment analysis", not investment indices. I tried to navigate the taxonomy from Top >> Computers >> Internet >> and found nothing under "buzz" or "sentiment". I'll have to wait for ODP to be back up and become an Editor to create a "sentiment analysis" node. Can't wait; I hear ODP is still pretty important to Webmasters, Publishers, and Content owners in terms of ranking high on Google, and Editors still care passionately.

The Google search was pretty good. I first typed "sentiment analyses" and got the "Did you mean" spellcheck suggesting "sentiment analysis"; nice. Interestingly enough, I believe I got one more sponsored link, including the top premium placement with 2, after I signed in. The top 10 results are pretty good, covering IBM research, PDF papers, the Wikipedia entry, and some blogs. Of the 4 Sponsored Links, 3 were investment-related. Not what I am looking for, although I am sure there are parallels between Internet sentiment analysis and investment trends. In the end, the most interesting results were i) organic: Data Mining: Text Mining, Visualization and Social Media, ii) the IBM definition, and iii) the paid Sponsored Link from Nstein. Overall, the Sponsored Links are definitely not as relevant as the organic matches, because their coverage depends on economics rather than organic content. Go figure: there are about 10 billion pages on the net and maybe only about 700,000 advertisers; that is an unfair advantage for the organic index, on the order of 3,000 times bigger if you assume each advertiser has 5 pages (a number out of nowhere).

Wikipedia barely had a "stub" about "sentiment analysis", so I contributed a couple more definitions I ran into. I'll clean that up as I learn more while searching next for "sentiment analysis" in Yahoo!, Ask, and the others.

Know of some good links about "Internet sentiment analysis"? What is it? Who is doing what? Who is paying for it? Cool emerging linguistic technology like semantic categorization?



Paid Search Programs Finally Growing Up

... and that's what I wrote on March 31, 2004, again for Search Engine Watch.

The search industry has come a long way since the days of running poorly targeted banner advertisements on search results pages. Enhanced keyword targeting capabilities and powerful new bidding and ROI analysis tools have raised the value of search as a promotional channel for online -- and increasingly offline -- merchants.

Yet with all of the progress to date, paid search programs are still in their infancy and their rapid evolution continues. Two fundamental changes currently underway are blurring the traditional metrics and definitions that have divided paid listing and paid inclusion programs.

Evolution Toward a Common Metric: ROI

Today's marketers, helped by a variety of sophisticated automated bidding and analysis tools, are becoming increasingly savvy in calculating the performance of their search marketing initiatives. Some of the new technologies that are enabling ever more precise targeting of listings to maximize ROI include:

  • Keyword research: This concept -- long an integral part of search marketing campaigns -- is expanding to include match type flexibility, exact terms, phrases, broad matching, and keyword exclusion features.
  • Geo-targeting: Yahoo and Google have been hard at work developing local search services allowing marketers to target paid listings based on geographic locations.
  • Day-parting: Campaign management platforms, such as GoToast and Kanoodle, are adding day-parting and scheduling rules to serve up and take down listings based on the time of day.
  • Network targeting: Campaign management services such as MyGeek expose click-through reporting by destination site, allowing marketers to opt out of Web properties delivering a lesser return on their advertising investments.
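To make the first bullet concrete, here is a minimal sketch of the keyword match types described above: exact, phrase, broad, and keyword exclusion (negative keywords). The function name and rules are illustrative assumptions, not any engine's actual API.

```python
# Toy keyword matcher illustrating exact, phrase, and broad match types
# plus negative-keyword exclusion. Purely illustrative.

def matches(query, keyword, match_type="broad", negatives=()):
    q_terms = query.lower().split()
    k_terms = keyword.lower().split()

    # Negative keywords veto the listing regardless of match type.
    if any(neg.lower() in q_terms for neg in negatives):
        return False

    if match_type == "exact":    # query must equal the keyword exactly
        return q_terms == k_terms
    if match_type == "phrase":   # keyword terms appear contiguously, in order
        n = len(k_terms)
        return any(q_terms[i:i + n] == k_terms
                   for i in range(len(q_terms) - n + 1))
    if match_type == "broad":    # every keyword term appears somewhere
        return all(t in q_terms for t in k_terms)
    raise ValueError(match_type)
```

For example, the keyword "chicago pizza" phrase-matches the query [cheap chicago pizza delivery] but not [pizza in chicago], while broad match catches both; adding "free" as a negative keyword suppresses the listing on [free pizza in chicago].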

With these new tools, the metrics search marketers use to measure campaign success are evolving from impression counts and click-through rates to more sophisticated and predictable return on investment methodologies based on cost per click. In fact, some pay-per-click engines already provide advertisers with tools to calculate conversion rates from impressions to orders and ROI using tracking URLs or by inserting scripts on landing and action pages. Overture and Google already go one step further, suggesting forecasted traffic levels and cost estimates for specific keyword combinations, match types and bid amounts.
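The funnel arithmetic behind that common ROI metric can be sketched in a few lines. All figures below are made up for illustration; the function and its parameters are assumptions, not any vendor's tool.

```python
# Back-of-the-envelope funnel math: impressions -> clicks (CTR) ->
# orders (conversion rate) -> return over advertising spend.

def campaign_roi(impressions, ctr, conversion_rate, cost_per_click,
                 revenue_per_order):
    clicks = impressions * ctr
    orders = clicks * conversion_rate
    spend = clicks * cost_per_click
    revenue = orders * revenue_per_order
    # ROI expressed as return over ad spend, e.g. 0.08 means "8%".
    return (revenue - spend) / spend

# Hypothetical campaign: 100,000 impressions, 2% CTR, 5% conversion,
# $0.50 per click, $12 revenue per order.
roi = campaign_roi(impressions=100_000, ctr=0.02, conversion_rate=0.05,
                   cost_per_click=0.50, revenue_per_order=12.00)
```

Here 2,000 clicks cost $1,000 and yield 100 orders worth $1,200, a 20% return over spend; the same arithmetic run per keyword and match type is what drives the bidding tools described above.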

Together, these forces are causing search marketers to focus less on the rankings they have achieved across the various engines through site optimization, paid inclusion, and paid listings, and more on the amount of qualified traffic generated by a given level of investment. As a result, the distinctions commonly made between the various search-marketing programs in existence today are being superseded by a gradual movement toward a common ROI measurement.

With the ability to calculate ROI across a greater array of campaigns, search marketing is becoming increasingly complicated as advertisers seek the right mix of programs to optimize their return. In response, paid search agencies such as Referencement.com, MarketLeap, Decide Interactive, Quigo and others are stepping in to efficiently aggregate management and reporting of multiple campaigns and, in the process, further obscuring the details of the individual programs themselves.

Optimizing Relevancy and Yield

A second trend that is blurring the definitions traditionally associated with paid inclusion and paid listing programs is being driven by the efforts of search engines to enhance relevancy and maximize yield.

Search engines today increasingly monitor impression and click metrics, a practice initiated by Ask Jeeves' DirectHit technology a few years ago, to improve the relevancy of organic results by promoting and demoting individual links based on popularity. As demonstrated by the Google AdWords program, integrating popularity parameters into paid content ranking takes search engines one step closer to optimizing the relevancy of results, and the yield per page.

In addition, search engine crawlers are increasingly leveraging ever-smarter linguistic technology to analyze and categorize page content, improving the relevancy of organic results. Concept and entity extraction, advanced contextual categorization, spelling and stemming adjustments, phrase extraction, and stop-word recognition are increasingly leveraged to optimize the targeting of paid content as well, improving relevancy and maximizing click-through rates.
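Two of the simplest steps in that linguistic pipeline, stop-word recognition and stemming, can be sketched as follows. The stop-word list and crude suffix rules are invented for illustration; real engines use proper stemmers such as Porter's algorithm.

```python
# Naive text normalization: drop stop words, then strip common suffixes
# so that related word forms ("engine"/"engines") collapse together.
# Illustrative only; not a production stemmer.

STOP_WORDS = {"the", "a", "an", "of", "in", "on", "for", "and", "to"}
SUFFIXES = ("ing", "ers", "er", "ies", "es", "s")

def stem(word):
    for suffix in SUFFIXES:
        # Only strip if a reasonable stem (3+ characters) remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def normalize(text):
    return [stem(w) for w in text.lower().split() if w not in STOP_WORDS]
```

Applied to both page content and ad keywords, even this crude normalization lets [search engines] and [search engine] target the same listings.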

Search personalization will also affect the ranking of organic results, as well as dramatically impact advertisers' control over paid placement. Whether inferred from an explicit user profile, implicit past behavior, or the current application context, a better understanding of user intent will contribute to a better experience and improved click-through rates.

In an increasingly yield-driven context where content targeting gets more sophisticated and matching more scientific, paid listing results could very well be demoted to the extent of overlapping with paid inclusion results, blurring the definitions traditionally associated with these programs.

The Bid for Traffic Model

As emphasis on ROI as the common success metric across all search marketing initiatives increases, and the distinctions that have traditionally defined paid listing and paid inclusion programs continue to blur, these programs will gradually be superseded by a practice I call "bid for traffic." Ultimately, advertisers will be able to target impressions by dictating an ROI level acceptable to them, such as "8% over advertising spend," without needing to understand the unique distinctions and characteristics of the various programs available to search marketers today.



What's it Going to Take to Beat Google?

Funny, that's what I wrote on June 12, 2003, for Danny and Chris at Search Engine Watch.

Search has changed dramatically since the early AltaVista days under Digital Equipment. The pure search and technology-centric engines left standing have transformed themselves into direct-marketing businesses. Too bad AltaVista was so busy thinking of itself as a portal while Overture was inventing performance-based keyword search marketing, and Google was laser-focused on relevancy, laying the groundwork for its highly relevant AdWords paid listings program.

So what's it going to take to beat Google? Here's a look at some of the critical factors search engines need to address to be successful in today's environment.

Relevancy, performance, index, and ease of use

Relevancy, index size and freshness, performance, and ease of use are still critical factors, but they are no longer sufficient to predict future success.

Google's claim to fame, with a share of about 55% of usage according to OneStat.com, came from delivering the most relevant results. However, the relevancy gap between Google and other search engines like Inktomi or AlltheWeb now seems to have faded. Most major engines are now very good at serving very relevant results given the few clues they can glean from the query, typically made up of just one or two search keywords. Are search engines' results as relevant as they will ever get?

Performance, or how quickly browsers display the results, has become a non-issue except for sporadic downtimes. The size of the web is growing continuously, and so are the indices of the major crawler-based engines. Spiders are racing through the web, refreshing their content more often than ever.

Portals have poured money into usability studies to find out what users want their results to look like. There is no one-size-fits-all look and feel, but search engines seem to have finally figured out that non-relevant clutter is bad and simplicity is good.

To remain on the edge, search engines have to continuously push the frontiers of innovation.

Users' intent and personalized relevancy

The subjective nature of users' intent when formulating queries is complex. Understanding the context is challenging, but two important factors, location and time, are under-exploited by the engines today. A user typing the query [pizza] from Chicago or Milan, at 11:00am or 9:00pm local time, doesn't expect the same results. Besides, the local advertising market is considerable and better geo-based services are critical to some wireless devices.
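The pizza example can be sketched as a toy query-contextualization step: bias the query by the searcher's location and local time before retrieval. Every rule below is an invented assumption, meant only to show the shape of the idea.

```python
# Toy context-aware query rewriting: append the searcher's city as a
# geo bias and a time-of-day hint. Rules are invented for illustration.

def contextualize(query, city, local_hour):
    terms = [query, city]          # geo bias: tie results to the location
    if 11 <= local_hour <= 14:
        terms.append("lunch")      # midday searches lean toward lunch spots
    elif 18 <= local_hour <= 22:
        terms.append("delivery")   # evening searches lean toward delivery
    return " ".join(terms)
```

A [pizza] query from Chicago at 11:00am would thus be handled as "pizza chicago lunch", while the same query from Milan at 9:00pm becomes "pizza milan delivery", two very different result sets from identical keywords.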

Training the engines

Search engines should learn from our behaviors. Users often search on the same topic, typing recurring queries. Users don't always click on the first result and often navigate back and forth selecting links from a results page. Search engines should be more proactive, learn for the benefit of individual users and become smarter over time.

Think of it as training the engines. The more you use them, the better advice you receive. There are issues to address, including privacy concerns and how to handle multiple users of a single PC, to name only a few. But these issues can be overcome.
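A minimal sketch of what "training the engine" could mean: remember which results a user clicked for past queries and boost those links on repeat searches. The class and its signals are assumptions for illustration; real personalization uses far richer behavioral models.

```python
# Toy click-trained re-ranker: per-(user, query) click counts boost
# previously chosen results. Illustrative only.

from collections import defaultdict

class PersonalRanker:
    def __init__(self):
        # (user, query, url) -> number of recorded clicks
        self.clicks = defaultdict(int)

    def record_click(self, user, query, url):
        self.clicks[(user, query, url)] += 1

    def rerank(self, user, query, results):
        # Stable sort: most-clicked results first; ties keep engine order.
        return sorted(results,
                      key=lambda url: -self.clicks[(user, query, url)])
```

On a recurring query, a result the user clicked twice before would rise above untouched links while the engine's original ordering is preserved among the rest, exactly the kind of quiet, per-user learning described above.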

Query disambiguation

Emerging concept-based clustering technologies, used in search engines such as Vivisimo, are doing wonders at allowing users to refine ambiguous queries. For example, searchers can discriminate between computer viruses and biological viruses from a more generic [virus] query, but they still have to do the work after the results are served.

Disambiguation technologies have not been fully leveraged yet. Andrei Broder, Distinguished Engineer and CTO of the Institute for Search and Text Analysis, part of the IBM Research Division, explains that queries can be divided into three categories: informational, navigational, and transactional.

Some queries are clearly transactional, such as [iPaq H1910]; others are clearly informational, such as [muscular atrophy]; and others are clearly navigational, such as [NBA official site]. Smart linguists could dissect and categorize these search terms and make better sense of users' intent.
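Broder's taxonomy can be illustrated with a toy classifier that looks for cue words. The cue lists and default rule are made-up examples; real engines would use far richer signals than surface keywords.

```python
# Toy query classifier for Broder's taxonomy: navigational,
# transactional, or informational. Cue words are invented examples.

NAVIGATIONAL_CUES = {"official", "site", "homepage", "login", "www"}
TRANSACTIONAL_CUES = {"buy", "price", "cheap", "download", "order"}

def classify_query(query):
    terms = set(query.lower().split())
    if terms & NAVIGATIONAL_CUES:
        return "navigational"
    if terms & TRANSACTIONAL_CUES:
        return "transactional"
    # Queries with no strong cue fall into the informational bucket.
    return "informational"
```

Under these toy rules, [NBA official site] comes out navigational, [buy iPaq H1910] transactional, and [muscular atrophy] informational, matching the examples above.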

Making results productive

Search engines should better follow through and make results more productive for users, notifying them of new relevant results for specific queries. Users often go to the same sites looking for new content. Search engines could monitor these changes for us. What a great opportunity for direct-marketing businesses to establish that one-to-one marketing relationship, directly address users' needs, and serve a relevant Google AdWords or Overture paid link in the notification email.

Vertical engines

The continued emergence of vertical search engines will increasingly fragment the market, eroding relative usage share away from Google. Crawling deeper through the invisible web, the indices of these topical engines have more depth for the subjects they cover. Findlaw.com, for example, allows users to retrieve proprietary content such as cases, opinions, and other legal reference material unavailable elsewhere.

Serving niche markets and very targeted user bases, these sites are in a position to offer marketers better click conversion rates and command higher costs-per-click than the general-purpose engines.

Meta search engines

Meta search engines, such as Infospace's Dogpile.com, Vivisimo.com, and Mamma.com, have a very good shot at beating Google. Why wouldn't the most relevant results from several of the best engines be more relevant than the results of a single -- even the best -- crawler-based engine?

Meta search engines are not as technology-centric as search engines. But they have spent too much time trying to replicate single-source crawler-based engines and have been carrying a bad reputation for serving too many irrelevant paid links and obtrusive popup advertisements. Meta search engines should build on the differentiated and added value of aggregating results from the best sources.
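That aggregation value can be sketched with a simple rank-fusion scheme: award each URL points by its position in each engine's list, sum across engines, and de-duplicate. This Borda-count-style merge is one common textbook approach, offered here as an assumption, not how any particular meta engine actually works.

```python
# Toy meta-search merge: Borda-style rank scoring summed across engines,
# with automatic de-duplication of URLs. Illustrative only.

def merge_results(ranked_lists, top_n=10):
    scores = {}
    for results in ranked_lists:
        for rank, url in enumerate(results):
            # Higher positions earn more points; points add up across engines.
            scores[url] = scores.get(url, 0) + (len(results) - rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

A URL ranked highly by several engines outscores one that a single engine loved, which is exactly the differentiated relevancy case a meta engine can make.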


The recent consolidation trends mean fewer players. Inevitably, these are also the fittest, with more negotiation power. Yahoo acquiring Overture would tip the balance of power, creating a formidable competitor to Google. MSN Search has certainly not played its hand yet. Redmond cannot feel too good about fueling Yahoo's traffic and paid placement revenues.

More players and competitors will surface as more creative and sustainable business models emerge. Search players will increasingly focus on respective and distinct core competencies. Indexing the web is a complex task, as is researching smarter relevancy algorithms. Richer concept-based marketing tools will require more sophisticated skills.

A new model could very well emerge, where crawlers crawl and marketing firms target campaigns. Meta search engines could very well differentiate themselves providing real aggregation value, executing on relevancy and user experience, and emerge as the top search destinations.



Search is the Internet OS

That's what I have been thinking and thinking about for a while!
