In recent weeks, like others, I've become increasingly frustrated with Google Search. Whenever I make anything other than the most basic of searches, I find my results increasingly polluted by totally irrelevant pages and, in some cases, by complete gibberish. I feel slightly embarrassed admitting this since, ironically, I'm also rather pleased with the way Google indexes this site: it shows up on the front page of many Google search results on subjects I've covered here, and a massive three quarters of my readers originally landed here from a Google search, with practically none arriving from the other search engines. Nor is this site an exception: Google's market share in search remains massive.
Nevertheless, the fact is that the web is changing fast, while Google's basic search algorithm hasn't changed radically over the past fifteen years. There's nothing mysterious about a search algorithm as such: it is a mathematical formula that uses certain information about a website to determine its ranking. The exact algorithm Google uses is kept secret, to stop people from gaming it. Yet although its specific details are secret, its basic outline has been patented and is thus publicly accessible, and those interested in these matters regularly attempt to interpret each new iteration of it.
The growing problem with Google Search: spam and moronic, low-quality content
Yet over the fifteen years since Google designed its search model and rose to prominence, the web has changed quite radically: available content has increased exponentially, and while the quantity of high-quality content has grown significantly in absolute terms, poor-quality content has probably grown even more in proportion. Demand Media's stunningly rubbishy sites, such as eHow, an 'online how-to guide with more than 1 million articles and 170,000 videos offering step-by-step instructions' written by freelancers whose output frequently borders on imbecility, are examples of what I wish would just vanish from my browser for ever.
The issue has recently, and inevitably, become a major topic of discussion. Last month, Stack Overflow's Jeff Atwood pointed out that content farms republishing other sites' original content were actually ranking higher in Google's search results than the originals. His specific complaint was as follows:
In 2010, our mailboxes suddenly started overflowing with complaints from users—complaints that they were doing perfectly reasonable Google searches, and ending up on scraper sites that mirrored Stack Overflow content with added advertisements. Even worse, in some cases, the original Stack Overflow question was nowhere to be found in the search results! That's particularly odd because our attribution terms require linking directly back to us, the canonical source for the question, without nofollow. Google, in indexing the scraped page, cannot avoid seeing that the scraped page links back to the canonical source. This culminated in, of all things, a special browser plug-in that redirects to Stack Overflow from the ripoff sites.
This is the first time since 2000 that I can recall Google search quality ever declining, and it has inspired some rather heretical thoughts in me -- are we seeing the first signs that algorithmic search has failed as a strategy? Is the next generation of search destined to be less algorithmic and more social?
On January 28, 2011, Matt Cutts, head of Google’s Webspam team, confirmed that Google's algorithms had been changed to 'help drive spam levels even lower, including one change that primarily affects sites that copy others’ content and sites with low levels of original content.' He went on to state that:
This was a pretty targeted launch: slightly over 2% of queries change in some way, but less than half a percent of search results change enough that someone might really notice. The net effect is that searchers are more likely to see the sites that wrote the original content rather than a site that scraped or copied the original site’s content.
The change in the algorithm was designed to correct such absurd imperfections as the fact that, when you googled Paris in the hope of finding useful tourist information about, say, hotels in France's exciting capital city, Google at one stage also used to regurgitate content about an unknown young American lady actually called Miss Paris Hilton, of whom no one had ever heard. This has now been corrected.
It's probably too early to tell whether this battle is still Google's to win. But even after the change, it's much harder to find relevant content on Google now than it was a few years ago, even when you make full use of Google's Search settings [i] to make your results more relevant.
Certainly, a lot of the imperfections in Google are still there: it isn't just the content farms that are polluting search, but also low-quality original content that exists in abundance on some very popular sites, forums in particular, where content can border on the moronic.
Of course, Google, which runs over one million servers in data centres around the world and processes over one billion search requests and about twenty-four petabytes of user-generated data every day, is not going to disappear overnight just because these issues have arisen. In the short to medium term, most of its users won't even notice there is a problem with search quality, and wouldn't care if they did. In the longer term, however, Google's prospects are more open to question. Its massive $200bn market capitalisation [ii] has arisen entirely from search revenue, and the company will be exposed to exactly the kind of decline that has plagued Microsoft if it fails to maintain its excellence in its core business while also failing, as it clearly has so far, to diversify into viable alternative sources of growth.
There is no reason why Google, which started as a tiny, innovative company, couldn't be beaten at its own game by a nimbler new entrant. Of course, there have been many such attempts over the past decade, and so far no player has even scratched the surface of Google's dominance. Yet now, for the first time in fifteen years, Google faces a genuine technological challenge, and it's notoriously difficult for a large, relatively bureaucratic organisation to meet one. Bing and Yahoo, the mass-market alternatives you can set as the default search engine in Safari's Preferences, suffer from much the same problems and, in my experience, haven't demonstrated any systematic advantage over Google; in Yahoo's case, arguably the reverse. But smaller new entrants have already been establishing a market presence, albeit on a minute scale, and some of them offer real value that makes them interesting alternatives to Google, at least on a niche basis.
It's impossible to tell, at this very early stage, who the winners will be in the race to offer more relevant search engines than Google, and in a sense you don't need to. Unlike email or social networks, where choosing a provider implies trusting its long-term viability because the service necessarily requires it to host your data, search is a service where the cost of switching between providers is nil, so all you need to do is keep on top of new developments. With that in mind, in addition to the three examples presented in this blog post, you might also want to look at Hakia and Kngine, which have their own strengths and weaknesses.
Duck Duck Go
Duck Duck Go demonstrates conclusively that an alternative method, based on parameters other than Google's, can produce quality search results. It uses information from what it calls 'crowd-sourced' sites (like Wikipedia) to augment traditional results and improve relevance. It doesn't record your personal details or rely on cookies, and its emphasis on privacy appeals to some users. Its results combine many sources, including Yahoo! Search BOSS, Wikipedia and its own web crawler (DuckDuckBot). In my view its layout is the cleanest of all search engines: it uses the 'crowd-sourced' data to populate 'Zero-click Info' boxes, red boxes containing topic summaries and related topics that appear just above the results.
But Duck Duck Go goes further than this. Its founder, Mr Gabriel Weinberg, says he has filtered out search results from companies he believes are content mills, such as Demand Media's eHow, which publishes 4,000 articles a day by paid freelance writers and which Weinberg has described as 'low-quality content designed specifically to rank highly in Google's search index.' Duck Duck Go also attempts to filter out pages with lots of advertising.
Duck Duck Go's most distinctive feature, and one I find really useful, is its !bang syntax, which lets you query other sites without first having to visit them. You can search any major site, such as Google, YouTube, Amazon or eBay, by preceding your search terms with '!youtube' or '!google', for instance, and you will be taken straight to that service's list of results. It also works with keywords such as weather, images and lyrics.
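To make the !bang mechanism concrete, here is a minimal Python sketch that builds the kind of Duck Duck Go URL a bang search produces. It assumes that a query submitted as `https://duckduckgo.com/?q=...` and starting with `!youtube` (or any other bang) is redirected by Duck Duck Go to the target site; the function name `bang_query_url` is purely illustrative.

```python
from urllib.parse import urlencode

# Duck Duck Go's search endpoint (assumed for this illustration).
DDG_SEARCH = "https://duckduckgo.com/"


def bang_query_url(bang: str, terms: str) -> str:
    """Build a Duck Duck Go search URL whose query starts with a !bang.

    `bang` is the shortcut name without the exclamation mark, e.g. "youtube".
    """
    query = f"!{bang} {terms}"
    # urlencode uses quote_plus: '!' becomes %21 and spaces become '+'.
    return DDG_SEARCH + "?" + urlencode({"q": query})


print(bang_query_url("youtube", "safari keyboard shortcuts"))
```

Opening the resulting URL should have the same effect as typing `!youtube safari keyboard shortcuts` into the Duck Duck Go search box.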
There are also tons of other features that Duck Duck Go offers, located on the Goodies page. These range from special queries, such as generating a random password (Search: pw) to looking up books via an ISBN number.
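The 'pw' goodie simply hands you a random password. How Duck Duck Go generates it isn't described here, so the following is only an illustrative Python sketch of the same idea, assuming a 12-character default and an alphabet of letters and digits. It uses the standard library's `secrets` module, which draws from a cryptographically secure random source.

```python
import secrets
import string

# Assumed alphabet: upper- and lower-case ASCII letters plus digits.
ALPHABET = string.ascii_letters + string.digits


def random_password(length: int = 12) -> str:
    """Return a random password of `length` characters drawn from ALPHABET."""
    if length < 1:
        raise ValueError("length must be positive")
    # secrets.choice picks each character independently with a CSPRNG.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(random_password(16))
```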
Duck Duck Go comes with a full set of keyboard shortcuts, conveniently displayed to the right of your search results in its minimalist, customisable interface. All in all, while it's clearly still a work in progress [iii], it's a powerful and elegant search engine.
Blekko
Blekko, which launched to a fanfare of publicity in November 2010 just as criticism of Google's search results was beginning to hit the news, uses slashtags [iv] to provide results for common searches (although you can, of course, also use it as a traditional search engine without them).
Slashtags can be used in a variety of ways: you can search for results from one site (“MacBook Air/Amazon”, for instance, searches for MacBook Airs on Amazon.com), narrow searches by type (“Wladimir/people” shows people called Wladimir) or search by topic (“Climate change/conservative” shows results from right-leaning sites, and “Obama/humor” shows humour sites referring to the President of the US).
In addition, anyone searching in certain heavily searched categories, such as cars, finance, health and hotels, will be presented with content screened beforehand by editors to exclude the low-quality or irrelevant content that Google's algorithm-driven engine can't entirely filter out. In health, for instance, notoriously one of the most difficult areas in which to find relevant content, Blekko limits results to just seventy-six sites that its editors have judged trustworthy; the same method is also applied to recipes, autos, hotels, song lyrics, personal finance and colleges. Registered users can manually label a site as spam, and also have a number of other features available [v].
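The editorial filtering described above can be pictured as a whitelist attached to each slashtag. The Python sketch below is hypothetical throughout: Blekko's internal data and ranking aren't public, so the slashtag lists, the placeholder hosts (all under the reserved example.org/example.com domains) and the functions `split_slashtags` and `restrict_to_slashtags` are invented for illustration. It splits a query such as “flu symptoms/health” into its terms and slashtags, then keeps only the results whose host appears on one of the curated lists.

```python
from urllib.parse import urlparse

# Hypothetical editor-curated slashtags: each maps to the hosts judged
# trustworthy for that topic. The hosts are placeholders, not Blekko's lists.
SLASHTAGS: dict[str, set[str]] = {
    "health": {"clinic.example.org", "medref.example.net"},
    "recipes": {"recipes.example.com"},
}


def split_slashtags(search: str) -> tuple[str, list[str]]:
    """Split 'query/tag1/tag2' into the query text and lower-cased slashtags."""
    head, *tags = search.split("/")
    return head.strip(), [t.strip().lower() for t in tags if t.strip()]


def _host(url: str) -> str:
    """Return the URL's host name without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host.removeprefix("www.")


def restrict_to_slashtags(tags: list[str], result_urls: list[str]) -> list[str]:
    """Keep only results from hosts curated for at least one slashtag.

    With no slashtags the results are returned unfiltered; a slashtag
    without a curated list contributes no hosts.
    """
    if not tags:
        return list(result_urls)
    allowed: set[str] = set()
    for tag in tags:
        allowed |= SLASHTAGS.get(tag, set())
    return [url for url in result_urls if _host(url) in allowed]


query, tags = split_slashtags("flu symptoms/health")
results = [
    "https://www.clinic.example.org/flu",
    "https://content-farm.example.com/flu-symptoms",
]
print(query, tags, restrict_to_slashtags(tags, results))
```

The content-farm result is dropped simply because its host isn't on the health list, which is the essence of a curated, rather than purely algorithmic, filter.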
The fact is, Blekko's good. Mr John Dvorak, whose authority carries far more weight than most, has this to say of it:
The first thing I do, of course, with a new engine is search for myself. I do not do this because I'm vain, but because I know exactly what should appear and in what order. Most writers, like myself, with a high-profile on the Internet use this same trick to see if a search engine is any good. Blekko nails the search. When I search for myself, one of the following items should appear at the top of the search: my blog at dvorak.org/blog and channeldvorak.com. After that, I expect to see links to my short bio, my wiki entry, PCMag, Mevio, Marketwatch, NoAgendashow, or other newer sites where I'm active. Google nails this every time. Yahoo always used to screw it up completely. Bing mostly nails it except for adding oddball links that have nothing to do with me. Most low-budget wannabees just make a mess with old links that have lingered for a decade.
It's worth noting that Blekko and Duck Duck Go have started collaborating, with Blekko now providing content for Duck Duck Go's search results in the following categories: health, colleges, autos, personal finance, lyrics, recipes and hotels. As mentioned earlier, these are the initial categories in which Blekko automatically applies its slashtags to improve results. This may set a pattern in which smaller, niche search engines work as a network, which could well turn out, in aggregate, to provide much more robust search content than Google's.
Wolfram Alpha
Wolfram Alpha isn't really a search engine at all: it defines itself as a 'computational knowledge engine'. It answers factual queries directly by computing the answer from structured data, rather than providing a list of documents or web pages that might contain the answer, as a search engine would.
Query answers are accompanied by links to the results of each search, illustrating the variety of answers Wolfram Alpha provides to non-specific queries. Wolfram Alpha can also respond to increasingly complex, natural-language, fact-based questions. It isn't a substitute for other search engines, but it provides much more relevant answers for certain types of content: try, for example, the query 'Google market capitalisation'.
Using several search engines at once on Safari with Glims
In reality, no single search engine will meet everyone's needs, and anyone with more specific requirements may well want to use several, depending on what they are searching for:
- when I'm searching for information about an event that has only just happened, or that is the subject of as-yet-unconfirmed rumours, the only viable source of up-to-date information is currently Twitter;
- when searching for images, Google Image Search will usually get me what I want, although for some types of photographs Flickr and/or specialised sites such as Find Icons or Deviant Art can be worth looking at;
- if I want content from a specific time period, Google is still the most effective tool, since a standard Google search can be narrowed to any custom date range you want to focus on;
- searches in foreign languages are still best conducted through Google, as the small new players aren't yet present at all for content in languages other than English;
- but when I'm after relevant content in areas where I know a standard Google search will be polluted by spam or forum gibberish, I currently tend to use Duck Duck Go or Blekko.
Using all these forms of search can be tedious, which is probably why most people stick to Google, even though looking for content there increasingly takes its toll on their productivity, given the inordinate and growing amount of time it takes to find things. To avoid wasting my own time while still using the full range of search tools I need, I've been using Glims for some time: it's a Safari add-on that adds some much-needed extra features [vi] to Safari's original set of options. Glims is worth having even if you're not interested in all its features, but search is where it really shines: it lets you change the default search engine used by Safari's toolbar search field. It comes with predefined sets of engines and, as you'd expect, you can define your own in its preferences, either as traditional search engines or as search functions within any website of your choice.
I've added Blekko and Duck Duck Go to the predefined Google and Google Images (removing all the others), and added Wolfram Alpha, Twitter and the French version of Wikipedia alongside the English one. You can configure a keyboard shortcut for each of your search options: I use ^G for Google, Î for Google Images, ^B for Blekko, ^D for Duck Duck Go, ^T for Twitter, ^A for Wolfram Alpha, ^W for Wikipedia and ⌥^W for Wikipedia France.
If you combine this with Safari's predefined keyboard shortcut ⌥⌘F, which moves the cursor to the search field, you can search any search engine or website you've defined without leaving the keyboard, which is a huge time saver.
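The idea behind these custom engines, a keyboard shortcut bound to a site's search URL, can be sketched in a few lines of Python. This is not Glims's own configuration format: the shortcut table, the `{q}` placeholder and the `search_url` function are assumptions made for illustration, and the URL templates are each site's public search URL, which may change over time. Blekko and Google Images are left out of the table to keep it short.

```python
from urllib.parse import quote_plus

# Hypothetical mapping from keyboard shortcut to (name, search-URL template).
# "{q}" marks where the URL-encoded query is substituted.
ENGINES: dict[str, tuple[str, str]] = {
    "^G": ("Google", "https://www.google.com/search?q={q}"),
    "^D": ("Duck Duck Go", "https://duckduckgo.com/?q={q}"),
    "^T": ("Twitter", "https://twitter.com/search?q={q}"),
    "^A": ("Wolfram Alpha", "https://www.wolframalpha.com/input/?i={q}"),
    "^W": ("Wikipedia", "https://en.wikipedia.org/w/index.php?search={q}"),
    "⌥^W": ("Wikipedia France", "https://fr.wikipedia.org/w/index.php?search={q}"),
}


def search_url(shortcut: str, query: str) -> str:
    """Return the search URL for `query` on the engine bound to `shortcut`."""
    try:
        _name, template = ENGINES[shortcut]
    except KeyError:
        raise ValueError(f"no engine bound to {shortcut!r}") from None
    # quote_plus encodes spaces as '+' and escapes reserved characters.
    return template.format(q=quote_plus(query))


print(search_url("^A", "Google market capitalisation"))
```

Because the query is URL-encoded with `quote_plus`, a Duck Duck Go bang such as `!youtube` typed after ^D survives the round trip intact.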
Searching on mobile devices
I personally find the Google Mobile App for iOS so spectacularly bad that I've removed it from my devices, but it's available for those who want it. If you need to search Google on the iPhone, I find the simplest way is to do so within Safari, although Apple's autocorrect, which ought to be turned off by default for this type of operation, usually makes the experience extremely tiresome.
_______________
[i] Six different Search Preferences are available for modification: Interface Language, Search Language, SafeSearch Filtering, Number of Results, Results Window and Query Suggestions.
[ii] Up from $23bn at the time of its IPO in 2004.
[iii] As I write this, it's impossible to access Duck Duck Go from Paris, because of an Internet routing issue between the French capital and Verizon, which hosts Duck Duck Go's main website.
[iv] At the time of the launch, Blekko's 8,000 beta editors had already registered 3,000 slashtags corresponding to their most frequent searches.
[v] These currently include: SEO statistics; Linking pages (in and out statistics); IP address lookup; Cached pages; Tagging of pages; Creating and searching /slashtags; Finding duplicate content; Comparing sites; Crawl statistics; Page count; Robots.txt location; Cohosted sites; Page latency; Page length.
[vi] Tabs, Thumbnails, Full Screen, Search Engines, Search Suggestions, Forms autocomplete on, Dated download folders, Type Ahead.