Understanding how Google works and how it is changing is an important part of the SEO industry.
The purpose of this encyclopedia is to keep SEO professionals, and those who work with them, informed about updates to Google’s algorithm and technologies.
Fred Update
Released: March 7, 2017
Consensus: An algorithm that primarily targets thin content sites that are “made for AdSense,” affiliate links, and similar ad-heavy sites.
On the morning of March 8, reports surfaced on “black hat” forums of sites losing rankings, and various Google rank tracking tools reported strong movement in the search results. The update appeared to have occurred on March 7.
After reviewing approximately 100 sites impacted by the update, Barry Schwartz concluded that the vast majority of the sites shared these characteristics:
- They were primarily content sites, typically blogs, but not always.
- They were heavy on advertisements, which were often presented in a manner that made them difficult to distinguish from primary content.
- The subject matter of the sites was often very broad, without focus on a central topic.
- The content appeared to be written for ranking purposes.
- The content did not seem to add value over and above what other industry experts were providing.
- Negatively impacted sites saw drops in the range of 50 to 90 percent of their organic search traffic.
Examples Of Negatively Affected Sites:
Archived Link: March 7, 2017, Wayback Machine
- The phrase “adsbygoogle” appears in the source code 28 times.
- The keywords are formatted in bold.
- The grammar is poor and the content contains run-on sentences.
- The word count is short.
- The amount of information conveyed per word is low.
- The “how-to” instructions are not detailed enough to act on.
Archived Link: November 22, 2016, Wayback Machine
- The phrase “adsbygoogle” appears in the source code for the homepage 24 times.
- The phrase “current affairs” is repeated with high keyword density on the homepage.
- Links to blog posts are repeated within a paragraph of each other.
- The targeted phrases seem to have no semantic relationship with one another: “current affairs,” “bank recruitment,” “SSC recruitment,” and “exam updates.”
- There is no contextual content provided to help clarify the purpose of the site, just various links to quizzes.
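Several of the patterns flagged above, such as repeated ad code and high keyword density, are easy to approximate with a short script. The sketch below is purely illustrative: the sample text and phrases are invented, and this is not a tool used in the analysis described here.

```python
import re

def ad_code_count(html: str, marker: str = "adsbygoogle") -> int:
    """Count occurrences of an ad-code marker in page source."""
    return html.count(marker)

def keyword_density(text: str, phrase: str) -> float:
    """Fraction of the page's words taken up by occurrences of `phrase`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrase_words = phrase.lower().split()
    n = len(phrase_words)
    if not words or n == 0:
        return 0.0
    hits = sum(
        1 for i in range(len(words) - n + 1)
        if words[i:i + n] == phrase_words
    )
    return hits * n / len(words)

sample = "Current affairs quiz. Daily current affairs and current affairs updates."
print(ad_code_count("<script>adsbygoogle</script> adsbygoogle"))  # 2
print(keyword_density(sample, "current affairs"))  # 0.6
```

In the sample sentence, six of the ten words belong to the phrase “current affairs,” which is the kind of density figure that drew attention in the examples above.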
Possum Update
Released: September 1, 2016
Consensus: An update to local search results that seems to have only affected “3-pack” listings, designed to increase the diversity of the local 3-pack and to eliminate spam from it.
Several primary changes were identified in the weeks following the Possum update:
- Businesses located outside of city limits saw a dramatic boost in local search rankings. In the past, local search results excluded businesses located just outside city limits, sometimes not including them in any city at all. This gave businesses that previously couldn’t hope to rank in local search results the ability to compete fairly.
- An increase in the filtering of duplicate business listings: if two businesses share the same address and belong to the same business category, only one of them will typically be listed.
- Zooming in on the map reveals the filtered businesses.
- The algorithm was sophisticated enough to identify two different addresses as being at the same location. For example, if a building is large enough to be reached from two different streets, and as a result has two different addresses, both are counted as the same location, and if two business listings are associated with that location in the same business category, one of them is filtered out.
- Similarly, different suite numbers are not identified as different locations, for the purposes of filtering.
- In an even more sophisticated twist, two businesses across the street from one another, with two different addresses, different names, and different staff, may still be counted as the same location if they have the same owner.
- Searcher location became more important, making it more difficult to track rankings if located elsewhere.
- Results were far more personalized to the keywords being searched for, with more niche results being displayed.
- Organic search became decoupled from local filtering. In the past, if there was an association between an organically filtered site and a local site, the local site would be filtered as well. This association was loosened.
- Around the same time, a secondary update may have been released that affected organic rankings, separate from Possum. Details about this update are not clear. It is also possible that these changes were related to the decoupling in filtering between local and organic search results.
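The duplicate-filtering behavior described above amounts to grouping listings by normalized location, category, and ownership, and keeping one listing per group. The toy sketch below illustrates that logic; the field names, sample businesses, and the exact keying rules are assumptions drawn from the observations above, not Google’s actual implementation.

```python
# Hypothetical listing records: (name, location_id, category, owner).
# location_id stands in for a normalized location, so two street
# addresses for the same building, or different suite numbers,
# map to the same id.
listings = [
    ("Smith Dental",       "bldg-12", "dentist", "smith"),
    ("Smith Orthodontics", "bldg-12", "dentist", "smith"),  # same building + category
    ("Main St Dental",     "bldg-14", "dentist", "jones"),
    ("Smith Dental II",    "bldg-15", "dentist", "smith"),  # across the street, same owner
]

def possum_filter(listings):
    """Keep one listing per (category, location) cluster and per
    (category, owner) cluster, mimicking the observed filtering."""
    seen = set()
    kept = []
    for name, loc, cat, owner in listings:
        keys = {(cat, loc), (cat, owner)}
        if keys & seen:
            continue  # filtered: duplicates an already-kept listing
        seen |= keys
        kept.append(name)
    return kept

print(possum_filter(listings))  # ['Smith Dental', 'Main St Dental']
```

Note how the owner-based key filters “Smith Dental II” even though it sits at a different address, matching the same-owner observation above.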
The Quality Update (“Phantom 2”)
Released: May 3, 2015
Consensus: An update to Google’s algorithm aimed at reassessing how it evaluates the quality of search results. The impact of the change was significant, but it did not appear to target any specific camp of sites, and it is still unclear what factors were adjusted or added.
Several webmasters noticed changes in their search traffic and positions, and rank tracking tools like MozCast and SERPs.com observed big shakeups in the search landscape throughout the weekend of May 3.
Many in the industry first suspected that the update was Panda related, but Google denied that it was a Panda update. In fact, Google denied that there had been a weekend update at all.
On May 19, Google officially confirmed that an update had in fact occurred earlier that month, and that the update was related to how Google evaluates the quality of content. This suggests that it affects sites in similar ways to Panda, but it is a different update, and in fact seems to have been an update to the core algorithm. It is possible that this update was related to RankBrain, since the precise date of RankBrain’s introduction is unknown, but there is no way to know for sure.
Before Google confirmed the update, some in the industry began referring to it as “Phantom 2,” since a prior unconfirmed Google update had been referred to as “Phantom.” Glenn Gabe was responsible for the name and much of the SEO industry research surrounding it.
Gabe’s dealings with clients suggest a few important insights:
- The Quality Update was industry agnostic
- The update seemed to affect full domains, not individual pages
Negatively affected sites commonly exhibited problems such as:
- Clumsy and frustrating user experience
- Aggressive advertising
- Content rendering issues that made the site look worse in Google’s cache than to users
- Deceptive use of ads and affiliate links
- Excessive pagination
- Lackluster (Panda-like) content
Mobile-Friendly Update (“Mobilegeddon”)
- 1.0: April 21, 2015
- 2.0: May 12, 2016
Consensus: An update that favors mobile-friendly search results when the searcher is on a mobile device.
Google pre-announced the launch of a new mobile-friendly update to its search algorithm on February 26, 2015.
Google announced that mobile searches would return more mobile-friendly results than desktop searches. Google also recommended following their guidelines on creating mobile-friendly sites, and provided webmasters with a mobile-friendly test to determine if pages on their sites would be considered mobile friendly by the updated algorithm.
Google also announced that information from indexed apps would be used as a ranking factor for users who had the associated app installed.
Moz noted that the impact was fairly mild and that recovery was virtually instantaneous for sites that updated their content to be mobile friendly; Moz did so for its own blog and witnessed a recovery within 24 hours.
On March 16, 2016, Google again pre-announced another mobile update, indicating that the mobile-friendliness factor would be strengthened for mobile searchers. The update occurred on May 12, 2016.
RankBrain
Release date: The precise release date is unknown; the rollout occurred gradually in early 2015. The existence of RankBrain was confirmed by Google on October 26, 2015.
Consensus: As reported by Bloomberg and confirmed by Google, RankBrain is a machine-learning artificial intelligence system that helps Google sort through its search results. It helps Google interpret search queries.
RankBrain is a machine-learning algorithm developed to help Google sort through its search results. A machine-learning algorithm is one in which goals are specified and software outcomes are ranked by human raters, and the algorithm adjusts itself to receive higher scores from human raters.
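The rater-driven loop described above can be illustrated with a toy example. This is emphatically not RankBrain — the feature, the ratings, and the threshold model are all invented — but it shows the basic idea of software adjusting its own parameters to agree with human quality raters more often.

```python
# Hypothetical training data: (feature value, human rating), where the
# feature might be something like topical relevance, and the rating is
# 1 ("good result") or 0 ("poor result") from a human quality rater.
rated = [(0.9, 1), (0.8, 1), (0.7, 1), (0.4, 0), (0.3, 0), (0.1, 0)]

def accuracy(threshold):
    """How often 'predict good if feature > threshold' agrees with raters."""
    correct = sum((feat > threshold) == bool(label) for feat, label in rated)
    return correct / len(rated)

# "Learning": sweep candidate thresholds and keep whichever one the
# human raters agree with most often.
best = max((t / 100 for t in range(100)), key=accuracy)
print(round(accuracy(best), 2))  # 1.0 for this toy data
```

A real system learns millions of parameters from millions of judgments, but the feedback loop — score outcomes against human ratings, then adjust to score higher — is the same shape.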
As of the time that the existence of RankBrain was confirmed, it was considered the third most important factor in Google’s ranking algorithm. The two most important factors were content and links. No order was specified for which of those two was most important.
The purpose of RankBrain is to assist Google in interpreting the meaning of the searcher’s query. By better understanding searcher intent, the algorithm was intended to return results that were more relevant to the searcher.
Approximately 15 percent of the 3 billion searches conducted on Google every day are entirely new searches that no searcher has ever entered before. RankBrain was developed primarily to deal with these types of queries. Google told Bloomberg that RankBrain impacted “a very large fraction” of these entirely novel search phrases.
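To put those figures in perspective, a quick back-of-the-envelope calculation using the numbers above gives the daily volume of never-before-seen queries:

```python
# Scale of the novel queries described above, using the figures from the text.
daily_searches = 3_000_000_000  # searches per day
novel_share = 15                # percent never seen before

novel_per_day = daily_searches * novel_share // 100  # integer math
print(novel_per_day)  # 450000000 -> roughly 450 million new queries a day
```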
RankBrain was not designed to punish or reward sites, but rather to better interpret search queries. For that reason, most changes introduced by the algorithm affected long tail, more conversational search queries, while short, popular search phrases were largely unaffected.
Smaller sites with less keyword-targeted content likely saw a small boost in organic search traffic as a result of Google returning more relevant search results.
In June of 2016, Google confirmed that RankBrain was no longer applied only to the 15 percent of queries that Google had never seen before. RankBrain was, at this time, being used in the evaluation of all search queries.
Since RankBrain is a machine-learning algorithm, even Google engineers do not fully understand precisely how it does what it does; they understand only the goal of the algorithm, which is to better interpret search queries.
Pigeon Update
Release date: July 24, 2014
Consensus: An update to local search that ties traditional web signals to local search results and improves distance and location rankings.
This update introduced ranking factors traditionally reserved for web search into local search results. Factors primarily used to rank websites, such as links, content, and hundreds of others, were applied to local search results by this update.
One of the more apparent changes to the search results was the inclusion of more local directory listings in local search results, such as Yelp reviews and similar local business resources. Yelp had previously accused Google of manipulating search results and suppressing Yelp’s visibility. The changes were apparent for other local business listings as well.
More traditional web content was also more readily included in local search results. For example, restaurant guides for cities published by newspapers, magazines, and blogs were given more prominent placement.
Since the update incorporated more traditional SEO ranking factors, large sites like directories and review sites, with more inbound links and authority, tended to rank better than they had before. Meanwhile, small sites for local businesses were, at the time of Pigeon’s release, often shifted to the second page of the search results.
For these reasons, the update made the local pack results particularly important for local businesses, since this was where they were most likely to be seen on the front page. Likewise, it made it more important for local businesses to earn web authority through more conventional SEO factors like links and content, rather than just location and citations.
A prominent bug shortly after Pigeon’s release led to some websites being misclassified as local businesses. An infamous example was Expedia being listed as a “hotel” in Google’s local carousel. The error has since been resolved.
Pigeon was criticized by some in the SEO community for favoring big players, creating a less desirable experience for searchers, and rewarding Yelp’s “whining.” Shorter two- and three-pack local listings became much more common than before the Pigeon update, and multiple listings of Yelp-like directories increased in frequency as well.
On the other hand, some observed increases in traffic, and local advertisements with distracting star ratings became more common, drawing more attention to the organic search results.
Other important changes noted included an increase in the granularity of geo-targeting and an increase in the frequency of local search results, offsetting the reduction in local pack size. Sites that had previously shown up in both local packs and organic results typically showed in one or the other after the update.
“Payday Loan” Update
- 1.0: June 11, 2013
- 2.0: May 16, 2014
- 3.0: June 12, 2014
Consensus: An algorithm update designed specifically to deal with spam in particularly notorious niches, with payday loans and pornographic searches being mentioned as specific examples.
Matt Cutts pre-announced an update that would target results for keywords that tended to attract competition notorious for using spam techniques to rank in the search results.
The launch became official on June 11, although the rollout was slow, taking a month or two to complete.
At SMX Advanced, Cutts explained that the algorithm would target link schemes primarily unique to these highly spammy niches, some of which were downright illegal.
The update hit only a small portion of queries, approximately 0.3 percent, but it hit those particular queries very hard.
Google confirmed an update to the algorithm with Search Engine Journal on May 20, 2014. This second update affected about 0.2 percent of queries.
Cutts announced that version 3.0 of the algorithm went out on June 12, 2014.
Hummingbird
Release dates: Announced on September 26, 2013. Ranking tools suggest the update occurred on August 20.
Consensus: A complete revamping of the Google search algorithm, designed primarily to deal with “conversational search.” The revamp focused on interpreting user queries more than on ranking sites.
Google announced on September 26th that its core algorithm had been completely revamped, and that the update was affecting about 90 percent of searches. This dramatic number would have sounded terrifying to those familiar with prior updates like Panda, which affected closer to 12 percent of queries, but the update had occurred a month earlier and had not been noticed by the SEO industry.
The reason for this is that the Hummingbird revamp focused primarily on changing the way Google evaluates the meaning of a searcher’s query. This was driven in large part by mobile search, where users were more likely to speak the question to Google and, as a result, ask more conversational questions.
Google provided examples of differences in search results. For example, a search for “acid reflux prescription” used to give only drug listings, but returned more broad treatment information after the update. A search for “pay your bills through citizens bank and trust bank” used to take searchers to the homepage of Citizens Bank, but took them to the bill paying page after the update.
These changes impact long tail searches far more than “head” keywords, and are more likely to send searchers to the same page for a wider variety of queries.
Exact-Match Domain Update
Release dates: September 27, 2012
Consensus: An update that diminished the visibility of sites with domain names that matched the searcher’s query.
Matt Cutts announced on September 28 that Google was rolling out an algorithm change that would reduce the prevalence of low quality exact-match domains in the search results.
It is unknown to what extent domain names ever played a direct role in Google search engine rankings, but brands with names matching a searcher’s query certainly attracted more anchor text for those queries, and possibly other benefits.
According to Cutts, the update would affect 0.6 percent of queries. The algorithm does not punish sites for having a domain that matches a search term; rather, it corrects an overvaluation of these domains that had been benefiting low quality sites, leveling the playing field.
Sites that chose a keyword focused domain over their own brand name seemed to be one of the primary casualties, including a few legitimate cases like The Michigan Association Of Public School Academies, but other issues were common, such as excessive keyword use, spammy links, and limited inbound links.
Pirate Update
Release date: August 2012
Consensus: An algorithm designed to penalize sites that are repeatedly accused of copyright infringement.
Google occasionally lists infringing material in its search results. If a site lists copyrighted material without the copyright holder’s permission, the copyright holder can submit a Digital Millennium Copyright Act (DMCA) takedown request, and Google will then remove the page from the search results.
Before the Pirate update, DMCA takedown requests only resulted in individual pages being removed from Google’s index. After the Pirate update, if too many DMCA takedown requests were filed against the same site, the entire site would then receive a lower ranking in the search results.
It should be noted that the Pirate update does not remove a site from the index entirely, the way that a DMCA takedown request results in an individual page’s complete removal. Rather, the Pirate update limits a site’s visibility based on the number of DMCA requests filed against it.
A DMCA takedown request is not proof of copyright infringement, since anyone can file one, even someone who is not the copyright holder. Even so, filing DMCA takedown requests is time consuming, and some prominent figures in Hollywood, Ari Emanuel in particular, accused Google of not doing enough to combat piracy.
Google allows webmasters to respond to bogus DMCA takedowns by filing a counter notification. Note that filing a counter notification reveals your personal information to the entity that first filed the DMCA takedown notice, and that filing a counter notification can result in a legal battle.
In February of 2017, Google and Bing reached an agreement with the UK to keep pirated content out of the search results. Google explained that the agreement was unlikely to result in changes to the algorithm, evidently believing that the existing Pirate algorithm was sufficient to meet its terms.
The Pirate update is one of the more contentious updates in Google’s history, due to the complicated political nature of copyright law and piracy. A study by Columbia University alleges that 28 percent of the DMCA takedown notices submitted to Google were invalid. Automated DMCA notices that do not include the infringing material or that name websites which have already been shut down were the primary source of error.
Penguin Update
- 1.0: April 24, 2012
- 1.1: May 25, 2012
- 1.2: October 5, 2012
- 2.0: May 22, 2013
- 2.1: October 4, 2013
- 3.0: October 17, 2014
- Everflux: December 10, 2014
Consensus: An algorithm designed to demote sites that violate Google’s guidelines on link manipulation, with an emphasis on outright spam.
Penguin is perhaps the most infamous Google update in the SEO community, although the vast majority of those negatively affected by the update were in fact guilty of abusive link building techniques.
The precise mechanisms of Penguin’s algorithm are unknown, but negatively affected sites generally had inbound or outbound links that appeared to be placed as a result of link buying and selling, hacking, private network building, comment spamming, excessive anchor text, and other spam-associated link building techniques.
Penguin was not originally incorporated into Google’s core search algorithm, instead updating the index as a secondary algorithm. For this reason, sites affected by Penguin were not originally able to fully recover in between Penguin data refreshes. Gradual recoveries were possible as a result of positive SEO signals building over time, but overnight recoveries as a result of addressing spam could occur only on dates that a Penguin refresh occurred.
Most have theorized that negative links were incorporated into the Penguin algorithm, allowing for the possibility of negative SEO. Others have suggested that only sites with outbound spam links were directly penalized, with sites receiving inbound spam links merely losing authority from links that were no longer counted.
Penguin went through three major algorithmic versions: 1.0, 2.0, and 3.0. Data refreshes also occurred, in which the Google index was updated by Penguin, but the Penguin algorithm was not updated. These included 1.1, 1.2, and 2.1.
Penguin Everflux was introduced in December of 2014. Beyond this point, Penguin was incorporated into the main Google search algorithm and the index was continuously updated. As a result, sites currently affected by Penguin cannot trace the impact to the date of a “Penguin update,” since updates are now continuous.
Sites that recovered from Penguin on the date of a Penguin update seemed to do so primarily by removing spam links from their sites or spam links pointing toward their sites. Sites that recovered in between Penguin updates were able to do so only through continuous branding and authority-building efforts.
Panda Update
- 1.0: February 23, 2011
- 2.0: April 11, 2011
- 2.1: May 9, 2011
- 2.2: June 21, 2011
- 2.3: July 23, 2011
- 2.4: August 12, 2011
- 2.5: September 28, 2011
- 2.5 “Flux”: October 5, 2011
- 3.1: November 18, 2011
- 3.2: January 18, 2012
- 3.3: February 27, 2012
- 3.4: March 23, 2012
- 3.5: April 19, 2012
- 3.6: April 27, 2012
- 3.7: June 8, 2012
- 3.8: June 25, 2012
- 3.9: July 24, 2012
- 3.9.1: August 20, 2012
- 3.9.2: September 18, 2012
- 20: September 27, 2012
- 21: November 5, 2012
- 22: November 21, 2012
- 23: December 21, 2012
- 24: January 22, 2013
- 25: March 14, 2013
- “Dance”: June 11, 2013
- “Recovery”: July 18, 2013
- 4.0: May 19, 2014
- 4.1: September 23, 2014
- 4.2: July 17, 2015
Consensus: A machine learning algorithm developed to remove thin content from the front page of the search results.
Panda has the most extensive documented history of any algorithm introduced by Google.
In response to complaints that the Google search results were becoming cluttered with “content farms,” Google developed and released the Panda update in February of 2011.
The SEO industry originally referred to the update as “Farmer,” due to its association with so-called “content farms.” Content farms are sites that focus on producing a large number of short content pieces in order to capture a wide variety of search queries. Many of these content farms were also link building havens with low editorial standards.
Well-known examples of content farms were ezinearticles.com and wikiHow. Sites like eBay with low quality user generated content also suffered under the update.
Panda was developed using machine-learning algorithms based on human quality ratings. It analyzes aspects of the content itself to determine whether it is the type of content that human quality raters would rate as poor or high in quality.
Panda has an extensive history of updates with a confusing nomenclature. After the initial release in February 2011, a second version of the algorithm (2.0) was released in April of 2011. Five data refreshes followed, and a third version of the algorithm (3.1) was released on November 18 of 2011. Several data refreshes followed, which became so numerous that the SEO industry temporarily dropped the 3.x.x numbering format and began referring to them simply by the number of updates (starting at 20).
This trend ended when Panda 4.0, a new version of the algorithm, was released on May 19, 2014. Subsequent data refreshes rolled out slowly, ending the cycle of abrupt ranking shifts that had previously come with each new Panda refresh.
Sites that have recovered from Panda have done so by removing thin content, removing duplicate content, removing pages that never performed well, revamping content with a focus on usefulness for users, increasing word count (provided this wasn’t accomplished with filler), and increasing the information density of their content.
“Mayday” Update
Release dates: April 28 through May 3, 2010
Consensus: An algorithmic filter that removed thin content from search results for long tail keywords.
Several webmasters noticed changes in their rankings during May, and Matt Cutts later confirmed that an update had been implemented in late April and early May.
Cutts clarified that the change affected long tail keyword searches more than “head” keyword searches. He advised webmasters who were dinged by the update to ask themselves if they were the most high quality site for the query, if they would be perceived as an authority, and if they were the most relevant search result for the query. Usefulness and being on-topic were also mentioned as important factors.
Vanessa Fox contacted Google for comment and they elaborated that the update specifically affected how sites were ranked, as opposed to changing how pages were indexed or crawled. Pages hurt by the algorithm, in other words, were not removed from Google’s index.
Most of the sites that were negatively impacted were very large, with large numbers of small pages targeting specific search queries, a limited amount of content per page, and a limited number of links pointing to those pages, which were usually several clicks away from the homepage.
The overall impression given off by the update was that relevance signals were tweaked to contribute less to rankings, while quality signals were tweaked to contribute more to rankings. Relevance, in this context, refers mostly to the keywords used on the page and in the title, with less targeted but higher quality pages receiving a boost.
Andrew Shotland found that the hardest hit sites he was aware of typically had a very large number of URLs (millions) with very generic or thin content, and had most of their inbound link authority pointing to a much smaller set of pages. He pointed out that “yellow pages” sites were some of the hardest hit.
Larry Kim’s data suggested that the algorithm worked on the site level, rather than specifically acting on individual pages. According to Kim, the update favored sites which focused more on being an authority on a specific subject matter, as opposed to sites that aimed to be a jack of all trades. Kim also observed that while the update affected long tail queries more than head queries, it certainly wasn’t limited to them.
Caffeine Update
Release dates: August 2009 through June 2010
Consensus: An infrastructure update designed to dramatically increase the speed with which pages were crawled, indexed, and ranked, allowing new pages to be added in what was essentially real time, at least when compared to the previous speed of crawling and indexing.
On August 10, 2009, Google published an announcement on their blog that they were testing next generation infrastructure.
The announcement called for beta testers to provide feedback on the differences between the “old” Google and the Caffeine Google, which at the time was still very much under construction.
Shortly afterward, Mashable reviewed Caffeine and noted some important changes:
- Caffeine returned results for searches twice as quickly as the previous algorithm.
- In their opinion, the search results post-Caffeine were more accurate, though this is obviously a subjective judgment.
- The search results in Caffeine seemed to care more about keywords, especially the keywords in the title. There is a good chance, in retrospect, that this was simply because the larger index included more sites with titles that matched the keywords in question.
- As far as they could tell, Caffeine didn’t seem to return better results for current-event queries than the old Google. In retrospect, Caffeine laid the groundwork for the later “freshness” update, which would indeed return more current information.
- The most obvious difference was the number of results returned. Their test returned 359 million results for “dog” with Caffeine, compared to 51.9 million with the old Google.
On November 27, Matt Cutts confirmed that Caffeine was live 50 percent of the time at one of Google’s data centers. This was in response to a great deal of noise in forums and discussion groups about dramatic changes in the search results, which were in fact only occurring at some locations.
On June 8, Google announced that the infrastructure update had been completed. In contrast with Mashable’s results (which in fairness were conducted early on in the process), Google claimed that results under Caffeine would be 50 percent more fresh than under the old Google.
The primary change in infrastructure, according to Google, was that Caffeine was more integrated. The old system consisted of several different layers, all of which would be updated on different schedules at different rates. The main layer would only update every few weeks. Refreshing it would require analyzing the entire index, which put a significant amount of time between the moment a page was first crawled and the time it was actually searchable.
Caffeine allowed small pieces of the web to be analyzed and updated at essentially the same time they were crawled. The index would be updated continuously, instead of refreshing every few weeks. Caffeine would add hundreds of thousands of pages each second.
“If this were a pile of paper it would grow three miles taller every second,” the blog post stated.
Matt Cutts clarified that pages would be indexed faster regardless of whether the content was deemed “real time,” and that the content would now be searchable within seconds of it getting crawled.
The Caffeine overhaul was designed to impact how quickly the index could be updated. It did not reflect any change in the way that pages were ranked in the search results, except for the obvious fact that a larger index would result in different listings.
Vince Update
Release date: February 2009
Consensus: An update that seemed to place more emphasis on branding and trust, and that some considered to unfairly favor big brands.
In February of 2009, murmurs began to flood SEO communities and forums about a major change that had disrupted many webmasters’ rankings. Small sites dinged by the update frequently reported that the change in the algorithm seemed to favor big brands over less well known sites, perhaps unfairly.
SEO Book wrote an extended post about Google’s growing interest in brands, quoting Eric Schmidt who had recently said “The internet is fast becoming a ‘cesspool’ where false information thrives…Brands are how you sort out the cesspool.”
The post pointed out how several query results had changed, with searches for “airline tickets,” “auto insurance,” “boots,” “diets,” “online degree,” and “watches” returning results dominated by big brands, where before the update those brands had been largely absent.
On February 26, 2009, Matt Cutts answered questions in a video response about the update. The video followed a call for questions from webmasters, which received 114 questions in roughly three hours.
Cutts clarified that the update hadn’t necessarily been developed with the concept of “brands” in mind. He argued that the more relevant words were “trust,” “authority,” “reputation,” “PageRank,” and “high quality.”
Matt Cutts argued that this was a “simple change” affecting a small number of queries, and that he wouldn’t necessarily refer to it as an “update.” He also clarified that a webmaster could still be successful by becoming an authority within a small niche, and that the change wasn’t intended to favor big, general brands.
Many in the SEO industry look back at Vince as one of the first pushes the search engine made away from smaller sites toward bigger brands. In retrospect, it can be seen as the beginning of the end for the “made for AdSense” and “affiliate marketing” SEO business model that was popular at the time. While it is still very much possible for sites to make money using AdSense and affiliates, the push toward bigger brands changed the way that these businesses could survive in search results forever.