ARDURA Lab

Information gain: how Google ranks content that goes beyond the SERP

Tags: SEO, GEO, content, information gain

What is information gain?

Information gain is a Google ranking signal described in patent US9317592B1 from 2014, titled "Content-based ranking signals." It measures how much new, unique information your content brings relative to pages already ranking for the same query.

In practice: if 10 pages in the SERP say the same thing, an 11th page with identical content has little reason to rank. A page that adds a new angle, data, experiments, or a fresh perspective earns an information gain signal and stands a chance of outranking competitors.

Why does it matter?

  • The Helpful Content algorithm (2022) and Spam Updates (2024) strongly use this signal to filter duplicated content
  • AI Overviews and GEO — language models cite sources that bring unique data, not those that paraphrase other pages
  • Long-tail visibility — content with gain dominates less competitive (long-tail) queries
  • Defensive SEO — without gain, even a well-optimized page loses positions to fresher competition

How does Google measure information gain?

The patent describes a 3-stage process:

  1. Top-N corpus — Google fetches top 10-20 pages for the query
  2. Entity & claim extraction — extracts entities (people, places, concepts) and claims from each page
  3. Comparative scoring — compares your page with the aggregated corpus; more new entities/claims = higher gain score
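
The three stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's scoring and not the MarketingOS implementation: the `Page` type, the `gain_score` function, and the idea of expressing gain as the share of new items are all hypothetical, and entity/claim extraction is assumed to have happened already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    """Hypothetical container for one page after entity and claim extraction."""
    url: str
    entities: frozenset
    claims: frozenset


def gain_score(candidate: Page, corpus: list) -> float:
    """Share of the candidate's entities and claims absent from every corpus page."""
    seen = set()
    for page in corpus:
        seen |= page.entities | page.claims
    items = candidate.entities | candidate.claims
    if not items:
        return 0.0
    return len(items - seen) / len(items)


# Illustrative data only: two ranking pages and one candidate page.
corpus = [
    Page("https://example.com/a",
         frozenset({"information gain", "serp"}),
         frozenset({"claim shared by everyone"})),
    Page("https://example.com/b", frozenset({"serp"}), frozenset()),
]
candidate = Page(
    "https://example.com/new",
    frozenset({"information gain", "google search console"}),
    frozenset({"claim shared by everyone", "claim only we make"}),
)
score = gain_score(candidate, corpus)  # 2 of the 4 items are new
```

A ratio keeps the score comparable between short and long pages. A raw count of new items would be closer to the "more new entities/claims = higher gain score" wording, so treat the normalization as a design choice of this sketch.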

In MarketingOS we implement this signal through the infogain audit module, which builds a SERP coverage matrix and points out gain points, that is, the specific fragments to add.

How to increase information gain?

1. Audit SERP before writing

  • Check top 10 pages for your query
  • Extract their H2/H3 (topic structure) and dominant entities
  • Identify gaps — topics covered by 1-2 pages but not the rest
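
A minimal sketch of this audit, assuming you have already downloaded the HTML of the top pages yourself. It uses only the Python standard library; `HeadingParser`, `extract_headings`, and `find_gaps` are hypothetical names, and headings are compared by exact lowercase text, which is a simplification (real pages phrase the same topic in different ways).

```python
from html.parser import HTMLParser


class HeadingParser(HTMLParser):
    """Collects the text of every <h2> and <h3> element."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._buffer = None  # list of text chunks while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in ("h2", "h3") and self._buffer is not None:
            # Normalize whitespace and case so identical headings match.
            text = " ".join("".join(self._buffer).split()).lower()
            if text:
                self.headings.append(text)
            self._buffer = None


def extract_headings(html: str) -> set:
    parser = HeadingParser()
    parser.feed(html)
    return set(parser.headings)


def find_gaps(pages: dict, max_pages: int = 2) -> dict:
    """Map each heading covered by at most `max_pages` pages to the URLs covering it."""
    coverage = {}
    for url, html in pages.items():
        for heading in extract_headings(html):
            coverage.setdefault(heading, []).append(url)
    return {h: sorted(urls) for h, urls in coverage.items() if len(urls) <= max_pages}


# Illustrative input: URL -> raw HTML of a ranking page.
pages = {
    "a": "<h2>What is it</h2><h2>Pricing</h2>",
    "b": "<h2>What is it</h2>",
    "c": "<h2> What  is it </h2><h3><a href='#'>Case study</a></h3>",
}
gaps = find_gaps(pages)
```

Here `gaps` contains "pricing" and "case study", each covered by a single page, while "what is it" is dropped because all three pages cover it.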

2. Bring unique data

  • Internal metrics — own case studies, A/B tests, Google Search Console data
  • Fresh benchmarks — comparison of multiple tools/strategies with your setup
  • Experiments — description of test and conclusions nobody published yet
  • Insider knowledge — market context, product decisions, expert perspectives

3. Add entities missing from corpus

  • Enrich text with named entities (people, companies, tools, standards, dates)
  • Each entity is a potential topical signal — link each one to its entry in a knowledge graph
  • Target: 15+ unique entities per page (Google Cortez 2026 threshold)
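
One rough way to check a draft against this target is to match it against an entity list you maintain yourself. The sketch below assumes such a list (the hypothetical `gazetteer`), uses plain whole-word matching instead of real entity recognition, and interprets "unique entities" as distinct entities on the page; the function names are illustrative.

```python
import re


def find_entities(text: str, gazetteer: set) -> set:
    """Return gazetteer entries that occur in `text` as whole words (case-insensitive)."""
    found = set()
    for name in gazetteer:
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(name)
    return found


def entity_report(page_text: str, corpus_texts: list, gazetteer: set, target: int = 15) -> dict:
    """Compare the entities on a draft page with those found in the ranking pages."""
    page = find_entities(page_text, gazetteer)
    corpus = set()
    for text in corpus_texts:
        corpus |= find_entities(text, gazetteer)
    return {
        "distinct": sorted(page),               # every entity found on the page
        "new_vs_corpus": sorted(page - corpus), # entities no ranking page mentions
        "meets_target": len(page) >= target,    # assumption: target counts distinct entities
    }


# Illustrative data; a real gazetteer would be far larger.
gazetteer = {"Google Search Console", "Perplexity", "Gemini", "ChatGPT"}
report = entity_report(
    "We pulled Google Search Console data and compared ChatGPT with Perplexity.",
    ["ChatGPT and Gemini answer differently."],
    gazetteer,
    target=3,
)
```

The `new_vs_corpus` list is the part that relates to information gain: those are the entities that would be new to the corpus if the draft were published.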

4. Atomic answers

  • Look at content as a set of atomic answers (entity + image + content atoms)
  • Each fragment must carry one unique piece of information extractable by AI
  • Helpful formats: FAQ, definitions, statistics, step-by-step lists

Common mistakes

  • Paraphrasing top-rankers — AI detectors and Google recognize this (zero gain)
  • Theory only, no practice — corpus already contains theory; add "how we did it"
  • No numbers — raw data (percentages, KPIs, times) is a strong gain signal
  • Copy-pasted FAQ from competitors — Google groups duplicates and picks the original
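
Two of these mistakes can be caught with a quick self-check before publishing. The sketch below uses Jaccard similarity over word shingles, a common near-duplicate heuristic; it is not how Google or any AI detector works, and it catches copy-paste rather than careful paraphrase. The function names and the shingle size of 3 are assumptions.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Set of overlapping word n-grams ("shingles") in lowercase."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Overlap between two texts: 0.0 means no shared shingles, 1.0 means identical sets."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def has_numbers(text: str) -> bool:
    """True if the text contains at least one digit (a crude 'no numbers' check)."""
    return re.search(r"\d", text) is not None


# Illustrative comparison of one of your FAQ answers with a competitor's.
ours = "information gain measures how much new information a page adds"
theirs = "information gain measures how much new information a page brings"
similarity = jaccard(ours, theirs)
```

A high `similarity` against any ranking page suggests the fragment adds little, and `has_numbers` returning False for a whole section is a hint to look for data you could add.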

Information gain and GEO

In the context of GEO (Generative Engine Optimization), information gain matters even more: LLMs (ChatGPT, Perplexity, Gemini) pick unique sources to cite, because duplicated content gets suppressed in the response deduplication phase.

Pages with high gain earn:

  • AI citations in generative responses
  • Brand mentions in AI narratives
  • Zero-click visibility (exposure even when the user never clicks through)
