Information gain: how Google ranks content that goes beyond the SERP
What is information gain?
Information gain is a scoring concept described in Google patent US11354342B2, "Contextual estimation of link information gain" (filed 2018, granted 2022). In SEO practice it is read as a measure of how much new, unique information your content brings relative to pages already ranking for the same query.
In practice: if 10 pages in the SERP say the same thing, an 11th page with identical content has no reason to rank. A page that adds a new angle, data, experiments, or a fresh perspective earns an information gain signal and stands a chance of outranking competitors.
Why does it matter?
- The Helpful Content algorithm (2022) and Spam Updates (2024) strongly use this signal to filter duplicated content
- AI Overviews and GEO — language models cite sources bringing unique data, not those paraphrasing
- Long-tail visibility — content with gain dominates less competitive (long-tail) queries
- Defensive SEO — without gain, even a well-optimized page loses positions to fresher competition
How does Google measure information gain?
The patent describes a 3-stage process:
- Top-N corpus — Google fetches top 10-20 pages for the query
- Entity & claim extraction — extracts entities (people, places, concepts) and claims from each page
- Comparative scoring — compares your page with the aggregated corpus; more new entities/claims = higher gain score
In MarketingOS we implement this signal in the infogain audit module, which builds a SERP coverage matrix and flags gain points: specific fragments worth adding to the page.
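The comparative scoring stage can be illustrated with a minimal sketch. It assumes entities and claims have already been extracted into plain strings, and it uses a simple ratio (new items divided by all items on the page) as the score. The function name gain_score and this formula are illustrative assumptions, not the patent's or the infogain module's actual method.

```python
def normalize(items: set[str]) -> set[str]:
    """Lowercase and trim items so trivial spelling variants compare equal."""
    return {item.strip().casefold() for item in items}


def gain_score(page_items: set[str], corpus_pages: list[set[str]]) -> float:
    """Share of the page's entities/claims that no corpus page mentions.

    Hypothetical scoring rule: 0.0 means nothing new, 1.0 means everything is new.
    """
    page = normalize(page_items)
    if not page:
        return 0.0
    # Aggregate everything the top-N corpus already covers.
    seen: set[str] = set()
    for other in corpus_pages:
        seen |= normalize(other)
    return len(page - seen) / len(page)


# Toy data: two ranking pages and one candidate page.
corpus = [{"Google", "SERP", "helpful content"}, {"Google", "E-E-A-T"}]
page = {"Google", "SERP", "A/B test", "Search Console"}
print(gain_score(page, corpus))  # 0.5 (two of four items are new)
```

A ratio keeps the score comparable across pages of different length; a raw count of new items would favor long pages instead.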
How to increase information gain?
1. Audit SERP before writing
- Check top 10 pages for your query
- Extract their H2/H3 (topic structure) and dominant entities
- Identify gaps — topics covered by 1-2 pages but not the rest
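As a rough sketch of the gap check, the snippet below counts how many of the audited pages cover each heading and keeps the topics that appear on at most two of them. It assumes the H2/H3 headings were already scraped into a dictionary keyed by URL, and it matches headings by exact lowercase text. Real headings vary in wording, so a production audit would need fuzzier topic matching. The name find_gaps and the example URLs are hypothetical.

```python
from collections import Counter


def find_gaps(serp_headings: dict[str, list[str]], max_pages: int = 2) -> dict[str, int]:
    """Return topics covered by at most `max_pages` of the audited pages."""
    counts: Counter[str] = Counter()
    for headings in serp_headings.values():
        # Use a set so a heading repeated on one page is counted once.
        counts.update({h.strip().lower() for h in headings})
    gaps = {topic: n for topic, n in counts.items() if n <= max_pages}
    # Rarest topics first, alphabetical within the same count.
    return dict(sorted(gaps.items(), key=lambda kv: (kv[1], kv[0])))


serp = {
    "a.example": ["What is information gain", "Why it matters", "Common mistakes"],
    "b.example": ["What is information gain", "Why it matters"],
    "c.example": ["What is information gain", "Why it matters", "Case study"],
    "d.example": ["What is information gain", "Common mistakes"],
}
print(find_gaps(serp))  # {'case study': 1, 'common mistakes': 2}
```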
2. Bring unique data
- Internal metrics — own case studies, A/B tests, Google Search Console data
- Fresh benchmarks — comparison of multiple tools/strategies with your setup
- Experiments — description of test and conclusions nobody published yet
- Insider knowledge — market context, product decisions, expert perspectives
3. Add entities missing from corpus
- Enrich text with named entities (people, companies, tools, standards, dates)
- Each entity is a potential topical signal, so link it to its entry in a knowledge graph
- Target: 15+ unique entities per page (Google Cortez 2026 threshold)
4. Atomic answers
- Look at content as a set of atomic answers (entity + image + content atoms)
- Each fragment must carry one unique piece of information extractable by AI
- Helpful formats: FAQ, definitions, statistics, step-by-step lists
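One way to check the "one unique piece of information per fragment" rule is to model each fragment as a small record and flag records that repeat an earlier one. The sketch below is only an illustration: the Atom class, its two fields, and the repeated_atoms helper are assumptions made for this example, and the comparison is a plain case-insensitive and whitespace-insensitive text match rather than any semantic check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical content atom: one entity plus one statement about it."""
    entity: str
    statement: str


def repeated_atoms(atoms: list[Atom]) -> list[Atom]:
    """Return atoms that restate an earlier atom (same entity, same statement)."""
    seen: set[tuple[str, str]] = set()
    repeats: list[Atom] = []
    for atom in atoms:
        # Collapse case and whitespace so cosmetic differences do not hide a repeat.
        key = (atom.entity.casefold(), " ".join(atom.statement.casefold().split()))
        if key in seen:
            repeats.append(atom)
        else:
            seen.add(key)
    return repeats


atoms = [
    Atom("Information gain", "Measures new information relative to ranking pages."),
    Atom("GEO", "Stands for Generative Engine Optimization."),
    Atom("information gain", "Measures  new information relative to ranking pages."),
]
print(repeated_atoms(atoms))  # only the third atom is reported
```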
Common mistakes
- Paraphrasing top-rankers — AI detectors and Google recognize this (zero gain)
- Theory only, no practice — corpus already contains theory; add "how we did it"
- No numbers — raw data (percentages, KPIs, times) is a strong gain signal
- Copy-pasted FAQ from competitors — Google groups duplicates and picks the original
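To catch the first and last mistakes before publishing, a draft can be compared with competitor text using word shingles and Jaccard similarity. This is a standard near-duplicate technique chosen here for illustration; how Google actually detects paraphrases or duplicates is not public, and any cutoff for "too similar" would be an assumption to tune by hand.

```python
def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Split text into overlapping word n-grams (shingles)."""
    words = text.lower().split()
    if len(words) < size:
        # Short texts become a single shingle; empty texts produce none.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of two texts' shingle sets: 0.0 disjoint, 1.0 identical."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


draft = "the quick brown fox jumps"
competitor = "the quick brown fox sleeps"
print(jaccard(draft, competitor))  # 0.5 (2 shared shingles out of 4 distinct)
```

Shingle overlap only measures shared wording, so a thorough paraphrase can score low while still adding no new information. It works best as a quick check for copy-pasted passages.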
Information gain and GEO
In the context of GEO (Generative Engine Optimization), information gain matters even more: LLMs (ChatGPT, Perplexity, Gemini) tend to cite unique sources, because duplicated content is suppressed when the response is deduplicated.
Pages with high gain earn:
- AI citations in generative responses
- Brand mentions in AI narratives
- Zero-click visibility: exposure in the answer even when the user does not click through
Related terms
- Topical authority — thematic authority
- Topical mapping — content cluster mapping
- E-E-A-T — expertise signals
- GEO — AI optimization
- Helpful content: content written primarily for people rather than for search engines
- SERP — search engine results page