AI Content + Google Penalties: 2026 Operator Dossier

01 The 60-second TL;DR

Google does not have an AI content classifier. Confirmed by the May 2024 Content Warehouse leak (2,596 modules, 14,014 attributes) and the DOJ antitrust trial (sworn testimony from Pandu Nayak and HJ Kim). There is no isAIContent attribute. There is no model that reads your prose and decides "this was written by GPT, demote."

Google penalizes the patterns AI content tends to produce. Low contentEffort, broken NavBoost ratios (high impressions + zero lastLongestClicks), zero Chrome direct-navigation signal, no entity-authored bylines, no siteAuthority, missing brand search volume, smallPersonalSite + babyPandaDemotion stacking. AI content fails on every one of these simultaneously, which is why it looks like an AI penalty even though it's a behavioral one.

The recovery path is brand and entity, not "humanizing" the prose. HouseFresh recovered from HCU by building YouTube + PR brand signals. Multiple sites recovered in June 2025 without changing a single article. The signal that actually moves: people searching your brand name, navigating to you directly, and ending their search session on your page.

The January 2025 Quality Rater Guidelines update closes the gap. Raters are now explicitly trained to flag AI-generated low-effort content as "Lowest" quality, and those ratings train classifiers via the goldmineRanking pipeline. The "no AI classifier" finding from the May 2024 leak has a shelf life. By 2026, behavioral signals + rater-trained classifiers converge.

The question is not "will Google detect the AI?" The question is "would a topical expert with 10 years of first-hand experience read this page and say: this person actually knows what they're talking about and showed me something I didn't already know?" If no, don't publish. That's the HCU test in one sentence.

02 The truth Google never wanted public

Five things Google publicly denied for years. The May 2024 leak + DOJ trial confirmed all five.

Google's Public Claim	What the Evidence Shows
Lie: "We don't have anything like a website authority score." (John Mueller)	`siteAuthority` exists. Calculated from `siteFocusScore`, `siteRadius`, `siteEmbedding`, PageRank. Stored in `CompressedQualitySignals`. Feeds Q* directly.
Lie: Clicks are "too noisy" for ranking. (Gary Illyes, 2016)	NavBoost has used clicks since 2005. Confirmed by Pandu Nayak under oath as "one of the most important ranking signals." 13-month rolling window. Tracks `goodClicks`, `badClicks`, `lastLongestClicks`.
Lie: Chrome browser data does not influence rankings.	`chromeInTotal`, `chrome_trans_clicks`, `uniqueChromeViews` all confirmed in leak. DOJ trial exhibit references "popularity signal that uses Chrome data." Engineer HJ Kim warned internally: "If competitors see the logs, they have a notion of authority for a given site."
Lie: No special treatment for new domains (no sandbox).	`hostAge` attribute confirmed, used "to sandbox fresh spam in serving time." Domain registration + expiration tracked per-document via `RegistrationInfo`.
Lie: Modern ranking is sophisticated autonomous AI.	HJ Kim under oath: "The vast majority of signals are hand-crafted." Topicality = "ABC" signals (Anchors, Body, Clicks). Q* is "largely static and related to the site rather than the query."

The signals that actually run the show

NavBoost (the click engine)

goodClicks = stayed on page meaningfully
badClicks = pogo-sticked back to SERP
lastLongestClicks = the click that ended the session (the most powerful signal)
unicornClicks = clicks from authenticated Google/Chrome users
voterTokenCount = distinct users (anti-manipulation)
13-month rolling aggregation. Bad signals persist over a year.
Segmented by country, language, metro, device, browser tier.

siteAuthority (the brand engine)

siteFocusScore = topical concentration (high is good)
siteRadius = how far pages drift from core theme (high is bad)
siteEmbedding / pageEmbedding = vector match between page and site theme
PageRank inputs from trusted seed sites
authorityPromotion = direct ranking boost when high
Persistent. Site-wide. Not per-query.
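
The attribute names above come without formulas. As intuition only, here is a minimal sketch of how topical concentration and drift could be measured from page embeddings around a site centroid. This is an assumption for illustration, not Google's computation: `site_centroid`, `focus_and_radius`, and the toy vectors are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def site_centroid(page_vectors):
    """Mean of all page vectors: a stand-in for a site-level embedding."""
    n = len(page_vectors)
    return [sum(col) / n for col in zip(*page_vectors)]

def focus_and_radius(page_vectors):
    """Return (focus, radius) under this sketch's own definitions.

    focus  = mean cosine similarity of pages to the centroid (higher = tighter topic)
    radius = largest cosine distance of any page from the centroid (higher = more drift)
    """
    centroid = site_centroid(page_vectors)
    sims = [cosine(v, centroid) for v in page_vectors]
    return sum(sims) / len(sims), max(1.0 - s for s in sims)

# Toy 3-dimensional "embeddings"; real ones would come from an embedding model.
focused_site = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1]]
drifting_site = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

focused = focus_and_radius(focused_site)
drifting = focus_and_radius(drifting_site)
```

In this toy setup the focused site gets a higher focus value and a smaller radius than the drifting one, which matches the "high focus is good, high radius is bad" reading above.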

contentEffort (the HCU engine)

"LLM-based effort estimation for article pages"
NOT an AI detector. Measures effort + originality.
Multimedia integration (unique images, video, tools)
Original data, primary research
Expert quotes, named credentials
Site-wide cascade if majority of pages score low

Twiddlers (re-ranking functions)

NavBoost, FreshnessTwiddler, QualityBoost = boosts
babyPandaDemotion, babyPandaV2Demotion = HCU-flavor site demotions
navDemotion = poor UX/navigation
serpDemotion = pogo-stick triggers
clutterScore = ad-density + intrusive resources
smallPersonalSite + babyPanda can stack on the same domain

03 What actually gets penalized

Not "AI content." These 10 patterns. AI content tends to produce all 10 simultaneously, which is why it looks like an AI penalty. It isn't.

Programmatic / templated content at scale with thin per-page differentiation (the 50,000-hotel-pages-only-city-changes pattern; 98% deindex rate)
Zero human editorial review on AI output (raw GPT prose published as is)
High DA + low Brand Authority mismatch (Tom Capper's finding: HCU losers averaged BA 37, winners + neutral averaged 50-52)
Intrusive ads disrupting UX (auto-play video, fixed footer, multiple interstitials, mandatory video-before-content). Accumulates negative NavBoost over the 13-month window. December 2025 update specifically punished this.
Lack of identifiable authorship (anonymous bylines, "Staff Writer," AI-generated author personas with fake bios. Sports Illustrated case.)
Artificial freshening (updating bylineDate without updating content. The three-layer date system, bylineDate + syntacticDate + semanticDate, catches it.)
Stock images and scraped social images (Cyrus Shepard correlation: stock has "surprisingly strong negative impact")
Excessive affiliate-link density with no genuine product testing (CNN Underscored, Forbes Advisor, WSJ Buy Side commerce sections wiped November 2024 under Site Reputation Abuse)
Thin YMYL content without verifiable expert authorship (medical, financial, legal, increasingly relationship + parenting + nutrition)
Competitive differentiation failure (accurate, comprehensive, but adds nothing 10 other pages don't already cover. Zero information gain.)

The asymmetry that nobody can explain

NextDoor published ~300,000 AI-generated pages and gained ~200,000 monthly organic visitors. Chegg generated 2.2 million AI solutions. Reddit and Forbes rank thin AI content with impunity. Smaller operators running identical patterns get hit. Spencer Haws (Niche Pursuits) flagged it explicitly: "there might be algorithmic protections that differ by domain authority." Charles Floate calls it "structural arbitrage." Glen Allsopp calls it Goliath SEO. The mechanism is almost certainly siteAuthority + brand NavBoost. Big platforms have so much incumbent signal that low-quality patterns don't trigger demotion thresholds.

04 What survives (the HouseFresh story)

The biggest finding in the entire research: most HCU recovery sites recovered without changing their content.

HouseFresh was hit by the September 2023 HCU and lost the bulk of its traffic. They built YouTube content and pursued high-profile collaborations. They did almost nothing to the existing articles. After the August 2024 core update, they exceeded their pre-HCU peak. Tom Capper's interpretation: their Brand Authority score caught up to their Domain Authority. The DA:BA mismatch resolved. The penalty lifted. Sources: Glenn Gabe (G-Squared) tracked 400+ HCU sites. Lily Ray confirmed the pattern at Amsive.

Patterns shared by the sites that survived (or recovered):

Genuine first-hand experience. NapLab (mattress affiliate, 6,200 → 132,000 monthly visitors) tested every product. Original photography. Methodology transparency. Quantitative scoring systems.
Identifiable credentialed authors with LinkedIn links + KG entity association.
Real brand presence. Organic search volume for the brand name. Reddit/Quora/LinkedIn citations. YouTube footprint.
Low ad density + clean UX. Positive NavBoost over the 13-month window.
Moderate publishing velocity. Not 50 articles/day from a content factory.
Original data or proprietary research (the only thing AI cannot generate from existing web content; the only path to positive information gain).
Direct navigation drivers (newsletter, app, community, free tools). Builds Chrome signal that exists outside Google's search funnel.
Off-site community engagement (real Reddit threads, forum discussions, mentioned in industry publications).

62%: more facts in AI-Overview-cited articles vs uncited (Surfer 36M study)
6 vs 3.6: citations for fresh content (last 3 months) vs stale
5.1 vs 3.2: citations for 2,900+ word articles vs articles under 800 words
80%: of AI citations point to pages NOT in Google's organic top 10 (Ahrefs via Mike King)

05 The high-ranking page framework

Page-level rules. The exact "how many external links, what's the word count" answer.

Word count

Informational pillar: 1,800 to 3,500 words.
Commercial / "best of" listicle: 2,500 to 5,000 words.
Product / category page: 600 to 1,200 words of unique copy plus product grid.
Blog supporting article: 1,200 to 2,000 words.
Hard rule: length follows topic depth, not the other way around. Top-3 SERP average is the floor, not the target. Padding is a penalty risk (Surfer found cited articles average 2,900+ words but the cause is depth, not length).
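
A rough way to apply these ranges in an editorial check, as a minimal sketch. The dictionary keys, `check_length`, and the way the top-3 SERP average raises the floor are assumptions of this example, not a fixed rule.

```python
# Target ranges from the word-count guidance above: (min_words, max_words).
WORD_RANGES = {
    "informational_pillar": (1800, 3500),
    "commercial_listicle": (2500, 5000),
    "product_page": (600, 1200),
    "blog_supporting": (1200, 2000),
}

def check_length(text, page_type, serp_top3_avg=None):
    """Classify a draft's length against the range for its page type.

    serp_top3_avg, if given, is treated as a floor (not a target):
    it can raise the minimum but never lowers it.
    """
    low, high = WORD_RANGES[page_type]
    if serp_top3_avg is not None:
        low = max(low, int(serp_top3_avg))
        high = max(high, low)
    words = len(text.split())
    if words < low:
        return words, "below range"
    if words > high:
        return words, "above range, check for padding"
    return words, "within range"

draft = "word " * 2000  # placeholder text standing in for a real draft
result = check_length(draft, "informational_pillar")
```

A word count only flags outliers. It says nothing about depth, which is the actual hard rule.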

Headings (H1 through H4)

One H1. Contains primary keyword. Maximum 60 characters.
Five to ten H2s. Each maps to a real subtopic or "People Also Ask" query. Each H2 functions as an independently-retrievable passage for AI search.
H3s nest under H2s for sub-questions.
H4 only when genuinely needed.
Headings should read like a table of contents that answers the search intent end to end.

Internal links

8 to 15 internal links per 2,000 words (sweet spot: roughly 1 per 200 words).
Link to: the pillar page, sibling cluster pages, deeper supporting content.
Anchor text: descriptive and varied. Never "click here." Never the same anchor twice on one page.
Every new page should be linked FROM at least 3 existing pages within 7 days. Orphan pages die. Bi-directional linking increases AI citation probability by 2.7x (Yext 2025).

External links

2 to 5 external links per article to high-authority sources (.gov, .edu, original studies, Wikipedia for entity grounding, brand homepages).
All target="_blank" rel="noopener". Don't nofollow legitimate citations. Outbound trust signals matter.
Cite sources inline with the claim, not in a generic footer.
Never link to direct competitors on commercial pages.
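
The internal and external link rules above can be checked mechanically. The sketch below uses only the standard library; `LinkAudit`, `audit_links`, and the sample HTML are hypothetical, and it treats only relative URLs and exact host matches as internal (so a `www.` variant would count as external unless you normalize it first).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkAudit(HTMLParser):
    """Collect internal and external <a href> links from an HTML fragment."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host
        self.internal = []
        self.external = []
        self.external_missing_attrs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        parsed = urlparse(href)
        # Skip empty links, in-page anchors, and non-web schemes such as mailto:.
        if not href or href.startswith("#") or parsed.scheme not in ("", "http", "https"):
            return
        if parsed.netloc in ("", self.site_host):
            self.internal.append(href)
            return
        self.external.append(href)
        rel = (attrs.get("rel") or "").split()
        if attrs.get("target") != "_blank" or "noopener" not in rel:
            self.external_missing_attrs.append(href)

def audit_links(html, site_host, word_count):
    """Compare link counts with the 8-15 per 2,000 words and 2-5 external ranges."""
    parser = LinkAudit(site_host)
    parser.feed(html)
    per_2000 = len(parser.internal) * 2000 / word_count if word_count else 0.0
    return {
        "internal": len(parser.internal),
        "external": len(parser.external),
        "internal_per_2000_words": round(per_2000, 1),
        "internal_in_range": 8 <= per_2000 <= 15,
        "external_in_range": 2 <= len(parser.external) <= 5,
        "external_missing_attrs": parser.external_missing_attrs,
    }

sample = (
    '<p><a href="/pillar/">pillar guide</a> '
    '<a href="https://example.com/cluster/">cluster page</a> '
    '<a href="https://www.nist.gov/" target="_blank" rel="noopener">NIST</a> '
    '<a href="https://en.wikipedia.org/wiki/PageRank">PageRank</a></p>'
)
report = audit_links(sample, "example.com", word_count=400)
```

Here the Wikipedia link would be listed under `external_missing_attrs` because it lacks `target="_blank"` and `rel="noopener"`.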

Images

1 hero image above the fold (original beats stock 10x).
1 image per 300 to 500 words thereafter.
Original photos, screenshots, custom diagrams beat stock.
Every image: descriptive alt, descriptive filename, <figcaption> when adding context, lazy-load below the fold, WebP/AVIF, under 200 KB.
Add at least 1 original chart or comparison table. Gets cited in AI Overviews.
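
A minimal sketch of an automated pass over the image rules above. `ImageAudit` and `audit_images` are hypothetical names. It assumes the first `<img>` is the hero and uses "one hero plus one image per 500 words" as the minimum count. File size (under 200 KB) and originality cannot be read from the HTML and still need a manual check.

```python
from html.parser import HTMLParser

class ImageAudit(HTMLParser):
    """Collect the attributes of every <img> tag in document order."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs))

def audit_images(html, word_count):
    parser = ImageAudit()
    parser.feed(html)
    issues = []
    for index, img in enumerate(parser.images):
        src = img.get("src") or ""
        if not (img.get("alt") or "").strip():
            issues.append((src, "missing alt text"))
        if not src.lower().endswith((".webp", ".avif")):
            issues.append((src, "not WebP/AVIF"))
        # The first image is treated as the above-the-fold hero; the rest should lazy-load.
        if index > 0 and img.get("loading") != "lazy":
            issues.append((src, "not lazy-loaded"))
    minimum = 1 + word_count // 500
    return {"count": len(parser.images), "minimum": minimum, "issues": issues}

sample = (
    '<img src="hero-air-purifier-test.webp" alt="Air purifier on a test bench">'
    '<img src="chart.png" alt="" loading="lazy">'
    '<img src="filter-closeup.avif" alt="Used HEPA filter after 30 days">'
)
report = audit_images(sample, word_count=1000)
```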

Schema markup (mandatory)

Article or BlogPosting with author, datePublished, dateModified, publisher.
Person schema on author with sameAs linking to LinkedIn / X (entity verification for Google Knowledge Graph).
BreadcrumbList.
FAQPage if you have a real FAQ section. Highest AI-citation probability of any schema type.
HowTo if step-based.
Organization site-wide.
Review / AggregateRating only if real.
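
A minimal sketch of the Article + Person markup described above, generated as JSON-LD. The property names are standard schema.org vocabulary; `article_jsonld`, the author, the publisher, and every URL are placeholders.

```python
import json

def article_jsonld(headline, url, published, modified,
                   author_name, author_profiles, publisher_name):
    """Build Article JSON-LD with a nested Person author and Organization publisher."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "datePublished": published,
        "dateModified": modified,
        "author": {
            "@type": "Person",
            "name": author_name,
            # sameAs ties the author to external profiles for entity verification.
            "sameAs": author_profiles,
        },
        "publisher": {"@type": "Organization", "name": publisher_name},
    }

data = article_jsonld(
    headline="Best Voice AI Agents",
    url="https://example.com/best-voice-ai-agents/",
    published="2026-01-15",
    modified="2026-04-02",
    author_name="Jane Example",
    author_profiles=["https://www.linkedin.com/in/jane-example/"],
    publisher_name="Example Media",
)

# Embed the result in the page head as a JSON-LD script tag.
script_tag = (
    '<script type="application/ld+json">'
    + json.dumps(data, indent=2)
    + "</script>"
)
```

Validate the output before shipping. Markup that does not match the visible page (a byline or date the reader cannot see) defeats the purpose.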

E-E-A-T signals (heaviest weight in 2026)

Author bio block under H1: photo, credentials, 2-3 sentence bio, links to LinkedIn + X + author archive.
"Reviewed by [credentialed expert]" line on YMYL topics.
"Last updated [date]" visible. Refresh content quarterly with real updates (not just date stamps).
First-person experience injection: at least one paragraph of "I tested / I built / I measured" with a real artifact (screenshot, photo, data).
Original data or research: even one stat you generated yourself moves the needle.

On-page mechanics

Title tag: primary keyword in first 60 characters. Brand at end. Format: Primary Keyword | Subtitle: Brand.
Meta description: 150-160 characters. Primary + secondary keyword + CTA.
URL slug: 3-5 words, hyphenated, primary keyword, no dates, no stop words. /best-voice-ai-agents/ not /2026/05/01/the-10-best-voice-ai-agents/.
Primary keyword: in H1, first 100 words, one H2, one image alt, URL, meta. Density ~0.5 to 1.5%. Never stuff.
Semantic / entity coverage: use Surfer / Frase / Clearscope to ensure the page covers the full entity set the top 10 SERP covers. Biggest lever for ranking.
TOC for any page over 1,500 words. Anchor links boost AI Overview citation.
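
Most of the on-page rules above are countable, so a pre-publish check is straightforward. A minimal sketch, with assumptions: `check_on_page` and the stop-word list are hypothetical, and density is computed as exact-phrase occurrences divided by total words, which is one common definition among several.

```python
import re

# A small illustrative stop-word list; extend as needed.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "with"}

def check_on_page(title, meta_description, slug, body, keyword):
    """Return pass/fail checks for the title, meta, slug, and keyword rules."""
    kw = keyword.lower()
    kw_tokens = kw.split()
    words = re.findall(r"[a-z0-9']+", body.lower())
    n = len(kw_tokens)
    # Count exact-phrase matches with a sliding window over the body tokens.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == kw_tokens)
    density = 100.0 * hits / len(words) if words else 0.0
    # Only the last path segment counts as the slug proper.
    slug_words = slug.strip("/").split("/")[-1].split("-")
    return {
        "keyword_in_first_60_title_chars": kw in title[:60].lower(),
        "meta_length_ok": 150 <= len(meta_description) <= 160,
        "slug_length_ok": 3 <= len(slug_words) <= 5,
        "slug_has_keyword": "-".join(kw_tokens) in slug,
        "slug_no_stop_words": not STOP_WORDS.intersection(slug_words),
        "slug_no_dates": re.search(r"\d{4}", slug) is None,
        "keyword_in_first_100_words": kw in " ".join(words[:100]),
        "density_percent": round(density, 2),
        "density_ok": 0.5 <= density <= 1.5,
    }

page = dict(
    title="Best Voice AI Agents | Tested in 2026: Acme",
    meta_description="x" * 155,  # placeholder of valid length
    body="Voice AI agents answer calls. " + "filler " * 96,
    keyword="voice ai agents",
)
good = check_on_page(slug="/best-voice-ai-agents/", **page)
bad = check_on_page(slug="/2026/05/01/the-10-best-voice-ai-agents/", **page)
```

The second call fails the slug checks for length, stop words, and dates, mirroring the bad example in the list above.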

Engagement (NavBoost loves these)

Hook in first 50 words that satisfies the query directly.
Bold key sentences for skimmers.
Bullet lists every 300-500 words.
One embedded video or interactive widget (drives dwell time).
"Jump to" TOC and back-to-top button.
Comments enabled (engagement signal).
Related posts grid at bottom (drives session depth).

Technical layer

Core Web Vitals all green: LCP under 2.5s, INP under 200ms, CLS under 0.1.
Mobile-first: all text at 16px minimum font size, tap targets at 48px minimum.
HTTPS, valid SSL, HTTP/2.
canonical tag on every page.
hreflang if multi-language.
Clean robots.txt, sitemap.xml submitted to Google Search Console.
No render-blocking JS for above-the-fold content.
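
The Core Web Vitals thresholds above translate directly into a pass/fail filter. A minimal sketch: the metric keys, `failing_vitals`, and the per-page numbers are hypothetical, and in practice the values would come from field data for each URL.

```python
# Upper bounds for a "good" rating, as listed above.
GOOD_THRESHOLDS = {"lcp_s": 2.5, "inp_ms": 200, "cls": 0.1}

def failing_vitals(metrics):
    """Return only the metrics whose value is above the 'good' bound."""
    return {
        name: value
        for name, value in metrics.items()
        if name in GOOD_THRESHOLDS and value > GOOD_THRESHOLDS[name]
    }

pages = {
    "/pillar/": {"lcp_s": 2.1, "inp_ms": 180, "cls": 0.05},
    "/listicle/": {"lcp_s": 4.3, "inp_ms": 150, "cls": 0.22},
}
report = {url: failing_vitals(metrics) for url, metrics in pages.items()}
```

An empty dict for a URL means all three vitals are green. Anything else is the fix list for that page.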

What to NEVER ship

Native <select> dropdowns on customer-facing pages.
Stock photo as the only image.
"In conclusion" / "In summary" / "It's not just X, it's Y" cadence (GPT slop).
Author = "Admin" or no author at all.
Dates with no actual update to the body when you bump them.
Affiliate links above the fold without disclosure.
Programmatic templates with thin / repeated / spun content.
More than 3 ads above the fold.

06 The 8-layer hybrid pipeline

The production SOP. Hybrid content ranks 34% higher on average than unedited AI content (2025 SEO analysis). Pure AI hits Google top 10 in 28% of cases but only 3% reach the top 3.

AI draft. GPT-5.4 / Claude Opus 4.7 / Gemini 3.1 Pro. Persona prompt with credentials. Target keyword cluster. Brief with desired headings. Competing URLs in context. Fact-check list of claims to verify. Output: 800-2,000 word raw draft. Do NOT publish.
SME review. Subject-matter expert reads for factual accuracy, not prose quality. Upwork/Contra freelancer at $25-75/article, or in-house SME for core topics. Non-negotiable for YMYL.
First-hand experience injection. 2-4 spots per article: "When we tested X, we found Y specific result with numbers." Original screenshots. Original photography. Replace generic advice with specific metrics. 3-5 injected experience signals per article.
Fact-check layer. Verify every statistic against primary source. Flag any claim older than 18 months as stale. Inline citations with hyperlinks. Use Perplexity for verification, Google Scholar for academic, .gov/.edu for regulatory.
Original asset addition. At least one original asset per article: chart with real data, infographic, screenshot, video embed. The AI-content-cannot-replicate moat.
Humanizer pass (optional). Only if you have compliance contracts requiring AI-undetectable output. For pure SEO publishing: skip. Spend the budget on more SME review time.
E-E-A-T markup. Structured author bio. "Reviewed by" attribution. "Last Updated" date. Article + Person + Organization + FAQPage schema.
Technical polish. Compress + alt-tag images. TOC for 1,500+ word articles. 3-5 internal links. 2-3 external citations. Page loads under 2.5s LCP.

The single biggest lever in this pipeline: Layer 3

First-hand experience injection. Specific metrics, original screenshots, personal test results. Things AI cannot fabricate. Skipping Layer 3 turns the other 7 layers into expensive lipstick on commodity content. Doing Layer 3 well makes the rest of the pipeline optional.

07 Optimizing for AI search (GEO / AEO / LLMO)

Where the puck is going. Mike King (iPullRank) calls this Relevance Engineering. The core architectural shift: Google's AI Mode decomposes one user query into 6-12+ synthetic sub-queries (per Google patent US20240289407A1). Content that only answers the head term gets cited once. Content that covers the full sub-query fan-out gets cited 6-10 times per response.

Citation patterns by platform

Platform	Citation Preference	Format Priority	Avg Citations / Answer
Google AI Overviews	85.79% from organic top 10	FAQPage schema, answer-first format	varies; appears in ~15% of queries
ChatGPT Search	Wikipedia (7.8%), encyclopedic depth	Authority + factual depth	~7.92
Perplexity	Reddit (6.6%), YouTube, recency	Lead with answer, specific data	~21.87 (most slots)

Universal citation patterns (Surfer 36M study)

YouTube ~23.3%, Wikipedia ~18.4%, Google.com ~16.8% dominate every vertical.
Video is the single most-cited content format across every vertical. An article + companion YouTube video doubles citation surface area.
UGC sites get nearly 2x more citations than brand-owned content.
For YMYL: verified authorship + primary research citations + visible credentials are decisive.

Content structure that gets extracted

H2: [Question format: "How does X work?"]
[Direct 40-60 word answer. Self-contained. No context required.]

[Supporting explanation: 150-300 words with evidence.]

[Bullet points or numbered list for scannable structure.]

[Inline citation: "According to [Source], [specific stat]."]
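
A minimal sketch for checking that a section opens the way the template above asks: a question-format heading followed by a 40-60 word direct answer. `check_answer_block` and the sample text are hypothetical. It checks shape only, not whether the answer is actually self-contained.

```python
def check_answer_block(heading, answer):
    """Check the heading format and the length of the direct answer."""
    words = len(answer.split())
    return {
        "heading_is_question": heading.strip().endswith("?"),
        "answer_words": words,
        "answer_length_ok": 40 <= words <= 60,
    }

heading = "How does NavBoost work?"
answer = (
    "NavBoost is a Google re-ranking system that adjusts results using "
    "aggregated click data. According to the 2024 leak, it tracks good clicks, "
    "bad clicks, and the last longest click over a 13-month window, so pages "
    "that end search sessions tend to gain visibility while pages that send "
    "users back to the results lose it."
)
result = check_answer_block(heading, answer)
```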

Information gain (Google patent US11200288B2)

The patent describes assigning each page an "information gain score" measuring new information above and beyond what the user has already encountered in the current session. The 8th article a user reads about "content marketing" scores near zero. An article with a unique data point, unusual framing, or novel entity relationship scores high. The patent was originally written for "automated assistants and chatbots." The scoring logic is embedded in how AI Overviews selects sources.

The AI content trap: AI-generated content that synthesizes existing web content scores zero information gain by definition. It is a recombination of already-indexed facts. The only path to positive information gain is original data, first-person experience, expert interviews, or genuinely novel framing.

08 AI humanizer benchmark

The honest answer to "should we run our content through Undetectable.ai before publishing?"

Rank	Tool	Avg Bypass	Price/mo	Notes
1	HumanizerAI	80.4%	$14.99	Best across 5 detectors
2	Undetectable.ai	73.4%	$9.99	Best value; structural transformation
3	WriteHuman	68.0%	$12.00	Solid on GPTZero
4	StealthGPT	66.2%	$14.99	Overpriced
5	Humbot	62.8%	$14.99	Falls short on Originality.ai
10	QuillBot	47.4%	$9.95	Grammar tool, not a humanizer
11	BypassGPT	32.8%	$7.99	Worst performer

The bottom line on humanizers and Google rankings

No direct correlation. Humanizers help with third-party AI detectors. They do not meaningfully affect Google rankings. Google does not run Originality.ai on your pages. Running content through Undetectable.ai before publishing does not change whether the content satisfies user intent, demonstrates expertise, or provides original analysis. The 1,640-word humanized article still scores zero on information gain if there's no original input.

Where humanizers have indirect value: smoothing out repetitive sentence patterns and "It's important to note" / "In conclusion" GPT cadence that human editors would also catch. That structural improvement is achievable through a strong human edit without the $14.99/month subscription.

09 The operator camps

The community is in its most fractured state ever. Eight operators, four camps, no consensus.

Matt Diggity: Hybrid 70/30

Position: Pragmatic pro-AI with mandatory human overlay. 70% AI draft + 30% human expertise (original data, case studies, firsthand experience). Closed Affiliate Lab in 2025 saying "I don't currently know the affordable path to ranking a content website" while simultaneously claiming AI content can rank. Caps publish velocity at 3-5 articles/day to avoid Google's velocity detection. Topical mapping over keyword targeting. Entity optimization throughout.

What he warns: "Don't write and publish raw AI. You'll get nuked." High percentage of zero-traffic pages = red flag. Don't build links to garbage AI content.

Charles Floate: Full AI (with engineering)

Position: Most candid voice on the white-hat-vs-platform asymmetry. Reddit and Forbes rank thin AI content; small operators get destroyed for the same patterns. Calls it structural arbitrage. Full AI content is a pipeline engineering problem, not a word-count-editing problem. Sequential multi-stage prompting, vector database integration, new-domain isolation for experiments. Average output: 2,850 words at POP score 65/100. Pivoted heavily to parasite SEO and CPA lead gen.

What he warns: HCU is unrecoverable on penalized domains. The only confirmed recovery vector is migrating to a new domain (lossy because of link equity loss).

Authority Hacker (Gael Breton + Mark Webster): Exited the space

Position: Most dramatic public pivot of any operator. Discontinued The Authority Site System (TASS) in late 2024 / early 2025. Relaunched as AI Accelerator targeting established businesses, not content site builders. Breton's direct quote: "I'm not going to have AI write the content because I think it's not very good to be honest." Sees AI as an editing/compression tool, not a drafting engine.

What worked post-HCU per Breton's case studies: Visual-heavy "comic book" style content. Original product testing. Short paragraphs (max 4 lines). Quantitative scoring methodology. Mobile-first layout. YouTube + email + Amazon Influencer Program layered alongside SEO. NapLab grew from 6,200 to 132,000 monthly visitors using this template.

Kyle Roof (Page Optimizer Pro): Hybrid mandatory

Position: Data-driven on-page testing lens. Quantified the gap: raw LLM output peaks at POP scores in the mid-60s when 80+ is needed. Only 3 of the tested models hit 1,000-word targets. Readability stays at college level when 7th grade is the target. Contextual term coverage averages ~53 terms when 200 are needed. Calls AI a "Mechanical Turk" that requires human expertise to meet minimum SEO requirements.

Quote: "It's not about the fact that the content is AI, but whether the content is adding to the conversation versus simply regurgitating what already exists."

Glen Allsopp (Detailed.com): Structural pessimist

Position: Most data-driven structural analyst. 80% of top Google results come from "Digital Goliath" brands across 100M+ monthly searches analyzed. Even Hearst / Condé Nast / Future portfolio sites: only 18% showed YoY traffic increases in 2025. Independent operator disadvantage is structural, not tactical. AI Overviews now appear in 13.14% of US desktop queries (March 2025) and reduce organic CTR by 19.98 to 34.5%. AIO-cited results get 3.2x more clicks than non-cited results on the same page.

What's working: SaaS with product-led SEO (Chatbase: 68% organic growth in 5 months). Specialized technical verticals. Startups with genuine product differentiation: 54.7% of 670 tracked startups gained traffic YoY.

Niche Pursuits (Spencer Haws + Jared Bauman): Cautiously pro-AI for new sites

Position: Ran a live AI content challenge. The winner (Edward) hit 23,000 monthly organic visitors using programmatic SEO with ~7 million AI-generated articles. Spencer's read: Google is "quite friendly to AI content, contrary to public guidance." HCU-hit recovery is near-impossible. 129 of 130 tracked sites in one Glenn Gabe analysis either continued losing or barely recovered.

The contradiction they flagged: NextDoor + 300,000 AI pages = +200,000 monthly visitors. Chegg + 2.2 million AI solutions = no penalty. Identical patterns on small sites get nuked.

Income School (Ricky Kesler + Jim Harmer): Hybrid, AI-friendly

Position: Quietest of the operator camp. Their training has always emphasized helpful-first, niche-specific, personal-experience-first content. That framework happened to align with what HCU rewards. Ricky and Jim "regret not diving into AI sooner" but frame it as efficiency, not quality substitute.

Eli Schwartz: Product-led, anti-AI-drafting

Position: Most vocal advocate against using AI to create content. Argues AI should be used to understand users + identify opportunities, not produce the content itself. Sites built purely for search traffic (SEO-first, not product-first) are inherently vulnerable because they have no floor when Google's algorithm changes. The risk isn't Google detection; it's that AI-generated content fails to build genuine products users return to.

The cleanest convergence across all 8 operators

HCU recovery is effectively impossible on penalized domains (zero or near-zero confirmed full recoveries on tracked sites; only domain migration works partially).
Raw AI output fails on its own. Every operator who tested unedited AI at scale reports collapse.
Page-level value, not content-level value, is the new standard.
Brand and entity signals matter at least as much as content quality.
Big platforms operate under different algorithmic rules than small operators.
Traffic diversification is table stakes. Not a single operator recommends pure Google SEO dependency in 2025.
Content site M&A multiples collapsed from 36x to 28x. Building-to-sell is broken.

10 The 30-day recovery playbook

For a site hit by HCU, scaled content abuse, or general quality demotion. Recovery timeline: 3-6 months for algorithmic penalties (must wait for next core update). 67 days average for manual actions with reconsideration request.

Days 1-3: Audit and triage

Export Google Search Console: all pages, clicks, impressions, average position, last 12 months.
Export GA4: sessions, engagement rate, conversion rate per page.
Build a spreadsheet. Add a Decision column: Delete / Improve / Keep.
Segment thresholds: Delete = under 10 impressions AND 0 clicks in 12 months. Improve = impressions exist but CTR under 2% OR position 11-25. Keep = consistent traffic + 2+ minute average session.
Identify your 10 highest-traffic pages pre-penalty. These are your recovery anchors.
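
The segment thresholds above can populate the Decision column automatically. A minimal sketch: the field names and `triage` are hypothetical, and the rule order (Delete first, then Improve, then Keep, with anything left over sent to manual review) is this example's assumption, since the thresholds as stated can overlap.

```python
def triage(page):
    """Assign Delete / Improve / Keep from 12-month GSC and GA4 numbers.

    page: dict with impressions, clicks, position (GSC) and
    avg_session_seconds (GA4).
    """
    impressions = page["impressions"]
    clicks = page["clicks"]
    if impressions < 10 and clicks == 0:
        return "Delete"
    ctr = clicks / impressions if impressions else 0.0
    if ctr < 0.02 or 11 <= page["position"] <= 25:
        return "Improve"
    if page["avg_session_seconds"] >= 120:
        return "Keep"
    return "Review"  # matches no rule cleanly; decide by hand

pages = [
    {"url": "/old-tag-page/", "impressions": 4, "clicks": 0,
     "position": 48.0, "avg_session_seconds": 0},
    {"url": "/best-x/", "impressions": 12000, "clicks": 150,
     "position": 14.2, "avg_session_seconds": 95},
    {"url": "/guide/", "impressions": 8000, "clicks": 400,
     "position": 3.1, "avg_session_seconds": 165},
    {"url": "/news/", "impressions": 500, "clicks": 20,
     "position": 6.0, "avg_session_seconds": 40},
]
decisions = {p["url"]: triage(p) for p in pages}
```

Pages marked Delete still need the backlink check before removal. A page with backlinks gets a 301 redirect, not a deletion.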

Days 4-7: Content delete pass

Execute all Delete decisions: 301 redirect to most relevant category page, or noindex if not ready to remove.
Goal: reduce crawl budget waste, signal to Google that the site has stopped producing thin content.
Do NOT delete pages with backlinks. 301 redirect those to preserve link equity.

Days 8-14: Author + E-E-A-T overhaul

Replace "Staff Writer" and byline-less articles with real named authors.
Write full author bio pages (200+ words, credentials, LinkedIn link, professional photo).
Add "Last Updated" dates to all improved articles.
Add "Reviewed By [Credentialed Expert]" to YMYL content.
Implement Person schema for all authors with sameAs linking to LinkedIn and X.
Implement Organization schema for the site.

Days 15-21: Top 10 content upgrade

For each of your 10 recovery anchor pages:

Add original data/research or first-person experience section.
Add original screenshot or photograph.
Add FAQPage section (5-7 questions) at the bottom + matching FAQPage schema.
Add Article + Author + FAQPage schema.
Update any statistics older than 18 months.
Add 3-5 internal links to related cluster content.
Improve title tag: add year, add specific number/stat, make it answer-shaped.
Improve meta description: add first-hand experience signal ("We tested 12 tools...").

Days 22-28: Technical + brand layer

Audit Core Web Vitals. Fix pages with LCP above 4 seconds.
Build or upgrade topical clusters around core content.
Set up email newsletter (even if 0 subscribers; the form + cadence is an entity signal).
Publish 1 piece of genuinely original research (a 20-50 respondent survey on Pollfish at $50-200 is enough).
Submit updated sitemap to Google Search Console.

Days 29-30: Reconsideration + monitoring

If manual action: submit a reconsideration request via GSC. Document every change. Include links to improved content. Explain what was deleted and why.
If algorithmic: no reconsideration. Wait for next core update.
Set up weekly GSC monitoring. Impressions trending up = recovery in progress.
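
A minimal sketch of the weekly monitoring check: week-over-week change in impressions, plus a simple "trending up" flag. `weekly_change`, `is_recovering`, the three-week requirement, and the sample totals are all assumptions of this example.

```python
def weekly_change(impressions_by_week):
    """Percent change between consecutive weekly impression totals."""
    changes = []
    for prev, curr in zip(impressions_by_week, impressions_by_week[1:]):
        changes.append(round(100.0 * (curr - prev) / prev, 1) if prev else None)
    return changes

def is_recovering(impressions_by_week, min_weeks_up=3):
    """True if impressions rose in each of the last min_weeks_up comparisons."""
    recent = weekly_change(impressions_by_week)[-min_weeks_up:]
    return len(recent) == min_weeks_up and all(
        change is not None and change > 0 for change in recent
    )

weekly_impressions = [10000, 9800, 10100, 10900, 12000]  # oldest to newest
changes = weekly_change(weekly_impressions)
recovering = is_recovering(weekly_impressions)
```

Requiring several consecutive up-weeks filters out a single noisy week. The right window is a judgment call.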

Timeline	Expected Signal
30 days	Crawl frequency increases, minor position improvements
60-90 days	Featured snippets start returning, impressions increase
90-120 days	Primary keywords show meaningful position improvement
4-6 months	Traffic approaches pre-penalty levels (algorithmic penalties only)

Pruning case studies (verified)

CNET (2024): Deleted hundreds of thousands of pages. +29% search traffic.
Anonymous brand (Dec 2025): Removed 600,000+ pages. +30% clicks + impressions, sustained 4 months later.
Home Science Tools (Inflow): Pruned/improved blog. +64% strategic content revenue.
192-post deletion case: 20% of content removed. Traffic increased.

11 Tool stack

Category	Tool	Purpose	Cost
AI drafting	Claude Opus 4.7, GPT-5.4, Gemini 3.1 Pro	Initial drafts	API costs
SEO content scoring	Surfer SEO Content Editor	Entity coverage, content scoring against top 10	$49-$99/mo
Topical authority	MarketMuse / Clearscope / Frase	Strategy + brief generation	$49-$249/mo
Entity SEO	InLinks / WordLift / Waikay	Entity extraction, internal KG, AI-brand fingerprinting	varies
AI search tracking	Surfer AI Tracker / Otterly.ai / Profound	Monitor AI citation changes across ChatGPT, Perplexity, AI Overviews	$49+/mo
AI detection (audit)	Originality.ai	99% accurate AI detection (Journal of AI 2025)	$15+/mo
Humanizer (compliance only)	Undetectable.ai	Detector bypass for client deliverables	$9.99/mo
Survey/research	Pollfish	Original data for differentiation	$50-200/survey
Newsletter	Beehiiv / ConvertKit	Direct navigation moat	Free-$29/mo
Audit / pruning	Google Search Console + GA4	Pruning decisions	Free
Schema validation	Google Rich Results Test	Validate structured data and rich-result eligibility	Free

12 Bottom line for our properties

Distilled from ~25,000 words of research into the actions that move the needle.

The 5 things that matter most (in order)

Build brand search volume. The HouseFresh recovery + Tom Capper's BA finding + Pandu Nayak's NavBoost testimony all point to the same thing: a site that generates branded queries and direct navigation has resilience that no on-page tactic can replicate. Newsletter, YouTube, free tools, podcast, Reddit presence, press placements.
Inject first-hand experience into every page. Original screenshots, original photos, original test results, specific metrics. The only path to positive information gain. The thing AI structurally cannot fake.
Make every author a verifiable entity. Real name. Credentials. LinkedIn link. Person schema with sameAs. Author archive page. Bylines on external publications when possible. Knowledge Graph entity association is what separates "trusted source" from "anonymous content farm."
Prune ruthlessly. CNET +29% from deleting hundreds of thousands of pages. Sites with high percentages of zero-traffic pages get site-wide demotion. Delete pages with under 10 impressions and 0 clicks in 12 months.
Build for query fan-out, not single keywords. A pillar page with 5-10 cluster pages that collectively cover 8-12 sub-queries gets cited 3.2x more than a single-page competitor. Topical maps before content production.

The 5 things to stop doing immediately

Publishing raw AI output without SME review + first-hand experience injection.
Anonymous bylines or "Staff Writer" attributions.
Programmatic templates with under 30-40% unique content per page.
"Updating" articles by changing the byline date without changing the content (the three-layer date system catches this).
Pure Google SEO dependence with zero direct-nav drivers (newsletter, YouTube, community).

The next play

If a specific property is suspected of being penalized, the audit path is: (1) export GSC + GA4 data for the last 12 months, (2) calculate the DA-to-Brand-Authority ratio (Moz BA as a proxy), (3) audit the top 10 pages against the page-level framework in section 05, (4) audit site-wide against the 10 patterns in section 03, (5) decide between the recovery playbook (section 10) and domain migration (Floate's path). Supply a URL and the audit can start. The framework is the same regardless of property: comiai.co, callsetter.ai, moldscanner.ai, or anything else.