A genetic algorithm for SEO keyword research replaces manual brainstorming with automated evolution: you start with a population of candidate keywords, score them against real Google Trends data, kill the losers, and use an LLM to mutate the survivors into new variants you'd never think of yourself. We built this system at Iron Mind AI and use it to discover high-momentum keywords in minutes instead of days.
The idea comes from quantitative finance. On 32-bit forex backtesting systems, you couldn't brute-force millions of parameter combinations. You had to evolve toward the best solutions. Keyword research has the same constraint: the search space is effectively infinite, manual exploration is slow, and most tools just hand you a static spreadsheet. A genetic algorithm treats keywords like trading strategies and Google Trends like historical price data.
Why traditional keyword research tools fall short
Google Ads Keyword Planner reports search volume with a 1-3 month data lag. If a keyword started trending last week, Keyword Planner won't show it for months. By the time you see it, your competitors already rank for it.
Most keyword tools also give you a flat list. They tell you what people searched for historically, but not what's accelerating right now. A keyword with 500 monthly searches and rising 40% week-over-week is far more valuable than one with 2,000 monthly searches and declining volume. Static volume doesn't capture momentum.
The genetic algorithm approach solves both problems. It uses Google Trends for real-time signal and calculates velocity (week-over-week change) to surface keywords that are rising right now, not keywords that were popular three months ago.
How the two parallel evolution streams work
The system runs two independent keyword populations simultaneously. One stream targets "self-solver" founders — people researching how to build products themselves. The other targets "hire-intent" founders — people ready to pay an agency. These audiences search differently, so they need separate evolutionary pressure.
Each stream maintains its own population of ~15 keywords, its own graveyard of killed terms (O(1) lookup prevents dead keywords from resurrecting), and its own fitness rankings. Both streams run in parallel threads and update a shared terminal dashboard in real time.
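The per-stream state can be sketched roughly like this — class and method names here are illustrative, not the production implementation. The key detail is that the graveyard is a set, which gives the O(1) membership check that keeps dead keywords from re-entering the gene pool:

```python
# Minimal sketch of per-stream evolution state (names are illustrative).
from dataclasses import dataclass, field

@dataclass
class Stream:
    name: str
    population: list[str] = field(default_factory=list)  # ~15 live keywords
    graveyard: set[str] = field(default_factory=set)     # set => O(1) lookup

    def kill(self, keyword: str) -> None:
        """Remove a keyword and bar it from ever re-entering the population."""
        if keyword in self.population:
            self.population.remove(keyword)
        self.graveyard.add(keyword)

    def admit(self, candidates: list[str]) -> list[str]:
        """Accept only candidates that have never been killed."""
        fresh = [k for k in candidates if k not in self.graveyard]
        self.population.extend(fresh)
        return fresh

self_solver = Stream("self-solver")
self_solver.admit(["how to build a saas mvp", "no code app builder"])
self_solver.kill("no code app builder")
self_solver.admit(["no code app builder"])  # rejected: it's in the graveyard
```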
How anchor-based normalization makes Google Trends scores comparable
Google Trends only allows 5 keywords per API request, and the scores it returns are relative within each batch. A keyword scoring 80 in one batch might score 30 in another batch, depending on what it's compared against. This makes raw scores useless for ranking keywords across different batches.
The fix is anchor-based normalization. Every batch includes the same anchor keyword. After scores come back, we divide each keyword's score by the anchor's score in that batch. Now all keywords are expressed relative to the same baseline and are directly comparable across any number of batches.
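In code, the normalization step is a one-liner per keyword. This is a sketch — the anchor keyword and the dict-shaped batch scores are assumptions, not DataForSEO's actual response format:

```python
# Sketch of anchor-based normalization. The anchor keyword and the
# dict-of-scores shape are assumptions, not the real API response format.
ANCHOR = "seo"  # the same anchor keyword is included in every 5-keyword batch

def normalize_batch(scores: dict[str, float], anchor: str = ANCHOR) -> dict[str, float]:
    """Express each keyword's Trends score relative to the shared anchor."""
    anchor_score = scores.get(anchor, 0.0)
    if anchor_score == 0:
        # Anchor scored zero in this batch; the batch can't be trusted.
        return {k: 0.0 for k in scores if k != anchor}
    return {k: v / anchor_score for k, v in scores.items() if k != anchor}

# Two batches with very different raw scales...
batch_a = {"seo": 40, "ai seo tools": 80}
batch_b = {"seo": 10, "keyword momentum": 30}

# ...become directly comparable once divided by the anchor's score.
print(normalize_batch(batch_a))  # {'ai seo tools': 2.0}
print(normalize_batch(batch_b))  # {'keyword momentum': 3.0}
```

Raw scores of 80 and 30 would rank the first keyword higher; normalized, the second keyword wins 3.0 to 2.0.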
This is a real engineering problem that most people building on top of Google Trends don't even realize exists. Without normalization, your fitness function is comparing apples to oranges and your evolution will drift randomly instead of converging on the best keywords.
How the fitness function scores keywords
Each keyword gets a fitness score calculated as:
fitness = level * (1 + velocity_pct / 100)
The level is the anchor-normalized Google Trends score, fetched over a 30-day window and averaged across the most recent 7 days. The velocity_pct is the week-over-week percentage change in that score. A keyword with a level of 50 and a velocity of +20% gets a fitness of 60. A keyword with the same level but -20% velocity gets a fitness of 40.
One important guard rail: keywords below level 5 don't receive velocity boosts. 250% of nothing is still nothing. This prevents near-zero keywords from looking artificially promising just because they ticked up from 1 to 3.
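The formula and guardrail fit in a few lines. One assumption in this sketch: below the level threshold, the score passes through unchanged rather than being zeroed out:

```python
# Sketch of the fitness function. Treating sub-threshold keywords as
# pass-through (rather than zeroing them) is an assumption of this sketch.
MIN_LEVEL = 5  # below this, velocity boosts are ignored

def fitness(level: float, velocity_pct: float) -> float:
    """fitness = level * (1 + velocity_pct / 100), with a low-level guardrail."""
    if level < MIN_LEVEL:
        return level  # too little signal to trust the velocity
    return level * (1 + velocity_pct / 100)

assert fitness(50, 20) == 60.0    # rising keyword gets boosted
assert fitness(50, -20) == 40.0   # declining keyword gets penalized
assert fitness(3, 250) == 3       # near-zero keyword gets no artificial boost
```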
How LLM mutations stay grounded in real search data
The biggest risk with using an LLM for keyword generation is hallucination. GPT will happily suggest keywords that sound plausible but that nobody actually searches for. We solved this by grounding every mutation in Google's own data.
After each generation, we fetch Google's "related queries" for the top surviving keywords. These are real search terms that real people actually typed. We feed those related queries to the LLM as seeds, and the LLM's job is to recombine and rephrase them — not to invent from scratch.
The related queries are cached to disk because Google's suggestions don't change rapidly. This avoids redundant API calls and keeps costs low across multiple runs.
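A minimal version of that cache looks like the following — the cache path and the injected fetch function are hypothetical stand-ins for the real API call:

```python
# Sketch of the related-queries disk cache. The cache path and the injected
# `fetch` callable are hypothetical; `fetch` stands in for the real API call.
import json
from pathlib import Path

CACHE_DIR = Path("cache/related_queries")

def related_queries(keyword: str, fetch) -> list[str]:
    """Return related queries for a keyword, checking disk before the API."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cache_file = CACHE_DIR / f"{keyword.replace(' ', '_')}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())  # cache hit: no API call
    queries = fetch(keyword)                       # cache miss: hit the API
    cache_file.write_text(json.dumps(queries))
    return queries
```

Because the suggestions are stable over days, a simple file-per-keyword cache with no expiry is good enough; repeated runs only pay for keywords they haven't seen before.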
Why we ended up on DataForSEO after trying five other approaches
Getting reliable Google Trends data programmatically turned out to be the hardest part of the build. Here's the path we took:
pytrends (free Google Trends library) — blocked by Google on all datacenter IPs. Works fine locally, dies in production.
UK datacenter proxies — HTTP 429 on every request. Google fingerprints datacenter IP ranges aggressively.
Israeli residential proxies — marketed as residential, turned out to be datacenter IPs. Same 429 errors.
SerpWow — had a Google Trends endpoint, but required a subscription tier we didn't need for a single feature.
DataForSEO — Google Trends API that actually works. Pay-as-you-go at $0.009 per call, 5 keywords per call, 2,000 requests per minute rate limit. This is what we run in production.
With 50 concurrent workers hitting DataForSEO in parallel via Python's ThreadPoolExecutor, we score 30 keywords in seconds. All batches fire at once and resolve concurrently.
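The fan-out pattern is standard ThreadPoolExecutor usage. In this sketch, `score_batch` is a stand-in for the real DataForSEO call, and the batch size of 5 matches the per-request keyword limit:

```python
# Sketch of parallel batch scoring. `score_batch` stands in for the real
# DataForSEO API call; names here are illustrative.
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 5    # Google Trends allows 5 keywords per request
MAX_WORKERS = 50

def score_all(keywords: list[str], score_batch) -> dict[str, float]:
    """Fire all 5-keyword batches concurrently and merge the results."""
    batches = [keywords[i:i + BATCH_SIZE]
               for i in range(0, len(keywords), BATCH_SIZE)]
    scores: dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        # pool.map preserves batch order and blocks until all batches resolve
        for batch_scores in pool.map(score_batch, batches):
            scores.update(batch_scores)
    return scores
```

Threads (not processes) are the right tool here because the work is I/O-bound: each worker spends nearly all its time waiting on the network, so the GIL costs almost nothing.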
How velocity scoring evolved from broken to reliable
Our first attempt at measuring keyword momentum used daily OLS (ordinary least squares) regression slope over a 30-day window. The idea was simple: fit a line to the daily scores and use the slope as velocity. It failed because Google Trends data has a strong weekend dip pattern. Almost every keyword showed a negative slope, even ones that were genuinely trending upward week-over-week.
The fix was switching from daily granularity to a week-over-week percentage change calculation. By comparing the average of the last 7 days against the average of the prior 7 days, the weekend noise cancels out and you get a clean directional signal. Rising keywords show positive velocity, declining keywords show negative velocity, and the magnitude tells you how fast.
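The week-over-week calculation can be sketched in a few lines, assuming a list of daily scores with the most recent day last:

```python
# Sketch of week-over-week velocity from ~30 days of daily scores
# (most recent day last). Averaging 7-day windows cancels the weekend dip.
def velocity_pct(daily: list[float]) -> float:
    """Percent change of the last 7-day average vs. the prior 7-day average."""
    last_week = sum(daily[-7:]) / 7
    prior_week = sum(daily[-14:-7]) / 7
    if prior_week == 0:
        return 0.0  # no prior signal to compare against
    return (last_week - prior_week) / prior_week * 100

# Flat trend with weekend dips: velocity correctly reads 0, where a daily
# OLS slope fit to the same sawtooth data would be skewed by the dips.
flat = [50, 50, 50, 50, 50, 30, 30] * 4  # 28 days, weekends dip to 30
print(round(velocity_pct(flat), 1))  # 0.0
```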
What 15 iterations of evolution actually produce
A typical run executes 15 generations across both streams, evaluating 100+ unique keywords in about 10 minutes. The terminal dashboard (built with Python's Rich library) shows both streams side by side with live-updating Level, Velocity%, and Fitness columns.
In the first few generations, most keywords are generic and low-scoring. By generation 5-7, the population starts clustering around high-fitness niches. By generation 12-15, the survivors are specific, high-momentum phrases that you would never have brainstormed manually.
The graveyard grows with each generation. Every killed keyword is permanently excluded, which forces the LLM to explore new territory instead of rediscovering the same dead-end terms. This is the same principle as tabu search in optimization theory — memory of bad solutions prevents cycling.
The forex backtesting connection that inspired this approach
In quantitative forex trading on resource-constrained systems, you'd start with a population of trading strategies (parameter sets), score them against historical price data, kill the worst performers, breed the winners through crossover and mutation, and repeat until the population converged on profitable configurations.
SEO keyword research maps onto this framework cleanly. Keywords are the "strategies." Google Trends scores are the "backtest results." The fitness function is the "equity curve." LLM mutation is the "crossover operator." The graveyard is the "strategy blacklist." And convergence means you've found the keywords worth targeting.
The advantage over brute force is the same in both domains: you explore a vast search space intelligently instead of exhaustively. You don't need to evaluate every possible keyword. You just need to evolve toward the best ones.
The complete system at a glance
A genetic algorithm for SEO keyword research automates the discovery of high-momentum search terms by combining real Google Trends data, anchor-based normalization for cross-batch comparability, a fitness function that rewards rising keywords, and LLM mutations grounded in Google's own related queries. The system scores keywords in parallel, tracks velocity as week-over-week change, and permanently excludes dead terms to force exploration. After 15 generations, it surfaces specific, high-fitness keywords that manual research would miss entirely. We built it because static keyword tools don't capture momentum, and momentum is what determines whether you're optimizing for where search traffic is going or where it's already been.