Scoping the Prompts
Choosing the questions a site lives or dies on, before you measure anything.
The answers AI models give shift depending on exactly what you ask. A business can be invisible for one question and dominate a close cousin of it. So the questions you pick decide whether every measurement that follows means anything. Get them wrong and you get a confident answer to the wrong question. This is where most of the care goes.
Questions come in clusters
We track clusters of phrasings rather than single prompts. A cluster is one thing a buyer wants to know, written out a few different ways and scored together. People ask the same thing a dozen ways, and a model answers each one a little differently, so a cluster gives a steadier reading than any single wording. A typical site has roughly eight to fifteen clusters, which leaves you with a handful of readable numbers instead of sixty noisy ones.
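One way to picture "scored together" is a minimal sketch like the one below. It assumes each model run of each phrasing is logged as a yes/no for whether the brand was named, and that the cluster score is the pooled rate across all phrasings and runs. The `Cluster` class, its fields, and the sample runs are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """One buyer intent written several ways (hypothetical structure)."""
    intent: str
    phrasings: list[str]
    # phrasing -> one boolean per model run: True if the brand was named in that run
    results: dict[str, list[bool]] = field(default_factory=dict)

    def record(self, phrasing: str, named: bool) -> None:
        """Log one run of one phrasing."""
        if phrasing not in self.phrasings:
            raise ValueError(f"unknown phrasing: {phrasing!r}")
        self.results.setdefault(phrasing, []).append(named)

    def score(self) -> float:
        """Pooled hit rate across every run of every phrasing (0.0 if nothing recorded)."""
        runs = [hit for hits in self.results.values() for hit in hits]
        return sum(runs) / len(runs) if runs else 0.0


cluster = Cluster(
    intent="choose an AI visibility tracker",
    phrasings=[
        "best AI visibility tracker",
        "what tool tracks my brand in ChatGPT answers",
        "AI search visibility software recommendations",
    ],
)
# Made-up sample: three runs per phrasing; each wording lands a little differently.
sample_runs = [[True, True, False], [False, False, True], [True, False, True]]
for phrasing, hits in zip(cluster.phrasings, sample_runs):
    for hit in hits:
        cluster.record(phrasing, hit)

print(f"{cluster.intent}: {cluster.score():.2f}")  # 5 hits in 9 runs -> 0.56
```

No single phrasing here scores the same as another (2/3, 1/3, 2/3), but the cluster reports one number.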
Two kinds of question, because they tell you different things
When we ran buying and informational questions side by side, they behaved completely differently. Ask a buying question like "best AI visibility tracker" and the model names a specific product in nearly every run, so buying questions tell you your recommendation rate. Ask an informational question like "how do I get my site cited in ChatGPT" and the model usually answers without naming anyone, but it cites sources in almost every run, so informational questions tell you whether your content is getting picked up. A good set has both kinds. A set of only buying questions never shows you a citation problem, and a set of only informational questions never shows you whether you get recommended.
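A rough sketch of how the two kinds yield two different numbers, assuming each answer is stored with the businesses it names and the domains it cites. The `Run` record, the `RivalTool` competitor, and the `visibilitykit.example` domain are hypothetical. Returning `None` when a set has no questions of a kind makes the blind spot explicit: that metric simply cannot be measured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    """One model answer to one question (hypothetical record shape)."""
    kind: str                # "buying" or "informational"
    brands_named: set[str]   # businesses or products named in the answer
    cited_domains: set[str]  # domains the answer cites as sources


def _rate(runs: list[Run], kind: str, hit) -> Optional[float]:
    """Share of runs of one kind where hit(run) is true; None if the set has none of that kind."""
    subset = [r for r in runs if r.kind == kind]
    if not subset:
        return None  # cannot be measured at all with this question set
    return sum(1 for r in subset if hit(r)) / len(subset)


def recommendation_rate(runs: list[Run], brand: str) -> Optional[float]:
    return _rate(runs, "buying", lambda r: brand in r.brands_named)


def citation_rate(runs: list[Run], domain: str) -> Optional[float]:
    return _rate(runs, "informational", lambda r: domain in r.cited_domains)


runs = [
    Run("buying", {"Visibility Kit", "RivalTool"}, set()),
    Run("buying", {"RivalTool"}, set()),
    Run("informational", set(), {"example.com", "visibilitykit.example"}),
    Run("informational", set(), {"example.com"}),
]
print(recommendation_rate(runs, "Visibility Kit"))       # 0.5
print(citation_rate(runs, "visibilitykit.example"))      # 0.5
print(citation_rate(runs[:2], "visibilitykit.example"))  # None: buying-only set
```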
Three levels, looked at separately
We score three levels on their own, because a business can win at one and lose at another.
- The broad category question: "best AI visibility tracker."
- The specific service or feature: "best tool for tracking AI Overview citations."
- The brand itself: "is Visibility Kit any good," "tell me about Visibility Kit."
The split between the broad question and the specific ones is also how a hidden gap shows up. A business can lead the broad question and be completely missing on one specific service that a competitor owns. We never assume that gap is there. We check each service line on its own and let the data show whether it is.
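Checking each service line on its own could look like the sketch below, assuming a recommendation rate per business is already available for each service line. The function name, the `RivalTool` competitor, and the rates are made up. A business that is never named on a service counts as 0, which is the "completely missing" case.

```python
def find_service_gaps(
    service_shares: dict[str, dict[str, float]], brand: str
) -> dict[str, str]:
    """Return {service: competitor} for each service line where a competitor out-scores the brand.

    service_shares maps each service line to {business: recommendation rate}.
    Each line is checked on its own; nothing is inferred from the category result.
    """
    gaps = {}
    for service, shares in service_shares.items():
        mine = shares.get(brand, 0.0)  # never named at all counts as 0
        rivals = {b: s for b, s in shares.items() if b != brand}
        if not rivals:
            continue
        top = max(rivals, key=rivals.get)
        if rivals[top] > mine:
            gaps[service] = top
    return gaps


# Made-up rates: the brand leads the broad category question...
category = {"Visibility Kit": 0.8, "RivalTool": 0.5}
# ...but the service lines are scored separately.
services = {
    "AI Overview citation tracking": {"RivalTool": 0.7},
    "ChatGPT mention tracking": {"Visibility Kit": 0.75, "RivalTool": 0.4},
}
print(max(category, key=category.get))                # Visibility Kit
print(find_service_gaps(services, "Visibility Kit"))  # {'AI Overview citation tracking': 'RivalTool'}
```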
One rule holds across the levels: keep your own brand name out of the buying questions. Naming yourself in the question all but guarantees a mention and inflates the score. Brand questions stay at the brand level.
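A minimal sketch of enforcing that rule mechanically: any candidate that mentions the brand, case-insensitively and on word boundaries, is routed to the brand level instead of the buying set. The function name and the `VisibilityKit` alias are illustrative assumptions; a real brand may need more aliases and misspellings listed.

```python
import re


def split_by_brand(questions: list[str], brand_terms: list[str]) -> tuple[list[str], list[str]]:
    """Split questions into (no brand mention, brand mention), case-insensitively."""
    if not brand_terms:
        raise ValueError("need at least one brand term")
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in brand_terms) + r")\b", re.IGNORECASE
    )
    unbranded, branded = [], []
    for q in questions:
        (branded if pattern.search(q) else unbranded).append(q)
    return unbranded, branded


questions = [
    "best AI visibility tracker",
    "best tool for tracking AI Overview citations",
    "is Visibility Kit any good",
    "is visibilitykit worth it",
]
buying, brand_level = split_by_brand(questions, ["Visibility Kit", "VisibilityKit"])
print(buying)       # ['best AI visibility tracker', 'best tool for tracking AI Overview citations']
print(brand_level)  # ['is Visibility Kit any good', 'is visibilitykit worth it']
```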
Where the questions come from
There is no public dataset of what people actually ask AI tools. The "AI search volume" numbers you can buy turn out to be counts built from Google's "People Also Ask" questions, not measurements of real AI usage. We tested one of those fields against twenty of our own mined questions and it came back zero for all twenty. So we never make questions up, and we never treat a proxy as if it were real AI behavior. We pull candidates from four real sources and lean hardest on the ones we can corroborate.
- Ask the AI models directly. We ask several frontier models, the ones people actually use, for the questions real people bring to them around a topic. This is the only source that surfaces intent before it ever lands in a keyword tool, and it is where the most interesting findings come from.
- Real Google questions. We mine Google's People Also Ask, expanding one level of follow-up questions, and weight each question by how often it recurs. The questions are real, the weighting is transparent, and the text doubles as a content brief.
- What competitors already answer. The businesses ranking and getting cited for your topics have published, in effect, a validated question list. We read their pages for the questions they answer, which tells us what to track and what to build to outdo them.
- Your own data. Client reviews, the questions customers ask over the phone and by email, and Google Search Console. Search Console data is genuine, but it only shows queries you already appear for, so it extends your footprint rather than finding gaps. Quora and Reddit help too, since the engines lean on them.
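The recurrence weighting on People Also Ask questions can be sketched as below. It assumes the PAA questions have already been collected as one list per seed search (collection is not shown), and it reads "how often it recurs" as the share of result sets a question appears in. That reading, the normalization, and the sample questions are assumptions.

```python
import re
from collections import Counter


def normalize(question: str) -> str:
    """Lowercase and strip punctuation so trivial variants count as one question."""
    return re.sub(r"[^\w\s]", "", question.lower()).strip()


def weight_by_recurrence(result_sets: list[list[str]]) -> list[tuple[str, float]]:
    """Weight each question by the share of collected PAA result sets it appears in."""
    counts = Counter()
    for questions in result_sets:
        counts.update({normalize(q) for q in questions})  # once per result set
    total = len(result_sets)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(q, n / total) for q, n in ranked]


# Made-up PAA questions, one list per seed search (already expanded one level).
result_sets = [
    ["How do I get my site cited in ChatGPT?", "What is AI visibility?"],
    ["how do I get my site cited in ChatGPT", "Does ChatGPT use Google results?"],
    ["What is AI visibility?", "How do I get my site cited in ChatGPT?"],
]
for question, weight in weight_by_recurrence(result_sets):
    print(f"{weight:.2f}  {question}")
```

This prints the cited-in-ChatGPT question at 1.00, "what is ai visibility" at 0.67, and the Google-results question at 0.33. Because the weight is just a share, anyone can see why one question outranks another.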
Why we don't just ask you for your keywords
It is tempting to start by handing over the keyword list you already track, and we tested exactly that. On a real business we ran three starting points side by side: the kind of list a person would type from memory, an automated read of the website, and the site's own search data. Each one found real questions the other two missed, and each had a blind spot you could predict in hindsight.
The typed list reproduced the business's existing view of itself. It nailed the geography they think about every day, and it completely missed two services their own homepage advertises, because nobody had ever tracked them. The website read surfaced those services but compressed the service area. And the search data only knew about places the site already ranks, which is the one thing it can ever know. A business that has never ranked in a city generates no data there, no matter how many customers it wants from it.
The lesson we took: your own picture of your business is one source, never the source. The set has to be assembled from the outside in, with your corrections on top, rather than from your head outward.
How we tell real demand from invention
Asking the models directly is powerful, but a model can volunteer a question nobody actually asks. So we do not take any single source on faith. We count three independent sources: the AI models, real Google demand (a matching People Also Ask question or genuine search volume), and competitor content. Any question two or more of those back becomes core. Anything only one source raised is tracked at lower priority or held for testing.
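The corroboration rule reduces to a small function. In this sketch the source labels are hypothetical, and several models agreeing still count as one source (`ai_models`). Treating a candidate backed by none of the three counted sources (for example, Search Console only) the same as a single-source one is our assumption; the rule above does not say.

```python
# The three independent sources the rule counts (labels are hypothetical).
COUNTED_SOURCES = {"ai_models", "google_demand", "competitor_content"}


def tier(backing: set[str]) -> str:
    """'core' if two or more counted sources agree; otherwise 'test' (lower priority)."""
    agreeing = backing & COUNTED_SOURCES
    return "core" if len(agreeing) >= 2 else "test"


candidates = {
    "how much does an AI visibility audit cost": {"ai_models", "google_demand"},
    "best AI visibility tracker": {"ai_models", "google_demand", "competitor_content"},
    "can ChatGPT see my pricing page": {"ai_models"},  # many models, still one source
    "ai visibility tracker for agencies": {"search_console"},  # not a counted source
}
for question, backing in candidates.items():
    print(f"{tier(backing):4}  {question}")
```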
The most interesting questions are the ones several models raise independently but Google has no footprint for at all. Those are not noise. They are demand that already exists on the AI surface before it shows up in any keyword tool. A brand-new program or regulation is the clearest case: the broad "does it cover this" question already has search volume, but the specific local version has none yet, while every model fields it. That is real intent your competitors' keyword tools cannot see, and it is the part we would build content for first.
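Flagging that AI-only demand could look like this sketch. It assumes we know which models volunteered each question and whether Google shows any footprint (a PAA match or nonzero volume). The threshold of three models for "several", the model names, the rebate-program questions, and the volume figure are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    question: str
    models_raising: set[str]  # which models independently volunteered the question
    paa_match: bool           # a matching People Also Ask question exists
    search_volume: int        # Google search volume; 0 if none is reported


def ai_only_demand(c: Candidate, min_models: int = 3) -> bool:
    """Several models raise it, but Google shows no footprint at all."""
    no_google_footprint = not c.paa_match and c.search_volume == 0
    return len(c.models_raising) >= min_models and no_google_footprint


models = {"model_a", "model_b", "model_c"}
broad = Candidate("does the new rebate program cover heat pumps", models, True, 900)
local = Candidate("does the new rebate program cover heat pumps in Springfield", models, False, 0)
stray = Candidate("is the rebate program ending next week", {"model_a"}, False, 0)

for c in (broad, local, stray):
    print(ai_only_demand(c), c.question)
```

Only the local version is flagged: the broad question already has a Google footprint, and the stray one comes from a single model.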
Staying close to the money
There is a failure mode in question sourcing worth naming: the sources above will happily produce thousands of questions, and most of them are content-marketing material rather than visibility material. "What's the best material for a porch in hot weather" is a real question with real demand, but the person asking it is reading an article, not choosing a builder. Cataloging every question like that is a legitimate project. It is a content project, not this one.
So every candidate must pass a commercial-proximity test before it gets tracked. The buying questions are built deliberately as a grid of your services against your markets, in the plain forms buyers actually use, because how often you come up for those is the entire point. The informational side is capped to the questions asked by someone actively scoping a project: what it costs, how financing works, how to vet a provider, whether their situation is buildable. Everything else is still kept, just somewhere more useful. It becomes the input list for content planning, where those thousands of questions belong.
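A sketch of both halves, under stated assumptions. The buying grid is a plain cross product of services, markets, and phrasing templates, all hypothetical here. The proximity test is stood in for by a crude keyword check on the four scoping topics. In practice this is a judgment call, and substring matching will misfire on some wordings. Nothing is discarded: non-scoping questions go to a content backlog.

```python
from itertools import product

# Hypothetical client inputs.
services = ["deck builder", "screened porch builder"]
markets = ["Austin", "Round Rock"]
templates = ["best {service} in {market}", "{service} {market}"]

# Buying side: every service against every market, in plain buyer phrasings.
buying = [t.format(service=s, market=m) for s, m, t in product(services, markets, templates)]

# Informational side: a crude keyword stand-in for the commercial-proximity test.
SCOPING_CUES = (
    "cost", "price", "how much",                # what it costs
    "financ", "loan",                           # how financing works
    "licensed", "reviews", "questions to ask",  # how to vet a provider
    "permit", "zoning", "setback",              # whether it's buildable
)


def is_scoping(question: str) -> bool:
    q = question.lower()
    return any(cue in q for cue in SCOPING_CUES)


def split_informational(questions: list[str]) -> tuple[list[str], list[str]]:
    """Return (tracked, content_backlog); nothing is thrown away."""
    tracked, backlog = [], []
    for q in questions:
        (tracked if is_scoping(q) else backlog).append(q)
    return tracked, backlog


tracked, backlog = split_informational([
    "How much does a screened porch cost?",
    "Do I need a permit for a deck in Austin?",
    "What's the best material for a porch in hot weather?",
    "How do I stain a deck?",
])
print(len(buying), buying[0])  # 8 best deck builder in Austin
print(tracked)
print(backlog)
```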
The reason the cap has to be deliberate is supply bias. Models, People Also Ask, and competitor FAQ pages all over-produce informational questions, so an uncurated set drifts informational no matter what the business actually needs measured. Left alone, you end up watching citation rates on porch-material articles while nobody is checking whether the AI recommends you in the town next door.
Locking a core set
The set is not frozen forever, and it is not rebuilt from scratch each time. We lock a core set so the numbers stay comparable over time, and spend a small budget each cycle trying new candidates: questions the models start raising, fresh competitor pages, new Search Console queries. The ones that prove out get promoted into the core, and the dead ones get dropped. We refresh the sourcing monthly.
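The lock-and-trial loop can be sketched as a small class. The text does not define what "proves out" means, so this sketch assumes each trial gets one pass/fail result per cycle. A trial is promoted after three consecutive passes and dropped after two consecutive misses. The budget, both thresholds, and all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class QuestionSet:
    core: list[str]                 # locked, so numbers stay comparable over time
    trial_budget: int = 5           # hypothetical: trial slots available per cycle
    trials: dict[str, list[bool]] = field(default_factory=dict)

    def add_candidates(self, candidates: list[str]) -> None:
        """Fill free trial slots with new candidates; the core is never rebuilt."""
        for q in candidates:
            if len(self.trials) >= self.trial_budget:
                break
            if q not in self.core and q not in self.trials:
                self.trials[q] = []

    def end_cycle(self, proved: dict[str, bool], promote_after: int = 3, drop_after: int = 2) -> None:
        """Record this cycle's pass/fail for each trial, then promote or drop it."""
        for q in list(self.trials):
            history = self.trials[q]
            history.append(proved.get(q, False))
            if len(history) >= promote_after and all(history[-promote_after:]):
                self.core.append(q)  # proved out: promoted into the core
                del self.trials[q]
            elif len(history) >= drop_after and not any(history[-drop_after:]):
                del self.trials[q]   # dead: dropped


qs = QuestionSet(core=["best AI visibility tracker"], trial_budget=2)
qs.add_candidates(["new question A", "new question B", "new question C"])  # only 2 slots
for _ in range(3):
    qs.end_cycle({"new question A": True})
print(qs.core)    # ['best AI visibility tracker', 'new question A']
print(qs.trials)  # {}
```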
This is the part of the method that loops. The same sources we use to find questions come back later as things we measure, and what we learn from measuring sharpens the next round.
What it feeds
The locked clusters are what the Measure step runs. Working out a client's set costs only cents, so we run it at onboarding and refresh it cheaply.