Scoping the Prompts
Choosing the questions a site lives or dies on, before you measure anything.
The answers AI gives shift depending on exactly what you ask. A business can be invisible for one question and dominate a close cousin of it. So the questions you pick decide whether everything after this is right. Get them wrong and you get a confident answer to the wrong question. This is where most of the care goes.
Questions come in clusters
We track clusters of phrasings rather than single prompts. A cluster is one thing a buyer wants to know, written out a few different ways, and scored together. People ask the same thing a dozen ways, and a model answers each one a little differently, so a cluster holds up better than any single wording. A typical site has somewhere around eight to fifteen of them, which leaves you with a handful of readable numbers instead of sixty noisy ones.
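A minimal sketch of scoring a cluster as a unit, assuming each run of a phrasing is recorded as a simple hit or miss. The `Cluster` class, its field names, the second example phrasing, and the choice to average per-phrasing hit rates are all illustrative assumptions, not a description of the production scoring.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """One thing a buyer wants to know, written several ways (hypothetical structure)."""
    name: str
    phrasings: list[str]
    # phrasing -> one bool per run: did the site show up in that answer?
    runs: dict[str, list[bool]] = field(default_factory=dict)

    def record(self, phrasing: str, appeared: bool) -> None:
        if phrasing not in self.phrasings:
            raise ValueError(f"{phrasing!r} is not part of cluster {self.name!r}")
        self.runs.setdefault(phrasing, []).append(appeared)

    def score(self) -> float | None:
        """Average the per-phrasing hit rates so every wording counts equally."""
        rates = [sum(hits) / len(hits) for hits in self.runs.values() if hits]
        return sum(rates) / len(rates) if rates else None


cluster = Cluster(
    name="category",
    # The second phrasing is a made-up variant for illustration.
    phrasings=["best AI visibility tracker", "top tools to track AI visibility"],
)
cluster.record("best AI visibility tracker", True)
cluster.record("best AI visibility tracker", False)
cluster.record("top tools to track AI visibility", True)
print(cluster.score())  # 0.75
```

Averaging per phrasing first keeps one heavily sampled wording from dominating the cluster's number.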
Two kinds of question, because they tell you different things
When we ran the two kinds side by side, they behaved completely differently. Ask a buying question like "best AI visibility tracker" and the model names a specific product in nearly every run, so those questions tell you your recommendation rate. Ask an informational question like "how do I get my site cited in ChatGPT" and the model usually answers without naming anyone, but it cites sources in almost every run, so those tell you whether your content is getting picked up. A good set has both kinds in it. A set that is all buying questions never shows you your citation problem, and a set that is all informational never shows you whether you get recommended.
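As a rough sketch of why a set needs both kinds, the two rates can be computed from separate pools of runs. The `Run` record and the `rates` function are hypothetical names, and treating each run as a pair of yes/no outcomes is a simplifying assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    kind: str          # "buying" or "informational"
    recommended: bool  # the answer named the site as a recommendation
    cited: bool        # the site appeared among the answer's cited sources


def rates(runs: list[Run]) -> dict[str, float | None]:
    """Recommendation rate from buying runs, citation rate from informational runs."""
    buying = [r for r in runs if r.kind == "buying"]
    info = [r for r in runs if r.kind == "informational"]
    return {
        "recommendation_rate": sum(r.recommended for r in buying) / len(buying) if buying else None,
        "citation_rate": sum(r.cited for r in info) / len(info) if info else None,
    }


runs = [
    Run("buying", recommended=True, cited=False),
    Run("buying", recommended=False, cited=False),
    Run("informational", recommended=False, cited=True),
]
print(rates(runs))
# {'recommendation_rate': 0.5, 'citation_rate': 1.0}
```

A set with only buying runs returns `None` for the citation rate, which is the blind spot described above.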
Three levels, looked at separately
We look at three levels separately, because a business can win at one and lose at another.
- The broad category question: "best AI visibility tracker."
- The specific service or feature: "best tool for tracking AI Overview citations."
- The brand itself: "is Visibility Kit any good," "tell me about Visibility Kit."
The split between the broad question and the specific ones is also how a hidden hole shows up. A business can lead the broad question and be completely missing on one specific service that a competitor owns. We never assume that hole is there. We check each service line on its own and let the data show whether it is.
One rule across all three: keep your own brand name out of the buying-question set. Naming yourself in the question inflates the score. Brand questions stay at the brand level.
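One way to enforce that rule is a simple filter that routes any question mentioning the brand to the brand level. This is a sketch only: `split_by_brand` is a hypothetical helper, and case-insensitive substring matching is an assumption that would miss misspellings or abbreviations.

```python
def split_by_brand(questions: list[str], brand_names: list[str]) -> tuple[list[str], list[str]]:
    """Return (buying_questions, brand_questions) based on brand-name mentions."""
    names = [n.casefold() for n in brand_names if n.strip()]
    buying: list[str] = []
    brand: list[str] = []
    for question in questions:
        mentions_brand = any(n in question.casefold() for n in names)
        (brand if mentions_brand else buying).append(question)
    return buying, brand


buying, brand = split_by_brand(
    [
        "best AI visibility tracker",
        "is Visibility Kit any good",
        "tell me about Visibility Kit",
    ],
    brand_names=["Visibility Kit"],
)
print(buying)  # ['best AI visibility tracker']
print(brand)   # ['is Visibility Kit any good', 'tell me about Visibility Kit']
```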
Where the questions come from
There is no public dataset of what people actually ask AI tools. The "AI search volume" numbers you can buy turn out to be counts of Google "People Also Ask" questions, not real AI usage. We tested one such volume field against twenty of our own mined questions, and it returned zero for all twenty. So we never make questions up, and we never treat a proxy as if it were real AI behavior. We pull candidates from four real sources and lean hardest on the ones we can corroborate.
- Ask the AI models directly. We ask several frontier models, the ones people actually use, for the questions real people bring to them around a topic. This is the only source that surfaces intent before it ever lands in a keyword tool, and it is where the most interesting findings come from.
- Real Google questions. We mine Google's People Also Ask, going one level deep, and weight each question by how often it recurs. The questions are real, the weighting is transparent, and the text doubles as a content brief.
- What competitors already answer. The businesses ranking and getting cited for your topics have published, in effect, a validated question list. We read their pages for the questions they answer, which tells us what to track and what to build to outdo them.
- Your own data. We use client reviews, the questions customers ask over the phone and by email, and Google Search Console. Search Console is authentic, but it only shows what you already appear for, so it extends your footprint rather than finding gaps. Quora and Reddit help too, since the engines lean on them.
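The recurrence weighting mentioned for People Also Ask questions could look something like the sketch below. It assumes the mined questions arrive as a flat list with repeats, and that light normalization (case, whitespace, trailing question mark) is enough to treat two strings as the same question. Both `normalize` and `weight_by_recurrence` are hypothetical names, and the sample questions are made up.

```python
from collections import Counter


def normalize(question: str) -> str:
    """Collapse case, whitespace, and trailing question marks."""
    return " ".join(question.casefold().split()).rstrip("? ")


def weight_by_recurrence(observations: list[str]) -> dict[str, float]:
    """Weight each distinct question by its share of all observations."""
    counts = Counter(normalize(q) for q in observations)
    total = sum(counts.values())
    return {question: n / total for question, n in counts.most_common()}


weights = weight_by_recurrence(
    [
        "How do I get my site cited in ChatGPT?",
        "how do I get my site cited in ChatGPT",
        "What is AI visibility?",
        "How do I get my site cited in ChatGPT?",
    ]
)
print(weights)
# {'how do i get my site cited in chatgpt': 0.75, 'what is ai visibility': 0.25}
```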
How we tell real demand from invention
Asking the models directly is powerful, but a model can volunteer a question nobody actually asks. So we do not take any single source on faith. We check each question against three independent signals: the AI models, real Google demand (a matching People Also Ask question or genuine search volume), and competitor content. Anything two or more of those back becomes core. Anything only one of them raised is tracked at lower priority or held for testing.
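The corroboration rule reduces to counting how many of the three signals back a question. The sketch below assumes each candidate has already been tagged with the signals that support it. The signal labels, the tier names, and the `tier` function are illustrative assumptions.

```python
SIGNALS = frozenset({"ai_models", "google_demand", "competitor_content"})


def tier(backing: set[str]) -> str:
    """Two or more independent signals make a question core; one keeps it on watch."""
    supported = len(backing & SIGNALS)
    if supported >= 2:
        return "core"
    if supported == 1:
        return "watch"  # lower priority, or held for testing
    return "unsupported"


candidates = {
    "best AI visibility tracker": {"ai_models", "google_demand", "competitor_content"},
    "best tool for tracking AI Overview citations": {"ai_models", "competitor_content"},
    "a question only one model volunteered": {"ai_models"},
}
for question, backing in candidates.items():
    print(f"{tier(backing):<6} {question}")
```

Intersecting with `SIGNALS` means an unrecognized label never counts toward corroboration by accident.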
The interesting part is the questions that several models independently raise but Google has no footprint for at all. Those are not noise. They are demand that already exists on the AI surface before it shows up in any keyword tool. A brand-new program or regulation is the clearest case: the broad "does it cover this" question already has search volume, but the specific local version has none yet, while every model fields it. That is real intent your competitors' keyword tools cannot see, and it is the part we would build content for first.
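Finding those questions is essentially a set difference: raised by several models, absent from Google. The sketch below is a simplification under stated assumptions. `ai_only_demand` is a hypothetical helper, the model labels are placeholders, "several" is read as at least two, and matching against Google questions is by exact text after case folding, which real data would need to loosen.

```python
def ai_only_demand(
    model_hits: dict[str, set[str]],
    google_questions: list[str],
    min_models: int = 2,
) -> list[str]:
    """Questions raised by at least `min_models` models with no Google footprint."""
    google = {q.casefold() for q in google_questions}
    return sorted(
        question
        for question, models in model_hits.items()
        if len(models) >= min_models and question.casefold() not in google
    )


model_hits = {
    "does the new program cover this in my city": {"model_a", "model_b", "model_c"},
    "does the new program cover this": {"model_a", "model_b"},
    "a question one model volunteered": {"model_a"},
}
google_questions = ["Does the new program cover this"]

print(ai_only_demand(model_hits, google_questions))
# ['does the new program cover this in my city']
```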
Locking a core set
The set is not frozen forever, and it is not rebuilt from scratch each time. We lock a core set so the numbers stay comparable over time, and spend a small budget each cycle trying new candidates: questions the models start raising, fresh competitor pages, new Search Console queries. The ones that prove out get promoted into the core, and the dead ones get dropped. We refresh the sourcing monthly.
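A sketch of one refresh cycle, assuming the judgment of which trial questions "proved out" and which core questions are "dead" is made elsewhere and passed in. The function names `plan_trials` and `apply_cycle`, the budget of two, and the example questions are all hypothetical.

```python
def plan_trials(core: list[str], candidates: list[str], budget: int) -> list[str]:
    """Pick up to `budget` new candidates that are not already in the core set."""
    seen = set(core)
    trials: list[str] = []
    for question in candidates:
        if len(trials) >= budget:
            break
        if question not in seen:
            seen.add(question)
            trials.append(question)
    return trials


def apply_cycle(core: list[str], proved_out: list[str], dead: set[str]) -> list[str]:
    """Drop dead questions, then promote the trials that proved out."""
    kept = [q for q in core if q not in dead]
    promoted = [q for q in proved_out if q not in kept and q not in dead]
    return kept + promoted


core = ["best AI visibility tracker", "best tool for tracking AI Overview citations"]
candidates = ["best AI visibility tracker", "new question A", "new question B", "new question C"]

trials = plan_trials(core, candidates, budget=2)
print(trials)  # ['new question A', 'new question B']

core = apply_cycle(core, proved_out=["new question A"], dead={"best tool for tracking AI Overview citations"})
print(core)  # ['best AI visibility tracker', 'new question A']
```

Keeping the surviving core questions in their original order is what lets their numbers stay comparable from one cycle to the next.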
This is the part of the method that loops. The same sources we use to find questions come back later as things we measure, and what we learn from measuring sharpens the next round.
What it feeds
The locked clusters are what the Measure step runs. Working out a client's set costs only cents, so we run it at onboarding and refresh it cheaply.