Measuring Visibility
Recall and grounded channels, across the engines people use, counted by frequency.
Once you know which questions matter, you run them. Measuring well mostly comes down to three choices: which channels, which engines, and what you count.
The two channels
We run every question twice.
With web search off, the model answers from training alone. This is what it believes about your category before it looks anything up, and winning here is the closest thing to a moat. It is earned slowly, over the year or more it takes for a model to be retrained. It is hard for a competitor to take from you quickly, and it does not vanish the next time someone re-crawls the web.
With web search on, the model answers from live results. This moves faster, for you and against you. New content can start surfacing here within days of publishing. A win is real but borrowed. It depends on the current state of the web and can change the next time the index updates.
We run both because the gap between them is itself the diagnosis. Being strong in training and weak in live search is a different problem, with a different fix, from being strong in live search and weak in training. The diagnose step turns that gap into a plan.
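As a rough illustration, the gap between the two channels can be turned into a label along these lines. The function name, the label wording, and the 0.6 cutoff for "strong" are assumptions made for this sketch, not values taken from the method itself.

```python
def classify_channel_gap(recall_rate: float, grounded_rate: float, strong: float = 0.6) -> str:
    """Label the gap between the web-off (recall) and web-on (grounded) channels.

    Both rates are fractions between 0 and 1. The `strong` cutoff is a
    hypothetical threshold chosen only for illustration.
    """
    recall_strong = recall_rate >= strong
    grounded_strong = grounded_rate >= strong
    if recall_strong and grounded_strong:
        return "strong in both"
    if recall_strong:
        return "strong in training, weak in live search"
    if grounded_strong:
        return "strong in live search, weak in training"
    return "weak in both"
```

Each label would then map to its own fix in the diagnose step.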
Across the engines people actually use
There is no single "AI." ChatGPT, Gemini, Claude, and Perplexity learned different things and behave differently, so we run each one and report them on their own before combining anything.
This matters more than it sounds. The engines genuinely disagree about local businesses: one will confidently name a business with web off, while another refuses to say anything about a local business until it can search. A blended number across all of them hides exactly the differences you need to act on.
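A small sketch of why the blended number misleads: it reports each engine on its own and then shows the average next to the spread that the average conceals. The engine labels and rates below are made up for illustration, and `summarize_engines` is a hypothetical helper.

```python
from statistics import mean


def summarize_engines(rate_by_engine: dict[str, float]) -> dict:
    """Keep per-engine rates visible alongside the blended mean and its spread."""
    if not rate_by_engine:
        raise ValueError("need at least one engine")
    rates = list(rate_by_engine.values())
    return {
        "per_engine": dict(rate_by_engine),
        "blended": mean(rates),
        # Spread = best engine minus worst engine; a large value means the
        # blended figure is averaging over real disagreement.
        "spread": max(rates) - min(rates),
    }


# Hypothetical web-off rates for one question: one engine always names the
# business, another never does until it can search.
summary = summarize_engines(
    {"engine_a": 1.0, "engine_b": 0.0, "engine_c": 0.5, "engine_d": 0.5}
)
```

Here the blended rate is one half, which describes none of the engines at either extreme.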
They also cite differently. Some point to the business's own site, some lean on directories and aggregators, some hand back citations you cannot even inspect. That feeds two later findings, the owned-citation gap and the citation map.
Google AI Overviews are a special case
Google AI Overviews are the one place the data is the real thing rather than a stand-in. We grab them straight from live search results, so we see exactly which questions trigger an AI answer and what that answer cites, and we can do it localized, the way it would show up in the client's own city. For the chat engines we run the prompts ourselves and treat the result as a solid stand-in for how people query them. For AI Overviews we are looking at the actual surface.
What that answer cites is a direct read on who the AI trusts for your topics, and it feeds the citation map in Reading the gaps.
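One way to picture the raw material for a citation map is a tally of cited domains, plus the share that belongs to the business itself. This is a minimal sketch: `cited_domains` and `owned_share` are hypothetical names, and the real pipeline may normalize URLs differently.

```python
from collections import Counter
from urllib.parse import urlparse


def cited_domains(urls):
    """Count how often each domain is cited, ignoring a leading 'www.'."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host:  # skip strings that did not parse as absolute URLs
            counts[host] += 1
    return counts


def owned_share(urls, owned_domain):
    """Fraction of citations that point at the business's own domain."""
    counts = cited_domains(urls)
    total = sum(counts.values())
    return counts[owned_domain] / total if total else 0.0
```

A low owned share alongside a high directory share would be one sign of the owned-citation gap.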
Count how often, not where
Because the answers are probabilistic, the thing we count is recommendation rate: how often a business comes up across many runs of the same question, per engine, per channel. Where you landed in any single answer, or one lucky screenshot, tells you almost nothing. A business that comes up in nine runs out of ten is genuinely baked in. One that showed up once, near the top, in a single run is not.
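The counting itself is simple. A minimal sketch, assuming each run is stored as a record with the question, engine, channel, and the list of businesses the answer mentioned (the record shape and the function name are assumptions for illustration):

```python
from collections import defaultdict


def recommendation_rates(runs, business):
    """Return the share of runs that mention `business`, per (question, engine, channel).

    `runs` is an iterable of dicts with the keys 'question', 'engine',
    'channel', and 'mentioned' (a list of business names in that answer).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for run in runs:
        key = (run["question"], run["engine"], run["channel"])
        totals[key] += 1
        # Only presence counts here; position within the answer is ignored.
        if business in run["mentioned"]:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}
```

Note that the rate is never pooled across engines or channels at this stage; each combination keeps its own number.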
In practice we run each question five times per engine, per channel. Five is not arbitrary. The research behind this approach and our own variance testing land in the same place: below that, you are reading noise; well above it, you are paying for runs that no longer change the answer.
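To make the volume concrete, the run plan for the chat engines can be sketched as a simple cross product. The channel labels and the `build_run_plan` helper are assumptions for illustration. Google AI Overviews are left out because they are collected from live search results rather than prompted.

```python
from itertools import product

ENGINES = ("ChatGPT", "Gemini", "Claude", "Perplexity")
CHANNELS = ("web_off", "web_on")  # hypothetical labels for the two channels
RUNS_PER_CELL = 5  # five runs per question, per engine, per channel


def build_run_plan(questions):
    """List every run to execute: question x engine x channel x repeat."""
    return [
        {"question": q, "engine": e, "channel": c, "run": r}
        for q, e, c, r in product(
            questions, ENGINES, CHANNELS, range(1, RUNS_PER_CELL + 1)
        )
    ]
```

With four engines and two channels, each tracked question costs 4 x 2 x 5 = 40 runs, which is why going well past five repeats gets expensive quickly.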
What about position?
Position inside an AI answer is the metric everyone wants and the one most likely to mislead. Across single runs it is close to random: the same question can put the same business second, fifth, or fourteenth on consecutive runs. One study put the odds of two identical ordered lists from the same prompt at less than one in a thousand. So we never report where you landed in any individual answer.
Averaged across enough runs, though, position settles down and starts carrying real signal. So we treat it as a secondary read: frequency tells you whether you are in the answer at all, and average position adds shading where frequency is already high. A business mentioned in nine of ten runs at an average position of two is in a different spot than one mentioned in nine of ten runs at position eight. Where frequency is low, we ignore position entirely, because an average built on two appearances is still noise.
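The "frequency first, position second" rule might look like this in code. The 0.6 cutoff below which position is ignored is an assumed value for the sketch, as is the function name; the text above only says that position is dropped where frequency is low.

```python
from statistics import mean


def frequency_and_position(positions, min_rate=0.6):
    """Return (recommendation rate, average position or None).

    `positions` has one entry per run: the 1-based position of the business
    in that answer, or None if it was not mentioned. The average is reported
    only when the rate reaches `min_rate` (a hypothetical threshold).
    """
    if not positions:
        raise ValueError("need at least one run")
    appearances = [p for p in positions if p is not None]
    rate = len(appearances) / len(positions)
    average = mean(appearances) if rate >= min_rate else None
    return rate, average
```

For example, nine appearances at position two out of ten runs yield a rate of 0.9 with an average position of 2, while two appearances out of ten yield a rate of 0.2 and no position at all.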
Not every question counts equally
A question about your highest-demand service should move your score more than a niche one. We weight each tracked question by the search volume of the underlying service it maps to, so the headline number reflects where the actual demand is. The mapping matters: the weight comes from the service itself, not from whatever phrasing the question happens to use, because plenty of real AI questions have phrasings no keyword tool has ever seen. A question can be brand new and still be about a high-demand service. Alongside the weighted number we always keep the plain unweighted rate visible, so a weighting choice can never quietly hide a gap.
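A minimal sketch of the weighting, assuming each tracked question carries its recommendation rate and the search volume of the service it maps to (the field names and `headline_rates` are hypothetical):

```python
def headline_rates(questions):
    """Return the volume-weighted rate alongside the plain unweighted rate.

    `questions` is a list of dicts with 'rate' (0 to 1) and 'service_volume'
    (search volume of the underlying service, not of the question phrasing).
    """
    total_volume = sum(q["service_volume"] for q in questions)
    if not questions or total_volume <= 0:
        raise ValueError("need at least one question with positive volume")
    weighted = sum(q["rate"] * q["service_volume"] for q in questions) / total_volume
    unweighted = sum(q["rate"] for q in questions) / len(questions)
    # Both numbers are returned so the weighting can never hide a gap.
    return {"weighted": weighted, "unweighted": unweighted}
```

With a rate of 0.8 on a service with volume 1,000 and a rate of 0.2 on a service with volume 100, the weighted figure is 820 / 1,100, or about 0.75, while the unweighted figure is 0.5. Showing both makes the weak niche question visible.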
What this hands to the next step
By the end you can say, for each question: how often the business gets recommended, in which channel, by which engine, and who else shows up next to it. That is what the diagnose step turns into a named gap with a fix.