How Monroya actually works.
The companion to /how-it-works for RevOps, data, and security teams. Providers, prompt generation, source extraction, scoring, and scan cadence — with the trade-offs called out.
Updated: June 2026
Providers
Every scan queries four providers in parallel. Coverage is chosen for traffic share, not catalog completeness.
| Provider | Model class | Web access | Citation payload |
|---|---|---|---|
| ChatGPT | GPT-5 family with browsing | Yes (native) | Inline URLs + structured cards |
| Claude | Sonnet 4.5 with web_search tool | Yes (tool) | Inline URLs |
| Gemini | Gemini 2.5 with grounding | Yes (Search grounding) | Structured grounding payload |
| Perplexity | Sonar online | Yes (default) | Structured citation array |
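As a rough illustration of the parallel fan-out, here is a minimal sketch using `asyncio`. The `query_provider` function is a hypothetical stub standing in for real provider API calls, and the three-runs-per-provider constant is taken from the FAQ further down; nothing here reflects Monroya's actual client code.

```python
import asyncio

PROVIDERS = ["chatgpt", "claude", "gemini", "perplexity"]
RUNS_PER_PROMPT = 3  # three runs per provider per scan, as stated in the FAQ


async def query_provider(provider: str, prompt: str, run: int) -> dict:
    """Hypothetical stand-in for a real provider API call."""
    await asyncio.sleep(0)  # yield control, as a network request would
    return {"provider": provider, "prompt": prompt, "run": run, "text": ""}


async def scan_prompt(prompt: str) -> list:
    """Send one prompt to every provider, RUNS_PER_PROMPT times each."""
    tasks = [
        query_provider(provider, prompt, run)
        for provider in PROVIDERS
        for run in range(RUNS_PER_PROMPT)
    ]
    # gather() runs the calls concurrently and returns results in task order.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    responses = asyncio.run(scan_prompt("best CRM for startups"))
    print(len(responses), "responses collected")
```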
Prompt generation
During onboarding, Monroya crawls your site, identifies your category and two-to-three primary competitors, and generates 8–12 candidate prompts spread across three buyer stages:
- Discovery — "what is X", "how do I solve Y" framing.
- Evaluation — "best X for Y", "X vs Y" comparative framing.
- Decision — pricing, reviews, switching risk, implementation effort.
Candidates are derived from a base prompt library plus competitor-aware substitution. We do not use raw LLM generation, because it produces too many low-intent prompts. You confirm or replace each candidate before the first scan runs.
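Template-based generation with competitor substitution might look something like the sketch below. The templates in `BASE_LIBRARY`, the slot names, and the round-robin cap are all illustrative assumptions; the real prompt library is not shown on this page.

```python
# Hypothetical base library: buyer stage -> templates with named slots.
BASE_LIBRARY = {
    "discovery": ["what is {category}", "how do I choose {category}"],
    "evaluation": ["best {category} for small teams", "{brand} vs {competitor}"],
    "decision": [
        "{brand} pricing",
        "how hard is it to switch from {competitor} to {brand}",
    ],
}


def generate_candidates(category, brand, competitors, limit=12):
    """Fill templates per stage, expanding competitor slots once per competitor."""
    queues = {}
    for stage, templates in BASE_LIBRARY.items():
        prompts = []
        for template in templates:
            # Competitor-aware substitution: one prompt per competitor
            # when the template has a {competitor} slot, otherwise one prompt.
            names = competitors if "{competitor}" in template else [None]
            for name in names:
                prompts.append(
                    template.format(category=category, brand=brand, competitor=name)
                )
        queues[stage] = prompts

    # Round-robin across stages so the cap does not starve later stages.
    candidates = []
    while len(candidates) < limit and any(queues.values()):
        for stage, queue in queues.items():
            if queue and len(candidates) < limit:
                candidates.append((stage, queue.pop(0)))
    return candidates
```

With two competitors, this toy library yields eight candidates, which happens to sit at the low end of the 8 to 12 range described above.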
Source extraction
For each provider response we run two extraction passes:
- Structured — read the provider's native citation payload (Perplexity citations, Gemini groundingMetadata, ChatGPT tool calls).
- Fallback URL parse — extract any URLs in the raw response text the structured payload missed, then resolve through a redirect map.
URLs from both passes go through normalization (lowercase, strip tracking parameters, canonicalize the www prefix, resolve known short URLs) before deduplication. The result is a per-prompt source ledger that you can view in the app.
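The fallback parse and the shared normalization step can be sketched as follows. This is a simplified illustration, not the production pipeline: the tracking-parameter list, the `REDIRECT_MAP` entry, and the choice to lowercase only the scheme and host (paths can be case-sensitive) are assumptions made for the example.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_KEYS = {"gclid", "fbclid"}  # assumed list, not exhaustive
TRACKING_PREFIXES = ("utm_",)
# Hypothetical redirect map for known short URLs.
REDIRECT_MAP = {"https://bit.ly/example": "https://www.example.com/pricing"}

URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def normalize(url):
    """Resolve short URLs, lowercase the host, drop www and tracking params."""
    url = REDIRECT_MAP.get(url, url)
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_KEYS
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    path = parts.path.rstrip("/")
    # The fragment is dropped; it never changes which page was cited.
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))


def build_ledger(structured_urls, raw_text):
    """Merge structured citations with URLs found in the raw text, deduplicated."""
    fallback = [u.rstrip(".,;") for u in URL_PATTERN.findall(raw_text)]
    ledger, seen = [], set()
    for url in [*structured_urls, *fallback]:
        normalized = normalize(url)
        if normalized not in seen:
            seen.add(normalized)
            ledger.append(normalized)
    return ledger
```

In this sketch, a structured citation, a trailing-slash variant, and a short link to the same page all collapse into a single ledger entry.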
Scoring
Two scores per prompt per provider:
- Citation rate — share of the three runs in which your domain appears in the source ledger. 0–100%.
- Mention quality — classifier output with four labels: cited (linked as a source), named (mentioned by name without a link), category (your category is mentioned but you are not), absent.
The buyer-journey matrix aggregates these into a single score per stage per competitor set: weighted citation rate × mention quality, normalized 0–100. Opportunity ranking is the inverse of that score multiplied by an impact estimate (search volume proxy × buyer-stage weight).
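To make the arithmetic concrete, here is one way the scores above could be computed. The numeric weights in `QUALITY_WEIGHTS`, the simple mean across prompts, and reading "inverse" as `100 - score` are assumptions for illustration only; the actual weighting is not specified here.

```python
from urllib.parse import urlsplit

# Assumed weights for the four mention-quality labels.
QUALITY_WEIGHTS = {"cited": 1.0, "named": 0.6, "category": 0.2, "absent": 0.0}


def domain_in_ledger(domain, ledger):
    """True if any ledger URL is on the domain or one of its subdomains."""
    for url in ledger:
        host = (urlsplit(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return True
    return False


def citation_rate(run_ledgers, domain):
    """Percentage of runs whose source ledger contains the domain."""
    hits = sum(domain_in_ledger(domain, ledger) for ledger in run_ledgers)
    return 100.0 * hits / len(run_ledgers)


def stage_score(prompt_results):
    """Mean of citation rate x quality weight over (rate, label) pairs, 0 to 100."""
    if not prompt_results:
        return 0.0
    total = sum(rate * QUALITY_WEIGHTS[label] for rate, label in prompt_results)
    return total / len(prompt_results)


def opportunity(score, search_volume_proxy, stage_weight):
    """Lower scores with higher estimated impact rank as bigger opportunities."""
    return (100.0 - score) * search_volume_proxy * stage_weight
```

With three runs per provider, the citation rate for a single prompt can only take the values 0, 33.3, 66.7, or 100.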
Scan cadence
Three scan types:
- Onboarding scan — runs once when you finish setup. Establishes baseline.
- Weekly automated scan — runs every Monday at 09:00 in your account timezone. It does not count against your monthly manual limit.
- Manual rescan — triggered after you ship a change. Used to measure the lift from a specific action. Monthly limit varies by plan.
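The weekly schedule can be expressed as a small date calculation. The sketch below is an assumption about how "Monday 09:00 in your account timezone" could be computed, not a description of the actual scheduler; `next_weekly_scan` is a hypothetical helper that accepts any `tzinfo` object.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

SCAN_WEEKDAY = 0         # Monday, per datetime.weekday()
SCAN_TIME = time(9, 0)   # 09:00 local wall-clock time


def next_weekly_scan(now_utc, account_tz):
    """Return the next Monday 09:00 in the account timezone, strictly after now."""
    local_now = now_utc.astimezone(account_tz)
    days_ahead = (SCAN_WEEKDAY - local_now.weekday()) % 7
    scan_date = local_now.date() + timedelta(days=days_ahead)
    # Building from a date and wall-clock time keeps 09:00 stable across DST.
    candidate = datetime.combine(scan_date, SCAN_TIME, tzinfo=account_tz)
    if candidate <= local_now:
        candidate = datetime.combine(
            scan_date + timedelta(days=7), SCAN_TIME, tzinfo=account_tz
        )
    return candidate


if __name__ == "__main__":
    # Requires the system time zone database to be available.
    now = datetime.now(timezone.utc)
    print(next_weekly_scan(now, ZoneInfo("America/New_York")).isoformat())
```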
What we store
Per scan: raw provider responses, extracted sources, score deltas, the prompts you confirmed. Per account: site crawl metadata, competitor configuration, draft history.
We do not train models on your data. We do not share your scans with the providers we query — the queries are indistinguishable from any other API call. Account deletion purges all of the above within 30 days.
Frequently asked questions
- Why these four providers and not eight?
- ChatGPT, Claude, Gemini, and Perplexity account for the overwhelming majority of consumer and B2B AI search traffic. Adding long-tail assistants (Copilot, Grok, You) inflates the dashboard without changing what a buyer actually sees. We'd rather cover four providers deeply than eight superficially.
- How do you handle provider response variance?
- Each prompt runs three times per provider per scan; we report the modal answer and flag prompts where responses diverge by more than 30%. High-variance prompts get a separate stability score so you know when a citation is durable vs. coincidental.
- Is this RAG or just direct API calls?
- Direct API calls, with web search enabled on every provider: natively for ChatGPT, Perplexity, and Gemini, and through the web_search tool for Claude. We don't run our own RAG, because the point is to measure what the actual buyer sees, not what we'd see with our own retrieval layer in the middle.
- How is source attribution extracted?
- Two passes. First, we parse the provider's structured citation payload when one is available (Perplexity citations, Gemini grounding metadata, ChatGPT tool calls). Second, a fallback URL-extraction pass on the raw response text catches inline citations the structured payload misses. Both passes feed the same normalization and dedup step before the source ledger is written.
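For the variance handling described in the FAQ, a minimal sketch follows. The FAQ does not define how divergence is measured, so this example assumes one plausible reading: the modal answer is the most common mention-quality label across runs, and divergence is one minus the mean pairwise Jaccard similarity of each run's source set. Both definitions are assumptions.

```python
from collections import Counter
from itertools import combinations

DIVERGENCE_THRESHOLD = 0.30  # the 30% flagging threshold from the FAQ


def modal_label(labels):
    """Most common mention-quality label across the runs."""
    return Counter(labels).most_common(1)[0][0]


def jaccard(a, b):
    """Overlap of two source sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def divergence(run_sources):
    """1 minus the mean pairwise Jaccard similarity across runs (assumed metric)."""
    pairs = list(combinations(run_sources, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def is_high_variance(run_sources):
    """Flag prompts whose runs diverge by more than the threshold."""
    return divergence(run_sources) > DIVERGENCE_THRESHOLD
```

Under this reading, three runs citing identical sources have a divergence of 0, while runs with no shared sources approach 1.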