Technical reference

How Monroya actually works.

The companion to /how-it-works for RevOps, data, and security teams. Providers, prompt generation, source extraction, scoring, and scan cadence — with the trade-offs called out.

Updated: June 2026

Providers

Every scan queries four providers in parallel. Coverage is chosen for traffic share, not catalog completeness.

Provider    | Model class                      | Web access              | Citation payload
ChatGPT     | GPT-5 family with browsing       | Yes (native)            | Inline URLs + structured cards
Claude      | Sonnet 4.5 with web_search tool  | Yes (tool)              | Inline URLs
Gemini      | Gemini 2.5 with grounding        | Yes (Search grounding)  | Structured grounding payload
Perplexity  | Sonar online                     | Yes (default)           | Structured citation array
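
As a rough illustration of the parallel fan-out, here is a minimal sketch using asyncio. The `query_provider` function is a hypothetical stand-in for a real provider API call, and the provider keys and response shape are assumptions made for the example, not Monroya's actual client code.

```python
import asyncio

# Provider keys are illustrative labels, not real API identifiers.
PROVIDERS = ["chatgpt", "claude", "gemini", "perplexity"]


async def query_provider(provider: str, prompt: str) -> dict:
    """Hypothetical stand-in for one provider API call."""
    await asyncio.sleep(0)  # placeholder for network I/O
    return {"provider": provider, "prompt": prompt, "text": "", "citations": []}


async def run_scan(prompt: str) -> dict:
    """Send one prompt to all providers concurrently."""
    results = await asyncio.gather(
        *(query_provider(p, prompt) for p in PROVIDERS),
        return_exceptions=True,  # one failing provider should not sink the scan
    )
    # gather() preserves input order, so zip lines results up with providers.
    return dict(zip(PROVIDERS, results))


def scan(prompt: str) -> dict:
    """Synchronous entry point for scripts."""
    return asyncio.run(run_scan(prompt))
```

Passing `return_exceptions=True` means a failed call comes back as an exception object in the result map instead of cancelling the other three requests.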

Prompt generation

During onboarding, Monroya crawls your site, identifies your category and two to three primary competitors, and generates 8–12 candidate prompts spread across three buyer stages:

  • Discovery — "what is X", "how do I solve Y" framing.
  • Evaluation — "best X for Y", "X vs Y" comparative framing.
  • Decision — pricing, reviews, switching risk, implementation effort.

Candidates are derived from a base prompt library plus competitor-aware substitution, not from raw LLM generation — which produces too many low-intent prompts. You confirm or replace each one before the first scan runs.
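
A minimal sketch of library-plus-substitution generation follows. The templates, the `candidate_prompts` name, and the placeholder fields (`category`, `brand`, `competitor`) are all hypothetical; the real base prompt library is not published here.

```python
# Hypothetical base library keyed by buyer stage. Templates are illustrative.
BASE_LIBRARY = {
    "discovery": ["what is {category}", "how do I choose a {category}"],
    "evaluation": ["best {category} for small teams", "{brand} vs {competitor}"],
    "decision": [
        "{brand} pricing",
        "how hard is it to switch from {competitor} to {brand}",
    ],
}


def candidate_prompts(category: str, brand: str, competitors: list[str]) -> list[tuple[str, str]]:
    """Expand every template; competitor-aware templates expand once per competitor."""
    out = []
    for stage, templates in BASE_LIBRARY.items():
        for template in templates:
            if "{competitor}" in template:
                for competitor in competitors:
                    out.append((stage, template.format(
                        category=category, brand=brand, competitor=competitor)))
            else:
                out.append((stage, template.format(category=category, brand=brand)))
    return out
```

With two competitors this toy library yields eight candidates, each tagged with its stage so it can be shown for confirmation or replacement before the first scan.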

Source extraction

For each provider response we run two extraction passes:

  1. Structured — read the provider's native citation payload (Perplexity citations, Gemini groundingMetadata, ChatGPT tool calls).
  2. Fallback URL parse — extract any URLs in the raw response text the structured payload missed, then resolve through a redirect map.

The output of both passes goes through normalization (lowercase, strip tracking params, canonicalize www, resolve known short URLs) before deduplication. The result is a per-prompt source ledger you can see in the app.
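
The fallback parse, normalization, and dedup steps could look roughly like the sketch below. It makes several assumptions: only the scheme and host are lowercased (paths can be case-sensitive), "canonicalize www" is taken to mean dropping the prefix, the tracking-parameter list is a small illustrative subset, and `SHORT_URLS` is a hypothetical redirect map.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIXES = ("utm_",)          # illustrative subset
TRACKING_KEYS = {"gclid", "fbclid"}    # illustrative subset
# Hypothetical redirect map for known short URLs.
SHORT_URLS = {"https://exm.pl/a1": "https://www.example.com/pricing"}

URL_RE = re.compile(r"https?://[^\s)\]>\"']+")


def extract_urls(text: str) -> list[str]:
    """Fallback pass: pull URLs out of raw response text."""
    return [u.rstrip(".,;:") for u in URL_RE.findall(text)]


def normalize(url: str) -> str:
    """Resolve short URLs, lowercase scheme/host, drop www and tracking params."""
    url = SHORT_URLS.get(url, url)
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in TRACKING_KEYS
        and not k.lower().startswith(TRACKING_PREFIXES)
    ]
    path = parts.path.rstrip("/") or "/"
    # The fragment is dropped: it never changes which page was cited.
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))


def build_ledger(structured: list[str], fallback: list[str]) -> list[str]:
    """Merge both passes, keeping first-seen order and dropping duplicates."""
    seen, ledger = set(), []
    for url in [*structured, *fallback]:
        norm = normalize(url)
        if norm not in seen:
            seen.add(norm)
            ledger.append(norm)
    return ledger
```

Normalizing before dedup is what lets a tracked link, a bare link, and a short link to the same page collapse into a single ledger entry.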

Scoring

Two scores per prompt per provider:

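  • Citation rate: the share of runs in which your domain appears in the source ledger, from 0 to 100%. Each prompt runs three times per provider per scan, so the rate is computed over those three runs.
  • Mention quality: classifier output with four classes: cited (linked as a source), named (mentioned by name without a link), category (your category is mentioned but you are not), and absent.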

The buyer-journey matrix aggregates these into a single score per stage per competitor set: weighted citation rate × mention quality, normalized 0–100. Opportunity ranking is the inverse of that score multiplied by an impact estimate (search volume proxy × buyer-stage weight).
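
The arithmetic can be sketched as follows. The numeric value assigned to each mention-quality class, the per-cell weights, and the reading of "inverse" as `100 - score` are assumptions made for illustration; the production weights are not stated here.

```python
# Hypothetical numeric value per mention-quality class.
QUALITY_WEIGHT = {"cited": 1.0, "named": 0.6, "category": 0.2, "absent": 0.0}


def citation_rate(runs: list[list[str]], domain: str) -> float:
    """Percentage of runs whose source ledger (a list of domains) contains `domain`."""
    if not runs:
        return 0.0
    hits = sum(1 for ledger in runs if domain in ledger)
    return 100.0 * hits / len(runs)


def stage_score(cells: list[tuple[float, str, float]]) -> float:
    """Weighted mean of (citation rate x mention quality), normalized to 0-100.

    Each cell is (citation_rate_percent, mention_quality, weight).
    """
    total_weight = sum(weight for _, _, weight in cells)
    if total_weight == 0:
        return 0.0
    weighted = sum(
        weight * (rate / 100.0) * QUALITY_WEIGHT[quality]
        for rate, quality, weight in cells
    )
    return 100.0 * weighted / total_weight


def opportunity(score: float, volume_proxy: float, buyer_stage_weight: float) -> float:
    """Lower scores and higher impact both rank higher. 'Inverse' is assumed to be 100 - score."""
    return (100.0 - score) * volume_proxy * buyer_stage_weight
```

Because each prompt runs three times, `citation_rate` for a single prompt and provider can only take the values 0, 33.3, 66.7, or 100.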

Scan cadence

Three scan types:

  • Onboarding scan — runs once when you finish setup. Establishes baseline.
  • Weekly automated scan — runs every Monday 09:00 in your account timezone. Doesn't count against your monthly manual limit.
  • Manual rescan — triggered after you ship a change. Used to measure the lift from a specific action. Monthly limit varies by plan.
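
For teams that want to predict when the weekly scan will fire, the "Monday 09:00 in your account timezone" rule can be computed with the standard library. This is a sketch of the scheduling rule only; the `next_weekly_scan` name is hypothetical, and the assumption that a scan landing exactly on the current instant rolls to the following week is ours.

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo


def next_weekly_scan(now_utc: datetime, account_tz: str) -> datetime:
    """Return the next Monday 09:00 in the account timezone, strictly after `now_utc`.

    `now_utc` must be timezone-aware.
    """
    tz = ZoneInfo(account_tz)
    local_now = now_utc.astimezone(tz)
    # weekday(): Monday is 0, so this is the number of days until the next Monday.
    days_ahead = (0 - local_now.weekday()) % 7
    candidate = datetime.combine(
        local_now.date() + timedelta(days=days_ahead), time(9, 0), tzinfo=tz
    )
    if candidate <= local_now:
        # It is already Monday 09:00 or later, so roll to next week.
        candidate += timedelta(days=7)
    return candidate
```

For example, calling it with Wednesday 3 June 2026 12:00 UTC and `"America/New_York"` returns Monday 8 June 2026 at 09:00 New York time. Working in local wall-clock time keeps the scan at 09:00 across daylight-saving changes.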

What we store

Per scan: raw provider responses, extracted sources, score deltas, the prompts you confirmed. Per account: site crawl metadata, competitor configuration, draft history.

We do not train models on your data. We do not share your scans with the providers we query — the queries are indistinguishable from any other API call. Account deletion purges all of the above within 30 days.

Frequently asked questions

Why these four providers and not eight?
ChatGPT, Claude, Gemini, and Perplexity account for the overwhelming majority of consumer and B2B AI search traffic. Adding long-tail assistants (Copilot, Grok, You.com) inflates the dashboard without changing what a buyer actually sees. We'd rather cover four providers deeply than eight superficially.

How do you handle provider response variance?
Each prompt runs three times per provider per scan. We report the modal answer and flag prompts where responses diverge by more than 30%. High-variance prompts get a separate stability score so you know when a citation is durable vs. coincidental.

Is this RAG or just direct API calls?
Direct API calls with web search enabled where the provider supports it (ChatGPT, Perplexity, Gemini). Claude runs with its own web_search tool. We don't run our own RAG, because the point is to measure what the actual buyer sees, not what we'd see with our own retrieval layer in the middle.

How is source attribution extracted?
Two passes. First, we parse the provider's structured citation payload when available (Perplexity, Gemini grounding). Second, a fallback URL-extraction pass on the raw response text catches inline citations the structured payload misses. Both passes feed the same dedup and normalization step before the source ledger is written.
