# Agent Disco — full reference
> Grade any public URL for AI-agent discoverability. This file is the long-form knowledge dump — start at https://agentdisco.io/llms.txt for the short version.
Agent Disco fetches the handful of URIs agent frameworks actually read (robots.txt, llms.txt, /.well-known/ai-plugin.json, /.well-known/agent.json, /.well-known/mcp.json, OpenAPI specs, SDK signals, registry presence) and returns a letter grade A–F plus a per-category breakdown. The scanner runs on-demand only: no background crawling, no sitemap-walking, no link-following. One scan sends 10–20 requests against the target origin at 2 req/s max, then stops.
## How to use
1. Submit a URL at https://agentdisco.io/ — or POST to /api/v1/scans via the API.
2. The scanner runs all registered checks against the target. Per-category weights come from config/packages/agent_disco.yaml; each check's weight is published at /checks.
3. When the scan completes (status=completed), the grade + per-category breakdown + quick-win hints are available at /report/{host} and /api/v1/websites/{host}.
4. Per-scan detail (individual check findings with evidence) lives at /scan/{id}.
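The four steps above can be sketched as a request plan. Only the endpoint paths come from this document; the POST body shape, polling behavior, and the naive host extraction below are illustrative assumptions:

```python
BASE = "https://agentdisco.io"

def scan_request_plan(url: str, scan_id: str = "{id}") -> list[tuple[str, str]]:
    """Order of calls for one scan: submit the URL, then (once
    status=completed) read the per-host report and the per-scan findings.
    The POST payload and polling loop are left out -- not specified here."""
    host = url.split("//", 1)[-1].split("/", 1)[0]  # naive host extraction
    return [
        ("POST", f"{BASE}/api/v1/scans"),            # step 1: submit the URL
        ("GET",  f"{BASE}/api/v1/websites/{host}"),  # step 3: grade + breakdown (JSON)
        ("GET",  f"{BASE}/report/{host}"),           # step 3: human-readable report
        ("GET",  f"{BASE}/scan/{scan_id}"),          # step 4: per-check findings
    ]
```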
## Scoring groups (AD-35a)
The overall grade is a weighted average of six groups:
- 25% protocol_surface (well_known/ai_plugin_json, well_known/agent_json, well_known/mcp_json, protocols/*, identity/tls)
- 25% api_docs (api/*, docs/*)
- 20% onboarding (onboarding/*, identity/email_auth)
- 15% crawl_llm_training (crawl/*, root_level/*, html_meta/*, llm_training/*)
- 10% trust_realtime (anti_bot/*, identity/*)
- 5% economic_federation (registries/*)
Letter bands: A ≥ 85, B ≥ 70, C ≥ 55, D ≥ 40, F < 40. Skipped and errored checks are excluded from scoring (they carry `pointsPossible = null`).
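The weighting and letter bands above can be sketched as follows. How exclusion of skipped/errored groups interacts with the weights (renormalising the remaining weights) is an assumption, not something this document specifies:

```python
# Group weights from the AD-35a scoring section (sum to 1.0).
WEIGHTS = {
    "protocol_surface": 0.25,
    "api_docs": 0.25,
    "onboarding": 0.20,
    "crawl_llm_training": 0.15,
    "trust_realtime": 0.10,
    "economic_federation": 0.05,
}

def overall_score(group_scores: dict) -> float:
    """Weighted average of per-group scores (each 0-100). Groups scored
    None (all checks skipped/errored, pointsPossible null) are excluded
    and the remaining weights renormalised -- an assumption."""
    scored = {g: s for g, s in group_scores.items() if s is not None}
    total_w = sum(WEIGHTS[g] for g in scored)
    return sum(WEIGHTS[g] * s for g, s in scored.items()) / total_w

def letter(score: float) -> str:
    """Map a 0-100 score to the published letter bands."""
    for band, cutoff in (("A", 85), ("B", 70), ("C", 55), ("D", 40)):
        if score >= cutoff:
            return band
    return "F"
```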
## Full check catalogue
### Well-known URIs
- **well_known.agent_json** (weight 10) — /.well-known/agent.json (A2A AgentCard): Looks for an A2A AgentCard at `/.well-known/agent.json` ([a2aprotocol.ai](https://a2aprotocol.ai)) and checks for the four load-bearing top-level keys: `name`, `description`, `skills`, `endpoints`. We only grade presence + shape; deep A2A capability discovery is a separate sub-product.
- **well_known.ai_plugin_json** (weight 8) — /.well-known/ai-plugin.json manifest: Looks for the ChatGPT-plugin manifest at `/.well-known/ai-plugin.json` and counts how many of the load-bearing OpenAI-schema keys are present (`name_for_human`, `name_for_model`, `description_for_model`, `api`). The convention is informally deprecated, but many LLM runtimes still probe for it.
- **well_known.mcp_json** (weight 8) — /.well-known/mcp.json (Model Context Protocol): Looks for an MCP (Model Context Protocol) manifest at `/.well-known/mcp.json`. Records the top-level keys but only insists on the presence of at least one MCP indicator (`server`, `capabilities`, or `tools`) because the schema is still evolving.
- **well_known.openapi** (weight 6) — /.well-known/openapi.{json,yaml}: Probes the two RFC 8615 well-known OpenAPI paths (`/.well-known/openapi.json`, `/.well-known/openapi.yaml`) and confirms the body is OpenAPI 3.x. This is narrower than the general OpenAPI discovery check — sites that expose their spec at the well-known path earn both credits.
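For orientation, a minimal `/.well-known/agent.json` with the four graded top-level keys might look like the fragment below. All values are illustrative, and the shape of each `skills` entry is an assumption — consult the A2A spec for the real schema:

```json
{
  "name": "Example Agent",
  "description": "Answers questions about example.com's product catalogue.",
  "skills": [
    { "id": "catalogue-search", "description": "Search products by keyword" }
  ],
  "endpoints": { "a2a": "https://example.com/api/a2a" }
}
```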
### Root-level files
- **root_level.ai_txt** (weight 4) — /ai.txt AI-crawler directives: Looks for `/ai.txt` — a secondary, still-draft root-level declaration of AI-crawler directives. Multiple drafts compete, so this check only records presence + a body excerpt, not any specific schema. Absence is not a negative signal (skip, not fail).
- **root_level.llms_full_txt** (weight 8) — /llms-full.txt long-form index: Looks for `/llms-full.txt` — the long-form companion to `/llms.txt` ([llmstxt.org](https://llmstxt.org)). Passes on a substantive (≥ 1 KB) text body; warns on small or odd-typed responses; flags an HTML catch-all as a fail.
- **root_level.llms_txt** (weight 8) — /llms.txt index for LLMs: Looks for `/llms.txt` at the site root — a Markdown index of important URLs + summaries for LLM consumers ([llmstxt.org](https://llmstxt.org)). Presence is the signal; we don't enforce the schema. Flags a plausible SPA catch-all (HTML at the path) as a fail.
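A minimal `/llms.txt` following the llmstxt.org convention (a title, a one-line blockquote summary, then sections of annotated links) might look like this — the site name and URLs are placeholders:

```markdown
# Example Co
> One-line summary of what the site offers and who it is for.

## Docs
- [API reference](https://example.com/docs/api): REST endpoints and auth
- [Quickstart](https://example.com/docs/quickstart): first API call in 5 minutes
```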
### Crawl & indexing
- **crawl.feed** (weight 4) — RSS/Atom feed: Looks for an RSS/Atom feed via the conventional paths (`/feed`, `/feed.xml`, `/rss`, `/atom.xml`) and the homepage `<link rel="alternate">` tag. The link-alternate declaration is the more reliable signal; the paths are a fallback. Absence is a skip, not a fail — feeds are secondary to sitemaps for agent discoverability.
- **crawl.robots_txt** (weight 13) — robots.txt AI-agent rules: Parses `/robots.txt` and checks whether the major AI crawlers (GPTBot, ClaudeBot, PerplexityBot, CCBot, Google-Extended, and others) are allowed to crawl the site root. A blanket `User-agent: * / Disallow: /` fails the check outright.
- **crawl.sitemap** (weight 10) — XML sitemap discovery: Looks for a sitemap via both the conventional paths (`/sitemap.xml`, `/sitemap_index.xml`) and any `Sitemap:` directives in `/robots.txt`. Accepts both `<urlset>` and `<sitemapindex>` roots. Absence is a skip, not a fail — we can't tell "no sitemap" apart from "sitemap lives somewhere undeclared".
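To illustrate the robots.txt and sitemap checks together, a `/robots.txt` that would pass might look like this sketch — the crawler names come from the check description above; the sitemap URL is a placeholder:

```text
# Explicitly allow the major AI crawlers at the root
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Declare the sitemap so crawl.sitemap can find it even at a nonstandard path
Sitemap: https://example.com/sitemap.xml
```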
### HTML & meta
- **html_meta.description** (weight 3) — meta description: Looks for a `<meta name="description">` on the homepage, and checks that it falls in the 50-300 character range most search engines and AI agents actually surface. Absence fails; an SPA shell (homepage that doesn't render HTML) is a skip.
- **html_meta.json_ld** (weight 8) — JSON-LD structured data: Parses every `<script type="application/ld+json">` block on the homepage.