Well-known URIs
| Check | Phase | Weight | Description |
|---|---|---|---|
| /.well-known/agent.json (A2A AgentCard) well_known.agent_json | passive | 10 | Looks for an A2A AgentCard at `/.well-known/agent.json` ([a2aprotocol.ai](https://a2aprotocol.ai)) and checks for the four load-bearing top-level keys: `name`, … |
| /.well-known/ai-plugin.json manifest well_known.ai_plugin_json | passive | 8 | Looks for the ChatGPT-plugin manifest at `/.well-known/ai-plugin.json` and counts how many of the load-bearing OpenAI-schema keys are present (`name_for_human`,… |
| /.well-known/mcp.json (Model Context Protocol) well_known.mcp_json | passive | 8 | Looks for an MCP (Model Context Protocol) manifest at `/.well-known/mcp.json`. Records the top-level keys but only insists on the presence of at least one MCP i… |
| /.well-known/openapi.{json,yaml} well_known.openapi | passive | 6 | Probes the two RFC 8615 well-known OpenAPI paths (`/.well-known/openapi.json`, `/.well-known/openapi.yaml`) and confirms the body is OpenAPI 3.x. This is narrow… |
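
The "confirms the body is OpenAPI 3.x" step can be illustrated with a minimal sketch. It assumes the JSON variant only (the YAML path would need a YAML parser) and that "3.x" means the top-level `openapi` string starts with `3.`. The function names `openapi_version` and `is_openapi_3` are hypothetical, not the checker's own.

```python
import json
from typing import Optional


def openapi_version(body: str) -> Optional[str]:
    """Return the top-level `openapi` version string, or None if absent or unparseable."""
    try:
        doc = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(doc, dict):
        return None
    version = doc.get("openapi")
    return version if isinstance(version, str) else None


def is_openapi_3(body: str) -> bool:
    """Assumption: "OpenAPI 3.x" means the declared version starts with "3."."""
    version = openapi_version(body)
    return version is not None and version.startswith("3.")
```

A Swagger 2.0 document declares `swagger` rather than `openapi`, so it is rejected by this sketch.
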
Root-level files
| Check | Phase | Weight | Description |
|---|---|---|---|
| /ai.txt AI-crawler directives root_level.ai_txt | passive | 4 | Looks for `/ai.txt` — a secondary, still-draft root-level declaration of AI-crawler directives. Multiple drafts compete, so this check only records presence + a… |
| /llms-full.txt long-form index root_level.llms_full_txt | passive | 8 | Looks for `/llms-full.txt` — the long-form companion to `/llms.txt` ([llmstxt.org](https://llmstxt.org)). Passes on a substantive (≥ 1 KB) text body; warns on s… |
| /llms.txt index for LLMs root_level.llms_txt | passive | 8 | Looks for `/llms.txt` at the site root — a Markdown index of important URLs + summaries for LLM consumers ([llmstxt.org](https://llmstxt.org)). Presence is the … |
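
To make the "Markdown index of important URLs + summaries" shape concrete, here is a small sketch that pulls link entries out of an `/llms.txt` body. It assumes the list-item form described at llmstxt.org (`- [title](url): summary`). The function name `llms_txt_links` is hypothetical, and nothing here is claimed to be how the check itself grades the file.

```python
import re
from typing import Dict, List

# One Markdown list item per line: "- [title](url)" with an optional ": summary" tail.
_LINK_ITEM = re.compile(
    r"^[ \t]*[-*][ \t]*\[([^\]]+)\]\(([^)\s]+)\)(?::[ \t]*(.*))?$",
    re.MULTILINE,
)


def llms_txt_links(text: str) -> List[Dict[str, str]]:
    """Extract title, URL, and optional summary from each list-item link."""
    return [
        {"title": title, "url": url, "summary": summary}
        for title, url, summary in _LINK_ITEM.findall(text)
    ]
```

Entries without a summary come back with an empty `summary` string.
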
Crawl & indexing
| Check | Phase | Weight | Description |
|---|---|---|---|
| RSS/Atom feed crawl.feed | passive | 4 | Looks for an RSS/Atom feed via the conventional paths (`/feed`, `/feed.xml`, `/rss`, `/atom.xml`) and the homepage `<link rel="alternate">` tag. The link-alternate declaration is the… |
| robots.txt AI-agent rules crawl.robots_txt | passive | 13 | Parses `/robots.txt` and checks whether the major AI crawlers (GPTBot, ClaudeBot, PerplexityBot, CCBot, Google-Extended, and others) are allowed to crawl the si… |
| XML sitemap discovery crawl.sitemap | passive | 10 | Looks for a sitemap via both the conventional paths (`/sitemap.xml`, `/sitemap_index.xml`) and any `Sitemap:` directives in `/robots.txt`. Accepts `<urlset>` and `<sitemapindex>`. Ab… |
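
The robots.txt check can be approximated with the standard library's `urllib.robotparser`. This sketch covers only the five crawlers named in the table (the "and others" are not known here) and only tests the root path. `AI_CRAWLERS` and `ai_crawler_access` are hypothetical names.

```python
from typing import Dict
from urllib.robotparser import RobotFileParser

# Only the user agents named explicitly in the check description.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "Google-Extended"]


def ai_crawler_access(robots_txt: str, path: str = "/") -> Dict[str, bool]:
    """Map each AI crawler to whether robots.txt lets it fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in-memory text, no network fetch
    return {agent: parser.can_fetch(agent, path) for agent in AI_CRAWLERS}
```

For example, a file that disallows `GPTBot` but allows `*` yields `False` for `GPTBot` and `True` for the other four agents.
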
HTML & meta
| Check | Phase | Weight | Description |
|---|---|---|---|
| meta description html_meta.description | passive | 3 | Looks for a `<meta name="description">` on the homepage and checks that it is in the 50-300 character range most search engines and AI agents actually surface. Absence fails; an SPA shell (h… |
| JSON-LD structured data html_meta.json_ld | passive | 8 | Parses every `<script type="application/ld+json">` block on the homepage and categorises by `@type`. `WebAPI` or `SoftwareApplication` is the primary agent signal — those entries explicitly decla… |
| Open Graph tags html_meta.open_graph | passive | 4 | Looks for the three load-bearing Open Graph tags on the homepage: `og:title`, `og:description`, `og:type`. LLM-driven link previews and search result cards surf… |
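
The meta description rule (absent fails, 50-300 characters passes) can be sketched with the standard `html.parser`. Grading an out-of-range description as `warn` is an assumption, since the table does not say what happens in that case. `grade_meta_description` is a hypothetical name.

```python
from html.parser import HTMLParser
from typing import Optional


class _MetaDescriptionParser(HTMLParser):
    """Captures the content of the first <meta name="description"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.description: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "description":
            self.description = attr_map.get("content", "").strip()


def grade_meta_description(html: str) -> str:
    parser = _MetaDescriptionParser()
    parser.feed(html)
    if not parser.description:
        return "fail"  # absence fails, per the check description
    if 50 <= len(parser.description) <= 300:
        return "pass"
    return "warn"  # assumption: out-of-range length is a warning, not a failure
```
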
API discoverability
| Check | Phase | Weight | Description |
|---|---|---|---|
| GraphQL introspection api.graphql_introspection | passive | 8 | POSTs a minimal GraphQL introspection query (`{ __schema { queryType { name } } }`) to `/graphql`, `/api/graphql`, `/query`. Passes when the server returns the … |
| JSON error bodies for API callers api.json_error_body | passive | 5 | Requests a random non-existent path with `Accept: application/json`. Passes when the server returns a JSON error body (`application/json` or `application/proble… |
| OpenAPI specification discovery api.openapi_discovery | passive | 10 | Probes nine conventional paths for an OpenAPI spec (`/openapi.json`, `/api/openapi.yaml`, `/swagger.json`, etc.) and confirms the top-level `openapi` key declar… |
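
As an illustration of the GraphQL probe, the sketch below builds the POST requests for the three listed paths without sending them. The pass condition is cut off in the table, so `looks_like_schema_response` encodes an assumption: a response counts when it carries `data.__schema.queryType.name`. Both function names are hypothetical.

```python
import json
import urllib.request
from typing import List

INTROSPECTION_QUERY = "{ __schema { queryType { name } } }"
CANDIDATE_PATHS = ["/graphql", "/api/graphql", "/query"]


def build_introspection_requests(origin: str) -> List[urllib.request.Request]:
    """Prepare (but do not send) one POST request per candidate path."""
    payload = json.dumps({"query": INTROSPECTION_QUERY}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    return [
        urllib.request.Request(
            origin.rstrip("/") + path, data=payload, headers=headers, method="POST"
        )
        for path in CANDIDATE_PATHS
    ]


def looks_like_schema_response(body: str) -> bool:
    """Assumption: success means data.__schema.queryType.name is a string."""
    try:
        doc = json.loads(body)
    except ValueError:
        return False
    data = doc.get("data") if isinstance(doc, dict) else None
    schema = data.get("__schema") if isinstance(data, dict) else None
    query_type = schema.get("queryType") if isinstance(schema, dict) else None
    return isinstance(query_type, dict) and isinstance(query_type.get("name"), str)
```

Each prepared request could then be passed to `urllib.request.urlopen` with a timeout.
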
Protocols
| Check | Phase | Weight | Description |
|---|---|---|---|
| A2A AgentCard conformance protocols.a2a_agent_card | passive | 8 | Deep conformance check on the A2A AgentCard. Requires `version`, a non-empty `skills` array (each with `name` + `description`) and a non-empty `endpoints` array… |
| Public MCP registry listing protocols.mcp_registry_presence | passive | 10 | Searches the major public MCP registries (Smithery, mcp.so, PulseMCP, Glama) for the target host. A listing in any registry earns full credit — being catalogued… |
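
The AgentCard requirements that are visible in the table (a `version`, a non-empty `skills` array whose entries carry `name` and `description`, and a non-empty `endpoints` array) can be expressed as a small validator. The description is truncated, so the real check may require more. `agent_card_problems` is a hypothetical name.

```python
from typing import Any, List


def agent_card_problems(card: Any) -> List[str]:
    """Return human-readable problems; an empty list means the visible rules pass."""
    if not isinstance(card, dict):
        return ["card is not a JSON object"]
    problems: List[str] = []
    if not card.get("version"):
        problems.append("missing version")

    skills = card.get("skills")
    if not isinstance(skills, list) or not skills:
        problems.append("skills must be a non-empty array")
    else:
        for index, skill in enumerate(skills):
            for key in ("name", "description"):
                if not isinstance(skill, dict) or not skill.get(key):
                    problems.append(f"skills[{index}] missing {key}")

    endpoints = card.get("endpoints")
    if not isinstance(endpoints, list) or not endpoints:
        problems.append("endpoints must be a non-empty array")
    return problems
```

Returning a list of problems rather than a boolean keeps the evidence available for a report.
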
Registries
| Check | Phase | Weight | Description |
|---|---|---|---|
| GitHub public repository registries.github_repo | passive | 5 | Searches GitHub for repositories whose name or topic matches the target host. Extra note in evidence when the repo carries an agent-relevant topic (`mcp-server`… |
| npm SDK package registries.npm_package | passive | 6 | Queries the npm registry for packages plausibly attributable to the target: scoped packages at `@<org>/*`, plus any package whose name carries the target's bare hos… |
| PyPI SDK package registries.pypi_package | passive | 6 | Direct-probes PyPI for `<org>` and `<org>-sdk`, the two names most likely to exist if an official Python SDK does. Skips when neither probe succeeds; does not fail, beca… |
Documentation
| Check | Phase | Weight | Description |
|---|---|---|---|
| Docs platform discoverability docs.platform | passive | 6 | Probes the conventional docs paths (`/docs`, `/documentation`, `/api`, `/api/docs`, `/reference`, `/developers`) and fingerprints the first hit. Recognises Mint… |
| SDK availability across languages docs.sdk_availability | passive | 8 | Counts language-level SDKs by combining the npm + PyPI registry findings with any install commands scraped from the docs homepage (`npm install`, `pip install`,… |
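
A rough sketch of the SDK count: combine the registry findings with install commands found in the docs text. Only the two commands named in the table are matched (the list there is truncated), and mapping `npm install` to "javascript" and `pip install` to "python" is an assumption. `sdk_languages` is a hypothetical name.

```python
import re
from typing import Set

# Only the install commands spelled out in the check description.
INSTALL_PATTERNS = {
    "javascript": re.compile(r"\bnpm\s+install\s+\S+"),
    "python": re.compile(r"\bpip\s+install\s+\S+"),
}


def sdk_languages(
    docs_text: str, has_npm_package: bool = False, has_pypi_package: bool = False
) -> Set[str]:
    """Union of languages seen in registries and in scraped install commands."""
    languages: Set[str] = set()
    if has_npm_package:
        languages.add("javascript")
    if has_pypi_package:
        languages.add("python")
    for language, pattern in INSTALL_PATTERNS.items():
        if pattern.search(docs_text):
            languages.add(language)
    return languages
```

Using a set means a language found by both routes is counted once.
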
LLM training data
| Check | Phase | Weight | Description |
|---|---|---|---|
| Common Crawl index presence llm_training.common_crawl | passive | 8 | Queries the Common Crawl CDX endpoint for the most recent monthly snapshot and counts the target's pages. Common Crawl's corpus underpins most open-source LLM t… |
| Hacker News mentions llm_training.hn_mentions | passive | 5 | Queries the Algolia-hosted HN Search API for mentions of the target host. A handful of stories or comments indicate the service has been discussed enough to sho… |
| Wikipedia article llm_training.wikipedia | passive | 8 | Searches Wikipedia for an article about the service (derived from the org part of the host) and checks whether the domain appears in the article's external link… |
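
For the Hacker News check, the sketch below builds a query URL for the Algolia-hosted HN Search API and reads the total hit count from a response body. It assumes the public `/api/v1/search` endpoint and its `nbHits` field. How many hits count as "a handful" is not stated, so no threshold is applied. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode

HN_SEARCH_ENDPOINT = "https://hn.algolia.com/api/v1/search"


def hn_search_url(host: str) -> str:
    """URL that searches HN stories and comments for the target host."""
    return HN_SEARCH_ENDPOINT + "?" + urlencode({"query": host})


def mention_count(response_body: str) -> int:
    """Total matches reported by the API, or 0 if the body is not usable."""
    try:
        doc = json.loads(response_body)
    except ValueError:
        return 0
    hits = doc.get("nbHits") if isinstance(doc, dict) else None
    return hits if isinstance(hits, int) else 0
```
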
Anti-bot posture
| Check | Phase | Weight | Description |
|---|---|---|---|
| Anti-bot interstitial anti_bot.cloudflare_interstitial | passive | 10 | Fetches the homepage and flags Cloudflare / PerimeterX / Akamai / HUMAN interstitials by header (`cf-mitigated`, `cf-chl-*`, `x-px-captcha`, `x-akamai-session-i… |
| User-agent sniffing anti_bot.user_agent_sniffing | passive | 5 | Fetches the homepage as `AgentDisco/1.0` and again as `curl/7.88.0`, then compares status, content-type, and body length. A large divergence indicates user-agen… |
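
The user-agent comparison can be sketched as a pure function over two response snapshots. The table only says "a large divergence", so the 50% relative body-length threshold below is an assumption, and `Snapshot` and `divergences` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

# Assumption: body lengths differing by more than half of the larger one count as "large".
LENGTH_DIVERGENCE_THRESHOLD = 0.5


@dataclass
class Snapshot:
    status: int
    content_type: str
    body_length: int


def _media_type(content_type: str) -> str:
    """Drop parameters such as charset so "text/html; charset=utf-8" equals "text/html"."""
    return content_type.split(";")[0].strip().lower()


def divergences(first: Snapshot, second: Snapshot) -> List[str]:
    """List which of status, content-type, and body length differ between two fetches."""
    found: List[str] = []
    if first.status != second.status:
        found.append("status")
    if _media_type(first.content_type) != _media_type(second.content_type):
        found.append("content-type")
    longest = max(first.body_length, second.body_length)
    if longest and abs(first.body_length - second.body_length) / longest > LENGTH_DIVERGENCE_THRESHOLD:
        found.append("body-length")
    return found
```

One snapshot would come from the `AgentDisco/1.0` fetch and the other from the `curl/7.88.0` fetch.
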
Identity & verification
| Check | Phase | Weight | Description |
|---|---|---|---|
| Email auth (SPF, DMARC, DKIM) identity.email_auth | passive | 5 | Looks up TXT records on the domain for SPF (`v=spf1`), DMARC (`_dmarc.<domain>`), and DKIM at the four most common selector names (`default`, `google`, `s1`, `selector1… |
| TLS + HSTS + HTTPS redirect identity.tls | passive | 10 | Three-part trust check: TLS certificate validity (chain + dates), `Strict-Transport-Security` with `max-age >= 15552000` (180 days, about 6 months; the HSTS preload list itself requires at least 1 year), a… |
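
The HSTS part of the trust check reduces to parsing `max-age` out of the `Strict-Transport-Security` header and comparing it with the stated floor. A minimal sketch, with the hypothetical names `hsts_max_age` and `hsts_ok`:

```python
import re
from typing import Optional

MIN_MAX_AGE = 15552000  # 180 days in seconds, the threshold named in the check


def hsts_max_age(header: Optional[str]) -> Optional[int]:
    """Extract max-age (in seconds) from a Strict-Transport-Security header value."""
    if not header:
        return None
    for directive in header.split(";"):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "max-age":
            value = value.strip().strip('"')  # the value may be a quoted string
            return int(value) if re.fullmatch(r"[0-9]+", value) else None
    return None


def hsts_ok(header: Optional[str]) -> bool:
    max_age = hsts_max_age(header)
    return max_age is not None and max_age >= MIN_MAX_AGE
```

Directive names are matched case-insensitively, so `Max-Age` and `max-age` are treated alike.
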
Agent onboarding
| Check | Phase | Weight | Description |
|---|---|---|---|
| API-key / signup path discoverability onboarding.api_key_path | passive | 6 | Checks whether an API-key signup path is discoverable: probes conventional paths (`/signup`, `/register`, `/developers`, `/api-keys`, `/account/api`) plus anchors on the homepage and … |
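
The homepage-anchor half of the onboarding check can be sketched as follows: collect every `<a href>` and keep those whose path matches one of the five conventional paths. Exact path matching (ignoring a trailing slash and case) is an assumption, and `signup_links` is a hypothetical name.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlparse

SIGNUP_PATHS = ("/signup", "/register", "/developers", "/api-keys", "/account/api")


class _AnchorCollector(HTMLParser):
    """Collects the href of every <a> tag in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def signup_links(html: str) -> List[str]:
    """Return hrefs whose path is one of the conventional signup paths."""
    collector = _AnchorCollector()
    collector.feed(html)
    matches: List[str] = []
    for href in collector.hrefs:
        path = urlparse(href).path.rstrip("/").lower()
        if path in SIGNUP_PATHS and href not in matches:
            matches.append(href)
    return matches
```
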