SiteOne Crawler: JSON Output Documentation

1. Introduction
2. Potential Use Cases
3. Detailed JSON Structure
4. JSON Schema (Draft)
5. Analysis Tables Description (tables key)
6. Note on Text Output

This document describes the structure and content of the JSON output file generated by the SiteOne Crawler. This JSON file contains detailed information about the crawled website, including metadata about the crawl process, results for each visited URL, quality scores, summary findings, and various analysis tables.

1. Introduction

The JSON output provides a comprehensive dataset about the crawled website. Key information includes:

Crawl Metadata: Details about the crawler execution, such as version, execution time, command used, hostname, and the final user agent.
Options: A complete record of all CLI configuration values used for the crawl.
Quality Scores: Overall and per-category quality scores (0-10) with deduction details.
Visited URL Results: For each URL visited during the crawl:
- URL address
- HTTP status code
- Elapsed time for the request (performance)
- Size of the response body
- Content type (HTML, CSS, JS, Image, etc.)
- Caching information (cache flags, lifetime)
- Additional analysis results stored in the extras field.
Stats: Aggregate statistics about the crawl (total URLs, sizes, timings, status code counts).
Summary: A list of findings (OK, Warning, Critical, Info) that feed into quality scoring.
Analysis Tables: Aggregated data and specific findings presented in structured tables:
- Skipped URLs: Reasons why certain URLs were not crawled (e.g., external domain, disallowed by robots.txt, specific rules).
- Redirects: List of URLs that resulted in redirects (3xx status codes).
- 404 Errors: List of URLs that resulted in a 404 Not Found status.
- SSL/TLS Info: Details about the website's SSL certificate (issuer, subject, validity dates, supported protocols).
- Performance: Tables listing the fastest and slowest URLs encountered during the crawl.
- SEO & Content:
  - SEO metadata (title, description, keywords, H1, indexing directives) for HTML pages.
  - OpenGraph and Twitter Card metadata.
  - Heading structure analysis (correctness of H1-H6 hierarchy).
  - Analysis of non-unique titles and descriptions across pages.
- Technical Details:
  - HTTP Headers: Summary of headers found, their occurrences, and unique values.
  - Caching Analysis: Breakdown of caching strategies by content type and domain.
  - DNS Information: DNS resolution details for the target domain.
  - Security Analysis: Evaluation of security-related HTTP headers.
  - External URLs: List of external URLs discovered during the crawl.
- Crawler Statistics: Performance metrics for the crawler itself, individual analyzers, and content processors.

2. Potential Use Cases

The detailed data within the JSON output enables a wide variety of use cases:

Comprehensive SEO Audits: Analyze titles, descriptions, heading structures, indexing status, and OpenGraph tags across the entire site.
Performance Monitoring & Optimization: Identify the slowest pages and resources, analyze load times, and check caching headers.
Broken Link Checking: Easily extract lists of all 404 errors and the pages where they were found.
Redirect Chain Analysis: Trace each redirect from its original URL to its target to find multi-hop chains and loops.
Security Header Audits: Verify the implementation of crucial security headers (CSP, HSTS, X-Frame-Options, etc.) across the site.
Content Inventory & Analysis: Get a list of all crawled resources, their types, sizes, and status codes. Analyze content type distribution.
Website Archiving/Cloning: While the crawler has a dedicated offline export, the JSON contains the list of all discovered resources, which could inform a custom archiving process.
Competitive Analysis: Run the crawler on competitor sites (respecting their robots.txt) to gather insights into their structure, performance, and technology.
CI/CD Integration: Integrate the crawler into deployment pipelines to automatically check for new errors (404s, performance regressions) after deployments. Use quality scores and thresholds for automated pass/fail decisions.
Technical Debt Assessment: Identify outdated practices, missing security headers, or performance issues that need addressing.
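
As a rough illustration of the CI/CD use case, a pipeline step could load the report and collect reasons to fail the build. This is a minimal sketch: the function name `gate_failures`, the thresholds, and the sample fragment are assumptions made for this example; only the keys `qualityScores.overall.score` and `stats.countByStatus` come from the structure described in this document.

```python
import json

def gate_failures(report, min_overall=7.0, max_404=0):
    """Return human-readable reasons why the crawl should fail a CI gate."""
    failures = []
    score = report["qualityScores"]["overall"]["score"]
    if score < min_overall:
        failures.append(f"overall score {score} is below {min_overall}")
    # countByStatus only has keys for status codes that were actually encountered.
    not_found = report["stats"]["countByStatus"].get("404", 0)
    if not_found > max_404:
        failures.append(f"{not_found} URL(s) returned 404 (allowed: {max_404})")
    return failures

# Hand-written illustrative fragment, not real crawl output.
sample = json.loads("""
{
  "qualityScores": {"overall": {"score": 6.5}},
  "stats": {"countByStatus": {"200": 40, "404": 2}}
}
""")
problems = gate_failures(sample)
for line in problems:
    print(line)
```

In a pipeline, a wrapper script could exit with a non-zero status whenever the returned list is non-empty.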

3. Detailed JSON Structure

The JSON output has 8 top-level keys:

3.1. `crawler` (Object)

Contains metadata about the crawler execution:

name (String): Name of the crawler software.
version (String): Version of the crawler.
executedAt (String): Timestamp when the crawl was executed, in the format "YYYY-MM-DD HH:MM:SS" (space separator, no timezone). Example: "2026-03-16 14:55:13".
command (String): The command-line arguments used to run the crawl.
hostname (String): The hostname where the crawler was run.
finalUserAgent (String): The User-Agent string used for the HTTP requests.
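
Because `executedAt` uses a space separator and carries no timezone, it is not a strict ISO 8601 timestamp. A small sketch of parsing it in Python follows; the helper name `parse_executed_at` is an assumption, and the result is a naive datetime whose timezone has to be inferred from where the crawler ran.

```python
from datetime import datetime

EXECUTED_AT_FORMAT = "%Y-%m-%d %H:%M:%S"

def parse_executed_at(value):
    """Parse crawler.executedAt into a naive datetime (the field has no timezone)."""
    return datetime.strptime(value, EXECUTED_AT_FORMAT)

# The example value from the field description above.
executed = parse_executed_at("2026-03-16 14:55:13")
print(executed.isoformat())
```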

3.2. `extraColumnsFromAnalysis` (Array)

An array of objects defining extra columns that might be added during specific analyses. These are primarily intended for augmenting report outputs. Each object contains:

name (String): The display name of the column.
length (Integer): Suggested display length/width.
truncate (Boolean): Whether the content should be truncated if it exceeds the length.
customMethod, customPattern, customGroup: Fields used for custom data extraction logic (null when not configured).

3.3. `options` (Object)

A flat object containing all 132 CLI configuration values used for the crawl. Every option from the command line (or its default value) is recorded here. Keys are the option names in camelCase (e.g., url, workers, maxReqsPerSec, timeout, outputType, userAgent, acceptEncoding, etc.). Values are strings, integers, booleans, or null, depending on the option type.

This is useful for reproducing a crawl or understanding the exact configuration that produced the results.
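
For example, when two crawls of the same site give different results, comparing their `options` objects shows which settings changed. The sketch below assumes both objects have already been loaded as dictionaries; `diff_options` is a hypothetical helper and the values are illustrative.

```python
def diff_options(old, new):
    """Return {key: (old_value, new_value)} for every option that differs.

    A key that is missing and a key whose value is null both read as None here.
    """
    changed = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changed[key] = (old.get(key), new.get(key))
    return changed

# Illustrative values only; the key names come from the examples above.
before = {"url": "https://example.com/", "workers": 3, "timeout": 5}
after = {"url": "https://example.com/", "workers": 8, "timeout": 5}
changes = diff_options(before, after)
print(changes)
```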

3.4. `qualityScores` (Object)

Contains overall and per-category quality scores computed after analysis.

overall (Object): The aggregate quality score.
- score (Float): Overall score from 0.0 to 10.0.
- label (String): Human-readable label (e.g., "A+", "A", "B", "C", "D", "F").
- weight (Float): Total weight (1.0 for overall).
- deductions (Array): Array of objects, each with:
  - points (Float): Number of points deducted.
  - reason (String): Explanation for the deduction.
categories (Array): Array of 5 category objects, each with:
- code (String): Category identifier. One of: "performance", "seo", "security", "accessibility", "bestPractices".
- name (String): Human-readable category name.
- score (Float): Category score from 0.0 to 10.0.
- label (String): Human-readable label.
- weight (Float): Weight of this category in the overall score (e.g., 0.20 for SEO, 0.25 for Security).
- deductions (Array): Array of deduction objects (same structure as overall deductions).
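
To see what is pulling a score down, the deductions of all categories can be flattened and sorted. This is a sketch under the assumption that `points` is a positive magnitude; `largest_deductions` and the sample reasons are made up for illustration.

```python
def largest_deductions(quality_scores, limit=5):
    """Return (category_code, points, reason) tuples, largest deduction first."""
    found = []
    for category in quality_scores["categories"]:
        for deduction in category["deductions"]:
            found.append((category["code"], deduction["points"], deduction["reason"]))
    found.sort(key=lambda item: item[1], reverse=True)
    return found[:limit]

# Hand-written sample; the reasons are placeholders, not real crawler messages.
sample_scores = {
    "categories": [
        {"code": "seo", "deductions": [{"points": 0.5, "reason": "example reason A"}]},
        {"code": "security", "deductions": [
            {"points": 2.0, "reason": "example reason B"},
            {"points": 1.0, "reason": "example reason C"},
        ]},
    ]
}
top = largest_deductions(sample_scores, limit=2)
for code, points, reason in top:
    print(f"{code}: -{points} ({reason})")
```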

3.5. `results` (Array)

An array of objects, where each object represents a single visited URL.

url (String): The absolute URL that was visited.
status (String): The HTTP status code returned (e.g., "200", "404").
elapsedTime (Float): Time taken to fetch the URL in seconds (e.g., 0.005).
size (Integer): Size of the response body in bytes (e.g., 50961).
type (Integer): An enum representing the detected content type:
- 1: HTML
- 2: JavaScript
- 3: CSS
- 4: Image
- 7: Document (e.g., robots.txt)
- 8: JSON
- Other types may exist (Audio, Font, Video, XML, Redirect, Other).
cacheTypeFlags (Integer): Bitmask representing detected caching mechanisms (e.g., Cache-Control, ETag, Last-Modified). For example, 31 typically means Cache-Control + ETag + Last-Modified are all present. 32768 might indicate no caching headers found.
cacheLifetime (Integer): Cache lifetime in seconds derived from Cache-Control: max-age or Expires header. 0 if no lifetime could be determined.
extras (Array): Contains additional data from specific analyzers run on this URL. Typically an empty array [].
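
As a sketch of working with the `type` enum, the snippet below totals response sizes per content type. Only the six numeric values listed above are mapped; any other value is reported as `type-N` because its meaning is not given here. The helper name and sample entries are assumptions.

```python
from collections import defaultdict

# Only the enum values documented above; other types exist but are not listed.
TYPE_NAMES = {1: "HTML", 2: "JavaScript", 3: "CSS", 4: "Image", 7: "Document", 8: "JSON"}

def size_by_type(results):
    """Sum the response body size in bytes per detected content type."""
    totals = defaultdict(int)
    for item in results:
        name = TYPE_NAMES.get(item["type"], f"type-{item['type']}")
        totals[name] += item["size"]
    return dict(totals)

# Hand-written sample entries (99 stands in for an undocumented type value).
sample_results = [
    {"url": "https://example.com/", "type": 1, "size": 50961},
    {"url": "https://example.com/app.js", "type": 2, "size": 1200},
    {"url": "https://example.com/about", "type": 1, "size": 39},
    {"url": "https://example.com/unknown", "type": 99, "size": 10},
]
sizes = size_by_type(sample_results)
print(sizes)
```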

3.6. `stats` (Object)

Aggregate statistics about the entire crawl:

totalUrls (Integer): Total number of URLs visited.
totalSize (Integer): Total size of all responses in bytes.
totalSizeFormatted (String): Human-readable formatted total size (e.g., "31.33 MB").
totalExecutionTime (Float): Total wall-clock execution time in seconds.
totalRequestsTimes (Float): Sum of all individual request times in seconds.
totalRequestsTimesAvg (Float): Average request time in seconds.
totalRequestsTimesMin (Float): Minimum request time in seconds.
totalRequestsTimesMax (Float): Maximum request time in seconds.
countByStatus (Object): An object mapping HTTP status codes to counts. Keys are status code strings (e.g., "200", "404", "429"), values are integers. Only status codes that were actually encountered appear as keys.
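
Since `countByStatus` keys are strings, a client can classify them by their first character. The sketch below computes the share of 4xx/5xx responses; `error_share` and the sample counts are illustrative assumptions.

```python
def error_share(count_by_status):
    """Return the fraction of responses whose status code starts with 4 or 5."""
    total = sum(count_by_status.values())
    if total == 0:
        return 0.0
    errors = sum(
        count for code, count in count_by_status.items() if code[:1] in ("4", "5")
    )
    return errors / total

# Hand-written sample counts.
share = error_share({"200": 45, "404": 3, "429": 2})
print(f"{share:.1%} of responses were client or server errors")
```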

3.7. `summary` (Object)

Contains a list of summary findings that feed into quality scoring.

items (Array): Array of finding objects, each with:
- aplCode (String): A unique code identifying the finding (e.g., "s201", "s404", "s502").
- status (String): Severity level. One of: "CRITICAL", "WARNING", "OK", "INFO".
- text (String): Human-readable description of the finding (e.g., "Brotli is supported for HTML", "1 URL(s) returned a 404 status code").
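
A typical consumer counts findings per severity and then lists those that should block a release. The following sketch assumes the `summary` object is already loaded; the helper names, the `aplCode` values, and the severities assigned in the sample are placeholders.

```python
from collections import Counter

def findings_by_status(summary):
    """Count summary findings per severity level."""
    return Counter(item["status"] for item in summary["items"])

def blocking_findings(summary, levels=("CRITICAL",)):
    """Return the findings whose severity is in the given levels."""
    return [item for item in summary["items"] if item["status"] in levels]

# Hand-written sample; codes and severities are placeholders.
sample_summary = {"items": [
    {"aplCode": "example-1", "status": "OK", "text": "Brotli is supported for HTML"},
    {"aplCode": "example-2", "status": "WARNING", "text": "1 URL(s) returned a 404 status code"},
    {"aplCode": "example-3", "status": "CRITICAL", "text": "hypothetical critical finding"},
]}
counts = findings_by_status(sample_summary)
blockers = blocking_findings(sample_summary)
print(dict(counts), [item["aplCode"] for item in blockers])
```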

3.8. `tables` (Object)

An object where each key is a table identifier (e.g., skipped-summary, 404, seo) and the value is an object describing that table. Each table object contains:

aplCode (String): A unique code for the table.
title (String): A human-readable title for the table.
columns (Object): An object describing the columns of the table. Each key is a column identifier (e.g., reason, url, statusCode). The value is an object detailing the column:
- aplCode (String): Unique code for the column.
- name (String): Display name for the column header.
- width (Integer): Suggested display width (-1 might mean auto).
- formatter (Object | null): Defines how the data should be formatted (e.g., adding units like 'ms' or 'kB'). Empty object {} indicates default formatting.
- renderer (Object | null): Defines how the data should be rendered (e.g., adding color or links). Empty object {} indicates default rendering.
- truncateIfLonger (Boolean): Whether to truncate the value if it exceeds the width.
- Other fields like formatterWillChangeValueLength, nonBreakingSpaces, escapeOutputHtml, getDataValueCallback, forcedDataType provide more hints for rendering.
rows (Array): An array of objects, where each object represents a row in the table. The keys in each row object correspond to the column identifiers defined in columns. Important: All values in all table rows are strings, regardless of whether the data represents a number, count, or other type. For example, a count of 51 appears as "51", a request time of 0.003 appears as "0.003", and an empty value appears as "". Rows may also contain extra keys beyond the declared columns (see individual table descriptions for details).
position (String): A hint about where this table should typically be positioned in a report (e.g., before-url-table, after-url-table).

Note: The specific content and structure within tables depend on the analyzers enabled during the crawl. The set of tables may vary depending on what data was encountered (e.g., certificate-info only appears for HTTPS sites).
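
A generic reader for any table can use the `columns` object to order the values of each row and keep undeclared keys separate. This is a minimal sketch; it assumes the key order of `columns` is the intended display order, which the source does not state explicitly, and `rows_in_column_order` is a hypothetical helper.

```python
def rows_in_column_order(table):
    """Yield (values, extras) per row.

    values: cell strings for the declared columns, in the order of table["columns"].
    extras: row keys that are not declared as columns.
    """
    column_ids = list(table["columns"])
    for row in table["rows"]:
        values = [row.get(column_id, "") for column_id in column_ids]
        extras = {key: value for key, value in row.items() if key not in table["columns"]}
        yield values, extras

# Hand-written sample shaped like the skipped-summary table; extraKey is made up.
sample_table = {
    "columns": {"reason": {}, "domain": {}, "count": {}},
    "rows": [
        {"reason": "Not allowed host", "domain": "cdn.example.com", "count": "12", "extraKey": "x"},
    ],
}
ordered = list(rows_in_column_order(sample_table))
print(ordered)
```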

4. JSON Schema (Draft)

This is a draft JSON schema based on the actual output. It may need refinement for edge cases.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "SiteOne Crawler JSON Output",
  "description": "Schema for the JSON output file generated by SiteOne Crawler.",
  "type": "object",
  "properties": {
    "crawler": {
      "description": "Metadata about the crawler execution.",
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "version": { "type": "string" },
        "executedAt": { "type": "string", "description": "Format: YYYY-MM-DD HH:MM:SS" },
        "command": { "type": "string" },
        "hostname": { "type": "string" },
        "finalUserAgent": { "type": "string" }
      },
      "required": ["name", "version", "executedAt", "command", "hostname", "finalUserAgent"]
    },
    "extraColumnsFromAnalysis": {
      "description": "Definitions for extra columns used in analyses.",
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "length": { "type": "integer" },
          "truncate": { "type": "boolean" },
          "customMethod": { "type": ["string", "null"] },
          "customPattern": { "type": ["string", "null"] },
          "customGroup": { "type": ["string", "null"] }
        },
        "required": ["name", "length", "truncate"]
      }
    },
    "options": {
      "description": "All CLI configuration values used for the crawl.",
      "type": "object",
      "additionalProperties": true
    },
    "qualityScores": {
      "description": "Overall and per-category quality scores.",
      "type": "object",
      "properties": {
        "overall": {
          "type": "object",
          "properties": {
            "score": { "type": "number" },
            "label": { "type": "string" },
            "weight": { "type": "number" },
            "deductions": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "points": { "type": "number" },
                  "reason": { "type": "string" }
                },
                "required": ["points", "reason"]
              }
            }
          },
          "required": ["score", "label", "weight", "deductions"]
        },
        "categories": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "code": { "type": "string", "enum": ["performance", "seo", "security", "accessibility", "bestPractices"] },
              "name": { "type": "string" },
              "score": { "type": "number" },
              "label": { "type": "string" },
              "weight": { "type": "number" },
              "deductions": {
                "type": "array",
                "items": {
                  "type": "object",
                  "properties": {
                    "points": { "type": "number" },
                    "reason": { "type": "string" }
                  },
                  "required": ["points", "reason"]
                }
              }
            },
            "required": ["code", "name", "score", "label", "weight", "deductions"]
          }
        }
      },
      "required": ["overall", "categories"]
    },
    "results": {
      "description": "Array of results for each visited URL.",
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "url": { "type": "string", "format": "uri" },
          "status": { "type": "string" },
          "elapsedTime": { "type": "number" },
          "size": { "type": "integer" },
          "type": { "type": "integer", "description": "Enum for content type (1:HTML, 2:JS, 3:CSS, 4:Image, 7:Document, 8:JSON, ...)" },
          "cacheTypeFlags": { "type": "integer", "description": "Bitmask for caching mechanisms" },
          "cacheLifetime": { "type": "integer", "description": "Cache lifetime in seconds, 0 if undetermined" },
          "extras": {
            "type": "array",
            "description": "Additional analysis data for this URL (typically empty)"
          }
        },
        "required": ["url", "status", "elapsedTime", "size", "type", "cacheTypeFlags", "cacheLifetime", "extras"]
      }
    },
    "stats": {
      "description": "Aggregate crawl statistics.",
      "type": "object",
      "properties": {
        "totalUrls": { "type": "integer" },
        "totalSize": { "type": "integer" },
        "totalSizeFormatted": { "type": "string" },
        "totalExecutionTime": { "type": "number" },
        "totalRequestsTimes": { "type": "number" },
        "totalRequestsTimesAvg": { "type": "number" },
        "totalRequestsTimesMin": { "type": "number" },
        "totalRequestsTimesMax": { "type": "number" },
        "countByStatus": {
          "type": "object",
          "additionalProperties": { "type": "integer" }
        }
      },
      "required": ["totalUrls", "totalSize", "totalSizeFormatted", "totalExecutionTime", "totalRequestsTimes", "totalRequestsTimesAvg", "totalRequestsTimesMin", "totalRequestsTimesMax", "countByStatus"]
    },
    "summary": {
      "description": "Summary findings that feed into quality scoring.",
      "type": "object",
      "properties": {
        "items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "aplCode": { "type": "string" },
              "status": { "type": "string", "enum": ["CRITICAL", "WARNING", "OK", "INFO"] },
              "text": { "type": "string" }
            },
            "required": ["aplCode", "status", "text"]
          }
        }
      },
      "required": ["items"]
    },
    "tables": {
      "description": "Aggregated analysis results presented as tables.",
      "type": "object",
      "additionalProperties": {
        "type": "object",
        "properties": {
          "aplCode": { "type": "string" },
          "title": { "type": "string" },
          "columns": {
            "type": "object",
            "additionalProperties": {
              "type": "object",
              "properties": {
                "aplCode": { "type": "string" },
                "name": { "type": "string" },
                "width": { "type": "integer" },
                "formatter": { "type": ["object", "null"] },
                "renderer": { "type": ["object", "null"] },
                "truncateIfLonger": { "type": "boolean" }
              },
              "required": ["aplCode", "name", "width"]
            }
          },
          "rows": {
            "type": "array",
            "items": {
              "type": "object",
              "description": "All row values are strings. Rows may contain extra keys beyond the declared columns.",
              "additionalProperties": { "type": "string" }
            }
          },
          "position": { "type": "string", "enum": ["before-url-table", "after-url-table"] }
        },
        "required": ["aplCode", "title", "columns", "rows", "position"]
      }
    }
  },
  "required": ["crawler", "extraColumnsFromAnalysis", "options", "qualityScores", "results", "stats", "summary", "tables"]
}
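
Full validation against this schema needs a JSON Schema validator library. As a dependency-free first check, a script can at least confirm the required top-level keys from the schema above. The sketch below does only that; `missing_top_level_keys` is a hypothetical helper, not part of the crawler.

```python
# The "required" list at the top level of the draft schema above.
REQUIRED_TOP_LEVEL = [
    "crawler", "extraColumnsFromAnalysis", "options", "qualityScores",
    "results", "stats", "summary", "tables",
]

def missing_top_level_keys(report):
    """Return the required top-level keys that are absent from the report."""
    return [key for key in REQUIRED_TOP_LEVEL if key not in report]

# An incomplete, hand-written report fragment.
missing = missing_top_level_keys({"crawler": {}, "results": []})
print(missing)
```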

5. Analysis Tables Description (`tables` key)

This section details the structure and columns of each table found under the tables key in the JSON output.

Important note on data types: All values in all table rows are strings. Numeric values such as counts, times, and sizes are serialized as strings (e.g., "51" not 51, "0.003" not 0.003). Empty values appear as "". This applies to every table described below. Where column descriptions say "count" or "time", the value is still a string representation of that number.

Some tables include extra row keys beyond the declared columns. These are noted in the individual table descriptions.
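
Because every cell is a string, numeric work requires explicit conversion. The sketch below converts only the columns the caller names, so that text columns which happen to look numeric (a page title such as "2024", for instance) are left alone. The helper names are assumptions.

```python
def coerce_cell(value):
    """Convert a cell string to int or float when it parses; "" becomes None."""
    if value == "":
        return None
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value

def coerce_row(row, numeric_columns):
    """Return a copy of the row with the named columns converted to numbers."""
    return {
        key: coerce_cell(value) if key in numeric_columns else value
        for key, value in row.items()
    }

# Hand-written sample row.
converted = coerce_row(
    {"contentType": "HTML", "count": "51", "avgTime": "0.003", "minLifetime": ""},
    numeric_columns=("count", "avgTime", "minLifetime"),
)
print(converted)
```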

5.1. `skipped-summary` (Skipped URLs Summary)

Provides a summary of skipped URLs grouped by domain and reason.

Column	Description
`reason`	A human-readable string describing why URLs from this domain were skipped (e.g., `"Not allowed host"`, `"Blocked by robots.txt"`).
`domain`	The domain name whose URLs were skipped.
`count`	The number of unique URLs skipped for this domain and reason.

5.2. `skipped` (Skipped URLs)

Lists individual URLs that were skipped during the crawl.

Column	Description
`reason`	A human-readable string describing why the URL was skipped (e.g., `"Not allowed host"`, `"Blocked by robots.txt"`, `"File extension is not allowed"`).
`url`	The URL that was skipped.
`sourceAttr`	A string describing the HTML attribute where the skipped URL was found (e.g., `"<a href>"`, `"<link href>"`, `"<script src>"`).
`sourceUqId`	The URL path of the page where the skipped URL was discovered (e.g., `"/"`, `"/docs/getting-started"`). This allows linking back to the source page.

5.3. `redirects` (Redirected URLs)

Lists URLs that resulted in an HTTP redirect (3xx status code).

Column	Description
`statusCode`	The specific redirect status code (e.g., `"301"`, `"302"`).
`url`	The original URL that redirected.
`targetUrl`	The target URL to which the original URL redirected.
`sourceUqId`	URL path of the page where the redirected URL was found.
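
The `url` and `targetUrl` columns are enough to reconstruct redirect chains. The sketch below follows each redirect until it leaves the table, and reports chains with more than one hop as well as loops. It assumes `targetUrl` is written in the same form as `url` (for example, both absolute); if that does not hold for a given crawl, the values would need normalizing first. `redirect_chains` and the sample rows are illustrative.

```python
def redirect_chains(rows, max_hops=20):
    """Return lists of URLs for every chain with two or more hops (loops included)."""
    target_of = {row["url"]: row["targetUrl"] for row in rows}
    targets = set(target_of.values())
    # Start from URLs that nothing redirects to, then visit the rest so that
    # closed loops (which have no such starting point) are covered as well.
    heads = [url for url in target_of if url not in targets]
    others = [url for url in target_of if url in targets]
    seen = set()
    chains = []
    for start in heads + others:
        if start in seen:
            continue
        chain = [start]
        while chain[-1] in target_of and len(chain) <= max_hops:
            next_url = target_of[chain[-1]]
            chain.append(next_url)
            if next_url in chain[:-1]:
                break  # loop detected: the chain returned to a URL it already holds
        seen.update(chain)
        if len(chain) > 2:
            chains.append(chain)
    return chains

# Hand-written sample rows.
sample_rows = [
    {"statusCode": "301", "url": "https://example.com/a", "targetUrl": "https://example.com/b"},
    {"statusCode": "302", "url": "https://example.com/b", "targetUrl": "https://example.com/c"},
    {"statusCode": "301", "url": "https://example.com/x", "targetUrl": "https://example.com/y"},
]
chains = redirect_chains(sample_rows)
for chain in chains:
    print(" -> ".join(chain))
```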

5.4. `404` (404 URLs)

Lists URLs that resulted in a "404 Not Found" status code.

Column	Description
`statusCode`	The HTTP status code (typically `"404"`).
`url`	The URL that resulted in the 404 error.
`sourceUqId`	URL path of the page where the broken URL was found.
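
For broken link reports it is often more useful to group the 404 URLs by the page that links to them. A minimal sketch, with a hypothetical helper name and hand-written rows:

```python
from collections import defaultdict

def broken_links_by_source(rows):
    """Map each source page (sourceUqId) to the broken URLs found on it."""
    by_source = defaultdict(list)
    for row in rows:
        by_source[row["sourceUqId"]].append(row["url"])
    return dict(by_source)

# Hand-written sample rows.
sample_404 = [
    {"statusCode": "404", "url": "https://example.com/missing-1", "sourceUqId": "/"},
    {"statusCode": "404", "url": "https://example.com/missing-2", "sourceUqId": "/docs/getting-started"},
    {"statusCode": "404", "url": "https://example.com/missing-3", "sourceUqId": "/"},
]
grouped = broken_links_by_source(sample_404)
for source, urls in grouped.items():
    print(source, len(urls))
```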

5.5. `certificate-info` (SSL/TLS info)

Provides details about the SSL/TLS certificate of the crawled domain.

Column	Description
`info`	The name of the certificate attribute (e.g., `"Issuer"`, `"Subject"`, `"Valid from"`, `"Valid to"`, `"Supported protocols"`, `"RAW certificate output"`, `"RAW protocols output"`).
`value`	The value of the corresponding certificate attribute. Always a string. For multi-line values like raw certificate or protocol output, the entire content is a single string with embedded newlines.

5.6. `fastest-urls` (TOP fastest URLs)

Lists the URLs with the lowest request times encountered during the crawl.

Column	Description
`requestTime`	The time taken to fetch the URL in seconds (e.g., `"0.003"`).
`statusCode`	The HTTP status code of the URL (e.g., `"200"`).
`url`	The URL itself.

5.7. `slowest-urls` (TOP slowest URLs)

Lists the URLs with the highest request times encountered during the crawl.

Column	Description
`requestTime`	The time taken to fetch the URL in seconds (e.g., `"1.234"`).
`statusCode`	The HTTP status code of the URL (e.g., `"200"`).
`url`	The URL itself.

5.8. `seo` (SEO metadata)

Provides SEO-related metadata extracted from HTML pages.

Column	Description
`urlPathAndQuery`	The path and query string of the URL.
`indexing`	A string describing the indexing status (e.g., `"index, follow"`, `"noindex, follow"`).
`title`	The content of the `<title>` tag, or empty string if not found.
`h1`	The content of the first `<h1>` tag found, or empty string.
`description`	The content of the `meta name="description"` tag, or empty string.
`keywords`	The content of the `meta name="keywords"` tag, or empty string.

Extra row keys (present in each row object but not declared as columns):

robotsIndex (String): Whether the page allows indexing (e.g., "1" for index, "0" for noindex).
deniedByRobotsTxt (String): Whether the page is denied by robots.txt (e.g., "0" for allowed, "1" for denied).
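
The extra keys make it possible to restrict an audit to pages that search engines can actually index. The sketch below lists indexable pages with an empty title, description, or H1; it relies on the "1"/"0" string values given as examples above, and the helper name is an assumption.

```python
def indexable_pages_missing_metadata(rows, fields=("title", "description", "h1")):
    """Return (urlPathAndQuery, missing_fields) for indexable pages with empty fields."""
    problems = []
    for row in rows:
        # Skip pages marked noindex or denied by robots.txt.
        if row.get("robotsIndex") != "1" or row.get("deniedByRobotsTxt") == "1":
            continue
        missing = [field for field in fields if row.get(field, "") == ""]
        if missing:
            problems.append((row["urlPathAndQuery"], missing))
    return problems

# Hand-written sample rows.
sample_seo = [
    {"urlPathAndQuery": "/", "title": "Home", "h1": "Welcome", "description": "",
     "robotsIndex": "1", "deniedByRobotsTxt": "0"},
    {"urlPathAndQuery": "/private", "title": "", "h1": "", "description": "",
     "robotsIndex": "0", "deniedByRobotsTxt": "0"},
]
seo_problems = indexable_pages_missing_metadata(sample_seo)
print(seo_problems)
```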

5.9. `open-graph` (OpenGraph metadata)

Provides Open Graph and Twitter Card metadata extracted from HTML pages.

Column	Description
`urlPathAndQuery`	The path and query string of the URL.
`ogTitle`	Content of the `og:title` meta tag, or empty string.
`ogDescription`	Content of the `og:description` meta tag, or empty string.
`ogImage`	Content of the `og:image` meta tag, or empty string.
`twitterTitle`	Content of the `twitter:title` meta tag, or empty string.
`twitterDescription`	Content of the `twitter:description` meta tag, or empty string.
`twitterImage`	Content of the `twitter:image` meta tag, or empty string.

5.10. `seo-headings` (Heading structure)

Provides analysis of the heading (H1-H6) structure for each HTML page.

Column	Description
`headings`	A formatted string representation of the heading structure showing hierarchy and potential errors (e.g., `"OK H1, H2, H2, H3"` or `"ERR H1, H3 (skipped H2)"`).
`headingsCount`	Total number of headings found on the page (e.g., `"5"`).
`headingsErrorsCount`	Number of structural errors found in the headings (e.g., `"0"`, `"2"`).
`urlPathAndQuery`	The path and query string of the URL.

Extra row key:

headingsHtml (String): An HTML string containing the full heading tree with markup (e.g., "H1 Title H2 Section..."). Useful for rendering a visual heading tree in reports.

5.11. `headers` (HTTP headers)

Summarizes the HTTP response headers encountered across all crawled URLs.

Column	Description
`header`	The name of the HTTP header.
`occurrences`	The total number of times this header was found (e.g., `"73"`).
`uniqueValues`	The count of distinct values found for this header, as a string (e.g., `"3"`).
`valuesPreview`	A preview string showing some of the values encountered (truncated if many).
`minValue`	The minimum value found (relevant for numerical or date headers), or empty string.
`maxValue`	The maximum value found, or empty string.

5.12. `headers-values` (HTTP header values)

Lists unique values for each HTTP header and their occurrence count.

Column	Description
`header`	The name of the HTTP header.
`occurrences`	The number of times this specific value occurred for this header (e.g., `"51"`).
`value`	The specific unique value of the HTTP header.

5.13. `caching-per-content-type` (HTTP Caching by content type)

Analyzes caching effectiveness grouped by general content type (HTML, Image, JS, CSS, etc.).

Column	Description
`contentType`	The general content type category (e.g., `"HTML"`, `"Image"`, `"JS"`).
`cacheType`	Description of the caching mechanism detected (e.g., `"Cache-Control + ETag + Last-Modified"`, `"No cache headers"`).
`count`	Number of URLs matching this content type and cache type.
`avgLifetime`	Average cache lifetime in seconds for URLs in this group, or empty string if not determinable.
`minLifetime`	Minimum cache lifetime in seconds, or empty string.
`maxLifetime`	Maximum cache lifetime in seconds, or empty string.

5.14. `caching-per-domain` (HTTP Caching by domain)

Analyzes caching effectiveness grouped by domain.

Column	Description
`domain`	The domain name.
`cacheType`	Description of the caching mechanism detected.
`count`	Number of URLs from this domain matching this cache type.
`avgLifetime`	Average cache lifetime in seconds, or empty string.
`minLifetime`	Minimum cache lifetime in seconds, or empty string.
`maxLifetime`	Maximum cache lifetime in seconds, or empty string.

5.15. `caching-per-domain-and-content-type` (HTTP Caching by domain and content type)

Analyzes caching effectiveness grouped by both domain and general content type.

Column	Description
`domain`	The domain name.
`contentType`	The general content type category.
`cacheType`	Description of the caching mechanism detected.
`count`	Number of URLs matching this domain, content type, and cache type.
`avgLifetime`	Average cache lifetime in seconds, or empty string.
`minLifetime`	Minimum cache lifetime in seconds, or empty string.
`maxLifetime`	Maximum cache lifetime in seconds, or empty string.

5.16. `non-unique-titles` (TOP non-unique titles)

Lists page titles that appear on more than one page.

Column	Description
`count`	The number of pages sharing this title.
`title`	The non-unique page title.

5.17. `non-unique-descriptions` (TOP non-unique descriptions)

Lists meta descriptions that appear on more than one page.

Column	Description
`count`	The number of pages sharing this description.
`description`	The non-unique meta description content.

5.18. `best-practices` (Best practices)

Summarizes the results of various best practice checks performed by analyzers.

Column	Description
`analysisName`	The name of the specific best practice check (e.g., `"Large inline SVGs"`, `"Heading structure"`, `"Brotli support"`).
`ok`	Count of URLs passing this check.
`notice`	Count of URLs with a notice-level finding.
`warning`	Count of URLs with a warning-level finding.
`critical`	Count of URLs with a critical-level finding.

5.19. `accessibility` (Accessibility)

Summarizes the results of accessibility checks.

Column	Description
`analysisName`	The name of the specific accessibility check (e.g., `"Missing image alt attributes"`, `"Missing html lang attribute"`, `"ARIA roles and landmarks"`).
`ok`	Count of elements/pages passing this check.
`notice`	Count of notice-level findings.
`warning`	Count of warning-level findings.
`critical`	Count of critical-level findings.

5.20. `source-domains` (Source domains)

Provides statistics about the domains from which resources were loaded.

Column	Description
`domain`	The domain name.
`totals`	A summary string showing total count, size, and time for resources from this domain (e.g., `"67/30MB/6.2s"`).
`HTML`	Summary string (count/size/time) for HTML resources from this domain.
`Image`	Summary string for Image resources.
`JS`	Summary string for JavaScript resources.
`CSS`	Summary string for CSS resources.
`Document`	Summary string for Document resources (e.g., robots.txt).

Extra row keys (dynamic, present when data exists):

Audio, Font, JSON, Other, Redirect, Video, XML (String): Summary strings for additional content types, included only when resources of that type are present.
totalCount (String): Total number of resources loaded from this domain.

Note: The set of content type columns is dynamic. The declared columns (HTML, Image, JS, CSS, Document) are always present, but additional content type columns appear in row data based on what resource types were actually encountered during the crawl.
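
The summary strings can be split back into their parts. The sketch below assumes every summary has exactly three slash-separated parts (count, size, time), as in the example "67/30MB/6.2s"; size and time are kept as strings because their units are not specified here. `parse_domain_summary` is a hypothetical helper.

```python
def parse_domain_summary(text):
    """Split a "count/size/time" summary string into its three parts.

    Raises ValueError when the string does not have exactly three parts.
    """
    count, size, time = text.split("/")
    return {"count": int(count), "size": size, "time": time}

# The example value from the totals column above.
parsed = parse_domain_summary("67/30MB/6.2s")
print(parsed)
```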

5.21. `content-types` (Content types)

Summarizes statistics grouped by general content type.

Column	Description
`contentType`	The general content type category (e.g., `"HTML"`, `"Image"`).
`count`	Total number of URLs of this content type.
`totalSize`	Total size in bytes for this content type.
`totalTime`	Total time spent fetching resources of this content type.
`avgTime`	Average time spent fetching a resource of this content type.
`status20x`	Count of URLs with a 2xx status code.
`status40x`	Count of URLs with a 4xx status code.

Note: The status columns are dynamic. Additional columns like status42x (for HTTP 429) or status30x, status50x may appear depending on which status codes were actually encountered during the crawl. These dynamic columns will also be declared in the table's columns object.
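
Since the set of status columns varies between crawls, code should discover them from the table's `columns` object instead of hard-coding names. A minimal sketch, assuming the dynamic columns all start with the prefix `status` as in the examples above; the helper names and sample values are illustrative.

```python
def status_columns(table):
    """Return the identifiers of all declared columns that start with "status"."""
    return [column_id for column_id in table["columns"] if column_id.startswith("status")]

def status_totals(table):
    """Sum each status column over all rows; empty or missing cells count as 0."""
    totals = {}
    for column_id in status_columns(table):
        totals[column_id] = sum(int(row.get(column_id) or 0) for row in table["rows"])
    return totals

# Hand-written sample shaped like the content-types table.
sample_content_types = {
    "columns": {"contentType": {}, "count": {}, "status20x": {}, "status40x": {}, "status42x": {}},
    "rows": [
        {"contentType": "HTML", "count": "11", "status20x": "10", "status40x": "1", "status42x": ""},
        {"contentType": "Image", "count": "7", "status20x": "5", "status40x": "0", "status42x": "2"},
    ],
}
totals = status_totals(sample_content_types)
print(totals)
```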

5.22. `content-types-raw` (Content types (MIME types))

Summarizes statistics grouped by the specific MIME type reported in the Content-Type HTTP header.

Column	Description
`contentType`	The raw MIME type string (e.g., `"text/html"`, `"image/svg+xml"`, `"text/html; charset=utf-8"`).
`count`	Total number of URLs with this MIME type.
`totalSize`	Total size in bytes.
`totalTime`	Total time spent fetching.
`avgTime`	Average time spent fetching.
`status20x`	Count of URLs with a 2xx status code.
`status40x`	Count of URLs with a 4xx status code.

Note: Like content-types, the status columns are dynamic. Additional status columns (e.g., status42x) appear when the corresponding status codes are encountered.

5.23. `dns` (DNS info)

Shows the DNS resolution information for the crawled domain(s).

Column	Description
`info`	A line of text representing part of the DNS resolution (e.g., the domain name, an IP address, the DNS server used). Presented as a simple text tree.

5.24. `security` (Security)

Summarizes findings related to security HTTP headers.

Column	Description
`header`	The name of the security header being analyzed (e.g., `"Strict-Transport-Security"`, `"X-Frame-Options"`, `"Content-Security-Policy"`).
`ok`	Count of URLs where the header was configured correctly.
`notice`	Count of URLs with a notice-level finding.
`warning`	Count of URLs with a warning-level finding.
`critical`	Count of URLs with a critical-level finding.
`recommendation`	A string containing textual recommendations for improving the configuration of this header.

Extra row key:

highestSeverity (String): The highest severity level found for this header across all URLs (e.g., "ok", "warning", "critical").

5.25. `analysis-stats` (Analysis stats)

Provides performance metrics for individual analyzer methods.

Column	Description
`classAndMethod`	The class and method name of the analyzer function.
`execTime`	Total execution time in seconds spent in this method across all relevant URLs/data points.
`execCount`	The number of times this method was executed.

Extra row key:

execTimeFormatted (String): Human-readable formatted execution time (e.g., "0.012 s", "1.234 s").
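To find the analyzers that dominate processing time, the rows can be ranked by `execTime` and given a derived per-call average. This is a rough sketch, assuming the rows are a list of dictionaries keyed by column name; `avgTime` is a value computed here, not a column in the output, and the method names in the sample are made up.

```python
def slowest_methods(rows, limit=5):
    """Rank rows by total execTime and attach an average time per call."""
    ranked = []
    for row in rows:
        exec_time = float(row["execTime"])
        exec_count = int(row["execCount"])
        ranked.append({
            "classAndMethod": row["classAndMethod"],
            "execTime": exec_time,
            "execCount": exec_count,
            # Derived value; guard against a zero call count.
            "avgTime": exec_time / exec_count if exec_count else 0.0,
        })
    ranked.sort(key=lambda item: item["execTime"], reverse=True)
    return ranked[:limit]


# Hypothetical rows; the class and method names are placeholders.
sample_rows = [
    {"classAndMethod": "ExampleAnalyzer::checkA", "execTime": 0.25, "execCount": 5},
    {"classAndMethod": "ExampleAnalyzer::checkB", "execTime": 1.5, "execCount": 3},
    {"classAndMethod": "ExampleAnalyzer::checkC", "execTime": 0.0, "execCount": 0},
]

for item in slowest_methods(sample_rows, limit=2):
    print(item["classAndMethod"], item["execTime"], item["avgTime"])
```

Since `content-processors-stats` uses the same three columns, the same function should work on its rows as well.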

5.26. `content-processors-stats` (Content processor stats)

Provides performance metrics for content processor methods (HTML, CSS, JS, XML processors that run during the crawl).

Column	Description
`classAndMethod`	The class and method name of the content processor function.
`execTime`	Total execution time in seconds spent in this method.
`execCount`	The number of times this method was executed.

Extra row key:

execTimeFormatted (String): Human-readable formatted execution time.

5.27. `external-urls` (External URLs)

Lists external URLs discovered during the crawl along with where they were found.

Column	Description
`url`	The external URL that was discovered.
`count`	The number of times this external URL was found across all crawled pages.
`foundOn`	The URL of the page where this external URL was found (typically the first occurrence).
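A common follow-up is to see which external hosts a site links to most often. The sketch below groups the rows by hostname and sums `count`. It assumes the rows are a list of dictionaries keyed by column name; the function name and the sample URLs are illustrative only.

```python
from collections import Counter
from urllib.parse import urlsplit


def external_hosts(rows):
    """Sum the occurrence counts of external URLs per hostname, most frequent first."""
    hosts = Counter()
    for row in rows:
        # hostname is None for URLs without a network location.
        host = urlsplit(row["url"]).hostname or "(unknown)"
        hosts[host] += int(row.get("count", 1))
    return hosts.most_common()


# Hypothetical rows shaped like the external-urls table.
sample_rows = [
    {"url": "https://cdn.example.org/lib.js", "count": 12,
     "foundOn": "https://www.example.com/"},
    {"url": "https://cdn.example.org/style.css", "count": 8,
     "foundOn": "https://www.example.com/"},
    {"url": "https://social.example.net/profile", "count": 3,
     "foundOn": "https://www.example.com/about"},
]

for host, total in external_hosts(sample_rows):
    print(f"{total:>5}  {host}")
```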

6. Note on Text Output

While this document focuses on the JSON output, SiteOne Crawler also offers a simpler Text output format (--output-text-file). The Text output provides a human-readable summary suitable for quick review in a terminal or text editor.

See the Text Output Documentation for more details on the Text format.
