Blackout Secure Sitemap Generator

Copyright © 2025-2026 Blackout Secure | Apache License 2.0

Enterprise-grade automated sitemap generation (XML/TXT/GZIP) for static sites, SSG frameworks (Next.js, Gatsby, Hugo, Jekyll), and dynamic applications. Built for reliability, performance, and SEO best practices.

✨ Features

  • Multiple Formats: XML, TXT, and GZIP compressed sitemaps
  • Smart Discovery: Auto-detect site URLs and directories
  • Framework Support: Works with Next.js, Gatsby, Hugo, Jekyll, Vite, and more
  • SEO Optimized: Canonical URL parsing, link discovery, lastmod timestamps
  • Validation: Built-in validation against sitemaps.org protocol
  • Large Sites: Auto-splitting for sites with more than 50,000 URLs
  • Flexible: Customizable patterns, exclusions, and priorities
  • Git Integration: Last modified dates from git history
  • No Build Required: Can validate existing sitemaps without generation

📋 Prerequisites

  • GitHub Actions environment (Ubuntu, macOS, or Windows)
  • Built site files (HTML, CSS, JS, etc.)
  • For git-based lastmod: fetch-depth: 0 in checkout step

🚀 Quick Start

Basic Usage

name: Generate Sitemap

on:
  push:
    branches: [main]
  workflow_dispatch:

jobs:
  sitemap:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Required for git-based lastmod

      - name: Build your site
        run: npm run build # or your build command

      - name: Generate sitemap
        uses: blackoutsecure/bos-sitemap-generator@v1
        with:
          site_url: 'https://example.com'
          public_dir: 'dist'

📖 Examples

Next.js Static Export

- name: Build Next.js site
  run: npm run build

- name: Generate sitemap
  uses: blackoutsecure/bos-sitemap-generator@v1
  with:
    site_url: 'https://example.com'
    public_dir: 'out'
    lastmod_strategy: 'git'

Gatsby

- name: Build Gatsby site
  run: npm run build

- name: Generate sitemap
  uses: blackoutsecure/bos-sitemap-generator@v1
  with:
    site_url: 'https://example.com'
    public_dir: 'public'

Hugo

- name: Build Hugo site
  run: hugo --minify

- name: Generate sitemap
  uses: blackoutsecure/bos-sitemap-generator@v1
  with:
    site_url: 'https://example.com'
    public_dir: 'public'

Jekyll

- name: Build Jekyll site
  run: bundle exec jekyll build

- name: Generate sitemap
  uses: blackoutsecure/bos-sitemap-generator@v1
  with:
    site_url: 'https://example.com'
    public_dir: '_site'

Vite

- name: Build Vite project
  run: npm run build

- name: Generate sitemap
  uses: blackoutsecure/bos-sitemap-generator@v1
  with:
    site_url: 'https://example.com'
    public_dir: 'dist'

Advanced: Custom Patterns and Exclusions

- name: Generate sitemap with custom rules
  uses: blackoutsecure/bos-sitemap-generator@v1
  with:
    site_url: 'https://example.com'
    public_dir: 'dist'
    include_patterns: '**/*.html,**/*.htm,**/*.php'
    exclude_patterns: '**/*.map,**/drafts/**,**/private/**'
    exclude_urls: '*/admin/*,*/test/*'
    changefreq: 'weekly'
    priority: '0.8'

Additional URLs

Include non-HTML pages or external resources:

- name: Generate sitemap with additional URLs
  uses: blackoutsecure/bos-sitemap-generator@v1
  with:
    site_url: 'https://example.com'
    public_dir: 'dist'
    additional_urls: 'https://example.com/api,https://example.com/app'

Disable TXT Format

- name: Generate XML sitemap only
  uses: blackoutsecure/bos-sitemap-generator@v1
  with:
    site_url: 'https://example.com'
    public_dir: 'dist'
    generate_sitemap_txt: 'false'

⚙️ Configuration

Required Inputs

| Input | Description | Example |
| --- | --- | --- |
| site_url | Public base URL of your site | https://example.com |

Common Inputs

| Input | Description | Default |
| --- | --- | --- |
| public_dir | Directory containing built site files | dist |
| sitemap_output_dir | Where to write sitemap files | Same as public_dir |
| include_patterns | Glob patterns to include | **/*.html,**/*.htm |
| exclude_patterns | Glob patterns to exclude | **/*.map |
| lastmod_strategy | Source for lastmod dates | git |
| generate_sitemap_gzip | Create gzipped version | true |
| generate_sitemap_txt | Create TXT format | true |

SEO Inputs

| Input | Description | Valid Values |
| --- | --- | --- |
| changefreq | How often pages change | always, hourly, daily, weekly, monthly, yearly, never |
| priority | Relative priority on your site | 0.0 to 1.0 |
| parse_canonical | Use canonical URLs from HTML | true (default) |
| discover_links | Auto-discover internal links | true (default) |

Advanced Inputs

| Input | Description | Default |
| --- | --- | --- |
| additional_urls | Extra URLs to include | - |
| exclude_urls | URL patterns to exclude | */sitemap*.xml,*/sitemap*.txt,*/sitemap*.xml.gz |
| exclude_extensions | File extensions to exclude | .zip,.exe,.dmg,.pkg,.deb,.rpm,.tar,.gz,.7z,.rar,.iso |
| sitemap_filename | Main sitemap filename | sitemap.xml |
| validate_sitemaps | Validate existing sitemaps | - |
| strict_validation | Fail on validation issues | true |

lastmod Strategy Options

  • git - Use git commit timestamp (requires fetch-depth: 0)
  • filemtime - Use file modification time
  • current - Use build/generation time
  • none - Omit lastmod tag

📤 Outputs

| Output | Description |
| --- | --- |
| sitemap_path | Path to main sitemap.xml |
| sitemap_index_path | Path to sitemap index (if split) |
| sitemap_txt_path | Path to TXT sitemap (if enabled) |
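
Later steps can read these outputs through the standard steps context. The fragment below is a minimal sketch: the step id sitemap is a hypothetical name chosen for illustration, and the artifact upload is just one possible consumer of sitemap_path.

```yaml
- name: Generate sitemap
  id: sitemap # hypothetical step id, used to address the outputs below
  uses: blackoutsecure/bos-sitemap-generator@v1
  with:
    site_url: 'https://example.com'
    public_dir: 'dist'

# Pass the reported path to another action (here: store it as an artifact)
- name: Upload sitemap as a build artifact
  uses: actions/upload-artifact@v4
  with:
    name: sitemap
    path: ${{ steps.sitemap.outputs.sitemap_path }}
```

Without an id on the generate step, the outputs cannot be referenced from other steps.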

🔍 Validation

The action automatically validates:

  • Sitemap size limits (50MB uncompressed per sitemaps.org)
  • URL count limits (50,000 URLs per file)
  • XML format validity
  • URL format compliance

Set strict_validation: false to allow warnings without failing the workflow.
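
As a sketch of that setting in context, the step below relaxes validation and also prints the generated XML so that any warnings can be traced. It uses only inputs documented in this README; pairing the two flags is a suggestion, not a requirement.

```yaml
- name: Generate sitemap with non-blocking validation
  uses: blackoutsecure/bos-sitemap-generator@v1
  with:
    site_url: 'https://example.com'
    public_dir: 'dist'
    strict_validation: 'false' # validation issues are reported as warnings
    debug_show_sitemap: 'true' # print the XML to help trace those warnings
```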

Validating Existing Sitemaps

You can use this action to validate existing sitemaps without generating new ones. This is useful for:

  • Validating sitemaps from external sources
  • Pre-deployment validation checks
  • CI/CD quality gates
- name: Validate existing sitemaps
  uses: blackoutsecure/bos-sitemap-generator@v1
  with:
    site_url: 'https://example.com'
    public_dir: 'dist'
    validate_sitemaps: 'dist/sitemap.xml,dist/sitemap-index.xml'
    strict_validation: 'true'

You can validate multiple sitemaps by providing comma-separated paths. The validator checks:

  • XML Sitemaps: Structure, namespace, URL count, URL format, priorities, and change frequencies
  • TXT Sitemaps: URL format, line endings, encoding
  • Sitemap Indexes: Structure, sitemap entries, and referenced sitemap URLs
  • Size Compliance: Uncompressed file size limits
  • Format Compliance: sitemaps.org protocol adherence

📊 Large Sites

For sites with more than 50,000 URLs, the action automatically:

  1. Splits URLs into multiple sitemap files
  2. Creates a sitemap index file
  3. Ensures each file meets protocol limits
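
A later step can react to a split by checking the sitemap_index_path output. The sketch below rests on two unconfirmed assumptions: the generate step carries the hypothetical id sitemap, and sitemap_index_path is empty when no index file was written.

```yaml
- name: Generate sitemap
  id: sitemap # hypothetical step id
  uses: blackoutsecure/bos-sitemap-generator@v1
  with:
    site_url: 'https://example.com'
    public_dir: 'dist'

# Runs only when an index path was reported (assumed empty otherwise)
- name: Inspect sitemap index
  if: steps.sitemap.outputs.sitemap_index_path != ''
  shell: bash
  env:
    SITEMAP_INDEX_PATH: ${{ steps.sitemap.outputs.sitemap_index_path }}
  run: cat "${SITEMAP_INDEX_PATH}"
```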

🐛 Debugging

Enable debug outputs to troubleshoot:

- name: Generate sitemap with debugging
  uses: blackoutsecure/bos-sitemap-generator@v1
  with:
    site_url: 'https://example.com'
    public_dir: 'dist'
    debug_list_files: 'true'
    debug_list_urls: 'true'
    debug_show_sitemap: 'true'

Available debug flags:

  • debug_list_files - Show all discovered files
  • debug_list_canonical - Show parsed canonical URLs
  • debug_list_urls - Show all sitemap URLs
  • debug_show_sitemap - Display XML content
  • debug_show_sitemap_txt - Display TXT content
  • debug_show_exclusions - Show excluded files/URLs

❓ Troubleshooting

Issue: "No files found"

Cause: Build step may have failed or public_dir is incorrect.

Solution:

  • Verify build completes successfully
  • Check public_dir matches your build output location
  • Enable debug_list_files: 'true' to see what's being scanned
  • Verify files exist: ls -la dist/

Issue: "Empty sitemap generated"

Cause: Include patterns don't match files, or all files are excluded.

Solution:

  • Check include_patterns - default is **/*.html,**/*.htm
  • Verify files match the pattern
  • Check exclude_patterns and exclude_urls for overlaps
  • Use debug_show_exclusions: 'true' to see what's excluded

Issue: "Lastmod dates are all current date"

Cause: Git history not available or wrong strategy selected.

Solution:

  • For lastmod_strategy: 'git', ensure fetch-depth: 0 in checkout:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0
  • Switch to lastmod_strategy: 'filemtime' if git not available
  • Use lastmod_strategy: 'none' to omit lastmod tag

Issue: "Validation fails - XML is invalid"

Cause: Generated XML doesn't match sitemaps.org protocol.

Solution:

  • Check for invalid characters in URLs
  • Ensure priority is between 0.0 and 1.0
  • Validate changefreq values
  • Use debug_show_sitemap: 'true' to inspect output
  • Set strict_validation: 'false' temporarily to see warnings

Issue: "Site URL is wrong in sitemap"

Cause: parse_canonical or auto-detection is overriding site_url.

Solution:

  • Set parse_canonical: 'false' to disable canonical parsing
  • Ensure site_url input is provided explicitly
  • Check if HTML files contain incorrect canonical tags

Issue: "Workflow fails - permission denied"

Cause: Missing permissions or git configuration.

Solution:

  • Ensure proper git configuration:
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
  • Check GitHub token permissions if using custom tokens
  • Verify branch protection rules allow commits

❓ FAQ

How often should I generate sitemaps?

Answer: Run on every build or deployment. The example workflow above triggers on push to main and allows manual trigger via workflow_dispatch.

Can I use this with dynamic sites?

Answer: Yes, as long as the pages are pre-rendered to static files. Build your site first, then run the action against the build output. This works with SSG frameworks that pre-render dynamic pages.

Does this support non-HTML files?

Answer: By default, it indexes HTML/HTM files. Use include_patterns to add other types:

include_patterns: '**/*.html,**/*.htm,**/*.pdf,**/*.json'

Can I exclude certain URLs?

Answer: Yes, use either:

  • exclude_urls: URL patterns (e.g., */admin/*,*/test/*)
  • exclude_patterns: File patterns (e.g., **/*.draft.html)

What's the maximum sitemap size?

Answer: Per sitemaps.org protocol:

  • 50MB uncompressed per file
  • 50,000 URLs per file
  • Action auto-splits large sitemaps into index + multiple sitemaps

Does this detect dynamically added content?

Answer: It discovers links from HTML <a href> tags if discover_links: 'true' (default). For API endpoints or content not in HTML, use additional_urls.
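
A minimal sketch combining both mechanisms follows. The two /app/ URLs are hypothetical placeholders for routes that never appear as links in the built HTML.

```yaml
- name: Generate sitemap with link discovery and extra URLs
  uses: blackoutsecure/bos-sitemap-generator@v1
  with:
    site_url: 'https://example.com'
    public_dir: 'dist'
    discover_links: 'true' # the default, shown here for clarity
    # Hypothetical routes that are not linked from any HTML file
    additional_urls: 'https://example.com/app/dashboard,https://example.com/app/settings'
```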

Can I validate sitemaps without generating new ones?

Answer: Yes, use the validate_sitemaps input:

- uses: blackoutsecure/bos-sitemap-generator@v1
  with:
    site_url: 'https://example.com'
    public_dir: 'dist'
    validate_sitemaps: 'dist/sitemap.xml'

How do I handle multisite/multi-domain?

Answer: Run the action multiple times with different site_url and public_dir:

- name: Generate sitemap for site 1
  uses: blackoutsecure/bos-sitemap-generator@v1
  with:
    site_url: 'https://example.com'
    public_dir: 'dist/site1'
    sitemap_output_dir: 'dist/site1'

- name: Generate sitemap for site 2
  uses: blackoutsecure/bos-sitemap-generator@v1
  with:
    site_url: 'https://example.org'
    public_dir: 'dist/site2'
    sitemap_output_dir: 'dist/site2'

Does this support robots.txt Disallow rules?

Answer: Not automatically. Use exclude_urls or exclude_patterns to manually exclude paths that should be disallowed.
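
For example, Disallow rules can be mirrored by hand as URL patterns. The robots.txt rules in the comments are hypothetical. This README does not say whether a custom exclude_urls value replaces or extends the default sitemap exclusions, so check the output with debug_show_exclusions if that matters.

```yaml
# Hypothetical robots.txt rules being mirrored:
#   User-agent: *
#   Disallow: /admin/
#   Disallow: /private/
- name: Generate sitemap without disallowed paths
  uses: blackoutsecure/bos-sitemap-generator@v1
  with:
    site_url: 'https://example.com'
    public_dir: 'dist'
    exclude_urls: '*/admin/*,*/private/*'
    debug_show_exclusions: 'true' # confirm what was actually excluded
```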

How do I submit the sitemap to search engines?

Answer: Once deployed:

  1. Google: Use Google Search Console
  2. Bing: Use Bing Webmaster Tools
  3. Others: Most find sitemap.xml at the site root or through Sitemap lines in robots.txt, for example:

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap.xml.gz

🀝 Contributing

General contribution guidelines (issue triage, PR style, test expectations, security review) come from the organisation default at blackoutsecure/.github/CONTRIBUTING.md, which applies to every repo in the org. The repo-specific bits are below.

All PRs target the dev branch. The main branch is built by the Marketplace release pipeline (the launchpad reusable in bos-automation-hub) and is read-only to humans; PRs opened against main will be closed.

Local development

# Install dev deps (Node 20+)
npm ci

# Build the action bundle (mocha pretest also runs this)
npm run build

# Run the test suite (this is what CI runs)
npm test

# Lint + format + test in one shot
npm run check

# Coverage report (HTML + text)
npm run coverage

Style

  • JavaScript: ESLint flat config (eslint.config.js) + Prettier (.prettierrc.yaml) β€” both are managed; CI runs npm run check.
  • Bundle: dist/index.js is committed (ncc bundle) β€” Marketplace consumers fetch the tag, not npm install, so the bundle MUST be in sync with src/ on every release. CI checks for drift.
  • Action contract: action.yml inputs: / outputs: are the published contract; changes are SemVer-significant.
  • YAML (workflows): actionlint clean, pin third-party actions by SHA (not tag), minimise permissions: per job.

Release flow

Releases promote dev → main via the launchpad's workflow_dispatch mode = release. See the Marketplace launchpad reusable for the full event-routing + allowlist model.

📄 License

Copyright © 2025-2026 Blackout Secure

Licensed under the Apache License, Version 2.0. See LICENSE for details.

💬 Support

🔗 Resources


Made with ❤️ by Blackout Secure
