fix: remove staging-fork special case from DOCS_BASE_URL; improve README #67

Merged
planetf1 merged 3 commits into ibm-granite:main from planetf1:fix/remove-fork-staging-special-case
Jun 10, 2026
Conversation

@planetf1 planetf1 commented Jun 10, 2026

Collaborator

Summary

This PR fixes several SEO issues found during a site audit of wwwstage.ibm.com/granite/docs (the Docusaurus staging site). These need to be resolved before the production cutover from Mintlify — www.ibm.com/granite/docs currently still serves the old Mintlify site.

Fix: canonical URLs, og:image, and sitemap pointing to wrong domain

DOCS_SITE_URL was always https://ibm-granite.github.io, so every build had the following problems:

  • <link rel="canonical"> and og:url pointed to ibm-granite.github.io instead of www.ibm.com, so Google would treat the GitHub Pages URL as canonical, not the IBM.com URL
  • og:image resolved to a 404 on ibm-granite.github.io (that domain doesn't serve the docs at that path), so social previews would be broken on every page
  • sitemap.xml listed all URLs under ibm-granite.github.io

Fixed by setting DOCS_SITE_URL to https://www.ibm.com for the upstream build. Forks keep https://{owner}.github.io, which combined with the fork's DOCS_BASE_URL gives the correct GitHub Pages preview URL.
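
For reviewers who want the rule in one place, here is a minimal sketch of the site URL selection described above. It is not the actual workflow code: the function name `resolveDocsSiteUrl` is hypothetical, and detecting upstream by the owner name `ibm-granite` is an assumption.

```typescript
// Assumption: the upstream build is identified by this repository owner.
const UPSTREAM_OWNER = "ibm-granite";

/**
 * Returns the origin used for canonical URLs, og:url, og:image and the sitemap.
 * Hypothetical helper; the real logic lives in the build configuration.
 */
function resolveDocsSiteUrl(owner: string): string {
  // Upstream builds are published on IBM.com.
  if (owner === UPSTREAM_OWNER) {
    return "https://www.ibm.com";
  }
  // Forks are previewed on GitHub Pages under the owner's github.io host.
  return `https://${owner}.github.io`;
}
```

Under these assumptions a fork owned by `planetf1` would get `https://planetf1.github.io`, and the fork's DOCS_BASE_URL supplies the rest of the preview path.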

Fix: missing robots.txt

GET /granite/docs/robots.txt was falling through to a redirect to the home page. Adds a static robots.txt that allows crawling and points to the canonical sitemap URL on www.ibm.com.
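
The file contents are not reproduced in this description. As a rough illustration only, a permissive robots.txt that advertises a single sitemap could be generated like this. The helper `renderRobotsTxt` is hypothetical (the PR adds a static file, not a generator), and the sitemap URL is inferred from the paths mentioned in this PR.

```typescript
/** Builds a permissive robots.txt body that advertises one sitemap. */
function renderRobotsTxt(sitemapUrl: string): string {
  return [
    "User-agent: *", // the rules below apply to every crawler
    "Allow: /", // allow crawling of the whole site
    "",
    `Sitemap: ${sitemapUrl}`, // sitemap location as an absolute URL
    "",
  ].join("\n");
}

// Assumed canonical sitemap location on www.ibm.com.
const robotsTxt = renderRobotsTxt("https://www.ibm.com/granite/docs/sitemap.xml");
```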

Fix: /search page in sitemap

The Docusaurus search page has no static content and should not appear in the sitemap. Adds ignorePatterns: ['/search'] to the sitemap plugin config.
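
A sketch of the option as it might appear in the Docusaurus config. Only `ignorePatterns: ['/search']` comes from this PR; where the object is wired in (the sitemap plugin options or the `sitemap` key of a preset) is an assumption here. The patterns are globs matched against route paths, so depending on the configured base URL the pattern may need to include it.

```typescript
// Options for the Docusaurus sitemap plugin (placement in the config is assumed).
const sitemapOptions = {
  // Routes matching these glob patterns are left out of sitemap.xml.
  ignorePatterns: ["/search"],
};
```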

Cleanup: remove stale staging-fork hardcode

The planetf1/granite-docs special case in DOCS_BASE_URL was added temporarily in #63 to validate the Akamai staging path, which is now confirmed working. Removes it and restores clean two-tier logic: upstream gets /granite/docs/; all forks get /{repo-name}/granite/docs/.
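
The restored two-tier rule can be sketched as below. This is illustrative, not the workflow code itself: `resolveDocsBaseUrl` is a hypothetical name, and treating the owner `ibm-granite` as the upstream marker is an assumption.

```typescript
// Assumption: the upstream repository is identified by this owner.
const UPSTREAM_REPO_OWNER = "ibm-granite";

/** Returns the Docusaurus base path for a repository given as "owner/repo". */
function resolveDocsBaseUrl(repository: string): string {
  const [owner, repoName] = repository.split("/");
  if (!owner || !repoName) {
    throw new Error(`expected "owner/repo", got "${repository}"`);
  }
  // Tier 1: upstream is served at the public IBM.com path.
  if (owner === UPSTREAM_REPO_OWNER) {
    return "/granite/docs/";
  }
  // Tier 2: every fork, with no per-fork exceptions, gets the
  // GitHub Pages preview path under its repository name.
  return `/${repoName}/granite/docs/`;
}
```

With the special case gone, `planetf1/granite-docs` falls into the second tier and resolves to `/granite-docs/granite/docs/`, which matches the path GitHub Pages serves that fork at.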

Docs: README improvements

Adds a repository layout reference section and tightens up the contributing guide (optional frontmatter fields, <CodeGroup> component).

Verification

On wwwstage.ibm.com after CI runs:

  • <link rel="canonical"> on any page points to www.ibm.com/granite/docs/...
  • og:image URL is www.ibm.com/granite/docs/images/hero-light.png
  • GET /granite/docs/robots.txt returns the robots file (not a redirect)
  • GET /granite/docs/sitemap.xml — all URLs use www.ibm.com, no /search entry

After production cutover:

  • Same checks pass on www.ibm.com
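
The canonical and sitemap checks above could be scripted roughly as follows. This is a sketch, not tooling from this PR: the helper names are hypothetical, the regex parsing is deliberately simplistic, and it assumes the page HTML and sitemap XML have already been fetched into strings.

```typescript
// Expected origin for every canonical and sitemap URL after this PR.
const EXPECTED_ORIGIN = "https://www.ibm.com/";

/** Extracts the href of <link rel="canonical"> from an HTML string, if present. */
function extractCanonicalHref(html: string): string | null {
  const link = html.match(/<link\s+[^>]*rel="canonical"[^>]*>/i);
  if (!link) return null;
  const href = link[0].match(/href="([^"]*)"/i);
  return href?.[1] ?? null;
}

/** Returns every <loc> URL found in a sitemap.xml string. */
function extractSitemapLocs(xml: string): string[] {
  const locs: string[] = [];
  const pattern = /<loc>([^<]+)<\/loc>/g;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(xml)) !== null) {
    const loc = m[1];
    if (loc !== undefined) locs.push(loc.trim());
  }
  return locs;
}

/** Lists sitemap problems: entries on the wrong host, or a /search entry. */
function findSitemapProblems(xml: string): string[] {
  const problems: string[] = [];
  for (const loc of extractSitemapLocs(xml)) {
    if (loc.indexOf(EXPECTED_ORIGIN) !== 0) problems.push(`wrong host: ${loc}`);
    if (/\/search\/?$/.test(loc)) problems.push(`search page listed: ${loc}`);
  }
  return problems;
}
```

An empty result from `findSitemapProblems` together with a canonical href starting with `https://www.ibm.com/granite/docs/` corresponds to the first and last checks in the list. The og:image and robots.txt checks still need a real HTTP request.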

🤖 Generated with Claude Code

planetf1 and others added 3 commits June 10, 2026 13:09
The planetf1/granite-docs hardcode in the DOCS_BASE_URL condition was
added temporarily to validate the Akamai staging path (ibm-granite#63).
That staging target is confirmed working, so the special case is no longer
needed and actively breaks GitHub Pages previews on that fork — assets 404
because the base URL doesn't match the /repo-name/ path GitHub Pages serves
forks at.

Restores two-tier behaviour:
- ibm-granite/docs → /granite/docs/ (IBM.com public path, proxied by Akamai)
- any fork       → /{repo-name}/granite/docs/ (GitHub Pages preview path)

Also adds a repository layout reference to README and tightens up the
contributing guide with optional frontmatter fields and the CodeGroup
component.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

DOCS_SITE_URL was always a github.io URL, which caused canonical URLs, og:url,
og:image, and sitemap entries to point to ibm-granite.github.io instead
of www.ibm.com. This breaks SEO — search engines would index the GitHub
Pages URL as canonical rather than the IBM.com URL.

Upstream now gets https://www.ibm.com; forks keep their github.io URL
which is correct for GitHub Pages preview links.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

robots.txt was missing — requests to /granite/docs/robots.txt were
redirecting to the home page. Adds a static robots.txt that allows
crawling and points to the canonical sitemap URL.

The /search page is a Docusaurus UI page with no static content; it
should not appear in the sitemap. Adds ignorePatterns to the sitemap
plugin config to exclude it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@planetf1 planetf1 requested a review from serjikibm June 10, 2026 12:40

@serjikibm serjikibm left a comment

Collaborator


Reviewed the PR and also ran an AI review on it; no additional comments. All looks good. Minor nit (nothing needs changing now): robots.txt will use www.ibm.com even in staging, on forks, etc. Something to consider when we have a staging strategy.

@planetf1

Collaborator Author

Thanks @serjikibm. Opened #71 to address this.

@planetf1 planetf1 merged commit 3d5544b into ibm-granite:main Jun 10, 2026
3 checks passed
@planetf1 planetf1 deleted the fix/remove-fork-staging-special-case branch June 10, 2026 13:12