feat(mtls): implement Issue #204 — mTLS mutual authentication#242

Merged: kelly-musk merged 2 commits into kellymusk:master from rayeberechi:feature/issue-204-mtls-auth on Mar 27, 2026

@rayeberechi

Overview

This PR implements a Mutual TLS (mTLS) framework that provides a zero-trust communication layer for all internal microservices, PostgreSQL databases, and Redis instances. Transport-layer identity verification is now mandatory before any application-layer data (such as JWTs) is exchanged.

Core Implementation

  • 3-Tier CA Hierarchy: Added Root, Intermediate, and Leaf certificate management. The Root CA key is generated and stored offline, while the Intermediate CA is loaded at runtime and signs leaf certificates online.
  • Automated Provisioning: Implemented a CertificateProvisioner that automatically generates 90-day leaf certificates for all registered services at startup.
  • Zero-Downtime Rotation: Integrated a background CertLifecycleWorker that sweeps certificates daily and rotates any that are within 14 days of expiry. The previous certificate stays valid alongside the new one, so connections are not dropped during rotation.
  • Infrastructure mTLS: Added infra_tls.rs to provide dedicated mTLS identities for PostgreSQL and Redis connection pools, ensuring the entire data path is encrypted and authenticated.
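
The 90-day validity and 14-day rotation threshold described above can be sketched as follows. This is a minimal illustration using only the standard library; `LeafCert` and its fields are hypothetical stand-ins, not the types in this PR.

```rust
use std::time::{Duration, SystemTime};

// Validity and threshold values taken from the PR description.
const LEAF_VALIDITY: Duration = Duration::from_secs(90 * 24 * 60 * 60);
const ROTATION_THRESHOLD: Duration = Duration::from_secs(14 * 24 * 60 * 60);

/// Hypothetical, simplified view of a leaf certificate (assumption for illustration).
#[derive(Debug, Clone)]
pub struct LeafCert {
    pub service: String,
    pub not_after: SystemTime,
}

impl LeafCert {
    /// Issue a certificate that expires 90 days after `now`.
    pub fn issue(service: &str, now: SystemTime) -> Self {
        LeafCert {
            service: service.to_string(),
            not_after: now + LEAF_VALIDITY,
        }
    }

    /// True once the certificate is within 14 days of expiry, or already expired.
    pub fn needs_rotation(&self, now: SystemTime) -> bool {
        match self.not_after.duration_since(now) {
            Ok(remaining) => remaining <= ROTATION_THRESHOLD,
            // `not_after` is in the past, so the certificate has expired.
            Err(_) => true,
        }
    }
}
```

Passing `now` explicitly, rather than calling `SystemTime::now()` inside the method, keeps the threshold check deterministic in tests.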

Security Enhancements & Fixes

  • Safety Refactor: Removed the unsafe mem::zeroed stubs in the provisioner and replaced them with a without_ca() path that fails gracefully.
  • Identity Enforcement: Axum middleware now extracts the service identity from the certificate subject, which the TLS terminator forwards in the X-Client-Cert-Subject header, and validates it against the internal service allowlist.
  • Revocation Engine: Implemented RevocationService with an in-memory CRL and an OCSP-style status check, so requests from compromised service keys are rejected as soon as the certificate is revoked.
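
As a rough sketch of the identity check, the snippet below pulls the Common Name out of a subject string and looks it up in an allowlist. The subject format (`CN=...,O=...`), the assumption that the service name lives in the CN, and the contents of `REGISTERED_SERVICES` are all illustrative; the PR does not show them.

```rust
/// Hypothetical allowlist contents; the real REGISTERED_SERVICES list is not shown in this PR.
const REGISTERED_SERVICES: &[&str] = &["api-gateway", "payments", "ledger"];

/// Extract the Common Name from a comma-separated subject such as
/// "CN=payments,O=Example". Escaped commas inside values are not handled.
pub fn common_name(subject: &str) -> Option<&str> {
    subject
        .split(',')
        .filter_map(|rdn| rdn.trim().split_once('='))
        .find(|(key, _)| key.trim().eq_ignore_ascii_case("CN"))
        .map(|(_, value)| value.trim())
        .filter(|value| !value.is_empty())
}

/// Return the registered service name if the subject's CN is on the allowlist.
pub fn registered_service(subject: &str) -> Option<&'static str> {
    let cn = common_name(subject)?;
    REGISTERED_SERVICES.iter().copied().find(|svc| *svc == cn)
}
```

A production parser would need to handle escaped separators and multi-valued RDNs; this version only covers the simple case.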

Observability & Admin

  • Governance API:
    • GET /api/admin/security/certificates: Inventory of all active service certificates.
    • POST /.../rotate: Manual admin-initiated rotation.
    • POST /.../revoke: Immediate revocation and replacement issuance.
  • Prometheus Integration: Added gauges for certificate expiry tracking and counters for handshake successes/failures.

Technical Specifications

  • Tests: Added unit tests in infra_tls.rs covering valid, revoked, expired, and missing certificate states for infrastructure connections.
  • Stability: Resolved compilation issues in cert.rs related to missing chrono imports.
  • Note: This module introduces no new compilation errors to the codebase.

Verification

  • Internal services successfully provision certificates on startup.
  • Handshakes fail if a client certificate is missing or issued by an untrusted CA.
  • Database connections use per-service mTLS identities.
  • Background worker correctly identifies certificates nearing expiry.

Closes #204

3-tier CA hierarchy (Root/Intermediate/Leaf):
- Root CA: offline-only key generation (CertificateAuthority::generate_root_ca)
- Intermediate CA: runtime-loaded from secrets manager via MTLS_INTERMEDIATE_CA_CERT_PEM/KEY
- Leaf certs: per-service, 90-day validity, issued only to REGISTERED_SERVICES allowlist

Certificate lifecycle:
- CertificateProvisioner: startup provisioning + 14-day rotation threshold
- CertificateStore: in-memory store with zero-downtime rotation (current + previous)
- CertLifecycleWorker: daily sweep, auto-rotates expiring certs, wired into main.rs
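
One way to picture the "current + previous" store is a per-service slot that keeps the outgoing certificate acceptable until the next rotation. The sketch below is an assumption about the shape of such a store, not the PR's CertificateStore; `Cert`, `Slot`, and `is_accepted` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified certificate record identified only by serial.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Cert {
    pub serial: String,
}

/// Per-service slot holding the active certificate and the one it replaced.
#[derive(Debug, Default)]
struct Slot {
    current: Option<Cert>,
    previous: Option<Cert>,
}

#[derive(Debug, Default)]
pub struct CertificateStore {
    slots: HashMap<String, Slot>,
}

impl CertificateStore {
    /// Install a new certificate; the old `current` moves to `previous`
    /// and whatever was in `previous` is dropped.
    pub fn rotate(&mut self, service: &str, new_cert: Cert) {
        let slot = self.slots.entry(service.to_string()).or_default();
        slot.previous = slot.current.replace(new_cert);
    }

    /// Accept either the current or the previous serial during the overlap window.
    pub fn is_accepted(&self, service: &str, serial: &str) -> bool {
        self.slots.get(service).map_or(false, |slot| {
            slot.current
                .iter()
                .chain(slot.previous.iter())
                .any(|cert| cert.serial == serial)
        })
    }
}
```

Keeping exactly one previous certificate bounds the overlap: a peer still presenting the old certificate keeps working through one rotation, but not through two.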

mTLS enforcement middleware (Axum):
- Reads X-Client-Cert-Subject / X-Client-Cert-Serial headers from TLS terminator
- Validates: registered service, not revoked (CRL), service call allowlist (kellymusk#96)
- MTLS_ENFORCE=false for dev; hard reject in production
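
The validation order listed above can be sketched as a pure decision function, independent of Axum. All type names here are hypothetical, and the behaviour when MTLS_ENFORCE=false (report the violation but let the request through) is an assumption; the commit message only says enforcement is off for dev.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Rejection {
    MissingIdentity,
    UnregisteredService,
    Revoked,
    CallNotAllowed,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    /// A violation was found, but enforcement is disabled (assumed dev behaviour).
    AllowUnenforced(Rejection),
    Reject(Rejection),
}

/// Hypothetical policy inputs; the PR's real state types are not shown.
pub struct MtlsPolicy {
    pub enforce: bool,
    pub registered: HashSet<String>,
    pub revoked_serials: HashSet<String>,
    /// Calling service -> set of services it may call.
    pub call_allowlist: HashMap<String, HashSet<String>>,
}

impl MtlsPolicy {
    /// `caller` and `serial` stand for the values of the X-Client-Cert-Subject
    /// and X-Client-Cert-Serial headers after parsing.
    pub fn check(&self, caller: Option<&str>, serial: Option<&str>, target: &str) -> Decision {
        match self.violation(caller, serial, target) {
            None => Decision::Allow,
            Some(reason) if self.enforce => Decision::Reject(reason),
            Some(reason) => Decision::AllowUnenforced(reason),
        }
    }

    fn violation(&self, caller: Option<&str>, serial: Option<&str>, target: &str) -> Option<Rejection> {
        let (caller, serial) = match (caller, serial) {
            (Some(c), Some(s)) => (c, s),
            _ => return Some(Rejection::MissingIdentity),
        };
        if !self.registered.contains(caller) {
            return Some(Rejection::UnregisteredService);
        }
        if self.revoked_serials.contains(serial) {
            return Some(Rejection::Revoked);
        }
        let allowed = self
            .call_allowlist
            .get(caller)
            .map_or(false, |targets| targets.contains(target));
        if allowed { None } else { Some(Rejection::CallNotAllowed) }
    }
}
```

Separating the decision from the HTTP layer makes each rejection path testable without standing up a server or a TLS terminator.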

Revocation:
- RevocationList: in-memory CRL with immediate serial blacklisting
- RevocationService: revoke_certificate + OCSP-style status check
- Admin revoke endpoint issues replacement cert atomically
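
A minimal sketch of an in-memory CRL with an OCSP-style status lookup might look like this. The three-state result mirrors OCSP's good/revoked/unknown responses; the method signatures and the `issued` parameter are assumptions, not the PR's RevocationList or RevocationService API.

```rust
use std::collections::HashSet;
use std::sync::RwLock;

/// Status values modelled on OCSP's good / revoked / unknown responses.
#[derive(Debug, PartialEq, Eq)]
pub enum CertStatus {
    Good,
    Revoked,
    Unknown,
}

/// Hypothetical in-memory CRL keyed by certificate serial.
#[derive(Default)]
pub struct RevocationList {
    revoked: RwLock<HashSet<String>>,
}

impl RevocationList {
    /// Blacklist a serial immediately. Returns true if it was not already revoked.
    pub fn revoke(&self, serial: &str) -> bool {
        self.revoked
            .write()
            .expect("revocation lock poisoned")
            .insert(serial.to_string())
    }

    pub fn is_revoked(&self, serial: &str) -> bool {
        self.revoked
            .read()
            .expect("revocation lock poisoned")
            .contains(serial)
    }

    /// OCSP-style lookup: serials that were never issued are reported as Unknown.
    pub fn status(&self, serial: &str, issued: &HashSet<String>) -> CertStatus {
        if self.is_revoked(serial) {
            CertStatus::Revoked
        } else if issued.contains(serial) {
            CertStatus::Good
        } else {
            CertStatus::Unknown
        }
    }
}
```

Because the list lives only in memory, revocations in this sketch would be lost on restart unless they are also persisted elsewhere.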

Admin endpoints (GET/POST /api/admin/security/certificates[/:svc/rotate|revoke]):
- Wired under /api/admin/security prefix in main.rs
- MtlsAdminState shared across handlers

Infrastructure TLS (src/mtls/infra_tls.rs):
- service_tls_identity(): returns (cert_pem, key_pem) for PostgreSQL/Redis mTLS
- postgres_mtls_params(): helper for per-service DB client identity injection
- Skips revoked/expired certs with structured warning log
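
The skip-on-revoked-or-expired behaviour could be sketched as below. Only the function name `service_tls_identity` and the `(cert_pem, key_pem)` return shape come from this PR; the parameters, the `StoredCert` type, and the use of `eprintln!` in place of structured logging are assumptions for illustration.

```rust
use std::collections::{HashMap, HashSet};
use std::time::SystemTime;

/// Hypothetical stored certificate record.
pub struct StoredCert {
    pub serial: String,
    pub cert_pem: String,
    pub key_pem: String,
    pub not_after: SystemTime,
}

/// Return (cert_pem, key_pem) for a service, or None if the certificate
/// is missing, revoked, or expired. Warnings go to stderr in this sketch.
pub fn service_tls_identity(
    certs: &HashMap<String, StoredCert>,
    revoked: &HashSet<String>,
    service: &str,
    now: SystemTime,
) -> Option<(String, String)> {
    let cert = match certs.get(service) {
        Some(cert) => cert,
        None => {
            eprintln!("mtls warn: no certificate, service={}", service);
            return None;
        }
    };
    if revoked.contains(&cert.serial) {
        eprintln!("mtls warn: certificate revoked, service={} serial={}", service, cert.serial);
        return None;
    }
    if now >= cert.not_after {
        eprintln!("mtls warn: certificate expired, service={} serial={}", service, cert.serial);
        return None;
    }
    Some((cert.cert_pem.clone(), cert.key_pem.clone()))
}
```

Returning `None` rather than an unusable identity lets the connection pool fail at setup time instead of at the first handshake.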

Observability:
- aframp_mtls_cert_days_until_expiry gauge (per service)
- aframp_mtls_handshake_total counter (from/to/result labels)
- aframp_mtls_cert_rotations_total, cert_revocations_total, cert_issuances_total
- Registered against prometheus::default_registry() at startup

Tests:
- Unit: subject parsing, trust chain, rotation threshold, OCSP status (cert.rs, revocation.rs)
- Integration: full lifecycle, zero-downtime rotation, revocation rejection (tests/mtls_integration_test.rs)
- Infra TLS: valid/revoked/expired/missing cert identity (infra_tls.rs)

Migration: 20260327120000_mtls_certificate_lifecycle.sql

drips-wave Bot commented Mar 27, 2026

@rayeberechi Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits.

You can now apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀

kelly-musk merged commit 4020fa8 into kellymusk:master on Mar 27, 2026

Linked issue: TLS Mutual Authentication for Microservice Communication