feat(observability): add distributed tracing, metrics, log aggregatio…#1

Open
Hexor-Hash wants to merge 57 commits into teslims2:main from Hexor-Hash:feat/observability-518

Conversation

Hexor-Hash commented May 29, 2026

  • Add OpenTelemetry SDK with auto-instrumentation for HTTP, PostgreSQL, Redis
  • Add TracingService with withSpan() helper for custom spans
  • Enhance MetricsService with http_request_duration_seconds, enrollments_total, course_completions_total, auth_attempts_total, active_connections
  • Add Loki transport to Winston logger for log aggregation (opt-in via LOKI_URL)
  • Add Prometheus alerting rules: HighErrorRate, SlowHttpResponses, SlowStellarRpc, HighMemoryUsage, HighEventLoopLag, ServiceDown, HighAuthFailureRate
  • Add Alertmanager configuration with critical/warning routing
  • Add Brain-Storm Overview Grafana dashboard with 11 panels
  • Add Loki datasource to Grafana provisioning
  • Add Loki and Alertmanager services to docker-compose.monitoring.yml
  • Add comprehensive observability documentation (docs/observability.md)
  • Add OTel, winston-loki, nest-winston packages to backend package.json
  • Add OTEL_EXPORTER_OTLP_ENDPOINT and LOKI_URL to .env.example
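
For reviewers, a rough sketch of what a `withSpan()`-style helper typically does: start a span, run the callback, record any failure, and always end the span. The `SpanLike` and `TracerLike` interfaces and the `RecordingTracer` class are hypothetical stand-ins so the example runs without dependencies. They are not the OpenTelemetry API, and the actual TracingService signature in this PR may differ.

```typescript
// Hypothetical stand-ins for a tracing API (NOT the real OpenTelemetry types).
interface SpanLike {
  setAttribute(key: string, value: string | number | boolean): void;
  recordError(err: unknown): void;
  end(): void;
}

interface TracerLike {
  startSpan(name: string): SpanLike;
}

// Runs `fn` inside a span, records a thrown error, and always ends the span.
async function withSpan<T>(
  tracer: TracerLike,
  name: string,
  fn: (span: SpanLike) => Promise<T> | T,
): Promise<T> {
  const span = tracer.startSpan(name);
  try {
    return await fn(span);
  } catch (err) {
    span.recordError(err);
    throw err; // rethrow so callers still see the failure
  } finally {
    span.end();
  }
}

// In-memory tracer used only to demonstrate the helper.
class RecordingTracer implements TracerLike {
  readonly events: string[] = [];

  startSpan(name: string): SpanLike {
    this.events.push(`start:${name}`);
    return {
      setAttribute: (key, value) => {
        this.events.push(`attr:${key}=${value}`);
      },
      recordError: () => {
        this.events.push(`error:${name}`);
      },
      end: () => {
        this.events.push(`end:${name}`);
      },
    };
  }
}
```

The `finally` block is the important part of the design: the span is closed on both the success and the failure path, so a thrown error cannot leak an open span.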

Closed BrainTease#518
Closed BrainTease#514

Description

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Dependency update
  • CI/CD improvement

Related Issues

Closes #5

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • E2E tests added/updated (if applicable)
  • Manual testing performed

Documentation

  • README updated (if applicable)
  • API documentation updated (if applicable)
  • Code comments added for complex logic
  • Migration guide added (if breaking changes)

Breaking Changes

  • No breaking changes
  • Breaking changes documented below:

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests passed locally with my changes
  • Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Context

Praizfotos and others added 30 commits March 29, 2026 11:33
…ease#44 — credentials page, Navbar, Footer, component library

- BrainTease#41: Add app/[locale]/credentials/page.tsx (protected, fetches GET /credentials/:userId,
  displays credential cards with course name, issued date, tx hash, Verify on Stellar link,
  and shareable URL copy action)
- BrainTease#42: Add components/layout/Navbar.tsx with Brain-Storm logo, Courses link, auth-gated
  Dashboard link, user avatar dropdown (Profile, Credentials, Logout), active route
  highlighting via usePathname, and responsive hamburger menu for mobile
- BrainTease#43: Add components/layout/Footer.tsx with copyright, Docs/GitHub/Stellar Status links
  and accessible footer nav; wire both Navbar and Footer into [locale]/layout.tsx
- BrainTease#44: Add Select (styled dropdown with label/error/helperText) and Spinner (animated,
  accessible) components; extend Input with helperText support; fix Button, Badge,
  Modal, ProgressBar, Toaster, Card to be fully accessible (ARIA, keyboard nav)

Fixes:
- Remove invalid k6 devDependency from root package.json (not an npm package)
- Bump @pact-foundation/pact to ^16.3.0 (v4 does not exist on npm)
- Bump @storybook/* to ^8.6.18 and add eslint-config-next@14.2.0
- Exclude e2e/ and *.pact.test.ts from vitest default run to prevent Playwright
  test runner conflicts
- Fix corrupted duplicate-content files: layout.tsx, Navbar.tsx, LanguageSwitcher.tsx,
  ThemeToggle.tsx, [locale]/courses/page.tsx, [locale]/page.tsx, profile pages
- Fix Button.test.tsx class assertions to match actual Tailwind class names (bg-blue-700,
  border-2) after Button component was updated
- Run eslint --fix across all src files to resolve 111 prettier formatting errors
…inTease#152)

- Fix deploy-approval job missing deployment-id output
- Fix deploy job referencing undefined deployment-id from wrong job
- Fix rollback job referencing context.issue.number on release event
- Use correct Docker build context (repo root) for monorepo layout
- Simplify health check loop using seq
- Add await to all github-script REST calls
- Rewrite docs/development-setup.md with full prerequisites, env var
  table, DB/Redis/Stellar testnet setup, contract deployment steps,
  Makefile shortcuts, and a detailed troubleshooting section
- Enhance Makefile with docker-up, docker-down, and export-openapi targets
- Rewrite scripts/setup.sh with prerequisite checks, Node version guard,
  Docker readiness wait, and clear next-steps output

Closes BrainTease#163
…oyments (BrainTease#159)

- Add id-token: write permission to deploy and rollback jobs in deploy-production.yml
- Replace AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY with role-to-assume in terraform.yml
- Rename secret reference AWS_ROLE_TO_ASSUME -> AWS_ROLE_ARN across all workflows
- Add role-session-name to all configure-aws-credentials steps for audit trail
- Add infra/terraform/modules/oidc with IAM OIDC provider + least-privilege role
- Wire oidc module into main.tf; expose github_actions_role_arn output
- Add github_org/github_repo variables to variables.tf and tfvars.example
- Update infra/terraform/README.md with OIDC setup section
- Add .github/workflows/deploy-api-docs.yml that builds the backend,
  exports openapi.json, and deploys Swagger UI to GitHub Pages on
  every push to main that touches backend src or docs/api
- Remove '(to be deployed)' placeholder from README hosted docs link
- Update README CI/CD section to list all active workflows

Closes BrainTease#161
- Add .github/workflows/release.yml with release-please-action@v4
  that auto-creates release PRs and generates CHANGELOG.md on merge
- On release, build and push Docker image tagged with semver version,
  major.minor, and 'latest' to GHCR
- release-please-config.json already bumps apps/backend/package.json
  and apps/frontend/package.json via extra-files on each release
- commitlint + husky commit-msg hook already enforce conventional
  commits required by release-please

Closes BrainTease#160
…frontend (BrainTease#158)

- Install @sentry/nestjs in backend, @sentry/nextjs in frontend (already in package.json)
- Initialize Sentry in apps/backend/src/instrument.ts with DSN, release, profiling
- Wire SentryModule.forRoot() into AppModule so NestJS exceptions are captured
- Configure client/server/edge Sentry in apps/frontend/sentry.*.config.ts
- Expose SENTRY_DSN and NEXT_PUBLIC_SENTRY_DSN via next.config.js env block
- Tie release to GIT_COMMIT_SHA / NEXT_PUBLIC_GIT_COMMIT_SHA for release tracking
- Filter authorization headers and cookies in beforeSend to avoid PII leakage
- SENTRY_DSN and NEXT_PUBLIC_SENTRY_DSN documented in .env.example
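
A minimal sketch of the kind of scrubbing a `beforeSend` hook could perform to drop authorization headers and cookies before an event leaves the process. The `EventLike` and `RequestLike` shapes and the function name `scrubSensitiveRequestData` are assumptions made for illustration; the code in instrument.ts may filter differently.

```typescript
// Hypothetical, simplified event shape; the real Sentry event type is richer.
interface RequestLike {
  headers?: Record<string, string>;
  cookies?: Record<string, string>;
}

interface EventLike {
  request?: RequestLike;
}

// Returns a copy of the event without Authorization/Cookie headers or cookies.
function scrubSensitiveRequestData<E extends EventLike>(event: E): E {
  const request = event.request;
  if (!request) return event;

  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(request.headers ?? {})) {
    const lower = name.toLowerCase(); // header names are case-insensitive
    if (lower === 'authorization' || lower === 'cookie') continue;
    headers[name] = value;
  }

  const cleaned: RequestLike = { ...request, headers };
  delete cleaned.cookies; // drop the parsed cookie map as well

  return { ...event, request: cleaned };
}
```

Returning a new object rather than mutating the input keeps the function easy to unit test in isolation.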
…rd (BrainTease#157)

- Install @willsoto/nestjs-prometheus and prom-client (already in package.json)
- Register PrometheusModule at GET /metrics in MetricsModule
- Implement MetricsService with counters: http_requests_total, credential_issued_total,
  bst_minted_total and histogram: stellar_rpc_latency_seconds
- Add MetricsInterceptor to auto-record http_requests_total on every request
- Wire MetricsInterceptor globally in main.ts via app.useGlobalInterceptors
- Add docker-compose.monitoring.yml with Prometheus + Grafana services
- Provision Grafana datasource (Prometheus) and dashboard via provisioning YAMLs
- Expand nestjs-metrics.json dashboard: request rate, error rate, latency p50/p95/p99,
  memory, event loop lag, CPU panels
- Document setup, available metrics, and custom usage in docs/monitoring.md
…rainTease#166)

- Add StripHtmlSanitizer unit tests covering XSS vectors, HTML stripping, and edge cases
- Add sanitization e2e test asserting HTML is stripped from bio before saving
- Add passport-headerapikey, nest-winston, and winston to backend dependencies
- Add api_keys table migration with FK to users
- Register X-API-KEY security scheme in Swagger config
…plates, and SECURITY policy (BrainTease#165)

- Rewrite CONTRIBUTING.md with branch naming, Conventional Commits table, PR process, and review checklist
- Upgrade CODE_OF_CONDUCT.md to Contributor Covenant v2.1 with enforcement guidelines
- Expand SECURITY.md with responsible disclosure policy, scope, and in-place security measures
- Update bug_report.md issue template with environment fields and clearer structure
- Update feature_request.md issue template with acceptance criteria section
…nd TypeScript integration (BrainTease#164)

- Document all four contracts: Certificate, Token (BST), Analytics, Governance
- Add function signature tables with auth requirements for each contract
- Add event schema tables for all emitted events
- Add stellar-cli invocation examples for every function
- Add TypeScript examples: mintCertificate, recordProgress, getTokenBalance
- Document end-to-end credential issuance flow with ASCII diagram
…provements-152-155

ci: fix production deployment workflow with manual approval gate (BrainTease#152)
…-sentry-prometheus-terraform

Feature/cicd OIDC sentry prometheus terraform
…y-docs-improvements

Feature/security docs improvements
- Add POST /stellar/fund-testnet endpoint (testnet-only)
- Add fundTestnetAccount() to StellarService
- Add Fund Testnet Account button to profile WalletSection (both locale and non-locale)
- Add scripts/fund-testnet.sh to fund STELLAR_SECRET_KEY account
- Document testnet funding in docs/development-setup.md
- Add StellarIndexerService that polls Soroban RPC getEvents every 5s
- Handle analytics:completed -> issue credential + send notification
- Handle token:transfer -> bust BST balance cache in Redis
- Store last processed ledger in Redis to avoid reprocessing
- Add findByStellarPublicKey to UsersService
- Add INDEXER_POLL_INTERVAL_MS, ANALYTICS_CONTRACT_ID, TOKEN_CONTRACT_ID env vars
- Resolve circular deps between StellarModule, CredentialsModule, UsersModule
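
The "store last processed ledger" idea can be sketched as a cursor check: skip events at or below the stored ledger, handle the rest, then advance the cursor. The `LedgerEvent` and `CursorStore` types and the synchronous in-memory store below are hypothetical; the actual StellarIndexerService keeps the cursor in Redis and reads events from Soroban RPC, neither of which is modeled here.

```typescript
// Hypothetical event and cursor shapes, for illustration only.
interface LedgerEvent {
  ledger: number;
  type: string;
}

interface CursorStore {
  get(): number;
  set(ledger: number): void;
}

// Handles only events newer than the stored cursor, then advances the cursor.
// Returns the number of events that were handled.
function processNewEvents(
  events: LedgerEvent[],
  store: CursorStore,
  handle: (event: LedgerEvent) => void,
): number {
  const last = store.get();
  let highest = last;
  let handled = 0;

  for (const event of events) {
    if (event.ledger <= last) continue; // already processed in an earlier poll
    handle(event);
    handled += 1;
    highest = Math.max(highest, event.ledger);
  }

  if (highest > last) store.set(highest);
  return handled;
}

// In-memory stand-in for a persistent cursor (Redis in the PR).
class MemoryCursorStore implements CursorStore {
  private ledger = 0;

  get(): number {
    return this.ledger;
  }

  set(ledger: number): void {
    this.ledger = ledger;
  }
}
```

Note that this sketch assumes each poll returns every event for the ledgers it covers. If a ledger's events could be split across polls, the cursor would need finer granularity than the ledger number.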
- Add GET /auth/stellar to issue SEP-0010 challenge transaction
- Add POST /auth/stellar to verify signed challenge and return JWT
- Use Utils.buildChallengeTx and Utils.readChallengeTx from stellar-sdk
- Auto-provision user account on first Stellar login
- Add STELLAR_WEB_AUTH_DOMAIN env var (default: localhost)
- Document SEP-0010 flow in docs/stellar-auth.md
- Add KycCustomer entity (stellarPublicKey, status, providerId)
- Add GET /kyc/status/:stellarPublicKey and PUT /kyc/customer endpoints
- Add POST /kyc/webhook for provider status update callbacks
- Integrate with Synaps KYC provider via KYC_PROVIDER_API_KEY
- Gate credential issuance behind kycStatus === approved when course.requiresKyc is true
- Add requiresKyc boolean field to Course entity and CreateCourseDto
- Add KYC_PROVIDER_API_KEY env var
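
The KYC gate described above reduces to a small predicate. In this sketch the `canIssueCredential` name, the `CourseLike` shape, and every status value other than `approved` are assumptions; only `requiresKyc` and the `approved` check come from the commit message.

```typescript
// 'approved' is from the commit message; the other statuses are assumed.
type KycStatus = 'pending' | 'approved' | 'rejected';

// Hypothetical minimal view of the Course entity.
interface CourseLike {
  requiresKyc: boolean;
}

// Credentials are blocked only when the course requires KYC
// and the customer's status is anything other than 'approved'.
function canIssueCredential(
  course: CourseLike,
  kycStatus: KycStatus | undefined,
): boolean {
  if (!course.requiresKyc) return true;
  return kycStatus === 'approved';
}
```

Treating a missing status (`undefined`) the same as a non-approved one means a user with no KYC record cannot slip through on a KYC-gated course.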
Hexstar-labs and others added 27 commits March 30, 2026 05:47
…2-43-44-frontend-ui

feat: resolve issues BrainTease#41 BrainTease#42 BrainTease#43 BrainTease#44 — credentials page, Navbar, Foot…
feat: add testnet account funding via Friendbot
feat: add full Swagger decorators to all controllers and export opena…
Implement password recovery UI: forgot-password and reset-password pages
Implement global auth state: AuthContext with useReducer, ProtectedRo…
…n, alerting, and dashboards

- Add OpenTelemetry SDK with auto-instrumentation for HTTP, PostgreSQL, Redis
- Add TracingService with withSpan() helper for custom spans
- Enhance MetricsService with http_request_duration_seconds, enrollments_total,
  course_completions_total, auth_attempts_total, active_connections
- Add Loki transport to Winston logger for log aggregation (opt-in via LOKI_URL)
- Add Prometheus alerting rules: HighErrorRate, SlowHttpResponses, SlowStellarRpc,
  HighMemoryUsage, HighEventLoopLag, ServiceDown, HighAuthFailureRate
- Add Alertmanager configuration with critical/warning routing
- Add Brain-Storm Overview Grafana dashboard with 11 panels
- Add Loki datasource to Grafana provisioning
- Add Loki and Alertmanager services to docker-compose.monitoring.yml
- Add comprehensive observability documentation (docs/observability.md)
- Add OTel, winston-loki, nest-winston packages to backend package.json
- Add OTEL_EXPORTER_OTLP_ENDPOINT and LOKI_URL to .env.example

Closes BrainTease#518
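
As a rough illustration of the "opt-in via LOKI_URL" behavior, the sketch below picks log transports from the environment: console is always on, and a Loki target is added only when `LOKI_URL` is set to a non-empty value. The `selectLogTransports` function and its string descriptors are hypothetical; the real logger presumably constructs Winston transport objects instead.

```typescript
// Returns descriptors for the transports that would be enabled.
// Hypothetical helper: strings stand in for real transport instances.
function selectLogTransports(
  env: Record<string, string | undefined>,
): string[] {
  const transports = ['console']; // always log to the console

  const lokiUrl = env.LOKI_URL?.trim();
  if (lokiUrl) {
    // Opt-in: only ship logs to Loki when a URL is configured.
    transports.push(`loki:${lokiUrl}`);
  }

  return transports;
}
```

With this shape, leaving `LOKI_URL` unset (or blank) keeps local development free of any dependency on the monitoring stack.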
Development

Successfully merging this pull request may close these issues.

  • [DevOps] Build observability platform
  • [DevOps] Implement performance optimization
  • Backend: Auth — add Google OAuth2 strategy