Merge develop into main#33
Conversation
* chore: add 'prom-client' dep * feat: add basic e2e integration of grafana and prometheus * feat: integrate app-specific metrics * refactor: make 'createOrder' more readable * refactor: ensure unique items only for 'CreateOrderDto' * ci: extend post-deploy retry period * feat(grafana): add rabbitmq-overview dashboard * ci: seed dbs for stage and prod * refactor: unify docker compose setup * refactor: persist prometheus data * refactor: setup Grafana UI ports * refactor: persist grafana service data * refactor: secure Grafana UI with auth and viewer user only --------- Co-authored-by: Bohdan Chernykh <dev@bchernykh.anonaddy.com>
Sync/main into dev
|
ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (4)
📝 WalkthroughWalkthroughThis pull request introduces comprehensive observability for the e-commerce platform by adding Prometheus metrics collection and Grafana dashboards. The implementation includes a new MetricsService module that instruments HTTP requests, order lifecycle events, and RabbitMQ message operations, with corresponding integrations into OrdersService, OrdersWorkerService, and RabbitMqService. Docker Compose configuration is refactored to use parameterized images and profiles, with new Prometheus and Grafana services added. The production-specific compose.prod.yml is deprecated in favor of compose.yml. CI/CD workflows and documentation are updated accordingly. ChangesComprehensive Observability and Monitoring Stack
🎯 4 (Complex) | ⏱️ ~60 minutes Possibly Related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/orders/orders.service.ts (1)
98-104:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAvoid recording
successbefore the publish succeeds.If
this.rabbitmq.send(...)throws here, the outercatchalso records"error", so one failedcreateOrder()call is counted twice with conflicting outcomes. Move the success increment after the publish so the metric matches the actual result of this request.Suggested change
private handleNewOrder(order: Order, userId: UUID, idempotencyKey: UUID) { - this.metricsService.incrementOrderCreated("success"); this.logger.log("Order created", { userId, idempotencyKey, order }); this.rabbitmq.send( "orders.process", new ProcessOrderMessageDto(order.id, order.id), ); + this.metricsService.incrementOrderCreated("success"); return order; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/orders/orders.service.ts` around lines 98 - 104, The metrics increment is recorded before publishing, causing double/conflicting counts if rabbitmq.send throws; change handleNewOrder so the publish succeeds before recording success: make handleNewOrder async (if not already), await this.rabbitmq.send("orders.process", new ProcessOrderMessageDto(order.id, order.id)), then call this.metricsService.incrementOrderCreated("success") after the awaited send; keep the logger.log as appropriate and do not increment on failures (let existing outer catch record "error").
🧹 Nitpick comments (4)
prometheus/prometheus.yml (1)
1-2: Consider raisingscrape_intervalto the Prometheus default (15s).A global 5-second interval is 3× more frequent than the Prometheus default and applies to all three jobs simultaneously. This increases time-series cardinality, write amplification on TSDB, and steady-state CPU usage in Prometheus — with no dashboard or alerting benefit unless you specifically need sub-15s resolution. You can always lower it per-job for high-frequency counters if needed.
♻️ Suggested change
global: - scrape_interval: 5s + scrape_interval: 15sOr, keep 5s for the API but use a coarser interval for the RabbitMQ jobs:
global: - scrape_interval: 5s + scrape_interval: 15s scrape_configs: - job_name: 'api' + scrape_interval: 5s metrics_path: /api/metrics🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@prometheus/prometheus.yml` around lines 1 - 2, The global scrape_interval is set to 5s which is more frequent than Prometheus default; change the global.scrape_interval value from 5s to 15s to reduce TSDB load, and if you need higher resolution for specific targets, add per-job scrape_interval overrides (e.g., set the API job's scrape_interval to 5s and keep RabbitMQ jobs at 15s or higher) by editing the job definitions in prometheus.yml.grafana/provisioning/datasources/datasource.yml (1)
1-7: ⚡ Quick winConsider pinning a
uidfor stable dashboard-datasource wiring.Without a
uid, Grafana auto-generates one on first start. The Grafana provisioning docs showuidas a field you can set to reference a datasource in other parts of the configuration. If any provisioned dashboard JSON files reference this Prometheus datasource by UID (e.g. using${DS_PROMETHEUS}or a hardcoded UID), the link will break on a fresh deployment unless the same UID is seeded.♻️ Proposed addition
apiVersion: 1 datasources: - name: Prometheus type: prometheus access: proxy url: http://prometheus:9090 + uid: prometheus isDefault: true🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grafana/provisioning/datasources/datasource.yml` around lines 1 - 7, Add a stable uid to the Prometheus datasource so dashboards/reference by UID won't break on fresh deploys: update the datasource block (the entry with name: Prometheus and type: prometheus) to include a uid field with a deterministic value (e.g. "prometheus" or "prometheus-main") and ensure any dashboard placeholders or references (e.g. ${DS_PROMETHEUS} or hardcoded UIDs) point to that uid..env.example (1)
26-29: 💤 Low valueKey ordering violates
dotenv-linterrules.
dotenv-linterexpects alphabetical ordering within a group. The correct order is:GRAFANA_ADMIN_PASSWORD,GRAFANA_ADMIN_USER,GRAFANA_PORT,PROMETHEUS_PORT.♻️ Proposed fix
-PROMETHEUS_PORT=9091 -GRAFANA_PORT=8000 -GRAFANA_ADMIN_USER=admin -GRAFANA_ADMIN_PASSWORD=admin +GRAFANA_ADMIN_PASSWORD=changeme +GRAFANA_ADMIN_USER=admin +GRAFANA_PORT=8000 +PROMETHEUS_PORT=9091🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.env.example around lines 26 - 29, The dotenv keys in .env.example are out of alphabetical order; reorder the four variables so they follow dotenv-linter's expected alphabetical grouping: GRAFANA_ADMIN_PASSWORD, GRAFANA_ADMIN_USER, GRAFANA_PORT, PROMETHEUS_PORT — update the block containing PROMETHEUS_PORT, GRAFANA_PORT, GRAFANA_ADMIN_USER, GRAFANA_ADMIN_PASSWORD to that alphabetical sequence, preserving values but changing only the key order.grafana/dashboards/metrics-baseline.json (1)
1-9: ⚡ Quick winAdd a stable
uidto prevent duplicate dashboards on re-provisioning.Grafana warns: "Be careful not to reuse the same title multiple times within a folder or
uidwithin the same Grafana instance to avoid inconsistent behavior." Without auid, Grafana assigns one during provisioning; if you want to reference the dashboard from other dashboards or ensure stable URLs, set it yourself. Each re-provisioning without auidcan create a new dashboard entry rather than updating the existing one.✏️ Suggested addition
{ "title": "E-commerce Metrics", + "uid": "ecomm-metrics-baseline", "schemaVersion": 39,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grafana/dashboards/metrics-baseline.json` around lines 1 - 9, The dashboard JSON lacks a stable "uid" which causes Grafana to create duplicates on re-provisioning; add a top-level "uid" property (e.g., a short stable identifier like "ecommerce-metrics" or a generated UUID) alongside "title": "E-commerce Metrics" and "schemaVersion": 39 so the dashboard is updated in-place rather than recreated; ensure the uid is permanent and unique within the Grafana instance so other dashboards or links can reliably reference this dashboard.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.env.example:
- Line 29: Replace the literal default value "admin" for GRAFANA_ADMIN_PASSWORD
in .env.example with a neutral placeholder (e.g.,
GRAFANA_ADMIN_PASSWORD=YOUR_SECURE_PASSWORD_HERE) to avoid encouraging
copy-paste of insecure credentials; also ensure the accompanying
GRAFANA_ADMIN_USER comment or value suggests a non-default admin username or
placeholder to discourage using "admin" in staging/production.
In `@compose.yml`:
- Line 9: Replace optional ${VAR} expansions with the required-variable form so
Compose fails fast when mandatory env vars are missing: update the port mapping
"127.0.0.1:${API_PORT}:3000" and every other occurrence called out (lines
referencing API_PORT, DB_*, RABBITMQ_*, GRAFANA_ADMIN_USER,
GRAFANA_ADMIN_PASSWORD) to use the required-syntax (e.g., change ${API_PORT} to
${API_PORT?API_PORT is required} or similar required-variable form supported by
Compose) so container startup will error immediately if those env vars are
unset.
In `@docs/task-9-ci-cd.md`:
- Line 26: Update the two documented deploy commands that currently read "docker
compose -f compose.yml -p stage up -d" and "docker compose -f compose.yml -p
prod up -d" to match the CI/CD workflows by adding the missing profile flag so
they become "docker compose -f compose.yml --profile seed -p stage up -d" and
"docker compose -f compose.yml --profile seed -p prod up -d".
In `@grafana/provisioning/dashboards/dashboard.yml`:
- Around line 12-19: The grafana dashboard provider "rabbitmq-overview" is
configured to use the same options.path as "metrics-baseline", causing duplicate
loading and UID conflicts; fix by either removing the "rabbitmq-overview"
provider entry or changing its options.path to a separate subdirectory (e.g.,
set the "rabbitmq-overview" provider's options.path to
/etc/grafana/dashboards/rabbitmq-overview and ensure the dashboard JSON is moved
there) so each provider points to a unique directory and no JSON is processed
twice.
In `@src/metrics/metrics.controller.ts`:
- Around line 9-14: The /metrics endpoint currently uses `@Public`() on the
metrics() method (in metrics.controller.ts) which exposes sensitive runtime and
business metrics; replace `@Public`() with an appropriate protection: remove the
`@Public`() decorator and annotate the metrics() method with a guard that enforces
a shared-secret header or network-origin check (e.g., add
`@UseGuards`(SharedSecretGuard) or `@UseGuards`(NetworkOriginGuard) and implement
that guard to validate a configured header or IP/CIDR), or alternatively move
metrics() into a separate server/port only reachable by Prometheus; ensure
metricsService.getMetricsText() remains unchanged but is only invocable when the
guard passes.
In `@src/metrics/metrics.interceptor.ts`:
- Around line 33-35: The route label currently uses request.route?.path ||
request.url || "unknown" which lets raw request.url create unbounded Prometheus
cardinality; in the intercept method of MetricsInterceptor (where the local
variable route is computed) remove the request.url fallback and default to a
fixed token like "unknown" (e.g., use request.route?.path ?? "unknown") so only
the parameterized path or the constant "unknown" is recorded as the route label.
In `@src/rabbit-mq/rabbit-mq.service.ts`:
- Around line 84-98: The metrics currently treat sendToQueue() returning false
(backpressure) as an "error"; update the publish logic in sendToQueue() handling
so that when this.getChannel().sendToQueue(...) returns false you call
this.metricsService.incrementRabbitMqMessagePublished(queue, "buffer_full")
instead of "error" while still returning the result, and keep the catch block to
record real exceptions as "error"; also update the MessageResult union (or
create a new PublishResult) in metrics.service.ts to include "buffer_full" so
the metric type is valid (references: sendToQueue(),
this.metricsService.incrementRabbitMqMessagePublished, MessageResult in
metrics.service.ts).
---
Outside diff comments:
In `@src/orders/orders.service.ts`:
- Around line 98-104: The metrics increment is recorded before publishing,
causing double/conflicting counts if rabbitmq.send throws; change handleNewOrder
so the publish succeeds before recording success: make handleNewOrder async (if
not already), await this.rabbitmq.send("orders.process", new
ProcessOrderMessageDto(order.id, order.id)), then call
this.metricsService.incrementOrderCreated("success") after the awaited send;
keep the logger.log as appropriate and do not increment on failures (let
existing outer catch record "error").
---
Nitpick comments:
In @.env.example:
- Around line 26-29: The dotenv keys in .env.example are out of alphabetical
order; reorder the four variables so they follow dotenv-linter's expected
alphabetical grouping: GRAFANA_ADMIN_PASSWORD, GRAFANA_ADMIN_USER, GRAFANA_PORT,
PROMETHEUS_PORT — update the block containing PROMETHEUS_PORT, GRAFANA_PORT,
GRAFANA_ADMIN_USER, GRAFANA_ADMIN_PASSWORD to that alphabetical sequence,
preserving values but changing only the key order.
In `@grafana/dashboards/metrics-baseline.json`:
- Around line 1-9: The dashboard JSON lacks a stable "uid" which causes Grafana
to create duplicates on re-provisioning; add a top-level "uid" property (e.g., a
short stable identifier like "ecommerce-metrics" or a generated UUID) alongside
"title": "E-commerce Metrics" and "schemaVersion": 39 so the dashboard is
updated in-place rather than recreated; ensure the uid is permanent and unique
within the Grafana instance so other dashboards or links can reliably reference
this dashboard.
In `@grafana/provisioning/datasources/datasource.yml`:
- Around line 1-7: Add a stable uid to the Prometheus datasource so
dashboards/reference by UID won't break on fresh deploys: update the datasource
block (the entry with name: Prometheus and type: prometheus) to include a uid
field with a deterministic value (e.g. "prometheus" or "prometheus-main") and
ensure any dashboard placeholders or references (e.g. ${DS_PROMETHEUS} or
hardcoded UIDs) point to that uid.
In `@prometheus/prometheus.yml`:
- Around line 1-2: The global scrape_interval is set to 5s which is more
frequent than Prometheus default; change the global.scrape_interval value from
5s to 15s to reduce TSDB load, and if you need higher resolution for specific
targets, add per-job scrape_interval overrides (e.g., set the API job's
scrape_interval to 5s and keep RabbitMQ jobs at 15s or higher) by editing the
job definitions in prometheus.yml.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4fd099d0-ab7a-4a14-9946-0ff1c3ef581f
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (30)
.dockerignore.env.example.github/workflows/build-and-stage.yml.github/workflows/deploy-prod.ymlREADME.mdcompose.dev.ymlcompose.prod.ymlcompose.ymldocs/task-9-ci-cd.mdgrafana/dashboards/metrics-baseline.jsongrafana/dashboards/rabbitmq-overview.jsongrafana/provisioning/dashboards/dashboard.ymlgrafana/provisioning/datasources/datasource.ymlpackage.jsonprometheus/prometheus.ymlrabbitmq/enabled_pluginssrc/app.module.tssrc/metrics/metrics.controller.tssrc/metrics/metrics.interceptor.tssrc/metrics/metrics.module.tssrc/metrics/metrics.service.tssrc/orders/create-order.dto.tssrc/orders/orders-worker.service.spec.tssrc/orders/orders-worker.service.tssrc/orders/orders.module.tssrc/orders/orders.service.spec.tssrc/orders/orders.service.tssrc/rabbit-mq/rabbit-mq.module.tssrc/rabbit-mq/rabbit-mq.service.spec.tssrc/rabbit-mq/rabbit-mq.service.ts
💤 Files with no reviewable changes (1)
- compose.prod.yml
Summary by CodeRabbit
New Features
Documentation
Chores