feat(opentelemetry): add OpenTelemetry and JMX metrics #9
@OlegCIB we need to discuss:
|
… and update endpoint
…dundant login and user creation steps
… resolve ARM64 issues
…ions across scripts
…bseven-tomcat.sh and cibseven-wildfly.sh for clarity and consistency
…guration file path and remove unnecessary newline
* direct console output instead of slf4j * once wildfly's logging is ready, otel uses proper logging framework
* remove export at test-opentelemetry-wildfly script
… instructions; add custom JMX metrics config file
Pull request overview
This pull request migrates from the Prometheus JMX Exporter to the OpenTelemetry Java Agent for metrics and observability. The change replaces the legacy Prometheus-specific implementation with a more modern, comprehensive observability solution that supports metrics, traces, and logs.
Changes:
- Replaced Prometheus JMX Exporter with OpenTelemetry Java Agent (version 2.23.0)
- Updated all test files to verify OpenTelemetry metrics endpoint instead of Prometheus
- Added JMX metrics configuration files for extended JVM metrics collection
- Updated documentation to explain OpenTelemetry configuration and usage
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| Dockerfile | Updated to install OpenTelemetry agent instead of Prometheus exporter, added JMX config files |
| download.sh | Replaced Prometheus JMX exporter download with OpenTelemetry agent download |
| cibseven-wildfly.sh | Replaced Prometheus agent configuration with OpenTelemetry agent and JBoss LogManager setup |
| cibseven-tomcat.sh | Replaced Prometheus agent configuration with OpenTelemetry agent via CATALINA_OPTS |
| cibseven-run.sh | Added OpenTelemetry agent configuration via JAVA_OPTS |
| test/*.sh | Removed old Prometheus test files and added new OpenTelemetry test files for all distributions |
| test/docker-compose.yml | Replaced camunda-prometheus-jmx service with camunda-opentelemetry and added OpenTelemetry collector |
| test/otel-collector-config.yml | Added OpenTelemetry collector configuration for test environment |
| opentelemetry/jmx_config.yaml | Added JMX metrics configuration with extended JVM metrics |
| opentelemetry/jmx_custom_config.yaml | Added placeholder for custom JMX metrics configuration |
| README.md | Replaced Prometheus section with comprehensive OpenTelemetry documentation |
| test/test-debug.sh | Updated error pattern to exclude SLF4J stderr messages |
| * `OTEL_SERVICE_NAME`: Service name for telemetry data (default: `cibseven`) | ||
| * `OTEL_METRICS_EXPORTER`: Configure metrics exporter (default: `none`, examples: `prometheus`, `otlp`) | ||
| * `OTEL_TRACES_EXPORTER`: Configure traces exporter (default: `none`, example: `otlp`) | ||
| * `OTEL_LOGS_EXPORTER`: Configure logs exporter (default: `none`, example: `otlp`) - **Note:** CIB seven uses log4j2 for application logging, so this is typically not needed |
The documentation states 'CIB seven uses log4j2 for application logging' but this may not be accurate for all distributions. Verify that all distributions (wildfly, tomcat, run) actually use log4j2, as application servers often have their own logging frameworks.
| * `OTEL_LOGS_EXPORTER`: Configure logs exporter (default: `none`, example: `otlp`) - **Note:** CIB seven uses log4j2 for application logging, so this is typically not needed | |
| * `OTEL_LOGS_EXPORTER`: Configure logs exporter (default: `none`, example: `otlp`) - **Note:** Application logs are typically handled by the application server's own logging framework; enable this only if you also want logs to be exported via the OpenTelemetry agent |
| command: ["--config=/etc/otel-collector-config.yaml"] | ||
| volumes: | ||
| - ./otel-collector-config.yml:/etc/otel-collector-config.yaml |
File extension inconsistency: the volume mounts the host file 'otel-collector-config.yml' at '/etc/otel-collector-config.yaml' inside the container, and the command references that same '.yaml' path, so the collector can find its config. The mixed '.yml'/'.yaml' naming is still easy to break in later edits. Consider using one extension on both sides, for example by changing the mount target and the command to '.yml'.
| command: ["--config=/etc/otel-collector-config.yaml"] | |
| volumes: | |
| - ./otel-collector-config.yml:/etc/otel-collector-config.yaml | |
| command: ["--config=/etc/otel-collector-config.yml"] | |
| volumes: | |
| - ./otel-collector-config.yml:/etc/otel-collector-config.yml |
|
|
||
| # Test OpenTelemetry metrics endpoint | ||
| _log "Testing OpenTelemetry metrics endpoint" | ||
| curl -s http://localhost:9464/metrics | grep -q "target_info" || _exit 8 "OpenTelemetry metrics not available" |
Inconsistent exit codes: The OpenTelemetry test uses exit code 8 in the run variant but exit code 3 in tomcat and wildfly variants. While this may not cause functional issues, consider using consistent exit codes across all test variants for better maintainability (e.g., exit code 3 for all OpenTelemetry metric tests).
| curl -s http://localhost:9464/metrics | grep -q "target_info" || _exit 8 "OpenTelemetry metrics not available" | |
| curl -s http://localhost:9464/metrics | grep -q "target_info" || _exit 3 "OpenTelemetry metrics not available" |
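One way to keep the exit code consistent would be a small helper shared by all three test scripts. The following is only a sketch: `check_otel_metrics` and `OTEL_METRICS_EXIT_CODE` are hypothetical names that do not exist in the repository, and the helper reads the metrics text from stdin so that it does not depend on how the endpoint is fetched.

```shell
# Hypothetical shared exit code for the OpenTelemetry metrics check.
OTEL_METRICS_EXIT_CODE=3

# Reads Prometheus-format metrics text on stdin. Returns 0 if the
# target_info metric is present, OTEL_METRICS_EXIT_CODE otherwise.
check_otel_metrics() {
  grep -q "target_info" || return "$OTEL_METRICS_EXIT_CODE"
}
```

Each test variant could then call `curl -s http://localhost:9464/metrics | check_otel_metrics || _exit $? "OpenTelemetry metrics not available"`, assuming the existing `_exit` helper takes the code as its first argument, as it does in the lines above.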
| LOG_MANAGER_PATH=$(find /camunda/modules -name "jboss-logmanager*.jar") | ||
| COMMON_PATH=$(find /camunda/modules -name "wildfly-common*.jar") | ||
| export PREPEND_JAVA_OPTS="${PREPEND_JAVA_OPTS} -Dsun.util.logging.disableCallerCheck=true -Djava.util.logging.manager=org.jboss.logmanager.LogManager -Xbootclasspath/a:$LOG_MANAGER_PATH:$COMMON_PATH" | ||
| export PREPEND_JAVA_OPTS="${PREPEND_JAVA_OPTS} -javaagent:/camunda/javaagent/opentelemetry-javaagent-${OPENTELEMETRY_AGENT_VERSION}.jar -Dotel.javaagent.logging=application" |
The -Dotel.javaagent.logging=application option is set for WildFly but not for Tomcat or Run distributions. This inconsistency means the OpenTelemetry agent's logging behavior will differ across distributions. Consider adding this option to Tomcat and Run distributions for consistency, or document why WildFly requires special handling.
| curl -s http://localhost:9464/metrics | grep -q "target_info" || _exit 3 "OpenTelemetry metrics not available" | ||
| _log "OpenTelemetry metrics available" | ||
|
|
||
| _exit 0 "Test successfull" No newline at end of file |
Spelling error: 'successfull' should be 'successful'.
| _exit 0 "Test successfull" | |
| _exit 0 "Test successful" |
This typo is already present in the master branch:
Will fix it with this PR.
| curl -s http://localhost:9464/metrics | grep -q "target_info" || _exit 8 "OpenTelemetry metrics not available" | ||
| _log "OpenTelemetry metrics available" | ||
|
|
||
| _exit 0 "Test successfull" |
Spelling error: 'successfull' should be 'successful'.
| _exit 0 "Test successfull" | |
| _exit 0 "Test successful" |
This typo is already present in the master branch:
Will fix it with this PR.
| COPY opentelemetry/jmx_config.yaml /camunda/javaagent/jmx_config.yaml | ||
| COPY opentelemetry/jmx_custom_config.yaml /camunda/javaagent/jmx_custom_config.yaml |
The COPY commands for the JMX config files are missing the --chown=camunda:camunda flag. In Docker, COPY always runs as root regardless of the USER directive. Add --chown=camunda:camunda to these COPY commands to match the pattern used in line 105 and ensure correct file ownership.
| COPY opentelemetry/jmx_config.yaml /camunda/javaagent/jmx_config.yaml | |
| COPY opentelemetry/jmx_custom_config.yaml /camunda/javaagent/jmx_custom_config.yaml | |
| COPY --chown=camunda:camunda opentelemetry/jmx_config.yaml /camunda/javaagent/jmx_config.yaml | |
| COPY --chown=camunda:camunda opentelemetry/jmx_custom_config.yaml /camunda/javaagent/jmx_custom_config.yaml |
…enTelemetry configuration to use standard
Pull request overview
Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.
| image: otel/opentelemetry-collector:latest | ||
| container_name: opentelemetry-collector | ||
| command: ["--config=/etc/otel-collector-config.yml"] | ||
| volumes: |
Setting a fixed container_name can cause name collisions when running compose in parallel (e.g., multiple CI jobs on the same Docker host) and breaks compose’s project-based naming. Prefer removing container_name (or making it configurable) so compose can namespace it automatically.
| * `OTEL_TRACES_EXPORTER`: Configure traces exporter (default: `none`, example: `otlp`) | ||
| * `OTEL_LOGS_EXPORTER`: Configure logs exporter (default: `none`, example: `otlp`) - **Note:** CIB seven uses a logging framework for application logging, so this is typically not needed | ||
| * `OTEL_EXPORTER_PROMETHEUS_PORT`: Port for Prometheus metrics exporter (default: `9464`) | ||
| * `OTEL_EXPORTER_OTLP_ENDPOINT`: Endpoint for OTLP exporter (default: `http://localhost:4318`, example: `http://otel-collector:4318`) |
The README states a default OTEL_EXPORTER_OTLP_ENDPOINT of http://localhost:4318, but the image/Dockerfile doesn’t set this variable. Either set it in the Dockerfile to make the documented default true, or adjust the README to describe it as an example/agent default instead of an image default.
| * `OTEL_EXPORTER_OTLP_ENDPOINT`: Endpoint for OTLP exporter (default: `http://localhost:4318`, example: `http://otel-collector:4318`) | |
| * `OTEL_EXPORTER_OTLP_ENDPOINT`: Endpoint for OTLP exporter (examples: `http://localhost:4318`, `http://otel-collector:4318`) |
| start_container | ||
|
|
||
| poll_log "Listening for transport dt_socket at address: 8000" "ERROR" || _exit 1 "JPDA not started" | ||
| poll_log "Listening for transport dt_socket at address: 8000" "ERROR(?!.*stderr.*SLF4J)" || _exit 1 "JPDA not started" |
poll_log/grep_log uses plain grep -q, which does not support negative lookaheads. The pattern ERROR(?!.*stderr.*SLF4J) will be treated literally and won’t match real ERROR lines, so the test may miss failures. Consider changing the helper to support PCRE (grep -P) or adjust the logic to grep for ERROR and explicitly exclude the known SLF4J stderr line(s) in a separate check.
| poll_log "Listening for transport dt_socket at address: 8000" "ERROR(?!.*stderr.*SLF4J)" || _exit 1 "JPDA not started" | |
| poll_log "Listening for transport dt_socket at address: 8000" "ERROR" || _exit 1 "JPDA not started" |
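The "grep for ERROR and exclude the known SLF4J line" alternative could look roughly like the sketch below. `has_unexpected_error` is a hypothetical helper, not part of the existing test library, and it excludes any line that contains `stderr` followed by `SLF4J`, which only approximates the intent of the lookahead pattern. It also assumes the shell is not running with `pipefail`.

```shell
# Reads log text on stdin. Returns 0 if there is at least one ERROR line
# that is not a known SLF4J stderr message, non-zero otherwise.
has_unexpected_error() {
  # First grep keeps ERROR lines; second succeeds only if at least one of
  # those lines does NOT match the SLF4J stderr pattern.
  grep "ERROR" | grep -qv "stderr.*SLF4J"
}
```

This works with plain POSIX `grep`, so it does not require `grep -P`, which is not available in every image.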
| opentelemetry-collector: | ||
| image: otel/opentelemetry-collector:latest | ||
| container_name: opentelemetry-collector | ||
| command: ["--config=/etc/otel-collector-config.yml"] |
Using a floating otel/opentelemetry-collector:latest tag makes tests non-reproducible and can break unexpectedly when upstream releases. Pin the collector image to a specific version (or digest) consistent with the repo’s other pinned images (e.g., mysql:8.0, postgres:15).
… pre-configured options and additional configuration instructions for metrics and trace exports
Pull request overview
Copilot reviewed 19 out of 19 changed files in this pull request and generated 9 comments.
| exporters: | ||
| debug: | ||
| verbosity: detailed | ||
|
|
||
| service: | ||
| pipelines: | ||
| traces: | ||
| receivers: [otlp] | ||
| exporters: [debug] | ||
| metrics: | ||
| receivers: [otlp, prometheus] | ||
| exporters: [debug] |
The collector config uses the debug exporter with verbosity: detailed, which can generate very large logs and slow down CI (especially with traces/logs pipelines enabled). Consider reducing verbosity and/or disabling unused pipelines (e.g., logs/traces) in this test config if the goal is only to support the metrics test.
| For configuring exporter you need attach your configuration as a container volume | ||
| at `/camunda/javaagent/prometheus-jmx.yml`. This is only supported for `wildfly` | ||
| and `tomcat` distributions. | ||
| The CIB seven Docker images come with OpenTelemetry Java-Agent pre-installed. The agent automatically instruments your application to generate telemetry data (metrics, traces, and logs), but all exporters are disabled by default. You need to configure at least one exporter to provide telemetry data. |
The README says the OpenTelemetry agent is “pre-installed” and that exporters are disabled by default, but the entrypoint scripts now always attach -javaagent on startup. Consider explicitly documenting that the agent is loaded by default (with exporters disabled) to avoid surprising users who expect zero instrumentation unless they opt in.
| The CIB seven Docker images come with OpenTelemetry Java-Agent pre-installed. The agent automatically instruments your application to generate telemetry data (metrics, traces, and logs), but all exporters are disabled by default. You need to configure at least one exporter to provide telemetry data. | |
| The CIB seven Docker images come with the OpenTelemetry Java Agent pre-installed and **automatically loaded on startup** (the container entrypoint attaches the agent using `-javaagent`). The agent automatically instruments your application to generate telemetry data (metrics, traces, and logs), but all exporters are disabled by default. No telemetry data is sent anywhere until you explicitly configure at least one exporter. |
| # See https://github.com/prometheus/jmx_exporter/issues/344 | ||
| LOG_MANAGER_PATH=$(find /camunda/modules -name "jboss-logmanager*.jar") | ||
| COMMON_PATH=$(find /camunda/modules -name "wildfly-common*.jar") | ||
| export PREPEND_JAVA_OPTS="${PREPEND_JAVA_OPTS} -Dsun.util.logging.disableCallerCheck=true -Djava.util.logging.manager=org.jboss.logmanager.LogManager -Xbootclasspath/a:$LOG_MANAGER_PATH:$COMMON_PATH" | ||
| export PREPEND_JAVA_OPTS="${PREPEND_JAVA_OPTS} -javaagent:/camunda/javaagent/opentelemetry-javaagent-${OPENTELEMETRY_AGENT_VERSION}.jar" |
The OpenTelemetry javaagent is now always attached on startup. Previously, the extra javaagent/LogManager bootclasspath logic was opt-in (via JMX_PROMETHEUS=true), so this change alters default runtime behavior for existing users (startup time, classloading/instrumentation side effects) even when all exporters are none. Consider gating -javaagent behind an explicit enable flag or only enabling when at least one OTEL exporter is configured.
| # OpenTelemetry Agent configuration | ||
| # Load the agent via CATALINA_OPTS (Tomcat-specific) instead of JAVA_TOOL_OPTIONS | ||
| export CATALINA_OPTS="${CATALINA_OPTS:-} -javaagent:/camunda/javaagent/opentelemetry-javaagent-${OPENTELEMETRY_AGENT_VERSION}.jar" |
The OpenTelemetry javaagent is now always attached via CATALINA_OPTS. This is a behavioral change compared to the previous opt-in javaagent approach (Prometheus JMX exporter) and can impact users who don’t want any instrumentation overhead by default. Consider making agent attachment conditional (explicit enable flag or based on OTEL exporter settings).
| # OpenTelemetry Agent configuration | ||
| # Load the agent via JAVA_OPTS instead of JAVA_TOOL_OPTIONS | ||
| export JAVA_OPTS="${JAVA_OPTS:-} -javaagent:/camunda/javaagent/opentelemetry-javaagent-${OPENTELEMETRY_AGENT_VERSION}.jar" |
JAVA_OPTS is unconditionally appended with -javaagent:... for the run distro. This changes default runtime behavior for all run users even when exporters are disabled. Consider making javaagent loading opt-in (or conditional on OTEL exporter configuration) to avoid unexpected overhead/side effects.
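Conditional attachment based on the exporter settings might be sketched as follows. The function names `otel_exporter_enabled` and `otel_agent_opts` are hypothetical, and treating an unset exporter variable as `none` is an assumption taken from the defaults documented in the README, not from the entrypoint scripts.

```shell
# Returns 0 if at least one OpenTelemetry exporter is set to something other
# than "none". Unset variables are treated as "none" (assumed default).
otel_exporter_enabled() {
  for exporter in \
    "${OTEL_METRICS_EXPORTER:-none}" \
    "${OTEL_TRACES_EXPORTER:-none}" \
    "${OTEL_LOGS_EXPORTER:-none}"; do
    [ "$exporter" != "none" ] && return 0
  done
  return 1
}

# Prints the -javaagent option only when an exporter is enabled, and
# prints nothing otherwise.
otel_agent_opts() {
  if otel_exporter_enabled; then
    printf '%s' "-javaagent:/camunda/javaagent/opentelemetry-javaagent-${OPENTELEMETRY_AGENT_VERSION}.jar"
  fi
}
```

The run script could then use `export JAVA_OPTS="${JAVA_OPTS:-} $(otel_agent_opts)"`, and the Tomcat and WildFly scripts could do the same for `CATALINA_OPTS` and `PREPEND_JAVA_OPTS`. An explicit enable flag would be an equally valid design; this variant just avoids introducing a new variable.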
| # Test OpenTelemetry metrics endpoint | ||
| _log "Testing OpenTelemetry metrics endpoint" | ||
| curl -s http://localhost:9464/metrics | grep -q "target_info" || _exit 3 "OpenTelemetry metrics not available" | ||
| _log "OpenTelemetry metrics available" |
This test only checks for the generic Prometheus target_info metric, which verifies the metrics endpoint is up but doesn’t confirm that the new JMX rules are being applied. Consider asserting on at least one metric that should come specifically from opentelemetry/jmx_config.yaml (e.g., one of the os.* or file_descriptor.* metrics) to cover the JMX-metrics part of this feature.
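A stricter check could be sketched like this. `check_jmx_metrics` and `JMX_METRIC_PATTERN` are hypothetical names, and the default pattern `^os_` is only a placeholder that assumes the `os.*` metrics appear with underscores in the Prometheus output; it should be replaced with a metric name that `opentelemetry/jmx_config.yaml` is actually known to produce.

```shell
# Placeholder pattern for a JMX-derived metric (assumption, see above).
JMX_METRIC_PATTERN="${JMX_METRIC_PATTERN:-^os_}"

# Reads Prometheus-format metrics on stdin. Returns 0 only if both the
# generic target_info metric and a JMX-derived metric are present.
check_jmx_metrics() {
  metrics=$(cat)
  printf '%s\n' "$metrics" | grep -q "target_info" || return 1
  printf '%s\n' "$metrics" | grep -q "$JMX_METRIC_PATTERN" || return 1
}
```

In the test scripts this would be fed from the same endpoint, for example `curl -s http://localhost:9464/metrics | check_jmx_metrics || _exit 3 "OpenTelemetry JMX metrics not available"`.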
| docker-compose up --force-recreate -d postgres mysql opentelemetry-collector | ||
| ./test-${DISTRO}.sh camunda | ||
| ./test-${DISTRO}.sh camunda-mysql | ||
| ./test-${DISTRO}.sh camunda-postgres | ||
| ./test-${DISTRO}.sh camunda-password-file |
opentelemetry-collector is started for the entire test suite, even though only the OpenTelemetry test needs it. This adds an extra image pull/container to every CI run and can slow down or flake tests in environments with limited registry access. Consider starting the collector only around the OpenTelemetry test (or letting depends_on handle it when the OpenTelemetry service is started).
| docker-compose up --force-recreate -d postgres mysql opentelemetry-collector | |
| ./test-${DISTRO}.sh camunda | |
| ./test-${DISTRO}.sh camunda-mysql | |
| ./test-${DISTRO}.sh camunda-postgres | |
| ./test-${DISTRO}.sh camunda-password-file | |
| docker-compose up --force-recreate -d postgres mysql | |
| ./test-${DISTRO}.sh camunda | |
| ./test-${DISTRO}.sh camunda-mysql | |
| ./test-${DISTRO}.sh camunda-postgres | |
| ./test-${DISTRO}.sh camunda-password-file | |
| docker-compose up --force-recreate -d opentelemetry-collector |
* wildfly-common now depends on smallrye
Fix Opentelemetry configuration for Wildfly 39+
No description provided.