build: unify Hadoop logging through Log4j2 and expose /actuator/loggers #511

Open
cbb330 wants to merge 4 commits into linkedin:main from cbb330:chbush/unify-logging-log4j2

cbb330 (Collaborator) commented Mar 21, 2026

Summary

Two related changes that together enable runtime dynamic control of all loggers — including Hadoop, HDFS client, and Iceberg — via Spring Boot Actuator, without a service restart.

Problem

Hadoop's hadoop-client transitively introduces slf4j-log4j12 (SLF4J → Log4j 1.x) alongside log4j-slf4j-impl (SLF4J → Log4j2) in the Spring Boot fat JAR. When both bindings are on the classpath, SLF4J picks one nondeterministically — in practice slf4j-log4j12 wins, which means Hadoop loggers (DFSClient, ipc.Client, RetryInvocationHandler, ObserverReadProxyProvider, etc.) route to Log4j 1.x and are invisible to Spring Boot Actuator's /actuator/loggers endpoint. Setting DFSClient to DEBUG via Actuator returns a 204, but no DEBUG logs appear.

Changes

1. buildSrc/src/main/groovy/openhouse.springboot-conventions.gradle — logging unification

Applies to: services/tables, services/jobs, services/housetables, iceberg/openhouse/internalcatalog, iceberg/openhouse/htscatalog

  • Excludes org.slf4j:slf4j-log4j12 and log4j:log4j from all configurations
  • Adds org.apache.logging.log4j:log4j-1.2-api:2.25.3, which provides the Log4j 1.x API but routes all calls to Log4j2 (version must match log4j-core exactly)

After this, the full logging chain is:

  • SLF4J callers (Hadoop, Iceberg) → log4j-slf4j-impl → Log4j2
  • Direct Log4j 1.x API callers → log4j-1.2-api → Log4j2
  • Log4j2 is the single backend, controlled by Spring Boot and Actuator
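
The version-match requirement above can be checked mechanically. The following is a minimal sketch and not part of this PR: the helper names resolved_versions and versions_match are hypothetical, and it assumes the usual text layout of the Gradle dependency report (coordinates as group:artifact:version, an optional "-> resolved" suffix, and trailing markers such as (*)).

```shell
# Hypothetical helper: print the distinct resolved versions of one artifact.
# $1 = coordinate without version, e.g. org.apache.logging.log4j:log4j-core
# stdin = output of `./gradlew <project>:dependencies --configuration runtimeClasspath`
resolved_versions() {
  awk -v coord="$1" '
    {
      sub(/ \([*cn]\)$/, "")               # drop Gradle markers such as (*) or (c)
      idx = index($0, coord)
      if (idx == 0) next                   # line does not mention the artifact
      rest = substr($0, idx + length(coord))
      if (rest !~ /^[: ]/) next            # require an exact artifact-name match
      n = split(rest, parts, " -> ")
      if (n > 1) {
        print parts[n]                     # "declared -> resolved": keep resolved
      } else {
        sub(/^:/, "", rest)
        print rest                         # plain "group:artifact:version"
      }
    }' | sort -u
}

# Hypothetical helper: succeed only if log4j-core and log4j-1.2-api
# resolve to the same single version in the report read from stdin.
versions_match() {
  report=$(cat)
  core=$(printf '%s\n' "$report" | resolved_versions org.apache.logging.log4j:log4j-core)
  bridge=$(printf '%s\n' "$report" | resolved_versions org.apache.logging.log4j:log4j-1.2-api)
  [ -n "$core" ] && [ "$core" = "$bridge" ]
}
```

One possible invocation, reusing the Gradle command from the testing notes: ./gradlew :services:tables:dependencies --configuration runtimeClasspath | versions_match && echo "log4j versions aligned"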

2. services/tables/src/main/resources/application.properties — expose loggers endpoint

  • Adds loggers to management.endpoints.web.exposure.include
  • Sets management.endpoint.loggers.enabled=true

Without this, the /actuator/loggers endpoint exists but is not exposed over HTTP, so even after the log4j unification above, Actuator cannot be called remotely to change log levels.
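
Once the endpoint is exposed, a remote level change could be scripted roughly as follows. This is a sketch under stated assumptions: the ACTUATOR_BASE_URL variable, its default of http://localhost:8080, and the function names are hypothetical and not taken from this PR. The request shape (POST with a configuredLevel JSON body, GET to read a logger back) follows the standard Actuator loggers endpoint.

```shell
# Assumption: base URL of the service, e.g. reached via a port-forward to one pod.
ACTUATOR_BASE_URL="${ACTUATOR_BASE_URL:-http://localhost:8080}"

# Build the JSON body expected by the loggers endpoint.
# $1 = level, e.g. DEBUG
logger_payload() {
  printf '{"configuredLevel":"%s"}' "$1"
}

# Set the level of one logger and print the HTTP status code.
# $1 = logger name, $2 = level
set_log_level() {
  curl -sS -o /dev/null -w '%{http_code}\n' \
    -X POST -H 'Content-Type: application/json' \
    -d "$(logger_payload "$2")" \
    "$ACTUATOR_BASE_URL/actuator/loggers/$1"
}

# Read back the configured and effective level of one logger as JSON.
# $1 = logger name
get_log_level() {
  curl -sS "$ACTUATOR_BASE_URL/actuator/loggers/$1"
}
```

For example, set_log_level org.apache.hadoop.hdfs.DFSClient DEBUG should print 204 on success, and get_log_level org.apache.hadoop.hdfs.DFSClient can then be used to confirm the new level.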

Why this matters

1. Actuator now controls all loggers, including Hadoop and Iceberg
POST /actuator/loggers/org.apache.hadoop.hdfs.DFSClient with {"configuredLevel":"DEBUG"} actually produces DEBUG output. Before, it silently had no effect.

2. Single source of truth for log levels
Previously, log levels existed in two independent places: log4j-config.xml (ConfigMap) for Hadoop code, and Spring Boot config for app code. Conflicting levels were possible with no unified view. Now Log4j2 is authoritative for all loggers.

3. Eliminates SLF4J binding nondeterminism
With two SLF4J bindings on the classpath, which one wins depends on classpath ordering — a function of JVM version and JAR load order, not explicit configuration. This makes "does Hadoop logging work?" an implicit side effect rather than a deterministic property.

4. requestId MDC propagates into Hadoop and Iceberg log lines
The service populates requestId in MDC on each inbound request. With Log4j2 as the sole backend, that MDC context is also present in Hadoop and Iceberg log lines emitted on the request thread. A single grep requestId=abc123 correlates Spring MVC → service layer → Iceberg → HDFS across the entire pod log.

5. Removes EOL Log4j 1.x from the fat JAR
log4j:log4j reached end-of-life in 2015 and carries known CVEs (including CVE-2019-17571, a deserialization RCE). Removing it reduces attack surface. log4j-1.2-api is maintained as part of the active Log4j2 project.

6. Log format consistency
All log lines — app code and Hadoop client code — now share the same format and MDC fields, simplifying log aggregation and alerting.

Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

Testing Done

  • Manually Tested on local docker setup. Please include commands ran, and their output.
  • Added new tests for the changes made.
  • Updated existing tests to reflect the changes made.
  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • Some other form of testing like staging or soak time in production. Please explain.

Verified dependency resolution excludes the right JARs:

./gradlew :services:tables:dependencies --configuration runtimeClasspath \
  | grep -E "slf4j-log4j12|log4j:log4j|log4j-1\.2-api|log4j-slf4j-impl"
# Only log4j-1.2-api and log4j-slf4j-impl present; slf4j-log4j12 and log4j:log4j excluded

./gradlew :services:tables:test
# BUILD SUCCESSFUL
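
As a complement to the dependency-report check above, the packaged fat JAR itself can be inspected for leftover Log4j 1.x artifacts. This is a rough sketch only: the function names are hypothetical, and it assumes the standard Spring Boot executable-JAR layout in which bundled libraries sit under BOOT-INF/lib/.

```shell
# Hypothetical helper: read a JAR listing on stdin (from `jar tf` or `unzip -l`)
# and print bundled slf4j-log4j12 or Log4j 1.x libraries.
# The version part is restricted to digits and dots so that the bridge
# log4j-1.2-api-<version>.jar is not reported as a Log4j 1.x JAR.
find_legacy_logging_jars() {
  grep -E 'BOOT-INF/lib/(slf4j-log4j12|log4j)-[0-9][0-9.]*\.jar$'
}

# Hypothetical helper: fail (non-zero exit) if any legacy JAR is still bundled.
assert_no_legacy_logging_jars() {
  if offenders=$(find_legacy_logging_jars); then
    printf 'legacy logging JARs still bundled:\n%s\n' "$offenders" >&2
    return 1
  fi
}
```

A possible invocation, with the JAR path as a placeholder: jar tf path/to/tables-service.jar | assert_no_legacy_logging_jars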

Runtime verification on test cluster: after deploying, POST /actuator/loggers/org.apache.hadoop.hdfs.DFSClient returns 204 and subsequent kubectl logs shows DEBUG output from DFSClient and ipc.Client on real HDFS operations. NativeCodeLoader (a Hadoop class) logs in Spring Boot Log4j2 format, confirming SLF4J → Log4j2 routing end-to-end.

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

cbb330 added 4 commits March 20, 2026 22:28
…operties

The Update a Table example in SETUP.md was missing two critical pieces:
- baseTableVersion must come from the GET response's tableVersion field
  (not hardcoded as 'INITIAL_VERSION', which causes 409 after first update)
- tableProperties must include all openhouse.* properties from GET;
  omitting them causes a 500 NPE in the cross-cluster eligibility check

Hadoop's hadoop-client transitively introduces slf4j-log4j12 (SLF4J ->
Log4j 1.x bridge) alongside log4j-slf4j-impl (SLF4J -> Log4j2 bridge)
in the fat JAR. When both are on the classpath, SLF4J emits a 'multiple
bindings' warning and picks one arbitrarily -- in practice slf4j-log4j12
wins, meaning Hadoop loggers (DFSClient, ipc.Client, RetryInvocationHandler,
etc.) route to Log4j 1.x and are invisible to Spring Boot Actuator's
/actuator/loggers endpoint.

Fix:
- Globally exclude slf4j-log4j12 and log4j:log4j from all Spring Boot
  service configurations, consistent with the existing exclusion of
  logback-classic.
- Add log4j-1.2-api, which provides the Log4j 1.x API but routes all
  calls to Log4j2. This preserves compatibility for any code using the
  Log4j 1.x API directly while keeping Log4j2 as the single logging
  implementation.

After this change, all loggers -- including Hadoop HDFS client loggers --
are visible to the Spring Boot Actuator /actuator/loggers endpoint and
can be dynamically adjusted at runtime without restarting the service.

Adds the Spring Boot Actuator /actuator/loggers endpoint to the tables-service,
enabling dynamic log level changes at runtime without pod restarts or deployment
changes.

This supports the hdfs-diagnostics tooling: enable-hdfs-debug.sh uses this
endpoint to activate DEBUG logging on a single pod during incident windows,
mirroring the loggers configured in application-hdfs-diagnostics.properties.

log4j-core is forced to 2.25.3 by transitive resolution. log4j-1.2-api
must match exactly — OptionConverter.convertLevel() signature changed
between 2.13.3 and 2.25.3, causing NoSuchMethodError at startup when
processing the Log4j 1.x XML config.
cbb330 changed the title from "build: unify Hadoop logging through Log4j2 by excluding slf4j-log4j12" to "build: unify Hadoop logging through Log4j2 and expose /actuator/loggers" on Mar 21, 2026