Scrape failed: "i/o timeout" and missing Pod metadata #625

I am getting i/o timeout errors when Prometheus scrapes metrics from dcgm-exporter. In addition, Pod mapping is not working: the exported metrics are missing the pod and namespace labels.

Deployment method: Helm Chart 4.7.1
Image: nvcr.io/nvidia/k8s/dcgm-exporter:4.5.1-4.8.0-distroless
Kubernetes: 1.33.6
GPU: NVIDIA H100 (4 MIG)

values.yaml

```
kubernetes:
  enablePodLabels: true
  enablePodUID: true
  rbac:
    create: true
image:
  repository: nvcr.io/nvidia/k8s/dcgm-exporter
  pullPolicy: IfNotPresent
  tag: 4.5.1-4.8.0-distroless
extraEnv:
  - name: DCGM_EXPORTER_INTERVAL
    value: "1000"
  - name: DCGM_EXPORTER_KUBERNETES_GPU_ID_TYPE
    value: "device-name"
serviceMonitor:
  enabled: true
  interval: 60s
  honorLabels: true
  additionalLabels:
    release: kube-prometheus-stack
```

LOGS

```
time=2026-02-02T04:42:44.018Z level=ERROR msg="Failed to write response." error="write tcp 10.42.0.54:9400->10.42.0.66:47334: i/o timeout"
time=2026-02-02T04:42:44.018Z level=INFO msg="http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/internal/pkg/server.(*MetricsServer).Metrics (server.go:257)"
```
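
To make the missing-label symptom easier to reproduce, here is a minimal sketch of how the exporter output could be checked outside of Prometheus. It is not part of dcgm-exporter. The helper names `samples_missing_labels` and `fetch_metrics` are made up for illustration, and the URL in the usage note assumes a port-forward to the exporter on port 9400 (the port shown in the log) with metrics served under /metrics.

```python
import re
import urllib.request

# One sample line: metric name, optional {label block}, then the value.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# label="value" pairs, allowing escaped characters inside the value.
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def samples_missing_labels(text, required=("pod", "namespace")):
    """Return (metric, missing_labels) for every sample that lacks a
    non-empty value for one of the required labels."""
    result = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group(2) or ""))
        # A label that is absent or set to "" counts as missing.
        missing = [name for name in required if not labels.get(name)]
        if missing:
            result.append((match.group(1), missing))
    return result


def fetch_metrics(url, timeout=10.0):
    """Fetch the exposition text; the timeout bounds the socket operations
    so a stalled exporter does not hang the check."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With a port-forward in place, `samples_missing_labels(fetch_metrics("http://localhost:9400/metrics"))` lists each metric that has no pod or namespace value. An empty list would suggest the labels are present at the exporter and are being lost later, for example during relabeling.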
