| 1 | +## [1.2.9](https://github.com/Cloudzero/cloudzero-agent/compare/v1.2.8...v1.2.9) (2025-11-25) |
| 2 | + |
| 3 | +Release 1.2.9 focuses on quality, scalability, configurability, and consistency. It includes significant improvements to the organization and configurability of the Helm chart, Prometheus 3.x support (with 2.x support preserved for now), an HPA for the webhook server, and early previews of experimental functionality we hope to stabilize over the next few releases. |
| 4 | + |
| 5 | +### Key Improvements |
| 6 | + |
| 7 | +- **Prometheus 3.x Support**: Upgraded the default Prometheus version from 2.55.1 to 3.7.3 with automatic backward compatibility. The chart detects the Prometheus version and uses the appropriate agent mode flag (`--agent` for 3.x, `--enable-feature=agent` for 2.x). Custom Prometheus 2.x images will continue to work without changes. |
| 8 | + |
| 9 | +- **Webhook Server Autoscaling**: Added Horizontal Pod Autoscaler support for the webhook server, enabling automatic scaling based on CPU and memory utilization. Enable with `insightsController.autoscaling.enabled: true`. |
| 10 | + |
| 11 | +### Configuration Improvements |
| 12 | + |
| 13 | +- **Unified Label/Annotation System**: Refactored label and annotation generation with new `generateLabels` and `generateAnnotations` helpers. Labels now follow Kubernetes recommended practices with `app.kubernetes.io/name` for component identity and `app.kubernetes.io/part-of: cloudzero-agent` for chart membership. |
| 14 | + |
| 15 | +- **Component-Specific Metadata**: Added support for component-specific `labels`, `annotations`, `podLabels`, and `podAnnotations` across all workload types, providing fine-grained control over Kubernetes metadata. |
| 16 | + |
| 17 | +- **Centralized Resource Names**: Implemented a unified resource naming pattern (`{release-name}-cz-{component}`) across all Kubernetes resources, improving consistency and enabling programmatic name reconstruction. |
| 18 | + |
| 19 | +- **Unified Mode Configuration**: New `components.agent.mode` property consolidates deployment mode selection with values: `agent`, `server`, `federated`, and `clustered`. Legacy properties continue to work with automatic derivation. |
| 20 | + |
| 21 | +- **Cohesive Replicas System**: New `defaults.replicas` property provides a global default with mode-specific constraints. |
| 22 | + |
| 23 | +- **Persistent Volume Strategy**: Changed deployment strategy to `Recreate` when persistent volumes are enabled, preventing volume mount conflicts during rolling updates. |
| 24 | + |
| 25 | +### Reliability Improvements |
| 26 | + |
| 27 | +- **Subchart Isolation**: Excluded `.global` sections from configuration checksum calculation, preventing unnecessary pod restarts when parent chart globals change in subchart deployments. |
| 28 | + |
| 29 | +- **Cert-Manager Compatibility**: Fixed ArgoCD reconciliation failures by removing empty `caBundle` key from webhook configuration when using cert-manager (which injects the CA bundle via annotation). |
| 30 | + |
| 31 | +- **DNS Resolution**: Updated cAdvisor configuration to use fully qualified domain name `kubernetes.default.svc.cluster.local:443` for improved DNS resolution reliability. |
| 32 | + |
| 33 | +### Support Tooling |
| 34 | + |
| 35 | +- **Diagnostic Script**: Added `scripts/anaximander.sh` to gather comprehensive diagnostic information. Customers can run this script to collect logs, configurations, resource status, and environment context for CloudZero support. |
| 36 | + |
| 37 | +- **Post-Install Guidance**: Updated Helm NOTES.txt with improved post-installation guidance and next steps. |
| 38 | + |
| 39 | +### Experimental Features |
| 40 | + |
| 41 | +The following features are experimental and may change in future releases: |
| 42 | + |
| 43 | +- **Grafana Alloy Integration**: Added Grafana Alloy as an alternative to Prometheus for metrics collection in high-volume environments. Enable with `components.agent.mode: clustered`. |
| 44 | + |
| 45 | +- **GPU Metrics Collection**: Added NVIDIA DCGM GPU metrics scraping. Enable with `prometheusConfig.scrapeJobs.gpu.enabled: true`. Note that this covers collection only; CloudZero does not yet support cost allocation based on GPU. |
| 46 | + |
| 47 | +### Upgrade Steps |
| 48 | + |
| 49 | +This release includes changes to immutable Kubernetes selectors, requiring the `--force` flag to recreate affected resources: |
| 50 | + |
| 51 | +```sh |
| 52 | +helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.9 --force |
| 53 | +``` |