Releases: berops/claudie

v0.11.1

27 Mar 18:38
fea63ed

What's Changed

  • General maintenance release that updates dependencies. #2020

v0.11.0

24 Mar 14:56
b9473d8

What's Changed

  • Added CloudRift cloud provider support #2000

  • Updated Longhorn to version v1.11.1 #2007
    Before upgrading to this Claudie version from v0.10.2, detach all Longhorn volumes and follow the manual checks described here: https://longhorn.io/docs/1.11.1/deploy/upgrade/#manual-checks-before-upgrade

  • More validation of the input manifest was moved into the operator's webhook, so that feedback is given more immediately when kubectl apply is executed #2008

  • When a node is scheduled for deletion, its drain is now limited to a ~30 minute timeout, after which the node will be deleted #2011

  • During node deletion, disk scheduling at the Longhorn level will now be applied before the node is deleted #2012
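
The drain timeout from #2011 can be pictured with the following minimal sketch. It is not Claudie's implementation: the function name, the callbacks, and the polling interval are hypothetical, and only the roughly 30 minute limit comes from the notes above.

```python
import time
from typing import Callable

# Roughly the 30 minute limit mentioned in the release notes.
DRAIN_TIMEOUT_SECONDS = 30 * 60


def drain_then_delete(
    drain_step: Callable[[], bool],   # hypothetical: returns True once the node is drained
    delete_node: Callable[[], None],  # hypothetical: removes the node
    timeout: float = DRAIN_TIMEOUT_SECONDS,
    poll_interval: float = 5.0,       # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll drain_step until it succeeds or the timeout elapses.

    The node is deleted in both cases. Returns True if the drain
    finished in time and False if the timeout was hit first.
    """
    deadline = clock() + timeout
    drained = False
    while clock() < deadline:
        if drain_step():
            drained = True
            break
        sleep(poll_interval)
    delete_node()
    return drained
```

The clock and sleep functions are injectable so the behaviour can be checked without waiting for the real timeout.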

v0.10.2

06 Mar 13:28
39c1bc4

What's Changed

  • Hetzner DNS will now be considered part of the Hetzner Cloud (hcloud) provider within Claudie #1993
    If you're using Hetzner for DNS, you will also need to use the v0.9.19 templates, as from
    this Claudie version onwards the previous templates will not work with the old Hetzner DNS solution.

  • Claudie will now deploy Longhorn with version 1.10.2 #1998

    Before upgrading to this Claudie version from v0.10.1, detach all Longhorn volumes and follow the manual checks described here: https://longhorn.io/docs/1.10.2/deploy/upgrade/#manual-checks-before-upgrade

    Additional manual steps may also be required to ensure Longhorn upgrades correctly. To see the necessary steps, look at the "Migration Requirement Before Longhorn v1.10 Upgrade" section in the Longhorn v1.10.1 release.

Bug fixes

  • Fix API endpoint changes with proxy turned on #1996

v0.10.1

23 Feb 19:25
8f11256

What's Changed

  • Exoscale template version bumped to v0.9.18

Bug fixes

  • Fixed GCP autoscaler adapter crashing when the zone field is omitted from the InputManifest. The adapter now uses aggregated list requests to query machine types across all zones #1989

v0.10.0

19 Feb 10:58
b14fa17

Most notable changes (TL;DR)

  • This version introduces a loop that periodically checks that the created infrastructure matches the specs from the InputManifest and corrects it if it drifts.
    This mechanism applies to any newly created clusters. Clusters imported from older versions of Claudie will be reconciled regularly after their first modification in the InputManifest.

  • Longhorn v1.9.2 will now be deployed for clusters built with Claudie. For existing clusters built with v0.9.16, manual steps need to be done
    before deploying Claudie v0.10.0:

    • Please read about the manual steps here
  • The Builder service has been completely removed from Claudie. It is also recommended that you delete the Builder deployment after deploying the v0.10.x versions of Claudie. Claudie now uses NATS instead of the builder to dispatch tasks among the other services.

  • The BuilderTTL field, which was internal to Claudie's task dispatching process, was completely removed in favor of a work queue. Previously, when the BuilderTTL reached 0, a new diff with the current desired state was made, even if the scheduled task did not finish. Thus, it was possible for another task to be dispatched. This is no longer possible, as the move to NATS requires an explicit acknowledgment of the task to progress the building of the cluster.

  • The identification and scheduling of tasks has been overhauled. Claudie now has an initial version of a reconciliation loop. In the v0.9.x versions of Claudie, whenever a change was detected after running kubectl apply -f <your-input-manifest>,
    Claudie stopped once building that change had either failed or succeeded, and did not continue to health check or fix the error, even if the error was simply a transient network issue. As of now, with the reconciliation loop, every
    kubectl apply -f <your-input-manifest> explicitly states the desired state of your clusters, and Claudie will try endlessly to reach that desired state. This means that, in the event of any errors, changes will be reverted and then
    reapplied, along with health checking, which helps identify potential misconfigurations or infrastructure issues. Claudie will then try to auto-repair these issues, if possible. The goal is to further improve the reconciliation loop with each release.

  • DynamoDB was removed in favor of native locking supported by newer versions of OpenTofu which ship with Claudie v0.10.x

  • Support for Exoscale
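
The difference between a TTL-based dispatch and a work queue with explicit acknowledgment, as described for the BuilderTTL removal above, can be illustrated with a small in-memory sketch. This is not the NATS API and not Claudie's code: the AckQueue class and its methods are hypothetical, and the rule that only one task may be in flight at a time is a simplifying assumption.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class AckQueue:
    """Hypothetical in-memory stand-in for a work queue with explicit acknowledgment."""

    def __init__(self) -> None:
        self._pending: Deque[Tuple[int, str]] = deque()
        self._in_flight: Dict[int, str] = {}
        self._next_id = 0

    def publish(self, task: str) -> int:
        """Enqueue a task and return its id."""
        task_id = self._next_id
        self._next_id += 1
        self._pending.append((task_id, task))
        return task_id

    def fetch(self) -> Optional[Tuple[int, str]]:
        """Hand out the next task, but only if nothing is still awaiting an ack."""
        if self._in_flight or not self._pending:
            return None
        task_id, task = self._pending.popleft()
        self._in_flight[task_id] = task
        return task_id, task

    def ack(self, task_id: int) -> None:
        """Mark a task as finished so the next one can be dispatched."""
        del self._in_flight[task_id]

    def nack(self, task_id: int) -> None:
        """Return an unfinished task to the front of the queue for redelivery."""
        self._pending.appendleft((task_id, self._in_flight.pop(task_id)))
```

In this model no timer can cause a second task to be dispatched while the first is still running; progress depends only on ack or nack being called.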

What's Changed

  • Use native state locking provided by OpenTofu instead of relying on DynamoDB #1906

  • Upgrade KubeOne to v1.12.1. Claudie now supports building the following Kubernetes versions: v1.32, v1.33, v1.34 #1913

  • Terraformer now uses a provider cache, which removes the time spent downloading the provider on a cache hit #1907

  • KubeOne is now prevented from overriding config.toml, which would collide with the NVIDIA GPU operator overrides #1916

  • Longhorn will now be deployed with the best-effort data-locality setting #1933

  • The Ansibler stage has been tweaked to take less time overall #1917

  • Genesis Cloud provider support dropped #1941

  • The zone field is now optional for dynamic nodepools defined in the Input Manifest. If omitted, Claudie will automatically distribute the nodes across zones #1947

  • Claudie will now deploy Longhorn with version 1.9.2 #1956
    Manual steps for Longhorn need to be done before
    upgrading to Claudie v0.10.0.

  • Claudie will now support GPU guest accelerator for GCP nodepools #1952
    Previously, it was not possible to communicate this information to the templates used to spawn the infrastructure. With
    the new changes, the GPU type and count will now be passed to the templates, correctly spawning a VM with the requested GPU.

     nodePools:
       dynamic:
         - name: gpu-workers
           providerSpec:
             name: gcp-provider
             region: europe-west1
             zone: europe-west1-b
           count: 1
           serverType: n1-standard-4
           image: ubuntu-2204-lts
           machineSpec:
             nvidiaGpuCount: 1              # <-- specify number of gpus.
             nvidiaGpuType: nvidia-tesla-t4 # <-- specify gpu type
    
  • Initial version of the reconciliation loop was added to Claudie #1951
    Claudie will now endlessly healthcheck and try to fix errors on identified tasks. While currently this only resolves
    basic scenarios, such as unreachable nodes, the aim is to broaden this with every release.

  • Claudie will no longer expect NGINX to be installed on existing clusters #1980

  • As part of the reconciliation loop, the current state of the infrastructure is now refreshed periodically when no tasks have been identified #1979

  • Added support for a new provider Exoscale
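
The reconciliation loop from #1951 and #1979 follows the usual observe, diff, apply pattern. The sketch below shows that pattern only. The state model (nodepool name mapped to node count), the function names, and the bounded number of rounds are assumptions made for illustration, not Claudie's internals.

```python
from typing import Callable, Dict, List

State = Dict[str, int]  # hypothetical model: nodepool name -> node count


def diff(desired: State, current: State) -> List[str]:
    """Return the nodepool names whose current count differs from the desired one."""
    names = sorted(set(desired) | set(current))
    return [n for n in names if desired.get(n, 0) != current.get(n, 0)]


def reconcile(
    desired: State,
    read_current: Callable[[], State],         # hypothetical: refreshes the observed state
    apply_change: Callable[[str, int], None],  # hypothetical: may raise RuntimeError
    max_rounds: int = 10,
) -> int:
    """Repeat observe -> diff -> apply until the states match.

    Returns the number of rounds in which changes were attempted. A
    long-running service would loop indefinitely; max_rounds only keeps
    this sketch finite.
    """
    rounds = 0
    for _ in range(max_rounds):
        drift = diff(desired, read_current())
        if not drift:
            break
        for name in drift:
            try:
                apply_change(name, desired.get(name, 0))
            except RuntimeError:
                # Leave the failed item for the next round instead of stopping.
                continue
        rounds += 1
    return rounds
```

The point of the pattern is that a failed change is not final: the next round observes the state again and retries whatever still differs.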

Bug fixes

  • Deletion process was fixed for newer versions of Kubernetes #1919

  • Deploy kubelet-csr-approver to approve kubelet server CSRs #1934

v0.9.16

26 Nov 10:55
56e059b

What's Changed

  • The OpenStack provider will now use image names instead of image IDs, because the IDs can be replaced by the provider and become invalid #1902

Bug fixes

  • Fix Cloudflare account ID propagation when updating to newer Claudie versions #1904

v0.9.15

13 Nov 09:41
97232b0

Bug fixes

  • Fixes issues with an incompatible Docker API in the Ansibler service that resulted in the error from #1885

v0.9.14

17 Oct 12:27
e0b73c1

What's Changed

  • Correctly remove taints, annotations, and labels when they are removed from a NodePool in the InputManifest #1852
  • In some cases unnecessary tasks were spawned, which prolonged the building of the cluster without having any effect. These have been removed #1856
  • Expand machine spec to contain number of GPUs #1854
    Inside the NodePool specification it is now possible to specify the number of GPUs the instance has
    which is made use of when autoscaling based on GPU workload.
    - name: autoscaled
      providerSpec:
        name: aws
        region: eu-central-1
        zone: eu-central-1a
      autoscaler:
        min: 0
        max: 20
      # GPU machine type name.
      serverType: g4dn.xlarge
      machineSpec:
        # Explicitly specify how many GPUs the instance type provides.
        nvidiaGpu: 1
      image: ami-07eef52105e8a2059
  • Add support for the OpenStack provider, with the main aim of supporting the OpenStack offering from OVH #1857
    It is now possible to use an OpenStack provider within the InputManifest.
    Support for OpenStack has been added in the v0.9.14 version of the Claudie templates.
    - name: ovh-1
      providerType: openstack
      templates:
        repository: "https://github.com/berops/claudie-config"
        tag: v0.9.14
        path: "templates/terraformer/openstack"
      secretRef:
        name: ovh-secret
        namespace: e2e-secrets

v0.9.13

13 Sep 15:04
159b095

What's Changed

  • Concurrency limits are now configurable #1838
  • Autoscaled nodepools are now limited to 256 nodes #1839
  • Metadata secret will now be updated after node deletion #1841
  • Builder TTL has been made configurable via the BUILDER_TTL environment variable, with a default value of 2 hours #1850

Bug fixes

  • Prometheus metric for currently deleted nodes has been fixed #1849

v0.9.12

19 Aug 09:12
b9efabf

What's Changed

  • Retries were added to reading the output from OpenTofu, which could occasionally fail. #1824
  • Increased concurrency limits to decrease the build time of larger clusters. This change also affects Claudie's memory requirements, which should fit within 8 GB. #1819
  • For autoscaled events, Terraformer will now skip refreshing the LoadBalancers and DNS infrastructure, if present. #1830
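
The retries added in #1824 follow a common pattern for reads that fail occasionally. A minimal sketch of that pattern is shown below. The function name, the attempt count, the delay, and the choice of OSError as the retryable error are all assumptions, not values taken from Claudie.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def read_with_retries(
    read: Callable[[], T],  # hypothetical reader, e.g. one that parses tool output
    attempts: int = 3,      # assumed attempt count
    delay: float = 1.0,     # assumed pause between attempts, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call read() up to `attempts` times, sleeping `delay` seconds between tries.

    The last failure is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return read()
        except OSError:
            if attempt == attempts:
                raise
            sleep(delay)
    raise AssertionError("unreachable")
```

Only errors that are plausibly transient should be retried this way; anything else is better surfaced immediately.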