From 443892e76eb45c620415002be203c86e3f228fb2 Mon Sep 17 00:00:00 2001 From: haseeb Date: Wed, 29 Apr 2026 21:40:08 +0530 Subject: [PATCH] docs(nautobot-worker): updates --- charts/argocd-understack/README.md | 1 - docs/deploy-guide/components/index.md | 1 - docs/deploy-guide/components/nautobot-site.md | 45 ------ .../components/nautobot-worker.md | 139 ++++++++++++++---- docs/deploy-guide/components/nautobot.md | 47 ++++-- docs/deploy-guide/site-cluster.md | 1 - docs/operator-guide/nautobot-celery-queues.md | 61 ++++---- .../nautobot-mtls-certificate-renewal.md | 34 ++++- docs/operator-guide/nautobot.md | 82 ++++++++--- mkdocs.yml | 1 - 10 files changed, 272 insertions(+), 140 deletions(-) delete mode 100644 docs/deploy-guide/components/nautobot-site.md diff --git a/charts/argocd-understack/README.md b/charts/argocd-understack/README.md index bff1999d6..9a176f77f 100644 --- a/charts/argocd-understack/README.md +++ b/charts/argocd-understack/README.md @@ -177,7 +177,6 @@ Components deployed on site clusters: | argo-workflows | `site.argo_workflows` | Workflow engine | | chrony | `site.chrony` | NTP service | | envoy-configs | `site.envoy_configs` | Gateway configs | -| nautobot-site | `site.nautobot_site` | Site Nautobot config | | openstack-exporter | `site.openstack_exporter` | Metrics exporter | | openstack-memcached | `site.openstack_memcached` | Caching | | site-workflows | `site.site_workflows` | Site workflows | diff --git a/docs/deploy-guide/components/index.md b/docs/deploy-guide/components/index.md index 9611d3e7f..fe1a02a29 100644 --- a/docs/deploy-guide/components/index.md +++ b/docs/deploy-guide/components/index.md @@ -90,7 +90,6 @@ enablement defaults, validation, and troubleshooting notes. 
| [monitoring](./monitoring.md) | global, site | | [nautobot](./nautobot.md) | global | | [nautobot-api-tokens](./nautobot-api-tokens.md) | global | -| [nautobot-site](./nautobot-site.md) | site | | [nautobotop](./nautobotop.md) | global | | [openebs](./openebs.md) | global, site | | [openstack](./openstack.md) | site | diff --git a/docs/deploy-guide/components/nautobot-site.md b/docs/deploy-guide/components/nautobot-site.md deleted file mode 100644 index b24835956..000000000 --- a/docs/deploy-guide/components/nautobot-site.md +++ /dev/null @@ -1,45 +0,0 @@ ---- -source_text: The deploy guide and chart README list `nautobot-site` as a site-scoped - component. -argocd_extra: -- The current chart does not contain `charts/argocd-understack/templates/application-nautobot-site.yaml`, - so there is not yet an ArgoCD source definition to describe here. -- Until that template exists, there is no deploy-repo values file or overlay directory - consumed by ArgoCD for this page. ---- - -# nautobot-site - -Site-level Nautobot integration resources. - -## Deployment Scope - -- Cluster scope: site -- Values key: `site.nautobot_site` -- ArgoCD Application template: `charts/argocd-understack/templates/application-nautobot-site.yaml` - -## How to Enable - -Set this component to enabled in your deployment values file: - -```yaml title="$CLUSTER_NAME/deploy.yaml" -site: - nautobot_site: - enabled: true -``` - -## How ArgoCD Builds It - -{{ component_argocd_builds() }} - -## Deployment Repo Content - -{{ secrets_disclaimer }} - -Required or commonly required items: - -- None today. Add the final Secret or manifest contract here when the `nautobot-site` Application template is implemented. - -Optional additions: - -- If you are carrying site-specific Nautobot resources out of tree, document them with the component that currently applies them rather than assuming a future `nautobot-site` Application. 
diff --git a/docs/deploy-guide/components/nautobot-worker.md b/docs/deploy-guide/components/nautobot-worker.md index f65bf4130..fc70a5188 100644 --- a/docs/deploy-guide/components/nautobot-worker.md +++ b/docs/deploy-guide/components/nautobot-worker.md @@ -45,6 +45,24 @@ site: enabled: true ``` +## Configuration Architecture + +The worker uses the same Helm chart `fileParameters` mechanism as the +global Nautobot deployment. The default config path is +`$understack/components/nautobot/nautobot_config.py`, but site +deployments can override it with `site.nautobot_worker.nautobot_config`. +A site deployment can point this value at a shared deploy-repo config: + +```yaml title="$CLUSTER_NAME/deploy.yaml" +site: + nautobot_worker: + nautobot_config: '$deploy/apps/nautobot-config/nautobot_config.py' +``` + +Use the same deploy-specific config for global Nautobot and site workers +when they must share mTLS, plugin, SSO, and production-hardening +behavior. + ## Architecture Site workers connect to the global cluster's PostgreSQL (CNPG) and Redis @@ -80,12 +98,13 @@ worker pod and the server. ## Plugin Loading -The shared `nautobot_config.py` supports a generic plugin loading -mechanism described in the +Deployment-specific plugin configuration can live in the shared deploy +`nautobot_config.py`, with credentials supplied by `nautobot-custom-env`. +Site workers use the same config as global Nautobot, so plugin changes +belong in the shared deploy config, not in worker-only values. For +details, see the [Nautobot Plugin Loading](../../operator-guide/nautobot.md#plugin-loading) -operator guide. Site workers use the same mechanism -- open-source -plugins are loaded automatically, and additional plugins can be added -via the `NAUTOBOT_EXTRA_PLUGINS` environment variable. +operator guide. ## Connection Security @@ -147,6 +166,53 @@ Both PostgreSQL (port 5432) and Redis (port 6379) use `routes.tls` entries with TLS passthrough mode. 
The gateway routes traffic based on SNI hostname without terminating TLS, preserving end-to-end mTLS. +#### Firewall Requirements + +Site workers reach the global PostgreSQL and Redis services through the +Envoy Gateway LoadBalancer address. Because these are separate +`routes.tls` listeners, HTTPS access to the Nautobot web endpoint is not +enough. The network path must allow the database and Redis listener +ports as well. + +For each site, request or configure firewall/security policy with: + +| Field | Value | +|---|---| +| Source | Site worker egress CIDRs, such as node CIDRs, pod CIDRs, or NAT ranges | +| Destination | Envoy Gateway LoadBalancer/VIP for the global cluster | +| Services | TCP/5432 and TCP/6379 | +| Protocol handling | TLS/SSL passthrough if the firewall requires an application/protocol match | +| Action | Allow | + +The Envoy config should have TLS routes similar to: + +```yaml +routes: + tls: + - name: nautobot-db + fqdn: nautobot-db..example.com + gatewayPort: 5432 + namespace: nautobot + service: + name: nautobot-cluster-rw + port: 5432 + - name: nautobot-redis + fqdn: nautobot-redis..example.com + gatewayPort: 6379 + namespace: nautobot + service: + name: nautobot-redis-master + port: 6379 +``` + +Both FQDNs must resolve to the Envoy Gateway LoadBalancer/VIP. From a +site worker pod, verify routing before debugging mTLS: + +```bash +kubectl exec -n nautobot deploy/nautobot-worker-celery-site-dc -- sh -lc \ + 'nc -vz nautobot-db..example.com 5432 && nc -vz nautobot-redis..example.com 6379' +``` + ## Certificate Infrastructure ### Global Cluster @@ -161,6 +227,12 @@ and distributed to site clusters through your external secrets provider. The CA private key never leaves the global cluster -- a compromised site cannot forge certificates for other sites. +The site worker mounts only the site-local Secret named +`nautobot-mtls-client`. ExternalSecret creates that Secret from the +secrets provider. 
The provider data should come from the per-site source +Secret issued on the global cluster, named +`nautobot-mtls-client-`. + Each site needs two credentials from the secrets provider: | Credential | Content | Scope | @@ -171,7 +243,8 @@ Each site needs two credentials from the secrets provider: The ExternalSecret on the site cluster combines these into a single `nautobot-mtls-client` secret (type `kubernetes.io/tls`) with `tls.crt`, `tls.key`, and `ca.crt`. This secret is mounted into worker pods at -`/etc/nautobot/mtls/`. +`/etc/nautobot/mtls/`. The shared `nautobot_config.py` uses those stable +file paths for both global pods and site workers. Note: if your secrets provider stores PEM data with `\r\n` line endings or concatenates multiple PEM blocks in a single field, use the @@ -195,6 +268,8 @@ Before starting, ensure the global cluster already has: - CNPG configured with `serverTLSSecret`, `serverCASecret`, `clientCASecret`, and `pg_hba` - Redis TLS enabled with `authClients: true` - Envoy Gateway TLS passthrough routes on ports 5432 and 6379 +- Firewall/security policy allowing TCP/5432 and TCP/6379 from the site + worker egress CIDRs to the Envoy Gateway LoadBalancer/VIP You also need the pre-issued client certificate stored in your external secrets provider (see Step 1). @@ -215,7 +290,7 @@ metadata: spec: secretName: nautobot-mtls-client- duration: 26280h # 3 years - renewBefore: 2160h # 90 days + renewBefore: 720h # 30 days commonName: app usages: - client auth @@ -263,15 +338,18 @@ cluster and is never extracted or distributed. ### Step 3: Create ExternalSecrets for credentials Create ExternalSecret resources that pull credentials from your secrets -provider into the `nautobot` namespace. You need five: +provider into the `nautobot` namespace. 
A deployment-specific config +that reads additional environment variables also needs +`nautobot-custom-env`: | ExternalSecret | Target Secret | Purpose | |---|---|---| | `externalsecret-nautobot-django.yaml` | `nautobot-django` | Django `SECRET_KEY` -- must match the global instance | | `externalsecret-nautobot-db.yaml` | `nautobot-db` | CNPG app user password (satisfies Helm chart requirement) | -| `externalsecret-nautobot-worker-redis.yaml` | `nautobot-redis` | Redis password | +| `externalsecret-nautobot-worker-redis.yaml` | `nautobot-worker-redis` | Redis password | | `externalsecret-dockerconfigjson-github-com.yaml` | `dockerconfigjson-github-com` | Container registry credentials | | `externalsecret-nautobot-mtls-client.yaml` | `nautobot-mtls-client` | mTLS client cert + CA cert (two credentials combined) | +| `externalsecret-nautobot-custom-env.yaml` | `nautobot-custom-env` | Deployment-specific integration credentials or runtime settings | The mTLS ExternalSecret pulls from two separate credentials in your secrets provider -- the per-site client cert+key and the shared CA @@ -338,20 +416,21 @@ resources: - externalsecret-nautobot-worker-redis.yaml - externalsecret-dockerconfigjson-github-com.yaml - externalsecret-nautobot-mtls-client.yaml + - externalsecret-nautobot-custom-env.yaml ``` ### Step 5: Create the values file -Create `values.yaml` with the site-specific overrides. Replace -`` with your environment identifier and `` with -the site's partition name. +Create `values.yaml` with the site-specific overrides. +The Celery worker name and queue are rendered by ArgoCD Application from the +`understack.rackspace.com/site` app label. 
```yaml nautobot: db: - host: "nautobot-db..undercloud.rackspace.net" + host: "nautobot-db..example.com" redis: - host: "nautobot-redis..undercloud.rackspace.net" + host: "nautobot-redis..example.com" ssl: true image: registry: "ghcr.io" @@ -365,10 +444,10 @@ celery: extraEnvVars: - name: NAUTOBOT_CONFIG value: /opt/nautobot/nautobot_config.py - - name: NAUTOBOT_EXTRA_PLUGINS - value: '' - name: NAUTOBOT_DB_SSLMODE value: verify-ca + - name: NAUTOBOT_DB_SSLNEGOTIATION + value: direct - name: NAUTOBOT_REDIS_SSL_CERT_REQS value: required - name: NAUTOBOT_REDIS_SSL_CA_CERTS @@ -377,10 +456,6 @@ celery: value: /etc/nautobot/mtls/tls.crt - name: NAUTOBOT_REDIS_SSL_KEYFILE value: /etc/nautobot/mtls/tls.key - - name: SSL_CERT_FILE - value: /etc/nautobot/mtls/ca.crt - - name: REQUESTS_CA_BUNDLE - value: /etc/nautobot/mtls/ca.crt extraVolumes: - name: mtls-certs secret: @@ -400,6 +475,7 @@ Add `nautobot_worker` to the site's `deploy.yaml`: site: nautobot_worker: enabled: true + nautobot_config: '$deploy/apps/nautobot-config/nautobot_config.py' ``` ### Step 7: Verify @@ -425,6 +501,7 @@ kubectl logs -n nautobot -l app.kubernetes.io/component=nautobot-celery --tail=5 externalsecret-nautobot-db.yaml externalsecret-nautobot-django.yaml externalsecret-nautobot-mtls-client.yaml + externalsecret-nautobot-custom-env.yaml externalsecret-nautobot-worker-redis.yaml kustomization.yaml values.yaml @@ -450,18 +527,19 @@ operator guide. 
| `NAUTOBOT_REDIS_SSL_CA_CERTS` | Site worker values | Path to CA cert for Redis | | `NAUTOBOT_REDIS_SSL_CERTFILE` | Site worker values | Path to client cert for Redis | | `NAUTOBOT_REDIS_SSL_KEYFILE` | Site worker values | Path to client key for Redis | -| `SSL_CERT_FILE` | Site worker values | System-wide CA bundle override for outbound HTTPS | -| `REQUESTS_CA_BUNDLE` | Site worker values | Python requests library CA bundle override | +| `SSL_CERT_FILE` | Optional site value | System-wide CA bundle override for outbound HTTPS | +| `REQUESTS_CA_BUNDLE` | Optional site value | Python requests library CA bundle override | | `NAUTOBOT_CONFIG` | Both global and site | Path to `nautobot_config.py` | -| `NAUTOBOT_EXTRA_PLUGINS` | Both global and site values | Comma-separated list of additional plugin module names to load (beyond the open-source defaults). Plugins are loaded only if installed in the container. | -| `NAUTOBOT_EXTRA_PLUGINS_CONFIG` | Both global and site values | JSON object with plugin configuration. Supports `${ENV_VAR}` syntax for referencing environment variables in string values (useful for secrets). Merged into `PLUGINS_CONFIG`. | | `UNDERSTACK_PARTITION` | `cluster-data` ConfigMap (patched by ArgoCD from `appLabels`) | Site partition identifier used by computed fields (e.g. device URN generation). Exposed as a Django setting. | +| `UNDERSTACK_SITE` | `cluster-data` ConfigMap (patched by ArgoCD from `appLabels`) | Site identifier available to worker pods. The same app label is used by ArgoCD to render the worker name and Celery queue. | +| Deployment-specific integration variables | `nautobot-custom-env` secret | Extra credentials and runtime settings consumed by the selected `nautobot_config.py`. | ## Design Decisions - The cert-manager CA hierarchy (self-signed bootstrap -> root CA -> - CA issuer) handles issuance and renewal on both global and site - clusters without manual intervention. 
+ CA issuer) handles issuance and renewal on the global cluster. Site + clusters receive the issued client certificate through their external + secrets provider. - CNPG's native TLS support (`serverTLSSecret`, `serverCASecret`, `clientCASecret`, `replicationTLSSecret`) integrates directly with @@ -488,7 +566,9 @@ operator guide. - The `nautobot_config.py` SSL logic is conditional on `NAUTOBOT_DB_SSLMODE`, so the same config file works for both global pods and site workers. All pods set `verify-ca` to present client - certificates for `pg_hba cert` authentication. + certificates for `pg_hba cert` authentication. If a deployment sets + `NAUTOBOT_DB_SSLNEGOTIATION=direct`, keep that setting paired with + PostgreSQL/libpq 17 or newer. - The Redis mTLS logic in `nautobot_config.py` auto-detects the CA cert file at the default mount path. If the cert volume is mounted, Redis @@ -513,6 +593,11 @@ operator guide. env var is unset, no SSL options are applied and the connection will be rejected by the `hostssl ... cert` pg_hba rule. +- **Deploy-specific configs may expect nautobot-custom-env.** If the + selected deploy config reads extra environment variables from the + `nautobot-custom-env` secret, keep the ExternalSecret in the site + worker kustomization. + - **mtls-ca-cert secret contains a private key.** cert-manager Certificate resources always produce `tls.crt`, `tls.key`, and `ca.crt`. CNPG only reads `ca.crt` from the referenced secret, so diff --git a/docs/deploy-guide/components/nautobot.md b/docs/deploy-guide/components/nautobot.md index 04ac46449..96f7866e2 100644 --- a/docs/deploy-guide/components/nautobot.md +++ b/docs/deploy-guide/components/nautobot.md @@ -36,16 +36,26 @@ global: ## Configuration Architecture -The `nautobot_config.py` file is managed in git at -`components/nautobot/nautobot_config.py` and injected into pods via the -Helm chart's `fileParameters` feature. 
ArgoCD reads the file, the Helm -chart creates a ConfigMap, and pods mount it at -`/opt/nautobot/nautobot_config.py`. The `NAUTOBOT_CONFIG` environment -variable tells Nautobot to load from that path. +The `nautobot_config.py` file is injected into pods via the Helm chart's +`fileParameters` feature. The default path is +`$understack/components/nautobot/nautobot_config.py`, but deployments can +override it with `global.nautobot.nautobot_config`. A deployment can +point this value at a shared deploy-repo config: + +```yaml title="$CLUSTER_NAME/deploy.yaml" +global: + nautobot: + nautobot_config: '$deploy/apps/nautobot-config/nautobot_config.py' +``` + +ArgoCD reads the selected file, the Helm chart creates a ConfigMap, and +pods mount it at `/opt/nautobot/nautobot_config.py`. The +`NAUTOBOT_CONFIG` environment variable tells Nautobot to load from that +path. The effective configuration is built from four layers: Nautobot defaults, -the component config, Helm chart env vars from the base values, and -deploy repo value overrides. +the selected `nautobot_config.py`, Helm chart env vars from the base +values, and deploy repo value overrides. For the full details on how `fileParameters` works, why the baked-in image config is not used, config layering, and the Helm list replacement @@ -55,8 +65,9 @@ operator guide. ## Plugin Loading -For details on how plugins are loaded, configured via environment -variables, and how to add custom plugins, see the +Deployment-specific plugin configuration can live in the shared deploy +`nautobot_config.py`, with credentials supplied by `nautobot-custom-env`. +For details, see the [Plugin Loading](../../operator-guide/nautobot.md#plugin-loading) operator guide. @@ -75,9 +86,19 @@ used by both the global Nautobot deployment and site-level workers: | `nautobot-cluster-replication` | Certificate | Streaming replication client certificate (`CN=streaming_replica`). Required so CNPG does not need the CA private key in `clientCASecret`. 
| | `nautobot-redis-server-tls` | Certificate | Redis server certificate | | `nautobot-mtls-client` | Certificate | Client certificate for global Nautobot/Celery pods (`CN=app`). Used for both PostgreSQL `pg_hba cert` auth and Redis `authClients`. | +| `nautobot-mtls-client-` | Certificate | Per-site worker client certificate (`CN=app`) issued on the global cluster and distributed to the site cluster through the secrets provider. | All resources live in the `nautobot` namespace. +Client certificate naming: + +- Global cluster `nautobot-mtls-client` is mounted directly by global + Nautobot web and Celery pods. +- Global cluster `nautobot-mtls-client-` is the source Secret for + one site. Its cert/key are copied to the secrets provider. +- Site cluster `nautobot-mtls-client` is created by ExternalSecret from + the provider data for that site and mounted by site workers. + For certificate renewal and distribution to site clusters, see the [mTLS Certificate Renewal](../../operator-guide/nautobot-mtls-certificate-renewal.md) operator guide. @@ -133,6 +154,10 @@ Both global Nautobot pods and site workers set (`CN=app`) during the TLS handshake. The `pg_hba cert` rule maps the certificate CN to the PostgreSQL user. +Deployments may also set `NAUTOBOT_DB_SSLNEGOTIATION=direct` for both +global Nautobot pods and site workers. Only use `direct` when both +PostgreSQL and libpq are 17 or newer. + ## Deployment Repo Content {{ secrets_disclaimer }} @@ -143,13 +168,13 @@ Required or commonly required items: - `nautobot-django` Secret: Provide a `NAUTOBOT_SECRET_KEY` value. - `nautobot-redis` Secret: Provide a `NAUTOBOT_REDIS_PASSWORD` value. - `nautobot-superuser` Secret: Provide `username`, `password`, `email`, and `apitoken` for the initial administrative account. +- `nautobot-custom-env` Secret: Required when using a deploy-specific config that reads additional integration credentials or runtime settings from environment variables. 
Optional additions: - `nautobot-sso` Secret: Provide `client-id`, `client-secret`, and `issuer` when Nautobot authenticates through an external identity provider. - `aws-s3-backup` Secret: Provide `access-key-id` and `secret-access-key` when scheduled backups are pushed to object storage. - `dockerconfigjson-github-com` Secret: Provide `.dockerconfigjson` if Nautobot images or plugins come from a private registry. -- `nautobot-custom-env` Secret: Add any extra environment variables the deployment should inject into Nautobot, such as integration credentials or DSNs. - `Database cluster and backup manifests`: Add a CloudNativePG cluster, backup schedule, or similar database resources if this deployment owns its own PostgreSQL cluster. - `Catalog and bootstrap content`: Add app definitions, device types, location types, locations, rack groups, or racks if you want Nautobot preloaded with inventory metadata. diff --git a/docs/deploy-guide/site-cluster.md b/docs/deploy-guide/site-cluster.md index 51ece33fd..65c23aa59 100644 --- a/docs/deploy-guide/site-cluster.md +++ b/docs/deploy-guide/site-cluster.md @@ -383,7 +383,6 @@ Enable these as needed for your deployment. 
| Envoy configs | `site.envoy_configs` | Gateway routes and policies | | etcd backup | `site.etcdbackup` | etcd backup | | Monitoring | `site.monitoring` | Prometheus/Grafana | -| Nautobot site | `site.nautobot_site` | Site-specific Nautobot resources | | OpenEBS | `site.openebs` | Storage (if using OpenEBS) | | OpenStack exporter | `site.openstack_exporter` | Prometheus metrics for OpenStack | | OpenStack Resource Controller | `site.openstack_resource_controller` | OpenStack resource operator | diff --git a/docs/operator-guide/nautobot-celery-queues.md b/docs/operator-guide/nautobot-celery-queues.md index 18ced9edb..c42c26db2 100644 --- a/docs/operator-guide/nautobot-celery-queues.md +++ b/docs/operator-guide/nautobot-celery-queues.md @@ -2,20 +2,20 @@ This guide covers how Celery task queues work in the understack nautobot-worker deployment, how the queue name is derived from the -site partition, and how to route jobs to site-specific queues +site name, and how to route jobs to site-specific queues programmatically. ## How the Queue Name is Set The ArgoCD Application template for `nautobot-worker` automatically -sets the Celery queue name to match the site's partition label -(`understack.rackspace.com/partition`). The relevant section in +sets the Celery queue name to match the site label +(`understack.rackspace.com/site`). 
The relevant section in `application-nautobot-worker.yaml`: {% raw %} ```yaml -{{- with index $.Values.appLabels "understack.rackspace.com/partition" }} +{{- with index $.Values.appLabels "understack.rackspace.com/site" }} values: | workers: default: @@ -28,20 +28,23 @@ values: | {% endraw %} -For a site with partition `rax-dev`, this renders as: +For a site label `site-dc`, this renders as: ```yaml workers: default: enabled: false - rax-dev: + site-dc: enabled: true - taskQueues: "rax-dev" + taskQueues: "site-dc" ``` -This produces a Deployment named `nautobot-worker-celery-rax-dev` with -the label `app.kubernetes.io/component: nautobot-celery-rax-dev` and -the environment variable `CELERY_TASK_QUEUES=rax-dev`. +This produces a Deployment named `nautobot-worker-celery-site-dc` with +the label `app.kubernetes.io/component: nautobot-celery-site-dc` and +the environment variable `CELERY_TASK_QUEUES=site-dc`. + +The queue name comes from the ArgoCD Application label +`understack.rackspace.com/site`. ### Why workers.default must be disabled @@ -62,7 +65,7 @@ request with a validation error. 
Navigate to Jobs > Job Queues > Add and create a queue with: -- Name: `rax-dev` (must match the worker's `taskQueues` value) +- Name: `site-dc` (must match the worker's `taskQueues` value) - Queue Type: `celery` ### Create via the REST API @@ -72,7 +75,7 @@ curl -X POST \ -H "Authorization: Token $TOKEN" \ -H "Content-Type: application/json" \ https://nautobot.example.com/api/extras/job-queues/ \ - --data '{"name": "rax-dev", "queue_type": "celery"}' + --data '{"name": "site-dc", "queue_type": "celery"}' ``` ### Create via pynautobot @@ -81,18 +84,18 @@ curl -X POST \ import pynautobot nb = pynautobot.api("https://nautobot.example.com", token="your-token") -nb.extras.job_queues.create(name="rax-dev", queue_type="celery") +nb.extras.job_queues.create(name="site-dc", queue_type="celery") ``` ### Automate via Ansible -The `ansible/roles/jobs/tasks/main.yml` role enables Rackspace jobs -but does not currently create JobQueues. You can extend it: +The `ansible/roles/jobs/tasks/main.yml` role enables jobs but does not +currently create JobQueues. You can extend it: {% raw %} ```yaml -- name: "Ensure partition JobQueue exists" +- name: "Ensure site JobQueue exists" ansible.builtin.uri: url: "{{ nautobot_url }}/api/extras/job-queues/" method: POST @@ -100,7 +103,7 @@ but does not currently create JobQueues. 
You can extend it: Authorization: "Token {{ nautobot_token }}" body_format: json body: - name: "{{ partition }}" + name: "{{ site }}" queue_type: "celery" status_code: [200, 201, 400] ``` @@ -123,7 +126,7 @@ from nautobot.apps.jobs import Job class SyncSiteConfig(Job): class Meta: name = "Sync Site Config" - task_queues = ["rax-dev", "default"] + task_queues = ["site-dc", "default"] ``` ### Option 2: Via the Nautobot UI @@ -141,7 +144,7 @@ curl -X PATCH \ -H "Content-Type: application/json" \ https://nautobot.example.com/api/extras/jobs/$JOB_ID/ \ --data '{ - "job_queues": [{"name": "rax-dev"}, {"name": "default"}], + "job_queues": [{"name": "site-dc"}, {"name": "default"}], "job_queues_override": true }' ``` @@ -157,8 +160,8 @@ nb = pynautobot.api("https://nautobot.example.com", token="your-token") job = nb.extras.jobs.get(name="my_app.jobs.SyncSiteConfig") -# Run on the rax-dev site worker -result = job.run(data={"device": "server-01"}, task_queue="rax-dev") +# Run on the site worker +result = job.run(data={"device": "server-01"}, task_queue="site-dc") ``` The `task_queue` parameter (or `job_queue` -- both are accepted in @@ -174,7 +177,7 @@ curl -X POST \ https://nautobot.example.com/api/extras/jobs/$JOB_ID/run/ \ --data '{ "data": {"device": "server-01"}, - "task_queue": "rax-dev" + "task_queue": "site-dc" }' ``` @@ -196,7 +199,7 @@ Nautobot validates two things before accepting a job run request: 1. The requested queue must be in the job's allowed queues list. If not, the API returns: - `{"task_queue": ["\"rax-dev\" is not a valid choice."]}` + `{"task_queue": ["\"site-dc\" is not a valid choice."]}` 2. At least one Celery worker must be actively listening on the requested queue. 
If no worker is found, the API returns a @@ -209,20 +212,20 @@ To confirm a site worker is consuming from the correct queue: ```bash # Check the CELERY_TASK_QUEUES env var in the running pod -kubectl -n nautobot get deploy nautobot-worker-celery-rax-dev \ +kubectl -n nautobot get deploy nautobot-worker-celery-site-dc \ -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="CELERY_TASK_QUEUES")].value}' # Check worker logs for the queue binding kubectl logs -n nautobot \ - -l app.kubernetes.io/component=nautobot-celery-rax-dev \ + -l app.kubernetes.io/component=nautobot-celery-site-dc \ --tail=20 | grep "ready" ``` ## Multiple Sites -Each site gets its own queue named after its partition. For example: +Each site gets its own queue named after its site label. For example: -| Site | Partition | Queue Name | Deployment | +| Site | Site Label | Queue Name | Deployment | |---|---|---|---| | DC1 Staging | dc1-staging | dc1-staging | nautobot-worker-celery-dc1-staging | | DC1 Prod | dc1-prod | dc1-prod | nautobot-worker-celery-dc1-prod | @@ -247,12 +250,12 @@ The job does not have the requested queue in its allowed queues. Either: No worker is listening on the requested queue. Check: - The site's nautobot-worker ArgoCD Application is synced and healthy -- The worker pod is running: `kubectl get pods -n nautobot -l app.kubernetes.io/component=nautobot-celery-` +- The worker pod is running: `kubectl get pods -n nautobot -l app.kubernetes.io/component=nautobot-celery-` - The `CELERY_TASK_QUEUES` env var matches the queue name ### Job runs but nothing happens The job was dispatched to a queue that no worker is consuming. This can happen if `task_queue` was not specified and the job defaulted to -`"default"`, but the site worker is listening on `"rax-dev"`. Always +`"default"`, but the site worker is listening on `"site-dc"`. Always pass `task_queue` explicitly when targeting a site worker. 
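The naming rules described in the queue guide can be summarised as a small sketch. The helper `worker_names` is hypothetical and not part of understack; in the real deployment these names are rendered by the ArgoCD Application template from the `understack.rackspace.com/site` label. The formats below are copied from the rendered `site-dc` example.

```python
def worker_names(site: str) -> dict:
    """Derive the names the guide lists for one site label value.

    Hypothetical helper for illustration only: understack renders these
    names in the ArgoCD Application template, not in Python.
    """
    return {
        # The Celery queue is named after the site label value.
        "queue": site,
        # Deployment name shown in the guide for the site worker.
        "deployment": f"nautobot-worker-celery-{site}",
        # Value of the app.kubernetes.io/component label on the worker.
        "component_label": f"nautobot-celery-{site}",
        # Environment variable the worker pod is expected to carry.
        "env": {"CELERY_TASK_QUEUES": site},
    }


names = worker_names("site-dc")
print(names["deployment"])
```

A helper like this may be handy when scripting the verification commands, since the Deployment name, the label selector, and the `task_queue` argument all follow from the same site label value.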
diff --git a/docs/operator-guide/nautobot-mtls-certificate-renewal.md b/docs/operator-guide/nautobot-mtls-certificate-renewal.md index 8ed9d37df..a828609f4 100644 --- a/docs/operator-guide/nautobot-mtls-certificate-renewal.md +++ b/docs/operator-guide/nautobot-mtls-certificate-renewal.md @@ -9,8 +9,18 @@ see the [nautobot-worker deploy guide](../deploy-guide/components/nautobot-worke ## How Certificates Are Issued Client certificates are issued by cert-manager on the global cluster -using the `mtls-ca-issuer` (backed by a self-signed root CA). Each site -gets its own Certificate resource: +using the `mtls-ca-issuer` (backed by a self-signed root CA). + +There are three relevant Secret names: + +- Global cluster `nautobot-mtls-client`: mounted by global Nautobot web + and Celery pods. +- Global cluster `nautobot-mtls-client-`: source Secret for one + site. Its cert/key are copied to the secrets provider. +- Site cluster `nautobot-mtls-client`: created by ExternalSecret from + the provider data for that site and mounted by the site worker. + +Each site gets its own per-site Certificate resource: ```yaml apiVersion: cert-manager.io/v1 @@ -36,6 +46,17 @@ spec: cert-manager automatically renews the certificate 30 days before expiry, updating the Kubernetes secret on the global cluster. +For example, the global cluster issues `nautobot-mtls-client-` +from its `nautobot/` overlay. Operators or automation copy that Secret's +cert/key into the secrets provider for the site. The site worker does +not mount the global Secret directly; it mounts the site-local +`nautobot-mtls-client` Secret created by ExternalSecret. + +Global pods and site workers both mount +`/etc/nautobot/mtls/tls.crt`, `/etc/nautobot/mtls/tls.key`, and +`/etc/nautobot/mtls/ca.crt`, but the Kubernetes Secret behind those +paths is local to each cluster. 
+ The global cluster also has: - `nautobot-mtls-client` -- client cert for global Nautobot/Celery pods @@ -56,6 +77,11 @@ ExternalSecret picks it up on its next refresh cycle. By default, this is a manual process: an operator extracts the renewed cert from the global cluster and uploads it to the secrets provider. +Site workers consume the provider data through an ExternalSecret named +`nautobot-mtls-client`. That ExternalSecret combines the per-site client +cert/key credential and the shared CA credential into one +`kubernetes.io/tls` secret with `tls.crt`, `tls.key`, and `ca.crt` using +`filterPEM`, then refreshes on the configured interval. ## Automation Approaches @@ -128,4 +154,6 @@ To recover, manually extract the renewed cert from the global cluster and upload it to your secrets provider. The site ExternalSecret will pick it up on the next refresh cycle, and the worker pods will automatically get the new cert on their next restart (or when the -secret volume is refreshed by kubelet). +secret volume is refreshed by kubelet). If the worker does not recover +after the ExternalSecret reports synced, restart the site worker +Deployment so the Celery process reopens the mounted certificate files. diff --git a/docs/operator-guide/nautobot.md b/docs/operator-guide/nautobot.md index 61d6a131e..2666358a9 100644 --- a/docs/operator-guide/nautobot.md +++ b/docs/operator-guide/nautobot.md @@ -25,6 +25,20 @@ pods automatically. 
| `nautobot-cluster-replication` | Certificate | CN=`streaming_replica`, usage=`client auth`, 1yr | CNPG streaming replication | | `nautobot-redis-server-tls` | Certificate | CN=`nautobot-redis-master.nautobot.svc`, 1yr | Redis server TLS | | `nautobot-mtls-client` | Certificate | CN=`app`, usage=`client auth`, 3yr | Client cert for nautobot/celery pods | +| `nautobot-mtls-client-` | Certificate | CN=`app`, usage=`client auth`, 3yr | Per-site client cert issued on the global cluster and copied to the site cluster through the secrets provider | + +### Client Certificate Names + +There are three relevant Secret names: + +- Global cluster `nautobot-mtls-client`: generated by the global + Certificate and mounted by global Nautobot web and Celery pods. +- Global cluster `nautobot-mtls-client-`: generated by a per-site + Certificate. Workloads do not mount this Secret directly; its cert/key + are copied to the external secrets provider for that site. +- Site cluster `nautobot-mtls-client`: generated by ExternalSecret from + the provider data for that site. This is the Secret mounted by + site-level worker pods at `/etc/nautobot/mtls/`. The server certificates (`nautobot-cluster-server-tls` and `nautobot-redis-server-tls`) include site-specific dnsNames that vary @@ -254,9 +268,10 @@ for site workers: !!! important "All config changes go in the deploy repo" The public `nautobot_config.py` at `$understack/components/nautobot/nautobot_config.py` is intentionally kept as simple and generic as possible for open-source consumers. - It does **not** contain mTLS, plugin loading, UNDERSTACK_PARTITION, or any extra - plugins mechanism. **All deployment-specific Nautobot configuration changes MUST be - made in the shared deploy config** at `$deploy/apps/nautobot-config/nautobot_config.py`. + It does **not** contain mTLS, plugin loading, UNDERSTACK_PARTITION, + UNDERSTACK_SITE, or any extra plugins mechanism. 
**All deployment-specific + Nautobot configuration changes MUST be made in the shared deploy config** at + `$deploy/apps/nautobot-config/nautobot_config.py`. Do not modify the public config for private deployment needs. Nautobot requires a `nautobot_config.py` file that defines Django @@ -270,30 +285,47 @@ for non-private deployments. ### How fileParameters Works Both the `nautobot` and `nautobot-worker` ArgoCD Applications use a -multi-source setup. The Helm chart source includes: +multi-source setup. The Helm chart source includes a configurable +`fileParameters` entry: ```yaml helm: fileParameters: - name: nautobot.config - path: $understack/components/nautobot/nautobot_config.py + path: ``` -ArgoCD reads the file content from the understack git repo and passes -it as the `nautobot.config` Helm value. The Nautobot Helm chart then -creates a ConfigMap from that content and mounts it into pods at -`/opt/nautobot/nautobot_config.py`. The `NAUTOBOT_CONFIG` environment -variable (set in the deploy repo values) tells Nautobot to load its -configuration from that path. +By default, both values point at +`$understack/components/nautobot/nautobot_config.py`. Private +deployments can override them to a deploy-repo file. The current +site and global deployments can set: + +```yaml +global: + nautobot: + nautobot_config: '$deploy/apps/nautobot-config/nautobot_config.py' + +site: + nautobot_worker: + nautobot_config: '$deploy/apps/nautobot-config/nautobot_config.py' +``` + +ArgoCD reads the selected file content from either the understack or +deploy repo and passes it as the `nautobot.config` Helm value. The +Nautobot Helm chart then creates a ConfigMap from that content and +mounts it into pods at `/opt/nautobot/nautobot_config.py`. The +`NAUTOBOT_CONFIG` environment variable (set in the deploy repo values) +tells Nautobot to load its configuration from that path. 
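The selection rule above can be sketched in Python. This is an illustration only, not how the chart implements it: the helper `selected_config_path` is hypothetical, and it assumes the values layout from the YAML example above, with the public file as the fallback when no override is set.

```python
# Default taken from the text above; used when no override is set.
DEFAULT_CONFIG = "$understack/components/nautobot/nautobot_config.py"


def selected_config_path(values: dict, scope: str, component: str) -> str:
    """Return the nautobot_config override for scope/component, else the default.

    Hypothetical helper mirroring the values layout shown above.
    """
    section = (values.get(scope) or {}).get(component) or {}
    return section.get("nautobot_config") or DEFAULT_CONFIG


# Example values: only the site worker overrides the config path.
values = {
    "site": {
        "nautobot_worker": {
            "nautobot_config": "$deploy/apps/nautobot-config/nautobot_config.py"
        }
    }
}

site_path = selected_config_path(values, "site", "nautobot_worker")
global_path = selected_config_path(values, "global", "nautobot")
```

In this sketch `site_path` resolves to the deploy-repo file and `global_path` falls back to the public default, which is the mismatch to avoid when both deployments need identical behavior.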
This approach means: -- The config file is version-controlled in git alongside the component - it configures +- The config file is version-controlled in the selected ArgoCD source + (`$understack` or `$deploy`) - Changes to the config trigger ArgoCD syncs and pod restarts automatically (the Helm chart checksums the ConfigMap) -- The same config file is shared by both the global nautobot deployment - and site-level workers, avoiding drift +- Global Nautobot and site-level workers can share the same private + config file when they need identical mTLS, plugin, and hardening + behavior ### Why Not Use the Baked-In Config? @@ -325,12 +357,13 @@ The effective configuration is built from multiple layers: 1. **Nautobot defaults** -- `from nautobot.core.settings import *` provides all default Django and Nautobot settings -2. **Deploy config** -- `$deploy/apps/nautobot-config/nautobot_config.py` - (for private deployments) overrides defaults with all deployment-specific - settings: mTLS, plugin loading, SSO, Sentry, production hardening, - UNDERSTACK_PARTITION, and verbose logging. For non-private deployments, - the public `components/nautobot/nautobot_config.py` provides a minimal - default with SSO and basic settings only. +2. **Selected config file** -- either the public + `$understack/components/nautobot/nautobot_config.py` default or a + private deploy config such as + `$deploy/apps/nautobot-config/nautobot_config.py`. A + deployment-specific config can contain settings such as PostgreSQL + mTLS, Redis mTLS, SSO, production hardening, `UNDERSTACK_PARTITION`, + `UNDERSTACK_SITE`, plugin configuration, and logging. 3. **Helm chart env vars** -- the base `components/nautobot/values.yaml` sets database, Redis, and other connection parameters as environment variables that the config reads via `os.getenv()` @@ -374,6 +407,13 @@ preserve. has a static `PLUGINS_CONFIG` entry for `vni_custom_model`. Do not add plugins or plugin config to the public config. 
+Deployment-specific plugin credentials and integration settings can be +injected through the `nautobot-custom-env` secret, which is referenced by +both the global Nautobot values and the site `nautobot-worker` component +via `extraEnvVarsSecret`. Keep secret names and environment variable +names generic in public docs; document provider-specific mappings in the +deploy repo that owns those secrets. + ## Nautobot Django shell You can access the Nautobot Django shell by connecting to the pod and running the diff --git a/mkdocs.yml b/mkdocs.yml index c791d971a..5d52b5bad 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -188,7 +188,6 @@ nav: - deploy-guide/components/mariadb-operator.md - deploy-guide/components/monitoring.md - deploy-guide/components/nautobot-api-tokens.md - - deploy-guide/components/nautobot-site.md - deploy-guide/components/nautobot.md - deploy-guide/components/nautobotop.md - deploy-guide/components/nautobot-worker.md