Verify it's working with vtysh -c 'show bgp ipv4' ; vtysh -c 'show bgp ipv6'.
! -*- bgp -*-
!
frr defaults traditional
!
router bgp 64513
bgp router-id 192.168.5.1
no bgp ebgp-requires-policy
maximum-paths 9
neighbor k8s-v4 peer-group
neighbor k8s-v4 remote-as 64514
bgp listen range 192.168.20.0/24 peer-group k8s-v4
address-family ipv4 unicast
neighbor k8s-v4 next-hop-self
neighbor k8s-v4 soft-reconfiguration inbound
exit-address-family
!
neighbor k8s-v6 peer-group
neighbor k8s-v6 remote-as 64514
bgp listen range ${ipv6_prefix}::/64 peer-group k8s-v6
address-family ipv6 unicast
neighbor k8s-v6 activate
neighbor k8s-v6 next-hop-self
neighbor k8s-v6 soft-reconfiguration inbound
neighbor k8s-v6 maximum-prefix 256
exit-address-family
!
route-map ALLOW-ALL permit 10
!
line vty
!
SSH into router, then run:
Note
This installs the on boot script, and then sets up the healthchecks.io ping.
You can leave the 05- and 06- scripts that are installed by default if you use them, I don't.
curl -fsL "https://raw.githubusercontent.com/unifi-utilities/unifios-utilities/HEAD/on-boot-script-2.x/remote_install.sh" | /bin/bash
rm /data/on_boot.d/05-install-cni-plugins.sh
rm /data/on_boot.d/06-cni-bridge.sh
URL=https://hc-ping.com/example-uuid
cat > /data/on_boot.d/20-healthchecksio.sh << EOF
#!/bin/sh
echo '* * * * * root curl -X POST $URL' > /etc/cron.d/healthchecksio
EOF
chmod a+x /data/on_boot.d/20-healthchecksio.sh
/data/on_boot.d/20-healthchecksio.shThere might be a situation where you want to destroy your Kubernetes cluster. The following command will reset your nodes back to maintenance mode, append --force to completely format your the Talos installation. Either way the nodes should reboot after the command has successfully ran.
task talos:reset # --force-
Install Talos:
[!NOTE] It might take a while for the cluster to be setup (10+ minutes is normal). During which time you will see a variety of error messages like: "couldn't get current server API group list," "error: no matching resources found", etc. 'Ready' will remain "False" as no CNI is deployed yet. This is a normal. If this step gets interrupted, e.g. by pressing
Ctrl+C, you likely will need to reset the cluster before trying againtask talos:generate-config task bootstrap:talos
-
Push your changes to git:
git add -A git commit -m "chore: add talhelper encrypted secret :lock:" git push -
Install cilium, coredns, cert-manager, external-secrets, flux and sync the cluster to the repository state:
task bootstrap:apps
-
Watch the rollout of your cluster happen:
watch kubectl get pods --all-namespaces
By default Flux will periodically check your git repository for changes. In order to have Flux reconcile on git push you must configure Github to send push events to Flux.
-
Obtain the webhook path:
[!NOTE] Hook id and path should look like
/hook/12ebd1e363c641dc3c2e430ecf3cee2b3c7a5ac9e1234506f6f5f3ce1230e123kubectl -n flux-system get receiver github-receiver -o jsonpath='{.status.webhookPath}' -
Piece together the full URL with the webhook path appended:
https://flux-webhook.${cloudflare.domain}/hook/12ebd1e363c641dc3c2e430ecf3cee2b3c7a5ac9e1234506f6f5f3ce1230e123 -
Navigate to the settings of your repository on Github, under "Settings/Webhooks" press the "Add webhook" button. Fill in the webhook URL and your
${github.webhook_token}secret from the secret, Content type:application/json, Events: Choose Just the push event, and save.
Important
Ensure you have updated talconfig.yaml and any patches with your updated configuration. In some cases you not only need to apply the configuration but also upgrade talos to apply new configuration.
# (Re)generate the Talos config
task talos:generate-config
# Apply the config to the node
task talos:apply-node IP=? MODE=?
# e.g. task talos:apply-node IP=10.10.10.10 MODE=autoImportant
Ensure the talosVersion and kubernetesVersion in talconfig.yaml are up-to-date with the version you wish to upgrade to.
The system-upgrade-controller should handle this now, just update the versions in its ks.yaml.
It is however a good idea to manually run apply-cluster afterward so the in cluster config update.
# Upgrade node to a newer Talos version
task talos:upgrade-node IP=?
# e.g. task talos:upgrade-node IP=10.10.10.10# Upgrade cluster to a newer Kubernetes version
task talos:upgrade-k8s
# e.g. task talos:upgrade-k8sBelow is a general guide on trying to debug an issue with an resource or application. For example, if a workload/resource is not showing up or a pod has started but in a CrashLoopBackOff or Pending state. Most of these steps do not include a way to fix the problem as the problem could be one of many different things.
-
Verify the Git Repository is up-to-date and in a ready state.
flux get sources git -A
Force Flux to sync your repository to your cluster:
flux -n flux-system reconcile ks flux-system --with-source
-
Verify all the Flux kustomizations are up-to-date and in a ready state.
flux get ks -A
-
Verify all the Flux helm releases are up-to-date and in a ready state.
flux get hr -A kubectl get hr -A kubectl describe hr -n namespace release-name # look at the bottom, for the recent helm logs -
Do you see the pod of the workload you are debugging?
kubectl -n <namespace> get pods -o wide
-
Check the logs of the pod if its there.
kubectl -n <namespace> logs <pod-name> -f
-
If a resource exists try to describe it to see what problems it might have.
kubectl -n <namespace> describe <resource> <name>
-
Check the namespace events
kubectl -n <namespace> get events --sort-by='.metadata.creationTimestamp'
Resolving problems that you have could take some tweaking of your YAML manifests in order to get things working, other times it could be a external factor like permissions on a NFS server. If you are unable to figure out your problem see the support sections below.