This project provides an Infrastructure as Code (IaC) solution to deploy an Azure Kubernetes Service (AKS) cluster, API Management (APIM) instance, and a demo API using Azure Service Operator and GitOps with Flux. The setup is designed to follow Azure best practices for security, scalability, and automation.
- AKS Cluster: Deploys an AKS cluster with managed identities, autoscaling, and Azure CNI networking.
- Azure Service Operator: Manages Azure resources declaratively within the AKS cluster.
- API Management: Creates an APIM instance and a demo API.
- GitOps with Flux: Synchronizes configurations from a GitHub repository.
- An active Azure subscription.
- Azure CLI installed on your local machine.
- Git installed on your local machine.
- A GitHub account to host the repository.
I created this project to explore the Azure Service Operator (ASO) and see what the use cases are for implementing this tool, whilst also keeping in mind the best practices listed below.
| Best Practice | Status | Notes |
|---|---|---|
| Least Privilege | ✅ | Bicep itself doesn't assign excessive permissions, but make sure the deployment identity has only the required roles (e.g., Contributor on the resource group, Kubernetes Cluster - Azure Arc Onboarding for Flux). |
| Idempotence | ✅ | Bicep is declarative. Redeploying will not create duplicates or errors on existing resources (as long as resource names match). |
| Identity | ✅ | Use of system-assigned managed identity for AKS and extensions |
| GitOps | ✅ | Using Flux v2 extension natively for GitOps |
| Security | ✅ | SSH key authentication (no passwords) |
| Resource Registration | ✅ | CLI script includes --wait to handle async provider registration |
| Modular Parameters | ✅ | Use of parameters for Git repo, SSH key, and paths |
| Flux Kustomization | ✅ | GitOps configured with prune, sync intervals, and scoped paths |
| Auto-upgrades | ✅ | Flux extension uses autoUpgradeMinorVersion: true |
| Config drift | ✅ | Bicep, ASO, and Flux together ensure there is no configuration drift between the config and the actual deployment; you can even set the Azure resources to read-only |
The example used is NOT production ready; there are some things you will need to consider, which are explained below. The example is purely a Proof of Concept to address my motivation, but I will give some tips on how to make it production ready!
ASO also advises against running its controllers on a free-tier AKS cluster.
For networking, this example uses the defaults that Microsoft provides, but for security reasons such as pod-level isolation you should consider using Azure Network Policies or Calico.
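As a minimal sketch of how to enable a network policy engine (the resource names are placeholders, and note that the network policy must be chosen at cluster creation time):

```shell
# Create an AKS cluster with Azure CNI and Azure Network Policy enabled.
# Swap "--network-policy azure" for "--network-policy calico" to use Calico.
az aks create \
  --resource-group <resource_group_name> \
  --name <aksName> \
  --network-plugin azure \
  --network-policy azure
```

With a policy engine in place, you can apply Kubernetes `NetworkPolicy` manifests to isolate pods from each other.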
For production workloads you should use private AKS clusters; this prevents exposing the Kubernetes API server publicly.
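A sketch of creating a private cluster (placeholder names; a public cluster generally cannot be converted, so this is a create-time decision):

```shell
# The API server gets a private endpoint inside the cluster's VNet,
# so you will need VPN/ExpressRoute/jumpbox access to reach it.
az aks create \
  --resource-group <resource_group_name> \
  --name <aksName> \
  --enable-private-cluster
```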
You should create your own Azure Container Registry (ACR) instead of pulling images directly from the source, so that you can proxy/mirror those images and Helm charts and not get rate limited.
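A sketch of setting up a registry and mirroring a public image into it (registry name and image reference are illustrative):

```shell
# Create a registry in the resource group
az acr create --resource-group <resource_group_name> --name <acrName> --sku Standard

# Mirror a public image into your own registry (source reference is an example)
az acr import --name <acrName> --source ghcr.io/<org>/<image>:<tag>

# Attach the registry so AKS can pull from it via its managed identity
az aks update --resource-group <resource_group_name> --name <aksName> --attach-acr <acrName>
```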
For security and audit reasons you should always disable local admin accounts on your Kubernetes cluster and enforce AAD-based authentication.
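A sketch of applying this to an existing cluster (placeholder names):

```shell
# Enable AAD integration and remove the local admin credentials;
# after this, "az aks get-credentials --admin" no longer works and
# all access goes through AAD identities.
az aks update \
  --resource-group <resource_group_name> \
  --name <aksName> \
  --enable-aad \
  --disable-local-accounts
```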
You should add a Key Vault to store your Flux secrets and potentially other passwords or credentials, so that they are not available in your source code.
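As a sketch, assuming you store the Flux SSH deploy key (vault and secret names are placeholders):

```shell
# Create a vault using Azure RBAC for data-plane authorization
az keyvault create \
  --resource-group <resource_group_name> \
  --name <kvName> \
  --enable-rbac-authorization

# Store the Flux Git SSH private key as a secret instead of committing it
az keyvault secret set \
  --vault-name <kvName> \
  --name flux-git-ssh-key \
  --file <path-to-private-key>
```

Your pipeline can then read the secret at deploy time rather than keeping it in the repository.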
To prevent accidental deletion you should consider adding resource locks to all resources, or at least the ones that are stateful or cannot be offline for even a short period of time.
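A minimal sketch of a delete lock on the resource group (the lock name is illustrative):

```shell
# CanNotDelete still allows reads and updates, but blocks deletion
# until the lock itself is removed
az lock create \
  --name prevent-delete \
  --resource-group <resource_group_name> \
  --lock-type CanNotDelete
```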
Based on your requirements and SLA needs, you should spread nodes across 3 availability zones to increase resilience.
You can do this by setting the following value in the agentPoolProfile:

```bicep
availabilityZones: [
  '1', '2', '3'
]
```

Enable Azure Monitor for containers for observability.
Consider integrating Azure Policy for Kubernetes.
Set up alerts for Flux sync failures or AKS node health.
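As a sketch of enabling the observability side on an existing cluster (placeholder names):

```shell
# Enable the Azure Monitor for containers addon (Container Insights);
# alerts on node health or Flux sync failures can then be configured
# on top of the collected metrics and logs in Azure Monitor
az aks enable-addons \
  --resource-group <resource_group_name> \
  --name <aksName> \
  --addons monitoring
```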
Implement a backup strategy using a tool like Velero to back up cluster state and persistent volumes.
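A sketch of what a Velero setup on Azure could look like (the plugin version, container name, credentials file, and schedule are all illustrative assumptions; see the Velero Azure plugin docs for the exact prerequisites):

```shell
# Install Velero into the cluster with the Azure object-store plugin
velero install \
  --provider azure \
  --plugins velero/velero-plugin-for-microsoft-azure:<version> \
  --bucket <blob-container> \
  --secret-file ./credentials-velero \
  --backup-location-config resourceGroup=<resource_group_name>,storageAccount=<storage_account>

# Example: schedule a nightly backup at 02:00
velero schedule create daily-backup --schedule "0 2 * * *"
```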
Add autoscaling to your cluster so that you do not overspend on resources that are not being utilized.
You can do this by setting the following values in the agentPoolProfile:

```bicep
enableAutoScaling: true
minCount: 1
maxCount: 3
```

For production environments it is always good to keep at least 2/3 of your environment up and running while upgrades are being done.
You can do this by setting the following value in the agentPoolProfile:

```bicep
upgradeSettings: {
  maxSurge: '33%'
}
```

You can automatically upgrade your AKS cluster and node pools when a new version is released (LTS or non-LTS).
You can do this by setting the following value in the properties:

```bicep
autoUpgradeProfile: {
  upgradeChannel: 'stable'
}
```

```shell
git clone <repository-url>
cd azure-aks-aso
```

Ensure you are working in the correct Azure subscription:
```shell
az account set --subscription <subscription_id_or_name>
```

Create a Service Principal to manage Azure resources:

```shell
az ad sp create-for-rbac --name <your-service-principal-name> --role Contributor --scopes /subscriptions/<your-subscription-id> --sdk-auth
```

Copy the entire JSON output, which includes the Client ID, Client Secret, and Tenant ID.
If you use GitHub like me, then you need to save the output in GitHub:
- Go to your GitHub repository (the-stratbook).
- Navigate to Settings > Secrets and variables > Actions.
- Click New repository secret.
- Name the secret AZURE_CREDENTIALS.
- Paste the entire JSON output from the Azure CLI command into the Secret value box.
- Click Add secret.
To deploy the infrastructure you can use two methods: either use deploy.yaml and run it from GitHub, or run it by hand using the instructions below.
Push the repository to GitHub and configure the GitHub Actions workflow:
```shell
git remote add origin <github-repo-url>
git push -u origin main
```

Run the following commands to deploy the infrastructure:
```shell
az deployment group create \
  --resource-group <resource_group_name> \
  --template-file infra/main.bicep
```

The GitHub Actions workflow will automatically deploy the infrastructure and synchronize configurations. Monitor the workflow in the GitHub Actions tab of your repository.
```
infra/
  main.bicep                      # Bicep template for AKS
manifests/
  apim/
    apim-instance.yaml            # APIM instance manifest
    demo-api.yaml                 # Demo API manifest
  operator/
    azure-service-operator.yaml   # Azure Service Operator manifest
```
- Replace placeholders like `<spName>`, `<client_id>`, `<client_secret>`, `<tenant_id>`, `<resource_group_name>`, and `<github-repo-url>` with your actual values.
- Ensure you have the necessary permissions to create and manage Azure resources.
- Use `az account show` to verify the active subscription.
- Check the GitHub Actions logs for deployment errors.
- Use `kubectl` to debug issues in the AKS cluster.
TODO: shouldn't this be scripted in the Bicep?

Enable Azure RBAC using:

```shell
az aks update -g <rgName> -n <aksName> --enable-azure-rbac
```
When you get an error like:

```
Error from server (Forbidden): pods is forbidden: User <email> cannot list resource <resource> in API group "" at the cluster scope: User does not have access to the resource in Azure. Update role assignment to allow access.
```

Then you'll need to (re-)add yourself to a role with the correct rights, like so:

```shell
az role assignment create --assignee <email> --role "Azure Kubernetes Service RBAC Cluster Admin" --scope $(az aks show -g <rgName> -n <aksName> --query id -o tsv)
```

This project is licensed under the MIT License.