This repo provides an Airflow provider for managing Cloudera CDP clusters. It can be used to orchestrate CDP cluster actions in your Airflow DAGs.
The CDP Airflow provider includes Airflow operators and a connection type to interact with Cloudera on Cloud (CDP-PC) clusters. It includes operators to manage (a) CDP DataHub clusters and (b) COD database clusters.
Provides a CDP DataHub operator for use in Airflow DAGs. It supports:
- Start / Stop CDP DataHub cluster operations
- Wait for cluster to be ready (optional)
- Configurable operation timeouts
CDPDataHubOperator accepts the following parameters:
- cluster_name (required): Name of the CDP DataHub cluster
- environment_name (required): CDP environment name
- operation (required): The operation to perform ('start' or 'stop')
- wait_for_cluster (optional): Whether to wait for the cluster to be ready before proceeding (default: True)
- cluster_wait_timeout (optional): Timeout in seconds for waiting for the cluster to be ready (default: 1800)
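As a minimal sketch of how these parameters fit together, the snippet below collects them into a keyword-argument dictionary and checks the documented constraints. `datahub_operator_kwargs` is a hypothetical helper written for illustration, not part of the provider, and the cluster and environment names are placeholders.

```python
VALID_OPERATIONS = ("start", "stop")


def datahub_operator_kwargs(cluster_name, environment_name, operation,
                            wait_for_cluster=True, cluster_wait_timeout=1800):
    """Hypothetical helper: validate and bundle the documented parameters."""
    if not cluster_name or not environment_name:
        raise ValueError("cluster_name and environment_name are required")
    if operation not in VALID_OPERATIONS:
        raise ValueError(f"operation must be one of {VALID_OPERATIONS}, got {operation!r}")
    if cluster_wait_timeout <= 0:
        raise ValueError("cluster_wait_timeout must be a positive number of seconds")
    return {
        "cluster_name": cluster_name,
        "environment_name": environment_name,
        "operation": operation,
        "wait_for_cluster": wait_for_cluster,        # documented default: True
        "cluster_wait_timeout": cluster_wait_timeout,  # documented default: 1800
    }


# Placeholder names; replace with your own cluster and environment.
start_kwargs = datahub_operator_kwargs("my-datahub", "my-env", "start")
```

In a DAG, a dictionary like this could be unpacked into the operator together with the usual Airflow `task_id`, for example `CDPDataHubOperator(task_id="start_datahub", **start_kwargs)`. The import path for the operator is not stated here, so take it from the repo's example DAGs.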
Provides a COD operator for use in Airflow DAGs. It supports:
- Start / Stop COD database cluster operations
- Wait for cluster to be ready (optional)
- Configurable operation timeouts
The COD operator accepts the following parameters:
- database_name (required): Name of the COD database cluster
- environment_name (required): CDP environment name
- operation (required): The operation to perform ('start' or 'stop')
- wait_for_cluster (optional): Whether to wait for the cluster to be ready before proceeding (default: True)
- cluster_wait_timeout (optional): Timeout in seconds for waiting for the cluster to be ready (default: 1800)
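To illustrate what `wait_for_cluster` and `cluster_wait_timeout` mean in practice, here is a generic polling loop with a deadline. It is a sketch of the general technique only, not the provider's implementation. The `get_status` callable, the `"AVAILABLE"` status string, and the 30-second poll interval are all assumptions made for the example.

```python
import time


def wait_until_ready(get_status, timeout=1800, poll_interval=30,
                     ready_status="AVAILABLE",
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns ready_status or the timeout expires.

    get_status is a caller-supplied callable (hypothetical here); sleep and
    clock are injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if get_status() == ready_status:
            return True
        if clock() >= deadline:
            raise TimeoutError(f"cluster not ready after {timeout} seconds")
        sleep(poll_interval)
```

With `wait_for_cluster` set to False, an operator would presumably skip a loop like this and return as soon as the start or stop request is submitted.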
The CDP Airflow operators use the CDP CLI and require a CDP access key to interact with a CDP cluster. To generate a CDP access key, see the steps provided in the Cloudera documentation. Note: the cdp-airflow operator does not require creating a .credentials file. Instead, installing cdp-airflow-provider adds a "CDP" Airflow connection type that accepts an access key ID and a private key. Both are required to interact with a CDP cluster.
- New connection type - CDP
- Accepts credentials - Cloudera access key id & private key.
- Define Region to use the correct Cloudera Control Plane region (optional)
- Install CDP provider:
pip install "cdp-airflow-provider @ git+https://github.com/nrladdha/CDP-AirflowProvider-"
Restart Airflow
- Using the Airflow UI, add a new connection of type "cdp" via Admin --> Connections --> Add connection. Enter the following information:
  - Access key ID: Copy and paste the access key ID that was generated in the Cloudera Management Console.
  - Private key: Copy and paste the private key that was generated in the Cloudera Management Console.
  - Region: Optionally define the region name to use the correct Cloudera Control Plane region. For example: cdp_region = eu-1
See the Cloudera documentation for the steps to generate a CDP access key.
- This assumes the CDP DataHub cluster has already been created; its cluster name and environment name are required when implementing an Airflow DAG (workflow). Tasks such as start and/or stop can be orchestrated using CDPDataHubOperator. See the example DAGs example_cdp_datahub_dag.py and example_cdp_cod_dag.py.
- Python 3.7+
- Apache Airflow 2.5.0+
- CDP CLI installed and configured
- Proper permissions to manage CDP DataHub clusters
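The first three requirements can be checked from Python before installing the provider. The script below is a rough sketch. It assumes the CDP CLI is on `PATH` under the executable name `cdp` and that Airflow is installed as the `apache-airflow` distribution. It cannot tell whether the CLI is configured or whether your account has the needed permissions.

```python
import re
import shutil
import sys

try:
    from importlib import metadata  # in the standard library from Python 3.8
except ImportError:  # Python 3.7 has no importlib.metadata
    metadata = None


def parse_version(text):
    """Turn '2.5.0' or '2.10.1rc1' into a comparable tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def airflow_version():
    """Return the installed Airflow version as a tuple, or None if not found."""
    if metadata is None:
        return None
    try:
        return parse_version(metadata.version("apache-airflow"))
    except metadata.PackageNotFoundError:
        return None


def check_prerequisites():
    """Report which of the listed requirements appear to be satisfied."""
    found = airflow_version()
    return {
        "python": sys.version_info >= (3, 7),
        "airflow": found is not None and found >= (2, 5, 0),
        "cdp_cli": shutil.which("cdp") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```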
We welcome contributions! Whether you're experimenting with the cdp-airflow provider plugin and have identified bugs, fixing a bug, adding a feature, or improving documentation, all contributions are welcome and help drive this project. Please see our Contributing Guidelines for details.