PulsarGeoReplication is a custom resource that enables the configuration of geo-replication between different Pulsar instances. It allows you to set up unidirectional replication of data from one Pulsar cluster to another, even when these clusters are geographically distributed or in separate Pulsar instances.
Key points about PulsarGeoReplication:
- It's used for configuring replication between separate Pulsar instances.
- The replication is unidirectional. To set up bidirectional replication, you need to create two PulsarGeoReplication resources, one for each direction.
- It creates a new cluster in the destination Pulsar instance for each PulsarGeoReplication resource.
- It's different from configuring geo-replication between clusters within a single Pulsar instance. For that purpose, use the
replicationClustersfield in thePulsarNamespaceresource instead.
PulsarGeoReplication is particularly useful for scenarios where you need to replicate data across different Pulsar deployments, such as disaster recovery, data locality, or compliance with data residency requirements.
This resource should be used specifically for configuring geo-replication between clusters in different Pulsar instances. If you're looking to set up geo-replication between clusters within the same Pulsar instance, you should use the replicationClusters field in the PulsarNamespace resource instead.
Using PulsarGeoReplication for clusters within the same Pulsar instance may lead to unexpected behavior and is not the intended use case for this resource. Always ensure you're dealing with separate Pulsar instances before utilizing PulsarGeoReplication.
The PulsarGeoReplication resource has the following specifications:
| Field | Description | Required |
|---|---|---|
connectionRef |
Reference to the PulsarConnection resource used to connect to the source Pulsar cluster. | Yes |
destinationConnectionRef |
Reference to the PulsarConnection resource used to connect to the destination Pulsar cluster. | Yes |
lifecyclePolicy |
Determines whether to keep or delete the geo-replication configuration when the Kubernetes resource is deleted. Options: CleanUpAfterDeletion, KeepAfterDeletion. Default is CleanUpAfterDeletion. |
No |
clusterParamsOverride |
Allows overriding specific cluster parameters when setting up geo-replication. This is useful when the destination cluster requires different configuration than what's defined in the destinationConnectionRef. See Cluster Parameters Override for details. |
No |
The PulsarGeoReplication resource is designed to configure geo-replication between separate Pulsar instances. It creates a new "Cluster" in the destination Pulsar cluster identified by destinationConnectionRef. This setup allows configuring the replication of data from the source cluster (identified by connectionRef) to the destination cluster. By establishing this connection, the brokers in the source cluster can communicate with and replicate data to the brokers in the destination cluster, enabling geo-replication between the two separate Pulsar instances.
The lifecyclePolicy field not only affects the geo-replication configuration but also determines how the Destination Cluster is handled in the source cluster when the PulsarGeoReplication resource is deleted:
-
CleanUpAfterDeletion(default): When the PulsarGeoReplication resource is deleted, the operator will remove the Destination Cluster configuration from the source cluster. This means that the cluster entry created in the source cluster for the destination will be deleted using the Pulsar admin API's DELETE /admin/v2/clusters/{cluster} endpoint, effectively removing all traces of the geo-replication setup. -
KeepAfterDeletion: If this policy is set, the Destination Cluster configuration will remain in the source cluster even after the PulsarGeoReplication resource is deleted. This can be useful if you want to temporarily remove the Kubernetes resource while maintaining the ability to quickly re-establish the geo-replication later. The cluster configuration can be viewed using the Pulsar admin API's GET /admin/v2/clusters/{cluster} endpoint.
It's important to note that this deletion behavior applies to the cluster configuration in the source Pulsar instance. The actual destination Pulsar instance and its data are not affected by this deletion process. The operator only manages the configuration that enables communication between the two Pulsar instances for geo-replication purposes, which is set up using the Pulsar admin API's PUT /admin/v2/clusters/{cluster} endpoint when the PulsarGeoReplication resource is created.
Both connectionRef and destinationConnectionRef are of type corev1.LocalObjectReference, which means they should reference existing PulsarConnection resources in the same namespace. For detailed information on how to create and manage PulsarConnection resources, please refer to the PulsarConnection documentation.
Note: When configuring geo-replication between connectionRef and destinationConnectionRef, it is important to ensure:
- The brokers in the
connectionRefcluster are able to communicate with thedestinationConnectionRefcluster, and thedestinationConnectionRefcluster is able to authenticate the connections from theconnectionRefcluster.
The clusterParamsOverride field provides a powerful way to customize cluster configuration for geo-replication without modifying the underlying PulsarConnection resource. This is particularly useful when:
- Different authentication is required for geo-replication compared to regular cluster operations
- Alternative URLs need to be used for inter-cluster communication
- Specific TLS configurations are needed for cross-cluster connections
The clusterParamsOverride supports the following fields:
-
URL Configuration:
serviceURL: Override the HTTP(S) URL for the Pulsar cluster's admin serviceserviceSecureURL: Override the HTTPS URL for secure admin connectionsbrokerServiceURL: Override the non-TLS URL for broker connectionsbrokerServiceSecureURL: Override the TLS-enabled URL for secure broker connections
-
TLS Configuration:
brokerClientTrustCertsFilePath: Override the path to trusted TLS certificates
-
Authentication Configuration:
authentication.authPlugin: Override the authentication plugin class nameauthentication.authParameters: Override the authentication parameters
When authentication is specified in the override, the system automatically:
- Skips secret validation checks for the destination connection
- Avoids unnecessary Secret API calls for improved performance
- Uses the override authentication directly without processing the destinationConnectionRef authentication
apiVersion: resource.streamnative.io/v1alpha1
kind: PulsarGeoReplication
metadata:
name: us-east-to-west-geo-replication
namespace: us-east
spec:
connectionRef:
name: us-east-local-connection
destinationConnectionRef:
name: us-east-to-west-connection
clusterParamsOverride:
# Override URLs for cross-cluster communication
serviceURL: "https://geo-replication-admin.us-west.example.com:8443"
brokerServiceURL: "pulsar://geo-replication-broker.us-west.example.com:6650"
# Override authentication for geo-replication
authentication:
authPlugin: "org.apache.pulsar.client.impl.auth.AuthenticationToken"
authParameters: "token:geo-replication-specific-token"Important Notes:
- Override parameters take precedence over the corresponding fields in
destinationConnectionRef - Only non-null override values will replace the destination connection values
- Authentication override is particularly useful for scenarios requiring different credentials for geo-replication
- The override does not affect how the operator connects to manage other resources in the destination cluster
The lifecyclePolicy field determines what happens to the geo-replication configuration when the Kubernetes PulsarGeoReplication resource is deleted:
CleanUpAfterDeletion(default): The geo-replication configuration will be removed from both Pulsar clusters when the Kubernetes resource is deleted.KeepAfterDeletion: The geo-replication configuration will remain in both Pulsar clusters even after the Kubernetes resource is deleted.
For more information about lifecycle policies, refer to the PulsarResourceLifeCyclePolicy documentation.
For more detailed information about geo-replication in Pulsar, refer to the Pulsar Geo-replication documentation.
The PulsarGeoReplication is a unidirectional setup. When you create a PulsarGeoReplication for us-east only, data will be replicated from us-east to us-west. If you need to replicate data between clusters us-east and us-west, you need to create PulsarGeoReplication for both us-east and us-west.
The Pulsar Resource Operator will create a new cluster in the source connection for each PulsarGeoReplication. This new cluster represents the destination cluster and is created using the information from the PulsarConnection resource referenced by destinationConnectionRef. The name of this cluster will be derived from the destination cluster's name, which is specified in the clusterName field of the destination PulsarConnection. This setup allows the source cluster to recognize and replicate data to the destination cluster, enabling geo-replication between the two separate Pulsar instances.
- Define a Geo-replication named
us-east-geoand save the YAML file asus-east-geo.yaml.
apiVersion: resource.streamnative.io/v1alpha1
kind: PulsarGeoReplication
metadata:
name: us-east-geo
namespace: us-east
spec:
connectionRef:
name: us-east-cluster-local-connection
destinationConnectionRef:
name: us-east-cluster-destination-connection
lifecyclePolicy: CleanUpAfterDeletion- Create the PulsarGeoReplication resource.
kubectl apply -f us-east-geo.yaml- Verify that the PulsarGeoReplication resource is created successfully.
kubectl get pulsargeoreplication us-east-geo -n us-east- Verify that the new cluster is created in the source Pulsar instance.
pulsar-admin clusters list --url http://<us-east-public-address>:8080This section describes how to configure Geo-replication between clusters us-east-sn-platform and us-west-sn-platform in different namespaces of the same Kubernetes cluster.
The relation is shown below.
graph TD;
us-east-->us-west;
us-west-->us-east;
- Deploy two separate pulsar clusters, each in a different namespace. Each cluster has its own configuration store.
- Ensure that both clusters can access each other.
Add the Pulsar cluster us-east-sn-platform information through the clusterName and brokerServiceURL fields to the existing PulsarConnection.
apiVersion: resource.streamnative.io/v1alpha1
kind: PulsarConnection
metadata:
name: us-east-local-connection
namespace: us-east
spec:
# The local us-east cluster name
clusterName: us-east-sn-platform
# The local us-east cluster connection URL, you can use Kubernetes internal DNS.
adminServiceURL: http://us-east-sn-platform-broker.us-east.svc.cluster.local:8080
brokerServiceURL: pulsar://us-east-sn-platform-broker.us-east.svc.cluster.local:6650The destination PulsarConnection has the information of the Pulsar clusterus-west-sn-platform. Add the Pulsar cluster us-west-sn-platform information through the clusterName and brokerServiceURL fields to the destination PulsarConnection.
apiVersion: resource.streamnative.io/v1alpha1
kind: PulsarConnection
metadata:
name: us-west-dest-connection
namespace: us-west
spec:
# The destination us-west cluster name
clusterName: us-west-sn-platform
# The destination us-west cluster connection URL, you should use Kubernetes external LB address.
adminServiceURL: http://<us-west-public-address>:8080
brokerServiceURL: pulsar://<us-west-public-address>:6650When you want to use tls to connect remote cluster, you need to do some extra steps.
- For a selfsigning cert, you need to create a secret to store the cert file of connecting the
us-westbrokers.
apiVersion: v1
data:
ca.crt: xxxxx
kind: Secret
metadata:
name: us-west-tls-broker
namespace: us-esat
type: Opaque- Mount the secret to
us-westpulsarbroker by adding these line to thepulsarbroker.spec.pod.secretRefs. The mount path will be used inus-westpulsar connection.
spec:
pod:
secretRefs:
- mountPath: /etc/tls/us-west
secretName: us-west-tls-broker- Add
adminServiceSecureURLandbrokerServiceSecureURLto the destination connection
apiVersion: resource.streamnative.io/v1alpha1
kind: PulsarConnection
metadata:
name: us-east-to-west-connection
namespace: us-east
spec:
# The destination us-west cluster name
clusterName: us-west-sn-platform
# The destination us-west cluster connection URL, you should use Kubernetes external LB address.
adminServiceURL: http://<us-west-public-address>:8080
brokerServiceURL: pulsar://<us-west-public-address>:6650
authentication:
token:
value: xxxx
adminServiceSecureURL: https://<us-west-public-address:8443 # the remote pulsar admin secure service
brokerServiceSecureURL: pulsar+ssl://<us-west-public-address:6651 # the remote pulsar broker secure service
brokerClientTrustCertsFilePath: /etc/tls/us-west/ca.crt # Optional. The cert path is the mountPath in the above step if you are using selfsigning cert. This section enabled Geo-replication on us-east, which replicates data from us-east to us-west. The operator will create a new cluster entry called us-west-sn-platform in us-east cluster.
apiVersion: resource.streamnative.io/v1alpha1
kind: PulsarGeoReplication
metadata:
name: us-east-to-west-geo-replication
namespace: us-east
spec:
# The local us-east cluster connection
connectionRef:
name: us-east-local-connection
# The destination us-west cluster connection
destinationConnectionRef:
name: us-east-to-west-connection
lifecyclePolicy: CleanUpAfterDeletionYou can create a new tenant or update an existing tenant by adding the field geoReplicationRefs. It will add the cluster us-west-sn-platform to the tenant.
apiVersion: resource.streamnative.io/v1alpha1
kind: PulsarTenant
metadata:
name: geo-tenant
namespace: us-east
spec:
name: geo-tenant
connectionRef:
name: us-east-local-connection
geoReplicationRefs:
- name: us-east-to-west-geo-replication
lifecyclePolicy: CleanUpAfterDeletionYou can create a new namespace or update an existing namespace by adding the field geoReplicationRefs. It will add the namespace to us-west-sn-platform.
Note
Once you enable Geo-replication at the namespace level, messages to all topics within that namespace are replicated across clusters.
apiVersion: resource.streamnative.io/v1alpha1
kind: PulsarNamespace
metadata:
name: geo-namespace
namespace: us-east
spec:
name: geo-test/geo-namespace
connectionRef:
name: us-east-local-connection
geoReplicationRefs:
- name: us-east-to-west-geo-replication
lifecyclePolicy: CleanUpAfterDeletionYou can create a new topic or update an existing topic by adding the field geoReplicationRefs. It will add the topic to us-west-sn-platform.
apiVersion: resource.streamnative.io/v1alpha1
kind: PulsarTopic
metadata:
name: geo-topic
namespace: us-east
spec:
name: persistent://geo-test/geo-namespace/geo-topic
partitions: 1
connectionRef:
name: us-east-local-connection
geoReplicationRefs:
- name: us-east-to-west-geo-replication
lifecyclePolicy: CleanUpAfterDeletionAfter the resources are ready, you can test Geo-replication by producing and consuming messages.
- Open a terminal and run the command
./bin/pulsar-client produce geo-test/geo-namespace/geo-topic -m "hello" -n 10to produce messages tous-east. - Open another terminal and run the command
./bin/pulsar-client consume geo-test/geo-namespace/geo-topic -s sub -n 0to consume messages fromus-west.