
Commit 978f44f

[improve][doc] Clarify producer name uniqueness and Key_Shared batching requirements (#1093)
1 parent 572b928 commit 978f44f

15 files changed

Lines changed: 220 additions & 19 deletions

docs/client-libraries-consumers.md

Lines changed: 6 additions & 0 deletions
@@ -240,6 +240,12 @@ The `Shared` subscription is different from the `Exclusive` and `Failover` subsc
 
 This is a new subscription type since 2.4.0 release. Create new consumers and subscribe with `Key_Shared` subscription type.
 
+:::note Producer batching requirement
+
+When using Key_Shared subscriptions, producers **must** either **disable batching** or **use key-based batching** (e.g., `BatcherBuilder.KEY_BASED` in Java). Default batching may pack messages with different keys into the same batch, breaking Key_Shared routing semantics. See [below](#key_shared-batching) for code examples.
+
+:::
+
 ````mdx-code-block
 <Tabs groupId="lang-choice"
   defaultValue="Java"

docs/client-libraries-producers.md

Lines changed: 43 additions & 0 deletions
@@ -61,6 +61,49 @@ This example shows how to create a producer.
 </Tabs>
 ````
 
+### Producer naming
+
+Every producer has a name that must be **unique across all Pulsar clusters**. If you do not explicitly set a name, Pulsar generates a globally unique name automatically. If you assign a name, the broker enforces that only one producer with that name can publish on a topic at a time.
+
+You **must** set an explicit producer name when using [message deduplication](cookbooks-deduplication.md). Even when deduplication is not required, setting a meaningful producer name is recommended: the name appears in broker logs, admin stats, and metrics, so you can quickly trace messages back to the producing application.
+
+````mdx-code-block
+<Tabs groupId="lang-choice"
+  defaultValue="Java"
+  values={[{"label":"Java","value":"Java"},{"label":"C++","value":"C++"},{"label":"Python","value":"Python"}]}>
+
+<TabItem value="Java">
+
+```java
+Producer<String> producer = pulsarClient.newProducer(Schema.STRING)
+    .topic("my-topic")
+    .producerName("my-unique-producer-name")
+    .create();
+```
+
+</TabItem>
+
+<TabItem value="C++">
+
+```cpp
+ProducerConfiguration producerConfig;
+producerConfig.setProducerName("my-unique-producer-name");
+Producer producer;
+Result result = client.createProducer("my-topic", producerConfig, producer);
+```
+
+</TabItem>
+
+<TabItem value="Python">
+
+```python
+producer = client.create_producer('my-topic', producer_name='my-unique-producer-name')
+```
+
+</TabItem>
+</Tabs>
+````
+
 ## Publish messages
 
 Pulsar supports both synchronous and asynchronous publishing of messages in most clients. In some language-specific clients, such as Node.js and C#, you can publish messages synchronously based on the asynchronous method using language-specific mechanisms (like `await`).

docs/concepts-clients.md

Lines changed: 12 additions & 6 deletions
@@ -13,12 +13,12 @@ Pulsar client libraries support transparent reconnection and/or connection failo
 
 Before an application creates a producer/consumer, the Pulsar client library needs to initiate a setup phase including two steps:
 
-1. The client attempts to determine the owner of the topic by sending an HTTP lookup request to the broker.
+1. The client attempts to determine the owner of the topic by sending an HTTP lookup request to the broker.
 
 The request can reach any active broker, which looks at the (cached) ZooKeeper metadata to find the broker serving the topic or, if no broker is serving it, tries to assign it to the least loaded broker.
 
-2. Once the client library has the broker address, it creates a TCP connection (or reuses an existing connection from the pool) and authenticates it.
-
+2. Once the client library has the broker address, it creates a TCP connection (or reuses an existing connection from the pool) and authenticates it.
+
 Within this connection, the client and broker exchange binary commands from a custom protocol. At this point, the client sends a command to create producer/consumer to the broker, which will comply after having validated the authorization policy.
 
 Whenever the TCP connection breaks, the client immediately re-initiates this setup phase and keeps trying with exponential backoff to re-establish the producer or consumer until the operation succeeds.
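
The retry behaviour in the last context line of this hunk (retry immediately, then keep retrying with exponential backoff until success) can be sketched as below. This is a toy model, not the Pulsar client's implementation: the names `backoff_delays` and `reconnect`, the `sleep` hook, and all delay values are assumptions made for illustration.

```python
def backoff_delays(initial=1.0, maximum=8.0, factor=2.0):
    """Yield an endless sequence of retry delays that grows by `factor` up to `maximum`.

    The default values are hypothetical; they are not Pulsar's defaults.
    """
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, maximum)


def reconnect(connect, delays, sleep=lambda seconds: None):
    """Call `connect()` until it succeeds and return the number of attempts.

    The first attempt happens immediately; a delay is applied only after a failure.
    `sleep` is injectable so the sketch can run without actually waiting.
    """
    attempts = 0
    for delay in delays:
        attempts += 1
        try:
            connect()
            return attempts
        except ConnectionError:
            sleep(delay)
```

In real use, `sleep` would be `time.sleep` and `connect` would redo the lookup and connection setup steps listed above.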
@@ -27,6 +27,12 @@ Whenever the TCP connection breaks, the client immediately re-initiates this set
 
 A producer is a process that attaches to a topic and publishes messages to a Pulsar [broker](concepts-architecture-overview.md#broker). The Pulsar broker processes the messages.
 
+### Producer naming
+
+Every producer has a name that **must be unique across all Pulsar clusters**. If you do not explicitly assign a name when creating a producer, Pulsar automatically generates a globally unique name. If you choose to set a name explicitly, the broker enforces that only one producer with that name can be publishing on a topic at any given time; attempting to create a second producer with the same name on the same topic fails.
+
+Explicitly naming producers is required when using [message deduplication](cookbooks-deduplication.md), because Pulsar uses the producer name together with the sequence ID to identify and filter duplicate messages. It is also useful for debugging and monitoring, since the producer name appears in metrics and admin stats.
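
To illustrate why deduplication depends on a stable producer name, here is a minimal sketch of filtering by producer name and sequence ID. It is a simplified model written for this note, not broker code: the class `ToyDeduplicator` and its `accept` method are hypothetical, and it assumes the filter keeps only the highest sequence ID seen per producer name.

```python
class ToyDeduplicator:
    """Toy model: remember the highest sequence ID accepted per producer name."""

    def __init__(self):
        self.last_seq = {}  # producer name -> highest sequence ID accepted so far

    def accept(self, producer_name, sequence_id):
        """Return True if the message is new, False if it is treated as a duplicate."""
        last = self.last_seq.get(producer_name, -1)
        if sequence_id <= last:
            return False  # already seen for this producer name: drop it
        self.last_seq[producer_name] = sequence_id
        return True


dedup = ToyDeduplicator()
first = dedup.accept("orders-producer", 0)        # new message
resend = dedup.accept("orders-producer", 0)       # same name, same sequence ID
renamed = dedup.accept("auto-generated-xyz", 0)   # different name: not recognised
```

In this model a resend under the same name is filtered, while the same sequence ID under a different (for example, freshly auto-generated) name is accepted again, which is why an explicit name is needed.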
+
 ### Send mode
 
 Send mode is a mechanism determining whether producers send messages to brokers synchronously (sync) or asynchronously (async).
@@ -146,10 +152,10 @@ try {
     // Send messages within transaction
     producer.newMessage(txn).value("message-1").send();
     producer.newMessage(txn).value("message-2").send();
-
-    // Acknowledge messages within transaction
+
+    // Acknowledge messages within transaction
     consumer.acknowledgeAsync(messageId, txn);
-
+
     // Commit transaction
     txn.commit().get();
 } catch (Exception e) {

docs/concepts-messaging.md

Lines changed: 9 additions & 1 deletion
@@ -28,7 +28,7 @@ Messages are the basic "unit" of Pulsar. They're what producers publish to topic
 | Value / data payload | The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data [schemas](schema-get-started.md). |
 | Key | The key (string type) of the message. It is a short name of message key or partition key. Messages are optionally tagged with keys, which is useful for features like [topic compaction](concepts-topic-compaction.md). |
 | Properties | An optional key/value map of user-defined properties. |
-| Producer name | The name of the producer who produces the message. If you do not specify a producer name, the default name is used. |
+| Producer name | The name of the producer that produces the message. If you do not specify a producer name, Pulsar automatically generates a globally unique name. If you explicitly assign a name, it **must be unique across all Pulsar clusters**; otherwise, producer creation fails. The broker enforces that only one producer with the same name can be publishing on a topic at any given time. See [Producer naming](concepts-clients.md#producer-naming) for details. |
 | Topic name | The name of the topic that the message is published to. |
 | Schema version | The version number of the schema that the message is produced with. |
 | Sequence ID | Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of a message is initially assigned by its producer, indicating its order in that sequence, and can also be customized.<br />Sequence ID can be used for message deduplication. If `brokerDeduplicationEnabled` is set to `true`, the sequence ID of each message is unique within a producer of a topic (non-partitioned) or a partition. |
@@ -657,6 +657,14 @@ Shared subscriptions do not guarantee message ordering or support cumulative ack
 
 The Key_Shared subscription type in Pulsar allows multiple consumers to attach to the same subscription. Unlike the Shared type, the Key_Shared type distributes messages across consumers so that messages with the same key or the same ordering key are delivered to only one consumer. No matter how many times a message is redelivered, it is delivered to the same consumer.
 
+:::note Producer requirements for Key_Shared
+
+When using Key_Shared subscriptions, producers **must** either **disable batching** or **use key-based batching** (e.g., `BatcherBuilder.KEY_BASED` in Java). The default batching strategy may pack messages with different keys into the same batch, which breaks Key_Shared routing because the broker uses the first message's key to route the entire batch.
+
+See [Batching for Key_Shared Subscriptions](#batching-for-key_shared-subscriptions) for details and code examples.
+
+:::
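
The effect described in this note can be reproduced with a small simulation. It is only a sketch under stated assumptions: `default_batches`, `key_based_batches`, `route`, and `split_keys` are hypothetical helpers, the byte-sum "hash" stands in for the real key hashing, and the only behaviour taken from the note is that a whole batch is routed by its first message's key.

```python
from collections import defaultdict


def default_batches(messages, batch_size):
    """Pack messages in arrival order, ignoring keys (models default batching)."""
    return [messages[i:i + batch_size] for i in range(0, len(messages), batch_size)]


def key_based_batches(messages):
    """Put messages that share a key into the same batch (models key-based batching)."""
    groups = defaultdict(list)
    for key, value in messages:
        groups[key].append((key, value))
    return list(groups.values())


def route(batches, consumers):
    """Deliver each whole batch to the consumer chosen by its FIRST message's key."""
    delivered = defaultdict(list)
    for batch in batches:
        first_key = batch[0][0]
        # Toy deterministic hash (an assumption, not Pulsar's hashing scheme).
        consumer = consumers[sum(first_key.encode()) % len(consumers)]
        delivered[consumer].extend(batch)
    return delivered


def split_keys(delivered):
    """Return the keys whose messages ended up on more than one consumer."""
    owners = defaultdict(set)
    for consumer, msgs in delivered.items():
        for key, _ in msgs:
            owners[key].add(consumer)
    return {key for key, seen in owners.items() if len(seen) > 1}


messages = [("a", 1), ("b", 2), ("b", 3), ("a", 4)]
consumers = ["c0", "c1"]
broken = split_keys(route(default_batches(messages, 2), consumers))
intact = split_keys(route(key_based_batches(messages), consumers))
```

With these inputs, `broken` contains both keys (each key reached two consumers), while `intact` is empty because every batch holds a single key.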
+
 ![Key_Shared subscription type in Pulsar](/assets/pulsar-key-shared-subscriptions.svg)
 
 :::note

versioned_docs/version-3.0.x/client-libraries-consumers.md

Lines changed: 6 additions & 0 deletions
@@ -239,6 +239,12 @@ The `Shared` subscription is different from the `Exclusive` and `Failover` subsc
 
 This is a new subscription type since 2.4.0 release. Create new consumers and subscribe with `Key_Shared` subscription type.
 
+:::note Producer batching requirement
+
+When using Key_Shared subscriptions, producers **must** either **disable batching** or **use key-based batching** (e.g., `BatcherBuilder.KEY_BASED` in Java). Default batching may pack messages with different keys into the same batch, breaking Key_Shared routing semantics. See [below](#key_shared-batching) for code examples.
+
+:::
+
 ````mdx-code-block
 <Tabs groupId="lang-choice"
   defaultValue="Java"

versioned_docs/version-3.0.x/concepts-clients.md

Lines changed: 6 additions & 0 deletions
@@ -21,6 +21,12 @@ Whenever the TCP connection breaks, the client immediately re-initiates this set
 
 A producer is a process that attaches to a topic and publishes messages to a Pulsar [broker](reference-terminology.md#broker). The Pulsar broker processes the messages.
 
+### Producer naming
+
+Every producer has a name that **must be unique across all Pulsar clusters**. If you do not explicitly assign a name when creating a producer, Pulsar automatically generates a globally unique name. If you choose to set a name explicitly, the broker enforces that only one producer with that name can be publishing on a topic at any given time; attempting to create a second producer with the same name on the same topic fails.
+
+Explicitly naming producers is required when using [message deduplication](cookbooks-deduplication.md), because Pulsar uses the producer name together with the sequence ID to identify and filter duplicate messages. It is also useful for debugging and monitoring, since the producer name appears in metrics and admin stats.
+
 ### Send mode
 
 Producers send messages to brokers synchronously (sync) or asynchronously (async).

versioned_docs/version-3.0.x/concepts-messaging.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ Messages are the basic "unit" of Pulsar. The following table lists the component
 | Value / data payload | The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data [schemas](schema-get-started.md). |
 | Key | The key (string type) of the message. It is a short name of message key or partition key. Messages are optionally tagged with keys, which is useful for features like [topic compaction](concepts-topic-compaction.md). |
 | Properties | An optional key/value map of user-defined properties. |
-| Producer name | The name of the producer who produces the message. If you do not specify a producer name, the default name is used. |
+| Producer name | The name of the producer that produces the message. If you do not specify a producer name, Pulsar automatically generates a globally unique name. If you explicitly assign a name, it **must be unique across all Pulsar clusters**; otherwise, producer creation fails. The broker enforces that only one producer with the same name can be publishing on a topic at any given time. See [Producer naming](concepts-clients.md#producer-naming) for details. |
 | Topic name | The name of the topic that the message is published to. |
 | Schema version | The version number of the schema that the message is produced with. |
 | Sequence ID | Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of a message is initially assigned by its producer, indicating its order in that sequence, and can also be customized.<br />Sequence ID can be used for message deduplication. If `brokerDeduplicationEnabled` is set to `true`, the sequence ID of each message is unique within a producer of a topic (non-partitioned) or a partition. |

versioned_docs/version-4.0.x/client-libraries-consumers.md

Lines changed: 6 additions & 0 deletions
@@ -240,6 +240,12 @@ The `Shared` subscription is different from the `Exclusive` and `Failover` subsc
 
 This is a new subscription type since 2.4.0 release. Create new consumers and subscribe with `Key_Shared` subscription type.
 
+:::note Producer batching requirement
+
+When using Key_Shared subscriptions, producers **must** either **disable batching** or **use key-based batching** (e.g., `BatcherBuilder.KEY_BASED` in Java). Default batching may pack messages with different keys into the same batch, breaking Key_Shared routing semantics. See [below](#key_shared-batching) for code examples.
+
+:::
+
 ````mdx-code-block
 <Tabs groupId="lang-choice"
   defaultValue="Java"

versioned_docs/version-4.0.x/client-libraries-producers.md

Lines changed: 43 additions & 0 deletions
@@ -61,6 +61,49 @@ This example shows how to create a producer.
 </Tabs>
 ````
 
+### Producer naming
+
+Every producer has a name that must be **unique across all Pulsar clusters**. If you do not explicitly set a name, Pulsar generates a globally unique name automatically. If you assign a name, the broker enforces that only one producer with that name can publish on a topic at a time.
+
+You **must** set an explicit producer name when using [message deduplication](cookbooks-deduplication.md). Even when deduplication is not required, setting a meaningful producer name is recommended: the name appears in broker logs, admin stats, and metrics, so you can quickly trace messages back to the producing application.
+
+````mdx-code-block
+<Tabs groupId="lang-choice"
+  defaultValue="Java"
+  values={[{"label":"Java","value":"Java"},{"label":"C++","value":"C++"},{"label":"Python","value":"Python"}]}>
+
+<TabItem value="Java">
+
+```java
+Producer<String> producer = pulsarClient.newProducer(Schema.STRING)
+    .topic("my-topic")
+    .producerName("my-unique-producer-name")
+    .create();
+```
+
+</TabItem>
+
+<TabItem value="C++">
+
+```cpp
+ProducerConfiguration producerConfig;
+producerConfig.setProducerName("my-unique-producer-name");
+Producer producer;
+Result result = client.createProducer("my-topic", producerConfig, producer);
+```
+
+</TabItem>
+
+<TabItem value="Python">
+
+```python
+producer = client.create_producer('my-topic', producer_name='my-unique-producer-name')
+```
+
+</TabItem>
+</Tabs>
+````
+
 ## Publish messages
 
 Pulsar supports both synchronous and asynchronous publishing of messages in most clients. In some language-specific clients, such as Node.js and C#, you can publish messages synchronously based on the asynchronous method using language-specific mechanisms (like `await`).
