Commit 604674a

committed
update 3.0.x version doc
1 parent b040181 commit 604674a

2 files changed

Lines changed: 308 additions & 12 deletions

File tree

versioned_docs/version-3.0.x/deploy-bare-metal.md

Lines changed: 170 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -125,7 +125,17 @@ Directory | Contains
125125
`lib` | The [JAR](https://en.wikipedia.org/wiki/JAR_(file_format)) files that Pulsar uses
126126
`logs` | Logs that the installation creates
127127

128-
## Install Built-in Connectors (optional)
128+
The `conf` directory contains configuration files for various Pulsar components. Below is a brief overview of the main configuration categories:
129+
130+
- **JVM Configuration** (`pulsar_env.sh` / `bkenv.sh`): Controls JVM memory allocation (`PULSAR_MEM`, `BOOKIE_MEM`), garbage collection options (`PULSAR_GC`, `BOOKIE_GC`), and extra JVM options (`PULSAR_EXTRA_OPTS`, `BOOKIE_EXTRA_OPTS`) for Broker, BookKeeper, and other components.
131+
- **Broker Configuration** (`broker.conf`): Core runtime parameters for the Pulsar Broker, including metadata store connection, cluster name, ports, message retention policies, authentication, and authorization settings.
132+
- **BookKeeper Configuration** (`bookkeeper.conf`): Storage engine parameters for BookKeeper Bookies, including journal and ledger directories, ZooKeeper connection, compaction, and disk usage thresholds.
133+
- **Log4j Configuration** (`log4j2.yaml`): Logging framework settings including log levels, output format, file rolling strategies, and log output directories.
134+
- **Dynamic Configuration**: Some Broker configuration properties can be updated at runtime without restarting the service, using the `pulsar-admin` CLI tool or the Admin REST API. Dynamic configurations are stored in the metadata store (ZooKeeper) and take effect across all Brokers in the cluster.
135+
136+
For a complete list of all available configuration properties, see the [Pulsar Configuration Reference](https://pulsar.apache.org/reference/#/next/).
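
As a rough sketch of the dynamic-configuration workflow described above, the helper below prints the `pulsar-admin` commands that list the runtime-updatable properties, update one of them, and show the current overrides. The property name `brokerShutdownTimeoutMs`, its value, and the `PULSAR_ADMIN` variable are illustrative assumptions; check the output of `list-dynamic-config` on your own cluster before relying on a specific property name. The helper is a dry run: it only prints the commands, so nothing is applied until you run them yourself.

```bash
#!/usr/bin/env bash
# Dry-run sketch: print the pulsar-admin commands instead of executing them.
# PULSAR_ADMIN is a hypothetical variable pointing at the CLI in your install.
PULSAR_ADMIN="${PULSAR_ADMIN:-bin/pulsar-admin}"

dynamic_config_commands() {
  local name="$1" value="$2"
  # List the properties that can be updated without a restart
  echo "$PULSAR_ADMIN brokers list-dynamic-config"
  # Update one property; the new value is stored in the metadata store
  echo "$PULSAR_ADMIN brokers update-dynamic-config --config $name --value $value"
  # Show the dynamic values that are currently set
  echo "$PULSAR_ADMIN brokers get-all-dynamic-config"
}

# Assumed example property and value
dynamic_config_commands brokerShutdownTimeoutMs 100
```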
137+
138+
### Install Built-in Connectors (optional)
129139

130140
To use `built-in` connectors, you need to download the connectors tarball release on every broker node in one of the following ways:
131141

@@ -301,17 +311,116 @@ You can obtain the metadata service URI of the existing BookKeeper cluster by us
301311

302312
[BookKeeper](https://bookkeeper.apache.org) handles all persistent data storage in Pulsar. You need to deploy a cluster of BookKeeper bookies to use Pulsar. You can choose to run a **3-bookie BookKeeper cluster**.
303313

314+
### Configure BookKeeper
315+
316+
BookKeeper configuration is split across two files:
317+
318+
- **`conf/bookkeeper.conf`**: Contains all BookKeeper runtime parameters, including metadata store connection, storage directories, compaction settings, and disk usage thresholds.
319+
- **`conf/bkenv.sh`**: Contains JVM-related parameters for the Bookie process, including memory allocation (`BOOKIE_MEM`), garbage collection options (`BOOKIE_GC`), and extra JVM flags (`BOOKIE_EXTRA_OPTS`).
320+
321+
#### Metadata store connection
322+
304323
You can configure BookKeeper bookies using the [`conf/bookkeeper.conf`](reference-configuration.md#bookkeeper) configuration file. The most important step in configuring bookies for our purposes here is ensuring that `metadataServiceUri` is set to the URI for the ZooKeeper cluster. The following is an example:
305324

306325
```properties
307326
metadataServiceUri=zk://zk1.us-west.example.com:2181;zk2.us-west.example.com:2181;zk3.us-west.example.com:2181/ledgers
308327
```
309328

310-
Which using `;` as separator in `metadataServiceUri`
329+
:::note
330+
331+
Use `;` as the separator in `metadataServiceUri`.
332+
333+
:::
334+
335+
For more information about ZooKeeper and BookKeeper administration, see [ZooKeeper and BookKeeper administration](https://pulsar.apache.org/docs/next/administration-zk-bk/).
336+
337+
#### Storage directories
338+
339+
In a production environment, you should configure dedicated disks for journal and ledger storage. Keeping them on separate disks significantly improves write performance.
340+
341+
```properties
342+
# WAL (Write-Ahead Log) directory — use a dedicated SSD for low-latency writes
343+
journalDirectory=/data/bookkeeper/journal
344+
345+
# Ledger storage directory — use a separate disk from the journal
346+
ledgerDirectories=/data/bookkeeper/ledgers
347+
```
348+
349+
- `journalDirectory`: Defaults to `data/bookkeeper/journal`. The journal is a write-ahead log that records every write before it is applied to the ledger storage. Using a dedicated high-speed SSD for the journal directory is critical for write latency.
350+
- `ledgerDirectories`: Defaults to `data/bookkeeper/ledgers`. This is where the actual ledger data is stored. Separating it from the journal directory avoids I/O contention and improves throughput.
351+
352+
#### GC and Compaction
353+
354+
BookKeeper writes entries from multiple ledgers into shared Entry Log files (default max 1 GB each, controlled by `logSizeLimit`). When ledgers are deleted — for example, after Pulsar's retention policy trims expired data — the Entry Log files that contained those ledgers develop unused space. The Bookie's GC thread periodically scans for deleted ledgers and triggers compaction to reclaim disk space by rewriting the remaining valid entries into new files.
355+
356+
BookKeeper provides two levels of compaction:
357+
358+
- **Minor Compaction**: Targets Entry Log files where the valid data ratio is below `minorCompactionThreshold` (default 0.2, i.e., 20%). Runs at `minorCompactionInterval` (default: every hour). Designed to quickly reclaim heavily fragmented files.
359+
- **Major Compaction**: Targets Entry Log files where the valid data ratio is below `majorCompactionThreshold` (default 0.5, i.e., 50%). Runs at `majorCompactionInterval` (default: every day). Covers a wider range of files with moderate fragmentation.
360+
361+
```properties
362+
# GC scan interval (ms), default: 900000 (15 min)
363+
gcWaitTime=900000
364+
365+
# Minor Compaction: valid-data ratio threshold and interval (seconds)
366+
minorCompactionThreshold=0.2
367+
minorCompactionInterval=3600
368+
369+
# Major Compaction: valid-data ratio threshold and interval (seconds)
370+
majorCompactionThreshold=0.5
371+
majorCompactionInterval=86400
372+
```
373+
374+
:::note
375+
376+
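`minorCompactionInterval` and `majorCompactionInterval` are set in seconds, while `gcWaitTime` is set in milliseconds. Once converted to the same unit, both intervals must be greater than `gcWaitTime`; otherwise, compaction will not run.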
377+
378+
:::
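
The constraint in the note above can be checked before a bookie is restarted. The following is a minimal sketch, assuming the intervals are given in seconds and `gcWaitTime` in milliseconds, as the defaults listed earlier suggest. `check_compaction_intervals` is a hypothetical helper, not a BookKeeper tool.

```bash
#!/usr/bin/env bash
# Sketch: compare compaction intervals (seconds) with gcWaitTime (milliseconds).
# check_compaction_intervals is a hypothetical helper, not part of BookKeeper.
check_compaction_intervals() {
  local gc_wait_ms="$1" minor_s="$2" major_s="$3"
  local status=0
  # Convert each interval to milliseconds before comparing
  if [ $((minor_s * 1000)) -le "$gc_wait_ms" ]; then
    echo "minorCompactionInterval (${minor_s}s) is not greater than gcWaitTime (${gc_wait_ms}ms)"
    status=1
  fi
  if [ $((major_s * 1000)) -le "$gc_wait_ms" ]; then
    echo "majorCompactionInterval (${major_s}s) is not greater than gcWaitTime (${gc_wait_ms}ms)"
    status=1
  fi
  return "$status"
}

# Values taken from the example configuration
check_compaction_intervals 900000 3600 86400 && echo "intervals look consistent"
```

The function returns a non-zero status when either interval is too short, so it can gate a deployment script.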
379+
380+
#### Disk usage thresholds
381+
382+
BookKeeper monitors disk usage and can automatically switch a Bookie to read-only mode to prevent disk exhaustion.
383+
384+
```properties
385+
# Bookie enters read-only mode when disk usage exceeds this threshold (default: 0.95)
386+
diskUsageThreshold=0.95
387+
388+
# Warning threshold — Major Compaction is paused when disk usage exceeds this value (default: 0.90)
389+
diskUsageWarnThreshold=0.90
390+
391+
# Low water mark — Bookie returns to read-write mode only after disk usage drops below this value
392+
# Set it lower than diskUsageWarnThreshold to avoid frequent mode switching (recommended: 0.87)
393+
diskUsageLwmThreshold=0.87
394+
```
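
The low water mark introduces hysteresis: a read-only bookie does not switch back as soon as usage falls under `diskUsageThreshold`, only once it drops below `diskUsageLwmThreshold`. The sketch below models that behavior under the thresholds shown above. `next_mode` is a hypothetical helper written for illustration; it does not reproduce BookKeeper's internal implementation.

```bash
#!/usr/bin/env bash
# Sketch of the read-only / read-write switch with a low water mark.
# next_mode is a hypothetical helper; defaults mirror the example thresholds.
next_mode() {
  local mode="$1" usage="$2"
  local full="${3:-0.95}" lwm="${4:-0.87}"
  awk -v m="$mode" -v u="$usage" -v f="$full" -v l="$lwm" 'BEGIN {
    if (u > f)                           print "read-only"   # above diskUsageThreshold
    else if (m == "read-only" && u >= l) print "read-only"   # not yet below the low water mark
    else                                 print "read-write"
  }'
}

next_mode read-write 0.96   # crosses diskUsageThreshold
next_mode read-only  0.90   # below 0.95 but still above the low water mark
next_mode read-only  0.85   # below the low water mark
```

With only a single threshold, a bookie hovering around 95% usage would flip between modes repeatedly; the gap between 0.95 and 0.87 is what prevents that.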
395+
396+
#### JVM configuration (bkenv.sh)
397+
398+
The `conf/bkenv.sh` file controls JVM parameters for the Bookie process:
399+
400+
- `BOOKIE_MEM`: Defaults to `-Xms2g -Xmx2g -XX:MaxDirectMemorySize=2g`. Adjust based on your storage workload. Insufficient heap memory leads to frequent GC, which increases write and read latency; under high throughput, GC pauses can cause write timeouts. Direct memory is primarily used for Netty ByteBuf allocation: BookKeeper defaults to the `PooledDirect` memory allocator, which allocates all ByteBuf instances from direct memory for network I/O and internal data handling.
311401

312-
Once you appropriately modify the `metadataServiceUri` parameter, you can make any other configuration changes that you require. You can find a full listing of the available BookKeeper configuration parameters [here](reference-configuration.md#bookkeeper). However, consulting the [BookKeeper documentation](https://bookkeeper.apache.org/docs/next/reference/config/) for a more in-depth guide might be a better choice.
402+
```bash
403+
# Example: increase heap and direct memory for high-throughput workloads
404+
BOOKIE_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=4g"
405+
```
406+
407+
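- `BOOKIE_EXTRA_OPTS`: Passes additional JVM flags to the Bookie process. Each assignment below is a separate example; to apply several flags at once, combine them in a single quoted string, because a later assignment replaces an earlier one. Examples: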
408+
409+
```bash
410+
# Enable heap dump on OOM (the default script only enables ExitOnOutOfMemoryError,
411+
# without a heap dump file you cannot diagnose the root cause)
412+
BOOKIE_EXTRA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/logs/bookie/heapdump.hprof"
413+
414+
# Temporarily enable Netty leak detection for troubleshooting off-heap memory leaks
415+
# (default is disabled; set to advanced level when investigating)
416+
BOOKIE_EXTRA_OPTS="-Dio.netty.leakDetection.level=advanced"
417+
```
313418

314-
Once you apply the desired configuration in `conf/bookkeeper.conf`, you can start up a bookie on each of your BookKeeper hosts. You can start up each bookie either in the background, using [nohup](https://en.wikipedia.org/wiki/Nohup), or in the foreground.
419+
For a full listing of the available BookKeeper configuration parameters, see the [configuration reference](reference-configuration.md#bookkeeper). The [BookKeeper documentation](https://bookkeeper.apache.org/docs/next/reference/config/) provides a more in-depth guide.
420+
421+
### Start BookKeepers
422+
423+
With the desired configuration applied in `conf/bookkeeper.conf` and `conf/bkenv.sh`, you can start up a bookie on each of your BookKeeper hosts. You can start up each bookie either in the background, using [nohup](https://en.wikipedia.org/wiki/Nohup), or in the foreground.
315424

316425
To start the bookie in the background, use the [`pulsar-daemon`](reference-cli-tools.md) CLI tool:
317426

@@ -348,6 +457,13 @@ Pulsar brokers are the last thing you need to deploy in your Pulsar cluster. Bro
348457

349458
### Configure Brokers
350459

460+
Broker configuration is split across two files:
461+
462+
- **`conf/broker.conf`**: Contains all Broker runtime parameters, including metadata store connection, cluster name, ports, replication settings, and feature toggles.
463+
- **`conf/pulsar_env.sh`**: Contains JVM-related parameters for the Broker process, including memory allocation (`PULSAR_MEM`), garbage collection options (`PULSAR_GC`), and extra JVM flags (`PULSAR_EXTRA_OPTS`).
464+
465+
#### Metadata store and cluster settings
466+
351467
You can configure brokers using the `conf/broker.conf` configuration file. The most important element of broker configuration is ensuring that each broker is aware of the ZooKeeper cluster that you have deployed. Ensure that the [`metadataStoreUrl`](reference-configuration.md#broker) and [`configurationMetadataStoreUrl`](reference-configuration.md#broker) parameters are correct. In this case, since you have only one cluster and no separate configuration store, `configurationMetadataStoreUrl` points to the same address as `metadataStoreUrl`.
352468

353469
```properties
@@ -370,6 +486,23 @@ webServicePort=8080
370486
webServicePortTls=8443
371487
```
372488

489+
#### Managed ledger settings
490+
491+
These parameters control how the Broker creates BookKeeper ledgers for message storage. They map to the BookKeeper protocol's [Ensemble / Write Quorum / Ack Quorum](https://bookkeeper.apache.org/docs/getting-started/concepts/#ledgers) model:
492+
493+
```properties
494+
# Ensemble size (E): number of bookies to use when creating a ledger (default: 2)
495+
managedLedgerDefaultEnsembleSize=2
496+
497+
# Write quorum (Qw): number of copies to store for each entry (default: 2)
498+
managedLedgerDefaultWriteQuorum=2
499+
500+
# Ack quorum (Qa): number of acks to wait before a write is considered complete (default: 2)
501+
managedLedgerDefaultAckQuorum=2
502+
```
503+
504+
The invariant **E ≥ Qw ≥ Qa** must hold; otherwise ledger creation will fail.
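
The invariant can be checked before the values are written to `conf/broker.conf`. The following is a minimal sketch; `valid_quorum` is a hypothetical helper, and the extra requirement that the ack quorum be at least 1 is an assumption added here for the check, not something stated above.

```bash
#!/usr/bin/env bash
# Sketch: check E >= Qw >= Qa before editing broker.conf.
# valid_quorum is a hypothetical helper, not a Pulsar tool.
valid_quorum() {
  local e="$1" qw="$2" qa="$3"
  # Assumption: an ack quorum below 1 is treated as invalid
  [ "$e" -ge "$qw" ] && [ "$qw" -ge "$qa" ] && [ "$qa" -ge 1 ]
}

valid_quorum 3 2 2 && echo "E=3 Qw=2 Qa=2 is valid"
valid_quorum 2 3 2 || echo "E=2 Qw=3 Qa=2 is invalid (Qw exceeds E)"
```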
505+
373506
> If you deploy Pulsar in a one-node cluster, you should update the replication settings in `conf/broker.conf` to `1`.
374507
>
375508
> ```properties
@@ -383,6 +516,39 @@ webServicePortTls=8443
383516
> managedLedgerDefaultAckQuorum=1
384517
> ```
385518
519+
#### JVM configuration (pulsar_env.sh)
520+
521+
The `conf/pulsar_env.sh` file controls JVM parameters for the Broker process:
522+
523+
- `PULSAR_MEM`: Defaults to `-Xms2g -Xmx2g -XX:MaxDirectMemorySize=4g`. Adjust based on your machine's available memory. Insufficient heap memory leads to frequent GC, and GC pauses increase message publish and consume latency; in severe cases, a Full GC can make the Broker temporarily unavailable. Direct memory is critical for the Broker's message caching and Netty I/O operations.
524+
525+
```bash
526+
# Example: increase heap and direct memory for production workloads
527+
PULSAR_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g"
528+
```
529+
530+
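- `PULSAR_EXTRA_OPTS`: Passes additional JVM flags to the Broker/Proxy/ZooKeeper process. Since `PULSAR_EXTRA_OPTS` is appended after other JVM options on the command line, it can also be used to **override** existing JVM parameters defined in `pulsar_env.sh` or the `bin/pulsar` startup script (later flags take precedence). Each assignment below is a separate example; to apply several flags at once, combine them in a single quoted string, because a later assignment replaces an earlier one. Examples: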
531+
532+
```bash
533+
# Enable heap dump on OOM (the default script only enables ExitOnOutOfMemoryError,
534+
# without a heap dump file you cannot diagnose the root cause after the process exits)
535+
PULSAR_EXTRA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/logs/pulsar/heapdump.hprof"
536+
537+
# Enable IPv6 support (the default script sets -Djava.net.preferIPv4Stack=true;
538+
# override this if your deployment uses IPv6 networking)
539+
PULSAR_EXTRA_OPTS="-Djava.net.preferIPv4Stack=false"
540+
541+
# Tune Netty memory pool parameters (increase maxOrder and maxCachedBufferCapacity
542+
# if your messages are large, to avoid Netty bypassing the memory pool for allocation)
543+
PULSAR_EXTRA_OPTS="-Dio.netty.allocator.maxOrder=13 -Dio.netty.allocator.numDirectArenas=8 -Dio.netty.allocator.maxCachedBufferCapacity=1048576"
544+
```
545+
546+
:::tip
547+
548+
You can also refer to the default configuration in the [Pulsar Helm Chart values.yaml](https://github.com/apache/pulsar-helm-chart/blob/master/charts/pulsar/values.yaml) as a tuning reference.
549+
550+
:::
551+
386552

387553
### Enable Pulsar Functions (optional)
388554
