versioned_docs/version-3.0.x/deploy-bare-metal.md
Lines changed: 170 additions & 4 deletions
@@ -125,7 +125,17 @@ Directory | Contains
125
125
`lib` | The [JAR](https://en.wikipedia.org/wiki/JAR_(file_format)) files that Pulsar uses
126
126
`logs` | Logs that the installation creates
127
127
128
-
## Install Built-in Connectors (optional)
128
+
The `conf` directory contains configuration files for various Pulsar components. Below is a brief overview of the main configuration categories:
129
+
130
+
- **JVM Configuration** (`pulsar_env.sh` / `bkenv.sh`): Controls JVM memory allocation (`PULSAR_MEM`, `BOOKIE_MEM`), garbage collection options (`PULSAR_GC`, `BOOKIE_GC`), and extra JVM options (`PULSAR_EXTRA_OPTS`, `BOOKIE_EXTRA_OPTS`) for Broker, BookKeeper, and other components.
131
+
- **Broker Configuration** (`broker.conf`): Core runtime parameters for the Pulsar Broker, including metadata store connection, cluster name, ports, message retention policies, authentication, and authorization settings.
132
+
- **BookKeeper Configuration** (`bookkeeper.conf`): Storage engine parameters for BookKeeper Bookies, including journal and ledger directories, ZooKeeper connection, compaction, and disk usage thresholds.
133
+
- **Log4j Configuration** (`log4j2.yaml`): Logging framework settings including log levels, output format, file rolling strategies, and log output directories.
134
+
- **Dynamic Configuration**: Some Broker configuration properties can be updated at runtime without restarting the service, using the `pulsar-admin` CLI tool or the Admin REST API. Dynamic configurations are stored in the metadata store (ZooKeeper) and take effect across all Brokers in the cluster.
135
+
136
+
For a complete list of all available configuration properties, see the [Pulsar Configuration Reference](https://pulsar.apache.org/reference/#/next/).
137
+
138
+
### Install Built-in Connectors (optional)
129
139
130
140
To use `built-in` connectors, download the connectors tarball release on every broker node in one of the following ways:
131
141
@@ -301,17 +311,116 @@ You can obtain the metadata service URI of the existing BookKeeper cluster by us
301
311
302
312
[BookKeeper](https://bookkeeper.apache.org) handles all persistent data storage in Pulsar. You need to deploy a cluster of BookKeeper bookies to use Pulsar. You can choose to run a **3-bookie BookKeeper cluster**.
303
313
314
+
### Configure BookKeeper
315
+
316
+
BookKeeper configuration is split across two files:
317
+
318
+
- **`conf/bookkeeper.conf`**: Contains all BookKeeper runtime parameters, including metadata store connection, storage directories, compaction settings, and disk usage thresholds.
319
+
- **`conf/bkenv.sh`**: Contains JVM-related parameters for the Bookie process, including memory allocation (`BOOKIE_MEM`), garbage collection options (`BOOKIE_GC`), and extra JVM flags (`BOOKIE_EXTRA_OPTS`).
320
+
321
+
#### Metadata store connection
322
+
304
323
You can configure BookKeeper bookies using the [`conf/bookkeeper.conf`](reference-configuration.md#bookkeeper) configuration file. The most important step in configuring bookies for our purposes here is ensuring that `metadataServiceUri` is set to the URI for the ZooKeeper cluster. The following is an example:
329
+
:::note
330
+
331
+
Use `;` as the separator in `metadataServiceUri`.
332
+
333
+
:::
334
+
335
+
For more information about ZooKeeper and BookKeeper administration, see [ZooKeeper and BookKeeper administration](https://pulsar.apache.org/docs/next/administration-zk-bk/).
336
+
337
+
#### Storage directories
338
+
339
+
In a production environment, you should configure dedicated disks for journal and ledger storage. Keeping them on separate disks significantly improves write performance.
340
+
341
+
```properties
342
+
# WAL (Write-Ahead Log) directory — use a dedicated SSD for low-latency writes
343
+
journalDirectory=/data/bookkeeper/journal
344
+
345
+
# Ledger storage directory — use a separate disk from the journal
346
+
ledgerDirectories=/data/bookkeeper/ledgers
347
+
```
348
+
349
+
- `journalDirectory`: Defaults to `data/bookkeeper/journal`. The journal is a write-ahead log that records every write before it is applied to the ledger storage. Using a dedicated high-speed SSD for the journal directory is critical for write latency.
350
+
- `ledgerDirectories`: Defaults to `data/bookkeeper/ledgers`. This is where the actual ledger data is stored. Separating it from the journal directory avoids I/O contention and improves throughput.
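As a rough sanity check for the separation described above, the following sketch compares the filesystems that back the two directories. The helper names `backing_device` and `same_device` and the `JOURNAL_DIR` / `LEDGER_DIR` variables are hypothetical and not part of Pulsar or BookKeeper. The check only compares mounted filesystems as reported by `df -P`, so two partitions on the same physical disk would still look separate.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the filesystem that backs a path (POSIX `df -P`).
backing_device() {
  df -P "$1" | awk 'NR == 2 { print $1 }'
}

# Hypothetical helper: succeed when both paths are backed by the same filesystem.
same_device() {
  [ "$(backing_device "$1")" = "$(backing_device "$2")" ]
}

# Paths taken from the example above; override them through the environment.
journal_dir=${JOURNAL_DIR:-/data/bookkeeper/journal}
ledger_dir=${LEDGER_DIR:-/data/bookkeeper/ledgers}

if [ -d "$journal_dir" ] && [ -d "$ledger_dir" ]; then
  if same_device "$journal_dir" "$ledger_dir"; then
    echo "warning: journal and ledger directories share a filesystem"
  else
    echo "ok: journal and ledger directories are on different filesystems"
  fi
else
  echo "skipped: journal or ledger directory not found"
fi
```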
351
+
352
+
#### GC and Compaction
353
+
354
+
BookKeeper writes entries from multiple ledgers into shared Entry Log files (default max 1 GB each, controlled by `logSizeLimit`). When ledgers are deleted — for example, after Pulsar's retention policy trims expired data — the Entry Log files that contained those ledgers develop unused space. The Bookie's GC thread periodically scans for deleted ledgers and triggers compaction to reclaim disk space by rewriting the remaining valid entries into new files.
355
+
356
+
BookKeeper provides two levels of compaction:
357
+
358
+
- **Minor Compaction**: Targets Entry Log files where the valid data ratio is below `minorCompactionThreshold` (default 0.2, i.e., 20%). Runs at `minorCompactionInterval` (default: every hour). Designed to quickly reclaim heavily fragmented files.
359
+
- **Major Compaction**: Targets Entry Log files where the valid data ratio is below `majorCompactionThreshold` (default 0.5, i.e., 50%). Runs at `majorCompactionInterval` (default: every day). Covers a wider range of files with moderate fragmentation.
360
+
361
+
```properties
362
+
# GC scan interval (ms), default: 900000 (15 min)
363
+
gcWaitTime=900000
364
+
365
+
# Minor Compaction: threshold and interval
366
+
minorCompactionThreshold=0.2
367
+
minorCompactionInterval=3600
368
+
369
+
# Major Compaction: threshold and interval
370
+
majorCompactionThreshold=0.5
371
+
majorCompactionInterval=86400
372
+
```
373
+
374
+
:::note
375
+
376
+
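`minorCompactionInterval` and `majorCompactionInterval` are expressed in seconds, while `gcWaitTime` is expressed in milliseconds. Both intervals must be greater than `gcWaitTime` once converted to the same unit; otherwise compaction will not run.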
377
+
378
+
:::
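Because the compaction intervals are given in seconds and `gcWaitTime` in milliseconds, the comparison in the note is easy to get wrong. The sketch below converts both to milliseconds before comparing. The function name `check_compaction_interval` is hypothetical; it only illustrates the arithmetic and is not a BookKeeper tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that a compaction interval (seconds) is longer
# than gcWaitTime (milliseconds).
check_compaction_interval() {
  local gc_wait_ms=$1 interval_s=$2
  # Convert the interval to milliseconds so both values share one unit.
  local interval_ms=$(( interval_s * 1000 ))
  if (( interval_ms > gc_wait_ms )); then
    echo "ok"
  else
    echo "too-short"
  fi
}

# Values from the example above: gcWaitTime=900000 ms (900 s).
check_compaction_interval 900000 3600    # minor compaction interval, 3600 s
check_compaction_interval 900000 86400   # major compaction interval, 86400 s
```

With the defaults, 3600 s and 86400 s are both well above the 900 s GC wait time, so both checks pass.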
379
+
380
+
#### Disk usage thresholds
381
+
382
+
BookKeeper monitors disk usage and can automatically switch a Bookie to read-only mode to prevent disk exhaustion.
383
+
384
+
```properties
385
+
# Bookie enters read-only mode when disk usage exceeds this threshold (default: 0.95)
386
+
diskUsageThreshold=0.95
387
+
388
+
# Warning threshold — Major Compaction is paused when disk usage exceeds this value (default: 0.90)
389
+
diskUsageWarnThreshold=0.90
390
+
391
+
# Low water mark — Bookie returns to read-write mode only after disk usage drops below this value
392
+
# Set it lower than diskUsageWarnThreshold to avoid frequent mode switching (recommended: 0.87)
393
+
diskUsageLwmThreshold=0.87
394
+
```
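The gap between `diskUsageThreshold` and `diskUsageLwmThreshold` acts as hysteresis. The following sketch is a simplified model of that behavior, based only on the comments above and not on the actual BookKeeper implementation. The function name `next_mode` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical model: given the current bookie mode and the disk usage ratio,
# print the mode the bookie would be in next.
# Arguments: current mode, usage, diskUsageThreshold, diskUsageLwmThreshold.
next_mode() {
  local current=$1 usage=$2 threshold=${3:-0.95} lwm=${4:-0.87}
  awk -v cur="$current" -v u="$usage" -v t="$threshold" -v l="$lwm" 'BEGIN {
    if (u > t)                              print "read-only"   # above the hard limit
    else if (cur == "read-only" && u >= l)  print "read-only"   # not yet below the low water mark
    else                                    print "read-write"
  }'
}

next_mode read-write 0.96   # usage exceeds diskUsageThreshold
next_mode read-only  0.90   # still at or above diskUsageLwmThreshold
next_mode read-only  0.85   # dropped below diskUsageLwmThreshold
```

In this model a bookie at 90% usage stays in whichever mode it is already in, which is what prevents frequent switching.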
395
+
396
+
#### JVM configuration (bkenv.sh)
397
+
398
+
The `conf/bkenv.sh` file controls JVM parameters for the Bookie process:
399
+
400
+
- `BOOKIE_MEM`: Defaults to `-Xms2g -Xmx2g -XX:MaxDirectMemorySize=2g`. Adjust based on your storage workload. Insufficient heap memory leads to frequent GC, which increases write and read latency; under high throughput in particular, GC pauses can cause write timeouts. Direct memory is primarily used for Netty ByteBuf allocation: BookKeeper defaults to the `PooledDirect` memory allocator, which allocates all ByteBuf instances from direct memory for network I/O and internal data handling.
311
401
312
-
Once you appropriately modify the `metadataServiceUri` parameter, you can make any other configuration changes that you require. You can find a full listing of the available BookKeeper configuration parameters [here](reference-configuration.md#bookkeeper). However, consulting the [BookKeeper documentation](https://bookkeeper.apache.org/docs/next/reference/config/) for a more in-depth guide might be a better choice.
402
+
```bash
403
+
# Example: increase heap and direct memory for high-throughput workloads
Once you apply the desired configuration in `conf/bookkeeper.conf`, you can start up a bookie on each of your BookKeeper hosts. You can start up each bookie either in the background, using [nohup](https://en.wikipedia.org/wiki/Nohup), or in the foreground.
419
+
For a full listing of the available BookKeeper configuration parameters, see the [BookKeeper configuration reference](reference-configuration.md#bookkeeper). The [BookKeeper documentation](https://bookkeeper.apache.org/docs/next/reference/config/) provides a more in-depth guide.
420
+
421
+
### Start BookKeepers
422
+
423
+
With the desired configuration applied in `conf/bookkeeper.conf` and `conf/bkenv.sh`, you can start up a bookie on each of your BookKeeper hosts. You can start up each bookie either in the background, using [nohup](https://en.wikipedia.org/wiki/Nohup), or in the foreground.
315
424
316
425
To start the bookie in the background, use the [`pulsar-daemon`](reference-cli-tools.md) CLI tool:
317
426
@@ -348,6 +457,13 @@ Pulsar brokers are the last thing you need to deploy in your Pulsar cluster. Bro
348
457
349
458
### Configure Brokers
350
459
460
+
Broker configuration is split across two files:
461
+
462
+
- **`conf/broker.conf`**: Contains all Broker runtime parameters, including metadata store connection, cluster name, ports, replication settings, and feature toggles.
463
+
- **`conf/pulsar_env.sh`**: Contains JVM-related parameters for the Broker process, including memory allocation (`PULSAR_MEM`), garbage collection options (`PULSAR_GC`), and extra JVM flags (`PULSAR_EXTRA_OPTS`).
464
+
465
+
#### Metadata store and cluster settings
466
+
351
467
You can configure brokers using the `conf/broker.conf` configuration file. The most important element of broker configuration is ensuring that each broker is aware of the ZooKeeper cluster that you have deployed. Ensure that the [`metadataStoreUrl`](reference-configuration.md#broker) and [`configurationMetadataStoreUrl`](reference-configuration.md#broker) parameters are correct. In this case, since you have only one cluster and no separate configuration store, `configurationMetadataStoreUrl` points to the same value as `metadataStoreUrl`.
352
468
353
469
```properties
@@ -370,6 +486,23 @@ webServicePort=8080
370
486
webServicePortTls=8443
371
487
```
372
488
489
+
#### Managed ledger settings
490
+
491
+
These parameters control how the Broker creates BookKeeper ledgers for message storage. They map to the BookKeeper protocol's [Ensemble / Write Quorum / Ack Quorum](https://bookkeeper.apache.org/docs/getting-started/concepts/#ledgers) model:
492
+
493
+
```properties
494
+
# Ensemble size (E): number of bookies to use when creating a ledger (default: 2)
495
+
managedLedgerDefaultEnsembleSize=2
496
+
497
+
# Write quorum (Qw): number of copies to store for each entry (default: 2)
498
+
managedLedgerDefaultWriteQuorum=2
499
+
500
+
# Ack quorum (Qa): number of acks to wait before a write is considered complete (default: 2)
501
+
managedLedgerDefaultAckQuorum=2
502
+
```
503
+
504
+
The invariant **E ≥ Qw ≥ Qa** must hold; otherwise ledger creation will fail.
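To check a combination of values before editing `conf/broker.conf`, the invariant can be tested with a few lines of shell. This is a minimal sketch; the function name `check_quorum` is hypothetical and not part of Pulsar.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify E >= Qw >= Qa for the managed ledger settings.
# Arguments: ensemble size, write quorum, ack quorum.
check_quorum() {
  local e=$1 qw=$2 qa=$3
  if (( e >= qw && qw >= qa )); then
    echo "valid"
  else
    echo "invalid"
  fi
}

check_quorum 2 2 2   # the defaults shown above
check_quorum 3 2 2   # three bookies, two copies, two acks
check_quorum 2 3 2   # write quorum larger than the ensemble
```

The first two combinations satisfy the invariant. The third does not, because the write quorum exceeds the ensemble size.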
505
+
373
506
> If you deploy Pulsar in a one-node cluster, you should update the replication settings in `conf/broker.conf` to `1`.
374
507
>
375
508
> ```properties
@@ -383,6 +516,39 @@ webServicePortTls=8443
383
516
> managedLedgerDefaultAckQuorum=1
384
517
> ```
385
518
519
+
#### JVM configuration (pulsar_env.sh)
520
+
521
+
The `conf/pulsar_env.sh` file controls JVM parameters for the Broker process:
522
+
523
+
- `PULSAR_MEM`: Defaults to `-Xms2g -Xmx2g -XX:MaxDirectMemorySize=4g`. Adjust based on your machine's available memory. Insufficient heap memory leads to frequent GC, and GC pauses increase message publish and consume latency — in severe cases, Full GC can make the Broker temporarily unavailable. Direct memory is critical for the Broker's message caching and Netty I/O operations.
524
+
525
+
```bash
526
+
# Example: increase heap and direct memory for production workloads
- `PULSAR_EXTRA_OPTS`: Passes additional JVM flags to the Broker/Proxy/ZooKeeper process. Since `PULSAR_EXTRA_OPTS` is appended after other JVM options on the command line, it can also be used to **override** existing JVM parameters defined in `pulsar_env.sh` or the `bin/pulsar` startup script (later flags take precedence). Examples:
531
+
532
+
```bash
533
+
# Enable heap dump on OOM (the default script only enables ExitOnOutOfMemoryError,
534
+
# without a heap dump file you cannot diagnose the root cause after the process exits)
You can also refer to the default configuration in the [Pulsar Helm Chart values.yaml](https://github.com/apache/pulsar-helm-chart/blob/master/charts/pulsar/values.yaml) as a tuning reference.