10 changes: 5 additions & 5 deletions README.md
@@ -20,7 +20,7 @@ Visit SynchDB documentation site [here](https://docs.synchdb.com/) for more desi
SynchDB extension consists of these major components:
* Debezium Runner (Java) - Responsible for connecting to source databases and get change events.
* SynchDB Worker - Responsible for polling change events from Debezium Runner via JNI.
- * Event Processor - Reponsible for processing raw events into internal structures.
+ * Event Processor - Responsible for processing raw events into internal structures.
* Data Converter - Responsible for transforming data values.
* Replication Agent - Responsible for applying changes to PostgreSQL.

@@ -30,7 +30,7 @@ SynchDB extension consists of these major components:
The following software is required to build and run SynchDB. The versions listed are the versions tested during development. Older versions may still work.
* Java Development Kit 17 or later. Download [here](https://www.oracle.com/ca-en/java/technologies/downloads/)
* Apache Maven 3.6.3 or later. Download [here](https://maven.apache.org/download.cgi)
- * PostgreSQL source or build enviornment. Git clone [here](https://github.com/postgres/postgres). Refer to this [wiki](https://wiki.postgresql.org/wiki/Compile_and_Install_from_source_code) to build PostgreSQL from source or this [page](https://www.postgresql.org/download/linux/) to install PostgreSQL via packages
+ * PostgreSQL source or build environment. Git clone [here](https://github.com/postgres/postgres). Refer to this [wiki](https://wiki.postgresql.org/wiki/Compile_and_Install_from_source_code) to build PostgreSQL from source or this [page](https://www.postgresql.org/download/linux/) to install PostgreSQL via packages
* Docker compose 2.28.1 (for testing). Refer to [here](https://docs.docker.com/compose/install/linux/)
* Unix based operating system like Ubuntu 22.04 or MacOS

@@ -131,7 +131,7 @@ Run ldconfig to reload:
sudo ldconfig
```

- Ensure synchdo.so extension can link to libjvm Java library on your system:
+ Ensure synchdb.so extension can link to libjvm Java library on your system:
``` BASH
ldd synchdb.so
linux-vdso.so.1 (0x00007ffeae35a000)
@@ -171,9 +171,9 @@ CREATE EXTENSION synchdb CASCADE;
```

### Create a Connector
- A connector represents the details to connecto to a remote heterogeneous database and describes what tables to replicate from. It can be created with `synchdb_add_conninfo()` function.
+ A connector represents the details to connect to a remote heterogeneous database and describes what tables to replicate from. It can be created with `synchdb_add_conninfo()` function.

- Create a MySQL connector and replicate `inventory.orders` and `inventory.customers` tables under `invnetory` database:
+ Create a MySQL connector and replicate `inventory.orders` and `inventory.customers` tables under `inventory` database:
``` SQL
SELECT synchdb_add_conninfo('mysqlconn','127.0.0.1', 3306, 'mysqluser', 'mysqlpwd', 'inventory', 'postgres', 'inventory.orders,inventory.customers', 'null', 'mysql');
```
2 changes: 1 addition & 1 deletion doc/docs/en/architecture/architecture.md
@@ -1,6 +1,6 @@
# Architecture Overview

- ## **Overall Archtecture Diagram**
+ ## **Overall Architecture Diagram**

![img](../../images/synchdb-arch2.jpg)

2 changes: 1 addition & 1 deletion doc/docs/en/architecture/batch_change_handling.md
@@ -4,7 +4,7 @@
SynchDB periodically fetches a batch of change request from Debezium runner engine at a period of `synchdb.naptime` milliseconds (default 100). This batch of change request is then processed by SynchDB. If all the change requests within the batch have been processed successfully (parsed, transformed and applied to PostgreSQL), SynchDB will notify the Debezium runner engine that this batch has been completed. This signals Debezium runner to commit the offset up until the last successfully completed change record. With this mechanism in place, SynchDB is able to track each change record and instruct Debezium runner not to fetch an old change that has been processed before, or not to send a duplcate change record.

## **Batch Handling**
- SynchDB processes a batch within one transaction. This means the change events inside a batch are either all or none processed. When all the changes have been successfully processed, SynchDB simply sends a message to Debezium runner engine to mark the batch as processed and completed. This action causes offsets to be committed and eventually flush to disk. An offset represents a logical location during a replication similar to the LSN (Log Seqeuence Number) in PostgreSQL.
+ SynchDB processes a batch within one transaction. This means the change events inside a batch are either all or none processed. When all the changes have been successfully processed, SynchDB simply sends a message to Debezium runner engine to mark the batch as processed and completed. This action causes offsets to be committed and eventually flush to disk. An offset represents a logical location during a replication similar to the LSN (Log Sequence Number) in PostgreSQL.

![img](../../images/synchdb-batch-new.jpg)

2 changes: 1 addition & 1 deletion doc/docs/en/architecture/debezium_event_processor.md
@@ -265,4 +265,4 @@ the SPI Client component exists under the Replication Agent, which serves as a b

### **11) Executor APIs**

- Also residing in the Replication Agent. This component is responsible for initialize a executor context, open the table, acquire proper locks, create TupleTableSlot (TTS) from the output of DML Converter, call the executor API to execute INSERT, UPDATE, DELETE operations and do resource cleanup. This is generally a much faster approach to do data operations than SPI because it does not need to parse an input query string likst SPI does.
+ Also residing in the Replication Agent. This component is responsible for initialize a executor context, open the table, acquire proper locks, create TupleTableSlot (TTS) from the output of DML Converter, call the executor API to execute INSERT, UPDATE, DELETE operations and do resource cleanup. This is generally a much faster approach to do data operations than SPI because it does not need to parse an input query string like SPI does.
2 changes: 1 addition & 1 deletion doc/docs/en/architecture/debezium_runner_components.md
@@ -4,7 +4,7 @@

![img](../../images/synchdb-dbzrunner-component2.jpg)

- Debezium Runner resides on Java side of the deployment. It is the main faciliator between embedded Debezium engine (Java) and SynchDB Worker (C). It provides several Java methods that SynchDB worker can interact via JNI library. These interactions include initializing a Debezium engine, start or stop the engine, obtain a batch of change events and mark a batch as done. These operations are essential for ensuring replication consistency. Main components are:
+ Debezium Runner resides on Java side of the deployment. It is the main facilitator between embedded Debezium engine (Java) and SynchDB Worker (C). It provides several Java methods that SynchDB worker can interact via JNI library. These interactions include initializing a Debezium engine, start or stop the engine, obtain a batch of change events and mark a batch as done. These operations are essential for ensuring replication consistency. Main components are:

1. Parameter Class
2. Controller
2 changes: 1 addition & 1 deletion doc/docs/en/architecture/fdw_based_snapshot.md
@@ -36,7 +36,7 @@ WARNING: **BACKUP_ADMIN permission is required to obtain the "cut-point" paramet

* Before snapshot begins, read the current SCN value, which serves as a "cut-off" point for the snapshot
* During foreign table schema migration, extra attribute "AS OF SCN xxx" will be associated with each desired foreign table,causing all the foreign reads to use Oracle's FLASHBACK query
- * FLASHBACK query returns the table results as of the SCN specified so consistency is automatically guarenteed. No extra locking needed.
+ * FLASHBACK query returns the table results as of the SCN specified so consistency is automatically guaranteed. No extra locking needed.
* Migrate all desired tables schema and data with proper type translations with FLASHBACK query.
* Once done, the CDC can resume from the cut-off point, which will handle the data changes that happened during the snapshot.

2 changes: 1 addition & 1 deletion doc/docs/en/architecture/non_native_datatype_handling.md
@@ -2,7 +2,7 @@

## **Handling Non-Native Data Types**

- It is possible that a table contains a column data type that is custom created by the user or created by another installed extension. In this case, it cannot be processed using tradition native data type handling becasue the type is most likely not supported natively. Instead, the DML Converter accesses the catalog, obtains the OID of the non-native data type, and looks up its "category" as defined in PostgreSQL. Below is a list of category supported by PostgreSQL as of version 17:
+ It is possible that a table contains a column data type that is custom created by the user or created by another installed extension. In this case, it cannot be processed using tradition native data type handling because the type is most likely not supported natively. Instead, the DML Converter accesses the catalog, obtains the OID of the non-native data type, and looks up its "category" as defined in PostgreSQL. Below is a list of category supported by PostgreSQL as of version 17:

```
#define TYPCATEGORY_INVALID '\0'
Original file line number Diff line number Diff line change
@@ -29,7 +29,7 @@ The Oracle Parser is responsible for parsing a Oracle query (DDL only) and produ

The JSON Parser is responsible for parsing the incoming JSON change event into C structures that SynchDB can work with. SynchDB relies on PostgreSQL's native JSONB utility for all the parsing and iteration needs. Each DML event contains the `scn` and `commit scn` values, tells how each column value is represented based on data types, and the before / after values.

- Unlink a DDL event from Debezium, Openlog Replicator's DDL Event contains the raw Oracle DDL query instead of a broken-down structure. This means that a `2) Oracle parser` is required to parse this DDL query further to learn about its intended actions.
+ Unlike a DDL event from Debezium, Openlog Replicator's DDL Event contains the raw Oracle DDL query instead of a broken-down structure. This means that a `2) Oracle parser` is required to parse this DDL query further to learn about its intended actions.

**DML payload:**
```json
@@ -165,7 +165,7 @@ The following Oracle features declared in DDL commands are not supported by Open
* Index organized tables (IOT)
* `CREATE TABLE AS` clauses
* `CREATE TYPE` clauses
- * `CREATE TABLE OF` caluses
+ * `CREATE TABLE OF` clauses
* `ALTER TABLE MODIFY name DEFAULT`
* `ALTER TABLE MODIFY name NOT NULL`
* `ALTER TABLE MODIFY name NULL`
@@ -174,7 +174,7 @@ The following Oracle features declared in DDL commands are not supported by Open
* `ALTER TABLE RENAME`


- The following constraints clauses are accpeted but ignored by Openlog Replicator connector:
+ The following constraints clauses are accepted but ignored by Openlog Replicator connector:

* ENABLE VALIDATE
* ENABLE NOVALIDATE
@@ -229,4 +229,4 @@ the SPI Client component exists under the Replication Agent, which serves as a b

### **10) Executor APIs**

- Also residing in the Replication Agent. This component is responsible for initialize a executor context, open the table, acquire proper locks, create TupleTableSlot (TTS) from the output of DML Converter, call the executor API to execute INSERT, UPDATE, DELETE operations and do resource cleanup. This is generally a much faster approach to do data operations than SPI because it does not need to parse an input query string likst SPI does.
+ Also residing in the Replication Agent. This component is responsible for initialize a executor context, open the table, acquire proper locks, create TupleTableSlot (TTS) from the output of DML Converter, call the executor API to execute INSERT, UPDATE, DELETE operations and do resource cleanup. This is generally a much faster approach to do data operations than SPI because it does not need to parse an input query string like SPI does.
2 changes: 1 addition & 1 deletion doc/docs/en/changelog.md
@@ -41,7 +41,7 @@ SynchDB 1.3 delivers a major performance enhancement with the new FDW-based snap
### Changed

* Openlog Replicator Connector: Enhanced Oracle parser to support more contraint operators: enable, disable, novalidate, validate
- * Openlog Replicator Connector: Enhanced Oracle parser to support MODIFY clauses with and without paraenthesis
+ * Openlog Replicator Connector: Enhanced Oracle parser to support MODIFY clauses with and without parenthesis
* Openlog Replicator Connector: Enhanced Oracle parser to support DEFAULT ON NULL clauses.
* Openlog Replicator Connector: Improve processing performance by removing one pre-scan loop.
* Openlog Replicator Connector: Optimized non-null terminated event processing as PostgreSQL text type to save one copy operation. (PG17+)
8 changes: 4 additions & 4 deletions doc/docs/en/getting-started/configuration.md
@@ -63,7 +63,7 @@ synchdb.dbz_offset_flush_interval_ms=60000 # Flush
synchdb.dbz_capture_only_selected_table_ddl=false # Debezium will only capture the schema of selected tables rather than all tables
synchdb.max_connector_workers=10 # 10 connector workers can be run at a time
synchdb.error_handling_strategy='retry' # connector should retry on error
- synchdb.dbz_log_leve='error' # Debezium Runner should log error messages only
+ synchdb.dbz_log_level='error' # Debezium Runner should log error messages only
synchdb.log_change_on_error=true # log JSON change event on error
synchdb.cdc_start_delay_ms=30000 # wait 30s after snapshot completes and before CDC begins
synchdb.olr_snapshot_engine="fdw" # use FDW based snapshot engine to complete the snapshot process
@@ -119,11 +119,11 @@ synchdb.letter_casing_strategy="asis" # preser
10. **synchdb.dbz_incremental_snapshot_chunk_size**
- Lower values: Slower processing of change events at lower JVM memory usage during incremental snapshot
- Higher values: Faster processing of change events at higher JVM memory usage during incremental snapshot
- - Recommended to set it the same as `synchdb.dbz_batch_size` and adjust Adjust based on resource requirements
+ - Recommended to set it the same as `synchdb.dbz_batch_size` and adjust based on resource requirements

11. **synchdb.dbz_offset_flush_interval_ms**
- - Lower values: More frequent update to offset file, more IO, less old batches to re-preocess after fault restored
- - Higher values: Less frequent update to offset file, less IO, more old batches to re-preocess after fault restored
+ - Lower values: More frequent update to offset file, more IO, less old batches to re-process after fault restored
+ - Higher values: Less frequent update to offset file, less IO, more old batches to re-process after fault restored
- Recommended to set it to 60000 as Debezium's recommendation

12. **synchdb.max_connector_workers**
2 changes: 1 addition & 1 deletion doc/docs/en/getting-started/installation.md
@@ -216,7 +216,7 @@ Run ldconfig to reload:
sudo ldconfig
```

- Ensure synchdo.so extension can link to libjvm Java library on your system:
+ Ensure synchdb.so extension can link to libjvm Java library on your system:
``` BASH
ldd synchdb.so
linux-vdso.so.1 (0x00007ffeae35a000)
4 changes: 2 additions & 2 deletions doc/docs/en/getting-started/quick_start.md
@@ -485,7 +485,7 @@ By default, the connector will perform a `initial` snapshot to capture both the

## Simulate an INSERT Event and Observe CDC

- We can use `docker exec` to similate an INSERT for each connector type and observe the Change Data Capture (CDC).
+ We can use `docker exec` to simulate an INSERT for each connector type and observe the Change Data Capture (CDC).

**MySQL:**
```bash
@@ -679,4 +679,4 @@ SELECT synchdb_del_conninfo('olrconn');
SELECT synchdb_stop_engine_bgw('postgresconn');
SELECT synchdb_del_conninfo('postgresconn');

- ```
+ ```
2 changes: 1 addition & 1 deletion doc/docs/en/getting-started/remote_database_setups.md
@@ -264,7 +264,7 @@ sqlplus sys/oracle@//localhost:1521/FREE as sysdba

### **Enable Supplemental Log Data for Tables Designated for Capture**

- This configuration needs to be run on each table designzted for catpure in order to correctly handle the UPDATE and DELETE operations.
+ This configuration needs to be run on each table designated for capture in order to correctly handle the UPDATE and DELETE operations.

```sql
ALTER TABLE customer ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
2 changes: 1 addition & 1 deletion doc/docs/en/index.md
@@ -34,7 +34,7 @@ This architecture allows PostgreSQL to leverage the rich ecosystem of Debezium c
- PostgreSQL: 16, 17, 18
- IvorySQL: 4, 5

- ## **Supported Source Databasess**
+ ## **Supported Source Databases**

- MySQL: 8.0.x, 8.2
- SQL Server: 2017, 2019, 2022
4 changes: 2 additions & 2 deletions doc/docs/en/monitoring/state_view.md
@@ -25,7 +25,7 @@ Column Details:
| stage | the stage of the connector. See below.|
| state | the state of the connector. See below.|
| err | the last error message encountered by the worker which would have caused it to exit. This error could originated from PostgreSQL while processing a change, or originated from Debezium running engine while accessing data from heterogeneous database. |
- | last_dbz_offset | the last Debezium offset captured by synchdb. Note that this may not reflect the current and real-time offset value of the connector engine. Rather, this is shown as a checkpoint that we could restart from this offeet point if needed.|
+ | last_dbz_offset | the last Debezium offset captured by synchdb. Note that this may not reflect the current and real-time offset value of the connector engine. Rather, this is shown as a checkpoint that we could restart from this offset point if needed.|

**Possible States**:

@@ -38,7 +38,7 @@ Column Details:
- ⚪ `executing` - Applying changes
- 🟤 `updating offset` - Updating checkpoint
- 🟨 `restarting` - Reinitializing
- - ⚪ `dumping memory` - JVM is prepaaring to dump memory info in log file
+ - ⚪ `dumping memory` - JVM is preparing to dump memory info in log file
- ⚫ `unknown` - Indeterminate state

**Possible Stages**:
2 changes: 1 addition & 1 deletion doc/docs/en/user-guide/configure_snapshot_engine.md
@@ -143,4 +143,4 @@ sudo make install

```

- postgre is reasdy to go. Start a connector normally with synchdb.olr_snapshot_engine set to 'fdw'. If a snapshot is required, SynchDB will complete it via FDW. You do not have to run `CREATE EXTENSION mysql_fdw` prior to using FDW based initial snapshot, nor do you have to `CREATE SERVER` or `CREATE USER MAPPING`. SynchDB takes care of all of these when it performs the snapshot..
+ postgres is ready to go. Start a connector normally with synchdb.olr_snapshot_engine set to 'fdw'. If a snapshot is required, SynchDB will complete it via FDW. You do not have to run `CREATE EXTENSION mysql_fdw` prior to using FDW based initial snapshot, nor do you have to `CREATE SERVER` or `CREATE USER MAPPING`. SynchDB takes care of all of these when it performs the snapshot..
2 changes: 1 addition & 1 deletion doc/docs/en/user-guide/create_a_connector.md
@@ -8,7 +8,7 @@ Creating a connector can be done with utility SQL function `synchdb_add_conninfo

synchdb_add_conninfo takes these arguments:

- | argumet | description |
+ | argument | description |
|-------------------- |-|
| name | a unique identifier that represents this connector info |
| hostname | the IP address or hostname of the heterogeneous database. |
2 changes: 1 addition & 1 deletion doc/docs/en/user-guide/default_datatype_mapping.md
@@ -236,7 +236,7 @@ DatatypeHashEntry postgres_defaultTypeMappings[] =
{{"cidr", false}, "cidr", 0},
{{"circle", false}, "circle", 0},
{{"date", false}, "date", 0},
- {{"decimal", false}, "dedcimal", -1},
+ {{"decimal", false}, "decimal", -1},
{{"double precision", false}, "double precision", 0},
{{"float", false}, "float", 0},
{{"float4", false}, "float4", 0},
2 changes: 1 addition & 1 deletion doc/docs/en/user-guide/start_stop_connector.md
@@ -2,7 +2,7 @@

## **Control a Connector**

- SynchDB provides several utility function to control the behavior and life cycle or a created connector.
+ SynchDB provides several utility function to control the behavior and life cycle of a created connector.

## **Start a Connector with Default Snapshot Mode**
