diff --git a/README.md b/README.md index 46e7a40..918deb3 100644 --- a/README.md +++ b/README.md @@ -20,7 +20,7 @@ Visit SynchDB documentation site [here](https://docs.synchdb.com/) for more desi SynchDB extension consists of these major components: * Debezium Runner (Java) - Responsible for connecting to source databases and get change events. * SynchDB Worker - Responsible for polling change events from Debezium Runner via JNI. -* Event Processor - Reponsible for processing raw events into internal structures. +* Event Processor - Responsible for processing raw events into internal structures. * Data Converter - Responsible for transforming data values. * Replication Agent - Responsible for applying changes to PostgreSQL. @@ -30,7 +30,7 @@ SynchDB extension consists of these major components: The following software is required to build and run SynchDB. The versions listed are the versions tested during development. Older versions may still work. * Java Development Kit 17 or later. Download [here](https://www.oracle.com/ca-en/java/technologies/downloads/) * Apache Maven 3.6.3 or later. Download [here](https://maven.apache.org/download.cgi) -* PostgreSQL source or build enviornment. Git clone [here](https://github.com/postgres/postgres). Refer to this [wiki](https://wiki.postgresql.org/wiki/Compile_and_Install_from_source_code) to build PostgreSQL from source or this [page](https://www.postgresql.org/download/linux/) to install PostgreSQL via packages +* PostgreSQL source or build environment. Git clone [here](https://github.com/postgres/postgres). Refer to this [wiki](https://wiki.postgresql.org/wiki/Compile_and_Install_from_source_code) to build PostgreSQL from source or this [page](https://www.postgresql.org/download/linux/) to install PostgreSQL via packages * Docker compose 2.28.1 (for testing). Refer to [here](https://docs.docker.com/compose/install/linux/) * Unix based operating system like Ubuntu 22.04 or MacOS @@ -131,7 +131,7 @@ Run ldconfig to reload: sudo ldconfig ``` -Ensure synchdo.so extension can link to libjvm Java library on your system: +Ensure synchdb.so extension can link to libjvm Java library on your system: ``` BASH ldd synchdb.so linux-vdso.so.1 (0x00007ffeae35a000) @@ -171,9 +171,9 @@ CREATE EXTENSION synchdb CASCADE; ``` ### Create a Connector -A connector represents the details to connecto to a remote heterogeneous database and describes what tables to replicate from. It can be created with `synchdb_add_conninfo()` function. +A connector represents the details to connect to a remote heterogeneous database and describes what tables to replicate from. It can be created with `synchdb_add_conninfo()` function. -Create a MySQL connector and replicate `inventory.orders` and `inventory.customers` tables under `invnetory` database: +Create a MySQL connector and replicate `inventory.orders` and `inventory.customers` tables under `inventory` database: ``` SQL SELECT synchdb_add_conninfo('mysqlconn','127.0.0.1', 3306, 'mysqluser', 'mysqlpwd', 'inventory', 'postgres', 'inventory.orders,inventory.customers', 'null', 'mysql'); ``` diff --git a/doc/docs/en/architecture/architecture.md b/doc/docs/en/architecture/architecture.md index 033a7d3..c04b28c 100644 --- a/doc/docs/en/architecture/architecture.md +++ b/doc/docs/en/architecture/architecture.md @@ -1,6 +1,6 @@ # Architecture Overview -## **Overall Archtecture Diagram** +## **Overall Architecture Diagram** ![img](../../images/synchdb-arch2.jpg) diff --git a/doc/docs/en/architecture/batch_change_handling.md b/doc/docs/en/architecture/batch_change_handling.md index 8276d05..ced6e93 100644 --- a/doc/docs/en/architecture/batch_change_handling.md +++ b/doc/docs/en/architecture/batch_change_handling.md @@ -4,7 +4,7 @@ SynchDB periodically fetches a batch of change request from Debezium runner engine at a period of `synchdb.naptime` milliseconds (default 100). This batch of change request is then processed by SynchDB. If all the change requests within the batch have been processed successfully (parsed, transformed and applied to PostgreSQL), SynchDB will notify the Debezium runner engine that this batch has been completed. This signals Debezium runner to commit the offset up until the last successfully completed change record. With this mechanism in place, SynchDB is able to track each change record and instruct Debezium runner not to fetch an old change that has been processed before, or not to send a duplcate change record. ## **Batch Handling** -SynchDB processes a batch within one transaction. This means the change events inside a batch are either all or none processed. When all the changes have been successfully processed, SynchDB simply sends a message to Debezium runner engine to mark the batch as processed and completed. This action causes offsets to be committed and eventually flush to disk. An offset represents a logical location during a replication similar to the LSN (Log Seqeuence Number) in PostgreSQL. +SynchDB processes a batch within one transaction. This means the change events inside a batch are either all or none processed. When all the changes have been successfully processed, SynchDB simply sends a message to Debezium runner engine to mark the batch as processed and completed. This action causes offsets to be committed and eventually flush to disk. An offset represents a logical location during a replication similar to the LSN (Log Sequence Number) in PostgreSQL. ![img](../../images/synchdb-batch-new.jpg) diff --git a/doc/docs/en/architecture/debezium_event_processor.md b/doc/docs/en/architecture/debezium_event_processor.md index cb0144c..ccc6f6c 100644 --- a/doc/docs/en/architecture/debezium_event_processor.md +++ b/doc/docs/en/architecture/debezium_event_processor.md @@ -265,4 +265,4 @@ the SPI Client component exists under the Replication Agent, which serves as a b ### **11) Executor APIs** -Also residing in the Replication Agent. This component is responsible for initialize a executor context, open the table, acquire proper locks, create TupleTableSlot (TTS) from the output of DML Converter, call the executor API to execute INSERT, UPDATE, DELETE operations and do resource cleanup. This is generally a much faster approach to do data operations than SPI because it does not need to parse an input query string likst SPI does. \ No newline at end of file +Also residing in the Replication Agent. This component is responsible for initialize a executor context, open the table, acquire proper locks, create TupleTableSlot (TTS) from the output of DML Converter, call the executor API to execute INSERT, UPDATE, DELETE operations and do resource cleanup. This is generally a much faster approach to do data operations than SPI because it does not need to parse an input query string like SPI does. \ No newline at end of file diff --git a/doc/docs/en/architecture/debezium_runner_components.md b/doc/docs/en/architecture/debezium_runner_components.md index 403bedf..98ab273 100644 --- a/doc/docs/en/architecture/debezium_runner_components.md +++ b/doc/docs/en/architecture/debezium_runner_components.md @@ -4,7 +4,7 @@ ![img](../../images/synchdb-dbzrunner-component2.jpg) -Debezium Runner resides on Java side of the deployment. It is the main faciliator between embedded Debezium engine (Java) and SynchDB Worker (C). It provides several Java methods that SynchDB worker can interact via JNI library. These interactions include initializing a Debezium engine, start or stop the engine, obtain a batch of change events and mark a batch as done. These operations are essential for ensuring replication consistency. Main components are: +Debezium Runner resides on Java side of the deployment. It is the main facilitator between embedded Debezium engine (Java) and SynchDB Worker (C). It provides several Java methods that SynchDB worker can interact via JNI library. These interactions include initializing a Debezium engine, start or stop the engine, obtain a batch of change events and mark a batch as done. These operations are essential for ensuring replication consistency. Main components are: 1. Parameter Class 2. Controller diff --git a/doc/docs/en/architecture/fdw_based_snapshot.md b/doc/docs/en/architecture/fdw_based_snapshot.md index 4d52fb3..45d386c 100644 --- a/doc/docs/en/architecture/fdw_based_snapshot.md +++ b/doc/docs/en/architecture/fdw_based_snapshot.md @@ -36,7 +36,7 @@ WARNING: **BACKUP_ADMIN permission is required to obtain the "cut-point" paramet * Before snapshot begins, read the current SCN value, which serves as a "cut-off" point for the snapshot * During foreign table schema migration, extra attribute "AS OF SCN xxx" will be associated with each desired foreign table,causing all the foreign reads to use Oracle's FLASHBACK query -* FLASHBACK query returns the table results as of the SCN specified so consistency is automatically guarenteed. No extra locking needed. +* FLASHBACK query returns the table results as of the SCN specified so consistency is automatically guaranteed. No extra locking needed. * Migrate all desired tables schema and data with proper type translations with FLASHBACK query. * Once done, the CDC can resume from the cut-off point, which will handle the data changes that happened during the snapshot. diff --git a/doc/docs/en/architecture/non_native_datatype_handling.md b/doc/docs/en/architecture/non_native_datatype_handling.md index 0603b60..189fc44 100644 --- a/doc/docs/en/architecture/non_native_datatype_handling.md +++ b/doc/docs/en/architecture/non_native_datatype_handling.md @@ -2,7 +2,7 @@ ## **Handling Non-Native Data Types** -It is possible that a table contains a column data type that is custom created by the user or created by another installed extension. In this case, it cannot be processed using tradition native data type handling becasue the type is most likely not supported natively. Instead, the DML Converter accesses the catalog, obtains the OID of the non-native data type, and looks up its "category" as defined in PostgreSQL. Below is a list of category supported by PostgreSQL as of version 17: +It is possible that a table contains a column data type that is custom created by the user or created by another installed extension. In this case, it cannot be processed using tradition native data type handling because the type is most likely not supported natively. Instead, the DML Converter accesses the catalog, obtains the OID of the non-native data type, and looks up its "category" as defined in PostgreSQL. Below is a list of category supported by PostgreSQL as of version 17: ``` #define TYPCATEGORY_INVALID '\0' diff --git a/doc/docs/en/architecture/openlog_replicator_event_processor.md b/doc/docs/en/architecture/openlog_replicator_event_processor.md index 6d8cebc..66a30c1 100644 --- a/doc/docs/en/architecture/openlog_replicator_event_processor.md +++ b/doc/docs/en/architecture/openlog_replicator_event_processor.md @@ -29,7 +29,7 @@ The Oracle Parser is responsible for parsing a Oracle query (DDL only) and produ The JSON Parser is responsible for parsing the incoming JSON change event into C structures that SynchDB can work with. SynchDB relies on PostgreSQL's native JSONB utility for all the parsing and iteration needs. Each DML event contains the `scn` and `commit scn` values, tells how each column value is represented based on data types, and the before / after values. -Unlink a DDL event from Debezium, Openlog Replicator's DDL Event contains the raw Oracle DDL query instead of a broken-down structure. This means that a `2) Oracle parser` is required to parse this DDL query further to learn about its intended actions. +Unlike a DDL event from Debezium, Openlog Replicator's DDL Event contains the raw Oracle DDL query instead of a broken-down structure. This means that a `2) Oracle parser` is required to parse this DDL query further to learn about its intended actions. **DML payload:** ```json @@ -165,7 +165,7 @@ The following Oracle features declared in DDL commands are not supported by Open * Index organized tables (IOT) * `CREATE TABLE AS` clauses * `CREATE TYPE` clauses -* `CREATE TABLE OF` caluses +* `CREATE TABLE OF` clauses * `ALTER TABLE MODIFY name DEFAULT` * `ALTER TABLE MODIFY name NOT NULL` * `ALTER TABLE MODIFY name NULL` @@ -174,7 +174,7 @@ The following Oracle features declared in DDL commands are not supported by Open * `ALTER TABLE RENAME` -The following constraints clauses are accpeted but ignored by Openlog Replicator connector: +The following constraints clauses are accepted but ignored by Openlog Replicator connector: * ENABLE VALIDATE * ENABLE NOVALIDATE @@ -229,4 +229,4 @@ the SPI Client component exists under the Replication Agent, which serves as a b ### **10) Executor APIs** -Also residing in the Replication Agent. This component is responsible for initialize a executor context, open the table, acquire proper locks, create TupleTableSlot (TTS) from the output of DML Converter, call the executor API to execute INSERT, UPDATE, DELETE operations and do resource cleanup. This is generally a much faster approach to do data operations than SPI because it does not need to parse an input query string likst SPI does. \ No newline at end of file +Also residing in the Replication Agent. This component is responsible for initialize a executor context, open the table, acquire proper locks, create TupleTableSlot (TTS) from the output of DML Converter, call the executor API to execute INSERT, UPDATE, DELETE operations and do resource cleanup. This is generally a much faster approach to do data operations than SPI because it does not need to parse an input query string like SPI does. \ No newline at end of file diff --git a/doc/docs/en/changelog.md b/doc/docs/en/changelog.md index dc23684..6fc4710 100644 --- a/doc/docs/en/changelog.md +++ b/doc/docs/en/changelog.md @@ -41,7 +41,7 @@ SynchDB 1.3 delivers a major performance enhancement with the new FDW-based snap ### Changed * Openlog Replicator Connector: Enhanced Oracle parser to support more contraint operators: enable, disable, novalidate, validate -* Openlog Replicator Connector: Enhanced Oracle parser to support MODIFY clauses with and without paraenthesis +* Openlog Replicator Connector: Enhanced Oracle parser to support MODIFY clauses with and without parenthesis * Openlog Replicator Connector: Enhanced Oracle parser to support DEFAULT ON NULL clauses. * Openlog Replicator Connector: Improve processing performance by removing one pre-scan loop. * Openlog Replicator Connector: Optimized non-null terminated event processing as PostgreSQL text type to save one copy operation. (PG17+) diff --git a/doc/docs/en/getting-started/configuration.md b/doc/docs/en/getting-started/configuration.md index 2a1fcba..6553997 100644 --- a/doc/docs/en/getting-started/configuration.md +++ b/doc/docs/en/getting-started/configuration.md @@ -63,7 +63,7 @@ synchdb.dbz_offset_flush_interval_ms=60000 # Flush synchdb.dbz_capture_only_selected_table_ddl=false # Debezium will only capture the schema of selected tables rather than all tables synchdb.max_connector_workers=10 # 10 connector workers can be run at a time synchdb.error_handling_strategy='retry' # connector should retry on error -synchdb.dbz_log_leve='error' # Debezium Runner should log error messages only +synchdb.dbz_log_level='error' # Debezium Runner should log error messages only synchdb.log_change_on_error=true # log JSON change event on error synchdb.cdc_start_delay_ms=30000 # wait 30s after snapshot completes and before CDC begins synchdb.olr_snapshot_engine="fdw" # use FDW based snapshot engine to complete the snapshot process @@ -119,11 +119,11 @@ synchdb.letter_casing_strategy="asis" # preser 10. **synchdb.dbz_incremental_snapshot_chunk_size** - Lower values: Slower processing of change events at lower JVM memory usage during incremental snapshot - Higher values: Faster processing of change events at higher JVM memory usage during incremental snapshot - - Recommended to set it the same as `synchdb.dbz_batch_size` and adjust Adjust based on resource requirements + - Recommended to set it the same as `synchdb.dbz_batch_size` and adjust based on resource requirements 11. **synchdb.dbz_offset_flush_interval_ms** - - Lower values: More frequent update to offset file, more IO, less old batches to re-preocess after fault restored - - Higher values: Less frequent update to offset file, less IO, more old batches to re-preocess after fault restored + - Lower values: More frequent update to offset file, more IO, less old batches to re-process after fault restored + - Higher values: Less frequent update to offset file, less IO, more old batches to re-process after fault restored - Recommended to set it to 60000 as Debezium's recommendation 12. **synchdb.max_connector_workers** diff --git a/doc/docs/en/getting-started/installation.md b/doc/docs/en/getting-started/installation.md index 82fe328..72f5273 100644 --- a/doc/docs/en/getting-started/installation.md +++ b/doc/docs/en/getting-started/installation.md @@ -216,7 +216,7 @@ Run ldconfig to reload: sudo ldconfig ``` -Ensure synchdo.so extension can link to libjvm Java library on your system: +Ensure synchdb.so extension can link to libjvm Java library on your system: ``` BASH ldd synchdb.so linux-vdso.so.1 (0x00007ffeae35a000) diff --git a/doc/docs/en/getting-started/quick_start.md b/doc/docs/en/getting-started/quick_start.md index 1a82049..8fe1c8d 100644 --- a/doc/docs/en/getting-started/quick_start.md +++ b/doc/docs/en/getting-started/quick_start.md @@ -485,7 +485,7 @@ By default, the connector will perform a `initial` snapshot to capture both the ## Simulate an INSERT Event and Observe CDC -We can use `docker exec` to similate an INSERT for each connector type and observe the Change Data Capture (CDC). +We can use `docker exec` to simulate an INSERT for each connector type and observe the Change Data Capture (CDC). **MySQL:** ```bash @@ -679,4 +679,4 @@ SELECT synchdb_del_conninfo('olrconn'); SELECT synchdb_stop_engine_bgw('postgresconn'); SELECT synchdb_del_conninfo('postgresconn'); -``` \ No newline at end of file +``` diff --git a/doc/docs/en/getting-started/remote_database_setups.md b/doc/docs/en/getting-started/remote_database_setups.md index 57d22ca..d135d77 100644 --- a/doc/docs/en/getting-started/remote_database_setups.md +++ b/doc/docs/en/getting-started/remote_database_setups.md @@ -264,7 +264,7 @@ sqlplus sys/oracle@//localhost:1521/FREE as sysdba ### **Enable Supplemental Log Data for Tables Designated for Capture** -This configuration needs to be run on each table designzted for catpure in order to correctly handle the UPDATE and DELETE operations. +This configuration needs to be run on each table designated for capture in order to correctly handle the UPDATE and DELETE operations. ```sql ALTER TABLE customer ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS; diff --git a/doc/docs/en/index.md b/doc/docs/en/index.md index c81765a..79d26e6 100644 --- a/doc/docs/en/index.md +++ b/doc/docs/en/index.md @@ -34,7 +34,7 @@ This architecture allows PostgreSQL to leverage the rich ecosystem of Debezium c - PostgreSQL: 16, 17, 18 - IvorySQL: 4, 5 -## **Supported Source Databasess** +## **Supported Source Databases** - MySQL: 8.0.x, 8.2 - SQL Server: 2017, 2019, 2022 diff --git a/doc/docs/en/monitoring/state_view.md b/doc/docs/en/monitoring/state_view.md index 94385d7..0c45963 100644 --- a/doc/docs/en/monitoring/state_view.md +++ b/doc/docs/en/monitoring/state_view.md @@ -25,7 +25,7 @@ Column Details: | stage | the stage of the connector. See below.| | state | the state of the connector. See below.| | err | the last error message encountered by the worker which would have caused it to exit. This error could originated from PostgreSQL while processing a change, or originated from Debezium running engine while accessing data from heterogeneous database. | -| last_dbz_offset | the last Debezium offset captured by synchdb. Note that this may not reflect the current and real-time offset value of the connector engine. Rather, this is shown as a checkpoint that we could restart from this offeet point if needed.| +| last_dbz_offset | the last Debezium offset captured by synchdb. Note that this may not reflect the current and real-time offset value of the connector engine. Rather, this is shown as a checkpoint that we could restart from this offset point if needed.| **Possible States**: @@ -38,7 +38,7 @@ Column Details: - ⚪ `executing` - Applying changes - 🟤 `updating offset` - Updating checkpoint - 🟨 `restarting` - Reinitializing -- ⚪ `dumping memory` - JVM is prepaaring to dump memory info in log file +- ⚪ `dumping memory` - JVM is preparing to dump memory info in log file - ⚫ `unknown` - Indeterminate state **Possible Stages**: diff --git a/doc/docs/en/user-guide/configure_snapshot_engine.md b/doc/docs/en/user-guide/configure_snapshot_engine.md index 48aff28..4f7919d 100644 --- a/doc/docs/en/user-guide/configure_snapshot_engine.md +++ b/doc/docs/en/user-guide/configure_snapshot_engine.md @@ -143,4 +143,4 @@ sudo make install ``` -postgre is reasdy to go. Start a connector normally with synchdb.olr_snapshot_engine set to 'fdw'. If a snapshot is required, SynchDB will complete it via FDW. You do not have to run `CREATE EXTENSION mysql_fdw` prior to using FDW based initial snapshot, nor do you have to `CREATE SERVER` or `CREATE USER MAPPING`. SynchDB takes care of all of these when it performs the snapshot.. \ No newline at end of file +postgres is ready to go. Start a connector normally with synchdb.olr_snapshot_engine set to 'fdw'. If a snapshot is required, SynchDB will complete it via FDW. You do not have to run `CREATE EXTENSION mysql_fdw` prior to using FDW based initial snapshot, nor do you have to `CREATE SERVER` or `CREATE USER MAPPING`. SynchDB takes care of all of these when it performs the snapshot.. \ No newline at end of file diff --git a/doc/docs/en/user-guide/create_a_connector.md b/doc/docs/en/user-guide/create_a_connector.md index 110d51b..294d51e 100644 --- a/doc/docs/en/user-guide/create_a_connector.md +++ b/doc/docs/en/user-guide/create_a_connector.md @@ -8,7 +8,7 @@ Creating a connector can be done with utility SQL function `synchdb_add_conninfo synchdb_add_conninfo takes these arguments: -| argumet | description | +| argument | description | |-------------------- |-| | name | a unique identifier that represents this connector info | | hostname | the IP address or hostname of the heterogeneous database. | diff --git a/doc/docs/en/user-guide/default_datatype_mapping.md b/doc/docs/en/user-guide/default_datatype_mapping.md index 2be9ba1..712efbc 100644 --- a/doc/docs/en/user-guide/default_datatype_mapping.md +++ b/doc/docs/en/user-guide/default_datatype_mapping.md @@ -236,7 +236,7 @@ DatatypeHashEntry postgres_defaultTypeMappings[] = {{"cidr", false}, "cidr", 0}, {{"circle", false}, "circle", 0}, {{"date", false}, "date", 0}, - {{"decimal", false}, "dedcimal", -1}, + {{"decimal", false}, "decimal", -1}, {{"double precision", false}, "double precision", 0}, {{"float", false}, "float", 0}, {{"float4", false}, "float4", 0}, diff --git a/doc/docs/en/user-guide/start_stop_connector.md b/doc/docs/en/user-guide/start_stop_connector.md index 9daabb0..6508e05 100644 --- a/doc/docs/en/user-guide/start_stop_connector.md +++ b/doc/docs/en/user-guide/start_stop_connector.md @@ -2,7 +2,7 @@ ## **Control a Connector** -SynchDB provides several utility function to control the behavior and life cycle or a created connector. +SynchDB provides several utility function to control the behavior and life cycle of a created connector. ## **Start a Connector with Default Snapshot Mode**