The component supports two access modes: Direct Blob Storage and Unity Catalog (Databricks).
Direct Blob Storage: Direct access to Delta tables in your blob storage. We currently support the following providers:
- AWS S3: Access Grants Credentials
- Azure Blob Storage: Create SAS Tokens
- Google Cloud Storage: Create service account
In this mode, the Delta table path is defined by specifying the bucket/container and the blob location where the table data is stored; a minimal write sketch follows.
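To illustrate this mode, here is a minimal sketch using the open-source deltalake (delta-rs) package, which writes Delta tables directly to blob storage. The URI, account name, and SAS token are placeholders, the exact storage_options keys vary by provider and library version, and the component's internal implementation may differ.

```python
import pyarrow as pa
from deltalake import write_deltalake

# Example input data for the write.
data = pa.table({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# The Delta table path: container plus blob location in the storage account.
table_uri = "abfss://my-container@myaccount.dfs.core.windows.net/tables/orders"

write_deltalake(
    table_uri,
    data,
    mode="append",
    # Credentials per provider; for AWS S3 or Google Cloud Storage you would
    # pass the corresponding keys (e.g., AWS_ACCESS_KEY_ID or a service
    # account) instead of the Azure ones shown here.
    storage_options={
        "azure_storage_account_name": "myaccount",
        "azure_storage_sas_token": "<sas-token>",  # the SAS token created above
    },
)
```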
Unity Catalog (Databricks): In this mode, we currently support only the Azure Blob Storage backend.
Setup Requirements:
- Access Token: How to get access token in Databricks
- External Data Access: Enable external data access on the metastore
- Permissions: Grant the EXTERNAL USE SCHEMA permission (see the sketch after this list). In the Databricks UI, navigate to Workspace > Permissions > Add external use schema.
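As a sketch of the permission step, the same grant can also be issued over SQL with the databricks-sql-connector package; the hostname, HTTP path, token, schema, and principal below are placeholders.

```python
from databricks import sql

# Placeholders: fill in your workspace host, warehouse HTTP path, and token.
with sql.connect(
    server_hostname="<workspace>.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal-access-token>",
) as conn:
    with conn.cursor() as cur:
        # EXTERNAL USE SCHEMA lets external engines access the schema's
        # tables through Unity Catalog; my_catalog.my_schema and the
        # principal are example names.
        cur.execute(
            "GRANT EXTERNAL USE SCHEMA ON SCHEMA my_catalog.my_schema "
            "TO `service-principal@example.com`"
        )
```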
In this mode, the user selects the catalog, schema, and table in the configuration row.
Unity Catalog supports two table types:
- External tables - The component writes data directly to the underlying blob storage and updates the metadata in Unity Catalog.
- Native (Databricks) tables - Data is loaded using the selected DBX warehouse; a sketch of this path follows.
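As a rough illustration of the native path, data can be pushed through a DBX warehouse over SQL. This also shows the kind of MERGE that an Upsert load (native tables only, keyed on the input table's primary key) conceptually corresponds to; all names are placeholders, and the component's actual load mechanism may differ.

```python
from databricks import sql

# Placeholders for the workspace and the selected DBX warehouse.
with sql.connect(
    server_hostname="<workspace>.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal-access-token>",
) as conn:
    with conn.cursor() as cur:
        # Append-style load: insert staged rows into the native table.
        cur.execute(
            "INSERT INTO my_catalog.my_schema.orders "
            "SELECT * FROM my_catalog.my_schema.orders_staging"
        )
        # Upsert-style load: MERGE keyed on the input table's primary key.
        cur.execute(
            "MERGE INTO my_catalog.my_schema.orders AS t "
            "USING my_catalog.my_schema.orders_staging AS s "
            "ON t.id = s.id "
            "WHEN MATCHED THEN UPDATE SET * "
            "WHEN NOT MATCHED THEN INSERT *"
        )
```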
The component accepts either one mapped input table, or one or more Parquet files with the same schema (Parquet file input is not supported in the native Databricks write mode).
- Load Type - Append, Overwrite, Upsert (supported only for native tables, keyed on the primary key of the input table; see the MERGE in the native-table sketch above), or Raise error when existing (supported only for external tables).
- Columns to partition by [optional] - List of columns to partition the table by (see the sketch after this list).
- Warehouse - The DBX warehouse to use for loading data (native tables only).
- Batch size - A larger batch size increases loading speed but also makes out-of-memory issues more likely (the sketch after this list also illustrates batching).
- Preserve Insertion Order - Disabling this option may help prevent out-of-memory issues.
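Here is a sketch of partitioning and batch size on the external path, again using the open-source deltalake package. The URI, column names, and chunk size are illustrative, credentials are omitted (see the direct-access snippet above), and the component may implement these options differently.

```python
import pyarrow as pa
from deltalake import write_deltalake

TABLE_URI = "abfss://my-container@myaccount.dfs.core.windows.net/tables/orders"
data = pa.table({"id": [1, 2], "country": ["CZ", "DE"], "amount": [10.0, 20.0]})

# Batch size: stream the source in fixed-size chunks. Larger chunks load
# faster but hold more data in memory at once.
for chunk in data.to_reader(max_chunksize=100_000):
    write_deltalake(
        TABLE_URI,
        pa.Table.from_batches([chunk]),
        mode="append",
        # Columns to partition by: forwarded to the Delta writer.
        partition_by=["country"],
        # storage_options={...}  # credentials as in the direct-access snippet
    )
```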