Snowflake+ Iceberg / Open Catalog#

Snowflake+ is a drop-in replacement for the OSS Snowflake destination that adds Apache Iceberg table creation and related features.

It uses Snowflake to manage Iceberg data: tables are created and data is copied via Snowflake SQL, and the tables are automatically visible in the Snowflake (Horizon) catalog alongside other (native) tables. On top of that, Snowflake provides table maintenance (such as compaction and snapshot deletion).

Snowflake Open Catalog (Polaris) is fully supported via the catalog_sync option. Both new data and all schema migrations performed by dlt are visible in it without any additional code or setup.

All data access methods (pandas, arrow, Ibis, SQL etc.) that dlt supports via pipeline.dataset() are available.

This destination is available starting from dltHub version 0.9.0. It fully supports all the functionality of the standard Snowflake destination, plus:

- The ability to create Iceberg tables in Snowflake by configuring iceberg_mode in your config.toml file.
- Additional configuration for Iceberg tables in Snowflake via:
  - external_volume: The external volume name where Iceberg data is stored.
  - catalog: The catalog name in which Iceberg tables are created. Defaults to "SNOWFLAKE".
  - base_location: A template string for the base path that Snowflake uses for storing the table data in external storage, supporting placeholders.
  - extra_placeholders: Additional values that can be used in the base_location template.
  - catalog_sync: The name of a catalog integration configured for Snowflake Open Catalog. If specified, Snowflake syncs Snowflake-managed Iceberg tables in the database with an external catalog in your Snowflake Open Catalog account.

Installation#

Install the dlt package with the hub and snowflake extras:

pip install "dlt[hub,snowflake]"

Once the snowflake extra is installed, you can configure a pipeline to use snowflake_plus exactly the same way you would use the snowflake destination.

Setup#

1. Configure your Snowflake credentials
2. Set up a database user and permissions
3. Configure an external volume in Snowflake
4. Grant usage on the external volume to the role you are using to load data:

GRANT USAGE ON EXTERNAL VOLUME <external_volume_name> TO ROLE <role_name>;

If you don't have a dltHub workspace yet, scaffold one with uvx dlthub-start@latest (see the installation guide). Then, from inside the workspace, add a Snowflake pipeline:

dlthub pipeline init sql_database snowflake

Add the configuration to your config.toml file. Set external_volume to the name of the external volume you created in step 3 and enable Iceberg table creation by setting iceberg_mode:

[destination.snowflake]
external_volume = "<external_volume_name>"
iceberg_mode = "all"

Use the snowflake_plus destination in your pipeline:

import dlt

pipeline = dlt.pipeline(
    pipeline_name="my_snowflake_plus_pipeline",
    destination="snowflake_plus",
    dataset_name="my_dataset"
)

@dlt.resource
def my_iceberg_table():
    ...
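
The snippet above defines the pipeline and a resource but does not run a load. A minimal sketch of a complete run might look like the following; the customer_rows generator, its sample rows, and the customers table name are hypothetical, and the dlt import is deferred so the generator can be exercised without dlt or a Snowflake connection.

```python
from typing import Any, Dict, Iterator


def customer_rows() -> Iterator[Dict[str, Any]]:
    """Yield a few hypothetical sample rows to load."""
    yield {"id": 1, "name": "Alice"}
    yield {"id": 2, "name": "Bob"}


def load_customers() -> None:
    """Load the sample rows with the snowflake_plus destination."""
    # Imported here so that customer_rows() stays usable without dlt installed.
    import dlt

    pipeline = dlt.pipeline(
        pipeline_name="my_snowflake_plus_pipeline",
        destination="snowflake_plus",
        dataset_name="my_dataset",
    )
    # With iceberg_mode = "all" in config.toml, the customers table and the
    # dlt system tables are expected to be created as Iceberg tables.
    load_info = pipeline.run(
        customer_rows(),
        table_name="customers",
        write_disposition="append",
    )
    print(load_info)
```

Call load_customers() once your credentials and the external volume are configured.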

Configuration#

The snowflake_plus destination extends the standard Snowflake configuration with additional options:

`iceberg_mode`#

Controls which tables are created as Iceberg tables.

Possible values:
- "all": All tables including dlt system tables are created as Iceberg tables
- "data_tables": Only data tables (non-dlt system tables) are created as Iceberg tables
- "none": No tables are created as Iceberg tables

Required: No
Default: "none"

`external_volume`#

The name of the external volume where Iceberg table data and metadata are stored.

Required: Yes
Default: None

`catalog`#

The catalog to use for Iceberg tables.

Required: No
Default: "SNOWFLAKE". This will use Snowflake as the catalog for the Iceberg tables.

`base_location`#

Template string for the base location where Iceberg data is stored in the external volume. Supports placeholders like {dataset_name} and {table_name}.

Required: No
Default: "{dataset_name}/{table_name}"

`extra_placeholders`#

Dictionary of additional values that can be used in the base_location template. The values can be static strings or functions that accept the dataset name and table name as arguments and return a string.

Required: No
Default: None

`catalog_sync`#

The name of a catalog integration for syncing Iceberg tables to an external catalog in Snowflake Open Catalog.

Required: No
Default: None

Configure these options in your config.toml file under the [destination.snowflake] section.
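
dlt generally also resolves configuration from environment variables, where each part of the section path and the key are uppercased and joined with double underscores. Assuming the options above follow that convention, a minimal sketch of setting them from Python before the pipeline is created could look like this. The to_env_key helper and the volume name are illustrative and not part of dlt.

```python
import os


def to_env_key(*path: str) -> str:
    """Build a dlt-style environment variable name from a config path.

    Hypothetical helper: it only mirrors the naming convention of uppercasing
    each part and joining the parts with double underscores.
    """
    return "__".join(part.upper() for part in path)


# Placeholder values; replace them with your own settings.
iceberg_settings = {
    "external_volume": "my_external_volume",
    "iceberg_mode": "data_tables",
}

for key, value in iceberg_settings.items():
    os.environ[to_env_key("destination", "snowflake", key)] = value

print(to_env_key("destination", "snowflake", "iceberg_mode"))
# DESTINATION__SNOWFLAKE__ICEBERG_MODE
```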

Base location templating#

The base_location parameter controls where Snowflake stores your Iceberg table data and metadata in the external volume. It's a template string that supports the following built-in placeholders:

{dataset_name}: The name of your dataset
{table_name}: The name of the table

For more flexibility, you can also define custom placeholders using the extra_placeholders option.

Examples#

The default pattern {dataset_name}/{table_name} creates paths like my_dataset/customers in your external volume.

Custom static path:
```
[destination.snowflake]
base_location = "custom/static/path"
```
This uses the same base location, custom/static/path, for all tables.

Using custom placeholders:

[destination.snowflake]
base_location = "{env}/{dataset_name}/{table_name}"
extra_placeholders = { env = "prod" }

This creates paths like prod/my_dataset/customers.
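
To make the substitution rules concrete, here is a small sketch of how such a template could be rendered, including a callable placeholder that receives the dataset name and the table name. The render_base_location function and the layer placeholder are hypothetical and only model the behavior described above; they are not dlt's own implementation.

```python
from typing import Callable, Dict, Optional, Union

# A placeholder value is either a static string or a function of
# (dataset_name, table_name) that returns a string.
PlaceholderValue = Union[str, Callable[[str, str], str]]


def render_base_location(
    template: str,
    dataset_name: str,
    table_name: str,
    extra_placeholders: Optional[Dict[str, PlaceholderValue]] = None,
) -> str:
    """Fill a base_location template (illustrative sketch only)."""
    values = {"dataset_name": dataset_name, "table_name": table_name}
    for name, value in (extra_placeholders or {}).items():
        # Callable placeholders are evaluated per table.
        values[name] = value(dataset_name, table_name) if callable(value) else value
    return template.format(**values)


print(render_base_location("{dataset_name}/{table_name}", "my_dataset", "customers"))
# my_dataset/customers

print(
    render_base_location(
        "{env}/{dataset_name}/{table_name}",
        "my_dataset",
        "customers",
        {"env": "prod"},
    )
)
# prod/my_dataset/customers

# Hypothetical callable placeholder that separates system tables from data tables.
by_layer = {"layer": lambda dataset, table: "system" if table.startswith("_dlt") else "data"}
print(render_base_location("{layer}/{table_name}", "my_dataset", "customers", by_layer))
# data/customers
```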

How Snowflake uses the base location#

When you provide a base_location, Snowflake uses it to create the paths where data and metadata are stored in your external cloud storage. The actual directory structure Snowflake creates follows this pattern:

STORAGE_BASE_URL/BASE_LOCATION.<randomId>/[data | metadata]/

Where <randomId> is a random Snowflake-generated 8-character string appended to create a unique directory.

For more details on how Snowflake organizes Iceberg table files in external storage, see the Snowflake documentation on data and metadata directories.
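
The directory pattern above can be illustrated with a short sketch. Snowflake generates the real random ID, so the character set used here is an assumption, and the bucket URL and the sketch_storage_paths function are hypothetical.

```python
import secrets
import string
from typing import Dict


def sketch_storage_paths(storage_base_url: str, base_location: str) -> Dict[str, str]:
    """Compose STORAGE_BASE_URL/BASE_LOCATION.<randomId>/[data | metadata]/ paths."""
    alphabet = string.ascii_letters + string.digits
    # Stand-in for the 8-character ID that Snowflake generates itself.
    random_id = "".join(secrets.choice(alphabet) for _ in range(8))
    root = f"{storage_base_url.rstrip('/')}/{base_location.strip('/')}.{random_id}"
    return {"data": f"{root}/data/", "metadata": f"{root}/metadata/"}


paths = sketch_storage_paths("s3://my-bucket/iceberg/", "my_dataset/customers")
print(paths["data"])
print(paths["metadata"])
```

Because of the random suffix, the final directory name cannot be predicted from base_location alone.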

Table format for individual tables#

You can specify the table format (Iceberg or native) for individual dlt resources. For example:

import dlt

# table_format="native" keeps this resource's table as a regular Snowflake table
@dlt.resource(
    table_format="native"
)
def my_resource():
    ...

pipeline = dlt.pipeline("loads_native", destination="snowflake_plus")

This creates a native (non-Iceberg) my_resource table, even when iceberg_mode is set to "all" or "data_tables".
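
The interaction between iceberg_mode and a per-resource table_format can be summarized as a small decision function. This is an illustrative model of the documented rules, not dlt's implementation. It assumes that an explicit table_format always takes precedence and that dlt system tables can be recognized by their _dlt prefix.

```python
from typing import Optional


def resolves_to_iceberg(
    table_name: str,
    iceberg_mode: str = "none",
    table_format: Optional[str] = None,
) -> bool:
    """Return True if the table would be created as an Iceberg table."""
    # Assumption: an explicit per-resource table_format wins over iceberg_mode.
    if table_format is not None:
        return table_format == "iceberg"
    # Assumption: dlt system tables are identified by the "_dlt" prefix.
    is_system_table = table_name.startswith("_dlt")
    if iceberg_mode == "all":
        return True
    if iceberg_mode == "data_tables":
        return not is_system_table
    if iceberg_mode == "none":
        return False
    raise ValueError(f"unknown iceberg_mode: {iceberg_mode!r}")


print(resolves_to_iceberg("customers", "all"))              # True
print(resolves_to_iceberg("_dlt_loads", "data_tables"))     # False
print(resolves_to_iceberg("my_resource", "all", "native"))  # False
```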

Write dispositions#

All standard write dispositions (append, replace, and merge) are supported for both regular Snowflake tables and Iceberg tables.

Data types#

The Snowflake+ destination supports all standard Snowflake destination data types, with additional type mappings for Iceberg tables:

| dlt Type | Iceberg Type |
| --- | --- |
| `text` | `string` |
| `bigint` | `long`, `int` |
| `double` | `double` |
| `bool` | `boolean` |
| `timestamp` | `timestamp` |
| `date` | `date` |
| `time` | `time` |
| `decimal` | `decimal` |
| `binary` | `binary` |
| `json` | `string` |
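
For scripts that validate schemas before loading, the mapping above can be transcribed into a lookup table. The sketch below is only a transcription of the table; the ICEBERG_TYPES name and the iceberg_types_for helper are hypothetical, and the table does not say when bigint maps to long rather than int, so both candidates are returned.

```python
from typing import Dict, Tuple

# dlt type -> candidate Iceberg types, transcribed from the table above.
ICEBERG_TYPES: Dict[str, Tuple[str, ...]] = {
    "text": ("string",),
    "bigint": ("long", "int"),
    "double": ("double",),
    "bool": ("boolean",),
    "timestamp": ("timestamp",),
    "date": ("date",),
    "time": ("time",),
    "decimal": ("decimal",),
    "binary": ("binary",),
    "json": ("string",),
}


def iceberg_types_for(dlt_type: str) -> Tuple[str, ...]:
    """Return the candidate Iceberg types for a dlt type."""
    try:
        return ICEBERG_TYPES[dlt_type]
    except KeyError:
        raise ValueError(f"no Iceberg mapping listed for dlt type {dlt_type!r}") from None


print(iceberg_types_for("json"))    # ('string',)
print(iceberg_types_for("bigint"))  # ('long', 'int')
```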

Syncing Snowflake-managed Iceberg tables to Snowflake Open Catalog#

To enable querying of Snowflake-managed Iceberg tables by third-party engines (for example, Apache Spark) via an external catalog (Snowflake Open Catalog), use the catalog_sync configuration option. This setting specifies a catalog integration that syncs Iceberg tables to the external catalog.

Setup#

Create an external catalog in Snowflake Open Catalog.
Create a catalog integration in Snowflake. Example:

  CREATE OR REPLACE CATALOG INTEGRATION my_open_catalog_int
    CATALOG_SOURCE = POLARIS
    TABLE_FORMAT = ICEBERG
    REST_CONFIG = (
      CATALOG_URI = 'https://<orgname>-<my-snowflake-open-catalog-account-name>.snowflakecomputing.com/polaris/api/catalog'
      CATALOG_NAME = 'myOpenCatalogExternalCatalogName'
    )
    REST_AUTHENTICATION = (
      TYPE = OAUTH
      OAUTH_CLIENT_ID = 'myClientId'
      OAUTH_CLIENT_SECRET = 'myClientSecret'
      OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
    )
    ENABLED = TRUE;

Refer to the Snowflake documentation for detailed setup instructions.

Configure the catalog_sync option in your config.toml:

[destination.snowflake]
# ... other configuration
catalog_sync = "my_open_catalog_int"

Additional Resources#

For more information on basic Snowflake destination functionality, please refer to the Snowflake destination documentation.
