Source configuration#
The dlt.yml file enables a fully declarative setup of your data source and its parameters. It supports built-in sources such as REST APIs, SQL databases, and cloud storage, as well as any custom source you define.
Credential placeholders for the defined sources are automatically generated in .dlt/secrets.toml. Alternatively, configuration may also be provided directly within dlt.yml.
REST API#
The built-in rest_api type enables configuration of REST-based integrations. Multiple endpoints can be defined under a single source.
sources:
  pokemon_api:
    type: rest_api
    client:
      base_url: https://pokeapi.co/api/v2/
      paginator: auto
    resource_defaults:
      primary_key: name
    resources:
      - pokemon
      - berry
      -
        name: encounter_conditions
        endpoint:
          path: encounter-conditions
          params:
            offset:
              type: incremental
              cursor_path: name
        write_disposition: append
- type: rest_api: Specifies the use of the built-in REST API source.
- client.base_url: Sets the root URL for all API requests.
- paginator: auto: Enables automatic detection and handling of pagination.
- resource_defaults: Contains the default values for the dlt resources. They apply to every resource unless a resource-specific setting overrides them.
- Each item in resources defines an endpoint to extract. Simple entries like pokemon and berry fetch from /pokemon and /berry, respectively.
- The encounter_conditions resource uses an advanced configuration:
  - path: Points to the /encounter-conditions endpoint.
  - params.offset: Enables incremental loading using the name field as the cursor.
  - write_disposition: append: Appends the data extracted on each run to what is already in the destination.
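The way shorthand entries and resource_defaults combine can be sketched in plain Python. This is only an illustration of the behavior described above, not dlt's implementation: the expand_resources helper is hypothetical, and the shallow merge it performs is an assumption.

```python
from copy import deepcopy


def expand_resources(source_config):
    """Return one fully spelled-out dict per entry in `resources`.

    Hypothetical helper for illustration; dlt's own logic may differ.
    """
    defaults = source_config.get("resource_defaults", {})
    expanded = []
    for entry in source_config["resources"]:
        if isinstance(entry, str):
            # Shorthand: the string serves as both resource name and path.
            entry = {"name": entry, "endpoint": {"path": entry}}
        # Shallow merge (an assumption): resource-specific keys override defaults.
        expanded.append({**deepcopy(defaults), **deepcopy(entry)})
    return expanded


# Mirrors the pokemon_api source from the YAML above.
pokemon_api = {
    "resource_defaults": {"primary_key": "name"},
    "resources": [
        "pokemon",
        "berry",
        {
            "name": "encounter_conditions",
            "endpoint": {"path": "encounter-conditions"},
            "write_disposition": "append",
        },
    ],
}

for resource in expand_resources(pokemon_api):
    print(resource["name"], resource["endpoint"]["path"], resource["primary_key"])
```

All three resources end up with primary_key set to name, while only encounter_conditions carries its own write_disposition.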
SQL database#
For SQL-based extractions that require no table-specific parameter configuration, you can set type: sql_database and declare multiple tables at once.
General SQL database source#
sources:
  sql_source:
    type: sql_database
    table_names:
      - family
      - clan
    incremental:
      cursor_path: updated
      initial_value: 2023-01-12T11:21:28Z
This defines a connection to a SQL database with incremental loading applied across multiple tables.
- type: sql_database: Specifies the SQL database connector.
- table_names: List of tables to extract.
- incremental: Global configuration for incremental extraction.
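What cursor_path and initial_value mean for an incremental extraction can be shown with a small, self-contained Python sketch. It is not dlt's implementation: the select_new_rows helper and the sample rows are made up for illustration, and treating the boundary as inclusive is an assumption.

```python
from datetime import datetime, timezone


def select_new_rows(rows, cursor_path, last_value):
    """Keep rows at or after last_value and return the new high-water mark.

    Hypothetical helper; the inclusive comparison is an assumption.
    """
    fresh = [row for row in rows if row[cursor_path] >= last_value]
    # If nothing new arrived, the cursor stays where it was.
    new_last = max((row[cursor_path] for row in fresh), default=last_value)
    return fresh, new_last


# Same starting point as initial_value in the YAML above.
initial_value = datetime(2023, 1, 12, 11, 21, 28, tzinfo=timezone.utc)

# Made-up rows standing in for one of the configured tables.
sample_rows = [
    {"id": 1, "updated": datetime(2022, 12, 1, tzinfo=timezone.utc)},
    {"id": 2, "updated": datetime(2023, 2, 3, tzinfo=timezone.utc)},
    {"id": 3, "updated": datetime(2023, 6, 9, tzinfo=timezone.utc)},
]

fresh, new_last = select_new_rows(sample_rows, "updated", initial_value)
print([row["id"] for row in fresh])
print(new_last.isoformat())
```

On the first run only rows updated on or after initial_value are selected. The returned high-water mark would then replace initial_value as the starting point for the next run.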
Table-specific configuration#
For table-specific settings such as different primary keys, individual tables can be defined as standalone sources using the sql_database.sql_table type.
sources:
  sql_family:
    type: sql_database.sql_table
    table: family
    incremental:
      cursor_path: updated
      initial_value: 2023-01-12T11:21:28Z
    primary_key: rfam_id
- type: sql_database.sql_table: Indicates a single-table extraction.
- table: Name of the table to extract.
- incremental: Enables incremental loading for the table.
- primary_key: Specifies the table's unique identifier for deduplication and merges.
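The role of primary_key in deduplication and merges can be illustrated with a minimal Python sketch. It only conveys the idea of keying rows by rfam_id; the merge_by_primary_key helper and the sample rows are hypothetical and do not reflect how dlt or any destination performs a merge.

```python
def merge_by_primary_key(existing, incoming, primary_key):
    """Upsert incoming rows into existing rows, keyed by primary_key.

    Hypothetical helper for illustration only.
    """
    merged = {row[primary_key]: row for row in existing}
    for row in incoming:
        # A row with a known key replaces the stored row; a new key is added.
        merged[row[primary_key]] = row
    return list(merged.values())


# Made-up rows: what is already loaded and what a new run produced.
destination = [
    {"rfam_id": "A", "description": "first version"},
    {"rfam_id": "B", "description": "unchanged"},
]
new_batch = [
    {"rfam_id": "A", "description": "second version"},
    {"rfam_id": "C", "description": "brand new"},
]

result = merge_by_primary_key(destination, new_batch, "rfam_id")
for row in result:
    print(row["rfam_id"], row["description"])
```

Without a key, the updated row for A could not be matched to its earlier version and would simply be stored a second time.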
Filesystem#
Filesystem sources are configured with the filesystem.readers type. The filesystem-specific resources, such as read_csv, can then be selected with the --resources option of the CLI pipeline run command.
sources:
  file_source:
    type: filesystem.readers
    bucket_url: file://Users/admin/Documents/csv_files
    file_glob: '*.csv'
dlt pipeline file_pipeline run --resources read_csv