Release highlights: 1.17
New: DuckLake destination
You can now use the DuckLake destination — supporting all bucket and catalog combinations. It’s a great fit for lightweight data lakes and local development setups.
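Here is a minimal sketch of what a run against it could look like, assuming the destination is available under the name "ducklake" and that your catalog and bucket credentials are already configured; the pipeline, dataset, and table names are only illustrative:

```py
import dlt

# assumes the DuckLake destination is registered as "ducklake" and that
# catalog/bucket settings come from your dlt config or secrets
pipeline = dlt.pipeline(
    pipeline_name="ducklake_demo",
    destination="ducklake",
    dataset_name="lake_data",
)

# load a tiny sample table to verify the setup
load_info = pipeline.run(
    [{"id": 1, "name": "bulbasaur"}, {"id": 2, "name": "charmander"}],
    table_name="pokemon_sample",
)
print(load_info)
```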
Custom metrics in pipelines
You can now collect custom metrics directly inside your resources and transform steps. This makes it easy to track things like page counts, skipped rows, or API calls — right where the data is extracted.
Use dlt.current.resource_metrics() to store custom values while your resource runs. These metrics are automatically merged into the pipeline trace and visible in the run summary.
Example:
```py
import dlt
from dlt.sources.helpers.rest_client import RESTClient
from dlt.sources.helpers.rest_client.paginators import JSONLinkPaginator

client = RESTClient(
    base_url="https://pokeapi.co/api/v2",
    paginator=JSONLinkPaginator(next_url_path="next"),
    data_selector="results",
)

@dlt.resource
def get_pokemons():
    # store custom values in the resource's metrics while it runs
    metrics = dlt.current.resource_metrics()
    metrics["page_count"] = 0
    for page in client.paginate("/pokemon", params={"limit": 100}):
        metrics["page_count"] += 1
        yield page

pipeline = dlt.pipeline("get_pokemons", destination="duckdb")
load_info = pipeline.run(get_pokemons)
# custom metrics show up in the extract step of the pipeline trace
print("Custom metrics:", pipeline.last_trace.last_extract_info.metrics)
```
Custom metrics are grouped together with performance and transform stats under resource_metrics, so you can view them easily in traces or dashboards.
Limit your data loads for testing
When working with large datasets, you can limit how much data a resource loads using the add_limit method, which now also lets you cap extraction by row count or elapsed time. This is perfect for sampling a few records to preview your data or test transformations faster.
Example:
```py
import itertools

import dlt

# Load only the first 10 items from an infinite stream
r = dlt.resource(itertools.count(), name="infinity").add_limit(10)
```
You can also:
- Count rows instead of yields: my_resource().add_limit(10, count_rows=True)
- Stop extraction after a set time: my_resource().add_limit(max_time=10)
It’s a simple but powerful way to test pipelines quickly without pulling millions of rows.
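As a quick illustration, here is a sketch with a made-up resource that yields pages of rows, so that counting rows and counting yields give different results (the resource and pipeline names are invented for the example):

```py
import time

import dlt

@dlt.resource(name="slow_pages")
def slow_pages():
    # each yield is a page of 5 rows, so row counting differs from yield counting
    for page_start in range(0, 1_000_000, 5):
        time.sleep(0.1)
        yield [{"value": page_start + i} for i in range(5)]

pipeline = dlt.pipeline("limit_demo", destination="duckdb")

# stop once about 10 rows (two pages here) have been extracted, not 10 yields
pipeline.run(slow_pages().add_limit(10, count_rows=True))

# or stop extraction after roughly 5 seconds
pipeline.run(slow_pages().add_limit(max_time=5))
```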
Incremental loading for filesystem
Incremental loading with the filesystem source is now even easier to use, making it ideal for tracking updated or newly added files in S3 buckets or local folders.
dlt detects file changes (using fields like modification_date) and loads only what’s new.
Example:
```py
import dlt
from dlt.sources.filesystem import filesystem, read_parquet

filesystem_resource = filesystem(
    bucket_url="s3://my-bucket/files",
    file_glob="**/*.parquet",
    incremental=dlt.sources.incremental("modification_date"),
)

pipeline = dlt.pipeline("my_pipeline", destination="duckdb")
pipeline.run((filesystem_resource | read_parquet()).with_name("table_name"))
```
You can also split large incremental loads into smaller chunks:
- Partition loading – divide your files into ranges and load each independently (even in parallel).
- Split loading – process files sequentially in small batches using row_order, files_per_page, or add_limit().
This makes it easy to backfill large file collections efficiently and resume incremental updates without reloading everything.
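For example, a split load over the filesystem source could combine the knobs listed above; the bucket, glob, page size, and batch size below are illustrative, and the incremental cursor picks up where the previous run left off:

```py
import dlt
from dlt.sources.filesystem import filesystem, read_parquet

pipeline = dlt.pipeline("files_split_load", destination="duckdb")

files = filesystem(
    bucket_url="s3://my-bucket/files",
    file_glob="**/*.parquet",
    files_per_page=50,  # hand files to the reader in pages of 50
    incremental=dlt.sources.incremental(
        "modification_date",
        row_order="asc",  # one of the split-loading options listed above
    ),
)

# extract at most 2 pages (about 100 files) in this run; rerun to continue
pipeline.run((files.add_limit(2) | read_parquet()).with_name("table_name"))
```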
Split and partition large SQL loads
When working with huge tables, you can now split incremental loads into smaller chunks or partition backfills into defined ranges. This makes data available in the destination sooner and lets you retry only failed chunks instead of reloading everything.
Split loading
If your source returns data in a deterministic order (for example, ordered by created_at), you can combine incremental with add_limit() to process batches sequentially:
```py
import dlt
from dlt.sources.sql_database import sql_table

pipeline = dlt.pipeline("split_load", destination="duckdb")

messages = sql_table(
    table="chat_message",
    incremental=dlt.sources.incremental(
        "created_at",
        row_order="asc",     # required for split loading
        range_start="open",  # disables deduplication
    ),
)

# Load one-minute chunks until done
while not pipeline.run(messages.add_limit(max_time=60)).is_empty:
    pass
```
Partitioned backfills
You can also load large datasets in parallel partitions using initial_value and end_value. Each range runs independently, helping you rebuild large tables safely and efficiently.
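Here is a rough sketch of a two-partition backfill of the chat_message table used above; the date ranges and names are illustrative, and because each range is bounded by initial_value and end_value, the partitions are independent and could also run in separate processes:

```py
from datetime import datetime

import dlt
from dlt.sources.sql_database import sql_table

pipeline = dlt.pipeline("backfill_chat", destination="duckdb", dataset_name="chat_data")

# each partition covers one [initial_value, end_value) slice of the created_at cursor
partitions = [
    (datetime(2024, 1, 1), datetime(2024, 7, 1)),
    (datetime(2024, 7, 1), datetime(2025, 1, 1)),
]

for start, end in partitions:
    chunk = sql_table(
        table="chat_message",
        incremental=dlt.sources.incremental(
            "created_at",
            initial_value=start,
            end_value=end,
        ),
    )
    pipeline.run(chunk)
```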
Together, these methods make incremental loading more flexible and robust for both testing and production-scale pipelines.
Shout-out to new contributors
Big thanks to our newest contributors:
- @rik-adegeest — #3070
- @AndreiBondarenko — #3086
- @alkaline-0 — #3096
- @ianedmundson1 — #3043
- @chulkilee — #3120
Full release notes