Metadata-Version: 2.4
Name: dlt
Version: 1.23.0
Summary: dlt is an open-source python-first scalable data loading library that does not require any backend to run.
Project-URL: Homepage, https://github.com/dlt-hub
Project-URL: Repository, https://github.com/dlt-hub/dlt
Author-email: "dltHub Inc." <services@dlthub.com>
Maintainer-email: Marcin Rudolf <marcin@dlthub.com>, Adrian Brudaru <adrian@dlthub.com>, Anton Burnashev <anton@dlthub.com>, David Scharf <david@dlthub.com>
License-Expression: Apache-2.0
License-File: LICENSE.txt
Keywords: etl
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Classifier: Typing :: Typed
Requires-Python: <3.15,>=3.9.2
Requires-Dist: click>=7.1
Requires-Dist: fsspec>=2022.4.0
Requires-Dist: gitpython>=3.1.29
Requires-Dist: giturlparse>=0.10.0
Requires-Dist: humanize>=4.4.0
Requires-Dist: jsonpath-ng<1.8,>=1.5.3
Requires-Dist: orjson!=3.10.1,!=3.9.11,!=3.9.12,!=3.9.13,!=3.9.14,<4,>=3.6.7; platform_python_implementation != 'PyPy' and sys_platform != 'emscripten'
Requires-Dist: orjson>=3.10.1; platform_python_implementation != 'PyPy' and sys_platform != 'emscripten'
Requires-Dist: orjson>=3.11.0; python_version > '3.13'
Requires-Dist: packaging>=21.1
Requires-Dist: pathvalidate>=2.5.2
Requires-Dist: pendulum>=2.1.2
Requires-Dist: pendulum>=3.0.0; python_version > '3.13'
Requires-Dist: pluggy>=1.3.0
Requires-Dist: pytz>=2022.6
Requires-Dist: pywin32>=306; sys_platform == 'win32'
Requires-Dist: pyyaml>=5.4.1
Requires-Dist: requests>=2.26.0
Requires-Dist: requirements-parser>=0.5.0
Requires-Dist: rich-argparse>=1.6.0
Requires-Dist: semver>=3.0.0
Requires-Dist: setuptools>=65.6.0
Requires-Dist: simplejson>=3.17.5
Requires-Dist: sqlglot>=25.4.0
Requires-Dist: tenacity>=8.0.2
Requires-Dist: tomlkit>=0.11.3
Requires-Dist: typing-extensions>=4.8.0
Requires-Dist: tzdata>=2022.1
Requires-Dist: win-precise-time>=1.4.2; os_name == 'nt' and python_version < '3.13'
Provides-Extra: athena
Requires-Dist: botocore>=1.28; extra == 'athena'
Requires-Dist: pyarrow>=16.0.0; extra == 'athena'
Requires-Dist: pyathena>=2.9.6; extra == 'athena'
Requires-Dist: s3fs>=2022.4.0; extra == 'athena'
Provides-Extra: az
Requires-Dist: adlfs>=2024.7.0; extra == 'az'
Provides-Extra: bigquery
Requires-Dist: db-dtypes>=1.2.0; extra == 'bigquery'
Requires-Dist: gcsfs>=2022.4.0; extra == 'bigquery'
Requires-Dist: google-cloud-bigquery>=2.26.0; extra == 'bigquery'
Requires-Dist: grpcio>=1.50.0; extra == 'bigquery'
Requires-Dist: pyarrow>=16.0.0; extra == 'bigquery'
Provides-Extra: cli
Requires-Dist: cron-descriptor>=1.2.32; extra == 'cli'
Requires-Dist: pip>=23.0.0; extra == 'cli'
Requires-Dist: pipdeptree<2.10,>=2.9.3; extra == 'cli'
Provides-Extra: clickhouse
Requires-Dist: adlfs>=2024.7.0; extra == 'clickhouse'
Requires-Dist: clickhouse-connect>=0.7.7; extra == 'clickhouse'
Requires-Dist: clickhouse-driver>=0.2.7; extra == 'clickhouse'
Requires-Dist: gcsfs>=2022.4.0; extra == 'clickhouse'
Requires-Dist: pyarrow>=16.0.0; extra == 'clickhouse'
Requires-Dist: s3fs>=2022.4.0; extra == 'clickhouse'
Provides-Extra: databricks
Requires-Dist: databricks-sdk>=0.38.0; extra == 'databricks'
Requires-Dist: databricks-sql-connector>=2.9.3; (python_version <= '3.12') and extra == 'databricks'
Requires-Dist: databricks-sql-connector>=3.6.0; (python_version >= '3.13') and extra == 'databricks'
Provides-Extra: dbml
Requires-Dist: pydbml; extra == 'dbml'
Provides-Extra: deltalake
Requires-Dist: deltalake>=0.25.1; extra == 'deltalake'
Requires-Dist: pyarrow>=16.0.0; extra == 'deltalake'
Provides-Extra: dremio
Requires-Dist: pyarrow>=16.0.0; extra == 'dremio'
Provides-Extra: duckdb
Requires-Dist: duckdb>=0.9; extra == 'duckdb'
Provides-Extra: ducklake
Requires-Dist: duckdb>=1.2.0; extra == 'ducklake'
Requires-Dist: pyarrow>=16.0.0; extra == 'ducklake'
Provides-Extra: fabric
Requires-Dist: adlfs>=2024.7.0; extra == 'fabric'
Requires-Dist: pyarrow>=16.0.0; extra == 'fabric'
Requires-Dist: pyodbc>=4.0.39; extra == 'fabric'
Provides-Extra: filesystem
Requires-Dist: botocore>=1.28; extra == 'filesystem'
Requires-Dist: s3fs>=2022.4.0; extra == 'filesystem'
Provides-Extra: gcp
Requires-Dist: db-dtypes>=1.2.0; extra == 'gcp'
Requires-Dist: gcsfs>=2022.4.0; extra == 'gcp'
Requires-Dist: google-cloud-bigquery>=2.26.0; extra == 'gcp'
Requires-Dist: grpcio>=1.50.0; extra == 'gcp'
Provides-Extra: gs
Requires-Dist: gcsfs>=2022.4.0; extra == 'gs'
Provides-Extra: hf
Requires-Dist: huggingface-hub>=1.4.1; extra == 'hf'
Requires-Dist: pyarrow>=21.0.0; extra == 'hf'
Provides-Extra: http
Requires-Dist: aiohttp>3.9.0; extra == 'http'
Provides-Extra: hub
Requires-Dist: dlt-runtime<0.24,>=0.21.2; (python_version >= '3.10') and extra == 'hub'
Requires-Dist: dlthub<0.24,>=0.21.1; (python_version >= '3.10') and extra == 'hub'
Provides-Extra: lancedb
Requires-Dist: duckdb>=1.4.3; extra == 'lancedb'
Requires-Dist: ibis-framework>=12.0.0; (python_version >= '3.10') and extra == 'lancedb'
Requires-Dist: lancedb>=0.22.0; extra == 'lancedb'
Requires-Dist: pyarrow>=16.0.0; extra == 'lancedb'
Provides-Extra: motherduck
Requires-Dist: duckdb>=0.9; extra == 'motherduck'
Requires-Dist: pyarrow>=16.0.0; extra == 'motherduck'
Provides-Extra: mssql
Requires-Dist: pyodbc>=4.0.39; extra == 'mssql'
Provides-Extra: oracle
Requires-Dist: oracledb>=3.4.1; extra == 'oracle'
Provides-Extra: parquet
Requires-Dist: pyarrow>=16.0.0; extra == 'parquet'
Provides-Extra: postgis
Requires-Dist: psycopg2-binary>=2.9.1; extra == 'postgis'
Provides-Extra: postgres
Requires-Dist: psycopg2-binary>=2.9.1; extra == 'postgres'
Provides-Extra: pyiceberg
Requires-Dist: pyarrow>=16.0.0; extra == 'pyiceberg'
Requires-Dist: pyiceberg-core>=0.6.0; extra == 'pyiceberg'
Requires-Dist: pyiceberg>=0.9.1; extra == 'pyiceberg'
Requires-Dist: sqlalchemy>=1.4; extra == 'pyiceberg'
Provides-Extra: qdrant
Requires-Dist: qdrant-client[fastembed]>=1.8; extra == 'qdrant'
Provides-Extra: redshift
Requires-Dist: psycopg2-binary>=2.9.1; extra == 'redshift'
Provides-Extra: s3
Requires-Dist: botocore>=1.28; extra == 's3'
Requires-Dist: s3fs>=2022.4.0; extra == 's3'
Provides-Extra: sftp
Requires-Dist: paramiko>=3.3.0; extra == 'sftp'
Provides-Extra: snowflake
Requires-Dist: snowflake-connector-python>=3.5.0; extra == 'snowflake'
Provides-Extra: sql-database
Requires-Dist: sqlalchemy>=1.4; extra == 'sql-database'
Provides-Extra: sqlalchemy
Requires-Dist: alembic>1.10.0; extra == 'sqlalchemy'
Requires-Dist: sqlalchemy>=1.4; extra == 'sqlalchemy'
Provides-Extra: synapse
Requires-Dist: adlfs>=2024.7.0; extra == 'synapse'
Requires-Dist: pyarrow>=16.0.0; extra == 'synapse'
Requires-Dist: pyodbc>=4.0.39; extra == 'synapse'
Provides-Extra: weaviate
Requires-Dist: weaviate-client<5.0.0,>=4.0.0; extra == 'weaviate'
Provides-Extra: workspace
Requires-Dist: duckdb>=0.9; extra == 'workspace'
Requires-Dist: fastmcp>=3.0.0; (python_version >= '3.10') and extra == 'workspace'
Requires-Dist: ibis-framework>=12.0.0; (python_version >= '3.10') and extra == 'workspace'
Requires-Dist: marimo>=0.14.5; extra == 'workspace'
Requires-Dist: mowidgets>=0.2.1; (python_version >= '3.11') and extra == 'workspace'
Requires-Dist: pathspec>=0.11.2; extra == 'workspace'
Requires-Dist: pyarrow>=16.0.0; extra == 'workspace'
Requires-Dist: pydbml>=1.2.0; extra == 'workspace'
Description-Content-Type: text/markdown

<h1 align="center">
    <strong>data load tool (dlt) — the open-source Python library that automates all your tedious data loading tasks</strong>
</h1>
<p align="center">
Be it a Google Colab notebook, AWS Lambda function, an Airflow DAG, your local laptop,<br/>or a GPT-4 assisted development playground—<strong>dlt</strong> can be dropped in anywhere.
</p>


<h3 align="center">

🚀 Join our thriving community of like-minded developers and build the future together!

</h3>

<div align="center">
  <a target="_blank" href="https://dlthub.com/community" style="background:none">
    <img src="https://img.shields.io/badge/slack-join-dlt.svg?labelColor=191937&color=6F6FF7&logo=slack" style="width: 260px;"  />
  </a>
</div>
<div align="center">
  <a target="_blank" href="https://pypi.org/project/dlt/" style="background:none">
    <img src="https://img.shields.io/pypi/v/dlt?labelColor=191937&color=6F6FF7">
  </a>
  <a target="_blank" href="https://pypi.org/project/dlt/" style="background:none">
    <img src="https://img.shields.io/pypi/pyversions/dlt?labelColor=191937&color=6F6FF7">
  </a>
  <a target="_blank" href="https://pypi.org/project/dlt/" style="background:none">
    <img src="https://img.shields.io/pypi/dm/dlt?labelColor=191937&color=6F6FF7">
  </a>
</div>

## Installation

dlt supports Python 3.9 through Python 3.14. Note that some optional extras are not yet available for Python 3.14, so support for this version is considered experimental.

```sh
pip install dlt
```

## Quick Start

Load chess player data from the chess.com API and save it in DuckDB:

```python
import dlt
from dlt.sources.helpers import requests

# Create a dlt pipeline that will load
# chess player data to the DuckDB destination
pipeline = dlt.pipeline(
    pipeline_name='chess_pipeline',
    destination='duckdb',
    dataset_name='player_data'
)

# Grab some player data from the Chess.com API
data = []
for player in ['magnuscarlsen', 'rpragchess']:
    response = requests.get(f'https://api.chess.com/pub/player/{player}')
    response.raise_for_status()
    data.append(response.json())

# Extract, normalize, and load the data
pipeline.run(data, table_name='player')
```
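
After the script finishes, you can take a quick look at what was loaded using the dlt CLI. A minimal sketch, assuming you kept the pipeline name `chess_pipeline` from the snippet above (the `show` command may prompt you to install extra dependencies for the dashboard app):

```sh
# print the pipeline's tables, schema, and load status,
# then open the interactive dashboard in a browser
dlt pipeline chess_pipeline info
dlt pipeline chess_pipeline show
```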


Try it out in our **[Colab Demo](https://colab.research.google.com/drive/1NfSB1DpwbbHX9_t5vlalBTf13utwpMGx?usp=sharing)** or directly in the WASM-based [playground](https://dlthub.com/docs/tutorial/playground) in our docs.

## Features

dlt is an open-source Python library that loads data from various, often messy data sources into well-structured datasets. It provides lightweight Python interfaces to extract, load, inspect, and transform data. dlt and the dlt docs are built from the ground up to be used with LLMs: the [LLM-native workflow](https://dlthub.com/docs/dlt-ecosystem/llm-tooling/llm-native-workflow) takes you from pipeline code to data in a notebook for over [5,000 sources](https://dlthub.com/workspace).

dlt is designed to be easy to use, flexible, and scalable:

- dlt extracts data from [REST APIs](https://dlthub.com/docs/tutorial/rest-api), [SQL databases](https://dlthub.com/docs/tutorial/sql-database), [cloud storage](https://dlthub.com/docs/tutorial/filesystem), [Python data structures](https://dlthub.com/docs/tutorial/load-data-from-an-api), and [many more](https://dlthub.com/docs/dlt-ecosystem/verified-sources).
- dlt infers [schemas](https://dlthub.com/docs/general-usage/schema) and [data types](https://dlthub.com/docs/general-usage/schema/#data-types), [normalizes the data](https://dlthub.com/docs/general-usage/schema/#data-normalizer), and handles nested data structures.
- dlt supports a variety of [popular destinations](https://dlthub.com/docs/dlt-ecosystem/destinations/) and has an interface to add [custom destinations](https://dlthub.com/docs/dlt-ecosystem/destinations/destination) to create reverse ETL pipelines.
- dlt automates pipeline maintenance with [incremental loading](https://dlthub.com/docs/general-usage/incremental-loading), [schema evolution](https://dlthub.com/docs/general-usage/schema-evolution), and [schema and data contracts](https://dlthub.com/docs/general-usage/schema-contracts) (see the sketch after this list).
- dlt supports [Python and SQL data access](https://dlthub.com/docs/general-usage/dataset-access/), [transformations](https://dlthub.com/docs/dlt-ecosystem/transformations), [pipeline inspection](https://dlthub.com/docs/general-usage/dashboard), and [visualizing data in Marimo notebooks](https://dlthub.com/docs/general-usage/dataset-access/marimo).
- dlt can be deployed anywhere Python runs, be it on [Airflow](https://dlthub.com/docs/walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer), [serverless functions](https://dlthub.com/docs/walkthroughs/deploy-a-pipeline/deploy-with-google-cloud-functions), or any other cloud deployment of your choice.
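
To make the incremental loading item above concrete, here is a minimal sketch using `dlt.sources.incremental`. The `events` resource, its rows, and the `updated_at` cursor field are illustrative assumptions, not part of dlt:

```python
import dlt

# Illustrative records; in practice these would come from an API or database.
ROWS = [
    {"id": 1, "updated_at": "2024-01-01", "value": "a"},
    {"id": 2, "updated_at": "2024-02-01", "value": "b"},
]

# `merge` + primary_key deduplicates on `id`; dlt remembers the highest
# `updated_at` seen and skips already-loaded rows on subsequent runs.
@dlt.resource(table_name="events", write_disposition="merge", primary_key="id")
def events(
    updated_at=dlt.sources.incremental("updated_at", initial_value="1970-01-01")
):
    yield from ROWS

pipeline = dlt.pipeline(
    pipeline_name="incremental_demo", destination="duckdb", dataset_name="demo"
)
print(pipeline.run(events()))
```

Running the script twice loads each row only once: on the second run dlt filters out rows it has already seen based on the stored cursor value.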

## Documentation

For detailed usage and configuration, please refer to the [official documentation](https://dlthub.com/docs).

## Examples

You can find examples for various use cases in the [examples](docs/examples) folder, or in the [code examples section](https://dlthub.com/docs/examples) of our docs page.

## Adding as a dependency

`dlt` follows semantic versioning with the [`MAJOR.MINOR.PATCH`](https://peps.python.org/pep-0440/#semantic-versioning) pattern.

* `major`: breaking changes and removed deprecations
* `minor`: new features, sometimes with automatic migrations
* `patch`: bug fixes

We suggest that you allow only `patch`-level updates automatically:
* Using the [Compatible Release Specifier](https://packaging.python.org/en/latest/specifications/version-specifiers/#compatible-release). For example, **dlt~=1.0.0** allows only versions **>=1.0.0** and **<1.1** (see the pip sketch below)
* Using Poetry [caret requirements](https://python-poetry.org/docs/dependency-specification/). Note that **^1.0** allows versions **>=1.0** and **<2.0**, so it admits `minor` updates as well
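
As a concrete example of the compatible-release pin referenced in the first bullet, a minimal sketch with plain pip (the `1.0.0` base version is just an illustration):

```sh
# allow 1.0.x patch releases only, i.e. >=1.0.0 and <1.1
pip install "dlt~=1.0.0"
```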

Please also see our [release notes](https://github.com/dlt-hub/dlt/releases) for notable changes between versions.

## Get Involved

The dlt project is quickly growing, and we're excited to have you join our community! Here's how you can get involved:

- **Connect with the Community**: Join other dlt users and contributors on our [Slack](https://dlthub.com/community)
- **Report issues and suggest features**: Please use the [GitHub Issues](https://github.com/dlt-hub/dlt/issues) to report bugs or suggest new features. Before creating a new issue, make sure to search the tracker for possible duplicates and add a comment if you find one.
- **Track progress of our work and our plans**: Please check out our [public GitHub project](https://github.com/orgs/dlt-hub/projects/9)
- **Improve documentation**: Help us enhance the dlt documentation.

## Contribute code

Please read [CONTRIBUTING](CONTRIBUTING.md) before you make a PR.

- 📣 **New destinations are unlikely to be merged** due to their high maintenance cost (but we are happy to improve the SQLAlchemy destination to handle more dialects)
- Significant changes require tests and docs, and in many cases writing the tests will be more laborious than writing the code
- Bugfixes and improvements are welcome! You'll get help with writing tests and docs, plus a decent review.

## License

`dlt` is released under the [Apache 2.0 License](LICENSE.txt).
