Sources
Sources are typed data schemas that help Vinyl understand what data is available
Setting up a Resource
A resource serves as a connection to data sources. Resources can be a data warehouse (BigQuery, Snowflake), transactional databases (Postgres), object storage (S3), data lakes (Delta) or local files (parquet, csv, json). Vinyl comes with standard resource connectors:
- TableFileConnector (csv, json, parquet)
- BigQueryConnector
- PostgresConnector
- DatabaseFileConnector (duckdb)
Resources are defined in the resources.py
file of our project. Resources are used to fetch metadata and source data queries.
Here’s an example of a postgres connection:
Generate Sources
Once we’ve defined a resource, we can pull in source data schemas automatically using the following vinyl CLI command
This command populates the sources/
directory with typed source data schemas. These schemas are used to understand the structure of the data and to generate source data queries.
Here’s an example of a source schema for a postgres table:
Using Sources
Sources are used just like any python class. You can import a source in a notebook, script or vinyl project:
Generated sources serve as a data contract. With sources, you can validate data transforms without hitting the database and detect upstream schema drift to prevent breaking changes.
List Sources
As you’re adding resources and generating sources, you can list all the sources in your project using the following vinyl CLI command:
Was this page helpful?