Models & Metrics
Display inline code and code blocks
Models
Models are the backbone of a vinyl project. Models are cleaned representations of your source data. Models can either be virtualized or backed by a table (common in data warehouses). Models can be published for use in the semantic layer or used internally for analysis.
To create a model, import the model
decorator from vinyl and create function.
Models are isolated functions that handle transforms of data. Models are built in a pipelined way through mutable state. Models are deferred in their execution allowing for lazy evaluation.
Let’s add a dependency to our model. Dependencies can either be a source or another model.
Vinyl is built on Ibis, a powerful and performant data manipulation library. This allows for a wide range of transformations to be performed on your data all within a model function. You can reference all of the model transforms in the reference section.
Let’s write a quick model to get the stores with the most transactions. Vinyl helps write data transforms for analytics use cases (like aggregations) quickly with only a few lines of code.
store_nbr | city | num_transactions |
---|---|---|
01003 | San Francisco, CA | 11026 |
03001 | Brooklyn, NY | 10981 |
06002 | Austin, TX | 8193 |
Use in Notebooks
Models can also be written and used outside of context of a Vinyl project (like in a jupyter notebook or a python script) using a context manager. We can write the model above as:
Metrics
Timeseries data is an important component of many analytics projects. Vinyl comes with a powerful metrics abstraction to help aggregation and slice time series data without having to write long and complicated SQL.
Let’s expand on our example above but now look at how sales for each store has changed over time. To use the metric layer of Vinyl, all we have to do is pass in the MetricStore
object to our function and pass in the timestamp column to the metric
method.
Our new metric shop_sales
can now be used throughout our Vinyl project. The Metrics layer allows for dynamic queries across dimensions and time buckets. Metrics also provide useful features like fill options and trailing windows.
If we want to get the sales across each shop location over the past trailing 3 months, we can query it like:
ts | store_nbr | city | sales |
---|---|---|---|
2023-12-01 | 01003 | San Francisco, CA | 752 |
2023-12-01 | 03001 | Brooklyn, NY | 1130 |
2023-12-01 | 06002 | Austin, TX | 534 |
2023-01-01 | 01003 | San Francisco, CA | 304 |
2023-01-01 | 03001 | Brooklyn, NY | 540 |
2023-01-01 | 06002 | Austin, TX | 121 |
2023-02-01 | 01003 | San Francisco, CA | 910 |
2023-02-01 | 03001 | Brooklyn, NY | 1520 |
2023-02-01 | 06002 | Austin, TX | 862 |
If we want to change the time grain to weeks and look at the last 2 weeks, it’s as easy as changing the metric select to:
ts | store_nbr | city | sales |
---|---|---|---|
2023-02-01 | 01003 | San Francisco, CA | 619 |
2023-02-01 | 03001 | Brooklyn, NY | 811 |
2023-02-01 | 06002 | Austin, TX | 407 |
2023-02-15 | 01003 | San Francisco, CA | 291 |
2023-02-15 | 03001 | Brooklyn, NY | 709 |
2023-02-15 | 06002 | Austin, TX | 455 |
We can wrap this metric as a model in our project:
Currently Vinyl does not support deploying metrics directly and they must be wrapped as a model to be shared outside a vinyl project. This constraint will be removed in future releases.
Was this page helpful?