This feature is currently in private beta. Please reach out to us on Slack if you’re interested in trying it out.

Overview

AI Transforms augment data models using large language models that run either in the cloud or locally. An LLMTransform handles token management, column metadata context, and more through Vinyl internals.
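As a rough illustration of what token management and column metadata context can involve, the sketch below builds a prompt that carries a column's name and type, and packs rows into batches under a token budget. This is not Vinyl's implementation: `estimate_tokens`, `build_prompt`, `batch_rows`, and the four-characters-per-token heuristic are all hypothetical stand-ins.

```python
from typing import Iterator


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly four characters per token.
    return max(1, len(text) // 4)


def build_prompt(instruction: str, column: str, dtype: str, values: list[str]) -> str:
    # Prepend column metadata so the model knows what the values represent.
    header = f"{instruction}\nColumn: {column} (type: {dtype})\n"
    rows = "\n".join(f"{i + 1}. {v}" for i, v in enumerate(values))
    return header + rows


def batch_rows(values: list[str], budget: int) -> Iterator[list[str]]:
    # Greedily pack rows into batches whose estimated size fits the budget.
    batch: list[str] = []
    used = 0
    for value in values:
        cost = estimate_tokens(value)
        if batch and used + cost > budget:
            yield batch
            batch, used = [], 0
        batch.append(value)
        used += cost
    if batch:
        yield batch
```

A real tokenizer would replace the character heuristic. The point is only that each request stays within a size limit while still telling the model which column it is reading.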

Let's take some user reviews of movies from the IMDB dataset. Suppose we want to tag each review with its sentiment using the latest LLMs available.

| id | movie_title | comment_id | comment |
| --- | --- | --- | --- |
| 1 | A Quiet Place | 101 | An intense experience with a few plot holes. |
| 2 | The Room | 102 | So bad its almost good. A cult classic for all the wrong reasons. |
| 3 | Lost in Translation | 103 | Subtly powerful and emotionally complex. |
| 4 | Requiem for a Dream | 104 | Disturbing content that leaves a lasting impression. |

We can import the LLMTransform and OpenAIProvider from Vinyl to run an LLM-powered transform on a specific column of our dataset.

models.py
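# Assumed import paths: the original snippet does not show the source module
# for Movies or where the `model` decorator comes from.
from my_project.sources.movies import Movies
from vinyl import model
from vinyl.ai import LLMTransform, OpenAIProvider

@model(deps=Movies)
def movie_sentiment(m):
    # Configure an LLM-backed transform over the comment column.
    infer_sentiment = LLMTransform(
        OpenAIProvider,
        cols=[
            m.comment,
        ],
        prompt="Rate the sentiment of this movie review comment column as either 'positive', 'neutral' or 'negative'",
    )

    # Add the inferred label as a new column.
    m.mutate({
        "sentiment": infer_sentiment()
    })

    return m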

Vinyl handles concurrency, caching, and incremental updates, so the transform only runs on new rows. The OpenAIProvider lets you bring your own OpenAI API key.
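To make the idea of running only on new rows concrete, here is a minimal sketch of content-keyed caching. It assumes results are cached per prompt and input value. Vinyl's actual mechanism is not documented here, and `row_key`, `transform_incrementally`, and the `call_llm` callback are hypothetical names.

```python
import hashlib
from typing import Callable


def row_key(prompt: str, value: str) -> str:
    # Key on both prompt and input, so changing the prompt invalidates old results.
    return hashlib.sha256(f"{prompt}\x00{value}".encode("utf-8")).hexdigest()


def transform_incrementally(
    values: list[str],
    prompt: str,
    call_llm: Callable[[str, str], str],
    cache: dict[str, str],
) -> list[str]:
    # Call the model only for rows whose key is not already cached.
    results = []
    for value in values:
        key = row_key(prompt, value)
        if key not in cache:
            cache[key] = call_llm(prompt, value)
        results.append(cache[key])
    return results
```

With a persistent cache in place of the in-memory dict, rerunning the model after new reviews arrive would call the provider only for the rows it has not seen before.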

We can preview the results with:

vinyl preview model movie_sentiment

The preview shows each review alongside its inferred sentiment label:

| id | movie_title | comment_id | comment | sentiment |
| --- | --- | --- | --- | --- |
| 1 | A Quiet Place | 101 | An intense experience with a few plot holes. | Neutral |
| 2 | The Room | 102 | So bad its almost good. A cult classic for all the wrong reasons. | Negative |
| 3 | Lost in Translation | 103 | Subtly powerful and emotionally complex. | Positive |
| 4 | Requiem for a Dream | 104 | Disturbing content that leaves a lasting impression. | Negative |

Local AI Inference

Working with data workflows comes with a host of privacy challenges. If you’re interested in running AI Transforms using self-hosted models and GPU inference, reach out to us.