Table Operations
aggregate
Returns an aggregated table for cols, grouped by by
and sort
.
If fill is specified, the table will be interpolated using the specified fill strategy, taking into account direction from the sort
argument. fill
can either be a single value or a list of values, one for each column in cols
.
aggregate_all
Aggregate_all is a generalized form of aggregate
that can apply apply the same operation (specified in f) to multiple columns. The col_selector field can be a list of column fields, where each element select
, and f should be a list of functions of the same length.
If f is a single function, it will be applied to all columns. If f is a list of functions, the functions will be applied to the corresponding columns. If f is shorter than the number of columns, the last function will be applied to all remaining columns.
By, sort, and fill operate as in aggregate
.
If rename is True, the columns will be renamed to the name of the function that was applied to them. If rename is False, the columns will names to the original column name.
chart
Visualize the table using a chart. The geoms argument should be a geom or a list of geoms. The x, y, color, fill, size, alpha, and facet arguments should be column expressions. If a list of columns is passed, the chart will be faceted by the columns in the list.
If coord_flip is True, the x and y axes will be flipped.
copy
None
count
Return the count of rows in the table.
define
Mutate is identical to select
, except all current columns are included, and the new columns are added to the table. If a new column has the same name as an existing column, the existing column will be replaced.
define_all
Mutate_all is identical to select_all
, except all current columns are included, and the new columns are added to the table. If a new column has the same name as an existing column, the existing column will be replaced.
distinct
Return distinct rows from the table.
If on
is specified, the distinct rows will be based on the columns in on
. If it is not, the distinct rows will be based on all columns.
If keep
is specified, the first or last row will be kept.
drop
Remove columns from the table.
dropna
Remove rows from the table with missing values.
If on
is specified, the missing values will be checked for the columns in on
. If it is not, the missing values will be checked for all columns.
If how
is “any”, the row will be removed if any of the values are missing. If it is “all”, the row will be removed if all of the values are missing.
eda
Return summary statistics for each column in the table.
If cols is specified, the summary statistics will be returned for the columns in cols. If it is not, the summary statistics will be returned for all columns.
execute
Run the query and return the result in the specified format. If twin is True, the twin connection will be used.
filter
Filter the table based on the conditions specified. This function should be used in place of WHERE, HAVING, and QUALIFY clauses in SQL.
filter_all
Similar to other “_all” variants, this is a generalized form of filter
that can apply the same operation (specified in condition_f) to multiple columns.
The col_selector field can be a list of column fields, where each element select
, and condition_f should be a list of functions of the same length.
Useful if you want to apply the same filter (e.g. value > 0) to multiple columns.
Conditions are evaluated together using the condition_type argument. If condition_type is “and”, all conditions must be met. If condition_type is “or”, any condition can be met. If you’d like to use a mix of “and” and “or” conditions, call the filter
function multiple times.
from_file
Create a VinylTable from a csv, json, or parquet file
from_memory
Create a VinylTable from a pandas, polars, or pyarrow object
get_name
Return the name of the table.
limit
Return the first n
rows of the table, starting at the offset
row.
Note that the result set may not be idempotent.
pickle
None
relocate
Relocate columns before or after other specified columns.
rename
Rename columns in the table. The rename_dict should be a dictionary with the new column name as the key and the original column name as the value.
sample
Sample a fraction of rows from a table. Results may not be idempotent.
See specific note from Ibis below:
Sampling is by definition a random operation. Some backends support specifying a seed for repeatable results, but not all backends support that option. And some backends (duckdb, for example) do support specifying a seed but may still not have repeatable results in all cases. In all cases, results are backend-specific. An execution against one backend is unlikely to sample the same rows when executed against a different backend, even with the same seed set.
save
Run the query and save the result to the specified path in the specified format.
schema
Return the schema of the table.
select
Computes a new table with the columns in cols. Can be a single column, a list of columns, or a dictionary of columns with their new names as keys. The column values themselves can be specified as strings (the column name), table attributes, one-argument lambda functions, or selectors.
If an aggregated column is passed, this will be treated as a windowed column, using the by field for partitioning, the sort field for ordering, and the window_type and window_bounds fields for the actual window.
Fill can be used optionally to add interpolation to cols. You must either specify one value for each column or a list of values that is the same length as the column list
select_all
Select_all is a generalized form of select
that can apply apply the same operation (specified in f) to multiple columns. The col_selector field can be a list of column fields, where each element select
, and f should be a list of functions of the same length.
If f is a single function, it will be applied to all columns. If f is a list of functions, the functions will be applied to the corresponding columns. If f is shorter than the number of columns, the last function will be applied to all remaining columns.
By, sort, window_type, and window_bounds operate as in select
.
If rename is True, the columns will be renamed to the name of the function that was applied to them. If rename is False, the columns will names to the original column name.
sort
Sort the table by the columns in by
.
If by
is not specified, the table will be sorted by all columns.
To sort a column in descending order, place a -
in front of the column.
to_sql
Output the table as a SQL string. The dialect argument can be used to specify the SQL dialect to use.
If optimized is True, the SQL will be optimized using the SQLglot optimizer. If formatted is True, the SQL will be formatted for readability.
visualize
Print a visualize representation of the query plan
Was this page helpful?