Core concepts
Invariants
Fearless at Runtime
At Cotera, we use era to "productionize" data flows. Its important to us that we can do things correctly and at scale with data. In addition to type checking and unit tests, the era compiler ships with "invariants". As opposed to something like dbt tests which can have TOCTOU issues, "Invariants" are runtime data checks that can be used to fail transactions that don't uphold them. They compile directly into the running query and validate that a condition holds. They also give you understandable error messages, regardless of which warehouse you're runnng on.
Anatomy of a "runtime" data check
The following reads from a table named some-table
in the schema some-schema
, that table has two attributes, foo
and bar
.
It also defines some invariants. Data
will fail at runtime if any rows meet the following conditions.
- If
bar
is not unique (a.k.a. distinct) in the returned rows - If
foo
isnull
foo
is equal tobar
import { From } from '@cotera/era'
export const Data = From({
name: 'foo',
schema: 'some-schema',
attributes: {
foo: 'string',
bar: 'int',
},
}).select((t) => ({
// ERA provides some built in invariants for convenience, this one makes
// sure that the expression it's called on is unique, and then returns the
// expression
bar: t.attr('bar').assertDistinct(),
// An invariant can be any expression that evaluates to a boolean
foo: t.attr('foo').invariants({
'Foo is not null': t.attr('foo').isNotNull(),
// invariants can look at the entire row
'Foo does not equal Bar': t.attr('bar').cast('string').neq(t.attr('foo')),
}),
}))
The (ugly...) SQL
Under the hood this compiles to the following SQL (feel free to copy this into your SQL source... if you dare). It has to be ugly to be sure to be lazily evaluated in the database. Invariants work by only triggering an invalid operation (usually casting an invalid string containing the name of an invariant to an integer) when an invariant is false
.
Postgres
select case
when (not (cast(count(true) over (partition by "bar" ) as integer) = 1)) then cast(to_jsonb(cast((
'Invariant *Is Unique* failed! Got ' || cast((cast(c
ount(true) over(partition by "bar") as integer) =
1) as text)) as integer)) as integer)
else "bar" end as "bar",
case
when (not (cast("bar" as text) != "foo"))
then (json_build_object('value', to_jsonb(cast(('Invariant *Foo does not equal Bar* failed! Got ' ||
cast((cast("bar" as text) != "foo") as text)) as integer)))) ->>'value'
when (not (not ("foo" is null))) then (json_build_object('value', to_jsonb(cast((
'Invariant *Foo is not null* failed! Got ' || cast((not ("foo" is null)
) as text)) as integer)))) ->> 'value'
else "foo" end as "foo"
from "some-schema"."foo"
BigQuery
select case
when (not (cast((count(true) over (partition by `bar` )) as int) = 1)) then INT64(to_json(cast(((
'Invariant *Is Unique* failed! Got ' ||
cast(((cast((count(true) over (partition by `bar` )) as int) = 1)) as string))) as int)))
else `bar` end as `bar`,
case
when (not (cast((`bar`) as string) != `foo`)) then STRING(to_json(cast(((
'Invariant *Foo does not equal Bar* failed! Got ' ||
cast(((cast((`bar`) as string) != `foo`)) as string))) as int)))
when (not (not (`foo` is null))) then STRING(to_json(cast((('Invariant *Foo is not null* failed! Got ' ||
cast(((not (`foo` is null))) as string))) as int)))
else `foo` end as `foo`
from `some-schema`.`foo`
Redshift
select case
when (not (cast(count(true) over (partition by "bar" ) as integer) = 1)) then cast(cast(cast((
cast('Invariant *Is Unique* failed! Got ' as character varying) || case
when ((cast(count(true) over (partition by "bar" ) as integer) = 1))
then 'true'
else 'false' end) as integer) as super) as integer)
else "bar" end as "bar",
case
when (not (cast("bar" as character varying) != "foo")) then cast(cast(cast((
cast('Invariant *Foo does not equal Bar* failed! Got ' as character varying) || case
when ((cast("bar" as character varying) != "foo"))
then 'true'
else 'false' end) as integer) as super) as character varying)
when (not (not ("foo" is null))) then cast(cast(cast((
cast('Invariant *Foo is not null* failed! Got ' as character varying) || case
when ((not ("foo" is null)))
then 'true'
else 'false' end) as integer) as super) as character varying)
else "foo" end as "foo"
from "some-schema"."foo"
DuckDB
select case
when (not (cast(count(true) over (partition by "bar") as integer) = 1)) then cast(
(to_json(cast(('Invariant *Is Unique* failed! Got ' ||
cast((cast(count(true) over (partition by "bar") as integer) = 1) as text)) as integer))) ->>'$' as integer)
else "bar" end as "bar",
case
when (not (cast("bar" as text) != "foo")) then cast(
(to_json(cast(('Invariant *Foo does not equal Bar* failed! Got ' ||
cast((cast("bar" as text) != "foo") as text)) as integer))) ->>'$' as text)
when (not (not ("foo" is null))) then cast((to_json(cast(('Invariant *Foo is not null* failed! Got ' ||
cast((not ("foo" is null)) as text)) as integer))) ->>'$' as text)
else "foo" end as "foo"
from "some-schema"."foo"