Core concepts

Invariants

Fearless at Runtime

At Cotera, we use era to "productionize" data flows. Its important to us that we can do things correctly and at scale with data. In addition to type checking and unit tests, the era compiler ships with "invariants". As opposed to something like dbt tests which can have TOCTOU issues, "Invariants" are runtime data checks that can be used to fail transactions that don't uphold them. They compile directly into the running query and validate that a condition holds. They also give you understandable error messages, regardless of which warehouse you're runnng on.

Anatomy of a "runtime" data check

The following reads from a table named some-table in the schema some-schema, that table has two attributes, foo and bar.

It also defines some invariants. Data will fail at runtime if any rows meet the following conditions.

  1. If bar is not unique (a.k.a. distinct) in the returned rows
  2. If foo is null
  3. foo is equal to bar
import { From } from '@cotera/era'

export const Data = From({
  name: 'foo',
  schema: 'some-schema',
  attributes: {
    foo: 'string',
    bar: 'int',
  },
}).select((t) => ({
  // ERA provides some built in invariants for convenience, this one makes
  // sure that the expression it's called on is unique, and then returns the
  // expression
  bar: t.attr('bar').assertDistinct(),
  // An invariant can be any expression that evaluates to a boolean
  foo: t.attr('foo').invariants({
    'Foo is not null': t.attr('foo').isNotNull(),
    // invariants can look at the entire row
    'Foo does not equal Bar': t.attr('bar').cast('string').neq(t.attr('foo')),
  }),
}))

The (ugly...) SQL

Under the hood this compiles to the following SQL (feel free to copy this into your SQL source... if you dare). It has to be ugly to be sure to be lazily evaluated in the database. Invariants work by only triggering an invalid operation (usually casting an invalid string containing the name of an invariant to an integer) when an invariant is false.

Postgres

select case
           when (not (cast(count(true) over (partition by "bar" ) as integer) = 1)) then cast(to_jsonb(cast((
                   'Invariant *Is Unique* failed! Got ' || cast((cast(c
                                                                     ount(true) over(partition by "bar") as integer) =
                                                                 1) as text)) as integer)) as integer)
           else "bar" end as "bar",
       case
           when (not (cast("bar" as text) != "foo"))
               then (json_build_object('value', to_jsonb(cast(('Invariant *Foo does not equal Bar* failed! Got ' ||
                                                               cast((cast("bar" as text) != "foo") as text)) as integer)))) ->>'value'
           when (not (not ("foo" is null))) then (json_build_object('value', to_jsonb(cast((
                   'Invariant *Foo is not null* failed! Got ' || cast((not ("foo" is null)
                   ) as text)) as integer)))) ->> 'value'
           else "foo" end as "foo"
from "some-schema"."foo"

BigQuery

select case
           when (not (cast((count(true) over (partition by `bar` )) as int) = 1)) then INT64(to_json(cast(((
                   'Invariant *Is Unique* failed! Got ' ||
                   cast(((cast((count(true) over (partition by `bar` )) as int) = 1)) as string))) as int)))
           else `bar` end as `bar`,
       case
           when (not (cast((`bar`) as string) != `foo`)) then STRING(to_json(cast(((
                   'Invariant *Foo does not equal Bar* failed! Got ' ||
                   cast(((cast((`bar`) as string) != `foo`)) as string))) as int)))
           when (not (not (`foo` is null))) then STRING(to_json(cast((('Invariant *Foo is not null* failed! Got ' ||
                                                                       cast(((not (`foo` is null))) as string))) as int)))
           else `foo` end as `foo`
from `some-schema`.`foo`

Redshift

select case
           when (not (cast(count(true) over (partition by "bar" ) as integer) = 1)) then cast(cast(cast((
                   cast('Invariant *Is Unique* failed! Got ' as character varying) || case
                                                                                          when ((cast(count(true) over (partition by "bar" ) as integer) = 1))
                                                                                              then 'true'
                                                                                          else 'false' end) as integer) as super) as integer)
           else "bar" end as "bar",
       case
           when (not (cast("bar" as character varying) != "foo")) then cast(cast(cast((
                   cast('Invariant *Foo does not equal Bar* failed! Got ' as character varying) || case
                                                                                                       when ((cast("bar" as character varying) != "foo"))
                                                                                                           then 'true'
                                                                                                       else 'false' end) as integer) as super) as character varying)
           when (not (not ("foo" is null))) then cast(cast(cast((
                   cast('Invariant *Foo is not null* failed! Got ' as character varying) || case
                                                                                                when ((not ("foo" is null)))
                                                                                                    then 'true'
                                                                                                else 'false' end) as integer) as super) as character varying)
           else "foo" end as "foo"
from "some-schema"."foo"

DuckDB

select case
           when (not (cast(count(true) over (partition by "bar") as integer) = 1)) then cast(
                       (to_json(cast(('Invariant *Is Unique* failed! Got ' ||
                                      cast((cast(count(true) over (partition by "bar") as integer) = 1) as text)) as integer))) ->>'$' as integer)
           else "bar" end as "bar",
       case
           when (not (cast("bar" as text) != "foo")) then cast(
                       (to_json(cast(('Invariant *Foo does not equal Bar* failed! Got ' ||
                                      cast((cast("bar" as text) != "foo") as text)) as integer))) ->>'$' as text)
           when (not (not ("foo" is null))) then cast((to_json(cast(('Invariant *Foo is not null* failed! Got ' ||
                                                                     cast((not ("foo" is null)) as text)) as integer))) ->>'$' as text)
           else "foo" end as "foo"
from "some-schema"."foo"

Ready to learn more?

Then come and say hello to us on Discord! Or head over to GitHub to try out the ERA examples on your computer.
Previous
Unit testing
Next
Markup