Core concepts
Unit Testing
Fearless Refactoring
ERA allows even more confidence in the code you're writing by making it easy to develop via unit tests. Tests allow you to isolate small pieces of functionality and reason about them independently.
For more info on unit testing here's some resources we like
Unit Testing ERA Expressions
ERA contains a large expression language and a Javascript interpreter for writing unit tests and evaluating macros. At Cotera, we take composable bits of ERA code and expose them as abstractions. We then write unit tests to make sure they're correct. This is how most of the software industry works, and we wanted the same workflow in our data transformations.
The ERA expression language has been carefully designed to work exactly the same across all supported database and the Javascript interpreter, so you can have confidence that your tests running in JS or DuckDB will work the same in Redshift, BigQuery, Postgres, or Snowflake.
Case study: "reportingPeriod" for a finance company
This is real code we've written to calculate the reporting period for a finance company that's working with us. Their business has a formal internal definition of a "reporting period".
The rules are as follows
- Any date after the 15th but before the end of the month is reported on the 15th of that month
- Any date before the 15th is reported on the 15 of the previous month
- Years must wrap properly
Below is a function called reportingPeriod
that implements that logic. We chose to encapsulate this logic in a function central place so that other people working on the account can use it without having to implement it by hand each time. We have confidence it's correct because of the unit tests, and we know if we need to change it or fix an edge case, we can change it in once place and across the project without regressions in the other cases.
// reporting-period.ts
import { Expression, If } from '@cotera/era'
export const reportingPeriod = (date: Date | Expression): Expression => {
const expr = Expression.wrap(date)
return If(expr.datePart('day').gte(15), {
then: expr.dateTrunc('month').dateAdd('days', 14),
else: expr
.dateTrunc('month')
.dateSub('days', 1)
.dateTrunc('month')
.dateAdd('days', 14),
})
}
The tests use ERA's built in interpreter to run on your local machine almost instantly.
import { reportingPeriod } from './reporting-period'
test('for a date before the 15th of the month', () => {
const result = reportingPeriod(new Date('2023-07-10Z')).evaluate()
expect(result).toEqual(new Date('2023-06-15Z'))
})
test('for a date after the 15th of the month', () => {
const result = reportingPeriod(new Date('2023-07-20Z')).evaluate()
expect(result).toEqual(new Date('2023-07-15Z'))
})
test('for a date on the 15th of the month', () => {
const result = reportingPeriod(new Date('2023-07-15Z')).evaluate()
expect(result).toEqual(new Date('2023-07-15Z'))
})
test('it correctly finds the period for a date in early January', () => {
const result = reportingPeriod(new Date('2023-01-02Z'))
expect(result).toEqual(new Date('2022-12-15Z'))
})
Unit Testing ERA Relation pipelines
Since ERA works the same way across all warehouses, we can write Relation tests too, to verify our logic on complex data transformations. They run locally against the DuckDB backend and so they run almost instantaneously.
Case Study: "NextProductFlow"
One of the services we provide at Cotera is analysis of our customer's customers buying behavior. This example was taken as a small building block of a larger analysis component. It is tested using a Values
clause and runs locally on DuckDB
// product-affinity.ts
import { And, CountDistinct, Expression, Min, Relation } from '@cotera/era'
export const NextProductFlow = (params: {
Orders: Relation
productId: string | Expression
}): Relation => {
const { Orders, productId } = params
const ProductCustomers = Orders.where((t) =>
t.attr('PRODUCT_ID').eq(productId),
)
.groupBy((t) => t.pick('__COTERA_CUSTOMER_ID'))
.select((t) => ({
...t.group(),
FIRST_ORDER_DATE: Min(t.attr('ORDER_DATE')),
}))
const NextProductOrders = Orders.innerJoin(ProductCustomers, (ord, cust) => ({
on: And(
ord.attr('__COTERA_CUSTOMER_ID').eq(cust.attr('__COTERA_CUSTOMER_ID')),
cust.attr('FIRST_ORDER_DATE').lt(ord.attr('ORDER_DATE')),
),
select: { ...cust.star(), ...ord.star() },
}))
.groupBy((t) => t.pick('PRODUCT_ID'))
.select((t) => ({
...t.group(),
KEY_PRODUCT_ID: productId,
CUSTOMER_COUNT: CountDistinct(t.attr('__COTERA_CUSTOMER_ID')),
}))
return NextProductOrders
}
// product-affinity.test.ts
import { Values } from '@cotera/era'
import { NextProductFlow } from './product-affinity'
import { DuckDbNativeDriver } from '@cotera/era-drivers'
// Run on a throw away, in memory DuckDB instance
const db = DuckDbNativeDriver.emphemeral()
describe(NextProductFlow.name, () => {
test('finds the orders with next products correctly', async () => {
// "Values" clauses are relations, just like tables and files
const Orders = Values([
{
PRODUCT_ID: '1',
ORDER_DATE: new Date('2023-1-1Z'),
__COTERA_CUSTOMER_ID: 42,
},
{
PRODUCT_ID: '2',
ORDER_DATE: new Date('2023-1-2Z'),
__COTERA_CUSTOMER_ID: 42,
},
{
PRODUCT_ID: '1',
ORDER_DATE: new Date('2023-1-1Z'),
__COTERA_CUSTOMER_ID: 43,
},
{
PRODUCT_ID: '2',
ORDER_DATE: new Date('2023-1-2Z'),
__COTERA_CUSTOMER_ID: 43,
},
])
expect(
await NextProductFlow({ Orders, productId: '1' }).execute(db),
).toEqual([
{
CUSTOMER_COUNT: 2,
KEY_PRODUCT_ID: '1',
PRODUCT_ID: '2',
},
])
})
})