The Consumer Data Right (CDR) provides customers with greater access to their energy and banking data than ever before. This is supported by a set of APIs that market participants must implement to a prescribed set of standards. The focus of this article is on the energy part of the regime and how we built a low-cost, high-performance and well-tested system that several retailers are using today to serve these requests.
The API to service requests is a simple set of functions that read from our target schema and serialize the response as JSON; this was the easy part. The hard part was getting the data into a schema optimized for reading just enough to get the job done.
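As an illustration only (the table, field names and payload shape below are invented stand-ins, not the real CDR schema), the read path can be as thin as a parameterised query plus JSON serialisation:

```python
import json
import sqlite3

# In-memory stand-in for the purpose-built target schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice_header (source_system TEXT, source_id TEXT, amount_cents INTEGER)"
)
conn.execute("INSERT INTO invoice_header VALUES ('billing_a', '42', 12345)")

def get_invoices(source_system: str) -> str:
    """Read pre-shaped rows from the target schema and serialise them as JSON."""
    rows = conn.execute(
        "SELECT source_system, source_id, amount_cents FROM invoice_header"
        " WHERE source_system = ?",
        (source_system,),
    ).fetchall()
    return json.dumps(
        {"data": [{"sourceSystem": s, "sourceId": i, "amountCents": a} for s, i, a in rows]}
    )
```

Because the schema already holds just enough data in the right shape, the handler does no joins or transformation beyond serialisation.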
To abide by the standards, each party (that doesn’t have an exemption) must:
Serve valid application/json responses
Serve responses within 1 second for a subset of the APIs; the others are more lenient on the time to reply
Serve response data that reflects what a customer portal would show, if one is available; basically, if a customer could log in to their account in another service and see their data, then that same state should be served by the CDR API
The first part is reasonably easy; the performance part, not so much.
There are a few reasons why this is a hard problem. Most might see 1000 ms as a trivially low bar, reducing the problem to “just read off a replicated data set and call it a day”.
While we could have done that, it comes with some major drawbacks:
The queries required to fulfil the requests would need to be written for each distinct source system
We couldn’t change the source system directly to support CDR; these databases generally serve crucial business processes: invoicing, transacting with the market and so on
We would need to rely upon the performance of queries where the underlying schema was under constant change
Due to the barriers above, we chose a different approach. We made the following assumptions:
We don’t need all the data from the source system, only just enough to service the API requests
We don’t need to listen for all data changes, only those that would affect our state
This was pretty straightforward: the documentation of the CDR APIs is comprehensive, and our team has very deep knowledge of the domain, so we were able to model a purpose-built schema to store “just enough” data to serve these requests. The majority of the tables in the schema include a simple linking structure:
source_system - the name of the system we get data from; useful for multiple tenants
source_id - the original identifier
We couldn’t use a single identifier in every case, but the vast majority of traditional systems use some incrementing integer or UUID to uniquely identify a row. Where this wasn’t the case we would generate our own composite key and utilise that in synchronisation; more on that below.
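The synthetic key can be as simple as an unambiguous join of the identifying columns. A minimal sketch, with illustrative column values (this is not our exact key format):

```python
def composite_key(*parts) -> str:
    """Derive a stable synthetic source_id from several identifying columns.
    A separator that cannot appear in the values keeps ("a", "bc") and
    ("ab", "c") from colliding."""
    return "|".join(str(p) for p in parts)

# e.g. a charge line identified by (bill number, line number)
composite_key("INV-1001", 3)  # "INV-1001|3"
```

The same inputs always produce the same key, which is what synchronisation needs to match a source row to its target row across runs.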
Energy retail data systems tend to be relational in nature, the use of RDBMS is pretty much ubiquitous in at least the Australian energy industry. We know these systems well, and while the schema that we use may require fetching from many tables, we know that we only need to listen to a subset of writes to the source system tables.
It’s commonplace to visualise relational databases using an Entity Relationship Diagram (ERD). An ERD can also be read, for the purposes of synchronisation, as a graph. Where we found edge nodes in the graph that only ever change together with their predecessor, we chose to capture only changes to that predecessor node. This dramatically cut down the number of events we were interested in.
A good example of this is invoices. Most energy bills are modelled using many relations; presenting the detail for a bill might involve relations for:
A header for the bill
Detail for the charges
Address details for the customer
Metering information related to how the customer uses or generates energy
In reality there could be many dozens of relationships that are all linked and form the state of the bill.
For CDR we know we are only after the header and the bill charges. Charges tend to outnumber headers, sometimes significantly, but importantly:
The bill doesn’t change; it’s immutable
You can’t move charges between bills
With this in mind, we can listen to only the header events and not the detail events, skipping a lot of unnecessary processing.
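The pruning described above amounts to a simple filter over the change stream. A sketch, with illustrative table names:

```python
# Bills are immutable and charges cannot move between bills, so only
# predecessor (header-level) tables need tracking.
TRACKED_TABLES = {"invoice_header", "account"}

def relevant(event: dict) -> bool:
    """Keep only change events for tables we actually listen to."""
    return event["table"] in TRACKED_TABLES

events = [
    {"table": "invoice_header", "key": "42"},
    {"table": "invoice_charge", "key": "42/1"},  # implied by its header; skipped
    {"table": "invoice_charge", "key": "42/2"},
    {"table": "account", "key": "7"},
]
to_process = [e for e in events if relevant(e)]
```

The two charge events are dropped up front; the header event is enough to trigger a re-read of the whole bill.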
When listening to change events from the source system, we can determine what has changed in the graph and then fetch the minimum related data from the source system. Once we have all the data we need, we write it to our database. Applying new events is fine, but it only captures what changes after we start listening; we also need a baseline. To form this baseline we did the following:
Read from the source system and create dummy change events in our stream
Read the stream of events in batches, fetching additional data from the source system if needed
Write to our database
The ongoing change data capture (CDC) then runs the last two steps continuously.
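The three steps above can be sketched as a toy pipeline, with a plain list standing in for the real event stream, a dict for the target database, and illustrative table names:

```python
stream = []   # stand-in for the event stream
target = {}   # stand-in for the target database

# Hypothetical snapshot of the source system at baseline time.
source_rows = {"account": ["1", "2"], "invoice_header": ["42"]}

def seed_baseline():
    """Step 1: create a dummy change event for every existing source row."""
    for table, ids in source_rows.items():
        for source_id in ids:
            stream.append({"table": table, "key": source_id})

def consume(batch_size: int = 2):
    """Steps 2 and 3: read events in batches and write to the target.
    The ongoing CDC loop runs exactly these two steps."""
    while stream:
        batch = stream[:batch_size]
        del stream[:batch_size]
        for event in batch:
            # Real system: fetch any additional related data from the source here.
            target[(event["table"], event["key"])] = {"synced": True}

seed_baseline()
consume()
```

Because the baseline is expressed as ordinary events, the initial load and the ongoing sync exercise the same consumer code path.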
Reading change events in parallel while preserving order is a hard problem, a really hard problem. We don’t have the performance needs that would justify such a process, and verifying its correctness (and stopping writers from locking each other) is worth avoiding, so we did.
If we try to get data from the source system and it does not exist, then it’s gone, and we remove the data (and linked data) from our target system. This avoids the problem where the consumer might read an INSERT after a DELETE, which is a path to pain. Put simply, the approach is:
Always check for the data, if it’s missing then it’s gone, no matter which order we processed the events
With multiple consumers the above doesn’t work, due to a classic concurrency problem, so we stick to a single consumer: put simply, the “last write wins”, and we sleep easier at night not having to think about this.
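A minimal sketch of the rule, with dicts standing in for the source and target databases:

```python
# "If it's missing, it's gone": every event triggers a fresh read of the
# source row; absence means delete, whatever order the events arrived in.
source = {"42": {"amount_cents": 12345}}  # illustrative source table
mirror = {}                               # our target copy

def apply_event(key: str):
    row = source.get(key)
    if row is None:
        mirror.pop(key, None)  # gone at the source: remove locally too
    else:
        mirror[key] = row      # present: upsert; last write wins

apply_event("42")   # row exists: upserted into the mirror
del source["42"]    # row deleted at the source
apply_event("42")   # re-read finds nothing: removed from the mirror
```

Even if a stale INSERT event arrived after the DELETE, the re-read would still find nothing and the mirror would converge to the same state.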
Just because we use a single consumer doesn’t mean we do one thing at a time; even for smaller energy retailers that proves too slow. As such, we run a simple batching process:
Get a batch from the event stream, say 1000 events
Order those events in memory with a simple sort; this avoids doing silly things like trying to read the data for an account’s contacts before reading the data for the account itself, which would result in a “miss”
Read data from the source system for each distinct group, for example:
In a given batch we might have 500 account change events and 500 invoice change events, we don’t submit 500 queries to the source system, we submit 2 queries, each including the relevant keys from the events
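The sort-then-group steps above can be sketched as follows; the table names and the precedence map are illustrative assumptions, not our real schema:

```python
from collections import defaultdict

# Parents sort before children so an account's contacts never "miss".
PRECEDENCE = {"account": 0, "invoice_header": 1, "contact": 2}

def prepare(batch):
    """Sort a batch by table precedence, then collapse it into one set of
    keys per distinct table, ready for one IN-list query each."""
    ordered = sorted(batch, key=lambda e: PRECEDENCE.get(e["table"], 99))
    keys_by_table = defaultdict(set)
    for event in ordered:
        keys_by_table[event["table"]].add(event["key"])
    return keys_by_table

batch = (
    [{"table": "invoice_header", "key": str(i)} for i in range(500)]
    + [{"table": "account", "key": str(i)} for i in range(500)]
)
grouped = prepare(batch)

# Two IN-list queries instead of 1000 single-row reads:
for table, keys in grouped.items():
    placeholders = ", ".join(["?"] * len(keys))
    query = f"SELECT * FROM {table} WHERE id IN ({placeholders})"
```

One query per distinct table keeps round trips to the source system proportional to the number of entity types in the batch, not the number of events.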
Sometimes things go wrong, and often for reasons outside our control. We need to be comfortable with both happening frequently; practicing some humility is good for the soul. To minimise the effect of these issues we:
Split failing batches in half and retry, up to 10 times
Retry is particularly important as either the source system or target system may be unavailable
Retain data in the event stream once processed
Although there is some storage cost here, being able to shift the “read marker” back is a handy lever to have
Keep a copy of the events
Data isn’t just consumed for further processing by our event consumers; we also write it to persistent storage partitioned by date. This is incredibly useful for replaying events through the stream when we have written consumer bugs
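The batch-splitting recovery can be sketched as below; the `write` callback and the idea of returning unprocessable events (which stay available in the retained stream) are illustrative assumptions:

```python
def process_with_splitting(batch, write, depth=0, max_depth=10):
    """Try to process a batch; on failure, split it in half and retry each
    half, up to a bounded depth. Returns the events that could not be
    processed, which remain in the retained event stream for later."""
    try:
        write(batch)
        return []
    except Exception:
        if depth >= max_depth or len(batch) <= 1:
            return list(batch)  # isolated the failure down to this batch
        mid = len(batch) // 2
        return (process_with_splitting(batch[:mid], write, depth + 1, max_depth)
                + process_with_splitting(batch[mid:], write, depth + 1, max_depth))

written = []
def write(batch):
    """Stand-in for the real work; fails whenever a bad event is present."""
    if "poison" in batch:
        raise ValueError("bad event")
    written.extend(batch)

failed = process_with_splitting(["a", "b", "poison", "c"], write)
```

Here the good events all land while the single poison event is isolated, instead of one bad event blocking the whole batch.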
Building a system that adheres to the Consumer Data Right standards was a complex but rewarding task. By focusing not just on serving API requests but also on how the data is prepared, we have created a system that is both efficient and robust.