How we made our staging environment useful

Geoffroy Cowan
Dec 11, 2019

TLDR: We anonymise production data every night.

Before we get started, I’ll give you a quick overview of the Spaceship stack just in case:

  • Go microservices (grpc/protobuf comms).
  • Go APIs (one internal, one external).
  • GCP K8s.
  • 1 postgres database (each service has its own schema).
  • … And some read replicas.
  • … And some stuff that’s not relevant.

Our problem

Our staging database was seeded by hand. That left us with a handful of test users that people would modify by hand (usually directly in the database) to get them into whatever state they needed to test a change.

It also meant that load testing was completely out of the picture.

Options?

Option one: Generate fake data. After a quick Google, this seemed to be a common approach. Unfortunately it was ruled out fairly quickly, as generating realistic data across each service that would make sense at a holistic level seemed a bit tricky. Lining up fake trades, with fake bank transactions, with fake unit applications, with fake unit prices, and so on was just too much.

Another slight complication here was generating data at a rate that matched production so we would be roughly in sync in terms of “scale.”

Option two: Clean up and anonymise a production dump once; lots of people do that.

I’ve worked with something similar in the past and found that it degraded quickly over time, was annoying to keep in sync with any large scale changes, and again, doesn’t grow over time at a realistic rate.

So, we decided it must be possible to take a pg_dump file, anonymise it, and stand up a fresh db regularly.

We write Go at Spaceship, so we wrote a Go program. At first I was really surprised at the simplicity of the program (the whole thing with tests is <500 lines, and most of that is the ruleset), because anonymising gigabytes of production data sounded hard. But it turns out you really are only worrying about one line of a text file at a time. Pretty standard stuff!

At a high level, it reads a dump file, swaps out values for generated ones, hashes ids, and changes password hashes to a known value. That gives us a complete copy of the production database with zero sensitive or personal data.

The anonymiser itself

The bulk of the work is handled by an anonymise function that looks like this:

func Anonymize(r io.Reader, w io.Writer, ruleset *Ruleset) error

Which, hopefully, does what you think: applies the ruleset to the reader, outputting to the writer.

Using readers and writers was definitely worth the tiny bit of bootstrapping required to run in production where it uses GCP bucket objects. It allowed for some really nice tests, and local development with standard in/out became trivial.
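To make the reader/writer point concrete, here is a minimal sketch of the idea. It is not the real Anonymize function: redactLines and runOnString are hypothetical stand-ins, and the "rule" is just a function applied to every line. What it shows is that a function written against io.Reader and io.Writer can be driven from an in-memory string in a test and from standard in/out locally, without changing a line of it.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// redactLines is a hypothetical stand-in for the real Anonymize function:
// it copies r to w, passing every line through rule.
func redactLines(r io.Reader, w io.Writer, rule func(string) string) error {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		if _, err := fmt.Fprintln(w, rule(sc.Text())); err != nil {
			return err
		}
	}
	return sc.Err()
}

// runOnString drives redactLines with in-memory values, which is all a
// unit test needs: no files, buckets, or databases.
func runOnString(input string, rule func(string) string) (string, error) {
	var out strings.Builder
	err := redactLines(strings.NewReader(input), &out, rule)
	return out.String(), err
}

func main() {
	got, err := runOnString("alice\nbob\n", strings.ToUpper)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(got)
	// For local development the same function works on standard in/out:
	//   redactLines(os.Stdin, os.Stdout, strings.ToUpper)
}
```

In production the reader and writer would be bucket objects instead, which is the "tiny bit of bootstrapping" mentioned above.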

Hang on a minute! What’s this third param, the magical ruleset? Good question…

type Ruleset struct {
    FieldRules map[string]func(string) string
    TestRules  []TestRule
    TableRules map[string]func(string) string
    TableSkips map[string]bool
}

type TestRule struct {
    Name string
    Test func(string) bool
    Rule func(string) string
}

That’s the Ruleset. It definitely could do with a version 2, as it’s grown and flexed a little from its original design.

TableSkips are fairly self-explanatory; if a table is in that map, it is completely skipped. We skip a couple of large audit tables that haven’t yet been needed in staging.

Test rules run the Test function against a specific column's value and, if it returns true, apply the Rule function to the current value to create a new one. The only test rule we have is for uuids. It creates a new uuid that is a consistent hash of the original, which means ids spanning tables still line up as expected.
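As a rough illustration of what a uuid test rule could look like, here is a sketch. The names isUUID and hashUUID are hypothetical, and the choice of HMAC-SHA256 keyed with a secret salt is an assumption of this example; the post only says the new uuid is a consistent hash of the original. The important property is that the same input always maps to the same output, so a foreign key in one table still matches the primary key in another.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
)

// uuidPattern matches lowercase, hyphenated uuids.
var uuidPattern = regexp.MustCompile(
	`^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$`)

// isUUID plays the role of TestRule.Test (hypothetical name).
func isUUID(v string) bool { return uuidPattern.MatchString(v) }

// hashUUID plays the role of TestRule.Rule (hypothetical name). It derives
// a new uuid from the original using HMAC-SHA256 keyed with the salt, so
// the mapping is stable for a given salt but hard to reverse without it.
func hashUUID(salt, id string) string {
	mac := hmac.New(sha256.New, []byte(salt))
	mac.Write([]byte(id))
	sum := mac.Sum(nil)[:16]
	// Set the version (4) and variant bits so the result is still a
	// well-formed uuid.
	sum[6] = (sum[6] & 0x0f) | 0x40
	sum[8] = (sum[8] & 0x3f) | 0x80
	h := hex.EncodeToString(sum)
	return h[0:8] + "-" + h[8:12] + "-" + h[12:16] + "-" + h[16:20] + "-" + h[20:32]
}

func main() {
	id := "123e4567-e89b-12d3-a456-426614174000"
	if isUUID(id) {
		// Both calls print the same value, wherever the id appears.
		fmt.Println(hashUUID("example-salt", id))
		fmt.Println(hashUUID("example-salt", id))
	}
}
```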

Field rules are the most common, and are probably easier explained with an example:

"account.person_contact.first_name": func(string) string {
    return fake.FirstName()
},

We make heavy use of the phenomenal fake package.

I’ve hidden a couple of details here, which, if you’re doing this yourself, you probably want to think about; the program also takes in a seed and a salt. The seed is mainly used for making testing a whole lot easier/possible. We use a salt because as an old colleague would never let me forget: security starts with you. It’s provided to the app as secret config.
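Here is a small sketch of why a seed makes testing possible. It does not use the fake package; firstNames, newFirstNameRule and firstN are hypothetical stand-ins built on math/rand. With a fixed seed, the "random" replacements come out in the same order every run, so a test can compare the anonymised output against a known result.

```go
package main

import (
	"fmt"
	"math/rand"
)

// firstNames is a tiny stand-in for a real fake-data generator.
var firstNames = []string{"Ada", "Grace", "Linus", "Margaret", "Dennis"}

// newFirstNameRule returns a field rule (func(string) string) whose output
// sequence is fully determined by seed.
func newFirstNameRule(seed int64) func(string) string {
	rng := rand.New(rand.NewSource(seed))
	return func(string) string {
		return firstNames[rng.Intn(len(firstNames))]
	}
}

// firstN applies rule n times and collects the generated values.
func firstN(rule func(string) string, n int) []string {
	out := make([]string, n)
	for i := range out {
		out[i] = rule("original value")
	}
	return out
}

func main() {
	// Two rules built from the same seed produce identical sequences.
	fmt.Println(firstN(newFirstNameRule(1), 3))
	fmt.Println(firstN(newFirstNameRule(1), 3))
}
```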

The main loop goes line by line through a postgres dump file, and maintains some fairly simple state which basically says, “Am I in a table, and if so, what’s it called and what does it look like.” If it encounters a line while in that table state, it runs the line through the ruleset, and then writes to output. All other lines are written straight to output.
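A stripped-down sketch of such a loop might look like the following. It is not the real program: anonymiseDump, anonymiseString and the fieldRules type are hypothetical, and it only handles field rules (no test rules, table rules or table skips). It assumes a plain-text pg_dump, where table data appears as a "COPY schema.table (col, col) FROM stdin;" header, followed by tab-separated rows, followed by a line containing only \. and where \N marks a NULL. It also assumes unquoted table and column names.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"regexp"
	"strings"
)

// copyHeader matches the line that starts a table's data section.
var copyHeader = regexp.MustCompile(`^COPY (\S+) \((.*)\) FROM stdin;$`)

// fieldRules maps "schema.table.column" to a replacement function.
type fieldRules map[string]func(string) string

func anonymiseDump(r io.Reader, w io.Writer, rules fieldRules) error {
	sc := bufio.NewScanner(r)
	// Rows can be long; raise the scanner's default 64 KiB line limit.
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)

	// State: are we in a table, what is it called, what are its columns?
	var (
		inTable bool
		table   string
		columns []string
	)
	for sc.Scan() {
		line := sc.Text()
		switch {
		case inTable && line == `\.`:
			inTable = false // end of this table's data
		case inTable:
			values := strings.Split(line, "\t")
			for i, v := range values {
				if i >= len(columns) {
					break
				}
				rule, ok := rules[table+"."+columns[i]]
				if ok && v != `\N` { // leave NULLs alone
					values[i] = rule(v)
				}
			}
			line = strings.Join(values, "\t")
		default:
			if m := copyHeader.FindStringSubmatch(line); m != nil {
				inTable, table = true, m[1]
				columns = strings.Split(m[2], ", ")
			}
		}
		if _, err := fmt.Fprintln(w, line); err != nil {
			return err
		}
	}
	return sc.Err()
}

// anonymiseString is a convenience wrapper for tests and demos.
func anonymiseString(dump string, rules fieldRules) (string, error) {
	var out strings.Builder
	err := anonymiseDump(strings.NewReader(dump), &out, rules)
	return out.String(), err
}

func main() {
	dump := "COPY account.person_contact (id, first_name) FROM stdin;\n" +
		"1\tAlice\n2\t\\N\n\\.\n"
	rules := fieldRules{
		"account.person_contact.first_name": func(string) string { return "REDACTED" },
	}
	if err := anonymiseDump(strings.NewReader(dump), os.Stdout, rules); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Everything outside a COPY section (schema definitions, SET statements, comments) passes through untouched, which is why the state can stay this small.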

Won’t that take forever?

Anything to improve?

As far as major structure goes, this great talk by Rob Pike, in which he talks about lexers, introduced me to a fairly neat pattern that definitely could have cleaned up the areas of this program that hold and switch state. But it's always nice to leave room for improvement for the next person, right?
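For anyone who hasn't seen it, the pattern is usually called state functions: each state is a function that does its work and returns the next state. Here is a toy sketch applied to the "am I in a table?" problem. The machine type and the outsideTable and insideTable functions are hypothetical, and the actual anonymising is replaced with a placeholder string; this illustrates the shape of the pattern, not how the real program is written.

```go
package main

import (
	"fmt"
	"strings"
)

// machine holds the input, the read position, and the output so far.
type machine struct {
	lines []string
	pos   int
	out   []string
}

// stateFn is a state: it processes some input and returns the next state,
// or nil when there is nothing left to do.
type stateFn func(*machine) stateFn

// outsideTable copies lines through until it sees a COPY header.
func outsideTable(m *machine) stateFn {
	if m.pos >= len(m.lines) {
		return nil
	}
	line := m.lines[m.pos]
	m.pos++
	m.out = append(m.out, line)
	if strings.HasPrefix(line, "COPY ") {
		return insideTable
	}
	return outsideTable
}

// insideTable rewrites rows until it sees the end-of-data marker.
func insideTable(m *machine) stateFn {
	if m.pos >= len(m.lines) {
		return nil
	}
	line := m.lines[m.pos]
	m.pos++
	if line == `\.` {
		m.out = append(m.out, line)
		return outsideTable
	}
	m.out = append(m.out, "<anonymised row>") // placeholder for the ruleset
	return insideTable
}

// run drives the machine: there is no switch on a state variable, the
// current state is simply whichever function was returned last.
func run(lines []string) []string {
	m := &machine{lines: lines}
	for state := stateFn(outsideTable); state != nil; {
		state = state(m)
	}
	return m.out
}

func main() {
	in := []string{"SET x = 1;", "COPY t (a) FROM stdin;", "1\tAlice", `\.`, "SELECT 1;"}
	for _, line := range run(in) {
		fmt.Println(line)
	}
}
```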

Ideally, a faster run would let us eliminate table skips altogether, and would also allow for more flexible scheduling.

The fact it happens nightly can make multi-day testing tricky. At the moment, we just turn one of the jobs in the pipeline off (by hand), and then back on once testing has finished.

Was it worth it?

Make a new endpoint? It takes about five minutes to write a program to query down thousands of users and hit your new endpoint. You can then check the timings and response codes in Datadog, and you can then be fairly confident you’re going to be okay in production.
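To give a feel for how small such a program can be, here is a sketch. Everything specific in it is an assumption: the /users/{id}/summary path is made up, the user ids are generated rather than queried from a database, and a local httptest server stands in for staging so the example runs anywhere. It also tallies status codes and the slowest response locally rather than relying on a monitoring tool.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// result records the outcome of one request; status 0 is a transport error.
type result struct {
	status  int
	elapsed time.Duration
}

// hitEndpoint calls a hypothetical per-user endpoint for every id, using a
// fixed number of concurrent workers.
func hitEndpoint(client *http.Client, baseURL string, userIDs []string, workers int) []result {
	jobs := make(chan string)
	results := make(chan result)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range jobs {
				start := time.Now()
				status := 0
				resp, err := client.Get(baseURL + "/users/" + id + "/summary")
				if err == nil {
					status = resp.StatusCode
					resp.Body.Close()
				}
				results <- result{status, time.Since(start)}
			}
		}()
	}
	go func() {
		for _, id := range userIDs {
			jobs <- id
		}
		close(jobs)
	}()
	go func() { wg.Wait(); close(results) }()

	var all []result
	for r := range results {
		all = append(all, r)
	}
	return all
}

// demo runs n requests against a local stand-in server and returns the
// count per status code plus the slowest response time.
func demo(n int) (map[int]int, time.Duration) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	ids := make([]string, n)
	for i := range ids {
		ids[i] = fmt.Sprintf("user-%d", i)
	}
	counts := map[int]int{}
	var slowest time.Duration
	for _, r := range hitEndpoint(srv.Client(), srv.URL, ids, 8) {
		counts[r.status]++
		if r.elapsed > slowest {
			slowest = r.elapsed
		}
	}
	return counts, slowest
}

func main() {
	counts, slowest := demo(50)
	fmt.Println("responses by status:", counts)
	fmt.Println("slowest response:", slowest)
}
```

Against a real staging environment you would swap the stand-in server for the staging base URL and feed in user ids pulled from the anonymised database.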

This has definitely stopped us from pushing problem code to production before, which pretty much makes the whole exercise worth it by itself.

The other huge area that it has helped in has been handling support tickets.

If we get an issue from a user, we can log in to their anonymised account in staging, see exactly what’s going on, make changes, and then test that those changes actually resolve the issue before sending them to production.

We also use the staging database as a place to do test runs of production database migrations, allowing us to establish realistic time estimates and load impacts.

Takeaways

Also, if you don’t think your staging/test/development environment is actually providing value, why not? Can you fix it? It probably costs money to host, so you should definitely be getting something in return!

And finally, it’s surprisingly fun to get a new name for your test account each day.

Anything else you want to talk about…
