Fly runs apps (and databases) close to users, by taking Docker images and transmogrifying them into Firecracker micro-vms, running on our hardware around the world. You should try deploying an app, right now: it only takes a few minutes.
100 milliseconds is the magic number. For a backend application, a sub-100ms response time is effectively instantaneous, and people love using “instant” apps. Since we’re all dirty capitalists, we’d add: if people love your app, you’ll make more money. But you can also chase sub-100ms for the endorphins.
When app developers need to shave tens of milliseconds off their request times, they start with their database. But optimizing databases is painful. When it comes to making “things-that-involve-data” fast, it’s easier to add a layer of caching than to make database changes. Even basic cache logic can shave tens of milliseconds off database queries. So backend developers love caches.
If you want to skip to the end, all you need to know is we’ve built a Redis app that runs globally, includes “instant” purge, and works with whatever backend framework you build apps with. Here’s the source.
We love caches too. And we think we can do better than incremental 10ms improvements.
What’s better than good cache logic in centralized app servers? Caching data close to end users, regardless of whether they’re in Singapore or Amsterdam. Geo-routing can shave hundreds of milliseconds, plural, off application response times. When a request hits a server in your own city, and is served hot from data conveniently stored in memory, its response is perceptibly faster than it would have been had it instead schlepped across an entire continent to get to you.
Does this sound like promotional content? That’s because we believe it. It’s why we built Fly.io: we think backend apps can, uh, scream, when you ship them on CDN-like infrastructure.
And, as it happens, Redis has very interesting knobs that make it work well when you scatter instances around the world.
One weird CDN thing
When you build a CDN, you learn stuff. Here’s an important thing we learned about geographic caching.
Cache data overlaps a lot less than you assume it will. For the most part, people in Singapore will rely on a different subset of data than people in Amsterdam or São Paulo or New Jersey.
Stop and think about it and it makes sense. People in Singapore eat in restaurants in Singapore, send cash to their friends in Singapore, and talk to their Singapore friends about meeting up. People in New Jersey care about show schedules in New Jersey and the traffic in New Jersey. They care a lot more about hoagies than people in Singapore. Humans are data magnets. They tend to work in companies together with people who live relatively near them, and talk with their friends, who, again, are relatively close to each other.
What you find when you look at a CDN cache is that for most apps, data is only duplicated in one or two regions. It almost never shows up in all the regions.
This simplifies things. Take an app that needs ten Redis servers to keep up with its load. The conventional way to build that system is to park all those servers in us-east-1, and then implement sharding logic to spread the load across the cache servers. That sharding logic infects the rest of the system. But we can usually exploit our CDN observation to avoid that: if we can deploy our app in multiple geographic regions, we can just have one server per region, without any explicit sharding logic. Cities. They’re nature’s shards!
Let’s talk a bit about how you’d do this.
We have to talk our book here for a second, because it’ll make the rest of this make sense. The whole premise of Fly.io is that we make it trivial to get a Docker container running in a bunch of different geographic regions. There are other ways to run containers around the world, and if you prefer them, what we have to say here still makes sense. Just take it as a given that you can easily boot stuff up in Singapore, Newark, and Amsterdam.
JBOR
The most boring way to exploit geographic cache locality is “just a bunch of Redii”.
Run standalone Redis servers and app servers in each region you care about. Treat them as independent caches. When a user looks up the review score for Johnny’s Beef in Chicago, the Chicago app server checks the Chicago Redis cache. Everyone involved in the Chicago request is blissfully unaware of whatever is going on in Singapore.
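Concretely, the JBOR read path is just a read-through cache against whichever Redis is local. Here’s a minimal sketch in Python with redis-py, assuming each app server points at the Redis sitting next to it; the hostname, key format, and lookup_review_score_from_db helper are placeholders, not anything specific to Fly.io:

import redis

# Each region's app server only ever talks to the Redis running beside it.
cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

def review_score(restaurant_slug):
    key = f"review-score:{restaurant_slug}"
    score = cache.get(key)
    if score is not None:
        return score  # served hot from this region's cache

    # Cache miss: hit the source of truth (hypothetical helper), then
    # populate the local cache with a TTL so stale entries age out.
    score = lookup_review_score_from_db(restaurant_slug)
    cache.setex(key, 300, score)
    return score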
We lean on caches because apps are read-heavy. But writes happen. If you’re running caches all over the world, they can eventually drift from their source of truth. Bad cache data will really irritate people. It can break apps entirely, which is why you have the keyboard shortcut for “hard refresh” in muscle memory. So, when data changes, the global cache fabric should also change, even when the cache fabric is JBOR.
This sounds distributed-systems-hard. But it doesn’t have to be. We can use a key-based cache invalidation scheme to keep things fresh for an app with standalone cache servers. Key-based invalidation inverts the intuitive roles of keys and values: instead of a durable key pointing to a changeable value, values never change; only the keys do (for instance, by timestamping them). Database changes generate new keys; stale cache values eventually expire from neglect. When all is right with the world, this can be good enough.
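A rough sketch of that scheme, again in Python; the record shape and the compute_review_score helper are made up for illustration:

import redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

def cached_review_score(restaurant):
    # The key embeds the record's last-updated timestamp. When the row
    # changes, the key changes with it, so readers never see the old value.
    key = f"review-score:{restaurant.id}:{restaurant.updated_at.isoformat()}"
    score = cache.get(key)
    if score is None:
        score = compute_review_score(restaurant)  # hypothetical helper
        # A TTL lets the now-orphaned old keys expire from neglect.
        cache.setex(key, 3600, score)
    return score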
But we live in a fallen world. Apps have bugs. So do people. Bad information eventually pollutes caches. If we can purge bad cache data our life is easier. If we can purge it everywhere, instantly, we’ll be as wizards. Wizards with a “build a whole CDN and take it public” level of power.
Let’s seize this power for ourselves.
Abusing replication for instant cache purge
Redis has a simple replication model: we can start a Redis server with replicaof primary-redis.internal 6379 and it will grab a copy of the existing database and keep it in sync until we shut it down. The primary server doesn’t even need to know ahead of time. It’s blissfully simple.
We can exploit this. Create a primary Redis in Dallas. Add replicas in Singapore, Amsterdam, and Sydney. Now: write to the primary. The whole world updates. We’ve got a global cache fabric that’s always up to date.
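A quick sketch of what that buys us, assuming redis-py and made-up internal hostnames (only primary-redis.internal comes from the replicaof line above):

import redis

primary = redis.Redis(host="primary-redis.internal", port=6379, decode_responses=True)
singapore = redis.Redis(host="sin-redis.internal", port=6379, decode_responses=True)

# Write once, to the Dallas primary...
primary.set("global-restaurant-ranking-johnnys-beef", "1")

# ...and, a replication hop later, the same value is readable in Singapore.
print(singapore.get("global-restaurant-ranking-johnnys-beef"))  # "1"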
Like any distributed cache fabric, we’ll inevitably cache something we shouldn’t. Somehow, the cache key global-restaurant-ranking-johnnys-beef reads 105. Not OK! But we can just issue a DEL global-restaurant-ranking-johnnys-beef and it’ll be back to 1, everywhere, fast enough to seem instant.
This seems great. But there’s a catch: that CDN observation we made earlier. If Singapore shared a lot of information with Chicago, this would be close to the “right” global Redis configuration. But they don’t; very little data overlaps between regions.
So we’re not quite there yet. But we have more tricks.
Eventually consistent, never consistent: why not both?
One kind of database cluster has “strong” consistency: once data is written, we trust that subsequent reads, anywhere in the cluster, see the new data. More frequently, we have some degree of “eventual” consistency: the data will populate the whole cluster… at some point, and we’re not waiting.
Our Redis replica scheme has eventual consistency. We write to a primary Redis instance and trust the replicas will get themselves in sync later on.
Meanwhile, the JBOR cluster is never consistent – in the same way that two people who’ve never met each other “aren’t dating”. But on the other hand, we like it because each region stores only the data it actually needs.
What we need is a way to treat each region mostly independently and sync some changes from a central source of truth.
Which gets us to an interesting config option: replica-read-only. This setting defaults to yes, and does what you’d expect. But check this out:
replica-read-only no
replicaof primary-redis.internal 6379
Now we have a replica that also accepts writes. 😱. This is terrible for a backend database. But this isn’t a database; it’s a cache fabric. Our primary is in Dallas. But Singapore’s app server can still write directly to Singapore’s Redis replica. And it still syncs changes from the primary! It’s still in charge!
So, when we need to make sure bad cache data goes away everywhere, we can perform an “instant” cache purge by issuing that DEL to the Dallas primary.
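Putting the two halves together, here’s the shape of it in Python; the hostnames are placeholders for your own regional instances, and the keys echo the earlier examples:

import redis

local = redis.Redis(host="localhost", port=6379, decode_responses=True)  # this region's writable replica
primary = redis.Redis(host="primary-redis.internal", port=6379, decode_responses=True)  # the Dallas primary

# Day-to-day cache traffic never leaves the region: Singapore writes to Singapore.
local.setex("review-score:johnnys-beef", 300, "105")

# A purge of data that was written through the primary goes back to the primary,
# and replication deletes the key everywhere, fast enough to feel instant.
primary.delete("global-restaurant-ranking-johnnys-beef")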
Beyond the purge
A thing that happens when you build a platform for running clusters of globally-distributed stuff is that you stumble onto interestingly simple distributed designs. This seems like one of them. We like it!
This sort of implicit, selective replication potentially gives us a lot more than just global cache purge. We can push any global content to all the cache regions, simply by writing to the primary:
SET daily_message "happy tuesday nerds"
Or, we can simulate a distributed fan-out queue by pushing to a list on the primary…
LPUSH notifications "time for ice cream"
… and then popping from that list in each region. The LPUSH replicates to every replica, so Singapore and Amsterdam each get their own copy to pop:

BRPOP notifications 0
"time for ice cream"

BRPOP notifications 0
"time for ice cream"
We can let regions selectively replicate by choosing when to write to their local cache and when to write globally, in a manner similar to the one we used to globally distribute Postgres. No doubt there’s a zillion other things we haven’t thought about. It’s been hard to play with these ideas, because almost nobody runs multi-region AWS for simple applications. But anyone can run a multi-region Fly.io app with just a couple commands.
We’re psyched to see what else people come up with.
Want to know more? Join the discussion.