Lightboard Session: Automated Failover with MariaDB Databases


This is the first in a series of short lightboard talks highlighting popular database topics. In this five-minute session, you’ll learn how the MariaDB MaxScale database proxy works with multiple MariaDB instances to support high availability using a primary/replica architecture.

Video Transcript

Hi, my name is Manjot Singh. I’m here to talk to you about MariaDB’s classic replication model. A lot of times you have a primary server and you have an application. That application connects to the primary server and does its work: it saves data and reads data. After some time, you may have some success. Your application is growing, your user base is growing, and this one primary node can no longer handle all of your workload.

The other thing you may notice is that if this primary node has any issues (say the hardware fails, or something other than the database crashes on that server), you’re going to have to restore from backups. This is one reason to have a replica that’s a candidate to become primary: if the primary has an issue, you want that replica to take over as the primary.

In the old days, an ops person would have to notice that the primary was down, promote the replica so that it became the new primary, and reconfigure the application to point at it. That means a lot of overhead and a long time to get back up. So you want an automated service, and MariaDB has a great product for this called MaxScale.

What MaxScale provides is automated failover. It runs a health check on the primary that says, “Hey, primary, are you up?” and the primary answers, “Yeah, everything’s great.” It also monitors the replica. If the primary doesn’t return “good” on the health check, MaxScale takes the transactions it was previously sending to the primary and replays them on the replica instead.

So the replica becomes the primary automatically, the transactions are replayed on it, and everything keeps working. Because the application already connects through MaxScale, it doesn’t have to be reconfigured.
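To make the failover flow concrete, here is a minimal Python sketch of a proxy that health-checks the primary and promotes a replica when the check fails. It is a toy model under stated assumptions, not MaxScale’s implementation: the `Server` and `FailoverProxy` classes, the boolean `up` flag standing in for a real health probe, and the in-memory “replication” are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # compare servers by identity, not by field values
class Server:
    """Hypothetical stand-in for one MariaDB instance."""
    name: str
    up: bool = True
    data: list[str] = field(default_factory=list)

    def health_check(self) -> bool:
        # Assumption: a real monitor would connect and run a trivial query.
        return self.up


class FailoverProxy:
    """Toy proxy: the application only ever talks to this object."""

    def __init__(self, primary: Server, replicas: list[Server]) -> None:
        self.primary = primary
        self.replicas = list(replicas)

    def _fail_over(self) -> None:
        # Promote the first healthy replica to primary.
        for candidate in self.replicas:
            if candidate.health_check():
                self.replicas.remove(candidate)
                self.primary = candidate
                return
        raise RuntimeError("no healthy replica to promote")

    def write(self, txn: str) -> str:
        # "Hey, primary, are you up?" If not, fail over first, so the
        # transaction is sent to the newly promoted server instead.
        if not self.primary.health_check():
            self._fail_over()
        self.primary.data.append(txn)
        # Toy replication: copy the change to every healthy replica.
        for replica in self.replicas:
            if replica.health_check():
                replica.data.append(txn)
        return self.primary.name


if __name__ == "__main__":
    db1, db2 = Server("db1"), Server("db2")
    proxy = FailoverProxy(db1, [db2])
    print(proxy.write("INSERT 1"))  # prints "db1"
    db1.up = False                  # simulate a primary crash
    print(proxy.write("INSERT 2"))  # prints "db2"
```

The application code never changes between the two writes; only the proxy’s idea of which server is primary does, which is the point the talk makes about not reconfiguring the application.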

The other thing you may want to consider is adding more read replicas, because many applications perform far more reads than writes. MaxScale can do read/write splitting as well: without any changes to the application’s code or configuration, reads are sent to the replicas and writes go to the primary. The health check keeps running, so if the primary fails, MaxScale rotates the topology: the replica becomes the primary, the old primary rejoins as a replica if it comes back up, and the other replicas fall in line under the new primary. So MaxScale gives you read/write splitting, load balancing of reads across the replicas, and automatic failover. Hopefully this helps you out.
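The read/write split and topology rotation described above can be sketched as a small router. This hypothetical `ReadWriteSplitter` sends writes to the primary, rotates plain `SELECT`s round-robin across the replicas, and rebuilds its rotation when a replica is promoted or a recovered server rejoins. The naive SQL classification is an assumption for illustration only; MaxScale’s readwritesplit router classifies queries much more carefully and has its own rules for choosing a replica.

```python
from __future__ import annotations

import itertools


class ReadWriteSplitter:
    """Toy router: writes go to the primary, reads rotate across replicas."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        self.replicas = list(replicas)
        self._reset_rotation()

    def _reset_rotation(self) -> None:
        # Restart the round-robin cycle whenever the topology changes.
        self._rotation = itertools.cycle(self.replicas)

    @staticmethod
    def _is_read(sql: str) -> bool:
        # Naive assumption: plain SELECTs are reads, everything else is a write.
        # SELECT ... FOR UPDATE takes row locks, so it must go to the primary.
        text = sql.strip().upper()
        return text.startswith("SELECT") and "FOR UPDATE" not in text

    def route(self, sql: str) -> str:
        # Fall back to the primary for reads when no replica is available.
        if self._is_read(sql) and self.replicas:
            return next(self._rotation)
        return self.primary

    def fail_over(self, new_primary: str) -> None:
        # Promote a replica; the failed primary is dropped from the pool.
        self.replicas.remove(new_primary)
        self.primary = new_primary
        self._reset_rotation()

    def rejoin(self, server: str) -> None:
        # A recovered former primary comes back as a replica.
        self.replicas.append(server)
        self._reset_rotation()
```

For example, with `ReadWriteSplitter("db1", ["db2", "db3"])`, an `INSERT` goes to `db1` while successive `SELECT`s alternate between `db2` and `db3`; after `fail_over("db2")` and `rejoin("db1")`, writes go to `db2` and reads alternate between `db3` and `db1`.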

Now let’s talk about how this works across data centers. You would duplicate this topology in another data center. Say this one is in San Jose, California, and you have another great data center in Virginia. There you would re-create the primary and the replicas. Then you’d set up bidirectional replication between the two data centers so that you can fail back easily: if one data center goes down, when it comes back up it automatically catches up to the other one through that bidirectional replication.
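The catch-up behavior of bidirectional replication can be illustrated with a toy model in which each data center records every change with its origin and a per-origin sequence number, loosely like the per-domain positions MariaDB tracks with GTIDs. When a site comes back, it applies only the events it has not seen yet and skips its own, which keeps changes from looping between the two sites. The `Site` class and `replicate` function are hypothetical and are not MariaDB’s replication protocol.

```python
from __future__ import annotations

from dataclasses import dataclass, field

Event = tuple[str, int, str]  # (origin site, sequence number, change)


@dataclass(eq=False)
class Site:
    """Hypothetical model of one data center's primary."""
    name: str
    up: bool = True
    log: list[Event] = field(default_factory=list)
    # Highest sequence number applied per origin, loosely like a GTID position.
    applied: dict[str, int] = field(default_factory=dict)

    def write(self, change: str) -> None:
        # Local writes are stamped with this site's name and its next sequence number.
        seq = self.applied.get(self.name, 0) + 1
        self._apply((self.name, seq, change))

    def _apply(self, event: Event) -> None:
        origin, seq, _ = event
        self.log.append(event)
        self.applied[origin] = seq

    def catch_up_from(self, other: Site) -> int:
        """Apply every event from `other` that this site has not seen yet."""
        count = 0
        for event in other.log:
            origin, seq, _ = event
            # Skipping already-applied events (including our own) prevents loops.
            if seq > self.applied.get(origin, 0):
                self._apply(event)
                count += 1
        return count


def replicate(a: Site, b: Site) -> None:
    """Bidirectional replication: each live site pulls what it is missing."""
    if a.up and b.up:
        a.catch_up_from(b)
        b.catch_up_from(a)
```

If San Jose goes down while Virginia keeps taking writes, calling `replicate` again after San Jose is back brings it up to date without replaying its own earlier changes. The sketch deliberately ignores conflicting writes made to both sites at the same time.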

What I see a lot is applications deployed on both sides, because customers already have their own global load balancers in front of their applications. The other option is a global load balancer that connects to both data centers, with the application going through that global load balancer.
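One way to picture the global-load-balancer option is a function that sends each connection to a preferred data center and falls back to the other one when its health check fails. This is a hypothetical sketch: the `DataCenter` type, the `pick_data_center` function, and the example hostnames are illustrative, and a real global load balancer would apply its own health checks and routing policy.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    proxy_host: str  # hypothetical address of the MaxScale instance in that site
    healthy: bool = True


def pick_data_center(sites: list[DataCenter], preferred: str) -> DataCenter:
    """Return the preferred site if healthy, otherwise the first healthy one."""
    # sorted() is stable and False sorts before True, so the preferred site comes first.
    ordered = sorted(sites, key=lambda site: site.name != preferred)
    for site in ordered:
        if site.healthy:
            return site
    raise RuntimeError("no healthy data center available")
```

With San Jose and Virginia both healthy, a client preferring Virginia is sent there; if Virginia’s health check fails, the same call returns San Jose instead.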

So hopefully this helped you understand the classic MariaDB architecture and how to scale it across geographies. Try it out, and hopefully you can do some great work with it. Thank you.