Metro Area Clustering: Deployment and Configuration

This is the second blog in our series about ClustrixDB 9’s new Zones feature that supports creating Metro Area Clusters. If you missed the first blog “Database Metro Area Clustering Across Data Centers”, we recommend giving it a read since it provides a great overview of Metro Area Clustering.

In this blog we’ll walk through the installation and configuration of ClustrixDB 9 using Zones, thereby creating a Metro Area Cluster.

High-Level Steps

Build a cluster with instances in three zones
Assign nodes a zone ID

The servers ClustrixDB runs on must meet certain requirements; see the recommended server specs in our public documentation.

We recommend a minimum of 9 nodes (3 nodes in each zone). With that layout the cluster can lose a whole zone and still survive another node failure in one of the remaining zones.

You can, of course, have more than 9 nodes if your workload or data volumes require more. Nodes can be as small as 8 cores, so a 9 node cluster can have as few as 24 cores in each zone.
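As a quick check of that arithmetic, here is a small sketch that computes the cores in each zone and the cores left if one zone is lost. It assumes the nodes are spread evenly across zones; the function names are ours and purely illustrative.

```shell
# Sizing sketch. Assumes an even spread of nodes across zones.
cores_per_zone() {
  local nodes="$1" zones="$2" cores_per_node="$3"
  echo $(( nodes / zones * cores_per_node ))
}

cores_after_zone_loss() {
  local nodes="$1" zones="$2" cores_per_node="$3"
  echo $(( (nodes - nodes / zones) * cores_per_node ))
}

cores_per_zone 9 3 8          # prints 24
cores_after_zone_loss 9 3 8   # prints 48
```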

Building a Cluster In AWS

If you are deploying in AWS, then the creation of the nodes and installation of the ClustrixDB software is really easy. The high-level steps are outlined below, and covered in more detail in the ClustrixDB online documentation.

Spin up instances with the private Clustrix AMI in three different Availability Zones (AZs) within the same AWS region
Apply the license on the node that will be the first node in the cluster
Add nodes to the cluster from the first node

The main set of instructions on forming a cluster in AWS can be found here: ClustrixDB AWS Installation Guide.

Pay special attention to the AWS security group settings: they must allow the instances to communicate with each other. For more detail on this, take a look at Best Practices For AWS Security Groups.

Once the cluster is formed, follow the instructions in Creating Zones below.

Building a Cluster Somewhere Else
(aka Bare Linux OS Installation)

If you are not deploying in AWS, and are instead deploying in your own data center (or colo), or in Google Cloud, Microsoft Azure, or Rackspace, you will be installing ClustrixDB on a bare Linux OS.

For a bare OS installation, these are the basic steps:

Build CentOS 7 servers (aka nodes) in three different zones (where a zone is a failure domain as described in the previous blog)
Install some dependent packages on each node
Download the ClustrixDB install package and copy it to the other nodes
Uncompress the downloaded ClustrixDB tar file and run the ClustrixDB installer on each node
Install the license key on the first node
Add nodes to the cluster from the first node

Below are the commands we used in steps 2 through 4 in our lab environment. Your environment will likely differ, so to see the full Bare OS installation steps, head over to our online docs and read ClustrixDB Installation Guide Bare OS Instructions.

These steps were performed on each node:

# Create the directory if it doesn't already exist and then cd to it

mkdir -p /data/clustrix

cd /data/clustrix


# In case your CentOS 7 installation doesn't include this by default

yum -y install bzip2


# Extract the ClustrixDB software, and run the installer

tar xjf clustrix-9.0.4.el7.tar.bz2

cd clustrix-9.0.4.el7

./clxnode_install.py -y --non-root
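In a lab it can be convenient to drive these per-node steps from a single machine. The sketch below is a dry run that only prints the commands it would use. The host names node2 and node3 are placeholders, and it assumes passwordless SSH to each node and that bzip2 is already installed there.

```shell
# Dry-run sketch: print the copy/extract/install commands for one node.
# PKG and DEST match the lab example above; host names are placeholders.
PKG=clustrix-9.0.4.el7.tar.bz2
DEST=/data/clustrix

node_install_commands() {
  local host="$1"
  local dir="${PKG%.tar.bz2}"   # directory created by extracting the archive
  echo "ssh $host 'mkdir -p $DEST'"
  echo "scp $PKG $host:$DEST/"
  echo "ssh $host 'cd $DEST && tar xjf $PKG && cd $dir && ./clxnode_install.py -y --non-root'"
}

for host in node2 node3; do
  node_install_commands "$host"
done
```

Once the printed commands look right for your environment, piping the output to sh would execute them.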

Once the ClustrixDB software installation is complete on all the nodes, verify that it succeeded by checking that each node reports an “OK” status:

$ /opt/clustrix/bin/clx stat
Cluster Name:    clb509e64b55b120e2
Cluster Version: 9.0.4
Cluster Status:   OK
Cluster Size:    1 nodes - 16 CPUs per Node
Current Node:    karma063 - nid 1

nid |  Hostname | Status |  IP Address  | TPS |      Used       |  Total
----+-----------+--------+--------------+-----+-----------------+--------
  1 |  karma063 |    OK  |  10.2.14.115 |   0 |  138.8M (0.02%) |  767.0G
----+-----------+--------+--------------+-----+-----------------+--------
                                            0 |  138.8M (0.02%) |  767.0G
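If you are checking several nodes, the status test can be scripted. This is a minimal sketch that assumes the clx stat output keeps the "Cluster Status:" line format shown above; the function name cluster_status_ok is hypothetical.

```shell
# Reads `clx stat` output on stdin and succeeds only if the
# "Cluster Status:" line reports OK. Assumes the format shown above.
cluster_status_ok() {
  awk -F: '
    /^Cluster Status/ { gsub(/[ \t]/, "", $2); status = $2 }
    END { exit !(status == "OK") }
  '
}

# Example with captured output; on a node you would pipe the real command:
#   /opt/clustrix/bin/clx stat | cluster_status_ok && echo "node is ready"
printf 'Cluster Version: 9.0.4\nCluster Status:   OK\n' | cluster_status_ok \
  && echo "node is ready"
```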

If every node reports OK, you can form the cluster by adding the license key to one node and then telling the other nodes to join that first node. Our example is below.

This step was performed on the first node, using the mysql command-line client:

-- Replacing xxxx with a valid license key

set global license='{xxxx}';

alter cluster add 'internal_IP1', 'internal_IP2', 'internal_IP3', 'internal_IP4', 

'internal_IP5', 'internal_IP6', 'internal_IP7', 'internal_IP8';

At this point, we had a 9-node cluster formed.

Creating Zones

Once the cluster is up and running with all of its nodes, you can assign the nodes to zones.

The syntax to assign a zone ID is:

alter cluster [node_id1], [node_id2], [node_id3] zone [id];

Here’s an example:

alter cluster 1, 2, 3 zone 1; -- assigns nodes 1, 2 and 3 to zone 1

alter cluster 4, 5, 6 zone 2; -- assigns nodes 4, 5 and 6 to zone 2

alter cluster 7, 8, 9 zone 3; -- assigns nodes 7, 8 and 9 to zone 3
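With more nodes, typing these statements by hand gets error-prone. The sketch below generates them from a zone ID followed by the node IDs in that zone. The helper name zone_sql is ours, and the output simply follows the syntax shown above; review it before running it through the mysql client.

```shell
# Hypothetical helper: build one zone-assignment statement.
# Usage: zone_sql ZONE_ID NODE_ID [NODE_ID ...]
zone_sql() {
  local zone="$1"
  shift
  local ids="" id
  for id in "$@"; do
    ids="${ids:+$ids, }$id"   # join node IDs with ", "
  done
  echo "alter cluster $ids zone $zone;"
}

zone_sql 1 1 2 3
zone_sql 2 4 5 6
zone_sql 3 7 8 9
```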

Once this is done, the ClustrixDB Rebalancer will start moving data around. If the cluster is empty or holds little data, this finishes quickly; with a lot of data it may take a while, so allow the Rebalancer to complete its work.

To determine if the Rebalancer is finished, you can run this query. When the Rebalancer is finished, the resulting value should be 0:

select count(*) from system.rebalancer_queued_activity;
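Rather than re-running the query by hand, you could poll it. The sketch below assumes the mysql client can reach the cluster without extra options (for example via ~/.my.cnf); the function names and the 30-second default interval are our own choices, not part of ClustrixDB.

```shell
# Runs the query above and prints just the count.
# -N drops the column header, -B selects plain batch output.
rebalancer_queue_depth() {
  mysql -N -B -e 'select count(*) from system.rebalancer_queued_activity;'
}

# Polls until the count reaches 0.
# $1: command that prints the count (defaults to the query above)
# $2: seconds to sleep between checks (defaults to 30)
wait_for_rebalancer() {
  local probe="${1:-rebalancer_queue_depth}"
  local interval="${2:-30}"
  local depth
  while true; do
    depth=$("$probe") || return 1
    [ "$depth" -eq 0 ] && return 0
    echo "rebalancer still has $depth queued activities" >&2
    sleep "$interval"
  done
}

# On a cluster node:
#   wait_for_rebalancer && echo "Rebalancer is finished"
```

Taking the probe as an argument keeps the loop usable with a different check, or with a stub when no cluster is available.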

Best Practices

Here are some things you should keep in mind when building a Metro Area Cluster.

Over-Provision
Plan to over-provision the cluster rather than having just enough nodes for your current workload. As a guideline, without zones, if your optimized workload consistently generates an average CPU load of over 70-80%, it’s time to consider adding more nodes, both for capacity and to tolerate a node failure. With zones, you may want to consider adding more nodes once your workload consistently generates an average CPU load of 50-60%. This is because if you lose a zone, you will not only need free disk capacity for data re-protection, but your workload will also take a hit because a third fewer cores are available for processing transactions.
Use a Load Balancer
You will need a load balancer in front of the instances. If you’re using an AWS ELB, it will need to be configured for multi-zone load balancing.
Make the zones independent failure domains
If you’re building this on your own infrastructure, ensure that each zone has its own power feed, networking, and environmental controls, as you don’t want one localized failure to affect the other zones.
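To see where the lower CPU threshold in the over-provisioning guideline comes from, here is a rough back-of-the-envelope sketch. It assumes the load is spread evenly across zones and that the surviving zones absorb the lost zone's share linearly, which real workloads only approximate.

```shell
# Estimated average CPU load (percent) on the surviving nodes
# after losing one zone. $1: current load percent, $2: zone count (default 3).
load_after_zone_loss() {
  local load_pct="$1"
  local zones="${2:-3}"
  echo $(( load_pct * zones / (zones - 1) ))
}

for pct in 50 60 70; do
  echo "$pct% now -> $(load_after_zone_loss "$pct")% with one zone down"
done
```

Under these assumptions, a three-zone cluster at 50-60% lands at 75-90% after a zone loss, while one already at 70% would need more than the remaining nodes can supply.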

What’s Next?

To learn more about the new ClustrixDB 9 Zones feature, head on over to our online documentation on ClustrixDB Zones.

ClustrixDB 9 and Zones are generally available now.

If you have a project that you’re actively working on and would like to ask questions live with one of our top-notch technical solution engineers, we’d be happy to hear from you. We are very excited about this new deployment capability, so just ping us at [email protected].

Or you can apply for a trial of ClustrixDB 9 directly from our website.

We have some more blogs coming on this exciting topic, so stay tuned.