Tag: Load balancing

  • CONNECT BY is dead, long live CTE! In MariaDB Server 10.2!

    Yes, you got that right: the old CONNECT BY syntax used for recursive SQL with Oracle has been replaced by Common Table Expressions, the WITH construct introduced in SQL:1999, which is now also available in MariaDB Server 10.2.4 (RC). The SQL WITH construct, using Common Table Expressions or CTEs, is useful for more than just recursive queries, but recursion is the one feature of WITH that was previously very hard to achieve without some procedural code; the non-recursive uses of Common Table Expressions could previously mostly be covered by temporary tables.

    This blog post explains what recursive SQL is all about and why it is useful, and shows some examples of both CONNECT BY and how the same SQL is written using the WITH clause.

    The most common example for recursive SQL is probably a parts explosion, where we have a table of parts of some component where each part is either a main, top-level part or is part of another part. For example, a car with an engine, where the engine consists of pistons, cylinders and a camshaft, and the camshaft in turn includes some camshaft bearings. I think you get the basic idea here. To query this data to create a list of components that make up some other component, you need to visit the data recursively, i.e. each row is evaluated using conditions from rows that have already been fetched, except for the first row.

    Now, let’s look at some data first. I assume we have two tables here: one table that contains information on the individual parts and one table, called components, that contains the hierarchy of the parts. Like this:

    CREATE TABLE parts(part_id INTEGER NOT NULL PRIMARY KEY,
      part_name VARCHAR(60) NOT NULL);
    
    CREATE TABLE components(comp_id INTEGER NOT NULL PRIMARY KEY,
      comp_name VARCHAR(60),
      comp_count INTEGER NOT NULL,
      comp_part INTEGER NOT NULL,
      comp_partof INTEGER,
      FOREIGN KEY(comp_part) REFERENCES parts(part_id));
    ALTER TABLE components ADD FOREIGN KEY(comp_partof) REFERENCES components(comp_id);

    The two things to note here are that the components table has a column, comp_partof, that implements the hierarchy, and that there is a self-referencing FOREIGN KEY constraint on this table. Given these tables, and assuming that we are a small privately held car-manufacturing company in southern Germany, let’s insert some data:

    INSERT INTO parts VALUES(1, 'Car');
    INSERT INTO parts VALUES(2, 'Bolt');
    INSERT INTO parts VALUES(3, 'Nut');
    INSERT INTO parts VALUES(4, 'V8 engine');
    INSERT INTO parts VALUES(5, '6-cylinder engine');
    INSERT INTO parts VALUES(6, '4-cylinder engine');
    INSERT INTO parts VALUES(7, 'Cylinder block');
    INSERT INTO parts VALUES(8, 'Cylinder');
    INSERT INTO parts VALUES(9, 'Piston');
    INSERT INTO parts VALUES(10, 'Camshaft');
    INSERT INTO parts VALUES(11, 'Camshaft bearings');
    INSERT INTO parts VALUES(12, 'Body');
    INSERT INTO parts VALUES(13, 'Gearbox');
    INSERT INTO parts VALUES(14, 'Chassis');
    INSERT INTO parts VALUES(15, 'Rear axle');
    INSERT INTO parts VALUES(16, 'Rear brake');
    INSERT INTO parts VALUES(17, 'Wheel');
    INSERT INTO parts VALUES(18, 'Wheel bolts');
    
    INSERT INTO components VALUES(1, '320', 1, 1, NULL);
    INSERT INTO components VALUES(2, NULL, 1, 6, 1);
    INSERT INTO components VALUES(3, NULL, 1, 7, 2);
    INSERT INTO components VALUES(4, NULL, 4, 8, 3);
    INSERT INTO components VALUES(5, NULL, 4, 9, 3);
    INSERT INTO components VALUES(6, NULL, 1, 10, 3);
    INSERT INTO components VALUES(7, NULL, 3, 11, 6);
    INSERT INTO components VALUES(8, NULL, 1, 12, 1);
    INSERT INTO components VALUES(9, NULL, 1, 14, 1);
    INSERT INTO components VALUES(10, NULL, 1, 15, 9);
    INSERT INTO components VALUES(11, NULL, 2, 16, 10);
    
    INSERT INTO components VALUES(12, '323 i', 1, 1, NULL);
    INSERT INTO components VALUES(13, NULL, 1, 5, 12);

    If you are not into mechanics, let me tell you that there are more parts than this to a car; for example, I left out a few critical components, such as the cupholder, the dog that stands on the pickup cargo area and the insulting bumper sticker, but I think you get the idea. Note that there are two “main” components, the ‘320’ and the ‘323 i’, and that these top-level components are indicated by the comp_partof column being set to NULL.

    Now, assume you want to list all the parts that make up a 320. The way this works with the CONNECT BY syntax is that you compose one single SQL statement and provide a CONNECT BY clause to indicate the relationship. Like this:

    SELECT LPAD('-', level, '-')||'>' level_text, comp_count, NVL(comp_name, part_name) name
    FROM components c, parts p
    WHERE c.comp_part = p.part_id
    START WITH c.comp_name = '320'
    CONNECT BY PRIOR c.comp_id = c.comp_partof;

     

    Let me explain this a bit, but there is nothing really magic here. We are selecting from the two tables and joining them just as usual. Then we use the START WITH clause to define the top-level component, and the rest of the components are those whose comp_partof matches the comp_id of the START WITH component, or the comp_id of any other component that has already been fetched.

    This way of writing recursive SQL has some advantages: it is relatively compact and easy to understand. The disadvantage is that there are some quirks and limitations, and once your queries get more complex, CONNECT BY gets a bit hairy. One sure sign that CONNECT BY is going away, even though I and many others tend to like it for its ease of use, is that even Oracle, as of Oracle 11g, has also implemented the WITH construct, i.e. Common Table Expressions or CTE. So, looking at the statement above, this is what it would look like in MariaDB 10.2 using the WITH construct:

    WITH RECURSIVE comp(comp_id, comp_name, comp_partof, comp_count) AS (
      SELECT comp_id, comp_name, comp_partof, comp_count
        FROM components JOIN parts ON comp_part = part_id
        WHERE comp_partof IS NULL AND comp_name = '320'
      UNION ALL
      SELECT c1.comp_id, p.part_name, c1.comp_partof, c1.comp_count
      FROM components c1 JOIN parts p ON c1.comp_part = p.part_id
        JOIN comp c2 ON c1.comp_partof = c2.comp_id)
    SELECT comp_count, comp_name FROM comp;

     

    Comparing this CTE version to the CONNECT BY version above, it is a bit more complex, but how it works is actually pretty clear once you look at it carefully. To begin with, the top-level item, or anchor, is the first SELECT in the UNION ALL, and the following components are fetched using the second SELECT. The recursive aspect is then handled by this UNION being run until it returns no more rows. As you can see, although this requires more text and more complex SQL to write, it is also a fair bit more flexible. For example, the anchor point is defined by a completely separate SELECT, which means it can be whatever SELECT you want, selecting from any odd table. Secondly, the columns you use and the conditions for defining the hierarchy can be as complex as you want. And thirdly, there is also the power of that last SELECT, which in the case above just gets the data from the UNION, but you can actually apply any kind of filter, ordering or column selection to it. The result of the query above is this:

    comp_count      comp_name
    1               320
    1               4-cylinder engine
    1               Body
    1               Chassis
    1               Cylinder block
    1               Rear axle
    4               Cylinder
    4               Piston
    1               Camshaft
    2               Rear brake
    3               Camshaft bearings
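
    To illustrate that last point about the final SELECT, here is a sketch of a variation (not shown in the examples above) that carries a depth counter along in the recursive part and then lets the outer SELECT build an indentation marker and order the output, similar in spirit to the LPAD/level output of the CONNECT BY version:

    WITH RECURSIVE comp AS (
      -- Anchor: the top-level component.
      SELECT comp_id, comp_name, comp_partof, comp_count, 1 AS depth
        FROM components JOIN parts ON comp_part = part_id
        WHERE comp_partof IS NULL AND comp_name = '320'
      UNION ALL
      -- Recursive part: components of already fetched components, one level deeper.
      SELECT c1.comp_id, p.part_name, c1.comp_partof, c1.comp_count, c2.depth + 1
      FROM components c1 JOIN parts p ON c1.comp_part = p.part_id
        JOIN comp c2 ON c1.comp_partof = c2.comp_id)
    SELECT CONCAT(REPEAT('-', depth), '>') level_text, comp_count, comp_name
    FROM comp
    ORDER BY depth, comp_name;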

     

    Before I finish this off: the WITH RECURSIVE statement above is somewhat more verbose than it needs to be. In MariaDB 10.2 you can, for example, skip listing the column names of the recursive table, like this:

    WITH RECURSIVE comp AS (
      SELECT comp_id, comp_name, comp_partof, comp_count
        FROM components JOIN parts ON comp_part = part_id
        WHERE comp_partof IS NULL AND comp_name = '320'
      UNION ALL
      SELECT c1.comp_id, p.part_name, c1.comp_partof, c1.comp_count
      FROM components c1 JOIN parts p ON c1.comp_part = p.part_id
        JOIN comp c2 ON c1.comp_partof = c2.comp_id)
    SELECT comp_count, comp_name FROM comp;

     

    And although Oracle 11g and up supports CTEs, it works a bit differently. For one thing, the RECURSIVE keyword isn’t supported (a CTE is assumed to be recursive by default), and the way I read the SQL standard this is actually wrong: for recursive queries you have to use the RECURSIVE keyword. Second, Oracle does require the SELECT-list. So in Oracle, you would see something like this:

    WITH comp(comp_id, comp_name, comp_partof, comp_count) AS (
      SELECT comp_id, comp_name, comp_partof, comp_count
        FROM components JOIN parts ON comp_part = part_id
        WHERE comp_partof IS NULL AND comp_name = '320'
      UNION ALL
      SELECT c1.comp_id, p.part_name, c1.comp_partof, c1.comp_count
      FROM components c1 JOIN parts p ON c1.comp_part = p.part_id
        JOIN comp c2 ON c1.comp_partof = c2.comp_id)
    SELECT comp_count, comp_name FROM comp;

    Yes, we are all happily following the same SQL standard. Somewhat…
    See the MariaDB Knowledge Base for more information on common table expressions.

    Happy SQL’ing

    /Karlsson

  • ColumnStore: Storage Architecture Choices

    Join MariaDB ColumnStore Use Cases Webinar on March 23, 2017 10 a.m. PT

    Introduction

    MariaDB ColumnStore provides a complete solution for automated high availability of a cluster. On the data processing side:

    • Multiple front-end or User Module (UM) servers can be deployed to provide failover on SQL processing.
    • Multiple back-end or Performance Module (PM) servers can be deployed to provide failover on distributed data processing.

    Due to the shared-nothing data processing architecture, ColumnStore requires the storage tier to deliver high availability for a complete solution. In this blog, I intend to provide some clarity and direction on the architecture and choices available.

    Architecture

    As a recap, here is an outline of MariaDB ColumnStore’s architecture:

    [Figure: MariaDB ColumnStore architecture overview]

    ColumnStore provides data processing redundancy by individually scaling out the front-end MariaDB Server UM nodes and back-end Distributed Query Engine PM nodes.

    All ColumnStore data is accessed and managed by the PM nodes. Each PM manages one or more DBRoots, which contain Extents that hold the data values for a single column. A DBRoot belongs to exactly one PM server (at a point in time). The complete data for a given column is spread across all DBRoots with a given column value being stored exactly once. To learn more about the storage architecture, please refer to my previous blog.

     

    Storage Choices

    During installation, a choice must be made between utilizing:

    • Local Storage: The DBRoot is created on the local disk for the PM node, specifically under /usr/local/mariadb/columnstore/data<DBROOTID>.  
    • External Storage: The DBRoot will be stored on storage external to the PM server, then will be mounted for access. An entry for each DBRoot mount must exist in the /etc/fstab file on each PM server.

    To provide data redundancy, ColumnStore relies on external storage to provide resilient storage and enable a particular DBRoot volume to be remounted on another PM server. This generally implies a remote networked storage solution, although filesystems such as GlusterFS can allow deployment without additional servers.

    When local storage is utilized, journaling filesystems and RAID deployments provide resilient storage. However, since the storage is only available within a given PM server, it cannot be remounted on another PM server should one fail. In this case, the failed server must be recovered before ColumnStore can support additional queries.

    With external storage, ColumnStore can provide automated failover and continuity in the event a PM server fails. This is because a given DBRoot storage is external to the failed PM server and can be remounted on another PM server. The following diagram illustrates how this works:

    [Figure: DBRoot failover to another PM server when a PM fails]

    In this case:

    1. Server PM4 crashes.

    2. The ProcMgr process on PM1 detects that PM4 is no longer reachable and instructs PM3 to mount DBRoot4 and process reads and writes in addition to DBRoot3.

    3. When PM4 recovers, ProcMgr instructs PM3 to unmount DBRoot4, and then PM4 to mount DBRoot4, thereby returning the system to a steady state.

    4. If the PM1 server crashes, which contains the active ProcMgr process, the system will promote another PM server to act as the active ProcMgr, which will then initiate the above process.

    Storage choices that support this model include:

    • AWS EBS (if deployed on AWS) – The ColumnStore AMI image utilizes this and provides automation around storage management.
    • Other Cloud Platforms – Are available and provide similar capabilities such as Persistent Disks for Google Cloud.
    • SAN / NAS – Many vendor choices exist and may include capabilities such as snapshotting to simplify backup and asynchronous replication to DR storage.
    • GlusterFS – Open source software filesystem (with support available from RedHat). On a small cluster this can be co-deployed with the PM servers for a simplified topology.
    • CEPH – Open source storage cluster (with support available from RedHat). This enables deployment of a lower cost software-defined storage cluster.
    • DRBD – A community member has been working with us on testing this as a storage layer, though it is not certified yet.
    • Other Options – Are available and will work as long as they present a compliant file system layer to the Linux operating system and ensure that a data volume can be remounted onto a different PM server.

    Disaster Recovery

    Warm Standby

    To enable a warm standby DR setup, a second MariaDB ColumnStore cluster should be installed and configured identically in a secondary location. Available storage cluster replication should be configured to provide replication between the locations. Should the cluster in Data Center 1 fail, the cluster in Data Center 2 can be initiated to provide continuity.

    [Figure: Warm standby DR setup with storage replication between data centers]

    Active-Active

    To enable an active-active DR setup, a second MariaDB ColumnStore cluster should be installed, configured and run in a secondary location. Potentially this could have a different topology of PMs from the primary data center. The data feed output of the ETL process must be duplicated and run on both clusters in order to provide a secondary active cluster.

    [Figure: Active-active DR setup with duplicated ETL data feeds]

    Conclusion

    MariaDB ColumnStore offers fully automated high availability. To provide this at the storage level it relies upon the storage layer to provide both:

    1. Resiliency

    2. The ability to remount a DBRoot on a different PM server.

    In the majority of cloud providers, such networked storage capabilities are a baseline offering. Private clouds such as OpenShift also come with similar capabilities such as the CEPH storage cluster. Finally, for a bare metal installation, storage appliances and software-defined storage offerings can provide this. Storage-based replication provides tried and trusted replication that works and handles the many edge cases that arise in real-world replication scenarios.

    Not needing to focus on this enables my engineering team to support more customer use cases to broaden the reach and impact of ColumnStore. For example, in response to community and customer feedback, we are working on improving the text data type as well as providing a bulk write API to better support streaming use cases (i.e., Kafka integration). These improvements, plus other roadmap items, will be addressed in our 1.1 release later this year.

     

  • Webyog – M|17 Sponsor

    Authored by Shree Nair, Product Manager, Webyog

    We are very excited to sponsor MariaDB’s inaugural user conference, M|17! It’s a great opportunity to introduce Webyog to the MariaDB community, and demonstrate the close collaboration between Webyog and MariaDB – we work together to provide powerful administrative tools for the fastest growing open source database.

    By partnering with MariaDB, one of the most popular databases in the world, we have been able to help thousands of organizations. The products and services from Webyog and MariaDB form a joint solution for meeting database challenges as the market matures.

    We develop best-in-class management and monitoring tools (SQLyog and Monyog) for MariaDB. In fact, they are included in a MariaDB Enterprise subscription. We offer quality products at an affordable price – our core competitive advantage. We consider word of mouth to be the key to our success and a strong barometer of quality. Providing the open source community with the most cost-effective management and monitoring tools is our mission, which is why we partnered with MariaDB.

    While we intend to compete with enterprise solutions providers in the future, we will continue contributing to the open source community. We have received requests from other partners to better align Webyog in the interests of supporting their customers; however, we believe the best way to serve the open source community is with MariaDB.

    While we have seen a growing interest from large MariaDB deployments in enterprises, we will continue to provide our world-class tools to MariaDB customers at no additional cost. Right now, we are working on a version of Monyog, optimized for MariaDB, that will include an unrivaled performance monitoring and trending system for key metrics.

  • Replication Manager, Flashback and Much More!

    Note: Replication Manager is developed and maintained by community members. It can be used with MariaDB Server, but it is not supported with an enterprise subscription.

    MariaDB 10.2.4 has fantastic new features that perfectly match Replication Manager’s ultimate goal: transparent automated failover on a MariaDB master-slave architecture, losing as few transactions as possible. We are going to explore those new features and how Replication Manager uses them for your benefit!

    The first feature is constant binlog fetching from remote master via mysqlbinlog.

    Replication Manager will use this feature when your old master comes back to life. It will take a snapshot of the transaction events that differ between the position where the newly elected master was introduced and the current position of the rejoining old master.

    Those events are saved in a crash directory in the replication-manager working directory for later use.  

    Another exciting feature is the binlog flashback.  

    Replication Manager will use this feature to reintroduce the old master in the active topology in multiple cases.

    The first case, when semi-sync replication was synced during the crash: This is good as it saves you from recovering the dead master from a backup of the new master.

    The picture looks like this: In semi-sync the transactional state of the dead master can be ahead of the new leader, as the sync part in the feature name refers to the state of the client, not the state of the database.

    I’ll try a metaphor:

    A regular MariaDB replication goal is to make sure you almost never lose a transaction under HA, while accepting the nature of an unpredictable future. So if this were a performance of a show, you could enter even if you are a criminal. If you disturb the show, the show can recover: in this example, if the show gets disturbed, Replication Manager will transfer you and the other spectators to the same show at a later time, or in another place, or begin again at the same position in the show. Semi-sync is the speed-of-light delay, as if the event has already happened on the active stage but never made it to your eyes. We will transfer you before or after that time that is under your control and make sure that the show is closely synchronized!

    So in semi-sync, the state is SYNC, the show stops in a state that is ahead of where others would be stuck with a “delayed show”.  Since clients’ connections have never seen such differences, you can flashback by rewinding the show from when the disturbance occurred and continue the show from the same point in another location.

    This is exactly the same concept as a delayed broadcast. If going to the bathroom takes more time than the broadcast delay, you may have lost some important parts of the story when you press resume.

    The second case is when the semi-sync delay has passed or you have not been running semi-sync replication: we can resume the show, but you have possibly lost events. Replication Manager can either flashback or use a dump for recovery.

    Let’s examine the options available to make this happen.

    # MARIADB >= 10.2
    # ---------------
    mariadb-binary-path = "/usr/sbin"
    # REJOIN
    # --------
    autorejoin = true
    autorejoin-semisync = true
    autorejoin-flashback = true
    autorejoin-mysqldump = false

    Don’t forget to set in the cluster configuration that you want to auto resume:

    interactive = false

    The default of Replication Manager is to alert on failure, not to do the job of failover.

    Another exciting feature is the “no slaves behind” availability.

    Can you use your slaves and transparently load balance reads with a replication-manager topology? The answer used to be maybe, with MaxScale read-write splitting, but only if you didn’t care about reading from a delayed slave in an auto-commit workload.

    For example, an insert followed by closing the connection and passing the ball to another microservice that reads that same data would be unsafe.

    Now there is a possibility to configure the read-write splitter to fail back to the master when the replication delay exceeds a threshold lower than the one set up via “no slaves behind”.

    This brings the solution of slowing down the master commit workload to stay under that delay, so that reads on the slaves can become committed reads!

    Extra features:

    The new Replication Manager release also addresses the requirement to manage multiple clusters within the same replication-manager instance (note the change in the configuration file).

    [cluster1]
    specific options for this cluster
    [cluster3]
    specific options for this cluster
    [default]

    If you have a single cluster just use default.

    In console mode one can switch clusters using Ctrl-P & Ctrl-N, and in HTTP mode a drop-down box is available to switch the active cluster view.

    Some respected members of the community have raised possible issues with the choice made to separate the failover logic into Replication Manager instead of putting it directly into the MariaDB MaxScale proxy. This new release addresses such concerns.

    Let’s look at the core new features of Replication Manager when it comes to MaxScale Proxy.

    failover-falsepositive-heartbeat = true
    failover-falsepositive-heartbeat-timeout = 3
    failover-falsepositive-maxscale = true
    failover-falsepositive-maxscale-timeout = 14

    One can get the idea just from the names. Having separate pieces makes better false-positive detection of leader death possible: here all your slaves take part in leader failure detection, and MaxScale does as well. This is on top of all the previously existing checks and conditions.

    failcount = 5
    failover-max-slave-delay = 30
    failover-limit = 3
    failover-at-sync = true
    failover-time-limit = 10

    Stay tuned: as time passes, more failover-falsepositive methods will be added, as they are already on the roadmap. I guess this task addresses some of our fellow ace director’s musings found here. Also, Etcd is already on the roadmap, will be worked on in the future and will receive contributions for sure!

    While failover and MaxScale monitoring can be tricky (as noted by Johan and Shlomi), Replication Manager addresses the issue of the last available slave being elected as the new master.

    In this case MaxScale is left without a topology, and this would be similar to having a single slave for HA. The solution to address this issue is to let Replication Manager fully drive the MaxScale server states.

    maxscale-monitor = false
    # maxinfo|maxadmin
    maxscale-get-info-method = "maxinfo"
    maxscale-maxinfo-port = 4002
    maxscale-host = "192.168.0.201"
    maxscale-port = 4003
    maxscale-user = "admin"
    maxscale-pass = "mariadb"

    By setting maxscale-monitor = false, Replication Manager tells MaxScale to disable its own monitoring and imposes the server statuses on MaxScale itself.

    Don’t forget to simply activate maxscale usage.

    maxscale = true

    Last but not least of the Replication Manager’s new features is the tracking of metrics via an embedded carbon graphite server. This internal server can be a relay for graphite for custom reporting and is also used by the HTTP server of Replication Manager.

    [Figure: Replication Manager metrics dashboard fed by the embedded carbon graphite server]

     

    graphite-metrics = true
    graphite-carbon-host = "127.0.0.1"
    graphite-carbon-port = 2003
    graphite-embedded = true
    graphite-carbon-api-port = 10002
    graphite-carbon-server-port = 10003
    graphite-carbon-link-port = 7002
    graphite-carbon-pickle-port = 2004
    graphite-carbon-pprof-port = 7007

    All those features pass the new non-regression test cases and can be found in the dev branch of Replication Manager.

    Are you thinking, “Hey this is good technical content but my team does not know much about replication technical details?” No worries! We DO have helpers in Replication Manager to enforce best practices, and it’s always best to plan HA before starting any new serious DB project.

    force-slave-heartbeat = true
    force-slave-gtid-mode = true
    force-slave-semisync = true
    force-slave-readonly = true
    force-binlog-row = true
    force-binlog-annotate = true
    force-binlog-slowqueries = true
    force-inmemory-binlog-cache-size = true
    force-disk-relaylog-size-limit = true
    force-sync-binlog = true
    force-sync-innodb = true
    force-binlog-checksum = true

    *Note that some of these enforcements are not yet covered by test cases, and we would welcome any contributors.

  • How MariaDB ColumnStore Handles Big Data Workloads – Data Loading and Manipulation

    MariaDB ColumnStore is a massively parallel scale out columnar database. Data loading and modification behaves somewhat differently from how a row based engine works. This article outlines the options available and how these affect performance.

    Data Loading and Manipulation Options

    MariaDB ColumnStore provides several options for writing data:

    1. DML operations: insert, update, delete
    2. Bulk DML: INSERT INTO … SELECT
    3. MariaDB Server bulk file load: LOAD DATA INFILE
    4. ColumnStore bulk data load: cpimport
    5. ColumnStore bulk delete: ColumnStore partition drop.

     

    DML Operations

    ColumnStore supports transactionally consistent insert, update, and delete statements using standard syntax. Performance of individual statements will be significantly slower than you’d expect with row based engines such as InnoDB. This is due to the system being optimized for block writes and the fact that a column based change must affect multiple underlying files. Updates in general will be faster since updates are performed inline only to the updated columns.
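
    As a minimal sketch (the table and column names below are hypothetical, not taken from this article), such statements use completely standard syntax; they are simply more expensive per row than on a row store because each affected column is stored in its own set of files:

    -- Hypothetical ColumnStore table used for illustration only.
    INSERT INTO orders_cs (o_orderkey, o_totalprice, o_orderdate)
    VALUES (1000001, 123.45, '2017-03-01');

    UPDATE orders_cs SET o_totalprice = 130.00 WHERE o_orderkey = 1000001;

    DELETE FROM orders_cs WHERE o_orderkey = 1000001;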

     

    Bulk DML

    INSERT INTO … SELECT statements where the destination table is a ColumnStore table are optimized by default to internally convert to and use the cpimport utility executing in mode 1, which offers greater performance than raw DML operations. This can be a useful capability for migrating a non-ColumnStore table. For further details please refer to the following knowledge base article: https://staging-mdb.com/kb/en/mariadb/columnstore-batch-insert-mode/
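
    As a sketch of that migration path (the table names here are hypothetical), an existing row-based table can be copied into a ColumnStore table with a single statement, which is internally executed through cpimport mode 1 rather than row-by-row DML:

    -- Hypothetical destination table stored in ColumnStore.
    CREATE TABLE orders_cs (
      o_orderkey   INTEGER,
      o_totalprice DECIMAL(12,2),
      o_orderdate  DATE
    ) ENGINE=ColumnStore;

    -- Bulk-copy the contents of an existing InnoDB table into it.
    INSERT INTO orders_cs
    SELECT o_orderkey, o_totalprice, o_orderdate
    FROM orders_innodb;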

     

    LOAD DATA INFILE

    The LOAD DATA INFILE command can also be used and, similarly, is optimized by default to utilize cpimport mode 1 rather than DML operations to provide better performance; this can be useful for compatibility purposes. However, greater performance (approximately 2x) and flexibility are provided by utilizing cpimport directly. For further details please refer to the following knowledge base article: https://staging-mdb.com/kb/en/mariadb/columnstore-load-data-infile/
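
    For example, a sketch of a file load (the file path and delimiter are assumptions for illustration); as noted above, the statement is converted internally into a cpimport mode 1 load:

    -- Hypothetical pipe-delimited input file loaded into the ColumnStore table above.
    LOAD DATA INFILE '/tmp/orders.tbl'
    INTO TABLE orders_cs
    FIELDS TERMINATED BY '|';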

     

    cpimport

    The cpimport utility is the fastest and most flexible data loading utility for ColumnStore. It works directly with the PM WriteEngine processes eliminating many overheads of the prior options. Cpimport is designed to work with either delimited text files or delimited data provided via stdin. The latter option provides for some simple integration capabilities such as streaming a query result from another database directly into cpimport. Multiple tables can be loaded in parallel and a separate utility colxml is provided to help automate this. For further details please refer to the following knowledge base article: https://staging-mdb.com/kb/en/mariadb/columnstore-bulk-data-loading/

     

    The utility can operate in different modes as designated by the -m flag (default 1):

     

    Mode 1 – Centralized Trigger, Distributed Loading

    The data to be loaded is provided as one input file on a single server. The data is then divided and distributed evenly to each of the PM nodes for loading. The extent map is referenced to aim for even data distribution. In addition, the -P argument can be utilized to send the data to specific PM nodes, which allows for the finer-grained control of modes 2 and 3 while preserving centralized loading.

    Mode 2 – Centralized Trigger, Local Loading

    In this mode, the data must be pre-divided and pushed to each PM server. The load on each server is triggered from a central location, which triggers a local cpimport on each PM server.

    Mode 3 – Local Trigger, Local Loading

    This mode allows for loading data individually per PM node across some or all of the nodes. The load is triggered from the PM server and runs cpimport locally on that PM node only.

    Modes 2 and 3 allow for more direct control of where data is loaded and in what order within the corresponding extents; however, care needs to be taken to allow for even distribution across nodes. This direct control does allow for explicit partitioning by PM node: for example, with 3 nodes you could have one node with only Americas data, one with EMEA and one with APAC data. Local query can be enabled to allow querying a PM directly, limited to that region's data, while still allowing querying of all data from the UM level.

     

    Partition Drop

    ColumnStore provides a mechanism to support bulk delete by extents. An extent can be dropped by partition id or by using a value range corresponding to the minimum and maximum values for the extents to be dropped. This allows for an effective and fast purging mechanism: if the data has an increasing date-based column, then the minimum and maximum values of that column's extents form a (potentially overlapping) range-based partitioning scheme, and data can be dropped by specifying the range of values to be removed. This can form a very effective information lifecycle management strategy by removing old data by partition range. For further details please refer to the following knowledge base article: https://staging-mdb.com/kb/en/mariadb/columnstore-partition-management/

     

    Transactional Consistency

    MariaDB ColumnStore provides read committed transaction isolation. Changes to the data whether performed through DML or bulk import are always applied such that reads are not blocked and other transactions maintain a consistent view of the prior data until the data is successfully committed.

    The cpimport utility interacts with a high water mark value for each column. All queries will only read below the high water mark and cpimport will insert new rows above the high water mark. When the load is completed the high water mark is updated atomically.

    SQL DML operations utilize a block based MVCC architecture to provide for transactional consistency. Other transactions will read blocks at a particular version while the uncommitted version is maintained in a version buffer.

  • MariaDB MaxScale 2.1: Power to the Modules

    The 2.1 version of MaxScale introduced a new concept of module commands. This blog post takes a look at what they are, why they were implemented and where you see them when you use MaxScale.

    The Idea Behind Module Commands

    The core concept of module commands was driven by the need to do things with modules that couldn’t be done with all modules of that type. One example of this need is the dbfwfilter firewall filter module. The module needs to provide a mechanism that is capable of testing the validity of the rules as well as updating them on the fly. We could have introduced a reload function into the filter API, but only a few filters would ever implement it. And what if a filter needs to be able to do even more specific tasks which do not apply to other modules of the same type?

    As the type and variety of modules in MaxScale grows, so do the requirements that those modules place on the MaxScale core. At this point, we realized that we needed a way to allow modules to expose module-specific functionality to end users regardless of the module API they use. And so the idea of a module command was born.

    How Module Commands Were Implemented

    The module command implementation is relatively simple. It allows modules to register commands that are exposed to the user via the administrative interface. All the module needs to do is to declare the parameters the command expects, document the parameters and give a function which is called when the module command is executed. Everything else is handled by the MaxScale core.

    Calling Module Commands

    The user can then call these commands via the administrative interface. The use of module commands is intended to be very similar to the use of the built-in maxadmin commands and it should feel familiar to users of MaxScale.

    Two new commands were added to MaxAdmin itself: list commands and call command. The former lists all registered module commands and the latter calls a module command. The execution of call command always takes at least two arguments, the module and command names.

    Example Use of Module Commands

    The best way to show how module commands work is to show them in action. One of the MaxScale modules that implements module commands is the dbfwfilter database firewall.

    Micro-Tutorial: Dbfwfilter Module Commands

    A practical example of using module commands is the reloading of the firewall rules in the dbfwfilter module when changes occur in the database.
    The module registers two commands. Here’s the output of maxadmin list commands when the dbfwfilter module has been loaded.

    Command: dbfwfilter rules
    Parameters: FILTER
    
    
        FILTER - Filter to inspect
    
    
    Command: dbfwfilter rules/reload
    Parameters: FILTER [STRING]
    
    
        FILTER - Filter to reload
        [STRING] - Path to rule file

    The rules command shows diagnostic information about the active rules. The rules/reload command reloads the rules from a file and optionally changes the location of the rule file. The following filter configuration will use the rules file to detect which types of queries should be allowed. The action=allow parameter tells the filter to let all queries pass that match a rule.

    [Database-Firewall]
    type=filter
    module=dbfwfilter
    action=allow
    rules=/home/markusjm/rules

    Using the following rule file for the dbfwfilter will only allow statements that target the name or address columns.

    rule allowed_columns allow columns name address
    users %@% match any rules allowed_columns

    But what if we need to alter the client application to allow statements to the email column? In older versions this would require a restart of MaxScale so that the rules could be reloaded. With the 2.1 version of MaxScale, you can alter the rules and reload them at runtime. To do this, we add the email column to the rule file.

    rule allowed_columns allow columns name address email
    users %@% match any rules allowed_columns

    Then we reload the rules by calling the rules/reload command of the dbfwfilter. The rules/reload command can be called by executing the call command MaxAdmin command and giving it the filter name as a parameter.

    maxadmin call command dbfwfilter rules/reload Database-Firewall
    

    Once the command is complete, the dbfwfilter has reloaded and taken the new rules into use. With this new functionality in the dbfwfilter, online modifications to both the client applications and backend databases can be made without interruptions to the service.

    Summary

    Module commands make the usage of MaxScale’s modules easier, as they allow modules to perform custom actions. The example shows a practical use case that makes the use of MaxScale more convenient by leveraging the module command system.

    For more information on the implementation of the module commands, read the MaxScale documentation on module commands. You can find a list of modules that implement module commands in the MariaDB MaxScale 2.1.0 Beta Release Notes.

    Please try out the new module commands by downloading the MariaDB MaxScale 2.1.0 beta and give us your feedback on our mailing list [email protected] or on #maxscale@FreeNode.

  • Improving the Performance of MariaDB MaxScale

    Performance has been a central goal of MariaDB MaxScale’s architecture from the start. Core elements in the drive for performance are the use of asynchronous I/O and the Linux epoll interface. With a fixed set of worker threads, whose number is chosen to correspond to the number of available cores, each thread should either be working or be idle, but never be waiting for I/O. This should provide good performance that scales well with the number of available CPUs. However, benchmarking revealed that was not entirely the case. The performance of MaxScale did not continuously increase as more worker threads were added; after a certain point the performance would actually start to decrease.

    When we started working on MariaDB MaxScale 2.1 we decided to investigate what the actual cause for that behaviour was and to make modifications in order to improve the situation. As will be shown below, the modifications that were made have indeed produced good results.

    Benchmark

    The results below were produced with sysbench OLTP in read-only mode. The variant used performs 1000 point selects per transaction and the focus is on measuring the throughput that the proxy can deliver.

    In the benchmark two computers were used, both of which have 16 cores / 32 hyperthreads, 128GB RAM and an SSD drive. The computers are connected with a GBE LAN. On one of the computers there were 4 MariaDB-10.1.13 backends and on the other MariaDB MaxScale and sysbench.

    MariaDB MaxScale 2.0

    The following figure shows the result for MariaDB MaxScale 2.0. As the basic architecture of MaxScale has been the same from 1.0, the result is applicable to all versions up to 2.0.

    [Figure: sysbench results for MariaDB MaxScale 2.0, 1000 point selects per transaction]

    In the figure above, the vertical axis shows the number of queries performed per second and the horizontal axis shows the number of threads used by sysbench. That number corresponds to the number of clients MariaDB MaxScale will see. In each cluster, the blue bar shows the result when the backends are connected to directly, and the red, yellow and green bars when the connection goes via MaxScale running with 4, 8 and 16 threads respectively.

    As long as the number of clients is small, the performance of MariaDB MaxScale is roughly the same irrespective of the number of worker threads. However, already at 32 clients it can be seen that 16 worker threads perform worse than 8 and at 64 clients that is evident. The overall performance of MaxScale is clearly below that of a direct connection.

    MariaDB MaxScale 2.1

    For MariaDB MaxScale 2.1 we did some rather significant changes regarding how the worker threads are used. Up until version 2.0 all threads were used in a completely symmetric way with respect to all connections, both from client to MaxScale and from MaxScale to the backends, which implied a fair amount of locking inside MaxScale. In version 2.1 a session and all its related connections are pinned to a particular worker thread. That means that there is a need for significantly less locking and that the possibility for races has been reduced.

    The following figure shows the result of the same benchmark when run using MariaDB MaxScale 2.1.

    [Figure: sysbench results for MariaDB MaxScale 2.1, 1000 point selects per transaction]

    Compared with the result of MaxScale 2.0, the performance follows and slightly exceeds the baseline of a direct connection. At 64 clients, 16 worker threads still provide the best performance but at 128 clients 8 worker threads overtake. However, as these figures were obtained with a version very close to the 2.1.0 Beta we are fairly confident that by the time 2.1 GA is released we will be able to address that as well. Based on this benchmark, the performance of 2.1 is roughly 45% better than that of 2.0.

    MariaDB MaxScale and ProxySQL

    ProxySQL is a MySQL proxy that in some respects is similar to MariaDB MaxScale. Out of curiosity we ran the same benchmark with ProxySQL 1.3.0, using exactly the same environment as when MaxScale was benchmarked. The following figures show how MaxScale 2.0, MaxScale 2.1 and ProxySQL 1.3.0 compare when 4, 8 and 16 threads are used.

    [Figure: MaxScale 2.0 vs. MaxScale 2.1 vs. ProxySQL 1.3.0, 1000 point selects, 4 threads]

    With 4 threads, ProxySQL 1.3.0 performs slightly better than MaxScale 2.0 and MaxScale 2.1 performs significantly better than both. The performance of both MaxScale 2.1 and ProxySQL 1.3.0 start to drop at 256 client connections.

    [Figure: MaxScale 2.0 vs. MaxScale 2.1 vs. ProxySQL 1.3.0, 1000 point selects, 8 threads]

    With 8 threads, up until 128 client connections the order remains the same, with MaxScale 2.1 at the top. When the number of client connections grows larger than that, MaxScale 2.1 and ProxySQL 1.3.0 are roughly on par.

    [Figure: MaxScale 2.0 vs. MaxScale 2.1 vs. ProxySQL 1.3.0, 1000 point selects, 16 threads]

    With 16 threads, up until 64 client connections the order is still the same. When the number of client connections grows larger than that, MaxScale 2.1 and ProxySQL 1.3.0 are roughly on par. The performance of both drop slightly compared with the 8 thread situation.

    Too far-reaching conclusions should not be drawn from these results alone; it was just one benchmark, MaxScale 2.1 will still evolve before it is released as GA, and the version of ProxySQL was not the very latest one.

    Summary

    In MariaDB MaxScale 2.1 we have made significant changes to the internal architecture and benchmarking shows that the changes were beneficial for the overall performance. MaxScale 2.1 consistently performs better than MaxScale 2.0 and with 8 threads, which in this benchmark is the sweetspot for MaxScale 2.0, the performance has improved by 45%.

    If you want to give MaxScale 2.1.0 Beta a go, it can be downloaded at https://downloads.staging-mdb.com/MaxScale/.

  • Introducing MariaDB MaxScale 2.1.0 Beta

    We are happy to announce that MariaDB MaxScale 2.1.0 beta is released today. MariaDB MaxScale is the next generation database proxy for MariaDB. Beta is an important stage in our release process and we encourage you to download this release today!

    MariaDB MaxScale 2.1 introduces the following key new capabilities:

    Dynamic Configuration

    • Server, Monitor and Listeners: MaxScale 2.1 supports dynamic configuration of servers, monitors and listeners. Servers, monitors and listeners can be added, modified or removed during runtime. A set of new commands were added to maxadmin.
    • Database firewall filter:  Rules can now be modified during runtime using the new module commands introduced in this release.
    • Persistent configuration changes: The runtime configuration changes are immediately applied to the running MaxScale as well as persisted using the new hierarchical configuration architecture.

    Security

    • Secure Binlog Server: The binlog cache files on MaxScale can now be encrypted. MaxScale binlog server also uses SSL in communication with Master and Slave.
    • Secured single sign-on: MariaDB MaxScale now supports LDAP/GSSAPI authentication.
    • Selective Data Masking: Meet your HIPAA and PCI compliance needs by obfuscating sensitive data using the new masking filter.
    • Result set limiting: Prevent access to large sets of data with a single query by using maxrows filter.
    • Prepared Statement filtering by database firewall: The database firewall filter now applies the filtering rules to prepared statements as well.
    • Function filtering by database firewall: The database firewall filter can now add rules to whitelist or blacklist queries based on the presence of a function.

    Scalability

    • Aurora Cluster Support: MariaDB MaxScale can now be used as a proxy for Amazon Aurora Cluster. A newly added monitor detects read replicas and the write node in an Aurora Cluster, and supports launchable scripts on monitored events like other monitors.
    • Multi-master for MySQL monitor: MariaDB MaxScale can now detect complex multi-master replication topologies for MariaDB and MySQL environments.
    • Failover mode for MySQL Monitor: For a two-node master-slave cluster, MariaDB MaxScale now allows the slave to act as the master in case the original master fails.
    • Read-Write Splitting with Master Pinning: MariaDB MaxScale 2.1 introduces a new “Consistent Critical Read Filter”. This filter detects statements that would modify the database and routes all subsequent statements to the master server, where data is guaranteed to be in an up-to-date state.

    Query Performance

    • Query Cache Filter: MariaDB MaxScale 2.1 now allows caching of query results in MaxScale for a configurable timeout. If a query result is in the cache, MaxScale will return it from the cache instead of going to the server to fetch it.
    • Streaming Insert Plugin: A new plugin in MariaDB MaxScale 2.1 converts all INSERT statements done inside an explicit transaction into LOAD DATA LOCAL INFILE.

    Links:

    Please post your question in our Knowledge Base or email me at [email protected]

  • How MariaDB ColumnStore Handles Big Data Workloads – Query Engine

    MariaDB ColumnStore is a massively parallel scale out columnar database. Query execution behaves quite differently from how a row-based engine works. This article outlines how queries are executed, optimized, and how performance can be influenced.

    Query Flow

    Applications interact using SQL with MariaDB ColumnStore over the standard MariaDB connectors to the MariaDB server. For ColumnStore tables the query is routed through the ColumnStore storage engine.

    The ColumnStore storage engine converts the parsed query into a ColumnStore specific format which is then passed to the ExeMgr process for actual execution. In return the ExeMgr retrieves the query results back.

    The ExeMgr process converts the query request into a batch of primitive data jobs to perform the actual query. This is where ColumnStore performs query optimization to perform the query as efficiently as possible. The MariaDB server optimizer is mostly bypassed since its goals are driven around optimizing for row-based storage.

    The primitive job steps are sent down to the PrimProc process running on each PM server. Filtering, joins, aggregates, and group bys are pushed down to enable scale out performance across multiple nodes.

    Each primitive data job is generally a small discrete task that runs in a fraction of second. For filtering or projection jobs the system will read data in parallel and use other cores to process blocks in parallel to produce the result.

    Results are returned from the PrimProc to the ExeMgr process. Once results are returned from each PM, the final stage of aggregation is performed to produce the results. Window functions are applied at the ExeMgr level due to the need for sorted results which may ultimately come from different PM nodes.

    In the MariaDB server, any order by and select function results are applied before returning results to the client application.

    Query Optimization

    The ColumnStore optimizer makes use of table statistics including table size and extent map information to optimize query execution.

    If a query involves joins, table statistics are used to predict which table will have the largest result and to make this the driving table. Both table size and the extent min/max values are used in this calculation. The other tables will be queried by the ExeMgr and the results passed down to the PM nodes for hash joining with the driving table.

    Where clause filters are examined in conjunction with the extent map minimum and maximum values to determine which extents for the column even need to be scanned, drastically reducing the number of extents that must be read.  This tends to work particularly well for sorted data such as dates of when an event happened.

    Table projection involves first executing any column filters and then projecting the minimum set of other columns necessary to satisfy the query. Column projection is an efficient operation due to the use of fixed length datatypes.

    Query Scale Out

    A ColumnStore deployment with multiple PM nodes will provide scale out query performance. ColumnStore will automatically distribute data evenly across the nodes. This ensures that each PM node is responsible for a smaller portion of the total data set. Having multiple PM nodes allows for applying more CPU, network and disk I/O to provide for:

    • Reduction in query time, e.g. going from 2 nodes to 4 nodes will result in results coming back in half the time.
    • Maintaining query response time as your data set grows.

    If a time dimension or column is utilized and the data loaded in order or near order, extent map elimination allows for dramatic reduction in disk I/O when data is filtered by that column.

    Example Query

    To better illustrate query execution within MariaDB ColumnStore, consider the following query which produces a report of total order sum by market segment for nation 24 (US) in calendar Q4 of 2016:

    select c.c_mktsegment cust_mkt_segment,
           sum(o.o_totalprice) total_order_amount
    from orders o join customer c on o.o_custkey = c.c_custkey
    where c.c_nationkey = 24
      and o.o_orderDATE >= '2016-10-01'
      and o.o_orderDATE < '2017-01-01'
    group by cust_mkt_segment
    order by total_order_amount;

    +------------------+--------------------+
    | cust_mkt_segment | total_order_amount |
    +------------------+--------------------+
    | AUTOMOBILE       |       352911555.24 |
    | BUILDING         |       338742302.27 |
    | FURNITURE        |       342992395.48 |
    | HOUSEHOLD        |       339851076.28 |
    | MACHINERY        |       353259581.80 |
    +------------------+--------------------+
    5 rows in set, 1 warning (0.19 sec)

    This query will be executed via the following steps:

    1. The customer table is identified using table statistics as being the smaller / dimension table and is filtered first. Scan the customer table across PM nodes on the c_nationkey, c_mktsegment and c_custkey columns. Filter the c_nationkey column to rows matching the value 24 (US companies) and project c_custkey (the customer identifier) and c_mktsegment (the customer market segment).
    2. Scan the larger / fact orders table across PM nodes on the o_custkey, o_orderDATE and o_totalprice columns. Filter the o_orderDATE column to rows within the range of Q4 2016. If there are more than 8M rows in the orders table, the system will utilize extent min and max values to eliminate reading those extents that are outside of the range.
    3. A hash of c_custkey is built from the customer table results and then a distributed hash join is executed against the order table results in each PM server to produce the set of joined columns between customer and order.
    4. A distributed grouping and aggregation is then executed in each PM server to produce the group by results. The results are accumulated in the UM server which also combines any overlapping results coming from different PM servers.
    5. Finally the results are sorted by the total_order_amount column in the UM server to produce the final result.

    Utility tools are provided within MariaDB ColumnStore to help understand the query optimization process. For further details see the MariaDB ColumnStore knowledge base section on performance : https://staging-mdb.com/kb/en/mariadb/columnstore-performance-tuning/.

  • Monitoring MariaDB with the All New Monyog v7.0

    Attend ‘Become better at monitoring MySQL & MariaDB using Monyog’ on Thursday, Feb 09, 2017 to learn more about the new features!

    MariaDB is an open-source enterprise database with one of the most active, fastest-growing communities in the world, with MariaDB Enterprise adding the security, high availability, and scalability features required for mission-critical applications – and the management and monitoring tools expected.

    In this blog, we’ll focus on monitoring. Monyog, included in MariaDB Enterprise, helps DBAs monitor everything from server status to query performance, all while receiving proactive warnings and alerts. Monyog 7.0 features an improved user interface, a customizable dashboard, and topological views, making it easier than ever to identify critical issues and uncover valuable performance insights.

    Monitor the top 10 queries across all servers

    Save time and resources with an extensive view of the state of MariaDB servers under one roof.

    Customize the dashboard, and get enhanced chart performance

    Choose to create and customize monitoring dashboards, and enable or disable MySQL and system charts, to best analyze your server performance.

    Visualize the replication topology

    Gain visibility into the replication hierarchy of servers along with the details of each replicated server to make sure the data is always up to date.

    Take advantage of real-time monitoring

    Optimize your MariaDB database performance by monitoring every query on your database in real time, so you can better manage resources.

    In addition, here’s a quick overview of the MariaDB Monitoring tool, Monyog v7.0:

    Want to know more? Please contact us here.