
  • Security Vulnerability CVE-2016-6664 / CVE-2016-5617

    During the fall there were a couple of vulnerabilities found that could be used for privilege escalations in conjunction with race conditions. These were:

    • CVE-2016-6662 MySQL Remote Root Code Execution / Privilege Escalation 0day
    • CVE-2016-6663 Privilege Escalation / Race Condition (also referred to as CVE-2016-5616)
    • CVE-2016-6664 Root Privilege Escalation (also referred to as CVE-2016-5617)

    I’ve published two blog posts about these vulnerabilities before:

    CVE-2016-6662 and CVE-2016-6663 were fixed during the fall, and versions of MariaDB containing the fixes have been released. As stated in the latter blog post, the root privilege escalation vulnerability CVE-2016-6664 was not exploitable by itself; an attacker would first need to obtain shell access through some other vulnerability. But a final fix was still needed to completely shut the door on this last related vulnerability.

    The CVE-2016-6664 vulnerability exploits a weak point in the way the mysqld_safe script handles the creation of the error log file, through which root privileges could be obtained.

    Oracle made an attempt to fix this already in November, but the fix was unfortunately half-baked: it made the vulnerability slightly less exploitable but didn’t completely get rid of it. This and other issues in the mysqld_safe script were pointed out by Red Hat’s Security Team. Oracle has since opened CVE-2017-3312 for the missing pieces of CVE-2016-6664 and fixed them.

    In MariaDB Server, we’ve now implemented our own fix for the vulnerability, which we believe completely removes the possibility to make use of it.

    CVE-2016-6664 is fixed as of the following versions of MariaDB Server:

    Please upgrade to these versions (or newer) to be protected against CVE-2016-6664. The latest versions can be downloaded here.
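    To confirm which MariaDB Server version an instance is running before and after the upgrade, a quick check from any client session is enough:

    MariaDB> SELECT VERSION();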

    – – –

    In addition to CVE-2016-6664, fixes for the following CVEs affecting MySQL, mentioned in Oracle’s Critical Patch Update Advisory – January 2017, are included in versions 5.5.54, 10.0.29 and 10.1.21 of MariaDB:

  • On Databases, Temporal Datatypes and Trains

    Introduction

    The data types offered by a database are a key feature, just as they are for programming languages. I would guess that all programming languages, with the possible exception of assembly, provide a set of predefined “built-in” datatypes. Some programming languages are limited in this respect, like Forth, while others have a larger set of types. A data type determines what data can be stored, what operations are allowed and what the semantics are.

    One family of data types that is present in more or less all relational databases (I don’t say all here as I know someone will tell me about an arcane relational database system developed in Burundi where this is not true) is the temporal types, i.e. datatypes that hold a time value. This is in contrast to most programming languages, where the native datatypes are numeric and strings, and all other types are extensions built using some kind of structure.

    So, databases have temporal datatypes, programming languages do not (OK, that is a generalization). In this blog post I will look at some aspects of temporal datatypes, and in a later blog post I will dig even deeper into this outrageously interesting subject.

    Temporal Datatypes in Databases

    Before we get into the aspect of trains, let’s spend some time with the temporal datatypes themselves. As already stated, the type of an item determines, among other things, its semantics. Let’s start with a look at the temporal data types in MariaDB: they are DATETIME, TIMESTAMP, DATE, TIME and YEAR. If we for a second assume that you don’t know anything about how MariaDB looks at these, you might ask yourself what the difference is between TIMESTAMP and DATETIME, so let’s start there.

    Both DATETIME and TIMESTAMP store a date and a time, but there the similarities end. And by the way, I’m not saying that we should change the behavior of these datatypes, just that they are sometimes a bit odd.

    The DATETIME datatype is more recent and is more in line with other relational databases, but on the other hand it takes up more space on disk. Both of these types also have microsecond support (since MariaDB 5.3); to enable this you add a precision, such as DATETIME(6). The reason a TIMESTAMP is more compact is that it can only hold a limited range of dates, from Jan 1 1970 up to 2038. Well, that should be enough for most purposes, right? Yes, just as representing a year with just 2 digits and allowing for 640 K of RAM was “enough” a few years back! Those kinds of assumptions really made the world a better place for us all.
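    To make the fractional seconds support concrete, here is a minimal sketch (the table and column names are made up for illustration). The column declared without a precision does not keep the microsecond part, while the one declared with (6) does:

    MariaDB> CREATE TABLE precision_demo (dt_plain datetime, dt_micro datetime(6));
    MariaDB> INSERT INTO precision_demo VALUES (NOW(6), NOW(6));
    MariaDB> SELECT * FROM precision_demo;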

    As for the DATE, TIME and YEAR datatypes, I will skip them for now and focus on DATETIME and TIMESTAMP.

    One thing you do NOT want to use temporal types for

    I had a customer, many years ago, possibly way back during the Reagan administration or so, that had an issue. They were using a TIMESTAMP with microsecond precision as the PRIMARY KEY in their OLTP system. They had determined that there would never be more than 1 transaction per microsecond, and that this would work. It didn’t. For the simple reasons that:

    • On average, there was a lot less than 1 transaction per microsecond, but during high load times, it could well be more than this.
    • Computers tend to get faster over time and the load on popular services also increases, which means this scheme was bound to break sooner the more successful the service became.
    • This was a stupid assumption.

    Their solution was to retry the transaction when they got a PRIMARY KEY violation, which was neither effective, performant nor practical. Don’t do something even remotely similar to this!
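    For what it’s worth, a minimal sketch of the more conventional design (the table and column names are made up for illustration): use a surrogate AUTO_INCREMENT key for uniqueness and keep the timestamp as an ordinary, possibly indexed, column.

    MariaDB> CREATE TABLE transactions (
      id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      created_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
      amount DECIMAL(12,2),
      KEY (created_at));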

    Other relational databases’ support for temporal data types

    Other relational databases also support temporal datatypes, and you might have sensed that I feel that the MariaDB temporal datatypes are a bit awkward. That is not unique to MariaDB though, as all relational database temporal data types have quirks, to say the least, so it is appropriate to have a look at this too.

    Let’s begin with Oracle, where there is support for DATE and TIMESTAMP. Oracle also supports INTERVAL types. The Oracle DATE is the oldest Oracle temporal datatype, and it is rather odd in a few ways. One such oddity is that although the type is called DATE and, when querying it, by default you get a proper date back, it actually stores the time as well, down to seconds. Which means that for two fields that look like they have the same value using the default format, a comparison might still fail. Odd, to say the least.

    As for SQL Server, things are messier still. Here a TIMESTAMP is actually a table attribute that works much like the first TIMESTAMP column in a MariaDB table, in that it keeps track of the last insert / update to the row. Then SQL Server has both DATETIME and DATETIME2 datatypes, where the former has a limited date range. SQL Server also has a DATETIMEOFFSET, which is pretty odd. I will not get into NULL handling with temporal data types in SQL Server, thus sparing you a headache. Also in SQL Server are DATE and TIME datatypes as well as a SMALLDATETIME, where the latter is a more compact DATETIME with no fractional seconds and, again, a limited range.

    I have not gotten into the issue of how relational databases treat temporal datatypes with incorrect data and NULL values. Note that “incorrect” data when it comes to temporal data is a fuzzy subject. Handling leap years is no big deal, but maybe you haven’t heard about leap seconds? They are there to compensate for the earth slowing down its rotation, and they are added now and then for this purpose. We have one such coming up at the end of this year: the last minute of 2016 will have 61 seconds, so that 2016-12-31 23:59:60 is actually a valid time. This is not recognized by MariaDB, Oracle, SQL Server or, for that matter, Linux (at least where I tested it), which all report it as an invalid time specification. If someone asks you “how many seconds are there in a day” your answer should be, in true engineering fashion, “it depends”, and if you write code assuming there are always 86400 (24 * 60 * 60) seconds in a day, you might be making a mistake, depending on the situation.
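    You can see this for yourself with a quick test; a minimal sketch (the table name is made up for illustration). Depending on your sql_mode, the INSERT is either rejected outright (strict mode) or stored as the zero datetime with a warning:

    MariaDB> CREATE TABLE leaptest (t datetime);
    MariaDB> INSERT INTO leaptest VALUES ('2016-12-31 23:59:60');
    MariaDB> SHOW WARNINGS;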

    Another situation is with financial services, where domain-specific calendars are used in some cases, like a 360-day calendar where each year is considered to have 12 months of 30 days each. This is used for example with interest rate calculations, which is why every monthly mortgage payment on your house is the same amount, despite the fact that some months are shorter and others longer in your calendar (but not in the calendar used by your bank).
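    As a rough sketch of how such a 30/360 day count works (simplified, ignoring the end-of-month adjustments that real day-count conventions apply), the number of days between two dates is computed from the year, month and day parts rather than from the actual calendar:

    MariaDB> SELECT 360 * (YEAR('2017-03-15') - YEAR('2017-01-15'))
               + 30 * (MONTH('2017-03-15') - MONTH('2017-01-15'))
               + (DAY('2017-03-15') - DAY('2017-01-15')) AS days_30_360;

    This returns 60, even though the actual number of calendar days between the two dates is 59.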

    SQL Standard Temporal Types

    In the SQL Standard, let’s assume SQL-99, there are three temporal data types: TIMESTAMP, DATE and TIME. The Oracle TIMESTAMP that was added in Oracle 9 is reasonably well in line with Standard SQL-99.

    The SQL Standard TIME and TIMESTAMP datatypes have a number of attributes, namely a precision, in terms of fractional seconds, and whether time zones are used or not. Which brings us to the issue of time zones. An issue they are, but it would probably be even worse without them.

    Time Zones and Trains

    I guess you are wondering what trains have to do with all this, and with relational databases in particular? Well, trains are nice and fun and the same goes for relational databases, right? Or maybe not. No, there is an aspect of temporal data that is related to trains. Let’s go back a few years in time to when trains were all new and hot, which is around the year 1996. No wait, that was Netscape. What we are talking about is the mid 1800s, when trains caused people to travel a lot more and to travel much longer distances. There were no time zones though, so noon was different for every station on the railroad: it was the moment when the sun was at its zenith at that particular station. This meant that if you traveled for 1 hour you might end up 53 minutes away from your origin.

    In 1884, it was determined that we should all have one meridian (the one in Greenwich) and that we should have 24 time zones spread around the globe. And this would be a world standard and as such enforced, and you better follow this, or else…

    As we all know, standards are universally accepted more or less always, and they are also backward compatible. And this is why the SQL standard works so well that all SQL databases can talk to each other, all SCSI connectors (remember SCSI? I do, but then I’m an old fart) fit any other SCSI connector, and, for a more modern example of a truly successful standard, all HDMI cables and connectors work with all other HDMI cables, connectors, screens, players and what have you.

    This explains why the good intention of having 24 time zones, 1 hour apart, around the globe isn’t really followed. In particular, India and Australia screw things up with 15, 30 and 45 minute time zones.

    MariaDB Temporal Datatypes and Time Zones In Practice

    And what has this got to do with relational databases, you ask? Well, there is one difference between TIMESTAMP and DATETIME in MariaDB that we haven’t mentioned so far, and that is time zone support. The MariaDB TIMESTAMP data type supports time zones, which the DATETIME datatype does not. What does this mean then? Well, it means that a TIMESTAMP value is stored together with the time zone of the value. To explain what this means, let’s look at an example. First we need a table to insert some data into for our tests:

    MariaDB> CREATE TABLE temporaltest (timestamp_t timestamp,
      datetime_t datetime);

    Before moving on from this, let’s look at how MariaDB interprets this table definition:

    MariaDB> SHOW CREATE TABLE temporaltest\G
    *************************** 1. row ***************************
           Table: temporaltest
    Create Table: CREATE TABLE `temporaltest` (
      `timestamp_t` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      `datetime_t` datetime DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1

    As you can see, MariaDB adds a few things to the simple TIMESTAMP column, like a NOT NULL clause that we didn’t specify and a default value that we didn’t ask for. This is for backward compatibility, to align more recent MariaDB TIMESTAMP column semantics with how a TIMESTAMP used to work way back during the Harding administration. Before we move on, let’s set the time zone of the server. The default for MariaDB is to use the time zone as defined by the host operating system, but in many cases that is not a good idea in production use. To be able to set named time zones in MariaDB, we first have to import time zone information into MariaDB, and that is done by running this from the command line:

    $ mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql -p

     

    This assumes that the zoneinfo files are at the default location; if you have installed MariaDB from a tarball they may be located elsewhere. With that in place, let’s set the time zone:

    MariaDB> SHOW VARIABLES LIKE 'time_zone';
    +---------------+--------+
    | Variable_name | Value  |
    +---------------+--------+
    | time_zone     | SYSTEM |
    +---------------+--------+
    1 row in set (0.00 sec)

    MariaDB> SET time_zone = 'PST8PDT';
    Query OK, 0 rows affected (0.00 sec)

    So now we are running with the PST timezone instead of the one defined by the operating system. So far so good. Let’s then insert some data into the table we created above:

    MariaDB> INSERT INTO temporaltest VALUES('2015-03-08 02:05:00', '2015-03-08 02:05:00');
    MariaDB> SELECT * FROM temporaltest;
    +---------------------+---------------------+
    | timestamp_t         | datetime_t          |
    +---------------------+---------------------+
    | 2015-03-08 02:05:00 | 2015-03-08 02:05:00 |
    +---------------------+---------------------+
    1 row in set (0.00 sec)

    OK, that looks fine, right? Now, let’s assume that we move this server to the east coast and set the timezone as appropriate for that and select the above data again:

    MariaDB> SET time_zone = 'EST';
    Query OK, 0 rows affected (0.00 sec)

    MariaDB> SELECT * FROM temporaltest;
    +---------------------+---------------------+
    | timestamp_t         | datetime_t          |
    +---------------------+---------------------+
    | 2015-03-07 21:05:00 | 2015-03-08 02:05:00 |
    +---------------------+---------------------+
    1 row in set (0.00 sec)

    As you can see, with a different time zone we get the data back adjusted for the time zone difference for the TIMESTAMP column, but not for the DATETIME column. That does make a difference, right? If you run with MariaDB clients in different time zones, all those clients may well insert data using different time zones! If the server and the client are in different time zones, the data is converted to the time zone of the server, and when retrieving data it is converted back to that of the client. Seems fair, right?
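    For DATETIME columns, which get no such automatic adjustment, you can always convert explicitly with CONVERT_TZ. A minimal sketch, assuming the time zone tables were loaded as shown earlier:

    MariaDB> SELECT CONVERT_TZ('2015-06-01 12:00:00', 'UTC', 'EST') AS est_time;
    +---------------------+
    | est_time            |
    +---------------------+
    | 2015-06-01 07:00:00 |
    +---------------------+
    1 row in set (0.00 sec)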

    Maybe this thing with time zones wasn’t such a bad and difficult thing after all? Yes, it can be handled, but that was before Germany was about to run out of electricity and, in an attempt to fix that, caused years of suffering to cows and programmers across the globe. That story will be told in the next part of this series of blog posts though, so don’t touch that dial, I’ll be right back.

    Happy SQLing

    /Karlsson

  • Why Marko Mäkelä, Lead Developer InnoDB, Recently Joined MariaDB Corporation

    I recently joined MariaDB Corporation. You might not recognize my name, but you may have used some InnoDB features that I have worked on since I joined Innobase Oy as the first full-time employee in 2003.

    My first task was to reduce the overhead of the InnoDB table storage. I introduced ROW_FORMAT=COMPACT (and named the old format ROW_FORMAT=REDUNDANT) in MySQL 5.0.3. ROW_FORMAT=COMPRESSED and ROW_FORMAT=DYNAMIC were released as part of the InnoDB Plugin for MySQL 5.1.

    In the InnoDB Plugin, I also completed the ‘fast index creation’ feature. That, along with the ROW_FORMAT changes and some BLOB bug fixes, was among the major improvements that the InnoDB Plugin offered over the built-in InnoDB in MySQL 5.1. The InnoDB Plugin became the built-in InnoDB in MySQL 5.5.

    In MySQL 5.5, I transformed the InnoDB insert buffer into a change buffer (delete-mark and purge buffering) and introduced the first regression tests based on fault injection.

    In MySQL 5.6, I designed and implemented the InnoDB part of ALTER TABLE…ALGORITHM=INPLACE and LOCK=NONE, also known as the ‘online ALTER TABLE’. I also removed the famous limitation that the InnoDB redo log file size could not be changed.
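    As an illustration of what the online ALTER TABLE interface looks like from the SQL side (the table and index names here are just placeholders), a secondary index can be added without blocking concurrent writes by requesting the in-place algorithm and no locking:

    ALTER TABLE orders ADD INDEX idx_customer (customer_id), ALGORITHM=INPLACE, LOCK=NONE;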

    In MySQL 5.7, one of my most visible contributions is probably the InnoDB redo log format tagging, which prevents old servers from failing with obscure errors when starting up with new data files.

    Why did I join MariaDB? The short answer is that Monty called me and asked. The long answer is that, having an academic background, I value open collaboration and the open exchange of ideas. At MariaDB it feels like being at home again, working with many of the colleagues from the 2003‒2005 era when both Innobase Oy and MySQL AB were small companies where each employee was able or forced to work on a wide range of tasks.

    The acquisitions of the companies introduced policies and processes that restrict communication, gradually transforming the ‘bazaar’ into a ‘cathedral’. While ‘open source’ seems to have largely won over ‘closed source’ when it comes to information technology and communication infrastructure, I think that The Cathedral and the Bazaar are still with us. The new ‘cathedral’ is ‘closed development’ where source code is only released in snapshots synchronized with product releases. Significant parts of the development history are often hidden by squashing merges of long-lived development branches to a gigantic commit.

    MariaDB is the ‘bazaar’, encouraging feedback from end users in all stages of development.

    While a for-profit business cannot provide unpaid support to users, it sometimes makes sense to work with users to obtain test cases. For instance, MDEV-11233 (a server crash in CREATE FULLTEXT INDEX) had been reported by several users, but crucial details were missing. Finally, the same assertion failure message was posted on the #maria channel on the FreeNode IRC network. After some discussion, Waqar Khan was busy executing a binary search, running a series of SQL statements to reduce the problematic table from one million rows to a single record that triggered the problem. An hour or two later we had a minimal 3-statement test case to reproduce the problem.

    Another example is MDEV-6076, Persistent AUTO_INCREMENT for InnoDB tables, which I hope will be included in the upcoming MariaDB 10.2 release. Zhang Yuan at Alibaba looked at my code changes and pointed out a mistake before our internal code review had been completed.

    I am looking forward to interesting times with MariaDB.

  • IHME Believes Open Source MariaDB ColumnStore Is The Future of Data Warehousing

    Note: This is a guest post by Andrew Ernst, Assistant Director, Infrastructure at the Institute for Health Metrics and Evaluation. 

    In the early 1990s, the World Bank Commissioned an in-depth study to measure disability and death from a multitude of causes worldwide. Over the past few decades, this study has grown into an international consortium of more than 1,800 researchers from more than 120 countries, and its estimates are being updated annually.

    Today, the Global Burden of Disease report, managed by the Institute for Health Metrics and Evaluation (IHME), serves as the most comprehensive effort to systematically measure the world’s health problems. In fact, the tools can be used at the global, national, and local levels to understand health trends over time, just like gross domestic product data are used to monitor a country’s economic activity.

    The data is growing each year. In 2015, the Global Burden of Disease results were three times larger than in any other year. As the Global Burden of Disease report continues to grow in size, focusing on more granular geographies, the data requirements also continue to scale exponentially.

    Solving Volume and Scale Challenges

    Over several decades, the size and scope of the Global Burden of Disease results have grown, today reaching multi-billion row tables. As the data has grown, we’ve tried several storage engines, which have failed miserably.

    The Global Burden results are developed through many internal processes and pipelines that rely on a MySQL-compliant infrastructure. The choice to adopt MySQL was made at a time when the scope of IHME’s work was much smaller, and today’s scale would have been incomprehensible. The ambitions of our researchers and abundance of available input data have pushed the boundaries of research and traditional database engines.

    IHME has leveraged the advanced Percona XtraDB Barracuda (InnoDB-compliant) storage engine for every database environment, and supports more than 90 database instances within its infrastructure. The environments supporting those critical scientific computing pipelines run on high-end hardware and are optimized for low latency and for servicing the extreme multi-concurrency generated by our High Performance Compute cluster.

    Knowing that the existing solutions would not scale with the Institute’s growth, efforts have been underway to evaluate platforms that offer:

    • A MySQL-compatible interface
    • Cost-effectiveness
    • No need to rewrite a vast amount of research application code
    • Ideally, an open source development effort with community-driven input and contributions

    memSQL was one of the top contenders and performed extremely well at scale, but had a number of non-standard constructs for database design and implementation, and lacked definable security mechanisms for authentication and authorization with their non-commercial product.

    MySQL 5.7, while offering higher benchmark speeds for ingest and query optimization, didn’t offer enough of a paradigm shift to make a huge impact on our workload. We knew that any platform selected would need to leverage multi-host sharding with multi-threaded software. Our database team is small, and while building a sharding infrastructure is reasonably straightforward, we realistically cannot ask the development staff to make the applications shard-aware.

    Then IHME evaluated MariaDB ColumnStore – which combined the power of big data analytics with the ease of use of SQL. Leveraging MariaDB’s open source model, MariaDB ColumnStore offers high-performance queries on massive billion-row data tables.

    With MariaDB ColumnStore, we were able to improve the performance of our multi-billion row tables. IHME found several benefits to using ColumnStore including:

    Higher performance: Compared to row-based storage, MariaDB ColumnStore column storage reduces disk I/O, making it much faster for read-intensive analytic workloads on large datasets.

    Better security: ColumnStore has access to all the same security capabilities delivered in MariaDB Server, including encryption for data in motion, role-based access and audit features.

    Benefits of leveraging SQL: ColumnStore brings transactional and analytic workloads into a single enterprise grade system. It simplifies enterprise administration and execution with a standard SQL front end for OLTP and analytics.

    We also found MariaDB engineers to be incredibly responsive to IHME – we trust we can work with them for a very long time.

    Moving Forward

    Looking into the future, IHME has to design for a future of growing data that allows for regular updates as new data and epidemiological studies become available.

    MariaDB’s ColumnStore storage engine solved both a volume and a scale problem within our environment, allowing us to seamlessly handle both current and planned increases in workload.

    When IHME released our results in 2010, there were approximately 2 billion data points, and with the 2015 effort, that number has grown to just shy of 100 billion. The 2016 results already suggest we will far exceed 10 billion results per table in the next six months (each result set is about 9-10 tables of roughly the same size). Looking further into the future, IHME will be focusing on smaller geographical areas across the globe, and will need to support analytical workloads that include geospatial calculations.

    High-performance, flexible data analytics using MariaDB ColumnStore doesn’t just make my day-to-day job easier. It will have a profound impact on how the global community can assess disease around the world.

    The pioneering effort of the IHME continues to be hailed as a major landmark in public health and an important foundation for policy formulation and priority setting.

  • A Look Inside MariaDB ColumnStore 1.0.6 GA

    Today, MariaDB ColumnStore has reached a major milestone – MariaDB ColumnStore 1.0 is now GA with the release of MariaDB ColumnStore 1.0.6 GA. The journey of MariaDB ColumnStore began in January 2016 when our team started building ColumnStore. The support from our early alpha and beta adopters and community users has helped us take MariaDB ColumnStore from the first alpha release to the GA today.

     

    MariaDB ColumnStore is a massively parallel, high-performance, distributed columnar storage engine built on MariaDB Server. It is the first columnar storage engine for big data analytics in the MariaDB ecosystem. It can be deployed in the cloud (optimized for Amazon Web Services) or on a local cluster of Linux servers using either local or networked storage.

    A Look Inside

    In MariaDB ColumnStore’s architecture, three components – a MariaDB SQL front end called User Module (UM), a distributed query engine called Performance Module (PM) and distributed data storage – work together to deliver high-performance, big data analytics.


    User Module (UM):
    The UM is made up of the front-end MariaDB Server instance and a number of processes specific to MariaDB ColumnStore that handle concurrency scaling. The storage engine plugin for MariaDB ColumnStore hands the query over to one of these processes, which then further breaks down SQL requests, distributing the various parts to one or more Performance Modules to process the query. Finally, the UM assembles all the query results from the various participating Performance Modules to form the complete query result set that is returned to the user.

    Performance Module (PM):
    The PM is responsible for storing, retrieving and managing data, processing block requests for query operations, and passing results back to the User Module(s) to finalize the query requests. The PM reads data from disk and caches it in a shared-nothing data cache that is part of the server on which the PM resides. MPP is accomplished by allowing the user to configure as many Performance Modules as they would like; each additional PM adds more cache to the overall database as well as more processing power.

    Distributed Data Storage:
    MariaDB ColumnStore is extremely flexible with respect to the storage system. When running on premise, it can use either local storage or shared storage (e.g., SAN) to store data. In the Amazon EC2 environment, it can use ephemeral or Elastic Block Store (EBS) volumes.
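    To make this concrete, here is a minimal sketch of what working with the engine looks like from the SQL front end (the table and column names are made up for illustration); analytic tables are declared with the ColumnStore engine and can be joined against ordinary InnoDB tables through the cross-engine support:

    MariaDB> CREATE TABLE fact_results (location_id INT, year_id INT, metric DOUBLE) ENGINE=ColumnStore;
    MariaDB> CREATE TABLE dim_location (location_id INT PRIMARY KEY, name VARCHAR(100)) ENGINE=InnoDB;
    MariaDB> SELECT d.name, SUM(f.metric)
             FROM fact_results f JOIN dim_location d USING (location_id)
             GROUP BY d.name;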

    MariaDB ColumnStore 1.0 Features

    Scale:

    Massively parallel architecture designed for big data scaling

    • Linear scalability as new nodes are added

    Easy horizontal scaling

    • Add new data nodes as your data grows
    • Continue read queries when adding new nodes

    Compression

    • Data compression designed to accelerate decompression rate, reducing disk I/O

    Performance:

    High-performance, real-time and ad-hoc analytics

    • Columnar optimized, massively parallel, distributed query processing on commodity servers

    High-speed data load and extract

    • Load data while continuing analytics queries
    • Fully parallel high-speed data load and extract

    Enterprise-Grade Analytics:

    Analytics

    • In-database distributed analytics with complex join, aggregation, window functions
    • Extensible UDF for custom analytics

    Cross-engine access

    • Use a single SQL interface for analytics and OLTP
    • Cross join tables between MariaDB and ColumnStore for full insight

    Security

    • MariaDB security features – SSL, role-based access and auditability
    • Out-of-the-box BI tool connectivity using ODBC/JDBC or standard MariaDB connectors

    Management and Availability:

    Easy to install, manage, maintain and use

    • Automatic horizontal partitioning
    • Online schema changes while read queries continue
    • No index, views or manual partition tuning needed for performance

    Deploy anywhere

    • On premise or on AWS
    • On premise using commodity servers

    High Availability

    • Automatic UM failover
    • Multi-PM distributed data attachment across all PMs in SAN and EBS environments for automatic PM failover

     

    The release notes for MariaDB ColumnStore 1.0.6, along with a list of bugs fixed, can be found here. Documentation is available in our Knowledge Base. Binaries for MariaDB ColumnStore 1.0.6 are available for download here. For developers wanting to do a quick install, Docker and Vagrant options are available. You can also find the MariaDB ColumnStore 1.0.6 AMI in the AWS Marketplace.

     

    Reaching GA would not have been possible without the valuable feedback we have received from the community and our beta customers. Thanks to everyone who contributed. Special acknowledgment also goes to the outstanding work of the MariaDB ColumnStore engineering team, whose hard work and dedication made this GA possible.

    The journey does not stop here. As the new year unfolds we will start looking at the content and begin planning for MariaDB ColumnStore 1.1. Based on what we have already learned from our beta users, we will be adding streaming and more manageability features in 1.1. If you have any ideas or suggestions that you would like to see in the next release, please create a request in JIRA. For questions or comments, you can reach me at [email protected] or tweet me @dipti_smg.

     

  • Facebook MyRocks at MariaDB

    Recently my colleague Rasmus Johansson announced that MariaDB is adding support for the Facebook MyRocks storage engine. Today I’m going to share a bit more on what that means for MariaDB users. Members of the Facebook Database Engineering team helped us answer some questions we think our community will have about MyRocks.

    Benefits of MariaDB Server’s Extensible Architecture
    Before discussing specifics of MyRocks, new readers may benefit from a description of MariaDB Server architecture, which is extensible at every layer including the storage layer. This means users and the community can add functionality to meet unique needs. Community contributions are one of MariaDB’s greatest advantages over other databases, and a big reason for us becoming the fastest growing open source database in the marketplace.

    Openness in the storage layer is especially important because being able to use the right storage engine for the right use case ensures better performance optimization. Both MySQL and MariaDB support InnoDB – a well known, general purpose storage engine. But InnoDB is not suited to every use case, so the MariaDB engineering team is extending support for additional storage engines, including Facebook’s MyRocks for workloads requiring greater compression and IO efficiency, and MariaDB ColumnStore (currently in beta), which will provide faster time-to-insight with Massively Parallel Execution (MPP).

    Facebook MyRocks for MariaDB
    When searching for a storage engine that could give greater performance for web scale type applications, MyRocks was an obvious choice because of its superior handling of data compression and IO efficiency. Besides that, its LSM architecture allows for very efficient data ingestion, like read-free replication slaves, or fast bulk data loading.

    As we add support for new storage engines, many of our current users may ask, “What happens to MariaDB’s support for InnoDB? Do I have to migrate?” Of course not! We have no plans to abandon InnoDB. InnoDB is a proven storage engine and we expect it to continue to be used by MariaDB users. But we do expect that deployments that need highest possible efficiency will opt for MyRocks because of its performance gains and IO efficiency. Over time, as MyRocks matures we expect it will become appropriate for even more use cases.

    The first MariaDB version of MyRocks will be available in a release candidate of MariaDB Server 10.2 coming this winter. Our goal is for MyRocks to work with all MariaDB features, but some of them, like optimistic parallel replication, may not work in the first release. MariaDB is an open source project that follows the “release often, release early” approach, so our goal is to first make a release that meets core requirements, and then add support for special cases in subsequent releases.
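    For readers who want a feel for what this will look like in practice, here is a minimal sketch under the assumption that MyRocks ships as a loadable storage engine plugin, as other MariaDB engines do (the exact package name and table details may differ in the final release):

    MariaDB> INSTALL SONAME 'ha_rocksdb';
    MariaDB> CREATE TABLE events (id BIGINT NOT NULL PRIMARY KEY, payload BLOB) ENGINE=ROCKSDB;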

    Now let’s move onto my discussion with Facebook’s Database Engineering team!

    Can you tell us a bit about the history of RocksDB at Facebook?

    In 2012, we started to build an embedded storage engine optimized for flash-based SSDs by forking LevelDB. The fork became RocksDB, which was open-sourced in November 2013 [1]. After RocksDB proved to be an effective persistent key-value store for SSDs, we enhanced RocksDB for other platforms. We improved its performance on DRAM in 2014 and on hard drives in 2015, two platforms with production use cases now.

    Over the past few years, we’ve introduced numerous features and improvements. To name a few, we built compaction filter and merge operator in 2013, backup and column families in 2014, transactions and bulk loading in 2015, and persistent cache in 2016. See the list of features that are not in LevelDB: https://github.com/facebook/rocksdb/wiki/Features-Not-in-LevelDB .

    Early RocksDB adopters at Facebook such as the distributed key-value store ZippyDB [2], Laser [2] and Dragon [3] went into production in early 2013. Since then, many more new or existing services at Facebook started to use RocksDB every year. Now RocksDB is used in a number of services across multiple hardware platforms at Facebook.

    [1] https://code.facebook.com/posts/666746063357648/under-the-hood-building-and-open-sourcing-rocksdb/ and http://rocksdb.blogspot.com/2013/11/the-history-of-rocksdb.html
    [2] https://research.facebook.com/publications/realtime-data-processing-at-facebook/
    [3] https://code.facebook.com/posts/1737605303120405/dragon-a-distributed-graph-query-engine/

    Why did FB go down the RocksDB path for MySQL?

    MySQL is a popular storage solution at Facebook because we have a great team dedicated to running MySQL at scale that provides a high quality of service. The MySQL tiers store many petabytes of data that have been compressed with InnoDB table compression. We are always looking for ways to improve compression, and the LSM algorithm used by RocksDB has several advantages over the B-Tree used by InnoDB. This led us to MyRocks: RocksDB is a key-value storage engine, and MyRocks implements the MySQL storage engine API to make RocksDB work with MySQL and provide SQL functionality. Our initial goal was to get 2x more compression from MyRocks than from compressed InnoDB without affecting read performance. We exceeded our goal. In addition to getting 2x better compression, we also got much lower write rates to storage, faster database loads, and better performance.

    Lower write rates enable the use of lower endurance flash, and faster loads simplify the migration from MySQL on InnoDB to MySQL on RocksDB. While we don’t expect better performance for all workloads, the way in which we operate the database tier for the initial MyRocks deployment favors RocksDB more than InnoDB. Finally, there are features unique to an LSM that we expect to support in the future, including the merge operator and compaction filters. MyRocks can be helpful to the MySQL community because of efficiency and innovation.

    We considered multiple write-optimized database engines. We chose RocksDB because it has excellent performance and efficiency and because we work directly with the team. The MyRocks effort has benefited greatly from being able to collaborate on a daily basis with the RocksDB team. We appreciate that the RocksDB team treats us like a very important customer. They move fast to make RocksDB better for MyRocks.

    How was MyRocks developed?

    MyRocks is developed by engineers from several locations across the globe. The team had the privilege to work with Sergey Petrunia right from the beginning, and he is based in Russia. At Facebook’s Menlo Park campus, Siying Dong leads RocksDB development and Yoshinori Matsunobu leads the collaboration with MySQL infrastructure and data performance teams. From the Seattle office, Herman Lee worked on the initial validation of MyRocks that gave the team the confidence to proceed with MyRocks for our user databases as well as led the MyRocks feature development. In Oregon, Mark Callaghan has been benchmarking all aspects of MyRocks and RocksDB, which has helped developers prioritize performance improvements and feature work. Since the rollout began, the entire database engineering team has been helping to make MyRocks successful by developing high-confidence testing, improving MySQL rollout speed, and addressing other issues. At the same time, the MySQL infrastructure and data performance teams worked to adapt our automation around MyRocks.

    What gave Facebook the confidence to move to MyRocks in production?

    Much of our early testing with the new storage engine was running the Linkbench benchmark used to simulate Facebook’s social graph workload. While these results were promising, we could not rely completely on them to make a decision. In order for MyRocks to be compelling for our infrastructure, MyRocks needed to reduce space and write rates by 50% compared with InnoDB on production workloads.

    Once we supported enough features in MyRocks, we created a MyRocks test replica from a large production InnoDB server. We built a tool to duplicate the read and write traffic from the production InnoDB server to the MyRocks test replica. Compared with compressed InnoDB, we confirmed that MyRocks used half the space and reduced the storage write rate by more than half while providing similar response times for read and write operations.

    We ran tests where we consolidated two InnoDB production servers onto a single MyRocks server and showed that our hardware can handle the double workload. This was the final result we needed to show that MyRocks is capable of reducing our server requirements by half and gave us the confidence that we should switch from InnoDB to MyRocks.

    What approach did Facebook take for deploying MyRocks in production?

    Moving to a new storage engine for MySQL comes with some risk and requires extensive testing and careful planning. Starting the RocksDB deployment with our user databases that store the social graph data may seem counterintuitive. However, the team chose to go this route because of two mutually reinforcing reasons:

    1. Based on benchmark and production experiments, the efficiency gains were significant enough and proportional to the scale of the deployment.
    2. The workload on our user database tier is relatively simple, well known, and something our engineering team could easily reason about as most of it comes from our TAO Cache.

    The benefits we expect as well as further details on the MyRocks project can be found in Yoshinori’s post.

    Both MyRocks and MariaDB are open source projects that are made stronger with community involvement. How will it help MyRocks when MariaDB releases a supported version? How would you like to see the community get more involved?

    We expect MyRocks to get better faster when it is used beyond Facebook. But for that to happen it needs to be in a distribution like MariaDB Server that has great documentation, expert support, a community, and many power users. The community brings more skills, more use cases, and more energy to the MyRocks effort. We look forward to getting bug reports when MyRocks doesn’t perform as expected, feature requests that we might have missed, and pull requests for bug fixes and new features.

    I am most excited about attending conference talks about MyRocks presented by people who don’t work at Facebook. While I think it is a great storage engine, the real test is whether other people find it useful — and hopefully useful enough that they want to talk and write about it.

     

  • Leveraging MariaDB in NEO's Distributed Storage Architecture

    The following is a guest blog post from Sven Franck, marketing manager at Nexedi, one of the largest OSS publishers in Europe, with most solutions based on prior research projects. Headquarters are located in Lille, France, with Nexedi offices serving clients around the world in Germany, Japan and China.

    In this blog post, we’ll show why we chose MariaDB as a core component of the NEO system architecture. NEO is our distributed, redundant and transactional storage system, designed to be an alternative to ZEO and FileStorage.

    The volume of data we’re seeing is growing across the board, with IoT, big data and machine learning becoming more relevant, and projects such as Wendelin gaining industry traction. Because our services often range from network infrastructure to business applications, we’re increasingly putting greater emphasis on efficient data handling, which is making accessibility, availability, distribution and performance a high-priority.

    NEO (short for Nexedi Enterprise Object database) was developed to replace all implementations of ZEO (Zope Enterprise Object database) being used at Nexedi as well as to switch to an architecture that could be scaled automatically, while also providing persistence and resiliency. During development, SQL and MariaDB proved to be a good choice because they met our immediate needs while also being flexible enough to reshape tables as the design of NEO grew. This was key because frequent schema changes in early project phases were expected. MariaDB allowed us to focus on specific features of NEO, while not having to call the low-level InnoDB BTree implementation directly, which reduced development time considerably.

    Leveraging the Power of MariaDB

    NEO uses MariaDB to store low-level object, transaction and location data, and makes these accessible via the “medium” ZOPE layer, which our main software solutions, like ERP5, are running on. Using MariaDB, we can overcome the shortcomings of ZODB's traditional way of maintaining indices, and are now able to easily scale our application architectures automatically and depending on need. MariaDB also provides well-tested durability, which is especially important for the more advanced features of NEO, such as storing data in non-chronological order when lagging nodes are catching up or a node's workload is reduced by moving data to another node. Finally, since most of our implementations run long-term, MariaDB also provides the needed stability in terms of source code availability – it's a proven open-source embeddable solution.

    As part of NEO, MariaDB was included in all of the latest ERP5 implementations as well as our internal system and test infrastructure (30 users, 70 test nodes), currently handling indices of about 97GB (TokuDB) or 133GB (InnoDB backup), which without NEO compression would equate to a FileStorage equivalent of between 250GB and 350GB.

    Scaling MariaDB into the Future

    For the past couple of years, we’ve been increasing our efforts to contribute back to MariaDB, because we believe there should be open-source industry-grade software solutions available, and in terms of database technologies, we consider MariaDB an essential solution. We are proud of what we have accomplished using MariaDB and will continue our support, which includes working on joint research projects and helping to improve the code of MariaDB itself.

    Additional Resources