Twenty Ways To Optimize MySQL for Faster Insert Rate
If you’re following my blog posts, you read that I had to develop my own database because MySQL insert speed was deteriorating over the 50GB mark. That’s why I tried to optimize for faster insert rate.
I was inserting at a rate of 50,000 inserts per second, but it grew worse, to the point of only 6,000 inserts per second, which is well below what I needed.
I was able to optimize the database so that the insert rate held up to around the 100GB mark, but that's it.
I decided to share the techniques I used for optimizations; it may help people who want a faster insert rate into MySQL.
Will all the methods improve your insert performance?
Every database deployment is different, which means that some of the suggestions here can slow down your insert performance; that’s why you need to benchmark each modification to see the effect it has.
Buy a fast server
'The Cloud' has been a hot topic for the past few years: with a couple of clicks you get a server, and with one click you delete it, a very powerful way to manage your infrastructure.
Besides the downside in costs, though, there’s also a downside in performance. Let’s take, for example, DigitalOcean, one of the leading VPS providers. For $40, you get a VPS that has 8GB of RAM, 4 Virtual CPUs, and 160GB SSD.
What is a virtual CPU?
It’s important to know that virtual CPU is not the same as a real CPU; to understand the distinction, we need to know what a VPS is.
A VPS is an isolated virtual environment allocated on a dedicated server running hypervisor software such as Citrix or VMware. It's possible to allocate many VPSs on the same server, with each VPS isolated from the others.
So, as an example, a provider takes a machine with X threads and a given amount of memory and provisions more VPSs than the server could accommodate if all of them used 100% CPU all the time.
The reason is that the host knows that the VPSs will not use all the CPU at the same time.
Let's assume each VPS uses the CPU only 50% of the time, which means the host can allocate twice the number of CPUs. It's therefore possible that all the VPSs will demand more than 50% at the same time, in which case the virtual CPUs will be throttled.
Another option is to throttle the virtual CPU all the time to half or a third of the real CPU, on top of, or instead of, over-provisioning. This allows the host to provision even more VPSs.
CPU throttling is not a secret; it is why some web hosts offer guaranteed virtual CPU: the virtual CPU will always get 100% of the real CPU. Needless to say, the cost is double the usual cost of VPS.
If I use a bare-metal server at Hetzner (a good and cheap host), I'll get either an AMD Ryzen 5 3600 Hexa-Core (12 threads) or an i7-6700 (8 threads), 64GB of RAM, and two 512GB NVMe SSDs (for the sake of simplicity, we'll consider them as one, since you will most likely use the two drives in a RAID 1 mirror for data protection).
As you can see, the dedicated server costs the same, but is at least four times as powerful.
There are drawbacks to take into consideration, however:
| VPS | Dedicated server |
| --- | --- |
| Fast deployment | Slow deployment |
| Built-in backups | Manual backups |
| Easy to restore snapshots | No snapshots |
| Webhost responsible for hardware | You must debug any hardware issues |
One of the fastest ways to improve MySQL performance, in general, is to use bare-metal servers, which is a superb option as long as you can manage them.
Using SSD instead of magnetic drives
It's 2020, and there's no need to use magnetic drives; in all seriousness, don't use them, unless you don't need a high-performance database.
A magnetic drive can do around 150 random access writes per second (IOPS), which will limit the number of possible inserts.
An SSD will do between 4,000 and 100,000 IOPS, depending on the model.
Using RAID 5/6
RAID 5 means having at least three hard drives; one drive's worth of capacity holds parity and the rest hold the data, so each write stores just part of the data on each drive plus the calculated parity.
The parity allows the RAID array to be rebuilt if any single drive crashes, including a drive holding parity.
The advantage is that each write takes less time, since only part of the data is written to each drive; make sure, though, that you use an excellent RAID controller that doesn't slow down because of parity calculations.
In addition, RAID 5 improves MySQL read speed, because data is read from several drives in parallel.
RAID 6 means there are at least two parity hard drives, and this allows for the creation of bigger arrays, for example, 8+2: Eight data and two parity.
On a personal note: I used ZFS, which should be highly reliable, and created a RAID-Z array, which is similar to RAID 5. Then a drive got corrupted; I was so glad I had used RAID and went to recover the array.
I got an error that wasn't even in Google Search, and the data was lost. Fortunately, it was test data, so it was nothing serious. But I dropped ZFS and will not use it again. The fact that I'm not going to use it doesn't mean you shouldn't.
Tweak MySQL memory settings
MySQL default settings are very modest, and the server will not use more than 1GB of RAM. The reason for that is that MySQL comes pre-configured to support web servers on VPS or modest servers.
The assumption is that the users aren’t tech-savvy, and if you need 50,000 inserts per second, you will know how to configure the MySQL server.
Some of the memory tweaks I used (and am still using in other scenarios):
innodb_buffer_pool_size
From the MySQL documentation:
The size in bytes of the buffer pool, the memory area where InnoDB caches table and index data. The default value is 134217728 bytes (128MB).
This is the primary memory setting for MySQL; according to Percona, it should be 80-90% of total server memory, so in the 64GB example, I will set it to 57GB.
Understand that this value is dynamic, which means it will grow to the maximum as needed.
The more memory available to MySQL means that there’s more space for cache and indexes, which reduces disk IO and improves speed.
Make sure you don't put a value higher than the amount of available memory; by accident once, probably a finger slipped, I put in nine times the amount of free memory, and the database started throwing random errors.
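As a sketch, on the 64GB server from the example above, the setting goes in my.cnf (the 57GB figure comes from this article's example, not universal advice):

```ini
# my.cnf -- sizing assumes a dedicated 64GB database server
[mysqld]
# 80-90% of total RAM when MySQL is the only major service
innodb_buffer_pool_size = 57G
```

Since MySQL 5.7, the variable is also dynamic, so it can be changed at runtime with `SET GLOBAL innodb_buffer_pool_size = 61203283968;` (the value is in bytes).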
innodb_buffer_pool_instances
From the MySQL documentation:
Typically, having multiple buffer pool instances is appropriate for systems that allocate multiple gigabytes to the InnoDB buffer pool, with each instance being one gigabyte or larger
This setting allows you to have multiple pools (the total size will still be the maximum specified by innodb_buffer_pool_size), so, for example, if this value is set to 10 and innodb_buffer_pool_size is set to 50GB, MySQL will allocate ten pools of 5GB each.
Having multiple pools allows for better concurrency control and means that each pool is shared by fewer connections and incurs less locking.
Increasing the number of pools is beneficial when multiple connections perform heavy operations.
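Continuing the 50GB example, a minimal my.cnf sketch; unlike the pool size, this setting is not dynamic, so it requires a server restart:

```ini
[mysqld]
innodb_buffer_pool_size      = 50G
# ten pools of 5GB each; only worthwhile when the pool is many GB
innodb_buffer_pool_instances = 10
```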
Other Memory settings
Just to clarify why I didn't mention them: MySQL has more flags for memory settings, but they aren't related to insert speed. They can affect insert performance if the database is used for reading other data while writing.
In that case, any read optimization will allow for more server resources for the inserts.
Tweak commit mechanism
MySQL is ACID compliant (Atomicity, Consistency, Isolation, Durability), which means it has to do certain things in a certain way that can slow down the database.
In some cases, you don’t want ACID and can remove part of it for better performance.
What is a MySQL transaction?
Part of ACID compliance is being able to do a transaction, which means running a set of operations together that either all succeed or all fail.
For example, let’s say we do ten inserts in one transaction, and one of the inserts fails. The database should “cancel” all the other inserts (this is called a rollback) as if none of our inserts (or any other modification) had occurred.
What is a MySQL commit?
A commit is when the database takes the transaction and makes it permanent.
The process of a transaction is:
- Start transaction
- Do some modifications
- Commit transaction
The way MySQL does a commit: it has a transaction log, whereby every transaction goes to a log file and is committed only from that log file.
The transaction log is needed in case of a power outage or any kind of other failure. The database can then resume the transaction from the log file and not lose any data.
MySQL inserts with a transaction
After we do an insert, it goes to the transaction log, and from there it's committed, which means the data is written twice: once to the transaction log and once to the actual table.
In specific scenarios where we care more about data integrity, that's a good thing, but if we upload from a file and can always re-upload in case something happens, we are losing speed.
Changing the commit mechanism
The flag innodb_flush_log_at_trx_commit controls the way transactions are flushed to the hard drive. There are three possible settings, each with its pros and cons.
innodb_flush_log_at_trx_commit = 1 is the default MySQL value and is required for full ACID compliance: MySQL writes the transaction to the log file and flushes it to the disk at every commit.
With innodb_flush_log_at_trx_commit = 0, MySQL writes the transaction to the log buffer and writes and flushes the log to disk at a specific interval (once per second), so a crash can lose up to about a second of transactions.
With innodb_flush_log_at_trx_commit = 2, MySQL writes the transaction to the OS buffers at each commit, and from the buffers it flushes to the disk once per second, which is usually the fastest durable-ish option.
The flag innodb_flush_log_at_timeout allows you to change the flush interval from one second to another value, and on some setups, changing this value will benefit performance. I believe it has to do with systems on magnetic drives with many reads.
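A minimal sketch of the trade-off; both variables are dynamic, so no restart is needed (the interval of 3 seconds is just an illustrative value):

```sql
-- Flush the log to disk once per interval instead of at every commit;
-- a crash can lose roughly the last interval's worth of transactions.
SET GLOBAL innodb_flush_log_at_trx_commit = 0;

-- Optionally stretch the interval from the default 1 second (MySQL 5.6+).
SET GLOBAL innodb_flush_log_at_timeout = 3;
```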
Using precalculated primary key for string
Let’s say we have a table of Hosts. Naturally, we will want to use the host as the primary key, which makes perfect sense.
The problem with that approach, though, is that we have to store the full string in every table we want to insert it into: a host can be 4 bytes long, or it can be 128 bytes long. Inserting the full-length string will, obviously, impact performance and storage.
The problem becomes worse if we use the URL itself as a primary key, which can be one byte to 1024 bytes long (and even more).
The solution is to use a hashed primary key. Instead of using the actual string value, use a hash. Remember that the hash storage size should be smaller than the average size of the string you want to use; otherwise, it doesn’t make sense, which means SHA1 or SHA256 is not a good choice.
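As an illustration (the table and column names are hypothetical), a 16-byte MD5 hash keeps the key fixed-width and smaller than most hosts or URLs; MD5 is fine for keying even though it's unsuitable for security:

```sql
CREATE TABLE hosts (
    -- fixed 16 bytes instead of a variable-length string key
    host_hash BINARY(16)   NOT NULL PRIMARY KEY,
    host      VARCHAR(255) NOT NULL
);

INSERT INTO hosts (host_hash, host)
VALUES (UNHEX(MD5('example.com')), 'example.com');
```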
Changing the flush method
The flag innodb_flush_method specifies how MySQL flushes data to disk; with the default (fsync on Linux), all the data is also cached in the OS IO cache.
Setting it to O_DIRECT tells MySQL to write the data directly without using the OS IO cache, and this might speed up the insert rate.
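A my.cnf sketch; benchmark it, since on some workloads the default wins:

```ini
[mysqld]
# write straight to disk, skipping the OS page cache,
# so data isn't buffered twice (requires a restart)
innodb_flush_method = O_DIRECT
```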
Using file system compression
Some filesystems support compression (like ZFS), which means that storing MySQL data on compressed partitions may speed the insert rate. The reason is that if the data compresses well, there will be less data to write, which can speed up the insert rate.
Do you need that index?
Inserting to a table that has an index will degrade performance because MySQL has to calculate the index on every insert. In case there are multiple indexes, they will impact insert performance even more.
Check every index if it’s needed, and try to use as few as possible.
BTW, when I considered using custom solutions that promised consistent insert rate, they required me to have only a primary key without indexes, which was a no-go for me.
Dropping the index
In case you have one or more indexes on the table (the primary key is not considered an index for this advice), you have a bulk insert, and you know that no one will try to read the table you insert into, it may be faster to drop all the indexes, do the insert, and add the indexes back once it's complete.
This solution is scenario dependent. If it’s possible to read from the table while inserting, this is not a viable solution.
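A sketch of the pattern, with a hypothetical `links` table and index name:

```sql
-- drop the secondary index before the bulk load
ALTER TABLE links DROP INDEX idx_host;

-- ... run the bulk insert here ...

-- rebuild the index once, after all the rows are in
ALTER TABLE links ADD INDEX idx_host (host);
```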
Using table partitions
MySQL supports table partitions, which means the table is split into X mini tables (the DBA controls X); the one big table is actually divided into many small ones.
Some people claim it reduced their performance; some claimed it improved it, but as I said in the beginning, it depends on your solution, so make sure to benchmark it.
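For example, a hypothetical links table hashed into ten partitions by its primary key:

```sql
CREATE TABLE links (
    url_hash BINARY(16) NOT NULL,
    url      TEXT       NOT NULL,
    PRIMARY KEY (url_hash)
)
-- rows are spread over ten physical sub-tables by key hash
PARTITION BY KEY (url_hash)
PARTITIONS 10;
```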
Placing a table on a different drive
It’s possible to place a table on a different drive, whether you use multiple RAID 5/6 or simply standalone drives. Placing a table on a different drive means it doesn’t share the hard drive performance and bottlenecks with tables stored on the main drive.
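MySQL supports this through the DATA DIRECTORY table option (a sketch with a hypothetical mount point; it relies on innodb_file_per_table, which is on by default since 5.6, and on MySQL 8.0 the directory must also be listed in innodb_directories):

```sql
CREATE TABLE big_log (
    id      BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    payload TEXT
)
-- the table's .ibd file lives on the second drive instead of the datadir
DATA DIRECTORY = '/mnt/drive2/mysql';
```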
Using bulk insert
If you have a bunch of data (for example, when inserting from a file), you can insert the data one record at a time:
INSERT INTO table_name VALUES (...);
This method is inherently slow; in one database, I had the wrong memory setting and had to export data using the flag --skip-extended-insert, which creates the dump file with a single insert per line.
Needless to say, the import was very slow, and after 24 hours it was still inserting, so I stopped it, did a regular export, and loaded the data, which was then using bulk inserts.
The alternative is to insert multiple rows in one statement:
INSERT INTO table_name VALUES (...), (...), (...);
The limitation is the value of max_allowed_packet, which limits the maximum size of a single statement.
You should experiment to find the best number of values per statement; I settled on 400 rows per insert, as I didn't see any improvement beyond that point.
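A sketch of the multi-row form, with an illustrative table and values; one statement carries many rows, so there are far fewer round trips and commits:

```sql
INSERT INTO hosts (host_hash, host) VALUES
    (UNHEX(MD5('example.com')), 'example.com'),
    (UNHEX(MD5('example.org')), 'example.org'),
    (UNHEX(MD5('example.net')), 'example.net');
```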
Using replace into or insert ignore
Trying to insert a row with an existing primary key will cause an error, which ordinarily requires you to perform a select before doing the actual insert; that extra select, however, slows things down further when you want to do a bulk insert.
MySQL Replace into
REPLACE INTO will overwrite the existing row in case the primary key already exists; this removes the need to do a select before the insert.
MySQL Insert ignore
INSERT IGNORE will not insert the row in case the primary key already exists; this removes the need to do a select before the insert.
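Side by side, on a hypothetical hosts table (note that REPLACE INTO is internally a delete plus an insert, so INSERT IGNORE is usually the cheaper of the two):

```sql
-- overwrite the existing row when the primary key collides
REPLACE INTO hosts (host_hash, host)
VALUES (UNHEX(MD5('example.com')), 'example.com');

-- silently skip the row when the primary key collides
INSERT IGNORE INTO hosts (host_hash, host)
VALUES (UNHEX(MD5('example.com')), 'example.com');
```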
Inserting from multiple threads
In case the data you insert does not rely on previous data, it’s possible to insert the data from multiple threads, and this may allow for faster inserts.
Some things to watch for are deadlocks. In my case, one of the apps could crash because of a deadlock abort, so I added a handler to retry the insert in that situation.
Since I used PHP to insert data into MySQL, I ran several instances of my application in parallel, as PHP support for multi-threading is not optimal.
Using load from file
Using LOAD DATA INFILE allows you to upload data from a formatted file; you simply specify which table to load into and the file's format, such as CSV.
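A sketch for a hypothetical CSV file (the server's secure_file_priv setting restricts where the file may live):

```sql
LOAD DATA INFILE '/tmp/hosts.csv'
INTO TABLE hosts
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';
```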
Using application-level cache
The data I inserted had many lookups. For example, when I inserted web links, I had a table for hosts and a table for URL prefixes, which means the same hosts could recur many times.
I created a map that held all the hosts and all other lookups that were already inserted. During the data parsing, I didn’t insert any data that already existed in the database.
Adjusting char/varchar collation
When working with strings, check each string to determine if you need it to be Unicode or ASCII.
An ASCII character is one byte, so a 255-character string takes 255 bytes.
Unicode is needed to support any language that is not English. In MySQL, a utf8 (utf8mb3) character takes up to 3 bytes and a utf8mb4 character up to 4 bytes; UTF-8 is variable-length, but MySQL reserves the maximum width for some internal operations, so a Unicode column can cost more than an ASCII one even when the text is in English.
Therefore, inserting into columns whose character set allows 2, 3, or 4 bytes per character will take longer.
In my case, URLs and hash primary keys are ASCII only, so I changed the collation accordingly.
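As a sketch, a hypothetical links table holding ASCII-only URLs can be converted like this (back up first; CONVERT TO rewrites every string column in the table):

```sql
ALTER TABLE links
    CONVERT TO CHARACTER SET ascii COLLATE ascii_general_ci;
```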
Using MySQL cluster
MySQL supports clustering, which means the database is composed of multiple servers (each server is called a node) and allows for a faster insert rate. The downside, though, is that it's harder to manage and costs more money.
I calculated that for my needs I'd have to pay between $10,000 and $30,000 per month just for hosting 10TB of data in a way that would also support the insert speed I need.
Using a custom engine
MySQL uses InnoDB as the default engine. There are more engines on the market, for example, TokuDB. I don’t have experience with it, but it’s possible that it may allow for better insert performance.
Using a custom solution
Unfortunately, with all the optimizations I discussed, I had to create my own solution, a custom database tailored just for my needs, which can do 300,000 inserts per second without degradation.
I know there are several custom solutions besides MySQL, but I didn't test any of them because I preferred to implement my own rather than use a third-party product with limited support.