SEO-Explorer

Why would I want to build a database?

If you know a thing or two about databases, the mere thought of writing your own should set off alarm bells. Had you asked me 3 years ago, I would probably have told you to drop the idea: why write a database when there are so many good, free, open-source databases on the market? You would have to be crazy.

I promise you that by the end of this article, not only will you agree with my decision, but you’ll also want to learn more about how I built the database.

Let’s start at the beginning

In early 2018 I started working on my SEO analytics SaaS (the one you are visiting right now). I needed to manage vast amounts of backlink data, so I did some research to find the best database for my needs.

Should I use SQL or NoSQL? I found a lot of hype and buzz in the industry, but when you dig deeper into the technical documentation, you discover that what you need doesn’t fit the latest shiny technology.

So I decided to stay with regular SQL and read up on which was better: MySQL or PostgreSQL, both very good open-source databases. Going with a paid software solution was never an option.

Database limits

I started reading about the limitations of each database and came across a number of posts claiming that after 200 GB of data, the performance of both databases starts to degrade. This was hard for me to believe: both databases have been around for over 20 years, and 200 GB is a relatively small amount of data, so degradation at that size seemed unimaginable.

Hitting the wall

I decided to go with MySQL because I already knew it. I designed my schema and started inserting the data.

It went well: I was able to insert an entire file in 15 seconds. Each file contained information about 50,000 pages, including the title and outlinks of each page.

Storing the data for a single page takes numerous inserts and selects, and I was able to reach 50,000 inserts/second. WOW.

Everything looked great, BUT as the database grew, its performance got worse. When it hit 100 GB, each file took 5 minutes to insert and throughput dropped to 6,000 inserts/second.
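Throughput numbers like these come from timing each file load and dividing the statements executed by the elapsed time. The sketch below is a hypothetical harness, not the code I used: it swaps in SQLite from Python’s standard library as a stand-in for MySQL, invents a simple `pages`/`links` schema, and loads synthetic “files” one after another. Each page costs one SELECT (to check whether its URL is already stored), one INSERT if the page is new, and one INSERT per outlink, mirroring the mix of inserts and selects described above. At this toy scale the whole database sits in memory, so it will not reproduce the slowdown at 100 GB; it only shows how to track the rate as the tables grow.

```python
import sqlite3
import time


def create_schema(conn):
    """Hypothetical schema: one row per page, one row per outlink."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (id INTEGER PRIMARY KEY, url TEXT UNIQUE, title TEXT)"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS links (src_id INTEGER, dst_url TEXT)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_links_dst ON links (dst_url)")


def make_file(file_no, pages_per_file=1000, links_per_page=5):
    """Build one synthetic 'file' as a list of (url, title, outlinks) tuples."""
    pages = []
    for i in range(pages_per_file):
        url = f"https://example.com/{file_no}/{i}"
        outlinks = [f"https://example.org/{file_no}/{i}/{j}" for j in range(links_per_page)]
        pages.append((url, f"Title {file_no}-{i}", outlinks))
    return pages


def load_file(conn, pages):
    """Load one file in a single transaction; return the number of statements run."""
    statements = 0
    with conn:  # commits on success, rolls back if an exception escapes
        for url, title, outlinks in pages:
            # SELECT first so a page already seen in an earlier file is not stored twice.
            row = conn.execute("SELECT id FROM pages WHERE url = ?", (url,)).fetchone()
            statements += 1
            if row is None:
                cur = conn.execute("INSERT INTO pages (url, title) VALUES (?, ?)", (url, title))
                page_id = cur.lastrowid
                statements += 1
            else:
                page_id = row[0]
            for target in outlinks:
                conn.execute("INSERT INTO links (src_id, dst_url) VALUES (?, ?)", (page_id, target))
                statements += 1
    return statements


def benchmark(num_files=5, db_path=":memory:", **file_kwargs):
    """Load files one after another and return statements/second for each file."""
    conn = sqlite3.connect(db_path)
    create_schema(conn)
    rates = []
    try:
        for file_no in range(num_files):
            pages = make_file(file_no, **file_kwargs)
            start = time.perf_counter()
            count = load_file(conn, pages)
            rates.append(count / (time.perf_counter() - start))
    finally:
        conn.close()
    return rates


if __name__ == "__main__":
    for file_no, rate in enumerate(benchmark()):
        print(f"file {file_no}: {rate:,.0f} statements/second")
```

Passing a path on disk as `db_path` and raising `num_files` is one way to watch the per-file rate change as the database grows.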

Denial

I researched and tried every database optimization I could find, and some of them did push back the point of performance degradation by 50 GB. Not really anything to celebrate.

Since I knew my database would end up at 10 terabytes, this was not a solution.

Hope with PostgreSQL?

Maybe I had been wrong to choose MySQL? Before switching, I decided to read up on PostgreSQL, even at the risk of wasting a bit of time. What I learned was that the behavior causing the problems in MySQL is the same in PostgreSQL.

I was convinced that some solution to this problem existed. I did find an add-on (in beta) that requires installing many open-source frameworks to work. This add-on allows for linear-time inserts into the database, but it places many restrictions on the data you can insert, which meant I couldn’t use it. Besides, I’m hesitant to use beta hacks.

So... what’s the solution?

Obviously there must be a solution, because there are services that provide SEO backlink data. So how do they do it?

The solution: a “database cluster”. So I researched clusters to figure out which one I could use, knowing that I could get a single node to work with a maximum of 200 GB.

I looked into various cloud providers, including Amazon, DigitalOcean and Google. Deploying such a cluster with a total capacity of 10 TB would cost 10-30 thousand dollars per month. Let me repeat that number in a different format: $10,000-$30,000 per month.

WOW, that’s a huge amount for a company with no MVP and no clients. It was at this point that I understood how people “burn” through their VC money.

Someone else might raise capital and use it to pay, on average, $20k/month for a database. Unless they generated sales fast, the money would run out.
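The arithmetic behind these quotes is worth spelling out before committing to a cluster. The sketch below is a back-of-the-envelope calculation under stated assumptions: the cluster is sized purely by capacity (10 TB split across nodes that each handle at most 200 GB) and holds no replicas, which real clusters usually add and which would raise the node count. The per-node and yearly figures are derived from the monthly totals above, not quoted from any cloud provider.

```python
import math


def nodes_needed(total_gb, max_gb_per_node):
    """Smallest number of nodes whose combined capacity covers total_gb (no replicas)."""
    return math.ceil(total_gb / max_gb_per_node)


def implied_cost_per_node(monthly_total, nodes):
    """Monthly cost per node implied by a quoted cluster total."""
    return monthly_total / nodes


TOTAL_GB = 10_000      # 10 TB target, in decimal units
NODE_LIMIT_GB = 200    # practical single-node ceiling reported above
MONTHLY_QUOTES = (10_000, 20_000, 30_000)  # low, average and high monthly totals in dollars

if __name__ == "__main__":
    nodes = nodes_needed(TOTAL_GB, NODE_LIMIT_GB)
    print(f"{nodes} nodes")
    for monthly in MONTHLY_QUOTES:
        per_node = implied_cost_per_node(monthly, nodes)
        print(f"${monthly:,}/month -> ${per_node:,.0f} per node, ${monthly * 12:,} per year")
```

With these inputs it reports 50 nodes, roughly $200-$600 per node per month, and $240,000 per year at the $20k monthly average.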

Friends to the rescue

I talked to two friends, and each gave me their own perspective. One manages petabytes of data; the other is an experienced entrepreneur. Both came to the same conclusion (which I had reached as well, but preferred not to act on): I should develop my own file-based database.

I knew that such a database would take at least a year to develop and test, a whole year out of the market, so I decided to hold off on it.

Final attempt

In my final attempt with MySQL, I was able to squeeze out a bit more performance, but it didn’t solve my problem. At this point I understood that I had no choice but to develop my own database (or raise capital, which I hadn’t planned on doing).

In retrospect, spending 3 months tweaking the existing database solutions in every possible way gave me an advantage: I knew I had no choice but to go ahead with my own. There was no going back to try the other solutions again, a temptation that might have crept in once I got to the hard parts.

The database

In upcoming posts, I will go a bit deeper into how the database works and why I’m not selling it as a standalone solution.

The takeaway

I think anyone who plans to store massive amounts of URL backlinks should read this post. I’m not advocating doing what I did and writing a database; I’m advocating knowing the prices up front, so you aren’t surprised later, after many dollars have been wasted. All the more so when SEO APIs are available for a fraction of the cost.
