How do I choose a data storage solution in 2020?

Choosing a data storage solution in 2020 is complicated. There are many factors to take into consideration: not just the state of your business now, but also how it will look a few years down the road.

The way you handle data storage in your application is pivotal in providing a positive user experience. Not only should you store data in a way that is optimized for your application, but the data should also be secure so that unauthorized intruders do not get their hands on it.

A datastore is a software solution where you store and organize all the data you collect through your app, while a database management system (DBMS) is the software for managing those datastores.

Your data strategy

The important thing is to understand the needs of your application, from the structure and size of your data to the read/write speeds you need. It pays to determine a data strategy that matches your business before considering different solutions. Understanding and modelling the data that you are going to store is equally important for defining a data strategy and for evaluating data storage solutions.

Applications have to deal with data in a variety of formats, so selecting the right database includes picking the storage that fits the format. If you select the wrong data structure for persisting your data, your application will require more workarounds and may not scale well as a result.

SQL or NoSQL data storage?

When it comes to choosing a solution to persistently store data, one of the biggest challenges is picking between a SQL (relational) and a NoSQL (non-relational) database. Both can perform well, but there are certain key differences you must keep in mind.

SQL databases

A relational database is ideal for storing structured data (zip codes, credit card numbers, dates, ID numbers). SQL databases are a mature technology: they are well documented, boast great support, and work well with most modern frameworks and libraries. The best-known examples of SQL databases are PostgreSQL, MySQL and Microsoft SQL Server.
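
As a minimal sketch of what "structured" means in practice, the snippet below uses Python's standard-library sqlite3 module as a stand-in for any relational database. The table name, the columns and the sample rows are assumptions made up for illustration; they are not taken from a real application.

```python
import sqlite3


def build_customer_db() -> sqlite3.Connection:
    """Create an in-memory database with a fixed schema and sample rows."""
    conn = sqlite3.connect(":memory:")
    # Every row must fit these columns; NOT NULL rejects incomplete records.
    conn.execute(
        """
        CREATE TABLE customers (
            id        INTEGER PRIMARY KEY,
            name      TEXT NOT NULL,
            zip_code  TEXT NOT NULL,
            signed_up TEXT NOT NULL
        )
        """
    )
    conn.executemany(
        "INSERT INTO customers (name, zip_code, signed_up) VALUES (?, ?, ?)",
        [
            ("Alice", "10001", "2020-01-15"),
            ("Bob", "94105", "2020-02-03"),
            ("Carol", "10001", "2020-03-22"),
        ],
    )
    conn.commit()
    return conn


def customers_in_zip(conn: sqlite3.Connection, zip_code: str) -> list:
    """Return the names of all customers in one zip code, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM customers WHERE zip_code = ? ORDER BY name",
        (zip_code,),
    )
    return [name for (name,) in rows]
```

Calling customers_in_zip(build_customer_db(), "10001") returns the two customers stored with that zip code, while an insert that leaves out a required column is rejected by the database itself rather than by application code.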

Recommended providers

  • MySQL is a free-to-use, open-source database that facilitates effective management of databases. It is a stable, reliable and powerful solution with advanced features.
  • PostgreSQL is similar to MySQL, but it offers more room for customization, which you have to be able to configure properly. It is a very stable database and is often considered a strong choice for large data sets, which can be a deciding factor for you when choosing.
  • SQL Server is a popular Relational Database Management System (RDBMS) developed by Microsoft, with enterprise-grade performance and security.

NoSQL databases

NoSQL databases, also called non-relational or distributed databases, serve as an alternative to relational databases. They can store and process unstructured data (data from social media, photos, MP3 files, etc.), offering developers more flexibility and greater scalability.

NoSQL databases are very flexible, and allow the structure of the data to be changed on the fly without affecting existing data. But they are less mature than SQL databases, and the NoSQL community isn’t as well defined.
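
To illustrate changing the shape of the data on the fly, here is a small sketch in which a plain Python list of JSON strings stands in for a document collection. The helper names insert_document and find are hypothetical and do not belong to any real NoSQL client library.

```python
import json

# A plain list stands in for a document collection (in-memory, hypothetical).
collection = []


def insert_document(doc: dict) -> None:
    """Store a document as JSON; no schema is enforced."""
    collection.append(json.dumps(doc))


def find(field: str, value) -> list:
    """Return every stored document whose `field` equals `value`."""
    docs = (json.loads(raw) for raw in collection)
    return [doc for doc in docs if doc.get(field) == value]


# The original document shape.
insert_document({"user": "alice", "text": "Hello"})
# Later the application starts storing tags; the older document is untouched.
insert_document({"user": "alice", "text": "Photo upload", "tags": ["photo"]})

posts = find("user", "alice")
```

Both documents come back from the same query even though only the second one has a tags field. The flip side is that the application has to cope with documents that lack a field, for example by reading it with doc.get("tags", []).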

Recommended providers

  • MongoDB is an open-source, lightweight, schema-less database with a lot of valuable features: flexible document schemas, code-native data access, change-friendly design, powerful querying and analytics, and easy horizontal scale-out.
  • Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads.
  • Azure Cosmos DB is designed to allow customers to elastically (and independently) scale throughput and storage across any number of geographical regions. Microsoft describes it as the first globally distributed database service to offer comprehensive service level agreements encompassing throughput, latency, availability, and consistency.

Managed or self hosted?

If you have a cloud-hosted application, you can choose between fully managed services and self-hosting your data storage solutions. At first glance, it seems much more convenient to have all underlying services managed by a service provider, but with this approach, you risk losing full control over your application. On the other hand, greater control equals greater responsibility.

On premise vs cloud hosted

Where you host your data storage solutions is a key question in 2020. In recent years, a growing number of solution providers have looked to migrate their applications to cloud hosting platforms. The likes of Amazon Web Services, Microsoft Azure and Google Cloud Platform have all experienced huge growth and offer a wide array of solutions.

Scalability

Perhaps one of the biggest advantages of cloud-based solutions is the ability to scale your datastore and add resources automatically to meet spikes in your application traffic. Cloud providers will be able to use their platform tools to help you cope with growth, whereas with on-premise solutions you may need to undergo a lengthy procurement process to secure the same capabilities.

Cost

While cloud providers seem to be a lot cheaper up front, you need to compare the total cost of ownership with that of on-premise hosting rather than just the initial expenses. However, even if the costs turn out to be broadly similar overall, companies may be able to benefit from the cloud-based model, where they only have to deal with a single, ongoing operational expense.
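
A rough way to make that comparison concrete is to add the up-front cost to the recurring cost over the period you plan for. The sketch below does only that; the function name and every figure in it are hypothetical placeholders, not real prices, and a real comparison would also include staff, licences, power and migration costs.

```python
def total_cost_of_ownership(upfront: float, monthly: float, months: int) -> float:
    """Up-front spend plus recurring spend over the planning period."""
    return upfront + monthly * months


# Hypothetical figures for a three-year planning period (not real prices).
PLANNING_MONTHS = 36
on_premise = total_cost_of_ownership(upfront=30_000, monthly=500, months=PLANNING_MONTHS)
cloud = total_cost_of_ownership(upfront=0, monthly=1_300, months=PLANNING_MONTHS)

print(f"on-premise: {on_premise:,.0f}")
print(f"cloud:      {cloud:,.0f}")
```

With these made-up inputs the two totals land close together (48,000 and 46,800), even though the cloud option starts with no up-front cost at all.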

Security

86% of companies perceive cloud-based database storage as insecure and fear that their data can be easily compromised. But there has yet to be a significant data breach that can be traced back to the cloud provider itself.

Compliance

Making sure that you comply with data protection laws in your region is more important than ever. With the likes of GDPR in Europe promising huge fines for any business that fails to take care of its sensitive data, the costs of overlooking compliance are high.

You may think that hosting your data in your own datacenter makes it easier to comply with this new reality, because you control the whole data storage process. But in the end, it could become very costly to build a secure and compliant environment for your data from scratch.

Cloud hosting providers, however, benefit from their huge scale and can provide industry-leading data protection. Most of them recognize the importance of data security and of being fully compliant with the rules and regulations in the regions they operate in.

Disaster recovery

An effective backup and disaster recovery (DR) plan is another essential part of any database strategy. If you're looking at cloud options, prospective providers should be able to offer a detailed explanation of what systems they have in place for this and what your rights are should their systems suffer a failure. All of this should also be laid out as part of your service level agreement.

Speed & reliability

All good cloud providers will make significant uptime guarantees, usually somewhere in the region of 99.9%. But this still leaves them with some leeway for downtime (approx. 10 minutes a week) that might be unacceptable for your business applications, so you need to take that into consideration.
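
The downtime figure follows directly from the uptime percentage. The short sketch below shows the arithmetic; the function name is an assumption chosen for illustration, and an actual service level agreement may measure availability over a different period, such as a month.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes


def allowed_downtime_minutes(uptime_percent: float,
                             period_minutes: int = MINUTES_PER_WEEK) -> float:
    """Minutes of downtime that still satisfy the given uptime percentage."""
    return period_minutes * (1 - uptime_percent / 100)


for uptime in (99.9, 99.99):
    minutes = allowed_downtime_minutes(uptime)
    print(f"{uptime}% uptime allows about {minutes:.2f} minutes of downtime per week")
```

For 99.9% this gives roughly 10 minutes a week, and each additional nine in the guarantee divides the allowance by ten.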

Latency is also a factor: if you are hosting the database away from your main applications, there will be some added latency between datacenters.

Recommended providers

Microsoft Azure

Azure SQL Database is a fully managed platform as a service (PaaS) database engine that handles most of the database management functions such as upgrading, patching, backups, and monitoring without user involvement. Azure SQL Database is always running on the latest stable version of the SQL Server database engine and patched OS with 99.99% availability.

With Azure SQL Database, you can create a highly available and high-performance data storage layer for the applications and solutions in Azure. SQL Database can be the right choice for a variety of modern cloud applications.

Amazon Web Services

Choose from 15 purpose-built database engines including relational, key-value, document, in-memory, graph, time series, and ledger databases. AWS’s portfolio of purpose-built databases supports diverse data models and allows you to build use case driven, highly scalable, distributed applications. By picking the best database to solve a specific problem or a group of problems, you can break away from restrictive one-size-fits-all monolithic databases and focus on building applications to meet the needs of your business.

With AWS databases, you don’t need to worry about database management tasks such as server provisioning, patching, setup, configuration, backups, or recovery. You can scale your database's compute and storage resources easily, often with no downtime. Because purpose-built databases are optimized for the data model you need, your applications can scale and perform better at 1/10 the cost versus commercial databases.

AWS databases are built for business-critical, enterprise workloads, offering high availability, reliability, and security. These databases support multi-region, multi-master replication, and provide full oversight of your data with multiple levels of security.

Google Cloud Platform

For over 15 years, Google has been building one of the fastest, most powerful, and highest-quality cloud infrastructures on the planet. Internally, Google uses this infrastructure for several high-traffic and global-scale services, including Gmail, Maps, YouTube, and Search. Because of the size and scale of these services, Google has put a lot of work into optimizing its infrastructure and creating a suite of tools and services to manage it effectively. Google Cloud puts this infrastructure and these management resources at your fingertips.

Google provides a wide array of data storage solutions, both SQL and NoSQL:

  • Cloud SQL
  • Cloud Firestore
  • Cloud Bigtable
  • Cloud Memorystore
  • Firebase Realtime Database

What solution do I choose then?

This is not easy, and the basic answer is that it depends on your application. But with the knowledge you gained by reading this short article (and many more floating around the Internet), there are a few steps you can go through to be better prepared to make this choice for your application:

  • You need to understand your data models, the amount of data you need to store/retrieve, and the speed/scaling requirements.
  • Model your data to determine if a SQL or NoSQL database fits your model. Keep in mind you can mix and match solutions if you need to.
  • During the modeling process, consider things such as the ratio of reads-to-writes, along with the throughput you will require to satisfy reads and writes.
  • If you are using multiple datastore solutions, use a primary database to store and retrieve canonical data, with one or more additional databases to support additional features such as searching, data pipeline processing, and caching.
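
The last step, a primary database for canonical data plus a secondary store for caching, is often implemented with the cache-aside pattern. The following is a minimal sketch of that pattern under stated assumptions: sqlite3 stands in for the primary database, a plain dict stands in for a dedicated cache service, and the class and method names are hypothetical.

```python
import sqlite3


class CachedUserStore:
    """Cache-aside: the SQL database is canonical, the dict is only a cache."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )
        self.cache = {}    # stand-in for a dedicated cache service
        self.db_reads = 0  # counts how often a read reaches the primary database

    def save(self, user_id: int, name: str) -> None:
        # Write to the primary database first, then drop any stale cache entry.
        self.db.execute(
            "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
            (user_id, name),
        )
        self.db.commit()
        self.cache.pop(user_id, None)

    def get(self, user_id: int):
        # Serve from the cache when possible.
        if user_id in self.cache:
            return self.cache[user_id]
        # Cache miss: read the canonical value and remember it.
        self.db_reads += 1
        row = self.db.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is None:
            return None
        self.cache[user_id] = row[0]
        return row[0]
```

Repeated reads of the same user reach the primary database only once, which is where a high ratio of reads to writes pays off. Every write invalidates the cached entry, so the cache never becomes the source of truth.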