Announcing PhotonIQ: The New AI CDN For Accelerating Apps, APIs, Websites and Services

Advantages and disadvantages of NoSQL

Chapter 2 of Distributed Data

Databases come in many shapes and sizes, but there are two primary flavors to choose from: the traditionally used relational model (which stores structured data in related tables) and the relatively recent non-relational model (which stores semi-structured data in files). If you’ve been looking for a database model that allows storing unrelated data, NoSQL just might be your ticket. This chapter will focus on the pros and cons of using NoSQL so you can decide whether it is the right fit for your organization.

When building applications or opting for Macrometa's ready-to-go industry solutions, it is important to have a deep understanding of the underlying technological possibilities. This knowledge serves a dual purpose: assisting you in making informed decisions when selecting technologies and empowering you to effectively scale and accommodate new business use cases once the technology is implemented. Databases play a pivotal role in extracting value from data, propelling your business forward and enabling strategic growth.

Before diving in, let’s get some basics out of the way. Short for "Not Only SQL," NoSQL is a database management system for managing large volumes of distributed data. Typically, these systems avoid enforcing any fixed schema over the data being managed. NoSQL was introduced as an alternative to traditional SQL databases to solve several key challenges; speed, scalability, and availability.

The advent of big data found tech giants like Amazon, Google, and Facebook struggling to manage and store unprecedented amounts of data. They were forced to ask and then address some profound questions:

  • How can we match our customers' speed when storing, retrieving, or reorganizing data?
  • How can we reduce the time spent maintaining massive databases?
  • How can we lower the cost of hardware?
  • How can we leverage clusters of cheap, commodity hardware on the cloud to replace individual, leviathan database servers?

NoSQL databases were the industry's answer to many of these questions. If your organization has been asking the same questions, NoSQL may be worth considering.

Summary of key concepts

Traditional database models use a relational approach to storing data. Also sometimes referred to as "SQL databases" — for Structured Query Language, the programming language used to work with them — these models excel at storing data as efficiently as possible. They are called “relational” because data gets stored in tables that have relationships between them.

SQL-based databases have many advantages but also some drawbacks. They require a lot of resources to ensure compliance with Atomicity, Consistency, Isolation, and Durability (ACID) principles. They are not built to distribute tables across multiple nodes, so scaling is another challenge. The only way to scale SQL-based databases is by adding more processing power to the server (vertical scaling). As mentioned, relational queries use SQL, a standard, safe and versatile language that has been around for 40+ years and allows for complex joins. However, this language requires a good understanding of how data is structured.

NoSQL was purpose-built to bypass many of these limitations. It leaves the task of enforcing referential integrity to developers, using a less restrictive consistency model called BASE, which stands for Basically Available Soft-state Eventually-consistent. This model does not enforce any schema on data. NoSQL leverages cluster management software and distributes data across multiple smaller machines. New machines can be added to the cluster as needed, allowing the databases to grow via horizontal scaling. NoSQL lacks any standard query language and relies on easy-to-use, database-specific APIs.

The following table summarizes the differences between SQL and NoSQL.

SQLNoSQL
AvailabilitySingle point of failureRedundant and highly available
ConsistencyStrongly consistentEventually consistent
ScalingScales verticallyScales horizontally
SchemaSchema is fixedSchema is flexible
ShardingManually codedBuilt-in
LanguageUses SQLUses custom APIs
Transaction ModelACID modelTypically uses BASE model, although some now offer ACID as well

Pros of NoSQL

Now that we have some understanding of how NoSQL databases differ from SQL-based systems, let’s dive a bit deeper into the Pros of NoSQL.

Horizontally Scalable and Cheap

While SQL data is stored as rows grouped inside tables, NoSQL data is stored as individual, self-contained units inside various structures like documents, dictionaries, columns, and graphs. This allows NoSQL data to reside on multiple machines. As data quantities increase, NoSQL databases can grow by adding more smaller devices (horizontal scaling), unlike their SQL counterparts, which require expensive hardware upgrades (vertical scaling). There is only a slight overhead to adding more devices to NoSQL databases because they use tools like Zookeeper to manage all the nodes. This makes the NoSQL database well suited for big-data scenarios like storing continuous data from IoT sensors.

Distributed data diagram

Vertical vs. Horizontal scaling

High Performance: Millions of Transactions per Second

NoSQL databases aren’t saddled with the responsibility of ensuring data integrity, which means they don’t have to validate the data's integrity when it enters the system. This is especially useful in scenarios like logging, where write speed is a key metric, and data integrity checks are not required.

Flexibility to Change Data Types at Any Time

NoSQL databases can ingest unstructured and semi-structured data at scale, unlike their SQL counterparts, which define data types for every column before data can enter the system using constraints. (Note that some SQL database providers like Postgres are now introducing JSON datatype to allow the same flexibility.)

In the example relational database below, any records added to the employee_login_history table must have an employee_id in the employee table; otherwise, the system will prevent the row from entering the employee_login_history table. Not having similar checks in NoSQL means developers can save a lot of time not having to define data types and relationships for every single column. It also means that if the structure of incoming data changes, it can go into the same database without developers getting involved.

CREATE TABLE [dbo].[employee]
	(
	[employee_id] [int] IDENTITY(1,1) NOT NULL,
	[first_nm] [nvarchar](50) NOT NULL,
	[last_nm] [nvarchar](50) NULL,
	[start_dt] [date] NOT NULL,
	[termination_dt] [date] NULL,
	CONSTRAINT [PK_employee] PRIMARY KEY CLUSTERED 
		(
		[employee_id] ASC
		)	
	) ON [PRIMARY]
GO
CREATE TABLE [dbo].[employee_login_history]
	(
	[login_history_id] [int] IDENTITY(1,1) NOT NULL,
	[employee_id] [int] NOT NULL,
	[login_dt] [date] NOT NULL,
	CONSTRAINT [PK_employee_login_history] PRIMARY KEY CLUSTERED 
		(
		[login_history_id] ASC
		)
	) ON [PRIMARY]
GO
Database table example

Relational table relationship example

Availability and Redundancy: Zero Downtime

NoSQL databases replicate data in a near-real-time manner across their nodes. Each node can potentially have multiple replicas, which can take over if the primary node fails and start fielding read requests from client applications. Furthermore, nodes and replicas can be spread across diverse geographic locations, ensuring 100 percent uptime in the event of a server crash, power loss to one or more nodes, natural disaster, etc.

Ease of Use Via APIs

NoSQL provides access to stored data objects via object-based APIs, as opposed to SQL, which is a mathematical, set-based language. Many enterprise developers lack the experience to develop complex logic in SQL — they prefer to call an API and let it figure out how to fetch the data. This makes it easier and faster to scale.

No Set Schema Provides Support for Semi-Structured and Unstructured Data

NoSQL is schemaless, which means databases can store data of any type. This even means support for types of data not invented yet, providing a level of future-proofing that SQL-based databases just can't offer. This also means no data truncation (SQL-based systems require that you specify the length of every data point).

Automatic Data Expiration

NoSQL databases allow enterprises to define the age of any data entering the system. The ability to define an age limit for data is made possible by the BASE property of NoSQL databases, whereas implementing the same feature in a SQL-based system would require coding. This feature may come in handy when collecting log data in databases, where we may not be interested in something that happened after more than a specific amount of time.

Specialized Data Type Support: Key-Value, Document, Graph, and Wide Columns

NoSQL databases can store data in various ways: they can be document databases and store data in XML, JSON, or BSON format, whichever is closer to the data objects used by your application. The typical use case for document databases includes trading platforms, mobile app development, and e-commerce platforms. Examples of document databases include MongoDB, CouchDB, and Azure Cosmos DB. A JSON document might look like this:

{
"employeeId":"123",
"first_nm":"Romin",
"last_nm":"Irani",
"start_dt":"06/14/2022",
"termination_dt":"06/15/2023",
“login_history”:
                “Logins”:
                        [
                            {“login_dt”:”06/14/2022”},
                            {“login_dt”:”06/15/2022”},
                            {“login_dt”:”06/16/2022”},
                            {“login_dt”:”06/17/2022”}
                        ]
}

Data can be stored in the form of key-value stores, retaining every data element as an attribute name and its value. These databases are ideal when developing shopping carts and storing user preferences. Here’s an example of five pairs:

{
"employee_Id":"123",
"first_nm":"Romin",
"last_nm":"Irani",
"start_dt":"06/14/2022",
"termination_dt":"06/15/2023"
}

If your use case requires a focus on how data elements may be indirectly related, such as applications for fraud detection or social networks, graph databases can store data as “nodes” (data element) and “links” (connections between data elements). In contrast, relational databases are great with direct relationships but terribly slow and complex for uncovering indirect ones.

Finally, NoSQL databases can be of the Wide-column type. Wide-column NoSQL databases store data tables as sections or columns instead of data rows, which is handy for scenarios requiring fast analytics calculations at the column level.

Cons of NoSQL

Fast, scalable, and highly available, NoSQL databases have so many benefits! That said, there are some limitations to consider before selecting them for a given project.

Possibly Too Purpose-Specific

SQL databases can do much more than just store data: They can ensure data integrity and can be used to calculate analytics, pull data using complex joins, and quickly and efficiently retrieve large quantities of data. They are more portable and come with well defined and understood standards.

NoSQL databases are not general in this way. They are purpose-built to do one thing well, meaning you will have to dedicate your NoSQL database for one use case only and would have to create a second database for a different purpose. If you are planning to use your database for various use cases, NoSQL is less likely to be the right candidate.

Smaller Community

SQL databases have been around for about half a century, which means that there is a decent talent pool: People with deep SQL experience are relatively easy to find. That is not the case with NoSQL databases, which have only been around for about a decade. Furthermore, NoSQL databases generally have specific APIs, and fewer people have deep experience with any specific API. Finding people that can run your NoSQL setup can be a challenge.

Relatively Large

In terms of storage utilization, NoSQL databases are fast but relatively inefficient compared to SQL-based systems. NoSQL databases are redundant by design because they can create multiple copies of data internally. Schema metadata gets stored in every document, which also increases space usage. You can see that in both of the following documents, the key names are repeated, which takes up more space.

{
"employeeId":"123",
"first_nm":"Romin",
"last_nm":"Irani",
"start_dt":"06/14/2022",
"termination_dt":"06/15/2023"
}

{
"employeeId":"124",
"first_nm":"Sam",
"last_nm":"Farmer",
"start_dt":"03/14/2019",
"termination_dt":"06/15/2020"
}

NoSQL databases generally require more disk space than their SQL counterparts. In a SQL table, tables and columns get defined just once. The same data above in SQL would be stored as below; notice how the column name only appears once:

Database table example

Of course, this example has just two rows; the difference becomes more magnified in real databases with lots of data.

Newer and Less Mature

SQL databases have had more than half a century of adoption and improvement, and the technology is quite mature. NoSQL databases have only been around for a fraction of the time of SQL-based systems, which means NoSQL database offerings aren’t mature enough yet, and stability is sometimes an issue.

No Universal Query Language and No Joins

Relational database systems rely on Structured Query Language (SQL), which has been around for half a century. The language is mature and well understood by a large cadre of software professionals.

Joins give us the ability to pull information from multiple related tables. SQL allows for joins, which means queries are flexible and can fetch data in various ways to extract detailed meaning from raw data.

Suppose we have the same employee and employee_login_history tables as described above, and we wish to find out how many times every employee in the system has logged in during the last 30, 60, and 90 days in separate columns. We can find that out using a rather complex but well understood query like this:

SELECT e.employee_id, 
sum(case when elh.login_dt between getdate()-30 and getdate() then 1 else 0 end) as '30_day_login_count',
sum(case when login_dt between getdate()-60 and getdate() then 1 else 0 end) as '60_day_login_count', 
sum(case when login_dt between getdate()-90 and getdate() then 1 else 0 end) as '90_day_login_count'
FROM employee e 
INNER JOIN  employee_login_history elh
      On e.employee_id = elh.employee_id
GROUP BY e.employee_id

A sample output from an SQL based system might then look something like this…

Database table example

NoSQL APIs are relatively emerging, generally not very flexible, and do not allow the flexibility joins can offer. Furthermore, they are not as prevalent, and the talent pool of professionals comfortable enough to work with these APIs is not as deep.

While NoSQL databases provide a fast way to store and retrieve data, a lot of the work of number crunching is left mostly to developer tools. Developers can make API calls like below to fetch the data into memory first, then use tools like Python to do the number crunching in memory and calculate results. Relating information inside the NoSQL databases is hard and generally discouraged, which is a big disadvantage.

Here’s what a call to an API might look like:

@app.route('/employee/<id>')
def get_one_employee(id: str):
    employee = Employee.objects(id=id).first()
    return jsonify(employee), 200

Potential Data Retrieval Inconsistencies

SQL and NoSQL models differ widely when it comes to reading rapidly changing data.

SQL uses isolation levels; users can define whether they are OK reading stale data or must have the latest data (while accepting that the queries will be delayed). They do this via something called isolation levels.

NoSQL model has no such controls — users may see data in a stale state until it gets updated on all nodes. NoSQL offers an “eventually consistent” view of data, meaning some queries will return old data, and users either have to accept this or write code to ensure consistency.

In the following code example, the NOLOCK hint allows an SQL query to read inconsistent data because the situation requires it. We would use this if we want the query to run while the employee_login_history table was being updated:

SELECT e.employee_id, count(*) 
FROM employee e 
INNER JOIN  employee_login_history elh (NOLOCK)
      On e.employee_id = elh.employee_id
GROUP BY e.employee_id

If we want to only see data when it is consistent, we can force it to happen by not using the NOLOCK hint like so:

SELECT e.employee_id, count(*) 
FROM employee e 
INNER JOIN  employee_login_history elh
      On e.employee_id = elh.employee_id
GROUP BY e.employee_id

NoSQL databases do not offer such flexibility. Users just have to be OK with seeing inconsistent data at times and handle it outside the API.

Lack of Data Integrity Safeguards

Relational database systems offer better data integrity at the cost of speed. For example, using our sample tables, we can specify a foreign key relationship between the two tables:

ALTER TABLE [dbo].[employee_login_history]  
WITH CHECK ADD  CONSTRAINT [FK_employee_login_history_employee] 
FOREIGN KEY([employee_id])
REFERENCES [dbo].[employee] ([employee_id])
GO

This means any data we add to the employee_login_history table must only be for employees whose ID already exists in the employee table; otherwise, the INSERT statement will fail. The database will check for this automatically, so there’s no need for the developer to write extra logic to ensure this check.

NoSQL databases have no mechanisms to prevent bad data from entering the system. If users wish to build in this functionality, they must write custom logic, which can get tedious.

Each NoSQL Implementation Has Its Own Syntax

Relational and NoSQL databases both have a variety of vendors. However, the learning curve for switching relational database vendors is not all that steep because all vendors offer a flavor of SQL, a language that is widely understood and adopted.

On the other hand, NoSQL databases rely on APIs developed and customized by the vendors. These APIs can vary greatly from vendor to vendor, making it difficult to switch vendors once users are locked in.

Conclusion

With databases forming the core of most applications, it’s essential to choose the model that fits your requirements like a glove. Selecting the right model depends not just on speed and accuracy requirements but on your development team’s skillset as well.

NoSQL databases are a great fit when you’re dealing with unstructured or semi-structured data, such as IoT sensor data, where logging data quickly is more important than waiting for confirmation that the data is good. It’s also great when dealing with huge volumes of data in applications where speed is more important than accuracy. NoSQL is a good fit when you have developers comfortable working with NoSQL APIs, and you are trying to minimize downtime when servicing clients.

SQL databases are fine when speed is not a requirement — think about a banking application, where we’d rather wait a bit than show an incorrect account balance! It’s also well-suited to situations where developers are comfortable working with SQL and when you are trying to use the same database to perform a variety of tasks, including complex joins to extract meaning from your data.

Like the Article?

Subscribe to our LinkedIn Newsletter to receive more educational content.

Chapters

Platform

PhotonIQ
Join the Newsletter