SQL vs NoSQL
The decision between using a SQL (relational) or a NoSQL (non-relational) database (or both), has been a decision I have had to do a lot of research on recently. While there can be a certain amount of contention between the two camps, as I see it the decision mostly comes down to depending on the current use case. Both options can be a good fit depending on what you want to do with the data. So with that in mind this article is attempting to be a run-down of the likely advantages or pitfalls of both technologies.
Tables or Collections
The number one difference between relational and non-relational databases is the way the data is stored. Relational data is tabular by its very nature hence it is stored in tables with columns and rows, where relations can be set up between tables allowing them to cooperate in data storage and ease retrieval of the data. Non-relational data on the flip side is not meant to fit in tables or rows but is stored in collections which will be groupings of "structured data" like; key value mappings; JSON documents; or graphs depending on the database being used.
This is one of the key differentiators as to which database is the best fit for you. If you are looking to persist objects from your application storing them in a non-relational (object store) can be considerably easier. But if tabular data is more useful the relational is the way to go.
Strict or Dynamic/No Schema
Relational data will follow a strict schema of tables and columns to describe of the data should be stored. This encourages a get it right first time mentality, as it is more complicated to change the schema of an existing table after the fact. Although it does have the advantage that once data has been put in to the database, it will always be preservers in the same known format so the database will take some work away from the application.
In the non-relational world there is generally a dynamic or no schema this means that a collection can contain any type of data for example a JSON document store could contain multiple documents with completely different properties. This offers a far greater level of flexibility but does off load the work of ensuring the data is as expected to the application.
Normalization is largely a goal of relational databases, breaking up data in to the smallest possible tables (related) removing the need for repeated data. This can add to the complexity of the database but reduces the work required in the application when updating or deleting data. It can also lead to space savings on disk but this point is largely moot these days.
Non-relational databases tend to care less about normalization although it is not completely absent, this can mean more work from the application side to update or delete data as it may need to touch multiple points in the database to perform the work.
Scale Up or Scale Out
This is another of the major differentiators between relational and non-relational. The way they scale to increase performance for greater concurrency and response time. Relational databases typically will scale up. As in they will be in a single server and additional resources (CPU and RAM) will be allocated to the server to improve performance. Non-relational on the other hand is by nature distributed so is much more able to scale out and run across a number of clustered servers. While vertical scaling can achieve insanely high performance it is still out stripped by the scale out of non-relational.
SQL databases have been long renowned to maintain integrity through what’s called the ACID (Atomic | Consistence | Isolated | Durable) properties and most relational storage vendors honour ACID. The goal is to support isolated indivisible transactions whose changes are persisted and leave the data in a consistent state. NoSQL databases, on the other hand, make you choose between any two priorities per the CAP (Consistency | Availability | Partition Tolerance) theorem, since all three are difficult to achieve in a distributed node-based system.
Both relational and non-relational databases offer replication and the cost and quality of the replication can vary greatly depending on the choice of database within each camp. However generally non-relational databases are able to offer greater availability for a lower cost.
When making the decision between relational and non-relational databases it almost always comes down to the specific use case. relational databases can cover the majority of the work but if you find yourself in the edge cases where they cannot work for you then non-relational will likely fill those gaps.
Author: Dave Garner, SOC Consultant
Dave Garnar is an Automation Consultant for ServiceWorks at CDW. Working primarily using C# and PowerShell to automate Cloud Services. Dave has been with CDW since 2007 being brought in to CDW as part of the acquisition of Panacea Services. Over his time with CDW Dave has come to Specialize in Virtualization primarily VMware and has been awarded VMware vExpert status for the last two years. He runs his own blog http://blog.davotronic5000.co.uk and is active in the VMware and PowerShell communities as well as on twitter @davotronic5000.