Week 13 - State of DB
The state of the UNION
Congratulations! You are strong junior developers in 6 out of the top ten relational database management systems in 2023.
SELECT first_name, last_name
FROM customer
ORDER BY customerId;
This is how you query 6 out of the top ten databases: MySQL, PostgreSQL, SQLite, MS SQL Server, MariaDB, and Oracle.
The concepts you've been learning this semester - from how you SELECT data, to your DML statements, to built-in functions, table architectures, foreign keys, etc., etc., it's all pretty much the same in all these relational databases.
Relational databases hold the vast market share, but certain jobs are best done with NoSQL databases, of which Mongo Opens in a new window is the most popular.
In mongoDB, our query would look something like this:
db.customer.find(
{ },
{ first_name: 1, last_name: 1, _id: 0 })
.sort( { customerId: 1 } )
Are you familiar with JavaScript & JSON? You'll be able to figure out NoSQL databases pretty easily.
Let's take a moment to look at where the industry is at in 2023, and how things are changing, and then we'll look at the differences between the most popular databases.
Who uses what?
As of the 2022 StackOverflow Developer Survey, MySQL was the DBMS used by most developers1 Opens in a new window.
Well over half of all professional developers used SQL in 2022, behind only HTML, CSS and JavaScript2 Opens in a new window.
As of April 2023, the #1 most frequently discussed database management system is Oracle3 Opens in a new window.
Do keep in mind - databases are used for a lot of things besides websites!
Postgres Opens in a new window and SQLite Opens in a new window gained the most market share over the last five years3 Opens in a new window, and are now the #2 and #3 most-used DBMSs overall 1 Opens in a new window, respectively.
Note: I strongly encourage you to check out the entire Stack Overflow Developer survey Opens in a new window. It will give you a lot of insight into the community you're joining, including things like demographics, job turnover, gender equality, salary, etc.
NoSQL ("Not only SQL") databases tried to offer an alternative to relational database architecture by using key/value stores, documents or other "flat" architectures instead of tables. While this was a strong industry trend for a few years, in 2017 and 2018 adoption flattened or fell slightly.
The consensus seems to be that NoSQL is a good supplemental tool for vertical scalability and rapid iteration, and can offer very fast performance for simple GET/SET commands, but is not a replacement for relational databases.
For example, if you think your data structures will change while you're working on a project, and you're not sure how they will change, a document-based NoSQL database like Mongo might be a good choice.
If your primary concern is high-speed caching of data, a key-value store db like Redis Opens in a new window might be the best choice.
If you're dealing with huge chunks of inconsistently structured data, you might need a wide column store db like Cassandra Opens in a new window.
If you're dealing more with relationships than data, you might want a graph-based db like Neo4J Opens in a new window.
Of the developers in the the developer survey, 47% said they used MySQL, while 28% used the most popular NoSQL database - MongoDB.
You're most likely to encounter MongoDB when working with React Opens in a new window. A popular development stack is the "MERN" stack:
- Mongo (the database)
- Express Opens in a new window (the backend framework)
- React (the application framework)
- Node Opens in a new window (the runtime environment)
With the explosion of application frameworks over the last five years, however, you can't really expect to work on a single stack for any length of time (or even when working on a single application). To see what different dev teams use as their stack, check out Stackshare Opens in a new window.
Before we dive into the different individual Database Management Systems, I also wanted to mention that there's a new query language that's becoming popular:
GraphQL Opens in a new window is presented as an alternative to REST APIs. REST shares data (usually as JSON) by allowing you to make an HTTP GET request, and returning data.
GraphQL does that too, but unlike REST, it allows you to define the format and structure of the data that gets returned, which is pretty cool.
Here's a good article summarizing the topic Opens in a new window.
NoSQL Databases
MongoDB
Let's look at how data is stored in Mongo...
{
"_id" : 1,
"name" : {
"first" : "John",
"last" : "Backus"
},
"birth" : ISODate("1924-12-03T05:00:00Z"),
"death" : ISODate("2007-03-17T04:00:00Z"),
"contribs" : [
"Fortran",
"ALGOL",
"Backus-Naur Form",
"FP"
]
}
{
"_id" : ObjectId("51df07b094c6acd67e492f41"),
"name" : {
"first" : "John",
"last" : "McCarthy"
},
"birth" : ISODate("1927-09-04T04:00:00Z"),
"death" : ISODate("2011-12-24T05:00:00Z"),
"contribs" : [
"Lisp",
"Artificial Intelligence",
"ALGOL"
]
}
{
"_id" : 1,
"name" : {
"first" : "John",
"last" : "Backus"
},
"birth" : ISODate("1924-12-03T05:00:00Z"),
"death" : ISODate("2007-03-17T04:00:00Z"),
"contribs" : [
"Fortran",
"ALGOL",
"Backus-Naur Form",
"FP"
],
"awards" : [
{
"award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society"
},
{
"award" : "National Medal of Science",
"year" : 1975,
"by" : "National Science Foundation"
},
{
"award" : "Turing Award",
"year" : 1977,
"by" : "ACM"
},
{
"award" : "Draper Prize",
"year" : 1993,
"by" : "National Academy of Engineering"
}
]
}
Look familiar? It's JSON (well, technically it's BSON Opens in a new window - binary JSON)!
While concepts like table joins don't transfer to NoSQL, basic database CRUD operations remain. So instead of writing...
DROP TABLE people
we write...db.people.drop()
If you'd like to know more about how MongoDB compares to MySQL, Mongo has a (biased, but comprehensive) write‑up on their site Opens in a new window.
Redis
Redis is a pretty fascinating "in-memory data structure store", meant to be used for really impressive caching. It is a NoSQL database like MongoDB, with the major difference being that it exists entirely within RAM, storing nothing to disk. It definitely challenges the notions of what a database is - in some ways it's the opposite of ACID.
ACID compliance
One thing to note when comparing database management systems is ACID compliance.
What is that, and why should we care?
ACID is an acronym for what database nerds, as far back as 1983 Opens in a new window, have agreed are important standards for a database management system.
A stands for Atomicity, which means that if one statement in a transaction fails, the whole transaction fails. If you try to insert a hundred rows, and one of your inserts is missing a closing bracket, the whole script fails.
C stands for Consistency, meaning that any data that breaks a rule (think data types and other constraints) is not allowed.
I stands for Isolation. That means that even if transactions run at the same time, you get the same result as if you'd run them in order.
D stands for Durability, meaning that after you've committed to the db, the power can out and no data will be lost. That means writing to 'non-volatile' memory (i.e. something besides RAM).
Is MySQL ACID-compliant? Depends on who you ask, but, basically, it depends.
MySQL, depending on your configuration, may fall short with I. But does that matter for most web projects? Simply put: no.
MySQL has first-in-class documentation and community support, excellent speed, a comparatively shallow learning curve, forgiving syntax, and, well, it's free!
Postgres
ACID compliance bragging rights do go to Postgres, however. Postgres is the somewhat more aloof yet dependable sibling to MySQL, and its robust feature set and dependability have made it the fastest growing DBMS the past three years in a row.
It's got great support for JSON, impressive features like table partitioning, and cool data types like Money, Geometry, IP addresses, and data ranges.
SQLite
SQLite, as the name suggests, is a "light" alternative meant for prototyping, or for running on devices like mobile phones or personal computers, or inside browsers, rather than on a dedicated server. SQLite is for when you want a relational database, but don't want a heavy configuration process, and you don't mind having weak security, and not having distinct users.
SQLite is not a great tool for running a whole website, but it can be really handy for prototyping, testing, and single-user applications like desktop and mobile apps.
It's the reason we were able to start this semester off writing queries in the browser!
And that's the overview of the state of databases in 2023!
If you think you still want more information about how you might weigh database options, I'm a fan of this blogpost Opens in a new window.