Lesson 3 of 6

Non-Relational Databases — Thinking in Documents

Estimated time: 2–2.5 hours

What You Will Learn

Understand what NoSQL means and why it exists alongside traditional relational databases
Learn how document databases like MongoDB store data as flexible, JSON-like documents instead of rigid tables
See how a real-world data model looks different in SQL versus MongoDB — and why that matters
Discover key-value stores like Redis and why they are used for caching, sessions, and real-time data
Get a brief overview of other NoSQL types: column-family, graph, and search databases
Build a practical decision framework for choosing between SQL and NoSQL in real projects

In the previous lesson, you learned how relational databases organize data into neat rows and columns, like a well-structured spreadsheet. You learned about tables, primary keys, foreign keys, and how SQL lets you ask powerful questions about your data. That approach works beautifully for a lot of the data in the world. But not all of it.

Think about the data that describes you as a person. You have a name and an email address — those fit nicely into columns. But you also have a list of skills that could be any length. You have work experiences, each with its own job title, company name, start date, and description. You might have five social media profiles or none at all. You have a list of hobbies that your friend does not have, and your friend speaks three languages while you speak one.

Trying to squeeze all of that into a rigid table with fixed columns gets awkward fast. You end up creating extra tables, writing complex JOIN queries, and dealing with a lot of empty columns for data that some people have and others do not. It works, but it can feel like trying to organize a messy closet using only identical shoeboxes.

What if you could just describe each person as they are — with all their unique, nested, variable data — and store that whole description as a single unit? That is exactly the idea behind non-relational databases, and that is what this lesson is about.

1. What is NoSQL?

NoSQL stands for "Not Only SQL." It does not mean "no SQL at all" or "SQL is bad." It simply means there are other ways to store and organize data beyond the traditional table-and-row model. NoSQL databases come in several flavors, and each one is designed to solve a specific kind of problem particularly well.

NoSQL = Different Tradeoffs, Not Better or Worse

Relational databases and NoSQL databases are not competitors — they are different tools for different jobs. A hammer is not better than a screwdriver. It depends on whether you are working with nails or screws. Many real-world applications use both types of databases at the same time, each handling the kind of data it is best suited for.

Here is the core difference. In a relational database (like MySQL, PostgreSQL, or SQLite), you must define your data structure before you store any data. You create a table, specify exactly what columns it has and what types of data each column holds, and every row must follow that exact structure. This is called a schema, and it is strict. If you decide later that users need a "middle_name" column, you have to change the table's definition (with an ALTER TABLE statement, usually run as a migration) before any row can store that value.

In most NoSQL databases, the structure is flexible. Each record can have its own shape. One user document might have a "middle_name" field and another might not. One product might have a "color" field while another has a "size" field. The database does not care. It stores whatever you give it.
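To see the schema difference concretely, here is a minimal sketch in Python. It assumes Python's built-in sqlite3 module as a stand-in for a relational database and plain Python dictionaries as stand-ins for documents; the sample people are made up for illustration.

```python
import sqlite3

# Relational side: the columns are fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    ("Marcus Johnson", "marcus@example.com"),
)

# A value for a column the schema does not define is rejected...
try:
    conn.execute(
        "INSERT INTO users (name, middle_name) VALUES (?, ?)", ("Ana Lopez", "Maria")
    )
except sqlite3.OperationalError as err:
    print("Rejected:", err)

# ...until the schema itself is changed.
conn.execute("ALTER TABLE users ADD COLUMN middle_name TEXT")
conn.execute(
    "INSERT INTO users (name, middle_name) VALUES (?, ?)", ("Ana Lopez", "Maria")
)

# Document side: each record simply carries the fields it has.
users = [
    {"name": "Marcus Johnson", "email": "marcus@example.com"},
    {"name": "Ana Lopez", "middle_name": "Maria"},
    {"name": "Priya Shah", "email": "priya@example.com", "skills": ["Python", "SQL"]},
]

# The trade-off: application code must cope with fields that may be missing.
for user in users:
    print(user["name"], "| middle name:", user.get("middle_name", "(none)"))
```

The relational side needed a schema change before it would accept the new field, while the document side only needed application code that tolerates missing fields (here, `user.get(...)` with a default).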

This flexibility is both the greatest strength and the greatest risk of NoSQL. It makes development faster because you do not have to plan every detail of your schema in advance. But it also means your application code has to be more careful about handling data that might look different from one record to the next.

The major categories of NoSQL databases are:

Document databases (MongoDB, CouchDB) — store data as JSON-like documents
Key-value stores (Redis, DynamoDB) — store data as simple key-value pairs, like a dictionary
Column-family stores (Cassandra, HBase): store data in rows that can each hold a large, varying set of columns, spread across many servers by key and optimized for massive scale
Graph databases (Neo4j, Amazon Neptune) — store data as nodes and relationships, optimized for connected data

We will spend the most time on document databases and key-value stores because they are by far the most common in everyday web development. But we will also give you a taste of the other types so you know they exist and when they shine.

2. Document Databases — MongoDB

The most popular type of NoSQL database is the document database, and the most popular document database is MongoDB. Instead of storing data in tables with rows and columns, MongoDB stores data as documents. Each document is a self-contained unit of data that looks a lot like JSON — the same format you have already seen in JavaScript and in API responses.

Note If you have worked through the earlier lessons in this track, you already know JSON from working with APIs and configuration files. MongoDB documents use a format called BSON (Binary JSON), which is essentially JSON with a few extra data types. For our purposes, you can think of MongoDB documents as JSON.

Let us start with a simple example. Imagine you are building a community platform for Lansing-area coders. You need to store information about each user. In a relational database, you would create a users table with columns for name, email, and so on. In MongoDB, each user is stored as a document in a collection (which is roughly equivalent to a table).

Here is what a simple user document looks like:

{
  "_id": "64f1a2b3c8e4d5f6a7b8c9d0",
  "name": "Marcus Johnson",
  "email": "marcus@example.com",
  "joinedDate": "2024-03-15",
  "role": "member"
}

So far, that looks a lot like a single row in a SQL table, just written in a different format. The _id field is like a primary key: every document has one, and if you do not supply it, MongoDB generates one automatically (a special type called an ObjectId). Nothing too exciting yet.

But here is where documents get interesting. Unlike a SQL row, a document can contain nested objects and arrays. This means you can store complex, hierarchical data inside a single record. Let us look at a more realistic user document:

{
  "_id": "64f1a2b3c8e4d5f6a7b8c9d0",
  "name": "Marcus Johnson",
  "email": "marcus@example.com",
  "joinedDate": "2024-03-15",
  "role": "member",
  "skills": ["JavaScript", "HTML", "CSS", "React", "Node.js"],
  "address": {
    "city": "Lansing",
    "state": "MI",
    "zip": "48912"
  },
  "experience": [
    {
      "title": "Junior Web Developer",
      "company": "Lansing Web Co.",
      "startDate": "2024-06-01",
      "endDate": null,
      "current": true,
      "description": "Building responsive websites for local businesses"
    },
    {
      "title": "Intern",
      "company": "TechStart Michigan",
      "startDate": "2024-01-15",
      "endDate": "2024-05-30",
      "current": false,
      "description": "Assisted with front-end development and testing"
    }
  ],
  "socialLinks": {
    "github": "https://github.com/marcusjohnson",
    "linkedin": "https://linkedin.com/in/marcusjohnson"
  }
}

Look at how much information is packed into that single document. Marcus has an array of skills (which could be any length), a nested address object, an array of experience objects (each with its own fields), and a social links object. All of this lives together in one place. When your application needs to display Marcus's profile page, it makes one query to the database and gets everything it needs.

The SQL Comparison: Three Tables and JOINs

Now let us think about how you would store this same data in a relational database. Remember from the previous lesson that in a well-designed relational table, each cell holds a single, simple value: you do not put an array or a nested object inside one cell. So you would need to split this data across multiple tables:

Table 1: users

id	name	email	joined_date	role	city	state	zip	github	linkedin
1	Marcus Johnson	marcus@example.com	2024-03-15	member	Lansing	MI	48912	https://github.com/marcusjohnson	https://linkedin.com/in/marcusjohnson

Table 2: user_skills

id	user_id	skill
1	1	JavaScript
2	1	HTML
3	1	CSS
4	1	React
5	1	Node.js

Table 3: user_experience

id	user_id	title	company	start_date	end_date	current	description
1	1	Junior Web Developer	Lansing Web Co.	2024-06-01	NULL	true	Building responsive websites for local businesses
2	1	Intern	TechStart Michigan	2024-01-15	2024-05-30	false	Assisted with front-end development and testing

To get all of Marcus's data, your SQL query would need to JOIN all three tables together:

SELECT u.*, s.skill, e.title, e.company, e.start_date, e.end_date
FROM users u
LEFT JOIN user_skills s ON u.id = s.user_id
LEFT JOIN user_experience e ON u.id = e.user_id
WHERE u.id = 1;

That is three tables, two JOINs, and a query that returns one row for every combination of skill and experience. For Marcus, that is 5 skills × 2 jobs = 10 rows, with his name and email repeated in each. Your application code then has to reassemble those rows into a single user object. It works perfectly well, but for this particular use case it is more complex than the MongoDB approach.
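Here is a runnable sketch of that reassembly step in Python, using the built-in sqlite3 module as a stand-in for MySQL or PostgreSQL. The tables are trimmed to the columns this example needs, the query lists columns explicitly instead of u.*, and an ORDER BY is added so rows come back in a predictable order. The reassembly loop is one illustrative approach, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE user_skills (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id), skill TEXT);
CREATE TABLE user_experience (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id),
                              title TEXT, company TEXT, start_date TEXT, end_date TEXT);
INSERT INTO users VALUES (1, 'Marcus Johnson', 'marcus@example.com');
INSERT INTO user_skills (user_id, skill) VALUES
  (1, 'JavaScript'), (1, 'HTML'), (1, 'CSS'), (1, 'React'), (1, 'Node.js');
INSERT INTO user_experience (user_id, title, company, start_date, end_date) VALUES
  (1, 'Junior Web Developer', 'Lansing Web Co.', '2024-06-01', NULL),
  (1, 'Intern', 'TechStart Michigan', '2024-01-15', '2024-05-30');
""")

rows = conn.execute("""
SELECT u.id, u.name, u.email, s.skill, e.title, e.company, e.start_date, e.end_date
FROM users u
LEFT JOIN user_skills s ON u.id = s.user_id
LEFT JOIN user_experience e ON u.id = e.user_id
WHERE u.id = 1
ORDER BY s.id, e.id
""").fetchall()
print(len(rows))  # 10: every skill is paired with every experience

# Fold the flat, repetitive rows back into one nested, document-shaped object.
user = None
for uid, name, email, skill, title, company, start, end in rows:
    if user is None:
        user = {"id": uid, "name": name, "email": email, "skills": [], "experience": []}
    if skill is not None and skill not in user["skills"]:
        user["skills"].append(skill)  # each skill appears twice; keep one copy
    job = {"title": title, "company": company, "startDate": start, "endDate": end}
    if title is not None and job not in user["experience"]:
        user["experience"].append(job)  # each job appears five times; keep one copy
print(user)
```

The de-duplication checks are exactly the "reassembly" work the MongoDB version avoids, because the stored document already has this nested shape.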

Tip Neither approach is "wrong." The SQL approach gives you strong data integrity and powerful querying across relationships. The MongoDB approach gives you simpler reads and more natural data modeling for nested, variable data. The best choice depends on your specific situation.

Advantages of Document Databases

Flexible schema: Different documents in the same collection can have different fields. You do not need to plan every column in advance. If you need to add a "phone_number" field to some users, just start including it — no table migration required.
Fast reads for complex objects: Because all related data lives in one document, reading a user profile is a single lookup. No JOINs, no assembling data from multiple tables.
Natural code mapping: If your application uses JavaScript (or any language that works with JSON), your data looks almost identical in the database and in your code. There is no translation layer between "database rows" and "application objects."
Horizontal scaling: Many document databases, including MongoDB, have built-in support for spreading a collection's data across multiple servers. This is called sharding, and it is one way companies handle massive amounts of data.

Disadvantages of Document Databases

Data duplication: In a relational database, you store a piece of data once and reference it from other tables using foreign keys. In a document database, you often end up copying the same data into multiple documents. If a company name changes, you might need to update it in thousands of user documents instead of one row in a companies table.
Consistency challenges: Because data can be duplicated, it is possible for copies to get out of sync. Relational databases use foreign keys and transactions to prevent this. MongoDB does support transactions, but it has no foreign keys, so keeping duplicated copies in sync is largely your application code's job.
Harder cross-document queries: Asking "which users work at the same company?" is easy with a SQL JOIN. In MongoDB, it is more complex because each user's experience data is buried inside their own document. Questions like this are usually answered with an aggregation pipeline, MongoDB's multi-stage query feature that can unwind arrays, group documents, and even perform JOIN-like lookups between collections.
No strict schema enforcement by default: The flexibility that makes documents easy to work with can also lead to messy, inconsistent data if your team is not disciplined about data structures. (MongoDB does offer optional schema validation rules for teams that want them.)
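To make "aggregation pipeline" less abstract, here is a plain-Python imitation of what two common pipeline stages do: $unwind (turn each element of an array into its own record) and $group (collect records that share a value). This is not MongoDB's API, and the extra users are hypothetical; in MongoDB the same question would be written as a pipeline of stages and run by the database server.

```python
from collections import defaultdict

# Hypothetical documents, shaped like the user profile shown earlier.
users = [
    {"name": "Marcus Johnson", "experience": [
        {"company": "Lansing Web Co."}, {"company": "TechStart Michigan"}]},
    {"name": "Dana Reed", "experience": [{"company": "Lansing Web Co."}]},
    {"name": "Sam Ortiz", "experience": [{"company": "Great Lakes Data"}]},
]

# Step 1 (like $unwind): one record per (user, experience entry).
unwound = [
    {"name": u["name"], "company": job["company"]}
    for u in users
    for job in u.get("experience", [])
]

# Step 2 (like $group): collect user names under each company.
by_company = defaultdict(list)
for record in unwound:
    by_company[record["company"]].append(record["name"])

# Step 3 (like a final $match): keep companies shared by more than one user.
shared = {company: names for company, names in by_company.items() if len(names) > 1}
print(shared)  # {'Lansing Web Co.': ['Marcus Johnson', 'Dana Reed']}
```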

When Documents Shine

Document databases work best when your data is naturally hierarchical (like a user profile with nested details), when different records can have different shapes (like products in a catalog where a shirt has "size" and a laptop has "RAM"), and when your application usually reads and writes entire objects at once rather than individual fields across many records.

3. Key-Value Stores — Redis

If document databases are like flexible filing cabinets, then key-value stores are like a dictionary or a phonebook. You look something up by its name (the key), and you get back its definition (the value). That is it. No tables, no documents, no schema — just keys and values.

The most popular key-value store is Redis, and it is everywhere. If you have ever used a website that loads instantly, displays real-time data, or remembers that you are logged in, there is a good chance Redis is working behind the scenes.

Here is the mental model. Imagine a massive dictionary where every entry has a unique name and a value:

"session:abc123"       →  "{ userId: 42, role: 'admin', loginTime: '2024-09-15T10:30:00' }"
"cache:user:42"        →  "{ name: 'Marcus Johnson', email: 'marcus@example.com' }"
"leaderboard:weekly"   →  "[{ name: 'Alice', score: 2850 }, { name: 'Bob', score: 2340 }]"
"rate-limit:192.168.1" →  "47"

You give Redis a key, and it gives you back the value. You can set a key, get a key, delete a key, and set a key to automatically expire after a certain amount of time. That simplicity is what makes Redis incredibly fast.
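A minimal sketch, using only the Python standard library, of the four operations just described. TinyKV is a hypothetical toy class, not Redis: it keeps values in a dict (a hash table) and removes expired keys lazily, when they are next touched. Redis itself combines that kind of lazy removal with periodic background cleanup, and its EXPIRE command takes whole seconds (PEXPIRE takes milliseconds); the fractional seconds below just keep the demo short.

```python
import time


class TinyKV:
    """Toy, single-process key-value store with optional expiry.

    Hypothetical illustration of SET/GET/DEL/EXPIRE behavior, not Redis itself.
    """

    def __init__(self):
        self._data = {}     # key -> value; a Python dict is a hash table
        self._expires = {}  # key -> deadline measured with time.monotonic()

    def _purge_if_expired(self, key):
        deadline = self._expires.get(key)
        if deadline is not None and time.monotonic() >= deadline:
            del self._expires[key]
            self._data.pop(key, None)

    def set(self, key, value):
        self._data[key] = value
        self._expires.pop(key, None)  # like a plain Redis SET, this clears any old expiry

    def get(self, key):
        self._purge_if_expired(key)  # lazy expiry: check when the key is touched
        return self._data.get(key)   # None if the key is missing

    def delete(self, key):
        self._purge_if_expired(key)
        self._expires.pop(key, None)
        if key in self._data:
            del self._data[key]
            return True
        return False

    def expire(self, key, seconds):
        self._purge_if_expired(key)
        if key not in self._data:
            return False
        self._expires[key] = time.monotonic() + seconds
        return True


store = TinyKV()
store.set("user:42:name", "Marcus Johnson")
print(store.get("user:42:name"))     # Marcus Johnson
store.set("session:abc123", '{"userId": 42}')
store.expire("session:abc123", 0.05)  # expire in 50 ms for the demo
time.sleep(0.1)
print(store.get("session:abc123"))   # None: the session has expired
print(store.delete("user:42:name"))  # True
```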

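Note Redis keeps its entire dataset in memory (RAM) instead of reading it from disk on each request. (It can optionally save snapshots or a log of changes to disk so the data survives a restart.) That is what makes it so fast: a random read from RAM takes on the order of 100 nanoseconds, while a random read from a spinning hard drive takes roughly 10 milliseconds, about 100,000 times longer. The tradeoff is that RAM is limited and expensive. Redis is not meant to be your primary database for all data. It is a specialized tool for data that needs to be accessed extremely quickly.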

Common Uses for Redis

Session storage: When you log into a website, the server creates a session — a small record that says "this person is logged in and here is who they are." Storing sessions in Redis means the server can verify your login status in microseconds instead of milliseconds. At scale, that difference matters enormously.

Caching: Imagine your application needs to display a user's profile, which requires querying the database, joining three tables, and formatting the result. That might take 200 milliseconds. If you store the result in Redis after the first query, later requests for that same profile can be served in under a millisecond, until the cached copy expires or is deleted because the underlying data changed. This is called caching: keeping a copy of expensive-to-compute data in a fast location.

Leaderboards and counters: Redis has built-in sorted sets, which keep members ranked by a score and make leaderboards straightforward. For view counters and like counts, its INCR command increments a number atomically, so many requests can update the same counter at once without losing counts.
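Here is a toy sketch of how a sorted-set leaderboard behaves, written in Python. Leaderboard is a hypothetical class whose methods imitate the Redis commands ZINCRBY (add to a member's score) and ZRANGE with the REV option (read the highest scores). Real Redis keeps the set ordered as scores change, so reading the top N stays fast; this sketch simply re-sorts on every read.

```python
class Leaderboard:
    """Hypothetical toy version of a Redis sorted set: members ranked by score."""

    def __init__(self):
        self._scores = {}  # member -> score

    def incr(self, member, amount):
        # Like ZINCRBY: add to a member's score, starting from 0 if new.
        self._scores[member] = self._scores.get(member, 0) + amount
        return self._scores[member]

    def top(self, n):
        # Like ZRANGE key 0 n-1 REV WITHSCORES: highest scores first.
        ranked = sorted(self._scores.items(), key=lambda item: item[1], reverse=True)
        return ranked[:n]


board = Leaderboard()
board.incr("Alice", 2850)
board.incr("Bob", 2340)
board.incr("Carol", 1200)
board.incr("Bob", 600)  # Bob now has 2940 and overtakes Alice
print(board.top(2))     # [('Bob', 2940), ('Alice', 2850)]
```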

Rate limiting: If you want to prevent a user from making more than 100 API requests per minute, you can use Redis to count their requests. Set a key with their IP address, increment it with each request, and set it to expire after 60 seconds. Simple and fast.
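The increment-and-expire pattern can be sketched in Python as a fixed-window counter. FixedWindowRateLimiter is hypothetical, and a dict stands in for Redis; with real Redis you would run INCR on the key and, when it returns 1 (the first request of a new window), run EXPIRE on it. The clock parameter exists only so the example can simulate time passing.

```python
import time


class FixedWindowRateLimiter:
    """Hypothetical sketch of the INCR + EXPIRE rate-limiting pattern.

    A dict stands in for Redis: key -> (request count, time the window expires).
    """

    def __init__(self, limit=100, window_seconds=60, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the demo can simulate time passing
        self._counters = {}

    def allow(self, ip):
        key = f"rate-limit:{ip}"
        now = self.clock()
        count, expires_at = self._counters.get(key, (0, None))
        if expires_at is None or now >= expires_at:
            # Missing or expired key: start a fresh window
            # (in Redis: INCR creates the key at 1, then EXPIRE key 60).
            count, expires_at = 0, now + self.window_seconds
        count += 1  # like INCR
        self._counters[key] = (count, expires_at)
        return count <= self.limit


fake_now = [0.0]
limiter = FixedWindowRateLimiter(limit=3, window_seconds=60, clock=lambda: fake_now[0])
print([limiter.allow("203.0.113.7") for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 61.0  # a minute later the window has expired
print(limiter.allow("203.0.113.7"))                      # True
```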

Why Redis Is So Fast: O(1) Lookup

In computer science, O(1) means "constant time": the operation takes roughly the same amount of time regardless of how much data you have. Whether Redis has 100 keys or 100 million keys, looking up a single key takes about the same time on average. This is because Redis uses a data structure called a hash table (the same idea behind Python dictionaries and JavaScript's Map). Combined with keeping everything in RAM, this makes Redis one of the fastest data stores in common use.

Redis in Practice

Redis commands are refreshingly simple. Here are the most common ones (the text after -- on each line is an explanation, not part of the command):

SET user:42:name "Marcus Johnson"     -- Store a value
GET user:42:name                      -- Retrieve it: "Marcus Johnson"
DEL user:42:name                      -- Delete it

SET session:abc123 "{ userId: 42 }"   -- Store session data
EXPIRE session:abc123 3600            -- Auto-delete after 1 hour (3600 seconds)

INCR page:home:views                  -- Increment a counter by 1 (a new key starts at 0)
GET page:home:views                   -- Read the current count, e.g. "1847"

Notice how there are no tables, no schemas, no JOINs. Just set a key, get a key. This simplicity is what makes Redis so powerful for the specific problems it solves. Applications rarely run on Redis alone, because it is not designed for complex queries or relationships. But as a companion to your primary database (whether SQL or MongoDB), it is invaluable.

Tip A common architecture in the real world is to use a relational database (like PostgreSQL) as the primary data store for all your important data, and Redis as a caching and session layer that sits in front of it. Your app checks Redis first. If the data is there, it returns instantly. If not, it queries the main database, stores the result in Redis for next time, and then returns it. This pattern is called cache-aside and it dramatically improves performance.
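A minimal sketch of cache-aside in Python, under stated assumptions: SQLite (via the built-in sqlite3 module) stands in for PostgreSQL, a dict with expiry times stands in for Redis, and get_user, update_email, and the 300-second TTL are hypothetical choices for illustration. The update function deletes the cached copy so the next read fetches fresh data, a common companion to cache-aside.

```python
import sqlite3
import time

# Primary database: SQLite stands in for PostgreSQL in this sketch.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
db.execute("INSERT INTO users VALUES (42, 'Marcus Johnson', 'marcus@example.com')")

# Cache: a dict stands in for Redis, mapping key -> (value, expiry deadline).
cache = {}
CACHE_TTL_SECONDS = 300
db_reads = 0  # counts trips to the primary database, to show the effect


def get_user(user_id):
    global db_reads
    key = f"cache:user:{user_id}"
    entry = cache.get(key)
    if entry is not None and time.monotonic() < entry[1]:
        return entry[0]  # cache hit: no database query
    db_reads += 1        # cache miss: fall back to the database
    row = db.execute("SELECT name, email FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        return None
    user = {"name": row[0], "email": row[1]}
    cache[key] = (user, time.monotonic() + CACHE_TTL_SECONDS)  # save for next time
    return user


def update_email(user_id, email):
    db.execute("UPDATE users SET email = ? WHERE id = ?", (email, user_id))
    cache.pop(f"cache:user:{user_id}", None)  # invalidate so the next read is fresh


get_user(42)
get_user(42)
print(db_reads)  # 1: the second call was answered from the cache
update_email(42, "marcus.j@example.com")
print(get_user(42)["email"], db_reads)  # marcus.j@example.com 2
```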

4. Other Types of NoSQL Databases

Document databases and key-value stores are the NoSQL types you will encounter most often in everyday web development. But there are other specialized types that solve problems those two cannot. Let us take a brief tour so you know they exist and can recognize when they might be the right tool.

Column-Family Databases (Cassandra, HBase)

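Imagine you need to store billions of sensor readings, log entries, or messages, with new data arriving constantly from all over the world. Column-family databases (also called wide-column stores) are built for this. Data is organized into rows identified by a key, each row can hold a large and varying set of columns, and the key determines which servers store the row, so data and work spread across a whole cluster. Rows that share a partition key are stored together and kept sorted, which makes queries like "all readings for this sensor on this day" fast. The tradeoff is that you must design your tables around the queries you plan to run, because you give up JOINs and most ad hoc queries.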

Apache Cassandra is the most well-known column-family database. It was originally built at Facebook to power their inbox search feature, and it is now used by companies like Netflix, Uber, and Apple to handle datasets with billions of rows spread across hundreds of servers around the world. Cassandra is designed for very high write throughput and high availability: data is replicated across multiple servers and even multiple data centers, so the database can keep serving requests when a server, or an entire data center, goes offline.

You probably will not need Cassandra for your early projects. It shines at a scale most applications never reach. But if you ever work for a company that deals with massive amounts of time-series data (like server logs, sensor readings, or financial transactions), you will likely encounter it.
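For a feel of how such data is laid out, here is a toy Python model of a wide-column table for sensor readings. The names are hypothetical and this is not Cassandra's API. It mirrors the general idea: a partition key (here, sensor and day) decides where a group of rows lives, rows inside the partition stay sorted by a clustering key (here, a timestamp), and each row may carry different columns. In a real cluster the partition key is hashed to choose which servers store the partition.

```python
import bisect
from collections import defaultdict


class Partition:
    """Rows sharing one partition key, kept sorted by a clustering key."""

    def __init__(self):
        self.keys = []  # clustering keys (timestamps), always sorted
        self.rows = []  # column dict for each key, in the same order

    def upsert(self, clustering_key, columns):
        i = bisect.bisect_left(self.keys, clustering_key)
        if i < len(self.keys) and self.keys[i] == clustering_key:
            self.rows[i].update(columns)  # same key: merge the new columns in
        else:
            self.keys.insert(i, clustering_key)
            self.rows.insert(i, dict(columns))

    def between(self, start, end):
        # Sorted order turns a time-range query into one contiguous slice.
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_right(self.keys, end)
        return list(zip(self.keys[lo:hi], self.rows[lo:hi]))


# Partition key: (sensor id, day). Rows can carry different columns.
table = defaultdict(Partition)
day = table[("sensor-7", "2024-09-15")]
day.upsert("10:00", {"temp_c": 21.5})
day.upsert("10:10", {"temp_c": 22.4})
day.upsert("10:05", {"temp_c": 21.9, "humidity": 40})  # arrives out of order
print(day.between("10:00", "10:05"))
# [('10:00', {'temp_c': 21.5}), ('10:05', {'temp_c': 21.9, 'humidity': 40})]
```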

Graph Databases (Neo4j)

Some data is all about relationships. Think about a social network: Alice follows Bob, Bob follows Carol, Carol and Alice are both members of the "Lansing Coders" group, and Alice recommended Carol for a job at the same company where Bob works. The interesting questions are not about individual people — they are about the connections: "Who are friends of my friends?" "What is the shortest path between two people?" "Which users have the most influence?"

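You can model this in a relational database, but the SQL queries get complex and slow as the chains of relationships get longer, because each extra step in a question like "friends of friends of friends" usually means another JOIN. Graph databases like Neo4j are built specifically for this. They store data as nodes (people, places, things) and edges (relationships between them), and each node keeps direct references to its connections, so following a relationship is a quick hop rather than a JOIN over a whole table. This lets them traverse large numbers of connections quickly.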
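The "shortest path" and "friends of friends" questions above can be sketched in Python with an adjacency list, which is roughly how a graph database keeps each node's connections close at hand. This is plain Python, not Neo4j (which uses its own query language, Cypher); the people and connections are made up, and breadth-first search is used because it finds the path with the fewest hops.

```python
from collections import deque

# Adjacency list: each person maps to the people they follow.
follows = {
    "Alice": ["Bob"],
    "Bob": ["Carol", "Dev"],
    "Carol": ["Alice", "Erin"],
    "Dev": [],
    "Erin": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: the fewest-hop path from start to goal, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


def friends_of_friends(graph, person):
    """People two hops away who are not already direct connections (or the person)."""
    direct = set(graph.get(person, []))
    two_hops = {fof for friend in direct for fof in graph.get(friend, [])}
    return two_hops - direct - {person}


print(shortest_path(follows, "Alice", "Erin"))       # ['Alice', 'Bob', 'Carol', 'Erin']
print(sorted(friends_of_friends(follows, "Alice")))  # ['Carol', 'Dev']
```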

Graph databases are used for:

Social networks: friend suggestions, connection paths, group recommendations
Fraud detection: finding suspicious patterns in financial transaction networks
Recommendation engines: "customers who bought X also bought Y"
Knowledge graphs: connecting concepts, people, and facts (like how Google answers questions directly in search results)

Search Engines (Elasticsearch)

Elasticsearch is technically a search engine, but it is often grouped with NoSQL databases because it stores and queries data in a non-relational way. Its superpower is full-text search — the ability to search through millions of text documents and return relevant results in milliseconds, ranked by how well they match your query.

When you type a search query on an e-commerce site and it finds products even when you misspell words, or when a news site lets you search through years of articles by keywords and phrases, that is usually Elasticsearch (or a similar search engine) at work. It handles fuzzy matching, relevance ranking, faceted search (filtering by category, price range, etc.), and autocomplete suggestions.
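Search engines find matching documents quickly by building an inverted index: a map from each word to the documents that contain it. Here is a toy version in Python. TinySearchIndex is hypothetical, its scoring (counting matching words) is deliberately naive, and real engines such as Elasticsearch use more refined relevance formulas (BM25 by default) plus features like fuzzy matching for misspellings.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase, then keep runs of letters and digits as words.
    return re.findall(r"[a-z0-9]+", text.lower())


class TinySearchIndex:
    """Hypothetical toy inverted index: word -> {doc_id: times the word appears}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for word in tokenize(text):
            self.postings[word][doc_id] = self.postings[word].get(doc_id, 0) + 1

    def search(self, query):
        scores = defaultdict(int)
        for word in tokenize(query):
            # Only documents containing the word are touched: no full scan.
            for doc_id, count in self.postings.get(word, {}).items():
                scores[doc_id] += count  # naive score: total matching word count
        return sorted(scores, key=lambda d: scores[d], reverse=True)


index = TinySearchIndex()
index.add(1, "Waterproof hiking boots for winter hiking")
index.add(2, "Running shoes for road running")
index.add(3, "Winter jacket, waterproof and warm")
print(index.search("waterproof winter boots"))  # [1, 3]
```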

Like Redis, Elasticsearch is typically used alongside a primary database, not as a replacement. Your main data lives in PostgreSQL or MongoDB, and a copy is indexed in Elasticsearch specifically for searching.

Important You do not need to memorize all these database types right now. The goal is simply to know they exist. When you encounter a problem in the real world — like needing to search through millions of text records or model complex relationships — you will remember that there are specialized databases designed for exactly that. Then you can learn the specific one you need.

5. SQL vs NoSQL — How to Choose

One of the most common questions new developers ask is: "Should I use SQL or NoSQL?" The honest answer is: it depends. But "it depends" is not very helpful, so let us build a practical decision framework you can actually use.

The Comparison at a Glance

Factor	SQL (Relational)	NoSQL (Non-Relational)
Data structure	Fixed schema — all rows have the same columns	Flexible — each record can have different fields
Relationships	Excellent — JOINs connect data across tables	Limited — data is usually self-contained in documents
Consistency	Strong — ACID transactions guarantee data integrity	Varies — some sacrifice consistency for speed and scale
Scaling	Vertical (bigger server) — harder to distribute	Horizontal (more servers) — designed for distribution
Query language	SQL — standardized, powerful, well-known	Database-specific APIs and query languages
Schema changes	Requires migrations — careful planning needed	Flexible — add fields on the fly
Best for	Structured data with clear relationships	Variable data, rapid development, massive scale
Examples	MySQL, PostgreSQL, SQLite, Oracle	MongoDB, Redis, Cassandra, Neo4j, Elasticsearch

Choose SQL When:

Your data has clear relationships. If you are building an e-commerce system where orders belong to customers, orders contain products, and products belong to categories, the relational model handles this beautifully.
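Data integrity is critical. Think of financial applications, healthcare systems, and anything where incorrect or inconsistent data could cause real harm. ACID transactions guarantee that a group of related changes either all succeed or all fail, and constraints such as foreign keys reject data that would break the rules you defined.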
You need complex queries. If you regularly need to answer questions like "What is the total revenue by product category for customers in Michigan who signed up in the last 90 days?", SQL is built for exactly this.
Your data structure is well-defined and unlikely to change frequently. If you know what your data looks like and it is consistent, a relational schema keeps everything clean and organized.

Choose NoSQL When:

Your data is naturally hierarchical or nested. User profiles, product catalogs with variable attributes, content management systems where each article can have different metadata.
You need extreme speed for simple lookups. Caching, session management, real-time features — this is where key-value stores like Redis excel.
Your data structure is evolving rapidly. In the early stages of a startup when you are experimenting with features and changing your data model weekly, the flexibility of a document database can save you time.
You need to handle massive scale. If you are dealing with millions of users generating billions of records, NoSQL databases like Cassandra are designed to distribute data across many servers seamlessly.
Your application reads and writes entire objects. If your app always loads a complete user profile and saves a complete user profile, a document database avoids the overhead of multiple table JOINs.

The Practical Answer for Most Beginners

Tip If you are building your first real project and are unsure which to pick, start with a relational database like PostgreSQL or MySQL. Here is why: relational databases can handle the vast majority of applications perfectly well. They have been battle-tested for decades. The SQL skills you learn are universal and transferable. And if you later discover that a specific part of your application would benefit from MongoDB or Redis, you can add that alongside your relational database. Starting with NoSQL and later realizing you need relational features is a harder problem to solve.

Many successful applications keep a relational database at their core. Instagram, for example, stored its main data in PostgreSQL while growing to tens of millions of users, using tools like Redis alongside it only for specific jobs. The key is to understand both approaches, know when each one shines, and make an informed decision rather than following hype.

Knowledge Check

1. What does NoSQL stand for, and what does it mean?

"No SQL": it means these databases cannot use SQL at all and are completely incompatible with structured data
"Not Only SQL": it means there are alternative approaches to storing data beyond the traditional relational table model
"New SQL": it means a modern version of SQL that replaces the older standard
"Non-Structured Query Language": it means a query language for unstructured data

Answer: NoSQL stands for "Not Only SQL." It does not reject SQL; it simply acknowledges that relational databases are not the only way to store and organize data. Different data problems call for different solutions.

2. You are building a user profile system where each user can have a different number of skills, work experiences, and social media links. Which database approach would most naturally model this data?

A relational database with one table and many nullable columns for every possible field
A document database like MongoDB, where each user profile is stored as a single document with nested arrays and objects
A key-value store like Redis, since each user has a unique ID that can serve as the key
A graph database like Neo4j, since users are connected to their skills and experiences

Answer: a document database like MongoDB. A document database handles variable, nested data naturally. Each user document can contain arrays of skills and sub-documents for experiences, all in one place, without needing multiple tables or JOINs. While a relational database can also model this (using multiple tables), the document approach is the most natural fit for this kind of hierarchical, variable data.

3. Why is Redis so fast compared to traditional databases?

It uses a more advanced version of SQL that executes queries faster
It only works with small datasets, so there is less data to search through
It stores all data in memory (RAM) and uses O(1) hash table lookups, avoiding disk access and complex query processing
It compresses data before storing it, which makes reads faster

Answer: it stores all data in memory and uses O(1) hash table lookups. Redis achieves its speed through two key design decisions: keeping everything in RAM (random reads from RAM are roughly 100,000 times faster than from a spinning hard drive) and using hash table lookups that take constant time on average, regardless of how much data is stored. This makes it ideal for caching, sessions, and any data that needs to be accessed in microseconds.

Lesson Summary

In this lesson, you expanded your understanding of databases beyond the relational model you learned in the previous lesson. Here is what we covered:

NoSQL ("Not Only SQL") is not a replacement for relational databases — it is a family of alternative approaches, each optimized for different kinds of data and access patterns.
Document databases like MongoDB store data as flexible JSON-like documents that can contain nested objects and arrays. They are ideal for hierarchical, variable data like user profiles and product catalogs.
Key-value stores like Redis are the simplest type of database. Redis keeps its key-value pairs in memory, which makes it extremely fast and a great fit for caching, sessions, counters, and real-time features.
Column-family databases like Cassandra handle billions of rows across hundreds of servers. Graph databases like Neo4j excel at relationship-heavy data like social networks. Search engines like Elasticsearch provide lightning-fast full-text search.
Choosing between SQL and NoSQL is not about which is "better" — it is about which is the right tool for your specific problem. Many production applications use both.
When in doubt, start with a relational database. They handle the vast majority of use cases, and the skills you learn transfer everywhere.

In the next lesson, we will take a step back in time and explore legacy databases — the older systems that many businesses still rely on today. Understanding these systems is not just a history lesson; it is practical career knowledge that can set you apart in the job market, especially here in Michigan where many companies still run on legacy technology.
