Lesson 1 of 6

Understanding Data & Data Types

Estimated time: 1.5–2 hours

What You Will Learn

Understand what data really is and why every application revolves around it
See how a spreadsheet is the perfect mental model for understanding database tables
Learn the most common SQL data types — strings, numbers, decimals, booleans, dates, and more
Understand why choosing the right data type matters for accuracy, performance, and safety
Learn what NULL is, how it differs from zero and empty strings, and how to handle it
Avoid the most common beginner mistakes when designing database columns
Compare SQL data types to the Java types you already know from earlier lessons

1. Everything Is Data

Welcome to the Understanding Databases track! This is a new chapter in your journey, and it is going to change how you think about software. In the previous tracks, you built web pages with HTML and CSS, added interactivity with JavaScript, created a REST API with Spring Boot, and even connected your contact form to a SQLite database. You have already accomplished something remarkable. Now it is time to go deeper into the thing that sits at the heart of every application you have ever used: data.

Think about every application on your phone right now. What does Instagram actually do? It stores photos, captions, usernames, like counts, comments, and follower lists. What does a banking app do? It stores account numbers, balances, transaction histories, and dates. What does Google Maps do? It stores addresses, coordinates, business names, hours of operation, and reviews. Strip away the beautiful interfaces, the animations, and the clever features, and every application is doing the same three things: storing data, moving data, and displaying data.

That is not an oversimplification. That is genuinely how professional developers think about software. When a software team sits down to build a new feature, the very first question they ask is not "What should the button look like?" or "What color should the background be?" The first question is: "What data do we need?" Once you know what data you need, the rest of the application — the API endpoints, the frontend pages, the user interactions — flows naturally from that decision.

You already experienced this firsthand. In Lesson 13, when you built the contact form API, the first thing you did was define the ContactMessage class with fields for id, name, email, and message. That was you defining your data. Everything else — the controller, the endpoints, the JSON responses — was built around that data definition. In Lesson 14, when you added SQLite, you were solving a data problem: how to make the data persist after the server restarts.

In this track, you are going to learn how databases work from the ground up. Not just how to use them (you already did that with JPA and Hibernate), but how to think about them. How to design good database tables. How to choose the right types for your data. How to write SQL queries that ask exactly the right questions. By the end of this track, you will understand databases deeply enough to design and build the data layer for any application.

Let us start at the very beginning: what is data, and how do we organize it?

2. Databases Are Spreadsheets with Rules

If you have ever used Google Sheets or Microsoft Excel, you already understand 80% of how a database works. That might sound surprising, but it is true. A spreadsheet and a database table are structurally almost identical. The difference is that a database adds rules to keep the data clean, consistent, and reliable.

Let us start with what they have in common. Open a spreadsheet in your mind (or open a real one if you want to follow along). Imagine you are tracking contact form submissions. You would set it up like this:

id	name	email	message	submitted_at
1	Maria Garcia	maria@example.com	Love the site! Great work.	2025-03-15 10:30:00
2	James Wilson	james.w@email.com	How do I sign up for classes?	2025-03-15 14:22:00
3	Aisha Johnson	aisha.j@example.com	I would like to volunteer.	2025-03-16 09:05:00

Look at that table. Every database concept has a spreadsheet equivalent:

Columns are called fields (or attributes) in database language. Each column holds one specific type of information — the person's name, their email, their message. Just like in a spreadsheet, you label each column at the top so everyone knows what goes there.
Rows are called records (or entries). Each row represents one complete item — one contact form submission, one user, one product. When Maria Garcia submits the contact form, her submission becomes one row in the table.
Cells are called values. Each cell holds one specific piece of data — the intersection of a column and a row. Maria's name, Maria's email, Maria's message — each of those is one value.

The Spreadsheet-to-Database Dictionary

Spreadsheet = Table. Column header = Field / Column. Row = Record. Cell = Value. If you can read a spreadsheet, you can understand a database table. They are the same idea.

So if they are so similar, why do we need databases at all? Why not just use a spreadsheet? Great question. The answer is rules. A spreadsheet lets you put anything anywhere. You could type a phone number in the email column. You could leave the name blank. You could type "yesterday" in a date column. A spreadsheet does not care. It accepts whatever you give it.

A database is different. A database lets you define rules about what kind of data each column can hold. You can say: "The email column must always contain text, and it can never be empty." You can say: "The id column must contain a whole number, and every value must be unique." You can say: "The submitted_at column must contain a valid date and time." These rules are called constraints, and they are one of the most powerful features of a database. They prevent bad data from getting in, which means the data you get out is reliable.

Think about it this way: a spreadsheet is like a notebook where you can write anything. A database is like a well-designed form with labeled boxes, dropdown menus, and validation — it guides you to enter the right kind of data in the right place.

And the most fundamental rule you set for each column? Its data type.

3. Data Types: Telling the Database What to Expect

When you create a table in a database, you must tell the database what type of data each column will hold. Will it hold text? Numbers? Dates? True/false values? This is called the column's data type, and it is one of the most important decisions you make when designing a database.

If you completed the Java lessons, this concept should feel familiar. In Java, when you declare a variable, you specify its type: String name, int age, double price, boolean isActive. SQL data types work the same way — they tell the database what kind of value a column will store. The names are a little different, but the ideas are the same ones you already know.

Let us walk through each major data type, one at a time, with plenty of examples.

3.1 — Text: VARCHAR, CHAR, and TEXT

Text is probably the most common type of data in any application. Names, email addresses, descriptions, messages, addresses, usernames — all of these are text. In SQL, there are several types for storing text, and the most important one is VARCHAR.

VARCHAR stands for "variable-length character." It stores text up to a maximum length that you specify. For example, VARCHAR(100) means "this column can hold up to 100 characters of text." If someone enters a name that is only 10 characters long, the database only uses the space it needs — it does not waste space padding out to 100 characters. That is what "variable-length" means.

You might wonder: why set a limit at all? Why not just say "this column holds text" and leave it at that? The answer is protection. If you do not set a limit, someone (or some program) could submit a "name" that is 50 million characters long, consuming massive amounts of storage and potentially crashing your application. Setting a reasonable limit like VARCHAR(100) for a name or VARCHAR(255) for an email protects your database from unreasonable data.

CHAR is VARCHAR's stricter cousin. CHAR(10) means "this column always stores exactly 10 characters." If the actual text is shorter, the database pads it with spaces. CHAR is useful for fixed-length codes like state abbreviations (CHAR(2) for "MI", "OH", "CA") or zip codes (CHAR(5) for "48912"). But for most text, VARCHAR is the better choice because real-world text varies in length.

TEXT is for large, unrestricted text. It can hold thousands or even millions of characters. Use TEXT for things like blog post bodies, long descriptions, or message content where you do not want to impose a character limit. The trade-off is that TEXT columns are slightly less efficient for the database to search and sort compared to VARCHAR.

-- Text type examples in a CREATE TABLE statement
CREATE TABLE users (
    username    VARCHAR(50),     -- Up to 50 characters (usernames are short)
    email       VARCHAR(255),    -- Up to 255 characters (standard for emails)
    state_code  CHAR(2),         -- Exactly 2 characters ("MI", "OH", "CA")
    bio         TEXT             -- Unlimited length (user biographies can be long)
);

Java comparison: In Java, all of these map to the String type. Java's String does not have a built-in length limit (it can hold billions of characters), so you handle length validation in your code. In SQL, the type itself enforces the length limit. When you used private String name; in your ContactMessage class in Lesson 13, the database equivalent would be something like name VARCHAR(100).

3.2 — Whole Numbers: INT, BIGINT, and SMALLINT

Whole numbers (numbers without a decimal point) are used constantly in databases. User IDs, ages, quantities, counts, years — all of these are whole numbers. SQL gives you several integer types, and the difference between them is how big the numbers can get.

INT (also called INTEGER) is the standard choice. It can hold values from about −2.1 billion to +2.1 billion. That is big enough for almost anything: user IDs, product quantities, page view counts, ages, years. When in doubt, use INT.

BIGINT is for truly enormous numbers. It can hold values up to about 9.2 quintillion (that is 9,200,000,000,000,000,000). You would use BIGINT for things like unique IDs on platforms with billions of records (think Twitter's tweet IDs or Facebook's user IDs), or for storing very large counts like total bytes transferred across a network.

SMALLINT is for smaller numbers, typically up to about 32,767. It uses less storage space than INT, so it is efficient for things like age (no one lives to 32,767), rating scores (1 through 5), or small quantities. If you know your numbers will always be small, SMALLINT saves a tiny bit of space — which adds up when you have millions of rows.

SQL Type	Range	Storage Size	Good For
`SMALLINT`	−32,768 to 32,767	2 bytes	Age, ratings, small counts
`INT`	−2.1 billion to 2.1 billion	4 bytes	IDs, quantities, years, most numbers
`BIGINT`	−9.2 quintillion to 9.2 quintillion	8 bytes	Huge IDs, massive counts

-- Integer type examples
CREATE TABLE products (
    id              INT,           -- Product ID (standard integer)
    quantity        INT,           -- How many are in stock
    rating          SMALLINT,      -- Star rating 1-5 (small numbers)
    total_views     BIGINT         -- Could grow very large on popular sites
);

Java comparison: SQL's SMALLINT maps to Java's short. SQL's INT maps to Java's int. SQL's BIGINT maps to Java's long. Remember the private Long id; field in your ContactMessage class? That used Java's Long (the wrapper class for long), which corresponds to SQL's BIGINT. The same concept, just different names in different languages.

3.3 — Decimal Numbers: DECIMAL, FLOAT, and DOUBLE

Sometimes you need numbers with decimal points: prices ($19.99), temperatures (72.5), GPS coordinates (42.7325), percentages (85.7%). SQL has several types for decimal numbers, and choosing the right one really matters. This is one of the most important sections in this entire lesson, so please read it carefully.

DECIMAL (also called NUMERIC) stores exact decimal numbers. When you write DECIMAL(10, 2), you are saying: "This number has up to 10 total digits, and 2 of them come after the decimal point." So DECIMAL(10, 2) can store values like 19.99, 1234567.89, or 0.50. The key word here is exact. If you store 19.99 in a DECIMAL column, you get back exactly 19.99. Not 19.989999999. Not 19.990000001. Exactly 19.99.

FLOAT and DOUBLE store approximate decimal numbers. They use a clever encoding called "floating point" that lets them represent an enormous range of values (from incredibly tiny to astronomically large), but at the cost of precision. The computer stores these numbers in binary (base 2), and some decimal fractions — like 0.1 — cannot be represented exactly in binary. This leads to tiny rounding errors that accumulate over time.

This brings us to one of the most important rules in all of software development:

Never use FLOAT or DOUBLE for money! Floating-point numbers have tiny rounding errors. If you store $19.99 as a FLOAT, the database might actually store it as $19.989999999999998. This might seem harmless, but those tiny errors add up. If you are processing thousands of financial transactions, FLOAT rounding errors can cause your books to be off by real dollars. Banks, stores, and every serious financial application use DECIMAL for money. Always. No exceptions.

So when should you use FLOAT or DOUBLE? They are fine for scientific measurements, sensor readings, GPS coordinates, and other values where tiny imprecision is acceptable. A temperature of 72.500000001 degrees is close enough. But $72.500000001 in someone's bank account is not close enough.

-- Decimal type examples
CREATE TABLE order_items (
    price       DECIMAL(10, 2),  -- Exact: $19.99 stays $19.99 (USE THIS FOR MONEY!)
    tax_rate    DECIMAL(5, 4),   -- Exact: 0.0625 (6.25% tax rate)
    weight_kg   FLOAT,           -- Approximate: 2.345 kg (close enough for shipping)
    latitude    DOUBLE           -- Approximate: 42.732535 (GPS coordinates)
);

SQL Type	Precision	Good For	Bad For
`DECIMAL(p, s)`	Exact	Money, financial data, tax rates	Nothing — it is always safe, just slightly slower
`FLOAT`	Approximate (~7 digits)	Scientific data, sensor readings	Money, anything requiring exactness
`DOUBLE`	Approximate (~15 digits)	GPS coordinates, complex calculations	Money, anything requiring exactness

Java comparison: SQL's FLOAT maps to Java's float. SQL's DOUBLE maps to Java's double. SQL's DECIMAL maps to Java's BigDecimal — a special class that handles exact decimal arithmetic. In Java, you would use BigDecimal for financial calculations for the exact same reason: float and double have rounding errors that are unacceptable for money.

3.4 — True or False: BOOLEAN

Sometimes data is a simple yes-or-no question. Is this user active? Has this order been shipped? Is this email address verified? Has the user agreed to the terms of service? These are all boolean values — they can only be TRUE or FALSE.

In SQL, the BOOLEAN type stores exactly that: true or false. Some databases (like MySQL) actually store booleans as tiny integers where 0 means false and 1 means true. Other databases (like PostgreSQL) have a proper BOOLEAN type. SQLite, the database you used in Lesson 14, stores booleans as integers (0 or 1). But regardless of how the database stores it internally, the concept is the same: the value is either true or false.

-- Boolean type examples
CREATE TABLE users (
    id              INT,
    username        VARCHAR(50),
    email           VARCHAR(255),
    is_active       BOOLEAN,        -- Is this account currently active?
    email_verified  BOOLEAN,        -- Has the user confirmed their email?
    is_admin        BOOLEAN         -- Does this user have admin privileges?
);

Java comparison: SQL's BOOLEAN maps directly to Java's boolean (or its wrapper class Boolean). Same concept, same name, same idea. In Java you would write private boolean isActive = true; and in SQL you would store TRUE or FALSE (or 1 / 0) in a BOOLEAN column.

3.5 — Dates and Times: DATE, DATETIME, and TIMESTAMP

Dates and times are everywhere in applications. When was this account created? When does the event start? When was the last login? When was this order placed? SQL provides several types for storing dates and times, and understanding the differences will save you many headaches.

DATE stores a calendar date with no time information. It holds a year, month, and day. The standard format is YYYY-MM-DD. For example: 2025-03-15 means March 15, 2025. Use DATE when you only care about the day, not the specific time — like a person's birthday, a project deadline, or an event date.

DATETIME (sometimes called TIMESTAMP, depending on the database) stores both a date and a time. The standard format is YYYY-MM-DD HH:MM:SS. For example: 2025-03-15 14:30:00 means March 15, 2025 at 2:30 PM. Use DATETIME when the exact time matters — like when an order was placed, when a user logged in, or when a message was submitted.

TIMESTAMP is similar to DATETIME but is often timezone-aware. It stores the date and time relative to UTC (Coordinated Universal Time), which is the global standard. This is important for applications used by people in different time zones. If someone in Lansing, Michigan (Eastern Time) submits a form at 2:00 PM, and someone in Los Angeles (Pacific Time) views it, the timestamp ensures both people see the correct time relative to their own location.

-- Date and time type examples
CREATE TABLE events (
    id              INT,
    title           VARCHAR(200),
    event_date      DATE,           -- 2025-03-15 (just the date, no time)
    start_time      DATETIME,       -- 2025-03-15 14:30:00 (date and time)
    created_at      TIMESTAMP       -- Automatically set when the row is created
);

-- Example values:
-- event_date:  '2025-07-04'
-- start_time:  '2025-07-04 18:00:00'
-- created_at:  '2025-03-15 14:30:22'

Best practice: Always store dates in the standard YYYY-MM-DD format. Do not store them as text like "March 15, 2025" or "03/15/2025". The standard format lets the database sort dates correctly (January comes before February, 2024 comes before 2025) and perform date calculations (like "find all orders from the last 30 days"). If you store dates as plain text, the database cannot do any of this.

Java comparison: SQL's DATE maps to Java's LocalDate. SQL's DATETIME maps to Java's LocalDateTime. SQL's TIMESTAMP maps to Java's Instant or ZonedDateTime. If you used JPA in Lesson 14, you might remember annotating date fields with @Temporal or using LocalDateTime — those are the Java equivalents of what the database stores.

3.6 — Binary Data: BLOB

BLOB stands for Binary Large Object. It is used to store raw binary data — things like images, PDF files, audio clips, or any other kind of file. Instead of storing text or numbers, a BLOB column stores the raw bytes of a file directly in the database.

In practice, most modern applications do not store large files directly in the database. Instead, they store the files on a file server or cloud storage service (like Amazon S3) and just store the URL or file path in the database as a VARCHAR. This approach is more efficient because databases are optimized for structured data (text, numbers, dates), not for serving large files.

That said, BLOBs are sometimes used for small binary data like user profile pictures (thumbnails), digital signatures, or encrypted tokens. You should know that BLOB exists, but you will not use it very often in typical web applications.

Rule of thumb: If the file is small (under a few kilobytes), storing it as a BLOB in the database might be fine. If the file is large (images, videos, documents), store the file on a file server and save just the URL in the database as a VARCHAR.

3.7 — The Complete Data Type Reference

Here is a summary table that shows every data type we covered, what it stores, its Java equivalent, and when to use it. Bookmark this section — you will come back to it often as you design database tables.

SQL Type	What It Stores	Java Equivalent	Example Value
`VARCHAR(n)`	Text up to n characters	`String`	'Maria Garcia'
`CHAR(n)`	Fixed-length text of exactly n characters	`String`	'MI'
`TEXT`	Long, unlimited text	`String`	'A very long blog post...'
`SMALLINT`	Small whole numbers (−32K to 32K)	`short`	25
`INT`	Whole numbers (−2.1B to 2.1B)	`int`	48912
`BIGINT`	Very large whole numbers	`long`	9200000000000
`DECIMAL(p, s)`	Exact decimal numbers	`BigDecimal`	19.99
`FLOAT`	Approximate decimals (~7 digits)	`float`	72.5
`DOUBLE`	Approximate decimals (~15 digits)	`double`	42.732535
`BOOLEAN`	True or false	`boolean`	TRUE
`DATE`	Calendar date (no time)	`LocalDate`	'2025-03-15'
`DATETIME`	Date and time	`LocalDateTime`	'2025-03-15 14:30:00'
`TIMESTAMP`	Date and time (timezone-aware)	`Instant`	'2025-03-15 18:30:00+00'
`BLOB`	Raw binary data (files, images)	`byte[]`	(binary data)

4. Why Data Types Matter

You might be thinking: "Okay, I get it. Numbers go in number columns and text goes in text columns. But does it really matter that much? Can I just make everything a VARCHAR and move on?" This is a totally fair question, and the answer is: no, you should not do that. Choosing the right data type matters enormously, for four important reasons.

4.1 — Data Integrity

The data type acts as a first line of defense against bad data. If you define a column as INT, the database will refuse to accept the value "hello" in that column. If you define a column as DATE, the database will reject "banana" because it is not a valid date. This means your data stays clean and reliable automatically, without you having to write validation code for every single value.

Imagine if your age column was a VARCHAR. Someone could enter "twenty-five" instead of 25. Someone else could enter "old." Another person could enter "999." With an INT column, the database only accepts actual numbers. With a SMALLINT column, it only accepts numbers in the valid range. The data type enforces the rules for you.

4.2 — Storage Efficiency

Different data types use different amounts of storage space. A SMALLINT uses 2 bytes. An INT uses 4 bytes. A BIGINT uses 8 bytes. If you have a column for age and you use BIGINT instead of SMALLINT, you are using four times more storage than you need — for every single row in the table. That might not matter when you have 100 rows, but when you have 100 million rows, the wasted space adds up to hundreds of megabytes.

Storage efficiency also affects speed. The database has to read data from disk into memory to work with it. Smaller data means fewer disk reads, which means faster queries. Choosing appropriate types is not just about saving disk space — it is about making your application faster.

4.3 — Sorting and Comparisons

Data types determine how values are sorted and compared. This is a subtle but critical point. Consider the numbers 1, 2, 10, and 20. If you store them as integers, the database sorts them correctly: 1, 2, 10, 20. But if you store them as text (VARCHAR), the database sorts them alphabetically: "1", "10", "2", "20". That is because in alphabetical sorting, "10" comes before "2" (just like "ba" comes before "c"). This can cause extremely confusing bugs in your application.

The same problem applies to dates. If you store dates as text, the database cannot compare them correctly. Is "March 15, 2025" before or after "February 28, 2025"? With text, the database would sort by the first letter — "F" comes before "M" — which gives the wrong answer. With a proper DATE type, the database understands the calendar and sorts correctly.

4.4 — Validation and Business Logic

When you choose the right data type, you can take advantage of built-in database functions. You can use SUM() and AVG() on number columns to calculate totals and averages. You can use date functions to find "all orders from the last 7 days" or "all users who signed up this month." You can use comparison operators to find "all products priced under $20.00." None of these operations work correctly if you store everything as text.

Why Types Matter — The Bottom Line

Choosing the right data type gives you four things: data integrity (bad data is rejected automatically), storage efficiency (your database uses space wisely), correct sorting (numbers and dates sort properly), and powerful operations (math, date calculations, and comparisons work correctly). Choosing the wrong type gives you bugs, wasted space, and data you cannot trust.

5. Understanding NULL — The Absence of Data

There is one more concept you need to understand before you can design database tables confidently, and it trips up almost every beginner. It is called NULL.

NULL means "no value." Not zero. Not an empty string. Not false. No value at all. It means the data is missing, unknown, or not applicable. Think of it as a blank cell in a spreadsheet — not a cell with a zero or a space in it, but a truly empty cell where no one has entered anything.

Here are some real-world examples of when data is legitimately NULL:

A user has not set a profile picture yet — their profile_picture_url is NULL.
An order has been placed but not yet shipped — the shipped_date is NULL.
A student has enrolled in a class but has not received a grade yet — their grade is NULL.
A contact form submission did not include a phone number (because the field was optional) — the phone is NULL.

NULL Is Not Zero and Not Empty

This is the part that confuses beginners, so let us be very clear about it. These are three completely different things:

Value	Meaning	Example
`0`	The number zero — a real value that means "none" or "nothing counted"	A shopping cart with 0 items (we know the count; it is zero)
`''` (empty string)	A text value that happens to have no characters	A user who cleared their bio field (they intentionally left it blank)
`NULL`	No value at all — unknown, missing, or not yet provided	A user who never visited the bio page (we do not know their bio)

The difference matters. If a patient's blood pressure is 0, that is a medical emergency. If it is NULL, it just means the nurse has not measured it yet. Very different situations. If an employee's salary is 0, that means they are a volunteer. If it is NULL, it means we do not have their salary information on file. Same data column, completely different meanings.

IS NULL vs. = NULL

Here is a quirk of SQL that catches everyone the first time: you cannot use = to check for NULL. In SQL, NULL = NULL is not true. It is not false either. It is unknown. This sounds strange, but it makes logical sense: if two values are both unknown, you cannot say they are equal — they could be anything.

Instead, SQL provides special operators: IS NULL and IS NOT NULL.

-- WRONG: This will NOT find rows where shipped_date is NULL
SELECT * FROM orders WHERE shipped_date = NULL;

-- CORRECT: Use IS NULL instead
SELECT * FROM orders WHERE shipped_date IS NULL;

-- Find all orders that HAVE been shipped (shipped_date is not NULL)
SELECT * FROM orders WHERE shipped_date IS NOT NULL;

Remember: Never use = NULL or != NULL in SQL. Always use IS NULL and IS NOT NULL. This is one of the most common SQL mistakes, and every programmer makes it at least once. Now you know to watch out for it.

Nullable vs. NOT NULL Columns

When you create a table, you can decide whether each column is allowed to contain NULL values or not. If a column is NOT NULL, the database will refuse to save a row that does not include a value for that column. If a column is nullable (the default in most databases), NULL values are allowed.

How do you decide? Think about whether the data is required or optional. A contact form must have a name and an email — those should be NOT NULL. But a phone number might be optional — that can be nullable. A user account must have a username — NOT NULL. But a middle name is optional — nullable.

-- NOT NULL means the column MUST have a value (required)
-- No constraint means the column CAN be NULL (optional)

CREATE TABLE contact_messages (
    id          INT          NOT NULL,  -- Required: every message needs an ID
    name        VARCHAR(100) NOT NULL,  -- Required: we must know who sent it
    email       VARCHAR(255) NOT NULL,  -- Required: we need a way to reply
    phone       VARCHAR(20),            -- Optional: not everyone shares their phone
    message     TEXT         NOT NULL,  -- Required: the message itself
    submitted_at TIMESTAMP  NOT NULL   -- Required: we record when it was sent
);

Best practice: Default to NOT NULL unless you have a specific reason for the data to be optional. It is easier to make a column nullable later than to deal with unexpected NULLs causing bugs in your application. Required data should always be NOT NULL.

6. Choosing the Right Type: A Practical Walkthrough

Theory is great, but let us put it into practice. Remember the contact form you built in Lesson 13? Let us walk through every field and justify the data type choice, just like a professional developer would do when designing a database table. This is the thought process you should follow every time you create a new table.

Here is the original contact form from Lesson 13. It had fields for name, email, and message, and the ContactMessage class also had an id field. Let us now design the database table properly, thinking carefully about each type:

CREATE TABLE contact_messages (
    id           INTEGER      PRIMARY KEY AUTOINCREMENT,
    name         VARCHAR(100) NOT NULL,
    email        VARCHAR(255) NOT NULL,
    message      TEXT         NOT NULL,
    submitted_at DATETIME     NOT NULL DEFAULT CURRENT_TIMESTAMP
);

Now let us justify every single choice:

id INTEGER PRIMARY KEY AUTOINCREMENT — The ID is a whole number that uniquely identifies each message. We use INTEGER because IDs are always whole numbers. PRIMARY KEY means this value must be unique for each row (no two messages can have the same ID). AUTOINCREMENT means the database will automatically assign the next number every time a new row is added — we do not have to manage this ourselves. Remember how in Lesson 13 you had a nextId counter that you incremented manually? AUTOINCREMENT does that automatically and more reliably.

name VARCHAR(100) NOT NULL — Names are text, so we use VARCHAR. We set a limit of 100 characters because no reasonable name is longer than that. (The longest name ever recorded is about 50 characters.) NOT NULL because every message must come from someone — an anonymous message without a name is not useful for a contact form.

email VARCHAR(255) NOT NULL — Email addresses are text. We use VARCHAR(255) because the email specification (RFC 5321) says that email addresses can be up to 254 characters long. We round up to 255, which is a common convention. NOT NULL because the whole point of a contact form is to let you reply, and you need an email address for that.

message TEXT NOT NULL — We use TEXT instead of VARCHAR because people might write long messages and we do not want to cut them off. There is no meaningful maximum length for a free-form message, so TEXT is the right choice. NOT NULL because a contact form submission without a message is pointless — what would you reply to?

submitted_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP — We need to know when each message was submitted, both for record-keeping and for sorting messages chronologically. DATETIME stores the date and time. NOT NULL because every message is submitted at a specific moment. DEFAULT CURRENT_TIMESTAMP means the database will automatically fill in the current date and time when a new row is inserted, so the application does not have to provide this value explicitly.

The Type Selection Checklist

For every column, ask yourself four questions:

What kind of data is it? Text, number, date, true/false, or binary?
How big can it get? Short text (VARCHAR), long text (TEXT), small number (SMALLINT), big number (BIGINT)?
Does it need to be exact? Use DECIMAL for money, FLOAT/DOUBLE for approximate measurements.
Is it required or optional? NOT NULL for required, nullable for optional.

Let us practice with one more example. Imagine you are building a table for an online store's product catalog:

CREATE TABLE products (
    id           INTEGER       PRIMARY KEY AUTOINCREMENT,
    name         VARCHAR(200)  NOT NULL,       -- Product name: "Blue Running Shoes"
    description  TEXT,                          -- Optional detailed description
    price        DECIMAL(10,2) NOT NULL,        -- $49.99 — DECIMAL, never FLOAT!
    quantity     INT           NOT NULL DEFAULT 0, -- How many in stock
    weight_kg    FLOAT,                         -- 0.45 kg — approximate is fine
    is_active    BOOLEAN       NOT NULL DEFAULT TRUE,  -- Is product visible?
    created_at   DATETIME      NOT NULL DEFAULT CURRENT_TIMESTAMP
);

Notice the thought behind each choice. The product name has a generous 200-character limit. The description uses TEXT because some products have very long descriptions, and it is nullable because not every product needs one. The price uses DECIMAL because money must be exact. The quantity uses INT with a default of 0. The weight uses FLOAT because approximate precision is fine for shipping calculations. And the is_active flag uses BOOLEAN so products can be hidden without being deleted.

7. Common Mistakes Beginners Make

Every developer makes these mistakes at least once. Learning about them now will save you hours of debugging later. Consider this section your "learn from other people's pain" guide.

Mistake 1: Using VARCHAR for Everything

This is the most common beginner mistake. "Why bother learning all these types? I will just make every column VARCHAR(255) and store everything as text." This seems convenient, but it causes serious problems:

No automatic validation. If your age column is a VARCHAR, the database happily accepts "twenty-five", "old", "-7", or "banana" as valid ages.
Wrong sorting. Numbers stored as text sort alphabetically: "1", "10", "2", "20" instead of 1, 2, 10, 20. Dates stored as text have the same problem.
No math. You cannot calculate the average age, the total sales, or the sum of quantities if they are stored as text. You would have to convert them every time, which is slow and error-prone.
Wasted space. The number 42 takes 4 bytes as an INT but 2+ bytes per character as VARCHAR, plus overhead. Multiply this by millions of rows.

The rule is simple: Use the type that matches the data. Numbers get number types. Dates get date types. True/false values get BOOLEAN. Text gets VARCHAR or TEXT. Let the database do its job.

Mistake 2: Using FLOAT for Money

We covered this earlier, but it is worth repeating because the consequences are so severe. Here is what happens when you use FLOAT for money:

-- BAD: Using FLOAT for money
CREATE TABLE bad_orders (
    total FLOAT  -- DO NOT DO THIS!
);

-- If you store 0.1 + 0.2 in a FLOAT column, you might get:
-- 0.30000000000000004 instead of 0.3

-- GOOD: Using DECIMAL for money
CREATE TABLE good_orders (
    total DECIMAL(10, 2)  -- Always exact: 0.1 + 0.2 = 0.30
);

A single rounding error of a fraction of a cent might seem harmless. But if your application processes 10,000 transactions per day, those tiny errors accumulate. Over a year, your accounting could be off by hundreds or thousands of dollars. Financial auditors and regulatory agencies do not accept "floating-point rounding" as an excuse. Use DECIMAL for money. Always.

Mistake 3: No Length Limits on VARCHAR

Some databases let you create a VARCHAR column without specifying a maximum length. While this might work, it removes an important safety net. Without a length limit, a malicious user (or a buggy program) could submit a "name" that is 10 million characters long, consuming enormous amounts of storage and potentially causing performance problems.

Always specify a reasonable maximum length for VARCHAR columns:

VARCHAR(50) for usernames
VARCHAR(100) for names
VARCHAR(255) for email addresses
VARCHAR(500) for short descriptions or titles
TEXT when you genuinely need unlimited length

Mistake 4: Storing Dates as Strings

Never store dates as VARCHAR. You might be tempted to store dates like "March 15, 2025" or "03/15/2025" because they look readable. But string dates cannot be sorted correctly, cannot be compared reliably (is "03/15/2025" before or after "12/01/2024"?), and cannot be used with date functions like "find all records from the last 30 days." Use DATE or DATETIME types, and always use the YYYY-MM-DD format.

Mistake 5: Making Everything Nullable

If you do not think carefully about which columns should allow NULL, you end up with a database full of missing data. Then your application has to check for NULL everywhere, your queries become more complex, and your reports have gaps. Decide upfront which fields are required (NOT NULL) and which are truly optional (nullable). When in doubt, make it NOT NULL.

The Five Deadly Sins of Data Types

Avoid these five mistakes and you will be ahead of most beginner developers:

Using VARCHAR for everything (use the correct type instead)
Using FLOAT for money (use DECIMAL instead)
Skipping length limits on VARCHAR (always set a reasonable max)
Storing dates as strings (use DATE or DATETIME instead)
Making everything nullable (use NOT NULL for required fields)

Lesson Summary

You just covered a lot of ground in this lesson, and every bit of it will serve you well as you continue through the Understanding Databases track. Let us recap the key ideas.

Everything is data. Every application you use — from social media to banking to maps — is fundamentally about storing, moving, and displaying data. When you design software, the data comes first. Everything else is built around it.

Databases are spreadsheets with rules. A database table is just like a spreadsheet: columns define the fields, rows are individual records, and cells hold values. The difference is that a database enforces rules about what kind of data can go in each column, keeping your data clean and reliable.

Data types tell the database what to expect. You learned about text types (VARCHAR, CHAR, TEXT), integer types (SMALLINT, INT, BIGINT), decimal types (DECIMAL, FLOAT, DOUBLE), BOOLEAN for true/false, DATE and DATETIME for dates and times, and BLOB for binary data. Each type has a specific purpose, and choosing the right one matters for integrity, efficiency, sorting, and functionality.

Never use FLOAT for money. This is worth repeating one more time. Use DECIMAL for any financial data. FLOAT and DOUBLE have tiny rounding errors that accumulate and cause real problems.

NULL is not zero and not empty. NULL means "no value" — the data is missing or unknown. Always use IS NULL and IS NOT NULL in SQL queries, never = NULL. Use NOT NULL constraints for required columns.

Choosing the right type is a skill. For every column, ask: What kind of data is it? How big can it get? Does it need to be exact? Is it required or optional? These four questions will guide you to the right type every time.

In the next lesson, you will learn about relational databases — how tables connect to each other using relationships, and why this is one of the most powerful ideas in all of computer science. You will see how a users table connects to an orders table, how one piece of data can reference another, and how this structure lets you model the real world in a database. See you there!

Knowledge Check

1. You need to store the price of a product (like $29.99) in a database column. Which data type should you use?

FLOAT VARCHAR(10) DECIMAL(10, 2)

Correct! DECIMAL(10, 2) is the right choice for money because it stores exact decimal values. FLOAT would introduce tiny rounding errors (0.30000000000000004 instead of 0.30), which is unacceptable for financial data. VARCHAR would store the price as text, preventing you from doing math like calculating totals or averages. Always use DECIMAL for money.

2. What is the correct way to find all rows where the phone column has no value in SQL?

SELECT * FROM users WHERE phone = NULL SELECT * FROM users WHERE phone IS NULL SELECT * FROM users WHERE phone = ''

Correct! In SQL, you must use IS NULL to check for NULL values, not = NULL. The expression phone = NULL always evaluates to unknown (not true), so it would never return any rows. And phone = '' checks for an empty string, which is different from NULL — an empty string is a value (just an empty one), while NULL means no value at all.

3. Why is storing numbers as VARCHAR (text) a bad idea?

Because VARCHAR columns cannot hold numbers at all Because text-stored numbers sort incorrectly (1, 10, 2 instead of 1, 2, 10) and cannot be used in calculations Because VARCHAR is slower than INT in all situations

Correct! When numbers are stored as text, they sort alphabetically instead of numerically. This means "10" comes before "2" (because "1" comes before "2" alphabetically), and "9" comes after "80". Additionally, you cannot use math functions like SUM, AVG, or comparisons like "greater than 100" on text columns without converting them first. VARCHAR can hold digit characters (option A is wrong), and while INT is generally more efficient, that is not the primary problem (option C is incomplete).

Finished this lesson?

← Previous: What's Next Next: Relational Databases →