My personal notes for Learn Database Normalization

A video by Decomplexify

What is Database Normalization?

A good database design can sometimes protect against bad data. The database structure can restrict bad data from being entered, preventing issues and inconsistencies.

This is known as a data integrity failure—the database cannot be correct because it disagrees with itself.

When you normalize a database table, you structure it in such a way that it can’t express redundant information. For example, in a normalized table, you wouldn’t be able to give [a customer] two dates of birth, even if you wanted to. Very broadly, the table can only express one version of the truth.

Normalized tables are:

  • Easier to understand
  • Easier to enhance and extend
  • Protected from insertion anomalies, update anomalies and deletion anomalies.

Normal forms are sets of criteria that we can use to determine how “safely” the data is structured to prevent the problems listed above.

Disclaimer

It’s important to note that this video doesn’t intend to be rigorously and pedantically accurate. Instead, it focuses on practical definitions that are correct most of the time.

Core Concepts

Anomolies

When a database is not normalized, it’s possible to introduce a few types of anomalies to the data integrity:

  • Creation anomaly: We’re unable to insert data because we don’t have all the data needed to insert a row into the database.
  • Upsert anomaly: Updating data in one row creates a logical inconsistency that is incongruent with another.
  • Deletion anomaly: When deleting a row from a table, we lose data structurally unrelated to that row.

Dependencies

It’s also important to understand the idea of a functional dependency, where each value of the attribute/attributes on the left side of the arrow is associated with exactly one value of the attribute on the right side.

For example, {Player_ID, Item_Type} → {Item_Quantity} expresses that each combination of Player_ID and Item_Type is associated with a specific value of Item_Quantity.

There are many other types of dependencies in database, and they’re covered in more detail in the normal forms below.

First Normal Form (1NF)

A table in first normal form must meet a few separate criteria:

  • Row order: There is no right order. Relational databases should not depend on row order. If you find yourself relying on row order to convey meaning, be explicit and add a column for that data instead.
  • Mixing data types: You can’t mix data types within a column. There’s an explicit version with union data types, which all RDMSs won’t allow you to do. There’s also an implicit version of this, where you represent multiple values incorrectly as a more generic type, such as representing numbers as a string and mixing them with string values.
  • Primary key: Every table should have a primary key.
  • Repeating groups: Don’t restrict repeating groups of data items in a single row (or even a single column). Instead, split these into a separate table.

Second Normal Form (2NF)

This is how the non-key columns (non-key attributes) relate to the primary key. Non-key attributes are attributes part of the primary key.

Informally, second normal form says each non-key attribute must depend on the entire primary key.

Third Normal Form (3NF)

The table cannot have any transitive dependencies. This means that a non-key attribute cannot depend on another non-key attribute. In other words:

Every non-key attribute should depend on the key, the whole key, and nothing but the key.

For a stricter version of 3NF, known as Boyce-Codd normal form, you can remove “non-key" from this definition.

Every attribute should depend on the key, the whole key, and nothing but the key.

Fourth Normal Form (4NF)

A multi-value dependency indicates a relationship between three or more attributes where the values for one attribute restrict the values of another attribute to a specific set of possible values.

These are written with a double arrow (↠).

Fourth normal form says that the only kind of non-trivial multivalued dependencies we’re allowed to have in a table must be multivalued dependencies on the key.

Fifth Normal Form (5NF)

If we want to ensure that a table that’s in fourth normal form is also in fifth normal form, we need to ask ourselves can be logically thought of as the result of joining some other tables together.

In other words:

It must not be possible to describe the table as being the logical result of joining some other tables together.