Normalize your damn data

Bob stared at his computer screen with a frustrated expression on his face. Months into the project, the customer suddenly wanted to modify and even delete their data. “Why the hell didn’t they mention that up front?” he wondered. Bob had been given a highly visible project and made significant progress tailoring the app to the customer’s demands, but now he was at a breaking point.

The customer, working in a corporate environment, really needed a way to record and report metrics to the executives. The customer stated that they need a way to put in organizational objectives, metric names, and metric values. Simple enough, Bob put all the data in the same data set titled “Objectives”.

Step 1: Create a way to put in organizational objectives. Bob created a text column for organizational objectives

Step 2: Prevent the same objective from being entered twice. Bob constrained the objective column to unique values

Step 3: Associate metrics with objects. Bob had the form reference the objective column to fill in the objective column and used a type column to differentiate between objective and metric

Bob landed on this structure:

The customer noted that there were some typos in the objectives column. Due to the references, Bob had to create a way to update every other record that was impacted. The customer wanted to change individual objectives to completely different meanings sometimes, so he took out the functionality to update every other record.

Bob continued to make hack after hack. Eventually, he would create another dataset that functioned to copy data back and forth.

Afterwards, Bob had a chance encounter with Javier, a preacher of the gospel of normalization. Javier explained to him that there’s a lot of suffering in the world due to data that’s not been normalized. Thus, Javier said unto him:

When normalizing your data, thou shalt:

— Prevent repeated data in your datasets
 — Separate your entities into different datasets
 — Use keys to identify your data
 — Make all attributes (columns) dependent on the primary key

“Of course!” exclaimed Bob. “If I had just separated my objectives and metrics into different datasets to begin with, this application would have been much easier to develop. If I had only done Codd’s will!”

If you have been developing for a non-trivial amount of time, you have probably encountered data that’s not been normalized. We don’t normalize data because the customers love it, we normalize data to make our lives as developers easier. This also results in the ability to create robust reports and queries.

Bob made the same mistake that a lot of other developers make. He was not working within a database though. He was working on a platform that allowed him to create list datasets.

Regardless if you’re a database, back-end, front-end, full-stack, CMS, or platform developer, you should get familiar with Edward Codd’s rules of normalization. Some resources to help:

Fred Coulson Tutorial on Normalization

Steve Litt Normalization Tips

Normalization in DBMS: 1NF, 2NF, 3NF and BCNF in Database

TechNet — Normalization

Thanks for reading, and remember to normalize your data!

Denny Headrick loves to work with and write about software development. You can follow him on Twitter @dennythecoder. If you enjoyed the article, make comments, hit applause, share with others, or quietly smile.

You may also like

  • Use HTML Semantically

    After years of hacks using tables for positioning elements with tables, you may be surprised to find that HTML has become increasingly expressive.

    Read More >
  • Java 8 Lambdas

    In this post, I discuss my brief experience with Java 8's lambdas and thoughts about them.

    Read More >