|
Custom Search
| |
|
How to Design Relational Database The first thing comes to my mind when I think about database design is normalization. It's easy to talk about it in theory but very confusing for newbies when applying to reality. My personal experience about learning database design is that you can't bypass the theory part of it but you really don't care or consider about which normalization form (First Normal Form, Second Normal Form, and Third Normal Form) you are at when you have gathered some experience and really get down to relational database design in details. So let's start the database design theories. As I'm not an academic type of person, I will want to talk about things in plain English. Components in a relational database model A relational database has these four basic components:
Database Normalization Normalization is the process of breaking down our data and store them in different tables consistently. There are mainly three levels of normalization (first, second, and third normal form). Further normal forms are available but not commonly used. You can ignore them completely. Before we move onto the theories about normal forms, let's assume you are the company owner of Northwind Trader and you buy and sell goods. How would you store your company sales data in the old days? By intuition, you would want to store data separately into different draws or folders. Products data in one draw and suppliers data in another draw. Naturally you want to relate products to suppliers to find out who supplied what. This type of thinking is the basis of relational database design. First Normal Form (1st NF) In theory, this is the first thing you need to do to break down your data. In this step, we want to achieve the following three goals:
Goal #1: Each row in a table must be uniquely identifiable What are the consequences of unable to identify each row? We could either store the same information twice in the table, or update or delete wrong records as we are not able to distinguish between rows. To uniquely identify a row, we define a single column or a group of columns as primary key. Below is customers table in northwind database. Each customer is assigned a unique value for identification.
Goal #2: Each column must only contain one piece of information. It's easier to explain this requirement by illustration. We have four address data as below:
As we see above, if data is grouped together like this in a single column, it would be impossible to search postcode by country efficiently. To solve this problem, the First Normal Form requires that each of these items must be broken down and stored in its own column. This is known as storing atomic data. One column for street address, one column for City, one column for postcode, and one column for country.
Goal #3: No repeating columns - no two columns should contain the same data. In other words, every column in a row should store data that is different from all other columns. To illustrate this, think of the order_details table that lists products purchased. We could structure a table shown as below:
For the same order (OrderID 10255), the table above stores two products (ProductID = 2 and ProductID2 = 26) in the first row and another two products in the second row. The repeated columns are ProductID vs ProductID2, UnitPrice vs UniPrice2, and Quantity vs Quantity2. This structure presents two problems. First, it's difficult to search for information. Second, it creates redundant columns for the same information. To solve the problem, we split the columns and replace them with multiple rows.
What is functional dependency? We introduce functional dependency here because it's going to be used in the next two normal forms. Functional dependency describes the concept that all other columns in a table must depend completely on the primary key column. For example, the values in the CompanyName column and ContactName column depend completely on the value in the CustomerID column. In other words, if we know CustomerID, then we can tell the customer's company name and contact name. This reads like CustomerID determines CompanyName and ContactName.
The effect of achieving functional dependency is that each related set of data is put into its own table. In other words, each table only represents one subject. It should be noted that functional dependencies are not limited to depending on a single column. We can have a combination of several columns determine other columns in the table. Second Normal Form (2nd NF) For a table to be in the Second Normal Form, it must already be in the First Normal Form. After 1st NF, every table has got a primary key. Sometimes the primary key consists of multiple columns (also known as composite primary key). When this happens, we should pay attention to other columns (non-key columns) in this table to make sure all these other non-key columns must fully depend on the whole primary key. If not, we will have to action on the following three steps:
If we can't find partial dependencies of non-key columns on the composite primary key or the table uses a single column as primary key, we are already at Second Normal Form. The pre-requisite for 2nd Normal Form is that the primary key in the table must consist of multiple columns. 2nd Normal Form (partial dependency) example Table below has a composite primary key that consists of OrderID and ProductID column. The combination of OrderID and ProductID uniquely identifies each row in this table. Note that UnitPrice and Quantity are non-key column and are fully dependent on this composite primary key.
Order_details table: Now, look at OrderDate and RequiredDate column. They are non-key column as well. We notice that:
OrderDate and RequiredDate column are not fully dependent on the combination of OrderID and ProductID, which form the composite primary key. We just discovered a partial dependency. Therefore, this table is not in 2nd NF. Let's remove the partial dependency by splitting this table into the following two tables.
Orders table:
Order_details table: Our new Orders table only stores data related to orders and we created a new table Order_details. If we want to find out order date for order 10248 in Order_details table, we use the relationship between the two tables to work out that information. This relationship is established by OrderID in both tables. This relationship is called foreign key relationship. One-to-Many relationship:
Foreign key relationship (one-to-many relationship): Third Normal Form (3rd NF) The 3rd NF is also related to solving a dependency problem. But unlike 2nd NF, here the dependency is on non-primary key column(s). Third Normal Form requires that if any column that is not DIRECTLY dependent on primary key (either single column primary key or composite primary key), it should be removed and placed into a different table. Table below shows that ProductID is the primary key of the table. SupplierName is dependent on SupplierID column, which in turn depends on the primary key ProductID. This is called transitive dependency. The rule for the 3rd NF requires that each column be dependent on the primary key only, and not any other columns. Here SupplierName clearly violated this rule.
To resolve the problem, we take out SupplierName and put it into a separate suppliers table. See below.
Suppliers table: Now products table only contains SupplierID column which can be linked to suppliers table to get SupplierName.
Products table: One-to-Many relationship:
One-to-Many relationship: The benefits of normalization So far we have covered the three normal forms but we haven't explicitly listed why we need to normalize. Here are the two main benefits you get from normalization. Benefit #1: Reduce data redundancy. Unless absolutely necessary, storing redundant data is a waste of system resources. Nowadays hard disk space is cheap but is not free. More importantly, maintaining very large size of databases requires more work on database administrators and network engineers. Benefit #2: Reduce data inconsistency. The data redundancy issue can further cause date inconsistency issues. As we keep the same piece of data across different locations, we have to make sure that they are kept exactly the same all the time. For example, when one piece of data is updated, the same data in other locations has to be updated as well. The issues of data inconsistency are collectively called Data Anomaly. There are four types of possible errors they could cause. These four types of errors are the causes of data anomaly:
Summary Database normalization is clearly going to make our data management more efficiently. When we look back at what we have discussed so far, we can summarize the key points for the three normal forms. Remembering the following three short sentences can help you quickly recall what each normalization is all about. This was what I did when attending the exam for System Analysis course.
We have now concluded data normalization theory. Next logical step in the learning is understanding what data integrity is and how to use various database features (we use MySQL for illustration) to enforce data integrity - Enforce data integrity by databases constraints. Other tutorials in this category 1. Enforce Data Integrity by Database Constraints |
Copyright © 2024 GeeksEngine.com. All Rights Reserved. This website is hosted by HostGator. No portion may be reproduced without my written permission. Software and hardware names mentioned on this site are registered trademarks of their respective companies. Should any right be infringed, it is totally unintentional. Drop me an email and I will promptly and gladly rectify it. |
Home | Feedback | Terms of Use | Privacy Policy |