Custom Search
 


Enforce Data Integrity by Database Constraints


What is data integrity?

Data integrity refers to the accuracy, consistency, and reliability of data that is stored in the database. Both database designers and database developers are responsible for implementing data integrity within one or a set of related databases.

A simple example is that in Northwind categories table, CategoryName must be unique no matter how many records the table holds. If this rule is not enforced, Seafood category could be accidentally stored twice in the table which clearly violates our business rules.

Types of Data Integrity

There are four types of data integrity:

  1. Row integrity
  2. Column integrity
  3. Referential integrity
  4. User-defined integrity

Row integrity

Row integrity refers to the requirement that all rows in a table must have a unique identifier that can be used to tell apart each record. This unique identifier is normally known as Primary Key of the table. A Primary Key can be formed by a single column or a combination of multiple columns.

For example, categories table uses single column CategoryID as the unique identifier.

Single column primary key

Order_details table uses the composition of OrderID and ProductID as its unique identifier. It means that the row with OrderID 10248 and ProductID 11 can only appear once in the table.

Composition of two column primary key

Column integrity

Column integrity refers to the requirement that data stored in a column must adhere to the same format and definition. This includes data type, data length, default value of data, range of possible values, whether duplicate values are allowed, or whether null values are allowed.

For example, LastName in employees table must be varchar, max 20 characters long, default to empty string, and can't be null.

Lastname in employees table

Referential integrity

How do you tell who supplied product Longlife Tofu in products table? Referential integrity guarantees that a supplier exists.

  • You locate the data row Longlife Tofu in products table and find the value in SupplierID column is 4.

    products table:
    SupplierID in products table

  • You then check suppliers table for the record SupplierID 4 and find out its CompanyName is Tokyo Traders.

    suppliers table:
    SupplierID in suppliers table

Tokyo Traders is the supplier for product Longlife Tofu. Here referential integrity guarantees two things:

  1. SupplierID 4 (of Tokyo Traders) in suppliers table can't be deleted or updated because it has been referenced by product Longlife Tofu via the SupplierID column in products table.

  2. None can we add a new product in products table with a SupplierID 30 as this supplier does not exist in suppliers table.

Referential integrity is defined at the database design time and enforced by creating table relationships between tables. After the referential relationship is set up, database engine will follow the two rules stated above to guarantee data integrity. It will raise errors if the rules are violated.

User-defined integrity

Some applications have complex business logic that can't be enforced by defining criteria in the three data integrity types we have discussed so far (row integrity, column integrity, and referential integrity). In this circumstance, we need to implement our own code logic to make sure data is saved accurately and consistently across all business domains. The code logic can be implemented by using database triggers, stored procedures or functions, or by using tools external to the database engine such as embedding non SQL languages (like VBScript or C# in SQL Server) in the database, or by using scripting or programming languages in the middle-tier or front-tier of the application.

Here is an example of a user-defined data integrity.

In our Northwind database, when a customer places an order, we first need to check if this is a new customer to our business. If yes, we add this customer to customers table. We then check if we have enough quantity for each product this customer ordered in stock. If yes, we add this product to order_details table for this order and decrement the quantity for this product in Products table. There are four tables involved in this transaction:

  1. customers
  2. products
  3. orders
  4. order_details

To enforce data integrity, we use a database transaction which involves these four table (customers, products, orders, order_details):

  • We roll back the transaction if inserting the new customer failed in customers table.
  • We roll back the transaction if inserting a new order failed in orders table.
  • We roll back the transaction if inserting the record in order_details table failed.
  • We roll back the transaction if decrementing product quantity failed in products table.

When a transaction is rolled back, the four tables are restored back to their original state (which is prior to the start of the transaction). The logic here is implemented by using our own user-defined data integrity. What tools you use to enforce the data integrity depend on your system requirements.

Data integrity is enforced by database constraints

Database Constraints are declarative integrity rules of defining table structures. They include the following 7 constraint types:

  1. Data type constraint:

    This defines the type of data, data length, and a few other attributes which are specifically associated with the type of data in a column.

  2. Default constraint:

    This defines what value the column should use when no value has been supplied explicitly when inserting a record in the table.

  3. Nullability constraint:

    This defines that if a column is NOT NULL or allow NULL values to be stored in it.

  4. Primary key constraint:

    This is the unique identifier of the table. Each row must have a distinct value. The primary key can be either a sequentially incremented integer number or a natural selection of data that represents what is happening in the real world (e.g. Social Security Number). NULL values are not allowed in primary key values.

  5. Unique constraint:

    This defines that the values in a column must be unique and no duplicates should be stored.

    Sometimes the data in a column must be unique even though the column does not act as Primary Key of the table. For example CategoryName column is unique in categories table but CategoryName is not primary key of the table.

  6. Foreign key constraint:

    This defines how referential integrity is enforced between two tables.

  7. Check constraint:

    This defines a validation rule for the data values in a column so it is a user-defined data integrity constraint. This rule is defined by the user when designing the column in a table. Not every database engine supports check constraints. As of version 5.0, MySQL does not support check constraint. But you can use enum data type or set data type to achieve some of its functionalities that are available in other Relational Database Management Systems (Oracle, SQL Server, etc).

Data integrity type Enforced by database constraint
Row integrity
  • Primary key constraint
  • Unique constraint
Column integrity
  • Foreign key constraint
  • Check constraint
  • Default constraint
  • Data type constraint
  • Nullability constraint
Referential integrity
  • Foreign key constraint
User-defined integrity
  • Check constraint

Use database constraints whenever possible

There are two main reasons why using database constraints is a preferred way of enforcing data integrity.

First, constraints are inherent to the database engine and so use less system resources to perform their dedicated tasks. We resort to external user-defined integrity enforcement only if constraints are not sufficient to do the job properly.

Second, database constraints are always checked by the database engine before insert, update, or delete operation. Invalid operation is cancelled before the operation is undertaken. So they are more reliable and robust for enforcing data integrity.



On next page, we are going to look at How to enforce data type constraint.


Copyright © 2010 GeeksEngine.com. All Rights Reserved.

This website is hosted by LunarPages.

No portion may be reproduced without my written permission. Software and hardware names mentioned on this site are registered trademarks of their respective companies. Should any right be infringed, it is totally unintentional. Drop me an email and I will promptly and gladly rectify it.

 
Home | Kung Fu Timer | Feedback | Terms of Use | Privacy Policy