What data integrity is and how to enforce it

Enforce Data Integrity by Database Constraints

What is data integrity?

Data integrity refers to the accuracy, consistency, and reliability of data that is stored in the database. Both database designers and database developers are responsible for implementing data integrity within one or a set of related databases.

A simple example is that in the Northwind categories table, CategoryName must be unique no matter how many records the table holds. If this rule is not enforced, the Seafood category could accidentally be stored twice in the table, which clearly violates our business rules.

Types of Data Integrity

There are four types of data integrity:

Row integrity
Column integrity
Referential integrity
User-defined integrity

Row integrity

Row integrity refers to the requirement that every row in a table must have a unique identifier that distinguishes it from every other row. This unique identifier is normally known as the primary key of the table. A primary key can consist of a single column or a combination of multiple columns.

For example, the categories table uses the single column CategoryID as its unique identifier.

Single column primary key

The order_details table uses the combination of OrderID and ProductID as its unique identifier. This means that the row with OrderID 10248 and ProductID 11 can appear only once in the table.

Composite primary key of two columns
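
The two kinds of primary key can be tried out with a small script. This is only a sketch: it assumes SQLite through Python's built-in sqlite3 module as a stand-in for whatever engine hosts Northwind, and the Quantity values and the try_insert helper are illustrative, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE categories (
        CategoryID   INTEGER PRIMARY KEY,   -- single-column primary key
        CategoryName TEXT NOT NULL
    );
    CREATE TABLE order_details (
        OrderID   INTEGER NOT NULL,
        ProductID INTEGER NOT NULL,
        Quantity  INTEGER NOT NULL,
        PRIMARY KEY (OrderID, ProductID)    -- composite primary key
    );
""")

conn.execute("INSERT INTO order_details VALUES (10248, 11, 12)")
# Same order, different product: the combination is still unique, so this is allowed.
conn.execute("INSERT INTO order_details VALUES (10248, 42, 10)")


def try_insert(order_id, product_id, quantity):
    """Return True if the row was stored, False if the primary key rejected it."""
    try:
        conn.execute(
            "INSERT INTO order_details VALUES (?, ?, ?)",
            (order_id, product_id, quantity),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Repeating OrderID 10248 with ProductID 11 violates the composite key.
duplicate_accepted = try_insert(10248, 11, 5)
row_count = conn.execute("SELECT COUNT(*) FROM order_details").fetchone()[0]
print(duplicate_accepted, row_count)
```

Neither OrderID nor ProductID has to be unique on its own; only the pair is checked.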

Column integrity

Column integrity refers to the requirement that data stored in a column must adhere to the same format and definition. This includes data type, data length, default value of data, range of possible values, whether duplicate values are allowed, or whether null values are allowed.

For example, LastName in the employees table must be a varchar of at most 20 characters, defaults to an empty string, and can't be null.

LastName in employees table
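
The LastName rules might be declared as follows. This is a hedged sketch using SQLite through Python's sqlite3 module. SQLite does not enforce the length in VARCHAR(20), so an explicit CHECK stands in for it here; engines that enforce declared lengths would not need that clause. The FirstName column and the rejected helper are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        EmployeeID INTEGER PRIMARY KEY,
        FirstName  VARCHAR(10) NOT NULL,
        LastName   VARCHAR(20) NOT NULL DEFAULT ''
                   CHECK (length(LastName) <= 20)  -- SQLite ignores the (20), so check it explicitly
    )
""")

# No LastName supplied: the default value (empty string) is stored.
conn.execute("INSERT INTO employees (FirstName) VALUES ('Nancy')")
default_last_name = conn.execute(
    "SELECT LastName FROM employees WHERE FirstName = 'Nancy'"
).fetchone()[0]


def rejected(sql, params=()):
    """Return True if the engine refused the statement with an integrity error."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


null_rejected = rejected(
    "INSERT INTO employees (FirstName, LastName) VALUES ('Andrew', NULL)"
)
too_long_rejected = rejected(
    "INSERT INTO employees (FirstName, LastName) VALUES (?, ?)",
    ("Janet", "x" * 21),  # 21 characters, one more than allowed
)
print(repr(default_last_name), null_rejected, too_long_rejected)
```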

Referential integrity

How do you tell who supplied the product Longlife Tofu in the products table? Referential integrity guarantees that the supplier exists.

You locate the row for Longlife Tofu in the products table and find that the value in its SupplierID column is 4.
products table:
You then look up SupplierID 4 in the suppliers table and find that its CompanyName is Tokyo Traders.
suppliers table:

Tokyo Traders is the supplier of the product Longlife Tofu. Here referential integrity guarantees two things:

The row for SupplierID 4 (Tokyo Traders) in the suppliers table can't be deleted, and its SupplierID value can't be changed, because it is referenced by the product Longlife Tofu via the SupplierID column in the products table.
Nor can we add a new product to the products table with SupplierID 30, because this supplier does not exist in the suppliers table.

Referential integrity is defined at database design time and enforced by creating relationships between tables. Once the relationship is set up, the database engine follows the two rules stated above to guarantee data integrity, and it raises an error if either rule is violated.
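
Both rules can be observed in a small experiment. The sketch below assumes SQLite through Python's sqlite3 module; SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection. The ProductID values, the 'Mystery Product' row, and the rejected helper are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
    CREATE TABLE suppliers (
        SupplierID  INTEGER PRIMARY KEY,
        CompanyName TEXT NOT NULL
    );
    CREATE TABLE products (
        ProductID   INTEGER PRIMARY KEY,
        ProductName TEXT NOT NULL,
        SupplierID  INTEGER REFERENCES suppliers (SupplierID)  -- the relationship
    );
    INSERT INTO suppliers VALUES (4, 'Tokyo Traders');
    INSERT INTO products  VALUES (74, 'Longlife Tofu', 4);
""")


def rejected(sql):
    """Return True if the engine refused the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Rule 1: a referenced supplier can't be deleted, and its key can't be changed.
delete_rejected = rejected("DELETE FROM suppliers WHERE SupplierID = 4")
rekey_rejected = rejected("UPDATE suppliers SET SupplierID = 40 WHERE SupplierID = 4")
# Rule 2: a product can't point at a supplier that does not exist.
orphan_rejected = rejected("INSERT INTO products VALUES (99, 'Mystery Product', 30)")

# Because of these rules, the lookup from product to supplier always finds a row.
supplier = conn.execute("""
    SELECT s.CompanyName
    FROM products p JOIN suppliers s ON s.SupplierID = p.SupplierID
    WHERE p.ProductName = 'Longlife Tofu'
""").fetchone()[0]
print(delete_rejected, rekey_rejected, orphan_rejected, supplier)
```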

User-defined integrity

Some applications have complex business logic that can't be enforced by the three data integrity types we have discussed so far (row integrity, column integrity, and referential integrity). In this circumstance, we need to implement our own code logic to make sure data is saved accurately and consistently across all business domains. The logic can be implemented with database triggers, stored procedures, or functions. It can also be implemented with tools external to the database engine, such as non-SQL languages embedded in the database (like VB.NET or C# in SQL Server), or with scripting or programming languages in the middle tier or front tier of the application.

Here is an example of a user-defined data integrity.

In our Northwind database, when a customer places an order, we first need to check whether this is a new customer to our business. If so, we add the customer to the customers table. We then check whether we have enough stock for each product the customer ordered. If so, we add the product to the order_details table for this order and decrement the quantity for the product in the products table. There are four tables involved in this transaction:

customers
products
orders
order_details

To enforce data integrity, we use a database transaction that involves these four tables (customers, products, orders, order_details):

We roll back the transaction if inserting the new customer into the customers table fails.
We roll back the transaction if inserting the new order into the orders table fails.
We roll back the transaction if inserting the record into the order_details table fails.
We roll back the transaction if decrementing the product quantity in the products table fails.

When a transaction is rolled back, the four tables are restored to their original state (the state prior to the start of the transaction). The logic here implements our own user-defined data integrity. Which tools you use to enforce data integrity depends on your system requirements.
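
One possible shape for this transaction, written in the application tier, is sketched below. It assumes SQLite through Python's sqlite3 module, heavily simplified versions of the four tables, and a hypothetical place_order function and OutOfStock exception; none of these come from the Northwind scripts. Used as a context manager, the connection commits if the block finishes and rolls back if an exception escapes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT NOT NULL);
    CREATE TABLE products (ProductID INTEGER PRIMARY KEY, ProductName TEXT NOT NULL,
                           UnitsInStock INTEGER NOT NULL);
    CREATE TABLE orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT NOT NULL);
    CREATE TABLE order_details (OrderID INTEGER NOT NULL, ProductID INTEGER NOT NULL,
                                Quantity INTEGER NOT NULL,
                                PRIMARY KEY (OrderID, ProductID));
    INSERT INTO products VALUES (74, 'Longlife Tofu', 4);
""")


class OutOfStock(Exception):
    """Raised when an ordered quantity exceeds the units in stock."""


def place_order(customer_id, company_name, items):
    """Store one order atomically; items maps ProductID to ordered quantity."""
    with conn:  # commit on success, roll back if any exception escapes
        # Add the customer only if it is new.
        conn.execute("INSERT OR IGNORE INTO customers VALUES (?, ?)",
                     (customer_id, company_name))
        order_id = conn.execute("INSERT INTO orders (CustomerID) VALUES (?)",
                                (customer_id,)).lastrowid
        for product_id, quantity in items.items():
            # Decrement stock only when enough units are available.
            updated = conn.execute(
                "UPDATE products SET UnitsInStock = UnitsInStock - ? "
                "WHERE ProductID = ? AND UnitsInStock >= ?",
                (quantity, product_id, quantity),
            ).rowcount
            if updated == 0:
                raise OutOfStock(product_id)
            conn.execute("INSERT INTO order_details VALUES (?, ?, ?)",
                         (order_id, product_id, quantity))
        return order_id


def counts():
    """Row counts for customers, orders, and order_details."""
    return tuple(conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
                 for t in ("customers", "orders", "order_details"))


try:
    place_order("ALFKI", "Alfreds Futterkiste", {74: 10})  # only 4 units in stock
except OutOfStock:
    pass
after_failure = counts()  # the customer and order inserts were rolled back too

place_order("ALFKI", "Alfreds Futterkiste", {74: 3})
after_success = counts()
stock_left = conn.execute(
    "SELECT UnitsInStock FROM products WHERE ProductID = 74").fetchone()[0]
print(after_failure, after_success, stock_left)
```

The stock check and the decrement are folded into a single UPDATE so that no other writer can change the quantity between the two steps.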

Data integrity is enforced by database constraints

Database constraints are declarative integrity rules that are defined as part of a table's structure. There are seven constraint types:

Data type constraint:
Data type constraint defines the type of value a column should contain, including numeric, date and time, character, etc.
Nullability constraint:
Nullability constraint defines whether a column is allowed to store NULL values.
The NOT NULL constraint specifies that a column cannot hold a NULL value.
A NULL value represents unknown data, which is not the same as an empty string or a zero.
Default constraint:
Default constraint specifies a default value for a column if no value is supplied when adding a record to a table.
Primary key constraint:
A primary key is a column (or multiple columns) in a table that can be used to uniquely identify each row in the table. A primary key column cannot contain NULL values. Uniqueness is enforced automatically on the primary key, so no separate UNIQUE constraint is needed for it.
A table can have only one primary key, which may consist of a single column or multiple columns. If the primary key contains multiple columns, it's called a composite primary key, and each combination of values across those columns must be distinct.
The primary key can be either a sequentially incremented integer (a surrogate key) or a natural key, that is, data that already identifies something in the real world (e.g., Social Security Number).
Foreign key constraint:
Foreign key constraint defines how referential integrity is enforced between two tables. It uses a foreign key column in one table to link to a primary key column in another table.
The table with the foreign key can be called the child table, referencing table, or foreign key table. The table with the primary key can be called the parent table, referenced table, or primary key table.
Database engines use the FOREIGN KEY constraint to keep data in sync between the parent table and the child table. By default, you can't delete a row in the parent table while its primary key value still exists in the foreign key column of the child table.
Unique constraint:
Sometimes the data in a column must be unique even though the column does not act as the primary key of the table. For example, the CategoryName column is unique in the categories table, but CategoryName is not the primary key of the table. In this case, we create a UNIQUE constraint, which specifies that the values in a column or combination of columns must be unique, so no duplicates can be stored at any time. You can have many UNIQUE constraints per table.
Check constraint:
Check constraint enforces that values in a column must satisfy a specific condition. In the background, the database engine uses a validation rule to check the data when it is entered into the column. This rule is defined by the user when designing the column in a table. Not every database engine supports check constraints.

Data integrity type	Enforced by database constraint
Row integrity	Primary key constraint, Unique constraint
Column integrity	Foreign key constraint, Check constraint, Default constraint, Data type constraint, Nullability constraint
Referential integrity	Foreign key constraint
User-defined integrity	Check constraint
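
Several of these constraint types can be declared side by side. The sketch below assumes SQLite through Python's sqlite3 module; the UnitPrice rule, the sample rows, and the violation helper are illustrative assumptions rather than part of the Northwind definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (
        CategoryID   INTEGER PRIMARY KEY,        -- primary key constraint
        CategoryName TEXT NOT NULL UNIQUE        -- nullability and unique constraints
    );
    CREATE TABLE products (
        ProductID   INTEGER PRIMARY KEY,
        ProductName TEXT NOT NULL,
        UnitPrice   REAL NOT NULL DEFAULT 0      -- data type and default constraints
                    CHECK (UnitPrice >= 0)       -- check constraint
    );
    INSERT INTO categories (CategoryName) VALUES ('Seafood');
""")


def violation(sql):
    """Return the engine's error message, or None if the statement succeeded."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


# A second Seafood category is refused by the UNIQUE constraint.
duplicate_error = violation(
    "INSERT INTO categories (CategoryName) VALUES ('Seafood')")
# A negative price is refused by the CHECK constraint.
negative_price_error = violation(
    "INSERT INTO products (ProductName, UnitPrice) VALUES ('Ikura', -31)")
# With no price supplied, the DEFAULT constraint fills in 0.
conn.execute("INSERT INTO products (ProductName) VALUES ('Konbu')")
default_price = conn.execute(
    "SELECT UnitPrice FROM products WHERE ProductName = 'Konbu'").fetchone()[0]
print(duplicate_error, negative_price_error, default_price)
```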

Use database constraints whenever possible

There are two main reasons why using database constraints is a preferred way of enforcing data integrity.

First, constraints are built into the database engine and so typically use fewer system resources to perform their dedicated tasks than equivalent checks written in external code. We resort to external user-defined integrity enforcement only if constraints are not sufficient to do the job properly.

Second, database constraints are always checked by the database engine on every insert, update, or delete operation, whichever application issues it. An operation that would violate a constraint is cancelled and its changes are never stored. Constraints are therefore more reliable and robust for enforcing data integrity.

On the next page, we look at how to enforce the data type constraint.
