Custom Search
 


How to Enforce Data Type Constraint


Data Type Constraint

MySQL supports a number of data types in three categories:

  1. Numeric types
  2. Date and Time types
  3. String types

Data type constraint is defined when you create a new table or modifiy an existing table. Below data type is defined for CategoryName column as varchar of 15 characters long. (Screenshot taken from the free MySQL client - SQLyog).

Change data type

Practical Data Type Tips for MySQL

To understand MySQL data type thoroughly, refer to MYSQL documentation for detailed explanation.

Below listed some practical tips you should consider when defining data types in MySQL. Note that these considerations are based on my experience of using MySQL with PHP. If you use different languages to work with MySQL, the following may not be applicable to you.

Tip #1: When creating an integer data type, define it as unsigned if the column is never going to store negative values.

Unsigned integers are non-negative numbers. Signed integers can be either positive or negative values. If your application will never use negative numbers for an integer column, always define it as unsigned. This prevents unwanted negative numbers from being stored in the column. In addition, unsigned integer is smaller in length than the corresponding signed integer.

For example, auto-incremented (starting from 1) primary key column is a good candidate for unsigned int. Max value for unsigned int is 4294967295 which is slightly over 4.2 billion. In northwind, OrderID is defined as unsigned integer because we know it does not make sense if OrderID is a negative number.

Unsigned integer for OrderID

Tip #2: So you are using unsigned integer data type, but be aware of a gotcha.

Yes, there is a gotcha which is related specifically to MySQL. It caught me a couple of times.

Let's do an experiment to illustrate how this gotcha can easily occur.

  1. First, let's create a table called gotcha in SQLyog.

    In this table, we created four columns with data type of int, mediumint, smallint, and tinyint.

    CREATE TABLE `gotcha`
    (
    `test` int(10) unsigned NOT NULL default '0',
    `test2` mediumint(8) unsigned NOT NULL default '0',
    `test3` smallint(5) unsigned NOT NULL default '0',
    `test4` tinyint(3) unsigned NOT NULL default '0'
    )

    Table used for test of out of range unsigned integer value

  2. Then we add one record to the gotcha table. Add integer 5 for all 4 columns.

    insert into `gotcha`(`test`,`test2`,`test3`,`test4`) values (5,5,5,5);

    Data for test of out of range unsigned integer value

  3. Run a query for to update the value for each column. The query will try to update the column to a negative number.

    update gotcha set test = test - 6

    update gotcha set test2 = test2 - 6

    update gotcha set test3 = test3 - 6

    update gotcha set test4 = test4 - 6

  4. Check what the value is in each column after the updates.

    Result out of range unsigned integer value

We subtracted 6 from 5 and it should return -1 and store it in each column of this row. But because all columns are defined as unsigned integer data type, no negative number is allowed to be stored in there. Rather than raising an out-of-range error, MySQL simply returned the max integer value for the defined data type for each column.

The gotcha here is that MySQL does not throw an error and so you are not able to catch the error in your PHP or other languages' error handling routine.

This is how it caught me. My site keeps a counter for the number of unread messages in Inbox for each member. When a member read an unread message, the counter is reduced by 1. Because of a bug in my code, the counter sometimes was trying to decrement to -1. You know what happened next - members complained about the meaningless data 4294967295 shown on their page.

How to fix the problem

Actually this is not a bug in MySQL. This strange behavior is expected and documented in MySQL Reference Manual.

To fixed the problem, MySQL recommends the use of CAST function - CAST(expr AS type). It will cast the value to 0 if a negative value is returned.

update gotcha set test = CAST(test - 6 AS SIGNED)

update gotcha set test2 = CAST(test2 - 6 AS SIGNED)

update gotcha set test3 = CAST(test3 - 6 AS SIGNED)

update gotcha set test4 = CAST(test4 - 6 AS SIGNED)

Before CAST:
Fixed out of range unsigned integer value

After CAST:
Fixed out of range unsigned integer value

The lesson we learned here

When you define an integer (bigint, int, mediumint, smallint, tinyint) column, think about how it's going to be used in your application. If the column value can be decremented, always use the CAST function in the update statement.

Tip #3: Don't use bigint data type if you don't need it.

Int is a normal-size integer and it's sufficient in most cases. The signed int range is -2147483648 to 2147483647. The unsigned int range is 0 to 4294967295 (which is slighly over 4.2 billion).

Bigint is a large integer. The signed range is -9223372036854775808 to 9223372036854775807. The unsigned range is 0 to 18446744073709551615.

There are times when you do need to use bigint. For example, if you use Unix timestamp to define a date of birth column, because of the limitations 32-bit signed integer on Unix/Linux and Windows, dates in Unix timestamp format are restricted to the years 1901-2038 on Unix or Linux, and 1970-2038 on Windows due to integer overflow for dates beyond those years. To overcome the overflow problem, signed bigint must be used. Refer to article How to use Date and Time data as integer value in PHP and MySQL for more details.

Tip #4: For small definition tables, in most cases, unsigned tinyint for primary key column is sufficient.

In northwind, table categories is a definition table as it defines what categories the products can be grouped into.

CategoryID column in categories table is defined as unsigned tinyint which ranges from 0 to 255. No need to define it as unsigned int (0 to 4294967295), unsigned mediumint (0 to 16777215), or unsigned smallint (0 to 65535) because we are certain that we will never have more than 255 categories.

If we only look at CategoryID in categories table itself, it does not seem worthwhile to be bothered. But if we look at products table where CategoryID is used as foreign key column, it does look extremely helpful to use tinyint when taking into account the efficiency of query performence in JOINs between categories and products table as well as data storage efficiencies.

Unsigned tinyint for CategoryID

Tip #5: Only use UTF8 character set where you need it.

If you are certain that your database is only used to store data in English, you don't need utf8 character set. Use latin1 character set with latin1_swedish_ci collation instead. In terms of data storage, not all utf8 characters use the same number of bytes and can require up to three bytes per character. That means UTF8 reservs more spaces to store data than latin1. Also comparison and sorting for utf8 data use more system resources than other character sets.

Use utf8 character set and utf8_unicode_ci (or utf8_general_ci) collation if your database is going to be used for multilingual applications.

Define utf8 data type

Tip #6: When creating a True or False type of column in MySQL, don't use bit or tinyint data type for defining 0 for FALSE and 1 for TRUE.

This is because bit data type can be any number between 1 to 64. Data range for signed tinyint is -128 to 127. The unsigned tinyint range is 0 to 255. Using bit or tinyint for TRUE or FALSE opens up the possiblity of storing numbers that are not 0 or 1. The best practice is create an enum data type with two values only - enum('y','n').

In our Northwind MySQL database, we defined Discontinued column in products table as enum data type with two values only: y for yes and n for no.

Enum data type for Discontinued column in products table

When we insert or update a product, we can simply use the character y or n to get the job done.

update products set discontinued = 'y' where ProductID = 1

Tip #7: When defining string data type, consider its length in pratical use.

The two main string data types are char and varchar. Char data type is a fixed-length string that is always right-padded with spaces to the specified length when stored. Varchar data type stores variable length of string value.

For example, CategoryName is defined as varchar(15) which is variable string of 15 characters. The category name Beverages uses only 9 character space. But if you define the column as char(15) which is fixed length string of 15 characters, the category name Beverages will occupy 15 character space even though it only has 9 characters. There will be 6 right-padded spaces to the end of the name Beverages.

Use char data type only if you are centain that the column always stores fixed length of data. For example, MD5 hashed value is always 32-character long. In this case, it's preferable to define the column as char(32) rather than varchar(32) or some other length. Index created on a char column makes queries slighly more efficient than index created on a varchar columns. More on database indexes later.

I use MD5 hashed value for storing passwords in a table. When verifying the password submitted by a user, the submitted password is MD5 hased and then the hased value is compared with the hased value stored in the table. If they match, the user is a legit user.

Tip #8: When creating date and time type of data, store them as integer in your MySQL table rather than date or datetime data type.

The advantages are mainly related to coding flexibility, easiness of data comparison, and increased query performance with indexes on integer columns.

Because in PHP we can convert date and time to Unix timestamp value which is a long integer containing the number of seconds between the Unix Epoch (January 1 1970 00:00:00 GMT) and the time specified, we can store the long integer in our column for date and time related data. When using the column data, we can convert them back to the date and time format for displaying purpose by using the Date and Time Functions in PHP.

These integer values are also extremely easy for calculation purpose when we need to add or substract certain time from it. For example, the date and time 2007-11-30 00:00:00 has a timestamp value 1196348400 (which is the number of seconds since January 1 1970 00:00:00 GMT). To add one day to 2007-11-30 00:00:00, we convert one day to 86400 seconds (1 x 24 x 60 x 60) and then add 86400 seconds to 1196348400. That is 86400 + 1196348400 = 1196434800. We then convert 1196434800 back to date and time format for displaying - we got 2007-12-01 00:00:00.

More about date and time in article How to use Date and Time data as integer value in PHP and MySQL.

Tip #9: When storing current date and time in a table, always store GMT time.

Using Greenwich Mean Time (GMT) is very handy no matter where your application is used. When display the date and time to the end user, you can convert the GMT to the user's local time based on his/her timezone.

Again, refer to the GMT part of the article How to use Date and Time data as integer value in PHP and MySQL for more information on this topic.

Tip #10: When you create a column to store IP addresses, always define it as unsigned interger data type rather than dotted representation of a network address string (e.g. 147.132.42.18).

This can grealy imporve query performance when the column is searched (WHERE clause), joined to another column, or sorted. It also makes IP range calculation and comparison much easier.

Convert IP address to integer

When saving IP address data to a coulmn, use MySQL function INET_ATON() which stands for "ASCII to numbers". This function converts dotted representation of a network address string such as '127.0.0.1' for localhost to an integer that represents the numeric value of the address.

SELECT INET_ATON('127.0.0.1') ==> 2130706433

Here is an example of how to use PHP to save IP as unsigned integer in MySQL:

<?
$username 
"fred";
$ip $_SERVER["REMOTE_ADDR"];

$sql "insert into member(username, ip_addr) values('"
    
mysql_real_escape_string($username). "', INET_ATON('"$ip"'))";

mysql_query($sql);
?>

Convert integer back to dotted IP address

When we need to display IP address back to its dotted format, use INET_NTOA which stands for "numbers to ASCII".

SELECT INET_NTOA(2130706433) ==> '127.0.0.1'

Here is an example of how to use PHP to retrieve IP from MySQL:

<?
$username 
"fred";

$sql "select INET_NTOA('ip_addr') as ip from member where username = '"
    
mysql_real_escape_string($username). "'";
    
$result mysql_query($sql);
$ip mysql_result($result00);
?>

INET_ATON and INET_NTOA have equivalent PHP functions:

MySQL Function Equivalent PHP Function
INET_ATON ip2long
INET_NTOA long2ip

This is how to use the equivalent PHP functions:

<?
echo ip2long("127.0.0.1"); // Returns 2130706433

echo long2ip(2130706433); // Returns 127.0.0.1
?>

Tip #11: Think again when you use blob data type.

A blob data type can be used to store binary data such as media files (photos or other very large non text data) in MySQL database. Blob data is large and can increase database size quickly. A way to get around this problem is store the media files in file directory system and only store their filenames in database. Refer to article Implement file directory based image management in MySQL Northwind for detailed information.

When do you need to use BLOB data type in MySQL?

If, for security reasons, you need to protect the BLOB data, it's preferrable to store them in database tables rather than storing in file systems.

For example, you create a photo album on a social networking site and share it with friends. If the photo is stored in a table, before you retrieve it from the table, you can verify if the viewer has the privilege to view it. If yes, you run the SELECT query to get the photo from the table and display it. If the viewer does not have the privilege, you display a message to indicate the photo can't be viewed. But if the photo is stored on disk, as long as the photo's URL is known, anyone can view it on the Internet.

On next page, we are going to look at how to enforce Defualt Constraint and Nullability Constraint.





Other tutorials in this category

1. How to Design Relational Database

2. Enforce Data Integrity by Database Constraints

3. How to Enforce Default Constraint and Nullability Constraint

4. Primary Key Constraint and Unique Constraint

5. Foreign Key Relationships and Considerations

Back to Tutorial Index Page


Copyright © 2024 GeeksEngine.com. All Rights Reserved.

This website is hosted by HostGator.

No portion may be reproduced without my written permission. Software and hardware names mentioned on this site are registered trademarks of their respective companies. Should any right be infringed, it is totally unintentional. Drop me an email and I will promptly and gladly rectify it.

 
Home | Feedback | Terms of Use | Privacy Policy