Database Normalization for Developers
Practice should always be based upon a sound knowledge of theory.
~ Leonardo da Vinci
Rosina Bignall
Blog: http://rosinabignall.com
Twitter: @rosinabignall
JoindIn: https://joind.in/13747
Slides: http://goo.gl/BGkrkI
Assistance Dog Zaakir
http://tamarr.org
Database Normalization
The process of efficiently organizing data in the database
Database Normalization
Optimizing the data stored in the database
Database Normalization
The process of examining relations for anomalies
Database Normalization
An important part of the database design process
Database Normalization
Can be thought of as a trade-off between data redundancy and performance
Goals of Normalization
- Arranging data in logical groupings
Goals of Normalization
- Arranging data in logical groupings
- Eliminating redundant data
Goals of Normalization
- Arranging data in logical groupings
- Eliminating redundant data
- Ensuring data dependencies make sense
Goals of Normalization
- Arranging data in logical groupings
- Eliminating redundant data
- Ensuring data dependencies make sense
- Access and manipulate the data quickly and efficiently
Who needs it
- Online Transactional Processing
Data vs Information
Business Rules
Relationships
Accuracy and Performance
Poor Normalization causes problems such as
- Excessive Disk I/O
- Poor Performance
- Inaccurate Data
- Incorrect or Missing Data
Definitions
Heading
A set of attributes
Definition
Definition
Tuple
A set of ordered pairs <A,v>, one pair for each attribute in the heading
Definition
Row
Essentially the implementation of a tuple
Definition
Relation
An ordered pair <H,h>, where h is the set of tuples all having heading H
Definition
Table
Essentially the implementation of a relation
Definition
Functional Dependency (FD)
An expression of the form X->Y where X and Y are subsets of the attributes then whenever two tuples have the same value for X they also have the same value for Y
Functional Dependencies
{ CITY } -> { STATUS}
Definition
Key
One or more attributes that uniquely identifies a tuple in the relation
Invoice Example
First Normal Form (1NF)
All relations are by definition in 1NF
1NF simply means that each tuple in the relation contains exactly one value for each attribute
Invoice
- Inv_no
- Date
- Cust_no
- Cust_Address
- Item_Id
- Description
- Unit_Price
- Quantity
- Amount
- Total
1st Normal Form
Second Normal Form (2NF)
A relation is in second normal form (2NF) if and only if for every nontrivial FD X->Y at least one of the following is true: (a) X is a superkey; (b) Y is a subkey; (c)X is not a subkey.
No repeating groups
How to get 2NF
Create a separate relation for each set of repeating data
Invoice
- Inv_no (k)
- Cust_no
- Cust_Address
- Date
- Total
Invoice_Detail
- Inv_no (k)
- Item_Id (k)
- Description
- Unit_price
- Quantity
- Amount
2nd Normal Form
Third Normal Form (3NF)
A relation is in third normal form (3NF) if and only if for every nontrivial FD X->Y either (a) X is a superkey; (b) Y is a subkey.
Every attribute must depend on the primary key
How to get 3NF
Remove attributes that don't depend on the primary key to another relation
Invoice
- Inv_no (k)
- Cust_no
- Cust_Address
- Date
- Total
Invoice_Detail
- Inv_no (k)
- Item_Id (k)
- Quantity
- Amount
3rd Normal Form
Item
- Item_Id (k)
- Description
- Unit_price
Boyce/Codd Normal Form (BCNF)
Every attribute must depend on the key, the whole key and nothing but the key
A relation is in Boyce/Codd normal form (BCNF) if and only if, for every nontrivial FD X->Y, X is a superkey.
How to get BCNF
Remove attributes that don't depend on the whole key to a separate relation
Invoice
- Inv_no (k)
- Cust_no
- Date
- Total
Invoice_Detail
- Inv_no (k)
- Item_Id (k)
- Quantity
- Amount
Boyce/Codd Normal Form
Item
- Item_Id (k)
- Description
- Unit_price
Customer
- Cust_no (k)
- Cust_Address
Resulting Relations
Another way
Invoice Example
Procedure
- Determine Attributes
- Determine Functional Dependencies
- For every set of functional dependencies with the same left side (X) add a relation with heading containing the left side (X) and all the right sides (Y).
Attributes
- Date
- Inv_no
- Cust_no
- Cust_Address
- Item_Id
- Description
- Unit_price
- Quantity
- Amount
- Total
Functional Dependencies (FDs)
- Inv_no -> Date
- Inv_no -> Cust_no
- Inv_no -> Total
- Cust_no -> Cust_Address
- Item_Id -> Description
- Item_Id -> Unit_price
- Inv_no, Item_Id -> Quantity
- Inv_no, Item_Id -> Amount
Procedure
Determine AttributesDetermine Functional Dependencies- For every set of functional dependencies with the same left side (X) add a relation with heading containing the left side (X) and all the right sides (Y).
Functional Dependencies (FDs)
- Inv_no -> Date
- Inv_no -> Cust_no
- Inv_no -> Total
- Cust_no -> Cust_Address
- Item_Id -> Description
- Item_Id -> Unit_price
- Inv_no, Item_Id -> Quantity
- Inv_no, Item_Id -> Amount
Invoice Relation
- Inv_no -> Date
- Inv_no -> Cust_no
- Inv_no -> Total
Functional Dependencies (FDs)
Inv_no -> DateInv_no -> Cust_noInv_no -> Total- Cust_no -> Cust_Address
- Item_Id -> Description
- Item_Id -> Unit_price
- Inv_no, Item_Id -> Quantity
- Inv_no, Item_Id -> Amount
Customer Relation
- Cust_no -> Cust_Address
Functional Dependencies (FDs)
Inv_no -> DateInv_no -> Cust_noInv_no -> TotalCust_no -> Cust_Address- Item_Id -> Description
- Item_Id -> Unit_price
- Inv_no, Item_Id -> Quantity
- Inv_no, Item_Id -> Amount
Item Relation
- Item_Id -> Description
- Item_Id -> Unit_price
Functional Dependencies (FDs)
Inv_no -> DateInv_no -> Cust_noInv_no -> TotalCust_no -> Cust_AddressItem_Id -> DescriptionItem_Id -> Unit_price- Inv_no, Item_Id -> Quantity
- Inv_no, Item_Id -> Amount
Invoice Detail Relation
- Inv_no, Item_Id -> Quantity
- Inv_no, Item_Id -> Amount
Resulting Relations
Should you Normalize?
Why you might not normalize
- Joins are expensive
- Normalized design can be difficult
- Rapid prototyping
Why you might not normalize
- Joins are expensive
Why you might not normalize
- Joins are expensive
- Normalized design can be difficult
Why you might not normalize
- Joins are expensive
- Normalized design can be difficult
- Rapid prototyping
If you don't normalize
put triggers and controls in to ensure data consistency
Resources
- Database Design & Relational Theory: Normal Forms & All That Jazz, C.J. Date
- Database In Depth: Relational Theory for Practitioners, C.J. Date
- Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design, Michael J. Hernandez
- An introduction to database systems, 7th ed
Rosina Bignall
Blog: http://rosinabignall.com
Twitter: @rosinabignall
Slides: http://goo.gl/BGkrkI
Feedback please!
https://joind.in/13747
Database Normalization for Developers
By Rosina Bignall
Database Normalization for Developers
A talk for PHPTek 2015
- 2,044