Demystifying Database Normalization

This will be a short and concise material to understand and check your understanding of Normal forms in databases. I assume you have already read about them from standard text books like Silberschatz or Navathe.

Functional Dependency – We say an attribute set (a single attribute or a group of attributes) $A$ determines an attribute set $B$ if whenever $A$ repeats its value, $B$ also repeats its value. Any unique attribute determines every other attribute.

Super Key – An attribute set which determines every other attributes

Candidate Key – A minimal super key becomes a Candidate key. i.e., if we remove any attribute from a candidate key, it should no longer be a super key

Primary Key – Any one candidate key is selected as a Primary key so that the table can have only one primary key but multiple candidate keys may be present (we will see its use in Table indexing)

Prime attribute: An attribute that is part of some candidate key is a prime attribute


Student ID	Name	Phone	Email
CS_001	Ravi	9995677893	email1@abc.com
CS_002	Manya	9992347689	[email protected]
CS_003	Ravi	9456792001	[email protected]

Here, we are having a table which is an instance of the Relation Student (Student ID, Name, Phone, Email) – the same relation can have other table instances too. We can see that Name value is repeating and so cannot be the key. Student ID, Phone and Email are Unique values and all are candidate keys (we can say they are candidate keys for the given table; but to say the same for Student relation we must be sure of all possible instances of it; i.e., there can be another Student table with $2$ students having same Phone number or email. Assuming all instances of Student relation are like the given one we have the following keys:

Candidate Keys: Student ID, Phone, Email
Primary Key: Any one of {Student ID, Phone, Email}
Super Keys: {Student ID}, {Phone}, {Email}, {Student ID, Name}, {Phone, Name}, {Email, Name}, {Student ID, Phone}, {Phone, Email}, {Email, Student ID}, {Student ID, Name, Phone}, {Phone, Name, Email}, {Email, Name, Student ID}, {Email, Phone, Student ID} {Email, Name, Student ID, Phone}

Now, we will see how we can ensure a given relation satisfies certain normal forms. The higher the normal form, better the relation is in terms of redundancies and removal of anomalies. But practically to ensure higher normal forms we need to decompose the relations to smaller ones which can have performance issues. Anyway that is not our concern here and we see how we can decompose the relations to ensure higher normal forms. While we decompose a relation we must ensure

Losslessness – A decomposition of a relation $R$ into $R_1$ and $R_2$ is not lossy if $R_1 \cap R_2$ is a key of either $R_1$ or $R_2.$

It is also ideal to keep dependencies. i.e., while decomposing a relation it is ideal we can ensure the dependencies without a need to JOIN the decomposed relations. But this is not a strict requirement like losslessness.

1NF

1NF does not allow multi-valued attributes. i.e., every attribute in a row must have a single value. (An attribute is free to have multiple components like an Address attribute having House No., Street Name etc.). For example, in the below table “Phone” attribute is violating this.

For the below relation assume that only a single course can start on any day and all enrolments must be on that day. i.e., Enroll Date $\to$ Course ID is a FD as well as Course ID $\to$ enroll Date. The candidate keys for the below relation are {SID, Course ID} and {SID, Enroll Date}

SID	Course ID	Enroll Date	SNAME	Phone	Marks	Grade
100	1	31/7/2018	DONY	7474749892, 9994567321	80	A
100	2	1/8/2018	DONY	7474749892, 9994567321	70	B
101	1	31/7/2018	JOSE	7856400234	85	A
101	2	1/8/2018	JOSE	7856400234	85	A

To make the above table to 1NF, we can do as follows:

SID	Course ID	Enroll Date	SNAME	Phone	Marks	Grade
100	1	31/7/2018	DONY	7474749892	80	A
100	1	31/7/2018	DONY	9994567321	80	A
100	2	1/8/2018	DONY	7474749892	70	B
100	2	1/8/2018	DONY	9994567321	70	B
101	1	31/7/2018	JOSE	7856400234	85	A
101	2	1/8/2018	JOSE	7856400234	85	A

i.e., we duplicated ROWS to remove multi-value for attribute Phone and thus make the relation in 1NF.

2NF

Second Normal Form requires a relation to be in 1NF and for every functional dependency $A \to B,$ we should not have $C \to B$ where $C \subset A$ and $B$ is a non-prime attribute. i.e., there should not be any partial functional dependency ($A\to B$ is a partial FD, if a proper subset of $A$ determines $B$) for any non-prime attribute. Partial functional dependency means an attribute being determined by a proper subset of some candidate key.

A necessary (but not sufficient) condition for a relation to be in 1NF but not in 2NF is that it should have at least one candidate key with multiple attributes

SID	Course ID	Enroll Date	Marks	Grade
100	1	31/7/2018	80	A
100	1	31/7/2018	80	A
100	2	1/8/2018	70	B
100	2	1/8/2018	70	B
101	1	31/7/2018	85	A
101	2	1/8/2018	85	A

SID	SNAME	Phone
100	DONY	7474749892
100	DONY	9994567321
100	DONY	7474749892
100	DONY	9994567321
101	JOSE	7856400234
101	JOSE	9994567321

3NF

A relation is in 3NF if it is in 2NF and no non-prime attribute should be determined by anything but a super key (no transitive dependency) . In the first relation above we have a FD, Marks $\to$ Grade and Grade is a non-key attribute and Marks is not a super key. So, this violates 3NF. To make this relation into 3NF, we can decompose the relation as follows:

SID	Course ID	Enroll Date	Marks
100	1	31/7/2018	80
100	1	31/7/2018	80
100	2	1/8/2018	70
100	2	1/8/2018	70
101	1	31/7/2018	85
101	2	1/8/2018	85

Marks	Grade
80	A
80	A
70	B
70	B
85	A
85	A

A necessary (but not sufficient) condition for a relation to be in 2NF but not in 3NF is that some non-prime attribute must be determined by a non-key

BCNF

A relation is in BCNF, if for every non-trivial functional dependency $A \to B,$ $A$ is a super key. It basically removes the “non-prime attribute” restriction in the 3NF definition thus ensuring that every attribute must be determined only by a super key. The above relation is not in BCNF because Enroll Date determines Course ID (also vice versa) and Enroll Date is not a super key. In order to make the relation to BCNF, we need to split them as follows:

SID	Course ID	Marks
100	1	80
100	1	80
100	2	70
100	2	70
101	1	85
101	2	85

Course ID	Enroll Date
1	31/7/2018
1	31/7/2018
2	1/8/2018
2	1/8/2018
1	31/7/2018
2	1/8/2018

A necessary (but not sufficient) condition for a relation to be in 3NF but not in BCNF is that it should have overlapping candidate keys – two candidate keys of two or more attributes and at least one common attribute

The proof for the above statement can be given as follows:

If a relation is in 3NF and not in BCNF, then there must be a non-trivial FD $A \to B,$ where $B$ is a prime attribute and $A$ is not a super key (assume $A$ is the minimal attribute set which determines $B$). This means $B$ is not a super key as otherwise $A$ must also be a super key. But since, $B$ is a prime attribute, for some non-empty set of attributes $Y,$ $BY$ is a candidate key. Since $A \to B,$ and $A$ is assumed to be minimal, this means $AY$ is also a candidate key. Thus we got two candidate keys $AY$ and $BY$ and $Y$ is common.

Till 3NF, we can always ensure a lossless and dependency preserving decomposition. But for BCNF, this may not be possible always and so we ensure lossless property at the expense of losing some dependencies.

2NF prohibits partial FD involving non-prime attributes and BCNF prohibits partial FD involving prime attributes.

Databases Preparation Resources for GATE CSE

GO Classroom

Demystifying Database Normalization

Like this:

Related

Skip links

Share this:

Like this:

Related