Categories: Databases

Demystifying Database Normalization

This will be a short and concise material to understand and check your understanding of Normal forms in databases. I assume you have already read about them from standard text books like Silberschatz or Navathe.

Functional Dependency – We say an attribute set (a single attribute or a group of attributes) $A$ determines an attribute set $B$ if whenever $A$ repeats its value, $B$ also repeats its value. Any unique attribute determines every other attribute.

Super Key – An attribute set which determines every other attributes

Candidate Key – A minimal super key becomes a Candidate key. i.e., if we remove any attribute from a candidate key, it should no longer be a super key

Primary Key – Any one candidate key is selected as a Primary key so that the table can have only one primary key but multiple candidate keys may be present (we will see its use in Table indexing)

Prime attribute: An attribute that is part of some candidate key is a prime attribute

Student ID Name Phone Email
CS_001 Ravi 9995677893 email1@abc.com
CS_002 Manya 9992347689 email2@abc.com
CS_003 Ravi 9456792001 email3@abc.com

Here, we are having a table which is an instance of the Relation Student (Student ID, Name, Phone, Email) – the same relation can have other table instances too. We can see that Name value is repeating and so cannot be the key. Student ID, Phone and Email are Unique values and all are candidate keys (we can say they are candidate keys for the given table; but to say the same for Student relation we must be sure of all possible instances of it; i.e., there can be another Student table with $2$ students having same Phone number or email. Assuming all instances of Student relation are like the given one we have the following keys:

  • Candidate Keys: Student ID, Phone, Email
  • Primary Key: Any one of {Student ID, Phone, Email}
  • Super Keys: {Student ID}, {Phone}, {Email},  {Student ID, Name}, {Phone, Name}, {Email, Name}, {Student ID, Phone}, {Phone, Email}, {Email, Student ID}, {Student ID, Name, Phone}, {Phone, Name, Email}, {Email, Name, Student ID}, {Email, Phone, Student ID} {Email, Name, Student ID, Phone}

Now, we will see how we can ensure a given relation satisfies certain normal forms. The higher the normal form, better the relation is in terms of redundancies and removal of anomalies. But practically to ensure higher normal forms we need to decompose the relations to smaller ones which can have performance issues. Anyway that is not our concern here and we see how we can decompose the relations to ensure higher normal forms. While we decompose a relation we must ensure

  1. Losslessness – A decomposition of a relation $R$ into $R_1$ and $R_2$ is not lossy if $R_1 \cap R_2$ is a key of either $R_1$ or $R_2.$

It is also ideal to keep dependencies. i.e., while decomposing a relation it is ideal we can ensure the dependencies without a need to JOIN the decomposed relations. But this is not a strict requirement like losslessness.

1NF

1NF does not allow multi-valued attributes. i.e., every attribute in a row must have a single value. (An attribute is free to have multiple components like an Address attribute having House No., Street Name etc.). For example, in the below table “Phone” attribute is violating this.

For the below relation assume that only a single course can start on any day and all enrolments must be on that day. i.e., Enroll Date $\to$ Course ID is a FD as well as Course ID $\to$ enroll Date. The candidate keys for the below relation are {SID, Course ID} and {SID, Enroll Date}

SID Course ID Enroll Date SNAME Phone Marks Grade
100 1 31/7/2018 DONY 7474749892, 9994567321 80 A
100 2 1/8/2018 DONY 7474749892, 9994567321 70 B
101 1 31/7/2018 JOSE 7856400234 85 A
101 2 1/8/2018 JOSE 7856400234 85 A

To make the above table to 1NF, we can do as follows:

SID Course ID Enroll Date SNAME Phone Marks Grade
100 1 31/7/2018 DONY 7474749892 80 A
100 1 31/7/2018 DONY 9994567321
80 A
100 2 1/8/2018 DONY 7474749892 70 B
100 2 1/8/2018 DONY 9994567321
70 B
101 1 31/7/2018 JOSE 7856400234 85 A
101 2 1/8/2018 JOSE 7856400234 85 A

i.e., we duplicated ROWS to remove multi-value for attribute Phone and thus make the relation in 1NF.

2NF

Second Normal Form requires a relation to be in 1NF and for every functional dependency $A \to B,$ we should not have $C \to B$ where $C \subset A$ and $B$ is a non-prime attribute. i.e., there should not be any partial functional dependency ($A\to B$ is a partial FD, if a proper subset of $A$ determines $B$) for any non-prime attribute. Partial functional dependency means an attribute being determined by a proper subset of some candidate key.

A necessary (but not sufficient) condition for a relation to be in 1NF but not in 2NF is that it should have at least one candidate key with multiple attributes

SID Course ID Enroll Date Marks Grade
100 1 31/7/2018 80 A
100 1 31/7/2018 80 A
100 2 1/8/2018 70 B
100 2 1/8/2018 70 B
101 1 31/7/2018 85 A
101 2 1/8/2018 85 A
SID SNAME Phone
100 DONY 7474749892
100 DONY 9994567321
100 DONY 7474749892
100 DONY 9994567321
101 JOSE 7856400234
101 JOSE 9994567321

3NF

A relation is in 3NF if it is in 2NF and no non-prime attribute should be determined by anything but a super key (no transitive dependency) . In the first relation above we have a FD, Marks $\to$ Grade and Grade is a non-key attribute and Marks is not a super key. So, this violates 3NF. To make this relation into 3NF, we can decompose the relation as follows:

SID Course ID Enroll Date Marks
100 1 31/7/2018 80
100 1 31/7/2018 80
100 2 1/8/2018 70
100 2 1/8/2018 70
101 1 31/7/2018 85
101 2 1/8/2018 85
Marks Grade
80 A
80 A
70 B
70 B
85 A
85 A

A necessary (but not sufficient) condition for a relation to be in 2NF but not in 3NF is that some non-prime attribute must be determined by a non-key

BCNF

A relation is in BCNF, if for every non-trivial functional dependency $A \to B,$ $A$ is a super key. It basically removes the “non-prime attribute” restriction in the 3NF definition thus ensuring that every attribute must be determined only by a super key. The above relation is not in BCNF because Enroll Date determines Course ID (also vice versa) and Enroll Date is not a super key. In order to make the relation to BCNF, we need to split them as follows:

SID Course ID Marks
100 1 80
100 1 80
100 2 70
100 2 70
101 1 85
101 2 85
Course ID Enroll Date
1 31/7/2018
1 31/7/2018
2 1/8/2018
2 1/8/2018
1 31/7/2018
2 1/8/2018

A necessary (but not sufficient) condition for a relation to be in 3NF but not in BCNF is that it should have overlapping candidate keys – two candidate keys of two or more attributes and at least one common attribute

The proof for the above statement can be given as follows:

If a relation is in 3NF and not in BCNF, then there must be a non-trivial FD $A \to B,$ where $B$ is a prime attribute and $A$ is not a super key (assume $A$ is the minimal attribute set which determines $B$). This means $B$ is not a super key as otherwise $A$ must also be a super key. But since, $B$ is a prime attribute, for some non-empty set of attributes $Y,$ $BY$ is a candidate key. Since $A \to B,$ and $A$ is assumed to be minimal, this means $AY$ is also a candidate key. Thus we got two candidate keys $AY$ and $BY$ and $Y$ is common.

Till 3NF, we can always ensure a lossless and dependency preserving decomposition. But for BCNF, this may not be possible always and so we ensure lossless property at the expense of losing some dependencies.

2NF prohibits partial FD involving non-prime attributes and BCNF prohibits partial FD involving prime attributes.

Databases Preparation Resources for GATE CSE

GO Classroom

Arjun Suresh

Share
Published by
Arjun Suresh

Recent Posts

GATE CSE 2022 Admissions, Results and Placement Responses

This page shows all details regarding GATE CSE 2022 Admissions including results, admission offers and…

2 years ago

GATE CSE 2021 Admission Responses

Source Add your Response Rank Predictor for GATE CSE 2021 Result Responses: GATE CSE 2021…

3 years ago

GATE CSE Books – More Options

Best Books for GATE CSE with Relevant Chapters to Read  Heads Up! These GATE CSE…

3 years ago

ISI PCB Previous Year Papers with Solution

Indian Statistical Institute(ISI) offers M Tech in Computer Science with the Admission Test Codes MMA/PCA…

3 years ago

ISI JRF Previous Year Papers with Solution

Indian Statistical Institute(ISI) offers Junior Research Fellowships (JRF) in Computer Science, Statistics, Mathematics, Quantitative Economics,…

3 years ago

ISI DCG Previous Year Papers with Solution

 Indian Statistical Institute(ISI) conduct the admission test for Postgraduate Diploma in Computer Applications(PGDCA) with the…

3 years ago