This will be a short and concise material to understand and check your understanding of Normal forms in databases. I assume you have already read about them from standard text books like Silberschatz or Navathe.
Functional Dependency – We say an attribute set (a single attribute or a group of attributes) $A$ determines an attribute set $B$ if whenever $A$ repeats its value, $B$ also repeats its value. Any unique attribute determines every other attribute.
Super Key – An attribute set which determines every other attributes
Candidate Key – A minimal super key becomes a Candidate key. i.e., if we remove any attribute from a candidate key, it should no longer be a super key
Primary Key – Any one candidate key is selected as a Primary key so that the table can have only one primary key but multiple candidate keys may be present (we will see its use in Table indexing)
Prime attribute: An attribute that is part of some candidate key is a prime attribute
Student ID | Name | Phone | |
CS_001 | Ravi | 9995677893 | email1@abc.com |
CS_002 | Manya | 9992347689 | [email protected] |
CS_003 | Ravi | 9456792001 | [email protected] |
Here, we are having a table which is an instance of the Relation Student (Student ID, Name, Phone, Email) – the same relation can have other table instances too. We can see that Name value is repeating and so cannot be the key. Student ID, Phone and Email are Unique values and all are candidate keys (we can say they are candidate keys for the given table; but to say the same for Student relation we must be sure of all possible instances of it; i.e., there can be another Student table with $2$ students having same Phone number or email. Assuming all instances of Student relation are like the given one we have the following keys:
- Candidate Keys: Student ID, Phone, Email
- Primary Key: Any one of {Student ID, Phone, Email}
- Super Keys: {Student ID}, {Phone}, {Email}, {Student ID, Name}, {Phone, Name}, {Email, Name}, {Student ID, Phone}, {Phone, Email}, {Email, Student ID}, {Student ID, Name, Phone}, {Phone, Name, Email}, {Email, Name, Student ID}, {Email, Phone, Student ID} {Email, Name, Student ID, Phone}
Now, we will see how we can ensure a given relation satisfies certain normal forms. The higher the normal form, better the relation is in terms of redundancies and removal of anomalies. But practically to ensure higher normal forms we need to decompose the relations to smaller ones which can have performance issues. Anyway that is not our concern here and we see how we can decompose the relations to ensure higher normal forms. While we decompose a relation we must ensure
- Losslessness – A decomposition of a relation $R$ into $R_1$ and $R_2$ is not lossy if $R_1 \cap R_2$ is a key of either $R_1$ or $R_2.$
It is also ideal to keep dependencies. i.e., while decomposing a relation it is ideal we can ensure the dependencies without a need to JOIN the decomposed relations. But this is not a strict requirement like losslessness.
1NF
1NF does not allow multi-valued attributes. i.e., every attribute in a row must have a single value. (An attribute is free to have multiple components like an Address attribute having House No., Street Name etc.). For example, in the below table “Phone” attribute is violating this.
For the below relation assume that only a single course can start on any day and all enrolments must be on that day. i.e., Enroll Date $\to$ Course ID is a FD as well as Course ID $\to$ enroll Date. The candidate keys for the below relation are {SID, Course ID} and {SID, Enroll Date}
SID | Course ID | Enroll Date | SNAME | Phone | Marks | Grade |
100 | 1 | 31/7/2018 | DONY | 7474749892, 9994567321 | 80 | A |
100 | 2 | 1/8/2018 | DONY | 7474749892, 9994567321 | 70 | B |
101 | 1 | 31/7/2018 | JOSE | 7856400234 | 85 | A |
101 | 2 | 1/8/2018 | JOSE | 7856400234 | 85 | A |
To make the above table to 1NF, we can do as follows:
SID | Course ID | Enroll Date | SNAME | Phone | Marks | Grade |
100 | 1 | 31/7/2018 | DONY | 7474749892 | 80 | A |
100 | 1 | 31/7/2018 | DONY | 9994567321 |
80 | A |
100 | 2 | 1/8/2018 | DONY | 7474749892 | 70 | B |
100 | 2 | 1/8/2018 | DONY | 9994567321 |
70 | B |
101 | 1 | 31/7/2018 | JOSE | 7856400234 | 85 | A |
101 | 2 | 1/8/2018 | JOSE | 7856400234 | 85 | A |
i.e., we duplicated ROWS to remove multi-value for attribute Phone and thus make the relation in 1NF.
2NF
Second Normal Form requires a relation to be in 1NF and for every functional dependency $A \to B,$ we should not have $C \to B$ where $C \subset A$ and $B$ is a non-prime attribute. i.e., there should not be any partial functional dependency ($A\to B$ is a partial FD, if a proper subset of $A$ determines $B$) for any non-prime attribute. Partial functional dependency means an attribute being determined by a proper subset of some candidate key.
A necessary (but not sufficient) condition for a relation to be in 1NF but not in 2NF is that it should have at least one candidate key with multiple attributes
SID | Course ID | Enroll Date | Marks | Grade |
100 | 1 | 31/7/2018 | 80 | A |
100 | 1 | 31/7/2018 | 80 | A |
100 | 2 | 1/8/2018 | 70 | B |
100 | 2 | 1/8/2018 | 70 | B |
101 | 1 | 31/7/2018 | 85 | A |
101 | 2 | 1/8/2018 | 85 | A |
SID | SNAME | Phone |
100 | DONY | 7474749892 |
100 | DONY | 9994567321 |
100 | DONY | 7474749892 |
100 | DONY | 9994567321 |
101 | JOSE | 7856400234 |
101 | JOSE | 9994567321 |
3NF
A relation is in 3NF if it is in 2NF and no non-prime attribute should be determined by anything but a super key (no transitive dependency) . In the first relation above we have a FD, Marks $\to$ Grade and Grade is a non-key attribute and Marks is not a super key. So, this violates 3NF. To make this relation into 3NF, we can decompose the relation as follows:
SID | Course ID | Enroll Date | Marks |
100 | 1 | 31/7/2018 | 80 |
100 | 1 | 31/7/2018 | 80 |
100 | 2 | 1/8/2018 | 70 |
100 | 2 | 1/8/2018 | 70 |
101 | 1 | 31/7/2018 | 85 |
101 | 2 | 1/8/2018 | 85 |
Marks | Grade |
80 | A |
80 | A |
70 | B |
70 | B |
85 | A |
85 | A |
A necessary (but not sufficient) condition for a relation to be in 2NF but not in 3NF is that some non-prime attribute must be determined by a non-key
BCNF
A relation is in BCNF, if for every non-trivial functional dependency $A \to B,$ $A$ is a super key. It basically removes the “non-prime attribute” restriction in the 3NF definition thus ensuring that every attribute must be determined only by a super key. The above relation is not in BCNF because Enroll Date determines Course ID (also vice versa) and Enroll Date is not a super key. In order to make the relation to BCNF, we need to split them as follows:
SID | Course ID | Marks |
100 | 1 | 80 |
100 | 1 | 80 |
100 | 2 | 70 |
100 | 2 | 70 |
101 | 1 | 85 |
101 | 2 | 85 |
Course ID | Enroll Date |
1 | 31/7/2018 |
1 | 31/7/2018 |
2 | 1/8/2018 |
2 | 1/8/2018 |
1 | 31/7/2018 |
2 | 1/8/2018 |
A necessary (but not sufficient) condition for a relation to be in 3NF but not in BCNF is that it should have overlapping candidate keys – two candidate keys of two or more attributes and at least one common attribute
The proof for the above statement can be given as follows:
If a relation is in 3NF and not in BCNF, then there must be a non-trivial FD $A \to B,$ where $B$ is a prime attribute and $A$ is not a super key (assume $A$ is the minimal attribute set which determines $B$). This means $B$ is not a super key as otherwise $A$ must also be a super key. But since, $B$ is a prime attribute, for some non-empty set of attributes $Y,$ $BY$ is a candidate key. Since $A \to B,$ and $A$ is assumed to be minimal, this means $AY$ is also a candidate key. Thus we got two candidate keys $AY$ and $BY$ and $Y$ is common.
Till 3NF, we can always ensure a lossless and dependency preserving decomposition. But for BCNF, this may not be possible always and so we ensure lossless property at the expense of losing some dependencies.
2NF prohibits partial FD involving non-prime attributes and BCNF prohibits partial FD involving prime attributes.
Databases Preparation Resources for GATE CSE