Exploring Normal Forms
When designing relational databases, one important concept we need to understand is Database Normalization. Normalization is a process of structuring and organizing data in the database to avoid redundancy and anomalies -- issues like insertion, update, and delete anomalies. Normalization is done through a series of steps called normal forms. Each step is known as a normal form and has a certain set of rules or conditions that must be met.
The primary goal of database normalization is to reduce data redundancy, which means that the information or data should be stored only once. Storing the data more than once is a waste of storage and also results in inconsistent data. This is because if data that appears in more than one place is changed, the change needs to be made in all places. If the change is not made in all places, then the database will have inconsistent data.
There are several types of normal forms, including:
First Normal Form (1NF): A relation is in the 1NF if it holds an atomic value for each attribute. By atomic value, we mean that each value in the attribute column of a relation is unique and indivisible.
Second Normal Form (2NF): A relation is in the 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on the primary key.
Third Normal Form (3NF): A relation is in the 3NF if it is in 2NF and no transitive dependency exists.
These forms are carefully designed to ensure data integrity within databases. A relation in a database that does not respect these forms may be prone to anomalies and redundancy. In the given Python code, we check if a sample 'students' database complies with these three forms.
xxxxxxxxxx
print('Normalization errors:', normalization_errors)
if __name__ == "__main__":
# Defining a table: relations
students = [
{'id': 1, 'name': 'Alice', 'major': 'CS', 'age': 20},
{'id': 2, 'name': 'Bob', 'major': 'Finance', 'age': 22},
{'id': 3, 'name': 'Charlie', 'major': 'AI', 'age': 23}
]
# Checking for normalization forms
normalization_errors = []
# 1NF: There is no top-to-bottom ordering to the rows.
for i in range(len(students) - 1):
if not students[i]['id'] <= students[i+1]['id']:
normalization_errors.append('1NF violation: ')
break
# 2NF: Records should not depend on part of the primary key.
# Assuming 'id' is the primary key and is unique
id_set = set(student['id'] for student in students)
if len(id_set) != len(students):
normalization_errors.append('2NF violation: non-unique ids')
# 3NF: Non-key attributes must not depend on other non-key attributes.
# Checking that 'age' and 'major' don't correlate
age_major_pairs = set((student['age'], student['major']) for student in students)
if len(age_major_pairs) != len(id_set):
normalization_errors.append('3NF violation')