Home > Build Datastores From Scratch > Build Datastores From Scratch > Building a Relational Database System

Introduction to Relational Databases: A Practical Tutorial

Are you interested in learning about relational databases and how they work under the hood? Look no further! In this tutorial, we will dive into the fundamentals of relational databases and explore their theory, applications, and operations.

We will start by understanding the concept of a relation in a relational database and how it can be represented using a two-dimensional table. We will then explore the concept of normalization and the different normal forms that ensure data integrity and reduce redundancy.

Moving on, we will learn about the fundamental operations of relational algebra, such as projection, selection, and cartesian product. We will see how these operations can be performed in Python, allowing us to work with relational databases more effectively.

Next, we will take a hands-on approach and build a simple in-memory relational database from scratch using Python. We will implement basic operations like writing, reading, updating, and deleting data, giving you a practical understanding of how a database can be structured and manipulated.

Finally, we will touch on the importance of querying the database to retrieve meaningful data. We will discuss how queries can be made in Python using key-value pairs and explore the concept of database modification, including adding and removing datasets.

By the end of this tutorial, you will have a solid foundation in relational databases and the confidence to work with them. Whether you are new to databases or looking to expand your knowledge, this tutorial will equip you with the skills to build and manage relational databases effectively. So let's dive in and unlock the power of relational databases!

Understanding Relations

In relational databases, the term relation refers to a set of data items that have something in common. It is usually represented as a two-dimensional table, where each row (or tuple) represents a single entity, and each column represents an attribute of that entity.

The data inside a relation can be mapped to a dictionary data structure in Python, where each key-value pair can be thought of as a column-value pair. The keys (column names) must be unique within a relation, similar to how a dictionary functions. Moreover, an entire relation can be thought of as a list of dictionaries (assuming all dictionaries have the same keys).

Using relations in databases, engineers can logically structure their data in such a way that it depicts relationships between different types of data. This data structuring is highly pertinent in fields like finance and AI, where complex inter-relational data is the norm.

For example, consider a relation representing students. Each student entity in this relation could have attributes like name (string), age (integer), and major (string). In Python, you could simulate this relation using a dictionary.

Let's take a look at the code snippet which shows how a single student, a relation, can be represented in Python.

xxxxxxxxxx
 
if __name__ == '__main__':
  # Python Dictionary to simulate a relation
  student = {'name': 'John Doe', 'age': 22, 'major': 'Computer Science'}
  # Each key-value pair can be thought of as a column-value pair
  for column, value in student.items():
    print(f'{column}: {value}')

Try this exercise. Is this statement true or false?

In a relational database, a relation can be simulated as a list of dictionaries in Python, where each dictionary represents a tuple and each key-value pair inside the dictionary represents a column-value pair.

Press true if you believe the statement is correct, or false otherwise.

Exploring Normal Forms

When designing relational databases, one important concept we need to understand is Database Normalization. Normalization is a process of structuring and organizing data in the database to avoid redundancy and anomalies -- issues like insertion, update, and delete anomalies. Normalization is done through a series of steps called normal forms. Each step is known as a normal form and has a certain set of rules or conditions that must be met.

The primary goal of database normalization is to reduce data redundancy, which means that the information or data should be stored only once. Storing the data more than once is a waste of storage and also results in inconsistent data. This is because if data that appears in more than one place is changed, the change needs to be made in all places. If the change is not made in all places, then the database will have inconsistent data.

There are several types of normal forms, including:

First Normal Form (1NF): A relation is in the 1NF if it holds an atomic value for each attribute. By atomic value, we mean that each value in the attribute column of a relation is unique and indivisible.
Second Normal Form (2NF): A relation is in the 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on the primary key.
Third Normal Form (3NF): A relation is in the 3NF if it is in 2NF and no transitive dependency exists.

These forms are carefully designed to ensure data integrity within databases. A relation in a database that does not respect these forms may be prone to anomalies and redundancy. In the given Python code, we check if a sample 'students' database complies with these three forms.

xxxxxxxxxx
    print('Normalization errors:', normalization_errors)
 
if __name__ == "__main__":
    # Defining a table: relations
    students = [
        {'id': 1, 'name': 'Alice', 'major': 'CS', 'age': 20},
        {'id': 2, 'name': 'Bob', 'major': 'Finance', 'age': 22},
        {'id': 3, 'name': 'Charlie', 'major': 'AI', 'age': 23}
    ]
​
    # Checking for normalization forms
    normalization_errors = []
​
    # 1NF: There is no top-to-bottom ordering to the rows.
    for i in range(len(students) - 1):
        if not students[i]['id'] <= students[i+1]['id']:
            normalization_errors.append('1NF violation: ')
            break
​
    # 2NF: Records should not depend on part of the primary key.
    # Assuming 'id' is the primary key and is unique
    id_set = set(student['id'] for student in students)
    if len(id_set) != len(students):
        normalization_errors.append('2NF violation: non-unique ids')
​
    # 3NF: Non-key attributes must not depend on other non-key attributes.
    # Checking that 'age' and 'major' don't correlate
    age_major_pairs = set((student['age'], student['major']) for student in students)
    if len(age_major_pairs) != len(id_set):
        normalization_errors.append('3NF violation')
​

Try this exercise. Fill in the missing part by typing it in.

A relation is in the Third Normal Form (3NF) if it is in 2NF and no _ dependency exists.

Write the missing line below.

Relational algebra, as you may already know, is a procedural query language. It consists of a set of operations that take one or two relations as input and produce a new relation as their result. It forms the backbone of any relational database management system (DBMS), as it provides the foundational operations with which more complex queries can be built.

The fundamental operations of relational algebra are as follows:

Projection: This operation retrieves certain columns from the table and discards the other columns. For instance, if you just want to retrieve student names from the 'student' table, you can project the 'name' column.
Selection: As the name suggests, this operation selects tuples (rows) that satisfy a given predicate. For example, you can select students who were born after 1991.
Cartesian Product: This operation combines information from two relations. Combining 'student' and 'enrollment' tables would give us information about which student is enrolled in which course.

The code given executes these operations in Python. This might not be as straightforward as SQL's SELECT, FROM, WHERE approach. But understanding these operations will enable you to work with any relational database more effectively.

It's important to note that real-world databases can be massive, and operating on such large data sets would require more efficient algorithms and data structures, typically provided by DBMS. For now, understanding these operations on a theoretical level is crucial to grasp how relational databases work under the hood.

xxxxxxxxxx
 
if __name__ == "__main__":
  # Our example tables
  student = [{'id':1, 'name':'John', 'birth_year':1991}, {'id':2, 'name':'Mike', 'birth_year':1995}, {'id':3, 'name':'Anna', 'birth_year':1993}]
  enrollment = [{'student_id':1, 'course_id':1}, {'student_id':2, 'course_id':2}, {'student_id':3, 'course_id':3}]
  
  #The Projection Operation
  projection = [x['name'] for x in student]
  print('Projection result: ', projection)
  
  #The Selection Operation
  selection = [x for x in student if x['birth_year']>1991]
  print('Selection result: ', selection)
  
  #The Cartesian Product Operation
  cartesian_product = [(x, y) for x in student for y in enrollment]
  print('Cartesian Product result: ', cartesian_product)

Try this exercise. Click the correct answer from the options.

Based on the fundamental operations of relational algebra provided, which of these describes the 'Selection' operation?

Click the option that best answers the question.

It retrieves certain columns from the table and discards the other columns.
It combines information from two relations.
It selects tuples (rows) that satisfy a given predicate.
It computes the set difference of two relations.

Building a simple relational database from scratch involves implementing some of key concepts we've discussed in the previous sections. For our example, we will use Python to build a simple in-memory relational database that supports basic operations like writing, reading and updating data.

First, let's start by creating a Dictionary that will serve as our database. In Python, Dictionaries are a data structure similar to hash tables where every entry is a key-value pair. Thus, they are a perfect option to implement a simple key-value store.

Next, we'll define functions to handle the basic operations.

write(key, value) function to store data: This function will write the data into the dictionary. The 'key' will serve as the unique identifier for the data, sort of like a primary key in a relational database.
read(key) function to fetch data: This function will be used to retrieve data from the dictionary using a key. It's equivalent to a SQL's SELECT statement.
update(key, new_value) function to update data: This function will take a key and new_value and change the existing value of the given key. This is similar to SQL's UPDATE command.
delete(key) function to remove data: This function will remove a key-value pair from the dictionary. This is similar to SQL's DELETE command.

Remember that this is a very basic implementation. Although it lacks features of a fully-fledged relational database, it gives us an idea of how data can be stored and manipulated in a database. At a higher level, real-world databases include additional components like query optimizers, transaction logs, and background cleanup processes, but these basics apply.

xxxxxxxxxx
 
if __name__ == "__main__":
  # Initialize an empty dictionary to be our database
  db = {}
​
  # Define our functions
  def write(key, value):
    db[key] = value
​
  def read(key):
    return db.get(key, 'Not Found')
​
  def update(key, new_value):
    if key in db:
      db[key] = new_value
​
  def delete(key):
    if key in db:
      del db[key]
​
  # Let's test our database
  write('name', 'Alan Turing')
  print(read('name'))  # Outputs: Alan Turing
  update('name', 'Grace Hopper')
  print(read('name'))  # Outputs: Grace Hopper
  delete('name')
  print(read('name'))  # Outputs: Not Found
​
print('Program ended')

Build your intuition. Click the correct answer from the options.

Which of the following operations was NOT mentioned in the implementation of our simple in-memory database?

Click the option that best answers the question.

write()
read()
update()
rename()

After building our relational database, the next step is querying the database to retrieve the data we need. The main interaction for end users or software apps is the ability to get that data in a meaningful way from the database. This is similar to the SQL's SELECT statement. In the context of our simple example using a Python dictionary as our database, we can achieve this by accessing the key-value pairs with specific keys.

Suppose we have a simple database with keys 'mathematics' and 'computer_science', each having its related data. We can create a read(database, key) function to query the database. The function takes the 'database' (our Python dictionary) and a 'key' as parameters and then checks if the key exists in the database. If it does, it returns the associated value. If it doesn't, it returns a message indicating that no data was found for the provided key.

With this method, we can easily retrieve expenses related to 'mathematics', like under the topic categories ('calculus', 'combinatorics', 'algebra'), its difficulty level as 'hard', and a boolean indicating if it is fun. Similarly, for 'computer_science', we get related topics ('data_structure', 'algorithms', 'operating_systems'), difficulty level as 'medium', and a boolean indicating fun as well. A search for 'finance' returns 'No data found for key: finance' as there is no data associated with 'finance' in our database.

This is a simple interpretation of querying a database, and in real-world applications, substantial and complex queries can be made using actual SQL or other database query languages.

xxxxxxxxxx
 
if __name__ == "__main__":
  # Python logic here
  database = {
    'mathematics': {
      'topics': ['calculus', 'combinatorics', 'algebra'],
      'difficulty': 'hard',
      'is_fun': True
    },
    'computer_science': {
      'topics': ['data_structure', 'algorithms', 'operating_systems'],
      'difficulty': 'medium',
      'is_fun': True
    }
  }
  
  # Querying the database
  def read(database, key):
    if key in database:
      return database[key]
    else:
      return 'No data found for key: {}'.format(key)
​
  print(read(database, 'computer_science'))
  print(read(database, 'mathematics'))
  print(read(database, 'finance'))

Let's test your knowledge. Click the correct answer from the options.

In the context of querying a relational database, what does SQL's SELECT statement represent?

Click the option that best answers the question.

Creation of a new database table
Modification of an existing database
Retrieval of data from a database
Deleting data from a database

While building and querying a relational database are key components of managing data, expanding the database functionality plays an equally vital role. This includes ways we can modify the database to adapt to changing needs, like adding and remove datasets, or updating existing information.

Consider our Python dictionary that represents a simple relational database. Suppose we want to include data for the key 'finance'. Adding a new relation or dataset can be achieved by simply defining a new key-value pair in the dictionary. As Python dictionaries are mutable, changes take effect immediately in the dictionary object itself.

In the context of software development, this could mean addition of new features or data fields as per project requirements. For example, in finance, data fields could include topics like 'investment_banking', 'corporate_finance', 'asset_management', difficulty level as 'medium', and a boolean indicating whether it's fun.

Deletion of relations or datasets is also straightforward. Assume a situation where we no longer need data for 'mathematics'. We can easily remove the key-value pair from our Python dictionary using the del keyword.

Here, we also handle the search to demonstrate the addition and deletion. After adding 'finance', a search for 'finance' yields the data we added. Similarly, after deleting 'mathematics', a search for it returns 'No data found for key: mathematics'.

This encapsulates a basic but crucial aspect of managing relational databases - the ability to modify the database structure as per changing needs, again reflecting the principle of how real-world database systems are managed.

xxxxxxxxxx
 
if __name__ == "__main__":
​
  database = {'mathematics': {'topics': ['calculus', 'combinatorics', 'algebra'], 'difficulty': 'hard', 'is_fun': True},
               'AI': {'topics': ['machine_learning', 'deep_learning', 'neural_networks'], 'difficulty': 'hard', 'is_fun': True}}
​
  def search(database, key):
    if key in database:
      return database[key]
    else:
      return f'No data found for key: {key}'
​
  # Adding a new key-value pair to the database
  database['finance'] = {'topics': ['investment_banking', 'corporate_finance', 'asset_management'], 'difficulty': 'medium', 'is_fun': False}
​
  print(search(database, 'finance'))
​
  # Deleting a key-value pair from the database
  del database['mathematics']
​
  print(search(database, 'mathematics'))

Are you sure you're getting this? Click the correct answer from the options.

What options does a developer have when needing to change the structure of a relational database to adapt to changing needs?

Click the option that best answers the question.

Adding new relations or datasets only
Deleting relations or datasets only
Cannot modify the database structure
Both adding new relations or datasets and deleting existing ones

To wrap up, we have examined key aspects of relational databases - including their theory, applications, operations, relations, normal forms, and practical implementation in a programming language. Throughout this course, we leaned on Python due to its simplicity and readability - allowing us to explore computational concepts behind relational databases structure and core operations.

We constructed a simple relational database from scratch, we experienced how various operations can be performed including querying, and learnt how to expand the functionality according to growing needs. This included implementing changes like adding, deleting or updating datasets as needed. This is crucial in application development, especially in sectors like finance and AI, where data's dynamism and volume are significant.

Now that you understand the basic functionality of relational databases, the next steps can involve digging deeper into more complex aspects of database management systems. You might consider exploring distributed databases, concurrent operations, and locking mechanisms.

Additionally, you can start working with professional database management systems like PostgreSQL or MySQL - applying the foundational knowledge you've gained here. This will improve your perspective on real-world applications and further your skill set.

Remember, the best way to learn is by doing. The more databases you try to build and manage, the better you'll understand how underlying structures and operations work. Regardless of the path you take after this course, the key is to keep learning, experimenting, and solving problems. Happy coding!

Build your intuition. Is this statement true or false?

PostgreSQL and MySQL are considered professional database management systems.

Press true if you believe the statement is correct, or false otherwise.

Introduction to Relational Databases: A Practical Tutorial

Understanding Relations

Try this exercise. Is this statement true or false?

Exploring Normal Forms

Try this exercise. Fill in the missing part by typing it in.

Try this exercise. Click the correct answer from the options.

Click the option that best answers the question.

Build your intuition. Click the correct answer from the options.

Click the option that best answers the question.

Let's test your knowledge. Click the correct answer from the options.

Click the option that best answers the question.

Are you sure you're getting this? Click the correct answer from the options.

Click the option that best answers the question.

Build your intuition. Is this statement true or false?

Programming Categories

Popular Lessons