Home > Build Datastores From Scratch > Build Datastores From Scratch > Refining your Relational Database System

Introduction to Building and Refining Relational Database Systems

In this tutorial, we will explore the fundamental concepts and techniques for building and refining relational database systems. As experienced software engineers, we understand the importance of database transactions and how they ensure data integrity and concurrency control. We will learn about the key characteristics of transactions, such as atomicity and isolation, and how they guarantee the reliability of our data.

Next, we will delve into the concept of database joins and how they allow us to combine data from multiple tables based on related columns. We will explore different types of joins, such as inner join, left join, and full join, and understand their practical applications.

Moving forward, we will explore the powerful tools of triggers and stored procedures. These components automate routines and maintain data integrity in our database systems. We will learn how to implement triggers and stored procedures to automate tasks and enhance the security and performance of our systems.

Lastly, we will discuss the ACID properties of relational databases: Atomicity, Consistency, Isolation, and Durability. We will understand the significance of these properties in maintaining reliable and stable data, even in the presence of multiple concurrent transactions.

By the end of this tutorial, you will have a solid understanding of how to build and refine a relational database system, and be equipped with the knowledge to explore more advanced database topics like indexing, partitioning, and NoSQL databases. So, let's dive in and enhance your skills in building and refining datastores!

As a highly experienced software engineer, you're likely to be familiar with the underlying mechanics of a database. One such mechanic is the concept of transactions. These are fundamental to maintaining the integrity of our data and ensuring that we don't run into problems with concurrency or erroneous data entry.

A database transaction is a unit of work performed within a database management system (or similar system) against a database, and treated in a coherent and reliable way independent of other transactions.

Key characteristics of a transaction are:

Atomicity: The changes to data are performed as if they are a single operation. That is, all the changes are performed or none of them are. They don’t stop halfway.
Isolation: Each transaction is executed in isolation from other transactions. Even though multiple transactions may execute concurrently, the system guarantees that, for every pair of transactions, one will finish executing before the other begins.

In the given python code, we have simulated a simple transaction within a database. Each transaction could have multiple operations, these could be read/write operations or changes in the database schema. We are utilizing object-oriented programming by creating classes for databases and transactions and storing each operation as part of a specific transaction.

In computer science, transactions are often compared to financial transactions. When you make a financial transaction, like depositing money into a savings account, you rely on the atomicity and isolation of that transaction to ensure your money makes it into the account without incident. Much like these financial transactions, database transactions help keep our data stored in a consistent and reliable manner.

xxxxxxxxxx
 
if __name__ == "__main__":
  # Python logic here
  class Transaction:
    def __init__(self):
      self.operations = []
    def add_operation(self, operation):
      self.operations.append(operation)
  
  class Database:
    def __init__(self):
      self.transactions = []
    def add_transaction(self, transaction):
      self.transactions.append(transaction)
  
  def simulate_database_transactions():
    database = Database()
    transaction = Transaction()
  
    for i in range(1, 10):
      transaction.add_operation(f'Operation {i}')
    
    database.add_transaction(transaction)
  
    for transaction in database.transactions:
      for operation in transaction.operations:
        print(operation)
  
  simulate_database_transactions()

Are you sure you're getting this? Fill in the missing part by typing it in.

A unit of work that is performed against a database, treated as coherent and reliable, and independent of other transactions, is called a ___.

Write the missing line below.

After our dive into the concept of database transactions, it's time to start looking at how they are implemented in a relational database system. Transactions ensure our database operations have atomicity and isolation which are crucial for maintaining the integrity and consistency of our data. Let's take it step by step.

In Python, we can simulate simple transactions using functions within a class. The purpose of the transaction function in this case, would be to group database operations into a single unit of work. Here, an operation could be a database READ/WRITE action. We take these actions and pair them along with a COMMIT or ROLLBACK action.

A COMMIT action would signify the end of a successful transaction and all changes made within the transaction would reflect in the database.

Contrarily, a ROLLBACK action comes into play when an operation within the transaction fails. In this case, we revert back to the state of the database before the transaction started. It's like the transaction never happened.

This approach ensures that either all database operations within a transaction are carried out successfully or none at all, thereby maintaining the atomicity of transactions. Remember, we also need to talk about the concurrency of transactions for completeness, however that's a topic for another day.

In the context of Software Engineering and Financial markets, transactions help us maintain consistency deep within our systems and ensuring no erroneous data corrupts the state of the system. Take this code snippet for instance, where we simulate these simple transactions in Python.

xxxxxxxxxx
 
if __name__ == "__main__":
  class Transaction:
    def __init__(self):
      self.operations = []
​
    def add_operation(self, operation):
      self.operations.append(operation)
​
    def execute_operations(self):
      try:
        for operation in self.operations:
          operation.execute()
        print('Commit')
      except:
        print('Rollback')
        for operation in self.operations:
          operation.rollback()
  print('Transactions executed')

Let's test your knowledge. Click the correct answer from the options.

What is the primary purpose of transactions in a database system?

Click the option that best answers the question.

Improve query performance
Ensure data redundancy
Provide fail-safe against data corruption
Increase storage capacity

As proficient engineers who have undoubtedly worked with database systems, we can appreciate the power of SQL for managing data in a relational database. One of those features that makes SQL database systems efficient, reliable, and versatile is the concept of 'joins'.

Joins in SQL essentially follow the mathematical set theory you learned back in computer science days. They combine rows from two or more tables based on a related column, allowing us to create a relational algebra that helps structure data in a meaningful way. Believe it or not, this is what makes SQL databases powerful for financial or AI systems, as you can recombine data in different ways swiftly.

There are several types of joins, including inner join, left (outer) join, right (outer) join, and full (outer) join. We will explore the practicality of each type later. However, all of these join types have similar syntax, following the structure: SELECT...FROM...JOIN...ON...

Understanding joins is paramount for enhancing your database system's performance and will help you manipulate data in more sophisticated, flexible ways.

xxxxxxxxxx
 
if __name__ == "__main__":
  # An example of how a JOIN might look in Python
  employees = [{"id": 1, "name": "John"}, {"id": 2, "name": "Jane"}]
  departments = [{"id": 1, "dept": "Engineering"}, {"id": 2, "dept": "Marketing"}]
​
  # Perform a "join" in Python
  joined_data = []
  for e in employees:
    for d in departments:
      if e["id"] == d["id"]:
        joined_data.append({"id": e["id"], "name": e["name"], "dept": d["dept"]})
​
  print(joined_data)
​
  print("We've just simulated a relational database join in Python!")

Let's test your knowledge. Click the correct answer from the options.

Which of the following best describes the purpose of joins in a relational database?

Click the option that best answers the question.

Joining two or more tables based on unrelated columns
Repeating rows in two different tables
Joining two or more tables based on a related column
Deleting columns in a table

Now that we've explored the concept and importance of joins in relational databases, it's time to put these principles into practice. With the context of a finance application, let's implement joins in our database system.

When implementing joins, we look for related data from two separate tables. Let's consider two tables in the finance application database: Stocks and Transactions. The Stocks table contains an ID, stock symbol, and company name, while the Transactions table includes the transaction ID, stock ID (which matches the ID in the Stocks table), transaction type (buy or sell), number of shares, and price.

Implementing a JOIN operation in our database system could help answer questions like, 'What are all the transactions involving a specific stock?'

We can represent a simple INNER JOIN with Python as follows:

xxxxxxxxxx
 
if __name__ == "__main__":
​
  # Defining the table data
  Stocks = [(1, 'AAPL', 'Apple Inc.'), (2, 'GOOG', 'Alphabet Inc.'), (3, 'TSLA', 'Tesla Inc.')]
  Transactions = [(101, 1, 'buy', 100, 150.0), (102, 3, 'sell', 50, 800.0), (103, 2, 'buy', 75, 1200.0), (104, 1, 'sell', 120, 140.0)]
​
  # Implementing a simple inner join
  transactions_with_stocks = [(t[0], s[1], t[2], t[3], t[4]) for s in Stocks for t in Transactions if s[0] == t[1]]
  print('Transactions with Stocks:', transactions_with_stocks)

Try this exercise. Fill in the missing part by typing it in.

In our example of joining the Stocks and Transactions table, we are looking to answer the question: 'What are all the ___ involving a specific stock?'

Write the missing line below.

In any sophisticated relational database system, including those underlying major financial applications, two important components are Triggers and Stored Procedures. They are vital tools in refining our database system by automating routines and maintaining data integrity.

Triggers

A trigger is essentially a set of instructions that are automatically executed (or 'triggered') in response to specific events in the database. These events could be changes to data such as an insertion, deletion, or alteration. In financial data systems, triggers could be used to automatically update relevant data fields when a new transaction occurs. This automation can greatly increase efficiency and consistency in data handling.

Stored Procedures

Stored Procedures, on the other hand, are precompiled sets of code stored in the database that can be repeatedly called and executed. They not only help in reducing the amount of information sent to the server but also gives you control over who can access and manipulate your databases.

The real magic happens when they work together. Imagine a scenario in the finance world. A trigger could activate a stored procedure every time a stock price updates, causing a recalculation of an investor's portfolio value.

Understanding how to use triggers and stored procedures effectively will be crucial. Up next, we'll dive into implementing them in our own database system.

xxxxxxxxxx
 
if __name__ == '__main__':
  # Here we're just creating bare-bones functions to represent triggers and stored procedures
  # In a real-world scenario, these would be part of a larger database system
  
  def trigger(transaction):
    print(f'Trigger activated by {transaction} transaction')
  
  def stored_procedure():
    print('Stored procedure executed')
  
  trigger('BUY')
  stored_procedure()

Let's test your knowledge. Is this statement true or false?

Triggers and Stored Procedures in a relational database system can only be used separately and not in conjunction.

Press true if you believe the statement is correct, or false otherwise.

Implementing Triggers and Stored Procedures

Now, let's dive into how to leverage these powerful database refinement tools - Triggers and Stored Procedures - within a relational database system.

Triggers

To implement a trigger, you typically define a function that performs the desired operation, then assign this function to a particular database event (e.g., insert, update, delete).

A trigger can be an effective way of ensuring data integrity. For example, in a financial system, you could implement a trigger that verifies if a user has sufficient balance before executing a withdrawal operation.

Stored Procedures

Stored procedures are similar in many ways to functions in typical programming. They consist of a prewritten SQL statement or a sequence of statements stored in the database.

To craete a stored procedure, you'll typically open a new query in your SQL client, write the procedure using a CREATE PROCEDURE command and follow it by a BEGIN/END block containing the SQL statements, then close the query.

Stored procedures provide an extra layer of security by limiting direct access to the underlying data. They also enhance performance as they are precompiled and the database can execute them more efficiently.

The sample Python code below demonstrates creating a simple trigger and stored procedure with psycopg, a popular PostgreSQL adapter for Python.

Remember, the specific syntax for creating triggers and stored procedures might differ based on the specific SQL variant you are using.

xxxxxxxxxx
 
if __name__ == "__main__":
  import psycopg2
  conn = psycopg2.connect(database='postgres', user='postgres', password='password', host='localhost')
  cur = conn.cursor()
​
  # Create function for trigger
  cur.execute("CREATE OR REPLACE FUNCTION notify_transaction() RETURNS TRIGGER AS $$ BEGIN NOTIFY transaction_channel, 'A transaction has been performed!'; RETURN NEW; END; $$ LANGUAGE plpgsql;")
​
  # Create trigger
  cur.execute("CREATE TRIGGER transaction_trigger AFTER INSERT ON transactions FOR EACH ROW EXECUTE FUNCTION notify_transaction();")
​
  # Create stored procedure
  cur.execute("CREATE PROCEDURE update_balance(IN customer_id int, IN amount numeric) LANGUAGE plpgsql AS $$ BEGIN UPDATE customers SET balance = balance + amount WHERE id = customer_id; END; $$;")
​
conn.commit()
print('Trigger and stored procedure have been successfully created!')

Are you sure you're getting this? Fill in the missing part by typing it in.

Stored procedures are precompiled and the database can execute them more efficiently. This also provides an extra layer of _ by limiting direct access to the underlying data.

Write the missing line below.

Achieving ACID Properties

ACID stands for Atomicity, Consistency, Isolation, and Durability. These properties are a key aspect in relational database management systems and are important to ensure the reliability during transactions. Let's explore each property and their importance in refining our relational database system:

Atomicity: This property ensures each transaction is treated as a single, indivisible, atomic unit, which either succeeds completely or fails. If any part of the transaction fails, all changes made during the transaction are rolled back, returning the database to its original state prior to the transaction.
Consistency: Consistency ensures only valid data following all rules and constraints is written into the database. If a transaction results in invalid data, the entire transaction is rolled back.
Isolation: This property guarantees each transaction is executed in isolation from other transactions. Despite concurrent execution, it appears to a transaction as if others don't exist.
Durability: Durability assures that once a transaction is committed, it will remain so, even in the event of power loss, crashes, or errors. It emphasizes the permanent nature of database transactions.

Ensuring our system adheres to these principles guarantees that our data remains reliable, accurate, and stable even in the instance of multiple simultaneous transactions. This significantly enhances our database system, making it more valuable and trusted by users and applications.

xxxxxxxxxx
 
if __name__ == "__main__":
  # Python logic: a simplistic representation of transaction functionality
  data_store = {}
  temp_store = {} # To hold data during transaction
​
  transaction_started = False
​
  def start_transaction():
    transaction_started = True
​
  def end_transaction():
    data_store.update(temp_store)
    temp_store.clear()
    transaction_started = False
​
  start_transaction()
  temp_store['a'] = 1
  temp_store['b'] = 2
  end_transaction()
​
  print(data_store)
  # Prints: {'a': 1, 'b': 2} demonstrating successful transaction

Build your intuition. Is this statement true or false?

In the ACID concept of databases, the 'Consistency' property ignores rules and constraints when writing data into the database.

Press true if you believe the statement is correct, or false otherwise.

Summary and the Road Ahead

Throughout this tutorial, we've delved deep into refining our relational database system. We've explored important concepts like transactions, joins, triggers, and stored procedures, and implemented them in our system. We also discussed the significance of achieving the ACID properties for reliable data management.

Building and refining a relational database system is a fundamental aspect of software development and computer science. It enhances the system's value in practical applications such as finance, where reliable data is crucial. This knowledge aids in developing sophisticated systems capable of maintaining complex data relationships.

As we move forward, we'll continue to explore more advanced aspects of building and refining datastores. This knowledge is not the end, but the foundation for understanding advanced database technologies such as MongoDB and ElasticSearch. Our journey to 'Build Datastores From Scratch' will help us better understand modern technology stacks and make us more confident developers. Happy coding!

Road Ahead

Through this tutorial, we've only scratched the surface of what's possible with relational databases. Here are some topics you might consider for further exploration:

Database Indexing for efficient data retrieval
Database Partitioning for tackling large datasets
Advanced SQL Techniques such as subqueries and Common Table Expressions (CTEs)
Implementing NoSQL databases like MongoDB
Distributed databases and their implications in performance and reliability

Remember, refining software competencies requires continuous learning and practice. Keep exploring, keep implementing and continue evolving as a software craftsman.

xxxxxxxxxx
 
if __name__ == "__main__":
    print("Continue refining your relational database system and explore advanced topics for further skill enhancement.")

Let's test your knowledge. Is this statement true or false?

Refining software competencies requires discontinuing learning and practice.

Press true if you believe the statement is correct, or false otherwise.

Introduction to Building and Refining Relational Database Systems

Are you sure you're getting this? Fill in the missing part by typing it in.

Let's test your knowledge. Click the correct answer from the options.

Click the option that best answers the question.

Let's test your knowledge. Click the correct answer from the options.

Click the option that best answers the question.

Try this exercise. Fill in the missing part by typing it in.

Triggers

Stored Procedures

Let's test your knowledge. Is this statement true or false?

Implementing Triggers and Stored Procedures

Triggers

Stored Procedures

Are you sure you're getting this? Fill in the missing part by typing it in.

Achieving ACID Properties

Build your intuition. Is this statement true or false?

Summary and the Road Ahead

Road Ahead

Let's test your knowledge. Is this statement true or false?

Programming Categories

Popular Lessons