File Systems in Operating Systems

In this lesson, we will learn:

  • How files are organized on computer systems
  • Techniques used by the OS to manage files

Computer systems store tons of user files. For us, these are documents, pictures, videos, etc. For the computer, they are just sequences of bits.

The OS stores files on persistent storage devices like hard drives. Unlike memory, persistent storage keeps data intact even when powered off.

The OS plays a key role in:

  • Organizing files on storage devices
  • Managing read/write access
  • Ensuring performance and reliability

It does this through the file system.

What is a File System?

A file system is how the OS organizes and manages files on a storage device. Key responsibilities include:

  • Storing files and metadata
  • Organizing the storage hierarchy
  • Managing free vs used space
  • Access control and security
  • Caching and buffering reads/writes

The file system provides structure to raw persistent storage. It enables reliable long-term file storage and efficient access.

Introduction

In this lesson, we'll learn how operating systems implement and manage file systems. This covers critical OS functionality like:

  • File organization and directory structure
  • Allocating disk space
  • Access control
  • Buffering and caching

Let's get started!

Abstractions for Data Storage

The operating system doesn't just provide abstractions for memory management. It also provides abstractions for data storage on disk using the file system.

The main abstractions the OS gives us are:

  • Files - a named collection of data
  • Directories - a structure for organizing files into a hierarchy

Files

A file represents a logical storage unit from the user's perspective.

Directories

Directories create a tree folder structure for organizing files.

Why Abstractions?

These abstractions simplify storage from the user's view:

  • We just see files and folders, not physical disk blocks
  • Makes storage portable across devices
  • Allows user-friendly interaction

The OS handles all the complex low-level data mapping behind the scenes. Files and directories are abstractions provided by the OS to create a logical view of data storage for the user.

Files

Files are the basic unit of data storage from the user's perspective.

At a low level, a file is just a linear sequence of bytes that can be read and written. The OS doesn't know what kind of data a file contains; it could be an image, text, code, etc.

Each file has associated metadata managed by the OS:

  • File name - A human-readable identifier for the file, unique within its directory
  • File type - Indicates the type of data (image, text, etc)
  • File size - The number of bytes the file contains
  • Permissions - Controls who can access the file
  • Timestamps - Creation, modified, and access times

Additionally, each file has an inode number that identifies it within the file system. The inode number identifies the file's inode, which in turn records where the file's data blocks are on disk. We'll talk more about inodes later.
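
To make the metadata above concrete, here is a minimal sketch that asks the OS for a file's metadata using Python's standard os.stat call. The helper name describe and the scratch file are assumptions made for this illustration. Note that the result does not include the file name.

```python
import os
import stat
import tempfile
import time

def describe(path):
    """Return a dict with some of the metadata the OS keeps for `path`."""
    info = os.stat(path)  # ask the OS for the file's metadata
    return {
        "inode": info.st_ino,                         # inode number within the file system
        "size_bytes": info.st_size,                   # number of bytes in the file
        "permissions": stat.filemode(info.st_mode),   # e.g. '-rw-------'
        "modified": time.ctime(info.st_mtime),        # last modification time
        "accessed": time.ctime(info.st_atime),        # last access time
    }

# Create a small scratch file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("hello")
    demo_path = f.name

metadata = describe(demo_path)
for key, value in metadata.items():
    print(f"{key}: {value}")

os.remove(demo_path)  # clean up the scratch file
```

The inode number and timestamps printed will differ from machine to machine; only the size (5 bytes) is fixed by the example.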

Directories

Directories provide a way to organize files in a hierarchical structure.

A directory contains a list of entries that reference other files or directories. Each entry maps a name to an inode number.

For example, a directory could contain:

  • Entry for fileA -> inode 21
  • Entry for fileB -> inode 33
  • Entry for subdirX -> inode 57

This creates a tree of directories and files:

  • Directories contain references to files/other directories
  • These child directories can contain more files/dirs
  • Allows building a folder hierarchy

So directories have a specific internal structure:

  • List of entries mapping names -> inodes
  • Each entry points to a file or subdirectory
  • Allows nested folder structure

Like files, directories are also identified by an inode number within the file system.
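
The name-to-inode mapping described above can be observed on a real system. The sketch below uses Python's standard os.scandir to list the entries of a scratch directory together with their inode numbers. The helper name list_entries is an assumption for this illustration, and the inode numbers you see will not be 21, 33, and 57.

```python
import os
import tempfile

def list_entries(directory):
    """Return {name: inode number} for every entry in `directory`."""
    with os.scandir(directory) as entries:
        return {entry.name: entry.inode() for entry in entries}

# Build a scratch directory that mirrors the example above.
with tempfile.TemporaryDirectory() as root:
    for name in ("fileA", "fileB"):
        with open(os.path.join(root, name), "w") as f:
            f.write(name)
    os.mkdir(os.path.join(root, "subdirX"))

    listing = list_entries(root)
    for name, inode in sorted(listing.items()):
        print(f"{name} -> inode {inode}")
```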

File System Organization

Files are stored on hardware storage devices, commonly hard disk drives (HDDs) or solid-state drives (SSDs). A file system has two aspects: how the data is organized on disk, and how that data is accessed by programs. This is where the file management functionality of the operating system comes in. It organizes the file system using on-disk data structures and provides access through specific access methods.

Let's look at the overall organization of this.

  • The operating system divides the disk's space into fixed-size blocks. These blocks store all kinds of data, both system data and user data.
  • A portion of these blocks is dedicated to storing user data. These blocks make up the data region.
  • To track information about each file, file systems use a structure called an inode. It holds the file's metadata, such as file size, permissions, and access and modification times, along with references to the data blocks that hold the file's contents. Some blocks on disk are reserved for storing these inodes, and each inode is identified by its inode number. This makes the process of locating a file on disk easier.
  • The OS also needs a way to track the allocation and deallocation of inodes and data blocks. For this purpose, allocation structures also occupy blocks on disk. A simple and common allocation structure is the bitmap, which uses one bit (0 or 1) per block to indicate whether the corresponding block is free or in use.
  • One last thing that exists on the disk structure is a block reserved for metadata regarding the file system itself. This block is known as the superblock. The metadata includes information on the number of inode and data blocks, file system type, etc. When mounting a file system, the OS always reads the superblock first.
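
The layout above can be sketched as a toy in-memory model. Everything here is hypothetical (the names Superblock, Bitmap, and "toyfs" are made up for illustration), and the bitmap assumes the convention that 0 means free and 1 means in use; some file systems use the opposite encoding.

```python
from dataclasses import dataclass

@dataclass
class Superblock:
    """File-system-wide metadata, read first when mounting."""
    fs_type: str
    inode_count: int
    data_block_count: int

class Bitmap:
    """One bit per block. Assumed convention: 0 = free, 1 = in use."""

    def __init__(self, size):
        self.bits = [0] * size  # every block starts out free

    def allocate(self):
        """Mark the first free block as in use and return its index."""
        for index, bit in enumerate(self.bits):
            if bit == 0:
                self.bits[index] = 1
                return index
        raise OSError("no free blocks left")

    def free(self, index):
        """Mark a block as free again."""
        self.bits[index] = 0

superblock = Superblock(fs_type="toyfs", inode_count=4, data_block_count=8)
inode_bitmap = Bitmap(superblock.inode_count)       # tracks inode slots
data_bitmap = Bitmap(superblock.data_block_count)   # tracks data region blocks

first = data_bitmap.allocate()    # takes block 0
second = data_bitmap.allocate()   # takes block 1
data_bitmap.free(first)           # block 0 becomes free again
third = data_bitmap.allocate()    # reuses block 0
print(data_bitmap.bits)           # [1, 1, 0, 0, 0, 0, 0, 0]
```

A real file system packs these bits into disk blocks rather than a Python list, but the allocate and free logic follows the same idea.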

Note that directories are stored in the same way as files. They also occupy blocks, and each has an inode whose references point to the data blocks where the directory's contents are stored. We could say that they are treated as a special kind of file.

Are you sure you're getting this? Is this statement true or false?

System data is stored in blocks that are in the data region.

Press true if you believe the statement is correct, or false otherwise.

Are you sure you're getting this? Click the correct answer from the options.

If a block is free, then its corresponding bit in the bitmap will be ___.

Click the option that best answers the question.

  • 0
  • 1
  • 0 or 1
  • None of the above

Access Methods

The other aspect of file systems is how programs access them. In this regard, two operations are most important: reading and writing.

Reading Files

To read a file, a program must first open it. It issues an open(<path-to-file>) system call, and the file system starts looking for the required file. To do so, it needs the file's inode number, which leads to the inode itself. The file system therefore traverses directory blocks, one path component at a time, until it finds the inode of the desired file.

Consider an example where you want to read a file named example.txt. The file system starts traversing from the root directory (all traversals start from here), whose inode number is already known to the file system. In many UNIX-style file systems, this inode number is 2. The file system reads the root inode and follows its references to the data blocks that hold the directory's entries. It looks up the next name in the path there, moves on to that entry's inode, and repeats until the desired file is found. Once the file is open, the program issues a read() system call. The file's inode is used to find the blocks that hold its contents, and reading starts from the first block of the file (a single file can occupy multiple blocks!). Reading a file may also update metadata in its inode, such as the access time.
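
The traversal can be sketched with a toy inode table. This is only an illustration under stated assumptions: the table contents, the path /home/example.txt, and the helper names resolve and read_file are hypothetical, and real file systems read these structures from disk blocks.

```python
ROOT_INODE = 2  # root inode number used by many UNIX-style file systems

# Toy inode table: inode number -> inode. Directory inodes hold
# name -> inode number entries; file inodes hold their data blocks.
inodes = {
    2:  {"type": "dir",  "entries": {"home": 11}},
    11: {"type": "dir",  "entries": {"example.txt": 21}},
    21: {"type": "file", "blocks": [b"hello ", b"world"]},
}

def resolve(path):
    """Walk `path` from the root and return the inode number it names."""
    current = ROOT_INODE
    for name in path.strip("/").split("/"):
        if not name:
            continue  # the path was just "/"
        inode = inodes[current]
        if inode["type"] != "dir":
            raise NotADirectoryError(name)
        if name not in inode["entries"]:
            raise FileNotFoundError(path)
        current = inode["entries"][name]  # follow the entry to the next inode
    return current

def read_file(path):
    """Resolve the path, then read every block listed in the file's inode."""
    inode = inodes[resolve(path)]
    return b"".join(inode["blocks"])

print(resolve("/home/example.txt"))    # 21
print(read_file("/home/example.txt"))  # b'hello world'
```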

Writing Files

The initial procedure of reading and writing files is the same, as the file needs to be found and opened in the program before any operation. Once the file system traverses through the files to find the desired file, the program issues a write() system call. A key difference between reading and writing files is that writing to a file may allocate additional blocks.
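
From a program's point of view, these system calls look roughly like the sketch below, which uses Python's thin wrappers in the os module (os.open, os.write, os.read). The scratch path and file contents are assumptions for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")

# open() for writing: the file system resolves the path and creates the file.
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
written = os.write(fd, b"hello, file system\n")  # returns the number of bytes written
os.close(fd)

# open() again for reading, then read() up to 100 bytes.
fd = os.open(path, os.O_RDONLY)
data = os.read(fd, 100)
os.close(fd)

print(written, data)

# Clean up the scratch file and its directory.
os.remove(path)
os.rmdir(os.path.dirname(path))
```

Both operations start with the same open step; only the write may cause the file system to allocate new blocks.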

Once the file is open, writing to it occurs in three main steps:

  • Reading the bitmap to find a free block to allocate.
  • Recording the newly allocated block by updating the corresponding bitmap and the file's inode.
  • Writing the data to the newly allocated block.
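
The steps above can be sketched with a toy model of a single open file. The structures and the append helper are hypothetical; the bitmap assumes 0 means free and 1 means in use; and for simplicity every chunk goes into a freshly allocated block instead of filling a partially used one.

```python
BLOCK_SIZE = 4  # tiny block size so the example stays readable

data_bitmap = [0] * 8               # assumed convention: 0 = free, 1 = in use
data_blocks = [b""] * 8             # the data region
inode = {"size": 0, "blocks": []}   # toy inode for one already-opened file

def append(inode, payload):
    """Append `payload` to the file, allocating one new block per chunk."""
    for start in range(0, len(payload), BLOCK_SIZE):
        chunk = payload[start:start + BLOCK_SIZE]
        # Step 1: read the bitmap to find a free block
        # (list.index raises ValueError if the disk is full).
        block_no = data_bitmap.index(0)
        # Step 2: update the bitmap and the inode to record the allocation.
        data_bitmap[block_no] = 1
        inode["blocks"].append(block_no)
        inode["size"] += len(chunk)
        # Step 3: write the data into the newly allocated block.
        data_blocks[block_no] = chunk

append(inode, b"hello world")
print(inode)        # {'size': 11, 'blocks': [0, 1, 2]}
print(data_bitmap)  # [1, 1, 1, 0, 0, 0, 0, 0]
```

Even this small write touches the bitmap, the inode, and the data blocks, which is why a single logical write can turn into several disk I/Os.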

As you may have observed, these operations require many disk I/Os and can slow the system down. To address this, most systems use system memory for caching and buffering, which reduces I/O overhead.

Caching and Buffering

Caching and buffering techniques apply to both read and write operations. Because disk access is much slower than memory access, avoiding or combining disk I/Os is one of the most effective optimizations available to a file system.

Let's consider read operations first. They are simpler than write operations as they do not change any existing data on disk. The OS caches frequently used blocks in system memory, which allows for faster read operations.

In the case of write operations, the write buffering technique is used. Some of the write operations are delayed and are grouped to be sent in batches at once. This allows the system to schedule I/Os (just like processes!) and increases performance.
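
Both ideas can be sketched in one toy class. This is an illustration under stated assumptions, not how any particular OS implements it: the class name BufferedDisk, the least-recently-used eviction policy, and the cache and batch sizes are all choices made for the example.

```python
from collections import OrderedDict

class BufferedDisk:
    """Toy disk wrapper with an LRU read cache and a delayed-write buffer."""

    def __init__(self, blocks, cache_size=2, batch_size=3):
        self.blocks = blocks          # stands in for the slow disk
        self.cache = OrderedDict()    # block number -> cached data
        self.cache_size = cache_size
        self.pending = {}             # block number -> data not yet on disk
        self.batch_size = batch_size
        self.disk_reads = 0           # counts reads that reached the disk
        self.disk_writes = 0          # counts write batches sent to the disk

    def read(self, block_no):
        if block_no in self.pending:          # newest data is still buffered
            return self.pending[block_no]
        if block_no in self.cache:            # cache hit: no disk I/O
            self.cache.move_to_end(block_no)
            return self.cache[block_no]
        self.disk_reads += 1                  # cache miss: go to the disk
        data = self.blocks[block_no]
        self.cache[block_no] = data
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)    # evict the least recently used block
        return data

    def write(self, block_no, data):
        self.pending[block_no] = data         # delay the write
        self.cache.pop(block_no, None)        # drop any stale cached copy
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send all delayed writes to the disk as one batch."""
        if not self.pending:
            return
        for block_no, data in sorted(self.pending.items()):
            self.blocks[block_no] = data
        self.pending.clear()
        self.disk_writes += 1

disk = BufferedDisk([b"a", b"b", b"c", b"d"])
disk.read(0)
disk.read(0)                       # second read is served from the cache
disk.write(1, b"B")
disk.write(2, b"C")
before_flush = list(disk.blocks)   # the disk is still unchanged here
disk.write(3, b"D")                # third write triggers one batched flush
print(disk.disk_reads, disk.disk_writes, disk.blocks)
```

The trade-off is that delayed writes sit in memory for a while, so data that has not been flushed can be lost if the system loses power.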

Let's test your knowledge. Fill in the missing part by typing it in.

File mappings on disk are managed by ___.

Write the missing line below.

Build your intuition. Is this statement true or false?

A read operation can update the number of allocated blocks.

Press true if you believe the statement is correct, or false otherwise.

One Pager Cheat Sheet

  • This lesson will discuss the role of the operating system in managing and organizing files and persistent storage devices for good performance and reliability.
  • The operating system provides abstractions in the form of files and directories to enable data storage.
  • Files can hold anything from images and text to code. The OS does not know what kind of data a file contains, and it identifies each file by an associated inode number.
  • Directories are special files which store information about other files and directories, and are associated with an inode number.
  • The Operating System divides hard disk drive or solid-state device space into blocks, stores each file's metadata in an inode, and uses allocation structures and a superblock to manage it all.
  • System data is stored elsewhere, not in the data region, which is reserved for user data.
  • A bitmap structure is used to track which blocks have been used in the data region. Under the common convention, a free block's bit is 0 and an in-use block's bit is 1, although some file systems use the opposite encoding.
  • Reading and writing are the two most important access methods in a file system, allowing for programs to access its contents.
  • The open() system call is issued to traverse the file system through multiple blocks, starting from the root directory, to find the desired file's inode and then its contents using the read() system call, possibly updating the inode with its metadata.
  • Writing files involves locating and allocating the blocks to data and can be I/O expensive, so most systems use caching and buffering techniques to reduce I/O overhead.
  • Caching and buffering techniques are used to optimize both read and write operations, allowing for increased system performance.
  • Inodes manage the file mappings on disk, helping the Operating System (OS) quickly locate files and optimize disk access.
  • A read operation only retrieves data from disk and does not modify the existing number of allocated blocks.