About file systems

These notes provide some additional details on file systems on Linux systems. Most of the material covered here describes what happens inside the kernel when you make various Linux system calls. This material is not usually covered in a systems programming course; it is usually covered in great detail in a course on operating systems. Since we do not currently offer a course on operating systems, I am presenting this material here as a supplement to our other course material.

Most of the material in these notes comes from chapters 14 and 18 of The Linux Programming Interface by Michael Kerrisk.

File systems and device drivers

When an application wants to interact with a file, it uses system calls such as open() and read(). These calls enter the kernel, which passes the request to a file system, which in turn uses a device driver to access the file on a device.

Devices

The kernel treats each storage device in a Linux system as a device and uses a device driver to communicate with it. Device drivers support all of the usual system calls for working with files, such as open(), close(), read(), and write().

The system associates a special file, called a device file, with each device in the system. These device files can be found in /dev. In addition, information about currently active devices is exported to files in /sys.

Each device file has a major ID number, which associates the file with a particular device driver, and a minor ID number, which is used to number devices in each device type.

cd /sys
ls -l

Disk partitions

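Each disk is divided into one or more (nonoverlapping) partitions. The kernel treats each partition as a separate device residing under the /dev directory. A disk partition may hold any type of information, but usually contains one of the following:

  1. A file system holding regular files and directories
  2. A data area accessed as a raw-mode device
  3. A swap area used by the kernel for memory management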

File systems

Linux systems can actually use several different kinds of file systems, including ext2, ext3, ext4, FAT32, NTFS, HFS, and NFS.

Why are there so many different file systems?

  1. Different operating systems commonly use different file systems. For example, Windows uses FAT32 and NTFS. If you want to use a device that was formatted on a non-Linux operating system you need to be able to work with its file system.
  2. NFS is a network file system that allows you to access files on file servers over a network.
  3. Linux uses several different virtual file systems for special purposes. For example, the /proc directory uses a special virtual file system to display information about running processes as a collection of special files.

Although the Linux file system appears to be a single, unified file system rooted at /, in reality a given Linux system can make use of many different file systems at different points in the directory tree.

Mount points

The mount command attaches the file system on a device to a directory in the directory tree. That directory is called the mount point.

$ mount device directory

You can also use the mount command to see a list of currently mounted devices and their mount points.

$ mount
/dev/sda6 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
/dev/sda8 on /home type ext3 (rw,acl,user_xattr)
/dev/sda1 on /windows/C type vfat (rw,noexec,nosuid,nodev)
/dev/sda9 on /home/mtk/test type reiserfs (rw)

An example file system: ext2

The original file system on Linux systems was the ext file system. In time, this was superseded by the ext2 file system, which supported larger files and larger partitions. In turn, ext2 has been superseded by ext3 and ext4, which offer more advanced features such as journaling.

ext2 is a nice example of a file system to look at, since it is simple enough to understand easily.

In ext2 each file is represented by an inode, which is a fixed-size structure on disk that stores data about the file and provides information about the data blocks that make up the file.

The structure of ext2:

The superblock contains information about the file system itself. The inode table contains a list of inodes, each of which is identified by an inode number. Inodes are numbered starting at 1. Inode 1 records the bad data blocks that are not available for use. Inode 2 describes the root directory of the file system. All other parts of the file system can be reached starting from that root directory. When the file system is mounted into a Linux system's full directory tree, the root directory of the file system is mapped to a mount point in that tree.

The structure of an i-node:

The "other file information" portion of an inode contains file metadata for the file. Metadata includes:

  1. File permissions: who can do what with the file
  2. The file's owner and group
  3. The file's size in bytes
  4. The number of data blocks used by the file
  5. Access time, change time, modification time, deletion time
  6. The number of links to the file
  7. Extended information, including access control lists

In a program you can access most of this metadata by using the stat() function. stat() returns information about the file in a stat structure:

struct stat {
  dev_t     st_dev;     /* ID of device containing file */
  ino_t     st_ino;     /* Inode number */
  mode_t    st_mode;    /* File type and mode */
  nlink_t   st_nlink;   /* Number of hard links */
  uid_t     st_uid;     /* User ID of owner */
  gid_t     st_gid;     /* Group ID of owner */
  dev_t     st_rdev;    /* Device ID (if special file) */
  off_t     st_size;    /* Total size, in bytes */
  blksize_t st_blksize; /* Block size for filesystem I/O */
  blkcnt_t  st_blocks;  /* Number of 512B blocks allocated */
  struct timespec st_atim;  /* Time of last access */
  struct timespec st_mtim;  /* Time of last modification */
  struct timespec st_ctim;  /* Time of last status change */

  /* Backward compatibility */
  #define st_atime st_atim.tv_sec
  #define st_mtime st_mtim.tv_sec
  #define st_ctime st_ctim.tv_sec
};

In the terminal, the ls command, with its various options, also displays this information.

Obtaining Information About a File System: statvfs()

The statvfs() and fstatvfs() library functions obtain information about a mounted file system.

#include <sys/statvfs.h>

int statvfs(const char *pathname, struct statvfs *statvfsbuf);
int fstatvfs(int fd, struct statvfs *statvfsbuf);

The only difference between these two functions is in how the file system is identified. For statvfs(), we use pathname to specify the name of any file in the file system. For fstatvfs(), we specify an open file descriptor, fd, referring to any file in the file system. Both functions return a statvfs structure containing information about the file system in the buffer pointed to by statvfsbuf. This structure has the following form:

struct statvfs {
  unsigned long f_bsize;    /* File-system block size (in bytes) */
  unsigned long f_frsize;   /* Fundamental file-system block size
                               (in bytes) */
  fsblkcnt_t    f_blocks;   /* Total number of blocks in file system
                               (in units of 'f_frsize') */
  fsblkcnt_t    f_bfree;    /* Total number of free blocks */
  fsblkcnt_t    f_bavail;   /* Number of free blocks available to
                               unprivileged process */
  fsfilcnt_t    f_files;    /* Total number of i-nodes */
  fsfilcnt_t    f_ffree;    /* Total number of free i-nodes */
  fsfilcnt_t    f_favail;   /* Number of i-nodes available to
                               unprivileged process (set to
                               'f_ffree' on Linux) */
  unsigned long f_fsid;     /* File-system ID */
  unsigned long f_flag;     /* Mount flags */
  unsigned long f_namemax;  /* Maximum length of filenames on
                               this file system */
};

Directories and links

All file systems organize files into directories. A directory is simply a container for a set of files and other directories. In ext2, for example, each directory is actually a special file that contains a table of names and inode numbers. Some of these inodes will refer to files, while others will refer to directories.

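Directories can also contain links. A hard link is an additional directory entry that refers to the same inode as an existing file, so both names share the same data and metadata. A soft (symbolic) link is a special file that stores the pathname of another file.

The ln command creates a link to an existing file. By default it creates a hard link, as in the following session, where abc and xyz share inode 122232 and the link count rises from 1 to 2.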

$ echo -n 'It is good to collect things,' > abc
$ ls -li abc
 122232 -rw-r--r--   1 mtk      users          29 Jun 15 17:07 abc
$ ln abc xyz
$ echo ' but it is better to go on walks.' >> xyz
$ cat abc
It is good to collect things, but it is better to go on walks.
$ ls -li abc xyz
 122232 -rw-r--r--   2 mtk      users          63 Jun 15 17:07 abc
 122232 -rw-r--r--   2 mtk      users          63 Jun 15 17:07 xyz

From the shell, symbolic links are created using the ln -s command. The ls -F command displays a trailing @ character at the end of symbolic links.

Manipulating files and directories

The rename() system call can be used both to rename a file and to move it into another directory on the same file system.

#include <stdio.h>
int rename(const char *oldpath, const char *newpath);

The mkdir() system call creates a new directory.

#include <sys/stat.h>
int mkdir(const char *pathname, mode_t mode);

The rmdir() system call removes the directory specified in pathname, which may be an absolute or a relative pathname. The directory must be empty.

#include <unistd.h>
int rmdir(const char *pathname);

The remove() library function removes a file or an empty directory.

#include <stdio.h>
int remove(const char *pathname);

Traversing a directory

The opendir() function opens a directory and returns a handle that can be used to refer to the directory in later calls.

#include <dirent.h>
DIR *opendir(const char *dirpath);

The readdir() function reads successive entries from a directory stream.

#include <dirent.h>
struct dirent *readdir(DIR *dirp);

Each call to readdir() reads the next directory entry from the directory stream referred to by dirp and returns a pointer to a statically allocated structure of type dirent. On reaching the end of the directory, or on error, readdir() returns NULL. The dirent structure contains the following information about the entry:

struct dirent {
    ino_t d_ino;          /* File i-node number */
    char  d_name[];       /* Null-terminated name of file */
};

This structure is overwritten on each call to readdir().

The closedir() function closes the open directory stream referred to by dirp, freeing the resources used by the stream.

#include <dirent.h>
int closedir(DIR *dirp);

The nftw() function walks through the directory tree specified by dirpath and calls the programmer-defined function func once for each file in the directory tree.

#define _XOPEN_SOURCE 500
#include <ftw.h>
int nftw(const char *dirpath,
         int (*func) (const char *pathname, const struct stat *statbuf,
                      int typeflag, struct FTW *ftwbuf),
         int nopenfd, int flags);

Working directory

A process can retrieve its current working directory using getcwd().

#include <unistd.h>
char *getcwd(char *cwdbuf, size_t size);

On success, getcwd() returns a pointer to cwdbuf as its function result. If the pathname for the current working directory exceeds size bytes, then getcwd() returns NULL, with errno set to ERANGE.

The chdir() system call changes the calling process’s current working directory to the relative or absolute pathname specified in pathname (which is dereferenced if it is a symbolic link).

#include <unistd.h>
int chdir(const char *pathname);

Every process has a root directory, which is the point from which absolute pathnames (i.e., those beginning with /) are interpreted. By default, this is the real root directory of the file system. (A new process inherits its parent’s root directory.) On occasion, it is useful for a process to change its root directory, and a privileged process can do this using the chroot() system call.

#define _DEFAULT_SOURCE   /* _BSD_SOURCE on glibc 2.19 and earlier */
#include <unistd.h>
int chroot(const char *pathname);

The chroot() system call changes the process’s root directory to the directory specified by pathname (which is dereferenced if it is a symbolic link). Thereafter, all absolute pathnames are interpreted as starting from that location in the file system. This is sometimes referred to as setting up a chroot jail, since the program is then confined to a particular area of the file system.

Working with paths

The realpath() library function dereferences all symbolic links in pathname (a null-terminated string) and resolves all references to /. and /.. to produce a null-terminated string containing the corresponding absolute pathname.

#include <stdlib.h>
char *realpath(const char *pathname, char *resolved_path);

The dirname() and basename() functions break a pathname string into directory and filename parts.

#include <libgen.h>
char *dirname(char *pathname);
char *basename(char *pathname);