These notes provide additional detail on file systems in Linux. Most of the material describes what happens inside the kernel when you make various Linux system calls. This material is not usually covered in a systems programming course; it is usually covered in great detail in a course on operating systems. Since we do not currently offer a course on operating systems, I am presenting this material here as a supplement to our other course material.
Most of the material in these notes comes from chapters 14 and 18 of The Linux Programming Interface by Michael Kerrisk.
When an application wants to interact with a file, it uses system calls such as open() and read(). These system calls go to the kernel, which communicates with a file system, which in turn talks to a device driver to access the file on a device.
Devices
The kernel treats each storage device in a Linux system as a device and uses a device driver to communicate with it. Device drivers support all of the usual system calls for working with files, such as open(), close(), read(), and write().
The system associates a special file, called a device file, with each device in the system. These device files can be found in /dev. In addition, information about currently active devices is exported to files in /sys.
Each device file has a major ID number, which associates the file with a particular device driver, and a minor ID number, which distinguishes that device from the others handled by the same driver.
$ cd /sys
$ ls -l
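The major and minor ID numbers can also be read from a program. The following is a minimal sketch, assuming a Linux system with glibc, where the major() and minor() macros come from <sys/sysmacros.h>. The function name device_ids_of is made up for this illustration.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major() and minor() on glibc */
#include <sys/types.h>

/* Hypothetical helper: report the major and minor IDs associated with path.
 * For a device file (such as those under /dev) the IDs come from st_rdev;
 * for any other file they identify the device that holds the file (st_dev).
 * Returns 0 on success, -1 if stat() fails. */
int device_ids_of(const char *path, unsigned int *maj_out, unsigned int *min_out)
{
    struct stat sb;
    dev_t dev;

    if (stat(path, &sb) == -1)
        return -1;

    if (S_ISCHR(sb.st_mode) || S_ISBLK(sb.st_mode))
        dev = sb.st_rdev;    /* the device this special file represents */
    else
        dev = sb.st_dev;     /* the device containing this ordinary file */

    *maj_out = major(dev);
    *min_out = minor(dev);
    return 0;
}
```

Calling device_ids_of("/dev/sda1", &maj, &min) on a machine that has such a partition would report the IDs that ls -l shows in place of the file size for that device file.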
Each disk is divided into one or more (nonoverlapping) partitions. Each partition is treated by the kernel as a separate device residing under the /dev directory. A disk partition may hold any type of information, but usually contains one of the following:
Linux systems can actually use several different kinds of file systems, including ext2, ext3, ext4, FAT32, NTFS, HFS, and NFS.
Why are there so many different file systems?
The /proc directory uses a special virtual file system to display information about running processes as a collection of special files.
Although the Linux file system appears to be a single, unified file system rooted at /, in reality a given Linux system can make use of many different file systems at different points in the directory tree.
The mount command attaches the file system on a device to a directory, called the mount point.
$ mount device directory
You can also use the mount command to see a list of currently mounted devices and their mount points.
$ mount
/dev/sda6 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
/dev/sda8 on /home type ext3 (rw,acl,user_xattr)
/dev/sda1 on /windows/C type vfat (rw,noexec,nosuid,nodev)
/dev/sda9 on /home/mtk/test type reiserfs (rw)
The original file system on Linux systems was the ext file system. In time, this was superseded by the ext2 file system, which supported larger files and larger partitions. In turn, ext2 has been superseded by ext3 and ext4, which offer more advanced features such as journaling.
ext2 is a nice example of a file system to look at, since it is simple enough to understand easily.
In ext2 each file is represented by an inode, which is a fixed-size structure on disk that stores data about the file and provides information about the data blocks that make up the file.
The structure of ext2:
The superblock contains information about the file system itself. The inode table contains a list of inodes, each of which is identified by an inode number. Inodes are numbered starting at 1. Inode 1 records the bad data blocks that are not available for use. Inode 2 describes the root directory of the file system, and all other parts of the file system can be reached starting from that root directory. When the file system is mounted, its root directory is mapped to a mount point in the system's overall directory tree.
The structure of an inode:
The "other file information" portion of an inode contains file metadata for the file. Metadata includes:
In a program you can access most of this metadata by using the stat() function. stat() returns information about the file in a stat structure:
struct stat {
    dev_t     st_dev;         /* ID of device containing file */
    ino_t     st_ino;         /* Inode number */
    mode_t    st_mode;        /* File type and mode */
    nlink_t   st_nlink;       /* Number of hard links */
    uid_t     st_uid;         /* User ID of owner */
    gid_t     st_gid;         /* Group ID of owner */
    dev_t     st_rdev;        /* Device ID (if special file) */
    off_t     st_size;        /* Total size, in bytes */
    blksize_t st_blksize;     /* Block size for filesystem I/O */
    blkcnt_t  st_blocks;      /* Number of 512B blocks allocated */
    struct timespec st_atim;  /* Time of last access */
    struct timespec st_mtim;  /* Time of last modification */
    struct timespec st_ctim;  /* Time of last status change */
    /* Backward compatibility */
#define st_atime st_atim.tv_sec
#define st_mtime st_mtim.tv_sec
#define st_ctime st_ctim.tv_sec
};
In the terminal, the ls command and its various options also display this information.
The statvfs() and fstatvfs() library functions obtain information about a mounted file system.
#include <sys/statvfs.h>
int statvfs(const char *pathname, struct statvfs *statvfsbuf);
int fstatvfs(int fd, struct statvfs *statvfsbuf);
The only difference between these two functions is in how the file system is identified. For statvfs(), we use pathname to specify the name of any file in the file system. For fstatvfs(), we specify an open file descriptor, fd, referring to any file in the file system. Both functions return a statvfs structure containing information about the file system in the buffer pointed to by statvfsbuf. This structure has the following form:
struct statvfs {
    unsigned long f_bsize;    /* File-system block size (in bytes) */
    unsigned long f_frsize;   /* Fundamental file-system block size (in bytes) */
    fsblkcnt_t    f_blocks;   /* Total number of blocks in file system (in units of 'f_frsize') */
    fsblkcnt_t    f_bfree;    /* Total number of free blocks */
    fsblkcnt_t    f_bavail;   /* Number of free blocks available to unprivileged process */
    fsfilcnt_t    f_files;    /* Total number of i-nodes */
    fsfilcnt_t    f_ffree;    /* Total number of free i-nodes */
    fsfilcnt_t    f_favail;   /* Number of i-nodes available to unprivileged process (set to 'f_ffree' on Linux) */
    unsigned long f_fsid;     /* File-system ID */
    unsigned long f_flag;     /* Mount flags */
    unsigned long f_namemax;  /* Maximum length of filenames on this file system */
};
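One common use of statvfs() is to work out how much space a file system has left. The sketch below multiplies the block counts by f_frsize, the unit in which they are expressed. The function name space_on is an assumption of this example, not part of any library.

```c
#include <sys/statvfs.h>

/* Hypothetical helper: report the total size of the file system that
 * holds path and the space available to unprivileged processes, both
 * in bytes. Returns 0 on success, -1 if statvfs() fails. */
int space_on(const char *path,
             unsigned long long *total_out,
             unsigned long long *avail_out)
{
    struct statvfs vfs;

    if (statvfs(path, &vfs) == -1)
        return -1;

    /* Block counts are in units of f_frsize, so scale them to bytes. */
    *total_out = (unsigned long long) vfs.f_blocks * vfs.f_frsize;
    *avail_out = (unsigned long long) vfs.f_bavail * vfs.f_frsize;
    return 0;
}
```

Using f_bavail rather than f_bfree gives the figure that matters to an ordinary user, since some free blocks may be reserved for privileged processes.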
All file systems organize files into directories. A directory is simply a container for a set of files and other directories. In ext2, for example, each directory is actually a special file that contains a table of names and inode numbers. Some of these inodes will refer to files, while others will refer to directories.
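Directories can also contain links. A hard link is an additional directory entry that refers to the inode of an existing file, while a soft (symbolic) link is a special file whose contents are the pathname of another file.
The ln command creates a new link to an existing file. The link can be either a hard link or a soft link.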
$ echo -n 'It is good to collect things,' > abc
$ ls -li abc
122232 -rw-r--r-- 1 mtk users 29 Jun 15 17:07 abc
$ ln abc xyz
$ echo ' but it is better to go on walks.' >> xyz
$ cat abc
It is good to collect things, but it is better to go on walks.
$ ls -li abc xyz
122232 -rw-r--r-- 2 mtk users 63 Jun 15 17:07 abc
122232 -rw-r--r-- 2 mtk users 63 Jun 15 17:07 xyz
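The shell session above can be mirrored in a program. This is a sketch only: it uses the link() system call, which these notes do not otherwise cover, and the function name hard_link_count is invented here. It creates a file, adds a second name for it, and checks that both names share one inode.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create the file original, add a hard link named
 * alias, and return the link count seen through alias.
 * Returns -1 if any step fails or the two names have different inodes. */
long hard_link_count(const char *original, const char *alias)
{
    struct stat orig_sb, alias_sb;
    int fd;

    /* Start from a clean slate so the link count is predictable. */
    unlink(original);
    unlink(alias);

    fd = open(original, O_CREAT | O_WRONLY | O_EXCL, 0644);
    if (fd == -1)
        return -1;
    close(fd);

    /* Add a second directory entry that refers to the same inode. */
    if (link(original, alias) == -1)
        return -1;

    if (stat(original, &orig_sb) == -1 || stat(alias, &alias_sb) == -1)
        return -1;
    if (orig_sb.st_ino != alias_sb.st_ino)
        return -1;

    return (long) alias_sb.st_nlink;
}
```

As in the ls -li output, the link count goes from 1 to 2 and both names report the same inode number.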
From the shell, symbolic links are created using the ln -s command. The ls -F command displays a trailing @ character at the end of symbolic links.
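The programmatic counterpart of ln -s is the symlink() system call. The sketch below, with invented helper names, creates a symbolic link, reads back the pathname stored in it with readlink(), and uses lstat() to test whether a name is itself a symbolic link. None of these three calls is described elsewhere in these notes, so treat this as a supplementary illustration.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create linkpath as a symbolic link to target and
 * read the stored pathname back into buf.
 * Returns the length of the stored pathname, or -1 on error. */
ssize_t make_and_read_symlink(const char *target, const char *linkpath,
                              char *buf, size_t bufsize)
{
    ssize_t n;

    if (bufsize == 0)
        return -1;

    unlink(linkpath);                 /* ignore failure: link may not exist */
    if (symlink(target, linkpath) == -1)
        return -1;

    n = readlink(linkpath, buf, bufsize - 1);
    if (n == -1)
        return -1;
    buf[n] = '\0';                    /* readlink() does not add a terminator */
    return n;
}

/* Returns 1 if path is itself a symbolic link, 0 if not, -1 on error.
 * lstat() is used because stat() would follow the link. */
int is_symlink(const char *path)
{
    struct stat sb;

    if (lstat(path, &sb) == -1)
        return -1;
    return S_ISLNK(sb.st_mode) ? 1 : 0;
}
```

The target does not have to exist when the link is created; a symbolic link only stores a pathname.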
The rename() system call can be used both to rename a file and to move it into another directory on the same file system.
#include <stdio.h>
int rename(const char *oldpath, const char *newpath);
The mkdir() system call creates a new directory.
#include <sys/stat.h>
int mkdir(const char *pathname, mode_t mode);
The rmdir() system call removes the directory specified in pathname, which may be an absolute or a relative pathname.
#include <unistd.h>
int rmdir(const char *pathname);
The remove() library function removes a file or an empty directory.
#include <stdio.h>
int remove(const char *pathname);
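The four calls above fit together naturally. As a sketch, the function below (directory_lifecycle, a name chosen for this example) creates a directory, creates a file inside it, renames the file, and then cleans up. The file names draft.txt and final.txt are arbitrary.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: exercise mkdir(), rename(), remove() and rmdir()
 * on a scratch directory that must not already exist.
 * Returns 0 if every step behaves as expected, -1 otherwise. */
int directory_lifecycle(const char *dir)
{
    char old_path[4096], new_path[4096];
    int fd;

    snprintf(old_path, sizeof old_path, "%s/draft.txt", dir);
    snprintf(new_path, sizeof new_path, "%s/final.txt", dir);

    if (mkdir(dir, 0755) == -1)
        return -1;

    fd = open(old_path, O_CREAT | O_WRONLY | O_EXCL, 0644);
    if (fd == -1)
        return -1;
    close(fd);

    /* Give the file a new name within the same directory. */
    if (rename(old_path, new_path) == -1)
        return -1;

    /* rmdir() only removes empty directories, so this call must fail. */
    if (rmdir(dir) != -1)
        return -1;

    if (remove(new_path) == -1)       /* remove the file ... */
        return -1;
    if (rmdir(dir) == -1)             /* ... and now the directory is empty */
        return -1;
    return 0;
}
```

The deliberate failed rmdir() in the middle shows why a directory's contents have to be removed first.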
The opendir() function opens a directory and returns a handle that can be used to refer to the directory in later calls.
#include <dirent.h>
DIR *opendir(const char *dirpath);
The readdir() function reads successive entries from a directory stream.
#include <dirent.h>
struct dirent *readdir(DIR *dirp);
Each call to readdir() reads the next directory entry from the directory stream referred to by dirp and returns a pointer to a statically allocated structure of type dirent, containing the following information about the entry:
struct dirent {
    ino_t d_ino;      /* File i-node number */
    char  d_name[];   /* Null-terminated name of file */
};
This structure is overwritten on each call to readdir().
The closedir() function closes the open directory stream referred to by dirp, freeing the resources used by the stream.
#include <dirent.h>
int closedir(DIR *dirp);
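Putting opendir(), readdir(), and closedir() together gives the usual directory-listing loop. The following is a minimal sketch; the function name list_directory is invented for the example. Because readdir() returns NULL both at the end of the directory and on error, the loop clears errno before each call to tell the two cases apart.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print the inode number and name of each entry in
 * dirpath, skipping "." and "..".
 * Returns the number of entries printed, or -1 on error. */
int list_directory(const char *dirpath)
{
    DIR *dirp;
    struct dirent *entry;
    int count = 0;
    int saved_errno;

    dirp = opendir(dirpath);
    if (dirp == NULL)
        return -1;

    for (;;) {
        errno = 0;                    /* so NULL + errno != 0 means error */
        entry = readdir(dirp);
        if (entry == NULL)
            break;

        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue;

        printf("%10lu  %s\n", (unsigned long) entry->d_ino, entry->d_name);
        count++;
    }

    saved_errno = errno;              /* closedir() may change errno */
    closedir(dirp);
    return saved_errno != 0 ? -1 : count;
}
```

The entries come back in whatever order the file system stores them, so a program that wants sorted output has to sort the names itself.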
The nftw() function walks through the directory tree specified by dirpath and calls the programmer-defined function func once for each file in the directory tree.
#define _XOPEN_SOURCE 500
#include <ftw.h>
int nftw(const char *dirpath,
         int (*func) (const char *pathname, const struct stat *statbuf,
                      int typeflag, struct FTW *ftwbuf),
         int nopenfd, int flags);
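As an illustration of how the callback is used, the sketch below counts the directories and non-directory files in a tree. The names tally and count_tree are invented for this example, and the choices of 16 for nopenfd and FTW_PHYS for flags (do not follow symbolic links) are assumptions, not requirements.

```c
#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* nftw() gives the callback no user-data argument, so the running
 * totals are kept in file-scope variables. */
static long file_total;
static long dir_total;

/* Callback invoked by nftw() once for each object in the tree. */
static int tally(const char *pathname, const struct stat *statbuf,
                 int typeflag, struct FTW *ftwbuf)
{
    (void) pathname;
    (void) statbuf;
    (void) ftwbuf;

    if (typeflag == FTW_F)            /* a file that is not a directory */
        file_total++;
    else if (typeflag == FTW_D)       /* a directory */
        dir_total++;
    return 0;                         /* zero tells nftw() to keep going */
}

/* Hypothetical helper: count files and directories under dirpath
 * (the starting directory itself is included in the directory count).
 * Returns 0 on success, -1 if the walk fails. */
int count_tree(const char *dirpath, long *files_out, long *dirs_out)
{
    file_total = 0;
    dir_total = 0;

    if (nftw(dirpath, tally, 16, FTW_PHYS) == -1)
        return -1;

    *files_out = file_total;
    *dirs_out = dir_total;
    return 0;
}
```

Returning a nonzero value from the callback stops the walk early, and nftw() then returns that value to its caller.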
A process can retrieve its current working directory using getcwd().
#include <unistd.h>
char *getcwd(char *cwdbuf, size_t size);
On success, getcwd() returns a pointer to cwdbuf as its function result. If the pathname for the current working directory exceeds size bytes, then getcwd() returns NULL, with errno set to ERANGE.
The chdir() system call changes the calling process’s current working directory to the relative or absolute pathname specified in pathname (which is dereferenced if it is a symbolic link).
#include <unistd.h>
int chdir(const char *pathname);
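The two calls are often used as a pair. The sketch below changes the working directory and then reads it back, returning the errno value on failure so the ERANGE case described above is easy to observe. The function name change_and_report is made up for this example.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: change the working directory to dir, then store
 * the resulting absolute pathname in buf (size bytes long).
 * Returns 0 on success, or the errno value of the call that failed. */
int change_and_report(const char *dir, char *buf, size_t size)
{
    if (chdir(dir) == -1)
        return errno;

    /* Fails with ERANGE if the pathname plus its terminating null byte
     * does not fit in size bytes. */
    if (getcwd(buf, size) == NULL)
        return errno;

    return 0;
}
```

The string getcwd() returns is always an absolute pathname, even if dir was given as a relative one.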
Every process has a root directory, which is the point from which absolute pathnames (i.e., those beginning with /) are interpreted. By default, this is the real root directory of the file system. (A new process inherits its parent’s root directory.) On occasion, it is useful for a process to change its root directory, and a privileged process can do this using the chroot() system call.
#define _DEFAULT_SOURCE   /* _BSD_SOURCE on glibc 2.19 and earlier */
#include <unistd.h>
int chroot(const char *pathname);
The chroot() system call changes the process’s root directory to the directory specified by pathname (which is dereferenced if it is a symbolic link). Thereafter, all absolute pathnames are interpreted as starting from that location in the file system. This is sometimes referred to as setting up a chroot jail, since the program is then confined to a particular area of the file system.
The realpath() library function dereferences all symbolic links in pathname (a null-terminated string) and resolves all references to /. and /.. to produce a null-terminated string containing the corresponding absolute pathname.
#include <stdlib.h>
char *realpath(const char *pathname, char *resolved_path);
The dirname() and basename() functions break a pathname string into directory and filename parts.
#include <libgen.h>
char *dirname(char *pathname);
char *basename(char *pathname);
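Both functions take a non-const char * because they are allowed to modify the string they are given, and they may return a pointer to static storage. A cautious way to use them, sketched below under those assumptions, is to work on private copies and copy the results out immediately. The function name split_path and the 4096-byte scratch buffers are choices made for this example.

```c
#include <libgen.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split pathname into its directory and filename
 * parts, writing each into a caller-supplied buffer of size bytes.
 * Returns 0 on success, -1 if pathname is too long for the scratch copies. */
int split_path(const char *pathname, char *dir_out, char *base_out, size_t size)
{
    char dir_copy[4096];
    char base_copy[4096];

    if (strlen(pathname) >= sizeof dir_copy)
        return -1;

    /* dirname() and basename() may modify their argument, so each one
     * gets its own copy of the original string. */
    strcpy(dir_copy, pathname);
    strcpy(base_copy, pathname);

    /* Copy the results out right away, since the returned pointers may
     * refer to the scratch copies or to static storage. */
    snprintf(dir_out, size, "%s", dirname(dir_copy));
    snprintf(base_out, size, "%s", basename(base_copy));
    return 0;
}
```

For a pathname with no slash in it, such as "notes.txt", dirname() yields "." and basename() yields the name unchanged.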