
Anatomy of the Linux file system

A layered structure-based review

M. Tim Jones (mtj@mtjones.com), Consultant Engineer, Emulex Corp.

Summary:  When it comes to file systems, Linux® is the Swiss Army knife of operating systems. Linux supports a large number of file systems, from journaling to clustering to cryptographic. Linux is a wonderful platform for using standard and more exotic file systems and also for developing file systems. This article explores the virtual file system (VFS), sometimes called the virtual filesystem switch, in the Linux kernel and then reviews some of the major structures that tie file systems together.


Date:  30 Oct 2007
Level:  Introductory
Also available in:  Russian Japanese Portuguese


Basic file system architecture

The Linux file system architecture is an interesting example of abstracting complexity. Using a common set of API functions, a large variety of file systems can be supported on a large variety of storage devices. Take, for example, the read function call, which allows some number of bytes to be read from a given file descriptor. The read function is unaware of file system types, such as ext3 or NFS. It is also unaware of the particular storage medium upon which the file system is mounted, such as AT Attachment Packet Interface (ATAPI) disk, Serial-Attached SCSI (SAS) disk, or Serial Advanced Technology Attachment (SATA) disk. Yet, when the read function is called for an open file, the data is returned as expected. This article explores how this is done and investigates the major structures of the Linux file system layer.
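The independence of read from file system and device can be seen from user space. The following is a minimal sketch, assuming a Linux/POSIX environment; read_prefix is a hypothetical helper invented for illustration. It opens a path and reads up to n bytes with open, read, and close, and nothing in it names a file system type or a storage device: the kernel's VFS routes the same calls to ext3, NFS, or whatever implementation backs the path.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to `n` bytes from the start of `path`.
 * The same calls work whether the path lives on ext3, NFS, tmpfs,
 * /proc, and so on; the VFS picks the implementation.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_prefix(const char *path, char *buf, size_t n)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t total = 0;
    while (total < n) {
        ssize_t r = read(fd, buf + total, n - total);
        if (r < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        if (r == 0)
            break;                  /* end of file */
        total += (size_t)r;
    }
    close(fd);
    return (ssize_t)total;
}
```

The loop matters because read may legitimately return fewer bytes than requested, for example on pipes or network file systems.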

What is a file system?

I'll start with an answer to the most basic question, the definition of a file system. A file system is an organization of data and metadata on a storage device. With a vague definition like that, you know that the code required to support this will be interesting. As I mentioned, there are many types of file systems and media. With all of this variation, you can expect that the Linux file system interface is implemented as a layered architecture, separating the user interface layer, the file system implementation, and the drivers that manipulate the storage devices.

File systems as protocols

Another way to think about a file system is as a protocol. Just as network protocols (such as IP) give meaning to the streams of data traversing the Internet, file systems give meaning to the data on a particular storage medium.

Mounting

Associating a file system with a storage device in Linux is a process called mounting. The mount command is used to attach a file system to the current file system hierarchy (root). During a mount, you provide a file system type, a file system, and a mount point.

To illustrate the capabilities of the Linux file system layer (and the use of mount), create a file system in a file within the current file system. This is accomplished first by creating a file of a given size using dd, copying from /dev/zero as the source. In other words, the result is a file initialized with zeros, as shown in Listing 1.


Listing 1. Creating an initialized file
    $ dd if=/dev/zero of=file.img bs=1k count=10000
    10000+0 records in
    10000+0 records out
    $
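For readers who prefer to do this step from a program, the dd command in Listing 1 can be approximated in C. This is a sketch, not what dd itself does internally; make_zero_image is a hypothetical name. It creates or truncates the target and writes count blocks of 1024 zero bytes, matching bs=1k.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Rough C equivalent of `dd if=/dev/zero of=PATH bs=1k count=COUNT`:
 * create (or truncate) PATH and write COUNT blocks of 1024 zero bytes.
 * Returns 0 on success, -1 on error. */
int make_zero_image(const char *path, unsigned count)
{
    char block[1024];
    memset(block, 0, sizeof block);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    for (unsigned i = 0; i < count; i++) {
        size_t done = 0;
        while (done < sizeof block) {          /* handle short writes */
            ssize_t w = write(fd, block + done, sizeof block - done);
            if (w < 0) {
                close(fd);
                return -1;
            }
            done += (size_t)w;
        }
    }
    return close(fd);
}
```

Calling make_zero_image("file.img", 10000) would produce a file of the same size and content as Listing 1. Extending the file with ftruncate instead would also yield zeros when read, but typically as a sparse file whose blocks are not actually allocated.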

You now have a file called file.img that's roughly 10MB. Use the losetup command (which, like the mke2fs and mount steps that follow, normally requires root privileges) to associate a loop device with the file, making it look like a block device instead of just a regular file within the file system:

    $ losetup /dev/loop0 file.img
    $

With the file now appearing as a block device (represented by /dev/loop0), create a file system on the device with mke2fs. This command creates a new ext2 (second extended) file system of the given size, as shown in Listing 2. The -c option tells mke2fs to check the device for bad blocks first.


Listing 2. Creating an ext2 file system with the loop device
    $ mke2fs -c /dev/loop0 10000
    mke2fs 1.35 (28-Feb-2004)
    max_blocks 1024000, rsv_groups = 1250, rsv_gdb = 39
    Filesystem label=
    OS type: Linux
    Block size=1024 (log=0)
    Fragment size=1024 (log=0)
    2512 inodes, 10000 blocks
    500 blocks (5.00%) reserved for the super user
    ...
    $

The file.img file, represented by the loop device (/dev/loop0), is then mounted at the mount point /mnt/point1 using the mount command. Note the specification of the file system type as ext2. Once it is mounted, you can treat this mount point as a new file system, for example by listing it with the ls command, as shown in Listing 3.


Listing 3. Creating a mount point and mounting the file system through the loop device
    $ mkdir /mnt/point1
    $ mount -t ext2 /dev/loop0 /mnt/point1
    $ ls /mnt/point1
    lost+found
    $

As shown in Listing 4, you can continue this process by creating a new file within the newly mounted file system, associating it with a loop device, and creating another file system on it.


Listing 4. Creating a new loop file system within a loop file system
    $ dd if=/dev/zero of=/mnt/point1/file.img bs=1k count=1000
    1000+0 records in
    1000+0 records out
    $ losetup /dev/loop1 /mnt/point1/file.img
    $ mke2fs -c /dev/loop1 1000
    mke2fs 1.35 (28-Feb-2004)
    max_blocks 1024000, rsv_groups = 125, rsv_gdb = 3
    Filesystem label=
    ...
    $ mkdir /mnt/point2
    $ mount -t ext2 /dev/loop1 /mnt/point2
    $ ls /mnt/point2
    lost+found
    $ ls /mnt/point1
    file.img lost+found
    $

From this simple demonstration, it's easy to see how powerful the Linux file system (and the loop device) can be. You can use this same approach to create encrypted file systems with the loop device on a file. This is useful to protect your data by transiently mounting your file using the loop device when needed.

File system architecture

Now that you've seen file system construction in action, I'll get back to the architecture of the Linux file system layer. This article views the Linux file system from two perspectives. The first view is from the perspective of the high-level architecture. The second view digs in a little deeper and explores the file system layer through the major structures that implement it.

High-level architecture

While the majority of the file system code exists in the kernel (except for user-space file systems, which I'll note later), the architecture shown in Figure 1 shows the relationships between the major file-system-related components in both user space and the kernel.


Figure 1. Architectural view of the Linux file system components

User space contains the applications (for this example, the user of the file system) and the GNU C Library (glibc), which provides the user interface for the file system calls (open, read, write, close). The system call interface acts as a switch, funneling system calls from user space to the appropriate endpoints in kernel space.
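The split between the glibc wrapper and the system call interface can be made visible with glibc's generic syscall() function, which enters the kernel directly by system call number. In this sketch (write_both_ways is a hypothetical name), the same message is written once through the write() wrapper and once through syscall(SYS_write, ...); both paths reach the same kernel entry point.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `msg` to `fd` twice: once through the glibc wrapper write(),
 * and once by invoking the system call directly with syscall().
 * Short writes are ignored for brevity.
 * Returns the total number of bytes written, or -1 on error. */
long write_both_ways(int fd, const char *msg)
{
    size_t len = strlen(msg);

    ssize_t a = write(fd, msg, len);              /* glibc wrapper */
    if (a < 0)
        return -1;

    long b = syscall(SYS_write, fd, msg, len);    /* raw system call */
    if (b < 0)
        return -1;

    return (long)a + b;
}
```

In normal code the wrapper is preferable; it is shown side by side here only to expose the layer beneath it.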

The VFS is the primary interface to the underlying file systems. This component exports a set of interfaces and then abstracts them to the individual file systems, which may behave very differently from one another. Two caches exist for file system objects (inodes and dentries), which I'll define shortly. Each provides a pool of recently used file system objects.
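The abstraction the VFS provides rests on tables of function pointers that each file system fills in. The following is a simplified, hypothetical user-space model of that pattern; toy_file, toy_file_ops, toy_vfs_read, and the two toy "file systems" are invented for illustration and are not the kernel's structures. The generic read entry point never checks which file system it is dealing with; it only calls through the table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the VFS dispatch pattern (not kernel code). */
struct toy_file;

struct toy_file_ops {
    long (*read)(struct toy_file *f, char *buf, size_t n);
};

struct toy_file {
    const struct toy_file_ops *ops;  /* chosen by the file system at open */
    const char *data;                /* backing bytes (memfs only) */
    size_t size;
    size_t pos;                      /* per-open-file offset */
};

/* "memfs": serves bytes from an in-memory buffer. */
static long memfs_read(struct toy_file *f, char *buf, size_t n)
{
    size_t left = f->size - f->pos;
    if (n > left)
        n = left;
    memcpy(buf, f->data + f->pos, n);
    f->pos += n;
    return (long)n;
}

/* "zerofs": an endless source of zero bytes, like /dev/zero. */
static long zerofs_read(struct toy_file *f, char *buf, size_t n)
{
    memset(buf, 0, n);
    f->pos += n;
    return (long)n;
}

const struct toy_file_ops memfs_ops  = { .read = memfs_read };
const struct toy_file_ops zerofs_ops = { .read = zerofs_read };

/* The generic entry point: it only knows the operations table. */
long toy_vfs_read(struct toy_file *f, char *buf, size_t n)
{
    if (!f->ops || !f->ops->read)
        return -1;                   /* operation not supported */
    return f->ops->read(f, buf, n);
}
```

Adding a third file system in this model means writing new functions and a new table; toy_vfs_read does not change, which is the property the VFS relies on.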

Each individual file system implementation, such as ext2, JFS, and so on, exports a common set of interfaces that is used (and expected) by the VFS. The buffer cache buffers requests between the file systems and the block devices that they manipulate. For example, read and write requests to the underlying device drivers migrate through the buffer cache. This allows the requests to be cached there for faster access (rather than going back out to the physical device). The buffer cache is managed as a set of least recently used (LRU) lists. Note that you can use the sync command to flush the buffer cache out to the storage media (force all unwritten data out to the device drivers and, subsequently, to the storage device).
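Programs can request the same flush that the sync command performs, either for everything through sync() or for a single file through fsync(). A minimal sketch, with write_durably as a hypothetical helper name: it writes a message and calls fsync() so that the file's cached data and metadata are pushed toward the device before the function reports success.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `msg` to `path`, then fsync() so the kernel flushes this file's
 * cached data and metadata to the storage device. sync() would flush
 * all file systems instead. Returns 0 on success, -1 on error. */
int write_durably(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    if (write(fd, msg, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Per-file fsync() is usually preferred in applications because it avoids forcing out unrelated dirty data from every mounted file system.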

What is a block device?

A block device is one to and from which data moves in blocks (such as disk sectors), and which supports attributes such as buffering and random access (it is not required to read blocks sequentially, but can access any block at any time). Block devices include hard drives, CD-ROMs, and RAM disks. This is in contrast to character devices, which differ in that they do not have physically addressable media. Character devices include serial ports and tape devices, in which data is streamed character by character.
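From user space, the block/character distinction shows up in a file's mode bits. This sketch uses stat() and the POSIX S_ISBLK and S_ISCHR macros; file_kind is a hypothetical helper name, and its one-letter codes simply echo the first column of ls -l output.

```c
#define _GNU_SOURCE
#include <sys/stat.h>

/* Classify `path` with stat(): 'b' block device, 'c' character device,
 * 'd' directory, '-' regular file, '?' anything else, 0 if stat fails. */
char file_kind(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;
    if (S_ISBLK(st.st_mode)) return 'b';
    if (S_ISCHR(st.st_mode)) return 'c';
    if (S_ISDIR(st.st_mode)) return 'd';
    if (S_ISREG(st.st_mode)) return '-';
    return '?';
}
```

On a typical system, /dev/loop0 from the earlier listings would classify as a block device and /dev/null as a character device.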

That's the 20,000-foot view of the VFS and file system components. Now I'll look at the major structures that implement this subsystem.

Major structures

Linux views all file systems from the perspective of a common set of objects. These objects are the superblock, inode, dentry, and file. At the root of each file system is the superblock, which describes and maintains state for the file system. Every object that is managed within a file system (file or directory) is represented in Linux as an inode. The inode contains all the metadata to manage objects in the file system (including the operations that are possible on it). Another set of structures, called dentries, is used to translate between names and inodes; a directory cache keeps the most recently used dentries in memory. The dentry also maintains relationships between directories and files for traversing file systems. Finally, a VFS file represents an open file (it keeps state for the open file, such as the write offset).
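The difference between an inode and a VFS file can be observed from user space. In this sketch (same_inode_separate_offsets is a hypothetical helper), the same path is opened twice. fstat() reports the same device and inode number for both descriptors, yet moving one descriptor's offset leaves the other untouched, because each open() creates its own open-file object.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Open `path` twice; check that both descriptors share one inode
 * (same st_dev and st_ino) but keep independent file offsets.
 * Returns 1 if both properties hold, 0 if not, -1 on error. */
int same_inode_separate_offsets(const char *path)
{
    int a = open(path, O_RDONLY);
    int b = open(path, O_RDONLY);
    if (a < 0 || b < 0) {
        if (a >= 0) close(a);
        if (b >= 0) close(b);
        return -1;
    }

    struct stat sa, sb;
    int result = -1;
    if (fstat(a, &sa) == 0 && fstat(b, &sb) == 0) {
        int same_inode = sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
        off_t moved = lseek(a, 3, SEEK_SET);   /* move only descriptor a */
        off_t other = lseek(b, 0, SEEK_CUR);   /* b should still be at 0 */
        result = same_inode && moved == 3 && other == 0;
    }
    close(a);
    close(b);
    return result;
}
```

Descriptors produced by dup() behave differently: they share one open-file object and therefore one offset.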

Virtual file system layer

The VFS acts as the root level of the file-system interface. The VFS keeps track of the currently supported file systems, as well as those file systems that are currently mounted.

File systems can be dynamically added to or removed from Linux using a set of registration functions. The kernel keeps a list of currently supported file systems, which can be viewed from user space through the /proc file system (in /proc/filesystems). That virtual file also marks, with the keyword nodev, the file systems that do not require a block device. To add a new file system to Linux, register_filesystem is called. It takes a single argument, a reference to a file system structure (file_system_type), which defines the name of the file system, a set of attributes, and two superblock functions. A file system can also be unregistered, using unregister_filesystem.
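The registration idea can be modeled in user space. The following is a simplified, hypothetical sketch: toy_fs_type, toy_register_fs, toy_unregister_fs, and toy_find_fs are invented names, not the kernel API. It keeps file system types on a singly linked list, appends new entries, refuses a second type with an already registered name, and supports lookup and removal.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of file system registration (not kernel code). */
struct toy_fs_type {
    const char *name;
    int requires_dev;              /* 0 would be listed as "nodev" */
    struct toy_fs_type *next;
};

static struct toy_fs_type *toy_file_systems;   /* head of the list */

/* Return the registered type called `name`, or NULL. */
struct toy_fs_type *toy_find_fs(const char *name)
{
    for (struct toy_fs_type *p = toy_file_systems; p; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Append `fs` to the list; returns -1 if the name is already taken. */
int toy_register_fs(struct toy_fs_type *fs)
{
    struct toy_fs_type **pp = &toy_file_systems;
    for (; *pp; pp = &(*pp)->next)
        if (strcmp((*pp)->name, fs->name) == 0)
            return -1;
    fs->next = NULL;
    *pp = fs;                      /* link in at the tail */
    return 0;
}

/* Unlink `fs`; returns -1 if it was not registered. */
int toy_unregister_fs(struct toy_fs_type *fs)
{
    for (struct toy_fs_type **pp = &toy_file_systems; *pp; pp = &(*pp)->next) {
        if (*pp == fs) {
            *pp = fs->next;
            fs->next = NULL;
            return 0;
        }
    }
    return -1;
}
```

Walking the list with a pointer to the link field (pp) lets insertion at the tail and removal anywhere share one loop shape without special-casing the head.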

Registering a new file system places the new file system and its pertinent information onto a file_systems list (see Figure 2 and the file_system_type definition in linux/include/linux/fs.h). This list defines the file systems that can be supported. You can view this list by typing cat /proc/filesystems at the command line.


Figure 2. File systems registered with the kernel
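A program can consult the same list that cat /proc/filesystems shows. This sketch assumes the usual layout of that file: one file system per line, written as "nodev<TAB>name" for types that need no block device and "<TAB>name" otherwise. fs_listed and read_proc_filesystems are hypothetical helper names. Parsing is separated from reading so that it can be exercised on a sample string.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

/* Look for `name` in text laid out like /proc/filesystems.
 * Returns 1 if found (and sets *nodev when nodev is non-NULL), else 0. */
int fs_listed(const char *text, const char *name, int *nodev)
{
    size_t nlen = strlen(name);
    const char *line = text;
    while (*line) {
        const char *end = strchr(line, '\n');
        if (!end)
            end = line + strlen(line);          /* last line, no newline */
        const char *tab = memchr(line, '\t', (size_t)(end - line));
        if (tab) {
            const char *fs = tab + 1;
            if ((size_t)(end - fs) == nlen && memcmp(fs, name, nlen) == 0) {
                if (nodev)
                    *nodev = (tab - line) >= 5 && memcmp(line, "nodev", 5) == 0;
                return 1;
            }
        }
        line = *end ? end + 1 : end;
    }
    return 0;
}

/* Copy the live list into `buf` (NUL-terminated, truncated if `size`
 * is too small). Returns 0 on success, -1 on error. */
int read_proc_filesystems(char *buf, size_t size)
{
    if (size == 0)
        return -1;
    FILE *f = fopen("/proc/filesystems", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, size - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}
```

After read_proc_filesystems succeeds, fs_listed(buf, "ext2", &nodev) would report whether ext2 support is currently registered; a type provided by an unloaded module will not appear until the module is loaded.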

Another structure maintained in the VFS is the list of mounted file systems (see Figure 3). This list records the file systems that are currently mounted (see linux/include/linux/mount.h). Each entry links to the superblock structure, which I'll explore next.


Figure 3. The mounted file systems list
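User space sees the kernel's view of mounted file systems through /proc/self/mounts, which uses the same line format as /etc/fstab and can be read with the C library's setmntent and getmntent functions. In this sketch, mounted_fs_type is a hypothetical helper that reports the type of the file system mounted at a given directory. The table path is a parameter so the function can be tried on a sample file; the last matching entry wins, on the assumption that a later mount on the same directory hides earlier ones.

```c
#define _GNU_SOURCE
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Scan a mount table in fstab format (for example /proc/self/mounts)
 * and copy the type of the file system mounted at `dir` into `type`.
 * Returns 1 if found, 0 if not, -1 if the table cannot be opened. */
int mounted_fs_type(const char *table, const char *dir, char *type, size_t size)
{
    FILE *f = setmntent(table, "r");
    if (!f)
        return -1;

    int found = 0;
    struct mntent *m;
    while ((m = getmntent(f)) != NULL) {
        if (strcmp(m->mnt_dir, dir) == 0) {
            snprintf(type, size, "%s", m->mnt_type);
            found = 1;             /* keep scanning: the last match wins */
        }
    }
    endmntent(f);
    return found;
}
```

With the mounts from Listing 4 in place, mounted_fs_type("/proc/self/mounts", "/mnt/point2", buf, sizeof buf) would be expected to report ext2.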

Superblock

The superblock is a structure that represents a file system. It includes the necessary information to manage the file system during operation: the file system name (such as ext2), the size of the file system and its state, a reference to the block device, and metadata information (such as free lists and so on). The superblock is typically stored on the storage medium but can be created in real time if one doesn't exist. You can find the superblock structure (see Figure 4) in ./linux/include/linux/fs.h.


Figure 4. The superblock structure and inode operations
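The on-medium superblock of the ext2 image built in Listing 2 can be read directly. This sketch relies on well-known facts of the ext2 on-disk format (shared by ext3 and ext4 for these fields): the superblock starts 1024 bytes into the volume, fields are little-endian, s_inodes_count is at offset 0, s_blocks_count at 4, s_log_block_size at 24 (block size = 1024 << value), and the 16-bit s_magic at offset 56 holds 0xEF53. The struct and function names are hypothetical, and only these few fields are decoded.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>

/* A few decoded fields of an ext2 superblock (hypothetical struct). */
struct ext2_sb_info {
    uint32_t inodes_count;   /* s_inodes_count, offset 0 */
    uint32_t blocks_count;   /* s_blocks_count, offset 4 */
    uint32_t block_size;     /* 1024 << s_log_block_size (offset 24) */
};

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 0 and fills `out` if `path` holds an ext2-style superblock,
 * -1 on I/O error or if the magic number does not match. */
int read_ext2_superblock(const char *path, struct ext2_sb_info *out)
{
    unsigned char sb[1024];
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    int ok = fseek(f, 1024, SEEK_SET) == 0 &&
             fread(sb, 1, sizeof sb, f) == sizeof sb;
    fclose(f);
    if (!ok)
        return -1;

    uint16_t magic = (uint16_t)(sb[56] | sb[57] << 8);
    if (magic != 0xEF53)
        return -1;

    out->inodes_count = le32(sb + 0);
    out->blocks_count = le32(sb + 4);
    out->block_size   = 1024u << le32(sb + 24);
    return 0;
}
```

Run against file.img from Listing 2, read_ext2_superblock would be expected to report the 2512 inodes, 10000 blocks, and 1024-byte block size that mke2fs printed.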

One important element of the superblock is a definition of the superblock operations. This structure defines the set of functions for managing inodes within the file system. For example, inodes can be allocated with alloc_inode or deleted with destroy_inode. You can read and write inodes with read_inode and write_inode or sync the file system with sync_fs. You can find the super_operations structure in ./linux/include/linux/fs.h. Each file system provides its own inode methods, which implement the operations and provide the common abstraction to the VFS layer.
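Why alloc_inode and destroy_inode belong to the file system becomes clearer with a model. The following is a hypothetical user-space sketch in which only the method names alloc_inode and destroy_inode come from the text; every type and function name is invented. The file system's private inode embeds the generic inode, and container_of-style pointer arithmetic recovers the outer structure. This loosely mirrors, in much simplified form, how ext2 embeds a struct inode inside its own ext2_inode_info.

```c
#include <stddef.h>
#include <stdlib.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_inode { unsigned long ino; };      /* generic part */

struct toy_super_block;
struct toy_super_ops {
    struct toy_inode *(*alloc_inode)(struct toy_super_block *sb);
    void (*destroy_inode)(struct toy_inode *inode);
};

struct toy_super_block {
    const struct toy_super_ops *s_op;
    unsigned long next_ino;
};

/* Private inode of a hypothetical "myfs": its own data plus the
 * embedded generic inode (deliberately not the first member). */
struct myfs_inode {
    unsigned block_map[4];
    struct toy_inode vfs_inode;
};

static struct toy_inode *myfs_alloc_inode(struct toy_super_block *sb)
{
    struct myfs_inode *mi = calloc(1, sizeof *mi);
    if (!mi)
        return NULL;
    mi->vfs_inode.ino = sb->next_ino++;
    return &mi->vfs_inode;                    /* hand out the generic part */
}

static void myfs_destroy_inode(struct toy_inode *inode)
{
    free(toy_container_of(inode, struct myfs_inode, vfs_inode));
}

const struct toy_super_ops myfs_super_ops = {
    .alloc_inode   = myfs_alloc_inode,
    .destroy_inode = myfs_destroy_inode,
};

/* Generic-layer helpers: they only go through sb->s_op. */
struct toy_inode *toy_new_inode(struct toy_super_block *sb)
{
    return sb->s_op->alloc_inode(sb);
}

void toy_free_inode(struct toy_super_block *sb, struct toy_inode *inode)
{
    sb->s_op->destroy_inode(inode);
}

/* File-system-side accessor for the private part of an inode. */
unsigned *myfs_block_map(struct toy_inode *inode)
{
    return toy_container_of(inode, struct myfs_inode, vfs_inode)->block_map;
}
```

Because the file system allocates the whole myfs_inode, the generic layer can pass around plain toy_inode pointers while the file system still reaches its private fields without a separate lookup.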

inode and dentry

The inode represents an object in the file system with a unique identifier. The individual file systems provide methods for translating a filename into a unique inode identifier and then to an inode reference. A portion of the inode structure is shown in Figure 5 along with a couple of the related structures. Note in particular the inode_operations and file_operations. Each of these structures refers to the individual operations that may be performed on the inode. For example, inode_operations define those operations that operate directly on the inode, and file_operations refer to those methods related to files and directories (the standard system calls).


Figure 5. The inode structure and its associated operations
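The translation from a filename to an inode identifier is visible through stat(), which returns the device and inode number (st_dev, st_ino) that a path resolves to. In this sketch, path_to_inode and same_object are hypothetical helpers; two names denote the same file system object, as hard links do, exactly when both numbers match.

```c
#define _GNU_SOURCE
#include <sys/stat.h>
#include <sys/types.h>

/* Resolve `path` to the (device, inode number) pair identifying the
 * underlying object. Returns 0 on success, -1 on error. */
int path_to_inode(const char *path, dev_t *dev, ino_t *ino)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    *dev = st.st_dev;
    *ino = st.st_ino;
    return 0;
}

/* Returns 1 if both paths name the same inode (for example, hard
 * links), 0 if they do not, and -1 if either lookup fails. */
int same_object(const char *a, const char *b)
{
    dev_t da, db;
    ino_t ia, ib;
    if (path_to_inode(a, &da, &ia) != 0 || path_to_inode(b, &db, &ib) != 0)
        return -1;
    return da == db && ia == ib;
}
```

Comparing the device as well as the inode number matters because inode numbers are only unique within a single file system.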

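The most recently used inodes and dentries are kept in the inode cache and the directory (dentry) cache, respectively. The two caches are linked: a dentry that names an existing file points to its inode in the inode cache, and a single inode can be referenced by more than one dentry (for example, when a file has several hard links). You can find the inode structure defined in ./linux/include/linux/fs.h and the dentry structure in ./linux/include/linux/dcache.h.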

Buffer cache

Except for the individual file system implementations (which can be found at ./linux/fs), the bottom of the file system layer is the buffer cache. This element keeps track of read and write requests from the individual file system implementations and the physical devices (through the device drivers). For efficiency, Linux maintains a cache of the requests to avoid having to go back out to the physical device for all requests. Instead, the most recently used buffers (pages) are cached here and can be quickly provided back to the individual file systems.
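The least recently used policy behind the buffer cache can be illustrated with a deliberately tiny model. This is a hypothetical sketch, not how the kernel implements its caches (those use lists and hash tables): a fixed number of slots each remember which block they hold and when it was last touched. A hit refreshes the timestamp; a miss fills an empty slot or evicts the slot used longest ago.

```c
#include <stdint.h>

/* Hypothetical three-slot block cache with LRU replacement. */
#define CACHE_SLOTS 3

struct block_cache {
    long block[CACHE_SLOTS];         /* cached block number, -1 if empty */
    uint64_t last_use[CACHE_SLOTS];  /* logical time of last access */
    uint64_t clock;
    unsigned hits, misses;
};

void cache_init(struct block_cache *c)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        c->block[i] = -1;
        c->last_use[i] = 0;
    }
    c->clock = 0;
    c->hits = c->misses = 0;
}

/* Access block `blk`. Returns 1 on a hit, 0 on a miss; on a miss the
 * block is (notionally) read from the device into an empty slot or
 * into the least recently used slot. */
int cache_access(struct block_cache *c, long blk)
{
    int victim = 0;
    c->clock++;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (c->block[i] == blk) {
            c->last_use[i] = c->clock;
            c->hits++;
            return 1;
        }
        /* Prefer an empty slot; otherwise track the oldest one. */
        if (c->block[i] == -1 ||
            (c->block[victim] != -1 && c->last_use[i] < c->last_use[victim]))
            victim = i;
    }
    c->block[victim] = blk;
    c->last_use[victim] = c->clock;
    c->misses++;
    return 0;
}
```

With three slots, the access sequence 1, 2, 3, 1, 4 evicts block 2 rather than block 1, because block 1 was touched more recently; that recency bias is what keeps frequently reused blocks away from the physical device.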

Interesting file systems

This article spent no time exploring the individual file systems that are available within Linux, but they're worth noting here, at least in passing. Linux supports a wide range of file systems, including older file systems such as MINIX, MS-DOS, and ext2. Linux also supports newer journaling file systems such as ext3, JFS, and ReiserFS. Additionally, Linux supports cryptographic file systems such as CFS and virtual file systems such as /proc.

One final file system worth noting is the Filesystem in Userspace, or FUSE. This is an interesting project that allows you to route file system requests through the VFS back into user space. So if you've ever toyed with the idea of creating your own file system, this is a great way to start.

Summary

While the file system implementation is anything but trivial, it's a great example of a scalable and extensible architecture. The file system architecture has evolved over the years but has successfully supported many different types of file systems and many types of target storage devices. Given its plug-in based architecture with multiple levels of function indirection, it will be interesting to watch how the Linux file system evolves in the near future.



About the author

M. Tim Jones is an embedded software architect and the author of GNU/Linux Application Programming, AI Application Programming, and BSD Sockets Programming from a Multilanguage Perspective. His engineering background ranges from the development of kernels for geosynchronous spacecraft to embedded systems architecture and networking protocols development. Tim is a Consultant Engineer for Emulex Corp. in Longmont, Colorado.