Created by Chris Cox on August 19, 2009, at 03:42 AM

(but were afraid to ask)
At the heart of almost any operating system is a filesystem. That is, a place to keep files containing data. To organize such files, most filesystems support the idea of a folder or directory and those are usually hierarchical, that is, one folder can contain other folders and so on.

Inside of Unix and Linux, files are very important. They not only serve as places for data and programs and such, but also serve as access points for devices and sometimes provide views and access into the deep recesses of the kernel itself!

A Folder is a File

To understand a folder you need to know that a file is just data... it has no name. Well, it does have an "address" if you will, but file names are assignments that are made inside of a special file called a directory. Thus a directory is a special file containing a list of file names, each paired with the "address" of the data associated with that name. Directory files also have "addresses" and thus can be named inside of another directory file.

The Chicken and the Egg

If directories contain files and other directories, how does this all get started? A filesystem is mounted to an empty directory called a mount point from which all folders and files will be found for that filesystem. But even a mount point, which is an empty directory, has to go somewhere... where? The answer is the "root". One filesystem has to be mounted at boot automatically. This filesystem is called the "root" and has a special pseudo mount point called "/". The creation of this root filesystem is usually done at initial installation of the OS. It is essential to the proper operation of the system. That does not mean it has to be stored on a typical hard drive though. Just note that it is important to have, however it is made.

What's Your Address?

The abstraction to get at the file data we called an "address" earlier. Technically, this is called an "i-node" or inode for short. An inode stores information about the file that isn't part of the actual data content of the file. This includes things like permissions, ownership, etc. Within a mounted filesystem, these numeric inode "addresses" are unique. From them, the OS can get to the file content. However, inode numbers are not guaranteed to be unique across different filesystems, so the context for an inode is the filesystem in which it resides. Think of it like our own communities. A local area might consist of many neighborhoods, towns, villages or cities. My community may have a First St. and someone not terribly far away may also live on First St. We may both have the same numeric address on First St., yet our addresses are not the same location. And so it is for filesystems and files. An inode for a file is unique within the context of its filesystem.
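To make the "street address plus town" idea concrete, here is a small sketch that prints both halves of a file's identity. It assumes GNU coreutils stat (the usual one on Linux), where %d is the device number of the containing filesystem and %i is the inode number. The scratch file is made up purely for illustration.

```shell
# Create a scratch file (hypothetical, for illustration only).
scratch=$(mktemp)

# %d = device number of the filesystem, %i = inode number (GNU stat).
dev=$(stat -c '%d' "$scratch")
ino=$(stat -c '%i' "$scratch")

# The pair (device, inode) identifies the file; the inode alone does not.
echo "device=$dev inode=$ino"

rm -f "$scratch"
```

Two files on different filesystems may report the same inode value, but they will not share the same device and inode pair.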

Directories contain a mapping of names to inodes. A directory has its own inode number and apart from the special root directory, a directory inode must have a name-to-inode mapping in a parent directory in order to be found. Anytime a file or directory loses all connection to the directory hierarchy, that is, the inode has no name and can no longer be found, that inode is said to be an orphan inode. Orphans have no parent structure, and therefore cannot be found. Normally, you will not see orphaned inodes. Orphans are usually created as a part of an interrupted and failed operation where an inode never was associated with a name mapping inside of a directory. Many filesystems have transaction journaling that helps mitigate the formation of orphaned files. With that said, if a filesystem were to get corrupted (always possible), filesystem repair might end up identifying orphaned inodes. These orphans end up in a special directory at the root of a mounted filesystem called (appropriately) lost+found.

Oh What a Tangled Web We Weave...

To better get a handle on files and directories and these things called inodes, it's wise to examine them in a context. The most familiar directory for users is their own home directory. So let's start there.

No Place Like Home

Just like a booting system needs a root filesystem on which to hang all files, users start out in a home directory. The home directory can be any directory inside of any filesystem. While typically found as a subdirectory of /home, a user's home directory could be anywhere. The mapping of a username to their home directory is found in the file /etc/passwd. The /etc/passwd file is a text file with fields delimited by colons. Usernames are in the first field and their corresponding home directory is in the 6th field. With that said, /etc/passwd is just one possible source of user information. Because of that, we can use a utility called getent, which in general searches all configured sources of username information.

 $ getent passwd ccox
 ccox:x:1000:100:Chris Cox:/localhome/ccox:/bin/bash
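Since the record is just colon-delimited text, the fields can be picked apart with cut. This is a minimal sketch that uses the sample record above as a literal string, so it does not depend on any account existing on your system. The variable names are made up for illustration.

```shell
# Sample record taken from the getent output above.
entry='ccox:x:1000:100:Chris Cox:/localhome/ccox:/bin/bash'

# Fields are colon-delimited: 1 = username, 6 = home directory, 7 = login shell.
user=$(printf '%s\n' "$entry" | cut -d: -f1)
home_dir=$(printf '%s\n' "$entry" | cut -d: -f6)
login_shell=$(printf '%s\n' "$entry" | cut -d: -f7)

echo "$user lives in $home_dir and logs in with $login_shell"
```

On a live system the same cut can be fed from getent, for example getent passwd "$(id -un)" | cut -d: -f6.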

Where's my password?

"I don't see my password in there?" Nope. You don't. The passwd database used to contain a hashed (DES) version of your password. The problem is that the passwd data (the stuff other than the password) needs to be readable by anyone (otherwise, you couldn't know about users at all). When people found that computing DES hashes was rather quick, it became pretty easy to perform a brute force attack on all hashed passwords in the passwd database. Therefore, in general, the hashes have been moved out of passwd and placed in a protected area (non-readable, except by an admin) keyed by the username (e.g. /etc/shadow).

Linux and some newer Unix distributions use other hash types like MD5 and Blowfish to make brute force attacks much harder even when the hashes are made visible. The hash types available in Linux depend upon what was compiled into glibc. MD5 is almost always present. Distributions like openSUSE include Blowfish as well.

Your home directory is also stored in your login shell in the variable called HOME.

 $ echo $HOME
 /localhome/ccox

You can also refer to any user's home directory using ~username. If you want your own home directory, just use ~.

 $ echo ~pulse
 /var/lib/pulseaudio
 $ echo ~
 /localhome/ccox

It is also good to know that if you do not supply a directory to the change directory, cd, command, then you will go to your home directory.
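Both ~ and a bare cd take their answer from the HOME variable. The sketch below shows this by pointing HOME at a throwaway directory inside a subshell, so your real login session is not disturbed. The directory and variable names are hypothetical.

```shell
# Use a throwaway directory as HOME so the demo does not touch the real account.
demo_home=$(mktemp -d)

# Inside a subshell: start somewhere else, then run cd with no argument.
landed=$(
    HOME=$demo_home
    cd / || exit 1
    cd || exit 1
    pwd -P
)

# A bare ~ expands to whatever HOME currently holds.
tilde=$(HOME=$demo_home; echo ~)

# Physical path of the throwaway directory, for comparison.
expected=$(cd "$demo_home" && pwd -P)

echo "cd landed in:  $landed"
echo "~ expanded to: $tilde"

rmdir "$demo_home"
```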

Home is Where the Heart Is

Let's create a new directory in our home called "heart". Inside of that directory, create a file called "aorta" and make two directories called "atrium" and "ventricle". Go into both of those subdirectories and create files called "left" and "right".

 $ cd  # Remember this will take us to our home directory.
 $ mkdir heart
 $ cd heart
 $ touch aorta
 $ mkdir atrium ventricle
 $ cd atrium
 $ touch left right
 $ cd ../ventricle
 $ touch left right
 $ cd ~/heart
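If you would rather not hop between directories, the same tree can be built with full paths. This is only a sketch: it works in a scratch directory made by mktemp (a stand-in for your home directory) and uses mkdir -p, which creates any missing parent directories along the way.

```shell
# Work in a scratch directory (hypothetical) instead of the real home directory.
base=$(mktemp -d)

# -p creates missing parents, so heart itself is made along the way.
mkdir -p "$base/heart/atrium" "$base/heart/ventricle"
touch "$base/heart/aorta"
touch "$base/heart/atrium/left" "$base/heart/atrium/right"
touch "$base/heart/ventricle/left" "$base/heart/ventricle/right"

# List everything that was created, one path per line.
tree_listing=$(cd "$base" && find heart | sort)
echo "$tree_listing"

rm -r "$base"
```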

Tying Things Together

Directories are the key to not only associating names to inodes but they serve as the canvas to create the whole structure of the filesystem we navigate as users. The current directory is known as ".", called "dot". Every directory has a parent directory to which it belongs known as "..", called "dot-dot". Hidden filenames in Unix/Linux start with ".", so normally you cannot see these directory names. Use the -a option to ls to see these hidden files. To see the inode numbers associated with the files use the -i option. If the number of files in the directory is small, using the -1 (dash one) option will place the output in a single column. In order to distinguish file types (file vs. directory/ for example) add the -F option.

 $ cd
 $ cd heart
 $ ls -ia1F      # That 1 is a ONE!!
 2624 ./
    4 ../
 4842 aorta
 4852 atrium/
 4853 ventricle/
 $ cd atrium
 $ ls -ia1F
 4852 ./
 2624 ../
 4856 left
 4861 right

Notice that atrium was listed with inode number 4852 before we traversed into it, and that "dot" was also 4852 once we were inside that directory. The parent directory, "dot-dot", is 2624, which we can see is indeed the value of "dot" when we were in the heart directory.
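The same relationship can be checked without reading numbers off a listing. The sketch below assumes GNU stat, where %i prints the inode number, and builds a small heart/atrium tree in a scratch directory. Your inode numbers will differ from the ones shown in this article, but the pairs should match each other.

```shell
# Scratch copy of the directory layout (hypothetical location).
base=$(mktemp -d)
mkdir -p "$base/heart/atrium"

# Inode of atrium as named in heart, and of "." as seen from inside atrium.
atrium_from_parent=$(stat -c '%i' "$base/heart/atrium")
atrium_dot=$(cd "$base/heart/atrium" && stat -c '%i' .)

# Inode of heart, and of ".." as seen from inside atrium.
heart_ino=$(stat -c '%i' "$base/heart")
atrium_dotdot=$(cd "$base/heart/atrium" && stat -c '%i' ..)

echo "atrium: $atrium_from_parent   atrium/.:  $atrium_dot"
echo "heart:  $heart_ino   atrium/..: $atrium_dotdot"

rm -r "$base"
```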

The Key to Untying Things

So far we have seen an association of names to inodes. Another value that is kept for every file and directory is a counter called a "link count". It is incremented each time a name is associated with the inode, and when files are "removed" the link count is decremented. If the link count goes down to zero, then the file is not referenced anywhere else and the file is removed. Referenced anywhere else? What does that mean?

One way to examine this is to use our sample directory scheme. You can view the link count by using the long listing option -l (lower case L) to the ls command.

 $ cd
 $ cd heart
 $ ls -ia1Fl     # First -1 is a ONE, second -l is a lowercase L
 total 3
 2624 drwxr-xr-x  4 ccox users  128 2009-08-22 19:29 ./
    4 drwxr-xr-x 55 ccox users 3512 2009-08-22 19:28 ../
 4842 -rw-r--r--  1 ccox users    0 2009-08-22 19:28 aorta
 4852 drwxr-xr-x  2 ccox users   96 2009-08-22 19:29 atrium/
 4853 drwxr-xr-x  2 ccox users   96 2009-08-22 19:29 ventricle/
 $ cd atrium
 $ ls -ia1Fl
 total 0
 4852 drwxr-xr-x 2 ccox users  96 2009-08-22 19:29 ./
 2624 drwxr-xr-x 4 ccox users 128 2009-08-22 19:29 ../
 4856 -rw-r--r-- 1 ccox users   0 2009-08-22 19:29 left
 4861 -rw-r--r-- 1 ccox users   0 2009-08-22 19:29 right

The third field of the long listing shows the link count. For example, atrium shows a link count of 2. Directories will always have a link count of at least two. Why? One for the directory entry in the parent and one for the "dot" entry in the directory itself. You can think of links this way: the link count on a file represents the total number of references to that inode inside of directories. The file aorta shows a link count of 1 because the only reference to it is inside of the heart directory. The same can be seen for left and right; normal files start out with a link count of 1 when created.
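If you only want the link count and not the whole long listing, stat can print it directly. This sketch assumes GNU stat, where %h is the hard link count, and uses a scratch directory. The directory value of 2 is what traditional Linux filesystems such as ext4 report; treat it as an expectation rather than a guarantee on every filesystem.

```shell
# Scratch area with one plain file and one empty directory (hypothetical names).
base=$(mktemp -d)
touch "$base/aorta"
mkdir "$base/atrium"

# %h prints the hard link count (GNU stat).
file_links=$(stat -c '%h' "$base/aorta")
dir_links=$(stat -c '%h' "$base/atrium")

echo "aorta:  $file_links"
echo "atrium: $dir_links"

rm -r "$base"
```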

The rm command is used to "remove" files (and directories if we provide some options). What actually happens is a decrement of the link count and if the link count goes to zero, THEN, the space is reclaimed by the filesystem.

Underneath the rm Command

Underneath user commands are programs that make function calls inside of code to perform operations. Removing a file is done with a system call named unlink (try: man 2 unlink). Even the name of the system function gives us insight into what a remove actually does.
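GNU coreutils also ships a small command named unlink that removes exactly one name through the unlink system call, with none of the extra options rm has. A minimal sketch, assuming GNU coreutils and a scratch file made by mktemp:

```shell
# A scratch file with a single name, so its link count starts at 1.
scratch=$(mktemp)
links_before=$(stat -c '%h' "$scratch")

# Remove the one and only name; the count drops to zero and the file is gone.
unlink "$scratch"

if [ -e "$scratch" ]; then still_there=yes; else still_there=no; fi
echo "links before: $links_before, still there: $still_there"
```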

It is NOT possible to create cross-directory linkages for directories. That is, you cannot create additional references to a directory inside of other directories. What makes the link count go up on directories is the fact that additional directories are created directly underneath a directory. Consider the heart directory link count. It shows a link count of 4. We know that there is a mandatory 2 for the parent link and the directory itself. The additional 2 come from the parent links ("dot-dot") inside of the sub-directories atrium and ventricle. On traditional Unix filesystems, the number of sub-directories for any directory can be determined by looking at the link count of that directory and subtracting 2. (Some filesystems, Btrfs for example, do not keep directory link counts this way.)
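Here is the subtract-two arithmetic as a sketch, next to an actual count of the sub-directories for comparison. It assumes GNU stat and find, and a scratch directory on a filesystem that keeps directory link counts the traditional way (ext4 and tmpfs do). On other filesystems the two numbers may disagree.

```shell
# Scratch copy of heart with its two sub-directories (hypothetical location).
base=$(mktemp -d)
mkdir -p "$base/heart/atrium" "$base/heart/ventricle"

# Link count of heart, minus 2 for its parent's entry and its own "dot".
links=$(stat -c '%h' "$base/heart")
by_links=$((links - 2))

# Count the sub-directories directly for comparison.
counted=$(find "$base/heart" -mindepth 1 -maxdepth 1 -type d | wc -l)

echo "link count: $links, by link count: $by_links, counted: $counted"

rm -r "$base"
```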

Creating Links

There are actually two kinds of "links". The true link, called a hard link, is what we have examined so far, so let's continue with that. As noted already, you cannot create links to directories, but the system maintains directory linkages to enable you to traverse directories. Without those links, whole directory trees could become lost... and oddly enough, files that "exist" but cannot be reached are called orphans.

To create a link to an existing inode, you can use the ln command.

 $ cd
 $ cd heart
 $ ln aorta artery
 $ ls -ia1Fl     # First -1 is a ONE, second -l is a lowercase L
 total 3
 2624 drwxr-xr-x  4 ccox users  128 2009-08-22 19:29 ./
    4 drwxr-xr-x 55 ccox users 3512 2009-08-22 19:28 ../
 4842 -rw-r--r--  2 ccox users    0 2009-08-22 19:28 aorta
 4842 -rw-r--r--  2 ccox users    0 2009-08-22 19:28 artery
 4852 drwxr-xr-x  2 ccox users   96 2009-08-22 19:29 atrium/
 4853 drwxr-xr-x  2 ccox users   96 2009-08-22 19:29 ventricle/

This creates a hard link in the same directory. The file data can be manipulated using either entry. Notice that both entries are now showing a link count of 2. We would have to remove both the artery entry and aorta entry in order to actually free up any space occupied by the file (which is minuscule since our file is empty).
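To see that both names really do lead to the same data, the sketch below writes through one name, reads through the other, and then removes the original name. It works in a scratch directory and assumes GNU stat for the link count. The file contents are made up for illustration.

```shell
# Scratch directory with a file and a second name for it.
base=$(mktemp -d)
echo "oxygenated" > "$base/aorta"
ln "$base/aorta" "$base/artery"

# Append through the second name; the first name sees the change.
echo "blood" >> "$base/artery"
via_aorta=$(cat "$base/aorta")

# Remove the original name; the data survives under the remaining one.
rm "$base/aorta"
links_left=$(stat -c '%h' "$base/artery")
via_artery=$(cat "$base/artery")

echo "links left: $links_left"
echo "$via_artery"

rm -r "$base"
```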

File links like this can be created throughout a given filesystem which allows for space saving and snapshot-like functions. For example:

 $ cd
 $ cp -alR heart new-heart # lower case L option

The cp command copies the heart directory recursively (-R), preserves permissions and time stamps (-a) and creates links (-l) to the individual files instead of copying them. Thus new-heart consists of new directories, but all files underneath are hard linked to the corresponding files from heart.
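One way to convince yourself of this is to compare inode numbers on both sides of the copy. The sketch assumes GNU cp (for the -l option) and GNU stat, and runs in a scratch directory so that the original and the copy are on the same filesystem, which hard links require.

```shell
# Small scratch tree to copy (hypothetical location).
base=$(mktemp -d)
mkdir -p "$base/heart/atrium"
touch "$base/heart/aorta" "$base/heart/atrium/left"

# -a: archive mode, -l: hard link files instead of copying them (GNU cp).
cp -al "$base/heart" "$base/new-heart"

# Files should share an inode; directories should not.
orig_file=$(stat -c '%i' "$base/heart/atrium/left")
copy_file=$(stat -c '%i' "$base/new-heart/atrium/left")
orig_dir=$(stat -c '%i' "$base/heart/atrium")
copy_dir=$(stat -c '%i' "$base/new-heart/atrium")
file_links=$(stat -c '%h' "$base/heart/atrium/left")

echo "file inodes: $orig_file $copy_file (link count $file_links)"
echo "dir inodes:  $orig_dir $copy_dir"

rm -r "$base"
```

Keep in mind that editing a linked file in place changes it in both trees, since there is only one copy of the data.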

The Missing Link?

So which one is the REAL file? If you haven't guessed, there is no difference between the original file in its original location and a hard link you created anywhere else in the filesystem. Which one is real? All references to the inode are REAL. No difference.
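Most shells can ask this question directly: the -ef test is true when two names refer to the same device and inode. It is not part of the POSIX standard, but bash, dash and the coreutils test command all support it. A small sketch in a scratch directory, with vein added as a hypothetical unrelated file for contrast:

```shell
# Two names for one inode, plus one separate file.
base=$(mktemp -d)
touch "$base/aorta"
ln "$base/aorta" "$base/artery"
touch "$base/vein"

# -ef: true when both names refer to the same device and inode.
if [ "$base/aorta" -ef "$base/artery" ]; then same_link=yes; else same_link=no; fi
if [ "$base/aorta" -ef "$base/vein" ]; then same_other=yes; else same_other=no; fi

echo "aorta and artery are the same file: $same_link"
echo "aorta and vein are the same file:   $same_other"

rm -r "$base"
```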


Page last modified on September 18, 2009, at 03:11 AM