Symbolic Link

(Term from the UnixOperatingSystem, also available under some WindowsOperatingSystems. See SymbolicLinkOnWindows, WeakPointer)

Unix filesystems have two ways of referring to files: HardLinks and SymbolicLinks.

In the approach invented by Unix, HardLinks are essential: they are the means by which a directory associates the name of an object with that object itself, allowing it to be referenced by its parent/containing directory. That means that every file and directory below the root has a linkcount of at least 1. In addition, every directory has an entry named "." referring to itself, raising the minimum number of links to 2. Every subdirectory created in that same directory additionally has an entry named "..", referring to its parent directory, adding one to the parent's link count for each subdirectory.

Because of this approach to "." and "..", in general more than one directory entry can point to the same object, not just "." and "..". Users often consider these additional, optional links to be the only kind of hard link, forgetting about "." and ".." as hard links.

An object that has no remaining links to it is unreachable garbage that can be recycled to liberate disk space.

Filesystems that follow the traditional Unix design use reference counting to manage this recycling of objects that have no remaining hard links. Each object has a link count which drops to zero when the last directory entry which refers to that object is removed. If you shut down your machine while a file is being deleted, it's possible for unreachable garbage to remain, because the directory updating operation finished, but the liberation of the object did not. A file system checking program (often called fsck) can hunt those objects down and either remove them, or resurrect them in a special directory like lost+found. A journaling filesystem shouldn't run into this problem; it should have a logged record of the incomplete deletion operation and simply complete it on reboot.

With hard links out of the way, we can now discuss the real topic of this page, symbolic links.

A symbolic link is a special object which points to another file by relative or absolute pathname. Most operations which access the symbolic link are re-routed to the object to which it points; if that object doesn't exist, they fail.

Traditionally, a symbolic link is represented as a special kind of file, and the pathname text is stored as the contents of that file. A special attribute tells the operating system that the object is a symbolic link rather than a regular file. The operating system's lookup algorithm which navigates pathname references takes this into account; it reads the contents of symbolic links, and incorporates them into its navigation. This process is similar to macro-expansion. You can think of a pathname as an expression which refers to an object. A pathname that refers to a symbolic link object is a macro which replaces the path by the contents of the symbolic link, and the search continues; with the special provision that if the path is relative, the search continues from the last visited search point, which is the parent directory of the symbolic link.

Symbolic links are weak. Firstly, they can point to a nonexistent object. Such symbolic links are broken, just like out-of-date or invalid URL references in a web page. Renaming or renaming the target of a symbolic link does nothing to update that link. Symbolic links can also create cycles, which cause the lookup algorithm to keep revisiting the same symbolic link, leading to an infinite loop. Of course, there is defensive programming against this in any decent operating system, and a dedicated POSIX error code ELOOP to signal the situation when too many symbolic links have been encountered in resolving a path.

Secondly, symbolic links that contain relative pathnames change their meaning when they are moved. A symbolic link that points to the path ".." always refers to its parent directory. If you move that symbolic link somewhere, it refers to its new parent. This is advantageous: when you move an entire file tree, its internal symbolic links remain consistent, so long as they are relative. But it's a disadvantage too: any relative links outside of that tree will break.

(So lemme see if I've got this straight. The fact that symbolic links change their meaning is an "advantage" because when you're forced to move the WHOLE context of the symbolic link, it retains its correct meaning. And really, having your hand amputated is an advantage too because you can learn to use a hook.)

The POSIX command to create a hard link is:

 ln ./myfile ../otherdir/samefileasmyfile

When you do this, there is no difference between the two directory entries: when you change the file using one, it's the same as changing it using the other. Neither one is special. When you delete one link, the other one remains; when the last link is deleted, the file is deleted.

The command for creating a symbolic link is like this:

 ln -s ./myfile ../otherdir/symlinktomyfile

If the symlink is removed, the file is unaffected. If the last hard link is removed, the symlink remains, but cannot access the (now-deleted) file.

Note a subtlety in the above command. A symbolic link called symlinktomyfile is being created in the directory ../otherdir. That symlink will contain the path ./myfile. It will not actually refer to the file myfile in the current directory, but to a file called myfile in ../otherdir/myfile!

The ln command is not designed to be smart enough to do a path translation; it's just a thin wrapper for the operating system interface for creating a link with the specified contents, and people expect it to work that way.

The original author of this page probably did not intend this, but I'm keeping the example because it illustrates and underscores the shallow simplicity of symbolic links.

If you want a symbolic link inside ../otherdir to refer to this myfile over here, it will have to be something like:

  ln -s ../thisdir/myfile ../otherdir/myfile-symlink

ln -s ../thisdir/myfile ../otherdir # the myfile name will be used

Very often what is desired is actually

  ln -s `pwd`/../thisdir/myfile `pwd`/../otherdir

or equivalently using newer shell syntax

  ln -s $(pwd)/../thisdir/myfile $(pwd)/../otherdir

...to avoid ambiguity. Even better, to make sure the ".." doesn't get included in the symbolic link:

  ln -s $(cd ..; pwd)/thisdir/myfile $(cd ..; pwd)/otherdir

(Page heavily revised by KazKylheku)

[ rant moved to SymbolicLinkRant ]

Symbolic links are just delayed binding of the link name. Instead of evaluating the object which the link will refer to at link creation time, it will be evaluated at lookup time.

SymbolicLinks are also involved in many well-known Unix exploits. A longstanding way of cracking into a Unix system is to create a symlink to /etc/passwd (the password file, at least in the old days - if this file is nonexistent, anyone can become root), and trick some poor unsuspecting SetUserId? program (which runs with the privileges of whoever installs it - usually root - rather than the user who invokes it) to trunc() or otherwise clobber the file. As most file operations on a symlink (except renames and deletes, see above) affect the pointed-to file and not the link itself; this could result in /etc/passwd (or some other system file) from being compromised.

And this is different than hard links how? Especially considering the rant above that symlinks shouldn't exist?

It's probably worth mentioning that symbolic links can cross file systems and hard links cannot.

That's only because systems programmers are too dumb to come up with hard links that do cross filesystems. Or bidirectional hard links. Or directional hard links.

[Hard links that cross filesystems would look more and more like symbolic links. Perhaps a more elegant implementation than a pathname-stashed-in-a-file-with-a-particular-bit-in-the-inode-set, but the semantics of a Unix hard link, which make lots of assumptions about the physical layout of the device, clearly don't apply. When you add in the fact that a cross-volume link might point to a device which isn't mounted or otherwise isn't available (a remote filesystem across a network which is down), failure semantics must be thrown in as well.]

[A counter-example--even one of research quality only--might be a good thing, though.]

How would a hard link cross file system boundaries without becoming a symbolic link?

Easy, it's a chain of hard links. The first link in the chain points to the remote object component and has the name of the link to follow in it. The link in the remote system is either a hard link pointing to the target or it's another link that points to a remote object component and has ... blah blah. And of course, this is bidirectional.

The trick is that since these links are all in the root>>portals of the systems in question, they're inaccessible to the casual user even if they "cross" user-space. As a consequence, they can't be broken by a casual user but must be deleted properly by someone with legitimate permission to do so, or broken by the owner of one of the systems in the chain. IOW, these are NOT symbolic links!

Systems programmers are also too dumb to come up with sensible PermissionFlags, or history, or fine-grained history, or a whole bunch of other stuff which is ALL absolutely essential for user's sanity and peace of mind.

[Now we know why Richard is Richard. He gets no peace of mind, because the OS he runs doesn't provide the aforementioned laundry list of features! Until BlueAbyss is complete, he won't have peace of mind. And clearly, until he has peace of mind, he won't be able to work on BlueAbyss!]

[A classic case of DeadLock.]