What the 2>&1!

A while ago, I made a transition from the world of build automation and installation scripting (with a little bit of C mixed in) to the never ending story that is web development. I must admit that early on I was really wondering where I'd fit in without my trusty terminal window, tmux, and vim. But I assimilated and grew a healthy respect for the Sublime, and a genuine appreciation for the occasional Eclipse. I really learned to get my mouse on and it was fun.

But lately I'm learning that the world of DevOps is more than just coding in a comfy editor and throwing that code over the proverbial fence to release engineering and operations. It's a lot to do with owning that code 'til production. Part of this ownership involves composing really useful tools that have a command line interface and do one thing well: my former bread and butter. Recently, I decided to string a few of these commands together by writing a Bourne shell script, which I just assumed would be immediately maintainable by my team. I was more than a little mistaken and was immediately bombarded by a bunch of really good questions.

Here's one such question:

What does 2>&1 mean?

In UNIX, whenever you run a program (and I mean any program) its code is moved into a container in memory called a process. A process is like a posh mansion for an executable and contains the latest amenities, including:

  • The program's code (in machine language).
  • Reserved memory (usually used for a heap).
  • A stack.
  • Command line arguments.
  • ...a whole bunch of other stuff...
  • A table of open files.

A table of open files?

Yup, just an ordinary list, mapping numbers to open files. For example, let's say I ran the command:

cat /tmp/somefile  

Here's how that table of open files may well look:

files[0] = ...  
files[1] = ...  
files[2] = ...  
files[3] = "/tmp/somefile"  

Why isn't /tmp/somefile at the front of the table?

Most UNIX commands—in fact the most useful ones—are pretty simple state machines: They process some input and produce some useful output, plus an inevitable error message or three. In order to facilitate this pattern, an early convention in UNIX was for all programs to have three already open files. These files are given names that stand in for those numbers:

File #   Alias             Usually Hooked Up To
0        Standard Input    Your keyboard
1        Standard Output   Your screen
2        Standard Error    Your screen

So, for example, if a program were to write to file 1, whatever was written would appear on your terminal. Ditto for writes to file 2.

Further, if it reads from file 0, the process would temporarily wait for you to type something. When you do, the program would then process it.

Want to try it out? Run a program called cat:

$ cat
Hello world!  
Hello world!  
$

cat is a program whose only job* is to read from standard input (file 0) and send what was read to the standard output (file 1).

In the example above:

  • I ran cat.
  • cat immediately started reading from its standard input.
  • I typed the text Hello world! and hit Enter.
  • cat sent this data to its standard output and it appeared on my terminal.
  • I typed the end-of-file character, by holding down CTRL and pressing D.
  • cat interpreted this as the end of its input and exited.
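If you're curious what that read-then-write loop might look like, here's a rough sketch in the shell itself. The function name copy_stream is made up for illustration, and this is only an approximation of what cat does: it works line by line and would drop a final line that lacks a newline.

```shell
# copy_stream is a hypothetical stand-in for cat with no arguments.
copy_stream() {
    # Read lines from file 0 until end-of-file...
    while IFS= read -r line; do
        # ...and write each one to file 1.
        printf '%s\n' "$line"
    done
}

# Feed it one line through a pipe instead of the keyboard.
echoed=$(printf 'Hello world!\n' | copy_stream)
printf '%s\n' "$echoed"
```

Running it with nothing piped in behaves like the session above: it waits for you to type, echoes each line, and stops at CTRL-D.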

So, what's this >something business?

Remember how I mentioned that usually standard-input, -output and -error are hooked up to the keyboard, terminal, and terminal, respectively? This doesn't always have to be the case.

For example, I could tell my shell to kindly hook up the standard output of a command to a file:

$ cat 1>/tmp/file
Hello world!  
$

Here I did the same thing as before, and cat did exactly what it did before. The only difference is that when it wrote to the standard output, cat was actually writing to a file on disk.

Don't believe me?

$ ls /tmp/file
/tmp/file
$ cat 0</tmp/file
Hello world!  
$

You may have noticed that I asked something else of the shell in the command above. This time, I asked the shell to kindly hook cat's standard input to the file, /tmp/file. This caused cat to read from this file instead of the keyboard.

These kind requests I've made to the shell are called Input/Output Redirections and they're neat in that they allow programs to be written to perform very simple tasks:

cat copies standard input to the standard output

...and used without modification to perform a variety of tasks:

# Put whatever I type into a file, /tmp/foo:
cat 1>/tmp/foo

# Send the contents of /tmp/foo to the terminal:
cat 0</tmp/foo

# Copy the file, /tmp/foo to /tmp/bar
cat 0</tmp/foo 1>/tmp/bar  

Wait. Multiple things?

Yup. You can ask the shell to perform I/O redirection as many times as you want. It will perform each redirection in the order specified, then run the program.
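One way to see that the shell handles redirections left to right, before the program ever starts, is to make one of them fail. This is a small sketch using a scratch directory from mktemp; the file names first, second, and missing are assumptions for illustration.

```shell
# Scratch directory so nothing real gets clobbered.
dir=$(mktemp -d)

# Output redirection comes first: the shell creates "first", then fails
# to open the missing input file, so cat is never run.
cat >"$dir/first" <"$dir/missing" || echo "cat was not run"

# Input redirection comes first: the shell fails before it gets around
# to creating "second".
cat <"$dir/missing" >"$dir/second" || echo "cat was not run"

# Only "first" should exist.
ls "$dir"
```

The complaint about the missing file comes from the shell, not from cat, which is a hint that the redirections are the shell's job.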

Okay. But why standard output and standard error? Don't they both go to the terminal?

You got it. Remember how I said that UNIX commands usually process their standard input and send their output to standard output? To ensure that this output stream remains clean and free of useful-but-out-of-band messages (like error messages, warnings, and progress indicators) these are usually sent to another stream, the standard error.

Here's a concrete example. Just note that I sort of lied about cat. When it's given a list of files, it will copy their contents to the standard output.

So, the example where I displayed the contents of /tmp/foo could have been rewritten as:

$ cat /tmp/foo
Hello world!  

And just to prove it can take multiple files:

$ cat /tmp/foo /tmp/bar
Hello world!  
Hello world!  
$

It can even take three files:

$ cat /tmp/foo /tmp/bar /tmp/uhoh
Hello world!  
Hello world!  
cat: /tmp/uhoh: No such file or directory  
$

Whoops!

Here is an example of a program sending an out-of-band message. Because it's sent to a different stream, I can still safely redirect the standard output to a file without needing to worry about cleaning up errors:

$ cat /tmp/foo /tmp/bar /tmp/uhoh 1>/tmp/output
cat: /tmp/uhoh: No such file or directory  
$ cat /tmp/output
Hello world!  
Hello world!  
$

I could even capture the errors to an entirely different file:

$ cat /tmp/foo /tmp/bar /tmp/uhoh 1>/tmp/output 2>/tmp/errors
$ cat /tmp/output
Hello world!  
Hello world!  
$ cat /tmp/errors
cat: /tmp/uhoh: No such file or directory  
$
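The same separation works for scripts you write yourself. Here's a minimal sketch, assuming a hypothetical function called count_words and scratch files in a mktemp directory: the result goes to the standard output and the chatter goes to the standard error.

```shell
# count_words is a hypothetical helper, not a standard command.
count_words() {
    echo "counting words..." >&2   # out-of-band progress note, file 2
    set -- $1                      # unquoted on purpose: split into words
    echo "$#"                      # the actual result, file 1
}

dir=$(mktemp -d)

# Send each stream to its own file.
count_words "one two three" >"$dir/output" 2>"$dir/errors"

cat "$dir/output"
cat "$dir/errors"
```

Because the progress note never touches the standard output, anything reading the result doesn't have to filter it out.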

Do I need to type 1>/tmp/foo? Are there any shortcuts?

Yes!

Studies have shown that 9 times out of 10, when users redirect output from a program, they want to redirect its standard output. For this reason, the 1 can be omitted for output redirection (i.e. before the greater than sign, >).

The same studies have shown that 9.9999999 times out of 10, when users have redirected input to a program, they wanted it to go to its standard input, so the 0 can be omitted for input redirection (i.e. before the less than sign, <).

For example:

$ cat >/tmp/foo
Hello world!  #<--- This is me typing  
$ cat </tmp/foo
Hello world!  #<--- This is coming from the file  
$

So, what's 2>&1?!!??!!?

I guess that was your original question :-S.

Let's say I actually wanted all of the output—errors included—to go to a single file.

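One way to attempt it would be to repeat myself, though the shell then opens /tmp/output twice, each with its own position in the file, so the two streams can overwrite each other: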

$ cat /tmp/foo /tmp/bar /tmp/uhoh >/tmp/output 2>/tmp/output

Another would be to use an indirect reference to a file number:

$ cat /tmp/foo /tmp/bar /tmp/uhoh >/tmp/output 2>&1

This asks the shell to kindly hook the standard error to whatever the standard output is currently hooked up to. Note that these redirections happen in sequence, so in this context, this would have the standard error go to the file, /tmp/output.

Had I reversed the redirections:

$ cat /tmp/foo /tmp/bar /tmp/uhoh 2>&1 >/tmp/output

...this would have had the shell kindly send output destined to the standard error to wherever the standard output was originally going. This can be handy where you don't know where a stream is going, but you want another stream to go the same place.

That said, in this case, the result would be errors going to the terminal, and output going to the file, /tmp/output. Not the most useful request to the shell.
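To watch the ordering matter, here's a small sketch. The function noisy is made up for illustration; it writes one line to each stream. The file names are assumptions too.

```shell
# noisy is a hypothetical command that writes to both streams.
noisy() {
    echo "output"
    echo "error" >&2
}

dir=$(mktemp -d)

# File first, then 2>&1: both streams end up in the file.
noisy >"$dir/both" 2>&1

# 2>&1 first, then the file: the standard error follows the *old*
# standard output (here, the pipe that $(...) captures), and only
# the standard output goes to the file.
leaked=$(noisy 2>&1 >"$dir/only_out")

cat "$dir/both"
cat "$dir/only_out"
printf 'captured elsewhere: %s\n' "$leaked"
```

Here the old standard output is a command substitution rather than the terminal, which is one case where the reversed order is actually handy: it captures the errors and files the regular output away.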

Oh, and what are file descriptors?

Above, I used the term file numbers to refer to those indices into that table of open files. The real term the cool cats use is file descriptors.

In spite of the fancy name, they're still just numbers.
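And nothing stops you from using numbers beyond 0, 1, and 2. As a rough sketch, assuming a scratch file from mktemp, you can ask the shell to open a file on descriptor 3 and read from it directly:

```shell
# Scratch file; the name is generated, not anything special.
tmpfile=$(mktemp)
printf 'Hello world!\n' > "$tmpfile"

# Ask the shell to open the file for reading on descriptor 3.
exec 3< "$tmpfile"

# Read one line from descriptor 3 instead of descriptor 0.
IFS= read -r line <&3

# Close descriptor 3 and clean up.
exec 3<&-
rm -f "$tmpfile"

printf '%s\n' "$line"
```

This is the same slot that /tmp/somefile occupied in the table of open files earlier.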

Thanks!

You're welcome :-)