A while ago, I made a transition from the world of build automation and installation scripting (with a little bit of C mixed in) to the never ending story that is web development. I must admit that early on I was really wondering where I'd fit in without my trusty terminal window,
vim. But I assimilated and grew a healthy respect for the Sublime, and a genuine appreciation for the occasional Eclipse. I really learned to get my mouse on and it was fun.
But lately I'm learning that the world of DevOps is more than just coding in a comfy editor and throwing that code over the proverbial fence to release engineering and operations. It has a lot to do with owning that code 'til production. Part of this ownership involves composing really useful tools that have a command line interface and do one thing well: my former bread and butter. Recently, I decided to string a few of these commands together by writing a Bourne shell script, which I just assumed would be immediately maintainable by my team. I was more than a little mistaken and was immediately bombarded by a bunch of really good questions.
Here's one such question:
In UNIX, whenever you run a program (and I mean any program) its code is moved into a container in memory called a process. A process is like a posh mansion for an executable and contains the latest amenities, including:
- The program's code (in machine language).
- Reserved memory (usually used for a heap).
- A stack.
- Command line arguments.
- ...a whole bunch of other stuff...
- A table of open files.
A table of open files?
Yup, just an ordinary list, mapping numbers to open files. For example, let's say I ran the command:
Here's how that table of open files may well look:
    files[0] = ...
    files[1] = ...
    files[2] = ...
    files[3] = "/tmp/somefile"
Why isn't /tmp/somefile at the front of the table?
Most UNIX commands—in fact the most useful ones—are pretty simple state machines: They process some input and produce some useful output, plus an inevitable error message or three. In order to facilitate this pattern, an early convention in UNIX was for all programs to have three already open files. These files are given names that stand in for those numbers:
| File # | Alias           | Usually Hooked Up To |
|--------|-----------------|----------------------|
| 0      | Standard Input  | Your keyboard        |
| 1      | Standard Output | Your screen          |
| 2      | Standard Error  | Your screen          |
So, for example, if a program were to write to file 1, whatever was written would appear on your terminal. Ditto for writes to file 2.
Further, if it reads from file 0, the process would temporarily wait for you to type something. When you do, the program would then process it.
Want to try it out? Run a program called cat:
    $ cat
    Hello world!
    Hello world!
    $
cat is a program whose only job is to read from standard input (file 0) and send what was read to the standard output (file 1).
In the example above:
- I ran cat.
- cat immediately started reading from its standard input.
- I typed the text Hello world! and hit Enter.
- cat sent this data to its standard output and it appeared on my terminal.
- I typed the end-of-file character, by holding down Ctrl and pressing D.
- cat interpreted this as the end of its input and exited.
So, what's this
Remember how I mentioned that usually standard-input, -output and -error are hooked up to the keyboard, terminal, and terminal, respectively? This doesn't always have to be the case.
For example, I could tell my shell to kindly hook up the standard output of a command to a file:
    $ cat 1>/tmp/file
    Hello world!
    $
Here I did the same thing as before, and cat did exactly what it did before. The only difference is that when it wrote to the standard output, cat was actually writing to a file on disk.
Don't believe me?
    $ ls /tmp/file
    /tmp/file
    $ cat 0</tmp/file
    Hello world!
    $
You may have noticed that I asked something else of the shell in the command above. This time, I asked the shell to kindly hook up cat's standard input to the file, /tmp/file. This caused cat to read from this file instead of the keyboard.
These kind requests I've made to the shell are called Input/Output Redirections and they're neat in that they allow programs to be written to perform very simple tasks:
- cat copies standard input to the standard output

...and used without modification to perform a variety of tasks:
    # Put whatever I type into a file, /tmp/foo:
    cat 1>/tmp/foo
    # Send the contents of /tmp/foo to the terminal:
    cat 0</tmp/foo
    # Copy the file, /tmp/foo to /tmp/bar
    cat 0</tmp/foo 1>/tmp/bar
Wait. Multiple things?
Yup. You can ask the shell to perform I/O redirection as many times as you want. It will perform each redirection in the order specified, then run the program.
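If you'd rather not type at the terminal, here's a minimal sketch of the same three tasks as a script. It assumes a POSIX shell with mktemp available; the scratch directory and the names foo and bar inside it are placeholders standing in for /tmp/foo and /tmp/bar, and printf stands in for me typing.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real in /tmp gets overwritten
# (an assumption made for this sketch only).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Stand-in for typing: printf writes one line to its standard output,
# which the shell has hooked up to the file foo.
printf 'Hello world!\n' 1>"$dir/foo"

# Two redirections on one command: standard input comes from foo,
# standard output goes to bar. The shell sets up both, then runs cat.
cat 0<"$dir/foo" 1>"$dir/bar"

# Send the copy to the terminal.
cat 0<"$dir/bar"
```

cat itself never learns that files were involved. It just reads file 0 and writes file 1, same as always.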
Okay. But why standard output and standard error? Don't they both go to the terminal?
You got it. Remember how I said that UNIX commands usually process their standard input and send their output to standard output? To ensure that this output stream remains clean and free of useful-but-out-of-band messages (like error messages, warnings, and progress indicators), these are usually sent to another stream, the standard error.
Here's a concrete example. Just note that I sort of lied about cat. When it's given a list of files, it will copy their contents to the standard output.
So, the example where I displayed the contents of /tmp/foo could have been rewritten as:
    $ cat /tmp/foo
    Hello world!
And just to prove it can take multiple files:
    $ cat /tmp/foo /tmp/bar
    Hello world!
    Hello world!
    $
It can even take three files:
    $ cat /tmp/foo /tmp/bar /tmp/uhoh
    Hello world!
    Hello world!
    cat: /tmp/uhoh: No such file or directory
    $
Here is an example of a program sending an out-of-band message. Because it's sent to a different stream, I can still safely redirect the standard output to a file without needing to worry about cleaning up errors:
    $ cat /tmp/foo /tmp/bar /tmp/uhoh 1>/tmp/output
    cat: /tmp/uhoh: No such file or directory
    $ cat /tmp/output
    Hello world!
    Hello world!
    $
I could even capture the errors to an entirely different file:
    $ cat /tmp/foo /tmp/bar /tmp/uhoh 1>/tmp/output 2>/tmp/errors
    $ cat /tmp/output
    Hello world!
    Hello world!
    $ cat /tmp/errors
    cat: /tmp/uhoh: No such file or directory
    $
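Here's that split between output and errors as a small script you can run in one go. Treat it as a sketch: it assumes a POSIX shell with mktemp, and the scratch directory plus the names foo, bar, uhoh, output and errors are placeholders mirroring the /tmp files above.

```shell
#!/bin/sh
# Scratch directory and file names are placeholders for this sketch.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Create foo and bar; uhoh is deliberately never created.
printf 'Hello world!\n' 1>"$dir/foo"
cat 0<"$dir/foo" 1>"$dir/bar"

# Normal output lands in one file, the complaint about uhoh in another.
cat "$dir/foo" "$dir/bar" "$dir/uhoh" 1>"$dir/output" 2>"$dir/errors"

echo "--- output ---"
cat "$dir/output"
echo "--- errors ---"
cat "$dir/errors"
```

The exact wording of the error message depends on your system, but it should only ever show up under the errors heading.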
Do I need to type 1>/tmp/foo? Are there any shortcuts?
Studies have shown that 9 times out of 10, when users redirect output from a program, they want to redirect its standard output. For this reason, the 1 can be omitted for output redirection (i.e. before the greater than sign, >).
The same studies have shown that 9.9999999 times out of 10, when users have redirected input to a program, they wanted it to go to its standard input, so the 0 can be omitted for input redirection (i.e. before the less than sign, <).
    $ cat >/tmp/foo
    Hello world! #<--- This is me typing
    $ cat </tmp/foo
    Hello world! #<--- This is coming from the file
    $
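If you want to convince yourself that the short and long forms really are the same request, a quick sketch like this one should do it. It assumes a POSIX shell with mktemp and cmp; the scratch directory and the file names long, short and their copies are made up for the illustration.

```shell
#!/bin/sh
# Scratch directory and file names are placeholders for this sketch.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Long form: the file numbers are spelled out.
printf 'Hello world!\n' 1>"$dir/long"
cat 0<"$dir/long" 1>"$dir/long_copy"

# Short form: the 1 and the 0 are implied.
printf 'Hello world!\n' >"$dir/short"
cat <"$dir/short" >"$dir/short_copy"

# cmp -s prints nothing and succeeds only if the files are byte-for-byte identical.
if cmp -s "$dir/long_copy" "$dir/short_copy"; then
    echo "same result"
fi
```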
I guess that was your original question :-S.
Let's say I actually wanted all of the output—errors included—to go to a single file.
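One way to try it would be to repeat myself, although this makes the shell open /tmp/output twice, and the two streams can then overwrite each other's output:
    $ cat /tmp/foo /tmp/bar /tmp/uhoh >/tmp/output 2>/tmp/output
A more reliable way would be to use an indirect reference to a file number: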
    $ cat /tmp/foo /tmp/bar /tmp/uhoh >/tmp/output 2>&1
This asks the shell to kindly hook the standard error to whatever the standard output is currently hooked up to. Note that these redirections happen in sequence, so in this context, this would have the standard error go to the file, /tmp/output.
Had I reversed the redirections:
    $ cat /tmp/foo /tmp/bar /tmp/uhoh 2>&1 >/tmp/output
...this would have had the shell kindly send output destined to the standard error to wherever the standard output was originally going. This can be handy where you don't know where a stream is going, but you want another stream to go the same place.
That said, in this case, the result would be errors going to the terminal, and output going to the file, /tmp/output. Not the most useful request to the shell.
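To see the difference the ordering makes without squinting at a terminal, here's a rough sketch that runs both orders. It assumes a POSIX shell with mktemp; the scratch directory and the names foo, uhoh, both and only_output are placeholders. The one new trick is the $( ... ) wrapper, which I'm using only to capture whatever reaches the command's original standard output, so it plays the role of the terminal.

```shell
#!/bin/sh
# Scratch directory and file names are placeholders for this sketch.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'Hello world!\n' >"$dir/foo"   # uhoh is deliberately never created

# Order 1: standard output is moved to the file first,
# then standard error is hooked to wherever standard output now points.
cat "$dir/foo" "$dir/uhoh" >"$dir/both" 2>&1

# Order 2: standard error is hooked to the ORIGINAL standard output first,
# then standard output alone is moved to the file.
escaped=$(cat "$dir/foo" "$dir/uhoh" 2>&1 >"$dir/only_output")

# grep -c . counts the non-empty lines in each file.
echo "both:        $(grep -c . "$dir/both") line(s)"
echo "only_output: $(grep -c . "$dir/only_output") line(s)"
echo "escaped:     $escaped"
```

In the first order the error message ends up in the file alongside the output. In the second it escapes to wherever standard output was pointing before the redirection, which here is the captured variable.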
Oh, and what are file descriptors?
Above, I used the term, file numbers to refer to those indices into that table of open files. The real term the cool cats use is file descriptors.
In spite of the fancy name, they're still just numbers.
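And to show there's nothing magic about 0, 1 and 2, here's a small sketch that asks the shell for a slot of its own choosing in that table. It assumes a POSIX shell with mktemp; the scratch directory, the file name somefile and the choice of descriptor 3 are all just for illustration.

```shell
#!/bin/sh
# Scratch directory and file name are placeholders for this sketch.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Ask the shell to open a file for writing and park it at file descriptor 3.
# exec with only a redirection changes the shell's own table of open files.
exec 3>"$dir/somefile"

# Any command can now be pointed at descriptor 3 by number.
echo "first line" 1>&3
echo "second line" 1>&3

# Close descriptor 3 so that slot in the table is free again.
exec 3>&-

cat "$dir/somefile"
```

Both lines land in the same file, one after the other, because both commands wrote through the same entry in the table.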
You're welcome :-)