Notes on shell scripting

Introduction

Shell scripting is a fairly hard technique: most things are rather arcane, and unless you “get it”, some things seem to make no sense at all. Of course, the problem is actually getting them.

While I can't claim years of experience or usage, between reading the bash manual and writing scripts, I learnt a couple of useful things, and it's worth writing something to share them.

This article assumes a bit of knowledge of shell scripting, as it's not a tutorial. Also, it will use sh syntax, so it's probably of little use to users of shells with their own, different language, such as csh or fish.

Data types

Shell scripts are, for all intents and purposes, complete programs, written in an interpreted programming language. As such, these programs operate on structured data. The shell has two main data types: programs and strings. Some shells also define other types, but for the most part they are compound types or types specific to certain contexts.

Programs are, as the name suggests, executable files present in the filesystem. Some shells also define a number of built-in programs, which are just like normal programs, with the exception that instead of being files, they are pieces of code inside the shell itself.

Strings are simply anything that is not a program.

In the most simple case, programs and strings are identified by their positions in a line of code. The first word is the program, the rest is made of strings.

A line is anything starting at column 0 and terminated by a newline character (usually \n). There can be spaces before the program. The \ (backslash) character before a newline is used to make a line “wrap” (e.g. to keep it under 80 columns): until a newline character with no backslash before it is found, it's considered a single line. It doesn't matter if a string is the name of a program: only the first word will be considered a program, everything else is a string.
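As a small sketch of these rules (the words used are arbitrary, picked only for illustration): in the first command ls is just a string handed to echo, and in the second the backslash joins two physical lines into one.

```shell
# "echo" is the first word, so it is the program; "ls" is only a string
# passed to it, even though a program called ls exists.
first=$(echo ls)

# The backslash before the newline makes this a single line:
# one call to echo with three strings.
second=$(echo one \
    two three)

echo "$first"     # prints: ls
echo "$second"    # prints: one two three
```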

Strings are called “strings” because of the following piece of code:

int main(int argc, char *argv[])

It's the standard signature of the main function in a C program. The char *argv[] declaration tells us that argv is an array of pointers to characters, and a pointer to a sequence of characters is what C uses as a string. Since the elements of that array are the words following the program name in a line of a shell script, those words have a “string” data type.
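The same idea can be seen from the shell side. The following is only a sketch: show_args is a made-up helper (a shell function rather than a real program), but it receives its words the same way a program does, and it numbers them the way a C program would index argv.

```shell
# show_args is a hypothetical helper: it prints every string it receives,
# numbered like the elements of argv (argv[0] would be the program name).
show_args() {
    i=1
    for arg in "$@"; do
        echo "argv[$i] = $arg"
        i=$((i + 1))
    done
}

args_out=$(show_args hello world)
echo "$args_out"
# prints:
# argv[1] = hello
# argv[2] = world
```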

Chaining programs

The main purpose of a shell script isn't to call a single program, but to call many of them in a certain order, some of them only when certain conditions are met, and so on. In fact, shell scripts are one of the simplest forms of automation, and one of the reasons behind the UN*X mantra about making small composable programs, rather than big monoliths attempting to do everything and beyond.

To chain programs together, the shell uses special strings: the newline character; the semicolon (;); an ampersand, or “and” character (&); two and characters (&&); one vertical bar (|); two vertical bars (||).

These strings are not passed to the programs, but are used by the shell to control how programs are executed.

The newline character simply makes the shell execute each program sequentially, waiting for the end of the first program before executing the next.

The semicolon is exactly the same as the newline character, with the exception that the two programs can be written on the same line of the script. As such, this script:

echo 1
echo 2

Is equivalent to (the whitespace around the semicolon is optional):

echo 1 ; echo 2

The single and character is similar to the semicolon in that programs chained by & can be written on the same line. Unlike both the semicolon and the newline, however, the shell doesn't wait for the program before the & to terminate: it runs that program in the background and immediately executes the one following the &.

As an example, let's say we have a program called foo that prints a message after two seconds. If we execute it like this:

foo hello world! ; echo hello!

The result will be:

# Two seconds of nothing
hello world!
hello!

On the other hand, if we use &:

foo hello world! & echo hello!

The result will be:

hello!
# About two seconds of nothing
hello world!
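To try this without a real foo program, it can be sketched as a shell function. This is only an assumption about how foo behaves, and it sleeps for one second instead of two to keep the experiment short. The wait at the end makes the shell wait for the background program before going on.

```shell
# foo is the hypothetical program from the text, sketched as a function:
# it waits (one second here) and then prints its arguments.
foo() {
    sleep 1
    echo "$@"
}

# Sequential: echo starts only after foo has finished.
sequential=$(foo hello world! ; echo hello!)

# Background: echo runs right away, while foo is still sleeping.
# wait blocks until the background program is done.
background=$(foo hello world! & echo hello! ; wait)

echo "$sequential"
echo "$background"
```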

The single vertical bar is a pipe. Its purpose is to direct the output of a program to the input of the next program. For the most part, rather than chaining it's more like making a single program out of multiple ones (think of searching a file for a word using cat piped to grep), but it's still a form of chaining.

There's not much to say about pipes. Unlike with the semicolon, the programs in a pipe are started together and run at the same time, each one reading the output of the previous one as it's produced. As such, programs chained by pipes can behave differently according to the output of the previous program in the pipe.
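A minimal sketch of a pipe, with made-up input lines: printf produces the text, and each following program works on whatever the previous one printed.

```shell
# printf writes three lines; grep reads them from its standard input
# and keeps only the lines containing "an".
matches=$(printf 'apple\nbanana\ncherry\n' | grep an)

# Pipes can be longer: sort orders the lines, head keeps the first one.
smallest=$(printf 'pear\nfig\nkiwi\n' | sort | head -n 1)

echo "$matches"     # prints: banana
echo "$smallest"    # prints: fig
```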

The double and (&&) and double vertical bar (||) are conditional chains: the second program is executed only if the first program terminated successfully (return code of 0) or unsuccessfully (return code different than 0), respectively.

The purpose of these strings is to execute a command only if a certain condition is met. For example, it's common to write something like:

[ -d "some name" ] || exit 1 ; cd "some name"

That snippet will terminate the script if "some name" is not a directory. It will change directory to "some name" otherwise.
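The same checks can be tried in isolation. In this sketch the directory name is hypothetical (it's assumed not to exist), and variables are set instead of calling exit or cd, so that the outcome can be printed at the end.

```shell
# Hypothetical directory name, assumed not to exist.
dir="no such directory"

# The check fails, so the right-hand side of || runs.
[ -d "$dir" ] || missing="yes"

# / always exists, so the right-hand side of && runs.
[ -d / ] && root="yes"

# The check fails, so the right-hand side of && is skipped.
[ -d "$dir" ] && found="yes"

# ${found:-no} prints "no" when found was never set.
echo "missing=$missing root=$root found=${found:-no}"
```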

Conditionals

The previous example introduced a conditional expression. It was focused on program chaining, but it used square brackets to check a condition before executing the rest of the chain.

Conditions are actually just the return code of programs: a code of 0 means success, any other number is failure. This is because the number indicates the type of error. For example, a program killed by a segmentation fault (signal number 11) is typically reported by the shell with a code of 139, that is 128 + 11. Examining this number can give hints as to why a program terminated unsuccessfully. The number is converted to a string by the shell before being used by the rest of the script.
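The shell makes the return code of the last program available as $?. A short sketch, where sh -c 'exit 3' is just a convenient way to get a program that fails with a known code:

```shell
# $? holds the return code of the last program, as a string.
true
code_true=$?

# sh -c 'exit 3' starts a child shell that terminates with code 3.
# Chaining with || lets us read $? right after the failure.
sh -c 'exit 3' || code_three=$?

echo "$code_true $code_three"    # prints: 0 3
```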

When dealing with simple programs, the return code is enough, but sometimes scripts need to check something outside, like whether or not a file exists, or if it's writable, or even if two strings are the same (think of parsing the arguments passed to the script).

In this case, the program test is used. This program is also called [ (an open square bracket), though when executed with that name, it behaves slightly differently: the last argument passed to the program must be a closing square bracket (]).

If you've ever wondered why you can't write something like:

if ["$1" == "foo"] then ...

Like you'd do in some other languages like C (if (x == y) { ...), it's because [ is a program, and as such, there must be whitespace around both the open and the closed square bracket (the former because it's the program name, the latter because it's an argument to the program.)

In particular, since test and [ are programs, they must be terminated by a chaining string, so:

test "$1" = "foo" ; echo foo
[ "$1" = "foo" ] ; echo foo

Are correct, while:

test "$1" = "foo" echo foo
[ "$1" = "foo" ] echo foo

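Are wrong: echo and foo are given to test as extra arguments, so instead of running echo, test will complain about them (with an error such as “too many arguments”).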

The test program, however, simply checks a condition. Chained with && or ||, it directly controls only the next program; programs chained after it with a semicolon or a newline will be executed regardless of the return code of test.
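A small sketch of the two spellings side by side. The arg variable is a stand-in for $1, so that the example doesn't depend on how the script was called.

```shell
# Hypothetical value standing in for the script's first argument ($1).
arg="foo"

# test and [ perform the same check; [ just wants ] as its last argument.
test "$arg" = "foo" && with_test="match"
[ "$arg" = "foo" ] && with_bracket="match"

# A failed check: || runs its right-hand side instead.
[ "$arg" = "bar" ] || other="no match"

echo "$with_test / $with_bracket / $other"    # prints: match / match / no match
```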

To execute a sequence of programs when a condition is met, the if statement is used.

Technically, if is not a program, but like the chaining strings, the shell treats it specially. It follows this syntax:

if command ; then sequence ; fi

if, then and fi are special strings. There are also elif and else to define alternative branches.

The semicolon is simply one of the chaining strings, so it can be substituted by a newline character.

The command can be anything, not just test, as long as it's a valid program. Anything between the program name and the semicolon/newline will be given to the program as arguments.

If the return code of the command is 0, the sequence is executed, otherwise any elif or else branches are checked. The sequence of commands is simply a chain of one or more programs and their arguments.
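As a sketch of an if with a command that isn't test, the following uses grep -q, which prints nothing and only reports through its return code whether the pattern was found. The word variable is a made-up input.

```shell
# Hypothetical input, standing in for a value the script received.
word="banana"

# Each condition is a pipe; its return code is the one of its last
# program, grep. Newlines are used here in place of some semicolons.
if echo "$word" | grep -q "^a"; then
    kind="starts with a"
elif echo "$word" | grep -q "an"; then
    kind="contains an"
else
    kind="something else"
fi

echo "$kind"    # prints: contains an
```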

Quoting

Quoting is the act of stopping the shell from treating some strings as special. A very simple example is when dealing with spaces in file names: spaces are used to separate two arguments, so grep word file with space would be considered a call to the grep program with four arguments, while the intended behaviour would've been a call to grep with two arguments, word and file with space.

There are three common ways to quote in the shell, though some shells also provide other, less commonly used ways. The first is the quote of a single character, done with \ (backslash). In fact, \ before a newline effectively quotes the newline character, telling the shell to ignore it and treat the following line as part of the same line.

The previous example would've been written as grep word file\ with\ space, using backslashes.

The backslash quotes only one character, and if a string contains many characters that need to be quoted, typing a backslash before each of them quickly becomes tedious.

As such, a string can be quoted also by surrounding it with ' (single quote or apostrophe) or " (double quote).
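To see the effect of each form, one can count how many strings a program actually receives. In this sketch count_args is a made-up helper (a shell function), not a standard program.

```shell
# count_args is a hypothetical helper: it prints how many strings it received.
count_args() {
    echo "$#"
}

unquoted=$(count_args file with space)       # three separate strings
backslash=$(count_args file\ with\ space)    # one string
single=$(count_args 'file with space')       # one string
double=$(count_args "file with space")       # one string

echo "$unquoted $backslash $single $double"  # prints: 3 1 1 1
```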

While they might seem interchangeable (especially if someone is coming from a language where they really are interchangeable, like JavaScript), they have subtle differences: the double quote allows expansion, while the single quote doesn't.

Expansion is performed in many places by the shell, but the most common use is with variables. Expansion is done when a value stored in a variable is looked up.

The following example shows this behaviour:

$ var=value
$ echo $var
value
$ echo "$var"
value
$ echo '$var'
$var

Where $ at the start of a line is the shell prompt, while the lines with no $ are the strings printed by echo.

Getting the output of a program as a string

The output of a program can be redirected to the standard input of a different program, using a pipe, or to a file, using the > (greater than) character.

Sometimes, however, it is desirable to get said output as a string, for example to save it inside a variable, or use it as an argument to a different program. The following example can be useful to operate on every file with a name matching certain patterns:

for i in "output of ls pattern"; do
     # operate on $i
done

There are two ways to do this: one is to use ` (backtick), the other is to prefix the command with $( and postfix it with ). These two commands are the same: `ls *.c` ; $(ls *.c).

There are some subtle differences between the two forms, but for the most common cases it's pretty much just a matter of readability.

The previous example can then be written like this:

for i in $(ls pattern); do
     #operate on $i
done
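The other use mentioned earlier, saving the output in a variable, can be sketched like this. The path is arbitrary: basename only manipulates the string, so the directory doesn't need to exist.

```shell
# Save the output of a program in a variable
# (the trailing newline of the output is removed).
name=$(basename /usr/local/bin)

# The backtick form gives the same string.
old_style=`basename /usr/local/bin`

# The output can also be used directly as part of an argument.
echo "last component: $(basename /usr/local/bin)"    # prints: last component: bin
echo "$name $old_style"                              # prints: bin bin
```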