Text: awk
AWK was released in 1977 and could be considered the original scripting language (Perl came in 1987, Python in 1991, and JavaScript in 1995). AWK is a complete language, but its sweet spot is small programs typed at the command line.
AWK’s strength is validating and manipulating textual data (strings and numbers). It was developed at a time when only grep and sed were available for searching files and performing basic textual substitutions. Being a Unix tool, it is designed to work well as part of the overall Unix tool chain.
AWK scans input files line by line with each line being broken out into fields. Each line is matched against a pattern and if there is a match the associated actions are taken. Common actions include manipulating the text in some way or performing a running calculation in order to generate a report.
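To make the scan, match and act cycle concrete, here is a minimal sketch of a running calculation. The sample data (a name followed by an amount) is made up for illustration; it is fed in with printf so the example does not depend on any file.

```shell
# Hypothetical sample data: a name and an amount on each line.
printf 'alice 10\nbob 25\nalice 5\n' |
awk '
  # The pattern /alice/ selects lines; the action adds field 2 to a running total.
  /alice/ { total += $2 }
  # END runs once, after the last line, and prints the report.
  END     { print "alice total:", total }
'
# prints: alice total: 15
```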
Is it still worth learning AWK? My general answer is that it is always worth learning the core Unix shell tools really well as circumstances regularly pop up where these tools are, by far, the most efficient solution. Of course you can do everything that AWK does using a general purpose scripting language like Python, but the speed of writing a short one-liner AWK program that gets the job done is hard to beat. This is especially so when you get good at combining the Unix commands using pipes.
I am not trying to be complete in my description of AWK here (that’s what the resources below are for). Rather, my aim is to cover enough ground that I can quickly refresh the AWK basics, and to provide a number of AWK examples covering common use cases (sort of a mini-cookbook).
Resources
The AWK Programming Language, 2nd Edition: the book on the language, written by the AWK creators themselves (Aho, Kernighan, and Weinberger) and updated in 2023 to keep things fresh.
Effective AWK Programming, 4th edition
AWK Versions
AWK has been around for a long time and there are a few different versions floating around (for example gawk, mawk, and the original "one true awk"). Most of the material on this page will apply to all versions, but you should know which version you are running as there might be subtle differences.
AWK Programs
AWK is a line-oriented language and an AWK program consists of a series of pattern-action statements and function definitions.
pattern { action }
pattern { action }
function name(parameter-list) { statements }
Each line of the input file is scanned in order and each time that a line matches the pattern the action is executed. Either the pattern or the action may be omitted, but not both. If the pattern is missing the action is executed for every line. If the action is missing each matching line is printed unchanged.
Of course there is a lot more detail to consider, but just knowing this line-oriented pattern-action approach gets you thinking about AWK the right way.
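As a small sketch of that program shape, here is one function definition combined with one pattern-action statement. The function name c2f and the input (one Celsius reading per line) are my own inventions for the example.

```shell
# Hypothetical input: one temperature reading in Celsius per line.
printf '0\n100\n37\n' |
awk '
  # A user-defined function (the name c2f is made up for this sketch).
  function c2f(c) { return c * 9 / 5 + 32 }

  # Pattern-action: only lines whose first field is above 30 are converted.
  $1 > 30 { print $1 " C = " c2f($1) " F" }
'
# prints:
# 100 C = 212 F
# 37 C = 98.6 F
```

The line containing 0 does not match the pattern, so nothing is printed for it.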
Running AWK
A common way to run AWK is as a single-line program on the command line. The program is entered between single quotes and is followed by the input file.
awk '{ print $1 }' survey.data
This program will print the first field of every line of the survey.data file. $1 is a special variable that references the first field on the input line.
awk '/foo/ { print }' survey.data
This will only print lines that contain “foo”. The print statement by itself prints the entire line. Another way to do this is with the special variable $0, which represents the entire line. Omitting the action altogether would have the same effect.
awk '/foo/' survey.data
You can also specify a file containing the AWK program using the -f flag.
awk -f first-col.awk survey.data
This will execute AWK using the contents of the file first-col.awk as the AWK program and run the program against the input file survey.data.
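Here is a self-contained sketch of the -f form that can be pasted into a shell. The contents of both files are assumptions for illustration: I am guessing that first-col.awk prints the first field, and the survey data is made up.

```shell
# Work in a temporary directory so no existing files are touched.
dir=$(mktemp -d)

# The program file (assumed contents): print the first field of every line.
printf '%s\n' '{ print $1 }' > "$dir/first-col.awk"

# Made-up contents standing in for survey.data.
printf 'alice 34 yes\nbob 27 no\n' > "$dir/survey.data"

# Run the program file against the data file.
awk -f "$dir/first-col.awk" "$dir/survey.data"
# prints:
# alice
# bob

rm -r "$dir"
```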
Calling AWK from Vim
It can be useful to call AWK from Vim. As an example, let’s say you had two columns of text in some part of the file you are editing and you wanted to switch those columns.
111 222
111 222
You could visually select the lines in question (for example with V) and then filter them through the following AWK command to switch the columns.
:'<,'>!awk '{print $2 " " $1}'
And end up with this:
222 111
222 111
Once you understand how AWK works, there will be times in your editing where Vim macros or substitution can’t quite get the job done and AWK comes to the rescue. This relies on Vim’s ability to call any shell command and on the fact that Unix commands are designed to work with stdin and stdout.
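Because the filter only reads stdin and writes stdout, the same column swap can be tried outside Vim first. This sketch simply pipes the two sample lines from above through the same AWK program.

```shell
# Feed the two sample lines to awk on stdin; the swapped columns come back on stdout.
printf '111 222\n111 222\n' | awk '{ print $2 " " $1 }'
# prints:
# 222 111
# 222 111
```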
Patterns
The pattern determines whether the action will be performed; the action itself is a series of statements. If the pattern is absent the action is taken for every input line (i.e. the absence of a pattern equates to true).
BEGIN { statements }
: Executed before any input lines are read. There may be more than one BEGIN block and they are executed in the order they appear in the program. BEGIN usually comes first, although that is not a requirement. It is common to set the field separator (FS) in the BEGIN section and, when formatting tables, to print the header.
END { statements }
: Executed after all input lines have been read. There may be more than one END block and they are executed in the order they appear in the program. END usually comes last, but it can appear anywhere in the program.
expression { statements }
: Executed if the expression is true for the input line. In AWK, true means any expression that evaluates to a nonzero number or a non-empty string.
/regex/ { statements }
: Executed if the input line matches the regular expression.
pattern1, pattern2 { statements }
: This is known as the range pattern. The statements are executed for each line in the range, where the range starts on the first line that matches pattern1 and ends on the next line that matches pattern2 (this might be the same line).
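The pattern types above can be seen side by side in one small sketch. The colon-separated name:score data is made up, and each rule prints a label so you can tell which pattern fired for which line.

```shell
# Hypothetical data: name:score, one record per line.
printf 'ann:72\nbob:40\ncid:91\ndan:15\n' |
awk '
  BEGIN           { FS = ":"; print "pattern line" }  # runs before any input
  $2 > 50         { print "expr", $0 }                # expression pattern
  /^d/            { print "regex", $0 }               # regular expression pattern
  /^bob/, /^cid/  { print "range", $0 }               # range pattern, inclusive
  END             { print "lines read:", NR }         # runs after all input
'
# prints:
# pattern line
# expr ann:72
# range bob:40
# expr cid:91
# range cid:91
# regex dan:15
# lines read: 4
```

Note that the cid line is printed twice: every rule is tested against every line, so a line that matches more than one pattern triggers more than one action.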