NAME
awk - Finds lines in files and makes specified changes to
them
SYNOPSIS
awk -f program [-Fcharacter] [file ...]
awk [-Fcharacter] statement ... [file ...]
The awk command is a more powerful pattern-matching command
than the grep command. It can perform limited processing on
the input lines, instead of simply displaying lines that
match.
FLAGS
-Fcharacter
Uses character as the field separator character (a space
by default).
-f program
Searches for the patterns and performs the actions found
in the file program.
DESCRIPTION
The awk command provides a flexible text-manipulation
language suitable for simple report generation. awk is a
more powerful tool for text manipulation than either sed or
grep.
The awk command:
o Performs convenient numeric processing.
o Allows variables within actions.
o Allows general selection of patterns.
o Allows control flow in the actions.
o Does not require any compiling of programs.
Pattern-action statements can be specified either on the
command line or in a program file. In either case, the awk
command first reads all of the pattern-action statements,
then reads a line of its input and compares it to each
specified pattern. If the line matches a specified pattern,
awk performs the corresponding actions, which typically
write results to standard output. When it has compared the
current input line to all patterns, it reads the next line.
The awk command reads input files in the order stated on the
command line. If you specify a filename as a - (dash) or do
not specify a filename, awk reads standard input.
Enclose pattern-action statements on the command line in ''
(single quotes) to protect them from interpretation by the
shell. Consecutive pattern-action statements on the same
command line must be separated by a ; (semicolon), within
one set of quotes. Consecutive pattern-action statements in
an awk program file must be on separate lines.
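You can assign values to variables on the awk command line
by giving an argument of the following form in place of a
filename:
variable=value
The assignment takes effect when awk reaches that argument:
after any BEGIN action has run, and before awk reads the
input file that follows it.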
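A minimal sketch of a command-line assignment. It assumes a POSIX-conforming awk and sh, and the function name tag_lines and the variable tag are hypothetical. No filename follows the assignment, so awk performs it and then reads standard input.

```shell
# tag_lines (hypothetical name): prefix every input line with a label.
# tag="$1" is an assignment operand, not a filename; because no
# filename follows it, awk then reads standard input.
tag_lines() {
  awk '{ print tag ": " $0 }' tag="$1"
}

printf 'alpha\nbeta\n' | tag_lines note
# prints:
# note: alpha
# note: beta
```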
The awk command treats input lines as fields separated by
spaces, tabs, or a field separator you set with the FS
variable. (With the default FS, leading spaces and tabs are
ignored, and any run of consecutive spaces and tabs counts
as a single separator.) Fields are referenced as $1, $2,
and so on. $0 refers to the entire line.
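The following sketch shows default field splitting on a line that has leading blanks and a mixed run of spaces and a tab. It assumes a POSIX-conforming awk. The function name field_summary is hypothetical.

```shell
# field_summary (hypothetical name): print the field count, the first
# field, and the last field ($NF) of each input line.
field_summary() {
  awk '{ print NF, $1, $NF }'
}

printf '  one   two\tthree\n' | field_summary    # prints: 3 one three
```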
Pattern-Action Statements
Pattern-action statements follow the form:
pattern {action}
If a pattern lacks a corresponding action, awk writes each
entire line that matches the pattern to standard output.
If an action lacks a corresponding pattern, awk applies it
to every line.
Actions
An action is a sequence of statements that follow C language
syntax. These statements can include:
if (expression) statement [ else statement ]
while (expression) statement
for (expression;expression;expression) statement
for (variable in array) statement
break
continue
{ [ statement ... ] }
variable=expression
print [ expression_list ] [ >file | >>file | | command ]
printf format[ ,expression_list ] [ >file | >>file | | command ]
next
exit [ expression ]
delete array[expression]
Statements can end with a semicolon, a newline character, or
the right brace enclosing the action.
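Several of the statements above can be combined in a single action. The sketch below uses if/else, while and next. It assumes a POSIX-conforming awk, and the function name classify is hypothetical.

```shell
# classify (hypothetical name): label each number read on its own line.
classify() {
  awk '{
    if ($1 < 0) {            # negative numbers: report, then skip the rest
      print $1, "negative"
      next
    }
    n = 0
    while (n * n < $1)       # find the smallest n with n*n >= $1
      n++
    if (n * n == $1)
      print $1, "perfect square"
    else
      print $1, "other"
  }'
}

printf '%s\n' -4 9 10 | classify
# prints:
# -4 negative
# 9 perfect square
# 10 other
```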
Expressions can have string or numeric values and are built
using the operators +, -, *, /, %, and ^ (exponentiation), a
space for string concatenation, and the C operators ++, --,
+=, -=, *=, /=, %=, ^=, >, >=, <, <=, ==, !=, and ?:.
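Because awk processes input as fields, the white space
between input fields is not preserved when fields are
printed individually: print $1, $2 separates them with the
output field separator, and assigning a value to any field
rebuilds $0 the same way. Printing an unmodified $0 keeps
the line as it was read.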
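A common consequence of this rebuilding is the idiom $1 = $1, which normalizes spacing. The sketch assumes a POSIX-conforming awk with the default FS and OFS. The function name squeeze_blanks is hypothetical.

```shell
# squeeze_blanks (hypothetical name): assigning a field to itself makes
# awk rebuild $0 from the fields joined by OFS (a single space), which
# drops leading and trailing blanks and collapses internal runs.
squeeze_blanks() {
  awk '{ $1 = $1; print }'
}

printf '  a    b\tc  \n' | squeeze_blanks    # prints: a b c
```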
The file and command arguments can be literal names or
expressions enclosed in parentheses. Identical string
values in different statements refer to the same open file.
The print statement writes its arguments to standard output
(or to a file if >file or >>file is present, or to a pipe if
| command is present), separated by the current output field
separator (OFS) and terminated by the current output record
separator (ORS).
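The effect of the two output separators can be seen by setting them in a BEGIN action, as in the sketch below. It assumes a POSIX-conforming awk, and the function name join_pairs is hypothetical.

```shell
# join_pairs (hypothetical name): print puts OFS between its arguments
# and ORS after each record; both are changed here in BEGIN.
join_pairs() {
  awk 'BEGIN { OFS = "-"; ORS = ";" } { print $1, $2 }'
}

printf 'a b\nc d\n' | join_pairs    # prints: a-b;c-d;  (no trailing newline)
```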
The printf statement writes its expression list to standard
output (or to a file if >file or >>file is present, or to a
pipe if | command is present), laid out as specified by
format. Unlike print, printf does not insert the output
field separator or append the output record separator;
include \n in format to end a line.
You can redirect the output into a file using the
print ... > file or printf ( ... ) > file statements. The
printf statement formats its expression list according to
the format of the printf() subroutine (see the OSF/1
Programmer's Reference).
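The sketch below formats two columns with printf. It assumes a POSIX-conforming awk whose printf conversions (%-6s, %5.1f) behave like those of the C printf() subroutine. The function name report is hypothetical.

```shell
# report (hypothetical name): a left-justified name in 6 columns, then
# a number in 5 columns with one decimal place. printf adds neither
# OFS nor ORS, so the format ends with an explicit \n.
report() {
  awk '{ printf "%-6s|%5.1f\n", $1, $2 }'
}

printf 'ab 3.14159\nxyz 10\n' | report
# prints:
# ab    |  3.1
# xyz   | 10.0
```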
Variables
Variables can be scalars, array elements (denoted x[i]), or
fields.
Variable names can consist of uppercase and lowercase
alphabetic letters, the underscore character, the digits
(0 to 9), and extended characters. Variable names cannot
begin with a digit.
Variables are initialized to the null string, which has
the numeric value 0 (zero). Array subscripts can be any
string; they do not have to be numeric. This allows for a
form of associative memory. Enclose string constants in
expressions in "" (double quotes). Multiple subscripts such
as [i,j,k] are permitted; the constituents are concatenated
and separated by the value of SUBSEP (see the description
in the following list).
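The sketch below shows an uninitialized variable, an element with two subscripts, and how split() on SUBSEP recovers those subscripts. It assumes a POSIX-conforming awk, including the (i, j) in array test. The function and variable names are hypothetical.

```shell
# arrays_demo (hypothetical name)
arrays_demo() {
  awk 'BEGIN {
    print "[" unset "]", unset + 0      # null string, numeric value 0
    cell[1, 2] = "x"                    # stored under "1" SUBSEP "2"
    for (key in cell) {
      split(key, part, SUBSEP)          # recover the two subscripts
      print "row", part[1], "col", part[2], "=", cell[key]
    }
    if ((1, 2) in cell)
      print "found"
  }'
}

arrays_demo
# prints:
# [] 0
# row 1 col 2 = x
# found
```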
There are several variables with special meaning to awk.
They include:
ARGC
Argument count, assignable.
ARGV
Argument array, assignable; nonnull members are
interpreted as filenames.
FS Input field separator (default is a space). If it is a
space, then any number of spaces and tabs can separate
fields.
NF The number of fields in the current input line (record),
with a limit of 99.
NR The number of the current input line (record).
FNR The number of the current input line (record) in the
current file.
FILENAME
The name of the current input file.
RS Input record separator (default is a newline character).
OFS The output field separator (default is a space).
ORS The output record separator (default is a newline
character).
OFMT
The output format for numbers (default %.6g).
SUBSEP
Separates multiple subscripts (default is "\034").
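The difference between NR and FNR only appears when awk reads more than one file. The sketch below assumes a POSIX-conforming awk and a sh with mktemp -d, which is common but not required by POSIX. The function name count_records and the file names are hypothetical.

```shell
# count_records (hypothetical name): NR keeps counting across files,
# while FNR starts again at 1 for each new file.
count_records() {
  awk '{ print FILENAME ":" FNR, "(overall " NR ")" }' "$@"
}

dir=$(mktemp -d)                 # scratch directory for the demo files
printf 'a\nb\n' > "$dir/one"
printf 'c\n' > "$dir/two"
(cd "$dir" && count_records one two)
rm -r "$dir"
# prints:
# one:1 (overall 1)
# one:2 (overall 2)
# two:1 (overall 3)
```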
Functions
Functions are defined at the position of a pattern-action
statement, as follows:
function foo(a, b, c) { ... ; return x }
Arguments are passed by value if scalar and by reference if
array name; functions can be called recursively. Arguments
are local to the function; all other variables are global.
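The sketch below shows a recursive function that takes a scalar and a function that fills an array passed by reference. awk has no local-variable declaration, so it follows the widespread convention of listing extra parameters (here i) to act as locals. It assumes a POSIX-conforming awk, and all names are hypothetical.

```shell
# functions_demo (hypothetical name)
functions_demo() {
  awk '
    BEGIN {
      print fact(5)
      split("", sq)          # make sq an empty array before passing it
      fill(sq, 3)            # arrays are passed by reference
      print sq[1], sq[2], sq[3]
    }
    function fact(n) { return (n <= 1) ? 1 : n * fact(n - 1) }
    function fill(arr, count,    i) {   # i is an extra parameter used as a local
      for (i = 1; i <= count; i++)
        arr[i] = i * i
    }'
}

functions_demo
# prints:
# 120
# 1 4 9
```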
There are several built-in functions that can be used in awk
actions. (For information about regular expressions as
referred to in this section, see grep.)
length(argument)
Returns the length, in characters, of argument, or of
the entire line if there is no argument.
blength(argument)
Returns the length, in bytes, of argument, or of the
entire line if there is no argument.
close(argument)
Closes the file or pipe named by argument, which must be
the same string that was used to open it. Note that you
must enclose a filename in double quotes when redirecting
output with the awk command; otherwise, it is treated as
an awk variable. For example:
print "Hello" > "/tmp/junk"
close ("/tmp/junk")
exp(number)
Takes the exponential of its argument.
rand()
Returns a random number n such that 0 <= n < 1.
srand(number)
Sets the seed for rand. If number is omitted, the time of
day is used.
log(number)
Takes the base e logarithm of its argument.
sqrt(number)
Takes the square root of its argument.
int(number)
Takes the integer part of its argument.
substr(string,position,number)
Returns the substring of string that begins at position
(the first character is position 1) and is number
characters long.
index(string,string2)
Returns the position in string where string2 occurs, or
0 (zero) if it does not occur.
match(string,regular_expression)
Returns the position in string where regular_expression
occurs, or 0 (zero) if it does not occur. The RSTART
and RLENGTH built-in variables are set to the position
and length, in bytes, of the matched string.
split(string,a,[regular_expression])
Splits string into array elements a[1], a[2], . . .,
a[number], and returns number. The separation is done
with the specified regular expression or with the FS
field separator if regular_expression is not given.
sub(regular_expression,string2,[string])
Substitutes string2 for the first occurrence of the
regular expression regular_expression in string. If
string is not given, the entire line ($0) is used.
gsub(regular_expression,string2,[string])
Same as sub except that all occurrences of the regular
expression are replaced; both sub and gsub return the
number of replacements.
sprintf(fmt,expression1,expression2, ...)
Formats the expressions according to the printf format
string fmt and returns the resulting string.
system(command)
Executes command and returns its exit status.
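The sketch below applies several of the string functions to fixed values. Each expected result appears as a comment. It assumes a POSIX-conforming awk in a single-byte locale, so the character positions and byte positions agree. The function name strings_demo is hypothetical.

```shell
# strings_demo (hypothetical name)
strings_demo() {
  awk 'BEGIN {
    s = "hello, world"
    print substr(s, 1, 5)                  # hello
    print index(s, "world")                # 8
    if (match(s, /o+/))
      print RSTART, RLENGTH                # 5 1
    n = split("a:b:c", parts, ":")
    print n, parts[3]                      # 3 c
    t = s
    count = gsub(/o/, "0", t)              # replace every "o" in t
    print count, t                         # 2 hell0, w0rld
    print sprintf("%05.2f", 3.14159)       # 03.14
  }'
}

strings_demo
```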
The getline function sets $0 to the next input record from
the current input file; getline < file sets $0 to the next
record from file. getline x sets variable x instead.
Finally, command | getline pipes the output of command into
getline. Each call of getline returns the next line of
output from command. In all cases, getline returns 1 for a
successful input, 0 (zero) for End-of-File, and -1 for an
error.
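The sketch below reads the output of a command one line at a time and uses the return value of getline to stop at end of file. It assumes a POSIX-conforming awk that runs the command with sh. The function name read_command is hypothetical.

```shell
# read_command (hypothetical name): loop while getline returns 1,
# then close the pipe so the same command could be run again.
read_command() {
  awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0)   # 1 = line read, 0 = end, -1 = error
      print "got:", line
    close(cmd)
  }'
}

read_command
# prints:
# got: one
# got: two
```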
Patterns
Patterns are arbitrary Boolean combinations of regular
expressions and relational expressions (using the !, ||,
and && operators and parentheses for grouping). You must
start and end regular expressions with slashes. You can use
regular expressions as described for grep, including the
following special characters:
+ One or more occurrences of the pattern.
? Zero or one occurrence of the pattern.
| Either of two regular expressions (alternation).
( ) Grouping of expressions.
Isolated regular expressions in a pattern apply to the
entire line. Regular expressions can occur in relational
expressions. Any string (constant or variable) can be used
as a regular expression, except in the position of an
isolated regular expression in a pattern.
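The sketch below contrasts an isolated regular expression that uses ?, | and ( ) with a regular expression held in a string variable and applied through the ~ operator. It assumes a POSIX-conforming awk, and the function names are hypothetical.

```shell
# match_colors (hypothetical name): isolated pattern, applied to $0.
match_colors() {
  awk '/colou?r|gr(a|e)y/'
}

# all_digits (hypothetical name): the regular expression is a string
# stored in a variable and used in a relational expression with ~.
all_digits() {
  awk 'BEGIN { re = "^[0-9]+$" } $1 ~ re { print $1, "is all digits" }'
}

printf 'color\ncolr\ngrey\n' | match_colors    # prints color, then grey
printf '123\n12a\n' | all_digits               # prints: 123 is all digits
```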
If two patterns are separated by a comma, the action is
performed on all lines from an occurrence of the first
pattern through the next occurrence of the second,
including both of those lines.
Regular expressions can contain extended (multibyte)
characters with one exception: range constructs in
character class specifications using brackets cannot
contain multibyte extended characters. Individual instances
of extended (multibyte) characters can appear within
brackets; however, extended characters are treated as
separate 1-byte characters.
As in egrep, inclusion in ranges is determined by the
collating sequence as defined by the current locale. The
wildcard characters *, +, and ? match characters and
character strings, not bytes.
There are two types of relational expressions that you can
use. The first type has the form:
expression match_operator pattern
where match_operator is either: ~ (for contains) or !~ (for
does not contain).
The second type has the form:
expression relational_operator expression
where relational_operator is any of the six C relational
operators: <, >, <=, >=, ==, and !=. A conditional can be
an arithmetic expression, a relational expression, or a
Boolean combination of these.
You can use the BEGIN and END special patterns to capture
control before the first and after the last input line is
read, respectively. BEGIN must be the first pattern; END
must be the last. BEGIN and END do not combine with other
patterns.
You have two ways to designate a character other than white
space to separate fields. You can use the -Fcharacter flag
on the command line, or you can start program with the
following sequence, where c is the separator character:
BEGIN { FS = "c" }
Either method changes the field separator to c.
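The sketch below sets the separator in a BEGIN action. It assumes a POSIX-conforming awk, and the function name first_field is hypothetical. The result should match running awk -F: with the same action.

```shell
# first_field (hypothetical name): same effect as awk -F: '{ print $1 }'
first_field() {
  awk 'BEGIN { FS = ":" } { print $1 }'
}

printf 'root:x:0:0\n' | first_field    # prints: root
```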
There are no explicit conversions between numbers and
strings. To force an expression to be treated as a number,
add 0 (zero) to it. To force it to be treated as a string,
append a null string ("").
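The sketch below compares the same two values first as numbers and then, after a null string is appended to each, as strings. It assumes a POSIX-conforming awk, and the function name compare_demo is hypothetical.

```shell
# compare_demo (hypothetical name)
compare_demo() {
  awk 'BEGIN {
    n = 10; m = 9
    as_numbers = (n < m)            # 0: 10 < 9 is false numerically
    as_strings = (n "" < m "")      # 1: "10" sorts before "9" as strings
    print as_numbers, as_strings
    s = "3.0"
    print s + 0                     # 3: adding 0 forces a numeric value
  }'
}

compare_demo
# prints:
# 0 1
# 3
```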
EXAMPLES
1. To display the lines of a file that are longer than 72
characters, enter:
awk 'length >72' chapter1
This selects each line of the file chapter1 that is
longer than 72 characters. awk then writes these lines
to standard output because no action is specified.
2. To display each group of lines from a line containing
the word start through the next line containing stop,
enter:
awk '/start/,/stop/' chapter1
3. To run an awk program (sum2.awk) that processes a file
(chapter1), enter:
awk -f sum2.awk chapter1
4. The following awk program computes the sum and average
of the numbers in the second column of the input file:
{
sum += $2
}
END {
print "Sum: ", sum;
print "Average:", sum/NR;
}
The first action adds the value of the second field of
each line to the sum variable. (An uninitialized variable
such as sum has the numeric value 0 (zero), so sum needs
no explicit initialization.) The keyword END before the
second action causes awk to perform that action after all
of the input file is read. The NR variable, which is used
to calculate the average, is a special variable containing
the number of records (lines) that were read.
5. To print the names of the users who have the C shell as
the initial shell, enter:
awk -F: '$7 ~ /csh/ {print $1}' /etc/passwd
6. To print the first two fields of each line of standard
input in reversed order, enter:
awk '{ print $2, $1 }'
7. The following awk program prints the first two fields
of the input file in reversed order, with input fields
separated either by a comma followed by any number of
spaces or by one or more spaces, then adds up the first
column and prints the sum and average:
BEGIN { FS = ",[ ]*|[ ]+" }
{ print $2, $1}
{ s += $1 }
END { print "sum is", s, "average is", s/NR }
RELATED INFORMATION
Commands: grep(1)/egrep(1)/fgrep(1), sed(1).
Functions: printf(3).
"Using Internationalization Features" in the OSF/1 User's
Guide.
The discussion of awk in the OSF/1 Applications Programmer's
Guide.
Acknowledgement and Disclaimer