NAME
sort - Sorts or merges files
SYNOPSIS
sort [-m] [-o output_file] [-Abdfinru] [-k keydef] [-tcharacter]
[-T directory] [-ykilobytes] [-zrecord_size] ... file ...
sort -c [-u] [-Abdfinr] [-k keydef] [-tcharacter] [-T directory]
[-ykilobytes] [-zrecord_size] ... file ...
sort [-Abcdfimnru] [-o output_file] [-tcharacter] [-T directory]
[-ykilobytes] [-zrecord_size] [+fskip][.cskip] [-fskip][.cskip]
[-bdfinr] ... file ...
The sort command sorts lines in its input files and writes
the result to standard output.
FLAGS
-A Sorts on a byte-by-byte basis using each character's
encoded value. On some systems, extended characters
will be considered negative values, and so sort before
ASCII characters.
-b Ignores leading spaces and tabs when determining the
starting and ending positions of a restricted sort key.
If the -b flag is specified before the first -k flag,
the -b flag is applied to all -k flags on the command
line; otherwise, the -b flag can be independently
attached to each -k field_start or field_end argument.
-c Checks that the input is sorted according to the order-
ing rules specified in the flags and the collating
sequence of the current locale. Displays nothing unless
the file is not sorted.
-d Specifies that only spaces and alphanumeric characters
(according to the current setting of LC_CTYPE) are
significant in comparisons.
-f Treats all lowercase characters as their uppercase
equivalents (according to the current setting of
LC_CTYPE) for the purposes of comparison.
-i Sorts only by printable characters (according to the
current setting of LC_CTYPE).
-k keydef
Specifies one or more restricted sort key field
definitions. The format of a key field definition is as
follows:
field_start[type][,field_end[type]]
where the field_start and field_end arguments define a
key field that is restricted to a portion of the line,
and type is a modifier specified by b, d, f, i, n, or r.
The b modifier behaves like the -b flag, but applies
only to the field_start or field_end argument to which
it is attached. The other modifiers behave like their
corresponding flags, but apply only to the key field to
which they are attached; these modifiers have this
effect if specified with field_start, field_end or both.
Modifiers attached to a field_start or field_end argu-
ment override any specifications made by the flags.
The field_start portion of the keydef argument takes the
following form:
field_number[.first_character]
Fields and characters within fields are numbered start-
ing with 1. The field_number and first_character
pieces, interpreted as positive decimal integers,
specify the character to be used as part of a sort key.
If first_character is not specified, the default is the
first character of the field.
The field_end portion of the keydef argument takes the
following form:
field_number[.last_character]
The field_number is the same as that described for
field_start. The last_character argument, interpreted
as a nonnegative decimal integer, specifies the last
character to be used as part of the sort key. If
last_character evaluates to 0 (zero) or is not speci-
fied, the default is the last character of the field
specified by field_number.
If -k is not specified, the default sort key is the
entire line.
When there are multiple key fields, later keys are com-
pared only after all earlier keys compare as equal.
Except when the -u flag is specified, lines that other-
wise compare as equal are ordered as though none of the
flags -d, -f, -i, -n, or -k were present (but with -r
still in effect, if it was specified) and with all bytes
in the lines significant to the comparison.
-m Merges only (assumes sorted input).
-n Sorts any initial numeric strings (consisting of
optional spaces, optional dashes, and zero or more
digits with optional radix character and thousands
separator, as defined by the current locale) by arith-
metic value. An empty digit string is treated as zero;
leading zeros and signs on zeros do not affect ordering.
-o output_file
Directs output to output_file instead of standard out-
put. output_file can be the same as one of the input
files.
-r Reverses the order of the specified sort.
-tcharacter
Sets the field separator character to character. Each
occurrence of character is significant (for example, two
consecutive occurrences of character delimit an empty
field). To specify the tab character as the field
separator, you must enclose it in ' ' (single quotes).
The default field separator is one or more spaces.
-T directory
Places all the temporary files that are created in
directory.
-u Suppresses all but one in each set of equal lines.
Ignored characters such as leading tabs and spaces, and
characters outside of sort keys are not considered in
this type of comparison.
If used with the -c flag, -u checks that there are no
lines with duplicate keys, in addition to checking that
the input file is sorted.
-ykilobytes
Starts the sort command using kilobytes of main storage
and adds storage as needed. (If kilobytes is less than
the minimum storage size or greater than the maximum,
the minimum or maximum is used instead.) If the -y flag
is omitted, the sort command starts with the default
storage size; -y0 starts with minimum storage, and -y
(with no value) starts with the maximum storage. The
amount of storage used by the sort command has a signi-
ficant impact on performance. Sorting a small file in a
large amount of storage is wasteful.
-zrecord_size
Prevents abnormal termination if lines being sorted are
longer than the default buffer size can handle. When
the -c or -m flags are specified, the sorting phase is
omitted and a system default size buffer is used. If
sorted lines are longer than this size, sort terminates
abnormally. The -z option specifies that the longest
line be recorded in the sort phase so that adequate
buffers can be allocated in the merge phase.
record_size must be a value in bytes equal to or greater
than the number of bytes in the longest line to be
merged.
+fskip.cskip
Specifies the start position of a key field. (Obsoles-
cent)
The fskip variable specifies the number of fields to
skip from the beginning of the input line, and the cskip
variable specifies the number of additional characters
to skip to the right beyond that point. For both the
starting point (+fskip.cskip) and the ending point (-
fskip.cskip) of a sort key, fskip is measured from the
beginning of the input line, and cskip is measured from
the last field skipped. If you omit .cskip, .0 (zero)
is assumed. If you omit fskip, 0 (zero) is assumed. If
you omit the ending field specifier (-fskip.cskip), the
end of the line is the end of the sort key.
You can supply more than one sort key by repeating
+fskip.cskip and -fskip.cskip. In cases where you
specify more than one sort key, keys specified further
to the right on the command line are compared only after
all earlier keys are sorted. For example, if the first
key is to be sorted in numerical order and the second
according to the collating sequence, all strings that
start with the number 1 are sorted according to the col-
lating order before the strings that start with the
number 2. Lines that are identical in all keys are
sorted with all characters significant. You can also
specify different flags for different sort keys in mul-
tiple sort keys.
-fskip.cskip
Specifies the end position of a key field. (Obsoles-
cent)
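The -k key definitions described above can be sketched as follows. This is only an illustration: the sample data and variable names are invented, and LC_ALL=C is set so that the result does not depend on the current locale.

```shell
# Hypothetical sample data in the form name:department:salary
data='carol:ops:900
alice:dev:1200
bob:dev:300
dave:ops:1000'

# First key: field 2 only (-k 2,2), compared as text.
# Second key: field 3 only, numeric and reversed (-k 3,3nr),
# consulted only when the first key compares equal.
by_dept=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k 2,2 -k 3,3nr)
printf '%s\n' "$by_dept"

# Restrict the key to characters 2 through 3 of field 1 (-k 1.2,1.3).
by_chars=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k 1.2,1.3)
printf '%s\n' "$by_chars"
```

In the first command, lines are grouped by department and the highest salary comes first within each group. In the second, only the second and third characters of each name take part in the comparison.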
DESCRIPTION
The sort command treats all of its input files as one file
when it performs the sort. A - (dash) in place of a
filename specifies standard input. If you do not specify a
filename, it sorts standard input.
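As a small sketch of the input handling just described, the following mixes two files with standard input. The file names are illustrative, and the use of mktemp to create a scratch directory is an assumption about the local system rather than part of sort.

```shell
# Work in a scratch directory; the file names are examples only.
tmp=$(mktemp -d)
printf 'pear\napple\n' > "$tmp/a.txt"
printf 'fig\n' > "$tmp/b.txt"

# "-" stands for standard input; all three inputs are sorted as one file.
merged=$(printf 'kiwi\n' | LC_ALL=C sort "$tmp/a.txt" - "$tmp/b.txt")
printf '%s\n' "$merged"

# With no file operand, sort reads standard input.
from_stdin=$(printf 'b\na\n' | LC_ALL=C sort)
printf '%s\n' "$from_stdin"

rm -r "$tmp"
```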
The sort command can handle a variety of collation rules
typically used in Western European languages, including
primary/secondary sorting, one-to-two character mapping, N-
to-one character mapping, and ignore-character mapping. To
summarize briefly:
Primary/Secondary Sorting
In this system, a group of characters all sort to the same
primary location. If there is a tie, a secondary sort is
applied. For example, in French, the plain and accented a's
all sort to the same primary location. If two strings col-
late to the same primary location, the secondary sort goes
into effect. These words are in correct French order:
abord
âpre
après
âpreté
azur
One-to-Two Character Mappings
This system requires that certain single characters be
treated as if they were two characters. For example, in
German, the ß (scharfes-S) is collated as if it were ss.
N-to-One Character Mappings
Some languages treat a string of characters as if it were
one single collating element. For example, in Spanish, the
ch and ll sequences are treated as their own elements within
the alphabet. (ch comes between c and d in the alphabet,
and ll comes between l and m.)
Ignore-Character Mappings
In some cases, certain characters may be ignored in colla-
tion. For example, if - were defined as an ignore-
character, the strings re-locate and relocate would sort to
the same place.
The results that you get from sort depend on the collating
sequence as defined by the current setting of the LC_COLLATE
environment variable. The configuration files for collation
and character classification information are
/usr/lib/nls/loc/locale.
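One way to make the result independent of the collating sequence in effect is to select the C (POSIX) locale for a single command. The sketch below assumes only that the C locale is available. The sample words are invented.

```shell
# LC_ALL overrides LC_COLLATE and LANG for this one command, so the
# comparison is made byte by byte on the encoded character values.
c_order=$(printf 'apple\nBanana\ncherry\n' | LC_ALL=C sort)
printf '%s\n' "$c_order"
```

In the C locale, the uppercase B of Banana sorts before every lowercase letter, so Banana comes first. Under the collating rules of many natural-language locales, the same input would typically place apple first.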
A field is one or more characters bounded by the beginning
of a line and the current field separator, or one or more
characters bounded by a field separator on either side. The
space character is the default field separator.
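The following sketch illustrates how fields are delimited when an explicit separator is set with -t, including the empty field produced by two consecutive separators. The data is invented for the illustration, and LC_ALL=C is used so the result does not depend on the locale.

```shell
# In the first row, the two adjacent colons delimit an empty second field.
rows='x::3
y:b:1
z:a:2'

# Sort on the second field only; the empty field compares lowest.
by_second=$(printf '%s\n' "$rows" | LC_ALL=C sort -t: -k 2,2)
printf '%s\n' "$by_second"

# Sort on the third field only, numerically.
by_third=$(printf '%s\n' "$rows" | LC_ALL=C sort -t: -k 3,3n)
printf '%s\n' "$by_third"
```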
Lines longer than 1024 bytes are truncated by sort. The
maximum number of fields on a line is 10.
EXAMPLES
1. To perform a simple sort, enter:
sort fruits
This displays the contents of fruits sorted in ascend-
ing lexicographic order. This means that the charac-
ters in each column are compared one by one, including
spaces, digits, and special characters.
For instance, if fruits contains the text:
banana
orange
Persimmon
apple
%%banana
apple
ORANGE
then sort fruits displays:
%%banana
ORANGE
Persimmon
apple
apple
banana
orange
This order follows from the fact that in the ASCII col-
lating sequence, symbols (such as %) precede uppercase
letters, and all uppercase letters precede the lower-
case letters. If you are using a different collating
order, your results may be different.
2. To group lines that contain uppercase and special char-
acters with similar lowercase lines, and remove dupli-
cate lines, enter:
sort -d -f -u fruits
The -u flag tells sort to remove duplicate lines, mak-
ing each line of the file unique. This displays:
apple
%%banana
orange
Persimmon
Note that not only was the duplicate apple removed, but
banana and ORANGE were removed as well. The -d flag
told sort to ignore symbols, so %%banana and banana
were considered to be duplicate lines and banana was
removed. The -f flag told sort not to differentiate
between uppercase and lowercase, so ORANGE and orange
were considered to be duplicate lines and ORANGE was
removed.
When the -u flag is used with input that contains
nonidentical lines that are considered by sort (due to
other flags) to be duplicates, there is no way to
predict which lines sort will keep and which it will
remove.
3. To sort as in Example 2, but remove duplicates unless
capitalized or punctuated differently, enter:
sort -u -k 1df -k 1 fruits
Flags appearing between sort key specifiers apply only
to the specifier preceding them. There are two sorts
specified in this command line. -k 1df specifies the
first sort, of the same type done with -d -f in Example
2. Then -k 1 performs another comparison to distinguish
lines that are not actually identical. This
prevents -u, which applies to both sorts because it
precedes the first sort key specifier, from removing
lines that are not exactly identical to other lines.
Given the fruits file shown in Example 1, the added -k
1 distinguishes %%banana from banana and ORANGE from
orange. However, the two instances of apple are
exactly identical, so one of them is deleted. This displays:
apple
%%banana
banana
ORANGE
orange
Persimmon
4. To specify a new field separator, enter:
sort -t: -k 2 vegetables
This sorts vegetables, comparing the text that follows
the first colon on each line. -t: tells sort that
colons separate fields. -k 2 tells sort to ignore the
first field and to compare from the start of the second
field to the end of the line. If vegetables contains:
yams:104
turnips:8
potatoes:15
carrots:104
green beans:32
radishes:5
lettuce:15
then sort -t: -k 2 vegetables displays:
carrots:104
yams:104
lettuce:15
potatoes:15
green beans:32
radishes:5
turnips:8
Note that the numbers are not in ascending order. This
is because a lexicographic sort compares each character
from left to right. In other words, 3 comes before 5
so 32 comes before 5.
5. To sort on more than one field, enter:
sort -t: -k 2n -k 1r vegetables
This performs a numeric sort on the second field (-k
2n) and then, within that ordering, sorts the first
field in reverse collating order (-k 1r). The output
looks like this:
radishes:5
turnips:8
potatoes:15
lettuce:15
green beans:32
yams:104
carrots:104
The lines are sorted in numeric order; when two lines
have the same number, they appear in reverse collating
order.
6. To replace the original file with the sorted text,
enter:
sort -o vegetables vegetables
The -o vegetables flag stores the sorted output into
the file vegetables.
7. To collate using Spanish rules, set the LC_COLLATE (or
LANG) environment variable to a Spanish locale, and
then use sort in the usual way. For example, enter:
sort sp.words
If an input file named sp.words contains the following
Spanish words:
dama
loro
chapa
canto
mover
chocolate
curioso
llanura
The sorted file looks like this:
canto
curioso
chapa
chocolate
dama
loro
llanura
mover
If you sort the file using ASCII collation rules, the
output looks like this:
canto
chapa
chocolate
curioso
dama
llanura
loro
mover
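Examples 1 and 6 can be combined with the -c check in a short script. This is a sketch under stated assumptions: the scratch directory created with mktemp is not part of the examples above, and LC_ALL=C is set so that the ASCII ordering shown in Example 1 applies regardless of the current locale.

```shell
# Recreate the fruits file from Example 1 in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' banana orange Persimmon apple '%%banana' apple ORANGE > "$tmp/fruits"

# As in Example 6, -o may name one of the input files.
LC_ALL=C sort -o "$tmp/fruits" "$tmp/fruits"
sorted=$(cat "$tmp/fruits")
printf '%s\n' "$sorted"

# -c displays nothing and exits with status 0 when the input is sorted.
if LC_ALL=C sort -c "$tmp/fruits"; then check=sorted; else check=unsorted; fi

# With -u added, -c also rejects duplicate keys; the file has two apple lines.
if LC_ALL=C sort -c -u "$tmp/fruits" 2>/dev/null; then
    uniq_check=unique
else
    uniq_check=duplicates
fi
printf '%s %s\n' "$check" "$uniq_check"

rm -r "$tmp"
```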
FILES
/usr/lib/nls/loc/locale
Configuration files.
RELATED INFORMATION
Commands: comm(1), join(1), uniq(1).
Files: locale(4).
"Using Internationalization Features" in the OSF/1 User's
Guide.