Linux AWK command Explained with Examples

Tech World 2022-08-04 Channel: Linux


Linux supports various types of scripts and scripting languages to perform bulk updates and generate reports. Awk is one such scripting language that you can use to manipulate data and generate reports.

In this tutorial, we will learn about the awk command and scripting language with different use cases and examples.

The awk Command

Awk is a command-line utility as well as a scripting language that is used to manipulate data and format output reports. This command supports several variables, functions, and logical operators to get the desired output.

Awk does not require compiling and supports advanced text processing. You can write short awk programs that search files for lines matching a pattern and then process the matched data. Awk can read multiple input files in a single command.

You can perform the following string operations using awk:

1. Scan a file sequentially

2. Split each file line into fields

3. Compare input fields and lines with the pattern

4. Perform the specified actions on the matched lines in the file

The awk command is useful for the following actions:

1. Manipulate and transform data files

2. Generate formatted reports from the manipulated data

The following programming constructs are supported by awk:

1. Conditions and loops

2. String and mathematical operations

3. Variables, functions, and logical operators

A single awk program can select several pieces of text and can read several files at once.

The name awk comes from the initials of its original developers, Alfred Aho, Peter Weinberger, and Brian Kernighan, who created it in 1977. They all worked at AT&T Bell Laboratories. The awk command has since evolved significantly with the contributions of many Linux/UNIX developers.

Basic Syntax of awk Command

The basic syntax of the awk command is as follows:

awk [options] 'selection_criteria {action}' input-file > output-file
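As a rough illustration of how the parts of this syntax fit together, the sketch below uses a made-up comma-separated file (people.csv and selected.txt are hypothetical names, created in a temporary directory). Here -F',' is the option, $2 >= 30 is the selection criterion, and {print $1} is the action.

```shell
# Work in a throwaway directory; both file names are illustrative only.
dir=$(mktemp -d)
printf '%s\n' 'Bob,32' 'Steve,29' 'David,36' > "$dir/people.csv"

# option: -F','   selection criteria: $2 >= 30   action: {print $1}
awk -F',' '$2 >= 30 {print $1}' "$dir/people.csv" > "$dir/selected.txt"

# Show what was written to the output file, then clean up.
selected=$(cat "$dir/selected.txt")
printf '%s\n' "$selected"
rm -r "$dir"
```

Only the names from lines whose second field is at least 30 end up in the output file.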

Sample Input File

We are creating a sample file here to perform different operations using the awk command.

touch emp_records.txt

Add the following content to the file emp_records.txt:

Firstname  Lastname  Age  City         EmpID
Bob        Thomas    32   New York     80649
Steve      Brown     29   Los Angeles  80521
David      Miller    36   New York     80489
Travis     Wilson    47   Chicago      65179
John       Taylor    27   Boston       81440
Andrew     White     41   Austin       75486

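Here we have five columns: Firstname, Lastname, Age, City, and EmpID. Note that awk splits on whitespace by default, so a two-word city such as New York counts as two fields; on those lines the employee ID is the sixth field, and $NF (the last field) is the reliable way to reach it. To check the content of your sample file, type cat emp_records.txt.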

Records and fields

The awk command can process text streams and text files. As in a table, the input is organized into records and fields. Awk processes one record at a time until the end of the input. Records are separated by the input record separator, which is the newline character by default, so each line is treated as one record. You can set the RS variable to use a different input record separator.

Records are made of fields, which correspond to table columns. Fields are separated by the input field separator (FS), which by default is any run of spaces or tabs. Fields in every record are denoted by a dollar sign and a number, starting with 1. The first field is $1, the second field is $2, and so on. The last field can also be written as $NF, and $0 represents the entire record.
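To make the field variables concrete, here is a small sketch that recreates the tutorial's sample data in a temporary file (the temporary path is arbitrary) and prints $1, $NF, NF, and $0 for the second record only.

```shell
# Recreate the sample data in a temporary file.
f=$(mktemp)
cat > "$f" <<'EOF'
Firstname Lastname Age City EmpID
Bob Thomas 32 New York 80649
Steve Brown 29 Los Angeles 80521
EOF

# NR == 2 selects the second record (the line for Bob).
fields=$(awk 'NR == 2 {print $1; print $NF; print NF; print $0}' "$f")
printf '%s\n' "$fields"
rm "$f"
```

The third line of output is 6 rather than 5, because the default whitespace splitting treats "New" and "York" as separate fields.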

This is how the records and fields look in a tabular format:

[Figure: Records and fields in awk command processing]

AWK Examples

Awk supports various options and expressions to help you generate different output lines. In this section, we will show you how to use awk functions and variables to get desired output.

The awk Command By Default

The default behavior of the awk command is to print each line of data, until the end of the file, from the input file.

awk '{print}' emp_records.txt

[Figure: Default output without variables in awk]

Print Specific Columns Using awk

You can choose which columns to include in the awk output by their field numbers. For example, to print the first field of every record, type:

awk '{print $1}' emp_records.txt

[Figure: print specified columns or fields using awk]

Display Lines That Match the Specified Pattern

Instead of printing an entire column by specifying the column number, you can also specify a pattern and display only the records that match it. The following command displays every record that starts with the character S:

awk '/^S/' emp_records.txt

The ^ sign anchors the match to the beginning of the line, and S is the character that the line must start with. When no action is given, awk prints the whole matching record.

Display Specific Lines of a Column

You can also print a specific line of a column by combining the field number with another command. For example, to print the header of the third field, type the following command:

awk '{print $3}' emp_records.txt | head -1

[Figure: Print specific lines from a column]

The above command prints the third column ($3) and pipes the result to head -1, which keeps only the first line of that column.
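The same header value can also be selected inside awk, without head. A minimal sketch, assuming the sample data shown earlier (partly recreated here in a temporary file): the condition NR == 1 restricts the action to the first record.

```shell
f=$(mktemp)
printf '%s\n' 'Firstname Lastname Age City EmpID' \
              'Bob Thomas 32 New York 80649' \
              'Steve Brown 29 Los Angeles 80521' > "$f"

# NR is the current record number, so NR == 1 selects only the header line.
header=$(awk 'NR == 1 {print $3}' "$f")
printf '%s\n' "$header"
rm "$f"
```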

How to Split Line into Fields

The awk command splits every record on the field separator and stores the pieces in the $n variables. If a record contains 4 words, they are stored in $1, $2, $3, and $4 respectively. The entire line is represented by $0.

The following example shows you how to print only the second and fifth fields from a file:

awk '{print $2, $5}' emp_records.txt
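One caveat with whitespace splitting is that a city such as New York occupies two fields, so $5 is not always the employee ID. The sketch below shows one possible workaround, assuming a tab-separated copy of the data (that copy is an assumption; the tutorial's file is aligned with plain whitespace). With -F'\t', the tab becomes the only field separator.

```shell
f=$(mktemp)
# printf expands \t into real tab characters between the columns.
printf 'Bob\tThomas\t32\tNew York\t80649\n' > "$f"
printf 'Travis\tWilson\t47\tChicago\t65179\n' >> "$f"

# Default splitting: "New York" is two fields, so $5 is "York" on the first line.
by_space=$(awk '{print $2, $5}' "$f")

# Tab as the separator: every column is exactly one field.
by_tab=$(awk -F'\t' '{print $2, $5}' "$f")

printf 'default FS:\n%s\n' "$by_space"
printf 'tab FS:\n%s\n' "$by_tab"
rm "$f"
```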

Awk Built-in Variables

Field variables, such as $0 for the entire record and $1, $2, and so on for the fields, are the built-in variables of the awk command. Here is a list of the additional built-in variables:

NR – stores the number of input records read so far, which is also the number of the current record. The pattern and action statements are executed once for every record in the input.
NF – stores the number of fields in the current input record.
FS – stores the field separator that is used for dividing input lines into fields. The default separator is any run of spaces or tabs. You can assign another character to this variable to change the field separator.
RS – stores the record separator that is used for dividing the input into records. The default record separator is a new line character.
OFS – stores the field separator that is used in output lines. The default output field separator is a single space. When a print statement has several arguments separated by commas, awk prints the value of this variable between them.
ORS – stores the record separator that is used for each record in the output. The default output record separator is a new line character. The print statement automatically adds the ORS value at the end of everything it prints.
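The following sketch shows a few of these variables working together. It assumes two rows of the sample data in a temporary file; the " | " and ";" separators are arbitrary choices for illustration.

```shell
f=$(mktemp)
printf '%s\n' 'Travis Wilson 47 Chicago 65179' \
              'John Taylor 27 Boston 81440' > "$f"

# OFS is inserted wherever print has a comma; NR and NF are updated per record.
with_ofs=$(awk 'BEGIN {OFS = " | "} {print NR, NF, $1, $NF}' "$f")

# ORS replaces the newline that print normally appends to each record.
with_ors=$(awk 'BEGIN {ORS = ";"} {print $1}' "$f")

printf '%s\n' "$with_ofs"
printf '%s\n' "$with_ors"
rm "$f"
```

Setting the separators in a BEGIN block means they are in place before the first record is read.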

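The following example shows how to use the RS built-in variable to add an extra separator to the output. Because RS holds a newline by default, each record is printed after an empty line and is indented by the single space that print inserts between RS and $0:

awk '{print RS, $0}' emp_records.txt

[Figure: Using built-in variables to display awk output]

More Examples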

Here we have some additional examples that will show you how to perform complex awk operations.

Regular Expressions in awk

A regular expression, or regex, is a sequence of characters that specifies a search pattern. In awk, a regex pattern is written between two slashes (/ /). The following example shows the syntax of a regular expression in awk:

awk '/regex/ {action}' input_file

To display the first field of every record that starts with S, run the following command:

awk '/^S/ {print $1}' emp_records.txt
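The ^ anchor matters more than it may seem. As a small sketch using the tutorial's sample data (recreated in a temporary file), compare an anchored and an unanchored pattern for the letter B:

```shell
f=$(mktemp)
cat > "$f" <<'EOF'
Firstname Lastname Age City EmpID
Bob Thomas 32 New York 80649
Steve Brown 29 Los Angeles 80521
David Miller 36 New York 80489
Travis Wilson 47 Chicago 65179
John Taylor 27 Boston 81440
Andrew White 41 Austin 75486
EOF

# Anchored: only records that begin with B.
anchored=$(awk '/^B/ {print $1}' "$f")

# Unanchored: any record containing B anywhere (Bob, Brown, Boston).
unanchored=$(awk '/B/ {print $1}' "$f")

printf 'anchored:\n%s\n' "$anchored"
printf 'unanchored:\n%s\n' "$unanchored"
rm "$f"
```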

[Figure: awk output sorted by regular expression]

Relational Expressions in awk

To match against a particular field or variable rather than the whole record, you can use relational expressions. By default, awk checks a regex against the entire record. The match operator (~) checks a regex against a single field instead.

In this example, we print the last name (second field, $2) of every employee whose age (third field, $3) matches 47:

awk '$3 ~ /47/ {print $2}' emp_records.txt

Because ~ performs a regex match, an age such as 147 would also match. Use $3 == 47 for an exact comparison.
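The difference between a regex match and an exact comparison can be seen with the sample data itself. In this sketch (data recreated in a temporary file), the pattern /4/ is chosen only to make the contrast visible:

```shell
f=$(mktemp)
cat > "$f" <<'EOF'
Firstname Lastname Age City EmpID
Bob Thomas 32 New York 80649
Steve Brown 29 Los Angeles 80521
David Miller 36 New York 80489
Travis Wilson 47 Chicago 65179
John Taylor 27 Boston 81440
Andrew White 41 Austin 75486
EOF

# ~ matches any age that contains the digit 4 (47 and 41).
regex_match=$(awk '$3 ~ /4/ {print $2}' "$f")

# == compares the whole field, so only age 47 qualifies.
exact_match=$(awk '$3 == 47 {print $2}' "$f")

printf 'regex: %s\n' "$regex_match"
printf 'exact: %s\n' "$exact_match"
rm "$f"
```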

[Figure: The relational expression to modify awk command output]

Range Pattern in awk

You can also specify two comma-separated patterns with the awk command. The command displays every record from the first one that matches the first pattern through the next one that matches the second pattern. To print the employee IDs of the records from the line containing "Chicago" through the line containing "Austin":

awk '/Chicago/, /Austin/ {print $5}' emp_records.txt
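A short sketch of the range pattern on the sample data (recreated in a temporary file), printing the first name next to the ID so the selected range is easy to see:

```shell
f=$(mktemp)
cat > "$f" <<'EOF'
Firstname Lastname Age City EmpID
Bob Thomas 32 New York 80649
Steve Brown 29 Los Angeles 80521
David Miller 36 New York 80489
Travis Wilson 47 Chicago 65179
John Taylor 27 Boston 81440
Andrew White 41 Austin 75486
EOF

# The range opens on the Chicago line and closes on the Austin line;
# the Boston line in between is included as well.
in_range=$(awk '/Chicago/, /Austin/ {print $1, $NF}' "$f")
printf '%s\n' "$in_range"
rm "$f"
```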

Combine Patterns in awk

Use the logical operators && (AND) and || (OR) to combine multiple patterns.

The following example shows how to use a logical operator to print the employee IDs of all the employees whose age ($3) is between 35 and 50. Both conditions must hold, so they are joined with &&. The ID is read with $NF (the last field) because two-word city names such as New York shift the field count:

awk '$3 > 35 && $3 < 50 {print $NF}' emp_records.txt
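To see why the choice of operator matters for a range check, this sketch runs both versions on the sample data (recreated in a temporary file). NR > 1 is added here to skip the header line; it is not part of the original command.

```shell
f=$(mktemp)
cat > "$f" <<'EOF'
Firstname Lastname Age City EmpID
Bob Thomas 32 New York 80649
Steve Brown 29 Los Angeles 80521
David Miller 36 New York 80489
Travis Wilson 47 Chicago 65179
John Taylor 27 Boston 81440
Andrew White 41 Austin 75486
EOF

# AND: the age must satisfy both bounds.
between=$(awk 'NR > 1 && $3 > 35 && $3 < 50 {print $1, $NF}' "$f")

# OR: every age is either above 35 or below 50, so all six rows match.
either=$(awk 'NR > 1 && ($3 > 35 || $3 < 50) {n++} END {print n}' "$f")

printf '%s\n' "$between"
printf 'rows matched with ||: %s\n' "$either"
rm "$f"
```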

Conditional Search in awk

You can use a conditional search to handle records with if-else statements. The following example prints the first name ($1) if the employee ID is greater than 80000, and prints 0 otherwise. The ID is read with $NF (the last field) because two-word city names shift the field count, and NR > 1 skips the header line:

awk 'NR > 1 {if ($NF > 80000) {print $1} else {print "0"}}' emp_records.txt

Processing Output from Other Commands

The awk command can also be used to process output from other commands. For example, you can run the awk command on the ip addr output to retrieve specific information. The following example prints the second field ($2) of every line that contains inet followed by a space, which in typical ip addr output is an IPv4 address with its prefix length:

ip addr | awk '/inet / {print $2}'
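Since ip addr output differs from machine to machine, the sketch below pipes a fixed stand-in text into awk instead (the addresses and interface names are made up). It also shows how the built-in split function can drop the prefix length.

```shell
# Stand-in for `ip addr` output so the example is reproducible.
sample='    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
    inet 192.0.2.10/24 brd 192.0.2.255 scope global eth0'

# "inet " (with the trailing space) does not match the inet6 line.
with_prefix=$(printf '%s\n' "$sample" | awk '/inet / {print $2}')

# split() breaks $2 at the slash; parts[1] is the bare address.
bare=$(printf '%s\n' "$sample" | awk '/inet / {split($2, parts, "/"); print parts[1]}')

printf '%s\n' "$with_prefix"
printf '%s\n' "$bare"
```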

BEGIN and END Blocks in awk

The awk command supports two special patterns: BEGIN and END. The BEGIN action runs once before any input record is read, and the END action runs once after all records have been processed.

BEGIN is typically used to set variables or print a header, and END is typically used to print totals or a summary.

The following example shows you how to add a custom message before and after processing the first field of all the records:

awk 'BEGIN {print "First Record."}; {print $1}; END {print "Last Record."}' emp_records.txt
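BEGIN and END are also handy for summaries. As a sketch on the sample data (recreated in a temporary file), the program below initializes counters in BEGIN, accumulates the ages while reading records, and reports the result in END. The variable names sum and n are arbitrary.

```shell
f=$(mktemp)
cat > "$f" <<'EOF'
Firstname Lastname Age City EmpID
Bob Thomas 32 New York 80649
Steve Brown 29 Los Angeles 80521
David Miller 36 New York 80489
Travis Wilson 47 Chicago 65179
John Taylor 27 Boston 81440
Andrew White 41 Austin 75486
EOF

summary=$(awk '
  BEGIN  {sum = 0; n = 0}            # runs before the first record
  NR > 1 {sum += $3; n++}            # skip the header, add up the ages
  END    {print n, "employees, average age", int(sum / n)}  # runs last
' "$f")
printf '%s\n' "$summary"
rm "$f"
```

The int function truncates the average, so 212 / 6 is reported as 35.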

[Figure: BEGIN and END special patterns]

Conclusion

In this tutorial, we learned how to use the awk command to manipulate and process data from a text file. We have included several examples that will help you understand the command better. Check the awk command man page for more information about command syntax, output, and supported functions.