Text Processing Tools

Overview

Linux provides a set of text processing tools, including sed, awk, diff, patch, comm, and join. These tools filter, transform, and compare text data efficiently from the command line.

sed - Stream Editor

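sed (Stream Editor) reads its input line by line, applies editing commands to each line, and writes the result to standard output. The input file is left unchanged unless the -i option is used.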

Basic Syntax

sed [options] 'command' file
sed [options] -e 'command1' -e 'command2' file
sed [options] -f script_file file

Common Options

Option	Description
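`-n`	Silent mode: suppress the automatic printing of each line
`-e`	Add a command to the script
`-f`	Read commands from a script file
`-i`	Edit the file in place (GNU sed; BSD/macOS sed requires a suffix argument)
`-i.bak`	Edit in place, keeping the original as a backup with the .bak suffix
`-r` / `-E`	Use extended regular expressions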

Substitute Command s

# Basic substitution (first match per line)
$ sed 's/old/new/' file.txt

# Global substitution
$ sed 's/old/new/g' file.txt

# Case-insensitive, global (the i flag is a GNU extension)
$ sed 's/old/new/gi' file.txt

# Substitute only the second match on each line
$ sed 's/old/new/2' file.txt

# Print only the lines where a substitution was made
$ sed -n 's/old/new/p' file.txt

# Edit file in-place
$ sed -i 's/old/new/g' file.txt

# Backup and edit
$ sed -i.bak 's/old/new/g' file.txt
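
The effect of the substitution flags is easiest to see on a single line with several matches. The following is a minimal sketch; the sample text and the variable names are made up for illustration.

```shell
# Hypothetical sample line containing three occurrences of "old"
line='old old old'

first=$(printf '%s\n' "$line" | sed 's/old/new/')     # first match only
all=$(printf '%s\n' "$line" | sed 's/old/new/g')      # every match
second=$(printf '%s\n' "$line" | sed 's/old/new/2')   # second match only

printf '%s\n' "$first" "$all" "$second"
```

The three printed lines should read `new old old`, `new new new`, and `old new old`.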

Addresses and Ranges

# Specify line numbers
$ sed '3s/old/new/' file.txt       # Line 3
$ sed '1,5s/old/new/' file.txt     # Lines 1-5
$ sed '3,$s/old/new/' file.txt     # Line 3 to end

# Lines matching pattern
$ sed '/pattern/s/old/new/' file.txt

# Range pattern
$ sed '/start/,/end/s/old/new/' file.txt
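
As a small illustration of how addresses limit a command, the sketch below applies the same substitution with a pattern range and with a numeric range. The input lines and variable names are hypothetical.

```shell
# Hypothetical five-line input
input=$(printf '%s\n' 'x=1' 'BEGIN' 'x=2' 'END' 'x=3')

# Only lines from the one matching BEGIN through the one matching END are edited
ranged=$(printf '%s\n' "$input" | sed '/BEGIN/,/END/s/x/y/')

# Only lines 3 through the last line are edited
numbered=$(printf '%s\n' "$input" | sed '3,$s/x/y/')

printf '%s\n' "$ranged" '---' "$numbered"
```

In the first result only `x=2` becomes `y=2`; in the second, both `x=2` and `x=3` change while `x=1` is untouched.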

Delete Command d

# Delete specified line
$ sed '3d' file.txt                # Delete line 3
$ sed '1,5d' file.txt              # Delete lines 1-5
$ sed '$d' file.txt                # Delete last line

# Delete matching lines
$ sed '/pattern/d' file.txt

# Delete empty lines
$ sed '/^$/d' file.txt

# Delete comment lines
$ sed '/^#/d' file.txt
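
The delete patterns above are often combined, for example to strip comments and blank lines from a configuration file. A minimal sketch with made-up configuration content:

```shell
# Hypothetical configuration text with comments and a blank line
config=$(printf '%s\n' '# settings' 'port=80' '' '# debug' 'debug=off')

# Drop comment lines, then drop empty lines
cleaned=$(printf '%s\n' "$config" | sed -e '/^#/d' -e '/^$/d')

printf '%s\n' "$cleaned"
```

Only `port=80` and `debug=off` should remain.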

Print Command p

# Print specified lines
$ sed -n '3p' file.txt             # Print line 3
$ sed -n '1,5p' file.txt           # Print lines 1-5
$ sed -n '$p' file.txt             # Print last line

# Print matching lines
$ sed -n '/pattern/p' file.txt

# Print line numbers
$ sed -n '=' file.txt

Insert and Append

# Insert before specified line
$ sed '3i\New line content' file.txt

# Append after specified line
$ sed '3a\New line content' file.txt

# Insert/append before/after matching lines
$ sed '/pattern/i\Insert content' file.txt
$ sed '/pattern/a\Append content' file.txt

Replace Entire Line c

# Replace specified line
$ sed '3c\New content' file.txt

# Replace matching lines
$ sed '/pattern/c\New content' file.txt
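
To see the i, a, and c commands side by side, the sketch below applies all three to a three-line input. It assumes GNU sed, which accepts the one-line `i\text` form used in this section; the input and the inserted text are hypothetical.

```shell
# Hypothetical three-line input
lines=$(printf '%s\n' 'one' 'two' 'three')

# Insert before line 2, append after line 2, and replace line 3
edited=$(printf '%s\n' "$lines" | sed -e '2i\Inserted line' -e '2a\Appended line' -e '3c\Replaced line')

printf '%s\n' "$edited"
```

The expected order is `one`, `Inserted line`, `two`, `Appended line`, `Replaced line`.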

Multiple Commands

# Separate with semicolons
$ sed 's/a/A/g; s/b/B/g' file.txt

# Use -e option
$ sed -e 's/a/A/g' -e 's/b/B/g' file.txt

# Use braces for grouping
$ sed '/pattern/{s/old/new/; s/foo/bar/}' file.txt
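
Brace grouping applies several commands to the same set of addressed lines. In this sketch (sample lines are made up) both substitutions run only on lines starting with `debug`. The `;` before the closing brace is included as a portability precaution for sed implementations that require it.

```shell
# Hypothetical log-like input
input=$(printf '%s\n' 'debug: old foo' 'info: old foo')

# Both substitutions apply only to lines that start with "debug"
grouped=$(printf '%s\n' "$input" | sed '/^debug/{s/old/new/; s/foo/bar/;}')

printf '%s\n' "$grouped"
```

The first line becomes `debug: new bar`; the `info` line is left as it was.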

Advanced Techniques

# Use different delimiters
$ sed 's|/usr/local|/opt|g' file.txt
$ sed 's#http://#https://#g' file.txt

# Reference matched content
$ sed 's/\(.*\)/【\1】/' file.txt      # Wrap each line in brackets
$ sed 's/[0-9][0-9]*/(&)/' file.txt    # & is the matched text; wraps the first run of digits

# Case conversion (GNU sed extensions)
$ sed 's/\b[a-z]/\u&/g' file.txt       # Capitalize the first letter of each word
$ sed 's/.*/\U&/' file.txt             # All uppercase
$ sed 's/.*/\L&/' file.txt             # All lowercase
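
Back-references and `&` can be tried on short strings. The sketch below swaps two captured groups and wraps a number in parentheses. It uses `[0-9][0-9]*` rather than `[0-9]*` so that the pattern cannot match an empty string at the start of the line. The sample strings are hypothetical.

```shell
# \1 and \2 refer to the first and second \( ... \) groups
swapped=$(printf '%s\n' 'Doe, Jane' | sed 's/\([A-Za-z]*\), \([A-Za-z]*\)/\2 \1/')

# & stands for the whole matched text
wrapped=$(printf '%s\n' 'order 42 shipped' | sed 's/[0-9][0-9]*/(&)/')

printf '%s\n' "$swapped" "$wrapped"
```

This should print `Jane Doe` and `order (42) shipped`.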

Practical Examples

# Delete HTML tags
$ sed 's/<[^>]*>//g' file.html

# Delete leading whitespace ([[:blank:]] matches spaces and tabs portably)
$ sed 's/^[[:blank:]]*//' file.txt

# Delete trailing whitespace
$ sed 's/[[:blank:]]*$//' file.txt

# Add line numbers
$ sed = file.txt | sed 'N;s/\n/\t/'

# Add blank line after each line
$ sed 'G' file.txt

# Squeeze consecutive blank lines into a single blank line
$ sed '/^$/N;/\n$/D' file.txt
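
Two of the practical patterns can be checked quickly on inline input. This is a sketch with made-up sample strings; it uses the POSIX class `[[:space:]]` for whitespace, which is one portable way to match spaces and tabs.

```shell
# Trim leading and trailing whitespace (the input has spaces and tabs on both sides)
trimmed=$(printf '  \thello world \t \n' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')

# Remove anything that looks like an HTML tag
stripped=$(printf '%s\n' '<p>Hello <b>world</b></p>' | sed 's/<[^>]*>//g')

printf '[%s]\n' "$trimmed" "$stripped"
```

The brackets in the output make it visible that no surrounding whitespace is left. Note that the tag-stripping expression works line by line, so it does not handle tags that span several lines.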

awk - Pattern Processing Language

awk is a powerful text processing language, especially suitable for structured data.

Basic Syntax

awk 'pattern { action }' file
awk -F separator 'pattern { action }' file

Built-in Variables

Variable	Description
`$0`	Entire line content
`$1, $2, ...`	Nth field
`NF`	Number of fields
`NR`	Number of records (lines) read so far, across all input files
`FNR`	Record number within the current file
`FS`	Field separator
`OFS`	Output field separator
`RS`	Record separator
`ORS`	Output record separator
`FILENAME`	Current filename

Basic Operations

# Print all lines
$ awk '{print}' file.txt
$ awk '{print $0}' file.txt

# Print specified fields
$ awk '{print $1}' file.txt
$ awk '{print $1, $3}' file.txt

# Specify separator
$ awk -F ':' '{print $1}' /etc/passwd
$ awk -F ',' '{print $1, $2}' file.csv

# Print line numbers
$ awk '{print NR, $0}' file.txt
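
Field splitting can be tried without touching real system files. The sketch below feeds two made-up passwd-style lines to awk and prints the record number, the first field, and the last field (`$NF` is the field whose index is the value of `NF`).

```shell
# Hypothetical passwd-style records
passwd_sample=$(printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' 'alice:x:1000:1000:Alice:/home/alice:/bin/sh')

# Split on ":" and print record number, user name, and login shell
users=$(printf '%s\n' "$passwd_sample" | awk -F ':' '{print NR, $1, $NF}')

printf '%s\n' "$users"
```

The output should be `1 root /bin/bash` followed by `2 alice /bin/sh`.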

Pattern Matching

# Match regular expressions
$ awk '/pattern/' file.txt
$ awk '/pattern/ {print $1}' file.txt

# Conditional match
$ awk '$1 > 100' file.txt
$ awk '$1 == "value"' file.txt
$ awk 'NR > 5' file.txt

# Range match
$ awk '/start/,/end/' file.txt

# Field match
$ awk '$1 ~ /pattern/' file.txt
$ awk '$1 !~ /pattern/' file.txt
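
Patterns select which records an action runs on. A minimal sketch with hypothetical name and score pairs, combining a numeric condition and a negated field match:

```shell
# Hypothetical "name score" records
data=$(printf '%s\n' 'alice 120' 'bob 80' 'carol 300')

# Names whose second field is greater than 100
big=$(printf '%s\n' "$data" | awk '$2 > 100 {print $1}')

# Names that do not start with "a"
not_a=$(printf '%s\n' "$data" | awk '$1 !~ /^a/ {print $1}')

printf '%s\n' "$big" '---' "$not_a"
```

The first query selects `alice` and `carol`; the second selects `bob` and `carol`.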

BEGIN and END

# Execute before processing
$ awk 'BEGIN {print "Start processing"} {print}' file.txt

# Execute after processing
$ awk '{print} END {print "Processing complete"}' file.txt

# Set variables
$ awk 'BEGIN {FS=":"; OFS="\t"} {print $1, $3}' /etc/passwd

# Count lines
$ awk 'END {print NR}' file.txt
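
BEGIN and END are commonly used together to add a header and a footer around per-line output. The following sketch uses made-up colon-separated records and hypothetical column titles.

```shell
# BEGIN sets the separators and prints a header; END prints a record count
report=$(printf '%s\n' 'root:x:0' 'alice:x:1000' | awk '
  BEGIN { FS = ":"; OFS = "\t"; print "user", "uid" }
  { print $1, $3 }
  END { print NR " records" }')

printf '%s\n' "$report"
```

Because `OFS` is a tab, the header and the data columns are tab-separated, and the last line reads `2 records`.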

Arithmetic Operations

# Basic operations
$ awk '{print $1 + $2}' file.txt
$ awk '{sum = $1 + $2; print sum}' file.txt

# Sum
$ awk '{sum += $1} END {print sum}' file.txt

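# Average (guarded against empty input)
$ awk '{sum += $1} END {if (NR > 0) print sum/NR}' file.txt

# Maximum/minimum (seeded from the first line, so negative values work)
$ awk 'NR==1 || $1>max {max=$1} END {print max}' file.txt
$ awk 'NR==1 || $1<min {min=$1} END {print min}' file.txt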
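
Sum, average, minimum, and maximum can be computed in a single pass. This sketch seeds `min` and `max` from the first record rather than from 0, so it also works when every value is negative; the input numbers are made up.

```shell
# Hypothetical numeric column, including negative values
stats=$(printf '%s\n' '-5' '10' '-20' | awk '
  NR == 1 { min = max = $1 }                      # seed from the first record
  { sum += $1
    if ($1 > max) max = $1
    if ($1 < min) min = $1 }
  END { if (NR > 0) printf "sum=%d avg=%.2f min=%d max=%d\n", sum, sum / NR, min, max }')

printf '%s\n' "$stats"
```

For this input the result is `sum=-15 avg=-5.00 min=-20 max=10`.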

String Functions

# Length
$ awk '{print length($1)}' file.txt

# Substring
$ awk '{print substr($1, 1, 3)}' file.txt

# Split
$ awk '{split($1, arr, "-"); print arr[1]}' file.txt

# Substitute
$ awk '{gsub(/old/, "new"); print}' file.txt

# Case conversion
$ awk '{print toupper($1)}' file.txt
$ awk '{print tolower($1)}' file.txt

# Find a literal substring (index does not interpret regular expressions)
$ awk '{if (index($0, "pattern") > 0) print}' file.txt
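
Several of these string functions return useful values: `split` returns the number of pieces and `gsub` returns the number of replacements. A short sketch on made-up input:

```shell
# split, substr, length, and toupper on a hypothetical "date level" record
parts=$(printf '%s\n' '2024-01-15 Error' | awk '{
  n = split($1, piece, "-")
  print n, piece[1], substr($1, 6, 2), length($2), toupper($2)
}')

# gsub edits $0 in place and returns how many replacements it made
replaced=$(printf '%s\n' 'a-b-c' | awk '{ n = gsub(/-/, "+"); print n, $0 }')

printf '%s\n' "$parts" "$replaced"
```

This should print `3 2024 01 5 ERROR` and `2 a+b+c`. Note that `substr` positions start at 1, not 0.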

Control Structures

# if-else
$ awk '{if ($1 > 100) print "Large"; else print "Small"}' file.txt

# for loop
$ awk '{for (i=1; i<=NF; i++) print $i}' file.txt

# while loop
$ awk '{i=1; while (i<=NF) {print $i; i++}}' file.txt

# Arrays
$ awk '{count[$1]++} END {for (k in count) print k, count[k]}' file.txt
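
The array idiom above counts how often each key appears. Since the order of `for (k in count)` is not specified, this sketch pipes the result through `sort` to get stable output; the input values are hypothetical.

```shell
# Count occurrences of each value in the first field
counts=$(printf '%s\n' 'GET' 'POST' 'GET' 'GET' |
  awk '{count[$1]++} END {for (k in count) print k, count[k]}' |
  sort)

printf '%s\n' "$counts"
```

The sorted result is `GET 3` followed by `POST 1`.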

Formatted Output

# printf
$ awk '{printf "%-10s %5d\n", $1, $2}' file.txt

# Format specifiers
# %s     String
# %d     Integer
# %f     Floating point
# %-10s  Left-align in a field 10 characters wide
# %5d    Right-align in a field 5 characters wide
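
Width, alignment, and precision are easier to read in actual output. The sketch below formats two made-up records with a `|` between columns so the padding is visible.

```shell
# %-8s left-aligns in 8 columns, %4d right-aligns in 4, %6.2f uses 2 decimals in 6 columns
table=$(printf '%s\n' 'apple 3 0.5' 'kiwi 12 1.25' |
  awk '{printf "%-8s|%4d|%6.2f\n", $1, $2, $3}')

printf '%s\n' "$table"
```

The two output lines are `apple   |   3|  0.50` and `kiwi    |  12|  1.25`.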

Practical Examples

# Count word frequency
$ awk '{for(i=1;i<=NF;i++) count[$i]++} END {for(w in count) print count[w], w}' file.txt | sort -rn

# Calculate total file size
$ ls -l | awk '{sum += $5} END {print sum}'

# Process CSV
$ awk -F ',' '{print $1 "\t" $2}' file.csv

# Extract IPs from log
$ awk '{print $1}' access.log | sort | uniq -c | sort -rn

# Conditional statistics (count+0 prints 0 when nothing matches)
$ awk '$3 > 1000 {count++} END {print count+0}' file.txt

# Merge every three lines into one, separated by tabs
$ awk 'ORS=NR%3?"\t":"\n"' file.txt
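
Two of these one-liners can be tested on inline data. In the sketch below, the word list is chosen so that no two words share a count, which keeps the `sort -rn` order unambiguous; all input is made up.

```shell
# Word frequency: a appears 3 times, b twice, c once
freq=$(printf '%s\n' 'a b a' 'c a b' |
  awk '{for (i = 1; i <= NF; i++) count[$i]++} END {for (w in count) print count[w], w}' |
  sort -rn)

# Merge every three lines: ORS is a tab except after every third record
merged=$(printf '%s\n' 1 2 3 4 5 6 | awk 'ORS=NR%3?"\t":"\n"')

printf '%s\n' "$freq" '---' "$merged"
```

The merge works because the assignment to `ORS` is itself the pattern: its value is a non-empty string, which awk treats as true, so every record is printed with the separator just chosen.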

diff and patch

diff - Compare Files

# Basic comparison
$ diff file1.txt file2.txt

# Unified format
$ diff -u file1.txt file2.txt

# Side-by-side
$ diff -y file1.txt file2.txt

# Ignore whitespace
$ diff -w file1.txt file2.txt

# Recursive directory comparison
$ diff -r dir1/ dir2/

# Generate patch
$ diff -u old.txt new.txt > changes.patch
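
In scripts, the exit status of diff is often as useful as its output: it is 0 when the files are identical and 1 when they differ. The sketch below creates two small files in a temporary directory (file names and contents are hypothetical), captures the status, and extracts the changed lines from the unified diff.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'alpha' 'beta' 'gamma' > "$workdir/old.txt"
printf '%s\n' 'alpha' 'BETA' 'gamma' > "$workdir/new.txt"

# Capture the exit status without treating "files differ" as a failure
diff_status=0
diff -u "$workdir/old.txt" "$workdir/new.txt" > "$workdir/changes.patch" || diff_status=$?

# Changed lines start with a single - or +; the ---/+++ header lines are skipped
changed=$(grep '^[-+][^-+]' "$workdir/changes.patch")

printf 'status=%s\n%s\n' "$diff_status" "$changed"
rm -r "$workdir"
```

Here the status is 1 and the changed lines are `-beta` and `+BETA`.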

patch - Apply Patch

# Apply patch
$ patch < changes.patch

# Specify file
$ patch file.txt < changes.patch

# Reverse (undo)
$ patch -R < changes.patch

# Dry run
$ patch --dry-run < changes.patch

comm - Compare Sorted Files

# Display three columns: lines only in file1, lines only in file2, lines in both
# (both input files must already be sorted)
$ comm file1.txt file2.txt

# Show only lines common to both files
$ comm -12 file1.txt file2.txt

# Show only lines unique to file1
$ comm -23 file1.txt file2.txt
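
The column-suppression options are easiest to understand on two small sorted lists. In this sketch the file names and contents are hypothetical, and each option names the columns to hide, so `-12` leaves only column 3 (lines in both files).

```shell
workdir=$(mktemp -d)
printf '%s\n' 'apple' 'banana' 'cherry' > "$workdir/a.txt"   # already sorted
printf '%s\n' 'banana' 'cherry' 'date' > "$workdir/b.txt"    # already sorted

both=$(comm -12 "$workdir/a.txt" "$workdir/b.txt")     # in both files
only_a=$(comm -23 "$workdir/a.txt" "$workdir/b.txt")   # only in a.txt
only_b=$(comm -13 "$workdir/a.txt" "$workdir/b.txt")   # only in b.txt

printf 'both: %s\n' $both
printf 'only a: %s\nonly b: %s\n' "$only_a" "$only_b"
rm -r "$workdir"
```

The common lines are `banana` and `cherry`; `apple` appears only in the first file and `date` only in the second.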

join - Join Files

# Join on the first field (both files must be sorted on the join field)
$ join file1.txt file2.txt

# Join field 2 of file1 with field 1 of file2
$ join -1 2 -2 1 file1.txt file2.txt

# Specify the field separator
$ join -t ':' file1.txt file2.txt
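
join behaves like a simple database join on a shared key. A minimal sketch with two made-up files keyed by an ID in the first field; by default, lines without a partner in the other file are omitted.

```shell
workdir=$(mktemp -d)
printf '%s\n' '1 alice' '2 bob' '3 carol' > "$workdir/names.txt"
printf '%s\n' '1 admin' '3 guest' > "$workdir/roles.txt"

# Join on the first field of each file; ID 2 has no role and is dropped
joined=$(join "$workdir/names.txt" "$workdir/roles.txt")

printf '%s\n' "$joined"
rm -r "$workdir"
```

The result is `1 alice admin` and `3 carol guest`.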

Summary

This chapter introduced powerful Linux text processing tools:

sed: Stream editor, suitable for simple text replacement and transformation
awk: Pattern processing language, suitable for structured data processing
diff/patch: File comparison and patching
comm/join: Comparing and joining sorted files

Mastering sed and awk will greatly improve your text processing efficiency.
