Text Processing Tools
Overview
Linux provides a set of command-line text processing tools, including sed and awk, that filter and transform text data efficiently. This chapter also covers diff, patch, comm, and join for comparing and combining files.
sed - Stream Editor
sed (Stream Editor) reads input line by line, applies editing commands to each line, and writes the result to standard output. The input file is left unchanged unless -i is used.
Basic Syntax
bash
sed [options] 'command' file
sed [options] -e 'command1' -e 'command2' file
sed [options] -f script_file file
Common Options
| Option | Description |
|---|---|
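| -n | Silent mode: suppress automatic printing of each line |
| -e | Add a command (may be repeated) |
| -f | Read commands from a script file |
| -i | Edit the file in place |
| -i.bak | Edit in place, keeping the original as a .bak backup |
| -r / -E | Use extended regular expressions (-r is the GNU spelling) |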
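The effect of -n and -i.bak is easiest to see on a small file. The following is a minimal sketch, assuming a hypothetical sample.txt created in a temporary directory; the file name and contents are illustrative only.

```shell
# Hypothetical sample file in a throwaway directory.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmpdir/sample.txt"

# -n suppresses automatic printing, so only the explicit p output appears.
second_line=$(sed -n '2p' "$tmpdir/sample.txt")
printf '%s\n' "$second_line"

# -i.bak edits in place and keeps the original as sample.txt.bak.
sed -i.bak 's/beta/BETA/' "$tmpdir/sample.txt"
edited=$(cat "$tmpdir/sample.txt")
backup=$(cat "$tmpdir/sample.txt.bak")
printf '%s\n' "$edited"
```

Using a backup suffix is a cheap safeguard while a substitution is still being tested.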
Substitute Command s
bash
# Basic substitution (first match per line)
$ sed 's/old/new/' file.txt
# Global substitution
$ sed 's/old/new/g' file.txt
# Case-insensitive (the i flag is a GNU extension)
$ sed 's/old/new/gi' file.txt
# Substitute only the second match on each line
$ sed 's/old/new/2' file.txt
# Print only the lines where a substitution was made
$ sed -n 's/old/new/p' file.txt
# Edit file in-place
$ sed -i 's/old/new/g' file.txt
# Backup and edit
$ sed -i.bak 's/old/new/g' file.txt
Addresses and Ranges
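An address restricts a command to particular lines. As a minimal sketch, assuming a hypothetical five-line input in which every line contains the word "old", a numeric address and a pattern range behave as follows.

```shell
# Hypothetical five-line input; every line contains the word "old".
input=$(printf 'x old\nstart old\nmid old\nend old\ny old')

# Numeric address: only line 1 is edited.
line_out=$(printf '%s\n' "$input" | sed '1s/old/new/')

# Pattern range: from the first line matching /start/
# through the next line matching /end/, inclusive.
range_out=$(printf '%s\n' "$input" | sed '/start/,/end/s/old/new/')

printf '%s\n' "$line_out"
printf '%s\n' "$range_out"
```

If the closing pattern of a range never matches, the range runs to the end of the input.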
bash
# Specify line numbers
$ sed '3s/old/new/' file.txt # Line 3
$ sed '1,5s/old/new/' file.txt # Lines 1-5
$ sed '3,$s/old/new/' file.txt # Line 3 to end
# Lines matching pattern
$ sed '/pattern/s/old/new/' file.txt
# Range pattern
$ sed '/start/,/end/s/old/new/' file.txt
Delete Command d
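A common use of d is stripping comments and blank lines from a configuration file. The sketch below assumes a hypothetical config snippet and compares anchored patterns with whitespace-tolerant ones; the POSIX class [[:space:]] is used for the tolerant version.

```shell
# Hypothetical config text with a plain comment, a blank line,
# and a comment that is indented.
config=$(printf '# comment\n\nkey=value\n  # indented comment\nname=x')

# Strict: only lines that START with # or are completely empty are removed.
strict=$(printf '%s\n' "$config" | sed '/^#/d; /^$/d')

# Tolerant: leading whitespace before # and whitespace-only lines also count.
loose=$(printf '%s\n' "$config" | sed '/^[[:space:]]*#/d; /^[[:space:]]*$/d')

printf '%s\n' "$strict"
printf '%s\n' "$loose"
```

The strict form leaves the indented comment in place, which is often not what is wanted.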
bash
# Delete specified line
$ sed '3d' file.txt # Delete line 3
$ sed '1,5d' file.txt # Delete lines 1-5
$ sed '$d' file.txt # Delete last line
# Delete matching lines
$ sed '/pattern/d' file.txt
# Delete empty lines
$ sed '/^$/d' file.txt
# Delete comment lines
$ sed '/^#/d' file.txt
Print Command p
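The p command is almost always paired with -n, because sed already prints every line by default. A minimal sketch on a hypothetical three-line input shows the difference.

```shell
# Hypothetical three-line input.
letters=$(printf 'a\nb\nc')

# With -n, only the explicitly printed line appears.
with_n=$(printf '%s\n' "$letters" | sed -n '2p')

# Without -n, line 2 is printed twice: once by p, once automatically.
without_n=$(printf '%s\n' "$letters" | sed '2p')

printf '%s\n' "$with_n"
printf '%s\n' "$without_n"
```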
bash
# Print specified lines
$ sed -n '3p' file.txt # Print line 3
$ sed -n '1,5p' file.txt # Print lines 1-5
$ sed -n '$p' file.txt # Print last line
# Print matching lines
$ sed -n '/pattern/p' file.txt
# Print line numbers
$ sed -n '=' file.txt
Insert and Append
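The one-line forms of i and a, with the text directly after the backslash, are a GNU sed convenience. The sketch below uses the traditional form instead, where the backslash is followed by a newline and the text sits on its own line; the input is a hypothetical two-line sample.

```shell
# Hypothetical two-line input.
lines=$(printf 'one\ntwo')

# Insert before line 2: backslash, newline, then the text.
inserted=$(printf '%s\n' "$lines" | sed '2i\
before two')

# Append after line 1, same layout.
appended=$(printf '%s\n' "$lines" | sed '1a\
after one')

printf '%s\n' "$inserted"
printf '%s\n' "$appended"
```

This layout is the one to prefer in scripts that may run on non-GNU systems.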
bash
# Insert before specified line
$ sed '3i\New line content' file.txt
# Append after specified line
$ sed '3a\New line content' file.txt
# Insert/append before/after matching lines
$ sed '/pattern/i\Insert content' file.txt
$ sed '/pattern/a\Append content' file.txt
Replace Entire Line c
bash
# Replace specified line
$ sed '3c\New content' file.txt
# Replace matching lines
$ sed '/pattern/c\New content' file.txt
Multiple Commands
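When several commands are given, each one operates on the result of the previous one, so their order matters. The following is a small sketch on hypothetical input; note the semicolon before the closing brace, which some non-GNU implementations require.

```shell
# Commands run in sequence on the same line:
# "a" becomes "b", and then every "b" (old and new) becomes "c".
chained=$(printf 'a b\n' | sed 's/a/b/g; s/b/c/g')

# A brace group applies both commands only to lines matching /edit/.
grouped=$(printf 'keep a\nedit a\n' | sed '/edit/{s/a/A/; s/edit/done/;}')

printf '%s\n' "$chained"
printf '%s\n' "$grouped"
```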
bash
# Separate with semicolons
$ sed 's/a/A/g; s/b/B/g' file.txt
# Use -e option
$ sed -e 's/a/A/g' -e 's/b/B/g' file.txt
# Use braces for grouping
$ sed '/pattern/{s/old/new/; s/foo/bar/;}' file.txt
Advanced Techniques
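Capture groups, the & reference, and alternative delimiters can be tried on short strings before being applied to real files. This is a minimal sketch with hypothetical inputs; -E enables extended regular expressions so the parentheses need no backslashes.

```shell
# Swap "Last, First" into "First Last" using two capture groups.
swapped=$(printf 'Doe, John\n' | sed -E 's/^([^,]+), (.+)$/\2 \1/')

# & stands for the whole match; requiring at least one digit
# avoids matching the empty string at the start of the line.
wrapped=$(printf 'port 8080 open\n' | sed 's/[0-9][0-9]*/(&)/')

# A different delimiter avoids escaping the slashes in paths.
path=$(printf '/usr/local/bin\n' | sed 's|/usr/local|/opt|')

printf '%s\n' "$swapped" "$wrapped" "$path"
```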
bash
# Use different delimiters
$ sed 's|/usr/local|/opt|g' file.txt
$ sed 's#http://#https://#g' file.txt
# Reference matched content
$ sed 's/\(.*\)/【\1】/' file.txt # Wrap each line in brackets
$ sed 's/[0-9][0-9]*/(&)/' file.txt # & is the matched text; parenthesize the first number
# Case conversion (GNU sed extensions)
$ sed 's/\b[a-z]/\u&/g' file.txt # Capitalize the first letter of each word
$ sed 's/.*/\U&/' file.txt # All uppercase
$ sed 's/.*/\L&/' file.txt # All lowercase
Practical Examples
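Recipes like these are worth checking on known input first. The sketch below assumes GNU sed and hypothetical sample strings; it trims whitespace, strips simple single-line tags, and squeezes runs of blank lines.

```shell
# Trim leading and trailing whitespace (spaces and a tab here).
trimmed=$(printf '  hello world \t\n' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')

# Remove tags that open and close on the same line.
text=$(printf '<p>Hello <b>world</b></p>\n' | sed 's/<[^>]*>//g')

# Squeeze three blank lines between "a" and "b" down to one.
squeezed=$(printf 'a\n\n\n\nb\n' | sed '/^$/N;/^\n$/D')

printf '%s\n' "$trimmed" "$text" "$squeezed"
```

The tag-stripping pattern does not handle tags that span several lines; a real HTML parser is the safer choice for anything beyond quick cleanup.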
bash
# Delete HTML tags
$ sed 's/<[^>]*>//g' file.html
# Delete leading whitespace (spaces and tabs)
$ sed 's/^[[:blank:]]*//' file.txt
# Delete trailing whitespace (spaces and tabs)
$ sed 's/[[:blank:]]*$//' file.txt
# Add line numbers
$ sed = file.txt | sed 'N;s/\n/\t/'
# Add blank line after each line
$ sed 'G' file.txt
# Squeeze consecutive blank lines into one
$ sed '/^$/N;/^\n$/D' file.txt
awk - Pattern Processing Language
awk is a pattern-scanning and text processing language. It splits each input line into fields, which makes it especially suitable for structured, column-oriented data.
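An awk program is a list of pattern and action pairs: for each input line, every pattern is tested and the matching actions run. A minimal sketch, assuming a hypothetical two-column list of names and ages:

```shell
# Hypothetical input: name and age separated by whitespace.
people=$(printf 'alice 30\nbob 25\ncarol 41')

# Pattern "$2 > 26" selects lines; the action prints the first field.
older=$(printf '%s\n' "$people" | awk '$2 > 26 { print $1 }')

# A pattern with no action prints the whole matching line.
bob_line=$(printf '%s\n' "$people" | awk '/^bob/')

printf '%s\n' "$older" "$bob_line"
```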
Basic Syntax
bash
awk 'pattern { action }' file
awk -F separator 'pattern { action }' file
Built-in Variables
| Variable | Description |
|---|---|
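| $0 | Entire current record (line) |
| $1, $2, ... | First, second, ... field |
| NF | Number of fields in the current record |
| NR | Number of records read so far, across all input files |
| FNR | Record number within the current file |
| FS | Input field separator (default: whitespace) |
| OFS | Output field separator (default: a single space) |
| RS | Input record separator (default: newline) |
| ORS | Output record separator (default: newline) |
| FILENAME | Name of the current input file |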
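The difference between NR and FNR only shows up when awk reads more than one file. The following sketch assumes two hypothetical files, a.txt and b.txt, created in a temporary directory.

```shell
# Two hypothetical input files in a throwaway directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf 'x y\nz\n' > a.txt
printf 'p q r\n' > b.txt

# NR keeps counting across files; FNR restarts at 1 for each file.
report=$(awk '{ print FILENAME, NR, FNR, NF }' a.txt b.txt)
printf '%s\n' "$report"
```

The columns printed are the file name, the cumulative record number, the per-file record number, and the field count.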
Basic Operations
bash
# Print all lines
$ awk '{print}' file.txt
$ awk '{print $0}' file.txt
# Print specified fields
$ awk '{print $1}' file.txt
$ awk '{print $1, $3}' file.txt
# Specify separator
$ awk -F ':' '{print $1}' /etc/passwd
$ awk -F ',' '{print $1, $2}' file.csv
# Print line numbers
$ awk '{print NR, $0}' file.txt
Pattern Matching
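Patterns can compare fields numerically or match them against a regular expression. A minimal sketch, assuming hypothetical colon-separated records in the style of /etc/passwd:

```shell
# Hypothetical records: name, password placeholder, UID, GID.
accounts=$(printf 'root:x:0:0\nalice:x:1000:1000\nbob:x:1001:1001')

# Numeric comparison on the third field.
regular=$(printf '%s\n' "$accounts" | awk -F: '$3 >= 1000 { print $1 }')

# Regular expression match restricted to the first field.
starts_with_a=$(printf '%s\n' "$accounts" | awk -F: '$1 ~ /^a/ { print $1 }')

printf '%s\n' "$regular" "$starts_with_a"
```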
bash
# Match regular expressions
$ awk '/pattern/' file.txt
$ awk '/pattern/ {print $1}' file.txt
# Conditional match
$ awk '$1 > 100' file.txt
$ awk '$1 == "value"' file.txt
$ awk 'NR > 5' file.txt
# Range match
$ awk '/start/,/end/' file.txt
# Field match
$ awk '$1 ~ /pattern/' file.txt
$ awk '$1 !~ /pattern/' file.txt
BEGIN and END
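BEGIN runs once before any input is read and END runs once after the last line, which makes them the natural place for headers and totals. A small sketch on hypothetical numeric input:

```shell
# Header from BEGIN, per-line output, and a total from END.
summary=$(printf '3\n4\n' | awk 'BEGIN { print "values:" } { sum += $1; print $1 } END { print "total:", sum }')

# A program with only a BEGIN block needs no input at all.
five=$(awk 'BEGIN { print 2 + 3 }')

printf '%s\n' "$summary" "$five"
```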
bash
# Execute before processing
$ awk 'BEGIN {print "Start processing"} {print}' file.txt
# Execute after processing
$ awk '{print} END {print "Processing complete"}' file.txt
# Set variables
$ awk 'BEGIN {FS=":"; OFS="\t"} {print $1, $3}' /etc/passwd
# Count lines
$ awk 'END {print NR}' file.txt
Arithmetic Operations
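Running totals are straightforward, but a running maximum needs a sensible starting value. The sketch below, on hypothetical all-negative input, contrasts seeding the maximum with 0 against seeding it from the first line.

```shell
# Seeding with 0 reports 0, which is not in the data at all.
naive=$(printf '%s\n' -5 -2 -9 | awk 'BEGIN { max = 0 } $1 > max { max = $1 } END { print max }')

# Seeding from the first record works for any range of values.
robust=$(printf '%s\n' -5 -2 -9 | awk 'NR == 1 || $1 + 0 > max { max = $1 + 0 } END { print max }')

printf '%s\n' "$naive" "$robust"
```

Adding 0 to the field forces a numeric comparison even if the input contains stray non-numeric text.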
bash
# Basic operations
$ awk '{print $1 + $2}' file.txt
$ awk '{sum = $1 + $2; print sum}' file.txt
# Sum
$ awk '{sum += $1} END {print sum}' file.txt
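# Average (guard against empty input)
$ awk '{sum += $1} END {if (NR > 0) print sum/NR}' file.txt
# Maximum and minimum (seeded from the first line, so negative values work)
$ awk 'NR==1 || $1>max {max=$1} END {print max}' file.txt
$ awk 'NR==1 || $1<min {min=$1} END {print min}' file.txt
String Functions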
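Several string functions return useful values in addition to their side effects: split returns the number of pieces and gsub returns the number of replacements. A minimal sketch with hypothetical input strings:

```shell
# split fills the array and returns the piece count;
# substr(s, 6, 2) takes two characters starting at position 6 (1-based).
date_parts=$(printf '2024-03-15\n' | awk '{ n = split($1, parts, "-"); print n, parts[1], substr($1, 6, 2) }')

# gsub edits $0 in place and returns how many replacements it made.
replaced=$(printf 'a-b-c\n' | awk '{ n = gsub(/-/, "+"); print n, $0 }')

printf '%s\n' "$date_parts" "$replaced"
```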
bash
# Length
$ awk '{print length($1)}' file.txt
# Substring
$ awk '{print substr($1, 1, 3)}' file.txt
# Split
$ awk '{split($1, arr, "-"); print arr[1]}' file.txt
# Substitute
$ awk '{gsub(/old/, "new"); print}' file.txt
# Case conversion
$ awk '{print toupper($1)}' file.txt
$ awk '{print tolower($1)}' file.txt
# Find a literal substring (index does not interpret regular expressions)
$ awk '{if (index($0, "pattern") > 0) print}' file.txt
Control Structures
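Associative arrays combined with a for-in loop are the usual way to count occurrences. The order in which for-in visits keys is unspecified, so the sketch below pipes the result through sort; the input is a hypothetical list of HTTP methods.

```shell
# Count how often each value appears in the first field.
methods=$(printf 'GET\nPOST\nGET\n' |
  awk '{ count[$1]++ } END { for (k in count) print k, count[k] }' |
  sort)

printf '%s\n' "$methods"
```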
bash
# if-else
$ awk '{if ($1 > 100) print "Large"; else print "Small"}' file.txt
# for loop
$ awk '{for (i=1; i<=NF; i++) print $i}' file.txt
# while loop
$ awk '{i=1; while (i<=NF) {print $i; i++}}' file.txt
# Arrays
$ awk '{count[$1]++} END {for (k in count) print k, count[k]}' file.txt
Formatted Output
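printf gives control over column width and alignment that print does not. A small sketch, assuming hypothetical item and quantity columns, with a vertical bar added only to make the padding visible:

```shell
# %-8s left-aligns the name in 8 columns; %4d right-aligns the number in 4.
table=$(printf 'apple 3\nkiwi 12\n' | awk '{ printf "%-8s|%4d\n", $1, $2 }')

# %.2f rounds to two decimal places.
rounded=$(awk 'BEGIN { printf "%.2f\n", 3.14159 }')

printf '%s\n' "$table" "$rounded"
```

Unlike print, printf does not add a newline, so the format string must include one.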
bash
# printf
$ awk '{printf "%-10s %5d\n", $1, $2}' file.txt
# Format specifiers
# %s String
# %d Integer
# %f Floating point
# %-10s Left-align in a field 10 characters wide
# %10s Right-align in a field 10 characters wide
Practical Examples
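Counting inside awk avoids a separate uniq step and keeps the output format predictable. The following sketch assumes hypothetical access-log lines whose first field is the client IP address.

```shell
# Hypothetical log lines: client IP, method, path.
log=$(printf '10.0.0.1 GET /\n10.0.0.2 GET /a\n10.0.0.1 POST /b')

# Count requests per IP, then sort by count, highest first.
top=$(printf '%s\n' "$log" |
  awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' |
  sort -rn)

# Adding 0 makes an unset counter print as 0 instead of an empty string.
none=$(printf '5\n' | awk '$1 > 1000 { count++ } END { print count + 0 }')

printf '%s\n' "$top" "$none"
```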
bash
# Count word frequency
$ awk '{for(i=1;i<=NF;i++) count[$i]++} END {for(w in count) print count[w], w}' file.txt | sort -rn
# Sum the sizes in bytes of the entries listed by ls -l
$ ls -l | awk '{sum += $5} END {print sum}'
# Process simple CSV (no quoted fields that contain commas)
$ awk -F ',' '{print $1 "\t" $2}' file.csv
# Extract IPs from log
$ awk '{print $1}' access.log | sort | uniq -c | sort -rn
# Conditional statistics (count+0 prints 0 when nothing matches)
$ awk '$3 > 1000 {count++} END {print count+0}' file.txt
# Merge every three lines into one tab-separated line
$ awk 'ORS=NR%3?"\t":"\n"' file.txt
diff and patch
diff - Compare Files
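In scripts, the exit status of diff is often more useful than its output: 0 means the files are identical, 1 means they differ, and 2 signals an error. A minimal sketch, assuming two hypothetical files in a temporary directory:

```shell
# Two hypothetical files that differ on the second line.
tmpdir=$(mktemp -d)
printf 'one\ntwo\n' > "$tmpdir/old.txt"
printf 'one\n2\n' > "$tmpdir/new.txt"

# -q reports only whether the files differ; the output is discarded here.
if diff -q "$tmpdir/old.txt" "$tmpdir/new.txt" > /dev/null; then
  status=same
else
  status=different
fi

# Comparing a file with itself always yields status 0.
if diff -q "$tmpdir/old.txt" "$tmpdir/old.txt" > /dev/null; then
  self=same
else
  self=different
fi

printf '%s\n' "$status" "$self"
```

Because a difference yields a non-zero status, a script running under set -e needs to handle the diff call explicitly.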
bash
# Basic comparison
$ diff file1.txt file2.txt
# Unified format
$ diff -u file1.txt file2.txt
# Side-by-side
$ diff -y file1.txt file2.txt
# Ignore whitespace
$ diff -w file1.txt file2.txt
# Recursive directory comparison
$ diff -r dir1/ dir2/
# Generate patch
$ diff -u old.txt new.txt > changes.patch
patch - Apply Patch
bash
# Apply patch
$ patch < changes.patch
# Specify file
$ patch file.txt < changes.patch
# Reverse (undo)
$ patch -R < changes.patch
# Dry run
$ patch --dry-run < changes.patch
comm - Compare Sorted Files
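comm expects both inputs to be sorted in the same collation order, so unsorted data should pass through sort first. The sketch below assumes two hypothetical word lists and fixes the locale to C so that sort and comm agree.

```shell
# Sort two hypothetical lists into temporary files.
tmpdir=$(mktemp -d)
printf 'pear\napple\nfig\n' | LC_ALL=C sort > "$tmpdir/a.sorted"
printf 'fig\nkiwi\napple\n' | LC_ALL=C sort > "$tmpdir/b.sorted"

# -12 hides columns 1 and 2, leaving lines present in both files.
both=$(LC_ALL=C comm -12 "$tmpdir/a.sorted" "$tmpdir/b.sorted")

# -23 hides columns 2 and 3, leaving lines found only in the first file.
only_a=$(LC_ALL=C comm -23 "$tmpdir/a.sorted" "$tmpdir/b.sorted")

printf '%s\n' "$both" "$only_a"
```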
bash
# Display three columns: only in file1, only in file2, in both (both files must be sorted)
$ comm file1.txt file2.txt
# Show only lines common to both files
$ comm -12 file1.txt file2.txt
# Show only lines unique to file1
$ comm -23 file1.txt file2.txt
join - Join Files
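join pairs up lines from two files that share a key, much like a database inner join. A minimal sketch, assuming hypothetical users.txt and orders.txt files already sorted on their first field:

```shell
# Hypothetical files keyed on the first field, already in sorted order.
tmpdir=$(mktemp -d)
printf '1 alice\n2 bob\n3 carol\n' > "$tmpdir/users.txt"
printf '1 book\n2 pen\n' > "$tmpdir/orders.txt"

# Default: only keys present in both files are printed.
joined=$(join "$tmpdir/users.txt" "$tmpdir/orders.txt")

# -a 1 also keeps unpaired lines from the first file.
all_users=$(join -a 1 "$tmpdir/users.txt" "$tmpdir/orders.txt")

printf '%s\n' "$joined" "$all_users"
```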
bash
# Join on the first field (both files must be sorted on the join field)
$ join file1.txt file2.txt
# Join field 2 of file1 with field 1 of file2
$ join -1 2 -2 1 file1.txt file2.txt
# Specify separator
$ join -t ':' file1.txt file2.txt
Summary
This chapter introduced powerful Linux text processing tools:
- sed: Stream editor, suitable for line-oriented substitution, deletion, and insertion
- awk: Pattern processing language, suitable for field-based, structured data
- diff/patch: Comparing files and applying the resulting changes
- comm/join: Comparing and joining sorted files
Mastering sed and awk will greatly improve your text processing efficiency.