Pipes and Filters

What is a Pipe?

The pipe is one of the most powerful features of Unix/Linux. The | symbol connects the standard output (stdout) of one command to the standard input (stdin) of the next, so several simple commands can be combined into a more complex data-processing flow.

┌─────────┐         ┌─────────┐         ┌─────────┐
│Command 1│  stdout │Command 2│  stdout │Command 3│
│         ├────────►│         ├────────►│         │
│         │    |    │         │    |    │         │
└─────────┘         └─────────┘         └─────────┘

Basic Syntax

command1 | command2 | command3 | ...

Simple Examples

# List files and page through the output
$ ls -la | less

# Count files
$ ls | wc -l

# Count lines containing "error"
$ grep "error" logfile.txt | wc -l

# Multi-level pipe
$ cat file.txt | grep "pattern" | sort | uniq

Filter Commands

A filter is a program that reads from standard input, transforms the data, and writes the result to standard output.
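
Any function or script that follows this stdin-to-stdout contract can act as a filter. The sketch below is illustrative only: the name shout and the sample input are made up for this example.

```shell
# A hypothetical filter: drops empty lines and upper-cases the rest.
# It reads only stdin and writes only stdout, never a named file.
shout() {
    grep -v '^$' | tr '[:lower:]' '[:upper:]'
}

# Because it uses only stdin/stdout, it can sit anywhere in a pipeline.
result=$(printf 'alpha\n\nbeta\n' | shout | sort -r)
printf '%s\n' "$result"
```

Here the function sits between printf and sort exactly as a built-in filter such as grep would.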

grep - Text Search

# Basic search
$ cat file.txt | grep "pattern"

# Case-insensitive
$ cat file.txt | grep -i "pattern"

# Show line numbers
$ cat file.txt | grep -n "pattern"

# Reverse match
$ cat file.txt | grep -v "pattern"

# Count matching lines
$ cat file.txt | grep -c "pattern"

# Show only matching part
$ cat file.txt | grep -o "pattern"

# Extended regular expression
$ cat file.txt | grep -E "pattern1|pattern2"

# Show context
$ cat file.txt | grep -A 2 -B 2 "pattern"  # 2 lines before and after
$ cat file.txt | grep -C 3 "pattern"        # 3 lines before and after

sort - Sort

# Basic sort (alphabetical)
$ cat file.txt | sort

# Reverse sort
$ cat file.txt | sort -r

# Numeric sort
$ cat file.txt | sort -n

# Sort by specific column
$ cat file.txt | sort -k 2      # Key runs from column 2 to end of line
$ cat file.txt | sort -k 2,2    # Key is column 2 only
$ cat file.txt | sort -k 2 -n   # Column 2 numeric sort

# Specify field separator (sort by column 3, numerically)
$ cat file.txt | sort -t ':' -k 3 -n

# Sort and deduplicate
$ cat file.txt | sort -u

# Human-readable size sort
$ du -h | sort -h

# Random sort
$ cat file.txt | sort -R
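
The difference between lexical and numeric keys is easiest to see on a small sample. This is a minimal sketch: the names and scores are invented, and LC_ALL=C is set only to make the byte-wise ordering predictable.

```shell
# Made-up sample data: a name and a score, separated by a space.
data=$(printf 'carol 9\nalice 100\nbob 25\n')

# Lexical comparison of column 2: as strings, "100" < "25" < "9".
lexical=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2,2)

# Numeric comparison of column 2: 9 < 25 < 100.
numeric=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2,2n)

printf 'lexical:\n%s\nnumeric:\n%s\n' "$lexical" "$numeric"
```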

uniq - Deduplicate

# Remove duplicate lines (uniq only collapses adjacent duplicates, so sort first)
$ cat file.txt | sort | uniq

# Show only duplicate lines
$ cat file.txt | sort | uniq -d

# Show only non-duplicate lines
$ cat file.txt | sort | uniq -u

# Count each line's occurrences
$ cat file.txt | sort | uniq -c

# Sort by occurrence count
$ cat file.txt | sort | uniq -c | sort -rn

# Case-insensitive (sort -f folds case so that variants end up adjacent)
$ cat file.txt | sort -f | uniq -i
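
Why sorting comes first can be shown with a few lines of made-up input: uniq compares each line only with the one directly before it.

```shell
# Made-up input in which the duplicates are not all adjacent.
input=$(printf 'b\na\nb\na\na\n')

# Without sort, only the final run of "a" lines is collapsed.
unsorted=$(printf '%s\n' "$input" | uniq)

# With sort, equal lines become neighbours and each value survives once.
sorted=$(printf '%s\n' "$input" | sort | uniq)

printf 'uniq alone:\n%s\nsort | uniq:\n%s\n' "$unsorted" "$sorted"
```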

cut - Cut Columns

# Cut by character position
$ echo "Hello World" | cut -c 1-5
Hello

# Cut by fields (Tab separated by default)
$ cat file.txt | cut -f 1,3

# Specify separator
$ cat /etc/passwd | cut -d ':' -f 1,3
# Extract username and UID

# Specify range
$ cat file.txt | cut -d ',' -f 2-4    # Columns 2 to 4
$ cat file.txt | cut -d ',' -f 3-     # Column 3 to end
$ cat file.txt | cut -d ',' -f -3     # Columns 1 to 3

# Cut by bytes
$ cat file.txt | cut -b 1-10
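
A short sketch of field selection on a single passwd-style record. The record is invented for illustration and is not read from the real /etc/passwd.

```shell
# Hypothetical record in /etc/passwd layout (7 colon-separated fields).
record='alice:x:1001:1001:Alice Example:/home/alice:/bin/bash'

# Fields 1 and 3: cut joins selected fields with the input delimiter.
name_uid=$(printf '%s\n' "$record" | cut -d ':' -f 1,3)

# Field 6 through the last field.
from_sixth=$(printf '%s\n' "$record" | cut -d ':' -f 6-)

printf '%s\n%s\n' "$name_uid" "$from_sixth"
```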

paste - Merge Columns

# Merge files side by side
$ paste file1.txt file2.txt

# Specify separator
$ paste -d ',' file1.txt file2.txt

# Merge one file's lines into one line
$ paste -s file.txt

# Merge every N lines
$ cat file.txt | paste - - -    # Merge every 3 lines
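
Each - stands for standard input, so three of them make paste take three consecutive lines per output row. A small sketch with invented data:

```shell
# Made-up input: one value per line, three lines per logical record.
rows=$(printf 'alice\n30\nparis\nbob\n25\nrome\n' | paste -d ',' - - -)

# -s joins every input line into a single row instead.
joined=$(printf 'alice\n30\nparis\nbob\n25\nrome\n' | paste -s -d ',' -)

printf '%s\n%s\n' "$rows" "$joined"
```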

tr - Character Translation

# Translate characters
$ echo "hello" | tr 'a-z' 'A-Z'
HELLO

# Delete characters
$ echo "hello 123" | tr -d '0-9'
hello

# Squeeze repeated characters
$ echo "hello     world" | tr -s ' '
hello world

# Delete newlines
$ cat file.txt | tr -d '\n'

# Replace characters
$ echo "hello:world" | tr ':' ' '
hello world

# Delete non-printing characters
$ cat file.txt | tr -cd '[:print:]\n'

# Character classes
# [:alpha:] Letters
# [:digit:] Numbers
# [:alnum:] Letters and numbers
# [:space:] Whitespace characters
# [:lower:] Lowercase letters
# [:upper:] Uppercase letters

head and tail

# First 10 lines
$ cat file.txt | head

# First N lines
$ cat file.txt | head -n 5
$ head -n 5 file.txt

# All except last N lines (GNU head)
$ cat file.txt | head -n -5

# Last 10 lines
$ cat file.txt | tail

# Last N lines
$ cat file.txt | tail -n 5
$ tail -n 5 file.txt

# Start from line N
$ cat file.txt | tail -n +5

# Combined use (lines 5-10: take the first 10, then the last 6 of those)
$ cat file.txt | head -n 10 | tail -n 6
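
An inclusive line range can also be written without any subtraction by pairing head -n END with tail -n +START. The helper name line_range below is hypothetical and exists only for this sketch.

```shell
# Hypothetical helper: print lines START..END (inclusive) of stdin.
line_range() {
    start=$1
    end=$2
    head -n "$end" | tail -n "+$start"
}

# seq generates the numbers 1..20, one per line, as test input.
result=$(seq 1 20 | line_range 5 10)
printf '%s\n' "$result"
```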

wc - Statistics

# Count lines, words, bytes
$ cat file.txt | wc
    100     500    3000

# Count only lines
$ cat file.txt | wc -l

# Count only words
$ cat file.txt | wc -w

# Count only characters
$ cat file.txt | wc -m

# Count only bytes
$ cat file.txt | wc -c

# Longest line length
$ cat file.txt | wc -L

tee - Split Output

# Output to both screen and file
$ ls -la | tee filelist.txt

# Append mode
$ ls -la | tee -a filelist.txt

# Output to multiple files
$ ls -la | tee file1.txt file2.txt file3.txt

# Save while in middle of pipe
$ cat file.txt | grep "error" | tee errors.txt | wc -l
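
The mid-pipeline case can be tried safely in a scratch directory. This is a sketch with made-up log lines; the file name errors.txt simply mirrors the example above.

```shell
workdir=$(mktemp -d)

# tee writes a copy of the matching lines to a file and passes them on to wc.
count=$(printf 'ok start\nerror disk full\nok done\nerror timeout\n' \
    | grep 'error' \
    | tee "$workdir/errors.txt" \
    | wc -l | tr -d ' ')     # tr strips padding that some wc versions add

saved=$(cat "$workdir/errors.txt")
rm -r "$workdir"

printf 'count=%s\nsaved:\n%s\n' "$count" "$saved"
```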

xargs - Build Arguments

# Convert input to command arguments
$ echo "file1 file2 file3" | xargs rm

# Process one at a time
$ cat files.txt | xargs -n 1 rm

# Specify replacement position
$ find . -name "*.txt" | xargs -I {} cp {} /backup/

# Execute in parallel
$ cat urls.txt | xargs -n 1 -P 4 wget

# Handle filenames with spaces
$ find . -name "*.txt" -print0 | xargs -0 rm

# Interactive confirmation
$ find . -name "*.tmp" | xargs -p rm

# Show executed commands
$ echo "a b c" | xargs -t echo
echo a b c
a b c
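
The NUL-separated form can be tried safely on throwaway files. This sketch assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do); the file names are invented.

```shell
workdir=$(mktemp -d)
touch "$workdir/plain.tmp" "$workdir/with space.tmp" "$workdir/keep.txt"

# NUL-delimited names reach rm intact, even when they contain spaces.
find "$workdir" -name '*.tmp' -print0 | xargs -0 rm

remaining=$(ls "$workdir")
rm -r "$workdir"

printf '%s\n' "$remaining"
```

Without -print0 and -0, the name "with space.tmp" would be split into two arguments at the space.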

Practical Pipe Combinations

File Analysis

# Count file types
$ find . -type f | sed 's/.*\.//' | sort | uniq -c | sort -rn

# Find top 10 largest files
$ find . -type f -exec du -h {} + | sort -rh | head -10

# Count code lines
$ find . -name "*.py" | xargs wc -l | tail -1

# Find candidate duplicate files (same size; needs GNU find and uniq)
$ find . -type f -printf '%10s %p\n' | sort -n | uniq -D -w 10

Log Analysis

# Count IP access frequency
$ cat access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -10

# Find error logs
$ cat app.log | grep -i "error" | tail -20

# Filter by time
$ cat app.log | grep "2025-01-09" | grep "ERROR"

# Count HTTP status codes
$ cat access.log | awk '{print $9}' | sort | uniq -c | sort -rn

# Real-time error monitoring
$ tail -f app.log | grep --line-buffered "ERROR"
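
The counting pipelines above can be checked against a small sample. The log below is fabricated for illustration and assumes the common access-log layout, in which field 1 is the client IP and field 9 is the status code.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [09/Jan/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 512
10.0.0.2 - - [09/Jan/2025:10:00:01 +0000] "GET /a HTTP/1.1" 404 128
10.0.0.1 - - [09/Jan/2025:10:00:02 +0000] "GET /b HTTP/1.1" 200 256
10.0.0.3 - - [09/Jan/2025:10:00:03 +0000] "GET /c HTTP/1.1" 500 64
10.0.0.2 - - [09/Jan/2025:10:00:04 +0000] "GET /a HTTP/1.1" 404 128
10.0.0.1 - - [09/Jan/2025:10:00:05 +0000] "GET /d HTTP/1.1" 200 300
EOF

# Most frequent client IP.
top_ip=$(awk '{print $1}' "$log" | sort | uniq -c | sort -rn \
    | head -n 1 | awk '{print $2}')

# Status codes by frequency, rewritten as code=count.
statuses=$(awk '{print $9}' "$log" | sort | uniq -c | sort -rn \
    | awk '{print $2 "=" $1}')

rm "$log"
printf '%s\n%s\n' "$top_ip" "$statuses"
```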

Text Processing

# Extract email addresses
$ grep -E -o '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt

# Extract URLs
$ grep -E -o 'https?://[^ ]+' file.txt

# Count word frequency
$ cat file.txt | tr -s ' ' '\n' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn

# Delete blank lines
$ cat file.txt | grep -v '^$'

# Delete comment lines
$ cat config.txt | grep -v '^#'
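
A word-frequency count usually also needs punctuation stripped. The sketch below is one possible variant, not the only way to do it: it uses tr -cs to turn every run of non-letters into a single newline. The sample sentence is made up.

```shell
text='The cat and the dog. The end'

top=$(printf '%s\n' "$text" \
    | tr -cs '[:alpha:]' '\n' \
    | tr '[:upper:]' '[:lower:]' \
    | sort | uniq -c | sort -rn \
    | head -n 1 | awk '{print $2 " " $1}')   # most frequent word and its count

printf '%s\n' "$top"
```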

System Administration

# Top 10 processes by CPU usage
$ ps aux | sort -k 3 -rn | head -10

# Top 10 processes by memory usage
$ ps aux | sort -k 4 -rn | head -10

# View logged in users
$ who | cut -d ' ' -f 1 | sort | uniq

# View listening ports
$ ss -tlnp | grep LISTEN

# Find large directories
$ du -h --max-depth=1 | sort -rh | head -10

Data Transformation

# CSV to TSV
$ cat file.csv | tr ',' '\t'

# JSON field extraction (requires jq)
$ cat data.json | jq '.name'

# Join lines into one comma-separated line
$ cat file.txt | paste -s -d ','

# Split a comma-separated line into one item per line
$ cat file.txt | tr ',' '\n'

Pipes and Redirection Combinations

# Show output and errors on screen and save both to the same file
$ command 2>&1 | tee output.txt

# Pipe both standard output and error output
$ command 2>&1 | grep "error"

# Use process substitution
$ diff <(sort file1.txt) <(sort file2.txt)

# Process multiple inputs
$ cat file1.txt file2.txt | sort | uniq
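
To keep the two streams in separate files, redirect stderr before the pipe so that only stdout flows into tee. A minimal sketch: the function noisy is a hypothetical stand-in for a real command.

```shell
workdir=$(mktemp -d)

# Hypothetical command that writes one line to each stream.
noisy() {
    echo "result line"
    echo "warning line" >&2
}

# stdout goes down the pipe to tee; stderr goes straight to its own file.
shown=$(noisy 2> "$workdir/errors.txt" | tee "$workdir/output.txt")

out=$(cat "$workdir/output.txt")
err=$(cat "$workdir/errors.txt")
rm -r "$workdir"

printf 'shown=%s\nout=%s\nerr=%s\n' "$shown" "$out" "$err"
```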

Notes on Pipes

Pipe Buffering

# Control buffering with stdbuf
$ tail -f log.txt | stdbuf -oL grep "pattern"

# Use grep's --line-buffered option
$ tail -f log.txt | grep --line-buffered "pattern"

Pipes and Sub-shells

# Each pipeline command runs in a sub-shell, so variable changes are lost when it exits
$ count=0
$ cat file.txt | while read line; do
    ((count++))
done
$ echo $count  # 0, not the number of lines

# Solution 1: Use process substitution
$ count=0
$ while read line; do
    ((count++))
done < <(cat file.txt)
$ echo $count

# Solution 2: Use lastpipe (Bash 4.2+; takes effect only when job control is off, e.g. in a script or after set +m)
$ shopt -s lastpipe
$ count=0
$ cat file.txt | while read line; do
    ((count++))
done
$ echo $count
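
When the data is already in a file, a third option (not one of the two solutions above) is to drop the pipe and redirect the file into the loop. It uses only POSIX syntax. The sketch below compares the two forms on a temporary three-line file.

```shell
input=$(mktemp)
printf 'one\ntwo\nthree\n' > "$input"

# Piped loop: the loop body runs in a sub-shell, so the increments are lost.
piped=0
cat "$input" | while IFS= read -r line; do
    piped=$((piped + 1))
done

# Redirected loop: runs in the current shell, so the count survives.
redirected=0
while IFS= read -r line; do
    redirected=$((redirected + 1))
done < "$input"

rm "$input"
echo "piped=$piped redirected=$redirected"
```

Under Bash's default settings this prints piped=0 redirected=3. Shells that run the last pipeline stage in the current shell, such as zsh, report 3 for both.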

Getting Pipe Status

# $? returns only last command's status
$ false | true
$ echo $?  # 0

# Use PIPESTATUS array (Bash)
$ false | true
$ echo ${PIPESTATUS[0]} ${PIPESTATUS[1]}  # 1 0

# Use pipefail option
$ set -o pipefail
$ false | true
$ echo $?  # 1
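
Because set -o pipefail changes behaviour for the rest of the session, it can be convenient to confine it to a sub-shell. The following is a sketch that assumes Bash, or another shell that supports the pipefail option.

```shell
# Default behaviour: the pipeline reports the status of its last command.
plain=0
( false | true ) || plain=$?

# With pipefail, confined to the parentheses: a failing stage anywhere
# in the pipeline makes the whole pipeline fail.
strict=0
( set -o pipefail; false | true ) || strict=$?

echo "plain=$plain strict=$strict"
```

The option does not leak out of the parentheses, so later pipelines in the same script keep the default behaviour.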

Summary

This chapter introduced Linux pipes and filters:

  • Pipe |: Connect commands, build data processing flows
  • grep: Text search
  • sort/uniq: Sorting and deduplication
  • cut/paste: Column operations
  • tr: Character translation
  • head/tail: View file beginning and end
  • wc: Statistics
  • tee: Split output
  • xargs: Build command arguments

The Unix philosophy advocates "do one thing and do it well"; pipes let us combine these simple tools to accomplish complex tasks. Fluency with pipes is key to working efficiently on Linux.

