# Pipes and Filters

## What is a Pipe?

A pipe is one of the most powerful features of Unix/Linux. Written as the `|` symbol, it takes the standard output of one command and feeds it to the standard input of another, letting you combine simple commands into complex data-processing pipelines.
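To make the idea concrete, a pipe replaces the manual temporary file you would otherwise need between two commands. A minimal sketch (the `/tmp` paths are just examples):

```shell
# Without a pipe: stage the result through a temporary file
ls -la > /tmp/listing.txt
wc -l < /tmp/listing.txt
rm /tmp/listing.txt

# With a pipe: the same result, no temporary file needed
ls -la | wc -l
```

Both forms count the lines of `ls -la` output; the pipe version also runs the two commands concurrently instead of one after the other.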
```
┌───────────┐          ┌───────────┐          ┌───────────┐
│ Command 1 │  stdout  │ Command 2 │  stdout  │ Command 3 │
│           ├─────────►│           ├─────────►│           │
└───────────┘          └───────────┘          └───────────┘
```

## Basic Syntax

```bash
command1 | command2 | command3 | ...
```

## Simple Examples
```bash
# List files and page through the output
$ ls -la | less
# Count files
$ ls | wc -l
# Search and count
$ grep "error" logfile.txt | wc -l
# Multi-stage pipeline
$ cat file.txt | grep "pattern" | sort | uniq
```

## Filter Commands
Filter commands are programs that read from standard input, process the data, and write the result to standard output.
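Any executable that reads stdin and writes stdout can sit in a pipeline, so you can build your own filters. A minimal sketch of a hypothetical custom filter (the script name `shout.sh` is just an example):

```shell
#!/bin/sh
# shout.sh - a one-line custom filter: upper-cases whatever it receives.
# Because it reads stdin and writes stdout, it composes with any other
# filter, e.g.:  cat file.txt | sh shout.sh | sort
echo "make me loud" | tr 'a-z' 'A-Z'
```

Running the example line prints `MAKE ME LOUD`; the same `tr` invocation is what the script body would contain.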
### grep - Text Search

```bash
# Basic search
$ cat file.txt | grep "pattern"
# Case-insensitive search
$ cat file.txt | grep -i "pattern"
# Show line numbers
$ cat file.txt | grep -n "pattern"
# Invert the match (show non-matching lines)
$ cat file.txt | grep -v "pattern"
# Count matching lines
$ cat file.txt | grep -c "pattern"
# Show only the matching part
$ cat file.txt | grep -o "pattern"
# Extended regular expressions
$ cat file.txt | grep -E "pattern1|pattern2"
# Show context
$ cat file.txt | grep -A 2 -B 2 "pattern"  # 2 lines after and 2 before
$ cat file.txt | grep -C 3 "pattern"       # 3 lines before and after
```

### sort - Sort
```bash
# Basic sort (alphabetical)
$ cat file.txt | sort
# Reverse sort
$ cat file.txt | sort -r
# Numeric sort
$ cat file.txt | sort -n
# Sort by a specific column
$ cat file.txt | sort -k 2     # From column 2 onward
$ cat file.txt | sort -k 2,2   # By column 2 only
$ cat file.txt | sort -k 2 -n  # Column 2, numerically
# Specify the field separator
$ cat file.txt | sort -t ':' -k 3 -n
# Sort and deduplicate
$ cat file.txt | sort -u
# Sort human-readable sizes
$ du -h | sort -h
# Random shuffle
$ cat file.txt | sort -R
```

### uniq - Deduplicate
```bash
# Remove duplicates (uniq only collapses adjacent lines, so sort first)
$ cat file.txt | sort | uniq
# Show only duplicated lines
$ cat file.txt | sort | uniq -d
# Show only lines that appear exactly once
$ cat file.txt | sort | uniq -u
# Count each line's occurrences
$ cat file.txt | sort | uniq -c
# Sort by occurrence count
$ cat file.txt | sort | uniq -c | sort -rn
# Case-insensitive comparison
$ cat file.txt | sort | uniq -i
```

### cut - Cut Columns
```bash
# Cut by character position
$ echo "Hello World" | cut -c 1-5
Hello
# Cut by fields (tab-separated by default)
$ cat file.txt | cut -f 1,3
# Specify the delimiter: extract username and UID
$ cat /etc/passwd | cut -d ':' -f 1,3
# Specify ranges
$ cat file.txt | cut -d ',' -f 2-4  # Columns 2 to 4
$ cat file.txt | cut -d ',' -f 3-   # Column 3 to end
$ cat file.txt | cut -d ',' -f -3   # Columns 1 to 3
# Cut by bytes
$ cat file.txt | cut -b 1-10
```

### paste - Merge Columns
```bash
# Merge files side by side
$ paste file1.txt file2.txt
# Specify the delimiter
$ paste -d ',' file1.txt file2.txt
# Join all of a file's lines into one line
$ paste -s file.txt
# Merge every N lines
$ cat file.txt | paste - - -  # Merge every 3 lines
```

### tr - Character Translation
```bash
# Translate characters
$ echo "hello" | tr 'a-z' 'A-Z'
HELLO
# Delete characters
$ echo "hello 123" | tr -d '0-9'
hello
# Squeeze repeated characters
$ echo "hello   world" | tr -s ' '
hello world
# Delete newlines
$ cat file.txt | tr -d '\n'
# Replace characters
$ echo "hello:world" | tr ':' ' '
hello world
# Delete non-printing characters
$ cat file.txt | tr -cd '[:print:]\n'
# Character classes:
# [:alpha:]  Letters
# [:digit:]  Digits
# [:alnum:]  Letters and digits
# [:space:]  Whitespace characters
# [:lower:]  Lowercase letters
# [:upper:]  Uppercase letters
```

### head and tail
```bash
# First 10 lines
$ cat file.txt | head
# First N lines
$ cat file.txt | head -n 5
$ head -n 5 file.txt
# Everything except the last N lines
$ cat file.txt | head -n -5
# Last 10 lines
$ cat file.txt | tail
# Last N lines
$ cat file.txt | tail -n 5
$ tail -n 5 file.txt
# From line N to the end
$ cat file.txt | tail -n +5
# Combined (lines 6-10)
$ cat file.txt | head -n 10 | tail -n 5
```

### wc - Statistics
```bash
# Count lines, words, and bytes
$ cat file.txt | wc
     100     500    3000
# Lines only
$ cat file.txt | wc -l
# Words only
$ cat file.txt | wc -w
# Characters only
$ cat file.txt | wc -m
# Bytes only
$ cat file.txt | wc -c
# Length of the longest line
$ cat file.txt | wc -L
```

### tee - Split Output
```bash
# Write to the screen and a file at the same time
$ ls -la | tee filelist.txt
# Append instead of overwrite
$ ls -la | tee -a filelist.txt
# Write to multiple files
$ ls -la | tee file1.txt file2.txt file3.txt
# Save an intermediate result in the middle of a pipeline
$ cat file.txt | grep "error" | tee errors.txt | wc -l
```

### xargs - Build Arguments
```bash
# Turn input into command arguments
$ echo "file1 file2 file3" | xargs rm
# Process one argument at a time
$ cat files.txt | xargs -n 1 rm
# Specify where the argument is substituted
$ find . -name "*.txt" | xargs -I {} cp {} /backup/
# Run in parallel (4 jobs at once)
$ cat urls.txt | xargs -n 1 -P 4 wget
# Handle filenames containing spaces
$ find . -name "*.txt" -print0 | xargs -0 rm
# Prompt before each command
$ find . -name "*.tmp" | xargs -p rm
# Print each command before running it
$ echo "a b c" | xargs -t echo
echo a b c
a b c
```

## Practical Pipe Combinations
### File Analysis

```bash
# Count files by extension
$ find . -type f | sed 's/.*\.//' | sort | uniq -c | sort -rn
# Find the 10 largest files
$ find . -type f -exec du -h {} + | sort -rh | head -10
# Count lines of code
$ find . -name "*.py" | xargs wc -l | tail -1
# Find potential duplicate files (same size)
$ find . -type f -exec du -b {} + | sort -n | uniq -d -w 10
```

### Log Analysis
```bash
# Top 10 client IPs by access count
$ cat access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -10
# Show recent error entries
$ cat app.log | grep -i "error" | tail -20
# Filter by date, then by level
$ cat app.log | grep "2025-01-09" | grep "ERROR"
# Count HTTP status codes
$ cat access.log | awk '{print $9}' | sort | uniq -c | sort -rn
# Monitor errors in real time
$ tail -f app.log | grep --line-buffered "ERROR"
```

### Text Processing
```bash
# Extract email addresses
$ grep -E -o '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt
# Extract URLs
$ grep -E -o 'https?://[^ ]+' file.txt
# Count word frequency
$ cat file.txt | tr -s ' ' '\n' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn
# Delete blank lines
$ cat file.txt | grep -v '^$'
# Delete comment lines
$ cat config.txt | grep -v '^#'
```

### System Administration
```bash
# Top 10 processes by CPU usage
$ ps aux | sort -k 3 -rn | head -10
# Top 10 processes by memory usage
$ ps aux | sort -k 4 -rn | head -10
# List logged-in users
$ who | cut -d ' ' -f 1 | sort | uniq
# Show listening ports
$ ss -tlnp | grep LISTEN
# Find the largest directories
$ du -h --max-depth=1 | sort -rh | head -10
```

### Data Transformation
```bash
# CSV to TSV
$ cat file.csv | tr ',' '\t'
# Extract a JSON field (requires jq)
$ cat data.json | jq '.name'
# Join lines into a comma-separated list
$ cat file.txt | paste -s -d ','
# Split a comma-separated list into lines
$ cat file.txt | tr ',' '\n'
```

## Pipes and Redirection Combinations
```bash
# Capture stdout and stderr together while still displaying them
$ command 2>&1 | tee output.txt
# Pipe error output along with standard output
$ command 2>&1 | grep "error"
# Process substitution
$ diff <(sort file1.txt) <(sort file2.txt)
# Combine multiple inputs
$ cat file1.txt file2.txt | sort | uniq
```

## Notes on Using Pipes
### Pipe Buffering

```bash
# Control buffering with stdbuf (line-buffer grep's output)
$ tail -f log.txt | stdbuf -oL grep "pattern"
# Or use grep's --line-buffered option
$ tail -f log.txt | grep --line-buffered "pattern"
```

### Pipes and Subshells
```bash
# Each pipeline stage runs in a subshell, so variable changes
# do not propagate back to the parent shell
$ count=0
$ cat file.txt | while read line; do
    ((count++))
  done
$ echo $count  # 0, not what you expected
# Solution 1: process substitution keeps the loop in the current shell
$ count=0
$ while read line; do
    ((count++))
  done < <(cat file.txt)
$ echo $count
# Solution 2: Bash's lastpipe option runs the last stage in the current shell
$ shopt -s lastpipe
$ count=0
$ cat file.txt | while read line; do
    ((count++))
  done
$ echo $count
```

### Getting a Pipeline's Exit Status
```bash
# $? only reports the last command's status
$ false | true
$ echo $?  # 0
# The PIPESTATUS array (Bash) holds every stage's status
$ false | true
$ echo ${PIPESTATUS[0]} ${PIPESTATUS[1]}  # 1 0
# With pipefail, the pipeline fails if any stage fails
$ set -o pipefail
$ false | true
$ echo $?  # 1
```

## Summary
This chapter introduced Linux pipes and filters:

- Pipe `|`: Connects commands to build data-processing pipelines
- grep: Text search
- sort/uniq: Sorting and deduplication
- cut/paste: Column operations
- tr: Character translation
- head/tail: View the beginning and end of a file
- wc: Count lines, words, and bytes
- tee: Split output to the screen and files
- xargs: Build command arguments from input

The Unix philosophy advocates "do one thing and do it well"; pipes let us combine these simple tools to accomplish complex tasks. Fluent use of pipes is key to working efficiently in Linux.
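As a closing illustration, the filters covered in this chapter compose into a single pipeline. A sketch (the input text is made up) that produces a word-frequency report:

```shell
# Word-frequency report built entirely from the filters above:
# split into one word per line, lower-case, sort, count occurrences,
# rank by count, and keep the top 3. The most frequent words come first.
printf 'the cat and the dog and the bird\n' \
  | tr -s ' ' '\n' \
  | tr '[:upper:]' '[:lower:]' \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -n 3
```

Each stage does one small job; the pipeline as a whole does something none of the individual tools could do alone.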
Previous chapter: Input/Output Redirection
Next chapter: Text Editors