Pipes and Filters

What is a Pipe?

A pipe is one of the most powerful features of Unix/Linux. It is written as the | symbol and connects the standard output (stdout) of one command to the standard input (stdin) of the next. This lets several simple commands be combined into a complex data-processing flow.

┌───────────┐        ┌───────────┐        ┌───────────┐
│ Command 1 │ stdout │ Command 2 │ stdout │ Command 3 │
│           ├───────►│           ├───────►│           │
│           │   |    │           │   |    │           │
└───────────┘        └───────────┘        └───────────┘

Basic Syntax

bash
command1 | command2 | command3 | ...
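
Conceptually, command1 | command2 gives the same result as saving the first command's output to a temporary file and then feeding that file to the second command. The difference is that with a pipe both commands run at the same time, and no file is written. The following minimal sketch compares the two approaches. The sample words and the mktemp file are made up for illustration only.

```bash
# Pipeline version: sort reads printf's output directly through the pipe
piped=$(printf 'banana\napple\ncherry\n' | sort)

# Temporary-file version: write the output first, then read it back
tmpfile=$(mktemp)
printf 'banana\napple\ncherry\n' > "$tmpfile"
via_file=$(sort < "$tmpfile")
rm -f "$tmpfile"

# Both print apple, banana, cherry (one per line)
echo "$piped"
echo "$via_file"
```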

Simple Examples

bash
# List files in long format and page through the output
$ ls -la | less

# Count files
$ ls | wc -l

# Count lines that contain "error"
$ grep "error" logfile.txt | wc -l

# Multi-level pipe
$ cat file.txt | grep "pattern" | sort | uniq

Filter Commands

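Filter commands are programs that read from standard input, process the data, and write the result to standard output. The examples below start with cat file.txt | to show each filter reading from a pipe. Most filters also accept a file name directly, so grep "pattern" file.txt works the same way.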
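
Because a filter is just something that reads stdin and writes stdout, a shell function can act as one too. The sketch below uses a hypothetical helper, number_lines, which is not a standard command. It shows the read-transform-write pattern and how the function then slots into a pipeline.

```bash
# number_lines is a hypothetical example filter, not a standard command:
# it reads stdin line by line and writes each line to stdout with a number
number_lines() {
    local n=0 line
    while IFS= read -r line; do
        n=$((n + 1))
        printf '%d: %s\n' "$n" "$line"
    done
}

# It can now sit in a pipeline like any other filter
printf 'alpha\nbeta\n' | number_lines
# 1: alpha
# 2: beta
```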

grep - Text Search

bash
# Basic search
$ cat file.txt | grep "pattern"

# Case-insensitive
$ cat file.txt | grep -i "pattern"

# Show line numbers
$ cat file.txt | grep -n "pattern"

# Reverse match
$ cat file.txt | grep -v "pattern"

# Count matching lines
$ cat file.txt | grep -c "pattern"

# Show only matching part
$ cat file.txt | grep -o "pattern"

# Extended regular expression
$ cat file.txt | grep -E "pattern1|pattern2"

# Show context
$ cat file.txt | grep -A 2 -B 2 "pattern"  # 2 lines before and after
$ cat file.txt | grep -C 3 "pattern"        # 3 lines before and after
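
Because -o prints each match on its own line, its output combines well with other filters. The following small sketch pulls every number out of some sample text, which is invented for this example.

```bash
# -E enables extended regular expressions, -o prints only the matched text,
# one match per line
numbers=$(printf 'order 12 shipped, 3 items\nrefund 450\n' | grep -Eo '[0-9]+')
echo "$numbers"
# 12
# 3
# 450

# The matches can be counted like any other lines
printf '%s\n' "$numbers" | wc -l
```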

sort - Sort

bash
# Basic sort (alphabetical)
$ cat file.txt | sort

# Reverse sort
$ cat file.txt | sort -r

# Numeric sort
$ cat file.txt | sort -n

# Sort by specific column
$ cat file.txt | sort -k 2      # From column 2 to the end of the line
$ cat file.txt | sort -k 2,2    # Only by column 2
$ cat file.txt | sort -k 2,2n   # Only by column 2, numeric sort

# Use ':' as the field separator, sort numerically by column 3
$ cat file.txt | sort -t ':' -k 3,3 -n

# Sort and deduplicate
$ cat file.txt | sort -u

# Human-readable size sort
$ du -h | sort -h

# Random order (GNU sort; identical lines stay grouped, use shuf for a true shuffle)
$ cat file.txt | sort -R
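
The difference between text and numeric sorting, and the effect of a restricted key, is easiest to see on a few made-up values. The sketch below sets LC_ALL=C so the character ordering does not depend on the local locale.

```bash
# Text sort compares character by character, so "100" comes before "9"
text_order=$(printf '10\n9\n100\n' | LC_ALL=C sort)      # 10, 100, 9
# Numeric sort compares values
num_order=$(printf '10\n9\n100\n' | LC_ALL=C sort -n)    # 9, 10, 100
# Sort only on column 2, numerically
by_col=$(printf 'b 2\na 10\nc 1\n' | LC_ALL=C sort -k 2,2n)   # c 1, b 2, a 10

echo "$text_order"
echo "$num_order"
echo "$by_col"
```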

uniq - Deduplicate

bash
# uniq only removes adjacent duplicates, so sort first
$ cat file.txt | sort | uniq

# Show only duplicate lines
$ cat file.txt | sort | uniq -d

# Show only non-duplicate lines
$ cat file.txt | sort | uniq -u

# Count each line's occurrences
$ cat file.txt | sort | uniq -c

# Sort by occurrence count
$ cat file.txt | sort | uniq -c | sort -rn

# Case-insensitive (sort -f folds case so variants end up adjacent)
$ cat file.txt | sort -f | uniq -i
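
Why sorting matters for uniq: uniq only compares each line with the one directly before it. A minimal sketch on three made-up lines:

```bash
# The two "a" lines are not adjacent, so uniq alone keeps both
unsorted=$(printf 'a\nb\na\n' | uniq)          # a, b, a
# After sorting they are adjacent and collapse into one
sorted=$(printf 'a\nb\na\n' | sort | uniq)     # a, b

echo "$unsorted"
echo "$sorted"
```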

cut - Cut Columns

bash
# Cut by character position
$ echo "Hello World" | cut -c 1-5
Hello

# Cut by fields (Tab separated by default)
$ cat file.txt | cut -f 1,3

# Specify separator
$ cat /etc/passwd | cut -d ':' -f 1,3    # Extract username and UID

# Specify range
$ cat file.txt | cut -d ',' -f 2-4    # Columns 2 to 4
$ cat file.txt | cut -d ',' -f 3-     # Column 3 to end
$ cat file.txt | cut -d ',' -f -3     # Columns 1 to 3

# Cut by bytes
$ cat file.txt | cut -b 1-10
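
Note that cut keeps the input delimiter between the fields it selects. The sketch below uses a single invented line in /etc/passwd format rather than reading the real file.

```bash
# A made-up /etc/passwd-style entry: name:password:UID:GID:comment:home:shell
entry='alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# Fields 1 (username) and 3 (UID), still joined by ':'
ids=$(echo "$entry" | cut -d ':' -f 1,3)       # alice:1000
# Field 6 onward (home directory and shell)
tail_fields=$(echo "$entry" | cut -d ':' -f 6-)  # /home/alice:/bin/bash

echo "$ids"
echo "$tail_fields"
```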

paste - Merge Columns

bash
# Merge files side by side
$ paste file1.txt file2.txt

# Specify separator
$ paste -d ',' file1.txt file2.txt

# Join all lines of a file into one line (Tab-separated)
$ paste -s file.txt

# Merge every N lines
$ cat file.txt | paste - - -    # Join every 3 lines into one
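
Each "-" argument makes paste read one more line from stdin, while -s joins everything serially. The sketch below generates its input with seq.

```bash
# Six input lines, joined three at a time with commas
grouped=$(seq 6 | paste -d ',' - - -)     # 1,2,3 then 4,5,6
# Every line joined into a single comma-separated line
joined=$(seq 6 | paste -s -d ',' -)       # 1,2,3,4,5,6

echo "$grouped"
echo "$joined"
```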

tr - Character Translation

bash
# Translate characters
$ echo "hello" | tr 'a-z' 'A-Z'
HELLO

# Delete characters
$ echo "hello 123" | tr -d '0-9'
hello

# Squeeze runs of a repeated character into one
$ echo "hello     world" | tr -s ' '
hello world

# Delete newlines
$ cat file.txt | tr -d '\n'

# Replace characters
$ echo "hello:world" | tr ':' ' '
hello world

# Delete non-printing characters (newlines are kept)
$ cat file.txt | tr -cd '[:print:]\n'

# Character classes
# [:alpha:] Letters
# [:digit:] Digits
# [:alnum:] Letters and numbers
# [:space:] Whitespace characters
# [:lower:] Lowercase letters
# [:upper:] Uppercase letters
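
tr maps the first set onto the second position by position, so the two sets should be the same length. A classic illustration is ROT13, which shifts each letter 13 places. Here it is used only to show set mapping, and the rot13 function name is just for this example. The last line shows a character class combined with -c (complement) and -d (delete).

```bash
# rot13 is an illustrative helper: 'A-Za-z' (52 letters) maps onto the
# same letters rotated by 13 positions
rot13() { tr 'A-Za-z' 'N-ZA-Mn-za-m'; }

echo "Hello" | rot13            # Uryyb
echo "Hello" | rot13 | rot13    # Hello (applying ROT13 twice restores the text)

# Complement of [:digit:] deleted: keep only the digits
digits=$(echo "tel: 555-0100" | tr -cd '[:digit:]')   # 5550100
echo "$digits"
```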

head and tail

bash
# First 10 lines
$ cat file.txt | head

# First N lines
$ cat file.txt | head -n 5
$ head -n 5 file.txt

# All except last N lines (GNU head)
$ cat file.txt | head -n -5

# Last 10 lines
$ cat file.txt | tail

# Last N lines
$ cat file.txt | tail -n 5
$ tail -n 5 file.txt

# From line 5 to the end
$ cat file.txt | tail -n +5

# Combined use (lines 5-10)
$ cat file.txt | head -n 10 | tail -n +5
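
The same head/tail combination can be wrapped into a reusable range filter. The sketch below uses print_range, a hypothetical helper name. It prints lines START through END of its input, inclusive.

```bash
# print_range is a hypothetical helper: print lines START..END (inclusive) of stdin
print_range() {
    local start=$1 end=$2
    # Skip to line START, then keep END - START + 1 lines
    tail -n +"$start" | head -n "$((end - start + 1))"
}

seq 20 | print_range 5 10   # prints 5 through 10, one per line
```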

wc - Statistics

bash
# Count lines, words, bytes
$ cat file.txt | wc
    100     500    3000

# Count only lines
$ cat file.txt | wc -l

# Count only words
$ cat file.txt | wc -w

# Count only characters
$ cat file.txt | wc -m

# Count only bytes
$ cat file.txt | wc -c

# Longest line length (GNU wc)
$ cat file.txt | wc -L
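
One detail worth knowing is that wc -l counts newline characters, not lines as a person would count them. As a result, a last line with no trailing newline is not counted. The following sketch uses two tiny made-up inputs. Some systems pad wc's output with spaces, so the comments give the values rather than the exact output.

```bash
# Two lines, each ending in a newline
with_newline=$(printf 'a\nb\n' | wc -l)      # 2
# Same text, but the last line has no trailing newline
without_newline=$(printf 'a\nb' | wc -l)     # 1

echo "$with_newline" "$without_newline"
```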

tee - Split Output

bash
# Output to both screen and file
$ ls -la | tee filelist.txt

# Append mode
$ ls -la | tee -a filelist.txt

# Output to multiple files
$ ls -la | tee file1.txt file2.txt file3.txt

# Save while in middle of pipe
$ cat file.txt | grep "error" | tee errors.txt | wc -l
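
To see that tee passes its input through unchanged while also saving it, here is a self-contained version of the last command. The log lines are invented, and a mktemp file stands in for errors.txt.

```bash
log=$(mktemp)   # temporary file standing in for errors.txt
count=$(printf 'ok\nerror: disk\nok\nerror: net\n' | grep 'error' | tee "$log" | wc -l)

saved=$(cat "$log")   # the same matching lines tee wrote to the file
rm -f "$log"

echo "$count"   # 2 lines counted by wc
echo "$saved"   # error: disk / error: net
```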

xargs - Build Arguments

bash
# Convert input to command arguments
$ echo "file1 file2 file3" | xargs rm

# Process one at a time
$ cat files.txt | xargs -n 1 rm

# Specify replacement position
$ find . -name "*.txt" | xargs -I {} cp {} /backup/

# Execute in parallel
$ cat urls.txt | xargs -n 1 -P 4 wget

# Handle filenames with spaces
$ find . -name "*.txt" -print0 | xargs -0 rm

# Interactive confirmation
$ find . -name "*.tmp" | xargs -p rm

# Show executed commands
$ echo "a b c" | xargs -t echo
echo a b c
a b c
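
The reason for -print0 with xargs -0 shows up as soon as a file name contains a space. By default, xargs splits its input on whitespace, so "report 2025.txt" would become two arguments. The sketch below works in a throwaway directory with invented file names.

```bash
dir=$(mktemp -d)
touch "$dir/report 2025.txt" "$dir/notes.txt"

# -print0 / -0 separate names with NUL bytes instead of whitespace,
# so the space in "report 2025.txt" does not split the argument
find "$dir" -name '*.txt' -print0 | xargs -0 rm

remaining=$(find "$dir" -type f | wc -l)
echo "$remaining"   # 0: both files were removed
rmdir "$dir"
```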

Practical Pipe Combinations

File Analysis

bash
# Count file types
$ find . -type f | sed 's/.*\.//' | sort | uniq -c | sort -rn

# Find top 10 largest files
$ find . -type f -exec du -h {} + | sort -rh | head -10

# Count total lines of Python code (works with many files and with spaces in names)
$ find . -name "*.py" -print0 | xargs -0 cat | wc -l

# Find candidate duplicate files: files that share the same size (GNU find/uniq)
$ find . -type f -printf '%12s %p\n' | sort -n | uniq -D -w 12
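
To see what the file-type counting pipeline produces, here is a sketch that runs it against a throwaway directory with invented file names. The -name '*.*' test is an addition in this sketch that skips files without any extension. Without it, the sed expression would report such files by their full path.

```bash
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt" "$dir/c.py" "$dir/README"

# Keep only the text after the last dot, then count and rank each extension
counts=$(find "$dir" -type f -name '*.*' | sed 's/.*\.//' | sort | uniq -c | sort -rn)
rm -r "$dir"

echo "$counts"   # "2 txt" on the first line, "1 py" on the second
```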

Log Analysis

bash
# Count IP access frequency
$ cat access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -10

# Find error logs
$ cat app.log | grep -i "error" | tail -20

# Filter by time
$ cat app.log | grep "2025-01-09" | grep "ERROR"

# Count HTTP status codes
$ cat access.log | awk '{print $9}' | sort | uniq -c | sort -rn

# Real-time error monitoring
$ tail -f app.log | grep --line-buffered "ERROR"
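
The awk field numbers above assume the common/combined access-log layout. There, field 1 is the client IP and field 9 is the HTTP status code. The following sketch runs two of the pipelines on three made-up log lines in that layout. The IPs come from the documentation-reserved 192.0.2.0/24 range.

```bash
# Fields: $1=IP $2=- $3=- $4=[date $5=zone] $6="method $7=path $8=protocol" $9=status $10=bytes
sample='192.0.2.10 - - [09/Jan/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 512
192.0.2.11 - - [09/Jan/2025:10:00:01 +0000] "GET /missing HTTP/1.1" 404 128
192.0.2.10 - - [09/Jan/2025:10:00:02 +0000] "GET /about HTTP/1.1" 200 734'

# Most frequent client IP
top_ip=$(echo "$sample" | awk '{print $1}' | sort | uniq -c | sort -rn | head -n 1)
# Status code counts, most common first
statuses=$(echo "$sample" | awk '{print $9}' | sort | uniq -c | sort -rn)

echo "$top_ip"     # 2 requests from 192.0.2.10
echo "$statuses"   # 200 twice, then 404 once
```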

Text Processing

bash
# Extract email addresses
$ grep -E -o '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt

# Extract URLs
$ grep -E -o 'https?://[^ ]+' file.txt

# Count word frequency
$ cat file.txt | tr -s ' ' '\n' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn

# Delete blank lines
$ cat file.txt | grep -v '^$'

# Delete comment lines
$ cat config.txt | grep -v '^#'
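
Here is the word-frequency pipeline traced on a two-line sample text, which is invented for illustration. tr -s ' ' '\n' puts one word on each line. The second tr lowercases everything, so "The" and "the" are counted together.

```bash
text='The cat saw the dog
the dog ran'

# One word per line, lowercase, then count and rank
freq=$(echo "$text" | tr -s ' ' '\n' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn)

# First line is the most frequent word: "the", 3 times
echo "$freq" | head -n 1
```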

System Administration

bash
# Top 10 processes by CPU usage
$ ps aux | sort -k 3 -rn | head -10

# Top 10 processes by memory usage
$ ps aux | sort -k 4 -rn | head -10

# List logged-in users (each name once)
$ who | cut -d ' ' -f 1 | sort | uniq

# View listening ports
$ ss -tlnp | grep LISTEN

# Find large directories
$ du -h --max-depth=1 | sort -rh | head -10

Data Transformation

bash
# CSV to TSV (only for simple CSV without quoted commas)
$ cat file.csv | tr ',' '\t'

# JSON field extraction (requires jq)
$ cat data.json | jq '.name'

# Join lines into one comma-separated line
$ cat file.txt | paste -s -d ','

# Split a comma-separated line into one item per line
$ cat file.txt | tr ',' '\n'
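
The last two transformations are inverses of each other. The following minimal round-trip sketch uses a made-up list.

```bash
# Split a comma-separated line into one item per line...
items=$(echo 'red,green,blue' | tr ',' '\n')             # red / green / blue
# ...and join the lines back with commas
joined=$(printf '%s\n' "$items" | paste -s -d ',' -)     # red,green,blue

echo "$joined"
```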

Pipes and Redirection Combinations

bash
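# Save stdout and stderr together to one file (and show them on screen)
$ command 2>&1 | tee output.txt

# Save stdout and stderr to different files
$ command > output.txt 2> errors.txt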

# Send stderr through the pipe along with stdout
$ command 2>&1 | grep "error"

# Use process substitution
$ diff <(sort file1.txt) <(sort file2.txt)

# Process multiple inputs
$ cat file1.txt file2.txt | sort | uniq
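
By default, only standard output enters a pipe. Standard error goes straight to the terminal unless 2>&1 redirects it before the |. The minimal sketch below uses noisy, a hypothetical function that writes one line to each stream.

```bash
# noisy is a hypothetical stand-in for a command that writes to both streams
noisy() {
    echo "result line"          # stdout
    echo "warning line" >&2     # stderr
}

# Without 2>&1 only stdout reaches wc; "warning line" still appears on the terminal
only_stdout=$(noisy | wc -l)     # 1
# With 2>&1 both streams go through the pipe
both=$(noisy 2>&1 | wc -l)       # 2

echo "$only_stdout" "$both"
```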

Notes on Pipes

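Pipe Buffering

When a program's standard output is a pipe or a file instead of a terminal, programs built on the C standard I/O library usually switch from line buffering to block buffering. In a pipeline that never ends, such as one fed by tail -f, lines passed on to the next command or a file can therefore arrive late and in bursts. stdbuf -oL and grep's --line-buffered option make the output flush after every line.
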
bash
# Control buffering with stdbuf
$ tail -f log.txt | stdbuf -oL grep "pattern"

# Use grep's --line-buffered option
$ tail -f log.txt | grep --line-buffered "pattern"

Pipes and Sub-shells

bash
# Each command in a pipeline runs in a subshell, so variables changed
# inside the loop do not reach the parent shell
$ count=0
$ cat file.txt | while IFS= read -r line; do
    ((count++))
done
$ echo $count  # prints 0, not the line count

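# Solution 1: Use process substitution
$ count=0
$ while IFS= read -r line; do
    ((count++))
done < <(cat file.txt)
$ echo $count  # number of lines in file.txt

# Solution 2: Use lastpipe (Bash 4.2+; only takes effect when job control
# is off, as in scripts, so it has no effect at an interactive prompt)
$ shopt -s lastpipe
$ count=0
$ cat file.txt | while IFS= read -r line; do
    ((count++))
done
$ echo $count  # number of lines in file.txt (when run in a script)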
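
Another way to get pipeline results into the current shell is to read them with mapfile (also called readarray, Bash 4+) from a process substitution. For a simple count, command substitution also works. This is a sketch with made-up input lines.

```bash
#!/usr/bin/env bash
# mapfile runs in the current shell; only the pipeline inside <( ) is a subshell
mapfile -t matches < <(printf 'error: a\ninfo: b\nerror: c\n' | grep 'error')
echo "${#matches[@]}"   # 2
echo "${matches[1]}"    # error: c

# For a plain count, command substitution is simpler
count=$(printf 'error: a\ninfo: b\nerror: c\n' | grep -c 'error')
echo "$count"           # 2
```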

Getting Pipe Status

bash
# $? holds only the last command's status
$ false | true
$ echo $?  # 0

# Use PIPESTATUS array (Bash)
$ false | true
$ echo ${PIPESTATUS[0]} ${PIPESTATUS[1]}  # 1 0

# Use pipefail option
$ set -o pipefail
$ false | true
$ echo $?  # 1
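
Any later command overwrites PIPESTATUS, and that includes echo. Scripts should therefore copy it immediately after the pipeline. The sketch below shows that pattern together with checking a pipefail pipeline inside if, which also keeps it safe under set -e.

```bash
#!/usr/bin/env bash
# Copy PIPESTATUS right away; the next command would overwrite it
false | true | true
statuses=("${PIPESTATUS[@]}")
echo "${statuses[*]}"   # 1 0 0

# With pipefail, the pipeline fails if any stage fails
set -o pipefail
if false | true; then
    rc=0
else
    rc=$?               # status of the failed pipeline
fi
set +o pipefail
echo "$rc"              # 1
```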

Summary

This chapter introduced Linux pipes and filters:

  • Pipe |: Connect commands, build data processing flows
  • grep: Text search
  • sort/uniq: Sorting and deduplication
  • cut/paste: Column operations
  • tr: Character translation
  • head/tail: View file beginning and end
  • wc: Statistics
  • tee: Split output
  • xargs: Build command arguments

The Unix philosophy advocates "do one thing and do it well", and pipes let us combine these simple tools to accomplish complex tasks. Using pipes fluently is key to working efficiently on Linux.