Text Processing in Linux: grep, awk, and Pipes That Actually Get Work Done
The Problem: Manually Searching Through Files
You need to find all error messages in a 10,000‑line log file. Or extract usernames from system files. Or count how many times a specific IP address appears in access logs.
Opening the file in an editor and searching manually? That’s slow and error‑prone.
Linux text‑processing tools turn these tasks into one‑line commands.
The cut Command – Extract Columns
cut extracts specific characters or fields from each line.
By Character Position
# Get first character from each line
cut -c1 file.txt
# Get characters 1‑3
cut -c1-3 file.txt
# Get characters 1, 2, and 4
cut -c1,2,4 file.txt
Real Example – Extract File Permissions
ls -l | cut -c1-10
# Output: drwxr-xr-x, -rw-r--r--, etc.
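By Delimited Field
cut can also split lines on a delimiter and pull out fields, which is what the -d and -f flags below do:
# Get the first field, using ":" as the delimiter
cut -d: -f1 /etc/passwd
# Get fields 1 and 7 (username and login shell)
cut -d: -f1,7 /etc/passwd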
The awk Command – Pattern Scanning and Processing
awk is powerful for extracting and manipulating fields (columns).
Basic Field Extraction
# Print first column
awk '{print $1}' file.txt
# Print first and third columns
awk '{print $1, $3}' file.txt
# Print last column (NF = number of fields)
ls -l | awk '{print $NF}'
# Shows filenames from ls -l output
Search and Print
# Find lines containing "Jerry" and print them
awk '/Jerry/ {print}' file.txt
# Or shorter:
awk '/Jerry/' file.txt
Change Field Delimiter
# Use colon as delimiter (common in /etc/passwd)
awk -F: '{print $1}' /etc/passwd
# Output: list of all usernames
Modify Fields
# Replace second field with "JJ"
echo "Hello Tom" | awk '{$2="JJ"; print $0}'
# Output: Hello JJ
Filter by Length
# Get lines longer than 15 characters
awk 'length($0) > 15' file.txt
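awk also does arithmetic across lines; a minimal sketch that totals the size column (field 5) of ls -l output:
# Sum the 5th column and print the total once at the end
ls -l | awk '{sum += $5} END {print sum " bytes"}'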
Real‑World Example – Extract IP Addresses
# Get IP addresses from access log
awk '{print $1}' /var/log/nginx/access.log
# Count requests per unique IP
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c
The grep Command – Search Text
grep (global regular expression print) searches for keywords in files or output.
Basic Search
# Find keyword in file
grep keyword filename
# Search in output
ls -l | grep Desktop
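The "re" in grep is a real regular expression, so anchors work too:
# Lines that start with "root" (^ anchors to line start)
grep '^root' /etc/passwd
# Lines that end with "bash" ($ anchors to line end)
grep 'bash$' /etc/passwd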
Useful Flags
# Count occurrences
grep -c keyword file.txt
# Ignore case
grep -i keyword file.txt
# Finds: keyword, Keyword, KEYWORD
# Show line numbers
grep -n keyword file.txt
# Output: 5:line with keyword
# Exclude lines with keyword (invert match)
grep -v keyword file.txt
Real‑World Example – Find Errors in Logs
# Find all error lines
grep -i error /var/log/syslog
# Count errors
grep -i error /var/log/syslog | wc -l
# Find errors but exclude specific ones
grep -i error /var/log/syslog | grep -v "ignore_this_error"
The egrep Command – Multiple Keywords
egrep (equivalent to grep -E) uses extended regular expressions, which make it easy to search for multiple patterns at once.
# Search for keyword1 OR keyword2
egrep -i "keyword1|keyword2" file.txt
# Find lines with error or warning
egrep -i "error|warning" /var/log/syslog
The sort Command – Alphabetical Ordering
# Sort alphabetically
sort file.txt
# Reverse sort
sort -r file.txt
# Sort by second field
sort -k2 file.txt
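One caveat worth a quick demo: the default sort is lexicographic, so multi-digit numbers land in surprising places unless you add -n:
# Text sort puts 10 before 9
printf '9\n10\n2\n' | sort
# Output: 10, 2, 9
printf '9\n10\n2\n' | sort -n
# Output: 2, 9, 10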
Real Example – Sort By File Size
# Sort files by size (5th column in ls -l)
ls -l | sort -k5 -n
# -n flag for numerical sort
The uniq Command – Remove Duplicates
uniq filters out repeated lines. Important: uniq only compares adjacent lines, so the input must be sorted first.
# Remove duplicates
sort file.txt | uniq
# Count duplicates
sort file.txt | uniq -c
# Output: 3 line_content (appears 3 times)
# Show only duplicates
sort file.txt | uniq -d
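To see why sorting matters, note that uniq only compares each line to the one directly above it:
# Unsorted input: the second "a" survives because it isn't adjacent to the first
printf 'a\nb\na\n' | uniq
# Output: a, b, a
printf 'a\nb\na\n' | sort | uniq
# Output: a, b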
Real Example – Most Common Log Entries
# Find most common errors
grep error /var/log/syslog | sort | uniq -c | sort -rn | head -10
Breaking it down
| Step | Purpose |
|---|---|
| grep error | Find error lines |
| sort | Group identical lines together |
| uniq -c | Count occurrences of each line |
| sort -rn | Sort by count (numeric, descending) |
| head -10 | Show top 10 results |
The wc Command – Count Lines, Words, Bytes
wc (word count) reads files and reports counts.
# Count lines, words, bytes
wc file.txt
# Output: 45 300 2000 file.txt
# Only lines
wc -l file.txt
# Only words
wc -w file.txt
# Only bytes
wc -c file.txt
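One subtlety: -c counts bytes, not characters. For multibyte UTF-8 text, -m counts characters (assuming a UTF-8 locale):
# "é" is 2 bytes in UTF-8; the trailing newline from echo counts too
echo "héllo" | wc -c
# Output: 7
echo "héllo" | wc -m
# Output: 6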
Real Examples
# Count files in a directory (note: ls -l also prints a "total" line, so subtract 1)
ls -l | wc -l
# Count how many times a keyword appears
grep keyword file.txt | wc -l
# Count total lines of code in Python files
find . -name "*.py" -exec wc -l {} \; | awk '{sum+=$1} END {print sum}'
Comparing Files – diff
Line‑by‑Line Comparison
# Compare files
diff file1.txt file2.txt
# Output shows differing lines:
# < line from file1
# > line from file2
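A small worked example, assuming two hypothetical files that differ only on their second line:
# file1.txt contains: apple, banana
# file2.txt contains: apple, cherry
diff file1.txt file2.txt
# Output:
# 2c2
# < banana
# ---
# > cherry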
Byte‑by‑Byte Comparison – cmp
# Compare files
cmp file1.txt file2.txt
# Output: first byte that differs
# No output if files are identical
Combining Commands with Pipes
The real power comes from chaining commands together.
Example 1: Find and Count
# How many users have /bin/bash as their shell?
grep "/bin/bash" /etc/passwd | wc -l
Example 2: Top 5 Largest Files
ls -lh | sort -k5 -h -r | head -5
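# sort -h understands the human-readable sizes (4.0K, 2.5M) that ls -lh prints (GNU sort)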
Example 3: Extract and Sort
# Get all usernames and sort them
awk -F: '{print $1}' /etc/passwd | sort
Example 4: Search, Extract, Count
# Find IP addresses that accessed /admin
grep "/admin" /var/log/nginx/access.log \
| awk '{print $1}' \
| sort \
| uniq -c \
| sort -rn
This shows which IPs hit /admin most frequently.
Example 5: Log Analysis
# Find most common error types
grep -i error /var/log/app.log \
| awk '{print $5}' \
| sort \
| uniq -c \
| sort -rn \
| head -10
Real‑World Scenarios
Scenario 1: Find Large Files
# Files larger than 100 MB
find / -type f -size +100M 2>/dev/null \
| xargs ls -lh \
| awk '{print $5, $NF}'
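# Caveat: xargs splits on whitespace, so filenames with spaces break this;
# find / -type f -size +100M -print0 2>/dev/null | xargs -0 ls -lh handles them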
Scenario 2: Monitor Active Connections
# Count connections per IP
netstat -an | grep ESTABLISHED \
| awk '{print $5}' \
| cut -d: -f1 \
| sort \
| uniq -c \
| sort -rn
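netstat is deprecated on many modern distros; ss gives similar data. Column positions vary between ss versions, so treat this as a sketch and check the header before trusting the field number:
# Same idea with ss; with a state filter, the peer address is usually field 4
ss -tn state established | awk 'NR > 1 {print $4}' | cut -d: -f1 | sort | uniq -c | sort -rn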
Scenario 3: Check Failed Login Attempts
# Count failed SSH attempts by IP
grep "Failed password" /var/log/auth.log \
| awk '{print $11}' \
| sort \
| uniq -c \
| sort -rn
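# Note: "invalid user" lines add two words, shifting the IP from $11 to $13,
# so filter those separately if the counts look off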
Scenario 4: Disk Usage by Directory
# Top 10 directories by size
du -h /var | sort -h -r | head -10
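If you only want the immediate subdirectories rather than every nested one, GNU du can cap the depth:
# Only one level deep, with permission errors suppressed
du -h --max-depth=1 /var 2>/dev/null | sort -hr | head -10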
Scenario 5: Extract Email Addresses
# Find all email addresses in a file
grep -Eo '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' file.txt \
| sort \
| uniq
Common Patterns
Pattern 1: Search, Extract, Sort, Count
grep pattern file | awk '{print $2}' | sort | uniq -c | sort -rn
Pattern 2: Filter and Process
grep -v exclude_pattern file | awk '{print $1}'
Pattern 3: Multiple Conditions
egrep "error|warning" file | grep -v "ignore" | wc -l
Quick Reference
cut
cut -c1-3 file # Characters 1‑3
cut -d: -f1 file # First field (delimiter :)
awk
awk '{print $1}' file # First column
awk -F: '{print $1}' file # Custom delimiter
awk '/pattern/ {print}' file # Pattern matching
awk '{print $NF}' file # Last column
awk 'length($0) > 15' file # Lines > 15 characters
grep
grep pattern file # Search
grep -i pattern file # Ignore case
grep -c pattern file # Count matches
grep -n pattern file # Show line numbers
grep -v pattern file # Invert (exclude)
egrep "pat1|pat2" file # Multiple patterns
sort
sort file # Alphabetical
sort -r file # Reverse
sort -k2 file # By second field
sort -n file # Numerical
uniq
sort file | uniq # Remove duplicates
sort file | uniq -c # Count occurrences
sort file | uniq -d # Show only duplicates
wc
wc file # Lines, words, bytes
wc -l file # Lines only
wc -w file # Words only
wc -c file # Bytes only
diff / cmp
diff file1 file2 # Line‑by‑line comparison
cmp file1 file2 # Byte‑by‑byte comparison
Tips for Efficiency
Tip 1 – Use pipes instead of temporary files
# Instead of:
grep pattern file > temp.txt
sort temp.txt > sorted.txt
# Do:
grep pattern file | sort
Tip 2 – Combine grep with awk
# Filter then extract
grep error log.txt | awk '{print $1, $5}'
Tip 3 – Use awk instead of multiple cuts
# Instead of:
cut -d: -f1 file | cut -d- -f1
# Do it in one awk pass:
awk -F: '{split($1, a, "-"); print a[1]}' file
Tip 4 – Test patterns on small samples first
# Test on first 10 lines
head -10 large_file.txt | grep pattern
Key Takeaways
- **cut** – Extract characters or fields.
- **awk** – Process fields, pattern matching, calculations.
- **grep** – Search for patterns.
- **egrep / grep -E** – Extended regular expressions (multiple patterns).
- **sort**, **uniq**, **wc** – Order, deduplicate, and count data (sort before uniq).
- **diff / cmp** – Compare files (line by line vs. byte by byte).
- **Pipes (|)** – Chain commands together.

Combine these tools with pipes (|) to build powerful, one‑liner command‑line workflows.
These commands aren't just for showing off; they solve real problems:
- Analyzing logs
- Extracting data
- Monitoring systems
- Processing reports
- Debugging issues
Master these tools and manual file searching becomes a thing of the past.
*What text‑processing task do you do most often? Share your go‑to command combinations in the comments.*