Text Processing in Linux: grep, awk, and Pipes That Actually Get Work Done

Published: December 15, 2025 at 05:31 PM EST
7 min read
Source: Dev.to

The Problem: Manually Searching Through Files

You need to find all error messages in a 10,000‑line log file. Or extract usernames from system files. Or count how many times a specific IP address appears in access logs.

Opening the file in an editor and searching manually? That’s slow and error‑prone.

Linux text‑processing tools turn these tasks into one‑line commands.

The cut Command – Extract Columns

cut extracts specific characters or fields from each line.

By Character Position

# Get first character from each line
cut -c1 file.txt

# Get characters 1‑3
cut -c1-3 file.txt

# Get characters 1, 2, and 4
cut -c1,2,4 file.txt

Real Example – Extract File Permissions

ls -l | cut -c1-10
# Output: drwxr-xr-x, -rw-r--r--, etc.
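
By Field

cut can also split each line on a delimiter with -d and select fields with -f. A quick sketch against /etc/passwd, which separates its fields with colons:

# Get the first field (username) from each line
cut -d: -f1 /etc/passwd

# Get fields 1 and 6 (username and home directory)
cut -d: -f1,6 /etc/passwd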

The awk Command – Pattern Scanning and Processing

awk is powerful for extracting and manipulating fields (columns).

Basic Field Extraction

# Print first column
awk '{print $1}' file.txt

# Print first and third columns
awk '{print $1, $3}' file.txt

# Print last column (NF = number of fields)
ls -l | awk '{print $NF}'
# Shows filenames from ls -l output

Search and Print

# Find lines containing "Jerry" and print them
awk '/Jerry/ {print}' file.txt

# Or shorter:
awk '/Jerry/' file.txt

Change Field Delimiter

# Use colon as delimiter (common in /etc/passwd)
awk -F: '{print $1}' /etc/passwd
# Output: list of all usernames
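
awk can also filter on field values before printing. A small sketch that lists regular (non-system) accounts, assuming the common convention that regular users start at UID 1000 (the exact threshold varies by distribution):

# Print usernames whose UID (field 3) is at least 1000
# (1000 is the usual starting UID for regular users, but check your distro)
awk -F: '$3 >= 1000 {print $1}' /etc/passwd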

Modify Fields

# Replace second field with "JJ"
echo "Hello Tom" | awk '{$2="JJ"; print $0}'
# Output: Hello JJ
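
awk handles arithmetic on fields too. A minimal sketch:

# Double the third field
echo "item 5 10" | awk '{$3 = $3 * 2; print $0}'
# Output: item 5 20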

Filter by Length

# Get lines longer than 15 characters
awk 'length($0) > 15' file.txt

Real‑World Example – Extract IP Addresses

# Get IP addresses from access log
awk '{print $1}' /var/log/nginx/access.log

# Count requests per unique IP
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c

The grep Command – Search Text

grep (global regular expression print) searches for keywords in files or output.

# Find keyword in file
grep keyword filename

# Search in output
ls -l | grep Desktop

Useful Flags

# Count occurrences
grep -c keyword file.txt

# Ignore case
grep -i keyword file.txt
# Finds: keyword, Keyword, KEYWORD

# Show line numbers
grep -n keyword file.txt
# Output: 5:line with keyword

# Exclude lines with keyword (invert match)
grep -v keyword file.txt

Real‑World Example – Find Errors in Logs

# Find all error lines
grep -i error /var/log/syslog

# Count errors
grep -i error /var/log/syslog | wc -l

# Find errors but exclude specific ones
grep -i error /var/log/syslog | grep -v "ignore_this_error"

The egrep Command – Multiple Keywords

egrep (shorthand for grep -E) searches for multiple patterns at once. Newer versions of GNU grep mark egrep as deprecated, so prefer grep -E in scripts.

# Search for keyword1 OR keyword2
egrep -i "keyword1|keyword2" file.txt

# Find lines with error or warning
egrep -i "error|warning" /var/log/syslog

The sort Command – Alphabetical Ordering

# Sort alphabetically
sort file.txt

# Reverse sort
sort -r file.txt

# Sort by second field
sort -k2 file.txt

Real Example – Sort By File Size

# Sort files by size (5th column in ls -l)
ls -l | sort -k5 -n
# -n flag for numerical sort

The uniq Command – Remove Duplicates

uniq filters out repeated lines. Important: The input must be sorted first.

# Remove duplicates
sort file.txt | uniq

# Count duplicates
sort file.txt | uniq -c
# Output: 3 line_content (appears 3 times)

# Show only duplicates
sort file.txt | uniq -d
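
Why sort first? uniq only collapses adjacent duplicate lines, which a quick demonstration makes obvious:

# Without sorting, non-adjacent duplicates survive
printf 'a\nb\na\n' | uniq
# Output: a, b, a

# Sorted first, all duplicates are removed
printf 'a\nb\na\n' | sort | uniq
# Output: a, b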

Real Example – Most Common Log Entries

# Find most common errors
grep error /var/log/syslog | sort | uniq -c | sort -rn | head -10

Breaking it down

Step         Purpose
grep error   Find error lines
sort         Group identical lines together
uniq -c      Count occurrences of each line
sort -rn     Sort by count (numeric, descending)
head -10     Show top 10 results

The wc Command – Count Lines, Words, Bytes

wc (word count) reads files and reports counts.

# Count lines, words, bytes
wc file.txt
# Output: 45 300 2000 file.txt

# Only lines
wc -l file.txt

# Only words
wc -w file.txt

# Only bytes
wc -c file.txt

Real Examples

# Count files in a directory (ls -l also prints a "total" line, so subtract 1)
ls -l | wc -l

# Count how many times a keyword appears
grep keyword file.txt | wc -l

# Count total lines of code in Python files
find . -name "*.py" -exec wc -l {} \; | awk '{sum+=$1} END {print sum}'

Comparing Files – diff

Line‑by‑Line Comparison

# Compare files
diff file1.txt file2.txt

# Output shows differences:
# < lines unique to file1
# > lines unique to file2
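
A minimal worked example with two throwaway files:

printf 'apple\nbanana\n' > /tmp/a.txt
printf 'apple\ncherry\n' > /tmp/b.txt
diff /tmp/a.txt /tmp/b.txt
# Output:
# 2c2
# < banana
# ---
# > cherry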

Byte‑by‑Byte Comparison – cmp

# Compare files
cmp file1.txt file2.txt

# Output: first byte that differs
# No output if files are identical

Combining Commands with Pipes

The real power comes from chaining commands together.

Example 1: Find and Count

# How many users have /bin/bash as their shell?
grep "/bin/bash" /etc/passwd | wc -l

Example 2: Top 5 Largest Files

ls -lh | sort -k5 -h -r | head -5

Example 3: Extract and Sort

# Get all usernames and sort them
awk -F: '{print $1}' /etc/passwd | sort

Example 4: Search, Extract, Count

# Find IP addresses that accessed /admin
grep "/admin" /var/log/nginx/access.log \
    | awk '{print $1}' \
    | sort \
    | uniq -c \
    | sort -rn

This shows which IPs hit /admin most frequently.

Example 5: Log Analysis

# Find most common error types
grep -i error /var/log/app.log \
    | awk '{print $5}' \
    | sort \
    | uniq -c \
    | sort -rn \
    | head -10

Real‑World Scenarios

Scenario 1: Find Large Files

# Files larger than 100 MB
find / -type f -size +100M 2>/dev/null \
    | xargs ls -lh \
    | awk '{print $5, $NF}'

Scenario 2: Monitor Active Connections

# Count connections per IP
netstat -an | grep ESTABLISHED \
    | awk '{print $5}' \
    | cut -d: -f1 \
    | sort \
    | uniq -c \
    | sort -rn
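
netstat is deprecated on many modern distributions. A sketch of the same count with ss from iproute2, assuming ss -tn lists established TCP connections with the peer address in the fifth column:

ss -tn | awk 'NR > 1 {print $5}' \
    | cut -d: -f1 \
    | sort \
    | uniq -c \
    | sort -rn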

Scenario 3: Check Failed Login Attempts

# Count failed SSH attempts by IP
grep "Failed password" /var/log/auth.log \
    | awk '{print $11}' \
    | sort \
    | uniq -c \
    | sort -rn
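
Field positions in auth.log shift for "invalid user" entries, so pulling the IP out with a regex is more robust:

grep "Failed password" /var/log/auth.log \
    | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' \
    | sort \
    | uniq -c \
    | sort -rn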

Scenario 4: Disk Usage by Directory

# Top 10 directories by size
du -h /var | sort -h -r | head -10
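
du -h /var walks every nested directory. To compare only the immediate subdirectories, summarize each one with -s (sorting with -h assumes GNU coreutils):

du -sh /var/* 2>/dev/null | sort -h -r | head -10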

Scenario 5: Extract Email Addresses

# Find all email addresses in a file
grep -Eo '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' file.txt \
    | sort \
    | uniq

Common Patterns

Pattern 1: Search, Extract, Sort, Count

grep pattern file | awk '{print $2}' | sort | uniq -c | sort -rn
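
If you reach for this pattern often, it can live in a small shell function. topby is a hypothetical name, not a standard command:

# topby PATTERN COLUMN FILE - count the most frequent values of COLUMN
# among lines matching PATTERN
topby() { grep "$1" "$3" | awk -v col="$2" '{print $col}' | sort | uniq -c | sort -rn | head; }

# Usage: top source IPs among error lines
topby error 1 /var/log/nginx/access.log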

Pattern 2: Filter and Process

grep -v exclude_pattern file | awk '{print $1}'

Pattern 3: Multiple Conditions

egrep "error|warning" file | grep -v "ignore" | wc -l

Quick Reference

cut

cut -c1-3 file        # Characters 1‑3
cut -d: -f1 file      # First field (delimiter :)

awk

awk '{print $1}' file                # First column
awk -F: '{print $1}' file            # Custom delimiter
awk '/pattern/ {print}' file        # Pattern matching
awk '{print $NF}' file               # Last column
awk 'length($0) > 15' file           # Lines > 15 characters

grep

grep pattern file                   # Search
grep -i pattern file                # Ignore case
grep -c pattern file                # Count matches
grep -n pattern file                # Show line numbers
grep -v pattern file                # Invert (exclude)
egrep "pat1|pat2" file              # Multiple patterns

sort

sort file                           # Alphabetical
sort -r file                        # Reverse
sort -k2 file                       # By second field
sort -n file                        # Numerical

uniq

sort file | uniq                    # Remove duplicates
sort file | uniq -c                 # Count occurrences
sort file | uniq -d                 # Show only duplicates

wc

wc file                             # Lines, words, bytes
wc -l file                          # Lines only
wc -w file                          # Words only
wc -c file                          # Bytes only

diff / cmp

diff file1 file2                    # Line‑by‑line comparison
cmp file1 file2                     # Byte‑by‑byte comparison

Tips for Efficiency

Tip 1 – Use pipes instead of temporary files

# Instead of:
grep pattern file > temp.txt
sort temp.txt > sorted.txt

# Do:
grep pattern file | sort

Tip 2 – Combine grep with awk

# Filter then extract
grep error log.txt | awk '{print $1, $5}'

Tip 3 – Use awk instead of multiple cuts

# Instead of:
cut -d: -f1 file | cut -d- -f1

# Do:
awk -F: '{split($1, a, "-"); print a[1]}' file

Tip 4 – Test patterns on small samples first

# Test on first 10 lines
head -10 large_file.txt | grep pattern

Key Takeaways

  • cut – Extract characters or fields.
  • awk – Process fields, pattern matching, calculations.
  • grep – Search for patterns.
  • egrep / grep -E – Extended regular expressions (multiple patterns).
  • sort, uniq, wc – Organize, deduplicate, and count data.
  • diff / cmp – Compare files (line vs. byte).

Combine these tools with pipes (|) to build powerful, one‑liner command‑line workflows.

These commands aren't just for showing off; they solve real problems:

- Analyzing logs  
- Extracting data  
- Monitoring systems  
- Processing reports  
- Debugging issues  

Master these tools and manual file searching becomes a thing of the past.  

*What text‑processing task do you do most often? Share your go‑to command combinations in the comments.*
Article URL: https://12days.cmdchallenge.com Comments URL: https://news.ycombinator.com/item?id=46190577 Points: 31 Comments: 8...