Text Processing in Linux: grep, awk, and Pipes That Actually Get Work Done
The Problem: Manually Searching Through Files
You need to find all error messages in a 10,000‑line log file. Or extract usernames from system files. Or count how many times a specific IP address appears in access logs.
Opening the file in an editor and searching manually? That’s slow and error‑prone.
Linux text‑processing tools turn these tasks into one‑line commands.
The cut Command – Extract Columns
cut extracts specific characters or fields from each line.
By Character Position
# Get first character from each line
cut -c1 file.txt
# Get characters 1‑3
cut -c1-3 file.txt
# Get characters 1, 2, and 4
cut -c1,2,4 file.txt
Real Example – Extract File Permissions
ls -l | cut -c1-10
# Output: drwxr-xr-x, -rw-r--r--, etc.
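By Delimited Field
cut can also split lines on a delimiter and pull out fields, which is what the -d and -f flags below do:
# Get the first field, using ":" as the delimiter
cut -d: -f1 /etc/passwd
# Get fields 1 and 7 (username and login shell)
cut -d: -f1,7 /etc/passwd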
The awk Command – Pattern Scanning and Processing
awk is powerful for extracting and manipulating fields (columns).
Basic Field Extraction
# Print first column
awk '{print $1}' file.txt
# Print first and third columns
awk '{print $1, $3}' file.txt
# Print last column (NF = number of fields)
ls -l | awk '{print $NF}'
# Shows filenames from ls -l output
Search and Print
# Find lines containing "Jerry" and print them
awk '/Jerry/ {print}' file.txt
# Or shorter:
awk '/Jerry/' file.txt
Change Field Delimiter
# Use colon as delimiter (common in /etc/passwd)
awk -F: '{print $1}' /etc/passwd
# Output: list of all usernames
Modify Fields
# Replace second field with "JJ"
echo "Hello Tom" | awk '{$2="JJ"; print $0}'
# Output: Hello JJ
Filter by Length
# Get lines longer than 15 characters
awk 'length($0) > 15' file.txt
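awk also does arithmetic across lines; a minimal sketch that totals the size column (field 5) of ls -l output:
# Sum the 5th column and print the total once at the end
ls -l | awk '{sum += $5} END {print sum " bytes"}'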
Real‑World Example – Extract IP Addresses
# Get IP addresses from access log
awk '{print $1}' /var/log/nginx/access.log
# Count requests per unique IP
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c
The grep Command – Search Text
grep (global regular expression print) searches for keywords in files or output.
Basic Search
# Find keyword in file
grep keyword filename
# Search in output
ls -l | grep Desktop
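The "re" in grep is a real regular expression, so anchors work too:
# Lines that start with "root" (^ anchors to line start)
grep '^root' /etc/passwd
# Lines that end with "bash" ($ anchors to line end)
grep 'bash$' /etc/passwd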
Useful Flags
# Count occurrences
grep -c keyword file.txt
# Ignore case
grep -i keyword file.txt
# Finds: keyword, Keyword, KEYWORD
# Show line numbers
grep -n keyword file.txt
# Output: 5:line with keyword
# Exclude lines with keyword (invert match)
grep -v keyword file.txt
Real‑World Example – Find Errors in Logs
# Find all error lines
grep -i error /var/log/syslog
# Count errors
grep -i error /var/log/syslog | wc -l
# Find errors but exclude specific ones
grep -i error /var/log/syslog | grep -v "ignore_this_error"
The egrep Command – Multiple Keywords
egrep (equivalent to grep -E) uses extended regular expressions, which make it easy to search for multiple patterns at once.
# Search for keyword1 OR keyword2
egrep -i "keyword1|keyword2" file.txt
# Find lines with error or warning
egrep -i "error|warning" /var/log/syslog
The sort Command – Alphabetical Ordering
# Sort alphabetically
sort file.txt
# Reverse sort
sort -r file.txt
# Sort by second field
sort -k2 file.txt
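One caveat worth a quick demo: the default sort is lexicographic, so multi-digit numbers land in surprising places unless you add -n:
# Text sort puts 10 before 9
printf '9\n10\n2\n' | sort
# Output: 10, 2, 9
printf '9\n10\n2\n' | sort -n
# Output: 2, 9, 10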
Real Example – Sort By File Size
# Sort files by size (5th column in ls -l)
ls -l | sort -k5 -n
# -n flag for numerical sort
The uniq Command – Remove Duplicates
uniq filters out repeated lines. Important: uniq only compares adjacent lines, so the input must be sorted first.
# Remove duplicates
sort file.txt | uniq
# Count duplicates
sort file.txt | uniq -c
# Output: 3 line_content (appears 3 times)
# Show only duplicates
sort file.txt | uniq -d
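To see why sorting matters, note that uniq only compares each line to the one directly above it:
# Unsorted input: the second "a" survives because it isn't adjacent to the first
printf 'a\nb\na\n' | uniq
# Output: a, b, a
printf 'a\nb\na\n' | sort | uniq
# Output: a, b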
Real Example – Most Common Log Entries
# Find most common errors
grep error /var/log/syslog | sort | uniq -c | sort -rn | head -10
Breaking it down
| Step | Purpose |
|---|---|
| grep error | Find error lines |
| sort | Group identical lines together |
| uniq -c | Count occurrences of each line |
| sort -rn | Sort by count (numeric, descending) |
| head -10 | Show top 10 results |
The wc Command – Count Lines, Words, Bytes
wc (word count) reads files and reports counts.
# Count lines, words, bytes
wc file.txt
# Output: 45 300 2000 file.txt
# Only lines
wc -l file.txt
# Only words
wc -w file.txt
# Only bytes
wc -c file.txt
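One subtlety: -c counts bytes, not characters. For multibyte UTF-8 text, -m counts characters (assuming a UTF-8 locale):
# "é" is 2 bytes in UTF-8; the trailing newline from echo counts too
echo "héllo" | wc -c
# Output: 7
echo "héllo" | wc -m
# Output: 6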
Real Examples
# Count files in a directory (note: ls -l also prints a "total" line, so subtract 1)
ls -l | wc -l
# Count how many times a keyword appears
grep keyword file.txt | wc -l
# Count total lines of code in Python files
find . -name "*.py" -exec wc -l {} \; | awk '{sum+=$1} END {print sum}'
Comparing Files – diff
Line‑by‑Line Comparison
# Compare files
diff file1.txt file2.txt
# Output shows differing lines:
# < line from file1
# > line from file2
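A small worked example, assuming two hypothetical files that differ only on their second line:
# file1.txt contains: apple, banana
# file2.txt contains: apple, cherry
diff file1.txt file2.txt
# Output:
# 2c2
# < banana
# ---
# > cherry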
Byte‑by‑Byte Comparison – cmp
# Compare files
cmp file1.txt file2.txt
# Output: first byte that differs
# No output if files are identical
Combining Commands with Pipes
The real power comes from chaining commands together.
Example 1: Find and Count
# How many users have /bin/bash as their shell?
grep "/bin/bash" /etc/passwd | wc -l
Example 2: Top 5 Largest Files
ls -lh | sort -k5 -h -r | head -5
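# sort -h understands the human-readable sizes (4.0K, 2.5M) that ls -lh prints (GNU sort)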
Example 3: Extract and Sort
# Get all usernames and sort them
awk -F: '{print $1}' /etc/passwd | sort
Example 4: Search, Extract, Count
# Find IP addresses that accessed /admin
grep "/admin" /var/log/nginx/access.log \
| awk '{print $1}' \
| sort \
| uniq -c \
| sort -rn
This shows which IPs hit /admin most frequently.
Example 5: Log Analysis
# Find most common error types
grep -i error /var/log/app.log \
| awk '{print $5}' \
| sort \
| uniq -c \
| sort -rn \
| head -10
Real‑World Scenarios
Scenario 1: Find Large Files
# Files larger than 100 MB
find / -type f -size +100M 2>/dev/null \
| xargs ls -lh \
| awk '{print $5, $NF}'
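# Caveat: xargs splits on whitespace, so filenames with spaces break this;
# find / -type f -size +100M -print0 2>/dev/null | xargs -0 ls -lh handles them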
Scenario 2: Monitor Active Connections
# Count connections per IP
netstat -an | grep ESTABLISHED \
| awk '{print $5}' \
| cut -d: -f1 \
| sort \
| uniq -c \
| sort -rn
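netstat is deprecated on many modern distros; ss gives similar data. Column positions vary between ss versions, so treat this as a sketch and check the header before trusting the field number:
# Same idea with ss; with a state filter, the peer address is usually field 4
ss -tn state established | awk 'NR > 1 {print $4}' | cut -d: -f1 | sort | uniq -c | sort -rn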
Scenario 3: Check Failed Login Attempts
# Count failed SSH attempts by IP
grep "Failed password" /var/log/auth.log \
| awk '{print $11}' \
| sort \
| uniq -c \
| sort -rn
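# Note: "invalid user" lines add two words, shifting the IP from $11 to $13,
# so filter those separately if the counts look off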
Scenario 4: Disk Usage by Directory
# Top 10 directories by size
du -h /var | sort -h -r | head -10
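If you only want the immediate subdirectories rather than every nested one, GNU du can cap the depth:
# Only one level deep, with permission errors suppressed
du -h --max-depth=1 /var 2>/dev/null | sort -hr | head -10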
Scenario 5: Extract Email Addresses
# Find all email addresses in a file
grep -Eo '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' file.txt \
| sort \
| uniq
Common Patterns
Pattern 1: Search, Extract, Sort, Count
grep pattern file | awk '{print $2}' | sort | uniq -c | sort -rn
Pattern 2: Filter and Process
grep -v exclude_pattern file | awk '{print $1}'
Pattern 3: Multiple Conditions
egrep "error|warning" file | grep -v "ignore" | wc -l
Quick Reference
cut
cut -c1-3 file # Characters 1‑3
cut -d: -f1 file # First field (delimiter :)
awk
awk '{print $1}' file # First column
awk -F: '{print $1}' file # Custom delimiter
awk '/pattern/ {print}' file # Pattern matching
awk '{print $NF}' file # Last column
awk 'length($0) > 15' file # Lines > 15 characters
grep
grep pattern file # Search
grep -i pattern file # Ignore case
grep -c pattern file # Count matches
grep -n pattern file # Show line numbers
grep -v pattern file # Invert (exclude)
egrep "pat1|pat2" file # Multiple patterns
sort
sort file # Alphabetical
sort -r file # Reverse
sort -k2 file # By second field
sort -n file # Numerical
uniq
sort file | uniq # Remove duplicates
sort file | uniq -c # Count occurrences
sort file | uniq -d # Show only duplicates
wc
wc file # Lines, words, bytes
wc -l file # Lines only
wc -w file # Words only
wc -c file # Bytes only
diff / cmp
diff file1 file2 # Line‑by‑line comparison
cmp file1 file2 # Byte‑by‑byte comparison
Tips for Efficiency
Tip 1 – Use pipes instead of temporary files
# Instead of:
grep pattern file > temp.txt
sort temp.txt > sorted.txt
# Do:
grep pattern file | sort
Tip 2 – Combine grep with awk
# Filter then extract
grep error log.txt | awk '{print $1, $5}'
Tip 3 – Use awk instead of multiple cuts
# Instead of:
cut -d: -f1 file | cut -d- -f1
# Do it in one awk pass:
awk -F: '{split($1, a, "-"); print a[1]}' file
Tip 4 – Test patterns on small samples first
# Test on first 10 lines
head -10 large_file.txt | grep pattern
Key Takeaways
- **cut** – Extract characters or fields.
- **awk** – Process fields, pattern matching, calculations.
- **grep** – Search for patterns.
- **egrep / grep -E** – Extended regular expressions (multiple patterns).
- **sort**, **uniq**, **wc** – Order, deduplicate, and count data (sort before uniq).
- **diff / cmp** – Compare files (line by line vs. byte by byte).
- **Pipes (|)** – Chain commands together.

Combine these tools with pipes (|) to build powerful, one‑liner command‑line workflows.
These commands aren't just for showing off; they solve real problems:
- Analyzing logs
- Extracting data
- Monitoring systems
- Processing reports
- Debugging issues
Master these tools and manual file searching becomes a thing of the past.
*What text‑processing task do you do most often? Share your go‑to command combinations in the comments.*