Solved: Stop Storing Results in Variables. Pipe Them Instead.
Source: Dev.to
Executive Summary
TL;DR: Storing large command outputs in variables can exhaust memory and crash servers by loading all data into RAM simultaneously. Instead, leverage the PowerShell pipeline to stream data object-by-object, ensuring efficient, low-memory processing for large datasets.
- Storing command results in a "bucket" variable loads all objects into memory → memory exhaustion for large datasets.
- The PowerShell pipeline acts as a conveyor belt, processing data object-by-object and keeping a small, constant, predictable memory footprint.
- The "Filter-Left" principle: filter as early as possible (e.g., using native -Filter parameters) to minimise data transfer and memory usage.
- For massive datasets or when re-processing is needed, spool to disk (e.g., Export-Csv, then Import-Csv) to achieve near-zero memory usage.
- Using ForEach-Object (or its alias %) directly with piped input guarantees one-at-a-time processing, avoiding the overhead of foreach loops on pre-loaded variables.
Bottom line: Stop storing massive command outputs in variables. Learn to love the pipeline; it streams data object-by-object, preventing memory exhaustion and catastrophic script failures on production servers.
A Real-World Story
I remember it like it was yesterday: 3:00 PM on a Friday. A junior engineer was tasked with a "simple" cleanup script: find and log all temp files older than 30 days across our web farm. He wrote a one-liner:
$files = Get-ChildItem -Path \\web-cluster-*-c$\temp -Recurse
Ten minutes later, my pager went off. One by one, our entire production web fleet (prod-web-01 … prod-web-20) started throwing memory-pressure alerts and falling over.
Why? The script tried to load millions of file objects from 20 servers into a single variable on his management box, causing a resource nightmare.
We've all been there: a classic mistake born from procedural thinking instead of streaming. As a Reddit thread summed up:
"We don't recommend storing the results in a variable."
Variables vs. Pipeline
Variable-Based Approach
$myBigList = Get-ADUser -Filter *
PowerShell fetches every user, creates an object for each, and holds them all in $myBigList.
- 50 000 users → 50 000 objects in RAM.
- Works for a few dozen or hundred items, but catastrophic for thousands or millions.
Pipeline-Based Approach
Get-ADUser -Filter * | Where-Object {$_.Enabled -eq $false}
The pipeline is a conveyor belt:
- Get-ADUser emits the first user object.
- Where-Object evaluates it and decides whether to keep or discard it.
- The next object is emitted, and the cycle repeats.
Result: The memory footprint stays tiny, constant, and predictable, whether you process 100 or 10 million objects.
Pro Tip: A variable collects everything before you can act. A pipeline lets you act as items arrive. For large-scale automation, the pipeline is the only scalable approach.
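To make the conveyor-belt behaviour visible, here is a minimal sketch that records the order of events. Get-DemoItem is a hypothetical stand-in for any cmdlet that returns many objects (it is not a real command), and the log lists exist only for illustration.

```powershell
# Hypothetical stand-in for a cmdlet that returns many objects.
function Get-DemoItem {
    param(
        [int]$Count = 3,
        [System.Collections.Generic.List[string]]$Log
    )
    foreach ($i in 1..$Count) {
        $Log.Add("emit $i")
        $i   # written to the pipeline as soon as it is produced
    }
}

# Pipeline: each object is processed before the next one is emitted.
$streamLog = [System.Collections.Generic.List[string]]::new()
Get-DemoItem -Count 3 -Log $streamLog | ForEach-Object { $streamLog.Add("process $_") }
$streamed = $streamLog -join ', '

# Variable: everything is emitted and collected before processing starts.
$bucketLog = [System.Collections.Generic.List[string]]::new()
$all = Get-DemoItem -Count 3 -Log $bucketLog
foreach ($item in $all) { $bucketLog.Add("process $item") }
$collected = $bucketLog -join ', '

$streamed    # emit 1, process 1, emit 2, process 2, emit 3, process 3
$collected   # emit 1, emit 2, emit 3, process 1, process 2, process 3
```

The interleaved order in the first log is the whole point: downstream work starts while upstream is still producing, so nothing has to be held in bulk.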
Three Practical Approaches (Quick Fix → "Break-Glass")
1️⃣ Direct Pipeline to ForEach-Object (The Most Direct Solution)
Instead of saving to a variable and then iterating with foreach, pipe the output straight to ForEach-Object (alias %). This guarantees one-at-a-time processing.
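As a self-contained illustration of that one-at-a-time processing, the sketch below uses ForEach-Object's -Begin, -Process, and -End blocks to keep running totals in two scalars instead of building a collection. The number range is only a stand-in for real command output.

```powershell
# Running totals live in two scalars; no list of items is ever built.
$summary = 1..1000 | ForEach-Object -Begin {
    $count = 0
    $sum = 0
} -Process {
    # Runs once per incoming object.
    if ($_ % 2 -eq 0) {
        $count++
        $sum += $_
    }
} -End {
    # Runs once after the last object; emits a single summary object.
    [pscustomobject]@{ EvenCount = $count; Sum = $sum }
}

$summary
```

Only the small summary object is stored at the end, which is a reasonable use of a variable: the result is tiny even when the input is not.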
The Bad Way (Memory Hog)
# WARNING: Loads ALL VMs into memory first!
$allVMs = Get-VM -ComputerName prod-hyperv-cluster
foreach ($vm in $allVMs) {
    if ($vm.State -eq 'Off') {
        Write-Host "$($vm.Name) is currently off. Removing snapshot."
        # Pass -ComputerName so the lookup targets the VM's host, not the local machine.
        Get-VMSnapshot -VMName $vm.Name -ComputerName $vm.ComputerName | Remove-VMSnapshot
    }
}
The Good Way (Streaming)
# Processes one VM at a time. Beautiful.
Get-VM -ComputerName prod-hyperv-cluster | ForEach-Object {
    if ($_.State -eq 'Off') {
        Write-Host "$($_.Name) is currently off. Removing snapshot."
        # $_ represents the current object in the pipeline.
        # Pass -ComputerName so the lookup targets the VM's host, not the local machine.
        Get-VMSnapshot -VMName $_.Name -ComputerName $_.ComputerName | Remove-VMSnapshot
    }
}
2️⃣ Filter Early: the "Filter-Left" Principle
Do your filtering as far left (as early) in the command chain as possible. Avoid pulling massive data only to discard most of it locally.
Inefficient Way (Filter Late)
# Pulls ALL users, then filters. Bad for network & memory.
Get-ADUser -Filter * -Properties LastLogonDate |
    Where-Object {
        $_.Enabled -eq $false -and $_.LastLogonDate -lt (Get-Date).AddDays(-90)
    } |
    Select-Object Name
Efficient Way (Filter Left)
# Let the Domain Controller do the heavy lifting.
$ninetyDays = (Get-Date).AddDays(-90).ToFileTime()
Get-ADUser -Filter {
    Enabled -eq $false -and LastLogonTimestamp -lt $ninetyDays
} -Properties LastLogonTimestamp |
    Select-Object Name
By using the cmdlet's native -Filter parameter, you ask the AD server to return only the matching users, drastically reducing the number of objects that ever enter the pipeline.
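The same idea can be tried without a domain controller. The sketch below is an assumption-laden stand-in: it swaps Active Directory for a throwaway temp directory (the file names are made up) and compares filtering late with Where-Object against filtering left with Get-ChildItem's own -Filter parameter.

```powershell
# Build a small throwaway directory so the example is self-contained.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ("filterleft-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null
'a.log', 'b.log', 'c.txt' | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $dir $_) | Out-Null
}

# Filter late: every file becomes an object, then most are discarded locally.
$lateCount = (Get-ChildItem -Path $dir -File |
    Where-Object { $_.Extension -eq '.log' } |
    Measure-Object).Count

# Filter left: the provider only hands back entries that match the pattern.
$leftCount = (Get-ChildItem -Path $dir -File -Filter '*.log' |
    Measure-Object).Count

# Clean up the temporary directory.
Remove-Item -Path $dir -Recurse -Force

"late: $lateCount, left: $leftCount"
```

Both commands find the same files; the difference is how many objects are created along the way, which is what matters once the directory (or the directory service) holds millions of entries.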
3️⃣ Spool to Disk for Massive, Re-processable Datasets
When a dataset is truly huge and you need to process it multiple times (or the source API is slow), don't keep it in memory. Dump it to a file and stream it back when needed.
# Export once (near-zero memory)
Get-ADUser -Filter * -Properties * | Export-Csv -Path 'AllUsers.csv' -NoTypeInformation
# Later, stream it back for each processing pass
Import-Csv -Path 'AllUsers.csv' | Where-Object { $_.Enabled -eq $false } | ForEach-Object {
    # Process each filtered record...
}
The export/import cycle uses disk I/O, not RAM, allowing you to re-process the data without ever loading the full set into memory.
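One detail worth knowing about this round trip: Import-Csv returns every field as a string. The sketch below uses made-up sample records and a temp file (both are assumptions for illustration) and converts the Enabled column back to a Boolean explicitly rather than relying on implicit comparison rules.

```powershell
$csv = Join-Path ([System.IO.Path]::GetTempPath()) ("users-" + [guid]::NewGuid() + ".csv")

# Hypothetical sample records standing in for a large query result.
1..5 | ForEach-Object {
    [pscustomobject]@{ Name = "user$_"; Enabled = ($_ % 2 -eq 0) }
} | Export-Csv -Path $csv -NoTypeInformation

# Every field comes back as a string, so parse it before testing it.
$disabled = Import-Csv -Path $csv |
    Where-Object { -not [bool]::Parse($_.Enabled) } |
    ForEach-Object { $_.Name }

# Clean up the spool file.
Remove-Item -Path $csv

$disabled -join ','   # user1,user3,user5
```

Explicit parsing keeps the filter's intent obvious to the next reader and avoids surprises with columns such as dates or numbers, which are also plain strings after import.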
Takeaway
- Never store massive command outputs in a variable when you can stream them.
- Prefer the pipeline (|) for all data-processing tasks.
- Filter early using native -Filter parameters.
- Spool to disk when you must re-process huge datasets.
Adopt these patterns, and your scripts will scale gracefully without bringing down production servers.
Streaming Large Datasets with PowerShell
Spool the data to a temporary file on disk.
This is a "hacky" but incredibly effective method. You take the one-time performance hit of writing everything to a file (like a CSV or JSONL), then read that file back line-by-line, which has a near-zero memory footprint.
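For the JSONL variant mentioned above, a minimal sketch might look like the following. The records, property names, and temp path are all assumptions made up for the example; the technique is simply one compact JSON document per line, read back with Get-Content, which emits one line at a time when piped.

```powershell
$path = Join-Path ([System.IO.Path]::GetTempPath()) ("spool-" + [guid]::NewGuid() + ".jsonl")

# Write one compact JSON document per line (hypothetical sample records).
1..4 | ForEach-Object {
    [pscustomobject]@{
        TransactionID = $_
        Status        = $(if ($_ % 2) { 'Failed' } else { 'Ok' })
    }
} | ForEach-Object { $_ | ConvertTo-Json -Compress } | Set-Content -Path $path

# Read the file back line by line and parse each line on its own.
$failedIds = Get-Content -Path $path |
    ForEach-Object { $_ | ConvertFrom-Json } |
    Where-Object { $_.Status -eq 'Failed' } |
    ForEach-Object { $_.TransactionID }

# Clean up the spool file.
Remove-Item -Path $path

$failedIds -join ','   # 1,3
```

Unlike CSV, JSONL keeps numbers and Booleans typed and can hold nested properties, at the cost of a larger file.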
The Process
Step 1: Export the massive dataset to a file
# Export-Csv is great because it handles objects cleanly.
Get-VeryLargeDataset -Server prod-db-01 |
    Export-Csv -Path C:\temp\dataset.csv -NoTypeInformation
Step 2: Process the data by streaming it from the file
# Import-Csv streams the records one by one if piped.
Import-Csv -Path C:\temp\dataset.csv |
    ForEach-Object {
        # Work with each row ($_), one at a time.
        # The entire file is NOT loaded into memory.
        if ($_.Status -eq 'Failed') {
            Invoke-MyRetryLogic -ID $_.TransactionID
        }
    }
Step 3: Clean up after yourself!
Remove-Item -Path C:\temp\dataset.csv
Methods, Pros & Cons
| Method | Pros | Cons |
|---|---|---|
| 1. Pipelining | • Extremely low memory usage • Idiomatic PowerShell • Fast for most tasks | • Data is transient; can't easily re-process without re-running the initial command |
| 2. Filtering Left | • Most efficient method • Reduces memory, CPU, and network load | • Relies on the source command having robust server-side filtering capabilities |
| 3. Spooling to Disk | • Handles virtually infinite data sizes • Data is persistent for re-processing | • Slowest method due to disk I/O • Requires temporary disk space • More complex code |
Takeaway
The next time you start to type $results = …, pause for a second. Ask yourself:
"How many items could this command possibly return?"
If the answer is "I don't know" or "a lot," do yourself (and your servers) a favor: ditch the variable and embrace the pipeline. Your future self, who isn't getting paged at 3 PM on a Friday, will thank you.
Read the original article on [TechResolve.blog]