Scraping a Website Using a Symfony Console Command (Clean & Production-Friendly)
Source: Dev.to
Introduction
Web scraping doesn’t belong in controllers.
It’s long‑running, may fail, is often scheduled, and is essentially automation.
Symfony Console Commands are perfect for this scenario because they run inside your Symfony application with full access to its services and dependency‑injection container.
What the command does
- Scrape country data from a free sandbox site.
- Parse the HTML with DomCrawler.
- Sort the results alphabetically.
- Display a clean CLI table.
GitHub repository to follow along:
Scraping sandbox used in the demo:
Prerequisites
composer require symfony/http-client
composer require symfony/dom-crawler
composer require symfony/css-selector
Create the console command
php bin/console make:command app:create-countries
Inject the HTTP client (DI)
use Symfony\Contracts\HttpClient\HttpClientInterface;
public function __construct(
    private HttpClientInterface $client
) {
    parent::__construct(); // required: Command's own constructor must run
}
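Putting the pieces together, the generated command class might look like the sketch below. The class name follows the `app:create-countries` name above; the URL constant is a placeholder, not the actual sandbox address:

```php
<?php

namespace App\Command;

use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Contracts\HttpClient\HttpClientInterface;

#[AsCommand(
    name: 'app:create-countries',
    description: 'Scrapes country data and prints it as a table',
)]
class CreateCountriesCommand extends Command
{
    // Placeholder: point this at the sandbox page you are scraping.
    private const URL = 'https://example.com/countries';

    public function __construct(
        private HttpClientInterface $client
    ) {
        parent::__construct();
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $response = $this->client->request('GET', self::URL);
        $html = $response->getContent();

        // ...parse, sort, and print (covered in the sections below)...

        return Command::SUCCESS;
    }
}
```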
Fetch the page
$response = $this->client->request('GET', self::URL);
$html = $response->getContent(); // raw HTML
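Note that `getContent()` throws on transport failures and on 3xx–5xx responses, so a scheduled scraper should catch those and exit cleanly rather than crash. A minimal sketch, assuming it runs inside the command's `execute()` method:

```php
use Symfony\Contracts\HttpClient\Exception\ExceptionInterface;

try {
    $response = $this->client->request('GET', self::URL);
    $html = $response->getContent(); // throws on HTTP errors and network failures
} catch (ExceptionInterface $e) {
    $output->writeln('<error>Scrape failed: ' . $e->getMessage() . '</error>');

    return Command::FAILURE;
}
```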
Parse with DomCrawler
use Symfony\Component\DomCrawler\Crawler;
$crawler = new Crawler($html);
Extract country information
$countryInfo = [];
$crawler->filter('.country')->each(function (Crawler $row) use (&$countryInfo) {
$countryInfo[] = [
$row->filter('.country-name')->text(),
$row->filter('.country-capital')->text(),
$row->filter('.country-population')->text(),
$row->filter('.country-area')->text(),
];
});
Sort alphabetically by country name
usort($countryInfo, function ($a, $b) {
return strcasecmp($a[0], $b[0]);
});
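`strcasecmp()` compares byte by byte and is case-insensitive only for ASCII, which is usually enough for the sandbox's country names. A self-contained illustration of the sort (the sample rows are invented for the example, in the same `[name, capital, population, area]` shape as above):

```php
<?php

// Sample rows shaped like the scraped data: [name, capital, population, area].
$countryInfo = [
    ['france',  'Paris',            '66,000,000',  '640,679'],
    ['Andorra', 'Andorra la Vella', '84,000',      '468.0'],
    ['Brazil',  'Brasilia',         '201,000,000', '8,511,965'],
];

// Case-insensitive sort on the first column (the country name).
usort($countryInfo, fn (array $a, array $b) => strcasecmp($a[0], $b[0]));

print_r(array_column($countryInfo, 0)); // Andorra, Brazil, france
```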
Display results in a formatted table
// Header
printf(
"%-45s | %-20s | %15s | %15s\n",
"Country name",
"Capital",
"Population",
"Area (km2)"
);
// Rows
foreach ($countryInfo as $row) {
printf(
"%-45s | %-20s | %15s | %15s\n",
$row[0],
$row[1],
$row[2],
$row[3]
);
}
Tip: Add a multibyte‑safe padding helper if you need proper alignment with Unicode characters.
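One way to do that, assuming the `mbstring` extension is available: pad by display width with `mb_strwidth()` instead of letting `printf`'s `%-45s` count bytes (the helper name `mbPad` is made up for this sketch):

```php
<?php

/**
 * Pad $value with spaces up to $width display columns.
 * mb_strwidth() measures visible width, so accented names
 * line up where a byte-based %-45s would over-count.
 */
function mbPad(string $value, int $width): string
{
    return $value . str_repeat(' ', max(0, $width - mb_strwidth($value, 'UTF-8')));
}

printf("%s | %s\n", mbPad("Côte d'Ivoire", 45), 'Yamoussoukro');
```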
Run the command:
php bin/console app:create-countries
You should see a neatly aligned terminal table rather than raw debug output.
Benefits of this approach
- Separation of concerns – scraping logic lives outside controllers.
- Cron‑friendly – easy to schedule with cron or Symfony’s scheduler.
- Clean architecture – reusable, testable, and maintainable code.
- Scalable – can be refactored into asynchronous jobs if needed.
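Scheduling the command is then a one-line crontab entry; the project path and log file below are illustrative:

```shell
# Run the scraper every night at 03:00 (adjust the path to your project)
0 3 * * * cd /var/www/myapp && php bin/console app:create-countries >> var/log/scrape.log 2>&1
```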
Scraping etiquette (before targeting real sites)
- Check the site’s Terms of Service.
- Respect robots.txt.
- Avoid aggressive request rates; add delays when scraping multiple pages:
sleep(1); // pause 1 second between requests
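When a scrape spans several pages, the pause can live in the fetch loop. A sketch with the fetcher passed in as a callable so the pacing logic stands on its own (the function name and URL list are made up for the example):

```php
<?php

/**
 * Fetch each URL via $fetch, sleeping $delaySeconds between
 * requests so the target site is not hammered.
 */
function fetchPolitely(array $urls, callable $fetch, int $delaySeconds = 1): array
{
    $pages = [];
    foreach ($urls as $i => $url) {
        if ($i > 0) {
            sleep($delaySeconds); // pause between requests, not before the first
        }
        $pages[$url] = $fetch($url);
    }

    return $pages;
}
```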
Full source code
The complete working project, including the command implementation, formatting helpers, and setup instructions, is available on GitHub:
When to use this pattern
- Data aggregation tools
- Monitoring systems
- Intelligence platforms
- Background automation jobs
Symfony Console Commands combined with DomCrawler provide an underrated yet powerful solution for these use cases.
Next steps
In part 2 of this series, the scraped results will be persisted to a database.