AWS re:Invent 2025 - Illumina DRAGEN pipelines on F2 instances with Nextflow & AWS Batch (CMP353)

Published: December 5, 2025 at 12:39 AM EST
3 min read
Source: Dev.to

Introduction

In this session, Marissa Powers (AWS) and Sean O'Dell (AstraZeneca Centre for Genomics Research) discuss migrating Illumina DRAGEN genomics pipelines from F1 to F2 FPGA instances. The migration achieved a 62% performance speed-up and a 71% cost reduction while producing exactly equivalent results. The architecture combines AWS Batch, Illumina DRAGEN, and Nextflow on the Seqera Platform to process thousands of samples with dynamic resource provisioning.

Why run genomics pipelines on F2 instances

AstraZeneca categorizes genomic processing into three concerns:

  1. Standardized workflows – Different modalities (e.g., exome vs. whole‑genome) require distinct pipelines, but any migration or upgrade must produce results identical to the previous version. Reproducibility is critical for drug discovery and clinical trials.
  2. Massive data volume – Petabytes of data and tens of millions of files must be stored and retrieved cost‑effectively, with a catalog to track files and downstream analytics for scientists.
  3. Spiky workload – Sample batches arrive irregularly throughout the year, requiring rapid, burst‑capacity processing rather than steady‑state operation.

Using Illumina DRAGEN on F2 instances enables fast turnaround for large batches while keeping costs low.

Equivalence testing methodology

  • Test set – Representative exome and whole‑genome samples with known reference files.
  • Command line – Identical DRAGEN commands executed on both F1 and F2 instances.
  • Software version – Same DRAGEN version for both runs.
  • Verification – Bioinformatics tools compare variant calls and DRAGEN metrics; results match exactly (aside from header timestamps).
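The verification step above can be sketched as a minimal comparison that ignores VCF header lines (where run timestamps live) and requires the variant records to match byte-for-byte. This is an illustrative sketch, not the team's actual tooling; production equivalence checks would use dedicated bioinformatics tools such as hap.py or bcftools.

```python
def variant_records(vcf_text: str) -> list[str]:
    """Return the non-meta lines of a VCF: the '#CHROM' column
    header and the variant records, skipping '##' header lines
    (which carry file dates and command-line timestamps)."""
    return [line for line in vcf_text.splitlines()
            if line and not line.startswith("##")]


def outputs_equivalent(vcf_a: str, vcf_b: str) -> bool:
    """True if two VCFs carry identical variant records,
    ignoring header timestamps."""
    return variant_records(vcf_a) == variant_records(vcf_b)


# Toy example: two runs differing only in the ##fileDate header.
f1_run = "##fileDate=20250101\n#CHROM\tPOS\tREF\tALT\nchr1\t100\tA\tG"
f2_run = "##fileDate=20250601\n#CHROM\tPOS\tREF\tALT\nchr1\t100\tA\tG"
```

Applied to the toy strings above, `outputs_equivalent(f1_run, f2_run)` returns `True`; any difference in a variant record would make it `False`.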

Performance and cost results

| Metric | F1 (prior generation) | F2 (latest generation) |
| --- | --- | --- |
| Speed-up | baseline | +62 % |
| Cost reduction | baseline | −71 % |
| vCPUs (6xlarge size) | 16 | 24 (50 % more) |
| FPGA chips | 2 chips | 1 chip (200 % more cores per chip) |
| High-bandwidth memory | — | up to 16 GB |

The performance gain is primarily due to the increased cores per FPGA chip, as DRAGEN processes run on a single chip. Additionally, F2 instances have a lower per‑hour price, amplifying cost savings.

Architecture overview

AWS Batch

AWS Batch is a fully managed service for orchestrating containerized HPC jobs. It handles job queuing, dependency management, and scaling of compute resources (including F2 instances) based on workload demand.
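As a rough sketch of how a per-sample DRAGEN job might reach AWS Batch, the snippet below builds the keyword arguments for boto3's `submit_job` call. The queue name, job definition, and environment variable names are placeholders; in this architecture the job definition would point at a DRAGEN container image and an F2-backed compute environment.

```python
def build_submit_request(sample_id: str, s3_input: str,
                         s3_output: str) -> dict:
    """Assemble the submit_job() arguments for one sample.

    containerOverrides passes per-sample parameters into the
    container; all resource names here are hypothetical.
    """
    return {
        "jobName": f"dragen-{sample_id}",
        "jobQueue": "f2-genomics-queue",      # placeholder queue name
        "jobDefinition": "dragen-f2:1",       # placeholder job definition
        "containerOverrides": {
            "environment": [
                {"name": "INPUT_URI", "value": s3_input},
                {"name": "OUTPUT_URI", "value": s3_output},
            ],
        },
    }


# With AWS credentials configured, the actual submission would be:
#   import boto3
#   batch = boto3.client("batch")
#   batch.submit_job(**build_submit_request(
#       "NA12878", "s3://bucket/in/", "s3://bucket/out/"))
```

Keeping the request construction separate from the API call makes the per-sample parameters easy to unit-test without touching AWS.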

Nextflow & Seqera Platform

Nextflow provides a portable, reproducible workflow engine. Combined with the Seqera Platform, it enables dynamic provisioning of resources, automatic scaling, and seamless integration with AWS Batch.

Storage recommendations

  • Local NVMe – Preferred for intermediate data during DRAGEN processing to achieve high I/O throughput.
  • Amazon EBS – Suitable for persistent storage of final results and reference data; choose provisioned IOPS or gp3 volumes based on performance needs.

Implementation steps

  1. Define Nextflow pipeline – Include DRAGEN container image, reference data, and sample inputs.
  2. Configure AWS Batch compute environment – Use F2.6xlarge instances, enable spot pricing for cost savings, and attach appropriate NVMe storage.
  3. Set up S3 buckets – Store raw input data, reference files, and final outputs.
  4. Run equivalence tests – Execute the pipeline on both F1 and F2, compare outputs, and validate reproducibility.
  5. Scale to production – Submit large batches of samples; AWS Batch automatically provisions the required number of F2 instances, processes jobs in parallel, and de‑provisions resources when idle.
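A pipeline task following steps 1–3 ultimately invokes the DRAGEN CLI for each sample. The sketch below assembles such a command line; the flags shown are common DRAGEN germline options, but exact flags and defaults vary by DRAGEN version, and all paths are placeholders.

```python
import shlex


def dragen_command(sample_id: str, fastq1: str, fastq2: str,
                   ref_dir: str, out_dir: str) -> str:
    """Build a DRAGEN germline command line for one paired-end
    sample. Flag set is illustrative for a typical germline run."""
    args = [
        "dragen",
        "--enable-variant-caller", "true",
        "--ref-dir", ref_dir,               # prebuilt reference hash table
        "--fastq-file1", fastq1,
        "--fastq-file2", fastq2,
        "--output-directory", out_dir,
        "--output-file-prefix", sample_id,
        "--RGID", sample_id,                # read-group metadata
        "--RGSM", sample_id,
    ]
    return shlex.join(args)
```

Because the same function produces the command for both F1 and F2 runs, the equivalence test in the methodology above reduces to running one command string on two instance types and diffing the outputs.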

Takeaways

  • Migrating to F2 FPGA instances delivers substantial performance improvements and cost reductions for Illumina DRAGEN pipelines.
  • Exact result equivalence can be verified with standard bioinformatics validation tools.
  • Leveraging AWS Batch, Nextflow, and the Seqera Platform provides a scalable, automated solution for spiky, high‑throughput genomics workloads.