Automated Cloud Migrations with Kiro and the Arm MCP Server
Source: Dev.to – Automated Cloud Migrations with Kiro and the Arm MCP Server
Introduction
AWS bills Graviton as the best price‑performance option on EC2, and the recent Graviton 5 announcement makes the value proposition even stronger. For many workloads, migrating to Graviton both cuts cost and improves performance.
Most applications move over seamlessly, but what if you have x86‑specific optimizations in your code?
Good news: you don’t have to do the migration manually any more.
In this post I’ll show how to use Kiro (AWS’s agentic IDE) together with the Arm MCP Server to automate the entire migration process—Docker images, SIMD intrinsics, compiler flags, the whole thing.
What the Arm MCP Server Gives You
The Arm MCP Server implements the Model Context Protocol (MCP)—a standard way for AI coding assistants to tap into specialized tools. When you connect it to Kiro, you get an agent that can:
- ✅ Check Docker images for arm64 support without digging through manifests.
- 🔍 Scan your codebase for x86‑specific code (intrinsics, build flags, etc.).
- 📚 Search Arm’s knowledge base for migration guidance and intrinsic equivalents.
- 🧪 Analyze assembly for performance characteristics.
Example: Migrating a Legacy Benchmarking App
Imagine you’ve inherited a legacy benchmarking application that’s tightly coupled to x86. Below is the original Dockerfile.
FROM centos:6
# CentOS 6 reached EOL – use vault mirrors
RUN sed -i 's|^mirrorlist=|#mirrorlist=|g' /etc/yum.repos.d/CentOS-Base.repo && \
sed -i 's|^#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-Base.repo
# Install EPEL repository (required for some development tools)
RUN yum install -y epel-release && \
sed -i 's|^mirrorlist=|#mirrorlist=|g' /etc/yum.repos.d/epel.repo && \
sed -i 's|^#baseurl=http://download.fedoraproject.org/pub/epel|baseurl=http://archives.fedoraproject.org/pub/archive/epel|g' /etc/yum.repos.d/epel.repo
# Install Developer Toolset 2 for better C++11 support (GCC 4.8)
RUN yum install -y centos-release-scl && \
sed -i 's|^mirrorlist=|#mirrorlist=|g' /etc/yum.repos.d/CentOS-SCLo-scl.repo && \
sed -i 's|^mirrorlist=|#mirrorlist=|g' /etc/yum.repos.d/CentOS-SCLo-scl-rh.repo && \
sed -i 's|^# baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-SCLo-scl.repo && \
sed -i 's|^# baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-SCLo-scl-rh.repo
# Install build tools
RUN yum install -y \
devtoolset-2-gcc \
devtoolset-2-gcc-c++ \
devtoolset-2-binutils \
make \
&& yum clean all
WORKDIR /app
COPY *.h *.cpp ./
# AVX2 intrinsics are used in the code
RUN scl enable devtoolset-2 "g++ -O2 -mavx2 -o benchmark \
main.cpp \
matrix_operations.cpp \
-std=c++11"
CMD ["./benchmark"]
Why This Dockerfile Won’t Work on Graviton
| Issue | Explanation |
|---|---|
| centos:6 | No arm64 variant – the base image is x86‑only. |
| -mavx2 | An x86‑only compiler flag; Arm CPUs don’t understand AVX2. |
| AVX2 intrinsics (__m256, etc.) | Won’t compile on Arm; you need Arm‑equivalent SIMD intrinsics. |
If you didn’t spot these problems yourself, that’s fine—Kiro + Arm MCP Server will.
The Problematic Matrix Multiplication Code
Below is the matrix_operations.cpp file that uses AVX2 intrinsics for double‑precision matrix multiplication. (The snippet is intentionally minimal – the full implementation contains the actual multiplication kernels.)
// matrix_operations.cpp
#include "matrix_operations.h"
#include <vector>
#include <random>
#include <iostream>
#include <immintrin.h> // AVX2 intrinsics
// ---------------------------------------------------------------------------
// Matrix constructor
// ---------------------------------------------------------------------------
Matrix::Matrix(size_t r, size_t c) : rows(r), cols(c) {
// Allocate a rows × cols matrix filled with zeros
data.resize(rows, std::vector<double>(cols, 0.0));
}
// ---------------------------------------------------------------------------
// Fill the matrix with random values in the range [0.0, 10.0)
// ---------------------------------------------------------------------------
void Matrix::randomize() {
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_real_distribution<> dis(0.0, 10.0);
for (size_t i = 0; i < rows; ++i) {
for (size_t j = 0; j < cols; ++j) {
data[i][j] = dis(gen);
}
}
}
// ---------------------------------------------------------------------------
// (The multiplication routine that uses AVX2 intrinsics is omitted here)
// ---------------------------------------------------------------------------
Why This Code Fails on ARM v8 AArch64
| Issue | Explanation |
|---|---|
| AVX2 intrinsics (_mm256_*) | These x86‑64‑specific intrinsics are not available on ARM v8. |
| Missing include guards / headers | The original snippet omitted several standard headers, causing compilation errors. |
| Incorrect std::vector initialization | std::vector(cols, 0.0) should be std::vector<double>(cols, 0.0). |
| Truncated loops / syntax errors | Parts of the randomize method were incomplete. |
Suggested Migration Path (Using the Kiro Assistant)
1. Detect the Architecture
   - Kiro reads the CI configuration (.github/workflows/*.yml) and determines that the runner is arm64.
2. Identify Incompatible Intrinsics
   - It scans the source for AVX2 intrinsics (_mm256_*) and flags them as non‑portable.
3. Propose a Portable Alternative
   - Kiro suggests replacing AVX2 code with ARM NEON equivalents (float64x2_t and related types; NEON vectors are 128 bits wide, so there is no single‑register float64x4_t) or falling back to a scalar implementation if the performance impact is acceptable.
4. Automated Refactor (Optional PR)
   - Kiro can generate a pull request that:
     - Updates the Docker/CI image to an ARM‑compatible base.
     - Adjusts the build command (e.g., adds -march=armv8-a+simd).
     - Substitutes the AVX2 intrinsics with NEON‑compatible code or a portable scalar loop.
5. Verification
   - After the changes, Kiro runs the benchmark on an arm64 runner, compares the performance against the original x86 build, and reports any delta.
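To make step 3 concrete, here is a minimal sketch of what a NEON replacement for one inner loop could look like. It is not the code Kiro generates and not the article's omitted kernel: the helper name axpy_row is hypothetical, and the sketch assumes the multiplication can be expressed as repeated "row += scalar × row" updates. On AArch64 it uses 128‑bit NEON vectors (two doubles each); on any other target it falls back to the plain scalar loop, so the same file builds on both architectures.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__aarch64__)
#include <arm_neon.h>  // NEON intrinsics, only available on Arm targets
#endif

// Hypothetical helper (not from the original project):
// computes c[j] += a * b[j] for every j in [0, n).
void axpy_row(double a, const double* b, double* c, std::size_t n) {
    assert(n == 0 || (b != nullptr && c != nullptr));
    std::size_t j = 0;
#if defined(__aarch64__)
    // A NEON register holds two doubles, so the loop advances by 2.
    const float64x2_t va = vdupq_n_f64(a);   // broadcast a into both lanes
    for (; j + 2 <= n; j += 2) {
        const float64x2_t vb = vld1q_f64(b + j);
        float64x2_t vc = vld1q_f64(c + j);
        vc = vfmaq_f64(vc, va, vb);          // vc + va * vb, fused
        vst1q_f64(c + j, vc);
    }
#endif
    // Scalar tail on Arm; the entire loop on every other architecture.
    for (; j < n; ++j) {
        c[j] += a * b[j];
    }
}
```

Called once per (i, k) pair with a = A[i][k], b pointing at row k of B, and c pointing at row i of the result, this yields a full matrix product. An AVX2 version of the same loop would step by four doubles instead of two, which is why a mechanical one‑to‑one intrinsic swap is not enough.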
TL;DR
- Problem: An x86‑specific Docker base, compiler flags, and AVX2 intrinsics prevent the code from running on Graviton (ARM) instances.
- Solution: Use Kiro + Arm MCP Server to automatically detect and replace the offending parts (e.g., AVX2 intrinsics, x86‑only pre‑processor checks).
- Result: A clean, ARM‑compatible Docker image and source code that runs on the latest Graviton 5 instances, delivering cost‑effective performance gains.
Give it a try—your future self (and your AWS bill) will thank you!
Header file matrix_operations.h
#ifndef MATRIX_OPERATIONS_H
#define MATRIX_OPERATIONS_H
#include <cstddef> // size_t
#include <vector> // std::vector
#include <iostream> // std::cout (optional, for debugging)
class Matrix {
private:
std::vector<std::vector<double>> data; // 2‑D storage
size_t rows;
size_t cols;
public:
// ctor
Matrix(size_t r, size_t c);
// fill the matrix with random values in [0.0, 10.0)
void randomize();
// matrix multiplication (returns a new matrix)
Matrix multiply(const Matrix& other) const;
// sum of all elements – handy for a quick sanity check
double sum() const;
// simple accessors
size_t getRows() const { return rows; }
size_t getCols() const { return cols; }
};
// Benchmark driver – defined in a separate .cpp file
void benchmark_matrix_ops();
#endif // MATRIX_OPERATIONS_H
What Was Fixed?
| Issue | Original | Fixed |
|---|---|---|
| Missing include guard end | #endif // MATRIX_OPERATIONS_H appeared before the class definition | Moved the guard to wrap the whole file |
| #include line was empty | #include | Added required headers (<cstddef>, <vector>, <iostream>) |
| Vector declaration syntax | std::vector> data; | Changed to std::vector<std::vector<double>> data; |
| Unclosed comment / stray text | "Time: " | Removed – not part of a header file |
| Function prototypes missing semicolons | none | Added ; after each prototype where needed |
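The header declares multiply() and sum(), but the article never shows their bodies. As a portable baseline, the following sketch shows one way they might be written with no intrinsics at all. It is an assumption, not the author's implementation: the class is a compact stand‑in repeated here so the block compiles on its own, and at() is a hypothetical accessor added only for the demo.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compact stand-in for the Matrix class from matrix_operations.h.
class Matrix {
    std::vector<std::vector<double>> data;
    std::size_t rows;
    std::size_t cols;
public:
    Matrix(std::size_t r, std::size_t c)
        : data(r, std::vector<double>(c, 0.0)), rows(r), cols(c) {}

    // Hypothetical element accessor, used only to fill test matrices.
    double& at(std::size_t i, std::size_t j) { return data[i][j]; }

    Matrix multiply(const Matrix& other) const;
    double sum() const;
};

// Scalar product in i-k-j order: the innermost loop walks one row of
// `other` and one row of `result` contiguously, a pattern compilers
// can usually auto-vectorise for whichever target they are building.
Matrix Matrix::multiply(const Matrix& other) const {
    assert(cols == other.rows);
    Matrix result(rows, other.cols);
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t k = 0; k < cols; ++k) {
            const double a = data[i][k];
            for (std::size_t j = 0; j < other.cols; ++j) {
                result.data[i][j] += a * other.data[k][j];
            }
        }
    }
    return result;
}

// Sum of all elements, usable as a cheap checksum across builds.
double Matrix::sum() const {
    double total = 0.0;
    for (const auto& row : data) {
        for (double v : row) {
            total += v;
        }
    }
    return total;
}
```

A scalar version like this is useful to keep around even after a SIMD port, because it gives the Arm and x86 builds a common reference to check against.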
Source file main.cpp
#include "matrix_operations.h"
#include <iostream>
int main() {
std::cout << "Matrix Operations Benchmark\n"
<< "============================\n";
#if defined(__x86_64__) || defined(_M_X64)
std::cout << "Running on x86‑64 architecture with AVX2 optimisations\n";
#else
std::cout << "Running on a non‑x86 architecture (e.g., ARM/Graviton)\n";
#endif
// Execute the benchmark – implementation lives in matrix_operations.cpp
benchmark_matrix_ops();
return 0;
}
What Was Fixed?
| Issue | Original | Fixed |
|---|---|---|
| Missing include for <iostream> | #include (empty) | Added #include <iostream> |
| #error directive prevented compilation on ARM | #error "This code requires x86-64 architecture with AVX2 support" | Replaced with a runtime message; the code now compiles on any architecture. |
| Inconsistent indentation / stray spaces | Mixed tabs/spaces | Normalised to 4‑space indentation |
| Unnecessary using namespace std; (not present) – kept explicit std:: for clarity | – | No change needed |
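The #if in main.cpp only distinguishes x86‑64 from everything else. If the project ends up with separate AVX2, NEON, and scalar kernels, a slightly finer check can report which one a given binary was built with. The sketch below is an illustration under that assumption; simd_backend and arch_banner are hypothetical names, and the macros are the standard compiler‑predefined ones (__AVX2__ is set when AVX2 code generation is enabled, __ARM_NEON and __aarch64__ on Arm targets).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: names the SIMD path this binary was compiled for.
// The decision is made at compile time from predefined macros.
std::string simd_backend() {
#if defined(__AVX2__)
    return "avx2";      // x86 build with AVX2 enabled (e.g. -mavx2)
#elif defined(__ARM_NEON) || defined(__aarch64__)
    return "neon";      // Arm build; NEON is baseline on AArch64
#else
    return "scalar";    // anything else uses the plain loops
#endif
}

// Hypothetical banner line for the benchmark's startup output.
std::string arch_banner() {
    const std::string backend = simd_backend();
    assert(!backend.empty());  // every branch above returns a name
    return "Running with " + backend + " kernels";
}
```

Printing arch_banner() at startup makes it obvious from the container logs whether an image was built for the architecture it is running on.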
How to Make the Project ARM‑Ready
1. Remove AVX2 intrinsics – replace them with a portable SIMD library (e.g., xsimd) or rely on compiler auto‑vectorisation.
2. Update the Dockerfile
# Use an ARM‑compatible base image
FROM public.ecr.aws/ubuntu/ubuntu:22.04-arm64
# Install build tools
RUN apt-get update && apt-get install -y \
    build-essential cmake git \
    && rm -rf /var/lib/apt/lists/*
# Copy source and build
COPY . /app
WORKDIR /app
RUN cmake -B build && cmake --build build -j$(nproc)
3. Compile with the appropriate flags
g++ -O3 -march=armv8.2-a+simd -std=c++20 -I. -o matrix_demo main.cpp matrix_operations.cpp
4. Run the container on Graviton – the image will now start without the x86‑specific checks and will benefit from the ARM‑native SIMD extensions.
Quick Test
# Build locally (ARM host or emulated via Docker)
docker build -t matrix-bench:arm .
# Run
docker run --rm matrix-bench:arm
Expected output:
Matrix Operations Benchmark
============================
Running on a non‑x86 architecture (e.g., ARM/Graviton)
[benchmark output…]
Now the code is fully portable, and you can take advantage of the cost‑effective performance that Graviton 5 instances provide. Happy coding!
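One practical detail when comparing the Arm build against the original x86 build: the checksum returned by sum() may not match bit for bit. Floating‑point results can differ in the last few digits when one build fuses multiply and add into a single instruction or sums in a different order. A tolerance‑based comparison is a common way to handle this. The helper below is a sketch with a hypothetical name and an assumed default tolerance; pick a tolerance that suits your matrix sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true when two checksums agree to within a
// relative tolerance. For values smaller than 1 in magnitude the
// tolerance is treated as absolute, which avoids dividing by ~0.
bool checksums_match(double reference, double candidate,
                     double rel_tol = 1e-9) {
    assert(rel_tol >= 0.0);
    const double scale =
        std::max(std::fabs(reference), std::fabs(candidate));
    const double allowed = rel_tol * std::max(scale, 1.0);
    return std::fabs(reference - candidate) <= allowed;
}
```

For example, checksums_match(x86_sum, arm_sum) could gate the migration in CI, where x86_sum and arm_sum are the values printed by the two builds for the same fixed input (which requires seeding the random generator deterministically rather than from std::random_device).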
Migrating to ARM with Kiro + MCP
The code contains many x86‑specific intrinsics, but Kiro can handle the conversion automatically.
1. Configure Kiro to Connect to the ARM MCP Server
Create the file .kiro/settings/mcp.json in the project root:
{
"mcpServers": {
"arm-mcp": {
"command": "docker",
"args": [
"run", "-i", "--rm",
"-v", "/path/to/your/code:/workspace",
"armlimited/arm-mcp:1.0.1"
]
}
}
}
Notes
- The command runs the ARM MCP Server via Docker.
- Replace /path/to/your/code with the absolute path to your project.
- Kiro will automatically pick up the new server. Verify the connection by typing /mcp in the chat; you should see arm-mcp listed with its tools.
2. Quick Checks in the Chat
You can ask Kiro to perform ad‑hoc checks, e.g.:
Check the base image in the Dockerfile for ARM compatibility
Kiro will use the MCP tools and report that centos:6 only supports amd64.
3. Automate Migrations with a Steering Document
Create .kiro/steering/arm-migration.md:
---
inclusion: manual
---
Your goal is to migrate a codebase from x86 to ARM. Use the MCP server tools to help you with this. Check for x86‑specific dependencies (build flags, intrinsics, libraries, etc.) and replace them with ARM‑equivalent ones, ensuring compatibility and optimizing performance. Examine Dockerfiles, version files