Prompt-to-AI-generated binary is feasible. And is doomsday for programmers near?
Source: Dev.to
@Freeze (video)
If you google “elon musk ai will skip code machine code”, you’ll find an overwhelming number of software developers disagreeing with and criticizing this idea for various reasons. Below are a few in‑depth articles whose authors appear to have a wide range of knowledge and a deep understanding of computer science and software engineering.
- Elon Wants AI to Skip Code and Spit Out Binaries. That’s Not Progress
- I think Elon is wrong about ‘AI beats compilers’. What’s the actual technical steelman?
- Elon Says AI Will Generate Binary by 2026. Here’s Why That’s a Terrible Idea
- Elon Musk Predicts AI Will Bypass Coding by 2026: Binary Generation & Future of IT Careers – from Cloud Soft Solution, an interest group
- Is the “code + compiler” approach about to disappear?
- Elon Musk Predicts AI Will Render Coding Obsolete by 2026
My Take
I consider what Elon said to be quite feasible. I have no insider information, but as I understand it, Elon Musk means that AI would generate machine code directly, rather than generating source code and then invoking a compiler that targets a specific processor.
Current:
Code → Compiler → Binary → Execute
Future:
Prompt → AI‑generated Binary → Execute
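To make the "Current" pipeline concrete, here is a deliberately tiny TypeScript sketch. The stack-machine instruction set, its opcode values, and the names `compile` and `execute` are hypothetical and exist only for illustration; a real compiler targets a real processor's instruction set such as x86-64. In the "Future" pipeline, the `compile` step would disappear and the model would emit the byte array directly.

```typescript
// Hypothetical toy instruction set for a stack machine (not a real ISA).
const PUSH = 0x01; // followed by one operand byte
const ADD = 0x02;
const MUL = 0x03;

// "Code": a tiny expression tree standing in for human-readable source.
type Expr =
  | { kind: "num"; value: number }
  | { kind: "add" | "mul"; left: Expr; right: Expr };

// "Compiler": walks the tree and emits bytes in postfix order.
function compile(expr: Expr, out: number[] = []): number[] {
  if (expr.kind === "num") {
    out.push(PUSH, expr.value);
  } else {
    compile(expr.left, out);
    compile(expr.right, out);
    out.push(expr.kind === "add" ? ADD : MUL);
  }
  return out;
}

// "Execute": a minimal interpreter for the toy byte code.
function execute(binary: number[]): number {
  const stack: number[] = [];
  let pc = 0;
  while (pc < binary.length) {
    const op = binary[pc++];
    if (op === PUSH) {
      stack.push(binary[pc++]);
    } else if (op === ADD || op === MUL) {
      const b = stack.pop()!;
      const a = stack.pop()!;
      stack.push(op === ADD ? a + b : a * b);
    } else {
      throw new Error(`unknown opcode ${op} at offset ${pc - 1}`);
    }
  }
  return stack.pop()!;
}

// Source for 2 + 3 * 4
const source: Expr = {
  kind: "add",
  left: { kind: "num", value: 2 },
  right: {
    kind: "mul",
    left: { kind: "num", value: 3 },
    right: { kind: "num", value: 4 },
  },
};

const binary = compile(source);
console.log(binary.join(" ")); // 1 2 1 3 1 4 3 2
console.log(execute(binary)); // 14
```

Note that the byte array keeps none of the structure of `source`: no tree, no names, just numbers.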
The training of Grok Code would have to be fundamentally different from the training of other LLM code agents, which produce human‑readable source code.
Review of the History of Programming
Disclaimer
- I am not a computer scientist or AI expert, but an active full‑stack developer—hence a programmer as well.
- I am not old enough to have used plugboards, switches, punch cards, or paper tape, but I did work with Fortran, assembly language, and PLC (programmable logic controller) programming in university coursework for my B.Sc. in Physics.
- For my master’s thesis I wrote an M68K processor emulator in C++, interpreting M68K machine code.
Over the years I have used C/C++, Turbo Pascal, VB, Delphi, C#, and TypeScript for day‑to‑day coding. As far as I understand, high‑level languages and most principles of computer science and software engineering are designed for flesh‑and‑blood human brains, for example:
- High cohesion & loose coupling
- SOLID principles
- Design patterns
- Agile Manifesto, XP, DevOps, etc.
- Reusable libraries & frameworks
No matter how well you conform to those principles or how clean your source code is, it will eventually be compiled and linked into something resembling spaghetti: something human brains hate but computers do not care about.
Clean code can still enable better compile‑time, link‑time, and run‑time optimizations; the performance boost could be up to 25 %. I think current optimization algorithms are largely written as fixed rules by compiler developers, and such rules tend to reward clean code.
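As an illustration of what a "fixed rule" optimization looks like, here is a minimal constant-folding pass over a hypothetical expression tree. It is a sketch, not how any particular compiler is implemented; production compilers apply many such rewrite rules over their own intermediate representations. The rule fires only when both operands are provably constant, which hints at why direct, uncluttered code can give an optimizer more to work with.

```typescript
// Hypothetical expression tree, used only for this illustration.
type Expr =
  | { kind: "num"; value: number }
  | { kind: "var"; name: string }
  | { kind: "add" | "mul"; left: Expr; right: Expr };

// A fixed rewrite rule, applied bottom-up: if both operands of + or *
// are constants, replace the operation with its computed result.
function fold(expr: Expr): Expr {
  if (expr.kind === "num" || expr.kind === "var") return expr;
  const left = fold(expr.left);
  const right = fold(expr.right);
  if (left.kind === "num" && right.kind === "num") {
    const value = expr.kind === "add" ? left.value + right.value : left.value * right.value;
    return { kind: "num", value };
  }
  // Operands are not both constant: keep the operation, with folded children.
  return { kind: expr.kind, left, right };
}

// Pretty-printer so the before/after trees can be compared.
function show(expr: Expr): string {
  if (expr.kind === "num") return String(expr.value);
  if (expr.kind === "var") return expr.name;
  const op = expr.kind === "add" ? "+" : "*";
  return `(${show(expr.left)} ${op} ${show(expr.right)})`;
}

// x * (60 * 60): the constant sub-expression folds, x cannot.
const before: Expr = {
  kind: "mul",
  left: { kind: "var", name: "x" },
  right: {
    kind: "mul",
    left: { kind: "num", value: 60 },
    right: { kind: "num", value: 60 },
  },
};

console.log(show(before)); // (x * (60 * 60))
console.log(show(fold(before))); // (x * 3600)
```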
In short, these practices exist to help human brains digest functional and technical complexity in order to deliver a working program.
Programming and LLM AI Code Agents
I presume you have a basic idea of how AI‑code‑agent vendors train their models, but I’m not sure what raw code they collect or how they label it:
- Do they simply scan all code on GitHub, SourceForge, and some commercial programs/libraries?
- Do they label code with signals about code quality?
As a programmer, I have benefited greatly from AI code agents, partly because I am poor at remembering trivial technical details. So far, however, I have found that when I ask AI code agents to implement a non‑trivial feature, no matter how detailed, formal, or simple the prompt, the generated source code is usually bloated in both design and implementation, and the line count is typically 3–5 × what it should be.
AI‑Generated Source Code and Mediocre but “Politically Correct” Hand‑Crafted Code
The “3–5 ×” figure rings a bell for me. In several commercial rewrite projects and one rewrite of an open‑source tool, each using the same technical stack and language as the original, I observed:
- My codebases were ≈ 1/3 – 1/5 the LoC of the legacy versions (which were only a few months to a year older).
- My designs were much simpler, with fewer third‑party dependencies and less fancy DI/IoC, SOLID, and design patterns.
- Runtime performance was 20 % – 50 % faster.
- The end products were more reliable and robust.
Regarding the legacy codebases:
- One tool’s author was enthusiastic about SOLID and DI/IoC but misunderstood high cohesion and loose coupling, using DI/IoC in the wrong places.
- The authors of the legacy commercial programs knew SOLID and design patterns well, yet applied them incorrectly, likely because they misunderstood cohesion and coupling and introduced overly advanced designs too early.
Basically, these legacy codebases looked “politically correct” with respect to SOLID, but were simply over‑complicated and too lengthy.
How I Have Been Writing Clean and Short Code
I believe those programmers jumped directly into implementing their first workable idea without evaluating simpler alternatives, without spending enough time understanding business and technical contexts, and without following basic Agile practices.
Practices
- Start with basic, working code.
- Write plenty of unit and integration tests.
- Actively refactor in each iteration.
The purpose isn’t to make the design look elegant or impressive; it’s to deepen understanding of business and technical contexts through frequent communication with technical peers and business stakeholders.
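A minimal sketch of the first two practices, assuming a hypothetical `invoiceTotal` rule (sum line items, apply an optional percentage discount). The point is the shape: the simplest code that works, pinned down by tests so it can be refactored safely in later iterations. To stay dependency-free, the tests use a small hand-rolled `check` helper rather than a test framework.

```typescript
// Hypothetical domain rule, started as the simplest thing that works.
interface LineItem {
  unitPrice: number; // in cents, to avoid floating-point money errors
  quantity: number;
}

function invoiceTotal(items: LineItem[], discountPercent = 0): number {
  const subtotal = items.reduce((sum, item) => sum + item.unitPrice * item.quantity, 0);
  return Math.round(subtotal * (1 - discountPercent / 100));
}

// Tiny test helper so the example runs without a test framework.
function check(name: string, actual: number, expected: number): void {
  if (actual !== expected) {
    throw new Error(`${name}: expected ${expected}, got ${actual}`);
  }
  console.log(`ok - ${name}`);
}

check("empty invoice", invoiceTotal([]), 0);
check(
  "two items",
  invoiceTotal([
    { unitPrice: 250, quantity: 2 },
    { unitPrice: 100, quantity: 1 },
  ]),
  600,
);
check("10% discount", invoiceTotal([{ unitPrice: 1000, quantity: 1 }], 10), 900);
```

When a new requirement arrives, the tests stay and the function is reshaped under their protection, instead of an elaborate structure being designed up front.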
Do I write 2–4 × more lines of code than the average senior developer in my city?
No. When I lead the SDLC, I typically spend ≈ 1/3 – 1/4 of my billable hours coding (including testing), especially during the early 1/4 – 1/3 of the SDLC when architecture and design are taking shape.
The remaining time is spent thinking, studying, and communicating with stakeholders. Even if I produce the same amount of code, or less, the overall maintenance cost is dramatically reduced.
SOLID Principles (as quoted from Robert C. Martin, UML for Java Programmers)
| Principle | Statement |
|---|---|
| SRP (Single‑Responsibility) | A class should have one and only one reason to change. |
| OCP (Open‑Closed) | It should be possible to change the environment of a class without changing the class itself. |
| LSP (Liskov Substitution) | Avoid making methods of derived classes illegal or degenerate. Users of a base class should not need to know about the derived classes. |
| DIP (Dependency Inversion) | Depend on interfaces and abstract classes instead of volatile concrete classes. |
| ISP (Interface Segregation) | Give each user of an object an interface that contains only the methods that user needs. |
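As a hedged illustration of ISP (with DIP as a side effect), here is a TypeScript sketch. The report-store names are hypothetical. Each consumer depends only on the narrow interface it actually uses, while a single concrete class can still implement both; because TypeScript interfaces are structural, no adapter classes are needed.

```typescript
// ISP: each user gets an interface containing only the methods it needs.
interface ReportReader {
  load(id: string): string | undefined;
}

interface ReportWriter {
  save(id: string, body: string): void;
}

// One concrete class can satisfy both narrow interfaces.
class InMemoryReportStore implements ReportReader, ReportWriter {
  private readonly reports = new Map<string, string>();

  load(id: string): string | undefined {
    return this.reports.get(id);
  }

  save(id: string, body: string): void {
    this.reports.set(id, body);
  }
}

// The viewer only reads, so it depends on ReportReader alone
// (DIP: an abstraction, not the concrete InMemoryReportStore).
function renderReport(reader: ReportReader, id: string): string {
  return reader.load(id) ?? "(report not found)";
}

// The generator only writes, so it depends on ReportWriter alone.
function generateReport(writer: ReportWriter, id: string): void {
  writer.save(id, `Report ${id}: all systems nominal`);
}

const store = new InMemoryReportStore();
generateReport(store, "42");
console.log(renderReport(store, "42")); // Report 42: all systems nominal
console.log(renderReport(store, "7")); // (report not found)
```

In keeping with the advice on when to apply the principles, this split is only worth making once a real consumer needs the narrower view; with a single caller, one class and no interfaces may well be the better design.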
When to apply them
- At the first hint of pain.
- Do not try to make every system conform to all principles all the time; you’ll waste time imagining possible environments for OCP, hunting change sources for SRP, creating dozens of tiny interfaces for ISP, and inventing worthless abstractions for DIP.
Best practice: Apply SOLID reactively—when you detect a structural problem or notice a module being affected by changes elsewhere, consider whether one or more principles can help.
A reactive approach still requires proactive pressure: deliberately search for sore spots. One of the best ways to do this is to write lots of unit tests—ideally before writing the code itself (a topic for another chapter).
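Here is one sketch of how a unit test can expose a sore spot. Suppose a hypothetical `isOfficeOpen` check first read the system clock directly; any test of it would then depend on the time of day it happens to run. That pain is the cue to apply DIP reactively by passing the dependency in. The `Clock` type and the 09:00 to 17:00 opening hours are assumptions made up for this example.

```typescript
// Before (hard to test): the function reaches out to the real clock.
//   function isOfficeOpen(): boolean {
//     const hour = new Date().getHours();
//     return hour >= 9 && hour < 17;
//   }

// After: the dependency on "now" is inverted into a parameter.
// A plain function type is enough; no DI container is required.
type Clock = () => Date;

const systemClock: Clock = () => new Date();

function isOfficeOpen(clock: Clock = systemClock): boolean {
  const hour = clock().getHours();
  return hour >= 9 && hour < 17; // hypothetical opening hours
}

// Tests can now pin the time instead of depending on when they run.
// new Date(year, monthIndex, day, hours) uses local time, matching getHours().
const fixedClock = (hours: number): Clock => () => new Date(2024, 0, 15, hours);

console.log(isOfficeOpen(fixedClock(10))); // true
console.log(isOfficeOpen(fixedClock(8))); // false
console.log(isOfficeOpen(fixedClock(17))); // false
```

Production callers keep calling `isOfficeOpen()` with no arguments, so the change stays local to the spot that hurt.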
AI Code Agents vs. Senior Developers
| Similarities | Differences |
|---|---|
| Both tend to create advanced and complex SOLID structures with popular design patterns in advance. | When prompted progressively or via a big‑bang request, AI code agents accumulate SOLID structures rather than continuously merging and simplifying based on new prompts. |
Advantages of Prompt‑to‑AI‑Generated Binary
- The approach may avoid the code bloat caused by accumulated, complex designs that other AI code agents typically produce.
- A sufficiently powerful AI with strong hardware can handle enormous complexity without relying on traditional CS/SE techniques developed for human programmers.
- Grok Code likely has its own mechanisms to avoid code bloat, so SOLID and design patterns would play a near‑zero role.
One Fundamental Problem of Prompt‑to‑AI‑Generated Binary
- This approach eliminates human intervention (review, verification, validation) regarding product quality.
- Even if you disassemble the machine code into assembly, the result is extremely difficult for humans to read, even when AI adds symbolic names.
- The entire binary becomes a black box:
  - Best case: it behaves as expected but with more than you bargained for.
  - Worst case: it contains multiple “Pandora’s boxes” unless you fully trust AI‑based review, verification, and validation.
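To give a feel for why disassembly helps only so much, here is a sketch of a disassembler for a hypothetical toy stack machine. The opcode table is invented for illustration; real x86-64 decoding is far more involved, with variable-length instructions and prefixes. Even with mnemonics restored, the listing shows what the machine does, not why: variable names, types, module boundaries, and intent are gone.

```typescript
// Hypothetical toy ISA: opcode -> [mnemonic, number of operand bytes].
const OPCODES: Record<number, [string, number]> = {
  0x01: ["PUSH", 1],
  0x02: ["ADD", 0],
  0x03: ["MUL", 0],
  0x04: ["LOAD", 1], // operand is a local-variable slot number, not a name
  0x05: ["STORE", 1],
  0x06: ["RET", 0],
};

// Turn raw bytes into a listing: "offset: MNEMONIC operands".
function disassemble(bytes: Uint8Array): string[] {
  const lines: string[] = [];
  let pc = 0;
  while (pc < bytes.length) {
    const entry = OPCODES[bytes[pc]];
    if (entry === undefined) {
      throw new Error(`unknown opcode 0x${bytes[pc].toString(16)} at offset ${pc}`);
    }
    const [mnemonic, operandCount] = entry;
    const operands = Array.from(bytes.subarray(pc + 1, pc + 1 + operandCount));
    lines.push(`${pc.toString().padStart(4, "0")}: ${[mnemonic, ...operands].join(" ")}`);
    pc += 1 + operandCount;
  }
  return lines;
}

// Bytes that, in source form, might have been `total = price * qty + fee`.
const code = Uint8Array.from([0x04, 0, 0x04, 1, 0x03, 0x04, 2, 0x02, 0x05, 3, 0x06]);
console.log(disassemble(code).join("\n"));
```

The output reads `LOAD 0`, `LOAD 1`, `MUL`, and so on; nothing in it says that slot 0 was a price or that the result was a total.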
Areas Where Grok Code May Shine
- Pure mathematics problems and algorithms.
- Development of other AI systems based on mathematical foundations.
- Domain‑expert AI.
- Video games.
- “Time‑killing” apps.
- Advanced hacking, spam, or fraud tools (with or without AI assistance at runtime).
- …
Example: as a mathematician, you could use Grok Code to generate entire MATLAB‑like libraries.
My Usual Challenge to AI Code Generators
Question: Given a complex Swagger/OpenAPI definition, such as those used by Medicare Online, can an AI code generator produce usable client libraries in C#, TypeScript, Java, and other languages?
Observation: in my experience, ChatGPT and Copilot cannot; otherwise, I would expect Microsoft to have released an online AI‑based code generator for this task instead of Microsoft Kiota.
Future interest: by the end of the year, will Grok Code be able to generate a client library in machine code, based on the Medicare Online OpenAPI definitions, that runs on Windows 11 on an Intel processor?
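For readers unfamiliar with the task, here is a heavily simplified sketch of what an OpenAPI client generator does: walk the `paths` of a definition and emit one typed method per operation. The spec fragment, the `/patients/{patientId}` path, and the `getPatient` operation ID are hypothetical and are not taken from the Medicare Online definitions. Real generators such as Microsoft Kiota also handle schemas, authentication, request bodies, and error models, which is where most of the difficulty lies.

```typescript
// A tiny, hypothetical subset of an OpenAPI 3 document's shape.
interface Operation {
  operationId: string;
  parameters?: { name: string; in: "path" | "query" }[];
}
type PathItem = Partial<Record<"get" | "post" | "put" | "delete", Operation>>;
interface OpenApiDoc {
  paths: Record<string, PathItem>;
}

// Emit TypeScript source with one method per operation. Path parameters
// become string arguments; query parameters are ignored to keep this short.
function generateClient(doc: OpenApiDoc, className: string): string {
  const methods: string[] = [];
  for (const [path, item] of Object.entries(doc.paths)) {
    for (const [verb, op] of Object.entries(item)) {
      if (!op) continue;
      const pathParams = (op.parameters ?? []).filter((p) => p.in === "path").map((p) => p.name);
      const args = pathParams.map((name) => `${name}: string`).join(", ");
      // Turn /patients/{patientId} into /patients/${encodeURIComponent(patientId)}
      const url = path.replace(/\{(\w+)\}/g, "${encodeURIComponent($1)}");
      methods.push(
        `  async ${op.operationId}(${args}): Promise<unknown> {\n` +
          `    const res = await fetch(this.baseUrl + \`${url}\`, { method: "${verb.toUpperCase()}" });\n` +
          `    return res.json();\n` +
          `  }`,
      );
    }
  }
  return `class ${className} {\n  constructor(private baseUrl: string) {}\n${methods.join("\n")}\n}`;
}

// Hypothetical one-operation spec, for illustration only.
const spec: OpenApiDoc = {
  paths: {
    "/patients/{patientId}": {
      get: { operationId: "getPatient", parameters: [{ name: "patientId", in: "path" }] },
    },
  },
};

console.log(generateClient(spec, "HypotheticalClient"));
```

Even this toy shows why the challenge is hard: every construct in a large real definition needs a rule like the path-parameter rewrite above, and all of those rules have to be applied consistently.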