
Surviving the Rewrite - Managing Risk and AI Memory Loss in Large-Scale Development
TL;DR: I recently undertook a project that terrifies most engineers: rewriting a massive, critical infrastructure automation tool from scratch. I moved from legacy Bash to Python without writing a single line of code by hand, relying entirely on AI agents. Here is how I managed the risk, the architecture, and the “memory loss” of LLMs to build a production-grade tool.

The Stakes

This wasn’t a simple CRUD app. This tool manages infrastructure for multiple teams. A logic error here doesn’t just throw a stack trace; it could wipe an entire environment or cause immediate customer impact. ...

