
By Robert Brennan
Date: October 2023
This summary is for engineering leaders who want to move beyond simple AI autocomplete and start deploying autonomous workforces. It details how to coordinate fleets of agents to automate the high-toil maintenance tasks that usually stall product velocity.
Robert Brennan, CEO of OpenHands, argues that we are moving from AI as a tool to AI as a workforce. The core tension lies in moving past the "single agent" bottleneck to a world of coordinated orchestration.
"It feels like the jump I made when I went from being an IC to being a manager."
"The goal is not to automate this process 100%. It's something like 90% automation."
"Just because the refactor is automated doesn't mean it needs to be unreviewed."

All right, thank you all for joining for "Automating Massive Refactors with Parallel Agents." I'm super excited to talk with you today about what we're doing with OpenHands to automate large-scale chunks of software engineering work — lots of toil related to tech debt, code maintenance, and code modernization. These are tasks that are super automatable. You can throw agents at them, but they tend to be way too big for a single agent to just one-shot, so it involves a lot of what we call agent orchestration. We're going to talk about how we do that with OpenHands, and also how to do it more generically.

A little bit about me: my name is Robert Brennan. I'm the co-founder and CEO at OpenHands. My background is in dev tooling — I've been working in open source dev tools for over a decade now, and I've been working in natural language processing for about the same amount of time. I've been really excited over the last few years to see those two fields suddenly converge as LLMs have gotten really good at writing code, and I'm super excited to be working in this space.

OpenHands is an MIT-licensed coding agent. It started as OpenDevin about a year and a half ago, when Devin first launched their demo video of a fully autonomous software engineering agent. My co-founders and I saw that and got super excited about what was possible — what the future of software engineering might look like — but we realized that it shouldn't happen in a black box. If our jobs are going to change, we want that change to be driven by the software development community; we want to have a say in it. So we started OpenDevin as a way to give the community a way to help drive what the future of software engineering looks like in an AI-powered world.
Hopefully it's not controversial for me to say that software development is changing. I know my own workflow has changed a great deal in the last year and a half. I would say that pretty much every line of code I write now goes through an agent. Rather than opening up my IDE and typing out lines of code, I'm asking an agent to do the work for me. I'm still doing a lot of critical thinking — a lot of the mentality of the job hasn't changed — but what the actual work looks like has changed quite a bit. What I want to convince you of is that it's still changing. We're still in the first innings of this change. We haven't yet realized all the impact that large language models have already brought to the job, and they're going to keep bringing more as they improve.
I would say that even if you froze large language models today and they didn't get any better, you would still see the job of software engineering change drastically over the next two to three years as we figure out ways to operationalize the technology. There are still a lot of psychological and organizational hurdles to adopting large language models within software engineering, and we're seeing those hurdles disappear as time goes on.

A brief history of how we got here. Everything started with what I call context-unaware code snippets. Some of the first large language models turned out to be very good at writing chunks of code, especially things they'd seen over and over again. You could ask one to write bubble sort, or for small algorithms — how to access a SQL database, things like that. It could generate little bits of code, and it seemed to understand the logic a bit. But this was totally context-unaware: it was just dropping code you'd asked for into a chat window. It had no idea what project you were working on or what the context was.
Shortly thereafter we got context-aware code generation. GitHub Copilot as autocomplete was probably the best example here: it was actually in your IDE, it could see where you were typing and what code you were working on, and it could generate code specific to your codebase — code that referenced your local variable names or the table names in your database. That was a huge improvement for productivity. Instead of copy-pasting back and forth between the ChatGPT window and your IDE, all of a sudden the little robot gets its eyes: it can see inside your codebase and generate code that's actually relevant to it.

Then I think the giant leap happened in early 2024, with the launch of Devin and, the next day, the launch of OpenDevin — now OpenHands. This is where we first started to see autonomous coding agents. AI started not just writing code but running the code it wrote. It could Google an error message, find a Stack Overflow article, apply that to the code, add some debug statements, run it, and see what happens — basically automating the entire inner loop of development.
This was a huge step function forward — the little robot gets arms in this picture. It was a huge jump, at least in my own productivity: being able to write a couple of sentences of English, give them to an agent, and let it churn through the task until it has something that's actually working, running, with tests passing. And now what we're seeing is parallel agents — what we're calling agent orchestration.
Folks are figuring out how to get multiple agents working in parallel, sometimes talking to each other, sometimes spinning up new agents under the hood — agents creating agents. This is the bleeding edge of what's possible. People are just starting to experiment with it and just starting to see success with it at scale, but there are some tasks that are very amenable to this sort of workflow.
And it has the potential to automate away the huge mountain of tech debt that sits under every contemporary software company.

A little bit about the market landscape here. You can see that same evolution from left to right: we started with plugins like GitHub Copilot inside our existing IDEs, then we got AI-empowered IDEs — IDEs with AI tacked onto them. I would say your median developer is adopting local agents now; they may be running Claude Code locally for one or two things, maybe some ad hoc tasks. Your early adopters, though, are starting to look at cloud-based agents — agents that get their own sandbox running in the cloud. That allows them to run as many agents as they want in parallel, and to run those agents much more autonomously than if they were running on a local laptop. If it's running on your laptop, there's nothing stopping the agent from doing `rm -rf /`, trying to delete everything in your home directory, or installing some weird software. Whereas if it's got its own containerized environment somewhere in the cloud, you can run it a bit more safely, knowing the worst it can do is ruin its own environment — and you don't have to sit there babysitting it and hitting the Y key every time it wants to run a command.
So those cloud-based environments are much more scalable and a bit more secure. And then at the far right, what we're seeing just the top 1% of early adopters start to experiment with is orchestration: the idea that you not only have these agents running in the cloud, but you have them talking to each other.
You're coordinating those agents on a larger task. Maybe those agents are spinning up sub-agents within the cloud that have their own sandbox environments.
Some really cool stuff is happening there. With OpenHands, we generally started with cloud agents; we've since leaned back a little and built a local CLI, similar to Claude Code, in order to meet developers where they are today. These types of experiences are much more comfortable for developers — we've been using autocomplete for decades, and it just got a million times better with GitHub Copilot. The experiences on the right side of this landscape are very foreign to developers. It feels strange to hand off a task to an agent, or a fleet of agents, and let them do the work for you. For me at least, it feels like the jump I made when I went from being an IC to being a manager — that's what going from writing code myself to giving that work to agents feels like. It's a very different way of working, and I think it's one developers have been slow to adopt.
But again, the top 1% or so of engineers who have adopted the stuff on the right side of this landscape have been able to get massive lifts in productivity and tackle huge backlogs of tech debt that other teams just weren't getting to.
Some examples of where you would want to use orchestration rather than a single agent: typically these are tasks that are very repeatable and very automatable.
Some examples are the basic code maintenance tasks. Every codebase has a certain amount of work you have to do just to keep the lights on: keeping dependencies up to date, making sure any vulnerabilities get resolved.
We have one client, for instance, that is using OpenHands to remediate CVEs throughout their entire codebase.
They have tens of thousands of developers and thousands upon thousands of repositories. Basically, every time a new vulnerability gets announced in an open source project, they have to go through their entire codebase, figure out which of their repos are vulnerable, and submit a pull request to each one to actually resolve the CVE — update whatever dependency, fix breaking API changes.
And they have seen a 30x improvement in time-to-resolution for these CVEs by doing orchestration at scale. They basically have a setup now where every time a CVE gets announced — a new vulnerability comes in — they kick off an OpenHands session to scan a repo for that vulnerability,
make any code changes that are necessary, and open up a pull request. All the downstream team has to do is validate the changes and click merge.
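To make that trigger concrete, here's a rough sketch of the shape of such a pipeline — not the client's actual setup. The advisory data, the `requirements*.txt` check, and `dispatch_agent` are all hypothetical placeholders for however you detect affected repos and launch an OpenHands session.

```python
from pathlib import Path

# Hypothetical advisory: the vulnerable package and the version that fixes it.
VULNERABLE_PACKAGE = "examplelib"
FIXED_VERSION = "2.4.1"

def repo_is_affected(repo: Path) -> bool:
    """Crude check: does any requirements file pin the vulnerable package?"""
    for req in repo.rglob("requirements*.txt"):
        for line in req.read_text().splitlines():
            if line.strip().startswith(VULNERABLE_PACKAGE + "=="):
                return True
    return False

def dispatch_agent(repo: Path) -> None:
    """Hypothetical stand-in for kicking off a sandboxed agent session with a prompt
    like: 'bump examplelib to >=2.4.1, fix breaking API changes, open a PR'."""
    print(f"would dispatch an agent for {repo}")

def remediate(all_repos: list[Path]) -> None:
    for repo in all_repos:
        if repo_is_affected(repo):
            dispatch_agent(repo)
```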
You can also do this for things like automating documentation and release notes.
There's also a bunch of modernization challenges that companies face. For instance, you might want to add type annotations to your Python codebase if you're working with Python 3, or split a Java monolith into microservices.
These are the sorts of tasks that still take a lot of thought from an engineer — you can't just one-shot it and say, "refactor my monolith into microservices" — but a lot of it is still rote work; you're still mostly copying and pasting code around.
So if you thoughtfully orchestrate agents together, they can do this.
There's also a lot of migration work — migrating from old versions of Java to new versions of Java, for example. We're working with one client to migrate a bunch of Spark 2 jobs to Spark 3.
We've used OpenHands to migrate our entire front end from Redux to Zustand. So you can do these very large migrations. Again, it's a lot of rote work, but it still takes a lot of thinking from a human about how to orchestrate the agents.
And there's a lot of tech-debt work, like detecting unused code and getting rid of it. We have one client who's using our SDK to basically scan their logs: every time there's a new error pattern, an agent goes into the codebase, adds error handling, and fixes whatever problem is cropping up.
So there are lots of things that are a little too big for a single agent to just one-shot, but that are super automatable — good tasks to handle with agents as long as you're thoughtful about orchestrating them. A bit about why these aren't one-shot-able tasks.
Some of the reasons are technological; some are more human and psychological. On the technology side, there's a limited amount of context you can give an agent. So for extremely long-running tasks, or tasks that span a very large codebase, you usually don't have enough — you'll have to compact the context window to the point where the agent might get lost.
We've all seen the laziness problem. I've tried to launch some of these types of tasks and had the agent say, "Okay, I migrated three of your 100 services. I need to hire a team of six people to do the rest." Agents also often lack domain knowledge about your codebase — they don't have the same intuition for the problem that you do.
And errors compound when you go on these really long trajectories with an agent. A tiny error at the beginning compounds over time: the agent will basically repeat it over and over again, at every single step it takes in the task.
Then on the human side: we have intuition for the problem that we can't fully convey. Say you want to break your monolith into microservices — you probably have a mental model of how that's going to work.
If you just tell the agent, "break the monolith into microservices," it's going to take a shot in the dark, based on patterns it's seen in the past, without any real understanding of your codebase.
We also have difficulty decomposing these tasks for agents and understanding what an agent can actually get done in one shot. And you do need intermediate review — intermediate check-ins from the human — as the agent does its work. We'll talk a little later about what that loop looks like.
So again, it's not something where you can just tell an agent what to do and expect a final result to come back; you have to approve things as the agent goes along.
And then there's not having a true definition of done: if you don't really know what finished looks like for the project, it's hard to tell the agent.
On these types of orchestration workflows, I want to make it super clear that we don't expect every developer to be doing agent orchestration.
We think most developers are going to use a single agent locally for the sort of ad hoc tasks that are common for engineers — building new features, fixing bugs, things like that. I think running Claude Code locally, in a familiar environment alongside an IDE, is probably going to be the common workflow, at least for the next couple of years.
What we're seeing is that a small percentage of engineers — early adopters who are really excited about agents — are finding ways to orchestrate agents to tackle huge mountains of tech debt at scale, and they get a much bigger productivity lift on that smaller, select set of tasks. You're not going to see a 3,000% lift in productivity across all of software engineering; there you'll probably get more like the 20% lift everybody's been reporting.
But for some select tasks — like CVE remediation or codebase modernization — you can get a massive lift; you can do engineer-years of work in a couple of weeks.

I want to talk a little bit about what these workflows look like in practice. This loop probably looks pretty familiar if you're used to working with local agents — it's a very typical loop that looks a lot like the inner loop of development for non-AI coding as well. Basically, you give the agent a prompt and it does some work in the background; maybe you babysit it and watch everything it's doing, hitting the Y key every time it wants to run a command.
Then the agent finishes and you look at the output: you see whether the tests are passing and whether it actually satisfies what you asked for. Maybe you prompt the agent again to get a little closer to the answer, or maybe you're satisfied with the result, so you commit and push.
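As a rough sketch, that loop looks something like the following — `agent_run` is a hypothetical stand-in for whatever local agent you're driving, and the only check here is whether the tests pass.

```python
import subprocess

def agent_run(prompt: str) -> None:
    """Hypothetical: hand the prompt to a local coding agent and wait for it to finish."""
    print(f"agent working on: {prompt}")

def tests_pass() -> bool:
    return subprocess.run(["pytest", "-q"]).returncode == 0

def inner_loop(task: str, max_rounds: int = 3) -> bool:
    prompt = task
    for _ in range(max_rounds):
        agent_run(prompt)
        if tests_pass():
            return True   # satisfied: commit and push
        prompt = f"{task}\nThe tests are still failing; please fix them."
    return False          # escalate to a human
```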
For bigger orchestrated tasks, this becomes a little bit more complicated.
Basically, you — maybe hand in hand with Claude — want to decompose your task into a series of tasks that can each be executed individually by an agent.
Then you'll send off an agent for each of those individual tasks — one agent per task. Finally, at the end, you (maybe with the help of an agent) will need to pull the output from all those individual agents together into a single change and merge it into your codebase.
Very importantly, there's still a lot of human-in-the-loop here. You need to review not just the final, collated output but the intermediate outputs from each agent. I like to tell folks the goal is not to automate this process 100% — it's something like 90% automation.
That's still an order-of-magnitude productivity lift. This is really tricky to get right, though — this is where a lot of the thought goes: how am I going to break the task down so that I can verify each individual step, and so that I can automate the whole process without just ending up with an AI-coded mess?
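The fan-out / fan-in shape of that process can be sketched like this. The subtask list, the Redux-to-Zustand wording, and `run_subtask_agent` are illustrative, not part of any OpenHands API — the point is one sandboxed agent per one-shot-able subtask, with every branch still getting a human review before it's collated.

```python
from concurrent.futures import ThreadPoolExecutor

SUBTASKS = [
    "Migrate components/Sidebar from Redux to Zustand",
    "Migrate components/Editor from Redux to Zustand",
    "Migrate components/Terminal from Redux to Zustand",
]  # produced during decomposition, by you or with an agent's help

def run_subtask_agent(subtask: str) -> str:
    """Hypothetical: run one sandboxed agent and return the branch it pushed."""
    return f"agent/{abs(hash(subtask)) % 10_000}"

def orchestrate(subtasks: list[str], max_parallel: int = 3) -> list[str]:
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        branches = list(pool.map(run_subtask_agent, subtasks))
    # Each branch still gets an intermediate human review before being
    # collated into the integration branch and merged.
    return branches
```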
This is a typical git workflow I like to use for tasks like this. Typically we'll start a new branch on the repository. We might add some high-level context to that branch, using something like an AGENTS.md or, in OpenHands, the concept of a microagent — just a markdown file explaining, "here's what we're doing here," so the agent knows: okay, we're migrating from Redux to Zustand, or we're going to migrate these Spark 2 jobs to Spark 3. You might also want to put some kind of scaffolding in place; I'll talk more about examples of scaffolding later. Then you're going to create a bunch of agents based off that first branch. The idea is that they'll submit their work into that branch, which accumulates the work as we go along, and eventually, once we get to the end, we can rip out the scaffolding and merge that branch into main.
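Here's a minimal sketch of that branch workflow, driving plain git commands from Python. The branch name, the `MIGRATION.md` context file, and the Redux/Zustand wording are illustrative, not OpenHands conventions.

```python
import subprocess

def git(*args: str) -> None:
    subprocess.run(["git", *args], check=True)

def start_migration(branch: str = "migrate/redux-to-zustand") -> None:
    git("checkout", "-b", branch)
    # High-level context the agents can read before starting their subtasks.
    with open("MIGRATION.md", "w") as f:
        f.write("We are migrating state management from Redux to Zustand.\n"
                "Keep both stores working until every component is migrated.\n")
    git("add", "MIGRATION.md")
    git("commit", "-m", "Add migration context for agents")
    git("push", "-u", "origin", branch)
    # ...agents now branch off and merge their work back into `branch`...

def finish_migration(branch: str = "migrate/redux-to-zustand") -> None:
    git("checkout", branch)
    git("rm", "MIGRATION.md")   # rip out the scaffolding
    git("commit", "-m", "Remove migration scaffolding")
    git("checkout", "main")
    git("merge", "--no-ff", branch)
```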
Now, if you're just getting started with this, I would suggest limiting yourself to about three to five concurrent agents — I find that beyond that, your brain starts to break. But for folks who have really adopted orchestration at scale, we see them running hundreds or even thousands of agents concurrently. Usually a single human is not on the hook to review every single one; instead, those agents might be sending pull requests out to individual teams, things like that.
So you can scale up very aggressively once you get a feel for how all this works and you have a good way of getting human input into the loop. I'm going to kick it over to my coworker Calvin here. He's going to talk about a very large-scale migration — basically eliminating code smells from the OpenHands codebase — that he did using our refactor SDK.
OpenHands excels at solving scoped tasks. Give it a focused problem — something like "fix my failing CI" or "add and debug this endpoint" — and it delivers. But like all agents, it can stumble when the scope grows too large. Let's say I want to refactor an entire codebase: maybe enforce strict typing, update a core dependency, or even migrate from one framework to another. These aren't single tasks; they're sprawling, interconnected changes that can touch hundreds of files. To battle problems at this scale, we're using the OpenHands agent SDK to build tools designed specifically to orchestrate collaboration between humans and multiple agents.

As an example, let's work to eliminate code smells from the OpenHands repo. Here's the repository structure: just the core agent definition has about 380 files spanning 60,000 lines of code. That says a lot about the volume of the code, but not much about its structure. So let's use our new tools to visualize the dependency graph of this chunk of the repository. Here, each node represents a file and the edges show dependencies — who imports whom. As we keep zooming out, it becomes clear that this tangled web is why refactoring at scale is hard.

To make this manageable, we need to break the graph up into human-sized chunks — think PR-sized batches that an agent can handle and a human can understand. There are many ways to batch, based on what's important to you. Graph-theoretic algorithms give strong guarantees about the structure of the edges between induced batches, but for our purposes we can simply use the existing directory structure to make sure that semantically related files end up in the same batch. Navigating back to the dependency graph, we can see that the colors of the nodes are no longer randomly distributed; instead, they correspond to the batch each file belongs to. Zooming out and back in, we easily find clusters of adjacent nodes that are all the same color, which indicates that an agent is going to access all of those files together. Of course, this graph is still large and incredibly tangled. To construct a simpler view, we'll build a new graph where the nodes are batches and the edges between them are the dependencies inherited from the files within each batch. This view is much simpler — we can see the entire structure on the screen at once.
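The demo uses the refactor SDK's own tooling, but the two ideas behind it — a file-level import graph and directory-based batching — are easy to sketch. This is a rough illustration, not the OpenHands implementation; the `openhands` path is just a placeholder.

```python
import ast
from collections import defaultdict
from pathlib import Path

def module_name(pkg_root: Path, path: Path) -> str:
    """Dotted module name for a file, e.g. openhands/core/agent.py -> openhands.core.agent."""
    rel = path.relative_to(pkg_root.parent).with_suffix("")
    return rel.as_posix().replace("/", ".").removesuffix(".__init__")

def build_import_graph(pkg_root: Path) -> dict[str, set[str]]:
    """Map each module to the set of modules *inside this package* that it imports."""
    modules = {module_name(pkg_root, p): p for p in pkg_root.rglob("*.py")}
    graph: dict[str, set[str]] = defaultdict(set)
    for mod, path in modules.items():
        tree = ast.parse(path.read_text(), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                imported = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                imported = [node.module]
            else:
                continue  # this rough sketch skips relative imports
            for name in imported:
                for target in modules:
                    if (name == target or name.startswith(target + ".")) and target != mod:
                        graph[mod].add(target)
    return graph

def batch_by_directory(pkg_root: Path) -> dict[str, list[Path]]:
    """One PR-sized batch per directory, so semantically related files stay together."""
    batches: dict[str, list[Path]] = defaultdict(list)
    for p in pkg_root.rglob("*.py"):
        batches[p.parent.relative_to(pkg_root).as_posix()].append(p)
    return batches

if __name__ == "__main__":
    root = Path("openhands")  # hypothetical path to the package being refactored
    graph = build_import_graph(root)
    batches = batch_by_directory(root)
    print(f"{len(graph)} files with internal imports, {len(batches)} directory batches")
```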
Working from this batch graph, we can identify batches that have no dependencies and inspect the files they contain. One of them, for example, contains what looks like just an __init__ file — it's probably empty. Let's check. Now, this is a tool intended for human-AI collaboration, so once we know the file is empty, we might decide it's better to move it elsewhere, or maybe we're okay keeping it inside this batch and all we want to do is add a note to ourselves so we remember its contents. Of course, when refactoring code, it's also important to consider the complexity of what you're moving. That batch was trivial, so let's find one that's a little more complex. Here's a batch with four files that all do real work, and the complexity measures reflect that; they're a useful signal to a human that we should be more careful here than with, say, that first batch.

But before you can fix code smells, you need to identify what's wrong in the first place. Enter the verifier. There are several different ways of defining a verifier, based on what you care about. You can make it programmatic, so it calls a bash command — useful if your verification means running unit tests, a linter, or a type check. Instead, though, because I'm interested in code smells, I'm going to use a language model that looks at the code and tries to identify problematic patterns based on a set of rules I provide.
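Here's a minimal sketch of the programmatic flavor of a verifier — run a linter and a type checker over one batch's files and record a pass/fail report. The LLM-based smell detector used in the demo would slot in as another implementation of the same interface; this version is shown because it needs no model (it assumes `ruff` and `mypy` are installed).

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Report:
    batch: str
    passed: bool
    details: str

def verify_batch(batch_name: str, files: list[Path]) -> Report:
    """Run ruff (lint) and mypy (types) over just this batch's files."""
    targets = [str(f) for f in files]
    results = []
    for cmd in (["ruff", "check", *targets], ["mypy", *targets]):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((cmd[0], proc.returncode, proc.stdout + proc.stderr))
    passed = all(code == 0 for _, code, _ in results)
    details = "\n".join(f"[{tool}] exit={code}\n{output}" for tool, code, output in results)
    return Report(batch=batch_name, passed=passed, details=details)
```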
Now let's go back to our first batch and actually put the verifier to use. Remember, this batch is trivial, and fortunately the verifier recognizes it as such: it comes back with a nice little report indicating which patterns it looked for and which it did (and didn't) find, and the status of the batch turns to completed — green. Good. That change in status is also reflected in the batch graph: navigating back and toggling the color display, we see exactly one node out of many completed, with the rest still to be handled. But this already gives us a good sense of the work we've done and how it fits into the bigger picture. So our strategy for ensuring there are no code smells in the entirety of the repository is straightforward: we just have to make every single node on this batch graph turn green.

Let's go back to our batches and continue verifying until we run across a failure, going in dependency order and making sure we pick nodes that don't depend on batches we have yet to analyze. The next batch is about as simple as the first, but because its __init__ file is a little more complex, the report that gets generated is a little more verbose. Continuing down the list, we come across the batch we flagged earlier, with some chunky files of relatively high complexity — and it gives us our first failure. Notice that the status turns red instead of green. This batch has more files than the ones we've seen so far, so the verification report is proportionally longer. Looking through it, we see it lists, file by file, the code smells it identified; one file is particularly egregious with its violations. We'll have to come back to that. Zooming all the way back out to the batch graph, the status indicators show two green nodes for the batches we've already successfully verified and one red node for the batch that just failed verification.

Our stated goal is to turn this entire graph green, so that red node presents a problem. To convert it into a green node, we need to address the issues the verifier found using the next step of the pipeline: the fixer. Just like the verifier, the fixer can be defined in a number of ways. A programmatic fixer can run a bash command, or you can feed the entire batch into a language model and hope it addresses the issues in a single step. But by far the most powerful fixer we have uses the OpenHands agent SDK to make a clean copy of the code and set loose an agent with access to all sorts of tools — it can run tests, examine the code, look up documentation on the web, and do whatever it needs to address the issues. So let's run the fixer on the failing batch and see what happens. This part of the demo is sped up considerably, but because we're exploring the batches in dependency order, we can keep going down the list while we wait — running our verifiers and spinning up new instances of the OpenHands agent via the SDK — until we come across a node that's blocked because one of its upstream dependencies is still incomplete.

When the fixer is done, the status of the batch is reset, and we'll need to rerun verification later to make sure it comes back green. Looking at the report the fixer returned, there's not much information — just the title of the PR. We've set things up so that every fixer produces a nice, tidy pull request ready for human approval: just because the refactor is automated doesn't mean it needs to be unreviewed. And here's the generated PR. The agent does an excellent job of summarizing the code smells it identified and the changes it made to address them, as well as any other changes it had to make, and it leaves helpful notes for the reviewer and for anyone working on this part of the code in the future. When we look at the content of the change, we see it's low-risk: all the changes are tightly focused on addressing the code smells we specified earlier, and only a couple hundred lines were modified, the bulk of which is simply refactoring a nested block into its own function. Not every PR will have a scope this small, but our batching strategy and narrow instructions ensure the scope of the changes is well considered. That helps agent performance, and it also makes human review easier.

From here, the full process for removing code smells from the entire codebase becomes clear: use the verifier to identify problems, use the fixer to spin up agents that address those problems, review and merge those PRs to unblock new batches, and repeat until the entire graph is green. We've already used this tooling to make some pretty significant changes to the codebase, including typing improvements and better tests — and we could not have done it without the OpenHands agent SDK powering everything under the hood.
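Putting the pieces together, the verify-fix-repeat loop described above looks roughly like this, reusing the `Report` and `verify_batch` sketch from earlier. `run_fixer_agent` is a hypothetical placeholder for dispatching an agent (for example via the agent SDK) that opens a pull request.

```python
from pathlib import Path

def run_fixer_agent(batch_name: str, report: Report) -> None:
    """Hypothetical: hand the failing batch and its report to a coding agent
    that opens a pull request addressing the findings."""
    print(f"would dispatch a fixer agent for batch {batch_name}")

def drive(batches: dict[str, list[Path]], deps: dict[str, set[str]]) -> None:
    """Verify batches in dependency order; hand failures to the fixer."""
    done: set[str] = set()
    pending = dict(batches)
    while pending:
        # Only pick batches whose upstream dependencies are already green.
        ready = [name for name in pending if deps.get(name, set()) <= done]
        if not ready:
            break  # everything left is blocked until fixer PRs are reviewed and merged
        for name in ready:
            report = verify_batch(name, pending.pop(name))
            if report.passed:
                done.add(name)                  # node turns green
            else:
                run_fixer_agent(name, report)   # re-verify on a later pass, after the PR merges
```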
All right. So that's the OpenHands refactor SDK, powered by our OpenHands agent SDK. Later in the workshop we're going to walk through how to build something a little simpler but very similar, where parallel agents work together to fix issues discovered by an initial agent. I want to talk a little bit about strategies for decomposing tasks and for sharing context between agents — both really big, important parts of agent orchestration.
So, effective task decomposition. You're really looking to break your very big problem down into tasks that a single agent can solve — can one-shot — something that fits in a single commit, a single pull request. That's super important, because you don't want to be constantly iterating with each of the sub-agents; you want a pretty good guarantee that each one will one-shot its piece so you can rubber-stamp it and get it merged into your ongoing branch.

You also want to look for things that can be parallelized — that's a huge way to increase the speed of the task. If you're just executing a bunch of agents serially, you might as well have a single agent move through the task serially; the more you can parallelize and get many agents working at once, the faster you can move through the task and iterate. You want things you can verify as correct easily and quickly — ideally you can just look at the CI/CD status and have good confidence that if everything's green, you're good. Maybe you'll need to click through the application itself or run a command yourself to check that things look right, but you want to be able to understand very quickly whether an agent has done the work you asked for. And you want clear dependencies and ordering between tasks.

You'll notice these criteria are pretty similar to how you might break down work for an engineering team: you need tasks that are separable — tasks that different people on your team can execute in parallel — and then you collect the results together. You want to know that once task A is done, it unlocks tasks B, C, and D, and once those are done, you can do E. It's very similar to breaking down work for a team of engineers.
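Those criteria can be captured directly as data. Here's a small sketch — the field names are illustrative — where every subtask carries its own prompt, a cheap verification command, and explicit dependencies, so an orchestrator knows what can be dispatched in parallel right now.

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    name: str
    prompt: str                       # what the agent is asked to do, one-shot-able
    verify_cmd: list[str]             # quick check, e.g. ["pytest", "tests/billing"]
    depends_on: set[str] = field(default_factory=set)

def ready(tasks: list[Subtask], done: set[str]) -> list[Subtask]:
    """Subtasks whose dependencies are all finished can be dispatched in parallel now."""
    return [t for t in tasks if t.name not in done and t.depends_on <= done]
```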
There are a few different strategies for breaking down a very large refactor like the one we just saw Calvin do.
The simplest, most naive one is to just go piece by piece: iterate through every file in the repository, every directory, maybe every function or class. This is a fairly straightforward way to do things, and it works well if those pieces can be handled without depending on one another too much.
A good example might be adding type annotations throughout your Python codebase. At the very end, once you've migrated every single file, you can collect all those results into a single PR. A slightly more sophisticated approach is to create a dependency tree.
The idea here is to add some ordering to that piece-by-piece approach. As we saw Calvin do, you start with the leaf nodes in your dependency graph — maybe your utility files — and get those migrated over first. Then anything that depends on them already has those initial fixes in place, and its dependents can start working through their part of the process. You basically work your way back up to whatever the entry point of the application is.
This is often a better way to proceed — a more principled approach to ordering the tasks.
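Concretely, the dependency-tree ordering is just a topological sort of "who depends on whom." Here's a small sketch, assuming you already have an internal import map like the one in the earlier graph sketch: leaves (utilities that depend on nothing else internal) come out first, and the application entry point comes out last.

```python
from collections import deque

def migration_order(imports: dict[str, set[str]]) -> list[str]:
    """imports[m] = set of internal modules that m depends on; returns a leaves-first order."""
    nodes = set(imports) | {d for deps in imports.values() for d in deps}
    remaining = {m: set(imports.get(m, set())) for m in nodes}
    ready = deque(sorted(m for m, deps in remaining.items() if not deps))
    order: list[str] = []
    while ready:
        m = ready.popleft()
        order.append(m)
        newly_ready = []
        for other, deps in remaining.items():
            if m in deps:
                deps.discard(m)
                if not deps:
                    newly_ready.append(other)
        ready.extend(sorted(newly_ready))
    return order  # anything missing from the result is part of an import cycle: break it by hand

print(migration_order({"app": {"ui", "db"}, "ui": {"utils"}, "db": {"utils"}, "utils": set()}))
# ['utils', 'db', 'ui', 'app']
```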
Another strategy is to create some kind of scaffolding that lets you live in both the pre-migration and post-migration worlds at the same time.
We did this, for example, when migrating our React state management system.
We basically had an agent set up some scaffolding that let us work with both Redux and Zustand at the same time. Pretty ugly — not something you'd actually want to keep.
But it allowed us to test the application as each individual component got migrated from the old state management system to the new one.
Then we sent off parallel agents, one for each component, and got each component done. At the very end, once everything was using Zustand, we were able to rip out all the scaffolding so there was no more mention of Redux, and everything still worked. Having that scaffolding in place allowed us to validate, as each agent finished its work on just one component, that the application was still working and that component still worked. We didn't have to do everything all at once — we kept a human feedback loop going with the agents as we went.

Next, I want to talk a bit about context sharing. As you go through a big, large-scale project like this, you're going to learn things. You're going to figure out, okay, my original mental model wasn't actually complete; I didn't actually understand the problem correctly. Your agents might run into this too — you might have a fleet of 10 agents running and they're all hitting the exact same problem, and you want to share the solution so they don't all get stuck. There are a bunch of different strategies for doing this context sharing between agents.
One strategy — the most naive thing you can do — is to share everything: every agent sees every other agent's context.
This is not great. It's basically the same thing as having a single agent work through the task iteratively, and you're going to blow through your context window really quickly, so it's not going to help.
A slightly better, if manual, approach is to have a human feed information to the agents directly. If you have a chat window open with each agent, you can just paste in, "hey, use library 1.2.3 instead of 1.2.2." The human can also modify an AGENTS.md or a microagent file to pass messages to the agents. But this does involve manual human effort.
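That markdown-based approach boils down to a shared notes file: a human (or a coordinating agent) appends hard-won facts to one small file, and every newly spawned agent gets it prepended to its prompt. Here's a minimal sketch of that idea — the file name and helper functions are illustrative, not an OpenHands API.

```python
from pathlib import Path

LEARNINGS = Path("MIGRATION_NOTES.md")  # shared, append-only notes for all agents

def record_learning(note: str) -> None:
    """e.g. record_learning('Use library 1.2.3 instead of 1.2.2')"""
    with LEARNINGS.open("a") as f:
        f.write(f"- {note}\n")

def build_prompt(task: str) -> str:
    """Prepend the shared notes to each new agent's task prompt."""
    notes = LEARNINGS.read_text() if LEARNINGS.exists() else ""
    return f"Shared notes from earlier agents:\n{notes}\nYour task:\n{task}"
```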