AI agent isolation in development: a safer way to run generated code

An agent in the terminal looks like a teammate, but it runs as a process

A familiar scene: you open a repository, ask an AI agent to fix a failing test, and it confidently proposes a plan. At first it feels like talking to a capable teammate: inspect a file, edit a function, install a package, run checks. Technically, though, it is not a teammate but a process with access to the file system, the command line, the network, and sometimes secrets.

That is where advice turns into action. Wrong text in a chat can be ignored. A wrong command can delete temporary files, rewrite a lockfile, pull in an unwanted dependency, call an external service, or accidentally print a token in logs. Docker’s article on AI agent isolation describes this shift as the move from tools that answer to tools that act. For developers, the practical conclusion is simple: not every agent action should run in the same place where everything valuable lives.

Step one: the agent wants to change code

The lowest-risk scenario is an agent reading files and proposing a patch. Even here, a boundary helps. If the task touches one module, the agent does not need freedom to rewrite the whole repository. A good working model is a separate branch, a clean working tree, a clear list of allowed directories, and a mandatory diff review after the change.
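
The "allowed directories" part of that model can be checked mechanically after the agent finishes. The sketch below is one possible approach, not a standard tool: it assumes you already have the list of changed paths as repository-relative POSIX paths (for example from `git diff --name-only`), and the function name `outside_allowed` and the directory names are made up for illustration.

```python
from pathlib import PurePosixPath


def outside_allowed(changed_paths, allowed_dirs):
    """Return changed paths that are not inside any allowed directory."""
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    violations = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # Absolute paths and ".." segments could point outside the repository.
        escapes = path.is_absolute() or ".." in path.parts
        # A path is inside an allowed directory if that directory is one of its parents.
        inside = any(d == path or d in path.parents for d in allowed)
        if escapes or not inside:
            violations.append(raw)
    return violations


# Hypothetical change set: the agent was asked to touch only the billing module.
changed = ["src/billing/invoice.py", ".prettierrc", "src/billing_v2/api.py"]
print(outside_allowed(changed, ["src/billing", "tests/billing"]))
```

Comparing path components rather than string prefixes means `src/billing_v2` is not mistaken for something inside `src/billing`.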

What can go wrong? The agent may fix a symptom while breaking adjacent behavior. It may rename a variable more broadly than needed. It may change a formatter configuration because that makes the test easier to pass. For code edits, running on the host is often acceptable, but only when version control is in place, important local changes are not mixed in, and the agent cannot automatically commit or push the result.

The anti-pattern is starting an agent in a repository with a dirty working tree and later trying to figure out which edits were yours and which were generated. Before starting, either save your own changes separately or create a disposable environment from a clean copy.
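
A pre-flight check for a dirty working tree can be small. This is a minimal sketch that relies on `git status --porcelain` printing nothing when there are no modified or untracked files; the helper names are illustrative, and the wrapper assumes `git` is on the PATH.

```python
import subprocess


def dirty_entries(porcelain_output):
    """Split `git status --porcelain` output into one entry per changed path."""
    return [line for line in porcelain_output.splitlines() if line.strip()]


def working_tree_is_clean(repo_dir):
    """Run git in repo_dir and report whether anything is modified or untracked."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git itself fails, e.g. not a repository
    )
    return not dirty_entries(result.stdout)
```

A wrapper script could refuse to start the agent when `working_tree_is_clean(".")` returns `False` and list the entries that need to be stashed or committed first.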

Step two: the agent asks to install a package

Installing dependencies is much riskier. It can change a lockfile, run postinstall scripts, contact the network, and add code you have not reviewed. If the agent says, “let’s add a small library, it will be faster,” that is not automatically wrong. But it is no longer just editing a file.
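
For npm projects, one cheap signal is whether a package manifest declares scripts that run at install time. The sketch below assumes you have the text of a `package.json` in hand. The hook list covers the `preinstall`, `install`, and `postinstall` lifecycle scripts and is meant as a starting point to adjust, not a complete audit.

```python
import json

# npm lifecycle script names that run as part of installing a package.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_time_scripts(package_json_text):
    """Return the install-time scripts declared in a package.json document."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    return {name: scripts[name] for name in INSTALL_HOOKS if name in scripts}


# Hypothetical manifest of a dependency the agent wants to add.
sample = '{"name": "demo", "scripts": {"test": "jest", "postinstall": "node setup.js"}}'
print(install_time_scripts(sample))
```

An empty result does not make a package safe. A non-empty one is a reason to read the script before it runs anywhere near the host.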

For this class of action, a container or sandbox is a better default. A container is useful when you need to reproduce the project environment without polluting the host. A sandbox or microVM is more appropriate when the command comes from an agent, the dependency is new, or you are not sure installation scripts are safe.

A practical boundary looks like this: the host may read the repository and run familiar local commands; a container may install dependencies and run normal tests; a sandbox or microVM should handle untrusted packages, unfamiliar scripts, code generators, and commands that touch system settings. Secrets should not enter these environments by default. If a test does not require a real token, use a fake value. If it does require one, prefer narrow temporary access instead of copying a developer’s main key.
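
One way to keep secrets from entering by default is to build the child process environment from an allowlist instead of inheriting the developer's shell. This is a sketch under that assumption; `agent_env`, the variable names, and the fake token value are illustrative.

```python
def agent_env(host_env, passthrough, fake_values=None):
    """Build an environment holding only allowlisted variables plus fake placeholders."""
    # Copy nothing unless it is named explicitly.
    env = {name: host_env[name] for name in passthrough if name in host_env}
    # Tests that only need "some" token get a fake value instead of the real one.
    env.update(fake_values or {})
    return env


# Hypothetical host environment containing real credentials.
host_env = {
    "PATH": "/usr/bin",
    "LANG": "C",
    "GITHUB_TOKEN": "real-token",
    "AWS_SECRET_ACCESS_KEY": "real-key",
}
env = agent_env(host_env, ("PATH", "LANG"), {"GITHUB_TOKEN": "fake-token-for-tests"})
print(sorted(env))
```

The resulting dictionary can be passed as `env=` to `subprocess.run`, so the command never sees variables that were not listed.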

Step three: the agent runs tests and commands

Tests feel safe because the command is familiar: npm test, pytest, go test. But a test suite can still write to the file system, open ports, create a database, reach the network, or execute package manager scripts. In a normal project, that may be expected. With an AI agent, the question is different: do you understand exactly which command it is running and what permissions that command has?

If the command is known and already belongs to the project, running it on the host after review may be fine. If the agent composed a long shell chain on its own, move it into a disposable environment. Be especially careful with commands that delete files, change permissions, install packages globally, access the Docker socket, use SSH, call cloud CLIs, or read environment variables that contain secrets.
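
The categories above can be turned into a rough pre-run filter. The patterns below are heuristics chosen for illustration, not a complete or authoritative list: they will miss things and occasionally flag harmless commands, so treat a match as a prompt for human review rather than a verdict.

```python
import re

# (pattern, reason) pairs; each pattern is a deliberately simple heuristic.
RISK_PATTERNS = [
    (r"(?<![\w-])rm\s", "deletes files"),
    (r"\bsudo\b", "runs with elevated privileges"),
    (r"\bch(mod|own)\b", "changes permissions or ownership"),
    (r"\bnpm\s+(install|i|add)\b[^;&|]*\s(-g|--global)\b", "installs a package globally"),
    (r"docker\.sock", "touches the Docker socket"),
    (r"\b(ssh|scp)\b", "uses SSH or touches SSH files"),
    (r"\b(aws|gcloud|az)\b", "calls a cloud CLI"),
    (r"\$\{?\w*(TOKEN|SECRET|PASSWORD|KEY)", "reads an environment variable that looks like a secret"),
]


def risk_reasons(command):
    """Return the reasons a shell command deserves review before it runs."""
    return [reason for pattern, reason in RISK_PATTERNS if re.search(pattern, command)]


print(risk_reasons("rm -rf node_modules && npm install -g typescript"))
```

A command with no matches is not proven safe. The filter only decides which commands should never run on the host without someone reading them first.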

Network access should also be split by need. A linter does not need the internet. Unit tests often do not either. Integration tests may need a local database, but not necessarily access to the whole internet. The less of the network a command can reach, the lower the chance that a mistake or a malicious dependency sends data outside.
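
If the commands run in Docker, that split can be expressed as a per-task network choice. The sketch below only builds the argument list and does not start anything. The task names, the image name, and the `agent-internal` network are assumptions for illustration; the latter would have to be created beforehand, for example with `docker network create --internal agent-internal`.

```python
# Which Docker network each kind of task gets. "none" disables networking entirely.
NETWORK_BY_TASK = {
    "lint": "none",
    "unit-test": "none",
    # Hypothetical user-defined internal network shared with a local database container.
    "integration-test": "agent-internal",
}


def docker_run_args(task, image, command):
    """Build a `docker run` argument list with the narrowest network for the task."""
    # Unknown task types fall back to no network at all.
    network = NETWORK_BY_TASK.get(task, "none")
    return ["docker", "run", "--rm", "--network", network, image, *command]


print(docker_run_args("lint", "project-dev:latest", ["npx", "eslint", "."]))
```

Defaulting unknown tasks to `none` keeps a new kind of command from silently gaining internet access.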

Anti-patterns that get expensive quickly

The first anti-pattern is giving an agent the same permissions a developer has during daily work. A person has context, remembers local agreements, and may stop before running a strange command. An agent can sound confident even when it does not understand the consequences.

The second is keeping secrets next to the execution environment. If the agent can see .env, SSH keys, or cloud provider tokens, every command mistake becomes more dangerous. Secrets should be passed only where they are actually needed, preferably through short-lived access.

The third is treating a container as absolute protection. A container is very useful, but its strength depends on configuration: mounted directories, network access, privileges, sockets, and environment variables. If a container can see the home directory and the system Docker socket, the isolation boundary becomes thin.
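
A quick audit of the `docker run` arguments an agent or script proposes can surface those thin spots before the container starts. This is a simplified sketch: it only understands the space-separated forms `-v SRC:DST`, `--volume SRC:DST`, and `--network NAME`, so `--mount` and `--network=host` spellings would need extra handling. The function name and messages are illustrative.

```python
def thin_spots(docker_args, home_dir):
    """List configuration choices that weaken a container's isolation."""
    findings = []
    for i, arg in enumerate(docker_args):
        value = docker_args[i + 1] if i + 1 < len(docker_args) else ""
        if arg == "--privileged":
            findings.append("runs privileged")
        elif arg in ("-v", "--volume"):
            # Bind mounts look like SRC:DST[:options]; inspect the host side.
            source = value.split(":", 1)[0]
            if source.endswith("docker.sock"):
                findings.append("mounts the Docker socket")
            if source.rstrip("/") == home_dir.rstrip("/"):
                findings.append("mounts the home directory")
        elif arg == "--network" and value == "host":
            findings.append("shares the host network")
    return findings


# Hypothetical command line proposed for running the agent's tests.
args = ["docker", "run", "--rm", "-v", "/var/run/docker.sock:/var/run/docker.sock",
        "-v", "/home/dev:/home/dev", "--network", "host", "image", "sh"]
print(thin_spots(args, "/home/dev"))
```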

Decision model: where to run the action

For daily work, a simple rule helps. Code reading, small patches, and familiar checks can run on the host if the working tree is clean and a human reviews the diff. Dependency installation, file generation, and full test runs are better placed in a container. Untrusted code, new installation scripts, networked commands, and agent experiments belong in a sandbox or microVM. For tasks where the result can be easily discarded, the best option is a disposable environment: create it, let the agent work, keep the useful patch, delete everything else.
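
Written as code, the rule might look like the sketch below. The action labels are invented for illustration, and two choices are assumptions that go beyond the text: a host-eligible action falls back to a container when the tree is dirty or nobody reviews the diff, and unknown actions get the strictest environment.

```python
HOST, CONTAINER, SANDBOX = "host", "container", "sandbox or microVM"

HOST_ACTIONS = {"read-code", "small-patch", "familiar-check"}
CONTAINER_ACTIONS = {"install-dependencies", "generate-files", "full-test-run"}
SANDBOX_ACTIONS = {"new-install-script", "networked-command", "experiment"}


def choose_environment(action, *, tree_clean=True, human_reviews_diff=True, trusted=True):
    """Pick where an agent action should run, following the rule above."""
    if not trusted or action in SANDBOX_ACTIONS:
        return SANDBOX
    if action in CONTAINER_ACTIONS:
        return CONTAINER
    if action in HOST_ACTIONS:
        # Assumption: without a clean tree and a reviewer, move off the host.
        return HOST if tree_clean and human_reviews_diff else CONTAINER
    # Assumption: anything unrecognized gets the strictest option.
    return SANDBOX


print(choose_environment("small-patch"))
print(choose_environment("install-dependencies", trusted=False))
```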

This is not panic, and it is not a ban on AI tools. It is the opposite: the clearer the boundaries, the more confidently you can delegate routine work. An agent can be a fast assistant, but it should be treated like untrusted code with access to real resources.

Sources

Docker Blog: Why AI Agents Need Isolation

Quick checklist

define which files the agent may modify and which it may only read
avoid running dependency installation and unfamiliar tests directly on the host
do not pass secrets into the agent environment unless there is a specific need
restrict network access for commands that should not reach the internet
prepare a disposable working copy for experiments
review the diff, lockfile, and new scripts before merging changes
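
The disposable working copy from the checklist can be prepared with the standard library alone. This is a minimal sketch: the exclusion patterns are an assumed starting list of files that commonly hold secrets, and a real project would extend it to match its own layout.

```python
import shutil
import tempfile
from pathlib import Path

# Assumed examples of files that should never follow the code into the copy.
SECRET_PATTERNS = (".env", ".env.*", "*.pem", "id_rsa*")


def disposable_copy(repo_dir):
    """Copy a repository into a fresh temp directory, leaving secret files behind."""
    scratch = Path(tempfile.mkdtemp(prefix="agent-copy-"))
    target = scratch / "repo"
    shutil.copytree(repo_dir, target, ignore=shutil.ignore_patterns(*SECRET_PATTERNS))
    return target
```

When the agent is done, keep the useful patch and remove the parent directory of the returned path with `shutil.rmtree`.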

Prompt Pack: access boundary review for an AI agent

You are helping prepare a safe execution plan for an AI agent in a software repository.

Inputs:
- project type and stack;
- which files the agent should read and modify;
- which commands the agent wants to run;
- whether secrets, APIs, a database, or network access are required;
- what is valuable on the host: working files, keys, local databases, SSH access;
- whether a disposable environment can be created.

Tasks:
1. Split actions into safe for the host, suitable for a container, suitable only for a sandbox or microVM, and requiring manual execution.
2. Explain the risk of each group in plain language.
3. Propose minimum permissions for the file system, network, and secrets.
4. Give a short rollback plan if the agent damages files or installs unwanted dependencies.

Output format:
- decision table: action / environment / permissions / risk / verification;
- list of anti-patterns;
- final pre-run checklist.