Copy Fail: Block the Path, Not the Paragraph

Copy Fail is 732 bytes of Python. An agent can fetch it and run it in one turn.

CVE-2026-31431 is a local privilege escalation that chains AF_ALG and splice() into a 4-byte page-cache write, then uses that write against a setuid binary such as /usr/bin/su. The Copy Fail write-up describes the PoC as a 732-byte Python script, standard library only, targeting /usr/bin/su by default.

For a human, public exploit code still has friction. Someone has to read the advisory, understand the exploit, copy it, choose where to run it, and accept the consequences. Agents remove much of that friction. They can read the page, download the script, run Python, inspect the result, and retry if something fails.

The model does not need to understand Linux kernel exploitation. It only needs to decide the next command is worth running.

That is the agent-security lesson. Agents do not make Copy Fail worse because they are malicious. They make it worse because they shorten the path from public text to execution.

The container boundary is not the whole boundary

One of the uncomfortable details in Copy Fail is that the page cache is shared across the host. The file on disk is not changed, but the in-memory page cache can be corrupted and then read by other processes. Xint's write-up calls this out directly: because the page cache is shared across container boundaries, Copy Fail is not only a local privilege escalation but also a container escape primitive and Kubernetes node compromise vector.

That matters for agent sandboxes, CI runners, build farms, and hosted code execution platforms. A container gives a process a smaller world, but it does not give that process a separate kernel. If the exploit path reaches shared kernel state, the boundary that matters is lower than the container.

A self-hosted GitHub Actions runner executing untrusted PR code on a shared kernel is exactly this scenario. So is a GitLab runner, a Jenkins agent, or any build farm that runs tenant-supplied code as a regular user on shared infrastructure. Copy Fail turns that regular user into root on the runner. On a shared host, root can then reach the other tenants' builds, cached secrets, and checked-out repos.

This does not mean containers are useless. It means isolation and runtime policy are different controls. Isolation says, "this process has a smaller world." Runtime policy says, "even inside that smaller world, this process may not do that."

Agents need both.

Two controls matter here

Block su. Block AF_ALG.

The first control is obvious. Copy Fail's public PoC targets /usr/bin/su by default. For most agent workloads, su is not a legitimate tool. It is interactive, privilege-oriented, and designed around a human authentication flow. A coding agent does not need su to edit files, run tests, install dependencies, or build a project. If an agent tries to execute it, policy should stop the process before it starts.
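As a rough illustration of that first control, the check below decides whether an executable path should be refused. It is a minimal sketch, not AgentSH's policy format: the names DENIED_NAMES and is_exec_denied are hypothetical, and a real enforcement point would sit at the exec layer rather than in application code.

```python
import os

# Hypothetical deny list for illustration; not AgentSH's actual policy format.
DENIED_NAMES = {"su"}


def is_exec_denied(path: str, denied_names=DENIED_NAMES) -> bool:
    """Return True if executing `path` should be refused.

    Both the normalized path and the symlink-resolved path are checked,
    so "/usr/bin/../bin/su" and a symlink pointing at su are both caught.
    """
    candidates = {os.path.normpath(path), os.path.realpath(path)}
    return any(os.path.basename(c) in denied_names for c in candidates)


if __name__ == "__main__":
    for candidate in ("/usr/bin/su", "/usr/bin/../bin/su", "/usr/bin/python3"):
        print(candidate, "denied" if is_exec_denied(candidate) else "allowed")
```

A name-based check like this is easy to evade by copying or renaming the binary, which is one reason the second control matters more.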

The second control is the more important one. Copy Fail depends on access to AF_ALG, the Linux socket family that exposes kernel crypto operations to userspace. Most agent workloads do not need that interface. An agent may need HTTPS to GitHub, npm, PyPI, an internal package registry, or an approved API. That does not mean it needs access to every socket primitive exposed by the operating system.
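One way to see whether a sandbox actually exposes this interface is to try creating an AF_ALG socket and immediately close it. The probe below is a minimal sketch under stated assumptions: probe_af_alg is a hypothetical helper name, the fallback value 38 is the Linux constant for AF_ALG, and the mapping from errors to labels is a simplification, since a seccomp filter can be configured to return other error codes.

```python
import socket

# Python only defines socket.AF_ALG on Linux builds; 38 is the Linux value.
AF_ALG = getattr(socket, "AF_ALG", 38)


def probe_af_alg() -> str:
    """Try to create (and immediately close) an AF_ALG socket.

    Returns:
      "available"   - the socket was created, so the interface is reachable
      "denied"      - creation failed with a permission error (EPERM/EACCES)
      "unavailable" - creation failed for another reason, e.g. the kernel or
                      platform does not support the address family
    """
    try:
        sock = socket.socket(AF_ALG, socket.SOCK_SEQPACKET, 0)
    except PermissionError:
        return "denied"
    except OSError:
        return "unavailable"
    sock.close()
    return "available"


if __name__ == "__main__":
    print("AF_ALG socket creation:", probe_af_alg())
```

If this prints "available" inside an environment that only runs builds and tests, the workload has access to a kernel interface it probably never uses.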

This is the distinction most agent policies miss. "Allow network access" should not mean "allow every socket family." URLs and socket families are different layers. A policy that understands domains but not OS primitives can still leave the dangerous path open.
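The two layers can be sketched as separate questions in one policy object. This is an illustrative model only: NetworkPolicy, its fields, and the example domains are assumptions made for the sketch, not a real product schema.

```python
import socket
from dataclasses import dataclass

# Python only defines socket.AF_ALG on Linux builds; 38 is the Linux value.
AF_ALG = getattr(socket, "AF_ALG", 38)


@dataclass(frozen=True)
class NetworkPolicy:
    """Hypothetical policy with two independent layers."""
    allowed_domains: frozenset   # where the workload may connect
    allowed_families: frozenset  # which socket families it may create

    def allows_connect(self, host: str) -> bool:
        # Match the domain itself or any subdomain of it.
        host = host.lower().rstrip(".")
        return any(host == d or host.endswith("." + d)
                   for d in self.allowed_domains)

    def allows_socket(self, family: int) -> bool:
        return int(family) in self.allowed_families


# Example values are assumptions for illustration.
POLICY = NetworkPolicy(
    allowed_domains=frozenset({"github.com", "pypi.org"}),
    allowed_families=frozenset({int(socket.AF_INET), int(socket.AF_INET6)}),
)

if __name__ == "__main__":
    print(POLICY.allows_connect("api.github.com"))  # domain layer
    print(POLICY.allows_socket(AF_ALG))             # socket-family layer
```

The point of keeping the two methods separate is that a request can pass the domain question and still fail the socket-family question. A policy with only the first method has no way to express a decision about AF_ALG.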

AgentSH blocks the path

The practical question is not whether the model understands Copy Fail. The practical question is whether the runtime lets the exploit path execute.

AgentSH runs underneath the agent. It intercepts process execution, file access, and network activity at runtime, then applies policy before the action happens. For Copy Fail-style risk, that means AgentSH can deny su as an executable and deny socket families the workload does not need, including AF_ALG.

That is the difference between guidance and enforcement.

A prompt can say, "Do not run privilege escalation exploits." A rule file can describe what the agent is supposed to do. But the runtime sees the concrete behavior: a process tried to execute su; a process tried to create an AF_ALG socket; a script reached for a kernel interface unrelated to the task.

The model can be told not to run exploits. AgentSH can make the exploit path unavailable.

Patch the bug, then remove the capability

The first response to Copy Fail is to patch the kernel. The public mitigation guidance also recommends two interim steps: disabling algif_aead until the kernel is patched, and blocking AF_ALG socket creation for untrusted workloads such as containers, sandboxes, and CI environments.
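As a quick audit aid, the sketch below reports whether algif_aead currently appears as a loaded module by reading /proc/modules. The helper names are hypothetical, and the result needs careful reading: a module that is not loaded may still be loaded on demand, and code built directly into the kernel does not appear in /proc/modules at all, so a False result is not proof that the interface is disabled.

```python
def loaded_modules(proc_modules_text: str) -> set:
    """Parse the text of /proc/modules into a set of module names.

    Each non-empty line starts with the module name, followed by
    size, reference count, dependencies, state, and address.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def algif_aead_loaded(path: str = "/proc/modules"):
    """Return True/False if the module list can be read, else None."""
    try:
        with open(path, encoding="utf-8") as handle:
            text = handle.read()
    except OSError:
        return None  # not Linux, or /proc is not readable here
    return "algif_aead" in loaded_modules(text)


if __name__ == "__main__":
    print("algif_aead loaded:", algif_aead_loaded())
```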

But the durable lesson is bigger than this one CVE. The next bug may not involve AF_ALG. It may not target su. It may not look like Copy Fail at all. But it may follow the same pattern: an ordinary process reaches an OS capability the workload never needed.

That is why agent policy has to move closer to execution. Public exploit code is now easy for agents to find and operate. Agent environments are often short-lived, which reduces persistence but also reduces forensic visibility. If the runtime does not enforce policy while the sandbox is alive, the evidence and the opportunity to stop it may disappear with the environment.

Patch the kernel, of course. But also remove capabilities the agent never needed.

The agent proposes. The policy decides.


Built by Canyon Road

We build Beacon and AgentSH to give security teams runtime control over AI tools and agents, whether supervised on endpoints or running unsupervised at scale. Policy enforced at the point of execution, not the prompt.
