LatestAgentic.Dev
Weekly research on multi-agent dev

Weekly research on multi-agent AI development tools & workflows.

Executive Summary

AI-assisted multi-agent software development has entered a mature, fast-moving phase in early 2026, with Claude Code and Codex CLI as the dominant terminal-first agents, a thriving ecosystem of orchestration tools built on tmux and git worktrees, and an emerging consensus on workflows that treat developers as engineering managers of AI fleets. The most consequential recent events include Peter Steinberger joining OpenAI in February 2026, Anthropic's release of Claude Opus 4.6 with Agent Teams, the AGENTS.md standard moving under Linux Foundation governance, and SWE-bench Verified scores cresting 80% for the first time. Meanwhile, Cursor's cloud agents and event-driven Automations, Kiro's spec-driven development, and Docker's microVM sandboxes signal paradigm shifts that may reshape the entire landscape. The review bottleneck—not code generation—is now the binding constraint, with Faros AI telemetry showing 91% longer code review times despite 98% more PRs merged.

Searches: 0 Sources: 0 Model: Claude Desktop manual research

Open Questions

Which isolation model will win for multi-agent coding: worktrees, containers, or same-branch atomic commits?

Can review throughput improve fast enough to keep up with agent-generated code volume?

Will MCP remain dominant, or will CLI-first workflows keep taking share?

Peter Steinberger & OpenClaw

high workflow change

Peter Steinberger (steipete), the Austrian developer who founded PSPDFKit and bootstrapped it for 13 years before a $100M+ exit in 2021, returned from retirement in April 2025 when he discovered AI's "paradigm shift." His open-source project OpenClaw (origina…

medium workflow change

His appearance on the Lex Fridman Podcast (#491, February 11-12, 2026) was a 3+ hour conversation covering OpenClaw's origin, the naming drama, security concerns, and his philosophy of agentic engineering. Steinberger's influence on the agentic coding community st…

low workflow change

His workflow philosophy centers on several distinctive positions. He runs 3–8 Codex CLI instances in parallel in a 3×3 terminal grid, most operating in the same folder on main.

low workflow change

He is strongly anti-MCP despite having written five MCP servers himself: "Almost all MCPs really should be CLIs... Use GitHub's MCP and see 23K tokens gone." He prefers CLIs because the agent discovers usage on demand through help menus instead of paying a constant context cost.

low workflow change

Steinberger transitioned from Claude Code to Codex CLI between August and October 2025, citing frustration with Claude's sycophantic tone: "I used to love Claude Code, these days I can't stand it anymore... the absolutely right's, the 100% production ready me…

Claude Code Ecosystem

high tool update

Claude Code's most significant evolution in early 2026 is the experimental Agent Teams feature, launched alongside Claude Opus 4.6 on February 4-5, 2026. Agent Teams enables a team lead session to spawn teammate agents, each with its own context window, coord…

medium tool update

Opus 4.6 itself brought 200K default context (1M in beta), 128K token output (doubled from 64K), adaptive thinking as the new recommended mode, and effort parameter GA with three levels (low, medium, high). It scored 80.8% on SWE-bench Verified and 90.2% on B…

low tool update

The native --worktree flag is now built into Claude Code, activated via claude --worktree feature-auth or -w. It creates .claude/worktrees/<name>/ with an auto-generated branch, cleans up automatically if no changes are made, and prompts for keep/remove other…

low tool update

Headless mode (-p / --print) supports three output formats: text, JSON (with metadata including total_cost_usd and session_id), and stream-json (NDJSON). The --json-schema flag enables structured output via constrained decoding, guaranteeing schema compliance.
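
As a minimal sketch of consuming the JSON output format, the snippet below parses one result object and pulls out the two metadata fields named above. The sample payload is hypothetical: its values and the "result" field are assumptions for illustration, and only total_cost_usd and session_id come from the text.

```python
import json

# Hypothetical sample of a JSON-format headless result.
# Only total_cost_usd and session_id are named in the text; the rest is assumed.
SAMPLE_OUTPUT = '{"result": "done", "total_cost_usd": 0.0123, "session_id": "abc-123"}'


def summarize_run(raw: str) -> dict:
    """Extract cost and session metadata from a JSON-format headless run."""
    payload = json.loads(raw)
    return {
        "session_id": payload.get("session_id"),
        "cost_usd": float(payload.get("total_cost_usd", 0.0)),
    }


summary = summarize_run(SAMPLE_OUTPUT)
print(summary)
```

A wrapper script could log these two values per run to track spend and to resume a session later.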

low tool update

CLAUDE.md best practices have crystallized around three scopes: global (~/.claude/CLAUDE.md), project (./CLAUDE.md or ./.claude/CLAUDE.md), and local (./CLAUDE.local.md). The @path/to/file.md import syntax supports recursive imports up to 5 levels deep.
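
To illustrate what a 5-level recursive import limit means in practice, here is a rough sketch of an import expander. It is not Claude Code's implementation: the regex, the in-memory files dictionary, and the function name are all assumptions made for the example.

```python
import re

MAX_IMPORT_DEPTH = 5  # depth limit stated in the text
# Assumed, simplified pattern for "@path/to/file.md" references.
IMPORT_PATTERN = re.compile(r"@([\w./~-]+\.md)")


def expand_imports(text: str, files: dict[str, str], depth: int = 0) -> str:
    """Inline @path imports recursively, stopping at MAX_IMPORT_DEPTH."""
    if depth >= MAX_IMPORT_DEPTH:
        return text  # leave deeper references unexpanded

    def replace(match: re.Match) -> str:
        path = match.group(1)
        if path not in files:
            return match.group(0)  # unknown file: keep the reference as-is
        return expand_imports(files[path], files, depth + 1)

    return IMPORT_PATTERN.sub(replace, text)


# Hypothetical chain f1.md -> f2.md -> ... used to show where expansion stops.
files = {f"f{i}.md": f"L{i} @f{i + 1}.md" for i in range(1, 8)}
print(expand_imports("@f1.md", files))
```

With this chain, the first five files are inlined and the reference to the sixth is left untouched, which also keeps accidental import cycles from recursing forever.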

low tool update

March 2026 updates include the /loop command (recurring prompts on intervals), cron scheduling within sessions, 20-language voice STT support, an ExitWorktree tool, a Claude Code Review tool (research preview using a team of agents to crawl codebases and rank…

Codex CLI Ecosystem

high tool update

Codex CLI reached v0.113.0 on March 10, 2026, and is now fully rewritten in Rust, replacing its original TypeScript/Node.js/React (Ink) stack. The Rust rewrite, announced in June 2025 and now the default, delivers zero-dependency installation (no Node.js required), native…

medium tool update

The sandbox architecture is OS-native: Apple Seatbelt via sandbox-exec with runtime-compiled profiles on macOS, Landlock + seccomp with vendored Bubblewrap on Linux, and native restricted tokens on Windows (promoted from experimental in v0.100.0). The .git/ d…

low tool update

AGENTS.md became an open standard under the Agentic AI Foundation (AAIF), which was announced on December 9, 2025 and is stewarded by the Linux Foundation. AAIF co-founders include Anthropic (contributing MCP), Block (contributing Goose), and OpenAI (contributing AGENTS.md).

low tool update

The config.toml profile system supports named profiles activated via codex --profile deep-review, with config resolution following CLI flags → profile → project config → user config → defaults. The /review slash command opens diff-based review presets with an…
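
The resolution order above (CLI flags, then profile, then project config, then user config, then defaults) is a layered lookup where the first layer that defines a key wins. The sketch below shows that precedence rule with Python's ChainMap. It is not Codex's own code, and the keys and values are hypothetical placeholders.

```python
from collections import ChainMap


def resolve_config(cli_flags, profile, project, user, defaults):
    """Merge config layers; earlier arguments override later ones."""
    return dict(ChainMap(cli_flags, profile, project, user, defaults))


# Hypothetical keys and values, chosen only to show which layer wins.
resolved = resolve_config(
    cli_flags={"model": "from-cli"},
    profile={"model": "from-profile", "effort": "from-profile"},
    project={"effort": "from-project", "sandbox": "from-project"},
    user={"sandbox": "from-user", "theme": "from-user"},
    defaults={"theme": "from-defaults", "timeout": "from-defaults"},
)
print(resolved)
```

Each key resolves to the highest-priority layer that sets it, so a named profile can override project settings without editing the project file.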

low tool update

Recent additions include the Codex App (macOS desktop, Windows added March 4, 2026) with multi-agent management and auto-worktrees, Codex Cloud (codex cloud exec for async tasks with best-of-N runs), a curated plugin marketplace (v0.113.0), streaming stdin/st…

Orchestration & Infrastructure Tools

high community trend

The orchestration layer for multi-agent coding has converged on tmux + git worktrees as foundational infrastructure, with several tools offering distinct approaches. Claude Squad (github.com/smtg-ai/claude-squad), written in Go, is the flagship orchestrator—i…

medium community trend

NTM (Named Tmux Manager, github.com/Dicklesworthstone/ntm) transforms tmux into a full command center, spawning named panes for each agent type (ntm spawn myproject --cc=3 --cod=2 --gmi=1) with broadcast prompts by type, a visual TUI dashboard, automated cont…

low community trend

Three separate projects share the "Amux" name: mixpeek's Amux provides a heavy-duty web dashboard with agent-to-agent orchestration via REST API and shared global memory; andyrewlee's Amux offers a clean TUI with headless CLI mode and a job queue system; and…

low community trend

In the Neovim ecosystem, codecompanion.nvim (~5,600 stars) is the most comprehensive plugin, supporting both HTTP adapters (Anthropic, OpenAI, Gemini, DeepSeek, Ollama, and many others) and Agent Client Protocol adapters (Claude Code, Codex, Gemini CLI, OpenC…

low community trend

Kaushik Gopal's agent forking pattern (kau.sh/blog/agent-forking/) represents the minimalist philosophy: a Bash script using tmux to fork subagents from a main session, auto-summarizing long transcripts before feeding to the fork, and supporting cross-agent f…

Alternative Approaches & New Entrants

high new release

The Claude Code / Codex CLI duopoly faces pressure from multiple directions. Gemini CLI (github.com/google-gemini/gemini-cli) reached 1M+ developers with Gemini 3 Pro's 1M token context window and a generous free tier (60 requests/min, 1,000/day).

medium new release

Cursor ($29.3B valuation, ~$2B annual revenue) introduced Automations on March 5, 2026—always-on agents triggered by Slack, Linear, GitHub, PagerDuty, webhooks, or schedules, with each agent spinning up a cloud sandbox and learning from past runs via a memory…

low new release

OpenCode (opencode.ai, 95,000+ GitHub stars) has emerged as the standout open-source alternative with a polished Bubble Tea TUI, 75+ LLM providers via AI SDK, LSP integration, and GitHub/GitLab integrations where mentioning /opencode in issues triggers automa…

low new release

Kiro (kiro.dev), AWS's spec-driven IDE, generates structured specs from prompts before any code is written, uses Agent Hooks (event-driven automations on file save/create/delete), and employs property-based testing with "shrinking" for quality validation. Its…

low new release

Docker Sandboxes now use microVM-based isolation (not just containers), supporting Claude Code, Gemini CLI, Codex CLI, and Kiro natively via docker sandbox run <agent>. Container Use by Dagger (github.com/dagger/container-use) provides an open-source MCP serv…

low new release

Capy (capy.ai), a YC-backed IDE, is architecturally interesting as the only tool designed from scratch for parallel execution: a Captain agent plans while Build agents implement, each in a dedicated cloud VM with git worktrees, supporting up to 25 agents in p…

Community Patterns & Best Practices

high workflow change

Spec-first development has become the dominant paradigm. GitHub released an open-source Spec Kit toolkit, Kiro is built entirely around specs, and Addy Osmani's O'Reilly guide recommends specs covering commands, testing, project structure, code style, git wor…

medium workflow change

Plan Mode in Claude Code (Shift+Tab twice or /plan) restricts the agent to read-only operations for analyzing codebases and creating plans before execution. Boris Cherny uses it at the start of most sessions: "start in Plan Mode, go back and forth until the p…

low workflow change

Context management is the most critical operational skill. Output quality within Claude Code's 200K-token window reportedly degrades at around 147K-152K tokens, and auto-compaction triggers at approximately 83.5% of capacity.
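
Working through the figures above makes the gap concrete. This is plain arithmetic on the numbers quoted in the text, not a measurement; the variable names are chosen for the example.

```python
CONTEXT_WINDOW = 200_000               # tokens, from the text
AUTO_COMPACT_FRACTION = 0.835          # approximate trigger point, from the text
DEGRADATION_RANGE = (147_000, 152_000) # tokens, from the text

# Token count at which auto-compaction would trigger.
compact_at = round(CONTEXT_WINDOW * AUTO_COMPACT_FRACTION)

# The degradation range expressed as a percentage of the window.
degrade_pct = tuple(round(100 * t / CONTEXT_WINDOW, 1) for t in DEGRADATION_RANGE)

# Tokens between the upper end of the degradation range and auto-compaction.
gap = compact_at - DEGRADATION_RANGE[1]

print(compact_at, degrade_pct, gap)  # 167000 (73.5, 76.0) 15000
```

On these numbers, degradation starts roughly 15K-20K tokens before auto-compaction fires, which suggests compacting or clearing context manually instead of waiting for the automatic trigger.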

low workflow change

Model routing across tiers delivers the single biggest cost optimization. Anthropic's own Explore agent runs on Haiku by default, a signal that the cheapest tier is considered adequate for read-only exploration work.
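
A minimal sketch of tier routing, assuming tasks can be labeled by kind up front. The tier labels, task kinds, and routing table are hypothetical; a real setup would map each tier to a concrete model and would probably need a fallback when the cheap tier fails.

```python
from dataclasses import dataclass

# Hypothetical routing table: task kind -> model tier.
# Exploration goes to the cheapest tier, mirroring the Explore-on-Haiku default.
ROUTES = {
    "explore": "small",
    "edit": "medium",
    "architecture": "large",
}


@dataclass
class Task:
    kind: str
    prompt: str


def pick_tier(task: Task, default: str = "medium") -> str:
    """Return the tier for a task, falling back to a default for unknown kinds."""
    return ROUTES.get(task.kind, default)


tasks = [
    Task("explore", "Find where sessions are persisted"),
    Task("architecture", "Propose a plugin boundary"),
    Task("triage", "Label this bug report"),
]
tiers = [pick_tier(t) for t in tasks]
print(tiers)  # ['small', 'large', 'medium']
```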

low workflow change

The engineering manager mindset has become the defining mental model. Addy Osmani's influential January 2026 post states: "AI coding at scale stops being a prompting problem and becomes a management problem." MIT's Missing Semester course now includes agentic…

Benchmarks & Comparisons

high benchmark

SWE-bench Verified scores crossed 80% for the first time in early 2026. The current leaderboard is topped by Claude Opus 4.5 (Thinking) at 80.9%, followed by Opus 4.6 (Thinking) at 80.8%, Gemini 3.1 Pro at 80.6%, MiniMax M2.5 (the leading open-weight model) a…

medium benchmark

Token efficiency remains Codex CLI's strongest advantage. Head-to-head testing shows Codex uses 3-4× fewer tokens per task: on a scheduler task, Claude Code consumed 234,772 tokens versus Codex's 72,579 (3.2×); on a Figma clone, Claude Code used 6.2M tokens v…
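
The 3.2× figure for the scheduler task can be checked directly from the two token counts quoted above. The snippet only restates that arithmetic and adds the equivalent percentage saving.

```python
claude_code_tokens = 234_772  # scheduler task, from the text
codex_tokens = 72_579         # scheduler task, from the text

# How many times more tokens Claude Code used on this task.
ratio = claude_code_tokens / codex_tokens

# The same gap expressed as Codex's saving relative to Claude Code.
saving_pct = round(100 * (1 - codex_tokens / claude_code_tokens))

print(f"{ratio:.1f}x, {saving_pct}% fewer tokens")  # 3.2x, 69% fewer tokens
```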

low benchmark

The task routing consensus has solidified: Codex CLI excels at prototyping, quick fixes, terminal/DevOps tasks, and multi-language work; Claude Code leads for complex refactoring, architecture decisions, multi-file reasoning, and computer use (72.7% on releva…

low benchmark

A striking data point: Claude Code now authors approximately 4% of all public GitHub commits (~135,000/day), with SemiAnalysis projecting 20%+ by end of 2026. Third-party scaffolds like Verdent demonstrate that agent scaffolding matters enormously—their frame…

Key Quotes

“I experimented with worktrees, PRs but always revert back to this setup as it gets stuff done the fastest.”

Captures the strongest argument in the report against worktree-heavy parallel-agent workflows.

“Almost all MCPs really should be clis.”

Directly states the CLI-first stance that shapes much of the current anti-MCP practitioner workflow.

“My Agent file is currently ~800 lines long and feels like a collection of organizational scar tissue.”

Explains why large AGENTS.md files tend to accrete from real production incidents rather than theory.

“I used to love Claude Code, these days I can't stand it anymore.”

A concise statement of the sentiment shift behind Steinberger's move from Claude Code to Codex CLI.

“AI coding at scale stops being a prompting problem and becomes a management problem.”

The clearest articulation of the report's engineering-manager framing for multi-agent development.

“One helpful mental model might be that of a manager of an intern.”

A simple teaching model that matches the review-and-guidance loop described throughout the report.

“The natural bottleneck on all of this is how fast I can review the results.”

Pins down the review constraint that the report identifies as the main scaling limit.