AI Learning Digest

Daily curated insights from Twitter/X about AI, machine learning, and developer tools

The Multi-Agent Era: Developers Now Run Swarms of Claude Code Instances in Parallel

The Rise of Multi-Agent Development

The most striking trend emerging from today's posts is the normalization of running multiple AI coding agents in parallel. What started as an experimental workflow has become standard practice for power users.

"gm to all multi clauders" — @cto_junior

"how many claude codes do you run at once?" — @pleometric

Jeffrey Emanuel (@doodlestein) shared his comprehensive "Flywheel" system, describing the later stages of development as "basically mindless machine tending of your swarm of 5-15 agents." His workflow emphasizes front-loading human effort into planning while relegating implementation to parallel agent execution.

@idosal1 announced building AgentCraft v1, managing "up to 9 Claude Code agents with the RTS interface." The gaming metaphor is apt—developers are increasingly treating AI agents like units in a real-time strategy game, coordinating multiple autonomous workers toward a common goal.

The Skills Ecosystem Expands

Claude Code's skills system is gaining traction across the ecosystem:

  • Vercel released react-best-practices, installable via npx add-skill vercel-labs/agent-skills, containing React performance rules and evals to catch regressions like accidental waterfalls and growing client bundles (see the quick-start steps after this list)
  • Trail of Bits published 17 security skills that @koylanai praised as "the beginning of something massive"
  • @steipete's clawdbot continues to impress, with @LLMJunky describing tasks kicked off "from the Denny's parking lot" that coordinated repo research, documentation pulling, and migration planning
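
Vercel's companion post from @vercel_dev spells out the whole loop; the steps below are quoted from that post:

  ① Install the skill: $ npx add-skill vercel-labs/agent-skills
  ② Paste this prompt: "Assess this repo against React best practices. Make a prioritized list of quick wins and top fixes."
  ③ Review the list, then prompt the agent to "make the fixes"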

@koylanai made a bold prediction: "Every company with technical docs will ship Skill packages, not because it's nice to have, but because agents won't adopt your product without them. Agents (or humans) won't read docs; they execute Skills."

Platform Wars Heat Up

OpenAI: Open Responses Spec

OpenAI Developers announced Open Responses, an open-source specification, built on top of the original OpenAI Responses API, for building multi-provider, interoperable LLM interfaces. Key features:

  • Multi-provider by default
  • Designed for real-world workflows
  • Extensible without fragmentation

The pitch: "Build agentic systems without rewriting your stack for every model."
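
To make the multi-provider pitch concrete, here is a minimal TypeScript sketch of what "one stack, many models" might look like against an OpenAI-compatible Responses endpoint. The second base URL and both model names are placeholders, not details from the announcement; the actual spec is linked from the source post.

  // Hedged sketch: the calling code stays the same across providers;
  // only the endpoint and model name change.
  import OpenAI from "openai";

  async function ask(baseURL: string | undefined, model: string, prompt: string) {
    const client = new OpenAI({ baseURL, apiKey: process.env.API_KEY });
    const response = await client.responses.create({ model, input: prompt });
    return response.output_text;
  }

  async function main() {
    // OpenAI directly, then any other provider implementing the same spec (placeholder URL).
    console.log(await ask(undefined, "gpt-4.1-mini", "Summarize this changelog"));
    console.log(await ask("https://models.example.com/v1", "some-other-model", "Summarize this changelog"));
  }
  main();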

GitHub: Copilot Gets Memory

GitHub Copilot now has agentic memory in public preview:

  • Learns repo details to boost agent, code review, and CLI help
  • Memories scoped to repos, expire in 28 days
  • Shared across Copilot features

Cursor: Better Bug Detection

"Cursor now catches 2.5x as many real bugs per PR." — @cursor_ai

The Planning Paradox

Jeffrey Emanuel's thread contained a critical insight that deserves attention:

"The one thing people seem to get wrong is ignoring what I say about planning or transforming their plan into beads. They make a slipshod plan all at once with Claude Code. Or they try to one-shot turning the plan into beads... Well, of course the project is going to suck and be a buggy mess if you do that."

The paradox: as AI makes coding faster, the value of human planning increases. Emanuel recommends spending "most of your energy and human time/focus on the markdown plan" and doing "at least 3 rounds of polishing, improving, and expanding" before letting agents execute.

AI Engineering's Runtime Problem

@ashpreetbedi highlighted a structural issue in the AI engineering stack:

"Claude Code shipped two years after function calling. Models have outpaced the application layer. We have frameworks to build agents, we have observability to trace them, we have evals to test them."

The article argues that the runtime layer—where agents actually execute—remains underdeveloped compared to model capabilities.

Generative Interfaces

Guillermo Rauch (@rauchg) shared a glimpse of "a world of fully generative interfaces" with the flow: AI → JSON → UI. This points toward a future where interfaces are dynamically generated rather than pre-built, raising questions about design systems, accessibility, and the role of frontend development.
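
As a rough illustration of that AI → JSON → UI flow (the schema and renderer below are hypothetical, not Rauch's implementation), the model's job shrinks to emitting a JSON description of the interface, while a small, deterministic renderer owned by the app turns it into markup:

  // Hypothetical UI schema the model is asked to emit as JSON.
  type UINode =
    | { type: "heading"; text: string }
    | { type: "button"; label: string; action: string }
    | { type: "list"; items: string[] };

  // Deterministic renderer: the model never writes markup directly.
  function render(nodes: UINode[]): string {
    return nodes
      .map((node) => {
        switch (node.type) {
          case "heading":
            return `<h2>${node.text}</h2>`;
          case "button":
            return `<button data-action="${node.action}">${node.label}</button>`;
          case "list":
            return `<ul>${node.items.map((i) => `<li>${i}</li>`).join("")}</ul>`;
        }
      })
      .join("\n");
  }

  // A model response would be parsed and schema-validated before rendering.
  const spec: UINode[] = JSON.parse(
    '[{"type":"heading","text":"Your trip"},{"type":"button","label":"Book","action":"book"}]'
  );
  document.body.innerHTML = render(spec);

Keeping the renderer (and with it the design system and accessibility rules) on the application side is one way to address the questions raised above.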

Engineering Hiring in the AI Era

Mitchell Hashimoto offered a provocative take on interviewing:

"I think a really effective engineering interview would be to explicitly ask someone to use AI to solve a task, and see how they navigate. Ignore results, the way AI is driven is maybe the most effective tool at exposing idiots I've ever seen."

This suggests that AI proficiency isn't just a nice-to-have—it's becoming a core competency that reveals underlying engineering judgment.

The Humor Corner

@askOkara captured the zeitgeist:

"I saw a guy coding today. No Okara. No Cursor. No OpenCode. No Claude Code. He just sat there, typing code manually. Like a psychopath."

And Kent C. Dodds saw his earlier position on MCP vindicated:

"When everyone was saying MCP is doomed because context bloat, I was saying all you need is search. Feels good to have my bets validated once again."

Looking Ahead

Harrison Chase (@hwchase17) clarified LangChain's approach to agent memory: "We don't use an actual filesystem. We use Postgres but have a wrapper on top of it to expose it to the LLM as a filesystem." This abstraction pattern—making complex systems appear as familiar interfaces to LLMs—may become a key design principle.
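
A rough sketch of that facade, with the table name, columns, and tool surface invented for illustration (the post doesn't describe LangChain's actual schema): the agent is handed ls/read/write "filesystem" tools whose implementation is just SQL.

  // Hypothetical Postgres-backed "filesystem" exposed to an LLM as tools.
  import { Pool } from "pg";

  const pool = new Pool(); // connection settings come from the standard PG* env vars

  export const memoryFs = {
    // List "paths" under a prefix, e.g. "/notes/".
    async ls(prefix: string): Promise<string[]> {
      const { rows } = await pool.query(
        "SELECT path FROM agent_files WHERE path LIKE $1 || '%' ORDER BY path",
        [prefix]
      );
      return rows.map((r) => r.path);
    },
    // Read one "file", or null if it doesn't exist.
    async read(path: string): Promise<string | null> {
      const { rows } = await pool.query(
        "SELECT content FROM agent_files WHERE path = $1",
        [path]
      );
      return rows[0]?.content ?? null;
    },
    // Create or overwrite a "file".
    async write(path: string, content: string): Promise<void> {
      await pool.query(
        "INSERT INTO agent_files (path, content) VALUES ($1, $2) " +
          "ON CONFLICT (path) DO UPDATE SET content = EXCLUDED.content",
        [path, content]
      );
    },
  };

The model only ever sees familiar filesystem verbs; durability, indexing, and multi-tenancy stay in Postgres.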

@BlasMoros resurfaced a quote he called prescient, from Chris Paik's essay "The End of Software", about software economics:

"LLMs have proven themselves to be remarkably efficient at [translation between human and computer language] and will drive the cost of creating software to zero. What happens when software no longer has to make money? We will experience a Cambrian explosion of software, the same way we did with content."

The multi-agent future isn't coming—it's here. The question now is how quickly the rest of the ecosystem catches up.

Source Posts

Ido Salomon @idosal1 ·
Building AgentCraft v1 with AgentCraft v0 is 🤌 Managed up to 9 Claude Code agents with the RTS interface so far. There's a lot to explore, but it feels right. v1 coming soon
Evan Boyle @_Evan_Boyle ·
@JoshXT We are working on org-scoped fine-grained PATs for higher rate limits, especially for automation/CI scenarios. More news on this soon!
Ashpreet Bedi @ashpreetbedi ·
AI Engineering has a Runtime Problem
Lee Robinson @leerob ·
Rules, commands, MCP servers, subagents, modes, hooks, skills... There's a lot of stuff! And tbh it's a little confusing. Here's what you need to know (and how we got here). https://t.co/UomcW2Y0c3
near @nearcyan ·
this is how i claude code now. it's fun! https://t.co/thkWyCji2S
Kent C. Dodds ⚡ @kentcdodds ·
When everyone was saying MCP is doomed because context bloat, I was saying all you need is search. https://t.co/LPGctd1szt Feels good to have my bets validated once again
Thariq @trq212

Tool Search now in Claude Code

TDM (e/λ) (L8 vibe coder 💫) @cto_junior ·
gm to all multi clauders https://t.co/92HB27f7xF
Guillermo Rauch @rauchg ·
Glimpse of a world of fully generative interfaces. AI → JSON → UI: https://t.co/BKcvtDky5K https://t.co/QH6ctR1ldA
Mitchell Hashimoto @mitchellh ·
I'm not presently hiring, but I think a really effective engineering interview would be to explicitly ask someone to use AI to solve a task, and see how they navigate. Ignore results, the way AI is driven is maybe the most effective tool at exposing idiots I've ever seen.
Fahd Ananta @fahdananta ·
One of most common self sabotages in a workplace is constantly bringing up matters already decided upon.
Straight from the CIA handbook
Fahd Ananta @fahdananta

How to sabotage a workplace by the CIA sounds similar to a lot of company culture manuals today https://t.co/BYAvncTr2g

Muratcan Koylan @koylanai ·
These 17 security Skills for Claude Code are really well-written.
- Decision trees agents can actually follow
- Authoritative sources with specific file paths
- Nested references for deeper context
My take is that this is the beginning of something massive. Trail of Bits works with DARPA and Facebook. They don't do things casually.
Every company with technical docs will ship Skill packages, not because it's nice to have, but because agents won't adopt your product without them. Agents (or humans) won't read docs; they execute Skills.
If you're thinking about how agent-readable knowledge should be structured or are building/leading a startup that plans to create your own Skills: I'd love to chat for 5-10 min to exchange ideas. DMs open.
Dan Guido @dguido

.@trailofbits released our first batch of Claude Skills. Official announcement coming later. https://t.co/vI4amorZrc

Cursor @cursor_ai ·
Cursor now catches 2.5x as many real bugs per PR. More on how we build and measure agents for code review: https://t.co/E5GKYIchqX
Jeffrey Emanuel @doodlestein ·
If you don’t want to dive directly into my entire Flywheel system all at once, at least try this:
1. Install agent mail using the curl | bash one-liner: curl -fsSL "https://t.co/4cpumwIS41 +%s)" | bash -s -- --yes
That will automatically install beads if you don’t already have it.
Then install beads_viewer with its one-liner: curl -fsSL "https://t.co/OETEyjZZhN +%s)" | bash
Then set up your AGENTS dot md file for your project. You can start with this one and just remove the sections for the tools you’re not using yet: https://t.co/UEViYk7x3Z
Then ask CC to adapt it to better fit the tech stack for your particular project.
That’s all you need to get started. Then follow this workflow: https://t.co/xkxAQzMPQl
Try to start with a smaller, self-contained greenfield (new) project and see whether you can get it all working perfectly without looking at any of the code, just from following the workflow.
Spend most of your energy and human time/focus on the markdown plan. Don’t be lazy about the plan! The more you iterate on it with GPT Pro and layer in feedback from other models, the better your project will turn out.
Also don’t be lazy about turning the markdown plan into beads, either. Don’t try to one-shot it with CC, you will 100% miss stuff from the plan. This is the easiest thing to screw up assuming you already have a great markdown plan. Do at least 3 rounds of polishing, improving, and expanding the beads.
Once you have the beads in good shape based on a great markdown plan, I almost view the project as a foregone conclusion at that point. The rest is basically mindless “machine tending” of your swarm of 5-15 agents as they build out the beads. It’s mostly just juggling these tasks:
- Making sure to make them read AGENTS dot md after compactions.
- Using many rounds of the “fresh eyes” review prompt whenever an agent tells you it’s done implementing one of the beads.
- Swapping accounts when you run out of usage (ugh!).
- Making sure you commit frequently to GitHub using my “logically grouped” commits prompt.
- When all beads are complete, doing many rounds of the random code inspection and review.
- Adding more and more unit and e2e tests.
- Setting up gh actions for testing, builds, tags, releases, checksums, etc.
- Writing a README and help/docs/tutorials.
- Iterating on a “robot mode” (you added one, right?) with feedback from the agents to make it better.
- Seeing if you can make your project work better when controlled by Claude Code by making a skill for it.
But most of these things can be done using very little mental focus or attention/energy. Save all of that for the ideation and planning phases!
The one thing people seem to get wrong is ignoring what I say about planning or transforming their plan into beads. They make a slipshod plan all at once with Claude Code. Or they try to one-shot turning the plan into beads. Or they even do both of those things! Well, of course the project is going to suck and be a buggy mess if you do that. So don’t be lazy. Or if you insist on being lazy, save it for the stages after planning. A great set of beads is all you need.
As for the rest of my tools: Once you get comfortable with that workflow, start layering in the other tools, starting with ubs to help find bugs during the review phases. Then add in dcg. You’ll actually appreciate dcg a lot more once Claude wipes out all the work from the other agents since the last commit! As you build up a good session history, layer in cass so you can tap into that history.
And then try cm (cass memory system) to start extracting and codifying lessons from your past sessions. And I know I’ve said that I don’t really use ntm yet (I’m not dogfooding it at least), but that’s not quite true. I’ve been using it as a handy building block because of its robot mode. For example, ntm is used by ru (repo_updater) to automate handling gh issues.
Good luck, and come to the Discord with any questions!
Craig Van @craigvandotcom

So what would you recommend to someone who wants to start using your stack? I don’t want to use it all at once because then I don’t really feel how it works, if I add layers as I’m comfortable then I’ll feel better. What would be the simple to complex or critical to optional setup sequence?

Harrison Chase @hwchase17 ·
I should have clarified in blog (but am now afk and can’t edit articles from phone, plz fix Elon): We don’t use an actual filesystem. We use Postgres but have a wrapper on top of it expose it to the LLM as a filesystem
Harrison Chase @hwchase17

How we built Agent Builder’s memory system

Okara @askOkara ·
I saw a guy coding today. No Okara. No Cursor. No OpenCode. No Claude Code. He just sat there, typing code manually. Like a psychopath.
Vercel @vercel ·
We just released 𝚛𝚎𝚊𝚌𝚝-𝚋𝚎𝚜𝚝-𝚙𝚛𝚊𝚌𝚝𝚒𝚌𝚎𝚜, a repo for coding agents. React performance rules and evals to catch regressions, like accidental waterfalls and growing client bundles. How we collected them and how to install the skill ↓ https://t.co/kfLSbKl15X
Blas @BlasMoros ·
prescient "Software is expensive because developers are expensive. They are skilled translators–they translate human language into computer language and vice-versa. LLMs have proven themselves to be remarkably efficient at this and will drive the cost of creating software to zero. What happens when software no longer has to make money? We will experience a Cambrian explosion of software, the same way we did with content."
Chris Paik @cpaik

The End of Software https://t.co/JWg6QYqLzO

Peter Steinberger @steipete ·
I still think https://t.co/fz1tUJADRo is a better approach. agents know really well how to handle clis.
Thariq @trq212

Tool Search now in Claude Code

OpenAI Developers @OpenAIDevs ·
Today we’re announcing Open Responses: an open-source spec for building multi-provider, interoperable LLM interfaces built on top of the original OpenAI Responses API.
✅ Multi-provider by default
✅ Useful for real-world workflows
✅ Extensible without fragmentation
Build agentic systems without rewriting your stack for every model: https://t.co/ZJPNDemq40
am.will @LLMJunky ·
@clawdbot is utterly cracked.
From my phone, I had to do repo research, indexing all the migrations, edge functions, and tables I have in my supabase
It then passed this context into a Codex agent which used Context7 to pull documentation to help migrate Supabase over to @convex
Codex completed the plan, saved it to my repo, and it's ready for migration.
Keep in mind, this is a task I could have kicked off from the Denny's parking lot. And to implement the plan? Would have been as instructing it to spin up another Codex (or Claude, Gemini, whatever).
@steipete is a legend. I'm only just scratching the surface, but there's an entire library of skills that I've installed. Browser automation, remind me, deep research. It understands images. I can leave it voice memos.
I don't know if this is AGI, but its about as close as you can get right now.
Vercel Developers @vercel_dev ·
① Install the skill: $ npx add-skill vercel-labs/agent-skills
② Paste this prompt: Assess this repo against React best practices. Make a prioritized list of quick wins and top fixes.
③ Review and prompt to "make the fixes"
Alex Sidorenko @asidorenko_

"How can I use react-best-practices skills?" Codex example 👇 https://t.co/dUrnqOUWIu

luffy @0xluffy ·
i made a chrome extension that converts X articles into a speed reader. no copy pasting. just a button made with @capydotai https://t.co/uLO3ubJ0nc
Oliur @UltraLinx

Can you read 900 words per minute? Try it. https://t.co/31ubbZWvXH

Pleometric @pleometric ·
how many claude codes do you run at once? gas town?👋😅 https://t.co/3WjMK2XkQT
GitHub Changelog @GHchangelog ·
Agentic memory for GitHub Copilot is in public preview.
• Copilot learns repo details to boost agent, code review, CLI help
• Memories scoped to repos, expire in 28 days, shared across Copilot features
• Enable via Copilot or org settings
Explore more → https://t.co/beDx97EDg3
Katherine Boyle @KTmBoyle ·
We’ve rejected boredom at every stage of life. Constant attention and entertainment are the norm now, so parents have become the cruise directors for their kids. Boredom is extremely important for children, and yet “good parenting” would have you believe your kids need activities and constant stimulation, an endless list of appointments and fun. What they really need is time to sit with their own thoughts, while mom and dad need time to do laundry and their taxes. We’re all trying and worrying way too much when we should let the kids be consumed by boredom. It’s how they learn to create new worlds.
Tim Carney @TPCarney

Here's what's going on: Dads are spending more time with their kids, which is good. Moms are working more outside the home. Moms are NONETHELESS spending more of their day on *purely parenting responsibilities.* We're overparenting. It's anti-natal. https://t.co/KNXwqN0HNk