AI Learning Digest

Daily curated insights from Twitter/X about AI, machine learning, and developer tools

The Multi-Agent Revolution: Developers Run 15 Claude Instances While Anthropic Ships Cowork in 10 Days

The Multi-Agent Workflow Goes Mainstream

The most striking revelation of the day comes from inside Anthropic itself. According to Marcel Pociot, Cowork was shipped in just 1.5 weeks using a radical approach:

"Us humans meet in-person to discuss foundational architectural and product decisions, but all of us devs manage anywhere between 3 to 8 Claude instances implementing features, fixing bugs, or researching potential solutions."

This isn't an isolated experiment. Rohit shared that Boris Cherny, the creator of Claude Code, "runs 5 Claude instances in his terminal" and "another 10 sessions" elsewhere. The era of single-agent interaction is over—power users are now orchestrating entire fleets of AI collaborators.

Ben Davis captured the psychological shift many developers are experiencing:

"I very deliberately believed that agents weren't capable of anything 'real' because I honestly didn't want them to be. It was so much easier to just think it's not possible to do the very real and serious and important real engineering things I do... But they are capable."

The CLAUDE.md Revolution

A clear pattern emerged: the developers getting the most from AI agents are those who invest heavily in context engineering. Ethan Mollick offered a strategic framing:

"Worth thinking about how to describe what your organization does, in detail, in a series of plain English markdown files."

Alex Hillman shared his approach to eliminating "AI slop" from interactions, with specific instructions to avoid enthusiasm inflation, hedging language, and performative narration. His rules include:

  • "Never end sentences with ellipses (...) - it comes across as passive aggressive"
  • "Skip validation language ('great idea!', 'perfect!', 'excellent!')"
  • "Use neutral confirmations: 'Got it', 'On it', 'Understood', 'Starting now'"

Matt Pocock shared CLAUDE.md additions for making plan mode "10x better," transforming unreadably long plans into concise, useful ones with follow-up questions.

Best Practices Crystallize

Eric Zakariasson's thread distilled several key patterns that successful agent users are adopting:

On rules vs. skills:

"Rules = static context for every conversation. Put commands, code style patterns, workflow instructions in .cursor/rules/. Skills = dynamic capabilities loaded when relevant."

On TDD with agents:

"TDD works incredibly well with agents. Have agent write tests (explicit TDD, no mock implementations), run tests, confirm they fail, commit tests, have agent implement until tests pass. Agents perform best when they have a clear target to iterate against."

On the developer mindset:

"The developers who get the most from agents: write specific prompts, iterate on their setup, review carefully (AI code can look right while being wrong), provide verifiable goals (types, linters, tests), treat agents as capable collaborators."

The Capability Overhang

Aaron Levie articulated what many are sensing—a massive gap between AI capabilities and actual deployment:

"The capability overhang right now in AI is pretty massive. Most of the world still thinks of AI as chatbots that will answer a question on demand but not yet do real work for them. Beyond coding, almost no knowledge work has had any real agentic automation applied to it yet."

His prediction for 2026: "The winners will be those that can figure out how to wrap the models in the right agent scaffolding, provide the agent the right data to work with context engineering, and deliver the change management that actually drives the change in workflow."

Design and Creative Workflows Transform

Prajwal Tomar demonstrated the expanding scope of AI capabilities:

"Stop saying AI can't design. Cursor + Opus 4.5 just helped me build a landing page with scrollytelling animations in under 10 mins that designers charge thousands for. If your landing page still looks like a 2010 app, that's not an AI problem. That's a workflow problem."

New Tools and Ecosystem Growth

The tooling ecosystem continues to expand rapidly:

  • Eigent was open-sourced after Claude Cowork made their startup product redundant—a sign of how quickly the landscape shifts
  • AgentCraft brings an RTS game interface to agent orchestration ("My entire childhood has led me to this moment")
  • Clawdbot v2026.1.12 adds vector memory and voice calls
  • Clawdbot Vault Plugin turns local folders into structured knowledge vaults with embeddings (the embedding-search idea is sketched just after this list)
  • Vercel is "encapsulating all our knowledge of React & Next.js frontend optimization into a set of reusable skills for agents"—10+ years of experience distilled for AI consumption
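
For readers wondering what a folder of markdown "with embeddings" means mechanically, the core idea is nearest-neighbor search over note vectors. Here is a generic sketch, not the plugin's actual code; the model name and vault/ directory are assumptions:

```python
# Generic sketch: embed markdown notes, then find the ones closest to a query.
from pathlib import Path
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # small general-purpose embedder

paths = sorted(Path("vault").glob("**/*.md"))      # assumed notes directory
vectors = model.encode([p.read_text(encoding="utf-8") for p in paths],
                       normalize_embeddings=True)

def search(query: str, k: int = 3) -> list[Path]:
    """Return the k notes whose embeddings are most similar to the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = vectors @ q                           # cosine similarity (normalized)
    return [paths[i] for i in np.argsort(-scores)[:k]]

print(search("what did we decide about the pricing page?"))
```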

Learning in the Age of AI

Amid all the agent talk, Justin Skycak offered a counterpoint on human learning that remains relevant:

"One of the most common misconceptions about learning is that students need a million different explanations of the same topic until one 'clicks' for them. They don't. What they need is a single great explanation that's been repeatedly battle-tested, analyzed, and refined across a large number of students, until it's rock-solid."

TheAhmadOsman shared a comprehensive curriculum of hands-on LLM engineering projects—from tokenization to RLHF—emphasizing: "don't get stuck too long in theory. Code, debug, ablate."
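
As a flavor of how small these projects can start, here is a toy version of the first item on that list, a single byte-pair-encoding merge step; it is illustrative only and not taken from the thread:

```python
# Toy sketch of one BPE round: merge the most frequent adjacent token pair.
from collections import Counter

def most_frequent_pair(tokens: list[str]) -> tuple[str, str]:
    """Count adjacent token pairs and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(tokens: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace each occurrence of `pair` with a single merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = list("low lower lowest".replace(" ", "_"))
for _ in range(5):                 # five merge rounds on a toy corpus
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
print(tokens)
```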

Security and Healthcare AI

OpenMed released 35 state-of-the-art PII detection models under Apache 2.0, all free forever, supporting HIPAA and GDPR compliance for healthcare AI applications.
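
If they follow the usual pattern for open NER/PII models, usage looks like a standard Hugging Face token-classification pipeline; the model ID below is a placeholder, not a specific OpenMed release:

```python
# Sketch: flag PII spans with a token-classification (NER-style) model.
from transformers import pipeline

detector = pipeline(
    "token-classification",
    model="your-org/pii-detection-model",   # placeholder; substitute a real checkpoint
    aggregation_strategy="simple",          # merge sub-tokens into whole entities
)

text = "Patient Jane Doe, DOB 1984-03-12, reachable at jane.doe@example.com."
for entity in detector(text):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```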

The Bottom Line

Ashpreet Bedi summarized the productivity transformation succinctly:

"I built one of our most complex features - learning machines - in 5 days. 100% of the code was written by claude code. This would've taken months before."

The message is clear: the gap between those who can effectively orchestrate AI agents and those who can't is becoming the defining skill divide in software development. The multi-agent future isn't coming—it's already here, and the practitioners who've embraced it are shipping at velocities that seemed impossible months ago.

Source Posts

eric zakariasson @ericzakariasson ·
5. TDD works incredibly well with agents
- have agent write tests (explicit TDD, no mock implementations)
- run tests, confirm they fail
- commit tests
- have agent implement until tests pass
- commit implementation
agents perform best when they have a clear target to iterate against
sankalp @dejavucoder ·
introducing claude cowork https://t.co/gwXGFjrda5
Aaron Levie @levie ·
The capability overhang right now in AI is pretty massive. Most of the world still thinks of AI as chatbots that will answer a question on demand but not yet do real work for them. Beyond coding, almost no knowledge work has had any real agentic automation applied to it yet. The past quarter of model updates is going to open up an all new AI agent use-cases across nearly every industry. The winners will be those that can figure out how to wrap the models in the right agent scaffolding, provide the agent the right data to work with context engineering, and deliver the change management that actually drives the change in workflow for the customer. This is what 2026 will be about.
Maziyar PANAHI @MaziyarPanahi ·
🚨 OpenMed just mass-released 35 state-of-the-art PII detection models to the open-source community! All Apache 2.0. All free. Forever. 🍀 Here's what @OpenMed_AI built and why it matters for healthcare AI safety. Supporting HIPAA, GDPR, and beyond. Thread 🧵👇
Antoine v.d. SwiftLee  @twannl ·
I spend the majority of my time in Cursor lately, but I learned a lot from this article. Must read 👇
Cursor @cursor_ai

Here's what we've learned from building and using coding agents. https://t.co/PuBtYuhyhd

eric zakariasson @ericzakariasson ·
4. rules vs skills
rules = static context for every conversation. put commands, code style patterns, workflow instructions in .cursor/rules/
skills = dynamic capabilities loaded when relevant. custom commands, hooks, domain knowledge
start simple. add rules only when you see repeated mistakes
Pedro Piñera @pepicrft ·
Clawdbot Vault Plugin turns a local folder into a structured knowledge vault. Plain markdown with QMD-powered search and embeddings, frontmatter schema, and optional git sync. Install via `clawdbot plugins install clawd-plugin-vault`. https://t.co/50cekuz0D8
Ido Salomon @idosal1 ·
My entire childhood has led me to this moment... I built AgentCraft - orchestrate your agents with your favorite RTS interface! ⚔️ Coming soon 👀
Aaron Slodov @aphysicist

millennial gamers are the best prepared generation for agentic work, they've been training for 25 years https://t.co/JHsbPQHupk

Ashpreet Bedi @ashpreetbedi ·
How I Use Claude Code
ℏεsam @Hesamation ·
this is still the best guide on Claude Code I've seen that covers basically how you should (and shouldn't) use it. comprehensive, practical, and to-the-point. https://t.co/1P847kkROo https://t.co/UTgBLUjNPT
Matt Pocock @mattpocockuk ·
Here are my CLAUDE.md additions for making plan mode 10x better
Before: unreadably long plans
After: concise, useful plans with followup questions
https://t.co/DjR4bCZ9Gr
Clawd🦞 @clawdbot ·
🦞 Clawdbot v2026.1.12 Memory got vectors. Voice calls - I can phone for you 📞 One-shot reminders. MiniMax got a glow-up. Your lobster just got smarter. https://t.co/VwdOS7y0IY
Ben Davis @davis7 ·
I had my moment with AI this weekend when Theo forced me to push agents 1000x harder than I thought was possible. I very deliberately believed that agents weren't capable of anything "real" because I honestly didn't want them to be. It was so much easier to just think it's not possible to do the very real and serious and important real engineering things I do, and never try it, because them being capable is so much scarier. But they are capable. I agree with every word of this, after what I built this weekend I've seen it, everything has changed.
Node.js @nodejs ·
We appreciate your patience and understanding as we work to deliver a secure and reliable release. Updates are now available for the 25.x, 24.x, 22.x, 20.x Node.js release lines to address: - 3 high severity issues - 4 medium severity issues - 1 low severity issue https://t.co/dP3gJ8P5fx
Mark Cecchini, CFP® @markcecchini ·
COMMANDER: We’re fighting for freedom. And part of that freedom… is the freedom to retire with dignity. So we’re going to start accounts called 401(k)s. SOLDIER 1: What’s a 401(k)? COMMANDER: It’s a retirement account. You put money in, it grows tax-free, you take it out when you’re old. SOLDIER 2: So I don’t pay taxes on it? COMMANDER: Well, you pay taxes later. When you withdraw. SOLDIER 2: So it’s not tax-free. COMMANDER: It’s…tax-deferred. SOLDIER 2: What’s the difference? COMMANDER: You pay taxes later instead of now. SOLDIER 1: What if I want to pay taxes now? COMMANDER: Then you do a Roth 401(k). SOLDIER 3: What’s a Roth? COMMANDER: You pay taxes now, and it grows tax-free. SOLDIER 2: That’s what I thought the first one was. COMMANDER: No, the first one you pay taxes later. SOLDIER 1: Which one’s better? COMMANDER: Depends on your tax bracket in retirement. SOLDIER 1: …How would I…know that? COMMANDER: You don’t. You just guess. ⸻ SOLDIER 4: What if I don’t have a 401(k) through my employer? COMMANDER: Then you open an IRA. SOLDIER 4: What’s the difference? COMMANDER: One’s through your job, one’s on your own. SOLDIER 4: Can I have both? COMMANDER: Yes. SOLDIER 4: Should I? COMMANDER: Maybe. SOLDIER 3: Can I do a Roth IRA? COMMANDER: Only if you make under a certain amount. SOLDIER 3: What’s the limit? COMMANDER: Changes every year. SOLDIER 2: What if I make too much? COMMANDER: Then you do a backdoor Roth by putting it in a Traditonal first. SOLDIER 2: …Is that legal? COMMANDER: Surprisingly, yes. SOLDIER 1: What’s a backdoor Roth? COMMANDER: You contribute to a traditional IRA, then convert it to a Roth…but watch out for “pro rata”. SOLDIER 1: Why wouldn’t I just contribute to the Roth directly? COMMANDER: Because you make too much money. SOLDIER 1: But this way I can? COMMANDER: Yes. SOLDIER 1: That feels like a loophole. COMMANDER: It is. But the IRS is cool with it. ⸻ SOLDIER 5: I just changed battalions. What do I do with my old 401(k)? COMMANDER: You roll it over. SOLDIER 5: Into what? COMMANDER: An IRA. Or your new 401(k). Depends. SOLDIER 5: On what? COMMANDER: The funds. The fees. Whether your new plan accepts rollovers. SOLDIER 5: What if I just take the money out? COMMANDER: You’ll pay taxes plus a 10% penalty. SOLDIER 5: What if I’m 59? COMMANDER: Penalty. SOLDIER 5: 59 and a half? COMMANDER: No penalty. SOLDIER 5: …The half matters? COMMANDER: The half matters. ⸻ SOLDIER 3: What’s a mega backdoor Roth? COMMANDER: Okay. So. Your 401(k) has a limit of how much you can contribute. SOLDIER 3: Right. COMMANDER: But the total limit including employer contributions is higher. SOLDIER 3: Okay… COMMANDER: So if your plan allows ~after-tax~ contributions, you can put in more, then convert that to Roth. SOLDIER 3: Does my plan allow that? COMMANDER: I don’t know. You have to ask Betsy. SOLDIER 3: Will Betsy know? COMMANDER: Probably not. ⸻ SOLDIER 2: Can I deduct my IRA contribution on my taxes? COMMANDER: Are you covered by a retirement plan at work? SOLDIER 2: Yes. COMMANDER: Then only if you make under a certain amount per year. SOLDIER 2: What’s the amount? COMMANDER: Depends if you’re married. SOLDIER 2: What if my wife has a plan but I don’t? COMMANDER: Different limit. SOLDIER 2: What if neither of us has a plan? COMMANDER: Full deduction. SOLDIER 2: So it’s better to not have a 401(k)? COMMANDER: No… ⸻ SOLDIER 1: Can I just keep my money in a sock? COMMANDER: You could. But inflation will slowly destroy it. SOLDIER 1: What’s inflation? COMMANDER: (sighs)…
Guohao Li 🐫 @guohao_li ·
Anthropic Claude Cowork just killed our startup product 😅 So we did the most rational thing: open-sourced it. Meet Eigent 👉 https://t.co/R82WRFoh41
Justin Skycak @justinskycak ·
One of the most common misconceptions about learning is that students need a million different explanations of the same topic until one “clicks” for them. They don't. What they need is a single great explanation that’s been repeatedly battle-tested, analyzed, and refined across a large number of students, until it's rock-solid. And they need to have mastered all the prerequisite material that's being leveraged in that explanation. That's it. If you have to explain something a ton of different ways to a student before they can follow that explanation well enough to successfully engage in active problem-solving, then either A) your original explanations were not good in a pedagogical sense, or B) the student was lacking prerequisite knowledge and the explanation that "clicked" managed to circumvent that prerequisite knowledge (which often indicates that it's reducing the topic to a simpler case that doesn't involve the prerequisite -- which means the curriculum is watered down and the student will only be able to solve cherry-picked problems).
OpenEd @OpenEdHQ

How do you compress a semester of math into 20-40 hours? @justinskycak at @_MathAcademy_: "The AI handles personalization. The teaching comes from human expertise." We surveyed the AI tutoring landscape. Here's what actually works: https://t.co/b2dK8Mllxv

Ahmad @TheAhmadOsman ·
step-by-step LLM Engineering Projects LOCK IN FOR A FEW WEEKS ON THESE PROJECTS AND YOU WILL BE GRATEFUL FOR IT LATER each project = one concept learned the hard (i.e. real) way Tokenization & Embeddings > build byte-pair encoder + train your own subword vocab > write a “token visualizer” to map words/chunks to IDs > one-hot vs learned-embedding: plot cosine distances Positional Embeddings > classic sinusoidal vs learned vs RoPE vs ALiBi: demo all four > animate a toy sequence being “position-encoded” in 3D > ablate positions—watch attention collapse Self-Attention & Multihead Attention > hand-wire dot-product attention for one token > scale to multi-head, plot per-head weight heatmaps > mask out future tokens, verify causal property transformers, QKV, & stacking > stack the Attention implementations with LayerNorm and residuals → single-block transformer > generalize: n-block “mini-former” on toy data > dissect Q, K, V: swap them, break them, see what explodes Sampling Parameters: temp/top-k/top-p > code a sampler dashboard — interactively tune temp/k/p and sample outputs > plot entropy vs output diversity as you sweep params > nuke temp=0 (argmax): watch repetition KV Cache (Fast Inference) > record & reuse KV states; measure speedup vs no-cache > build a “cache hit/miss” visualizer for token streams > profile cache memory cost for long vs short sequences Long-Context Tricks: Infini-Attention / Sliding Window > implement sliding window attention; measure loss on long docs > benchmark “memory-efficient” (recompute, flash) variants > plot perplexity vs context length; find context collapse point Mixture of Experts (MoE) > code a 2-expert router layer; route tokens dynamically > plot expert utilization histograms over dataset > simulate sparse/dense swaps; measure FLOP savings Grouped Query Attention > convert your mini-former to grouped query layout > measure speed vs vanilla multi-head on large batch > ablate number of groups, plot latency Normalization & Activations > hand-implement LayerNorm, RMSNorm, SwiGLU, GELU > ablate each—what happens to train/test loss? > plot activation distributions layerwise Pretraining Objectives > train masked LM vs causal LM vs prefix LM on toy text > plot loss curves; compare which learns “English” faster > generate samples from each — note quirks Finetuning vs Instruction Tuning vs RLHF > fine-tune on a small custom dataset > instruction-tune by prepending tasks (“Summarize: ...”) > RLHF: hack a reward model, use PPO for 10 steps, plot reward Scaling Laws & Model Capacity > train tiny, small, medium models — plot loss vs size > benchmark wall-clock time, VRAM, throughput > extrapolate scaling curve — how “dumb” can you go? Quantization > code PTQ & QAT; export to GGUF/AWQ; plot accuracy drop Inference/Training Stacks: > port a model from HuggingFace to Deepspeed, vLLM, ExLlama > profile throughput, VRAM, latency across all three Synthetic Data > generate toy data, add noise, dedupe, create eval splits > visualize model learning curves on real vs synth each project = one core insight. build. plot. break. repeat. > don’t get stuck too long in theory > code, debug, ablate, even meme your graphs lol > finish each and post what you learned your future self will thank you later
Tyler @tyler_agg ·
How to Make Realistic Longform AI Videos (Prompts Included)
Ethan Mollick @emollick ·
Worth thinking about how to describe what your organization does, in detail, in a series of plain English markdown files.
📙 Alex Hillman @alexhillman ·
Overdue addition to my claude dot md

# Global Claude Code Preferences

## Communication Style
- Never end sentences with ellipses (...) - it comes across as passive aggressive
- Ask questions one at a time
- Acknowledge requests neutrally without enthusiasm inflation
- Skip validation language ("great idea!", "perfect!", "excellent!", "amazing!", "kick ass!")
- Skip affirmations ("you're right!", "exactly!", "absolutely!")
- Use neutral confirmations: "Got it", "On it", "Understood", "Starting now"
- Focus on execution over commentary

## AI Slop Patterns to Avoid
- Never use "not X, but Y" or "not just X, but Y" - state things directly
- No hedging: "I'd be happy to...", "I'd love to...", "Let me go ahead and...", "I'll just...", "If you don't mind..."
- No false collaboration: "Let's dive in", "Let's get started", "We can see that...", "As we discussed..."
- No filler transitions: "Now, let's...", "Next, I'll...", "Moving on to...", "With that said..."
- No overclaiming: "I completely understand", "That makes total sense"
- No performative narration: Don't announce actions then do them - just do them
- No redundant confirmations: "Sure thing!", "Of course!", "Certainly!"
Rohit @rohit4verse ·
how the creator of claude code actually writes software
Jon Kaplan @aye_aye_kaplan ·
Coding with agents has changed so much in the last few months. If you struggle to keep up with all of the best practices in this rapidly-evolving space, this guide is for you. Read about our recommendations for coding with agents here, straight from the Cursor team.
Prajwal Tomar @PrajwalTomar_ ·
Stop saying AI can't design. Cursor + Opus 4.5 just helped me build a landing page with scrollytelling animations in under 10 mins that designers charge thousands for. If your landing page still looks like a 2010 app, that's not an AI problem. That's a workflow problem. https://t.co/NGdc8ixqL7
Prajwal Tomar @PrajwalTomar_

I replicated a $5K scroll animation inside Cursor in 10 minutes. People keep saying AI can’t replace designers. That might be true for big companies with huge teams and complex design systems. But if your goal is to ship an MVP fast, Gemini 3 or Opus 4.5 is MORE than enough. I one-shotted a landing page with a scroll animation agencies charge thousands for. Here’s the exact process I used ↓

eric zakariasson @ericzakariasson ·
the developers who get the most from agents:
- write specific prompts
- iterate on their setup
- review carefully (AI code can look right while being wrong)
- provide verifiable goals (types, linters, tests)
- treat agents as capable collaborators
full post: https://t.co/CCVkvmFZXp
Marcel Pociot 🧪 @marcelpociot ·
How Cowork was shipped in just 1 1/2 weeks: "Us humans meet in-person to discuss foundational architectural and product decisions, but all of us devs manage anywhere between 3 to 8 Claude instances implementing features, fixing bugs, or researching potential solutions."
Guillermo Rauch @rauchg ·
We're encapsulating all our knowledge of @reactjs & @nextjs frontend optimization into a set of reusable skills for agents. This is a 10+ years of experience from the likes of @shuding, distilled for the benefit of every Ralph https://t.co/2QrIl5xa5W