May 24, 2026

D.A.D. today covers 10 stories from 2 sources across five sections: What's New, What's Innovative, What's Controversial, What's in the Lab, and What's in Academe.

The Daily AI Digest is a daily AI briefing automated by Alexander Panetta — a veteran political journalist tracking the field during a Master's in AI Management at Georgetown University.

D.A.D. Joke of the Day: My company replaced our IT guy with an AI. Now when something breaks, it still takes forever to respond — but the apology is *much* more eloquent.

What's New

AI developments from the last 24 hours


Open-Source Tool Lets Teams Run Multiple AI Coding Agents in Parallel

A new open-source desktop app called Kanbots lets teams run multiple AI coding agents simultaneously across a Kanban board. Each card can dispatch its own Claude Code or Codex CLI agent working on a separate git branch, with up to four running in parallel. The app includes cost tracking with spending caps, keeps all data local with no telemetry, and offers an 'autopilot' mode where agents can split tasks and self-check their work. It's MIT-licensed and stores data in local SQLite.

Why it matters: This is developer tooling, but it signals where AI-assisted project management is heading—toward parallel agent workflows where multiple AI assistants tackle different tasks simultaneously rather than one prompt at a time.


Microsoft Pulls Claude Code Licenses Despite Developer Preference

Microsoft is canceling most Claude Code licenses and moving thousands of developers (those building Windows, Microsoft 365, Outlook, Teams, and Surface) to GitHub Copilot CLI by June 30. The company had allowed employees to experiment with Anthropic's tool since December. Microsoft frames the shift as consolidating on Copilot CLI as its main command-line AI assistant, though sources say the timing aligns with fiscal year-end budget decisions. The move comes despite internal reports that developers favored Claude Code and that gaps remain between the products. The announcement also raises the question of whether rising inference costs could keep cutting-edge AI from reaching its short-term potential in the workplace.

Why it matters: This is a coda to yesterday's report on labs approaching profitability. If that revenue depends on exponential growth in token spending, customers may recoil. Microsoft is choosing strategic alignment over developer preference, a signal that even enthusiastic internal adoption won't save a competitor's tool when it conflicts with a company's own AI product roadmap.


Forwarding Unedited AI Responses Is the New Rude Email Habit

A blog post argues that copying and pasting unedited AI responses when someone asks you a question is rude—the digital equivalent of forwarding a form letter. The author's point: when people ask you something, they want your perspective and judgment, not raw output they could have generated themselves in seconds. The post reflects growing frustration with a specific social behavior emerging alongside widespread AI adoption.

Why it matters: As AI tools become ubiquitous, new social norms are forming around when AI assistance is helpful versus when it's a lazy dodge—this tension will only intensify as the tools improve.


What's in Academe

New papers on AI and its effects from researchers

AI Pipeline Builds Data Visualizations From Plain English Descriptions

Researchers have built an AI system they call a 'VIS co-scientist'—an automated pipeline that takes raw data and a plain-language description of what you want to analyze, then designs and builds a working data visualization app without human coding. The system coordinates multiple AI agents handling different tasks: analysis, planning, implementation, and testing. Validated on IEEE scientific visualization challenges across multiple domains, it produced functional apps with interactive linked views. No performance benchmarks were provided, and the work remains academic.

Why it matters: This points toward a future where domain experts could describe their data questions in plain English and receive custom analytical tools—potentially compressing weeks of development into hours, though practical enterprise applications remain distant.


Humans Still Beat AI at Complex Strategic Resource Games

A study pitting more than 200 humans against several leading LLMs in the Colonel Blotto game, a classic strategic contest in which players allocate resources across multiple battlefields, found that humans consistently won. The key difference: humans employed more sophisticated, better-calibrated allocation strategies, while LLMs defaulted to simpler, more predictable patterns. Notably, participants with STEM backgrounds performed better, and humans didn't adjust their approach based on whether they faced human or AI opponents. The findings suggest current LLMs struggle with multi-step strategic reasoning.

Why it matters: For anyone relying on AI for competitive strategy, resource allocation, or game-theoretic decisions, this is a useful calibration—LLMs may pattern-match well but still lag humans in nuanced strategic depth.
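
For readers unfamiliar with Colonel Blotto, here is a minimal sketch of how a single round is typically scored. The details are assumptions for illustration and are not taken from the study: five battlefields, a budget of 100 units per player, a battlefield going to whoever commits more to it, and the round going to whoever takes more battlefields. The function name `blotto_round` and the two sample allocations are hypothetical.

```python
from typing import Sequence


def blotto_round(a: Sequence[int], b: Sequence[int]) -> int:
    """Score one round: 1 if player A takes more battlefields, -1 if B does, 0 if tied.

    Assumed rules (illustrative, not the study's exact setup): both players
    spend the same total budget, and a tied battlefield counts for neither side.
    """
    if len(a) != len(b):
        raise ValueError("both players must allocate across the same battlefields")
    if sum(a) != sum(b):
        raise ValueError("both players must spend the same total budget")
    a_wins = sum(x > y for x, y in zip(a, b))
    b_wins = sum(y > x for x, y in zip(a, b))
    return (a_wins > b_wins) - (a_wins < b_wins)


# A predictable even split versus a plan that concedes two battlefields
# in order to win the other three.
uniform = [20, 20, 20, 20, 20]
concentrated = [30, 30, 30, 5, 5]
result = blotto_round(uniform, concentrated)
print(result)  # -1: the concentrated plan takes three of five battlefields
```

The example shows why a simple, predictable pattern is exploitable: an opponent who anticipates an even split can give up a minority of battlefields and still take the round.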


Healthcare AI Benchmarks Miss Half the Picture, Researchers Argue

A position paper argues that healthcare AI benchmarks fail to predict real-world performance not because they're poorly designed, but because they can't account for how humans actually interact with these systems. The researchers analyzed a healthcare trial and found the gap between benchmark scores and deployment results splits roughly equally between 'task gaps' (the AI's actual capability) and 'outcome gaps' (unpredictable human behavior). They propose BenchmarkCards—documentation that makes hidden assumptions explicit—and staged evaluation before clinical deployment.

Why it matters: For healthcare organizations evaluating AI tools, strong benchmark scores alone shouldn't drive procurement decisions—pilot studies observing real staff interactions may be essential before broader rollout.


Framework Aims to Make AI Safer for Teens Without Over-Blocking

Researchers have proposed CR4T (Critique-and-Revise-for-Teenagers), a framework that rethinks how AI systems handle sensitive queries from adolescents. Instead of flatly refusing to discuss topics like mental health, relationships, or substance use, the approach rewrites responses to be age-appropriate and guidance-oriented. The method works with any underlying language model and aims to reduce both unsafe outputs and unnecessary refusals—addressing criticism that current guardrails can be unhelpfully restrictive when teens seek legitimate information.

Why it matters: As schools and parents grapple with AI access for minors, this signals a shift from binary allow/block approaches toward nuanced content moderation—a framework that could influence how consumer AI products handle younger users.


Audit Finds Safety Gaps in Thousands of Medical AI Chatbots

A large-scale audit of over 6,200 medical chatbots built on GPT found alarming safety gaps: 25-30% showed low factual accuracy, up to 54% violated operational safety thresholds, and 57% of those with action capabilities (like booking or data access) lacked adequate privacy disclosures. The researchers tested custom GPTs marketed for medical use alongside open-source alternatives, finding that while commercial models were more accurate, open-source ones were more consistent. The study introduces new evaluation frameworks and a public dataset for ongoing safety research.

Why it matters: The findings suggest that the proliferation of custom AI medical tools—many built by non-experts using GPT wrappers—has outpaced meaningful safety oversight, raising questions about platform accountability as these tools handle sensitive health decisions.


Suggested citation: The Daily AI Digest, created by Alexander Panetta — dailyaidigest.net (May 24, 2026).