ChatGPT and Claude Come For Consulting
May 5, 2026
D.A.D. today covers 8 stories from 5 sources. What's New, What's Controversial, What's in the Lab, What's in Academe, and What's On The Pod.
D.A.D. Joke of the Day: My AI assistant just apologized for a mistake it didn't make. Finally, something in this office with management potential.
What's New
AI developments from the last 24 hours
OpenAI Partners with PwC to Embed AI Agents in the Office of the CFO
Integrating AI agents into enterprise operations is the white-hot market of the moment. It is also a hard sell—most companies don't have the in-house engineering to deploy frontier AI on their own, which has turned the world's big consulting firms into the brokers of the boom. Now the labs themselves are moving into that business, and Anthropic and OpenAI announced major moves on the same day.
OpenAI and PwC announced a collaboration to build AI agents for enterprise finance teams—targeting planning, forecasting, reporting, procurement, payments, treasury, tax, and the accounting close. PwC brings the finance-transformation and implementation muscle; OpenAI brings the models and Codex. OpenAI's own finance organization is serving as "customer zero," claiming Codex now processes 5× more contracts at the same headcount and that a custom GPT tool handled 200+ investor interactions during its recent fundraise.
Why it matters: OpenAI is plugging into the world's biggest enterprises through one of the world's biggest consulting firms—a delivery-channel play, not just a product launch. Read together with the Anthropic item directly below, the story is the same on both sides: shipping the model is no longer enough. The labs now want a hand in how it gets installed.
Anthropic Co-Founds a New Enterprise AI Services Firm with Blackstone, Goldman Sachs, and Hellman & Friedman
The other half of today's same-day pair on the consulting layer. Anthropic announced it is co-founding a new enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs—with additional backing from General Atlantic, Apollo Global Management, Sequoia Capital, GIC, and Leonard Green. The new firm's mandate: bring Claude into the core operations of mid-sized companies that lack the in-house engineering to deploy frontier AI on their own—community banks, regional health systems, mid-sized manufacturers. Anthropic's Applied AI engineers will work alongside the firm's own engineering team. The new company will also join Anthropic's existing Claude Partner Network, alongside Accenture, Deloitte, and PwC. CFO Krishna Rao framed the move as a response to demand "significantly outpacing any single delivery model."
Why it matters: OpenAI's answer to the integration-boom question was to embed deeper with one of the existing big consulting firms. Anthropic's answer is to build its own. (Yet another instance of the labs mirroring each other's announcements—witness the recent "too powerful to release" moves on new models—though here the mirroring is more about timing than substance.) The segmentation underneath is interesting: Anthropic is going after the mid-market the big firms underserve, while OpenAI rides shotgun with PwC into the Fortune 500. One important wrinkle: PwC now sits on both sides of the race—headline collaborator for OpenAI, partner-network member for Anthropic. The big consultancies aren't picking sides; they're banking both. And Anthropic's new firm is added capacity, not first contact: it joins the existing partner network rather than replacing it, which is exactly what Rao means by demand outpacing "any single delivery model."
What's Controversial
Stories sparking genuine backlash, policy fights, or heated disagreement in the AI community
The Big Shift Is Near: AIs Building Themselves, Writes Anthropic Co-Founder
Jack Clark, Anthropic co-founder and the company's policy lead, published a long essay on his Import AI Substack arguing he now sees a 60%+ probability that AI systems will be capable of "no-human-involved AI R&D"—a model autonomously training its own successor—by the end of 2028, and a 30% chance by the end of 2027. He frames the view as reluctant: "I'm not sure society is ready for the kinds of changes implied." His evidence is a mosaic of public benchmarks: SWE-Bench scores have risen from ~2% in 2023 to 93.9% with Claude Mythos Preview today; METR's time-horizon plot shows AI systems jumping from 30 seconds of independent work in 2022 (GPT-3.5) to ~12 hours in 2026 (Opus 4.6); CORE-Bench (paper reproduction) is effectively solved; and PostTrainBench shows AI now achieving about half the uplift of human researchers when fine-tuning open-weight models. Anthropic's own internal test of how much its models can speed up a CPU language-model training task went from 2.9× with Opus 4 in May 2025 to 52× with Mythos Preview in April 2026. Clark also flags alignment as the central risk: if your alignment technique is 99.9% accurate, that compounds to ~60% accuracy after 500 generations of recursive self-improvement (the arithmetic is checked below).

The essay drew significant pushback in the comments, including from researcher Herbie Bradley, who argues it conflates the schlep components of AI research (which models can already do) with the creative-taste components (which they can't), and frames the change as a task-distribution shift rather than a phase transition.
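On that compounding claim: it is simple arithmetic, as a back-of-envelope check confirms, assuming each generation's alignment fidelity multiplies independently:

```latex
0.999^{500} = e^{500 \ln 0.999} \approx e^{-0.5003} \approx 0.606
```

So roughly 61% of the original alignment guarantee survives 500 generations, matching Clark's ~60% figure.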
Why it matters: This is not a fringe forecast. It's from Anthropic's co-founder and policy lead, working from public data plus presumably what he sees inside the company. Two things to weigh against the claim, though. First, the substantive disagreement: research taste and the ability to pick fruitful directions are the contested middle ground, and not everyone in the field thinks current models are anywhere close to it. Second, the incentive structure: Anthropic benefits commercially from being seen as at the frontier of capability AND politically from being seen as taking safety seriously enough to flag risks. Clark's post does both at once. That doesn't make it wrong—Clark is a careful thinker and the underlying benchmark data is real—but "Anthropic co-founder publishes alarming essay about AI's near-term capabilities" is also a market signal, not just a research one.
What's in the Lab
New announcements from major AI labs
Google Bets Big on AI Agents with New Enterprise Platform, TPUs, and Open Model
Google unveiled the Gemini Enterprise Agent Platform for business automation, eighth-generation TPUs built for agentic AI workloads, and Gemma 4—which Google calls "byte for byte the most capable open model." Other releases include Deep Research Max for advanced data analysis and Learn Mode in Colab for coding assistance. Google says nearly 75% of its cloud customers now use its AI services, with 330 organizations each processing over a trillion tokens in the past year. The Gemma model family has been downloaded over 500 million times since launch.
Why it matters: The rollout signals Google is betting heavily on "agentic AI"—systems that can take actions, not just answer questions—and building dedicated infrastructure to run it, giving enterprise buyers another serious option alongside Microsoft and Amazon.
Gemini API Adds Webhooks for Long-Running Tasks
Google added webhook support to the Gemini API, letting developers receive instant notifications when long-running tasks complete instead of repeatedly checking for results. This applies to operations that can take minutes or hours—deep research queries, long video generation, and batch processing jobs. The system follows industry-standard webhook conventions with signed requests for security and will retry failed deliveries for up to 24 hours.
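For a feel of the consuming side, here is a minimal receiver sketch in Python. The endpoint path, header name, and HMAC-SHA256 scheme are illustrative assumptions, not confirmed details of the Gemini API's signing convention; check Google's documentation for the actual verification procedure.

```python
# Minimal webhook receiver sketch. The header name and HMAC signing
# scheme below are ILLUSTRATIVE ASSUMPTIONS, not the documented
# Gemini API contract.
import hashlib
import hmac
import os

from flask import Flask, abort, request

app = Flask(__name__)
WEBHOOK_SECRET = os.environ["WEBHOOK_SECRET"]  # shared secret you configure

@app.post("/gemini/callback")
def gemini_callback():
    # Verify the signed request before trusting the payload
    # (hypothetical header name).
    sent_sig = request.headers.get("X-Hub-Signature-256", "")
    expected = hmac.new(
        WEBHOOK_SECRET.encode(), request.get_data(), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(sent_sig, expected):
        abort(401)

    event = request.get_json()
    # Look up the long-running operation the notification refers to and
    # fetch its result here, instead of polling on a timer.
    print("operation finished:", event)

    # Return 2xx promptly; on delivery failure the sender is said to
    # retry for up to 24 hours.
    return "", 204
```

The win over polling is exactly what the story notes: your server does nothing until the deep-research query or video-generation job actually finishes.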
Why it matters: For teams building Gemini into production workflows, this is standard infrastructure that reduces server load and speeds up response times—table stakes for serious API integrations, now available.
What's in Academe
New papers on AI and its effects from researchers
Making Software Agent-Friendly Could Matter as Much as Making Agents Smarter
Researchers propose redesigning software interfaces to make them more reliable for AI agents that control computers. Their paper adapts Jakob Nielsen's classic usability heuristics—the same principles that made websites more human-friendly—for AI agents that click, type, and navigate on your behalf. Testing across controlled environments showed these design tweaks consistently improved task completion rates without making interfaces harder for humans to use.
Why it matters: As companies deploy AI agents to automate desktop tasks, this research signals that interface design—not just model capability—may determine whether those agents work reliably in practice.
Surprising Finding: Stricter AI Oversight Improved Performance and Cut Worker Fatigue
Researchers developed HAAS, a framework for dynamically dividing tasks between humans and AI in software engineering and manufacturing settings. The system combines rule-based governance with a learning algorithm that adapts collaboration modes over time. The surprising finding: in manufacturing contexts, stricter AI oversight actually improved operational performance while reducing worker fatigue—challenging the assumption that governance is purely a compliance burden. The framework treats oversight as a tunable dial rather than an on/off switch, with five collaboration modes ranging from full AI autonomy to human-led work.
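The "tunable dial" framing suggests a shape like the following sketch. The mode names and the threshold rule are hypothetical, invented here for illustration; the paper's actual policy is learned, not hand-coded.

```python
# Hypothetical sketch of a five-mode human-AI collaboration dial.
# Mode names and the escalation rule are invented for illustration;
# they are not HAAS's actual implementation.
from enum import IntEnum

class Mode(IntEnum):
    AI_AUTONOMOUS = 0  # AI acts with no review
    AI_LED = 1         # AI acts, human spot-checks
    SHARED = 2         # AI proposes, human approves
    HUMAN_LED = 3      # human acts, AI assists
    HUMAN_ONLY = 4     # AI sidelined entirely

def adjust_mode(mode: Mode, error_rate: float) -> Mode:
    """Move one notch along the dial per review window: tighten
    oversight when errors rise, grant autonomy when they stay low."""
    if error_rate > 0.05 and mode < Mode.HUMAN_ONLY:
        return Mode(mode + 1)  # escalate toward human control
    if error_rate < 0.01 and mode > Mode.AI_AUTONOMOUS:
        return Mode(mode - 1)  # delegate more to the AI
    return mode
```

The point of the dial shape is that governance becomes a continuous control variable, which is what makes the fatigue finding legible: the best setting can turn out to be stricter than the permissive default.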
Why it matters: For organizations deploying AI assistants alongside human workers, this suggests that finding the right governance level isn't just about risk management—it could be a competitive advantage.
AI Tutor Predicts Pair Programming Breakdowns 30 Seconds Before They Happen
Researchers developed ProPACT, an AI tutoring system that monitors pair programmers and predicts when collaboration is about to break down—up to 30 seconds before it happens. The system tracks where both people are looking and their cognitive load, then delivers light-touch guidance to get pairs back on track. In a study with 26 programming pairs, proactive feedback improved debugging success and task efficiency, and sustained better collaboration patterns. The approach treats collaboration itself as a teachable skill rather than limiting the tutor to correcting code errors.
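Predicting a breakdown 30 seconds out implies some sliding window over the gaze and load streams. Here is a hypothetical sketch of that shape; the features, window length, and threshold rule are invented for illustration, and ProPACT's actual predictor is presumably a trained model.

```python
# Hypothetical sketch of lead-time breakdown prediction over streaming
# collaboration signals. Features, window length, and thresholds are
# invented for illustration; they are not ProPACT's actual model.
from collections import deque
from dataclasses import dataclass

@dataclass
class Sample:
    gaze_overlap: float    # 0..1: how often the pair attends to the same code
    cognitive_load: float  # 0..1: e.g. derived from eye-tracking

class BreakdownPredictor:
    def __init__(self, window_seconds: int = 30, hz: int = 10):
        self.window = deque(maxlen=window_seconds * hz)

    def update(self, sample: Sample) -> bool:
        """Return True when a breakdown looks likely ~30 seconds out."""
        self.window.append(sample)
        if len(self.window) < self.window.maxlen:
            return False  # not enough signal yet
        overlap = sum(s.gaze_overlap for s in self.window) / len(self.window)
        load = sum(s.cognitive_load for s in self.window) / len(self.window)
        # Diverging attention plus sustained high load is the trouble
        # signature assumed here; a trained classifier would replace
        # this threshold rule.
        return overlap < 0.2 and load > 0.8
```

When `update` fires, the tutor's job is the light-touch nudge the paper describes, not taking over the task.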
Why it matters: This is a research prototype, not a product—but it signals growing interest in AI that coaches soft skills like collaboration, not just technical tasks, which could eventually reshape how teams train and work together.
What's On The Pod
Some new podcast episodes
How I AI — The internal AI tool that’s transforming how Stripe designs products | Owen Williams