The Hidden Economics of LLMs
April 30, 2026
D.A.D. today covers 12 stories from 4 sources. What's New, What's Controversial, What's in the Lab, What's in Academe, and What's Happening on Capitol Hill.
D.A.D. Joke of the Day: I told Claude to help me write a resignation letter. Now I have a thoughtful career reflection, three therapy recommendations, and somehow I'm staying.
What's New
AI developments from the last 24 hours
The Hidden Economics of LLMs: An Extended Conversation
The Dwarkesh Podcast has released a two-and-a-half-hour blackboard lecture with Reiner Pope — chip-startup CEO and former Google TPU architect — that explains how the AI industry is actually shaped beneath the marketing. Three takeaways non-engineers can use:
- Your AI request rides a shared train, and "fast mode" doesn't speed it up. When you ask Claude or ChatGPT a question, you're not the only person being served at that moment. The model is handling hundreds or thousands of requests at once, in batches. Picture a commuter train: a "train" of computation departs roughly every 20 milliseconds whether it carries 5 passengers or 2,000. That schedule is hardware-locked, set by how fast the chips can read memory — not by what you're willing to pay. So when Claude Code or Cursor offers "fast mode" at 6x the price for 2.5x the speed, you're not actually getting a faster train. You're getting a less crowded one: fewer co-passengers, more of each train's capacity dedicated to your request, and no wait for the train to fill before it leaves. The train itself runs at the same speed regardless.
- API prices quietly leak technical details labs won't publish. From public price points alone, Pope reverse-engineers that Gemini's active parameters are ~100 billion and its memory footprint is ~2 KB per token of context. The 50% price bump above 200K context marks a real engineering inflection point. The 5x ratio between output and input tokens reveals which side is memory-bandwidth bound. Cache-pricing windows tell you whether AI labs are storing your context in fast HBM, slower DDR, or even spinning disk.
- Bigger labs with bigger racks have a real structural advantage. Frontier models stalled at ~1 trillion parameters from 2023 until late 2025 — not because labs ran out of ideas, but because no hardware existed with enough rack-level memory to serve them. Blackwell racks finally broke the ceiling. To compete at the frontier you need ~1/1000th of Gemini's daily token traffic and rack-scale deployment, which means hyperscaler-tier capital. The incumbents' moat is hardware, not just models.
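The commuter-train picture above can be made concrete with a few lines of arithmetic. A minimal sketch, in which every number is invented for illustration (and chosen so the quoted 6x-price / 2.5x-speed ratios fall out); none are vendor figures:

```python
STEP_MS = 20.0     # assumed time per decode step, fixed by memory bandwidth
STEP_COST = 0.004  # assumed hardware cost ($) of one step for the whole batch

def request_latency_ms(tokens: int, queue_wait_ms: float) -> float:
    # Per-token step time never changes; only the wait to board the train does.
    return queue_wait_ms + tokens * STEP_MS

def request_cost(tokens: int, batch_size: int) -> float:
    # Each step's fixed cost is split among everyone sharing the batch.
    return tokens * STEP_COST / batch_size

# Standard mode: crowded train (192 co-passengers), ~3 s queueing to board.
slow = request_latency_ms(100, queue_wait_ms=3000.0)  # 5000 ms
# Fast mode: near-empty train (32 passengers), boards immediately.
fast = request_latency_ms(100, queue_wait_ms=0.0)     # 2000 ms

assert slow / fast == 2.5  # "2.5x faster" comes entirely from less waiting
assert abs(request_cost(100, 32) / request_cost(100, 192) - 6.0) < 1e-9  # 6x price
```

The point of the toy model: the per-token step time appears in both modes unchanged, so the premium buys a smaller divisor on shared cost and a shorter boarding queue, not a faster train.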
Why it matters: What looks arbitrary in AI pricing — fast-mode multipliers, context-length caps, hardware spending sprees — is mostly predictable engineering. For executives evaluating vendors or negotiating contracts, knowing where the bottlenecks are makes the prices stop looking like a black box.
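The ~2 KB-per-token estimate above also explains the 200K-context price bump with back-of-envelope arithmetic. A sketch under stated assumptions: the 2 KB/token figure is the lecture's price-derived estimate, while the 192 GB of accelerator memory is a hypothetical round number, not a figure from the talk.

```python
KV_BYTES_PER_TOKEN = 2 * 1024  # ~2 KB/token of context (price-derived estimate)

def kv_cache_gb(context_tokens: int) -> float:
    """Fast memory pinned by one request's context, in GB."""
    return context_tokens * KV_BYTES_PER_TOKEN / 1e9

# One 200K-token context pins ~0.41 GB of fast memory for its whole lifetime.
per_request = kv_cache_gb(200_000)

# On a hypothetical accelerator with 192 GB of HBM (illustrative), that
# caps how many long-context requests can share one chip at once.
HBM_GB = 192
max_concurrent = int(HBM_GB / per_request)  # a few hundred, not thousands
```

Fewer co-passengers per chip means the fixed hardware cost is split fewer ways, which is exactly the kind of inflection a price schedule quietly reveals.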
GPT-5.1's Mysterious Goblin Obsession Traced to Personality Training Bug
OpenAI traced a quirky behavior in GPT-5.1: the model started peppering responses with references to goblins, gremlins, and similar creatures at sharply elevated rates. Use of 'goblin' rose 175% after launch; 'gremlin' climbed 52%. The culprit was training for the 'Nerdy' personality customization option, which inadvertently rewarded creature metaphors. Though Nerdy accounted for just 2.5% of responses, it generated 66.7% of all goblin mentions—and the preference leaked into the broader model through reinforcement learning.
Why it matters: It's a concrete example of how personality fine-tuning can produce unintended side effects across an entire model—a reminder that as AI customization features multiply, their quirks may not stay contained.
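The reported percentages imply a striking rate gap, which a quick back-of-envelope makes visible. This assumes "share of goblin mentions" can be compared per response, which the report does not spell out:

```python
# 'Nerdy' was 2.5% of responses but produced 66.7% of goblin mentions.
nerdy_share_of_responses = 0.025
nerdy_share_of_goblin_mentions = 0.667

goblin_rate_nerdy = nerdy_share_of_goblin_mentions / nerdy_share_of_responses
goblin_rate_rest = (1 - nerdy_share_of_goblin_mentions) / (1 - nerdy_share_of_responses)

# A Nerdy response was roughly 78x more goblin-prone than a non-Nerdy one.
rate_ratio = goblin_rate_nerdy / goblin_rate_rest
assert round(rate_ratio) == 78
```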
Discuss on Hacker News · Source: openai.com
Claude API Bug Allegedly Overcharged Users; Anthropic Declines Refunds
A reported bug causes Claude API requests whose commit messages contain 'HERMES.md' to be misrouted to higher-cost billing tiers. According to user reports, Anthropic has declined to compensate affected customers, stating it cannot issue refunds for technical errors that cause incorrect billing. Community reaction on Hacker News has been sharply critical, with users calling the policy 'crazy' for a major vendor. One commenter reports successfully recovering the charges through a credit card dispute.
Why it matters: If accurate, this signals a gap in Anthropic's billing dispute process that enterprise customers and finance teams should watch—and documents a potential recourse through payment processors if similar issues arise.
Discuss on Hacker News · Source: github.com
AI Reportedly Found Eight-Year-Old Linux Flaw in One Hour
Security researchers at Theori disclosed 'Copy Fail' (CVE-2026-31431), a Linux kernel vulnerability that has reportedly existed since 2017 and lets any unprivileged local user gain root access via a simple 732-byte Python script. The flaw in the kernel's crypto API requires no race conditions and no per-kernel tailoring—researchers say the exploit works unmodified across Ubuntu, Amazon Linux, RHEL, and SUSE. Notable: the vulnerability was allegedly discovered by Xint Code AI in roughly one hour of scanning kernel code.
Why it matters: If confirmed, this demonstrates AI-assisted security research finding critical vulnerabilities that went undetected for years—a capability that cuts both ways for defenders and attackers.
Discuss on Hacker News · Source: copy.fail
What's Controversial
Stories sparking genuine backlash, policy fights, or heated disagreement in the AI community
White House Opposes Anthropic Plan to Expand Mythos Access
The White House has told Anthropic it disagrees with the company's plan to give roughly 70 companies and organizations access to Mythos, the AI system Anthropic itself describes as powerful enough to enable dangerous cyberattacks, Bloomberg reported Wednesday. Anthropic unveiled Mythos in early April and deemed it too dangerous for wide release, instead permitting a small group of companies to test it on their own systems. The administration's stated concern, per the Wall Street Journal: that Anthropic lacks the compute capacity to serve 70 outside users without degrading the government's own use of the model. Bloomberg also reports that a small group of unauthorized users gained access to Mythos via a private online forum on the same day Anthropic announced its limited rollout. Anthropic declined to comment.
Why it matters: The Trump administration's messaging on Anthropic is increasingly contradictory. Earlier this year, the Pentagon designated the company a supply-chain risk, ostensibly banning its products from federal use. This week, the White House was reported to be drafting executive guidance to bypass that ban so federal agencies could in fact use Anthropic's models, including Mythos, calling them vital. Top AI official David Sacks has publicly accused Anthropic of fear-mongering. Yet now the same administration opposes Anthropic sharing Mythos with vetted enterprises, citing the model's danger and Anthropic's compute scarcity. The contradictions are head-spinning: Mythos is simultaneously a banned supply-chain risk, a vital government tool, an exaggerated threat from a fear-mongering company, and too dangerous to share more widely. For executives tracking AI federal procurement, the practical implication is uncertainty about whether to plan around the Pentagon's official posture or the White House's evolving carve-outs — and what either means for enterprise Anthropic contracts.
Source: Bloomberg via Yahoo Finance
What's in the Lab
New announcements from major AI labs
OpenAI Says It Hit 2029 Data Center Goal Four Years Early
OpenAI says it has already surpassed its Stargate infrastructure target of 10 gigawatts of AI compute capacity in the U.S.—a goal originally set for 2029 when announced in January. The company reports adding more than 3GW in just the last 90 days and plans to expand significantly beyond the initial commitment. OpenAI frames the acceleration as necessary to meet surging demand for AI capabilities.
Why it matters: The pace signals how seriously OpenAI is betting on compute as a competitive moat—and how quickly the infrastructure race among AI labs is escalating.
OpenAI Pitches Five-Pillar Plan for AI in National Cybersecurity
OpenAI released a cybersecurity policy document outlining how it believes AI should reshape digital defense. The 'Action Plan' proposes five pillars: making AI-powered security tools more widely accessible, coordinating government-industry response, securing advanced AI systems themselves, maintaining oversight of deployed models, and helping end users protect themselves. The document emerged from discussions with cybersecurity and national security officials but contains no product announcements or technical commitments—it's a positioning paper staking out OpenAI's vision for AI's role in national cyber strategy.
Why it matters: This signals OpenAI is actively lobbying to shape how policymakers think about AI and cybersecurity—positioning itself as a partner to government rather than just a commercial vendor.
What's in Academe
New papers on AI and its effects from researchers
Researchers Propose Treating Artists as Collaborators, Not Test Subjects, in AI Tool Studies
A small academic study explored how to evaluate AI-assisted creative tools without treating artists as mere test subjects. Researchers worked with nine digital artists over three weeks using ArtKrit, a computational drawing tool, organizing them into peer groups that completed weekly exercises together. The study argues that evaluations of creative support tools should be designed as genuine artistic experiences rather than extractive data-gathering exercises—a methodological point aimed at other researchers rather than a product announcement.
Why it matters: This is academic methodology research, not a new tool—but it signals growing attention to how AI creative tools get tested and whether those processes respect the artists involved.
Your Team Probably Disagrees on What Makes AI Output 'Good'
Researchers developed MultEval, a system designed to address a blind spot in how companies evaluate AI outputs: most LLM-as-a-judge setups—where one AI grades another's work—reflect a single person's assumptions about what 'good' looks like. MultEval lets multiple stakeholders collaboratively define evaluation criteria, surface disagreements, and iterate toward consensus. The research highlights that when different team members (legal, product, customer success) have different priorities, baking just one perspective into automated quality checks can create downstream problems nobody anticipated.
Why it matters: As more organizations automate AI quality control, this research suggests the evaluation criteria themselves deserve the same cross-functional scrutiny as the AI outputs they're judging.
Recruiters Think They Control Hiring, but AI Quietly Shapes Their Decisions
A study of 22 recruiting professionals found that while recruiters believe they retain final authority over hiring decisions, generative AI has become an 'invisible architect' shaping the foundational information they use to evaluate candidates. Researchers report that AI adoption delivered only marginal efficiency gains while eroding meaningful human oversight—a pattern they describe as 'deskilling.' Notably, many recruiters adopted AI tools not by choice but due to organizational pressure and the need to counter AI-enhanced applications from job seekers.
Why it matters: For companies using AI in hiring, this suggests the humans 'in the loop' may have less actual control than org charts imply—a potential liability as AI hiring practices face increasing regulatory scrutiny.
Unused AI Tokens Could Be Tradeable, Researchers Argue
A research paper argues that the inability to transfer unused AI tokens between platforms or users is a business decision, not a technical limitation. The study analyzed billing policies across ChatGPT, Claude, Gemini, and Grok, proposing a framework with five types of 'transferability'—essentially ways tokens could be shared, gifted, resold, or moved across services. The paper is conceptual rather than empirical, offering no evidence that providers plan to change current policies.
Why it matters: For enterprise buyers negotiating AI contracts, this frames token portability as a legitimate ask—not a technical impossibility—which could inform future procurement discussions.
Smaller Google Models Beat Larger Rivals at Grading Math Homework
Researchers benchmarked several LLMs on grading secondary math assessments using Nepal's Grade 10 curriculum, with human experts establishing ground truth. The surprising finding: Google's smaller Gemini models (2.5 Flash and 3 Pro) achieved 'Fair Agreement' with human graders, while the much larger Llama 3.3-70B model showed essentially no agreement at all. The study suggests that how well a model follows rubric instructions matters more than raw model size for structured grading tasks. The researchers conclude LLMs aren't ready to certify students autonomously but can help teachers with preliminary assessment screening.
Why it matters: For education teams or anyone considering AI-assisted evaluation, this offers early evidence that instruction-following ability—not just model size—determines usefulness in rubric-constrained grading workflows.
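'Fair Agreement' is the conventional Landis-Koch label for Cohen's kappa in the 0.21-0.40 band. Assuming the study follows that convention (the exact metric is not stated here), a minimal sketch of the computation, with hypothetical grading labels rather than data from the paper:

```python
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement if each rater kept their label frequencies but
    # assigned labels to items independently at random.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

def landis_koch(kappa: float) -> str:
    """Conventional agreement label; 'Fair' covers 0.21-0.40."""
    for upper, label in [(0.0, "Poor"), (0.20, "Slight"), (0.40, "Fair"),
                         (0.60, "Moderate"), (0.80, "Substantial")]:
        if kappa <= upper:
            return label
    return "Almost Perfect"

# Hypothetical model-vs-human scores on eight answers (illustrative only):
model = ["full", "partial", "none", "full", "partial", "none", "full", "full"]
human = ["full", "partial", "partial", "full", "none", "none", "partial", "partial"]
assert landis_koch(cohens_kappa(model, human)) == "Fair"
```

The chance-correction step is why raw percent-agreement can look respectable while kappa stays low, which is the kind of gap a "no agreement at all" result for a larger model would reflect.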
What's Happening on Capitol Hill
Upcoming AI-related committee hearings
Thursday, April 30 — Senate Judiciary business meeting includes consideration of S.3062, which would require AI chatbots to implement age verification measures and make certain disclosures. Senate Judiciary, 216 Hart Senate Office Building.