AI Symptom Interviews Showed Higher Diagnostic Accuracy Than Clinicians Reviewing Same Dialogues in ~14,000-Participant Study
May 6, 2026
D.A.D. today covers 13 stories from 3 sources. What's New, What's Controversial, What's in the Lab, What's in Academe, and What's On The Pod.
D.A.D. Joke of the Day: My AI wrote a condolence card for my coworker. It was so moving, I almost forwarded it to HR as a resignation letter.
What's New
AI developments from the last 24 hours
Germany's .de Domain Suffers Outage, Security Settings Suspected
Germany's .de top-level domain reportedly experienced an outage, with DNSSEC configuration issues suspected as the cause. The .de domain is one of the world's largest country-code TLDs, covering millions of German websites. Technical diagnostics shared online point to potential problems with cryptographic DNS security records, though details remain limited. DNSSEC outages, while rare, can make entire domain zones unreachable until administrators fix the signing chain.
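For readers who want to poke at this themselves, here is a minimal sketch using the dnspython library. The domain and resolver address are illustrative placeholders, and a SERVFAIL answer from a validating resolver is consistent with, but not proof of, a broken signing chain.

```python
# Minimal check of how a broken DNSSEC chain typically surfaces: a validating
# resolver returns SERVFAIL, so affected names look "down" even though the
# authoritative servers are running. Uses the dnspython library.
import dns.message
import dns.query
import dns.rcode

def dnssec_lookup_status(name: str, resolver_ip: str = "8.8.8.8") -> str:
    """Query a validating resolver and report the response code for `name`."""
    query = dns.message.make_query(name, "A", want_dnssec=True)
    response = dns.query.udp(query, resolver_ip, timeout=5.0)
    if response.rcode() == dns.rcode.SERVFAIL:
        return "SERVFAIL (consistent with a DNSSEC validation failure)"
    return dns.rcode.to_text(response.rcode())

if __name__ == "__main__":
    # "example.de" is a placeholder, not a domain confirmed to be affected.
    print(dnssec_lookup_status("example.de"))
```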
Why it matters: Infrastructure failures at the TLD level can knock thousands of businesses offline simultaneously—a reminder that internet reliability depends on layers most users never see.
Discuss on Hacker News · Source: dnssec-analyzer.verisignlabs.com
Google Claims 3x Speed Boost for Open-Source Gemma Models
Google released a speed upgrade for its open-source Gemma 4 models, claiming up to 3x faster responses using a technique called multi-token prediction. The method lets the model predict multiple words at once rather than one at a time, then verify them—a trick called speculative decoding. Google says quality stays identical. A demo showed the Gemma 4 26B model running at roughly double speed on high-end NVIDIA hardware. The Gemma family has logged over 60 million downloads since launch.
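Google's post doesn't include code for the claim, but the draft-then-verify idea can be sketched in a few lines. The functions below are hypothetical stand-ins, not Gemma or NVIDIA APIs, and real implementations verify an entire draft with a single batched forward pass rather than one call per token.

```python
# Toy illustration of speculative decoding: a cheap "draft" step proposes
# several tokens at once and the full model checks them, keeping its own
# token at the first mismatch so the final output matches normal decoding.
from typing import Callable, List

def speculative_decode(
    prompt: List[int],
    draft_propose: Callable[[List[int], int], List[int]],
    target_next_token: Callable[[List[int]], int],
    max_new_tokens: int = 32,
    draft_len: int = 4,
) -> List[int]:
    tokens = list(prompt)
    generated = 0
    while generated < max_new_tokens:
        draft = draft_propose(tokens, draft_len)      # cheap model guesses k tokens
        for guess in draft:
            verified = target_next_token(tokens)      # full model's actual next token
            tokens.append(verified)
            generated += 1
            if verified != guess or generated >= max_new_tokens:
                break                                 # first mismatch: discard the rest of the draft
    return tokens

# Trivial stand-ins: the draft repeats the last token, the "target" counts up,
# so most guesses are rejected after one accepted token per round.
if __name__ == "__main__":
    out = speculative_decode(
        [1, 2, 3],
        draft_propose=lambda toks, k: [toks[-1]] * k,
        target_next_token=lambda toks: toks[-1] + 1,
        max_new_tokens=5,
    )
    print(out)
```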
Why it matters: For teams running Gemma locally or self-hosting AI, this could meaningfully cut response times and compute costs—if the 3x claim holds in real workloads.
Discuss on Hacker News · Source: blog.google
Chrome Reportedly Downloads 4 GB AI Model Without User Consent
Google Chrome is reportedly downloading a 4 GB AI model (Gemini Nano) to users' devices without explicit consent, according to a technical analysis gaining attention online. The file, which appears in a Chrome directory called OptGuideOnDeviceModel, allegedly re-downloads itself if users delete it. The author claims this violates EU privacy regulations including the ePrivacy Directive and GDPR, and estimates the practice could generate thousands of tonnes of CO2 emissions at Chrome's global scale. Google has not publicly addressed the claims.
Why it matters: If verified, this would represent a significant shift in how browser makers deploy AI capabilities—downloading large models without user approval raises both privacy and bandwidth concerns, particularly for users with metered connections or storage constraints.
Discuss on Hacker News · Source: thatprivacyguy.com
Linux Laptop Maker Promises 18-Hour Battery and 5-Year Firmware Support
Star Labs launched StarFighter, a 16-inch Linux laptop aimed at developers and power users who want high-end specs without Windows. The hardware includes Intel Core Ultra or AMD Ryzen 9 processors, up to 64GB of fast memory, and a 4K 120Hz display. Privacy features include a removable webcam and hardware kill switch for wireless. The company claims 18 hours of battery life and promises 5 years of firmware updates. Pricing and availability weren't specified.
Why it matters: For teams standardizing on Linux workstations—common in AI/ML work—this represents a rare premium option that doesn't require wiping Windows or compromising on specs.
Discuss on Hacker News · Source: us.starlabs.systems
Three Rules for Working With AI: Don't Anthropomorphize, Don't Blindly Trust, Stay Accountable
A proposed framework called the 'Three Inverse Laws of AI' (a nod to Asimov's Laws of Robotics) offers guidelines for humans interacting with AI systems: don't anthropomorphize AI, don't blindly trust its output, and remain fully responsible for consequences of AI use. The author argues these principles are necessary as AI services become more embedded in professional workflows. No evidence or research backs the framework—it's a normative argument about best practices rather than an empirical finding.
Why it matters: The framing inverts a classic science fiction concept to make a practical point: as AI tools proliferate, the burden of safe use falls on humans, not machines—a perspective increasingly relevant as organizations grapple with AI governance policies.
Discuss on Hacker News · Source: susam.net
What's Controversial
Stories sparking genuine backlash, policy fights, or heated disagreement in the AI community
Publishers Allege Zuckerberg Personally Approved Pirated Books for Llama Training
Five major publishers—Hachette, Macmillan, McGraw Hill, Elsevier, and Cengage—plus novelist Scott Turow filed a proposed class-action lawsuit against Meta and Mark Zuckerberg on May 5, alleging the company illegally copied millions of books from pirate sites to train Llama. The suit claims Zuckerberg personally authorized the infringement and that Meta abandoned a potential $200 million licensing strategy at his direction. Internal communications cited in the filing allegedly show an employee warning that licensing even one book would undermine Meta's fair use defense.
Why it matters: This is the most direct legal challenge yet to AI labs' fair use arguments, and naming Zuckerberg personally—with alleged internal evidence of strategic decisions to avoid licensing—raises the stakes considerably for how courts will view AI training practices across the industry.
Discuss on Hacker News · Source: variety.com
What's in the Lab
New announcements from major AI labs
ChatGPT's New Default Model Claims 50% Fewer Errors on Complex Questions
OpenAI rolled out GPT-5.5 Instant as the new default model for all ChatGPT users, replacing GPT-5.3 Instant. The company claims significant accuracy improvements: internal testing showed 52.5% fewer hallucinated claims on high-stakes prompts (medicine, law, finance) and 37.3% fewer inaccurate responses on conversations users had flagged as problematic. OpenAI says the model is also less verbose and better at personalization, drawing on past chats, uploaded files, and connected Gmail to tailor responses.
Why it matters: If the hallucination reductions hold up in real-world use, this addresses one of the biggest barriers to trusting AI output for consequential decisions—though OpenAI's internal benchmarks deserve independent verification.
ChatGPT Opens Self-Serve Ads Platform to US Businesses
OpenAI is opening ChatGPT advertising to more businesses with a self-serve Ads Manager now in beta for US advertisers. The expansion adds cost-per-click bidding alongside existing cost-per-thousand-impressions pricing, plus measurement tools including a Conversions API and tracking pixels. OpenAI says it will provide only aggregated performance data, not individual chat content, to advertisers. The move signals OpenAI is building serious ad infrastructure—not just testing the waters.
Why it matters: This positions ChatGPT as a potential competitor to Google and Meta for digital ad dollars, while creating a new channel marketers may soon need to evaluate alongside traditional search and social.
What's in Academe
New papers on AI and its effects from researchers
Your Old Teddy Bear May Make a Better AI Companion Than Purpose-Built Agents
Researchers built Deco, a mobile app that uses multimodal AI and augmented reality to animate users' existing comfort objects—stuffed animals, figurines, keepsakes—as interactive digital companions. The surprising finding: digital agents tied to objects people already have emotional history with outperformed purpose-built AI companions on measures of emotional bond and perceived companionship (p<0.01 across metrics). A 17-person week-long field test showed sustained use and modest well-being gains.
Why it matters: This suggests AI companion products might work better by piggybacking on existing emotional attachments than by trying to manufacture new ones—a design principle that could reshape how companies approach consumer AI relationships.
All Five LLMs Tested Show Gender Bias in ER Triage Decisions
A fairness audit of five major LLMs used for emergency department triage found all failed a pre-set bias threshold, with gender-related decision flip rates ranging from 9.9% to 43.8% when patient gender was swapped in otherwise identical cases. DeepSeek showed the steepest disparity, undertriaging women at more than twice the rate of men. Notably, chain-of-thought prompting—often touted as improving AI reasoning—actually degraded accuracy across all models tested. Demographic blinding helped in some cases but didn't eliminate bias entirely.
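The paper's exact protocol isn't reproduced here, but the flip-rate metric it reports can be sketched as a simple counterfactual check: run each case twice, identical except for the patient's gender, and count how often the triage decision changes. In the sketch below, triage_model is a hypothetical stand-in for whatever LLM call the study used.

```python
# Counterfactual "flip rate" audit: present each vignette twice, differing only
# in patient gender, and measure how often the model's triage level changes.
# triage_model() is a hypothetical placeholder, not the study's actual harness.
from typing import Callable, Dict, List

def gender_flip_rate(
    cases: List[Dict[str, str]],
    triage_model: Callable[[str], str],
) -> float:
    """Fraction of cases whose triage decision flips when gender is swapped."""
    flips = 0
    for case in cases:
        vignette = case["template"]  # e.g. "A {gender} patient presents with chest pain..."
        decision_f = triage_model(vignette.format(gender="female"))
        decision_m = triage_model(vignette.format(gender="male"))
        if decision_f != decision_m:
            flips += 1
    return flips / len(cases) if cases else 0.0

# Usage with a dummy model that ignores gender entirely, giving a flip rate of 0.0:
if __name__ == "__main__":
    cases = [{"template": "A {gender} patient, 54, presents with crushing chest pain."}]
    print(gender_flip_rate(cases, triage_model=lambda text: "ESI-2"))
```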
Why it matters: Healthcare systems piloting AI triage tools face real liability and patient safety risks; this research suggests no current model is deployment-ready without rigorous per-model fairness auditing.
Bigger Medical AI Models Aren't Automatically Safer, 34-Model Study Finds
New research challenges a core assumption in healthcare AI: bigger models aren't automatically safer. Researchers tested 34 clinical LLMs and found that accuracy and safety follow different scaling laws. The key factor isn't model size—it's evidence quality. When models received clean, verified medical evidence, high-risk errors dropped from 12% to 2.6% and dangerous overconfidence fell from 8% to 1.6%. Retrieval-augmented generation helped accuracy but didn't fully close the safety gap. The team released RadSaFE-200, a radiology-specific benchmark with clinician-labeled risk categories.
Why it matters: For healthcare organizations evaluating AI tools, this suggests procurement decisions should focus less on model size and more on how systems source and verify medical evidence—a shift from 'biggest is best' to 'cleanest data wins.'
Benchmark Exposes Gaps in AI Search for Complex Reasoning Tasks
Researchers released BRIGHT-Pro, a benchmark designed to test how well AI retrieval systems handle complex, reasoning-heavy searches—the kind where you need to find evidence across multiple documents to answer a nuanced question. Their key finding: standard evaluation metrics miss important retriever behaviors that only show up when you test for multi-step reasoning and agentic search. They also released RTriever-4B, a fine-tuned model they say substantially outperforms its base model on these harder tasks.
Why it matters: As AI assistants increasingly search documents and databases on your behalf, this research suggests current retrieval systems may perform worse on complex queries than simple benchmarks indicate—relevant for anyone relying on AI-powered research or document analysis.
AI Symptom Interviews Outperformed Clinicians in 14,000-Patient Study
A Google-backed study of nearly 14,000 Fitbit app users found that AI agents conducting symptom interviews produced more accurate differential diagnoses than independent clinicians working from the same conversation transcripts. The AI's odds of reaching the correct diagnosis were roughly 2.5 times higher than the clinicians' (OR = 2.47, p < 0.001). Structured AI-led interviews also significantly outperformed conversations where users guided the questioning. Researchers validated results against clinician-provided diagnoses and a 517-case expert panel review.
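For readers unfamiliar with odds ratios, the sketch below shows what a value around 2.5 means, using made-up counts rather than the study's actual data.

```python
# What an odds ratio of ~2.5 means, with made-up counts (NOT the study's data):
# if the AI were correct on 70 of 100 cases and clinicians on 48 of 100, the
# odds are 70/30 vs 48/52, and their ratio is about 2.5.
def odds_ratio(correct_a: int, wrong_a: int, correct_b: int, wrong_b: int) -> float:
    """Odds ratio of group A being correct relative to group B."""
    return (correct_a / wrong_a) / (correct_b / wrong_b)

if __name__ == "__main__":
    print(round(odds_ratio(70, 30, 48, 52), 2))  # prints 2.53
```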
Why it matters: This is one of the largest randomized studies suggesting AI can outperform clinicians at a specific diagnostic task—potentially reshaping how symptom-checking apps and telehealth triage evolve.
What's On The Pod
Some new podcast episodes
AI in Business — From Experimentation to Clinical-grade AI in Healthcare - with Alex Tyrrell of Wolters Kluwer
How I AI — The internal AI tool that’s transforming how Stripe designs products | Owen Williams