LLMs Fail Biomedical Code Test: New Agent Hits 74% Accuracy

Roman Grant | 2026-03-16
Biomedical researchers hoping to lean on large language models for data analysis face a stark reality: these tools falter badly on real-world coding tasks. A new benchmark from University of Illinois researchers reveals that even top proprietary and open-source LLMs score below 40% accuracy when generating code for biomedical data science, raising alarms about blindly trusting AI outputs in high-stakes research.

The study, published January 22, 2026, in Nature Biomedical Engineering, introduces BioDSBench, a rigorous test set of 293 coding tasks drawn from 39 peer-reviewed studies spanning seven areas: biomarkers, integrative analysis, genomic profiling, molecular characterization, therapeutic response, translational research, and pan-cancer analysis. Tasks range from plotting survival curves to building integrative multi-omics visualizations, all using real anonymized patient data from cBioPortal and UCSC Xena.
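To make "plotting survival curves" concrete: before any plot is drawn, such a task usually requires a Kaplan-Meier estimate, in which survival at each event time equals the previous estimate multiplied by (1 − events / patients still at risk). The Python sketch below is a generic, from-scratch illustration of that calculation, not code taken from BioDSBench; the function name and the toy cohort are hypothetical, and real analyses would normally rely on an established survival-analysis library.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival_probability) pairs.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    # Number of observed events at each time point.
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Patients still under observation just before time t.
        at_risk = sum(1 for ti in times if ti >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


# Toy cohort (hypothetical): five patients, two of them censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

For this toy cohort the estimates are about 0.8, 0.6, and 0.3 at times 2, 3, and 5; the censored patients at times 3 and 8 leave the risk set without lowering the curve.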

“Large language models (LLMs) can generate impressive data visualizations from simple requests, yet their accuracy remains underexplored,” write the authors, a team led by Zifeng Wang and Benjamin Danek of Keiji AI and the University of Illinois Urbana-Champaign, with Jimeng Sun as corresponding author.

Benchmark Exposes Cracks in AI Foundations

The team tested eight proprietary models (among them GPT-4o, Claude 3.5 Sonnet, Gemini 1.5, and OpenAI o3-mini) and eight open-source ones (including Llama 3, Code Llama, Qwen2.5-Coder, and DeepSeek-R1) on BioDSBench using chain-of-thought prompting and retrieval-augmented generation. None exceeded 40% overall accuracy. “This low accuracy raises serious concerns about the risk of propagating incorrect scientific findings when blindly relying on AI-generated analyses,” the paper warns.

Proprietary models slightly outperformed their open-source counterparts, but both struggled with biomedical specifics such as handling genomic datasets or therapeutic response metrics. The benchmark, hosted on Hugging Face at https://huggingface.co/datasets/zifeng-ai/BioDSBench, includes reference solutions and test cases for reproducibility.
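Reference test cases are what turn "accuracy" into a number: a generated solution counts as correct only if it runs and passes the task's checks. The Python sketch below illustrates that scoring idea, assuming each task carries assertion-style test code. The `Task` record, its fields, and the toy tasks are hypothetical and are not BioDSBench's actual schema, and a real harness would sandbox untrusted model output instead of calling `exec` directly.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task record: an instruction plus assertion-based tests."""
    task_id: str
    prompt: str
    test_code: str


def passes(candidate_code: str, test_code: str) -> bool:
    """Run generated code, then the task's tests, in one fresh namespace.

    NOTE: exec() runs arbitrary code; a real harness would sandbox it.
    """
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)
        exec(test_code, namespace)
    except Exception:
        # A crash, syntax error, or failed assertion all count as wrong.
        return False
    return True


def accuracy(tasks: list, generations: dict) -> float:
    """Fraction of tasks whose generated code passes every test."""
    if not tasks:
        return 0.0
    solved = sum(passes(generations.get(t.task_id, ""), t.test_code) for t in tasks)
    return solved / len(tasks)


# Toy example (hypothetical tasks and model outputs).
tasks = [
    Task("t1", "Compute the median patient age", "assert median_age == 61"),
    Task("t2", "Count samples with a TP53 mutation", "assert n_mutated == 3"),
]
generations = {
    "t1": "ages = [55, 61, 70]\nmedian_age = sorted(ages)[1]",
    "t2": "n_mutated = 2",  # wrong answer, so this task fails
}
print(accuracy(tasks, generations))  # 0.5
```

Treating any exception as a failure keeps the metric strict: code that crashes, references a missing variable, or produces the wrong value all score zero for that task.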

Other recent work suggests the problem is not isolated: an npj Digital Medicine paper on medical LLMs notes persistent gaps in specialized knowledge, and a Scientific Reports study highlights scalability issues in oncology tasks.

Iterative Agents Rescue Reliability

To close the gap, the team built an AI agent that drafts and iteratively refines an analysis plan before writing any code, drawing on ReAct-style interleaving of reasoning and actions and on Self-Refine feedback loops. This raised accuracy to 74%, nearly doubling baseline performance. The agent breaks complex tasks into four steps: plan, code, test, and refine.
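The plan, code, test, refine cycle can be sketched as a small loop: draft a plan, generate code from it, run the tests, and on failure feed the error back to revise the plan. The Python sketch below is an illustration under stated assumptions, not the authors' agent (their implementation lives in the project's GitHub repository). It covers only the Self-Refine style feedback loop, not ReAct tool use; `ScriptedLLM` is a hypothetical stand-in for a real model, and the `PLAN:`/`CODE:`/`REFINE:` prompt prefixes are invented for the example.

```python
from typing import Callable, Optional, Tuple

# Hypothetical model interface: prompt text in, response text out.
LLM = Callable[[str], str]


def run_tests(code: str, test_code: str) -> Optional[str]:
    """Return None if the tests pass, otherwise a short error message."""
    namespace: dict = {}
    try:
        exec(code, namespace)
        exec(test_code, namespace)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def plan_code_test_refine(task: str, test_code: str, llm: LLM,
                          max_attempts: int = 3) -> Tuple[Optional[str], int]:
    """Plan first, then loop: code -> test -> refine the plan on failure."""
    plan = llm(f"PLAN: {task}")
    for attempt in range(1, max_attempts + 1):
        code = llm(f"CODE: {plan}")
        error = run_tests(code, test_code)
        if error is None:
            return code, attempt
        # Self-refine step: feed the failure back to revise the plan.
        plan = llm(f"REFINE: {plan}\nERROR: {error}")
    return None, max_attempts


class ScriptedLLM:
    """Stand-in for a real model that returns canned replies."""

    def __init__(self) -> None:
        self.prompts: list = []
        self.refined = False

    def __call__(self, prompt: str) -> str:
        self.prompts.append(prompt)
        if prompt.startswith("PLAN:"):
            return "1. load values 2. compute the mean"
        if prompt.startswith("REFINE:"):
            self.refined = True
            return "1. load values 2. divide the sum by the number of values"
        # CODE prompts: the first draft has an off-by-one bug.
        if self.refined:
            return "vals = [2, 4, 9]\nmean = sum(vals) / len(vals)"
        return "vals = [2, 4, 9]\nmean = sum(vals) / (len(vals) + 1)"


llm = ScriptedLLM()
code, attempts = plan_code_test_refine("Mean expression level", "assert mean == 5", llm)
print(attempts)  # 2
```

Here the first draft fails its test, the error is fed back into a revised plan, and the second draft passes. Revising the plan rather than only patching the code loosely mirrors the article's description of refining the analysis plan before coding.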

The agent's code is available on GitHub at https://github.com/RyanWangZf/BioDSBench. In a user study, five medical researchers used a new human-AI platform to co-develop analysis plans and execute them, completing over 80% of the analysis code for three real studies. The platform, accessible via https://keiji.ai/contact.html, integrates planning, coding, and execution in one environment; a demo is available at https://www.youtube.com/watch?v=c5ZJsFXQ_B0.

“Benchmarking eight proprietary and eight open-source LLMs under various prompting strategies reveals an overall accuracy below 40%,” according to the Nature Biomedical Engineering paper. Figures in the study detail model comparisons (Fig. 2) and adaptation strategies (Fig. 3).

Real-World Ripples and Broader Warnings

On X, practitioners such as Stephen Turner have shared the paper as essential reading for data scientists. Broader critiques abound: an npj Digital Medicine paper describes the shaky foundations of large language models built on electronic health records, and an npj Precision Oncology paper scrutinizes oncology applications.

The low scores underscore why biomedical work can’t afford hallucinations: errors in genomic profiling or pan-cancer analysis could mislead drug development. Yet the agentic fix points a way forward, showing that structured planning tames LLM chaos. Ziwei Yang of Kyoto University and Zheng Chen of Osaka University contributed to dataset curation.

Funding came from NSF grants and the Japan Society for the Promotion of Science (JSPS), and the authors declared no competing interests. Chao Yan was among the paper's peer reviewers.

Path Forward for Trustworthy AI Tools

The platform shifts the paradigm from relying on an LLM alone to working with a collaborative copilot. Researchers can iteratively refine plans with the AI and execute them in an integrated environment, cutting manual coding time. User study results (Fig. 5) show practical gains, with over 80% task completion.

The evaluation is limited to 293 tasks and five users, but because the tasks draw on real patient data from cBioPortal and UCSC Xena, they reflect real-world analyses. As a paper in npj Artificial Intelligence on LLMs in science notes, deep integration with human goals, backed by clear metrics, is key.

The acknowledgments highlight the supervisory role of Jimeng Sun, the corresponding author. For practitioners, BioDSBench sets a new standard: test your copilot before trusting it.
