Poetiq’s Lean Squad Outsmarts AI Giants on Reasoning Frontier

Elena Brooks | 2026-04-03
In a striking demonstration of ingenuity over brute force, Poetiq, a six-person startup founded by former Google DeepMind researchers, has topped the ARC-AGI-2 benchmark, surpassing efforts from Google and Anthropic while spending just $40,000 on hardware. The company emerged from stealth with a $45.8 million seed round, signaling investor confidence in its meta-system approach that enhances existing large language models without retraining.

Launched in June 2025 by co-CEOs Shumeet Baluja and Ian Fischer, Poetiq leverages recursive self-improvement to generate specialized “expert agents” for complex tasks. Clients supply a problem and a few hundred examples, far fewer than the thousands needed for traditional fine-tuning. This layer sits atop models like OpenAI’s GPT, Google’s Gemini, Anthropic’s Claude, and Meta’s Llama, optimizing for accuracy and efficiency. Puck News detailed how the team achieved one of the highest scores on ARC-AGI-2 in six months.

ARC-AGI-2, the successor to the ARC benchmark that François Chollet introduced in 2019, tests abstract reasoning and generalization, skills where LLMs traditionally falter. Poetiq’s system hit 54% accuracy on the semi-private set using Gemini 3 Pro, beating Gemini 3 Deep Think’s 45% at less than half the cost per task ($30.57 vs. $77.16), as verified by ARC Prize. Later, integrating GPT-5.2 X-High pushed public eval accuracy to 75%, exceeding prior records by 16 points at $8 per task. ARC Prize confirmed that these refinements redraw the performance frontier.

Recursive Self-Improvement Unlocks Hidden Potential

“LLMs are impressive databases that encode a vast amount of humanity’s collective knowledge. They are simply not the best tools for deep reasoning,” Baluja told Pulse 2.0. Poetiq’s meta-system runs an iterative loop: generate solutions, critique them, refine, and verify. It audits its own progress to decide when to halt, averaging fewer than two requests per ARC problem. This avoids wasted compute, in contrast to the heavy demands of reinforcement learning.
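The generate, critique, refine, and verify loop described above can be illustrated with a toy sketch. This is not Poetiq's code: the names `refine_loop`, `verify`, and `toy_propose` are hypothetical, and the "model" here is a trivial stand-in function rather than an LLM call. It only shows the control flow of halting as soon as a candidate passes verification.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Example = Tuple[int, int]  # (input, expected output) pairs for a toy task
Candidate = Callable[[int], int]


@dataclass
class Result:
    candidate: Optional[Candidate]
    solved: bool
    requests: int  # number of "model" calls spent


def verify(candidate: Candidate, examples: List[Example]) -> List[Example]:
    """Return the examples the candidate gets wrong (empty list means verified)."""
    return [(x, y) for x, y in examples if candidate(x) != y]


def refine_loop(
    propose: Callable[[List[Example]], Candidate],
    examples: List[Example],
    max_requests: int = 5,
) -> Result:
    """Generate, verify, critique, refine; halt at the first verified candidate."""
    feedback: List[Example] = []  # critique: failures from the previous attempt
    candidate: Optional[Candidate] = None
    for requests in range(1, max_requests + 1):
        candidate = propose(feedback)            # generate, or refine given feedback
        feedback = verify(candidate, examples)   # verify and collect the critique
        if not feedback:                         # self-audit: stop once verified
            return Result(candidate, True, requests)
    return Result(candidate, False, max_requests)


def toy_propose(feedback: List[Example]) -> Candidate:
    # Hypothetical stand-in for an LLM call: the first guess is the identity
    # function; after a failure it infers a multiplier from the first miss.
    if not feedback:
        return lambda x: x
    x, y = feedback[0]
    k = y // x
    return lambda v: k * v


examples = [(1, 3), (2, 6), (5, 15)]
result = refine_loop(toy_propose, examples)
print(result.solved, result.requests)
```

In this toy run the first guess fails, the failures are fed back as critique, and the second guess passes, so the loop stops after two requests instead of spending its full budget.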

The open-sourced GitHub repo allows reproduction of Poetiq’s configs, showing pure Gemini-based setups. On ARC-AGI-1 public evals, it outperformed baselines across cost-performance curves. ARC Prize noted similar gains on Claude Opus 4.5, though at higher cost. Poetiq’s adaptability shone after the GPT-5.2 release, when the team integrated the model within hours and set new highs. OpenAI’s Greg Brockman tweeted recognition of the system exceeding human baselines, per PR Newswire.

Founded after Baluja and Fischer’s decade at DeepMind, Poetiq has a team with 53 years of combined experience. Garry Tan, Y Combinator CEO, praised the feat: “Getting to the top of ARC-AGI is no small feat, and recursive improvement a powerful milestone.” A NeurIPS talk with Fischer explored ensembles, voting, and system optimization without access to model weights. Y Combinator’s X post highlighted this.

Massive Seed Backs Frugal Innovation

The $45.8 million seed, co-led by FYRFLY Venture Partners and Surface Ventures, included Y Combinator, 468 Capital, Operator Collective, Hico Ventures, and Neuron Venture Partners. “That Poetiq managed to top ARC-AGI within six months of launching is remarkable,” said Philipp Stauffer of FYRFLY. Gyan Kapur of Surface added, “Poetiq doesn’t need to outcompete frontier models… it enhances any combination of LLMs.” VentureBurn covered the round.

Allison Barr Allen of Operator Collective echoed the excitement in an X post celebrating the partnership: “They have raised a $45.8M seed round after beating industry-leading benchmarks with a small team of 6.” Poetiq’s Miami headquarters and focus on business and productivity software, per PitchBook, position it for enterprise customers. Unlike GPU-heavy rivals, Poetiq ran up a hardware bill of just $40K, underscoring its efficiency.

Investors see enterprise potential in reasoning boosts for claims triage, fraud detection, and support. MIT’s Project NANDA found 95% of GenAI pilots lack P&L impact due to reliability issues—Poetiq targets this gap. ARC Prize’s 2025 report emphasized refinements like Poetiq’s as key, predicting integration into commercial APIs.

Benchmark Breakthrough Signals Paradigm Shift

From sub-5% scores in early 2025 to Poetiq’s 54%, progress on ARC-AGI-2 has accelerated. Humans average 60%, and Poetiq neared or passed that mark on subsets. Reddit threads on r/singularity hailed the result as breaking 50%, though some commenters raised the risk of benchmark overfitting. ARC Prize stressed that private sets guard against this and verified Poetiq’s state-of-the-art result on the semi-private set.

Poetiq’s blog detailed Pareto frontier shifts on both ARC-AGI-1 and ARC-AGI-2, using diverse tasks for self-improvement. The system is built to handle noise and uncertainty in reasoning. Poetiq’s site confirmed the verified results and teased more benchmarks. The Rundown called it a shift toward application-layer gains over scale.

Beyond ARC, Poetiq eyes retrieval and reasoning tasks. Harj Taggar tweeted: “Poetiq just crushed the ARC A.G.I. benchmark, beating Anthropic and Google, with only six people.” Techmeme amplified Puck’s scoop on the frugal win. As X buzz grows, Poetiq proves small teams can lead via smart orchestration.

Enterprise Edge and AGI Path Ahead

For businesses, Poetiq slashes costs: it runs at less than half the per-task cost of Gemini 3 Deep Think and can be integrated with any LLM stack. It automates prompt engineering, a NeurIPS focus. ARC Prize’s technical report lauded domain-specific harnesses that evolve into general-purpose ones via DSPy-like methods. Poetiq’s model-agnostic design future-proofs it against the race among labs.

“We used recursive self-improvement to produce specialized agents in a matter of hours,” Baluja noted, contrasting this with the slowness of RL. Grishin Robotics highlighted that enterprise AI efforts often fail at integration, a gap Poetiq aims to bridge. With the new funding, the company plans to expand toward AI product teams and researchers who need reliability.

Critics question whether the results transfer beyond ARC, but Poetiq’s multi-benchmark work and open code invite scrutiny. As Tan said, “You don’t always need a bigger model.” Poetiq’s rise challenges the scale-alone dogma, betting on meta-systems as a path to safe superintelligence, the bold claim in the company’s bio.
