AI Research Trends, H1 2026: What 170,927 Papers Reveal

We analyzed 170,927 AI research papers posted to arXiv from the beginning of 2025 through June 26th across its four main machine-learning categories: cs.CL (computation and language), cs.CV (computer vision), cs.LG (machine learning), and cs.AI (artificial intelligence). The goal was to find out what is actually changing in AI research right now.

This is also the first edition of our AI Research Pulse, a recurring report from AI Papers Academy tracking how AI research evolves over time.

The method is based on keyword matching against the title and abstract of every paper, using a curated set of topics, model families, and institutions. We split the window into three consecutive half-year periods, H1 2025, H2 2025, and H1 2026. Because the field grew ~25% overall, we track share of papers rather than absolute counts.
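
As a rough illustration, the sketch below shows one way keyword-based share tracking could work. The topic names, keyword lists, and function names are hypothetical placeholders rather than the curated set behind this report, and the actual pipeline may differ in its matching rules.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical topic -> keyword lists (assumption: not the report's curated set).
TOPIC_KEYWORDS = {
    "agentic workflows": ["agentic", "llm agent"],
    "synthetic data": ["synthetic data"],
}


def half_year(d: date) -> str:
    """Map a date to a half-year label such as 'H1 2025'."""
    return f"H{1 if d.month <= 6 else 2} {d.year}"


def match_topics(title: str, abstract: str) -> set[str]:
    """Return every topic whose keywords appear in the title or abstract."""
    text = f"{title} {abstract}".lower()
    return {t for t, kws in TOPIC_KEYWORDS.items() if any(k in text for k in kws)}


def topic_shares(papers):
    """Compute each topic's share of papers per half-year.

    `papers` is an iterable of (posted_date, title, abstract) tuples.
    A paper may count toward several topics, so shares need not sum to 1.
    """
    totals = Counter()            # papers per period
    hits = defaultdict(Counter)   # period -> topic -> matching papers
    for posted, title, abstract in papers:
        period = half_year(posted)
        totals[period] += 1
        for topic in match_topics(title, abstract):
            hits[period][topic] += 1
    return {
        period: {topic: hits[period][topic] / totals[period] for topic in TOPIC_KEYWORDS}
        for period in totals
    }
```

Dividing by the period total is what makes periods of different sizes comparable: a topic that holds its count steady while the field grows will show a falling share.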

Research Acceleration Index

The above chart ranks research topics by growth in their share of papers, showing which ones are capturing a disproportionately larger slice of the field. Each paper can count toward multiple topics.

The pattern jumps out immediately: many of the top entries are agent-adjacent topics that barely existed as distinct research areas eighteen months ago.

Interestingly, Reasoning & CoT ranks only #9 by share growth but #1 by raw volume at 11,636 papers. It is so large that even a modest share gain represents thousands of papers. Similarly, Alignment & AI Safety, at #10, still grew its share by 33% on an already-large base of 8,121 papers.

While the Research Acceleration Index tracks what’s growing fastest, below are the topics that dominate AI research by sheer volume.

The Largest Topics in AI Research (H1 2026) by volume

The Agent Infrastructure Explosion

Agent-related topic mentions in papers, from the start of 2025 to mid-2026

Focusing on agent-related topics, the above chart shows their growth since 2025 (as percentage change, not raw count). Mentions of agentic workflows climbed from 4,585 to 10,496. That is strong on its own, but the specialized building blocks underneath it are growing two to five times faster. Long-horizon planning, getting a model to pursue a goal across many steps without losing the thread, rocketed from 264 to 1,611 (+510%), making it the fastest-growing topic in the entire dataset.
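
The growth percentages quoted here follow from the mention counts. The sketch below recomputes them; the counts are taken from the text above, while the helper name `pct_growth` is our own for illustration.

```python
def pct_growth(before: int, after: int) -> float:
    """Percent change from `before` to `after`."""
    return (after - before) / before * 100


# Mention counts quoted in the text: (earlier count, later count).
counts = {
    "agentic workflows": (4585, 10496),
    "long-horizon planning": (264, 1611),
}

for topic, (before, after) in counts.items():
    print(f"{topic}: {pct_growth(before, after):+.0f}%")
```

Agentic workflows comes out at roughly +129% and long-horizon planning at roughly +510%, so long-horizon planning is growing about four times faster in percentage terms.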

The agentic workflows field is maturing, and the growth of these subdomains shows the direction of its development. The field has moved from “can we build agents?” to “how do we make agents plan, reason, use tools, and judge their own outputs?”

The Foundation Model Power Shift

Qwen overtakes Llama as the most referenced open model

The center of gravity in open-weight AI is shifting. Alibaba’s Qwen ecosystem is becoming the default starting point for researchers who need an accessible, capable base model, a position Meta’s Llama held unchallenged a year ago.

Qwen nearly doubled its footprint, from 752 to 1,489 mentions (+98%), while Llama grew just +14% (1,085 → 1,232). For the first time, a Chinese-developed open model family is the most referenced in Western AI research. Qwen didn’t merely catch Llama; it surpassed it convincingly and now leads by more than 250 mentions per half-year.

The chart below shows the full landscape of model families and their growth in paper mentions. Google’s two-track strategy is gaining traction on both ends: its small open model Gemma posted the fastest percentage growth of any family at +147%, while Gemini grew +95%. Anthropic’s Claude was the fastest-growing proprietary model, up +130%.

Which Areas Showed Diminishing Growth?

Knowing what the field is abandoning is as valuable as knowing what it is starting. The declining topics share a pattern: they were “next big thing” narratives from 2023–2024 that either got absorbed into broader categories or hit diminishing research returns. The bottom of the above chart shows the areas with negative or very small growth.

Synthetic data didn’t just lose share (−24%), it shrank in absolute terms (1,564 → 1,475), a possible sign of quality concerns and the risk of models trained on model output drifting from reality.
State space models (Mamba) lost 20% share on flat raw numbers (495 → 492). However, SSM ideas are being absorbed into hybrid architectures.
Diffusion models lost 15% share despite ticking up to 1,774, as the energy migrates from images toward video generation.
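
The gap between raw counts and share can be checked with a little arithmetic. This sketch assumes the approximate ~25% overall growth figure from the methodology applies to the same pair of periods as the counts quoted above, which the report does not state explicitly; the function name `share_change` is ours.

```python
def share_change(before: int, after: int, field_growth: float) -> float:
    """Percent change in a topic's share of papers.

    share = topic_count / total_papers, so if the total grows by
    `field_growth` (0.25 means +25%), the share changes by
    (after / before) / (1 + field_growth) - 1.
    """
    return ((after / before) / (1 + field_growth) - 1) * 100


FIELD_GROWTH = 0.25  # assumption: the rounded ~25% figure from the methodology

print(f"synthetic data: {share_change(1564, 1475, FIELD_GROWTH):+.1f}%")
print(f"state space models: {share_change(495, 492, FIELD_GROWTH):+.1f}%")
```

With the rounded 25% input, both results land within a point of the reported −24% and −20%. That is why a topic with nearly flat raw numbers can still lose a fifth of its share.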

The Closed-Source Horizon

The Proprietary Model Shift: GPT holds flat as Claude and Gemini close in

The above chart shows how dominance is shifting in the proprietary model landscape. OpenAI’s GPT family still holds the highest absolute volume but has effectively plateaued. Meanwhile, the competition is closing the gap, led by Anthropic’s Claude (+130%) and Google’s Gemini (+95%). The data suggests that the era of a single dominant model family is behind us.