awesome-adaptation-agentic-ai


Awesome Adaptation of Agentic AI


Skill by ara.so — AI Agent Skills collection.


Overview


This repository is a curated research collection accompanying the paper "Adaptation of Agentic AI" (arXiv:2512.16301). It systematically organizes papers on how AI agents adapt their behavior, particularly focusing on:

Agent Adaptation: Methods to improve agent decision-making through tool execution signals (A1) or output signals (A2)
Tool Adaptation: Approaches to optimize tools either independently (T1) or with agent supervision (T2)

The collection categorizes 40+ research papers by adaptation strategy, training method (RL, SFT, DPO), task domain, and implementation details.
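The four labels can be read as two questions: is the agent or the tool being adapted, and where does the training signal come from. A minimal sketch of that reading follows; the enum, the function name, and its boolean parameters are illustrative assumptions, not definitions from the paper.

```python
from enum import Enum


class Adaptation(Enum):
    """The four categories named in the overview (descriptions paraphrased)."""
    A1 = "agent adapted from tool execution signals"
    A2 = "agent adapted from agent output signals"
    T1 = "tool adapted independently of the agent"
    T2 = "tool adapted under agent supervision"


def classify(adapts_agent: bool,
             uses_tool_execution_signal: bool = False,
             agent_supervised: bool = False) -> Adaptation:
    """Hypothetical helper: map two yes/no questions onto a category."""
    if adapts_agent:
        # Agent adaptation splits on the source of the learning signal.
        return Adaptation.A1 if uses_tool_execution_signal else Adaptation.A2
    # Tool adaptation splits on whether the agent supervises the tool.
    return Adaptation.T2 if agent_supervised else Adaptation.T1


category = classify(adapts_agent=True, uses_tool_execution_signal=True)
```

Here `category` is `Adaptation.A1`, which matches a coding agent trained on code executor feedback.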


Repository Structure


awesome-adaptation-of-agentic-ai/
├── README.md                 # Main paper collection with categorized tables
├── images/
│   ├── intro.png            # Overview diagram
│   ├── a1_illustrate.png    # Tool execution signaled adaptation
│   ├── a1_timeline.png      # Development timeline
│   ├── paper_icon.png       # Paper link icons
│   └── code_icon.png        # Code link icons
└── LICENSE


Installation & Usage


Cloning the Repository

bash

# Clone the repository
git clone https://github.com/pat-jj/Awesome-Adaptation-of-Agentic-AI.git
cd Awesome-Adaptation-of-Agentic-AI

# View the main README
cat README.md

Browsing Papers


The repository organizes papers into four main categories:

A1: Tool Execution Signaled Agent Adaptation
- RL-based methods (GRPO, PPO, AlphaZero-like)
- SFT & DPO methods
A2: Agent Output Signaled Adaptation
- Coming in future updates
T1: Agent-Agnostic Tool Adaptation
- Coming in future updates
T2: Agent-Supervised Tool Adaptation
- Coming in future updates


Key Research Categories


A1: Tool Execution Signaled (RL-Based)


Papers where agents learn from tool execution feedback using reinforcement learning:

Notable Methods:

Orion (2025.11): IR agents with GRPO on LFM2
DeepSeek-R1-Zero (2025.01): Coding agents with code executor feedback
DeepSeek-Prover-V2 (2025.04): Formal theorem proving with Lean compiler
FTRL (2025.08): Multi-step tool-use with GRPO

Common Pattern:

Agent → Tool Call → Execution Feedback → RL Update (GRPO/PPO)
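The "RL Update" step in the pattern above can be made concrete for GRPO, which scores each rollout relative to the other rollouts sampled for the same prompt. A minimal sketch, assuming every rollout in the group already has a scalar reward from the tool (for example 1.0 when the executor accepts the output, 0.0 otherwise). The function name, the epsilon constant, and the use of the population standard deviation are assumptions for illustration; implementations differ on these details.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Hypothetical helper: normalize rewards within one sampled group.

    Each advantage is (reward - group mean) / (group std + eps), so rollouts
    that beat their siblings get a positive weight and the rest a negative one.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts for one prompt; two passed the tool check, two failed.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

When every rollout in a group receives the same reward, all advantages are zero and the group contributes no gradient, which is why reward signals with some variance across rollouts matter for this style of training.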


A1: Tool Execution Signaled (SFT & DPO)


Papers using supervised fine-tuning and direct preference optimization:

Notable Methods:

ToolLLM (2023.07): API planning with real-world APIs
RetPO (2024.02): Information retrieval with DPO
AWL (2024.12): Scientific reasoning with adaptive learning

Common Pattern:

Agent → Tool Call → Execution Trace → Supervised Learning
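For the DPO half of this pattern, execution traces are typically turned into preference pairs before training. A minimal sketch of the standard DPO loss for one pair, assuming the "chosen" trace is one whose tool call succeeded and the "rejected" trace is one that failed (that pairing rule is an assumption here, not a claim about any listed paper). Inputs are summed log-probabilities of each trace under the policy being trained and under a frozen reference model; all names are illustrative.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals softplus(-x); this form avoids overflow.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# The policy already prefers the chosen trace more than the reference does.
good = dpo_loss(-1.0, -5.0, -2.0, -2.0, beta=1.0)
# The policy prefers the rejected trace instead, so the loss is larger.
bad = dpo_loss(-5.0, -1.0, -2.0, -2.0, beta=1.0)
```

The loss equals log 2 when the policy and the reference agree exactly, falls below it as the policy shifts probability toward the chosen trace, and rises above it in the opposite case.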


Common Use Cases


Finding Papers by Task

Example: Formal Theorem Proving

bash

# Search README for theorem proving papers
grep -i "theorem proving" README.md

Papers include: AlphaProof, BFS-Prover-V2, Goedel-Prover-V2, Leanabell-Prover-V2, DeepSeek-Prover-V1.5/V2

Example: Coding & Code Execution

bash

# Find coding-related papers (-E makes | act as alternation)
grep -iE "coding|code executor" README.md

Papers include: olmOCR2, R1-Code-Interpreter, Code-R1, DeepSeek-R1-Zero, RLEF, LeDex, CYCLE, CodeAct

Finding Papers by Method

Example: GRPO (Group Relative Policy Optimization)

bash

# List all GRPO papers
grep "GRPO" README.md

Commonly used in: Tool-N1, DeepSeek-Prover-V2, SQL-R1, Rec-R1, DeepRetrieval, etc.

Example: DPO (Direct Preference Optimization)

bash

# List all DPO papers
grep "DPO" README.md

Used in: AWL, LeReT, TP-LLaMA, RetPO

Finding Papers by Model Backbone

Example: Qwen2.5-based agents

bash

# Find Qwen2.5 implementations
grep "Qwen2.5" README.md

Models include: olmOCR2, ToolExpander, BFS-Prover-V2, WebGen-Agent, Tool-R1, etc.

Accessing Paper Resources


Paper Links


All papers include arXiv or conference links:

markdown

[Paper](https://arxiv.org/abs/2512.16301)


Code Repositories


Many papers provide implementation code:

markdown

[Code](https://github.com/example/repo)


Reading Strategy

python

# Pseudo-code for systematic review
def research_adaptation_strategy(task_domain, method_type):
    """
    Navigate to specific adaptation category

    Args:
        task_domain: e.g., "coding", "theorem proving", "IR"
        method_type: "RL", "SFT", "DPO"

    Returns:
        List of relevant papers with links
    """
    # 1. Go to appropriate section (A1, A2, T1, T2)
    # 2. Filter by method_type (RL-based vs SFT/DPO)
    # 3. Search table for task_domain
    # 4. Check paper links, code availability, model backbones
    pass
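Steps 2 and 3 of the pseudo-code can be approximated with a column-aware filter over the README tables. This is a rough sketch, assuming the tables use the seven-column header shown in the contribution guidelines and that the Tuning cell contains the method name as plain text (for example "GRPO"). The function names and the sample rows (MethodA, MethodB, MethodC) are placeholders invented for illustration, not entries from the repository.

```python
COLUMNS = ["Time", "Method", "Venue", "Task(s)", "Tool(s)",
           "Agent Backbone", "Tuning"]


def parse_rows(markdown_text):
    """Turn seven-column markdown table rows into dicts keyed by COLUMNS."""
    rows = []
    for line in markdown_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != len(COLUMNS) or cells == COLUMNS:
            continue  # not a table row of the expected width, or the header
        if all(set(c) <= set("-: ") for c in cells):
            continue  # separator row such as | --- | --- |
        rows.append(dict(zip(COLUMNS, cells)))
    return rows


def filter_papers(rows, task_domain, tuning):
    """Keep rows whose task and tuning cells contain the given substrings."""
    task, method = task_domain.lower(), tuning.lower()
    return [r for r in rows
            if task in r["Task(s)"].lower() and method in r["Tuning"].lower()]


# Placeholder table; real rows would come from README.md.
sample = """\
| Time | Method | Venue | Task(s) | Tool(s) | Agent Backbone | Tuning |
| --- | --- | --- | --- | --- | --- | --- |
| 2025.XX | MethodA | Venue | Coding | Code Executor | Qwen2.5 | GRPO |
| 2025.XX | MethodB | Venue | Theorem Proving | Lean Compiler | Qwen2.5 | SFT, GRPO |
| 2025.XX | MethodC | Venue | IR | Retriever | LLaMA3 | DPO |
"""
rows = parse_rows(sample)
matches = filter_papers(rows, "theorem proving", "GRPO")
```

Matching on a specific column avoids the false positives a whole-line search produces when a keyword appears in a venue or tool name rather than in the task or tuning cell.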

Citation Format


When using this repository for research:

bibtex

@article{jiang2025adaptation,
  title={Adaptation of Agentic AI},
  author={Jiang, Pengcheng and Lin, Jiacheng and Shi, Zhiyi and Wang, Zifeng and He, Luxi and Wu, Yichen and Zhong, Ming and Song, Peiyang and Zhang, Qizheng and Wang, Heng and others},
  journal={arXiv preprint arXiv:2512.16301},
  year={2025}
}


Contributing


The repository welcomes pull requests for:

New papers on agentic AI adaptation
Updates to existing paper information
Corrections to categorizations
Additional metadata (benchmarks, datasets)

Contribution Pattern:

bash

# 1. Fork the repository
# 2. Add paper to appropriate category in README.md
# 3. Follow existing table format:
# | Time | Method | Venue | Task(s) | Tool(s) | Agent Backbone | Tuning |
# Example entry:
# | 2025.XX | YourMethod | Venue<br>🔗 Paper<br>💻 Code | Task | Tools | Model | Method |
# 4. Submit pull request
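Before opening a pull request, a new row can be checked against the seven-column format. A small sketch, assuming the Time cell should look like `YYYY.MM` (inferred from dates such as 2025.11 elsewhere in this document) and that cells must not contain a literal `|`. The helper name and both checks are assumptions, not rules stated by the repository.

```python
import re


def format_entry(time, method, venue, tasks, tools, backbone, tuning):
    """Hypothetical helper: build one markdown table row for the README."""
    if not re.fullmatch(r"\d{4}\.\d{2}", time):
        raise ValueError(f"Time should look like YYYY.MM, got {time!r}")
    cells = [time, method, venue, tasks, tools, backbone, tuning]
    if any("|" in cell for cell in cells):
        raise ValueError("Cells must not contain '|'")
    return "| " + " | ".join(cells) + " |"


row = format_entry("2025.01", "YourMethod", "Venue<br>🔗 Paper<br>💻 Code",
                   "Task", "Tools", "Model", "Method")
```

The resulting `row` has the same shape as the example entry, with a concrete month in place of the `XX` placeholder.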

Key Insights from the Collection


Adaptation Taxonomy


Signal Type:
- Tool execution feedback (A1)
- Agent output quality (A2)
Training Methods:
- GRPO: Group Relative Policy Optimization (most common in 2025)
- PPO: Proximal Policy Optimization
- AlphaZero-like: Self-play with value/policy networks
- SFT: Supervised fine-tuning on execution traces
- DPO: Direct preference optimization
Trend: Growing use of GRPO for tool-augmented agents (2025), especially with Qwen2.5 and DeepSeek models


Domain Coverage


Coding: Code execution sandboxes (DeepSeek-R1-Zero, R1-Code-Interpreter)
Formal Math: Lean compilers (AlphaProof, DeepSeek-Prover)
Information Retrieval: Search engines, retrievers (DeepRetrieval, ReZero)
Tool-Calling: API execution (ToolLLM, Tool-N1, ToolExpander)
Web Agents: GUI interaction (WebGen-Agent)


Troubleshooting


Finding Specific Research


Q: How do I find papers using a specific model like LLaMA3?

bash

grep -i "llama3" README.md

Q: Which papers have open-source code?

bash

# Look for code icon in tables
grep "code_icon.png" README.md

Q: What are the most recent papers?

bash

# Check Time column (sorted by date within categories)
# Most recent: 2025.11 (Orion), 2025.10 (olmOCR2, AlphaProof)
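Because entries are sorted by date only within each category, finding the newest papers across the whole README means reading the Time column of every table. A minimal sketch, assuming Time is the first cell, Method is the second, and dates are written as `YYYY.MM`. The function name is illustrative, and the sample rows keep only the dates and method names mentioned in this document, with `...` standing in for the other cells.

```python
import re


def latest_entries(markdown_text, top=3):
    """Return the newest ((year, month), method) pairs found in table rows."""
    entries = []
    for line in markdown_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 2:
            continue
        match = re.fullmatch(r"(\d{4})\.(\d{2})", cells[0])
        if match:  # skips headers, separators, and placeholder dates
            entries.append(((int(match[1]), int(match[2])), cells[1]))
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return entries[:top]


sample = """\
| Time | Method | Venue | Task(s) | Tool(s) | Agent Backbone | Tuning |
| --- | --- | --- | --- | --- | --- | --- |
| 2025.01 | DeepSeek-R1-Zero | ... | ... | ... | ... | ... |
| 2025.11 | Orion | ... | ... | ... | ... | ... |
| 2025.10 | olmOCR2 | ... | ... | ... | ... | ... |
"""
newest = latest_entries(sample, top=2)
```

Parsing the date into a `(year, month)` tuple keeps the ordering numeric, so the sort stays correct across year boundaries.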

Understanding Abbreviations


IR: Information Retrieval
GRPO: Group Relative Policy Optimization
PPO: Proximal Policy Optimization
SFT: Supervised Fine-Tuning
DPO: Direct Preference Optimization
TTRL: Test-Time Reinforcement Learning
EI: Expert Iteration


Repository Maintenance


The repository is actively maintained with:

Last Update: 2026-05-15 (per metadata)
Stars: 650+ (growing at ~3 stars/day)
Open Issues: 2

For questions or issues, check: https://github.com/pat-jj/Awesome-Adaptation-of-Agentic-AI/issues
