awesome-adaptation-agentic-ai
Awesome Adaptation of Agentic AI
Overview
This repository is a curated research collection accompanying the paper "Adaptation of Agentic AI" (arXiv:2512.16301). It systematically organizes papers on how AI agents adapt their behavior, particularly focusing on:
- Agent Adaptation: Methods to improve agent decision-making through tool execution signals (A1) or output signals (A2)
- Tool Adaptation: Approaches to optimize tools either independently (T1) or with agent supervision (T2)
The collection categorizes 40+ research papers by adaptation strategy, training method (RL, SFT, DPO), task domain, and implementation details.
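One way to keep the four categories straight is to write them down as data. The sketch below is only an illustrative encoding of the taxonomy described above; the names `AdaptationCategory`, `TAXONOMY`, and `categories_for` are assumptions made for this example and do not come from the repository or the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaptationCategory:
    code: str    # short label used in the collection: A1, A2, T1, T2
    target: str  # what gets adapted: "agent" or "tool"
    signal: str  # where the learning signal or supervision comes from


# Hypothetical lookup table mirroring the four categories in the overview.
TAXONOMY = {
    "A1": AdaptationCategory("A1", "agent", "tool execution signals"),
    "A2": AdaptationCategory("A2", "agent", "agent output signals"),
    "T1": AdaptationCategory("T1", "tool", "independent of the agent"),
    "T2": AdaptationCategory("T2", "tool", "agent supervision"),
}


def categories_for(target: str) -> list[str]:
    """Return the category codes that adapt the given target."""
    return [c.code for c in TAXONOMY.values() if c.target == target]


print(categories_for("agent"))  # ['A1', 'A2']
```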
Repository Structure
awesome-adaptation-of-agentic-ai/
├── README.md              # Main paper collection with categorized tables
├── images/
│   ├── intro.png          # Overview diagram
│   ├── a1_illustrate.png  # Tool execution signaled adaptation
│   ├── a1_timeline.png    # Development timeline
│   ├── paper_icon.png     # Paper link icons
│   └── code_icon.png      # Code link icons
└── LICENSE
Installation & Usage
Cloning the Repository
bash
# Clone the repository
git clone https://github.com/pat-jj/Awesome-Adaptation-of-Agentic-AI.git
cd Awesome-Adaptation-of-Agentic-AI
# View the main README
cat README.md
Browsing Papers
The repository organizes papers into four main categories:
- A1: Tool Execution Signaled Agent Adaptation
  - RL-based methods (GRPO, PPO, AlphaZero-like)
  - SFT & DPO methods
- A2: Agent Output Signaled Adaptation
  - Coming in future updates
- T1: Agent-Agnostic Tool Adaptation
  - Coming in future updates
- T2: Agent-Supervised Tool Adaptation
  - Coming in future updates
Key Research Categories
A1: Tool Execution Signaled (RL-Based)
Papers where agents learn from tool execution feedback using reinforcement learning:
Notable Methods:
- Orion (2025.11): IR agents with GRPO on LFM2
- DeepSeek-R1-Zero (2025.01): Coding agents with code executor feedback
- DeepSeek-Prover-V2 (2025.04): Formal theorem proving with Lean compiler
- FTRL (2025.08): Multi-step tool-use with GRPO
Common Pattern:
Agent → Tool Call → Execution Feedback → RL Update (GRPO/PPO)
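To make the RL update step a little more concrete: GRPO scores each sampled rollout relative to the other rollouts drawn for the same prompt, instead of relying on a learned value function. The sketch below computes only that group-relative advantage from per-rollout rewards. The function name and the reward values are illustrative assumptions, and the clipped policy-gradient update and KL term are left out.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward by the mean and std of its own rollout group."""
    mu = mean(rewards)
    # Population std is used here; some implementations use the sample std instead.
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for 4 rollouts of one prompt, e.g. 1.0 when the tool
# execution (unit tests, compiler, retrieval metric) succeeded and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))
```

Rollouts that beat their group's average receive a positive advantage and the rest a negative one, so a group in which every rollout earns the same reward contributes no learning signal.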
A1: Tool Execution Signaled (SFT & DPO)
Papers using supervised fine-tuning and direct preference optimization:
Notable Methods:
- ToolLLM (2023.07): API planning with real-world APIs
- RetPO (2024.02): Information retrieval with DPO
- AWL (2024.12): Scientific reasoning with adaptive learning
Common Pattern:
Agent → Tool Call → Execution Trace → Supervised Learning
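For the DPO side of this pattern, one plausible setup is to turn execution traces into preference pairs (for example, a trace whose tool call succeeded versus one that failed) and train on the standard DPO objective. The sketch below evaluates that loss for a single pair from sequence log-probabilities. The function name, argument names, and `beta=0.1` are illustrative assumptions, not settings taken from any paper in the collection.

```python
import math


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    # How much more the policy likes each response than the reference model does.
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) == softplus(-logits), computed in a numerically stable way.
    x = -logits
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# With identical policy and reference the loss equals log(2).
print(dpo_loss(-2.0, -2.0, -2.0, -2.0))
```

The loss falls below log(2) once the policy prefers the chosen trace more strongly than the reference model does, and rises above it in the opposite case.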
Common Use Cases
Finding Papers by Task
Example: Formal Theorem Proving
bash
# Search README for theorem proving papers
grep -i "theorem proving" README.md
# Papers include: AlphaProof, BFS-Prover-V2, Goedel-Prover-V2, Leanabell-Prover-V2, DeepSeek-Prover-V1.5/V2
Example: Coding & Code Execution
bash
# Find coding-related papers (-E is needed for | to act as alternation)
grep -iE "coding|code executor" README.md
# Papers include: olmOCR2, R1-Code-Interpreter, Code-R1, DeepSeek-R1-Zero, RLEF, LeDex, CYCLE, CodeAct
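`grep` matches anywhere on a line, so a keyword that appears in the Tool(s) or Venue cell also counts as a hit. A rough sketch of a column-aware filter is shown below. It assumes the tables use the seven-column layout given in the contribution template (Time, Method, Venue, Task(s), Tool(s), Agent Backbone, Tuning) and that cells contain no literal `|`. The helper names and the sample rows are placeholders, not real entries.

```python
def parse_table_rows(markdown: str) -> list[list[str]]:
    """Split markdown table lines into lists of stripped cell strings."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip separator rows such as | --- | :---: |
        if all(set(c) <= set("-: ") for c in cells):
            continue
        rows.append(cells)
    return rows


def filter_rows(rows: list[list[str]], column_index: int, keywords: list[str]) -> list[list[str]]:
    """Keep rows whose cell at column_index contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [r for r in rows
            if column_index < len(r)
            and any(k in r[column_index].lower() for k in lowered)]


# Placeholder rows in the assumed column layout.
sample = """
| Time | Method | Venue | Task(s) | Tool(s) | Agent Backbone | Tuning |
| --- | --- | --- | --- | --- | --- | --- |
| 2025.XX | MethodA | Venue | Coding | Code Executor | ModelX | GRPO |
| 2025.XX | MethodB | Venue | Theorem Proving | Lean Compiler | ModelY | SFT |
"""
TASK_COLUMN = 3  # index of "Task(s)" in the assumed layout
for row in filter_rows(parse_table_rows(sample), TASK_COLUMN, ["coding", "code executor"]):
    print(row[1])
```

To run it against the real file, read `README.md` into a string and pass it to `parse_table_rows`. Changing the column index gives the same filter for tools, backbones, or tuning methods.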
Finding Papers by Method
Example: GRPO (Group Relative Policy Optimization)
bash
# List all GRPO papers
grep "GRPO" README.md
# Commonly used in: Tool-N1, DeepSeek-Prover-V2, SQL-R1, Rec-R1, DeepRetrieval, etc.
Example: DPO (Direct Preference Optimization)
bash
# List all DPO papers
grep "DPO" README.md
# Used in: AWL, LeReT, TP-LLaMA, RetPO
Finding Papers by Model Backbone
Example: Qwen2.5-based agents
bash
# Find Qwen2.5 implementations (-F treats the dot literally rather than as a regex wildcard)
grep -F "Qwen2.5" README.md
# Models include: olmOCR2, ToolExpander, BFS-Prover-V2, WebGen-Agent, Tool-R1, etc.
Accessing Paper Resources
Paper Links
All papers include arXiv or conference links:
markdown
[Paper](https://arxiv.org/abs/2512.16301)
Code Repositories
Many papers provide implementation code:
markdown
[Code](https://github.com/example/repo)
Reading Strategy
python
# Pseudo-code for systematic review
def research_adaptation_strategy(task_domain, method_type):
    """
    Navigate to specific adaptation category

    Args:
        task_domain: e.g., "coding", "theorem proving", "IR"
        method_type: "RL", "SFT", "DPO"

    Returns:
        List of relevant papers with links
    """
    # 1. Go to appropriate section (A1, A2, T1, T2)
    # 2. Filter by method_type (RL-based vs SFT/DPO)
    # 3. Search table for task_domain
    # 4. Check paper links, code availability, model backbones
    pass
Citation Format
When using this repository for research:
bibtex
@article{jiang2025adaptation,
title={Adaptation of Agentic AI},
author={Jiang, Pengcheng and Lin, Jiacheng and Shi, Zhiyi and Wang, Zifeng and He, Luxi and Wu, Yichen and Zhong, Ming and Song, Peiyang and Zhang, Qizheng and Wang, Heng and others},
journal={arXiv preprint arXiv:2512.16301},
year={2025}
}
Contributing
The repository welcomes pull requests for:
- New papers on agentic AI adaptation
- Updates to existing paper information
- Corrections to categorizations
- Additional metadata (benchmarks, datasets)
Contribution Pattern:
bash
# 1. Fork the repository
# 2. Add paper to appropriate category in README.md
# 3. Follow existing table format:
| Time | Method | Venue | Task(s) | Tool(s) | Agent Backbone | Tuning |
# Example entry:
| 2025.XX | YourMethod | Venue<br>🔗 Paper<br>💻 Code | Task | Tools | Model | Method |
# 4. Submit pull request
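When preparing an entry, a small helper can check that all seven columns are filled in before the row is pasted into a table. This is a sketch only: the column names come from the table format above, while `format_entry` and the dictionary layout are assumptions made for illustration and are not tooling shipped with the repository.

```python
COLUMNS = ["Time", "Method", "Venue", "Task(s)", "Tool(s)", "Agent Backbone", "Tuning"]


def format_entry(entry: dict[str, str]) -> str:
    """Render one table row, refusing entries with missing or empty columns."""
    missing = [c for c in COLUMNS if not entry.get(c)]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")
    return "| " + " | ".join(entry[c] for c in COLUMNS) + " |"


# Placeholder values copied from the example entry above.
example = {
    "Time": "2025.XX",
    "Method": "YourMethod",
    "Venue": "Venue<br>🔗 Paper<br>💻 Code",
    "Task(s)": "Task",
    "Tool(s)": "Tools",
    "Agent Backbone": "Model",
    "Tuning": "Method",
}
print(format_entry(example))
```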
Key Insights from the Collection
Adaptation Taxonomy
- Signal Type:
  - Tool execution feedback (A1)
  - Agent output quality (A2)
- Training Methods:
  - GRPO: Group Relative Policy Optimization (most common in 2025)
  - PPO: Proximal Policy Optimization
  - AlphaZero-like: Self-play with value/policy networks
  - SFT: Supervised fine-tuning on execution traces
  - DPO: Direct preference optimization
- Trend: Growing use of GRPO for tool-augmented agents (2025), especially with Qwen2.5 and DeepSeek models
Domain Coverage
- Coding: Code execution sandboxes (DeepSeek-R1-Zero, R1-Code-Interpreter)
- Formal Math: Lean compilers (AlphaProof, DeepSeek-Prover)
- Information Retrieval: Search engines, retrievers (DeepRetrieval, ReZero)
- Tool-Calling: API execution (ToolLLM, Tool-N1, ToolExpander)
- Web Agents: GUI interaction (WebGen-Agent)
Troubleshooting
Finding Specific Research
Q: How do I find papers using a specific model like LLaMA3?
bash
grep -i "llama3" README.md
Q: Which papers have open-source code?
bash
# Look for code icon in tables
grep "code_icon.png" README.md
Q: What are the most recent papers?
bash
# Check Time column (sorted by date within categories)
# Most recent: 2025.11 (Orion), 2025.10 (olmOCR2, AlphaProof)
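Because the tables are sorted only within each category, finding the newest papers across categories means comparing Time cells directly. A minimal sketch follows, assuming Time cells have the `YYYY.MM` form used in this guide. The helper name `time_key` is hypothetical, and the four entries are taken from the dates quoted earlier in this document.

```python
def time_key(time_cell: str) -> tuple[int, int]:
    """Turn a 'YYYY.MM' cell into a (year, month) tuple for sorting."""
    year, month = time_cell.strip().split(".")
    return int(year), int(month)


entries = [
    ("2025.01", "DeepSeek-R1-Zero"),
    ("2025.11", "Orion"),
    ("2023.07", "ToolLLM"),
    ("2025.04", "DeepSeek-Prover-V2"),
]
newest_first = sorted(entries, key=lambda e: time_key(e[0]), reverse=True)
print(newest_first[0])  # ('2025.11', 'Orion')
```

Placeholder cells such as `2025.XX` do not parse as dates and would need to be skipped before sorting.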
Understanding Abbreviations
- IR: Information Retrieval
- GRPO: Group Relative Policy Optimization
- PPO: Proximal Policy Optimization
- SFT: Supervised Fine-Tuning
- DPO: Direct Preference Optimization
- TTRL: Test-Time Reinforcement Learning
- EI: Expert Iteration
Repository Maintenance
The repository is actively maintained with:
- Last Update: 2026-05-15 (per metadata)
- Stars: 650+ (growing at ~3 stars/day)
- Open Issues: 2
For questions or issues, check: https://github.com/pat-jj/Awesome-Adaptation-of-Agentic-AI/issues
Related Resources
- Homepage: https://arxiv.org/abs/2512.16301
- License: CC BY-NC-ND 4.0 (content), NOASSERTION (code)
- Topics: adaptation, agentic-ai, large-language-models