awesome-adaptation-agentic-ai

Awesome Adaptation of Agentic AI

Skill by ara.so — AI Agent Skills collection.

Overview

This repository is a curated research collection accompanying the paper "Adaptation of Agentic AI" (arXiv:2512.16301). It systematically organizes papers on how AI agents adapt their behavior, particularly focusing on:
  • Agent Adaptation: Methods to improve agent decision-making through tool execution signals (A1) or output signals (A2)
  • Tool Adaptation: Approaches to optimize tools either independently (T1) or with agent supervision (T2)
The collection categorizes 40+ research papers by adaptation strategy, training method (RL, SFT, DPO), task domain, and implementation details.

Repository Structure

awesome-adaptation-of-agentic-ai/
├── README.md                 # Main paper collection with categorized tables
├── images/
│   ├── intro.png            # Overview diagram
│   ├── a1_illustrate.png    # Tool execution signaled adaptation
│   ├── a1_timeline.png      # Development timeline
│   ├── paper_icon.png       # Paper link icons
│   └── code_icon.png        # Code link icons
└── LICENSE

Installation & Usage

Cloning the Repository

bash
# Clone the repository
git clone https://github.com/pat-jj/Awesome-Adaptation-of-Agentic-AI.git
cd Awesome-Adaptation-of-Agentic-AI

# View the main README
cat README.md

Browsing Papers

The repository organizes papers into four main categories:
  1. A1: Tool Execution Signaled Agent Adaptation
    • RL-based methods (GRPO, PPO, AlphaZero-like)
    • SFT & DPO methods
  2. A2: Agent Output Signaled Adaptation
    • Coming in future updates
  3. T1: Agent-Agnostic Tool Adaptation
    • Coming in future updates
  4. T2: Agent-Supervised Tool Adaptation
    • Coming in future updates

Key Research Categories

A1: Tool Execution Signaled (RL-Based)

Papers in which agents learn from tool execution feedback using reinforcement learning.
Notable Methods:
  • Orion (2025.11): IR agents with GRPO on LFM2
  • DeepSeek-R1-Zero (2025.01): Coding agents with code executor feedback
  • DeepSeek-Prover-V2 (2025.04): Formal theorem proving with Lean compiler
  • FTRL (2025.08): Multi-step tool-use with GRPO
Common Pattern:
Agent → Tool Call → Execution Feedback → RL Update (GRPO/PPO)
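
The RL update step in this pattern can be illustrated with the group-relative advantage that GRPO uses in place of a learned value baseline: several responses are sampled for the same prompt, each is scored by the tool, and each reward is normalized by the mean and standard deviation of its group. The sketch below is a minimal illustration, not code from any listed paper. The function name, the `eps` term, the choice of population standard deviation, and the example rewards are all assumptions.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize tool-execution rewards within one sampled group (GRPO-style).

    rewards: one scalar per sampled response to the same prompt.
    eps: small constant (an assumption here) that avoids division by zero
         when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards: 1.0 if the generated code passed its tests, else 0.0
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that beat the group average get a positive advantage and are reinforced; when all responses score the same, every advantage is zero and the group contributes no gradient signal.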

A1: Tool Execution Signaled (SFT & DPO)

Papers that use supervised fine-tuning (SFT) or direct preference optimization (DPO).
Notable Methods:
  • ToolLLM (2023.07): API planning with real-world APIs
  • RetPO (2024.02): Information retrieval with DPO
  • AWL (2024.12): Scientific reasoning with adaptive learning
Common Pattern:
Agent → Tool Call → Execution Trace → Supervised Learning
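
For the DPO half of this pattern, a preference pair can be formed from two execution traces for the same prompt, for example one whose tool call succeeded (chosen) and one whose call failed (rejected). That pairing rule is an assumption for illustration, not something the collection prescribes. The sketch below computes the standard DPO loss for a single pair from sequence log-probabilities under the policy and a frozen reference model; the argument names and the default `beta` are illustrative.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair of sequence log-probabilities."""
    # How much more the policy likes each trace than the reference does
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), evaluated in a numerically stable way
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# Hypothetical log-probabilities: the policy already prefers the chosen trace
loss = dpo_loss(-1.0, -5.0, -2.0, -2.0)
```

The loss equals log 2 when the policy and reference agree, and falls as the policy raises the chosen trace relative to the rejected one.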

Common Use Cases

Finding Papers by Task

Example: Formal Theorem Proving
bash
# Search README for theorem proving papers
grep -i "theorem proving" README.md
# Papers include: AlphaProof, BFS-Prover-V2, Goedel-Prover-V2, Leanabell-Prover-V2, DeepSeek-Prover-V1.5/V2

Example: Coding & Code Execution
bash
# Find coding-related papers (-E is needed for the | alternation)
grep -iE "coding|code executor" README.md
# Papers include: olmOCR2, R1-Code-Interpreter, Code-R1, DeepSeek-R1-Zero, RLEF, LeDex, CYCLE, CodeAct
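
A grep match returns whole lines, so it cannot tell whether the keyword sits in the task column or somewhere else in the row. The sketch below parses pipe-delimited table rows into dictionaries and filters on the task column only. It assumes the README tables are plain Markdown tables using the column names listed in the Contributing section; if the real tables embed HTML or images in cells, the parsing would need adjusting. The function names and the sample text are hypothetical, and `-` marks cells whose values are not given in this document.

```python
def parse_table_rows(markdown_text):
    """Parse pipe-delimited Markdown tables into a list of dicts keyed by header."""
    rows, header = [], None
    for line in markdown_text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            header = None  # any non-table line ends the current table
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if all(set(c) <= set("-: ") for c in cells):
            continue  # separator row such as |---|---|
        if header is None:
            header = cells  # first row of a table is its header
        elif len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows

def find_by_task(rows, keyword):
    """Return rows whose Task(s) cell contains keyword (case-insensitive)."""
    return [r for r in rows if keyword.lower() in r.get("Task(s)", "").lower()]

# Illustrative sample; "-" marks values not stated in this document
sample = """
| Time | Method | Venue | Task(s) | Tool(s) | Agent Backbone | Tuning |
|------|--------|-------|---------|---------|----------------|--------|
| 2025.04 | DeepSeek-Prover-V2 | - | Formal theorem proving | Lean compiler | - | GRPO |
| 2024.02 | RetPO | - | Information retrieval | - | - | DPO |
"""
rows = parse_table_rows(sample)
matches = find_by_task(rows, "theorem proving")
```

To run it against the cloned repository, pass the contents of `README.md` to `parse_table_rows` instead of `sample`.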

Finding Papers by Method

Example: GRPO (Group Relative Policy Optimization)
bash
# List all GRPO papers
grep "GRPO" README.md
# Commonly used in: Tool-N1, DeepSeek-Prover-V2, SQL-R1, Rec-R1, DeepRetrieval, etc.

Example: DPO (Direct Preference Optimization)
bash
# List all DPO papers
grep "DPO" README.md
# Used in: AWL, LeReT, TP-LLaMA, RetPO

Finding Papers by Model Backbone

Example: Qwen2.5-based agents
bash
# Find Qwen2.5 implementations (-F matches the dot literally)
grep -F "Qwen2.5" README.md
# Models include: olmOCR2, ToolExpander, BFS-Prover-V2, WebGen-Agent, Tool-R1, etc.

Accessing Paper Resources

Paper Links

All papers include arXiv or conference links:
markdown
[Paper](https://arxiv.org/abs/2512.16301)

Code Repositories

Many papers provide implementation code:
markdown
[Code](https://github.com/example/repo)

Reading Strategy

python
# Pseudo-code for systematic review
def research_adaptation_strategy(task_domain, method_type):
    """Navigate to a specific adaptation category.

    Args:
        task_domain: e.g., "coding", "theorem proving", "IR"
        method_type: "RL", "SFT", "DPO"

    Returns:
        List of relevant papers with links
    """
    # 1. Go to appropriate section (A1, A2, T1, T2)
    # 2. Filter by method_type (RL-based vs SFT/DPO)
    # 3. Search table for task_domain
    # 4. Check paper links, code availability, model backbones
    pass
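
Steps 1 to 3 of this reading strategy can be made concrete once the table entries are held as records. The sketch below is one possible runnable version, not part of the repository: the `Paper` dataclass, the extra `papers` and `category` parameters, and the `RL_TUNINGS` set (which tuning labels count as "RL") are assumptions. Step 4, checking links and code availability, stays manual.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Paper:
    method: str    # method name as listed in the collection
    category: str  # "A1", "A2", "T1", or "T2"
    tuning: str    # e.g. "GRPO", "PPO", "SFT", "DPO"
    task: str      # task domain

RL_TUNINGS = {"GRPO", "PPO"}  # assumption: tuning labels treated as RL

def research_adaptation_strategy(papers, task_domain, method_type, category="A1"):
    """Filter by category, then method type, then task domain."""
    def matches_method(paper):
        if method_type == "RL":
            return paper.tuning in RL_TUNINGS
        return paper.tuning == method_type

    return [p for p in papers
            if p.category == category
            and matches_method(p)
            and task_domain.lower() in p.task.lower()]

# Entries taken from the A1 lists in this document
papers = [
    Paper("Orion", "A1", "GRPO", "IR"),
    Paper("DeepSeek-Prover-V2", "A1", "GRPO", "Formal theorem proving"),
    Paper("RetPO", "A1", "DPO", "Information retrieval"),
]
hits = research_adaptation_strategy(papers, "theorem proving", "RL")
```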

Citation Format

If you use this repository in your research, cite the accompanying paper:
bibtex
@article{jiang2025adaptation,
  title={Adaptation of Agentic AI},
  author={Jiang, Pengcheng and Lin, Jiacheng and Shi, Zhiyi and Wang, Zifeng and He, Luxi and Wu, Yichen and Zhong, Ming and Song, Peiyang and Zhang, Qizheng and Wang, Heng and others},
  journal={arXiv preprint arXiv:2512.16301},
  year={2025}
}

Contributing

The repository welcomes pull requests for:
  • New papers on agentic AI adaptation
  • Updates to existing paper information
  • Corrections to categorizations
  • Additional metadata (benchmarks, datasets)
Contribution Pattern:
bash
# 1. Fork the repository
# 2. Add paper to appropriate category in README.md
# 3. Follow existing table format:
# | Time | Method | Venue | Task(s) | Tool(s) | Agent Backbone | Tuning |
# Example entry:
# | 2025.XX | YourMethod | Venue<br>🔗 Paper<br>💻 Code | Task | Tools | Model | Method |
# 4. Submit pull request
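
Before opening a pull request, a new row can be generated and sanity-checked programmatically. The helper below is a hypothetical sketch, not a tool shipped with the repository. It assumes the seven-column layout shown above and assumes that the Time cell follows the `YYYY.MM` pattern seen in existing entries such as 2025.11.

```python
import re

COLUMNS = ["Time", "Method", "Venue", "Task(s)", "Tool(s)", "Agent Backbone", "Tuning"]
HEADER = "| " + " | ".join(COLUMNS) + " |"

def format_entry(time, method, venue, tasks, tools, backbone, tuning):
    """Render one table row after basic validation of its cells."""
    # Assumed convention: four-digit year, dot, two-digit month
    if not re.fullmatch(r"\d{4}\.(0[1-9]|1[0-2])", time):
        raise ValueError(f"time must be YYYY.MM, got {time!r}")
    cells = [time, method, venue, tasks, tools, backbone, tuning]
    if any("|" in cell for cell in cells):
        raise ValueError("cells must not contain '|', which would break the table")
    return "| " + " | ".join(cells) + " |"

# "-" marks values not stated in this document
row = format_entry("2025.08", "FTRL", "-", "Multi-step tool-use", "-", "-", "GRPO")
```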

Key Insights from the Collection

Adaptation Taxonomy

  1. Signal Type:
    • Tool execution feedback (A1)
    • Agent output quality (A2)
  2. Training Methods:
    • GRPO: Group Relative Policy Optimization (most common in 2025)
    • PPO: Proximal Policy Optimization
    • AlphaZero-like: Self-play with value/policy networks
    • SFT: Supervised fine-tuning on execution traces
    • DPO: Direct preference optimization
  3. Trend: Growing use of GRPO for tool-augmented agents (2025), especially with Qwen2.5 and DeepSeek models
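
A trend like this can be checked against the tables by tallying entries per year and tuning method. The sketch below shows the counting step on the five entries whose date and method are both stated in this document. It is far too small a sample to support the trend by itself and is only meant to show the mechanics; the tuple layout is an assumption.

```python
from collections import Counter

# (Time, Method, Tuning) for entries whose date and tuning appear in this document
entries = [
    ("2025.11", "Orion", "GRPO"),
    ("2025.08", "FTRL", "GRPO"),
    ("2025.04", "DeepSeek-Prover-V2", "GRPO"),
    ("2024.12", "AWL", "DPO"),
    ("2024.02", "RetPO", "DPO"),
]

# Count papers per (year, tuning method)
counts = Counter((time.split(".")[0], tuning) for time, _, tuning in entries)
```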

Domain Coverage

  • Coding: Code execution sandboxes (DeepSeek-R1-Zero, R1-Code-Interpreter)
  • Formal Math: Lean compilers (AlphaProof, DeepSeek-Prover)
  • Information Retrieval: Search engines, retrievers (DeepRetrieval, ReZero)
  • Tool-Calling: API execution (ToolLLM, Tool-N1, ToolExpander)
  • Web Agents: GUI interaction (WebGen-Agent)

Troubleshooting

Finding Specific Research

Q: How do I find papers using a specific model like LLaMA3?
bash
grep -i "llama3" README.md

Q: Which papers have open-source code?
bash
# Look for the code icon in tables
grep "code_icon.png" README.md

Q: What are the most recent papers?
Check the Time column; entries are sorted by date within each category. The most recent are 2025.11 (Orion) and 2025.10 (olmOCR2, AlphaProof).
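
Because entries are sorted only within each category, finding the newest papers across the whole collection means sorting by the Time cell yourself. The sketch below assumes Time cells follow the `YYYY.MM` pattern seen in the entries quoted in this document; the function name and the tuple layout are hypothetical.

```python
def time_key(time_cell):
    """Turn a 'YYYY.MM' Time cell into a sortable (year, month) tuple."""
    year, month = time_cell.strip().split(".")
    return int(year), int(month)

# (Time, Method) pairs quoted in this document
entries = [
    ("2025.01", "DeepSeek-R1-Zero"),
    ("2025.11", "Orion"),
    ("2023.07", "ToolLLM"),
    ("2025.08", "FTRL"),
]
newest_first = sorted(entries, key=lambda e: time_key(e[0]), reverse=True)
```

Comparing integer tuples rather than raw strings keeps the order correct even if a month is written without zero padding.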

Understanding Abbreviations

  • IR: Information Retrieval
  • GRPO: Group Relative Policy Optimization
  • PPO: Proximal Policy Optimization
  • SFT: Supervised Fine-Tuning
  • DPO: Direct Preference Optimization
  • TTRL: Test-Time Reinforcement Learning
  • EI: Expert Iteration

Repository Maintenance

The repository is actively maintained. Current status:
  • Last Update: 2026-05-15 (per metadata)
  • Stars: 650+ (growing at ~3 stars/day)
  • Open Issues: 2

Related Resources

  • Homepage: https://arxiv.org/abs/2512.16301
  • License: CC BY-NC-ND 4.0 (content), NOASSERTION (code)
  • Topics: adaptation, agentic-ai, large-language-models