validate-ui
Archon Web UI - Comprehensive E2E Validation
Run exhaustive end-to-end browser automation tests and codebase review of the Archon Web UI.
The goal: determine whether Archon is doing the best it possibly can to solve the problem of
managing parallel agents, executing custom workflows, and providing full visibility into agent work.
Optional focus argument: `$ARGUMENTS` (e.g., "workflows", "chat", "projects"). If empty, run ALL sections.
Phase 0: Environment Setup
0.1 Kill Old Archon Processes

    # Kill any running Archon dev servers (backend + frontend)
    pkill -f "bun.*dev:server" 2>/dev/null || true
    pkill -f "bun.*dev:web" 2>/dev/null || true
    pkill -f "bun.*packages/server" 2>/dev/null || true
    pkill -f "bun.*packages/web" 2>/dev/null || true
    pkill -f "vite.*5173" 2>/dev/null || true

    # Kill any leftover processes on our ports
    lsof -ti:3090 | xargs kill -9 2>/dev/null || true
    lsof -ti:5173 | xargs kill -9 2>/dev/null || true

    # Wait for ports to free up
    sleep 2

    # Verify ports are free
    ! lsof -i:3090 && ! lsof -i:5173 && echo "Ports 3090 and 5173 are free" || echo "WARNING: Ports still in use"

0.2 Install agent-browser (if needed)

    # Check if agent-browser is available
    which agent-browser 2>/dev/null || npx agent-browser --version 2>/dev/null

    # If not installed globally, install it:
    npm install -g agent-browser && agent-browser install

    # On WSL2/Linux, use --with-deps to get Chromium system dependencies:
    agent-browser install --with-deps

    # IMPORTANT: Do NOT use bunx; Bun skips postinstall scripts that agent-browser needs.
    # Use npx or global npm install.

0.3 Start Archon Backend + Frontend
Start both services. Backend must be up before frontend SSE connections work.

    # From the repo root: /path/to/archon

    # Start backend (port 3090)
    cd /path/to/archon && bun run dev:server &
    sleep 5  # Wait for server initialization + DB

    # Verify backend is healthy
    curl -s http://localhost:3090/api/health | head -c 200

    # Start frontend (port 5173)
    cd /path/to/archon && bun run dev:web &
    sleep 5  # Wait for Vite dev server

    # Verify frontend is serving
    curl -s http://localhost:5173 | head -c 200

**URLs:**
- Frontend: `http://localhost:5173`
- Backend API: `http://localhost:3090/api`
- SSE streams: `http://localhost:3090/api/stream/{conversationId}` (bypasses Vite proxy in dev)
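A fixed `sleep 5` can be too short on a cold start and wasteful on a warm one. The following is a minimal sketch of a polling alternative, assuming a POSIX shell; `retry_until` is a hypothetical helper name, not part of Archon or agent-browser.

```shell
# retry_until ATTEMPTS DELAY COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it succeeds or ATTEMPTS runs out.
retry_until() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    # Only pause when another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example usage (commented out so the snippet has no side effects):
# retry_until 30 1 curl -sf http://localhost:3090/api/health || echo "backend not healthy"
# retry_until 30 1 curl -sf http://localhost:5173 || echo "frontend not serving"
```

Polling keeps the ordering requirement (backend before frontend) while bounding the total wait at roughly ATTEMPTS times DELAY seconds.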
0.4 Seed Test Data (if needed)
Check if there are existing codebases and conversations. If empty, create test data:

    # Check existing codebases
    curl -s http://localhost:3090/api/codebases | python3 -m json.tool 2>/dev/null || curl -s http://localhost:3090/api/codebases

    # Register the current repo as a codebase (if none exist)
    curl -s -X POST http://localhost:3090/api/codebases \
      -H "Content-Type: application/json" \
      -d '{"path": "/path/to/archon"}'

    # Create a test conversation
    curl -s -X POST http://localhost:3090/api/conversations \
      -H "Content-Type: application/json" \
      -d '{}' | python3 -m json.tool 2>/dev/null

---
Phase 1: Browser Automation - End-to-End Testing
Use the `agent-browser` CLI for all browser interactions. Follow the snapshot-refs workflow:
- `agent-browser open <url>`: navigate
- `agent-browser snapshot -i`: get interactive elements with refs
- Interact using refs (click, fill, etc.)
- Re-snapshot after navigation or DOM changes
Take screenshots at each major test point: `agent-browser screenshot /tmp/archon-test-{name}.png`
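One pass through the snapshot-refs workflow might look like the sketch below. It is not a recorded session: the ref `@e1` and the screenshot name are placeholders (real refs come from the `snapshot -i` output), the `click @ref` form is assumed, and the `AGENT_BROWSER` variable is a hypothetical hook that allows a dry run with `echo`.

```shell
# Command to invoke; set AGENT_BROWSER=echo for a dry run (hypothetical hook).
AB="${AGENT_BROWSER:-agent-browser}"

# snapshot_refs_demo URL NAME
# Runs one open / snapshot / interact / re-snapshot / screenshot cycle.
snapshot_refs_demo() {
  url=$1
  name=$2
  $AB open "$url"                              # navigate
  $AB snapshot -i                              # list interactive elements with refs
  $AB click @e1                                # interact using a ref (placeholder)
  $AB snapshot -i                              # re-snapshot after the DOM changed
  $AB screenshot "/tmp/archon-test-$name.png"  # capture this test point
}

# Example usage:
# snapshot_refs_demo http://localhost:5173 dashboard
```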
Test Suite 1: Dashboard (Route: `/`)
1.1 Initial Load
- Open `http://localhost:5173`
- Verify dashboard renders: stats cards (Running Workflows, Conversations, System Status)
- Check system health indicator shows "Healthy" (green)
- Screenshot the full dashboard
1.2 Stats Accuracy
- Compare "Running Workflows" count against `GET /api/workflows/runs?status=running`
- Compare "Conversations" count against `GET /api/conversations`
- Verify numbers update after creating new data
1.3 Recent Items
- Verify "Recent Conversations" list shows up to 10 items
- Verify "Recent Workflow Runs" list shows up to 10 items
- Click a conversation and verify navigation to `/chat/{id}`
- Click a workflow run and verify navigation to `/workflows/runs/{id}`
- Use browser back button and verify return to dashboard
1.4 Empty State
- If no conversations/runs exist: verify the empty state with "New Chat" CTA renders
- Click "New Chat" from empty state and verify navigation to `/chat`
Test Suite 2: Project Management
2.1 Add Project (GitHub URL)
- Click the `+` button next to "Projects" in the sidebar
- Fill in a GitHub URL (e.g., `https://github.com/anthropics/claude-code`)
- Submit and verify the project appears in the sidebar
- Verify the project is auto-selected
2.2 Add Project (Local Path)
- Click `+` again
- Fill in a local path (e.g., `/path/to/archon`)
- Submit and verify the project appears
- Verify deduplication: if the path was already registered, it should not create a duplicate
2.3 Select/Deselect Project
- Click a project in the sidebar and verify it becomes selected (highlighted)
- Verify the sidebar content switches to `ProjectDetail` view (shows project name, repo URL, conversations scoped to project, workflow runs)
- Click "All Projects" and verify the sidebar switches to `AllConversationsView` (all conversations, no project filter)
- Verify `localStorage` persists selection across page refresh
2.4 Delete Project
- Hover over a project — verify the trash icon appears
- Click trash — verify confirmation dialog appears
- Confirm deletion — verify project is removed from list
- Verify conversations and runs associated with the project are handled gracefully
2.5 Project Selector in Collapsible
- When a project is selected, verify the collapsible header shows the project name
- Click the chevron to expand — verify other projects are listed
- Switch projects via the collapsible — verify the view updates
Test Suite 3: Chat Interface
3.1 New Chat (No Project)
- Click "New Chat" in sidebar (with no project selected)
- Verify empty chat interface renders with message input
- Type a message and send
- Verify: user message appears right-aligned, assistant "thinking" dots appear
- Verify: conversation is created and URL updates to `/chat/{conversationId}`
- Verify: conversation appears in sidebar
3.2 New Chat (With Project)
- Select a project first
- Click "New Chat"
- Send a message (e.g., `/status`)
- Verify: conversation is scoped to the selected project
- Verify: project context (cwd, codebase) is attached
3.3 Slash Commands
- Send `/status` and verify the response shows session status
- Send `/help` and verify help text renders in markdown
- Send `/commands` and verify the command list renders
- Send `/getcwd` and verify the working directory is shown
- Verify: commands execute instantly (no "thinking" animation needed)
3.4 Message Rendering
- Send a message that triggers a markdown response from the AI
- Verify: code blocks render with syntax highlighting
- Verify: tables render properly in assistant messages
- Verify: links open in new tabs (`target="_blank"`)
- Verify: blockquotes render with left border
- Verify: inline code renders with monospace font
- Send a very long message and verify no layout overflow
3.5 Streaming & Real-time Updates
- Send a message that triggers an AI response
- Verify: blinking cursor appears during streaming
- Verify: text appears incrementally (not all at once)
- Verify: lock indicator shows "Agent is working..."
- Verify: lock indicator hides when response completes
- Verify: message `isStreaming` flag clears on completion
3.6 Tool Call Cards
- Send a message that triggers tool usage (e.g., a code question in a project context)
- Verify: tool call cards appear below the assistant message
- Verify: card shows tool name and input summary
- Click to expand a tool card — verify full input JSON and output render
- Verify: running tools show spinner animation and primary border
- Verify: completed tools show duration badge
- Test "Show N more lines" for long tool outputs
3.7 Error Handling
- Trigger an error condition (e.g., send a message with no AI credentials configured)
- Verify: error card renders with AlertCircle icon
- Verify: error classification badge shows (transient/fatal)
- Verify: suggested actions are listed
- Verify: the chat remains functional after an error
3.8 Queue Position
- If possible, trigger multiple concurrent messages to the same conversation
- Verify: queue position indicator appears ("Position N in queue")
- Verify: the lock indicator updates when the queue advances
3.9 Auto-scroll Behavior
- Scroll up during a streaming response
- Verify: auto-scroll stops (respects user scroll position)
- Verify: "Jump to bottom" button appears
- Click "Jump to bottom" — verify scroll snaps to latest message
- Scroll back to bottom manually — verify auto-scroll resumes
3.10 Conversation Navigation
- Create multiple conversations
- Click between them in the sidebar
- Verify: each conversation loads its own message history
- Verify: messages are not leaked between conversations
- Verify: the correct conversation is highlighted in the sidebar
Test Suite 4: Conversation Management
4.1 Rename Conversation
- Hover over a conversation in the sidebar — verify pencil icon appears
- Click pencil — verify inline edit input appears
- Type a new title and press Enter
- Verify: title updates in sidebar and in the chat header
- Press Escape during rename — verify it cancels without saving
4.2 Delete Conversation
- Hover over a conversation — verify trash icon appears
- Click trash — verify confirmation dialog appears
- Confirm deletion — verify conversation is removed
- If the deleted conversation was active: verify redirect to `/`
- Verify: soft-delete (conversation is hidden, not destroyed)
4.3 Auto-title
- Create a new conversation and send a non-command message
- Wait 2-3 seconds
- Verify: the conversation title updates automatically based on the first message
- Verify: title is truncated to ~80 characters
4.4 Search
- Type in the sidebar search bar
- Verify: conversations are filtered by title match
- Clear search — verify all conversations reappear
- Press the `/` key and verify the search input focuses
- Press Escape and verify the search clears
Test Suite 5: Workflow Management
5.1 Workflow List Page (`/workflows`)
- Navigate to `/workflows`
- Verify: "Available Workflows" tab shows all discovered workflows
- Verify: each workflow card shows name and description
- Verify: "Recent Runs" tab shows recent workflow runs
- Verify: running workflows show a pulsing dot on the "Recent Runs" tab label
5.2 Invoke Workflow from Workflows Page
- Click on a workflow card (e.g., `archon-assist`)
- Verify: inline run panel expands with project selector and message input
- Select a project from the dropdown
- Type a message and click "Run"
- Verify: conversation is created and navigation goes to `/chat/{conversationId}`
- Verify: workflow execution begins (messages appear from the AI)
5.3 Invoke Workflow from Sidebar (WorkflowInvoker)
- Select a project in the sidebar
- Verify: workflow dropdown appears in `ProjectDetail` view
- Select a workflow from the dropdown
- Type a message and submit
- Verify: new conversation created, navigation to chat, workflow runs
5.4 Workflow Router (Agent Orchestrator)
- In a project chat, send a natural language message (e.g., "Help me understand the authentication flow")
- Verify: the router detects the intent and routes to the appropriate workflow
- Verify: workflow dispatch status message appears (e.g., "Dispatching workflow: archon-assist (background)")
- Verify: `WorkflowDispatchInline` badge appears with spinner
- Verify: clicking the dispatch badge navigates to the workflow run or worker conversation
5.5 Workflow Progress in Chat
- While a workflow is running, verify `WorkflowProgressCard` appears in the chat
- Verify: compact mode shows workflow name, step count, elapsed time
- Verify: elapsed timer updates every second
- Click "Open Full View" and verify navigation to `/workflows/runs/{runId}`
- Verify: returning to chat still shows the progress card
5.6 Workflow Execution Page (`/workflows/runs/:runId`)
- Navigate to an active or completed workflow run
- Verify: header shows workflow name, status, and elapsed time
- Verify: step progress panel (left side) shows all steps with status icons
- Click different steps and verify the log panel (right side) updates
- Verify: "Chat" link back to parent conversation works
- For dispatched workflows: verify `WorkflowLogs` renders the worker conversation messages
5.7 Parallel Agent Steps
- Run a workflow with parallel agents (e.g., `archon-comprehensive-pr-review` has 5 parallel agents)
- Verify: `ParallelBlockView` renders showing parent step and nested agent list
- Verify: each agent shows its own status (pending/running/completed/failed)
- Verify: overall block status derives correctly (any failed = failed, any running = running, all complete = complete)
- Verify: progress counter shows completed/total agents
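The block-status rule in 5.7 can be written down as a small function to compare against what the UI shows. This is a sketch under assumptions, not Archon's implementation: `derive_block_status` is a hypothetical name, "failed" is assumed to take precedence over "running", and the `pending` fallback for a block that is neither running nor fully complete is a guess.

```shell
# derive_block_status STATUS...
# Prints the overall status of a parallel block from its agents' statuses.
derive_block_status() {
  any_running=0
  all_completed=1
  for s in "$@"; do
    case "$s" in
      failed) echo failed; return 0 ;;          # any failed => failed
      running) any_running=1; all_completed=0 ;;
      completed) ;;
      *) all_completed=0 ;;                     # e.g. pending
    esac
  done
  if [ "$any_running" -eq 1 ]; then
    echo running                                # any running => running
  elif [ "$#" -gt 0 ] && [ "$all_completed" -eq 1 ]; then
    echo completed                              # all complete => complete
  else
    echo pending                                # assumption: fallback state
  fi
}

# Example usage:
# derive_block_status completed running pending
```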
5.8 Loop Iterations
- Run a loop workflow (e.g., `archon-test-loop` or `archon-ralph-fresh`)
- Verify: `LoopIterationView` renders with iteration counter
- Verify: progress bar fills proportionally (current/max)
- Verify: each iteration shows status
- Verify: completion signal (`<promise>COMPLETE</promise>`) ends the loop
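For the loop checks above, the expected progress-bar fill and the completion test can be computed independently of the UI. The helpers below are a hedged sketch: both function names are hypothetical, the cap at 100% is an assumption about how overshoot is displayed, and the exact-match `grep` may be stricter than the matching Archon actually uses.

```shell
# loop_progress_percent CURRENT MAX
# Integer percentage for the progress bar, capped at 100 (assumed behavior).
loop_progress_percent() {
  current=$1
  max=$2
  if [ "$max" -le 0 ]; then
    echo 0
    return 0
  fi
  pct=$((current * 100 / max))
  if [ "$pct" -gt 100 ]; then
    pct=100
  fi
  echo "$pct"
}

# has_completion_signal
# Reads agent output on stdin; succeeds if the completion signal is present.
has_completion_signal() {
  grep -q '<promise>COMPLETE</promise>'
}

# Example usage:
# loop_progress_percent 3 10
# some_agent_output | has_completion_signal && echo "loop should end"
```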
5.9 Workflow Artifacts
- Run a workflow that produces artifacts (PR URLs, commits, branches) and wait for it to complete
- Verify: `ArtifactSummary` component renders at the bottom
- Verify: URLs are clickable links opening in new tabs
- Verify: artifact type icons are correct (PR, Commit, Branch, File)
5.10 Workflow Stale Detection
- While a workflow is running, let the SSE connection drop briefly
- Verify: `stale` indicator appears on the workflow card
- Verify: polling fallback kicks in (checks every 15 seconds)
- Verify: stale state clears when fresh data arrives
5.11 Cancel Workflow
- While a workflow is running, look for "Cancel" button
- If present: click and verify the workflow status changes to failed/cancelled
- If not present: note this as a UX gap
Test Suite 6: Project-Scoped Views
6.1 Project Detail — Conversations
- Select a project
- Verify: only conversations scoped to that project appear
- Create a new chat within the project
- Verify: the new conversation appears in the filtered list
- Verify: conversations from other projects are NOT shown
6.2 Project Detail — Workflow Runs
- Verify: workflow runs scoped to the selected project appear
- Verify: runs are sorted by priority: failed > running > completed
- Click a run and verify navigation to `/workflows/runs/{id}`
- Verify: conversation status dots show on conversations with active runs
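To check the run ordering from 6.2, it can help to compute the expected order from a list of runs and compare it with the sidebar. A minimal sketch, assuming input lines of the form `STATUS ID`; the function name, the input format, and placing unknown statuses last are all illustrative assumptions.

```shell
# sort_runs_by_priority
# Reads "STATUS ID" lines on stdin and prints them ordered
# failed > running > completed, keeping input order within a status.
sort_runs_by_priority() {
  awk '{
    rank = 3                          # unknown statuses sort last (assumption)
    if ($1 == "failed") rank = 0
    else if ($1 == "running") rank = 1
    else if ($1 == "completed") rank = 2
    print rank, NR, $0                # NR preserves the original order
  }' | sort -k1,1n -k2,2n | cut -d' ' -f3-
}

# Example usage:
# printf 'completed r1\nrunning r2\nfailed r3\n' | sort_runs_by_priority
```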
6.3 Cross-Project Navigation
- Start a workflow in Project A
- Switch to Project B in the sidebar
- Verify: Project A's workflow is not visible in Project B's view
- Switch back to Project A — verify the workflow run is still visible
- Click "All Projects" — verify you can see conversations from all projects
Test Suite 7: SSE & Real-time Infrastructure
7.1 SSE Connection
- Open browser DevTools Network tab (via `agent-browser eval` or console)
- Verify: EventSource connection to `/api/stream/{conversationId}` is established
- Verify: heartbeat events arrive every ~30 seconds
- Verify: connection state is OPEN (readyState 1)
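One way to check the heartbeat cadence without watching the Network tab is to record the arrival time of each heartbeat (epoch seconds, one per line) and test the gaps. The sketch below only covers the gap check; how the timestamps are collected is left open, and the function name, the 30 second target, and the 5 second tolerance are illustrative assumptions.

```shell
# check_heartbeat_gaps [EXPECTED] [TOLERANCE]
# Reads epoch-second timestamps (one per line) on stdin. Succeeds only if
# there are at least two timestamps and every gap between consecutive ones
# is within EXPECTED +/- TOLERANCE seconds.
check_heartbeat_gaps() {
  awk -v expected="${1:-30}" -v tol="${2:-5}" '
    NR > 1 {
      gap = $1 - prev
      if (gap < expected - tol || gap > expected + tol) bad = 1
    }
    { prev = $1 }
    END { exit (NR < 2 || bad) ? 1 : 0 }
  '
}

# Example usage:
# printf '100\n130\n161\n' | check_heartbeat_gaps 30 5 && echo "cadence ok"
```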
7.2 SSE Reconnection
- Kill the backend server temporarily
- Verify: the UI shows a disconnected state (grey dot in header)
- Restart the backend
- Verify: SSE reconnects automatically
- Verify: the connection indicator turns green again
- Verify: buffered messages are delivered on reconnect
7.3 Multiple Tabs
- Open the same conversation in two browser tabs (use `agent-browser --session` for parallel)
- Send a message from tab 1
- Check whether the response streams in both tabs (the stream registry replaces streams rather than fanning out)
- Note: the web adapter replaces old streams on new connections, so only the latest tab is expected to get live SSE
Test Suite 8: UI/UX Quality Audit
8.1 Visual Hierarchy & Dark Theme
- Screenshot the full app at different states
- Verify: text hierarchy (primary/secondary/tertiary) is readable
- Verify: interactive elements have clear hover states
- Verify: accent colors (blue-purple) are used consistently
- Verify: success (green), warning (amber), error (red) colors are correct
- Verify: borders and dividers create clear visual separation
8.2 Loading States
- Observe loading states when:
  - Dashboard is loading
  - Conversation messages are loading
  - Workflows list is loading
  - Workflow runs are fetching
- Verify: all loading states show appropriate feedback (spinners, skeletons, or text)
- Verify: no blank/flash-of-unstyled-content moments
8.3 Empty States
- Check empty states for:
  - No conversations (dashboard + sidebar)
  - No projects registered
  - No workflows available
  - No workflow runs
  - No messages in a conversation
- Verify: each empty state has a helpful message and CTA
8.4 Responsiveness
- Set viewport to different sizes:

      agent-browser set viewport 1920 1080  # Desktop
      agent-browser set viewport 1366 768   # Laptop
      agent-browser set viewport 1024 768   # Tablet landscape
      agent-browser set viewport 768 1024   # Tablet portrait
      agent-browser set viewport 375 812    # Mobile

- At each size: screenshot and check for layout breakage, overflow, truncation
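The viewport sweep in 8.4 can be scripted so every size gets a screenshot with a predictable name. A sketch only: the function name, the size labels in the filenames, and the `AGENT_BROWSER` dry-run hook are assumptions; the `set viewport` and `screenshot` commands are the ones used elsewhere in this document.

```shell
# Command to invoke; set AGENT_BROWSER=echo for a dry run (hypothetical hook).
AB="${AGENT_BROWSER:-agent-browser}"

# screenshot_all_viewports PAGE
# Sets each viewport size from 8.4 and screenshots the current page.
screenshot_all_viewports() {
  page=$1
  while read -r label width height; do
    # </dev/null keeps the command from consuming the size list on stdin.
    $AB set viewport "$width" "$height" </dev/null
    $AB screenshot "/tmp/archon-test-$page-$label.png" </dev/null
  done <<EOF
desktop 1920 1080
laptop 1366 768
tablet-landscape 1024 768
tablet-portrait 768 1024
mobile 375 812
EOF
}

# Example usage (after opening the page to test):
# screenshot_all_viewports dashboard
```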
8.5 Sidebar Resize
- Drag the sidebar resize handle
- Verify: sidebar width changes smoothly (240-400px range)
- Verify: width persists in localStorage across refresh
- Verify: content reflows properly at different sidebar widths
8.6 Keyboard Navigation
- Press `/` and verify search focuses
- Press `Escape` and verify search clears
- Press `Enter` in message input and verify it sends the message
- Press `Shift+Enter` and verify it inserts a newline (does NOT send)
- Tab through interactive elements and verify focus order is logical
8.7 Copy/Clipboard
- Click the working directory path in the chat header
- Verify: path copies to clipboard
- Verify: visual feedback (tooltip or flash) indicates copy succeeded
8.8 External Links
- Click "Open in IDE" button (VSCode link)
- Verify: `vscode://file/...` URL is constructed correctly
- Click links in assistant messages and verify they open in new tabs
Test Suite 9: Edge Cases & Stress Tests
9.1 Rapid Message Sending
- Send multiple messages in quick succession (before previous responses complete)
- Verify: messages are queued properly (no duplicate or lost messages)
- Verify: lock indicator shows queue position
- Verify: responses arrive in order
9.2 Long Content
- Send a message that produces very long output (e.g., "List all files in the project")
- Verify: markdown renders without layout overflow
- Verify: code blocks have horizontal scroll
- Verify: `WorkflowResultCard` truncation works (500 chars / 8 lines with "Show more")
- Verify: tool call output truncation works (20 lines shown, expandable)
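To know what the truncated tool output should look like before checking the UI, the line-based rule (20 lines shown, the rest behind "Show N more lines") can be reproduced on sample output. A minimal sketch with a hypothetical function name; it covers only the line limit, not the 500-character limit.

```shell
# truncate_lines LIMIT
# Prints the first LIMIT lines of stdin, followed by a "Show N more lines"
# marker when anything was cut off.
truncate_lines() {
  awk -v limit="$1" '
    NR <= limit { print }
    END { if (NR > limit) printf "Show %d more lines\n", NR - limit }
  '
}

# Example usage:
# some_tool_output | truncate_lines 20
```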
9.3 Special Characters
- Send messages with special characters: `<script>alert('xss')</script>`, markdown chars `*_[]()`, emoji
- Verify: no XSS vulnerability (HTML is escaped)
- Verify: markdown renders correctly
- Verify: emoji displays properly
9.4 Browser Refresh During Streaming
- While AI is streaming a response, refresh the page
- Verify: on reload, historical messages are loaded from the API
- Verify: any in-progress response is not lost (persisted segments appear)
- Verify: SSE reconnects and picks up new events
9.5 Concurrent Workflows
- Launch 2-3 workflows simultaneously (different projects or same project)
- Verify: each workflow tracks independently
- Verify: workflow progress cards in respective chats are correct
- Verify: no cross-contamination of events between workflows
9.6 Network Latency
- Add artificial network latency if possible
- Verify: UI remains responsive during slow responses
- Verify: loading indicators appear for slow API calls
- Verify: no timeout errors in normal usage
Phase 2: Codebase Review
Read the source code of every component and module listed below. For each, evaluate:
- Correctness: Are there logic bugs, race conditions, or broken state transitions?
- UX quality: Does the component provide good feedback, handle edge cases, feel polished?
- Performance: Are there unnecessary re-renders, missing memoization, or expensive operations?
- Accessibility: Are interactive elements properly labeled? Keyboard navigable?
- Error handling: Are errors caught, displayed, and recoverable?
Frontend Files to Review
前端待审查文件
| File | Focus Areas |
|---|---|
| | Route config, error boundary, QueryClient settings |
| | SSE handler correctness, message state management, new-chat flow, workflow dispatch handling |
| | Auto-scroll, WorkflowDispatchInline polling, WorkflowResultCard truncation |
| | Markdown rendering, streaming cursor, thinking dots |
| | Expand/collapse, running state animation, output truncation |
| | Timer accuracy, compact vs full mode, stale indicator |
| | Show/hide transitions, queue position display |
| | Enter vs Shift+Enter, auto-resize, disabled state |
| | Resize drag, project add flow, search, new chat, localStorage persistence |
| | Scoped queries, conversation status dots, workflow run sorting |
| | Workflow fetch, create conversation + run flow, error handling |
| | Search filtering, codebase map construction, "New Chat" |
| | Delete confirmation, "All Projects" button |
| | Rename inline edit, delete flow, active state highlighting |
| | Two-tab layout, inline run panel, running indicator pulse |
| | Initial data reconstruction from events, live SSE overlay, worker vs parent flows |
| | Read-only chat view, SSE handlers, message filtering by timestamp |
| | Step list rendering, parallel block delegation, active step highlight |
| | Virtual scrolling, auto-scroll, metadata header |
| | Artifact type icons, URL links, path display |
| | Progress bar math, max iteration capping |
| | Overall status derivation, nested agent list |
| | Text batching (50ms flush), reconnection, handler ref stability |
| | Workflow state map, polling fallback (15s), stale detection |
| | Scroll threshold (50px), user scroll-up detection |
| | SSE_BASE_URL calculation, error handling, 404 swallowing |
| | SSEEvent union completeness, ChatMessage fields, WorkflowState shape |
| | Cache key correctness, memory management |
| | Stale project ID cleanup, codebase polling interval |
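When reviewing the scroll threshold (50px) and user scroll-up detection items, it can help to have a reference for what a correct check usually looks like. The following is a minimal sketch, not Archon's actual code: the `ScrollMetrics` type, the `isNearBottom` name, and the constant name are all hypothetical, and only the 50px value comes from the table.

```typescript
// Hypothetical helper for illustration; names are not taken from Archon's source.
interface ScrollMetrics {
  scrollTop: number; // pixels scrolled from the top of the container
  clientHeight: number; // visible height of the container
  scrollHeight: number; // total height of the scrollable content
}

// The 50px threshold named in the focus areas above.
const SCROLL_THRESHOLD_PX = 50;

// Returns true when the viewport is within `threshold` pixels of the bottom.
// Auto-scroll would normally stay enabled only while this holds, so that a
// user who has scrolled up to read history is not yanked back down.
function isNearBottom(
  m: ScrollMetrics,
  threshold: number = SCROLL_THRESHOLD_PX,
): boolean {
  const distanceFromBottom = m.scrollHeight - m.scrollTop - m.clientHeight;
  return distanceFromBottom <= threshold;
}
```

A review would then check that the real implementation evaluates an equivalent condition on every scroll event and before every append, rather than only once on mount.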
Backend Files to Review
| File | Focus Areas |
|---|---|
| | Endpoint correctness, CORS, SSE heartbeat loop, workflow run endpoint, codebase deduplication |
| | sendMessage category filtering, structured event handling, lock event flushing |
| | Segment splitting logic, tool call duration tracking, flush timing, 50-segment cap |
| | Stream replacement race condition fix, buffer limits (100 msg / 200 conv), zombie reaper |
| | Event mapping completeness, bridge subscription lifecycle, parent forwarding |
| | Router prompt construction, background dispatch fire-and-forget, isolation resolution |
| | Stale workflow detection (15min), step session continuity, parallel Promise.all, loop completion signal |
| | Case-insensitive matching, multiline regex, fallback behavior |
| | Listener error isolation, max listener cap, run registration lifecycle |
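For the buffer limits item (100 msg / 200 conv), one plausible reading is a cap of 100 buffered messages per conversation and 200 buffered conversations overall. The sketch below illustrates that reading only: the class name, method names, and the oldest-first eviction policy are assumptions, and the real backend may behave differently.

```typescript
// Hypothetical sketch; the eviction policy and all names are assumptions.
class BoundedMessageBuffer {
  // Map preserves insertion order, so the first key is the oldest conversation.
  private readonly buffers = new Map<string, string[]>();
  private readonly maxMessages: number;
  private readonly maxConversations: number;

  constructor(maxMessages = 100, maxConversations = 200) {
    this.maxMessages = maxMessages;
    this.maxConversations = maxConversations;
  }

  push(conversationId: string, message: string): void {
    let queue = this.buffers.get(conversationId);
    if (!queue) {
      // Evict the oldest conversation before tracking a new one.
      if (this.buffers.size >= this.maxConversations) {
        const oldest = this.buffers.keys().next().value;
        if (oldest !== undefined) this.buffers.delete(oldest);
      }
      queue = [];
      this.buffers.set(conversationId, queue);
    }
    queue.push(message);
    // Drop the oldest message once the per-conversation cap is exceeded.
    if (queue.length > this.maxMessages) queue.shift();
  }

  // Returns and clears everything buffered for one conversation.
  drain(conversationId: string): string[] {
    const queue = this.buffers.get(conversationId) ?? [];
    this.buffers.delete(conversationId);
    return queue;
  }

  get conversationCount(): number {
    return this.buffers.size;
  }
}
```

Whatever the real policy is, the review should confirm that both caps are enforced on every write and that dropped messages are either logged or known to be safe to lose.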
Review Checklist
For every file reviewed, note findings in these categories:
- Bugs — Logic errors, race conditions, state inconsistencies, crashes
- UX Issues — Missing feedback, confusing interactions, unclear states, dead ends
- Performance — Unnecessary re-renders, missing React.memo/useMemo/useCallback, expensive computations in render
- Accessibility — Missing ARIA labels, focus management gaps, screen reader issues
- Error Handling — Unhandled promise rejections, missing try/catch, silent failures
- Code Quality — Dead code, TODOs, inconsistent patterns, missing types
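One way to keep findings consistent across files is to record each one in a small typed structure and bucket them by severity before writing the report. This is only a sketch: the `Finding` shape and `groupBySeverity` function are hypothetical. The category names mirror the checklist above and the severities mirror the P0/P1/P2 sections of the report format.

```typescript
// Hypothetical record types for illustration; not part of Archon.
type Category =
  | "Bugs"
  | "UX Issues"
  | "Performance"
  | "Accessibility"
  | "Error Handling"
  | "Code Quality";

type Severity = "P0" | "P1" | "P2";

interface Finding {
  file: string; // path of the reviewed file
  category: Category; // one of the checklist categories
  severity: Severity; // maps to a report section
  note: string; // short description of the problem
}

// Buckets findings by severity while preserving their original order.
function groupBySeverity(findings: Finding[]): Record<Severity, Finding[]> {
  const grouped: Record<Severity, Finding[]> = { P0: [], P1: [], P2: [] };
  for (const finding of findings) {
    grouped[finding.severity].push(finding);
  }
  return grouped;
}
```

Grouping by severity first and category second keeps the most damaging problems at the top of the report regardless of which file they came from.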
Phase 3: Report
After completing all tests and reviews, produce a structured report:
Report Format
Archon Web UI Validation Report
Date: {date}
Tester: Claude Code (agent-browser + codebase review)
Archon Version: {git commit hash}
Screenshots: /tmp/archon-test-*.png
Executive Summary
{2-3 sentences: overall quality assessment, critical issues count, UX rating}
Critical Bugs (P0)
{Bugs that break core functionality or lose data}
Major Issues (P1)
{Issues that significantly degrade the experience}
Minor Issues (P2)
{Polish items, edge cases, visual inconsistencies}
UX Recommendations
{Suggestions for improving the user experience — not just bugs but "could be better"}
Accessibility Findings
{Keyboard nav gaps, ARIA issues, contrast problems}
Performance Observations
{Slow renders, unnecessary work, optimization opportunities}
Codebase Quality Notes
{Dead code, inconsistencies, architectural concerns}
What's Working Well
{Positive findings — features that are solid, patterns that are good}
Detailed Test Results
Dashboard Tests
| Test | Status | Notes |
|---|---|---|
| 1.1 Initial Load | PASS/FAIL | ... |
| ... |
Project Management Tests
...
Chat Interface Tests
...
Workflow Management Tests
...
Key Question to Answer
Is Archon currently doing the best it possibly can to solve the problem of managing a lot of agents in parallel and executing custom workflows with full visibility?
Specifically evaluate:
- Can users easily see what all their agents are doing at a glance?
- Is workflow status visible and understandable without clicking through multiple pages?
- Can users quickly navigate between the orchestrator chat, individual workflow runs, and task logs?
- Is the experience of kicking off a workflow through the router intuitive?
- Are parallel agents presented clearly with their individual status?
- Does the UI surface errors and issues prominently enough?
- Is the overall information architecture logical for someone managing 5-10 concurrent agents?
Execution Notes
- Run all `agent-browser` commands via the Bash tool
- Use `npx agent-browser` if not installed globally
- After each navigation, re-snapshot (`agent-browser snapshot -i`) to get fresh refs
- Take screenshots liberally and save them to `/tmp/archon-test-{section}-{name}.png`
- If a test fails, document it immediately and continue to the next test
- Use `agent-browser wait --load networkidle` after actions that trigger API calls
- For SSE testing, use `agent-browser eval` to check EventSource state
- Remember: WSL2 headless mode works fine; no display server is needed
- Close the browser session when done: `agent-browser close`
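When checking EventSource state during SSE testing, the value to inspect is the connection's `readyState`, which the EventSource interface defines as 0 (CONNECTING), 1 (OPEN), or 2 (CLOSED). The helper below is a hypothetical convenience for turning that number into a label for test notes. How the page exposes its EventSource instance for inspection is not specified here and would need to be worked out against the running app.

```typescript
// Hypothetical helper; only the numeric readyState values come from the
// EventSource interface itself.
type SseConnectionState = "connecting" | "open" | "closed" | "unknown";

function describeReadyState(readyState: number): SseConnectionState {
  switch (readyState) {
    case 0:
      return "connecting"; // connection not yet established, or reconnecting
    case 1:
      return "open"; // connection established and receiving events
    case 2:
      return "closed"; // connection closed and will not reconnect
    default:
      return "unknown"; // anything else is not a valid readyState
  }
}
```

A state of "connecting" that persists after the page has settled usually points at a reconnection loop, which is worth recording alongside the screenshot for that test.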