video-search
Video Search Workflows
Alpha Feature — not recommended for production use.
Search video archives by natural language using Cosmos Embed1 embeddings. Requires the search profile; deploy it with the `deploy` skill (`-p search`). Video sources can be ingested files or RTSP streams.
When to Use
- "Find all instances of forklifts"
- "When did someone enter the restricted area?"
- "Show me people near the loading dock"
- "Search for vehicles between 8am and noon"
- Any natural-language search across video archives
Deployment prerequisite
This skill requires the VSS `search` profile running on the host at `$HOST_IP`. Before any request:
- Probe the stack:
```bash
curl -sf --max-time 5 "http://${HOST_IP}:8000/docs" >/dev/null \
  && curl -sf --max-time 5 "http://${HOST_IP}:9200/" >/dev/null
```
  (The second check confirms Elasticsearch is up, which is unique to the search profile.)
- If the probe fails, ask the user: "The VSS `search` profile isn't running on `$HOST_IP`. Shall I deploy it now using the `/deploy` skill with `-p search`?"
  - If yes → hand off to the `/deploy` skill. Return here once it succeeds.
  - If no → stop. Do not run this skill against a missing or wrong-profile stack.
  (If your caller has granted explicit pre-authorization to deploy autonomously, e.g. the request says "pre-authorized to deploy prerequisites", or you are running in a non-interactive evaluation harness with that permission, skip the confirmation and invoke `/deploy` directly.)
- If the probe passes, proceed.
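The probe above can be wrapped in a small helper so the pass/fail decision is a single exit code. This is a minimal sketch: the function name `probe_vss_search` and its calling convention are illustrative assumptions, not part of the skill; only the two `curl` checks come from the steps above.

```bash
# Hypothetical helper: returns 0 only when both the VSS agent API (port 8000)
# and Elasticsearch (port 9200) answer on the given host.
probe_vss_search() {
  local host="$1"
  # Agent API docs page first, then the Elasticsearch root endpoint.
  curl -sf --max-time 5 "http://${host}:8000/docs" >/dev/null \
    && curl -sf --max-time 5 "http://${host}:9200/" >/dev/null
}

# Usage (HOST_IP must have been supplied by the user):
#   if probe_vss_search "$HOST_IP"; then
#     echo "search profile is up"
#   else
#     echo "probe failed: ask the user whether to deploy with -p search"
#   fi
```

A non-zero exit status only means that at least one of the two endpoints did not answer; it does not say which one.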
How Search Works
- Ingest — Videos are uploaded or streamed via VIOS. The RTVI-Embed service (Cosmos Embed1) generates vector embeddings for video segments.
- Index — Embeddings are stored in Elasticsearch via the Kafka pipeline.
- Query — Natural-language queries are embedded and matched against stored vectors by similarity.
- Results — Timestamped video segments ranked by relevance, with clip playback links.
The VSS agent orchestrates the search, which can result in one of three behaviors:
- Attribute-only: when the LLM decomposes the query and finds only appearance attributes with no action (e.g. "person wearing red jacket")
- Embed-only: when the query has no extractable attributes (e.g. "show me forklifts")
- Fusion: when the query has both an action and attributes (e.g., "person in red jacket running"), it runs embed search first, then reranks using attribute search
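The three behaviors above amount to a small decision table. The sketch below only illustrates that table: in VSS the LLM performs the query decomposition, and the function name `choose_search_mode` and its `yes`/`no` flags are hypothetical.

```bash
# Illustrative decision table only; not how the VSS agent is implemented.
# Arguments: has_attributes (yes/no), has_action (yes/no).
choose_search_mode() {
  local has_attributes="$1" has_action="$2"
  if [ "$has_attributes" = "yes" ] && [ "$has_action" = "yes" ]; then
    # Embed search first, then rerank with attribute search.
    echo "fusion"
  elif [ "$has_attributes" = "yes" ]; then
    # Appearance attributes only, no action.
    echo "attribute-only"
  else
    # No extractable attributes.
    echo "embed-only"
  fi
}

# Usage:
#   choose_search_mode yes no    # "person wearing red jacket"
#   choose_search_mode no  no    # "show me forklifts"
#   choose_search_mode yes yes   # "person in red jacket running"
```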
Mandatory workflow
When using this skill, ALWAYS follow this high-level workflow:
- Resolve inputs from user instructions. HARD STOP if `$HOST_IP` is not explicitly provided. See § Input resolution below. Do NOT default to `localhost`, `127.0.0.1`, the host the agent itself is running on, or any other guess. Do NOT issue a `POST http://.../generate` request until the user has supplied an endpoint. Respond to the user with a single question asking for `HOST_IP` / the VSS agent endpoint and wait.
- Run the search(es) via the chosen approach
- Present the results for the user query. Format the response as a professional inspection report but name it `Video Search Results`:
  - Use clear section headers
  - Organize findings individually with supporting detail, and close with a summary
  - Use tables where comparisons help. Write like a technical report, not a chat message.
- CRITICAL: Verify the results and explain this to the user concisely. If the search fails or returns unexpected results (e.g. videos that do not appear to match the user query, zero matches, zero videos returned, an error), STOP. Do not proceed without reading `troubleshooting.md`, then iterate with feedback loops until proper results are found and presented like a professional inspection report.
- Final verifications:
  - ALWAYS inform the user that final and further verifications can be run. Present this as a `Verification Step`
  - ONLY IF the user agrees, download screenshots to `/tmp` using the `screenshot_url` of the best candidates (highest similarity scores) from the search hits (JSON results). Read them and verify that they correspond to the user query
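The optional verification step can be sketched as a small download helper. It assumes the caller has already extracted the `screenshot_url` values of the best candidates from the JSON hits; the layout of that JSON is not specified here, so the extraction is left out. The function name `download_screenshots` and the `vss_hit_N` file naming are hypothetical.

```bash
# Hypothetical helper: download each screenshot URL into a directory and
# print the path of every file that was saved successfully.
# Usage: download_screenshots DEST_DIR URL [URL...]
download_screenshots() {
  local dest="$1" i=0 url out
  shift
  for url in "$@"; do
    i=$((i + 1))
    # File name is an assumption; the image format is not specified.
    out="${dest}/vss_hit_${i}"
    if curl -sf --max-time 30 -o "$out" "$url"; then
      printf '%s\n' "$out"
    else
      echo "download failed: $url" >&2
    fi
  done
}

# Usage, only after the user has agreed to the Verification Step:
#   download_screenshots /tmp "$first_screenshot_url" "$second_screenshot_url"
```

Failed downloads are reported on stderr and skipped, so the printed list contains only files that can actually be read back for verification.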
Input resolution
Infer these inputs only from the conversation or user query (no other files unless provided). If some cannot be inferred, ask the user immediately:
- $HOST_IP: where the VSS agent backend runs
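The "ask, never guess" rule for `$HOST_IP` can be enforced with a guard before any request is built. A minimal sketch; the function name `require_host_ip` and its error message are illustrative assumptions.

```bash
# Hypothetical guard: print HOST_IP if the user supplied one, otherwise fail.
# It deliberately has no fallback to localhost or 127.0.0.1.
require_host_ip() {
  if [ -z "${HOST_IP:-}" ]; then
    echo "HOST_IP is not set: ask the user for the VSS agent endpoint." >&2
    return 1
  fi
  printf '%s\n' "$HOST_IP"
}

# Usage:
#   host="$(require_host_ip)" || exit 1
```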
Gotchas
- ALWAYS enter the troubleshooting step of the workflow immediately if anything unexpected happens, and read `troubleshooting.md`
- Queries work best with concrete visual descriptions (objects, actions, locations). If needed, augment user queries with likely details to improve search quality
- A video search query presupposes that the video sources are already ingested, so there is no need to look for them locally. Assume this unless the findings show the video source is not ingested yet
- Use the `video-analytics` skill to cross-reference search results with incident/alert data
Search via REST API
Default to using this REST API approach, unless user specifies otherwise.
```bash
# Consider only ingested video file sources by default
curl -s -X POST "http://${HOST_IP}:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{"input_message": "find all instances of forklifts"}' | jq .
```
More Examples
```bash
# Search by object
curl -s -X POST "http://${HOST_IP}:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{"input_message": "find vehicles in the parking lot"}' | jq .

# Search by action
curl -s -X POST "http://${HOST_IP}:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{"input_message": "show me people running"}' | jq .

# Search by time context
curl -s -X POST "http://${HOST_IP}:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{"input_message": "what happened at the entrance between 2pm and 3pm?"}' | jq .
```
To consider only RTSP sources (i.e. live camera streams), use the `search_source_type` filter:
```bash
curl -s -X POST "http://${HOST_IP}:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{"input_message": "find all instances of forklifts", "search_source_type": "rtsp"}' | jq .
```
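When the query text comes from a user rather than a literal, the request body is easier to get right if it is built by a helper. The sketch below is one possible approach: the function names `json_escape` and `build_search_payload` are hypothetical, and the escaping covers only backslashes and double quotes in a single-line query. Only the `input_message` and `search_source_type` fields come from the examples above.

```bash
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
# Assumes a single-line query; control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the /generate request body. The second argument is optional;
# when given, it is sent as search_source_type (e.g. "rtsp").
build_search_payload() {
  local query source_type
  query="$(json_escape "$1")"
  source_type="${2:-}"
  if [ -n "$source_type" ]; then
    printf '{"input_message": "%s", "search_source_type": "%s"}\n' \
      "$query" "$(json_escape "$source_type")"
  else
    printf '{"input_message": "%s"}\n' "$query"
  fi
}

# Usage (HOST_IP must come from the user):
#   curl -s -X POST "http://${HOST_IP}:8000/generate" \
#     -H "Content-Type: application/json" \
#     -d "$(build_search_payload 'find all instances of forklifts' rtsp)" | jq .
```

Omitting the second argument produces the default body, which considers only ingested video file sources.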
Advanced control knobs
If the user query is ambiguous, the user wants more guidance, or fine-grained control is needed, augment the user `input_message` by explicitly calling out certain options in plain text to steer the agent in the desired direction. Available control axes:
| Axes | Type | Default | Description |
|---|---|---|---|
| | string[] | null | Filter to specific cameras or sensor names |
| | int | 10 | Max results |
| | float | 0.0 | Min similarity threshold; raise (e.g. 0.3) to filter noise |
| | bool | true | VLM verifies each result and removes false positives |
| | string | null | Filter by camera metadata (e.g. location, category) if metadata is available |
Pick and choose some of these tuning options. Adjust them as needed for the user’s situation and query.
For examples of discovery modes leveraging these, see discovery_modes.md.
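Since these options are passed as plain text inside `input_message`, augmenting a query is string concatenation. The sketch below appends hints for two of the axes described above (max results and minimum similarity). The function name `augment_query` and the exact wording of the hints are assumptions; how the agent interprets a given phrasing is not guaranteed.

```bash
# Hypothetical helper: append plain-text steering hints to a user query.
# Arguments: query, max results (optional), minimum similarity (optional).
augment_query() {
  local text="$1" max_results="${2:-}" min_similarity="${3:-}"
  if [ -n "$max_results" ]; then
    text="${text}. Return at most ${max_results} results"
  fi
  if [ -n "$min_similarity" ]; then
    text="${text}. Only keep results with similarity of at least ${min_similarity}"
  fi
  printf '%s\n' "$text"
}

# Usage:
#   augment_query "find all instances of forklifts" 5 0.3
#   augment_query "find all instances of forklifts" "" 0.3   # similarity hint only
```

The resulting string is then sent as the `input_message` value of the request.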
Search via Agent UI
Open `http://${HOST_IP}:3000/` and type natural-language queries:
```
find all instances of forklifts
show me people near the loading dock
when did a truck arrive at the gate?
find someone wearing a red jacket
```
Results include timestamped clips with similarity scores.
Interact via Browser (agent-browser)
```bash
npx agent-browser --auto-connect open http://${HOST_IP}:3000
npx agent-browser --auto-connect wait --load networkidle
npx agent-browser --auto-connect snapshot -i
```
Find the chat input, enter a search query, and snapshot results.