semantic-search
Semantic Search Skill
Search through files and directories for content using keyword matching and basic semantic analysis.
When to Use
✅ USE this skill when:
- Finding code that implements a feature
- Searching documentation for topics
- Locating files by their content
- Finding similar code patterns
- Researching codebase structure
When NOT to Use
❌ DON'T use this skill when:
- Searching binary files → use file tools
- Exact regex patterns → use grep
- Searching very large repos (>100k files) → use indexed search
Installation
```bash
cd /job
npm install natural compromise
```
Features
- Keyword Search: Simple text matching across files
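- Stemming: Matches regular word variations (run, runs, running); the Porter stemmer does not map irregular forms such as "ran" to "run"
- TF-IDF Scoring: Ranks results by relevance (the Node.js implementation below uses a simpler score: the fraction of a file's tokens that match the query)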
- File Filtering: Filter by extension, path patterns
- Context: Shows surrounding lines for each match
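The feature list mentions TF-IDF, which weights a term by how often it occurs in a document and by how rare it is across all documents. The following is a minimal sketch of that idea, not the skill's actual scoring code. The names `tokenize` and `tfidfScores`, the dependency-free tokenizer (no stemming), and the smoothed IDF formula `log((1 + N) / (1 + df)) + 1` are all assumptions chosen for illustration.

```javascript
// Hypothetical tokenizer: lowercase and split on non-alphanumerics (no stemming).
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Hypothetical helper: one TF-IDF score per document for the given query.
function tfidfScores(query, docs) {
  const docTokens = docs.map(tokenize);
  const n = docTokens.length;

  // Document frequency: in how many documents does each term occur?
  const df = new Map();
  for (const tokens of docTokens) {
    for (const term of new Set(tokens)) {
      df.set(term, (df.get(term) || 0) + 1);
    }
  }

  const queryTerms = [...new Set(tokenize(query))];
  return docTokens.map(tokens => {
    if (tokens.length === 0) return 0;
    let score = 0;
    for (const term of queryTerms) {
      const count = tokens.filter(t => t === term).length;
      if (count === 0) continue;
      const tf = count / tokens.length;
      // Smoothed IDF (an assumed variant): rarer terms get larger weights.
      const idf = Math.log((1 + n) / (1 + (df.get(term) || 0))) + 1;
      score += tf * idf;
    }
    return score;
  });
}

const docs = [
  'function authenticate(user) { return verify(user.token); }',
  'function render(user) { return template(user); }',
  'const token = sign(payload);'
];
console.log(tfidfScores('authenticate token', docs));
```

In this sample the first document matches both query terms and ranks highest, the third matches only the more common term `token`, and the second scores zero.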
Usage
Basic Search
```javascript
const { searchFiles } = require('./semantic-search');

// CommonJS modules do not support top-level await, so wrap the call.
(async () => {
  const results = await searchFiles('.', {
    query: 'authentication middleware',
    extensions: ['.js', '.ts'],
    maxResults: 20
  });
  console.log(results);
})();
```
Advanced Search
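```javascript
const { searchFiles } = require('./semantic-search');

(async () => {
  const results = await searchFiles('/path/to/code', {
    query: 'error handling database',
    excludeDirs: ['node_modules', 'dist', '.git'],
    extensions: ['.js', '.ts', '.py'],
    contextLines: 3,   // lines shown before and after each match
    maxResults: 50,
    minScore: 0.3      // drop files scoring at or below this value
  });
  console.log(results);
})();
```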
Node.js Implementation
```javascript
const fs = require('fs');
const path = require('path');
const natural = require('natural');

class SemanticSearcher {
  constructor(options = {}) {
    this.stemmer = natural.PorterStemmer;
    this.tokenizer = new natural.WordTokenizer();
    this.maxFileSize = options.maxFileSize || 1024 * 1024; // 1 MB
    this.excludeDirs = options.excludeDirs || [
      'node_modules', 'dist', 'build', '.git', 'vendor',
      '__pycache__', '.next', '.nuxt'
    ];
  }

  // Lowercase, split into word tokens, and reduce each token to its stem.
  tokenize(text) {
    return this.tokenizer.tokenize(text.toLowerCase())
      .map(token => this.stemmer.stem(token));
  }

  // Term frequency normalized by the most frequent term.
  // Not used by scoreDocument below; kept as a building block for TF-IDF.
  calculateTF(tokens) {
    // Null-prototype object so tokens like "constructor" are counted correctly.
    const tf = Object.create(null);
    let maxFreq = 0;
    tokens.forEach(token => {
      tf[token] = (tf[token] || 0) + 1;
      maxFreq = Math.max(maxFreq, tf[token]);
    });
    Object.keys(tf).forEach(key => {
      tf[key] /= maxFreq;
    });
    return tf;
  }

  // Fraction of document tokens that also appear in the query (0.0 to 1.0).
  scoreDocument(queryTokens, docTokens) {
    const querySet = new Set(queryTokens);
    let score = 0;
    docTokens.forEach(token => {
      if (querySet.has(token)) score++;
    });
    return score / Math.max(docTokens.length, 1);
  }

  async searchFiles(rootDir, query, options = {}) {
    const queryTokens = this.tokenize(query);
    // Use ?? so that explicit zeros (minScore: 0, contextLines: 0) are honored.
    const minScore = options.minScore ?? 0.1;
    const contextLines = options.contextLines ?? 2;
    const results = [];
    const files = await this.walkDirectory(rootDir, options);
    for (const file of files) {
      try {
        const content = await fs.promises.readFile(file, 'utf-8');
        const score = this.scoreDocument(queryTokens, this.tokenize(content));
        if (score > minScore) {
          const lines = content.split(/\r?\n/);
          results.push({
            file: path.relative(rootDir, file),
            score,
            matches: this.findMatchingLines(lines, queryTokens, contextLines),
            totalLines: lines.length
          });
        }
      } catch (e) {
        // Skip unreadable files
      }
    }
    // Sort on the numeric score, then format it as a string for output.
    return results
      .sort((a, b) => b.score - a.score)
      .slice(0, options.maxResults || 20)
      .map(r => ({ ...r, score: r.score.toFixed(3) }));
  }

  async walkDirectory(dir, options = {}) {
    const files = [];
    const extensions = options.extensions || null;
    // Arrow function so `this` still refers to the searcher in recursive calls.
    const walk = async (currentDir) => {
      const entries = await fs.promises.readdir(currentDir, { withFileTypes: true });
      for (const entry of entries) {
        const entryPath = path.join(currentDir, entry.name);
        if (entry.isDirectory()) {
          if (!this.excludeDirs.includes(entry.name)) {
            await walk(entryPath);
          }
        } else if (entry.isFile()) {
          if (!extensions || extensions.some(ext => entry.name.endsWith(ext))) {
            const stats = await fs.promises.stat(entryPath);
            if (stats.size <= this.maxFileSize) {
              files.push(entryPath);
            }
          }
        }
      }
    };
    await walk(dir);
    return files;
  }

  // Return up to 10 matching lines, each with surrounding context.
  findMatchingLines(lines, queryTokens, contextLines) {
    const querySet = new Set(queryTokens);
    const matches = [];
    lines.forEach((line, index) => {
      const matchCount = this.tokenize(line).filter(t => querySet.has(t)).length;
      if (matchCount > 0) {
        const start = Math.max(0, index - contextLines);
        const end = Math.min(lines.length, index + contextLines + 1);
        matches.push({
          lineNumber: index + 1,
          content: line.trim(),
          context: lines.slice(start, end).join('\n'),
          matchScore: matchCount
        });
      }
    });
    return matches.slice(0, 10);
  }
}

// Convenience wrapper matching the call shape in the Usage section:
// searchFiles(rootDir, { query, ...options }).
function searchFiles(rootDir, { query, ...options } = {}) {
  return new SemanticSearcher(options).searchFiles(rootDir, query, options);
}

module.exports = { SemanticSearcher, searchFiles };

// Usage (runs only when this file is executed directly)
if (require.main === module) {
  (async () => {
    const searcher = new SemanticSearcher();
    const results = await searcher.searchFiles('.', 'authentication', {
      extensions: ['.js', '.ts'],
      maxResults: 10
    });
    console.log(JSON.stringify(results, null, 2));
  })().catch(err => {
    console.error(err);
    process.exitCode = 1;
  });
}
```
Command Line Usage
```bash
# Search for authentication code
node index.js search "auth middleware" --ext .js,.ts --max 10

# Search with context
node index.js search "error handling" --context 5

# Search a specific directory
node index.js search "database" --dir src/
```
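The `index.js` entry point is not shown in this document. As a rough sketch of how the flags in the commands above could be parsed, the following uses Node's built-in `util.parseArgs` (available since Node 18.3). The function name `parseSearchArgs`, the returned option names, and the defaults (taken from the fallbacks in the Node.js implementation) are assumptions, not the skill's actual CLI code.

```javascript
const { parseArgs } = require('node:util');

// Hypothetical parser for:
//   node index.js search "<query>" [--ext .js,.ts] [--max N] [--context N] [--dir PATH]
function parseSearchArgs(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true,
    options: {
      ext: { type: 'string' },
      max: { type: 'string' },
      context: { type: 'string' },
      dir: { type: 'string' }
    }
  });

  const [command, query] = positionals;
  if (command !== 'search' || !query) {
    throw new Error('usage: node index.js search "<query>" [--ext .js,.ts] [--max N] [--context N] [--dir PATH]');
  }

  return {
    rootDir: values.dir ?? '.',
    query,
    // "--ext .js,.ts" becomes ['.js', '.ts']; null means "all extensions".
    extensions: values.ext ? values.ext.split(',').map(e => e.trim()) : null,
    maxResults: values.max ? Number.parseInt(values.max, 10) : 20,
    contextLines: values.context ? Number.parseInt(values.context, 10) : 2
  };
}

console.log(parseSearchArgs(['search', 'auth middleware', '--ext', '.js,.ts', '--max', '10']));
```

In a real entry point the function would be called with `process.argv.slice(2)`, and the returned fields passed on to the searcher.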
Output Format
```json
{
  "query": "authentication middleware",
  "totalMatches": 5,
  "results": [
    {
      "file": "src/middleware/auth.js",
      "score": "0.847",
      "matches": [
        {
          "lineNumber": 42,
          "content": "function authenticateUser(token) {",
          "context": "...",
          "matchScore": 3
        }
      ]
    }
  ]
}
```
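The `searchFiles` method in the Node.js implementation returns a plain array of per-file results, while the output above wraps that array in an object with `query` and `totalMatches`. A minimal sketch of such a wrapper follows. The name `formatOutput` is hypothetical, and the meaning of `totalMatches` is an assumption (here: matched lines summed over all files), since the document does not define it. The sample data is illustrative only.

```javascript
// Hypothetical helper: wrap per-file results in the envelope shown above.
// Assumption: totalMatches counts matched lines summed over all files.
function formatOutput(query, results) {
  const totalMatches = results.reduce((sum, r) => sum + r.matches.length, 0);
  return { query, totalMatches, results };
}

// Illustrative sample data in the shape returned by searchFiles.
const sampleResults = [
  {
    file: 'src/middleware/auth.js',
    score: '0.847',
    matches: [
      { lineNumber: 42, content: 'function authenticateUser(token) {', context: '...', matchScore: 3 }
    ],
    totalLines: 120
  }
];

console.log(JSON.stringify(formatOutput('authentication middleware', sampleResults), null, 2));
```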
Quick Tips
- Use specific terms: "JWT validation" not just "auth"
- Narrow by file type: restrict the search to extensions such as ".js", since code patterns differ between file types
- Multiple words improve accuracy
- Use camelCase terms for code search
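On the camelCase tip: a tokenizer that lowercases text and splits only on non-word characters treats an identifier such as `authenticateUser` as a single token, so a query for "authenticate user" would likely miss it. One possible extension, sketched below and not part of the skill as shown, is to emit the sub-words of each identifier alongside the whole identifier. The name `splitIdentifier` is hypothetical.

```javascript
// Hypothetical helper: split camelCase, PascalCase, and snake_case identifiers
// into lowercase sub-words, keeping the whole identifier as the first token.
function splitIdentifier(identifier) {
  const parts = identifier
    .replace(/([a-z0-9])([A-Z])/g, '$1 $2')     // fooBar   -> foo Bar
    .replace(/([A-Z]+)([A-Z][a-z])/g, '$1 $2')  // JWTToken -> JWT Token
    .split(/[\s_]+/)
    .filter(Boolean)
    .map(p => p.toLowerCase());
  const whole = identifier.toLowerCase();
  return parts.length > 1 ? [whole, ...parts] : [whole];
}

console.log(splitIdentifier('authenticateUser')); // ['authenticateuser', 'authenticate', 'user']
console.log(splitIdentifier('parseJWTToken'));    // ['parsejwttoken', 'parse', 'jwt', 'token']
```

Keeping the whole identifier as a token means exact camelCase queries still match, while the sub-words let plain-language queries reach the same line.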
Notes
- Searches text files only
- Case-insensitive matching
- Stemming improves recall
- Scores range from 0.0 to 1.0
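The notes say only text files are searched, but the Node.js implementation shown reads every file that passes the extension and size filters as UTF-8. One common heuristic for skipping binary files, offered here as an assumption rather than the skill's actual behavior, is to look for a NUL byte near the start of the file. The name `looksBinary` and the 8000-byte sample size are hypothetical choices.

```javascript
// Hypothetical heuristic: treat a buffer as binary if its leading bytes
// contain a NUL (0x00), which is very rare in UTF-8 text files.
// Caveat: UTF-16 encoded text contains NUL bytes and would be flagged too.
function looksBinary(buffer, sampleSize = 8000) {
  const sample = buffer.subarray(0, sampleSize);
  return sample.includes(0);
}

console.log(looksBinary(Buffer.from('const x = 1;\n', 'utf-8')));      // false
console.log(looksBinary(Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x00]))); // true
```

To apply it, a file could be read with `fs.promises.readFile(file)` (no encoding argument, which yields a `Buffer`), checked with `looksBinary`, and decoded only if the check passes.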