browser-automation

Browser Automation

When to Use This Skill

  • Setting up browser automation (Playwright, Puppeteer, Selenium)
  • Testing Chrome extensions (Manifest V3)
  • Cloud browser testing (LambdaTest, BrowserStack)
  • Using Chrome DevTools Protocol (CDP)
  • Handling dynamic/lazy-loaded content
  • Debugging automation issues

Tool Selection

Playwright (Recommended)

bash
npm install -D @playwright/test
Pros:
  • Multi-browser support (Chromium, Firefox, WebKit), covering Chrome, Edge, and Safari's engine
  • Built-in test runner with great DX
  • Auto-wait mechanisms reduce flakiness
  • Excellent debugging tools (trace viewer, inspector)
  • Strong TypeScript support
Use when:
  • Cross-browser testing needed
  • Writing end-to-end tests
  • TypeScript project

Puppeteer

bash
npm install puppeteer
Pros:
  • Simpler API, easier to learn
  • Smaller footprint
  • Direct Chrome/Chromium control
  • Official Chrome team project
Use when:
  • Only need Chrome
  • Simple automation tasks
  • Quick scripts/prototypes

Selenium

bash
npm install selenium-webdriver
Use when:
  • Legacy projects already using it
  • Multi-language team
  • Need specific Selenium features

AI-Powered Automation

Stagehand
bash
npm install @browserbasehq/stagehand
AI framework from Browserbase that automates web tasks using an LLM (such as Claude) + CDP.
Use when:
  • Complex multi-step web workflows
  • Dynamic/changing UIs
  • Natural language task descriptions
  • Have budget for LLM API calls
Not suitable for:
  • Chrome extension testing
  • Simple, predictable automation
  • Cost-sensitive projects

Browser-Use
bash
pip install browser-use
Python library for LLM-controlled browser automation.
Use when:
  • Python-based projects
  • Need AI to navigate/interact with sites
  • Exploratory automation

Skyvern
Vision-based web automation using computer vision + LLMs.
Use when:
  • Sites with no accessible DOM selectors
  • Need to handle CAPTCHAs/complex visuals
  • Budget for vision API calls

Chrome Extension Testing

Local Testing (Recommended)

For Manifest V3 Extensions:
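typescript
// playwright.config.ts
import path from 'path'
import { defineConfig } from '@playwright/test'

// Assumed location of the unpacked extension build
const extensionPath = path.resolve('dist')

export default defineConfig({
  use: {
    headless: false,
    // Chromium flags go under launchOptions, not directly under `use`
    // (Playwright's own docs load extensions via chromium.launchPersistentContext)
    launchOptions: {
      args: [
        `--disable-extensions-except=${extensionPath}`,
        `--load-extension=${extensionPath}`,
      ],
    },
  },
})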
Find extension ID via CDP:
typescript
const client = await context.newCDPSession(page)
const { targetInfos } = await client.send('Target.getTargets')

const extensionTarget = targetInfos.find((target: any) =>
  target.type === 'service_worker' &&
  target.url.startsWith('chrome-extension://')
)

// Optional chaining avoids a TypeError when no service worker target exists yet
const extensionId = extensionTarget?.url.match(/chrome-extension:\/\/([^\/]+)/)?.[1]
Navigate to extension pages:
typescript
await page.goto(`chrome-extension://${extensionId}/popup.html`)
await page.goto(`chrome-extension://${extensionId}/options.html`)
await page.goto(`chrome-extension://${extensionId}/sidepanel.html`)
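
The ID lookup above silently yields `undefined` when the service worker has not started yet. A minimal sketch of a helper that fails loudly instead; the `TargetInfo` interface is a trimmed-down assumption about the shape returned by `Target.getTargets`, and the sample targets are made up for illustration:

```typescript
// Minimal shape of the entries returned by CDP's Target.getTargets (assumed subset).
interface TargetInfo {
  targetId: string
  type: string
  url: string
}

// Returns the ID of the first extension service worker, or throws if the
// extension has not registered one yet (for example, right after launch).
function findExtensionId(targetInfos: TargetInfo[]): string {
  const target = targetInfos.find(
    (t) => t.type === 'service_worker' && t.url.startsWith('chrome-extension://'),
  )
  const id = target?.url.match(/^chrome-extension:\/\/([^/]+)/)?.[1]
  if (!id) {
    throw new Error('No extension service worker target found')
  }
  return id
}

// Hypothetical sample data, for illustration only.
const sampleTargets: TargetInfo[] = [
  { targetId: 'A1', type: 'page', url: 'https://example.com/' },
  { targetId: 'B2', type: 'service_worker', url: 'chrome-extension://abcdefghijklmnop/background.js' },
]
const sampleExtensionId = findExtensionId(sampleTargets)
```

In a real test, `targetInfos` would come from `client.send('Target.getTargets')`, and the thrown error is a natural place to add a retry.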

Cloud Testing Limitations

What works:
  • Extension uploads to LambdaTest/BrowserStack
  • Extensions load in cloud browsers
  • Service workers run
  • Can test content scripts on regular sites
What doesn't work:
  • Cannot navigate to chrome-extension:// URLs
  • All attempts blocked with net::ERR_BLOCKED_BY_CLIENT
Why: Cloud platforms block extension URLs for security in shared environments.
Verdict: Use local testing for extension UI. Use cloud platforms only for content script testing.
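
One way to encode that split in a test suite is a small routing function. This is only a sketch: `chooseRunner` and the `Runner` type are hypothetical names, and it assumes each test can list the URLs it opens.

```typescript
type Runner = 'local' | 'cloud'

// chrome-extension:// pages (popup, options, side panel) are blocked on the
// cloud platforms described above, so tests that open them must run locally.
function chooseRunner(urlsVisited: string[]): Runner {
  const touchesExtensionUi = urlsVisited.some((url) =>
    url.startsWith('chrome-extension://'),
  )
  return touchesExtensionUi ? 'local' : 'cloud'
}
```

A content script test that only visits regular sites is routed to the cloud; anything that opens an extension page stays local.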

Chrome DevTools Protocol (CDP)

Get All Browser Targets

typescript
const client = await context.newCDPSession(page)
const { targetInfos } = await client.send('Target.getTargets')

const extensions = targetInfos.filter(t => t.type === 'service_worker')
const pages = targetInfos.filter(t => t.type === 'page')
const workers = targetInfos.filter(t => t.type === 'worker')

Execute Code in Extension Context

typescript
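// Attach to the extension service worker; the response carries a sessionId
const { sessionId } = await client.send('Target.attachToTarget', {
  targetId: extensionTarget.targetId,
  flatten: true,
})

// Caution: commands sent on `client` run in the session `client` is bound to.
// To evaluate inside the service worker, route the command to `sessionId`
// (or, in Playwright, use context.serviceWorkers() and worker.evaluate()).
await client.send('Runtime.evaluate', {
  expression: `
    chrome.storage.local.get(['key']).then(console.log)
  `,
  awaitPromise: true,
})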

Intercept Network Requests

typescript
await client.send('Network.enable')
await client.send('Network.setRequestInterception', {
  patterns: [{ urlPattern: '*' }],
})

client.on('Network.requestIntercepted', async (event) => {
  await client.send('Network.continueInterceptedRequest', {
    interceptionId: event.interceptionId,
    headers: { ...event.request.headers, 'X-Custom': 'value' },
  })
})
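
CDP marks `Network.setRequestInterception` as deprecated in favor of the Fetch domain. The same header injection could be sketched with `Fetch.enable`, `Fetch.requestPaused`, and `Fetch.continueRequest`. The `CdpClient` interface is a structural stand-in assumed for this sketch (a real session object may need a cast), and `addHeaderToAllRequests` is a hypothetical helper name:

```typescript
// Structural stand-in for a CDP session (an assumption for this sketch).
interface CdpClient {
  send(method: string, params?: Record<string, unknown>): Promise<unknown>
  on(event: string, handler: (payload: any) => void): unknown
}

interface HeaderEntry {
  name: string
  value: string
}

// Fetch.continueRequest expects headers as an array of { name, value } pairs,
// not the plain object that request events expose.
function toHeaderEntries(headers: Record<string, string>): HeaderEntry[] {
  return Object.entries(headers).map(([name, value]) => ({ name, value }))
}

async function addHeaderToAllRequests(
  client: CdpClient,
  extraHeaders: Record<string, string>,
): Promise<void> {
  // Register the listener before enabling so no paused request is missed.
  client.on('Fetch.requestPaused', (event) => {
    void client.send('Fetch.continueRequest', {
      requestId: event.requestId,
      headers: toHeaderEntries({ ...event.request.headers, ...extraHeaders }),
    })
  })
  await client.send('Fetch.enable', { patterns: [{ urlPattern: '*' }] })
}
```

Every paused request must be continued, fulfilled, or failed; otherwise the page hangs waiting for it.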

Get Console Messages

typescript
await client.send('Runtime.enable')
await client.send('Log.enable')

client.on('Runtime.consoleAPICalled', (event) => {
  console.log('Console:', event.args.map(a => a.value))
})

client.on('Runtime.exceptionThrown', (event) => {
  console.error('Exception:', event.exceptionDetails)
})
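
Mapping over `a.value` prints `undefined` for logged objects, because CDP typically sends those with a `description` rather than a `value`. A minimal sketch of a formatter that handles the common cases; the `RemoteObject` interface is an assumed subset of CDP's `Runtime.RemoteObject`, and the function names are hypothetical:

```typescript
// Assumed subset of CDP's Runtime.RemoteObject that matters for logging.
interface RemoteObject {
  type: string
  value?: unknown
  description?: string
  unserializableValue?: string
}

// Primitives arrive in `value`; objects, functions, and errors usually carry
// only a `description`; NaN, Infinity, and bigint use `unserializableValue`.
function formatConsoleArg(arg: RemoteObject): string {
  if (arg.unserializableValue !== undefined) return arg.unserializableValue
  if (arg.type === 'undefined') return 'undefined'
  if (arg.value !== undefined) {
    return typeof arg.value === 'string' ? arg.value : JSON.stringify(arg.value)
  }
  return arg.description ?? `[${arg.type}]`
}

function formatConsoleArgs(args: RemoteObject[]): string {
  return args.map(formatConsoleArg).join(' ')
}
```

Inside the `Runtime.consoleAPICalled` handler, `formatConsoleArgs(event.args)` would replace the `event.args.map(a => a.value)` call.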

Handling Dynamic Content

Wait Strategies

typescript
// Wait for specific content
await page.waitForSelector('.product-price', { timeout: 10000 })

// Wait for network to be idle
await page.goto(url, { waitUntil: 'networkidle' })

// Wait for custom condition
await page.waitForFunction(() => {
  return document.querySelectorAll('.item').length > 10
})

Time-Based vs Scroll-Based Lazy Loading

Key insight: Some sites load content based on time elapsed, not scroll position.
Testing approach:
javascript
// Test 1: Wait with no scroll
await page.goto(url)
await page.waitForTimeout(3000)
// Await the element list first; `.length` on the pending promise is undefined
const sectionsNoScroll = (await page.$$('.section')).length

// Test 2: Scroll immediately
await page.goto(url)
await page.evaluate(() => window.scrollTo(0, 5000))
await page.waitForTimeout(500)
const sectionsWithScroll = (await page.$$('.section')).length

// If same result: site uses time-based loading
// No scroll automation needed - just wait
Benefits of detecting time-based loading:
  • Simpler automation code
  • No visual disruption
  • More reliable extraction
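
The comparison between the two test runs can be captured in a small function. This is a sketch under one assumption beyond the text: a scroll run that finds more sections than the wait-only run is taken as evidence of scroll-based loading, and anything else is treated as time-based. `classifyLazyLoading` is a hypothetical name.

```typescript
type LoadingStrategy = 'time-based' | 'scroll-based'

// Compares the element counts from the two runs described above.
// If scrolling does not reveal more sections than waiting alone, the page is
// treated as time-based; otherwise scroll automation is needed.
function classifyLazyLoading(
  countNoScroll: number,
  countWithScroll: number,
): LoadingStrategy {
  return countWithScroll > countNoScroll ? 'scroll-based' : 'time-based'
}
```

Calling `classifyLazyLoading(sectionsNoScroll, sectionsWithScroll)` after both runs gives a single value to branch on.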

Handling Lazy-Loaded Images

javascript
// Force lazy images to load
await page.evaluate(() => {
  // Handle data-src → src pattern
  document.querySelectorAll('[data-src]').forEach(el => {
    if (!el.src) el.src = el.dataset.src
  })

  // Handle loading="lazy" attribute
  document.querySelectorAll('[loading="lazy"]').forEach(el => {
    el.loading = 'eager'
  })
})

Advanced Lazy Loading Techniques

Googlebot-Style Tall Viewport

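Key insight: Googlebot doesn't scroll. It renders the page in a very tall viewport (12,140px on mobile, 9,307px on desktop), which is enough to fire IntersectionObserver callbacks for content far down the page.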
javascript
// Temporarily expand document for IntersectionObserver
async function triggerLazyLoadViaViewport() {
  const originalHeight = document.documentElement.style.height;
  const originalOverflow = document.documentElement.style.overflow;

  // Googlebot uses 12,140px mobile / 9,307px desktop
  document.documentElement.style.height = '20000px';
  document.documentElement.style.overflow = 'visible';

  // Wait for observers to trigger
  await new Promise(r => setTimeout(r, 500));

  // Restore
  document.documentElement.style.height = originalHeight;
  document.documentElement.style.overflow = originalOverflow;
}
Pros: No visible scrolling, works with standard IntersectionObserver
Cons: Won't work with scroll-event listeners or virtualized lists

IntersectionObserver Override

Patch IntersectionObserver before page loads to force everything to "intersect":
javascript
// Must inject at document_start (before page JS runs)
const script = document.createElement('script');
script.textContent = `
  const OriginalIO = window.IntersectionObserver;
  window.IntersectionObserver = function(callback, options) {
    // Override rootMargin to include everything off-screen
    const modifiedOptions = {
      ...options,
      rootMargin: '10000px 10000px 10000px 10000px'
    };
    return new OriginalIO(callback, modifiedOptions);
  };
  window.IntersectionObserver.prototype = OriginalIO.prototype;
`;
document.documentElement.prepend(script);
Pros: Elegant, works at the source, no DOM manipulation
Cons: Must inject before page JS runs, may break other functionality
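
When driving the page from Playwright rather than from an extension content script, `page.addInitScript` is one way to meet the "before page JS runs" requirement, since it evaluates the script ahead of the page's own scripts on each navigation. A minimal sketch; `InitScriptTarget` is a structural stand-in for the part of Playwright's `Page` used here, and the helper name is hypothetical:

```typescript
// Structural stand-in for the part of Playwright's Page used here (assumption).
interface InitScriptTarget {
  addInitScript(script: { content: string }): Promise<void>
}

// Source of the patch, kept as a string so it runs in the page, not in Node.
const INTERSECTION_OBSERVER_PATCH = `
  (() => {
    const OriginalIO = window.IntersectionObserver;
    if (!OriginalIO) return;
    window.IntersectionObserver = function (callback, options) {
      return new OriginalIO(callback, {
        ...options,
        rootMargin: '10000px 10000px 10000px 10000px',
      });
    };
    window.IntersectionObserver.prototype = OriginalIO.prototype;
  })();
`

// Registers the patch so it is evaluated before page scripts on each navigation.
function installIntersectionObserverPatch(page: InitScriptTarget): Promise<void> {
  return page.addInitScript({ content: INTERSECTION_OBSERVER_PATCH })
}
```

The call has to happen before `page.goto(url)`; pages that are already loaded are not patched retroactively.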

Direct Attribute Manipulation

Force lazy elements to load by modifying their attributes:
javascript
function forceLoadLazyContent() {
  // Handle data-src → src pattern
  document.querySelectorAll('[data-src]').forEach(el => {
    if (!el.src) el.src = el.dataset.src;
  });

  document.querySelectorAll('[data-srcset]').forEach(el => {
    if (!el.srcset) el.srcset = el.dataset.srcset;
  });

  // Handle background images
  document.querySelectorAll('[data-background]').forEach(el => {
    el.style.backgroundImage = `url(${el.dataset.background})`;
  });

  // Trigger lazysizes library if present
  if (window.lazySizes) {
    document.querySelectorAll('.lazyload').forEach(el => {
      window.lazySizes.loader.unveil(el);
    });
  }
}

MutationObserver for Progressive Extraction

Watch for DOM changes and extract content as it loads:
javascript
function setupProgressiveExtraction(onNewContent) {
  let debounceTimer = null;

  const observer = new MutationObserver((mutations) => {
    clearTimeout(debounceTimer);
    debounceTimer = setTimeout(() => {
      const addedNodes = mutations
        .flatMap(m => Array.from(m.addedNodes))
        .filter(n => n.nodeType === Node.ELEMENT_NODE);

      if (addedNodes.length > 0) {
        onNewContent(addedNodes);
      }
    }, 300);
  });

  observer.observe(document.body, {
    childList: true,
    subtree: true
  });

  return () => observer.disconnect();
}

Lazy Loading Decision Matrix

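| Approach               | Scrolling?     | Reliability | Complexity |
|------------------------|----------------|-------------|------------|
| Tall Viewport          | No             | Medium      | Low        |
| IO Override            | No             | Medium      | Medium     |
| Attribute Manipulation | No             | Low         | Low        |
| MutationObserver       | User-initiated | High        | Low        |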
Recommendation: Start with IO Override + Tall Viewport for most cases. Use MutationObserver when user scrolling is acceptable.

Vanity URLs vs Internal IDs

Problem: Some sites use vanity URLs that differ from internal identifiers.
URL: /user/john-smith
Internal ID: john-smith-a2b3c4d5
Solution: Match by displayed content, not URL:
javascript
// Strategy 1: Try URL-based ID
const urlId = location.pathname.split('/').pop()
let profile = findById(urlId)

// Strategy 2: Fall back to displayed name
if (!profile) {
  const displayedName = document.querySelector('h1')?.textContent?.trim()
  profile = findByName(displayedName)
}
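
The snippet above leaves `findById` and `findByName` undefined. A self-contained sketch of the same two-step fallback over an in-memory list; the `Profile` shape, the case-insensitive name match, and `resolveProfile` itself are assumptions made for illustration:

```typescript
interface Profile {
  id: string
  name: string
}

// Resolves a profile from the URL path first, then from the displayed name.
function resolveProfile(
  pathname: string,
  displayedName: string | undefined,
  profiles: Profile[],
): Profile | undefined {
  // Strategy 1: treat the last non-empty path segment as the internal ID.
  const urlId = pathname.split('/').filter(Boolean).pop()
  const byId = profiles.find((p) => p.id === urlId)
  if (byId) return byId

  // Strategy 2: fall back to the name shown on the page.
  const wanted = displayedName?.trim().toLowerCase()
  if (!wanted) return undefined
  return profiles.find((p) => p.name.toLowerCase() === wanted)
}
```

In a content script, `pathname` would be `location.pathname` and `displayedName` the trimmed text of the page's `h1`. Name matching can be ambiguous when two profiles share a name, so the ID path is tried first.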

Cloud Browser Integration

LambdaTest Setup

typescript
// playwright.lambdatest.config.ts
import { defineConfig } from '@playwright/test'
const capabilities = {
  'LT:Options': {
    'username': process.env.LT_USERNAME,
    'accessKey': process.env.LT_ACCESS_KEY,
    'platformName': 'Windows 10',
    'browserName': 'Chrome',
    'browserVersion': 'latest',
  }
}

export default defineConfig({
  projects: [{
    name: 'lambdatest',
    use: {
      connectOptions: {
        wsEndpoint: `wss://cdp.lambdatest.com/playwright?capabilities=${encodeURIComponent(JSON.stringify(capabilities))}`,
      },
    },
  }],
})
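
If `LT_USERNAME` or `LT_ACCESS_KEY` is unset, the config above still builds an endpoint and only fails at connection time. A small sketch of a builder that checks the credentials first; `buildLambdaTestEndpoint` and the `LambdaTestCapabilities` type are hypothetical names, and the endpoint URL is the one used in the config above:

```typescript
interface LambdaTestCapabilities {
  'LT:Options': Record<string, string | undefined>
}

// Serializes capabilities into the query string of the WebSocket endpoint
// used in the config above.
function buildLambdaTestEndpoint(capabilities: LambdaTestCapabilities): string {
  const options = capabilities['LT:Options']
  // Fail early instead of connecting with missing credentials.
  for (const key of ['username', 'accessKey']) {
    if (!options[key]) {
      throw new Error(`Missing LambdaTest option: ${key}`)
    }
  }
  const encoded = encodeURIComponent(JSON.stringify(capabilities))
  return `wss://cdp.lambdatest.com/playwright?capabilities=${encoded}`
}
```

The returned string can be passed as `wsEndpoint` in `connectOptions`.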

Performance Optimization

Block Unnecessary Resources

typescript
await page.route('**/*', route => {
  const type = route.request().resourceType()
  if (['image', 'font', 'media'].includes(type)) {
    route.abort()
  } else {
    route.continue()
  }
})
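
The handler above can also be pulled out into a named function so it can be reused and unit-tested without a browser. A minimal sketch; `RouteLike` is a structural stand-in for the part of Playwright's `Route` used here, and `blockHeavyResources` is a hypothetical name:

```typescript
const BLOCKED_RESOURCE_TYPES = new Set(['image', 'font', 'media'])

// Structural subset of Playwright's Route used by the handler (assumption).
interface RouteLike {
  request(): { resourceType(): string }
  abort(): Promise<void>
  continue(): Promise<void>
}

// Returning the promise lets the caller await the routing decision and
// surfaces errors that a fire-and-forget handler would drop.
function blockHeavyResources(route: RouteLike): Promise<void> {
  return BLOCKED_RESOURCE_TYPES.has(route.request().resourceType())
    ? route.abort()
    : route.continue()
}
```

Usage would be `await page.route('**/*', blockHeavyResources)`.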

Reuse Browser Context

typescript
// Good: Reuse browser, create new contexts
const browser = await chromium.launch()
for (const url of urls) {
  const context = await browser.newContext()
  const page = await context.newPage()
  // ...
  await context.close()
}
await browser.close()

Parallel Execution

typescript
import pLimit from 'p-limit'
const limit = pLimit(5) // Max 5 concurrent

await Promise.all(
  urls.map(url => limit(() => processUrl(url)))
)

Debugging

Visual Debugging

typescript
// Screenshots
await page.screenshot({ path: 'debug.png' })

// Video recording
const context = await browser.newContext({
  recordVideo: { dir: 'videos/' }
})

Trace Viewer

typescript
await context.tracing.start({ screenshots: true, snapshots: true })
// ... run test
await context.tracing.stop({ path: 'trace.zip' })

// View: npx playwright show-trace trace.zip

Slow Motion & Pause

typescript
const browser = await chromium.launch({
  headless: false,
  slowMo: 1000,
})

await page.pause() // Opens Playwright Inspector

Quick Reference

Common Selectors

typescript
// Locators are created synchronously; await the actions called on them
// CSS
page.locator('.class')
page.locator('#id')
page.locator('[data-testid="value"]')

// Text
page.locator('text="Exact text"')

// Playwright-specific
page.getByRole('button', { name: 'Submit' })
page.getByText('Welcome')
page.getByLabel('Email')

Data Extraction

typescript
// Single element
const text = await page.textContent('.element')
const attr = await page.getAttribute('.element', 'href')

// Multiple elements
const texts = await page.$$eval('.item', els => els.map(e => e.textContent))

// Complex extraction
const data = await page.evaluate(() => {
  return Array.from(document.querySelectorAll('.product')).map(el => ({
    title: el.querySelector('.title')?.textContent,
    price: el.querySelector('.price')?.textContent,
  }))
})

Common Issues

Element Not Found

typescript
// Wait for element
await page.waitForSelector('.element', { state: 'visible' })

// Check if in iframe
const frame = page.frame({ url: /example\.com/ })
if (frame) {
  await frame.waitForSelector('.element')
}

Browser Connection Lost

typescript
try {
  await page.goto(url)
} catch (error) {
  if (error.message.includes('Browser closed')) {
    browser = await chromium.launch()
    // retry
  }
}

Resources
