Loading...
Loading...
Use this skill when the user asks to parse the content of an unstructured file (PDF, PPTX, DOCX...)
npx skill4agent add run-llama/llamaparse-agent-skills llamaparseI'm ready to use LlamaParse to parse files. Before we begin, please confirm that:
- `LLAMA_CLOUD_API_KEY` is set as environment variable within the current environment
- `@llamaindex/llama-cloud@latest` is installed and available within the current Node environment
If both of them are set, please provide:
1. One or more files to be parsed
2. Specific parsing options, such as tier, API version, custom prompt, processing options...
3. Any requests you might have regarding the parsed content of the file.
I will produce a Typescript script to run the parsing job and, once you approved its execution, I will report the results back to you based on your request.llama-cloud@llamaindex/llama-cloudnpm install @llamaindex/llama-cloud@latesthttps://developers.llamaindex.ai/python/cloud/llamaparse/api-v2-guide/LlamaCloudLlamaCloudimport LlamaCloud from "@llamaindex/llama-cloud";
// Define a client
const client = new LlamaCloud({
apiKey: process.env["LLAMA_CLOUD_API_KEY"], // This is the default and can be omitted
});
parse()import { readFile, writeFile } from "fs/promises";
import { basename } from "path";
// 1. Convert the file path into a File object
const buffer = await readFile(filePath);
const fileName = basename(filePath);
const file = new File([buffer], fileName);
// 2. Upload the file to the cloud
const fileObj = await client.files.create({
file: file,
purpose: "parse",
});
// 3. Get the file ID
const fileId = fileObj.id;
// 4. Use the file ID to parse the file
const result = await client.parsing.parse({
tier: "agentic",
version: "latest",
file_id: fileId,
...
});| Tier | When to Use |
|---|---|
| Speed is the priority; simple documents |
| Budget-conscious; straightforward text extraction |
| Complex layouts, tables, mixed content (default recommendation) |
| Advanced analysis, highest accuracy |
agenticexpandexpand| Value | Returns |
|---|---|
| Plain text via |
| Markdown via |
| Page-level JSON via |
| Per-page text metadata |
| Per-page markdown metadata |
| Per-page items metadata |
| Image list with presigned URLs |
| Output PDF metadata |
| Excel-specific metadata |
*_content_metadataresult.text_fullresult.markdown_fullresult.itemsundefinedconst text = result.text_full ?? "";
const markdown = result.markdown_full ?? "";const result = await client.parsing.parse({
tier: "agentic",
version: "latest",
file_id: fileId,
input_options: {
presentation: {
skip_embedded_data: false,
},
},
output_options: {
images_to_save: ["screenshot"],
markdown: {
tables: { output_tables_as_markdown: true },
annotate_links: true,
},
},
processing_options: {
specialized_chart_parsing: "agentic",
ocr_parameters: { languages: ["de", "en"] },
},
agentic_options: {
custom_prompt:
"Extract text from the provided file and translate it from German to English.",
},
expand: [
"markdown_full",
"images_content_metadata",
"markdown_content_metadata",
],
});agentic_options.custom_prompthttpximages_content_metadataexpandif (result.images_content_metadata) {
for (const image of result.images_content_metadata.images) {
if (image.presigned_url) {
const response = await fetch(image.presigned_url, {
headers: {
Authorization: `Bearer ${process.env["LLAMA_CLOUD_API_KEY"]}`,
},
});
if (response.ok) {
const content = await response.bytes();
await writeFile(image.filename, content);
}
}
}
}#!/usr/bin/env nodeIn order to run typescript scripts, it is highly recommended to use:.npx tsx script.ts