    from scripts.text_summarizer import TextSummarizer

    summarizer = TextSummarizer(
        method="textrank",  # textrank, lsa, frequency
        language="english"
    )

| Description | Default |
|---|---|
| Input file path | Required |
| Output file path | stdout |
| Directory of files | - |
| Output directory | - |
| Summary ratio (0.0-1.0) | 0.2 |
| Number of sentences | - |
| Maximum words | - |
| Extract N key points | - |
| Algorithm to use | textrank |
| Keep sentence order | False |
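
The options above bound summary length in three ways (ratio, sentence count, maximum words) and include a flag for keeping the original sentence order. How `TextSummarizer` combines them is not stated here, so the following is only a sketch of one plausible policy. The helper name `pick_sentences` and its precedence rules (an explicit sentence count overrides the ratio, and the word limit is applied last) are assumptions for illustration, not the tool's documented behavior.

```python
def pick_sentences(sentences, scores, ratio=0.2, num_sentences=None,
                   max_words=None, keep_order=False):
    """Hypothetical helper: choose which scored sentences enter the summary."""
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")
    if not sentences:
        return []

    # Assumption: an explicit sentence count takes precedence over the ratio.
    if num_sentences is None:
        num_sentences = max(1, round(len(sentences) * ratio))
    num_sentences = min(num_sentences, len(sentences))

    # Rank sentence indices by score (highest first); ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = ranked[:num_sentences]

    # Assumption: the word budget is enforced last, skipping sentences that do not fit.
    if max_words is not None:
        kept, used = [], 0
        for i in chosen:
            n_words = len(sentences[i].split())
            if used + n_words <= max_words:
                kept.append(i)
                used += n_words
        chosen = kept

    # Optionally restore the order in which the sentences appeared in the source text.
    if keep_order:
        chosen = sorted(chosen)
    return [sentences[i] for i in chosen]
```

With the default ratio of 0.2, a five-sentence input yields a one-sentence summary. Leaving `keep_order` off returns sentences by descending score, which suits key-point lists; turning it on tends to read better as continuous prose.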

    # News article
    summarizer = TextSummarizer()
    article = """
    [Long news article text...]
    """

    # Research paper (LSA)
    summarizer = TextSummarizer(method="lsa")
    with open("research_paper.txt") as f:
        paper = f.read()

    # Meeting notes
    summarizer = TextSummarizer()
    notes = """
    Meeting started at 2pm. John presented Q3 results showing 15% growth.
    Sarah raised concerns about supply chain delays affecting Q4 projections.
    The team discussed mitigation strategies including dual-sourcing.
    Budget allocation for marketing was approved at $50k.
    Next steps include vendor outreach by Friday.
    Follow-up meeting scheduled for next Tuesday.
    """
    summary = summarizer.summarize(notes, num_sentences=3)
    points = summarizer.extract_key_points(notes, num_points=4)
    print("Summary:", summary)
    print("\nAction Items:")
    for point in points:
        print(f"• {point}")

    # Batch-process every .txt file in a directory
    import os

    summarizer = TextSummarizer()
    os.makedirs("./summaries", exist_ok=True)  # make sure the output directory exists
    for filename in os.listdir("./documents"):
        if filename.endswith(".txt"):
            with open(f"./documents/{filename}") as f:
                text = f.read()
            summary = summarizer.summarize(text, ratio=0.2)
            with open(f"./summaries/{filename}", "w") as f:
                f.write(summary)
            print(f"Summarized: {filename}")

| Algorithm | Speed | Quality | Best For |
|---|---|---|---|
| TextRank | Medium | High | General text |
| LSA | Fast | Good | Technical docs |
| Frequency | Fast | Medium | Quick summaries |
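
To make the "Frequency" row concrete, here is a minimal sketch of frequency-based extractive summarization using only the standard library. It is not the code behind `TextSummarizer(method="frequency")`; the function names, the regex sentence splitter, and the tiny stop-word list are all assumptions chosen to keep the example self-contained.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "for",
              "is", "are", "was", "were", "at", "by", "with", "that", "this", "it"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def frequency_summary(text, num_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    # Lowercased content words per sentence, with stop words removed.
    tokenized = [
        [w for w in re.findall(r"[a-z0-9']+", s.lower()) if w not in STOP_WORDS]
        for s in sentences
    ]
    # Document-wide word frequencies.
    freq = Counter(w for words in tokenized for w in words)
    # Score each sentence by the average frequency of its content words,
    # so that long sentences are not favored merely for their length.
    scores = [
        sum(freq[w] for w in words) / len(words) if words else 0.0
        for words in tokenized
    ]
    # Take the top-scoring indices (ties go to the earlier sentence).
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:num_sentences]
    return " ".join(sentences[i] for i in sorted(top))
```

Scoring is a single pass over word counts, which is consistent with the "Fast" rating in the table. Graph-based methods such as TextRank additionally compare sentences against one another, which costs more but usually ranks sentences better.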

    nltk>=3.8.0
    numpy>=1.24.0
    scikit-learn>=1.2.0