tooluniverse

Original🇺🇸 English
Translated

General strategies for using ToolUniverse effectively with 10000+ scientific tools. Covers tool discovery, multi-hop queries, comprehensive research workflows, disambiguation, evidence grading, and report generation. Use when users need to research any scientific topic, find biological data, explore drug/target/disease relationships, or need guidance on how to use ToolUniverse tools wisely.

5installs
Added on

NPX Install

npx skill4agent add mims-harvard/tooluniverse tooluniverse

ToolUniverse General Usage Strategies

Master strategies for using ToolUniverse's 10000+ scientific tools effectively. These principles apply regardless of how you access ToolUniverse (MCP server, SDK, or direct tool calls).

Core Philosophy

ToolUniverse has MANY tools - the challenge is discovering and using them effectively:
  1. Search widely - Don't assume you know all relevant tools; always search for more
  2. Query multiple databases - Cross-reference data across sources
  3. Multi-hop persistence - Many answers require 3-5 tool calls in sequence
  4. Never give up easily - If one tool fails, try alternatives
  5. Comprehensive reports - Use all available data; detail is valuable
  6. English-first queries - Always use English terms in tool calls, even if the user writes in another language

Step 0: Clarify the Request Before Researching

CRITICAL: Before starting any research, ensure you understand what the user actually needs. Wasted tool calls on the wrong entity or scope are expensive.

When to Ask Clarifying Questions

SignalExampleWhat to Clarify
Vague entity"Research cancer"Which cancer type? Which aspect (treatment, genetics, epidemiology)?
Ambiguous name"Tell me about JAK"JAK1/2/3? The gene family? A specific inhibitor?
Unclear scope"Look into metformin"Drug profile? Repurposing? Safety? Mechanism?
Missing context"What targets this?"Which compound/disease/pathway?
Multiple interpretations"ACE"ACE the gene? ACE inhibitors? ACE2?

When NOT to Ask

Proceed directly when the request is specific enough:
  • "What is the structure of EGFR kinase domain?" - Clear entity + clear data type
  • "Find all drugs targeting BRAF V600E" - Specific variant + clear task
  • "Research Alzheimer's disease comprehensively" - Broad but unambiguous

Clarification Checklist

Before starting research, confirm you know:
  1. Entity - Exactly which gene/protein/drug/disease?
  2. Species - Human unless stated otherwise
  3. Scope - Comprehensive profile or specific aspect?
  4. Output - Report, data table, quick answer, or comparison?
If any of these are unclear, ask the user one concise question covering all ambiguities rather than asking multiple rounds of questions.

Strategy 1: Exhaustive Tool Discovery

CRITICAL: ToolUniverse has 10000+ tools. Before any research task, search for ALL relevant tools.

Tool Discovery Methods

Use the tool finder tools to discover what's available:
MethodToolBest For
Keyword
Tool_Finder_Keyword
Fast search by terms
LLM-based
Tool_Finder_LLM
Intelligent matching by description
Embedding
Tool_Finder
Semantic similarity search

Discovery Best Practices

PracticeWhyExample
Search with multiple termsSame data from different angles"protein expression", "gene expression", "tissue expression"
Search by database nameFind all tools for a source"UniProt", "ChEMBL", "OpenTargets"
Search by data typeComprehensive coverage"variant", "mutation", "SNP", "polymorphism"
Search by use caseTask-oriented discovery"druggability", "target validation"

Minimum Discovery Queries

Before any research task, run at least these searches:
  1. Main topic query:
    [your topic]
  2. Related terms:
    [synonym1]
    ,
    [synonym2]
  3. Database-specific:
    [relevant database names]
  4. Data type specific:
    [required data types]
Example for target research:
  • "protein information"
  • "gene expression"
  • "drug target"
  • "UniProt", "OpenTargets"
  • "protein interaction"
  • "variant mutation"

Strategy 2: Multi-Hop Tool Chains

CRITICAL: Most scientific questions require multiple tool calls. A single tool rarely gives the complete answer.

Why Multi-Hop Matters

Question TypeSingle Tool AnswerMulti-Hop Answer
"Tell me about EGFR"Basic protein infoFull profile with structure, expression, drugs, variants, literature
"What drugs target TP53?"List of drug namesDrug details, mechanisms, clinical trials, bioactivity data
"Research Alzheimer's"Disease definitionGenes, pathways, drugs, trials, phenotypes, GWAS, literature

Common Multi-Hop Patterns

Pattern A: ID Resolution Chain

Name → ID → Data → Related Data

Example: Gene name to complete profile
1. gene_name → Ensembl ID
2. Ensembl ID → UniProt accession  
3. UniProt accession → Protein entry
4. UniProt accession → Domains
5. UniProt accession → Structure

Pattern B: Cross-Database Enrichment

Primary Data → Cross-reference → Enriched View

Example: Drug compound enrichment
1. drug_name → PubChem CID
2. drug_name → ChEMBL ID
3. CID → properties
4. ChEMBL ID → bioactivity
5. ChEMBL ID → targets
6. SMILES → ADMET predictions

Pattern C: Network Expansion

Seed Entity → Connected Entities → Entity Details

Example: Target interaction network
1. gene → protein interactions
2. For each interactor → gene info
3. Interactor → disease associations

Pattern D: Literature + Data Integration

Database Annotations → Literature Evidence → Synthesis

Example: Disease mechanism research
1. disease → associated genes
2. disease → phenotypes
3. disease → drugs
4. disease → literature
5. key papers → citations

Multi-Hop Persistence Rules

  1. Don't stop at first result - One tool gives partial data; keep going
  2. Follow cross-references - Use IDs from one tool to query others
  3. Chain until complete - 5-10 tool calls for comprehensive answers is normal
  4. Track all IDs - Save every identifier for potential future use

Strategy 3: Query Multiple Databases for Same Data

CRITICAL: Different databases have different coverage. Query ALL relevant sources.

Database Redundancy Principle

For any data type, query multiple sources:
Data TypePrimarySecondaryTertiary
Protein infoUniProtProteins APINCBI Protein
Gene expressionGTExHuman Protein AtlasArrayExpress
Drug targetsChEMBLDGIdbOpenTargets
VariantsgnomADClinVarOpenTargets
LiteraturePubMedEurope PMCOpenAlex
PathwaysReactomeKEGGWikiPathways
StructuresRCSB PDBPDBeAlphaFold
Disease associationsOpenTargetsClinVarGWAS Catalog

Merge Results Strategy

When querying multiple databases:
  1. Collect all results - Don't stop at first success
  2. Note data source - Track where each datum came from
  3. Handle conflicts - Document when sources disagree
  4. Prefer curated - Weight RefSeq over GenBank, UniProt over predictions

Strategy 3.1: Abstract Search vs Full-Text Search (Literature)

CRITICAL: Many biomedical “needle” terms (rsIDs like
rs58542926
, reagent catalog numbers, supplementary-table IDs) never appear in titles/abstracts. If you only search abstracts, you will miss papers even when they are open access.

Quick rule

  • If your keywords look like body-only terms (rsIDs, figure/table references, “Supplementary Table”), use full-text-aware tools first.

Tools that can match full text (indexed or retrieved)

GoalToolsNotes
Indexed full-text search (biomed OA)
PMC_search_papers
NCBI “pmc” database indexes full text; good for rsIDs.
Indexed full-text search (Europe PMC subset)
EuropePMC_search_articles
with
require_has_ft=true
+
fulltext_terms=[...]
Uses Europe PMC
HAS_FT:Y
+
BODY:\"...\"
fielded queries; works only when Europe PMC has indexed full text.
Best-effort full-text retrieval + keyword snippets
EuropePMC_get_fulltext_snippets
Fetches full text (XML → HTML fallbacks) and returns bounded snippets with
retrieval_trace
.
OA aggregation + (sometimes) full-text search
CORE_search_papers
Coverage varies; a paper may not exist in CORE even if OA elsewhere.
Download-and-scan fallback
CORE_get_fulltext_snippets
Local PDF scan for body-only terms when index-based search misses; can fail if the “PDF” URL returns HTML/403 (check trace/content-type).
Partial full-text indexing (not guaranteed)
openalex_search_works
/
openalex_literature_search
with
require_has_fulltext
/
fulltext_terms
Only matches works where OpenAlex has indexed full text; can miss PMC-hosted full text. Use as a secondary signal.

Recommended flow for body-only keywords

  1. Try
    PMC_search_papers
    and
    EuropePMC_search_articles
    (with
    require_has_ft
    +
    fulltext_terms
    ).
  2. If you have a PMCID/PMID, use
    EuropePMC_get_fulltext_snippets
    to confirm the term is in the paper.
  3. If you only have a PDF URL, use
    CORE_get_fulltext_snippets
    as a last resort, and treat HTTP
    200
    as “request succeeded”, not “PDF succeeded” (validate
    content_type
    ).

Strategy 4: Disambiguation First

CRITICAL: Before any research, resolve entity identity to avoid wrong data and missed results.

Why Disambiguation Matters

ProblemExampleConsequence
Naming collision"JAK" = Janus kinase OR "just another kinase"Wrong papers retrieved
Multiple IDsGene has symbol, Ensembl, Entrez, UniProt IDsMiss data in some databases
Salt formsMetformin vs metformin HCl (different CIDs)Incomplete compound data
Species ambiguityBRCA1 in human vs mouseWrong expression/function data

Disambiguation Workflow

Step 1: Establish Canonical IDs
    gene_name → UniProt, Ensembl, NCBI Gene, ChEMBL target
    compound_name → PubChem CID, ChEMBL ID, SMILES
    disease_name → EFO ID, ICD-10, UMLS CUI

Step 2: Gather Synonyms
    All aliases, alternative names, historical names
    
Step 3: Detect Naming Collisions
    Search "[TERM]"[Title] → check if results are on-topic
    Build negative filters: NOT [collision_term]
    
Step 4: Species Confirmation
    Verify organism is correct (default: Homo sapiens)

ID Types by Entity

Genes/Proteins:
  • Gene Symbol (EGFR, TP53)
  • UniProt accession (P00533)
  • Ensembl ID (ENSG00000146648)
  • NCBI Gene ID (1956)
  • ChEMBL Target ID (CHEMBL203)
Compounds:
  • PubChem CID (2244)
  • ChEMBL ID (CHEMBL25)
  • SMILES string
  • InChI/InChIKey
Diseases:
  • EFO ID (EFO_0000249)
  • ICD-10 code (G30)
  • UMLS CUI (C0002395)
  • SNOMED CT

Strategy 5: Never Give Up on Search

CRITICAL: When a tool fails or returns empty, don't give up. Try alternatives.

Failure Handling Protocol

Attempt 1: Primary tool
    ↓ fails
Wait briefly, then retry
    ↓ fails
Try fallback tool #1
    ↓ fails
Try fallback tool #2
    ↓ fails
Document as "unavailable" with reason

Common Fallback Chains

Primary ToolFallback Options
PubMed citationsEuropePMC citations → OpenAlex citations
GTEx expressionHuman Protein Atlas expression
PubChem compound lookupChEMBL search → SMILES-based lookup
ChEMBL bioactivityPubChem bioactivity summary
DailyMed drug labelsPubChem drug label info
UniProt protein entryProteins API

Alternative Search Strategies

If keyword search fails:
  • Try synonyms and aliases
  • Use broader/narrower terms
  • Try different databases
If database is empty:
  • Query related databases
  • Use literature to find mentions
  • Check if entity exists under different name
If API rate-limited:
  • Wait and retry
  • Try same query on different database
  • Use cached results if available

Strategy 6: Generate Comprehensive Reports

CRITICAL: With access to many tools, reports should be detailed and thorough.

Report-First Approach

  1. Create report structure FIRST - Define all sections before gathering data
  2. Progressively update - Fill sections as data is gathered
  3. Show findings, not process - Report results, not search methodology

Citation Requirements

Every fact must have a source:
## Protein Function

EGFR is a receptor tyrosine kinase that regulates cell growth.
*Source: UniProt (P00533)*

### Expression Profile
| Tissue | TPM | Source |
|--------|-----|--------|
| Skin | 156.3 | GTEx |
| Lung | 98.4 | GTEx |

Evidence Grading

Grade claims by evidence strength:
TierSymbolDescriptionExample
T1★★★Mechanistic with direct evidenceCRISPR KO study
T2★★☆Functional studysiRNA knockdown
T3★☆☆Association/screen hitGWAS, high-throughput screen
T4☆☆☆Review mention, text-minedReview article
In report:
ATP6V1A drives lysosomal acidification [★★★: PMID:12345678].
It has been implicated in cancer metabolism [★☆☆: TCGA data].

Mandatory Completeness

All sections must exist, even if "data unavailable":
## Pathogen Involvement
No pathogen interactions identified in literature or databases.
*Source: Literature search, UniProt annotations*

Report Quality Metrics

QualityDescriptionTool CallsSections
ExcellentMulti-database, evidence-graded30+All mandatory, detailed
GoodCross-referenced, sourced15-30All mandatory, adequate
AdequateSingle-database focus5-15Core sections only
PoorSingle tool, no sources<5Incomplete

Strategy 7: Use Specialized Skills for Specific Tasks

CRITICAL: For specific research tasks, use specialized skills (not this general skill).

Task-Specific Skill Selection

TaskRecommended Skill
Data Retrieval
Chemical compounds
tooluniverse-chemical-compound-retrieval
Expression data
tooluniverse-expression-data-retrieval
Protein structure
tooluniverse-protein-structure-retrieval
Sequence retrieval
tooluniverse-sequence-retrieval
Research & Profiling
Disease research
tooluniverse-disease-research
Drug profiling
tooluniverse-drug-research
Literature review
tooluniverse-literature-deep-research
Target analysis
tooluniverse-target-research
Clinical Decision Support
Drug safety analysis
tooluniverse-pharmacovigilance
Precision oncology treatment
tooluniverse-precision-oncology
Rare disease diagnosis
tooluniverse-rare-disease-diagnosis
Variant interpretation
tooluniverse-variant-interpretation
Discovery & Design
Small molecule binder discovery
tooluniverse-binder-discovery
Drug repurposing
drug-repurposing
Protein therapeutic design
tooluniverse-protein-therapeutic-design
Outbreak Response
Infectious disease analysis
tooluniverse-infectious-disease
Infrastructure & Development
ToolUniverse installation/setup
setup-tooluniverse
Python SDK for AI scientist systems
tooluniverse-sdk

When to Use This General Skill

Use this skill when:
  • Need general guidance on ToolUniverse usage
  • Task doesn't fit a specialized skill
  • Need to combine multiple specialized workflows
  • Exploring what's possible with ToolUniverse
  • Learning ToolUniverse best practices

Strategy 8: Parallel Execution for Speed

CRITICAL: Run independent queries simultaneously for faster research.

When to Parallelize

ParallelSequential
Different databases for same entityTool B needs output from Tool A
Multiple entities, same data typeBuilding an ID → using the ID
Independent research pathsIterating through a list of results

Parallel Research Paths Example

For target research, run these 8 paths simultaneously:
  1. Identity - Names, IDs, sequence
  2. Structure - 3D structure, domains
  3. Function - GO terms, pathways
  4. Interactions - PPI network
  5. Expression - Tissue expression
  6. Variants - Genetic variation
  7. Drugs - Known drugs, druggability
  8. Literature - Publications, trends

Strategy 9: Iterative Completeness Check

CRITICAL: After gathering data, always ask "What else is missing?" to ensure comprehensive coverage.

The Completeness Loop

Gather initial data
Review what you have
Ask: "What aspects are still missing?"
Identify gaps
Search for tools to fill gaps
Gather additional data
Repeat until comprehensive

Universal Completeness Questions

After each research phase, ask:
  1. Identity: Do I have all relevant identifiers and names?
  2. Core data: Do I have the fundamental information for this entity type?
  3. Context: Do I have surrounding/related information?
  4. Relationships: Do I know what this connects to?
  5. Variations: Do I know about variants, forms, or subtypes?
  6. Evidence: Do I have supporting data from multiple sources?
  7. Literature: Do I have recent publications on this topic?
  8. Gaps: Have I documented what's unavailable?

Gap-Filling Strategies

Gap IdentifiedStrategy
Missing data typeSearch for tools with that data type
Single source onlyQuery additional databases
Outdated informationCheck literature for recent updates
No experimental dataLook for predictions/computational data
Conflicting dataFind authoritative/curated sources
Shallow coverageDive deeper with specialized tools

When to Stop

Stop the completeness loop when:
  • All relevant aspects have been addressed (even if "not found")
  • Multiple sources queried for key data
  • Gaps are documented, not ignored
  • No obvious missing pieces remain

Self-Review Questions

Before finalizing any research:
  • Have I searched for ALL relevant tools?
  • Have I queried multiple databases?
  • Have I followed cross-references?
  • Have I checked recent literature?
  • Have I documented what's unavailable?
  • Is there any obvious gap I haven't addressed?
  • Would someone reading this ask "but what about X?"

Quick Reference: Tool Categories

Protein & Gene Tools

UniProt, Proteins API, MyGene, Ensembl tools

Structure Tools

RCSB PDB, PDBe, AlphaFold, InterPro tools

Drug & Compound Tools

ChEMBL, PubChem, DGIdb, ADMET-AI, DrugBank tools

Disease & Phenotype Tools

OpenTargets, ClinVar, GWAS, HPO tools

Expression Tools

GTEx, Human Protein Atlas, CELLxGENE tools

Variant Tools

gnomAD, ClinVar, dbSNP tools

Pathway Tools

Reactome, KEGG, WikiPathways, GO tools

Literature Tools

PubMed, EuropePMC, OpenAlex, SemanticScholar tools

Clinical Tools

ClinicalTrials.gov, FAERS, PharmGKB, DailyMed tools

Troubleshooting Common Issues

"Tool not found"

  • Search for similar tools using Tool_Finder
  • Check spelling of tool name
  • Try alternative tools for same data type

"Empty results"

  • Check spelling of query terms
  • Try synonyms/aliases
  • Try alternative databases
  • Verify IDs are correct type

"Conflicting data"

  • Note all sources
  • Prefer curated databases
  • Document the conflict in report
  • Use evidence grading

"Incomplete picture"

  • Search for more tools
  • Query additional databases
  • Follow cross-references
  • Expand via literature

Strategy 10: English-First Tool Queries

CRITICAL: Most ToolUniverse tools only accept English terms. Always translate queries to English before calling tools, regardless of the user's language.

Language Handling Rules

  1. Default to English - All tool calls must use English search terms, entity names, and parameters
  2. Translate non-English input - If the user's question is in Chinese, Japanese, Korean, or any other language, translate the relevant scientific terms to English before making tool calls
  3. Respond in the user's language - While tools must be queried in English, deliver the final report/answer in the user's original language
  4. Fallback to original language - Only if an English search returns no results, retry with the original-language terms
  5. Check tool descriptions - A few tools may explicitly document multi-language support; use the original language only when the tool description says so

Examples

User (Chinese): "研究EGFR靶点"
  → Tool calls: use "EGFR", "epidermal growth factor receptor" (English)
  → Report: deliver in Chinese

User (Japanese): "メトホルミンの安全性プロファイル"
  → Tool calls: use "metformin", "safety profile" (English)
  → Report: deliver in Japanese

User (Korean): "알츠하이머병 관련 유전자"
  → Tool calls: use "Alzheimer's disease", "associated genes" (English)
  → Report: deliver in Korean

Why This Matters

ScenarioWrong ApproachCorrect Approach
User asks in Chinese about "二甲双胍"Pass "二甲双胍" to PubChem searchTranslate to "metformin", search in English
User asks in Japanese about a diseasePass Japanese disease name to OpenTargetsTranslate to English disease name first
User asks in Spanish about a genePass Spanish description to toolUse standard gene symbol (e.g., TP53)

Summary: The ToolUniverse Mindset

PrincipleAction
Clarify firstConfirm entity, scope, species, and output before researching
Search widely10000+ tools; always discover more
Multi-hop persistence5-10 tool calls per question is normal
Cross-referenceQuery multiple databases for same data
Disambiguate firstResolve IDs before research
Never give upFallbacks for every failure
Report comprehensivelyDetail with sources and evidence grades
Use specialized skillsApply domain-specific skills for focused tasks
Execute in parallelSpeed through concurrent execution
Check completenessAsk "what's missing?" and fill gaps iteratively
English-first queriesTranslate to English for tool calls; respond in user's language
The goal: Transform 10000+ tools into comprehensive, reliable scientific intelligence.