Data file fetching and caching for geoscience applications. Download sample datasets with automatic caching, checksum verification, and multiple download sources. Use when Claude needs to: (1) Download datasets from URLs or DOIs, (2) Cache files locally with automatic verification, (3) Verify file integrity with SHA256/MD5 hashes, (4) Extract compressed archives (ZIP, TAR, GZIP), (5) Create data registries for reproducible workflows, (6) Fetch from Zenodo or other repositories.
```bash
npx skill4agent add steadfastasart/geoscience-skills pooch
```

```python
import pooch

# Download single file
file_path = pooch.retrieve(
    url="https://example.com/data.csv",
    known_hash="sha256:abc123...",  # None to skip verification
    fname="data.csv",
    path=pooch.os_cache("myproject"),
)

# Create registry for multiple files
REGISTRY = pooch.create(
    path=pooch.os_cache("myproject"),
    base_url="https://example.com/data/",
    registry={"data.csv": "sha256:abc123...", "model.nc": "sha256:def456..."},
)
data_file = REGISTRY.fetch("data.csv")

# Generate hash for local file (SHA256 by default)
file_hash = pooch.file_hash("/path/to/file.csv")
```

| Function | Purpose |
|---|---|
| `pooch.retrieve()` | Download single file with caching |
| `pooch.create()` | Create custom data registry |
| `pooch.file_hash()` | Generate SHA256/MD5 hash of file |
| `pooch.os_cache()` | Get OS-specific cache directory |
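The registry passed to `pooch.create()` maps each file name to an `"<algorithm>:<hex digest>"` string. A minimal stdlib sketch of producing such a dict for every file in a flat local directory, assuming SHA256; `build_registry` is a hypothetical helper, not part of pooch (pooch itself offers `pooch.make_registry()`, which writes a registry file rather than returning a dict). The hex digest should match what `pooch.file_hash()` returns for the same file.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large datasets never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_registry(directory: str) -> dict[str, str]:
    """Hypothetical helper: map each file name to a "sha256:<hex>" string."""
    root = Path(directory)
    return {
        path.name: "sha256:" + sha256_of(path)
        for path in sorted(root.iterdir())
        if path.is_file()  # subdirectories are skipped in this flat sketch
    }


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "data.csv").write_text("x,y\n1,2\n")
        for name, known_hash in build_registry(tmp).items():
            print(name, known_hash)
```

The resulting dict can be pasted into the `registry=` argument of `pooch.create()`.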
```python
import pooch

# With hash verification
file_path = pooch.retrieve(
    url="https://example.com/data.nc",
    known_hash="sha256:abc123...",
)

# Without verification (development only)
file_path = pooch.retrieve(url="https://example.com/data.nc", known_hash=None)

# From Zenodo DOI ("doi:<DOI>/<file name>")
file_path = pooch.retrieve(
    url="doi:10.5281/zenodo.1234567/data.zip",
    known_hash="sha256:abc123...",
)
```

```python
import pooch

# ZIP archive: returns a list of paths to the extracted files
files = pooch.retrieve(
    url="https://example.com/data.zip",
    known_hash="sha256:abc123...",
    processor=pooch.Unzip(),
)

# Decompress single gzip file
file_path = pooch.retrieve(
    url="https://example.com/data.csv.gz",
    known_hash="sha256:abc123...",
    processor=pooch.Decompress(name="data.csv"),
)
```

```python
import pooch

url = "https://example.com/data.nc"
known_hash = "sha256:abc123..."  # avoid naming this `hash`, which shadows the builtin

# Progress bar for large downloads (requires the tqdm package)
file_path = pooch.retrieve(url=url, known_hash=known_hash, progressbar=True)

# HTTP authentication
file_path = pooch.retrieve(
    url="https://example.com/protected/data.csv",
    known_hash=None,
    downloader=pooch.HTTPDownloader(auth=("user", "pass")),
)
```

| Processor | Purpose |
|---|---|
| `pooch.Unzip()` | Extract ZIP archives |
| `pooch.Untar()` | Extract TAR/TAR.GZ archives |
| `pooch.Decompress()` | Decompress gzip, bz2, lzma, xz |
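To show roughly what these processors do once a download finishes, here is a stdlib sketch of the two most common cases: unpacking a ZIP into a sibling directory and returning the extracted file paths (as `pooch.Unzip()` returns a list of paths), and decompressing a single gzip file to a chosen name (as `Decompress(name=...)` does). The helper names and the `.unzip` directory suffix are assumptions for illustration, not pooch's internals.

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def unzip_to_sibling(archive: str) -> list[str]:
    """Extract a ZIP next to itself and return the extracted file paths."""
    extract_dir = Path(archive + ".unzip")  # assumed naming convention
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(extract_dir)  # zipfile strips absolute and ".." parts
        names = [n for n in zf.namelist() if not n.endswith("/")]
    return sorted(str(extract_dir / n) for n in names)


def gunzip_to(archive: str, name: str) -> str:
    """Decompress a single .gz file into `name` in the same directory."""
    target = Path(archive).with_name(name)
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams, so large files are fine
    return str(target)
```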

| OS | Default Path |
|---|---|
| Linux | |
| macOS | |
| Windows | |
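The cache directory is what makes repeated calls cheap: a file that is already present and matches its hash is reused rather than downloaded again. A minimal sketch of that decision, assuming three outcomes: "download" (nothing cached), "update" (cached copy fails the hash check) and "fetch" (reuse the cached copy). `decide_action` is a hypothetical helper modeled on pooch's caching behavior, not its API, and it only handles SHA256.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_hex(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def decide_action(path: str, known_hash: Optional[str]) -> str:
    """Return "download", "update" or "fetch" for a cached file."""
    cached = Path(path)
    if not cached.exists():
        return "download"  # nothing cached yet
    if known_hash is None:
        return "fetch"  # no hash to check, trust the cached copy
    # Accept "sha256:<hex>" or a bare hex digest (treated as SHA256).
    algorithm, _, expected = known_hash.rpartition(":")
    if algorithm not in ("", "sha256"):
        raise ValueError(f"this sketch only handles sha256, got {algorithm!r}")
    if sha256_hex(cached) != expected.lower():
        return "update"  # stale or corrupted copy: download again
    return "fetch"  # cached copy is valid
```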
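
```python
import pooch
import requests  # pooch uses requests for HTTP(S) downloads

url = "https://example.com/data.csv"
try:
    file_path = pooch.retrieve(url=url, known_hash="sha256:abc123...")
except requests.exceptions.HTTPError:
    print("Download failed - check URL")
except requests.exceptions.ConnectionError:
    print("Network issue")
except ValueError:
    print("Hash mismatch - file does not match known_hash")
```

| Tool | Best For | Limitations |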
|---|---|---|
| pooch | Reproducible data downloads, hash verification, caching | Not a version control system |
| urllib/requests | Simple one-off downloads, custom HTTP logic | No caching, no hash verification |
| DVC | Data version control alongside git | Heavier setup, requires remote storage |
| wget | Quick command-line downloads | No Python integration, no caching logic |
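The `urllib/requests` row is easiest to appreciate by writing the missing pieces by hand. A minimal stdlib sketch, assuming SHA256 hashes: download only when the cached file is missing, verify the digest, and raise `ValueError` on a mismatch (the same exception type pooch raises when a download fails hash verification). `fetch_verified` is hypothetical, and a `file://` URL is used so the example runs offline.

```python
import hashlib
import os
import tempfile
import urllib.request
from pathlib import Path


def fetch_verified(url: str, known_sha256: str, cache_dir: str, fname: str) -> str:
    """Download `url` into `cache_dir/fname` once, verifying its SHA256 digest."""
    target = Path(cache_dir) / fname
    if target.exists():
        return str(target)  # cache hit: skip the network entirely
    target.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary file first so a failed or corrupted
    # transfer never leaves a bad file at the cached location.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent)
    os.close(fd)
    try:
        urllib.request.urlretrieve(url, tmp_name)
        digest = hashlib.sha256(Path(tmp_name).read_bytes()).hexdigest()
        if digest != known_sha256.lower():
            raise ValueError(
                f"SHA256 mismatch for {url}: expected {known_sha256}, got {digest}"
            )
        os.replace(tmp_name, target)  # atomic move into the cache
    finally:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)  # only left behind on failure
    return str(target)
```

Even this sketch skips re-verifying cached copies, retries, progress reporting and archive handling, which is the gap pooch fills.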
`pooch.file_hash()`, `pooch.create()`, `REGISTRY.fetch()`, `Unzip()`, `Untar()`, `Decompress()`