# Runpod Flash

Complete knowledge of the runpod-flash framework - SDK, CLI, architecture, deployment, and codebase. Use when working with runpod-flash code, writing @remote functions, configuring resources, debugging deployments, or understanding the framework internals. Triggers on "flash", "runpod-flash", "@remote", "serverless", "deploy", "LiveServerless", "LoadBalancer", "GpuGroup".
runpod-flash (v1.0.0) is a Python SDK for distributed execution of AI workloads on RunPod's serverless infrastructure. Write Python functions locally, decorate them with `@remote`, and Flash handles GPU/CPU provisioning, dependency management, and data transfer.

- Package: `pip install runpod-flash`
- Import: `from runpod_flash import remote, LiveServerless, GpuGroup, ...`
- CLI: `flash`
- Python: >=3.10, <3.15
## Getting Started

### 1. Install Flash

```bash
pip install runpod-flash
```

### 2. Set your RunPod API key

Get a key from RunPod account settings, then either export it:

```bash
export RUNPOD_API_KEY=your_api_key_here
```

Or save it in a `.env` file in your project directory (Flash auto-loads it via `python-dotenv`):

```bash
echo "RUNPOD_API_KEY=your_api_key_here" > .env
```

### 3. Write and run a remote function

```python
import asyncio
from runpod_flash import remote, LiveServerless

gpu_config = LiveServerless(name="my-first-worker")

@remote(resource_config=gpu_config, dependencies=["torch"])
async def gpu_task(data):
    import torch
    tensor = torch.tensor(data, device="cuda")
    return {"sum": tensor.sum().item(), "gpu": torch.cuda.get_device_name(0)}

async def main():
    result = await gpu_task([1, 2, 3, 4, 5])
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
```

First run takes ~1 minute (endpoint provisioning). Subsequent runs take ~1 second.
### 4. Or create a Flash API project

```bash
flash init my_project
cd my_project
pip install -r requirements.txt
# Edit .env and add your RUNPOD_API_KEY
flash run                   # Start local FastAPI server at localhost:8888
flash run --auto-provision  # Pre-deploy all endpoints (faster testing)
```

API explorer available at http://localhost:8888/docs.

### 5. Build and deploy to production

```bash
flash build                               # Scan @remote functions, package artifact
flash build --exclude torch,torchvision   # Exclude packages in base image (500MB limit)
flash deploy new production               # Create deployment environment
flash deploy send production              # Upload and deploy
flash deploy list                         # List environments
flash deploy info production              # Show details
flash deploy delete production            # Tear down
```

## Core Concept: The @remote Decorator
The `@remote` decorator marks functions for remote execution on RunPod infrastructure. Code inside the decorated function runs remotely; code outside runs locally.

```python
from runpod_flash import remote, LiveServerless

config = LiveServerless(name="my-worker")

@remote(resource_config=config, dependencies=["torch", "numpy"])
async def gpu_compute(data):
    import torch  # MUST import inside function
    tensor = torch.tensor(data, device="cuda")
    return {"result": tensor.sum().item()}

result = await gpu_compute([1, 2, 3])
```

### @remote Signature

```python
def remote(
    resource_config: ServerlessResource,    # Required: GPU/CPU config
    dependencies: list[str] = None,         # pip packages
    system_dependencies: list[str] = None,  # apt-get packages
    accelerate_downloads: bool = True,      # CDN acceleration
    local: bool = False,                    # Execute locally (testing)
    method: str = None,                     # HTTP method (LoadBalancer only)
    path: str = None,                       # HTTP path (LoadBalancer only)
)
```

### CRITICAL: Cloudpickle Scoping Rules
Functions decorated with `@remote` are serialized with cloudpickle. They can ONLY access:

- Function parameters
- Local variables defined inside the function
- Imports done inside the function
- Built-in Python functions

They CANNOT access: module-level imports, global variables, external functions/classes.
```python
# WRONG - external references
import torch

@remote(resource_config=config)
async def bad(data):
    return torch.tensor(data)  # torch not accessible

# CORRECT - everything inside
@remote(resource_config=config, dependencies=["torch"])
async def good(data):
    import torch
    return torch.tensor(data)
```

### Return Behavior
- Decorated functions are always awaitable: `await my_func(...)`
- Queue-based resources return `JobOutput` with `.output`, `.error`, `.status`
- Load-balanced resources return your dict directly
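A minimal sketch of the two call styles; `queue_task` and `lb_endpoint` are hypothetical stand-ins for functions decorated with a queue-based and a load-balanced config respectively:

```python
# Queue-based config: result is a JobOutput wrapper
job = await queue_task([1, 2, 3])
if job.error is None:
    print(job.output)

# Load-balanced config: result is the dict the function returned
result = await lb_endpoint(2, 3)
print(result)
```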
## Resource Configuration Classes

Choose based on execution model and environment:
| Class | Queue | HTTP | Environment | Use Case |
|---|---|---|---|---|
| `LiveServerless` | Yes | No | Dev | GPU with retries, remote code exec |
| `CpuLiveServerless` | Yes | No | Dev | CPU with retries, remote code exec |
| `ServerlessResource` | Yes | No | Prod | GPU, custom Docker images |
| `CpuServerlessResource` | Yes | No | Prod | CPU, custom Docker images |
| `LiveLoadBalancer` | No | Yes | Dev | GPU low-latency HTTP APIs |
| `CpuLiveLoadBalancer` | No | Yes | Dev | CPU low-latency HTTP APIs |
| `LoadBalancerSlsResource` | No | Yes | Prod | GPU production HTTP |
| `CpuLoadBalancerSlsResource` | No | Yes | Prod | CPU production HTTP |
Queue-based: Best for batch, long-running tasks, automatic retries.
Load-balanced: Best for real-time APIs, low-latency, direct HTTP routing.
Live* classes: Fixed optimized Docker image, full remote code execution.
Non-Live classes: Custom Docker images, dictionary payload only.
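A short sketch contrasting the two dev-environment styles used throughout this document:

```python
from runpod_flash import LiveServerless, LiveLoadBalancer

# Queue-based dev GPU worker: batch-friendly, automatic retries
batch_config = LiveServerless(name="batch-worker")

# Load-balanced dev GPU endpoint: direct HTTP, low latency
api_config = LiveLoadBalancer(name="api-service")
```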
### Common Parameters

```python
LiveServerless(
    name="worker-name",            # Required, unique
    gpus=[GpuGroup.AMPERE_80],     # GPU type(s)
    workersMin=0,                  # Min workers
    workersMax=3,                  # Max workers
    idleTimeout=300,               # Seconds before scale-down
    networkVolumeId="vol_abc123",  # Persistent storage
    env={"KEY": "value"},          # Environment variables
    template=PodTemplate(containerDiskInGb=100),
)
```

### GPU Groups (GpuGroup enum)
- `GpuGroup.ANY` - Any available (not for production)
- `GpuGroup.AMPERE_16` - RTX A4000, 16GB
- `GpuGroup.AMPERE_24` - RTX A5000, 24GB
- `GpuGroup.AMPERE_48` - A40/RTX A6000, 48GB
- `GpuGroup.AMPERE_80` - A100, 80GB
- `GpuGroup.ADA_24` - RTX 4090, 24GB
- `GpuGroup.ADA_32_PRO` - RTX 5090, 32GB
- `GpuGroup.ADA_48_PRO` - RTX 6000 Ada, 48GB
- `GpuGroup.ADA_80_PRO` - H100, 80GB
- `GpuGroup.HOPPER_141` - H200, 141GB
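The `gpus` parameter takes a list; assuming (as the plural "GPU type(s)" in Common Parameters suggests) that multiple groups can be listed as acceptable alternatives:

```python
from runpod_flash import LiveServerless, GpuGroup

# Accept either 80GB GPU class for this worker
config = LiveServerless(
    name="big-model-worker",
    gpus=[GpuGroup.AMPERE_80, GpuGroup.ADA_80_PRO],
)
```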
### CPU Instance Types (CpuInstanceType enum)

Format: `CPU{generation}{type}_{vcpu}_{memory_gb}`

| Instance Type | Gen | Type | vCPU | RAM |
|---|---|---|---|---|
| `CPU3G_1_4` | 3rd | General | 1 | 4GB |
| `CPU3G_2_8` | 3rd | General | 2 | 8GB |
| `CPU3G_4_16` | 3rd | General | 4 | 16GB |
| `CPU3G_8_32` | 3rd | General | 8 | 32GB |
| `CPU3C_1_2` | 3rd | Compute | 1 | 2GB |
| `CPU3C_2_4` | 3rd | Compute | 2 | 4GB |
| `CPU3C_4_8` | 3rd | Compute | 4 | 8GB |
| `CPU3C_8_16` | 3rd | Compute | 8 | 16GB |
| `CPU5C_1_2` | 5th | Compute | 1 | 2GB |
| `CPU5C_2_4` | 5th | Compute | 2 | 4GB |
| `CPU5C_4_8` | 5th | Compute | 4 | 8GB |
| `CPU5C_8_16` | 5th | Compute | 8 | 16GB |
Use with the `instanceIds` parameter:

```python
config = LiveServerless(
    name="cpu-worker",
    instanceIds=[CpuInstanceType.CPU5C_4_8],
    workersMax=5,
)
```

Or use explicit CPU classes:

```python
from runpod_flash import CpuLiveServerless

config = CpuLiveServerless(name="cpu-worker", workersMax=5)
```

### PodTemplate
Override pod-level settings:

```python
from runpod_flash import PodTemplate

template = PodTemplate(
    containerDiskInGb=100,
    env=[{"key": "PYTHONPATH", "value": "/workspace"}],
)

config = LiveServerless(name="worker", template=template)
```

### NetworkVolume
```python
from runpod_flash import NetworkVolume, DataCenter

volume = NetworkVolume(
    name="model-storage",
    size=100,  # GB
    dataCenterId=DataCenter.EU_RO_1,
)
```
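To use persistent storage from a worker, attach it via the `networkVolumeId` parameter shown under Common Parameters. The id below is a placeholder; whether it comes from the `NetworkVolume` object or the RunPod console is not specified in this document:

```python
from runpod_flash import LiveServerless

# Attach persistent storage (placeholder volume id)
config = LiveServerless(
    name="model-worker",
    networkVolumeId="vol_abc123",
)
```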
## LoadBalancer Resources

When using `LoadBalancerSlsResource` or `LiveLoadBalancer`:

- `method` and `path` are required on `@remote`
- `path` must start with "/"
- `method` must be one of: GET, POST, PUT, DELETE, PATCH

```python
from runpod_flash import remote, LiveLoadBalancer

api = LiveLoadBalancer(name="api-service")

@remote(api, method="POST", path="/api/process")
async def process(x: int, y: int):
    return {"result": x + y}

@remote(api, method="GET", path="/api/health")
def health():
    return {"status": "ok"}
```

Key differences from queue-based:
- Direct HTTP routing (no queue), lower latency
- Returns dict directly (no JobOutput wrapper)
- No automatic retries
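Calling the functions defined above returns their dicts directly, with no `JobOutput` wrapper:

```python
result = await process(2, 3)  # {"result": 5}
status = await health()       # {"status": "ok"}
```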
## Error Handling

### Queue-Based Resources

```python
job_output = await my_function(data)

if job_output.error:
    print(f"Failed: {job_output.error}")
else:
    result = job_output.output
```

`JobOutput` fields: `id`, `status`, `output`, `error`, `started_at`, `ended_at`.

### Load-Balanced Resources
```python
try:
    result = await my_function(data)  # Returns dict directly
except Exception as e:
    print(f"Error: {e}")
```

### Runtime Exceptions
```
FlashRuntimeError (base)
  RemoteExecutionError               # Remote function failed
  SerializationError                 # cloudpickle serialization failed
  GraphQLError                       # GraphQL base error
    GraphQLMutationError             # Mutation failed
    GraphQLQueryError                # Query failed
  ManifestError                      # Invalid/missing manifest
    ManifestServiceUnavailableError  # State Manager unreachable
```
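A hedged sketch of catching these; it assumes the exception classes are importable from the top-level `runpod_flash` package (the exact import path isn't shown in this document):

```python
# Assumption: exceptions are exported from the top-level package
from runpod_flash import RemoteExecutionError, SerializationError

async def safe_call(fn, *args):
    try:
        return await fn(*args)
    except SerializationError:
        # Arguments could not be cloudpickled (e.g. payload too large)
        raise
    except RemoteExecutionError as e:
        # The function raised on the remote worker
        print(f"Remote failure: {e}")
        return None
```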
## Common Patterns

### Hybrid GPU/CPU Pipeline

```python
from runpod_flash import remote, LiveServerless, GpuGroup, CpuInstanceType

cpu_config = LiveServerless(name="preprocessor", instanceIds=[CpuInstanceType.CPU5C_4_8])
gpu_config = LiveServerless(name="inference", gpus=[GpuGroup.AMPERE_80])

@remote(resource_config=cpu_config, dependencies=["pandas"])
async def preprocess(data):
    import pandas as pd
    return pd.DataFrame(data).to_dict('records')

@remote(resource_config=gpu_config, dependencies=["torch"])
async def inference(data):
    import torch
    tensor = torch.tensor(data, device="cuda")
    return {"result": tensor.sum().item()}

async def pipeline(raw_data):
    clean = await preprocess(raw_data)
    return await inference(clean)
```

### Parallel Execution
```python
# process_item is any @remote-decorated function
results = await asyncio.gather(
    process_item(item1),
    process_item(item2),
    process_item(item3),
)
```

### Local Testing
```python
@remote(resource_config=config, local=True)
async def my_function(data):
    return {"status": "ok"}  # Runs locally, skips remote
```

## Cost Optimization
- Use `workersMin=0` to scale from zero
- Use `idleTimeout=600` to reduce churn (combined with scale-to-zero in the sketch below)
- Use smaller GPUs if they fit your model
- Use `Live*` classes for spot pricing in dev
- Pass URLs/paths instead of large data objects
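Combining the scaling tips above into one config (the values are illustrative):

```python
from runpod_flash import LiveServerless

config = LiveServerless(
    name="cost-aware-worker",
    workersMin=0,     # scale to zero when idle
    workersMax=3,
    idleTimeout=600,  # keep workers warm for 10 minutes between bursts
)
```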
## CLI Commands

### flash init

```bash
flash init [project_name]
```

Creates a project template:

```
project_name/
├── main.py              # FastAPI entry point
├── workers/
│   ├── gpu/
│   │   ├── __init__.py  # GPU router
│   │   └── endpoint.py  # GPU @remote function
│   └── cpu/
│       ├── __init__.py  # CPU router
│       └── endpoint.py  # CPU @remote function
├── .env                 # API key template
├── .gitignore
├── .flashignore         # Deployment ignore patterns
├── requirements.txt
└── README.md
```

### flash run

```bash
flash run [--auto-provision] [--host HOST] [--port PORT]
```

| Option | Default | Description |
|---|---|---|
| `--auto-provision` | off | Pre-deploy all endpoints before serving |
| `--host` | `localhost` | Server host |
| `--port` | `8888` | Server port |
### flash build

```bash
flash build [--exclude PACKAGES] [--keep-build] [--preview]
```

| Option | Description |
|---|---|
| `--exclude` | Skip packages already in base Docker image |
| `--keep-build` | Don't delete the `.flash/` build directory |
| `--preview` | Build then run in local Docker containers |
Build steps: scan `@remote` decorators, group by resource config, create `flash_manifest.json`, install dependencies for Linux x86_64, package into `.flash/artifact.tar.gz`.

500MB deployment limit - use `--exclude` for packages already in the base image:

```bash
flash build --exclude torch,torchvision,torchaudio
```

`--preview` builds and then runs the project in local Docker containers at localhost:8000.

### flash deploy

```bash
flash deploy new <env_name> [--app-name NAME]     # Create environment
flash deploy send <env_name> [--app-name NAME]    # Deploy archive
flash deploy list [--app-name NAME]               # List environments
flash deploy info <env_name> [--app-name NAME]    # Show details
flash deploy delete <env_name> [--app-name NAME]  # Delete (double confirmation)
```

`flash deploy send` uploads the artifact produced by `flash build`.

### flash undeploy

```bash
flash undeploy list    # List all deployed resources
flash undeploy <name>  # Undeploy specific resource
```

### flash env / flash app

```bash
flash env list|create|info|delete <name>  # Environment management
flash app list|get <name>                 # App management
```

## Architecture Overview
### Deployment Architecture

Mothership Pattern: coordinator endpoint + distributed child endpoints.

- `flash build` scans code, creates manifest + archive
- `flash deploy send` uploads archive, provisions resources
- Mothership boots, reconciles desired vs current state
- Child endpoints query State Manager GraphQL for service discovery (peer-to-peer)
- Functions route locally or remotely based on manifest
### Cross-Endpoint Routing

Functions on different endpoints can call each other transparently:

- `ProductionWrapper` intercepts calls
- `ServiceRegistry` looks up the function in the manifest
- Local function? Execute directly
- Remote function? Serialize args (cloudpickle), POST to remote endpoint

Serialization: cloudpickle + base64, max 10MB payload.
Common Gotchas
- External scope in @remote functions - Most common error. Everything must be inside.
- Forgetting - All remote functions must be awaited.
await - Undeclared dependencies - Must be in parameter.
dependencies=[] - Queue vs LB confusion - Queue returns , LB returns dict directly.
JobOutput - Large serialization - Pass URLs/paths, not large data objects.
- Imports at module level - Import inside functions, not at top of file.
@remote - LoadBalancer requires method+path -
@remote(config, method="POST", path="/api/x") - Bundle too large (>500MB) - Use for packages in base Docker image.
--exclude - Endpoints accumulate - Clean up with /
flash undeploy list.flash undeploy <name>
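For the large-serialization gotcha, a sketch of the URL-passing pattern: download inside the remote function instead of pickling the data through the 10MB payload limit (the `requests` usage here is illustrative, not part of the framework):

```python
from runpod_flash import remote, LiveServerless

config = LiveServerless(name="downloader")

@remote(resource_config=config, dependencies=["requests"])
async def process_url(url: str):
    import requests  # imported inside, per the scoping rules
    payload = requests.get(url, timeout=60).content
    return {"bytes": len(payload)}
```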