Lossless DFlash speculative decoding for MLX on Apple Silicon: 1.7–4x faster LLM inference using block diffusion drafting with target-model verification.