Found 2 Skills
Provides image recognition for non-multimodal models (text-only models such as deepseek-v4-pro, GLM-5.1, and mimo-v2.5-pro). The skill triggers automatically when the main model cannot recognize images, when users send screenshots, design drafts, or UI screenshots for analysis, or when users say things like 'Look at this image', 'Analyze this screenshot', or 'What's wrong with this image'. It also applies whenever a user pastes an image but the current model does not support image input. Multiple images can be recognized in one request, and configuring several image recognition models provides primary-backup fallback. It can also be triggered manually with the commands /skill:vision-support or /vision. Iron Rule: the models configured for this skill are used only for image content recognition and never take part in the main logical reasoning. Note: if the current model is itself multimodal (for example Claude Sonnet 4, GPT-4o, or Gemini, which can recognize images directly), do not use this skill; let the main model handle the image.
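The routing described above (skip the skill for multimodal main models, otherwise try the configured recognition models in order) could be sketched roughly as follows. This is only an illustration under stated assumptions: the listing does not show the skill's actual code or configuration, so `Recognizer`, `describe_image`, `describe_images`, and `AllRecognizersFailed` are hypothetical names, and each recognizer is assumed to be a plain callable that takes image bytes and returns a text description.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical type: a recognizer takes raw image bytes and returns a description.
Recognizer = Callable[[bytes], str]


class AllRecognizersFailed(RuntimeError):
    """Raised when every configured recognizer fails on an image."""


def describe_image(image: bytes, recognizers: Sequence[Recognizer]) -> str:
    """Try the primary recognizer first, then each backup in the given order."""
    errors = []
    for recognizer in recognizers:
        try:
            return recognizer(image)
        except Exception as exc:  # a real skill would likely catch narrower errors
            errors.append(exc)
    raise AllRecognizersFailed(f"{len(errors)} recognizer(s) failed")


def describe_images(
    images: Sequence[bytes],
    recognizers: Sequence[Recognizer],
    main_model_is_multimodal: bool,
) -> Optional[List[str]]:
    """Return one description per image, or None when the skill should not run."""
    if main_model_is_multimodal:
        # The main model can read images directly, so the skill stays out of the way.
        return None
    return [describe_image(image, recognizers) for image in images]
```

In this sketch the recognizers only return text descriptions, which would then be handed to the main model as ordinary text. That matches the Iron Rule that the recognition models never take part in the main reasoning.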
MiMo V2.5 TTS Text-to-Speech. Generates speech using the Xiaomi MiMo V2.5 TTS series models. The skill activates when text needs to be converted to speech, a voice message needs to be sent, content needs to be read aloud, or a user asks to 'speak it out' or requests a 'voice reply'. It supports three modes (preset voice, voice design, and voice cloning) along with natural language control and director mode. Style tags control tone, emotion, and dialect, and the preset voices can also sing.