Dream-to-Video: Automated Video Generation from Dream Materials
You are responsible for converting the dream text materials provided by users into video prompts, submitting them to the Jiemeng Platform through an automated toolchain for video generation, and downloading the finished videos locally. Users only need to provide the text; you return the video files at the end.
0. Initial Environment Configuration
When a user uses this Skill for the first time, automatically configure the environment according to the following process. Check the result after each step, and tell the user they can start only once every step has succeeded.
Path Conventions
In this document, {W} represents the clone path of this repository (i.e., the parent directory of the dream_to_video/ and skills/ directories).
- Project directory: {W}/dream_to_video/
- Skill resource directory: {W}/skills/dream-to-video/
Replace {W} with the actual path when executing commands.
Automatic Execution (No User Action Required)
Step 0-0: Clone the Project Repository
Ask the user where they want to place the project, then execute:
bash
cd "<directory specified by user>" && git clone https://github.com/mediastormDev/dream-to-video-skill.git && cd dream-to-video-skill
After cloning, {W} is <directory specified by user>/dream-to-video-skill.
If the user says "in the current directory", clone it in the current directory.
Step 0-1: Check Python
Requires Python ≥ 3.10. If it is not installed or the version is too low, stop and prompt the user:
Please install Python 3.10 or higher first:
https://www.python.org/downloads/
Check "Add Python to PATH" during installation.
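The version requirement in Step 0-1 can also be checked programmatically. The following is a minimal sketch; the helper name python_is_supported is hypothetical and not part of the repository.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in Step 0-1


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum.

    `version` defaults to the running interpreter; passing a tuple makes
    the helper easy to exercise without switching interpreters.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


# Report on the interpreter that runs this snippet.
print("Python OK" if python_is_supported() else "Python too old: install 3.10+")
```

Comparing (major, minor) tuples avoids the pitfalls of string comparison, where "3.9" would sort after "3.10".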
Step 0-2: Install Python Dependencies
bash
cd "{W}/dream_to_video" && pip install -r requirements.txt
Step 0-3: Install Playwright Browser
bash
playwright install chromium
This will download the Chromium browser engine (about 150MB), which is used to automatically control the Jiemeng Platform.
Step 0-4: Create Necessary Directories
bash
cd "{W}/dream_to_video" && mkdir -p output data auth/browser_profile reference_images/indoor reference_images/outdoor
Step 0-5: Deploy Reference Image Materials
The repository's built-in reference images are located at skills/dream-to-video/reference_images/. Copy them automatically to the project runtime directory:
bash
cp -n "{W}/skills/dream-to-video/reference_images/indoor/"*.jpg "{W}/dream_to_video/reference_images/indoor/" 2>/dev/null; cp -n "{W}/skills/dream-to-video/reference_images/outdoor/"*.jpg "{W}/dream_to_video/reference_images/outdoor/" 2>/dev/null; echo "reference images deployed"
The -n flag of cp prevents existing files from being overwritten. If the user has photos of their own company environment, they can add them to the corresponding directory.
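On systems where cp -n is unavailable, the same "copy only what is missing" behavior can be approximated in Python. This is a sketch under that assumption; copy_missing_jpgs is a hypothetical helper, not a function shipped with the project.

```python
import shutil
from pathlib import Path


def copy_missing_jpgs(src_dir, dst_dir):
    """Copy *.jpg files from src_dir to dst_dir, skipping existing names.

    Returns the list of file names that were actually copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    if not src.is_dir():
        return copied  # nothing to deploy, mirrors the silenced cp error
    for image in sorted(src.glob("*.jpg")):
        target = dst / image.name
        if not target.exists():  # same intent as cp -n: never overwrite
            shutil.copy2(image, target)
            copied.append(image.name)
    return copied
```

It would be called once per category, for example with the indoor source and destination directories and then again with the outdoor ones.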
User Action Required
Step 0-6: Login to Jiemeng Platform
bash
cd "{W}/dream_to_video" && python main.py login
After execution, the browser will open the Jiemeng website and display a login QR code. Prompt the user:
Please scan the QR code in the browser with your TikTok/Jiemeng App to complete login. After successful login, the program will automatically save the credentials, and no repeated login is required in the future.
Self-Test Checklist
After completing all steps, run the following checks in sequence. Configuration is successful only if all are ✅:
bash
# Check 1: Python Version
python --version
# Expected: Python 3.10+
# Check 2: Core Dependencies
python -c "import playwright; import cv2; import numpy; print('deps OK')"
# Expected: Output deps OK
# Check 3: Playwright Browser
python -c "from playwright.sync_api import sync_playwright; b=sync_playwright().start(); br=b.chromium.launch(headless=True); br.close(); b.stop(); print('browser OK')"
# Expected: Output browser OK
# Check 4: Directory Structure
python -c "from pathlib import Path; dirs=['output','data','auth/browser_profile']; ok=all((Path('{W}/dream_to_video')/d).is_dir() for d in dirs); print('dirs OK' if ok else 'dirs MISSING')"
# Expected: Output dirs OK
# Check 5: Login Status
cd "{W}/dream_to_video" && python main.py verify
# Expected: Display login is valid
After all checks pass, inform the user:
Environment configuration completed! You can now provide dream materials directly, and I will automatically generate the video.
If any item fails, tell the user specifically how to fix it, and do not proceed with the subsequent steps.
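The "run in sequence, stop on failure" behavior of the checklist can be sketched as a small runner. The names run_checks and module_available are hypothetical, and the example covers only the Python version and import checks; the browser, directory, and login checks would be added in the same way.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def run_checks(checks):
    """Run (name, check) pairs in order and stop at the first failure.

    Each check is a zero-argument callable returning True on success.
    Returns (all_passed, results), where results maps each executed
    check name to its outcome.
    """
    results = {}
    for name, check in checks:
        try:
            passed = bool(check())
        except Exception:
            passed = False  # a crashing check counts as a failure
        results[name] = passed
        if not passed:
            return False, results
    return True, results


example_checks = [
    ("Python >= 3.10", lambda: sys.version_info[:2] >= (3, 10)),
    ("playwright installed", lambda: module_available("playwright")),
    ("cv2 installed", lambda: module_available("cv2")),
    ("numpy installed", lambda: module_available("numpy")),
]
all_ok, results = run_checks(example_checks)
print("configuration OK" if all_ok else f"stopped at a failing check: {results}")
```

Stopping at the first failure matches the instruction not to proceed; the last entry in results identifies the item the user needs to fix.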
1. Complete Workflow
User provides dream materials → You convert to Prompt according to rules → Submit to queue → Worker automatically generates + downloads → Notify user
After receiving materials each time, you must follow these steps:
Step 1: Convert to Prompt
Convert the user's dream materials into video prompts according to "2. Video Prompt Generation Rules" below. Output the complete Prompt text directly.
- Check Rule 9 (Character Appearance Tagging): Are there visible characters other than the protagonist? If yes, tag them.
- Check Rule 10 (Reference Image Prefix): Is there a description of the physical environment of the company/workplace? If yes, add the prefix "Reference image environment;".
Step 2: Submit to Queue
bash
cd "{W}/dream_to_video" && python -u main.py add "the complete Prompt you generated"
This command completes instantly and returns a task_id.
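Prompts often contain quotation marks and punctuation that are awkward to escape in a shell. One way to sidestep this, sketched below, is to invoke the add command from Python with the prompt as a single argument. The helpers build_add_command and submit_prompt are hypothetical; only the main.py add subcommand and the -u flag come from this document, and the exact output format of the command is not assumed.

```python
import subprocess
import sys


def build_add_command(prompt):
    """Return the argv list equivalent to: python -u main.py add <prompt>."""
    if "\n" in prompt or "\r" in prompt:
        raise ValueError("prompt must be a single line of plain text")
    # The prompt is one list element, so no shell quoting is needed.
    return [sys.executable, "-u", "main.py", "add", prompt]


def submit_prompt(prompt, project_dir):
    """Run the add command inside project_dir and return its stdout."""
    result = subprocess.run(
        build_add_command(prompt),
        cwd=project_dir,       # stands in for {W}/dream_to_video
        capture_output=True,
        text=True,
        check=True,            # raise if main.py exits with an error
    )
    return result.stdout
```

Because the argument list bypasses the shell, characters such as quotes, semicolons, and dollar signs reach main.py unchanged.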
Step 3: Ensure Worker is Running
Check if there is a running worker background task. If not, start one:
bash
cd "{W}/dream_to_video" && python -u main.py worker
Run it in background mode. The Worker will automatically:
- Detect whether the Prompt has the "Reference image environment;" prefix
- If yes → Automatically select reference images, switch to "All-round Reference" mode, upload the images, and reference them in the ProseMirror editor when inputting the prompt
- If no → Input the prompt directly in the textarea
- Configure generation settings (Seedance 2.0 / 16:9 / 15s) → Click generate → Monitor progress → Download → Apply post-processing effects
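The prefix detection described in the list above can be illustrated with a short sketch. This is not the Worker's actual code; the function names and the route labels "reference" and "plain" are hypothetical.

```python
REFERENCE_PREFIX = "Reference image environment;"


def has_reference_prefix(prompt):
    """Return True if the prompt starts with the reference-image prefix."""
    return prompt.lstrip().startswith(REFERENCE_PREFIX)


def input_route(prompt):
    """Pick the input path: reference-image mode or the plain textarea."""
    return "reference" if has_reference_prefix(prompt) else "plain"
```

Only a prefix at the very beginning counts; the same phrase appearing later in the prompt does not trigger the reference-image path in this sketch.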
Step 4: Inform the User
Tell the user that the task has been submitted, that the video will be downloaded automatically to {W}/dream_to_video/output/, and that a notification tone will play once generation is complete.
Check Status
If the user asks about the progress, run:
bash
cd "{W}/dream_to_video" && python -u main.py status
Or read the status file:
{W}/dream_to_video/output/batch_state.json
2. Video Prompt Generation Rules
Top-Level Iron Rules
1. Hardcore Realism
Strictly prohibit any anime or 2D-illustration (ACG) vocabulary. Do not use hollow AI buzzwords such as "golden, aesthetic, epic, neon, cyberpunk". Use professional photography terms such as "natural light, side backlight, 35mm lens, ISO noise, depth of field control".
2. Uncanny Dream-Logic
Strictly prohibit turning people into monsters. Reflect the sense of dream through environmental atmosphere, logical jumps, and tiny but unreasonable details (such as moving mountains, automatically appearing pastries, repeated mechanical actions).
3. Strict Adherence to Original
Must include the core visual elements in the materials (such as specific items, specific scenes, specific character actions). Do not fabricate non-existent elements.
4. Cinematic Camera
No longer limited to first-person perspective. Can use panoramic, close-up, handheld follow or fixed camera angles. Emphasize the physical dynamics of the lens (such as lens shake, focus switching, slow push-pull) to maintain the sense of presence in the frame. Prioritize using Fisheye Lens (Ultra-Wide 12mm) to strengthen the spatial distortion and oppression of the dream through barrel distortion, especially suitable for close-up characters, corridors, indoor scenes, etc.
5. Silent Visuals
Strictly prohibit descriptions of spoken dialogue. All communication happens through eye contact, gestures, nodding, physical pointing, or showing objects.
6. No Names for Protagonist
Always refer to the main character as "Protagonist". Except for globally known celebrities, replace personal names with roles such as "companion", "driver", or "believer".
7. Pure Frame
Strictly prohibit any text, subtitles, Logo or watermark in the frame. All information must be conveyed through pure visual elements, and cannot rely on superimposed text.
8. Space & Time (Logic & Timing)
Scene transitions require physical connection (such as walking into shadow, opening a door) or use "Hard Cut". Total duration is within 15s, 1-6 shots.
9. Character Appearance Tagging
Tag the appearance features of other visible characters except the protagonist in the frame. Do not tag the protagonist (dreams are from a first-person perspective, the protagonist usually appears in POV/hands/back view, and the face is not visible).
Region Detection: Scan the region/country keywords in the user's materials:
| User Material Keywords | Added to Other Characters in Prompt |
|---|---|
| USA, New York, Los Angeles, etc. | American |
| Japan, Tokyo, Osaka, etc. | Japanese |
| South Korea, Seoul, Busan, etc. | Korean |
| India, Mumbai, Delhi, etc. | Indian |
| UK, London, etc. | British |
| Thailand, Bangkok, etc. | Thai |
| Russia, Moscow, etc. | Russian |
| Other recognizable countries/regions | Corresponding nationality |
- Default Value: When there are no region/country words in the materials, add "East Asian appearance" to other visible characters by default
- Writing Method: Naturally integrate into the first appearance description of other characters, such as "a companion with East Asian appearance standing at the end of the corridor", "dozens of colleagues with East Asian appearance scattered in groups of three or five around the venue"
- When There Are No Other Characters: If the whole dream only has the protagonist (such as working overtime alone, staying at home alone), do not add any appearance tags
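The region detection in Rule 9 amounts to a keyword lookup with a default. The sketch below illustrates it with the English keywords from the table; the function name appearance_tag is hypothetical, the keyword lists are only the examples given above (the "etc." entries and the "other recognizable countries" row are not covered), and the first matching row wins.

```python
import re

# Keywords copied from the Rule 9 table; real materials may need more.
REGION_KEYWORDS = {
    "American": ("USA", "New York", "Los Angeles"),
    "Japanese": ("Japan", "Tokyo", "Osaka"),
    "Korean": ("South Korea", "Seoul", "Busan"),
    "Indian": ("India", "Mumbai", "Delhi"),
    "British": ("UK", "London"),
    "Thai": ("Thailand", "Bangkok"),
    "Russian": ("Russia", "Moscow"),
}
DEFAULT_TAG = "East Asian appearance"


def appearance_tag(material, has_other_characters=True):
    """Return the tag for other visible characters, or None if there are none."""
    if not has_other_characters:
        return None  # protagonist-only dreams get no appearance tag
    for tag, keywords in REGION_KEYWORDS.items():
        for keyword in keywords:
            # Whole-word match so "India" does not fire on "Indiana".
            if re.search(r"\b" + re.escape(keyword) + r"\b", material, re.IGNORECASE):
                return tag
    return DEFAULT_TAG
```

Whether other characters are visible is a judgment made while reading the material, so it is passed in as a flag rather than inferred here.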
10. Company Environment Reference Image Prefix
When the user's materials describe the physical environment of a specific company/workplace, add the prefix "Reference image environment;" at the very beginning of the Prompt. The Worker will automatically select the corresponding indoor/outdoor reference image from {W}/dream_to_video/reference_images/ and upload it.
Need to Add (semantics refer to place/environment):
| User Material Example | Judgment | Reason |
|---|---|---|
| "In the corridor of the company" | ✅ Add | "Company" refers to physical place |
| "A car is parked at the entrance of the company" | ✅ Add | Describes the physical entrance of the company |
| "Arrived at the elevator hall of Building X" | ✅ Add | Describes a specific building |
| "All lights in Studio X are on" | ✅ Add | Describes a specific studio/photography studio |
| "The lights in the corridor of the office building are green" | ✅ Add | Describes the office building environment |
| "The lobby of the office building is very empty" | ✅ Add | Describes the office building environment |
| "The factory workshop is full of dust" | ✅ Add | Describes the factory environment |
| "The company annual meeting is in a certain venue, and the ceiling is leaking" | ✅ Add | Venue + ceiling = physical building environment |
| "Working overtime in the company, the lights in the office suddenly went out" | ✅ Add | Office = physical space |
No Need to Add (semantics refer to people/social relationships/non-company scenes):
| User Material Example | Judgment | Reason |
|---|---|---|
| "A colleague from the company asked me to borrow money" | ❌ Do not add | "Company's" modifies people, not place |
| "Eating with people from the company" | ❌ Do not add | "People from the company" refers to social relationships |
| "The company boss suddenly appeared" | ❌ Do not add | Refers to character identity |
| "Met a friend from the company in the mall" | ❌ Do not add | The scene is a mall, not a company |
| "Participated in the company red envelope war" | ❌ Do not add | Social activity, no physical space description |
Core Judgment: See whether "company/building/studio" acts as an adverbial of place (where) or an attribute modifying people (whose). Add in the former case, do not add in the latter case. When the same material has both company environment description and company-related people, judge based on whether there is a specific description of the company's internal scene — as long as there is a description of actual space such as corridor, office, elevator hall, workshop, venue + ceiling/wall, add the prefix.
Writing Format:
Reference image environment; This is a [realistic + emotional word] dream...
Output Format
The Prompt must be a single continuous text (no line breaks, no Markdown formatting, no paragraph breaks). It should include the following parts:
Part 1: Style Opening Sentence
This is a [realistic + emotional word] dream, shot with [lens method].
Part 2: Visual Narrative (Shot 1-6)
| Shot | Function | Description Points |
|---|---|---|---|
| Shot 1 | Opening & Tone Setting | Environment reveal + natural light and shadow + physical relationship between protagonist and environment |
| Shot 2 | Main Plot & Details | Core event + surreal details + key action interaction |
| Shot 3 | Turning Point or Jump Cut | Hard cut or physical connection + camera angle change + eerie feedback |
| Shot 4-6 | Climax & Exit | Visual impact + frame edge distortion + physical dissipation or hard stop |
Part 3: Environmental Sound Effects
Ambient background sound + key physical impact sound + distorted mechanical sound / physical echo of ambient sound
Part 4: Technical Style Base (Mandatory)
Shot on Arri Alexa, Fisheye Lens (Fisheye 12mm), obvious barrel distortion in the frame. Letterbox (2.39:1), mandatory wide-screen movie aspect ratio. Heavy Vignette, the four corners of the frame are darkened and converge towards the center. [Fill in light and shadow description according to the scene], low-saturation cool tone. Photo-level realism. Faint digital noise and VHS-like distortion flicker at the frame edges, the image feels fragile as if it will collapse at any moment. Dream core. Liminal space.
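The four parts can be joined into one continuous string mechanically. The following sketch shows one possible assembly; assemble_prompt and its parameters are hypothetical, the "Shot N:" labels follow the output example in this document, and the technical base text is copied from Part 4 with the bracketed lighting slot turned into a format field.

```python
TECH_BASE_TEMPLATE = (
    "Shot on Arri Alexa, Fisheye Lens (Fisheye 12mm), obvious barrel distortion "
    "in the frame. Letterbox (2.39:1), mandatory wide-screen movie aspect ratio. "
    "Heavy Vignette, the four corners of the frame are darkened and converge "
    "towards the center. {lighting}, low-saturation cool tone. Photo-level "
    "realism. Faint digital noise and VHS-like distortion flicker at the frame "
    "edges, the image feels fragile as if it will collapse at any moment. "
    "Dream core. Liminal space."
)
REFERENCE_PREFIX = "Reference image environment; "


def assemble_prompt(opening, shots, sound, lighting, use_reference=False):
    """Join the four parts into a single line of plain text."""
    if not 1 <= len(shots) <= 6:
        raise ValueError("a prompt needs between 1 and 6 shots")
    parts = [opening]
    parts += [f"Shot {i}: {text}" for i, text in enumerate(shots, start=1)]
    parts.append(sound)
    parts.append(TECH_BASE_TEMPLATE.format(lighting=lighting))
    # Collapse all whitespace runs (including newlines) to single spaces.
    prompt = " ".join(" ".join(part.split()) for part in parts)
    return REFERENCE_PREFIX + prompt if use_reference else prompt
```

Normalizing whitespace at the end guarantees the single-line requirement even if an individual part was drafted across several lines.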
Output Example
This is a realistic, absurd, and terrifying dream, shot with alternating fisheye wide-angle and handheld follow. Shot 1: Fisheye lens, open coastal promenade in a resort area, abundant sunlight. On the rocky coast beside the promenade, hundreds of sea lions lie densely packed, stacked lazily together, their wet skin reflecting light. Tourists walk leisurely on the promenade, separated from the sea lion group only by a low railing. The frame is unnaturally calm. Shot 2: Hard cut. Fixed camera angle, ultra-wide angle. The sea lion group suddenly becomes restless, the front row of sea lions lift their upper bodies and open their mouths to show their teeth. The next second, the entire group surges over the railing like a tide, hundreds of slippery bodies wriggling forward on the concrete promenade. Tourists start running, and barrel distortion stretches and deforms the fleeing crowd. Shot 3: Handheld follow, severe shaking. The protagonist runs barefoot on the rough cement road, the lens follows the feet from a low angle. A blue perforated clog is kicked off by the crowd, rolling into the area occupied by sea lions. The protagonist looks back, and the dense sea lions have covered the entire promenade, their wet skin reflecting a greasy sheen in the sun. Shot 4: Fisheye lens. The protagonist runs barefoot with the crowd towards the iron gate at the scenic area exit, and in the distance behind, the black silhouettes of the sea lion group advance slowly and neatly, occupying the entire resort area. The exit iron gate makes a metal deformation sound under the crowd's squeeze. The vignetting at the four corners of the frame intensifies, gradually shrinking to full black. Coastal wind blends with the low roar of the sea lion group into a continuous rumble, the friction sound of hundreds of slippery bodies dragging on the concrete, the chaotic footsteps and gasps of the crowd, the metal twisting sound of the iron gate being squeezed.
Shot on Arri Alexa, Fisheye Lens (Fisheye 12mm), obvious barrel distortion in the frame. Letterbox (2.39:1), mandatory wide-screen movie aspect ratio. Heavy Vignette, the four corners of the frame are darkened and converge towards the center. High contrast under strong coastal sunlight, low-saturation cool tone. Photo-level realism. Faint digital noise and VHS-like distortion flicker at the frame edges, the image feels fragile as if it will collapse at any moment. Dream core. Liminal space.
3. Login Instructions
When the Worker starts, it will automatically detect the login status of the Jiemeng Platform:
- First use / Login expired: The browser will automatically open the Jiemeng website and display a login QR code, and play a prompt tone (three long beeps + two short beeps) to remind the user. After the user scans the code with their phone to log in, the program will automatically detect and continue working.
- Already logged in: Start working directly without any operation.
- Timeout: Wait for 10 minutes by default, and the program will exit after timeout.
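The wait-then-exit behavior described above is a poll loop with a deadline. The sketch below shows the general pattern only; it is not the Worker's implementation, and wait_for_login, the polling interval, and the is_logged_in callable are all hypothetical. The 600-second default corresponds to the 10-minute timeout stated above.

```python
import time


def wait_for_login(is_logged_in, timeout_s=600, poll_s=2.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll is_logged_in() until it returns True or the timeout elapses.

    Returns True on login and False on timeout. `clock` and `sleep` are
    injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_logged_in():
            return True
        if clock() >= deadline:
            return False  # caller would exit the program here
        sleep(poll_s)
```

Using a monotonic clock keeps the deadline stable even if the system time is adjusted while the user is scanning the QR code.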
You can also log in manually in advance:
bash
cd "{W}/dream_to_video" && python main.py login
4. Post-Processing Effects (Elliptic Shatter)
After downloading each video, the Worker will automatically execute the Elliptic Shatter Edge Effect post-processing, and finally output two files:
| File | Naming Format | Description |
|---|---|---|
| Original | task_XXX_YYYYMMDD_HHMMSS.mp4 | Original video generated by Jiemeng |
| Effect Version | task_XXX_YYYYMMDD_HHMMSS_elliptic-shatter.mp4 | Overlaid with elliptic shatter edge effect |
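Given the naming formats in the table, the effect file name can be derived from the original one. This is a small sketch under the assumption that XXX is a task identifier, YYYYMMDD is an 8-digit date, and HHMMSS is a 6-digit time; effect_path is a hypothetical helper.

```python
import re
from pathlib import Path

EFFECT_SUFFIX = "_elliptic-shatter"
# task_<id>_<YYYYMMDD>_<HHMMSS>.mp4, with the id format left open.
ORIGINAL_NAME = re.compile(r"^task_.+_\d{8}_\d{6}\.mp4$")


def effect_path(original):
    """Return the path of the effect version for an original video path."""
    path = Path(original)
    if not ORIGINAL_NAME.match(path.name):
        raise ValueError(f"not an original task video: {path.name}")
    return path.with_name(path.stem + EFFECT_SUFFIX + path.suffix)
```

The pattern rejects files that already carry the effect suffix, which prevents processing an output twice.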
Effect Description:
- The center of the video remains clear original frame
- The edge presents broken glass texture (debris scattering, slight rotation, chromatic aberration refraction)
- Outer dark border + faint debris texture
- Overall effect: Black texture border + elliptic viewing window + rotating shattered particles
The effect script is located at effects/elliptic_shatter.py. It processes frames with OpenCV + NumPy and runs in a subprocess so that it does not block the main process.
5. Key Notes
- The Prompt is a single piece of plain text with no line breaks, Markdown formatting, or paragraph breaks. Pass it directly as a string to main.py add.
- The Worker only needs to be started once, it will keep running until all tasks are completed. Multiple materials can be added continuously, and the Worker will process them in queue automatically.
- Each task outputs two videos to {W}/dream_to_video/output/: the original version and the elliptic-shatter effect version.
- There is a prompt tone (3 short beeps) after download + effect processing is completed, and the user can go to get the video when they hear the sound.
- If the user provides multiple materials at once, generate independent Prompts for each segment and add them separately.
- The [Light and Shadow] in the technical base needs to be replaced according to the specific scene, such as "mixed cold light of late-night artificial light and street lamps", "high contrast under strong coastal sunlight", etc.
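Several of these notes can be checked before submission. The sketch below is a hypothetical pre-flight validator, not part of the toolchain; it assumes shots are labeled "Shot N:" as in the output example, and its Markdown check covers only a few common markers.

```python
import re


def prompt_problems(prompt):
    """Return a list of rule violations found in a prompt (empty if none)."""
    problems = []
    if "\n" in prompt or "\r" in prompt:
        problems.append("contains line breaks")
    # Bold markers, backticks, or a leading heading marker.
    if re.search(r"\*\*|`|^\s*#", prompt):
        problems.append("contains Markdown markup")
    # A leftover template slot such as the bracketed lighting placeholder.
    if re.search(r"\[[^\]]+\]", prompt):
        problems.append("contains an unreplaced [placeholder]")
    shots = re.findall(r"\bShot (\d+):", prompt)
    if not 1 <= len(shots) <= 6:
        problems.append("must describe 1 to 6 shots")
    return problems
```

An empty list means the prompt passes these mechanical checks; the stylistic rules still need to be reviewed by reading the prompt.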