Frame Minion stands on four external pieces: OpenRouter (the brain and the image/video engine), fal.ai (a specialist video host and the default file uploader), a storage provider (how your reference media reaches the video models), and ffmpeg (the local render engine).
You don't need all four to start — in fact, generating images needs only one. Knowing what each piece does makes the setup choices obvious.
- Planning & language: the music-video planner, per-segment audio analysis (lyrics + mood), and the prompt-enhancement wizards all call a text/multimodal model through OpenRouter. Default: openai/gpt-5.1.
- Image generation: every start frame, end frame, and reference image. Default: google/gemini-3.1-flash-image-preview ("Nano Banana 2"); GPT and FLUX image models are also available.
- Video generation: single-shot and per-segment clips run through OpenRouter's /videos passthrough (Veo, Sora, Wan, etc.). Default: google/veo-3.1-fast.
- Wan-family video, especially lip-sync. Some Wan models — particularly lip-synced singing shots — aren't reliably available through OpenRouter, so Frame Minion calls fal.ai directly for those. (Wan lip-sync accepts clips roughly 2–15 seconds long; the app slices each segment's audio to fit automatically.)
- The default file uploader. Its zero-setup CDN is how reference frames and audio get a URL the video models can fetch — see Storage below.
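To make the 2–15 second lip-sync window concrete, here is a minimal sketch of one way a segment's audio could be split so every slice fits. The function name, the equal-length strategy, and the exact bounds are assumptions for illustration; the text only says "roughly 2–15 seconds" and does not describe how Frame Minion actually slices.

```python
import math
from typing import List, Tuple

MIN_SLICE_S = 2.0   # assumed lower bound, taken from "roughly 2-15 seconds"
MAX_SLICE_S = 15.0  # assumed upper bound, taken from "roughly 2-15 seconds"


def plan_audio_slices(start: float, end: float) -> List[Tuple[float, float]]:
    """Split [start, end) into equal slices, each no longer than MAX_SLICE_S.

    Hypothetical helper: raises ValueError when the segment is too short
    to produce even one slice of MIN_SLICE_S.
    """
    duration = end - start
    if duration < MIN_SLICE_S:
        raise ValueError(f"segment is {duration:.2f}s, below the {MIN_SLICE_S}s minimum")
    # Smallest number of slices that keeps each one at or under the maximum.
    count = math.ceil(duration / MAX_SLICE_S)
    step = duration / count
    return [(start + i * step, start + (i + 1) * step) for i in range(count)]
```

Equal slices avoid a tiny leftover at the end: any segment of at least 2 seconds yields slices of at least 2 seconds, because a segment longer than 15 seconds is never cut into pieces shorter than 7.5 seconds.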
Whenever a clip uses a start/end frame, a reference image, or a lip-sync audio slice, Frame Minion has to upload that file somewhere reachable first. The storageProvider setting picks where.
| Provider | Setup | Best for |
|---|---|---|
| fal.ai CDN (fal, default) | Zero config: just needs your fal.ai key. | Almost everyone. Nothing to provision. |
| Amazon S3 (s3) | Bring your own bucket: AWS credentials + region + bucket name. | Teams who want media in infrastructure they own and control. |
A few things worth knowing
- The setting is read live on every upload, so you can switch providers without restarting.
- fal-default is not zero-config for OpenRouter-only users. If you generate video, have no fal key, and haven't set up S3, uploads fail with a clear "add a fal key or switch to S3" message. Pick one.
- Existing S3 users are protected. If you previously configured AWS credentials, Frame Minion keeps you on S3 rather than silently switching you to the fal default.
- The S3 box (keys, region, bucket, check/create) is a self-contained block in Settings, dimmed when fal is the active provider.
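The rules above can be read as a small decision function that runs on every upload. The sketch below is one possible reading, not Frame Minion's actual code: `StorageState`, `resolve_storage_provider`, and the idea that "existing S3 users" are detected by an unset storageProvider plus stored AWS credentials are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


class StorageConfigError(Exception):
    """Raised when no upload target is usable (hypothetical error type)."""


@dataclass
class StorageState:
    storage_provider: Optional[str]  # "fal", "s3", or None if never set explicitly
    has_fal_key: bool
    has_aws_credentials: bool
    s3_bucket_name: str = ""


def resolve_storage_provider(state: StorageState) -> str:
    """Pick the upload target; meant to be called fresh on each upload."""
    s3_ready = state.has_aws_credentials and bool(state.s3_bucket_name)

    if state.storage_provider == "s3":
        if not s3_ready:
            raise StorageConfigError("S3 selected but credentials or bucket are missing")
        return "s3"

    # Assumed migration rule: users who configured S3 before the fal default
    # existed stay on S3 instead of being switched silently.
    if state.storage_provider is None and s3_ready:
        return "s3"

    if state.has_fal_key:
        return "fal"

    raise StorageConfigError("add a fal key or switch to S3")
```

Because the function takes the current state as an argument and caches nothing, switching providers takes effect on the next upload, which matches the "read live" behavior described above.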
Frame Minion uses ffmpeg to:
- Sequence the per-segment clips into one continuous video,
- Generate transitions and effects at clip boundaries (dip-to-black, wipes, flash, shake, impact-zoom, glitch, and the rest),
- Smooth seams between adjacent clips,
- Extract start/end frames from any existing video you attach,
- Encode both the in-editor preview and the universal final MP4.
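For a feel of what sequencing with transitions involves, here is a minimal sketch that builds a filter graph chaining ffmpeg's `xfade` filter across several clips. It is an illustration under assumptions, not the graph Frame Minion emits: the function name and the `[vN]` labels are made up, only video is handled, and `xfade` expects all inputs to share resolution, frame rate, and timebase.

```python
from typing import List


def xfade_filtergraph(durations: List[float], fade: float = 0.5,
                      transition: str = "fade") -> str:
    """Build a -filter_complex string that cross-transitions N video inputs.

    Each xfade offset is the length of everything joined so far minus the
    overlap, i.e. sum(durations[:k]) - k * fade for the k-th transition.
    """
    if len(durations) < 2:
        raise ValueError("need at least two clips")
    if any(d <= fade for d in durations):
        raise ValueError("every clip must be longer than the transition")

    parts = []
    prev = "[0:v]"
    offset = 0.0
    last = len(durations) - 1
    for i in range(1, len(durations)):
        offset += durations[i - 1] - fade
        out = "[vout]" if i == last else f"[v{i}]"
        parts.append(
            f"{prev}[{i}:v]xfade=transition={transition}"
            f":duration={fade:g}:offset={offset:g}{out}"
        )
        prev = out
    return ";".join(parts)


# Three 4-second clips joined with 1-second fades.
print(xfade_filtergraph([4, 4, 4], fade=1))
```

The resulting string would be passed to ffmpeg as `-filter_complex`, with `-map "[vout]"` selecting the joined video. The final length is the sum of the clip durations minus one fade per seam.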
Why ffmpeg is also needed for the in-editor preview. VS Code's built-in video player doesn't play AAC (mp4a) audio inside an MP4 — so a raw render would preview silently. Frame Minion uses ffmpeg to add an MP3 audio track to the preview specifically so you can hear it inside VS Code. (The universal final MP4 doesn't need this — it's encoded for broad/Apple compatibility and plays with sound anywhere.)
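As a rough illustration of the preview step, the sketch below builds an ffmpeg command that keeps the video stream untouched and re-encodes only the audio to MP3. Whether Frame Minion copies or re-encodes the video, and which bitrate it uses, is not stated, so the function name, the `192k` default, and the stream-copy choice are assumptions. It also assumes an ffmpeg build that includes the `libmp3lame` encoder.

```python
from typing import List


def preview_remux_args(ffmpeg: str, src: str, dst: str,
                       audio_bitrate: str = "192k") -> List[str]:
    """Return an argv list that writes an MP4 with an MP3 audio track."""
    return [
        ffmpeg,
        "-y",                  # overwrite the preview file if it exists
        "-i", src,
        "-c:v", "copy",        # leave the video stream as-is
        "-c:a", "libmp3lame",  # re-encode audio to MP3
        "-b:a", audio_bitrate,
        dst,
    ]
```

An argv list like this can be handed to `subprocess.run(args, check=True)` without going through a shell, which sidesteps quoting problems with paths that contain spaces.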
ffmpeg is not downloaded automatically
Frame Minion resolves a binary in this order, and never fetches anything on its own:
1. frameMinion.ffmpegPath setting: an explicit path you provide.
2. System ffmpeg on your PATH. Version 6+ recommended; the filters Frame Minion emits (xfade, minterpolate, mpdecimate) need at least 4.3.
3. A copy you previously installed through Frame Minion.
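The lookup order can be sketched as a short function. This is a hedged illustration, not the extension's code: `resolve_ffmpeg`, `parse_ffmpeg_version`, the `installed_path` argument, and the choice to fall through when the explicit path does not exist are all assumptions. The lookups are injected as parameters so the logic can be exercised without a real ffmpeg on the machine.

```python
import os
import re
import shutil
from typing import Callable, Optional, Tuple

MIN_VERSION = (4, 3)  # minimum quoted above for xfade, minterpolate, mpdecimate


def resolve_ffmpeg(
    explicit_path: str,
    installed_path: Optional[str],
    which: Callable[[str], Optional[str]] = shutil.which,
    is_file: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Return the first usable binary: setting, then PATH, then installed copy."""
    if explicit_path and is_file(explicit_path):
        return explicit_path
    system = which("ffmpeg")
    if system:
        return system
    if installed_path and is_file(installed_path):
        return installed_path
    return None  # caller shows the "ffmpeg is missing" state; nothing is fetched


def parse_ffmpeg_version(banner: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the first line of `ffmpeg -version`.

    Returns None for builds without a numeric version, such as git snapshots.
    """
    match = re.match(r"ffmpeg version n?(\d+)\.(\d+)", banner)
    return (int(match.group(1)), int(match.group(2))) if match else None
```

A caller could compare the parsed tuple against `MIN_VERSION` and warn when it is lower, while treating `None` as "unknown" rather than "too old".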
If none resolve, Frame Minion tells you ffmpeg is missing — a banner where rendering is gated, and a status indicator in Settings — and offers a one-click Download & Install. That fetches a pinned, SHA-256-verified static build for your platform and caches it for all future renders. You can Uninstall it later from Settings; that removes only Frame Minion's downloaded copy and never touches a system ffmpeg.
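"SHA-256-verified" means the downloaded file's hash is compared against a known value before the build is trusted. A minimal sketch of that check follows; the function name and chunk size are illustrative, and where Frame Minion stores its pinned hashes is not covered here.

```python
import hashlib
import hmac


def sha256_matches(path: str, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Hash the file in chunks and compare against the pinned digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so a large binary never has to fit in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return hmac.compare_digest(digest.hexdigest(), expected_hex.strip().lower())
```

A download that fails this check should be deleted rather than cached, so a corrupted or tampered file is never used for rendering.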
Settings defaults at a glance
| Setting | Default | What it controls |
|---|---|---|
| openRouterModel | openai/gpt-5.1 | Planner / analysis / prompt-enhance text model. |
| imageModel | gemini-3.1-flash-image | Frame & reference image generation. |
| videoModel | google/veo-3.1-fast | Per-segment / single-shot video generation. |
| storageProvider | fal | Where video reference media is uploaded (fal or s3). |
| maxConcurrentVideos | 1 (1–5) | How many segment videos generate in parallel. |
| outputDirectory | frame-minion | Folder (under your workspace) where projects and media are written. |
| awsRegion | us-east-1 | S3 region (only used when storageProvider is s3). |
| s3BucketName | "" | Your S3 bucket (only used when storageProvider is s3). |
| maxConversationTurns | 10 | Conversation history depth for chat-style flows. |
| ffmpegPath | "" | Optional explicit path; empty = auto-resolve (system → installed → prompt). |
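The table can also be read as a set of defaults that user overrides are merged into. The sketch below mirrors the table's values; the `effective_settings` helper, the clamping of maxConcurrentVideos, and the rejection of unknown keys are assumptions about how such settings might be validated, not documented behavior.

```python
from typing import Any, Dict

# Defaults copied from the table above.
DEFAULTS: Dict[str, Any] = {
    "openRouterModel": "openai/gpt-5.1",
    "imageModel": "gemini-3.1-flash-image",
    "videoModel": "google/veo-3.1-fast",
    "storageProvider": "fal",
    "maxConcurrentVideos": 1,
    "outputDirectory": "frame-minion",
    "awsRegion": "us-east-1",
    "s3BucketName": "",
    "maxConversationTurns": 10,
    "ffmpegPath": "",
}


def effective_settings(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge overrides into the defaults and apply basic range checks."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    merged = {**DEFAULTS, **overrides}
    if merged["storageProvider"] not in ("fal", "s3"):
        raise ValueError("storageProvider must be 'fal' or 's3'")
    # Keep parallel video generation inside the documented 1-5 range.
    merged["maxConcurrentVideos"] = max(1, min(5, int(merged["maxConcurrentVideos"])))
    return merged
```

For example, `effective_settings({"storageProvider": "s3", "s3BucketName": "my-media"})` keeps every other value at its default, including awsRegion at us-east-1.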
What needs a key vs. what's automatic
| Piece | Status | Notes |
|---|---|---|
| OpenRouter | Required | Planning, images, default video. Image generation needs only this. |
| fal.ai | Recommended | Default uploader for video references; required for Wan / lip-sync. Skip it if you only generate images. |
| Amazon S3 | Optional | Opt-in alternative to fal for hosting video reference media. |
| ffmpeg | One-click install | Used locally for rendering. Auto-detected if present; otherwise installed (and removable) from Settings. |
The shortest path: one OpenRouter key gets you planning and images. Add a fal key the moment you want video (or lip-sync). Reach for S3 only if you'd rather host reference files yourself. And ffmpeg is the single thing you install — one click, fully removable, only ever at your request.