Make Video AI — Describe It, and the AI Makes It
Video production has been a gatekept craft for decades. Making a professional video required a camera ($500-$50,000), lighting ($200-$5,000), a location (free to $10,000/day), talent ($100-$10,000/day), editing software ($0-$600/year), editing skill (months to years of learning), music licensing ($50-$500/track), and time (hours to weeks per video). A single professional marketing video cost $2,000-$25,000 in total and took 1-6 weeks. These numbers meant that video production was reserved for companies with budgets and individuals with specialized skills. Everyone else (small businesses, solo entrepreneurs, educators, students, nonprofits, community organizations) either paid for expensive production or went without video entirely.
NemoVideo removes every barrier simultaneously. No camera needed: AI generates visuals from descriptions. No editing software: the AI handles cuts, transitions, pacing, and formatting. No production skill: describe the result you want in plain language. No music licensing: royalty-free music is selected and synced automatically. No time investment: minutes instead of weeks. The cost of making a professional video drops from thousands of dollars to the price of an API call. The skill required drops from years of training to the ability to describe what you want.
Use Cases
- Small Business: First Marketing Video (30-90s). A local bakery has never had a marketing video. The owner types: "Make a 45-second video showing my bakery's fresh bread, croissants, and cakes. Morning light feeling. Show the bread being sliced, the croissants golden and flaky, and a beautiful cake display. End with: Fresh daily at Sunrise Bakery, 123 Main St. Warm, inviting music." NemoVideo produces: warm-lit bakery visuals with close-ups of each product, appetizing slow motion on the bread slice and croissant flake, a wide shot of the bakery display, text overlays with product names, and a CTA end frame with the address and a logo placeholder. The bakery's first video, at professional quality, from a text description.
- Educator: Lesson Video from Notes (5-15 min). A high school teacher needs a video explaining the French Revolution for remote students. They type their lesson outline and NemoVideo generates: an animated timeline of key events (1789-1799), illustrated scenes for major moments (Storming of the Bastille, Declaration of Rights, Reign of Terror), portrait introductions for key figures (Louis XVI, Robespierre, Marie Antoinette), cause-and-effect diagrams animated step by step, and a summary quiz prompt at the end. Narration: clear, educational, age-appropriate. A complete lesson video from teaching notes.
- Startup: Pitch Video (2-3 min). A founder needs a video for their crowdfunding campaign but has zero budget for video production. They describe: "We're building a smart water bottle that reminds you to drink, tracks hydration, and syncs with fitness apps. Show the bottle in different settings: gym, office, hiking. Highlight features: LED reminder ring, app dashboard, 24-hour battery. Include testimonials as text cards. End with early bird pricing and campaign link." NemoVideo produces: product visualization in each setting, feature callouts with animated icons, testimonial cards with customer quotes, a pricing tiers animation, and a CTA with the campaign link. A crowdfunding video that looks like a $5,000 production.
- Nonprofit: Awareness Campaign (60-120s). An animal shelter needs a fundraising video. They describe the mission and NemoVideo generates: an emotional opening (lonely animal waiting), the shelter's impact (animated statistics: 500 animals rescued this year), volunteer moments (community engagement visuals), success stories (adopted animals in happy homes), and a CTA ("Donate $25 to save a life, link below"). Music: emotional, hopeful. Narration: warm, compassionate. A cause video that drives donations without filming a single frame.
- Content Creator: Daily Video Without Camera (30-60s daily). A finance creator wants to post daily TikToks about money tips but refuses to appear on camera. Each day they type a tip ("Why you should never pay full price for a car") and NemoVideo generates: engaging visuals matching the topic (car dealership, negotiation scene, calculator), a punchy AI voiceover (120 words for 45 seconds), word-by-word captions in a trending style, beat-synced background music, and 9:16 vertical format. 30 seconds of typing produces a TikTok-ready video. 365 days of daily content from daily prompts.
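The voiceover figure in the last use case (120 words for 45 seconds) implies a narration pace of 160 words per minute. As a rough sketch of that arithmetic, the helper below estimates a word budget for other clip lengths. The function name and the choice of 160 words per minute as a default are assumptions made for illustration, not documented NemoVideo behavior.

```python
def voiceover_word_budget(duration_seconds: float, words_per_minute: float = 160.0) -> int:
    """Estimate how many narration words fit in a clip of the given length."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # 160 wpm is derived from the "120 words for 45 seconds" figure above.
    return round(duration_seconds * words_per_minute / 60)


print(voiceover_word_budget(45))  # 120
```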
How It Works
Step 1 — Describe Your Video
Type what you want. A single sentence works ("Make a 30-second ad for my coffee shop"). A detailed paragraph works better. The more specific, the more accurate the result.
Step 2 — Choose Style and Format
Select: visual style, voice character, music mood, duration, and platform.
Step 3 — Generate
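```bash
curl -X POST https://mega-api-prod.nemovideo.ai/api/v1/generate \
  -H "Authorization: Bearer $NEMO_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "skill": "make-video-ai",
    "prompt": "Create a 60-second video for a local coffee shop. Morning vibes: sunlight through windows, steam rising from fresh espresso, barista making latte art, customers smiling. Show 3 signature drinks with names and prices. Music: acoustic guitar, warm and inviting, at -16dB. Voice: friendly female, casual tone. End with: Open daily 7am-6pm, 456 Oak Street. Free WiFi. Hashtag #SunriseCoffee. Export for Instagram Reels (9:16) and Facebook Feed (1:1).",
    "visual_style": "warm-lifestyle",
    "voice": "friendly-female-casual",
    "music": "acoustic-warm",
    "music_volume": "-16dB",
    "duration": "60 sec",
    "captions": {"style": "minimal-elegant"},
    "formats": ["9:16", "1:1"]
  }'
```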
Step 4 — Review and Share
Preview both formats. Adjust: scene pacing, music energy, text placement. Share directly to social media or download.
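The request in Step 3 can also be assembled from Python. The sketch below only builds the request object and does not send it. The endpoint, the `skill` value, and the field names come from the curl example in this document; the helper name `build_generate_request` and the fallback token string are illustrative assumptions.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example in this document.
API_URL = "https://mega-api-prod.nemovideo.ai/api/v1/generate"


def build_generate_request(prompt: str, token: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST request for the generate endpoint."""
    payload = {"skill": "make-video-ai", "prompt": prompt, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_generate_request(
    "Make a 30-second ad for my coffee shop",
    # "example-token" is a placeholder used when NEMO_TOKEN is not set.
    token=os.environ.get("NEMO_TOKEN", "example-token"),
    visual_style="warm-lifestyle",
    duration="30 sec",
    formats=["9:16", "1:1"],
)
# To submit it for real: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to inspect the JSON body before spending an API call.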
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | ✅ | Describe the video you want |
| `visual_style` | string | | `"warm-lifestyle"`, `"corporate-clean"`, `"cinematic"`, `"animated"`, `"vibrant"` |
| `voice` | string | | `"friendly-female"`, `"authoritative-male"`, `"energetic"`, `"calm"`, `"none"` |
| `music` | string | | `"acoustic-warm"`, `"upbeat"`, `"cinematic"`, `"corporate"`, `"lo-fi"`, `"none"` |
| `music_volume` | string | | `"-12dB"` to `"-22dB"` |
| `duration` | string | | `"15 sec"`, `"30 sec"`, `"60 sec"`, `"2 min"`, `"5 min"` |
| `captions` | object | | `{style, text, highlight, bg}` |
| `formats` | array | | `["16:9","9:16","1:1","4:5"]` |
| `brand` | object | | `{colors, logo, fonts}` |
| `batch_prompts` | array | | Multiple videos from multiple descriptions |
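A client-side check against the table above can catch typos before a request is sent. This is a minimal sketch, assuming the listed values are the complete allowed set, which the document does not state. `voice` is deliberately left unchecked because the request example in this document uses `friendly-female-casual`, a value not listed in the table. The names `ALLOWED_VALUES`, `ALLOWED_FORMATS`, and `validate_params` are hypothetical.

```python
# Values copied from the parameter table; treating them as exhaustive is an assumption.
ALLOWED_VALUES = {
    "visual_style": {"warm-lifestyle", "corporate-clean", "cinematic", "animated", "vibrant"},
    "music": {"acoustic-warm", "upbeat", "cinematic", "corporate", "lo-fi", "none"},
    "duration": {"15 sec", "30 sec", "60 sec", "2 min", "5 min"},
}
ALLOWED_FORMATS = {"16:9", "9:16", "1:1", "4:5"}


def validate_params(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    errors = []
    if not params.get("prompt"):
        errors.append("prompt is required")
    for name, allowed in ALLOWED_VALUES.items():
        if name in params and params[name] not in allowed:
            errors.append(f"{name}: unexpected value {params[name]!r}")
    for fmt in params.get("formats", []):
        if fmt not in ALLOWED_FORMATS:
            errors.append(f"formats: unexpected value {fmt!r}")
    volume = params.get("music_volume")
    if volume is not None:
        try:
            level = float(str(volume).removesuffix("dB"))
        except ValueError:
            errors.append(f"music_volume: cannot parse {volume!r}")
        else:
            # The table gives the range as "-12dB" to "-22dB".
            if not -22 <= level <= -12:
                errors.append(f"music_volume: {volume!r} is outside -22dB to -12dB")
    return errors


print(validate_params({"prompt": "Bakery promo", "visual_style": "cinematic"}))  # []
```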
Output Example
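```json
{
  "job_id": "mva-20260328-001",
  "status": "completed",
  "duration_seconds": 58,
  "outputs": {
    "reels_9x16": {
      "file": "coffee-shop-9x16.mp4",
      "resolution": "1080x1920",
      "duration": "0:58",
      "scenes": 8,
      "voice": "friendly-female-casual",
      "music": "acoustic-warm at -16dB",
      "captions": "minimal-elegant"
    },
    "feed_1x1": {
      "file": "coffee-shop-1x1.mp4",
      "resolution": "1080x1080",
      "duration": "0:58"
    }
  }
}
```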
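A response of this shape can be reduced to a list of rendered files. The sketch below uses an abbreviated sample with the field names from this document's output example (`status`, `outputs`, `file`, `resolution`). The helper name `list_output_files` and the rule of returning nothing until `status` is `completed` are assumptions.

```python
import json

# Abbreviated sample using field names from the output example in this document.
sample_response = json.loads("""
{
  "job_id": "mva-20260328-001",
  "status": "completed",
  "outputs": {
    "reels_9x16": {"file": "coffee-shop-9x16.mp4", "resolution": "1080x1920"},
    "feed_1x1": {"file": "coffee-shop-1x1.mp4", "resolution": "1080x1080"}
  }
}
""")


def list_output_files(response: dict) -> list[tuple[str, str, str]]:
    """Return (output name, file, resolution) for each rendered format."""
    if response.get("status") != "completed":
        return []  # assumption: files are only usable once the job has completed
    return [
        (name, output["file"], output["resolution"])
        for name, output in response.get("outputs", {}).items()
    ]


for name, file_name, resolution in list_output_files(sample_response):
    print(f"{name}: {file_name} ({resolution})")
```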
Tips
- Specific descriptions produce dramatically better videos. "A coffee shop video" generates something generic. "Morning light through a window hitting a white marble counter, steam curling from a ceramic mug, barista hands pouring a rosetta latte art" generates a scene you can feel. Sensory detail drives quality.
- One video idea + multiple formats = maximum platform coverage. Describe the video once. Export in 16:9 (YouTube), 9:16 (TikTok/Reels), 1:1 (Instagram/LinkedIn). One creative decision, three platforms covered.
- Batch generation turns a brainstorm into a content library. Write 20 video descriptions in a spreadsheet. Batch-generate all 20. Schedule across the month. A month of video content from one creative session.
- Match visual style to the audience expectation. Warm-lifestyle for food and wellness. Corporate-clean for B2B. Vibrant for fashion and entertainment. Cinematic for storytelling. The visual style signals what kind of content the viewer is about to watch.
- Duration should match the platform and the message density. 15-30 seconds for ads and product teasers. 45-60 seconds for social content. 2-5 minutes for educational and explainer videos. Longer is not better; appropriate length is.
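The spreadsheet-to-batch tip above can be sketched as a small CSV reader that fills `batch_prompts`. Several things here are assumptions: that `batch_prompts` accepts a plain list of description strings, that it is sent with the same `skill` value as a single request, and that the spreadsheet has a column named `description`. The helper name is hypothetical.

```python
import csv
import io


def batch_payload_from_csv(csv_text: str, column: str = "description", **shared_options) -> dict:
    """Collect non-empty descriptions from one CSV column into a batch request body."""
    prompts = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        text = (row.get(column) or "").strip()
        if text:  # skip rows where the description cell is blank
            prompts.append(text)
    return {"skill": "make-video-ai", "batch_prompts": prompts, **shared_options}


spreadsheet = """description,week
Morning light on fresh croissants at a neighborhood bakery,1
,2
Barista pouring rosetta latte art in a sunny coffee shop,3
"""
payload = batch_payload_from_csv(spreadsheet, formats=["16:9", "9:16", "1:1"])
print(len(payload["batch_prompts"]))  # 2
```

Passing `formats` once as a shared option mirrors the "describe once, export to several platforms" tip, assuming shared options apply to every prompt in the batch.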
Output Formats
| Format | Resolution | Use Case |
|---|---|---|
| MP4 16:9 | 1080p / 4K | YouTube / website / presentation |
| MP4 9:16 | 1080x1920 | TikTok / Reels / Shorts / Stories |
| MP4 1:1 | 1080x1080 | Instagram / Facebook / LinkedIn |
| MP4 4:5 | 1080x1350 | Facebook / Instagram feed |
| GIF | 720p | Email / web preview |
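The pixel sizes in the table are consistent with fixing the shorter side at 1080 pixels and scaling the other side by the aspect ratio. The sketch below reproduces that arithmetic; it is an observation about the table, not a documented NemoVideo rule, and `resolution_for` is a hypothetical helper.

```python
from fractions import Fraction


def resolution_for(aspect_ratio: str, short_side: int = 1080) -> tuple[int, int]:
    """Return (width, height) for a ratio such as "9:16", given the shorter side in pixels."""
    width_part, height_part = (int(part) for part in aspect_ratio.split(":"))
    ratio = Fraction(width_part, height_part)
    if ratio >= 1:
        # Landscape or square: the height is the shorter side.
        return (round(short_side * ratio), short_side)
    # Portrait: the width is the shorter side.
    return (short_side, round(short_side / ratio))


for fmt in ("16:9", "9:16", "1:1", "4:5"):
    print(fmt, resolution_for(fmt))
```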
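Make Video AI: Describe It, and the AI Makes It
For decades, video production has been a craft with a very high barrier to entry. Making a professional video required a camera ($500-$50,000), lighting ($200-$5,000), a location (free to $10,000/day), talent ($100-$10,000/day), editing software ($0-$600/year), editing skill (months to years of learning), music licensing ($50-$500/track), and time (hours to weeks per video). The total cost of a single professional marketing video: $2,000-$25,000. The total time: 1-6 weeks. These numbers meant that video production belonged only to companies with budgets and individuals with specialized skills. Everyone else (small businesses, solo entrepreneurs, educators, students, nonprofits, community groups) either paid for expensive production or gave up on video entirely. NemoVideo removes all of these barriers at once. No camera needed: AI generates visuals from descriptions. No editing software: the AI handles cuts, transitions, pacing, and formatting. No production skill: describe the result you want in plain language. No music licensing: royalty-free music is selected and synced automatically. No time investment: minutes instead of weeks. The cost of making a professional video drops from thousands of dollars to the price of an API call. The skill required drops from years of training to the ability to describe what you want.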
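Use Cases
- Small Business: First Marketing Video (30-90s). A local bakery has never had a marketing video. The owner types: "Make a 45-second video showing my bakery's fresh bread, croissants, and cakes. Morning light feeling. Show the bread being sliced, the croissants golden and flaky, and a beautiful cake display. End with: Fresh daily at Sunrise Bakery, 123 Main St. Warm, inviting music." NemoVideo generates: warm-lit bakery scenes, close-ups of each product, appetizing slow motion on the bread slice and croissant flake, a wide shot of the bakery display, text overlays with product names, and a call-to-action end frame with the address and a logo placeholder. The bakery's first video, at professional quality, from a text description.
- Educator: Lesson Video from Notes (5-15 min). A high school teacher needs a video explaining the French Revolution to remote students. They type their lesson outline and NemoVideo generates: an animated timeline of key events (1789-1799), illustrated scenes for major moments (Storming of the Bastille, Declaration of Rights, Reign of Terror), portrait introductions for key figures (Louis XVI, Robespierre, Marie Antoinette), cause-and-effect diagrams animated step by step, and a summary quiz prompt at the end. Narration: clear, educational, age-appropriate. A complete lesson video generated from teaching notes.
- Startup: Pitch Video (2-3 min). A founder needs a video for a crowdfunding campaign but has zero budget for video production. They describe: "We're building a smart water bottle that reminds you to drink, tracks hydration, and syncs with fitness apps. Show the bottle in different settings: gym, office, hiking. Highlight features: LED reminder ring, app dashboard, 24-hour battery. Include testimonials as text cards. End with early bird pricing and campaign link." NemoVideo generates: product visualization in each setting, feature callouts with animated icons, testimonial cards with customer quotes, a pricing tiers animation, and a call to action with the campaign link. A crowdfunding video that looks like a $5,000 production.
- Nonprofit: Awareness Campaign (60-120s). An animal shelter needs a fundraising video. They describe the mission and NemoVideo generates: an emotional opening (a lonely animal waiting), the shelter's impact (animated statistics: 500 animals rescued this year), volunteer moments (community engagement visuals), success stories (adopted animals in happy homes), and a call to action ("Donate $25 to save a life, link below"). Music: emotional, hopeful. Narration: warm, compassionate. A cause video that drives donations without filming a single frame.
- Content Creator: Daily Video Without Camera (30-60s daily). A finance creator wants to post daily TikToks about money tips but refuses to appear on camera. Each day they type a tip ("Why you should never pay full price for a car") and NemoVideo generates: engaging visuals matching the topic (car dealership, negotiation scene, calculator), a punchy AI voiceover (120 words for 45 seconds), word-by-word captions in a trending style, beat-synced background music, and 9:16 vertical format. 30 seconds of typing produces a TikTok-ready video. 365 days of daily content from daily prompts.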
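How It Works
Step 1: Describe Your Video
Type what you want. A single sentence works ("Make a 30-second ad for my coffee shop"). A detailed paragraph works better. The more specific, the more accurate the result.
Step 2: Choose Style and Format
Select: visual style, voice character, music mood, duration, and platform.
Step 3: Generate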
```bash
curl -X POST https://mega-api-prod.nemovideo.ai/api/v1/generate \
  -H "Authorization: Bearer $NEMO_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "skill": "make-video-ai",
    "prompt": "Create a 60-second video for a local coffee shop. Morning vibes: sunlight through windows, steam rising from fresh espresso, barista making latte art, customers smiling. Show 3 signature drinks with names and prices. Music: acoustic guitar, warm and inviting, at -16dB. Voice: friendly female, casual tone. End with: Open daily 7am-6pm, 456 Oak Street. Free WiFi. Hashtag #SunriseCoffee. Export for Instagram Reels (9:16) and Facebook Feed (1:1).",
    "visual_style": "warm-lifestyle",
    "voice": "friendly-female-casual",
    "music": "acoustic-warm",
    "music_volume": "-16dB",
    "duration": "60 sec",
    "captions": {"style": "minimal-elegant"},
    "formats": ["9:16", "1:1"]
  }'
```
Step 4: Review and Share
Preview both formats. Adjust: scene pacing, music energy, text placement. Share directly to social media or download.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | ✅ | Describe the video you want |
| `visual_style` | string | | `"warm-lifestyle"`, `"corporate-clean"`, `"cinematic"`, `"animated"`, `"vibrant"` |
| `voice` | string | | `"friendly-female"`, `"authoritative-male"`, `"energetic"`, `"calm"`, `"none"` |
| `music` | string | | `"acoustic-warm"`, `"upbeat"`, `"cinematic"`, `"corporate"`, `"lo-fi"`, `"none"` |
| `music_volume` | string | | `"-12dB"` to `"-22dB"` |
| `duration` | string | | `"15 sec"`, `"30 sec"`, `"60 sec"`, `"2 min"`, `"5 min"` |
| `captions` | object | | `{style, text, highlight, bg}` |
| `formats` | array | | `["16:9","9:16","1:1","4:5"]` |
| `brand` | object | | `{colors, logo, fonts}` |
| `batch_prompts` | array | | Multiple videos from multiple descriptions |
Output Example
```json
{
  "job_id": "mva-20260328-001",
  "status": "completed",
  "duration_seconds": 58,
  "outputs": {
    "reels_9x16": {
      "file": "coffee-shop-9x16.mp4",
      "resolution": "1080x1920",
      "duration": "0:58",
      "scenes": 8,
      "voice": "friendly-female-casual",
      "music": "acoustic-warm at -16dB",
      "captions": "minimal-elegant"
    },
    "feed_1x1": {
      "file": "coffee-shop-1x1.mp4",
      "resolution": "1080x1080",
      "duration": "0:58"
    }
  }
}
```
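Tips
- Specific descriptions produce dramatically better videos. "A coffee shop video" generates something generic. "Morning light through a window hitting a white marble counter, steam curling from a ceramic mug, barista hands pouring a rosetta latte art" generates a scene you can feel. Sensory detail drives quality.
- One video idea + multiple formats = maximum platform coverage. Describe the video once. Export in 16:9 (YouTube), 9:16 (TikTok/Reels), 1:1 (Instagram/LinkedIn). One creative decision, three platforms covered.
- Batch generation turns a brainstorm into a content library. In a spreadsheet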