Wan 2.7: The Complete Guide (2026)

Alibaba's Wan 2.7 explained: Thinking Mode, native audio, instruction-based editing, and how to call the API. Real examples included.

Wan 2.7 is Alibaba Tongyi Lab's latest AI video model, and it comes from a different lineage entirely than ByteDance's Seedance line. It is a four-mode suite covering text-to-video, image-to-video, reference-to-video, and instruction-based editing, with a built-in planning step and native audio. This guide covers what it is, what it can do, how pricing and access work, and how it stacks up against Wan's own earlier versions and against Seedance.

What is Wan 2.7?

Wan 2.7 is Alibaba Tongyi Lab's flagship AI video model, released in April 2026. Rather than being a single generation mode, it's a suite: text-to-video, image-to-video, reference-to-video, and instruction-based editing are all available through one endpoint, so you can generate a clip, continue it, anchor it to a reference, or revise it without switching tools.

Key features

A four-mode video suite. One model family handles generation, continuation, reference-driven shots, and prompt-based revision, which cuts down on juggling separate tools for each stage of a video workflow.

Thinking Mode. Before generating, Wan 2.7 interprets and plans your prompt, building a structural understanding of the scene first. That means clear direction on subject, lighting, camera, and mood tends to reward you with more intentional output, rather than the model just pattern-matching keywords.

Native synchronized audio. It generates ambient sound, lip-synced dialogue, and background music in the same pass as the video, and it can clone a voice from a supplied audio reference rather than requiring a separate voice model.

First-and-last-frame control. Set a starting frame and an ending frame, and Wan 2.7 generates coherent motion across everything in between, useful for anything where the exact opening and closing shot matters.

Instruction-based editing. Feed it an existing clip and a text instruction (change the background, swap an object, shift the camera), and it applies the edit locally or globally without re-rendering the whole thing from scratch.

Multi-reference consistency. It accepts up to five image or video references to hold a character's identity and voice across shots. Motion is physics-aware, output goes up to 1080p, and generation runs noticeably faster than in previous Wan versions.

Specs at a glance

Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4
Resolutions: 720p, 1080p
Durations: 2 to 15 seconds
Audio: Supported, including voice cloning from a reference
Avg. completion time: around three minutes
Image input: Supported (first frame, last frame, references)
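
The limits in the table lend themselves to a quick pre-flight check, so a bad value fails locally instead of costing a round trip. The following is a minimal sketch: the function name and argument order are illustrative and not part of any official tooling, and treating duration as a whole number of seconds is an assumption (the table only gives the 2 to 15 second range).

```bash
#!/usr/bin/env bash
# Pre-flight check against the specs table above.
# ASSUMPTION: validate_wan27_params is a hypothetical helper, and durations
# are treated as whole seconds; the source only states "2 to 15 seconds".

validate_wan27_params() {
  local ratio="$1" resolution="$2" duration="$3"

  case "$ratio" in
    16:9|9:16|1:1|4:3|3:4) ;;
    *) echo "unsupported aspect ratio: $ratio" >&2; return 1 ;;
  esac

  case "$resolution" in
    720p|1080p) ;;
    *) echo "unsupported resolution: $resolution" >&2; return 1 ;;
  esac

  # Reject empty or non-numeric durations before comparing them.
  case "$duration" in
    ''|*[!0-9]*) echo "duration must be a whole number of seconds" >&2; return 1 ;;
  esac

  if [ "$duration" -lt 2 ] || [ "$duration" -gt 15 ]; then
    echo "duration out of range (2-15 s): $duration" >&2
    return 1
  fi
}
```

Calling `validate_wan27_params 16:9 720p 5` succeeds silently, while `validate_wan27_params 16:9 720p 30` prints an error and returns a non-zero status.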

Wan 2.7 vs Wan 2.5 and 2.6

If you're still on an earlier Wan version, or you've been searching for "Wan 2.5" or "Wan 2.6" and landed here instead, here's what's actually new in 2.7.

Thinking Mode and instruction-based editing are both new to this release; neither 2.5 nor 2.6 plans the prompt before generating or supports text-driven edits to an existing clip. Multi-reference consistency is also more developed: earlier versions supported fewer simultaneous references, while 2.7 handles up to five image or video references to hold a character's identity and voice across a whole sequence. First-and-last-frame control existed in an earlier form but is more reliable here, and inference is meaningfully faster across the board.

If your use case leans on editing existing footage or needs tight character consistency across several shots, 2.7 is worth the move. If you're doing simple one-off text-to-video clips, the practical difference is smaller.

How to use Wan 2.7 (via API)

Using Wan 2.7 through an API comes down to the same pattern as most modern video models: get a key, submit a request, get the result.

```bash
curl -X POST https://api.apiframe.ai/v2/videos/generate \
  -H "X-API-Key: afk_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a cinematic sunrise over a futuristic cityscape, smooth camera push-in",
    "model": "wan-2.7",
    "wan27Params": {
      "resolution": "720p"
    }
  }'
```

That returns a jobId immediately, since generation is asynchronous (average completion is around three minutes). Poll the job endpoint until the status is COMPLETED, or register a webhook so the result comes to you instead, which is the better pattern once you're past testing.
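
The polling approach described above might look roughly like this. It is a sketch under stated assumptions: the job status URL is not given in this guide, so it is passed in as an argument, and the `status` field name in the response is a guess (only the `COMPLETED` value and `jobId` come from the text). Check both against the API docs before relying on it.

```bash
#!/usr/bin/env bash
# Sketch of a polling loop.
# ASSUMPTIONS: the status URL and the "status" response field are
# placeholders, not confirmed by this guide; take the real ones from the docs.

API_KEY="${APIFRAME_API_KEY:-afk_your_api_key_here}"

# Pull a string field out of a flat, single-line JSON response without jq.
# Good enough for a sketch; use a real JSON parser for anything serious.
json_field() {
  local field="$1" json="$2"
  printf '%s\n' "$json" |
    sed -n "s/.*\"$field\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# Poll until the job reports COMPLETED, or give up after max_attempts tries.
# Any other status, including whatever failure state the API defines, simply
# keeps the loop going until the attempt budget runs out.
poll_job() {
  local status_url="$1" max_attempts="${2:-60}" delay="${3:-10}"
  local attempt=1 response status
  while [ "$attempt" -le "$max_attempts" ]; do
    response="$(curl -sS -H "X-API-Key: $API_KEY" "$status_url")" || return 1
    status="$(json_field status "$response")"
    if [ "$status" = "COMPLETED" ]; then
      printf '%s\n' "$response"
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "gave up after $max_attempts attempts" >&2
  return 1
}
```

With the defaults (60 attempts, 10 seconds apart) the loop waits up to ten minutes, which leaves headroom over the roughly three-minute average. Usage would be `poll_job "$STATUS_URL"`, where `STATUS_URL` is whatever job endpoint the docs specify for your `jobId`.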

For image-to-video, pass an image as the first frame. For first-and-last-frame control, add a last_frame alongside an anchor image or clip. For instruction-based editing, point first_clip at an existing video and describe the change in your prompt. The full Wan 2.7 API docs cover every parameter, including negative prompts, prompt expansion, and seeding for reproducible output.
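
To make the three variations above concrete, here is a small helper that assembles a request body from `key=value` pairs. Treat it as an illustration only: `last_frame` and `first_clip` are the names used in this guide, but `image` as the first-frame field name, and the placement of all three inside `wan27Params`, are assumptions that the API docs may contradict. The example URLs are placeholders.

```bash
#!/usr/bin/env bash
# Build a Wan 2.7 request body from a prompt plus key=value parameters.
# ASSUMPTIONS: the "image" field name and nesting these fields under
# wan27Params are guesses; only last_frame and first_clip are named in the guide.
# Values are inserted as-is, so prompts or URLs containing double quotes or
# backslashes would need JSON escaping first.

build_payload() {
  local prompt="$1" params="" pair key value
  shift
  for pair in "$@"; do
    key="${pair%%=*}"    # text before the first "="
    value="${pair#*=}"   # text after the first "=" (may itself contain "=")
    params="${params}${params:+, }\"${key}\": \"${value}\""
  done
  printf '{"prompt": "%s", "model": "wan-2.7", "wan27Params": {%s}}\n' \
    "$prompt" "$params"
}

# Image-to-video: one image as the first frame.
build_payload "slow turntable rotation, soft studio light" \
  "image=https://example.com/product.jpg"

# First-and-last-frame: anchor image plus last_frame.
build_payload "closed flower bud opening into a full bloom" \
  "image=https://example.com/bud.jpg" \
  "last_frame=https://example.com/bloom.jpg"

# Instruction-based editing: existing clip plus the change described in the prompt.
build_payload "change the background to a rainy night" \
  "first_clip=https://example.com/clip.mp4"
```

Each call prints one JSON body, which can be passed to the `-d` option of the same `curl` request used for text-to-video.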

Wan 2.7 examples and prompts

A few real prompts that show the range of what Wan 2.7 can do:

Text-to-video: "A 5-second clip of a chef slicing vegetables in a modern kitchen, smooth left-to-right pan, cinematic lighting."

Image-to-video: "A 5-second image-to-video clip animating this product photo, slow turntable rotation, soft studio light."

First-and-last-frame: "A 5-second first-and-last-frame clip morphing a closed flower bud into a full bloom."

Reference-driven consistency: "A 5-second reference-based clip keeping this character consistent as she walks through a busy market."

Native audio: "A 5-second clip of a drone ascending over a mountain river at sunset, orchestral background music."

Instruction-based editing: "Edit this 5-second clip, change the background to a rainy night and keep the subject and motion intact."

You can see these generated, and copy the exact prompts, on the Wan 2.7 model page.
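
Most of the prompts above follow the same shape: a duration, a subject, a camera move, and a lighting or mood cue. If you generate many clips, a tiny template keeps that structure consistent. This is only a convenience sketch; `build_prompt` is a hypothetical helper, and nothing in the model requires this exact wording.

```bash
#!/usr/bin/env bash
# Assemble a prompt from the four pieces of direction the examples share.
# ASSUMPTION: build_prompt is an illustrative helper, not an official format.

build_prompt() {
  local seconds="$1" subject="$2" camera="$3" lighting="$4"
  printf 'A %s-second clip of %s, %s, %s.\n' \
    "$seconds" "$subject" "$camera" "$lighting"
}

# Reproduces the text-to-video example from the list above.
build_prompt 5 "a chef slicing vegetables in a modern kitchen" \
  "smooth left-to-right pan" "cinematic lighting"
```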

How Wan 2.7 compares to Seedance

Wan 2.7 and ByteDance's Seedance line are the two most complete video suites available right now, and they solve overlapping problems differently. Wan 2.7's edge is Thinking Mode and instruction-based editing: if your workflow involves revising existing footage through plain text instructions rather than regenerating from scratch, Wan 2.7 is built for that in a way Seedance isn't. Seedance 2.5, on the other hand, pushes further on raw single-pass duration (30 seconds versus Wan's 15) and reference budget (up to 50 versus Wan's 5).

If you're deciding between them, or want the full picture on ByteDance's side, the Seedance 2.0 guide and Seedance 2.5 guide cover that model family in the same depth as this page.

FAQ

What is Wan 2.7?

Wan 2.7 is Alibaba Tongyi Lab's flagship AI video model, released in April 2026. It's a four-mode suite covering text-to-video, image-to-video, reference-to-video, and instruction-based editing.

What is Thinking Mode?

A planning step where the model interprets and structures your prompt before generating, so clear, detailed direction produces more intentional results.

Does it generate audio?

Yes. Wan 2.7 produces synchronized native audio, including ambient sound, lip-synced dialogue, and background music, and it supports voice cloning from a reference.

Can it edit existing videos?

Yes. Instruction-based editing lets you change the scene, style, objects, or action in an existing clip through a text instruction, applied locally or globally.

How is Wan 2.7 different from Wan 2.5 and 2.6?

Thinking Mode and instruction-based editing are new to 2.7, multi-reference consistency now handles up to five references, and inference is faster overall. See the comparison section above.

Where can you access it?

Through Apiframe, as well as Alibaba Cloud Model Studio, the official Wan website, and (eventually) the Qwen app.

Ready to try it? Get an API key and start with free credits, or head to the Wan 2.7 API page for the full docs and live pricing.
