Lip Sync on KnightLi Blog

MultiTalk: Audio-Driven Multi-Person Conversational Video Generation

Mon, 15 Jun 2026 08:59:09 +0800

MeiGen-AI/MultiTalk is an audio-driven multi-person conversational video generation project. Its paper is titled Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation. The project aims to generate multi-person conversational videos from multi-stream audio, a reference image, and prompts, keeping each character's lip motion in sync with its audio stream while the interaction follows the prompt.

MultiTalk is not limited to single-person digital humans. It emphasizes multi-person conversations, singing, interactive control, and cartoon character generation. It supports 480P and 720P output. The README notes that the current code mainly supports 480P inference, while 720P requires multiple GPUs.

Main Capabilities

MultiTalk’s capabilities can be summarized as follows:

Single-person and multi-person audio-driven video generation.
Prompt-based control of virtual character interaction.
Generalized generation for cartoon characters and singing.
480P and 720P output.
Video generation up to about 15 seconds.
Support for multi-GPU inference, TeaCache, APG, low-VRAM inference, TTS, Gradio, LoRA acceleration, and INT8 quantization.

If you are building multi-person interviews, podcast videos, digital human conversations, character interactions, or TTS-driven character videos, MultiTalk is a closer fit than single-person lip-sync tools.

Installation

The official README installation process is preserved below. Use a CUDA-capable Linux environment and prepare Conda, Git, enough VRAM, and enough disk space. The environment name in the README is multitalk.

Clone the repository and enter the directory:

git clone https://github.com/meigen-ai/multitalk.git
cd multitalk

1. Create a conda environment and install PyTorch and xformers

conda create -n multitalk python=3.10
conda activate multitalk
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
pip install -U xformers==0.0.28 --index-url https://download.pytorch.org/whl/cu121

This uses the PyTorch wheel for CUDA 12.1. If your driver, CUDA version, or platform differs, adjust the command according to the official PyTorch installation guide.

2. Install Flash-Attn

pip install misaki[en]
pip install ninja
pip install psutil
pip install packaging
pip install flash_attn==2.7.4.post1

flash_attn is sensitive to CUDA, the compiler toolchain, and PyTorch versions. If installation fails, first check whether CUDA, nvcc, GCC, Python, and the PyTorch wheel match.
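
Before retrying a failed build, it can help to collect those versions in one place. The following is a minimal sketch using only the Python standard library; the function names `tool_version` and `toolchain_report` are illustrative and are not part of MultiTalk or flash_attn.

```python
import platform
import shutil
import subprocess
from typing import Dict, Optional


def tool_version(executable: str) -> Optional[str]:
    """Return the output of `<executable> --version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return (result.stdout or result.stderr).strip()


def toolchain_report() -> Dict[str, Optional[str]]:
    """Collect the interpreter and compiler versions that a flash_attn build relies on."""
    return {
        "python": platform.python_version(),
        "nvcc": tool_version("nvcc"),
        "gcc": tool_version("gcc"),
    }


if __name__ == "__main__":
    for name, version in toolchain_report().items():
        print(f"{name}: {version if version is not None else 'not found on PATH'}")
```

The sketch deliberately does not import torch. Inside the activated environment, `python -c "import torch; print(torch.__version__, torch.version.cuda)"` shows which CUDA version the installed wheel targets, which can then be compared with the nvcc output.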

3. Install other dependencies

pip install -r requirements.txt
conda install -c conda-forge librosa

4. Install FFmpeg

Inside the Conda environment:

conda install -c conda-forge ffmpeg

Or install it at the system level. The README gives this yum example:

sudo yum install ffmpeg ffmpeg-devel

Model Preparation

MultiTalk needs these models:

Model	Purpose
`Wan2.1-I2V-14B-480P`	Base model
`chinese-wav2vec2-base`	Audio encoder
`Kokoro-82M`	TTS weights
`MeiGen-MultiTalk`	MultiTalk audio condition weights

The README uses huggingface-cli to download models. The official commands are:

huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P --local-dir ./weights/Wan2.1-I2V-14B-480P
huggingface-cli download TencentGameMate/chinese-wav2vec2-base --local-dir ./weights/chinese-wav2vec2-base
huggingface-cli download TencentGameMate/chinese-wav2vec2-base model.safetensors --revision refs/pr/1 --local-dir ./weights/chinese-wav2vec2-base
huggingface-cli download hexgrad/Kokoro-82M --local-dir ./weights/Kokoro-82M
huggingface-cli download MeiGen-AI/MeiGen-MultiTalk --local-dir ./weights/MeiGen-MultiTalk

If huggingface-cli is not available, install Hugging Face Hub first:

pip install -U huggingface_hub
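
With huggingface_hub installed, the same downloads can also be scripted from Python. This is a sketch, not something the README provides: the `Download` tuple and the `fetch` helper are hypothetical names, while `snapshot_download` and `hf_hub_download` are the library's functions for fetching a whole repository and a single file respectively.

```python
from typing import NamedTuple, Optional


class Download(NamedTuple):
    repo_id: str
    local_dir: str
    filename: Optional[str] = None  # None means "the whole repository"
    revision: Optional[str] = None  # None means the repository's default branch


# Same repositories and target directories as the huggingface-cli commands above.
DOWNLOADS = [
    Download("Wan-AI/Wan2.1-I2V-14B-480P", "./weights/Wan2.1-I2V-14B-480P"),
    Download("TencentGameMate/chinese-wav2vec2-base", "./weights/chinese-wav2vec2-base"),
    Download(
        "TencentGameMate/chinese-wav2vec2-base",
        "./weights/chinese-wav2vec2-base",
        filename="model.safetensors",
        revision="refs/pr/1",
    ),
    Download("hexgrad/Kokoro-82M", "./weights/Kokoro-82M"),
    Download("MeiGen-AI/MeiGen-MultiTalk", "./weights/MeiGen-MultiTalk"),
]


def fetch(item: Download) -> str:
    """Download one entry and return the local path reported by huggingface_hub."""
    # Imported lazily so the list above can be inspected without the package installed.
    from huggingface_hub import hf_hub_download, snapshot_download

    if item.filename is None:
        return snapshot_download(
            repo_id=item.repo_id, revision=item.revision, local_dir=item.local_dir
        )
    return hf_hub_download(
        repo_id=item.repo_id,
        filename=item.filename,
        revision=item.revision,
        local_dir=item.local_dir,
    )
```

Calling `fetch(item)` for each entry in `DOWNLOADS` performs the downloads in the same order as the CLI commands. Nothing is downloaded when the module is merely imported.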

After download, the directory should roughly contain:

weights/
  Wan2.1-I2V-14B-480P/
  chinese-wav2vec2-base/
  Kokoro-82M/
  MeiGen-MultiTalk/
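
A small check can confirm this layout before the first run. The sketch below assumes the `weights/` root used in the download commands; `missing_weight_dirs` is an illustrative helper, and it only tests that each directory exists and is not empty, not that the files inside are complete.

```python
from pathlib import Path
from typing import List

EXPECTED_DIRS = (
    "Wan2.1-I2V-14B-480P",
    "chinese-wav2vec2-base",
    "Kokoro-82M",
    "MeiGen-MultiTalk",
)


def missing_weight_dirs(weights_root: str = "weights") -> List[str]:
    """Return the expected model directories that are absent or empty."""
    root = Path(weights_root)
    missing = []
    for name in EXPECTED_DIRS:
        model_dir = root / name
        if not model_dir.is_dir() or not any(model_dir.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_weight_dirs()
    print("all model directories present" if not problems else f"missing: {problems}")
```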

Link Or Copy The MultiTalk Model

The README asks you to link or copy the MultiTalk model into the Wan2.1-I2V-14B-480P directory.

Using symlinks:

mv weights/Wan2.1-I2V-14B-480P/diffusion_pytorch_model.safetensors.index.json weights/Wan2.1-I2V-14B-480P/diffusion_pytorch_model.safetensors.index.json_old
sudo ln -s {Absolute path}/weights/MeiGen-MultiTalk/diffusion_pytorch_model.safetensors.index.json weights/Wan2.1-I2V-14B-480P/
sudo ln -s {Absolute path}/weights/MeiGen-MultiTalk/multitalk.safetensors weights/Wan2.1-I2V-14B-480P/

Or copy the files:

mv weights/Wan2.1-I2V-14B-480P/diffusion_pytorch_model.safetensors.index.json weights/Wan2.1-I2V-14B-480P/diffusion_pytorch_model.safetensors.index.json_old
cp weights/MeiGen-MultiTalk/diffusion_pytorch_model.safetensors.index.json weights/Wan2.1-I2V-14B-480P/
cp weights/MeiGen-MultiTalk/multitalk.safetensors weights/Wan2.1-I2V-14B-480P/

If you use symlinks, replace {Absolute path} with your real absolute path. On Windows or WSL, also verify symlink permissions and path resolution.
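
For repeated setups, the symlink variant can be scripted so that the absolute path is resolved automatically. This is a sketch under the assumption that both directories already exist with the file names shown above; `link_multitalk` is a hypothetical helper, not part of the repository.

```python
import os
from pathlib import Path

INDEX_NAME = "diffusion_pytorch_model.safetensors.index.json"
LINKED_FILES = (INDEX_NAME, "multitalk.safetensors")


def link_multitalk(base_dir: str, multitalk_dir: str) -> None:
    """Back up the base index file, then symlink the MultiTalk files next to it."""
    base = Path(base_dir).resolve()
    source = Path(multitalk_dir).resolve()  # absolute, so links survive a cwd change

    original_index = base / INDEX_NAME
    # Only back up a real file; a symlink means this step already ran once.
    if original_index.exists() and not original_index.is_symlink():
        original_index.rename(base / (INDEX_NAME + "_old"))

    for name in LINKED_FILES:
        target = base / name
        if target.is_symlink():
            target.unlink()  # replace a stale link from an earlier run
        os.symlink(source / name, target)


if __name__ == "__main__":
    link_multitalk("weights/Wan2.1-I2V-14B-480P", "weights/MeiGen-MultiTalk")
```

If a regular file named multitalk.safetensors is already present in the base directory (for example from the copy variant), `os.symlink` raises `FileExistsError` rather than overwriting it.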

Common Parameters

The README lists these parameters:

--mode streaming: long video generation.
--mode clip: generate short video with one chunk.
--use_teacache: run with TeaCache.
--size multitalk-480: generate 480P video.
--size multitalk-720: generate 720P video.
--use_apg: run with APG.
--teacache_thresh: coefficient used for TeaCache acceleration.
--sample_text_guide_scale: without LoRA, the optimal value is 5; with LoRA, the recommended value is 1.
--sample_audio_guide_scale: without LoRA, the optimal value is 4; with LoRA, the recommended value is 2.

The README prints the last two options with a nonstandard leading dash and a full-width colon. They are normalized here to the standard -- prefix, which matches the flags used in the README's own example commands. If you copy from the README directly, check the dashes manually.
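
The LoRA-dependent guide scales are easy to mix up when switching between runs. A minimal sketch that encodes the two recommended pairs is shown here; `guide_scale_flags` is an illustrative name, and the values are the ones quoted from the README above.

```python
from typing import List


def guide_scale_flags(use_lora: bool) -> List[str]:
    """Return CLI flags carrying the README's suggested text and audio guide scales."""
    # With LoRA: text 1, audio 2. Without LoRA: text 5, audio 4.
    text_scale, audio_scale = (1.0, 2.0) if use_lora else (5.0, 4.0)
    return [
        "--sample_text_guide_scale", str(text_scale),
        "--sample_audio_guide_scale", str(audio_scale),
    ]


if __name__ == "__main__":
    print(" ".join(guide_scale_flags(use_lora=True)))
```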

Single-Person Inference

Single-person, single-GPU run:

python generate_multitalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --input_json examples/single_example_1.json \
    --sample_steps 40 \
    --mode streaming \
    --use_teacache \
    --save_file single_long_exp

Single-person low-VRAM run:

python generate_multitalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --input_json examples/single_example_1.json \
    --sample_steps 40 \
    --mode streaming \
    --num_persistent_param_in_dit 0 \
    --use_teacache \
    --save_file single_long_lowvram_exp

Single-person multi-GPU inference:

GPU_NUM=8
torchrun --nproc_per_node=$GPU_NUM --standalone generate_multitalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --dit_fsdp --t5_fsdp \
    --ulysses_size=$GPU_NUM \
    --input_json examples/single_example_1.json \
    --sample_steps 40 \
    --mode streaming \
    --use_teacache \
    --save_file single_long_multigpu_exp

Single-person TTS run:

python generate_multitalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --input_json examples/single_example_tts_1.json \
    --sample_steps 40 \
    --mode streaming \
    --num_persistent_param_in_dit 0 \
    --use_teacache \
    --save_file single_long_lowvram_tts_exp \
    --audio_mode tts
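
The single-GPU commands above differ only in the input JSON, the output name, and two optional flags. When launching them from a script, the argument list can be assembled programmatically. This is a sketch: `build_multitalk_command` is a hypothetical helper, and it covers only the flags that appear in the single-GPU examples.

```python
import shlex
from typing import List


def build_multitalk_command(
    input_json: str, save_file: str, low_vram: bool = False, tts: bool = False
) -> List[str]:
    """Assemble the argument list for a single-GPU generate_multitalk.py run."""
    cmd = [
        "python", "generate_multitalk.py",
        "--ckpt_dir", "weights/Wan2.1-I2V-14B-480P",
        "--wav2vec_dir", "weights/chinese-wav2vec2-base",
        "--input_json", input_json,
        "--sample_steps", "40",
        "--mode", "streaming",
        "--use_teacache",
        "--save_file", save_file,
    ]
    if low_vram:
        cmd += ["--num_persistent_param_in_dit", "0"]
    if tts:
        cmd += ["--audio_mode", "tts"]
    return cmd


if __name__ == "__main__":
    cmd = build_multitalk_command(
        "examples/single_example_tts_1.json",
        "single_long_lowvram_tts_exp",
        low_vram=True,
        tts=True,
    )
    print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```

Passing a list to `subprocess.run` avoids shell quoting problems when paths contain spaces.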

Multi-Person Inference

Multi-person, single-GPU run:

python generate_multitalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --input_json examples/multitalk_example_2.json \
    --sample_steps 40 \
    --mode streaming \
    --use_teacache \
    --save_file multi_long_exp

Multi-person low-VRAM run:

python generate_multitalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --input_json examples/multitalk_example_2.json \
    --sample_steps 40 \
    --mode streaming \
    --num_persistent_param_in_dit 0 \
    --use_teacache \
    --save_file multi_long_lowvram_exp

Multi-person multi-GPU inference:

GPU_NUM=8
torchrun --nproc_per_node=$GPU_NUM --standalone generate_multitalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --dit_fsdp --t5_fsdp --ulysses_size=$GPU_NUM \
    --input_json examples/multitalk_example_2.json \
    --sample_steps 40 \
    --mode streaming --use_teacache \
    --save_file multi_long_multigpu_exp

Multi-person TTS run:

python generate_multitalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --input_json examples/multitalk_example_tts_1.json \
    --sample_steps 40 \
    --mode streaming \
    --num_persistent_param_in_dit 0 \
    --use_teacache \
    --save_file multi_long_lowvram_tts_exp \
    --audio_mode tts

FusionX, Quantization, And Gradio

The FusionX or Lightx2v LoRA can reduce the number of sampling steps. The README's single-person example:

python generate_multitalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --input_json examples/single_example_1.json \
    --lora_dir weights/Wan2.1_I2V_14B_FusionX_LoRA.safetensors \
    --lora_scale 1.0 \
    --sample_text_guide_scale 1.0 \
    --sample_audio_guide_scale 2.0 \
    --sample_steps 8 \
    --mode streaming \
    --num_persistent_param_in_dit 0 \
    --save_file single_long_lowvram_fusionx_exp \
    --sample_shift 2

Multi-person example:

python generate_multitalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --input_json examples/multitalk_example_2.json \
    --lora_dir weights/Wan2.1_I2V_14B_FusionX_LoRA.safetensors \
    --lora_scale 1.0 \
    --sample_text_guide_scale 1.0 \
    --sample_audio_guide_scale 2.0 \
    --sample_steps 8 \
    --mode streaming \
    --num_persistent_param_in_dit 0 \
    --save_file multi_long_lowvram_fusionx_exp

The INT8 quantized model only supports single-GPU runs:

python generate_multitalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --input_json examples/multitalk_example_2.json \
    --sample_steps 40 \
    --mode streaming \
    --use_teacache \
    --quant int8 \
    --quant_dir weights/MeiGen-MultiTalk \
    --num_persistent_param_in_dit 0 \
    --save_file multi_long_lowvram_exp_quant

Quantization with LoRA:

python generate_multitalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --input_json examples/multitalk_example_1.json \
    --quant int8 \
    --quant_dir weights/MeiGen-MultiTalk \
    --lora_dir weights/MeiGen-MultiTalk/quant_models/quant_model_int8_FusionX.safetensors \
    --sample_text_guide_scale 1.0 \
    --sample_audio_guide_scale 2.0 \
    --sample_steps 8 \
    --mode streaming \
    --num_persistent_param_in_dit 0 \
    --save_file multi_long_lowvram_fusionx_exp_quant \
    --sample_shift 2

Gradio example:

python app.py \
    --lora_dir weights/Wan2.1_I2V_14B_FusionX_LoRA.safetensors \
    --lora_scale 1.0 \
    --num_persistent_param_in_dit 0 \
    --sample_shift 2

Or:

python app.py --num_persistent_param_in_dit 0

Quantized Gradio example:

python app.py \
    --quant int8 \
    --quant_dir weights/MeiGen-MultiTalk \
    --lora_dir weights/MeiGen-MultiTalk/quant_models/quant_model_int8_FusionX.safetensors \
    --sample_shift 2 \
    --num_persistent_param_in_dit 0

Practical Notes

MultiTalk has heavy dependencies and large models. Before deployment, check these points:

The current code mainly supports 480P inference; 720P requires multiple GPUs.
The official notes say audio CFG usually works well between 3 and 5, and increasing audio CFG can improve lip synchronization.
The model was trained on 81-frame videos at 25 FPS. Generating 81-frame clips gives the best prompt following; longer clips may reduce prompt-following quality.
During long-video generation, audio CFG affects color consistency across segments. Try setting it to 3 to reduce tonal variation.
The recommended --teacache_thresh range is 0.2 to 0.5. Higher values may run faster but can reduce video quality.
For low-VRAM environments, try --num_persistent_param_in_dit 0, INT8 quantization, or community low-VRAM solutions first.

The README also notes that TeaCache can provide about a 2x to 3x speedup. Actual results depend on GPU, resolution, sampling steps, number of people, and whether LoRA or quantization is used.
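
The numeric ranges in these notes can be checked before a long run starts. The sketch below is an assumption-laden convenience, not part of MultiTalk: `out_of_range` is a hypothetical helper, and the audio range applies to runs without LoRA, since the LoRA examples intentionally use an audio guide scale of 2.

```python
from typing import Dict

# Ranges quoted in the notes above (audio CFG 3 to 5, TeaCache threshold 0.2 to 0.5).
RECOMMENDED_RANGES = {
    "sample_audio_guide_scale": (3.0, 5.0),
    "teacache_thresh": (0.2, 0.5),
}


def out_of_range(settings: Dict[str, float]) -> Dict[str, str]:
    """Return a message for each known setting that falls outside its suggested range."""
    warnings = {}
    for name, value in settings.items():
        if name not in RECOMMENDED_RANGES:
            continue  # no recommendation recorded for this setting
        low, high = RECOMMENDED_RANGES[name]
        if not low <= value <= high:
            warnings[name] = f"{value} is outside the suggested range {low} to {high}"
    return warnings


if __name__ == "__main__":
    print(out_of_range({"sample_audio_guide_scale": 4.0, "teacache_thresh": 0.8}))
```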

Summary

MultiTalk is an audio-driven generation project for multi-person conversational video. It fits multi-person digital humans, interview videos, TTS-driven videos, cartoon character interaction, and singing scenarios.

For a quick trial, prepare the environment and models using the official installation process, complete the model linking or copying step, then start with the 480P single-person example. After that works, gradually try multi-person generation, TTS, TeaCache, LoRA acceleration, INT8 quantization, and Gradio.

Reference:

GitHub: https://github.com/meigen-ai/multitalk

InfiniteTalk: Audio-Driven Talking Video Generation With Long-Video Support

Mon, 15 Jun 2026 08:50:52 +0800

MeiGen-AI/InfiniteTalk is an audio-driven video generation project. Its goal is to synchronize input audio onto a person video or image, generating a new video where lip shape, head motion, body posture, and facial expressions follow the audio.

It is not positioned as a simple mouth replacement tool. The README describes InfiniteTalk as a sparse-frame video dubbing framework. It can preserve identity in video-to-video scenarios, supports long-video generation, and can also generate talking videos from a single image plus audio in image-to-video scenarios.

Main Capabilities

InfiniteTalk’s key capabilities are straightforward:

Audio-driven video-to-video generation.
Image-to-video generation from an input image and audio.
Synchronization beyond lips, including head, body, and facial expressions.
Long-video generation.
Compared with MultiTalk, the project description emphasizes reduced hand and body distortions and improved lip-sync accuracy.

This kind of project is suitable for interview dubbing, digital human video, localized lip synchronization, long-video redubbing, and virtual character expression. It is closer to a research and engineering tool than a lightweight desktop app.

Installation

The official README installation process is preserved below. Prepare a CUDA-capable Linux environment, Conda, Git, enough VRAM, and enough disk space first. The environment name in the README remains multitalk.

Clone the repository and enter the directory:

git clone https://github.com/MeiGen-AI/InfiniteTalk.git
cd InfiniteTalk

1. Create a conda environment and install PyTorch and xformers

conda create -n multitalk python=3.10
conda activate multitalk
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
pip install -U xformers==0.0.28 --index-url https://download.pytorch.org/whl/cu121

This uses the PyTorch wheel for CUDA 12.1. If your local CUDA, driver, or platform differs, adjust the command according to the official PyTorch installation guide.

2. Install Flash-Attn dependencies

pip install misaki[en]
pip install ninja
pip install psutil
pip install packaging
pip install wheel
pip install flash_attn==2.7.4.post1

flash_attn is sensitive to CUDA, compiler toolchain, and PyTorch versions. If this step fails, first check whether CUDA, nvcc, GCC, Python, and the PyTorch wheel match each other.

3. Install other dependencies

pip install -r requirements.txt
conda install -c conda-forge librosa

4. Install FFmpeg

Inside the Conda environment, you can use:

conda install -c conda-forge ffmpeg

You can also install it at the system level. The README gives this yum example:

sudo yum install ffmpeg ffmpeg-devel

Model Preparation

InfiniteTalk needs three types of models:

Model	Purpose
`Wan2.1-I2V-14B-480P`	Base model
`chinese-wav2vec2-base`	Audio encoder
`MeiGen-InfiniteTalk`	InfiniteTalk audio condition weights

The README uses huggingface-cli to download models. The official commands are:

huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P --local-dir ./weights/Wan2.1-I2V-14B-480P
huggingface-cli download TencentGameMate/chinese-wav2vec2-base --local-dir ./weights/chinese-wav2vec2-base
huggingface-cli download TencentGameMate/chinese-wav2vec2-base model.safetensors --revision refs/pr/1 --local-dir ./weights/chinese-wav2vec2-base
huggingface-cli download MeiGen-AI/InfiniteTalk --local-dir ./weights/InfiniteTalk

If huggingface-cli is not available, install Hugging Face Hub first:

pip install -U huggingface_hub

Some models may require logging in to Hugging Face or accepting the usage terms on the model page. After download, the directory structure should roughly contain:

weights/
  Wan2.1-I2V-14B-480P/
  chinese-wav2vec2-base/
  InfiniteTalk/

Common Runtime Parameters

The README lists several key parameters:

--mode streaming: long video generation.
--mode clip: generate short video with one chunk.
--use_teacache: run with TeaCache.
--size infinitetalk-480: generate 480P video.
--size infinitetalk-720: generate 720P video.
--use_apg: run with APG.
--teacache_thresh: coefficient used for TeaCache acceleration.
--sample_text_guide_scale: without LoRA, the optimal value is 5; with LoRA, the recommended value is 1.
--sample_audio_guide_scale: without LoRA, the optimal value is 4; with LoRA, the recommended value is 2.
--max_frame_num: maximum number of frames in the generated video; the default is 1000 frames (40 seconds).

The README prints the two guide-scale options with a nonstandard leading dash and a full-width colon. They are normalized here to the standard -- prefix, which matches the flags used in the README's own example commands. If you copy from the README directly, check the dashes manually.
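
The default of 1000 frames for 40 seconds implies 25 frames per second, so a target duration can be converted to a --max_frame_num value with simple arithmetic. A short sketch follows; the helper names are illustrative, and the 25 FPS constant is inferred from that default rather than stated as a parameter.

```python
import math

FPS = 25  # inferred from the README default: 1000 frames for 40 seconds


def max_frame_num_for(seconds: float, fps: int = FPS) -> int:
    """Smallest frame count that covers the requested duration."""
    return math.ceil(seconds * fps)


def duration_of(frames: int, fps: int = FPS) -> float:
    """Duration in seconds of a clip with the given frame count."""
    return frames / fps


if __name__ == "__main__":
    print(max_frame_num_for(60))  # frames needed for a one-minute video
    print(duration_of(1000))      # length of the default maximum
```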

Single-GPU Inference Example

Official single-GPU example:

python generate_infinitetalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --infinitetalk_dir weights/InfiniteTalk/single/infinitetalk.safetensors \
    --input_json examples/single_example_image.json \
    --size infinitetalk-480 \
    --sample_steps 40 \
    --mode streaming \
    --motion_frame 9 \
    --save_file infinitetalk_res

To run at 720P, change --size to infinitetalk-720:

python generate_infinitetalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --infinitetalk_dir weights/InfiniteTalk/single/infinitetalk.safetensors \
    --input_json examples/single_example_image.json \
    --size infinitetalk-720 \
    --sample_steps 40 \
    --mode streaming \
    --motion_frame 9 \
    --save_file infinitetalk_res_720p

For low-VRAM mode, add --num_persistent_param_in_dit 0:

python generate_infinitetalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --infinitetalk_dir weights/InfiniteTalk/single/infinitetalk.safetensors \
    --input_json examples/single_example_image.json \
    --size infinitetalk-480 \
    --sample_steps 40 \
    --num_persistent_param_in_dit 0 \
    --mode streaming \
    --motion_frame 9 \
    --save_file infinitetalk_res_lowvram

Multi-GPU, Multi-Person, And Gradio

Multi-GPU inference example:

GPU_NUM=8
torchrun --nproc_per_node=$GPU_NUM --standalone generate_infinitetalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --infinitetalk_dir weights/InfiniteTalk/single/infinitetalk.safetensors \
    --dit_fsdp --t5_fsdp \
    --ulysses_size=$GPU_NUM \
    --input_json examples/single_example_image.json \
    --size infinitetalk-480 \
    --sample_steps 40 \
    --mode streaming \
    --motion_frame 9 \
    --save_file infinitetalk_res_multigpu

Multi-person animation example:

python generate_infinitetalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --infinitetalk_dir weights/InfiniteTalk/multi/infinitetalk.safetensors \
    --input_json examples/multi_example_image.json \
    --size infinitetalk-480 \
    --sample_steps 40 \
    --num_persistent_param_in_dit 0 \
    --mode streaming \
    --motion_frame 9 \
    --save_file infinitetalk_res_multiperson

Gradio example with single-person weights:

python app.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --infinitetalk_dir weights/InfiniteTalk/single/infinitetalk.safetensors \
    --num_persistent_param_in_dit 0 \
    --motion_frame 9

Gradio example with multi-person weights:

python app.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --infinitetalk_dir weights/InfiniteTalk/multi/infinitetalk.safetensors \
    --num_persistent_param_in_dit 0 \
    --motion_frame 9
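
In the examples above, the only difference between single-person and multi-person runs is the subdirectory passed to --infinitetalk_dir. A minimal sketch of that choice, with `infinitetalk_weights` as a hypothetical helper and the directory layout taken from the commands above:

```python
def infinitetalk_weights(multi_person: bool, root: str = "weights/InfiniteTalk") -> str:
    """Return the --infinitetalk_dir value for a single- or multi-person run."""
    variant = "multi" if multi_person else "single"
    return f"{root}/{variant}/infinitetalk.safetensors"


if __name__ == "__main__":
    print(infinitetalk_weights(multi_person=True))
```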

Acceleration And Quantized Inference

The README also provides an example for FusionX or Lightx2v LoRA acceleration. FusionX requires 8 sampling steps, while Lightx2v requires only 4.

python generate_infinitetalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --infinitetalk_dir weights/InfiniteTalk/single/infinitetalk.safetensors \
    --lora_dir weights/Wan2.1_I2V_14B_FusionX_LoRA.safetensors \
    --input_json examples/single_example_image.json \
    --lora_scale 1.0 \
    --size infinitetalk-480 \
    --sample_text_guide_scale 1.0 \
    --sample_audio_guide_scale 2.0 \
    --sample_steps 8 \
    --mode streaming \
    --motion_frame 9 \
    --sample_shift 2 \
    --num_persistent_param_in_dit 0 \
    --save_file infinitetalk_res_lora
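
The step counts used in this article can be kept in one lookup so that --sample_steps always matches the chosen LoRA. This is a sketch; `sample_steps_for` and the lowercase keys are illustrative, and the numbers are the ones quoted from the README (40 without a LoRA, 8 for FusionX, 4 for Lightx2v).

```python
from typing import Optional

# Step counts as quoted from the README examples.
SAMPLE_STEPS = {"none": 40, "fusionx": 8, "lightx2v": 4}


def sample_steps_for(lora: Optional[str] = None) -> int:
    """Return the --sample_steps value for the given LoRA name (None for no LoRA)."""
    key = (lora or "none").lower()
    if key not in SAMPLE_STEPS:
        raise ValueError(f"unknown LoRA {lora!r}; expected one of {sorted(SAMPLE_STEPS)}")
    return SAMPLE_STEPS[key]


if __name__ == "__main__":
    print(sample_steps_for("FusionX"))
```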

The quantized model only supports single-GPU inference. Official example:

python generate_infinitetalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --infinitetalk_dir weights/InfiniteTalk/single/infinitetalk.safetensors \
    --input_json examples/single_example_image.json \
    --size infinitetalk-480 \
    --sample_steps 40 \
    --mode streaming \
    --quant fp8 \
    --quant_dir weights/InfiniteTalk/quant_models/infinitetalk_single_fp8.safetensors \
    --motion_frame 9 \
    --num_persistent_param_in_dit 0 \
    --save_file infinitetalk_res_quant

Practical Notes

InfiniteTalk has heavy dependencies and large models. Before deployment, check a few things:

Whether your GPU has enough VRAM. Low-VRAM machines should first try --num_persistent_param_in_dit 0 or the quantized model.
Whether flash_attn can be installed correctly with the current CUDA and PyTorch combination.
Whether the Hugging Face models are fully downloaded and whether paths match the weights/... paths in the commands.
Whether the input JSON follows the example format.
720P, long videos, multi-person generation, and multi-GPU inference all increase resource requirements significantly.

The README also notes that although FusionX LoRA can speed up inference and improve quality, it may worsen color shift after 1 minute and reduce identity preservation. For I2V, generation from a single image works better within 1 minute; beyond 1 minute, color shift becomes more obvious.

Summary

InfiniteTalk is an audio-driven video generation project aimed at research and engineering use. It fits scenarios that need long-form lip sync, character dubbing, digital human video, and image-to-talking-video generation.

For a quick trial, prepare the environment and models according to the official installation steps, then start with the 480P single-GPU example. After paths, dependencies, and VRAM are confirmed, try 720P, multi-person, multi-GPU, LoRA acceleration, or the quantized model.

Reference:

GitHub: https://github.com/MeiGen-AI/InfiniteTalk