
Sonic

Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation"


Sonic: Shifting Focus to Global Audio Perception in Portrait Animation, CVPR 2025.




👋 Join our QQ Chat Group

🔥🔥🔥 NEWS

2025/03/14: Super stoked to share that Sonic has been accepted to CVPR 2025! See you in Nashville!

2025/02/08: Many thanks to the open-source community contributors for making the ComfyUI version of Sonic a reality. Your efforts are truly appreciated!

2025/02/06: Commercialization: Note that our license is non-commercial. If commercial use is required, please use the Tencent Cloud Video Creation Large Model: Introduction / API documentation

2025/01/17: Our online Hugging Face demo is released.

2025/01/17: Thank you to NewGenAI for promoting Sonic and creating a Windows-based tutorial on YouTube.

2024/12/16: Our online demo is released.

🎥 Demo


For more visual demos, please visit our project page.

🧩 Community Contributions

If you develop or use Sonic in your projects, please let us know.

📑 Updates

2025/01/14: Our inference code and weights are released. Stay tuned, we will continue to polish the model.

📜 Requirements

  • An NVIDIA GPU with CUDA support is required.
    • The model is tested on a single 32G GPU.
  • Tested operating system: Linux
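
Before installing, it can help to confirm that an NVIDIA GPU is visible and how much memory it has. The following is a minimal sketch, not part of the Sonic repository: it assumes the `nvidia-smi` tool that ships with the NVIDIA driver is on the PATH, and the function names `parse_gpu_memory` and `query_gpu_memory` are illustrative. The 32G figure above is the configuration the model was tested on, not a stated minimum, so the script only reports what it finds.

```python
import shutil
import subprocess


def parse_gpu_memory(nvidia_smi_output: str) -> list[int]:
    """Parse one integer (MiB) per line of nvidia-smi CSV output, skipping other lines."""
    values = []
    for line in nvidia_smi_output.splitlines():
        text = line.strip()
        if text.isdigit():
            values.append(int(text))
    return values


def query_gpu_memory() -> list[int]:
    """Return total memory in MiB for each visible NVIDIA GPU, or [] if none is found."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    memory = query_gpu_memory()
    if not memory:
        print("No NVIDIA GPU detected via nvidia-smi.")
    for index, mib in enumerate(memory):
        print(f"GPU {index}: {mib} MiB total memory")
```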

🔑 Inference

Installation

  • Install PyTorch and the dependencies listed in requirements.txt:
  pip3 install -r requirements.txt
  • All models are stored in checkpoints by default, and the file structure is as follows:
Sonic
  ├──checkpoints
  │  ├──Sonic
  │  │  ├──audio2bucket.pth
  │  │  ├──audio2token.pth
  │  │  ├──unet.pth
  │  ├──stable-video-diffusion-img2vid-xt
  │  │  ├──...
  │  ├──whisper-tiny
  │  │  ├──...
  │  ├──RIFE
  │  │  ├──flownet.pkl
  │  ├──yoloface_v5m.pt
  ├──...
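
Once the weights are in place, the layout above can be checked with a short script. This is a hedged sketch rather than a tool from the repository: the helper name `missing_checkpoints` is hypothetical, and because the tree elides the contents of `stable-video-diffusion-img2vid-xt` and `whisper-tiny`, only the presence of those two directories is checked.

```python
from pathlib import Path

# File paths taken from the tree above, relative to the checkpoints directory.
EXPECTED_FILES = [
    "Sonic/audio2bucket.pth",
    "Sonic/audio2token.pth",
    "Sonic/unet.pth",
    "RIFE/flownet.pkl",
    "yoloface_v5m.pt",
]

# The tree shows "..." for these directories, so only their existence is checked.
EXPECTED_DIRS = [
    "stable-video-diffusion-img2vid-xt",
    "whisper-tiny",
]


def missing_checkpoints(root: Path) -> list[str]:
    """Return the expected entries that are absent under root, files first."""
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    absent = missing_checkpoints(Path("checkpoints"))
    if absent:
        print("Missing from checkpoints/:")
        for name in absent:
            print("  " + name)
    else:
        print("All expected checkpoint entries are present.")
```

Run it from the Sonic repository root so that the relative `checkpoints` path resolves correctly.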

Download the models with huggingface-cli as follows:

  python3 -m pip install "huggingface_hub[cli]"
  huggingface-cli download LeonJoe13/Sonic --local-dir checkpoints
  huggingface-cli download stabilityai/stable-video-diffusion-img2vid-xt --local-dir checkpoints/stable-video-diffusion-img2vid-xt
  huggingface-cli download openai/whisper-tiny --local-dir checkpoints/whisper-tiny

Or manually download the pretrained model, svd-xt, and whisper-tiny to checkpoints/.
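
The same downloads can also be scripted from Python. The sketch below assumes the `huggingface_hub` package installed by the pip command above and uses its `snapshot_download` function; the repository IDs and target directories are copied from the CLI commands, while the `DOWNLOADS` list and `download_all` helper are illustrative names, not part of Sonic.

```python
from pathlib import Path

# (repo_id, local_dir) pairs copied from the huggingface-cli commands above.
DOWNLOADS = [
    ("LeonJoe13/Sonic", "checkpoints"),
    (
        "stabilityai/stable-video-diffusion-img2vid-xt",
        "checkpoints/stable-video-diffusion-img2vid-xt",
    ),
    ("openai/whisper-tiny", "checkpoints/whisper-tiny"),
]


def download_all(downloads=DOWNLOADS) -> list[str]:
    """Download each repository snapshot and return the local paths."""
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    paths = []
    for repo_id, local_dir in downloads:
        Path(local_dir).mkdir(parents=True, exist_ok=True)
        paths.append(snapshot_download(repo_id=repo_id, local_dir=local_dir))
    return paths
```

Calling `download_all()` from the repository root should fetch the three repositories into the directories shown in the file structure; it needs network access.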

Run demo

  python3 demo.py \
  '/path/to/input_image' \
  '/path/to/input_audio' \
  '/path/to/output_video'
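
To animate several portraits in one go, the command above can be wrapped in a small loop. This is a sketch under stated assumptions: it relies only on the three positional arguments shown above (image, audio, output video), it launches `demo.py` with the current interpreter instead of a literal `python3`, and the helper names `build_demo_command` and `run_batch` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_demo_command(image: str, audio: str, output: str) -> list[str]:
    """Build the demo.py invocation with the three positional arguments shown above."""
    return [sys.executable, "demo.py", image, audio, output]


def run_batch(jobs: list[tuple[str, str, str]]) -> list[int]:
    """Run demo.py once per (image, audio, output) job and collect the exit codes."""
    exit_codes = []
    for image, audio, output in jobs:
        # Create the output directory up front so demo.py can write the video.
        Path(output).parent.mkdir(parents=True, exist_ok=True)
        completed = subprocess.run(build_demo_command(image, audio, output), check=False)
        exit_codes.append(completed.returncode)
    return exit_codes
```

For example, `run_batch([("a.png", "a.wav", "out/a.mp4")])`, with placeholder file names, runs the demo once from the repository root and returns a list containing its exit code.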

🔗 Citation

If you find our work helpful for your research, please consider citing it.

@article{ji2024sonic,
  title={Sonic: Shifting Focus to Global Audio Perception in Portrait Animation},
  author={Ji, Xiaozhong and Hu, Xiaobin and Xu, Zhihong and Zhu, Junwei and Lin, Chuming and He, Qingdong and Zhang, Jiangning and Luo, Donghao and Chen, Yi and Lin, Qin and others},
  journal={arXiv preprint arXiv:2411.16331},
  year={2024}
}

@article{ji2024realtalk,
  title={Realtalk: Real-time and realistic audio-driven face generation with 3d facial prior-guided identity alignment network},
  author={Ji, Xiaozhong and Lin, Chuming and Ding, Zhonggan and Tai, Ying and Zhu, Junwei and Hu, Xiaobin and Luo, Donghao and Ge, Yanhao and Wang, Chengjie},
  journal={arXiv preprint arXiv:2406.18284},
  year={2024}
}

Explore our related research:

