The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
SDXL implementation of AnimateDiff.
Tooll 3 is open-source software for creating real-time motion graphics.
🎨 GPT for video generation ⚡️
Isoflow: diagram-as-code with AI integration, for building diagrams as code using AI.
Model components of the Llama Stack APIs
A Unity MCP server that allows MCP clients like Claude Desktop or Cursor to perform Unity Editor actions.
[T-PAMI 2025] Official implementation for "SVGDreamer++: Advancing Editability and Diversity in Text-Guided SVG Generation" https://arxiv.org/abs/2411.17832
Converts raster images into SVG in ComfyUI.
Bridge between ComfyUI and Blender: the ComfyUI-BlenderAI-node addon, with advanced nodes and English translations.
Used for AI model generation, a next-generation Blender rendering engine, and texture enhancement & generation (based on ComfyUI).
An open protocol enabling communication and interoperability between opaque agentic applications.
Development repository for the Triton language and compiler
Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment
HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
gradio WebUI for AdvancedLivePortrait
:fire: 2D and 3D face alignment library built using PyTorch
Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis
[CVPR 2024] Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
Official extension for Blender
Text to 4D Worlds in Blender
Use AI Agents directly in Blender.
Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation"
UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer
This extension integrates ByteDance's UNO-FLUX model into ComfyUI, allowing you to use UNO's powerful text-to-image generation with reference capabilities.
hailuo automation
Better than SHAP for Keyword Importance
Creates prompts for Video Models by sequence analysis and prompting using Qwen2.5-VL models from Alibaba.
Official inference repo for FLUX.1 models
Rectified Flow Inversion (RF-Inversion) - ICLR 2025
Nodes for image juxtaposition for Flux in ComfyUI
Taming FLUX for Image Inversion & Editing; OpenSora for Video Inversion & Editing! (Official implementation for Taming Rectified Flow for Inversion and Editing.)
Flow is a custom node designed to provide a user-friendly interface for ComfyUI.
LLM inference in C/C++
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
A web UI that simplifies AI video generation using the Hunyuan video diffusion model.
FastVideo is an open-source framework for accelerating large video diffusion models.
A pipeline parallel training script for diffusion models.
Image composition toolbox: everything you want to know about image composition or object insertion
[ICML 2024] MagicPose(also known as MagicDance): Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion
Real-time face swap and one-click video deepfakes with only a single image.
pix2pix3D: Generating 3D Objects from 2D User Inputs
Streamlined interface for generating images with AI in Krita. Inpaint and outpaint with optional text prompt, no tweaking required.
Bring portraits to life!
Various AI scripts. Mostly Stable Diffusion stuff.
ComfyUI nodes for LivePortrait
Select a portrait, click to move the head around (please use your own space / GPU!)
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
AI Photo Editing with Inpainting
AI-Powered Photo Editor (Python, PyQt6, PyTorch)
A web app that allows you to select a subject and then change its background, OR keep the background and change the subject.
[CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation
[ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting
Image inpainting tool powered by a SOTA AI model. Remove any unwanted object, defect, or person from your pictures, or erase and replace (powered by Stable Diffusion) anything in your pictures.
Prompt, run, edit, and deploy full-stack web applications using any LLM you want!
A simple and easy-to-use FX sound generator
[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models
Demo for NVIDIA's Fewshot Vid2vid
Automatic1111 Stable Diffusion WebUI Video Extension
Pytorch implementation of our method for high-resolution (e.g. 2048x1024) photorealistic video-to-video translation.
Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).
Text-to-Song: Towards Controllable Music Generation Incorporating Vocal and Accompaniment
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Turn your words into music! Describe a sound (e.g., happy, spooky) and this app generates a short piece based on your text.
Audio Prompt Adapter: Unleashing music editing abilities for text-to-music with lightweight finetuning [ISMIR 2024]
Code for Investigating Personalization Methods in Text to Music Generation
some generative audio tools for ComfyUI
Mustango: Toward Controllable Text-to-Music Generation
Text-to-Audio/Music Generation
Gradio WebUI for whisper, faster-whisper, whisper-timestamped. Supports YouTube Downloader, Vocal Remover, Transcription, Text-to-Speech (Edge-TTS, F5-TTS), and Translation.
Automatically generate and overlay subtitles for any video.
Code for the paper "Jukebox: A Generative Model for Music"
A trainable PyTorch reproduction of AlphaFold 3.
An HTML5 video player with a parser that saves traffic
Source code for the SIGGRAPH 2024 paper "X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention"
[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
A nearly-live implementation of OpenAI's Whisper.
Robust Speech Recognition via Large-Scale Weak Supervision
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A, along with a number of inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Meta Llama for WhatsApp & Messenger.