Model components of the Llama Stack APIs
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
Tooll 3 is open-source software for creating real-time motion graphics.
Isoflow: diagram-as-code with AI integration for building diagrams as code using AI.
No description
A Unity MCP server that allows MCP clients like Claude Desktop or Cursor to perform Unity Editor actions.
[T-PAMI 2025] Official implementation for "SVGDreamer++: Advancing Editability and Diversity in Text-Guided SVG Generation" https://arxiv.org/abs/2411.17832
Converts raster images into SVG in ComfyUI.
No description
Bridge between ComfyUI and Blender: the ComfyUI-BlenderAI-node addon, with advanced nodes and English translations.
Used for AI model generation, a next-generation Blender rendering engine, and texture enhancement & generation (based on ComfyUI).
No description
HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation
No description
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
No description
Gradio WebUI for AdvancedLivePortrait.
Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis
[CVPR 2024] Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
Official extension for Blender
Text to 4D Worlds in Blender
No description
Use AI Agents directly in Blender.
Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation"
UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer
No description
This extension integrates ByteDance's UNO-FLUX model into ComfyUI, allowing you to use UNO's powerful text-to-image generation with reference capabilities.
Hailuo automation.
Better than SHAP for Keyword Importance
Creates prompts for video models via sequence analysis and prompting using Alibaba's Qwen2.5-VL models.
Official inference repo for FLUX.1 models
Nodes for image juxtaposition for Flux in ComfyUI
No description
Taming FLUX for Image Inversion & Editing; OpenSora for Video Inversion & Editing! (Official implementation for Taming Rectified Flow for Inversion and Editing.)
Flow is a custom node designed to provide a user-friendly interface for ComfyUI.
LLM inference in C/C++
A web UI that simplifies AI video generation using the Hunyuan Video diffusion model.
FastVideo is an open-source framework for accelerating large video diffusion models.
No description
A pipeline parallel training script for diffusion models.
No description
No description
Image composition toolbox: everything you want to know about image composition or object insertion
[ICML 2024] MagicPose(also known as MagicDance): Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion
Real-time face swap and one-click video deepfakes from only a single image.
Streamlined interface for generating images with AI in Krita. Inpaint and outpaint with optional text prompt, no tweaking required.
Various AI scripts. Mostly Stable Diffusion stuff.
ComfyUI nodes for LivePortrait
AI Photo Editing with Inpainting
AI-Powered Photo Editor (Python, PyQt6, PyTorch)
A web app that allows you to select a subject and then change its background, OR keep the background and change the subject.
No description
No description
[ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting
No description
Prompt, run, edit, and deploy full-stack web applications using any LLM you want!
[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models
Demo for NVIDIA's Fewshot Vid2vid
Automatic1111 Stable Diffusion WebUI Video Extension
PyTorch implementation of our method for high-resolution (e.g., 2048x1024) photorealistic video-to-video translation.
Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).
Text-to-Song: Towards Controllable Music Generation Incorporating Vocal and Accompaniment
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Turn your words into music! Describe a sound (e.g., happy, spooky) and this app generates a short piece based on your text.
Audio Prompt Adapter: Unleashing music editing abilities for text-to-music with lightweight finetuning [ISMIR 2024]
Some generative audio tools for ComfyUI.
Mustango: Toward Controllable Text-to-Music Generation
Text-to-Audio/Music Generation
Gradio WebUI for whisper, faster-whisper, whisper-timestamped. Supports YouTube Downloader, Vocal Remover, Transcription, Text-to-Speech (Edge-TTS, F5-TTS), and Translation.
Automatically generate and overlay subtitles for any video.
A trainable PyTorch reproduction of AlphaFold 3.
Source code for the SIGGRAPH 2024 paper "X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention"
[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
A nearly-live implementation of OpenAI's Whisper.
Robust Speech Recognition via Large-Scale Weak Supervision
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A, along with a number of inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Meta Llama for WhatsApp & Messenger.