CyberVerse is an open-source digital human agent platform with real-time video calling. Create an AI agent you can see and talk to, face to face, just like a video call.
Ever dreamed of having your own J.A.R.V.I.S. — an AI that truly sees you, hears you, and talks back in real time?
Want to see someone you’ve lost again, hear their voice, watch them smile at you?
Or maybe there’s a character you’ve always wished you could bring to life?
Just one photo. CyberVerse brings them to life.
Not pre-recorded. Not turn-based. Unlimited-duration, live, low-latency video calls with a digital human — first frame in ~1.5s. Built on WebRTC with P2P streaming and embedded TURN/NAT traversal.
Every digital human is more than an avatar you can talk to. It is an AI agent that actually does things.
Upload a single photo to create your digital human. State-of-the-art avatar models deliver real-time facial animation, natural lip-sync, and subtle idle breathing — no 3D modeling or motion capture.
Brain, face, voice, ears — every component is a swappable plugin. Mix and match LLMs, TTS engines, ASR models, and avatar backends via YAML config.
Characters shown here are demo examples only. They are not bundled with CyberVerse and are not provided for commercial use.
Real-time video conversation requires GPU acceleration. Below are benchmarks for FlashHead and LiveAct avatar models:
| Model | Quality | GPU | Count | Resolution | FPS | Real-time? |
|---|---|---|---|---|---|---|
| FlashHead 1.3B | Pro | RTX 5090 | 2 | 512×512 | 25+ | ✅ Yes |
| FlashHead 1.3B | Pro | RTX 5090 | 1 | 464×464 | 20 | ✅ Yes |
| FlashHead 1.3B | Pro | RTX PRO 6000 | 1 | 512×512 | 20 | ✅ Yes |
| FlashHead 1.3B | Pro | RTX 4090 | 1 | 512×512 | ~10.8 | ❌ No |
| FlashHead 1.3B | Lite | RTX 4090 | 1 | 512×512 | 25+ | ✅ Yes |
| LiveAct 18B | — | RTX PRO 6000 | 2 | 320×480 | 20 | ✅ Yes |
| LiveAct 18B | — | RTX PRO 6000 | 1 | 256×417 | 20 | ✅ Yes |
Pro favors visual quality; Lite favors speed. The table reflects typical quality–compute balances — more GPU headroom lets you push higher quality; tighter hardware calls for lower settings (resolution, Pro vs Lite, etc.) to stay real-time.
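One rough way to read the table: a model keeps up with a live call only if it generates frames at least as fast as they are played back. The sketch below assumes a 20 FPS playback target, which matches the lowest rate marked real-time above. That threshold is an assumption for illustration, not a documented CyberVerse constant, and the two function names are hypothetical.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Time available to produce one frame at the target playback rate."""
    return 1000.0 / target_fps


def is_realtime(measured_fps: float, target_fps: float = 20.0) -> bool:
    """True if generation keeps pace with playback (assumed 20 FPS target)."""
    return measured_fps >= target_fps


# Two rows from the benchmark table: (model, quality, GPU, measured FPS).
# "25+" is entered as its lower bound, "~10.8" as 10.8.
rows = [
    ("FlashHead 1.3B", "Pro", "RTX 4090", 10.8),
    ("FlashHead 1.3B", "Lite", "RTX 4090", 25.0),
]

for model, quality, gpu, fps in rows:
    verdict = "real-time" if is_realtime(fps) else "too slow"
    per_frame = 1000.0 / fps
    print(f"{model} {quality} on {gpu}: {per_frame:.1f} ms/frame "
          f"(budget {frame_budget_ms(20.0):.1f} ms) -> {verdict}")
```

If your own measurements fall below the target, the options described above apply: lower the resolution, switch from Pro to Lite, or add a GPU.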
Prerequisites include Node.js, Go, protoc (with the protoc-gen-go and protoc-gen-go-grpc plugins), FFmpeg (with libvpx for video encoding), and Conda. To verify, use:
node --version
go version
protoc --version
ffmpeg -version | grep libvpx
conda --version
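The same checks can be run in one pass. This is a minimal sketch that only wraps the commands listed above; the helper name `check_tools` is hypothetical and not part of CyberVerse.

```python
import shutil
import subprocess

# Tool name -> arguments that print its version (same commands as above).
VERSION_COMMANDS = {
    "node": ["--version"],
    "go": ["version"],
    "protoc": ["--version"],
    "ffmpeg": ["-version"],
    "conda": ["--version"],
}


def check_tools(commands=VERSION_COMMANDS):
    """Return {tool: first line of its version output, or None if not on PATH}."""
    report = {}
    for tool, args in commands.items():
        path = shutil.which(tool)
        if path is None:
            report[tool] = None
            continue
        result = subprocess.run([path, *args], capture_output=True, text=True)
        output = (result.stdout or result.stderr).strip()
        report[tool] = output.splitlines()[0] if output else ""
    return report


if __name__ == "__main__":
    for tool, version in check_tools().items():
        print(f"{tool:8s} {version if version is not None else 'NOT FOUND'}")
```

This does not replace the libvpx check: `ffmpeg -version | grep libvpx` still needs to print a match for video encoding to work.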
git clone https://github.com/dsd2077/CyberVerse.git
cd CyberVerse
conda create -n cyberverse python=3.10
conda activate cyberverse
Install PyTorch (CUDA 12.8) in this environment:
pip3 install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128
cp infra/.env.example .env
Edit .env and fill in your API keys:
DOUBAO_ACCESS_TOKEN=your_doubao_access_token # ByteDance Doubao omni model
DOUBAO_APP_ID=your_doubao_app_id
Doubao Voice: get the App ID and API Key by following the Volcengine quick start, then set them as DOUBAO_APP_ID and DOUBAO_ACCESS_TOKEN respectively.
After the stack is running, you can also change these values (and other API keys and service endpoints) from the web UI at /settings rather than editing .env.
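Before starting the servers, it can help to confirm that the placeholders in .env were actually replaced. The sketch below is a simplified parser written for illustration (it is not python-dotenv and not CyberVerse code). It treats any value still beginning with `your_` as unset, based on the placeholder style shown above.

```python
from pathlib import Path

REQUIRED_KEYS = ("DOUBAO_ACCESS_TOKEN", "DOUBAO_APP_ID")


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks, comments, and trailing ' # ...' notes."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.split(" #", 1)[0].strip().strip("\"'")
        values[key.strip()] = value
    return values


def missing_keys(values: dict, required=REQUIRED_KEYS) -> list:
    """Keys that are absent, empty, or still set to a 'your_...' placeholder."""
    return [k for k in required if not values.get(k) or values[k].startswith("your_")]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        print("Keys still to fill in:", missing_keys(parse_env(env_file.read_text())))
    else:
        print("No .env found; copy infra/.env.example first.")
```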
CyberVerse currently supports FlashHead and LiveAct; download only what you need. More backends are planned.
pip install "huggingface_hub[cli]"
| Model Component | Description | Link |
|---|---|---|
| SoulX-FlashHead-1_3B | 1.3B FlashHead weights | Hugging Face, ModelScope |
| wav2vec2-base-960h | Audio feature extractor | Hugging Face, ModelScope |
# If you are in mainland China, you can use a mirror first:
# export HF_ENDPOINT=https://hf-mirror.com
hf download Soul-AILab/SoulX-FlashHead-1_3B \
--local-dir ./checkpoints/SoulX-FlashHead-1_3B
hf download facebook/wav2vec2-base-960h \
--local-dir ./checkpoints/wav2vec2-base-960h
| Model Name | Download |
|---|---|
| SoulX-LiveAct | Hugging Face, ModelScope |
| chinese-wav2vec2-base | Hugging Face, ModelScope |
hf download Soul-AILab/LiveAct \
--local-dir ./checkpoints/LiveAct
hf download TencentGameMate/chinese-wav2vec2-base \
--local-dir ./checkpoints/chinese-wav2vec2-base
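If you prefer to script the downloads, the same repositories can be fetched with `snapshot_download` from the `huggingface_hub` package installed above. This is a sketch: the repo IDs and target directories are copied from the commands above, while the grouping keys `flash_head` and `live_act` and the helper name are my own choices for illustration.

```python
# Repo IDs and local directories copied from the hf download commands above.
CHECKPOINTS = {
    "flash_head": [
        ("Soul-AILab/SoulX-FlashHead-1_3B", "./checkpoints/SoulX-FlashHead-1_3B"),
        ("facebook/wav2vec2-base-960h", "./checkpoints/wav2vec2-base-960h"),
    ],
    "live_act": [
        ("Soul-AILab/LiveAct", "./checkpoints/LiveAct"),
        ("TencentGameMate/chinese-wav2vec2-base", "./checkpoints/chinese-wav2vec2-base"),
    ],
}


def download_backend(backend: str) -> list:
    """Download every checkpoint for one backend and return the local directories."""
    # Imported lazily so HF_ENDPOINT can be set in the environment beforehand.
    from huggingface_hub import snapshot_download

    local_dirs = []
    for repo_id, local_dir in CHECKPOINTS[backend]:
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
        local_dirs.append(local_dir)
    return local_dirs


if __name__ == "__main__":
    # Download only the backend you plan to run.
    print(download_backend("flash_head"))
```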
cp infra/cyberverse_config.example.yaml cyberverse_config.yaml
Edit the local cyberverse_config.yaml and update the model paths to match your local checkpoints. This file is ignored by git, so local paths and deployment settings do not conflict with upstream changes.
inference:
  avatar:
    default: "flash_head" # selects which avatar model to start; if set to live_act, fill in the live_act section below
    runtime:
      cuda_visible_devices: 0 # shared GPU ID(s), e.g. 0,1 for multi-GPU
      world_size: 1 # shared GPU count, set to 2 for dual-GPU
    flash_head:
      checkpoint_dir: "./checkpoints/SoulX-FlashHead-1_3B" # ← your path
      wav2vec_dir: "./checkpoints/wav2vec2-base-960h" # ← your path
      model_type: "lite" # "pro" for higher quality (needs more GPU)
      compile_model: true
      compile_vae: true
      dist_worker_main_thread: true
      infer_params:
        frame_num: 33
        motion_frames_latent_num: 2
        tgt_fps: 20
        sample_rate: 16000
        sample_shift: 5
        color_correction_strength: 1.0
        cached_audio_duration: 8
        num_heads: 12
        height: 512
        width: 512
    live_act:
      ckpt_dir: "./checkpoints/LiveAct" # ← your path
      wav2vec_dir: "./checkpoints/chinese-wav2vec2-base" # ← your path
      seed: 42
      compile_wan_model: false
      compile_vae_decode: false
      dist_worker_main_thread: true
      default_prompt: "一个人在说话" # "a person is talking"
      infer_params:
        size: "320*480"
        fps: 20
        audio_cfg: 1.0
You can skip editing paths here for now and adjust these options later in the web UI.
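A quick sanity check on these settings can catch a mistyped path before the inference server spends time loading models. The sketch below assumes the avatar settings have already been loaded into a Python dict whose keys mirror the names shown above (`default`, `runtime`, and one section per backend). That layout, the rule that every key ending in `_dir` is a checkpoint path, and the function name are assumptions for illustration, not CyberVerse's own validation.

```python
import os


def check_avatar_config(avatar: dict) -> list:
    """Return human-readable problems; an empty list means none were found."""
    backend = avatar.get("default")
    section = avatar.get(backend)
    if not isinstance(section, dict):
        return [f"default is {backend!r} but there is no matching section"]

    problems = []
    # Treat every key ending in '_dir' as a checkpoint directory that must exist.
    for key, value in section.items():
        if key.endswith("_dir") and not os.path.isdir(str(value)):
            problems.append(f"{backend}.{key}: directory not found: {value}")

    # world_size is described as the GPU count, so it should match the ID list.
    runtime = avatar.get("runtime", {})
    if "cuda_visible_devices" in runtime and "world_size" in runtime:
        gpu_ids = [g for g in str(runtime["cuda_visible_devices"]).split(",") if g.strip()]
        if len(gpu_ids) != runtime["world_size"]:
            problems.append(
                f"runtime: {len(gpu_ids)} GPU ID(s) listed but world_size is {runtime['world_size']}"
            )
    return problems


if __name__ == "__main__":
    example = {
        "default": "flash_head",
        "runtime": {"cuda_visible_devices": 0, "world_size": 1},
        "flash_head": {
            "checkpoint_dir": "./checkpoints/SoulX-FlashHead-1_3B",
            "wav2vec_dir": "./checkpoints/wav2vec2-base-960h",
        },
    }
    for problem in check_avatar_config(example):
        print("config problem:", problem)
```

Only the backend named by `default` is checked, since that is the one model the inference process starts.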
# SageAttention (source build)
git clone https://github.com/thu-ml/SageAttention.git
cd SageAttention
export EXT_PARALLEL=4 NVCC_APPEND_FLAGS="--threads 8" MAX_JOBS=32 # Optional
python setup.py install
cd .. # return to the CyberVerse repository root
# FlashAttention (optional)
pip install ninja
pip install flash_attn==2.8.0.post2 --no-build-isolation
If compilation is slow, download a prebuilt wheel from the flash-attention releases page and install it with pip install <wheel>.whl.
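To confirm that the builds landed in the active conda environment, you can check whether the modules are discoverable without importing them (importing may initialize CUDA). This is a minimal sketch; the import names `sageattention` and `flash_attn` are the ones these packages normally install under, but treat them as assumptions if your build differs.

```python
import importlib.util

# Display name -> import name assumed for each package installed above.
OPTIONAL_BACKENDS = {"SageAttention": "sageattention", "FlashAttention": "flash_attn"}


def available_backends(modules=OPTIONAL_BACKENDS) -> dict:
    """Map each backend to True if its module can be located in this environment."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in modules.items()
    }


if __name__ == "__main__":
    for name, found in available_backends().items():
        print(f"{name}: {'installed' if found else 'not found'}")
```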
make setup
This installs the base editable package ([dev,inference]), generates gRPC stubs, and installs frontend dependencies. For extra Python packages, either install everything (large) or cherry-pick extras listed under [project.optional-dependencies] in pyproject.toml:
# all optional groups at once
pip install -e ".[all]"
# or pick what you need, e.g.:
pip install -e ".[omni,flash_head]"
pip install -e ".[live_act]"
Terminal 1 — Python inference server:
conda activate cyberverse
make inference
make inference reads inference.avatar.default from cyberverse_config.yaml and initializes only that avatar model in the inference process. The startup logs print the active avatar model.
Wait until you see:
Active avatar model initialized: <model_name>
CyberVerse Inference Server started on port 50051
Terminal 2 — Go API server:
make server
Terminal 3 — Frontend:
make frontend
# Check API health
curl -s http://localhost:8080/api/v1/health
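For scripted startup, the two readiness signals used in this guide (the inference server listening on port 50051 and the API health endpoint responding) can be polled from Python. This is a sketch under stated assumptions: it only checks that a TCP connection succeeds and that the health URL returns HTTP 200, and it does not inspect the response body, whose format is not described here. Both function names are hypothetical.

```python
import socket
import time
import urllib.error
import urllib.request


def wait_for_port(host: str, port: int, timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


def api_is_healthy(url: str = "http://localhost:8080/api/v1/health") -> bool:
    """True if the health endpoint answers with HTTP 200 (body is not inspected)."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

A typical use is `wait_for_port("localhost", 50051)` after `make inference`, followed by `api_is_healthy()` once `make server` is running. Model loading can take a while, so a generous `timeout_s` is reasonable for the first call.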
When streaming_mode: direct uses the embedded TURN server, the browser must be able to reach the server’s 8443/TCP. If the page loads but audio/video never connects, or the server logs show ICE connection state: failed or publish timeout waiting for connection, first check whether your machine can reach port 8443 on the server:
nc -vz <server-ip> 8443
If 8443 is not reachable, the usual cause is a cloud security group, firewall, or NAT restriction. In that case, you can forward your local 8443 to the server through an SSH tunnel:
ssh -L 8443:127.0.0.1:8443 user@host -p port
After the tunnel is established, the browser will access the remote TURN service through local 127.0.0.1:8443.
If you want the browser to connect to the remote server directly instead of through an SSH tunnel, set pipeline.ice_public_ip in cyberverse_config.yaml to the server’s public IP or domain. If you are using an SSH tunnel, you can keep the default value (127.0.0.1).
Open http://localhost:5173 in your browser — you’re ready to go.
Configure characters, inference, and launch real-time digital-human sessions.
Turn digital humans into agents with memory, tools, and task execution.
Connect multiple agents so they can communicate, collaborate, and form networks.
GNU General Public License v3.0 — see LICENSE
SoulX-FlashHead — Avatar model by Soul AI Lab
SoulX-LiveAct - Avatar model by Soul AI Lab
Pion — Go WebRTC implementation