Use Unity 3D character and Python deep learning algorithms to stream as a VTuber!
This is part of the OpenVTuberProject, which provides many toolkits for becoming a VTuber.
YouTube playlist (Chinese; covers videos 1-4):
First of all, I’d like to give credit to the following projects, from which I borrowed code:
Project | LICENSE |
---|---|
head-pose-estimation | LICENSE |
face-alignment | LICENSE |
GazeTracking | LICENSE |
And the virtual character unity-chan © UTJ/UCL.
Create a conda environment with `conda create -n vtuber python=3.6`. Activate it by `conda activate vtuber`.

Python libraries:

- Using `pip`:
  - `pip install -r requirements_(cpu or gpu).txt`, choosing the file that matches your hardware.
  - `pip install onnxruntime-gpu` to get faster inference speed using the ONNX model.
- Using `conda`:
  - Install PyTorch first. Example: `conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0 -c pytorch`
  - `pip install -r requirements_gpu.txt`
  - `pip install onnxruntime-gpu` to get faster inference speed using the ONNX model.
Here we assume that you have installed the requirements and activated the virtual environment you are using.
Download the models here, extract them, and put them into `face_alignment/ckpts`.
If you don’t use `onnxruntime`, you can skip this step, as the script will automatically download them for you.
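To decide whether the manual download applies to your setup, a quick check like the one below may help. It is only a sketch: the helper names `onnxruntime_installed` and `models_missing` are hypothetical and not part of this project, and it assumes the checkpoints live directly in `face_alignment/ckpts` relative to the current directory.

```python
import importlib.util
import os


def onnxruntime_installed():
    """Return True if the onnxruntime package can be found by Python."""
    return importlib.util.find_spec("onnxruntime") is not None


def models_missing(ckpt_dir="face_alignment/ckpts"):
    """Return True if the checkpoint directory is absent or empty.

    The default path is the one named in this README; the check itself
    (any file present counts) is an assumption of this sketch.
    """
    return not os.path.isdir(ckpt_dir) or not os.listdir(ckpt_dir)


if __name__ == "__main__":
    if onnxruntime_installed() and models_missing():
        print("onnxruntime found, but face_alignment/ckpts is empty: "
              "download the models first.")
    else:
        print("No manual download seems necessary.")
```

Run it from the repository root so that the relative path resolves correctly.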
Run `python demo.py --debug`. (Add `--cpu` if you only have a CPU.)
You should see the following:
Left: CPU model. Right: GPU model run on a GTX1080Ti.
Run `python demo.py --connect` to synchronize your face features with the virtual character. (Add `--debug` to see your face, and `--cpu` if you only have a CPU, as in step 1.)
You should see the following:
Left: CPU model. Right: GPU model run on a GTX1080Ti.
Enjoy your VTuber life!
In this section, I describe the functionalities implemented and a little about the technology behind them.
Using head-pose-estimation and face-alignment, deep learning methods are applied to do face detection and facial landmark detection. A face bounding box and 68 facial landmarks are detected, then a PnP algorithm is used to obtain the head pose (the rotation of the face). Finally, Kalman filters are applied to the pose to make it smoother.
The character’s head pose is synchronized.
As for the visualization, the white bounding box is the detected face, on top of which 68 green face landmarks are plotted. The head pose is represented by the green frustum and the axes in front of the nose.
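The Kalman smoothing step in the head pose pipeline above can be illustrated with a minimal scalar filter. This is a sketch under assumptions, not the filter used in this project: the class name `ScalarKalman`, the constant-value motion model, and the default noise variances are all chosen here for illustration, and one such filter would be run per pose component (for example, one per rotation angle).

```python
class ScalarKalman:
    """Constant-value Kalman filter for one noisy scalar, e.g. a yaw angle.

    process_var and measurement_var are illustrative defaults (assumptions),
    not values taken from this project.
    """

    def __init__(self, process_var=1e-2, measurement_var=1.0):
        self.q = process_var       # how much the true value may drift per step
        self.r = measurement_var   # how noisy each measurement is
        self.x = None              # current state estimate
        self.p = 1.0               # variance of the estimate

    def update(self, z):
        """Fold in a new measurement z and return the smoothed estimate."""
        if self.x is None:
            # First measurement: nothing to smooth against yet.
            self.x = float(z)
            return self.x
        # Predict: the value is assumed unchanged, but uncertainty grows.
        self.p += self.q
        # Correct: blend prediction and measurement using the Kalman gain.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)
        return self.x


if __name__ == "__main__":
    yaw_filter = ScalarKalman()
    noisy_yaw = [10, 12, 8, 11, 9, 10.5, 9.5]
    print([round(yaw_filter.update(z), 2) for z in noisy_yaw])
```

A larger `measurement_var` gives a smoother but more sluggish character; a larger `process_var` makes the character follow fast head movements more closely at the cost of more jitter.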
Using GazeTracking, the eyes are first extracted using the landmarks enclosing them. Then the eye images are converted to grayscale, and a pixel intensity threshold is applied to detect the iris (the black part of the eye). Finally, the center of the iris is computed as the center of the black area.
The character’s gaze is not synchronized, since I didn’t find a way to move unity-chan’s eyes.
As for the visualization, the red crosses indicate the iris.
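The threshold-then-centroid idea described above can be sketched in plain Python. This is not the GazeTracking code itself (which works on OpenCV images); the function name `iris_center`, the nested-list image format, and the threshold value of 50 are assumptions made for this example.

```python
def iris_center(gray, threshold=50):
    """Return the (x, y) centroid of pixels darker than threshold.

    gray is a grayscale eye image given as a list of rows of 0-255 values.
    Returns None when no pixel is dark enough to count as iris.
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(gray):
        for x, value in enumerate(row):
            if value < threshold:   # dark pixel: treat it as part of the iris
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)


if __name__ == "__main__":
    # 5x7 bright eye patch with a dark 3x3 "iris" in the middle.
    eye = [[200] * 7 for _ in range(5)]
    for y in range(1, 4):
        for x in range(2, 5):
            eye[y][x] = 10
    print(iris_center(eye))
```

In practice the threshold depends heavily on lighting, so a fixed value like the one above would likely need tuning or calibration per user.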
Estimate eye aspect ratio: The eye aspect ratio can be used to detect blinking, but I currently use automatic blinking instead, since this estimate is not very accurate.
Estimate mouth aspect ratio: I use this number to synchronize with the character’s mouth.
The mouth distance is used to detect a smile and synchronize it with the character.
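The three measurements above can be sketched from 2D landmark coordinates as follows. The eye aspect ratio uses the commonly used six-point formula; the mouth aspect ratio and the reading of "mouth distance" as the distance between the two mouth corners are assumptions of this sketch, and the exact landmarks and formulas used in this project may differ. All function names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """Six-point eye aspect ratio.

    p1 and p4 are the eye corners, p2 and p3 lie on the upper lid,
    p5 and p6 on the lower lid (p2 above p6, p3 above p5).
    The value drops toward 0 as the eye closes.
    """
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))


def mouth_aspect_ratio(left, right, top, bottom):
    """Mouth opening height divided by mouth width (assumed definition)."""
    return dist(top, bottom) / dist(left, right)


def mouth_distance(left, right):
    """Distance between the mouth corners (assumed meaning of the term)."""
    return dist(left, right)


if __name__ == "__main__":
    open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
    print(eye_aspect_ratio(*open_eye))
    print(mouth_aspect_ratio((0, 0), (4, 0), (2, 1), (2, -1)))
    print(mouth_distance((0, 0), (4, 0)))
```

Because both ratios are normalized by a width, they stay roughly constant when the face moves closer to or farther from the camera. The raw mouth distance does not, so it would need to be normalized (for example, by the face width) before comparing it against a smile threshold.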
If you want to customize the virtual character, you can find the Unity project in the release.