React × TypeScript × SVG × MediaPipe — build a lightweight, resolution-independent VTuber entirely in code, no Live2D required.
This guide walks through the full pipeline: landmarks → features → smoothing → SVG controls → OBS capture, with commented code you can adapt.
Demo Video
What This Article Covers
- How to design and implement a VTuber using only SVG, without Live2D
- Why SVG, and which facial conditions map to which visual effects (with detailed explanations)
- Excerpts of real code (with inline comments)
- Full list of major variables and state
Why SVG (SVGTuber)?

- Resolution-independent: Vectors never blur, even when zoomed in during a stream.
- Consistent stroke width: use vector-effect: non-scaling-stroke.
- Free & lightweight: no heavy 3D physics or mesh editing.
- Full control with code: Every behavior can be designed manually—great for research and customization.
- Easy streaming setup: Runs in a browser, capture via OBS is enough.
Suitable for:
- People who like a bold-lined, cartoon-like style and want constant line width
- Those who want to swap colors later (separating lineart and fill)
- Those who find Live2D’s mesh editing heavy or annoying
- Those who want to freely design expressions, conditions, and pseudo-physics directly in code
Recommended Tech Stack
- React 18+ / TypeScript / Vite
- MediaPipe Face Mesh (face landmark detection + iris)
- @mediapipe/camera_utils / drawing_utils (webcam input & debug rendering)
- SVG (all character art is vector)
- Browser APIs: <video>, <canvas>, requestAnimationFrame, CSS transform
👉 Tip: Set refineLandmarks: true for stable iris detection, improving gaze and blinking quality.
Example: FaceMesh basic options
// If you want proper “eye openness / gaze” detection, refineLandmarks is essential
faceMesh.setOptions({
  maxNumFaces: 1,
  refineLandmarks: true, // Improves iris precision
  minDetectionConfidence: 0.5,
  minTrackingConfidence: 0.5,
});
Overall Data Flow
Webcam → FaceMesh (landmarks) → Features (blink/gaze/mouth/yaw-pitch-roll) → Smoothing/Clamping → React State → <Character /> props → SVG transform (translate/rotate/mask/shape) → OBS capture
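To make this flow concrete, here is a minimal sketch of the feature bundle the detection loop could hand to the character each frame. The field names and ranges are illustrative; the state actually used in this project is listed in the appendix.
// Hypothetical shape of the per-frame features flowing from FaceMesh to <Character />.
type FaceFeatures = {
  blinkLeft: number;   // 0 = open, 1 = closed
  blinkRight: number;  // 0 = open, 1 = closed
  eyeOffsetX: number;  // gaze, clamped to roughly [-1, 1] before a pixel gain is applied
  eyeOffsetY: number;
  mouthOpen: number;   // 0..1
  yawDeg: number;      // left-right turn
  pitchDeg: number;    // up-down nod
  rollDeg: number;     // head tilt
};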
Step 1. Parts Separation & SVG Drawing
- Hair (front / side / back / accessories): complex shapes → draw in vector editor (e.g., Inkscape), import to code
- Face, eyebrows, sclera, pupils, mouth, body, limbs: written directly in code (ellipse, line, polygon, path)
- Masks (clipPath) — see the sketch after this list:
  - Face interior (excluding lashes and eyelids) clipped by the face ellipse
  - Pupils clipped by the sclera → prevents overflow
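As a concrete example of the pupil mask, here is a hedged, self-contained sketch; the IDs, coordinates, and colors are placeholders rather than this project's actual values.
import React from "react";

// Sketch: clip the pupil to the sclera so it can never overflow the eye outline.
// All numbers and the clipPath id are illustrative.
export const EyeWithClippedPupil = ({ pupilDx = 0, pupilDy = 0 }) => (
  <svg viewBox="0 0 100 60" width={100} height={60}>
    <defs>
      <clipPath id="sclera-clip">
        {/* Same shape as the sclera drawn below */}
        <ellipse cx={50} cy={30} rx={22} ry={14} />
      </clipPath>
    </defs>
    {/* Sclera (white of the eye) */}
    <ellipse cx={50} cy={30} rx={22} ry={14} fill="white" stroke="black" />
    {/* Pupil: moved by gaze, but clipped so it stays inside the sclera */}
    <g clipPath="url(#sclera-clip)">
      <circle cx={50 + pupilDx} cy={30 + pupilDy} r={8} fill="#3a2a1a" />
    </g>
  </svg>
);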
Consistent stroke width
/* Keep stroke width constant even when scaling */
.hair-wrap :is(path, ellipse, polygon, rect, circle) {
  vector-effect: non-scaling-stroke;
}
⚠️ Note: Scaling a path with non-scaling-stroke keeps the stroke width constant, but it still distorts the shape itself. For strict consistency, switch between shape variants or generate the SVG programmatically.
Step 2. Mapping Face Movements (Condition → Effect)
Step 2-1. Blinking (left/right independent, wink support)

Condition: distance between eyelids ÷ eye width (normalized)
Effect: Move the upper eyelid down as t goes from 0 (open) to 1 (closed). Hold the gaze during a blink to prevent pupil jitter.
Ex.: Blink normalization with hysteresis
// Eye vertical/horizontal ratio → normalized 0..1 (larger = more open)
const vL = Math.abs(L_bottom.y - L_top.y) / (Math.abs(L_outer.x - L_inner.x) + 1e-6);
// Use hysteresis (different thresholds for open/close) → prevents rapid flickering
const applyCloseSnap = (t: number, snapRef: React.MutableRefObject<boolean>) => {
  const CLOSE_SNAP_ON = 0.90, CLOSE_SNAP_OFF = 0.85; // key difference
  /* ... */
};
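For completeness, here is one way the elided snapping body could look — a hedged sketch of the hysteresis idea, not the project's exact implementation; only the two thresholds come from the excerpt above.
import type { MutableRefObject } from "react";

// Hedged sketch: once the blink value crosses 0.90 the eye snaps fully closed,
// and the snap is released only after the value drops below 0.85. The gap between
// the two thresholds is what prevents rapid open/close flicker.
const applyCloseSnapSketch = (t: number, snapRef: MutableRefObject<boolean>) => {
  const CLOSE_SNAP_ON = 0.90;
  const CLOSE_SNAP_OFF = 0.85;
  if (!snapRef.current && t >= CLOSE_SNAP_ON) snapRef.current = true;
  else if (snapRef.current && t <= CLOSE_SNAP_OFF) snapRef.current = false;
  return snapRef.current ? 1 : t; // hold fully closed while snapped
};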
Step 2-2. Gaze

Condition: Relative iris center vs. eye center, clamped to [-1..1]
Effect: Move pupil with eyeOffsetX/Y. Hold the previous Y value during deep blinks to reduce jitter.
Ex.: Hold gaze during blink
// During blink, interpolate towards previous Y → prevents jittery pupils
const wHold = smoothstep(0.25, 0.70, tBlink);
eyeLocalY = (1 - wHold) * eyeLocalY + wHold * prevY;
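The iris center itself comes from the refined landmarks. Below is a hedged sketch of the horizontal normalization for one eye; the indices (468 for the iris center, 33/133 for the eye corners) are the commonly cited ones for MediaPipe Face Mesh with refineLandmarks enabled, so verify them against your own mesh.
type Pt = { x: number; y: number; z?: number };

const clampSigned = (v: number, lo = -1, hi = 1) => Math.max(lo, Math.min(hi, v));

// Hedged sketch: normalized horizontal gaze for one eye.
// -1 ≈ looking toward the outer corner, +1 ≈ toward the inner corner.
function gazeX(landmarks: Pt[]): number {
  const iris = landmarks[468];  // refined iris center (assumed index)
  const outer = landmarks[33];  // eye outer corner (assumed index)
  const inner = landmarks[133]; // eye inner corner (assumed index)
  const eyeCenterX = (outer.x + inner.x) / 2;
  const eyeHalfWidth = Math.abs(inner.x - outer.x) / 2 + 1e-6;
  return clampSigned((iris.x - eyeCenterX) / eyeHalfWidth);
}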
Step 2-3. Mouth Shapes (neutral / V-shape / ▽ / “O”)

Planned: future support for full vowels (a/i/u/e/o) and lip sync.
Conditions:
- mRatio (mouth vertical/horizontal) → openness tM
- Combine with mouth corner height & narrowness → decide the shape
Effects:
- Closed + downturned corners → straight line (neutral) → curve into V with more smile
- Medium open + narrow width + not smiling → round “O”
- Else → closed = V, open = ▽
// Self-calibrate ranges per person using EMA
const mRange = Math.max(1e-5, mouthOpenBase.current! - mouthClosedBase.current!);
let tM = (mRatio - mouthClosedBase.current!) / mRange; // 0=closed, 1=open
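Putting the three bullet rules together, a hedged sketch of the shape decision might look like this; the thresholds are illustrative, and tM, narrow01, frown01, and smile01 stand for the normalized features described above.
type MouthShape = "neutral" | "v" | "openTriangle" | "o";

// Hedged sketch of the shape-selection rules above. All thresholds are illustrative.
function pickMouthShape(tM: number, narrow01: number, frown01: number, smile01: number): MouthShape {
  const closed = tM < 0.15;
  if (closed && frown01 > 0.5) {
    // Closed + downturned corners: straight line, curving into a V as the smile grows
    return smile01 > 0.4 ? "v" : "neutral";
  }
  if (tM > 0.35 && tM < 0.75 && narrow01 > 0.6 && smile01 < 0.3) {
    return "o"; // medium open + narrow + not smiling → round "O"
  }
  return closed ? "v" : "openTriangle"; // otherwise: closed = V, open = ▽
}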
Step 2-4. Head Orientation (yaw / pitch / roll)



Condition:
- Yaw: nose tip vs. temple midpoint (left-right turn)
- Pitch: nose tip vs. temple midpoint (up-down nod)
- Roll: temple-to-temple angle (head tilt)
Small noise ignored via dead zone, max angles clamped.
2D emphasis for effect:
- Face parts (eyes, nose, mouth, glasses): mostly translations
- Nose 1.15×, mouth 0.9× movement → fake depth
- Side hair: slightly squeezed toward facing side (better via shape switching)
- Back hair: delayed opposite to head direction
- Roll: rotate ellipse around center
const kx = 0.20, ky = 0.25; // gain → px; ±25° yaw / ±20° pitch = ±5px
setFaceX((p) => smooth(p, yaw * kx));
setFaceY((p) => smooth(p, pitch * ky));
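A hedged sketch of the angle estimation is below. The landmark indices (1 for the nose tip, 234/454 for the temple area) are common approximations for MediaPipe Face Mesh rather than values confirmed here, and the degree gains are illustrative; in practice the yaw/pitch offsets are measured against calibrated zero points (re-captured with the X/C keys, see the appendix).
type Pt = { x: number; y: number; z?: number };

const clampRange = (v: number, lo: number, hi: number) => Math.max(lo, Math.min(hi, v));

// Hedged sketch of yaw / pitch / roll from three landmarks. Indices and gains are assumptions.
// yawZero / pitchZero are the normalized nose offsets captured at a neutral pose.
function headAngles(lm: Pt[], yawZero = 0, pitchZero = 0) {
  const nose = lm[1];      // nose tip (assumed index)
  const templeL = lm[234]; // left temple area (assumed index)
  const templeR = lm[454]; // right temple area (assumed index)
  const mid = { x: (templeL.x + templeR.x) / 2, y: (templeL.y + templeR.y) / 2 };
  const faceWidth = Math.abs(templeR.x - templeL.x) + 1e-6;

  // Yaw / pitch: nose-tip offset from the temple midpoint, normalized by face width,
  // measured against the calibrated zeros, mapped to degrees, and clamped.
  const yawDeg = clampRange(((nose.x - mid.x) / faceWidth - yawZero) * 90, -25, 25);
  const pitchDeg = clampRange(((mid.y - nose.y) / faceWidth - pitchZero) * 90, -20, 20);

  // Roll: angle of the temple-to-temple line.
  const rollDeg = (Math.atan2(templeR.y - templeL.y, templeR.x - templeL.x) * 180) / Math.PI;
  return { yawDeg, pitchDeg, rollDeg };
}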
Step 3. Hair Pseudo-Physics (spring & damping)

Design: Use head angular/linear velocity → lag hair movement.
- Dead zone for micro jitter
- Clamp amplitude for stability
// Simple smoothing
const smooth = (prev: number, next: number, a = 0.25) => prev + (next - prev) * a;
// Clamp range
const clamp = (v: number, lo = -1, hi = 1) => Math.max(lo, Math.min(hi, v));
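A minimal sketch of the spring-and-damper step is below, integrated once per animation frame; the constants are illustrative, not the defaults of hairStiffness/hairDamping.
type HairState = { offset: number; velocity: number };

const clampPx = (v: number, lo: number, hi: number) => Math.max(lo, Math.min(hi, v));

// Hedged sketch: the back hair trails on the opposite side of the head offset,
// pulled by a spring (k) and slowed by damping (c). Call once per rAF frame; dt is in seconds.
function stepHair(state: HairState, headX: number, dt: number, k = 60, c = 12, follow = 0.6): HairState {
  const target = -headX * follow;               // lag opposite to the head direction
  const err = target - state.offset;
  const drive = Math.abs(err) < 0.2 ? 0 : err;  // dead zone for micro jitter (px)
  const accel = k * drive - c * state.velocity; // spring + damping
  const velocity = state.velocity + accel * dt;
  const offset = clampPx(state.offset + velocity * dt, -12, 12); // clamp amplitude for stability
  return { offset, velocity };
}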
Step 4. Breathing (torso up/down + shoulder rotation)

Condition: Triangular wave cycle (inhale/exhale), independent of FaceMesh.
Effect:
- Torso, arms move up/down
- On exhale (torso down), arms slightly open at shoulders
const shoulderMin = 2; // min 2°
const shoulderRange = shoulderMaxDeg - shoulderMin;
setShoulderDeg(-(shoulderMin + open01 * shoulderRange));
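A hedged sketch of the breathing driver: a triangular wave advanced by requestAnimationFrame time, fully independent of FaceMesh. The 4-second cycle and the usage lines are assumptions; only the 2° shoulder minimum comes from the excerpt above.
// Triangle wave 0 → 1 → 0 over one breathing cycle (cycle length is illustrative).
function breathe01(timeMs: number, cycleMs = 4000): number {
  const phase = (timeMs % cycleMs) / cycleMs;
  return phase < 0.5 ? phase * 2 : (1 - phase) * 2;
}

// Hypothetical usage inside an animation loop:
// const open01 = breathe01(performance.now());
// setBreathY(open01 * BREATH_AMPLITUDE_PX);                              // torso / arms up-down
// setShoulderDeg(-(2 /* shoulderMin */ + open01 * (shoulderMaxDeg - 2))); // shoulders open wider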
Step 5. Upper Body Tilt (based on on-screen face position)

Condition: Screen X offset + yaw angle
Effect: Tilt torso ±8° for natural weight shift
// ~dx=±0.10 → ±8°
const POS_GAIN_DEG = 80;
const tiltFromPos = clampDeg(dxScreen * POS_GAIN_DEG, -8, 8);
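dxScreen itself can be derived from the on-screen face X minus a self-calibrated zero captured on the first frame (and re-captured with the V key, see the appendix). The sketch below is an assumption about how the position and yaw terms might be combined.
const clampDegRange = (v: number, lo: number, hi: number) => Math.max(lo, Math.min(hi, v));

// Hedged sketch: upper-body tilt from on-screen face X plus a small yaw contribution.
// noseX is the normalized (0..1) screen X of a face landmark; the yaw gain is illustrative.
function torsoTiltFrom(noseX: number, headZeroXRef: { current: number | null }, yawDeg: number): number {
  if (headZeroXRef.current === null) headZeroXRef.current = noseX; // self-calibrated zero
  const dxScreen = noseX - headZeroXRef.current;                   // roughly ±0.10 in normal use
  const POS_GAIN_DEG = 80;                                         // from the excerpt above
  const YAW_GAIN_DEG = 0.15;                                       // assumed small extra lean
  return clampDegRange(dxScreen * POS_GAIN_DEG + yawDeg * YAW_GAIN_DEG, -8, 8);
}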
Step 6. Stabilization Methods (clamp / EMA / dead zone / hysteresis)
- Clamp: limit extremes
- EMA/smooth/lerp: smooth following
- Dead zone: ignore tiny movements
- Hysteresis: different thresholds → prevents flicker
const DEAD_ZONE = 0.3;
const wFollow = clamp((mag - DEAD_ZONE) / (5 - DEAD_ZONE), 0, 1); // ignore small motion
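These techniques are usually combined on a single noisy signal. Here is a hedged sketch with an adaptive EMA: small motion is ignored, larger motion is followed more aggressively, and the result is clamped. All constants are illustrative.
const clampTo = (v: number, lo: number, hi: number) => Math.max(lo, Math.min(hi, v));

// Hedged sketch: dead zone + adaptive EMA + clamp for one signal (e.g. a head angle in degrees).
function stabilize(prev: number, raw: number): number {
  const DEAD_ZONE = 0.3;
  const mag = Math.abs(raw - prev);
  if (mag < DEAD_ZONE) return prev;                                   // dead zone: ignore tiny motion
  const wFollow = clampTo((mag - DEAD_ZONE) / (5 - DEAD_ZONE), 0, 1); // follow faster on big motion
  const next = prev + (raw - prev) * (0.1 + 0.4 * wFollow);           // EMA with adaptive alpha
  return clampTo(next, -25, 25);                                      // clamp extremes
}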
Step 7. UI and OBS



- Show webcam feed while developing (see landmarks)
- Grid overlay helps align SVG positions
- Green background for easy OBS chroma key
Example: Green screen toggle
<input type="checkbox" checked={showGreenBg}
  onChange={e => setShowGreenBg(e.target.checked)} />
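The toggle can then drive the page background; a minimal sketch, assuming a plain wrapper element and pure green as the key color:
import React from "react";

// Hedged sketch: wrapper whose background becomes pure green when the toggle is on,
// so OBS can chroma-key the avatar. The markup and the #00ff00 value are assumptions.
function Stage({ showGreenBg, children }: { showGreenBg: boolean; children: React.ReactNode }) {
  return (
    <div style={{ background: showGreenBg ? "#00ff00" : "transparent", minHeight: "100vh" }}>
      {children}
    </div>
  );
}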
Integration into Streaming
- Run avatar in browser, capture with OBS
- Turn on green screen → apply chroma key in OBS → overlay on background
Related Work: Pose Animator vs. SVGTuber
- Pose Animator: Open-source, TensorFlow.js + MediaPipe FaceMesh + PoseNet. Bone-based, accurate motion reproduction.
- SVGTuber: Focused on simple 2D style, SVG code-based drawing. No bone setup, just Live2D-like essentials (blinking, mouth, hair delay).
This concludes the introduction to creating a VTuber using only SVG!

Appendix (Code Digest)
App.tsx
Central component that analyzes facial motion from the webcam, converts it into features, and maps those to React state passed down to Character.tsx.
// App.tsx (excerpt with inline comments)
export default function App() {
  // ▼ Refs for webcam and debug canvas (mirrored preview)
  const videoRef = useRef<HTMLVideoElement>(null);
  const canvasRef = useRef<HTMLCanvasElement>(null);

  // ▼ Hysteresis flags for blinking (separate thresholds for opening/closing)
  const snapOpenLRef = useRef(false);  // Left eye: opening side
  const snapOpenRRef = useRef(false);  // Right eye: opening side
  const snapCloseLRef = useRef(false); // Left eye: closing side
  const snapCloseRRef = useRef(false); // Right eye: closing side

  // ▼ Display toggles (debug and OBS green screen)
  const [showCamera, setShowCamera] = useState(false);
  const [showGrid, setShowGrid] = useState(false);
  const [showGreenBg, setShowGreenBg] = useState(true);

  // ▼ Character scale (helps composition in OBS)
  const [charScale, setCharScale] = useState(1.25);

  // ▼ Translations in px: pupils (eyeX/Y) and common face parts (faceX/Y)
  const [eyeX, setEyeX] = useState(0);
  const [eyeY, setEyeY] = useState(0);
  const [faceX, setFaceX] = useState(0);
  const [faceY, setFaceY] = useState(0);

  // ▼ Blink amount per eye (0=open, 1=closed)
  const [blinkLeft, setBlinkLeft] = useState(0);
  const [blinkRight, setBlinkRight] = useState(0);

  // ▼ Smile intensity (0..1) — used e.g. to lift lower eyelids
  const [smile01, setSmile01] = useState(0);

  // ▼ Previous-frame yaw/pitch (to estimate "head speed" via deltas)
  const prevYawRawRef = useRef(0);
  const prevPitchRawRef = useRef(0);

  // ▼ Mouth features (open / narrowness / frown)
  const [mouthOpen, setMouthOpen] = useState(0);         // 0..1
  const [mouthNarrow01, setMouthNarrow01] = useState(0); // narrowness
  const [mouthFrown01, setMouthFrown01] = useState(0);   // downward corners

  // ▼ Face roll (negative = left tilt / positive = right tilt)
  const [faceRotDeg, setFaceRotDeg] = useState(0);
  const prevFaceRotRef = useRef(0); // “hold” during blink

  // ▼ Upper-body tilt (±8°) — computed from on-screen face X + yaw
  const [torsoTiltDeg, setTorsoTiltDeg] = useState(0);
  const headZeroXRef = useRef<number | null>(null); // zero point for screen X (self-calibration)

  // ▼ Debug angles: yaw/pitch
  const [faceYawDeg, setFaceYawDeg] = useState(0);
  const [facePitchDeg, setFacePitchDeg] = useState(0);

  // ===== Utilities (smoothing & limiting) =====
  const smooth = (p: number, n: number, a = 0.25) => p + (n - p) * a;            // smooth follow
  const clamp = (v: number, lo = -1, hi = 1) => Math.max(lo, Math.min(hi, v));   // range limit
  const clampDeg = (v: number, lo = -15, hi = 15) => Math.max(lo, Math.min(hi, v));
  const avg = (pts: {x:number;y:number;z?:number}[]) => ({
    x: pts.reduce((s,p)=>s+p.x,0)/pts.length,
    y: pts.reduce((s,p)=>s+p.y,0)/pts.length,
    z: pts.reduce((s,p)=>s+(p.z??0),0)/pts.length,
  });
  const smoothstep = (e0:number, e1:number, x:number) => {
    const t = Math.max(0, Math.min(1, (x - e0)/(e1 - e0))); return t*t*(3-2*t);
  };

  // ===== Self-calibration (update per-person baselines for open/closed) =====
  const openBaseL = useRef<number | null>(null);
  const closedBaseL = useRef<number | null>(null);
  const openBaseR = useRef<number | null>(null);
  const closedBaseR = useRef<number | null>(null);
  const mouthOpenBase = useRef<number | null>(null);
  const mouthClosedBase = useRef<number | null>(null);

  // ===== Keyboard shortcuts (Z/X/C/V to reset zeros/baselines) =====
  useEffect(() => {
    const onKey = (e: KeyboardEvent) => {
      const k = e.key.toLowerCase();
      if (k === "z") /* set current roll as new zero */;
      if (k === "x") /* re-capture yaw zero */;
      if (k === "c") /* re-capture pitch zero */;
      if (k === "v") headZeroXRef.current = null; // re-set on-screen X zero
    };
    window.addEventListener("keydown", onKey);
    return () => window.removeEventListener("keydown", onKey);
  }, []);

  // ===== Receive FaceMesh results → extract features → update React state =====
  useEffect(() => {
    const faceMesh = new FaceMesh({ /* load assets from CDN */ });
    faceMesh.setOptions({
      maxNumFaces: 1,
      refineLandmarks: true,
      minDetectionConfidence: 0.5,
      minTrackingConfidence: 0.5
    });
    faceMesh.onResults((results) => {
      // 1) Draw to debug canvas (mirrored; matches intuitive left/right)
      // 2) From landmarks: compute blink / gaze / mouth / yaw / pitch / roll
      // 3) Stabilize via dead zones, clamp, smoothing, and hysteresis
      // 4) setState(): update props for <Character />
      /* ...omitted here; see article for details... */
    });

    // Start camera
    let camera: Camera | null = null;
    if (videoRef.current) {
      camera = new Camera(videoRef.current, {
        onFrame: async () => { await faceMesh.send({ image: videoRef.current! }); },
        width: 640, height: 480,
      });
      camera.start();
    }
    return () => {
      try { (camera as any)?.stop?.() } catch {}
      try { (faceMesh as any)?.close?.() } catch {}
    };
  }, []);
}
components/Character.tsx
A presentational component that draws the character in SVG based on state values passed from App.tsx. The key idea is “which prop moves which visual element.”
// Character.tsx (excerpt with inline comments)
type CharacterProps = {
  // Local pupil movement (gaze), expect ±4px
  eyeOffsetX?: number;
  eyeOffsetY?: number;
  // Common translation for face parts (derived from yaw/pitch), expect ±5px
  faceOffsetX?: number;
  faceOffsetY?: number;
  // Blink amount per eye (0=open, 1=closed)
  blinkLeft?: number;
  blinkRight?: number;
  // Mouth openness (0..1)
  mouthOpen?: number;
  // “Eye smile” (lift lower eyelids), 0..1
  eyeSmile?: number;
  // Glasses appearance (keep stroke width via non-scaling-stroke)
  showGlasses?: boolean;
  glassesStrokeWidth?: number;
  glassesLensOpacity?: number;
  // Hair pseudo-physics parameters (spring k, damping c, follow amounts)
  hairPhysics?: boolean;
  hairStiffness?: number;
  hairDamping?: number;
  // Face rotation (roll)
  faceRotateDeg?: number;
  // Upper-body tilt (±8°)
  torsoTiltDeg?: number;
};

// Core for constant stroke width: vector effect (stroke doesn't scale)
<style>{`
  .hair-wrap :is(path,ellipse,polygon,rect,circle){
    fill: currentColor !important;
  }
  .hair-wrap [fill="none"]{
    fill: none !important; stroke: currentColor; stroke-width: 1.2px;
    vector-effect: non-scaling-stroke;
  }
`}</style>

// Clip the pupils so they only appear within the sclera
<clipPath id={clipIdLeft}>
  <path d={`M ... Z`} />
</clipPath>
<ellipse
  cx={X_EYE_LEFT + dx + px} cy={Y_EYE_CENTER + dy + py}
  rx={EYE_WIDTH/2} ry={EYE_HEIGHT/2}
  fill={COLOR_EYE} stroke="black" strokeWidth={1}
  clipPath={`url(#${clipIdLeft})`}
/>

// Glasses: keep stroke width fixed via vectorEffect
<path
  d={lensUPath(Lx, Ly)}
  fill="none"
  stroke={glassesStroke}
  strokeWidth={glassesStrokeWidth}
  vectorEffect="non-scaling-stroke"
/>

// Upper-body tilt (rotation pivot Y is near lower torso for natural motion)
const torsoDeg = clamp(torsoTiltDeg, -8, 8);
const torsoPivot = (torsoPivotY ?? Y_BODY_BOTTOM);
const upperMotion = `rotate(${torsoDeg}, ${X_FACE_CENTER}, ${torsoPivot})`;
<g transform={upperMotion}>
  {/* torso, neck, arms, hair, etc. */}
</g>
Variables & State Overview
App.tsx (Inputs → Features → State)
- Refs: videoRef, canvasRef — webcam & debug canvas
- UI: showCamera, showGrid, showGreenBg, charScale
- Gaze & face: eyeX, eyeY, faceX, faceY, faceRotDeg
- Blink: blinkLeft, blinkRight (0..1, per eye)
- Mouth: mouthOpen (0..1), mouthNarrow01, mouthFrown01
- Smile: smile01 (0..1; raises lower eyelids)
- Debug angles: faceYawDeg, facePitchDeg
- Upper body: torsoTiltDeg (±8°)
- Self-calibration: openBase* / closedBase* (eyes), mouth*Base (mouth) → updated per person and viewing distance
Character.tsx (Props that drive the look)
- Gaze: eyeOffsetX, eyeOffsetY
- Face parts: faceOffsetX, faceOffsetY, faceRotateDeg
- Blink: blinkLeft, blinkRight
- Mouth: mouthOpen + mouthNarrow01 / mouthFrown01 (used in shape switching)
- Eye smile: eyeSmile (lifts lower eyelids)
- Hair physics: hairPhysics, hairStiffness, hairDamping, plus follow amounts
- Upper body: torsoTiltDeg
- Accessories: showGlasses, glasses* (keep stroke width with vector-effect)