World’s First?! How to Build a VTuber Without Live2D: “SVGTuber”

React × TypeScript × SVG × MediaPipe — build a lightweight, resolution-independent VTuber entirely in code, no Live2D required.

This guide walks through the full pipeline: landmarks → features → smoothing → SVG controls → OBS capture, with commented code you can adapt.


Demo Video

What This Article Covers

  • How to design and implement a VTuber using only SVG, without Live2D
  • Why SVG? / Which face conditions drive which visual effects (with detailed explanations)
  • Excerpts of real code (with inline comments)
  • Full list of major variables and state

Why SVG (SVGTuber)?

Vectors never blur!
  • Resolution-independent: Vectors never blur, even when zoomed in during a stream.
  • Consistent stroke width: Use vector-effect: non-scaling-stroke.
  • Free & lightweight: No heavy 3D physics or mesh editing.
  • Full control with code: Every behavior can be designed manually—great for research and customization.
  • Easy streaming setup: Runs in a browser, capture via OBS is enough.

Suitable for:

  • People who like bold, cartoon-like lines and want a constant line width
  • Those who want to swap colors later (separating lineart and fill)
  • Those who find Live2D’s mesh editing heavy or annoying
  • Those who want to freely design expressions, conditions, and pseudo-physics directly in code

Recommended Tech Stack

  • React 18+ / TypeScript / Vite
  • MediaPipe Face Mesh (face landmark detection + iris)
  • @mediapipe/camera_utils / drawing_utils (webcam input & debug rendering)
  • SVG (all character art is vector)
  • Browser APIs: <video>, <canvas>, requestAnimationFrame, CSS transform

👉 Tip: Set refineLandmarks: true for stable iris detection, improving gaze and blinking quality.

Example: FaceMesh basic options

// If you want proper “eye openness / gaze” detection, refineLandmarks is essential
faceMesh.setOptions({
  maxNumFaces: 1,
  refineLandmarks: true,       // Improves iris precision
  minDetectionConfidence: 0.5,
  minTrackingConfidence: 0.5,
});

Overall Data Flow

Webcam → FaceMesh (landmarks) → Features (blink/gaze/mouth/yaw-pitch-roll) → Smoothing/Clamping → React State → <Character /> props → SVG transform (translate/rotate/mask/shape) → OBS capture


Step 1. Parts Separation & SVG Drawing

  • Hair (front / side / back / accessories): complex shapes → draw in vector editor (e.g., Inkscape), import to code
  • Face, eyebrows, sclera, pupils, mouth, body, limbs: written directly in code (ellipse, line, polygon, path)
  • Masks (clipPath):
    • Face interior (lashes, eyelids excluded) clipped by face ellipse
    • Pupils clipped by sclera → prevents overflow
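
The two masks above boil down to a couple of clipPath definitions; here is a minimal sketch (IDs, coordinates, and radii are placeholders, not the article's actual geometry):

{/* Sketch: the two masks as clipPath definitions (placeholder IDs and geometry) */}
<defs>
  {/* Face interior: parts drawn with this clip stay inside the face ellipse */}
  <clipPath id="clip-face">
    <ellipse cx={200} cy={180} rx={90} ry={110} />
  </clipPath>
  {/* Sclera: pupils clipped here can never overflow the white of the eye */}
  <clipPath id="clip-sclera-left">
    <ellipse cx={165} cy={170} rx={22} ry={14} />
  </clipPath>
</defs>
<circle cx={165} cy={170} r={9} fill="#333" clipPath="url(#clip-sclera-left)" />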

Consistent stroke width

/* Keep stroke width constant even when scaling */
.hair-wrap :is(path, ellipse, polygon, rect, circle) {
  vector-effect: non-scaling-stroke;
}

⚠️ Note: Scaling path shapes preserves stroke width but distorts shape. For strict consistency, switch shape variants or generate SVG programmatically.


Step 2. Mapping Face Movements (Condition → Effect)

Step 2-1. Blinking (left/right independent, wink support)

Blinking: the basics of VTubers

Condition: distance between eyelids ÷ eye width (normalized)
Effect: Move the upper eyelid down as t goes from 0 (open) to 1 (closed). Hold the gaze during a blink to prevent pupil jitter.

Ex.: Blink normalization with hysteresis

// Eye vertical/horizontal ratio (larger = more open); normalized to 0..1 later via self-calibrated baselines
const vL = Math.abs(L_bottom.y - L_top.y) / (Math.abs(L_outer.x - L_inner.x) + 1e-6);

// Use hysteresis (different thresholds for open/close) → prevents rapid flickering
const applyCloseSnap = (t: number, snapRef: React.MutableRefObject<boolean>) => {
  const CLOSE_SNAP_ON = 0.90, CLOSE_SNAP_OFF = 0.85; // key difference
  /* ... */
};
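
One way to fill in the elided body: latch the snap flag once t crosses CLOSE_SNAP_ON and release it only below CLOSE_SNAP_OFF. A sketch under that assumption (t is the blink amount, 0 = open, 1 = closed):

// Sketch of the elided snap body (assumed implementation)
const applyCloseSnap = (t: number, snapRef: React.MutableRefObject<boolean>) => {
  const CLOSE_SNAP_ON = 0.90, CLOSE_SNAP_OFF = 0.85;
  if (!snapRef.current && t >= CLOSE_SNAP_ON) snapRef.current = true;        // latch: treat the eye as fully closed
  else if (snapRef.current && t <= CLOSE_SNAP_OFF) snapRef.current = false;  // release only once clearly reopened
  return snapRef.current ? 1 : t;
};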

Step 2-2. Gaze

The pupil is clipped to the white of the eye

Condition: Relative iris center vs. eye center, clamped to [-1..1]
Effect: Move pupil with eyeOffsetX/Y. Hold the previous Y value during deep blinks to reduce jitter.

Ex.: Hold gaze during blink

// During blink, interpolate towards previous Y → prevents jittery pupils
const wHold = smoothstep(0.25, 0.70, tBlink);
eyeLocalY = (1 - wHold) * eyeLocalY + wHold * prevY;
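
For the condition itself, the iris offset can be read off the refined landmarks. A minimal sketch, assuming lm is results.multiFaceLandmarks[0] and the usual MediaPipe indices (468 = left iris center, 33 / 133 = left eye corners):

// Sketch: relative iris position inside the left eye, clamped to [-1, 1]
const iris = lm[468], outerC = lm[33], innerC = lm[133]; // indices assumed
const eyeCx = (outerC.x + innerC.x) / 2;
const eyeHalfW = Math.abs(outerC.x - innerC.x) / 2 + 1e-6;
const gazeX = clamp((iris.x - eyeCx) / eyeHalfW, -1, 1);
setEyeX((p) => smooth(p, gazeX * 4)); // ±1 maps to the ±4px pupil travel expected by <Character />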

Step 2-3. Mouth Shapes (neutral / V-shape / ▽ / “O”)

In the future, I would like to try out “aiueo” (the Japanese vowel sounds) and lip-syncing.

Planned: future support for full vowels (a/i/u/e/o) and lip sync.

Conditions:

  • mRatio (mouth vertical/horizontal) → openness tM
  • Combine with mouth corner height & narrowness → decide shape

Effects:

  • Closed + downturned corners → straight line (neutral) → curve into V with more smile
  • Medium open + narrow width + not smiling → round “O”
  • Else → closed = V, open = ▽
// Self-calibrate ranges per person using EMA
const mRange = Math.max(1e-5, mouthOpenBase.current! - mouthClosedBase.current!);
let tM = (mRatio - mouthClosedBase.current!) / mRange; // 0=closed, 1=open
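
A minimal sketch of the shape decision that follows tM — a discrete simplification, with illustrative thresholds (0.15 / 0.6 and the smile/narrow/frown cutoffs are assumptions, not the article's exact values):

// Sketch: pick a mouth variant from openness + corner height + narrowness
type MouthShape = "neutral" | "v" | "triangle" | "o";
const pickMouthShape = (tM: number, smile01: number, narrow01: number, frown01: number): MouthShape => {
  const closed = tM < 0.15;                                                // illustrative openness threshold
  if (closed && frown01 > 0.5) return "neutral";                           // closed + downturned corners → straight line
  if (!closed && tM < 0.6 && narrow01 > 0.5 && smile01 < 0.3) return "o";  // medium open + narrow + not smiling → round "O"
  return closed ? "v" : "triangle";                                        // else: closed = V, open = ▽
};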

Step 2-4. Head Orientation (yaw / pitch / roll)

Yaw (left and right swing angle). Also pay attention to the movement of the hair.
Pitch (up and down nod angle)
Roll (tilting of the head)

Condition:

  • Yaw: horizontal offset of the nose tip vs. the temple midpoint (left-right turn)
  • Pitch: vertical offset of the nose tip vs. the temple midpoint (up-down nod)
  • Roll: temple-to-temple angle (head tilt)

Small noise ignored via dead zone, max angles clamped.
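
A minimal sketch of how these angles can be read off the landmarks (lm is the landmark array; the indices 1 / 127 / 356 and the gain constants are assumptions for illustration):

// Sketch: yaw/pitch from the nose tip vs. the temple midpoint, roll from the temple-to-temple line
const YAW_GAIN = 300, PITCH_GAIN = 300;        // assumed gains: normalized offset → degrees
const nose = lm[1], tL = lm[127], tR = lm[356]; // indices assumed
const mid = { x: (tL.x + tR.x) / 2, y: (tL.y + tR.y) / 2 };
let yawDeg   = (nose.x - mid.x) * YAW_GAIN;    // left-right turn
let pitchDeg = (nose.y - mid.y) * PITCH_GAIN;  // up-down nod
const rollDeg = (Math.atan2(tR.y - tL.y, tR.x - tL.x) * 180) / Math.PI; // head tilt
// Dead zone for micro noise, then clamp to the max angles used below (±25° / ±20°)
yawDeg   = Math.abs(yawDeg)   < 1 ? 0 : clampDeg(yawDeg,   -25, 25);
pitchDeg = Math.abs(pitchDeg) < 1 ? 0 : clampDeg(pitchDeg, -20, 20);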

2D emphasis for effect:

  • Face parts (eyes, nose, mouth, glasses): mostly translations
  • Nose 1.15×, mouth 0.9× movement → fake depth (see the sketch after the code below)
  • Side hair: slightly squeezed toward facing side (better via shape switching)
  • Back hair: delayed opposite to head direction
  • Roll: rotate ellipse around center
const kx = 0.20, ky = 0.25; // gain → px; ±25° yaw / ±20° pitch = ±5px
setFaceX((p) => smooth(p, yaw * kx));
setFaceY((p) => smooth(p, pitch * ky));
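
Inside <Character />, the per-part depth emphasis can then be small extra gains on the shared face offset; a sketch (group contents are illustrative, the 1.15× / 0.9× multipliers are the ones listed above):

{/* Sketch: shared face offset with per-part parallax gains */}
<g transform={`translate(${faceOffsetX}, ${faceOffsetY})`}>{/* eyes, eyebrows, glasses */}</g>
<g transform={`translate(${faceOffsetX * 1.15}, ${faceOffsetY * 1.15})`}>{/* nose: moves more → feels closer */}</g>
<g transform={`translate(${faceOffsetX * 0.9}, ${faceOffsetY * 0.9})`}>{/* mouth: moves less → feels farther */}</g>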

Step 3. Hair Pseudo-Physics (spring & damping)

When you move your face, your hair will follow after a short delay.

Design: Use head angular/linear velocity → lag hair movement.

  • Dead zone for micro jitter
  • Clamp amplitude for stability
// Simple smoothing
const smooth = (prev: number, next: number, a = 0.25) => prev + (next - prev) * a;
// Clamp range
const clamp = (v: number, lo = -1, hi = 1) => Math.max(lo, Math.min(hi, v));
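
The spring & damping itself can be a standard spring-damper integrated per frame. A minimal sketch (the state shape and the k / c defaults are assumptions; they roughly correspond to hairStiffness / hairDamping in the props):

// Sketch: one-axis spring-damper for the hair lag
type Spring = { pos: number; vel: number };
const stepHairSpring = (s: Spring, target: number, dt: number, k = 80, c = 12): Spring => {
  const accel = k * (target - s.pos) - c * s.vel; // pull toward the head, damp the oscillation
  const vel = s.vel + accel * dt;
  const pos = clamp(s.pos + vel * dt, -6, 6);     // clamp amplitude for stability
  return { pos, vel };
};
// Per frame: hair = stepHairSpring(hair, headOffsetAfterDeadZone, dtSeconds); hair.pos drives the hair group's translate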

Step 4. Breathing (torso up/down + shoulder rotation)

Not only can it move up and down, but the arms can also be opened and closed.

Condition: Triangular wave cycle (inhale/exhale), independent of FaceMesh.
Effect:

  • Torso, arms move up/down
  • On exhale (torso down), arms slightly open at shoulders
const shoulderMin = 2; // min 2°
const shoulderRange = shoulderMaxDeg - shoulderMin;
setShoulderDeg(-(shoulderMin + open01 * shoulderRange));
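
The breathing cycle can be a simple triangular wave driven by requestAnimationFrame; a sketch (the 4-second period, amplitude, and setter names are assumptions):

// Sketch: triangular wave 0→1→0 over one breath
const BREATH_PERIOD_MS = 4000; // assumed period
const breath01 = (now: number) => {
  const p = (now % BREATH_PERIOD_MS) / BREATH_PERIOD_MS; // 0..1 phase
  return p < 0.5 ? p * 2 : (1 - p) * 2;                  // rises on the inhale, falls on the exhale
};
// In a requestAnimationFrame loop (names assumed):
//   const b = breath01(performance.now());
//   setBodyOffsetY(-b * BREATH_AMPLITUDE_PX);  // torso rises on the inhale, sinks on the exhale
//   const open01 = 1 - b;                      // arms open at the shoulders as the torso sinks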

Step 5. Upper Body Tilt (based on on-screen face position)

Happy♪

Condition: Screen X offset + yaw angle
Effect: Tilt torso ±8° for natural weight shift

// ~dx=±0.10 → ±8°
const POS_GAIN_DEG = 80;
const tiltFromPos = clampDeg(dxScreen * POS_GAIN_DEG, -8, 8);
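
dxScreen comes from the on-screen face X relative to a self-calibrated zero point (headZeroXRef in the appendix; reset with the V key). A minimal sketch, assuming lm is the landmark array and 127 / 356 are the temple indices:

// Sketch: screen-X offset from a self-calibrated zero
const faceCx = avg([lm[127], lm[356]]).x;                          // rough face-center X in 0..1 screen coords
if (headZeroXRef.current === null) headZeroXRef.current = faceCx;  // capture the zero on the first frame
const dxScreen = faceCx - headZeroXRef.current;                    // ≈ ±0.10 when leaning left/right
// Yaw can be blended in as well (per the condition above) before clamping
setTorsoTiltDeg((p) => smooth(p, clampDeg(dxScreen * POS_GAIN_DEG, -8, 8)));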

Step 6. Stabilization Methods (clamp / EMA / dead zone / hysteresis)

  • Clamp: limit extremes
  • EMA/smooth/lerp: smooth following
  • Dead zone: ignore tiny movements
  • Hysteresis: different thresholds → prevents flicker
const DEAD_ZONE = 0.3;
const wFollow = clamp((mag - DEAD_ZONE) / (5 - DEAD_ZONE), 0, 1); // ignore small motion

Step 7. UI and OBS

During development, I worked while displaying the camera image (my face covered in mesh is reflected in the image lol).
Using a grid is useful for specifying positioning when drawing SVG.
Completed an app that can be used directly for streaming with a green screen!
  • Show webcam feed while developing (see landmarks)
  • Grid overlay helps align SVG positions
  • Green background for easy OBS chroma key

Example: Green screen toggle

<input type="checkbox" checked={showGreenBg}
       onChange={e => setShowGreenBg(e.target.checked)} />
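
The checkbox just swaps the page background so OBS can chroma-key it; a minimal sketch of the other half (the wrapper div and the green value are assumptions):

{/* Sketch: the stage behind the character; pure green keys out cleanly in OBS */}
<div style={{ background: showGreenBg ? "#00ff00" : "transparent", width: "100vw", height: "100vh" }}>
  {/* character props omitted; see the appendix */}
  <Character />
</div>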

Integration into Streaming

  • Run avatar in browser, capture with OBS
  • Turn on green screen → apply chroma key in OBS → overlay on background

Related Work: Pose Animator vs. SVGTuber

  • Pose Animator: Open-source, TensorFlow.js + MediaPipe FaceMesh + PoseNet. Bone-based, accurate motion reproduction.
  • SVGTuber: Focused on simple 2D style, SVG code-based drawing. No bone setup, just Live2D-like essentials (blinking, mouth, hair delay).

This concludes the introduction to creating a VTuber using only SVG!

When the corners of your mouth rise, the lower eyelids rise and the model smiles slightly.

Appendix (Code Digest)

App.tsx

Central component that analyzes facial motion from the webcam, converts it into features, and maps those to React state passed down to Character.tsx.

// App.tsx (excerpt with inline comments)
export default function App() {
  // ▼ Refs for webcam and debug canvas (mirrored preview)
  const videoRef = useRef<HTMLVideoElement>(null);
  const canvasRef = useRef<HTMLCanvasElement>(null);

  // ▼ Hysteresis flags for blinking (separate thresholds for opening/closing)
  const snapOpenLRef  = useRef(false); // Left eye: opening side
  const snapOpenRRef  = useRef(false); // Right eye: opening side
  const snapCloseLRef = useRef(false); // Left eye: closing side
  const snapCloseRRef = useRef(false); // Right eye: closing side

  // ▼ Display toggles (debug and OBS green screen)
  const [showCamera,  setShowCamera]  = useState(false);
  const [showGrid,    setShowGrid]    = useState(false);
  const [showGreenBg, setShowGreenBg] = useState(true);

  // ▼ Character scale (helps composition in OBS)
  const [charScale, setCharScale] = useState(1.25);

  // ▼ Translations in px: pupils (eyeX/Y) and common face parts (faceX/Y)
  const [eyeX,  setEyeX]  = useState(0);
  const [eyeY,  setEyeY]  = useState(0);
  const [faceX, setFaceX] = useState(0);
  const [faceY, setFaceY] = useState(0);

  // ▼ Blink amount per eye (0=open, 1=closed)
  const [blinkLeft,  setBlinkLeft]  = useState(0);
  const [blinkRight, setBlinkRight] = useState(0);

  // ▼ Smile intensity (0..1) — used e.g. to lift lower eyelids
  const [smile01, setSmile01] = useState(0);

  // ▼ Previous-frame yaw/pitch (to estimate "head speed" via deltas)
  const prevYawRawRef   = useRef(0);
  const prevPitchRawRef = useRef(0);

  // ▼ Mouth features (open / narrowness / frown)
  const [mouthOpen,     setMouthOpen]     = useState(0); // 0..1
  const [mouthNarrow01, setMouthNarrow01] = useState(0); // narrowness
  const [mouthFrown01,  setMouthFrown01]  = useState(0); // downward corners

  // ▼ Face roll (negative = left tilt / positive = right tilt)
  const [faceRotDeg, setFaceRotDeg] = useState(0);
  const prevFaceRotRef = useRef(0); // “hold” during blink

  // ▼ Upper-body tilt (±8°) — computed from on-screen face X + yaw
  const [torsoTiltDeg, setTorsoTiltDeg] = useState(0);
  const headZeroXRef = useRef<number | null>(null); // zero point for screen X (self-calibration)

  // ▼ Debug angles: yaw/pitch
  const [faceYawDeg,   setFaceYawDeg]   = useState(0);
  const [facePitchDeg, setFacePitchDeg] = useState(0);

  // ===== Utilities (smoothing & limiting) =====
  const smooth   = (p: number, n: number, a = 0.25) => p + (n - p) * a; // smooth follow
  const clamp    = (v: number, lo = -1, hi = 1) => Math.max(lo, Math.min(hi, v)); // range limit
  const clampDeg = (v: number, lo = -15, hi = 15) => Math.max(lo, Math.min(hi, v));
  const avg      = (pts: {x:number;y:number;z?:number}[]) => ({
    x: pts.reduce((s,p)=>s+p.x,0)/pts.length,
    y: pts.reduce((s,p)=>s+p.y,0)/pts.length,
    z: pts.reduce((s,p)=>s+(p.z??0),0)/pts.length,
  });
  const smoothstep = (e0:number, e1:number, x:number) => {
    const t = Math.max(0, Math.min(1, (x - e0)/(e1 - e0))); return t*t*(3-2*t);
  };

  // ===== Self-calibration (update per-person baselines for open/closed) =====
  const openBaseL   = useRef<number | null>(null);
  const closedBaseL = useRef<number | null>(null);
  const openBaseR   = useRef<number | null>(null);
  const closedBaseR = useRef<number | null>(null);
  const mouthOpenBase   = useRef<number | null>(null);
  const mouthClosedBase = useRef<number | null>(null);

  // ===== Keyboard shortcuts (Z/X/C/V to reset zeros/baselines) =====
  useEffect(() => {
    const onKey = (e: KeyboardEvent) => {
      const k = e.key.toLowerCase();
      if (k === "z") /* set current roll as new zero */;
      if (k === "x") /* re-capture yaw zero */;
      if (k === "c") /* re-capture pitch zero */;
      if (k === "v") headZeroXRef.current = null; // re-set on-screen X zero
    };
    window.addEventListener("keydown", onKey);
    return () => window.removeEventListener("keydown", onKey);
  }, []);

  // ===== Receive FaceMesh results → extract features → update React state =====
  useEffect(() => {
    const faceMesh = new FaceMesh({ /* load assets from CDN */ });
    faceMesh.setOptions({
      maxNumFaces: 1,
      refineLandmarks: true,
      minDetectionConfidence: 0.5,
      minTrackingConfidence: 0.5
    });

    faceMesh.onResults((results) => {
      // 1) Draw to debug canvas (mirrored; matches intuitive left/right)
      // 2) From landmarks: compute blink / gaze / mouth / yaw / pitch / roll
      // 3) Stabilize via dead zones, clamp, smoothing, and hysteresis
      // 4) setState(): update props for <Character />
      /* ...omitted here; see article for details... */
    });

    // Start camera
    let camera: Camera | null = null;
    if (videoRef.current) {
      camera = new Camera(videoRef.current, {
        onFrame: async () => { await faceMesh.send({ image: videoRef.current! }); },
        width: 640, height: 480,
      });
      camera.start();
    }
    return () => {
      try { (camera as any)?.stop?.() } catch {}
      try { (faceMesh as any)?.close?.() } catch {}
    };
  }, []);
}

components/Character.tsx

A presentational component that draws the character in SVG based on state values passed from App.tsx. The key idea is “which prop moves which visual element.”

// Character.tsx (excerpt with inline comments)
type CharacterProps = {
  // Local pupil movement (gaze), expect ±4px
  eyeOffsetX?: number;
  eyeOffsetY?: number;

  // Common translation for face parts (derived from yaw/pitch), expect ±5px
  faceOffsetX?: number;
  faceOffsetY?: number;

  // Blink amount per eye (0=open, 1=closed)
  blinkLeft?: number;
  blinkRight?: number;

  // Mouth openness (0..1)
  mouthOpen?: number;

  // “Eye smile” (lift lower eyelids), 0..1
  eyeSmile?: number;

  // Glasses appearance (keep stroke width via non-scaling-stroke)
  showGlasses?: boolean;
  glassesStrokeWidth?: number;
  glassesLensOpacity?: number;

  // Hair pseudo-physics parameters (spring k, damping c, follow amounts)
  hairPhysics?: boolean;
  hairStiffness?: number;
  hairDamping?: number;

  // Face rotation (roll)
  faceRotateDeg?: number;

  // Upper-body tilt (±8°)
  torsoTiltDeg?: number;
};

// Core for constant stroke width: vector effect (stroke doesn't scale)
<style>{`
  .hair-wrap :is(path,ellipse,polygon,rect,circle){
    fill: currentColor !important;
  }
  .hair-wrap [fill="none"]{
    fill: none !important; stroke: currentColor; stroke-width: 1.2px;
    vector-effect: non-scaling-stroke;
  }
`}</style>

// Clip the pupils so they only appear within the sclera
<clipPath id={clipIdLeft}>
  <path d={`M ... Z`} />
</clipPath>

<ellipse
  cx={X_EYE_LEFT + dx + px} cy={Y_EYE_CENTER + dy + py}
  rx={EYE_WIDTH/2} ry={EYE_HEIGHT/2}
  fill={COLOR_EYE} stroke="black" strokeWidth={1}
  clipPath={`url(#${clipIdLeft})`}
/>

// Glasses: keep stroke width fixed via vectorEffect
<path
  d={lensUPath(Lx, Ly)}
  fill="none"
  stroke={glassesStroke}
  strokeWidth={glassesStrokeWidth}
  vectorEffect="non-scaling-stroke"
/>

// Upper-body tilt (rotation pivot Y is near lower torso for natural motion)
const torsoDeg = clamp(torsoTiltDeg, -8, 8);
const torsoPivot = (torsoPivotY ?? Y_BODY_BOTTOM);
const upperMotion = `rotate(${torsoDeg}, ${X_FACE_CENTER}, ${torsoPivot})`;

<g transform={upperMotion}>
  {/* torso, neck, arms, hair, etc. */}
</g>

Variables & State Overview

App.tsx (Inputs → Features → State)

  • Refs: videoRef, canvasRef — webcam & debug canvas
  • UI: showCamera, showGrid, showGreenBg, charScale
  • Gaze & face: eyeX, eyeY, faceX, faceY, faceRotDeg
  • Blink: blinkLeft, blinkRight (0..1, per eye)
  • Mouth: mouthOpen (0..1), mouthNarrow01, mouthFrown01
  • Smile: smile01 (0..1; raises lower eyelids)
  • Debug angles: faceYawDeg, facePitchDeg
  • Upper body: torsoTiltDeg (±8°)
  • Self-calibration: openBase* / closedBase* (eyes), mouth*Base (mouth)
    → update per person and distance

Character.tsx (Props that drive the look)

  • Gaze: eyeOffsetX, eyeOffsetY
  • Face parts: faceOffsetX, faceOffsetY, faceRotateDeg
  • Blink: blinkLeft, blinkRight
  • Mouth: mouthOpen + mouthNarrow01 / mouthFrown01 (used in shape switching)
  • Eye smile: eyeSmile (lift lower eyelids)
  • Hair physics: hairPhysics, hairStiffness, hairDamping, plus follow amounts
  • Upper body: torsoTiltDeg
  • Accessories: showGlasses, glasses* (keep stroke width with vector-effect)

Also read: My 2025 Dev Journey (React Native → Flutter → AI Q-Learning → Supabase)