Sat Jul 27 2024

# WebCodecs Performance and Optimization Insights

It's been a year and a half since I open-sourced WebAV and wrote a series of articles to help beginners get started with Web audio and video.

I've always been concerned about a potential performance gap between audio/video processing on the Web platform and in native apps, since the WebCodecs API is proxied through the browser and some data processing has to happen in JavaScript. How large that overhead actually is was never clear.

I believe readers new to WebCodecs share this concern about its performance capabilities.

Now that WebAV has stabilized and the v1.0 release is approaching, I ran some simple performance tests, and the results are encouraging 😃

Note: WebAV is a WebCodecs-based SDK for creating/editing video files on the Web platform

# Test Environment

Test Resource: bunny_avc_frag.mp4 (fMP4), 1080P, 10min, AVC encoded
Output: 10min, 3Mbps bitrate, 30 FPS
Method: Draw simple text in the video center, composite and export

# Hardware

Device 1: MacBook Pro, M1 (2020), 16 GB
Device 2: AMD Ryzen 7 5800 8-core, RTX 3080, 32 GB
Device 3: Intel i9-9900k, RTX 2070, 32 GB
Device 4: Intel i5-8265U, UHD Graphics 620, 8 GB

# WebCodecs Performance Results

The numbers are video composition times in seconds. For WebAV, refer to the latest version's data in the updates below.
[Figure: benchmark results chart]

Note 1: Local composition (WebAV, ffmpeg, CapCut APP) heavily depends on hardware configuration; cloud composition (CapCut Web) depends on allocated server resources
Note 2: CapCut APP version 4.1.0, installed via official downloader, shows unusual behavior for unknown reasons

2024.08.12 Update: WebAV v0.14.6 Optimized Data

[Figure: benchmark results chart, 2024-08-12]

2024.08.16 Update: WebAV v0.14.9 Optimized Data

[Figure: benchmark results chart, 2024-08-16]

Interested readers can compare results or discuss in the comments. The WebAV performance test code is here; other tools require separate installation.

# Summary

WebCodecs leverages hardware acceleration for encoding/decoding. After three optimization rounds, WebAV shows significant performance improvements.
While some performance gaps remain compared to Native solutions on certain devices, I believe performance is no longer a primary consideration in solution selection.
Moving forward, we'll focus on SDK stability and preparing for the v1.0 release.

Future performance optimizations are possible. To stay updated on WebAV (WebCodecs) performance optimization data, subscribe to this article's comment issue.

# Performance Optimization Strategies

Below is what I have learned so far; discussion and additions from readers are welcome.

# Encoding is the Performance Bottleneck

Encoding tasks consume the most computational resources. Don't let other tasks (demuxing, decoding, compositing, etc.) block the encoder.
Prepare frame data in advance to keep the encoder continuously working with an uninterrupted supply of video frames once video composition begins.

However, carefully manage memory/VRAM usage - don't decode too much data in advance.
Keep the encoding queue manageable to avoid VRAM overflow; see VRAM Usage Control below for details.
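
One way to prepare frames ahead of the encoder without decoding too far in advance is a small bounded queue between the decoding side and the encoding side. The sketch below is only an illustration of that idea, not WebAV's implementation; the `BoundedQueue` class and its capacity are hypothetical names chosen for this example.

```typescript
// Hypothetical helper: a queue that makes the producer wait when it is full
// and the consumer wait when it is empty.
class BoundedQueue<T> {
  private items: T[] = [];
  private waiters: Array<() => void> = [];
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get size(): number {
    return this.items.length;
  }

  // Resolves once there is room, so the producer pauses instead of piling up frames.
  async push(item: T): Promise<void> {
    while (this.items.length >= this.capacity) await this.waitForChange();
    this.items.push(item);
    this.notify();
  }

  // Resolves once an item is available.
  async pop(): Promise<T> {
    while (this.items.length === 0) await this.waitForChange();
    const item = this.items.shift() as T;
    this.notify();
    return item;
  }

  private waitForChange(): Promise<void> {
    return new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  // Wake everyone who is waiting; each waiter re-checks its own condition.
  private notify(): void {
    const waiters = this.waiters;
    this.waiters = [];
    for (const wake of waiters) wake();
  }
}
```

In use, the decoding loop would `await queue.push(frame)` and the encoding loop would `await queue.pop()`. The capacity then caps how many decoded frames exist at once while still keeping the encoder supplied.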

# Parallel Encoders

Video encoding is the performance bottleneck, and analysis shows that a significant amount of time is spent waiting for encoder output.
Creating multiple encoders can better utilize GPU hardware acceleration. Distribute encoding tasks by GOP across encoders, then assemble outputs in chronological order. This shows notable improvements on some devices.
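
The bookkeeping behind "distribute by GOP, then reassemble in order" can be sketched as follows. This is a simplified illustration under assumptions: the `Gop` type, the fixed `gopSize`, and the round-robin assignment are choices made for this example, not necessarily how WebAV schedules its encoders.

```typescript
// Hypothetical shape for this sketch: a GOP with its position in the timeline.
interface Gop<T> {
  index: number;
  frames: T[];
}

// Cut the frame sequence into fixed-size GOPs.
function splitIntoGops<T>(frames: T[], gopSize: number): Gop<T>[] {
  const gops: Gop<T>[] = [];
  for (let i = 0; i < frames.length; i += gopSize) {
    gops.push({ index: gops.length, frames: frames.slice(i, i + gopSize) });
  }
  return gops;
}

// Assign GOPs to encoders round-robin; lane k holds the work for encoder k.
function assignRoundRobin<T>(gops: Gop<T>[], encoderCount: number): Gop<T>[][] {
  const lanes: Gop<T>[][] = [];
  for (let k = 0; k < encoderCount; k++) lanes.push([]);
  for (const gop of gops) lanes[gop.index % encoderCount].push(gop);
  return lanes;
}

// Collect the per-encoder results and restore chronological order by GOP index.
function mergeInOrder<T>(lanes: Gop<T>[][]): T[] {
  const all: Gop<T>[] = [];
  for (const lane of lanes) for (const gop of lane) all.push(gop);
  all.sort((a, b) => a.index - b.index);
  const out: T[] = [];
  for (const gop of all) for (const frame of gop.frames) out.push(frame);
  return out;
}
```

Splitting on GOP boundaries matters because each GOP begins with a key frame and can be encoded independently of the others, so each encoder's output can be written out as a self-contained unit.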

# Memory Usage Optimization

Audio/video files are typically large. Loading entire files into memory consumes excessive space, causing frequent GC and performance degradation. Very large files may overflow, causing errors or crashes.

Store video files in OPFS (Origin Private File System) and load into memory as needed.
WebAV uses opfs-tools to operate on files in the origin private file system, significantly reducing memory usage.
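
As a rough illustration of loading only what is needed, the sketch below reads a file in fixed-size chunks instead of all at once. It uses the standard `slice()` / `arrayBuffer()` methods of browser `File` objects rather than the opfs-tools API; `SliceableFile` and `readInChunks` are hypothetical names introduced for this example.

```typescript
// Minimal shape shared by browser File/Blob objects, declared here so the
// sketch is self-contained.
interface SliceableFile {
  size: number;
  slice(start: number, end: number): { arrayBuffer(): Promise<ArrayBuffer> };
}

// Read `file` piece by piece, handing each chunk to `onChunk` before the next
// one is loaded, so only about one chunk is held in memory at a time.
async function readInChunks(
  file: SliceableFile,
  chunkSize: number,
  onChunk: (chunk: Uint8Array, offset: number) => void | Promise<void>
): Promise<void> {
  for (let offset = 0; offset < file.size; offset += chunkSize) {
    const end = Math.min(offset + chunkSize, file.size);
    const buf = await file.slice(offset, end).arrayBuffer();
    await onChunk(new Uint8Array(buf), offset);
  }
}

// In a browser, a file stored in OPFS could be obtained roughly like this
// ("video.mp4" is a placeholder name):
//   const root = await navigator.storage.getDirectory();
//   const handle = await root.getFileHandle("video.mp4");
//   const file = await handle.getFile();
//   await readInChunks(file, 1 << 20, (chunk) => { /* feed the demuxer */ });
```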

# Memory Allocation Optimization

In JavaScript, ArrayBuffer is the abstraction for a block of raw memory that can be read and written.

Web developers rarely need to consider memory allocation and collection, but audio/video processing frequently operates on large memory blocks. Frequent memory allocation and collection can impact performance.

Here are some APIs relevant to Web audio/video performance optimization:

subarray: Reuses already allocated memory, reducing new allocations. Modifying the shared portion affects every object that shares it
resize: Dynamically adjusts the size of an ArrayBuffer, reducing new allocations
transfer: Quickly creates a new (larger or smaller) ArrayBuffer from an existing one, but the source becomes unusable after the transfer
Transferable: Objects implementing this interface can be moved between threads efficiently, which helps WebWorker performance
SharedArrayBuffer: Shares memory directly between threads, avoiding data transfer overhead, but requires thinking about locking
WebWorker: Concurrent processing; see my article on JS Concurrency

Note the browser compatibility of resize and transfer, and the requirements for enabling SharedArrayBuffer (the page must be cross-origin isolated).
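
The difference between sharing and copying is easy to see with `subarray` and `slice` on a typed array. The following is a minimal sketch; the `pool` buffer and the `viewOf` helper are hypothetical names used only for this illustration.

```typescript
// Allocate one buffer up front and hand out views into it, instead of
// allocating a new buffer for every packet.
const pool = new Uint8Array(1024);

// Returns a view (not a copy) over pool[offset, offset + length).
function viewOf(offset: number, length: number): Uint8Array {
  return pool.subarray(offset, offset + length);
}

// Writing through the view writes into the pool itself.
const header = viewOf(0, 4);
header.set([0x66, 0x74, 0x79, 0x70]);

// slice() allocates new memory, so changing the copy leaves the pool untouched.
const copy = pool.slice(0, 4);
copy[0] = 0;
```

Because `header` shares memory with `pool`, any later write to the first four bytes of `pool` also changes what `header` reads, which is the trade-off mentioned for subarray above.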

# VRAM Usage Control

Video frames (VideoFrame) contain raw image data, consuming significant VRAM. Never hold too many video frame objects simultaneously. Create as needed and close immediately after use to avoid VRAM overflow and severe performance impact.

During decoding, if VideoFrames from VideoDecoder aren't closed promptly, accumulation will stop decoder output.

During encoding, the VideoFrames that JavaScript creates for VideoEncoder are the developer's responsibility; actively limit how many exist at once.
Monitor VideoEncoder's encodeQueueSize; when the queue grows too large, pause creating new frames so the encoder can work through the queued ones.
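
A minimal sketch of this backpressure loop is shown below. To keep it runnable outside a browser, it declares small interfaces that mirror only the WebCodecs members it uses (`encodeQueueSize`, `encode`, `close`). The `maxQueue` threshold, the polling interval, and the `nextFrame` callback are assumptions made for illustration.

```typescript
// Minimal shapes mirroring the VideoFrame / VideoEncoder members used here.
interface FrameLike {
  close(): void;
}
interface EncoderLike {
  readonly encodeQueueSize: number;
  encode(frame: FrameLike): void;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Feed frames to the encoder, pausing while its queue is above `maxQueue`.
// `nextFrame` creates the next frame on demand and returns null when done.
async function encodeWithBackpressure(
  encoder: EncoderLike,
  nextFrame: () => FrameLike | null,
  maxQueue: number
): Promise<void> {
  for (;;) {
    // Do not create another frame until the encoder has caught up.
    while (encoder.encodeQueueSize > maxQueue) await sleep(5);
    const frame = nextFrame();
    if (frame === null) return;
    encoder.encode(frame);
    // With WebCodecs, the frame can be closed right after encode() is called.
    frame.close();
  }
}
```

Polling keeps the example short. In a browser, listening for the encoder's `dequeue` event would be an alternative to the `sleep` loop.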