# Web Audio & Video (1) Fundamentals
Before diving into the subsequent articles or starting to use WebAV for audio and video processing, it's essential to understand some background knowledge.
This article provides a brief introduction to the fundamental concepts of audio and video, as well as the core APIs of WebCodecs.
# Video Structure
A video file can be thought of as a container that holds metadata and encoded data (compressed audio and video).
Container formats differ in how they organize and manage that metadata and encoded data.
# Codec Formats
The primary purpose of encoding is compression; each codec format corresponds to a different compression algorithm.
Compression is necessary because raw sampled data (images, audio) is far too large to store or transmit as-is.
Codec formats differ in compression ratio, compatibility, and complexity.
Generally, newer formats achieve higher compression but come with lower compatibility and higher complexity.
Different business scenarios (VOD, live streaming, video conferencing) call for different trade-offs among these three factors; the sketch after the codec lists below shows how to probe what a given browser supports.
Common Video Codecs
- H264 (AVC), 2003
- H265 (HEVC), 2013
- AV1, 2018
Common Audio Codecs
- MP3, 1991
- AAC, 2000
- Opus, 2012
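To make the compatibility differences concrete, the WebCodecs isConfigSupported() methods let you probe what the current browser can encode. A minimal sketch follows; the codec strings and parameters are illustrative examples, not an exhaustive or authoritative list:

```ts
// Sketch: probe encoder support for a few common codecs.
// Codec strings here are illustrative; real configs may need different
// profiles/levels depending on resolution and bitrate.
async function probeCodecSupport(): Promise<void> {
  const videoCodecs = [
    'avc1.42001f', // H264 Baseline
    'hev1.1.6.L93.B0', // H265
    'av01.0.04M.08', // AV1
  ];
  for (const codec of videoCodecs) {
    const { supported } = await VideoEncoder.isConfigSupported({
      codec,
      width: 1280,
      height: 720,
    });
    console.log(codec, supported ? 'supported' : 'not supported');
  }

  const audioCodecs = ['mp4a.40.2' /* AAC-LC */, 'opus'];
  for (const codec of audioCodecs) {
    const { supported } = await AudioEncoder.isConfigSupported({
      codec,
      sampleRate: 48000,
      numberOfChannels: 2,
    });
    console.log(codec, supported ? 'supported' : 'not supported');
  }
}
```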
# Container (Muxing) Formats
Encoded data is compressed raw data; it needs accompanying metadata to be parsed and played back correctly.
Common metadata includes: timing information, codec format, resolution, bitrate, and more.
MP4 is the most common and best-supported video format on the Web platform, which is why our subsequent example programs will work with MP4 files.
An MP4 containing AVC (video codec) and AAC (audio codec) offers the best compatibility.
Other Common Formats
- FLV: flv.js works primarily by remuxing FLV to fMP4, enabling browsers to play FLV videos
- WebM: a royalty-free format, and the output format of MediaRecorder (see the sketch after this list)
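As a point of contrast with WebCodecs, recording to WebM requires no manual encoding at all. A minimal MediaRecorder sketch, assuming a canvas element already exists on the page and is being drawn to:

```ts
// Sketch: record a canvas to a WebM blob with MediaRecorder.
const canvas = document.querySelector('canvas')!;
const stream = canvas.captureStream(30); // capture at 30 fps
const recorder = new MediaRecorder(stream, { mimeType: 'video/webm' });

const chunks: Blob[] = [];
recorder.ondataavailable = (e) => chunks.push(e.data);
recorder.onstop = () => {
  const webm = new Blob(chunks, { type: 'video/webm' });
  console.log('recorded', webm.size, 'bytes of WebM');
};

recorder.start();
setTimeout(() => recorder.stop(), 3000); // stop after ~3 seconds
```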
# WebCodecs Core APIs
WebCodecs operates at the encoding/decoding stage; it is not involved in muxing or demuxing.
The processing stages map onto the WebCodecs APIs as follows:
Video
- Raw image data: VideoFrame
- Image encoder: VideoEncoder
- Compressed image data: EncodedVideoChunk
- Image decoder: VideoDecoder
Data transformation flow:
VideoFrame -> VideoEncoder => EncodedVideoChunk -> VideoDecoder => VideoFrame
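A minimal sketch of this round trip, wiring an encoder's output directly into a decoder; the VP8 codec string and the 640x480 dimensions are illustrative choices:

```ts
// Sketch: VideoFrame -> VideoEncoder => EncodedVideoChunk -> VideoDecoder => VideoFrame
const decoder = new VideoDecoder({
  output: (frame) => {
    // A decoded VideoFrame is ready; use it (e.g., draw it), then release it.
    frame.close();
  },
  error: (e) => console.error(e),
});
decoder.configure({ codec: 'vp8' });

const encoder = new VideoEncoder({
  output: (chunk) => {
    // Each EncodedVideoChunk is compressed data; here it goes straight to the decoder.
    decoder.decode(chunk);
  },
  error: (e) => console.error(e),
});
encoder.configure({ codec: 'vp8', width: 640, height: 480 });

// A VideoFrame can be built from any CanvasImageSource (a canvas assumed to exist here).
const canvas = document.querySelector('canvas')!;
const frame = new VideoFrame(canvas, { timestamp: 0 });
encoder.encode(frame, { keyFrame: true });
frame.close(); // the encoder keeps its own reference; release ours promptly
```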
Audio
- Raw audio data: AudioData
- Audio encoder: AudioEncoder
- Compressed audio data: EncodedAudioChunk
- Audio decoder: AudioDecoder
*Audio data transformation is symmetrical to video: AudioData -> AudioEncoder => EncodedAudioChunk -> AudioDecoder => AudioData*
This symmetry between encoding and decoding, and between audio and video, makes WebCodecs easier to understand and master; it is one of the API's stated design goals:
> Symmetry: have similar patterns for encoding and decoding
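To make the symmetry concrete, an AudioEncoder is set up with exactly the same constructor/configure/encode pattern as a VideoEncoder; only the media-specific config fields change. A minimal sketch, where Opus at 48 kHz stereo is an illustrative choice and the input is one second of silence for simplicity:

```ts
// Sketch: the same pattern as VideoEncoder, with audio-specific config fields.
const audioEncoder = new AudioEncoder({
  output: (chunk) => {
    // chunk is an EncodedAudioChunk (compressed audio data)
    console.log('encoded', chunk.byteLength, 'bytes at t =', chunk.timestamp, 'us');
  },
  error: (e) => console.error(e),
});
audioEncoder.configure({
  codec: 'opus',
  sampleRate: 48000,
  numberOfChannels: 2,
});

// Build one AudioData from raw PCM samples (all zeros = silence).
const numberOfFrames = 48000; // 1 second at 48 kHz
const pcm = new Float32Array(numberOfFrames * 2); // interleaved stereo
const audioData = new AudioData({
  format: 'f32', // interleaved 32-bit float samples
  sampleRate: 48000,
  numberOfFrames,
  numberOfChannels: 2,
  timestamp: 0,
  data: pcm,
});
audioEncoder.encode(audioData);
audioData.close(); // AudioData must be closed, just like VideoFrame
```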
# WebCodecs API Considerations
Here are some common pitfalls that newcomers should be aware of:
- VideoFrames can consume significant GPU memory; close them promptly to avoid performance impact
- VideoDecoder maintains an internal queue; its output VideoFrames must be closed promptly, or the decoder will stop emitting new VideoFrames
- Regularly check encodeQueueSize; if the encoder can't keep up, pause producing new VideoFrames
- Encoders and decoders need to be explicitly closed after use (e.g., VideoEncoder.close), or they might block other codec operations; the sketch after this list ties these points together
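A sketch of these points in combination: every frame is closed as soon as it has been handed to the encoder, production pauses while encodeQueueSize is high, and the encoder is flushed and closed at the end. The nextFrame source is hypothetical; in practice frames might come from a camera track or a canvas:

```ts
// Sketch: backpressure and cleanup around a VideoEncoder.
// `nextFrame` is a hypothetical frame source that returns null when done.
async function encodeAll(
  nextFrame: () => Promise<VideoFrame | null>,
): Promise<void> {
  const encoder = new VideoEncoder({
    output: (chunk) => {
      // Hand each chunk to a muxer or send it over the network.
    },
    error: (e) => console.error(e),
  });
  encoder.configure({ codec: 'vp8', width: 640, height: 480 });

  while (true) {
    // If the encoder can't keep up, wait before producing more frames.
    while (encoder.encodeQueueSize > 2) {
      await new Promise((r) => setTimeout(r, 10));
    }
    const frame = await nextFrame();
    if (frame === null) break;
    encoder.encode(frame);
    frame.close(); // release the memory held by the frame promptly
  }

  await encoder.flush(); // drain any pending output
  encoder.close(); // free the underlying codec so it can't block others
}
```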
# Appendix
- WebAV: an audio & video processing SDK built on WebCodecs
- VideoFrame, AudioData
- VideoEncoder, VideoDecoder
- AudioEncoder, AudioDecoder
- EncodedVideoChunk, EncodedAudioChunk