Back to projects
Apr 21, 2026
4 min read

ESP32 RTP Audio Streamer

ESP32-based RTP audio streamer that captures line-in audio via ES8388 codec and streams it over Wi-Fi UDP to any RTP-capable player.

ESP32 RTP Audio Streamer

Streams line-in audio from an ESP32 AudioKit (ES8388 codec) over Wi-Fi using RTP/UDP. Any RTP-capable player (VLC, ffplay) can receive the stream with no format parameters — the codec is self-describing via RTP payload type 11 (L16 mono 44100 Hz, RFC 3551).

Features

  • Low-latency UDP streaming (no TCP buffering)
  • RTP encapsulation — VLC auto-detects format, no parameters needed
  • Configurable via menuconfig (SSID, port, sample rate, etc.)
  • Targets the DHCP gateway automatically (ideal for mobile hotspot use)
  • Optional fixed destination IP via menuconfig

Hardware

Tested on AI Thinker ESP32 AudioKit v2.2 with ES8388 codec.
Board selected in ADF: LyraT v4.3 (closest compatible profile).

AudioKit v2.2 pin workaround

The AudioKit v2.2 uses different I2C and I2S pins than the stock LyraT v4.3 profile. Edit the following file in ESP-ADF before building:

$ADF_PATH/components/audio_board/lyrat_v4_3/board_pins_config.c

Apply these changes:

FunctionLyraT defaultAudioKit v2.2
I2C SDAGPIO 18GPIO 33
I2C SCLGPIO 23GPIO 32
I2S BCKGPIO 5GPIO 27
/* get_i2c_pins() */
i2c_config->sda_io_num = GPIO_NUM_33;   // was 18
i2c_config->scl_io_num = GPIO_NUM_32;   // was 23

/* get_i2s_pins() */
i2s_config->bck_io_num = GPIO_NUM_27;   // was 5

Keep a backup of the original file before editing — this change affects all projects that use the LyraT v4.3 board profile.

Line-in wiring

The AudioKit v2.2 exposes two line-in jacks wired to LINPUT2 / RINPUT2 on the ES8388. The onboard MEMS microphone shares the same analog path; setting PGA gain to 0 dB makes it nearly inaudible in practice.

Requirements

  • ESP-IDF (tested with v4.4+)
  • ESP-ADF
  • Environment variables IDF_PATH and ADF_PATH set correctly

Build & Flash

idf.py menuconfig   # set Wi-Fi credentials and optional parameters
idf.py build
idf.py flash monitor

Configuration (menuconfig)

MenuKeyDefaultDescription
Network ConfigurationWIFI_SSIDmyssidWi-Fi network name
Network ConfigurationWIFI_PASSWORDmyssidWi-Fi password
Network ConfigurationCLIENT_IP(empty)Fixed destination IP. If empty, the DHCP gateway is used (= phone on hotspot)
Audio SettingsAUDIO_SAMPLE_RATE44100Sample rate in Hz. Must be 44100 for RTP PT11 auto-detection
Audio SettingsAUDIO_CHANNELS11 = mono, 2 = stereo
RTP / Network SettingsUDP_AUDIO_PORT8889Destination UDP port
RTP / Network SettingsUDP_PACKET_SIZE1024PCM bytes per UDP packet (~11.6 ms at 44100 Hz mono)

Receiving the stream

Open a network stream and enter:

rtp://@:8889

VLC auto-detects the format from the RTP header. No additional parameters needed.

ffplay

ffplay -f s16be -ar 44100 -ac 1 udp://0.0.0.0:8889

Note: RTP L16 is big-endian (s16be), not little-endian.

How it works

ES8388 line-in


I2S stream (ESP-ADF)


raw_stream (ring buffer, 2 KB)


RTP sender task
  ├─ byteswap PCM (little-endian → big-endian, required by L16/RFC 3551)
  ├─ prepend 12-byte RTP header (PT=11, seq++, timestamp += samples)
  └─ sendto() → UDP → receiver

The 2 KB ring buffer and the blocking raw_stream_read() act as a natural rate limiter: the sender can never outpace the I2S capture clock.

Project structure

main/
├── main.c            — Wi-Fi init, codec init, pipeline init, app_main
├── rtp_sender.c      — RTP header, byteswap, UDP sender task
├── rtp_sender.h      — public API (rtp_sender_start)
├── CMakeLists.txt
└── Kconfig.projbuild — all configurable parameters

Known limitations

  • Only one receiver at a time (unicast UDP).
  • Changing AUDIO_SAMPLE_RATE away from 44100 Hz breaks RTP PT11 auto-detection; the receiver will need manual format parameters.
  • The onboard microphone cannot be fully disabled in software; PGA gain 0 dB reduces its contribution to an acceptable level.