Free Software Video Streaming with Voctomix

Early in November, I was fortunate to be able to attend and contribute to Linux App Summit in Barcelona. I was able to work with some excellent people and develop our Voctomix-based streaming solution.

This seems like a good opportunity to write a short article outlining our Voctomix configuration, some of the lessons we learned during Linux App Summit, and some of our plans going forward.

Tuxedo Computers

Tuxedo Computers have been very supportive of our efforts towards free software live streaming and have loaned us two of their laptops to use during events and to drive our solution forward.

Traditionally, one of the primary constraints we've had to work around is CPU power, leading us to look into solutions that delegate CPU- and GPU-intensive workloads to post-processing. Having access to these laptops allowed us to live-stream the event.

Thanks to Tuxedo Computers

Linux App Summit

Linux App Summit was a great opportunity to test and develop our current configuration. We’ve found that free software events are great milestones to work towards.

Voctomix

https://github.com/voc/voctomix

Voctomix is a free software tool developed by the team behind the Chaos Communication Congress event. It is a collection of components, written in Python, that coordinate GStreamer pipelines across multiple devices. From a live-streaming point of view, Voctomix provides a video mixer capable of mixing two live cameras and one screen-grabbing input.

Some of the key capabilities of Voctomix that matter to us:

  • Voctomix can receive video streams over a TCP socket
  • The Voctomix control panel can run on a separate device from the video mixer itself, allowing us to limit the blast radius of a failure
  • The interface for sources (audio/video input) and sinks is a TCP socket speaking a common GStreamer-based protocol (a minimal example follows below)
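
Under the hood, that protocol is simply raw video and raw audio wrapped in a Matroska container and pushed over TCP, so anything that can speak GStreamer or FFmpeg can act as a source. As a minimal sketch, a synthetic test source for the mixer's first camera port (10000 in our configuration) looks something like this, with the resolution, framerate and audio rate matching whatever the Voctomix config expects:

gst-launch-1.0 \
  videotestsrc pattern=smpte ! video/x-raw,format=I420,width=1280,height=720,framerate=25/1 ! mux. \
  audiotestsrc ! audio/x-raw,format=S16LE,channels=2,layout=interleaved,rate=48000 ! mux. \
  matroskamux name=mux ! tcpclientsink host=localhost port=10000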

Source Feeds

For Linux App Summit, we decided to configure one camera to track the presenter and another to show a wide view of the conference hall. Alongside the two camera inputs, we also configured an HDMI grabber to capture the output of the presenter's laptop.

  • We used a Canon HDV30 video camera as a zoom camera, capturing its output over FireWire (see the Firewire Input section below)
  • We used a Logitech C920 webcam as a wide camera; the C920 is capable of providing a 720p H264 feed directly
  • We used the sender component of a Lenkeng HDMI extender to capture HDMI output from the presenter's laptop; I'll detail more on this in a future entry

Output Feeds

As for output feeds, we capture the Voctomix output and split it into two feeds:

  • A YouTube feed
  • Timestamped H264 files, split into segments (180 seconds each in our configuration)

RTMP Bus

A common feature of most of our video systems is that we normally capture video or audio on different nodes on the same LAN. While it's possible to send raw video directly to the Voctomix server, the bandwidth required to send uncompressed 1080p video at 30 frames per second is prohibitive on most commodity hardware.
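
To put rough numbers on this: a single uncompressed 1080p frame in I420 is 1920 × 1080 × 1.5 bytes, roughly 3 MB, so at 30 frames per second one camera needs around 93 MB/s, or about 750 Mbit/s, before audio is even considered. A couple of cameras would saturate a gigabit LAN on their own, whereas an H264 stream at a few Mbit/s fits comfortably.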

As we normally don't work with broadcast-quality cameras, we can take a small hit on perceived video quality to solve this problem: we encode video to H264 with a low-CPU preset and send it to a shared RTMP server on the same LAN.

In this configuration, we run a pre-configured Docker container to manage live transport of video from the source hosts.

https://hub.docker.com/r/tiangolo/nginx-rtmp/

This container runs as a service on the same machine as Voctomix to limit the impact on latency.

docker run -d -p 1935:1935 --name nginx-rtmp tiangolo/nginx-rtmp
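
A quick way to sanity-check the bus before plugging real cameras into it is to push a synthetic feed at it and pull it straight back; something like the following, where the "test" stream key is arbitrary:

# Push a 720p test pattern onto the RTMP bus
ffmpeg -re -f lavfi -i testsrc=size=1280x720:rate=30 \
  -c:v libx264 -preset veryfast -pix_fmt yuv420p \
  -f flv rtmp://voctomix.local:1935/live/test

# In another terminal, confirm the stream comes back off the bus
ffplay rtmp://voctomix.local:1935/live/test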

Firewire Input

FireWire as a technology is still impressive by today's standards. We can rely on FireWire as a transport mechanism for 1080i video; it is, however, unlikely that we will find modern hardware with a FireWire port.

For this event, we were using the FireWire output of a Canon HDV30 camera. This was connected to a Lenovo X200 laptop via a FireWire ExpressCard. We take the video from the FireWire input, convert it to H264 using FFmpeg, and send it to the RTMP media bus running on the video mixer.

On the mixer laptop, we can consume the RTMP feed, convert it back to raw video and pass this directly on to the Voctomix Server.

Sender Laptop
ffmpeg \
  -f iec61883 -i auto \
  -c:v libx264 \
  -f flv rtmp://voctomix.local:1935/live/cam1
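
The iec61883 input reads the DV/HDV stream straight off the FireWire bus, and re-encoding with libx264 keeps the traffic on the wire down to a few Mbit/s. On a low-powered sender machine it may also be worth adding -preset veryfast -tune zerolatency to the encoder, trading a little quality for CPU headroom and latency.
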
Receiver Laptop
confdir="`dirname "$0"`/../"
. $confdir/default-config.sh
if [ -f $confdir/config.sh ]; then
  . $confdir/config.sh
fi

while [ 0 ]
do
  ffmpeg -y -nostdin \
    -i rtmp://voctomix.local:1935/live/cam1 \
    -fflags nobuffer -fflags discardcorrupt -flags low_delay \
    -ac 2 \
    -filter_complex "
      [0:v] scale=$WIDTH:$HEIGHT,fps=$FRAMERATE [v] ;
      [0:a] aresample=$AUDIORATE [a]
    " \
    -map "[v]" -map "[a]" \
    -pix_fmt yuv420p \
    -c:v rawvideo \
    -c:a pcm_s16le \
    -f matroska \
    tcp://localhost:10000
done
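
Port 10000 here is the source port we configured in Voctomix for the first camera. The nobuffer and low_delay flags keep the decoder from adding its own buffering, and the surrounding while loop simply restarts FFmpeg if the RTMP feed or the TCP connection to the mixer drops; the same pattern is used by all of the source and sink scripts below.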

C920 Input

As a second camera input, we used a C920 webcam. The C920 is commonly used by game streamers on Twitch and is designed to make sensible assumptions about lighting and focus, maximising quality in the majority of cases.
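
Before relying on the camera's onboard encoder, it's worth confirming that the device actually advertises an H264 pixel format; a quick check with v4l2-ctl (from v4l-utils) looks something like this, substituting whichever /dev/video node the camera appears as:

# List the pixel formats the webcam offers; H264 should appear alongside YUYV / MJPG
v4l2-ctl -d /dev/video0 --list-formats

The capture script we used is below; it locates the device automatically and decodes the camera's H264 output back to raw video for Voctomix.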

confdir="`dirname "$0"`/../"
. $confdir/default-config.sh
if [ -f $confdir/config.sh ]; then
  . $confdir/config.sh
fi

DEVICE=$(v4l2-ctl --list-devices | grep -C1 C920 | tail -n1 | awk '{print $1}')

while [ 0 ]
do
  gst-launch-1.0 \
    v4l2src device=$DEVICE ! video/x-h264,width=1280,height=720,framerate=30/1 ! h264parse ! \
    avdec_h264 ! videoconvert ! videorate ! videoscale ! \
    video/x-raw,format=I420,width=$WIDTH,height=$HEIGHT,framerate=$FRAMERATE/1,pixel-aspect-ratio=1/1 ! \
    queue min-threshold-time=3000000000 max-size-time=30000000000 \
      max-size-bytes=0 max-size-buffers=0 ! \
    mux. \
    \
    audiotestsrc freq=550 ! \
    audio/x-raw,format=S16LE,channels=2,layout=interleaved,rate=$AUDIORATE ! \
    mux. \
    \
    matroskamux name=mux ! \
    tcpclientsink host=localhost port=10001
done
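
Note that the audiotestsrc element here generates a 550 Hz tone rather than capturing real audio; it simply keeps an audio track present in the Matroska stream that Voctomix expects from each source. Swapping it for an alsasrc or pulsesrc element would be one way to capture real audio on the same host.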

Output Feeds

We can consume the Voctomix output stream in the same way that we ingest video feeds: using either FFmpeg or GStreamer to read the feed, process it, and then either stream it to another location or save it to a local file. In this case, we worked with two consumers.

Streaming to YouTube
while [ 0 ]
do
  ffmpeg -y -nostdin \
    -i tcp://localhost:15000 \
    -threads:0 0 \
    -aspect 16:9 \
    -c:v libx264 \
    -filter_complex '
      [0:v] yadif=mode=2, hqdn3d [deinter];
      movie=/etc/voctomix/overlay.png [logo];
      [deinter] [logo] overlay=0:0 [out]
    ' \
    -map '[out]' \
    -maxrate:v:0 10M -bufsize:v:0 8192k -crf:0 21 \
    -pix_fmt:0 yuv420p -profile:v:0 main -g:v:0 25 \
    -preset:v:0 veryfast \
    \
    -ac 1 -c:a aac -b:a 96k -ar 44100 \
    -map 0:a -filter:a:0 pan='mono|c0=FL' \
    -ac:a:2 2 \
    \
    -y -f flv rtmp://a.rtmp.youtube.com/live2/$(cat /etc/voctomix/youtube.txt)
done
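
This consumer deinterlaces the mixer output with yadif, runs a light denoise with hqdn3d, and overlays the logo stored in /etc/voctomix/overlay.png before encoding for YouTube. The stream key is read from /etc/voctomix/youtube.txt at runtime rather than being hard-coded, so the same script can be reused from event to event.
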
Saving to Local File
while [ 0 ]
do
  ffmpeg \
    -y -nostdin \
    -i tcp://localhost:11000 \
    -flags +global_header -aspect 16:9 \
    -f segment -segment_time 180 -strftime 1 \
    /store/linux-app-summit/segment-%Y-%m-%d_%H-%M-%S.mp4
done
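
Because the Voctomix sink emits raw video, this consumer has to re-encode on the way to disk; the segment muxer then starts a new timestamped file every 180 seconds, so a crash or disk problem costs at most a few minutes of footage and the recording loop simply carries on with the next segment.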

Next Steps

Following the event, we're focusing on three primary challenges:

Improving Audio Capture

We had some challenges around audio input, so we're going to try a few strategies to improve audio capture across multiple devices. We would traditionally deploy several boundary microphones as a backup; I'm interested in finding a more practical way to use these on a live stream.

Improving how we manage and deploy camera modules

Voctomix has been built following a microservice-style approach, which gives us some flexibility around the deployment of components. I would like to apply some DevOps principles to how we develop our video pipeline, using tools such as Ansible to deploy changes across each device. It would also be sensible to build system packages for some of our components.

Automation in Post Processing

Post-processing is currently a very manual process. This year, we looked into storing all mixer events in a time-series database while retaining all raw footage for later processing. I’m planning to investigate how we can automatically generate a Kdenlive file from this data, allowing us to “Re-Master” our live events.