Video Mixing on the Cloud with OpenVPN and OBS

In the search for more flexibility when trying to record talks during conferences, we found that there was one constant constraint: CPU (and GPU) power. While it’s possible to simply buy more hardware, we found that this is often expensive, impractical and, depending on your use case, hard to justify.

We realised this was not a new problem. Cloud vendors have been selling this kind of flexibility for years, so we decided to experiment with using the public cloud to do the heavy lifting in live video production.

Video Capture

We care about four types of primary media:

  • (Video) Camera Input
  • Audio Capture
  • Presenter Screen Capture
  • Supporting Media (background images, etc)

Camera Input

For camera input we mainly use Raspberry Pis with camera modules. These Raspberry Pis are configured as RTMP servers that OBS can connect to.
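
The exact pipeline isn’t shown here, but as a minimal sketch, a Pi could publish its camera module to a local RTMP server (such as the nginx-rtmp container described later) like this; the resolution, bitrate and stream name are assumptions:

# Capture H264 directly from the Pi camera module and publish it to the
# local RTMP server without re-encoding (rates and names are illustrative).
raspivid -n -t 0 -w 1920 -h 1080 -fps 30 -b 2000000 -o - \
  | ffmpeg -re -f h264 -framerate 30 -i - \
      -c:v copy -an -f flv rtmp://localhost:1935/live/cam1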

Audio Capture

For audio capture we generate another video feed with a static background. This lets us reuse the same RTMP server setup that we use for the cameras, but primarily to carry the audio from a USB sound device.
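
A hedged sketch of such a feed, assuming an ALSA USB capture device; the device name, image path and bitrates are illustrative:

# Loop a static image as the video track and capture audio from the USB
# sound device, then publish both to the local RTMP server.
ffmpeg -loop 1 -framerate 5 -i /opt/capture/background.png \
  -f alsa -i hw:CARD=USB,DEV=0 \
  -c:v libx264 -preset ultrafast -tune stillimage -pix_fmt yuv420p \
  -c:a aac -b:a 192k \
  -f flv rtmp://localhost:1935/live/audio1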

Presenter Screen Capture

For presenter screens, we use a Lenkeng HDMI extender; again, these are connected to a Raspberry Pi over Ethernet.

Bitrates

In terms of bitrates, we tend to send about 2 Mbps for each video feed. Audio feeds can run at around 192 kbps, given that the video element in these feeds is a static image.

We do, however, store raw media on a storage device connected locally to the Raspberry Pi. This allows us to use higher-quality assets for the final edit.

Supporting Media

Interstitials, lower thirds and background media are stored on the machine running OBS.

Cloud Hosted OBS

OBS is flexible in what it can do and what it allows us to do …

Remote Desktop on Linux isn’t quite as good as you might assume. It took some testing, but eventually we found a good pairing of Xrdp and Remmina.

Xrdp is an RDP layer on top of VNC, allowing us to take advantage of better compression and display state management over a network.
Remmina is a fairly modern Remote Desktop client with support for several Remote Desktop protocols.
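
On a Debian-based cloud instance, the server side amounts to little more than the following sketch; Remmina then connects to port 3389 from the operator’s machine:

# Install and enable the RDP layer; xrdp bridges RDP clients to a local
# X session on the cloud instance.
sudo apt install xrdp
sudo systemctl enable --now xrdp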

Networking

With all capture devices configured to connect to a central OpenVPN server, we can configure the AWS VPC routing tables to route traffic for the private OpenVPN subnet through the VPN server. This allows our cloud OBS instance to connect to the camera devices transparently.
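
A rough sketch of the AWS side, assuming a 10.8.0.0/24 VPN subnet and an OpenVPN server running on an EC2 instance (all IDs are placeholders):

# Allow the instance to forward traffic it doesn't own, then route the
# VPN subnet through it.
aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 --no-source-dest-check
aws ec2 create-route \
  --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 10.8.0.0/24 \
  --instance-id i-0123456789abcdef0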

Control Hardware

While RDP is reasonable for overall control of OBS, it can be a frustrating experience over a low-bitrate connection. Instead, we configured some local hardware to provide this level of control.

A lightweight MIDI to WebSocket bridge allows us to control OBS using a MIDI controller. We can use push-buttons for scene control, while using dials to manage volume and pan/tilt/zoom.
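
The bridge itself isn’t shown here, but as a rough sketch of the idea, assuming obs-websocket v4 listening on port 4444 with authentication disabled, and using aseqdump (alsa-utils) and websocat; the controller port name, note numbers and scene names are illustrative:

# Watch MIDI events from the controller and translate note presses into
# obs-websocket scene-change requests.
aseqdump -p "nanoKONTROL" | while read -r line; do
  case "$line" in
    *"Note on"*"note 36,"*)
      echo '{"request-type":"SetCurrentScene","scene-name":"Camera","message-id":"1"}' \
        | websocat ws://localhost:4444 ;;
    *"Note on"*"note 37,"*)
      echo '{"request-type":"SetCurrentScene","scene-name":"Slides","message-id":"2"}' \
        | websocat ws://localhost:4444 ;;
  esac
done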

For video feed monitoring, we use relatively inexpensive 10” 1080p monitors connected to an HDMI switch. This provides real-time feedback on whether our cameras are configured appropriately.

Monitoring

For stream monitoring, we are able to take advantage of the network connectivity on AWS. Using the Telegraf / InfluxDB / Grafana (TIG) stack, we can build a lightweight dashboard for system health.
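
The deployment itself isn’t covered here; one simple way to stand the stack up is with the official images (a sketch, assuming a Telegraf configuration file exists locally):

# Time-series database, metrics collector and dashboard as containers.
docker run -d --name influxdb -p 8086:8086 influxdb:1.8
docker run -d --name telegraf \
  -v "$PWD/telegraf.conf:/etc/telegraf/telegraf.conf:ro" telegraf
docker run -d --name grafana -p 3000:3000 grafana/grafana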

As I’ve alluded to in previous talks, we can also use InfluxDB to store business events. Every time an operator presses a button to change scene, and each time a new raw file is created, an event is stored in a custom database in InfluxDB. Exporting these to CSV is super helpful in the post-processing phase.
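
As an illustration of the kind of write involved (InfluxDB 1.x HTTP API; the host, database, tags and fields are assumptions):

# Record a scene-change event using the line protocol, with a
# nanosecond timestamp.
curl -s -XPOST 'http://monitor.local:8086/write?db=events' \
  --data-binary "scene_change,operator=desk1 scene=\"wide\" $(date +%s%N)"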

Free Software Video Streaming with Voctomix

Early in November, I was fortunate to be able to contribute to and attend Linux App Summit in Barcelona. I was able to work with some excellent people and develop our Voctomix-based streaming solution.

This is a great opportunity to write a small article outlining our Voctomix configuration, some of the lessons we learned during Linux App Summit, and some of the plans that we have going forward.

Tuxedo Computers

Tuxedo Computers have been very supportive of our efforts towards free software live streaming and, as such, have loaned us two of their laptops to use during events and drive our solution forward.

Traditionally, one of the primary constraints that we’ve worked around is CPU power, leading us to look into solutions that allow us to delegate CPU- and GPU-intensive workloads to post-processing. Having access to these laptops allowed us to live-stream the event.

Thanks to Tuxedo Computers

Linux App Summit

Linux App Summit was a great opportunity to test and develop our current configuration. We’ve found that free software events are great milestones to work towards.

Voctomix

https://github.com/voc/voctomix

Voctomix is a free software tool developed by the team behind the Chaos Communication Congress event. It is a collection of components, written in Python, that coordinate GStreamer pipelines across multiple devices. From a live streaming point of view, Voctomix provides a video mixer capable of mixing two live cameras and one screen-grabbing input.

Some of the key capabilities of Voctomix that matter to us:

  • Voctomix can receive video streams over a TCP Socket
  • The Voctomix control panel can run on a separate device from the video mixer itself, allowing us to limit the blast radius of a failure
  • The interface for Sources (Audio / Video input) and Sinks are TCP Sockets with a common GStreamer based protocol.
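
To illustrate the last point, here is a minimal sketch of pushing a synthetic source into a Voctomix ingest port with GStreamer; the port, resolution and rates are assumptions that must match the mixer configuration:

# Raw I420 video plus S16LE audio, muxed into Matroska and sent to the
# mixer's first source port.
gst-launch-1.0 \
  videotestsrc ! video/x-raw,format=I420,width=1280,height=720,framerate=25/1 ! mux. \
  audiotestsrc wave=silence ! audio/x-raw,format=S16LE,layout=interleaved,channels=2,rate=48000 ! mux. \
  matroskamux name=mux ! tcpclientsink host=localhost port=10000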

Source Feeds

For Linux App Summit, we decided to configure one camera to track the presenter and another to show a wide view of the conference hall. Alongside the two camera inputs, we also configured an HDMI grabber to capture the output of the presenter’s laptop.

  • We used a Canon HDV30 video camera as a zoom camera, capturing its output over Firewire.
  • We used a Logitech C920 webcam as a wide camera. The C920 is capable of providing a 720p H264 feed.
  • We used the sender component of a Lenkeng HDMI extender to capture HDMI output from the presenter’s laptop; I’ll detail more on this in a future entry.

Output Feeds

As for output feeds, we capture the Voctomix output and split it into two feeds:

  • A YouTube feed
  • Timestamped H264 files, split into segments

RTMP Bus

A common feature of most of our video systems is that we normally capture video and audio on different nodes on the same LAN. While it’s possible to send raw video directly to the Voctomix server, the bandwidth required for raw 1080p video at 30 frames per second is prohibitive on most commodity hardware.

As we normally don’t work with broadcast-quality cameras, we can take a small hit on perceived video quality to solve this problem: we encode video to H264 with a low-CPU preset and send it to a shared RTMP server on the same LAN.
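
A sketch of such an encode, assuming a V4L2 source; the device and stream name are placeholders:

# Encode with a fast preset to keep CPU headroom on the capture node,
# then push to the shared RTMP server.
ffmpeg -f v4l2 -i /dev/video0 \
  -c:v libx264 -preset ultrafast -tune zerolatency -b:v 2M \
  -an -f flv rtmp://voctomix.local:1935/live/cam2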

In this configuration, we run a pre-configured Docker container to manage live transport of video from the source hosts.

https://hub.docker.com/r/tiangolo/nginx-rtmp/

This container is run as a service on the same machine running Voctomix to limit the impact on latency.

docker run -d -p 1935:1935 --name nginx-rtmp tiangolo/nginx-rtmp

Firewire Input

Firewire as a technology is still impressive by today’s standards. We can rely on it as a transport mechanism for 1080i video; it is, however, unlikely that we will find modern hardware with a Firewire port.

For this event, we used the Firewire output of a Canon HDV30 camera, connected to a Lenovo X200 laptop with a Firewire ExpressCard. We take the video from the Firewire input, convert it to H264 using FFmpeg, and send it to the RTMP media bus running on the video mixer.

On the mixer laptop, we can consume the RTMP feed, convert it back to raw video and pass this directly on to the Voctomix Server.

Sender Laptop
# Grab HDV over Firewire and forward it to the RTMP bus as H264
ffmpeg \
  -f iec61883 -i auto \
  -c:v libx264 \
  -f flv rtmp://voctomix.local:1935/live/cam1
Receiver Laptop
# Load the shared configuration (WIDTH, HEIGHT, FRAMERATE, AUDIORATE)
confdir="`dirname "$0"`/../"
. $confdir/default-config.sh
if [ -f $confdir/config.sh ]; then
  . $confdir/config.sh
fi

# Pull the RTMP feed, decode it back to raw video/audio and hand it to
# the Voctomix server over TCP; loop so the feed reconnects on failure.
while true
do
  ffmpeg -y -nostdin \
    -i rtmp://voctomix.local:1935/live/cam1 \
    -fflags nobuffer+discardcorrupt -flags low_delay \
    -ac 2 \
    -filter_complex "
      [0:v] scale=$WIDTH:$HEIGHT,fps=$FRAMERATE [v] ;
      [0:a] aresample=$AUDIORATE [a]
    " \
    -map "[v]" -map "[a]" \
    -pix_fmt yuv420p \
    -c:v rawvideo \
    -c:a pcm_s16le \
    -f matroska \
    tcp://localhost:10000
done

C920 Input

As a second camera input, we used a C920 webcam. The C920 is commonly used by game streamers on Twitch and is designed to make sensible assumptions about lighting and focus, maximising quality in the majority of cases.

# Load the shared configuration (WIDTH, HEIGHT, FRAMERATE, AUDIORATE)
confdir="`dirname "$0"`/../"
. $confdir/default-config.sh
if [ -f $confdir/config.sh ]; then
  . $confdir/config.sh
fi

# Find the V4L2 device node belonging to the C920
DEVICE=$(v4l2-ctl --list-devices | grep -C1 C920 | tail -n1 | awk '{print $1}')

# Take the camera's H264 feed, decode and rescale it to the mixer format,
# add a synthetic audio track so the stream carries audio, and send the
# muxed result to the Voctomix source port.
while true
do
  gst-launch-1.0 \
    v4l2src device=$DEVICE ! video/x-h264,width=1280,height=720,framerate=30/1 ! h264parse ! \
    avdec_h264 ! videoconvert ! videorate ! videoscale ! \
    video/x-raw,format=I420,width=$WIDTH,height=$HEIGHT,framerate=$FRAMERATE/1,pixel-aspect-ratio=1/1 ! \
    queue min-threshold-time=3000000000 max-size-time=30000000000 \
      max-size-bytes=0 max-size-buffers=0 ! \
    mux. \
    \
    audiotestsrc freq=550 ! \
    audio/x-raw,format=S16LE,channels=2,layout=interleaved,rate=$AUDIORATE ! \
    mux. \
    \
    matroskamux name=mux ! \
    tcpclientsink host=localhost port=10001
done

Output Feeds

Similar to the way we ingest video feeds into Voctomix, we can consume its output stream in the same way. We can use either FFmpeg or GStreamer to consume these feeds, process them, and then either stream to another location or save to a local file. In this case, we worked with two consumers.

Streaming to YouTube
# Deinterlace and denoise the mixer output, overlay the event logo, then
# push to YouTube; the stream key is read from a local file.
while true
do
  ffmpeg -y -nostdin \
    -i tcp://localhost:15000 \
    -threads:0 0 \
    -aspect 16:9 \
    -c:v libx264 \
    -filter_complex '
      [0:v] yadif=mode=2, hqdn3d [deinter];
      movie=/etc/voctomix/overlay.png [logo];
      [deinter] [logo] overlay=0:0 [out]
    ' \
    -map '[out]' \
    -maxrate:v:0 10M -bufsize:v:0 8192k -crf:0 21 \
    -pix_fmt:0 yuv420p -profile:v:0 main -g:v:0 25 \
    -preset:v:0 veryfast \
    \
    -ac 1 -c:a aac -b:a 96k -ar 44100 \
    -map 0:a -filter:a:0 pan='mono|c0=FL' \
    \
    -f flv rtmp://a.rtmp.youtube.com/live2/$(cat /etc/voctomix/youtube.txt)
done
Saving to Local File
# Re-encode the mixer output into timestamped MP4 segments for the
# final edit.
while true
do
  ffmpeg \
    -y -nostdin \
    -i tcp://localhost:11000 \
    -flags +global_header -aspect 16:9 \
    -f segment -segment_time 180 -strftime 1 \
    /store/linux-app-summit/segment-%Y-%m-%d_%H-%M-%S.mp4
done

Next Steps

Following the event, we’re able to focus on three primary challenges:

Improving Audio Capture

We had some challenges around audio input, so we’re going to try a few strategies to improve audio capture across multiple devices. We would traditionally deploy several boundary microphones as a backup; I’m interested in finding a more practical way to utilise these on a live stream.

Improving how we manage and deploy camera modules

Voctomix has been built following a microservice-style approach, which gives us some flexibility around the deployment of components. I would like to apply some DevOps principles to how we develop our video pipeline, using tools such as Ansible to deploy changes across each device. It would also be sensible to build system packages for some of our components.
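
As a sketch of what that could look like with ad-hoc Ansible commands (the inventory group, paths and unit name are hypothetical):

# Push an updated capture script to every camera node and restart its
# service.
ansible cameras -m copy \
  -a "src=scripts/cam-source.sh dest=/opt/av/cam-source.sh mode=0755"
ansible cameras -b -m systemd -a "name=cam-source state=restarted"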

Automation in Post Processing

Post-processing is currently a very manual process. This year, we looked into storing all mixer events in a time-series database while retaining all raw footage for later processing. I’m planning to investigate how we can automatically generate a Kdenlive file from this data, allowing us to “Re-Master” our live events.
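
As a sketch of the first step, the mixer events can be pulled back out of InfluxDB as CSV (flags per the InfluxDB 1.x CLI); the generator script at the end is hypothetical:

# Export the scene-change events for the event window as CSV ...
influx -database events -format csv \
  -execute "SELECT * FROM scene_change WHERE time > now() - 7d" \
  > scene_changes.csv
# ... then feed them, with the raw footage, to a yet-to-be-written
# project generator.
./generate-kdenlive.py scene_changes.csv raw-footage/ > remaster.kdenlive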

Linux App Summit

Now that I’m on the other side of some fairly intense travelling over the last six weeks, I finally have a chance to write something up.

Back in November, I was fortunate to be involved in the Linux App Summit conference in Barcelona. Linux App Summit was a joint venture between multiple communities, including KDE and GNOME.

LAS was a great opportunity to use the event infrastructure components that we’ve been building for KDE over the past few years. Now that we’ve been able to test both Frab and our event registration tooling with more communities, we’ve been able to refine our requirements for further development in the coming months.

  • In both cases, we wish it were possible to use a third-party authentication provider.
  • In both cases, we found limitations in our ability to apply custom styling and layout components to the user-facing front-end. We’re now looking for a sustainable way to do this.
  • We’re looking at the user experience of submitting talks and events to Frab. Frab works great for large-scale events, but we might not need that much flexibility for events such as LAS.

LAS was run as a single-track conference with breakout sessions each day. I was really impressed with the content structure and themes of each day.

During the event, I spent some time working on the video recording of the talks. Although we had some issues with the audio feed, we had a great opportunity to test out a Voctomix-based streaming configuration. We currently have a couple of laptops on loan from Tuxedo Computers; I’ll be publishing an entry on this with more details next week.

Thanks as always to KDE e.V. for giving me the opportunity to work on really fun projects like LAS, always giving me more scope to improve our event infrastructure and work with amazing people.