Making a Matrix server STUN-enabled.

Jan 18 2020

Previously in this series I showed you how to migrate a Matrix server to use Postgres, a database server designed for heavy workloads, such as those of a busy chat server.  This time around I'll demonstrate how to integrate Synapse with a STUN/TURN server to make the voice and video conferencing features of the Matrix network more reliable.  It's remarkably easy to do but it does take a little planning.  Here's why I recommend doing this:

If you are reading this, chances are you're behind a NATting firewall, which means that your device doesn't have a publicly routable IP address.  In addition to rewriting all of your network traffic so that it doesn't look like it's coming from a private network, the firewall is also doing port forwarding to pass inbound traffic to your device (not least of all replies from web servers), again so it doesn't look like you're behind a firewall.  This works just ducky with TCP traffic because TCP sets up bidirectional connections; TCP packets are acknowledged every time, which has the additional effect of letting the firewall keep the connection together.  VoIP traffic, on the other hand, tends to use UDP, which is not connection-oriented.  One way to look at UDP is as a fire-and-forget protocol: The packet gets launched toward its destination, and it may or may not arrive depending upon network core weather patterns, luck, the phase of the moon... packets may also arrive out of order.  It's an inherently unreliable protocol.  The flip side is that nothing waits around for acknowledgements or retransmissions, which keeps latency low, and that's what makes it useful for streaming traffic like audio or video.  If you've ever been on a call and heard it break up or go into robot mode (or for that matter, seen a television program glitch out) a few lost packets are probably what happened.  The occasional glitchout is the price you pay for a relatively snappy data stream.

The other problem is that firewalls tend to not treat UDP traffic very nicely.  UDP isn't connection-oriented (it's a paper airplane instead of two tin cans and a string) so the firewall can't always figure out how to pass return traffic back to you.  Sometimes it can, sometimes it can't and you'll only get half a call.  Sometimes it works but a little extra network engineering is involved, which most people really can't be bothered to do at home.  I know I don't anymore because it's a huge pain in the ass.  But there are a couple of things that can spare you the extra fiddly crap.  One of these is the service you're using incorporating a STUN server (Session Traversal Utilities for NAT), which helps your client software figure out its external IP address, the kind of NAT it's stuck behind, and the UDP port its outbound traffic is coming from.  A related technique is called TURN (Traversal Using Relays around NAT), which basically does the heavy lifting of relaying that multimedia traffic back to you when a direct connection can't be made.  It's weird, it's fiddly, and you probably don't know you're using it because this functionality can be built into whatever chat software you're using.  There are even public STUN/TURN servers out there that you can use if you really need to.
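
For the curious, here's a rough sketch in Python of what a STUN exchange looks like on the wire: the client sends a tiny Binding request and the server replies with the address and port it saw the request come from.  The function names are mine, it only handles IPv4, and it skips everything a real client does (retries, authentication, fingerprints), so treat it as an illustration of the idea and not a usable client.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value present in every STUN message
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020    # attribute carrying our address as the server saw it


def build_binding_request(transaction_id=None):
    """Return (packet, transaction_id) for a Binding request with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    # Header: message type, attribute length (0), magic cookie, then the ID.
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + transaction_id, transaction_id


def parse_binding_response(packet, transaction_id):
    """Return (ip, port) from XOR-MAPPED-ADDRESS, or None if it isn't there."""
    if len(packet) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", packet[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    if packet[8:20] != transaction_id:
        return None
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", packet[offset:offset + 4])
        value = packet[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and attr_len >= 8 and value[1] == 0x01:
            # Port and address are XORed with the magic cookie on the wire.
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        # Attributes are padded out to a multiple of four bytes.
        offset += 4 + attr_len + (-attr_len % 4)
    return None


def query_stun(host, port=3478, timeout=2.0):
    """Ask a STUN server over UDP what address it sees us coming from."""
    request, txid = build_binding_request()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (host, port))
        reply, _ = sock.recvfrom(2048)
    return parse_binding_response(reply, txid)
```

Pointing query_stun() at a STUN server should hand back your public IP address and the UDP port your firewall picked for the outbound packet, which is exactly the information a VoIP client needs to tell the other end where to send media.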

The optimal technique, of course, is to bake it into your service so that nobody has to worry about it.  That's what we're about to do.

There is an open source STUN/TURN server called coturn which is part of the default Ubuntu package repository.  Installing it is about as easy as it gets:

drwho@jackpoint:~(3) $ sudo apt-get update
...
drwho@jackpoint:~(3) $ sudo apt-get install -y coturn

You don't need to enable a whole lot in the /etc/turnserver.conf file to get it running, so here's what I had to change:

# TCP and UDP ports to listen on.
listening-port=3478

# TCP and UDP ports to listen for TLS traffic on.
tls-listening-port=5349

# IP addresses to listen on.  You need at least one, but multiple are supported.
listening-ip=your.servers.ip.address
listening-ip=your.servers.other.ip

# Verbose logs, please.
verbose

# Use fingerprints in TURN messages.
fingerprint

# Use long-term credentials (needed for server integration).
lt-cred-mech

# Turn on secret-based authentication.
use-auth-secret

# Static secret to use for authentication.
static-auth-secret=LoveSexSecretGod

# The default realm to handle traffic for.  This is needed both for the API
# and because we're not configuring a database for this service.
realm=name.of.your.server.example.com

# Log to syslog, because systemd sucks.
syslog

(Incidentally, here's how I extracted only the config options: grep -v '^#' /etc/turnserver.conf | grep -v '^\W*$' | grep -v '^\w$')

Edit the /etc/default/coturn file so that it looks like this:

#
# Uncomment it if you want to have the turnserver running as 
# an automatic system service daemon
#
TURNSERVER_ENABLED=1

I have no idea why the packagers thought this was a good idea.

Now let's fire it up:

drwho@jackpoint:~(3) $ sudo systemctl enable coturn.service
drwho@jackpoint:~(3) $ sudo systemctl start coturn.service

One more thing: If you have a firewall running on your box, you need to make coturn accessible from outside.  I forgot to do that and spent an hour troubleshooting it.

drwho@jackpoint:~(3) $ sudo ufw allow Turnserver

(No, I don't know why the package is called coturn, but everything refers to turnserver.)
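
If you want to save yourself the hour I lost, here's a minimal Python sketch for checking from a machine outside your network whether coturn's TCP listener is reachable.  The function name is made up for this example, and the ports are the ones from the config above.  It only tests TCP; a UDP port can't be probed with a plain connect, so a passing check here doesn't prove that UDP is getting through.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False
```

Calling tcp_port_open("name.of.your.server.example.com", 3478) and then again with port 5349 covers both listeners.  A False on both usually means the firewall is still in the way or coturn isn't running.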

Now we reconfigure Synapse a bit by editing the /home/matrix/homeserver.yaml config file.  I'll only list the stuff I had to add or change in that file below:

...
# URLs of the STUN/TURN server to use.  This is a YAML list, so it can wrap across lines.
turn_uris: [ "turn:name.of.your.server.example.com:3478?transport=tcp",
    "turn:name.of.your.server.example.com:3478?transport=udp" ]

# Shared authentication secret. Matches /etc/turnserver.conf.
turn_shared_secret: "LoveSexSecretGod"

# How long a set of TURN credentials stays valid, in ms (this is 24 hours).
turn_user_lifetime: 86400000

# Are guests allowed to use the TURN server?
# If this is 'false', reliability of VoIP tanks.
turn_allow_guests: true
...

And restart Synapse:

drwho@jackpoint:/home/matrix(3) $ sudo systemctl restart matrix-synapse.service

Any logged-in users will probably only notice a one-second blip when you do this.  The upshot, however, will be Matrix's VoIP functionality working much more reliably.
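
If you're wondering what the shared secret is actually for: as I understand it, Synapse never hands the secret to clients.  Instead it mints short-lived usernames and passwords from it, and coturn (with use-auth-secret turned on) can check them by redoing the same calculation.  Here's a sketch of that scheme in Python.  The function name and arguments are my own, so take it as an illustration of the idea and not a copy of Synapse's code.

```python
import base64
import hashlib
import hmac
import time


def turn_credentials(shared_secret, user_id, lifetime_seconds, now=None):
    """Derive a time-limited TURN username and password from a shared secret."""
    if now is None:
        now = time.time()
    # The username embeds the expiry time (Unix seconds) so the TURN server
    # can reject stale credentials without keeping any state of its own.
    expiry = int(now) + lifetime_seconds
    username = f"{expiry}:{user_id}"
    # The password is an HMAC-SHA1 of the username, keyed with the secret.
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    password = base64.b64encode(digest).decode("ascii")
    return username, password
```

This is why the secret in homeserver.yaml has to match the one in /etc/turnserver.conf exactly: if they differ, every password Synapse generates will fail coturn's check and calls will quietly fall back to hoping the NAT cooperates.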