mattbutterfield.com

Building video chat into my personal website using WebRTC, Websockets, and Golang on GCP.

2021-05-02

Recently I've become more and more interested in how WebRTC works, so a few weeks ago I decided to add a page to my website where I could set up peer-to-peer video chats. There are plenty of libraries and services that make this extremely easy, but since WebRTC is now well supported natively across browsers and devices, I wanted to try doing everything myself with minimal dependencies. It turned out to be a fun project. Simple examples and explanations were hard to find, so I hope to provide them here.

Frontend

I was able to do everything I needed on the frontend with vanilla JavaScript and HTML. The only HTML required is a pair of <video> elements to display the local and remote video streams:

    
<video id="local_video" autoplay controls muted playsinline></video>
<video id="remote_video" autoplay controls playsinline></video>
    
  

I needed to identify who the current user is and who they want to talk to. I don't have users or cookies or anything like that on my website, so I'm just using URL query parameters and generating a unique link for each 'user'. Below, peer1 can visit the first link to talk to peer2, and vice versa:

     
https://mattbutterfield.com/video?userID=peer1&peerID=peer2
https://mattbutterfield.com/video?userID=peer2&peerID=peer1
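
As a rough sketch of how such a pair of links could be generated and read back, assuming the standard URL and URLSearchParams APIs. The names buildChatLinks and parseChatParams are hypothetical helpers for illustration, not part of the site's code:

```javascript
// Hypothetical helper: build the two mirrored links for a pair of IDs.
function buildChatLinks(baseUrl, idA, idB) {
  const linkFor = (userID, peerID) => {
    const url = new URL(baseUrl);
    // URLSearchParams percent-encodes the values for us.
    url.search = new URLSearchParams({ userID, peerID }).toString();
    return url.toString();
  };
  return [linkFor(idA, idB), linkFor(idB, idA)];
}

// Hypothetical helper: read the two IDs back out of a query string
// such as window.location.search.
function parseChatParams(search) {
  const params = new URLSearchParams(search);
  return { userID: params.get('userID'), peerID: params.get('peerID') };
}
```

Calling buildChatLinks('https://mattbutterfield.com/video', 'peer1', 'peer2') would produce the two links shown above. Going through URLSearchParams rather than string concatenation means IDs containing spaces or '&' are encoded safely.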
     
  

On to the JavaScript!

A peer-to-peer connection is the goal here, but to get there, the two users first need some way to tell each other where they are on the web and what kind of data they will be sending. I'll explain the backend implementation later, but this websocket allows that initial communication to happen via my backend server:

    
let ws = new WebSocket("wss://" + window.location.host + '/video/connections' + window.location.search);
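
The URL above hardcodes wss://, which fits a site served over HTTPS. If the page might also be served over plain HTTP (during local development, for example), one option is to derive the scheme from the page's own protocol. A minimal sketch: signalingUrl is a hypothetical helper, and it takes a location-like object so it can be exercised outside a browser:

```javascript
// Hypothetical helper: pick ws: or wss: to match the page's protocol.
// `loc` is anything shaped like window.location (protocol, host, search).
function signalingUrl(loc) {
  const scheme = loc.protocol === 'https:' ? 'wss:' : 'ws:';
  return `${scheme}//${loc.host}/video/connections${loc.search}`;
}
```

In the browser this could be used as new WebSocket(signalingUrl(window.location)).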
    
  

This initial communication consists of messages that are created and handled by an RTCPeerConnection object:

    
let peerConnection = new RTCPeerConnection({"iceServers": [{"urls": "stun:stun.l.google.com:19302"}]});
    
  

Take a look at this if you want to get a better idea of what is actually going on with the connection that will be set up and the format of the messages being exchanged. There are three types of messages we care about: offers, answers, and ICE (Interactive Connectivity Establishment) candidates. Offer and answer messages mostly contain information about the media stream, and ICE candidate messages are about how to establish the actual peer connection across the web.
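
To make the three message types more concrete, here is a small sketch that summarizes a parsed message. It assumes the JSON shapes used in this post: offers and answers serialize to an object with type and sdp fields, and candidates are wrapped as {type: 'candidate', ice: ...}. The function summarizeSignal is a hypothetical logging helper, not something WebRTC provides:

```javascript
// Hypothetical helper: turn a parsed signaling message into a short summary.
function summarizeSignal(message) {
  switch (message.type) {
    case 'offer':
    case 'answer': {
      // Each "m=" line in the SDP describes one media section (audio, video, ...).
      const mediaKinds = message.sdp
        .split(/\r?\n/)
        .filter(line => line.startsWith('m='))
        .map(line => line.slice(2).split(' ')[0]);
      return `${message.type} describing media: ${mediaKinds.join(', ')}`;
    }
    case 'candidate':
      // The candidate string carries an address, port, and transport to try.
      return `ICE candidate: ${message.ice.candidate}`;
    default:
      return `unknown message type: ${message.type}`;
  }
}
```

Logging summarizeSignal(JSON.parse(evt.data)) for each incoming websocket message is one way to watch the exchange happen in the browser console.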

navigator.mediaDevices.getUserMedia() will request access to the current user's camera and microphone, creating a media stream which is displayed locally, then used with peerConnection to create the 'offer' message and send it to the peer with the ws connection:

    
navigator.mediaDevices.getUserMedia({video: true, audio: true}).then(stream => {
  let element = document.getElementById('local_video');
  element.srcObject = stream;
  element.play().then(() => {
    stream.getTracks().forEach(track => peerConnection.addTrack(track, stream));
    peerConnection.onnegotiationneeded = () => {
      peerConnection.createOffer().then(offer => {
        return peerConnection.setLocalDescription(offer);
      }).then(() => {
        ws.send(JSON.stringify(peerConnection.localDescription));
      });
    }
  });
});
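
The snippet above has no failure path: if the user denies permission or has no camera, the promise returned by getUserMedia() rejects and nothing else happens. A hedged sketch of one way to handle that follows. getLocalStream is a hypothetical wrapper, and the media-devices object is passed in as a parameter so the sketch can run outside a browser:

```javascript
// Hypothetical wrapper: resolve to the stream, or to null if access fails.
// In the browser, pass navigator.mediaDevices as `mediaDevices`.
async function getLocalStream(mediaDevices) {
  try {
    return await mediaDevices.getUserMedia({ video: true, audio: true });
  } catch (err) {
    // Typical names: NotAllowedError (permission denied),
    // NotFoundError (no camera or microphone available).
    console.warn(`Could not access camera/microphone: ${err.name}`);
    return null;
  }
}
```

A caller could check for null and show a message instead of leaving the page with two empty video elements.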
    
  

In the meantime, a bit more configuration on peerConnection is necessary to set up how to display the stream received from the peer, and how to send the ICE candidate messages to the peer through the websocket:

    
peerConnection.ontrack = evt => {
  let element = document.getElementById('remote_video');
  element.srcObject = evt.streams[0];
  element.play();
};

peerConnection.onicecandidate = evt => {
  if (evt.candidate) {
    ws.send(JSON.stringify({type: 'candidate', ice: evt.candidate}));
  }
}
    
  

That takes care of sending offers and ICE candidates. Now the websocket and peerConnection need to be able to receive them, reply to offers with answers, and take the appropriate actions:

    
ws.onmessage = (evt) => {
  const message = JSON.parse(evt.data);
  switch (message.type) {
    case 'offer': {
      peerConnection.setRemoteDescription(message).then(() => {
        return peerConnection.createAnswer()
      }).then(answer => {
        return peerConnection.setLocalDescription(answer)
      }).then(() => {
        ws.send(JSON.stringify(peerConnection.localDescription));
      });
      break;
    }
    case 'answer': {
      peerConnection.setRemoteDescription(message);
      break;
    }
    case 'candidate': {
      peerConnection.addIceCandidate(new RTCIceCandidate(message.ice));
      break;
    }
  }
};
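
The handler above trusts that every websocket message is valid JSON with a known type. A slightly more defensive variant, sketched here under the assumption that malformed or unknown messages should simply be ignored, separates routing from the WebRTC calls. createSignalRouter is a hypothetical helper name:

```javascript
// Hypothetical helper: build a function that routes raw websocket data
// to a handler keyed by message type. Returns true if a handler ran.
function createSignalRouter(handlers) {
  return function route(raw) {
    let message;
    try {
      message = JSON.parse(raw);
    } catch (err) {
      console.warn('Ignoring non-JSON signaling message');
      return false;
    }
    // Only accept types that were explicitly registered.
    const known = message !== null && typeof message === 'object' &&
      Object.prototype.hasOwnProperty.call(handlers, message.type);
    if (!known || typeof handlers[message.type] !== 'function') {
      console.warn('Ignoring signaling message with unknown type');
      return false;
    }
    handlers[message.type](message);
    return true;
  };
}
```

It could be wired up as ws.onmessage = evt => route(evt.data), with offer, answer, and candidate handlers that wrap the peerConnection calls shown above.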
    
  

That's the entire frontend. It weighs in at about 50 lines of vanilla JS.

Backend

On the backend, I needed some way to handle multiple active websocket connections and pass messages between them. My website is written in Go, and my first implementation held all the active websocket connections in memory in a map. When a message came in from one peer, I could look up the other peer's websocket connection in the map and pass on the message.

This worked when running the app locally, but because my website is deployed on GCP Cloud Run with multiple running instances, I couldn't rely on both peers' websockets being connected to the same instance and in the same memory. A simple shared map was not viable, so I looked for something else.

For passing messages around on the backend, the first thing that came to mind on GCP was Pub/Sub, and it turned out to be a nice solution. First, I set up a client:

    
import (
    "context"

    "cloud.google.com/go/pubsub"
)

var pubSub *pubsub.Client

func Initialize() error {
    // Declare err separately: ":=" would shadow the package-level pubSub.
    var err error
    pubSub, err = pubsub.NewClient(context.Background(), "mattbutterfield")
    return err
}
    
  

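VideoConnections accepts the websocket connection, derives a Pub/Sub topic name shared by the two peers (creating the topic if it doesn't exist yet), and then starts two loops that relay messages between the websocket and the topic: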

    
func VideoConnections(w http.ResponseWriter, r *http.Request) {
    ws, err := websocket.Accept(w, r, nil)
    if err != nil {
        // Log and return: log.Fatal would exit the whole server process
        // because of a single failed request.
        log.Printf("Error accepting websocket: %s", err)
        return
    }
    defer closeWS(ws)
    userID := strings.ToLower(r.URL.Query().Get("userID"))
    peerID := strings.ToLower(r.URL.Query().Get("peerID"))

    // Sort the IDs so both peers derive the same topic name.
    peers := []string{userID, peerID}
    sort.Strings(peers)
    topicName := fmt.Sprintf("video-%s-%s", peers[0], peers[1])
    topic := pubSub.Topic(topicName)
    topic.EnableMessageOrdering = true

    ctx := context.Background()
    exists, err := topic.Exists(ctx)
    if err != nil {
        log.Printf("Error checking if topic exists: %s", err)
        return
    }
    if !exists {
        log.Printf("Topic %s doesn't exist - creating it", topicName)
        if _, err = pubSub.CreateTopic(ctx, topicName); err != nil {
            log.Printf("Error creating topic: %s", err)
            return
        }
    }

    // wsLoop calls cancelFunc when it exits, which stops pubSubLoop.
    cctx, cancelFunc := context.WithCancel(ctx)
    go wsLoop(ctx, cancelFunc, ws, topic, userID)
    pubSubLoop(cctx, ctx, ws, topic, userID)
}
    
  

wsLoop listens for new messages coming to the websocket and publishes them to the Pub/Sub topic with an ordering key, to ensure everything arrives in the order it was sent:

    
func wsLoop(ctx context.Context, cancelFunc context.CancelFunc, ws *websocket.Conn, topic *pubsub.Topic, userID string) {
    log.Printf("Starting wsLoop for %s...", userID)
    // Cancel on every exit path so pubSubLoop always stops with this loop.
    defer cancelFunc()
    orderingKey := fmt.Sprintf("%s-%s", userID, topic.ID())
    for {
        _, message, err := ws.Read(ctx)
        if err != nil {
            log.Printf("Error reading message: %s", err)
            break
        }
        log.Printf("Received message on websocket from %s", userID)
        msg := &pubsub.Message{
            Data:        message,
            Attributes:  map[string]string{"sender": userID},
            OrderingKey: orderingKey,
        }
        // Wait for the publish result so failures surface here.
        if _, err = topic.Publish(ctx, msg).Get(ctx); err != nil {
            log.Printf("Could not publish message: %s", err)
            break
        }
    }
    log.Printf("Shutting down wsLoop for %s...", userID)
}
    
  

Finally, pubSubLoop listens for new messages published to the Pub/Sub topic and writes them to the websocket:

    
func pubSubLoop(cctx, ctx context.Context, ws *websocket.Conn, topic *pubsub.Topic, userID string) {
    log.Printf("Starting pubSubLoop for %s...", userID)
    subscriptionName := fmt.Sprintf("%s-%s", userID, topic.ID())
    sub := pubSub.Subscription(subscriptionName)
    if exists, err := sub.Exists(ctx); err != nil {
        log.Printf("Error checking if sub exists: %s", err)
        return
    } else if !exists {
        log.Printf("Creating subscription: %s", subscriptionName)
        if _, err = pubSub.CreateSubscription(
            context.Background(),
            subscriptionName,
            pubsub.SubscriptionConfig{
                Topic:                 topic,
                EnableMessageOrdering: true,
            },
        ); err != nil {
            log.Printf("Error creating subscription: %s", err)
            return
        }
    }
    if err := sub.Receive(cctx, func(c context.Context, m *pubsub.Message) {
        m.Ack()
        if m.Attributes["sender"] == userID {
            log.Println("skipping message from self")
            return
        }
        log.Printf("Received message to pubSub: ")
        if err := ws.Write(ctx, websocket.MessageText, m.Data); err != nil {
            log.Printf("Error writing message to %s: %s", userID, err)
            return
        }
    }); err != nil {
        log.Printf("Error setting up subscription Receive: %s", err)
    }
    log.Printf("Shutting down pubSubLoop for %s...", userID)
}
    
  

And with that, I have a working solution. I've tested it on various computers and mobile browsers, on different networks across some distances. The peer-to-peer connection is usually crystal clear. It feels better than the mainstream video conferencing tools, which is quite satisfying. It was good to dive into some technical areas that I hadn't explored before and come out with something that I understand and that works well.

The final working JavaScript and Go files can be viewed here and here.