37° 48' 15.7068'' N, 122° 16' 15.9996'' W
cloud-native gis has arrived
Maps
Engineering
Routing patterns for manageable Phoenix Channels
There are unique challenges that arise when you’re building a huge Phoenix Channel, and in this post we’re going to explore some solutions.

Phoenix Channels are great. They are at the very core of Felt’s collaborative features, happily handling all kinds of interactions in your map. As a WebSocket-based, event-driven interface, they differ quite a bit from traditional HTTP-based APIs, not only in their interface semantics but also in the way they are architected in the code itself. There are unique challenges that arise when you’re building a huge Phoenix Channel, and in this post we’re going to explore some solutions to them.

Building an HTTP-based API often goes like this:

  • You go to your `Router` module
  • Add a new route for your endpoint
  • Specify which Controller module and function will handle a request to that route

The nice thing about this is that it’s easy to see everything that your API is capable of handling, and you can group related functionality in the same Controller module. Taking Felt as an example, we would probably have a `MapsController`, a `LayersController`, and so on.

Another nice thing is that you can specify which `Plug`s to run before (or maybe after) each group of routes; you can define an `:authenticated` pipeline at the router level to gate specific features to logged-in users, or you can hide features behind a feature flag without having to update every single controller action:
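
A minimal sketch of such a router could look like the following. It assumes a generated Phoenix app named `MyAppWeb`; the two plug modules and the `:layers_beta` pipeline are hypothetical names used only for illustration.

```elixir
defmodule MyAppWeb.Router do
  # Assumes the `MyAppWeb` helper module generated by `mix phx.new`.
  use MyAppWeb, :router

  pipeline :api do
    plug :accepts, ["json"]
  end

  # Hypothetical module plug that halts the request when no user is logged in.
  pipeline :authenticated do
    plug MyAppWeb.Plugs.RequireUser
  end

  # Hypothetical module plug that hides routes unless a feature flag is on.
  pipeline :layers_beta do
    plug MyAppWeb.Plugs.FeatureFlag, :layers
  end

  scope "/api", MyAppWeb do
    pipe_through [:api, :authenticated]

    resources "/maps", MapsController, only: [:index, :show, :create]

    scope "/maps/:map_id" do
      # Only this group of routes is gated by the feature flag.
      pipe_through :layers_beta

      resources "/layers", LayersController, only: [:index, :create]
    end
  end
end
```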

When building a Phoenix Channel, however, none of these goodies are available. Instead, we often define a single channel, which is backed by a single OTP process, that will handle everything related to it via pattern matching in clauses for the `handle_in/3` callback. This is fine when starting, but once your channel starts to grow and is responsible for handling lots of features, you quickly end up with a giant module that is many thousands of lines long.

There are many ways in which we can improve this situation, and we’ll explore some of them below. By the end of the article, we’ll have our channel defined in an elegant and declarative way, making it easier to see what the channel is capable of, and to manage the implementation.

Namespace related events

Imagine you’re building Felt, and you’re implementing three features: elements, layers and cooperative updates. Just like how we would group HTTP endpoints under a common path and make the same module handle them, we can group related functionality by prefixing them with a namespace:
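
As a rough sketch of the idea, assuming hypothetical event names and payloads (these are not Felt’s real events), the channel could look like this. The `use Phoenix.Channel` line is left out so the snippet runs as a plain script.

```elixir
defmodule MyAppWeb.MapChannel do
  # In a Phoenix app this module would start with `use Phoenix.Channel`;
  # it is left out here so the sketch runs as a plain script.
  # Event names and payloads are hypothetical.

  # Element events
  def handle_in("element:create", payload, socket), do: create_element(payload, socket)
  def handle_in("element:delete", payload, socket), do: delete_element(payload, socket)

  # Layer events
  def handle_in("layer:create", payload, socket), do: create_layer(payload, socket)

  # Cooperative update events
  def handle_in("coop:cursor_moved", payload, socket), do: move_cursor(payload, socket)

  # The implementations still live as private functions in this module.
  defp create_element(%{"type" => type}, socket), do: {:reply, {:ok, %{created: type}}, socket}
  defp delete_element(%{"id" => id}, socket), do: {:reply, {:ok, %{deleted: id}}, socket}
  defp create_layer(%{"name" => name}, socket), do: {:reply, {:ok, %{layer: name}}, socket}
  defp move_cursor(_payload, socket), do: {:noreply, socket}
end
```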

This small improvement can go a long way. Simple as it is, colocating related features improves the readability of your channel module and removes the mental burden of deciding where’s the correct place to put your new event handler.

What isn’t colocated, however, is the implementation code that was extracted to private functions in the same module, so what you end up with is a very organized event handling section, and a very messy private section.

To fix this we need to get a bit more sophisticated, and for that we can borrow a pattern that’s common when building complex GenServers:

Using the channel as a router

When building complex GenServers, a common practice is to use the GenServer callback module (i.e. the module with `use GenServer`) to route the messages to a specified module that does the actual work:
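
A small illustration of that practice, using a hypothetical counter rather than anything from Felt’s codebase: `Counter.Server` only receives and routes messages, while `Counter.Impl` holds the logic as pure functions.

```elixir
defmodule Counter.Impl do
  # Pure functions holding the actual logic; testable without a process.
  def new, do: 0
  def increment(count, by), do: count + by
end

defmodule Counter.Server do
  use GenServer

  # Client API
  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)
  def increment(pid, by \\ 1), do: GenServer.call(pid, {:increment, by})

  # The callbacks only route messages to the implementation module.
  @impl true
  def init(:ok), do: {:ok, Counter.Impl.new()}

  @impl true
  def handle_call({:increment, by}, _from, count) do
    new_count = Counter.Impl.increment(count, by)
    {:reply, new_count, new_count}
  end
end
```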

We can do something similar with our Phoenix Channel.

At this point we have all our handlers neatly organized by feature group with a prefix, and we can leverage it to route messages to different implementation modules:
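
One way to sketch this, assuming hypothetical handler modules that expose a `handle_event/3` function, is to match on the prefix and hand the rest of the event name to the handler. As before, `use Phoenix.Channel` is omitted so the snippet runs on its own.

```elixir
defmodule MyAppWeb.MapChannel.ElementHandler do
  # Receives the event name with the "element:" prefix already stripped.
  def handle_event("create", %{"type" => type}, socket) do
    {:reply, {:ok, %{created: type}}, socket}
  end

  def handle_event("delete", %{"id" => id}, socket) do
    {:reply, {:ok, %{deleted: id}}, socket}
  end
end

defmodule MyAppWeb.MapChannel.LayerHandler do
  def handle_event("create", %{"name" => name}, socket) do
    {:reply, {:ok, %{layer: name}}, socket}
  end
end

defmodule MyAppWeb.MapChannel do
  # In a Phoenix app this module would start with `use Phoenix.Channel`.
  alias MyAppWeb.MapChannel.{ElementHandler, LayerHandler}

  # The channel only routes: match the namespace, delegate the rest.
  def handle_in("element:" <> event, payload, socket),
    do: ElementHandler.handle_event(event, payload, socket)

  def handle_in("layer:" <> event, payload, socket),
    do: LayerHandler.handle_event(event, payload, socket)
end
```

The implementation code now lives next to the events it serves, so the private section of the channel module disappears.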

With this change, now our Phoenix Channel is pretty much the same as the Router was in a traditional HTTP API: we just define the root name for the event, and tell the Channel to which module it needs to delegate.

The only things we’re missing are plugs and pipelines.

Running common functions on each event

One of the most common cases for Plugs in an HTTP API is to perform authentication and maybe authorization checks before the request reaches the Controller. To do this, all it takes is to add the following to your router or controller:
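
For instance, a controller-level plug might look like the sketch below. It assumes a generated Phoenix app (`use MyAppWeb, :controller`) and that an earlier plug stores the user under `conn.assigns.current_user`; the `require_user` name is hypothetical.

```elixir
defmodule MyAppWeb.MapsController do
  # Assumes the `MyAppWeb` helper module generated by `mix phx.new`.
  use MyAppWeb, :controller

  # Runs before every action in this controller.
  plug :require_user

  def show(conn, %{"id" => id}) do
    json(conn, %{id: id})
  end

  # Hypothetical function plug: halts with a 401 when there is no user.
  defp require_user(conn, _opts) do
    if conn.assigns[:current_user] do
      conn
    else
      conn
      |> put_status(:unauthorized)
      |> json(%{error: "unauthorized"})
      |> halt()
    end
  end
end
```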

And boom, all the requests affected by that plug will return an error if the user can’t be authenticated.

With Phoenix Channels, however, it would seem like we don’t need that: the user often connects to the WebSocket by providing a signed token, which the server verifies and then uses to retrieve the current user. So in principle authentication needs to happen just once: when the user connects to the socket, right?

In reality, while that first authentication is required, we also need to authenticate the user on each event. Users are always in control of the data they send to the server, so you need to always check that they are who they claim to be, and that they can do only what they’re allowed to do. Moreover, because channels are stateful, it may be the case that a user tries to perform an action on a resource they no longer have access to.

A very simple example of this in Felt is the following:

  1. User A creates a map, and invites User B to edit the map together
  2. User B makes some edits, and all is fine
  3. User A now revokes edit permissions from User B

Now if User B tries to edit the map, the server should reject that action, but if we just assume that the channel state is always up to date and don’t do authorization checks on every event, we would allow User B to edit a map they don’t have any permission to edit.

With this in mind, let’s add some checks to our code:
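
A sketch of what those checks could look like in a handler module follows. The `:persistent_term` lookup is only a stand-in for a fresh permission query (for example against the database), and the assign names `user_id` and `map_id` are assumptions.

```elixir
defmodule MyAppWeb.MapChannel.ElementHandler do
  def handle_event("create", payload, socket) do
    if ensure_map_editable(socket) do
      {:reply, {:ok, %{created: payload["type"]}}, socket}
    else
      {:reply, {:error, %{reason: "unauthorized"}}, socket}
    end
  end

  def handle_event("delete", payload, socket) do
    # The same check has to be repeated in every clause.
    if ensure_map_editable(socket) do
      {:reply, {:ok, %{deleted: payload["id"]}}, socket}
    else
      {:reply, {:error, %{reason: "unauthorized"}}, socket}
    end
  end

  # Looks up the current permissions on every call instead of trusting
  # whatever the socket knew when the user joined.
  defp ensure_map_editable(socket) do
    %{user_id: user_id, map_id: map_id} = socket.assigns
    user_id in :persistent_term.get({:map_editors, map_id}, [])
  end
end
```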

Great, now we have some basic security in place, but it’s quite annoying to have to manually add all these checks to every single event handler. We can do a bit better by “grouping” the handlers by event:
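
Roughly, and under the same assumptions as before (a stand-in permission lookup and hypothetical event names), the grouping could be sketched like this:

```elixir
defmodule MyAppWeb.MapChannel.ElementHandler do
  # Every event that requires edit permissions is listed here...
  @editing_events ["create", "delete"]

  # Entry point called by the channel: run the shared check once,
  # then dispatch to the clause that does the work.
  def handle_event(event, payload, socket) when event in @editing_events do
    if ensure_map_editable(socket) do
      do_handle_event(event, payload, socket)
    else
      {:reply, {:error, %{reason: "unauthorized"}}, socket}
    end
  end

  # ...and again in the head of the clause that implements it.
  defp do_handle_event("create", payload, socket) do
    {:reply, {:ok, %{created: payload["type"]}}, socket}
  end

  defp do_handle_event("delete", payload, socket) do
    {:reply, {:ok, %{deleted: payload["id"]}}, socket}
  end

  # Stand-in for a fresh permission lookup (e.g. a database query).
  defp ensure_map_editable(socket) do
    %{user_id: user_id, map_id: map_id} = socket.assigns
    user_id in :persistent_term.get({:map_editors, map_id}, [])
  end
end
```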

Sweet, now we have the checks for all the events in that module handled in a single place, though we needed to add yet another layer of indirection to achieve it: we now have a mini-router inside the handler module, and we have to write the name of each event twice: once in the `handle_event` function, and once in the head of the `do_handle_event` functions. This is fine in a lot of cases, though it may become a burden in others, and Elixir won’t tell you if you made a typo in one of the event names, so you’ll have to catch that with your test suite or, in the worst case, in production.

Step by step we arrived at a pattern that allows us to group related functionality together, and that lets us define common checks for groups of events. Can we go any further?

Plugs. But for Channels.

One thing we’re lacking is Plugs. Our `ensure_map_editable` function is just a regular helper, but it has no way to add new assigns to the socket, or to explicitly signal whether the event handling should be halted; that is decided by the `handle_event` function in an ad-hoc way.

As a first attempt we may want to define a channel Plug the same way we define an HTTP Plug: it’s just a function that takes a Socket and returns a new Socket or halts the event handling:
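
A first attempt along those lines might be sketched as follows. The `{:cont, socket}` and `{:halt, reason, socket}` return shapes are assumptions made for illustration, and the permission lookup is again a stand-in.

```elixir
defmodule MyAppWeb.ChannelPlugs do
  # `socket` is a Phoenix.Socket in a real app; any map with an `assigns`
  # key works for this sketch. The return shapes are assumptions.
  @type socket :: map()
  @type result :: {:cont, socket} | {:halt, term(), socket}

  @spec ensure_map_editable(socket) :: result
  def ensure_map_editable(socket) do
    %{user_id: user_id, map_id: map_id} = socket.assigns

    # Stand-in for a fresh permission lookup (e.g. a database query).
    if user_id in :persistent_term.get({:map_editors, map_id}, []) do
      # Like an HTTP plug enriching the conn, the plug may add assigns.
      {:cont, %{socket | assigns: Map.put(socket.assigns, :can_edit, true)}}
    else
      {:halt, %{reason: "unauthorized"}, socket}
    end
  end
end
```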

But there are some considerations that make this definition suboptimal. One important aspect of HTTP APIs is that once the request is fully handled, it is discarded from memory. Every single time a request arrives, the full request context, like the authenticated user and the resource we want to act upon, is loaded from scratch, processed and then freed. With Channels, however, while we do have Socket `assigns` in the same spirit as Plug.Conn `assigns`, the difference is that these assigns live in the socket for the entire lifetime of the channel process. The implication is that the bigger the assigns, the more memory that channel will use, and for applications like Felt, where a session can stay open for literal weeks, we need to be conservative about what we store there; for example, you want to store just the map id instead of all its contents.

However, it’s still useful for plug-like functions to load data and hand it over to the event handler in some way. The problem we need to address is how to pass that data outside of the socket assigns. To add more issues to the mix, the event payload is not stored in the Socket itself, so if we want to work with it we need to include it in the arguments as well.

To solve this, we’ll need to make a few changes. Instead of just taking the Socket and returning a Socket, we’ll receive:

  1. The Socket
  2. The event payload
  3. A “bindings” value, which contains the additional data from previous plugs, or a default value
  4. The plug options

And we’ll return a tuple including:

For the error case:

  1. The `:halt` atom to indicate no further processing should happen
  2. The socket, which may be modified
  3. The status for the reply that will be sent to the client
  4. The reason for the halting, returned to the client

For the success case:

  1. A `:cont` atom indicating that we should continue processing the event
  2. The socket, which may be modified
  3. The event payload, which may be modified
  4. The new “bindings”

The purpose of returning the “bindings” as a separate value is to be able to keep the original payload separate from any other additional information our plug may load.

With these changes, our spec looks like the following:
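
Written out as typespecs, a sketch of that contract could read as below. The module name, the flat `{:halt, socket, status, reason}` ordering (taken from the order of the lists above) and the `require_keys/4` example plug are assumptions.

```elixir
defmodule MyAppWeb.ChannelPlug do
  @type socket :: Phoenix.Socket.t()
  @type payload :: map()
  @type bindings :: map()
  @type opts :: term()
  @type status :: :ok | :error
  @type reason :: term()

  @type result ::
          {:cont, socket, payload, bindings}
          | {:halt, socket, status, reason}

  # A channel plug is any function with this shape.
  @type t :: (socket, payload, bindings, opts -> result)

  # Hypothetical example plug that follows the contract: it halts when
  # the payload lacks one of the keys given as options.
  @spec require_keys(socket, payload, bindings, [String.t()]) :: result
  def require_keys(socket, payload, bindings, keys) do
    case Enum.reject(keys, &Map.has_key?(payload, &1)) do
      [] -> {:cont, socket, payload, bindings}
      missing -> {:halt, socket, :error, %{reason: "missing keys", keys: missing}}
    end
  end
end
```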

We will need to also make changes to our event handlers, as they now need to be aware of this new spec and new way of handling events.

First, we need a function that is able to run a series of plug functions, and maybe halt the event handling:
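
One possible shape for such a function, assuming plugs are given as `{function, opts}` pairs and that a halt is turned into a regular channel reply tuple, is sketched here. The module and function names are hypothetical.

```elixir
defmodule MyAppWeb.ChannelPlug.Runner do
  # Runs each {plug, opts} pair in order, threading the socket, payload
  # and bindings through. Stops at the first plug that halts.
  def run_plugs(plugs, socket, payload, bindings \\ %{}) do
    initial = {:cont, socket, payload, bindings}

    Enum.reduce_while(plugs, initial, fn {plug, opts}, {:cont, socket, payload, bindings} ->
      case plug.(socket, payload, bindings, opts) do
        {:cont, _socket, _payload, _bindings} = cont ->
          # Keep going with whatever the plug returned.
          {:cont, cont}

        {:halt, socket, status, reason} ->
          # Stop and turn the halt into a reply for the client.
          {:halt, {:reply, {status, reason}, socket}}
      end
    end)
  end
end
```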

Whew, that’s a mouthful, but all we’re doing is running the plugs one by one. If any of them returns a halt value, the “pipeline” stops and the halting reason is sent to the client. Otherwise, processing continues until a `{:cont, socket, payload, bindings}` value is returned.

Finally, we can use it in our handler as such:
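
Putting the pieces together, a handler could be sketched like this. Everything here is illustrative: the plug names, the stand-in permission lookup, the hard-coded map and the private `run_plugs/4` helper (inlined so the snippet is self-contained) are all assumptions.

```elixir
defmodule MyAppWeb.MapChannel.ElementHandler do
  # Plugs to run before every event in this handler, as {function, opts}.
  defp plugs, do: [{&ensure_map_editable/4, []}, {&load_map/4, []}]

  def handle_event(event, payload, socket) do
    case run_plugs(plugs(), socket, payload, %{}) do
      {:cont, socket, payload, bindings} ->
        do_handle_event(event, payload, bindings, socket)

      {:reply, _reply, _socket} = halted ->
        halted
    end
  end

  defp do_handle_event("create", payload, bindings, socket) do
    # The loaded map arrives through the bindings, not the socket assigns.
    {:reply, {:ok, %{created: payload["type"], map: bindings.map.name}}, socket}
  end

  # Plug: fresh permission check (stand-in for a database query).
  defp ensure_map_editable(socket, payload, bindings, _opts) do
    %{user_id: user_id, map_id: map_id} = socket.assigns

    if user_id in :persistent_term.get({:map_editors, map_id}, []) do
      {:cont, socket, payload, bindings}
    else
      {:halt, socket, :error, %{reason: "unauthorized"}}
    end
  end

  # Plug: load the map for this event only; it is dropped afterwards.
  defp load_map(socket, payload, bindings, _opts) do
    map = %{id: socket.assigns.map_id, name: "Demo map"}
    {:cont, socket, payload, Map.put(bindings, :map, map)}
  end

  defp run_plugs(plugs, socket, payload, bindings) do
    Enum.reduce_while(plugs, {:cont, socket, payload, bindings}, fn {plug, opts}, {:cont, s, p, b} ->
      case plug.(s, p, b, opts) do
        {:cont, _s, _p, _b} = cont -> {:cont, cont}
        {:halt, s, status, reason} -> {:halt, {:reply, {status, reason}, s}}
      end
    end)
  end
end
```

Because the map is handed over in the bindings, the socket assigns stay as small as they were when the user joined.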

With this we are pretty much done: we have a little pattern, almost a little framework, to organize channel handlers, and an approach similar to Phoenix’s Plug pipelines to run common functionality before each handler.

Using the ChannelHandler library

All the ideas presented in this article so far can help turn a giant Phoenix Channel into a more maintainable set of modules, with an approach similar to traditional HTTP Phoenix APIs.

You can try them out by running the examples presented here, or you can use ChannelHandler, a library I wrote applying all the ideas here, with the aim of making this as ergonomic as using the Phoenix router macros. It’s built on top of Spark, the DSL engine powering the Ash Framework, so you don’t need to know all the nitty gritty details about macros and the AST to look under the hood and contribute. Moreover, if you use a compatible editor extension, you get autocompletion for the DSL for free!

With ChannelHandler, the above example looks as follows:

The library applies everything we discussed in the article, and packs it in a declarative API.

You may have noticed that the handler functions take a `context` argument instead of `bindings`. In ChannelHandler, the context is a struct containing both metadata about the event itself, and of course the bindings, so to access the bindings you need to do `context.bindings`. The same applies to plugs.

With this declarative approach, we get a number of benefits:

  • A unified interface to deal with routing and plugs, so you don’t have to write the same boilerplate code in every channel or handler module
  • Compile-time validation of the arguments you provide to your functions. Thanks to Spark, you’ll get a warning if you try to pass a function with the wrong arity to `handle`
  • A familiar API: ChannelHandler tries to stay close to the ideas of traditional HTTP routers in Phoenix, so it should be evident to anyone in your team what is going on
  • A clean bird’s eye view of everything your channel is capable of, without requiring you to put on your spelunking hat and dig through all the handlers to figure out which messages are allowed.

There are also more features in the roadmap:

  • A task similar to `mix phx.routes` to list every single event that your channel supports. Spark works by generating a DSL data structure, which you can then use for code generation like ChannelHandler does. That same data structure can be used for introspection of your modules, which can make writing such a task a painless endeavor.
  • Handler level plugs. I’d like to support handler level plugs similar to controller plugs, such that you can do this instead of defining groups at the router level:

Feel free to use it and let me know how it works for you!

Final Thoughts

We covered a lot of ground in this article. We went from a regular, vanilla Phoenix Channel where every single detail about it was built in an ad-hoc way, to “discovering” a pattern, or an abstraction, that enables us to not only organize our features in a more manageable way, but also gives us more superpowers in the form of “Channel Plugs”.

We also took a look at the ChannelHandler library, which incorporates all the ideas in this article in a simple, declarative API that helps you forget about the technicalities of managing your channel, and just focus on describing the events, and which functions or modules handle them.

There are still more challenges to face when working with a live process like a Channel, like handling messages received in the `handle_info` callback for the Channel. However, these ideas should still help you deal with the bulk of the complexity and boilerplate.

Hiring

If you loved this article and want to work with a team of incredibly talented engineers, please reach out.

Bio
Lucas is a Senior Software Engineer at Felt specializing in Elixir.