Taking Hashrocket's "Ultimate Elixir CI" to the next level

Hashrocket’s post “Build the Ultimate Elixir CI with GitHub Actions” has been super influential in the Elixir community. The vast majority of Elixir projects I’ve seen use some variation of the setup they recommend:

Set up Elixir
Restore the cache
Check formatting
Check Credo
Run Dialyzer via Dialyxir
Test via ExUnit

From what I’ve seen of the Elixir community, this setup is pretty well accepted as best practice (though people’s feelings about Dialyzer are more mixed than about the other tools). That’s where we started as well.
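A baseline along those lines can be sketched as a single GitHub Actions job that runs each step in order. This is a minimal sketch, not our actual file: the file name, the OTP/Elixir versions, and the cache key are placeholders. It also assumes Credo and Dialyxir are available in the test environment, and a Phoenix app would usually need a Postgres service container as well.

```yaml
# .github/workflows/ci.yml (hypothetical path; versions are placeholders)
name: CI

on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    env:
      MIX_ENV: test
    steps:
      - uses: actions/checkout@v4

      # Set up Elixir
      - uses: erlef/setup-beam@v1
        with:
          otp-version: "26.2"
          elixir-version: "1.16"

      # Restore the cache (deps and _build), keyed on the lockfile
      - uses: actions/cache@v4
        with:
          path: |
            deps
            _build
          key: ${{ runner.os }}-mix-${{ hashFiles('**/mix.lock') }}
          restore-keys: ${{ runner.os }}-mix-

      - run: mix deps.get

      # Check formatting
      - run: mix format --check-formatted

      # Check Credo
      - run: mix credo --strict

      # Run Dialyzer via Dialyxir
      - run: mix dialyzer

      # Test via ExUnit
      - run: mix test
```

Because every step runs in sequence in one job, the first failure stops the job, which is exactly the behavior the rest of this post works around.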

But then we went further—I’d argue a lot further. Our philosophy is that to the extent that we can have the CI system give us feedback, catch our oversights, improve our code, or lighten the load on human PR reviewers, we should take advantage of that.

Here are some ways we’ve gone beyond the basics in our GitHub Actions CI system:

1. Deploy a staging environment for every PR

Every time a PR is created, we spin up a new instance of the entire Felt architecture on Render, complete with its own database. We then have a GitHub bot that comments on the PR with a “one-click sign in” link—a link with embedded credentials that get you automatically logged in and looking at a map. Then, every time a new commit is pushed to that branch, we do a zero-downtime deployment of the new code to that staging environment.

It’s hard to overstate what an improvement this is to the PR review workflow. Until you’ve seen it in action, it’s easy to accept that a massive amount of friction is just the way things are: having to check out the branch, install dependencies that may have changed, run migrations, build, run, go to the right localhost URL, log in, etc.

Especially when you’re reviewing three or four PRs a day, that friction adds up—both in terms of time lost and cognitive overhead.
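Our deployment side is specific to Render, but the GitHub side of a per-PR environment can be sketched generically. In this outline, ./scripts/deploy_preview.sh is a hypothetical script standing in for whatever creates the PR's environment (or redeploys it on later pushes) and prints its URL. The comment is posted with actions/github-script, and generating a sign-in link with embedded credentials is app-specific, so it is left out.

```yaml
# .github/workflows/preview.yml (hypothetical)
name: PR preview

on:
  pull_request:
    types: [opened, reopened, synchronize]

permissions:
  contents: read
  pull-requests: write   # needed to comment on the PR

jobs:
  deploy-preview:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # HYPOTHETICAL script: creates the environment on the first run,
      # redeploys it on later pushes, and prints the app URL.
      # Assigning first means a failing script fails the step.
      - id: deploy
        run: |
          url="$(./scripts/deploy_preview.sh "${{ github.event.pull_request.number }}")"
          echo "url=$url" >> "$GITHUB_OUTPUT"

      # Comment only when the PR is first opened, so pushes don't spam it
      - if: github.event.action == 'opened'
        uses: actions/github-script@v7
        env:
          PREVIEW_URL: ${{ steps.deploy.outputs.url }}
        with:
          script: |
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `Preview environment: ${process.env.PREVIEW_URL}`,
            });
```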

2. Parallelize everything for the fastest feedback

Nothing aggravates me more than pushing a commit and seeing the CI fail within a few seconds due to, say, the formatting being off. Then you push a commit: “Fix formatting.” You wait, and the CI fails again: Credo complains that your imports are in the wrong order. Push another commit: “Fix Credo.” Now, and only now, do you find out if the tests pass in CI.

All the waiting and the back-and-forth in that process is an awful way to work. Each context switch there is an opportunity to get distracted, forget what you were working on, or (God help you) fall into a Twitter doomscrolling trap. (Not to mention you end up with just a whole slew of meaningless commits.)

Instead, we moved to always running all the CI checks. That way developers get feedback about everything wrong with their code on the first run. There’s just one problem with this: it makes it quite a bit slower to get feedback. Rather than the formatter failing within seconds of the CI job starting, you now have to wait for all the tests to run, Dialyzer to finish, etc. before the job fails. So, when we moved to always running every check, we also split the CI tasks up into multiple jobs.

Now we have separate jobs for:

Compiling and running tests
Linting (formatter checks, Credo, etc.)
Dialyzer
Long-running integration tests (run nightly, rather than on every PR)

Running them in parallel provides the fastest feedback possible to devs, at the expense of a small amount of extra CI billing time (roughly a minute to set up the container for each task). The CI bill is so minimal compared to the cost of a dev context switching, though, that this is an easy tradeoff to make.

Now devs can get complete feedback on every commit in under two minutes. Like having a staging environment per PR, it may be easy to undervalue this prior to making the switch, but it’s impossible to go back once you’ve experienced it.
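In GitHub Actions terms, splitting the work means declaring several jobs in one workflow. Jobs that don't reference each other under needs: start in parallel. A minimal sketch, with placeholder versions and file name, and with the nightly integration suite left out:

```yaml
# .github/workflows/ci.yml (hypothetical). Jobs without a "needs:" key
# run in parallel, so each check reports its result independently.
name: CI

on:
  pull_request:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    env:
      MIX_ENV: test
    steps:
      - uses: actions/checkout@v4
      - uses: erlef/setup-beam@v1
        with: { otp-version: "26.2", elixir-version: "1.16" }
      - uses: actions/cache@v4
        with:
          path: |
            deps
            _build
          key: test-${{ runner.os }}-${{ hashFiles('**/mix.lock') }}
      - run: mix deps.get
      - run: mix test

  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: erlef/setup-beam@v1
        with: { otp-version: "26.2", elixir-version: "1.16" }
      - uses: actions/cache@v4
        with:
          path: |
            deps
            _build
          key: lint-${{ runner.os }}-${{ hashFiles('**/mix.lock') }}
      - run: mix deps.get
      - run: mix format --check-formatted
      # Run Credo even if the formatter check failed, so a single
      # push surfaces every lint problem at once
      - if: ${{ !cancelled() }}
        run: mix credo --strict

  dialyzer:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: erlef/setup-beam@v1
        with: { otp-version: "26.2", elixir-version: "1.16" }
      - uses: actions/cache@v4
        with:
          path: |
            deps
            _build
          key: dialyzer-${{ runner.os }}-${{ hashFiles('**/mix.lock') }}
      - run: mix deps.get
      - run: mix dialyzer
```

The nightly integration tests would live in a separate workflow triggered by on: schedule with a cron expression (for example "0 6 * * *", which GitHub interprets in UTC).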

3. Refactor out the boilerplate

Now that we split our Elixir CI workflow into multiple jobs run in parallel, there’s a ton of overlap between the jobs. To get set up, they require:

Running the erlef/setup-beam action
Setting up the dependencies cache
Setting up the build cache
Maybe cleaning the build (see #4 below)
Maybe installing build dependencies like Rebar and Hex
Ensuring we have the latest dependencies
Compiling dependencies (but only for the CI tasks that need it!)
Compiling the project (also only for the CI tasks that need it!)

In fact, more or less every Elixir CI running on GitHub Actions will need those same steps.

We could (and did initially!) copy and paste that setup boilerplate between each of the tasks, but the duplication eventually drove us to create a separate GitHub Action to wrap all this up. Now we can accomplish all 8 of those things (with customization as needed, like skipping compiling when it’s not necessary) in a single step.

This also helps ensure bug fixes get applied consistently, like when I realized I had configured the caching wrong. I could fix it in one place and be confident all our builds would be sped up the same way.
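A composite action is one way to package those steps. The sketch below is not our published action. The path .github/actions/elixir-setup/action.yml and the build-deps/build-app input names are hypothetical, and it covers most of the steps listed above but leaves out the retry-triggered clean build.

```yaml
# .github/actions/elixir-setup/action.yml (hypothetical path and input names)
name: Elixir setup
description: Install Erlang/OTP and Elixir, restore caches, fetch and optionally compile

inputs:
  otp-version:
    description: OTP version passed to erlef/setup-beam
    required: true
  elixir-version:
    description: Elixir version passed to erlef/setup-beam
    required: true
  build-deps:
    description: Whether to compile dependencies ("true" or "false")
    default: "true"
  build-app:
    description: Whether to compile the project itself ("true" or "false")
    default: "true"

runs:
  using: composite
  steps:
    - uses: erlef/setup-beam@v1
      with:
        otp-version: ${{ inputs.otp-version }}
        elixir-version: ${{ inputs.elixir-version }}

    # Dependencies cache: the key changes only when mix.lock changes
    - uses: actions/cache@v4
      with:
        path: deps
        key: deps-${{ runner.os }}-${{ inputs.otp-version }}-${{ inputs.elixir-version }}-${{ hashFiles('**/mix.lock') }}
        restore-keys: deps-${{ runner.os }}-${{ inputs.otp-version }}-${{ inputs.elixir-version }}-

    # Build cache, kept separate because _build holds per-MIX_ENV output
    # (env.MIX_ENV is empty if the calling job doesn't set it)
    - uses: actions/cache@v4
      with:
        path: _build
        key: build-${{ runner.os }}-${{ inputs.otp-version }}-${{ inputs.elixir-version }}-${{ env.MIX_ENV }}-${{ hashFiles('**/mix.lock') }}
        restore-keys: build-${{ runner.os }}-${{ inputs.otp-version }}-${{ inputs.elixir-version }}-${{ env.MIX_ENV }}-

    # Make sure Hex and Rebar are installed (--force skips the prompt)
    - run: mix local.hex --force && mix local.rebar --force
      shell: bash

    - run: mix deps.get
      shell: bash

    - if: inputs.build-deps == 'true'
      run: mix deps.compile
      shell: bash

    - if: inputs.build-app == 'true'
      run: mix compile
      shell: bash
```

A job would call it after actions/checkout (a local action only exists once the repository is checked out) with uses: ./.github/actions/elixir-setup, passing build-app: "false" for jobs like the formatter check that don't need a compiled project.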

4. Blow away the cache on retries

Every team I’ve worked with has seen it, not just in Elixir but in any compiled language. Incremental builds are reliable 99.5% of the time, but every once in a while you’ll hit a completely baffling error while running your tests (the kind that makes you question your sanity) which mysteriously disappears when you do a clean build.

How do you handle this in CI?

One (bad) option is to do full, clean builds every time. This avoids those rare instances of flakiness, at the cost of making every CI run massively slower—minutes or even tens of minutes for big enough projects. Another equally bad option is to just teach your devs how to fiddle with the CI script to change the caching keys when they want to try a clean build.

What you’d really like to do is to automatically clear the cache if and only if an incremental build may be to blame for a test failure. We accomplished this by having our setup action check before the compilation step to see if this build is a retry; if it is, we do a clean build.

This is a nice compromise between day-to-day speed and 100% confidence in every build; any time a dev has a question about whether a failure was caused by a flaky build, a retry is sufficient to rule that out. This can save a lot of time and frustration on those rare flaky days, and it increases the team’s overall confidence in CI results.
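GitHub exposes the attempt number of a workflow run as github.run_attempt: it is 1 on the first run and increases each time the run is re-run, which turns the retry check into a one-line condition. A minimal sketch of the relevant steps, assuming they sit right after the cache-restore steps in a setup action or job. It illustrates the idea and is not our action's exact code.

```yaml
# Runs after deps/ and _build/ have been restored from cache and before
# compiling. On a re-run (run_attempt "2", "3", ...), delete the restored
# artifacts so the compile below starts from a clean slate.
# (shell: bash is required inside a composite action, harmless elsewhere.)
- name: Clean build on retry
  if: github.run_attempt != '1'
  run: rm -rf deps _build
  shell: bash

- run: mix deps.get
  shell: bash

- run: mix compile
  shell: bash
```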

5. Report code coverage

We run our test suite under ExCoveralls and post nicely formatted, per-file coverage deltas as a comment on each PR. We also have a corresponding “Coverage” check that fails when a PR would decrease overall test coverage.

While there may be rare cases where you’re okay with coverage going down slightly, having the numbers in front of you (on a per-file basis) for every commit makes it easy to find places you might have overlooked testing.
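One way to wire this up is ExCoveralls' coveralls.github task, which runs the suite with coverage and uploads the results to the Coveralls service. Coveralls can then comment on the PR and report a status check. This sketch assumes ExCoveralls is already configured in mix.exs (test_coverage: [tool: ExCoveralls]), MIX_ENV=test is set on the job, and the repository is enabled on coveralls.io. The PR comment and the "fail when coverage drops" behavior are configured in Coveralls' repository settings, not in this file.

```yaml
# In the test job: replace `mix test` with the ExCoveralls task
- run: mix coveralls.github
  env:
    # coveralls.github uses this token to authenticate the upload
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```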

6. Static analysis. Then more static analysis.

Credo is great. It’s a great start. But there are a handful of other static analysis tools that can help drive code quality up over time, rather than succumbing to entropy and deteriorating. (See our “quality checks” workflow for usage examples.)

First, we start with the built-in stuff:

When you compile, do so with the --all-warnings and --warnings-as-errors flags (as recommended by Wojtek Mach of Dashbit). Things like unused variables, implicit dependencies, use of deprecated functions, and more will be caught.
Check for unused dependencies (and fail the build if there are any):
$ mix deps.unlock --check-unused
Check for compile-time dependencies between modules. This could be an article unto itself (there was a great episode of the Thinking Elixir podcast on it!). The long and short of it, though, is that you’d like to detect when changing one module will result in having to recompile another. If you’re diligent (and aware of the problem) you can avoid this entirely.
Elixir creator José Valim once recommended enforcing it in CI by running:
$ mix xref graph --label compile-connected --fail-above 0
While zero isn’t always the right threshold—there are a few cases where you might have a legitimate reason for one module to have a compile-time dependency on another—it’s a good goal. (See more on diagnosing and removing transitive compile-time dependencies on the Dashbit blog.)
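As workflow steps, those three built-in checks might look like the following. It is a sketch that assumes dependencies have already been fetched in the job. The xref flags need a recent enough Elixir release, so check mix help xref for your version.

```yaml
# Steps for a lint/static-analysis job (dependencies already fetched)

# Report all warnings and treat any of them as a build failure
- run: mix compile --all-warnings --warnings-as-errors

# Fail if mix.lock contains dependencies no longer used by the project
- run: mix deps.unlock --check-unused

# Fail if any compile-connected (transitive compile-time) dependency exists
- run: mix xref graph --label compile-connected --fail-above 0
```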

The final check I’d consider essential here is sobelow, a tool that scans Phoenix projects for security vulnerabilities. It’s incredibly easy to get started with, and has identified concerning issues in every project I’ve worked on so far.
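Sobelow only returns a non-zero exit status when asked to through its --exit option, so a CI step should pass that flag. Otherwise findings are printed but the step still passes. A minimal sketch, assuming sobelow is installed as a dev/test dependency:

```yaml
# --exit: fail the step when Sobelow reports findings at or above the
# default (low) confidence threshold
- run: mix sobelow --exit
```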

From there, we get into the realm of optional or “extra credit” checks. Things you can consider adding:

Additional Credo checks. The credo_contrib package has a number of them (my favorite of which is CredoContrib.Check.PublicPrivateFunctionName, which disallows having public and private functions with the same name). You can also write your own Credo checks for things specific to your own code base—it’s easier than you might expect.
mix_unused to check for unused functions. In my experience, it generates a lot of false positives in Phoenix apps, so it takes some configuration (in the form of an ignore list) to get working in CI.
doctor, a tool for scanning your project’s documentation and ensuring it meets certain completeness thresholds.

And of course, if your app includes other toolchains, they deserve CI love too. Our frontend is TypeScript + React, so we run linting via ESLint, formatting checks via Prettier, and unit tests via Jest.
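The same parallel-job pattern carries over to the frontend. A minimal sketch, assuming an npm project in a hypothetical assets/ directory (a common Phoenix layout) with ESLint, Prettier, and Jest installed as dev dependencies:

```yaml
frontend:
  runs-on: ubuntu-latest
  defaults:
    run:
      working-directory: assets   # hypothetical location of package.json
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with:
        node-version: 20
        cache: npm
        cache-dependency-path: assets/package-lock.json
    - run: npm ci
    - run: npx eslint .
    # Keep going after a lint failure so every problem is reported at once
    - if: ${{ !cancelled() }}
      run: npx prettier --check .
    - if: ${{ !cancelled() }}
      run: npx jest
```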

7. Dependabot is your friend

We’ve configured GitHub’s Dependabot to monitor the third-party dependencies in our mix.exs and issue pull requests once a week with updates. Not only does this help us stay on top of security issues, it lets us pay down tech debt incrementally. Rather than taking updates rarely (whenever devs think of it) and potentially dealing with lots of migration steps, we can take them in small pieces. This is less overwhelming for devs, and it spreads the risk of something going wrong over time.

(If you’ve never been bitten by an “update all the dependencies at once” commit going wrong, consider yourself lucky.)
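Dependabot version updates are configured in a .github/dependabot.yml file. A minimal sketch of a weekly schedule for Mix dependencies. The npm entry and its /assets directory are assumptions for a project with a JavaScript frontend.

```yaml
# .github/dependabot.yml
version: 2
updates:
  # Hex packages declared in mix.exs and locked in mix.lock
  - package-ecosystem: "mix"
    directory: "/"
    schedule:
      interval: "weekly"

  # Hypothetical: frontend packages in assets/package.json
  - package-ecosystem: "npm"
    directory: "/assets"
    schedule:
      interval: "weekly"
```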

The Future

There are still lots of ways we could take our CI even further. Right now we’re focused on bringing our TypeScript tooling up to the level of our Elixir setup, but there are still a few ways we could improve in the future:

End-to-end or smoke tests. Right now we have unit and integration tests for our Elixir backend, plus a separate suite of tests for our TypeScript frontend. High on our priority list is the ability to run a few browser-based, end-to-end tests to make sure the integration of the two is in good shape. We believe automating these tests has the highest ROI of all testing strategies for our product.
More sample data in our PR environments, so that humans reviewing the changes will have instant access to a variety of representative test cases for our maps.
Set up a beefier, self-hosted GitHub runner. The 2-core Linux VM that GitHub gives us is serviceable, but we could cut down the time each build takes significantly if we set up, say, a 16-core VPS to run our builds. (Supporting more powerful runners is actually on GitHub’s public road map for Q1 2022.) We are obsessed with giving developers feedback as fast as possible, so investing in faster build machines makes sense for us.
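Moving a job onto a self-hosted machine is mostly a matter of changing runs-on. A sketch, assuming a runner is already registered with the repository and has Erlang and Elixir installed. self-hosted, linux, and x64 are labels GitHub assigns by default, while 16-core is a hypothetical custom label.

```yaml
jobs:
  test:
    # Routes the job to a registered self-hosted runner carrying all of
    # these labels; "16-core" is a hypothetical custom label.
    runs-on: [self-hosted, linux, x64, 16-core]
    env:
      MIX_ENV: test
    steps:
      - uses: actions/checkout@v4
      # Assumes Erlang/Elixir are preinstalled on the runner
      - run: mix deps.get
      - run: mix test
```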

Working at Felt

If you like Elixir and you like maps, join us!
