How Vivun Uses Ephemeral Environments to Facilitate Rapid and Collaborative Development

Jonathan Call

When organizations grow, the dependencies developers rely on grow exponentially, as does the number of environments where code is developed, deployed, and tested. What defines a development environment? Is it its location: in the cloud, on-prem, or on your local laptop? Maybe a better question is: what even is the development environment?

In this post, we’ll explore how Vivun is using ephemeral environments to:

  • Expedite environment provisioning for development and testing
  • Significantly increase developer productivity
  • Deliver a self-service provisioning system that supports our teams’ hyper growth over the next few years
  • Allow us to double our engineering team size in the next 6 months without any additional SRE team cost or workload

What’s the problem?

In the past, I’ve worked with development teams of all sizes, ranging from just a handful of “do-everything” coworkers to hundreds of engineers and programmers spread across dozens of teams, all deploying to the same shared location. From a Site Reliability Engineering (SRE) perspective, the larger the organization, the more difficult it is to make fundamental infrastructure changes, such as adjusting security models, simply because of the large number of developers affected. In turn, development work can be negatively impacted, leading to an interesting observation: if development has people who depend on it, and if development needs to maintain consistency and quality, has development just become a different type of Production environment?

Let’s explore that through a simple mental exercise. When an idea first takes shape as a proof of concept, there’s typically a single person working on it—let’s call them Jim. Writing code and developing features is quick and easy because Jim can change whatever they want whenever they want, with everything typically hosted right there on Jim’s local development machine.

Then another team member, Sally, joins in, and suddenly there’s a need for a shared environment where feature collaboration can happen. Your infrastructure team stands up a machine or two where the latest code can be tested, accommodating both teammates.

For Jim and Sally, using this shared environment together is not a problem as there’s not a lot of parallel work that happens. When changes need to be made, Jim and Sally simply ping each other through Slack. As the product starts to gain traction, the team size grows exponentially. 

There’s now a healthy backlog of feature work; project managers start helping prioritize tasks, and an entire business line is being driven by this effort. Customers want their bugs fixed and your Solutions Consultants want to know why a feature that was supposed to be released a month ago is still pending. 

Unfortunately, Jim has to explain to everybody that because Brett deployed bad code and broke development before going on vacation, no one has been able to test deploying their changes in weeks! Does that scenario seem familiar? Okay, maybe it’s a bit dramatic, but we’ve all witnessed similar situations to some degree as an organization grows. 

Is Development really just another Production?

Even though it’s not advertised, this “Development” environment has gained some inherent and expected SLAs. The customers are your internal coworkers who look to infrastructure teams to maintain a smooth and healthy development lifecycle. Even the data, while not belonging to a “paying customer,” or considered “sensitive,” probably has some importance. It’s tied to the testing or validation of a given feature that’s in progress. In tech circles we like to say that development environments are places that are continually changing and stability isn’t a concern. However, as we’ve seen, this is no longer the case. 

So what happens, and what do we do? Maybe the developers all start developing locally again? Then it becomes a race to get your pull request approved and merged before everyone else, and let each individual sort out any integration issues. Hmm…well, that’s really just kicking the can down the road, right? What about the Infrastructure and QA teams forcing every change through a tighter, more stringent gate? While not a terrible idea, testing and quality gates themselves need to be developed and tested somewhere—we’ve all had a feature fail testing because the test condition wasn’t updated yet.

Vivun’s solution: ephemeral environments

Luckily for us, modern infrastructure tooling—cloud platform providers, containerization, and container orchestration—lets us quickly provision, deploy, and expose new digital resources. The real value is in using these tools in new and novel ways.

While developing our second product, Eval, Vivun took the time to abstract out the dependencies from our containerized applications and then placed environment-specific values in configurable places, such as ConfigMaps and Secrets. Variables used in database connections, API keys, and logging configurations are all injected at runtime. This forges a pathway for us to dynamically create new environments on-demand for each feature that is being developed. Here’s a simplified overview of how this works:
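To make the runtime-injection idea concrete, here is a minimal sketch of how a containerized service might consume environment-specific values that Kubernetes injects from a ConfigMap or Secret. The variable names (`DATABASE_URL`, `API_KEY`, `LOG_LEVEL`) are illustrative, not Vivun’s actual configuration keys:

```shell
# read_config validates and prints the effective configuration. In a real
# deployment these variables would be injected into the pod from a
# ConfigMap (non-sensitive) or a Secret (sensitive) at runtime.
read_config() {
  # Fail fast if a required value was not injected.
  : "${DATABASE_URL:?DATABASE_URL must be set (e.g. via ConfigMap)}"
  : "${API_KEY:?API_KEY must be set (e.g. via Secret)}"
  # Optional values fall back to sane defaults.
  echo "db=${DATABASE_URL} log=${LOG_LEVEL:-info}"
}
```

Because the image itself carries no environment-specific values, the same container build can run unchanged in any ephemeral environment.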

First, a developer pushes a new branch to GitHub with a specific prefix—in our case, “feature/”. Using prefix filters like this allows the developer to control which branches do and do not get an ephemeral environment.
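The prefix check above can be sketched as a small shell function (in practice, it would typically live in the CI system’s own branch-filter configuration rather than a script):

```shell
# needs_ephemeral_env decides whether a pushed branch should trigger an
# ephemeral environment, based on the "feature/" prefix convention.
needs_ephemeral_env() {
  case "$1" in
    feature/*) echo "yes" ;;
    *)         echo "no"  ;;
  esac
}
```

Branches like `feature/login-form` match and get an environment; `main` or `bugfix/…` branches do not.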

Then, a CI/CD job picks up on this and runs three parallel pipelines: 

  • Build the frontend service
  • Build the backend service
  • Provision the infrastructure

The frontend and backend service pipelines check each respective repository to see if that branch exists. If the branch exists, then the pipelines run through the build process to generate a new container image from that code. If the branch doesn’t exist for that service, we skip building it and tell the pipeline to use whatever the latest “stable” released image is. For Eval, we only have two services, but this same strategy could be followed for a project with any amount of services. 
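The build-or-fall-back decision might look like the sketch below. The branch lookup result (e.g. the output of `git ls-remote --heads`) is passed in as an argument so the function stays testable without network access; the tag names are hypothetical:

```shell
# pick_image_tag chooses the container tag for one service: if the
# feature branch exists in that service's repository, build a fresh
# image from it; otherwise reuse the latest stable release image.
pick_image_tag() {
  branch="$1"     # e.g. feature/login
  lookup="$2"     # output of the branch-existence check (empty if absent)
  stable_tag="$3" # latest released image tag
  if [ -n "$lookup" ]; then
    # Branch exists: tag the new build after the feature name.
    echo "${branch#feature/}"
  else
    # Branch absent in this repo: skip the build, use stable.
    echo "$stable_tag"
  fi
}
```

With more than two services, the same function would simply run once per service repository.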

Another key step is generating a unique identifier for each ephemeral branch. This can be a git SHA, a generated GUID, or, in our case, a string based on the feature branch name. This identifier is used by the infrastructure pipeline to create new resources with properties such as names, tags, and passwords. The important part here is that the resulting identifier is tied to a branch and is always the same when that branch triggers the pipeline. This allows us to add idempotency to our infrastructure build steps. If the resource already exists, we can then reuse the same pipeline and drastically reduce execution time on future runs.
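A branch-name-based identifier like the one described above can be derived deterministically, so every pipeline run for a branch maps to the same resources. This is one possible implementation, not Vivun’s exact scheme:

```shell
# env_id derives a stable, DNS-safe identifier from a branch name:
# lowercase, non-alphanumerics replaced with dashes, capped at 40 chars,
# trailing dashes stripped. Same input always yields the same output,
# which is what makes the downstream infrastructure steps idempotent.
env_id() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -c 'a-z0-9' '-' \
    | cut -c1-40 \
    | sed 's/-*$//'
}
```

For example, `feature/Login-Form` becomes `feature-login-form` on every run, so re-running the pipeline finds and reuses the existing RDS database, S3 bucket, and IAM roles instead of creating duplicates.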

At this point, we have the following dedicated resources:

  • RDS database
  • S3 bucket
  • IAM roles

Additionally, we have a unique container build for each service whose branch exists in the code repository. But how do we configure the services to use these randomly generated resources? This is where the early abstraction work comes into play. Through the power of Helm, we can dynamically generate a “values.yaml” file based on the identifier and the responses from our infrastructure deployment steps. The values file contains all of the necessary information the deployments need. Everything from database URLs to hostnames is generated and inserted at this point. Running a “helm upgrade --install” on each service with the values file allows us to have a consistent, yet unique, version of our entire application stack. Each service runs in its own pods in its own namespace—even the data belongs solely to this environment.
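Sketching that last step: generate a per-environment values file, then deploy it idempotently with Helm. Every name here (chart path, values keys, hostname pattern) is illustrative rather than Vivun’s actual layout:

```shell
# Values produced by earlier pipeline steps (illustrative).
ENV_ID="feature-login-form"
DB_URL="postgres://db-${ENV_ID}.example.internal:5432/app"

# Generate a values file unique to this ephemeral environment.
cat > "values-${ENV_ID}.yaml" <<EOF
env:
  name: ${ENV_ID}
database:
  url: ${DB_URL}
ingress:
  host: ${ENV_ID}.dev.example.com
EOF

# Idempotent deploy: creates the release on the first run for this
# branch, upgrades it in place on later runs.
# helm upgrade --install "app-${ENV_ID}" ./chart \
#   --namespace "${ENV_ID}" --create-namespace \
#   -f "values-${ENV_ID}.yaml"
```

The `helm upgrade --install` form matters: it makes the deploy step safe to re-run, matching the idempotency of the infrastructure steps.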

What does this mean for Development?

We now have independent isolated environments where developers can test integration with other services they aren’t touching, and QA can create or validate testing scenarios for their quality gates. Project Managers can check out new features before they’re even merged using dedicated URLs. Best of all, we have a space where true experimentation can occur in a setup that more closely matches the actual running environments than what a developer would typically have if they were running things locally. 

So, is there even a point to having a development environment then? Yes! Even though we have all of these dedicated environments that test changes against the rest of the codebase, there is no guarantee that the stable branch contains this code. Sure, when the branch is first cut it is there, but over time other features may get merged and if changes aren’t rebased they won’t be picked up on the feature branch (y’all always rebase right…right?).

This is where development becomes the repository for changes: a place where things come together and only changes to the stable branch are played out on top of an environment. This allows you to be more stringent in testing (remember, you should have already tweaked or written your tests in the ephemeral environment) and to really raise the bar on what gets into the stable branch. Development is the place where packages are built, whether they are containers or entire Helm charts. Packages can then be validated and promoted as a whole up through higher environments until making their way to production. At Vivun, we’ve additionally added Slack integrations to our new pipelines so that developers can see what’s running and what’s been deployed, with links to each pipeline’s executions. These integrations reduce context switching by pushing this info into the places developers already work.

What’s the catch?

While some of these concepts may seem too good to be true, you’ll be excited to know they’re not! By properly planning our development approach to containerized services and using good Infrastructure as Code (IaC), even a smaller, fast-moving company like Vivun is able to accomplish this setup.

One vital point to keep in mind is that more infrastructure means more cost. You’re paying for increased development velocity, which is often necessary and invaluable. However, it is not worth it if there are resources lingering out there doing nothing. Vivun solves this problem by running daily automated reports on the number and age of all ephemeral environments. We also automatically delete the ephemeral environment and all associated AWS resources when the feature branch is merged into master. By enforcing standard cloud best practices around tagging, monitoring, and alerting, you can minimize cost and maximize gains.
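The daily age report can be reduced to a simple filter over an environment inventory; the actual teardown (a `helm uninstall` plus deletion of the tagged AWS resources) would hang off its output. This is a sketch with a made-up inventory format, not Vivun’s reporting job:

```shell
# list_stale_envs reads "name age_days" lines (e.g. from a daily
# inventory of ephemeral environments, tagged by env_id) and prints the
# names of environments older than the given cutoff in days.
list_stale_envs() {
  max_age="$1"
  awk -v max="$max_age" '$2 > max { print $1 }'
}
```

Anything this filter surfaces is either torn down automatically or flagged to its owner, keeping lingering resources (and their bills) from piling up.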

Where is Development?

By utilizing modern infrastructure and automation tooling at our disposal, Vivun is able to decouple development work from specific environments; enable testing to be shifted earlier in the development cycle; and, most importantly, reduce the delivery time for fixes, features, and all the goodies our customers enjoy.

Jonathan Call, November 1, 2022