Using Terraform to Manage DigitalOcean Resources

I am a fan of DigitalOcean. What they lack in breadth of services they more than make up for with ease of use, documentation, and tutorials. Last year, I overhauled this website to be driven by Ansible. This year, I want to take this automation to the next level. Ansible has capability gaps when it comes to creating infrastructure, and I’ve had to work around them by doing some tasks manually or by writing custom scripts.

One example is creating a managed database cluster. Ansible cannot do this, so I wrote a Python script to handle database management.
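To give a sense of what such a script involves, here is a minimal sketch. It is not the script from this project. It assumes DigitalOcean’s v2 API endpoint for database clusters and a personal access token; the function name, cluster name, engine, version, region, and size slug are all illustrative placeholders that should be checked against the current API documentation. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Assumed endpoint for DigitalOcean's v2 managed database API.
DO_DATABASES_URL = "https://api.digitalocean.com/v2/databases"


def build_create_cluster_request(token, name, engine="mysql", version="8",
                                 region="nyc1", size="db-s-1vcpu-1gb",
                                 num_nodes=1):
    """Build (but do not send) a request asking for a new database cluster.

    Every default value here is a placeholder, not a recommendation.
    """
    payload = {
        "name": name,
        "engine": engine,
        "version": version,
        "region": region,
        "size": size,
        "num_nodes": num_nodes,
    }
    return urllib.request.Request(
        DO_DATABASES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_create_cluster_request("example-token", "wordpress-db")
    print(req.get_method(), req.full_url)
    # Sending it for real would be: urllib.request.urlopen(req)
```

Keeping the request construction separate from the network call makes the script easy to test without touching a real account.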

I do not feel DigitalOcean should fill the gaps either. Why? Because Ansible is a configuration management tool that ensures resources are configured in a desired state. Infrastructure creation is not Ansible’s job. There are specific tools for infrastructure creation… Enter Terraform.

Terraform is a tool for defining providers (like DigitalOcean or AWS) and the resources (like droplets, load balancers, etc.) that your environment requires. Terraform’s intent is to compare your infrastructure to your desired state and make corrections to bring your resources into compliance. It is a different concern from HOW the infrastructure is configured.
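The idea of comparing actual infrastructure to a desired state can be shown with a toy example. This is only an illustration of the concept, not how Terraform is implemented; the function name and the resource names are made up.

```python
def plan(desired, actual):
    """Compare desired and actual resources and return the changes needed.

    Both arguments map a resource name to a dict of its settings.
    """
    to_create = sorted(name for name in desired if name not in actual)
    to_delete = sorted(name for name in actual if name not in desired)
    to_update = sorted(
        name for name in desired
        if name in actual and desired[name] != actual[name]
    )
    return {"create": to_create, "update": to_update, "delete": to_delete}


if __name__ == "__main__":
    desired = {
        "web-1": {"size": "2gb", "region": "nyc1"},
        "lb-1": {"algorithm": "round_robin"},
    }
    actual = {
        "web-1": {"size": "1gb", "region": "nyc1"},
        "old-db": {"size": "1gb"},
    }
    print(plan(desired, actual))
    # {'create': ['lb-1'], 'update': ['web-1'], 'delete': ['old-db']}
```

Nothing in this comparison says anything about what is installed on a droplet, which is the part that stays with a configuration management tool.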

Over the next few months, I’m going to migrate infrastructure concerns out of Ansible and into Terraform. In fact, I’ve already got a POC to share.

This repository defines the new standard for infrastructure that I am aiming for.

Here’s a simple mockup of the goal:

I did try to use the built-in graph functionality of Terraform to show this but it came out looking like this:

I’ve got boxes full of Pepe!

Anyways, it’s a work in progress. I’ve run into what I believe is a bug with the DigitalOcean Terraform provider and I’ve already raised a ticket with them to get it resolved.

Next time, let’s actually learn something and dig into a resource and the provider configuration.


WordPress on DigitalOcean Updates

This project hasn’t been touched in a few months, so last night I set out to give it a spin. It failed. Miserably. So I went bug-hunting and got it working again.

Resolved Issues

  • centos-base
    • Removed packages no longer available that were causing the role to fail.
    • Added package Glances to replace htop.
    • Fixed an issue with fail2ban configuration tasks.
    • Enhanced fail2ban configuration.
  • create-swap
    • Resolved a typo in a task that prevented the swap file from being created.
  • install-apache
    • Enabled gzip compression to reduce page load times.
    • Enabled caching to reduce page load times.
    • Removed hardcoded values in vhost.conf.j2 that would have resulted in a misconfigured HTTP to HTTPS redirect.
  • install-certbot
    • Fixed issues that would have prevented the automatic renewal cron job from being created.
  • create-droplet
    • Changed default droplet size from 1gb to 2gb.
  • destroy-droplet
    • Removed a hardcoded region that would have prevented deleting droplets not deployed in region NYC1.
  • install-wordpress
    • Fixed an issue where MySQL port was not being added to wp-config.php, preventing WordPress from starting.
    • Fixed an issue where Apache could not access document root.
    • Fixed an issue where wp-config.php was getting incorrect database connection details.
  • database-server
    • Simplified data returned from script used to create database servers to resolve an issue in install-wordpress.

See commit:

Next steps will be focusing on adding an automatic build job to help ensure that this code is always in good, working order.

Have a good weekend and take care of yourselves.


Speed Up Terraform Init

We have a lot of build processes that use Terraform to run destroy and apply commands. This results in a ton of terraform init runs that download the same provider plugins over and over again. You can recover this download time and reduce the risk of HashiCorp giving you the banhammer (which I haven’t heard of them doing but you never know) by configuring the provider plugin cache.
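As a minimal sketch of how a build job might turn the cache on, the helper below sets the TF_PLUGIN_CACHE_DIR environment variable, which Terraform reads to locate the cache. The function name and the default directory are my own choices for illustration. Terraform does not create the cache directory itself, so the helper creates it first.

```python
import os
from pathlib import Path


def terraform_env_with_plugin_cache(cache_dir=None, base_env=None):
    """Return a copy of the environment with TF_PLUGIN_CACHE_DIR set.

    The cache directory must exist before Terraform will use it,
    so it is created here if it is missing.
    """
    if cache_dir:
        cache = Path(cache_dir)
    else:
        # Illustrative default location; any writable directory works.
        cache = Path.home() / ".terraform.d" / "plugin-cache"
    cache.mkdir(parents=True, exist_ok=True)

    env = dict(os.environ if base_env is None else base_env)
    env["TF_PLUGIN_CACHE_DIR"] = str(cache)
    return env


if __name__ == "__main__":
    env = terraform_env_with_plugin_cache()
    print(env["TF_PLUGIN_CACHE_DIR"])
    # A build job could then run, for example:
    # subprocess.run(["terraform", "init"], env=env, check=True)
```

The same effect is available without any scripting by setting plugin_cache_dir in the Terraform CLI configuration file (~/.terraformrc on Linux and macOS).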

Give it a shot and speed up your terraform jobs!


Re-Architecting This Website VIII

“Done. For now…”

All of my goals for the first re-architecture of this website are now complete. This evening I fixed the backup script, added a role that installs EFF’s certbot, and updated the README to reflect the current status.

You can see what’s new here:

Have a good week. In the coming weeks, there will not be much activity as I’ll be changing gears to Kubernetes.


Re-Architecting This Website VII

“Baby steps.”

I’ve added a single change today. There’s a new role that will install and configure the DigitalOcean monitoring agent.

In the coming days, I’ve got a few more items to wrap up. Once those (minor) missing pieces are in place, I will call this project done and move on to something else, like re-re-architecting this website using Kubernetes.

You can see what’s new here:

Have a good weekend.


Re-Architecting This Website III

“Automate all the things!”

This weekend I’ve been busy laying much of the groundwork required to be able to re-deploy this website. In part one, I provided a diagram that showed how the entire website, including the database, is deployed on a single droplet (VM). While this model has worked well for about three years, it’s time to move on. To that end, I’ve begun building automation to create and manage WordPress sites.

I’ve created a repo on GitHub to house the infrastructure code. It’s available here:

As you can see, I’ve decided to continue using DigitalOcean. They’ve proven reliable. Unfortunately, DigitalOcean lacks some things, like NFS, that would make the website even more scalable. Still, I don’t feel they’re necessary at this point.

Currently, I have the ability to create/destroy VMs, create/destroy/attach storage volumes, and a decent portion of the application installation is complete. It’s capable of going from nothing to the WordPress setup page over HTTP.

Finally, I know I could just move all of this to AWS Lightsail but what’s the fun in that? I learn best by doing and I want to have a deep understanding of all the moving parts. Once I’m done building this out with DO, I might do the same on the other major cloud vendors.


Re-Architecting This Website I

Let’s embark on a journey to re-architect this website to something that is more resilient. Currently, the entire website runs on a single DigitalOcean droplet.

Backups are handled through a Bash script that zips up the application directory, dumps the DB, and tars up the output.
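For readers who want to picture the flow, here is a rough Python sketch of the same idea. It is not the Bash script described above, and it simplifies the steps into a single compressed tarball. The function name, file names, and the example dump command are assumptions for illustration.

```python
import subprocess
import tarfile
import time
from pathlib import Path


def backup_site(app_dir, out_dir, dump_cmd=None):
    """Archive app_dir, plus an optional database dump, into one .tar.gz.

    dump_cmd is a command list whose stdout is the SQL dump, for example
    ["mysqldump", "wordpress"]; that command and database name are
    placeholders, not details taken from the real script.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = out_dir / f"backup-{stamp}.tar.gz"
    dump_file = out_dir / f"db-{stamp}.sql"

    with tarfile.open(archive, "w:gz") as tar:
        # Store the application files under a fixed top-level folder.
        tar.add(str(app_dir), arcname="app")
        if dump_cmd:
            # Write the dump to a temporary file, add it, then clean up.
            with open(dump_file, "wb") as fh:
                subprocess.run(dump_cmd, stdout=fh, check=True)
            tar.add(str(dump_file), arcname="db.sql")
            dump_file.unlink()
    return archive


if __name__ == "__main__":
    # Paths below are hypothetical examples.
    print(backup_site("/var/www/html", "/tmp/backups"))
```

A single script like this still leaves the backup on the same droplet as the site, which is part of why the setup could be better.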

This could be better. Let’s make this better. As the website gets re-architected, I’ll provide diagrams and links to the code & documentation leveraged.


Upgrading Atlassian Applications via Ansible

Over the next few weeks, I’ll be doing write-ups on how I upgrade some of our developer tools. Specifically, I’ll focus on the Atlassian applications I manage, but the process for our other tools (like Artifactory, SonarQube, etc.) is largely the same and most of the Ansible Roles are common across all of the applications.

For now, the playbooks and roles will not be available but I’ll post relevant snippets where I feel it’s needed.

Finally, we use RHEL/CentOS internally (for hosting our internal tools; we have other OSes for doing builds), so our playbooks are oriented towards a single OS.


Building Something from Nothing

It’s pretty cool to build something that people grow to rely on. It’s also a little terrifying for someone like me who has a habit of second-guessing decisions and who tends to be risk averse.

A few years back, the developer who writes our installers made changes to enable them to run in headless mode. To facilitate testing, I wrote a small PowerShell script that ran through the basic installation steps that our engineers and QA followed manually. My goal was to give this script to our QA department so they could download the build from our build system and with a few parameters, run this script against their testing environments to perform an upgrade.

We demoed it to our R&D organization. We showed how it could be tied into the build system so that when the build was completed, it could kick off a “deployment” job that would upgrade environments. It was pretty straightforward: Download the build binaries into a folder called “artifacts” in a sibling directory to the script and then execute the script with a few parameters.

For over a year, no one really bit. My colleague and I just used it for installer testing, mostly. That’s not completely true. There were a few QA teams that decided to start using it, but if I had to guess, the automated upgrade script was being used for approximately one in twenty test environments.

At some point, the organization began to build out a performance testing team. They had requirements around automating the deployment of environments with more than one server. They were trying to replicate how our application was run in production environments, which meant that there might be dozens of VMs running different portions of our product.

Well, perhaps unfortunately for them, I was the only person really fooling with ways to automate the deployment of the product. I took their requirements and changed my script so that you could give it all the same parameters as before but also give it a “run mode” flag that’d tell it what kind of work it’d be doing. For example, perhaps you wanted to stop or start all of the application services, or just update the database, or maybe just install a subset of the product’s functionality against the target VM. I (quite kludgily, if that’s a word) added that functionality to the existing script, all while sticking to my original goal: Have a single PowerShell script that a user could run from anywhere with just a few parameters.

No one bit and the requester exited the company shortly thereafter.

Around the time I was asked to extend the functionality of my script, other teams began researching how to build out full automation for our SaaS environments. I sat with them and went over what I had devised. Obviously, what I had would not scale but I at least had sorted out the process to upgrade an environment and scripted it out. They began building a large, scalable automation system. I later found out that for several months my script was tucked inside their automation system doing some of the heavy lifting until they had re-written everything.

One day out of the blue, my manager sent me a link to a repository owned by the performance team. They had taken the script I had written and begun developing additional tooling to coordinate their use case of large-scale deployments. At the core, it was calling my script to do all of the installation work. I was both elated and annoyed. It felt like my work, which had largely been ignored or dismissed as insufficient, was being used and built on for bigger and better things. Additionally, the only direct proof that this was my code was a single comment left in the script that had my name and email and some commit messages vaguely referring to my repository.

In short, it felt like a group of engineers grabbed my work, tweaked a couple things, and ran with it without giving credit where credit was due. Still, I swallowed my pride quickly. The other side of the coin was this: I had built something that was seen as worthy of being reused. I had never done that before. Never mind that I was never hired as a developer. I came onto the build team through a string of transfers / promotions and built the script in my spare time. It wasn’t robust. It wasn’t easily extendable. It was not built with any solid patterns. My goal was to write a single, portable tool that people could use to upgrade their single environments, and it had grown into something much, MUCH larger.

Nowadays, it’s used to upgrade several dozen testing environments, including several large multi-VM product environments, each day, some several times a day. Though I’ve never measured it, I’m positive it has by now given hundreds and hundreds of hours back to the engineering team and rescued them from a boring, thankless, and sometimes tedious task.

I learned a lot from building my automation tool. There’s a lot I wish I had done differently, but that’s okay. It’s in maintenance mode these days, it mostly does everything that it’s ever been asked to do, and it continues to be relied on two years after I demoed it initially. People keep waxing poetic about a replacement but it hasn’t happened yet.

EDIT: I recently came across another instance of this code being used without attribution (which is fine, but goes against the spirit of code sharing) for some tools being shipped to a customer. 😐


This Old Blog

I originally created this blog as a way to document my career-work. I have a real passion for what I do and I truly enjoy sitting down and looking for refinements in an existing system or process. Still, my personal passion is not my career. I draw a clear line between my personal interests and the work I do to pay the bills. I know that doesn’t work for all people. We’re told to follow our passions and I do but just not through my 9-5 type work.

Still, I haven’t written about my work in a long time, so here’s my attempt to lay out some of the projects I’ll be working on and documenting next year.

  1. Automate the deployment of all our internal applications. Currently our internal tools are upgraded manually, approximately every six months. I’d like to be able to do that on a regular basis and have it be touchless. I’ve already laid the groundwork, demoed it to my team, and I’m over 50% complete.
  2. Create an SSP for common requests we currently handle. There’s a lot of work that’s simple and repetitive for us (like creating repos, build configurations, user provisioning) that we’d be better off not dealing with. I propose a self-service portal for teams to use to get these things done faster and without us being directly involved with each request.
  3. Automate all of our release procedures. We do some work, like release branching and certain packaging tasks, manually today. I’ve already automated a lot of this but it needs to be rolled out fully so some of our tasks become virtually touchless.
  4. Completely automate provisioning of our build infrastructure and expose the provisioning scripts to engineering so they can review, propose changes, or even make those changes themselves.

So yeah, I think the theme of 2018 is automation and standardization. It’s gonna be a good year, just gonna send it.