Creating a Dynamic Inventory Script for Ansible

It seems no one has written a blog post on creating dynamic inventory scripts for Ansible in a while. I feel this topic could use an update as some of the information I found was incomplete or out of date.

My goal was to convert Terraform’s tfstate data from DigitalOcean into a usable inventory script. Keep that in mind, as it drove many specifics of how the script works. I also want to note that the script I reference is a first pass at getting a working inventory script.

So first, the script (in its current state):

#!/usr/bin/python3

import subprocess
import argparse
import json

relevant_tf_state_values = {
    'digitalocean_droplet': ['name', 'ipv4_address', 'ipv4_address_private', 'tags'],
    'digitalocean_database_cluster': ['name', 'host', 'private_host', 'port'],
    'digitalocean_database_user': ['name', 'password'],
    'digitalocean_database': ['name'],
    'digitalocean_domain': ['id'],
    'digitalocean_volume': ['name', 'size', 'initial_filesystem_type'],
    'digitalocean_ssh_key': ['name', 'fingerprint']
}

extra_vars = {
    'ansible_ssh_user': 'root',
    'web_mount_point': '/mnt/nfs/data',
    'web_mount_point_type': 'nfs',
    'ansible_ssh_common_args': '-o StrictHostKeyChecking=no -o userknownhostsfile=/dev/null'
}

class DigitalOceanInventory(object):

    def __init__(self):
        self.tags = []
        self.droplets = []
        self.vars = {}
        self.inventory_json = json.loads(self._get_terraform_output())
        self._generate_groups()
        self._generate_vars()
        self.ansible_inventory = self._generate_ansible_inventory()
    
    def _get_terraform_output(self):
        # check=True raises CalledProcessError if Terraform fails, rather than
        # handing an empty string to json.loads() in __init__.
        result = subprocess.run(['terraform', 'show', '-json'],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True,
                                check=True)
        return result.stdout

    def _parse_resource(self, resource, resource_type, relevant_objects):
        data = {}
        for key, value in resource['values'].items():
            if key in relevant_objects:
                data[f'{resource_type}_{key}'] = value
        return data

    def _generate_groups(self):
        tags = 'digitalocean_tag'
        droplets = 'digitalocean_droplet'
        for resource in self.inventory_json['values']['root_module']['resources']:
            if resource['type'] == tags:
                self.tags.append(resource['values']['name'])
            elif resource['type'] == droplets:
                self.droplets.append(self._parse_resource(resource, droplets, relevant_tf_state_values[droplets]))

    def _generate_vars(self):
        for resource in self.inventory_json['values']['root_module']['resources']:
            # Tags and droplets are handled in _generate_groups(), so skip them here.
            if resource['type'] in relevant_tf_state_values.keys() and resource['type'] not in \
                    ['digitalocean_tag', 'digitalocean_droplet']:
                for key, value in resource['values'].items():
                    if key in relevant_tf_state_values[resource['type']] and key not in ['ip', 'tags']:
                        resource_id = resource['type']
                        self.vars[f'{resource_id}_{key}'] = value
                for key, value in extra_vars.items():
                    self.vars[key] = value

    def _generate_ansible_inventory(self):
        inventory = {}
        for tag in self.tags:
            hosts = []
            public_ips = []
            private_ips = []
            inventory[tag] = {}
            for droplet in self.droplets:
                if tag in droplet['digitalocean_droplet_tags']:
                    hosts.append(droplet['digitalocean_droplet_ipv4_address'])
                    public_ips.append(droplet['digitalocean_droplet_ipv4_address'])
                    private_ips.append(droplet['digitalocean_droplet_ipv4_address_private'])
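            # Assigned once per tag, outside the droplet loop, so both keys exist even with no droplets.
            # Every group shares the same self.vars dict, so per-tag IP lists end up visible to all groups.
            inventory[tag]['hosts'] = hosts
            inventory[tag]['vars'] = self.vars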
            ansible_tag = tag.replace('-', '_')
            inventory[tag]['vars'][f'{ansible_tag}_public_ips'] = public_ips
            inventory[tag]['vars'][f'{ansible_tag}_private_ips'] = private_ips
            if 'digitalocean_volume_name' in inventory[tag]['vars']:
                nfs_mount_point = str('/mnt/' + inventory[tag]['vars']['digitalocean_volume_name'].replace('-', '_'))
                inventory[tag]['vars']['nfs_mount_point'] = nfs_mount_point
        inventory['_meta'] = {}
        inventory['_meta']['hostvars'] = {}
        return inventory

    def get_inventory(self):
        return json.dumps(self.ansible_inventory, indent=2)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--save', '-s', help='Generates Ansible inventory and stores to disk as inventory.json.',
                        action='store_true')
    parser.add_argument('--list', action='store_true')
    args = parser.parse_args()
    do = DigitalOceanInventory()
    if args.list:
        print(do.get_inventory())
    elif args.save:
        with open('inventory.json', 'w') as inventory:
            inventory.write(do.get_inventory())


if __name__ == '__main__':
    main()

At a high level, we’re getting the tfstate from Terraform by running the following command: terraform show -json. Next, we generate hostgroups by piggybacking on the tags added to host resources during creation. Then we parse through the other resources to get the subset of information that we’re interested in. After that, we build a Python object with all the data in the desired format. Finally, we dump it as a JSON object and either print it to stdout or write it to inventory.json.

The inventory output looks something like this:

{
  "tag-name-node": {
    "hosts": [
      "10.0.0.1"
    ],
    "vars": {
      "digitalocean_ssh_key_fingerprint": "00:11:22:33:44:55:66:77:88:99:AA:BB:CC:DD:EE:FF",
      "digitalocean_ssh_key_name": "sshkeyname",
      "ansible_ssh_user": "root",
      "web_mount_point": "/mnt/nfs/data",
      "web_mount_point_type": "nfs",
      "ansible_ssh_common_args": "-o StrictHostKeyChecking=no -o userknownhostsfile=/dev/null",
      "digitalocean_database_cluster_host": "something.ondigitalocean.com",
      "digitalocean_database_cluster_name": "db-name",
      "digitalocean_database_cluster_port": 25060,
      "digitalocean_database_cluster_private_host": "private.something.ondigitalocean.com",
      "digitalocean_database_user_name": "wordpress",
      "digitalocean_database_user_password": "password",
      "digitalocean_domain_id": "something.com",
      "digitalocean_volume_initial_filesystem_type": "ext4",
      "digitalocean_volume_name": "volume-name",
      "digitalocean_volume_size": 5,
      "nfs_node_public_ips": [
        "10.0.0.1"
      ],
      "nfs_node_private_ips": [
        "10.0.0.1"
      ],
      "nfs_mount_point": "/mnt/barista_cloud_volume"
    }
  },
  "_meta": {
    "hostvars": {}
  }
}

Now, if you try to feed this to Ansible as an inventory file, it will not be parsed correctly. The dynamic inventory JSON format is not the same as the JSON inventory format. This took me a while to figure out and is honestly kind of frustrating, because it makes creating a working JSON template that you can iterate on and test quickly much more difficult than it needs to be. On the topic of gotchas, here are a few more to be aware of.

  1. Your inventory script does not have to be written in Python, but it must include a shebang at the top of the script so it can be executed (also it must be executable so chmod +x your script).
  2. The inventory script must accept the flag --list. It’s supposed to also accept --host and return details on a single host but I have not needed it nor implemented it.
  3. Even if you are not adding vars for specific hosts, you MUST include the _meta section in your inventory.
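
Going back to the format mismatch mentioned before the list, here is a minimal sketch of turning script-style output into the nested layout a static inventory file uses. The function name and the sample data are made up for illustration. It assumes each group is a dict with hosts and vars (no child groups), as in the output above, and that the result is saved with a .json or .yaml extension so Ansible's YAML inventory plugin picks it up.

```python
import json


def script_output_to_static_inventory(script_inventory):
    """Convert --list style output into the nested layout of a static inventory file."""
    hostvars = script_inventory.get('_meta', {}).get('hostvars', {})
    children = {}
    for group, data in script_inventory.items():
        if group == '_meta':
            continue
        children[group] = {
            # Static inventories map each host name to its vars instead of listing hosts.
            'hosts': {host: hostvars.get(host, {}) for host in data.get('hosts', [])},
            'vars': data.get('vars', {}),
        }
    return {'all': {'children': children}}


# Hypothetical sample in the dynamic (script) format.
dynamic = {
    'web': {'hosts': ['10.0.0.1'], 'vars': {'ansible_ssh_user': 'root'}},
    '_meta': {'hostvars': {}},
}
static = script_output_to_static_inventory(dynamic)
print(json.dumps(static, indent=2))
```

The main difference is that the script format lists hosts as an array and keeps host vars under _meta, while the static format nests groups under all.children and maps each host name to its own vars.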

That’s about it. I will probably come back around and clean this script up and make it more reusable. Heck, I might put together a boilerplate script that can make creating custom dynamic inventory scripts quicker. As mentioned before, this is a first pass attempt to get something that works for my use case.
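
As a rough idea of what such a boilerplate could look like, here is a minimal skeleton that handles both --list and --host. It is not the script above. build_inventory() and its data are hypothetical placeholders, and falling back to the full inventory when no flag is given is my own choice.

```python
#!/usr/bin/env python3
import argparse
import json
import sys


def build_inventory():
    # Hypothetical placeholder data; a real script would query its source here.
    return {
        'web': {'hosts': ['10.0.0.1'], 'vars': {'ansible_ssh_user': 'root'}},
        '_meta': {'hostvars': {'10.0.0.1': {'role': 'frontend'}}},
    }


def render(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument('--list', action='store_true')
    parser.add_argument('--host')
    args = parser.parse_args(argv)
    inventory = build_inventory()
    if args.host:
        # --host returns the vars for one host, or an empty object if there are none.
        return json.dumps(inventory['_meta']['hostvars'].get(args.host, {}))
    # --list (and, in this sketch, no flag at all) returns the whole inventory.
    return json.dumps(inventory, indent=2)


if __name__ == '__main__':
    print(render(sys.argv[1:]))
```

When _meta.hostvars is present in the --list output, Ansible reads host variables from there and does not call the script with --host for every host, which is likely why skipping --host has worked fine so far.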

Finally, I feel I would be remiss if I did not include the tidbits of info I found scattered around the web that helped me figure this out.

https://www.jeffgeerling.com/blog/creating-custom-dynamic-inventories-ansible (Jeff, as always, is an invaluable resource on all things Ansible.)

https://docs.ansible.com/ansible/2.9/dev_guide/developing_inventory.html

https://adamj.eu/tech/2016/12/04/writing-a-custom-ansible-dynamic-inventory-script/

That’s all, folks. Have a good weekend!

 

Using Terraform to Manage DigitalOcean Resources

I am a fan of DigitalOcean. What they lack in breadth of services they more than make up for with ease of use, documentation, and tutorials. Last year, I overhauled this website to be driven by Ansible. This year, I want to take this automation to the next level. There are capability gaps when using Ansible to create infrastructure that I’ve had to work around by doing some tasks manually or by writing custom scripts.

An example of this comes when trying to create a managed database cluster. Ansible cannot do this so I wrote a Python script to handle database management.

https://github.com/seaburr/WordPressOnDigitalOcean/tree/master/roles/database-server

I do not feel DigitalOcean should fill the gaps either. Why? Because Ansible is a configuration management tool that ensures resources are configured in a desired state. Infrastructure creation is not Ansible’s job. There are specific tools for infrastructure creation… Enter Terraform.

Terraform is a tool for defining providers (like DigitalOcean or AWS) and the resources (like droplets, load balancers, etc.) that your environment requires. Terraform’s intent is to compare your infrastructure to your desired state and make corrections to bring your resources into compliance. It is a different concern from HOW the infrastructure is configured.

Over the next few months, I’m going to migrate infrastructure concerns out of Ansible and into Terraform. In fact, I’ve already got a POC to share.

https://github.com/seaburr/Terraform-On-DO

This repository defines the new standard for infrastructure that I am aiming for.

Here’s a simple mockup of the goal:

I did try to use the built-in graph functionality of Terraform to show this but it came out looking like this:

I’ve got boxes full of Pepe!

Anyways, it’s a work in progress. I’ve run into what I believe is a bug with the DigitalOcean Terraform provider and I’ve already raised a ticket with them to get it resolved.

Next time, let’s actually learn something and dig into a resource and the provider configuration.

 

Speed Up Terraform Init

We have a lot of build processes that use Terraform to run destroy and apply commands. This results in a ton of terraform init runs that download the same provider plugins over and over again. You can recover this download time and reduce the risk of HashiCorp giving you the banhammer (which I haven’t heard of them doing, but you never know) by configuring the provider plugin cache.

https://www.terraform.io/docs/configuration/providers.html#provider-plugin-cache
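
If your build jobs are driven from Python, one way to turn the cache on is the TF_PLUGIN_CACHE_DIR environment variable (the other is a plugin_cache_dir setting in the Terraform CLI config file). Below is a minimal sketch. The function names and the default cache path are my own choices, and it assumes the terraform binary is on PATH.

```python
import os
import subprocess
from pathlib import Path


def plugin_cache_env(cache_dir, base_env=None):
    """Return a copy of the environment with TF_PLUGIN_CACHE_DIR set."""
    cache_path = Path(cache_dir).expanduser()
    # Terraform expects the cache directory to exist already, so create it up front.
    cache_path.mkdir(parents=True, exist_ok=True)
    env = dict(os.environ if base_env is None else base_env)
    env['TF_PLUGIN_CACHE_DIR'] = str(cache_path)
    return env


def terraform_init(workdir, cache_dir='~/.terraform.d/plugin-cache'):
    # Runs terraform init in workdir with the plugin cache enabled.
    return subprocess.run(['terraform', 'init', '-input=false'], cwd=workdir,
                          env=plugin_cache_env(cache_dir), check=True)
```

Pointing every job on a build agent at the same cache directory means each provider version is downloaded once and reused by later init runs.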

Give it a shot and speed up your terraform jobs!