No, Watson, this was not done by accident, but by design.
— Sherlock Holmes
It’s no small task to create a plan for something entirely consistent with itself. This really is the definition of good design: consistency. So, with that in mind, my research partner and I began planning how we were going to take a problem – “We need an API for interacting with Libvirt on multiple hosts!” – and come up with a solution.
First: Break it down
We needed a way to:
- Easily detect libvirt-hosts on a network, and store their IP addresses
- Directly communicate with those libvirt instances to check on the health and status of their guest VMs
- Separate this API from other applications and platforms running concurrently to ours
Identifying the Libvirt Hosts
Our first thought was to create a daemon that would be installed on each libvirt host in the cluster. This way, RESTful calls could be made to the daemon, which would then run the relevant virsh command. This presented an access problem, though: one would need direct access to the local network of the libvirt hosts in order to use the API.
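The core of that daemon idea is a mapping from REST-style actions onto virsh invocations. Here’s a minimal sketch of that mapping – the action names and the `buildVirshArgs` helper are illustrative, not our actual implementation:

```javascript
// Map an API action onto the virsh arguments that implement it.
// Only a few read-only status actions are shown here.
function buildVirshArgs(action, domain) {
  const actions = {
    list: ['list', '--all'],       // all guests defined on this host
    status: ['domstate', domain],  // running / shut off / paused
    info: ['dominfo', domain],     // CPUs, memory, state
  };
  if (!actions[action]) throw new Error(`unknown action: ${action}`);
  return actions[action];
}

// A real daemon would then spawn virsh with these arguments and return
// its output in the HTTP response, e.g.:
//   const { execFile } = require('child_process');
//   execFile('virsh', buildVirshArgs('status', 'guest01'), callback);
```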
To solve this problem, we decided to write a web-server program to act as the host of whatever interface the client wanted for the API. So long as the interface host was on the same network as the libvirt hosts, it would have a path to them. Also, if the user chose to connect the interface host to a public network as well, an admin could use the API even if they were woken up in the middle of the night by an emergency VM failure.
On a practical level, this led to another problem: How would the web-server know which IPs on the network were libvirt hosts? We had three options:
- Manually enter each libvirt host IP into the interface’s configuration
- Manually enter the interface-server’s IP address in the configuration file of the daemon on each libvirt host
- Automate the process in some way
We chose number 3, and here’s how we did it:
The Daemon (Or, a Story of Nmap-ing)
First, we had to ensure that the daemon would be listening for API calls on a TCP port that wasn’t used by any other application on any of the machines in the cluster. This could be set to an obscure default and/or defined at install-time.
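That setting could live in a small configuration file the daemon reads at startup. This is a hypothetical example – the port number and file format are placeholders, not what we actually shipped:

```json
{
  "listen_port": 47812,
  "bind_address": "0.0.0.0"
}
```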
Second, we theorized that all the interface-server would have to do is test each IP on the cluster network to see if that particular port was open on whichever IP it was testing – if it was, the daemon was installed and it was hosting virtual machines. If not, it was an active host on the cluster that had another function. Either way, this would automate the discovery of the nodes our API was to manage.
With this approach, no special configuration was needed for the daemons other than to specify which port to keep open, and even then only if the default was unavailable.
We decided to write a program, the Crawler, that would use the extremely powerful Nmap Linux utility to quickly test a client-specified CIDR range in the manner described above. But in a huge cluster of possibly hundreds (or more) machines, could this cause network congestion? We weren’t sure, but just in case we split the Crawler’s function into four:
- Scan the entire network, recording IPs of hosts running libvirt, and hosts that were not
- Periodically rescan the portion of the CIDR range that initially contained no active IPs, checking for new hosts of either kind
- Scan the hosts that were not running libvirt to see if there was a change
- Periodically probe the libvirt-hosts the Crawler had discovered to ensure there was an active connection with the daemon
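The first step in all four phases – expanding the client-specified CIDR range into individual addresses to probe – can be sketched like this (a simplified version assuming prefixes between /1 and /30; the function names are illustrative):

```javascript
// Convert dotted-quad notation to an unsigned 32-bit integer and back.
function ipToInt(ip) {
  return ip.split('.').reduce((acc, oct) => (acc << 8) + Number(oct), 0) >>> 0;
}

function intToIp(n) {
  return [24, 16, 8, 0].map((s) => (n >>> s) & 255).join('.');
}

// Return the usable host addresses in a CIDR block, excluding the
// network and broadcast addresses. Assumes prefixes /1 through /30.
function cidrHosts(cidr) {
  const [base, prefix] = cidr.split('/');
  const mask = (~0 << (32 - Number(prefix))) >>> 0;
  const start = ipToInt(base) & mask;
  const count = 2 ** (32 - Number(prefix));
  const hosts = [];
  for (let i = 1; i < count - 1; i++) hosts.push(intToIp((start + i) >>> 0));
  return hosts;
}
```

Each address returned would then be probed on the daemon’s port and filed into one of the two host lists.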
Though we may have been overthinking it, we figured that this would cover every possible problem arising from this automated system.
The interface-server would need a way to transmit calls to the daemons of specific hosts, and receive data back from those calls. For this, we conceived of the Agent, which acts as a proxy for calls from the user interface to the daemons, and results from the daemons to the user interface.
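The Agent’s job boils down to mapping a user-interface request about a given host onto the URL of that host’s daemon. A minimal sketch – the port number and URL path scheme are hypothetical:

```javascript
const DAEMON_PORT = 47812; // hypothetical default daemon port

// Build the daemon URL for an action, optionally scoped to one guest VM.
function daemonUrl(hostIp, action, domain) {
  const path = domain
    ? `/vms/${encodeURIComponent(domain)}/${action}`
    : `/${action}`;
  return `http://${hostIp}:${DAEMON_PORT}${path}`;
}

// A real Agent would issue the HTTP request to this URL and relay the
// daemon's response back to the user interface unchanged.
```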
The final piece of the conceptual puzzle was a way for the client to actually use this RESTful API in a meaningful and streamlined fashion. Seeing as we already had an interface-server to allow for external access to the interface, a web-based application seemed to make the most sense.
I began the design process yesterday, and began developing the interface this afternoon. You can read about it here.
We now needed:
- A way for the web-server to access and store vital information without unreasonably increasing the resource footprint of the application
- A way to encapsulate a web-server and the scripts that make it do what we need it to do
For the first point, Diogo discovered Redis – a NoSQL database that keeps its data in RAM while the server is running, making reads and writes fast. He’s written a post about it here.
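One way the Crawler’s findings could be laid out in Redis is a set per host category plus a hash per host. The key scheme here is hypothetical – a real client (e.g. node-redis) would issue these commands against the server:

```javascript
// Hypothetical Redis key for one host's metadata hash.
function hostKey(ip) {
  return `host:${ip}`;
}

// Commands the Crawler might issue as it classifies each probed IP:
// add the IP to the right set, then stamp when it was last seen.
function recordHost(ip, isLibvirt) {
  return [
    ['SADD', isLibvirt ? 'hosts:libvirt' : 'hosts:other', ip],
    ['HSET', hostKey(ip), 'last_seen', Date.now().toString()],
  ];
}
```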
For the second, we decided to use Node.JS to write a robust, efficient and portable web server. This will likely be detailed in a later post.
Now the work begins! Diogo has been furiously writing the logic for our server (using Node.JS) while I’ve been busy developing the user interface.
Expect more to come!