When you’re managing multiple servers in multiple regions for multiple applications across multiple cloud vendors, it’s difficult to keep track of what’s happening where. With hundreds of servers, it can get complicated faster than you think. So, being curious, I searched for how the rest of the world does it and got some amazing ideas from folks who manage hundreds of instances in AWS/Azure.
A good starting point would be this article from the Wikimedia Foundation. This comment from Reddit also gives a concise summary of effective design (bonus point: search for similar threads on Reddit to get some amazing insights).
The convention I follow
Managing servers has two aspects. Either you know the nitty-gritty details of a server (e.g. a manually created AWS EC2 instance), so you treat it as a pet; or you just know that you have a bunch of servers that do the same thing (e.g. servers created during auto scaling) or different things for microservices (e.g. containers in Kubernetes), so you manage them as cattle. You can name your pets, but you can’t name every cow in a herd, so you need separate ways of dealing with names.
Consequently, I settled on the following convention/ucode (Unique CODE), following the Wikimedia Foundation:
<location-code>-<mount-as>-<application-code><NN>-[:name-prefix]-<deployment-type>, details of the code below:
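As a rough illustration, the convention can be checked and split mechanically. This is a sketch under my own assumptions about field widths (3–4 letter location code, 2-digit NN, one-letter deployment type), not part of the convention itself:

```python
import re

# Sketch of a ucode parser; field widths are assumptions
# (3-4 letter location, 2-digit NN, one-letter deployment type).
UCODE_RE = re.compile(
    r"^(?P<location>[A-Z]{3,4})"          # IATA (cloud) or ICAO (on-prem)
    r"-(?P<mount_as>[A-Z]+\d*)"           # e.g. DY, AMI, E365
    r"-(?P<app>[A-Z]+)"                   # application code
    r"-?(?P<nn>\d{2})"                    # auto-incremented numeric entry
    r"(?:-(?P<prefix>[a-z][a-z0-9-]*))?"  # optional name-prefix
    r"(?P<deploy>[DUPLSTX])$"             # deployment type
)

parts = UCODE_RE.match("SFO-DY-INDEX-01P").groupdict()
print(parts)
```

Parsing the worked example at the end of this article yields location `SFO`, mount-as `DY`, application code `INDEX`, NN `01`, and deployment type `P`.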
Location Code (Geographic location+Provider):
- [Cloud-based: 3 characters] → Nearest airport to the area where the datacenter/server is located. It must be a valid 3-letter IATA code (heavily inspired by Cloudflare). Later we will use this code to denote a particular hosting/cloud vendor per area. Obviously, this has the problem of two or more vendors being in the same city or close vicinity. In that case, assign the IATA code on a first-come-first-assigned basis, and for subsequent vendors use an IATA code from a nearby state or country.
AWS and Google Cloud are both present in the Frankfurt, DE area. So, they are coded with future possibilities in mind:
FRA is used to refer to Google Cloud Platform
CGN for future usage – 1 (using a nearby state)
Singapore, SG hosts multiple datacenters for multiple cloud vendors (e.g. AWS, DigitalOcean, Google Cloud, Azure). So, a coding example here:
SIN stands for AWS
XSP for future usage – 1 (using a nearby state)
PKU for future usage – 2 (using a nearby country)
JHB for future usage – 3 (using a nearby country)
- [On-premise: 4 characters] → ICAO code of the airport nearest to where the on-premise server is located.
EGLL for the London, UK site’s on-premise infra by Equinix.
When no regions are mentioned and everything is a generic term (e.g. Global/America/Europe), use a close-enough approximation. For instance, Heroku lists Europe and America as regions, so
BSL is used for the Europe region and
SFO is used for America.
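One nice property of this scheme is that cloud vs. on-premise can be told apart from the code length alone. A minimal sketch, with the example assignments above recorded in a plain dict (these are my own assignments, not official vendor identifiers):

```python
# Example location-code assignments from this article; the codes are
# this convention's own assignments, not official vendor identifiers.
LOCATION_CODES = {
    "FRA": "Google Cloud Platform, Frankfurt DE",
    "SIN": "AWS, Singapore SG",
    "EGLL": "Equinix on-premise, London UK",
    "BSL": "Heroku, Europe region",
    "SFO": "Heroku, America region",
}

def is_on_premise(location_code: str) -> bool:
    # Per the convention: 3 letters = IATA (cloud), 4 letters = ICAO (on-premise).
    return len(location_code) == 4
```

So `is_on_premise("EGLL")` is true, while the three-letter cloud codes are not.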
With the location code, you’ve got an idea about the vendor and datacenter location.
mount-as should tell how the instance gets mounted into your overall application hierarchy or organization network topology (i.e. the private IPv4 address space). To elaborate, consider that your application starts with a single Unix machine, and many other conceptual machines/topologies (e.g. devices/servers/clusters/VPCs) will be mounted gradually so that your application can scale and support multiple functionalities. So, you need a uniform way to mount a particular device (i.e. technology stack) into your machine (i.e. application) to manage them in the long run.
Usually, I use a 1-character code for personal devices, a 2–3 character code for server clusters, and 4 characters for server/container groups. This gives the opportunity to align user devices and servers under the same convention.
The codes for mounting parameter might be:
- M; L; V → Mobile device; Laptop; Virtual box
- H → shared Hosting
- IAAS Systems:
- AMI → Amazon EC2 Machine Instance
- GCI → Google Cloud Instance
- VPS Systems:
- DL → DropLet, a rented virtual private server from DigitalOcean
- LS → LightSail, an offering from AWS to provision private server
- PAAS Systems:
- EB → Elastic Beanstalk platform from AWS
- GAE → Google App Engine platform from Google Cloud
- DY → DYno-based computing resource from Heroku
- FAAS Systems:
- LM → LaMbda, aws managed infra to run code without provisioning servers
- PWD → platform to work with docker concepts in an interactive way (e.g. Play-With-Docker.com)
- ENNN → External services which are considered part of the system (e.g. SAAS systems). Here, any valid E code should be used to identify a specific service. To pick the digits of an E code, an applicable vanity number of the service might be helpful (for instance, E365 for Office365 services, E442 for GitHub, or E107 for Slack).
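The 1/2–3/4-character split described above can also be applied mechanically. A hypothetical helper that simply follows the stated length rule (treating the ENNN codes as the 4-character group):

```python
def device_class(mount_code: str) -> str:
    # Classify a mount-as code purely by length, per the convention:
    # 1 char = personal device, 2-3 = server-cluster,
    # 4 (incl. ENNN codes like E365) = server/container-group.
    length = len(mount_code)
    if length == 1:
        return "personal device"
    if length in (2, 3):
        return "server-cluster"
    return "server/container-group"
```

For example, `device_class("M")` classifies a mobile device, while `device_class("E365")` falls into the group bucket.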
Application code: it should represent the business connection of the instance. Instances reachable through HTTP should get a pre-defined FQDN as well as a valid private IPv4 address (not repeatable in the organizational topology). The FQDN code might be defined in one of these approaches:
- Identified as a subdomain of the application domain (e.g. for an application called mirror, the subdomain may become mirror.example.com).
- Selected in a way so that regex based automation codes can group all related items together. For starters, it might be Index(Branding)/Company-name/Application-name.
- For advanced usage, it’s sometimes auto-generated by the CI/CD pipeline as per the head commit’s tag (e.g. a subdomain.example.com CNAME may be resolved to a23cde4.example.com, where a23cde4 is dynamically generated by the CI/CD pipeline).
Just to add, in my case I use this code as the corresponding Git repository name. So, for DEV/UAT/PROD environments, CI/CD systems use separate IAM accounts to clone the same repository and proceed accordingly.
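For the CI/CD-generated case above, the dynamic label can be derived from the head commit. A sketch, where the 7-character short-hash length and the example.com domain are my assumptions:

```python
def deploy_subdomain(commit_sha: str, base_domain: str = "example.com") -> str:
    # The first 7 hex characters mimic `git rev-parse --short HEAD`;
    # a CI/CD pipeline would then create the CNAME for this label.
    return f"{commit_sha[:7]}.{base_domain}"

print(deploy_subdomain("a23cde4f01b2c3d4e5f6"))  # a23cde4.example.com
```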
NN: a numeric entry, which should be unique and auto-incremented within the (mount-as)+(application-code) scope. For immutable instances, the NN value should never be re-used when the CI/CD server deploys a new instance of the code (if possible), because the auto-incremented value can be managed in code. Another aspect of this code is to supply a unique prefix to underlying sub-systems (e.g. Docker containers) so they can be identified uniquely. For mutable instances, this value can be re-used.
Example: let’s say we have to define two EC2 AMI instances for the mirror application. Then we can define the FQDNs of the instances as ami.mirror01d.example.com and ami.mirror02d.example.com (where 01 and 02 are auto-incremented NN values).
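Auto-incrementing NN within one scope can be done by scanning the names that already exist. A hypothetical helper, assuming names embed the code as application-code followed by two digits:

```python
import re

def next_nn(existing_names, app_code):
    """Return the next zero-padded NN for one application-code scope.

    Hypothetical sketch; assumes names embed the code as <app-code><NN>,
    e.g. "ami.mirror02d.example.com".
    """
    pat = re.compile(re.escape(app_code) + r"(\d{2})", re.IGNORECASE)
    used = [int(m.group(1)) for n in existing_names if (m := pat.search(n))]
    return f"{max(used, default=0) + 1:02d}"
```

Given the two mirror instances above, the next deployment would get NN `03`; an empty scope starts at `01`.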
Name prefix: prefixes should be selected so that they ring a bell as soon as you hear them and the architecture immediately comes to mind. This is optional but highly recommended, along with the application code, to group resources on multiple levels and pinpoint a particular instance by its role/service. We can easily create these prefixes as done in Wikimedia and append some source-control-centric values to mark each deployment. Usually a good convention is to use the URL-friendly scheme
<known-prefix>[-<technology-stack>] (e.g. web-nginx, db-mysql, fs).
Possible prefix values might be any of the following (well-known values are picked for prefixes; try to re-use existing conventions as much as possible):
- WS → Web Server
- VM → general purpose Virtual Machine
- VPC → Virtual Private Cloud
- DB → DataBase server
- LB → Load Balancer server
- FS → File Server
- LDAP → LDAP server
- MX → Mail eXchange server
- CICD → Continuous Integration and Deployment server
- IG → Internet Gateway
- SG → Security Gateway
- ANY → ANY purpose server
- CHM → docker Container Host Machine
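Because these prefixes are URL-friendly and regex-friendly, related resources can be grouped automatically. A small sketch with made-up instance names following the <known-prefix>[-<technology-stack>] scheme:

```python
import re
from collections import defaultdict

# Made-up instance names following <known-prefix>[-<technology-stack>].
names = ["web-nginx", "web2-nginx", "db-mysql", "fs", "lb-haproxy"]

groups = defaultdict(list)
for name in names:
    # The leading letters are the known prefix; a trailing digit
    # (e.g. "web2") is the local counter, not part of the prefix.
    prefix = re.match(r"[a-z]+", name).group(0)
    groups[prefix].append(name)

print(dict(groups))
```

Here both nginx web servers land in the `web` group, which is exactly the kind of regex-based grouping the convention is designed for.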
It’s a regex to state the convention formally, please don’t judge me. It says: there might be a number (n) first, which can be used in conjunction with the name-prefix to maintain a local count of the same prefix. Then we may repeat another name-prefix, so positions for child nodes are reserved. Then the whole thing can be repeated, essentially making it possible to repeat this zero or many times.
- Let’s say you have two web servers. Then one can be named as
- Let’s say you are creating multiple web servers as Docker containers inside an EC2 AMI. Then the Docker host machine may have the name AMI1, and the containers should get names
Deployment type: not only does it represent how the instance is deployed, but this value is also used to apply rules that auto-destroy applicable instances. I suggest a convention where the deployment type is written at the end, so a simple rule like
“<value> ends with <deployment-type>” can identify any specific group of deployed resources. Possible values for deployment type:
- D = Development time machine, low cost instances preferred and usually destroyed every night
- U = UAT purpose machines, high availability preferred in low cost. Mostly used by testers/approval oriented stakeholders and destroyed weekly
- P = Production with high availability, general end-client availability required; never destroyed and always available
- L = Production server without high availability; usually fitted in free tier
- S = Sandbox (1/application-code); usually used in concept learning phase
- T = One-off instances for proof-of-concept implementation
- X = Defunct. Usually when a server is taken out of the system, it’s given a new name so that we can answer how many D/U/P servers are active in the system
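The “ends with” rule makes cleanup scripts trivial. A sketch (the fleet names below are made up):

```python
def by_deployment(names, deploy_type):
    # The deployment type is the last character of each name,
    # so a plain suffix check selects a whole group of resources.
    return [n for n in names if n.endswith(deploy_type)]

fleet = [
    "SFO-DY-INDEX-01P",
    "SIN-AMI-MIRROR-01D",
    "SIN-AMI-MIRROR-02D",
    "FRA-GCI-API-01U",
]

print(by_deployment(fleet, "D"))  # candidates for the nightly destroy job
```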
Lastly, to explain the system: for an application called “Wallpaperify”, I need a landing page which will redirect to a Facebook page. I am using a Heroku dyno for it, with their PHP buildpack to set up the required software. Therefore, I named this slug SFO-DY-INDEX-01P. The explanation is as follows:
- SFO → Deploying this to the San Francisco region, so using the nearest airport code. Also,
SFO is assigned to Heroku, so it’s a Heroku instance.
- DY → It’s a dyno prepared with standard parameters
- INDEX → Default landing page
- 01 → Numeric entry for similar application
- P → Production server
Side note: if this convention is too much, or you find too many cons with it, then another easy but scalable way is to use this format: