When you’re managing multiple servers in multiple regions for multiple applications across multiple cloud vendors, it’s difficult to keep track of what’s happening where. With hundreds of servers, it may get complicated faster than you think. So, being curious, I looked into how the rest of the world is doing it and got some amazing ideas from the folks who manage hundreds of instances in AWS/Azure.
A good starting point would be this article from the Wikimedia Foundation. Also, this comment from Reddit gives a concise summary of effective design (bonus point: search for similar threads on Reddit to get some amazing insights).
Convention followed by me
Managing servers has 2 aspects. Either you know the nitty-gritty details of each server (e.g. a manually created AWS EC2 instance), so you treat them as pets. Or you just know that you have a bunch of servers that do the same thing (e.g. servers created during auto scaling) or different things for microservices (e.g. containers in Kubernetes), so you manage them as cattle. You can name your pets, but you can’t name every cow in a herd. So, you need a separate way of dealing with names.
Consequently, I settled on the following convention/ucode (Unique CODE), following the Wikimedia Foundation:
<location-code>-<mount-as>-<application-code><NN>-[:name-prefix]-<deployment-type>, details of the code below:
Location Code (Geographic location+Provider):
- [Cloud based: 3 characters] → Nearest airport of the area where the datacenter/server is located. It must be a valid 3-letter IATA code (heavily inspired by Cloudflare as well). Later we will use this code to denote a particular hosting/cloud vendor per area. Obviously, it has a problem when two or more vendors are in the same city or close vicinity. In that case, assign IATA codes on a first-come-first-assigned basis, and for subsequent vendors use an IATA code from a nearby state/country.
AWS and Google Cloud are both present in the Frankfurt, DE area. So, they are coded with future possibilities in mind:
FRA is used to refer to AWS
HHN for Google Cloud Platform
CGN for future usage – 1
Singapore, SG hosts multiple datacenters for multiple cloud vendors (e.g. AWS, DigitalOcean, Google Cloud, Azure). So, a coding example here:
SIN stands for AWS
JHB for future usage – 1
PKU for future usage – 2
XSP for future usage – 3
- [On-premise: 4 characters] → ICAO code of the airport nearest to where the on-premise server is located.
EGLL for the London, UK site’s on-premise infra by Equinix.
When no specific region is mentioned and everything is in generic terms (e.g. Global/America/Europe), use a close enough approximation. For instance, Heroku mentions Europe and America as regions, so
BSL is used for the Europe region and
SFO is used for America.
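The location-code rules above can be captured in a small lookup table. The sketch below is only an illustration: `LOCATIONS` and `describe_location` are hypothetical names, and the entries are just the examples listed in this article, not a complete registry.

```python
# Hypothetical registry built from the examples in this article.
# Key: location code, value: (vendor, area).
LOCATIONS = {
    "FRA": ("AWS", "Frankfurt, DE"),
    "HHN": ("Google Cloud Platform", "Frankfurt, DE"),
    "SIN": ("AWS", "Singapore, SG"),
    "BSL": ("Heroku", "Europe"),
    "SFO": ("Heroku", "America"),
    "EGLL": ("Equinix (on-premise)", "London, UK"),
}


def describe_location(code: str) -> str:
    """Describe a location code: 3 characters = IATA (cloud), 4 = ICAO (on-premise)."""
    code = code.upper()
    if len(code) == 3:
        kind = "cloud (IATA)"
    elif len(code) == 4:
        kind = "on-premise (ICAO)"
    else:
        raise ValueError(f"location code must be 3 or 4 characters: {code!r}")
    # Codes reserved for future usage simply have no vendor assigned yet.
    vendor, area = LOCATIONS.get(code, ("unassigned", "unknown"))
    return f"{code}: {kind}, {vendor}, {area}"
```

Keeping the table in one place makes the first-come-first-assigned rule easy to enforce: a code that is already a key cannot be handed to a second vendor.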
From location, you get an idea about the vendor and data center location; mount-as should tell how the instance is mounted into your overall application hierarchy. To elaborate, consider that your application is a single Unix machine. Multiple devices (i.e. devices/servers/clusters/VPCs) will be mounted gradually so that your application can scale and support multiple functionalities. So, you need a uniform way to mount a particular device (i.e. technology stack) into your machine (i.e. application).
Usually, I use a 1-character code for personal devices, 2~3-character codes for server clusters, and 4-character codes for server/container groups. This gives me the opportunity to align user devices and servers in the same convention.
The codes for the mount-as parameter might be:
- M; L; V → Mobile device; Laptop; Virtual box
- H → shared Hosting
- IAAS Systems:
  - AMI → Amazon EC2 Machine Instance
  - GCI → Google Cloud Instance
- VPS Systems:
  - DL → DropLet, a rented virtual private server from DigitalOcean
  - LS → LightSail, an offering from AWS to provision a private server
- PAAS Systems:
  - EB → Elastic Beanstalk platform from AWS
  - GAE → Google App Engine platform of Google Cloud
  - DY → DYno based computing resource from Heroku
- FAAS Systems:
  - LM → LaMbda, AWS managed infra to run code without provisioning servers
- PWD → platform to work with docker concepts in an interactive way (e.g. Play-With-Docker.com)
- ENNN → External services which are considered part of the system (e.g. SAAS systems). Here, any of the valid E codes should be used for identifying a specific service. To pick the digits of the E code, an applicable vanity number of the service might be helpful (for instance, E365 for Office365 services, E442 for GitHub or E107 for Slack).
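As a rough sketch, the mount-as codes above fit in a dictionary, with the ENNN family handled by a pattern. The names `MOUNT_CODES`, `EXTERNAL_CODE` and `mount_kind` are made up for this illustration, and the descriptions are shortened from the list above.

```python
import re

# Codes copied from the list above; descriptions are abbreviated.
MOUNT_CODES = {
    "M": "mobile device",
    "L": "laptop",
    "V": "virtual box",
    "H": "shared hosting",
    "AMI": "Amazon EC2 machine instance",
    "GCI": "Google Cloud instance",
    "DL": "DigitalOcean droplet",
    "LS": "AWS Lightsail",
    "EB": "AWS Elastic Beanstalk",
    "GAE": "Google App Engine",
    "DY": "Heroku dyno",
    "LM": "AWS Lambda",
    "PWD": "Play-With-Docker",
}

# ENNN: the letter E followed by exactly three digits (e.g. E365).
EXTERNAL_CODE = re.compile(r"^E\d{3}$")


def mount_kind(code: str) -> str:
    """Return what a mount-as code stands for, or raise ValueError if unknown."""
    code = code.upper()
    if EXTERNAL_CODE.match(code):
        return "external service"
    try:
        return MOUNT_CODES[code]
    except KeyError:
        raise ValueError(f"unknown mount-as code: {code!r}") from None
```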
The application code should represent the business connection of the instance. However, its more prominent use is this: codes are selected so that regex-based automation can group all related items together. For starters, it might be Index(Branding)/Company-name/Application-name.
For advanced usage, sometimes it’s as simple as the application name (e.g. wordpressmirror), and sometimes it’s prepared automatically by the CI/CD pipeline from the head commit’s tag (e.g. the subdomain.example.com CNAME may resolve to a host name in which a23cde4 is dynamically generated by the CI/CD pipeline).
Just to add, in my case I use the corresponding Git repository name as this code. So, for DEV/UAT/PROD environments, the CI/CD systems use separate IAM accounts to clone the same repository and proceed accordingly.
NN is a numeric entry that should be unique and auto-incremented within the application-code area. For immutable instances, the NN value should never be re-used when the CI/CD server deploys a new instance of the code (if possible), because the auto-incremented value can be managed in code. Another purpose of this value is to supply a unique prefix to underlying sub-systems (e.g. docker containers) so they can be uniquely identified. For mutable instances, this value can be re-used.
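To make the re-use rule concrete, here is a minimal in-memory sketch of an NN allocator. The class name `NNAllocator` and its methods are hypothetical; a real CI/CD setup would need to persist the counters somewhere, which this sketch does not attempt.

```python
from collections import defaultdict


class NNAllocator:
    """Hands out auto-incremented NN values per application code."""

    def __init__(self) -> None:
        self._last = defaultdict(int)   # highest NN ever issued per application code
        self._free = defaultdict(list)  # NN values released by mutable instances

    def allocate(self, app_code: str, immutable: bool = True) -> str:
        """Return the next NN as a zero-padded, two-digit string."""
        if not immutable and self._free[app_code]:
            # Mutable instances may pick up a previously released number.
            nn = min(self._free[app_code])
            self._free[app_code].remove(nn)
        else:
            self._last[app_code] += 1
            nn = self._last[app_code]
        return f"{nn:02d}"

    def release(self, app_code: str, nn: str, immutable: bool = True) -> None:
        """Give a number back; numbers of immutable instances are never re-used."""
        if not immutable:
            self._free[app_code].append(int(nn))
```

The counter is kept per application code, so INDEX and another application can both have an instance numbered 01 without clashing.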
Name prefixes should be selected so that a prefix rings a bell as soon as you hear it and its architecture immediately comes to mind. The name prefix is optional but highly recommended, along with the application code, to group resources on multiple levels and pinpoint a particular instance by its role/service. We can easily create these prefixes as done at Wikimedia and append some source-control-centric values to mark each deployment. Usually a good convention is to use a URL-friendly scheme: <known-prefix>[-<technology-stack>] (e.g. web-nginx, db-mysql, fs).
Possible values of prefixes might be any of the following (known values are picked for prefixes; try to re-use existing conventions as much as possible):
- WS → Web Server
- VM → general purpose Virtual Machine
- VPC → Virtual Private Cloud
- DB → DataBase server
- LB → Load Balancer server
- FS → File Server
- LDAP → LDAP server
- MX → Mail eXchange server
- CICD → Continuous Integration and Continuous Deployment server
- IG → Internet Gateway
- SG → Security Gateway
- ANY → ANY purpose server
- CHM → docker Container Host Machine
The name-prefix notation is a regex that states the convention formally; please don’t judge me. It says: there might be a number (n) first, which can be used in conjunction with name-prefix to keep a local count of identical prefixes. Then, we may repeat another name-prefix, so positions for child nodes are reserved. Then, the whole thing can be repeated zero or more times, so it can repeat indefinitely.
- Let’s say you have 2 web servers. Then one can be named as
- Let’s say you are creating multiple web servers as docker containers inside a single docker host machine. Then the docker host machine should have the name CHM1 and the containers should get names
The deployment type not only represents how the instance is deployed, but is also used to apply rules that auto-destroy applicable instances. I suggest a convention where the deployment type is written at the end, so that a simple rule like “<value> ends with” can identify any specific group of deployed resources. Possible values for deployment type:
- D = Development-time machine; low-cost instances preferred and usually destroyed every night
- U = UAT-purpose machines; high availability preferred at low cost. Mostly used by testers/approval-oriented stakeholders and destroyed weekly
- P = Production with high availability; general end-client availability required; never destroyed and always available
- L = Production server without high availability; usually fitted in the free tier
- S = Sandbox (1/application-code); usually destroyed after concept testing is done
- T = One-off instances. Usually used for training purposes to demonstrate something
- X = Defunct. Usually when a server is taken out of the system, it’s given a new name so that we can answer how many D/U/P servers are active in the system
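The “ends with” rule can be sketched in a few lines. Everything here is illustrative: `due_for_destroy` is a hypothetical helper, the weekly clean-up day (Sunday) is my assumption since the list above only says “weekly”, and the instance names in the usage note are invented.

```python
from datetime import date

# Suffixes and cadence taken from the list above.
NIGHTLY_SUFFIXES = ("D",)  # development machines, destroyed every night
WEEKLY_SUFFIXES = ("U",)   # UAT machines, destroyed weekly


def due_for_destroy(names: list[str], today: date) -> list[str]:
    """Return the instance names whose deployment-type suffix is due for clean-up."""
    suffixes = NIGHTLY_SUFFIXES
    if today.weekday() == 6:  # Sunday; the weekly day is an assumption
        suffixes = suffixes + WEEKLY_SUFFIXES
    # str.endswith accepts a tuple of suffixes.
    return [name for name in names if name.upper().endswith(suffixes)]
```

For example, with the invented names `["SFO-DY-INDEX-01P", "FRA-AMI-SHOP-02D", "SIN-DL-SHOP-03U"]`, a weekday run would select only the name ending in D, and a Sunday run would select the D and U names. Production names (P, L) are never matched.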
Lastly, to explain the system with an example: for an application “Wallpaperify”, I need a landing page which redirects to a Facebook page. I am using a Heroku dyno for it, with their php-buildpack to set up the required software. Therefore, I named this slug SFO-DY-INDEX-01P. The explanation is as follows:
- SFO → Deploying this to the San Francisco region, so using the nearest airport code. Also, SFO is assigned to Heroku, so it’s an instance of Heroku.
- DY → It’s a dyno prepared with standard parameters
- INDEX → Default landing page
- 01 → Numeric entry for similar application
- P → Production server
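The breakdown above can also be done mechanically. The following is a sketch only: the pattern `UCODE` and the helper `parse_ucode` are hypothetical, and the regex covers just the shape of this worked example (location, mount-as, application code, two-digit number, deployment type) without the optional name-prefix part.

```python
import re
from typing import Optional

# Assumed shape, taken from the worked example SFO-DY-INDEX-01P.
UCODE = re.compile(
    r"^(?P<location>[A-Z]{3,4})"     # IATA (3) or ICAO (4) code
    r"-(?P<mount_as>[A-Z0-9]+)"      # e.g. DY, AMI, E365
    r"-(?P<application>[A-Z0-9]+)"   # e.g. INDEX
    r"-(?P<nn>\d{2})"                # numeric entry
    r"(?P<deployment>[DUPLSTX])$"    # deployment type
)


def parse_ucode(name: str) -> Optional[dict]:
    """Split a ucode into its named parts, or return None if it does not match."""
    match = UCODE.match(name.upper())
    return match.groupdict() if match else None
```

Calling `parse_ucode("SFO-DY-INDEX-01P")` yields the same five parts as the list above, which is the kind of grouping key that regex-based automation can work with.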
Side note: if the convention is too much, or you find lots of cons with it, then another easy but scalable way is to use this format: