When you’re managing multiple servers in multiple regions for multiple applications across multiple cloud vendors, it’s difficult to keep track of what’s happening where. With hundreds of servers, things get complicated faster than you think. So, being curious, I searched for how the rest of the world does it and got some amazing ideas from folks who manage hundreds of instances on AWS/Azure.
A good starting point would be this article from the Wikimedia Foundation. Also, this comment from Reddit gives a concise summary of an effective design (bonus point: search for similar threads on Reddit to get some amazing insights).
The convention I follow
Managing servers has two aspects: either you know the nitty-gritty details of a server (e.g. a manually created AWS EC2 instance), so you treat it as a pet; or you just know that you have a bunch of servers that do the same thing (e.g. servers created during auto scaling) or different things for microservices (e.g. containers in Kubernetes), so you manage them as cattle. You can name your pets, but you can’t name every cow in a herd. So you need separate ways of dealing with names.
Consequently, I settled on the following convention, which may also be used as an fqdn:
<location-code>-<mount-as>-<application-code><NN>-[:contextual-suffix]-<deployment-type>, with details of each code below:
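As a quick sanity check, the whole convention can be captured in a regular expression. The sketch below is my own reading of the format; the field widths and the deployment-type letters are assumptions drawn from the sections that follow, not an official parser:

```python
import re
from typing import Optional

# Sketch of a parser for the naming convention above. Assumptions:
#   location : 3-letter IATA (cloud) or 4-letter ICAO (on-premise)
#   mount-as : 1-4 characters (M, AMI, E365, ...)
#   app+NN   : application name followed by a two-digit counter
#   suffix   : optional [<known-prefix>-N]* contextual part
#   deploy   : one deployment-type letter (D/U/P/L/S/T/X)
NAME_RE = re.compile(
    r"^(?P<location>[A-Z]{3,4})"
    r"-(?P<mount>[A-Z0-9]{1,4})"
    r"-(?P<app>[A-Za-z][A-Za-z0-9]*)-?(?P<nn>\d{2})"
    r"(?:-(?P<suffix>(?:[A-Z]+-\d+)+))?"
    r"-?(?P<deploy>[DUPLSTX])$"
)

def parse_name(name: str) -> Optional[dict]:
    """Return the convention's fields as a dict, or None if invalid."""
    m = NAME_RE.match(name)
    return m.groupdict() if m else None

parse_name("SFO-DY-INDEX-01P")
# → {'location': 'SFO', 'mount': 'DY', 'app': 'INDEX',
#    'nn': '01', 'suffix': None, 'deploy': 'P'}
```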
Location Code (Geographic location+Provider):
- [Cloud-based: 3 letters] → Nearest airport to the area where the datacenter/server is located. It must be a valid 3-letter IATA code (heavily inspired by Cloudflare). Later we will use this code to denote a particular hosting/cloud vendor per area. Obviously this has the problem of two or more vendors being in the same city or close vicinity. In that case, assign the IATA code on a first-come-first-assigned basis, and for subsequent vendors use an IATA code from a nearby state/country.
AWS and Google Cloud are both present in the Frankfurt, DE area, so they are coded with future possibilities in mind:
- FRA is used to refer to Google Cloud Platform
- CGN for future usage – 1 (using a nearby state)
Singapore, SG hosts multiple datacenters for multiple cloud vendors (e.g. AWS, DigitalOcean, Google Cloud, Azure). A coding example:
- SIN stands for AWS
- XSP for future usage – 1 (using a nearby state)
- PKU for future usage – 2 (using a nearby country)
- JHB for future usage – 3 (using a nearby country)
- [On-premise: 4 letters] → ICAO code of the airport nearest to where the on-premise server is located.
- EGLL for the London, UK site’s on-premise infra by Equinix.
When no specific region is mentioned and only generic terms are used (e.g. Global/America/Europe), use a close-enough approximation. For instance, Heroku lists Europe and America as regions, so:
- BSL is used for the Europe region and
- SFO is used for America.
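In code, this location table is just a small registry. The entries below cover only the examples given above; treat it as a hypothetical starting point, extended first-come-first-assigned:

```python
# Sketch of a location-code registry built from the examples above;
# extend it first-come-first-assigned as new vendors arrive.
LOCATIONS = {
    # 3-letter IATA codes → cloud vendors
    "FRA":  ("Frankfurt, DE", "Google Cloud Platform"),
    "CGN":  ("Frankfurt, DE area", "reserved for a future vendor"),
    "SIN":  ("Singapore, SG", "AWS"),
    "XSP":  ("Singapore, SG area", "reserved for a future vendor"),
    "BSL":  ("Europe (approximation)", "Heroku"),
    "SFO":  ("America (approximation)", "Heroku"),
    # 4-letter ICAO codes → on-premise sites
    "EGLL": ("London, UK", "Equinix on-premise"),
}

def is_on_premise(code: str) -> bool:
    # Per the convention, 4-letter (ICAO) codes mark on-premise infra.
    return len(code) == 4
```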
Mount-As:
With the location code, you’ve got an idea about the vendor and the data-center location. The mount-as code should tell how the instance is mounted into your overall application hierarchy or organizational network topology (i.e. private IPv4 address space). To elaborate: consider that your application starts with a single Unix machine, and many other conceptual machines/topologies (e.g. devices/servers/clusters/VPCs) will be mounted gradually so that your application can scale and support multiple functionalities. So you need a uniform way to mount a particular device (i.e. technology stack) into your machine (i.e. application) to manage them in the long run.
Usually, I use a 1-character code for personal devices, 2-3 characters for server clusters, and 4 characters for server/container groups. This gives the opportunity to align user devices and servers under the same convention.
The codes for mounting parameter might be:
- M; L; V → Mobile device; Laptop; VirtualBox VM
- H → shared Hosting
- IAAS Systems:
- AMI → Amazon EC2 Machine Instance
- GCI → Google Cloud Instance
- VPS Systems:
- DL → DropLet, a rented virtual private server from DigitalOcean
- LS → LightSail, an offering from AWS to provision private server
- PAAS Systems:
- EB → Elastic Beanstalk platform from AWS
- GAE → Google App Engine platform of Google Cloud
- DY → DYno-based computing resource from Heroku
- FAAS Systems:
- LM → LaMbda, AWS-managed infra to run code without provisioning servers
- PWD → platform to work with docker concepts in an interactive way (e.g. Play-With-Docker.com)
- ENNN → External services which are considered part of the system (e.g. SAAS systems). Here, any valid E code should be used to identify a specific service. To choose the digits of the E code, an applicable vanity number of the service might be helpful. For instance, E365 for Office365 services, E442 for GitHub, or E107 for Slack.
Application Code:
It should represent a unique, business-friendly name of the application. Instances reachable through HTTP should get a pre-defined fqdn as well as a valid private IPv4 address (not repeatable in the organizational topology). If you need additional sub-systems under an application code, just append them with “00” after the application code, and ignore that part while reading. Choosing “00” as the separator works because “00” will never appear in the numeric counters of this scheme.
Example: let’s assume that we have an application urlShorten with a couple of sub-systems. Then we can name them like urlShorten00k8s, where all of them refer to the same application code: urlShorten. For another application called GermanProbashe-Mirror without any sub-system, we can simply name it mirror.
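Stripping the sub-system part is then a one-liner. This is a sketch of the reading rule described above (split on the first “00”):

```python
# Sketch of the reading rule above: everything before the first "00"
# is the application code; "00" never appears in the NN counter.
def base_app_code(code: str) -> str:
    return code.split("00", 1)[0]

base_app_code("urlShorten00k8s")  # "urlShorten"
base_app_code("mirror")           # "mirror" (no sub-system)
```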
Numeric Entry (NN):
The numeric entry should be unique and auto-incremented within the (mount-as)+(application-code) area. For immutable instances, the NN value should never be re-used when the CI/CD server deploys a new instance of the code (if possible), since the auto-incremented value can be managed in code. Another purpose of this code is to supply a unique prefix to underlying sub-systems (e.g. Docker containers) to uniquely identify them. For mutable instances, this value can be re-used.
Example: let’s say we have to define 2 EC2 AMI instances for the mirror application. Then we can define the fqdns of the instances as ami-mirror01-d.example.com and ami-mirror02-d.example.com (where 01 and 02 are auto-incremented within the (mount-as)+(application-code) area).
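The auto-increment can be derived from the names already in use. A minimal sketch, assuming names follow the convention above (the sample names are hypothetical):

```python
import re

# Sketch: deriving the next auto-incremented NN within one
# (mount-as)+(application-code) area. The sample names are hypothetical.
# Immutable deployments should persist this counter in CI/CD state
# instead of re-scanning live instances.
def next_nn(existing: list[str], mount: str, app: str) -> str:
    prefix = f"{mount}-{app}".lower()
    nums = [
        int(m.group(1))
        for name in existing
        if (m := re.search(rf"{re.escape(prefix)}(\d{{2}})", name.lower()))
    ]
    return f"{max(nums, default=0) + 1:02d}"

next_nn(["ami-mirror01-d"], "AMI", "mirror")  # "02"
next_nn([], "AMI", "mirror")                  # "01"
```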
Context suffix (optional):
Contextual suffixes should be selected so that as soon as you hear one, its architecture/background immediately comes to mind. Usually a good convention is to use the URL-friendly scheme [<known-prefix>-N]*. With this pattern we can name nested technology stacks. Please ensure that suffixes are placed after the numeric entry and before the deployment type.
Possible prefix values might be any of the following (well-known values are picked; try to re-use existing conventions as much as possible):
- WS → Web Server
- VM → general purpose Virtual Machine
- VPC → Virtual Private Cloud
- DB → DataBase server
- LB → Load Balancer server
- FS → File Server
- LDAP → LDAP server
- MX → Mail eXchange server
- CICD → Continuous Integration and Deployment server
- IG → Internet Gateway
- SG → Security Gateway
- ANY → ANY purpose server
- CHM → docker Container Host Machine
Example: let’s say you are creating multiple web servers as Docker containers inside an EC2 AMI. Then the Docker host machine may have the name AMI1, and the containers should get contextual suffixes such as CHM-1-WS-1 and CHM-1-WS-2.
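A suffix built from this vocabulary can be validated mechanically. A sketch, assuming the [<known-prefix>-N]* shape described above:

```python
# Sketch: validating a contextual suffix against the [<known-prefix>-N]*
# scheme, using the prefix vocabulary listed above.
KNOWN_PREFIXES = {"WS", "VM", "VPC", "DB", "LB", "FS", "LDAP",
                  "MX", "CICD", "IG", "SG", "ANY", "CHM"}

def valid_suffix(suffix: str) -> bool:
    parts = suffix.split("-")
    # Expect alternating <prefix>, <number> pairs, e.g. CHM-1-WS-2.
    if not parts or len(parts) % 2 != 0:
        return False
    pairs = zip(parts[0::2], parts[1::2])
    return all(p in KNOWN_PREFIXES and n.isdigit() for p, n in pairs)

valid_suffix("CHM-1-WS-2")  # True
valid_suffix("CHM-WS")      # False
```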
Deployment Type:
Not only does it represent how the instance is deployed, but this value is also used to apply rules that auto-destroy applicable instances. I suggest a convention where the deployment type is written at the end, so that a simple rule like “name ends with <deployment-type>” identifies any specific group of deployed resources. Possible values for the deployment type:
- D = Development time machine, low cost instances preferred and usually destroyed every night
- U = UAT purpose machines, high availability preferred in low cost. Mostly used by testers/approval oriented stakeholders and destroyed weekly
- P = Production with high availability, general end-client availability required; never destroyed and always available
- L = Production server without high availability; usually fitted in free tier
- S = Sandbox (1/application-code); usually used in concept learning phase
- T = One-off instances for proof-of-concept implementations
- X = Defunct. Usually when a server is taken out of system, it’s given a new name so that we can answer how many D/U/P servers are active in the system
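The “ends with” rule then makes fleet housekeeping trivial. A sketch with hypothetical instance names:

```python
# Sketch of the "ends with <deployment-type>" rule described above;
# the instance names below are hypothetical.
NIGHTLY_DESTROY = "D"  # development machines, destroyed every night
WEEKLY_DESTROY = "U"   # UAT machines, destroyed weekly

def select_for_destroy(names: list[str], deploy_type: str) -> list[str]:
    return [n for n in names if n.upper().endswith(deploy_type)]

fleet = ["SFO-DY-INDEX-01P", "SIN-AMI-mirror01D", "SIN-AMI-mirror02U"]
select_for_destroy(fleet, NIGHTLY_DESTROY)  # ["SIN-AMI-mirror01D"]
```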
Lastly, to explain the system: for an application “Wallpaperify”, I need a landing page which will redirect to a Facebook page. I am using a Heroku dyno for it, with their PHP buildpack to set up the required software. Therefore, I named this slug SFO-DY-INDEX-01P. The explanation is as follows:
- SFO → Deploying this to the San Francisco region, so using the nearest airport code. Also, SFO is assigned to Heroku, so it’s a Heroku instance.
- DY → It’s a dyno prepared with standard parameters
- INDEX → Default landing page
- 01 → Numeric entry for similar application
- P → Production server
Side note: if the convention is too much, or you find lots of cons with it, then another easy but scalable approach is to use this format: