Snowflake servers are servers whose actual configuration has drifted far from what was originally defined for them. These servers can cause big problems. When you need to replace one, you may not know everything that keeps it serving, and a properly launched new server may fail to run the workload you want it to run.
What can cause servers to become snowflake servers?
Mostly changes that are made or initiated by humans. Take the example below.
You face an issue on one of your servers and apply a quick fix. Then you forget, or do not think it is necessary, to add that fix to the system that launches, provisions, or manages your servers. This process is repeated several times, and you end up with a server whose actual configuration is no longer known to anyone. This situation is dangerous.
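The drift described above can be made visible by comparing what the provisioning system declares with what is actually on the server. The following is a minimal sketch, not a real tool: the function name find_drift, the setting names, and the values are all hypothetical, and it assumes both sides can be read into simple key/value dictionaries.

```python
from typing import Any, Dict, Tuple


def find_drift(
    declared: Dict[str, Any], actual: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (declared_value, actual_value)} for every mismatch.

    A value of None on either side means the setting is absent there.
    """
    drift = {}
    for key in sorted(set(declared) | set(actual)):
        want = declared.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical data: what the launch script sets up...
declared = {"nginx_version": "1.24", "max_open_files": 1024}
# ...versus what is on the server after a few undocumented quick fixes.
actual = {"nginx_version": "1.24", "max_open_files": 65535, "swap_enabled": False}

for setting, (want, have) in find_drift(declared, actual).items():
    print(f"{setting}: declared={want!r}, actual={have!r}")
```

Every line this prints is a change that someone made by hand and never wrote back. An empty result would suggest the server can be rebuilt from the launch script alone.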
Which kinds of servers are most prone to these issues?
From what I can think of, servers that are a critical part of your system and a single point of failure are more exposed to this problem than less important servers, which you can replace with some time in hand.
The problem is that when these servers fail, they bring down a critical service, so the business forces you to fix them as fast as possible. You make a lot of changes to get the server working again, but you do not keep track of them or add them to the initial launch script.
How can you keep this in check?
If you make any change on a server, always add that change back to the start script.
Errors will happen, but that does not mean you should go and start making changes everywhere. Take your time to pinpoint the error. Fix your automation scripts to correct it, and roll the fix out through automation.
Once you start making changes manually, others will take the liberty to do the same, and things will get out of hand. That is why proper DevOps is more of a culture or tradition than a technology.
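One way to picture "fix the script, then run it through automation" is the small sketch below. It is only an illustration under assumptions: plan_changes, the baseline dictionary, and the setting names are hypothetical, and a real setup would use your own provisioning or configuration management tool. The point it tries to show is that the fix is recorded in the baseline first, and the same baseline then drives both the broken server and any new one.

```python
from typing import Any, Dict, List


def plan_changes(baseline: Dict[str, Any], server: Dict[str, Any]) -> List[str]:
    """List the steps needed to bring `server` in line with `baseline`."""
    steps = []
    # Settings the baseline requires but the server lacks or has wrong.
    for key, want in sorted(baseline.items()):
        if server.get(key) != want:
            steps.append(f"set {key}={want}")
    # Settings present on the server that the baseline never declared.
    for key in sorted(set(server) - set(baseline)):
        steps.append(f"unset {key}")
    return steps


# Hypothetical baseline, standing in for the start script.
baseline = {"nginx_version": "1.24", "max_open_files": 1024}

# Step 1: record the fix in the baseline, not on the server.
baseline["max_open_files"] = 65535

# Step 2: let automation apply it to the failing server and to a fresh one.
failing_server = {"nginx_version": "1.24", "max_open_files": 1024, "debug": True}
fresh_server: Dict[str, Any] = {}

print(plan_changes(baseline, failing_server))
print(plan_changes(baseline, fresh_server))
```

Because the fix lives in the baseline, a replacement server receives it automatically, and the hand-added debug setting on the failing server is flagged for removal instead of silently surviving.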