Orchestration and automation tools for telecom networks
Managing telecom infrastructure is complex and resource intensive, even for small companies
As systems grow larger, more complex, and more dependent on one another, keeping them up to date becomes increasingly important. This is especially true in telecommunications, where we are used to dealing with large volumes of fast-moving data and where the number of transactions easily exceeds millions per day. To cope with this load, systems are scaled out by adding more identically configured nodes, such as load balancers, proxies, backend servers, and message brokers. The other important factor distinguishing telecom infrastructure from other web services is the need for near 100% availability. Nowadays it is universally expected that your telecom provider is always up and able to connect a call or deliver a text message within seconds. This puts even more pressure on system administrators, who have very little room to manoeuvre when it comes to taking systems offline for maintenance or upgrades.
The next level of complexity comes with cloud-based services offering telecom infrastructure. They have to deal with everything described above, multiplied by the number of instances (clients, tenants, etc.). But it is not just blind multiplication, of course. Most likely the instances differ in their settings, such as IP addresses, port numbers, and number of nodes. When working with such systems, administrators need very well documented procedures and an up-to-date inventory.
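To make the idea of an inventory with per-tenant differences a little more concrete, here is a minimal Python sketch. It is only an illustration: the tenant names, the settings (`ip`, `sip_port`, `nodes`), and the config format are all hypothetical, and a real orchestration tool would have its own inventory and templating mechanisms.

```python
from string import Template

# Hypothetical defaults shared by every tenant.
DEFAULTS = {"sip_port": 5060, "nodes": 2}

# Hypothetical inventory: each tenant lists only what differs from the defaults.
TENANTS = {
    "tenant-a": {"ip": "192.0.2.10"},
    "tenant-b": {"ip": "192.0.2.20", "sip_port": 5080, "nodes": 4},
}

# Hypothetical config file layout, used purely for illustration.
CONFIG_TEMPLATE = Template("listen=$ip:$sip_port\nnodes=$nodes\n")


def render_config(tenant: str) -> str:
    """Merge the shared defaults with one tenant's overrides and render the config."""
    settings = {**DEFAULTS, **TENANTS[tenant]}
    return CONFIG_TEMPLATE.substitute(settings)


if __name__ == "__main__":
    for name in TENANTS:
        print(f"# {name}")
        print(render_config(name))
```

Keeping the differences in one place means the inventory doubles as documentation: what is deployed for each tenant can be read directly from the data that generates the configuration.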
Software tools to automate this work have existed for a long time. However, for quite a long time they were so complex and/or expensive that a system had to be extremely large to afford them and feel the real benefits of such automation. For smaller operations they were out of reach, and in many cases manual system administration remained the preferred choice.
Luckily, there are now plenty of general-purpose automation and orchestration tools that work very well in smaller companies too, telecoms included. Chef, Puppet, SaltStack, Ansible, and Vagrant are a few examples of great tools that make life easier for system administrators. They also reduce costs for companies in an age of ever-growing cost and scarcity of technical staff.
I will not go into the details of each of the above tools, nor will I try to recommend one over another, because I am not personally familiar with all of them and because your requirements will vary with your needs. Some might weigh speed of deployment against the amount of preparation work, or prefer simplicity over features. But there are at least a few use cases where I see orchestration and automation tools as invaluable:
- Deployment of new hardware units. Getting all the required software up and running when you add a new server to the network is usually tedious work. Setting the correct timezone, firewall rules, and start scripts are just a few examples of things that are easily overlooked during these steps.
- Upgrading the software. As mentioned before, downtime is no longer an option. Automation tools can be a great help in ensuring that system upgrades do not cause any downtime, for example by using rolling updates.
- Applying global changes. Sooner or later your vendor, provider, or a major partner will decide to change a global setting that is configured in countless configuration files across multiple machines. With a properly configured automation system, the change will be quick and less prone to errors.
- Security. Managing firewall rules on many machines is a real nightmare. A very common problem is disabling a firewall during troubleshooting (e.g. maybe the firewall is blocking this or that service and that is why it does not work) and forgetting to put it back after the work is done. Fraud is very common in telecoms. I have personally seen cases where forgetting to put a firewall back for half a day caused a loss of several tens of thousands of dollars due to fraud.
- Adding new tenants or users in a cloud service. This is a routine task that might need to be run many times, just with different settings. Ideal work for robots, so let's make the robots do it.
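The rolling updates mentioned in the list above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular tool implements it: `upgrade` and `is_healthy` are hypothetical callbacks standing in for whatever your environment uses to upgrade a node and to check that it is serving traffic again.

```python
from typing import Callable, List


def rolling_update(
    nodes: List[str],
    upgrade: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Upgrade nodes one at a time and stop at the first unhealthy node.

    Only one node is touched at any moment, so the remaining nodes keep
    serving traffic. Returns the nodes that were upgraded successfully.
    """
    upgraded: List[str] = []
    for node in nodes:
        upgrade(node)
        if not is_healthy(node):
            # Halt here so a bad release never reaches the remaining nodes.
            raise RuntimeError(f"{node} is unhealthy after upgrade, stopping")
        upgraded.append(node)
    return upgraded


if __name__ == "__main__":
    # In-memory stand-ins for real upgrade and health-check logic.
    done = rolling_update(
        ["proxy-1", "proxy-2", "proxy-3"],
        upgrade=lambda node: print(f"upgrading {node}"),
        is_healthy=lambda node: True,
    )
    print("upgraded:", done)
```

The important design choice is the early stop: if the first upgraded node fails its health check, the others are left untouched and continue to carry the load.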
Finally, a note on the importance of understanding the danger that comes with the great power of these tools. Used improperly, they can cause a lot of damage or irrecoverable loss of data. Troubleshooting can also sometimes be more difficult than in traditional setups. This article itself was written after I failed to spot an error caused by a misconfigured automation tool, and it took more than 6 hours to find it. Happy automating!