The missed opportunity: Data Analytics in the IT Infrastructure

In 2008, I was leading the growth of our IT managed services and infrastructure practice. To stay ahead of issues, we depended heavily on monitoring tools, which mostly told us when a device was down. My team would then address the problem, and life would return to normal for a while. One unwanted side effect of having the monitoring tools in place was that the number of alerts grew exponentially as our business grew. Despite our efforts to stabilize our customers' systems, we were inundated with email alerts from these monitoring tools. I reached the point of either hiring a dedicated person to monitor alerts or finding a way to implement automation solutions to strike a balance. I felt that automation was the best long-term choice for our team.

Still, our customer base continued to grow, and after several successful automation initiatives that helped us stay ahead of issues and reduce monitoring alerts, I felt something was still missing. We needed to get ahead of the issues; we needed to predict events.


One night while working late, I had a thought: what if we stored and analyzed all of the metrics about what is happening in the IT infrastructure? Could that data help us predict a system failure?

We immediately went to work and created a solution that stored and analyzed as much IT data as possible. We were then able to understand and discover more about the behaviors that drove IT failures. In 2015, we added machine learning (ML), and our solution began making predictions about system failures. At first, the ML predictions were completely wrong. We learned to train the ML models properly with good-quality data, and then everything changed. We slowly achieved a modest level of automation and system health prediction with accurate data.



While our journey of IT infrastructure discovery and automation goes on, the following are some of the lessons I learned along the way:

1. There is so much data in an IT infrastructure and most of it is probably not being collected.

2. There is also a lot of data that is collected; however, this data probably sits in a silo and is not leveraged, correlated, or used.

3. When it comes to understanding your infrastructure’s current capabilities and limitations, it is also important to understand its past. Having historical data available allows for this understanding to happen.

4. Regardless of your type of business, having users “down” due to equipment failures will cost you money. Using data to predict and avoid failures is the way to go.

5. You can choose to have an IT environment that is reactive, informative, predictive, or transformative. The difference between these categories is how much data you are collecting, analyzing, correlating, and reporting.

6. Store your data in the cloud in a safe and secure manner. Let the cloud handle the capacity.

7. Having a historical record of each device's performance, behavior, configuration, and interactions with other devices is gold when managing upgrades and budgets.

8. Once you have data collected, you can remove the guesswork and apply financial models to estimate replacement costs and upgrades.

9. You don’t have to trust anyone’s interpretation of what is happening on the IT infrastructure. Having accurate and clean data will help you determine its true health and operational risk levels.

10. You can predict and avoid many operational risks if the data available to you (collected or uncollected) is analyzed, correlated, and reported.
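
To make lessons 3 and 10 a little more concrete, here is a minimal Python sketch of how historical data might be used to anticipate a problem before it happens. It is not the solution described in this article; it is only an illustration under stated assumptions. The function names (fit_trend, days_until_threshold), the daily disk-usage samples, and the 90% threshold are all hypothetical, and a simple straight-line trend stands in for whatever analysis or ML model a real environment would use.

```python
from typing import Optional, Sequence, Tuple


def fit_trend(days: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of values = slope * days + intercept."""
    n = len(days)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(
    days: Sequence[float], values: Sequence[float], threshold: float
) -> Optional[float]:
    """Days after the last sample until the trend line reaches the threshold.

    Returns None when the trend is flat or falling, since no crossing is projected.
    """
    slope, intercept = fit_trend(days, values)
    if slope <= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - days[-1])


# Hypothetical history: disk usage (%) recorded once a day for a week.
sample_days = [0, 1, 2, 3, 4, 5, 6]
disk_used_pct = [70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0]

remaining = days_until_threshold(sample_days, disk_used_pct, threshold=90.0)
if remaining is None:
    print("No upward trend detected")
else:
    print(f"Estimated days until 90% full: {remaining:.1f}")
```

Even a rough projection like this depends entirely on having the historical record in the first place, which is the point of the lessons above: a reactive alert fires when the disk is already full, while stored history gives you a chance to act days earlier.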

In conclusion, reducing operational risks will help your IT infrastructure reach higher levels of stability and build solid trust with your user base. All of this can be achieved by collecting, correlating, analyzing, and reporting meaningful and accurate data. This data, in turn, can be used to make data-driven decisions that have a higher level of return than decisions made with little or no data behind them.

Remember, your primary business asset is your data. Use it wisely!



Emilio Chemali, Director of Business Intelligence & Analytics, MRE Consulting, Ltd.

Emilio is a technology subject matter expert, respected thought leader and CIO100 Award Winner.  With over 18 years of experience, Emilio has helped clients in multiple industries create business value through Business Intelligence, Data Analytics, DataOps, DevOps, IoT, Application Integration, Enterprise Mobility, Enterprise Architecture, Software Development, Infrastructure Management, Cloud Strategies, Server Virtualization, and Application Performance Tuning initiatives.


