Open-source Computing is Exploding: Interview With Costa Tsaousis, CEO and Founder, Netdata
Hi Costa. Can you tell us about your role and the team you handle at Netdata?
I am the original developer of Netdata. After developing the tool for a couple of years and using it to troubleshoot issues on the cloud and on-prem infrastructure I was responsible for, I released it on GitHub in 2016 as open-source, free software. People loved it; it went viral. Within a couple of weeks the project reached 10,000 stars on GitHub, and Netdata was born. The company has since grown to nearly 50 people worldwide, the majority in engineering roles.
Open-source computing is exploding at an unprecedented rate. What factors would you attribute this level of growth to?
Open source is probably one of the major breakthroughs of our time. It influences the world in myriad ways, and it offers many advantages over closed source, especially for businesses.
Many open-source projects have completely eclipsed their closed-source counterparts, starting with Linux and continuing through the cloud and microservices era. It is my fundamental belief that open-source software is a better model for rapid, scalable development. Collaboration through tools like open repositories on GitHub engages a global community of contributors, widening the talent pool and expertise.
What is Netdata and which platforms does it connect with?
Netdata is the easiest, most effective way to troubleshoot and monitor IT infrastructure. Netdata’s approach is simple: easy installation, access to unlimited metrics, real-time monitoring, and data visualization optimized for troubleshooting. Netdata’s more than a monitoring tool; it’s a troubleshooting platform that brings people and data together in one place in real-time to drive down time to resolution.
The four principles that set Netdata apart are the following:
- Easy installation and configuration: Unlike other tools, every aspect of the product is built so users can get up and running quickly. Users don’t need to be monitoring experts to use Netdata, nor do they need to spend weeks or months on planning and resourcing. Going from installation to data visualization takes minutes.
- Unlimited, high granularity metrics: All metrics are available for users to monitor. Unlike other tools that limit metrics collection, Netdata’s architecture is limitless and monitoring is infinitely scalable.
- Real-time metrics: Metrics are collected per-second or per-event with data-collection-to-visualization latency of less than a millisecond.
- Interactive dashboards: Netdata auto-discovers metrics and instantly builds beautiful visualizations. Charts are ready out-of-the-box and can be drilled down into easily. Prebuilt dashboards display meaningful metrics optimized for visual anomaly detection, enabling users to explore and learn how systems are behaving and why.
The open-source Agent runs on Linux, FreeBSD, macOS, Kubernetes, Docker, and all their derivatives, and it integrates with more than 200 applications and services, including Docker, MongoDB, Nginx, and many more.
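To make the “installation takes minutes” claim concrete, here is the general shape of getting the Agent running, based on the project’s commonly documented one-line installer and official Docker image. The exact script URL and recommended flags change over time, so treat these as illustrative commands and verify them against the current Netdata documentation before use:

```shell
# One-line install via the project's kickstart script
# (URL is the historically documented one; check the docs for the current form):
wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh
sh /tmp/netdata-kickstart.sh

# Alternatively, run the Agent as a container from the official image
# (illustrative minimal flags; the dashboard is then served on port 19999):
docker run -d --name=netdata -p 19999:19999 netdata/netdata
```

After either route, the auto-generated dashboards are available in a browser at `http://localhost:19999` with no further configuration.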
Can you tell us more about Netdata Cloud and the user groups who have adopted it?
Netdata is built to simplify the lives of sysadmins, DevOps engineers, developers, and IT managers. Every decision we make while building the platform is an attempt to make monitoring simpler, faster, smoother, and more predictable, while providing real-time insights at high resolution and in a meaningful way.
Netdata has more than 47,000 stars on GitHub, more than 180M Docker pulls, and nearly 3M people have installed the product. Netdata is also one of the top-starred projects in the Cloud Native Computing Foundation Landscape.
What do you believe is the future of “monitoring and reporting” in business intelligence processes?
Business intelligence necessarily includes IT operations. IT infrastructure monitoring (ITIM) insights are important for everything from business continuity and disaster recovery to finding economies of scale and controlling variable costs. Expenses like cloud hosting costs can quickly lead to overruns if they are not monitored closely.
IT infrastructure monitoring plays a key role not only in managing costs but also in capacity planning overall, aligning infrastructure to business objectives.
We are witnessing fast-paced adoption of AutoML and AI engineering in monitoring and visualization techniques. Could you tell us something about your recent AI/ML projects and data science operations?
We have recently been working on leveraging statistical and machine learning techniques to surface the most interesting metrics and charts to users, given an area of focus or interest they would like to drill into.
The idea is to use these tools to assist users when troubleshooting, suggesting potentially interesting places to look first and cutting down the search space a little.
So, we formulate the problem for the ML as “show me the charts and metrics that seem to have changed most significantly in my area of interest.”
This is a nice way to frame a problem for ML because we do not pretend that it actually understands your system better than you do.
Unfortunately, there is a lot of hype around “AIOps” that is not really honest about what is and is not currently possible, and a lot of over-promising that could backfire. Instead, we use a more agnostic framing: if the user finds some useful insights among the top recommendations, then the ML has done its job well. A human in the loop will still be required to make sense of these insights and act on them, at least for the near term.
As we get more and more detailed metrics and observability, the bottleneck becomes the set of eyeballs trying to consume all this information. We hope to leverage ML to ease the cognitive load this places on the user, and to provide helpful insights and potential jumping-off points for exploration and investigation.
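The “what changed most significantly in my area of interest?” formulation described above can be sketched in a few lines of Python. This is not Netdata’s actual implementation; the metric names, sample data, and the z-score-style change measure are illustrative assumptions, but they show the shape of the ranking problem:

```python
# A minimal sketch of ranking metrics by how much they changed inside a
# highlighted window, relative to a baseline period. NOT Netdata's actual
# algorithm -- metric names and the scoring function are assumptions.
from statistics import mean, stdev

def change_score(baseline, window, eps=1e-9):
    """How far the window's mean drifted from the baseline's mean,
    in units of baseline standard deviations (a z-score-like measure)."""
    return abs(mean(window) - mean(baseline)) / (stdev(baseline) + eps)

def rank_metrics(series, baseline_slice, window_slice, top_n=3):
    """Rank metrics by change score, largest (most 'interesting') first."""
    scores = {
        name: change_score(values[baseline_slice], values[window_slice])
        for name, values in series.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Hypothetical per-second samples for three metrics.
series = {
    "cpu.user":    [10, 11, 9, 10, 11, 10, 45, 50, 48, 47],           # big jump
    "disk.io":     [5, 6, 5, 6, 5, 6, 6, 5, 6, 5],                    # stable
    "net.packets": [100, 102, 99, 101, 100, 98, 120, 118, 119, 121],  # moderate
}
baseline = slice(0, 6)   # "normal" period
window = slice(6, 10)    # the user's area of interest

for name, score in rank_metrics(series, baseline, window):
    print(f"{name}: {score:.1f}")
```

The key design point mirrors the framing in the interview: the code does not decide what is wrong, it only orders candidate charts so a human can look at the most anomalous ones first.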
COVID-19 has forced businesses to realign their data management and AI projects. What kind of unique challenges did you face during the pandemic?
Luckily, we have been cloud-native from the start, so all our ML and research infrastructure is perfectly set up for remote access and collaboration, wherever our people are working from.
How did you overcome these? What does your product roadmap look like for 2021?
The product roadmap is heavily focused on enhancements to Netdata Cloud, although we’ll continue to improve and fine-tune the Agent.
Enhancements will focus on continuing to improve Kubernetes and microservices monitoring, on eBPF as a method to monitor applications without their cooperation, and on continuing our ML efforts for even more predictability in infrastructure performance.
Hear it from the Pro: Tell us about the role of Kubernetes engineering in modern IT networking.
One reason k8s is so popular, of course, is the ability to apply software development practices to infrastructure by deploying an immutable infrastructure setup (a “release”) from “code” (i.e., Helm charts). This has the potential to greatly simplify the network engineering that SREs have to do, by making commonly used network services (cluster DNS, load balancing, firewalls, monitoring, security) easy to develop, debug, and even share with others.
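The “release from code” workflow described above looks like the following in practice, using Netdata’s own published Helm chart as the example. The repository URL and chart name below follow the project’s public Helm chart repo, but verify them against the current documentation before relying on them:

```shell
# Deploying a monitoring service to a cluster as "code" with Helm.
# Repo URL and chart name are taken from Netdata's public Helm chart;
# check the current docs before use.
helm repo add netdata https://netdata.github.io/helmchart/
helm repo update
helm install netdata netdata/netdata
```

The entire deployment is described declaratively by the chart and its values, so it can be versioned, reviewed, rolled back, and shared like any other code.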
There is also a large number of networking plugins available to make things easier and more customizable when deploying k8s itself to various infrastructures.
Thank you, Costa! That was fun and we hope to see you back on AiThority.com soon.
Costa Tsaousis is the founder and CEO of Netdata, as well as the original developer of the Netdata Agent. Previously, he worked for 25 years in the online IT services industry, helping disruptors like Viva.gr, Viva Wallet, and Hellas Online become challengers through technology. Costa is also the primary developer of FireHOL, a “firewall for humans” that builds secure, stateful firewalls from easy-to-understand, human-readable configurations.