Humio at Netlify: Real-time Observability at Scale — in All Departments

This blog was originally published on humio.com. Humio is a CrowdStrike Company.

"Being able to share contextual knowledge through saved searches, dashboards, common queries, things like that, enabled my operations team to run faster, and helped our engineering team build more tools for support and sales." - Ryan Neal, Head of Infrastructure at Netlify

The Company

Netlify is a San Francisco-based cloud computing company. They provide serverless hosting and run a global app delivery network for over 500,000 developers and businesses including Google, Facebook, Kubernetes, Samsung, and Cisco. They offer a toolbox for front-end developers who want to move a JAMstack website onto a production CDN.

The Challenge

Too much popularity was the problem! Netlify had outgrown their old logging system. It was getting bogged down by increased traffic and could not meet their growing needs. Ryan Neal, Head of Infrastructure at Netlify, describes the moment he realized he needed a new solution. “We were hitting an inflection point of scale where we are getting a lot more popular and a lot more log volume coming through the system. And in that case, our current solution started to have problems around returning inquiries in timely manners and being able to complete in any way.”

Ryan had first tried building a custom solution from several different log aggregation frameworks. As tuning it consumed more and more of his time, his manager asked him to redirect his efforts. “My CTO came to me and said, ‘Look, I need you to build this network, this product – our stuff, not a logging aggregation framework.’”

Netlify also needed a logging solution that sales and customer support could access. The company already had a system in place in which developers built custom queries so that sales and support teams could draw business insights from the logs. An improved logging solution would therefore benefit teams throughout the business. “Support and sales will come to us with different requests, and we will often need to build tooling or build dashboards that they're going to use throughout the rest of their day.”

The Search

Netlify began an expansive search that included a wide variety of options. “We reached out to a bunch of the kind of normal players like Splunk and Elasticsearch and we started looking at the startups in the space: LogDNA and pretty much anything that started with ‘log.’ The tech team decided to go ahead and start a couple of different POCs with a couple of companies at the same time, and then started seeing that over time Humio actually ended up winning out from a feature-set capability for us.”

For Netlify, the key winning feature of Humio was how it allowed them to customize their logs in order to accommodate the requests of other departments. “The key factor that it came down to was the ability to customize the tool for us internally. There’s a lot of capabilities in our logs. There’s a lot of information and it’s a lot of contextual knowledge. Being able to share that contextual knowledge through saved searches, dashboards, common queries, things like that, enabled my operations team to run faster and helped our engineering team build more tools for support and sales.”

The trial and testing phase wasn’t over yet. Netlify needed to be sure not only that they could gain customization and logging capacity, but also that they could answer all the same questions as with their previous solution. “We installed Filebeat and started piping data in parallel to Humio. We did this at first just to verify correctness: are we able to at least answer the same questions with Humio? And slowly we figured, yes we can. Actually, we were able to gain more information out of it. That really solidified our decision.”

The Solution

Positive Effects for Developers

Humio’s unlimited ingest lets Ryan and his team make decisions about changes to their system with full confidence, knowing that they are doing the right thing. “When I'm able to log everything, it means that I can look at my developers and say, ‘go ahead, absolutely.’ That's really powerful, especially because you can't ever really go back in time and add more information. You can't say, ‘I wish I knew that field.’ So it's more like ‘let’s remove it from the developer’s mind, log it, and then potentially we'll be able to use it later in an effective way.’”

Positive Effects for Sales and Engineering

The benefits of Humio extended to the rest of the team. Netlify’s developers created tools for use internally that made logs accessible to sales and engineering teams. “Our support team and our sales team are also consumers of Humio and they work with us and engineering to help enable them to answer questions. If we have a customer reporting an incident, we need to figure out what went wrong there. We can help support and give them the tools to find the answer for themselves, or have searchable frameworks. If it's sales, they come to us and say, ‘I'm looking at a customer's usage. Could you tell me about it, or other customers that are in this space?’”

Positive Effects for Customers

The benefits of Humio integration also extend to the end-user experience for Netlify’s customers. Logging everything and then being able to search in real time unlocks powerful means of responding to outages. “We're able to serve our end-user better because it really impacts our uptime and our mean time to discovery. In the event of an incident going off, we're able to look in our metrics charts and see issues and problems. We see averages, percentiles, throughput rates dropping, latency spiking, things like that. And then we're able to jump through the Humio data and figure out, is that one client, is it a region? Any kind of problem.”

Netlify’s use of Humio shows how a real-time scalable solution can have positive ripple effects that spread within an organization and even extend out to customers.

"When I'm able to log everything, it means that I can look at my developers and say, ‘go ahead, absolutely.’ That's really powerful, especially because you can't ever really go back in time and add more information." - Ryan Neal, Head of Infrastructure at Netlify
