This blog was originally published Jan. 6, 2020 on humio.com. Humio is a CrowdStrike Company.
Humio is purpose-built to aggregate and retain billions of streaming logs, then analyze and visualize them to determine the health of the environment, something we describe as “feeling the hum of the system.” Humio developers tenaciously optimize data ingest, retention, compression, and storage to take advantage of modern hardware. In a talk at QCon 2019 called “Real-time Log Analytics at Scale with Kafka,” our CTO Kresten Krab Thorup describes some of the technology powering Humio. He explains how Humio is able to ingest hundreds of terabytes of streaming data per day and still deliver instant ingest, real-time alerts and reports, and lightning-fast search results.
Optimized search is at the heart of Humio
The primary way users interact with Humio is by searching: exploring the data to answer questions and get to the root cause of issues. At the heart of the Humio platform are advanced proprietary algorithms that provide search results with sub-second latency. One way we optimize performance is to bypass indexing the data. As a result, the moment the data is ingested, alerts and reports are updated, and the data is available to search. But without an index, Humio needs to be clever about how search is implemented.
How Humio optimizes brute-force search
There are lots of clever ways to perform a search. The simplest to conceptualize is a brute-force linear search that sequentially checks every element. But when it’s used with a large number of candidates, it can be really slow. So it may surprise you that we make Humio so incredibly fast by using brute-force methods to search logs. You may cringe thinking about using brute force to search so much log data. It sounds like a huge waste of resources. Isn’t it a tool from the stone ages?! Our engineers have discovered that when brute-force search is altered to increase efficiency, it becomes as fast as index-based searching, enabling you to search streaming data as well as stored data!
Mechanical sympathy
Humio engineers understand the intricacies of the hardware and apply the principles of mechanical sympathy: looking at performance from the machine’s perspective and making it easier for the machine to do its job successfully. The term was coined by race car driver Jackie Stewart, who said, “You don’t have to be an engineer to be a racing driver, but you do have to have mechanical sympathy.” So when we use brute-force searching, we implement it in a way that makes the work as easy as possible for your processors. Instead of searching your entire database, Humio uses time- and metadata-selection to reduce the problem space. We then speed up the remaining brute-force search by compressing the data, so there is less to move, and by pulling data from cache whenever possible. An optimized brute-force search with Humio can run at 30-40x the speed of a regular search!
How brute-force search becomes fast
- Normal time to search 1 GB of data from a drive takes
- If you compress the data by 10x, it takes less time to search, so it now takes
- Add more cores - split the data across four cores, 256 MB each, and it takes less time:
- Divide the data into chunks of 1 MB of decompressed data rather than 256 MB. Move the compressed data from main memory to the CPU, and keep it in the CPU cache so it does not have to be read from main memory again. Now it only takes
- In practice, you’re often searching for current and recent data that can be kept in the page cache, so you can get search times down to
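The steps above, together with the time- and metadata-selection mentioned earlier, can be sketched in a few lines of Python. This is an illustration of the general idea only, not Humio's implementation: the `Segment` type, the function names, the use of `zlib`, and the thread pool are all assumptions made for the example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

CHUNK_SIZE = 1024 * 1024  # ~1 MB of uncompressed log text per chunk (illustrative)


@dataclass
class Segment:
    """Hypothetical unit of stored logs: metadata plus compressed chunks."""
    tags: dict    # e.g. {"host": "web-1"}
    start: int    # earliest event timestamp in the segment
    end: int      # latest event timestamp in the segment
    chunks: list  # zlib-compressed blocks of newline-delimited log lines


def compress_lines(lines, chunk_size=CHUNK_SIZE):
    """Group log lines into chunks of roughly chunk_size bytes and compress each."""
    chunks, current, size = [], [], 0
    for line in lines:
        current.append(line)
        size += len(line) + 1
        if size >= chunk_size:
            chunks.append(zlib.compress(b"\n".join(current)))
            current, size = [], 0
    if current:
        chunks.append(zlib.compress(b"\n".join(current)))
    return chunks


def select_segments(segments, tags, start, end):
    """Reduce the problem space: keep segments that match the tags and overlap the time range."""
    return [
        s for s in segments
        if s.start <= end and s.end >= start
        and all(s.tags.get(k) == v for k, v in tags.items())
    ]


def scan_chunk(compressed, needle):
    """Decompress one small chunk and check every line (the brute-force step)."""
    return [line for line in zlib.decompress(compressed).split(b"\n") if needle in line]


def search(segments, tags, start, end, needle, workers=4):
    """Scan all chunks of the selected segments with a pool of workers; results keep chunk order."""
    chunks = [c for s in select_segments(segments, tags, start, end) for c in s.chunks]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = pool.map(lambda c: scan_chunk(c, needle), chunks)
        return [line for hits in per_chunk for line in hits]
```

Small chunks are the point of the design: each worker decompresses and scans a block small enough to stay in cache, and the selection step means most data is never touched at all. The thread pool here only illustrates splitting work across workers; it says nothing about how Humio schedules work across cores.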
Why Humio uses brute-force search
Making streaming data instantly searchable is what creates the need for brute-force searching. Queries on streaming data have to return results almost as soon as the data arrives, which leaves no time to build the complex indexes that could make searching faster. To search both streaming and historical data at high speed, we bring in optimized brute-force searching. Many conventional log management systems that index heavily will search faster than typical brute-force searching, but not faster than Humio’s optimized brute-force search. And by indexing heavily, many of these conventional logging systems sacrifice streaming access to their data.
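To make the contrast with indexing concrete: a brute-force filter needs no preparation step, so the same check can run over stored events and over events as they arrive. The following is a minimal sketch under that assumption; `matching`, `is_error`, and `live_feed` are hypothetical names for illustration and do not reflect Humio's query engine.

```python
from typing import Callable, Iterable, Iterator


def matching(events: Iterable[str], predicate: Callable[[str], bool]) -> Iterator[str]:
    """Yield each event that satisfies the predicate, in arrival order."""
    for event in events:
        if predicate(event):
            yield event


def is_error(event: str) -> bool:
    """Example filter: a plain substring check, no index required."""
    return "status=error" in event


# Historical data: events that are already stored.
stored = ["10:00 status=ok", "10:01 status=error"]


def live_feed() -> Iterator[str]:
    """Stand-in for a live stream; a real feed would yield events as they are ingested."""
    yield "10:02 status=ok"
    yield "10:03 status=error"


historical_hits = list(matching(stored, is_error))
live_hits = list(matching(live_feed(), is_error))
```

Because `matching` is a generator, each live event is checked the moment it is produced, with no index to update first.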