Introduction to data flow mapping
How can you know that your company data is truly safe?
Security teams used to answer this question by scanning data at rest. They periodically scanned the company's limited set of data stores to piece together a picture of where data was at all times.
In just a few years, however, things have changed in a fundamental way. Architectures have become so complex that it is no longer possible to keep track of data with an at-rest approach alone. To maintain control of data, it is essential to also track data in motion and work from a data flow map. A smart strategy uses data flow maps first and at-rest scanning second: organizations should use data in motion to assess which data stores hold sensitive data and should be prioritized for at-rest scanning.
What is data flow mapping?
Data flow mapping is the process of visualizing and tracking the flow of data from acquisition to disposal. It is the missing piece of the puzzle that helps keep data safe even as it flows through highly fragmented, complex, and dynamic environments. Beyond providing a bird’s-eye view of what’s going on, it can help uncover where data may be vulnerable and provide clear steps to mitigate risk and prevent breaches.
Why data flow mapping?
Up until recently, data was stored centrally in a limited number of databases that were periodically scanned at rest. This allowed security teams to keep track of data and make sure it was protected.
In modern architectures, however, data passes through hundreds or even thousands of applications and third-party vendors, moving across cloud providers and in and out of shadow databases. Trying to capture this dynamic and fast-paced flow of data with static snapshots is virtually impossible.
Scanning every data source is impractical and prohibitively expensive. Tracking a single data transfer might require the copying and processing of petabytes of data.
Even more importantly, if you only scan data at rest, you miss out on the entire data journey — where it’s been, where it’s going, who owns it, etc. This information couldn’t be more critical when you need to get to the root cause of a problem quickly.
This is where data flow mapping comes in.
Data flow mapping benefits
Having the ability to map the flow of data automatically and drill down to the data layer has many benefits.
With data flow mapping, you can:
- Increase coverage: Data flow mapping allows organizations to automatically discover all external services — including generative AI (GenAI) — and analyze and classify the data that flows to them.
- Comply with regulations: Knowing where sensitive data is at all times and securing it properly is crucial to meeting privacy and security regulations, such as the GDPR and CCPA. PCI DSS, for example, requires that credit card data be confined to a dedicated environment, a requirement that applies to data in processing as well as data at rest. Upholding this requirement for in-flight data is not possible by scanning data at rest alone.
- Reduce data scanning costs: Data flow mapping can radically reduce public cloud costs by keeping the number of scans to a minimum. One way it does this is by providing security teams with the ability to identify which data stores contain sensitive or high-value data and prioritize these for scanning and analysis, eliminating the need to scan and analyze low-value data stores. Data flow mapping also does this by capturing changes as they occur rather than taking snapshots of everything at every step of the way.
- Supercharge remediation: Data flow mapping plays an important role in improving security posture by visualizing and tracking the flow of data within an organization in real time. This helps organizations identify potential vulnerabilities or risks as they occur, uncovering unauthorized services, stopping data leaks in their tracks, and lowering the impact of such events.
- Make better decisions: Data flow mapping enables organizations to make more informed decisions about data management by providing business context and understanding of how data is used and shared within the organization. This includes determining what data to collect, how to store and secure it, and how long it should be retained.
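As a concrete illustration of the cost-reduction point above, the sketch below ranks data stores for at-rest scanning based on what the flow map has observed arriving at each store. All names, tags, and the scoring scheme are hypothetical, not a specific product's API; a real flow map would feed this from live classification results.

```python
# Hypothetical sketch: use flow-map observations to decide which data
# stores warrant a full (and costly) at-rest scan.

SENSITIVE_TAGS = {"pii", "payment_card", "credentials"}

# Observed flows: (destination data store, classification tags seen in transit)
observed_flows = [
    ("orders-db", {"payment_card", "pii"}),
    ("analytics-lake", {"telemetry"}),
    ("crm-db", {"pii"}),
    ("debug-bucket", set()),
]

def scan_priority(flows):
    """Rank data stores by how much sensitive data flows into them."""
    scores = {}
    for store, tags in flows:
        scores[store] = scores.get(store, 0) + len(tags & SENSITIVE_TAGS)
    # Highest-scoring stores first; zero-score stores can be deprioritized.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

for store, score in scan_priority(observed_flows):
    print(store, score)
```

Stores that never receive sensitive data fall to the bottom of the list, which is precisely how data in motion can eliminate the need to scan low-value stores at rest.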
Challenges with data flow mapping
The first thing to know about data flow mapping is that it can be extremely tricky to implement, especially if done manually. There are several big challenges to look out for:
- Architectural complexity. One of the biggest challenges of mapping the flow of data is that modern architectures have become incredibly complex and fragmented. It is almost impossible to keep track of data that travels through hundreds or even thousands of applications each day.
- Blind spots. Data often flows unexpectedly, going to unmanaged databases, shadow data stores, and third-party services. It can be difficult to map and protect data that flows to locations you know nothing about. The result is a flow map that may seem whole but is rife with blind spots. The worst part is that these blind spots are probably where sensitive data needs the most protection.
- Tedious and time-consuming tasks. Organizations must continually monitor and update data flow maps as systems change and new data routes form.
Tackling these challenges on your own is not only difficult and time-consuming but prone to errors and incredibly frustrating. In the following section, we introduce two automated methods that can help overcome these issues.
Automated data flow mapping methods
There are a few different ways to automatically map the flow of data, and it’s important to understand the differences between them.
Log analysis
One common method is to create a data flow map based on logs and metadata. This involves collecting log data from various sources — such as servers, applications, and network devices — and then using it to create a map of how data flows through an organization.
Though this approach provides useful information, it has some significant drawbacks. Log data is typically limited in scope and may not capture all data movements. In addition, logs are data-blind — they can identify that two assets have communicated, but they cannot say anything about the nature of the data that was transferred between them. This leaves security teams performing educated guesswork about the data type, which can lead to a wide range of security gaps.
Take a database known to contain personally identifiable information (PII), for example: because logs are data-blind, log analysis must flag every single communication with that database as a potential PII transfer, even when the traffic carries nothing sensitive. Beyond causing alert fatigue, log analysis can also miss vulnerable PII that flies beneath the radar, hiding in unstructured data, unexpected fields, and the like.
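The data-blind limitation is easy to see in a minimal sketch of log-based mapping. The log format and asset names below are invented for illustration: the resulting map records which assets communicated and how often, but nothing about what data moved between them.

```python
# Hypothetical sketch of log-based flow mapping: connection logs reveal
# WHICH assets communicated, but not WHAT data was transferred.

from collections import defaultdict

log_lines = [
    "2024-05-01T10:00:00Z src=checkout-svc dst=orders-db",
    "2024-05-01T10:00:05Z src=checkout-svc dst=orders-db",
    "2024-05-01T10:01:00Z src=report-job dst=orders-db",
]

def build_flow_map(lines):
    """Aggregate connection logs into a map of (src, dst) -> hit count."""
    edges = defaultdict(int)
    for line in lines:
        fields = dict(f.split("=", 1) for f in line.split() if "=" in f)
        edges[(fields["src"], fields["dst"])] += 1
    return dict(edges)

flow_map = build_flow_map(log_lines)
# Every edge into orders-db must be treated as a potential PII transfer,
# because the log carries no payload information at all.
```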
Let’s now turn to an approach that eliminates these issues by looking directly at the data itself.
Payload analysis in runtime
A more comprehensive method is to create a data flow map based on payload analysis in a runtime module. This involves analyzing the actual data payloads as they flow through an organization in real time.
This approach provides a more complete and accurate picture of data movements because it captures all data flows and includes information about the content and context of the data. It’s the only way to truly understand where sensitive data is flowing instead of relying on incomplete or potentially misleading log data.
To reap the full benefits of data flow mapping, it is important to implement it in a way that does not impact performance. One of the best ways to do so is with a runtime module powered by eBPF, as this keeps resource overhead and friction to a minimum.
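To make the contrast with log analysis concrete, the sketch below shows only the classification step of payload analysis: inspecting the content actually moving between services and tagging what it contains. In practice the capture itself happens in an eBPF-based runtime module, and the detection patterns here are deliberately simplified examples, not production-grade classifiers.

```python
# Hypothetical sketch of the classification step in payload analysis:
# tag sensitive content found in the actual bytes in transit.

import re

PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def classify_payload(payload: str) -> set:
    """Return the set of sensitive-data tags found in a payload."""
    return {tag for tag, rx in PATTERNS.items() if rx.search(payload)}

print(classify_payload('{"user": "ada@example.com", "note": "hello"}'))
# Unlike log analysis, a benign payload to the same database yields no tags:
print(classify_payload('{"status": "ok"}'))
```

Because classification happens on the payload itself, benign traffic to a sensitive database is no longer flagged, and sensitive values hiding in unstructured fields are still caught.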