Beefing up the Sandbox (and More): Signature Chaining to Pinpoint More Malware Behaviors

This blog is intended for malware researchers working to develop signatures detecting malware, and engineers developing infrastructure supporting these signatures.

At CrowdStrike, we often leverage machine learning (ML) to detect malware, both in the cloud and on end hosts. In some circumstances, it can also be helpful to leverage signatures. Signatures, in this usage, are sets of heuristic logic that make determinations about data, such as whether it is malicious or whether it is important enough to pass on to higher-level classification. For example, signatures make for good ground truth for ML. In addition, in the cloud, where memory and CPU cycles are virtually infinite, signatures can be a handy tool for identifying specific types of software that exhibit specific behaviors.

Distinguishing between operations and characteristics of legitimate software and malware is complex.  Even legitimate software employs a byzantine collection of activities. The differences between normal activities and those indicating maliciousness are often subtle. Further complicating this distinction, malware authors often seek to hide their telling operations among activity observed in normal, legitimate applications.

Because of these challenges, signature analysis using singular, isolated activities or characteristics examined in a stateless manner is insufficient.

Figure 1. Operating in isolation

Figure 2. Operating together

What’s needed is the ability to incorporate data from a variety of sources when making heuristic decisions.

The solution presented here introduces signature chaining. Signature chaining allows discrete signatures to make determinations using data shared among signatures, rather than operating solely on the data provided to a single signature invocation.

This article outlines the starting point for signature chaining by describing common characteristics of signature analysis systems, the challenges these characteristics bring, and methods that address those challenges and ease the introduction of signature chaining.

Common Characteristics of Signature Analysis Systems

As stated, signatures are sets of heuristic, or rule-based, logic that make determinations based on input data. This data may be static (such as from a file on disk) or behavioral (such as a running process launching another process). For the purposes of this discussion, this data will be referred to as sample characteristics data. The origin and nature of the data (such as whether it is static or behavioral) do not need to be distinguished. The signature chaining solution operates on all sample characteristics data in the same manner.

Signatures are invoked by an analysis system, provided the relevant sample characteristics data, and then individually perform their heuristic analysis to make determinations. Determinations may include concluding that the sample is malicious (or clean, which is also a useful classification), or deciding to save additional data for later processing, such as machine learning or examination by malware researchers.

Sample characteristics data may be provided as a summation. For example, a single sample characteristic might be provided for each unique Windows Registry value-creation operation, regardless of the number of repeats of the operation. Uniqueness may be determined by the characteristic’s type (Registry value-create in this example) and its parameters (process performing the operation, Registry path, etc.). Alternatively, a run-length encoding might be included, where each unique sample characteristic includes the number of repeated instances.
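A minimal sketch of the run-length style of summation described above, written in Python. The `SampleCharacteristic` record and its field names are assumptions made for illustration; a real analysis system will have its own event format.

```python
from collections import Counter
from typing import NamedTuple


class SampleCharacteristic(NamedTuple):
    """Hypothetical record; the field names are illustrative only."""
    kind: str     # characteristic type, e.g. "registry_value_create"
    process: str  # process performing the operation
    target: str   # e.g. the Registry path


def summarize(events):
    """Map each unique characteristic (type plus parameters) to its repeat count."""
    return Counter(events)


events = [
    SampleCharacteristic("registry_value_create", "a.exe", r"HKCU\Software\Run\x"),
    SampleCharacteristic("registry_value_create", "a.exe", r"HKCU\Software\Run\x"),
    SampleCharacteristic("registry_value_create", "b.exe", r"HKCU\Software\Run\y"),
]
summary = summarize(events)
for characteristic, repeats in summary.items():
    print(characteristic.kind, characteristic.process, characteristic.target, repeats)
```

Because the tuple of type and parameters is the dictionary key, uniqueness is decided exactly as the paragraph describes, and dropping the counts gives the plain summation form.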

Signature invocation may optionally be controlled through filtering. In this usage, each signature indicates the types of sample characteristics it should be invoked with. For example, a signature may be invoked only for Registry value-creation events.
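As an illustration of filtering, the sketch below lets a signature declare the characteristic types it wants, and has the dispatcher invoke it only for those. The `wanted_kinds` attribute, the `dispatch` function and the dictionary event format are hypothetical names chosen for this example.

```python
class RegistryValueSignature:
    """Hypothetical signature that only wants Registry value-creation events."""
    wanted_kinds = {"registry_value_create"}

    def __init__(self):
        self.seen = []

    def match(self, characteristic):
        self.seen.append(characteristic)
        return False  # placeholder heuristic: never matches


def dispatch(signatures, characteristic):
    """Invoke only signatures whose filter includes this characteristic's kind."""
    invoked = []
    for sig in signatures:
        if characteristic["kind"] in sig.wanted_kinds:
            sig.match(characteristic)
            invoked.append(sig)
    return invoked


sig = RegistryValueSignature()
dispatch([sig], {"kind": "process_launch", "path": "a.exe"})         # filtered out
dispatch([sig], {"kind": "registry_value_create", "path": "HKCU"})   # delivered
```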

Commonly, as a performance optimization, the analysis system stops invoking further signatures once an earlier signature indicates a match.

In many analysis systems, signatures operate in isolation. This may be driven by simplification, or because signatures are invoked inline while behavioral events are received. Each signature operates only on the immediate parameters provided, such as the parameters for the unique Registry value-creation event in the earlier example.

Order of invocation may not be deterministic. Signatures may be invoked in any order, in parallel or in serial. This is often due to:

  • Signatures being invoked only after all relevant data is collected
  • Signature processing being performed by parallel threads
  • Signatures being invoked at the time a sample characteristic is collected (such as an event interception)

The preceding are common characteristics of signature analysis systems, which need to be recognized and addressed by a solution introducing signature chaining. 

Solution Goals

A successful solution to signature chaining must accommodate several goals. 

Minimal Goals

  • Facilitate introduction to existing signature-invocation systems
    • This is a primary goal for the solution
  • Provide a data store shareable across signature invocations
    • Signatures may add, retrieve and modify values
    • Data store “handed” to signatures upon invocation
    • Key-value system for accessing data store
      • Key and value names, and data format, specified by individual signatures
    • Synchronization across threads
  • Opt-in system: Existing signatures that do not opt-in to use signature chaining operate as before
  • All signatures opting in for chaining invoked regardless of earlier signature matches
  • Operates without dependency on signature invocation order
  • Operates without dependency on ordered delivery of sample characteristics (including behavioral events)
  • Support shared evaluation functions
    • Since invocation ordering is not deterministic, it is useful to use shared functions that perform evaluation of collated data
    • Each signature invokes the shared evaluation functions
    • The aggregate evaluation logic is hosted by these shared functions, rather than duplicated in the chained signatures
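The opt-in and "always invoke chained signatures" goals above might look like the following on the dispatcher side. This is a sketch under stated assumptions: the `set_shared_store` export name, both signature classes and `run_signatures` are invented for illustration and are not taken from any particular analysis system.

```python
class LegacySignature:
    """Existing signature: no opt-in export, unaware of chaining."""

    def __init__(self):
        self.calls = 0

    def match(self, characteristic):
        self.calls += 1
        return True  # placeholder heuristic: always matches


class ChainedSignature:
    """Opts in to chaining by exposing the hypothetical set_shared_store export."""

    def __init__(self):
        self.calls = 0
        self.store = None

    def set_shared_store(self, store):
        self.store = store

    def match(self, characteristic):
        self.calls += 1
        self.store["chained_calls"] = self.store.get("chained_calls", 0) + 1
        return False


def run_signatures(signatures, characteristic, store):
    """After a match, skip only the signatures that have not opted in."""
    matched = False
    for sig in signatures:
        opted_in = hasattr(sig, "set_shared_store")
        if matched and not opted_in:
            continue  # the existing truncation optimization still applies
        if opted_in:
            sig.set_shared_store(store)
        if sig.match(characteristic):
            matched = True
    return matched


first, second, chained = LegacySignature(), LegacySignature(), ChainedSignature()
shared_store = {}
result = run_signatures([first, second, chained], {"kind": "file_create"}, shared_store)
```

Here the second legacy signature is skipped once the first one matches, while the chained signature still runs and records its data, so existing signatures behave as before.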

Implementation Highlights

The key-value shared data store is expressed in a language-appropriate manner for the signatures. If the signatures are written in Python, for example, the key-value store may be expressed as a Python dictionary.

The store is passed to the signatures upon invocation. This may be as a parameter added to an existing export (a rules match function, for example) or via a new export, to avoid changes to existing functions.  Note that the latter assumes that the signatures are loaded and persisted while the rules logic is invoked. If the signatures are invoked in a purely stateless manner, then the shared store must be passed per invocation of the signature.
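The two delivery options can be sketched as follows, assuming Python signatures and a plain dictionary as the store. The export name `set_shared_store`, the `seen_kinds` key and the event format are hypothetical.

```python
class PersistentSignature:
    """Loaded once and kept alive; receives the store through a new export."""

    def __init__(self):
        self.store = None

    def set_shared_store(self, store):  # hypothetical new export
        self.store = store

    def match(self, characteristic):    # existing export, parameters unchanged
        self.store.setdefault("seen_kinds", set()).add(characteristic["kind"])
        return False


def stateless_match(characteristic, store):
    """Stateless variant: the store arrives as an added parameter on every call."""
    store.setdefault("seen_kinds", set()).add(characteristic["kind"])
    return False


shared_store = {}  # a plain dict serves as the key-value store

sig = PersistentSignature()
sig.set_shared_store(shared_store)
sig.match({"kind": "file_create"})

stateless_match({"kind": "process_launch"}, shared_store)
```

Both styles write to the same dictionary, so which one to use depends mainly on whether the analysis system keeps signatures loaded between invocations.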

Locking must be used to ensure integrity of the store across thread access. This locking may be coarse-grained, where the store is locked for each signature invocation. During invocation, the signature may then retrieve, add or change data in the store as needed. After invocation, the store is unlocked and then relocked for the next signature invocation. This is the simplest means of preserving store integrity. It is also less performant and scalable than other options. As the use of the store is opt-in, signatures not requiring it could run without this locking, mitigating impact somewhat.
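A minimal sketch of the coarse-grained approach using Python's standard `threading.Lock`. The `invoke_with_store` wrapper and the `count_event` signature are hypothetical; the point is only that the lock is held for the whole invocation.

```python
import threading

store = {}
store_lock = threading.Lock()


def invoke_with_store(signature_fn, characteristic):
    """Coarse-grained: hold the lock for the entire signature invocation."""
    with store_lock:
        return signature_fn(characteristic, store)


def count_event(characteristic, shared):
    """Hypothetical signature: a read-modify-write that needs the lock."""
    key = ("count", characteristic["kind"])
    shared[key] = shared.get(key, 0) + 1
    return False


threads = [
    threading.Thread(target=invoke_with_store,
                     args=(count_event, {"kind": "file_create"}))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Signatures that have not opted in could be called directly, without going through `invoke_with_store`, which is how the opt-in design limits the cost of this lock.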

An optimization might include locking performed by the signature itself via a callback prior to retrieving and operating on data in the store. Further optimizations are possible, but they increase complexity of the solution.
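One way to read the callback-based optimization is that the signature asks the store for the lock only around the lines that touch shared data. The sketch below assumes a hypothetical `SharedStore` wrapper with a `locked()` context manager; it is one possible shape, not a prescribed design.

```python
import threading
from contextlib import contextmanager


class SharedStore:
    """Hypothetical wrapper that hands out the data only while locked."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    @contextmanager
    def locked(self):
        """Callback a signature uses to lock only while touching the data."""
        with self._lock:
            yield self._data


def process_launch_signature(characteristic, store):
    path = characteristic["path"].lower()  # local work happens outside the lock
    with store.locked() as data:           # lock held only for the update
        data.setdefault(path, {})["launched"] = True
    return False


shared = SharedStore()
process_launch_signature({"path": r"C:\Temp\A.DAT"}, shared)
```

The trade-off is that each signature author now has to use the lock correctly, which is part of the added complexity the paragraph mentions.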

Application Example

An example of malicious activity readily detected with signature chaining is the behavior of creating a new file (perhaps in a temporary folder location) and then launching the file as a process. The intent of the file creation may be hidden somewhat by naming the file as a non-executable (not an .EXE in Windows).

With stateless signature invocation only operating on parameters for individual behavioral events, detecting this as malicious/suspicious activity is difficult. The data included with the file creation event doesn’t indicate suspiciousness by itself. The launch of the file as a process is perhaps more suspicious if efforts were made to hide the file’s nature (saved as a temp file, marked as not an executable file, etc.).  But even with these, the process launch event by itself is insufficient to detect maliciousness.

The best detection opportunity pairs the file creation and the process launch events. The file creation event analysis can filter for a file being created, opened for writing, etc., and note this in the key-value store. The process launch event analysis can check the key-value store, find the creation/open-for-write activity noted for the same file, pair it with the process launch event and make a detection.

For this example, two signatures (and optionally a third) will be developed. The first responds to notification of file creation events. The second responds to notification of process launch events. It is assumed that ordering of signature invocations is not predetermined.

An optional third signature performs static analysis on the file created/opened for write. The goal of this signature is to determine if the file is in an executable format, regardless of how the file is named (for example, the .EXE extension removed in Windows). This scan will need to be performed against the closed/committed file. The scan invocation can be done by the analysis system prior to the invocation of the behavioral systems, or the scan signature can be invoked by the signature handling the process launch event (as the file will have been closed/committed by then).

For this example, the shared store will be passed to an exported function of the signature. Export of this function also serves as the opt-in for the signature to participate in chaining.

The file creation event signature performs these operations:

  • Receives the shared store via the opt-in export function
  • Receives the invocation for the file creation event
    • Checks if the file is being newly created, opened for write
    • Optionally checks other indicators such as creation in a temp folder and creation of an ostensibly non-executable file
    • If sufficiently interesting, notes the file creation in the shared store, including the file path and other relevant data
  • Invokes shared analysis function
    • Returns results from the shared analysis function as the signature’s match results

The optional file scan signature performs these operations:

  • Receives the shared store via the opt-in export function
  • Receives the invocation for the file scan operation
    • Checks if the file’s contents indicate it is in an executable format
    • Notes scan findings in the shared store, including file path and other relevant data

The process launch event signature performs these operations:

  • Receives the shared store via the opt-in export function
  • Receives the invocation for the process launch event
    • Notes the process launch in the shared store, including the process file path and other relevant data
  • When an optional scan signature is included, checks in the store if the file was scanned by the scan signature
    • If not, invokes the file scan signature
  • Invokes shared analysis function
    • Returns results from the shared analysis function as the signature’s match results

The shared analysis function, called by each signature, performs these operations:

  • Receives the invocation from the chained file creation and process launch signatures
    • Shared store and file path passed as parameters
  • Retrieves information noted in the store by the optional scan signature, the file create signature and the process launch signature for the file path
    • If the file was newly created or opened for write and the file was launched as a process, determine that this is malicious/suspicious activity. Analysis function will return a match result.
    • If the file was created/named as a non-executable, but the optional scan signature noted it is in an executable format, determine that this indicates a malicious/suspicious attempt to hide the nature of the created/written file prior to launching. Note this in extended analysis results.
    • If the file was created in a temp folder, determine that this indicates a malicious/suspicious attempt to hide the nature of the created/written file prior to launching. Note this in extended analysis results.

Figure 3. Diagram illustrating the operations described
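The operations listed above can be sketched in Python as follows. Everything here is illustrative: the event dictionaries, the store keys, the `read_file` callback and the function names are assumptions, and the two-byte "MZ" check is a deliberately simplistic stand-in for real executable-format detection.

```python
def _entry(store, path):
    """Per-file record in the shared store, keyed by the normalized path."""
    return store.setdefault(path.lower(), {})


def analyze(store, path):
    """Shared analysis function: evaluates whatever has been noted so far."""
    info = store.get(path.lower(), {})
    result = {"match": False, "notes": []}
    if info.get("created") and info.get("launched"):
        result["match"] = True
        if info.get("named_non_executable") and info.get("scanned_executable"):
            result["notes"].append("executable content under a non-executable name")
        if info.get("in_temp_folder"):
            result["notes"].append("created in a temp folder")
    return result


def file_creation_signature(event, store):
    if event["new_or_write"]:
        rec = _entry(store, event["path"])
        rec["created"] = True
        rec["named_non_executable"] = not event["path"].lower().endswith(".exe")
        rec["in_temp_folder"] = "\\temp\\" in event["path"].lower()
    return analyze(store, event["path"])


def file_scan_signature(path, content, store):
    rec = _entry(store, path)
    rec["scanned"] = True
    rec["scanned_executable"] = content[:2] == b"MZ"  # simplistic format check


def process_launch_signature(event, store, read_file):
    rec = _entry(store, event["path"])
    rec["launched"] = True
    if not rec.get("scanned"):  # invoke the optional scan if it has not run yet
        file_scan_signature(event["path"], read_file(event["path"]), store)
    return analyze(store, event["path"])


create = {"path": r"C:\Temp\a.dat", "new_or_write": True}
launch = {"path": r"C:\Temp\a.dat"}
fake_reader = lambda path: b"MZ\x90\x00"  # stands in for reading the closed file

store = {}
before = file_creation_signature(create, store)
after = process_launch_signature(launch, store, fake_reader)
print(before["match"], after["match"], after["notes"])
```

Because both signatures write their own observation and then call the same `analyze` function, the detection is produced by whichever signature happens to run second, so the result does not depend on invocation order.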

Wrap-up

We hope this blog has been informative in illustrating how signature chaining provides powerful capabilities for detecting maliciousness, and how it can be introduced in a way that eases its addition to existing signature analysis systems.
