Transform SOC with Next-Gen SIEM

Uncover the future of SIEM technology. Enhance your security operations center with cutting-edge SIEM strategies and automation.

Download Your Guide Now

Logs are an essential component of any IT system. They help you do any of the following:

  • Monitor infrastructure performance
  • Detect application bugs
  • Conduct root cause analysis
  • Investigate security incidents
  • Track user behavior
  • … and more.

To fully utilize your logs, you need a robust log management system that can cope with the various structured and unstructured formats they come in.

A well-designed log management solution will ingest, parse, and store logs regardless of their format. This means you can search, analyze, and correlate data from different systems to find trends, create dashboards, and even trigger alerts to improve your business processes.

In this article, we’ll discuss general log formats and then cover some of the commonly used log formats across IT systems.

A Brief Introduction to Log Formats

A log format defines how the contents of a log file should be interpreted. Typically, a format specifies:

  • Whether the log contents are structured or unstructured
  • Whether the data is in plain text or binary
  • What kind of encoding the log file will use
  • How records are delimited

Log formats can also define the fields contained within the log file and the data types of those fields, for example name=text or age=number. Special fields, like timestamps, usually follow a predefined format (such as ISO 8601, in which a timestamp is written as 2022-07-10T15:21:00.000).

Applications usually define their available log format(s). Sometimes, the application gives the user a choice of format (for example, JSON or CSV). For hardware devices, manufacturers usually define the log types to be used.

Structured, semi-structured, and unstructured logs

Log files come in structured, semi-structured, or unstructured formats.

Structured log formats have a clear, consistent pattern and can be read by both humans and machines. Fields are often separated by a character such as a comma (as in CSV files), a space, or a hyphen. A field name and its value may also be joined with an equals sign (for example, name=Jane or city=Paris).

Most log management systems have pre-configured parsers built in and can easily ingest structured log formats. Below is an example of a structured log file:

[{
  "Env" : "Prod",
  "ServerName" : "LAPTOP123",
  "AppName" : "Console1.vmhost.exe",
  "AppLoc" : "C:\\Test\\stackify-api-dotnet\\dst\\ConsoleApplication1\\bin\\Debug\\Console1.vmhost.exe",
  "Logger" : "StackifyLib.net",
  "Msgs" : [{
    "Msg" : "Incoming metrics data",
    "data" : "{\"clientid\":12345}",
    "EpochMs" : 1445345672470,
    "Level" : "INFO",
    "id" : "0c12301b-e4ge-11e6-8933-897567896a4"
  }]
}]

Unstructured log formats don't follow a particular pattern, although they are still easy for humans to read. The lack of a pattern makes it difficult to split the events and extract key-value pairs during parsing. If the log management system has no built-in parser for the format, an unstructured log will require custom parsing, which often creates extra work for the engineer. Below is an example of an unstructured log:

2018-10-25 11:56:35,008 INFO  [LAPTOP321-1-3]  c.a.c.d.RFC4519DirectoryMembershipsIterable Found 7 children for 7 groups in 2 ms
Starting process to remove.
Process started.
Process completed.

Semi-structured logs are easy for humans to read, but also have a schema or pattern, making it possible for machines to read too. They have more complex field and event separators than a comma or an equal sign, but they do have a pattern. Log management systems can ingest semi-structured logs but usually require a parser to split events and extract key-value pairs. This is usually done using regular expressions or other code.
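To make the regular-expression approach concrete, here is a minimal Python sketch. It assumes a line layout of timestamp, level, bracketed thread name, logger, and free-text message, like the first line of the sample above. The pattern and the field names (timestamp, level, thread, logger, message) are illustrative assumptions, not a parser shipped by any particular log management product.

```python
import re

# Assumed layout: "<date> <time>,<ms> <LEVEL> [<thread>] <logger> <message>"
LINE_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<logger>\S+)\s+"
    r"(?P<message>.*)"
)

def parse_line(line):
    """Return the extracted fields as a dict, or None if the line does not fit."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None

sample = (
    "2018-10-25 11:56:35,008 INFO  [LAPTOP321-1-3]  "
    "c.a.c.d.RFC4519DirectoryMembershipsIterable "
    "Found 7 children for 7 groups in 2 ms"
)
event = parse_line(sample)
print(event["level"], event["thread"], event["message"])

# A line without the expected prefix does not match and is returned as None.
print(parse_line("Process started."))
```

Lines that return None are the ones that would need a different pattern or manual handling, which is where most of the custom parsing effort tends to go.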

Commonly Used Log Formats

While log formats vary widely across systems, applications, and tools, certain log formats are commonly used. Let’s cover the notable ones in more detail.

JSON

JavaScript Object Notation (JSON) is one of the most commonly used log formats. JSON logs are semi-structured, containing multiple key-value pairs. With JSON, logs can nest data into different layers while keeping the format easy for humans to read. JSON also preserves data types: string, number, boolean, null, object, and array.

As a relatively new format, JSON usually uses UTF-8 encoding at rest and in transit, which makes it readable on both *nix and Windows operating systems. There are no restrictions on the quantity or type of fields you can include. This works well with NoSQL (or schema-less) databases but can require extra work from the log author to keep field types consistent between apps and log sources.

Here is an example JSON log file:

{
  "timestamp": "2022-07-29T02:03:45.293Z",
  "message": "User Jane.Doe has logged in",
  "log": {
    "level": "info",
    "file": "auth.c",
    "line": 66
  },
  "user": {
    "name": "jane.doe",
    "id": 235
  },
  "event": {
    "success": true
  }
}
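As a rough illustration of how JSON keeps nesting and data types intact, the following Python sketch loads a record shaped like the example above (written as valid JSON on one line) with the standard json module. The variable names are arbitrary, and converting the timestamp is an optional extra step.

```python
import json
from datetime import datetime

raw = (
    '{"timestamp": "2022-07-29T02:03:45.293Z", '
    '"message": "User Jane.Doe has logged in", '
    '"log": {"level": "info", "file": "auth.c", "line": 66}, '
    '"user": {"name": "jane.doe", "id": 235}, '
    '"event": {"success": true}}'
)

event = json.loads(raw)

# Nested objects become dicts, and JSON types map to Python types.
line_number = event["log"]["line"]        # int
succeeded = event["event"]["success"]     # bool
user_name = event["user"]["name"]         # str

# Older Python versions reject a trailing "Z" in fromisoformat, so it is
# rewritten as an explicit UTC offset first.
timestamp = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))

print(user_name, line_number, succeeded, timestamp.isoformat())
```

Because the types survive parsing, a downstream query can compare line numbers numerically or filter on the boolean without any string conversion.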

Windows Event logs

Windows Event logs contain data relating to events that occur on the Windows operating system. Security, application, system, and DNS events are some examples of Windows Event logs, and they all use the same log format.

Windows Event logs are often used by system administrators for troubleshooting system or application errors, investigating security incidents, or tracking user logins. They are usually very detailed, including information such as timestamp, event ID, username, hostname, message, and task category.

Here is an example Windows Event log:

An account was successfully logged on.

Subject:
    Security ID: SYSTEM
    Account Name: DESKTOP-LLHJ389$
    Account Domain: WORKGROUP
    Logon ID: 0x3E7

Logon Information:
    Logon Type: 7
    Restricted Admin Mode: -
    Virtual Account: No
    Elevated Token: No

Impersonation Level: Impersonation

New Logon:
    Security ID: AzureAD\RandyFranklinSmith
    Account Name: rsmith@montereytechgroup.com
    Account Domain: AzureAD
    Logon ID: 0xFD5113F
    Linked Logon ID: 0xFD5112A
    Network Account Name: -
    Network Account Domain: -
    Logon GUID: {00000000-0000-0000-0000-000000000000}

Process Information:
    Process ID: 0x30c
    Process Name: C:\Windows\System32\lsass.exe

Network Information:
    Workstation Name: DESKTOP-LLHJ389
    Source Network Address: -
    Source Port: -

Detailed Authentication Information:
    Logon Process: Negotiate
    Authentication Package: Negotiate
    Transited Services: -
    Package Name (NTLM only): -
    Key Length: 0

CEF

Common Event Format (CEF) is an open, text-based log format used by security-related devices and applications. Originally developed by ArcSight, the vendor behind ArcSight Enterprise Security Manager, CEF is widely used by SIEM and log management systems to collect and aggregate data.

CEF logs use UTF-8 encoding and include a common prefix, a CEF header, and a variable extension that contains a list of key-value pairs.

The prefix contains the timestamp of the event and the hostname. The header is a series of pipe-delimited fields: the CEF version, device vendor, device product, device version, device event class ID, name, and severity. The rest of the log message is the extension, a list of key-value pairs that carry additional custom fields to enrich the event.

Here is an example entry that uses CEF (shown without the timestamp and hostname prefix):

CEF:0|Trend Micro|Deep Security Manager||600|User Signed In|3|src=10.52.116.160 suser=admin target=admin msg=User signed in from 2001:db8::5
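The header and extension layout described above can be split apart with a few lines of Python. This is a simplified sketch rather than a complete CEF parser: the function name parse_cef and the dictionary keys are made up for illustration, and escaped equals signs inside extension values are not handled.

```python
import re

HEADER_FIELDS = [
    "cef_version", "device_vendor", "device_product", "device_version",
    "event_class_id", "name", "severity",
]

def parse_cef(line):
    """Split a CEF record into its seven header fields and an extension dict."""
    start = line.find("CEF:")
    if start == -1:
        raise ValueError("not a CEF record")
    # Anything before "CEF:" is the optional timestamp/hostname prefix.
    body = line[start + len("CEF:"):]
    # Split on pipes that are not escaped with a backslash. Seven splits
    # yield the seven header fields plus the extension as the eighth part.
    parts = re.split(r"(?<!\\)\|", body, maxsplit=7)
    if len(parts) < 7:
        raise ValueError("incomplete CEF header")
    record = dict(zip(HEADER_FIELDS, parts[:7]))
    extension = parts[7] if len(parts) > 7 else ""
    # A value runs until the next " key=" or the end of the line, so values
    # that contain spaces (such as msg) stay in one piece.
    pairs = re.findall(r"(\w+)=(.*?)(?=\s+\w+=|$)", extension)
    record["extension"] = dict(pairs)
    return record

sample = (
    "CEF:0|Trend Micro|Deep Security Manager||600|User Signed In|3|"
    "src=10.52.116.160 suser=admin target=admin "
    "msg=User signed in from 2001:db8::5"
)
record = parse_cef(sample)
print(record["device_vendor"], record["name"], record["extension"]["msg"])
```

Note that the device version is empty in the sample, so the parser returns an empty string for that field instead of shifting the remaining header fields.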

CLF

The NCSA Common Log Format (CLF) is one of the oldest log formats used by web servers. It’s a standardized, text-based log file with a fixed format, which means you can’t customize the fields. Each line in the log file includes:

  • Remote host address
  • Remote log name
  • Username
  • Timestamp
  • Request and Protocol Version
  • HTTP Status Code
  • Bytes Sent

A hyphen is used to represent a field that doesn’t contain data for that event, and a plus (+) sign represents unsupported characters.

Here is an example CLF log:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
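Because CLF has a fixed field order, a single regular expression is usually enough to parse it. The Python sketch below is one possible way to do so; the group names (host, logname, user, and so on) are chosen here for readability and are not part of the format itself.

```python
import re
from datetime import datetime

# One named group per CLF field, in the fixed order the format defines.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)$'
)

def parse_clf(line):
    """Parse one CLF line into a dict, or return None if it does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    # A hyphen marks a field with no data for this event.
    record = {k: (None if v == "-" else v) for k, v in match.groupdict().items()}
    record["timestamp"] = datetime.strptime(
        record["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    record["status"] = int(record["status"])
    if record["size"] is not None:
        record["size"] = int(record["size"])
    return record

entry = parse_clf(
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
print(entry["host"], entry["user"], entry["status"], entry["size"])
```

The hyphen in the remote log name position comes back as None, which keeps "no data" distinct from an empty string in later analysis.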

ELF

The Extended Log Format (ELF) is used by web applications. It is similar to CLF but carries more information and offers flexibility over which fields are recorded. Each ELF entry contains data relating to a single HTTP transaction. Fields are separated by white space, and a hyphen represents a missing field.

The beginning of the log contains directive lines with information about the version, date, time, software, and any relevant comments. Each directive line starts with a hash (#) symbol. The directives also list the field names, making it much easier for log handlers to parse all the fields properly.

W3C

The W3C Extended Log File Format is a highly customizable log format used by Windows IIS servers. You can configure which fields to include, helping to reduce the size of the log files and keep only relevant information. Available fields include:

  • Timestamp
  • Client IP
  • Server IP
  • URI-Stem
  • HTTP Status Code
  • Bytes Sent
  • Bytes Received
  • Time Taken
  • Version

Some field names carry a prefix that shows which side of the transaction they relate to: s (server), c (client), sc (server-to-client action), or cs (client-to-server action).

Here is an example of a W3C log:

#Software: Internet Information Services 6.0
#Version: 1.0
#Date: 2001-05-02 17:42:15
#Fields: time c-ip cs-method cs-uri-stem sc-status cs-version
17:42:15 172.16.255.255 GET /default.htm 200 HTTP/1.0
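Since the #Fields directive names the columns, a parser can read the field list from the log itself instead of hard-coding it. The following Python sketch shows the idea under simple assumptions: fields are separated by white space, and the function name parse_w3c is hypothetical.

```python
def parse_w3c(lines):
    """Parse W3C extended log lines into dicts keyed by the #Fields names."""
    fields = []
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive line, for example "#Fields: time c-ip cs-method".
            name, _, value = line[1:].partition(":")
            if name.strip().lower() == "fields":
                fields = value.split()
            continue
        values = line.split()
        # A hyphen stands for a field with no value in this entry.
        record = {f: (None if v == "-" else v) for f, v in zip(fields, values)}
        records.append(record)
    return records

sample = [
    "#Software: Internet Information Services 6.0",
    "#Version: 1.0",
    "#Date: 2001-05-02 17:42:15",
    "#Fields: time c-ip cs-method cs-uri-stem sc-status cs-version",
    "17:42:15 172.16.255.255 GET /default.htm 200 HTTP/1.0",
]
records = parse_w3c(sample)
print(records[0]["c-ip"], records[0]["cs-uri-stem"], records[0]["sc-status"])
```

If the server configuration changes which fields are logged, the new #Fields line is picked up automatically the next time it appears in the file.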

Arfan Sharif is a product marketing lead for the Observability portfolio at CrowdStrike. He has over 15 years of experience driving Log Management, ITOps, Observability, Security and CX solutions for companies such as Splunk, Genesys and Quest Software. Arfan graduated in Computer Science from Bucks and Chilterns University and has a career spanning Product Marketing and Sales Engineering.