Transform SOC with Next-Gen SIEM
Logs are an essential component of any IT system, helping you with any and all of the following:
- Monitor infrastructure performance
- Detect application bugs
- Conduct root cause analysis
- Investigate security incidents
- Track user behavior
- … and more.
To fully utilize your logs, you need a robust log management system that can cope with the various structured and unstructured formats they come in.
A well-designed log management solution will ingest, parse, and store logs—regardless of their formats. This means you can search, analyze, and correlate data from different systems to find trends, create dashboards, and even trigger alerts to improve your business processes.
In this article, we’ll discuss general log formats and then cover some of the commonly used log formats across IT systems.
A Brief Introduction to Log Formats
A log format defines how the contents of a log file should be interpreted. Typically, a format specifies:
- Whether the log contents are structured or unstructured
- Whether the data is in plain text or binary
- What kind of encoding the log file will use
- How records are delimited
Log formats can also define the fields contained within the log file and the data types for those fields, for example name=text or age=number. Special fields, like timestamps, usually follow predefined formats (such as ISO 8601, which would be displayed as 2022-07-10T15:21:00.000).
Applications usually define their available log format(s). Sometimes, the application gives the user a choice of format (for example, JSON or CSV). For hardware devices, manufacturers usually define the log types to be used.
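As a rough illustration of a format that declares field names and data types, the sketch below converts a space-delimited key=value record using a small schema. The schema, the field names, and the sample record are hypothetical and chosen only for this example.

```python
from datetime import datetime

# Hypothetical schema: field name -> converter for that field's data type.
SCHEMA = {
    "name": str,
    "age": int,
    "timestamp": datetime.fromisoformat,  # expects an ISO 8601 timestamp
}

def parse_record(line):
    """Split a space-delimited key=value record and convert each field."""
    record = {}
    for pair in line.split():
        key, _, raw = pair.partition("=")
        convert = SCHEMA.get(key, str)  # unknown fields stay as text
        record[key] = convert(raw)
    return record

record = parse_record("timestamp=2022-07-10T15:21:00.000 name=Jane age=34")
```

Because the schema carries the types, record["age"] is an integer and record["timestamp"] is a datetime object rather than raw text.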
Structured, semi-structured, and unstructured logs
Log files come in structured, semi-structured, or unstructured formats.
Structured log formats have a clear, consistent pattern and can be read by humans and machines. Fields are sometimes separated by a character such as a comma (as in CSV files), space, or hyphen. They may also be joined with an equals (=) sign (for example, name=Jane or city=Paris).
Most log management systems have pre-configured parsers built in and can easily ingest structured log formats. Below is an example of a structured log file:
[{"Env" : "Prod",
"ServerName" : "LAPTOP123",
"AppName" : "Console1.vmhost.exe",
"AppLoc" : "C:Teststackify-api-dotnetdstConsoleApplication1binDebugConsole1.vmhost.exe",
"Logger" : "StackifyLib.net",
"Msgs" : [{
"Msg" : "Incoming metrics data",
"data" : "{\"clientid\":12345}",
"EpochMs" : 1445345672470,
"Level" : "INFO",
"id" : "0c12301b-e4ge-11e6-8933-897567896a4"
}]
}]
Unstructured log formats don't follow a particular pattern. They are often still easy for humans to read, but the lack of a pattern makes it difficult to split the events and extract key-value pairs during parsing. If there is no built-in parser in the log management system, an unstructured log will require custom parsing, often creating extra work for the engineer. Below is an example of an unstructured log file:
2018-10-25 11:56:35,008 INFO [LAPTOP321-1-3] c.a.c.d.RFC4519DirectoryMembershipsIterable Found 7 children for 7 groups in 2 ms
Starting process to remove.
Process started.
Process completed.
Semi-structured logs are easy for humans to read, but also have a schema or pattern, making it possible for machines to read too. They have more complex field and event separators than a comma or an equal sign, but they do have a pattern. Log management systems can ingest semi-structured logs but usually require a parser to split events and extract key-value pairs. This is usually done using regular expressions or other code.
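The regular-expression approach mentioned above might look like the following minimal sketch. The sample line reuses the timestamped entry shown earlier in this article; the field names (thread, logger, and so on) are assumptions made for illustration, not part of any standard.

```python
import re

# Assumed layout: timestamp, level, [thread], logger name, free-text message.
LINE_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"\[(?P<thread>[^\]]+)\] "
    r"(?P<logger>\S+) "
    r"(?P<message>.*)"
)

def parse_line(line):
    """Return the named fields as a dict, or None if the line does not fit."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None

sample = (
    "2018-10-25 11:56:35,008 INFO [LAPTOP321-1-3] "
    "c.a.c.d.RFC4519DirectoryMembershipsIterable "
    "Found 7 children for 7 groups in 2 ms"
)
event = parse_line(sample)
```

Lines that do not match the pattern, such as a bare "Process started.", return None and would need a separate rule or manual review.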
Commonly Used Log Formats
While log formats vary widely across systems, applications, and tools, certain log formats are commonly used. Let’s cover the notable ones in more detail.
JSON
JavaScript Object Notation (JSON) is one of the most commonly used log formats. JSON logs are semi-structured, containing multiple key-value pairs. With JSON, logs can nest data into different layers while keeping the format easy to read by humans. JSON also provides a way of maintaining data types, such as string, number, boolean, null/empty, object, or array.
As a relatively new format, JSON usually uses UTF-8 encoding at rest and in transit, which makes it readable on both *nix and Windows operating systems. There are no restrictions on the quantity or type of fields you can include. This works well with NoSQL (or schema-less) databases but can require extra work from the log author to ensure consistency of field types between apps and log sources.
Here is an example JSON log file:
{"timestamp": "2022-07-29T02:03:45.293Z",
"message": "User Jane.Doe has logged in",
"log": {
"level": "info",
"file": "auth.c",
"line": 66
},
"user": {
"name": "jane.doe",
"id": 235
},
"event": {
"success": true
}
}
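To show how nesting and data types survive parsing, here is a small sketch that loads a JSON log entry with Python's standard json module. The get_path helper is a hypothetical convenience written for this example, not part of any library.

```python
import json

raw_line = (
    '{"timestamp": "2022-07-29T02:03:45.293Z", '
    '"message": "User Jane.Doe has logged in", '
    '"log": {"level": "info", "file": "auth.c", "line": 66}, '
    '"user": {"name": "jane.doe", "id": 235}, '
    '"event": {"success": true}}'
)

event = json.loads(raw_line)

# JSON types map onto native types: numbers become int/float, true becomes True.
line_number = event["log"]["line"]
succeeded = event["event"]["success"]

def get_path(record, dotted_path, default=None):
    """Walk nested objects using a dotted path such as 'user.name'."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

user_name = get_path(event, "user.name")
```

Returning a default for missing paths is one way to cope with the field inconsistencies between log sources mentioned above.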
Windows Event logs
Windows Event logs contain data relating to events that occur on the Windows operating system. Security, application, system, and DNS events are some examples of Windows Event logs, and they all use the same log format.
Windows Event logs are often used by system administrators for troubleshooting system or application errors, investigating security incidents, or tracking user logins. They are usually very detailed, including information such as timestamp, event ID, username, hostname, message, and task category.
Here is an example Windows Event log:
An account was successfully logged on.
Subject:
Security ID: SYSTEM
Account Name: DESKTOP-LLHJ389$
Account Domain: WORKGROUP
Logon ID: 0x3E7
Logon Information:
Logon Type: 7
Restricted Admin Mode: -
Virtual Account: No
Elevated Token: No
Impersonation Level: Impersonation
New Logon:
Security ID: AzureAD\RandyFranklinSmith
Account Name: rsmith@montereytechgroup.com
Account Domain: AzureAD
Logon ID: 0xFD5113F
Linked Logon ID: 0xFD5112A
Network Account Name: -
Network Account Domain: -
Logon GUID: {00000000-0000-0000-0000-000000000000}
Process Information:
Process ID: 0x30c
Process Name: C:\Windows\System32\lsass.exe
Network Information:
Workstation Name: DESKTOP-LLHJ389
Source Network Address: -
Source Port: -
Detailed Authentication Information:
Logon Process: Negotiate
Authentication Package: Negotiate
Transited Services: -
Package Name (NTLM only): -
Key Length: 0
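Assuming an event arrives as rendered text like the example above, the sketch below groups its "Key: Value" lines under the nearest section heading. The function name and the "General" fallback section are assumptions for illustration; the sample is an abridged excerpt of the event shown above.

```python
def parse_event_text(text):
    """Group 'Key: Value' lines under the most recent 'Section:' heading."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value:
            if current is None:
                # Pairs seen before any heading go into a fallback section.
                current = sections.setdefault("General", {})
            current[key.strip()] = value
        elif sep:
            # A line ending in a colon with no value starts a new section.
            current = sections.setdefault(key.strip(), {})
        # Lines without a colon (the description sentence) are skipped.
    return sections

sample = r"""An account was successfully logged on.
Subject:
Security ID: SYSTEM
Logon ID: 0x3E7
Logon Information:
Logon Type: 7
Process Information:
Process ID: 0x30c
Process Name: C:\Windows\System32\lsass.exe
"""
sections = parse_event_text(sample)
```

Splitting on the first colon only keeps values such as file paths intact. A field rendered with an empty value would be mistaken for a heading, so this is a starting point rather than a complete parser.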
CEF
Common Event Format (CEF) is an open, text-based log format used by security-related devices and applications. Developed by ArcSight (the vendor of Enterprise Security Manager), CEF is used by SIEM and log management systems when collecting and aggregating data.
CEF logs use UTF-8 encoding and include a common prefix, a CEF header, and a variable extension that contains a list of key-value pairs.
The prefix contains the timestamp of the event and the hostname. The header includes the CEF software version, device vendor, device product, device version, device event class ID, name, and severity. The rest of the log message comprises additional custom fields to enrich it.
Here is an example entry that uses CEF:
CEF:0|Trend Micro|Deep Security Manager||600|User Signed In|3|src=10.52.116.160 suser=admin target=admin msg=User signed in from 2001:db8::5
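The prefix, header, and extension layout described above can be pulled apart with a short parser. This is a simplified sketch: the dictionary key names are chosen for this example, and it does not unescape characters inside values.

```python
import re

CEF_HEADER_FIELDS = [
    "version", "device_vendor", "device_product", "device_version",
    "device_event_class_id", "name", "severity",
]

# A key=value pair whose value runs until the next " key=" or the end of line.
EXTENSION_PAIR = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")

def parse_cef(line):
    """Split a CEF record into its prefix, header fields, and extension pairs."""
    start = line.find("CEF:")
    if start == -1:
        raise ValueError("not a CEF record")
    prefix = line[:start].strip()
    # Split on unescaped pipes; maxsplit keeps the extension in one piece.
    parts = re.split(r"(?<!\\)\|", line[start + len("CEF:"):], maxsplit=7)
    if len(parts) < 8:
        raise ValueError("incomplete CEF header")
    header = dict(zip(CEF_HEADER_FIELDS, parts[:7]))
    extension = dict(EXTENSION_PAIR.findall(parts[7]))
    return {"prefix": prefix, "header": header, "extension": extension}

sample = (
    "CEF:0|Trend Micro|Deep Security Manager||600|User Signed In|3|"
    "src=10.52.116.160 suser=admin target=admin "
    "msg=User signed in from 2001:db8::5"
)
event = parse_cef(sample)
```

The lookahead in EXTENSION_PAIR is what allows values such as msg to contain spaces without being cut at the first word.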
CLF
The NCSA Common Log Format (CLF) is one of the oldest log formats used by web servers. It’s a standardized, text-based log file with a fixed format, which means you can’t customize the fields. Each line in the log file includes:
- Remote host address
- Remote log name
- Username
- Timestamp
- Request and Protocol Version
- HTTP Status Code
- Bytes Sent
A hyphen is used to represent a field that doesn’t contain data for that event, and a plus (+) sign represents unsupported characters.
Here is an example CLF log:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
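Because the CLF field order is fixed, a single regular expression is usually enough to parse it. The sketch below is one possible pattern, with group names chosen for this example; it converts hyphens to None and assumes an English locale for the month abbreviation.

```python
import re
from datetime import datetime

CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes_sent>\d+|-)'
)

def parse_clf(line):
    """Parse one CLF line into a dict, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # A hyphen marks a field with no data for this event.
    for field in ("logname", "user"):
        if entry[field] == "-":
            entry[field] = None
    entry["timestamp"] = datetime.strptime(
        entry["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    entry["status"] = int(entry["status"])
    entry["bytes_sent"] = (
        None if entry["bytes_sent"] == "-" else int(entry["bytes_sent"])
    )
    return entry

entry = parse_clf(
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
```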
ELF
The Extended Log Format (ELF) is used by web applications. It is similar to CLF but contains more information and flexibility over which fields are used. ELF logs contain data relating to a single HTTP transaction. Fields are separated by white space, and a hyphen represents a missing field.
The beginning of the log contains directives with information regarding the version, date, time, software, and any relevant comments. Each of these lines is prefixed with a hash (#) symbol. The log also declares the field names, making it much easier for log handlers to parse all the fields properly.
W3C
The W3C Extended Log File Format is a highly customizable log format used by Windows IIS servers. You can configure which fields to include, helping to reduce the size of the log files and keep only relevant information. Available fields include:
- Timestamp
- Client IP
- Server IP
- URI-Stem
- HTTP Status Code
- Bytes Sent
- Bytes Received
- Time Taken
- Version
Some fields are prefixed with s (server), c (client), sc (server to client action), or cs (client to server action) to show whether the field relates to the server or the client side.
Here is an example of a W3C log:
#Software: Internet Information Services 6.0
#Version: 1.0
#Date: 2001-05-02 17:42:15
#Fields: time c-ip cs-method cs-uri-stem sc-status cs-version
17:42:15 172.16.255.255 GET /default.htm 200 HTTP/1.0
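Since the field names travel with the log, a parser can read them from the #Fields directive instead of hard-coding them. The following is a minimal sketch of that idea; the function name is an assumption made for this example.

```python
def parse_w3c(lines):
    """Yield one dict per entry, keyed by the latest #Fields directive."""
    fields = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            directive, _, value = line[1:].partition(":")
            if directive == "Fields":
                fields = value.split()
            continue  # other directives (#Software, #Date, ...) are skipped
        cells = line.split()
        # A hyphen marks a field with no data for this entry.
        yield {
            name: (None if cell == "-" else cell)
            for name, cell in zip(fields, cells)
        }

sample = """#Software: Internet Information Services 6.0
#Version: 1.0
#Date: 2001-05-02 17:42:15
#Fields: time c-ip cs-method cs-uri-stem sc-status cs-version
17:42:15 172.16.255.255 GET /default.htm 200 HTTP/1.0
"""
entries = list(parse_w3c(sample.splitlines()))
```

Re-reading #Fields whenever it appears means the parser keeps working if the configured field list changes partway through a file.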