An access log is a log file that records all events related to client applications and user access to a resource on a computer. Examples include web server access logs, FTP command logs, and database query logs.
Managing access logs is an important task for system administrators. Software developers, operations engineers, and security analysts use access logs to monitor how their application is performing, who is accessing it, and what’s happening behind the scenes. Access logs can help IT teams discover problems, detect threats, and identify capacity issues.
Access logs typically share some common information, including:
- The date and time of client access
- The client IP address or hostname
- Username
- The status or criticality of the event
- Success or failure of the operation
- Any relevant messages
In this article, we’ll consider why access logs are important, different types of access logs and their locations, their contents, and the various configuration parameters involved.
Access Log Types
We can broadly classify access logs into three main categories:
- Activity logs
- Server access logs
- Error logs
Activity logs
An activity log records all the actions performed by a user during a session. Such activities include executing commands, visiting URLs, and accessing files.
Server access logs
Server access logs contain information about user connections and their resource requests. Unlike activity logs, these logs don’t contain detailed information about what the user actually did. Examples of server access logs include:
- Linux lastlog
- Windows security log
- AWS S3 bucket server access log
- Oracle Directory Server access log
- SQL Server login audits
Error logs
Error logs contain diagnostic information about errors encountered during client sessions. These logs are useful for troubleshooting application and system errors.
To keep things simple, we’ll focus on web server access logs in this article. Typically, web server access logs contain all three types of information (user access, user activity, and request errors).
Why Do You Need to Capture Access Logs?
Capturing and analyzing web server access logs is beneficial for system administrators.
First, it shows the web application’s availability and health for faster error troubleshooting. For example, if the access log shows a high number of HTTP 404 errors, it means users are trying to access one or more non-existent pages, or the site is linking to the wrong URLs.
An access log also helps troubleshoot critical errors. For example, a high number of 5xx errors indicates the web server is encountering internal errors—part of the site is probably crashing. Looking further into the web server error log can reveal more information.
Digital marketing is another area of value for web server access logs. Using the access log entries, digital marketers can identify areas on the site where users visit, request data, complete forms, download files, or click links. All these can power fine-tuned user profiling and search engine optimization.
SecOps engineers use web server access logs to find unusual behaviors or anomalies. An unexpected surge of HTTP GET requests from a specific range of IP addresses is one example. This may signal a possible DDoS attack from a set of compromised computers. If a web server is only supposed to accept HTTP/HTTPS traffic from a web application firewall, then direct HTTP requests from other IP addresses can indicate possible unauthorized access.
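The two checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the client IPs have already been extracted from the access log, the addresses are IPv4, and the function names, the /24 grouping, and the threshold are all hypothetical choices rather than recommended values.

```python
import ipaddress
from collections import Counter

def requests_per_network(client_ips, prefix_len=24):
    """Count requests per network; the /24 grouping is an illustrative default."""
    counts = Counter()
    for ip in client_ips:
        # strict=False lets us pass a host address and get its enclosing network.
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(network)] += 1
    return counts

def flag_surges(counts, threshold):
    """Return networks whose request count reaches the (hypothetical) threshold."""
    return sorted(net for net, n in counts.items() if n >= threshold)

def unexpected_sources(client_ips, allowed):
    """Return client IPs that are not in the allowed list (for example, the WAF)."""
    allowed_set = {ipaddress.ip_address(a) for a in allowed}
    return sorted({ip for ip in client_ips if ipaddress.ip_address(ip) not in allowed_set})

# Made-up sample data using documentation address ranges.
sample_ips = ["203.0.113.5"] * 3 + ["203.0.113.77"] * 2 + ["198.51.100.7"]
surge_counts = requests_per_network(sample_ips)
print(flag_surges(surge_counts, threshold=5))
print(unexpected_sources(sample_ips, allowed=["198.51.100.7"]))
```

In practice, the threshold would need to be tuned against the site's normal traffic, and a real detection pipeline would also consider time windows rather than raw totals.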
What Does an Access Log Contain?
Typically, a web server access log will contain the following types of information:
Field | Description |
---|---|
Date and time | The date and time the site/page was accessed, which can be in UTC or in the web server’s local time. |
Source IP | The client machine’s IP address. |
Destination IP | IP address of the web server. |
Destination FQDN | The web server’s fully qualified domain name. |
Destination port | The requested port on the web server. This is typically 80 (default for HTTP) or 443 (default for HTTPS) but can be anything depending on which port the website is running. |
Protocol | The network protocol the client used to access the server. A typical example is HTTP/1.1. |
Username | User accessing the website (if anonymous, this is denoted by a hyphen). |
Resource | The page or element requested. |
HTTP method | The HTTP request method (such as GET, POST, and so on). |
HTTP status code | Status code returned by the web server (such as 200 OK, 404 Not Found, and so on). |
URI Query | The application query sent to the website as part of the HTTP request. |
HTTP referrer | The IP address or URL that directed the client to this website. |
HTTP user agent | The type and version of the client browser. |
Bytes received | The number of bytes received by the web server from the client. |
Bytes sent | The number of bytes sent by the web server to the client. |
To see what these fields look like, let’s consider the following snippet from an Apache web server access log:
116.35.41.41 - - [21/May/2022:11:22:41 +0000] "GET /aboutus.html HTTP/1.1" 200 6430 "http://34.227.9.153/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.4 Safari/605.1.15"
Here, the access log shows a client request coming from the IP address 116.35.41.41 on May 21st, 2022 at 11:22 a.m. server time (the +0000 offset shows the timestamp is in UTC). The client accessed the aboutus.html page under the website’s root directory. The HTTP status code was 200 (that is, the client request was successful), and the referring website address is http://34.227.9.153/. The user’s browser was Apple’s Safari, and the web server sent 6430 bytes to the client when it served the page.
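To make the field breakdown concrete, here is a minimal Python sketch that splits the sample entry into named fields. It assumes the log uses Apache's combined format; the regular expression and the group names (host, user, referrer, and so on) are our own labels for illustration, not something Apache defines.

```python
import re
from datetime import datetime

# Regex for a combined-format line; group names are illustrative labels.
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one combined-format line, or None if it does not match."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Month names are parsed with %b, which assumes an English locale.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A hyphen in the size field means no bytes were sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    # A well-formed request line is "METHOD resource PROTOCOL".
    parts = entry["request"].split()
    if len(parts) == 3:
        entry["method"], entry["resource"], entry["protocol"] = parts
    return entry

SAMPLE = ('116.35.41.41 - - [21/May/2022:11:22:41 +0000] '
          '"GET /aboutus.html HTTP/1.1" 200 6430 "http://34.227.9.153/" '
          '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 '
          '(KHTML, like Gecko) Version/15.4 Safari/605.1.15"')

entry = parse_line(SAMPLE)
print(entry["host"], entry["method"], entry["resource"], entry["status"], entry["size"])
```

Returning None for lines that do not match lets a caller skip malformed entries instead of failing on them.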
By aggregating such information from the access log, you can find the:
- Number of unique visitors per page or unique pages per visitor
- Geolocations of site visitors
- Most commonly accessed parts of a site
- Most commonly used client queries
- Total number of responses for each HTTP status code
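A few of these aggregations can be sketched in Python as follows. The sample lines are made up for illustration (they use documentation IP ranges), the regular expression assumes the common or combined format, and the function name summarize is hypothetical.

```python
import re
from collections import Counter, defaultdict

# Matches the leading fields shared by the common and combined formats.
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<resource>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def summarize(lines):
    """Return (status counts, hits per page, unique visitors per page)."""
    status_counts = Counter()
    page_hits = Counter()
    visitors_per_page = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not look like access log entries
        # Drop the query string so /index.html?x=1 counts as /index.html.
        path = match["resource"].split("?", 1)[0]
        status_counts[match["status"]] += 1
        page_hits[path] += 1
        visitors_per_page[path].add(match["host"])
    unique_visitors = {page: len(hosts) for page, hosts in visitors_per_page.items()}
    return status_counts, page_hits, unique_visitors

# Hypothetical sample entries, not taken from a real server.
SAMPLE_LINES = [
    '203.0.113.5 - - [21/May/2022:11:22:41 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '203.0.113.5 - - [21/May/2022:11:22:45 +0000] "GET /aboutus.html HTTP/1.1" 200 6430',
    '198.51.100.7 - - [21/May/2022:11:23:02 +0000] "GET /index.html?ref=mail HTTP/1.1" 200 1024',
    '198.51.100.7 - - [21/May/2022:11:23:10 +0000] "GET /missing.html HTTP/1.1" 404 196',
]

status_counts, page_hits, unique_visitors = summarize(SAMPLE_LINES)
print(status_counts.most_common())
print(page_hits.most_common(1))
print(unique_visitors)
```

Counting unique visitors by IP address is only an approximation, since several users can share one address and one user can appear under several.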
How to Find Access Logs
A web server’s access log location depends on the operating system and the web server itself.
For example, the default location of the Apache web server’s access log in RHEL-based systems is /var/log/httpd. In Debian-based systems like Ubuntu, the location is /var/log/apache2.
For Nginx, by default, the access log is in the /var/log/nginx directory in both RHEL and Debian-based systems.
The default access log location for Internet Information Services (IIS) running on a Windows server is %SystemDrive%\inetpub\logs\LogFiles\W3SVC<site_id>. The %SystemDrive% is typically C:, and the site_id is the IIS-hosted website’s ID.
There are different ways administrators can read a web server’s access logs. For Linux-based systems, a site administrator can SSH into the web server and use commands like cat, tail, and grep to read the file. Sometimes, webmasters may have to use the hosting provider’s control panel (such as cPanel) to open and read the access log.
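When a script is more convenient than the shell, the same tail and grep style of reading can be approximated in Python. The sketch below is a simplified stand-in for those commands, not a replacement; the helper names and the example path in the comment are assumptions, and the actual file name depends on the server's configuration.

```python
import re
from collections import deque

def tail(path, n=10):
    """Return the last n lines of a text file, similar to `tail -n`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the final n lines while streaming the file.
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]

def grep(lines, pattern):
    """Return the lines that match a regular expression, similar to `grep -E`."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Example usage (the path is an assumption; adjust it to the server's setup):
#   recent = tail("/var/log/nginx/access.log", 100)
#   not_found = grep(recent, r'" 404 ')
```

Reading the log usually requires the same file permissions as the shell commands would, so the script may need to run as a user in the appropriate group.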
How to Configure Access Logs
Like most other settings, you can set the properties of a web server access log in its configuration file. The location of a web server’s main configuration file depends on the web server itself and the OS. The table below lists the defaults:
Web server | OS | Main Configuration File |
---|---|---|
Apache | RHEL-based | /etc/httpd/conf/httpd.conf |
Apache | Debian-based | /etc/apache2/apache2.conf |
Nginx | RHEL-based | /etc/nginx/nginx.conf |
Nginx | Debian-based | /etc/nginx/nginx.conf |
IIS | Windows Server | %WinDir%\System32\Inetsrv\Config\ApplicationHost.config |
Some of the common access log settings in any web server are:
- Log location
- Log format
- Log level
- Log rotation
The access log location can be different for each website hosted on the web server. For example, in Apache, the following directive sets the server-wide access log location:
CustomLog "/var/log/httpd2/access_log" common
But this can be overridden for a VirtualHost:
<VirtualHost *:80>
    # The *:80 address is an assumption; use the site's actual address and port.
    ServerName www.mysite.com
    ServerAlias test.com
    DocumentRoot /var/www/html/test.com
    ErrorLog /var/log/httpd/mysite.com/error_log
    CustomLog /var/log/httpd/mysite.com/access_log combined
</VirtualHost>
The access log format configuration specifies the fields to include in log entries. The access log format can be common or combined. The snippet below shows a sample configuration:
LogFormat "%h %l %u %t \"%r\" %>s %b" common
Here:
- %h is the remote hostname
- %l is the remote logname from identd (if supplied)
- %u is the client's user ID (if available)
- %t is the timestamp the request was received
- %r is the first line of the HTTP request
- %>s is the HTTP status code returned by the webserver
- %b is the size of the resource returned in bytes
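One way to see how these tokens map onto a log line is to translate the format string into a regular expression. The Python sketch below does this for the seven tokens listed above only; the group names and the format_to_regex helper are hypothetical, and any other Apache format token would raise a KeyError in this simplified version.

```python
import re

# One regex fragment per token from the list above; group names are our own.
TOKEN_PATTERNS = {
    "%h": r"(?P<host>\S+)",
    "%l": r"(?P<logname>\S+)",
    "%u": r"(?P<user>\S+)",
    "%t": r"\[(?P<time>[^\]]+)\]",
    "%r": r'(?P<request>[^"]*)',
    "%>s": r"(?P<status>\d{3})",
    "%b": r"(?P<size>\d+|-)",
}

# Finds tokens such as %h or %>s inside a format string.
TOKEN_RE = re.compile(r"%>?[a-zA-Z]")

def format_to_regex(log_format):
    """Build a compiled regex from a format string that uses the tokens above."""
    pieces = []
    pos = 0
    for token in TOKEN_RE.finditer(log_format):
        # Literal text between tokens (spaces, quotes) must match exactly.
        pieces.append(re.escape(log_format[pos:token.start()]))
        pieces.append(TOKEN_PATTERNS[token.group()])  # KeyError if unsupported
        pos = token.end()
    pieces.append(re.escape(log_format[pos:]))
    return re.compile("".join(pieces))

# The common format, written without the config file's backslash escapes.
COMMON_FORMAT = '%h %l %u %t "%r" %>s %b'
common_regex = format_to_regex(COMMON_FORMAT)

line = '116.35.41.41 - - [21/May/2022:11:22:41 +0000] "GET /aboutus.html HTTP/1.1" 200 6430'
fields = common_regex.fullmatch(line).groupdict()
print(fields)
```

Deriving the pattern from the format string keeps the parser aligned with the configured format, though a production parser would need to cover many more tokens and edge cases such as escaped quotes in the request line.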
You can refer to the Apache documentation to see how to use the custom log module to configure your own access log format.
Other Apache log configuration settings include log level and log rotation. Log level allows you to include only specific events that meet a certain criticality level and above. Note that Apache’s LogLevel directive controls the verbosity of the error log; the access log records every request regardless of this setting. The criticality levels can be debug, info, notice, warn, error, crit, alert, emerg, and anything between trace1 to trace8. The lower the log level, the more verbose log entries will be. In the snippet below, we are configuring Apache to record only warn level messages and above:
LogLevel warn
This can be overridden for Apache VirtualHosts.
You can set Apache log rotation using the Linux logrotate utility or Apache’s rotatelogs program.
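Once rotation is in place, analysis scripts often need to read the current log together with its rotated copies. The Python sketch below assumes rotated files sit in the same directory, share the base name (for example access_log, access_log.1, access_log.2.gz), and that compressed copies end in .gz; the actual naming depends on how rotation is configured, and the helper names are hypothetical.

```python
import gzip
from pathlib import Path

def open_log(path):
    """Open a plain or gzip-compressed log file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "rt", encoding="utf-8", errors="replace")

def iter_rotated_lines(log_dir, base_name="access_log"):
    """Yield lines from every file starting with base_name, oldest file first."""
    # Ordering by modification time is an assumption that rotated files keep
    # their original timestamps.
    files = sorted(Path(log_dir).glob(base_name + "*"),
                   key=lambda p: p.stat().st_mtime)
    for path in files:
        with open_log(path) as handle:
            for line in handle:
                yield line.rstrip("\n")

# Example usage (directory and base name are assumptions):
#   for line in iter_rotated_lines("/var/log/httpd"):
#       ...
```

Using a generator keeps memory use low, since only one line is held at a time even when the rotated history is large.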