Modern distributed applications are built on microservices architectures that typically run in Kubernetes-managed containers. This architectural shift has also changed how these applications generate logs. Due to the ephemeral nature of Kubernetes Pods, operations teams may not have consistent access to the containers in those Pods to collect application logs. Application logs are lost whenever a Pod goes down or the orchestrator evicts it. The cluster nodes where the Pods run can also be transient due to the elastic nature of cloud-hosted infrastructure.
In this Kubernetes logging guide, we cover the fundamentals of the Kubernetes logging architecture and some common use cases. In part one, we covered basic node-level logging and cluster-level logging using a node-level logging agent.
In part two, we will cover cluster-level logging using sidecar patterns and the benefits of centralized logging. We will also introduce Falcon LogScale, a modern log management solution.
What Is Cluster-level Logging?
Node-level logging, with the default logging drivers, redirects the output from an application's stdout and stderr streams to appropriate log files. These files exist only during the lifetime of the Pod. The data is lost when Kubernetes evicts a Pod or performs garbage collection.
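As a quick illustration (exact paths vary by container runtime and Kubernetes distribution, so treat these as typical defaults rather than guarantees), you can see both sides of this behavior on a running cluster:

# Read a container's stdout/stderr streams through the API server while the Pod exists
kubectl logs <pod-name> -c <container-name>

# On the node itself, the runtime typically writes those streams to files such as
ls /var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container-name>/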
Cluster-level logging focuses on aggregating logs using a backend service so that logs exist beyond the lifetime of the Pods and their containers. Kubernetes recommends two methods for implementing cluster-level logging.
The first method uses the DaemonSet pattern with a node-level agent. A single agent Pod runs on each node and is responsible for capturing logs from all the Pods on that node and shipping them to a backend service. This is the method we covered in part one of this guide, and a minimal sketch follows below.
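As a refresher, the skeleton of that node-level agent looks roughly like this (the Fluentd image, names, and mounts are illustrative; a real deployment also needs RBAC, backend connection settings, and agent configuration):

# Node-level agent: one copy runs on every node and reads the node's log files
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
spec:
  selector:
    matchLabels:
      name: log-agent
  template:
    metadata:
      labels:
        name: log-agent
    spec:
      containers:
      - name: log-agent
        image: fluent/fluentd-kubernetes-daemonset:elasticsearch
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log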
The second recommended method involves the sidecar pattern.
What Is the Sidecar Pattern?
The sidecar pattern attaches a companion container to each application Pod, and this container is responsible for capturing all the logs from the Pod. Because every Pod gets its own logging container, the sidecar pattern uses more resources than the DaemonSet pattern. Despite this, the sidecar pattern is popular because it offers a great solution when the following are true of your environment:
- Applications don't log their output to stdout and stderr.
- Logs from different application Pods need to be stored in separate locations.
- Application Pods output their logs in different formats.
The sidecar pattern has several advantages over the DaemonSet pattern:
- There’s no need to enforce a single logging format for application containers when a sidecar container collects the logs.
- The architecture provides good isolation between different application containers.
- When a log agent is bundled with the sidecar, the nodes don't store any logs, which removes the need for log rotation.
Implementing Cluster-level Logging with the Sidecar Pattern
The sidecar pattern runs a companion container alongside the primary container to collect its log files. There are two ways you can implement this pattern:
- Streaming sidecar container
- Logging agent
Let’s cover each of these in detail.
Streaming sidecar container
A streaming sidecar has one job: fetch the logs from the application container and write them to node-level directories. A separate node-level agent then fetches the logs from the directories on the node and sends them to a logging backend. This method is suitable for applications that use a non-standard logging method or don't send their logs to stdout or stderr.
If there's no need for application-level isolation when capturing logs, the logging agent can run at the node level. This arrangement avoids the extra resource consumption that bundling the logging agent with each sidecar would incur.
The snippet below implements a streaming sidecar container.
apiVersion: v1
kind: Pod
metadata:
  name: primary
spec:
  containers:
  # Primary container: writes a counter and timestamp to a log file
  - name: primary
    image: busybox:1.28
    args:
    - /bin/sh
    - -c
    - >
      i=0;
      while true;
      do
        echo "$i: $(date)" >> /var/log/server.log;
        i=$((i+1));
        sleep 5;
      done
    volumeMounts:
    - name: logdir
      mountPath: /var/log
  # Streaming sidecar: tails the shared log file and writes it to its own stdout
  - name: secondary
    image: busybox:1.28
    args: [/bin/sh, -c, 'tail -n+1 -F /var/log/server.log']
    volumeMounts:
    - name: logdir
      mountPath: /var/log
  volumes:
  - name: logdir
    emptyDir: {}
In the above configuration, we start with a busybox image (a lightweight Linux utility image) and use a simple script that increments a counter every five seconds. The script writes the counter value and the current timestamp to a log file called server.log.
Next, we create a sidecar container (named secondary) that tails this log file. The output of the tail command is automatically redirected to stdout, where the default logging driver picks it up and saves it to node-level directories. The node-level logging agent can then access it from those directories and ship it to the logging backend.
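Assuming you save the manifest above as streaming-sidecar.yaml (the filename is arbitrary), you can create the Pod and then confirm that the sidecar exposes the application's log file on its stdout stream:

kubectl apply -f streaming-sidecar.yaml
kubectl logs primary -c secondary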
The problem with this approach is that it still doesn’t provide enough application-level isolation. It also lacks the flexibility to handle logs from various Pods differently.
Sidecar pattern with logging agent
The alternative option is to use a logging agent, like Fluentd, embedded in the sidecar container. This arrangement ensures that the logging agent runs at the application level and not at the node level. Naturally, the resource usage is higher than the other implementations discussed here.
The snippet below shows a sidecar configuration with the Fluentd collection agent bundled.
apiVersion: v1
kind: Pod
metadata:
  name: fluentd-sidecar
spec:
  containers:
  # Primary container: writes a counter and timestamp to a log file
  - name: primary
    image: busybox:1.28
    args:
    - /bin/sh
    - -c
    - >
      i=0;
      while true;
      do
        echo "$i: $(date)" >> /var/log/server.log;
        i=$((i+1));
        sleep 1;
      done
    volumeMounts:
    - name: logdir
      mountPath: /var/log
  # Sidecar with a bundled Fluentd agent that ships logs to Elasticsearch
  - name: secondary
    image: fluent/fluentd-kubernetes-daemonset:elasticsearch
    env:
    - name: FLUENT_ELASTICSEARCH_HOST
      value: "<elasticsearch-host-url>"
    - name: FLUENT_ELASTICSEARCH_PORT
      value: "<elasticsearch-host-port>"
    volumeMounts:
    - name: logdir
      mountPath: /var/log
  volumes:
  - name: logdir
    emptyDir: {}
After saving the configuration file as fluentd-sidecar.yaml, you can create the Pod with the following command:
kubectl apply -f fluentd-sidecar.yaml
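Once the Pod is running, you can verify that the bundled agent started correctly by checking the sidecar container's own output; connection errors, such as an unreachable Elasticsearch host, will surface here:

kubectl get pod fluentd-sidecar
kubectl logs fluentd-sidecar -c secondary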
A sidecar with a bundled logging agent can isolate applications better than the one that uses streaming. It’s also flexible enough to handle logs from each application Pod differently.
Benefits of Centralized Logging for Kubernetes
A cluster-level logging setup uses a centralized logging backend to remove the limitations of short-lived container logs. This has clear advantages over the basic node-level logging provided by the default configuration. One of these advantages is that logs are not stored on the node, which removes the need for log rotation.
With centralized logging, container logs live beyond the lifespan of the container, the Pod, and the cluster node. Centralized logging platforms also offer the following benefits:
- Visualization of logs through charts and dashboards
- Powerful query engines and custom parsing capabilities
- Automatic correlation of log events to identify anomalies, patterns, and trends
- Ability to create alerts based on specific log event criteria
You can use many commercial and open-source frameworks and tools for implementing a centralized logging backend. Such logging platforms can work with both DaemonSet and sidecar patterns.
Conclusion
As we have covered in this series, Kubernetes-based applications require a different log management approach. The short-lived nature of Kubernetes containers means logs get lost when a container crashes, a Pod is evicted, or a node goes down. For this reason, node-level logging (which comes by default with Kubernetes) is not an ideal solution, particularly if you want to save the logs for later analysis.
Cluster-level logging using either the node-level agent DaemonSet pattern or sidecar pattern solves this problem. Although the sidecar pattern provides better flexibility than the node-level agent pattern, it uses more resources. Depending on the flexibility you need, you can use either pattern. Both patterns work well with centralized logging platforms.
Log your data with CrowdStrike Falcon Next-Gen SIEM
Elevate your cybersecurity with the CrowdStrike Falcon® platform, the premier AI-native platform for SIEM and log management. Experience security logging at a petabyte scale, choosing between cloud-native or self-hosted deployment options. Log your data with a powerful, index-free architecture, without bottlenecks, allowing threat hunting with over 1 PB of data ingestion per day. Ensure real-time search capabilities to outpace adversaries, achieving sub-second latency for complex queries. Benefit from 360-degree visibility, consolidating data to break down silos and enabling security, IT, and DevOps teams to hunt threats, monitor performance, and ensure compliance seamlessly across 3 billion events in less than 1 second.