An overview of the Kubernetes API audit logs and other cluster log sources: which fields are useful, and some places where log visibility might be missing in most environments.


K8s API audit logs

There are several log sources in a Kubernetes environment. One of the best sources to use during an incident to determine what has occurred on a cluster is the API audit log. These logs contain all of the requests made to query or modify objects in the cluster. On the surface the API audit logs can be a bit overwhelming. Much like any log source, the more you look at it and the more you work with it, the easier it becomes to quickly digest the event data. As a responder, there are a few fields we will want to pay attention to: requestURI, sourceIPs, verb, user.username, and userAgent, along with the others summarized below.

field | purpose
requestURI | The resource that was requested, answering 'what' was requested.
sourceIPs | Where the request came from.
verb | Whether the request was creating/modifying a resource or querying information about a resource.
user.username | 'Who' made the request.
userAgent | The user agent of the application that made the request.
responseStatus.code | The HTTP status code returned for the request.
annotations.authorization.k8s.io/decision | RBAC decision, allow/forbid.
annotations.authorization.k8s.io/reason | Description of why a request was allowed.
user.extra.authentication.kubernetes.io/pod-name | Pod where the request originated.
user.extra.authentication.kubernetes.io/pod-uid | UID of the pod where the request originated.
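
If the audit log is being written to disk as JSON lines, a quick way to get familiar with these fields is to pull just the ones above out of every event and summarize them. This is a minimal sketch, not a definitive workflow; the log path is an assumption and will vary by distribution and cloud provider (managed offerings often ship audit events to their own logging service instead):

# summarize audit events by verb, user, URI, source IP, response code, and user agent
# (assumed path; adjust to wherever your API server writes its audit log)
sudo jq -r '[.verb, .user.username, .requestURI, (.sourceIPs // [] | join(",")), .responseStatus.code, .userAgent] | @tsv' \
  /var/log/kubernetes/audit/audit.log | sort | uniq -c | sort -rn | head -20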

Below is a sample API audit log event.

{
   annotations: {
     authorization.k8s.io/decision: allow
     authorization.k8s.io/reason: RBAC: allowed by ClusterRoleBinding "cluster-system-anonymous" of ClusterRole "cluster-admin" to User "system:anonymous"
   }
   apiVersion: audit.k8s.io/v1
   auditID: 5dbc599e-0726-45ae-a2e7-b1fb06c6e6f9
   kind: Event
   level: Metadata
   objectRef: {
     apiVersion: v1
     resource: secrets
   }
   requestReceivedTimestamp: 2022-06-17T12:01:14.222991Z
   requestURI: /api/v1/secrets
   responseStatus: {
     code: 200
     metadata: {
     }
   }
   sourceIPs: [
     172.20.39.107
   ]
   stage: ResponseComplete
   stageTimestamp: 2022-06-17T12:01:14.246392Z
   user: {
     groups: [
       system:unauthenticated
     ]
     username: system:anonymous
   }
   userAgent: curl/7.68.0
   verb: list
}

Digging into the event a bit, we know the verb is list, which tells us that this request is attempting to query data from the server. The username is system:anonymous, which means no authentication was performed. The requestURI is /api/v1/secrets; this API endpoint is used when attempting to retrieve all of the secrets from every namespace. If the requestURI had been /api/v1/namespaces/test/secrets, it would have been an attempt to query secrets from just the test namespace.
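
For reference, the kubectl commands that generate those two request paths look roughly like the following; kubectl is only one possible client, and the same endpoints can be hit with curl or any other HTTP client:

# list secrets across every namespace (roughly GET /api/v1/secrets)
kubectl get secrets --all-namespaces

# list secrets in only the test namespace (roughly GET /api/v1/namespaces/test/secrets)
kubectl get secrets -n test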

The responseStatus.code value is 200; just like any other web response code, a 200 means success. The two fields under annotations, authorization.k8s.io/decision and authorization.k8s.io/reason, tell us that the request was authorized from an RBAC point of view. The reason goes into further detail on why it was allowed, informing us that the ClusterRole cluster-admin is associated with the system:anonymous user via the ClusterRoleBinding cluster-system-anonymous.
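
If you have trusted access to the cluster, something along these lines should let you confirm the finding and inspect the binding named in the reason annotation. This is a sketch; note that impersonating system:anonymous with --as may or may not be permitted depending on the Kubernetes version and your own permissions:

# ask the API server whether the anonymous user would be allowed to list secrets
kubectl auth can-i list secrets --all-namespaces --as=system:anonymous

# review the ClusterRoleBinding referenced in the audit event
kubectl get clusterrolebinding cluster-system-anonymous -o yaml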

We can see the sourceIPs value is 172.20.39.107, which happens to be our load balancer in this scenario. If the request had been internal to the cluster, it would have had the internal IP of the node the request originated from. This is important to note because, in the event of malicious activity originating within a cluster that is also using the anonymous account, tracing the K8s API requests back to the node is typically as close as you can get to answering "where" a request came from using the API audit logs. However, if a service account is being used, the event will show the exact pod the request originated from, as long as a current version of Kubernetes is being utilized. Older versions may not show this level of detail in the log output.
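
When pivoting on either of those fields, filtering the audit log directly works well. A minimal sketch, again assuming the JSON-lines audit log path used earlier; the pod name below is a placeholder:

# events that arrived via a particular source IP
sudo jq -c 'select(.sourceIPs[]? == "172.20.39.107")' /var/log/kubernetes/audit/audit.log

# events attributed to a specific pod (<POD-NAME> is a placeholder)
sudo jq -c 'select(.user.extra["authentication.kubernetes.io/pod-name"][]? == "<POD-NAME>")' /var/log/kubernetes/audit/audit.log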

The userAgent value is curl/7.68.0. This field can easily be spoofed, which makes it hard to trust that curl was actually used. However, it can often be used to link events together to understand activity originating from the same actor.
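
One low-effort way to use it is to summarize activity by username and user agent, which tends to group requests from the same actor or tooling together. Same assumed log path as above:

# count events per username / user agent pair
sudo jq -r '[.user.username, .userAgent] | @tsv' /var/log/kubernetes/audit/audit.log | sort | uniq -c | sort -rn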

By using this information, we can infer that the request more than likely originated from outside the cluster and that the command used to make the request was something similar to this:

curl -k https://<KUBERNETES-API-LOAD-BALANCER-SERVER>/api/v1/secrets

Let’s take a look at one more log output:

{
   annotations: {
     authorization.k8s.io/decision: forbid
     authorization.k8s.io/reason:
   }
   apiVersion: audit.k8s.io/v1
   auditID: 4190c06f-5bb4-4654-a610-02af06ac0593
   kind: Event
   level: RequestResponse
   objectRef: {
     apiVersion: v1
     namespace: dev
     resource: pods
   }
   requestReceivedTimestamp: 2022-06-18T18:50:08.978700Z
   requestURI: /api/v1/namespaces/dev/pods?limit=500
   responseObject: {
     apiVersion: v1
     code: 403
     details: {
       kind: pods
     }
     kind: Status
     message: pods is forbidden: User "system:serviceaccount:dev:default" cannot list resource "pods" in API group "" in the namespace "dev"
     metadata: {
     }
     reason: Forbidden
     status: Failure
   }
   responseStatus: {
     code: 403
     metadata: {
     }
     reason: Forbidden
     status: Failure
   }
   sourceIPs: [
     172.20.59.66
   ]
   stage: ResponseComplete
   stageTimestamp: 2022-06-18T18:50:08.979502Z
   user: {
     extra: {
       authentication.kubernetes.io/pod-name: [
         nettools
       ]
       authentication.kubernetes.io/pod-uid: [
         340ea116-a70a-463c-90e8-f7819acf034f
       ]
     }
     groups: [
       system:serviceaccounts
       system:serviceaccounts:dev
       system:authenticated
     ]
     uid: eb04ad0d-29d6-4ea0-8df5-d666b26eebf9
     username: system:serviceaccount:dev:default
   }
   userAgent: kubectl/v1.24.2 (linux/amd64) kubernetes/f66044f
   verb: list
}

As we look through the event details, we can see that kubectl was used to query a list of pods within the dev namespace (requestURI=/api/v1/namespaces/dev/pods?limit=500). The response was a 403 because the service account (found in the user.username field), system:serviceaccount:dev:default, does not have permission to list pods within the dev namespace. There are some new fields in this event which help us quickly identify which pod the request came from. Under the user section we see the following:

   extra: {
     authentication.kubernetes.io/pod-name: [
       nettools
     ]
     authentication.kubernetes.io/pod-uid: [
       340ea116-a70a-463c-90e8-f7819acf034f
     ]
   }

The information in the extra section tells us the exact pod that was used. Since we know the pod name as well as the pod-uid, we have the opportunity to check the Kubernetes cluster to determine if the pod is still running. If it is, we could snapshot the underlying containers in the pod for further analysis.
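
One quick check is to compare the UID of whatever pod currently holds that name against the pod-uid from the audit event; if they match, it is the same pod, and the node it is scheduled on is where the underlying containers live. A sketch using the names from this event:

# does the running pod's UID match the pod-uid in the audit event?
kubectl get pod nettools -n dev -o jsonpath='{.metadata.uid}{"\n"}'

# which node is the pod scheduled on?
kubectl get pod nettools -n dev -o jsonpath='{.spec.nodeName}{"\n"}'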

Based on the information in the audit log it would appear the command used to create the event was similar to:

./kubectl get pods -n dev

And if we are lucky enough to have a 3rd party tool such as Falco, or some other tool providing runtime detection capabilities, we might have an alert such as this one to help confirm and validate our assumptions:

   output_fields: {
     container.id: eddba1ff1c56
     container.image.repository: docker.io/raesene/alpine-nettools
     container.image.tag: latest
     evt.time: 1655578208694767400
     fd.name: 100.96.1.23:37794->100.64.0.1:443
     k8s.ns.name: dev
     k8s.pod.name: nettools
     proc.cmdline: kubectl get pods -n dev
   }
   priority: Notice
   rule: Contact K8S API Server From Container

Kubelet logs

The kubelet is the control point for all the comings and goings on the node. Each node in the cluster will have a unique set of kubelet logs. Kubelet logs may be helpful when attempting to confirm when a particular container was started or removed. Depending on how it is configured, the logs may have a lot of additional information available as well.
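
On many distributions, including kops-built nodes, the kubelet runs as a systemd service, so journalctl is usually the quickest way to pull its logs for a window of interest. A sketch assuming a systemd-managed kubelet; the timestamps are example values:

# kubelet logs on this node for a specific time window
sudo journalctl -u kubelet --since "2022-06-18 19:40:00" --until "2022-06-18 19:45:00"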

This page from Red Hat has a handy reference chart for the different log levels available for the kubelet:

Log verbosity | Description
--v=0 | Always visible to an Operator.
--v=1 | A reasonable default log level if you do not want verbosity.
--v=2 | Useful steady state information about the service and important log messages that might correlate to significant changes in the system. This is the recommended default log level.
--v=3 | Extended information about changes.
--v=4 | Debug level verbosity.
--v=6 | Display requested resources.
--v=7 | Display HTTP request headers.
--v=8 | Display HTTP request contents.
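
To confirm what verbosity a particular node's kubelet is actually running with, checking the live process arguments is usually enough. A sketch, assuming shell access to the node:

# show just the verbosity flag from the kubelet's command line
ps -o args= -C kubelet | tr ' ' '\n' | grep -E '^--?v='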

At the time of writing, kops deploys the kubelet with a default log level of 2 (/usr/local/bin/kubelet ... --v=2) unless otherwise specified. Below are some sample events that show a container being removed, the nettools pod being removed, and then, a few seconds later, the nettools pod being created.

Jun 18 19:42:16 ip-172-20-59-66 kubelet[5656]: {"ts":1655581336241.7104,"caller":"topologymanager/scope.go:110","msg":"RemoveContainer","v":0,"containerID":"9beafc9454c36cf07dec1793128629f2a80bb43ac65aac400458f44d398032a8"}

Jun 18 19:42:16 ip-172-20-59-66 kubelet[5656]: {"ts":1655581336244.2942,"caller":"kubelet/kubelet.go:2102","msg":"SyncLoop REMOVE","v":2,"source":"api","pods":[{"name":"nettools","namespace":"dev"}]}

Jun 18 19:42:34 ip-172-20-59-66 kubelet[5656]: {"ts":1655581354264.5642,"caller":"kubelet/kubelet.go:2092","msg":"SyncLoop ADD","v":2,"source":"api","pods":[{"name":"nettools","namespace":"dev"}]}
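
To find these lifecycle events for a specific pod across the journal, a grep along these lines works, again assuming a systemd-managed kubelet; nettools is the pod from this example:

# pod ADD/REMOVE events for the nettools pod
sudo journalctl -u kubelet | grep -E '"msg":"SyncLoop (ADD|REMOVE)"' | grep '"name":"nettools"'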

From a responder’s point of view, these logs are more than likely not the most helpful on their own, but they may be a good reference point.


Host logs / security tools

Most hosts within a Kubernetes cluster are going to be Linux. This means that most, if not all, of the log sources traditionally used to analyze a Linux host can be used to help paint a picture of what has occurred on a compromised cluster. The same applies to 3rd party security tools. We briefly talked about Falco earlier, but other tools such as OSSEC, Sysmon, osquery, and <$$ VENDOR> can all be used to gain more insight into malicious activity. Please make sure to understand the limits of the tools, though, especially when it comes to paid vendor solutions. A lot of the big players are much stronger on Windows and have gaps in visibility when it comes to *nix systems.


Other places for logs

Depending on how the cluster is configured and what supporting applications are being utilized, there could be several additional places with valuable log information. Does the cluster use a service mesh, with proxy containers deployed into most pods? If so, those proxy containers are potentially a great source of logs. If it appears to be a pod or application compromise, do not forget to review the application logs and the logs of the application's pod if it is still around.
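
If a sidecar proxy is present, its logs can be pulled the same way as any other container's, and the same goes for the application containers themselves. A sketch, assuming an Istio-style sidecar container named istio-proxy; the pod, namespace, and container names are placeholders:

# sidecar proxy logs for a pod (the container name varies by mesh)
kubectl logs <POD-NAME> -n <NAMESPACE> -c istio-proxy

# logs from every container in the pod
kubectl logs <POD-NAME> -n <NAMESPACE> --all-containers

# logs from the previous instance of a container that restarted
kubectl logs <POD-NAME> -n <NAMESPACE> -c <CONTAINER-NAME> --previous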