Hands on with Calico’s eBPF data plane native service handling

The Calico 3.13 release introduced an exciting new eBPF-based data plane option targeted at those ready to adopt newer kernel versions and wanting to push the Linux kernel’s latest networking capabilities to the limit. In addition to improved throughput and latency compared to the standard Linux networking data plane, Calico’s eBPF data plane also includes native support for Kubernetes services without the need to run kube-proxy. In this blog post I’ll explore what benefits this new service handling provides compared to kube-proxy, and get hands on to see the differences first-hand. Two features I’m particularly excited to explore are source IP preservation and Direct Server Return (DSR).

Kube-proxy and source IP

One frequently encountered friction point with Kubernetes networking is the application of Network Address Translation (NAT) by kube-proxy to incoming network connections to Kubernetes services (e.g. via a service node port), which in most cases has the side effect of removing the original client source IP address from incoming traffic. This means that Kubernetes network policies cannot restrict incoming traffic from specific external clients, since by the time the traffic reaches the pod it no longer has the original client IP address. In addition, for some applications, knowing the source IP address is desirable or required. For example, an application may need to perform geolocation based on source address.

To understand this limitation further, let’s start by taking a detailed look at kube-proxy’s default NAT behavior. A request is sent from an external client to an endpoint for a service within a Kubernetes cluster. This inbound traffic to the service will reach kube-proxy, where two kinds of NAT will be applied. Destination Network Address Translation (DNAT) is used to map the destination from the service or node port to whichever specific pod kube-proxy has chosen to load balance the connection to. In addition, application of Source Network Address Translation (SNAT) replaces the client source IP address with the local node IP address.

The SNAT is done so that the service pod will send its response back to the original node, where the DNAT (and SNAT) can be reversed before the response is then forwarded back to the client. If the DNAT were not reversed the client would not recognize the response traffic because it would have a different IP address than the client thought it was connecting to.
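
If you want to see this NAT machinery on a cluster that uses kube-proxy in its default iptables mode, you can list the NAT chains kube-proxy programs on a node. The chain names below are the standard kube-proxy ones; the exact rules you see will depend on the services in your cluster:

node:~$ sudo iptables -t nat -L KUBE-NODEPORTS -n    # DNAT entry points for node ports
node:~$ sudo iptables -t nat -L KUBE-POSTROUTING -n  # where the SNAT (masquerade) is applied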

Calico native service handling

Calico’s eBPF dataplane makes several changes to this model. One of the most significant differences is in swapping out the Kubernetes kube-proxy with native service handling. When an incoming connection is received for a service, Calico’s eBPF data plane makes its load balancing decision, then forwards the packet to the node hosting the chosen service pod. The forwarded packets are received by the Calico eBPF data plane on the node that hosts the service pod, which applies the DNAT (changing the destination IP from the service to the pod’s IP) before forwarding to the service pod. Because only DNAT is applied, the source IP address is preserved on the request. The response from the pod has a reverse DNAT applied by the eBPF program, at which point the response can be returned directly to the client (if your configuration allows it, otherwise the response is returned via the original ingress node). Routing the response directly back to the client is known as Direct Server Return (DSR).
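
If you’re curious what this looks like on a node once the eBPF data plane is enabled, you can peek at the programs Calico attaches at the tc hook on the data interface. The interface name (eth0) is an assumption about your environment, bpftool may need to be installed separately, and the "cali" name prefix is an assumption that can vary by release:

node:~$ sudo tc filter show dev eth0 ingress    # shows the bpf classifier Calico attaches
node:~$ sudo bpftool prog show | grep -i cali   # lists the loaded eBPF programs by name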

Hands on

It’s relatively easy to try both of these configurations out and directly compare the differences between the two modes of operation. The rest of this post will explore some of these differences with a simple installation you can try out in your own cloud or virtual environment. Following the Calico docs instructions for trying out the new eBPF dataplane, set up a cluster with a minimum of a master node and one worker node. In addition, create two VM instances outside the cluster to be used as clients for making service requests.

For this example, the nodes I created had the following IP addresses:

  • k8s Master: 172.31.1.8
  • k8s Node: 172.31.1.207
  • client1: 172.31.1.55
  • client2: 172.31.1.162

To demonstrate the differences between the eBPF and standard kube-proxy models, we’ll start up a basic single instance Nginx deployment and use a service with a node port to make it accessible from outside the cluster:

master:~$ kubectl apply -f - <<EOF
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
    - protocol: TCP
      port: 80
      nodePort: 30604
EOF

Verify that the service, deployment, and pod are running:

master:~$ kubectl get services
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP        2m
nginx        NodePort    10.102.27.139   <none>        80:30604/TCP   22h

master:~$ kubectl get deployments
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   1/1     1            1           2m

master:~$ kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
nginx-d46f5678b-c2zqg   1/1     Running   0          2m
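
It’s also worth noting which node the pod landed on, since that is the node that will answer directly once we enable DSR later (the -o wide flag adds the node name and pod IP columns):

master:~$ kubectl get pods -l app=nginx -o wide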

From each of the clients connect to the Nginx service via the node port on the master with:

client:~$ curl http://172.31.1.8:30604
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
...
</body>

You should see the Nginx welcome page returned to both clients. Now check the Nginx logs on the Kubernetes cluster, and note that each request was logged with the original client’s source IP address:

master:~$ kubectl logs -l app=nginx
172.31.1.55 - - [24/Apr/2020:03:08:16 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.65.3" "-"
172.31.1.162 - - [24/Apr/2020:03:10:56 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.65.3" "-"

As expected, with the eBPF data plane in place the source IPs are passed through all the way to the pod. Next we’ll create a network policy that restricts connections to just the Nginx port, and from only one of the clients.

master:~$ kubectl apply -f - <<EOF
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: nginx
spec:
  podSelector:
    matchLabels:
      app: nginx
  ingress:
  - from:
    - ipBlock:
        cidr: 172.31.1.55/32
    ports:
      - protocol: TCP
        port: 80
EOF

Now if you try to connect from the two clients, only the one specified in the network policy (client1) will be able to connect; the policy drops the connection attempts from the second client.
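
To confirm the block on the second client without waiting on a hung connection, you can give curl a short timeout (the 5 second limit here is arbitrary):

client2:~$ curl -m 5 http://172.31.1.8:30604 || echo "blocked, as expected"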

You may recall from the description of Calico’s native service handling above that it’s possible to perform Direct Server Return (DSR) from the node hosting the pod, rather than returning through the original ingress node. This removes one hop from the path the response takes, making for a faster response. In the current configuration the data plane still returns responses through the master node whose node port we connected to. Let’s measure the performance, then enable DSR and measure the improvement.

client:~$ for i in {1..10000}
do
    curl -w "%{time_total}\n" -o /dev/null -s http://172.31.1.8:30604
done | jq -s add/length
0.0018143528999999977

This loop makes 10,000 requests to the service, collecting the total time of each as a list, which is then averaged using a simple jq filter. On average, each connection takes about 1.8 ms.
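
If you want a little more detail than the average alone, the same stream of timings can be summarized with a slightly longer jq filter (just a variation on the one-liner above, using a shorter run for illustration):

client:~$ for i in {1..1000}
do
    curl -w "%{time_total}\n" -o /dev/null -s http://172.31.1.8:30604
done | jq -s '{avg: (add/length), min: min, max: max}'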

Enable DSR by setting the corresponding Felix environment variable on the calico-node daemonset:

master:~$ kubectl set env -n kube-system ds/calico-node \
              FELIX_BPFExternalServiceMode="dsr"
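
Before re-testing, you can confirm the variable actually landed on the daemonset (kubectl set env --list prints the configured container environment):

master:~$ kubectl set env -n kube-system ds/calico-node --list | grep -i bpf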

Run the performance test again from the client:

client:~$ for i in {1..10000}
do
    curl -w "%{time_total}\n" -o /dev/null -s http://172.31.1.8:30604
done | jq -s add/length
0.0015947682999999972

This results in an average response time of ~1.6 ms, roughly a 12% decrease with DSR. Next, let’s compare kube-proxy’s service handling with Calico’s eBPF implementation. To do this we’ll need to disable Calico’s eBPF magic and re-enable the original kube-proxy. Use these two commands to switch to the classic model:

master:~$ kubectl set env -n kube-system ds/calico-node \
              FELIX_BPFENABLED="false"
master:~$ kubectl patch ds -n kube-system kube-proxy \
              --type merge \
              -p '{"spec":{"template":{"spec":{"nodeSelector":{"non-calico": null}}}}}'
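
(If you later want to switch back to the eBPF data plane, the same two steps can be reversed; the non-calico nodeSelector value below is the one the Calico eBPF how-to uses to keep kube-proxy parked, so adjust it if your setup differs.)

master:~$ kubectl set env -n kube-system ds/calico-node \
              FELIX_BPFENABLED="true"
master:~$ kubectl patch ds -n kube-system kube-proxy \
              --type merge \
              -p '{"spec":{"template":{"spec":{"nodeSelector":{"non-calico": "true"}}}}}'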

Now try to curl from both of the clients again, and notice that both are blocked. Kube-proxy is applying both SNAT and DNAT to the request, replacing the original client source IP with the master’s IP. Since the network policy only allows traffic from client1’s IP, and that IP is no longer present on the traffic, connections into the pod are blocked for both clients.
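
If you want to see the SNAT itself, the node’s connection tracking table shows the rewrite for the node port connections (the conntrack tool may need to be installed, and the exact entry format varies):

master:~$ sudo conntrack -L -p tcp | grep 30604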

We can re-enable traffic by modifying the network policy to remove the client IP restriction, leaving us just with the application port for the Nginx service:

master:~$ kubectl apply -f - <<EOF
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: nginx
spec:
  podSelector:
    matchLabels:
      app: nginx
  ingress:
  - ports:
      - protocol: TCP
        port: 80
EOF

Curl from both of the clients again; this time both will be able to connect. Grab the output from the Nginx pod logs, and notice that both requests now appear to come from the private IP of the Kubernetes master node:

master:~$ kubectl logs -l app=nginx
...
172.31.1.8 - - [24/Apr/2020:03:41:10 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.65.3" "-"
172.31.1.8 - - [24/Apr/2020:03:41:12 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.65.3" "-"

To achieve the same result of allowing traffic from only one client, we could use Calico network policy, which is richer than Kubernetes network policy and can be applied to the host itself to restrict access to the node port. But applying policy to hosts rather than pods introduces a little more operational complexity when deploying a new service. With the source IP preservation of Calico’s eBPF data plane native service handling, writing and managing access is much easier, using standard Calico or Kubernetes network policies applied directly to pods.
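
For reference, here is a rough sketch of what that host-applied approach could look like using Calico’s GlobalNetworkPolicy, applied with calicoctl. It assumes you have already created HostEndpoints for your nodes carrying a hypothetical label like nodeport-host: "true"; a preDNAT policy is evaluated before the node port DNAT, so it still sees the original destination port and client source IP:

master:~$ calicoctl apply -f - <<EOF
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: nginx-nodeport-from-client1
spec:
  # Selects the HostEndpoints this policy applies to (label is hypothetical).
  selector: nodeport-host == "true"
  order: 10
  # preDNAT policies may only contain ingress rules and require applyOnForward.
  preDNAT: true
  applyOnForward: true
  ingress:
  - action: Allow
    protocol: TCP
    source:
      nets:
      - 172.31.1.55/32
    destination:
      ports:
      - 30604
EOF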

Let’s measure the latency of accessing the service using kube-proxy:

client:~$ for i in {1..10000}
do
    curl -w "%{time_total}\n" -o /dev/null -s http://172.31.1.8:30604
done | jq -s add/length
0.001875712099999999

Kube-proxy takes about 1.9 ms on average, so Calico’s eBPF data plane native service handling with Direct Server Return reduces the latency by roughly 15% in this simple example.

In a more complex deployment, depending on your network, size of cluster, and volume of network traffic, you might see greater gains.  In addition, though I didn’t try to measure it in this blog, there are CPU gains associated with Calico’s new eBPF data plane. You can read more about them in this introduction to the eBPF dataplane by one of the Calico engineers.

Summary

For those ready to adopt newer kernel versions, Calico’s new eBPF data plane offers improved performance for network routing and filtering operations while also improving on existing security models. Two of the ways Calico’s eBPF data plane realizes these improvements are source IP preservation and Direct Server Return. We’re excited to be bringing these features to Kubernetes clusters as a production-ready capability in the near future. In the meantime, you can try the eBPF data plane out as part of the Project Calico 3.13 tech preview. Check it out, and let us know what you think.

You can connect with other Calico community members in the Project Calico Slack channel, or get involved in larger discussions on our community discussion board. To stay up to date with blog posts, community meetings, and free training, be sure to follow @ProjectCalico on Twitter.

