nftlb benchmarks and performance keys

Benchmarks

The latest benchmarks, dated June 2018, show a significant performance improvement when using nftables as the data path instead of iptables.

Given a testbed environment of 2 clients executing an HTTP load stress tool, 1 load balancer and 3 backends with an HTTP terminator delivering a response of about 210 bytes, we obtained the following benchmarks in HTTP flows per second:

iptables DNAT		256,864.07 RPS / cpu
iptables SNAT		262,088.94 RPS / cpu

nftables DNAT		560,976.44 RPS / cpu
nftables SNAT		608,941.57 RPS / cpu
nftables DSR		7,302,517.31 RPS / cpu

The figures above are shown per physical CPU, as scalability when adding cores is almost linear. Although these benchmarks were performed with only 3 backends, iptables performance drops substantially as more backends are added, since each backend implies more sequential rules.

Those benchmarks were performed with retpoline disabled (no Spectre/Meltdown mitigations). Once retpoline is enabled, the performance penalty measured in the NAT cases with conntrack enabled is much worse for iptables than for nftables:

iptables: 40.77% CPU penalty
nftables: 17.27% CPU penalty

Performance keys

The retpoline penalty is explained by iptables performing many more indirect calls than nftables. In addition, there are further performance keys, explained below.

Rules optimization

The main performance key is rules optimization. It was already known in iptables that the use of ipset boosts performance, as it reduces sequential rules processing.
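As an illustrative sketch of this idea (addresses and the set name are hypothetical), many sequential filtering rules can be collapsed into a single set lookup:

```shell
# Sequential approach: one rule per address, evaluated in order
iptables -A INPUT -s 192.168.0.100 -j ACCEPT
iptables -A INPUT -s 192.168.0.101 -j ACCEPT
iptables -A INPUT -s 192.168.0.102 -j ACCEPT

# ipset approach: a single rule matching against a hash set,
# so lookup cost stays constant as entries are added
ipset create allowed hash:ip
ipset add allowed 192.168.0.100
ipset add allowed 192.168.0.101
ipset add allowed 192.168.0.102
iptables -A INPUT -m set --match-set allowed src -j ACCEPT
```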

In nftlb, although it could be extended for other purposes, we generate a basic set of rules per virtual service using the nftables expressive language, which natively supports sets and maps. The generated rules for a virtual TCP service named vs01 with 2 backends are shown below:

table ip nftlb {
    map tcp-services {
        type ipv4_addr . inet_service : verdict
        elements = { 192.168.0.100 . http : goto vs01 }
    }

    chain prerouting {
        type nat hook prerouting priority 0; policy accept;
        ip daddr . tcp dport vmap @tcp-services
    }

    chain postrouting {
        type nat hook postrouting priority 100; policy accept;
    }

    chain vs01 {
        dnat to jhash ip saddr mod 2 map { 0 : 192.168.1.10, 1 : 192.168.1.11 }
    }
}

When a new backend needs to be added, only the chain associated with the virtual service is regenerated, without adding new rules and without affecting the rest of the virtual services:

    chain vs01 {
        dnat to jhash ip saddr mod 3 map { 0 : 192.168.1.10, 1 : 192.168.1.11, 2 : 192.168.1.12 }
    }
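nftlb performs this update over netlink; an equivalent operation with the nft command line, shown here as a hedged sketch, flushes and repopulates only that chain in a single atomic transaction:

```shell
# Regenerate only the vs01 chain; the transaction is applied atomically,
# so traffic never sees a half-updated chain
nft -f - <<EOF
flush chain ip nftlb vs01
add rule ip nftlb vs01 dnat to jhash ip saddr mod 3 map { 0 : 192.168.1.10, 1 : 192.168.1.11, 2 : 192.168.1.12 }
EOF
```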

Then, if a new virtual service vs02 needs to be created, the ruleset becomes the one shown below, again without adding sequential rules or affecting other virtual services:

table ip nftlb {
    map tcp-services {
        type ipv4_addr . inet_service : verdict
        elements = { 192.168.0.100 . http : goto vs01,
                     192.168.0.102 . https : goto vs02 }
    }

    chain prerouting {
        type nat hook prerouting priority 0; policy accept;
        ip daddr . tcp dport vmap @tcp-services
    }

    chain postrouting {
        type nat hook postrouting priority 100; policy accept;
    }

    chain vs01 {
        dnat to jhash ip saddr mod 3 map { 0 : 192.168.1.10, 1 : 192.168.1.11, 2 : 192.168.1.12 }
    }

    chain vs02 {
        dnat to jhash ip saddr mod 2 map { 0 : 192.168.2.10, 1 : 192.168.2.11 }
    }
}

Early hooks

nftables provides an early ingress hook, which nftlb uses in DSR scenarios.
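As a minimal sketch of what an ingress-based DSR rule could look like (device names, addresses and the backend MAC are hypothetical, and nftlb's actual generated rules may differ): in DSR only the destination MAC is rewritten and the packet is forwarded directly, so the backend answers the client without traversing the load balancer on the way back:

```
table netdev nftlb-dsr {
    chain vs01-dsr {
        type filter hook ingress device eth0 priority 100;
        # rewrite the destination MAC to the backend and forward
        # out the backend-facing device, skipping the IP stack
        ip daddr 192.168.0.100 tcp dport 80 ether daddr set aa:bb:cc:dd:ee:01 fwd to eth1
    }
}
```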

This early hook can also be used for filtering, which boosts performance when dropping packets. The figures below compare the earliest possible drop stage of iptables and nftables, in packets per second:

iptables prerouting raw drop: 38,949,054.35 PPS
nftables ingress drop: 45,743,628.64 PPS
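A hedged sketch of such an ingress drop rule (device name and source network are hypothetical) shows how unwanted traffic can be discarded before it enters the regular IP stack:

```
table netdev filter {
    chain ingress {
        # ingress chains live in the netdev family and are bound to a device
        type filter hook ingress device eth0 priority 0; policy accept;
        ip saddr 203.0.113.0/24 drop
    }
}
```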

Acceleration techniques

There is still more room for optimization, as nftables already supports fast paths and lightweight packet-mangling techniques. Examples include:

Flowtables. A conntrack fast path that delegates already established connections to the ingress stage, bypassing the whole slow path.
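A hedged sketch of the flowtable syntax (device names are hypothetical; early kernels used the `flow offload` statement): once a TCP connection is established and offloaded, subsequent packets are forwarded from the ingress hook without traversing the forward chain again:

```
table ip filter {
    flowtable ft {
        # packets for offloaded flows are handled at ingress on these devices
        hook ingress priority 0; devices = { eth0, eth1 };
    }

    chain forward {
        type filter hook forward priority 0; policy accept;
        # offload established TCP flows to the flowtable fast path
        ip protocol tcp flow offload @ft
    }
}
```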

Stateless NAT. For some load balancing cases, NAT can be performed without connection tracking and from the ingress stage, obtaining all the ingress performance gains in NAT scenarios.
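As a hedged sketch of this technique (device names, addresses and the backend MAC are hypothetical): the destination address is rewritten directly at ingress with a payload statement, with no conntrack entry being created:

```
table netdev nftlb-stateless {
    chain ingress {
        type filter hook ingress device eth0 priority 100;
        # stateless DNAT: mangle the destination IP and MAC in place
        # and forward to the backend-facing device, bypassing conntrack
        ip daddr 192.168.0.100 tcp dport 80 ip daddr set 192.168.1.10 ether daddr set aa:bb:cc:dd:ee:01 fwd to eth1
    }
}
```

Note that with this approach the reverse translation for return traffic must also be handled explicitly (or avoided entirely, as in DSR), since no connection tracking state exists to undo the NAT automatically.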

Documentation under the terms of the GNU Free Documentation License.
