Category Archives: PowerDNS

Protecting an Open DNS Resolver

As another piece of work I’ve been doing for the excellent Strongarm anti-malware team we recently converted the service so that it can be used to get instant protection wherever you are. Part of this involved my work in converting the core (customized) DNS server into an open resolver. This is usually strongly advised against as you can unwittingly become part of some very serious Denial of Service attacks, however in this blog post I show you how to implement some pretty simple restrictions and limitations to prevent this from happening so you can run a DNS open resolver without running this risk.

Linux Stateless Firewalling for High Performance

I’m currently doing a fun bit of consulting on high performance Linux with a great company called Strongarm. I’ve written a post on their blog about we went about adapting a standard linux firewall to make it much more efficient and less resilient to DDoS attack. In short, remove the connection tracking modules and easily do it yourself – but watch out for hidden traps especially on the AWS EC2 platform because it uses jumbo frames!

I’ve archived the content below:

When we were building Strongarm, we came across an interesting challenge that we hadn’t seen addressed before: how to make a Linux stateless firewall that guarantees performance and resilience. Below, I’ll explain how we went about exploring and eventually solving this problem and offer some specific tips you can apply if you are trying to achieve something similar.

Stateful Connection Tracking and Its Issues

I love Linux, and in particular its firewalling capability. The excellent iptables utility has so many extensions and features it’s hard to keep track of everything that it can do. However, one of the issues I’ve seen many times on high performance systems is that, while the Linux firewall behaves excellently in many common situations, there are some loads where it can severely degrade performance and even cripple your server.

By default, as soon as you load iptables, stateful connection tracking is enabled. This allows you to build a firewall that will totally protect your computer as simply as:

The first command allows any connections that are already established to be able to continue when the responses come in, and the second one says to drop anything else by default. This will let you talk to anyone else on the network, but not let anyone else talk to you. There are obviously much more complex firewalling setups, but unless you explicitly disable connection tracking, it will always be there keeping a list of which connections have been established to or from your computer and what their current states are.

Usually this is not a problem, but under high performance situations or in the event of  an attack, you can sometimes see the dreaded “ip_conntrack: table full, dropping packet” error in the kernel logs. This probably means that someone tried to connect to your server but couldn’t. In other words, it seems to at least some part of the internet that your server has gone offline! The purpose of this is to prevent DoS attacks. Linux limits the number of connections that it tracks so that it doesn’t use all of the system’s memory. You can see what your connection limit is on newer kernels by running:

(If you’re having this issue right now, as a temporary workaround, you can write a larger value, perhaps echoing 5000000 into that file to raise the limit. For a longer-term solution, please read on.)

Being able to track 250,000 connections simultaneously might seem like quite a lot, but what exactly is a connection? The answer is relatively easy to define for a stateful protocol like TCP, however in a stateless protocol such as UDP or ICMP, all you can do is say, “We didn’t see a packet in the past X seconds – I guess we don’t have a connection anymore.” How many seconds by default?

Now, protocols such as UDP or ICMP are easily spoofed (i.e. the source address can be easily faked). Since UDP has 65,000 ports, this basically means that if you can send roughly 50,000 UDP packets (one per port) from 5 servers or spoofed addresses in a period of less than 30 seconds, you will cause the Linux connection tracking table to overflow and effectively block any other connections for that time. This is pretty easy to do. A UDP packet only needs to be 28 bytes, and 28 * 250,000 = 7Mb, or 60Mbit of data. In other words, on a 100Mbps connection, or from 10 people’s 10Mbps connections, you can send enough data in about 0.5 seconds to take a server offline for 29 seconds. Oops!

It’s not just deliberate attacks that can cause this, either. Many services that simply get lots of connections‚especially DNS servers (because they work over UDP)—can easily hit this limit because they’ve been left with stateful connection tracking enabled by default. Fortunately, this is quite straightforward to disable, and as long as you do it the correct way, there should be no downside to it. Moreover, as you might imagine, tracking a number of connections can take quite a lot of processing power, which is another reason to disable stateful connection tracking. In a simple test of UDP traffic we were able to achieve a 20% performance increase by disabling stateful connection tracking on our firewall.

There are, of course, some situations where you can’t use a stateless firewall. The main scenario in which a stateless firewall won’t work arises when you need to NAT traffic. This can be the case, for example, when you are configuring the server as a router. In this case, the kernel needs to keep track of all the connections flowing through the router. However, for many situations, you can convert your firewall to be stateless with very little hassle. Below, we’ll explain how to do this.

How to Convert a Stateful Firewall to Stateless

Let’s take a slightly more complex example than above for a web server:

Simple enough to understand (hopefully).

The first thing to do is to allow TCP (stateful) connections to keep working as they always have, but without tracking their state. We can do this by changing our “-m state” lines to look something like:

This means, “If the TCP connection is already established, let it through,” (i.e. it doesn’t have the SYN flag set). This will mean that all TCP connections work exactly as before.

However, because we don’t have a state for UDP connections, we have to flip the rules around. For example, in the above code, we are saying, “Allow outbound connections to port 53,” so we now need to add a rule that also states, “Allow inbound connections from port 53.” This means you will need to add the following rule:

One other thing that recently caught us by surprise is that you need to allow certain types of ICMP signalling traffic:

(Your stateless firewall will work fine without this 99% of the time. However, at Strongarm, we hit an issue on EC2 with our servers when using jumbo frames (packet size 9000) that were trying to communicate with the internet (packet size 1500) over the https protocol. EC2 tried to tell us using ICMP to make our packets smaller (the pmtu protocol), but because we were dropping it all automatically, we didn’t receive those packets, and so we couldn’t speak https with the internet. D’oh!)

Finally, we add in the magic that tells conntrack to not run, using the special “raw” table:

And you’re done. For an extra few lines of firewall code, you can achieve a 20% performance improvement in packet processing speed, lower memory usage, greater resistance to withstand DoS attacks, and much better scalability.

Now let’s put this all together to give you a final script that you can use to build an awesome firewall:

Blacklisting domains using PowerDNS Recursor

I recently had a client that was wanting to provide a recursive DNS service within his company, however wanted to blacklist a lot of domains to redirect internally. And I mean a lot – over 1 million porn/spam/… domains. It’s one thing to use the excellent unbound recursive DNS software and set up the blocks using the local-data argument, but it was requiring over 6gb of memory to load the list and crashing the process because of that.

As it turns out, yet again PowerDNS to the rescue. I love PowerDNS for its flexibility (so much so that I created a very high performance DNS backend for it and also run a company consulting on DNS deployments).

As the full list of domains would not fit in memory we had to use a database, I took inspiration from a previously posted Lua script which used a tinycdb. Unfortunately tinycdb requires manual compilation and so wasn’t an option, and as the client already had the list of domains in a MySQL deployment we ended up using that. Both PowerDNS authoritative server and recursor can support Lua scripting to do pretty much anything you need, and there are a number of database options that Lua can use. I started off using luadbi as it seemed to have a nicer interface, however unfortunately luadbi only supports lua 5.1 whereas the debian build of PowerDNS uses lua 5.2. This meant switching to use luasql which is a bit lower-level.

So, the following Lua script will redirect any blacklisted subdomains (in the mysql table domains with field name) to 127.0.0.1:

As establishing a MySQL connection each request is quite a high overhead it might well be worth switching to use SQLite in the future, this should be very simple to do by just changing the driver name and connection string.

Ultra-high performance with PowerDNS

I love PowerDNS, it’s so flexible through the multiple backends that you can do pretty much whatever you want with it. When we had a nightmare weekend at 123-reg several years ago, I designed and built the next generation of the DNS infrastructure on PowerDNS because it would connect straight into our main DNS databases and we could easily change schema etc without having issues.

However, because of this flexibility sometimes PowerDNS has some performance issues. When designing the second generation of high-performance DNS servers for the HostEurope group I tested a number of other opensource servers. The highest performing by far was nsd however it used a lot of memory and wasn’t able to give us the flexibility that we required.

Looking at PowerDNS it had a number of scaling issues – up to 4 or 6 cores it was fine but after that it just couldn’t scale any further. So, working with a number of tools such as valgrind and perf I identified a number of scaling issues with the distributor code and created a series of patches which were rolled in to PowerDNS Authoritative 3.3 and 3.4 and enable PowerDNS to scale easily to 32 or 64 cores. I also came across a Linux kernel issue with locking (actually an improvement in the packet handling code) which meant that the boxes couldn’t handle more than about 250k PPS before the thread contention on the socket lock hit a limit. Fortunately with Linux 3.9 the patches that the Google guys had produced a few years ago finally got incorporated and so the SO_REUSEPORT patch was also added to PowerDNS.

Finally, I found a very high performance key/value database (lmdb) which is very reliable and fast (sqlite was limited to about 3 or 4 cores, BDB always corrupts itself, KoyotoDB is nice but updates cause a lot of contention and I’m not sure if it’s being actively developed any more). And the result is a high performance backend for PowerDNS (lmdbbackend) which can scale to over 1m QPS/server with instant updates and low response times. The best way to avoid an embarrassing DNS DDOS is to have enough capacity to respond to all queries that are thrown at the server, after that you can try to filter the big hitters. Unfortunately many anti-DDOS appliances that we tested turned out to fail quite badly on UDP-based attacks, or attacks at > 1m PPS so it’s often best to not put them in front of infrastructure that may sustain such an attack.

If you want professional consulting and assistance with setting up a high-performance resilient DNS infrastructure please contact me through my consultancy company dns-consultants.com