Slow DNS lookups can have many reasons. Mostly they are easy to fix because it simply is a wrong IP address of the DNS f.ex. But today i had a harder one, but was easy to fix if you know how…
Looking up an address with dig works all the time, but when puppet gets its plugins or other files from the puppet master, it takes five second for each file, and puppet fails with an “execution expired” error. So I started to debug it.
It’s most likely not an error within puppet, because i know that our CentOS 5 clients are working just fine.
Next, tcpdump. Took five seconds until i saw a connection on port 8140. As expected, not an error within puppet. Next, DNS. Now it’s getting interesting.
It sends out two DNS request, one for the A record and one for the AAAA record. This is the default since glbc 2.9 to do them in parallel. But i only get back one answer waiting for the second. Looks like a problem on your DNS proxy which sits on our firewall or the DNS behind. Not sure yet.
The man page of resolv.conf has two options which could fix the problem:
single-request (since glibc 2.10) sets RES_SNGLKUP in _res.options. By default, glibc performs IPv4 and IPv6 lookups in parallel since version 2.9. Some appliance DNS servers cannot handle these queries properly and make the requests time out. This option disables the behavior and makes glibc perform the IPv6 and IPv4 requests sequentially (at the cost of some slowdown of the resolving process). single-request-reopen (since glibc 2.9) The resolver uses the same socket for the A and AAAA requests. Some hardware mistakenly sends back only one reply.When that happens the client system will sit and wait for the second reply. Turning this option on changes this behavior so that if two requests from the same port are not handled correctly it will close the socket and open a new one before sending the second request.
The second option sounds exactly like my problem. Added “options single-request-reopen” to resolv.conf, and bang… Works!