Introduction
Over the past few years I’ve seen a number of cases where Unix systems have suffered serious outages caused by the loss of a primary name server. Such systems appear very slow, and when used in conjunction with Samba or a remote name service such as Centrify, servers may appear to hang.
The main reason for this is the manner in which Unix performs DNS lookups: it queries the primary name server first, then tries the secondary, and so on. Since the resolver is stateless, each successive lookup will try the primary server before the secondary, even if the primary is not responding. Loss of the primary name server therefore means that every program making a remote connection via hostname (not IP) will experience a DNS timeout delay before connecting.
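To put rough numbers on that delay, here is a minimal sketch of the lookup order described above. It is a simplified model, not the real resolver code: the function name is made up for illustration, and the 5 second per-server timeout and 0.01 second answer time are assumptions (5 seconds is the default commonly documented for resolv.conf, but check your own system).

```python
def lookup_delay(server_up, timeout=5.0, answer_time=0.01):
    """Seconds spent on one lookup that tries the servers in listed order.

    server_up is a list of booleans, primary first. Every lookup starts
    from the first server again because no state is kept between lookups.
    """
    delay = 0.0
    for up in server_up:
        if up:
            return delay + answer_time  # this server answered
        delay += timeout                # waited for a server that is down
    return delay                        # every server timed out


healthy = lookup_delay([True, True])
degraded = lookup_delay([False, True])
print(f"primary up:   {healthy:.2f}s per lookup")
print(f"primary down: {degraded:.2f}s per lookup")
```

Under these assumptions every single lookup pays the full timeout while the primary is down, which is why busy machines back up so quickly.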
On machines with a reasonable volume of DNS lookups, this eventually consumes a large amount of system resources as requests block and accumulate, and in some cases has resulted in servers running out of physical memory.
Name Service Caching Daemon
One solution is to use a name service caching daemon. There are a number available, and many Linux distributions include prepackaged solutions. Care must be taken to fully understand how these programs operate, as they are often implemented lower in the service stack.
Using a bind cache to reduce the problem…
The simple and reliable solution is to install a local caching name server: a lightweight bind install configured to forward requests to the primary and secondary (and other) name servers, but listening only on localhost, and with zone transfers etc. disabled for security reasons. The line nameserver 127.0.0.1 is then added to the server’s /etc/resolv.conf to ensure it’s used. As bind obeys “time to live” cache times, there is no impact on name resolution accuracy.
On failure of the primary name server, the local caching name server will cache the secondary name server’s response, so successive lookups of the same address will return instantly. In addition, most lookups will already be cached, hence temporary loss of the primary name server often goes unnoticed.
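The behaviour described above can be illustrated with a toy model. This is only a sketch of the idea, not how bind is implemented: the class, the two fake forwarders, the host name, the address and the 300 second TTL are all made up for the example.

```python
import time


class CachingForwarder:
    """Toy model of a local caching name server with ordered forwarders."""

    def __init__(self, forwarders, clock=time.monotonic):
        self.forwarders = forwarders  # callables: name -> (address, ttl_seconds)
        self.clock = clock
        self.cache = {}               # name -> (address, expiry time)

    def resolve(self, name):
        cached = self.cache.get(name)
        if cached is not None and cached[1] > self.clock():
            return cached[0]          # answered locally, no forwarder contacted
        for forwarder in self.forwarders:
            try:
                address, ttl = forwarder(name)
            except TimeoutError:
                continue              # this forwarder is down, try the next one
            self.cache[name] = (address, self.clock() + ttl)
            return address
        raise LookupError(f"no forwarder answered for {name}")


calls = {"primary": 0, "secondary": 0}


def dead_primary(name):
    calls["primary"] += 1
    raise TimeoutError("primary name server not responding")


def live_secondary(name):
    calls["secondary"] += 1
    return "192.0.2.10", 300          # made-up address and TTL


cache = CachingForwarder([dead_primary, live_secondary])
first = cache.resolve("fileserver.example.com")
second = cache.resolve("fileserver.example.com")
print(first, second, calls)
```

In this model only the first lookup waits on the dead primary. The second is served from the cache until the TTL expires, so neither forwarder is contacted again.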
Caching Bind Config
The named.conf file for bind is shown below. The forwarders section should contain the list of name servers from /etc/resolv.conf, and the resolv.conf file should have nameserver 127.0.0.1 added before the other name servers.
options {
    listen-on { 127.0.0.1; };
    directory "/var/named";
    dump-file "logs/named_dump.db";
    forwarders {
        // LOCAL-FORWARDERS
    };
    forward only;
};

logging {
    channel "mainlog" {
        file "logs/named.log" versions 3 size 1m;
        print-category yes;
        print-severity yes;
        print-time yes;
    };
    channel "querylog" {
        file "logs/query.log" versions 2 size 1m;
        print-category yes;
        print-severity yes;
        print-time yes;
    };
    category queries {
        // Uncomment next line to log query messages.
        #querylog;
        null;
    };
    category default { mainlog; };
};
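Filling in the forwarders list and updating resolv.conf by hand is error-prone across many machines, so the two steps can be scripted. The following is a rough sketch only: the function names are hypothetical, the sample addresses are documentation placeholders, and it works on text you pass in rather than touching any files.

```python
def upstream_nameservers(resolv_conf_text):
    """Return nameserver addresses from resolv.conf text, skipping 127.0.0.1."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver" and fields[1] != "127.0.0.1":
            servers.append(fields[1])
    return servers


def forwarders_block(servers):
    """Render a named.conf forwarders statement for the given addresses."""
    body = "".join(f"    {address};\n" for address in servers)
    return "forwarders {\n" + body + "};"


def with_local_cache(resolv_conf_text):
    """Return resolv.conf text with nameserver 127.0.0.1 ahead of the others."""
    lines = resolv_conf_text.splitlines()
    kept = [l for l in lines if l.split()[:2] != ["nameserver", "127.0.0.1"]]
    first = next(
        (i for i, l in enumerate(kept) if l.split()[:1] == ["nameserver"]),
        len(kept),
    )
    kept.insert(first, "nameserver 127.0.0.1")
    return "\n".join(kept) + "\n"


# Placeholder resolv.conf contents, for illustration only.
SAMPLE = "search example.com\nnameserver 192.0.2.1\nnameserver 192.0.2.2\n"

print(forwarders_block(upstream_nameservers(SAMPLE)))
print(with_local_cache(SAMPLE), end="")
```

The first print gives the text to paste in place of the LOCAL-FORWARDERS comment, and the second gives the new resolv.conf contents. Skipping 127.0.0.1 when collecting forwarders keeps the local cache from forwarding to itself if the script is run twice.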