Redundant Servers And Domain Name Service

IsThisPageOk?? No one seems to care since Feb2004 when I put the info in the rest of this section

Following information last modified May 1997. Are there still people finding good advice from the contents? Are there friends of PaulChisholm who wish to alert him too? I put it on a DeletedUnlessDefended notice in Feb2004 but would rather prefer it to be refreshed and linked to other current info. -- dl

The page is exactly what I was looking for. But I would like to know if this technique of having more ip-adresses in DNS can be used with e.g. web-browsers to provide fault-tolerance, e.g. if most clients will continue with the next ip in the list if the first times out. What clients supports this? -- PeterFavrholdt

Anyone who's planning a system of redundant servers, and who knows a little about the Domain Name Service (DNS), will be tempted to use the second to implement the first. A logical name can be given to all the redundant servers (that is, the IP addresses of the servers can all be listed under the logical name). As long as at least one of the systems is up, it can be reached via the logical name. If multiple systems are up, DNS can distribute clients among them, providing simple load balancing. It's a very good start; but it's no more than that. Making such an approach work requires considerably more effort.

The key problem is this: While DNS will return one or more IP addresses corresponding to a logical name, it will not necessarily return IP addresses of hosts providing service. DNS makes no attempt to see if a system is reachable. In particular, gethostbyname(), the usual programmatic interface to DNS, returns (among other things) two values: an IP address for the name in question (call it the "suggested address"), and a list of IP addresses for that name. If one system (associated with the logical name) is up, its IP address will be in the list. However, the "suggested" IP address may not be a system that's up.

Therefore: If a group of servers are all listed in DNS under one logical name, clients must try all the name's addresses to guarantee finding an available host. ("Try all addresses.")

(This presumes some host is available. If not, nothing useful can be guaranteed. "Available" means the client can reach, not just a server, but the service in question on that server. Different IP addresses may not represent different hosts; they may represent different IP interfaces on the same host; for example, connected to the Internet via different service providers. There's still value in trying each address; each could correspond to different routes to the same system.)

So, non-obviously, server configuration impacts client design. The impact goes further than that, though. The client must be willing to try multiple IP addresses. This can lead to long connect attempts times (or requires the client to give up quickly when a given IP address is not available, often a poor strategy when network latency or server load are high). The client will often run into trouble (for example, connect() failures) that results from unavailable hosts, as opposed to other problems (for example, being unable to create a socket or look up host addresses) that make it impossible to reach any server.

Therefore: A client trying to contact one of several servers must be able to differentiate fatal errors, which likely will prevent it from reaching any server, from host-specific errors, which prevent it from reaching one host but may not prevent it from reaching another. ("Try other addresses when one fails.")

This difference is often subtle. It must be made by looking beyond "what failed," and concentrating on "why did it fail?" (I've seen a program that propagates errors too far, causing the whole program to exit before a second host is tried.)

DNS can be configured in many ways. It can be configured specifically to support load balancing; however, that's not the default. No matter how DNS and the servers are configured, client code can be made to work somewhat fairly, especially if DNS provides no such guarantees.

Therefore: To guarantee relatively fair use of different hosts, simulate round robin in the client. Get the list of IP addresses, start in the middle, and try them all. For example, try addresses in order, trying the first after the last. ("Simulate server load balancing in client.")

This approach has some interesting results. If DNS is configured to always return the same list in the same order, simulating load balancing will randomly pick a host. If DNS is configured to return addresses in random order, picking one of them at random will have the same effect: one IP address will picked at random. (The sum of two random numbers in the same range, modulo the greatest value, is random.) If DNS implements strict round robin, clients that pick the suggested address will conform with the server strategy, but clients that simulate load balancing will work against the server. Finally, if the name service implements a true load balancing strategy, offering addresses in order of decreasing load, client simulation of load balancing will pick a server at random (regardless of load) and tend to defeat the server strategy. Thus, this approach to client design is very sensitive to server design. This client strategy probably isn't suitable to a Web browser. It might be suited to small scale client/server systems.

Finally, distributed systems architects should be leery of trying to solve too many problems at the same level. The designs described above can be abstracted as, "Try to reach a logical server in as many ways as possible." These particular designs all tried to solve the problem with application level software or with the DNS system. Those aren't necessarily the right levels to approach what's effectively a switching problem.

Therefore: Consider non-software solutions to let clients reach a logical server.

As an example, say we're willing to give up load balancing and automatic fail-over. An IP address can be assigned to the logical server, and to the primary server system. When the primary system fails, a secondary system can be reconfigured to use the logical IP address.

There are probably many better ways to map a logical IP address to multiple physical systems, probably using routers, but they're beyond my competence to describe.

--PaulChisholm

Please add your comments on RedundantServersAndDomainNameService here!

It is a great idea. I am looking for the solution. -- Ian

CategoryAddress