DNS Racing - providing services using multiple ISPs, load balancing, failover.

  1. #21

    Join Date
    Dec 2002

    >> Still when the query hits a DNS resolver it will send out queries to all NS servers.

    I do not think BIND and PowerDNS do this. They send the queries out algorithmically.

    BIND-8 maintains the list of name servers for a zone.
    When BIND-8 finds multiple name servers to resolve a
    request, it sorts the servers by the smoothed response
    time and tries the servers in this order. The smoothed
    response time is the average response time of this server
    in the recent past.
    The smoothed response time is maintained as follows.
    When a response comes back, the smoothed average response
    time, srtt, is computed using an exponentially weighted
    moving average of the measured response times.

    The algorithm of BIND-9 is a variant of the best-server
    algorithm. BIND-9 implements Equation 1 but not Equation
    2. The srtt parameter for a name server is initialized
    to a small random value so that all name servers are
    accessed at least once. However, a recursive server will
    refer only to the best-performing server once the srtt
    values of the other servers are recorded.

    This could be a problem in some situations. The algorithm
    is adaptive only when the best-performing server
    slows down. It is unable to detect a situation where the
    performance of another server improves. It is possible
    that a recursive server switches to a non-optimal server
    during a short outage but never goes back to the best
    server thereafter.
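    The two selection schemes in the excerpt can be compared with a small simulation. This is a minimal Python sketch, not BIND's code, and several pieces are assumptions: the excerpt's Equation 1 did not survive, so the standard EWMA form srtt = ALPHA * srtt + (1 - ALPHA) * rtt is used with a hypothetical ALPHA = 0.7; Equation 2 is also missing, so it is assumed to be a decay (hypothetical BETA = 0.9) applied to servers that were not queried, which is what would let a resolver notice another server improving. Both runs use the small random initial srtt the excerpt attributes to BIND-9, so only the decay differs. Server names and RTTs are made up.

```python
import random

ALPHA = 0.7  # hypothetical EWMA weight; the excerpt's Equation 1 did not survive
BETA = 0.9   # hypothetical decay factor; assumed form of the missing Equation 2


def update_srtt(srtt: float, rtt: float, alpha: float = ALPHA) -> float:
    """Exponentially weighted moving average of response times (assumed Equation 1)."""
    return alpha * srtt + (1 - alpha) * rtt


def pick_best(srtt: dict[str, float]) -> str:
    """Best-server selection: query the server with the lowest smoothed RTT."""
    return min(srtt, key=srtt.get)


def simulate(schedule: list[dict[str, float]], decay: bool, seed: int = 1) -> list[str]:
    """Send one query per round; schedule[i] maps server -> RTT in ms for round i.

    Returns the server chosen in each round.
    """
    rng = random.Random(seed)
    servers = list(schedule[0])
    # Untried servers start with a small random srtt, so each is queried at least once.
    srtt = {s: rng.uniform(0.0, 1.0) for s in servers}
    chosen = []
    for rtts in schedule:
        best = pick_best(srtt)
        chosen.append(best)
        srtt[best] = update_srtt(srtt[best], rtts[best])
        if decay:
            # Assumed Equation 2: servers that were not queried slowly look better
            # again, so a server that recovered from an outage is eventually retried.
            for s in servers:
                if s != best:
                    srtt[s] *= BETA
    return chosen


normal = {"a": 20.0, "b": 50.0}    # hypothetical RTTs in ms: "a" is the faster server
outage = {"a": 2000.0, "b": 50.0}  # "a" is nearly unreachable during a short outage
schedule = [normal] * 10 + [outage] * 5 + [normal] * 50

without_decay = simulate(schedule, decay=False)  # Equation 1 only, as the excerpt says of BIND-9
with_decay = simulate(schedule, decay=True)      # Equations 1 and 2 (assumed BIND-8 behavior)
print(without_decay[-5:])  # ['b', 'b', 'b', 'b', 'b']
```

    Without the decay step, the one slow answer from "a" during the outage leaves its srtt far above that of "b", and because "a" is never queried again, that value never gets corrected. This is the "never goes back to the best server" case the excerpt describes. With the decay step, "a" is periodically retried and wins again once it answers quickly.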
    More good reading on:

    The response time of your server hosted on a 10 Mbps VDSL-type connection on the ISP's WAN might be a lot slower than that of, say, a DNS server hosted at another data center which peers directly with the ISP.

    BIND-9 makes it even worse if, say, your DNS server or the link to it was down the first time your ISP's DNS server tried to contact it. From what I can understand, it will not be considered again.
    Last edited by shri; 01-06-2011 at 02:27 PM.

  2. #22

    Join Date
    Oct 2009
    Back in the US of A, home of the free...

    This still does not answer the question in relation to different ISPs in the same building all coming off the same NAP/backbone. Say, for example, PCCW is the main telco or the tier-1 provider in that area; everyone else will be reselling PCCW's bandwidth.

    Go peek into the demarc in your building's telco closet: you will see lots of Internet lines from different providers, but if you trace those back to the NAP, they will all come off one.

    Have you ever heard of DNS spoofing, MITM, or DNS poisoning? What are you going to do when DNS is fucked up and your clients are redirected to a site that looks like yours and enter their personal information, or worse, their banking logins? How are you going to explain that to the clients?

    Your solution works on a really small scale (1 to 2 servers); it is not scalable to large enterprises (100+ servers). When you are talking about such a large infrastructure, we load balance the traffic inside and out. We also have redundant systems, both externally and internally facing, and multiple firewalls/switches/routers, both primary and hot standby, so that if one part fails, traffic is routed to the standby unit.
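    The primary/hot-standby arrangement described above comes down to a priority order in each tier: traffic goes through the primary unless it has failed, then through the standby. A minimal Python sketch, where the tier names, device names, and the simple `healthy` flag (standing in for whatever health checking the real equipment does) are all hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One device in a redundant tier (firewall, core switch, router, ...)."""
    name: str
    healthy: bool


def active_unit(units: list[Unit]) -> Unit:
    """Return the first healthy unit, listed in priority order: primary, then hot standbys."""
    for unit in units:
        if unit.healthy:
            return unit
    raise RuntimeError("no healthy unit left in this tier")


def route(tiers: dict[str, list[Unit]]) -> list[str]:
    """Name the unit that carries traffic in each tier, in the order traffic crosses them."""
    return [active_unit(units).name for units in tiers.values()]


tiers = {
    "firewall": [Unit("fw-primary", healthy=False), Unit("fw-standby", healthy=True)],
    "core-switch": [Unit("sw-primary", healthy=True), Unit("sw-standby", healthy=True)],
    "router": [Unit("rt-primary", healthy=True), Unit("rt-standby", healthy=True)],
}
print(route(tiers))  # ['fw-standby', 'sw-primary', 'rt-primary']
```

    Each tier fails over independently, so a dead primary firewall does not force the switches or routers onto their standbys.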

    We could lose 90% of our servers and our primary core switches, and the client wouldn't see any connectivity problems. There might be some latency processing transactions when all the traffic that is equally distributed across 200 servers goes to 20 servers, but they would still have connectivity.
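    For a sense of scale, the 200-to-20 scenario above means each surviving server carries ten times its normal share. A one-function sketch of that arithmetic, assuming the load balancers spread traffic evenly both before and after the failure:

```python
def per_server_load_factor(total_servers: int, surviving_servers: int) -> float:
    """Multiple of its normal traffic that each surviving server must carry,
    assuming traffic is spread evenly both before and after the failure."""
    if not 0 < surviving_servers <= total_servers:
        raise ValueError("surviving_servers must be between 1 and total_servers")
    return total_servers / surviving_servers


print(per_server_load_factor(200, 20))  # 10.0
```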

    I can't speak for others, but when I build networks, data centers, and NOCs, I plan/design/build cooling/power/connectivity/processing speed for 150% of anticipated needs.

    I'm not going to risk having clients go down to shave a few $$ off the top. As I said before, your plans might work on a small scale with little or no traffic, minimal hits, and no backend processing. They will not work for large-scale enterprises where there are tons of hits on the website and backend processing.

    Your thinking is still very localized in regard to saving money on infrastructure. Do you turn off the HVAC to the servers at night because it's cooler?

    There is a right way and a wrong way to do things, and your DNS route is not the right way.

  3. #23

    Join Date
    Dec 2002

    >> Your solution works on a really small scale (1 to 2 Servers), it is not scalable to large enterprises (100+ Servers).

    To his credit, he does say it is a poor man's solution, which is why I am struggling to figure out why it works, as it ticks the poor-man check box for me and we do have multiple lines coming into the office.

  4. #24

    Join Date
    Mar 2010

    In BIND 9.5 (and prior), once BIND had a historical value of how long it took an NS to reply, it picked the fastest one... (whooo hooo!). The problem with that is that an NS which had not been tried, or for which it had no data, would have a "latency value of 0", which means that the DNS resolver would tend to query the same ones... and that has some security implications (which I can't find any documentation for).

    So in BIND 9.6, what they did was to put a random latency value on the DNS servers which had not been tried, so they would get a chance... This will trigger the DNS resolver to try them; still, ultimately it will return to asking the DNS server which is the fastest.

    However, what I see right now, given the mix of DNS server versions and types, is that what I say works, and works well. I sometimes see big shifts of traffic from one pipe to another... One of our connections is bigger and has lower latency, so we do get a larger percentage of traffic on it.
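    How that skew could arise can be shown with a toy model: if each resolver uses best-server selection, it sends nearly all of its queries to whichever pipe's name server it measures as fastest, so a lower-latency pipe wins more resolvers than the even split round robin would give. This is a minimal Python sketch, and every number and name in it is a hypothetical assumption: the pipe names, the base RTTs, and the uniform extra path delay each resolver sees.

```python
import random


def traffic_share(base_rtt: dict[str, float], jitter_ms: float,
                  resolvers: int, seed: int = 0) -> dict[str, float]:
    """Fraction of resolvers whose fastest-measured name server sits on each pipe.

    Each hypothetical resolver sees a pipe's base RTT plus its own random
    path-dependent extra delay, then (as a simplification) sends all of its
    queries to the pipe it measures as fastest.
    """
    rng = random.Random(seed)
    counts = {pipe: 0 for pipe in base_rtt}
    for _ in range(resolvers):
        measured = {pipe: rtt + rng.uniform(0.0, jitter_ms) for pipe, rtt in base_rtt.items()}
        counts[min(measured, key=measured.get)] += 1
    return {pipe: n / resolvers for pipe, n in counts.items()}


# Hypothetical pipes: pipe-1 is 10 ms faster than pipe-2 on average.
share = traffic_share({"pipe-1": 15.0, "pipe-2": 25.0}, jitter_ms=40.0, resolvers=10_000)
print(share)
```

    With these made-up numbers, pipe-1 wins for roughly 72% of resolvers (analytically 1 - 30²/(2·40²) ≈ 0.72), versus the 50% that plain round robin would hand each pipe.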

    HKfoot, to answer your question.... you could use DNS racing to serve up to 13 different IPs in different data centers in different countries and then put load balancers behind each IP with a few hundred servers... So tell me if that is scalable enough...

    Anyway, gentlemen, it seems that I have brought you a "science fiction story"; if you want, you can try it. It is neither my gain nor my loss... I think I've spent enough time on here.

    I am taking the criticism, good and bad, from the guys at ISC (BIND) a lot more seriously than you guys... So far no one has said it would not work. I got a full lecture on how the load balancing has to be done by the software client on the desktop and other crap. That would involve rewriting every single application that does lookups, or else the kernel resolver would have to have some of the intelligence of BIND, again something that will not happen for years...

    Quite ironic that I got this 30 minutes ago from both Netfront and Dyixian.

    hardware# host has address
    ;; connection timed out; no servers could be reached
    You have new mail.

    Last edited by Fenix2; 01-06-2011 at 03:58 PM.

  5. #25

    Join Date
    Dec 2002

    >> I am taking the criticism good and bad from the guys at ISC (bind) a lot more seriously than you guys... So far no one has said it would not work.

    Then why bother posting it here, just for the sake of the ad at the bottom of your post?

    Yes, it works, but I don't see the point of it all. Even IP over carrier pigeons works, and again, I don't see the point of it all.

    Your problem is that you post a bunch of acronyms and tech-speak, and then when someone asks you why it should work, you take it all way too personally, get defensive, and choose to post a 500-word essay instead of having more productive conversations with your friends over at ISC.

    I'm over and out from this thread.

    >> hardware# host
    >> has address
    >> ;; connection timed out; no servers could be reached

    host -v might be more helpful?

    Trying ""
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54906
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
    ;        IN    MX
    ;; AUTHORITY SECTION:        11    IN    SOA 2008010978 43200 3600 1209600 180
    Received 95 bytes from in 5 ms

    You seem to have weird issues both with the CDN (which, again, no one has reported, and it is monitored) and with DNS Made Easy's servers. Not sure what could be causing them. I just got back from a long overseas trip where I was pretty much in the middle of nowhere and had none of the connectivity problems with the CDN or the DNS issues that seem to plague you.

  6. #26

    Join Date
    Mar 2010

    You said you would be interested in reading it, so I posted it.... I also posted it because people can benefit from it. What I can't do is sit here and convince you or others why it is a good idea or whether it works.... (I need a baseball bat). I say it does; it has for 4 years.... You say it is round robin; the traffic graphs say otherwise.

    As for DNS resolution, both Dyixian and Netfront are going through the same path to your DNS servers..... and that was clearly not working.... You see, that is the 2% of the Internet you don't get to see. It was your turn; it's just chance that it happened now. I probably should have tested from other HK providers....

    Last edited by Fenix2; 01-06-2011 at 05:05 PM.