Internet access for organizations today is no longer just about connectivity for email and web browsing. A stable Internet connection is a vital component in the chain of IT systems required to conduct business. In the past, the focus around Internet connectivity has typically been on cost, with vendors providing solutions that allow organizations to spread their traffic across consumer and enterprise products.
This approach is all well and good and can provide significant cost savings, especially when employee traffic is directed over low-cost consumer products such as ADSL. However, when you are conducting B2B business through front-end servers hosted in your DMZ, resilience becomes a major concern. In this scenario, a dead Internet link can mean a loss of revenue and, potentially more serious, brand damage. In this paper, we discuss several methods that can be used to improve the resilience of an Internet link. While this sounds like it should be a simple case of connecting to multiple Internet Service Providers (ISPs), the devil, as they say, is in the detail.
Business networks have been mission-critical for some time now, and resilience and business continuity have always been top of any CIO’s mind. However, that focus was generally restricted to internal networks and systems. With more and more business being conducted directly via the web or B2B over Internet links to systems hosted in DMZs, it is simply no longer acceptable for an Internet link to be down. Loss of access to the Internet can directly impact revenue generation, especially today as business operating models shift towards off-site cloud computing and software as a service.
A solution to the problem
Multihoming is essentially a method whereby a company can connect to more than one ISP at the same time. The concept was born out of the need to protect Internet access in the event of either an ISP link failure or an ISP internal failure. In the earlier days of Internet access, most traffic other than email was outbound. An Internet link failure left internal users with no browsing capability and email backing up on inbound ISP mail gateways. Once the link was restored, so were browsing and email delivery. The direct impact on the business was relatively small and mostly not revenue-affecting. Early solutions to this problem were to connect multiple links to the same ISP, but while this offered some level of link resilience, it could provide no safeguards against an internal ISP failure.
Today, however, most organizations deploy a myriad of on-site Internet access services such as VPNs, voice services, webmail, and secure internal system access while also using business-critical off-site services such as software as a service (SaaS) and other cloud-based solutions. Furthermore, while corporate front-end websites are traditionally hosted off-site with web hosting firms, the real-time information on the corporate websites and B2B sites is provided by back-end systems based in the corporate data center or DMZ. Without a good-quality Internet connection, these vital links would be severed.
Varied requirements and complexity
That said, the requirements for multihoming are varied and could range from the simple need for geographic link diversity (single ISP) to full link and ISP resilience, where separate links are run from separate data centers to different ISPs. While the complexity varies for each option, the latter is the most complex to deploy but affords the highest availability. The former provides some degree of protection but does require a higher grade of ISP.
A major component of the complexity comes from IP addressing. The way the Internet IP addressing system works is that each ISP applies for a range of addresses from the central Internet registrar in its region. The ISP then allocates a range of IP addresses, called an address space, to each of its customers from this pool. It goes without saying that no two ISPs can issue the same address space to a customer.
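To make the uniqueness requirement concrete, here is a minimal sketch using Python's standard `ipaddress` module to check a set of allocations for overlaps. The ISP and customer names are hypothetical, and the prefixes are taken from the ranges reserved for documentation; this is an illustration only, not how a registrar actually tracks allocations.

```python
import ipaddress
from itertools import combinations

# Hypothetical allocations; the prefixes come from documentation-only ranges.
allocations = {
    "ISP-A customer 1": ipaddress.ip_network("192.0.2.0/25"),
    "ISP-A customer 2": ipaddress.ip_network("192.0.2.128/25"),
    "ISP-B customer 1": ipaddress.ip_network("198.51.100.0/24"),
}

def find_overlaps(allocs):
    """Return pairs of owners whose address spaces overlap."""
    return [
        (owner_a, owner_b)
        for (owner_a, net_a), (owner_b, net_b) in combinations(allocs.items(), 2)
        if net_a.overlaps(net_b)
    ]

print(find_overlaps(allocations))  # []

# A second ISP mistakenly hands out part of a range that is already in use.
conflicting = dict(allocations)
conflicting["ISP-B customer 2"] = ipaddress.ip_network("192.0.2.64/26")
print(find_overlaps(conflicting))  # [('ISP-A customer 1', 'ISP-B customer 2')]
```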
Why would this be a problem? Simply put, it’s all about routing. Routing is the process whereby the Internet finds out how to get traffic to your particular server. It’s a bit like Google Maps for the Internet. For somebody to find your server, a “route” or path needs to exist to your server’s IP address. Since you are getting your Internet service, and hence your IP address space, from your ISP, they are responsible for publishing the route to your server across the entire Internet. They are effectively the source of your route, and nobody else can do that for your particular address space. You can see how things can go wrong if the ISP suffers some form of internal failure. If your particular route disappeared, your server would vanish from the Internet, even if your Internet link was up and running. This is precisely the kind of issue multihoming tries to solve, but we will start with the simpler options and work our way up for completeness.
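The effect of a disappearing route can be sketched with a toy routing table. This is a simplified illustration and not how real routers are implemented: the `RoutingTable` class, the "ISP-A" label, and the documentation prefix are all assumptions made for the example. Lookups pick the most specific matching prefix, and once the only route is withdrawn the server can no longer be found, regardless of whether the physical link is up.

```python
import ipaddress

class RoutingTable:
    """Toy routing table: maps prefixes to a next-hop label."""

    def __init__(self):
        self.routes = {}

    def announce(self, prefix, next_hop):
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the next hop for the most specific matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]

table = RoutingTable()
table.announce("203.0.113.0/24", "ISP-A")  # the ISP publishes your route
server = "203.0.113.10"
print(table.lookup(server))  # ISP-A

table.withdraw("203.0.113.0/24")  # internal ISP failure: the route disappears
print(table.lookup(server))  # None
```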
Single Link, Single ISP, Multiple address spaces
While not a multihoming solution in the strictest sense of the term, the single link, multiple address option can be useful for small sites. In this scenario, the publicly accessible host is assigned two IP addresses from two different address spaces. You would, of course, need two address spaces from your ISP for this to work. Thus, theoretically, if a routing issue occurs that impacts one of the address spaces, the other may still be available. Of course, the single physical ISP link is a single point of failure, and this option would seem to offer little in the form of real resilience.
Multiple Links, Single ISP, Multiple address spaces
This scenario, generally called multi-attached, is a variation on the above. The site now connects through multiple links, each with a different IP address space, but still via a single ISP. If one of the links fails, its IP addresses will become unreachable. However, the IP addresses on the remaining link will still be available, and your server will still be reachable. Internet Service Providers manage their IP routes with a control protocol called Border Gateway Protocol, or BGP. In this scenario, BGP is used to re-route traffic over the live link. BGP can be complex and demands a lot from the equipment it runs on. Of course, with complexity comes cost; however, this scenario’s BGP deployment is not as onerous as with a fully multihomed site and should not attract too much attention from the CFO. While the deployment is a simpler version of full multihoming, it does restrict the corporate to a single ISP, which may not be part of the business’s strategic intent.
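The behaviour of a multi-attached site can be sketched in a few lines. This is a deliberately simplified model: the `Link` class, the link names, and the documentation addresses are hypothetical, and the re-routing that BGP performs in practice is reduced here to picking the first address whose link is still up.

```python
from dataclasses import dataclass

@dataclass
class Link:
    name: str
    server_address: str  # address of the server as reached over this link
    up: bool = True

def first_reachable(links):
    """Return the first server address whose link is up, or None."""
    for link in links:
        if link.up:
            return link.server_address
    return None

links = [
    Link("link-1", "192.0.2.10"),
    Link("link-2", "198.51.100.10"),
]
print(first_reachable(links))  # 192.0.2.10

links[0].up = False            # link-1 fails; its address becomes unreachable
print(first_reachable(links))  # 198.51.100.10

links[1].up = False            # both links down: the server is unreachable
print(first_reachable(links))  # None
```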
Multiple Links, Multiple ISP, Single address space
This scenario is what is generally meant when discussing multihoming. BGP is used to manage the single address space’s visibility across the multiple links and ISPs and, thus, to maintain the routes. BGP runs between the corporate routers and those of the two ISPs, allowing a link failure to be detected and traffic to be diverted to the functioning link, even if this is via a different ISP network.
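The failover idea can be illustrated with a heavily simplified model of how a remote network sees the same prefix announced through two ISPs. Everything here is an assumption made for the sketch: the `InternetView` class is hypothetical, the AS numbers and prefix are documentation values, and the path choice (shortest AS path, with an arbitrary tie-break) stands in for the much longer decision process real BGP implementations use.

```python
PREFIX = "203.0.113.0/24"   # the organization's single address space
CUSTOMER_AS = 64500         # documentation ASN standing in for your own
ISP_A, ISP_B = 64496, 64497 # documentation ASNs standing in for the two ISPs

class InternetView:
    """Paths to each prefix as seen by a remote network (toy model)."""

    def __init__(self):
        self.paths = {}  # prefix -> {upstream ASN: AS path tuple}

    def announce(self, prefix, via_isp, origin_as):
        self.paths.setdefault(prefix, {})[via_isp] = (via_isp, origin_as)

    def withdraw(self, prefix, via_isp):
        self.paths.get(prefix, {}).pop(via_isp, None)

    def best_path(self, prefix):
        """Shortest AS path wins; ties are broken arbitrarily by tuple order."""
        candidates = self.paths.get(prefix, {}).values()
        return min(candidates, key=lambda path: (len(path), path), default=None)

view = InternetView()
view.announce(PREFIX, ISP_A, CUSTOMER_AS)
view.announce(PREFIX, ISP_B, CUSTOMER_AS)
print(view.best_path(PREFIX))  # (64496, 64500)

view.withdraw(PREFIX, ISP_A)   # the link to ISP A fails
print(view.best_path(PREFIX))  # (64497, 64500)

view.withdraw(PREFIX, ISP_B)   # both links down: no route remains
print(view.best_path(PREFIX))  # None
```

The point of the sketch is that the prefix stays reachable as long as at least one ISP still carries the announcement, which is what a single address space across multiple ISPs buys you.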
What’s the catch?
There is always a catch, and in this case, there are actually a number of them. To run true dual ISP multihoming and BGP as a corporate, you would need your own Provider Independent (PI) IP address space, and you would need to apply for a unique BGP Autonomous System Number (ASN). The AS Number is used to identify your site as a valid Internet location in the eyes of BGP. While applying for an ASN is not an onerous undertaking, it does place some significant responsibility squarely with you instead of the ISP. Deploying BGP effectively brings your organization one step closer to the Internet by making you responsible for advertising your own public IP address spaces and, thus, your routes. It also means that any operational mistakes you make can ripple spectacularly across the Internet.
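One small sanity check that follows from the need for a unique ASN is confirming that a number is not drawn from a private or documentation range, since those must not be used to originate routes on the public Internet. The sketch below is an illustration under stated assumptions: `classify_asn` is a hypothetical helper, the ranges are the private-use and documentation blocks as I understand them from the relevant RFCs, and it neither covers every reserved block nor checks whether a number has actually been assigned to anyone.

```python
def classify_asn(asn: int) -> str:
    """Roughly classify an AS number; not an authoritative registry lookup."""
    if not 0 <= asn <= 4294967295:
        raise ValueError("ASN must fit in 32 bits")
    if asn in (0, 23456, 65535, 4294967295):
        return "reserved"
    if 64496 <= asn <= 64511 or 65536 <= asn <= 65551:
        return "documentation"
    if 64512 <= asn <= 65534 or 4200000000 <= asn <= 4294967294:
        return "private"
    # Anything else may be publicly usable, subject to registry assignment.
    return "potentially public"

for asn in (64500, 65000, 4200000001, 1234):
    print(asn, classify_asn(asn))
# 64500 documentation
# 65000 private
# 4200000001 private
# 1234 potentially public
```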
Address space considerations
Most large organizations that operate true multihoming already have their own Provider Independent address space. This is an address space that they requested directly from the local Internet registrar themselves some time ago before IP version 4 (IPv4) addresses started running out. Today it is virtually impossible to be allocated a PI address space from the IPv4 pool. It is possible to run a multihomed scenario by using ISP-provided IP address spaces. Still, the network configurations become considerably more complex and, at some point, start defeating the end goal of increasing resilience. In the real world, increased complexity seldom equates to improved resilience.
A true BGP-enabled multihoming deployment that carries the full Internet routing table (often described as taking full routes, or running default-free) will require hardware capable of storing IP routing tables of Internet scale. This is desirable as it protects the organization from an internal ISP failure; however, it requires the routers on-site to be “carrier-grade,” in other words, big and beefy. The Internet routing tables are massive, and a vast amount of processing power and memory is required to hold and process them. It is possible to run in a reduced route mode where only local prefixes are stored on the routers. Still, given the effort and expense of deploying a full multihomed solution, compromise should not really be part of the conversation.
While there are definite advantages to full multihoming, there are also some significant caveats. Complexity and scaling aside, the real reasons for multihoming, and its costs, should be carefully weighed.