Road to Highly Available Asterisk Server – Step 1 – Solid Network

I have been using Asterisk servers on and off for a few years.  Nothing in production, just small projects.  The big reason is that Asterisk is not HA, or Highly Available.  There is no multiple-server architecture that is easily resilient and can survive the loss of a server or the failure of multiple devices.  And I do not see much out there on the subject.  So here is my shot at hardening a setup and providing the basic ability to recover from common equipment and network failures.

The Network.  Before we work on any servers, install any phones, etc., we need a solid network.  Most of the time I see folks trying to identify the least expensive server and network devices.  In my world, Corporate Enterprise, that is where the problem lies: the cheapest items are the ones that fail most often.  So at a minimum, you want two switches.  It is also best not to skimp and buy the least expensive ones.  You need switches that support “Spanning Tree Protocol” (STP).  Why STP?  Because STP is what prevents loops in a switched network.  We are going to create a loop on purpose, and we want STP to block it while everything is healthy and re-enable the blocked link when we experience a failure.  Check out the following sites on Spanning Tree Protocol.

http://en.wikipedia.org/wiki/Spanning_Tree_Protocol

http://www.cisco.com/univercd/cc/td/doc/product/rtrmgmt/sw_ntman/cwsimain/cwsi2/cwsiug2/vlan2/stpapp.htm

So check out the following diagram

This is the basis for a simple and solid network.  Two switches, both running Spanning Tree.  Take a look at the two red cables.  They plug into the last ports on the first switch and into the corresponding ports on the second switch.  Now if I plug one cable in, that is OK.  It simply connects the first switch to the second.  When we plug in the second cable between the two switches, we create a “Loop”.  Without STP, broadcast frames circle that loop endlessly, and the network fails almost immediately.  No useful traffic will pass.  Loops in a switched network = BAD = the worst thing you can do.
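To see why the second cable is the problem, here is a rough sketch in Python.  It is a toy model, not a real switch: it assumes two switches with no STP, ignores host ports and MAC learning, and uses a made-up function name (frames_in_flight).  It only counts how many copies of a single broadcast frame are still bouncing between the switches after each hop.

```python
def frames_in_flight(num_links, hops):
    """Count copies of ONE broadcast frame still circulating between two switches.

    Toy model (assumption): a switch floods a broadcast out of every
    inter-switch link except the one the frame arrived on. No STP is running.
    """
    # The first switch floods the original broadcast out of every inter-switch link.
    in_flight = list(range(num_links))  # each entry is the link a copy travels on
    history = [len(in_flight)]

    for _ in range(hops):
        next_round = []
        for arrival_link in in_flight:
            # The receiving switch re-floods on all links except the arrival link.
            next_round.extend(
                link for link in range(num_links) if link != arrival_link
            )
        in_flight = next_round
        history.append(len(in_flight))

    return history


print(frames_in_flight(1, 5))  # one cable: the frame dies out after one hop
print(frames_in_flight(2, 5))  # two cables: the copies never stop circulating
```

With one cable the count drops to zero right away.  With two cables it never does, and every hop also delivers another copy to every attached device.  That is for a single broadcast frame; a real network sends them constantly.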

But sometimes we want loops.  Actually, we want loops quite a bit.  Why?  Because we want path redundancy.  We always want multiple paths to get to where we want to go.  But in a switched network, one without a router in between, multiple paths create a loop.  So we want STP to identify the best path and block the higher-cost (usually slower) one, but stay ready in case there is a path problem, then jump in and open another path.  All the while not causing a loop.

So in our example, we want STP to detect the loop and automatically set one of the ports to “not forward” any traffic.  This way we can have multiple paths on our small switched network.  This is the redundancy we are now starting to look for.
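Here is a small Python sketch of that decision for our two red cables.  It is a simplification under stated assumptions, not the real protocol: one switch is assumed to already be the root, the other switch keeps only its lowest-cost link forwarding, and ties are broken by link name as a stand-in for the port ID.  The function name, the link names, and the cost value of 4 are all made up for illustration.

```python
def stp_port_states(links, failed=()):
    """Decide which parallel inter-switch links forward and which are blocked.

    links:  dict of link name -> path cost (lower is better)
    failed: names of links that are currently down

    Simplified model (assumption): every link runs between the root switch
    and one non-root switch, so exactly one surviving link may forward.
    """
    alive = {name: cost for name, cost in links.items() if name not in failed}
    if not alive:
        return {}  # no surviving link, the switches are isolated

    # Lowest cost wins; the name breaks ties (stand-in for the port ID).
    root_port = min(alive, key=lambda name: (alive[name], name))

    return {
        name: "forwarding" if name == root_port else "blocking"
        for name in alive
    }


red_cables = {"red1": 4, "red2": 4}

print(stp_port_states(red_cables))                     # healthy network
print(stp_port_states(red_cables, failed=("red1",)))   # first cable pulled
```

In the healthy case one cable forwards and the other is blocked, so there is no loop.  When the forwarding cable fails, the blocked one takes over.  A real switch also needs some time to make that change, which this sketch does not model.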

Finally, notice the Asterisk servers with the GREEN links.  Normally we see a server with two NICs like this and it’s OK.  We could connect each server with two NICs, but then each server would need two IP addresses.  That would not work well with Asterisk.  Why?  Because the phones are configured to use a single IP address for Asterisk.  Some phones can use a backup IP, but we have two servers, and that would be 2 x 2 = 4 IP addresses.  Now we are getting complicated.  We want to make this simpler and easier to recover from a failure, not more complicated.

Where are we going with this?  Well, I’ll tell you.  What if we could get each server to use both of its NICs but only one IP?  Now we are talking.  Now we have gone from 4 IPs to 2.  Finally, what if the two servers could share a single virtual IP and decide between them, in a coordinated fashion, which one holds it?  So if PBX1 is running the Asterisk server and it has any type of hardware failure, or Switch #1 fails, or anything like that, then PBX2 automatically takes over the virtual IP.  That would be cool.  This way you could recover from a server failure or a switch failure.  Automatically!  Now in some situations, especially with a switch failure, the phones connected to that switch are down.  But that is the cool part of this: you have half your phones/users on one switch and the other half on the other one.  You are not all down, only partially down.  And that could make the difference between being in a bad spot the entire day and just a blip on the radar.  It’s all in how you look at it.
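To make the idea concrete, here is a toy Python model of the two decisions described above: who holds the virtual IP, and how many phones survive a switch failure.  Everything in it is hypothetical (the Pbx class, the function names, the phone counts), and it only models the outcome.  It is not how a real failover tool detects failures or moves an address.

```python
from dataclasses import dataclass


@dataclass
class Pbx:
    """A PBX server as seen by the failover logic (hypothetical model)."""
    name: str
    healthy: bool = True


def vip_owner(servers):
    """Return the name of the first healthy server, in priority order.

    Returns None when no server is healthy, meaning nobody holds the virtual IP.
    """
    for server in servers:
        if server.healthy:
            return server.name
    return None


def phones_still_up(phones_per_switch, failed_switch):
    """Count the phones that are NOT plugged into the failed switch."""
    return sum(
        count
        for switch, count in phones_per_switch.items()
        if switch != failed_switch
    )


pbx1, pbx2 = Pbx("PBX1"), Pbx("PBX2")
print(vip_owner([pbx1, pbx2]))   # PBX1 holds the virtual IP

pbx1.healthy = False             # PBX1 has a hardware failure
print(vip_owner([pbx1, pbx2]))   # PBX2 takes over

# Assumed split: 20 phones, half on each switch.
print(phones_still_up({"switch1": 10, "switch2": 10}, "switch1"))  # 10
```

The phones only ever know about the one virtual IP, so they do not care which server answers.  And with the phones split evenly, losing a switch takes out half of them instead of all of them.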

The next post will cover how to configure the Asterisk server to use two NICs but only one IP address, with each NIC connected to a separate switch.  This is called “bridging”.  Check back later.