Starting a WISP: guide to selecting a routing architecture

Understanding the choices – why is routing design so important?

Routing is the foundation of every IP network. Even a router as small as the one in your home has a routing table and makes routing decisions.

Selecting a routing architecture is a critical but often overlooked step to ensure that a startup WISP can provide the necessary performance, scalability and resiliency to its subscribers.

This post will go through each of the major design types, highlight the pros and cons, and explain when it is appropriate to use a particular routing architecture.

A note on IPv6

Dual stack is assumed in all of the designs presented. The cost of public IPv4 space will continue to climb.

It’s no longer a scalable option in 2020 to build an ISP network without at least a plan for IPv6 and ideally a production implementation.

1. Flat network (aka bridged network)

“Behind the L3 boundary, there be L2 dragons”

– ancient network proverb

Unfortunately, this is often the worst choice for all but the smallest WISPs that don’t have any plans to scale beyond 100 subscribers.

Bridged networks with one or more subnets in the same L2 broadcast domain are the most commonly deployed routing design that we see in our day-to-day consulting work for WISPs.

Bridged networks are attractive because they require minimal networking knowledge to get up and running.

These networks have a number of limitations in scale and performance and are susceptible to loops. They can also cause RF problems because of the number of broadcasts sent across all towers.

This drawing is from the blog post ‘WISP Design – Migrating from Bridged to Routed’, which has more information on issues with bridged networks and how to migrate a current bridged network to routed.

This is not an ideal choice for a startup, because it almost guarantees you’ll need a disruptive, time consuming and expensive migration once the subscriber count starts to grow.

CAUTION: “for-profit” WISPs – it is **NOT** recommended to deploy this design.

When should I deploy this network type?

Now that the “Most of you probably shouldn’t do this” warnings are out of the way… there are a few corner cases (WISPs for government use, non-profits, research, etc.) where this design can be a good fit.

Use this design when:

  • simplicity is the ultimate goal (way beyond all others)
  • the WISP will *never* go beyond 100 subscribers
  • the WISP will be managed by someone without a networking and/or technical background
Click here for a PDF version of this drawing

2. Static routing

Static routing is a *slight* step up from a bridged network.

With layer 3 separation between the towers, the risks of major performance issues with growth go way down.

However, the administrative burden of growth is still an issue with static routes.

This design can be used for a very simple network with only a few routers until a dynamic routing protocol can be configured.

When **NOT** to deploy this design

  • Startup WISPs that plan on having more than one geographic IP transit location should consider one of the two BGP based designs as there are better policy options to influence routing.
  • Startup WISPs that expect rapid growth will not want to use this design; it doesn’t scale well and is difficult to manage beyond a few routers (fewer than 5).
  • WISPs that want to dynamically failover between backhaul links can quickly get into issues when trying to manage failover for more than one or two routers with static routes.
  • If traffic engineering is desired, this is not the right design

Note: Static routing in this context means static routes for all subnet reachability – it is not meant to include a static route (when needed) used as a default route on an isolated management network or for a DIA circuit.

When should you deploy this design?

  • A small network that will never exceed 5 routers
  • An extremely lossy and/or high-latency RF network that will cause issues with dynamic protocols.
  • If knowledge of OSPF/BGP becomes a roadblock in getting a routed network up and running
  • If routing is a knowledge gap, use this in the first 30 to 60 days to test radios and Internet access while working on dynamic routing. Don’t leave a network that is intended to scale on static routes.
Click here for a PDF version of this drawing

One of the questions we are often asked is:

Do I really need dynamic routing for a WISP that’s very small?

The answer lies in the drawing above…

It’s easy to see when looking at this drawing how complex and cumbersome static routing can become, even for just 2 to 3 routers.
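To put rough numbers on that growth, here is a minimal sketch. It assumes the worst case described in the note above: every router needs an explicit static route to every subnet hosted on every other router, with no summarization and no default route. The function name and the two-subnets-per-router figure are illustrative assumptions, not values from the drawing.

```python
def static_routes_needed(routers: int, subnets_per_router: int = 1) -> int:
    """Total static routes across the network when each router needs one
    route per subnet hosted on every *other* router (no summarization)."""
    if routers < 1:
        raise ValueError("need at least one router")
    remote_subnets = (routers - 1) * subnets_per_router
    return routers * remote_subnets


# Assumed example: 2 subnets per router (e.g. one subscriber, one management).
for n in (2, 3, 5, 10):
    total = static_routes_needed(n, subnets_per_router=2)
    print(f"{n} routers -> {total} static routes to maintain")
```

The count grows roughly with the square of the router count, and a dual-stack network doubles it again because IPv4 and IPv6 routes are configured separately.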

Dynamic routing using OSPF

Open Shortest Path First or OSPF is an interior gateway protocol defined by RFC2328 for version 2 (IPv4) and RFC5340 for version 3 (IPv6).

Without going into an enormous amount of detail about the background of OSPF and IGPs in general (RIP, EIGRP, IS-IS), which is out of the scope of this post, here are a few key points about the protocol:

OSPF overview for WISPs

  • Uses Dijkstra’s shortest path first algorithm (developed in 1956 – long before IP networks) to compute paths in the network.
  • Relies on ‘cost‘ which is an arbitrary value that can be set to mirror the speed of a given RF link (typically under ideal conditions) if desired to better reflect the “best” path through a series of backhauls.
  • A link-state protocol – this means that OSPF is concerned with the speed, topology and current state of every link on every router within an area. Due to this behavior, flapping RF links can sometimes cause a “ripple” effect of bouncing routes across an area. This is one good use case for putting non-core routes at towers into separate areas. (Alternatively, if BGP is used, only transit/loopback routes remain in OSPF, so the net effect is similar.)
  • OSPF does a great job at mapping out paths, speeds and reachability for subnets, which is why it’s often the first dynamic routing protocol many WISPs learn. It also has significant limitations when used as a protocol for policy, which BGP is better suited for – the next design will show them paired together to get the best mix of reachability/paths + policy.
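The cost-based path selection described above can be sketched in a few lines. This is a simplified illustration, not how any router implements OSPF: the tower names, link speeds and the 10,000 Mbps reference bandwidth are all assumptions made up for the example, and cost is derived as reference bandwidth divided by link speed.

```python
import heapq

REFERENCE_MBPS = 10_000  # assumed reference bandwidth, not from the article


def cost_from_speed(mbps: float) -> int:
    """Derive a link cost that shrinks as the link gets faster."""
    return max(1, round(REFERENCE_MBPS / mbps))


def shortest_paths(graph, source):
    """Dijkstra's algorithm: lowest total cost from source to every node."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    return dist, prev


def path_to(prev, source, target):
    """Walk the predecessor map back from target to source."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical backhauls: (end A, end B, speed in Mbps under ideal conditions)
links = [("core", "tower1", 1000), ("tower1", "tower2", 300), ("core", "tower2", 100)]
graph = {}
for a, b, mbps in links:
    cost = cost_from_speed(mbps)
    graph.setdefault(a, {})[b] = cost
    graph.setdefault(b, {})[a] = cost

dist, prev = shortest_paths(graph, "core")
print(path_to(prev, "core", "tower2"), "total cost", dist["tower2"])
```

In this made-up topology the two-hop path through tower1 (cost 10 + 33) beats the direct 100 Mbps link (cost 100), which is the behavior you get when cost mirrors RF link speed.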

When **NOT** to deploy this design

  • Startup WISPs that plan on having more than one geographic IP transit location should consider one of the two BGP based designs as there are better policy options to influence both default and tower routing.
  • A WISP that plans to offer private L2VPN/L3VPN services should use the iBGP/OSPF/MPLS design in the next section.
  • A WISP that has or will have a complex tower topology with many redundant paths and needs a heavy focus on traffic engineering should consider the eBGP design in the last section.

When should you deploy this design?

  • A WISP that has no plans to offer private L2VPN/L3VPN services can successfully use this design
  • A WISP that will have mostly a non-mesh or significant partial-mesh physical topology of towers and backhauls – essentially this means that policy will likely not be required for traffic engineering and redundant RF PTP links can exist in standby and not be used towards aggregate capacity.
  • If the amount of IPv4 space used by a startup WISP will be less than a /24, then MPLS/VPLS is not required to improve IPv4 subnetting efficiency. While it is possible to deploy MPLS with only OSPF, the next section on iBGP will recommend the deployment of MPLS/LDP with iBGP/OSPF.
Click here for a PDF version of this drawing

Dynamic routing using iBGP/OSPFv2 & v3/MPLS

Border Gateway Protocol or BGP is an exterior gateway protocol defined by RFC4271 for IPv4 and RFC2545 for IPv6. Although BGP started out as a protocol that was intended only for use on the Internet between public ASNs, it quickly became used in a variety of network types due to the policy options it offers. Policy describes BGP in a nutshell: it isn’t concerned with link speeds, physical topology or link state… BGP is purely focused on policy and the best path algorithm.

Multiprotocol Label Switching or MPLS is a forwarding protocol that assigns labels to routes and allows for the abstraction of different services carried in an overlay on top of the routed/label-switched core. MPLS is defined in RFC3031.

Similar to OSPF, I won’t go into an enormous amount of detail about BGP and MPLS, which is out of the scope of this post, but here are a few key points about each protocol:

BGP overview for WISPs

  • BGP uses TCP port 179 to exchange routing information with another BGP speaker.
  • Internal BGP vs. External BGP has little to do with whether it’s used on the Internet. It simply means either a peering within the same ASN (internal) or a peering between different ASNs (external). There are a number of differences between the two, but the most fundamental is the next hop behavior – eBGP rewrites the next hop to be the local router before advertising the route whereas iBGP does not change the next hop by default.
  • Internal BGP uses recursive routing (next hop is not directly connected and requires route lookup) and does not change its next hop. OSPF is used to advertise the next hop so the next hop (typically a loopback) is always reachable. If you reference the drawing for this design section, you’ll notice the routes are either green (BGP) or blue (OSPF) to help clarify how they work together
  • Internal BGP relies on route reflectors (RR) to manage all routes in an ASN. This helps to avoid a full mesh of peerings. Normally a router in the core will act as the route reflector. One of the advantages for a WISP of using RRs is simplified tower router configs – they will always peer to the same pair of RRs regardless of location.
  • Using BGP in a WISP allows for a number of policy options to make selection of a default route or influencing a tower path much easier.
  • One of the greatest benefits of using BGP in a WISP is the simplification of routing protocols for host and subscriber subnets. Once routing is BGP end to end from the peering to the upstream router all the way to the last mile at the tower, it becomes easier to manage and apply policy.
  • BGP also offers better scaling options to grow than OSPF on a WISP network. I’ve consulted for a number of WISPs over 10,000 subscribers (which is a fairly sizeable WISP) and the vast majority of them run BGP due to policy and scale limitations of OSPF.
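The point about route reflectors avoiding a full mesh is easy to quantify. The sketch below simply counts iBGP sessions; it assumes every non-reflector router peers with each route reflector and the reflectors peer with each other, and the router counts are arbitrary examples.

```python
def full_mesh_sessions(routers: int) -> int:
    """iBGP sessions when every router peers with every other router."""
    return routers * (routers - 1) // 2


def route_reflector_sessions(routers: int, reflectors: int = 2) -> int:
    """iBGP sessions when each client peers only with the route reflectors.

    Assumes the reflectors themselves are fully meshed with each other.
    """
    if not 0 < reflectors <= routers:
        raise ValueError("reflectors must be between 1 and the router count")
    clients = routers - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)


for n in (10, 50, 200):
    print(f"{n} routers: full mesh = {full_mesh_sessions(n)}, "
          f"pair of RRs = {route_reflector_sessions(n)}")
```

With a pair of RRs, adding a tower means adding two sessions on the new router and nothing on the existing towers, which is why the tower configs stay so uniform.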

MPLS Overview for WISPs

  • MPLS requires a signaling protocol to exchange labels. Label Distribution Protocol or LDP is most commonly used. LDP uses TCP port 646 to build sessions and UDP port 646 for discovery. LDP is what takes the route information from other protocols like OSPF and BGP and assigns labels to it for forwarding.
  • MPLS is an incredibly useful tool for WISPs because it allows for the network to be sliced into virtual segments at layer 2 (L2VPN) or layer 3 (L3VPN) to deliver services that subscribers may ask for, or for internal needs like isolated management routing tables or VRFs (Virtual Routing and Forwarding).
  • One of the most helpful features of MPLS for a WISP lately has been the use of VPLS (L2VPN) to make subnetting a public IPv4 block more efficient. Traditionally, WISPs would try and break up a public block into small subnets to have public IPv4 available at every tower. As IPv4 became more scarce and costly, this became much harder to do. VPLS solves this problem by hosting the subnet at a data center or central point and then extending Layer 2 to wherever it is needed. This service is highlighted in purple in the drawing
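Here is a rough sketch of why carving a public block into per-tower subnets is wasteful compared with hosting one subnet centrally and extending it over VPLS. The block used is a documentation range, the /29-per-tower split is a hypothetical choice, and the model assumes each subnet loses its network address, broadcast address and one gateway address.

```python
import ipaddress


def assignable_addresses(block: str, per_tower_prefix=None) -> int:
    """Addresses left for subscribers after per-subnet overhead.

    Each subnet gives up three addresses: network, broadcast, gateway.
    With per_tower_prefix=None the block is used as one shared subnet.
    """
    net = ipaddress.ip_network(block)
    if per_tower_prefix is None:
        subnets = [net]
    else:
        subnets = list(net.subnets(new_prefix=per_tower_prefix))
    return sum(max(0, s.num_addresses - 3) for s in subnets)


block = "203.0.113.0/24"  # documentation range, stands in for a real public /24
print("split into /29s per tower:", assignable_addresses(block, 29))
print("one /24 extended over VPLS:", assignable_addresses(block))
```

The split version also strands addresses at towers that don’t need them, which this simple count does not capture.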

BGP – More information

Network Collective

MPLS – More information
MikroTik US MUM 2016 – Dallas, Texas

MPLS Overview, Design and Implementation for WISPs

When **NOT** to deploy this design

  • If a WISP feels this is more complex than they can handle, the previous OSPF design can be used with loopbacks to prepare for the addition of BGP and MPLS at a future date. While it will require some migration time, it still provides a path forward to enable more advanced services.
  • If your focus is mainly residential, your subscriber count will never exceed 1000 subscribers and your tower topology is relatively simple, then you can likely use the previous OSPF design successfully.
  • If Traffic Engineering without using segment routing or MPLS TE is the single most important design requirement, then the eBGP design in the next section is the best choice.

When should you deploy this design?

  • All the time! (almost…) This is probably the most common type of design we build and deploy as WISP consultants because it:
    • Scales well (we’ve deployed at the scale of thousands of towers)
    • Is easier to manage operationally – using route reflectors centralizes routing policy decisions.
    • Has the most options available to deliver overlay services which align with the ability of the business to rapidly bring a product to market.
    • Isn’t new – this marriage of BGP/OSPF/MPLS has been the foundation of most Telco and Fiber operators for more than a decade. The reason is simple – it works.
  • Deploying L3VPN services for management VRF or for business customers
  • Planning a hybrid build with fiber and will be moving to equipment that supports more advanced features like MPLS TE with Fast Reroute or Segment Routing (even if you can’t leverage all traffic engineering features day one – the design will be ready to grow into more capable gear)

Click here for a PDF version of this drawing

One of the questions we are often asked about this design is:

Should I start with all of these protocols from the very beginning? Why not just pick BGP or OSPF? Why do I need both?

The answer is twofold:

  • It’s far easier to learn more advanced routing concepts when the network is smaller or a complete greenfield. At some point you’ll need more advanced tools and migrating 25, 50 or 100+ towers is expensive, disruptive and incredibly time consuming.
  • The reason you wouldn’t pick one or the other in this design is that each protocol focuses on its strengths – OSPF is excellent for path calculation and topology and BGP is excellent at policy. With BGP working on top of OSPF, you get the best of both worlds.

Dynamic routing using eBGP

This is definitely one of the newest designs we’ve worked with and deployed into production. It’s growing in popularity because using BGP for all routing actually simplifies the design quite a bit while still being able to deliver VPLS services. This design also allows for incredibly diverse traffic engineering options and full IPv4/IPv6 dual stack with BGP, which is limited on some vendors’ platforms.

Vendor note:
It’s worth noting that some of the limitations in the current version of MikroTik RouterOS 6.xx are what prompted this specific design

  • MPLS Traffic engineering has significant limitations and cannot enforce more than one policy between two points. It also cannot enforce policy across OSPF areas
  • Dual stack with iBGP in IPv6 will not be functional until RouterOS version 7 is out of beta and in production.

Rather than list bullet points on this topic to illustrate why and how eBGP is useful for traffic engineering in WISPs, I’ll share the following video from:

MikroTik US MUM 2017 – Denver, Colorado

When **NOT** to deploy this design

  • If a WISP requires L3VPN and/or VRFs for management, then the previous iBGP design would be better suited for that
  • If BGP knowledge becomes a roadblock that cannot be settled or worked around, then moving back to an OSPF only design is an option (However, I always recommend using labs like GNS3 or EVE-NG to train engineers and techs to solve this problem)

When should you deploy this design?

  • When traffic engineering is the primary driver for the business, but utilizing equipment that supports segment routing/MPLS TE is out of the budget, this is a workable solution to have total control of traffic paths.
    • BGP Communities can be used to steer a subnet along any path desired
    • Traffic paths can be modified using communities for both traffic to and from the towers.
  • When IPv6 dual stack with BGP on the tower network is required (Again a limitation specifically attributed to MikroTik RouterOS v6 and fixed in v7 beta)
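As a rough illustration of steering with communities, the sketch below models one common approach: a route tagged with a community gets a different local preference, which changes the winner of a heavily simplified best-path comparison (highest local preference, then shortest AS path). The community values, ASNs, backhaul names and the local-preference mapping are all hypothetical, and real BGP best-path selection has many more steps.

```python
from dataclasses import dataclass

# Hypothetical policy: community value -> local preference to apply.
COMMUNITY_LOCAL_PREF = {"65000:100": 200, "65000:50": 50}
DEFAULT_LOCAL_PREF = 100


@dataclass
class Route:
    prefix: str
    via: str                      # which backhaul the route was learned over
    as_path: tuple
    communities: frozenset = frozenset()
    local_pref: int = DEFAULT_LOCAL_PREF


def apply_policy(route: Route) -> Route:
    """Set local preference from the first-class policy table above."""
    prefs = [COMMUNITY_LOCAL_PREF[c] for c in route.communities
             if c in COMMUNITY_LOCAL_PREF]
    if prefs:
        route.local_pref = max(prefs)
    return route


def best_path(routes):
    """Simplified selection: highest local-pref, then shortest AS path."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


routes = [
    Route("198.51.100.0/24", via="backhaul-a", as_path=(65001, 65010)),
    Route("198.51.100.0/24", via="backhaul-b", as_path=(65002, 65003, 65010),
          communities=frozenset({"65000:100"})),
]

before = best_path(routes).via                            # no policy applied yet
after = best_path([apply_policy(r) for r in routes]).via  # community honored
print(f"without policy: {before}, with policy: {after}")
```

Without the policy the shorter AS path over backhaul-a wins; once the community is honored, the tagged route over backhaul-b is preferred even though its path is longer.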

Click here for a PDF version of this drawing

How to choose?

Even with the shallow depth I’ve given the routing protocols, this should still highlight the pros and cons of each design and also illustrate the role of the network vendor in supporting the necessary protocols.

Take some time (even if you have to read this article in several passes) to really understand the business case of your WISP, what you sell and what’s important for you to develop in the future.

Then evaluate the protocols needed vs. the budget required for more advanced equipment and calculate the ROI of features and agility vs. equipment/licensing cost.

Good luck!

WISP Design – Migrating from Bridged to Routed

TFW adding the 301st subscriber to your bridged WISP….

Why are bridged networks so popular?

  • Getting an ISP network started can be a daunting task, especially if you don’t have a networking background.
  • Understanding L1/L2/L3 is not easy – I spent a number of years working in IT before I really started to grasp concepts like subnetting, the OSI model and Layer 2 vs. Layer 3. It takes a while.
  • Bridged networks are very attractive when first starting out. No subnetting is required and the entire network can be NATted out an upstream router with minimal configuration.

What does a “bridged” network look like?

  • Bridged networks use a single Layer 3 subnet across the same Layer 2 broadcast domain (typically over switches and software/hardware bridges) which is extended to all towers in the WISP
  • Bridging can be done with or without VLANs but they are most commonly untagged.
  • The diagram below is a very common example of a bridged WISP network.


What is the difference between switching and bridging?

These days, there isn’t much difference between the two terms. “Switch” is a marketing term for a multiport, hardware-accelerated bridge; it became popular in the 1990s to distinguish these devices from hubs, which did not separate collision domains. Both bridges and switches can use VLANs and spanning tree, and both forward Layer 2 frames to multiple ports.

  • Bridging (Software) – Most radios use some variant of Linux and a bridge that is dependent upon the CPU to forward frames. Most routers will also allow ports to be bridged together in software, and the speed is dependent upon system resources and load.
  • Bridging (Hardware) – Most commonly, you’ll find this in vendors like MikroTik and Juniper. Certain hardware models allow the bridge to be offloaded into hardware instead of the CPU so that frames can be forwarded at wire speed.
  • Switching (Hardware) – This category includes most all ethernet switches. Frames are forwarded using ASICs that depend on a CAM table to hold MAC addresses.

What are some of the issues with bridged Networks?

  • Broadcast – One L2 broadcast will go through every radio and backhaul in a WISP network
  • ARP Traffic – ARP requests are also L2 broadcasts, so ARP storms and heavy ARP traffic can easily cripple a bridged network
  • MAC Table size limitations – some equipment types have limited MAC table sizes. In CPU-based equipment, the limitation is often either RAM or the default settings of the Linux bridge, which is typically limited to 1024 entries per bridge.
  • Scale – Typically anything beyond a /24 (254 hosts) will start to have issues without a number of L2 enhancements like client isolation, MAC filtering, etc. At some point those solutions don’t scale
  • Subnetting – ideally you don’t want multiple subnets on the same broadcast domain for security and isolation of failure domains
  • Performance – most routers are more efficient in routing packets vs. bridging packets in a larger network.
  • Security – Implementing security policies and isolating customers and protected infrastructure is much easier at L3

How does routing help?

Routing separates broadcast domains

Here is what a single broadcast domain looks like in a bridged network

And to compare, here is the same network but routed
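To put rough numbers on the difference, the sketch below counts how many other hosts a single broadcast reaches in each design. The tower names and subscriber counts are invented for the example, and infrastructure devices (radios, routers) are ignored to keep it simple.

```python
def bridged_broadcast_reach(subscribers_per_tower: dict) -> dict:
    """Bridged: one broadcast domain, so a broadcast reaches every other host."""
    total = sum(subscribers_per_tower.values())
    return {tower: total - 1 for tower in subscribers_per_tower}


def routed_broadcast_reach(subscribers_per_tower: dict) -> dict:
    """Routed: each tower is its own broadcast domain behind an L3 boundary."""
    return {tower: max(0, count - 1)
            for tower, count in subscribers_per_tower.items()}


# Hypothetical subscriber counts per tower.
towers = {"tower1": 60, "tower2": 45, "tower3": 80}

bridged = bridged_broadcast_reach(towers)
routed = routed_broadcast_reach(towers)
for name in towers:
    print(f"{name}: bridged reaches {bridged[name]} hosts, "
          f"routed reaches {routed[name]} hosts")
```

In the bridged case every broadcast also crosses every backhaul, so the cost is paid in RF airtime as well as in host CPU.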

How do I migrate?

Answer: Patience and planning (and VLANs)

The question you’ve probably been waiting for… how do I migrate? The dirty little secret is that you don’t have to migrate all at once.

There are a few different ways you can use VLANs to migrate the network one tower at a time.

Where do I start?

  • Prep work – be sure to back up all configs and if possible, put together a diagram of your network that includes physical connections and VLANs / Subnets where applicable – this can be done with Visio, LucidChart or using mapping software
  • Pick a good time – Look at your monitoring software and pick a day and time that represents your lowest volume of traffic
  • Be realistic about time – If you think it will take 1 hour, plan for 4 – you’ll be amazed how often 1 hour turns into 4 🙂
  • Have a rollback plan – Understand what steps you need to take to roll back – even better, write it down!

Types of migration

Type 1 – Last mile back to the core – start at the very end of a chain of towers and work your way back in – one tower at a time

  • Benefits
    • Lower risk, only affecting one or two towers at a time at the end of a chain of towers.
    • Doesn’t specifically require VLANs in some network topologies but they are still recommended
    • Easy rollback, if it doesn’t work, replace the original config and analyze what went wrong
    • If you’re successful, you can move to the next closest tower and repeat the process which continues to shrink the broadcast domain – this also has the side benefit of helping to stabilize your bridged network as you migrate by making it smaller.
  • Drawbacks
    • Distance – in some networks, getting all the way out to the edge during a late night maintenance window can be a challenge

Type 2 – Core out to the last mile – start at the core or where your bandwidth comes in (often the same place) and work your way out – one tower at a time

  • Benefits
    • Physically closer to migrate the first hop
    • Can use VLANs to keep the existing bridged topology but route to towers that are converted
    • Same as the first, if you’re successful, you can move to the next closest tower and repeat the process which continues to shrink the broadcast domain – this also has the side benefit of helping to stabilize your bridged network as you migrate by making it smaller.
  • Drawbacks
    • Risk – if you have an issue with the first hop, you may take down a larger number of towers than 1 or 2 as compared to the first method.
    • Requires more config – you need to preserve the legacy broadcast domain through converted towers and then go back and clean it up

Type 3 – Build L3 to a new tower from the core – If you happen to have a new tower build on deck that will directly connect back to the core (where the gateway for the bridged L3 network is), then you can build the new tower as L3. This helps you understand what’s involved before using one of the previous two methods to migrate the rest of the network.

  • Benefits
    • Lowest risk option – building out new sites in a different design is one of the lowest risk ways to migrate
    • No need for rollback – it’s new and not in service, so if you don’t get it right on the first try, you can keep working on it
  • Drawbacks
    • There aren’t any major drawbacks to this approach, except the rest of the network must still be migrated using one of the previous two methods
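If it helps with the planning step, the Type 1 and Type 2 orderings can be derived from a simple list of backhaul links. This is a minimal sketch with a made-up topology: it measures each tower’s hop count from the core with a breadth-first search, then sorts farthest-first for Type 1 or nearest-first for Type 2. The tower names and function names are assumptions for illustration.

```python
from collections import deque


def hops_from_core(links, core):
    """Breadth-first search: hop count from the core to every reachable site."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    hops = {core: 0}
    queue = deque([core])
    while queue:
        node = queue.popleft()
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops


def migration_order(links, core, edge_first=True):
    """Type 1 (edge_first=True): farthest towers first.
    Type 2 (edge_first=False): towers closest to the core first."""
    hops = hops_from_core(links, core)
    towers = [t for t in hops if t != core]
    return sorted(towers, key=lambda t: (-hops[t] if edge_first else hops[t], t))


# Hypothetical topology: a three-tower chain plus one tower directly on the core.
links = [("core", "tower-a"), ("tower-a", "tower-b"),
         ("tower-b", "tower-c"), ("core", "tower-d")]

type1 = migration_order(links, "core", edge_first=True)
type2 = migration_order(links, "core", edge_first=False)
print("Type 1 (last mile back to the core):", type1)
print("Type 2 (core out to the last mile):", type2)
```

Writing the order down this way also gives you a checklist to pair with the rollback plan for each maintenance window.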

Using switch-centric design to assist with migration

In order to pass VLANs and the legacy broadcast domain through the network easier, consider putting all physical links into a switch at the core and the tower instead of directly into the router.

This type of design makes operation of the WISP significantly easier as new subnets and services that are needed don’t always need a trip to the tower to add cabling.

It also makes config migration easier when upgrading the tower router by putting most of the interface references into VLANs instead of physical interfaces.

Example of a switch-centric tower design

Closing thoughts…

This will help to get you started down the road to migration. Using a virtual lab like EVE-NG or GNS3 will help to understand the concepts before you deploy it in prod and is a good addition to the process.

Take your time and think through what you want to do and write down your plan – often you’ll find gaps when you create a list of steps which you can correct before migration and save time.

Good luck!
