PPPoE High Availability Design – Incorporating Multiple Access Concentrators/BRAS

Background:

One of the most widely used protocols for authenticating user connections is PPPoE (Point-to-Point Protocol over Ethernet).  Traditionally, PPPoE was used in DSL deployments, but it has become one of the most widely adopted forms of customer device authentication in many networks.  Often paired with a AAA system such as RADIUS, the ability to authenticate, authorize, and account for customer connections is what makes PPPoE so appealing.

The protocol itself resides at the data link layer (OSI Layer 2) and provides control mechanisms between the connection endpoints.  There are several other moving parts within this process; if you would like to read more, the Wikipedia page on PPPoE explains it rather well (https://en.wikipedia.org/wiki/Point-to-Point_Protocol_over_Ethernet).  For the purpose of this article, though, I will be sticking to one very specific problem: how to build redundancy when using PPPoE.

PPPoE is a Layer 2 connection protocol widely used in service provider networks.  Connections initiated by a client terminate on what is known as a BRAS (Broadband Remote Access Server), referred to as the Access Concentrator (AC) from here on.  The function of the AC is to negotiate the link parameters with the client and then pass any specific properties to the client.

During this exchange, the AC checks its local database to see if the client credentials (username/password combination) exist.  If configured to authenticate using AAA, it sends a request to the AAA server, waits for a response, and then acts accordingly.
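For example, a minimal sketch of pointing a MikroTik AC at a RADIUS server is shown below; the server address and shared secret are made up for illustration:

# RADIUS server used for PPP authentication/accounting (address and secret are illustrative)
/radius
add address=192.0.2.10 secret=example-secret service=ppp
# tell the PPP subsystem to use RADIUS instead of only the local secrets
/ppp aaa
set use-radius=yes accounting=yes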

Because this type of connection works at Layer 2, the PPPoE client and the AC obviously must have a Layer 2 adjacency in order to form the link.

The Problem:

As consultants, we are asked quite frequently how to design and implement redundancy when using PPPoE as a client termination method.  When you introduce a second (or third, fourth, etc.) AC for an active/standby or active/active solution, it is often unclear how to load balance PPPoE sessions across multiple ACs.  In this article I am going to lay out the foundation and a solution to achieve AC load balancing.

Use Case:

You currently have only one active AC/BRAS servicing your client connections and wish to add a second, preferably in an active/active redundant fashion.  You also wish to load balance between the two ACs as evenly as possible while still providing fail-over between them should one fail.  With this in mind, we will use the following drawing taken from the EVE-NG lab I built for this article.

[Lab topology diagram]

This lab is intended to show the mechanism that provides HA fail-over between two ACs, not the different types of transport used to get client traffic to the AC.  In this scenario, we are using a switch configured with a single VLAN toward each PPPoE client and all VLANs trunked to the ACs.  Note: in this lab, the ACs are labelled BRAS-1 and BRAS-2.

I have decided to use VLANs 10 and 30 as primary for BRAS-1, and VLANs 20 and 40 as primary for BRAS-2.  Note that while we allocate primary traffic sources to each AC, we still trunk all available VLANs to both of them.  This ensures that if a network-related event were to knock BRAS-2 offline, the client will still be able to connect to BRAS-1 once its session/idle timeouts expire.
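For reference, one way to build the switch piece on a RouterOS CHR is a VLAN-aware bridge.  The port names below are assumptions (ether1 is one client access port, ether5 and ether6 are the trunks toward BRAS-1 and BRAS-2); repeat the access-port and untagged lines for the other client ports and VLANs:

# VLAN-aware bridge acting as the "switch" in the lab (port names are illustrative)
/interface bridge
add name=br-access vlan-filtering=yes
/interface bridge port
add bridge=br-access interface=ether1 pvid=10
add bridge=br-access interface=ether5
add bridge=br-access interface=ether6
/interface bridge vlan
add bridge=br-access tagged=ether5,ether6 untagged=ether1 vlan-ids=10
add bridge=br-access tagged=ether5,ether6 vlan-ids=20,30,40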

PPPoE Packet Sequence:

To better understand what we are doing, here is the sequence of events that occurs when a PPPoE session is set up:

  • PADI (PPPoE Active Discovery Initiation) – the discovery packet sent by the initiating client: are there any access concentrators (ACs) out there?
  • PADO (PPPoE Active Discovery Offer) – the AC receives the discovery packet and responds with this message. It typically includes the MAC address of the AC, any defined service names, and a few other parameters.
  • PADR (PPPoE Active Discovery Request) – the client receives the offer and sends a request for a connection to the AC it has chosen.
  • PADS (PPPoE Active Discovery Session-confirmation) – the AC confirms the request, builds the necessary session information, and confirms the session to the client.
  • PADT (PPPoE Active Discovery Terminate) – a termination packet sent by either the AC or the client; it tears down the established session.

The above is a very basic outline of the process; if you wish to understand more, please look at the wiki page linked above.

When there is only one AC in play, the client obviously has no choice about which AC to connect to.  However, when we introduce a second AC, and assuming we have not configured service names or other identifying parameters, the client will typically choose the AC with the lowest MAC address in the offer packet.  This means the connection will almost always land on the same AC, even though both ACs respond with a PADO and the client receives both offers.
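(As an aside, service names are one such identifying parameter: a client can be pinned to a particular AC by requesting a service name that only that AC advertises.  A minimal sketch of the client side, with an illustrative parent interface and credentials:)

# request a specific service name so only the AC advertising it will answer
/interface pppoe-client
add name=pppoe-out1 interface=ether2 service-name=PPPoE-Client01 \
    user=user1 password=pass1 disabled=no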

Solution:

Now that we have that out of the way, we can look at how we achieve an HA design with load balancing: by using the PADO packet offset mechanism.  Take a look:

[Lab drawing showing the per-VLAN PADO offsets]

As you can see on the lab drawing, there is a defined PADO offset for each VLAN on both BRAS devices.  In MikroTik configuration, this looks as follows:

/interface pppoe-server server
add default-profile=PPPoE-Client01 disabled=no interface=Vlan10-PPPoE-Client01 \
    pado-delay=100 service-name=PPPoE-Client01

By offsetting the PADO (remember, it's the offer sent back to the client) we can steer which AC the client sends its session request to.  Why does this work?  We are effectively delaying the PADO packet so that the two ACs do not respond at the same time, and the client sends its request to the first AC it hears back from.
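For completeness, here is a sketch of how complementary offsets might be laid out across both ACs for VLANs 10 and 20 (VLANs 30 and 40 follow the same pattern).  The interface and profile names for VLAN 20 and the delay values themselves are illustrative; the exact offsets I used are shown on the lab drawing:

# BRAS-1 - answers first on its primary VLAN 10, noticeably later on VLAN 20
/interface pppoe-server server
add default-profile=PPPoE-Client01 disabled=no interface=Vlan10-PPPoE-Client01 \
    pado-delay=100 service-name=PPPoE-Client01
add default-profile=PPPoE-Client02 disabled=no interface=Vlan20-PPPoE-Client02 \
    pado-delay=500 service-name=PPPoE-Client02

# BRAS-2 - the mirror image
/interface pppoe-server server
add default-profile=PPPoE-Client01 disabled=no interface=Vlan10-PPPoE-Client01 \
    pado-delay=500 service-name=PPPoE-Client01
add default-profile=PPPoE-Client02 disabled=no interface=Vlan20-PPPoE-Client02 \
    pado-delay=100 service-name=PPPoE-Client02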

While I have used MikroTik software throughout my lab for this test, Cisco and many other vendors offer an equivalent option.  Using MikroTik in particular allows an ISP to scale to large numbers of PPPoE sessions at a fraction of the cost of other well-known vendors.

My LAB Setup:

  • 2 MikroTik CHRs configured as the access concentrators, with local authentication and IP assignments (a sketch of this follows the list).
  • 1 MikroTik CHR configured as a VLAN bridge – essentially a switch.
  • 4 MikroTik CHRs configured as PPPoE clients, each with a script to automatically create/delete 200 PPPoE connections.
  • All CHRs configured with one (1) CPU and 256 MB of RAM on the free MikroTik license level.
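Here is a rough sketch of what the local authentication and IP assignment pieces on each AC might look like.  The pool range, username, and password are illustrative only; the profile name simply matches the one used in the PADO example above:

# address pool handed out to PPPoE clients (range is illustrative)
/ip pool
add name=pool-client01 ranges=100.64.10.2-100.64.10.254
# PPP profile tying the local gateway address to that pool
/ppp profile
add name=PPPoE-Client01 local-address=100.64.10.1 remote-address=pool-client01
# a locally defined client credential
/ppp secret
add name=user1 password=pass1 service=pppoe profile=PPPoE-Client01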

Assumptions:

Obviously, there are many factors that are assumed in this lab.  In the real world, things such as IP addressing and authentication methods need to be taken into account.  For example, having the same IP pool present on more than one AC that handles client connections could very well cause issues, unless management of the CPE is not needed and the AC also performs NAT for the upstream connections.  Using a RADIUS billing/authentication system that manages its own IP addressing pools would solve this issue.  That, in turn, would more than likely require an IGP routing protocol on the AC to tell the upstream nodes which addresses are present where, and when.
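As a hedged illustration of that last point, one way to do this on a MikroTik AC is to redistribute the connected /32 routes that PPP creates for each client into OSPF (RouterOS v6 syntax shown; whether type-1 or type-2 external routes make sense is a design choice for your network):

# advertise the /32 routes created for active PPP clients to upstream routers
/routing ospf instance
set [ find default=yes ] redistribute-connected=as-type-1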

As I stated, this article is meant to show that this is possible and the mechanism with which to implement it.  All the other network minutiae have to be considered individually…

Conclusion:

Using this mechanism allows multiple Access Concentrators to serve the same clients.  There are many other ways to use the PADO offset to provide a scalable, resilient authentication method when using PPPoE.  And if you have a network with only one AC at the moment, following this approach to add a second will provide some extra peace of mind.

******EDIT – 4/24/2018*******

While this article does not go into how to build or use EVE-NG, you can download the EVE-NG topology and configuration files for the lab here.  All config files match the names in the lab.  There may be some cloud-management changes you need to make in order to Winbox in using RoMON, etc.  Usernames/passwords on the CHRs are the defaults.

On the client PPPoE routers, there are two scripts that can be run:

To ramp the connections up:

/system script run add

To tear the connections down:

/system script run remove
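If you are building the lab from scratch rather than downloading it, the bodies of those two scripts might look roughly like this.  This is a sketch only – the parent interface name, usernames, and passwords are assumptions and need to match the secrets defined on the ACs:

# "add" - create 200 PPPoE client interfaces on this router
:for i from=1 to=200 do={
    /interface pppoe-client add name=("pppoe-test-" . $i) interface=ether2 \
        user=("user" . $i) password="pass1" add-default-route=no disabled=no
}

# "remove" - tear them all down again
/interface pppoe-client remove [find where name~"pppoe-test-"]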

Hopefully this helps people out.

*******************************

Here is a short video showing the connections ramping up on both ACs.

WISP Design – Building Highly Available VPLS for Public Subnets

What is VPLS?

Virtual Private LAN Service, or VPLS, is a Layer 2 overlay or tunnel that allows for the encapsulation of Ethernet frames (with or without VLAN tags) over an MPLS network.

https://tools.ietf.org/html/rfc4762

VPLS is often found in telco networks that rely on PPPoE; it is used to create centralized BRAS deployments by bringing all of the end users to a common point at Layer 2.

MikroTik VPLS example (https://wiki.mikrotik.com/wiki/Transparently_Bridge_two_Networks_using_MPLS)


Background

The idea for this post came out of a working session (at the bar of course) at WISPAmerica 2018 in Birmingham, Alabama.

There was a discussion about how to create redundancy for VPLS tunnels on multiple routers. I started working on this in EVE-NG as we were talking about it.

The goal is to create highly available endpoints for VPLS when using it to deliver a public subnet to any tower in the WISP. The same idea works for wireline networks as well.

Use Case

As IPv4 becomes harder to get, ISPs such as WISPs without large blocks of public space find it difficult to deploy what they have as smaller subnets. The idea behind breaking up a /23 or /24, for example, is that every tower has public IP addresses available.

However, the problem with this approach is that some subnets may go unused if there isn't much customer demand for a dedicated public IP.

What makes VPLS attractive in this scenario is that the public subnet (a /24 in this example) can be placed at the data center as an intact prefix.

VPLS tunnels then allow individual IP addresses to exist at any tower in the network, which provides flexibility and conserves IPv4 space by not subnetting the block into /29, /28, or /27 chunks at the tower level.

Lab Network

[Lab network diagram]

Deployment

In this lab, the VPLS tunnels terminate in two different data centers as well as at a tower router to create an L2 segment for 203.0.113.0/24. VRRP is then run between the two data center VPLS routers so that the gateway of 203.0.113.1 can failover to the other DC if needed.

Failover

Here is an example of the convergence time when we manually fail R1 and the gateway flips over to R2 in the other DC. The yellow highlight marks the point where R1 has failed and R2's VRRP instance has become master.

[Screenshot: ping output during the VPLS gateway failover]
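If you want to reproduce a similar test yourself, a minimal sketch is a continuous ping toward the VRRP gateway from a test device in 203.0.113.0/24 attached behind R3, while the failure is simulated on R1 (how you fail R1 is up to you; dropping its uplink is one option):

# on a RouterOS test device in 203.0.113.0/24 behind R3:
/ping 203.0.113.1 interval=0.2

# on R1, simulate the DC failure by disabling its uplink:
/interface disable ether1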

Configurations

R1-vpls-agg

/interface bridge
add name=Lo0
add name=vpls1-1
/interface vrrp
add interface=vpls1-1 name=vpls1-1-vrrp priority=200
/interface vpls
add disabled=no l2mtu=1500 mac-address=02:2C:0B:61:64:CB name=vpls1 remote-peer=1.1.1.2 vpls-id=1:1
add disabled=no l2mtu=1500 mac-address=02:7C:8C:C9:CE:8E name=vpls2 remote-peer=1.1.1.3 vpls-id=1:1
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/interface bridge port
add bridge=vpls1-1 interface=vpls1
add bridge=vpls1-1 interface=vpls2
/ip address
add address=1.1.1.1 interface=Lo0 network=1.1.1.1
add address=10.1.1.1/24 interface=ether1 network=10.1.1.0
add address=203.0.113.2/24 interface=vpls1-1 network=203.0.113.0
add address=203.0.113.1/24 interface=vpls1-1-vrrp network=203.0.113.0
/ip dhcp-client
add disabled=no interface=ether1
/mpls ldp
set enabled=yes lsr-id=1.1.1.1 transport-address=1.1.1.1
/mpls ldp interface
add interface=ether1
/routing ospf network
add area=backbone network=10.1.1.0/24
add area=backbone network=1.1.1.1/32
/system identity
set name=R1-vpls-agg

R2-vpls-agg

/interface bridge
add name=Lo0
add name=vpls1-1
/interface vrrp
add interface=vpls1-1 name=vpls1-1-vrrp
/interface vpls
add disabled=no l2mtu=1500 mac-address=02:C3:4C:31:FB:C9 name=vpls1 remote-peer=1.1.1.1 vpls-id=1:1
add disabled=no l2mtu=1500 mac-address=02:02:34:C0:A3:3C name=vpls2 remote-peer=1.1.1.3 vpls-id=1:1
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/interface bridge port
add bridge=vpls1-1 interface=vpls1
add bridge=vpls1-1 interface=vpls2
/ip address
add address=10.1.1.2/24 interface=ether1 network=10.1.1.0
add address=1.1.1.2 interface=Lo0 network=1.1.1.2
add address=203.0.113.3/24 interface=vpls1-1 network=203.0.113.0
add address=203.0.113.1/24 interface=vpls1-1-vrrp network=203.0.113.0
/ip dhcp-client
add disabled=no interface=ether1
/mpls ldp
set enabled=yes lsr-id=1.1.1.2 transport-address=1.1.1.2
/mpls ldp interface
add interface=ether1
/routing ospf network
add area=backbone network=10.1.1.0/24
add area=backbone network=1.1.1.2/32
/system identity
set name=R2-vpls-agg

R3-Tower-1

/interface bridge
add name=Lo0
add name=vpls-1-1
/interface vpls
add disabled=no l2mtu=1500 mac-address=02:CB:47:7A:92:0B name=vpls1 remote-peer=1.1.1.1 vpls-id=1:1
add disabled=no l2mtu=1500 mac-address=02:E3:C5:5B:EC:BF name=vpls2 remote-peer=1.1.1.2 vpls-id=1:1
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/interface bridge port
add bridge=vpls-1-1 interface=ether1
add bridge=vpls-1-1 interface=vpls1
add bridge=vpls-1-1 interface=vpls2
/ip address
add address=10.1.1.3/24 interface=ether2 network=10.1.1.0
add address=1.1.1.3 interface=Lo0 network=1.1.1.3
/ip dhcp-client
add disabled=no interface=ether1
/mpls ldp
set enabled=yes lsr-id=1.1.1.3 transport-address=1.1.1.3
/mpls ldp interface
add interface=ether2
/routing ospf network
add area=backbone network=10.1.1.0/24
add area=backbone network=1.1.1.3/32
/system identity
set name=R3-tower-vpls