MikroTik CCR1072-1G-8S+ Review – Part 3 – 80 Gbps Throughput testing

[adrotate banner=”5″]

 

[metaslider id=249]

The 80 Gbps barrier has finally been broken (and yes we are rounding up) !!!!

Well at least it has been reached by someone other than MikroTik. It’s taken us quite a while to get all the right pieces to push 80 Gbps of traffic through the CC1072 but with the latest round of servers that just got delivered to our lab, we were able to go beyond our previous high water mark of 54 Gbps all the way to just under 80 Gbps. There have been a number of questions about this particular router and what the performance will look like in the real world. While this is still a lab test, we are using non-MikroTik equipment and iperf which is considered an extremely accurate performance measuring tool for TCP and UDP.

Video of the CCR1072-1G-8S+ in action  (Turn up your volume to hear the roar of the ESXi servers as they approach 80 Gbps)

How we did it – The Hardware 

CCR1072-1G-8S+ – Obviously you can’t have a test of the CCR1072 without one to test on. Our CCR1072-1G-8S+ is a pre-production model so there are some minor differences between it and the units that are shipping right now.

1055_l

HP DL360-G6 – It’s hard to find a better value in the used server market than the HP DL360 series.  This series of server can be found on ebay for under $500 USD in general and makes a great ESXi host for development and testing work.

 

 

 

DL360-G6

 

 

Intel x520-DA2 10 Gbps PCI-E NIC –  This NIC was definitely not the cheapest 10 gig NIC on the market, but after some research, we found it was one of the most compatible and stable across a wide variety of x86 hardware. The only downside to using this particular NIC is the requirement to use INTEL branded SPF+ modules. We tried several types of generics and after reading some forums posts, we were able to figure out that it only takes a few different types of INTEL SFP+ modules.  This didn’t impact performance in any way but doubled the cost of the SFPs required.

Intel_X520-DA2_475x260

How we did it – The Software

RouterOS – We tried several thoughought the testing cycle, but settled on 6.30.4 bugfix.

iperf3 – If you need to generate traffic for a load test, this is probably one of the best programs to use.

VM Ware ESXi 6.0 – There were certainly a fair amount of hypervisors to choose from, and we even considered a Docker deployment for the performance advantages but in the end, VM Ware was the easiest to deploy and manage since we needed some flexibility in the vswitch and 9000 MTU.

vmware

CentOS 6.6 Virtual Machines – We chose CentOS as the base OS for TCP throughput testing because it supports VM Ware tools and the VMXNET3 paravirtualized NIC. It is not possible to get a VM to 10 Gbps of throughput without using a paravirtualized NIC so this limits the range of Linux distros to mainstream types like Ubuntu and CentOS. Also, we tend to use CentOS more than the other flavors of linux and so it allowed us to get the VMs ready quickly to generate traffic.

Centos-poweredby

 

Network Topology for the test – We essentially had to put two VLANs on every physical interface so that the link could be fully loaded by using the 1072 to route between VLANs. A VM was built on each VLAN and tagged in the vswitch for simplicity. In earlier tests, we tried to use LACP between ESXi and the 1072 but had problems loading the links fully with the hashing algorithms we had available.

CCR1072-1G-8S-plus-80gig-testing-network-diagram

RouterOS Config for the test 

www.iparchitechs.com/iparchitechs.com-ccr1072-80gig-test-config.rsc

Conclusions, challenges and what we learned

CCR1072-1G-8S+ at full capacity just under 80 Gbps using (4) HP DL360 ESXi hosts and (16) CentOS VMs.
CCR1072-1G-8S+ at full capacity just under 80 Gbps using (4) HP DL360 ESXi hosts and (16) CentOS VMs.

The CCR1072-1G-8S+ is an extremely powerful router and had little difficulty in reaching full throughput once we were able to muster enough resources to load all of the links. As RouterOS continues to develop and moves into version 7, this router has the potential to be extremely disruptive to mainstream network vendors.  At $3,000.00 USD list price, this router is ab absolute steal for the performance, 1U footprint and power consumption. Look for more and more of these to show up in data centers, ISPs and enterprises.

Challenges

There have been a number of things that we have had to work through to get to 80 Gbps but we finally got there.

  • VMWARE ESXi – LACP Hashing – Initially we built LACP channels between the ESXi hosts and the 1072 expecting to load the links by using multiple source and destination IPs but we ran into issues with traffic getting stuck on one side of the LACP channel and had to move to independent links.
  • Paravirtualized NIC – The VM guest OS must support a paravirtualized NIC like VMXNET3 to reach 10 Gbps speeds.
  • CPU Resources – TCP consumes an enormous amount of CPU cycles and we were only able to get 27 Gbps per ESXi host (we had two) and 54 Gbps total
  • TCP Offload – Most NICs these days allow for offloading of TCP segmentation by hardware in the NIC rather than the CPU, we were never able to get this working properly in VM WARE to reduce the load on the hypervisor host CPUs.
  • MTU – While the CCR1072 supports 10,226 bytes max MTU, VMWARE vswitch only supports 9000 so we lost some potential throughput by having to lower the MTU by more than 10%

What we learned

  • MTU – In order for the CCR1072-1G-8S+ (and really any MikroTik) to reach maximum throughput, the highest L2/L3 MTU (we used 9000) must be used on:
    • The CCR
    • The Guest OS
    • The Hypervisor VSWITCH
  • Hypervisor Throughput – Although virtualization has come a long way, there is still a fair amount of performance lost especially in the network stack when putting a software layer between the Guest OS and the Hardware NIC.
  • CPU Utilization with no firewall rules or queues is very low. We haven’t had a chance to test with queues or firewall rules but CPU utilization never went above 10% across all cores.
  • TCP tuning for 10 Gbps in the Guest OS is very important – see this link: https://fasterdata.es.net/host-tuning/linux/

Please comment and let us know what CCR1072-1G-8S+ test you would like to see next!!!

MikroTik CCR1072-1G-8S+ Review – update on Part 3 – Throughput

[adrotate banner=”4″]

Breaking the 80 Gbps barrier!!!

799px-Bell_X-1_46-062_(in_flight)

The long wait for real-world 1072 performance numbers is almost over – the last two VMWARE server chassis we needed to push the full 80 Gpbs arrived in the StubArea51 lab today. Thanks to everyone who wrote in and commented on the first two reviews we did on the CCR-1072-1G-8S+.  We initially began work on performance testing throughput for the CCR1072 in late July of this year, but had to order a lot of parts to get enough 10 Gig PCIe cards, SFP+ modules and fiber to be able to push 80 Gbps of traffic through this router.

Challenges

There have been a number of things that we have had to work through to get to 80 Gbps but we are very close. This will be detailed in the full review we plan to release next week but here are a few.

  • VMWARE ESXi – LACP Hashing – Initially we built LACP channels between the ESXi hosts and the 1072 expecting to load the links by using multiple source and destination IPs but we ran into issues with traffic getting stuck on one side of the LACP channel and had to move to independent links.
  • CPU Resources – TCP consumes an enormous amount of CPU cycles and we were only able to get 27 Gbps per ESXi host (we had two) and 54 Gbps total
  • TCP Offload – Most NICs these days allow for offloading of TCP segmentation by hardware in the NIC rather than the CPU, we were never able to get this working properly in VM WARE to reduce the load on the hypervisor host CPUs.
  • MTU – While the CCR1072 supports 10,226 bytes max MTU, VMWARE vswitch only supports 9000 so we lost some potential throughput by having to lower the MTU by more than 10%

Results preview

Here is a snapshot of where we have been able to get to using 6.30.4 bugfix so far

Single TCP Stream performance at 10 Gig in iperf at 9000 bytes MTU (L2/L3)

1072-10gig-tcp-stream

Throughput in WinBox

10-Gig-TCP-stream-interface

Max throughput so far with two ESXi 6.0 hosts (HP DL360 G6 with 2 Intel Xeon x5570 CPUs)

CCR1072-Traffic

 

 

Two more servers arrived today – Full results coming very soon!

Once we hit the performance limit at 54 Gbps, we ordered two more servers and they just arrived. We will be racking them into the lab immediately and should have some results by next week. Thanks for being patient and check back soon!!