Please Note: This website has been archived and is no longer maintained.
See the Open Networking Foundation for current OpenFlow-related information.

Controller Performance Comparisons

From OpenFlow Wiki

Overview

The purpose of this page is to provide a dumping ground for side-by-side comparisons of OpenFlow controllers.

The OpenFlow ecosystem has given rise to numerous controllers in multiple languages (C, C++, Java, Python and Ruby, for starters). Although raw performance numbers are often published, to date there has been no central repository for performance comparisons that use the same methodology. This page is very much a work in progress and covers only 4 controller implementations.

Controllers

  • NOX: A C++/Python controller built and open sourced by Nicira Networks [1]
  • Beacon: A Java controller built by David Erickson at Stanford [2]
  • Maestro: A Java controller built by Zheng Cai at Rice University (project webpage)

Comparison 05/17/2011

Controllers Tested

  • NOX Destiny: destiny-fast branch, git commit e9c3da6bb12ad3fa0d2b609e697a50ce44ca19f4
  • Beacon: ioloop branch, git commit 46d33b881d00ea4fed1e555d46b533a87b1b81e8
  • Maestro: 0.2.1

Test Setup

  • CPU: 1 x Intel Core i7 930 @ 3.33 GHz, 4 physical cores, 8 threads
  • RAM: 9GB
  • OS: Ubuntu 10.04.1 LTS x86_64
    • Kernel: 2.6.32-24-generic #43-Ubuntu SMP Thu Sep 16 14:58:24 UTC 2010 x86_64 GNU/Linux
    • Boost Library: v1.42 (libboost-all-dev)
    • malloc: Google's Thread-Caching Malloc version 0.98-1
    • Java: Sun Java 1.6.0_25
  • Controller configuration:
    • NOX
      • Configured with ../configure --enable-ndebug --with-python=no
      • tcmalloc loaded before launch: export LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.0
      • Launched with taskset -c 0 ./nox_core -i ptcp:6633 switch -t 1
      • Launched with taskset -c 0-1 ./nox_core -i ptcp:6633 switch -t 2
      • Launched with taskset -c 0-2 ./nox_core -i ptcp:6633 switch -t 3
      • Launched with taskset -c 0-3 ./nox_core -i ptcp:6633 switch -t 4
    • Beacon
      • Important beacon.ini parameters: -XX:+AggressiveOpts -Xmx3000M
      • Threads changed in beacon.properties via controller.threadCount=X
      • Launched with taskset -c 0 ./beacon
      • Launched with taskset -c 0-1 ./beacon
      • Launched with taskset -c 0-2 ./beacon
      • Launched with taskset -c 0-3 ./beacon
    • Maestro
      • ls.sh: $JAVA_HOME/bin/java -Xmx6000M -cp build/ sys.Main conf/openflow.conf conf/learningswitch.dag interactive
      • Thread count modified in conf/openflow.conf
      • Launched with taskset -c 0 ./ls.sh
      • Launched with taskset -c 0-1 ./ls.sh
      • Launched with taskset -c 0-2 ./ls.sh
      • Launched with taskset -c 0-3 ./ls.sh
  • Test methodology
    • cbench is run locally via loopback, so the 4th thread's performance is slightly impacted
    • cbench emulates 32 switches, sending packet-ins from 1 million source MACs per switch
    • 10 loops of 10 seconds each are run 3 times and averaged per thread/switch combination
    • tcmalloc loaded first: export LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.0
    • Launched with taskset -c 7 ./cbench -c localhost -p 6633 -m 10000 -l 10 -s 32 -M 1000000 -t
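The NOX launch lines in the controller configuration follow a single pattern: N threads pinned to CPUs 0 through N-1. A minimal sketch of that sweep is shown here as a dry run that prints the commands instead of executing them. The helper names cpu_range and nox_launch_lines are hypothetical and are not part of NOX or cbench.

```shell
#!/bin/sh
# Hypothetical helper: map a thread count N to the taskset CPU list
# used in the launch lines ("0" for one thread, "0-(N-1)" otherwise).
cpu_range() {
    if [ "$1" -le 1 ]; then
        echo "0"
    else
        echo "0-$(( $1 - 1 ))"
    fi
}

# Dry run: print the NOX launch line for each thread count
# rather than starting the controller.
nox_launch_lines() {
    for n in 1 2 3 4; do
        echo "taskset -c $(cpu_range "$n") ./nox_core -i ptcp:6633 switch -t $n"
    done
}

nox_launch_lines
```

The same CPU ranges apply to the Beacon and Maestro launch lines, where the thread count is set in each controller's configuration file rather than on the command line.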

Results

Figure: controller throughput with 32 emulated switches (051711 32switch throughput.png)


Historical Comparisons

All comparisons below this point are old and should be considered only for historical purposes.



Comparison 05/01/2011

Controllers

  • Nox: Unmodified Nox was used as a baseline. Nox does not support multi-threading, so it can only be tested in the single-CPU case. Nox is written in C++.
  • Nox D: A multi-threaded and highly optimized implementation of Nox (it can be found in the destiny-fast branch on Noxrepo). Nox D is written in C++.
  • Beacon: A Java controller from Stanford.
  • Maestro: A Java controller from Rice University (project webpage). Maestro v0.2.0 is tested; the latest version in the svn repo, which includes performance improvements, has yet to be tested.

Performance Metrics

  • Flow setup throughput.
  • Flow setup latency.

Test Setup

  • Machines: 2 x Dell PowerEdge 2950 (1 for controller, 1 for benchmarker and packet capturing)
    • CPU: 2 x Intel(R) Xeon(R) CPU E5405 (4 Cores, 12M Cache, 2.00 GHz, 1333 MHz FSB)
    • RAM: 4GB
    • Network: 2 x Gigabit ports (tg3 driver)
      • Buffer sizes: TODO
      • TCP setting:
    • OS: Debian Squeeze 32-bit
      • Kernel: 2.6.32-5-686-bigmem
      • Boost Library: v1.42 (libboost-all-dev)
      • malloc: Google's Thread-Caching Malloc (libgoogle-perftools-dev)
      • Java: Sun Java 1.6.0_24 (sun-java6-jdk)
  • Connectivity: machines are connected via 2 directly attached gigabit links. Directly connected interfaces have IP addresses in the same broadcast domain. The second connection is used to run a second instance of the benchmarker software in case more bandwidth is needed for the test.
  • Controller configuration:
    • nox: must be configured with "--enable-ndebug" passed to the configure script.
    • nox_d: must be configured with "--enable-ndebug --with-python=no" passed to the configure script.
    • beacon: see beacon.ini
    • maestro: see conf/openflow.conf
  • Control application used: Layer-2 learning switch application. The switch application is a good representative of a controller's flow-handling performance, and its read/write ratio can be tuned through the number of unique MAC addresses.
  • Running controllers: Turn off debugging and verbose output.
    • nox: ./nox_core -i ptcp:6633 switch
    • nox_d: ./nox_core -i ptcp:6633 switch -t $NTHREADS
    • beacon: ./beacon
    • maestro: ./runbackground.pl conf/openflow.conf conf/learningswitch.dag
  • Setting CPU affinity for controllers: The following script binds the running threads of a controller to different CPUs (on an 8-core system). Replace $CTRLNAME with a unique part of the controller's binary name (e.g., nox for nox and nox_d). Maestro's runbackground.pl already sets CPU affinity.

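   # $1: number of controller threads to pin
   nthreads=$1
   # Column 4 of `ps -eLF` is the thread ID (LWP);
   # keep the $nthreads highest-numbered thread IDs.
   pids=$(ps -eLF | grep "$CTRLNAME" | grep -v grep |
          awk '{print $4}' | sort -n | tail -n "$nthreads")
   count=0
   for tid in $pids; do
       # awk fields are 1-based, so look up field count+1 of the CPU order
       # (field 0 would return the whole list).
       cpu=$(echo "0 4 2 6 1 5 3 7" | awk -v val=$((count + 1)) '{print $val}')
       taskset -p -c "$cpu" "$tid"
       echo "$tid assigned to cpu $cpu"
       count=$((count + 1))
   done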

  • Running the benchmarker:
    • Get the latest version of oflops and compile it.
    • Run with cbench -c $ctrladdr -p $port -s $nswitch -m $duration -l $loop -t, where $ctrladdr and $port are the controller's IP address and port number, $nswitch is the number of emulated switches, $duration is the duration of each test in milliseconds, and $loop is the number of times to repeat the test. The -t option runs the throughput test; omit it for the latency test.
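The throughput and latency tests differ only in the -t flag. A small sketch that assembles the command line from the placeholders above and prints it, without running cbench. The function name cbench_cmd, the controller address 10.0.0.1, and the sample switch, duration, and loop values are assumptions for illustration only, not the values used in this comparison.

```shell
#!/bin/sh
# Hypothetical helper: build the cbench command line described above.
# Args: mode (throughput|latency) ctrladdr port nswitch duration loop
cbench_cmd() {
    mode=$1; ctrladdr=$2; port=$3; nswitch=$4; duration=$5; loop=$6
    cmd="cbench -c $ctrladdr -p $port -s $nswitch -m $duration -l $loop"
    # -t selects the throughput test; leaving it off runs the latency test.
    if [ "$mode" = "throughput" ]; then
        cmd="$cmd -t"
    fi
    echo "$cmd"
}

# Sample values (assumed): 16 emulated switches, 10 loops per run.
cbench_cmd throughput 10.0.0.1 6633 16 10000 10
cbench_cmd latency 10.0.0.1 6633 16 10000 10
```

Printing the command first makes it easy to confirm the flags before pointing the benchmarker at a live controller.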

Results

Figure: throughput of all controllers (All ctrls throughput 16.png)