Thursday, April 25, 2013


Ping with Tcl on a multihomed router for quality measurement.


Introduction


In this article I describe how to test the quality of links in a multihomed router environment. At the end of the article you can find the full text of the program. Throughout the article it is assumed you are using a Debian system. On any other distribution things should work in much the same way, but minor changes may be needed.

Imagine you have a Linux router connected to more than one Internet provider. Such a connection is often referred to as multihomed. In a multihomed network scenario you may want to track the quality of each connection and choose the best one.

An interesting case of a multihomed network is a multihomed router with VPN connections.
As the example drawing shows, if each office has two connections to the Internet, then we need four VPN tunnels to cover every possible Internet route between the two VPN routers.
In this scenario we have four routes to test, i.e. four directions to 'ping'. Real-world cases may be much more complicated than this example (and have more directions to 'ping').

Unfortunately, Tcl is rather weak when it comes to low-level networking. So, for a 'ping' facility we have to either create a new Tcl extension or use an external, already existing utility. We will use the second approach, which is usually much easier to implement.

Of all the 'ping' utilities available on Linux, I prefer to use “oping” for this purpose.
apt-cache show oping
gives us a link to a homepage:
Homepage: http://verplant.org/liboping/

The manual page shows how to use the command:
oping [-4 | -6] [-c count] [-i interval] host [host [host ...]]
The beautiful part of the 'oping' syntax is that you can specify more than one host to ping. For example:
oping 10.0.3.1 10.0.4.1
will send ICMP echo requests to both destinations. In reply we get:
PING 10.0.3.1 (10.0.3.1) 56 bytes of data.
PING 10.0.4.1 (10.0.4.1) 56 bytes of data.
56 bytes from 10.0.3.1 (10.0.3.1): icmp_seq=1 ttl=63 time=1.77 ms
56 bytes from 10.0.4.1 (10.0.4.1): icmp_seq=1 ttl=63 time=1.20 ms
56 bytes from 10.0.3.1 (10.0.3.1): icmp_seq=2 ttl=63 time=0.97 ms
56 bytes from 10.0.4.1 (10.0.4.1): icmp_seq=2 ttl=63 time=1.33 ms
56 bytes from 10.0.3.1 (10.0.3.1): icmp_seq=3 ttl=63 time=1.27 ms
56 bytes from 10.0.4.1 (10.0.4.1): icmp_seq=3 ttl=63 time=1.17 ms

Please note that the ICMP sequence number (icmp_seq) is the same for both hosts on each iteration. This makes it very handy for writing test scripts: you can 'ping' several targets, and even if some of them are down you still know your Internet connection is up as long as at least one target is answering.

Multihomed node and Linux routing

To run ping tests, and for many other tasks, we want some traffic to be routed as if we were connected only to the first provider, and some as if we were connected only to the second provider. For example, when the main Internet provider has problems, we want to change the default routes for the LAN networks, but we want to leave the testing 'oping' commands and/or the VPN tunnel daemons' routing untouched. Another example, to make the concept more obvious: you have two LAN subnetworks and for some reason want to route one through the main provider and the other through the backup one. In this case you want your router to behave as if there were a separate router for each subnetwork.

Such features are grouped under the term 'policy based routing'.

On Linux, policy based routing means you have some criteria according to which a routing table is chosen for a data flow, instead of always using the default routing table. In practice, we use the iproute2 package utilities to work with 'policy based routing'.
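For instance, the two-subnet example above could be sketched like this (the table names isp1/isp2 and the second gateway are made-up values for illustration; the tables actually used in this article are created in the next section):
# illustration only: isp1/isp2 and 10.0.2.254 are assumed values
ip rule add from 192.168.1.0/24 table isp1
ip rule add from 192.168.2.0/24 table isp2
ip route add default via 10.0.1.254 dev eth0 table isp1
ip route add default via 10.0.2.254 dev eth1 table isp2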

'policy based routing' for our needs

Let's imagine we have two Internet uplinks (eth0, eth1). We want to 'ping' some hosts over every uplink we have. Please note that we may want to run oping 10.0.3.1 10.0.4.1 on both of our Internet links, which means we can't just add ip route add 10.0.3.1/32 via X.X.X.X dev eth0 on our router: that would direct all 'pings' via only one route.

To achieve our goal we need to use 'policy based routing'. I suggest having a separate additional routing table for each Internet link.

So, what we need is:
1. create additional routing tables;
2. find a way to force oping to use the routing table we want.

To create an additional routing table, open /etc/iproute2/rt_tables. On Debian you will see something like this:
cat /etc/iproute2/rt_tables
#
# reserved values
#
255 local
254 main
253 default
0 unspec
#
# local
#
This file defines the routing tables. local, main and default are predefined tables. Each table is identified by a table number and a table name.
Add two lines to this file:
252 tinc_I1
251 tinc_I2
Now we have two new tables in our system. I use the tinc_* names because I usually use the uplink-dedicated routing tables to run VPN daemons, and for VPNs I use tinc (http://tinc-vpn.org/).

Now we can fill the tables with routes. Example:
ip route add default via 10.0.1.254 table tinc_I1 dev eth0
ip route flush cache

Please note that the system keeps an eye on the route records in the tables. So, if your interface goes down and then up (for example, you reconnect the Ethernet cable), your route will disappear from all tables. To overcome this you can put the following into /etc/network/interfaces:
iface eth0 inet static
address 10.0.1.22
netmask 255.255.255.0
up ip route replace default via 10.0.1.254 table tinc_I1 dev eth0

Then the command 'ip route replace default via 10.0.1.254 table tinc_I1 dev eth0' will be executed automatically each time eth0 goes up.
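A similar stanza for the second uplink keeps the tinc_I2 table populated (the address and gateway below are assumed values for illustration):
iface eth1 inet static
# 10.0.2.22 / 10.0.2.254 are example values for the second provider
address 10.0.2.22
netmask 255.255.255.0
up ip route replace default via 10.0.2.254 table tinc_I2 dev eth1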

Now that we have the tables, we should think about how to force oping to use the routing table we want. In general, you specify which table to use with the ip rule command. For packets our system forwards as a router, it is no problem to add a rule like this:
ip rule add from 10.10.10.0/24 table tinc_I1
Everything coming from the 10.10.10.0/24 network will be routed using the tinc_I1 table. In this example we have used the 'from' selector. Please read the man pages to see all possible selectors.

For local processes the situation is more complicated. Many approaches that work for forwarded traffic do not work correctly with locally generated traffic. This is because the Linux kernel decides how to route locally generated and forwarded packets in different ways. When the kernel does not have enough criteria to select a routing table, it will always use the table called main (as long as the default ip rule settings are not changed).
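On a system where the default ip rule settings have not been touched, ip rule list typically shows only the three built-in rules:
ip rule list
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default
Locally generated traffic that matches none of our additional rules therefore falls through to the main table.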

'policy based routing' and locally generated traffic

After many experiments trying to find a method that works for local daemons, I have found one approach that works. Several rules should be followed:
1. It must be possible to bind the process generating the network traffic to an INTERFACE. Binding to an IP address does not help.
2. We need an ip rule criterion that distinguishes the IP flow we need from the others.

Rule number one means that not every local program can direct its traffic using a non-'main' routing table. Please note that the interface we bind to must be recorded in the routing table we want to use. It may not be obvious, but adding a default route to a table also names an interface, which creates the record we need.
Rule number two: it is easy to create a proper 'ip rule' when you know the source and destination addresses, e.g. ip rule add from 10.0.1.254 to 10.0.3.1 table tinc_I1. This is fine for VPNs with fixed IPs, but not for 'ping', where our destination IPs can change, or where we may want to 'ping' the same IPs from different interfaces. In the case of 'ping' we can satisfy rule number two in three steps:
  • run each 'ping' process under a certain user group.
  • create an iptables rule which adds an FWMARK to packets from processes run by that user group.
  • create an ip rule for the FWMARKs.

An FWMARK is an internal kernel mark a Linux box can put on network packets; it exists only while the packet is inside the box. You can set such marks with iptables and use them in both iptables and ip rule commands.
One of the criteria on which iptables can set a mark is the user group that runs a local process. You can look up the syntax and options by running:
iptables -m owner --help
You will get a long help listing, and at the end of it:
owner match options:
[!] --uid-owner userid[-userid] Match local UID
[!] --gid-owner groupid[-groupid] Match local GID
[!] --socket-exists Match if socket exists
These are the options we can use in iptables.

Now we can assemble the whole construction.
a) Add new user groups. I prefer to use the same group numbers as we have for the routing tables.
groupadd -g 252 tinc_I1
groupadd -g 251 tinc_I2
b) Add iptables rules to assign an FWMARK to packets from processes run by the new user groups.
iptables -t mangle -A OUTPUT -m owner --gid-owner 252 -j MARK --set-mark 252
iptables -t mangle -A OUTPUT -m owner --gid-owner 251 -j MARK --set-mark 251
To make things easier, I suggest using the same mark numbers as we used when adding the new groups.
c) Add an 'ip rule' for each group to direct its traffic to the proper routing table. This is done by matching the FWMARKs assigned by iptables ($r_prio below is a rule priority of your choice).
ip rule add fwmark 252 prio $r_prio table tinc_I1
ip rule add fwmark 251 prio $r_prio table tinc_I2
d) Run the 'ping' commands under the different user groups.
For this we use the sudo command. To install it under Debian, run: apt-get install sudo. To make working with sudo easier, modify the /etc/sudoers file: comment out the line root ALL=(ALL) ALL and put root ALL=(ALL:ALL) ALL in its place. So you should get:
# root ALL=(ALL) ALL
root ALL=(ALL:ALL) ALL
Now we can run our 'ping' commands:
sudo -g#252 /usr/bin/oping -D eth0 10.0.3.1 10.0.3.2
sudo -g#251 /usr/bin/oping -D eth1 10.0.3.1
or the same, using group names instead of numbers:
sudo -gtinc_I1 /usr/bin/oping -D eth0 10.0.3.1 10.0.3.2
sudo -gtinc_I2 /usr/bin/oping -D eth1 10.0.3.1
The -D option tells 'oping' to bind to a specific interface.

'policy based routing' for a local 'ping' command: summary by example

1. Run the 'oping' command under the user group that corresponds to a certain routing table:
sudo -g#252 /usr/bin/oping -D eth0 10.0.3.1 10.0.3.2
2. 'oping' binds to the interface eth0. The interface has been recorded in the proper routing table by ip route replace default via 10.0.1.254 table tinc_I1 dev eth0.
After this command you may need to run: ip route flush cache.
Please note that nothing prevents an interface from being recorded in more than one routing table.
3. iptables sees that the command is run by the user group with number 252 and name tinc_I1. It assigns FWMARK 252 to all packets generated by this command.
4. The Linux kernel consults the ip rules and finds the matching one among them; ip rule list | grep tinc_I1 gives on my system:
2763: from all fwmark 0xfc lookup tinc_I1
0xfc is the hexadecimal representation of 252.
2763 is the rule priority. Each 'ip rule' has a priority; you can specify it with the 'prio' modifier. Rules are scanned in order of increasing priority.
This rule 'sends' the packets to the proper routing table.
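If you want to check that the whole construction works before involving Tcl, you can watch the packet counters of the mangle rules while one of the 'oping' commands is running (a quick sanity check, nothing more):
iptables -t mangle -L OUTPUT -v -n
The counter of the rule created for group 252 should grow, and ip rule list should show the fwmark rules added earlier.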

TCL and 'external' utilities

'oping' is not part of the Tcl language. To run an external utility that produces continuous output, we use the Tcl command open. From the manual page we can find out:
If the first character of fileName is “|” then the remaining characters of fileName are treated as a list of arguments that describe a command pipeline to invoke, in the same style as the arguments for exec. In this case, the channel identifier returned by open may be used to write to the command's input pipe or read from its output pipe
As the output of 'oping' is line oriented (you see output line by line, not character by character), we should also tell the Tcl channel driver that we want to read data once a whole line has arrived. These things are done in a few lines of code:
set cmd {sudo -g#252 /usr/bin/oping -D eth0 10.0.3.1 10.0.3.2}
set pipe [open "|$cmd"]
fconfigure $pipe -buffering line

Next, we want to read the 'oping' output and process each new line as it appears. For this we use Tcl's event-driven facilities. We tell Tcl that as soon as a new line arrives in our pipe from the 'oping' side, a special function should be called to process it:
fileevent $pipe readable [list Pinger $pipe]
In this example, as soon as a new line of 'oping' output appears in our pipe channel, the function Pinger is called, and $pipe is passed to it as a parameter.

As already mentioned, we are going to use Tcl's event-driven facilities. The event processing mechanism will not start operating until we enter the event loop. To enter the event loop we can use the vwait 1 command. As a starting point for more information about events, you can read this article: http://www.tcl.tk/man/tcl8.5/tutorial/Tcl40.html.
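To make the picture complete, here is a minimal sketch of such a callback. This is not the Pinger from the attached program; the regular expression and the puts are simplified assumptions, just enough to show the line-by-line processing:

proc Pinger {pipe} {
    if {[gets $pipe line] < 0} {
        # no complete line, or 'oping' has closed its side of the pipe
        if {[eof $pipe]} { close $pipe }
        return
    }
    # a typical line: 56 bytes from 10.0.3.1 (10.0.3.1): icmp_seq=3 ttl=63 time=1.27 ms
    if {[regexp {icmp_seq=(\d+).* time=([\d.]+) ms} $line -> seq rtt]} {
        puts "seq=$seq rtt=$rtt ms"
    }
}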

Design of the program

There is an interesting book on the Internet, “How to Design Programs” (www.htdp.org). It contains a programming example (http://www.htdp.org/2003-09-26/Book/curriculum-Z-H-5.html#node_sec_2.2), which is later discussed in chapter 3 (http://www.htdp.org/2003-09-26/Book/curriculum-Z-H-6.html#node_chap_3). The example illustrates how to compose a program from functions and auxiliary (helper) functions. I'll show those ideas in a short version, converted to Tcl.
Imagine we want to calculate the area of a ring. We know R and r, the radii of the outer and inner discs (R - outer, r - inner). A person would calculate it like this: So = pi*R^2, Si = pi*r^2, so Sring = So - Si = pi*(R^2 - r^2). But in the world of real and complicated tasks we should go a different way.
The approach proposed in the book is to start from the target and then divide it into smaller tasks by applying helper functions. In this scenario we start our programming from: Sring = So - Si. This first statement gives us the final result, but contains two functions that are not defined yet, which we then express as helper functions.
In Tcl, we could express these functions like this:
proc area-of-ring {R r} {
    return [expr {[area-of-disc $R] - [area-of-disc $r]}]
}

proc area-of-disc {r} {
    return [expr {3.14 * $r * $r}]
}
Now we can run something like:
area-of-ring 6 4
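which evaluates to approximately 62.8, i.e. 3.14 * (6*6 - 4*4).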

Please catch the main idea: we start solving the problem from the final result we want to get.
  • This makes the program code more readable. At first glance you see the biggest steps leading to the result, and if you wish you can look into the helper functions, which may in turn consist of their own helper functions.
  • It leads you to your target in a finite number of steps: you just divide your task into smaller pieces, then each of them into smaller ones, down to elementary ones. This is a handy way to solve complicated tasks.
  • This makes it easy to change parts of the program if needed.

How the program works

1. We have a config file called conf.tcl. To make things easier, this file is included in the program by source [file join $Path conf.tcl]. It consists of a Tcl dict which identifies each direction we are going to ping by the names dir1, dir2 .. dirN. Please stick to this naming, as the program uses pattern matching on dir*. Each dir* has a key named cmd, whose value is the 'oping' command with all the options needed to ping that direction. For example:
dict set directions dir1 cmd {sudo -g#252 /usr/bin/oping -D eth0 10.0.3.1 10.0.3.2}
2. The same dict has a special record:
dict set directions current_dir dir1
This identifies the direction used by default. Later, if and when the main direction has to be replaced by an alternative, the name of the alternative that has become active is stored under this key.
3. The program uses the same dict to store some helper data structures:
  • Each dir* has 2 buffers storing values of the last 100 'pings'. One of the buffers, called 'metrics', stores the 'oping' results. The other, called 'drops', stores a one if we got a timeout instead of a result and a zero if we got a reply. If more than one destination is 'pinged', the best result is selected for storage.
  • Each dir* has a counter key, which helps us walk through the buffer; it goes from 0 to 99 and then wraps back to 0.
  • Each dir* has an icmp_seq key, which stores the value of the last icmp_seq shown by 'oping'. If we read a line whose icmp_seq is the same as the one recorded in the dict, we know that 'oping' has more than one target to ping and that we will have to choose the best result.
  • Each dir* has a tmp_list key, which holds all results from the same 'oping' iteration. The program keeps appending to this list until a new icmp_seq arrives from 'oping'. After that the best result is chosen and put into 'metrics', and a 0 or 1 is recorded in 'drops'.
4. The program is started by the executable file start_Monitor.tcl, which contains the main big steps needed to achieve our result. It:
  • loads the helper functions via the source command.
  • loads conf.tcl.
  • initializes the helper dict structures.
  • starts 'oping' by calling start_Pinger $dir for each direction.
  • runs after 10000 CompareMetrics 5000. This schedules the subroutine that compares the results and decides whether we should switch to a new direction. CompareMetrics is first run 10 seconds after the program starts and is then repeated every 5 seconds.
  • runs vwait 1 to enter the event loop.
5. from doc.txt:
# =============================================
# structure of directions dict
# =============================================
directions
|
+-dir1
| +-cmd {sudo -g#252 /bin/ping -I eth0tinc 10.0.3.1}
| +-metrics "list of 100 elements" - filled with ping results
| +-drops "list of 100 elements" - filled with 1's where there were drops
| +-counter N (0-99) - number of the current element in the list
| +-icmp_seq N - last icmp_seq
| +-tmp_list - list of results for the same icmp_seq number
+-dir2
....
+-current_dir - currently active dir

6. If there are no timeouts, the route with the better results over the last 100 iterations wins; if there were drops, the route with fewer drops wins (a sketch of this logic is shown after this list). In conf.tcl you can specify:
  • set Drops 5; - the number of unanswered packets after which we jump to the other channel
  • set Diff 0.5; - we compare the metric of a dir that is better than the current one with the metric of the currently active dir, as the ratio Better/Active. If Better/Active < Diff, we switch to the new dir.
7. If we switch to a new direction, the file change_dir in the scripts directory is executed, and the name of the new active dir* is passed to it as a parameter; later this name is recorded under the directions → current_dir key. The change_dir file should contain a script that takes its first parameter and changes the system routing accordingly.
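To illustrate the decision from point 6, here is a minimal sketch of the comparison. It is not the actual CompareMetrics from the archive; the helper names and the exact reading of the Drops/Diff rules are my assumptions, only the dict layout follows the description above:

proc dir_drops {d} {
    # total number of drops recorded for direction $d
    global directions
    set sum 0
    foreach x [dict get $directions $d drops] { incr sum $x }
    return $sum
}

proc dir_metric {d} {
    # average of the stored 'oping' round-trip times for direction $d
    global directions
    set vals [dict get $directions $d metrics]
    set sum 0.0
    foreach x $vals { set sum [expr {$sum + $x}] }
    return [expr {$sum / double([llength $vals])}]
}

proc should_switch {active candidate} {
    # hypothetical reading of the rules: switch when the active direction
    # has accumulated at least $Drops timeouts and the candidate has fewer,
    # or when candidate_metric / active_metric drops below $Diff
    global Drops Diff
    if {[dir_drops $active] >= $Drops && [dir_drops $candidate] < [dir_drops $active]} {
        return 1
    }
    return [expr {[dir_metric $candidate] / [dir_metric $active] < $Diff}]
}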

Finally, after the program is tested and the right scripts for switching the routing are in place, you may want to start the monitoring program automatically right after the system boots. This is very easy to do under Debian Linux. Open /etc/rc.local; this file is executed at the very end of the Debian boot process. Put the call to your script before the exit 0 line. In my case the last two lines of the file look like this:
/opt/scripts/monitor4ik/start_Monitor.tcl
exit 0

I've put the monitoring program into the archive monitor.tar.gz. It can be downloaded from: http://www.mediafire.com/?lvei23ggw9525zu
After downloading, you can tar xzvf ./monitor.tar.gz in any preferred location; the program should run from any location. Before running it, Tcl has to be installed on the Linux box. Start the program by running start_Monitor.tcl.





