Archive for May, 2010

Getting tc(8) to work for me with Linux with SIP/IAX

Sunday, May 30th, 2010

In Thoughts on VoIP and achieving good call quality I was still trying to figure out how to configure the outbound prioritisation of traffic on my Linux box to favour VoIP using tc(8). Finally I’ve had time to try and figure it out and as far as I can see it helps a bit.

Just a reminder of the problem.

My Linux box runs Asterisk, but is also a web and mail server, runs NFS, and so on. That means that sometimes it’s busy sending traffic which is not VoIP traffic. As a result, during a voice call, even though the bandwidth in my LAN is sufficient for all of this, some of the voice traffic may get delayed, affecting audio quality. It’s also worth noting that Asterisk accepts calls from my voip phone and then resends the call to the final destination, which may be an internal PSTN gateway or an external VoIP provider. This double call issue means that any VoIP delays may get accentuated.

From the previous article I successfully checked that the voip traffic generated by Asterisk and my voip phones uses the dscp values EF for RTP voice traffic and CS3 for call signalling. Unfortunately my Siemens C470IP uses AF31, so this needs to be taken into account too.

I came across a few different posts about setting up tc(8) but none of them seemed to fit my situation. Some people configure it on a Linux router and manage the bandwidth that way, others try to do the prioritisation based on IP port filtering. With RTP this does not work very well, and besides it seems that if you want to implement QoS it’s best to try to do it consistently one way.

So I modified a script I found which almost seemed to do what I needed, and basically did the following:

  • Convert the 3 DSCP values into decimal
  • Multiply them by 4 as there are 2 bits to the right within the ToS byte
  • Convert that number to hex

That gave me the following values

DSCP NAME  Value  bits      x4  Hex
=========  =====  =======  ===  ====
CS3           24  011 000   96  0x60
EF            46  101 110  184  0xb8
AF31          26  011 010  104  0x68
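
The same three steps can be done with shell arithmetic. This is only a sketch: the helper name dscp_to_tos is made up for illustration, and shifting left by 2 bits is the same as multiplying by 4.

```shell
#!/bin/sh
# Hypothetical helper: convert a decimal DSCP value into the hex ToS byte
# value that tc(8) expects. The DSCP field is the top 6 bits of the ToS
# byte, so shift it left past the 2 low-order bits.
dscp_to_tos() {
    printf '0x%02x\n' $(( $1 << 2 ))
}

dscp_to_tos 24   # CS3  -> 0x60
dscp_to_tos 46   # EF   -> 0xb8
dscp_to_tos 26   # AF31 -> 0x68
```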

Next, to match this properly in the tos byte we need to match the first 6 bits, as the last 2 are used for ECN.

See this link which shows the relation between DSCP and ToS bits.

Hence for tc(8) I need to use a byte mask of 0xfc.
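
To see what the mask does, here is a small sketch of the comparison in plain shell. It is not how tc(8) is implemented, just the same bit arithmetic; tos_matches is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a packet's ToS byte ($1) matches the
# wanted value ($2) once both are ANDed with the mask ($3).
tos_matches() {
    [ $(( $1 & $3 )) -eq $(( $2 & $3 )) ]
}

# 0xb9 is EF (0xb8) with one of the two low-order bits set.
# With mask 0xfc those bits are ignored, so it still counts as EF.
tos_matches 0xb9 0xb8 0xfc && echo "0xb9 matches EF with mask 0xfc"

# With a full 0xff mask the same packet would be missed.
tos_matches 0xb9 0xb8 0xff || echo "0xb9 does not match EF with mask 0xff"
```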

The remaining script which I use is shown below and basically assigns all traffic to a low priority queue and then matches these 3 DSCP values, putting them in queue 0 (highest priority).


#!/bin/sh
#
# Taken from: http://www.howtoforge.com/voip_qos_traffic_shaping_iproute2_asterisk
# and adapted to filter by DSCP values EF, CS3 and AF31 (due to a Siemens
# voip phone not using the right dscp value).
#
[ -n "$DEBUG" ] && set -x
myname=$(basename $0)

start () {
# wrong? but set all traffic to lowest queue
tc qdisc add dev $interface root handle 1: prio priomap 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

# Adjust each queue so they themselves have some queuing discipline:
# This gives the first queue a supposed capacity of 3000 packets. In
# reality, the size will be 128 packets as it is hard coded in the tc
# program as being the maximum size possible.

tc qdisc add dev $interface parent 1:1 handle 10: sfq limit 3000
tc qdisc add dev $interface parent 1:2 handle 20: sfq
tc qdisc add dev $interface parent 1:3 handle 30: sfq

# Simon’s thoughts:
# decimal binary 4x hex bitmask
# CS3 24 011 000 96 0x60 0xfc
# EF 46 101 110 184 0xB8 0xfc
# AF31 26 011 010 104 0x68 0xfc

tc filter add dev $interface protocol ip parent 1: prio 1 u32 match ip tos 0x60 0xfc flowid 1:1
tc filter add dev $interface protocol ip parent 1: prio 1 u32 match ip tos 0xb8 0xfc flowid 1:1
tc filter add dev $interface protocol ip parent 1: prio 1 u32 match ip tos 0x68 0xfc flowid 1:1
}

# To see some statistics
status () {
tc -s qdisc ls dev $interface
}

# To remove your queues and return to the normal state
stop () {
tc qdisc del dev $interface root
}

interface=eth0

case $1 in
start)
start
;;
stop)
stop
;;
restart)
stop
start
;;
status)
status
;;
*)
echo "Usage: $myname {start|stop|restart|status}"
esac

This now seems to configure the interface to give VoIP (IAX and SIP traffic in my case) high priority over everything else.
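
If another device turns up using yet another DSCP value, the filter lines could be generated rather than worked out by hand. The following is a sketch only: print_voip_filters is a made-up helper, and it prints the commands instead of running them so they can be checked first.

```shell
#!/bin/sh
# Hypothetical helper: print one u32 filter line per decimal DSCP value.
# $1 is the interface, the remaining arguments are DSCP values.
# Nothing is executed; the output has the same form as the filter
# lines in the script above.
print_voip_filters() {
    dev=$1
    shift
    for dscp in "$@"; do
        printf 'tc filter add dev %s protocol ip parent 1: prio 1 u32 match ip tos 0x%02x 0xfc flowid 1:1\n' \
            "$dev" $(( dscp << 2 ))
    done
}

# CS3, EF and AF31 on eth0
print_voip_filters eth0 24 46 26
```

Once the output looks right it can be pasted into start (), or piped to sh as root.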

Work still to do is to figure out how to recognise peer to peer traffic for when I download ISO images, and drop it into a lower queue than the other ssh, smtp and http traffic that the server is handling. That however is not a concern for the voice traffic.

Initial thoughts on space compression using the innodb_plugin

Sunday, May 23rd, 2010

While setting up MySQL Enterprise Monitor 2.2 (Merlin) on a system which had been running version 2.1, I thought I’d see what difference it would make to change from normal innodb tables to the compressed table format available in the innodb plugin.

I’ve been using a separate db backend for merlin because for me it’s easier to manage and also the database backend has been put on a dedicated server. I’ve also been trying the innodb_plugin on another busier server as I had performance problems with the normal 5.1.42 built-in innodb engine which the plugin managed to solve.

So given that I was using a separate db server I upgraded it to 5.1.47, configured the server to use the plugin (1.0.8) rather than to use the built-in innodb engine and then decided to alter the data tables (dc_p_long, dc_p_string and dc_p_double) to use the new innodb compressed table format. These tables are designed for storing a large number of rows of a specific type but there was no harm in trying.

Here are the results of doing the following:

SET GLOBAL innodb_file_format = "Barracuda";
ALTER TABLE dc_p_xxxx ROW_FORMAT=compressed;

Using the older Antelope storage format:

dc_p_string 178 MB
dc_p_double 514 MB
dc_p_long 15.3 GB

Using Barracuda COMPRESSED:
dc_p_string 35 MB
dc_p_double 223 MB
dc_p_long 6.8 GB
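
As a rough worked example, the saving per table can be calculated from the sizes above. The helper name saving_pct is hypothetical, and awk is used only because plain sh arithmetic has no decimals.

```shell
#!/bin/sh
# Hypothetical helper: percentage of space saved, given the size before
# and after compression (both in the same unit), rounded to a whole number.
saving_pct() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.0f\n", (1 - after / before) * 100 }'
}

saving_pct 178 35     # dc_p_string, MB -> 80
saving_pct 514 223    # dc_p_double, MB -> 57
saving_pct 15.3 6.8   # dc_p_long,   GB -> 56
```

So the string table shrinks by about 80%, while the two larger tables lose a little over half their size.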

The compressed format is using the standard block size. I need to do further tests to see how much difference using the smaller 1 kb, 2 kb or 4 kb blocks will make.

So from the point of view of Merlin only it does seem to make sense to use this format, assuming performance is not significantly affected. After all it means I can store twice the amount of data on disk, and one of the problems I have had in the past is that I could only keep a week’s worth of data because of storage limitations. Note: the 2.1 to 2.2 update will save a lot of space as the string table will drop in size significantly. However gaining an extra 50% by using innodb compression seems worth doing if it comes for free.

That said I have been told that there are still a few issues with this new table format so for anyone looking to use it in production it may be best to wait for 5.1.48 which should remove a few of these edge cases. If you only want to see how much difference the storage usage is then 5.1.47 should be ok. YMMV.

In the meantime I’m going to leave this server running for a while. Merlin does hammer the db quite heavily, so I’ll be able to see if it survives in a few days.

I also have a few servers which currently use MyISAM tables because of the smaller disk footprint compared to innodb. These servers do suffer from some of MyISAM’s weaknesses, such as not being able to read and write the same table concurrently, so now it looks like we might have a good reason to try this compressed table format out and with that gain a lot by using innodb. Recovery after a crash has always been a problem on this type of server and innodb recovery should be both quicker and less intrusive.

I’m looking forward to further experimentation but so far this new compressed format does look promising. So thanks to the innodb folk who have made this possible.

After writing this I thought I’d add a few more comments:

  • On the Merlin tables the selectivity of the main value column was:  dc_p_long (0.2), dc_p_string (0.3), dc_p_double (0.002). These values probably significantly affect how compression may work.
  • I’d like to see some sort of writeup by Innodb on how performance is going to be affected by using the compressed format compared to the normal one so if we switch we know what gotchas to be aware of [1].
  • It would be nice to have a writeup by Innodb on how to determine the approximate space savings of converting the tables to the compressed format without having to do the conversion first. It took me over 2 hours to convert a 15GB table so much larger tables would take a lot longer. Having some idea of whether the saving is worthwhile and how much it might be would make it clearer to people whether this change is worth making.

[1] Having said that for the change I made I could have used Merlin directly to do this. It’s a shame I didn’t remember. Basically all that is needed is to run the mysql proxy on the database backend and feed the connections into that proxy. Then after altering all tables we use the Query Analysis (QUAN) functionality in merlin to compare the top queries before making the change and then look again after making the change. Any performance change will be easy to measure. Hopefully the Merlin developers can perhaps do this on a test setup they have and report back the results, and also provide the history size and also number of servers monitored.

Upgrading DES3010F firmware to Build 4.20.B27

Saturday, May 15th, 2010

I recently managed to upgrade my D-Link DES3010F switch to Firmware Version: Build 4.20.B27, requiring Hardware Version: A4, Boot PROM Version Build: 1.01.009.

This failed for me about a year ago, leaving the switch unusable, and it wasn’t clear what the problem was. The switch is a managed switch, that is you can configure many things like VLANs, and prioritise traffic in various ways. Finally, thanks to DLINK support, I worked out that the problem had been that I had previously attempted to update the firmware version using an older Boot PROM Version Build: 1.01.007, and this made the switch fail when it booted with the 4.20.B27 firmware version.

So if you try to upgrade the firmware version without first upgrading the Boot PROM version you may find you have the same problem.

The switch stops working. If you connect to the RS232 console you see the following:

  Boot Procedure                                                    V1.01.007 
------------------------------------------------------------------------------- 
  Power On Self Test ...................................... 100 % 

  MAC Address   : 00-1E-58-46-XX-XX 
  H/W Version   : A3 



  Please wait, loading Runtime image .............  00 % (5second) 
 Move runtime code from flash to sdram error! 



  Please use Z-Modem protocol download firmware! 



?....??...??.M....?................... .....................)¦..A?......?.....??.?......?.?..?.??  ¦...??.i.??..?..?..??...?..HH?**4

So the switch complains about copying the new firmware into RAM during startup.

I eventually found a copy of the old firmware ES30XXR4_RUNTIME_V4.00.018.had and managed to load it back using minicom and the zmodem transfer via the RS232 interface. Initially I thought that the problem was with the firmware. Indeed in the call to DLINK support they said that you had to be very careful using the right firmware on the right switch as there are various different hardware models (due to country differences) and if you use the wrong version the firmware would not work. However they weren’t too clear on providing me with the firmware I needed.

So after trying again a year later and being told to use the firmware which I knew did not work, I asked, “Are you sure? This did not work last time.” They came back saying: please upgrade the BOOT PROM version first to version P101009.had. This was the missing step and once complete allowed me to upgrade to the latest firmware. The firmware file I used to upgrade the switch was: DES-3000_Series_A3_FW_v420.B27.had.

This is what this looks like when you are upgraded:

DES3010F configuration screen

The Boot PROM version needs upgrading via the RS232 serial port using ZMODEM upgrade.

Now that this is done I have the advantage of being able to talk to the switch using ssh rather than telnet and also there are a few improvements in the options for managing the switch. So I am happy.

I’m reporting this here simply as when I was trying to resolve the problem I could not find the answer. If you have the same switch and are looking to do the same thing this post may help you.

Thoughts on VoIP and achieving good call quality.

Saturday, May 1st, 2010

I spoke about the problems I was having with achieving good VoIP call quality some time ago. In spite of many tweaks the problems had not been going away.  I changed my ADSL router for one that includes QoS, hoping this would help solve the problem. However, it has not entirely done that.  I have basically been having a problem that some calls seemed to work fine, but others would have occasional delays which would ruin the voice experience.

So I decided to read up again on Quality of Service. It’s interesting that for most home users there is not a lot of documentation, or that it is incomplete. One good reference is Cisco’s Enterprise QoS Solution Network Design Guide. Not light reading but it’s pretty complete and certain things started to stand out:

First my Vigor 2820n ADSL router does support QoS but ONLY for traffic going out of the ADSL interface. I have 3 voip devices excluding my PC which runs Asterisk so a lot of the VoIP traffic is on my LAN. It suddenly became clear. There was no QoS taking place on the LAN.

While obvious to the professionals who play with this, QoS is something you only control for “outbound” traffic. That is where you have to focus. So looking at my setup it became clear that I needed to apply QoS in more than one place:

  • on my Asterisk PC for outgoing VoIP traffic to external VoIP providers but also to my VoIP devices on my LAN
  • on my ADSL router for external VoIP calls
  • on my switch for LAN traffic. I actually bought some time ago a DLINK DES3010F 8-port managed switch which included QoS support. This needed configuring too.

Of course this required 3 completely different sets of configuration.

  • The documentation on my DLINK switch is ok, but it has many different options and does not provide good examples of how to implement QoS.
  • Linux uses tc. There are various tutorials but it is a bit of a pain to use and is not that intuitive.
  • The Vigor router’s configuration is not too bad, and it helped a lot once I realised that the QoS applied only to the outbound ADSL interface and not to the LAN. It would have been nice if the documentation were clearer.

Then I came across another thing. To configure QoS you need to decide which traffic needs to get the appropriate tagging for prioritisation. I’m still using IAX and SIP in Asterisk though I’d like to move over completely to SIP to simplify things. SIP is easy: it normally runs on a single port. The same is true of IAX.

RTP on the other hand, which is the protocol to carry the voice traffic, does not seem to have any recommended port ranges. As a result of that it became clear: my Asterisk configuration used one port range, my Linksys devices used another and my Siemens C470IP yet another.  For configuring QoS on the RTP streams that is not helpful. Configuring based on port ranges is probably not the best way to go but seems the easiest initially. Ideally the applications all need to correctly tag their traffic and then the PC, router and switch can prioritise it.

Finally of course Quality of Service is explained in different ways: DiffServ uses one terminology and ToS another. This can be confusing if you are playing with this for the first time and do not have a good guide to help.
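
The DiffServ names do follow a pattern, which helps when translating between the two vocabularies: CSn is 8 × n, AFxy is 8 × x + 2 × y, and EF is fixed at 46. A small sketch of that mapping follows. The function name dscp_value is made up, and only lower-case names are handled.

```shell
#!/bin/sh
# Hypothetical helper: print the decimal DSCP value for a DiffServ
# class name such as cs3, af31 or ef. Returns non-zero for other input.
dscp_value() {
    case $1 in
        ef)
            echo 46 ;;
        cs[0-7])
            # Class selector: the class number times 8
            echo $(( ${1#cs} * 8 )) ;;
        af[1-4][1-3])
            digits=${1#af}
            class=${digits%?}   # first digit: AF class
            drop=${digits#?}    # second digit: drop precedence
            echo $(( class * 8 + drop * 2 )) ;;
        *)
            return 1 ;;
    esac
}

dscp_value cs3    # 24
dscp_value af31   # 26
dscp_value ef     # 46
```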

So I’ve modified the switch and the router, perhaps not in the best way and also the PC with tc and now the voice quality seems to have improved quite a bit. I still need to tweak this further but it’s complex and for most home users if you get this wrong or do not do it at all your voice quality can drop considerably.

So if you are having trouble with voice quality on a SOHO voip system and have internal voip devices take a look at this. It might be the cause of the issue and nothing to do with the hardware you are using.

Update: 03-05-2010

Since writing this I thought I’d update the configuration. Based on the Cisco document I wanted to configure all voice traffic to dscp EF and call signalling to CS3.

I used wireshark to look at the traffic generated by each device and saw that my Gigaset C470IP sends signalling using AF31 (the document mentions that some devices use this value rather than CS3), so I needed to adapt the configuration to take this into account.

Below are the changes I made.

DrayTek Vigor 2820N

Draytek Quality of Service configuration

Draytek Quality of Service

DLink DES3010F

The DLink switch required a few tweaks.

Then set the dscp values for EF (46), CS3 (24) and AF31 (26) to go to the highest priority (Class-3) queue.

This tells the switch to apply the dscp priorities.

Your setup may need more configuration but this seems to work if you only want to prioritise voip calls above everything else.

Asterisk

The following settings were needed in asterisk (1.4)

/etc/asterisk/sip.conf

tos_sip=cs3                    ; Sets TOS for SIP packets.
tos_audio=ef                   ; Sets TOS for RTP audio packets.

/etc/asterisk/iax.conf

tos=ef
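
One way to check that these settings take effect is to capture only packets carrying a given DSCP value. This is a sketch which assumes tcpdump is installed and that eth0 is the interface to watch; the helper name dscp_capture_filter is hypothetical. It only builds the capture filter, which compares the top 6 bits of the ToS byte (byte 1 of the IP header).

```shell
#!/bin/sh
# Hypothetical helper: print a tcpdump/pcap filter expression matching
# IPv4 packets whose DSCP field equals the decimal value in $1.
dscp_capture_filter() {
    printf 'ip[1] & 0xfc == 0x%02x\n' $(( $1 << 2 ))
}

dscp_capture_filter 46   # EF  -> ip[1] & 0xfc == 0xb8
dscp_capture_filter 24   # CS3 -> ip[1] & 0xfc == 0x60

# Example use (as root), printing the tos field of each matching packet:
#   tcpdump -n -v -i eth0 "$(dscp_capture_filter 46)"
```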
 

Other Thoughts

With the current configuration I see that much incoming SIP and RTP traffic does not have the DSCP values correctly tagged. Perhaps my ISP is filtering this? The end result is that the switch will not correctly prioritise the incoming traffic and so I may suffer dropped packets. I’m not yet sure if there’s a way to fix this, as the place to do it would be in the ADSL router and I don’t see that the Draytek has any way to set the DSCP values based on incoming traffic properties like destination port.