Asymmetric routing

I’ve been working on an OpenVPN deployment that runs with asymmetric routing, and I think the current solution is sub-optimal. First, let me show how it is deployed and what is bothering me.

Big picture

So, let’s check the path of a packet going from client A to client B.

Order IP Description
1 10.1.0.92 client site A
2 10.1.0.254 default gw site A
3 10.1.0.45 openvpn gw site A
4 172.16.1.1 wan openvpn site A
5 172.16.1.58 wan openvpn site B
6 10.1.8.254 default gateway site B
7 10.1.8.2 client site B

But when the packet comes from client B to client A the path is a bit different.

Order IP Description
1 10.1.8.2 client site B
2 10.1.8.254 default gateway site B
3 172.16.1.2 wan openvpn site B
4 172.16.1.58 wan openvpn site A
5 10.1.0.45 openvpn gw site A
6 10.1.0.92 client site A

Note that on the way back the packet does not go through 10.1.0.254 (default gw site A). So the path is asymmetric.

The problem

As the packets do not go through 10.1.0.254 (default gw site A) in both directions, the firewall is unable to track the connection state and it starts to reject packets from that connection. Let’s take one example of client A trying to connect to client B.

client A -> client B – SYN packet
(this packet is forwarded by firewall in default gw site A)

client B -> client A – SYN/ACK packet
(this packet is delivered to client A without going through default gw site A)

client A -> client B – ACK packet
(this packet is rejected by default gw site A)
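
The handshake above can be sketched with a toy model of a stateful firewall. This is only an illustration under simplifying assumptions: the class and function names are hypothetical, and a real connection tracker (such as the one behind iptables) keeps far more state than this.

```python
from dataclasses import dataclass, field


@dataclass
class StatefulFirewall:
    """Toy connection tracker (hypothetical): it only follows the TCP handshake."""
    # Maps (client, server) to the last handshake step the firewall has seen.
    states: dict = field(default_factory=dict)

    def forward(self, src: str, dst: str, flags: str) -> bool:
        """Return True if the packet is forwarded, False if it is rejected."""
        if flags == "SYN":
            self.states[(src, dst)] = "SYN_SEEN"
            return True
        if flags == "SYN/ACK" and self.states.get((dst, src)) == "SYN_SEEN":
            self.states[(dst, src)] = "SYNACK_SEEN"
            return True
        if flags == "ACK" and self.states.get((src, dst)) == "SYNACK_SEEN":
            self.states[(src, dst)] = "ESTABLISHED"
            return True
        return False


def handshake(reply_via_firewall: bool) -> list:
    """Replay the three-way handshake and record what the firewall decides."""
    fw = StatefulFirewall()
    client_a, client_b = "10.1.0.92", "10.1.8.2"
    results = [fw.forward(client_a, client_b, "SYN")]
    if reply_via_firewall:
        # Symmetric path: the SYN/ACK also crosses the firewall.
        results.append(fw.forward(client_b, client_a, "SYN/ACK"))
    # Asymmetric path: the SYN/ACK bypassed the firewall, so the ACK looks invalid.
    results.append(fw.forward(client_a, client_b, "ACK"))
    return results


print(handshake(reply_via_firewall=True))   # [True, True, True]
print(handshake(reply_via_firewall=False))  # [True, False]
```

In the asymmetric case the firewall never sees the SYN/ACK, so from its point of view the final ACK belongs to a connection that was never completed.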

Alternative #1 – ICMP redirect

As soon as 10.1.0.254 (default gw site A) detects that the next hop is in the same network as the sender, 10.1.0.92 (client site A), it should send an ICMP redirect instructing the client to send packets directly to 10.1.0.45 (openvpn gw site A). In this way you get the optimal path in both directions.
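
The condition that triggers the redirect can be sketched as a simple subnet check. This is a simplified illustration, not the kernel’s actual logic: the function name is hypothetical and the /24 prefix length for site A is an assumption, since the post only shows host addresses.

```python
import ipaddress


def should_send_redirect(sender: str, next_hop: str, lan: str) -> bool:
    """Return True when sender and next hop share the LAN the packet arrived on."""
    network = ipaddress.ip_network(lan)
    return (ipaddress.ip_address(sender) in network
            and ipaddress.ip_address(next_hop) in network)


LAN_A = "10.1.0.0/24"  # assumed prefix length for site A

# Client A reaching site B via the OpenVPN gateway: both are on the LAN.
print(should_send_redirect("10.1.0.92", "10.1.0.45", LAN_A))   # True
# Next hop outside the LAN: no redirect is appropriate.
print(should_send_redirect("10.1.0.92", "172.16.1.1", LAN_A))  # False
```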

If your router is a Linux box, remember to enable it to send ICMP redirects by changing your sysctl.conf:

net.ipv4.conf.all.send_redirects = 1

Unfortunately, routes created by ICMP redirects are short-lived on Windows clients. When such a route expires, a packet is sent to 10.1.0.254 in the middle of a TCP connection which the firewall never tracked, so it rejects the packet and the connection is dropped.

Alternative #2 – ICMP redirect + Stateless firewall

To fix alternative #1 you just need to make the firewall rules stateless (i.e. don’t use the -m state --state NEW match in iptables). That way, when the route learned from the ICMP redirect expires, the host sends the packet to its default gateway and receives another ICMP redirect instead of an ICMP port unreachable.

Alternative #3 – Use DHCP to push routes

If all your clients are configured through DHCP and both clients and servers support RFC3442, this is a very good alternative. It makes clients always use the best route through the right gateway.

You can check how to configure it on the ISC DHCP server here.
You can check how to configure it on the Microsoft DHCP server here.
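
To make the option less abstract, here is a minimal sketch of how RFC3442 encodes a classless static route: one byte with the prefix length, then only the significant octets of the destination, then the four octets of the router. The function name is hypothetical, and the 10.1.8.0/24 destination is an assumption based on the site B addresses shown earlier.

```python
import ipaddress


def encode_classless_routes(routes) -> bytes:
    """Encode (destination, router) pairs as an RFC3442 option payload."""
    payload = bytearray()
    for destination, router in routes:
        network = ipaddress.ip_network(destination)
        # Only the octets covered by the prefix are transmitted.
        significant_octets = (network.prefixlen + 7) // 8
        payload.append(network.prefixlen)
        payload += network.network_address.packed[:significant_octets]
        payload += ipaddress.ip_address(router).packed
    return bytes(payload)


# Assumed example: reach site B (10.1.8.0/24) through the OpenVPN gateway.
routes = [("10.1.8.0/24", "10.1.0.45")]
payload = encode_classless_routes(routes)
print(":".join(f"{byte:02x}" for byte in payload))  # 18:0a:01:08:0a:01:00:2d
```

Some DHCP servers expect this payload as a raw hex string, which is why seeing the byte layout can help when a route does not show up on the client.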

If you have a mixed network with DHCP clients and static clients, this alternative can be a problem since you won’t get all the routes delivered to all hosts. I also had problems with some SIP phones that don’t support RFC3442.

Remember that you can still combine this alternative with alternatives #1 and #2.

Alternative #4 – Source routing

If you would rather have the packets go through both gateways in both directions, so that routing is symmetric, you can use this alternative.
Check out the LARTC how-to for more information.

Alternative #5 – Transfer network

This last alternative involves using a new network to transfer packets between the two gateways.
To achieve this you should create a new VLAN to connect the hosts:

  • 10.1.0.254 (default gw site A)
  • 10.1.0.45 (openvpn gw site A)

If you would like to use just one NIC in 10.1.0.254 (default gw site A) you can make use of IEEE 802.1q, but your switch must support it.
If you have a spare NIC in 10.1.0.254 (default gw site A) you can just configure the new network on that NIC.

After you get the transfer network configured on 10.1.0.254 (default gw site A), you just need to route the packets addressed to the VPN through this transfer network. Do the same on your VPN box for the packets addressed to your LAN.
Since the two gateways now talk over a separate network, no ICMP redirect will occur.

References

#1 icmp-redirects-are-bad
#2 rfc3442

Exchange 2013 & iPhone Sync

Today I had an iPhone that was unable to sync with Exchange 2013. Curiously, only the inbox was not syncing. All other folders were working fine.

Checking the event logs, I found one error related to the sync problem.

An exception occurred and was handled by Exchange ActiveSync. This may have been caused by an outdated or corrupted Exchange ActiveSync device partnership. This can occur if a user tries to modify the same item from multiple computers. If this is the case, Exchange ActiveSync will re-create the partnership with the device. Items will be updated at the next synchronization.
URL=
--- Exception start ---
Exception type: Microsoft.Exchange.Data.Storage.SyncStateExistedException
Exception message: The sync state named 2 already exists.
Exception level: 0
Exception stack trace: at Microsoft.Exchange.Data.Storage.SyncState.CreateSyncStateFolderInFolder(SyncStateInfo syncStateInfo, Folder syncStateParentFolder, PropertyDefinition[] properties, Object[] values, ISyncLogger syncLogger)
at Microsoft.Exchange.Data.Storage.SyncState.CreateSyncStateStoreObject(SyncStateStorage syncStateStorage, SyncStateInfo syncStateInfo, Folder syncStateParentFolder, PropertyDefinition[] properties, Object[] values, ISyncLogger syncLogger)
at Microsoft.Exchange.Data.Storage.FolderSyncState.CreateSyncState(SyncStateStorage syncStateStorage, Folder syncStateParentFolder, ISyncProviderFactory syncProviderFactory, String syncFolderId, ISyncLogger syncLogger)
at Microsoft.Exchange.Data.Storage.SyncStateStorage.CreateFolderSyncState(ISyncProviderFactory syncProviderFactory, String syncFolderId)
at Microsoft.Exchange.AirSync.SyncCollection.OpenSyncState(Boolean autoLoadFilterAndSyncKey, SyncStateStorage syncStateStorage)
at Microsoft.Exchange.AirSync.SyncCommand.SyncTheCollection(SyncCollection collection, Boolean createSubscription, Boolean tryNullSync)
at Microsoft.Exchange.AirSync.SyncCommand.OnExecute()
at Microsoft.Exchange.AirSync.SyncCommand.ExecuteCommand()
at Microsoft.Exchange.AirSync.Command.WorkerThread()
Inner exception follows below:
Exception type: Microsoft.Exchange.Data.Storage.ObjectExistedException
Exception message: Could not create folder 2.
Exception level: 1
Exception stack trace: at Microsoft.Exchange.Data.Storage.FolderCreatePropertyBag.CreateMapiFolder()
at Microsoft.Exchange.Data.Storage.FolderCreatePropertyBag.SaveFolderPropertyBag(Boolean needVersionCheck)
at Microsoft.Exchange.Data.Storage.CoreFolder.Save(SaveMode saveMode)
at Microsoft.Exchange.Data.Storage.Folder.Save(SaveMode saveMode)
at Microsoft.Exchange.Data.Storage.SyncState.CreateSyncStateFolderInFolder(SyncStateInfo syncStateInfo, Folder syncStateParentFolder, PropertyDefinition[] properties, Object[] values, ISyncLogger syncLogger)
Inner exception follows below:
Exception type: Microsoft.Mapi.MapiExceptionCollision
Exception message: MapiExceptionCollision: Unable to create folder. (hr=0x80040604, ec=-2147219964)
Diagnostic context:
Lid: 55847 EMSMDBPOOL.EcPoolSessionDoRpc called [length=116]
Lid: 43559 EMSMDBPOOL.EcPoolSessionDoRpc returned [ec=0x0][length=220][latency=0]
Lid: 52176 ClientVersion: 15.0.1076.9
Lid: 50032 ServerVersion: 15.0.1076.6009
Lid: 23226 --- ROP Parse Start ---
Lid: 27962 ROP: ropCreateFolder [28]
Lid: 17082 ROP Error: 0x80040604
Lid: 25953
Lid: 21921 StoreEc: 0x80040604
Lid: 27962 ROP: ropExtendedError [250]
Lid: 1494 ---- Remote Context Beg ----
Lid: 41352 qdwParam: 0x8D29C49F7F66A52
Lid: 55033 dwParam: 0x0
Lid: 10786 dwParam: 0x0 Msg: 15.00.1076.000:LITIO
Lid: 1750 ---- Remote Context End ----
Lid: 31418 --- ROP Parse Done ---
Lid: 22417
Lid: 30609 StoreEc: 0x80040604
Lid: 29073
Lid: 20369 StoreEc: 0x80040604
Lid: 63664
Lid: 40448 StoreEc: 0x80040604
Exception level: 2
Exception stack trace: at Microsoft.Mapi.MapiExceptionHelper.InternalThrowIfErrorOrWarning(String message, Int32 hresult, Boolean allowWarnings, Int32 ec, DiagnosticContext diagCtx, Exception innerException)
at Microsoft.Mapi.MapiExceptionHelper.ThrowIfError(String message, Int32 hresult, IExInterface iUnknown, Exception innerException)
at Microsoft.Mapi.MapiFolder.InternalCreateFolder(String folderName, String folderComment, Boolean openIfExists, Boolean createSearchFolder, Boolean instantSearch, Boolean optimizedConversationSearch, Boolean createPublicFolderDumpster, Byte[] folderId, Boolean createInternalAccess)
at Microsoft.Exchange.Data.Storage.FolderCreatePropertyBag.CreateMapiFolder()
--- Exception end ---.

The basics were already covered by the help desk team.

First, I tried to check and remove the mobile partnership linked to the mailbox.

Get-MobileDevice -Mailbox johndoe
RunspaceId : c42a3abe-e15a-4999-b0d0-8a8aa90a4299
FriendlyName : iPhone 5s
DeviceId : ApplDNPM4HW2FFGD
DeviceImei :
DeviceMobileOperator :
DeviceOS : iOS 8.3 12F70
DeviceOSLanguage : pt
DeviceTelephoneNumber :
DeviceType : iPhone
DeviceUserAgent : Apple-iPhone6C2/1206.70
DeviceModel : iPhone6C2
FirstSyncTime : 26/05/2014 20:13:19
UserDisplayName : mydomain.local/Usuários/John Doe
DeviceAccessState : Allowed
DeviceAccessStateReason : Global
DeviceAccessControlRule :
ClientVersion : 14.1
ClientType : EAS
IsManaged : False
IsCompliant : False
IsDisabled : False
AdminDisplayName :
ExchangeVersion : 0.10 (14.0.100.0)
Name : iPhone§ApplDNPM4HW2FFGD
DistinguishedName : CN=iPhone§ApplDNPM4HW2FFGD,CN=ExchangeActiveSyncDevices,CN=John Doe,OU=Usuários,DC=mydomain,DC=local
Identity : mydomain.local/Usuários/John Doe/ExchangeActiveSyncDevices/iPhone§ApplDNPM4HW2FFGD
Guid : daec6592-9203-4031-b825-628466e311e1
ObjectCategory : myforest.local/Configuration/Schema/ms-Exch-Active-Sync-Device
ObjectClass : {top, msExchActiveSyncDevice}
WhenChanged : 04/06/2015 11:41:00
WhenCreated : 26/05/2014 17:13:20
WhenChangedUTC : 04/06/2015 14:41:00
WhenCreatedUTC : 26/05/2014 20:13:20
OrganizationId :
Id : mydomain.local/Usuários/John Doe /ExchangeActiveSyncDevices/iPhone§ApplDNPM4HW2FFGD
OriginatingServer : TITANIO
IsValid : True
ObjectState : Unchanged
Remove-MobileDevice "mydomain.local/Usuários/John Doe/ExchangeActiveSyncDevices/iPhone§ApplDNPM4HW2FFGD"

The command executed without errors. So far things behaved as expected, but when I listed the mobile partnerships again, the same partnership was still there, as if it had never been deleted.

After walking in circles I tried a long shot. It seemed that something was corrupted, so let’s check the user mailbox.

New-MailboxRepairRequest -Mailbox johndoe -CorruptionType ProvisionedFolder,SearchFolder,AggregateCounts,Folderview
Get-MailboxRepairRequest -mailbox johndoe | fl
RunspaceId : c42a3abe-e15a-4999-b0d0-8a8aa90a4299
Identity : a0598b5b-82db-45a7-8309-4ba865de95e2\6213348d-47c7-4ff5-8a44-9f7151039311\ef8f1654-bfeb-4786-a03f-2b0b238fdf72
Mailbox : d59341d9-d810-4291-b70e-b6c2ede007c6
Source : OnDemand
Priority : Normal
DetectOnly : False
JobState : Succeeded
Progress : 100
Tasks : {AggregateCounts}
CreationTime : 03/08/2015 18:34:01
FinishTime : 03/08/2015 18:35:05
LastExecutionTime : 03/08/2015 18:35:05
CorruptionsDetected : 4
ErrorCode :
CorruptionsFixed : 4
TimeInServer : 00:00:43.6290000
Corruptions : {Type:AggregateCountMismatch (Fixed:True), Type:AggregateCountMismatch (Fixed:True), Type:AggregateCountMismatch (Fixed:True), Type:AggregateCountMismatch (Fixed:True)}
IsValid : True
ObjectState : New

Bingo. After the mailbox repair, removing the mobile partnership worked fine. I reconfigured the device and everything is working again.

Exchange 2013 corrupted mailbox database

People don’t give backups the importance they deserve. Today I was woken up by a call asking for help with an Exchange Server.

Digging into the event logs I found the problem: a corrupted mailbox database.

At '03/08/2015 08:22:08', the copy of database 'Mailbox Database 0640250043' on this server encountered an error during the mount operation. For more information, consult the Event log on the server for "ExchangeStoreDb" or "MSExchangeRepl" events. The mount operation will be tried again automatically.
Active Manager failed to mount database Mailbox Database 0640250043 on server LITIO. Error: An Active Manager operation failed. Error: The database action failed. Error: Operation failed with message: MapiExceptionDatabaseError: Unable to mount database. (hr=0x80004005, ec=1108)
Diagnostic context:
Lid: 65256
Lid: 10722 StoreEc: 0x454
Lid: 1494 ---- Remote Context Beg ----
Lid: 45120 dwParam: 0xEF3D1
Lid: 57728 dwParam: 0xEF4DB
Lid: 46144 dwParam: 0xEF577
Lid: 34880 dwParam: 0xEF577
Lid: 34760 StoreEc: 0xFFFFFDE3
Lid: 41344 Guid: a0598b5b-82db-45a7-8309-4ba865de95e2
Lid: 35200 dwParam: 0x340C
Lid: 46144 dwParam: 0xEF79A
Lid: 34880 dwParam: 0xEF79A
Lid: 54472 StoreEc: 0x1388
Lid: 42184 StoreEc: 0x454
Lid: 1750 ---- Remote Context End ----
Lid: 1047 StoreEc: 0x454
Information Store - Mailbox Database 0640250043 (13252) Mailbox Database 0640250043: Unable to read the header of logfile C:\Program Files\Microsoft\Exchange Server\V15\Mailbox\Mailbox Database 0640250043\E00.log. Error -541.
For more information, click http://www.microsoft.com/contentredirect.asp.
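
When reading these logs it helps to notice that the hexadecimal StoreEc values and the decimal error numbers appear to be the same codes written differently. A minimal sketch of the conversion, treating each value as a signed 32-bit integer (the helper name is hypothetical):

```python
def to_signed32(value: int) -> int:
    """Interpret a 32-bit value as a signed (two's complement) integer."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


# StoreEc: 0x454 corresponds to ec=1108 in the Active Manager message.
print(to_signed32(0x454))        # 1108
# StoreEc: 0xFFFFFDE3 corresponds to "Error -541" in the logfile header message.
print(to_signed32(0xFFFFFDE3))   # -541
```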

When I asked for backups I got the worst answer possible: they were from 3 weeks ago. Unacceptable.

Let’s try a recovery of the current database.

[PS] E:\MailboxDatabase>eseutil.exe /p ".\Mailbox Database 0640250043.edb" /g

Some errors were fixed, but the database was still unmountable because of the error in the log file E00.log. So it’s time to accept risks. Let’s get rid of the current logs.

[PS] C:\>
[PS] C:\>cd "C:\Program Files\Microsoft\Exchange Server\V15\Mailbox\Mailbox Database 0640250043"
[PS] C:\Program Files\Microsoft\Exchange Server\V15\Mailbox\Mailbox Database 0640250043>mkdir bkp
Directory: C:\Program Files\Microsoft\Exchange Server\V15\Mailbox\Mailbox Database 0640250043
Mode LastWriteTime Length Name
---- ------------- ------ ----
d---- 03/08/2015 13:33 bkp
[PS] C:\Program Files\Microsoft\Exchange Server\V15\Mailbox\Mailbox Database 0640250043>move * bkp

Checking the database header, it seems OK to mount.

[PS] E:\MailboxDatabase>eseutil.exe /mh ".\Mailbox Database 0640250043.edb"

After this command new clean logs were generated for the database.

[PS] C:\>Mount-Database "Mailbox Database 0640250043"

It worked.

The broken VPN

I was asked for help with a VPN that was misbehaving after a server upgrade. The customer has a site-to-site VPN based on OpenVPN, and the server on one side had a disk crash and was reinstalled.

The server is running Windows 2008 R2 with the Routing and Remote Access service enabled, the same configuration as the previous one.
The complaint was that the routers could ping each other over the VPN but clients couldn’t. Hosts of site B could ping the router of site A but couldn’t ping hosts of site A.

The basics were already checked: packet forwarding enabled, firewall, VPN certificates, etc.

Topology

A1 — A0 — B0 — B1

A1 = Client of site A, Windows 2008 R2
A0 = Router of site A, Windows 2008 R2
B0 = Router of site B, Linux Ubuntu 12.04
B1 = Client of site B, Windows 8.1

Symptoms

Let’s take one example: a ping from B1 to A1. The ICMP packet made it all the way to A1 and then A1 sent the reply.

The reply has been seen on the following interfaces (in this order):

  1. A1 LAN interface;
  2. A0 LAN interface;
  3. A0 VPN interface;
  4. B0 VPN interface.

After being seen on the B0 VPN interface, it disappeared.

Diagnosing

It sounds like a problem in router B, but router B hadn’t changed and was communicating with other sites just fine. Still, every capture showed packets disappearing on router B at the VPN interface.

Nothing was logged, even in verbose mode. Weird.

After spending a lot of time reconfiguring the VPN from scratch, just to be sure that everything was OK, I ended up in the same scenario. So I decided to compare two packets byte by byte.

Packet 1: ICMP reply from router A0 to host B1
Packet 2: ICMP reply from host A1 to host B1

So I got a clue. The CRC of packet 2 is ALWAYS ZERO; it leaves host A1 that way. Even weirder.

I took another capture of a ping from A1 to A0, which works. Both request and reply have ZEROs in the CRC.

– Why are packets leaving with ZEROs in the CRC field? Maybe it is something related to IP checksum offloading, but then the packet should arrive at the next host with the correct CRC.
– Why is a packet with a wrong CRC being accepted by A0 and A1?

Next capture: a ping from A0 to B0, which works, and it has a correct CRC!

After all those captures I started to question whether computing belongs to the exact sciences, but nowadays working in IT means digging deep into several layers. A0 and A1 were able to talk with ZERO CRCs because they are Xen VMs.
Xen only calculates the CRC when the packet leaves the Xen host. As the packet is forwarded through OpenVPN, its CRC never gets calculated!
After some research, I found an option in the virtual NIC.

Correct TCP/UDP checksum value
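
To see why a receiver outside the Xen host drops such packets, here is a minimal sketch of the Internet checksum (RFC 1071) applied to an ICMP echo reply. The helper names, identifier, sequence number and payload are made up for the illustration; it only shows that a packet whose checksum field was left as zero does not verify.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's complement of the one's complement 16-bit sum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def build_echo_reply(ident: int, seq: int, payload: bytes, fill_checksum: bool = True) -> bytes:
    """Build an ICMP echo reply (type 0, code 0), optionally leaving the checksum as zero."""
    unchecked = struct.pack("!BBHHH", 0, 0, 0, ident, seq) + payload
    if not fill_checksum:
        return unchecked  # what a sender relying on offloading hands over
    checksum = internet_checksum(unchecked)
    return struct.pack("!BBHHH", 0, 0, checksum, ident, seq) + payload


def is_valid(packet: bytes) -> bool:
    """A packet verifies when the checksum over the whole message is zero."""
    return internet_checksum(packet) == 0


good = build_echo_reply(0x1234, 1, b"abcdefgh")
offloaded = build_echo_reply(0x1234, 1, b"abcdefgh", fill_checksum=False)
print(is_valid(good), is_valid(offloaded))  # True False
```

Hosts on the same Xen server apparently skip this verification, while anything that actually checks the field, like the far end of the tunnel here, discards the packet silently.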

Dealing with Avaya 9620 phones with SIP firmware

We started to update all Avaya phones to the SIP firmware. After the update we were able to make calls, but some annoying issues were happening.

Failures making calls

All 9620 phones were failing to make calls, no matter whether they were internal or external, but were receiving calls properly. After dialing the number you got a busy signal. Oddly, it worked on every second call. Nothing was logged in Asterisk.

After capturing packets from one extension to Asterisk the problem was clear. For some reason the device was not sending the authentication token in the INVITE packet.

We were unable to find out why the Avaya firmware behaves this way, but we solved it by changing the SIP configuration to accept insecure INVITEs.

We just added the following parameter to sip.conf:

insecure=invite

Be careful when adding this option to your Asterisk, since it is insecure.

Wrong clock

All phones were showing random wrong times on the display.
To fix it we set up a few settings to get configuration provisioning working.
First I just set up the NTP server on the DHCP server.

DHCP time server

After this, all devices were showing the same wrong time. I tried to adjust option 002 Time Offset, but with no luck. So I read the 9620 phone manual and found that it has some extra options for provisioning.

First, it has a custom DHCP option called Option 242 SSON. In this option you can point to an HTTP server that hosts a config file to be downloaded during the phone boot.

Configuring the custom DHCP Option 242

DHCP predefined options
DHCP create option 242

Then configure it in the scope.

DHCP configure option 242

Configuring the settings file

The documentation says that the phone will download a file called 46xxsettings.txt with the settings, and that is right, but you also need a file called 96xxupgrade.txt in the same directory, which is not described anywhere. I found it by capturing packets while the phone was booting. This file comes with the SIP firmware.

Inside this file (96xxupgrade.txt) the phone looks for the settings file:

############################################################
## Get additional configuration files ##
############################################################
# GETSET
GET 46xxsettings.txt

The settings to adjust the clock in 96xxupgrade.txt:

SET SNTPSRVR 10.1.0.46
SET GMTOFFSET "-03:00"
SET DSTOFFSET "1"
SET DSTSTART "3SunOct2L"
SET DSTSTOP "3SunFeb2L"
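
To sanity-check the DSTSTART and DSTSTOP values, the sketch below turns such a rule into a concrete date. The reading of the format is my assumption and is not taken from the phone manual: an ordinal (1 to 5, or L for last), a three-letter weekday, a three-letter month, the hour, and L or U for local or UTC time. The function name is hypothetical.

```python
import calendar
import datetime
import re

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def parse_dst_rule(rule: str, year: int) -> datetime.datetime:
    """Resolve a rule such as "3SunOct2L" for a given year (assumed format)."""
    match = re.fullmatch(r"([1-5L])([A-Z][a-z]{2})([A-Z][a-z]{2})(\d{1,2})([LU])", rule)
    if match is None:
        raise ValueError(f"unrecognized rule: {rule!r}")
    ordinal, day, month, hour, _zone = match.groups()
    weekday = DAYS.index(day)            # Monday is 0, as in date.weekday()
    month_number = MONTHS.index(month) + 1
    days_in_month = calendar.monthrange(year, month_number)[1]
    # Every day of the month that falls on the requested weekday.
    candidates = [d for d in range(1, days_in_month + 1)
                  if datetime.date(year, month_number, d).weekday() == weekday]
    # Note: an ordinal of 5 raises IndexError in months with only four matches.
    chosen = candidates[-1] if ordinal == "L" else candidates[int(ordinal) - 1]
    return datetime.datetime(year, month_number, chosen, int(hour))


print(parse_dst_rule("3SunOct2L", 2015))  # 2015-10-18 02:00:00
print(parse_dst_rule("3SunFeb2L", 2016))  # 2016-02-21 02:00:00
```

Under that reading, the two values describe daylight saving time starting on the third Sunday of October and ending on the third Sunday of February.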