The broken VPN

I was asked to help with a VPN that started misbehaving after a server upgrade. The customer has a site-to-site VPN based on OpenVPN; the server on one side had a disk crash and was reinstalled.

The server runs Windows 2008 R2 with the Routing and Remote Access service enabled, the same configuration as before.
The complaint was that the routers could ping each other over the VPN, but the clients couldn't: hosts at site B could ping the router at site A, but not the hosts at site A.

The basics had already been checked: packet forwarding enabled, firewall rules, VPN certificates, etc.

Topology

A1 — A0 — B0 — B1

A1 = Client of site A, Windows 2008 R2
A0 = Router of site A, Windows 2008 R2
B0 = Router of site B, Linux Ubuntu 12.04
B1 = Client of site B, Windows 8.1

Symptoms

Take one example: a ping from B1 to A1. The ICMP echo request made it all the way to A1, and A1 sent a reply.

The reply was seen on the following interfaces, in this order:

  1. A1 LAN interface;
  2. A0 LAN interface;
  3. A0 VPN interface;
  4. B0 VPN interface.

After appearing on B0's VPN interface, it disappeared.

Diagnosing

That sounds like a problem in router B, but router B hadn't changed and was still communicating with other sites. Yet every capture showed the packets disappearing at router B's VPN interface.

Nothing was logged, even in verbose mode. Weird.

After spending a lot of time reconfiguring the VPN from scratch, just to be sure everything was OK, I ended up in the same situation. So I decided to compare two packets byte by byte:

Packet 1: ICMP reply from router A0 to host B1
Packet 2: ICMP reply from host A1 to host B1
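Comparing two packets by eye is error-prone, so here is a minimal Python sketch of the byte-by-byte comparison, assuming you have exported each packet's raw bytes (for example as a hex string copied from the capture tool). The function name `diff_bytes` is my own, and the sample bytes are made-up ICMP headers for illustration, not the real captures.

```python
from __future__ import annotations


def diff_bytes(a: bytes, b: bytes) -> list[tuple[int, int | None, int | None]]:
    """Return (offset, byte_in_a, byte_in_b) for every position where a and b differ.

    A byte is reported as None when one packet is shorter than the other.
    """
    diffs = []
    for offset in range(max(len(a), len(b))):
        x = a[offset] if offset < len(a) else None
        y = b[offset] if offset < len(b) else None
        if x != y:
            diffs.append((offset, x, y))
    return diffs


if __name__ == "__main__":
    # Hypothetical ICMP echo replies: type, code, checksum, id, seq, payload.
    # The bytes are invented for illustration only.
    reply_from_router = bytes.fromhex("0000a1b2000100016162")
    reply_from_host = bytes.fromhex("00000000000100016162")
    for offset, x, y in diff_bytes(reply_from_router, reply_from_host):
        print(f"offset {offset}: {x:#04x} != {y:#04x}")
```

With these sample bytes the only differences are at offsets 2 and 3, which is where the ICMP checksum field sits.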

So I got a clue. The checksum of packet 2 is ALWAYS ZERO; it leaves host A1 that way. Even weirder.

I took another capture, of a ping from A1 to A0, which works. Both the request and the reply have zeros in the checksum field.

– Why are packets leaving with zeros in the checksum field? Maybe it is related to checksum offloading, but even then the packet should arrive at the next host with a correct checksum.
– Why is a packet with a wrong checksum being accepted by A0 and A1?
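To see why a zero in that field is simply wrong, recall that IPv4, ICMP, TCP and UDP all use the 16-bit ones'-complement Internet checksum from RFC 1071. A receiver that validates it recomputes the checksum over the whole message, checksum field included, and expects zero. The Python sketch below assumes an ICMP message with the IP header already stripped; the function names are my own and the packet bytes are made up.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def with_checksum(icmp: bytes) -> bytes:
    """Return a copy of an ICMP message with its checksum (bytes 2-3) filled in."""
    zeroed = icmp[:2] + b"\x00\x00" + icmp[4:]  # checksum is computed with the field at zero
    csum = internet_checksum(zeroed)
    return zeroed[:2] + csum.to_bytes(2, "big") + zeroed[4:]


def checksum_ok(icmp: bytes) -> bool:
    """A message verifies when the checksum over all of it, field included, is zero."""
    return internet_checksum(icmp) == 0


if __name__ == "__main__":
    # Hypothetical ICMP echo reply (id 1, seq 1, payload "abcd") whose
    # checksum field was left at zero, like the replies captured from A1.
    reply = bytes.fromhex("000000000001000161626364")
    print(checksum_ok(reply))                 # False
    print(checksum_ok(with_checksum(reply)))  # True
```

For this sample the correct checksum is 0x3B37, so the all-zero field fails verification on any host that actually checks it.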

Next capture: a ping from A0 to B0, which works, and has a correct checksum!

After all those captures I started to question whether computing belongs to the exact sciences, but nowadays working in IT means digging deep through several layers. A0 and A1 were able to talk to each other with zero checksums because they are Xen VMs.
Xen only computes the checksum when the packet leaves the Xen host. Since this packet was being forwarded through OpenVPN, it never got its checksum calculated, and it reached B0 still carrying zeros in that field!
After some research, I found an option in the virtual NIC's properties:

Correct TCP/UDP checksum value