Exchange 2013 corrupted mailbox database

People don’t give backups the importance they deserve. Today I was woken up by a call asking for help with an Exchange Server.

Digging into the Event logs I found the problem: a corrupted mailbox database.

At '03/08/2015 08:22:08', the copy of database 'Mailbox Database 0640250043' on this server encountered an error during the mount operation. For more information, consult the Event log on the server for "ExchangeStoreDb" or "MSExchangeRepl" events. The mount operation will be tried again automatically.
Active Manager failed to mount database Mailbox Database 0640250043 on server LITIO. Error: An Active Manager operation failed. Error: The database action failed. Error: Operation failed with message: MapiExceptionDatabaseError: Unable to mount database. (hr=0x80004005, ec=1108)
Diagnostic context:
Lid: 65256
Lid: 10722 StoreEc: 0x454
Lid: 1494 ---- Remote Context Beg ----
Lid: 45120 dwParam: 0xEF3D1
Lid: 57728 dwParam: 0xEF4DB
Lid: 46144 dwParam: 0xEF577
Lid: 34880 dwParam: 0xEF577
Lid: 34760 StoreEc: 0xFFFFFDE3
Lid: 41344 Guid: a0598b5b-82db-45a7-8309-4ba865de95e2
Lid: 35200 dwParam: 0x340C
Lid: 46144 dwParam: 0xEF79A
Lid: 34880 dwParam: 0xEF79A
Lid: 54472 StoreEc: 0x1388
Lid: 42184 StoreEc: 0x454
Lid: 1750 ---- Remote Context End ----
Lid: 1047 StoreEc: 0x454
Information Store - Mailbox Database 0640250043 (13252) Mailbox Database 0640250043: Unable to read the header of logfile C:\Program Files\Microsoft\Exchange Server\V15\Mailbox\Mailbox Database 0640250043\E00.log. Error -541.
For more information, click http://www.microsoft.com/contentredirect.asp.
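
The hex codes in the diagnostic context line up with the decimal codes in the other messages once they are read as signed 32-bit integers. A minimal Python sketch of that arithmetic (the helper name `to_signed32` is mine, not part of any Exchange tooling):

```python
def to_signed32(value: int) -> int:
    """Interpret an unsigned 32-bit value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


# StoreEc values copied from the diagnostic context above
print(to_signed32(0x454))        # 1108, the ec=1108 in the mount failure
print(to_signed32(0xFFFFFDE3))   # -541, the error reported for E00.log
```

So the mount failure and the unreadable log header are the same problem seen from two places.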

When I asked for backups I got the worst possible answer: they were three weeks old. Unacceptable.

Let’s try a recovery of the current database.

[PS] E:\MailboxDatabase>eseutil.exe /p ".\Mailbox Database 0640250043.edb" /g

Some errors were fixed, but the database was still unmountable because of the error in the log file E00.log. So it’s time to accept some risk. Let’s get rid of the current logs.

[PS] C:\>
[PS] C:\>cd "C:\Program Files\Microsoft\Exchange Server\V15\Mailbox\Mailbox Database 0640250043"
[PS] C:\Program Files\Microsoft\Exchange Server\V15\Mailbox\Mailbox Database 0640250043>mkdir bkp
Directory: C:\Program Files\Microsoft\Exchange Server\V15\Mailbox\Mailbox Database 0640250043
Mode LastWriteTime Length Name
---- ------------- ------ ----
d---- 03/08/2015 13:33 bkp
[PS] C:\Program Files\Microsoft\Exchange Server\V15\Mailbox\Mailbox Database 0640250043>move * bkp
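
The same "move everything aside" step can be sketched in Python. This is only an illustration of the idea, not what was run on the server; the function name `move_aside` is hypothetical, and it assumes the database is dismounted so nothing holds the files open. Unlike a plain wildcard move, it skips the backup folder itself.

```python
import shutil
from pathlib import Path


def move_aside(folder: Path, backup_name: str = "bkp") -> list:
    """Move every entry in `folder` into folder/backup_name, skipping the backup dir."""
    backup = folder / backup_name
    backup.mkdir(exist_ok=True)
    moved = []
    # sorted() materialises the listing first, so moving files does not disturb the loop
    for entry in sorted(folder.iterdir()):
        if entry == backup:
            continue
        shutil.move(str(entry), str(backup / entry.name))
        moved.append(entry.name)
    return moved
```

Keeping the old logs in `bkp` instead of deleting them leaves a way back if the mount still fails.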

Checking the database header, it looks OK to bring back up.

[PS] E:\MailboxDatabase>eseutil.exe /mh ".\Mailbox Database 0640250043.edb"
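
What matters in the header dump is the shutdown state. A small sketch, assuming the dump has been saved to text and contains a `State:` line such as `State: Clean Shutdown` (the sample string below is illustrative, not captured from this server, and `database_state` is my own helper name):

```python
import re


def database_state(header_dump: str):
    """Return the value of the 'State:' line from an `eseutil /mh` dump, or None."""
    match = re.search(r"^\s*State:\s*(.+?)\s*$", header_dump, flags=re.MULTILINE)
    return match.group(1) if match else None


sample = "            State: Clean Shutdown\n"  # illustrative line only
print(database_state(sample))
```

A database in a clean shutdown state does not need its old logs to mount, which is why removing them was an acceptable risk here.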

Mounting the database then generated a new, clean set of logs.

[PS] C:\>Mount-Database "Mailbox Database 0640250043"

It worked.

The broken VPN

I was asked for help with a VPN that was misbehaving after a server upgrade. The customer has a site-to-site VPN based on OpenVPN, and the server on one side had a disk crash and was reinstalled.

The server is running Windows 2008 R2 with the Routing and Remote Access service enabled, the same configuration as the previous one.
The complaint was that the routers could ping each other over the VPN but clients couldn’t. Hosts at site B could ping the router of site A but couldn’t ping hosts at site A.

The basics were already checked: packet forwarding enabled, firewall, VPN certificates, etc.

Topology

A1 — A0 — B0 — B1

A1 = Client of site A, Windows 2008 R2
A0 = Router of site A, Windows 2008 R2
B0 = Router of site B, Linux Ubuntu 12.04
B1 = Client of site B, Windows 8.1

Symptoms

Let’s take one example: a ping from B1 to A1. The ICMP request made it all the way to A1, and A1 sent the reply.

The reply was seen on the following interfaces (in this order):

  1. A1 LAN interface;
  2. A0 LAN interface;
  3. A0 VPN interface;
  4. B0 VPN interface.

After it was seen on the B0 VPN interface, it disappeared.

Diagnosing

It sounds like a problem in router B, but router B hadn’t changed and was communicating with other sites just fine. Still, every capture showed packets disappearing on router B at the VPN interface.

Nothing logged even in verbose mode. Weird.

After spending a lot of time reconfiguring the VPN from scratch, just to be sure that everything was OK, I ended up in the same scenario. So I decided to compare two packets byte by byte.

Packet 1: ICMP reply from router A0 to host B1
Packet 2: ICMP reply from host A1 to host B1
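
A byte-by-byte comparison like this is easy to script once both packets are exported as raw bytes. A minimal sketch (the function name is mine, and the two byte strings in the usage line are made-up stand-ins, not the captured packets):

```python
def diff_bytes(a: bytes, b: bytes):
    """Return (offset, byte_in_a, byte_in_b) for every position where the packets differ."""
    diffs = [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    # Report trailing bytes when lengths differ; None marks the missing side
    longer = a if len(a) > len(b) else b
    for i in range(min(len(a), len(b)), len(longer)):
        diffs.append((i, a[i] if i < len(a) else None, b[i] if i < len(b) else None))
    return diffs


for offset, left, right in diff_bytes(bytes.fromhex("00006e68"), bytes.fromhex("00000000")):
    print(offset, left, right)
```

Addresses and identifiers will always differ between two packets, so the interesting offsets are the ones that should not.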

That gave me a clue. The CRC (the checksum field) of packet 2 is ALWAYS ZERO; it leaves host A1 that way. Even weirder.

I took another capture of a ping from A1 to A0, which works. Both request and reply have ZEROs in the CRC.

– Why are packets leaving with ZEROs in the CRC field? Maybe it is something related to IP checksum offloading, but then they should arrive at the next host with the correct CRC.
– Why is a packet with a wrong CRC being accepted by A0 and A1?

Next capture: a ping from A0 to B0, which works, and has a correct CRC!

After all those captures I started to question whether computing belongs to the exact sciences, but nowadays working in IT means digging deep into several layers. A0 and A1 were able to talk with ZERO CRCs because they are Xen VMs.
Xen only computes the CRC when the packet leaves the Xen host. Since the packet is forwarded through OpenVPN, it never gets its CRC calculated!
After some research, I found an option in the virtual NIC.

Correct TCP/UDP checksum value
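
To see why a receiver outside the Xen host drops such a packet, here is a minimal sketch of the Internet checksum (RFC 1071) used by IP, ICMP, TCP and UDP. It uses an ICMP echo reply as the example; which header's checksum was zero in the captures is not stated, so treat ICMP here as an assumption, and the identifier, sequence number and payload as made-up values.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071: one's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length data with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_icmp_echo_reply(ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP echo reply (type 0, code 0) with a correct checksum."""
    header = struct.pack("!BBHHH", 0, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 0, 0, csum, ident, seq) + payload


def checksum_ok(packet: bytes) -> bool:
    """A valid packet sums to zero when the checksum field is included."""
    return internet_checksum(packet) == 0


good = build_icmp_echo_reply(1, 1, b"abcdefgh")
zeroed = good[:2] + b"\x00\x00" + good[4:]   # what an un-finalised offloaded packet looks like
print(checksum_ok(good), checksum_ok(zeroed))
```

Inside the hypervisor nobody verifies the field, which matches A0 and A1 talking happily; the first host that does verify it discards the packet silently.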

Dealing with Avaya 9620 phones with SIP firmware

We started updating all Avaya phones to the SIP firmware. After the update we were able to make calls, but some annoying issues came up.

Failures making calls

All 9620 phones were failing to make calls, whether internal or external, but were receiving calls properly. After you dial the number you get a busy signal. Weirdly, it works every second call. Nothing was logged in Asterisk.

After capturing packets from one extension to Asterisk the problem was clear. For some reason the device was not sending the authentication token in the INVITE packet.

We were unable to find out why the Avaya firmware behaves this way, but solved it by changing the SIP configuration to accept insecure INVITEs.

I just added the following parameter to sip.conf:

insecure=invite

Be careful adding this option to your Asterisk, since it is insecure: incoming INVITEs from that peer are accepted without authentication.
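
For context, the "authentication token" that was missing is the digest response a SIP phone normally puts in the Authorization header after the server challenges its first INVITE. A minimal sketch of that computation, following RFC 2617 without `qop`; every value in the usage lines (user, realm, password, URI, nonce) is hypothetical:

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode()).hexdigest()


def digest_response(username, realm, password, method, uri, nonce) -> str:
    """RFC 2617 digest response without qop: MD5(HA1:nonce:HA2)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")   # identifies the user
    ha2 = md5_hex(f"{method}:{uri}")                  # binds the request
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


# Hypothetical values, for illustration only
print(digest_response("9620", "asterisk", "secret", "INVITE", "sip:100@pbx.example", "abc123"))
```

With `insecure=invite` Asterisk no longer asks for this value on INVITEs, so restrict the setting to the peers that need it.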

Wrong clock

All phones were showing random wrong times on the display.
To fix it we set up a few settings to get configuration provisioning working.
First I just set the NTP server on the DHCP server.

DHCP time server

After this all devices were showing the same wrong time. I tried adjusting option 002 Time Offset, but with no luck. So I read the 9620 phone manual and found that it has some extra options for provisioning.

First, it has a custom DHCP option, called Option 242 SSON. With this option you can point to an HTTP server that hosts a config file to be downloaded during phone boot.

Configuring the custom DHCP Option 242

DHCP predefined options
DHCP create option 242

Then configure it in the scope.

DHCP configure option 242
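
The value of option 242 is a plain string of comma-separated `NAME=value` pairs. The exact string from the screenshots is not reproduced here, so the sketch below is an assumption based on Avaya's documented `HTTPSRVR` and `HTTPDIR` parameters, with a placeholder address and directory; `build_sson` is a hypothetical helper, and it only handles one value per parameter.

```python
def build_sson(params: dict) -> str:
    """Join NAME=value pairs into an option 242 string (one value per parameter)."""
    for name, value in params.items():
        if "=" in name or "," in str(value):
            raise ValueError(f"unsupported characters in {name}={value}")
    return ",".join(f"{name}={value}" for name, value in params.items())


# Placeholder server address and directory, not the customer's real ones
print(build_sson({"HTTPSRVR": "192.0.2.10", "HTTPDIR": "phones"}))
```

The resulting string is what goes into the string value of the option in the DHCP scope.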

Configuring the settings file

The documentation says that the phone will download a file called 46xxsettings.txt with settings, and it is right, but you also need a file called 96xxupgrade.txt in the same directory, and that is not described anywhere. I found it by capturing packets during the phone’s boot. This file comes along with the SIP firmware.

Inside this file (96xxupgrade.txt) the phone is told to fetch the settings file:

############################################################
## Get additional configuration files ##
############################################################
# GETSET
GET 46xxsettings.txt

The settings to adjust the clock in 96xxupgrade.txt:

SET SNTPSRVR 10.1.0.46
SET GMTOFFSET "-03:00"
SET DSTOFFSET "1"
SET DSTSTART "3SunOct2L"
SET DSTSTOP "3SunFeb2L"
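
To sanity-check the daylight saving rules, the sketch below turns a rule such as `3SunOct2L` into a concrete date. It assumes my reading of the format: an ordinal (`1` to `5`, or `L` for last), a three-letter weekday, a three-letter month, the hour, and `L` or `U` for local or UTC time. The function name is hypothetical.

```python
import calendar
import datetime
import re

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]   # index matches date.weekday()
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def dst_rule_datetime(rule: str, year: int) -> datetime.datetime:
    """Resolve a rule like '3SunOct2L' to the date and hour it falls on in `year`."""
    m = re.fullmatch(r"([1-5L])([A-Z][a-z]{2})([A-Z][a-z]{2})(\d{1,2})([LU])", rule)
    if not m:
        raise ValueError(f"unrecognised rule: {rule!r}")
    ordinal, day, month, hour, _clock = m.groups()
    weekday = DAYS.index(day)
    month_no = MONTHS.index(month) + 1
    days_in_month = calendar.monthrange(year, month_no)[1]
    # Every day of the month that falls on the requested weekday
    matches = [d for d in range(1, days_in_month + 1)
               if datetime.date(year, month_no, d).weekday() == weekday]
    chosen = matches[-1] if ordinal == "L" else matches[int(ordinal) - 1]
    return datetime.datetime(year, month_no, chosen, int(hour))


print(dst_rule_datetime("3SunOct2L", 2015))
print(dst_rule_datetime("3SunFeb2L", 2016))
```

Comparing the printed dates against the official daylight saving calendar for your region is a quick way to catch a wrong ordinal or month before pushing the file to every phone.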

Echo in the line

I’ve been working on an emergency action plan to switch from Avaya to Asterisk. After working uninterrupted for 7.5 years the Avaya died. More precisely, the S8300 Media Server died; the rest of the box is still working.

After setting up the new Asterisk box, some users complained that they could hear their own voices in some calls, but not all of them.

After some research we found that the echo was related to analog PSTN lines using 2-wire somewhere at the far end, and the installed E1 board did not have echo cancellation. I suggested that the customer get an echo cancellation module, but configured OSLEC software echo cancellation as a workaround.

As OSLEC is part of the Linux kernel, you have to follow a few steps to get it working with Asterisk.

Building OSLEC

  • Download the kernel source (OSLEC has been in drivers/staging since 2.6.28; the commands below use a 2.6.32 tree)
  • Download the latest DAHDI sources

Create a directory called staging inside the DAHDI source tree

mkdir linux/drivers/staging

Copy the OSLEC code from the kernel source tree into the staging directory you just created.

# Run from the top of the DAHDI source tree, where the staging directory was created
cp -R ~/linux-2.6.32-504.3.3.el6/drivers/staging/echo linux/drivers/staging/

Compile DAHDI and install it

make
make install

Configuring OSLEC

Adjust the DAHDI configuration in /etc/dahdi/system.conf

span=1,0,0,CCS,HDB3,CRC4
span=2,1,0,CAS,HDB3
bchan=1-15,17-31
cas=32-46,48-62:1101
dchan=16,47
loadzone=br
defaultzone=br
echocanceller=oslec,1-15,17-31,32-46,48-62
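
It is easy to miss a channel when typing these ranges, so here is a small sketch that expands them and checks that the echo canceller covers every voice channel and none of the signalling ones. The range strings are copied from the configuration above; the helper `expand` is mine.

```python
def expand(ranges: str) -> set:
    """Expand a DAHDI-style list such as '1-15,17-31' into a set of channel numbers."""
    channels = set()
    for part in ranges.split(","):
        first, _, last = part.partition("-")
        channels.update(range(int(first), int(last or first) + 1))
    return channels


bchan = expand("1-15,17-31")
cas = expand("32-46,48-62")          # the ':1101' suffix is the idle bit pattern, not a channel
dchan = expand("16,47")
echo = expand("1-15,17-31,32-46,48-62")

voice = bchan | cas
print(len(voice), echo == voice, echo.isdisjoint(dchan))
```

If the second value printed is not `True`, some voice channel is running without echo cancellation.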

Restart DAHDI

service dahdi restart

On the first try I got some noise instead of echo; I solved it by adjusting txgain in chan_dahdi.conf.

txgain=-10.0
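
To get a feel for how much quieter that is, the sketch below converts a gain in decibels to a linear amplitude factor. It assumes, as the chan_dahdi sample configuration describes, that txgain is expressed in dB.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a gain in dB to a linear amplitude (voltage) ratio."""
    return 10 ** (db / 20)


print(round(db_to_amplitude_ratio(-10.0), 3))   # 0.316
```

So -10 dB sends the outgoing audio at roughly a third of its original amplitude, which gives the echo canceller a weaker reflection to deal with.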