Wherein Our Hero Battles Exchange 2007

So, you’ve got an Exchange 2007 CCR Cluster set up and all is well in the world; your data is safely replicated offsite so that in the event of a disaster, you can have your users back up and emailing in the time it takes a DNS record to update.

But then, disaster! A different disaster to the previously mentioned one, obviously, because this disaster causes a cluster failover and the connection between nodes is down for just long enough that they get out of sync and require a reseed to fix.

At this point I’d like to jump off on a slight tangent to bemoan the inconsistency with which Exchange 2007 handles the interruption of replication traffic. On the one hand, you can shut down one node for a couple of hours and when you bring it back up again replication resumes quite happily, but on the other hand if you have 5 minutes of iffy network connectivity, suddenly the databases* are irrevocably out of sync and need to be reseeded.

Anyway, you’re not too concerned by this turn of events because, while a reseed of your ~90GB database takes a few hours, it’s not like the cluster is going to fail back while you’ve got databases in an inconsistent state, is it?

Well, it shouldn’t have happened, but it did: the bloody thing failed back halfway through reseeding the database and then, obviously, couldn’t mount it at the other end. This posed something of a problem. Move-ClusteredMailboxServer (or the GUI equivalent) gets upset when your databases aren’t in sync and refuses to let you fail over, while Restore-StorageGroupCopy would have forcibly mounted the database sans up-to-date logs, effectively reverting it to the state it was in before the first failover and binning a lot of emails in the process.

Thankfully, Move-ClusteredMailboxServer has a very handy -IgnoreDismounted option which, when you’re really sure, lets you skip all replication health checks on dismounted databases. That allows you to fail the server over and remount the (more) up-to-date version of the database, whereupon you can attempt to re-reseed it. So, if you find yourself in a similar quandary, with your databases all out of sync and at risk of losing hours or even days’ worth of data, then before starting a restore from tape and printing your CV, you can always try:

Move-ClusteredMailboxServer -Identity <CCR Cluster Name> -TargetMachine <Target Cluster Node> -IgnoreDismounted

Just remember that there’s a good chance of at least some data loss. If you’re in a position to need this, though, the alternative is probably a lot of data loss, so it’s a risk that might be worth taking.

* I know that technically Exchange 2007 CCR replicates at the Storage Group level rather than Database level, but as you can only have a single database in a CCR replicated Storage Group and “database” is easier to type, I’ve used it instead.

One Reply to “Wherein Our Hero Battles Exchange 2007”

  1. Eeek – sounds like fun….. You need a new network if you are having “5 minutes of iffy network connectivity” 🙂