FMI: Decommissioning Exchange 2007 CCR Cluster

Uninstall Exchange Server 2007 from Passive node

  • Log in to the passive node and, from EMS, run the Get-ClusteredMailboxServerStatus cmdlet to confirm that the Exchange CMS is hosted on the active node
  • Run:
    "C:\Program Files\Microsoft\Exchange Server\bin\Setup.com" /mode:uninstall
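
The check in the first step can be scripted rather than eyeballed. What follows is a minimal sketch, not a tested procedure. The function name Test-CmsHostedElsewhere and the CMS name EXCMS01 are hypothetical, and the sketch assumes that the OperationalMachines property lists the hosting node with an "Active" tag (for example "NODE1 <Active, Quorum Owner>"). Verify that against your own output before relying on it.

```powershell
# Returns $true when the CMS is hosted on a node other than $LocalNode.
# Assumption: OperationalMachines entries look like "NODE1 <Active, Quorum Owner>".
function Test-CmsHostedElsewhere {
    param(
        $Status,                                # output of Get-ClusteredMailboxServerStatus
        [string]$LocalNode = $env:COMPUTERNAME  # node the script is running on
    )
    # Keep only the entries tagged as active.
    $active = @($Status.OperationalMachines | Where-Object { "$_" -match 'Active' })
    if ($active.Count -eq 0) { return $false }  # no active node reported: treat as unsafe
    # Compare the node name at the start of the active entry with the local node.
    $pattern = '^' + [regex]::Escape($LocalNode) + '\b'
    return -not ("$($active[0])" -match $pattern)
}

# Only runs where the Exchange cmdlets are loaded (i.e. inside EMS).
if (Get-Command Get-ClusteredMailboxServerStatus -ErrorAction SilentlyContinue) {
    $status = Get-ClusteredMailboxServerStatus -Identity 'EXCMS01'  # hypothetical CMS name
    if (Test-CmsHostedElsewhere $status) {
        Write-Host "CMS is hosted on another node; this node looks passive."
    } else {
        Write-Warning "CMS appears to be hosted here (or no active node was reported). Do not uninstall."
    }
}
```

The helper deliberately returns $false when it cannot find an active entry, so an unexpected output format fails towards "don't uninstall".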

Evict passive node from Windows Cluster

  • Stop the Cluster service on the passive node by running:
    net stop clussvc
  • After the Cluster service has been stopped, evict the node by running:
    Cluster <ClusterName> node <NodeName> /evict

Remove CMS from Active Node, uninstall Mailbox role & destroy Cluster

  • Log in to the active node and run:
    "C:\Program Files\Microsoft\Exchange Server\Bin\Setup.com" /mode:uninstall /removeCMS /CMSName:<CMSName>
  • Destroy the Windows Cluster (in Failover Cluster Management, right-click the cluster name, then choose More Actions > Destroy Cluster)

Wherein Our Hero Battles Exchange 2007

So, you’ve got an Exchange 2007 CCR Cluster set up and all is well in the world; your data is safely replicated offsite so that in the event of a disaster, you can have your users back up and emailing in the time it takes a DNS record to update.

But then, disaster! A different disaster to the previously mentioned one, obviously, because this disaster causes a cluster failover and the connection between nodes is down for just long enough that they get out of sync and require a reseed to fix.

At this point I’d like to jump off on a slight tangent to bemoan the inconsistency with which Exchange 2007 handles the interruption of replication traffic. On the one hand, you can shut down one node for a couple of hours and when you bring it back up again replication resumes quite happily, but on the other hand if you have 5 minutes of iffy network connectivity, suddenly the databases* are irrevocably out of sync and need to be reseeded.

Anyway, you’re not too concerned by this turn of events because, while a reseed of your ~90 GB database takes a few hours, it’s not like the cluster is going to fail back while you’ve got databases in an inconsistent state, is it?

Well, it shouldn’t have happened, but it did; the bloody thing failed back while halfway through reseeding the database and then, obviously, couldn’t mount it at the other end. This posed something of a problem. Move-ClusteredMailboxServer (or the GUI equivalent) gets upset when your databases aren’t in sync and refuses to let you fail over, while Restore-StorageGroupCopy would have forcibly mounted the database sans up-to-date logs, effectively reverting it to the state it was in before it all failed over the first time and binning a lot of emails in the process.

Thankfully, Move-ClusteredMailboxServer has a very handy -IgnoreDismounted option which, when you’re really sure, skips all replication health checks on dismounted databases. That lets you fail the server over and remount the (more) up-to-date version of the database, whereupon you can attempt to re-reseed it. So, if you find yourself in a similar quandary, with your databases all out of sync and at risk of losing hours or even days’ worth of data, before starting a restore from tape & printing your CV, you can always try:

    Move-ClusteredMailboxServer -Identity <CCR Cluster Name> -TargetMachine <Target Cluster Node> -IgnoreDismounted

Just remember that there’s a good chance of at least some data loss, but if you’re in a position to need it, the alternative is probably a lot of data loss, so it’s a risk that might be worth taking.
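
Before forcing a failover it helps to see exactly which copies are out of sync, and afterwards which ones still need a reseed. The following is a rough sketch under a few assumptions: Get-UnhealthyCopy is a hypothetical helper name, EXCMS01 is a placeholder CMS name, and the reseed commands are left commented out because they are destructive and, as far as I know, need to be run from the passive node.

```powershell
# Returns the storage group copies whose SummaryCopyStatus is not 'Healthy'.
function Get-UnhealthyCopy {
    param($CopyStatus)   # output of Get-StorageGroupCopyStatus
    $CopyStatus | Where-Object { $_ -and ("$($_.SummaryCopyStatus)" -ne 'Healthy') }
}

# Only runs where the Exchange cmdlets are loaded (i.e. inside EMS).
if (Get-Command Get-StorageGroupCopyStatus -ErrorAction SilentlyContinue) {
    $cms = 'EXCMS01'   # hypothetical CMS name
    $bad = Get-UnhealthyCopy (Get-StorageGroupCopyStatus -Server $cms)
    $bad | Format-Table Name, SummaryCopyStatus, CopyQueueLength, ReplayQueueLength

    # Once the database is mounted on the node you failed over to, each broken
    # copy can be reseeded. Commented out on purpose: -DeleteExistingFiles
    # removes the stale copy before reseeding.
    # $bad | ForEach-Object {
    #     Suspend-StorageGroupCopy -Identity $_.Identity
    #     Update-StorageGroupCopy  -Identity $_.Identity -DeleteExistingFiles
    # }
}
```

One caveat on the move itself: my understanding is that Move-ClusteredMailboxServer also expects a -MoveComment value, and the shell should prompt for it if you leave it off.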

* I know that technically Exchange 2007 CCR replicates at the Storage Group level rather than the Database level, but as you can only have a single database in a CCR-replicated Storage Group and “database” is easier to type, I’ve used it instead.