Lync Server 2013 Pool failover process.
Pool failover is a key part of disaster recovery. As a best practice, fail over the primary pool to the backup pool site and back again every six months, so that you are ready to fail over the pool if a real disaster occurs.
The pool failover process includes failing over the Central Management store (CMS), if required. This is important because the Central Management store must be functional when the pool’s users are failed over to the backup pool.
Additionally, if a Front End pool fails but the Edge pool at that site is still running, you must know whether the Edge pool uses the failed pool as a next hop pool. If it does, you must change the Edge pool to use a different Front End pool as its next hop. How you change the next hop setting depends on whether the Edge will use a pool at the same site as the Edge pool, or a different site.
The steps below cover the CMS move, pool failover and pool failback.
Pool failover prerequisites:
Before failing over a pool, check the following:
1. Check the CMS replication status. UpToDate must be True for all servers.
Get-CsManagementStoreReplicationStatus | fl UpToDate,ReplicaFqdn
Also check that the backup relationship shows the correct backup pool FQDN.
Get-CsPoolBackupRelationship -PoolFqdn cshqpool.mydomain.com
2. Check that the users' pool information shows both the primary and backup pool machines.
3. Verify the CMS service connection point (SCP) and validate that the connection parameter points to the current pool's primary SQL store (and SQL mirror, if applicable).
4. Finally, check the backup service sync status:
Get-CsBackupServiceStatus -PoolFqdn "Cshqpool.mydomain.com"
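The prerequisite checks above can be collected into one pass in the Lync Server Management Shell. This is only a sketch under assumptions: the pool FQDN is the one used in this example, the test user SIP address is hypothetical, and Get-CsUserPoolInfo and Get-CsManagementConnection are my assumption for checks 2 and 3, since the original commands are not shown.

```powershell
# Pre-failover checks. Run from the Lync Server Management Shell.
$primaryPool = "cshqpool.mydomain.com"        # pool from this example
$testUser    = "sip:testuser@mydomain.com"    # hypothetical test user

# 1. List replicas that are NOT up to date. An empty result is the goal.
Get-CsManagementStoreReplicationStatus |
    Where-Object { -not $_.UpToDate } |
    Format-List ReplicaFqdn, UpToDate

# The backup relationship should show the backup pool FQDN.
Get-CsPoolBackupRelationship -PoolFqdn $primaryPool

# 2. User pool information (assumed cmdlet for this check).
Get-CsUserPoolInfo -Identity $testUser

# 3. Current management connection (assumed cmdlet for the SCP check).
Get-CsManagementConnection

# 4. Backup service sync status for the pool.
Get-CsBackupServiceStatus -PoolFqdn $primaryPool
```

Filtering on UpToDate in step 1 makes out-of-date replicas stand out instead of having to scan a long list of True values.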
If you are failing over a Lync pool that holds the Central Management Store (CMS), you must move the CMS first, before invoking the pool failover.
In this document I will show pool failover both with and without the CMS.
1. Failover of a pool with CMS:
Example: pool failover with CMS. I have pool1 (Cshqpool.mydomain.com), which holds the CMS, and pool2 (Csbrmpool.mydomain.com), which is the backup pool.
Note: The CMS must be moved to another pool if the source pool being failed over is currently hosting the active CMS.
This example assumes that both pools are available and that the failover is a test for DR purposes.
A. Run all of the checks above and make sure that both pools are in “NormalState” and healthy before invoking the failover test.
Log on to a Front End server in the backup pool (FE11) and run Invoke-CsManagementServerFailover, then confirm the prompt to allow the change. (If the CMS is offline or down, use Invoke-CsManagementServerFailover -Force.)
In my test scenario the CMS is online and working, so I am not using “Force”.
The failback will be automatically performed, primary FEs hydrated, and services started.
Current server - Central Management Store SCP: HQ1WP-SQLVG05.mydomain.com
Proposed State: Central Management Store SCP: BRMWP-SQLVG09.mydomain.com
Check the connection point using Get-CsManagementConnection, which now shows the failover SQL server name.
CMS server new SqlServer: BRMWP-SQLVG09.mydomain.com
Wait 5 minutes, then run Invoke-CsManagementStoreReplication and check the replication status.
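The CMS move described above can be sketched as the following sequence. It assumes the CMS is online (so no -Force) and that it is run from a Front End server in the backup pool; the -WhatIf preview is an optional extra step that I have added before the real run.

```powershell
# CMS failover sketch. Run on a Front End server in the backup pool.

# Optional preview: shows the current and proposed state without changing anything.
Invoke-CsManagementServerFailover -WhatIf

# Move the CMS to the backup pool (prompts for confirmation).
Invoke-CsManagementServerFailover

# The SQL server reported here should now be the backup pool's SQL server.
Get-CsManagementConnection

# Give the move about 5 minutes, then trigger replication and re-check it.
Start-Sleep -Seconds 300
Invoke-CsManagementStoreReplication
Get-CsManagementStoreReplicationStatus | Format-List UpToDate, ReplicaFqdn
```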
Now fail over the actual pool to the backup pool by running the pool failover command:
Invoke-CsPoolFailover -PoolFqdn Cshqpool.mydomain.com
The pool failover process runs through these stages:
The services are retrieved and the Lync services are stopped.
The Front End services are stopped on all servers.
Event 32155 “Pool fail over complete” is logged.
The pool failover is complete.
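Rather than scrolling through Event Viewer, the completion event can be queried from PowerShell. This is a minimal sketch that assumes the event is written to the "Lync Server" event log on the server where the command is run; adjust the log name if your deployment differs.

```powershell
# Look for the most recent "Pool fail over complete" events (ID 32155).
# Assumption: the event is in the "Lync Server" log on this server.
Get-WinEvent -FilterHashtable @{ LogName = 'Lync Server'; Id = 32155 } `
             -MaxEvents 5 -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, Message |
    Format-List
```

If nothing is returned, the failover may still be in progress or the event may have been logged on a different server.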
Now update the Edge pool to use the backup Front End pool as its next hop.
This must be changed in PowerShell with Set-CsEdgeServer, pointing the Edge pool to a registrar in the target (destination) site. Cmdlet:
Set-CsEdgeServer -Identity EdgeServer:EdgePool.mydomain.com -Registrar csbrmpool.mydomain.com
Finally, check the users' pool information and confirm that the users are registered to the backup pool (csbrmpool.mydomain.com).
Once all tests are done, fail the pool back to the primary site:
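The Edge next hop and the user's registration can be verified together, for example as below. This is a sketch under assumptions: the SIP address is a hypothetical test user, Get-CsUserPoolInfo is my assumption for the user pool check, and the Registrar property on the Edge service output should be confirmed in your environment.

```powershell
# Confirm the Edge pool's next hop now points to the backup pool.
# Assumption: the Edge service object exposes a Registrar property.
Get-CsService -EdgeServer | Format-List Identity, Registrar

# Confirm where a test user is homed and served from (hypothetical user).
Get-CsUserPoolInfo -Identity "sip:testuser@mydomain.com"
```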
Invoke-CsPoolFailback -PoolFqdn cshqpool.mydomain.com
Failback takes time, generally about an hour, so once you have run the failback command you can carry on with other work.
The failback proceeds as follows:
1. Once the pool failback process starts, it gets and starts the services on the target pool (in our test scenario, the primary pool FE services are started).
If you open Event Viewer, you will see warning event 32174:
"Server startup is being delayed because fabric has not finished initial placement of users."
Cause: This is normal during cold-start of a Pool and during server startup.
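While the failback is running, the service state on the primary pool's Front End servers can be checked from the Management Shell. The server names below are hypothetical placeholders for your own primary pool Front Ends; this is only a sketch of one way to watch progress.

```powershell
# Hypothetical Front End server names in the primary pool; replace with your own.
$frontEnds = "fe01.mydomain.com", "fe02.mydomain.com", "fe03.mydomain.com"

foreach ($fe in $frontEnds) {
    Write-Host "Lync services on $fe"
    # Lists each Lync service on the server with its current status.
    Get-CsWindowsService -ComputerName $fe |
        Format-Table Name, Status -AutoSize
}
```

Services that stay in a starting state while event 32174 is being logged are expected during a cold start, as noted above.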
During pool failover, the Skype for Business client status changes to “Presence Unknown” and the client shows the error message “A network or server issue is temporarily limiting features.”
The user status automatically returns to the correct status after the pool failover completes.