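Redis Cluster manual failover test
Manual failover
Sometimes it is useful to force a manual failover even when the master is perfectly healthy. For example, to upgrade the Redis process on a master, we can first fail it over so that it becomes a replica and then upgrade it, which keeps the impact on cluster availability small.
Redis Cluster performs a manual failover with the CLUSTER FAILOVER command, which must be executed on a replica of the master you want to fail over. A manual failover is safer than the automatic failover triggered by a master failure: clients are switched to the new master only after it has fully replicated the old master's data, so no data is lost.
First, check the state of the cluster nodes: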
./redis-cli -h 10.0.0.11 -p 6000 -c cluster nodes
8855e87e7c5bca09a30155914ab9b6c9de15e13a 10.0.0.11:6000@16000 myself,master - 0 1525917347000 17 connected 0-5460
1e0512df5d582845c340e195c079b9875714a526 10.0.0.13:6001@16001 slave db2dc37f5482eecee2c7bc68ae40b7a6703fd463 0 1525917349033 21 connected
70ab44b00bce074061bb308d76bb71933bb6e3c5 10.0.0.12:6001@16001 slave 8855e87e7c5bca09a30155914ab9b6c9de15e13a 0 1525917351035 17 connected
db2dc37f5482eecee2c7bc68ae40b7a6703fd463 10.0.0.12:6000@16000 master - 0 1525917348032 21 connected 5461-10922
2c7a33b71981034ae212c0c6832ca8c39df6aa25 10.0.0.13:6000@16000 master - 0 1525917350034 23 connected 10923-16383
d305e866b60f18d23e098874735f640d846ebe05 10.0.0.11:6001@16001 slave 2c7a33b71981034ae212c0c6832ca8c39df6aa25 0 1525917347029 23 connected
This is a typical layout with three masters and three replicas (replicas carry the slave flag in the output).
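To pick the node on which CLUSTER FAILOVER should be run, it helps to map a master to its replicas. The following is a minimal sketch that only parses `cluster nodes` output in the format shown above; the function name `replicas_of` is made up for this example.

```shell
# Print the ip:port of every replica of the master at ip:port ($1).
# Reads `cluster nodes` output on stdin; returns non-zero for an unknown address.
replicas_of() {
  awk -v addr="$1" '
    { split($2, a, "@"); id[a[1]] = $1 }   # address -> node ID
    $3 ~ /slave/ { rep[a[1]] = $4 }        # replica address -> master node ID
    END {
      if (!(addr in id)) exit 1            # unknown address
      for (r in rep) if (rep[r] == id[addr]) print r
    }
  '
}
```

For example, `./redis-cli -h 10.0.0.11 -p 6000 cluster nodes | replicas_of 10.0.0.13:6000` should list 10.0.0.11:6001 for the cluster above. If a master has several replicas, the order in which they are printed is not defined.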
Manual failover test. Note that the command can only be issued on a replica:
./redis-cli -h 10.0.0.11 -p 6001 -c cluster failover
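Because the command is only accepted on a replica, a small guard can check the target's role before sending it. This is a sketch under assumptions: the helper names `myself_is_replica` and `failover_if_replica` and the `REDIS_CLI` variable are invented for illustration, and the role check relies on the `myself` line of the node's own `cluster nodes` output.

```shell
REDIS_CLI="${REDIS_CLI:-./redis-cli}"   # path to redis-cli (assumption, adjust as needed)

# Succeeds if the `myself` line of `cluster nodes` output (stdin) has the slave flag.
myself_is_replica() {
  awk '$3 ~ /myself/ { found = ($3 ~ /slave/) } END { exit !found }'
}

# Run CLUSTER FAILOVER on host ($1) port ($2) only if that node reports itself as a replica.
failover_if_replica() {
  fo_host=$1
  fo_port=$2
  if "$REDIS_CLI" -h "$fo_host" -p "$fo_port" cluster nodes | myself_is_replica; then
    "$REDIS_CLI" -h "$fo_host" -p "$fo_port" cluster failover
  else
    echo "refusing: $fo_host:$fo_port is not a replica" >&2
    return 1
  fi
}
```

With the cluster above, `failover_if_replica 10.0.0.11 6001` would send the command, while `failover_if_replica 10.0.0.11 6000` would refuse because that node is a master.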
Log output on the old master (10.0.0.13:6000) during the failover:
19235:M 10 May 09:48:49.818 # Manual failover requested by slave d305e866b60f18d23e098874735f640d846ebe05.
19235:M 10 May 09:48:49.967 # Failover auth granted to d305e866b60f18d23e098874735f640d846ebe05 for epoch 22
19235:M 10 May 09:48:49.968 # Connection with slave 10.0.0.11:6001 lost.
19235:M 10 May 09:48:49.969 # Configuration change detected. Reconfiguring myself as a replica of d305e866b60f18d23e098874735f640d846ebe05
19235:S 10 May 09:48:49.969 * Before turning into a slave, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
19235:S 10 May 09:48:50.550 * Connecting to MASTER 10.0.0.11:6001
19235:S 10 May 09:48:50.550 * MASTER <-> SLAVE sync started
19235:S 10 May 09:48:50.550 * Non blocking connect for SYNC fired the event.
19235:S 10 May 09:48:50.550 * Master replied to PING, replication can continue...
19235:S 10 May 09:48:50.550 * Trying a partial resynchronization (request d1dab2e71267c89cafe36362cb75d5115db79821:4816868).
19235:S 10 May 09:48:50.551 * Successful partial resynchronization with master.
19235:S 10 May 09:48:50.551 # Master replication ID changed to 722a7e301693257b5bd1b5f3997c72e6135bfd16
19235:S 10 May 09:48:50.551 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.
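The timestamps in this log show how quickly the switch happened. As a rough sketch, the helper below (the name `failover_elapsed_ms` is made up) computes the milliseconds between the "Manual failover requested" line and the "Successful partial resynchronization" line. It assumes the log format shown above, with an HH:MM:SS.mmm field, and does not handle a failover that spans midnight.

```shell
# Print the milliseconds between the failover request and the successful
# partial resync, reading a Redis log on stdin. Returns non-zero if either
# line is missing.
failover_elapsed_ms() {
  awk '
    # Convert the HH:MM:SS.mmm field of a log line to milliseconds since midnight.
    function to_ms(line,   f, n, i, t, p) {
      n = split(line, f, " ")
      for (i = 1; i <= n; i++)
        if (f[i] ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9][0-9][0-9]$/) {
          split(f[i], t, ":"); split(t[3], p, ".")
          return ((t[1] * 60 + t[2]) * 60 + p[1]) * 1000 + p[2]
        }
      return -1
    }
    /Manual failover requested/            { t0 = to_ms($0) }
    /Successful partial resynchronization/ { t1 = to_ms($0) }
    END { if (t0 != "" && t1 != "") print t1 - t0; else exit 1 }
  '
}
```

Fed the log above, the two timestamps are 09:48:49.818 and 09:48:50.551, so the old master was already resynchronized as a replica well under a second after the request.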
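Check the state of the cluster nodes again. The roles have swapped: 10.0.0.11:6001 is now a master serving slots 10923-16383.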
8855e87e7c5bca09a30155914ab9b6c9de15e13a 10.0.0.11:6000@16000 myself,master - 0 1525917347000 17 connected 0-5460
1e0512df5d582845c340e195c079b9875714a526 10.0.0.13:6001@16001 slave db2dc37f5482eecee2c7bc68ae40b7a6703fd463 0 1525917349033 21 connected
70ab44b00bce074061bb308d76bb71933bb6e3c5 10.0.0.12:6001@16001 slave 8855e87e7c5bca09a30155914ab9b6c9de15e13a 0 1525917351035 17 connected
db2dc37f5482eecee2c7bc68ae40b7a6703fd463 10.0.0.12:6000@16000 master - 0 1525917348032 21 connected 5461-10922
2c7a33b71981034ae212c0c6832ca8c39df6aa25 10.0.0.13:6000@16000 master - 0 1525917350034 23 connected 10923-16383
d305e866b60f18d23e098874735f640d846ebe05 10.0.0.11:6001@16001 master - 0 1525917147527 22 connected 10923-16383
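The role change can also be checked from a script rather than by eye. A minimal sketch, assuming the `cluster nodes` format shown above; the function name `role_of` is hypothetical.

```shell
# Print "master" or "slave" for the node at ip:port ($1),
# reading `cluster nodes` output on stdin. Prints nothing for an unknown address.
role_of() {
  awk -v addr="$1" '
    { split($2, a, "@") }
    a[1] == addr { print ($3 ~ /master/ ? "master" : "slave") }
  '
}
```

For instance, `./redis-cli -h 10.0.0.11 -p 6000 cluster nodes | role_of 10.0.0.11:6001` should print master once the failover has completed.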