Testing resource recovery

Posted on Tue 04 December 2012 in hints-and-kinks • 2 min read

If a MySQL instance dies on one of your cluster nodes, Pacemaker automatically recovers the service in place. To test this, pick any node in your cluster and send the mysqld process a KILL signal:

killall -KILL mysqld

Then, monitor your cluster status with crm_mon -rf. Once the next monitor operation runs (every 30 seconds in this configuration), you should see one of your p_mysql clone instances enter the FAILED state:

============
Last updated: Mon Dec  3 19:03:25 2012
Last change: Mon Dec  3 18:54:44 2012 via crmd on bob
Stack: openais
Current DC: charlie - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
3 Nodes configured, 3 expected votes
4 Resources configured.
============

Online: [ bob alice charlie ]

Full list of resources:

 p_ip_mysql_galera  (ocf::heartbeat:IPaddr2):   Started alice
 Clone Set: cl_mysql [p_mysql]
     p_mysql:1  (ocf::heartbeat:mysql): Started bob FAILED
     Started: [ alice charlie ]

Migration summary:
* Node alice: 
* Node bob: 
* Node charlie: 

Failed actions:
    p_mysql:1_monitor_30000 (node=bob, call=30, rc=7, status=complete): not running
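If you would rather script this check than watch the screen, a filter along the following lines may help. It is only a sketch: failed_resources is a hypothetical helper name, and it assumes the output layout shown above, where a failed resource line contains both the agent specification and the word FAILED.

```shell
#!/bin/sh
# Hypothetical helper: print the names of resources that crm_mon marks FAILED.
# Reads crm_mon text on stdin, so it also works on saved output.
# Assumes the line layout shown above: "<name>  (ocf::...): Started <node> FAILED"
failed_resources() {
    awk '/FAILED/ && /\(ocf::/ { print $1 }'
}

# On a cluster node (assumes crm_mon is in PATH; -1 prints once and exits):
#   crm_mon -1 -rf | failed_resources
```

Matching on the agent specification keeps the entries in the Failed actions list out of the result, since those lines do not carry it.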

Then, after a few seconds, the resource will automatically recover:

============
Last updated: Mon Dec  3 19:03:35 2012
Last change: Mon Dec  3 18:54:44 2012 via crmd on bob
Stack: openais
Current DC: charlie - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
3 Nodes configured, 3 expected votes
4 Resources configured.
============

Online: [ bob alice charlie ]

Full list of resources:

 p_ip_mysql_galera  (ocf::heartbeat:IPaddr2):   Started alice
 Clone Set: cl_mysql [p_mysql]
     Started: [ alice bob charlie ]

Migration summary:
* Node alice: 
* Node bob: 
   p_mysql:1: migration-threshold=1000000 fail-count=1
* Node charlie: 

Failed actions:
    p_mysql:1_monitor_30000 (node=bob, call=30, rc=7, status=complete): not running
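The fail count in the migration summary is the lasting trace of the test. As a rough sketch, the hypothetical helper fail_counts below pulls node, resource, and fail count out of that section, assuming the "* Node" and "fail-count=" layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: list "<node> <resource> <fail-count>" from the
# "Migration summary" section of crm_mon output read on stdin.
fail_counts() {
    awk '
        # Remember the node each summary block belongs to ("* Node bob: ").
        /^\* Node / { node = $3; sub(/:$/, "", node); next }
        # Resource lines look like "p_mysql:1: migration-threshold=... fail-count=1".
        /fail-count=/ {
            res = $1; sub(/:$/, "", res)
            for (i = 2; i <= NF; i++)
                if ($i ~ /^fail-count=/) { split($i, kv, "="); print node, res, kv[2] }
        }
    '
}

# On a cluster node (assumes crm_mon is in PATH):
#   crm_mon -1 -rf | fail_counts
```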

To clear the entry from the Failed actions list afterwards, run crm resource cleanup cl_mysql.
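To confirm that the cleanup took effect, you can check whether the status output still contains a Failed actions section. The helper name has_failed_actions is made up for this sketch, and it only looks for the heading exactly as it is printed in the output above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if crm_mon output on stdin still has a
# "Failed actions:" section, fail otherwise.
has_failed_actions() {
    grep -q '^Failed actions:'
}

# On a cluster node (assumes crm and crm_mon are in PATH):
#   crm resource cleanup cl_mysql
#   if crm_mon -1 -rf | has_failed_actions; then
#       echo "failed actions still listed"
#   else
#       echo "no failed actions listed"
#   fi
```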

This article originally appeared on the hastexo.com website (now defunct).

Part 7 of "MySQL/Galera in Pacemaker High Availability Clusters"
