Dealing with node failure
Posted on Tue 04 December 2012 in hints-and-kinks • 2 min read
If an entire node happens to get killed, and that node currently does
not hold the Galera IP (192.168.122.99 in our example), then the other
nodes simply continue to function normally, and you can connect to and
use them without interruption. In the example below, alice
has left
the cluster:
============
Last updated: Mon Dec 3 22:24:55 2012
Last change: Mon Dec 3 22:23:19 2012 via crmd on charlie
Stack: openais
Current DC: charlie - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
3 Nodes configured, 3 expected votes
4 Resources configured.
============
Online: [ bob charlie ]
OFFLINE: [ alice ]
Full list of resources:
p_ip_mysql_galera (ocf::heartbeat:IPaddr2): Started bob
Clone Set: cl_mysql [p_mysql]
Started: [ bob charlie ]
Stopped: [ p_mysql:0 ]
If the node dies that does currently hold the Galera IP
(192.168.122.99 in our example), then the cluster IP shifts to a
different node, and when the failed node returns, it can re-fetch the
cluster state from the node that took over the IP address. In the
example below, in a healthy cluster the IP happens to be running on
bob
:
============
Last updated: Mon Dec 3 22:32:35 2012
Last change: Mon Dec 3 22:23:19 2012 via crmd on charlie
Stack: openais
Current DC: charlie - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
3 Nodes configured, 3 expected votes
4 Resources configured.
============
Online: [ bob alice charlie ]
Full list of resources:
p_ip_mysql_galera (ocf::heartbeat:IPaddr2): Started bob
Clone Set: cl_mysql [p_mysql]
Started: [ alice bob charlie ]
Subsequently, bob
is affected by a failure, and the IP address
shifts to alice
:
============
Last updated: Mon Dec 3 22:33:33 2012
Last change: Mon Dec 3 22:23:19 2012 via crmd on charlie
Stack: openais
Current DC: charlie - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
3 Nodes configured, 3 expected votes
4 Resources configured.
============
Online: [ alice charlie ]
OFFLINE: [ bob ]
Full list of resources:
p_ip_mysql_galera (ocf::heartbeat:IPaddr2): Started alice
Clone Set: cl_mysql [p_mysql]
Started: [ alice charlie ]
Stopped: [ p_mysql:1 ]
When bob
returns, it simply connects to alice
(which now hosts the
cluster IP), fetches the database state from there, and continues to
run:
============
Last updated: Mon Dec 3 22:35:46 2012
Last change: Mon Dec 3 22:23:19 2012 via crmd on charlie
Stack: openais
Current DC: charlie - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
3 Nodes configured, 3 expected votes
4 Resources configured.
============
Online: [ bob alice charlie ]
Full list of resources:
p_ip_mysql_galera (ocf::heartbeat:IPaddr2): Started alice
Clone Set: cl_mysql [p_mysql]
Started: [ alice bob charlie ]
This article originally appeared on the hastexo.com
website (now defunct).