Posts Tagged ‘MySQL Replication’

MySQL Replication variable Seconds_Behind_Master explained

February 27th, 2011

Seconds_Behind_Master  variable in the MySQL Replication is often misused and misunderstood variables . Many DBA’s and Sysadmin use this variable to check the MySQL Replication latency between master and  slave in different replication topologies Master -> Slave , Master-> Master etc..

So what is this variable and how it works and what it shows as per the MySQL Documentation?

this field measures the time difference in seconds between the slave SQL thread and the slave I/O thread and the value greatly depends on the  network between master and slave , this value can be realistic only on the fast network. this means that Seconds_Behind_Master  can not always  give the correct latency between the master and slave.this is the reason if you are monitoring this variables to know the how much the slave  behind master you must be getting some weird values and immediately coming to 0 .

In my   article I have recommend some methods on how MySQL replication should be   monitored.

How to fix corrrupted relay log

November 13th, 2010

When we get this type of error and MySQL replicaton stops

Last_Error: Could not parse relay log event entry. The possible reasons are: the master’s binary log is corrupted (you can check this by running ‘mysqlbinlog’ on the binary log), the slave’s relay log is corrupted (you can check this by running ‘mysqlbinlog’ on the relay log), a network problem, or a bug in the master’s or slave’s MySQL code. If you want to check the master’s binary log or slave’s relay log, you will be able to know their names by issuing ‘SHOW SLAVE STATUS’ on this slave.

Possible reasons of the relay log corruption

1)      Network

2)      MySQL bugs on master/Slave

3)      Hardware problems

4)      max_allowed_packet value reset without restarting the replication

Solution:

Run show slave status  and get these values Relay_Master_Log_File and Exec_Master_Log_Pos, Now reset the replication on slave

CHANGE master TO master_log_file=Relay_Master_Log_File,master_log_pos=Exec_Master_Log_Pos ;

Start Slave ;

we are good to go, issue should be fixed now.for this to work Relay_Master_Log_File file must exist on master.

MySQL Replication Monitoring

July 8th, 2010

MySQL Replication is widely and we need to have strong replication monitoring for these things

1) To monitor if the replication is working.

2) To check the latency between the master and slave .

3) TO check the consistency between the master and slave , as sometime due to manual or master server crash , master and slave may go out of sync.

lets see how we can set up monitoring for each of the scenarios.

MySQL replication status on the can be checked via

mysql> show slave status \G;

output would be like

Slave_IO_State: Waiting for master to send event
Master_Host: master
Master_User: repluser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000542
Read_Master_Log_Pos: 231260599
Relay_Log_File: relaylog.000496
Relay_Log_Pos: 231260744
Relay_Master_Log_File: mysql-bin.000542
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 231260599
Relay_Log_Space: 231260935
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 2013
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:

more details on this can be found onhttp://dev.mysql.com/doc/refman/5.1/en/show-slave-status.html

Case 1) MySQL Replication is working
As we know that MySQL Replication uses two threads Slave_IO and Slave_SQL , slave_IO is responsible for reading the master binary log and writing events from master to the relay log on the slave and slave_SQL is responsible for executing the events from the relay log on the slave .so for MySQL Replication to work both the threads must be running , to check that from the show slave status \G the following values  must be  Slave_IO_Running: Yes and  Slave_SQL_Running: Yes

we can check the variables using the shell or script or Nagios to monitor.

Case 2) To check the latency between the master and slave

MySQL Replication is real time most of the time , but because of many reasons( Slave heavily loaded , IO issue on slave) slave may not be able to  catch up with the master.

I have seen many people using the “Seconds_Behind_Master” from the output of the show slave status , it works most of the time but not always .That value is comparing the slave’s current time with the time on the master when it executed the currently replicating SQL statement. If the slave is low-volume (or not properly slaving) you can end up with misleading information or even a false sense of security.Consider another case when you have  more level of replication example  A is master server   B is slave of A  and Server C which is slave of B , if there is some issue in replication between A and B  , the “Seconds_Behind_Master”  on C will still be showing as 0 but in actual the replication is broken in the sense that C is not getting the latest data from A.

The correct way of monitoring the replication is using

http://www.maatkit.org/doc/mk-heartbeat.html

how to use the mk-heartbeat

on master download the script from
wget http://www.maatkit.org/get/mk-heartbeat

make it executable

on master(192.168.2.80)
create table heartbeat  on Database  heart

CREATE TABLE heartbeat (
id int NOT NULL PRIMARY KEY,
ts datetime NOT NULL
);

It needs to have at least one row

INSERT INTO heartbeat (id) VALUES (1);
now run the script and make it a daemon

./mk-heartbeat -D heart –table heartbeat -u heartbeat -p XXXXXXXXX –update -h 192.168.2.80

on slave(192.168.2.82) download the script from
wget http://www.maatkit.org/get/mk-heartbeat
make it executable

./mk-heartbeat -D heart –table heartbeat -u heartbeat_slave -p XXXXXXXXX –monitor -h 192.168.2.82

the output would be something like

1s [  0.02s,  0.00s,  0.00s ]
1s [  0.03s,  0.01s,  0.00s ]
1s [  0.05s,  0.01s,  0.00s ]
0s [  0.05s,  0.01s,  0.00s ]
0s [  0.05s,  0.01s,  0.00s ]
0s [  0.05s,  0.01s,  0.00s ]
0s [  0.05s,  0.01s,  0.00s ]
1s [  0.07s,  0.01s,  0.00s ]

the output will tell if  mysql replication is having any lag or not.

case 3)  Inconsistency between Mater and Slave

Some time because of manual error or master crash or unclean shutdown the master and slave may be out of sync, its very important to check that and take corrective action .
if not detected on time slave might go completely out of sync and might need to be set completely.

how do we check that master and slave are completely sync.

we can gr8 tool mk-table-checksum from maatkit http://www.maatkit.org/doc/mk-table-checksum.html

lets see how we can use the mk-table-checksum

download
wget http://www.maatkit.org/get/mk-table-checksum
make it executable

create table checksum  on the master 192.168.2.10

on database Test

CREATE TABLE checksum (
db         char(64)     NOT NULL,
tbl        char(64)     NOT NULL,
chunk      int          NOT NULL,
boundaries char(100)    NOT NULL,
this_crc   char(40)     NOT NULL,
this_cnt   int          NOT NULL,
master_crc char(40)         NULL,
master_cnt int              NULL,
ts         timestamp    NOT NULL,
PRIMARY KEY (db, tbl, chunk)
);

for checking we will create a test table

create table testreplication( i int , b varchar(100))

we will insert a row

insert into testreplication
select 1,’name’

–replicate=test.checksum will replicate the checksum run on master(192.168.2.10) to slave (192.168.2.12)
–tables test.testreplication  will check only the table test.testreplication

./mk-table-checksum -u test -p XXXXXXX –replicate=test.checksum  –tables test.testreplication  192.168.2.10

When we run it we get the output on command line

DATABASE   TABLE           CHUNK HOST         ENGINE      COUNT         CHECKSUM TIME WAIT STAT  LAG
test       testreplication     0 192.168.2.10 MyISAM          1         1e5504e6    0 NULL NULL NULL

we now log in to the slave (192.168.2.12) and on the database test we run this query

SELECT db, tbl, chunk, this_cnt-master_cnt AS cnt_diff,
this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc)
AS crc_diff
FROM checksum
WHERE master_cnt <> this_cnt OR master_crc <> this_crc
OR ISNULL(master_crc) <> ISNULL(this_crc);

out put will be blank as the replication is working properly

db    tbl    chunk    cnt_diff    crc_diff

now we remove the row directly from the slave (192.168.2.12)

delete from testreplication ;

we again run the

./mk-table-checksum -u test -p XXXXXXX –replicate=test.checksum  –tables test.testreplication  app146

output on command line

DATABASE   TABLE           CHUNK HOST         ENGINE      COUNT         CHECKSUM TIME WAIT STAT  LAG
test       testreplication     0 192.168.2.10 MyISAM          1         1e5504e6    0 NULL NULL NULL

when we run the query to check the result

SELECT db, tbl, chunk, this_cnt-master_cnt AS cnt_diff,
this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc)
AS crc_diff
FROM checksum
WHERE master_cnt <> this_cnt OR master_crc <> this_crc
OR ISNULL(master_crc) <> ISNULL(this_crc);

now we get

db             tbl              chunk    cnt_diff    crc_diff
test          testreplication      0    -1                1

as now the master and slave data differs , the output is giving table name and the crc_diff , one very important thing to understand that this finds the tables which are not in sync even when the new data is getting logged into the master DB as  the checksum calculating command on the master is replicated exactly on the slave  and we know that the mysql replication is syncrnous so on the slave when this command to calculate the checksum is executed its considering the data as it was on master.using this we can find out all the tables which are not in sync in master and slave and take corrective action.

in next post I would be writing in detail how to find the difference on the individual tables between master and slave and how to rectify that and how to automate that process .

Thanks
Pankaj Joshi