Saturday, February 06, 2010

MySql Replication: Relay Log Corruption

MySql Replication Relay Log Corruption

 1 Problem: Disk was full in the slave. Relay log corrupted

After the checking the slave MySQL server error, I noticed that the slave was stopped with the following error message in the error log:

090925 16:39:47 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1330925129, event_type: 16
090925 16:39:47 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error
090925 16:39:47 [ERROR] Slave: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 0
090925 16:39:47 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log ‘appl-1-log-bin.000078' position 987008254

2 Use mysqlbinlog to check problem with the relay log:

To varify if the relay log was indeed corrupted, I used the mysqlbinlog utility to check the relay log file with the position that was mentioned in the error log.

  #>mysqlbinlog --position=19193011 slave-1-relay-bin.000575
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 542069838, event_type: 1
Could not read entry at offset 19193011:Error in log format or read error
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;


3 To fix the problem.

Following steps were taken to fix the problem and restart the slave

Step 1:  Check slave status

From the mysql client, issued show slave status command to get the current bin log file name and position as we as error message.

MySQL>show slave status\G;

******* 1. row **********
Slave_IO_State:
Master_Host: appl-1.xyz..com
Master_User: repl_dw
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: appl-1-log-bin.000078
Read_Master_Log_Pos: 987008254
Relay_Log_File: slave-1-relay-bin.000049
Relay_Log_Pos: 97173
Relay_Master_Log_File: appl-1-log-bin.000078
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their name issuing 'SHOW SLAVE STATUS' on this slave.
Skip_Counter: 0
Exec_Master_Log_Pos: 987008254
Relay_Log_Space: 298049181
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
1 row in set (0.00 sec)
ERROR:
No query specified
mysql>

Took note on Master_Log_File: appl-1-log-bin.000078 and Exec_Master_Log_Pos: 987008254 values.  The Master_Log_File was the master bin log file the slave was reading while it encountered the error.  The Exec_Master_Log_Pos was the postion in the master bin log file which the slave stopped at when it encountered the error.


Step 2: Stop the slave, and reset bin log file position

To make sure that we did not lost any data, we CANGE  MASTER TO to re-point replication to Relay_Master_Log_File and Exec_Master_Log_Pos which were the file and postion where the slave stopped due to the error. The CHANGE MASTER TO command also would purge existing relay logs and  re-fetched all events which have not been executed yet. Doing this command make sure your master is operational and it still has all the logs needed to re-fetch events.

MySql> stop slave;

From the mysql prompt, issue the following commond:
CHANGE master TO master_log_file=Relay_Master_Log_File, master_log_pos=Exec_Master_Log_Pos

MySqk> change master to master_log_file=appl-1-log-bin.000078',master_log_pos=987008254;
Query OK, 0 rows affected (1.22 sec)

Step 3 Re-Started the Slave

MySql> slave start;

After the slave started, checked the MySQL server error log:

090925 18:23:53 [Note] Slave SQL thread initialized, starting replication in log 'appl-1-log-bin.000078' at position 987008254, relay log '/u0/mysql/mysqlreplog/dwh-1-relay-bin.000001' position: 4
090925 18:23:53 [Note] Slave I/O thread: connected to master 'repl_user@appl-1.xyz..com:3306', replication started in log ‘app-1-log-bin.000078' at position 987008254

4 Explaination and Reference:

During replication the MySQL server creates a number of files that are used to hold the relayed binary log from the master, and record information about the current status and location within the relayed log. There are three file types used in the process:
  1.  The relay log consists of the events read from the binary log of the master. Events in this binary log are executed on the slave as part of the replication thread.  Therefore, it will be fine to reset this log file as we known what master bin log postion the slave was stopped at when the error occured. Moreover, Since the replay log was corrupted in our case, we need to regenerate the relay log file.
  2. The master.info file contains the status and current configuration information for the slave's connectivity to the master. The file holds information on the master host name, login credentials, and the current position within the master's binary log.
  3. The relay-log.info file holds the status information about the execution point within the slave's relay log files. The relationship amount the three files and the replication process is as follows:
    • The master.info file retains the point within the master binary log that has been read from the master.
    • These read events are written to the relay log.
    • The relay-log.info file records the position within the relay log of the statements that have been executed. 
 Reference:




Post Update History
----------------------
  • 2010-02-10: Initial draft from note.
  • 2010-05-06: Fixed typoes, organized note and add references.

1 comment:

Anonymous said...

Approvingly your article helped me truly much in my college assignment. Hats off to you dispatch, intention look forward for more related articles without delay as its united of my choice topic to read.