Showing posts with label MySQL Replication. Show all posts
Showing posts with label MySQL Replication. Show all posts

Tuesday, April 27, 2010

Mysql Replication Slave Stopped Due to “Lock Wait Timeout Exceeded” Error


MySQL Replication: [ERROR] Slave: : Error 'Lock wait timeout exceeded; try restarting transaction' on query.

In a deadlock situation, the Mysql Replication server slave would let the transaction to retry up to the number of time which was set for the slave_transaction_retries system variable before stopping the slave and logging a message like the one bellow in the error log:

100419 0:03:47 [ERROR] Slave: : Error 'Lock wait timeout exceeded; try restarting transaction' on query.   Default database: 'mysqldb'. Query: 'UPDATE a_table SET colum_1 =a_value_colum + 1000,…,… WHERE id='12345'', Error_code: 1205
100419 0:03:47 [Note] Slave SQL thread exiting, replication stopped in log 'slave-1-log-bin.006034' at position 10470968

Refering to the MySQL manual; the 1205 error has the following description:

Error: 1205 SQLSTATE: HY000 (ER_LOCK_WAIT_TIMEOUT)
Message: Lock wait timeout exceeded; try restarting transaction.

Since the tables involved in the locks were InnoDB tables, we could check the InnoDB status by issuing the following command:
Mysql> show engine innodb status\G;
---------------------------------------------
LATEST DETECTED DEADLOCK
---------------------------------------------
*** (1) TRANSACTION:
TRANSACTION 0 2513168731, ACTIVE 0 sec, process no 17876, OS thread id 685890464 starting index read
mysql tables in use 2, locked 2
LOCK WAIT 5 lock struct(s), heap size 320, undo log entries 1
MySQL thread id 8018, query id 317267495 user_id Updating
...
There were some useful information regarding the lock from the above output under the LATEST DETECTED DEADLOCK section which details the causes of the deadlock such as what kind of locks were used, and the query statements that involved in the lock.

To resolve the deadlock issues, we may need to debug the query statements and make code changes. In the mean time, we can check innodb_lock_wait_timeout and slave_transaction_retries to see if we can increase it a bit to avoid stopping the slave. To do so, issue the following statement from the mysql client:

Mysql> show variables;
innodb_lock_wait_timeout | 150

slave_transaction_retries | 10

The innodb_lock_wait_timeout variable set the length of time a transaction will wait for a row lock resource before “giving up”. The default value is 50 seconds. Please note that this variable applies to InnoDB row locks only. A MySQL table lock does not happen inside InnoDB and this timeout does not apply to waits for table locks. InnoDB does detect transaction deadlocks in its own lock table immediately and rolls back one transaction. The lock wait timeout value does not apply to such a wait.

The slave_transaction_retries set the number of time the replication slave thread to retry before stopping with an error due to any of the following situation (the default for this variable is 10):

· an InnoDB deadlock
· the transaction's execution time exceeded InnoDB's innodb_lock_wait_timeout
· the transaction’s execution time exceeded NDBCLUSTER’s TransactionDeadlockDetectionTimeout
· the transaction’s execution time exceeded NDBCLUSTER's TransactionInactiveTimeout,

Please also notice that, if the slave_transaction_retries server variable is set, the transaction will hold the lock for (slave_transaction_retries * innodb_lock_wait_timeout) seconds before giving up. In the example above, it means that a lock will be held for proximately 1,500 seconds (10*150). Therefore, try to find out what is causing this lock and fix it maybe a better solution.

In any case, please remember to restart the slave because deadlock error stopped the slave as it had indicated in the error log.

Reference:

Saturday, February 06, 2010

MySql Replication: Relay Log Corruption

MySql Replication Relay Log Corruption

 1 Problem: Disk was full in the slave. Relay log corrupted

After the checking the slave MySQL server error, I noticed that the slave was stopped with the following error message in the error log:

090925 16:39:47 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1330925129, event_type: 16
090925 16:39:47 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error
090925 16:39:47 [ERROR] Slave: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 0
090925 16:39:47 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log ‘appl-1-log-bin.000078' position 987008254

2 Use mysqlbinlog to check problem with the relay log:

To varify if the relay log was indeed corrupted, I used the mysqlbinlog utility to check the relay log file with the position that was mentioned in the error log.

  #>mysqlbinlog --position=19193011 slave-1-relay-bin.000575
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 542069838, event_type: 1
Could not read entry at offset 19193011:Error in log format or read error
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;


3 To fix the problem.

Following steps were taken to fix the problem and restart the slave

Step 1:  Check slave status

From the mysql client, issued show slave status command to get the current bin log file name and position as we as error message.

MySQL>show slave status\G;

******* 1. row **********
Slave_IO_State:
Master_Host: appl-1.xyz..com
Master_User: repl_dw
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: appl-1-log-bin.000078
Read_Master_Log_Pos: 987008254
Relay_Log_File: slave-1-relay-bin.000049
Relay_Log_Pos: 97173
Relay_Master_Log_File: appl-1-log-bin.000078
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their name issuing 'SHOW SLAVE STATUS' on this slave.
Skip_Counter: 0
Exec_Master_Log_Pos: 987008254
Relay_Log_Space: 298049181
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
1 row in set (0.00 sec)
ERROR:
No query specified
mysql>

Took note on Master_Log_File: appl-1-log-bin.000078 and Exec_Master_Log_Pos: 987008254 values.  The Master_Log_File was the master bin log file the slave was reading while it encountered the error.  The Exec_Master_Log_Pos was the postion in the master bin log file which the slave stopped at when it encountered the error.


Step 2: Stop the slave, and reset bin log file position

To make sure that we did not lost any data, we CANGE  MASTER TO to re-point replication to Relay_Master_Log_File and Exec_Master_Log_Pos which were the file and postion where the slave stopped due to the error. The CHANGE MASTER TO command also would purge existing relay logs and  re-fetched all events which have not been executed yet. Doing this command make sure your master is operational and it still has all the logs needed to re-fetch events.

MySql> stop slave;

From the mysql prompt, issue the following commond:
CHANGE master TO master_log_file=Relay_Master_Log_File, master_log_pos=Exec_Master_Log_Pos

MySqk> change master to master_log_file=appl-1-log-bin.000078',master_log_pos=987008254;
Query OK, 0 rows affected (1.22 sec)

Step 3 Re-Started the Slave

MySql> slave start;

After the slave started, checked the MySQL server error log:

090925 18:23:53 [Note] Slave SQL thread initialized, starting replication in log 'appl-1-log-bin.000078' at position 987008254, relay log '/u0/mysql/mysqlreplog/dwh-1-relay-bin.000001' position: 4
090925 18:23:53 [Note] Slave I/O thread: connected to master 'repl_user@appl-1.xyz..com:3306', replication started in log ‘app-1-log-bin.000078' at position 987008254

4 Explaination and Reference:

During replication the MySQL server creates a number of files that are used to hold the relayed binary log from the master, and record information about the current status and location within the relayed log. There are three file types used in the process:
  1.  The relay log consists of the events read from the binary log of the master. Events in this binary log are executed on the slave as part of the replication thread.  Therefore, it will be fine to reset this log file as we known what master bin log postion the slave was stopped at when the error occured. Moreover, Since the replay log was corrupted in our case, we need to regenerate the relay log file.
  2. The master.info file contains the status and current configuration information for the slave's connectivity to the master. The file holds information on the master host name, login credentials, and the current position within the master's binary log.
  3. The relay-log.info file holds the status information about the execution point within the slave's relay log files. The relationship amount the three files and the replication process is as follows:
    • The master.info file retains the point within the master binary log that has been read from the master.
    • These read events are written to the relay log.
    • The relay-log.info file records the position within the relay log of the statements that have been executed. 
 Reference:




Post Update History
----------------------
  • 2010-02-10: Initial draft from note.
  • 2010-05-06: Fixed typoes, organized note and add references.

Saturday, March 21, 2009

MySql Replication: Skip Duplicate Key Error (Error 1062, Unique Constraint Violation)

MySQL replication will stop whenever there is a query causing an error on the slave. The error message will be written to the error log; therefore, it allows us to identify the problem and fix it,

One of the common error is error 1062, unique constraint violation, that is, when a duplicate key is being insert into a table. For example you might see an errors message in the error log or from “show slave status” output:

"1062 Error 'Duplicate entry 'abc' for key 1' on query. Default database: 'mydb'. Query: 'INSERT INTO ..."

Since, in most case, this error will need to fix in the source (master) database, we can skip the error and let the replication continue on slave.

To skip an error , from the slave, you can use the SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1 statement to skip a sql statement which caused the 1062 error.

mysql>stop slave;
mysql>SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
mysql> start slave;

For, 1062 error, during the normal replication process, you can also specify the following variable in the
my.cnf configuration file to tell the slave to skip all 1062 errors

slave-skip-errors = 1062

Of cause, please make sure that all the statements, which were skipped, will eventually be corrected in the master and will not cause any data issue.


Sunday, March 01, 2009

Mysql Replication: Use Slave Skip Counter to Skip Error Statement

To Skip an error in a replication event during replication, use:

SET GLOBAL SQL_SLAVE_SKIP_COUNTER = N

This statement skips the next N events from the master. This is useful for recovering from replication stops caused by a statement.

This statement is valid only when the slave thread is not running. Otherwise, it produces an error.

When using this statement, it is important to understand that the binary log is actually organized as a sequence of groups known as event groups. Each event group consists of a sequence of events.

· For transactional tables, an event group corresponds to a transaction.
· For nontransactional tables, an event group corresponds to a single SQL statement.

A single transaction can contain changes to both transactional and nontransactional tables.

When you use SET GLOBAL SQL_SLAVE_SKIP_COUNTER to skip events and the result is in the middle of a group, the slave continues to skip events until it reaches the end of the group. Execution then starts with the next event group.

Examples of using SET GLOBAL SQL_SLAVE_SKIP_COUNTER = N :

1. Skip statements that caused duplicates or referential integrity error.

When a referential integrity error occurred, the transaction will cause the replication thread to hang. To skip the transaction do:

mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
mysql> START SLAVE;

Use the value 1 for any SQL statement that does not use AUTO_INCREMENT or LAST_INSERT_ID(), otherwise you will need to use the value 2. Statements that use AUTO_INCREMENT or LAST_INSERT_ID() take up 2 events in the binary log.


Tuesday, August 12, 2008

MySql Replication: Set Up Master and Slave

1. INSTRUCTION

The following post contains that base installation steps for setting up MySql Replication Servers (version 5.0.x) on a new MySql database systems. It makes the assumption that the database system is a new system and there is no existing database running; therefore, there is no need to synchronizing the databases among master and slave before replication.

This document uses three nodes replication systems for demonstration purposes. Please see diagram bellow on how data flow.


The replication will go from mySql-1, the master server, to mySql-2, the first slave, slave-1. The additional slave, slave 2, will replicate from mySql-2; therefore, mySql-2 also acts as a master for mySql-3. Please note that Mysql replication server replicate data using ‘pull’ which means that data are pulled by the slaves from the master.

Note that the hostnames, user ids, passowrds, and installation directories are for demonstration purposes.

2. VARIFY SERVER INSTALLATIONS AND CONFIGURATION

The configurations demo here will replicate all databases.

2.1 Verify Master Server Installation and Configuration

Following are the minimum configuration variable setting. Please refer to MySql documentation on other configuration variable settings.

For master, add the following entries to the [mysqld] section of the my.cnf file:
#make sure that skip-networking is not enable (commented) in my.cnf

#skip-networking
server-id = 1
# log-bin is the path and file name where the binary logs are stored
log-bin=/var/lib/mysqlreplog/mySql-1-log-bin
innodb_flush_log_at_trx_commit=1
sync_binlog=1

2.2 Verify Slave Server Installation and Configuration

2.2.1 Slave Server: mySql-2
Add the following entries to the [mysqld] section of the my.cnf file. Since slave 1, server myMyMySql-2, will act as master server for slave 2, there are entries for both master and slave on the my.cnf file in this node.

[mysqld]
# Replication Configuration
#skip-networking
server-id=2
# Binary log
log-bin=/var/lib/mysqlreplog/myMySql-2-log-bin
log-slave-updates
sync_binlog=1

# Relay-logs
relay-log=/var/lib/mysqlreplog/mySql-2-relay-bin
relay-log-index=/var/lib/mysqlreplog/mySql-2-relay-bin-index
relay-log-info=/var/lib/mysqlreplog/mySql-2-relay-bin-info
master-info-file=/var/lib/mysqlreplog/mySql-2-master-info
max_relay_log_size= 100Mpid file

2.2.2 Slave Server: mySql-3

[mysqld]
# Replication Configuration
#skip-networking
server-id = 3
# Binary log
log-bin=/var/lib/mysqlreplog/mySql-3-log-bin

# Relay-logs
relay-log=/var/lib/mysqlreplog/mySql-3-relay-bin
relay-log-index=/var/lib/mysqlreplog/omySql-3-relay-bin-index
relay-log-info=/var/lib/mysqlreplog/omySql-3-relay-bin-info
master-info-file=/var/lib/mysqlreplog/omySql-3-master-info
max_relay_log_size= 100M


3. CREATE REPLICATION USERS

3.1 Create Replication User for Slave 1, mySql-2
Each master server need to have a user id for a slave to use to pull data from. On mySql-1, create a user for slave 1, server myMySql-2 to pull data from.

3.1.1 Create replication user and grant privileges: (password is a place holder only)

mysql>create user 'repsql2' identified by 'repsql2pass';

3.1.2 Grant Replication Privileges

mysql>grant replication slave on *.* to 'repsql2' @ 'hostname or ip' identified by 'repsql2pass';

Replace ”mySql-2. or ip” with the hostname or ip address of server mySql-2

3.2 Create Replication User for Slave-3, mySql-3
On mySql-2 create a user for slave 2, server mySql-3 to pull data from.

3.2.1 Create replication user and grant privileges: (password is a place holder only)
mysql>create user 'repsql3' identified by 'repsql3pass';

3.2.2 Grant Replication Privileges

mysql>grant replication slave on *.* to

Replace ”mySql-3. or ip” with the hostname or ip address of server mySql-3.

4. SET UP REPLICATION ON MASTER SERVER, mySql-1
1. Stop mysql server: mySql-1
#>/sbin/service mysqld stop

2. check that no mysqlds are running

#>ps –ef |grep mysql

3. Re-start the mysql server to reflect any change that made in section 2.1 Verify Master Server Installation and Configuration
4. Lock the tables to prepare for a backup dump.

mysql> flush tables with read lock;

Query OK, 0 rows affected (0.00 sec)

5. Record the log name and offset. The file column shows the name of the log file and the position shows the offset within the file. Record these values for setting up the slave.
mysql> show master status;

+------------------+----------+--------------+------------------+
File Position Binlog_Do_DB inlog_Ignore_DB
+------------------+----------+--------------+------------------+
mySql-1-log-bin.000001 98
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

6. Shut down mysql server again to create a back up. Make sure there is no mysqld running.

#>/sbin/service mysqld stop

#>ps –ef | grep mysql
7. create a back up of the data files in the mysql data directory: datadir=/var/lib/mysql

#> tar –cvf /u0/prod-mysql-snapshot.tar /var/lib/mysql
8. bring mysql back on line

>/sbin/service mysqld start
9. Unlock the tables
mysql> unlock tables;

Query OK, 0 rows affected (0.00 sec)


5. SETUP REPLICATION ON SLAVE SERVER 1, mySql-2

1. Stop slave server: mySql-2

#>/sbin/service mysql stop

2. Create directory /var/lib/mysql/backup
3. Backup the mysql data file in mySql-2 to the backup directory.

#>/var/lib/mysql>cp –R * backup

4. Apply the backup copy from master (from section 4, step 7.)
Scp the file over and untar it.
#/tar –xvf /u0/mySql-1-mysql-snapshot.tar
5. start the slave server and check the err log to make sure it started nomarly.
#/sbin/service mysql start

6. Update master information
Mysql> change master to master_host =’
-> master_port = 3306,
-> master_user=’repsql2’,
->master_password=’repsql2pass’,
->master_log_file=’’
->master_log_spos=;

6. Check slave status.

Mysql>show slave status\G;

Mysql>show full processlist;

7. logon the master server mySql-1 and check if slave threads are connected:

mysql>show full processlist;

6. SET UP SLAVE SERVER 2: mySql-3
6.1 Get Backup Data Dump From mySql-2
1.Logon to mysql server myMySql-2 and lock the tables.

Mysql> Flush tables with read lock;

2. Record the log name and offset;
mysql> show master status;
+----------------------+----------+--------------+------------------+
File Position Binlog_Do_DB \Binlog_Ignore_DB
+----------------------+----------+--------------+------------------+
mySql-2-log-bin.000007 159616
+----------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

3. Stop mysql server mySql-2
#>/sbin/service mysql stop

4. Make a backup of the data files

#> tar –cvf /u0/mySql-2-mysql-snapshot.tar /var/lib/mysql
5. bring myMySql-2 back online and unlock tables

#>/sbin/service mysql start
Mysql>unlock tables;

6.2 Set Up Slave Server 2, mySql-3
1. Stop mysql server mySql-3
#>/sbin/service mysql stop

2. Apply backup from mySql-2

#>/tar – xvf /u0/myMySql-2-mysql-snapshot-20080626.tar /var/lib/mysql

3. Start mysql server mySql-3

#>/sbin/service mysql start

4. Update master information

Mysql> change master to master_host =’
-> master_port = 3306,
-> master_user=’repsql3’,
->master_password=’resql3pass’,
->master_log_file=’’
->master_log_pos=

8. Check slave status.

Mysql>show slave status\G;
Mysql>show full processlist;

9. Logon the master server sq2-1 and check if slave threads are connected:

mysql>show full processlist;

7. TEST
Create a test table in mySql-1 and make sure that it get replicated to mySql-2 and mySql-3

Mysql> create table reptest (a int,b int,c varchar(100));

Mysql> insert into reptest(a,b,c) values (1,2,’relication test’);

8. STOP AND START SLAVE
To stop a slave, use:
Mysql> slave stop;

To start a slave, use:
Mysql> slave start;