Tuesday, October 26, 2010

MYSQL SPEEPUP INSERT INTO A MYISAM TABLE

Accordint to the MySQL Manual, there are a few ways to speedup inserts while adding data to a MyISAM table. Following are the original document from MySQL manual plus the notes I added while I was experiencing with real data loads.

1. Background Information
 
The time required for inserting a row, according to the MySQL manual, is determined by the following factors, where the numbers indicate approximate proportions:
· Cnnecting: (3)
· There are many server variables that can be used to help optimize the user connections as stated in the post:
· Sending query to server: (2)
· Parsing query: (2)
· Inserting row: (1 × size of row)
· Inserting indexes: (1 × number of indexes)
· Closing: (1)

There are two additional things which need to be considered regarding overheard for inserts:
  • This does not take into consideration the initial overhead to open tables, which is done once for each concurrently running query. To try to reduce the times need to open table, try to cache them. To cache the table, set the table_cache variable. Please note that the table_cache, max_connections, and max_tmp_tables system variables affect the maximum number of files the server keeps open. If table_cache is set too high, MySQL may run out of file descriptors and refuse connections, fail to perform queries, and be very unreliable. You also have to take into account that the MyISAM storage engine needs two file descriptors for each unique open table. Please refer to this note for connection related variable setting.
  • The size of the table slows down the insertion of indexes by log N, assuming B-tree indexes.

2. Methods Which Can Be Used to speed up Inserts 
These are the methods which can be used to speed up inserts. You will notice that most of the methods mentions here are related to MySQL MyISAM tables.

(1.) If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.
   Insert into t1 (a,b,c,d) values (1,2,3,’2010-01-01’), (3,5,6,’2010-02-01’), (7,8,9,’2010-03-01’);

Instead of:

   Insert into t1 (a,b,c,d) values (1,2,3,’2010-01-01’);
   Insert into t1 (a,b,c,d) values (3,5,6,’2010-02-01’);
   Insert into t1 (a,b,c,d) values (7,8,9,’2010-03-01’);

(2.) If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster.

(3.) If multiple clients are inserting a lot of rows, you can get higher speed by using the INSERT DELAYED statement. “Insert Delay” hint delays the inserts by storing the insert statements in the memory queue utile the target table is free from reads. Hence, the insert statement will return quickly and all the insert statements that are stored in the queue can be written in one block, which is more optimized that written one statement at a time.



Hear is the syntax for insert delay:

Insert delay into t1 (a,b,c,d) values (1,2,3,’2010-01-01’), (3,5,6,’2010-02-01’), (7,8,9,’2010-03-01’);
See Section 12.2.5.2, INSERT
Please note that delay insert only works with MyISAM, MEMORY, ARCHIVE and BLACKHOLE(v. 5.1.19) table. And, there is downside of using “Insert Delayed”. There will be negative performance gain if the table is empty or not being used and add overhead to the server to hold additional thread for each table for which there is delayed rows, etc. Please read the manual for details on the negative impacts.
Theses are system status variables can be used to monitor delay insert uses. Here are the Status Variable meaning:
  • Delayed_insert_threads -- Number of handler threads
  • Delayed_writes -- Number of rows written with INSERT DELAYED
  • Not_flushed_delayed_rows -- Number of rows waiting to be written

 (4.) For a MyISAM table, you can use concurrent inserts to add rows at the same time that SELECT statements are running, this works well for tables that have more reads than writes because it just keeps adding data at the end of the data file.
To enable concurrent inserts, we need to set the concurrent_insert system variable. The values for the variable are:

· 0 -- Disables concurrent inserts
· 1 -- Enables concurrent insert for MyISAM tables that do not have holes. A hole means that there is free space in the data file, i.e., rows were deleted from the table. This is the default value.
· 2 -- Enables concurrent inserts for all MyISAM tables, even those that have holes. For a table with a hole,  new rows are inserted at the end of the table if it is in use by another thread. Otherwise, MySQL acquires a normal write lock and inserts the row into the hole.

To set the variable with other values (0 or 2) rather than the default value, for example concurrent_insert = 2, enter the parameter in the my.cnf file and restart the server. Or you can also dynamically set it in MySQL, for example:
   MySQL> set global concurrent_insert = 2;

Here are a few things that need to keep in mind.

1. ) Under circumstances where concurrent inserts can be used, there is seldom any need to use the DELAYED modifier for INSERT statements.
2. ) If you are using the binary log, concurrent inserts are converted to normal inserts for CREATE ... SELECT or INSERT ... SELECT statements. This is done to ensure that you can re-create an exact copy of your tables by applying the log during a backup operation.
3. ) With LOAD DATA INFILE, if you specify CONCURRENT with a MyISAM table that satisfies the condition for concurrent inserts (that is, it contains no free blocks in the middle), other sessions can retrieve data from the table while LOAD DATA is executing.
4)Use of the CONCURRENT option affects the performance of LOAD DATA a bit, even if no other session is using the table at the same time.
MySQL document has more information on using concurrent inserts.
(5.) Using bulk load command, LOAD DATA INFILE, to load data into a table from a text file can be 20 times faster than using INSERT statements.

Updating indexes takes time and resources during inserts. When data are loaded into a table that has many indexes, we can follow the steps bellow to speed up the loads.

1.) Disable the indexes
Non unique indexes of a MyISAM table can be disabled using the following statement:
MySQL> ALTER TABLE t1 DISABLE KEYS;
The alter table disable keys statement also dose a flush table.
2.) Load data into the table using LOAD DATA INFILE statement. This does not update any indexes and therefore is very fast.
3.) Enable the indexes
Indexes can be enabled using the following statement:

MySQL> ALTER TABLE tbl_name ENABLE KEYS;

Here is a note on MySQL Disable and Enable Indexes Command.

http://databobjr.blogspot.com/2010/05/mysql-alter-table-disable-keys.html

You can also use myisamchk --keys-used=0 -rq /path/to/db/tbl_name to disable index keys and use myisamchk -rq /path/to/db/tbl_name to enable index keys. However, if you used myisamchk to disable or enable index keys, you must issue Flush Tables statement or a mysqladmin flush-tables command. More information can be found here: http://dev.mysql.com/doc/refman/5.0/en/myisamchk.html
.
(6.) To increase performance for MyISAM tables, for both LOAD DATA INFILE and INSERT, enlarge the key cache by increasing the key_buffer_size system variable.

2. SUMMARY

The methodologies mentioned above were not THE only methods that could speed up inserts nor should they be used exclusively in that orders. It requires a lot of trails and errors to have one that works for a particular environment.

References:



No comments: