Showing posts with label MySQL Performance Tuning. Show all posts
Showing posts with label MySQL Performance Tuning. Show all posts

Thursday, December 02, 2010

Performance Tuning Best Practices for MySQL

Performance Tuning Best Practices for MySQL by Jay Pipes.




References:


Tuesday, October 26, 2010

MYSQL SPEEPUP INSERT INTO A MYISAM TABLE

Accordint to the MySQL Manual, there are a few ways to speedup inserts while adding data to a MyISAM table. Following are the original document from MySQL manual plus the notes I added while I was experiencing with real data loads.

1. Background Information
 
The time required for inserting a row, according to the MySQL manual, is determined by the following factors, where the numbers indicate approximate proportions:
· Cnnecting: (3)
· There are many server variables that can be used to help optimize the user connections as stated in the post:
· Sending query to server: (2)
· Parsing query: (2)
· Inserting row: (1 × size of row)
· Inserting indexes: (1 × number of indexes)
· Closing: (1)

There are two additional things which need to be considered regarding overheard for inserts:
  • This does not take into consideration the initial overhead to open tables, which is done once for each concurrently running query. To try to reduce the times need to open table, try to cache them. To cache the table, set the table_cache variable. Please note that the table_cache, max_connections, and max_tmp_tables system variables affect the maximum number of files the server keeps open. If table_cache is set too high, MySQL may run out of file descriptors and refuse connections, fail to perform queries, and be very unreliable. You also have to take into account that the MyISAM storage engine needs two file descriptors for each unique open table. Please refer to this note for connection related variable setting.
  • The size of the table slows down the insertion of indexes by log N, assuming B-tree indexes.

2. Methods Which Can Be Used to speed up Inserts 
These are the methods which can be used to speed up inserts. You will notice that most of the methods mentions here are related to MySQL MyISAM tables.

(1.) If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.
   Insert into t1 (a,b,c,d) values (1,2,3,’2010-01-01’), (3,5,6,’2010-02-01’), (7,8,9,’2010-03-01’);

Instead of:

   Insert into t1 (a,b,c,d) values (1,2,3,’2010-01-01’);
   Insert into t1 (a,b,c,d) values (3,5,6,’2010-02-01’);
   Insert into t1 (a,b,c,d) values (7,8,9,’2010-03-01’);

(2.) If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster.

(3.) If multiple clients are inserting a lot of rows, you can get higher speed by using the INSERT DELAYED statement. “Insert Delay” hint delays the inserts by storing the insert statements in the memory queue utile the target table is free from reads. Hence, the insert statement will return quickly and all the insert statements that are stored in the queue can be written in one block, which is more optimized that written one statement at a time.



Hear is the syntax for insert delay:

Insert delay into t1 (a,b,c,d) values (1,2,3,’2010-01-01’), (3,5,6,’2010-02-01’), (7,8,9,’2010-03-01’);
See Section 12.2.5.2, INSERT
Please note that delay insert only works with MyISAM, MEMORY, ARCHIVE and BLACKHOLE(v. 5.1.19) table. And, there is downside of using “Insert Delayed”. There will be negative performance gain if the table is empty or not being used and add overhead to the server to hold additional thread for each table for which there is delayed rows, etc. Please read the manual for details on the negative impacts.
Theses are system status variables can be used to monitor delay insert uses. Here are the Status Variable meaning:
  • Delayed_insert_threads -- Number of handler threads
  • Delayed_writes -- Number of rows written with INSERT DELAYED
  • Not_flushed_delayed_rows -- Number of rows waiting to be written

 (4.) For a MyISAM table, you can use concurrent inserts to add rows at the same time that SELECT statements are running, this works well for tables that have more reads than writes because it just keeps adding data at the end of the data file.
To enable concurrent inserts, we need to set the concurrent_insert system variable. The values for the variable are:

· 0 -- Disables concurrent inserts
· 1 -- Enables concurrent insert for MyISAM tables that do not have holes. A hole means that there is free space in the data file, i.e., rows were deleted from the table. This is the default value.
· 2 -- Enables concurrent inserts for all MyISAM tables, even those that have holes. For a table with a hole,  new rows are inserted at the end of the table if it is in use by another thread. Otherwise, MySQL acquires a normal write lock and inserts the row into the hole.

To set the variable with other values (0 or 2) rather than the default value, for example concurrent_insert = 2, enter the parameter in the my.cnf file and restart the server. Or you can also dynamically set it in MySQL, for example:
   MySQL> set global concurrent_insert = 2;

Here are a few things that need to keep in mind.

1. ) Under circumstances where concurrent inserts can be used, there is seldom any need to use the DELAYED modifier for INSERT statements.
2. ) If you are using the binary log, concurrent inserts are converted to normal inserts for CREATE ... SELECT or INSERT ... SELECT statements. This is done to ensure that you can re-create an exact copy of your tables by applying the log during a backup operation.
3. ) With LOAD DATA INFILE, if you specify CONCURRENT with a MyISAM table that satisfies the condition for concurrent inserts (that is, it contains no free blocks in the middle), other sessions can retrieve data from the table while LOAD DATA is executing.
4)Use of the CONCURRENT option affects the performance of LOAD DATA a bit, even if no other session is using the table at the same time.
MySQL document has more information on using concurrent inserts.
(5.) Using bulk load command, LOAD DATA INFILE, to load data into a table from a text file can be 20 times faster than using INSERT statements.

Updating indexes takes time and resources during inserts. When data are loaded into a table that has many indexes, we can follow the steps bellow to speed up the loads.

1.) Disable the indexes
Non unique indexes of a MyISAM table can be disabled using the following statement:
MySQL> ALTER TABLE t1 DISABLE KEYS;
The alter table disable keys statement also dose a flush table.
2.) Load data into the table using LOAD DATA INFILE statement. This does not update any indexes and therefore is very fast.
3.) Enable the indexes
Indexes can be enabled using the following statement:

MySQL> ALTER TABLE tbl_name ENABLE KEYS;

Here is a note on MySQL Disable and Enable Indexes Command.

http://databobjr.blogspot.com/2010/05/mysql-alter-table-disable-keys.html

You can also use myisamchk --keys-used=0 -rq /path/to/db/tbl_name to disable index keys and use myisamchk -rq /path/to/db/tbl_name to enable index keys. However, if you used myisamchk to disable or enable index keys, you must issue Flush Tables statement or a mysqladmin flush-tables command. More information can be found here: http://dev.mysql.com/doc/refman/5.0/en/myisamchk.html
.
(6.) To increase performance for MyISAM tables, for both LOAD DATA INFILE and INSERT, enlarge the key cache by increasing the key_buffer_size system variable.

2. SUMMARY

The methodologies mentioned above were not THE only methods that could speed up inserts nor should they be used exclusively in that orders. It requires a lot of trails and errors to have one that works for a particular environment.

References:



Thursday, October 21, 2010

MySQL Thread States Repair By Sorting Vs. Repair With Key Cache

When perform any of the REPAIR, ALTER or LOAD or ENABLE KEYS command on a MyISAM table, it also causes the rebuilding of the indexes. Indexes are rebuilt (repaired) in two modes:
  •  Repair by sorting -- The repair code is using a sort to create indexes.
  •  Repair using keycache -- The repair code is using creating keys one by one through the key cache. This is much slower than Repair by sorting.
To start repairing, first it tests if the table can be repaired by sorting provided it meets the following requirements:
  •   Table at least has one key 
  •   Total size needed for individual key is less than myisam_max_sort_file_size
If it meets the above requirements, then it uses either regular sorting if myisam_repair_threads = 1 (default) by building each key at a time or in parallel if myisam_repair_threads > 1 by using ‘n’ threads in parallel (n = total keys in the table).
 
If it fails to satisfy the above conditions, then it falls to expensive keycache repair mode.

Therefore, to avoid keycache repair, set the myisam_max_sort_file_size to the max index size.

 Related Note:
References:




Friday, July 30, 2010

IP To Country Lookup Table Using MySQL Spatial Index

I have an IP to Country lookup table that is used by various applications. The table is very simple:

CREATE TABLE `lu_ip_to_country` (
     id INT UNSIGNED NOT NULL auto_increment,
     start_ip_num int(10) UNSIGNED NOT NULL,
     end_ip_num int(10) UNSIGNED NOT NULL,
     country_id char(2) NOT NULL,
     iso_country_code varchar(3) DEFAULT NULL,
     country_name varchar(100) DEFAULT NULL,
     PRIMARY KEY (id),
     key start_end_ip (start_ip_num,end_ip_num)
);
  •  Start_ip_num: Beginning IP Number
  •  End_ip_num: Ending IP Number
  •  Country_id: two character country abbreviations (US, BG, FR, etc.) similar to ISO code
  •  ISO_country_code: ISO 3166 Country Code
  •  Country_name: Country Name

Beginning IP Number and Ending IP Number are the number version of the IPs which are calculated as follows:
       Ip number = 16777216*w + 65536*x + 256*y + z
Where w, x, y, z represent the IP address (w, x,y,z). For example: if ip is 61.94.7.255,then w=61,x=94,y=7 and z=255.

 The queries that use to lookup country information are simple as well:
      Select country_id from lu_ip_to_country
      Where v_ip_num_lookup between start_ip_num and end_ip_num

In this query, the v_ip_num_lookup is a variable used to store the calculated ip number. For example, '61.94.7.255’=1029572607.

The other query utilized the MySQL INET_ATON() function, which convert the ip to ip number. i.e. INET_ATON('61.94.7.255') = 1029572607. The query is the following:

   Select country_id from lu_ip_to_country
   Where INET_ATON('61.94.7.255') between start_ip_num and end_ip_num

These tow queries worked fine but slow. It will take up-to 1.5 second to return the result. And they were even slower when there were concurrent requests to this table. I had tried different indexes, and even tried to use memory table, but the improvements were not notable.

The other week, while I was searching the web, I came across an article, which was written by Jeremy Cole, described how to use MySQL GIS with spatial R-tree indexes for similar ip-to-country lookup in MySQL.I decided to give it a try. And, it turn out that the result was excellence.

Jeremy Cole’s article gave a vary detail explanation on how GIS with spatial R-tree indexes work in this case. Here are the steps on how I implemented and tested it (following what were described in the article).

1. Reconstruct the table to using POLYGON type and spatial index

CREATE TABLE lu_ip_to_country_geoip (
     id INT UNSIGNED NOT NULL auto_increment,
     ip_poly POLYGON NOT NULL,
     start_ip_num int(10) UNSIGNED NOT NULL,
     end_ip_num int(10) UNSIGNED NOT NULL,
     country_id varchar(3) NOT NULL,
     iso_country_code varchchar(3) DEFAULT NULL,
    country_name varchar(100) DEFAULT NULL,
    PRIMARY KEY (id),
    SPATIAL INDEX (ip_poly)
);

2. Export the data from the existing table
     Select start_ip_num,
                end_ip_num,
                country_id,
               iso_country_code,
              country_name
             Into outfile ‘/tmp/ip-country-data.dat’
     FIELDS
            TERMINATED BY ","
              ENCLOSED BY "\""
     LINES
           TERMINATED BY "\n"
      FROM lu_ip_to_country;

If you don’t have the data in the first place, you can download them from MaxMind web site (http://www.maxmind.com/app/country) as mentioned in the article.

3. Load the data into the new table with the MySQL GIS function to build the POLYGON for ip_poly field from the start_ip_num and end_ip_num fields.

    LOAD DATA LOCAL INFILE "/tmp/ip-country-data.dat"
         INTO TABLE lu_ip_to_country_geoip
         FIELDS
            TERMINATED BY ","
             ENCLOSED BY "\""
         LINES
              TERMINATED BY "\n"
           (
               @start_ip_num, @end_ip_num,
                @country_id,@iso_country_code, @country_name
             )
       SET
            id := NULL,
            start_ip_num := @start_ip_num,
           end_ip_num := @end_ip_num,
           ip_poly := GEOMFROMWKB(POLYGON(LINESTRING(
          /* clockwise, 4 points and back to 0 */
          POINT(@start_ip_num, -1), /* 0, top left */
          POINT(@end_ip_num, -1), /* 1, top right */
          POINT(@end_ip_num, 1), /* 2, bottom right */
          POINT(@start_ip_num, 1), /* 3, bottom left */
         POINT(@start_ip_num, -1) /* 0, back to start */))),
       country_id := @country_code,
       iso_country_code := @iso-country_code,
       country_name := @country_string;

4. Test queries

mysql> SELECT country_id,iso_country_code,country_name FROM lu_ip_to_country WHERE INET_ATON('61.94.7.255') BETWEEN start_ip_address AND end_ip_address;
+------------+------------------+--------------+
| country_id | iso_country_code | country_name |
+------------+------------------+--------------+
| ID | IDN | Indonesia |
+------------+------------------+--------------+
1 row in set (1.57 sec)

mysql> SELECT country_id,iso_country_code,country_name FROM lu_ip_to_country_geoip WHERE MBRCONTAINS(ip_poly, POINTFROMWKB(POINT(INET_ATON('61.94.7.255'), 0)));
+------------+------------------+--------------+
| country_id | iso_country_code | country_name |
+------------+------------------+--------------+
| ID | IDN | Indonesia |
+------------+------------------+--------------+
1 row in set (0.35 sec)

5. Perform A/B Test on ETL process.

I also selected a reasonable data file and ran through the process using old ip-to-country table for process A and new ip-to-country table for process B and QA’edfor the following:

  •  Check to make sure all Country IDs/Codes match exactly on both out put
  • Compare times
 Both processes produced the same country code lookups except the new table made the same process run much faster.

References:


Thursday, May 27, 2010

MYSQL MYISAM_RECOVER OPTIONS

MYSQL MYISAM_RECOVER OPTIONS

This variable sets the mode for automatic recovery when a MyISAM table crashed. The option value is any combination of the following values:
  •   DEFAULT --Recovery without backup, forcing, or quick checking. It is the same as not giving any option.
  •  BACKUP --If the data file was changed during recovery, save a backup of the tbl_name.MYD file as tbl_name-datetime.BAK.
  •  FORCE --Run recovery even if we would lose more than one row from the .MYD file.
  •  QUICK --Don't check the rows in the table if there aren't any delete blocks.
 Multiple values can be set at the same tiem. To use multiple values, separate them by commas.
   
    myisam_recover=backup,force

To disable this option, set it to “”.

    myisam-recover=””
  
If one or more myisam-recover option is used, each time mysqld opens a MyISAM table, it checks if or not the table is marked as crashed or wasn’t closed properly. If it is the case, mysqld runs a check on the table and attempts to repair it if the table was corrupted. Following are the steps which mysqld will perform according to the MySQL Manual:
  •  The server checks the table for errors.
  •  If the server finds an error, it tries to do a fast table repair (with sorting and without re-creating the data file).
  •  If the repair fails because of an error in the data file (for example, a duplicate-key error), the server tries again, this time re-creating the data file.
  •  If the repair still fails, the server tries once more with the old repair option method (write row by row without sorting). This method should be able to repair any type of error and has low disk space requirements.
 If the recovery wouldn't be able to recover all rows from previously completed statements and you didn't specify FORCE in the value of the --myisam-recover option, automatic repair aborts with an error message in the error log:

       Error: Couldn't repair table: test.g00pages

If FORCE was specified, a warning like this is written instead:

      Warning: Found 344 of 354 rows when repairing ./test/g00pages

Note that if the automatic recovery value includes BACKUP, the recovery process creates files with names of the form tbl_name-datetime.BAK. You should have a cron script that automatically moves these files from the database directories to backup media.

Myisam-recover is not a dynamic variable; hence, it requires a server restart to reset the variable.

References:
 Related Posts:


MYSQL MYISAM_MAX_SORT_FILE_SIZE Variable

MYSQL MYISAM_MAX_SORT_FILE_SIZE Variable

MYISAM_MAX_SORT_FILE_SIZE sets the maximum size of the temporary file that MySQL is allowed to use while re-creating a MyISAM index (during REPAIR TABLE, ALTER TABLE, or LOAD DATA INFILE). If the file size would be larger than this value, the index is created using the key cache instead, which is slower. The value is given in bytes and the default value is 2GB.

Set this values a bit larger thatn the largest index file if there is disk space is available to ensure that REPAIR TABLE is be done by sort rather than repair by key cache. Sorting is much faster than repair by key cache.

References:
 Related Posts:


Wednesday, May 26, 2010

MYSQL BULK_INSERT_BUFFER_SIZE Variable

MYSQL BULK_INSERT_BUFFER_SIZE SYSTEM VARIABLE

MyISAM uses a special tree-like cache to make bulk inserts faster for INSERT ... SELECT, INSERT ... VALUES (...), (...), ..., and LOAD DATA INFILE when adding data to nonempty tables. The BULK_INSERT_BUFFER_SIZE variable limits the size of the cache tree in bytes per thread. Setting it to 0 disables this optimization. The default value is 8MB. The maximum value is 4GB. The bulk_insert_buffer_size can be set both on system and session levels.

If data is being added to a non-empty table, tuning the bulk_insert_buffer_size variable can make data insertion faster. Normaily, it shows the improvement when the data to be inserted is more than 10k rows. But it is hard to say what is the right value, so, trail and try with incremental buffer size values.

Also, performance improvement will be more obvisoue if the following variables are also set:
  • MYISAM_SORT_BUFFER_SIZE
  • KEY_BUFFER_SIZE
 In addition, when doing bulk inset, consider disable the indexes before loading the data using the following alter table command:

    alter table t disable keys;

References:
 Related Posts:


MYSQL MYISAM_SORT_BUFFER_SIZE Variable

MYSQL MYISAM_SORT_BUFFER_SIZE Variable

 This is the The size of the buffer that is allocated when sorting MyISAM indexes during a REPAIR TABLE or when creating indexes with CREATE INDEX or ALTER TABLE. The maximum size is 4 GB.

This variable should be set as large as the largest index in the table for index builds if there is sufficient RAM and it is not over 4 GB.

This variable can be set as global variable or session variable.
 
MYSQL>set session MYISAM_SORT_BUFFER_SIZE=1073741824;
or
MYSQL>set global MYISAM_SORT_BUFFER_SIZE=1073741824;

Or, to set it in my.cnf file:
MYISAM_SORT_BUFFER_SIZE=1073741824

References:


Friday, May 21, 2010

MySQL: Optimize Table

MySQL Optimize Table Command

The Optimize table command defragments, reclaims free space and makes concurrent inserts wrk again. For MyISAM table, optimize table command performs the following tasks:
  • Repaire the table if the table has deleted or split rows. This, in trun, help defragment and reclaim free spaces.
  • Sort index pages if it is needed.
  • Update the table’s statistics.
 One way to find out if one should run optimize table on a table is to compare the data length and data free values from the show table status command.

  •   Data_length is the length of the data file.
  •   Data_free is the the number of allocated but unused byte. These included those deleted records.
 When a data_free value is relative large in comparing to date_length, optimize table should be performed agaist the table. Please see the following example:

    mysql> show table status like '%table_name%' \G;

*************************** 1. row ***************************

 Name: table_name
 Engine: MyISAM
 Version: 10
 Row_format: Dynamic
Rows: 1473839
Avg_row_length: 186
Data_length: 420392440
Max_data_length: 281474976710655
Index_length: 52712448
Data_free: 145156624
Auto_increment: NULL
 Create_time: 2010-05-18 18:35:16
 Update_time: 2010-05-20 13:44:47
 Check_time: 2010-05-20 02:21:54
 Collation: latin1_swedish_ci
 Checksum: NULL
 Create_options: delay_key_write=1
 Comment:

   mysql> optimize table table_name;

+--------------------------------+----------+----------+----------+
Table                                           Op             Msg_type  Msg_text
 +--------------------------------+----------+----------+----------+
 dwstgdb.table_name                   optimize      status          OK
 +--------------------------------+----------+----------+----------+
 1 row in set (50.08 sec)

   mysql> show table status like '%table_name%' \G;

*************************** 1. row ***************************
 Name: table_name
 Engine: MyISAM
 Version: 10
 Row_format: Dynamic
 Rows: 1473839
 Avg_row_length: 186
 Data_length: 275235816
 Max_data_length: 281474976710655
 Index_length: 30075904
 Data_free: 0
 Auto_increment: NULL
 Create_time: 2010-05-18 18:35:16
 Update_time: 2010-05-21 00:33:58
 Check_time: 2010-05-21 00:34:14
 Collation: latin1_swedish_ci
 Checksum: NULL
 Create_options: delay_key_write=1
 Comment:

Now we can see that the data_free value down to 0 from 145156624 after performing optimize table. This will indeed deframent the table and speep up the queries. In general, optimize table command should be performed on a reqular basis.

Referemces:



MySQL: Alter Table Disable Keys

MySQL Disable Indexes Command

When inserting,updating or deleting a high percenblege of the data from a table, it is a best pratce to disable indexes. Especaily, when doing a bulk load into a empty table.  To disable indexes, one can drop all the index keys and recreate them later. Or, we can use Alter Table table name Disable Keys to deactivate  indexes  on a MyISAM table.

To do that, before performing the sql statements that involve with insert, update or delete, issue the following command:

         mysql> alter table tablename disable keys;

Now perform the tasks; for example:

        mysql> insert into tablename(...) values( ...);

After the task has been completed, issue the following command to reactivate the indexes:

         mysql>alter table tablename enable keys;

ALTER TABLE tablename DISABLE KEYS tells MySQL to stop updating nonunique indexes. ALTER TABLE tablename ENABLE KEYS tells MySQL to re-create missing indexes. MySQL does this with a special algorithm that is much faster than inserting keys one by one, so disabling keys before performing bulk insert operations should give a considerable speedup. When using ALTER TABLE tablename DISABLE KEYS, please keep in mind the following:

  •  ALTER TABLE tablename DISABLE KEYS will keep UNIQUE keys active.
  • Using ALTER TABLE tablename DISABLE KEYS requires ALTER, INSERT, CREATE and the INDEX privilege for that table.
  • While the nonunique indexes are disabled, they are ignored for statements such as SELECT and EXPLAIN that otherwise would use them.
  • ENABLE KEYS and DISABLE KEYS were not supported for partitioned tables prior to MySQL 5.1.11.
 References:


MySQ: Delay_Key_Write

MySQL Alter Table t Delay_ Key_ Write Command

The delay_key_write option works with MyISAM tables. Delayed key writing tells storage engine not to wtire change index data to disk at the end of any query that changes the data. Instead, the changes are kept in the key buffer until the server prunes the buffer or close the table. This option can boots performance if the MyISAM table is changing frequenry, for example, has a lot of inserts and updates.

To turn on delay_key_write option for a table, you can include the following statement in to create table script:

     Delay_key_write =1

Or for existing table, use the alter table statement:

    MySQL> ALTER TABLE table name DELAY_KEY_WRITE = 1;

Please note that,in MySQL, Alter Table statement takes time to excute and locks the table. Therefore, for large tables, so do this with care.

There are a few thing need to keep in mind when using delay_key_write option as well. First, the changes, mainly, the index changes, will be kep in memory until the buffer cache prunes or the table is closed. A table is closed when:
  •   When the cache is full and a thread tries to open a table that is not in the cache.
  •  When the cache contains more than table_cache entries and a thread is no longer using a table.
  •  When FLUSH TABLES command is issued.
 This means that if the server or table is crashed, there will be indexes cruptions since the indexes on the index file (tablename.MYI) was not updated while the data file (tablename.MYD) was. Therefore, it is recomanded that the myisam-recover variable to be turned on during server startup. For example, set
 --myisam-recover=BACKUP,FORCE in my.cnf file or start the server with –myisam-recover option. Secondly, if the table has many write delays, it will table longer for the table to close since it needs time to flush all the caches to the disk, and it will take longer to run FLUSH TABLES statements(flush table, flush table with read lock) Hence, flush tables should be used a bit frequely if you use delay_key_write option. Lastly, queries may need to wait for the MyISAM engine to free up space in the key buffer cache because the unflushed dirty blocks in the key buffer cache take up space.

Delay_Key_write option can also be enable globally for all MyISAM tables. To do so, set the delay_key_write option to ‘all’:

   delay_key_write = all

However, it is not a commend pratics to most systems because for infrequently written tables, delay_key_write may increase the chance of index corruption.

 References:


Tuesday, May 11, 2010

MySQL: query_cache_size

MySQL Query Cache Variable: query_cache_size

The query cache in MySQL stores the text of SELECT statement as well as its result set. If an identical query is sent to the server later, the server retrieves the result set from the query cache instead parsing and exciting the query again. Therefore, it reduces time and resource that need to execute the query. However, please note that Query Cache only works for identical queries (exactly the same). For example, the following two queries are not considered identical:


Select city, state, zip from address where city = ‘new york’
Vs.
Select city, state, zip from address where city = ‘NEW YORK’

These two queries are also not considered identical:

Select city, state, zip from address where city = ‘NEW YORK’
Vs.
SELECT city, state, zip from address WHERE city= NEW YORK’

Because one use lower case clause word (select and where) and the other use upper case clause (SELECT and WHERE). Also, the cache is not used for queries of the following types:
· Queries that are a subquery of an outer query
· Queries executed within the body of a stored function, trigger, or event

(1.) Query Cache Variables

The following are server variables need to be set to make use of the query cache.

1. Have_query_cache: this variable indicates where or not query cache is available. The default is ‘Yes’ for standard installation.
2. Query_cahce_type: this variable sets the query cache type. It can be set in global and session level. The values for this variable can be 0,1 or 2:
  •  0 - tells the server not to cache results or retrieve results from the query cache. However, this setting dose not deallocate the query cache buffer. Refer to query_cache_size section for how to allocate the query cache buffer.
  •  1 – tells the server to cache all queries except for those that begin with select sql_no_cache. The default for Query_cahce_type is 1.
  •  2 – tells the server to cache query results only for queries that begin with select sql_cache.
3. Query_cache_size: this variable sets the amount of memory allocated for caching query result sets. To disable or deallocate the query cache, set this variable value to 0, which is also the default. Please note that memory are allocated for query cache regardless if Query_cahce_type is set to 0 or not if query_cache_size variable is set to 40960 bytes or more. If the value is set to less than 40960, the server will reset the query_cache_size variable to 0 and place a warning message in the error log.

4. query_cache_min_res_unit: this variable sets the minimum size (in bytes) for) blocks allocated by the query cache. The default value is 4KB. According to the MySQL Manual, this should be adequate for most case. The query cache allocates blocks for storing query results by allocating memory blocks set by the query_cache_min_res_unit variable. If there are many queries with small result sets, the default block size may lead to memory fragmentation; moreover, it can force the query cache to prune (delete) queries from the cache due to lack of memory. In this case, you should decrease the value of the query_cache_min_res_unit variable.

5. Query_cache_limit: this variable sets the maximum size result set that the query cache will store. The default is 1MB.

(2.) Query Cache Status Variables and Monitoring
  1. Qcache_free_memory: this status variable shows that amount of free memory for query cache. If the amount is low in comparing to the amount of the Query_cache_size, memory may be added to the Query_cache_size.
  2. Qcache_not_cached: this status displays the number of queries that were not cached. Not all the queries are cacheable such as a non-select query.
  3. Qchache_queries_in_cache: this status display the number of queries registered in the query cache.
  4. Qcache_total_blocks: this status displays the total number of blocks in the query cache.
  5. Qcache_hits: this status variable displays the number of query cache hits
  6. Qcache_inserts: this status variable displays the number of queries added to the query cache.
  7. Com_select: this status display number of times a select statement has been executed. If a query result is returned from query cache, the server increments the Qcache_hits status variable, not Com_select.
              Com_select = (Qcache_inserts + Qcache_not_cached + queries with errors found during the
                                       column-privileges check)
              The total number of SELECT queries = ( Com_select + Qcache_hits + queries with
                                                                                    errors found  by parser)
              The total number of successfully executed SELECT queries = ( Com_select + Qcache_hits )

              Query cache utilization = (Qcache_hits/ total number of successfully executed SELECT queries)
                          Or
               Query cache utilization = (Qcache_hits/( Com_select + Qcache_hits ))

           Please note that it all depends on the kind of applications, the % Query cache utilization will be
           different.

    8. Qcache_free_blocks: this variable display the number of free memory blocks in the query cache.
    9. Qcache_lowmem_prunes: this status displays the number of queries that were deleted from the query
        cache because of low memory. This status variable value also increases if the cache is not big enough to
        hold the query statements. Also, this variable increase if there were many small query results sets which
        indicates memory fragmentation. If the qcache_lowmem_prunes status variable is greater than 0,
        consider increasing the query_cache_size dynamically and monitor the Qcache_lowmem_prunes status
        variable to make sure that it stop increase.

 (3.) MySQL Commands
To determine if or not the query cache has become fragmented, you can execute the FLUSH QUERY CACHE statement to defragment the cache... After issuing the statement, only a single free block remains. This command dose not removes queries from the cache.

To remove all the queries from the cache, use the RESET QUERY CACHE command.

Some other considerations need to be given when setting up Query Cache. As the size of the query cache grows, it will take longer to scan for matching queries, take longer to retrieve the result sets, and increase mutex contention.

References:
Related Posts:



Tuesday, May 04, 2010

MySQL Key_Buffer_Size System Variable

MySQL MyISAM Key Cache: Key_Buffer_Size System Variable

Index blocks for MyISAM tables are buffered and are shared by all threads. key_buffer_size is the size of the buffer used for index blocks. The key buffer is also known as the key cache. For 32-bit system, the maximum allow amount is 4 GB. As of MySQL 5.0.52, values larger than 4GB are allowed for 64-bit platforms (except 64-bit Windows, for which large values are truncated to 4GB with a warning).


You can increase the value to get better index handling for all reads and multiple writes; on a system whose primary function is to run MySQL using the MyISAM storage engine, 25% of the machine's total memory is an acceptable value for this variable. However, you should be aware that, if you make the value too large (for example, more than 50% of the machine's total memory), your system might start to page and become extremely slow. This is because MySQL relies on the operating system to perform file system caching for data reads,so you must leave some room for the file system cache. You should also consider the memory requirements of any other storage engines that you may be usingin addition to MyISAM.

Please note that the key cache is not used if the key_buffer_size value is too small to allocate the minimal number of block buffers (8).

1.  Key_Buffer_Size Status Varaiables and Monitoring:

The following global status variables should be checked periodically:
 
Key_read_requests: the number of requests that have been executed to read a key block from the cache.
Key_reads: the number of physical reads of a key block from disk.

     Key cache hit (read miss) ratio = (1-(Key_reads/Key_read_requests)) * 100.

The key cache hit ratio represents the proportion of keys that are being read from the key cache in memory instead of from disk. This should normally be greater than 99% for optimum efficiency.

Key_write_requests: the number of requests to write a key block to the cache.
Key_writes: the number of actual writes, logical and physical key block, to disk.

      Kewrite miss ratio = (1-(Key_writes/Key_write_requests)) * 100

The key write ratio is usually near 100% if you are using mostly updates and deletes, but might be much smaller if you tend to do updates that affect many rows at the same time or if you are using the DELAY_KEY_WRITE table option.

Key_blocks_unused: the number of used blocks in the key cache. This value is a high-water mark that indicates the maximum number of blocks that have ever been in use at one time. The more blocks in the key cache, the less the server is using disk-based I/O to examine the index data, therefore, the value
of this variable should be high.
key_cache_block_size: the number of requests that have been executed to read key block from the cache.
The fraction of the key buffer in use can be determined using key_buffer_size in conjunction with the Key_blocks_unused status variable and the buffer block size, which is available from the key_cache_block_size system variable:

       1- ((Key_blocks_unused × key_cache_block_size) / key_buffer_size)

This value is an approximation because some space in the key buffer is being allocated internally for administrative structures.

Key_block_not_flushed: the number of key blocks in the key caches that have changed but have not yet been flushed to disk. These can be considered ‘dirty” key blocks in the key cache. The index will be corrupted during a crash and will need to be repaired before it can be used again. Therefore, this value should be near zero (0).

Adjust the key_buffer_size variable and monitor the key cache hit ratio. Please don’t forget to put the corresponding value of key_buffer_size in your my.cnf; hence, the variable is set properly when the server is restarted.


References:
Related Post:

Sunday, April 11, 2010

MySQL Individual Thread System Variables Setting And Tunning

MySQL Performance Tunning: Individual Thread Variables Setting And Tunning

MySQL use many factors to determine the amount of memory assigned to each

individual thread that is running. Following is a list of variables that can be applied to an individual thread:

· Sort_buffer_size
· Join_buffer_size
· Read_buffer_size
· Tmp_table_size
· Max_heap_table_size

1. The System Variables

Sort_buffer_size: this variable determines the amount of memory to allocate for sourt operations. Each thread that needs to do a sort will allocates a buffer from the variable value. If there is too many records to be sorted in the memory, then a disk based sort takes place instead, Which take longer to perform. To ensure the ORDER BY and GROUP BY operation performance, the sort_buffer_size can be increase.

Join_buffer_size: this variable determines the buffer size for joins that don’t use indexes (use full table scan). Normally, the best way to get fast joins is to add indexes; however, if this is not possible, increase the values of this variable to get a faster full join.

Read_buffer_size: this variable determines the size of the buffer used for catching row data during full table scans. Each thread that does a sequential scn allocates a buffer tof this size(in bytes) fro each table it scans. The default is set to 131072.

Tmp_table_size: this system variable determines the maximum size allowed for a temporary table staored in memory tables before it is convertrd to an on-dis MyISAM table. This variable should be increased if GROUP BY queries involving many rows are performed and the is enough memory to support them.

Max_heap_table_size: This variable sets the maximum size to which MEMORY tables are allowed to grow. The value of the variable is used to calculate MEMORY table MAX_ROWS values. Setting this variable has no effect on any existing MEMORY table, unless the table is re-created with a statement such as CREATE TABLE or altered with ALTER TABLE or TRUNCATE TABLE. A server restart also sets the maximum size of existing MEMORY tables to the global max_heap_table_size value.

2. Getting Server Status and Setting the variable values

Each of the status variables below should be monitored to determine the size of the sort_buffer_size variabl:
· Sort_merge_passes: this status variable displays the number of merge passes that the sort alforithm has had to perform. Increase the sort_buffer_size values if the status value is large.
· Sort_range: this status variable show the number of sorts that were done using ranges
· Sort_rows: this status show the numberof sourted rows
· Sort_scan: this status show number of sourts that were done table scan.

The following stauts measure the number of joins that did not using indexes:
· Select_full_join
· Select_full_range_join
· Select_range
· Select_range_check
· Select_scan
Please note that one join buffer is allocated for each full join between two tables; and, complex joins between several tables for which indexes are not used should be avoided at all costs.

If many sequential scans rae carried out, it is best to increase the read_buffer_size values. According to MySQL documentation, the normal size for this variable should set to ! MB to 2 MB, but can be set up to 8 MB.

The following status need to be taken into account when setting tmp_table_size variable value:

· Created_tmp_disk_tables: this status shows the number of temporary tables on disk cread automatically by the serv while queries were executed. If BLOB/TEXT fields are requested in the queries, MySQL will use disk based temporary tables automatically. We need to determine how many BLOB/TEXT fields reuested statements were ran on the server before deciding change the tmp_table_size variable value.
· Created_ tmp_files: this status shows then number of temporary files has created.
· Created_tmp_tables: this status shows the number of in-memory temporary tables created automatically by the server while excuting queries. If the ratio of created_tmp_disk_tables to created_tmp_tables is hight, consider increasing the tmp_table_size variable value. One or two creates per second is reasonable.

Please note that the max_heap_table_size variable places a maximum limit on tmp_table_size because memory storageengine is used for implicit in-memory temporary tables. For example, if tmp_table_size is 32M and max_heap_table_size is 16M, then the implicit MEMORY temporary tables will convert to on-disk MyISAM tables where they grow past 15M. Therefore, both these variable need to be considered when dealing with implicit temporary tables.

Also, it is best to have the individual session to set the values of the variables if only some queries need lare values and to avoid larger than necessary allocations for the queries which don't need large size.


References:
Related Posts:


Sunday, March 28, 2010

MySQL Server Connection System Variables Setting and Tuning

MySQL Performance Tunning: Server Connection Variables Setting and Tunning

The max_connections MySQL server configuration variables sets the maximum concurrent connections allow to the database server. If there is not enough connections for all the clients to connect, an error message, “too many connects”, is sent back the client and connection is refused. While the max_connections variable dictiated how many connections are allowed to connect to the server, all the variables list bellow should be configured and tuned to maximize the server connection performance.

  •  Max_Connections
  •  Max_user_connections
  •  Table_open_caches (table_cache)
  •  Open_files_limit
  •  thread_cache_size


First, as mentioned above, the connections variable sets the number of concurrent connections to the MySQL server at one time. The default value is 100 which is low for most systems Hower, The value should not be set to too hight because each connection require memory and file descriptor. On the other hand, setting a large number of connections benefit persistent connection because large number of connections allow persistent connection to remain opened rather than being close and reopen frequently. For linux system, the number is range from 500 to 1000. If there is no enough connections, the client will receive a “Too many connections” error message and will be refused to connected to the database server.

The max_user_connections variable sets the maximum number of simultaneous connections allowed to any given MySQL user account. The default is set to 0 which means “no limit.” This variable has a global value that can be set at server startup or runtime. It also has a read-only session value that indicates the effective simultaneous-connection limit that applies to the account associated with the current session. The session value is initialized as follows:
· If the user account has a nonzero MAX_USER_CONNECTIONS resource limit, the session max_user_connections value is set to that limit.
· Otherwise, the session max_user_connections value is set to the global value.

The table_open_cache (table_cache for MySql version before 5.1.3) sets the vaules for the number of open table for all threads that the server allows. The default values is 64. Increase this values will increase the number of file descriptors that MySql requires. It is important to make sure that the OS can handle the open file desriter set by the table_opne_cache variable. If the values of table_open_cache is set to too high a value, the MySql server may run out of file descriptots and refuse connections, fail to excute queries, and has unreliable behavior.

The open_file_limit variable sets the number of open file descriptors the operating system allows to open.

The thread_cache_size variable sets the amount of threads will be cached after a connection disconeects for the purpose of maintaining memory for new connection. When a client disconnects, the client's threads are put in the cache if there are fewer than thread_cache_size threads there. Requests for threads are satisfied by reusing threads taken from the cache if possible, and only when the cache is empty is a new thread created.

1.Setting Variables

To determind what value to be set for each variable, we should start by getting some statistics from the current system. If it is going to be a newly deploy system, we should do some scientifit extimation for each values and mornitoring the system for a few days and correct the statistics from the running system, then, change the configuration values as needed.

To check server status from mysql prompt do the following:
mysql>show status \G;

The about commond will display all the status variables. You can also choose to display a sub set of the status variable by using the ‘Like’ clause.

mysql> show global status like '%connection%';
+----------------------+-------+
| Variable_name | Value |
+----------------------+-------+
| Connections | 76 |
| Max_used_connections | 5 |
+----------------------+-------+
2 rows in set (0.00 sec)

mysql> show global status like 'open%table%';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| Open_table_definitions | 77 |
| Open_tables | 4 |
| Opened_table_definitions | 73 |
| Opened_tables | 11531 |
+--------------------------+-------+
4 rows in set (0.02 sec)

To set server variables, use the SET GLOBAL variable_name=value command in the mysql prompt:

mysql> set global max_connections=300;
Query OK, 0 rows affected (0.00 sec)

Please remember to update the my.cnf file with the updated variable values so the server will start with the new variables in the next restart.

1.1 Getting Server Status

First, check the following server status variables are related to server connections.

· Connections
· Max_used_connections
· Open_tables
· Opened_tables
· Opened_table_definitions
· Open_files
· Threads_cached
· Threads_connected
· Threads_created
· Threads_running

Connections status show the number of connections attemts (successful or fail) to the MySql server since the server started.

Max_used_connections status show the maximum number of connections that have been used simultaneously since the server started. This maximum number this status variable will show is connections + 1. Therefore the connections system variable should be set to above this status number and increased when this status number increased. It is a good idead to set up an alert to monitor the Max_used_connections status variable and proactively change the connections system variable value when the status variable value is close to the connections value.

Open_tables status show the number of currently open tables. A single table open 10 times will add 10 to open_table status. Since mysql is a multi-threaded system, each running quering will open one or more tables, same or different. Truce, open_tables status values will much larger that the total physical number of table a database has. If the open_tables status return a large number, it is a sign that table_cache is not big enough.

Opened_tables status show the number of tables that have been opened. If opend_tables is big, it means that not all the tables are being cached; therefore, table_cache value need to be increased.

Opened_table_definitions status (MySql v5.1.24 or later) show the number of .frm files (table definitions) that have been cached.

Open_files status shows the number of open files descriptors that are currently active.

Threads_cache status shows the number of threads in the thread_cashe

Thread_connected status shows the number of currently opend connections.

Thread_created status shows the number of threads created to handle connections. If the Thread_created status values is big or close to the number of the thread_cache_size server variable, then we should consider to increase the thread_cache_size server variable value. It is good to evaluate the cache miss rate from time to time using the following formular:
Theads_created / Connections.

One way to check if the thread_cache_size is large enough is to gradutely increase it value until the thread_created status value increases increase at a low rate. Thread connection overhead is quite low on linux system.

Thread_running status shows the number if threads that are currently active and not sleeping.

Show Processlist command can also be used (with PROCESS privilege) to obtain the total running processes.

It is also worth to take a look the difference between a process and a thread and how MySQL utilizes threads. A processe has their own heap (memory space), whereas threads share this space. When a new process is created, the operating system has to create a new address space for it to execute a set of instructions with full control over various resources to complete the tasks. On the other hand, threads work under the address space given to its parent process and can change state without having to create a new process. MySQL is a single process multi-threads architecture (a thread-based server architecture) which allows the various executing threads (or lightweight processes) to access key shared resources.

If MySQL is running on Linux, we should also check the open file descriptors (sockets). To check current open file descriptors limits, use the following command:

#>cat /proc/sys/fs/file-max
8192

The above example show that the max open file discriptors is 8192.
We should also check open file limit per shell:
#> ulimit –aH

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
pending signals (-i) 1024
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 77824
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

In the above example, the open file limit is 1024.

1.2 Setting the values

The following server variables are related to connections which should be mornitored and tuned reqularly.

· Max_connections
· Max_user_connections
· Table_open_cache (table_cache for MySQL version prior to 5.1.3)
· Table_definition_cache
· Open_files_limit
· thread_cache_size

The max_connections variable values is default to 100. We should check the max_used_connections status often to reset the connections values when the max_connections status values is closed to the connections variable values.

The Max_user_connections variable values is default to 0 which means unlimit. Nomally, this variable should set to a few connections less than the max_connections. This way, you will always have a few connections reserve for the supper user, root. For example, if your max_connections was set to 500, and you have three application users for three applications to connect to the database, the total connections (max_user_conections) should set to less than 500, say 480.
The table_open_cache variable values is default to 64. We should monitoring the opend_tables status variable frequenry to see if the number are going up constanly. If the opened_tables status going up more than 1 per every 10 minutes on average, then we need to increase the table_open_cache variable values.

The table_definition_cache set the number of table definitions that can be stored in the definition cache. If you use a large number of tables, you can create a large table definition cache to speed up opening of tables. The table definition cache takes less space and does not use file descriptors, unlike the normal table cache. This variable was added in MySQL 5.1.3. The minimum and default values are 1 and 128 before MySQL 5.1.25. The minimum and default are both 256 as of MySQL 5.1.25.

The open_files_limit variable changes the number of file descriptors available to mysqld. You should try increasing the value of this option if mysqld gives you the error Too many open files and preven connections in the error log, it means that the requested number of file descriptors cannot be allocated; therefore, it is time to increase the open_files_limit variable.

The Thread_cache_size variable is default to 8. We should check the threads_created status variable to ensure that this status values is not increased in a high value constantly. Please note that thread_created will show a large numbers if the the thread_cache_size number was not set or set to a small value when the server started because very few threads were cahed. It is not nesccessary to set the thread_cache_size value to the same value as thread_created. We should start by setting the value of thread_cache_size to the values of threads_coonected or the total number of prcesses from show proccesslist command. And, continue monitor the system and increase the value only if the thread_created status value increase in a hight rate more.


In addition to setting the MySQL server variables, we should also set the linux open file descriptor limits to make sure the there is enough file descriptors for MySQL to use. To change the file-max values to 65535, do the following:

# echo "65535" > /proc/sys/fs/file-max

We should also add or change this new value in /etc/sysctl.conf so the server will boot up with this value if it ever need to reboot.
# Maximum number of open files permited
fs.file-max = 65535

To load new values from the sysctl.conf file, do the following:
% sysctl -p /etc/sysctl.conf

Also, we should incress this to the mysql user if you use mysql user to run your server (do it as root):

# vi /etc/security/limits.conf

Modify or add "nofile" (number of file) entries (if the server is run by root, replace mysql with a *.

mysql soft nofile 65535
mysql hard nofile 65535

And add the following to the /etc/pam.d/login file:

# vi /etc/pam.d/login
[Add the line]
session required /lib/security/pam_limits.so

2. Monitoring

We should continue monitoring the status variables, which were mentioned in section 7.2.2 above, to make sure all the server variables were set accurately. Again, performance changes over time. One set of the serser variables work well now will not guarantee will work well three months later. Past performance dose not guarantee future results, it is true for the stock market; it is true for server performance as well.

References:
Related Posts:
----------------------------------

Post Update History:
2010-03-28: Initial post. Draft from initial tuning note.
2010-05-06: Added Reference and Related Posts