
Friday, June 15, 2012

Create Date Dimension Table in MySQL

Date Dimension Table

A date dimension stores attributes of a calendar date. Assigning a day foreign key to a fact table makes it more efficient to generate date-related query results. There are many ways to define the table, and more attributes can be added based on the data warehouse model and requirements. Here is a sample table created for a MySQL database.

create table dim_day (
 day_key int not null auto_increment,   -- primary key
 date datetime,      -- date: 2008-08-18 00:00:00
 date_num int(8),    -- numeric value, in YYYYMMDD, 20080818
 day_num int(2),     -- numeric value, 18
 day_of_year int(4), -- the day of the year
 day_of_week int(2), -- the day of the week
 day_of_week_name varchar(20), -- day of week name (Monday, Tuesday, etc.)
 week_num int(2),    -- week of the year
 week_begin_date datetime,   -- week begin date
 week_end_date datetime,     -- week end date
 last_week_begin_date datetime,   -- prior week begin date
 last_week_end_date datetime,     -- prior week end date
 last_2_week_begin_date datetime, -- prior two week begin date
 last_2_week_end_date datetime,   -- prior two week end date
 month_num int(2),        -- month in number, i.e. 12
 month_name varchar(20),  -- month in name, i.e. December
 yearmonth_num int(6),    -- year and month in number, i.e. 201212
 last_month_num int(2),   -- prior month in number, i.e. 11
 last_month_name varchar(20), -- prior month in name, i.e. November
 last_month_year int(4),  -- prior month's year, i.e. 2012
 last_yearmonth_num int(6), -- prior year and month in number, i.e. 201211
 quarter_num int(2),  -- quarter in number, i.e. 4
 year_num int(4),     -- year in number, i.e. 2012
 created_date timestamp not null,  -- date record was created
 updated_date timestamp not null,  -- date record was updated
 primary key (day_key)
 );

Here is a procedure used to populate the table in MySQL. It takes two input parameters, start_date and end_date, for example, 1990-01-01 and 2099-12-31.

use dwdb;

drop procedure if exists sp_day_dim;

truncate table dim_day;

delimiter //

CREATE PROCEDURE sp_day_dim (in p_start_date datetime, in p_end_date datetime)
BEGIN

  Declare StartDate datetime;
  Declare EndDate datetime;
  Declare RunDate datetime;

-- Set date variables

  Set StartDate = p_start_date; -- update this value to reflect the earliest date that you will use.
  Set EndDate = p_end_date; -- update this value to reflect the latest date that you will use.
  Set RunDate = StartDate;

-- Loop through each date and insert into the dim_day table

WHILE RunDate <= EndDate DO

INSERT Into dim_day(
 date ,
 date_num,
 day_num ,
 Day_of_Year,
 Day_of_Week,
 Day_of_week_name,
 Week_num,
 week_begin_date,
 week_end_date,
 last_week_begin_date,
 last_week_end_date,
 last_2_week_begin_date,
 last_2_week_end_date,
 Month_num ,
 Month_Name,
 yearmonth_num,
 last_month_num,
 last_month_name,
 last_month_year,
 last_yearmonth_num,
 Quarter_num ,
 Year_num  ,
 created_date, 
 updated_date 
)
select 
RunDate date
,CONCAT(year(RunDate), lpad(MONTH(RunDate),2,'0'),lpad(day(RunDate),2,'0')) date_num
,day(RunDate) day_num
,DAYOFYEAR(RunDate) day_of_year
,DAYOFWEEK(RunDate) day_of_week
,DAYNAME(RunDate) day_of_week_name
,WEEK(RunDate) week_num
,DATE_ADD(RunDate, INTERVAL(1-DAYOFWEEK(RunDate)) DAY) week_begin_date
,ADDTIME(DATE_ADD(RunDate, INTERVAL(7-DAYOFWEEK(RunDate)) DAY),'23:59:59') week_end_date
,DATE_ADD(RunDate, INTERVAL ((1-DAYOFWEEK(RunDate))-7) DAY) last_week_begin_date
,ADDTIME(DATE_ADD(RunDate, INTERVAL ((7-DAYOFWEEK(RunDate))-7) DAY),'23:59:59') last_week_end_date
,DATE_ADD(RunDate, INTERVAL ((1-DAYOFWEEK(RunDate))-14) DAY) last_2_week_begin_date
,ADDTIME(DATE_ADD(RunDate, INTERVAL ((7-DAYOFWEEK(RunDate))-14) DAY),'23:59:59') last_2_week_end_date
,MONTH(RunDate) month_num
,MONTHNAME(RunDate) month_name
,CONCAT(year(RunDate), lpad(MONTH(RunDate),2,'0')) YEARMONTH_NUM
,MONTH(date_add(RunDate,interval -1 month)) last_month_num
,MONTHNAME(date_add(RunDate,interval -1 month)) last_month_name
,year(date_add(RunDate,interval -1 month)) last_month_year
,CONCAT(year(date_add(RunDate,interval -1 month)),lpad(MONTH(date_add(RunDate,interval -1 month)),2,'0')) Last_YEARMONTH_NUM
,QUARTER(RunDate) quarter_num
,YEAR(RunDate) year_num
,now() created_date
,now() updated_date
;
-- commit;
-- increase the value of the @date variable by 1 day

Set RunDate = ADDDATE(RunDate,1);

END WHILE;
commit;
END;
//
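To populate the table, reset the delimiter and call the procedure with the start and end dates, for example (a count query is included as a quick sanity check):

delimiter ;
call sp_day_dim('1990-01-01', '2099-12-31');
select count(*) from dim_day;  -- expect one row per day in the range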


Monday, November 21, 2011

LOAD SLOWLY CHANGING DIMENSION TYPE 2 USING MYSQL STORED PROCEDURE


Unlike Oracle (with its MERGE statement) or some other databases, MySQL does not support this kind of "upsert" (performing an update or insert in one SQL statement). Therefore, it requires some coding to achieve both insert and update when loading a Type 2 slowly changing dimension.

Here is my lab exercise, in which I used a common dimension: product.

1. Tables.

I created the following tables during my lab exercise.

1.1 Dimension table: dim_product.

This is the product dimension table in which we want to keep a history of changes to product name, product description, product category, and price.
Create table dim_product(
PRODUCT_SID int(12) NOT NULL auto_increment ,
PRODUCT_ID int(12) NOT NULL,
PRODUCT_CATEGORY_ID int(10) not null,
PRODUCT_NAME VARCHAR(50),
PRODUCT_DESCRIPTION VARCHAR(200),
PRICE DECIMAL(10,2),
PRICE_EFFECTIVE_DATE DATETIME,
LAST_MODIFIED_DATE DATETIME,
EFFECTIVE_START_DATE DATETIME,
EFFECTIVE_END_DATE DATETIME,
IS_CURRENT_ROW char(1),
CREATED_DATE DATETIME,
UPDATED_DATE DATETIME,
primary key (product_sid)
);

1.2 Staging table: stg_product.

This is the staging table that we will use to store new product records, assuming product records are transferred to this table first before being loaded into the product dimension.

create table pcdb.stg_product (
product_id int(12) not null,
product_category_id int(10) not null,
product_name varchar(50),
product_description varchar(200),
price decimal(8,2),
price_effective_date datetime,
last_modified_date datetime,
created_date datetime);

1.3 Temporary table: tmp_product.

This is a temporary table that will be used in the stored procedure to process data.

create table tmp_product (
PRODUCT_SID int(12),
PRODUCT_ID int(12) NOT NULL,
PRODUCT_CATEGORY_ID int(10) not null,
PRODUCT_NAME VARCHAR(50),
PRODUCT_DESCRIPTION VARCHAR(200),
PRICE DECIMAL(10,2),
PRICE_EFFECTIVE_DATE DATETIME,
LAST_MODIFIED_DATE DATETIME,
EFFECTIVE_START_DATE DATETIME,
EFFECTIVE_END_DATE DATETIME,
IS_CURRENT_ROW char(1),
created_date datetime,
scd_row_type_id tinyint(2)
);

2. The Stored Procedure

I created a stored procedure to accomplish the SCD Type 2 dimension load in MySQL. Here is the simple stored procedure.

use labdb;
drop procedure if exists sp_dim_product;
delimiter //

CREATE PROCEDURE sp_dim_product()

BEGIN

/* Procedure name: sp_dim_product
Description: load the product dimension
SCD: Type 2.
Dependence: stg_product, tmp_product, dim_product
Table size: large. Keeps track of changes in product.
*/

Declare v_start_date datetime;
declare v_load_id int;
declare v_error_msg varchar(200);
declare v_proc_name varchar(36) default 'sp_dim_product'; 
declare v_validation tinyint;
declare v_max_data_date datetime;

DECLARE EXIT HANDLER FOR SQLEXCEPTION 
BEGIN 
set v_error_msg = concat('Procedure ', v_proc_name, ' encountered a SQLEXCEPTION error: error code unknown');
SELECT v_error_msg; 
END;

truncate table tmp_product;

insert into tmp_product(
PRODUCT_SID ,
PRODUCT_ID ,
PRODUCT_CATEGORY_ID ,
PRODUCT_NAME ,
PRODUCT_DESCRIPTION ,
PRICE ,
PRICE_EFFECTIVE_DATE ,
LAST_MODIFIED_DATE ,
EFFECTIVE_START_DATE ,
EFFECTIVE_END_DATE ,
IS_CURRENT_ROW ,
scd_row_type_id ,
created_date
)
select 
ifnull(d.product_sid,0),
p.product_id,
p.product_category_id,
p.product_name,
p.product_description,
p.price,
p.price_effective_date,
p.last_modified_date,
p.last_modified_date as effective_start_date,
'2050/12/31' as effective_end_date,
'T',
( case when d.product_sid is null then 1 
when (d.product_sid is not null and
(p.product_category_id <> d.product_category_id or
p.product_name <> d.product_name or
p.product_description <> d.PRODUCT_DESCRIPTION or
p.price <> d.price) 
) then 2
else 0 
end) as scd_row_type_id ,
now()
from stg_product p
left join dim_product d 
on (p.product_id = d.product_id and d.is_current_row = 'T')
;

update dim_product d, tmp_product t
set d.is_current_row = 'F', 
d.EFFECTIVE_END_DATE = t.LAST_MODIFIED_DATE,
d.updated_date = now()
where d.product_sid = t.product_sid
and t.scd_row_type_id = 2;

insert into dim_product(
PRODUCT_ID ,
PRODUCT_CATEGORY_ID ,
PRODUCT_NAME ,
PRODUCT_DESCRIPTION ,
PRICE ,
PRICE_EFFECTIVE_DATE ,
LAST_MODIFIED_DATE ,
EFFECTIVE_START_DATE ,
EFFECTIVE_END_DATE ,
IS_CURRENT_ROW ,
CREATED_DATE ,
UPDATED_DATE 
)
select PRODUCT_ID ,
PRODUCT_CATEGORY_ID ,
PRODUCT_NAME ,
PRODUCT_DESCRIPTION ,
PRICE ,
PRICE_EFFECTIVE_DATE ,
LAST_MODIFIED_DATE ,
LAST_MODIFIED_DATE ,
EFFECTIVE_END_DATE ,
IS_CURRENT_ROW ,
now() ,
now()
from tmp_product
where scd_row_type_id in (1,2)
;

commit;

END;
//

3. How it works

A slowly changing dimension type 2 (SCD2) is defined to keep track of changes to attributes in the underlying source table. While there are many attributes in a table, the business rules decide which changes to track. Here, for demonstration purposes, I chose to keep track of changes to each of the following attributes: product name, product description, product category, and price.

In the procedure, I used a left outer join on product id between the dimension table (existing records) and the staging table (new and modified records). For the new records, the product_sid will be null because they were not in the dimension table; I used a scd_row_type_id = 1 flag to indicate that they were new records and assigned each of them a '0' product_sid. For the existing records, there would be a product_sid for each record, and I used a scd_row_type_id = 2 flag to indicate that they were modified existing records.

select 
ifnull(d.product_sid,0),
p.product_id,
p.product_category_id,
p.product_name,
p.product_description,
p.price,
p.price_effective_date,
p.last_modified_date,
p.last_modified_date as effective_start_date,
'2050/12/31' as effective_end_date,
'T',
( case when d.product_sid is null then 1 
when (d.product_sid is not null and
(p.product_category_id <> d.product_category_id or
p.product_name <> d.product_name or
p.product_description <> d.PRODUCT_DESCRIPTION or
p.price <> d.price) 
) then 2
else 0 
end) as scd_row_type_id ,
now()
from stg_product p
left join dim_product d 
on (p.product_id = d.product_id and d.is_current_row = 'T')
;


To identify the modified records, I used a case statement. Following the business rules, I compared each of the attributes to see if any of them had been modified. If so, I assigned a '2' to the scd_row_type_id flag.

( case when d.product_sid is null then 1 
when (d.product_sid is not null and
(p.product_category_id <> d.product_category_id or
p.product_name <> d.product_name or
p.product_description <> d.PRODUCT_DESCRIPTION or
p.price <> d.price) )
 then 2

To indicate the current record, I created a column called IS_CURRENT_ROW and set it to 'T' if it was the current row. For existing (history) records, I set IS_CURRENT_ROW to 'F'.

update dim_product d, tmp_product t
set d.is_current_row = 'F', 
d.EFFECTIVE_END_DATE = t.LAST_MODIFIED_DATE,
d.updated_date = now()
where d.product_sid = t.product_sid
and t.scd_row_type_id = 2;


At the end of the procedure, I inserted both the new and modified records, for which t.scd_row_type_id was '1' or '2'. Please note that records that had not been modified would have scd_row_type_id = 0 and would not be inserted into the dimension table.

insert into dim_product(
PRODUCT_ID ,
PRODUCT_CATEGORY_ID ,
PRODUCT_NAME ,
PRODUCT_DESCRIPTION ,
PRICE ,
PRICE_EFFECTIVE_DATE ,
LAST_MODIFIED_DATE ,
EFFECTIVE_START_DATE ,
EFFECTIVE_END_DATE ,
IS_CURRENT_ROW ,
CREATED_DATE ,
UPDATED_DATE 
)
select PRODUCT_ID ,
PRODUCT_CATEGORY_ID ,
PRODUCT_NAME ,
PRODUCT_DESCRIPTION ,
PRICE ,
PRICE_EFFECTIVE_DATE ,
LAST_MODIFIED_DATE ,
LAST_MODIFIED_DATE ,
EFFECTIVE_END_DATE ,
IS_CURRENT_ROW ,
now() ,
now()
from tmp_product
where scd_row_type_id in (1,2)
;

To test the procedure, first run the following statements to insert two records into the staging table:

truncate table stg_product;

insert into stg_product ( PRODUCT_ID, PRODUCT_CATEGORY_ID, PRODUCT_NAME,PRODUCT_DESCRIPTION,PRICE,PRICE_EFFECTIVE_DATE, LAST_MODIFIED_DATE, CREATED_DATE)
values (1,8,'Apple iPad 36 GB','Apple iPad 36 GB with case', 800,now(),now(),now());

insert into stg_product ( PRODUCT_ID, PRODUCT_CATEGORY_ID, PRODUCT_NAME,PRODUCT_DESCRIPTION,PRICE,PRICE_EFFECTIVE_DATE, LAST_MODIFIED_DATE,CREATED_DATE)
values (2,7,'Canon 12 MPIX Digital Camera','Canon 12 MPIX Digital Camera, Zoom Lens, Case', 150,now(),now(),now());

commit;


Check the staging table to make sure the records are there.

MySQL>select * from stg_product;

Now run the stored procedure that was created in section 2; a minimal call is shown below. After running the procedure, run the following query to check the records in the dim_product table.
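The call itself is a single statement (assuming the delimiter has been reset to the default ';'):

MySQL>call sp_dim_product();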

MySQL>select * from dim_product;

There should be two rows in the dimension table, with the effective_start_date, effective_end_date, and is_current_row columns populated. Now, let's find out how the procedure handles SCD Type 2 records. To begin, run the following queries against the staging table:

update stg_product set price = 700 , price_effective_date = now(), last_modified_date = now() where product_id = 1;
insert into stg_product (
PRODUCT_ID, 
PRODUCT_CATEGORY_ID,
PRODUCT_NAME,
PRODUCT_DESCRIPTION,
PRICE,
PRICE_EFFECTIVE_DATE,
LAST_MODIFIED_DATE,
CREATED_DATE)
values (3,6,'EPSon Color Printer',
'EPSon color print with high definition photo prints', 200,now(),now(),now());
commit;


The above statements updated the record where product_id equals 1 to change its price to 700, and inserted one more new record, product_id 3, into the stg_product table. To help understand what the record set looks like from the select statement that uses the left outer join, let's copy the query and execute it to get the record set.

select 
ifnull(d.product_sid,0),
p.product_id,
p.product_category_id,
p.product_name,
p.product_description,
p.price,
p.price_effective_date,
p.last_modified_date,
p.last_modified_date as effective_start_date,
'2050/12/31' as effective_end_date,
'T',
( case when d.product_sid is null then 1 
when (d.product_sid is not null and
(p.product_category_id <> d.product_category_id or
p.product_name <> d.product_name or
p.product_description <> d.PRODUCT_DESCRIPTION or
p.price <> d.price) 
) then 2
else 0 
end) as scd_row_type_id ,
now()
from stg_product p
left join dim_product d 
on (p.product_id = d.product_id and d.is_current_row = 'T')
;


As you can see, we have one row for the modified record, product_id 1, and one row for the new record, product_id 3 (product_id 2 is unchanged, so its flag is 0). At the end of each row you can see the scd_row_type_id column: it is 2 for product_id 1 (a modified record) and 1 for product_id 3 (a new record).

Now, run the stored procedure in section 2 to load the data into the dimension table.

4. Summary

The purpose of this procedure is to show how to use an outer join to maintain a Type 2 slowly changing dimension in MySQL. The procedure does not include all the necessary error-checking routines. Also, there are other 'real world' situations that need to be considered when implementing it. For example, if a large number of records have been modified at any given time, there may be a performance issue when performing the update. Also, as the dimension table grows bigger, the outer join will get slower. Therefore, other techniques need to be used to make the code more efficient.



Friday, July 22, 2011

Passing Parameters To A MYSQL Query Inside A Shell Script

Passing Parameters To A MySQL Query Inside A Shell Script.

From time to time, I find myself needing to run ad hoc queries to pull data. I found it more convenient to create a shell script that accepts the query file as an argument and can pass variables to the query.

/* script:get_customer_record.sql */

Select c_id, c_first_name,c_last_name, c_address,
……
,last_modified_date
from customer
where last_modified_date >=@start_date and last_modified_date <= @end_date;

@start_date and @end_date are variables to be passed at run time.

Here is the wrapper shell script:

#!/bin/bash

## script name; mysql_script_runner.sh
## wrapper script to execute mysql script with variables

ARGS=4

if [ $# -ne "$ARGS" ]
then 

 echo "you passed $# parameters"
 echo "Usage: `basename $0` sql_script_file start_date end_date output_file"

exit 1
fi 
sql_script=$1 
start_date=$2 
end_date=$3 
output_file=$4 

#run mysql query with paramenters 

/usr/bin/mysql -uuser_id -ppassword -h mysql-host -A -e "set @start_date='${start_date}'; set @end_date='${end_date}'; source ${sql_script};" >${output_file};

exit

# end of script.


To execute the script from command line, I can do something like this:

 # mysql_script_runner.sh get_customer_record.sql '2011-06-01' '2011-06-12 23:59:59' cust_rec.dat

I can also set it up to run as a cron job with some small changes.
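For example, a nightly crontab entry might look like the following (a sketch: the paths, schedule, and fixed date range are assumptions, and a real job would compute the date range inside the script):

# run every day at 01:30 (all paths here are hypothetical)
30 1 * * * /home/dbuser/mysql_script_runner.sh /home/dbuser/get_customer_record.sql '2011-06-01' '2011-06-12 23:59:59' /home/dbuser/cust_rec.dat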


Tuesday, October 26, 2010

MYSQL SPEEDUP INSERT INTO A MYISAM TABLE

According to the MySQL Manual, there are a few ways to speed up inserts while adding data to a MyISAM table. Following is the original documentation from the MySQL manual, plus the notes I added while I was experimenting with real data loads.

1. Background Information
 
The time required for inserting a row, according to the MySQL manual, is determined by the following factors, where the numbers indicate approximate proportions:
· Connecting: (3) (there are many server variables that can be used to help optimize user connections)
· Sending query to server: (2)
· Parsing query: (2)
· Inserting row: (1 × size of row)
· Inserting indexes: (1 × number of indexes)
· Closing: (1)

There are two additional things that need to be considered regarding overhead for inserts:
  • This does not take into consideration the initial overhead to open tables, which is done once for each concurrently running query. To reduce the time needed to open tables, cache them by setting the table_cache variable. Please note that the table_cache, max_connections, and max_tmp_tables system variables affect the maximum number of files the server keeps open. If table_cache is set too high, MySQL may run out of file descriptors and refuse connections, fail to perform queries, and be very unreliable. You also have to take into account that the MyISAM storage engine needs two file descriptors for each unique open table. Please refer to the related note on connection-related variable settings.
  • The size of the table slows down the insertion of indexes by log N, assuming B-tree indexes.

2. Methods Which Can Be Used to Speed Up Inserts
These are the methods which can be used to speed up inserts. You will notice that most of the methods mentioned here relate to MySQL MyISAM tables.

(1.) If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.
   Insert into t1 (a,b,c,d) values (1,2,3,'2010-01-01'), (3,5,6,'2010-02-01'), (7,8,9,'2010-03-01');

Instead of:

   Insert into t1 (a,b,c,d) values (1,2,3,'2010-01-01');
   Insert into t1 (a,b,c,d) values (3,5,6,'2010-02-01');
   Insert into t1 (a,b,c,d) values (7,8,9,'2010-03-01');

(2.) If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster.
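For example, the buffer could be raised for the current session before a large batch insert (a minimal sketch; the 64MB value is an arbitrary starting point, not a recommendation):

   set session bulk_insert_buffer_size = 1024 * 1024 * 64;  -- 64MB, up from the 8MB default
   -- ... run the large INSERT statements here ...
   set session bulk_insert_buffer_size = DEFAULT;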

(3.) If multiple clients are inserting a lot of rows, you can get higher speed by using the INSERT DELAYED statement. The DELAYED hint defers the inserts by storing the insert statements in a memory queue until the target table is free from reads. Hence, the insert statement returns quickly, and all the insert statements stored in the queue can be written in one block, which is more optimized than writing one statement at a time.



Here is the syntax for INSERT DELAYED:

Insert delayed into t1 (a,b,c,d) values (1,2,3,'2010-01-01'), (3,5,6,'2010-02-01'), (7,8,9,'2010-03-01');
See Section 12.2.5.2, INSERT
Please note that INSERT DELAYED only works with MyISAM, MEMORY, ARCHIVE and BLACKHOLE (v. 5.1.19) tables. And there is a downside to using INSERT DELAYED: there will be a performance penalty if the table is empty or not being used, and it adds overhead to the server to hold an additional thread for each table that has delayed rows, etc. Please read the manual for details on the negative impacts.
These system status variables can be used to monitor INSERT DELAYED use:
  • Delayed_insert_threads -- Number of handler threads
  • Delayed_writes -- Number of rows written with INSERT DELAYED
  • Not_flushed_delayed_rows -- Number of rows waiting to be written

 (4.) For a MyISAM table, you can use concurrent inserts to add rows at the same time that SELECT statements are running. This works well for tables that have more reads than writes, because it just keeps adding data at the end of the data file.
To enable concurrent inserts, we need to set the concurrent_insert system variable. The values for the variable are:

· 0 -- Disables concurrent inserts
· 1 -- Enables concurrent inserts for MyISAM tables that do not have holes. A hole means that there is free space in the data file, i.e., rows were deleted from the table. This is the default value.
· 2 -- Enables concurrent inserts for all MyISAM tables, even those that have holes. For a table with a hole,  new rows are inserted at the end of the table if it is in use by another thread. Otherwise, MySQL acquires a normal write lock and inserts the row into the hole.

To set the variable with other values (0 or 2) rather than the default value, for example concurrent_insert = 2, enter the parameter in the my.cnf file and restart the server. Or you can also dynamically set it in MySQL, for example:
   MySQL> set global concurrent_insert = 2;

Here are a few things to keep in mind.

1. ) Under circumstances where concurrent inserts can be used, there is seldom any need to use the DELAYED modifier for INSERT statements.
2. ) If you are using the binary log, concurrent inserts are converted to normal inserts for CREATE ... SELECT or INSERT ... SELECT statements. This is done to ensure that you can re-create an exact copy of your tables by applying the log during a backup operation.
3. ) With LOAD DATA INFILE, if you specify CONCURRENT with a MyISAM table that satisfies the condition for concurrent inserts (that is, it contains no free blocks in the middle), other sessions can retrieve data from the table while LOAD DATA is executing.
4.) Use of the CONCURRENT option affects the performance of LOAD DATA a bit, even if no other session is using the table at the same time.
The MySQL documentation has more information on using concurrent inserts.
(5.) Using the bulk load command, LOAD DATA INFILE, to load data into a table from a text file can be 20 times faster than using INSERT statements.

Updating indexes takes time and resources during inserts. When data is loaded into a table that has many indexes, we can follow the steps below to speed up the load.

1.) Disable the indexes.
Non-unique indexes of a MyISAM table can be disabled using the following statement:
MySQL> ALTER TABLE t1 DISABLE KEYS;
The ALTER TABLE ... DISABLE KEYS statement also performs a flush of the table.
2.) Load data into the table using LOAD DATA INFILE statement. This does not update any indexes and therefore is very fast.
3.) Enable the indexes
Indexes can be enabled using the following statement:

MySQL> ALTER TABLE tbl_name ENABLE KEYS;

Here is a note on MySQL Disable and Enable Indexes Command.

http://databobjr.blogspot.com/2010/05/mysql-alter-table-disable-keys.html

You can also use myisamchk --keys-used=0 -rq /path/to/db/tbl_name to disable index keys and use myisamchk -rq /path/to/db/tbl_name to enable index keys. However, if you used myisamchk to disable or enable index keys, you must issue Flush Tables statement or a mysqladmin flush-tables command. More information can be found here: http://dev.mysql.com/doc/refman/5.0/en/myisamchk.html
.
(6.) To increase performance for MyISAM tables, for both LOAD DATA INFILE and INSERT, enlarge the key cache by increasing the key_buffer_size system variable.

3. SUMMARY

The methods mentioned above are not the only methods that can speed up inserts, nor should they be applied blindly in that order. It takes a lot of trial and error to find the combination that works for a particular environment.




Thursday, October 21, 2010

MySQL Thread States Repair By Sorting Vs. Repair With Key Cache

When performing any of the REPAIR TABLE, ALTER TABLE, LOAD DATA INFILE, or ENABLE KEYS commands on a MyISAM table, the indexes are rebuilt. Indexes are rebuilt (repaired) in two modes:
  •  Repair by sorting -- The repair code uses a sort to create indexes.
  •  Repair using keycache -- The repair code creates keys one by one through the key cache. This is much slower than repair by sorting.
To start repairing, the server first tests whether the table can be repaired by sorting, provided it meets the following requirements:
  •   The table has at least one key
  •   The total size needed for an individual key is less than myisam_max_sort_file_size
If it meets the above requirements, it uses either regular sorting, if myisam_repair_threads = 1 (the default), building one key at a time, or parallel sorting, if myisam_repair_threads > 1, using n threads in parallel (n = total keys in the table).
 
If it fails to satisfy the above conditions, it falls back to the expensive keycache repair mode.

Therefore, to avoid keycache repair, set myisam_max_sort_file_size to at least the maximum index size.
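For example (a sketch; the 8GB figure is an arbitrary assumption sized to a large index file):

   MySQL> set global myisam_max_sort_file_size = 8 * 1024 * 1024 * 1024;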





Friday, July 30, 2010

IP To Country Lookup Table Using MySQL Spatial Index

I have an IP to Country lookup table that is used by various applications. The table is very simple:

CREATE TABLE `lu_ip_to_country` (
     id INT UNSIGNED NOT NULL auto_increment,
     start_ip_num int(10) UNSIGNED NOT NULL,
     end_ip_num int(10) UNSIGNED NOT NULL,
     country_id char(2) NOT NULL,
     iso_country_code varchar(3) DEFAULT NULL,
     country_name varchar(100) DEFAULT NULL,
     PRIMARY KEY (id),
     key start_end_ip (start_ip_num,end_ip_num)
);
  •  Start_ip_num: Beginning IP Number
  •  End_ip_num: Ending IP Number
  •  Country_id: two character country abbreviations (US, BG, FR, etc.) similar to ISO code
  •  ISO_country_code: ISO 3166 Country Code
  •  Country_name: Country Name

The Beginning IP Number and Ending IP Number are the numeric versions of the IPs, which are calculated as follows:
       Ip number = 16777216*w + 65536*x + 256*y + z
where w, x, y, z represent the four octets of the IP address (w.x.y.z). For example, if the IP is 61.94.7.255, then w=61, x=94, y=7 and z=255.
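The formula can be checked directly in MySQL; for 61.94.7.255 it produces the same number as the built-in INET_ATON() function:

   MySQL> select 16777216*61 + 65536*94 + 256*7 + 255 as manual_ip_num, INET_ATON('61.94.7.255') as inet_ip_num;

Both expressions return 1029572607.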

 The queries used to look up country information are simple as well:
      Select country_id from lu_ip_to_country
      Where v_ip_num_lookup between start_ip_num and end_ip_num

In this query, v_ip_num_lookup is a variable used to store the calculated IP number. For example, '61.94.7.255' = 1029572607.

The other query utilizes the MySQL INET_ATON() function, which converts the IP to an IP number, i.e. INET_ATON('61.94.7.255') = 1029572607. The query is the following:

   Select country_id from lu_ip_to_country
   Where INET_ATON('61.94.7.255') between start_ip_num and end_ip_num

These two queries worked, but they were slow: it could take up to 1.5 seconds to return the result, and they were even slower when there were concurrent requests against this table. I had tried different indexes, and even tried to use a memory table, but the improvements were not notable.

The other week, while I was searching the web, I came across an article written by Jeremy Cole that described how to use MySQL GIS with spatial R-tree indexes for a similar ip-to-country lookup in MySQL. I decided to give it a try, and it turned out that the results were excellent.

Jeremy Cole's article gives a very detailed explanation of how GIS with spatial R-tree indexes works in this case. Here are the steps of how I implemented and tested it (following what was described in the article).

1. Reconstruct the table using the POLYGON type and a spatial index

CREATE TABLE lu_ip_to_country_geoip (
     id INT UNSIGNED NOT NULL auto_increment,
     ip_poly POLYGON NOT NULL,
     start_ip_num int(10) UNSIGNED NOT NULL,
     end_ip_num int(10) UNSIGNED NOT NULL,
     country_id varchar(3) NOT NULL,
     iso_country_code varchar(3) DEFAULT NULL,
    country_name varchar(100) DEFAULT NULL,
    PRIMARY KEY (id),
    SPATIAL INDEX (ip_poly)
);

2. Export the data from the existing table
     Select start_ip_num,
                end_ip_num,
                country_id,
               iso_country_code,
              country_name
             Into outfile '/tmp/ip-country-data.dat'
     FIELDS
            TERMINATED BY ","
              ENCLOSED BY "\""
     LINES
           TERMINATED BY "\n"
      FROM lu_ip_to_country;

If you don’t have the data in the first place, you can download them from MaxMind web site (http://www.maxmind.com/app/country) as mentioned in the article.

3. Load the data into the new table, using the MySQL GIS functions to build the POLYGON for the ip_poly field from the start_ip_num and end_ip_num fields.

    LOAD DATA LOCAL INFILE "/tmp/ip-country-data.dat"
         INTO TABLE lu_ip_to_country_geoip
         FIELDS
            TERMINATED BY ","
             ENCLOSED BY "\""
         LINES
              TERMINATED BY "\n"
           (
               @start_ip_num, @end_ip_num,
                @country_id,@iso_country_code, @country_name
             )
       SET
            id := NULL,
            start_ip_num := @start_ip_num,
           end_ip_num := @end_ip_num,
           ip_poly := GEOMFROMWKB(POLYGON(LINESTRING(
          /* clockwise, 4 points and back to 0 */
          POINT(@start_ip_num, -1), /* 0, top left */
          POINT(@end_ip_num, -1), /* 1, top right */
          POINT(@end_ip_num, 1), /* 2, bottom right */
          POINT(@start_ip_num, 1), /* 3, bottom left */
         POINT(@start_ip_num, -1) /* 0, back to start */))),
       country_id := @country_id,
       iso_country_code := @iso_country_code,
       country_name := @country_name;

4. Test queries

mysql> SELECT country_id,iso_country_code,country_name FROM lu_ip_to_country WHERE INET_ATON('61.94.7.255') BETWEEN start_ip_num AND end_ip_num;
+------------+------------------+--------------+
| country_id | iso_country_code | country_name |
+------------+------------------+--------------+
| ID | IDN | Indonesia |
+------------+------------------+--------------+
1 row in set (1.57 sec)

mysql> SELECT country_id,iso_country_code,country_name FROM lu_ip_to_country_geoip WHERE MBRCONTAINS(ip_poly, POINTFROMWKB(POINT(INET_ATON('61.94.7.255'), 0)));
+------------+------------------+--------------+
| country_id | iso_country_code | country_name |
+------------+------------------+--------------+
| ID | IDN | Indonesia |
+------------+------------------+--------------+
1 row in set (0.35 sec)

5. Perform A/B Test on ETL process.

I also selected a reasonable data file and ran it through the process, using the old ip-to-country table for process A and the new ip-to-country table for process B, and QA'ed for the following:

  •  Checked to make sure all country IDs/codes matched exactly in both outputs
  • Compared run times
Both processes produced the same country code lookups, and the new table made the same process run much faster.



Thursday, June 03, 2010

MYSQL UPGRADE 5.0 TO 5.1

UPGRADE MYSQL 5.0.45 TO ENTERPRISE 5.1.40

I did a MySQL upgrade from 5.0.45 to 5.1.40 on two servers. Before starting the actual upgrade, I spent some time reading manuals and web postings. It turned out that the time was well spent. The upgraded version was Enterprise 5.1.40; however, the upgrade process should be the same when upgrading to the equivalent Community version.

Here is some information on the existing software:

· Server version: 5.0.45-log Source distribution
· OS: Redhat Linux 4. (2.6.18-92.el5), 64 bits.

Note that the existing installation was from the packages that were bundled with the Red Hat Linux distribution.

Following are the steps I took to do the upgrade, along with the issues I encountered in each step.

1. Obtain a full database dump (backup) and made a backup copy of the my.cnf file.

These two steps are important, whether for an upgrade or any other major server work, and should not be skipped.

#>mysqldump -u root -p dbs >dbs-dump.sql
#>cp /etc/my.cnf /tmp/my.cnf

2. Stop MySQL

#>/etc/init.d/mysqld stop

3. Upgrade (Install) MySQL Server

#> rpm -Uvh MySQL-server-advanced-gpl-5.1.40sp1-0.rhel4.x86_64.rpm

Here I got my first error:

   error: Failed dependencies:
   libmysqlclient.so.15()(64bit) is needed by (installed) perl-DBD-MySQL-3.0007-1.fc6.x86_64
  libmysqlclient.so.15(libmysqlclient_15)(64bit) is needed by (installed) perl-DBD-MySQL-3.0007- 1.fc6.x86_64

Initially, I thought this error was due to some compatibility issues, so I decided to go ahead and to install the shared libraries (shared-compat package) first:

 #> rpm -Uvh MySQL-shared-compat-advanced-gpl-5.1.40sp1-0.rhel4.x86_64.rpm

The package was installed without any error. I then tried to install the server package again.

#> rpm -Uvh MySQL-server-advanced-gpl-5.1.40sp1-0.rhel4.x86_64.rpm

This time, the server installation started without the dependency problems, but it gave me more warning and error messages:

Preparing... ########################################### [100%]
******************************************************************
A MySQL server package (mysql-server-5.0.45-7.el5) is installed.

The current MySQL server package is provided by a different vendor (Red Hat, Inc.) than MySQL AB or Sun Microsystems, Inc..

Some files may be installed to different locations, including log files and the service startup script in /etc/init.d/.
Upgrading directly from MySQL 5.0 to MySQL 5.1 may not be safe in all cases. A manual dump and restore using mysqldump is recommended. It is important to review the MySQL manual's Upgrading section for version-specific incompatibilities.

A manual upgrade is required.

- Ensure that you have a complete, working backup of your data and my.cnf files
- Shut down the MySQL server cleanly
- Remove the existing MySQL packages. Usually this command will list the packages you should remove:
rpm -qa | grep -i '^mysql-'

You may choose to use 'rpm --nodeps -ev ' to remove the package which contains the mysqlclient shared library. The library will be reinstalled by the MySQL-shared-compat package.

- Install the new MySQL packages supplied by Sun Microsystems, Inc.
- Ensure that the MySQL server is started
- Run the 'mysql_upgrade' program

This is a brief description of the upgrade process. Important details can be found in the MySQL manual, in the Upgrading section.
******************************************************************
error: %pre(MySQL-server-advanced-gpl-5.1.40sp1-0.rhel4.x86_64) scriptlet failed, exit status 1
error: install: %pre scriptlet failed (2), skipping MySQL-server-advanced-gpl-5.1.40sp1-0.rhel4

Well, it turned out there was no such thing as an "Upgrade". It required that we remove the existing packages, install all the new packages, and then run the upgrade script to upgrade the existing databases.

Following the instructions, I checked what was installed in the server:

#> rpm -qa | grep -i '^mysql-'

mysql-5.0.45-7.el5
MySQL-shared-compat-advanced-gpl-5.1.40sp1-0.rhel4
mysql-server-5.0.45-7.el5

Since I just installed the MySQL-shared-compat-advanced-gpl-5.1.40sp1-0.rhel4 package, all I needed to remove were the following packages:

· mysql-5.0.45-7.el5
· mysql-server-5.0.45-7.el5

#> rpm -e mysql-server-5.0.45-7.el5.x86_64
warning: /var/log/mysqld.log saved as /var/log/mysqld.log.rpmsave

# > rpm -e mysql-5.0.45-7.el5
warning: /etc/my.cnf saved as /etc/my.cnf.rpmsave

Now, I was ready to install the new packages again:

# > rpm -Uvh MySQL-server-advanced-gpl-5.1.40sp1-0.rhel4.x86_64.rpm

The installation was successful with the following messages:

Preparing... ########################################### [100%]
1:MySQL-server-advanced-g########################################### [100%]

PLEASE REMEMBER TO SET A PASSWORD FOR THE MySQL root USER !
To do so, start the server, then issue the following commands:
/usr/bin/mysqladmin -u root password 'new-password'
/usr/bin/mysqladmin -u root -h mysql.domain.com password 'new-password'

Alternatively you can run:
/usr/bin/mysql_secure_installation
which will also give you the option of removing the test databases and anonymous user created by default. This is strongly recommended for production servers.

See the manual for more instructions.
Please report any problems with the /usr/bin/mysqlbug script!
The latest information about MySQL is available at http://www.mysql.com/
Support MySQL by buying support/licenses from http://shop.mysql.com/

Starting MySQL.[ OK ]
Giving mysqld 2 seconds to start

By checking the log and using ps -ef | grep mysql, I confirmed that the server was started.

4. Installed the client package.

Since the server was running, I wanted to install the client package so I could connect to the server.

#> rpm -Uvh MySQL-client-advanced-gpl-5.1.40sp1-0.rhel4.x86_64.rpm

No problem installing the client. So, I tried to connect to the server and it worked.

#>mysql -u root

Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.40sp1-enterprise-gpl-advanced MySQL Enterprise Server - Advanced Edition (GPL)
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases;

+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| test               |
+--------------------+
3 rows in set (0.00 sec)

So far so good.

5. Installed other packages.

There were other packages, such as the shared libraries, the embedded MySQL server library, and the devel libraries and include files (MySQL-devel package), etc., that needed to be installed in my case. I installed all of them without any problem.

6. Restarted the server with my.cnf file.

Since the server was started without the my.cnf file, I needed to stop the server, copy my.cnf back to the /etc/ directory (see step 1), and restart the server. While executing the rpm -e command to remove the old packages, the my.cnf file had been renamed to my.cnf.rpmsave (see step 3).

#>/etc/init.d/mysql stop
#>cp /tmp/my.cnf /etc/my.cnf
#>/etc/init.d/mysql start

Checking the error log, I noticed the following error messages:

100602 13:05:34 [ERROR] Column count of mysql.db is wrong. Expected 22, found 20. The table is probably corrupted
100602 13:05:34 [ERROR] mysql.user has no `Event_priv` column at position 29
100602 13:05:34 [ERROR] Event Scheduler: An error occurred when initializing system tables. Disabling the Event Scheduler.
100602 13:05:34 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.1.40sp1-enterprise-gpl-advanced-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 MySQL Enterprise Server - Advanced Edition (GPL)
100602 13:44:57 [Warning] Statement may not be safe to log in statement format. Statement: ALTER TABLE general_log MODIFY COLUMN server_id INTEGER UNSIGNED NOT NULL
100602 13:44:57 [Warning] Statement may not be safe to log in statement format. Statement: ALTER TABLE slow_log MODIFY COLUMN server_id INTEGER UNSIGNED NOT NULL

These messages were expected because I had not run the mysql_upgrade script.

7. Executed the mysql_upgrade script

The script, in my case, was located in /usr/bin.

#>/usr/bin/mysql_upgrade

Again, it was a smooth run.

8. Loaded the database dump file

At this point, I could load the database back to the server.

#>mysql -u root -p < dbs-dump.sql

There was no error or warning message.

9. Restarted the server

Here, just to restart the server one more time to make sure there was no error message.

#>/etc/init.d/mysql stop
#>/etc/init.d/mysql start

I checked the error log and there were no errors. I also ran some queries and connected an application to the database server; no errors there either. So, the server was upgraded.

Summary: It is very important to read the MySQL manual and other related articles, and to search the web for any new information, before doing the actual upgrade. Also, if possible, try it out on a development server first before doing it in a production environment. There is more than one way to do the upgrade, but each has its own upsides and downsides. I did the above upgrade on two QA servers where there was very little time pressure. One thing to keep in mind is that doing a mysqldump and load takes time, depending on the size of the database; therefore, a detailed plan is needed for a large production system upgrade. There are other methods, such as replication, that can be used to reduce downtime. It is also recommended to check for performance issues and query consistency issues after the upgrade. In some cases, mysql_upgrade may not run as smoothly as it did in my case. The MySQL manual also lists some known bugs and issues which are worth checking against the existing installation, to see whether they apply and to have a plan to deal with them if they do.

Following are the summary of steps for the upgrade:

1. Backup databases and my.cnf file
2. Stop MySQL server
3. Remove existing MySQL packages
4. Install MySQL 5.1 server package
5. Install MySQL 5.1 client and other MySQL 5.1 packages
6. Restart MySQL with my.cnf file
7. Execute mysql_upgrade script
8. Load database backup to MySQL server (v.5.1)
9. Restart server and check error log.



Thursday, May 27, 2010

MYSQL MYISAM_RECOVER OPTIONS


This variable sets the mode for automatic recovery when a MyISAM table crashes. The option value is any combination of the following values:
  •   DEFAULT --Recovery without backup, forcing, or quick checking. It is the same as not giving any option.
  •  BACKUP --If the data file was changed during recovery, save a backup of the tbl_name.MYD file as tbl_name-datetime.BAK.
  •  FORCE --Run recovery even if we would lose more than one row from the .MYD file.
  •  QUICK --Don't check the rows in the table if there aren't any delete blocks.
 Multiple values can be set at the same time. To use multiple values, separate them by commas.
   
    myisam_recover=backup,force

To disable this option, set it to "".

    myisam-recover=""
  
If one or more myisam-recover options are used, each time mysqld opens a MyISAM table it checks whether the table is marked as crashed or was not closed properly. If so, mysqld runs a check on the table and attempts to repair it if the table was corrupted. Following are the steps which mysqld will perform, according to the MySQL Manual:
  •  The server checks the table for errors.
  •  If the server finds an error, it tries to do a fast table repair (with sorting and without re-creating the data file).
  •  If the repair fails because of an error in the data file (for example, a duplicate-key error), the server tries again, this time re-creating the data file.
  •  If the repair still fails, the server tries once more with the old repair option method (write row by row without sorting). This method should be able to repair any type of error and has low disk space requirements.
 If the recovery wouldn't be able to recover all rows from previously completed statements and you didn't specify FORCE in the value of the --myisam-recover option, automatic repair aborts with an error message in the error log:

       Error: Couldn't repair table: test.g00pages

If FORCE was specified, a warning like this is written instead:

      Warning: Found 344 of 354 rows when repairing ./test/g00pages

Note that if the automatic recovery value includes BACKUP, the recovery process creates files with names of the form tbl_name-datetime.BAK. You should have a cron script that automatically moves these files from the database directories to backup media.
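A minimal sketch of such a cron script (the data directory and backup paths here are assumptions):

#!/bin/bash
## move MyISAM auto-repair backup files (tbl_name-datetime.BAK) to backup media
datadir=/var/lib/mysql        # assumed MySQL data directory
backupdir=/backup/myisam-bak  # assumed backup location
find ${datadir} -name "*.BAK" -exec mv {} ${backupdir}/ \;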

myisam-recover is not a dynamic variable; hence, it requires a server restart to change the value.



MYSQL MYISAM_MAX_SORT_FILE_SIZE Variable


MYISAM_MAX_SORT_FILE_SIZE sets the maximum size of the temporary file that MySQL is allowed to use while re-creating a MyISAM index (during REPAIR TABLE, ALTER TABLE, or LOAD DATA INFILE). If the file size would be larger than this value, the index is created using the key cache instead, which is slower. The value is given in bytes and the default value is 2GB.

Set this value a bit larger than the largest index file, if disk space is available, to ensure that REPAIR TABLE is done by sorting rather than by the key cache. Sorting is much faster than repair by key cache.
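For example, in the my.cnf file (the 10GB value here is an arbitrary assumption sized to a large index file):

[mysqld]
myisam_max_sort_file_size = 10737418240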



Wednesday, May 26, 2010

MYSQL BULK_INSERT_BUFFER_SIZE Variable

MYSQL BULK_INSERT_BUFFER_SIZE SYSTEM VARIABLE

MyISAM uses a special tree-like cache to make bulk inserts faster for INSERT ... SELECT, INSERT ... VALUES (...), (...), ..., and LOAD DATA INFILE when adding data to nonempty tables. The BULK_INSERT_BUFFER_SIZE variable limits the size of the cache tree in bytes per thread. Setting it to 0 disables this optimization. The default value is 8MB and the maximum value is 4GB. The bulk_insert_buffer_size can be set at both the system and session levels.

If data is being added to a non-empty table, tuning the bulk_insert_buffer_size variable can make data insertion faster. Normally, it shows an improvement when the data to be inserted is more than 10k rows. But it is hard to say what the right value is, so experiment with incremental buffer size values.

Also, the performance improvement will be more obvious if the following variables are also set:
  • MYISAM_SORT_BUFFER_SIZE
  • KEY_BUFFER_SIZE
 In addition, when doing bulk inserts, consider disabling the indexes before loading the data using the following ALTER TABLE command (a combined sketch follows below):

    alter table t disable keys;
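Putting these together, a bulk load might look like the following (a sketch; the table name, file path, and 256MB buffer values are assumptions):

    set session bulk_insert_buffer_size = 1024 * 1024 * 256;
    set session myisam_sort_buffer_size = 1024 * 1024 * 256;
    alter table t disable keys;
    load data infile '/tmp/t.dat' into table t;
    alter table t enable keys;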



MYSQL MYISAM_SORT_BUFFER_SIZE Variable


 This is the size of the buffer that is allocated when sorting MyISAM indexes during a REPAIR TABLE or when creating indexes with CREATE INDEX or ALTER TABLE. The maximum size is 4 GB.

This variable should be set as large as the largest index in the table for index builds, if there is sufficient RAM and the value does not exceed 4 GB.

This variable can be set as a global variable or a session variable.
 
MYSQL>set session MYISAM_SORT_BUFFER_SIZE=1073741824;
or
MYSQL>set global MYISAM_SORT_BUFFER_SIZE=1073741824;

Or, to set it in my.cnf file:
MYISAM_SORT_BUFFER_SIZE=1073741824



Friday, May 21, 2010

MySQL: Optimize Table

MySQL Optimize Table Command

The OPTIMIZE TABLE command defragments a table, reclaims free space, and makes concurrent inserts work again. For a MyISAM table, the OPTIMIZE TABLE command performs the following tasks:
  • Repairs the table if the table has deleted or split rows. This, in turn, helps defragment the table and reclaim free space.
  • Sorts index pages if needed.
  • Updates the table's statistics.
 One way to find out whether one should run OPTIMIZE TABLE on a table is to compare the data_length and data_free values from the SHOW TABLE STATUS command.

  •   Data_length is the length of the data file.
  •   Data_free is the number of allocated but unused bytes. This includes deleted records.
 When the data_free value is relatively large compared to data_length, OPTIMIZE TABLE should be performed against the table. Please see the following example:

    mysql> show table status like '%table_name%' \G;

*************************** 1. row ***************************

 Name: table_name
 Engine: MyISAM
 Version: 10
 Row_format: Dynamic
Rows: 1473839
Avg_row_length: 186
Data_length: 420392440
Max_data_length: 281474976710655
Index_length: 52712448
Data_free: 145156624
Auto_increment: NULL
 Create_time: 2010-05-18 18:35:16
 Update_time: 2010-05-20 13:44:47
 Check_time: 2010-05-20 02:21:54
 Collation: latin1_swedish_ci
 Checksum: NULL
 Create_options: delay_key_write=1
 Comment:

   mysql> optimize table table_name;

+--------------------+----------+----------+----------+
| Table              | Op       | Msg_type | Msg_text |
+--------------------+----------+----------+----------+
| dwstgdb.table_name | optimize | status   | OK       |
+--------------------+----------+----------+----------+
1 row in set (50.08 sec)

   mysql> show table status like '%table_name%' \G;

*************************** 1. row ***************************
 Name: table_name
 Engine: MyISAM
 Version: 10
 Row_format: Dynamic
 Rows: 1473839
 Avg_row_length: 186
 Data_length: 275235816
 Max_data_length: 281474976710655
 Index_length: 30075904
 Data_free: 0
 Auto_increment: NULL
 Create_time: 2010-05-18 18:35:16
 Update_time: 2010-05-21 00:33:58
 Check_time: 2010-05-21 00:34:14
 Collation: latin1_swedish_ci
 Checksum: NULL
 Create_options: delay_key_write=1
 Comment:

Now we can see that the data_free value dropped from 145156624 to 0 after performing OPTIMIZE TABLE. This will indeed defragment the table and speed up the queries. In general, the OPTIMIZE TABLE command should be performed on a regular basis.




MySQL: Delay_Key_Write

MySQL Alter Table Delay_Key_Write Command

The delay_key_write option works with MyISAM tables. Delayed key writing tells the storage engine not to write changed index data to disk at the end of any query that changes the data. Instead, the changes are kept in the key buffer until the server prunes the buffer or closes the table. This option can boost performance if the MyISAM table changes frequently, for example, if it has a lot of inserts and updates.

To turn on the delay_key_write option for a table, you can include the following statement in the CREATE TABLE script:

     Delay_key_write =1

Or for existing table, use the alter table statement:

    MySQL> ALTER TABLE table_name DELAY_KEY_WRITE = 1;

Please note that, in MySQL, an ALTER TABLE statement takes time to execute and locks the table; therefore, for large tables, do this with care.

There are a few things to keep in mind when using the delay_key_write option as well. First, the changes, mainly the index changes, will be kept in memory until the buffer cache prunes them or the table is closed. A table is closed:
  •   When the cache is full and a thread tries to open a table that is not in the cache.
  •  When the cache contains more than table_cache entries and a thread is no longer using a table.
  •  When a FLUSH TABLES command is issued.
 This means that if the server or table crashes, there will be index corruption, since the indexes in the index file (tablename.MYI) were not updated while the data file (tablename.MYD) was. Therefore, it is recommended that the myisam-recover variable be turned on during server startup; for example, set
 --myisam-recover=BACKUP,FORCE in the my.cnf file or start the server with the --myisam-recover option. Secondly, if the table has many delayed writes, it will take longer for the table to close, since it needs time to flush all the cached changes to disk, and it will take longer to run FLUSH TABLES statements (flush table, flush tables with read lock). Hence, flush tables should be issued a bit more frequently if you use the delay_key_write option. Lastly, queries may need to wait for the MyISAM engine to free up space in the key buffer cache, because the unflushed dirty blocks in the key buffer cache take up space.

The delay_key_write option can also be enabled globally for all MyISAM tables. To do so, set the delay_key_write option to 'all':

   delay_key_write = all

However, this is not a common practice for most systems, because for infrequently written tables, delay_key_write may increase the chance of index corruption.



MySQL: Failed to CREATE PROCEDURE ERROR 1307 (HY000)

ERROR 1307 (HY000): Failed to CREATE PROCEDURE …

When I tried to create a procedure, I got the above error message. The user had the needed CREATE ROUTINE privileges and the procedure syntax was correct. I used the following command to check the version and found out that it was v.5.1.30.

mysql> select version();
+----------------------+
| version() |
+----------------------+
| 5.1.30-community-log |
+----------------------+
1 row in set (0.00 sec)

After some research, it turned out that the database was a copy from an earlier version. So, to resolve the problem, I needed to run the mysql_upgrade script. For a Red Hat Linux installation, it is located in /usr/bin, but it is easy to find out where it is by using the Linux 'which' command:

#> which mysql_upgrade
/usr/bin/mysql_upgrade

To execute the script, run it from the command line as the MySQL root user with the password:

#>/usr/bin/mysql_upgrade -u root -p

Now it should be fine. The script also printed messages to the screen about what was updated during the execution.


Tuesday, May 11, 2010

MySQL: Startup MySQL Server Failed

Starting MySQL/etc/init.d/mysql: line 159: kill: (9193) - No such process [FAILED]

When I tried to start the MySQL server (v. 5.0.51a, on RedHat Linux 4) using /etc/init.d/mysql start, I got the following error message and the server failed to start.

#>/etc/init.d/mysql start
Starting MySQL/etc/init.d/mysql: line 159: kill: (9193) - No such process [FAILED]

Unfortunately, there was no error in the error log to tell what caused the problem. After some debugging, it turned out that one of the server variables, which had been added to the my.cnf file after the last successful server start, was not supported. The server variable was:

    open_file_limit

After commenting out this variable in the my.cnf file, the server started without problem.

In general, if any variable is wrongly entered (misspelled, for example) in the my.cnf file, the server will fail to start. However, there will not be any error message in the error log to inform the user what caused the error.

MySQL: query_cache_size

MySQL Query Cache Variable: query_cache_size

The query cache in MySQL stores the text of a SELECT statement together with its result set. If an identical query is sent to the server later, the server retrieves the result set from the query cache instead of parsing and executing the query again. Therefore, it reduces the time and resources needed to execute the query. However, please note that the query cache only works for identical queries (exactly the same). For example, the following two queries are not considered identical:


Select city, state, zip from address where city = 'new york'
Vs.
Select city, state, zip from address where city = 'NEW YORK'

These two queries are also not considered identical:

Select city, state, zip from address where city = 'NEW YORK'
Vs.
SELECT city, state, zip from address WHERE city = 'NEW YORK'

because one uses lowercase keywords (select and where) and the other uses uppercase keywords (SELECT and WHERE). Also, the cache is not used for queries of the following types:
· Queries that are a subquery of an outer query
· Queries executed within the body of a stored function, trigger, or event

(1.) Query Cache Variables

The following server variables need to be set to make use of the query cache.

1. have_query_cache: this variable indicates whether or not the query cache is available. The default is 'YES' for a standard installation.
2. query_cache_type: this variable sets the query cache type. It can be set at the global and session levels. The values for this variable can be 0, 1 or 2:
  •  0 - tells the server not to cache results or retrieve results from the query cache. However, this setting does not deallocate the query cache buffer. Refer to the query_cache_size section for how to allocate the query cache buffer.
  •  1 - tells the server to cache all queries except for those that begin with SELECT SQL_NO_CACHE. The default for query_cache_type is 1.
  •  2 - tells the server to cache query results only for queries that begin with SELECT SQL_CACHE.
3. query_cache_size: this variable sets the amount of memory allocated for caching query result sets. To disable or deallocate the query cache, set this variable to 0, which is also the default. Please note that memory is allocated for the query cache, regardless of whether query_cache_type is set to 0, whenever the query_cache_size variable is set to 40960 bytes or more. If the value is set to less than 40960, the server will reset the query_cache_size variable to 0 and place a warning message in the error log.

4. query_cache_min_res_unit: this variable sets the minimum size (in bytes) for blocks allocated by the query cache. The default value is 4KB. According to the MySQL Manual, this should be adequate for most cases. The query cache stores query results by allocating memory blocks of the size set by the query_cache_min_res_unit variable. If there are many queries with small result sets, the default block size may lead to memory fragmentation; moreover, it can force the query cache to prune (delete) queries from the cache due to lack of memory. In this case, you should decrease the value of the query_cache_min_res_unit variable.

5. query_cache_limit: this variable sets the maximum size result set that the query cache will store. The default is 1MB.
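For example, a 64MB query cache could be enabled dynamically (a sketch; the sizes are arbitrary assumptions, and the same settings would also go in my.cnf to survive a restart):

MySQL> set global query_cache_type = 1;
MySQL> set global query_cache_size = 67108864;   -- 64MB
MySQL> set global query_cache_limit = 2097152;   -- 2MB maximum result set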

(2.) Query Cache Status Variables and Monitoring
  1. Qcache_free_memory: this status variable shows the amount of free memory for the query cache. If the amount is low compared to query_cache_size, memory may be added to query_cache_size.
  2. Qcache_not_cached: this status variable displays the number of queries that were not cached. Not all queries are cacheable, such as non-SELECT queries.
  3. Qcache_queries_in_cache: this status variable displays the number of queries registered in the query cache.
  4. Qcache_total_blocks: this status variable displays the total number of blocks in the query cache.
  5. Qcache_hits: this status variable displays the number of query cache hits.
  6. Qcache_inserts: this status variable displays the number of queries added to the query cache.
  7. Com_select: this status variable displays the number of times a SELECT statement has been executed. If a query result is returned from the query cache, the server increments the Qcache_hits status variable, not Com_select.
              Com_select = (Qcache_inserts + Qcache_not_cached + queries with errors found during the column-privileges check)
              The total number of SELECT queries = (Com_select + Qcache_hits + queries with errors found by the parser)
              The total number of successfully executed SELECT queries = (Com_select + Qcache_hits)

              Query cache utilization = (Qcache_hits / total number of successfully executed SELECT queries)
                          or
              Query cache utilization = (Qcache_hits / (Com_select + Qcache_hits))

           Please note that, depending on the kind of application, the query cache utilization percentage will be different.

    8. Qcache_free_blocks: this status variable displays the number of free memory blocks in the query cache.
    9. Qcache_lowmem_prunes: this status variable displays the number of queries that were deleted from the query cache because of low memory. This value also increases if the cache is not big enough to hold the query statements. Also, this variable increases if there are many small query result sets, which indicates memory fragmentation. If the Qcache_lowmem_prunes status variable is greater than 0, consider increasing query_cache_size dynamically and monitor the Qcache_lowmem_prunes status variable to make sure that it stops increasing.
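All of the status variables above can be inspected with SHOW STATUS, for example:

MySQL> show global status like 'Qcache%';
MySQL> show global status like 'Com_select';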

 (3.) MySQL Commands
To determine whether or not the query cache has become fragmented, you can execute the FLUSH QUERY CACHE statement to defragment the cache. After issuing the statement, only a single free block remains. This command does not remove queries from the cache.

To remove all the queries from the cache, use the RESET QUERY CACHE command.

Some other considerations need to be given when setting up the query cache. As the size of the query cache grows, it will take longer to scan for matching queries, longer to retrieve the result sets, and mutex contention will increase.
