CREATE TABLE `lu_ip_to_country` (
id INT UNSIGNED NOT NULL auto_increment,
start_ip_num int(10) UNSIGNED NOT NULL,
end_ip_num int(10) UNSIGNED NOT NULL,
country_id char(2) NOT NULL,
iso_country_code varchar(3) DEFAULT NULL,
country_name varchar(100) DEFAULT NULL,
PRIMARY KEY (id),
key start_end_ip (start_ip_num,end_ip_num)
);
- Start_ip_num: Beginning IP Number
- End_ip_num: Ending IP Number
- Country_id: two character country abbreviations (US, BG, FR, etc.) similar to ISO code
- ISO_country_code: ISO 3166 Country Code
- Country_name: Country Name
Beginning IP Number and Ending IP Number are the number version of the IPs which are calculated as follows:
Ip number = 16777216*w + 65536*x + 256*y + z
Where w, x, y, z represent the IP address (w, x,y,z). For example: if ip is 61.94.7.255,then w=61,x=94,y=7 and z=255.The queries that use to lookup country information are simple as well:
Select country_id from lu_ip_to_country
Where v_ip_num_lookup between start_ip_num and end_ip_num In this query, the v_ip_num_lookup is a variable used to store the calculated ip number. For example, '61.94.7.255’=1029572607.
The other query utilized the MySQL INET_ATON() function, which convert the ip to ip number. i.e. INET_ATON('61.94.7.255') = 1029572607. The query is the following:
Select country_id from lu_ip_to_country
Where INET_ATON('61.94.7.255') between start_ip_num and end_ip_num
These tow queries worked fine but slow. It will take up-to 1.5 second to return the result. And they were even slower when there were concurrent requests to this table. I had tried different indexes, and even tried to use memory table, but the improvements were not notable.
The other week, while I was searching the web, I came across an article, which was written by Jeremy Cole, described how to use MySQL GIS with spatial R-tree indexes for similar ip-to-country lookup in MySQL.I decided to give it a try. And, it turn out that the result was excellence.
Jeremy Cole’s article gave a vary detail explanation on how GIS with spatial R-tree indexes work in this case. Here are the steps on how I implemented and tested it (following what were described in the article).
1. Reconstruct the table to using POLYGON type and spatial index
CREATE TABLE lu_ip_to_country_geoip (
id INT UNSIGNED NOT NULL auto_increment,
ip_poly POLYGON NOT NULL,
start_ip_num int(10) UNSIGNED NOT NULL,
end_ip_num int(10) UNSIGNED NOT NULL,
country_id varchar(3) NOT NULL,
iso_country_code varchchar(3) DEFAULT NULL,
country_name varchar(100) DEFAULT NULL,
PRIMARY KEY (id),
SPATIAL INDEX (ip_poly)
);
2. Export the data from the existing table
Select start_ip_num,
end_ip_num,
country_id,
iso_country_code,
country_name
Into outfile ‘/tmp/ip-country-data.dat’
FIELDS
TERMINATED BY ","
ENCLOSED BY "\""
LINES
TERMINATED BY "\n"
FROM lu_ip_to_country;
If you don’t have the data in the first place, you can download them from MaxMind web site (http://www.maxmind.com/app/country) as mentioned in the article.
3. Load the data into the new table with the MySQL GIS function to build the POLYGON for ip_poly field from the start_ip_num and end_ip_num fields.
LOAD DATA LOCAL INFILE "/tmp/ip-country-data.dat"
INTO TABLE lu_ip_to_country_geoip
FIELDS
TERMINATED BY ","
ENCLOSED BY "\""
LINES
TERMINATED BY "\n"
(
@start_ip_num, @end_ip_num,
@country_id,@iso_country_code, @country_name
)
SET
id := NULL,
start_ip_num := @start_ip_num,
end_ip_num := @end_ip_num,
ip_poly := GEOMFROMWKB(POLYGON(LINESTRING(
/* clockwise, 4 points and back to 0 */
POINT(@start_ip_num, -1), /* 0, top left */
POINT(@end_ip_num, -1), /* 1, top right */
POINT(@end_ip_num, 1), /* 2, bottom right */
POINT(@start_ip_num, 1), /* 3, bottom left */
POINT(@start_ip_num, -1) /* 0, back to start */))),
country_id := @country_code,
iso_country_code := @iso-country_code,
country_name := @country_string;
4. Test queries
mysql> SELECT country_id,iso_country_code,country_name FROM lu_ip_to_country WHERE INET_ATON('61.94.7.255') BETWEEN start_ip_address AND end_ip_address;
+------------+------------------+--------------+
| country_id | iso_country_code | country_name |
+------------+------------------+--------------+
| ID | IDN | Indonesia |
+------------+------------------+--------------+
1 row in set (1.57 sec)
mysql> SELECT country_id,iso_country_code,country_name FROM lu_ip_to_country_geoip WHERE MBRCONTAINS(ip_poly, POINTFROMWKB(POINT(INET_ATON('61.94.7.255'), 0)));
+------------+------------------+--------------+
| country_id | iso_country_code | country_name |
+------------+------------------+--------------+
| ID | IDN | Indonesia |
+------------+------------------+--------------+
1 row in set (0.35 sec)
5. Perform A/B Test on ETL process.
I also selected a reasonable data file and ran through the process using old ip-to-country table for process A and new ip-to-country table for process B and QA’edfor the following:
- Check to make sure all Country IDs/Codes match exactly on both out put
- Compare times
References:
- Jcole’s Weblog: On efficiently geo-referencing IPs with MaxMind GeoIP and MySQL GIS
- MySQL Forum: MySQL and GIS, GeoIP: FAQs, How-To, Articles, Blogs
- MySQL Manual: Spatial Extensions
- MySQL Manual: Creating Spatial Indexes
- MySQL Manual: Class Polygon
- MySQL Manual: Relations on Geometry Minimal Bounding Rectangles (MBRs)
- MySQL Manual: Class Point
- Other References on the web
No comments:
Post a Comment