MySQL Great Circle Distance (Haversine formula) MySQL Great Circle Distance (Haversine formula) php php

MySQL Great Circle Distance (Haversine formula)


From Google Code FAQ - Creating a Store Locator with PHP, MySQL & Google Maps:

Here's the SQL statement that will find the closest 20 locations that are within a radius of 25 miles to the 37, -122 coordinate. It calculates the distance based on the latitude/longitude of that row and the target latitude/longitude, and then asks for only rows where the distance value is less than 25, orders the whole query by distance, and limits it to 20 results. To search by kilometers instead of miles, replace 3959 with 6371.

SELECT id, ( 3959 * acos( cos( radians(37) ) * cos( radians( lat ) ) * cos( radians( lng ) - radians(-122) ) + sin( radians(37) ) * sin(radians(lat)) ) ) AS distance FROM markers HAVING distance < 25 ORDER BY distance LIMIT 0 , 20;


$greatCircleDistance = acos( cos($latitude0) * cos($latitude1) * cos($longitude0 - $longitude1) + sin($latitude0) * sin($latitude1));

with latitude and longitude in radian.

so

SELECT   acos(       cos(radians( $latitude0 ))    * cos(radians( $latitude1 ))    * cos(radians( $longitude0 ) - radians( $longitude1 ))    + sin(radians( $latitude0 ))     * sin(radians( $latitude1 ))  ) AS greatCircleDistance  FROM yourTable;

is your SQL query

to get your results in Km or miles, multiply the result with the mean radius of Earth (3959 miles,6371 Km or 3440 nautical miles)

The thing you are calculating in your example is a bounding box.If you put your coordinate data in a spatial enabled MySQL column, you can use MySQL's build in functionality to query the data.

SELECT   idFROM spatialEnabledTableWHERE   MBRWithin(ogc_point, GeomFromText('Polygon((0 0,0 3,3 3,3 0,0 0))'))


If you add helper fields to the coordinates table, you can improve response time of the query.

Like this:

CREATE TABLE `Coordinates` (`id` INT(10) UNSIGNED NOT NULL COMMENT 'id for the object',`type` TINYINT(4) UNSIGNED NOT NULL DEFAULT '0' COMMENT 'type',`sin_lat` FLOAT NOT NULL COMMENT 'sin(lat) in radians',`cos_cos` FLOAT NOT NULL COMMENT 'cos(lat)*cos(lon) in radians',`cos_sin` FLOAT NOT NULL COMMENT 'cos(lat)*sin(lon) in radians',`lat` FLOAT NOT NULL COMMENT 'latitude in degrees',`lon` FLOAT NOT NULL COMMENT 'longitude in degrees',INDEX `lat_lon_idx` (`lat`, `lon`))    

If you're using TokuDB, you'll get even better performance if you add clusteringindexes on either of the predicates, for example, like this:

alter table Coordinates add clustering index c_lat(lat);alter table Coordinates add clustering index c_lon(lon);

You'll need the basic lat and lon in degrees as well as sin(lat) in radians, cos(lat)*cos(lon) in radians and cos(lat)*sin(lon) in radians for each point.Then you create a mysql function, smth like this:

CREATE FUNCTION `geodistance`(`sin_lat1` FLOAT,                              `cos_cos1` FLOAT, `cos_sin1` FLOAT,                              `sin_lat2` FLOAT,                              `cos_cos2` FLOAT, `cos_sin2` FLOAT)    RETURNS float    LANGUAGE SQL    DETERMINISTIC    CONTAINS SQL    SQL SECURITY INVOKER   BEGIN   RETURN acos(sin_lat1*sin_lat2 + cos_cos1*cos_cos2 + cos_sin1*cos_sin2);   END

This gives you the distance.

Don't forget to add an index on lat/lon so the bounding boxing can help the search instead of slowing it down (the index is already added in the CREATE TABLE query above).

INDEX `lat_lon_idx` (`lat`, `lon`)

Given an old table with only lat/lon coordinates, you can set up a script to update it like this: (php using meekrodb)

$users = DB::query('SELECT id,lat,lon FROM Old_Coordinates');foreach ($users as $user){  $lat_rad = deg2rad($user['lat']);  $lon_rad = deg2rad($user['lon']);  DB::replace('Coordinates', array(    'object_id' => $user['id'],    'object_type' => 0,    'sin_lat' => sin($lat_rad),    'cos_cos' => cos($lat_rad)*cos($lon_rad),    'cos_sin' => cos($lat_rad)*sin($lon_rad),    'lat' => $user['lat'],    'lon' => $user['lon']  ));}

Then you optimize the actual query to only do the distance calculation when really needed, for example by bounding the circle (well, oval) from inside and outside.For that, you'll need to precalculate several metrics for the query itself:

// assuming the search center coordinates are $lat and $lon in degrees// and radius in km is given in $distance$lat_rad = deg2rad($lat);$lon_rad = deg2rad($lon);$R = 6371; // earth's radius, km$distance_rad = $distance/$R;$distance_rad_plus = $distance_rad * 1.06; // ovality error for outer bounding box$dist_deg_lat = rad2deg($distance_rad_plus); //outer bounding box$dist_deg_lon = rad2deg($distance_rad_plus/cos(deg2rad($lat)));$dist_deg_lat_small = rad2deg($distance_rad/sqrt(2)); //inner bounding box$dist_deg_lon_small = rad2deg($distance_rad/cos(deg2rad($lat))/sqrt(2));

Given those preparations, the query goes something like this (php):

$neighbors = DB::query("SELECT id, type, lat, lon,       geodistance(sin_lat,cos_cos,cos_sin,%d,%d,%d) as distance       FROM Coordinates WHERE       lat BETWEEN %d AND %d AND lon BETWEEN %d AND %d       HAVING (lat BETWEEN %d AND %d AND lon BETWEEN %d AND %d) OR distance <= %d",  // center radian values: sin_lat, cos_cos, cos_sin       sin($lat_rad),cos($lat_rad)*cos($lon_rad),cos($lat_rad)*sin($lon_rad),  // min_lat, max_lat, min_lon, max_lon for the outside box       $lat-$dist_deg_lat,$lat+$dist_deg_lat,       $lon-$dist_deg_lon,$lon+$dist_deg_lon,  // min_lat, max_lat, min_lon, max_lon for the inside box       $lat-$dist_deg_lat_small,$lat+$dist_deg_lat_small,       $lon-$dist_deg_lon_small,$lon+$dist_deg_lon_small,  // distance in radians       $distance_rad);

EXPLAIN on the above query might say that it's not using index unless there's enough results to trigger such. The index will be used when there's enough data in the coordinates table.You can add FORCE INDEX (lat_lon_idx)to the SELECT to make it use the index with no regards to the table size, so you can verify with EXPLAIN that it is working correctly.

With the above code samples you should have a working and scalable implementation of object search by distance with minimal error.