Is there a REAL performance difference between INT and VARCHAR primary keys? Is there a REAL performance difference between INT and VARCHAR primary keys? mysql mysql

Is there a REAL performance difference between INT and VARCHAR primary keys?


You make a good point that you can avoid some number of joined queries by using what's called a natural key instead of a surrogate key. Only you can assess if the benefit of this is significant in your application.

That is, you can measure the queries in your application that are the most important to be speedy, because they work with large volumes of data or they are executed very frequently. If these queries benefit from eliminating a join, and do not suffer by using a varchar primary key, then do it.

Don't use either strategy for all tables in your database. It's likely that in some cases, a natural key is better, but in other cases a surrogate key is better.

Other folks make a good point that it's rare in practice for a natural key to never change or have duplicates, so surrogate keys are usually worthwhile.


It's not about performance. It's about what makes a good primary key. Unique and unchanging over time. You may think an entity such as a country code never changes over time and would be a good candidate for a primary key. But bitter experience is that is seldom so.

INT AUTO_INCREMENT meets the "unique and unchanging over time" condition. Hence the preference.


I was a bit annoyed by the lack of benchmarks for this online, so I ran a test myself.

Note though that I don't do it on a regular basic, so please check my setup and steps for any factors that could have influenced the results unintentionally, and post your concerns in comments.

The setup was as follows:

  • Intel® Core™ i7-7500U CPU @ 2.70GHz × 4
  • 15.6 GiB RAM, of which I ensured around 8 GB was free during the test.
  • 148.6 GB SSD drive, with plenty of free space.
  • Ubuntu 16.04 64-bit
  • MySQL Ver 14.14 Distrib 5.7.20, for Linux (x86_64)

The tables:

create table jan_int (data1 varchar(255), data2 int(10), myindex tinyint(4)) ENGINE=InnoDB;create table jan_int_index (data1 varchar(255), data2 int(10), myindex tinyint(4), INDEX (myindex)) ENGINE=InnoDB;create table jan_char (data1 varchar(255), data2 int(10), myindex char(6)) ENGINE=InnoDB;create table jan_char_index (data1 varchar(255), data2 int(10), myindex char(6), INDEX (myindex)) ENGINE=InnoDB;create table jan_varchar (data1 varchar(255), data2 int(10), myindex varchar(63)) ENGINE=InnoDB;create table jan_varchar_index (data1 varchar(255), data2 int(10), myindex varchar(63), INDEX (myindex)) ENGINE=InnoDB;

Then, I filled 10 million rows in each table with a PHP script whose essence is like this:

$pdo = get_pdo();$keys = [ 'alabam', 'massac', 'newyor', 'newham', 'delawa', 'califo', 'nevada', 'texas_', 'florid', 'ohio__' ];for ($k = 0; $k < 10; $k++) {    for ($j = 0; $j < 1000; $j++) {        $val = '';        for ($i = 0; $i < 1000; $i++) {            $val .= '("' . generate_random_string() . '", ' . rand (0, 10000) . ', "' . ($keys[rand(0, 9)]) . '"),';        }        $val = rtrim($val, ',');        $pdo->query('INSERT INTO jan_char VALUES ' . $val);    }    echo "\n" . ($k + 1) . ' millon(s) rows inserted.';}

For int tables, the bit ($keys[rand(0, 9)]) was replaced with just rand(0, 9), and for varchar tables, I used full US state names, without cutting or extending them to 6 characters. generate_random_string() generates a 10-character random string.

Then I ran in MySQL:

  • SET SESSION query_cache_type=0;
  • For jan_int table:
    • SELECT count(*) FROM jan_int WHERE myindex = 5;
    • SELECT BENCHMARK(1000000000, (SELECT count(*) FROM jan_int WHERE myindex = 5));
  • For other tables, same as above, with myindex = 'califo' for char tables and myindex = 'california' for varchar tables.

Times of the BENCHMARK query on each table:

  • jan_int: 21.30 sec
  • jan_int_index: 18.79 sec
  • jan_char: 21.70 sec
  • jan_char_index: 18.85 sec
  • jan_varchar: 21.76 sec
  • jan_varchar_index: 18.86 sec

Regarding table & index sizes, here's the output of show table status from janperformancetest; (w/ a few columns not shown):

|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|| Name              | Engine | Version | Row_format | Rows    | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Collation              ||-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|| jan_int           | InnoDB |      10 | Dynamic    | 9739094 |             43 |   422510592 |               0 |            0 |   4194304 |           NULL | utf8mb4_unicode_520_ci |  | jan_int_index     | InnoDB |      10 | Dynamic    | 9740329 |             43 |   420413440 |               0 |    132857856 |   7340032 |           NULL | utf8mb4_unicode_520_ci |   | jan_char          | InnoDB |      10 | Dynamic    | 9726613 |             51 |   500170752 |               0 |            0 |   5242880 |           NULL | utf8mb4_unicode_520_ci |  | jan_char_index    | InnoDB |      10 | Dynamic    | 9719059 |             52 |   513802240 |               0 |    202342400 |   5242880 |           NULL | utf8mb4_unicode_520_ci |  | jan_varchar       | InnoDB |      10 | Dynamic    | 9722049 |             53 |   521142272 |               0 |            0 |   7340032 |           NULL | utf8mb4_unicode_520_ci |   | jan_varchar_index | InnoDB |      10 | Dynamic    | 9738381 |             49 |   486539264 |               0 |    202375168 |   7340032 |           NULL | utf8mb4_unicode_520_ci | |-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|

My conclusion is that there's no performance difference for this particular use case.