MySQL "IN" operator performance on (large?) number of values MySQL "IN" operator performance on (large?) number of values sql sql

MySQL "IN" operator performance on (large?) number of values


Generally speaking, if the IN list gets too large (for some ill-defined value of 'too large' that is usually in the region of 100 or smaller), it becomes more efficient to use a join, creating a temporary table if need so be to hold the numbers.

If the numbers are a dense set (no gaps - which the sample data suggests), then you can do even better with WHERE id BETWEEN 300 AND 3000.

However, presumably there are gaps in the set, at which point it may be better to go with the list of valid values after all (unless the gaps are relatively few in number, in which case you could use:

WHERE id BETWEEN 300 AND 3000 AND id NOT BETWEEN 742 AND 836

Or whatever the gaps are.


I have been doing some tests, and as David Fells says in his answer, it is quite well optimized. As a reference, I have created an InnoDB table with 1,000,000 registers and doing a select with the "IN" operator with 500,000 random numbers, it takes only 2.5 seconds on my MAC; selecting only the even registers takes 0.5 seconds.

The only problem that I had is that I had to increase the max_allowed_packet parameter from the my.cnf file. If not, a mysterious “MYSQL has gone away” error is generated.

Here is the PHP code that I use to make the test:

$NROWS =1000000;$SELECTED = 50;$NROWSINSERT =15000;$dsn="mysql:host=localhost;port=8889;dbname=testschema";$pdo = new PDO($dsn, "root", "root");$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);$pdo->exec("drop table if exists `uniclau`.`testtable`");$pdo->exec("CREATE  TABLE `testtable` (        `id` INT NOT NULL ,        `text` VARCHAR(45) NULL ,        PRIMARY KEY (`id`) )");$before = microtime(true);$Values='';$SelValues='(';$c=0;for ($i=0; $i<$NROWS; $i++) {    $r = rand(0,99);    if ($c>0) $Values .= ",";    $Values .= "( $i , 'This is value $i and r= $r')";    if ($r<$SELECTED) {        if ($SelValues!="(") $SelValues .= ",";        $SelValues .= $i;    }    $c++;    if (($c==100)||(($i==$NROWS-1)&&($c>0))) {        $pdo->exec("INSERT INTO `testtable` VALUES $Values");        $Values = "";        $c=0;    }}$SelValues .=')';echo "<br>";$after = microtime(true);echo "Insert execution time =" . ($after-$before) . "s<br>";$before = microtime(true);  $sql = "SELECT count(*) FROM `testtable` WHERE id IN $SelValues";$result = $pdo->prepare($sql);  $after = microtime(true);echo "Prepare execution time =" . ($after-$before) . "s<br>";$before = microtime(true);$result->execute();$c = $result->fetchColumn();$after = microtime(true);echo "Random selection = $c Time execution time =" . ($after-$before) . "s<br>";$before = microtime(true);$sql = "SELECT count(*) FROM `testtable` WHERE id %2 = 1";$result = $pdo->prepare($sql);$result->execute();$c = $result->fetchColumn();$after = microtime(true);echo "Pairs = $c Exdcution time=" . ($after-$before) . "s<br>";

And the results:

Insert execution time =35.2927210331sPrepare execution time =0.0161771774292sRandom selection = 499102 Time execution time =2.40285992622sPairs = 500000 Exdcution time=0.465420007706s


You can create a temporary table where you can put any number of IDs and run nested queryExample:

CREATE [TEMPORARY] TABLE tmp_IDs (`ID` INT NOT NULL,PRIMARY KEY (`ID`));

and select:

SELECT id, name, priceFROM productsWHERE id IN (SELECT ID FROM tmp_IDs);