sql query optimisation sql query optimisation oracle oracle

sql query optimisation


So, here is the explain plan for a query which joins on just the concatenated string:

SQL> explain plan for  2     select e.* from emp e  3         join big_table bt on bt.col2 = 'search'||trim(to_char(e.empno))  4  /Explained.SQL> select * from table(dbms_xplan.display)  2  /PLAN_TABLE_OUTPUT--------------------------------------------------------------------------------Plan hash value: 179424166-------------------------------------------------------------------------------| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |-------------------------------------------------------------------------------|   0 | SELECT STATEMENT   |          |  1052 | 65224 |    43   (0)| 00:00:01 ||   1 |  NESTED LOOPS      |          |  1052 | 65224 |    43   (0)| 00:00:01 ||   2 |   TABLE ACCESS FULL| EMP      |    20 |   780 |     3   (0)| 00:00:01 ||*  3 |   INDEX RANGE SCAN | BIG_VC_I |    53 |  1219 |     2   (0)| 00:00:01 |-------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   3 - access("BT"."COL2"='search'||TRIM(TO_CHAR("E"."EMPNO")))15 rows selected.SQL>

Compare and contrast with the plan for a query which includes the LIKE clause in its join:

SQL> explain plan for  2     select e.* from emp e  3           join big_table bt on (bt.col2 like 'search%'  4               and bt.col2 = 'search'||trim(to_char(e.empno)))  5  /Explained.SQL> select * from table(dbms_xplan.display)  2  /PLAN_TABLE_OUTPUT--------------------------------------------------------------------------------Plan hash value: 179424166-------------------------------------------------------------------------------| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |-------------------------------------------------------------------------------|   0 | SELECT STATEMENT   |          |     1 |    62 |     5   (0)| 00:00:01 ||   1 |  NESTED LOOPS      |          |     1 |    62 |     5   (0)| 00:00:01 ||*  2 |   TABLE ACCESS FULL| EMP      |     1 |    39 |     3   (0)| 00:00:01 ||*  3 |   INDEX RANGE SCAN | BIG_VC_I |     1 |    23 |     2   (0)| 00:00:01 |-------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   2 - filter('search'||TRIM(TO_CHAR("E"."EMPNO")) LIKE 'search%')   3 - access("BT"."COL2"='search'||TRIM(TO_CHAR("E"."EMPNO")))       filter("BT"."COL2" LIKE 'search%')17 rows selected.SQL>

The cost of the second query is much lower than the first. But this is because the optimizer is estimating that the second query will return far fewer rows than the first query. More information allows the database to make a more accurate prediction. (In fact the query will return no rows).

Of course this does presume the joined column is indexed, otherwise it won't make any difference.

The other thing to bear in mind is that the columns which are queried can affect the plan. This version selects from BIG_TABLE rather than EMP.

SQL> explain plan for  2     select bt.* from emp e  3           join big_table bt on (bt.col2 like 'search%'  4                        and bt.col2 = 'search'||trim(to_char(e.empno)))  5  /Explained.SQL> select * from table(dbms_xplan.display)  2  /PLAN_TABLE_OUTPUT---------------------------------------------------------------------------------------------------------------Plan hash value: 4042413806------------------------------------------------------------------------------------------| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT             |           |     1 |    46 |     4   (0)| 00:00:01 ||   1 |  NESTED LOOPS                |           |       |       |            |          ||   2 |   NESTED LOOPS               |           |     1 |    46 |     4   (0)| 00:00:01 ||*  3 |    INDEX FULL SCAN           | PK_EMP    |     1 |     4 |     1   (0)| 00:00:01 ||*  4 |    INDEX RANGE SCAN          | BIG_VC_I  |     1 |       |     2   (0)| 00:00:01 ||   5 |   TABLE ACCESS BY INDEX ROWID| BIG_TABLE |     1 |    42 |     3   (0)| 00:00:01 |------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   3 - filter('search'||TRIM(TO_CHAR("E"."EMPNO")) LIKE 'search%')   4 - access("BT"."COL2"='search'||TRIM(TO_CHAR("E"."EMPNO")))       filter("BT"."COL2" LIKE 'search%')19 rows selected.SQL>


The query analysis of the various database engines would really tell the story but my first instinct would be that the first form is in fact optimized. The reason is that the compiler cannot guess as the to results of the concatenation. It must do more work to determine the value against which to do the match and would likely result in a table scan. The first still must do that, however, it is able to narrow the resultset using the LIKE operator (presuming an index exists on the someID column) first and thus has to do fewer concatenations.