Missing STOPKEY per partition in Oracle plan for paging by local index Missing STOPKEY per partition in Oracle plan for paging by local index sql sql

Missing STOPKEY per partition in Oracle plan for paging by local index


When you use bind variables, Oracle is forced to use dynamic partition pruning instead of static partition pruning. The result of this is that Oracle doesn't know at parse time which partitions will be accessed, as this changes based on your input variables.

This means that when using literal values (instead of bind variables), we know which partitions will be accessed by your local index. Therefore the count stopkey can be applied to the output of the index before we prune the partitions.

When using bind variables, the partition range iterator has to figure out which partitions you're accessing. It then has a check to ensure that the first of your variables in the between operations do actually have a lower value then the second one (the filter operation in the second plan).

This can easily be reproduced, as the following test case shows:

create table tab (  x date,  y integer,  filler varchar2(100)) partition by range(x) (  partition p1 values less than (date'2013-01-01'),  partition p2 values less than (date'2013-02-01'),  partition p3 values less than (date'2013-03-01'),  partition p4 values less than (date'2013-04-01'),  partition p5 values less than (date'2013-05-01'),  partition p6 values less than (date'2013-06-01'));insert into tab (x, y)  select add_months(trunc(sysdate, 'y'), mod(rownum, 5)), rownum, dbms_random.string('x', 50)  from   dual   connect by level <= 1000;create index i on tab(x desc, y desc) local;exec dbms_stats.gather_table_stats(user, 'tab', cascade => true);explain plan for SELECT * FROM (  SELECT rowid FROM tab  where  x between date'2013-01-01' and date'2013-02-02'  and    y between 50 and 100  order  by x desc, y desc)where rownum <= 5;SELECT * FROM table(dbms_xplan.display(null, null, 'BASIC +ROWS +PARTITION'));--------------------------------------------------------------------                                                                                                                                                                                                                                         | Id  | Operation                   | Name | Rows  | Pstart| Pstop |                                                                                                                                                                                                                                         --------------------------------------------------------------------                                                                                                                                                                                                                                         |   0 | SELECT STATEMENT            |      |     1 |       |       |                                                                                                                                                                                                                                         |   1 |  COUNT STOPKEY              |      |       |       |       |                                                                                                                                                                                                                                         |   2 |   VIEW                      |      |     1 |       |       |                                                                                                                                                                                                                                         |   3 |    SORT ORDER BY STOPKEY    |      |     1 |       |       |                                                                                                                                                                                                                                         |   4 |     PARTITION RANGE ITERATOR|      |     1 |     2 |     3 |                                                                                                                                                                                                                                         |   5 |      COUNT STOPKEY          |      |       |       |       |                                                                                                                                                                                                                                         |   6 |       INDEX RANGE SCAN      | I    |     1 |     2 |     3 |                                                                                                                                                                                                                                         -------------------------------------------------------------------- explain plan for SELECT * FROM (  SELECT rowid FROM tab  where  x between to_date(:st, 'dd/mm/yyyy') and to_date(:en, 'dd/mm/yyyy')  and    y between :a and :b  order  by x desc, y desc)where rownum <= 5;SELECT * FROM table(dbms_xplan.display(null, null, 'BASIC +ROWS +PARTITION'));---------------------------------------------------------------------                                                                                                                                                                                                                                        | Id  | Operation                    | Name | Rows  | Pstart| Pstop |                                                                                                                                                                                                                                        ---------------------------------------------------------------------                                                                                                                                                                                                                                        |   0 | SELECT STATEMENT             |      |     1 |       |       |                                                                                                                                                                                                                                        |   1 |  COUNT STOPKEY               |      |       |       |       |                                                                                                                                                                                                                                        |   2 |   VIEW                       |      |     1 |       |       |                                                                                                                                                                                                                                        |   3 |    SORT ORDER BY STOPKEY     |      |     1 |       |       |                                                                                                                                                                                                                                        |   4 |     FILTER                   |      |       |       |       |                                                                                                                                                                                                                                        |   5 |      PARTITION RANGE ITERATOR|      |     1 |   KEY |   KEY |                                                                                                                                                                                                                                        |   6 |       INDEX RANGE SCAN       | I    |     1 |   KEY |   KEY |                                                                                                                                                                                                                                        --------------------------------------------------------------------- 

As in your example, the second query can only filter the partitions to a key at parse time, rather than the exact partitions as in the first example.

This is one of those rare cases where literal values can provide better performance than bind variables. You should investigate whether this is a possibility for you.

Finally, you say you want 20 rows from each partition. Your query as stands won't do this, it'll just return you the first 20 rows according to your ordering. For 20 rows/partition, you need to do something like this:

select rd from (    select rowid rd,            row_number() over (partition by trx_id order by create_ts desc) rn    from OUT_SMS         where  TRX_ID between ? and ?              and CREATE_TS between ? and ?    order by CREATE_TS DESC, TRX_ID DESC) where rn <= 20

UPDATE

The reason you're not getting the count stopkey is to do with the filter operation in line 4 of the "bad" plan. You can see this more clearly if you repeat the example above, but with no partitioning.

This gives you the following plans:

----------------------------------------                                                                                                                                                                                                                                                                     | Id  | Operation               | Name |                                                                                                                                                                                                                                                                     ----------------------------------------                                                                                                                                                                                                                                                                     |   0 | SELECT STATEMENT        |      |                                                                                                                                                                                                                                                                     |*  1 |  COUNT STOPKEY          |      |                                                                                                                                                                                                                                                                     |   2 |   VIEW                  |      |                                                                                                                                                                                                                                                                     |*  3 |    SORT ORDER BY STOPKEY|      |                                                                                                                                                                                                                                                                     |*  4 |     TABLE ACCESS FULL   | TAB  |                                                                                                                                                                                                                                                                     ----------------------------------------                                                                                                                                                                                                                                                                     Predicate Information (identified by operation id):                                                                                                                                                                                                                                                          ---------------------------------------------------                                                                                                                                                                                                                                                             1 - filter(ROWNUM<=5)                                                                                                                                                                                                                                                                                        3 - filter(ROWNUM<=5)                                                                                                                                                                                                                                                                                        4 - filter("X">=TO_DATE(' 2013-01-01 00:00:00', 'syyyy-mm-dd                                                                                                                                                                                                                                                            hh24:mi:ss') AND "X"<=TO_DATE(' 2013-02-02 00:00:00', 'syyyy-mm-dd                                                                                                                                                                                                                                           hh24:mi:ss') AND "Y">=50 AND "Y"<=100)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       ----------------------------------------                                                                                                                                                                                                                                                                     | Id  | Operation               | Name |                                                                                                                                                                                                                                                                     ----------------------------------------                                                                                                                                                                                                                                                                     |   0 | SELECT STATEMENT        |      |                                                                                                                                                                                                                                                                     |*  1 |  COUNT STOPKEY          |      |                                                                                                                                                                                                                                                                     |   2 |   VIEW                  |      |                                                                                                                                                                                                                                                                     |*  3 |    SORT ORDER BY STOPKEY|      |                                                                                                                                                                                                                                                                     |*  4 |     FILTER              |      |                                                                                                                                                                                                                                                                     |*  5 |      TABLE ACCESS FULL  | TAB  |                                                                                                                                                                                                                                                                     ----------------------------------------                                                                                                                                                                                                                                                                     Predicate Information (identified by operation id):                                                                                                                                                                                                                                                          ---------------------------------------------------                                                                                                                                                                                                                                                             1 - filter(ROWNUM<=5)                                                                                                                                                                                                                                                                                        3 - filter(ROWNUM<=5)                                                                                                                                                                                                                                                                                        4 - filter(TO_NUMBER(:A)<=TO_NUMBER(:B) AND                                                                                                                                                                                                                                                                             TO_DATE(:ST,'dd/mm/yyyy')<=TO_DATE(:EN,'dd/mm/yyyy'))                                                                                                                                                                                                                                             5 - filter("Y">=TO_NUMBER(:A) AND "Y"<=TO_NUMBER(:B) AND                                                                                                                                                                                                                                                                "X">=TO_DATE(:ST,'dd/mm/yyyy') AND "X"<=TO_DATE(:EN,'dd/mm/yyyy'))   

As you can see, there's an extra filter operation when you use bind variables appearing before the sort order by stopkey. This happens after accessing the index. This is checking that the values for the variables will allow data to be returned (the first variable in your between does actually have a lower value than the second). This isn't necessary when using literals because the optimizer already knows that 50 is less than 100 (in this case). It doesn't know whether :a is less than :b at parse time however.

Why exactly this is I don't know. It could be intentional design by Oracle - there's no point doing the stopkey check if the values set for the variables result in zero rows - or just an oversight.


I can reproduce your findings on 11.2.0.3. Here's my test case:

SQL> -- Table with 100 partitions of 100 rows SQL> CREATE TABLE out_sms  2  PARTITION BY RANGE (trx_id)  3     INTERVAL (100) (PARTITION p0 VALUES LESS THAN (0))  4  AS  5  SELECT ROWNUM trx_id,  6         trunc(SYSDATE) + MOD(ROWNUM, 50) create_ts  7  FROM dual CONNECT BY LEVEL <= 10000;Table createdSQL> CREATE INDEX OUT_SMS_IDX ON out_sms (create_ts desc, trx_id desc) LOCAL;Index created[static plan]SELECT rd  FROM (SELECT /*+ INDEX(OUT_SMS OUT_SMS_IDX) */         rowid rd          FROM out_sms         WHERE create_ts BETWEEN systimestamp AND systimestamp + 10           AND trx_id BETWEEN 1 AND 500         ORDER BY create_ts DESC, trx_id DESC) WHERE rownum <= 20;    ---------------------------------------------------------------------------| Id  | Operation                   | Name        | Rows  | Pstart| Pstop |---------------------------------------------------------------------------|   0 | SELECT STATEMENT            |             |     1 |       |       ||*  1 |  COUNT STOPKEY              |             |       |       |       ||   2 |   VIEW                      |             |     1 |       |       ||*  3 |    SORT ORDER BY STOPKEY    |             |     1 |       |       ||   4 |     PARTITION RANGE ITERATOR|             |     1 |     2 |     7 ||*  5 |      COUNT STOPKEY          |             |       |       |       ||*  6 |       INDEX RANGE SCAN      | OUT_SMS_IDX |     1 |     2 |     7 |---------------------------------------------------------------------------[dynamic]     ----------------------------------------------------------------------------| Id  | Operation                    | Name        | Rows  | Pstart| Pstop |----------------------------------------------------------------------------|   0 | SELECT STATEMENT             |             |     1 |       |       ||*  1 |  COUNT STOPKEY               |             |       |       |       ||   2 |   VIEW                       |             |     1 |       |       ||*  3 |    SORT ORDER BY STOPKEY     |             |     1 |       |       ||*  4 |     FILTER                   |             |       |       |       ||   5 |      PARTITION RANGE ITERATOR|             |     1 |   KEY |   KEY ||*  6 |       INDEX RANGE SCAN       | OUT_SMS_IDX |     1 |   KEY |   KEY |----------------------------------------------------------------------------

As in your example the ROWNUM predicate is pushed inside the partition index range scan in the first case, not in the second case. When using static variables, the plan shows that Oracle fetches only 20 rows per partition, whereas using dynamic variables, Oracle will fetch all rows that satisfy the WHERE clause in each partition. I couldn't find a setting or a statistics configuration where the predicate could be pushed when using bind variables.

I hoped that you could use dynamic filters with wider static limits to game the system but it seems that the ROWNUM predicate isn't used inside individual partitions as soon as there are dynamic variables present:

SELECT rd  FROM (SELECT /*+ INDEX(OUT_SMS OUT_SMS_IDX) */         rowid rd          FROM out_sms         WHERE nvl(create_ts+:5, sysdate) BETWEEN :1 AND :2           AND nvl(trx_id+:6, 0) BETWEEN :3 AND :4           AND trx_id BETWEEN 1 AND 500           AND create_ts BETWEEN systimestamp AND systimestamp + 10         ORDER BY create_ts DESC, trx_id DESC) WHERE rownum <= 20Plan hash value: 2740263591----------------------------------------------------------------------------| Id  | Operation                    | Name        | Rows  | Pstart| Pstop |----------------------------------------------------------------------------|   0 | SELECT STATEMENT             |             |     1 |       |       ||*  1 |  COUNT STOPKEY               |             |       |       |       ||   2 |   VIEW                       |             |     1 |       |       ||*  3 |    SORT ORDER BY STOPKEY     |             |     1 |       |       ||*  4 |     FILTER                   |             |       |       |       ||   5 |      PARTITION RANGE ITERATOR|             |     1 |     2 |     7 ||*  6 |       INDEX RANGE SCAN       | OUT_SMS_IDX |     1 |     2 |     7 |----------------------------------------------------------------------------

If this query is important and its performance is critical, you could transform the index to a global index. It will increase partition maintenance but most partition operations can be used online with recent Oracle versions. A global index will work as with standard non-partitioned table in this case:

SQL> drop index out_sms_idx;Index droppedSQL> CREATE INDEX OUT_SMS_IDX ON out_sms (create_ts DESC, trx_id desc);Index createdSELECT rd  FROM (SELECT          rowid rd          FROM out_sms         WHERE create_ts BETWEEN :1 AND :2           AND trx_id BETWEEN :3 AND :4         ORDER BY create_ts DESC, trx_id DESC) WHERE rownum <= 20------------------------------------------------------------------------| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)|------------------------------------------------------------------------|   0 | SELECT STATEMENT    |             |     1 |    12 |     2   (0)||*  1 |  COUNT STOPKEY      |             |       |       |            ||   2 |   VIEW              |             |     1 |    12 |     2   (0)||*  3 |    FILTER           |             |       |       |            ||*  4 |     INDEX RANGE SCAN| OUT_SMS_IDX |     1 |    34 |     2   (0)|------------------------------------------------------------------------


I can confirm that the issue in question is still a problem on Oracle 12.1.0.2.0.

And even hardcoded partition elimination bounds are not enough.

Here is the test table in my case:

CREATE TABLE FR_MESSAGE_PART (    ID NUMBER(38) NOT NULL CONSTRAINT PK_FR_MESSAGE_PART PRIMARY KEY USING INDEX LOCAL,    TRX_ID NUMBER(38) NOT NULL, TS TIMESTAMP NOT NULL, TEXT CLOB)    PARTITION BY RANGE (ID) (PARTITION PART_0 VALUES LESS THAN (0));CREATE INDEX IX_FR_MESSAGE_PART_TRX_ID ON FR_MESSAGE_PART(TRX_ID) LOCAL;CREATE INDEX IX_FR_MESSAGE_PART_TS ON FR_MESSAGE_PART(TS) LOCAL;

The table is populated with several millions of records of OLTP production data for several months. Each month belongs to a separate partition.

Primary key values of this table always include time part in higher bits that allows to use ID for range partitioning by calendar periods. All messages inherit higher time bits of TRX_ID. This ensures that all messages belonging to the same business operation do always fall in the same partition.

Let's start with hardcoded query for selecting a page of the most recent messages for a given time period with partition elimination bounds applied:

select * from (select * from FR_MESSAGE_PARTwhere TS >= DATE '2017-11-30' and TS < DATE '2017-12-02'  and ID >= 376894993815568384 and ID < 411234940974268416order by TS DESC) where ROWNUM <= 40;

But, having freshly gathered table statistics, Oracle optimizer still falsely estimates that sorting two entire monthly partitions would be faster than a range scan for two days by existing local index:

-----------------------------------------------------------------------------------------------------------------------| Id  | Operation                   | Name            | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |-----------------------------------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT            |                 |    40 | 26200 |       |   103K  (1)| 00:00:05 |       |       ||*  1 |  COUNT STOPKEY              |                 |       |       |       |            |          |       |       ||   2 |   VIEW                      |                 |   803K|   501M|       |   103K  (1)| 00:00:05 |       |       ||*  3 |    SORT ORDER BY STOPKEY    |                 |   803K|    70M|    92M|   103K  (1)| 00:00:05 |       |       ||   4 |     PARTITION RANGE ITERATOR|                 |   803K|    70M|       | 86382   (1)| 00:00:04 |     2 |     3 ||*  5 |      TABLE ACCESS FULL      | FR_MESSAGE_PART |   803K|    70M|       | 86382   (1)| 00:00:04 |     2 |     3 |-----------------------------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   1 - filter(ROWNUM<=40)   3 - filter(ROWNUM<=40)   5 - filter("TS"<TIMESTAMP' 2017-12-01 00:00:00' AND "TS">=TIMESTAMP' 2017-11-29 00:00:00' AND               "ID">=376894993815568384)

Actual execution time appears by an order of magnitude longer than estimated in plan.

So we have to apply a hint to force usage of the index:

select * from (select /*+ FIRST_ROWS(40) INDEX(FR_MESSAGE_PART (TS)) */ * from FR_MESSAGE_PARTwhere TS >= DATE '2017-11-30' and TS < DATE '2017-12-02'  and ID >= 376894993815568384 and ID < 411234940974268416order by TS DESC) where ROWNUM <= 40;

Now the plan uses the index but still envolves slow sorting of two entire partitions:

-----------------------------------------------------------------------------------------------------------------------------------------------| Id  | Operation                                     | Name                  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |-----------------------------------------------------------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT                              |                       |    40 | 26200 |       |   615K  (1)| 00:00:25 |       |       ||*  1 |  COUNT STOPKEY                                |                       |       |       |       |            |          |       |       ||   2 |   VIEW                                        |                       |   803K|   501M|       |   615K  (1)| 00:00:25 |       |       ||*  3 |    SORT ORDER BY STOPKEY                      |                       |   803K|    70M|    92M|   615K  (1)| 00:00:25 |       |       ||   4 |     PARTITION RANGE ITERATOR                  |                       |   803K|    70M|       |   598K  (1)| 00:00:24 |     2 |     3 ||*  5 |      TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| FR_MESSAGE_PART       |   803K|    70M|       |   598K  (1)| 00:00:24 |     2 |     3 ||*  6 |       INDEX RANGE SCAN                        | IX_FR_MESSAGE_PART_TS |   576K|       |       |  2269   (1)| 00:00:01 |     2 |     3 |-----------------------------------------------------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   1 - filter(ROWNUM<=40)   3 - filter(ROWNUM<=40)   5 - filter("ID">=376894993815568384)   6 - access("TS">=TIMESTAMP' 2017-11-30 00:00:00' AND "TS"<TIMESTAMP' 2017-12-02 00:00:00')

After some struggling through Oracle hints reference and google it was found that we also have to explicitly specify the descending direction for index range scan with INDEX_DESC or INDEX_RS_DESC hint:

select * from (select /*+ FIRST_ROWS(40) INDEX_RS_DESC(FR_MESSAGE_PART (TS)) */ * from FR_MESSAGE_PARTwhere TS >= DATE '2017-11-30' and TS < DATE '2017-12-02'  and ID >= 376894993815568384 and ID < 411234940974268416order by TS DESC) where ROWNUM <= 40;

This at last gives fast plan with COUNT STOPKEY per partition which scans partitions in descending order and sorts only at most 40 rows from each partition:

------------------------------------------------------------------------------------------------------------------------------------------------| Id  | Operation                                      | Name                  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |------------------------------------------------------------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT                               |                       |    40 | 26200 |       |   615K  (1)| 00:00:25 |       |       ||*  1 |  COUNT STOPKEY                                 |                       |       |       |       |            |          |       |       ||   2 |   VIEW                                         |                       |   803K|   501M|       |   615K  (1)| 00:00:25 |       |       ||*  3 |    SORT ORDER BY STOPKEY                       |                       |   803K|    70M|    92M|   615K  (1)| 00:00:25 |       |       ||   4 |     PARTITION RANGE ITERATOR                   |                       |   803K|    70M|       |   598K  (1)| 00:00:24 |     3 |     2 ||*  5 |      COUNT STOPKEY                             |                       |       |       |       |            |          |       |       ||*  6 |       TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| FR_MESSAGE_PART       |   803K|    70M|       |   598K  (1)| 00:00:24 |     3 |     2 ||*  7 |        INDEX RANGE SCAN DESCENDING             | IX_FR_MESSAGE_PART_TS |   576K|       |       |  2269   (1)| 00:00:01 |     3 |     2 |------------------------------------------------------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   1 - filter(ROWNUM<=40)   3 - filter(ROWNUM<=40)   5 - filter(ROWNUM<=40)   6 - filter("ID">=376894993815568384)   7 - access("TS">=TIMESTAMP' 2017-11-30 00:00:00' AND "TS"<TIMESTAMP' 2017-12-02 00:00:00')       filter("TS">=TIMESTAMP' 2017-11-30 00:00:00' AND "TS"<TIMESTAMP' 2017-12-02 00:00:00')

This runs blazing fast but estimated plan cost is still falsely too high.

So far so good. Now let's try to make the query parametrized to be used in our custom ORM framework:

select * from (select /*+ FIRST_ROWS(40) INDEX_RS_DESC(FR_MESSAGE_PART (TS)) */ * from FR_MESSAGE_PARTwhere TS >= :1 and TS < :2  and ID >= :3 and ID < :4order by TS DESC) where ROWNUM <= 40;

But then COUNT STOPKEY per partition disappears from the plan as stated in the question and confirmed in the other answer:

----------------------------------------------------------------------------------------------------------------------------------------| Id  | Operation                                      | Name                  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |----------------------------------------------------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT                               |                       |    40 | 26200 | 82349   (1)| 00:00:04 |       |       ||*  1 |  COUNT STOPKEY                                 |                       |       |       |            |          |       |       ||   2 |   VIEW                                         |                       |   153 |    97K| 82349   (1)| 00:00:04 |       |       ||*  3 |    SORT ORDER BY STOPKEY                       |                       |   153 | 14076 | 82349   (1)| 00:00:04 |       |       ||*  4 |     FILTER                                     |                       |       |       |            |          |       |       ||   5 |      PARTITION RANGE ITERATOR                  |                       |   153 | 14076 | 82348   (1)| 00:00:04 |   KEY |   KEY ||*  6 |       TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| FR_MESSAGE_PART       |   153 | 14076 | 82348   (1)| 00:00:04 |   KEY |   KEY ||*  7 |        INDEX RANGE SCAN DESCENDING             | IX_FR_MESSAGE_PART_TS |   110K|       |   450   (1)| 00:00:01 |   KEY |   KEY |----------------------------------------------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   1 - filter(ROWNUM<=40)   3 - filter(ROWNUM<=40)   4 - filter(TO_NUMBER(:4)>TO_NUMBER(:3) AND TO_TIMESTAMP(:2)>TO_TIMESTAMP(:1))   6 - filter("ID">=TO_NUMBER(:3) AND "ID"<TO_NUMBER(:4))   7 - access("TS">=TO_TIMESTAMP(:1) AND "TS"<TO_TIMESTAMP(:2))       filter("TS">=TO_TIMESTAMP(:1) AND "TS"<TO_TIMESTAMP(:2))

Then I tried to retreat to hardcoded monthly-aligned partition elimination bounds but still retain parametrized timestamp bounds to minimize plan cache spoiling.

select * from (select /*+ FIRST_ROWS(40) INDEX_RS_DESC(FR_MESSAGE_PART (TS)) */ * from FR_MESSAGE_PARTwhere TS >= :1 and TS < :2  and ID >= 376894993815568384 and ID < 411234940974268416order by TS DESC) where ROWNUM <= 40;

But still got slow plan:

------------------------------------------------------------------------------------------------------------------------------------------------| Id  | Operation                                      | Name                  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |------------------------------------------------------------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT                               |                       |    40 | 26200 |       | 83512   (1)| 00:00:04 |       |       ||*  1 |  COUNT STOPKEY                                 |                       |       |       |       |            |          |       |       ||   2 |   VIEW                                         |                       | 61238 |    38M|       | 83512   (1)| 00:00:04 |       |       ||*  3 |    SORT ORDER BY STOPKEY                       |                       | 61238 |  5501K|  7216K| 83512   (1)| 00:00:04 |       |       ||*  4 |     FILTER                                     |                       |       |       |       |            |          |       |       ||   5 |      PARTITION RANGE ITERATOR                  |                       | 61238 |  5501K|       | 82214   (1)| 00:00:04 |     3 |     2 ||*  6 |       TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| FR_MESSAGE_PART       | 61238 |  5501K|       | 82214   (1)| 00:00:04 |     3 |     2 ||*  7 |        INDEX RANGE SCAN DESCENDING             | IX_FR_MESSAGE_PART_TS | 79076 |       |       |   316   (1)| 00:00:01 |     3 |     2 |------------------------------------------------------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   1 - filter(ROWNUM<=40)   3 - filter(ROWNUM<=40)   4 - filter(TO_TIMESTAMP(:2)>TO_TIMESTAMP(:1))   6 - filter("ID">=376894993815568384)   7 - access("TS">=TO_TIMESTAMP(:1) AND "TS"<TO_TIMESTAMP(:2))       filter("TS">=TO_TIMESTAMP(:1) AND "TS"<TO_TIMESTAMP(:2))

@ChrisSaxon in his answer here has mentioned that missing nested STOPKEY COUNT has something to do with filter(TO_TIMESTAMP(:2)>TO_TIMESTAMP(:1)) operation which validates that the upper bound is really bigger than the lower one.

Taking this into account I tried to cheat the oprimizer by transforming TS between :a and :b into equivalent :b between TS and TS + (:b - :a). And this worked!

After some additional investigation of the root cause of this change, I've found that just replacing TS >= :1 and TS < :2 with TS + 0 >= :1 and TS < :2 helps to achieve optimal execution plan.

select * from (select /*+ FIRST_ROWS(40) INDEX_RS_DESC(FR_MESSAGE_PART (TS)) */ * from FR_MESSAGE_PARTwhere TS + 0 >= :1 and TS < :2  and ID >= 376894993815568384 and ID < 411234940974268416order by TS DESC) where ROWNUM <= 40;

The plan now has proper COUNT STOPKEY per partition and a notion of INTERNAL_FUNCTION("TS")+0 which prevented the toxic extra bounds checking filter, I guess.

------------------------------------------------------------------------------------------------------------------------------------------------| Id  | Operation                                      | Name                  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |------------------------------------------------------------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT                               |                       |    40 | 26200 |       | 10120   (1)| 00:00:01 |       |       ||*  1 |  COUNT STOPKEY                                 |                       |       |       |       |            |          |       |       ||   2 |   VIEW                                         |                       | 61238 |    38M|       | 10120   (1)| 00:00:01 |       |       ||*  3 |    SORT ORDER BY STOPKEY                       |                       | 61238 |  5501K|  7216K| 10120   (1)| 00:00:01 |       |       ||   4 |     PARTITION RANGE ITERATOR                   |                       | 61238 |  5501K|       |  8822   (1)| 00:00:01 |     3 |     2 ||*  5 |      COUNT STOPKEY                             |                       |       |       |       |            |          |       |       ||*  6 |       TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| FR_MESSAGE_PART       | 61238 |  5501K|       |  8822   (1)| 00:00:01 |     3 |     2 ||*  7 |        INDEX RANGE SCAN DESCENDING             | IX_FR_MESSAGE_PART_TS |  7908 |       |       |   631   (1)| 00:00:01 |     3 |     2 |------------------------------------------------------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   1 - filter(ROWNUM<=40)   3 - filter(ROWNUM<=40)   5 - filter(ROWNUM<=40)   6 - filter("ID">=376894993815568384)   7 - access("TS"<TO_TIMESTAMP(:2))       filter(INTERNAL_FUNCTION("TS")+0>=:1 AND "TS"<TO_TIMESTAMP(:2))

We had to implement the mentioned Oracle-specific + 0 workaround and partition elimination bounds hardcoding in our custom ORM framework. It allows to retain the same fast paging performance after switching to partitioned tables with local indices.

But I wish much patience and sanity to those who venture to do the same switch without complete control of sql-building code.

It appears Oracle has too much pitfalls when partitioning and paging are mixed together. For example, we found that Oracle 12's new OFFSET ROWS / FETCH NEXT ROWS ONLY syntax sugar is almost unusable with local indexed partitioned tables as most of analytic windowing functions it's based upon.

The shortest working query to fetch some page behind the first one is

select * from (select * from (    select /*+ FIRST_ROWS(200) INDEX_RS_DESC(FR_MESSAGE_PART (TS)) */* from FR_MESSAGE_PARTwhere TS + 0 >= :1 and TS < :2  and ID >= 376894993815568384 and ID < 411234940974268416order by TS DESC) where ROWNUM <= 200) offset 180 rows;

Here is an example of actual execution plan after running such query:

SQL_ID  c67mmq4wg49sx, child number 0-------------------------------------select * from (select * from (select /*+ FIRST_ROWS(200)INDEX_RS_DESC("FR_MESSAGE_PART" ("TS")) GATHER_PLAN_STATISTICS */ "ID","MESSAGE_TYPE_ID", "TS", "REMOTE_ADDRESS", "TRX_ID","PROTOCOL_MESSAGE_ID", "MESSAGE_DATA_ID", "TEXT_OFFSET", "TEXT_SIZE","BODY_OFFSET", "BODY_SIZE", "INCOMING" from "FR_MESSAGE_PART" where"TS" + 0 >= :1 and "TS" < :2 and "ID" >= 376894993815568384 and "ID" <411234940974268416 order by "TS" DESC) where ROWNUM <= 200) offset 180rowsPlan hash value: 2499404919----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| Id  | Operation                                 | Name                  | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   | Pstart| Pstop | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT                          |                       |      1 |        |       |       |   640K(100)|          |       |       |     20 |00:00:00.01 |     322 |       |       |          ||*  1 |  VIEW                                     |                       |      1 |    200 |   130K|       |   640K  (1)| 00:00:26 |       |       |     20 |00:00:00.01 |     322 |       |       |          ||   2 |   WINDOW NOSORT                           |                       |      1 |    200 |   127K|       |   640K  (1)| 00:00:26 |       |       |    200 |00:00:00.01 |     322 |   142K|   142K|          ||   3 |    VIEW                                   |                       |      1 |    200 |   127K|       |   640K  (1)| 00:00:26 |       |       |    200 |00:00:00.01 |     322 |       |       |          ||*  4 |     COUNT STOPKEY                         |                       |      1 |        |       |       |            |          |       |       |    200 |00:00:00.01 |     322 |       |       |          ||   5 |      VIEW                                 |                       |      1 |    780K|   487M|       |   640K  (1)| 00:00:26 |       |       |    200 |00:00:00.01 |     322 |       |       |          ||*  6 |       SORT ORDER BY STOPKEY               |                       |      1 |    780K|    68M|    89M|   640K  (1)| 00:00:26 |       |       |    200 |00:00:00.01 |     322 | 29696 | 29696 |26624  (0)||   7 |        PARTITION RANGE ITERATOR           |                       |      1 |    780K|    68M|       |   624K  (1)| 00:00:25 |     3 |     2 |    400 |00:00:00.01 |     322 |       |       |          ||*  8 |         COUNT STOPKEY                     |                       |      2 |        |       |       |            |          |       |       |    400 |00:00:00.01 |     322 |       |       |          ||*  9 |          TABLE ACCESS BY LOCAL INDEX ROWID| FR_MESSAGE_PART       |      2 |    780K|    68M|       |   624K  (1)| 00:00:25 |     3 |     2 |    400 |00:00:00.01 |     322 |       |       |          ||* 10 |           INDEX RANGE SCAN DESCENDING     | IX_FR_MESSAGE_PART_TS |      2 |    559K|       |       | 44368   (1)| 00:00:02 |     3 |     2 |    400 |00:00:00.01 |       8 |       |       |          |----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Outline Data-------------  /*+      BEGIN_OUTLINE_DATA      IGNORE_OPTIM_EMBEDDED_HINTS      OPTIMIZER_FEATURES_ENABLE('12.1.0.2')      DB_VERSION('12.1.0.2')      OPT_PARAM('optimizer_dynamic_sampling' 0)      OPT_PARAM('_optimizer_dsdir_usage_control' 0)      FIRST_ROWS(200)      OUTLINE_LEAF(@"SEL$3")      OUTLINE_LEAF(@"SEL$2")      OUTLINE_LEAF(@"SEL$1")      OUTLINE_LEAF(@"SEL$4")      NO_ACCESS(@"SEL$4" "from$_subquery$_004"@"SEL$4")      NO_ACCESS(@"SEL$1" "from$_subquery$_001"@"SEL$1")      NO_ACCESS(@"SEL$2" "from$_subquery$_002"@"SEL$2")      INDEX_RS_DESC(@"SEL$3" "FR_MESSAGE_PART"@"SEL$3" ("FR_MESSAGE_PART"."TS"))      END_OUTLINE_DATA  */Predicate Information (identified by operation id):---------------------------------------------------   1 - filter("from$_subquery$_004"."rowlimit_$$_rownumber">180)   4 - filter(ROWNUM<=200)   6 - filter(ROWNUM<=200)   8 - filter(ROWNUM<=200)   9 - filter("ID">=376894993815568384)  10 - access("TS"<:2)       filter((INTERNAL_FUNCTION("TS")+0>=:1 AND "TS"<:2))

Note how much actual fetched rows and time are better than optimizer estimations.


Update

Beware than even this optimal plan could fail down to slow local index full scan in case lower partition elimination bound was guessed too low that the lowest partition doesn't contain enough records to match query filters.

rleishman's Tuning "BETWEEN" Queries states:

The problem is that an index can only scan on one column with a range predicate (<, >, LIKE, BETWEEN). So even if an index contained both the lower_bound and upper_bound columns, the index scan will return all of the rows matching lower_bound <= :b, and then filter the rows that do not match upper_bound >= :b.

In the case where the sought value is somewhere in the middle, the range scan will return half of the rows in the table in order to find a single row. In the worst case where the most commonly sought rows are at the top (highest values), the index scan will process almost every row in the table for every lookup.

It means that, unfortunately, Oracle doesn't take into account the lower bound of a range scan filter until it reaches STOPKEY COUNT condition or scans the whole partition!

So we had to limit lower partition elimination bound heuristics to the same month the lower timestamp period bound falls into.This defends against full index scans at expense of a risk of not showing some delayed transaction messages in the list.But this can be easily resolved by extending the supplied time period if needed.


I've also tried to apply the same + 0 trick to force optimal plan with dynamic partition elimination bounds binding:

select * from (select /*+ FIRST_ROWS(40) INDEX_RS_DESC(FR_MESSAGE_PART (TS)) */ * from FR_MESSAGE_PARTwhere TS+0 >= :1 and TS < :2  and ID >= :3 and ID+0 < :4order by TS DESC) where ROWNUM <= 40;

The plan then still retains proper STOPKEY COUNT per partition but the partition elimination is lost for upper bound as may be noticed by Pstart column of plan table:

----------------------------------------------------------------------------------------------------------------------------------------| Id  | Operation                                      | Name                  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |----------------------------------------------------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT                               |                       |    40 | 26200 |  9083   (1)| 00:00:01 |       |       ||*  1 |  COUNT STOPKEY                                 |                       |       |       |            |          |       |       ||   2 |   VIEW                                         |                       |   153 |    97K|  9083   (1)| 00:00:01 |       |       ||*  3 |    SORT ORDER BY STOPKEY                       |                       |   153 | 14076 |  9083   (1)| 00:00:01 |       |       ||   4 |     PARTITION RANGE ITERATOR                   |                       |   153 | 14076 |  9082   (1)| 00:00:01 |    10 |   KEY ||*  5 |      COUNT STOPKEY                             |                       |       |       |            |          |       |       ||*  6 |       TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| FR_MESSAGE_PART       |   153 | 14076 |  9082   (1)| 00:00:01 |    10 |   KEY ||*  7 |        INDEX RANGE SCAN DESCENDING             | IX_FR_MESSAGE_PART_TS | 11023 |       |   891   (1)| 00:00:01 |    10 |   KEY |----------------------------------------------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   1 - filter(ROWNUM<=40)   3 - filter(ROWNUM<=40)   5 - filter(ROWNUM<=40)   6 - filter("ID">=TO_NUMBER(:3) AND "ID"+0<TO_NUMBER(:4))   7 - access("TS"<TO_TIMESTAMP(:2))       filter(INTERNAL_FUNCTION("TS")+0>=:1 AND "TS"<TO_TIMESTAMP(:2))