Optimizing SELECT query performance Optimizing SELECT query performance oracle oracle

Optimizing SELECT query performance


What I believe to be happening is the engine is having to resolve the 100m+ records from view join to 500m records BEFORE it applies limiting criteria (thus it creates a cross join and even if it can use indexes that's a lot of records to generate then parse. So even though you wrote it as an outer join, the engine isn't able to processes it that way (I don't know why)

So at a minimum 100m*500m = 50,000m that's a lot of data to generate and then parse/limit.

By eliminating the view, the engine may be better able to optimize and use the indexes thus eliminating the need for the 50,000m record join.

Areas where I would focus my time in troubleshooting:

  • Eliminate the view just to remove it as a potential overhead issue.
  • Recognize no tie between stg_scd_customers_key and V_CRM_CASE_ID_EXISTS_IN_DWH exists. This means the engine may be doing a cross join BEFORE the results of STG_DIM_CRM_CASES to stg_scd_customers_key have been resolved.

CONSIDER eliminating the view, or using an inline view

Eliminating the view:

SELECT *  FROM STG_DIM_CRM_CASES       ,crm_ps_rc_case t      ,dim_crm_cases x       ,stg_scd_customers_key WHERE t.case_id=x.crm_case_id   AND STG_DIM_CRM_CASES.CRM_CASE_ID = t.CASE_ID(+)   AND STG_DIM_CRM_CASES.account_number = stg_scd_customers_key.account_number(+)   AND STG_DIM_CRM_CASES.Case_Create_Date        between  stg_scd_customers_key.start_date(+) and stg_scd_customers_key.end_date(+)

using an inline view:

SELECT *  FROM STG_DIM_CRM_CASES   (select  t.case_id   from crm_ps_rc_case t, dim_crm_cases x   where t.case_id=x.crm_case_id) V_CRM_CASE_ID_EXISTS_IN_DWH      ,stg_scd_customers_key WHERE STG_DIM_CRM_CASES.CRM_CASE_ID = V_CRM_CASE_ID_EXISTS_IN_DWH.CASE_ID(+)   AND STG_DIM_CRM_CASES.account_number = stg_scd_customers_key.account_number(+)   AND STG_DIM_CRM_CASES.Case_Create_Date        between  stg_scd_customers_key.start_date(+) and stg_scd_customers_key.end_date(+)

As to why: - http://www.dba-oracle.com/art_hints_views.htm

While order of the where clause SHOULDN'T matter consider: On the off chase the engine is executing in the order listed, limiting the 500m down and then adding the supplemental data from the view would logically be faster.

SELECT *  FROM STG_DIM_CRM_CASES,stg_scd_customers_key,V_CRM_CASE_ID_EXISTS_IN_DWH WHERE STG_DIM_CRM_CASES.account_number = stg_scd_customers_key.account_number(+)   and STG_DIM_CRM_CASES.Case_Create_Date between  stg_scd_customers_key.start_date(+) and  stg_scd_customers_key.end_date(+)   and STG_DIM_CRM_CASES.CRM_CASE_ID = V_CRM_CASE_ID_EXISTS_IN_DWH.CASE_ID(+)


The problem is in scanning all partitions:

18 | PX SEND HASH | :TQ10001 | 515M| 14G| 507K (3)| 01:58:28 | | | | S->P | HASH | 25 | 19 | PARTITION RANGE ALL |
| 515M| 14G| 507K (3)| 01:58:28 | 1 | 2982 | |
| | 26 | 20 | TABLE ACCESS FULL | STG_SCD_CUSTOMERS_KEY | 515M| 14G|

It happens because you are using left join to this table. Can you select 1 partition using bind variable? What is partition key?I don't see hint for parallel but according to you plan it uses parallel. Is there parallel degree on any object level? Can you remove parallel and post explain plan without parallel please?


I think the problem is the view, which I suspect is completely executing and returning all rows before conditions are being applied.

The overall effect of the view is to add the column CASE_ID that is not null if CRM_CASE_ID is found in it, null otherwise. I've replaced the view with two direct joins and a CASE expression. By replacing the convenience of the view with logic, you can join directly to each table in it and so avoid one level of join depth.

Try running this version of the query:

SELECT  a.*, b.*, c.*,  CASE WHEN t.case_id is not null and X.case_id is not null then t.case_id END CASE_IDFROM STG_DIM_CRM_CASES aLEFT JOIN crm_ps_rc_case t  ON t.case_id = a.CRM_CASE_IDLEFT JOIN dim_crm_cases x  ON x.crm_case_id = a.CRM_CASE_IDLEFT JOIN V_CRM_CASE_ID_EXISTS_IN_DWH b  ON a.CRM_CASE_ID = b.CASE_IDLEFT JOIN stg_scd_customers_key c  ON a.account_number = c.account_number and a.Case_Create_Date between c.start_date and  stg_scd_customers_key.end_date

If you replace a.*, b.*, c.* with only the exact columns you actually need, you'll get a speed up because there's simply less data to return. If you also put indexes on looked-up keys plus all the columns you actually select (a covering index), you will speed it up considerably, because index-only access can be used.

You should verify there are indexes in all joined-to columns as a minimum.