INNER JOIN performance with '<' or '>' condition



Please check first whether you are really using an Oracle database. The syntax of your SQL suggests either a different RDBMS or some preprocessor.

To get an impression of what you can expect from this kind of query, you may use a dummy example such as the following.

Generate Sample Data

create table myTab as
with mySeq as
 (select rownum SessionOrder from dual connect by level <= 10000)
select 12 something, SessionOrder from mySeq
union all
select 14 something, SessionOrder from mySeq;

This produces both subsets, each with 10,000 rows and SessionOrder values running from 1 to 10,000.
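To sanity-check the generated data, a quick aggregate can confirm the two groups (this verification query is an illustration, not part of the original answer):

```sql
-- one row per group: expect 12 and 14, each with 10,000 rows
-- and SessionOrder spanning 1..10000
select something,
       count(*)          cnt,
       min(SessionOrder) min_so,
       max(SessionOrder) max_so
from   myTab
group  by something
order  by something;
```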

Test Query

create table myRes as
select a.SessionOrder rn0, b.SessionOrder rn1
from myTab a join myTab b on a.SessionOrder > b.SessionOrder and
a.something = 12 and b.something = 14;

This produces 49,995,000 rows in less than 30 seconds.

If you expect to get such a large result in much less time, you'll need advanced optimization. Without knowing your data and requirements, no general advice is possible.


As recommended, I tried to solve the problem with another strategy that achieved much better performance.

Despite this simple solution, I still do not understand why the original query was so slow. I suspect that the Oracle engine is not using the indexes.

SELECT i_0."SessionOrder", i_1."SessionOrder"
FROM "RawEvent" i_0
INNER JOIN "RawEvent" i_1 ON i_0."SessionOrder" < i_1."SessionOrder"
WHERE i_0."something" = 12 AND i_1."something" = 14


Your query performs three tasks:

1) fetch the data for both subsets (12 and 14)

2) join the data and

3) pass the result to the client

Note that the index access (which you suspect causes the problem) is relevant only for step 1. So to get a better impression, the first thing is to determine how the elapsed time is distributed across the three steps. This can be done using SQL*Plus (I'm using the same generated data as in my previous answer).

Data Access

As my table has no index, a count(*) performs a FULL TABLE SCAN. So in the worst case, twice this time is needed to get the data (once for each subset).

SQL> set timi on
SQL> set autotrace on
SQL> select count(*) from mytab;

  COUNT(*)
----------
     20000

Elapsed: 00:00:01.13

Execution Plan
----------------------------------------------------------
Plan hash value: 3284627250

--------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |  5472   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |       |     1 |            |          |
|   2 |   TABLE ACCESS FULL| MYTAB | 20000 |  5472   (1)| 00:00:01 |
--------------------------------------------------------------------

The FTS completes in about one second, so fetching both groups takes approximately two seconds.

JOIN

The elapsed time for the join can be simulated with a CTAS (CREATE TABLE AS SELECT) of the join query.

SQL> create table myRes as
  2  select a.SessionOrder rn0, b.SessionOrder rn1
  3  from myTab a join myTab b on a.SessionOrder > b.SessionOrder and
  4  a.something = 12 and b.something = 14;

Table created.

Elapsed: 00:00:23.65

The join returns nearly 50M rows (due to the greater-than condition) and takes about 21 seconds (I subtracted the 2 seconds for data access).

PASS the Data to Client

We use the option set autotrace traceonly to suppress the output of the query on the client screen; the data is still transferred, so we can measure the time. (If you rendered the result on the screen, the time would be even higher.)

SQL> SET ARRAYSIZE 5000
SQL> set autotrace traceonly
SQL> select a.SessionOrder rn0, b.SessionOrder rn1
  2  from myTab a join myTab b on a.SessionOrder > b.SessionOrder and
  3  a.something = 12 and b.something = 14;

49995000 rows selected.

Elapsed: 00:03:03.89

Execution Plan
----------------------------------------------------------
Plan hash value: 2857240533

-----------------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |    49M|   667M| 11077   (2)| 00:00:01 |
|   1 |  MERGE JOIN         |       |    49M|   667M| 11077   (2)| 00:00:01 |
|   2 |   SORT JOIN         |       | 10000 | 70000 |  5473   (1)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| MYTAB | 10000 | 70000 |  5472   (1)| 00:00:01 |
|*  4 |   SORT JOIN         |       | 10000 | 70000 |  5473   (1)| 00:00:01 |
|*  5 |    TABLE ACCESS FULL| MYTAB | 10000 | 70000 |  5472   (1)| 00:00:01 |
-----------------------------------------------------------------------------

This is where most of the time is spent: about 2 minutes 40 seconds.

Summary

So in this scenario, out of a total of 3+ minutes, only about 2 seconds (roughly 1%) are spent on data access. Even if you cut the data access time to a tenth, you'd see virtually no difference. The problem lies in the join, and even more in the transfer of the data to the client.

When Index Can Help

And of course it depends...

In one special case, where you have a very large table but very few rows with something in (12, 14), you may profit from an index defined on both something AND SessionOrder. This allows index-only access to the data, bypassing table access entirely.
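As a sketch of that special case (the index name is illustrative; it was not given in the original answer):

```sql
-- composite index covering both the filter column and the join column
create index myTab_ix on myTab (something, SessionOrder);

select a.SessionOrder rn0, b.SessionOrder rn1
from   myTab a
join   myTab b on a.SessionOrder > b.SessionOrder
where  a.something = 12
and    b.something = 14;
```

Because every column the query touches (something, SessionOrder) is contained in the index, the optimizer may read both row sources from the index alone (e.g. an INDEX RANGE SCAN) instead of performing a TABLE ACCESS FULL, which pays off when the filtered rows are a small fraction of a large table.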