Monday, February 9, 2009

Index range Scan Vs Nested loop in IN-LIST ?

Which one is better over the above? This question has been the one of the latest topic in our performance team when using a viable solution for varying In-List management.


Background:

We all know Oracle's shared pool is sensitive to the way sql is written - whitespace, uppercase/lowercase difference, number of bind variables in IN-Clause.
To avoid shared pool being bombarded with thousands of sqls of same type, we currently are using buckets for IN-clause.

Ex:

select * from emp where empno in (:B1,:B2,:B3,......,:B16) -- this is bucket size 16.

So, if user passes only one bind value, the rest of bind values will go as null. 
And if user passes two bind values, the rest of bind values will go as null. 

This way we control, the number of sqls which needs to be maintained in shared pool.

We implement different size of buckets: 16, 32, 64, 128, 256, 512, 768, 1000

Upon searching i came across Tomy kyte's solution mentioned here (http://tkyte.blogspot.com/2006/06/varying-in-lists.html).
Initially i used his 8i solution and results were somehow unexpected:
Here is the testcase:

SQL> create or replace type stringTableType as table of varchar2(4000);
  2  /

Type created.


SQL> SQL>   2    3    4
  5      v_str   long default p_str || ',';
  6
  7      v_n        number;
  8
  9      v_data    stringTableType := stringTableType();
 10
 11  begin
 12
 13      loop
 14
 15          v_n := instr( v_str, ',' );
 16
 17          exit when (nvl(v_n,0) = 0);
 18
 19          v_data.extend;
 20
 21          v_data( v_data.count ) := ltrim(rtrim(substr(v_str,1,v_n-1)));
 22
 23          v_str := substr( v_str, v_n+1 );
 24
 25      end loop;
 26
 27      return v_data;
 28
 29  end;
 30
 31  /

Function created.

SQL> create table test as select rownum r from dual connect by level<=100000;

Table created.

SQL> create index test_ux on test(r);

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'TEST',cascade=>true)

PL/SQL procedure successfully completed.


Now running a NORMAL in-list with 10 inlist values:

SQL> declare
a number;
begin
for i in 1..10000 loop
select count(*) into a from test where r in (1,2,3,4,5,6,7,8,9,10);
end loop;
end;
/
 

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.23

Note the "Elapsed" time. It is 1.5 secs.

Now using the Collection method:

SQL> SQL> declare
a number;
begin
for i in 1..10000 loop
select count(*) into a from test
where r in ( select /*+ cardinality(10) */ * from THE ( select cast( convert2Table( '1,2,3,4,5,6,7,8,9,10' ) as stringTableType ) from dual ) );
end loop;
end;
/
 

PL/SQL procedure successfully completed.

Elapsed: 00:00:22.41

Note the "Elapsed" time is almost 15x more than normal inlist.

This was a bit surprising.

I did the same test under 10046 + tkprof eyes.

SELECT COUNT(*)
FROM
 TEST WHERE R IN (1,2,3,4,5,6,7,8,9,10)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute  10000      0.46       0.42          0          0          0           0
Fetch    10000      1.99       2.00          0     200000          0       10000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    20001      2.46       2.43          0     200000          0       10000

Rows     Row Source Operation
-------  ---------------------------------------------------
  10000  SORT AGGREGATE (cr=200000 pr=0 pw=0 time=2117708 us)
 100000   INLIST ITERATOR  (cr=200000 pr=0 pw=0 time=2271474 us)
 100000    INDEX RANGE SCAN TEST_UX (cr=200000 pr=0 pw=0 time=1354629 us)(object id 60859168)

 

SELECT COUNT(*)
FROM
 TEST WHERE R IN ( SELECT /*+ cardinality(10) */ * FROM THE ( SELECT CAST(
  CONVERT2TABLE( '1,2,3,4,5,6,7,8,9,10' ) AS STRINGTABLETYPE ) FROM DUAL ) )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0         48          0           0
Execute  10000      5.40       5.42          0          0          0           0
Fetch    10000      9.57       9.65          0     120000          0       10000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    20001     14.99      15.09          0     120048          0       10000

Rows     Row Source Operation
-------  ---------------------------------------------------
  10000  SORT AGGREGATE (cr=120000 pr=0 pw=0 time=13856299 us)
 100000   NESTED LOOPS  (cr=120000 pr=0 pw=0 time=16447839 us)
 100000    VIEW  VW_NSO_1 (cr=0 pr=0 pw=0 time=11596021 us)
 100000     HASH UNIQUE (cr=0 pr=0 pw=0 time=11085381 us)
 100000      COLLECTION ITERATOR PICKLER FETCH CONVERT2TABLE (cr=0 pr=0 pw=0 time=3527457 us)
  20000       FAST DUAL  (cr=0 pr=0 pw=0 time=151992 us)
 100000    INDEX RANGE SCAN TEST_UX (cr=120000 pr=0 pw=0 time=2182846 us)(object id 60859168)


The difference between 2 is sea. Although the later did almost 1/2 LIO yet the total time spent on CPU was huge.

It seems to me that NL over INLIST ITERATOR is not a good choice ATLEAST for 10 values.

Now, I tried with 1000 values -- i didnt capture the screen shots. But here is what i saw.

-  for loop that runs 1000 times -- the "R in (1,2,3...1000)" took 54 secs. And using the Tom Kyte's suggested methood - it took 16 minutes.

Am i missing something?