KnowledgeShareing: Knowledge Sharing

Suppose there is sql query you have written which is going to take too much time .You have to optimize that query .
Then how should do that !!!
1)Where query taking more time in searching ?
2)how to find ?
3)Is query correct one ?

Lets explain you by this example

Here I have SQL query which is taking too much (infinite ) time to execute

Select nvl(acc.SPID,0) SPID
,nvl(acc.TYPE,0) TYPE
,nvl(acc.BAN,'-') BAN
,nvl(acc.LASTNAME,'-') LastName
,nvl(acc.FIRSTNAME,'-') FirstName
,nvl(acc.ACCOUNTNAME,'-') AccountName
,nvl(acc.State,0) State
,to_char(to_date('01011970','ddmmyyyy HH24:MI:SS')+(inv.INVOICEDATE + '%TIMEZONEOFFSET%' )/86400000,'%RESULTDATEFORMAT%') LastInvoiceDate
,nvl(inv.INVOICEID,0) LastInvoiceID
,nvl(decode(greatest(tran.amount,0),tran.amount,tran.amount,0),0)
From Account acc ,Invoice inv ,Transaction tran
Where acc.BAN = inv.BAN and inv.invoicedate>=%STARTDATE% and inv.invoicedate<%ENDDATE%
And acc.SPID = tran.SPID

Here information regarding number of entries tables have

Table Name Number of entries present in table
account 3000000
Invoice 500000
Transactrion 31995000

By looking only query we can’t determine where it takes too much time ....
On what search operation its taking too much time to execute ?

To get good information regarding what type of operation are performed internally, do the following

1) Fire following query

explain plan for
SQl query for which we want to execute plan

Here in this case its like

SQL> explain plan for
Select nvl(acc.SPID,0) SPID
,nvl(acc.TYPE,0) TYPE
,nvl(acc.BAN,'-') BAN
,nvl(acc.LASTNAME,'-') LastName
,nvl(acc.FIRSTNAME,'-') FirstName
,nvl(acc.ACCOUNTNAME,'-') AccountName
,nvl(acc.State,0) State
,to_char(to_date('01011970','ddmmyyyy HH24:MI:SS')+(inv.INVOICEDATE + '%TIMEZONEOFFSET%' )/86400000,'%RESULTDATEFORMAT%') LastInvoiceDate
,nvl(inv.INVOICEID,0) LastInvoiceID
,nvl(decode(greatest(tran.amount,0),tran.amount,tran.amount,0),0)
From Account acc ,Invoice inv ,Transaction tran
Where acc.BAN = inv.BAN and inv.invoicedate>=%STARTDATE% and inv.invoicedate<%ENDDATE%
And acc.SPID = tran.SPID

After this u get message like this

Explained.

2) After this fire following query and see output as shown below

SQL> select * from table (dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 378565521

-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 16T| 1275T| | 105M(100)|350:24:41 |
|* 1 | HASH JOIN | | 16T| 1275T| 44M| 105M(100)|350:24:41 |
|* 2 | HASH JOIN | | 509K| 38M| 18M| 27408 (1)| 00:05:29 |
|* 3 | TABLE ACCESS FULL| INVOICE | 503K| 12M| | 3192 (2)| 00:00:39 |
| 4 | TABLE ACCESS FULL| ACCOUNT | 3001K| 151M| | 14040 (1)| 00:02:49 |
| 5 | TABLE ACCESS FULL | TRANSACTION | 32M| 213M| | 127K (1)| 00:25:29 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("ACC"."SPID"="TRAN"."SPID")
2 - access("ACC"."BAN"="INV"."BAN")
3 - filter("INV"."INVOICEDATE">=1244520000000 AND
"INV"."INVOICEDATE"<1244606400000)

20 rows selected.

After getting these information regarding query I had got very good comments and information that really helps to optimize query

As follows
1) first line indicates you

| 0 | SELECT STATEMENT | | 16T| 1275T| | 105M(100)|350:24:41 |

*During select statement we are going to search for 1275 T (tera bytes) of data to execute query .
*1275T Tera bytes means its Huge database which Big companies also don’t have .
*It means it’s clear indication that we put something wrong in where condition which going to create search criteria to go for search 1275 T data.
*350:24:41 its indicating total time required to execute this query .

2)In 2nd and 3rd line

|* 1 | HASH JOIN | | 16T| 1275T| 44M| 105M(100)|350:24:41 |
|* 2 | HASH JOIN | | 509K| 38M| 18M| 27408 (1)| 00:05:29 |

*It indicates total time and Database it searching in executing particular join condition .
*In Query we have two joins which is indicated below table as

Predicate Information (identified by operation id):

1 - access("ACC"."SPID"="TRAN"."SPID")
2 - access("ACC"."BAN"="INV"."BAN")

Means to satisfies 1 joint condition it is searching for

|* 1 | HASH JOIN | | 16T| 1275T| 44M| 105M(100)|350:24:41 |

And to satisfies 2nd joint condition its searching for

|* 2 | HASH JOIN | | 509K| 38M| 18M| 27408 (1)| 00:05:29 |

Rest three line as I understand indicates that After satisfying joint condition its taking time and data to search and fetch data.

|* 3 | TABLE ACCESS FULL| INVOICE | 503K| 12M| | 3192 (2)| 00:00:39 |
| 4 | TABLE ACCESS FULL| ACCOUNT | 3001K| 151M| | 14040 (1)| 00:02:49 |
| 5 | TABLE ACCESS FULL | TRANSACTION | 32M| 213M| | 127K (1)| 00:25:29 |

Then from DB Analyzers view he(Ameo) suggested some really good thing

1)Always put table which have small number of entries rather than putting as driving table which have maximum number of entries .

From Account acc ,Invoice inv ,Transaction tran

Here Account  Acts as Driving table .

So if we are putting as

From Transaction tran,Invoice inv, Account acc

It’s totally wrong...meansnot wrong by query it takes too more time than that of first one

2)second thing here clearly indicating that while putting where clause we are creating Cartesian Product .
(Query made on Cartesian product is always wrong one .......)

How , will explain
In where clause we are making as

("ACC"."SPID"="TRAN"."SPID")

("ACC"."BAN"="INV"."BAN")

Means there is link entry present between Account Transaction
Account Invoice
But there is no link between Transaction to Invoice .

That creates Cartesian product there and make query to search into huge data for result set .

After making changes suggested by DBA , Query becomes like below .You cant belive its executed in just 5 min !!!!!!!

Select nvl(acc.SPID,0) SPID
,nvl(acc.TYPE,0) TYPE
,nvl(acc.BAN,'-') BAN
,nvl(acc.LASTNAME,'-') LastName
,nvl(acc.FIRSTNAME,'-') FirstName
,nvl(acc.ACCOUNTNAME,'-') AccountName
,nvl(acc.State,0) State
,to_char(to_date('01011970','ddmmyyyy HH24:MI:SS')+(inv.INVOICEDATE + '%TIMEZONEOFFSET%')/86400000,'%RESULTDATEFORMAT%') LastInvoiceDate
,nvl(inv.INVOICEID,0) LastInvoiceID
,nvl(decode(greatest(tran.amount,0),tran.amount,tran.amount,0),0) Total
From Account acc ,Invoice inv ,Transaction tran
Where acc.BAN = inv.BAN and inv.invoicedate>=%STARTDATE% and inv.invoicedate<%ENDDATE%
And acc.SPID = tran.SPID
And tran.BAN = inv.BAN

Here above I add one more joint to avoid Cartesian product .
And tran.BAN = inv.BAN

And after making changes to query and firing same command above we get result as .Now you can analyze it by your owns and see the Difference .(here i have given output for query which have Group by also ....neglect that and see only imp filed as suggested above and marked in colour below)

SQL> select * from table (dbms_xplan.display);

PLAN_TABLE_OUTPUT
Plan hash value: 637312481

--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5502K| 493M| | 328K (1)| 01:05:44 |
| 1 | HASH GROUP BY | | 5502K| 493M| 1194M| 328K (1)| 01:05:44 |
|* 2 | HASH JOIN | | 5502K| 493M| 186M| 209K (1)| 00:41:58 |
| 3 | TABLE ACCESS FULL | ACCOUNT | 3001K| 151M| | 14040 (1)| 00:02:49 |
|* 4 | HASH JOIN | | 5439K| 212M| 18M| 172K (1)| 00:34:34 |
|* 5 | TABLE ACCESS FULL| INVOICE | 503K| 12M| | 3192 (2)| 00:00:39 |
| 6 | TABLE ACCESS FULL| TRANSACTION | 32M| 457M| | 127K (1)| 00:25:29 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ACC"."BAN"="INV"."BAN" AND "ACC"."SPID"="TRAN"."SPID")
4 - access("TRAN"."BAN"="INV"."BAN")
5 - filter("INV"."INVOICEDATE">=1244520000000 AND
"INV"."INVOICEDATE"<1244606400000)

21 rows selected.

Hope you feel good after reading this one :)

During my previous company project we faced lots of issues regarding performance of a sql query and after lots of debugging, goggling we came with some findings, which we have implemented in existing queries and was able to improve the performance of those existing non performing sql queries.

So here am going to share/discuss all those findings we have, which I think every developer should consider in performance point of view while writing any sql query

Performance tuning parameters which we are going to discuss are

• Index

• Table Partitioning

• Parallel Processing

• Where clause

• Join conditions

• Function

• Explain Plan used for debugging purpose.

Index

In Oracle what does it mean by index on table?

As a name itself indicates it’s nothing but a sequence number given to an entity from the collection/bunch of entity to recognize correct and exact entity by using some logic .In oracle point view when we create an index on any of the field ,oracle creates/gives a number entry (may be unique) for that particular record. So when we start searching that field record from table which is containing large amount of data in it and if we have an index created on that particular field it becomes very fast to get that particular record .Oracle itself maintains various algorithms to search index field record entry from table data.

So whenever performance issue comes in oracle database, first we have to check whether indexes are created or not on the table fields which is most commonly going to be used in where clause. If we take real time example, in book library there are number of books present and each book has particular code written on it, what that code is? It is the index created by librarian to search those books faster than normal case. Index creation format chosen by librarian may be of any type of his convenience and same applies with search algorithm as well .In Oracles as well it has its own standards and methods of procedures for index creation and its searching algorithm .User need not to worry about it .

We can create an index on any of field from database table,

The syntax for creating a index is:

CREATE [UNIQUE] INDEX index_name

ON table_name (column1, column2, . column_n)

[ COMPUTE STATISTICS ];

UNIQUE indicates that the combination of values in the indexed columns must be unique.

COMPUTE STATISTICS tells Oracle to collect statistics during the creation of the index. The statistics are then used by the optimizer to choose a "plan of execution" when SQL statements are executed.

For example:

CREATE INDEX supplier_idx

ON supplier (supplier_name);

In this example, we've created an index on the supplier table called supplier_idx. It consists of only one field - the supplier_name field.

We can create composite index also, means index created on two different table fields.

CREATE [UNIQUE] INDEX index_name

ON table_name (function1, function2, . function_n)

[ COMPUTE STATISTICS ];

There is more number of ways to create, rename as well as drop an index.

Now where should we use indexes and where should not

· If we are not using any where clause in the sql query , then there is no need in wasting time on indexing.

· When table contains small amount of the data , then there is no need of using indexing.

· When table contains huge amount of the data ,put index on the field which is being used in where clause.

· Don’t create indexes on all the fields present in the table because it’s difficult for oracle to maintain synchronization between all the index fields present in table and also it adds the time value difference in scanning ,so in these cases we loss the basic purpose of indexing.

Table partitioning

What does it mean by table partitioning?

let’s discuss direct technically, as it’s very easy in understanding the concept behind it..!!

Consider we have a table named as CallDetail having large amount of the data inside it and it is not partitioned . Daily millions of the records get added to it .Now consider we want to generate a report from table data for a specific period of time. let’s say we have table data for last 3 months and we want to generate a report for a first month .So if we written an sql query on the table which is not partitioned ,it will search whole data present inside the table i.e. 3 months of data, which doesn’t make a good sense . why should we search last 2 months data ? , if we are explicitly only looking for first month of data, here we are clearly wasting our time and resources for looking into unwanted data.

Here partitioning comes into picture .Oracle provides different types of partitioning features for oracle user .In example If we partitioned CallDetail table on the table field called “received_date” which is of data type as Date and we are partitioned table on month basis. In this situation Oracle internally split table into three parts , each part represent one month of the table data .Its explicitly an internal operation performed by database itself and use don’t have to do anything else and externally it looks like a single table .Now If we run the same sql having where clause on received_date field , oracle internally went to scan only first table (table for first month) and didn’t went for rest two months data scanning . In this case we are definitely going to save time which were initially spend for scanning unwanted two months of table data.

Different types of the table partitioning methods provided by Oracle.

• Range Partitioning

• Hash Partitioning

• List Partitioning

• Composite Partitioning

You can get the syntax and examples for the same @ below mentioned links.

http://www.oracle-dba-online.com/sql/oracle_table_partition.htm

http://www.devarticles.com/c/a/Oracle/Partitioning-in-Oracle/1/

Few point to be consider while using table partitioning…

• Table partitioning generally created / used on the table having large amount of table data.

• How take the advantages of the partitioning , it’s all depends on end user. If we created partitioning on some of the field , but never used that field in where clause , then it doesn’t make any sense

having table partitioned .

• Partitioning generally created on the date field from table.

Parallel Processing

It’s quite simple to discuss parallel processing as everyone is aware of its dictionary meaning, same applies here as well .In oracle parallel processing applies on the table level scanning.

For example :

select /*+ PARALLEL(transact, 2) */ gl.debitglcode, gl.creditglcode, SUM(transact.amount) amount

from appcrm.transaction transact;

In above example we are assigning two separate threads to scan table data in parallel way, so it will definitely save some amount of time than if it was assigned to a single thread.

Few points to be consider while using parallel processing…

• Always put/assign parallel processing on the table having large amount of data rather than table having small amount of data .

• Don’t assign number of threads blindly as more number of threads more CPU it needs , which really doesn’t good for any application…So in short use this feature quite smartly .

• Generally number of threads assign in parallel processing are in even number .

Where clause and Joins

“Where clause” is the most important part in any kind of the performance tuning , because generally most of the issues regarding performance are generated because of the incorrect use of where clause condition and always ends with modification in the where clause condition. All features mentioned above provided by oracle to user, but to use it correctly and smartly all depends on the use of the where clause . So always be cautious while using it and use it smartly.

Lets discuss this with an example,

Suppose I have a sql ,which uses two tables Calldetail (having large amount of data inside it ) and SubscriptionType (having very few/less amount of data)

Select c.ban , c.amount from

calldetail c ,

subscriptionType t

Where receiveddate > startDate
and receivedDate < endDate

and t.id = c.id

and t.id = 0;

• If we saw above sql ,here we are using where clause condition on larger table first and smaller table later , means we are using larger table as Driver table .In this case it will first search the whole data from Calldetail table and then go and check condition on smaller table .So rather doing this if we check condition on smaller table first and then check condition on larger table and because of this approach if condition checked on smaller table fails query wont execute condition put on larger table as we used “And” condition here.

• Similarly we are not putting alias name before mentioning received date so oracle will check from both the tables where that field present so if we explicitly mentions the table/alias name before mentioning any filed , that will also help to reduce the execution time.

• Similarly we haven’t put alias name before mentioning receiveDate so oracle will internally check for both the tables for the presence of receiveDate field. So In this case if we explicitly mention table/alias names before mentioning any table field, that will help to reduce execution time.

• If partitioning present on the tables always try to use partitioned field in where clause .it will definitely gives result set in faster manner .

So to get the result set in faster way, ideally query should be written as below

Select c.ban , c.amount

from calldetail c ,

subscriptionType t

Where t.id = c.id

and t.id = 0

and c.receiveddate > t.startDate
and c.receivedDate < t.endDate

Functions and few more things

• When we have to perform some repetitive operations in sql query or in cursor , always try to use oracle/Database provided function. Oracle have mechanism to maintain a cache of the operation we performed so if we use the same value to perform same operation and if we use function ,we can take the benefits of the cache memory (I didn’t go deep regarding this feature so not confident about it , but functions always make our code more readable as well).

• If we are using some calculated value again and again , try to write stored procedure and use variable inside it to store calculated value in variable and use it rather than calculating again and again .

• In some "for looping " cases use "where clause" rather than using “For Looping”.

Explain plan for Debugging purpose

Whenever you face any issue regarding the performance of the sql always debug using “Explain plan for query” .It will provide you all detail information regarding the execution of sql like ,

• How many joins formed inside the sql.

• Which types of joins are formed in query result set.

• Is there any Cartesian product happening in query (if its , then it’s clear indication that you wrote an incorrect sql )

• How much total time it will going to take to execute whole sql ,Also it provide all details regarding time it has taken to execute every single join.

• Is sql going for whole table scan or partition scan ?

You will get more information about what is “Explain plan for” and how to use it in my next blog J

KnowledgeShareing

Sunday, June 6, 2010

Explain Plan for in Oracle

Thursday, June 3, 2010

Performance tuning in Oracle

Followers

Blog Archive

About Me