Suppose there is sql query you have written which is going to take too much time .You have to optimize that query .
Then how should do that !!!
1)Where query taking more time in searching ?
2)how to find ?
3)Is query correct one ?
Lets explain you by this example
Here I have SQL query which is taking too much (infinite ) time to execute
Select nvl(acc.SPID,0) SPID
,nvl(acc.TYPE,0) TYPE
,nvl(acc.BAN,'-') BAN
,nvl(acc.LASTNAME,'-') LastName
,nvl(acc.FIRSTNAME,'-') FirstName
,nvl(acc.ACCOUNTNAME,'-') AccountName
,nvl(acc.State,0) State
,to_char(to_date('01011970','ddmmyyyy HH24:MI:SS')+(inv.INVOICEDATE + '%TIMEZONEOFFSET%' )/86400000,'%RESULTDATEFORMAT%') LastInvoiceDate
,nvl(inv.INVOICEID,0) LastInvoiceID
,nvl(decode(greatest(tran.amount,0),tran.amount,tran.amount,0),0)
From Account acc ,Invoice inv ,Transaction tran
Where acc.BAN = inv.BAN and inv.invoicedate>=%STARTDATE% and inv.invoicedate<%ENDDATE%
And acc.SPID = tran.SPID
Here information regarding number of entries tables have
Table Name Number of entries present in table
account 3000000
Invoice 500000
Transactrion 31995000
By looking only query we can’t determine where it takes too much time ....
On what search operation its taking too much time to execute ?
To get good information regarding what type of operation are performed internally, do the following
1) Fire following query
explain plan for
SQl query for which we want to execute plan
Here in this case its like
SQL> explain plan for
Select nvl(acc.SPID,0) SPID
,nvl(acc.TYPE,0) TYPE
,nvl(acc.BAN,'-') BAN
,nvl(acc.LASTNAME,'-') LastName
,nvl(acc.FIRSTNAME,'-') FirstName
,nvl(acc.ACCOUNTNAME,'-') AccountName
,nvl(acc.State,0) State
,to_char(to_date('01011970','ddmmyyyy HH24:MI:SS')+(inv.INVOICEDATE + '%TIMEZONEOFFSET%' )/86400000,'%RESULTDATEFORMAT%') LastInvoiceDate
,nvl(inv.INVOICEID,0) LastInvoiceID
,nvl(decode(greatest(tran.amount,0),tran.amount,tran.amount,0),0)
From Account acc ,Invoice inv ,Transaction tran
Where acc.BAN = inv.BAN and inv.invoicedate>=%STARTDATE% and inv.invoicedate<%ENDDATE%
And acc.SPID = tran.SPID
After this u get message like this
Explained.
2) After this fire following query and see output as shown below
SQL> select * from table (dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 378565521
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 16T| 1275T| | 105M(100)|350:24:41 |
|* 1 | HASH JOIN | | 16T| 1275T| 44M| 105M(100)|350:24:41 |
|* 2 | HASH JOIN | | 509K| 38M| 18M| 27408 (1)| 00:05:29 |
|* 3 | TABLE ACCESS FULL| INVOICE | 503K| 12M| | 3192 (2)| 00:00:39 |
| 4 | TABLE ACCESS FULL| ACCOUNT | 3001K| 151M| | 14040 (1)| 00:02:49 |
| 5 | TABLE ACCESS FULL | TRANSACTION | 32M| 213M| | 127K (1)| 00:25:29 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ACC"."SPID"="TRAN"."SPID")
2 - access("ACC"."BAN"="INV"."BAN")
3 - filter("INV"."INVOICEDATE">=1244520000000 AND
"INV"."INVOICEDATE"<1244606400000)
20 rows selected.
After getting these information regarding query I had got very good comments and information that really helps to optimize query
As follows
1) first line indicates you
| 0 | SELECT STATEMENT | | 16T| 1275T| | 105M(100)|350:24:41 |
*During select statement we are going to search for 1275 T (tera bytes) of data to execute query .
*1275T Tera bytes means its Huge database which Big companies also don’t have .
*It means it’s clear indication that we put something wrong in where condition which going to create search criteria to go for search 1275 T data.
*350:24:41 its indicating total time required to execute this query .
2)In 2nd and 3rd line
|* 1 | HASH JOIN | | 16T| 1275T| 44M| 105M(100)|350:24:41 |
|* 2 | HASH JOIN | | 509K| 38M| 18M| 27408 (1)| 00:05:29 |
*It indicates total time and Database it searching in executing particular join condition .
*In Query we have two joins which is indicated below table as
Predicate Information (identified by operation id):
1 - access("ACC"."SPID"="TRAN"."SPID")
2 - access("ACC"."BAN"="INV"."BAN")
Means to satisfies 1 joint condition it is searching for
|* 1 | HASH JOIN | | 16T| 1275T| 44M| 105M(100)|350:24:41 |
And to satisfies 2nd joint condition its searching for
|* 2 | HASH JOIN | | 509K| 38M| 18M| 27408 (1)| 00:05:29 |
Rest three line as I understand indicates that After satisfying joint condition its taking time and data to search and fetch data.
|* 3 | TABLE ACCESS FULL| INVOICE | 503K| 12M| | 3192 (2)| 00:00:39 |
| 4 | TABLE ACCESS FULL| ACCOUNT | 3001K| 151M| | 14040 (1)| 00:02:49 |
| 5 | TABLE ACCESS FULL | TRANSACTION | 32M| 213M| | 127K (1)| 00:25:29 |
Then from DB Analyzers view he(Ameo) suggested some really good thing
1)Always put table which have small number of entries rather than putting as driving table which have maximum number of entries .
From Account acc ,Invoice inv ,Transaction tran
Here Account Acts as Driving table .
So if we are putting as
From Transaction tran,Invoice inv, Account acc
It’s totally wrong...meansnot wrong by query it takes too more time than that of first one
2)second thing here clearly indicating that while putting where clause we are creating Cartesian Product .
(Query made on Cartesian product is always wrong one .......)
How , will explain
In where clause we are making as
("ACC"."SPID"="TRAN"."SPID")
("ACC"."BAN"="INV"."BAN")
Means there is link entry present between Account Transaction
Account Invoice
But there is no link between Transaction to Invoice .
That creates Cartesian product there and make query to search into huge data for result set .
After making changes suggested by DBA , Query becomes like below .You cant belive its executed in just 5 min !!!!!!!
Select nvl(acc.SPID,0) SPID
,nvl(acc.TYPE,0) TYPE
,nvl(acc.BAN,'-') BAN
,nvl(acc.LASTNAME,'-') LastName
,nvl(acc.FIRSTNAME,'-') FirstName
,nvl(acc.ACCOUNTNAME,'-') AccountName
,nvl(acc.State,0) State
,to_char(to_date('01011970','ddmmyyyy HH24:MI:SS')+(inv.INVOICEDATE + '%TIMEZONEOFFSET%')/86400000,'%RESULTDATEFORMAT%') LastInvoiceDate
,nvl(inv.INVOICEID,0) LastInvoiceID
,nvl(decode(greatest(tran.amount,0),tran.amount,tran.amount,0),0) Total
From Account acc ,Invoice inv ,Transaction tran
Where acc.BAN = inv.BAN and inv.invoicedate>=%STARTDATE% and inv.invoicedate<%ENDDATE%
And acc.SPID = tran.SPID
And tran.BAN = inv.BAN
Here above I add one more joint to avoid Cartesian product .
And tran.BAN = inv.BAN
And after making changes to query and firing same command above we get result as .Now you can analyze it by your owns and see the Difference .(here i have given output for query which have Group by also ....neglect that and see only imp filed as suggested above and marked in colour below)
SQL> select * from table (dbms_xplan.display);
PLAN_TABLE_OUTPUT
Plan hash value: 637312481
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5502K| 493M| | 328K (1)| 01:05:44 |
| 1 | HASH GROUP BY | | 5502K| 493M| 1194M| 328K (1)| 01:05:44 |
|* 2 | HASH JOIN | | 5502K| 493M| 186M| 209K (1)| 00:41:58 |
| 3 | TABLE ACCESS FULL | ACCOUNT | 3001K| 151M| | 14040 (1)| 00:02:49 |
|* 4 | HASH JOIN | | 5439K| 212M| 18M| 172K (1)| 00:34:34 |
|* 5 | TABLE ACCESS FULL| INVOICE | 503K| 12M| | 3192 (2)| 00:00:39 |
| 6 | TABLE ACCESS FULL| TRANSACTION | 32M| 457M| | 127K (1)| 00:25:29 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ACC"."BAN"="INV"."BAN" AND "ACC"."SPID"="TRAN"."SPID")
4 - access("TRAN"."BAN"="INV"."BAN")
5 - filter("INV"."INVOICEDATE">=1244520000000 AND
"INV"."INVOICEDATE"<1244606400000)
21 rows selected.
Hope you feel good after reading this one :)
Showing posts with label Knowledge Sharing. Show all posts
Showing posts with label Knowledge Sharing. Show all posts
Sunday, June 6, 2010
Thursday, June 3, 2010
Performance tuning in Oracle
During my previous company project we faced lots of issues regarding
performance of a sql query and after lots of debugging, goggling we came with
some findings, which we have implemented in existing queries and was able to
improve the performance of those existing non performing sql queries.
So here am
going to share/discuss all those findings we have, which I think every
developer should consider in performance point of view while writing any sql
query
Performance tuning parameters which we are going to discuss are
• Index
• Table Partitioning
• Parallel Processing
• Where clause
• Join conditions
• Function
• Explain Plan used for debugging purpose.
Index
In Oracle what does it mean by index on table?
As a name itself indicates it’s nothing but a
sequence number given to an entity from the collection/bunch of entity to
recognize correct and exact entity by using some logic .In oracle point view
when we create an index on any of the field ,oracle creates/gives a number
entry (may be unique) for that particular record. So when we start searching
that field record from table which is containing large amount of data in it and
if we have an index created on that particular field it becomes very fast to
get that particular record .Oracle itself maintains various algorithms to
search index field record entry from table data.
So whenever performance issue comes in oracle
database, first we have to check whether indexes are created or not on
the table fields which is most commonly going to be used in where clause. If we
take real time example, in book library there are number of books present and
each book has particular code written on it, what that code is? It is the index
created by librarian to search those books faster than normal case. Index
creation format chosen by librarian may be of any type of his convenience and
same applies with search algorithm as well .In Oracles as well it has its own
standards and methods of procedures for index creation and its searching
algorithm .User need not to worry about it .
We can create an index on any of field from database table,
The syntax for creating a index is:
CREATE [UNIQUE] INDEX index_name
ON table_name (column1, column2, . column_n)
[ COMPUTE STATISTICS ];
UNIQUE indicates that the combination of values in the indexed columns
must be unique.
COMPUTE STATISTICS tells Oracle to collect statistics during the
creation of the index. The statistics are then used by the optimizer to choose
a "plan of execution" when SQL statements are executed.
For example:
CREATE INDEX supplier_idx
ON supplier (supplier_name);
In this example, we've created an index on the supplier table called
supplier_idx. It consists of only one field - the supplier_name field.
We can create composite index also, means index created on two different
table fields.
CREATE [UNIQUE] INDEX index_name
ON table_name (function1, function2, . function_n)
[ COMPUTE STATISTICS ];
There is more number of ways to create, rename as well as drop an index.
Now where should we use indexes
and where should not
·
If we are not using
any where clause in the sql query , then there is no need in wasting time on
indexing.
·
When table contains
small amount of the data , then there is no need of using indexing.
·
When table contains
huge amount of the data ,put index on the field which is being used in where
clause.
·
Don’t create
indexes on all the fields present in the table because it’s difficult for
oracle to maintain synchronization between all the index fields present in
table and also it adds the time value difference in scanning ,so in these cases
we loss the basic purpose of indexing.
Table partitioning
What does it mean by table partitioning?
let’s
discuss direct technically, as it’s very easy in understanding the concept
behind it..!!
Consider we have a table named as CallDetail having large amount of the data inside it and it is not
partitioned . Daily millions of the records get added to it .Now consider we
want to generate a report from table data for a specific period of time. let’s
say we have table data for last 3 months and we want to generate a report for a
first month .So if we written an sql query
on the table which is not partitioned ,it will search whole data present
inside the table i.e. 3 months of data, which doesn’t make a good sense . why
should we search last 2 months data ? , if we are explicitly only looking for
first month of data, here we are clearly wasting our time and resources for
looking into unwanted data.
Here partitioning comes into picture .Oracle provides
different types of partitioning features for oracle user .In example If we
partitioned CallDetail table on the table field called “received_date” which is
of data type as Date and we are partitioned table on month basis. In this
situation Oracle internally split table into three parts , each part represent
one month of the table data .Its explicitly an internal operation performed by
database itself and use don’t have to do anything else and externally it looks
like a single table .Now If we run the same sql having where clause on
received_date field , oracle internally went to scan only first table (table
for first month) and didn’t went for rest two months data scanning . In this case we are definitely going to
save time which were initially spend for scanning unwanted two months of table
data.
Different
types of the table partitioning methods provided by Oracle.
• Range Partitioning
• Hash Partitioning
• List Partitioning
• Composite Partitioning
You can get the syntax and examples for the same @ below mentioned
links.
Few point to
be consider while using table partitioning…
• Table partitioning generally created / used on the
table having large amount of table data.
• How take the advantages of the partitioning ,
it’s all depends on end user. If we created partitioning on some of the field ,
but never used that field in where clause , then it doesn’t make any sense
having table partitioned .
• Partitioning generally created on the date field from table.
Parallel Processing
It’s quite simple to discuss parallel processing as everyone is aware of
its dictionary meaning, same applies here as well .In oracle parallel
processing applies on the table level scanning.
For example :
select /*+ PARALLEL(transact, 2) */ gl.debitglcode,
gl.creditglcode, SUM(transact.amount) amount
from
appcrm.transaction transact;
In above example we are assigning two separate threads to scan table
data in parallel way, so it will definitely
save some amount of time than if it was assigned to a single thread.
Few points to be consider while using parallel processing…
• Always put/assign parallel processing on the table
having large amount of data rather than table having small amount of data .
• Don’t assign number of threads blindly as more number of threads more
CPU it needs , which really doesn’t good
for any application…So in short use this feature quite smartly .
• Generally number of threads assign in parallel processing are in even
number .
Where clause and Joins
“Where clause” is the most important part in any kind
of the performance tuning , because generally most of the issues regarding
performance are generated because of the incorrect use of where clause
condition and always ends with modification in the where clause condition. All features
mentioned above provided by oracle to user, but to use it correctly and smartly
all depends on the use of the where clause . So always be cautious while using
it and use it smartly.
Lets discuss
this with an example,
Suppose I
have a sql ,which uses two tables Calldetail (having large amount of data
inside it ) and SubscriptionType (having very few/less amount of data)
Select c.ban ,
c.amount from
calldetail c ,
subscriptionType t
Where receiveddate
> startDate
and receivedDate < endDate
and receivedDate < endDate
and t.id
= 0;
•
If we saw above sql
,here we are using where clause condition on larger table first and smaller
table later , means we are using larger table as Driver table .In this case it
will first search the whole data from Calldetail
table and then go and check condition on smaller table .So rather doing this if
we check condition on smaller table first and then check condition on larger table and because of this approach if
condition checked on smaller table fails query wont execute condition put on
larger table as we used “And” condition here.
•
Similarly we are
not putting alias name before mentioning received date so oracle will check
from both the tables where that field present so if we explicitly mentions the
table/alias name before mentioning any filed , that will also help to reduce
the execution time.
•
Similarly we haven’t
put alias name before mentioning receiveDate so oracle will internally check for
both the tables for the presence of receiveDate field. So In this case if we
explicitly mention table/alias names before mentioning any table field, that
will help to reduce execution time.
•
If partitioning
present on the tables always try to use partitioned field in where clause .it
will definitely gives result set in faster manner .
So to get the result set in faster way, ideally query
should be written as below
Select c.ban , c.amount
from calldetail c ,
subscriptionType t
and t.id = 0
and c.receiveddate
> t.startDate
and c.receivedDate < t.endDate
and c.receivedDate < t.endDate
Functions and few more things
• When we have to
perform some repetitive operations in sql query or in cursor , always try to use
oracle/Database provided function. Oracle have mechanism to maintain a cache of
the operation we performed so if we use the same value to perform same
operation and if we use function ,we can take the benefits of the cache memory
(I didn’t go deep regarding this feature so not confident about it , but
functions always make our code more readable as well).
• If we are using some calculated value again and
again , try to write stored procedure and use variable inside it to store
calculated value in variable and use it rather than calculating again and again
.
• In some "for looping " cases use
"where clause" rather than using “For Looping”.
Explain plan for Debugging purpose
Whenever you face any issue regarding the performance of the sql always debug
using “Explain plan for query” .It will provide you all detail information
regarding the execution of sql like ,
• How many joins formed inside the sql.
• Which types of joins are formed in query result set.
• Is there any Cartesian product happening in query (if its , then it’s clear indication that you
wrote an incorrect sql )
• How much total time it will going to take to execute
whole sql ,Also it provide all details regarding time it has taken to execute
every single join.
• Is sql going for whole table scan or partition scan
?
You will get
more information about what is “Explain plan for” and how to use it in my next
blog J
Subscribe to:
Posts (Atom)