Expert Oracle SQL: Optimization, Deployment, and Statistics (2014)

PART 4. Optimization

CHAPTER 15. Physical Database Design

For the runtime engine to execute a SQL statement with adequate performance, you must provide it with a properly sized machine that has an adequate SGA and an adequate PGA. However, physical database design is also crucial: even with a reasonably sized machine, a well-written SQL statement, and a CBO that gets every cardinality estimate exactly right, you are unlikely to get decent performance unless you pay attention to physical database design.


There are those who suggest that one of the benefits of Exadata is that you don’t have to worry about physical database design: full table scans are so cheap and hardware is so plentiful that, the argument goes, nobody needs to care.

I feel that this is a bit of an exaggeration. Yes, it might not be cost-effective to spend months designing your physical model, but there will always be a desire to use hardware effectively. Consideration of physical database design is almost always a good idea.

This chapter reviews the more critical aspects of physical database design, one of the few aspects of SQL performance that we can’t blame on the CBO.

All too often people equate “physical database design improvement” with “adding indexes.” This is very far from the truth, and we will begin this chapter by comparing both the positive and negative aspects of indexes. Once you have come to the conclusion that one or more indexes are necessary, you still have a lot of decisions ahead of you: what columns do you want in the index, whether to use a bitmap index, and so on. We will go through some of these questions.

The partitioning option is a popular and effective tool to improve performance, and I will cover what I believe to be the three most important performance benefits of partitioning: full table scans on portions of a table, the ability to use partition-wise joins, and the additional possibilities for parallelization.

In some circles the word “denormalization” is considered evil, just like “hinting.” And just like hinting, denormalization has a legitimate place in SQL performance tuning. I will cover materialized views and other forms of denormalization later in the chapter. I will round out the chapter with a few comments about compression and LOBs.

Before we start, however, I want to reemphasize that physical database design is almost never about tuning individual SQL statements. You are generally looking to find a way to optimize the entire application. This thought should be at the forefront of your mind as we discuss indexes.

Adding and Removing Indexes

Imagine yourself looking at the execution plan of a query. You see that an index is being used to access a table, but that index is not particularly selective: too often a table block is accessed and then the row specified in the index is thrown away because of a predicate on an un-indexed column. “Aha,” you say. “Let me create a new index that includes all the columns referenced by predicates in my query and all will be well.” You create your new index (or add columns to an existing index) and . . . voilà! Your query runs fast and you pat yourself on the back. This type of behavior is far too common and reflects an inappropriate way to think about indexing and physical database design in general: you shouldn’t add indexes piecemeal to optimize individual queries. You need to think about the optimal set of indexes for a particular table, a set that supports all your queries without overburdening DML. Let us first look at indexing strategy from the other side of the table: optimizing DML.

Removing Indexes

One of the most overlooked issues in database performance is the impact of having too many indexes on a table. Let us consider this overhead in some detail before discussing how best to reduce it.

Understanding the Overhead of Indexes on DML

If you have a dozen indexes on a table, every time you add a row to that table you need to add a dozen index entries. Together with the change to the table block itself, that makes 13 block changes. All 13 block changes are guaranteed to be made to different blocks because the changes are being made to different database objects. Don’t underestimate this overhead. Indexes take up space in the SGA, increase the amount of redo needed, and often generate lots of I/O. Have you ever seen an INSERT statement constantly waiting on “db file sequential read” wait events? You may have assumed that these wait events were associated with the query portion of the INSERT statement. The easiest way to check this is to look at the CURRENT_OBJ# column in V$ACTIVE_SESSION_HISTORY or DBA_HIST_ACTIVE_SESS_HISTORY. This number represents the OBJECT_ID from DBA_OBJECTS associated with the object being accessed. You may well find that most or all of these wait events are associated with index blocks from the table into which you are inserting rows; you need to read the index block into the SGA prior to adding the index entry. If you can reduce the number of indexes on a table you can substantially improve the performance of DML. The following is a quote from the Performance Tuning Guide:
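A sketch of such a check follows. Note that querying V$ACTIVE_SESSION_HISTORY requires the Diagnostics Pack license, and the SQL_ID shown is a placeholder for that of your own INSERT statement:

```sql
-- Sketch: map "db file sequential read" waits for one SQL_ID back to the
-- objects being read. '7ab8c9def0g12' is a placeholder SQL_ID.
SELECT o.object_name
      ,o.object_type
      ,COUNT (*) AS samples
  FROM v$active_session_history ash
       JOIN dba_objects o ON o.object_id = ash.current_obj#
 WHERE ash.sql_id = '7ab8c9def0g12'
   AND ash.event = 'db file sequential read'
 GROUP BY o.object_name, o.object_type
 ORDER BY samples DESC;
```

If the top rows returned are the indexes on the table being inserted into, you have found your overhead.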

Use this simple estimation guide for the cost of index maintenance: each index maintained by an INSERT, DELETE, or UPDATE of the indexed keys requires about three times as much resource as the actual DML operation on the table. Thus, if you INSERT into a table with three indexes, then the insertion is approximately 10 times slower than an INSERT into a table with no indexes. For DML, and particularly for INSERT-heavy applications, the index design should be seriously reviewed, which might require a compromise between the query and INSERT performance.

Based on the number of tables I have seen with dozens of indexes, many developers are unaware of the above issue. Either that or the information is forgotten in the heat of the moment when solving query performance issues.

Using V$OBJECT_USAGE to Identify Unused Indexes

As a short-term fix to expensive DML you can look for indexes that aren’t being used. This can be done by enabling index monitoring on all the indexes in a table or a schema with the ALTER INDEX ... MONITORING USAGE command. This command creates a row in the view V$OBJECT_USAGE. The V$OBJECT_USAGE view contains a column USED that is initialized to the value NO by the ALTER INDEX command and is updated to YES when the index is used in an execution plan. The idea is that if the value of the USED column remains NO forever then the index isn’t used and thus can safely be dropped. Of course, forever is a long time, but the longer your system runs without the index being used, the higher your confidence that the index will never be used by the current version of your application.
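In practice the workflow might look like this (ITEM_ORDER_IX is used here purely as an example):

```sql
-- Start monitoring the index; this also creates its row in V$OBJECT_USAGE.
ALTER INDEX oe.item_order_ix MONITORING USAGE;

-- ... let the application run for a representative period ...

-- Check whether any execution plan has used the index since monitoring began.
SELECT index_name, monitoring, used, start_monitoring
  FROM v$object_usage
 WHERE index_name = 'ITEM_ORDER_IX';

-- Switch monitoring off again when you are done.
ALTER INDEX oe.item_order_ix NOMONITORING USAGE;
```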

You should bear in mind the following points about V$OBJECT_USAGE:

·     You can only view indexes owned by yourself in V$OBJECT_USAGE. Put another way, you have to log in as the schema owner to use the V$OBJECT_USAGE view.

·     The gathering of statistics on the monitored index does not cause the value of the USED column in V$OBJECT_USAGE to change.

·     Unusually for a dynamic performance view, restarting the database does not cause the value of the USED column in V$OBJECT_USAGE to change for any indexes.

·     Adding or removing index entries by DML does not cause the value of the USED column in V$OBJECT_USAGE to change.

·     Rebuilding or coalescing an index does not cause the value of the USED column in V$OBJECT_USAGE to change.

·     When an index is used to enforce a primary key, unique key, or foreign key constraint, the value of the USED column in V$OBJECT_USAGE does not change.

This last point is worth emphasizing: just because an index isn’t used in an execution plan does not mean it is not fulfilling a purpose, so take care when reviewing the indexes you consider dropping.

Removing Indexes on Columns Used in Foreign Key Constraints

During the 1990s a myth regarding a so-called best practice got completely out of hand. That myth was that it was always necessary to index the referencing columns in a foreign key constraint. The myth took such a hold that many database design tools automatically generated DDL to create indexes on such columns.

Take the ORDER_ID column in the ORDER_ITEMS table in the OE example schema—this column is used by the ORDER_ITEMS_ORDER_ID_FK foreign key constraint that references the ORDER_ID column in the ORDERS table. Now it is necessary to index the ORDER_ID column in the ORDERS table because ORDER_ID is the primary key of the ORDERS table, but it is not necessary to index the ORDER_ID column in the ORDER_ITEMS table.

How did the myth that you should always index columns like ORDER_ITEMS.ORDER_ID take hold? Well, if you DELETE rows from the ORDERS table (or you change the primary key with an UPDATE or MERGE statement) then a table lock will be required on ORDER_ITEMS unless the ORDER_ITEMS.ORDER_ID column is indexed. But if you don’t issue such DELETE statements or the performance of the DELETE statements isn’t critical then you don’t need the index.

Don’t get me wrong—regardless of the fact that the ORDER_ID column in ORDER_ITEMS is used in a foreign key constraint, it may be a good idea to index the column anyway. If you join ORDERS with ORDER_ITEMS using the ORDER_ID columns on both tables, and you wish to drive a nested loop from the ORDERS table into ORDER_ITEMS, then an index on ORDER_ITEMS.ORDER_ID may be a good idea. In fact, the OE.ORDER_ITEMS table does have an index named ITEM_ORDER_IX on ORDER_ID.
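If you do want to audit a schema for this, a dictionary query along the following lines can flag single-column foreign keys whose referencing column is not the leading column of any index. This is a sketch only: multi-column foreign keys need a more careful column-by-column comparison.

```sql
-- Sketch: single-column foreign key constraints in the current schema
-- with no index led by the referencing column.
SELECT c.constraint_name, cc.table_name, cc.column_name
  FROM user_constraints c
       JOIN user_cons_columns cc
          ON cc.constraint_name = c.constraint_name
 WHERE c.constraint_type = 'R'
   AND NOT EXISTS
          (SELECT NULL
             FROM user_ind_columns ic
            WHERE ic.table_name = cc.table_name
              AND ic.column_name = cc.column_name
              AND ic.column_position = 1);
```

Whether any constraint this query flags actually needs an index depends, as discussed above, on whether you delete parent rows or update parent keys.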

But all this business of dropping unused indexes is just a quick fix. The best thing to do is to do a thorough job of understanding your indexing requirements and keep those, and only those, indexes.

Identifying Required Indexes

The Performance Tuning Guide states:

Index design is ... a largely iterative process, based on the SQL generated by application designers. However, it is possible to make a sensible start by building indexes that enforce primary key constraints and indexes on known access patterns, such as a person’s name.

It is quite true that index design is an iterative process, but I would caution you to be very conservative in your initial selection of indexes. It is easy to assume that an index is required when in fact it isn’t. How can you tell what a known access pattern is before you have started testing? If you have the time then I would suggest creating indexes to support primary and unique constraints only and then run some tests to see how you get on.

Single-column and Multi-column Indexes

Most non-unique B-tree indexes should be multi-column. Let me first explain why this is and then I will go on to discuss column ordering.

Misuse of Single-column Non-unique B-tree Indexes

Listing 15-1 takes another look at the indexes in the ORDER_ITEMS table in the OE example schema, and at ITEM_ORDER_IX in particular.

Listing 15-1. Misuse of a single-column non-unique index

SELECT *
  FROM oe.orders JOIN oe.order_items i USING (order_id)
 WHERE order_id = 2400;

| Id  | Operation                            | Name          | Cost (%CPU)|
|   0 | SELECT STATEMENT                     |               |     4   (0)|
|   1 |  NESTED LOOPS                        |               |     4   (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID        | ORDERS        |     1   (0)|
|*  3 |    INDEX UNIQUE SCAN                 | ORDER_PK      |     0   (0)|
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| ORDER_ITEMS   |     3   (0)|
|*  5 |    INDEX RANGE SCAN                  | ITEM_ORDER_IX |     1   (0)|

SELECT /*+ index(i order_items_pk) */ *
  FROM oe.orders JOIN oe.order_items i USING (order_id)
 WHERE order_id = 2400;

| Id  | Operation                            | Name           | Cost (%CPU)|
|   0 | SELECT STATEMENT                     |                |     4   (0)|
|   1 |  NESTED LOOPS                        |                |     4   (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID        | ORDERS         |     1   (0)|
|*  3 |    INDEX UNIQUE SCAN                 | ORDER_PK       |     0   (0)|
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| ORDER_ITEMS    |     3   (0)|
|*  5 |    INDEX RANGE SCAN                  | ORDER_ITEMS_PK |     1   (0)|

The first query in Listing 15-1 shows the sort of query that one presumes the ITEM_ORDER_IX index was designed for: a query on a particular ORDER_ID. However, the second query in Listing 15-1 uses a hint to force the use of the multi-column index supporting the primary key constraint. The index that enforces the primary key is named ORDER_ITEMS_PK. The ORDER_ITEMS_PK index is a multi-column unique index that has ORDER_ID as its leading column, and the reported estimated cost of using ORDER_ITEMS_PK is identical to that of ITEM_ORDER_IX: we don’t need both indexes.

Of course the ITEM_ORDER_IX index is slightly smaller than the ORDER_ITEMS_PK index, so in fact there might be a slight performance gain from using ITEM_ORDER_IX over ORDER_ITEMS_PK. In practice, however, the DML overhead of maintaining ITEM_ORDER_IX will be far greater than the benefit the index affords queries, and if this were a real-life table then I would drop the single-column ITEM_ORDER_IX index.

Note  If the ITEM_ORDER_IX index were to be dropped then there would still be no table lock when rows in ORDERS are deleted. This is because ORDER_ID is the leading column in ORDER_ITEMS_PK, and as such ORDER_ITEMS_PK prevents the table lock.

Correct Use of Multi-column Non-unique Indexes

If you find the need to create a non-unique B-tree index on a column then it almost always makes sense to add additional columns to the index to help queries that use predicates involving multiple columns. This is because adding a column or two to an index generally adds relatively little overhead to either DML or queries, but adding additional indexes is very expensive for DML. Listing 15-2 shows the use of a potentially legitimate index on the OE.CUSTOMERS table.

Listing 15-2. Correct use of a multi-column index

SELECT *
  FROM oe.customers
 WHERE UPPER (cust_last_name) = 'KANTH';

| Id  | Operation                           | Name               | Rows  | Cost (%CPU)|
|   0 | SELECT STATEMENT                    |                    |     2 |     4   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMERS          |     2 |     4   (0)|
|*  2 |   INDEX RANGE SCAN                  | CUST_UPPER_NAME_IX |     2 |     2   (0)|

Predicate Information (identified by operation id):

   2 - access(UPPER("CUST_LAST_NAME")='KANTH')

SELECT *
  FROM oe.customers
 WHERE     UPPER (cust_first_name) = 'MALCOLM'
       AND UPPER (cust_last_name) = 'KANTH';

| Id  | Operation                           | Name               | Rows  | Cost (%CPU)|
|   0 | SELECT STATEMENT                    |                    |     1 |     2   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMERS          |     1 |     2   (0)|
|*  2 |   INDEX RANGE SCAN                  | CUST_UPPER_NAME_IX |     1 |     1   (0)|

Predicate Information (identified by operation id):

   2 - access(UPPER("CUST_LAST_NAME")='KANTH' AND UPPER("CUST_FIRST_NAME")='MALCOLM')

SELECT *
  FROM oe.customers
 WHERE UPPER (cust_first_name) = 'MALCOLM';

| Id  | Operation         | Name      | Rows  | Cost (%CPU)|
|   0 | SELECT STATEMENT  |           |     2 |     5   (0)|
|*  1 |  TABLE ACCESS FULL| CUSTOMERS |     2 |     5   (0)|

Predicate Information (identified by operation id):

   1 - filter(UPPER("CUST_FIRST_NAME")='MALCOLM')

CUST_UPPER_NAME_IX is a multi-column, function-based index on two expressions: UPPER (CUST_LAST_NAME) and UPPER (CUST_FIRST_NAME). Listing 15-2 shows that an INDEX RANGE SCAN can be used when a predicate on just the first expression is used or when predicates on both expressions are used. But when only the second expression is used in a query only the expensive and unsuitable INDEX SKIP SCAN is available, so the CBO elects to use a full table scan instead.

Let me close this section with a reminder that if you only select columns from an index the table need not be accessed at all. This means that the performance of some queries might improve by adding columns to an index, even if said queries contain no predicates on the column being added to the index.
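As a hypothetical sketch: if a commonly run query selects only the customer’s email address for a given last name, extending the index of Listing 15-2 with that column lets the CBO answer the query from the index alone, with no table access. The index name below is invented for illustration.

```sql
-- Sketch: a covering variant of CUST_UPPER_NAME_IX that also holds CUST_EMAIL.
-- With this index in place the query below can be satisfied by an index
-- range scan alone.
CREATE INDEX cust_upper_name_email_ix
   ON oe.customers (UPPER (cust_last_name), UPPER (cust_first_name), cust_email);

SELECT cust_email
  FROM oe.customers
 WHERE UPPER (cust_last_name) = 'KANTH';
```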

Order of Columns in a Multi-column Index

As Listing 15-2 shows it is important to ensure that commonly used columns or expressions appear at the leading edge of an index. But if your query has predicates that specify all the columns in an index does it matter what the order of the columns in the index is? Generally speaking the answer is no: your index range scan will scan precisely the same number of index entries no matter the ordering of the columns in the index. However, apart from queries that specify only a portion of the indexed columns, there are three other considerations in determining the optimal column ordering for an index:

·     If you wish to compress only a subset of the indexed columns then these columns need to appear on the leading edge of the index. I will cover index compression a little later in this chapter.

·     If queries routinely order data using more than one column from an index, then correctly ordering columns in the index might avoid explicit sorts. I will demonstrate how an index can be used to avoid a sort in Chapter 19.

·     In theory, if data is loaded in a sorted or semi-sorted manner then the index clustering factor (and hence query performance) might be improved by ordering the columns in the index in the same way as the table data is sorted. I must say that I personally have never come across a situation where clustering factor was a concern in choosing the order of columns in an index.

Note  The Performance Tuning Guide states that you should “order columns with most selectivity first.” This is a famous documentation error that has not been corrected as of 12cR1.

Bitmap Indexes

So far our discussion of indexing strategy has explicitly or implicitly been restricted to B-tree indexes. Bitmap indexes tend to be tiny, and they offer one way to solve an important problem.

Listing 15-2 showed that queries that don’t include a predicate on the leading column of an index often can’t use a multi-column index efficiently. Imagine you have a table called PROBLEM_TABLE. You have a few dozen commonly used queries on PROBLEM_TABLE that each specify a subset of two or three columns from a set of half a dozen columns, but there is no column that is used in all the queries. What indexes do you create? This is the sort of situation that leads some developers to create dozens of indexes for each column subset. Not a good idea.

If you load your data in bulk then bitmap indexes solve this problem: you just create six single-column bitmap indexes on PROBLEM_TABLE and the INDEX COMBINE operation that we showed in Listing 10-18 can be used to identify the subset of rows needed from the table before it is accessed. It is important not to try to maintain your bitmap indexes during the DML operations themselves. Listing 15-3 shows the correct way to load data when bitmap indexes are in use.

Listing 15-3. Loading data into a partitioned table that has bitmap indexes

CREATE TABLE statement_part_ch15
   PARTITION BY RANGE (transaction_date_time)
      (PARTITION p1 VALUES LESS THAN (DATE '2013-01-05')
      ,PARTITION p2 VALUES LESS THAN (DATE '2013-01-11')
      ,PARTITION p3 VALUES LESS THAN (DATE '2013-01-12')
      ,PARTITION p4 VALUES LESS THAN (maxvalue))
AS SELECT transaction_date_time, product_category, customer_category
     FROM statement;

CREATE BITMAP INDEX statement_part_pc_bix
   ON statement_part_ch15 (product_category)
   LOCAL;

CREATE BITMAP INDEX statement_part_cc_bix
   ON statement_part_ch15 (customer_category)
   LOCAL;

ALTER TABLE statement_part_ch15 MODIFY PARTITION FOR (DATE '2013-01-11')
   UNUSABLE LOCAL INDEXES;

INSERT INTO statement_part_ch15 (transaction_date_time
                                ,product_category
                                ,customer_category)
   SELECT DATE '2013-01-11'
         ,product_category
         ,customer_category
     FROM statement_part_ch15
    WHERE transaction_date_time < DATE '2013-01-05';

ALTER TABLE statement_part_ch15 MODIFY PARTITION FOR (DATE '2013-01-11')
   REBUILD UNUSABLE LOCAL INDEXES;

Listing 15-3 creates a partitioned table STATEMENT_PART_CH15 using data from the STATEMENT table we created in Chapter 9. Before loading data into the previously empty partition for TRANSACTION_DATE January 11, 2013, the indexes for the partition are marked unusable. Data is then inserted without maintaining the indexes, and, finally, indexes for the partition (in particular, the bitmap indexes) are rebuilt. Once your bitmap indexes are rebuilt they are available for use.

Let me emphasize that bitmap indexes are only of practical use if you load data first, rebuild the indexes second, and query the data third. This is often very practical in data warehousing applications, but DML against usable bitmap indexes is very expensive and concurrency is very poor. The implication is that bitmap indexes are usually impractical for most other types of application.

Alternatives to Bitmap Indexes

Let us return to our PROBLEM_TABLE with the six columns used by the predicates of the table’s various problematic queries. Suppose now that concurrent DML and queries make bitmap indexes impractical. What then? When we consider each query in isolation the best thing to do is to create a multi-column index that includes all the columns or expressions in that query. But as I have already emphasized, this kind of automatic reaction when repeated for each query can result in a plethora of indexes that severely degrade DML performance. So is there an alternative?

You might consider creating six single-column B-tree indexes because, as I showed in Listing 10-18, the INDEX COMBINE operation can be used with B-tree indexes as well as with bitmap indexes. However, the associated conversion of the B-tree index entries to bitmaps can be expensive, and so six single-column B-tree indexes is unlikely to be a good idea. You might create just one big index that includes all six columns. That way if the leading columns of the index aren’t used in a particular query an INDEX FULL SCAN or INDEX SKIP SCAN might be used instead.

At the end of the day you probably have to accept that for some queries you will just have to live with a full table scan and a hash join, even if you know that a nested loops join with an index would make the query perform better.

If you are really desperate I will show you how you might rewrite your SQL to work around this issue in Chapter 19.

Index-organized Tables

In my opinion index-organized tables (IOTs) are best used when both of the following conditions apply:

·     Your table (or table partition) is very small (less than 50MB) or very narrow (the primary key is at least 10% of the row size)

·     You don’t need any indexes other than the one index supporting the primary key

If both of these conditions apply you can store all of the table’s columns in the index and thus avoid the need for any actual table at all.
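As a sketch, a small code-lookup table might be declared as an IOT like this (the table and column names are illustrative):

```sql
-- Sketch: a narrow lookup table stored entirely in its primary key index.
-- ORGANIZATION INDEX means there is no separate table segment at all.
CREATE TABLE country_lookup
  (country_code   VARCHAR2 (2)
  ,country_name   VARCHAR2 (60)
  ,CONSTRAINT country_lookup_pk PRIMARY KEY (country_code))
  ORGANIZATION INDEX;
```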

It is possible to create additional indexes on an IOT, but as I mentioned briefly in Chapter 3, secondary indexes on IOTs hold a “guess” as to where the referenced row is, so they may need to be rebuilt frequently when the “guesses” become out-of-date. Some experts are more bullish about IOTs than I am. Martin Widlake is a great fan of IOTs. You can read more in his blog starting with this post:

Managing Contention

One of the great things about the Oracle database architecture is that writers don’t block readers. Alas, writers block other writers! So this section discusses the performance problems that can arise as a result of multiple processes performing DML on the same database object concurrently.

Sequence Contention

You should endeavor to design your application so that gaps in the numbers generated by a sequence are not problematic and that nothing particularly bad happens if numbers are generated out of order from the sequence. This endeavor is particularly relevant in a RAC environment. Once you are confident in the stability of your application you can alter your sequence and specify the CACHE and NOORDER attributes. This will ensure that communication between nodes in a RAC cluster is minimized when allocating numbers from a sequence.
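Assuming a sequence called MY_SEQ, the alteration might look like this. CACHE 1000 is a commonly suggested starting point for busy RAC systems, not a universal rule:

```sql
-- Sketch: cache 1,000 numbers per instance and drop the ordering guarantee,
-- minimizing cross-node coordination when numbers are allocated.
ALTER SEQUENCE my_seq CACHE 1000 NOORDER;
```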

The Hot-block Problem

When multiple processes concurrently insert rows into a table, the rows will usually be inserted into different blocks in the table whether you use Automatic Segment Space Management (ASSM) or Manual Segment Space Management (MSSM). However, it is often the case that multiple processes will attempt to update the same index block at the same time. This is particularly true when the leading column of the index is a timestamp or a number generated from a sequence.

If you try to access a buffer and it is being updated by another session you will have to wait. In Oracle 9i and earlier you would always have seen this wait as a “buffer busy waits” event, but as of release 10.1 and later a second wait event was split out. The wait event “read by other session” means that the buffer you are trying to access is being updated by another session and that the block is being read from disk into the buffer cache by that other session. Both the “buffer busy waits” event and the “read by other session” wait event are indications of contention for a block, although you might also see the “read by other session” wait event when there are concurrent readers of the same block.

When multiple processes are concurrently attempting to update the same block, the block is said to be hot. Hot blocks don’t always occur in indexes, but most of them do. Let us begin our discussion of hot-block contention by a look at a solution that is specific to indexes.

Reverse Key Indexes

On paper, the idea is simple and elegant. Imagine you have an index I1 on column C1 and you populate the values of C1 by obtaining values from a sequence. Three processes might generate values 123,456; 123,457; and 123,458 from the sequence and then insert rows into your table. The index entries in a traditional index would be consecutive, and most likely the processes would be contending with each other to update the same index leaf block. If you create the index with the keyword REVERSE, the bytes in the index entries will be reversed before inserting. You can think of this by imagining index entries 654,321; 754,321; and 854,321. As you can see, these index entries are no longer even remotely consecutive and will be inserted into different blocks. This isn’t quite how it works: the bytes in the index entry are reversed, not the digits in the number, but the concept is the same. Almost magically you seem to have instantly solved your contention problem.
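In DDL terms the change is just one keyword (I1, T1, and C1 being the names from the scenario above):

```sql
-- Conventional index: near-consecutive sequence values land in the same
-- leaf block, so concurrent inserters contend for it.
CREATE INDEX i1 ON t1 (c1);
DROP INDEX i1;

-- Reverse key index: the key bytes are reversed before storage, scattering
-- consecutive values across the index.
CREATE INDEX i1 ON t1 (c1) REVERSE;
```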

The reality is not quite as elegant as it first appears. There are two big problems. The first is that INDEX RANGE SCANs are restricted to specific values of C1. If you have a query with a predicate WHERE C1 BETWEEN 123456 AND 123458 the CBO will not be able to use a range scan to find the three index entries because they are scattered around the index.

The second issue with reverse key indexes is that the hot-block problem now has become a cold-block problem. Before reversing the index there may have been contention between the processes, but at least only one index block would have to be read into the SGA. When you reverse your index, the index entries will be inserted all over the place, and unless all the leaf blocks from the index can be cached in the SGA there is a very real risk that disk reads on the index will increase dramatically.

I am indebted to Oracle’s Real World Performance team for educating me about a potentially superior solution to the hot index block problem. That solution is the global partitioned index.

Global Partitioned Indexes

Whether the underlying table is partitioned or not, you can partition an index in a way that is totally independent of any partitioning strategy of the underlying table. Such an index is known as a globally partitioned index. Listing 15-4 shows how this is done.

Listing 15-4. Creating a globally partitioned index

CREATE TABLE global_part_index_test
  (c1   INTEGER
  ,c2   VARCHAR2 (50)
  ,c3   VARCHAR2 (50));

CREATE UNIQUE INDEX gpi_ix
   ON global_part_index_test (c1)
   GLOBAL PARTITION BY HASH (c1)
      PARTITIONS 32;

ALTER TABLE global_part_index_test ADD CONSTRAINT global_part_index_test_pk PRIMARY KEY (c1);

SELECT *
  FROM global_part_index_test
 WHERE c1 BETWEEN 123456 AND 123458;

| Id | Operation                            | Name                   |
|  0 | SELECT STATEMENT                     |                        |
|  1 |  PARTITION HASH ALL                  |                        |
|  2 |   TABLE ACCESS BY INDEX ROWID BATCHED| GLOBAL_PART_INDEX_TEST |
|  3 |    INDEX RANGE SCAN                  | GPI_IX                 |

Listing 15-4 creates the index used to support a primary key constraint as a global hash-partitioned index. Insertions into the index will be scattered around the 32 partitions, thus substantially reducing the level of contention, just as with a reverse key index. But unlike our reverse key index, we have not lost our ability to perform range scans across multiple values of C1. As you can see from line 1 in the execution plan, the runtime engine needs to look in all 32 partitions, so index range scans are more expensive than for an unpartitioned index, but at least they are possible.

We have solved our cold-block problem as well. Imagine inserting 1,024 rows into a large table with values of C1 from 1 to 1,024. If C1 is indexed with a reverse key index we are likely to insert index entries into 1,024 different blocks. On the other hand, assuming that we can store over 32 index entries in a block, our 1,024 index entries will be stored in just 32 blocks when C1 is indexed as in Listing 15-4. This is because the 32 values (approximately) of C1 for one hash partition will be consecutive within that partition! So our blocks are “lukewarm,” as Graham Wood put it, and it is likely that these 32 blocks can be kept in the buffer cache.

Is there a downside to globally partitioned indexes? Yes. They are difficult to maintain—even more difficult to maintain than an unpartitioned global index on a partitioned table.

Imagine that you have a 1TB table. You create five global hash-partitioned indexes, each using 32 hash partitions. You now upgrade your hardware and use datapump export/import to move the 1TB table to your new machine and then attempt to rebuild your indexes. You would need to scan the entire 1TB table to rebuild each index partition, regardless of any partitioning strategy on the table. That is 5 x 32 = 160 full table scans of your 1TB table!

One solution to the rebuild problem is to partition (or subpartition) your table by hash and then use local indexes. (Sub) partitioning your table has the added benefit of reducing the risk of contention on the table itself.

Initial ITL Entries

When a transaction first updates a block it uses an entry in the interested transaction list (ITL) in the block. If all the ITL entries in a block are used by uncommitted transactions then a new ITL entry is added, assuming that there is space in the block for such an entry. An ITL entry consumes 24 bytes, so if you were to pre-allocate 100 entries you would waste about one quarter of the space in an 8K block. By default the initial number of ITL slots is one for a table block and two for an index block. If you see a lot of wait events with the name “enq: TX - allocate ITL entry” then the waiting process is unable to allocate a new ITL entry due to lack of space in the block. The direct approach to solving this problem is to increase the number of ITL entries initially allocated to blocks in your index or table. To increase the initial number of ITL entries in index I1 from two to five, the syntax is ALTER INDEX I1 INITRANS 5, but bear in mind the following:

·     The change to INITRANS doesn’t affect blocks already formatted in the index or table. You may need to rebuild the index or move the table.

·     Just because you solve your ITL contention problem doesn’t mean you won’t still have other contention issues.
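The first point above means the change only takes effect once the blocks are reformatted; a sketch:

```sql
-- Raise the initial ITL allocation for new blocks, then rebuild so
-- existing index blocks pick up the new setting.
ALTER INDEX i1 INITRANS 5;
ALTER INDEX i1 REBUILD;

-- For a table, MOVE reformats the blocks. Note that moving a table
-- invalidates its indexes, which must then be rebuilt as well.
ALTER TABLE t1 INITRANS 5;
ALTER TABLE t1 MOVE;
```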

If you have ITL entry contention issues it may be better to avoid the issue rather than to solve it: using a reverse key or global hash-partitioned index might eliminate your ITL problem at the same time because the concurrent addition of new index entries will likely be to different blocks and, therefore, different ITLs. Let us look at another way to reduce block contention now.

Spreading Data Out

Just because two sessions are trying to update the same table block doesn’t mean they are trying to update the same row in the block. And just because two sessions are trying to update the same index block doesn’t mean they are trying to update the same index entry. If your database object is small you might try reducing the number of rows per table block or index entries per index block to alleviate contention.

Let us assume that you have 50 sessions, each of which frequently updates a tiny session-specific row in a table. If nothing special is done all 50 rows will end up in the same block, and you are likely to see a number of buffer busy wait events. Since the table is very small you can probably afford to spread the rows out, one per block, to avoid contention. There are actually several ways to achieve this. Listing 15-5 shows one such way.

Listing 15-5. Ensuring that there is only one row per block

CREATE TABLE process_status
(
   process_id                 INTEGER PRIMARY KEY
  ,status                     INTEGER
  ,last_job_id                INTEGER
  ,last_job_completion_time   DATE
  ,filler_1                   CHAR (100) DEFAULT RPAD (' ', 100)
)
PCTFREE 99;

INSERT INTO process_status (process_id)
   SELECT LEVEL
     FROM DUAL
   CONNECT BY LEVEL <= 50;

BEGIN
   DBMS_STATS.gather_table_stats (
      ownname   => SYS_CONTEXT ('USERENV', 'CURRENT_SCHEMA')
     ,tabname   => 'PROCESS_STATUS');
END;
/

UPDATE process_status
   SET status = 1, last_job_id = 1, last_job_completion_time = SYSDATE
 WHERE process_id = 1;
| Id  | Operation          | Name           |
|   0 | UPDATE STATEMENT   |                |
|   1 |  UPDATE            | PROCESS_STATUS |
|   2 |   INDEX UNIQUE SCAN| SYS_C005053    |

Listing 15-5 creates a table called PROCESS_STATUS that stores 50 rows, one for each application process. Our first attempt to reduce the number of rows in each block of the PROCESS_STATUS table is to set PCTFREE to 99, meaning that rows will no longer be inserted into the table once 1% of the 8K block has been used. But our rows are less than 81 bytes long, so we need to add an extra column that I have called FILLER_1. Now when we insert our 50 rows they will all go into separate blocks. Any UPDATE statement on a single row will find the dedicated block via the index and there will be no contention with other sessions updating other rows in the table.
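As the text says, there are several ways to spread the rows out. One alternative sketch (a hypothetical variant of PROCESS_STATUS) is to hash partition the tiny table so that rows land in different segments, and hence different blocks, without any filler column:

```sql
-- 64 hash partitions for 50 rows: most rows get a partition, and hence
-- a block, to themselves. Hash distribution is not perfect, so a few
-- rows may still share a partition.
CREATE TABLE process_status_hashed
(
   process_id   INTEGER PRIMARY KEY
  ,status       INTEGER
)
PARTITION BY HASH (process_id) PARTITIONS 64;
```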


Partitioning

A favorite topic of light-hearted debate among experts is whether the main advantages of the partitioning option relate to ease of administration or to improved performance. Of course, partitioning offers both administrative- and performance-related advantages. We have already discussed how global hash-partitioned indexes and hash-partitioned tables can help alleviate hot-block contention. Let us look at some of the other performance-related advantages that partitioning brings us.

Full Table Scans on Partitions or Subpartitions

Imagine your table holds 100,000 rows for each of 10 business days, making 1,000,000 rows in total. Suppose that you can get about 100 rows per block, so your table has about 10,000 blocks in total. Let us take a rosy view of the world and assume that all the rows are perfectly clustered by business date so that all the rows for a particular business date are located in 1,000 of the 10,000 blocks. Now suppose you want to read all of the 100,000 rows for a particular business date. To keep things simple, let us assume that none of the data is in the buffer cache. Do you think indexed access or a full table scan is best?

Well, in truth, neither option is particularly appealing. If you use a full table scan then you will read and throw away 9,000 of the 10,000 blocks that you read. If you read the table blocks using an index then you will only access the 1,000 blocks you need, but they will be read using 1,000 single-block reads.

What we really want is to read the 1,000 blocks for our selected business date with multi-block reads but to not bother with the other 9,000 blocks. As I shall show you in Chapter 19, it is possible to rewrite SQL to accomplish this, but it is very tricky to do so. If you partition your table by business date you will now have 10 table segments, each 1,000 blocks in size, rather than a single 10,000-block segment. After partition elimination your query can perform a single full table scan of a single 1,000-block segment. If these 1,000 blocks are already in the buffer cache then you might not see a huge performance difference between accessing the blocks via an index or via a full table scan. But if the selected 1,000 blocks all need to be read from disk then the fact that the full table scan uses multi-block reads means that the scan will be substantially faster than indexed access.

So partitioning makes full table scans more attractive, and this in turn may make some indexes redundant that might otherwise be needed. A double win!
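A sketch of the scenario above (table and column names are hypothetical): partitioning by business date lets an equality predicate reduce the scan to a single 1,000-block segment read with multi-block reads.

```sql
-- Interval partitioning creates one partition per business day automatically.
CREATE TABLE trades
(
   trade_id        INTEGER NOT NULL
  ,business_date   DATE NOT NULL
  ,quantity        INTEGER
)
PARTITION BY RANGE (business_date)
   INTERVAL (NUMTODSINTERVAL (1, 'DAY'))
   (PARTITION p0 VALUES LESS THAN (DATE '2014-04-01'));

-- Partition elimination: only the segment for April 2 is scanned,
-- and it is scanned with multi-block reads.
SELECT SUM (quantity)
  FROM trades
 WHERE business_date = DATE '2014-04-02';
```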

Partition-wise Joins

If you join two tables that are partitioned identically and the partitioning column or columns are included in the join predicates, then you can perform what is referred to as a full partition-wise join. Listing 15-6 shows a full partition-wise join in action.

Listing 15-6. A full partition-wise join executed serially

CREATE TABLE orders_part
(
   order_id           INTEGER NOT NULL
  ,order_date         DATE NOT NULL
  ,customer_name      VARCHAR2 (50)
  ,delivery_address   VARCHAR2 (100)
)
PARTITION BY LIST (order_date)
   SUBPARTITION BY HASH (order_id) SUBPARTITIONS 16
   (PARTITION p1 VALUES (DATE '2014-04-01')
   ,PARTITION p2 VALUES (DATE '2014-04-02'));

CREATE UNIQUE INDEX orders_part_pk
   ON orders_part (order_date, order_id)
   LOCAL;

ALTER TABLE orders_part
   ADD CONSTRAINT orders_part_pk PRIMARY KEY (order_date, order_id);

CREATE TABLE order_items_part
(
   order_id        INTEGER NOT NULL
  ,order_item_id   INTEGER NOT NULL
  ,order_date      DATE NOT NULL
  ,product_id      INTEGER
  ,quantity        INTEGER
  ,price           NUMBER
  ,CONSTRAINT order_items_part_fk FOREIGN KEY
      (order_date, order_id)
       REFERENCES orders_part (order_date, order_id)
)
PARTITION BY REFERENCE (order_items_part_fk);

CREATE UNIQUE INDEX order_items_part_pk
   ON order_items_part (order_date, order_id, order_item_id)
   COMPRESS 1;

ALTER TABLE order_items_part
   ADD CONSTRAINT order_items_part_pk PRIMARY KEY
   (order_date, order_id, order_item_id);

SELECT /*+ leading(o i) use_hash(i) no_swap_join_inputs(i) */
       o.customer_name, i.product_id
  FROM orders_part o JOIN order_items_part i USING (order_id);
| Id  | Operation                | Name             |
|   0 | SELECT STATEMENT         |                  |
|   1 |  HASH JOIN               |                  |
|   2 |   PARTITION LIST ALL     |                  |
|   3 |    PARTITION HASH ALL    |                  |
|   4 |     TABLE ACCESS FULL    | ORDERS_PART      |
|   5 |   PARTITION REFERENCE ALL|                  |
|   6 |    TABLE ACCESS FULL     | ORDER_ITEMS_PART |
SELECT /*+ leading(o i) use_hash(i) no_swap_join_inputs(i) */
       o.customer_name, i.product_id
  FROM orders_part o JOIN order_items_part i USING (order_date, order_id);

| Id  | Operation            | Name             |
|   0 | SELECT STATEMENT     |                  |
|   1 |  PARTITION LIST ALL  |                  |
|   2 |   PARTITION HASH ALL |                  |
|   3 |    HASH JOIN         |                  |
|   4 |     TABLE ACCESS FULL| ORDERS_PART      |
|   5 |     TABLE ACCESS FULL| ORDER_ITEMS_PART |

Listing 15-6 creates two tables, ORDERS_PART and ORDER_ITEMS_PART. ORDERS_PART is partitioned by list using ORDER_DATE and subpartitioned by hash using ORDER_ID. There are partitions for two values of ORDER_DATE and 16 subpartitions for each ORDER_DATE, making 32 subpartitions in all. The primary key should be just the ORDER_ID, but we have compromised a little by adding the ORDER_DATE to the primary key so that the constraint can be enforced using a local index—a common and practical compromise.

The ORDER_ITEMS_PART table is partitioned by reference, which means that when ORDER_ITEMS_PART is created it has 32 partitions, one for each subpartition of ORDERS_PART. For each partition added to or dropped from ORDERS_PART, 16 partitions will be added to or dropped from ORDER_ITEMS_PART.

Now then. What happens when we join the two tables? The first query in Listing 15-6 joins the two tables using only the ORDER_ID column. I have used hints to get an execution plan that is both easy to understand and typical of what might be seen once rows are added to the tables. We can see from line 5 in the execution plan that for each subpartition of ORDERS_PART, all 32 partitions of ORDER_ITEMS_PART are scanned. This is because the CBO doesn’t know that a particular value of ORDER_ID is only found in one partition of ORDER_ITEMS_PART. When we add ORDER_DATE to the list of join columns we see that the PARTITION REFERENCE ALL operation has disappeared, implying that only one partition of ORDER_ITEMS_PART is scanned for each subpartition of ORDERS_PART. Partition elimination on ORDER_ITEMS_PART has been accomplished in the latter query of Listing 15-6 because all columns used to define the partitions and subpartitions of the two tables appear in the join conditions.

Tip  One of the supposed benefits of partitioning by reference is that you can partition a child table by columns present only in the parent table. However, Listing 15-6 shows that if you eliminate the ORDER_DATE column from ORDER_ITEMS_PART in an attempt to normalize your design you will find that full partition-wise joins can’t be used. All the columns used to partition or subpartition a parent table should be present in the child table.

So what have we gained? If we hadn’t partitioned our tables at all we would have one big hash join of the entire table, whereas now we have 32 small hash joins. The point is that the small hash joins are likely to be done entirely in memory, whereas the one large hash join has a much higher probability of spilling to disk.

So far we have only been looking at serial partition-wise joins. The beauty of partition-wise joins becomes really apparent when we consider parallelization.

Parallelization and Partitioning

I gave examples of full and partial partition-wise joins in Listings 11-17 and 11-18 respectively, but just to recap, Listing 15-7 takes the latter query in Listing 15-6 and runs it in parallel.

Listing 15-7. Parallel full partition-wise join

SELECT /*+ leading(o i) use_hash(i) no_swap_join_inputs(i)
           parallel(16) PQ_DISTRIBUTE(I NONE NONE)*/
       o.customer_name, i.product_id
  FROM orders_part o JOIN order_items_part i USING (order_date, order_id);
| Id  | Operation               | Name             |
|   0 | SELECT STATEMENT        |                  |
|   1 |  PX COORDINATOR         |                  |
|   2 |   PX SEND QC (RANDOM)   | :TQ10000         |
|   3 |    PX PARTITION HASH ALL|                  |
|   4 |     HASH JOIN           |                  |
|   5 |      TABLE ACCESS FULL  | ORDERS_PART      |
|   6 |      TABLE ACCESS FULL  | ORDER_ITEMS_PART |


Denormalization

I don’t profess to be a leading authority on logical database design, but it is my belief that a logical database design should be normalized. However, once you have your initial neat, normalized, logical model you will often need to denormalize it as part of the physical database design process in order to obtain decent performance. This section describes some of the ways in which you might need to denormalize your database, starting with materialized views.

Materialized Views

There are two reasons why you might create materialized views. One reason is to automate replication of data from one database to another. The second reason is to denormalize your data for performance purposes. Performance is what we are focusing on here, and there are two main categories of performance-related materialized views:

·     Materialized join views store the results of a join of two or more tables

·     Materialized summary views store the results of aggregations

Of course you can create a materialized view that both joins tables and then performs aggregations. If the same joins and/or aggregations are done frequently, it might make sense to create a materialized view so that the work is done only once. Here are some of the nice features that materialized views offer:

·     Query rewrite. This is an example of an optimizer transformation that I showed in Listing 13-30. You don’t need to rewrite your query to take advantage of the materialized view.

·     Materialized view logs. These can be used to keep track of changes to the tables on which the materialized view is based so that changes can be applied to the materialized view incrementally.

·     Refresh on commit. With this option every time a change is made to a table on which the materialized view is based the materialized view is updated incrementally using the materialized view logs.

This is just a taste of the features of materialized views. For a full tutorial on this complex feature I would recommend the Data Warehousing Guide. The Data Warehousing Guide also describes DIMENSION database objects, which declare hierarchical relationships among columns and extend the reach of query rewrite.
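As a hedged sketch (table and column names are hypothetical), a fast-refreshable materialized join-and-summary view with refresh on commit and query rewrite might be set up as follows. Fast refresh of an aggregate join view has a number of prerequisites, including materialized view logs with INCLUDING NEW VALUES and COUNT columns in the select list:

```sql
-- Materialized view logs track changes for incremental (fast) refresh.
CREATE MATERIALIZED VIEW LOG ON orders
   WITH ROWID, SEQUENCE (order_id) INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW LOG ON order_items
   WITH ROWID, SEQUENCE (order_id, quantity, price) INCLUDING NEW VALUES;

-- The COUNT columns are required for fast refresh of SUM aggregates.
CREATE MATERIALIZED VIEW order_totals_mv
   REFRESH FAST ON COMMIT
   ENABLE QUERY REWRITE
AS
   SELECT o.order_id
         ,COUNT (*) cnt
         ,COUNT (i.quantity * i.price) cnt_amount
         ,SUM (i.quantity * i.price) order_total
     FROM orders o, order_items i
    WHERE i.order_id = o.order_id
    GROUP BY o.order_id;
```

With ENABLE QUERY REWRITE in place, a query that sums ORDER_ITEMS amounts by ORDER_ID can be transparently redirected to ORDER_TOTALS_MV by the CBO.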

Manual Aggregation and Join Tables

In reality, materialized views can be overkill for many applications. You don’t necessarily want to deal with query rewrites that happen or don’t happen when expected: you can write the queries so that they access the materialized view data directly. You don’t necessarily want to worry about the substantial performance impact that materialized view logs and refresh on commit can have on DML performance: you can refresh the materialized view yourself.

If that is your position then you don’t need to use the Oracle materialized view feature at all. For example, in a data warehousing application you might generate aggregated data yourself at the end of your daily load process. You can load the aggregated data into a regular table for use by queries during the online day.
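For example, a minimal end-of-load aggregation job (all names are hypothetical) might look like this:

```sql
-- Rebuild the summary table at the end of the daily load.
TRUNCATE TABLE daily_sales_summary;

-- Direct-path insert of the aggregated data.
INSERT /*+ append */ INTO daily_sales_summary
   (business_date, product_id, total_quantity, total_amount)
   SELECT business_date
         ,product_id
         ,SUM (quantity)
         ,SUM (amount)
     FROM sales_detail
    GROUP BY business_date, product_id;

COMMIT;
```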

You don’t necessarily need to maintain both normalized and denormalized copies of your data either. You can, for example, store commonly used columns from a dimension table in a fact table so that joins can be eliminated. Such denormalization might even avoid the need for a star transformation. But if the high cost or complexity of a star transformation is your primary concern, there is a potentially superior solution. I will cover that solution next.

Just remember that denormalization carries the risk of inconsistency in your database, and you should consider whether you need to build some safeguards, such as reconciliation jobs, to mitigate that risk.
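A reconciliation job can be as simple as a query that reports denormalized values that have drifted from the source of truth (all names are hypothetical):

```sql
-- Fact rows whose copied category no longer matches the dimension.
SELECT f.sale_id, f.prod_category, p.prod_category AS dimension_value
  FROM sales_fact f
       JOIN products p ON p.prod_id = f.prod_id
 WHERE f.prod_category <> p.prod_category;
```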

Bitmap Join Indexes

Bitmap join indexes are actually a very specialized form of denormalization. They are primarily used to avoid star transformations. I must say that I personally found the concept of a bitmap join index hard to grasp at first. Let me approach the concept slowly and in stages.

A regular single-column B-tree index takes each distinct value of a column and creates an index entry for each row that has that value. The matching rows in the table being indexed are identified by a ROWID. A bitmap index is similar except that the matching rows are identified by a bitmap rather than a list of ROWIDs. A function-based bitmap index is similar to a regular bitmap index except that the bitmaps are associated with an expression rather than a value that is explicitly stored in the table. Like a function-based bitmap index, the indexed values in a bitmap join index don’t explicitly appear as a column in the table being indexed, but rather are associated with the indexed value indirectly. It is time for an example, I think. Take a look at Listing 15-8, which shows how we can use bitmap join indexes to improve the query in Listing 13-35.

Listing 15-8. Bitmap join indexes

CREATE TABLE sales_2
PARTITION BY RANGE (time_id)
   (PARTITION pdefault VALUES LESS THAN (maxvalue))
AS
   SELECT *
     FROM sh.sales
    WHERE time_id = DATE '2001-10-18';

CREATE TABLE customers_2
AS
   SELECT * FROM sh.customers;

CREATE TABLE products_2
AS
   SELECT * FROM sh.products;

ALTER TABLE products_2
   ADD CONSTRAINT products_2_pk PRIMARY KEY (prod_id);

ALTER TABLE customers_2
   ADD CONSTRAINT customers_2_pk PRIMARY KEY (cust_id);

BEGIN
   DBMS_STATS.gather_table_stats (
      ownname   => SYS_CONTEXT ('USERENV', 'CURRENT_SCHEMA')
     ,tabname   => 'SALES_2');
   DBMS_STATS.gather_table_stats (
      ownname   => SYS_CONTEXT ('USERENV', 'CURRENT_SCHEMA')
     ,tabname   => 'CUSTOMERS_2');
   DBMS_STATS.gather_table_stats (
      ownname   => SYS_CONTEXT ('USERENV', 'CURRENT_SCHEMA')
     ,tabname   => 'PRODUCTS_2');
END;
/

CREATE BITMAP INDEX sales_2_cust_ln_bjx
   ON sales_2 (c.cust_last_name)
   FROM customers_2 c, sales_2 s
   WHERE c.cust_id = s.cust_id
   LOCAL;

CREATE BITMAP INDEX sales_prod_category_bjx
   ON sales_2 (p.prod_category)
   FROM products_2 p, sales_2 s
   WHERE s.prod_id = p.prod_id
   LOCAL;

SELECT prod_name
  FROM customers_2 c, products_2 p, sales_2 s
 WHERE     s.cust_id = c.cust_id
       AND s.prod_id = p.prod_id
       AND c.cust_last_name = 'Everett'
       AND p.prod_category = 'Electronics';
| Id  | Operation                                     | Name                    |
|   0 | SELECT STATEMENT                              |                         |
|   1 |  NESTED LOOPS                                 |                         |
|   2 |   NESTED LOOPS                                |                         |
|   3 |    HASH JOIN                                  |                         |
|   4 |     TABLE ACCESS FULL                         | PRODUCTS_2              |
|   5 |     PARTITION RANGE SINGLE                    |                         |
|   6 |      TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES_2                 |
|   7 |       BITMAP CONVERSION TO ROWIDS             |                         |
|   8 |        BITMAP AND                             |                         |
|   9 |         BITMAP INDEX SINGLE VALUE             | SALES_2_CUST_LN_BJX     |
|  10 |         BITMAP INDEX SINGLE VALUE             | SALES_PROD_CATEGORY_BJX |
|  11 |    INDEX UNIQUE SCAN                          | CUSTOMERS_2_PK          |
|  12 |   TABLE ACCESS BY INDEX ROWID                 | CUSTOMERS_2             |

Listing 15-8 creates copies of the SH.SALES, SH.CUSTOMERS, and SH.PRODUCTS tables rather than making DDL changes to the Oracle-supplied example schemas. Before creating any bitmap join indexes we need to create enabled, validated primary key constraints on our CUSTOMERS_2 and PRODUCTS_2 tables. The constraints in the example schemas aren’t validated.

The first bitmap index we create is SALES_2_CUST_LN_BJX. This index identifies all the rows in SALES_2 that are associated with each value of CUST_LAST_NAME in the CUSTOMERS_2 table. This bitmap index allows us to identify all the rows in SALES_2 associated with customers that have the last name Everett without going anywhere near the CUSTOMERS_2 table. Similarly, the SALES_PROD_CATEGORY_BJX index allows us to identify the rows in SALES_2 that are associated with products in the “Electronics” category without going anywhere near the PRODUCTS_2 table. By combining these indexes we can find the exact set of rows in SALES_2 that match both sets of predicates.

Although bitmap join indexes are similar to function-based indexes, they are also very similar to materialized join views. You could create a materialized join view that holds the value of CUST_LAST_NAME from CUSTOMERS_2 and the associated ROWID from SALES_2, and you would have essentially the same thing as a bitmap join index. However, the bitmap join index is much more compact and more efficient to use than a materialized join view.

The big problem with bitmap join indexes, as with any bitmap index, is that the DML on the tables associated with the index is inefficient. In fact, parallel DML on the dimension table will mark the bitmap join index as unusable. Accordingly, bitmap join indexes are really only useful in data-warehousing environments.


Compression

Compressing data is not a form of denormalization. With the exception of changes in performance, the use or non-use of compression has no visible impact on users or developers of SQL. Let us look at index and table compression next. I will cover large object (LOB) compression when I discuss LOBs a little later.

As a general rule, both index compression and table compression make DML more expensive and queries cheaper. This may sound counterintuitive, as queries need to decompress data. However, the decompression processes for both indexes and tables are very cheap, as we shall see, and even a modest reduction in logical I/O will more than compensate for the cost of decompressing. On the other hand, if you see an excessive amount of CPU used during data loading then you may want to try turning off compression features to see if it makes a difference.

Index Compression

Index compression is one of those Oracle features that should be used extensively but is often forgotten. Index compression is implemented at the block level. Basically, Oracle uses what is called a prefix table to store distinct values of the compressed columns in a block. Now, only a pointer to the prefix table entry is stored in the index entry rather than the values of the column or columns themselves. Oracle allows you to specify which columns, if any, to compress, but your choice is restricted: the syntax allows you to specify the number of columns to compress starting with the leading column. You can’t, for example, choose to compress the second column but not the first.
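The syntax looks like this (hypothetical index on the ORDER_ITEMS columns used earlier); COMPRESS n deduplicates the first n columns of the key:

```sql
-- Deduplicate ORDER_DATE and ORDER_ID within each leaf block.
CREATE INDEX oi_date_ix
   ON order_items (order_date, order_id, order_item_id)
   COMPRESS 2;

-- The setting on an existing index can be changed with a rebuild.
ALTER INDEX oi_date_ix REBUILD COMPRESS 1;
```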

Imagine you have 200 rows per block in a non-compressed index. First of all, if the number of distinct values of the compressed columns is normally just one or two per block, your prefix table will contain just one or two entries and you will be able to store a lot more than 200 rows per block in a compressed index. But suppose there are 150 distinct values of your leading column in the 200 uncompressed rows. Now index compression will be a problem because the 150 entries in the prefix table will consume most of the block, and all the extra overhead is likely to mean that you will no longer be able to fit all 200 rows in the block.

Tip  The ANALYZE INDEX . . . VALIDATE STRUCTURE command can be used to determine the recommended number of columns to compress for an index. The recommended value can be found in the OPT_CMPR_COUNT column of the INDEX_STATS view.
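For instance (using a hypothetical non-partitioned index):

```sql
-- VALIDATE STRUCTURE populates the INDEX_STATS view for this index.
ANALYZE INDEX oi_date_ix VALIDATE STRUCTURE;

-- OPT_CMPR_COUNT is the recommended COMPRESS setting;
-- OPT_CMPR_PCTSAVE is the estimated space saving.
SELECT opt_cmpr_count, opt_cmpr_pctsave
  FROM index_stats;
```

Note that INDEX_STATS holds only one row, describing the index most recently analyzed in the current session.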

Don’t assume that just because an index is unique it can’t benefit from compression. Notice that the ORDER_ITEMS_PART_PK index in Listing 15-6 was created specifying COMPRESS 1, a perfectly suitable thing to do if there are a large number of line items per day, because without compression there would be a large number of index entries for each value of ORDER_DATE. It probably isn’t worth compressing the ORDER_ID column as well unless there are a large number of line items in each order.

Notice that index compression isn’t really compression at all. It is actually a form of deduplication, as Oracle likes to call it. Accordingly, there is no real work needed by queries for decompression. All the query has to do is follow a pointer to get the column value or values in the prefix table.

Table Compression

There are several flavors of table compression:

·     Basic table compression. This variant compresses table data on direct-path inserts and table moves only, and it requires no additional license.

·     OLTP table compression. In 11gR1 this feature was called compression for all operations. Basically it is the same thing as basic compression but, unlike basic compression, data is also compressed by non-direct-path INSERT operations. OLTP table compression requires either the Personal Edition or the Enterprise Edition of the Oracle database product. In the case of Enterprise Edition, the Advanced Compression option needs to be purchased separately.

·     Hybrid columnar compression. This is a totally different feature specific to Exadata- and Oracle-supplied storage. This topic is beyond the scope of this book.

Like index compression, basic compression and OLTP table compression are based on the concept of deduplication within a block. In other words, Oracle attempts to store each repeated value just once within a block. The implementation of table compression is somewhat more complicated than index compression: Oracle can reorder columns, deduplicate combinations of columns, and so on. For a detailed description of the internals of table compression, have a look at Jonathan Lewis’s series of articles on the subject. Here are some important points to note:

·     When you create a table with basic compression PCTFREE defaults to 0. When you create a table with OLTP compression the default value of PCTFREE remains 10.

·     When you update a row in a table, the updated values aren’t recompressed. This is probably why the 11gR1 name for OLTP compression, compression for all operations, was hurriedly changed!

·     You should probably be wary of using either basic or OLTP compression on rows that are modified after insertion. This means that OLTP compression is best used for data loaded by non-direct-path INSERT statements, never modified, and then read repeatedly. As a consequence, you should probably override the value of PCTFREE when you create a table with OLTP compression. Listing 15-9 shows the basic syntax.

Listing 15-9. Creating a table with OLTP compression and overriding PCTFREE

CREATE TABLE order_items_compress
(
   order_id        INTEGER NOT NULL
  ,order_item_id   INTEGER NOT NULL
  ,order_date      DATE NOT NULL
  ,product_id      INTEGER
  ,quantity        INTEGER
  ,price           NUMBER
  ,CONSTRAINT order_items_compress_fk FOREIGN KEY
      (order_date, order_id)
       REFERENCES orders_part (order_date, order_id)
)
PCTFREE 0
COMPRESS FOR OLTP;

Tip  Whether you use table compression or not, you should set PCTFREE to 0 whenever you are certain that no rows in your table will ever increase in size as a result of UPDATE or MERGE operations.


LOBs

The term LOB stands for large object. It is not just the LOBs that are large—the topic is large too! In fact, Oracle has an entire manual, the SecureFiles and Large Objects Developer's Guide, dedicated to the topic. I will just pick out a few key features here.

·     Your LOBs should be created as SecureFiles in a tablespace with Automatic Segment Space Management (ASSM). The other flavor of LOB, BasicFiles, will be deprecated in a future release of Oracle database, according to the manual.

·     With the Advanced Compression option you can take advantage of both LOB deduplication and LOB compression.

·     You can elect to cache LOBs for reads or for both reads and writes. By default, LOBs are not cached in the SGA at all.

·     As long as you don’t cache LOB data you can elect not to log LOB data. If you take advantage of this option, performance of INSERT operations will improve, but you will lose the LOB data on media recovery. This might be alright if you can just reload the data.
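A SecureFiles LOB combining several of these options might be declared as follows (hypothetical table; the COMPRESS and DEDUPLICATE clauses assume the Advanced Compression option is licensed):

```sql
CREATE TABLE documents
(
   doc_id   INTEGER PRIMARY KEY
  ,body     CLOB
)
LOB (body) STORE AS SECUREFILE
(
   COMPRESS MEDIUM   -- requires Advanced Compression
   DEDUPLICATE       -- requires Advanced Compression
   NOCACHE           -- LOB data bypasses the buffer cache
   NOLOGGING         -- faster inserts, but LOB data is lost on media recovery
);
```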


Summary

This chapter has covered a few features of physical database design that are of importance to SQL performance, but it makes no pretense of being a comprehensive guide. However, it is often the case that SQL performance issues are best addressed by physical database design changes, and hopefully this chapter will help you recognize such cases.

If you remember just one thing from this chapter it should be this: don’t blindly add an index to solve a performance problem with a single query—at least not without some thought.