Expert Oracle Database Architecture, Third Edition (2014)

Chapter 12. Datatypes

Choosing the right datatype seems so easy and straightforward, but many times I see it done incorrectly. The most basic decision—what type you use to store your data in—will have repercussions on your applications and data for years to come. Thus, choosing the appropriate datatype is paramount. It is also hard to change after the fact—in other words, once you implement it, you might be stuck with it for quite a while.

In this chapter, we’ll take a look at all of the Oracle basic datatypes available and discuss how they are implemented and when each might be appropriate to use. We won’t examine user-defined datatypes as they’re simply compound objects derived from the built-in Oracle datatypes. We’ll investigate what happens when you use the wrong datatype for the job—or even just the wrong parameters to the datatype (length, precision, scale, and so on). By the end of this chapter, you’ll have an understanding of the types available to you, how they’re implemented, when to use each type and, as important, why using the right type for the job is key.

An Overview of Oracle Datatypes

Oracle provides 22 different SQL datatypes. Briefly, they are as follows:

·     CHAR: A fixed-length character string that will be blank padded with spaces to its maximum length. A non-null CHAR(10) will always contain 10 bytes of information using the default National Language Support (NLS) settings. We will cover NLS implications in more detail shortly. A CHAR field may store up to 2,000 bytes of information.

·     NCHAR: A fixed-length character string that contains UNICODE formatted data. Unicode is a character-encoding standard developed by the Unicode Consortium with the aim of providing a universal way of encoding characters of any language, regardless of the computer system or platform being used. The NCHAR type allows a database to contain data in two different character sets: the CHAR type and NCHAR type use the database’s character set and the national character set, respectively. A non-null NCHAR(10) will always contain 10 characters of information (note that it differs from the CHAR type in this respect). An NCHAR field may store up to 2,000 bytes of information.

·     VARCHAR2: Also currently synonymous with VARCHAR. This is a variable length character string that differs from the CHAR type in that it is not blank padded to its maximum length. A VARCHAR2(10) may contain between 0 and 10 bytes of information using the default NLS settings. A VARCHAR2 may store up to 4,000 bytes of information. Starting with Oracle 12c, a VARCHAR2 can be configured to store up to 32,767 bytes of information (see the “Extended Datatypes” section in this chapter for further details).

·     NVARCHAR2: A variable length character string that contains UNICODE formatted data. An NVARCHAR2(10) may contain between 0 and 10 characters of information. An NVARCHAR2 may store up to 4,000 bytes of information. Starting with Oracle 12c, an NVARCHAR2 can be configured to store up to 32,767 bytes of information (see the “Extended Datatypes” section in this chapter for further details).

·     RAW: A variable length binary datatype, meaning that no character set conversion will take place on data stored in this datatype. It is considered a string of binary bytes of information that will simply be stored by the database. A RAW may store up to 2,000 bytes of information. Starting with Oracle 12c, a RAW can be configured to store up to 32,767 bytes of information (see the “Extended Datatypes” section in this chapter for further details).

·     NUMBER: This datatype is capable of storing numbers with up to 38 digits of precision. These numbers may vary between 1.0 x 10^-130 and up to but not including 1.0 x 10^126. Each number is stored in a variable length field that varies between 0 bytes (for NULL) and 22 bytes. Oracle NUMBER types are very precise—much more so than normal FLOAT and DOUBLE types found in many programming languages.

·     BINARY_FLOAT: This is a type available only in Oracle 10g Release 1 and above. This is a 32-bit single-precision floating-point number. It can support at least 6 digits of precision and will consume 5 bytes of storage on disk.

·     BINARY_DOUBLE: This is a type available only in Oracle 10g Release 1 and above. This is a 64-bit double-precision floating-point number. It can support at least 15 digits of precision and will consume 9 bytes of storage on disk.

·     LONG: This type is capable of storing up to 2GB of character data (2 gigabytes, not characters, as each character may take multiple bytes in a multibyte character set). LONG types are subject to many restrictions (which I’ll discuss later) and are provided only for backward compatibility, so it is strongly recommended you do not use this type in new applications. When possible, convert from LONG to CLOB types in existing applications.

·     LONG RAW: The LONG RAW type is capable of storing up to 2GB of binary information. For the same reasons as noted for LONGs, it is recommended you use the BLOB type in all future development and, when possible, in existing applications as well.

·     DATE: This is a fixed-width 7-byte date/time datatype. It will always contain the seven attributes of the century, the year within the century, the month, the day of the month, the hour, the minute, and the second.

·     TIMESTAMP: This is a fixed-width 7- or 11-byte date/time datatype (depending on the precision). It differs from the DATE datatype in that it may contain fractional seconds; up to 9 digits to the right of the decimal point may be preserved for TIMESTAMPs with fractional seconds.

·     TIMESTAMP WITH TIME ZONE: This is a fixed-width 13-byte date/time datatype, but it also provides for TIME ZONE support. Additional information regarding the time zone is stored with the TIMESTAMP in the data, so the TIME ZONE originally inserted is preserved with the data.

·     TIMESTAMP WITH LOCAL TIME ZONE: This is a fixed-width 7- or 11-byte date/time datatype (depending on the precision), similar to the TIMESTAMP; however, it is time zone sensitive. Upon modification in the database, the TIME ZONE supplied with the data is consulted, and the date/time component is normalized to the local database time zone. So, if you were to insert a date/time using the time zone U.S./Pacific and the database time zone was U.S./Eastern, the final date/time information would be converted to the Eastern time zone and stored as a TIMESTAMP. Upon retrieval, the TIMESTAMP stored in the database would be converted to the time in the session’s time zone.

·     INTERVAL YEAR TO MONTH: This is a fixed-width 5-byte datatype that stores a duration of time, in this case as a number of years and months. You may use intervals in date arithmetic to add or subtract a period of time from a DATE or the TIMESTAMP types.

·     INTERVAL DAY TO SECOND: This is a fixed-width 11-byte datatype that stores a duration of time, in this case as a number of days and hours, minutes, and seconds, optionally with up to 9 digits of fractional seconds.

·     BLOB: This datatype permits the storage of up to 4GB of data in Oracle9i and before, or (4 gigabytes - 1) * (database block size) bytes of data in Oracle 10g and above. BLOBs contain “binary” information that is not subject to character set conversion. This would be an appropriate type in which to store a spreadsheet, a word processing document, image files, and the like.

·     CLOB: This datatype permits the storage of up to 4GB of data in Oracle9i and before, or (4 gigabytes - 1) * (database block size) bytes of data in Oracle 10g and above. CLOBs contain information that is subject to character set conversion. This would be an appropriate type in which to store large plain-text information. Note that I said large plain-text information; this datatype would not be appropriate if your plain-text data is 4,000 bytes or less—for that you would want to use the VARCHAR2 datatype.

·     NCLOB: This datatype permits the storage of up to 4GB of data in Oracle9i and before, or (4 gigabytes - 1) * (database block size) bytes of data in Oracle 10g and above. NCLOBs store information encoded in the national character set of the database and are subject to character set conversions just as CLOBs are.

·     BFILE: This datatype permits you to store an Oracle directory object (a pointer to an operating system directory) and a file name in a database column and to read this file. This effectively allows you to access operating system files available on the database server in a read-only fashion, as if they were stored in the database table itself.

·     ROWID: A ROWID is effectively a 10-byte address of a row in a database. Sufficient information is encoded in the ROWID to locate the row on disk, as well as identify the object the ROWID points to (the table and so on).

·     UROWID: A UROWID is a universal ROWID and is used for tables—such as IOTs and tables accessed via gateways to heterogeneous databases—that do not have fixed ROWIDs. The UROWID is a representation of the primary key value of the row and hence will vary in size depending on the object to which it points.

Many types are apparently missing from the preceding list, such as INT, INTEGER, SMALLINT, FLOAT, REAL, and others. These types are actually implemented on top of one of the base types in the preceding list—that is, they are synonyms for the native Oracle type. Additionally, datatypes such as XMLType, SYS.ANYTYPE, and SDO_GEOMETRY are not listed because we will not cover them in this book. They are complex object types comprising a collection of attributes along with the methods (functions) that operate on those attributes. They are made up of the basic datatypes listed previously and are not truly datatypes in the conventional sense, but rather an implementation, a set of functionality, that you may make use of in your applications.
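The precision claim made for NUMBER above is worth a concrete illustration. The sketch below is in Python rather than SQL: Python's Decimal plays the role of a decimal-scaled type like NUMBER, while its native float is an IEEE-754 binary double, the model behind BINARY_DOUBLE. (This is an analogy, not Oracle's internal NUMBER encoding, which uses its own base-100 format, but the decimal-versus-binary tradeoff is the same.)

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 exactly, so adding it ten
# times does not yield exactly 1.0.
binary_sum = sum([0.1] * 10)
print(binary_sum == 1.0)              # False

# A decimal-scaled representation (the idea behind Oracle NUMBER) stores
# decimal fractions exactly, trading raw speed for precision.
decimal_sum = sum([Decimal("0.1")] * 10)
print(decimal_sum == Decimal("1.0"))  # True
```

This is precisely the tradeoff between NUMBER and the BINARY_FLOAT/BINARY_DOUBLE types: the former is exact for decimal data, the latter are fast native hardware types.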

Now, let’s take a closer look at these basic datatypes.

Character and Binary String Types

The character datatypes in Oracle are CHAR, VARCHAR2, and their “N” variants. The CHAR and NCHAR can store up to 2,000 bytes of text. The VARCHAR2 and NVARCHAR2 can store up to 4,000 bytes of information.

Note  Starting with Oracle 12c, VARCHAR2, NVARCHAR2, and RAW datatypes can be configured to store up to 32,767 bytes of information. Extended datatypes are not enabled by default; therefore, unless explicitly configured, the maximum size is still 4,000 bytes for the VARCHAR2 and NVARCHAR2 datatypes and 2,000 bytes for RAW. See the “Extended Datatypes” section later in this chapter for more details.

This text is converted between various character sets as needed by the database. A character set is a binary representation of individual characters in bits and bytes. Many different character sets are available, and each is capable of representing different characters, for example:

·     The US7ASCII character set is the ASCII standard representation of 128 characters. It uses the low 7 bits of a byte to represent these 128 characters.

·     The WE8MSWIN1252 character set is a Western European character set capable of representing the 128 ASCII characters as well as 128 extended characters, using all 8 bits of a byte.
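These two bullets can be made concrete with a small Python sketch (Python's cp1252 codec corresponds to the WE8MSWIN1252 character set, and its ascii codec to US7ASCII):

```python
# The letter à is the single byte 224 in the Windows-1252 character set,
# two bytes in UTF-8, and simply does not exist in 7-bit ASCII.
print(list("à".encode("cp1252")))   # [224]
print(list("à".encode("utf-8")))    # [195, 160]

try:
    "à".encode("ascii")
except UnicodeEncodeError:
    print("à has no representation in US7ASCII")
```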

Before we get into the details of CHAR, VARCHAR2, and their “N” variants, it would benefit us to get a cursory understanding of what these different character sets mean to us.

NLS Overview

As stated earlier, NLS stands for National Language Support. NLS is a very powerful feature of the database, but one that is often not as well understood as it should be. NLS controls many aspects of our data. For example, it controls how data is sorted, and whether we see commas and a single period in a number (e.g., 1,000,000.01) or many periods and a single comma (e.g., 1.000.000,01). But most important, it controls the following:

·     Encoding of the textual data as stored persistently on disk

·     Transparent conversion of data from character set to character set

It is this transparent part that confuses people the most—it is so transparent, you cannot even really see it happening. Let’s look at a small example.

Suppose you are storing 8-bit data in a WE8MSWIN1252 character set in your database, but you have some clients that connect using a 7-bit character set such as US7ASCII. These clients are not expecting 8-bit data and need to have the data from the database converted into something they can use. While this sounds wonderful, if you are not aware of it taking place, then you might well find that your data loses characters over time as the characters that are not available in US7ASCII are translated into some character that is. This is due to the character set translation taking place. In short, if you retrieve data from the database in character set 1, convert it to character set 2, and then insert it back (reversing the process), there is a very good chance that you have materially modified the data. Character set conversion is typically a process that will change the data, and you are usually mapping a large set of characters (in this example, the set of 8-bit characters) into a smaller set (that of the 7-bit characters). This is a lossy conversion—the characters get modified because it is quite simply not possible to represent every character. But this conversion must take place. If the database is storing data in a single-byte character set but the client (say, a Java application, since the Java language uses Unicode) expects it in a multibyte representation, then it must be converted simply so the client application can work with it.
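The lossy round trip just described can be modeled outside the database. The sketch below approximates the à-to-a style of translation a character set conversion performs; it is an illustration of the effect, not Oracle's conversion algorithm (the exact substitution Oracle chooses depends on the character sets involved):

```python
import unicodedata

def to_us7ascii(text: str) -> str:
    # Decompose each accented character into its base letter plus a
    # combining accent, then drop everything outside 7-bit ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")

stored = "àáâ"                  # three distinct 8-bit characters in the table
fetched = to_us7ascii(stored)   # what the 7-bit client receives
print(fetched)                  # aaa
print(fetched == stored)        # False: writing this back would lose data
```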

You can see character set conversion very easily. For example, I have a database whose character set is set to WE8MSWIN1252, a typical Western European character set:

EODA@ORA12CR1> select *
  2  from nls_database_parameters
  3  where parameter = 'NLS_CHARACTERSET';

PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_CHARACTERSET               WE8MSWIN1252

Now, if I ensure my NLS_LANG is set the same as my database character set (Windows users would change/verify this setting in their registry):

EODA@ORA12CR1> host echo $NLS_LANG
AMERICAN_AMERICA.WE8MSWIN1252

I can create a table and put in some “8-bit” data. This data will not be usable by a 7-bit client that is expecting only 7-bit ASCII data:

EODA@ORA12CR1> create table t ( data varchar2(1) );
Table created.

EODA@ORA12CR1> insert into t values ( chr(224) );
1 row created.

EODA@ORA12CR1> insert into t values ( chr(225) );
1 row created.

EODA@ORA12CR1> insert into t values ( chr(226) );
1 row created.

EODA@ORA12CR1> select data, dump(data) dump from t;

D DUMP
- --------------------
à Typ=1 Len=1: 224
á Typ=1 Len=1: 225
â Typ=1 Len=1: 226

EODA@ORA12CR1> commit;

Note  If you do this example yourself and do not see the preceding output, make sure your terminal client software is itself using a character set capable of displaying these characters, such as UTF-8. Otherwise, it might be translating the characters when printing to the screen! A common terminal emulator on UNIX will typically default to 7-bit ASCII. This affects Windows and UNIX/Linux users alike. Make sure your terminal can display the characters.

Now, if I go to another window and specify a 7-bit ASCII client, I’ll see quite different results:

$ export NLS_LANG=AMERICAN_AMERICA.US7ASCII
$ sqlplus eoda
Enter password:

EODA@ORA12CR1> select data, dump(data) dump from t;

D DUMP
- --------------------
a Typ=1 Len=1: 224
a Typ=1 Len=1: 225
a Typ=1 Len=1: 226

Notice how in the 7-bit session I received the letter “a” three times with no diacritical marks. However, the DUMP function is showing me that in the database there are, in fact, three separate distinct characters, not just the letter “a.” The data in the database hasn’t changed—just the values this client received. If this client were to retrieve that data into host variables as follows:

EODA@ORA12CR1> variable d varchar2(1)
EODA@ORA12CR1> variable r varchar2(20)
EODA@ORA12CR1> begin
2  select data, rowid into :d, :r from t where rownum = 1;
3  end;
4  /
PL/SQL procedure successfully completed.

And then next, do nothing whatsoever with it, just send it back to the database:

EODA@ORA12CR1> update t set data = :d where rowid = chartorowid(:r);
1 row updated.
EODA@ORA12CR1> commit;
Commit complete.

I would observe in the original 8-bit session that I have lost one of the original characters. It has been replaced with the lowly 7-bit a, not the fancy à I had previously.

EODA@ORA12CR1> select data, dump(data) dump from t;

D DUMP
- --------------------
a Typ=1 Len=1: 97
á Typ=1 Len=1: 225
â Typ=1 Len=1: 226

SQL DUMP FUNCTION

The Oracle SQL DUMP function allows you to display the datatype code, length in bytes, and the internal representation of a data value (also optionally the character set name). Its syntax is as follows:

DUMP( expression [,return_format] [,start_position] [,length] )

The default return_format is 10 (decimal); it can be any of 8, 10, 16, 17, 1008, 1010, 1016, or 1017, where 8 is octal notation, 10 is decimal, 16 is hexadecimal, 17 is single characters, 1008 is octal with the character set name, 1010 is decimal with the character set name, 1016 is hexadecimal with the character set name, and 1017 is single characters with the character set name. The following example dumps information regarding the “a” character:

EODA@ORA12CR1> select dump('a'), dump('a',8), dump('a',16) from dual;
DUMP('A')        DUMP('A',8)       DUMP('A',16)
---------------- ----------------- ----------------
Typ=96 Len=1: 97 Typ=96 Len=1: 141 Typ=96 Len=1: 61

Here, 97, 141, and 61 are the corresponding ASCII codes for the “a” character in decimal, octal, and hexadecimal notation. The returned datatype code of Typ=96 indicates a CHAR datatype (see the Oracle Database SQL Language Reference manual for a complete list of Oracle datatype codes and their meanings).
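The three DUMP results are simply the code point of “a” rendered in three bases; the arithmetic is easy to verify outside the database, here in Python:

```python
# DUMP('a'), DUMP('a',8), and DUMP('a',16) differ only in the base used
# to print the same byte value.
code = ord("a")
print(code)               # 97  (decimal)
print(format(code, "o"))  # 141 (octal)
print(format(code, "x"))  # 61  (hexadecimal)
```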

This demonstrates the immediate impact of an environment with a heterogeneous character set, whereby the clients and database use different NLS settings. It is something to be aware of because it comes into play in many circumstances. For example, if the DBA uses the deprecated legacy EXP tool to extract information, he may observe the following warning:

[tkyte@desktop tkyte] exp userid=eoda tables=t
Export: Release 12.1.0.1.0 - Production on Thu Jan 9 16:11:24 2014
Copyright (c) 1982, 2013, Oracle and/or its affiliates.  All rights reserved.
Password:
Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
Export done in US7ASCII character set and UTF8 NCHAR character set
server uses WE8MSWIN1252 character set (possible charset conversion)
About to export specified tables via Conventional Path
...

Such warnings should be treated very seriously. If you were exporting this table with the goal of dropping the table and then using IMP to re-create it, you would find that all of your data in that table was now lowly 7-bit data! Beware the unintentional character set conversion.

Note  The problem of unintentional character set conversion does not affect every tool, nor does it affect every tool in the same way. For example, if you were to use the recommended Data Pump export/import process, you would discover that the export is always done in the character set of the database containing the data, regardless of the client’s NLS settings. This is because Data Pump runs in the database server itself; it is not a client-side tool at all. Similarly, Data Pump import will always convert the data in the file to be imported from the source database’s character set into the destination database’s character set—meaning that character set conversion is still possible with Data Pump (if the source and target databases have different character sets), but not in the same fashion as with the legacy EXP/IMP tools!

But also be aware that, in general, character set conversions are necessary. If clients are expecting data in a specific character set, it would be disastrous to send them the information in a different character set.

Note  I highly encourage everyone to read through the Oracle Database Globalization Support Guide document. It covers NLS-related issues in far greater depth than we will here. Anyone creating applications that will be used around the globe (or even across international boundaries) needs to master the information contained in that document.

Now that we have a cursory understanding of character sets and the impact they will have on us, let’s take a look at the character string types provided by Oracle.

Character Strings

There are four basic character string types in Oracle, namely CHAR, VARCHAR2, NCHAR, and NVARCHAR2. All of the strings are stored in the same format in Oracle. On the database block they will have a leading length field of 1 to 3 bytes followed by the data; when they are NULL they will be represented as a single byte value of 0xFF.

Note  Trailing NULL columns consume 0 bytes of storage in Oracle. This means that if the last column in a table is NULL, Oracle stores nothing for it. If the last two columns are both NULL, there will be nothing stored for either of them. But if any column following a NULL column contains a value, Oracle will use the null flag, described in this section, to indicate the missing value.

If the length of the string is less than or equal to 250 (0x01 to 0xFA), Oracle will use 1 byte for the length. All strings exceeding 250 bytes in length will have a flag byte of 0xFE followed by 2 bytes that represent the length. So, a VARCHAR2(80) holding the words Hello World might look like Figure 12-1 on a block.


Figure 12-1. Hello World stored in a VARCHAR2(80)

A CHAR(80) holding the same data, on the other hand, would look like Figure 12-2.


Figure 12-2. Hello World stored in a CHAR(80)
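The layouts in Figures 12-1 and 12-2 can be modeled in a few lines. The function below is a sketch of the length-prefix rule just described, not Oracle's actual block code:

```python
def string_on_block(value: bytes) -> bytes:
    """Sketch of the on-block layout of a non-NULL string column: a 1-byte
    length for values of up to 250 bytes, otherwise a 0xFE flag byte
    followed by a 2-byte length, then the data itself."""
    if len(value) <= 250:
        return bytes([len(value)]) + value
    return b"\xfe" + len(value).to_bytes(2, "big") + value

# Hello World in a VARCHAR2(80): 11 bytes of data, 12 bytes on the block.
print(len(string_on_block(b"Hello World")))            # 12

# The same data in a CHAR(80) is blank padded to the declared width first,
# so the stored value is 80 bytes of data plus the length byte.
print(len(string_on_block(b"Hello World".ljust(80))))  # 81
```

The second result is the storage penalty discussed next: the CHAR column always consumes its maximum width.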

The fact that a CHAR/NCHAR is really nothing more than a VARCHAR2/NVARCHAR2 in disguise leads me to the opinion that there are really only two character string types to ever consider, namely VARCHAR2 and NVARCHAR2. I have never found a use for the CHAR type in any application. Since a CHAR type always blank pads the resulting string out to a fixed width, we discover rapidly that it consumes maximum storage both in the table segment and any index segments. That would be bad enough, but there is another important reason to avoid CHAR/NCHAR types: they create confusion in applications that need to retrieve this information (many cannot “find” their data after storing it). The reason for this relates to the rules of character string comparison and the strictness with which they are performed. Let’s use the 'Hello World' string in a simple table to demonstrate:

EODA@ORA12CR1> create table t
  2  ( char_column      char(20),
  3    varchar2_column  varchar2(20)
  4  )
  5  /
Table created.

EODA@ORA12CR1> insert into t values ( 'Hello World', 'Hello World' );
1 row created.

EODA@ORA12CR1> select * from t;
CHAR_COLUMN          VARCHAR2_COLUMN
-------------------- --------------------
Hello World          Hello World

EODA@ORA12CR1> select * from t where char_column = 'Hello World';
CHAR_COLUMN          VARCHAR2_COLUMN
-------------------- --------------------
Hello World          Hello World

EODA@ORA12CR1> select * from t where varchar2_column = 'Hello World';
CHAR_COLUMN          VARCHAR2_COLUMN
-------------------- --------------------
Hello World          Hello World

So far, the columns look identical but, in fact, some implicit conversion has taken place and the CHAR(11) literal 'Hello World' has been promoted to a CHAR(20) and blank padded when compared to the CHAR column. This must have happened, since 'Hello World' followed by nine trailing spaces is not the same as 'Hello World' without them. We can confirm that these two strings are materially different:

EODA@ORA12CR1> select * from t where char_column = varchar2_column;
no rows selected

They are not equal to each other. We would have to either blank pad out the VARCHAR2_COLUMN to be 20 bytes in length or trim the trailing blanks from the CHAR_COLUMN, as follows:

EODA@ORA12CR1> select * from t where trim(char_column) = varchar2_column;
CHAR_COLUMN          VARCHAR2_COLUMN
-------------------- --------------------
Hello World          Hello World

EODA@ORA12CR1> select * from t where char_column = rpad( varchar2_column, 20 );
CHAR_COLUMN          VARCHAR2_COLUMN
-------------------- --------------------
Hello World          Hello World
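The comparison rules at work here can be summarized in a short model. Oracle uses blank-padded semantics only when both sides are fixed width (CHAR, or a literal promoted to one); a CHAR compared with a VARCHAR2 uses nonpadded semantics, which is why the CHAR_COLUMN = VARCHAR2_COLUMN predicate found no rows. A sketch of the two rules:

```python
def nonpadded_equal(a: str, b: str) -> bool:
    # CHAR versus VARCHAR2: trailing blanks are significant.
    return a == b

def blank_padded_equal(a: str, b: str) -> bool:
    # CHAR versus CHAR (or a promoted literal): pad the shorter side first.
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)

char_column = "Hello World".ljust(20)   # as stored in a CHAR(20)
varchar2_column = "Hello World"         # as stored in a VARCHAR2(20)

print(nonpadded_equal(char_column, varchar2_column))   # False
print(blank_padded_equal(char_column, "Hello World"))  # True
```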

Note  There are many ways to blank pad the VARCHAR2_COLUMN, such as using the CAST() function.

The problem arises with applications that use variable length strings when they bind inputs, with the resulting “no data found” that is sure to follow:

EODA@ORA12CR1> variable varchar2_bv varchar2(20)
EODA@ORA12CR1> exec :varchar2_bv := 'Hello World';
PL/SQL procedure successfully completed.

EODA@ORA12CR1> select * from t where char_column = :varchar2_bv;
no rows selected

EODA@ORA12CR1> select * from t where varchar2_column = :varchar2_bv;
CHAR_COLUMN          VARCHAR2_COLUMN
-------------------- --------------------
Hello World          Hello World

So here, the search for the VARCHAR2 string worked, but the CHAR column did not. The VARCHAR2 bind variable will not be promoted to a CHAR(20) in the same way as a character string literal. At this point, many programmers form the opinion that “bind variables don’t work; we have to use literals.” That would be a very bad decision indeed. The solution is to bind using a CHAR type:

EODA@ORA12CR1> variable char_bv char(20)
EODA@ORA12CR1> exec :char_bv := 'Hello World';

PL/SQL procedure successfully completed.
EODA@ORA12CR1>
EODA@ORA12CR1> select * from t where char_column = :char_bv;
CHAR_COLUMN          VARCHAR2_COLUMN
-------------------- --------------------
Hello World          Hello World
EODA@ORA12CR1> select * from t where varchar2_column = :char_bv;
no rows selected

However, if you mix and match VARCHAR2 and CHAR, you’ll be running into this issue constantly. Not only that, but the developer is now having to consider the field width in her applications. If the developer opts for the RPAD() trick to convert the bind variable into something that will be comparable to the CHAR field (it is preferable, of course, to pad out the bind variable, rather than TRIM the database column, as applying the function TRIM to the column could easily make it impossible to use existing indexes on that column), she would have to be concerned with column length changes over time. If the size of the field changes, then the application is impacted, as it must change its field width.

It is for these reasons—the fixed-width storage, which tends to make the tables and related indexes much larger than normal, coupled with the bind variable issue—that I avoid the CHAR type in all circumstances. I cannot even make an argument for it in the case of the one-character field, because in that case it is really of no material difference. The VARCHAR2(1) and CHAR(1) are identical in all aspects. There is no compelling reason to use the CHAR type in that case, and to avoid any confusion, I “just say no,” even for the CHAR(1) field.

Character String Syntax

The syntax for the four basic string types is straightforward, as described in Table 12-1.

Table 12-1. Four Basic String Types

·     VARCHAR2( <SIZE> <BYTE|CHAR> ): <SIZE> is a number between 1 and 4,000, for up to 4,000 bytes of storage. In the following section, we’ll examine in detail the differences and nuances of the BYTE versus CHAR modifier in that clause. Starting with 12c, you can configure a VARCHAR2 to store up to 32,767 bytes of information.

·     CHAR( <SIZE> <BYTE|CHAR> ): <SIZE> is a number between 1 and 2,000, for up to 2,000 bytes of storage.

·     NVARCHAR2( <SIZE> ): <SIZE> is a number greater than 0 whose upper bound is dictated by your national character set. Starting with 12c, you can configure an NVARCHAR2 to store up to 32,767 bytes of information.

·     NCHAR( <SIZE> ): <SIZE> is a number greater than 0 whose upper bound is dictated by your national character set.

Bytes or Characters

The VARCHAR2 and CHAR types support two methods of specifying lengths:

·     In bytes: VARCHAR2(10 byte). This will support up to 10 bytes of data, which could be as few as two characters in a multibyte character set. Remember that bytes are not the same as characters in a multibyte character set!

·     In characters: VARCHAR2(10 char). This will support up to 10 characters of data, which could be as much as 40 bytes of information. Furthermore, VARCHAR2(4000 CHAR) would theoretically support up to 4,000 characters of data, but since a character string datatype in Oracle is limited to 4,000 bytes, you might not be able to store that many characters. See the following for an example.

When using a multibyte character set such as UTF8, you would be well advised to use the CHAR modifier in the VARCHAR2/CHAR definition—that is, use VARCHAR2(80 CHAR), not VARCHAR2(80), since your intention is likely to define a column that can in fact store 80 characters of data. You may also use the session or system parameter NLS_LENGTH_SEMANTICS to change the default behavior from BYTE to CHAR. I do not recommend changing this setting at the system level; rather, use it as part of an ALTER SESSION setting in your database schema installation scripts. Any application that requires a database to have a specific set of NLS settings makes for an unfriendly application. Such applications generally cannot be installed into a database with other applications that do not desire these settings, but rely on the defaults to be in place.

One other important thing to remember is that the upper bound of the number of bytes stored in a VARCHAR2 is 4,000. However, even if you specify VARCHAR2(4000 CHAR), you may not be able to fit 4,000 characters into that field. In fact, you may be able to fit as few as 1,000 characters in that field if all of the characters take 4 bytes to be represented in your chosen character set! Regarding the 4,000-byte limit, starting with 12c, a VARCHAR2 can be configured to store up to 32,767 bytes of information.

The following small example demonstrates the differences between BYTE and CHAR and how the upper bounds come into play. We’ll create a table with three columns, the first two of which will be 1 byte and one character, respectively, with the last column being 4,000 characters. Notice that we’re performing this test on a multibyte character set database using the character set AL32UTF8, which supports the latest version of the Unicode standard and encodes characters in a variable length fashion using from 1 to 4 bytes for each character:

EODA@ORA12CR1> select *
  2    from nls_database_parameters
  3   where parameter = 'NLS_CHARACTERSET';

PARAMETER                      VALUE
------------------------------ --------------------
NLS_CHARACTERSET               AL32UTF8

EODA@ORA12CR1> create table t
  2  ( a varchar2(1),
  3    b varchar2(1 char),
  4    c varchar2(4000 char)
  5  )
  6  /
Table created.

Now, if we try to insert into our table a single character that is 2 bytes long in UTF-8, we observe the following:

EODA@ORA12CR1> insert into t (a) values (unistr('\00d6'));
insert into t (a) values (unistr('\00d6'))
                          *
ERROR at line 1:
ORA-12899: value too large for column "EODA"."T"."A" (actual: 2, maximum: 1)

This example demonstrates two things:

·     VARCHAR2(1) is in bytes, not characters. We have a single Unicode character, but it won’t fit into a single byte.

·     As you migrate an application from a single-byte fixed-width character set to a multibyte character set, you might find that the text that once fit into your fields no longer does.

The reason for the second point is that a 20-character string in a single-byte character set is 20 bytes long and will absolutely fit into a VARCHAR2(20). However, a 20-character field could be as long as 80 bytes in a multibyte character set, and 20 Unicode characters may well not fit in 20 bytes. You might consider modifying your DDL to be VARCHAR2(20 CHAR) or using the NLS_LENGTH_SEMANTICS session parameter mentioned previously when running your DDL to create your tables.
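You can verify the character-versus-byte arithmetic outside the database as well. The following is an illustrative sketch (Python, not Oracle code): Python's UTF-8 codec encodes characters in the same variable-length, 1-to-4-byte fashion that AL32UTF8 does:

```python
# 'Ö' is U+00D6, the character used in the unistr('\00d6') examples.
ch = '\u00d6'
print(len(ch))                   # one character...
print(len(ch.encode('utf-8')))   # ...but two bytes in UTF-8

# A 20-character string can require up to 80 bytes:
s = '\U0001F600' * 20            # each of these characters needs 4 bytes in UTF-8
print(len(s), len(s.encode('utf-8')))
```

This is exactly the situation that makes a 20-character string overflow a VARCHAR2(20) defined with byte semantics.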

If we insert that single character into a field set up to hold a single character, we will observe the following:

EODA@ORA12CR1> insert into t (b) values (unistr('\00d6'));
1 row created.
EODA@ORA12CR1> select length(b), lengthb(b), dump(b) dump from t;
LENGTH(B) LENGTHB(B) DUMP
---------- ---------- --------------------
         1          2 Typ=1 Len=2: 195,150

That INSERT succeeded, and we can see that the LENGTH of the inserted data is one character—all of the character string functions work character-wise. So the length of the field is one character, but the LENGTHB (length in bytes) function shows it takes 2 bytes of storage, and the DUMP function shows us exactly what those bytes are. So, that example demonstrates one very common issue people encounter when using multibyte character sets, namely that a VARCHAR2(N) doesn’t necessarily hold N characters, but rather N bytes.

The next issue people confront frequently is that the maximum length in bytes of a VARCHAR2 is 4,000 and in a CHAR is 2,000:

EODA@ORA12CR1> declare
  2          l_data varchar2(4000 char);
  3          l_ch   varchar2(1 char) := unistr( '\00d6' );
  4  begin
  5          l_data := rpad( l_ch, 4000, l_ch );
  6          insert into t ( c )  values ( l_data );
  7  end;
  8  /
declare
*
ERROR at line 1:
ORA-01461: can bind a LONG value only for insert into a LONG column
ORA-06512: at line 6

This shows that a 4,000-character string that is really 8,000 bytes long cannot be stored permanently in a VARCHAR2(4000 CHAR) field. It fits in the PL/SQL variable because a VARCHAR2 in PL/SQL is allowed to be up to 32KB in size. However, when it is stored in a table, the hard limit is 4,000 bytes. We can store 2,000 of these characters successfully:

EODA@ORA12CR1> declare
  2          l_data varchar2(4000 char);
  3          l_ch   varchar2(1 char) := unistr( '\00d6' );
  4  begin
  5          l_data := rpad( l_ch, 2000, l_ch );
  6          insert into t ( c ) values ( l_data );
  7  end;
  8  /

PL/SQL procedure successfully completed.
EODA@ORA12CR1> select length( c ), lengthb( c )
  2    from t
  3   where c is not null;

LENGTH(C)  LENGTHB(C)
---------- ----------
2000       4000

And as you can see, they consume 4,000 bytes of storage.

The “N” Variant

So, of what use are the NVARCHAR2 and NCHAR (for completeness)? They are used in systems where the need to manage and store multiple character sets arises. This typically happens in a database where the predominant character set is a single-byte fixed-width one (such as WE8MSWIN1252), but the need arises to maintain and store some multibyte data. There are many systems that have legacy data but need to support multibyte data for some new applications; likewise, there are systems that want the efficiency of a single-byte character set for most operations (string operations on a string that uses fixed-width characters are more efficient than on a string where each character may use a different number of bytes) but need the flexibility of multibyte data at some points.

The NVARCHAR2 and NCHAR datatypes support this need. They are generally the same as their VARCHAR2 and CHAR counterparts, with the following exceptions:

·     Their text is stored and managed in the database’s national character set, not the default character set.

·     Their lengths are always provided in characters, whereas a CHAR/VARCHAR2 may specify either bytes or characters.

In Oracle9i and above, the database’s national character set may take one of two values: UTF8 or AL16UTF16 (UTF-16 in 9i; AL16UTF16 in 10g). This makes the NCHAR and NVARCHAR types suitable for storing only multibyte data, which is a change from earlier releases of the database (Oracle8i and earlier allowed you to choose any character set for the national character set).

Binary Strings: RAW Types

Oracle supports the storage of binary data as well as text. Binary data is not subject to the character set conversions we discussed earlier with regard to the CHAR and VARCHAR2 types. Therefore, binary datatypes are not suitable for storing user-supplied text, but they are suitable for storing encrypted information (encrypted data is not “text,” but a binary representation of the original text), word processing documents containing binary markup information, and so on. Any string of bytes that should not be considered by the database to be “text” (or any other base datatype such as a number, date, and so on) and that should not have character set conversion applied to it should be stored in a binary datatype.

Oracle supports three datatypes for storing binary data:

·     The RAW type, which we focus on in this section, is suitable for storing RAW data up to 2,000 bytes in size. Starting with 12c, you can configure a RAW to store up to 32,767 bytes of information.

·     The BLOB type, which supports binary data of much larger sizes. We’ll defer coverage of this until the “LOB Types” section later in the chapter.

·     The LONG RAW type, which is supported for backward compatibility and should not be considered for new applications.

The syntax for the binary RAW type is straightforward:

RAW( <size> )

For example, the following code creates a table capable of storing 16 bytes of binary information per row:

EODA@ORA12CR1> create table t ( raw_data raw(16) );
Table created.

The RAW type is much like the VARCHAR2 type in terms of storage on disk. The RAW type is a variable length binary string, meaning that the table T just created, for example, may store anywhere from 0 to 16 bytes of binary data. It is not padded out like the CHAR type.

When dealing with RAW data, you will likely find it being implicitly converted to a VARCHAR2 type—that is, many tools, such as SQL*Plus, will not display the RAW data directly but will convert it to a hexadecimal format for display. In the following example, we create some binary data in our table using SYS_GUID(), a built-in function that returns a 16-byte RAW string that is globally unique (GUID stands for globally unique identifier):

EODA@ORA12CR1> insert into t values ( sys_guid() );
1 row created.
EODA@ORA12CR1> select * from t;

RAW_DATA
--------------------------------
EEF18AA30B563AF0E043B7D04F0A4A30

You can immediately note two things here. First, the RAW data looks like a character string. That is just how SQL*Plus retrieved and printed it; that is not how it is stored on disk. SQL*Plus cannot print arbitrary binary data on your screen, as that could have serious side effects on the display. Remember that binary data may include control characters such as a carriage return or linefeed—or maybe a Ctrl-G character that would cause your terminal to beep.

Second, the RAW data looks much larger than 16 bytes—in fact, in this example, you can see 32 characters. This is due to the fact that every binary byte takes two hexadecimal characters to display. The stored RAW data is really 16 bytes in length, and you can see this using the Oracle SQL DUMP function. Here, I am dumping the value of the binary string and using the optional parameter to specify the base that should be used when displaying the value of each byte. I am using base 16, so we can compare the results of DUMP with the previous string (note that in the DUMP output, a leading zero on a byte is not displayed):

EODA@ORA12CR1> select dump(raw_data,16) from t;

DUMP(RAW_DATA,16)
-------------------------------------------------------------------------------
Typ=23 Len=16: ee,f1,8a,a3,b,56,3a,f0,e0,43,b7,d0,4f,a,4a,30

So, DUMP shows us this binary string is in fact 16 bytes long (Len=16) and displays the binary data byte by byte. As we can see, this dump display matches up with the implicit conversion performed when SQL*Plus fetched the RAW data into a string. This implicit conversion goes the other direction as well:

EODA@ORA12CR1> insert into t values ( 'abcdef' );
1 row created.

That did not insert the string abcdef, but rather a 3-byte RAW with the bytes AB, CD, EF, or in decimal with the bytes 171, 205, 239. If you attempt to use a string that does not consist of valid hex characters, you will receive an error message:

EODA@ORA12CR1> insert into t values ( 'abcdefgh' );
insert into t values ( 'abcdefgh' )
                             *
ERROR at line 1:
ORA-01465: invalid hex number

The RAW type may be indexed and used in predicates—it is as functional as any other datatype. However, you must take care to avoid unwanted implicit conversions, and you must be aware that they will occur.

I prefer and recommend using explicit conversions in all cases, which can be performed using the following built-in functions:

·     HEXTORAW: To convert strings of hexadecimal characters to the RAW type

·     RAWTOHEX: To convert RAW strings to hexadecimal strings

The RAWTOHEX function is invoked implicitly by SQL*Plus when it fetches a RAW type into a string, and the HEXTORAW function is invoked implicitly when inserting the string. It is a good practice to avoid implicit conversions and to always be explicit when coding. So the previous examples could have been written as follows:

EODA@ORA12CR1> select rawtohex(raw_data) from t;

RAWTOHEX(RAW_DATA)
--------------------------------
EEF18AA30B563AF0E043B7D04F0A4A30

EODA@ORA12CR1> insert into t values ( hextoraw('abcdef') );
1 row created.
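For a point of comparison outside the database, the same hexadecimal round-trip can be sketched with Python's bytes type. This is an analogue, not Oracle's implementation—RAWTOHEX corresponds roughly to bytes.hex, and HEXTORAW to bytes.fromhex:

```python
import uuid

# A 16-byte value, analogous to the RAW(16) returned by SYS_GUID().
raw = uuid.uuid4().bytes
hex_str = raw.hex().upper()       # the RAWTOHEX direction
print(len(raw), len(hex_str))     # 16 bytes become 32 hex characters

# The HEXTORAW direction: 'abcdef' is 3 bytes, not a 6-character string.
print(bytes.fromhex('abcdef'))    # b'\xab\xcd\xef'

# Invalid hex characters fail, just as Oracle raises ORA-01465.
try:
    bytes.fromhex('abcdefgh')
except ValueError as exc:
    print('invalid hex:', exc)
```

The doubling from 16 bytes to 32 characters is the same effect we saw when SQL*Plus displayed the RAW column.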

Extended Datatypes

Prior to Oracle 12c, the maximum length allowed for VARCHAR2 and NVARCHAR2 datatypes was 4,000 bytes, and 2,000 bytes for the RAW datatype. Starting with Oracle 12c, these datatypes can be configured to store up to 32,767 bytes. Listed next are the steps for enabling extended datatypes for a noncontainer (see Chapter 2 for a definition of the types of databases), single instance database. These steps must be performed as SYS:

SYS@O12CE> shutdown immediate;
SYS@O12CE> startup upgrade;
SYS@O12CE> alter system set max_string_size=extended;
SYS@O12CE> @?/rdbms/admin/utl32k.sql
SYS@O12CE> shutdown immediate;
SYS@O12CE> startup;

Image Note  Refer to the Oracle Database Reference guide for complete details on implementing extended datatypes for all types of databases (single instance, container, RAC, and Data Guard Logical Standby).

Once you’ve modified the MAX_STRING_SIZE to EXTENDED, you cannot modify the value back to the default (of STANDARD). It’s a one-way change. If you need to switch back, you will have to perform a recovery to a point in time before the change was made—meaning you’ll need RMAN backups taken prior to the change, or have flashback database enabled. You can also take a Data Pump export from a database with extended datatypes enabled and import into a database without extended datatypes enabled, with the caveat that any tables with extended columns will fail on the import.

After enabling the extended datatype, you can create a table with an extended column, as follows:

EODA@O12CE> create table t(et varchar2(32767)) tablespace users;

Table created.

If you describe the table it will show the large definition:

EODA@O12CE> desc t
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 ET                                     VARCHAR2(32767)

You can manipulate the extended VARCHAR2 column via SQL just as you would a nonextended column, for example:

EODA@O12CE> insert into t values(rpad('abc',10000,'abc'));
EODA@O12CE> select substr(et,9500,10) from t where UPPER(et) like 'ABC%';

The extended datatype is internally implemented as a LOB. Assuming that the T table is created in a schema not containing any other objects, you’ll get the following when querying USER_OBJECTS:

EODA@O12CE> select object_name, object_type from user_objects;

OBJECT_NAME                  OBJECT_TYPE
---------------------------- ---------------
SYS_LOB0000019479C00001$$    LOB
SYS_IL0000019479C00001$$     INDEX
T                            TABLE

You can further verify the LOB segment details by querying USER_LOBS:

EODA@O12CE> select table_name, column_name, segment_name, tablespace_name, in_row
  2     from user_lobs where table_name='T';

TABLE_NAME  COLUMN_NAME  SEGMENT_NAME                 TABLESPACE_NAME    IN_
----------- ------------ ---------------------------- ------------------ ---
T           ET           SYS_LOB0000019479C00001$$    USERS              YES

You have no direct control over the LOB associated with the extended column. This means that you cannot manipulate the underlying LOB column with the DBMS_LOB package. Also, the internal LOB associated with the extended datatype column is not visible to you via DBA_TAB_COLUMNS or COL$.

The LOB segment and associated LOB index are always stored in the tablespace of the table that the extended datatype was created in. Following normal LOB storage rules, Oracle stores the first 4,000 bytes inline within the table. Anything greater than 4,000 bytes is stored in the LOB segment. If the tablespace that the LOB is created in is using Automatic Segment Space Management (ASSM) then the LOB is created as a SecureFiles LOB, otherwise it is created as a BasicFiles LOB.

Image Note  See the “LOB Types” section later in this chapter for a discussion on in row storage and the technical aspects of SecureFiles and BasicFiles.

Your SQL access to any data stored in the extended-column LOB segment is transparently handled by Oracle. This has some interesting implications. For example, you can successfully select data stored in an extended column via a database link. This bit of code selects (via a database link) from a table named T in a remote database named O12CE:

EODA@ORA12CR1> select substr(et, 9000,10) from t@O12CE;

SUBSTR(ET,9000,10)
----------------------------------------
cabcabcabc

Why is that important? Consider what happens when a table is created in the remote O12CE database with a column defined with a LOB datatype:

EODA@O12CE> create table c(ct clob);

Table created.

Oracle throws an error if you attempt to select from the LOB column remotely over a database link:

EODA@ORA12CR1> select * from c@O12CE;
ERROR:
ORA-22992: cannot use LOB locators selected from remote tables

You can also perform set operation comparisons (UNION, UNION ALL, MINUS, INTERSECT) on extended columns, for example:

EODA@O12CE> select et from t minus select et from t;

Whereas if you tried to compare two LOB columns via a set operator, Oracle returns an error:

EODA@O12CE> select ct from c minus select ct from c;
select ct from c minus select ct from c
       *
ERROR at line 1:
ORA-00932: inconsistent datatypes: expected - got CLOB

The prior examples demonstrate that you have more flexibility working with an extended datatype than you would if working directly with a LOB column. Therefore, if you have an application that deals with character data greater than 4,000 bytes but less than or equal to 32,767 bytes, then you may want to consider using extended datatypes. Also, if you’re migrating from a non-Oracle database (that supports large character columns) to an Oracle database, the extended datatype feature will help make that migration easier, as you can now define large sizes for VARCHAR2, NVARCHAR2, and RAW columns natively in Oracle.

Number Types

Oracle 10g and above supports three native datatypes suitable for storing numbers. Oracle9i Release 2 and earlier support exactly one native datatype suitable for storing numeric data. In this list, the NUMBER type is supported by all releases, and the subsequent two types are new datatypes supported only in Oracle 10g and above:

·     NUMBER: The Oracle NUMBER type is capable of storing numbers with an extremely large degree of precision—38 digits of precision, in fact. The underlying data format is similar to a packed decimal representation. The Oracle NUMBER type is a variable length format from 0 to 22 bytes in length. It is appropriate for storing any number as small as 1.0e-130 and numbers up to but not including 1.0e126. This is by far the most common NUMBER type in use today.

·     BINARY_FLOAT: This is an IEEE native single-precision floating-point number. On disk it will consume 5 bytes of storage: 4 fixed bytes for the floating-point number and 1 length byte. It is capable of storing numbers in the range of ~ ±10^38.53 with 6 digits of precision.

·     BINARY_DOUBLE: This is an IEEE native double-precision floating-point number. On disk it will consume 9 bytes of storage: 8 fixed bytes for the floating-point number and 1 length byte. It is capable of storing numbers in the range of ~ ±10^308.25 with 13 digits of precision.

As you can see from this quick overview, the Oracle NUMBER type has significantly larger precision than the BINARY_FLOAT and the BINARY_DOUBLE types, but a much smaller range than the BINARY_DOUBLE. That is, you can store numbers very precisely with many significant digits in a NUMBER type, but you can store much smaller and larger numbers in the BINARY_FLOAT and BINARY_DOUBLE types. As a quick example, we can create a table with the various datatypes in them and see what is stored given the same inputs:

EODA@ORA12CR1> create table t
  2  ( num_col   number,
  3    float_col binary_float,
  4    dbl_col   binary_double
  5  )
  6  /
Table created.

EODA@ORA12CR1> insert into t ( num_col, float_col, dbl_col )
  2  values ( 1234567890.0987654321,
  3           1234567890.0987654321,
  4           1234567890.0987654321 );
1 row created.

EODA@ORA12CR1> set numformat 99999999999.99999999999
EODA@ORA12CR1> select * from t;

                 NUM_COL                FLOAT_COL                  DBL_COL
------------------------ ------------------------ ------------------------
  1234567890.09876543210   1234567940.00000000000   1234567890.09876540000

Note that the NUM_COL returns the exact number we provided as input. There are fewer than 38 significant digits in the input number (I supplied a number with 20 significant digits), so the exact number is preserved. The FLOAT_COL, however, using the BINARY_FLOAT type, was not able to accurately represent this number. In fact, it preserved only 7 digits accurately. The DBL_COL fared much better, accurately representing the number in this case out to 17 digits. Overall, though, this should be a good indication that the BINARY_FLOAT and BINARY_DOUBLE types will not be appropriate for financial applications! If you play around with different values, you’ll see different results:

EODA@ORA12CR1> delete from t;
1 row deleted.

EODA@ORA12CR1> insert into t ( num_col, float_col, dbl_col )
  2  values ( 9999999999.9999999999,
  3           9999999999.9999999999,
  4           9999999999.9999999999 );
1 row created.

EODA@ORA12CR1> select * from t;

                 NUM_COL                FLOAT_COL                  DBL_COL
------------------------ ------------------------ ------------------------
  9999999999.99999999990  10000000000.00000000000  10000000000.00000000000

Once again, the NUM_COL accurately represented the number, but the FLOAT_COL and DBL_COL did not. This does not mean that the NUMBER type is able to store things with infinite accuracy/precision—just that it has a much larger precision associated with it. It is easy to observe similar results from the NUMBER type:

EODA@ORA12CR1> delete from t;
1 row deleted.

EODA@ORA12CR1> insert into t ( num_col )
  2  values ( 123 * 1e20 + 123*1e-20 ) ;
1 row created.

EODA@ORA12CR1> set numformat 999999999999999999999999.999999999999999999999999
EODA@ORA12CR1> select num_col, 123*1e20, 123*1e-20 from t;

                                           NUM_COL
--------------------------------------------------
                                          123*1E20
--------------------------------------------------
                                         123*1E-20
--------------------------------------------------
  12300000000000000000000.000000000000000000000000
  12300000000000000000000.000000000000000000000000
                         .000000000000000001230000

As you can see, when we put together a very large number (123*1e20) and a very small number (123*1e-20), we lost precision because this arithmetic requires more than 38 digits of precision. The large number by itself can be faithfully represented, as can the small number, but the result of the larger plus the smaller cannot. We can verify this is not just a display/formatting issue as follows:

EODA@ORA12CR1> select num_col from t where num_col = 123*1e20;

                                           NUM_COL
--------------------------------------------------
  12300000000000000000000.000000000000000000000000

The value in NUM_COL is equal to 123*1e20, and not the value we attempted to insert.
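The same 38-digit effect can be modeled with Python's decimal module by pinning the context precision to 38 significant digits. This is a sketch of NUMBER arithmetic semantics, not Oracle's actual packed-decimal implementation:

```python
from decimal import Decimal, getcontext

getcontext().prec = 38  # mimic NUMBER's 38 digits of precision

big   = Decimal('123') * Decimal('1e20')   # 123*1e20
small = Decimal('123') * Decimal('1e-20')  # 123*1e-20

# Each value is representable on its own, but their exact sum would need
# more than 38 significant digits, so the small term is rounded away.
print(big + small == big)   # the small term vanishes in the sum
```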

NUMBER Type Syntax and Usage

The syntax for the NUMBER type is straightforward:

NUMBER( p,s )

P and S are optional and are used to specify the following:

·     Precision, or the total number of digits: By default, the precision is 38 and has valid values in the range of 1 to 38. The character * may be used to represent 38 as well.

·     Scale, or the number of digits to the right of the decimal point: Valid values for the scale are –84 to 127, and its default value depends on whether or not the precision is specified. If no precision is specified, then scale defaults to the maximum range. If a precision is specified, then scale defaults to 0 (no digits to the right of the decimal point). So, for example, a column defined as NUMBER stores floating-point numbers (with decimal places), whereas a NUMBER(38) stores only integer data (no decimals), since the scale defaults to 0 in the second case.

You should consider the precision and scale to be edits for your data—data integrity tools in a way. The precision and scale do not affect at all how the data is stored on disk, only what values are permitted and how numbers are to be rounded. For example, if a value exceeds the precision permitted, Oracle returns an error:

EODA@ORA12CR1> create table t ( num_col number(5,0) );
Table created.

EODA@ORA12CR1> insert into t (num_col) values ( 12345 );
1 row created.

EODA@ORA12CR1> insert into t (num_col) values ( 123456 );
insert into t (num_col) values ( 123456 )
                                 *
ERROR at line 1:
ORA-01438: value larger than specified precision allowed for this column

So, you can use the precision to enforce some data integrity constraints. In this case, NUM_COL is a column that is not allowed to have more than five digits.

The scale, on the other hand, is used to control rounding of the number. For example:

EODA@ORA12CR1> create table t ( msg varchar2(10), num_col number(5,2) );
Table created.

EODA@ORA12CR1> insert into t (msg,num_col) values ( '123.45',  123.45 );
1 row created.

EODA@ORA12CR1> insert into t (msg,num_col) values ( '123.456', 123.456 );
1 row created.

EODA@ORA12CR1> select * from t;

MSG           NUM_COL
---------- ----------
123.45         123.45
123.456        123.46

Notice how the number 123.456, with more than five digits, succeeded this time. That is because the scale we used in this example was used to round 123.456 to two digits, resulting in 123.46, and then 123.46 was validated against the precision, found to fit, and inserted. However, if we attempt the following insert, it fails because the number 1234.00 has more than five digits in total:

EODA@ORA12CR1> insert into t (msg,num_col) values ( '1234', 1234 );
insert into t (msg,num_col) values ( '1234', 1234 )
                                             *
ERROR at line 1:
ORA-01438: value larger than specified precision allowed for this column

When you specify the scale of 2, at most three digits may be to the left of the decimal place and two to the right. Hence that number does not fit. The NUMBER(5,2) column can hold all values between 999.99 and –999.99.

It may seem strange to allow the scale to vary from –84 to 127. What purpose could a negative scale fulfill? It allows you to round values to the left of the decimal place. Just as the NUMBER(5,2) rounded values to the nearest .01, so a NUMBER(5,-2) would round to the nearest 100, for example:

EODA@ORA12CR1> create table t ( msg varchar2(10), num_col number(5,-2) );
Table created.

EODA@ORA12CR1> insert into t (msg,num_col) values ( '123.45',  123.45 );
1 row created.

EODA@ORA12CR1> insert into t (msg,num_col) values ( '123.456', 123.456 );
1 row created.

EODA@ORA12CR1> select * from t;

MSG           NUM_COL
---------- ----------
123.45            100
123.456           100

The numbers were rounded to the nearest 100. We still have five digits of precision, but there are now seven digits (including the trailing two 0s) permitted to the left of the decimal point:

EODA@ORA12CR1> insert into t (msg,num_col) values ( '1234567', 1234567 );
1 row created.

EODA@ORA12CR1> select * from t;

MSG           NUM_COL
---------- ----------
123.45            100
123.456           100
1234567       1234600

EODA@ORA12CR1> insert into t (msg,num_col) values ( '12345678', 12345678 );
insert into t (msg,num_col) values ( '12345678', 12345678 )
                                                 *
ERROR at line 1:
ORA-01438: value larger than specified precision allowed for this column

So, the precision dictates how many digits are permitted in the number after rounding, using the scale to determine how to round. The precision is an integrity constraint, whereas the scale is an edit.
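To summarize the two roles, the following Python sketch models that behavior with the decimal module. The function name and the fit rule—the rounded value must be less than 10^(p–s) in absolute value—are my own modeling of the semantics described above, not Oracle code:

```python
from decimal import Decimal, ROUND_HALF_UP

def number_column(value, p, s):
    """Simplified model of NUMBER(p,s): the scale rounds, the precision edits."""
    # Round to scale s; a negative s rounds to the left of the decimal point.
    d = Decimal(str(value)).quantize(Decimal(1).scaleb(-s),
                                     rounding=ROUND_HALF_UP)
    # After rounding, the value must fit within p significant digits.
    if abs(d) >= Decimal(1).scaleb(p - s):
        raise ValueError('ORA-01438: value larger than specified precision')
    return d

print(number_column(123.456, 5, 2))        # rounds to 123.46
print(int(number_column(1234567, 5, -2)))  # rounds to 1234600
```

Feeding it 1234 with (5,2) or 12345678 with (5,–2) raises the modeled ORA-01438, matching the errors shown earlier.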

It is interesting and useful to note that the NUMBER type is, in fact, a variable length datatype on disk and will consume between 0 and 22 bytes of storage. Many times, programmers consider a numeric datatype to be a fixed-length type—that is what they typically see when programming with 2- or 4-byte integers and 4- or 8-byte floats. The Oracle NUMBER type is similar to a variable length character string. We can see what happens with numbers that contain differing amounts of significant digits. We’ll create a table with two NUMBER columns and populate the first column with many numbers that have 2, 4, 6, ... 28 significant digits. Then, we’ll simply add 1 to each of them:

EODA@ORA12CR1> create table t ( x number, y number );
Table created.

EODA@ORA12CR1> insert into t ( x )
  2  select to_number(rpad('9',rownum*2,'9'))
  3    from all_objects
  4   where rownum <= 14;
14 rows created.

EODA@ORA12CR1> update t set y = x+1;
14 rows updated.

Now, if we use the built-in VSIZE function that shows how much storage the column takes, we can review the size differences between the two numbers in each row:

EODA@ORA12CR1> set numformat 99999999999999999999999999999
EODA@ORA12CR1> column v1 format 99
EODA@ORA12CR1> column v2 format 99
EODA@ORA12CR1> select x, y, vsize(x) v1, vsize(y) v2
  2    from t order by x;

                             X                              Y  V1  V2
------------------------------ ------------------------------ --- ---
                            99                            100   2   2
                          9999                          10000   3   2
                        999999                        1000000   4   2
                      99999999                      100000000   5   2
                    9999999999                    10000000000   6   2
                  999999999999                  1000000000000   7   2
                99999999999999                100000000000000   8   2
              9999999999999999              10000000000000000   9   2
            999999999999999999            1000000000000000000  10   2
          99999999999999999999          100000000000000000000  11   2
        9999999999999999999999        10000000000000000000000  12   2
      999999999999999999999999      1000000000000000000000000  13   2
    99999999999999999999999999    100000000000000000000000000  14   2
  9999999999999999999999999999  10000000000000000000000000000  15   2

14 rows selected.

We can see that as we added significant digits to X, the amount of storage required increased. Every two significant digits added another byte of storage. But a number just one larger consistently took 2 bytes. When Oracle stores a number, it does so by storing as little as it can to represent that number. It does this by storing the significant digits, an exponent used to place the decimal place, and information regarding the sign of the number (positive or negative). So, the more significant digits a number contains, the more storage it consumes.
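The "two significant digits per byte, plus one byte of overhead" pattern in the VSIZE output can be captured in a small model. The following Python function is my own simplification, valid for positive integers only: treat the number as base-100 digits, drop trailing zero digit pairs, and add one byte for the exponent/sign:

```python
def estimated_vsize(n):
    """Rough model of VSIZE() for a positive integer NUMBER value."""
    assert n > 0
    digits = []                  # base-100 digits, least significant first
    while n:
        digits.append(n % 100)
        n //= 100
    while digits and digits[0] == 0:
        digits.pop(0)            # trailing zero pairs are not stored
    return 1 + len(digits)       # +1 for the exponent/sign byte

print(estimated_vsize(99))        # 2, as in the VSIZE output above
print(estimated_vsize(9999))      # 3
print(estimated_vsize(10 ** 28))  # 2: a single significant digit pair
```

This reproduces both columns of the VSIZE query: 28 nines need 15 bytes, while the number one larger needs only 2.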

That last fact explains why it is useful to know that numbers are stored in varying width fields. When attempting to size a table (e.g., to figure out how much storage 1,000,000 rows would need in a table), you have to consider the NUMBER fields carefully. Will your numbers take 2 bytes or 20 bytes? What is the average size? This makes accurately sizing a table without representative test data very hard. You can get the worst-case size and the best-case size, but the real size will likely be some value in between.

BINARY_FLOAT/BINARY_DOUBLE Type Syntax and Usage

Oracle 10g introduced two numeric types for storing data; they are not available in any release of Oracle prior to version 10g. These are the IEEE standard floating-points many programmers are used to working with. For a full description of what these number types look like and how they are implemented, I suggest reading http://en.wikipedia.org/wiki/Floating-point. It is interesting to note the following in the basic definition of a floating-point number in that reference (emphasis mine):

A floating-point number is a digital representation for a number in a certain subset of the rational numbers, and is often used to approximate an arbitrary real number on a computer. In particular, it represents an integer or fixed-point number (the significand or, informally, the mantissa) multiplied by a base (usually 2 in computers) to some integer power (the exponent). When the base is 2, it is the binary analogue of scientific notation (in base 10).

They are used to approximate numbers; they are not nearly as precise as the built-in Oracle NUMBER type described previously. Floating-point numbers are commonly used in scientific applications and are useful in many types of applications due to the fact that they allow arithmetic to be done in hardware (on the CPU, the chip) rather than in Oracle subroutines. Therefore, the arithmetic is much faster if you are doing real number-crunching in a scientific application, but you would not want to use floating-points to store financial information. For example, suppose you wanted to add together 0.3 and 0.1 as floats. You might think the answer is of course 0.4. You would be wrong (in floating point arithmetic). The answer is a little bit larger than 0.4:

EODA@ORA12CR1> select to_char( 0.3f + 0.1f, '0.99999999999999' ) from dual;

TO_CHAR(0.3F+0.1F
-----------------
 0.40000000600000

This is not a bug; this is the way IEEE floating-point numbers work. As a result, they are useful for a certain domain of problems, but definitely not for problems where dollars and cents count!
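You can reproduce this behavior outside of Oracle. The following Python sketch (illustrative, not Oracle code) forces each value through IEEE single precision using the struct module, mimicking BINARY_FLOAT arithmetic:

```python
import struct

def f32(x):
    """Round a Python float (IEEE double) to IEEE single precision."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Neither 0.3 nor 0.1 is exactly representable in binary floating point,
# so the single-precision sum lands slightly above 0.4.
result = f32(f32(0.3) + f32(0.1))
print('%.14f' % result)   # a little bit larger than 0.4
print(result == 0.4)      # not equal to 0.4
```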

The syntax for declaring columns of this type in a table is very straightforward:

BINARY_FLOAT
BINARY_DOUBLE

That is it. There are no options to these types whatsoever.

Non-Native Number Types

In addition to the NUMBER, BINARY_FLOAT, and BINARY_DOUBLE types, Oracle syntactically supports the following numeric datatypes:

·     NUMERIC(p,s): Maps exactly to a NUMBER(p,s). If p is not specified, it defaults to 38.

·     DECIMAL(p,s) or DEC(p,s): Maps exactly to a NUMBER(p,s). If p is not specified, it defaults to 38.

·     INTEGER or INT: Maps exactly to the NUMBER(38) type.

·     SMALLINT: Maps exactly to the NUMBER(38) type.

·     FLOAT(p): Maps to the NUMBER type.

·     DOUBLE PRECISION: Maps to the NUMBER type.

·     REAL: Maps to the NUMBER type.

Note  When I say “syntactically supports,” I mean that a CREATE statement may use these datatypes, but under the covers they are all really the NUMBER type. There are precisely three native numeric formats in Oracle 10g Release 1 and above and only one native numeric format in Oracle9i Release 2 and earlier. The use of any other numeric datatype is always mapped to the native Oracle NUMBER type.

Performance Considerations

In general, the Oracle NUMBER type is the best overall choice for most applications. However, there are performance implications associated with that type. The Oracle NUMBER type is a software datatype, implemented in the Oracle software itself. We cannot use native hardware operations to add two NUMBER types together; the operation is emulated in software. The floating-point types, on the other hand, are implemented in hardware: when we add two floating-point numbers together, Oracle will use the CPU to perform the operation.

This is fairly easy to see. We create a table that contains about 70,000 rows and place the same data in it using the NUMBER and BINARY_FLOAT/BINARY_DOUBLE types as follows:

EODA@ORA12CR1> create table t
  2  ( num_type     number,
  3    float_type   binary_float,
  4    double_type  binary_double
  5  )
  6  /
Table created.

EODA@ORA12CR1> insert /*+ APPEND */  into t
  2  select rownum, rownum, rownum
  3    from all_objects
  4  /
72089 rows created.

EODA@ORA12CR1> commit;
Commit complete.

We then execute the same query against each type of column, using a complex mathematical function such as LN (natural log). We observe in a TKPROF report radically different CPU utilization:

select sum(ln(num_type)) from t
call     count       cpu    elapsed
------- ------  -------- ----------
total        4      4.45       4.66

select sum(ln(float_type)) from t

call     count       cpu    elapsed
------- ------  -------- ----------
total        4      0.07       0.08

select sum(ln(double_type)) from t

call     count       cpu    elapsed
------- ------  -------- ----------
total        4      0.06       0.06

The Oracle NUMBER type used some 63 times the CPU of the floating-point types in this example. But, you have to remember that we did not receive precisely the same answer from all three queries!

EODA@ORA12CR1> set numformat 999999.9999999999999999

EODA@ORA12CR1> select sum(ln(num_type)) from t;

         SUM(LN(NUM_TYPE))
--------------------------
   734280.3209126472927309

EODA@ORA12CR1> select sum(ln(double_type)) from t;

      SUM(LN(DOUBLE_TYPE))
--------------------------
   734280.3209126447300000

The floating-point numbers were approximations of the true values, with between 6 and 13 significant digits of precision. The answer from the NUMBER type is much more precise than the answer from the floats. However, when you are performing data mining or complex numerical analysis of scientific data, this loss of precision is typically acceptable, and the performance gain to be had can be dramatic.

Note  If you are interested in the gory details of floating-point arithmetic and the subsequent loss of precision, see http://docs.sun.com/source/806-3568/ncg_goldberg.html.
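The size of the loss is easy to estimate outside the database. The following sketch sums ln(1) through ln(72089) in ordinary IEEE doubles (the same 64-bit format BINARY_DOUBLE uses) and compares the running sum against math.lgamma, which computes ln(n!) directly to nearly full double precision; the gap between the two is the rounding error accumulated over tens of thousands of additions:

```python
import math

n = 72089  # same row count as the example above

# Accumulate sum(ln(k)) one double-precision addition at a time,
# as the floating-point columns in the demo effectively do
total = 0.0
for k in range(1, n + 1):
    total += math.log(k)

# ln(n!) computed directly; a near-exact reference value
exact = math.lgamma(n + 1)

print(total)          # close to the NUMBER result shown above
print(total - exact)  # the error accumulated by 72,089 additions
```

Each addition can lose at most half a unit in the last place of the running sum, so the total drift stays tiny relative to the result, which is exactly why the approximation is acceptable for data mining and scientific work.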

It should be noted that in this case we can sort of have our cake and eat it, too. Using the built-in CAST function, we can perform an on-the-fly conversion of the Oracle NUMBER type to a floating-point type, prior to performing the complex math on it. This results in a CPU usage that is much nearer to that of the native floating-point types:

select sum(ln(cast( num_type as binary_double ) )) from t

call     count       cpu    elapsed
------- ------  -------- ----------
total        4      0.08       0.08

This implies that we may store our data very precisely and, when the need for raw speed arises and the floating-point types significantly outperform the Oracle NUMBER type, use the CAST function to convert on the fly and accomplish that goal.
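The same store-precisely/compute-fast tradeoff exists in most environments that pair a software decimal type with hardware floats. As a rough analogy (not Oracle's code), Python's software-implemented decimal.Decimal plays the role of NUMBER and the native float that of BINARY_DOUBLE, with the natural log as the expensive operation:

```python
import math
import time
from decimal import Decimal

n = 20_000
dec_values = [Decimal(k) for k in range(1, n + 1)]
flt_values = [float(k) for k in range(1, n + 1)]

# NUMBER analogue: ln computed in software, digit by digit
start = time.perf_counter()
dec_sum = sum(v.ln() for v in dec_values)
dec_elapsed = time.perf_counter() - start

# BINARY_DOUBLE analogue: ln computed by the math library/FPU
start = time.perf_counter()
flt_sum = sum(math.log(v) for v in flt_values)
flt_elapsed = time.perf_counter() - start

# "CAST first, then compute": convert each exact value on the fly
start = time.perf_counter()
cast_sum = sum(math.log(float(v)) for v in dec_values)
cast_elapsed = time.perf_counter() - start

print(f"software ln: {dec_elapsed:.3f}s  hardware ln: {flt_elapsed:.3f}s  "
      f"cast-then-ln: {cast_elapsed:.3f}s")
```

The cast-then-compute timing lands near the pure float timing, mirroring the TKPROF numbers above: the data stays exact at rest, and the conversion cost is small compared with doing the math in software.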

Long Types

LONG types come in two flavors in Oracle:

·     A LONG text type capable of storing 2GB of text. The text stored in the LONG type is subject to character set conversion, much like a VARCHAR2 or CHAR type.

·     A LONG RAW type capable of storing 2GB of raw binary data (data that is not subject to character set conversion).

The LONG types date back to version 6 of Oracle, when they were limited to 64KB of data. In version 7, they were enhanced to support up to 2GB of storage, but by the time version 8 was released, they were superseded by the LOB types, which we will discuss shortly.

Rather than explain how to use the LONG type, I will explain why you do not want to use the LONG (or LONG RAW) type in your applications. First and foremost, the Oracle documentation is very clear in its treatment of the LONG types. The Oracle Database SQL Language Reference manual states the following:

Do not create a table with LONG columns. Use LOB columns (CLOB, NCLOB, BLOB) instead. LONG columns are supported only for backward compatibility.

Restrictions on LONG and LONG RAW Types

The LONG and LONG RAW types are subject to the restrictions outlined in Table 12-2. Even though it might be considered jumping ahead, I’ve added a column to say whether the corresponding LOB type, which is the replacement for the LONG/LONG RAW types, is subject to the same restriction.

Table 12-2. Long Types Compared to LOBs

LONG/LONG RAW: You may have only one LONG or LONG RAW column per table.
CLOB/BLOB: You may have up to 1,000 columns of CLOB or BLOB type per table.

LONG/LONG RAW: User-defined types may not be defined with attributes of type LONG/LONG RAW.
CLOB/BLOB: User-defined types may fully use CLOB and BLOB types.

LONG/LONG RAW: LONG types may not be referenced in the WHERE clause.
CLOB/BLOB: LOBs may be referenced in the WHERE clause, and a host of functions is supplied in the DBMS_LOB package to manipulate them.

LONG/LONG RAW: LONG types do not support distributed transactions.
CLOB/BLOB: LOBs do support distributed transactions.

LONG/LONG RAW: LONG types cannot be replicated using basic or advanced replication.
CLOB/BLOB: LOBs fully support replication.

LONG/LONG RAW: LONG columns cannot be in a GROUP BY, ORDER BY, or CONNECT BY, or in a query that uses DISTINCT, UNIQUE, INTERSECT, MINUS, or UNION.
CLOB/BLOB: LOBs may appear in these clauses provided a function is applied to the LOB that converts it into a scalar SQL type (contains an atomic value) such as a VARCHAR2, NUMBER, or DATE.

LONG/LONG RAW: PL/SQL functions/procedures cannot accept an input of type LONG.
CLOB/BLOB: PL/SQL works fully with LOB types.

LONG/LONG RAW: SQL built-in functions (e.g., SUBSTR) cannot be used against LONG columns.
CLOB/BLOB: SQL functions may be used against LOB types.

LONG/LONG RAW: You cannot use a LONG type in a CREATE TABLE AS SELECT statement.
CLOB/BLOB: LOBs support CREATE TABLE AS SELECT.

LONG/LONG RAW: You cannot use ALTER TABLE MOVE on a table containing LONG types.
CLOB/BLOB: You may move tables containing LOBs.

As you can see, Table 12-2 presents quite a long list; there are many things you just cannot do when you have a LONG column in the table. For all new applications, do not even consider using the LONG type. Instead, use the appropriate LOB type. For existing applications, you should seriously consider converting the LONG type to the corresponding LOB type if you are hitting any of the restrictions in Table 12-2. Care has been taken to provide backward compatibility so that an application written for LONG types will work against the LOB type transparently.

Note  It almost goes without saying that you should perform a full functionality test against your application(s) before modifying your production system from LONG to LOB types.

Coping with Legacy LONG Types

A question that arises frequently is, “What about the data dictionary in Oracle?” It is littered with LONG columns, and this makes using the dictionary columns problematic. For example, it is not possible using SQL to search the ALL_VIEWS dictionary view to find all views that contain the text HELLO:

EODA@ORA12CR1> select *
  2  from all_views
  3  where text like '%HELLO%';
where text like '%HELLO%'
      *
ERROR at line 3:
ORA-00932: inconsistent datatypes: expected CHAR got LONG

This issue is not limited to the ALL_VIEWS view; many views are affected:

EODA@ORA12CR1> select table_name, column_name
  2  from dba_tab_columns
  3  where data_type in ( 'LONG', 'LONG RAW' )
  4  and owner = 'SYS'
  5  and table_name like 'DBA%'
  6  order by table_name;

TABLE_NAME                     COLUMN_NAME
------------------------------ ------------------------------
DBA_ADVISOR_SQLPLANS           OTHER
DBA_ARGUMENTS                  DEFAULT_VALUE
DBA_CLUSTER_HASH_EXPRESSIONS   HASH_EXPRESSION
DBA_CONSTRAINTS                SEARCH_CONDITION
DBA_IND_EXPRESSIONS            COLUMN_EXPRESSION
DBA_IND_PARTITIONS             HIGH_VALUE
DBA_IND_SUBPARTITIONS          HIGH_VALUE
DBA_MVIEWS                     QUERY
DBA_MVIEW_AGGREGATES           MEASURE
DBA_MVIEW_ANALYSIS             QUERY
DBA_NESTED_TABLE_COLS          DATA_DEFAULT
DBA_OUTLINES                   SQL_TEXT
DBA_REGISTERED_MVIEWS          QUERY_TXT
DBA_REGISTERED_SNAPSHOTS       QUERY_TXT
DBA_SNAPSHOTS                  QUERY
DBA_SQLSET_PLANS               OTHER
DBA_SQLTUNE_PLANS              OTHER
DBA_SUBPARTITION_TEMPLATES     HIGH_BOUND
DBA_SUMMARIES                  QUERY
DBA_SUMMARY_AGGREGATES         MEASURE
DBA_TAB_COLS                   DATA_DEFAULT
DBA_TAB_COLS_V$                DATA_DEFAULT
DBA_TAB_COLUMNS                DATA_DEFAULT
DBA_TAB_PARTITIONS             HIGH_VALUE
DBA_TAB_SUBPARTITIONS          HIGH_VALUE
DBA_TRIGGERS                   TRIGGER_BODY
DBA_VIEWS                      TEXT
DBA_VIEWS_AE                   TEXT
DBA_ZONEMAPS                   QUERY
DBA_ZONEMAP_MEASURES           MEASURE

30 rows selected.

So, what is the solution? If you want to make use of these columns in SQL, then you’ll need to convert them to a SQL-friendly type. You can use a user-defined function for doing so. The following example demonstrates how to accomplish this using a LONG SUBSTR function that will allow you to effectively convert any 4,000 bytes of a LONG type into a VARCHAR2 for use with SQL. When you are done, you’ll be able to query:

EODA@ORA12CR1> select *
  2    from (
  3  select owner, view_name,
  4         long_help.substr_of( 'select text
  5                                 from dba_views
  6                                where owner = :owner
  7                                  and view_name = :view_name',
  8                               1, 4000,
  9                               'owner', owner,
 10                               'view_name', view_name ) substr_of_view_text
 11    from dba_views
 12   where owner = user
 13         )
 14   where upper(substr_of_view_text) like '%INNER%'
 15  /

You’ve converted the first 4,000 bytes of the TEXT column from LONG to VARCHAR2 and can now use a predicate on it. Using the same technique, you could implement your own INSTR, LIKE, and so forth for LONG types as well. In this book, I’ll only demonstrate how to get the substring of a LONG type.

The package we will implement has the following specification:

EODA@ORA12CR1> create or replace package long_help
  2  authid current_user
  3  as
  4      function substr_of
  5      ( p_query in varchar2,
  6        p_from  in number,
  7        p_for   in number,
  8        p_name1 in varchar2 default NULL,
  9        p_bind1 in varchar2 default NULL,
 10        p_name2 in varchar2 default NULL,
 11        p_bind2 in varchar2 default NULL,
 12        p_name3 in varchar2 default NULL,
 13        p_bind3 in varchar2 default NULL,
 14        p_name4 in varchar2 default NULL,
 15        p_bind4 in varchar2 default NULL )
 16      return varchar2;
 17  end;
 18  /
Package created.

Note that on line 2, we specify AUTHID CURRENT_USER. This makes the package run as the invoker, with all roles and grants in place. This is important for two reasons. First, we’d like the database security to not be subverted—this package will only return substrings of columns we (the invoker) are allowed to see. Specifically, that means this package is not vulnerable to SQL injection attacks—it is not running as the owner of the package but as the invoker. Second, we’d like to install this package once in the database and have its functionality available for all to use; using invoker rights allows us to do that. If we used the default security model of PL/SQL—definer rights—the package would run with the privileges of the owner of the package, meaning it would only be able to see data the owner of the package could see, which may not include the set of data the invoker is allowed to see.

The concept behind the function SUBSTR_OF is to take a query that selects at most one row and one column: the LONG value we are interested in. SUBSTR_OF will parse that query if needed, bind any inputs to it, and fetch the results programmatically, returning the necessary piece of the LONG value.

The package body, the implementation, begins with two global variables. The G_CURSOR variable holds a persistent cursor open for the duration of our session. This is to avoid having to repeatedly open and close the cursor and to avoid parsing SQL more than we need to. The second global variable, G_QUERY, is used to remember the text of the last SQL query we’ve parsed in this package. As long as the query remains constant, we’ll just parse it once. So, even if we query 5,000 rows in a query, as long as the SQL query we pass to this function doesn’t change, we’ll have only one parse call:

EODA@ORA12CR1> create or replace package body long_help
  2  as
  3
  4      g_cursor number := dbms_sql.open_cursor;
  5      g_query  varchar2(32765);
  6

Next in this package is a private procedure, BIND_VARIABLE, which we’ll use to bind inputs passed to us by the caller. We implemented this as a separate private procedure only to make life easier; we want to bind only when the input name is NOT NULL. Rather than perform that check four times in the code for each input parameter, we do it once in this procedure:

  7  procedure bind_variable( p_name in varchar2, p_value in varchar2 )
  8  is
  9  begin
 10      if ( p_name is not null )
 11      then
 12          dbms_sql.bind_variable( g_cursor, p_name, p_value );
 13      end if;
 14  end;
 15

Next is the actual implementation of SUBSTR_OF in the package body. This routine begins with a function declaration from the package specification and the declaration for some local variables. L_BUFFER will be used to return the value, and L_BUFFER_LEN will be used to hold the length returned by an Oracle-supplied function:

 16
 17  function substr_of
 18  ( p_query in varchar2,
 19    p_from  in number,
 20    p_for   in number,
 21    p_name1 in varchar2 default NULL,
 22    p_bind1 in varchar2 default NULL,
 23    p_name2 in varchar2 default NULL,
 24    p_bind2 in varchar2 default NULL,
 25    p_name3 in varchar2 default NULL,
 26    p_bind3 in varchar2 default NULL,
 27    p_name4 in varchar2 default NULL,
 28    p_bind4 in varchar2 default NULL )
 29  return varchar2
 30  as
 31      l_buffer       varchar2(4000);
 32      l_buffer_len   number;
 33  begin

Now, the first thing our code does is a sanity check on the P_FROM and P_FOR inputs. P_FROM must be a number greater than or equal to 1, and P_FOR must be between 1 and 4,000—just like the built-in function SUBSTR:

 34      if ( nvl(p_from,0) <= 0 )
 35      then
 36          raise_application_error
 37          (-20002, 'From must be >=1 (positive numbers)' );
 38      end if;
 39      if ( nvl(p_for,0) not between 1 and 4000 )
 40      then
 41          raise_application_error
 42          (-20003, 'For must be between 1 and 4000' );
 43      end if;
 44

Next, we’ll check to see if we are getting a new query that needs to be parsed. If the last query we parsed is the same as the current query, we can skip this step. It is very important to note that on line 47 we are verifying that the P_QUERY passed to us is just a SELECT—we will use this package only to execute SQL SELECT statements. This check validates that for us:

 45      if ( p_query <> g_query or g_query is NULL )
 46      then
 47          if ( upper(trim(nvl(p_query,'x'))) not like 'SELECT%')
 48          then
 49              raise_application_error
 50              (-20001, 'This must be a select only' );
 51          end if;
 52          dbms_sql.parse( g_cursor, p_query, dbms_sql.native );
 53          g_query := p_query;
 54      end if;

We are ready to bind the inputs to this query. Any non-NULL names that were passed to us will be bound to the query, so when we execute it, it finds the right row:

 55      bind_variable( p_name1, p_bind1 );
 56      bind_variable( p_name2, p_bind2 );
 57      bind_variable( p_name3, p_bind3 );
 58      bind_variable( p_name4, p_bind4 );
 59

And now we can execute the query and fetch the row. Using DBMS_SQL.COLUMN_VALUE_LONG, we extract the necessary substring of the LONG and return it:

 60      dbms_sql.define_column_long(g_cursor, 1);
 61      if (dbms_sql.execute_and_fetch(g_cursor)>0)
 62      then
 63          dbms_sql.column_value_long
 64          (g_cursor, 1, p_for, p_from-1,
 65           l_buffer, l_buffer_len );
 66      end if;
 67      return l_buffer;
 68  end substr_of;
 69
 70  end;
 71  /
Package body created.

That’s it—you should be able to use that package against any legacy LONG column in your database, allowing you to perform many WHERE clause operations that were not possible before. For example, you can now find all partitions in your schema such that the HIGH_VALUE has the year 2014 in it (please remember that if you do not have any tables with 2014 in the partition high value, you would not expect to see anything returned):

EODA@ORA12CR1> select *
  2    from (
  3  select table_owner, table_name, partition_name,
  4         long_help.substr_of
  5         ( 'select high_value
  6              from all_tab_partitions
  7             where table_owner = :o
  8               and table_name = :n
  9               and partition_name = :p',
 10            1, 4000,
 11            'o', table_owner,
 12            'n', table_name,
 13            'p', partition_name ) high_value
 14    from all_tab_partitions
 15    where table_owner = user
 16         )
 17   where high_value like '%2014%'
 18  /

TABLE_OWNER TABLE_NAME  PARTITION_NAME HIGH_VALUE
----------- ----------- -------------- --------------------
EODA        F_CONFIGS   CONFIG_P_7     20140101

Using this same technique—that of processing the result of a query that returns a single row with a single LONG column in a function—you can implement your own INSTR, LIKE, and so on as needed.

This implementation works well on the LONG type, but will not work on LONG RAW types. LONG RAWs are not piecewise accessible (there is no COLUMN_VALUE_LONG_RAW function in DBMS_SQL). Fortunately, this is not too serious of a restriction since LONG RAWs are not used in the dictionary and the need to “substring” so you can search on it is rare. If you do have a need to do so, however, you will not use PL/SQL unless the LONG RAW is 32KB or less, as there is simply no method for dealing with LONG RAWs over 32KB in PL/SQL itself. Java, C, C++, Visual Basic, or some other language would have to be used.

Another approach is to temporarily convert the LONG or LONG RAW into a CLOB or BLOB using the TO_LOB built-in function and a global temporary table. Your PL/SQL procedure could be as follows:

Insert into global_temp_table ( blob_column )
select to_lob(long_raw_column) from t where...

This would work well in an application that occasionally needs to work with a single LONG RAW value. You would not want to continuously do that, however, due to the amount of work involved. If you find yourself needing to resort to this technique frequently, you should definitely convert the LONG RAW to a BLOB once and be done with it.

Dates, Timestamps, and Interval Types

The native Oracle datatypes of DATE, TIMESTAMP, and INTERVAL are closely related. The DATE and TIMESTAMP types store fixed date/times with varying degrees of precision. The INTERVAL type is used to store an amount of time, such as “8 hours” or “30 days,” easily. The result of subtracting two timestamps might be an interval; the result of adding an interval of 8 hours to a TIMESTAMP is a new TIMESTAMP that is 8 hours later.

The DATE datatype has been part of Oracle for many releases—as far back as my experience with Oracle goes, which means at least back to version 5 and probably before. The TIMESTAMP and INTERVAL types are relative newcomers to the scene by comparison, as they were introduced with Oracle9i Release 1. For this simple reason, you will find the DATE datatype to be the most prevalent type for storing date/time information. But many new applications are using the TIMESTAMP type for two reasons: it has support for fractions of seconds (the DATE type does not) and it has support for time zones (something the DATE type also does not have).

We’ll take a look at each type after discussing DATE/TIMESTAMP formats and their uses.

Formats

I am not going to attempt to cover all of the DATE, TIMESTAMP, and INTERVAL formats here. This is well covered in the Oracle Database SQL Language Reference manual, which is freely available to all. A wealth of formats is available to you, and a good understanding of what they are is vital. I strongly recommend that you investigate them.

I’d like to discuss what the formats do here, as there are a great many misconceptions surrounding this topic. The formats are used for two things:

·     To format the data on the way out of the database in a style that pleases you

·     To tell the database how to convert an input string into a DATE, TIMESTAMP, or INTERVAL

And that is all. The common misconception I’ve observed over the years is that the format used somehow affects what is stored on disk and how the data is actually saved. The format has no effect at all on how the data is stored. The format is only used to convert the single binary format used to store a DATE into a string or to convert a string into the single binary format that is used to store a DATE. The same is true for TIMESTAMPs and INTERVALs.

My advice on formats is simply this: use them. Use them when you send a string to the database that represents a DATE, TIMESTAMP, or INTERVAL. Do not rely on default date formats—defaults can and probably will at some point in the future be changed by someone.

Note  Refer back to Chapter 1 for a really good security reason to never use TO_CHAR/TO_DATE without an explicit format. In that chapter, I described a SQL injection attack that was available to an end user simply because the developer forgot to use an explicit format. Additionally, performing date operations without using an explicit date format can and will lead to incorrect answers. In order to appreciate this, just tell me what date this string represents: ‘01-02-03’. Whatever you say it represents, I’ll tell you that you are wrong. Never rely on defaults!

If you rely on a default date format and it changes, your application may be negatively affected. It might raise an error back to the end user if the date cannot be converted, have a serious security flaw or, as bad, it might silently insert the wrong data. Consider the following INSERT statement, which relies on a default date mask:

Insert into t ( date_column ) values ( '01/02/03' );

Suppose the application was relying on a default date mask of DD/MM/YY to be in place. That would be February 1, 2003 (assuming that code was executed after the year 2000, but we’ll visit the implications of that in a moment). Now, say someone decides the correct date format should be MM/DD/YY. All of a sudden, that previous date changes to January 2, 2003. Or someone decides YY/MM/DD is right, and now you have February 3, 2001. In short, without a date format to accompany that date string, there are many ways to interpret it. That INSERT statement should be:

Insert into t ( date_column ) values ( to_date( '01/02/03', 'DD/MM/YY' ) );

And if you want my opinion, it has to be:

Insert into t ( date_column ) values ( to_date( '01/02/2003', 'DD/MM/YYYY' ) );

That is, it must use a four-character year. Several years ago, our industry learned the hard way how much time and effort was spent remedying software that attempted to “save” 2 bytes. We seem to have lost that lesson over time. There is no excuse nowadays not to use a four-character year! Just because the year 2000 has come and gone does not mean you can now use a two-character year. Think about birth dates, for example. If you enter a birth date using

Insert into t ( DOB ) values ( to_date( '01/02/10', 'DD/MM/YY' ) );

is that Feb 1st, 2010 or Feb 1st, 1910? Either one is a valid value; you cannot just pick one to be correct.

This same discussion applies to data leaving the database. If you execute SELECT DATE_COLUMN FROM T and fetch that column into a string in your application, then you should apply an explicit date format to it. Whatever format your application is expecting should be explicitly applied there. Otherwise, at some point in the future when someone changes the default date format, your application may break or behave incorrectly.
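The ambiguity is easy to demonstrate in any environment. Here is a sketch using Python's datetime module, whose strptime masks play the role of Oracle's TO_DATE formats (an analogy, not Oracle code):

```python
from datetime import datetime

s = '01/02/03'

# The same string parses to three different dates, depending on the mask
print(datetime.strptime(s, '%d/%m/%y').date())  # 2003-02-01
print(datetime.strptime(s, '%m/%d/%y').date())  # 2003-01-02
print(datetime.strptime(s, '%y/%m/%d').date())  # 2001-02-03

# Two-digit years also force the library to guess the century for you:
# Python's %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999
dob = datetime.strptime('01/02/10', '%d/%m/%y')
print(dob.date())  # 2010-02-01 -- was 1910 perhaps intended?
```

Three valid masks, three different dates from one string, and a century silently chosen by a library default: exactly the class of bug an explicit four-digit-year format eliminates.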

Next, let’s look at the datatypes themselves in more detail.

DATE Type

The DATE type is a fixed-width 7-byte date/time datatype. It will always contain the seven attributes of the century, the year within the century, the month, the day of the month, the hour, the minute, and the second. Oracle uses an internal format to represent that information, so it is not really storing 20, 05, 06, 25, 12, 01, 00 for June 25, 2005, at 12:01:00. Using the built-in DUMP function, we can see what Oracle really stores:

EODA@ORA12CR1> create table t ( x date );
Table created.

EODA@ORA12CR1> insert into t (x) values
  2  ( to_date( '25-jun-2005 12:01:00',
  3             'dd-mon-yyyy hh24:mi:ss' ) );
1 row created.

EODA@ORA12CR1> select x, dump(x,10) d from t;

X         D
--------- -----------------------------------
25-JUN-05 Typ=12 Len=7: 120,105,6,25,13,2,1

The century and year bytes (the 120,105 in the DUMP output) are stored in an excess-100 notation. You would have to subtract 100 from them to determine the correct century and year. The reason for the excess-100 notation is support of BC and AD dates. If you subtract 100 from the century byte and get a negative number, it is a BC date. For example:

EODA@ORA12CR1> insert into t (x) values
  2  ( to_date( '01-jan-4712bc',
  3             'dd-mon-yyyybc hh24:mi:ss' ) );
1 row created.

EODA@ORA12CR1> select x, dump(x,10) d from t;

X         D
--------- -----------------------------------
25-JUN-05 Typ=12 Len=7: 120,105,6,25,13,2,1
01-JAN-12 Typ=12 Len=7: 53,88,1,1,1,1,1

So, when we insert 01-JAN-4712BC, the century byte is 53 and 53 – 100 = –47, the century we inserted. Because it is negative, we know that it is a BC date. This storage format also allows the dates to be naturally sortable in a binary sense. Since 4712 BC is less than 4710 BC, we’d like a binary representation that supports that. By dumping those two dates, we can see that 01-JAN-4710BC is larger than the same day in 4712 BC, so they will sort and compare nicely:

EODA@ORA12CR1> insert into t (x) values
  2  ( to_date( '01-jan-4710bc',
  3             'dd-mon-yyyybc hh24:mi:ss' ) );
1 row created.

EODA@ORA12CR1> select x, dump(x,10) d from t;

X         D
--------- -----------------------------------
25-JUN-05 Typ=12 Len=7: 120,105,6,25,13,2,1
01-JAN-12 Typ=12 Len=7: 53,88,1,1,1,1,1
01-JAN-10 Typ=12 Len=7: 53,90,1,1,1,1,1

The month and day bytes, the next two fields, are stored naturally, without any modification. So, June 25 used a month byte of 6 and a day byte of 25. The hour, minute, and second fields are stored in excess-1 notation, meaning we must subtract 1 from each component to see what time it really was. Hence midnight is represented as 1,1,1 in the date field.
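The whole encoding can be reconstructed from the DUMP output in a few lines. The following Python sketch is a reconstruction from the observed byte values, not an official specification of the internal format:

```python
def oracle_date_bytes(year, month, day, hour=0, minute=0, second=0):
    # year is negative for BC dates (e.g., -4712 for 4712 BC).
    # Century and year-in-century bytes are stored in excess-100 notation;
    # hour, minute, and second are stored in excess-1 notation.
    sign = -1 if year < 0 else 1
    y = abs(year)
    century = sign * (y // 100) + 100
    yy = sign * (y % 100) + 100
    return [century, yy, month, day, hour + 1, minute + 1, second + 1]

print(oracle_date_bytes(2005, 6, 25, 12, 1, 0))  # [120, 105, 6, 25, 13, 2, 1]
print(oracle_date_bytes(-4712, 1, 1))            # [53, 88, 1, 1, 1, 1, 1]
```

Comparing the byte lists directly (left to right, like a binary compare) reproduces the natural sort order discussed above: 01-JAN-4710BC encodes as a larger value than 01-JAN-4712BC.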

This 7-byte format is naturally sortable, as you have seen: it is a 7-byte field that can be sorted in a binary fashion from smallest to largest (or vice versa) very efficiently. Additionally, its structure allows for easy truncation, without converting the date into some other format. For example, truncating the date we just stored (25-JUN-2005 12:01:00) to the day (remove the hours, minutes, seconds) is very straightforward. Just set the trailing three bytes to 1,1,1 and the time component is as good as erased. Consider a fresh table, T, with the following inserts:

EODA@ORA12CR1> create table t ( what varchar2(10), x date );
Table created.

EODA@ORA12CR1> insert into t (what, x) values
  2  ( 'orig',
  3    to_date( '25-jun-2005 12:01:00',
  4             'dd-mon-yyyy hh24:mi:ss' ) );
1 row created.

EODA@ORA12CR1> insert into t (what, x)
  2  select 'minute', trunc(x,'mi') from t
  3  union all
  4  select 'day', trunc(x,'dd') from t
  5  union all
  6  select 'month', trunc(x,'mm') from t
  7  union all
  8  select 'year', trunc(x,'y') from t
  9  /
4 rows created.

EODA@ORA12CR1> select what, x, dump(x,10) d from t;

WHAT     X         D
-------- --------- -----------------------------------
orig     25-JUN-05 Typ=12 Len=7: 120,105,6,25,13,2,1
minute   25-JUN-05 Typ=12 Len=7: 120,105,6,25,13,2,1
day      25-JUN-05 Typ=12 Len=7: 120,105,6,25,1,1,1
month    01-JUN-05 Typ=12 Len=7: 120,105,6,1,1,1,1
year     01-JAN-05 Typ=12 Len=7: 120,105,1,1,1,1,1

To truncate that date down to the year, all the database had to do was put 1s in the last 5 bytes—a very fast operation. We now have a sortable, comparable DATE field that is truncated to the year level, and we got it as efficiently as possible.

Adding or Subtracting Time from a DATE

A question I am frequently asked is, “How do I add time to or subtract time from a DATE type?” For example, how do you add one day to a DATE, or eight hours, or one year, or one month, and so on. There are three techniques you’ll commonly use:

·     Simply add a NUMBER to the DATE. Adding 1 to a DATE is a method to add 1 day. Adding 1/24 to a DATE therefore adds 1 hour, and so on.

·     You may use the INTERVAL type, as described shortly, to add units of time. INTERVAL types support two levels of granularity: years and months, or days/hours/minutes/seconds. That is, you may have an interval of so many years and months or an interval of so many days, hours, minutes and seconds.

·     Add months using the built-in ADD_MONTHS function. Since adding a month is generally not as simple as adding 28 to 31 days, a special purpose function was implemented to facilitate this.

Table 12-3 demonstrates the techniques you would use to add (or subtract, of course) N units of time to a date.

Table 12-3. Adding Time to a Date

N seconds
Operation: DATE + n/24/60/60, DATE + n/86400, or DATE + NUMTODSINTERVAL(n,'second')
There are 86,400 seconds in a day. Since adding 1 adds one day, adding 1/86400 adds one second to a date. I prefer the n/24/60/60 technique over the n/86400 technique; they are equivalent. An even more readable method is to use NUMTODSINTERVAL (number to day/second interval) to add N seconds.

N minutes
Operation: DATE + n/24/60, DATE + n/1440, or DATE + NUMTODSINTERVAL(n,'minute')
There are 1,440 minutes in a day. Adding 1/1440 therefore adds one minute to a DATE. An even more readable method is to use the NUMTODSINTERVAL function.

N hours
Operation: DATE + n/24, or DATE + NUMTODSINTERVAL(n,'hour')
There are 24 hours in a day. Adding 1/24 therefore adds one hour to a DATE. An even more readable method is to use the NUMTODSINTERVAL function.

N days
Operation: DATE + n
Simply add N to the DATE to add or subtract N days.

N weeks
Operation: DATE + 7*n
A week is seven days, so just multiply 7 by the number of weeks to add or subtract.

N months
Operation: ADD_MONTHS(DATE,n), or DATE + NUMTOYMINTERVAL(n,'month')
You may use the ADD_MONTHS built-in function or add an interval of N months to the DATE. Please see the important caveat noted shortly regarding using month intervals with DATEs.

N years
Operation: ADD_MONTHS(DATE,12*n), or DATE + NUMTOYMINTERVAL(n,'year')
You may use the ADD_MONTHS built-in function with 12*n to add or subtract N years. Similar goals may be achieved with a year interval, but please see the important caveat noted shortly regarding using year intervals with dates.

In general, when using the Oracle DATE type, I recommend the following:

·     Use the NUMTODSINTERVAL built-in function to add hours, minutes, and seconds.

·     Add a simple number to add days.

·     Use the ADD_MONTHS built-in function to add months and years.
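Putting those recommendations together, adding (for example) 1 year, 2 months, 5 days, and 3 hours to a DATE might look like the following sketch (the values are illustrative; the output will vary with the current date, so it is omitted):

EODA@ORA12CR1> select add_months( sysdate, 12*1 + 2 ) + 5 + numtodsinterval( 3, 'hour' ) dt
  2    from dual;

ADD_MONTHS handles the year and month portions, the bare + 5 adds the days, and NUMTODSINTERVAL adds the hours.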

I do not recommend using the NUMTOYMINTERVAL function to add months and years. The reason has to do with how these functions behave at the end of the month.

The ADD_MONTHS function treats the end of month days specially. It will, in effect, round the dates for us—if we add one month to a month that has 31 days and the next month has fewer than 31 days, ADD_MONTHS will return the last day of the next month. Additionally, adding one month to the last day of a month results in the last day of the next month. We see this when adding one month to a month with 30 or fewer days:

EODA@ORA12CR1> alter session set nls_date_format = 'dd-mon-yyyy hh24:mi:ss';
Session altered.

EODA@ORA12CR1> select dt, add_months(dt,1)
  2    from (select to_date('29-feb-2000','dd-mon-yyyy') dt from dual )
  3  /
DT                   ADD_MONTHS(DT,1)
-------------------- --------------------
29-feb-2000 00:00:00 31-mar-2000 00:00:00

EODA@ORA12CR1> select dt, add_months(dt,1)
  2    from (select to_date('28-feb-2001','dd-mon-yyyy') dt from dual )
  3  /
DT                   ADD_MONTHS(DT,1)
-------------------- --------------------
28-feb-2001 00:00:00 31-mar-2001 00:00:00

EODA@ORA12CR1> select dt, add_months(dt,1)
  2    from (select to_date('30-jan-2001','dd-mon-yyyy') dt from dual )
  3  /
DT                   ADD_MONTHS(DT,1)
-------------------- --------------------
30-jan-2001 00:00:00 28-feb-2001 00:00:00

EODA@ORA12CR1> select dt, add_months(dt,1)
  2    from (select to_date('30-jan-2000','dd-mon-yyyy') dt from dual )
  3  /
DT                   ADD_MONTHS(DT,1)
-------------------- --------------------
30-jan-2000 00:00:00 29-feb-2000 00:00:00

See how the result of adding one month to February 29, 2000, results in March 31, 2000? February 29 was the last day of that month so ADD_MONTHS returned the last day of the next month. Additionally, notice how adding one month to January 30, 2000 and 2001 results in the last day of February 2000 and 2001, respectively.

If we compare this to how adding an interval would work, we see very different results:

EODA@ORA12CR1> select dt, dt+numtoyminterval(1,'month')
  2    from (select to_date('29-feb-2000','dd-mon-yyyy') dt from dual )
  3  /
DT                   DT+NUMTOYMINTERVAL(1
-------------------- --------------------
29-feb-2000 00:00:00 29-mar-2000 00:00:00

EODA@ORA12CR1> select dt, dt+numtoyminterval(1,'month')
  2    from (select to_date('28-feb-2001','dd-mon-yyyy') dt from dual )
  3  /
DT                   DT+NUMTOYMINTERVAL(1
-------------------- --------------------
28-feb-2001 00:00:00 28-mar-2001 00:00:00

Notice how the resulting date is not the last day of the next month, but rather the same day of the next month. It is arguable that this behavior is acceptable, but consider what happens when the resulting month doesn’t have that many days:

EODA@ORA12CR1> select dt, dt+numtoyminterval(1,'month')
  2    from (select to_date('30-jan-2001','dd-mon-yyyy') dt from dual )
  3  /
select dt, dt+numtoyminterval(1,'month')
             *
ERROR at line 1:
ORA-01839: date not valid for month specified

EODA@ORA12CR1> select dt, dt+numtoyminterval(1,'month')
  2    from (select to_date('30-jan-2000','dd-mon-yyyy') dt from dual )
  3  /
select dt, dt+numtoyminterval(1,'month')
             *
ERROR at line 1:
ORA-01839: date not valid for month specified

In my experience, this makes using a month interval in date arithmetic impossible in general. A similar issue arises with a year interval: adding one year to February 29, 2000, results in a runtime error because there is no February 29, 2001.
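We can see the year interval issue directly. The following query, formatted like the prior examples, should raise the same ORA-01839 error, since there is no February 29, 2001:

EODA@ORA12CR1> select dt, dt+numtoyminterval(1,'year')
  2    from (select to_date('29-feb-2000','dd-mon-yyyy') dt from dual )
  3  /
select dt, dt+numtoyminterval(1,'year')
             *
ERROR at line 1:
ORA-01839: date not valid for month specified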

Getting the Difference Between Two DATEs

Another frequently asked question is, “How do I retrieve the difference between two dates?” The answer is deceptively simple: you just subtract them. This will return a number representing the number of days between the two dates. Additionally, you have the built-in function MONTHS_BETWEEN, which returns a number representing the number of months—including fractional months—between two dates. Lastly, the INTERVAL datatypes give you yet another method to see the elapsed time between two dates. The following SQL query demonstrates the outcome of subtracting two dates (showing the number of days between them), using the MONTHS_BETWEEN function, and then using the two conversion functions with INTERVAL types:

EODA@ORA12CR1> select dt2-dt1 ,
  2         months_between(dt2,dt1) months_btwn,
  3         numtodsinterval(dt2-dt1,'day') days,
  4         numtoyminterval(trunc(months_between(dt2,dt1)),'month') months
  5    from (select to_date('29-feb-2000 01:02:03','dd-mon-yyyy hh24:mi:ss') dt1,
  6                 to_date('15-mar-2001 11:22:33','dd-mon-yyyy hh24:mi:ss') dt2
  7            from dual )
  8  /

   DT2-DT1 MONTHS_BTWN DAYS                           MONTHS
---------- ----------- ------------------------------ -------------
380.430903  12.5622872 +000000380 10:20:30.000000000  +000000001-00

Those are all correct values, but not of great use to us yet. Most applications would like to display the years, months, days, hours, minutes, and seconds between the dates. Using a combination of the preceding functions, we can achieve that goal. We’ll select out two intervals: one for the years and months, and the other for just the days, hours, and so on. We’ll use the MONTHS_BETWEEN built-in function to determine the decimal number of months between the two dates, and then we’ll use the NUMTOYMINTERVAL built-in function to convert that number into the years and months. Additionally, we’ll use ADD_MONTHS to advance the earlier date by that integer number of months and subtract the result from the later date, to get down to the days and hours between them:

EODA@ORA12CR1> select numtoyminterval
  2         (trunc(months_between(dt2,dt1)),'month')
  3             years_months,
  4         numtodsinterval
  5             (dt2-add_months( dt1, trunc(months_between(dt2,dt1)) ),
  6              'day' )
  7             days_hours
  8    from (select to_date('29-feb-2000 01:02:03','dd-mon-yyyy hh24:mi:ss') dt1,
  9                 to_date('15-mar-2001 11:22:33','dd-mon-yyyy hh24:mi:ss') dt2
 10            from dual )
 11  /

YEARS_MONTHS    DAYS_HOURS
--------------- ------------------------------
+000000001-00   +000000015 10:20:30.000000000

Now it is clear that the two DATEs differ by 1 year, 15 days, 10 hours, 20 minutes, and 30 seconds.

TIMESTAMP Type

The TIMESTAMP type is very much like the DATE, with the addition of support for fractional seconds and time zones. We’ll look at the TIMESTAMP type in the following three sections: one with regard to just the fractional second support but no time zone support, and the other two with regard to the two methods of storing the TIMESTAMP with time zone support.

TIMESTAMP

The syntax of the basic TIMESTAMP datatype is straightforward

TIMESTAMP(n)

where N is optional; it is used to specify the scale of the seconds component in the timestamp and may take on values between 0 and 9. If you specify 0, then a TIMESTAMP is functionally equivalent to a DATE and, in fact, stores the same values in the same manner:

EODA@ORA12CR1> create table t
  2  ( dt   date,
  3    ts   timestamp(0)
  4  )
  5  /
Table created.

EODA@ORA12CR1> insert into t values ( sysdate, systimestamp );
1 row created.

EODA@ORA12CR1> select dump(dt,10) dump, dump(ts,10) dump from t;

DUMP                                DUMP
----------------------------------- -----------------------------------
Typ=12 Len=7: 120,110,4,12,20,4,8   Typ=180 Len=7: 120,110,4,12,20,4,8

The datatypes are different (the Typ= field indicates that), but the manner in which they store data is identical. The TIMESTAMP datatype will differ in length from the DATE type when you specify some number of fractional seconds to preserve, for example:

EODA@ORA12CR1> create table t
  2  ( dt   date,
  3    ts   timestamp(9)
  4  )
  5  /
Table created.

EODA@ORA12CR1> insert into t values ( sysdate, systimestamp );
1 row created.

EODA@ORA12CR1> select dump(dt,10) dump, dump(ts,10) dump
  2    from t;

DUMP                                DUMP
----------------------------------- -----------------------------------
Typ=12 Len=7: 120,114,1,2,8,20,1    Typ=180 Len=11: 120,114,1,2,8,20,1,
                                    53,55,172,40

Now the TIMESTAMP consumes 11 bytes of storage, and the extra 4 bytes at the end contain the fractional seconds, which we can see by looking at the time that was stored:

EODA@ORA12CR1> alter session set nls_date_format = 'dd-mon-yyyy hh24:mi:ss';
Session altered.

EODA@ORA12CR1> select * from t;

DT                   TS
-------------------- ----------------------------------------------
02-jan-2014 07:19:00 02-JAN-14 07.19.00.892841000 AM

EODA@ORA12CR1> select dump(ts,16) dump from t;

DUMP
-----------------------------------
Typ=180 Len=11: 78,72,1,2,8,14,1,35,37,ac,28

EODA@ORA12CR1> select to_number('3537ac28', 'xxxxxxxx' ) from dual;

TO_NUMBER('3537AC28','XXXXXXXX')
--------------------------------
                       892841000

We can see the fractional seconds that were stored are there in the last 4 bytes. We used the DUMP function to inspect the data in HEX this time (base 16) so we could easily convert the 4 bytes into the decimal representation.

Adding or Subtracting Time to/from a TIMESTAMP

The same date-arithmetic techniques we applied to the DATE type work with a TIMESTAMP, but in many cases the preceding techniques will implicitly convert the TIMESTAMP into a DATE. For example:

EODA@ORA12CR1> alter session set nls_date_format = 'dd-mon-yyyy hh24:mi:ss';
Session altered.

EODA@ORA12CR1> select systimestamp ts, systimestamp+1 dt
  2  from dual;

TS                                  DT
----------------------------------- -----------------------------------
02-JAN-14 07.30.37.627678 AM -07:00 03-jan-2014 07:30:37

Note that adding 1 did, in fact, advance the SYSTIMESTAMP by a day, but the fractional seconds are gone, as would be the time zone information. This is where using INTERVALs will be more important:

EODA@ORA12CR1> select systimestamp ts, systimestamp +numtodsinterval(1,'day') dt
  2  from dual;

TS                                       DT
---------------------------------------- ----------------------------------------
02-JAN-14 07.31.45.451317 AM -07:00      03-JAN-14 07.31.45.451317000 AM -07:00

Using the function that returns an INTERVAL type preserved the fidelity of the TIMESTAMP. You will need to exercise caution when using TIMESTAMPs to avoid the implicit conversions. But bear in mind the caveat about adding intervals of months or years to a TIMESTAMP if the resulting day isn’t a valid date—the operation fails (adding one month to the last day in January will always fail if the month is added via an INTERVAL).
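A quick sketch of that caveat: adding one month to January 31 via an INTERVAL should always fail, since no February has 31 days:

EODA@ORA12CR1> select ts + numtoyminterval(1,'month')
  2    from (select to_timestamp('31-jan-2014','dd-mon-yyyy') ts from dual )
  3  /
select ts + numtoyminterval(1,'month')
          *
ERROR at line 1:
ORA-01839: date not valid for month specified

ADD_MONTHS would not fail here, but it implicitly converts the TIMESTAMP to a DATE, losing the fractional seconds and time zone.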

Getting the Difference Between Two TIMESTAMPs

This is where the DATE and TIMESTAMP types diverge significantly. Whereas the result of subtracting one DATE from another is a NUMBER, the result of doing the same with TIMESTAMPs is an INTERVAL:

EODA@ORA12CR1> select dt2-dt1
  2    from (select to_timestamp('29-feb-2000 01:02:03.122000',
  3                              'dd-mon-yyyy hh24:mi:ss.ff') dt1,
  4                 to_timestamp('15-mar-2001 11:22:33.000000',
  5                                           'dd-mon-yyyy hh24:mi:ss.ff') dt2
  6            from dual )
  7  /

DT2-DT1
---------------------------------------------------------------------------
+000000380 10:20:29.878000000

The difference between two TIMESTAMP values is an INTERVAL, and this shows us the number of days and hours/minutes/seconds between the two. If we desire to have the years, months, and so forth, we are back to using a query similar to the one we used with dates:

EODA@ORA12CR1> select numtoyminterval
  2         (trunc(months_between(dt2,dt1)),'month')
  3             years_months,
  4         dt2-add_months(dt1,trunc(months_between(dt2,dt1)))
  5                 days_hours
  6    from (select to_timestamp('29-feb-2000 01:02:03.122000',
  7                              'dd-mon-yyyy hh24:mi:ss.ff') dt1,
  8                 to_timestamp('15-mar-2001 11:22:33.000000',
  9                              'dd-mon-yyyy hh24:mi:ss.ff') dt2
 10            from dual )
 11  /

YEARS_MONTHS  DAYS_HOURS
------------- -----------------------------
+000000001-00 +000000015 10:20:30.000000000

Note in this case, since we used ADD_MONTHS, DT1 was converted implicitly into a DATE type and we lost the fractional seconds. We would have to add yet more code to preserve them. We could use NUMTOYMINTERVAL to add the months and preserve the TIMESTAMP; however, we would be subject to runtime errors:

EODA@ORA12CR1> select numtoyminterval
  2         (trunc(months_between(dt2,dt1)),'month')
  3             years_months,
  4         dt2-(dt1 + numtoyminterval( trunc(months_between(dt2,dt1)),'month' ))
  5                 days_hours
  6    from (select to_timestamp('29-feb-2000 01:02:03.122000',
  7                              'dd-mon-yyyy hh24:mi:ss.ff') dt1,
  8                 to_timestamp('15-mar-2001 11:22:33.000000',
  9                               'dd-mon-yyyy hh24:mi:ss.ff') dt2
 10            from dual )
 11  /
       dt2-(dt1 + numtoyminterval( trunc(months_between(dt2,dt1)),'month' ))
                *
ERROR at line 4:
ORA-01839: date not valid for month specified

I personally find this unacceptable. The fact is, though, that by the time you are displaying information with years and months, the fidelity of the TIMESTAMP is destroyed already. A year is not fixed in duration (it may be 365 or 366 days in length) and neither is a month. If you are displaying information with years and months, the loss of microseconds is not relevant; having the information displayed down to the second is more than sufficient at that point.

TIMESTAMP WITH TIME ZONE Type

The TIMESTAMP WITH TIME ZONE type inherits all of the qualities of the TIMESTAMP type and adds time zone support. The TIMESTAMP WITH TIME ZONE type consumes 13 bytes of storage, with the extra two bytes being used to preserve the time zone information. It differs from a TIMESTAMP structurally only by the addition of these 2 bytes:

EODA@ORA12CR1> create table t
  2  (
  3    ts    timestamp,
  4    ts_tz timestamp with time zone
  5  )
  6  /
Table created.

EODA@ORA12CR1> insert into t ( ts, ts_tz )
  2  values ( systimestamp, systimestamp );
1 row created.

EODA@ORA12CR1> select * from t;

TS                                  TS_TZ
----------------------------------- ----------------------------------------
02-JAN-14 03.02.51.890565 PM        02-JAN-14 03.02.51.890565 PM -07:00

EODA@ORA12CR1> select dump(ts) dump, dump(ts_tz) dump from t;

DUMP
------------------------------------------------------------
DUMP
------------------------------------------------------------
Typ=180 Len=11: 120,114,1,2,16,3,52,53,20,241,136
Typ=181 Len=13: 120,114,1,2,23,3,52,53,20,241,136,13,60

Upon retrieval, the default TIMESTAMP WITH TIME ZONE format included the time zone information (I was on U.S. Mountain Standard Time when this was executed).

TIMESTAMP WITH TIME ZONEs store the data in whatever time zone was specified when the data was stored. The time zone becomes part of the data itself. Note how the TIMESTAMP WITH TIME ZONE field stored ...23,3,52... for the hour, minutes, and seconds (in excess-1 notation, so that is 22:02:51), whereas the TIMESTAMP field stored simply ...16,3,52..., which is 15:02:51 —the exact time in the string we inserted. The TIMESTAMP WITH TIME ZONE had seven hours added to it, in order to store in GMT (also known as UTC) time. The trailing 2 bytes are used upon retrieval to properly adjust the TIMESTAMP value.

It is not my intention to cover all of the nuances of time zones here in this book; that is a topic well covered elsewhere. To that end, I’ll just point out that there is support for time zones in this datatype. This support is more relevant in applications today than ever before. In the distant past, applications were not nearly as global as they are now. In the days before widespread Internet use, applications were many times distributed and decentralized, and the time zone was implicitly based on where the server was located. Today, with large centralized systems being used by people worldwide, the need to track and use time zones is very relevant.

Before time zone support was built into a datatype, it was the application’s job to store the DATE in one column and the time zone information in another, and then to supply functions to convert DATEs from one time zone to another. Now that is the job of the database, and it can store data in multiple time zones:

EODA@ORA12CR1> create table t
  2  ( ts1  timestamp with time zone,
  3    ts2  timestamp with time zone
  4  )
  5  /
Table created.

EODA@ORA12CR1> insert into t (ts1, ts2)
  2  values ( timestamp'2014-02-27 16:02:32.212 US/Eastern',
  3           timestamp'2014-02-27 16:02:32.212 US/Pacific' );
1 row created.

And perform correct TIMESTAMP arithmetic on them:

EODA@ORA12CR1> select ts1-ts2 from t;

TS1-TS2
---------------------------------------------------------------------------
-000000000 03:00:00.000000

Since there is a three-hour time difference between those two time zones, even though they show the same time of 16:02:32.212, the interval reported is three hours. When performing arithmetic on TIMESTAMP WITH TIME ZONE types, Oracle first converts both values to UTC time and then performs the operation.

TIMESTAMP WITH LOCAL TIME ZONE Type

This type works much like the TIMESTAMP type. It is a 7- or 11-byte field (depending on the precision of the TIMESTAMP), but it is normalized to be stored in the local database’s time zone. To see this, we’ll use the DUMP command once again. First, we create a table with three columns—a DATE, a TIMESTAMP WITH TIME ZONE, and a TIMESTAMP WITH LOCAL TIME ZONE—and then we insert the same value into all three columns:

EODA@ORA12CR1> create table t
  2  ( dt   date,
  3    ts1  timestamp with time zone,
  4    ts2  timestamp with local time zone
  5  )
  6  /
Table created.

EODA@ORA12CR1> insert into t (dt, ts1, ts2)
  2  values ( timestamp'2014-02-27 16:02:32.212 US/Pacific',
  3           timestamp'2014-02-27 16:02:32.212 US/Pacific',
  4           timestamp'2014-02-27 16:02:32.212 US/Pacific' );
1 row created.

EODA@ORA12CR1> select dbtimezone from dual;

DBTIMEZONE
------------
-07:00

Now, when we dump those values as follows:

EODA@ORA12CR1> select dump(dt), dump(ts1), dump(ts2) from t;

DUMP(DT)
-------------------------------------------------------------------------------
DUMP(TS1)
-------------------------------------------------------------------------------
DUMP(TS2)
-------------------------------------------------------------------------------
Typ=12 Len=7: 120,114,2,27,17,3,33
Typ=181 Len=13: 120,114,2,28,1,3,33,12,162,221,0,137,156
Typ=231 Len=11: 120,114,2,27,18,3,33,12,162,221,0

We can see that, in this case, three totally different date/time representations were stored:

·     DT: This column stored the date/time 27-FEB-2014 16:02:32. The time zone and fractional seconds are lost because we used the DATE type. No time zone conversions were performed at all. We stored the exact date/time inserted, but lost the time zone.

·     TS1: This column preserved the TIME ZONE information and was normalized to be in UTC with respect to that TIME ZONE. The inserted TIMESTAMP value was in the US/Pacific time zone, which at the time of this writing was eight hours off UTC. Therefore, the stored date/time was 28-FEB-2014 00:02:32. It advanced our input time by eight hours to make it UTC time, and it saved the time zone US/Pacific as the last 2 bytes so this data can be properly interpreted later.

·     TS2: This column is assumed to be in the database’s time zone, which is US/Mountain. Now, 16:02:32 US/Pacific is 17:02:32 US/Mountain, so that is what was stored in the bytes ...18,3,33... (excess-1 notation; remember to subtract 1).

Since the TS1 column preserved the original time zone in the last 2 bytes, we’ll see the following upon retrieval:

EODA@ORA12CR1> select ts1, ts2 from t;

TS1
---------------------------------------------------------------------------
TS2
---------------------------------------------------------------------------
27-FEB-14 04.02.32.212000 PM US/PACIFIC
27-FEB-14 05.02.32.212000 PM

The database is able to show that information, but the TS2 column with the LOCAL TIME ZONE (the time zone of the database) shows the time in the database’s time zone, which is the assumed time zone for that column (and, in fact, for all columns in this database with the LOCAL TIME ZONE). My database was in the US/Mountain time zone, so the 16:02:32 US/Pacific we put in is displayed as 5:02 p.m. Mountain time on the way out.

Image Note  You may get slightly different results if the date was stored when Standard time zone was in effect and then retrieved when Daylight Savings time is in effect. The output in the prior example would show a two-hour difference instead of what you would intuitively think would be a one-hour difference. I only point this out to drive home the fact that time-zone math is much more complex than it appears!

The TIMESTAMP WITH LOCAL TIME ZONE provides sufficient support for most applications if you do not need to remember the source time zone, but only need a datatype that provides consistent worldwide handling of date/time values. Additionally, the TIMESTAMP(0) WITH LOCAL TIME ZONE provides the equivalent of a DATE type with time zone support—it consumes 7 bytes of storage and stores the dates normalized to the database’s time zone.
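We can verify that 7-byte claim with DUMP, as we did earlier (a sketch; the dumped byte values will of course vary with the current date/time, so the output is not shown):

EODA@ORA12CR1> create table t ( ts timestamp(0) with local time zone );
Table created.

EODA@ORA12CR1> insert into t values ( systimestamp );
1 row created.

EODA@ORA12CR1> select dump(ts) from t;

The DUMP output should report Typ=231 Len=7: the same 7-byte layout as a DATE, but normalized to the database’s time zone on the way in.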

One caveat with regard to the TIMESTAMP WITH LOCAL TIME ZONE type is that once you create tables with this column, you will find your database’s time zone is frozen—and you will not be able to change it:

EODA@ORA12CR1> alter database set time_zone = 'PST';
alter database set time_zone = 'PST'
*
ERROR at line 1:
ORA-30079: cannot alter database timezone when database has
           TIMESTAMP WITH LOCAL TIME ZONE columns

EODA@ORA12CR1> !oerr ora 30079
30079, 00000, "cannot alter database timezone when database has
               TIMESTAMP WITH LOCAL TIME ZONE columns"
// *Cause:  An attempt was made to alter database timezone with
//          TIMESTAMP WITH LOCAL TIME ZONE column in the database.
// *Action: Either do not alter database timezone or first drop all the
//          TIMESTAMP WITH LOCAL TIME ZONE columns.

It should be obvious why: if you were to change the database’s time zone, you would have to rewrite every single table with a TIMESTAMP WITH LOCAL TIME ZONE because their current values would be wrong, given the new time zone!
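Before attempting to change the database time zone, you can check for such columns yourself. A sketch using the DBA_TAB_COLUMNS dictionary view follows; the LIKE pattern is there to catch every precision (the data type is reported as, for example, TIMESTAMP(6) WITH LOCAL TIME ZONE):

EODA@ORA12CR1> select owner, table_name, column_name
  2    from dba_tab_columns
  3   where data_type like 'TIMESTAMP%WITH LOCAL TIME ZONE';

If this query returns no rows, the ALTER DATABASE SET TIME_ZONE command will not be blocked by ORA-30079.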

INTERVAL Type

We briefly saw the INTERVAL type used in the previous section. It is a way to represent a duration of time. There are two interval types we’ll discuss in this section: the YEAR TO MONTH type, which is capable of storing a duration of time specified in years and months, and the DAY TO SECOND type, which is capable of storing a duration of time in days, hours, minutes, and seconds (including fractional seconds).

Before we get into the specifics of the two INTERVAL types, I’d like to look at the EXTRACT built-in function, which can be very useful when working with this type. The EXTRACT built-in function works on TIMESTAMPs and INTERVALs, and it returns various bits of information from them, such as the time zone from a TIMESTAMP or the hours/days/minutes from an INTERVAL. Let’s use the previous example, where we got the INTERVAL of 380 days, 10 hours, 20 minutes, and 29.878 seconds:

EODA@ORA12CR1> select dt2-dt1
  2    from (select to_timestamp('29-feb-2000 01:02:03.122000',
  3                              'dd-mon-yyyy hh24:mi:ss.ff') dt1,
  4                 to_timestamp('15-mar-2001 11:22:33.000000',
  5                              'dd-mon-yyyy hh24:mi:ss.ff') dt2
  6            from dual )
  7  /

DT2-DT1
---------------------------------------------------------------------------
+000000380 10:20:29.878000000

We can use EXTRACT to see how easy it is to pull out each bit of information:

EODA@ORA12CR1> select extract( day    from dt2-dt1 ) day,
  2         extract( hour   from dt2-dt1 ) hour,
  3         extract( minute from dt2-dt1 ) minute,
  4         extract( second from dt2-dt1 ) second
  5    from (select to_timestamp('29-feb-2000 01:02:03.122000',
  6                              'dd-mon-yyyy hh24:mi:ss.ff') dt1,
  7                 to_timestamp('15-mar-2001 11:22:33.000000',
  8                              'dd-mon-yyyy hh24:mi:ss.ff') dt2
  9            from dual )
 10  /

       DAY       HOUR     MINUTE     SECOND
---------- ---------- ---------- ----------
       380         10         20     29.878

Additionally, we’ve already seen NUMTOYMINTERVAL and NUMTODSINTERVAL used for creating YEAR TO MONTH and DAY TO SECOND intervals. I find these functions to be the easiest way to create instances of INTERVAL types—over and above the string conversion functions. Rather than concatenating numbers representing the days, hours, minutes, and seconds of some interval into a formatted string, I’d rather add up four calls to NUMTODSINTERVAL.

The INTERVAL type can be used to store not just durations, but also times of day. For example, if you want to store a specific date and time, you have the DATE or TIMESTAMP types. But what if you want to store just the time 8:00 a.m.? The INTERVAL type (the INTERVAL DAY TO SECOND type in particular) would be handy for that.
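For example, you might store 8:00 a.m. as a DAY TO SECOND interval and later combine it with whatever day you need. A sketch (the output varies with the current date, so it is not shown):

EODA@ORA12CR1> select trunc(sysdate) + to_dsinterval('0 08:00:00') dt
  2    from dual;

TRUNC(SYSDATE) is today at midnight; adding the stored interval yields today at 8:00 a.m.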

INTERVAL YEAR TO MONTH

The syntax for INTERVAL YEAR TO MONTH is straightforward

INTERVAL YEAR(n) TO MONTH

where N is an optional number of digits to support for the number of years and varies from 0 to 9, with a default of 2 (to store a number of years from 0 to 99). It allows you to store any number of years (up to nine digits’ worth, anyway) and months. The function I prefer to use to create INTERVAL instances of this type is NUMTOYMINTERVAL. For example, to create an interval of five years and two months, we can use the following:

EODA@ORA12CR1> select numtoyminterval(5,'year')+numtoyminterval(2,'month') from dual;

NUMTOYMINTERVAL(5,'YEAR')+NUMTOYMINTERVAL(2,'MONTH')
---------------------------------------------------------------------------
+000000005-02

Or, using a single call and the fact that a year has 12 months, we can use the following approach:

EODA@ORA12CR1> select numtoyminterval(5*12+2,'month') from dual;

NUMTOYMINTERVAL(5*12+2,'MONTH')
---------------------------------------------------------------------------
+000000005-02

Either approach works well. Another function, TO_YMINTERVAL, can be used to convert a string into a year/month INTERVAL type:

EODA@ORA12CR1> select to_yminterval( '5-2' ) from dual;

TO_YMINTERVAL('5-2')
---------------------------------------------------------------------------
+000000005-02

But since the vast majority of the time I have the year and months in two NUMBER fields in my application, I find the NUMTOYMINTERVAL function to be more useful, as opposed to building a formatted string from the numbers. Lastly, you can just use the INTERVAL type right in SQL, bypassing the functions altogether:

EODA@ORA12CR1> select interval '5-2' year to month from dual;

INTERVAL'5-2'YEARTOMONTH
---------------------------------------------------------------------------
+05-02

INTERVAL DAY TO SECOND

The syntax for the INTERVAL DAY TO SECOND type is straightforward

INTERVAL DAY(n) TO SECOND(m)

where N is an optional number of digits to support for the day component and varies from 0 to 9, with a default of 2. M is the number of digits to preserve in the fractional part of the seconds field and varies from 0 to 9, with a default of 6. Once again, the function I prefer to use to create instances of this INTERVAL type is NUMTODSINTERVAL:

EODA@ORA12CR1> select numtodsinterval( 10, 'day' )+
  2  numtodsinterval( 2, 'hour' )+
  3  numtodsinterval( 3, 'minute' )+
  4  numtodsinterval( 2.3312, 'second' )
  5  from dual;

NUMTODSINTERVAL(10,'DAY')+NUMTODSINTERVAL(2,'HOUR')+NUMTODSINTERVAL(3,'MINU
---------------------------------------------------------------------------
+000000010 02:03:02.331200000

or simply

EODA@ORA12CR1> select numtodsinterval( 10*86400+2*3600+3*60+2.3312, 'second' ) from dual;

NUMTODSINTERVAL(10*86400+2*3600+3*60+2.3312,'SECOND')
---------------------------------------------------------------------------
+000000010 02:03:02.331200000

using the fact that there are 86,400 seconds in a day, 3,600 seconds in an hour, and so on. Alternatively, as before, we can use the TO_DSINTERVAL function to convert a string into a DAY TO SECOND interval:

EODA@ORA12CR1> select to_dsinterval( '10 02:03:02.3312' ) from dual;

TO_DSINTERVAL('1002:03:02.3312')
---------------------------------------------------------------------------
+000000010 02:03:02.331200000

or just using an INTERVAL literal in SQL itself

EODA@ORA12CR1> select interval '10 02:03:02.3312' day to second from dual;

INTERVAL'1002:03:02.3312'DAYTOSECOND
---------------------------------------------------------------------------
+10 02:03:02.331200

LOB Types

LOBs, or large objects, are the source of much confusion, in my experience. They are a misunderstood datatype, both in how they are implemented and how best to use them. This section provides an overview of how LOBs are stored physically and the considerations you must take into account when using a LOB type. They have many optional settings, and getting the right mix for your application is crucial.

There are four types of LOBs supported in Oracle:

·     CLOB: A character LOB. This type is used to store large amounts of textual information, such as XML or just plain text. This datatype is subject to character set translation—that is, the characters in this field will be converted from the database’s character set to the client’s character set upon retrieval, and from the client’s character set to the database’s character set upon modification.

·     NCLOB: Another type of character LOB. The character set of the data stored in this column is the national character set of the database, not the default character set of the database.

·     BLOB: A binary LOB. This type is used to store large amounts of binary information, such as word processing documents, images, and anything else you can imagine. It is not subject to character set translation. Whatever bits and bytes the application writes into a BLOB are what is returned by the BLOB.

·     BFILE: A binary file LOB. This is more of a pointer than a database-stored entity. The only thing stored in the database with a BFILE is a pointer to a file in the operating system. The file is maintained outside of the database and is not really part of the database at all. A BFILE provides read-only access to the contents of the file.

When discussing LOBs, I’ll break the preceding list into two pieces: LOBs stored in the database, or internal LOBs, which include CLOB, BLOB, and NCLOB; and LOBs stored outside of the database, or the BFILE type. I will not discuss CLOB, BLOB, or NCLOB independently, since from a storage and option perspective they are the same. It is just that a CLOB and NCLOB support textual information and a BLOB does not. But the options we specify for them—the CHUNK size, RETENTION, and so on—and the considerations are the same, regardless of the base type. Since BFILEs are significantly different, I’ll discuss them separately.

Internal LOBs

Starting with Oracle Database 11g, Oracle introduced a new underlying architecture for LOBs known as SecureFiles. The prior existing LOB architecture is known as BasicFiles. By default in 11g, when you create a LOB, it will be created as a BasicFiles LOB. Starting with Oracle 12c, when creating a LOB column in an ASSM-managed tablespace, by default the LOB will be created as a SecureFiles LOB.

Going forward, I recommend using SecureFiles over BasicFiles for the following reasons:

·     Oracle’s documentation states that BasicFiles will be deprecated in a future release.

·     There are fewer parameters to manage with SecureFiles; namely, the following attributes do not apply to SecureFiles: CHUNK, PCTVERSION, FREEPOOLS, FREELISTS, and FREELIST GROUPS.

·     SecureFiles allow for the use of advanced encryption, compression, and de-duplication. If you’re going to use these advanced LOB features, then you need to obtain a license for the Advanced Security Option and/or the Advanced Compression Option. If you’re not using advanced LOB features, then you can use SecureFiles LOBs without an extra license.
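To give a sense of what the advanced features look like, here is a sketch of the SecureFiles storage clause enabling compression and de-duplication (the table and column names here are hypothetical; as noted, COMPRESS and DEDUPLICATE require the Advanced Compression Option):

```sql
create table docs
( id  int primary key,
  doc blob
)
lob (doc) store as securefile
( compress high    -- requires the Advanced Compression Option
  deduplicate      -- store identical LOB values only once
)
/
```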

In the following subsections, I’ll detail the nuances of using both SecureFiles and BasicFiles.

Creating a SecureFiles LOB

The syntax for a SecureFiles LOB is, on the face of it, very simple—deceptively simple. You may create tables with column datatypes of CLOB, BLOB, or NCLOB, and that is it.

EODA@ORA12CR1> create table t
  2  ( id int primary key,
  3    txt clob
  4  )
  5  segment creation immediate
  6  /
Table created.

You can verify that the column was created as a SecureFiles LOB as follows:

EODA@ORA12CR1> select column_name, securefile from user_lobs where table_name='T';

COLUMN_NAME  SECUREFILE
------------ ------------
TXT          YES

If you’re using Oracle Database 11g, the default LOB type is BasicFiles. Therefore, in Oracle 11g, if you want to create a SecureFiles LOB, you’ll need to use the STORE AS SECUREFILE clause, as follows:

EODA@ORA11GR2> create table t
  2  ( id int primary key,
  3    txt clob
  4  )
  5  segment creation immediate
  6  lob(txt) store as securefile
  7  /
Table created.

Seemingly, LOBs are as simple to use as the NUMBER, DATE, or VARCHAR2 datatypes. Or are they? The prior small examples show the tip of the iceberg—the bare minimum you can specify about a LOB. Using DBMS_METADATA, we can get the entire picture:

EODA@ORA12CR1> select dbms_metadata.get_ddl( 'TABLE', 'T' )  from dual;

DBMS_METADATA.GET_DDL('TABLE','T')
--------------------------------------------------------------------------------

  CREATE TABLE "EODA"."T"
   (    "ID" NUMBER(*,0),
        "TXT" CLOB,
         PRIMARY KEY ("ID")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"  ENABLE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"
 LOB ("TXT") STORE AS SECUREFILE (
  TABLESPACE "USERS" ENABLE STORAGE IN ROW CHUNK 8192
  NOCACHE LOGGING  NOCOMPRESS  KEEP_DUPLICATES
  STORAGE(INITIAL 106496 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT))

As you can see, there are quite a few parameters. Before going into the details of these parameters, in the next section I’ll generate the same type of output for a BasicFiles LOB. This will provide a basis for discussing the various LOB attributes.

Creating a BasicFiles LOB

In versions prior to 12c, the following code will create a BasicFiles LOB:

EODA@ORA11GR2> create table t
  2  ( id int primary key,
  3    txt clob
  4  )
  5  segment creation immediate
  6  /
Table created.

In 12c, to create a BasicFiles LOB, you’ll need to use the STORE AS BASICFILE syntax:

EODA@ORA12CR1> create table t
  2  ( id int primary key,
  3    txt clob
  4  )
  5  segment creation immediate
  6  lob(txt) store as basicfile
  7  /
Table created.

Using the DBMS_METADATA package, we can see the details of a BasicFiles LOB:

EODA@ORA12CR1> select dbms_metadata.get_ddl( 'TABLE', 'T' )  from dual;

DBMS_METADATA.GET_DDL('TABLE','T')
--------------------------------------------------------------------------------

  CREATE TABLE "EODA"."T"
   (    "ID" NUMBER(*,0),
        "TXT" CLOB,
         PRIMARY KEY ("ID")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"  ENABLE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"
 LOB ("TXT") STORE AS BASICFILE (
  TABLESPACE "USERS" ENABLE STORAGE IN ROW CHUNK 8192 RETENTION
  NOCACHE LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT))

Most of the parameters for a BasicFiles LOB are identical to those of a SecureFiles LOB. The main difference is that the SecureFiles LOB storage clause contains fewer parameters (for example, no FREELISTS and FREELIST GROUPS in the LOB storage clause).

LOB Components

As shown in the DBMS_METADATA output in the prior sections, the LOB has several interesting attributes:

·     A tablespace (USERS in this example)

·     ENABLE STORAGE IN ROW as a default attribute

·     CHUNK 8192

·     RETENTION

·     NOCACHE

·     A full storage clause

These attributes imply there is a lot going on in the background with LOBs, and there is. A LOB column always results in what I call a multisegment object, meaning the table will use multiple physical segments. If we had created that table in an empty schema, we would discover the following:

EODA@ORA12CR1> select segment_name, segment_type from user_segments;

SEGMENT_NAME                   SEGMENT_TY
------------------------------ ----------
T                              TABLE
SYS_LOB0000020053C00002$$      LOBSEGMENT
SYS_IL0000020053C00002$$       LOBINDEX
SYS_C005432                    INDEX

An index was created in support of the primary key constraint—that is normal—but what about the other two segments, the LOBINDEX and the LOBSEGMENT? Those were created in support of our LOB column. The LOBSEGMENT is where our actual data will be stored (well, it might be stored in the table T also, but we’ll cover that in more detail when we get to the ENABLE STORAGE IN ROW clause). The LOBINDEX is used to navigate our LOB, to find the pieces of it. When we create a LOB column, in general what is stored in the row is a pointer, or LOB locator. This LOB locator is what our application retrieves. When we ask for “bytes 1,000 through 2,000” of the LOB, the LOB locator is used against the LOBINDEX to find where those bytes are stored, and then the LOBSEGMENT is accessed. The LOBINDEX is used to find the pieces of the LOB easily. You can think of a LOB then as a master/detail sort of relation. A LOB is stored in chunks or pieces, and any piece is accessible to us. If we were to implement a LOB using just tables, for example, we might do so as follows:

Create table parent
( id int primary key,
  other-data...
);

Create table lob
( id references parent on delete cascade,
  chunk_number int,
  data <datatype>(n),
  primary key (id,chunk_number)
);

Conceptually, the LOB is stored very much like that—in creating those two tables, we would have a primary key on the LOB table on (ID, CHUNK_NUMBER) (analogous to the LOBINDEX created by Oracle), and we would have a table LOB storing the chunks of data (analogous to the LOBSEGMENT). The LOB column implements this master/detail structure for us transparently. Figure 12-3 might make this idea clearer.

image

Figure 12-3. Table to LOBINDEX to LOBSEGMENT

The LOB locator in the table really just points to the LOBINDEX; the LOBINDEX, in turn, points to all of the pieces of the LOB itself. To get bytes N through M of the LOB, you would dereference the pointer in the table (the LOB locator), walk the LOBINDEX structure to find the needed chunks, and then access them in order. This makes random access to any piece of the LOB equally fast—you can get the front, the middle, or the end of a LOB equally fast, as you don’t always just start at the beginning and walk the LOB.

Now that you understand conceptually how a LOB is stored, I’d like to walk through each of the optional settings listed previously and explain what they are used for and what exactly they imply.

LOB Tablespace

The CREATE TABLE statements returned from DBMS_METADATA for both the SecureFiles and BasicFiles LOBs included the following:

LOB ("TXT") STORE AS ... (  TABLESPACE "USERS" ...

The TABLESPACE specified here is the tablespace where the LOBSEGMENT and LOBINDEX will be stored, and this may be different from the tablespace where the table itself resides. That is, the tablespace that holds the LOB data may be separate and distinct from the tablespace that holds the actual table data.

The main reasons you might consider using a different tablespace for the LOB data versus the table data are mostly administrative and performance related. From the administrative angle, a LOB datatype represents a sizable amount of information. If the table had millions of rows, and each row has a sizeable LOB associated with it, the LOB data would be huge. It would make sense to segregate the table from the LOB data just to facilitate backup and recovery and space management. You may well want a different uniform extent size for your LOB data than you have for your regular table data, for example.
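As a sketch of how this is done, and assuming a tablespace named LOB_DATA has already been created for the purpose, the LOB segments can be directed there while the table data stays in USERS:

```sql
create table t
( id  int primary key,
  txt clob
)
tablespace users
lob (txt) store as securefile
( tablespace lob_data )   -- LOBSEGMENT and LOBINDEX are created here
/
```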

The other reason could be for I/O performance. By default, LOBs are not cached in the buffer cache (more on that later). Therefore, by default every LOB access, be it read or write, is a physical I/O—a direct read from disk or a direct write to disk.

Image Note  LOBs may be stored in line, in the table itself. In that case, the LOB data would be cached, but this applies only to LOBs that are 4,000 bytes or less in size. We’ll discuss this further in the section “IN ROW Clause.”

Because each access is a physical I/O, it makes sense to segregate the objects you know for a fact will be experiencing more physical I/O than most objects in real time (as the user accesses them) to their own disks.

It should be noted that the LOBINDEX and the LOBSEGMENT will always be in the same tablespace; you cannot place them in separate tablespaces. Much earlier releases of Oracle allowed you to separate them, but since version 8i Release 3 you cannot specify separate tablespaces for the LOBINDEX and LOBSEGMENT. In fact, all storage characteristics of the LOBINDEX are inherited from the LOBSEGMENT, as we’ll see shortly.
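You can confirm this from the data dictionary. A query along these lines (the system-generated segment names will differ in your database) shows the LOBSEGMENT and its LOBINDEX reporting the same tablespace:

```sql
select segment_name, segment_type, tablespace_name
  from user_segments
 where segment_name like 'SYS_LOB%'
    or segment_name like 'SYS_IL%';
```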

IN ROW Clause

The CREATE TABLE statements returned from DBMS_METADATA earlier, for both the SecureFiles and BasicFiles LOBs, included the following:

LOB ("TXT") STORE AS ...  (... ENABLE STORAGE IN ROW ...

This controls whether the LOB data is always stored separate from the table in the LOBSEGMENT or if it can sometimes be stored right in the table itself without being placed into the LOBSEGMENT. If ENABLE STORAGE IN ROW is set, as opposed to DISABLE STORAGE IN ROW, small LOBs of up to 4,000 bytes will be stored in the table itself, much like a VARCHAR2 would be. Only when LOBs exceed 4,000 bytes will they be moved out of line into the LOBSEGMENT.

Enabling storage in the row is the default and, in general, should be the way to go if you know the LOBs will often fit in the table itself. For example, you might have an application with a description field of some sort in it. The description might be anywhere from 0 to 32KB of data (or maybe even more, but mostly 32KB or less). Many of the descriptions are known to be very short, consisting of a couple of hundred characters. Rather than going through the overhead of storing these out of line and accessing them via the index every time you retrieve them, you can store them in line, in the table itself. Further, if the LOB is using the default of NOCACHE (the LOBSEGMENT data is not cached in the buffer cache), then a LOB stored in the table segment (which is cached) will avoid the physical I/O required to retrieve the LOB.

Image Note  Starting with Oracle 12c, you can create a VARCHAR2, NVARCHAR2, or RAW column that will store up to 32,767 bytes of information. See the “Extended Datatypes” section in this chapter for details.

We can see the effect of this with a rather simple example. We’ll create a table with a LOB that can store data in row and one that cannot:

EODA@ORA12CR1> create table t
  2  ( id int   primary key,
  3    in_row   clob,
  4    out_row  clob
  5  )
  6  lob (in_row)  store as ( enable  storage in row )
  7  lob (out_row) store as ( disable storage in row )
  8  /
Table created.

Into this table we’ll insert some string data, all of which is less than 4,000 bytes in length:

EODA@ORA12CR1> insert into t
  2  select rownum,
  3         owner || ' ' || object_name || ' ' || object_type || ' ' || status,
  4         owner || ' ' || object_name || ' ' || object_type || ' ' || status
  5    from all_objects
  6  /
72085 rows created.

EODA@ORA12CR1> commit;
Commit complete.

Now, if we try to read out each row and, using the DBMS_MONITOR package, do this with SQL_TRACE enabled, we’ll be able to see the performance upon data retrieval of each:

EODA@ORA12CR1> declare
  2          l_cnt    number;
  3          l_data   varchar2(32765);
  4  begin
  5          select count(*)
  6            into l_cnt
  7            from t;
  8
  9          dbms_monitor.session_trace_enable;
 10          for i in 1 .. l_cnt
 11          loop
 12                  select in_row  into l_data from t where id = i;
 13                  select out_row into l_data from t where id = i;
 14          end loop;
 15  end;
 16  /
PL/SQL procedure successfully completed.

When we review the TKPROF report for this small simulation, the results are rather obvious:

SELECT IN_ROW FROM T WHERE ID = :B1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute  18240      0.23       0.25          0          0          0           0
Fetch    18240      0.22       0.27          0      54720          0       18240
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    36481      0.46       0.53          0      54720          0       18240
********************************************************************************
SELECT OUT_ROW FROM T WHERE ID = :B1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute  18240      0.23       0.24          0          0          0           0
Fetch    18240      1.95       1.67      18240      72960          0       18240
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    36481      2.18       1.91      18240      72960          0       18240

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  direct path read                            18240        0.00          0.14

The retrieval of the IN_ROW column was significantly faster and consumed far fewer resources. We can see that it used 54,720 logical I/Os (query mode gets), whereas the OUT_ROW column used significantly more logical I/Os. At first it is not clear where these extra logical I/Os are coming from, but if you remember how LOBs are stored, it will become obvious. These are the I/Os against the LOBINDEX segment in order to find the pieces of the LOB. Those extra logical I/Os are all against this LOBINDEX.

Additionally, you can see that the retrieval of 18,240 rows with out of row storage incurred 18,240 physical I/Os and resulted in 18,240 I/O waits for direct path read. These were the reads of the noncached LOB data. We might be able to reduce them in this case by enabling caching on the LOB data, but then we’d have to ensure we had sufficient additional buffer cache to be used for this. Also, if there were some really large LOBs in there, we might not really want this data to be cached.

This in row/out of row storage will affect modifications as well as reads. Suppose we update the first 100 rows with short strings and insert 100 new rows with short strings, using the same techniques to monitor performance, as follows:

EODA@ORA12CR1> create sequence s start with 100000;
Sequence created.

EODA@ORA12CR1> declare
  2          l_cnt    number;
  3          l_data   varchar2(32765);
  4  begin
  5          dbms_monitor.session_trace_enable;
  6          for i in 1 .. 100
  7          loop
  8                  update t set in_row  =
                     to_char(sysdate,'dd-mon-yyyy hh24:mi:ss') where id = i;
  9                  update t set out_row =
                     to_char(sysdate,'dd-mon-yyyy hh24:mi:ss') where id = i;
 10                  insert into t (id, in_row) values ( s.nextval, 'Hello World' );
 11                  insert into t (id,out_row) values ( s.nextval, 'Hello World' );
 12          end loop;
 13  end;
 14  /
PL/SQL procedure successfully completed.

we would discover findings similar to the following output in the resulting TKPROF report:

UPDATE T SET IN_ROW = TO_CHAR(SYSDATE,'dd-mon-yyyy hh24:mi:ss') WHERE ID = :B1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute    100      0.00       0.01          0        200        214         100
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      101      0.00       0.01          0        200        214         100

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 66     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  UPDATE  T (cr=2 pr=0 pw=0 time=463 us)
         1          1          1  INDEX UNIQUE SCAN SYS_C005434 (cr=2 pr=0 pw=0 time=16...

********************************************************************************
UPDATE T SET OUT_ROW = TO_CHAR(SYSDATE,'dd-mon-yyyy hh24:mi:ss') WHERE ID = :B1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute    100      0.03       0.99          0        200        302         100
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      101      0.03       0.99          0        200        302         100

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 66     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  UPDATE  T (cr=2 pr=0 pw=1 time=8759 us)
         1          1          1  INDEX UNIQUE SCAN SYS_C005434 (cr=2 pr=0 pw=0 time=6...

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  Disk file operations I/O                        1        0.00          0.00
  direct path write                             163        0.01          0.96

As we can see, the update of the out-of-line LOB consumed measurably more resources. It spent some amount of time doing direct path writes (physical I/O) and performed many more current mode gets. These were in response to the fact that the LOBINDEX and LOBSEGMENT had to be maintained in addition to the table itself. The INSERT activity shows the same disparity:

INSERT INTO T (ID, IN_ROW) VALUES ( S.NEXTVAL, 'Hello World' )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute    100      0.00       0.00          0          4        317         100
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      101      0.00       0.00          0          4        317         100
********************************************************************************
INSERT INTO T (ID,OUT_ROW) VALUES ( S.NEXTVAL, 'Hello World' )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute    100      0.02       0.61          0          4        440         100
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      101      0.02       0.61          0          4        440         100
...
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  direct path write                             100        0.01          0.60

Note the increased I/O usage on both reads and writes. All in all, this shows that if you use a CLOB, and many of the strings are expected to fit in the row (i.e., will be less than 4,000 bytes), then using the default of ENABLE STORAGE IN ROW is a good idea.

CHUNK Clause

LOBs are stored in chunks; the index that points to the LOB data points to individual chunks of data. Chunks are logically contiguous sets of blocks and are the smallest unit of allocation for LOBs, whereas normally a block is the smallest unit of allocation. The CHUNK size must be an integer multiple of your Oracle blocksize—this is the only valid value.

Image Note  The CHUNK clause only applies to BasicFiles. The CHUNK clause appears in the syntax clause for SecureFiles for backward compatibility purposes only.

You must take care to choose a CHUNK size from two perspectives. First, each LOB instance (each LOB value stored out of line) will consume at least one CHUNK. A single CHUNK is used by a single LOB value. If a table has 100 rows and each row has a LOB with 7KB of data in it, you can be sure that there will be 100 chunks allocated. If you set the CHUNK size to 32KB, you will have 100 32KB chunks allocated. If you set the CHUNK size to 8KB, you will have (probably) 100 8KB chunks allocated. The point is, a chunk is used by only one LOB entry (two LOBs will not use the same CHUNK). If you pick a chunk size that does not meet your expected LOB sizes, you could end up wasting an excessive amount of space. For example, if you have that table with 7KB LOBs on average, and you use a CHUNK size of 32KB, you will be wasting approximately 25KB of space per LOB instance. On the other hand, if you use an 8KB CHUNK, you will minimize any sort of waste.

You also need to be careful when you want to minimize the number of CHUNKs you have per LOB instance. As you have seen, there is a LOBINDEX used to point to the individual chunks, and the more chunks you have, the larger this index is. If you have a 4MB LOB and use an 8KB CHUNK, you will need at least 512 CHUNKs to store that information. This means you need at least enough LOBINDEX entries to point to these chunks. It might not sound like a lot until you remember this is per LOB instance; if you have thousands of 4MB LOBs, you now have many thousands of entries. This will also affect your retrieval performance, as it takes longer to read and manage many small chunks than it takes to read fewer, but larger, chunks. The ultimate goal is to use a CHUNK size that minimizes your waste, but also efficiently stores your data.
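For example, assuming an 8KB database blocksize, a BasicFiles LOB could be created with a 16KB chunk along these lines (a sketch; remember the CHUNK value must be an integer multiple of the blocksize):

```sql
create table t
( id  int primary key,
  txt clob
)
lob (txt) store as basicfile
( chunk 16384 )   -- two 8KB blocks per chunk
/
```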

RETENTION Clause

The RETENTION clause differs depending on whether you’re using SecureFiles or BasicFiles. If you look back at the output of DBMS_METADATA at the beginning of the “Internal LOBs” section, notice that there is no RETENTION clause in the CREATE TABLE statement for a SecureFiles LOB whereas there is one for a BasicFiles LOB. This is because RETENTION is automatically enabled for SecureFiles.

RETENTION is used to control the read consistency of the LOB. I’ll provide details in subsequent subsections on how RETENTION is handled differently between SecureFiles and BasicFiles.

Read Consistency for LOBs

In previous chapters, we’ve discussed read consistency, multiversioning, and the role that undo plays in that. Well, when it comes to LOBs, the way read consistency is implemented changes. The LOBSEGMENT does not use undo to record its changes; rather, it versions the information directly in the LOBSEGMENT itself. The LOBINDEX generates undo just as any other segment would, but the LOBSEGMENT does not. Instead, when you modify a LOB, Oracle allocates a new CHUNK and leaves the old CHUNK in place. If you roll back your transaction, the changes to the LOB index are rolled back and the index will point to the old CHUNK again. So the undo maintenance is performed right in the LOBSEGMENT itself. As you modify the data, the old data is left in place and new data is created.

This comes into play for reading the LOB data as well. LOBs are read consistent, just as all other segments are. If you retrieve a LOB locator at 9:00 a.m., the LOB data you retrieve from it will be “as of 9:00 a.m.” Just like if you open a cursor (a resultset) at 9:00 a.m., the rows it produces will be as of that point in time. Even if someone else comes along and modifies the LOB data and commits (or not), your LOB locator will be “as of 9:00 a.m.,” just like your resultset would be. Here, Oracle uses the LOBSEGMENT along with the read-consistent view of the LOBINDEX to undo the changes to the LOB, to present you with the LOB data as it existed when you retrieved the LOB locator. It does not use the undo information for the LOBSEGMENT, since none was generated for the LOBSEGMENT itself.

We can see that LOBs are read-consistent easily. Consider this small table with an out-of-line LOB (it is stored in the LOBSEGMENT):

EODA@ORA12CR1> create table t
  2  ( id int   primary key,
  3    txt      clob
  4  )
  5  lob( txt) store as ( disable storage in row )
  6  /
Table created.

EODA@ORA12CR1> insert into t values ( 1, 'hello world' );
1 row created.

EODA@ORA12CR1> commit;
Commit complete.

If we fetch out the LOB locator and open a cursor on this table as follows

EODA@ORA12CR1> declare
  2          l_clob  clob;
  3
  4          cursor c is select id from t;
  5          l_id    number;
  6  begin
  7          select txt into l_clob from t;
  8          open c;

and then we modify that row and commit

  9
 10          update t set id = 2, txt = 'Goodbye';
 11          commit;
 12

we’ll see upon working with the LOB locator and opened cursor that the data is presented “as of the point in time we retrieved or opened them”

 13          dbms_output.put_line( dbms_lob.substr( l_clob, 100, 1 ) );
 14          fetch c into l_id;
 15          dbms_output.put_line( 'id = ' || l_id );
 16          close c;
 17  end;
 18  /
hello world
id = 1

PL/SQL procedure successfully completed.

but the data is most certainly updated/modified in the database

EODA@ORA12CR1> select * from t;

        ID TXT
---------- ---------------
         2 Goodbye

The read-consistent images for the cursor C came from the undo segments, whereas the read-consistent images for the LOB came from the LOBSEGMENT itself. So, that gives us a reason to be concerned: if the undo segments are not used to store rollback for LOBs and LOBs support read consistency, how can we prevent the dreaded ORA-01555: snapshot too old error from occurring? And, as important, how do we control the amount of space used by these old versions? That is where RETENTION, and alternatively, PCTVERSION come into play.

BasicFiles RETENTION

RETENTION tells the database to retain modified LOB segment data in the LOB segment in accordance with your database’s UNDO_RETENTION setting. If you set your UNDO_RETENTION to 2 days, Oracle will attempt to not reuse LOB segment space freed by a modification. That is, if you deleted all of your rows pointing to LOBs, Oracle would attempt to retain the data in the LOB segment (the deleted data) for two days in order to satisfy your UNDO_RETENTION policy, just as it would attempt to retain the undo information for the structured data (your relational rows and columns) in the UNDO tablespace for two days. It is important you understand this: the freed space in the LOB segment will not be immediately reused by subsequent INSERTs or UPDATEs. This is a frequent cause of questions in the form of, “Why is my LOB segment growing and growing?” A mass purge followed by a reload of information will tend to cause the LOB segment to just grow, since the retention period has not yet expired.

Image Note  To use RETENTION, the BasicFiles LOB must reside in an automatic segment space management (ASSM) tablespace. The RETENTION parameter is ignored if the BasicFiles LOB resides in a manual segment space management (MSSM) tablespace. See Chapter 10 for a discussion on ASSM and MSSM.
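In an ASSM tablespace, RETENTION can also be requested explicitly in the BasicFiles storage clause. A minimal sketch:

```sql
create table t
( id  int primary key,
  txt clob
)
lob (txt) store as basicfile
( retention )   -- version LOB data per the UNDO_RETENTION setting
/
```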

Alternatively, the BasicFiles LOB storage clause could use PCTVERSION, which controls the percentage of allocated LOB space (blocks used by LOBs at some point, under the LOBSEGMENT’s HWM) that should be used for versioning of LOB data. The default of 10 percent is adequate for many uses since many times you only ever INSERT and retrieve LOBs (updating of LOBs is typically not done; LOBs tend to be inserted once and retrieved many times). Therefore, not much space, if any, needs to be set aside for LOB versioning.

However, if you have an application that does modify the LOBs frequently, the default of 10 percent may be too small if you frequently read LOBs at the same time some other session is modifying them. If you hit an ORA-22924 error while processing a LOB, the solution is not to increase the size of your undo tablespace, or increase the undo retention, or add more rollback segments if you are using manual undo management. Rather, you should use the following:

ALTER TABLE tabname MODIFY LOB (lobname) ( PCTVERSION n );

and increase the amount of space to be used in that LOBSEGMENT for versioning of data.

SecureFiles RETENTION

SecureFiles LOBs use RETENTION to control read consistency (just like BasicFiles). In the CREATE TABLE output of DBMS_METADATA for the SecureFiles LOB, there is no RETENTION clause. This is because the default RETENTION is set to AUTO, which instructs Oracle to retain undo long enough for read-consistency purposes.

If you want to alter the default RETENTION behavior, you can adjust it via the following parameters:

·     Use MAX to indicate that the undo should be retained until the LOB segment has reached the MAXSIZE specified in the storage clause (therefore, MAX must be used in conjunction with the MAXSIZE clause in the storage clause).

·     Set MIN N to limit the undo retention duration for the LOB to N seconds (applicable when flashback database is enabled).

·     Set NONE if undo is not required for consistent reads or flashback operations.

If you don’t set the RETENTION parameter for SecureFiles, or specify RETENTION with no parameters, then it is set to DEFAULT (which is equivalent to AUTO).
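As a hedged sketch of the syntax (the table names and MAXSIZE value are hypothetical), these RETENTION settings might be specified as follows:

```sql
-- Hypothetical SecureFiles LOB retaining undo until the segment
-- reaches MAXSIZE; MAX must be paired with MAXSIZE in the storage clause.
CREATE TABLE doc_max
( id   NUMBER PRIMARY KEY,
  txt  CLOB
)
LOB (txt) STORE AS SECUREFILE
( RETENTION MAX
  STORAGE ( MAXSIZE 500M )
);

-- Hypothetical SecureFiles LOB with no undo retained for
-- consistent reads or flashback operations.
CREATE TABLE doc_none
( id   NUMBER PRIMARY KEY,
  txt  CLOB
)
LOB (txt) STORE AS SECUREFILE ( RETENTION NONE );
```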

CACHE Clause

The CREATE TABLE statement returned from DBMS_METADATA previously included the following for both SecureFiles and BasicFiles:

LOB ("TXT") STORE AS ... (...   NOCACHE ... )

The alternative to NOCACHE is CACHE or CACHE READS. This clause controls whether or not the LOBSEGMENT data is stored in the buffer cache. The default NOCACHE implies that every access will be a direct read from disk and every write/modification will likewise be a direct write to disk. CACHE READS allows LOB data that is read from disk to be buffered, but writes of LOB data will be done directly to disk. CACHE permits the caching of LOB data during both reads and writes.
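For example (using a hypothetical table), a LOB could be created with CACHE READS so that reads are buffered while writes still go directly to disk:

```sql
-- Hypothetical table: reads of TXT are buffered in the cache,
-- but writes of TXT bypass the cache and go directly to disk.
CREATE TABLE doc_cr
( id   NUMBER PRIMARY KEY,
  txt  CLOB
)
LOB (txt) STORE AS ( CACHE READS );
```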

In many cases, the default might not be what you want. If you have small- to medium-sized LOBs (e.g., you are using them to store descriptive fields of just a couple of kilobytes), caching them makes perfect sense. If they are not cached, when the user updates the description field the user must also wait for the I/O to write the data to disk (an I/O the size of a CHUNK will be performed and the user will wait for this I/O to complete). If you are performing a large load of many LOBs, you will have to wait for the I/O to complete on each row as they are loaded. It makes sense to enable caching on these LOBs. You may turn caching on and off easily:

ALTER TABLE tabname MODIFY LOB (lobname) ( CACHE );
ALTER TABLE tabname MODIFY LOB (lobname) ( NOCACHE );

to see the effect this may have on you. For a large initial load, it would make sense to enable caching of the LOBs and allow DBWR to write the LOB data out to disk in the background while your client application keeps loading more. For small- to medium-sized LOBs that are frequently accessed or modified, caching makes sense so the end user doesn’t have to wait for physical I/O to complete in real time. For a LOB that is 50MB in size, however, it probably does not make sense to have that in the cache.

Image Tip  Bear in mind that you can make excellent use of the Keep or Recycle pools (discussed in Chapter 4) here. Instead of caching the LOBSEGMENT data in the default cache with all of the regular data, you can use the Keep or Recycle pools to separate it out. In that fashion, you can achieve the goal of caching LOB data without affecting the caching of existing data in your system.

LOB STORAGE Clause

And lastly, the CREATE TABLE statement returned from DBMS_METADATA previously included the following for SecureFiles:

LOB ("TXT") STORE AS SECUREFILE (...
STORAGE(INITIAL 106496 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT))

And here is the corresponding output for BasicFiles:


LOB ("TXT") STORE AS BASICFILE ( ...
 STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT))

Both SecureFiles and BasicFiles have a full storage clause you can employ to control the physical storage characteristics. It should be noted that this storage clause applies to the LOBSEGMENT and the LOBINDEX equally—a setting for one is used for the other.

The management of the storage with SecureFiles is less complicated than that of BasicFiles. Recall that a SecureFiles LOB must be created within an ASSM-managed tablespace, and therefore the following attributes no longer apply: FREELISTS, FREELIST GROUPS, and FREEPOOLS.

For a BasicFiles LOB, the relevant settings would be FREELISTS and FREELIST GROUPS (when not using ASSM, as discussed in Chapter 10). The same rules apply to the LOBINDEX segment, as the LOBINDEX is managed the same as any other index segment. If you have highly concurrent modifications of LOBs, multiple FREELISTS on the index segment might be recommended.

As mentioned in the previous section, using the Keep or Recycle pools for LOB segments can be a useful technique to allow you to cache LOB data, without damaging your existing default buffer cache. Rather than having the LOBs age out block buffers from normal tables, you can set aside a dedicated piece of memory in the SGA just for these objects. The BUFFER_POOL clause could be used to achieve that.
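Assuming a Keep pool has been configured (via the DB_KEEP_CACHE_SIZE parameter), this could be sketched as follows, using the same placeholder table and column names as the earlier ALTER statements:

```sql
-- Hypothetical: cache the LOB data, but in the KEEP pool rather than
-- the default buffer cache (assumes DB_KEEP_CACHE_SIZE has been set).
ALTER TABLE tabname MODIFY LOB (lobname)
( CACHE STORAGE ( BUFFER_POOL KEEP ) );
```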

BFILEs

The last of the LOB types to talk about is the BFILE type. A BFILE type is simply a pointer to a file in the operating system. It is used to provide read-only access to these operating system files.

Image Note  The built-in package UTL_FILE provides read and write access to operating system files, too. It does not use the BFILE type, however.

When you use BFILEs, you will also be using an Oracle DIRECTORY object. The DIRECTORY object simply maps an operating system directory to a string or a name in the database (providing for portability; you refer to a string in your BFILEs, not an operating system–specific file-naming convention). So, as a quick example, let’s create a table with a BFILE column, create a DIRECTORY object, and insert a row referencing a file in the file system:

EODA@ORA12CR1> create table t
  2  ( id       int primary key,
  3    os_file  bfile
  4  )
  5  /
Table created.

EODA@ORA12CR1> create or replace directory my_dir as '/tmp/';
Directory created.

EODA@ORA12CR1> insert into t values ( 1, bfilename( 'MY_DIR', 'test.dmp' ) );
1 row created.

For this example, I’ll create a test.dmp file in the /tmp directory using the UNIX/Linux dd command:

dd if=/dev/zero of=/tmp/test.dmp bs=1056768 count=1

Now the BFILE can be treated as if it were a LOB—because it is. For example:

EODA@ORA12CR1> select dbms_lob.getlength(os_file) from t;

DBMS_LOB.GETLENGTH(OS_FILE)
---------------------------
                    1056768

We can see the file pointed to is about 1MB in size. Note that the use of MY_DIR in the INSERT statement was intentional. If we use mixed case or lowercase, we get the following:

EODA@ORA12CR1> update t set os_file = bfilename( 'my_dir', 'test.dmp' );
1 row updated.

EODA@ORA12CR1> select dbms_lob.getlength(os_file) from t;
select dbms_lob.getlength(os_file) from t
       *
ERROR at line 1:
ORA-22285: non-existent directory or file for GETLENGTH operation
ORA-06512: at "SYS.DBMS_LOB", line 850

This example points out that DIRECTORY objects in Oracle are identifiers, and identifiers are stored in uppercase by default. The BFILENAME built-in function accepts a string, and this string’s case must match the case of the DIRECTORY object exactly as stored in the data dictionary. So, we must either use uppercase in the BFILENAME function or use quoted identifiers when creating the DIRECTORY object:

EODA@ORA12CR1> create or replace directory "my_dir" as '/tmp/';

Directory created.

EODA@ORA12CR1> select dbms_lob.getlength(os_file) from t;

DBMS_LOB.GETLENGTH(OS_FILE)
---------------------------
                    1056768

I recommend against using quoted identifiers; rather, use the uppercase name in the BFILENAME call. Quoted identifiers are unusual and tend to create confusion downstream.

A BFILE (the pointer object in the database, not the actual binary file on disk) consumes a varying amount of space on disk, depending on the length of the DIRECTORY object name and the file name. In the preceding example, the resulting BFILE was about 35 bytes in length. In general, you’ll find the BFILE consumes approximately 20 bytes of overhead plus the length of the DIRECTORY object name plus the length of the file name itself.

Image Note  BFILE data is not read consistent as other LOB data is. Since the BFILE is managed outside of the database, whatever happens to be in the file when you dereference the BFILE is what you will get. So, repeated reads from the same BFILE may produce different results—unlike a LOB locator used against a CLOB, BLOB, or NCLOB.

ROWID/UROWID Types

The last datatypes to discuss are the ROWID and UROWID types. A ROWID is the address of a row in a table (remember from Chapter 10 that it takes a ROWID plus a tablename to uniquely identify a row in a database). Sufficient information is encoded in the ROWID to locate the row on disk, as well as identify the object the ROWID points to (the table and so on). ROWID’s close relative, UROWID, is a universal ROWID and is used for tables, such as IOTs and tables accessed via gateways to heterogeneous databases that do not have fixed ROWIDs. The UROWID is a representation of the primary key value of the row and hence will vary in size depending on the object it points to.

Every row in every table has either a ROWID or a UROWID associated with it. They are considered pseudo columns when retrieved from a table, meaning they are not actually stored with the row, but rather are a derived attribute of the row. A ROWID is generated based on the physical location of the row; it is not stored with it. A UROWID is generated based on the row’s primary key, so in a sense it is stored with the row, but not really, as the UROWID does not exist as a discrete column, but rather as a function of the existing columns.
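Because ROWID is a pseudo column, it is simply selected alongside the real columns (here using the table T from the BFILE example; the returned ROWID is a physical address and will differ from database to database, so no specific value is shown):

```sql
-- The ROWID pseudo column is queried like any other column, even though
-- it is derived from the row's physical location rather than stored.
SELECT rowid, id FROM t;
```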

It used to be that for rows with ROWIDs (the most common type of rows in Oracle; with the exception of rows in IOTs, all rows have ROWIDs), the ROWIDs were immutable. When a row was inserted, it would be associated with a ROWID, an address, and that ROWID would be associated with that row until it was deleted, that is, until it was physically removed from the database. Over time, this has become less true, as there are now operations that may cause a row’s ROWID to change, for example:

·     Updating the partition key of a row in a partitioned table such that the row must move from one partition to another

·     Using the FLASHBACK TABLE command to restore a database table to a prior point in time

·     MOVE operations and many partition operations, such as splitting or merging partitions

·     Using the ALTER TABLE SHRINK SPACE command to perform a segment shrink

Now, since ROWIDs can change over time (since they are no longer immutable), it is not recommended to physically store them as columns in database tables. That is, using a ROWID as the datatype of a database column is considered bad practice and should be avoided. The primary key of the row (which should be immutable) should be used instead, and referential integrity constraints can be put in place to ensure data integrity is preserved. You cannot do this with the ROWID types—you cannot create a foreign key from a child table to a parent table by ROWID, and you cannot enforce integrity across tables like that. You must use the primary key constraint.
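A minimal sketch of the recommended approach (table and column names here are hypothetical) enforces the parent/child relationship via the primary key, something that is simply not possible with a ROWID column:

```sql
-- Hypothetical parent/child tables: integrity is enforced through the
-- immutable primary key, not through a (mutable) ROWID.
CREATE TABLE parent_tab
( pk_id  NUMBER PRIMARY KEY,
  descr  VARCHAR2(30)
);

CREATE TABLE child_tab
( ck_id  NUMBER PRIMARY KEY,
  pk_id  NUMBER REFERENCES parent_tab(pk_id)  -- cannot be done with a ROWID
);
```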

Of what use is the ROWID type, then? It is still useful in applications that allow the end user to interact with the data—the ROWID, being a physical address of a row, is the fastest way to access a single row in any table. An application that reads data out of the database and presents it to the end user can use the ROWID upon attempting to update that row. The application must use the ROWID in combination with other fields or checksums (refer to Chapter 7 for further information on application locking). In this fashion, you can update the row in question with the least amount of work (e.g., no index lookup to find the row again) and ensure the row is the same row you read out in the first place by verifying the column values have not changed. So, a ROWID is useful in applications that employ optimistic locking.
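A minimal sketch of that optimistic-locking pattern (the table, column, and bind variable names are hypothetical) might look like this:

```sql
-- Hypothetical optimistic update: access the row by ROWID for speed,
-- but also verify the previously read column values so we do not update
-- a row that has changed (or whose ROWID has changed) in the meantime.
UPDATE emp
   SET sal = :new_sal
 WHERE rowid = :rid          -- fastest single-row access path
   AND ename = :old_ename    -- verify the row is unchanged since it was read
   AND sal   = :old_sal;
-- If zero rows are updated, the row was modified (or moved) after it was
-- read, and the application should re-read the data before retrying.
```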

Summary

In this chapter, we’ve examined many basic datatypes provided by Oracle; we’ve seen how they are physically stored and what options are available with each. We started with character strings, the most basic of types, and looked into considerations surrounding multibyte characters and raw binary data. We then discussed extended datatypes (available in Oracle 12c and above) and how this feature allows you to define VARCHAR2, NVARCHAR2, and RAW datatypes to be as large as 32,767 bytes. Next, we studied the numeric types, including the very precise Oracle NUMBER type and the floating-point types provided with Oracle 10g and later.

We also gave consideration to the legacy LONG and LONG RAW types, concentrating on how you might work around their existence, as the functionality provided by these types falls far short of that provided by the LOB types. Next, we looked at the datatypes capable of storing dates and times. We covered the basics of date arithmetic, a perplexing issue until you’ve seen it demonstrated. Lastly, in the section on dates and timestamps, we looked at the INTERVAL type and how best to use it.

The most detailed part of the chapter from a physical storage perspective was the LOB section. The LOB type is frequently misunderstood by developers and DBAs alike, so the bulk of the section was spent looking at how they are physically implemented as well as certain performance considerations and the differences between SecureFiles and BasicFiles.

The last datatype we looked at was the ROWID/UROWID type. For what now should be obvious reasons, you should not use this datatype as a database column, since ROWIDs are not immutable and no integrity constraints could enforce the parent/child relationship. Rather, you want to store primary keys if you need to point to another row.