SQL Data Types

SQL Data Types Overview

In SQL, as in many other programming languages, every piece of data has a particular type. For example, 11 is an integer, and “Wall Street” is a text string. The SQL standard defines many data types to accommodate almost any type of data.

Some cases arise where the SQL standard does not adequately support certain types of data, however, and in these cases, the individual implementations of SQL extend data type definitions to include certain custom data types.

In this article, the standard SQL data types and NULL values will be explained, and some of the more popular custom data types will also be covered. In addition, type casting and coercion with the CAST function will be discussed.

Standard Data Types

The standard SQL data types can be broken up into roughly four categories: Boolean and binary, character, numeric, and date and time.

Boolean & Binary

    boolean

Also known as bool, this type allows a value of either true or false. SQL implementations tend to allow different sets of values for the data, including true and false, t and f, yes and no, and 1 and 0, respectively, depending on the database system.

    bit(n)

This represents a bit string, or a string of zeros and ones, which could look something like 01101001. The maximum number of bits in the string is specified by n. Many implementations of this data type will pad a bit string with zeros on the right-hand side if a bit string has fewer than n bits. This may not be desirable behavior, depending on the application using the bit string, so it is prudent for developers to remember this. An example of this behavior is shown in Figure 1.

In Figure 1, we see the decimal number 3850, represented by binary 111100001010, cast to the decimal number 61600, represented in 16 bits by binary 1111000010100000. For purposes of left-hand bit shifting, the padding presents no problem, but it will cause errors if used for numeric computation or right-hand bit shifting.

    bit varying(n)

Also called varbit(n), this data type allows a variable-length bit string, with a length of up to n bits. In some database systems, varbit can be specified without a maximum length, and that field will accept bit strings of (ideally) unlimited length. These bit strings are not padded.

{qbapagebreak title=SQL Character and Numeric Data Type}

Character

    character(n)

This and its alias, char(n), allow text strings of length n. Strings of less than length n will be padded with spaces on the right-hand side up to length n. An example of this behavior is shown in Figure 2.

    character varying(n)

Also known as varchar(n), this data type allows text strings of any length up to n characters. These strings are not padded.

Some database systems implement this as simply varchar, which does not require a maximum length to be specified. A varchar field without a specified maximum length will accept text strings of (ideally) unlimited length. This is similar to the non-standard text data type, which is discussed later in this article.

Numeric

smallint

Also called int2, this data type represents a signed, two-byte integer. These may range in value from -32,768 to 32,767.

integer

This and its aliases, int and int4, represent signed, four-byte integers. These may range in value from -2,147,483,648 to 2,147,483,647 and are the most commonly used integer data type.

real

Also called float4, this represents a four-byte floating point number. This has less precision than a double precision value and can represent, approximately, -3.40e38 to -1.18e-38, zero, and 1.18e-38 to 3.40e38.

double precision

Also known as float8 and simply float, these eight-byte floating point numbers are much more precise than those represented by real. These are also more computationally expensive to use in calculations, and they require more storage than real values. Most applications also do not require the precision of double precision values. These values can represent, approximately, -1.79e308 to -2.23e-308, zero, and 2.23e-308 to 1.79e308.

numeric(p, s)

This value and its synonyms, decimal(p, s) and dec(p, s), represent exact integer or decimal numbers with arbitrary precision p and scale s. To be clear, p is the maximum number of digits that can be stored in the entire value, and s is the maximum number of digits that can be stored to the right of the decimal point. Thus, 0 ≤ s ≤ p. These data types are commonly used to precisely store monetary values.

{qbapagebreak title=SQL Date & Time Data Type}

Date & Time

date

This represents a calendar date and specifies a year, month, and day. Its specific format (i.e. month/day/year, day/month/year, year-month-day, etc.) depends on the locale of the database. Some database systems allow the administrator to set a specified “date style” that will be used in lieu of the locale’s date format, if such separate behavior is desired.

time

This represents the time of day and does not include a date. Like date, the display format of time fields will depend on the locale of the database.

This data type may also be specified as time with time zone, which includes a time zone. Time zones can be specified by their three-letter code, like UTC or EST, or by the difference from UTC, like -05. 3:00 PM, Eastern Standard Time (United States), for example, could be represented as 15:00:00-05.

timestamp

The timestamp type combines the date and time types and may also include a time zone. An example value might be 2007-12-31 23:59:59 UTC.

Null Values

The NULL value is not actually a data type, but it should be mentioned here for completeness. NULL can be accepted as a value for a column of any data type, so long as that column is not constrained to be NOT NULL.

The NULL value is used by the database to signify that a certain field has no value, and database applications often use it when a field is considered optional. In conditional SQL statements, the IS NULL clause can be used to determine whether a field contains a NULL value.

NULL has an interesting and sometimes counter-intuitive effect on text strings in certain database systems. In PostgreSQL, for example, a string concatenation containing a NULL value will result in a NULL string, even if other parts of the concatenation are not NULL values. This is illustrated in Figure 3.

In PostgreSQL, this behavior can be overcome by using the COALESCE() function, whose sole purpose is to return the first non-NULL parameter. This is shown in Figure 4.

COALESCE() is especially useful in database application programming, when it is not known whether a value will be NULL prior to a concatenation or another use of a possibly-NULL value in a string manipulation process.

{qbapagebreak title=Implementation-Specific Data Types}

Implementation-Specific Data Types

As mentioned above, implementations of SQL tend to have their own custom data types. These are typically provided for “ease of use” purposes, but some are altogether quite different from the standard SQL data types.

“Ease of Use” Data Types

An example of a non-standard “ease of use” data type is PostgreSQL’s text data type, which may contain a text string of any length, without having to worry too much about upper limits on string size. Unlike character(n) and character varying(n), no maximum length n may be specified. The true maximum length of a text field is limited only by the resources, like memory and disk space, available to the database server. This is a very common extension to the standard data types.

Other “ease of use” data types commonly implemented in database systems define objects such as eight-byte (64-bit) integers, network addresses, automatically-incrementing serial numbers, and currency types.

Other Non-Standard Data Types

Some of the less common non-standard data types are exemplified by PostgreSQL’s “geometric types” data types. These include point, line, circle, and others, which define geometric shapes. PostgreSQL also defines functions that are able to scale, translate, rotate, and intersect these objects. Figure 5 shows an example of one of these types.

In Figure 5, we test two line segment (lseg) intersections with the ?#, or intersection, operator.

In the first query, two temporary line segments are created: one from (0, 10) to (0, -10) and the other from (-10, 0) to (10, 0) on the plane. Intuitively, we know these two line segments run along the Cartesian x and y axes, and they intersect at (0, 0). The ?# operator returns t, or true, meaning the line segments do, in fact, intersect.

The second query attempts to intersect a line segment with endpoints (0, 10) and (0, -10) with another line segment with endpoints (-10, 11) and (10, 11). We can visualize this example and determine the lines do not intersect. At their closest proximity, they are still a distance of 1 apart. The query returns f, or false, and confirms this.

PostgreSQL and other database systems provide many non-standard data types and the operators and functions used to manipulate them. A comprehensive list of these is not practical to include in an article such as this. The best way to research these non-standard data types is to reference each database system’s manual.

Type Casting

When a value is used as something other than its normal data type, it is usually type cast to the new data type. This is commonly useful when, for example, an integer is being extracted from a string or a date is being incorporated into a string.

Type casts occur either implicitly or explicitly. An implicit cast is one that occurs when the method of converting a value is unambiguous. For example, the integer to varchar cast is usually implemented as an implicit cast because there is only one way to convert a number to a variable character string. Explicit casts, on the other hand, do not happen automatically and must be specified each time they occur.

In many programming languages, explicit casting is required when a loss of precision could result. That is, if the value 99.99 were being assigned to an integer variable, many programming languages would not allow the operation to take place without the programmer explicitly casting 99.99 to an integer type. SQL does not care about precision in casting, and this is something of which database developers must be acutely aware, especially if their programming background is in a language that requires explicit casts for conversions that would result in a loss of precision. Figure 6 shows an example of an implicit cast and the loss of precision that results from rounding a decimal number.

The SQL standard determines which data types can be implicitly cast to other data types, but the implementation of these casts varies among database systems. An example of a CAST is shown in Figure 7.

Type casting is an important part of any programming language, and learning how the SQL CAST operator interacts with each common SQL data type is a vital step to take toward becoming an expert SQL programmer.

Editorial Team at Geekinterview is a team of HR and Career Advice members led by Chandra Vennapoosa.

Editorial Team – who has written posts on Online Learning.


Pin It