In relational database management system (RDBMS) terminology, Access Path refers to the path chosen by the system to retrieve data after a structured query language (SQL) request is executed.
A query may request at least one variable to be filled up with one value or more. A query may look like this:
SELECT family_name FROM users WHERE family_name = ‘Smith’
The query tells the computer to select all ‘Smith’ family names which may number to a few thousand from among tens of thousands within the database table. And so the database management system will have to estimate a filter factor using the determined value for the variable and an access path will have to be determined to get to the data.
Access path selection can make a tremendous impact on the overall performance of the system. The query mentioned above is a very simple query with one variable being matched to values from one table only. A more complex query may involve looking for many variables which can be matched to many different records on separate tables. Some of these variables even have complex conditions such as greater than or less than some value of integers. Many relational database makers have their own algorithms to optimize choosing of access paths while minimizing total access cost.
Optimization of access path selection maybe gauged using cost formulas with I/O and CPU utilization weight usually considered. Generally, query optimizers evaluate the available paths to data retrieval and estimate the cost in executing the statements using the determined paths or a combination of these paths.
In choosing an access path, the RDBMS optimizer examines the WHERE clause and the FROM clause. It then lays out possible plans of execution using the determined paths and, with the use of statistics for the columns, index and tables accessible to the statement, the optimizer then estimates the cost of executing the plan.
Access paths selection for joins where data is taken from than one table is basically done using the nested loop and merging scan techniques. Because joins are more complex, there are some other considerations for determining access path selections for them.
In general, the most common ways of paths selections include the following:
Full Table Scan – The RDBMS software scans all rows from the table and filters out those that do not mach the criteria in the query.
Row ID Scan – This is the fastest retrieval method for a single row. The row identification (rowed) give the exact location of the row in the specified database.
Index Scan – With this method, the RDBMS retrieves a row of records by traversing the index using the indexed column values required by the query statement. There are many types of index scans which many include Index Unique Scans, Index Range Scans, Index Skip Scans, Full Scans, Fast Full Index Scans, Index Joins and Bitmap Indexes.
Cluster Access Scan – This is used to retrieve all rows that have same cluster key value. The rows are coming from a table stored in an indexed cluster.
Hash Access Scan – This method locates rows in a hush cluster basing on some hash value. All rows containing the same hash values are stored within the same data block.
A new system present for optimizing access path selection is by defining an index and segment scan, the two types of scans which are available for SQL statements. Before returning the tuples, simple predicates called search arguments (SARGS) are added to the indexes. There are many more techniques under research by RDBMS vendors like Microsoft (SQL), Oracle, MySQL, PostgreSQL and many others.