The INGRES algorithm refers to the method developed for query optimization and execution in the INGRES Database Management System (DBMS). INGRES, which stands for Interactive Graphics and Retrieval System, was an early relational DBMS developed at the University of California, Berkeley, in the 1970s and 1980s. It introduced several innovations that later influenced the design of modern relational databases.
The INGRES algorithm specifically refers to the query optimization process used to determine the most efficient way to execute SQL queries. This process involves selecting an execution plan that minimizes the total cost of executing a query, considering factors like the number of disk accesses, CPU time, and communication overhead.
Key Concepts of INGRES Query Optimization Algorithm
- Relational Algebra:
- The INGRES algorithm is based on relational algebra, which consists of operators like selection (σ), projection (π), join (⨝), union (∪), and difference (−). Each operator corresponds to a specific type of query operation, such as filtering, joining tables, or selecting columns.
- Cost Model:
- The cost model in INGRES uses estimations for I/O costs, CPU time, and communication overhead. The goal of the query optimization process is to find the execution plan with the lowest overall cost, typically measured in terms of disk reads, CPU time, and other resource usage.
- Transformation Rules:
- The INGRES algorithm applies transformation rules to rewrite the query into an equivalent but potentially more efficient form. These transformations include:
- Pushdown selections: Moving selection operations (filters) closer to the base tables to reduce the number of rows processed in subsequent operations.
- Join reorderings: Changing the order of joins to optimize the execution.
- Projection pushdown: Moving projection operations (column selections) earlier in the query plan to reduce the number of columns processed.
- The INGRES algorithm applies transformation rules to rewrite the query into an equivalent but potentially more efficient form. These transformations include:
- Join Ordering:
- Join ordering is one of the most critical aspects of query optimization. The INGRES algorithm considers different ways to order the joins, aiming to minimize the size of intermediate results. For example, a query with multiple joins may be optimized by choosing an order in which the joins generate smaller intermediate results, reducing the amount of data that needs to be processed in subsequent joins.
- Access Path Selection:
- Access paths refer to the ways the DBMS retrieves data from storage, such as through sequential scans, index scans, or hash-based joins. The INGRES algorithm chooses the most efficient access paths based on available indexes, the number of rows involved, and the query’s specific structure.
- Cost-based Optimization:
- The INGRES query optimizer uses a cost-based approach to evaluate different execution plans. This involves estimating the costs of different query plans based on I/O operations, CPU cycles, and other system metrics. The optimizer chooses the plan that minimizes total cost. It considers the available indexes, the size of the tables, and the query’s predicates (conditions).
- Dynamic Query Evaluation:
- The INGRES system supports dynamic query evaluation, meaning that query execution plans are chosen at runtime based on the available statistics about the database. This allows the system to adapt to changes in data distribution and system load.
INGRES Query Optimization Phases
The process of optimizing a query in the INGRES algorithm generally involves the following steps:
- Parsing:
- The SQL query is parsed to generate a parse tree or query tree, which represents the logical structure of the query in terms of relational operators (like selection, join, etc.).
- Logical Query Plan Generation:
- From the parse tree, the system generates a logical query plan based on relational algebra. This logical plan is independent of any physical storage considerations but defines the sequence of relational operators to execute the query.
- Cost Estimation:
- The system estimates the cost of different execution strategies (logical query plans). This estimation involves calculating the expected I/O operations, CPU usage, and other system resources based on available indexes, data sizes, and statistics.
- Physical Plan Generation:
- After estimating costs, the optimizer transforms the logical query plan into a physical query plan, which includes the actual access paths (e.g., which indexes to use, how to join tables) and the order of operations.
- Plan Selection:
- The optimizer selects the optimal query plan with the lowest estimated cost. This is based on the cost model used by the optimizer.
- Execution:
- The chosen physical plan is executed, and the results are returned to the user.
Example of Query Optimization in INGRES
Consider the following query:
SELECT name, age FROM employees WHERE department = 'HR' AND age > 30;
The optimizer would evaluate the following possibilities:
- Apply the selection (filter) on the
department
andage
fields as early as possible, potentially reducing the number of rows before selecting the desired columns. - Choose the best access path for the
employees
table — whether to scan it sequentially or use an index if one exists on thedepartment
orage
columns. - Apply projection (column selection) only after filtering, so that fewer columns are processed in the intermediate results.
- If this query were part of a larger query with multiple joins, the optimizer would also consider the best join order to minimize the intermediate result size.
The INGRES algorithm is an early but influential query optimization method in relational databases. It helps the database management system determine the most efficient execution plan for a given query by using techniques like relational algebra, cost estimation, join ordering, and access path selection. By applying these techniques, the INGRES query optimizer can significantly improve query performance, especially for complex or large queries.