Parallel architectures play a significant role in enhancing the performance of databases by enabling them
to handle large volumes of data and high transaction rates. Let’s examine two common parallel
architectures: shared-nothing and shared-disk.
Shared-Nothing Architecture
Description:
i. Decentralization: Each node in the system has its own private memory and disk storage. Nodes
communicate with each other over a network.
ii. Scalability: This architecture scales well: adding nodes increases both storage capacity and
processing power, and because nodes share neither memory nor disk, the network is the only
resource they contend for.
Advantages:
i. High Scalability: Easily scales out by adding more nodes.
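ii. Fault Tolerance: A failure is isolated to one node and the others keep running, although the
failed node's data remains unavailable unless it is replicated on another node.
iii. Performance: Each node works on its own data with its own memory and disk, so nodes do not
compete for I/O or memory bandwidth.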
Disadvantages:
i. Complexity: Data must be partitioned across nodes, and queries that join or aggregate across
partitions must move data between nodes, which complicates both administration and query
processing.
Use Cases:
i. Data Warehousing: Massively parallel systems such as Teradata and Amazon Redshift use a
shared-nothing architecture for efficient data storage and querying.
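As an illustration of how data distribution and query processing might work in a shared-nothing system, the following minimal sketch assigns rows to nodes by hashing a partitioning key, lets each node aggregate only its own rows, and then merges the partial results. The names node_for_key and partition_rows, the customer_id column, and the three-node cluster are all hypothetical; real systems add replication, rebalancing, and network transfer that are not modeled here.

```python
import hashlib
from collections import defaultdict


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a partitioning key to a node index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition_rows(rows, key_field, num_nodes):
    """Group rows by the (hypothetical) node that would own them."""
    placement = defaultdict(list)
    for row in rows:
        owner = node_for_key(str(row[key_field]), num_nodes)
        placement[owner].append(row)
    return dict(placement)


# Illustrative data: eight rows spread over three nodes.
rows = [{"customer_id": i, "amount": 10 * i} for i in range(1, 9)]
placement = partition_rows(rows, "customer_id", num_nodes=3)

# Each node computes a partial sum over its own rows only...
partial_sums = {
    node: sum(r["amount"] for r in owned) for node, owned in placement.items()
}
# ...and a coordinator merges the partial results.
total = sum(partial_sums.values())
print(partial_sums, total)
```

Because every row has exactly one owner, the merged total equals the sum over the whole table regardless of how the hash spreads the rows.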
Shared-Disk Architecture
Description:
i. Centralized Storage: All nodes share access to common disk storage, typically a storage area
network (SAN), but each node has its own private memory. Nodes reach the shared storage over a
high-speed network.
Advantages:
i. Simpler Data Management: Because all data lives in one place, no partitioning or rebalancing
across nodes is needed, so managing and updating it is more straightforward.
ii. High Availability: If one node fails, others can continue to access the shared storage.
Disadvantages:
i. Scalability Limits: Can face bottlenecks at the shared disk, limiting scalability.
ii. Resource Contention: Potential for contention and latency issues due to shared access to the
disk.
Use Cases:
i. Clustered Databases: Oracle RAC (Real Application Clusters) uses a shared-disk architecture to
provide high availability and load balancing.
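The scalability limit of a shared disk can be illustrated with a deliberately simple model: the cluster can issue no more I/O than the storage can serve, so throughput stops growing once the disk is saturated. The function name cluster_throughput and the figures (5,000 I/O operations per second per node, a 30,000 limit on the shared storage) are assumptions chosen for illustration, not measurements of any real system.

```python
def cluster_throughput(num_nodes: int, per_node_iops: int, disk_iops_limit: int) -> int:
    """I/O operations per second the cluster sustains in this simplified model.

    Demand grows linearly with the number of nodes, but the shared disk
    caps what can actually be served.
    """
    return min(num_nodes * per_node_iops, disk_iops_limit)


for n in (1, 2, 4, 8, 16):
    print(n, cluster_throughput(n, per_node_iops=5_000, disk_iops_limit=30_000))
# 1 5000
# 2 10000
# 4 20000
# 8 30000
# 16 30000
```

In this model, going from 8 to 16 nodes adds no throughput because the shared storage is already the bottleneck.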
Examples of Parallel Execution Problems
- Skewed Data Distribution:
o Issue: Uneven distribution of data across nodes can lead to some nodes being
overloaded while others are underutilized.
o Impact: Causes performance bottlenecks and inefficient resource utilization.
- Network Latency:
o Issue: High network latency can impact the performance of parallel queries, especially in
shared-nothing architectures where nodes need to communicate frequently.
o Impact: Increases query response times and reduces overall system efficiency.
- Resource Contention:
o Issue: In shared-disk architectures, multiple nodes accessing the same disk can lead to
contention for disk I/O resources.
o Impact: Causes delays and can become a performance bottleneck.
- Synchronization Overhead:
o Issue: Coordinating and synchronizing parallel tasks can introduce overhead, particularly
in complex queries that require data from multiple nodes.
o Impact: Reduces the performance gains from parallelism.
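Two of the problems above, skewed data distribution and synchronization overhead, can be quantified with a rough sketch. It assumes that a node's run time is proportional to the number of rows it holds and that a parallel query finishes only when the slowest node does. The second function applies Amdahl's law, treating coordination as work that cannot be parallelized. The function names and the row counts are hypothetical.

```python
def skew_factor(rows_per_node):
    """Ratio of the most loaded node to the average load (1.0 means balanced)."""
    mean = sum(rows_per_node) / len(rows_per_node)
    return max(rows_per_node) / mean


def speedup_with_skew(rows_per_node):
    """Speedup over a single node when the slowest node sets the finish time."""
    return sum(rows_per_node) / max(rows_per_node)


def amdahl_speedup(serial_fraction, num_nodes):
    """Amdahl's law: speedup when a fixed fraction of the work stays serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / num_nodes)


balanced = [250, 250, 250, 250]
skewed = [700, 100, 100, 100]

print(skew_factor(balanced), speedup_with_skew(balanced))  # 1.0 4.0
print(round(skew_factor(skewed), 2), round(speedup_with_skew(skewed), 2))  # 2.8 1.43
print(round(amdahl_speedup(0.1, 4), 2))  # 3.08
```

With the same 1,000 rows on four nodes, the skewed layout yields a speedup of only about 1.43 instead of 4, and even a perfectly balanced layout reaches only about 3.08 if 10% of the work is serial coordination.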