PDW–The Architecture

To talk about PDW, you first need to know what MPP (Massively Parallel Processing) is.


MPP (Massively Parallel Processing, or Massively Parallel Processor) is a multiprocessing architecture that uses many processors and a different programming paradigm from the common symmetric multiprocessing (SMP) found in today’s computer systems.
Self-Contained MPP Subsystems
Each CPU is a subsystem with its own memory and copy of the operating system and application, and each subsystem communicates with the others via a high-speed interconnect. In order to use MPP effectively, an information processing problem must be breakable into pieces that can all be solved simultaneously. In scientific environments, certain simulations and mathematical problems can be split apart and each part processed at the same time. In the business world, a parallel data query (PDQ) divides a large database into pieces. For example, 26 CPUs could be used to perform a sequential search, each one searching one letter of the alphabet. To take advantage of more CPUs, the data have to be broken further into more parallel groups.
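The alphabet example above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any MPP product is implemented: the function names are made up for this sketch, the "CPUs" are ordinary worker threads in one process, and the data is assumed to be a list of non-empty names.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import string


def partition_by_first_letter(names):
    """Split the data set into up to 26 independent pieces, one per letter."""
    partitions = defaultdict(list)
    for name in names:
        partitions[name[0].upper()].append(name)
    return partitions


def search_partition(rows, target):
    """Sequential scan of one piece; every worker runs this on its own data."""
    return [row for row in rows if row == target]


def parallel_search(names, target, workers=26):
    """Search all 26 pieces at the same time and merge the hits."""
    partitions = partition_by_first_letter(names)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(search_partition, partitions[letter], target)
            for letter in string.ascii_uppercase
        ]
        hits = []
        for future in futures:
            hits.extend(future.result())
    return hits


if __name__ == "__main__":
    data = ["Adams", "Baker", "Adams", "Zed"]
    print(parallel_search(data, "Adams"))
```

With 26 workers, adding a 27th would not help until the data is split into more than 26 pieces, which is the point the paragraph makes about breaking the data into more parallel groups.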
In contrast, adding CPUs in an SMP system increases performance in a more general manner. Applications that support parallel operations (multithreading) immediately take advantage of SMP, but performance gains are available to all applications simply because there are more processors. For example, four CPUs can be running four different applications.

Why?

    Customers need to run more queries against a large database and retrieve more data faster, and a single point of access is not enough for that.

    The system must keep scaling close to linearly as more data is added.


The SQL Server PDW Architecture

A SQL Server PDW appliance consists of at least two racks: a Control Rack and a Data Rack.


The Control Rack consists of six nodes: two Management Nodes, two Control Nodes, the Landing Zone, and the Backup Node. Storage Area Networks (SANs) are also included for the Control Node, Landing Zone, and Backup Node. Additionally, the Control Rack ships with the dual InfiniBand, Ethernet, and Fiber switches needed for the rack.

•The Management Node:

  • It manages the data nodes and failover instances, handles newly added data nodes, and monitors rack status.

•The Control Node:

  • It is the brain of the PDW appliance: it sends queries to the compute nodes inside the Data Rack, consolidates the results, and returns them to the application.
  • It also manages which compute nodes host the insert operations.
  • So it really is the brain.

•The Landing Zone:

  • It is a staging area with its own SAN storage (around 1.8 TB) that holds data before it is loaded.
  • It is the ETL layer used to load data into the appliance when required.

•The Backup and Restore Zone:

  • This node is responsible for managing the backup and restore operations of the appliance.
  • Sending the data to a disaster recovery site is part of its responsibilities.

Because the appliance is designed to work out of the box, it includes its own Active Directory, which is housed within the Management Node. There are several reasons why PDW needs Active Directory, one of which is that we use Microsoft Clustering Services (MCS) within the appliance, and MCS requires domain accounts for certain services to run. Additionally, the Management Node includes High Performance Computing (HPC), which is used during the initial install and for ease of management of the nodes within the appliance.

The Control Node is where user requests for data enter and exit the appliance. On the control nodes, queries are parsed and then sent to the compute nodes for processing. Additionally, the metadata of the appliance and of the distributed databases is located here. Essentially, the control node is the brains of the operation. No persisted user data is located here; it all lives on the compute nodes within the data racks. User data can be temporarily aggregated on the control node during query processing and is dropped after it has been sent back to the client.
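The request flow described here (fan a query out to the compute nodes, aggregate temporarily on the control node, return the result) is a scatter-gather pattern. The sketch below illustrates it for an average; it is a simplified model, not PDW code. The node names, the in-memory rows, and both function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the rows persisted on each compute node.
COMPUTE_NODES = {
    "node1": [120.0, 80.0],
    "node2": [50.0],
    "node3": [200.0, 25.0, 25.0],
}


def run_on_compute_node(rows):
    """Each compute node returns a partial aggregate over its own rows only."""
    return sum(rows), len(rows)


def control_node_average(nodes):
    """Fan the request out, then combine the partial results.

    The partial results are held only for the duration of the call,
    mirroring the temporary aggregation on the control node.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partials = list(pool.map(run_on_compute_node, nodes.values()))
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count if count else None


if __name__ == "__main__":
    print(control_node_average(COMPUTE_NODES))
```

Note that each node ships back a (sum, count) pair rather than its own average, because averaging the per-node averages would give the wrong answer when nodes hold different numbers of rows.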

The Landing Zone is essentially a large file server with plenty of SAN storage to provide a staging ground for loading data into the appliance. You will be able to load data either through the command line with DWLoader or through SSIS, which now has a connector for PDW. The Backup Zone is another large file server that is designed to hold backups of the distributed databases on the appliance. Compute nodes will be able to back up to the Backup Node in parallel via the high-speed InfiniBand connections that connect the nodes. From the Backup Node, organizations will be able to offload their backups through their normal procedures. Backups of a PDW database can only be restored to another PDW appliance with at least as many compute nodes as the database had when it was backed up.

If the Control Nodes in the Control Rack are considered the brains of the operation, the Compute Nodes in the Data Rack are certainly the brawn. It is here within the Data Rack that all user data is stored and processed during query execution. Each Data Rack has between 8 and 10 compute nodes. Additionally, the Data Rack uses Microsoft Failover Clustering to gain high availability. This is accomplished by having a spare node within the rack that acts as a passive node within the cluster. Essentially, each compute node has its affinity set to fail over to the spare node in the event of a failure on the active Compute Node.
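The failover arrangement can be pictured with a small model in which every compute workload points at the same passive spare. This is a toy sketch, not how Microsoft Failover Clustering is configured. The `fail_over` function and the node names are made up, and the rule that a second failure cannot be absorbed while the spare is busy is an assumption of this sketch that follows from having one spare per rack.

```python
def fail_over(hosts, failed_node, spare="spare"):
    """Return a new workload -> host mapping after failed_node goes down.

    hosts maps each compute workload to the machine currently running it.
    Assumption of this sketch: one spare per rack, so it can absorb one failure.
    """
    if spare in hosts.values():
        raise RuntimeError("spare node is already in use")
    return {
        workload: (spare if host == failed_node else host)
        for workload, host in hosts.items()
    }


if __name__ == "__main__":
    rack = {"compute1": "compute1", "compute2": "compute2", "compute3": "compute3"}
    rack = fail_over(rack, "compute2")
    print(rack)  # compute2's workload now runs on the spare
```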

Each compute node runs an instance of SQL Server and owns its own dedicated storage array. User data is stored on the dedicated Storage Area Network. The local disks on the Compute Node are used for TempDB. The user data will be stored in one of two configurations: replicated tables or distributed tables. A replicated table is duplicated in whole on each Compute Node in the appliance. When you think replicated tables in PDW, think small tables, usually dimension tables. Distributed tables, on the other hand, are hash distributed across multiple nodes. This horizontal partitioning breaks the table up into 8 partitions per compute node. Thus, on a PDW appliance with eight compute nodes, a distributed table will have 64 physical distributions. Each of these distributions (essentially a table in and of itself) has dedicated CPU and disk, and that is the essence of Massively Parallel Processing in PDW. As a rough estimate, if you have a 1.6 TB fact table that you distribute across an eight-node data rack, you would have 64 individual 25 GB distributions, each with dedicated CPU and disk space. This is how the appliance can break down a large table into manageable sizes to find the data needed to respond to queries. I’ll speak to this in more detail in the future.
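The arithmetic and the idea of hash distribution can be sketched as follows. This is a minimal illustration under stated assumptions: `zlib.crc32` is only a stand-in hash (the text does not say which hash function PDW uses), and the function names are invented for this example. The figure of 8 distributions per compute node comes from the paragraph above.

```python
import zlib

DISTRIBUTIONS_PER_NODE = 8  # figure taken from the text


def distribution_for(key, node_count):
    """Map a distribution-column value to (node index, local distribution index).

    crc32 is a stand-in hash for illustration, not PDW's actual function.
    """
    total = node_count * DISTRIBUTIONS_PER_NODE
    bucket = zlib.crc32(str(key).encode("utf-8")) % total
    return bucket // DISTRIBUTIONS_PER_NODE, bucket % DISTRIBUTIONS_PER_NODE


def distribution_size_gb(table_size_gb, node_count):
    """Average size of one distribution, assuming rows hash evenly."""
    return table_size_gb / (node_count * DISTRIBUTIONS_PER_NODE)


if __name__ == "__main__":
    # 1.6 TB fact table (taken as 1600 GB) on an eight-node rack.
    print(8 * DISTRIBUTIONS_PER_NODE)     # 64 distributions
    print(distribution_size_gb(1600, 8))  # 25.0 GB each
    print(distribution_for("customer-42", 8))
```

The even 25 GB split only holds if the distribution column hashes evenly; a column with a few very common values would leave some distributions much larger than others.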

If your data set is too large to store on a single data rack, you can add another. By adding an additional data rack, you not only expand your storage but also significantly increase your processing power, and the data will be spread across additional distributions. The current target size of an appliance is up to forty nodes, which would be four or five data racks, depending on the manufacturer. Larger appliance sizes are expected in the future.