Designing and developing parallel programs has characteristically been a very manual process. The programmer is typically responsible for both identifying and actually implementing parallelism. Very often, manually developing parallel codes is a time-consuming, complex, error-prone and iterative process. For a number of years now, various tools have been available to assist the programmer with converting serial programs into parallel programs. The most common type of tool used to automatically parallelize a serial program is a parallelizing compiler or pre-processor.
1. Understand the Problem and the Program
The first step in developing parallel software is to understand the problem that you wish to solve in parallel. If you are starting with a serial program, this also necessitates understanding the existing code. Before spending time trying to develop a parallel solution for a problem, determine whether or not the problem is one that can actually be parallelized.
Let's consider the following two problems.
- Addition of two 10x10 matrices.
- Calculation of the Fibonacci series (1,1,2,3,5,8,13,21,...).
The first problem can be solved in parallel. Each element of the resultant matrix is computed independently of all the others; since the calculations are independent, the matrix addition is a parallelizable problem.
The second problem is non-parallelizable, because the calculation of the Fibonacci sequence as shown entails dependent calculations rather than independent ones. The (k + 2)th value uses both the (k + 1)th and the kth values. These three terms cannot be calculated independently and therefore not in parallel.
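To make the contrast concrete, here is a minimal C sketch (using an OpenMP pragma, since OpenMP appears among the frameworks listed later). The function names and the fixed 10x10 size are illustrative assumptions, not taken from any particular program.

```c
#define N 10

/* Parallelizable: each element of C depends only on the matching elements
   of A and B, so every iteration is independent of the others.
   The pragma assumes an OpenMP-capable compiler; without OpenMP support
   it is simply ignored and the loop runs serially. */
void matrix_add(const double A[N][N], const double B[N][N], double C[N][N])
{
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            C[i][j] = A[i][j] + B[i][j];
}

/* Not parallelizable as written: each new term needs the two terms before
   it, so the loop carries a dependence from one iteration to the next. */
void fibonacci(long f[], int n)
{
    f[0] = 1;
    f[1] = 1;
    for (int k = 0; k + 2 < n; k++)
        f[k + 2] = f[k + 1] + f[k];   /* depends on f[k + 1] and f[k] */
}
```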
2. Partitioning
One of the first steps in designing a parallel program is to break the problem into discrete "chunks" of work that can be distributed to multiple tasks. This is known as decomposition or partitioning.
There are two basic ways to partition computational work among parallel tasks: domain decomposition and functional decomposition.
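As a rough illustration (a hedged C/OpenMP sketch; the array size and the operations performed are invented for this example), domain decomposition splits the data among tasks, while functional decomposition splits the kinds of work.

```c
#define N 1000000

double a[N], b[N];

/* Domain decomposition: the data are partitioned and every thread applies
   the same operation to its own portion (OpenMP divides the iterations). */
void scale_all(double factor)
{
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] *= factor;
}

/* Functional decomposition: the work is split by the kind of task rather
   than by the data; the two sections below do different things and may
   run on different threads at the same time. */
void two_tasks(void)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        for (int i = 0; i < N; i++) a[i] = 0.5 * i;        /* task 1: build a */

        #pragma omp section
        for (int i = 0; i < N; i++) b[i] = 0.25 * i * i;   /* task 2: build b */
    }
}
```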
3. Communications
The need for communications between tasks depends upon your problem.
You DON'T need communications
Some types of problems can be decomposed and executed in parallel with virtually no need for tasks to share data. For example, imagine an image processing operation where every pixel in a black and white image needs to have its color reversed. The image data can easily be distributed to multiple tasks that then act independently of each other to do their portion of the work.
These types of problems are often called embarrassingly parallel because they are so straightforward. Very little inter-task communication is required.
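A minimal sketch of such a case in C (the 8-bit grayscale pixel format and the function name are assumptions): every pixel is reversed independently, so the iterations can be handed to separate tasks without any data sharing.

```c
#include <stdint.h>

void invert_image(uint8_t *pixels, int width, int height)
{
    /* Each pixel is processed on its own; no task needs another task's data. */
    #pragma omp parallel for
    for (int i = 0; i < width * height; i++)
        pixels[i] = 255 - pixels[i];
}
```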
You DO need communications
Most parallel applications are not quite so simple, and do require tasks to share data with each other. For example, a 3-D heat diffusion problem requires a task to know the temperatures calculated by the tasks that have neighboring data. Changes to neighboring data have a direct effect on that task's data.
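The dependence on neighbors is easiest to see in one dimension. The sketch below (assumed grid size, diffusion coefficient and function names; not the full 3-D problem) updates each cell from its two neighbors, so when the grid is divided among tasks, tasks holding adjacent blocks must exchange their boundary values before every step.

```c
/* One explicit time step of a 1-D heat diffusion stencil. */
void heat_step(const double *u_old, double *u_new, int n, double alpha)
{
    #pragma omp parallel for
    for (int i = 1; i < n - 1; i++)
        /* u_new[i] needs the neighboring values u_old[i-1] and u_old[i+1] */
        u_new[i] = u_old[i] + alpha * (u_old[i - 1] - 2.0 * u_old[i] + u_old[i + 1]);
}
```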
4. Synchronization
Synchronization means coordinating the way processes share system resources so that concurrent access to shared data is handled safely, minimizing the chance of inconsistent data. Maintaining data consistency demands mechanisms to ensure the synchronized execution of cooperating processes.
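A small C/OpenMP sketch of the idea (the shared total and the critical section are illustrative choices): without the synchronization, concurrent updates to the shared variable could interleave and leave it inconsistent.

```c
double shared_total = 0.0;   /* shared by all threads */

void accumulate(const double *values, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        /* the critical section lets only one thread update the total at a time */
        #pragma omp critical
        shared_total += values[i];
    }
}
```

In practice an OpenMP reduction clause would do the same job with far less serialization; the critical section is used here only to make the synchronization explicit.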
5. Data Dependencies
A dependence exists between program statements when the order of statement execution affects the results of the program. A data dependence results from multiple use of the same location(s) in storage by different tasks. Dependencies are important to parallel programming because they are one of the primary inhibitors to parallelism.
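The two loops below (a hedged C sketch; the arrays and the scaling operation are invented for illustration) show the difference: in the first, iteration i reads a value written by iteration i-1, so the execution order matters and the loop cannot simply be run in parallel; in the second, every iteration touches only its own element.

```c
void dependence_examples(double *a, double *b, int n)
{
    /* Loop-carried dependence: a[i] needs a[i - 1], so the loop is serial. */
    for (int i = 1; i < n; i++)
        a[i] = 2.0 * a[i - 1];

    /* No dependence between iterations: safe to run in parallel. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        b[i] = 2.0 * b[i];
}
```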
6. Load Balancing
Load balancing refers to the practice of distributing work among tasks so that all tasks are kept busy all of the time. It can be considered a minimization of task idle time. Load balancing is important to parallel programs for performance reasons. For example, if all tasks are subject to a barrier synchronization point, the slowest task will determine the overall performance.
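One common remedy, sketched below in C/OpenMP (the amount of work per iteration and the chunk size of 16 are assumptions), is dynamic scheduling: iterations whose cost varies are handed out in small chunks, so no thread sits idle at the barrier while others are still busy.

```c
void uneven_work(double *out, int n)
{
    #pragma omp parallel for schedule(dynamic, 16)
    for (int i = 0; i < n; i++) {
        double x = 0.0;
        for (int k = 0; k < i; k++)   /* later iterations do more work */
            x += 1.0 / (k + 1.0);
        out[i] = x;
    }
}
```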
7. Granularity
In parallel computing, granularity is a qualitative measure of the ratio of computation to communication. Periods of computation are typically separated from periods of communication by synchronization events.
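In loop-level code, the chunk size of a schedule is one simple knob for granularity, as in this hedged sketch (the summation over an array is illustrative): a tiny chunk means fine granularity, with a scheduling event after very little computation, while a large chunk amortizes that overhead over much more computation.

```c
double chunked_sum(const double *a, int n, int chunk)
{
    double total = 0.0;

    /* small chunk -> fine-grained: frequent scheduling overhead;
       large chunk -> coarse-grained: overhead amortized over more work */
    #pragma omp parallel for schedule(dynamic, chunk) reduction(+:total)
    for (int i = 0; i < n; i++)
        total += a[i];

    return total;
}
```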
Problems to be Solved by Parallel Programming
- Computational fluid dynamics
- Physics processing
- Ray tracing
- Data mining
- Medical imaging
- Control engineering software
- Digital signal processing
- Bioinformatics
List of Parallel Programming Frameworks
- Apache Hadoop
- Apache Spark
- Apache Flink
- CUDA
- OpenCL
- OpenHMPP
- OpenMP for C, C++, and Fortran (shared memory and attached GPUs)