Problem description
I am trying to parallelize the following program, but I don't know how to reduce on an array. I know it is not possible to do so, but is there an alternative? Thanks. (I added a reduction on m, which is wrong, but I would like advice on how to do it.)
#include <iostream>
#include <stdio.h>
#include <time.h>
#include <omp.h>
using namespace std;
int main ()
{
    int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
    int S [10];
    time_t start_time = time(NULL);
    #pragma omp parallel for private(m) reduction(+:m)
    for (int n=0 ; n<10 ; ++n ){
        for (int m=0; m<=n; ++m){
            S[n] += A[m];
        }
    }
    time_t end_time = time(NULL);
    cout << end_time-start_time;
    return 0;
}
Recommended answer
Yes, it is possible to do an array reduction with OpenMP. In Fortran there is even a construct for this. In C/C++ you have to do it yourself. Here are two ways to do it.
The first method makes a private version of S for each thread, fills them in parallel, and then merges them into S in a critical section (see the code below). The second method makes an array with dimensions 10*nthreads, fills this array in parallel, and then merges it into S without using a critical section. The second method is much more complicated and can have cache issues, especially on multi-socket systems, if you are not careful. For more details, see "Fill histograms (array reduction) in parallel with OpenMP without using a critical section".
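OpenMP 4.5 added reductions over array sections for C and C++, so on a compiler that implements that version the merge can be left to the runtime instead of being written by hand. The following is a minimal sketch under that assumption, not part of the original answer; the function name `prefix_sums_reduction` and the constant `N` are made up for illustration, and the loop nest is the one from the question.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int N = 10;  // array length used in the question (illustrative name)

// Computes S[n] = A[0] + ... + A[n] with the loop nest from the question.
// Assumes OpenMP 4.5 or later for the array-section reduction; without
// OpenMP the pragma is ignored and the loop simply runs serially.
std::array<int, N> prefix_sums_reduction(const std::vector<int>& A) {
    assert(A.size() == static_cast<std::size_t>(N));
    int S[N] = {0};
    // Every thread works on a zero-initialized private copy of S[0..N-1];
    // the private copies are added into the original S when the loop ends.
    #pragma omp parallel for reduction(+:S[:N])
    for (int n = 0; n < N; ++n) {
        for (int m = 0; m <= n; ++m) {
            S[n] += A[m];
        }
    }
    std::array<int, N> out;
    std::copy(S, S + N, out.begin());
    return out;
}
```

Calling `prefix_sums_reduction` with the ten values of `A` from the question returns the running sums, ending in 502 for the last element.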
First method
int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
int S [10] = {0};
#pragma omp parallel
{
    int S_private[10] = {0};
    #pragma omp for
    for (int n=0 ; n<10 ; ++n ) {
        for (int m=0; m<=n; ++m){
            S_private[n] += A[m];
        }
    }
    #pragma omp critical
    {
        for(int n=0; n<10; ++n) {
            S[n] += S_private[n];
        }
    }
}
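In the loop above each S[n] is written by exactly one iteration, so the private copies never overlap. The same pattern pays off when many iterations update the same element, as in a histogram. Here is a minimal sketch of the first method applied to that case; `histogram_critical`, `NBINS`, and the input layout (one bin index per sample) are hypothetical choices for illustration.

```cpp
#include <cassert>
#include <vector>

constexpr int NBINS = 4;  // hypothetical number of bins

// Counts how often each bin index occurs in `bins`.
std::vector<int> histogram_critical(const std::vector<int>& bins) {
    std::vector<int> hist(NBINS, 0);
    const long count = static_cast<long>(bins.size());
    #pragma omp parallel
    {
        int hist_private[NBINS] = {0};  // one private copy per thread
        // nowait: a thread may start merging as soon as its own share is done
        #pragma omp for nowait
        for (long i = 0; i < count; ++i) {
            assert(bins[i] >= 0 && bins[i] < NBINS);
            ++hist_private[bins[i]];
        }
        // Only one thread at a time adds its private counts to the shared result
        #pragma omp critical
        {
            for (int b = 0; b < NBINS; ++b) {
                hist[b] += hist_private[b];
            }
        }
    }
    return hist;
}
```

The critical section costs one short merge per thread, which is usually small compared with the counting loop when the input is large.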
Second method
int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
int S [10] = {0};
int *S_private;
#pragma omp parallel
{
    const int nthreads = omp_get_num_threads();
    const int ithread = omp_get_thread_num();
    // one thread allocates and zeroes a row of 10 ints per thread
    #pragma omp single
    {
        S_private = new int[10*nthreads];
        for(int i=0; i<(10*nthreads); i++) S_private[i] = 0;
    }
    // each thread accumulates into its own row
    #pragma omp for
    for (int n=0 ; n<10 ; ++n )
    {
        for (int m=0; m<=n; ++m){
            S_private[ithread*10+n] += A[m];
        }
    }
    // merge the rows in parallel: each S[i] is summed by a single thread
    #pragma omp for
    for(int i=0; i<10; i++) {
        for(int t=0; t<nthreads; t++) {
            S[i] += S_private[10*t + i];
        }
    }
}
delete[] S_private;
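One of the cache issues mentioned for the second method is presumably false sharing: rows of 10 ints belonging to different threads can land on the same cache line. The sketch below is one possible mitigation, not the original author's code. It pads each thread's row to a multiple of an assumed 64-byte cache line (16 four-byte ints). The names `prefix_sums_padded`, `LEN`, `INTS_PER_LINE`, and `STRIDE` are illustrative, and the buffer itself is not forced onto a cache-line boundary, so the padding reduces sharing rather than ruling it out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

constexpr int LEN = 10;            // length of S in the answer
constexpr int INTS_PER_LINE = 16;  // assumption: 64-byte lines, 4-byte int
// Row width rounded up to a whole number of assumed cache lines
constexpr int STRIDE = ((LEN + INTS_PER_LINE - 1) / INTS_PER_LINE) * INTS_PER_LINE;

std::vector<int> prefix_sums_padded(const std::vector<int>& A) {
    assert(A.size() == static_cast<std::size_t>(LEN));
    std::vector<int> S(LEN, 0);
    std::vector<int> S_private;  // nthreads rows, each STRIDE ints wide
    #pragma omp parallel
    {
#ifdef _OPENMP
        const int nthreads = omp_get_num_threads();
        const int ithread = omp_get_thread_num();
#else
        const int nthreads = 1;  // serial fallback when built without OpenMP
        const int ithread = 0;
#endif
        // One thread allocates and zeroes the buffer; the implicit barrier
        // at the end of single makes it visible to all threads.
        #pragma omp single
        S_private.assign(static_cast<std::size_t>(nthreads) * STRIDE, 0);

        int* row = S_private.data() + ithread * STRIDE;  // this thread's row
        #pragma omp for
        for (int n = 0; n < LEN; ++n) {
            for (int m = 0; m <= n; ++m) {
                row[n] += A[m];
            }
        }
        // Merge without a critical section: each S[i] is summed by one thread
        #pragma omp for
        for (int i = 0; i < LEN; ++i) {
            for (int t = 0; t < nthreads; ++t) {
                S[i] += S_private[t * STRIDE + i];
            }
        }
    }
    return S;
}
```

Using `std::vector` instead of `new`/`delete[]` also removes the manual cleanup step at the end of the original snippet.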