Problem Description
I've read some other questions on this topic, but none of them solved my problem.
I wrote the code below, and both the pthread version and the OpenMP version turned out slower than the serial version. I'm very confused.
Compiled in the following environment:
Ubuntu 12.04 64bit 3.2.0-60-generic
g++ (Ubuntu 4.8.1-2ubuntu1~12.04) 4.8.1
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Vendor ID: AuthenticAMD
CPU family: 18
Model: 1
Stepping: 0
CPU MHz: 800.000
BogoMIPS: 3593.36
L1d cache: 64K
L1i cache: 64K
L2 cache: 512K
NUMA node0 CPU(s): 0,1
Compile command:
g++ -std=c++11 ./eg001.cpp -fopenmp
#include <cmath>
#include <cstdio>
#include <ctime>
#include <omp.h>
#include <pthread.h>

#define NUM_THREADS 5
const int sizen = 256000000;

struct Data {
    double * pSinTable;
    long tid;
};

void * compute(void * p) {
    Data * pDt = (Data *)p;
    const int start = sizen * pDt->tid / NUM_THREADS;
    const int end = sizen * (pDt->tid + 1) / NUM_THREADS;
    for(int n = start; n < end; ++n) {
        pDt->pSinTable[n] = std::sin(2 * M_PI * n / sizen);
    }
    pthread_exit(nullptr);
}

int main()
{
    double * sinTable = new double[sizen];
    pthread_t threads[NUM_THREADS];
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

    clock_t start, finish;
    start = clock();
    int rc;
    Data dt[NUM_THREADS];
    for(int i = 0; i < NUM_THREADS; ++i) {
        dt[i].pSinTable = sinTable;
        dt[i].tid = i;
        rc = pthread_create(&threads[i], &attr, compute, &dt[i]);
    }//for
    pthread_attr_destroy(&attr);
    for(int i = 0; i < NUM_THREADS; ++i) {
        rc = pthread_join(threads[i], nullptr);
    }//for
    finish = clock();
    printf("from pthread: %lf\n", (double)(finish - start)/CLOCKS_PER_SEC);
    delete sinTable;

    sinTable = new double[sizen];
    start = clock();
    #pragma omp parallel for
    for(int n = 0; n < sizen; ++n)
        sinTable[n] = std::sin(2 * M_PI * n / sizen);
    finish = clock();
    printf("from omp: %lf\n", (double)(finish - start)/CLOCKS_PER_SEC);
    delete sinTable;

    sinTable = new double[sizen];
    start = clock();
    for(int n = 0; n < sizen; ++n)
        sinTable[n] = std::sin(2 * M_PI * n / sizen);
    finish = clock();
    printf("from serial: %lf\n", (double)(finish - start)/CLOCKS_PER_SEC);
    delete sinTable;

    pthread_exit(nullptr);
    return 0;
}
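For reference, a typical build and run might look like the following (a.out is just g++'s default output name, and OMP_NUM_THREADS is the standard OpenMP environment variable for choosing the thread count; neither appears in the original post):

g++ -std=c++11 ./eg001.cpp -fopenmp
OMP_NUM_THREADS=2 ./a.out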
Output:
from pthread: 21.150000
from omp: 20.940000
from serial: 20.800000
I wondered whether it was a problem with my code, so I used pthreads to do the same thing.
However, I was completely wrong, and now I wonder whether it might be a problem with OpenMP/pthreads on Ubuntu.
A friend of mine who also has an AMD CPU and Ubuntu 12.04 ran into the same problem, so I have some reason to believe the problem is not limited to just me.
If anyone has had the same problem, or has any clue about it, thanks in advance.
In case the code is not good enough, I also ran a benchmark and pasted the result here:
http://pastebin.com/RquLPREc
Benchmark URL: http://www.cs.kent.edu/~farrell/mc08/lectures/progs/openmp/microBenchmarks/src/download.html
New information:
I ran the code on Windows (without the pthread version) with VS2012.
I used 1/10 of sizen because Windows would not let me allocate such a large chunk of memory. The results are:
from omp: 1.004
from serial: 1.420
from FreeNickName: 735 (this one is the improvement suggested by @FreeNickName)
Does this indicate that it could be a problem with the Ubuntu OS?
The problem is solved by using the omp_get_wtime function, which is portable across operating systems. See the answer by Hristo Iliev.
Some tests of the controversial topic suggested by FreeNickName. (Sorry, I had to test it on Ubuntu, because the Windows machine was one of my friends'.)
--1-- Changed from delete to delete [] (but without memset) (-std=c++11 -fopenmp):
from pthread: 13.491405
from omp: 13.023099
from serial: 20.665132
from FreeNickName: 12.022501
--2-- With memset immediately after new (-std=c++11 -fopenmp):
from pthread: 13.996505
from omp: 13.192444
from serial: 19.882127
from FreeNickName: 12.541723
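For reference, "memset immediately after new" refers to touching the whole allocation right after allocating it, roughly as in this minimal sketch (the exact placement in the updated test code is my assumption):

#include <cstring>

const int sizen = 256000000;

int main() {
    double * sinTable = new double[sizen];
    // touch every page of the allocation up front so the timed loops
    // below do not also pay for first-touch page faults
    std::memset(sinTable, 0, sizen * sizeof(double));
    // ... timed loops as in the original code ...
    delete [] sinTable;
    return 0;
}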
--3-- With memset immediately after new (-std=c++11 -fopenmp -march=native -O2):
from pthread: 11.886978
from omp: 11.351801
from serial: 17.002865
from FreeNickName: 11.198779
--4-- With memset immediately after new, and with FreeNickName's version moved before the OMP for version (-std=c++11 -fopenmp -march=native -O2):
from pthread: 11.831127
from FreeNickName: 11.571595
from omp: 11.932814
from serial: 16.976979
--5-- With memset immediately after new, with FreeNickName's version moved before the OMP for version, and with NUM_THREADS set to 5 instead of 2 (I'm on a dual core):
from pthread: 9.451775
from FreeNickName: 9.385366
from omp: 11.854656
from serial: 16.960101
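(Note that NUM_THREADS in the posted code is only used by the pthread version; the OpenMP loop keeps the implementation's default thread count unless it is set explicitly, e.g. via the OMP_NUM_THREADS environment variable, omp_set_num_threads(), or a num_threads clause. A small sketch, reusing sinTable and sizen from the posted code:)

#pragma omp parallel for num_threads(2)
for(int n = 0; n < sizen; ++n)
    sinTable[n] = std::sin(2 * M_PI * n / sizen);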
Recommended Answer
There is nothing wrong with OpenMP in your case. What is wrong is the way you measure the elapsed time.
Using clock() to measure the performance of multithreaded applications on Linux (and most other Unix-like OSes) is a mistake, since it does not return the wall-clock (real) time but the CPU time accumulated over all process threads (and on some Unix flavours even the accumulated CPU time of all child processes). Your parallel code shows better performance on Windows because there clock() returns the real time and not the accumulated CPU time.
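To make the difference concrete, here is a minimal standalone sketch (not from the original post) that times the same kind of parallel loop with both clock() and omp_get_wtime(); on Linux the clock() figure comes out roughly equal to the wall time multiplied by the number of active threads:

#include <cmath>
#include <cstdio>
#include <ctime>
#include <omp.h>

int main() {
    const int sizen = 64000000;
    double * sinTable = new double[sizen];

    clock_t c0 = clock();           // accumulated CPU time of all threads
    double  w0 = omp_get_wtime();   // wall-clock time

    #pragma omp parallel for
    for(int n = 0; n < sizen; ++n)
        sinTable[n] = std::sin(2 * M_PI * n / sizen);

    clock_t c1 = clock();
    double  w1 = omp_get_wtime();

    printf("cpu  time: %lf\n", (double)(c1 - c0) / CLOCKS_PER_SEC);
    printf("wall time: %lf\n", w1 - w0);

    delete [] sinTable;
    return 0;
}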
The best way to prevent such discrepancies is to use the portable OpenMP timer routine omp_get_wtime():
double start = omp_get_wtime();
#pragma omp parallel for
for(int n = 0; n < sizen; ++n)
    sinTable[n] = std::sin(2 * M_PI * n / sizen);
double finish = omp_get_wtime();
printf("from omp: %lf\n", finish - start);
For non-OpenMP applications, you should use clock_gettime() with the CLOCK_REALTIME clock:
struct timespec start, finish;
clock_gettime(CLOCK_REALTIME, &start);
#pragma omp parallel for
for(int n = 0; n < sizen; ++n)
    sinTable[n] = std::sin(2 * M_PI * n / sizen);
clock_gettime(CLOCK_REALTIME, &finish);
printf("from omp: %lf\n", (finish.tv_sec + 1.e-9 * finish.tv_nsec) -
                          (start.tv_sec + 1.e-9 * start.tv_nsec));
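Note that on older glibc versions, such as the one shipped with Ubuntu 12.04, clock_gettime() lives in librt, so the link line may also need -lrt, e.g.:

g++ -std=c++11 ./eg001.cpp -fopenmp -lrt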