問題描述
我在編譯時(shí)編寫了斐波那契數(shù)計(jì)算程序(constexpr)使用 C++11 支持的模板元編程技術(shù)的問題.目的這是為了計(jì)算模板元編程方法與舊的傳統(tǒng)方法之間的運(yùn)行時(shí)間差異.
I wrote the program Fibonacci number calculation in compile time (constexpr) problem using the template metaprogramming techniques supported in C++11. The purpose of this is to calculate the difference in the run-time between the template metaprogramming approach and the old conventional approach.
// Template Metaprograming Approach
template<int N>
constexpr int fibonacci() {return fibonacci<N-1>() + fibonacci<N-2>(); }
template<>
constexpr int fibonacci<1>() { return 1; }
template<>
constexpr int fibonacci<0>() { return 0; }
// Conventional Approach
int fibonacci(int N) {
if ( N == 0 ) return 0;
else if ( N == 1 ) return 1;
else
return (fibonacci(N-1) + fibonacci(N-2));
}
我在我的 GNU/Linux 系統(tǒng)上為 N = 40 運(yùn)行了兩個(gè)程序,并測(cè)量了時(shí)間和發(fā)現(xiàn)傳統(tǒng)解決方案(1.15 秒)比基于模板的解決方案(0.55 秒)慢大約兩倍.這是一項(xiàng)重大改進(jìn),因?yàn)檫@兩種方法都基于遞歸.
I ran both programs for N = 40 on my GNU/Linux system and measured the time and found that that conventional solution (1.15 second) is around two times slower than the template-based solution (0.55 second). This is a significant improvement as both approaches are based on the recursion.
為了更深入地理解它,我在 g++ 中編譯了程序(-fdump-tree-all 標(biāo)志),發(fā)現(xiàn)編譯器實(shí)際上生成了 40 個(gè)不同的函數(shù)(如 fibonacci<40>、fibonacci<39>...斐波那契<0>).
To understand it more I compiled the program (-fdump-tree-all flag) in g++ and found that compiler actually generated the 40 different functions (like fibonacci<40>, fibonacci<39>...fibonacci<0>).
constexpr int fibonacci() [with int N = 40] () {
int D.29948, D.29949, D.29950;
D.29949 = fibonacci<39> ();
D.29950 = fibonacci<38> ();
D.29948 = D.29949 + D.29950;
return D.29948;
}
constexpr int fibonacci() [with int N = 39] () {
int D.29952, D.29953, D.29954;
D.29953 = fibonacci<38> ();
D.29954 = fibonacci<37> ();
D.29952 = D.29953 + D.29954;
return D.29952;
}
...
...
...
constexpr int fibonacci() [with int N = 0] () {
int D.29962;
D.29962 = 0;
return D.29962;
}
我也在GDB中調(diào)試了程序,發(fā)現(xiàn)上面的功能都是執(zhí)行次數(shù)與傳統(tǒng)遞歸方法相同.如果程序的兩個(gè)版本都以相同的次數(shù)(遞歸)執(zhí)行該函數(shù),那么模板元編程技術(shù)如何實(shí)現(xiàn)這一點(diǎn)?我還想知道您對(duì)基于模板元編程的方法與其他版本相比如何以及為什么要花費(fèi)一半時(shí)間的看法?這個(gè)程序能不能比現(xiàn)在的程序更快?
I also debugged the program in GDB and found that all the above functions are executed an equal number of times as with the conventional recursive approach. If both versions of the program are executing the function an equal number of times (recursive), then how is this achieved by template metaprogramming techniques? I would also like to know your opinion about how and why a template metaprogramming based approach is taking half time compared to the other version? Can this program be made faster than the current one?
基本上,我的目的是盡可能多地了解內(nèi)部發(fā)生的事情.
Basically my intention here is to understand what's going on internally as much as possible.
我的機(jī)器是帶有 GCC 4.8.1 的 GNU/Linux,我對(duì)兩個(gè)程序都使用了優(yōu)化 -o3
.
My machine is GNU/Linux with GCC 4.8.1, and I used the optimization -o3
for both programs.
推薦答案
試試這個(gè):
template<size_t N>
struct fibonacci : integral_constant<size_t, fibonacci<N-1>{} + fibonacci<N-2>{}> {};
template<> struct fibonacci<1> : integral_constant<size_t,1> {};
template<> struct fibonacci<0> : integral_constant<size_t,0> {};
使用 clang 和 -Os
,編譯時(shí)間大約為 0.5 秒,零 時(shí)間運(yùn)行 N=40
.您的傳統(tǒng)"方法大約在 0.4 秒內(nèi)編譯并在 0.8 秒內(nèi)運(yùn)行.只是為了檢查,結(jié)果是102334155
對(duì)嗎?
With clang and -Os
, this compiles in roughly 0.5s and runs in zero time for N=40
. Your "conventional" approach compiles in roughly 0.4s and runs in 0.8s. Just for checking, the result is 102334155
right?
當(dāng)我嘗試您自己的 constexpr
解決方案時(shí),編譯器運(yùn)行了幾分鐘,然后我停止了它,因?yàn)轱@然內(nèi)存已滿(計(jì)算機(jī)開始凍結(jié)).編譯器正在嘗試計(jì)算最終結(jié)果,而您的實(shí)現(xiàn)在編譯時(shí)使用效率極低.
When I tried your own constexpr
solution the compiler run for a couple of minutes and then I stopped it because apparently memory was full (computer started freezing). The compiler was trying to compute the final result and your implementation is extremely inefficient to be used at compile time.
使用此解決方案,在實(shí)例化 N
時(shí)會(huì)重復(fù)使用 N-2
、N-1
處的模板實(shí)例化.所以 fibonacci<40>
實(shí)際上在編譯時(shí)作為一個(gè)值被知道,在運(yùn)行時(shí)沒有什么可做的.這是一種動(dòng)態(tài)編程方法,當(dāng)然,如果您在 N<計(jì)算之前將所有值存儲(chǔ)在
0
到 N-1
處,當(dāng)然您可以在運(yùn)行時(shí)執(zhí)行相同的操作/代碼>.
With this solution, template instantiations at N-2
, N-1
are re-used when instantiating N
. So fibonacci<40>
is actually known at compile time as a value, and there is nothing to do at run-time. This is a dynamic programming approach and of course you can do the same at run time if you store all values at 0
through N-1
before computing at N
.
使用您的解決方案,編譯器可以在編譯時(shí)評(píng)估fibonacci
,但不需要.在您的情況下,所有或部分計(jì)算都留給運(yùn)行時(shí).就我而言,所有計(jì)算都是在編譯時(shí)嘗試的,因此永無止境.
With your solution, the compiler can evaluate fibonacci<N>()
at compile time but is not required to. In your case, all or part of computation is left for run time. In my case, all computation is attempted at compile time, hence never ending.
這篇關(guān)于在 C++11 中計(jì)算編譯時(shí)(constexpr)中的斐波那契數(shù)(遞歸方法)的文章就介紹到這了,希望我們推薦的答案對(duì)大家有所幫助,也希望大家多多支持html5模板網(wǎng)!