Problem Description
The date is 12/02/10. The days before Christmas are dripping away and I've pretty much hit a major roadblock as a Windows programmer. I've been using AQTime, I've tried Sleepy, Shiny, and Very Sleepy, and as we speak, VTune is installing. I've tried to use the VS2008 profiler, and it's been positively punishing as well as often insensible. I've used the random pause technique. I've examined call-trees. I've fired off function traces. But the sad painful fact of the matter is that the app I'm working with is over a million lines of code, with probably another million lines worth of third-party apps.
I need better tools. I've read the other topics. I've tried out each profiler listed in each topic. There simply has to be something better than these junky and expensive options, or ludicrous amounts of work for almost no gain. To further complicate matters, our code is heavily threaded, and runs a number of Qt Event loops, some of which are so fragile that they crash under heavy instrumentation due to timing delays. Don't ask me why we're running multiple event loops. No one can tell me.
Are there any options more along the lines of Valgrind in a Windows environment?
Is there anything better than the long swath of broken tools I've already tried?
Is there anything designed to integrate with Qt, perhaps with a useful display of events in queue?
A full list of the tools I tried, with the ones that were really useful in italics:
- AQTime: Rather good! Has some trouble with deep recursion, but the call graph is correct in these cases, and can be used to clear up any confusion you might have. Not a perfect tool, but worth trying out. It might suit your needs, and it certainly was good enough for me most of the time.
- Random Pause attack in debug mode: Not enough information enough of the time. A good tool but not a complete solution.
- Parallel Studios: The nuclear option. Obtrusive, weird, and crazily powerful. I think you should hit up the 30 day evaluation, and figure out if it's a good fit. It's just darn cool, too.
- AMD Codeanalyst: Wonderful, easy to use, very crash-prone, but I think that's an environment thing. I'd recommend trying it, as it is free.
- Luke Stackwalker: Works fine on small projects, it's a bit trying to get it working on ours. Some good results though, and it definitely replaces Sleepy for my personal tasks.
- PurifyPlus: No support for Win-x64 environments, most prominently Windows 7. Otherwise excellent. A number of my colleagues in other departments swear by it.
- VS2008 Profiler: Produces output in the 100+gigs range in function trace mode at the required resolution. On the plus side, produces solid results.
- GProf: Requires GCC to be even moderately effective.
- VTune: VTune's W7 support borders on criminal. Otherwise excellent.
- PIN: I'd need to hack up my own tool, so this is sort of a last resort.
- Sleepy/Very Sleepy: Useful for smaller apps, but failing me here.
- EasyProfiler: Not bad if you don't mind a bit of manually injected code to indicate where to instrument.
- Valgrind: *nix only, but very good when you're in that environment.
- OProfile: Linux only.
- Proffy: They shoot wild horses.
Suggested tools that I haven't tried:
- XPerf:
- GlowCode:
- DevPartner:
Notes: Intel environment at the moment. VS2008, Boost libraries. Qt 4+. And the wretched humdinger of them all: Qt/MFC integration via Trolltech.
Now: Almost two weeks later, it looks like my issue is resolved. Thanks to a variety of tools, including almost everything on the list and a couple of my personal tricks, we found the primary bottlenecks. However, I'm going to keep testing, exploring, and trying out new profilers as well as new tech. Why? Because I owe it to you guys, because you guys rock. It does slow the timeline down a little, but I'm still very excited to keep trying out new tools.
Synopsis
Among many other problems, a number of components had recently been switched to the incorrect threading model, causing serious hang-ups due to the fact that the code underneath us was suddenly no longer multithreaded. I can't say more because it violates my NDA, but I can tell you that this would never have been found by casual inspection or even by normal code review. Without profilers, callgraphs, and random pausing in conjunction, we'd still be screaming our fury at the beautiful blue arc of the sky. Thankfully, I work with some of the best hackers I've ever met, and I have access to an amazing 'verse full of great tools and great people.
Gentlefolk, I appreciate this tremendously, and only regret that I don't have enough rep to reward each of you with a bounty. I still think this is an important question to get a better answer to than the ones we've got so far on SO.
As a result, each week for the next three weeks, I'll be putting up the biggest bounty I can afford, and awarding it to the answer with the nicest tool that I think isn't common knowledge. After three weeks, we'll hopefully have accumulated a definitive profile of the profilers, if you'll pardon my punning.
Take-away
Use a profiler. They're good enough for Ritchie, Kernighan, Bentley, and Knuth. I don't care who you think you are. Use a profiler. If the one you've got doesn't work, find another. If you can't find one, code one. If you can't code one, or it's a small hang up, or you're just stuck, use random pausing. If all else fails, hire some grad students to bang out a profiler.
A Longer View
So, I thought it might be nice to write up a bit of a retrospective. I opted to work extensively with Parallel Studios, in part because it is actually built on top of the PIN Tool. Having had academic dealings with some of the researchers involved, I felt that this was probably a mark of some quality. Thankfully, I was right. While the GUI is a bit dreadful, I found IPS to be incredibly useful, though I can't comfortably recommend it for everyone. Critically, there's no obvious way to get line-level hit counts, something that AQT and a number of other profilers provide, and which I've found very useful for examining rate of branch-selection among other things. On balance, I've enjoyed using AQTime as well, and I've found their support to be really responsive. Again, I have to qualify my recommendation: a lot of their features don't work that well, and some of them are downright crash-prone on Win7x64. XPerf also performed admirably, but is agonizingly slow for the sampling detail required to get good reads on certain kinds of applications.
Right now, I'd have to say that I don't think there's a definitive option for profiling C++ code in a W7x64 environment, but there are certainly options that simply fail to perform any useful service.
Recommended Answer
First:
Time sampling profilers are more robust than CPU sampling profilers. I'm not extremely familiar with Windows development tools so I can't say which ones are which. Most profilers are CPU sampling.
A CPU sampling profiler grabs a stack trace every N instructions.
This technique will reveal the portions of your code that are CPU bound, which is awesome if that is the bottleneck in your application. It's not so great if your application threads spend most of their time fighting over a mutex.
A time sampling profiler grabs a stack trace every N microseconds.
This technique will zero in on "slow" code, whether the cause is CPU-bound, blocking-IO-bound, mutex-bound, or cache-thrashing sections of code. In short, whatever piece of code is slowing your application will stand out.
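To make the difference concrete, here is a minimal sketch in Python (chosen only because it is short and portable; the idea is language-independent and this is not taken from any of the profilers above). It measures the same two toy functions by CPU time and by wall-clock time. The names `measure`, `cpu_bound`, and `lock_bound` are made up for the illustration.

```python
import threading
import time


def measure(work):
    """Return (wall_seconds, cpu_seconds) spent running work()."""
    wall_start = time.perf_counter()   # wall-clock time
    cpu_start = time.process_time()    # CPU time consumed by this process
    work()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def cpu_bound():
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def lock_bound():
    lock = threading.Lock()
    lock.acquire()
    # A timer thread releases the lock after 0.2 s; until then we just block.
    threading.Timer(0.2, lock.release).start()
    with lock:
        pass


if __name__ == "__main__":
    for name, fn in [("cpu_bound", cpu_bound), ("lock_bound", lock_bound)]:
        wall, cpu = measure(fn)
        print(f"{name}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

The lock-bound function costs real elapsed time but almost no CPU time, so a profiler that only counts CPU activity has very little to attribute to it.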
So use a time sampling profiler if at all possible, especially when profiling threaded code.
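For a feel of what a time sampling profiler does internally, here is a toy sketch, again in Python and again only an illustration rather than how any of the tools above is implemented. A helper thread wakes up on a fixed wall-clock interval and records the call stack of the thread being profiled. It relies on `sys._current_frames()`, which is CPython-specific; `sampler`, `profile`, `crunch`, `wait_on_lock`, and `workload` are hypothetical names.

```python
import collections
import sys
import threading
import time


def sampler(target_ident, interval, stop, counts):
    """Wake every `interval` seconds and record the target thread's call stack."""
    while not stop.is_set():
        # Maps thread id -> that thread's current frame (CPython-specific).
        frame = sys._current_frames().get(target_ident)
        while frame is not None:
            counts[frame.f_code.co_name] += 1   # inclusive: every function on the stack
            frame = frame.f_back
        time.sleep(interval)


def profile(func, interval=0.005):
    """Run func() on the calling thread while a helper thread samples it by wall clock."""
    counts = collections.Counter()
    stop = threading.Event()
    helper = threading.Thread(
        target=sampler, args=(threading.get_ident(), interval, stop, counts)
    )
    helper.start()
    try:
        func()
    finally:
        stop.set()
        helper.join()
    return counts


def crunch():
    return sum(i * i for i in range(50_000))   # short CPU-bound burst


def wait_on_lock():
    time.sleep(0.3)   # stand-in for blocking on a mutex or on I/O


def workload():
    crunch()
    wait_on_lock()


if __name__ == "__main__":
    for name, hits in profile(workload).most_common():
        print(f"{hits:4d}  {name}")
```

Because the samples are taken on elapsed time, the blocked function collects most of the hits even though it burns almost no CPU. That is exactly the kind of hot spot a threaded application tends to hide.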
Sampling profilers generate gobs of data. The data is extremely useful, but there is often too much of it to digest easily. A profile data visualizer helps tremendously here. The best tool I've found for profile data visualization is gprof2dot. Don't let the name fool you, it handles all kinds of sampling profiler output (AQtime, Sleepy, XPerf, etc.). Once the visualization has pointed out the offending function(s), jump back to the raw profile data to get better hints on what the real cause is.
The gprof2dot tool generates a dot graph description that you then feed into a Graphviz tool. The output is basically a call graph with functions color-coded by their impact on the application.
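If you have never looked at a dot graph description, the following hand-rolled sketch shows roughly what one looks like. It is not gprof2dot's code or its exact output. The function `to_dot` and the sample numbers are invented for the illustration; it emits Graphviz DOT text with each node filled in a color that runs from blue (cold) to red (hot).

```python
def to_dot(node_share, edges):
    """Build DOT text from {function: fraction of samples} and [(caller, callee)]."""
    lines = ["digraph callgraph {", "  node [style=filled, shape=box];"]
    for name, share in sorted(node_share.items()):
        # Graphviz accepts colors as "H S V"; hue 0.66 is blue, 0.0 is red.
        hue = 0.66 * (1.0 - share)
        lines.append(
            f'  "{name}" [label="{name}\\n{share:.1%}", fillcolor="{hue:.3f} 0.8 0.9"];'
        )
    for caller, callee in edges:
        lines.append(f'  "{caller}" -> "{callee}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    shares = {"main": 1.0, "parse": 0.6, "malloc": 0.15}   # made-up numbers
    calls = [("main", "parse"), ("parse", "malloc")]
    print(to_dot(shares, calls))
```

Saving the printed text to a file and running it through `dot -Tpng` should render a small color-coded call graph, which is the same kind of picture gprof2dot produces from real profile data.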
A few hints to get gprof2dot to generate nice output:
- I use a `--skew` of 0.001 on my graphs so I can easily see the hot code paths. Otherwise the `int main()` dominates the graph.
- If you're doing anything crazy with C++ templates you'll probably want to add `--strip`. This is especially true with Boost.
- I use OProfile to generate my sampling data. To get good output I need to configure it to load the debug symbols from my 3rd party and system libraries. Be sure to do the same, otherwise you'll see that CRT is taking 20% of your application's time when what's really going on is `malloc` is trashing the heap and eating up 15%.
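Putting those hints together, a small sketch that assembles the command lines for the pipeline might look like this. It assumes `gprof2dot` and Graphviz's `dot` are on your PATH, and that your profiler's output matches one of gprof2dot's `-f` format names (`oprofile` here; check `gprof2dot --help` for the list your version supports). The helper name `gprof2dot_pipeline` and the output file name are hypothetical.

```python
import shlex


def gprof2dot_pipeline(profile_format, skew=None, strip=False, out_png="profile.png"):
    """Return (gprof2dot argv, dot argv) for a profile-to-image pipeline."""
    convert = ["gprof2dot", "-f", profile_format]
    if skew is not None:
        convert += ["--skew", str(skew)]   # small values spread colors across low percentages
    if strip:
        convert.append("--strip")          # trim template and parameter noise from C++ names
    render = ["dot", "-Tpng", "-o", out_png]
    return convert, render


if __name__ == "__main__":
    convert, render = gprof2dot_pipeline("oprofile", skew=0.001, strip=True)
    # Print a shell pipeline; the profiler's own report command goes in front of it.
    print(" | ".join(" ".join(shlex.quote(a) for a in argv) for argv in (convert, render)))
```

With OProfile, the report you pipe in would typically come from `opreport -cgf`; for other profilers, export their data in whatever form the matching gprof2dot format expects.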