Problem Description
I have a Java program that takes in a text file containing a list of text files and processes each line separately. To speed up the processing, I make use of threads using an ExecutorService with a FixedThreadPool with 24 threads. The machine has 24 cores and 48GB of RAM.
The text file that I'm processing has 2.5 million lines. I find that for the first 2.3 million lines or so things run very well with high CPU utilization. However, beyond some point (at around the 2.3 million mark), the performance degenerates, with only a single CPU being utilized and my program pretty much grinding to a halt.
I've investigated a number of causes, made sure all my file handles are closed, and increased the amount of memory supplied to the JVM. However, regardless of what I change, performance always degrades towards the end. I've even tried on text files containing fewer lines and once again performance decreases towards the end of processing the file.
In addition to the standard Java concurrency libraries, the code also makes use of Lucene libraries for text processing and analysis.
When I don't thread this code, the performance is constant and doesn't degenerate towards the end. I know this is a shot in the dark and it's hard to describe what is going on, but I thought I would just see if anyone has any ideas as to what might be causing this degeneration in performance towards the end.
EDIT
After the comments I've received, I've pasted a stack trace here. As you can see, it doesn't appear as if any of the threads are blocking. Also, when profiling, the GC was not at 100% when things slowed down. In fact, both CPU and GC utilization were at 0% most of the time, with the CPU spiking occasionally to process a few files and then stopping again.
Code that executes the threads
BufferedReader read = new BufferedReader(new FileReader(inputFile));
ExecutorService executor = Executors.newFixedThreadPool(NTHREADS);
String line;
while ((line = read.readLine()) != null) { // index each line
    Runnable worker = new CharikarHashThreader(line, bits, minTokens);
    executor.execute(worker);
}
read.close();
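One detail worth noting about the loop above: Executors.newFixedThreadPool is backed by an unbounded queue, so the reader thread can enqueue all 2.5 million Runnables far faster than 24 workers can drain them, and every pending task sits on the heap in the meantime. A minimal sketch of a bounded alternative (the pool size and queue capacity are arbitrary, and a trivial counter task stands in for the question's CharikarHashThreader):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPoolSketch {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger processed = new AtomicInteger();

        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                24, 24, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1000),             // bounded, unlike newFixedThreadPool
                new ThreadPoolExecutor.CallerRunsPolicy()); // when full, the producer runs the
                                                            // task itself instead of queueing more

        for (int i = 0; i < 100_000; i++) {
            // Stand-in for: new CharikarHashThreader(line, bits, minTokens)
            executor.execute(processed::incrementAndGet);
        }

        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(processed.get()); // prints 100000
    }
}
```

With CallerRunsPolicy, the file-reading thread throttles itself whenever the queue fills, so pending work never grows without bound.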
Answer
This sounds a lot like a garbage collection / memory issue.
When the Garbage Collection runs it pauses all threads so that the GC thread can do its "is this collectable garbage" analysis without things changing on it. While the GC is running you'll see exactly 1 thread at 100%, the other threads will be stuck at 0%.
I would consider adding a few Runtime.getRuntime().freeMemory() calls (or using a profiler) to see if the "grind to a halt" occurs during GC.
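A minimal sketch of that kind of instrumentation (the print format is my own; the idea is just to log heap headroom at intervals and see whether it collapses right before the slowdown):

```java
public class MemorySampler {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        // Heap in use = currently allocated minus the free portion of it.
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        // Hard ceiling the JVM will grow to (-Xmx).
        long maxMb = rt.maxMemory() / (1024 * 1024);

        // Call this periodically (e.g. every N lines submitted): if usedMb
        // climbs steadily toward maxMb before the stall, GC pressure is the
        // likely culprit rather than blocked threads.
        System.out.println("heap used=" + usedMb + "MB of max=" + maxMb + "MB");
    }
}
```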
I'd also try running your program on just the first 10k lines of your file to see if that works.
I'd also look to see if your program is building too many intermediate Strings when it should be using StringBuilders.
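The String-building point looks like this in a toy illustration (not code from the question): repeated += on a String allocates a fresh object and copies every previous character each time, while a StringBuilder grows one buffer in place.

```java
public class StringBuildDemo {
    public static void main(String[] args) {
        int n = 10_000;

        // Wasteful: each += creates a new String and copies all prior
        // characters, so the loop is O(n^2) and churns out n short-lived
        // objects for the garbage collector to clean up.
        String s = "";
        for (int i = 0; i < n; i++) {
            s += 'x';
        }

        // Better: a single growable buffer, amortized O(n) appends and
        // only a handful of allocations.
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }

        System.out.println(s.length() == sb.length()); // prints true
    }
}
```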
It sounds to me like you need to profile your memory usage.