Do any JVM's JIT compilers generate code that uses vectorized floating point instructions?

Question

Let's say the bottleneck of my Java program really is some tight loops that compute a bunch of vector dot products. Yes, I've profiled; yes, it's the bottleneck; yes, it's significant; yes, that's just how the algorithm is; yes, I've run ProGuard to optimize the bytecode, etc.

The work is, essentially, dot products. As in, I have two float[50] arrays and I need to compute the sum of their pairwise products. I know processor instruction sets exist to perform these kinds of operations quickly and in bulk, like SSE or MMX.

Yes, I can probably access these by writing some native code and calling it through JNI. The JNI call turns out to be pretty expensive, though.

I know you can't guarantee what a JIT will or will not compile. Has anyone ever heard of a JIT generating code that uses these instructions? And if so, is there anything about the Java code that helps make it compilable this way?

Probably a "no", but worth asking.

Recommended answer

So, basically, you want your code to run faster. JNI is the answer. I know you said it didn't work for you, but let me show you that you are wrong.

Here is Dot.java:

import java.nio.FloatBuffer;
import org.bytedeco.javacpp.*;
import org.bytedeco.javacpp.annotation.*;

@Platform(include = "Dot.h", compiler = "fastfpu")
public class Dot {
    static { Loader.load(); }

    static float[] a = new float[50], b = new float[50];
    static float dot() {
        float sum = 0;
        for (int i = 0; i < 50; i++) {
            sum += a[i]*b[i];
        }
        return sum;
    }
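    // Accessors for the global arrays ac and bc declared in Dot.h.
    static native @MemberGetter FloatPointer ac();
    static native @MemberGetter FloatPointer bc();
    // Native dot product from Dot.h; @NoException skips the C++ exception-handling wrapper.
    static native @NoException float dotc();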

    public static void main(String[] args) {
        FloatBuffer ab = ac().capacity(50).asBuffer();
        FloatBuffer bb = bc().capacity(50).asBuffer();

        for (int i = 0; i < 10000000; i++) {
            a[i%50] = b[i%50] = dot();
            float sum = dotc();
            ab.put(i%50, sum);
            bb.put(i%50, sum);
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < 10000000; i++) {
            a[i%50] = b[i%50] = dot();
        }
        long t2 = System.nanoTime();
        for (int i = 0; i < 10000000; i++) {
            float sum = dotc();
            ab.put(i%50, sum);
            bb.put(i%50, sum);
        }
        long t3 = System.nanoTime();
        System.out.println("dot(): " + (t2 - t1)/10000000 + " ns");
        System.out.println("dotc(): "  + (t3 - t2)/10000000 + " ns");
    }
}

Dot.h:

float ac[50], bc[50];

inline float dotc() {
    float sum = 0;
    for (int i = 0; i < 50; i++) {
        sum += ac[i]*bc[i];
    }
    return sum;
}

We can compile and run that with JavaCPP using this command:

$ java -jar javacpp.jar Dot.java -exec

With an Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz, Fedora 30, GCC 9.1.1, and OpenJDK 8 or 11, I get this kind of output:

dot(): 39 ns
dotc(): 16 ns

That is roughly 2.4 times faster. We need to use direct NIO buffers instead of arrays, but HotSpot can access direct NIO buffers as fast as arrays. On the other hand, manually unrolling the loop does not provide a measurable boost in performance in this case.
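As a rough illustration of the two points above, here is a minimal Java sketch that is not taken from the original answer. `directFloats` allocates a direct NIO buffer of the kind native code can share, `dot` reads two such buffers with absolute `get` calls, and `dotUnrolled` shows what unrolling the array loop by two might look like. The class and method names are hypothetical, and the unroll factor of two is an arbitrary choice for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical illustration; none of these names come from the answer above.
final class DotSketch {
    static final int N = 50; // vector length used in the question

    // Allocates off-heap storage for n floats in the platform's byte order.
    static FloatBuffer directFloats(int n) {
        return ByteBuffer.allocateDirect(n * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    // Plain loop over two direct buffers, using absolute (index-based) reads.
    static float dot(FloatBuffer x, FloatBuffer y) {
        float sum = 0;
        for (int i = 0; i < N; i++) {
            sum += x.get(i) * y.get(i);
        }
        return sum;
    }

    // Array loop unrolled by two with independent accumulators (N is even).
    static float dotUnrolled(float[] x, float[] y) {
        float s0 = 0, s1 = 0;
        for (int i = 0; i < N; i += 2) {
            s0 += x[i] * y[i];
            s1 += x[i + 1] * y[i + 1];
        }
        return s0 + s1;
    }

    public static void main(String[] args) {
        float[] x = new float[N], y = new float[N];
        FloatBuffer xb = directFloats(N), yb = directFloats(N);
        for (int i = 0; i < N; i++) {
            x[i] = i;
            y[i] = 2;
            xb.put(i, i);
            yb.put(i, 2);
        }
        System.out.println("direct buffers: " + dot(xb, yb));
        System.out.println("unrolled arrays: " + dotUnrolled(x, y));
    }
}
```

Unrolling with separate accumulators changes the order of the floating-point additions, so for arbitrary inputs the result can differ from the simple loop in the last bits. The inputs in `main` are small integers, so both sums are exact there.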
