
Spark SQL/Hive Query Takes Forever With Join

This article looks at how to deal with a Spark SQL/Hive query that takes forever once a join is involved. It should be a useful reference for anyone troubleshooting the same problem.

Problem description

So I'm doing something that should be simple, but apparently it's not in Spark SQL.

If I run the following query in MySQL, the query finishes in a fraction of a second:

SELECT ua.address_id
FROM user u
INNER JOIN user_address ua ON ua.address_id = u.user_address_id
WHERE u.user_id = 123;
                  

However, running the same query in HiveContext under Spark (1.5.1) takes more than 13 seconds. Adding more joins makes the query run for a very, very long time (over 10 minutes). I'm not sure what I'm doing wrong here and how I can speed things up.

The tables are MySQL tables that are loaded into the Hive context as temporary tables. This is running on a single instance, with the database on a remote machine.

• The user table has about 4.8 million rows.
• The user_address table has 350,000 rows.

The tables have foreign key fields, but no explicit FK relationships are defined in the database. I'm using InnoDB.
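For context, this is roughly how a MySQL table ends up as a temporary table in a Spark 1.5 HiveContext over JDBC. This is only a sketch: the connection URL, credentials and variable names are placeholders, not the asker's actual code.

import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)   // sc: an existing SparkContext

// Read the remote MySQL table over JDBC as a DataFrame.
val user = sqlContext.read.format("jdbc").options(Map(
  "url"      -> "jdbc:mysql://db-host:3306/mydb",   // placeholder URL
  "dbtable"  -> "user",
  "user"     -> "...",
  "password" -> "..."
)).load()

// Expose it to Spark SQL by name; note that this does NOT cache the data.
user.registerTempTable("user")

The same would be done for user_address. Registering only attaches a name to the JDBC relation; the rows themselves still live in MySQL.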

The execution plan in Spark:

Plan:

Scan JDBCRelation(jdbc:mysql://.user,[Lorg.apache.spark.Partition;@596f5dfc, {user=, password=, url=jdbc:mysql://, dbtable=user}) [address_id#0L,user_address_id#27L]

Filter (user_id#0L = 123) Scan JDBCRelation(jdbc:mysql://.user_address, [Lorg.apache.spark.Partition;@2ce558f3, {user=, password=, url=jdbc:mysql://, dbtable=user_address}) [address_id#52L]

ConvertToUnsafe ConvertToUnsafe

TungstenExchange hashpartitioning(address_id#52L) TungstenExchange hashpartitioning(user_address_id#27L) TungstenSort [address_id#52L ASC], false, 0 TungstenSort [user_address_id#27L ASC], false, 0

SortMergeJoin [user_address_id#27L], [address_id#52L]

== Physical Plan == TungstenProject [address_id#0L]

Recommended answer

First of all, the type of query you perform is extremely inefficient. As of now (Spark 1.5.0*), to perform a join like this both tables have to be shuffled / hash-partitioned each time the query is executed. It shouldn't be a problem for the users table, where the user_id = 123 predicate is most likely pushed down, but it still requires a full shuffle of user_address.
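A quick way to see whether the user_id = 123 predicate actually reaches the JDBC source is to print the plan of the filtered DataFrame. A minimal sketch, assuming user is the JDBC-backed DataFrame from earlier and that the SQLContext implicits are imported for the $"..." column syntax:

// In the printed physical plan, the user_id filter should sit directly on
// top of (or be pushed into) the Scan JDBCRelation(... dbtable=user ...)
// node rather than being applied after a full table read.
user.where($"user_id" === 123).explain()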

Moreover, if the tables are only registered and not cached, then every execution of this query will fetch the whole user_address table from MySQL into Spark.
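If the data fits in memory and does not change underneath the query, caching the registered tables avoids that repeated fetch. A minimal sketch, assuming a SQLContext/HiveContext named sqlContext and the temporary table names used earlier:

// Mark the temp tables for Spark's in-memory columnar cache; after the
// first query materializes them, later queries stop round-tripping to MySQL.
sqlContext.cacheTable("user")
sqlContext.cacheTable("user_address")

// Equivalent via the DataFrame API:
// user.cache()
// user_address.cache()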

"I'm not sure what I'm doing wrong here and how I can speed things up."

It is not exactly clear why you want to use Spark for this application, but the single-machine setup, small data and this type of query suggest that Spark is not a good fit here.

Generally speaking, if the application logic requires single-record access then Spark SQL won't perform well. It is designed for analytical queries, not as an OLTP database replacement.

If a single table / data frame is much smaller you could try broadcasting.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.broadcast
// import sqlContext.implicits._  // needed for the $"..." column syntax

val user: DataFrame = ???
val user_address: DataFrame = ???

// Filter first so that only the matching user rows are broadcast.
val userFiltered = user.where(???)

// Broadcast the small, filtered side; the larger user_address table is
// then joined against it without a full shuffle.
user_address.join(
  broadcast(userFiltered), $"address_id" === $"user_address_id")
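This works because, after the user_id filter, the user side is tiny: Spark ships it to every executor and joins user_address against it locally, instead of shuffling and sorting both sides the way the SortMergeJoin in the plan above does.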
                  


* This should change in Spark 1.6.0, where SPARK-11410 should enable persistent table partitioning.


