How to use windowing functions efficiently to decide the next N rows based on the previous N values
This article explains how to use window functions efficiently to decide the next N rows based on the previous N values. It should be a useful reference for anyone solving the same problem.

                  問(wèn)題描述

                  我有以下數(shù)據(jù).

                  +----------+----+-------+-----------------------+
                  |      date|item|avg_val|conditions             |
                  +----------+----+-------+-----------------------+
                  |01-10-2020|   x|     10|                      0|
                  |02-10-2020|   x|     10|                      0|
                  |03-10-2020|   x|     15|                      1|
                  |04-10-2020|   x|     15|                      1|
                  |05-10-2020|   x|      5|                      0|
                  |06-10-2020|   x|     13|                      1|
                  |07-10-2020|   x|     10|                      1|
                  |08-10-2020|   x|     10|                      0|
                  |09-10-2020|   x|     15|                      1|
                  |01-10-2020|   y|     10|                      0|
                  |02-10-2020|   y|     18|                      0|
                  |03-10-2020|   y|      6|                      1|
                  |04-10-2020|   y|     10|                      0|
                  |05-10-2020|   y|     20|                      0|
                  +----------+----+-------+-----------------------+
                  

                  我正在嘗試基于

                  1. 如果標(biāo)志值為 0,則新列值將為 0.
                  2. 如果標(biāo)志為 1,則新列將為 1,接下來(lái)的四個(gè) N 行數(shù)將為零,即無(wú)需檢查下一個(gè) N 值.此過(guò)程將應(yīng)用于每個(gè)項(xiàng)目,即按項(xiàng)目分區(qū)將起作用.

                  我在這里使用了 N = 4,

                  I have used here N = 4,
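To make the rule concrete, here is a minimal plain-Scala sketch (added for this write-up, not code from the question; the helper name expectedFlags is hypothetical) that applies the rule to one item's conditions sequence:

// For one item's `conditions` in date order, emit the expected flag per row:
// a 1 opens a window that forces the next n flags to 0.
def expectedFlags(conditions: Seq[Int], n: Int = 4): Seq[Int] =
  conditions.foldLeft((List.empty[Int], 0)) { case ((out, remaining), c) =>
    if (remaining > 0) (0 :: out, remaining - 1) // still inside a suppressed window
    else if (c == 1)   (1 :: out, n)             // flag this row, suppress the next n
    else               (0 :: out, 0)             // a plain 0 outside any window
  }._1.reverse

// item x from the sample: conditions 0,0,1,1,0,1,1,0,1
println(expectedFlags(Seq(0, 0, 1, 1, 0, 1, 1, 0, 1)))
// List(0, 0, 1, 0, 0, 0, 0, 0, 1) -- matches the expected output below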

                  我已經(jīng)使用了下面的代碼,但沒(méi)有有效的窗口函數(shù)是否有任何優(yōu)化的方法.

                  I have used the below code but not effienntly windowing function is there any optimized way.

-- mark any row within 4 rows of a conditions=1 (note: this frame cannot "reset")
DROP TEMPORARY TABLE IF EXISTS t2;
CREATE TEMPORARY TABLE t2
SELECT *,
  MAX(conditions) OVER (PARTITION BY item ORDER BY `date` ROWS 4 PRECEDING) AS new_row
FROM record
ORDER BY item, `date`;

-- number the rows inside each (item, new_row) group
DROP TEMPORARY TABLE IF EXISTS t3;
CREATE TEMPORARY TABLE t3
SELECT *,
  ROW_NUMBER() OVER (PARTITION BY item, new_row ORDER BY `date`) AS e
FROM t2;

-- keep a 1 only on the first row of every 5-row cycle
SELECT *,
  CASE WHEN new_row = 1 AND e % 5 = 1 THEN 1 ELSE 0 END AS flag
FROM t3;

                  輸出類似于

                  +----------+----+-------+-----------------------+-----+
                  |      date|item|avg_val|conditions             |flag |
                  +----------+----+-------+-----------------------+-----+
                  |01-10-2020|   x|     10|                      0|    0|
                  |02-10-2020|   x|     10|                      0|    0|
                  |03-10-2020|   x|     15|                      1|    1|
                  |04-10-2020|   x|     15|                      1|    0|
                  |05-10-2020|   x|      5|                      0|    0|
                  |06-10-2020|   x|     13|                      1|    0|
                  |07-10-2020|   x|     10|                      1|    0|
                  |08-10-2020|   x|     10|                      0|    0|
                  |09-10-2020|   x|     15|                      1|    1|
                  |01-10-2020|   y|     10|                      0|    0|
                  |02-10-2020|   y|     18|                      0|    0|
                  |03-10-2020|   y|      6|                      1|    1|
                  |04-10-2020|   y|     10|                      0|    0|
                  |05-10-2020|   y|     20|                      0|    0|
                  +----------+----+-------+-----------------------+-----+
                  

                  但是我無(wú)法獲得輸出,我嘗試了更多.

                  But i am unable to get the ouput , i have tried more.

Answer

                  正如評(píng)論中所建議的(@nbk 和 @Akina),您將需要某種迭代器來(lái)實(shí)現(xiàn)邏輯.對(duì)于 SparkSQL 和 Spark 2.4+ 版,我們可以使用內(nèi)置函數(shù) aggregate 并設(shè)置一個(gè)結(jié)構(gòu)數(shù)組和一個(gè)計(jì)數(shù)器作為累加器.下面是一個(gè)名為 record 的示例數(shù)據(jù)框和表(假設(shè) conditions 列中的值為 01):

                  As suggested in the comments(by @nbk and @Akina), you will need some sort of iterator to implement the logic. With SparkSQL and Spark version 2.4+, we can use the builtin function aggregate and set an array of structs plus a counter as the accumulator. Below is an example dataframe and table named record(assume values in conditions column are either 0 or 1):

// assumes spark-shell (spark.implicits._ in scope), so toDF works on a Seq
val df = Seq(
    ("01-10-2020", "x", 10, 0), ("02-10-2020", "x", 10, 0), ("03-10-2020", "x", 15, 1),
                      ("01-10-2020", "x", 10, 0), ("02-10-2020", "x", 10, 0), ("03-10-2020", "x", 15, 1),
                      ("04-10-2020", "x", 15, 1), ("05-10-2020", "x", 5, 0), ("06-10-2020", "x", 13, 1),
                      ("07-10-2020", "x", 10, 1), ("08-10-2020", "x", 10, 0), ("09-10-2020", "x", 15, 1),
                      ("01-10-2020", "y", 10, 0), ("02-10-2020", "y", 18, 0), ("03-10-2020", "y", 6, 1),
                      ("04-10-2020", "y", 10, 0), ("05-10-2020", "y", 20, 0)
                  ).toDF("date", "item", "avg_val", "conditions")
                  
                  df.createOrReplaceTempView("record")
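If the higher-order aggregate function is new to you, here is a toy illustration (added for clarity, not part of the original answer) of its four arguments, expr, start, merge and finish, using a running sum:

// folds `merge` over `expr` starting from `start`, then applies `finish`:
// ((((0+1)+2)+3)+4) * 10 = 100
spark.sql("SELECT aggregate(array(1, 2, 3, 4), 0, (acc, x) -> acc + x, acc -> acc * 10) AS s").show()
// s = 100

The real solution below uses the same shape, except the accumulator is a struct holding both the rows processed so far and the counter.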
                  

                  SQL:

                  spark.sql("""
                    SELECT t1.item, m.*
                    FROM (
                      SELECT item,
                        sort_array(collect_list(struct(date,avg_val,int(conditions) as conditions,conditions as flag))) as dta
                      FROM record
                      GROUP BY item
                    ) as t1 LATERAL VIEW OUTER inline(
                      aggregate(
                        /* expr: set up array `dta` from the 2nd element to the last 
                         *       notice that indices for slice function is 1-based, dta[i] is 0-based
                         */
                        slice(dta,2,size(dta)),
                        /* start: set up and initialize `acc` to a struct containing two fields:
                         * - dta: an array of structs with a single element dta[0]
                         * - counter: number of rows after flag=1, can be from `0` to `N+1`
                         */
                        (array(dta[0]) as dta, dta[0].conditions as counter),
                        /* merge: iterate through the `expr` using x and update two fields of `acc`
                         * - dta: append values from x to acc.dta array using concat + array functions
                         *        update flag using `IF(acc.counter IN (0,5) and x.conditions = 1, 1, 0)`
                         * - counter: increment by 1 if acc.counter is between 1 and 4
                         *            , otherwise set value to x.conditions
                         */
                        (acc, x) -> named_struct(
                            'dta', concat(acc.dta, array(named_struct(
                                'date', x.date,
                                'avg_val', x.avg_val,
                                'conditions', x.conditions,
                                'flag', IF(acc.counter IN (0,5) and x.conditions = 1, 1, 0)
                              ))),
                            'counter', IF(acc.counter > 0 and acc.counter < 5, acc.counter+1, x.conditions)
                          ),
                        /* finish: retrieve acc.dta only and discard acc.counter */
                        acc -> acc.dta
                      )
                    ) m
                  """).show(50)
                  

                  結(jié)果:

                  +----+----------+-------+----------+----+
                  |item|      date|avg_val|conditions|flag|
                  +----+----------+-------+----------+----+
                  |   x|01-10-2020|     10|         0|   0|
                  |   x|02-10-2020|     10|         0|   0|
                  |   x|03-10-2020|     15|         1|   1|
                  |   x|04-10-2020|     15|         1|   0|
                  |   x|05-10-2020|      5|         0|   0|
                  |   x|06-10-2020|     13|         1|   0|
                  |   x|07-10-2020|     10|         1|   0|
                  |   x|08-10-2020|     10|         0|   0|
                  |   x|09-10-2020|     15|         1|   1|
                  |   y|01-10-2020|     10|         0|   0|
                  |   y|02-10-2020|     18|         0|   0|
                  |   y|03-10-2020|      6|         1|   1|
                  |   y|04-10-2020|     10|         0|   0|
                  |   y|05-10-2020|     20|         0|   0|
                  +----+----------+-------+----------+----+
                  

                  地點(diǎn):

1. Use groupby to collect the rows of each item into an array of structs named dta, with 4 fields: date, avg_val, conditions and flag, sorted by date.
2. Use the aggregate function to iterate through this array of structs, updating the flag field based on counter and conditions (see the comments in the SQL code above for details).
3. Use LATERAL VIEW and the inline function to explode the resulting array of structs from the aggregate function back into rows (a toy illustration of inline follows below).
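As a small demo of the last step (added here for illustration, not taken from the answer), inline turns an array of structs into one row per struct, with one output column per struct field:

// inline explodes array<struct<...>> into rows; field aliases become columns
spark.sql("""
  SELECT inline(array(
    struct('01-10-2020' AS date, 10 AS avg_val),
    struct('02-10-2020' AS date, 15 AS avg_val)
  ))
""").show()
// +----------+-------+
// |      date|avg_val|
// +----------+-------+
// |01-10-2020|     10|
// |02-10-2020|     15|
// +----------+-------+

LATERAL VIEW OUTER inline(...) does the same per input row, and the OUTER keyword preserves rows whose array is null or empty.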

                  注意事項(xiàng):

(1) The proposed SQL is for N=4, where we have acc.counter IN (0,5) and acc.counter < 5 in the SQL. For any N, adjust these to acc.counter IN (0,N+1) and acc.counter < N+1 (a small helper sketch for this follows after the table). The table below shows the result for N=2 with the same sample data:

                  +----+----------+-------+----------+----+
                  |item|      date|avg_val|conditions|flag|
                  +----+----------+-------+----------+----+
                  |   x|01-10-2020|     10|         0|   0|
                  |   x|02-10-2020|     10|         0|   0|
                  |   x|03-10-2020|     15|         1|   1|
                  |   x|04-10-2020|     15|         1|   0|
                  |   x|05-10-2020|      5|         0|   0|
                  |   x|06-10-2020|     13|         1|   1|
                  |   x|07-10-2020|     10|         1|   0|
                  |   x|08-10-2020|     10|         0|   0|
                  |   x|09-10-2020|     15|         1|   1|
                  |   y|01-10-2020|     10|         0|   0|
                  |   y|02-10-2020|     18|         0|   0|
                  |   y|03-10-2020|      6|         1|   1|
                  |   y|04-10-2020|     10|         0|   0|
                  |   y|05-10-2020|     20|         0|   0|
                  +----+----------+-------+----------+----+
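Since N appears in two places, one way to keep them in sync (a convenience sketch layered on top of the answer, not something the answer itself does) is to build the two expressions with Scala string interpolation and splice them into the query:

val n = 4  // suppression window; set to 2 to reproduce the table above
val flagExpr    = s"IF(acc.counter IN (0, ${n + 1}) AND x.conditions = 1, 1, 0)"
val counterExpr = s"IF(acc.counter > 0 AND acc.counter < ${n + 1}, acc.counter + 1, x.conditions)"
// substitute flagExpr and counterExpr at the two corresponding spots
// in the spark.sql string shown earlier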
                  

(2) We use dta[0] to initialize acc, which fixes both the values and the data types of its fields. Ideally, we should make sure the data types of these fields are right so that all calculations are conducted correctly. For example, when calculating acc.counter, if conditions is a StringType, acc.counter+1 will return a StringType holding a DoubleType value:

                  spark.sql("select '2'+1").show()
                  +---------------------------------------+
                  |(CAST(2 AS DOUBLE) + CAST(1 AS DOUBLE))|
                  +---------------------------------------+
                  |                                    3.0|
                  +---------------------------------------+
                  

                  當(dāng)使用 acc.counter IN (0,5)acc.counter 將其值與整數(shù)進(jìn)行比較時(shí),可能會(huì)產(chǎn)生浮點(diǎn)錯(cuò)誤.5.根據(jù) OP 的反饋,這產(chǎn)生了錯(cuò)誤的結(jié)果,沒(méi)有任何警告/錯(cuò)誤消息.

                  Which could yield floating-point errors when comparing their value with integers using acc.counter IN (0,5) or acc.counter < 5. Based on OP's feedback, this produced incorrect result without any WARNING/ERROR message.

• One workaround is to specify the exact field types using CAST when setting up the 2nd argument of the aggregate function, so that any type mismatch is reported as an ERROR, see below:

                  CAST((array(dta[0]), dta[0].conditions) as struct<dta:array<struct<date:string,avg_val:string,conditions:int,flag:int>>,counter:int>),
                  

                1. 另一種在創(chuàng)建 dta 列時(shí)強(qiáng)制類型的解決方案,在此示例中,請(qǐng)參閱以下代碼中的 int(conditions) as conditions:

                2. Another solution it to force types when creating dta column, in this example, see int(conditions) as conditions in below code:

                  SELECT item,
                    sort_array(collect_list(struct(date,avg_val,int(conditions) as conditions,conditions as flag))) as dta
                  FROM record
                  GROUP BY item
                  

                3. 我們也可以在計(jì)算中強(qiáng)制使用數(shù)據(jù)類型,例如,參見下面的int(acc.counter+1):

                  IF(acc.counter > 0 and acc.counter < 5, int(acc.counter+1), x.conditions)      
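A quick sanity check (assuming a Spark shell), mirroring the '2'+1 example above, that the cast keeps the counter integral:

spark.sql("SELECT int('2' + 1) AS counter").show()
// counter = 3 (an int), instead of the 3.0 double produced without the cast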
                  

                4. 這篇關(guān)于如何有效地使用窗口函數(shù)根據(jù) N 個(gè)先前值來(lái)決定接下來(lái)的 N 個(gè)行的文章就介紹到這了,希望我們推薦的答案對(duì)大家有所幫助,也希望大家多多支持html5模板網(wǎng)!

                  【網(wǎng)站聲明】本站部分內(nèi)容來(lái)源于互聯(lián)網(wǎng),旨在幫助大家更快的解決問(wèn)題,如果有圖片或者內(nèi)容侵犯了您的權(quán)益,請(qǐng)聯(lián)系我們刪除處理,感謝您的支持!

                  相關(guān)文檔推薦

                  reuse the result of a select expression in the quot;GROUP BYquot; clause?(在“GROUP BY中重用選擇表達(dá)式的結(jié)果;條款?)
                  Does ignore option of Pyspark DataFrameWriter jdbc function ignore entire transaction or just offending rows?(Pyspark DataFrameWriter jdbc 函數(shù)的 ignore 選項(xiàng)是忽略整個(gè)事務(wù)還是只是有問(wèn)題的行?) - IT屋-程序員軟件開發(fā)技
                  Error while using INSERT INTO table ON DUPLICATE KEY, using a for loop array(使用 INSERT INTO table ON DUPLICATE KEY 時(shí)出錯(cuò),使用 for 循環(huán)數(shù)組)
                  pyspark mysql jdbc load An error occurred while calling o23.load No suitable driver(pyspark mysql jdbc load 調(diào)用 o23.load 時(shí)發(fā)生錯(cuò)誤 沒(méi)有合適的驅(qū)動(dòng)程序)
                  How to integrate Apache Spark with MySQL for reading database tables as a spark dataframe?(如何將 Apache Spark 與 MySQL 集成以將數(shù)據(jù)庫(kù)表作為 Spark 數(shù)據(jù)幀讀取?)
                  In Apache Spark 2.0.0, is it possible to fetch a query from an external database (rather than grab the whole table)?(在 Apache Spark 2.0.0 中,是否可以從外部數(shù)據(jù)庫(kù)獲取查詢(而不是獲取整個(gè)表)?) - IT屋-程序員軟件開