Problem Description
I have a test table in MySQL with id and name like below:
+----+-------+
| id | name |
+----+-------+
| 1 | Name1 |
+----+-------+
| 2 | Name2 |
+----+-------+
| 3 | Name3 |
+----+-------+
I am using Spark DataFrame to read this data (using JDBC) and modifying the data like this
Dataset<Row> modified = sparkSession.sql("select id, concat(name,' - new') as name from test");
modified.write().mode("overwrite").jdbc(AppProperties.MYSQL_CONNECTION_URL,
"test", connectionProperties);
But my problem is, if I give overwrite mode, it drops the previous table and creates a new table, but does not insert any data.
I tried the same program by reading from a csv file (same data as test table) and overwriting. That worked for me.
Am I missing something here?
Thank You!
The problem is in your code. Because you overwrite a table from which you're trying to read, you effectively obliterate all data before Spark can actually access it.
Remember that Spark is lazy. When you create a Dataset, Spark fetches the required metadata but doesn't load the data, so there is no magic cache that will preserve the original content. The data is loaded only when it is actually required: here, that happens when you execute the write action, and by the time writing starts there is no more data left to fetch.
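To make the timing concrete, here is a minimal sketch of the same flow; sparkSession, connectionProperties and AppProperties.MYSQL_CONNECTION_URL are assumed to be set up as in the question, and selectExpr is used instead of a temp view purely for illustration:

// No rows are fetched here; Spark only records the JDBC source and its schema.
Dataset<Row> df = sparkSession.read()
        .jdbc(AppProperties.MYSQL_CONNECTION_URL, "test", connectionProperties);
Dataset<Row> modified = df.selectExpr("id", "concat(name, ' - new') as name");

// The JDBC scan is triggered only by this action. With overwrite mode on the same
// table, the target is dropped and recreated first, so the scan finds nothing to copy.
modified.write().mode("overwrite")
        .jdbc(AppProperties.MYSQL_CONNECTION_URL, "test", connectionProperties);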
What you need is something like this:
- Create a Dataset.
- Apply required transformations and write data to an intermediate MySQL table.
- TRUNCATE the original input and INSERT INTO ... SELECT from the intermediate table, or DROP the original table and RENAME the intermediate table (a sketch of this flow follows below).
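A rough sketch of that flow, assuming an intermediate table named test_staging; the table name and the plain java.sql (Connection, DriverManager, Statement) cleanup are illustrative, not code from the original answer:

Dataset<Row> modified = sparkSession.sql(
        "select id, concat(name,' - new') as name from test");

// 1. Write the transformed rows to an intermediate MySQL table.
modified.write().mode("overwrite")
        .jdbc(AppProperties.MYSQL_CONNECTION_URL, "test_staging", connectionProperties);

// 2. Swap the data back into the original table with plain JDBC, outside Spark.
try (Connection conn = DriverManager.getConnection(
             AppProperties.MYSQL_CONNECTION_URL, connectionProperties);
     Statement stmt = conn.createStatement()) {
    stmt.executeUpdate("TRUNCATE TABLE test");
    stmt.executeUpdate("INSERT INTO test SELECT * FROM test_staging");
    stmt.executeUpdate("DROP TABLE test_staging");
}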
An alternative, but less favorable, approach would be:
- Create a Dataset.
- Apply required transformations and write data to a persistent Spark table (df.write.saveAsTable(...) or equivalent).
- TRUNCATE the original input.
- Read the data back and save it (spark.table(...).write.jdbc(...)).
- Drop the Spark table (sketched below).
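A sketch of this variant, assuming a Spark-managed table named test_backup; again, the names and the plain-JDBC TRUNCATE are illustrative assumptions:

Dataset<Row> modified = sparkSession.sql(
        "select id, concat(name,' - new') as name from test");

// 1. Persist the transformed rows as a Spark-managed table.
modified.write().mode("overwrite").saveAsTable("test_backup");

// 2. TRUNCATE the original MySQL table (plain JDBC, outside Spark).
try (Connection conn = DriverManager.getConnection(
             AppProperties.MYSQL_CONNECTION_URL, connectionProperties);
     Statement stmt = conn.createStatement()) {
    stmt.executeUpdate("TRUNCATE TABLE test");
}

// 3. Read the data back from the Spark table and append it into MySQL.
sparkSession.table("test_backup").write().mode("append")
        .jdbc(AppProperties.MYSQL_CONNECTION_URL, "test", connectionProperties);

// 4. Drop the Spark table.
sparkSession.sql("DROP TABLE test_backup");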
We cannot stress enough that using Spark cache / persist is not the way to go. Even with a conservative StorageLevel (MEMORY_AND_DISK_2 / MEMORY_AND_DISK_SER_2), cached data can be lost (node failures), leading to silent correctness errors.
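For completeness, this is roughly what that discouraged cache-based shortcut would look like; it is shown only to illustrate the failure mode, not as a recommendation:

Dataset<Row> modified = sparkSession.sql(
        "select id, concat(name,' - new') as name from test");

// org.apache.spark.storage.StorageLevel
modified.persist(StorageLevel.MEMORY_AND_DISK_2());
modified.count();  // forces the rows to be read and cached before the overwrite

modified.write().mode("overwrite")
        .jdbc(AppProperties.MYSQL_CONNECTION_URL, "test", connectionProperties);

// If any cached partition is lost (for example, an executor dies), Spark recomputes it
// from the source table, which has already been overwritten, so the result is silently wrong.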