Problem description
I have two CSV files:
Identity(no,name,Age), which has 10 rows
Location(Address,no,City), which has 100 rows
I need to extract rows from the Identity CSV file and check the no column against the Location CSV file.
Get a single row from the Identity CSV file and check Identity.no against Location.no for each of the 100 rows in the Location CSV file.
If it matches, then in Identity, Location ...
Note: I need to take the 1st row from Identity and compare it with the 100 rows in the Location CSV file, then take the 2nd row and compare it with the 100 rows. This continues for all 10 rows in the Identity CSV file.
The overall results should then be converted to JSON and moved into SQL Server.
Is this possible in Apache NiFi?
Any help is appreciated.
Recommended answer
You can do this in NiFi by using the DistributedMapCache feature, which implements a key/value store for lookups. The setup requires a distributed map cache, plus two flows: one to populate the cache with your Address records, and one to look up the address by the no field.
The DistributedMapCache is defined by two controller services, a DistributedMapCacheServer and a DistributedMapCacheClientService. If your data set is small, you can just use "localhost" as the server.
Populating the cache requires reading the Address file, splitting the records, extracting the no key, and putting key/value pairs into the cache. An approximate flow might include GetFile -> SplitText -> ExtractText -> UpdateAttribute -> PutDistributedMapCache.
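To make the idea of the cache concrete, here is a plain-Python analogy of the populate step. It is not NiFi code: a dict stands in for the distributed map cache, and the sample rows and the helper name build_location_cache are made up for illustration.

```python
import csv
import io

# Made-up sample rows in the Location(Address,no,City) layout from the question.
LOCATION_CSV = """Address,no,City
12 Main St,1,Springfield
34 Oak Ave,2,Shelbyville
56 Pine Rd,3,Ogdenville
"""


def build_location_cache(csv_text):
    """Return a dict mapping each `no` value to its Location row.

    Hypothetical helper: the dict plays the role of the key/value cache.
    """
    cache = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # One value per key: a later row with the same `no`
        # replaces an earlier one in this sketch.
        cache[row["no"]] = row
    return cache


cache = build_location_cache(LOCATION_CSV)
print(cache["2"]["City"])
```

Because this sketch keeps one value per key, it assumes each no appears at most once in the Location file. If several Location rows can share a no, the stored value would need to hold all of them.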
Looking up your identity records is actually fairly similar to the flow above, in that it requires reading the Identity file, splitting the records, extracting the no key, and then fetching the address record. The processor flow might include GetFile -> SplitText -> ExtractText -> UpdateAttribute -> FetchDistributedMapCache.
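The lookup step can be sketched the same way in plain Python, again only as an analogy to the NiFi flow. The sample rows, the LOCATION_CACHE contents, and the helper name match_identities are assumptions made for illustration. Each Identity row does one key lookup instead of being compared against every Location row.

```python
import csv
import io

# Made-up sample rows in the Identity(no,name,Age) layout from the question.
IDENTITY_CSV = """no,name,Age
1,Alice,30
2,Bob,25
9,Carol,41
"""

# Stand-in for an already populated cache, keyed by `no` (hypothetical data).
LOCATION_CACHE = {
    "1": {"Address": "12 Main St", "City": "Springfield"},
    "2": {"Address": "34 Oak Ave", "City": "Shelbyville"},
}


def match_identities(identity_csv, cache):
    """Return merged Identity + Location records for every `no` found in the cache."""
    matched = []
    for identity in csv.DictReader(io.StringIO(identity_csv)):
        location = cache.get(identity["no"])
        if location is None:
            # No Location entry for this `no`; skip the row.
            continue
        matched.append({**identity, **location})
    return matched


matches = match_identities(IDENTITY_CSV, LOCATION_CACHE)
for record in matches:
    print(record)
```

In this sample, the row with no 9 has no entry in the cache, so only the first two Identity rows produce merged records.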
You can convert all or part of the result from CSV to JSON with AttributesToJSON, or maybe ExecuteScript.
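As a rough picture of what the JSON step produces, the following Python sketch serializes merged records into one JSON object per record. The field values and the helper name records_to_json are made up for illustration, and loading the result into SQL Server is not shown.

```python
import json

# Made-up merged records combining Identity and Location fields.
merged_records = [
    {"no": "1", "name": "Alice", "Age": "30",
     "Address": "12 Main St", "City": "Springfield"},
    {"no": "2", "name": "Bob", "Age": "25",
     "Address": "34 Oak Ave", "City": "Shelbyville"},
]


def records_to_json(records):
    """Serialize each record to its own JSON object string (hypothetical helper)."""
    return [json.dumps(record, sort_keys=True) for record in records]


json_lines = records_to_json(merged_records)
for line in json_lines:
    print(line)
```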