Problem description
What is the best way to validate whether a .txt file is:
1. Actually a .txt file, and not some other type of file whose extension was simply changed.
2. In the specified format (so it can be parsed correctly, contains all the relevant information, etc.).
This is all being done in Java: a file is retrieved and then needs to be checked to make sure it is what it is supposed to be. So far the only tool I have found for this task is JHOVE (now JHOVE2), but I haven't found much documentation on using it from Java code rather than from the command line. Thanks for your help.
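For the first point (a renamed binary file), a lightweight heuristic may be enough before any format check. `Files.probeContentType` exists, but its result is implementation-specific and may rely on the file name extension, which is exactly what a renamed file fakes. The sketch below instead looks at the bytes. It assumes the expected files are ASCII or UTF-8: it rejects any NUL byte and any invalid UTF-8 sequence. The class and method names are hypothetical, and the check is not a JHOVE replacement; note that a UTF-16 text file or a Latin-1 file with accented characters would be rejected by it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Heuristic check, assuming the expected files are ASCII or UTF-8:
// "looks like text" = no NUL bytes and the bytes decode cleanly as UTF-8.
class TextFileCheck {

    static boolean looksLikeText(byte[] content) {
        for (byte b : content) {
            if (b == 0) {
                return false; // NUL bytes are common in binary formats, rare in text
            }
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(content));
            return true;
        } catch (CharacterCodingException e) {
            return false; // invalid (or truncated) UTF-8 byte sequence
        }
    }

    // Reads the whole file into memory; fine for modest file sizes
    static boolean looksLikeText(Path file) throws IOException {
        return looksLikeText(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "transactions.txt");
        System.out.println(file + (looksLikeText(file) ? " looks like text" : " does not look like text"));
    }
}
```

A file that passes this check can then go on to the format validation; a file that fails it can be rejected without parsing.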
Recommended answer
Since it sounds like you're looking for a general-purpose way to check formatting, may I recommend regular expressions? You can do all sorts of matching with regex. I've written a simple example below [to all the regex experts out there: have mercy on me if I didn't use the perfect expression ;) ]. You could move the REGEX and MAX_LINES_TO_READ constants into a properties file and edit that to make the check even more general.
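A minimal sketch of the properties-file idea, assuming a hypothetical `validation.properties` with hypothetical keys `regex` and `maxLinesToRead`. One catch worth knowing: `java.util.Properties` treats `\` as an escape character, so the regex backslashes have to be doubled in the file just as they are in a Java string literal.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical validation.properties. Properties files treat '\' as an escape
// character, so every regex backslash has to be doubled here as well:
//
//   regex=.{15}[ ]{5}.{15}[ ]{5}[-]\\d{2}\\.\\d{2}[ ]{9}\\d{2}/\\d{2}/\\d{4}
//   maxLinesToRead=5
class ValidationConfig {
    final Pattern linePattern;
    final int maxLinesToRead;

    ValidationConfig(Properties props) {
        String regex = props.getProperty("regex");
        if (regex == null) {
            throw new IllegalArgumentException("missing 'regex' property");
        }
        // Compiling here makes a bad expression fail at startup (PatternSyntaxException)
        this.linePattern = Pattern.compile(regex);
        this.maxLinesToRead = Integer.parseInt(props.getProperty("maxLinesToRead", "5"));
    }

    // Load settings from a properties file on disk
    static ValidationConfig load(String path) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            props.load(in);
        }
        return new ValidationConfig(props);
    }

    // Load settings from an in-memory string (handy for tests)
    static ValidationConfig fromText(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ValidationConfig(props);
    }
}
```

With this in place, changing the expected layout only means editing the properties file, not recompiling the validator.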
You would basically test your ".txt" file up to a maximum number of lines (however many are needed to establish that the formatting is good; you could also use a regular expression for a header line, or run several different regular expressions as needed to test the formatting), and if all of those lines matched, the file would be flagged as "valid".
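The header-plus-multiple-rules variant could look roughly like this. It is a sketch under stated assumptions: the header text and both line rules are hypothetical (the sample transactions.txt has no header), every checked line must satisfy all rules, and the method returns a single valid/invalid verdict instead of printing per line.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;

class FormatCheck {
    // Hypothetical header line; the sample transactions.txt has no header
    static final Pattern HEADER = Pattern.compile("Category\\s+Payee\\s+Amount\\s+Date");

    // Several independent (hypothetical) rules; every checked line must satisfy all of them
    static final List<Pattern> LINE_RULES = List.of(
            Pattern.compile("\\s-\\d+\\.\\d{2}\\s"),   // a negative amount with exactly two decimals
            Pattern.compile("\\d{2}/\\d{2}/\\d{4}$")); // a MM/DD/YYYY date at the end of the line

    static boolean isValid(BufferedReader reader, int maxLinesToRead) {
        Iterator<String> lines = reader.lines().iterator();
        if (!lines.hasNext() || !HEADER.matcher(lines.next()).matches()) {
            return false; // missing or malformed header
        }
        int checked = 0;
        while (checked < maxLinesToRead && lines.hasNext()) {
            String line = lines.next();
            for (Pattern rule : LINE_RULES) {
                if (!rule.matcher(line).find()) {
                    return false; // first failing line makes the whole file invalid
                }
            }
            checked++;
        }
        return checked > 0; // a header with no data lines is treated as invalid
    }

    public static void main(String[] args) {
        String sample = "Category Payee Amount Date\nFood Food Store -80.31 12/28/2011\n";
        System.out.println(isValid(new BufferedReader(new StringReader(sample)), 5));
    }
}
```

Splitting the check into small rules makes it easier to report which rule failed, at the cost of being looser than one fixed-width expression.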
This is just an example for you to run with. For one thing, you should implement proper exception handling instead of just catching Exception.
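One possible shape for more specific exception handling, sketched under assumptions: the class name and `Result` values are hypothetical, and the file is assumed to be UTF-8. `Files.newBufferedReader` decodes strictly, so bytes that are not valid UTF-8 surface as a `MalformedInputException`, which also gives a rough "not really a text file" signal; a missing file surfaces as `NoSuchFileException`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

class CheckedValidation {
    enum Result { VALID, INVALID_FORMAT, NOT_TEXT, MISSING, UNREADABLE }

    static Result validate(Path file, Pattern linePattern, int maxLines) {
        // Strict UTF-8 decoding: malformed bytes throw instead of being replaced
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            int read = 0;
            while (read < maxLines && (line = br.readLine()) != null) {
                if (!linePattern.matcher(line).matches()) {
                    return Result.INVALID_FORMAT;
                }
                read++;
            }
            return read > 0 ? Result.VALID : Result.INVALID_FORMAT; // empty file is invalid
        } catch (NoSuchFileException e) {
            return Result.MISSING;
        } catch (MalformedInputException e) {
            return Result.NOT_TEXT; // bytes are not valid UTF-8
        } catch (IOException e) {
            return Result.UNREADABLE; // permissions, I/O errors, etc.
        }
    }

    public static void main(String[] args) {
        Pattern amount = Pattern.compile(".*-\\d{2}\\.\\d{2}.*");
        System.out.println(validate(Paths.get("transactions.txt"), amount, 5));
    }
}
```

Returning a result instead of printing leaves the caller free to decide what to do with each failure type.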
For testing your regular expressions in Java, http://www.regexplanet.com/simple/index.html works very well.
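If you would rather check the expression from Java itself, a small harness works too. This sketch assumes the columns are space-padded to 15 characters, as the regex implies, and builds such lines with `String.format`; the `RegexCheck` class and its `row` helper are hypothetical.

```java
import java.util.regex.Pattern;

class RegexCheck {
    // The answer's expression, written as a Java string literal (backslashes doubled)
    static final Pattern LINE = Pattern.compile(
            ".{15}[ ]{5}.{15}[ ]{5}[-]\\d{2}\\.\\d{2}[ ]{9}\\d{2}/\\d{2}/\\d{4}");

    // Builds a line in the assumed fixed-width layout: two 15-character text columns,
    // each followed by 5 spaces, then the amount, 9 spaces, and the date.
    // Note: %-15s pads short values but does not truncate long ones.
    static String row(String category, String payee, String amount, String date) {
        return String.format("%-15s     %-15s     %s         %s", category, payee, amount, date);
    }

    public static void main(String[] args) {
        String[] samples = {
            row("Electric", "Electric Co.", "-50.99", "12/28/2011"),
            row("Entertainment", "Bowling", "-30.4393", "12/28/2011"),
            row("Restaurant", "Mcdonalds", "-10.35", "12/28/11"),
        };
        for (String s : samples) {
            System.out.println(LINE.matcher(s).matches() + "  [" + s + "]");
        }
    }
}
```

Running it should print `true` for the first sample and `false` for the other two, mirroring the invalid amount and the two-digit year in the sample data.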
Here's the "ValidateTxtFile" source...
```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ValidateTxtFile {
    private static final int MAX_LINES_TO_READ = 5;

    // Fixed-width layout: 15-char field, 5 spaces, 15-char field, 5 spaces,
    // minus sign + 2 digits + '.' + 2 digits, 9 spaces, MM/DD/YYYY date.
    // Backslashes must be doubled inside a Java string literal.
    private static final String REGEX =
            ".{15}[ ]{5}.{15}[ ]{5}[-]\\d{2}\\.\\d{2}[ ]{9}\\d{2}/\\d{2}/\\d{4}";

    public void testFile(String fileName) {
        int lineCounter = 1;
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            String line = br.readLine();
            while ((line != null) && (lineCounter <= MAX_LINES_TO_READ)) {
                // Validate the line is formatted correctly based on the regular expression
                if (line.matches(REGEX)) {
                    System.out.println("Line " + lineCounter + " formatted correctly");
                } else {
                    System.out.println("Invalid format on line " + lineCounter + " (" + line + ")");
                }
                line = br.readLine();
                lineCounter++;
            }
        } catch (IOException ex) {
            System.out.println("Exception occurred: " + ex);
        }
    }

    public static void main(String[] args) {
        ValidateTxtFile vtf = new ValidateTxtFile();
        vtf.testFile("transactions.txt");
    }
}
```
Here's what's in "transactions.txt"...
```
Electric            Electric Co.        -50.99         12/28/2011
Food                Food Store          -80.31         12/28/2011
Clothes             Clothing Store      -99.36         12/28/2011
Entertainment       Bowling             -30.4393       12/28/2011
Restaurant          Mcdonalds           -10.35         12/28/11
```
The output when I ran the app was...
```
Line 1 formatted correctly
Line 2 formatted correctly
Line 3 formatted correctly
Invalid format on line 4 (Entertainment       Bowling             -30.4393       12/28/2011)
Invalid format on line 5 (Restaurant          Mcdonalds           -10.35         12/28/11)
```
EDIT 12/29/2011 about 10:00am
Not sure whether performance is a concern here, but FYI: I duplicated the entries in "transactions.txt" several times to build a text file with about 1.3 million rows (about 85 MB), and the whole file was processed in about 7 seconds on my PC. I changed the System.out calls to print only the grand totals of invalid (524,288) and valid (786,432) entries at the end.
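For large files, one small change worth considering: `String.matches(regex)` behaves like `Pattern.matches(regex, input)`, whose documentation recommends compiling a pattern once when it is reused, so calling it per line recompiles the expression every time. The sketch below computes the same kind of grand totals with a precompiled `Pattern`; the class name is hypothetical, UTF-8 input is assumed, and any speedup over the 7-second figure is unmeasured here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class CountFormats {
    // Compiled once and reused for every line
    static final Pattern LINE = Pattern.compile(
            ".{15}[ ]{5}.{15}[ ]{5}[-]\\d{2}\\.\\d{2}[ ]{9}\\d{2}/\\d{2}/\\d{4}");

    // Maps true -> number of valid lines, false -> number of invalid lines.
    // reader.lines() wraps read errors in UncheckedIOException.
    static Map<Boolean, Long> count(BufferedReader reader) {
        return reader.lines().collect(Collectors.partitioningBy(
                line -> LINE.matcher(line).matches(), Collectors.counting()));
    }

    public static void main(String[] args) throws IOException {
        String fileName = args.length > 0 ? args[0] : "transactions.txt";
        try (BufferedReader br = Files.newBufferedReader(Paths.get(fileName), StandardCharsets.UTF_8)) {
            Map<Boolean, Long> totals = count(br);
            System.out.println("Valid: " + totals.get(true) + ", invalid: " + totals.get(false));
        }
    }
}
```

`partitioningBy` always produces both the `true` and `false` keys, so the totals print as 0 rather than `null` when one category is empty.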