Problem description
How do I go about writing a parser (recursive descent?) in C#? For now I just want a simple parser that parses arithmetic expressions (and reads variables?), though later I intend to write XML and HTML parsers for learning purposes. I am doing this because parsers are useful in such a wide range of areas: web development, programming language interpreters, in-house tools, game engines, map and tile editors, etc. So what is the basic theory of writing parsers, and how do I implement one in C#? Is C# the right language for parsers? (I once wrote a simple arithmetic parser in C++ and it was efficient. Will JIT-compiled code perform equally well?) Any helpful resources and articles are welcome and, best of all, code examples (or links to code examples).
Note: Out of curiosity, has anyone answering this question ever implemented a parser in C#?
Recommended answer
I have implemented several parsers in C#, both hand-written and tool-generated.
A very good introductory tutorial on parsing in general is Let's Build a Compiler. It demonstrates how to build a recursive descent parser, and any competent developer can easily translate the concepts from the author's language (I think it was Pascal) to C#. This will teach you how a recursive descent parser works, but writing a full programming language parser by hand is completely impractical.
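To make the recursive descent idea concrete, here is a minimal sketch for the arithmetic-with-variables case from the question. It is not taken from the tutorial; the grammar, the class name `ExprParser`, and the choice to evaluate while parsing (rather than build a tree) are all assumptions made for illustration. Each grammar rule becomes one method, and the methods call each other recursively.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Grammar assumed for this sketch:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | IDENT | '(' expr ')' | '-' factor
public sealed class ExprParser
{
    private readonly string _text;
    private readonly IReadOnlyDictionary<string, double> _vars;
    private int _pos;

    private ExprParser(string text, IReadOnlyDictionary<string, double> vars)
    {
        _text = text;
        _vars = vars;
    }

    // Parses and evaluates in one pass; variables are looked up in 'vars'.
    public static double Evaluate(string text, IReadOnlyDictionary<string, double> vars = null)
    {
        var p = new ExprParser(text, vars ?? new Dictionary<string, double>());
        double value = p.ParseExpr();
        p.SkipSpaces();
        if (p._pos < p._text.Length)
            throw new FormatException($"Unexpected '{p._text[p._pos]}' at {p._pos}");
        return value;
    }

    private double ParseExpr()
    {
        double left = ParseTerm();
        while (true)
        {
            if (Accept('+')) left += ParseTerm();
            else if (Accept('-')) left -= ParseTerm();
            else return left;
        }
    }

    private double ParseTerm()
    {
        double left = ParseFactor();
        while (true)
        {
            if (Accept('*')) left *= ParseFactor();
            else if (Accept('/')) left /= ParseFactor();
            else return left;
        }
    }

    private double ParseFactor()
    {
        if (Accept('-')) return -ParseFactor();
        if (Accept('('))
        {
            double inner = ParseExpr();
            if (!Accept(')')) throw new FormatException($"Expected ')' at {_pos}");
            return inner;
        }
        SkipSpaces();
        int start = _pos;
        if (_pos < _text.Length && char.IsLetter(_text[_pos]))
        {
            // Identifier: a letter followed by letters or digits.
            while (_pos < _text.Length && char.IsLetterOrDigit(_text[_pos])) _pos++;
            string name = _text.Substring(start, _pos - start);
            if (!_vars.TryGetValue(name, out double v))
                throw new KeyNotFoundException($"Unknown variable '{name}'");
            return v;
        }
        // Number: digits with an optional decimal point.
        while (_pos < _text.Length && (char.IsDigit(_text[_pos]) || _text[_pos] == '.')) _pos++;
        if (start == _pos) throw new FormatException($"Expected a number at {_pos}");
        return double.Parse(_text.Substring(start, _pos - start), CultureInfo.InvariantCulture);
    }

    private void SkipSpaces()
    {
        while (_pos < _text.Length && char.IsWhiteSpace(_text[_pos])) _pos++;
    }

    // Consumes 'c' (after optional whitespace) and reports whether it was present.
    private bool Accept(char c)
    {
        SkipSpaces();
        if (_pos < _text.Length && _text[_pos] == c) { _pos++; return true; }
        return false;
    }
}
```

With this sketch, `ExprParser.Evaluate("2 * (x + 1)", new Dictionary<string, double> { ["x"] = 3 })` evaluates to 8. A real parser would more likely separate the scanner from the parser and return a syntax tree instead of a number.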
If you are determined to write a classical recursive descent parser, you should look into tools that generate the code for you (TinyPG, Coco/R, Irony). Keep in mind that there are now other ways to write parsers, which usually perform better and have simpler definitions (e.g. TDOP (top-down operator precedence) parsing or monadic parsing).
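For comparison, a TDOP (Pratt) parser replaces the one-method-per-precedence-level structure with a single loop driven by a table of binding powers. The following is only a rough sketch under assumptions: the class name `PrattParser`, the binding power values, and the `^` power operator are invented for illustration, and it evaluates numbers directly instead of building a tree.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public sealed class PrattParser
{
    // Binding power per infix operator; higher binds tighter.
    // The concrete numbers are arbitrary choices for this sketch.
    private static readonly Dictionary<char, (int Power, bool RightAssoc)> Infix =
        new Dictionary<char, (int Power, bool RightAssoc)>
        {
            ['+'] = (10, false), ['-'] = (10, false),
            ['*'] = (20, false), ['/'] = (20, false),
            ['^'] = (30, true),
        };

    private readonly string _text;
    private int _pos;

    private PrattParser(string text) { _text = text; }

    public static double Evaluate(string text)
    {
        var p = new PrattParser(text);
        double value = p.ParseExpression(0);
        if (p.Peek() != '\0')
            throw new FormatException($"Unexpected '{p.Peek()}' at {p._pos}");
        return value;
    }

    // Core of the technique: keep consuming infix operators while they bind
    // at least as tightly as 'minPower'.
    private double ParseExpression(int minPower)
    {
        double left = ParsePrefix();
        while (Infix.TryGetValue(Peek(), out var op) && op.Power >= minPower)
        {
            char symbol = _text[_pos++];
            // Left-associative operators demand a strictly stronger operator on the right.
            double right = ParseExpression(op.RightAssoc ? op.Power : op.Power + 1);
            left = Apply(symbol, left, right);
        }
        return left;
    }

    // Handles tokens that can start an expression: unary minus, '(' and numbers.
    private double ParsePrefix()
    {
        char c = Peek();
        if (c == '-') { _pos++; return -ParseExpression(25); } // tighter than '*', looser than '^'
        if (c == '(')
        {
            _pos++;
            double inner = ParseExpression(0);
            if (Peek() != ')') throw new FormatException($"Expected ')' at {_pos}");
            _pos++;
            return inner;
        }
        int start = _pos;
        while (_pos < _text.Length && (char.IsDigit(_text[_pos]) || _text[_pos] == '.')) _pos++;
        if (start == _pos) throw new FormatException($"Expected a number at {_pos}");
        return double.Parse(_text.Substring(start, _pos - start), CultureInfo.InvariantCulture);
    }

    private static double Apply(char op, double a, double b)
    {
        switch (op)
        {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/': return a / b;
            default: return Math.Pow(a, b); // '^'
        }
    }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    private char Peek()
    {
        while (_pos < _text.Length && char.IsWhiteSpace(_text[_pos])) _pos++;
        return _pos < _text.Length ? _text[_pos] : '\0';
    }
}
```

Adding an operator here means adding one table entry rather than a new grammar rule and method, which is the sense in which such definitions can be simpler.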
On the topic of whether C# is up to the task: C# has some of the best text libraries out there. A lot of parsers today (in other languages) carry an obscene amount of code to deal with Unicode and the like. I won't comment too much on JITted code because that discussion can get quite religious, but you should be just fine. IronJS is a good example of a parser/runtime on the CLR (even though it's written in F#), and its performance is just shy of Google V8.
Side note: Markup parsers are completely different beasts from language parsers. In the majority of cases they are written by hand, and they are very simple at the scanner/parser level. They are not usually recursive descent, and especially in the case of XML it is better not to write a recursive descent parser (to avoid stack overflows, and because a 'flat' parser can be used in SAX/push mode).
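As a rough illustration of what a 'flat' push-style markup scanner might look like, the sketch below walks the input in a single loop and reports events through callbacks, tracking open elements on an explicit stack instead of the call stack. It is an assumption-laden toy, not a conforming XML parser: the name `FlatMarkupScanner` and the callback parameters are hypothetical, and attributes, comments, CDATA, entities, and declarations are deliberately ignored.

```csharp
using System;
using System.Collections.Generic;

public static class FlatMarkupScanner
{
    // Recognises only <name>, </name>, <name/> and text between tags.
    // Whitespace-only text is skipped.
    public static void Scan(
        string input,
        Action<string> onStart,
        Action<string> onEnd,
        Action<string> onText)
    {
        var open = new Stack<string>(); // explicit stack: nesting depth never touches the call stack
        int pos = 0;
        while (pos < input.Length)
        {
            if (input[pos] != '<')
            {
                // Text run up to the next tag (or end of input).
                int next = input.IndexOf('<', pos);
                if (next < 0) next = input.Length;
                string text = input.Substring(pos, next - pos);
                if (text.Trim().Length > 0) onText(text);
                pos = next;
                continue;
            }

            int close = input.IndexOf('>', pos);
            if (close < 0) throw new FormatException($"Unterminated tag at {pos}");
            string body = input.Substring(pos + 1, close - pos - 1).Trim();
            if (body.Length == 0) throw new FormatException($"Empty tag at {pos}");
            pos = close + 1;

            if (body[0] == '/')
            {
                // End tag: must match the most recently opened element.
                string name = body.Substring(1).Trim();
                if (open.Count == 0 || open.Pop() != name)
                    throw new FormatException($"Unexpected </{name}>");
                onEnd(name);
            }
            else if (body[body.Length - 1] == '/')
            {
                // Self-closing tag: report start and end immediately.
                string name = body.Substring(0, body.Length - 1).Trim();
                onStart(name);
                onEnd(name);
            }
            else
            {
                open.Push(body);
                onStart(body);
            }
        }
        if (open.Count > 0) throw new FormatException($"Unclosed <{open.Peek()}>");
    }
}
```

Because the loop never recurses, deeply nested input grows only the heap-allocated stack, and the caller receives events as they are found rather than waiting for a complete tree.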