問題描述
我使用的是一種僅包含 ()
、|
、空格和字母字符的簡單語言.
給定一個如下的正則表達式:
I'm using a simple language of only ()
, |
, spaces, and alpha characters.
Given a regular expression like the following:
(hello|goodbye) (world(s|)|)
我將如何生成以下數據?
How would I go about generating the following data?
hello worlds
hello world
hello
goodbye worlds
goodbye world
goodbye
我不太確定是否需要先構建一棵樹,或者是否可以遞歸完成.我被困在要使用的數據結構上,以及如何生成字符串.我是否必須保留一堆標記,并索引回部分構建的字符串以連接更多數據?我不知道如何最好地解決這個問題.我是否需要先閱讀整個表達式,然后以某種方式重新排序?
I'm not quite sure if I need to build a tree first, or if it can be done recursively. I'm stuck on what data structures to utilize, and how to generate the strings as I go. Will I have to keep a bunch of markers, and index back into partially built strings to concatenate more data on? I don't know how best to approach this problem. Would I need to read the whole expression first, and re-order it a certain way?
函數簽名將如下所示:
std::vector<std::string> Generate(std::string const&){
//...
}
你建議我做什么?
讓我澄清一下,這里的結果應該總是有限的.在我的特定示例中,表達式中只有 6 個字符串為真.我不確定我的術語在這里是否正確,但我正在尋找的是表達式的完美匹配 - 不是任何包含匹配子字符串的字符串.
Let me clarify that the results should always be finite here. In my particular example, there are only 6 strings that would ever be true for the expression. I'm not sure if my terminology is correct here, but what I'm looking for, is a perfect match of the expression- not any string that contains a substring which matches.
推薦答案
有點遵循 Kieveli 的建議,我有提出一個可行的解決方案.雖然之前沒有提到,但對我來說計算可能產生多少結果也很重要.我使用的是在 github 上找到的名為exrex"的 Python 腳本.尷尬的是,我沒有意識到它也有計數的能力.盡管如此,我還是使用簡化的正則表達式語言在 C++ 中盡我所能地實現了它.如果對我的解決方案感興趣,請繼續閱讀.
Somewhat following Kieveli's advice, I have come up with a working solution. Although not previously mentioned, it was important for me to also get a count of how many results could potentially be generated. I was using a python script called "exrex" which I had found on github. Embarrassingly, I did not realize that it had the capability to also count. Nonetheless, I implemented it the best I could in C++ using my simplified regular expression language. If interested in my solution, please read on.
從面向對象的角度來看,我編寫了一個掃描器來獲取正則表達式(字符串),并將其轉換為標記列表(字符串向量).然后將令牌列表發送到生成 n 叉樹的解析器.所有這些都打包在一個表達式生成器"類中,該類可以接受一個表達式并保存解析樹,以及生成的計數.
掃描儀很重要,因為它標記了空字符串大小寫,您可以在我的問題中看到它顯示為|)".掃描也造就了[詞][運算][詞][運算]...[詞]的圖案.
例如掃描:"(hello|goodbye) (world(s|)|)"
將創建: [][(][hello][|][goodbye][)][ ][(][world][(][s][|][][)][][|][][)][]
From an object oriented stand point, I wrote a scanner to take the regular expression(string), and convert it into a list of tokens(vector of strings). The list of tokens was then sent to a parser which generated an n-ary tree. All of this was packed inside an "expression generator" class that could take an expression and hold the parse tree, as well as the generated count.
The scanner was important because it tokenized the empty string case which you can see in my question appearing as "|)". Scanning also created a pattern of [word] [operation] [word] [operation] ... [word].
For example, scanning: "(hello|goodbye) (world(s|)|)"
will create: [][(][hello][|][goodbye][)][ ][(][world][(][s][|][][)][][|][][)][]
解析樹是一個節點向量.節點包含節點向量的向量.
橙色單元格代表或",而繪制連接的其他框代表和".下面是我的代碼.
The parse tree was a vector of nodes. Nodes contain a vector of vector of nodes.
The orange cells represent the "or"s, and the other boxes that draw the connections, represent the "and"s. Below is my code.
節點頭
#pragma once
#include <string>
#include <vector>
class Function_Expression_Node{
public:
Function_Expression_Node(std::string const& value_in = "", bool const& more_in = false);
std::string value;
bool more;
std::vector<std::vector<Function_Expression_Node>> children;
};
節點來源
#include "function_expression_node.hpp"
Function_Expression_Node::Function_Expression_Node(std::string const& value_in, bool const& more_in)
: value(value_in)
, more(more_in)
{}
掃描儀標題
#pragma once
#include <vector>
#include <string>
class Function_Expression_Scanner{
public: Function_Expression_Scanner() = delete;
public: static std::vector<std::string> Scan(std::string const& expression);
};
掃描儀來源
#include "function_expression_scanner.hpp"
std::vector<std::string> Function_Expression_Scanner::Scan(std::string const& expression){
std::vector<std::string> tokens;
std::string temp;
for (auto const& it: expression){
if (it == '('){
tokens.push_back(temp);
tokens.push_back("(");
temp.clear();
}
else if (it == '|'){
tokens.push_back(temp);
tokens.push_back("|");
temp.clear();
}
else if (it == ')'){
tokens.push_back(temp);
tokens.push_back(")");
temp.clear();
}
else if (isalpha(it) || it == ' '){
temp+=it;
}
}
tokens.push_back(temp);
return tokens;
}
解析器標頭
#pragma once
#include <string>
#include <vector>
#include "function_expression_node.hpp"
class Function_Expression_Parser{
Function_Expression_Parser() = delete;
//get parse tree
public: static std::vector<std::vector<Function_Expression_Node>> Parse(std::vector<std::string> const& tokens, unsigned int & amount);
private: static std::vector<std::vector<Function_Expression_Node>> Build_Parse_Tree(std::vector<std::string>::const_iterator & it, std::vector<std::string>::const_iterator const& end, unsigned int & amount);
private: static Function_Expression_Node Recursive_Build(std::vector<std::string>::const_iterator & it, int & total); //<- recursive
//utility
private: static bool Is_Word(std::string const& it);
};
解析器源
#include "function_expression_parser.hpp"
bool Function_Expression_Parser::Is_Word(std::string const& it){
return (it != "(" && it != "|" && it != ")");
}
Function_Expression_Node Function_Expression_Parser::Recursive_Build(std::vector<std::string>::const_iterator & it, int & total){
Function_Expression_Node sub_root("",true); //<- contains the full root
std::vector<Function_Expression_Node> root;
const auto begin = it;
//calculate the amount
std::vector<std::vector<int>> multiplies;
std::vector<int> adds;
int sub_amount = 1;
while(*it != ")"){
//when we see a "WORD", add it.
if(Is_Word(*it)){
root.push_back(Function_Expression_Node(*it));
}
//when we see a "(", build the subtree,
else if (*it == "("){
++it;
root.push_back(Recursive_Build(it,sub_amount));
//adds.push_back(sub_amount);
//sub_amount = 1;
}
//else we see an "OR" and we do the split
else{
sub_root.children.push_back(root);
root.clear();
//store the sub amount
adds.push_back(sub_amount);
sub_amount = 1;
}
++it;
}
//add the last bit, if there is any
if (!root.empty()){
sub_root.children.push_back(root);
//store the sub amount
adds.push_back(sub_amount);
}
if (!adds.empty()){
multiplies.push_back(adds);
}
//calculate sub total
int or_count = 0;
for (auto const& it: multiplies){
for (auto const& it2: it){
or_count+=it2;
}
if (or_count > 0){
total*=or_count;
}
or_count = 0;
}
/*
std::cout << "---SUB FUNCTION---
";
for (auto it: multiplies){for (auto it2: it){std::cout << "{" << it2 << "} ";}std::cout << "
";}std::cout << "--
";
std::cout << total << std::endl << '
';
*/
return sub_root;
}
std::vector<std::vector<Function_Expression_Node>> Function_Expression_Parser::Build_Parse_Tree(std::vector<std::string>::const_iterator & it, std::vector<std::string>::const_iterator const& end, unsigned int & amount){
std::vector<std::vector<Function_Expression_Node>> full_root;
std::vector<Function_Expression_Node> root;
const auto begin = it;
//calculate the amount
std::vector<int> adds;
int sub_amount = 1;
int total = 0;
while (it != end){
//when we see a "WORD", add it.
if(Is_Word(*it)){
root.push_back(Function_Expression_Node(*it));
}
//when we see a "(", build the subtree,
else if (*it == "("){
++it;
root.push_back(Recursive_Build(it,sub_amount));
}
//else we see an "OR" and we do the split
else{
full_root.push_back(root);
root.clear();
//store the sub amount
adds.push_back(sub_amount);
sub_amount = 1;
}
++it;
}
//add the last bit, if there is any
if (!root.empty()){
full_root.push_back(root);
//store the sub amount
adds.push_back(sub_amount);
sub_amount = 1;
}
//calculate sub total
for (auto const& it: adds){ total+=it; }
/*
std::cout << "---ROOT FUNCTION---
";
for (auto it: adds){std::cout << "[" << it << "] ";}std::cout << '
';
std::cout << total << std::endl << '
';
*/
amount = total;
return full_root;
}
std::vector<std::vector<Function_Expression_Node>> Function_Expression_Parser::Parse(std::vector<std::string> const& tokens, unsigned int & amount){
auto it = tokens.cbegin();
auto end = tokens.cend();
auto parse_tree = Build_Parse_Tree(it,end,amount);
return parse_tree;
}
生成器標題
#pragma once
#include "function_expression_node.hpp"
class Function_Expression_Generator{
//constructors
public: Function_Expression_Generator(std::string const& expression);
public: Function_Expression_Generator();
//transformer
void Set_New_Expression(std::string const& expression);
//observers
public: unsigned int Get_Count();
//public: unsigned int Get_One_Word_Name_Count();
public: std::vector<std::string> Get_Generations();
private: std::vector<std::string> Generate(std::vector<std::vector<Function_Expression_Node>> const& parse_tree);
private: std::vector<std::string> Sub_Generate(std::vector<Function_Expression_Node> const& nodes);
private:
std::vector<std::vector<Function_Expression_Node>> m_parse_tree;
unsigned int amount;
};
生成器源
#include "function_expression_generator.hpp"
#include "function_expression_scanner.hpp"
#include "function_expression_parser.hpp"
//constructors
Function_Expression_Generator::Function_Expression_Generator(std::string const& expression){
auto tokens = Function_Expression_Scanner::Scan(expression);
m_parse_tree = Function_Expression_Parser::Parse(tokens,amount);
}
Function_Expression_Generator::Function_Expression_Generator(){}
//transformer
void Function_Expression_Generator::Set_New_Expression(std::string const& expression){
auto tokens = Function_Expression_Scanner::Scan(expression);
m_parse_tree = Function_Expression_Parser::Parse(tokens,amount);
}
//observers
unsigned int Function_Expression_Generator::Get_Count(){
return amount;
}
std::vector<std::string> Function_Expression_Generator::Get_Generations(){
return Generate(m_parse_tree);
}
std::vector<std::string> Function_Expression_Generator::Generate(std::vector<std::vector<Function_Expression_Node>> const& parse_tree){
std::vector<std::string> results;
std::vector<std::string> more;
for (auto it: parse_tree){
more = Sub_Generate(it);
results.insert(results.end(), more.begin(), more.end());
}
return results;
}
std::vector<std::string> Function_Expression_Generator::Sub_Generate(std::vector<Function_Expression_Node> const& nodes){
std::vector<std::string> results;
std::vector<std::string> more;
std::vector<std::string> new_results;
results.push_back("");
for (auto it: nodes){
if (!it.more){
for (auto & result: results){
result+=it.value;
}
}
else{
more = Generate(it.children);
for (auto m: more){
for (auto r: results){
new_results.push_back(r+m);
}
}
more.clear();
results = new_results;
new_results.clear();
}
}
return results;
}
總而言之,我建議使用 exrex 或本主題中提到的任何其他程序,如果您需要為正則表達式生成匹配項.
In conclusion, I recommend using exrex, or any other programs mentioned in this thread, if you need to generate matches for regular expressions.
這篇關于給定一個正則表達式,我將如何生成與之匹配的所有字符串?的文章就介紹到這了,希望我們推薦的答案對大家有所幫助,也希望大家多多支持html5模板網!