問題描述
我使用的是一種僅包含 ()
、|
、空格和字母字符的簡單語言.
給定一個如下的正則表達(dá)式:
I'm using a simple language of only ()
, |
, spaces, and alpha characters.
Given a regular expression like the following:
(hello|goodbye) (world(s|)|)
我將如何生成以下數(shù)據(jù)?
How would I go about generating the following data?
hello worlds
hello world
hello
goodbye worlds
goodbye world
goodbye
我不太確定是否需要先構(gòu)建一棵樹,或者是否可以遞歸完成.我被困在要使用的數(shù)據(jù)結(jié)構(gòu)上,以及如何生成字符串.我是否必須保留一堆標(biāo)記,并索引回部分構(gòu)建的字符串以連接更多數(shù)據(jù)?我不知道如何最好地解決這個問題.我是否需要先閱讀整個表達(dá)式,然后以某種方式重新排序?
I'm not quite sure if I need to build a tree first, or if it can be done recursively. I'm stuck on what data structures to utilize, and how to generate the strings as I go. Will I have to keep a bunch of markers, and index back into partially built strings to concatenate more data on? I don't know how best to approach this problem. Would I need to read the whole expression first, and re-order it a certain way?
函數(shù)簽名將如下所示:
std::vector<std::string> Generate(std::string const&){
//...
}
你建議我做什么?
讓我澄清一下,這里的結(jié)果應(yīng)該總是有限的.在我的特定示例中,表達(dá)式中只有 6 個字符串為真.我不確定我的術(shù)語在這里是否正確,但我正在尋找的是表達(dá)式的完美匹配 - 不是任何包含匹配子字符串的字符串.
Let me clarify that the results should always be finite here. In my particular example, there are only 6 strings that would ever be true for the expression. I'm not sure if my terminology is correct here, but what I'm looking for, is a perfect match of the expression- not any string that contains a substring which matches.
推薦答案
有點遵循 Kieveli 的建議,我有提出一個可行的解決方案.雖然之前沒有提到,但對我來說計算可能產(chǎn)生多少結(jié)果也很重要.我使用的是在 github 上找到的名為exrex"的 Python 腳本.尷尬的是,我沒有意識到它也有計數(shù)的能力.盡管如此,我還是使用簡化的正則表達(dá)式語言在 C++ 中盡我所能地實現(xiàn)了它.如果對我的解決方案感興趣,請繼續(xù)閱讀.
Somewhat following Kieveli's advice, I have come up with a working solution. Although not previously mentioned, it was important for me to also get a count of how many results could potentially be generated. I was using a python script called "exrex" which I had found on github. Embarrassingly, I did not realize that it had the capability to also count. Nonetheless, I implemented it the best I could in C++ using my simplified regular expression language. If interested in my solution, please read on.
從面向?qū)ο蟮慕嵌葋砜矗揖帉懥艘粋€掃描器來獲取正則表達(dá)式(字符串),并將其轉(zhuǎn)換為標(biāo)記列表(字符串向量).然后將令牌列表發(fā)送到生成 n 叉樹的解析器.所有這些都打包在一個表達(dá)式生成器"類中,該類可以接受一個表達(dá)式并保存解析樹,以及生成的計數(shù).
掃描儀很重要,因為它標(biāo)記了空字符串大小寫,您可以在我的問題中看到它顯示為|)".掃描也造就了[詞][運(yùn)算][詞][運(yùn)算]...[詞]的圖案.
例如掃描:"(hello|goodbye) (world(s|)|)"
將創(chuàng)建: [][(][hello][|][goodbye][)][ ][(][world][(][s][|][][)][][|][][)][]
From an object oriented stand point, I wrote a scanner to take the regular expression(string), and convert it into a list of tokens(vector of strings). The list of tokens was then sent to a parser which generated an n-ary tree. All of this was packed inside an "expression generator" class that could take an expression and hold the parse tree, as well as the generated count.
The scanner was important because it tokenized the empty string case which you can see in my question appearing as "|)". Scanning also created a pattern of [word] [operation] [word] [operation] ... [word].
For example, scanning: "(hello|goodbye) (world(s|)|)"
will create: [][(][hello][|][goodbye][)][ ][(][world][(][s][|][][)][][|][][)][]
解析樹是一個節(jié)點向量.節(jié)點包含節(jié)點向量的向量.
橙色單元格代表或",而繪制連接的其他框代表和".下面是我的代碼.
The parse tree was a vector of nodes. Nodes contain a vector of vector of nodes.
The orange cells represent the "or"s, and the other boxes that draw the connections, represent the "and"s. Below is my code.
節(jié)點頭
#pragma once
#include <string>
#include <vector>
class Function_Expression_Node{
public:
Function_Expression_Node(std::string const& value_in = "", bool const& more_in = false);
std::string value;
bool more;
std::vector<std::vector<Function_Expression_Node>> children;
};
節(jié)點來源
#include "function_expression_node.hpp"
Function_Expression_Node::Function_Expression_Node(std::string const& value_in, bool const& more_in)
: value(value_in)
, more(more_in)
{}
掃描儀標(biāo)題
#pragma once
#include <vector>
#include <string>
class Function_Expression_Scanner{
public: Function_Expression_Scanner() = delete;
public: static std::vector<std::string> Scan(std::string const& expression);
};
掃描儀來源
#include "function_expression_scanner.hpp"
std::vector<std::string> Function_Expression_Scanner::Scan(std::string const& expression){
std::vector<std::string> tokens;
std::string temp;
for (auto const& it: expression){
if (it == '('){
tokens.push_back(temp);
tokens.push_back("(");
temp.clear();
}
else if (it == '|'){
tokens.push_back(temp);
tokens.push_back("|");
temp.clear();
}
else if (it == ')'){
tokens.push_back(temp);
tokens.push_back(")");
temp.clear();
}
else if (isalpha(it) || it == ' '){
temp+=it;
}
}
tokens.push_back(temp);
return tokens;
}
解析器標(biāo)頭
#pragma once
#include <string>
#include <vector>
#include "function_expression_node.hpp"
class Function_Expression_Parser{
Function_Expression_Parser() = delete;
//get parse tree
public: static std::vector<std::vector<Function_Expression_Node>> Parse(std::vector<std::string> const& tokens, unsigned int & amount);
private: static std::vector<std::vector<Function_Expression_Node>> Build_Parse_Tree(std::vector<std::string>::const_iterator & it, std::vector<std::string>::const_iterator const& end, unsigned int & amount);
private: static Function_Expression_Node Recursive_Build(std::vector<std::string>::const_iterator & it, int & total); //<- recursive
//utility
private: static bool Is_Word(std::string const& it);
};
解析器源
#include "function_expression_parser.hpp"
bool Function_Expression_Parser::Is_Word(std::string const& it){
return (it != "(" && it != "|" && it != ")");
}
Function_Expression_Node Function_Expression_Parser::Recursive_Build(std::vector<std::string>::const_iterator & it, int & total){
Function_Expression_Node sub_root("",true); //<- contains the full root
std::vector<Function_Expression_Node> root;
const auto begin = it;
//calculate the amount
std::vector<std::vector<int>> multiplies;
std::vector<int> adds;
int sub_amount = 1;
while(*it != ")"){
//when we see a "WORD", add it.
if(Is_Word(*it)){
root.push_back(Function_Expression_Node(*it));
}
//when we see a "(", build the subtree,
else if (*it == "("){
++it;
root.push_back(Recursive_Build(it,sub_amount));
//adds.push_back(sub_amount);
//sub_amount = 1;
}
//else we see an "OR" and we do the split
else{
sub_root.children.push_back(root);
root.clear();
//store the sub amount
adds.push_back(sub_amount);
sub_amount = 1;
}
++it;
}
//add the last bit, if there is any
if (!root.empty()){
sub_root.children.push_back(root);
//store the sub amount
adds.push_back(sub_amount);
}
if (!adds.empty()){
multiplies.push_back(adds);
}
//calculate sub total
int or_count = 0;
for (auto const& it: multiplies){
for (auto const& it2: it){
or_count+=it2;
}
if (or_count > 0){
total*=or_count;
}
or_count = 0;
}
/*
std::cout << "---SUB FUNCTION---
";
for (auto it: multiplies){for (auto it2: it){std::cout << "{" << it2 << "} ";}std::cout << "
";}std::cout << "--
";
std::cout << total << std::endl << '
';
*/
return sub_root;
}
std::vector<std::vector<Function_Expression_Node>> Function_Expression_Parser::Build_Parse_Tree(std::vector<std::string>::const_iterator & it, std::vector<std::string>::const_iterator const& end, unsigned int & amount){
std::vector<std::vector<Function_Expression_Node>> full_root;
std::vector<Function_Expression_Node> root;
const auto begin = it;
//calculate the amount
std::vector<int> adds;
int sub_amount = 1;
int total = 0;
while (it != end){
//when we see a "WORD", add it.
if(Is_Word(*it)){
root.push_back(Function_Expression_Node(*it));
}
//when we see a "(", build the subtree,
else if (*it == "("){
++it;
root.push_back(Recursive_Build(it,sub_amount));
}
//else we see an "OR" and we do the split
else{
full_root.push_back(root);
root.clear();
//store the sub amount
adds.push_back(sub_amount);
sub_amount = 1;
}
++it;
}
//add the last bit, if there is any
if (!root.empty()){
full_root.push_back(root);
//store the sub amount
adds.push_back(sub_amount);
sub_amount = 1;
}
//calculate sub total
for (auto const& it: adds){ total+=it; }
/*
std::cout << "---ROOT FUNCTION---
";
for (auto it: adds){std::cout << "[" << it << "] ";}std::cout << '
';
std::cout << total << std::endl << '
';
*/
amount = total;
return full_root;
}
std::vector<std::vector<Function_Expression_Node>> Function_Expression_Parser::Parse(std::vector<std::string> const& tokens, unsigned int & amount){
auto it = tokens.cbegin();
auto end = tokens.cend();
auto parse_tree = Build_Parse_Tree(it,end,amount);
return parse_tree;
}
生成器標(biāo)題
#pragma once
#include "function_expression_node.hpp"
class Function_Expression_Generator{
//constructors
public: Function_Expression_Generator(std::string const& expression);
public: Function_Expression_Generator();
//transformer
void Set_New_Expression(std::string const& expression);
//observers
public: unsigned int Get_Count();
//public: unsigned int Get_One_Word_Name_Count();
public: std::vector<std::string> Get_Generations();
private: std::vector<std::string> Generate(std::vector<std::vector<Function_Expression_Node>> const& parse_tree);
private: std::vector<std::string> Sub_Generate(std::vector<Function_Expression_Node> const& nodes);
private:
std::vector<std::vector<Function_Expression_Node>> m_parse_tree;
unsigned int amount;
};
生成器源
#include "function_expression_generator.hpp"
#include "function_expression_scanner.hpp"
#include "function_expression_parser.hpp"
//constructors
Function_Expression_Generator::Function_Expression_Generator(std::string const& expression){
auto tokens = Function_Expression_Scanner::Scan(expression);
m_parse_tree = Function_Expression_Parser::Parse(tokens,amount);
}
Function_Expression_Generator::Function_Expression_Generator(){}
//transformer
void Function_Expression_Generator::Set_New_Expression(std::string const& expression){
auto tokens = Function_Expression_Scanner::Scan(expression);
m_parse_tree = Function_Expression_Parser::Parse(tokens,amount);
}
//observers
unsigned int Function_Expression_Generator::Get_Count(){
return amount;
}
std::vector<std::string> Function_Expression_Generator::Get_Generations(){
return Generate(m_parse_tree);
}
std::vector<std::string> Function_Expression_Generator::Generate(std::vector<std::vector<Function_Expression_Node>> const& parse_tree){
std::vector<std::string> results;
std::vector<std::string> more;
for (auto it: parse_tree){
more = Sub_Generate(it);
results.insert(results.end(), more.begin(), more.end());
}
return results;
}
std::vector<std::string> Function_Expression_Generator::Sub_Generate(std::vector<Function_Expression_Node> const& nodes){
std::vector<std::string> results;
std::vector<std::string> more;
std::vector<std::string> new_results;
results.push_back("");
for (auto it: nodes){
if (!it.more){
for (auto & result: results){
result+=it.value;
}
}
else{
more = Generate(it.children);
for (auto m: more){
for (auto r: results){
new_results.push_back(r+m);
}
}
more.clear();
results = new_results;
new_results.clear();
}
}
return results;
}
總而言之,我建議使用 exrex 或本主題中提到的任何其他程序,如果您需要為正則表達(dá)式生成匹配項.
In conclusion, I recommend using exrex, or any other programs mentioned in this thread, if you need to generate matches for regular expressions.
這篇關(guān)于給定一個正則表達(dá)式,我將如何生成與之匹配的所有字符串?的文章就介紹到這了,希望我們推薦的答案對大家有所幫助,也希望大家多多支持html5模板網(wǎng)!