久久久久久久av_日韩在线中文_看一级毛片视频_日本精品二区_成人深夜福利视频_武道仙尊动漫在线观看

將 invRegex.py 移植到 Javascript (Node.js)

Porting invRegex.py to Javascript (Node.js)(將 invRegex.py 移植到 Javascript (Node.js))
本文介紹了將 invRegex.py 移植到 Javascript (Node.js)的處理方法,對大家解決問題具有一定的參考價值,需要的朋友們下面隨著小編來一起學(xué)習(xí)吧!

問題描述

我一直在嘗試移植 invRegex.py 到 node.js 實現(xiàn)一段時間,但我仍在努力解決它.感謝 ret.js 標(biāo)記器,我已經(jīng)有了正則表達式解析樹,它運行良好,但是以一種內(nèi)存效率高的方式實際生成和連接所有不同元素對我來說是非常具有挑戰(zhàn)性的.為了簡單起見,假設(shè)我有以下正則表達式:

I have been trying to port invRegex.py to a node.js implementation for a while, but I'm still struggling with it. I already have the regular expression parse tree thanks to the ret.js tokenizer and it works pretty well, but the actual generation and concatenation of all the distinct elements in a way that is memory-efficient is revealing very challenging to me. To keep it simple, lets say I have the following regex:

[01]{1,2}@[a-f]

將其提供給 invRegex.py 會產(chǎn)生以下輸出(tabbified 以占用更少的空間):

Feeding that to invRegex.py produces the following output (tabbified to take less space):

 0@a     0@b     0@c     0@d     0@e     0@f
00@a    00@b    00@c    00@d    00@e    00@f
01@a    01@b    01@c    01@d    01@e    01@f
 1@a     1@b     1@c     1@d     1@e     1@f
10@a    10@b    10@c    10@d    10@e    10@f
11@a    11@b    11@c    11@d    11@e    11@f

考慮到我能夠獲取每個單獨的令牌并生成一個包含所有有效單獨輸出的數(shù)組:

Considering I'm able to get each individual token and produce an array of all the valid individual outputs:

[01]{1,2} = function () {
    return ['0', '00', '01', '1', '10', '11'];
};

@ = function () {
    return ['@'];
};

[a-f] = function () {
    return ['a', 'b', 'c', 'd', 'e', 'f'];
};

我可以計算所有數(shù)組的笛卡爾積并得到相同的預(yù)期輸出:

I can compute the cartesian product of all the arrays and get the same expected output:

var _ = require('underscore');

function cartesianProductOf() {
    return _.reduce(arguments, function(a, b) {
        return _.flatten(_.map(a, function(x) {
            return _.map(b, function(y) {
                return x.concat([y]);
            });
        }), true);
    }, [ [] ]);
};

var tokens = [
    ['0', '00', '01', '1', '10', '11'],
    ['@'],
    ['a', 'b', 'c', 'd', 'e', 'f'],
];

var result = cartesianProductOf(tokens[0], tokens[1], tokens[2]);

_.each(result, function (value, key) {
    console.log(value.join(''));
});

問題在于它在內(nèi)存中保存了所有 36 個值,如果我有一個稍微復(fù)雜一點的正則表達式,例如 [az]{0,10} 它會保存 146813779479511 內(nèi)存中的值,這是完全不可行的.我想以異步方式處理這個龐大的列表,將每個生成的組合傳遞給回調(diào),并允許我在我認(rèn)為合適的任何合理點中斷該過程,就像 invRegex.py 或 這個 Haskell 包 - 不幸的是,我無法理解 Haskell,我也不知道如何將 Python 中的生成器行為模仿為 Javascript.

The problem with this is that it holds all the 36 values in memory, if I had a slightly more complicated regular expression, such as [a-z]{0,10} it would hold 146813779479511 values in memory, which is totally unfeasible. I would like to process this huge list in an asynchronous fashion, passing each generated combination to a callback and allowing me to interrupt the process at any sensible point I see fit, much like invRegex.py or this Haskell package - unfortunately I can't understand Haskell and I don't know how to mimic the generator behavior in Python to Javascript either.

我嘗試在節(jié)點 0.11.9(使用 --harmony)中運行幾個簡單的生成器實驗,如下所示:

I tried running a couple of simple generator experiments in node 0.11.9 (with --harmony) like this one:

function* alpha() {
    yield 'a'; yield 'b'; yield 'c';
}

function* numeric() {
    yield '0'; yield '1';
}

function* alphanumeric() {
    yield* alpha() + numeric(); // what's the diff between yield and yield*?
}

for (var i of alphanumeric()) {
    console.log(i);
}

不用說上面的行不通.=/

Needless to say the above doesn't work. =/

在這里把我的頭撞到墻上,所以任何解決這個問題的幫助將不勝感激.

Banging my head against the wall here, so any help tackling this problem would be highly appreciated.

更新:這是 b[a-z]{3} 的示例 ret.js 解析樹:

UPDATE: Here is a sample ret.js parse tree for b[a-z]{3}:

{
    "type": ret.types.ROOT,
    "stack": [
            {
                "type": ret.types.CHAR,
                "value": 98 // b
            },
            {
                "type": ret.types.REPETITION,
                "max": 3,
                "min": 3,
                "value": {
                    "type": ret.types.SET,
                    "not": false,
                    "set": [
                        {
                            "type": ret.types.RANGE,
                            "from": 97, // a
                            "to": 122   // z
                        }
                    ]
                }
            }
        ]
    ]
}

SET/RANGE 類型應(yīng)該產(chǎn)生 26 個不同的值,而父 REPETITION 類型應(yīng)該將之前的值取 3 次方,產(chǎn)生 17576 個不同的組合.如果我要生成一個扁平化的 tokens 數(shù)組,就像我之前為 cartesianProductOf 所做的那樣,中間扁平化值將占用與實際笛卡爾積本身一樣多的空間.

The SET / RANGE type should yield 26 distinct values, and the parent REPETITION type should take that previous value to the power of 3, yielding 17576 distinct combinations. If I was to generate a flattened out tokens array like I did before for cartesianProductOf, the intermediate flattened values would take as much space as the actual cartesian product itself.

我希望這個例子能更好地解釋我面臨的問題.

I hope this example explains better the problem I am facing.

推薦答案

我建議你寫迭代器類.它們易于實現(xiàn)(基本上它們是狀態(tài)機),內(nèi)存占用少,可以組合起來構(gòu)建越來越復(fù)雜的表達式(請向下滾動查看最終結(jié)果),生成的迭代器可以包裝在枚舉器.

I advise you to write iterator classes. They are easy to implement (basically they are state machines), they have a low memory footprint, they can be combined to build up increasingly complex expressions (please scroll down to see the final result), and the resulting iterator can be wrapped in an enumerator.

每個迭代器類都有以下方法:

Each iterator class has the following methods:

  • first:初始化狀態(tài)機(第一次匹配)
  • 下一個:進入下一個狀態(tài)(下一場比賽)
  • ok:最初為真,但一旦下一個"超出最后一個匹配項,則變?yōu)榧?/li>
  • get: 返回當(dāng)前匹配項(作為字符串)
  • 克隆: 克隆對象;對重復(fù)至關(guān)重要,因為每個實例都需要自己的狀態(tài)
  • first: initializes the state machine (first match)
  • next: proceeds to the next state (next match)
  • ok: initially true, but becomes false once 'next' proceeds beyond the last match
  • get: returns the current match (as a string)
  • clone: clones the object; essential for repetition, because each instance needs its own state

從最簡單的情況開始:應(yīng)按字面意思匹配的一個或多個字符序列(例如 /foo/).不用說這只有一個匹配項,所以在第一次調(diào)用 'next' 時,'ok' 將變?yōu)?false.

Start off with the most trivial case: a sequence of one or more characters that should be matched literally (e.g. /foo/). Needless to say this has only one match, so 'ok' will become false upon the first call of 'next'.

function Literal(literal) { this.literal = literal; }

Literal.prototype.first = function() { this.i = 0; };
Literal.prototype.next = function() { this.i++; };
Literal.prototype.ok = function() { return this.i == 0; };
Literal.prototype.get = function() { return this.literal; };
Literal.prototype.clone = function() { return new Literal(this.literal); };

字符類 ([abc]) 也很簡單.構(gòu)造函數(shù)接受一個字符串;如果你更喜歡數(shù)組,這很容易解決.

Character classes ([abc]) are trivial too. The constructor accepts a string of characters; if you prefer arrays, that's easy to fix.

function CharacterClass(chars) { this.chars = chars; }

CharacterClass.prototype.first = function() { this.i = 0; };
CharacterClass.prototype.next = function() { this.i++; };
CharacterClass.prototype.ok = function() { return this.i < this.chars.length; };
CharacterClass.prototype.get = function() { return this.chars.charAt(this.i); };
CharacterClass.prototype.clone = function() { return new CharacterClass(this.chars); };

現(xiàn)在我們需要結(jié)合其他迭代器來形成更復(fù)雜的正則表達式的迭代器.序列只是連續(xù)的兩個或多個模式(如 foo[abc]).

Now we need iterators that combine other iterators to form more complex regular expressions. A sequence is just two or more patterns in a row (like foo[abc]).

function Sequence(iterators) {
   if (arguments.length > 0) {
      this.iterators = iterators.length ? iterators : [new Literal('')];
   }
}
Sequence.prototype.first = function() {
   for (var i in this.iterators) this.iterators[i].first();
};
Sequence.prototype.next = function() {
   if (this.ok()) {
      var i = this.iterators.length;
      while (this.iterators[--i].next(), i > 0 && !this.iterators[i].ok()) {
         this.iterators[i].first();
      }
   }
};
Sequence.prototype.ok = function() {
   return this.iterators[0].ok();
};
Sequence.prototype.get = function() {
   var retval = '';
   for (var i in this.iterators) {
      retval += this.iterators[i].get();
   }
   return retval;
};
Sequence.prototype.clone = function() {
   return new Sequence(this.iterators.map(function(it) { return it.clone(); }));
};

另一種組合迭代器的方法是選擇(也稱為替代品),例如foo|bar.

Another way to combine iterators is the choice (a.k.a. alternatives), e.g. foo|bar.

function Choice(iterators) { this.iterators = iterators; }

Choice.prototype.first = function() {
   this.count = 0;
   for (var i in this.iterators) this.iterators[i].first();
};
Choice.prototype.next = function() {
   if (this.ok()) {
      this.iterators[this.count].next();
      while (this.ok() && !this.iterators[this.count].ok()) this.count++;
   }
};
Choice.prototype.ok = function() {
   return this.count < this.iterators.length;
};
Choice.prototype.get = function() {
   return this.iterators[this.count].get();
};
Choice.prototype.clone = function() {
   return new Choice(this.iterators.map(function(it) { return it.clone(); }));
};

其他正則表達式功能可以通過組合現(xiàn)有的類來實現(xiàn).類繼承是一個很好的方法來做到這一點.例如,可選模式 (x?) 只是在空字符串和 x 之間進行選擇.

Other regex features can be implemented by combining the existing classes. Class inheritance is a great way to do this. For example, an optional pattern (x?) is just a choice between the empty string and x.

function Optional(iterator) {
   if (arguments.length > 0) {
      Choice.call(this, [new Literal(''), iterator]);
   }
}
Optional.prototype = new Choice();

重復(fù) (x{n,m}) 是序列和可選的組合.因為我必須繼承其中一個,所以我的實現(xiàn)由兩個相互依賴的類組成.

Repetition (x{n,m}) is a combination of Sequence and Optional. Because I have to inherit one or the other, my implementation consists of two mutually dependent classes.

function RepeatFromZero(maxTimes, iterator) {
   if (arguments.length > 0) {
      Optional.call(this, new Repeat(1, maxTimes, iterator));
   }
}
RepeatFromZero.prototype = new Optional();

function Repeat(minTimes, maxTimes, iterator) {
   if (arguments.length > 0) {
      var sequence = [];
      for (var i = 0; i < minTimes; i++) {
         sequence.push(iterator.clone());   // need to clone the iterator
      }
      if (minTimes < maxTimes) {
         sequence.push(new RepeatFromZero(maxTimes - minTimes, iterator));
      }
      Sequence.call(this, sequence);
   }
}
Repeat.prototype = new Sequence();

正如我之前所說,迭代器可以包裝到枚舉器中.這只是一個循環(huán),您可以隨時中斷.

As I said earlier, an iterator can be wrapped into an enumerator. This is simply a loop that you can break whenever you want.

function Enumerator(iterator) {
   this.iterator = iterator;

   this.each = function(callback) {
      for (this.iterator.first(); this.iterator.ok(); this.iterator.next()) {
         callback(this.iterator.get());
      }
   };
}

是時候把它們放在一起了.讓我們用一些愚蠢的正則表達式:

Time to put it all together. Let's take some silly regular expression:

([ab]{2}){1,2}|[cd](f|ef{0,2}e)

組合迭代器對象非常簡單:

Composing the iterator object is really straightforward:

function GetIterationsAsHtml() {

   var iterator = new Choice([
      new Repeat(1, 2,
         new Repeat(2, 2, new CharacterClass('ab'))),
      new Sequence([
         new CharacterClass('cd'),
         new Choice([
            new Literal('f'),
            new Sequence([
               new Literal('e'),
               new RepeatFromZero(2, new Literal('f')),
               new Literal('e')
            ])
         ])
      ])
   ]);

   var iterations = '<ol>
';
   var enumerator = new Enumerator(iterator);
   enumerator.each(function(iteration) { iterations += '<li>' + iteration + '</li>
'; });
   return iterations + '</ol>';
}

這會產(chǎn)生 28 個匹配項,但我會省去你的輸出.

This yields 28 matches, but I will spare you the output.

如果我的代碼不符合軟件模式、不兼容瀏覽器(在 Chrome 和 Firefox 上運行良好)或 OOP 不佳,我深表歉意.我只是希望它能讓這個概念變得清晰.

My apologies if my code is not compliant to software patterns, is not browser-compatible (works OK on Chrome and Firefox) or suffers from poor OOP. I just hope it makes the concept clear.

為了完整起見,在 OP 的倡議下,我又實現(xiàn)了一個迭代器類:reference.

for completeness, and following OP's initiative, I implemented one more iterator class: the reference.

引用(1 2 等)獲取較早捕獲組的當(dāng)前匹配項(即括號中的任何內(nèi)容).它的實現(xiàn)與 Literal 非常相似,因為它只有一個匹配項.

A reference (1 2 etc) picks up the current match of an earlier capturing group (i.e. anything in parentheses). Its implementation is very similar to Literal, in that it has exactly one match.

function Reference(iterator) { this.iterator = iterator; }

Reference.prototype.first = function() { this.i = 0; };
Reference.prototype.next  = function() { this.i++; };
Reference.prototype.ok    = function() { return this.i == 0; };
Reference.prototype.get   = function() { return this.iterator.get(); };
Reference.prototype.clone = function() { return new Reference(this.iterator); };

構(gòu)造函數(shù)被賦予一個迭代器,它代表被引用的子模式.以 (foo|bar)([xy])21 為例(產(chǎn)生 fooxxfoo, fooyyfoo, barxxbar, baryybar):

The constructor is given an iterator that represents the referenced subpattern. Taking (foo|bar)([xy])21 as an example (yields fooxxfoo, fooyyfoo, barxxbar, baryybar):

var groups = new Array();

var iterator = new Sequence([
   groups[1] = new Choice([new Literal('foo'), new Literal('bar')]),
   groups[2] = new CharacterClass('xy'),
   new Reference(groups[2]),
   new Reference(groups[1])
]);

在構(gòu)建迭代器類樹時指定捕獲組.我仍然在這里手動執(zhí)行此操作,但最終您希望將其自動化.這只是將您的解析樹映射到類似的迭代器類樹的問題.

Capturing groups are specified as you build up the tree of iterator classes. I am still doing that manually here, but eventually you want this to be automated. That is just a matter of mapping your parse tree to a similar tree of iterator classes.

編輯 2: 這是一個相對簡單的遞歸函數(shù),它將轉(zhuǎn)換由 ret.js 變成一個迭代器.

EDIT 2: here's a relatively simple recursive function that will convert a parse tree produced by ret.js into an iterator.

function ParseTreeMapper() {
    this.capturingGroups = [];
}
ParseTreeMapper.prototype.mapToIterator = function(parseTree) {
    switch (parseTree.type) {
        case ret.types.ROOT:
        case ret.types.GROUP:
            var me = this;
            var mapToSequence = function(parseTrees) {
                return new Sequence(parseTrees.map(function(t) {
                    return me.mapToIterator(t);
                }));
            };
            var group = parseTree.options ?
                new Choice(parseTree.options.map(mapToSequence)) : 
                mapToSequence(parseTree.stack);
            if (parseTree.remember) {
                this.capturingGroups.push(group);
            }
            return group;
        case ret.types.SET:
            return new CharacterClass(this.mapToCharacterClass(parseTree.set));
        case ret.types.REPETITION:
            return new Repeat(parseInt(parseTree.min), parseInt(parseTree.max), this.mapToIterator(parseTree.value));
        case ret.types.REFERENCE:
            var ref = parseInt(parseTree.value) - 1;
            return ref in this.capturingGroups ?
                new Reference(this.capturingGroups[ref]) :
                new Literal('<ReferenceOutOfRange>');
        case ret.types.CHAR:
            return new Literal(String.fromCharCode(parseTree.value));
        default:
            return new Literal('<UnsupportedType>');
    }
};
ParseTreeMapper.prototype.mapToCharacterClass = function(parseTrees) {
    var chars = '';
    for (var i in parseTrees) {
        var tree = parseTrees[i];
        switch (tree.type) {
            case ret.types.CHAR:
                chars += String.fromCharCode(tree.value);
                break;
            case ret.types.RANGE:
                for (var code = tree.from; code <= tree.to; code++) {
                    chars += String.fromCharCode(code);
                }
                break;
        }
    }
    return chars;
};

用法:

var regex = 'b[a-n]{3}';
var parseTree = ret(regex);    // requires ret.js
var iterator = new ParseTreeMapper().mapToIterator(parseTree);

我在這個演示中將所有組件放在一起:http://jsfiddle.net/Pmnwk/3/

I put all components together in this demo: http://jsfiddle.net/Pmnwk/3/

注意:不支持許多正則表達式語法結(jié)構(gòu)(錨、前瞻、后視、遞歸),但我想它已經(jīng)與 invRegex.py 相當(dāng).

Note: many regex syntax constructs are not supported (anchors, look-ahead, look-behind, recursion), but I guess it is already pretty much up to par with invRegex.py.

這篇關(guān)于將 invRegex.py 移植到 Javascript (Node.js)的文章就介紹到這了,希望我們推薦的答案對大家有所幫助,也希望大家多多支持html5模板網(wǎng)!

【網(wǎng)站聲明】本站部分內(nèi)容來源于互聯(lián)網(wǎng),旨在幫助大家更快的解決問題,如果有圖片或者內(nèi)容侵犯了您的權(quán)益,請聯(lián)系我們刪除處理,感謝您的支持!

相關(guān)文檔推薦

discord.js v12: How do I await for messages in a DM channel?(discord.js v12:我如何等待 DM 頻道中的消息?)
how to make my bot mention the person who gave that bot command(如何讓我的機器人提及發(fā)出該機器人命令的人)
How to fix Must use import to load ES Module discord.js(如何修復(fù)必須使用導(dǎo)入來加載 ES 模塊 discord.js)
How to list all members from a specific server?(如何列出來自特定服務(wù)器的所有成員?)
Discord bot: Fix ‘FFMPEG not found’(Discord bot:修復(fù)“找不到 FFMPEG)
Welcome message when joining discord Server using discord.js(使用 discord.js 加入 discord 服務(wù)器時的歡迎消息)
主站蜘蛛池模板: 丝袜一区二区三区 | 欧美日韩成人在线观看 | 久久久国产精品 | 超黄视频网站 | 欧美9999| 国产99久久精品一区二区永久免费 | 国产一区二区三区 | av国产在线观看 | 伊人伊成久久人综合网站 | jdav视频在线观看免费 | 国产精品乱码一区二区三区 | 亚洲精品福利视频 | 91精品国产综合久久久久 | 中文字幕成人av | 91色视频在线观看 | 午夜电影福利 | 日韩欧美专区 | 国产精品九九视频 | 国产精品久久99 | 一区二区三区视频 | 一道本在线 | 欧美日韩久久 | 亚洲精品4 | 天天综合网7799精品 | 免费黄色片在线观看 | 午夜影院在线观看 | 日韩伦理一区二区三区 | 欧美一级黄色免费看 | 一二三区av | 日日操夜夜操天天操 | 国产成人精品综合 | 亚洲成人日韩 | 欧美亚洲另类在线 | 高清欧美性猛交 | 亚洲福利| 天天操天天摸天天干 | 欧美一级片a | 亚洲免费人成在线视频观看 | 99精品免费久久久久久久久日本 | 欧美一区二区免费 | 欧美一级黑人aaaaaaa做受 |