久久久久久久av_日韩在线中文_看一级毛片视频_日本精品二区_成人深夜福利视频_武道仙尊动漫在线观看

將 invRegex.py 移植到 Javascript (Node.js)

Porting invRegex.py to Javascript (Node.js)(將 invRegex.py 移植到 Javascript (Node.js))
本文介紹了將 invRegex.py 移植到 Javascript (Node.js)的處理方法,對大家解決問題具有一定的參考價值,需要的朋友們下面隨著小編來一起學習吧!

問題描述

我一直在嘗試移植 invRegex.py 到 node.js 實現(xiàn)一段時間,但我仍在努力解決它.感謝 ret.js 標記器,我已經(jīng)有了正則表達式解析樹,它運行良好,但是以一種內(nèi)存效率高的方式實際生成和連接所有不同元素對我來說是非常具有挑戰(zhàn)性的.為了簡單起見,假設我有以下正則表達式:

I have been trying to port invRegex.py to a node.js implementation for a while, but I'm still struggling with it. I already have the regular expression parse tree thanks to the ret.js tokenizer and it works pretty well, but the actual generation and concatenation of all the distinct elements in a way that is memory-efficient is revealing very challenging to me. To keep it simple, lets say I have the following regex:

[01]{1,2}@[a-f]

將其提供給 invRegex.py 會產(chǎn)生以下輸出(tabbified 以占用更少的空間):

Feeding that to invRegex.py produces the following output (tabbified to take less space):

 0@a     0@b     0@c     0@d     0@e     0@f
00@a    00@b    00@c    00@d    00@e    00@f
01@a    01@b    01@c    01@d    01@e    01@f
 1@a     1@b     1@c     1@d     1@e     1@f
10@a    10@b    10@c    10@d    10@e    10@f
11@a    11@b    11@c    11@d    11@e    11@f

考慮到我能夠獲取每個單獨的令牌并生成一個包含所有有效單獨輸出的數(shù)組:

Considering I'm able to get each individual token and produce an array of all the valid individual outputs:

[01]{1,2} = function () {
    return ['0', '00', '01', '1', '10', '11'];
};

@ = function () {
    return ['@'];
};

[a-f] = function () {
    return ['a', 'b', 'c', 'd', 'e', 'f'];
};

我可以計算所有數(shù)組的笛卡爾積并得到相同的預期輸出:

I can compute the cartesian product of all the arrays and get the same expected output:

var _ = require('underscore');

function cartesianProductOf() {
    return _.reduce(arguments, function(a, b) {
        return _.flatten(_.map(a, function(x) {
            return _.map(b, function(y) {
                return x.concat([y]);
            });
        }), true);
    }, [ [] ]);
};

var tokens = [
    ['0', '00', '01', '1', '10', '11'],
    ['@'],
    ['a', 'b', 'c', 'd', 'e', 'f'],
];

var result = cartesianProductOf(tokens[0], tokens[1], tokens[2]);

_.each(result, function (value, key) {
    console.log(value.join(''));
});

問題在于它在內(nèi)存中保存了所有 36 個值,如果我有一個稍微復雜一點的正則表達式,例如 [az]{0,10} 它會保存 146813779479511 內(nèi)存中的值,這是完全不可行的.我想以異步方式處理這個龐大的列表,將每個生成的組合傳遞給回調(diào),并允許我在我認為合適的任何合理點中斷該過程,就像 invRegex.py 或 這個 Haskell 包 - 不幸的是,我無法理解 Haskell,我也不知道如何將 Python 中的生成器行為模仿為 Javascript.

The problem with this is that it holds all the 36 values in memory, if I had a slightly more complicated regular expression, such as [a-z]{0,10} it would hold 146813779479511 values in memory, which is totally unfeasible. I would like to process this huge list in an asynchronous fashion, passing each generated combination to a callback and allowing me to interrupt the process at any sensible point I see fit, much like invRegex.py or this Haskell package - unfortunately I can't understand Haskell and I don't know how to mimic the generator behavior in Python to Javascript either.

我嘗試在節(jié)點 0.11.9(使用 --harmony)中運行幾個簡單的生成器實驗,如下所示:

I tried running a couple of simple generator experiments in node 0.11.9 (with --harmony) like this one:

function* alpha() {
    yield 'a'; yield 'b'; yield 'c';
}

function* numeric() {
    yield '0'; yield '1';
}

function* alphanumeric() {
    yield* alpha() + numeric(); // what's the diff between yield and yield*?
}

for (var i of alphanumeric()) {
    console.log(i);
}

不用說上面的行不通.=/

Needless to say the above doesn't work. =/

在這里把我的頭撞到墻上,所以任何解決這個問題的幫助將不勝感激.

Banging my head against the wall here, so any help tackling this problem would be highly appreciated.

更新:這是 b[a-z]{3} 的示例 ret.js 解析樹:

UPDATE: Here is a sample ret.js parse tree for b[a-z]{3}:

{
    "type": ret.types.ROOT,
    "stack": [
            {
                "type": ret.types.CHAR,
                "value": 98 // b
            },
            {
                "type": ret.types.REPETITION,
                "max": 3,
                "min": 3,
                "value": {
                    "type": ret.types.SET,
                    "not": false,
                    "set": [
                        {
                            "type": ret.types.RANGE,
                            "from": 97, // a
                            "to": 122   // z
                        }
                    ]
                }
            }
        ]
    ]
}

SET/RANGE 類型應該產(chǎn)生 26 個不同的值,而父 REPETITION 類型應該將之前的值取 3 次方,產(chǎn)生 17576 個不同的組合.如果我要生成一個扁平化的 tokens 數(shù)組,就像我之前為 cartesianProductOf 所做的那樣,中間扁平化值將占用與實際笛卡爾積本身一樣多的空間.

The SET / RANGE type should yield 26 distinct values, and the parent REPETITION type should take that previous value to the power of 3, yielding 17576 distinct combinations. If I was to generate a flattened out tokens array like I did before for cartesianProductOf, the intermediate flattened values would take as much space as the actual cartesian product itself.

我希望這個例子能更好地解釋我面臨的問題.

I hope this example explains better the problem I am facing.

推薦答案

我建議你寫迭代器類.它們易于實現(xiàn)(基本上它們是狀態(tài)機),內(nèi)存占用少,可以組合起來構(gòu)建越來越復雜的表達式(請向下滾動查看最終結(jié)果),生成的迭代器可以包裝在枚舉器.

I advise you to write iterator classes. They are easy to implement (basically they are state machines), they have a low memory footprint, they can be combined to build up increasingly complex expressions (please scroll down to see the final result), and the resulting iterator can be wrapped in an enumerator.

每個迭代器類都有以下方法:

Each iterator class has the following methods:

  • first:初始化狀態(tài)機(第一次匹配)
  • 下一個:進入下一個狀態(tài)(下一場比賽)
  • ok:最初為真,但一旦下一個"超出最后一個匹配項,則變?yōu)榧?/li>
  • get: 返回當前匹配項(作為字符串)
  • 克隆: 克隆對象;對重復至關(guān)重要,因為每個實例都需要自己的狀態(tài)
  • first: initializes the state machine (first match)
  • next: proceeds to the next state (next match)
  • ok: initially true, but becomes false once 'next' proceeds beyond the last match
  • get: returns the current match (as a string)
  • clone: clones the object; essential for repetition, because each instance needs its own state

從最簡單的情況開始:應按字面意思匹配的一個或多個字符序列(例如 /foo/).不用說這只有一個匹配項,所以在第一次調(diào)用 'next' 時,'ok' 將變?yōu)?false.

Start off with the most trivial case: a sequence of one or more characters that should be matched literally (e.g. /foo/). Needless to say this has only one match, so 'ok' will become false upon the first call of 'next'.

function Literal(literal) { this.literal = literal; }

Literal.prototype.first = function() { this.i = 0; };
Literal.prototype.next = function() { this.i++; };
Literal.prototype.ok = function() { return this.i == 0; };
Literal.prototype.get = function() { return this.literal; };
Literal.prototype.clone = function() { return new Literal(this.literal); };

字符類 ([abc]) 也很簡單.構(gòu)造函數(shù)接受一個字符串;如果你更喜歡數(shù)組,這很容易解決.

Character classes ([abc]) are trivial too. The constructor accepts a string of characters; if you prefer arrays, that's easy to fix.

function CharacterClass(chars) { this.chars = chars; }

CharacterClass.prototype.first = function() { this.i = 0; };
CharacterClass.prototype.next = function() { this.i++; };
CharacterClass.prototype.ok = function() { return this.i < this.chars.length; };
CharacterClass.prototype.get = function() { return this.chars.charAt(this.i); };
CharacterClass.prototype.clone = function() { return new CharacterClass(this.chars); };

現(xiàn)在我們需要結(jié)合其他迭代器來形成更復雜的正則表達式的迭代器.序列只是連續(xù)的兩個或多個模式(如 foo[abc]).

Now we need iterators that combine other iterators to form more complex regular expressions. A sequence is just two or more patterns in a row (like foo[abc]).

function Sequence(iterators) {
   if (arguments.length > 0) {
      this.iterators = iterators.length ? iterators : [new Literal('')];
   }
}
Sequence.prototype.first = function() {
   for (var i in this.iterators) this.iterators[i].first();
};
Sequence.prototype.next = function() {
   if (this.ok()) {
      var i = this.iterators.length;
      while (this.iterators[--i].next(), i > 0 && !this.iterators[i].ok()) {
         this.iterators[i].first();
      }
   }
};
Sequence.prototype.ok = function() {
   return this.iterators[0].ok();
};
Sequence.prototype.get = function() {
   var retval = '';
   for (var i in this.iterators) {
      retval += this.iterators[i].get();
   }
   return retval;
};
Sequence.prototype.clone = function() {
   return new Sequence(this.iterators.map(function(it) { return it.clone(); }));
};

另一種組合迭代器的方法是選擇(也稱為替代品),例如foo|bar.

Another way to combine iterators is the choice (a.k.a. alternatives), e.g. foo|bar.

function Choice(iterators) { this.iterators = iterators; }

Choice.prototype.first = function() {
   this.count = 0;
   for (var i in this.iterators) this.iterators[i].first();
};
Choice.prototype.next = function() {
   if (this.ok()) {
      this.iterators[this.count].next();
      while (this.ok() && !this.iterators[this.count].ok()) this.count++;
   }
};
Choice.prototype.ok = function() {
   return this.count < this.iterators.length;
};
Choice.prototype.get = function() {
   return this.iterators[this.count].get();
};
Choice.prototype.clone = function() {
   return new Choice(this.iterators.map(function(it) { return it.clone(); }));
};

其他正則表達式功能可以通過組合現(xiàn)有的類來實現(xiàn).類繼承是一個很好的方法來做到這一點.例如,可選模式 (x?) 只是在空字符串和 x 之間進行選擇.

Other regex features can be implemented by combining the existing classes. Class inheritance is a great way to do this. For example, an optional pattern (x?) is just a choice between the empty string and x.

function Optional(iterator) {
   if (arguments.length > 0) {
      Choice.call(this, [new Literal(''), iterator]);
   }
}
Optional.prototype = new Choice();

重復 (x{n,m}) 是序列和可選的組合.因為我必須繼承其中一個,所以我的實現(xiàn)由兩個相互依賴的類組成.

Repetition (x{n,m}) is a combination of Sequence and Optional. Because I have to inherit one or the other, my implementation consists of two mutually dependent classes.

function RepeatFromZero(maxTimes, iterator) {
   if (arguments.length > 0) {
      Optional.call(this, new Repeat(1, maxTimes, iterator));
   }
}
RepeatFromZero.prototype = new Optional();

function Repeat(minTimes, maxTimes, iterator) {
   if (arguments.length > 0) {
      var sequence = [];
      for (var i = 0; i < minTimes; i++) {
         sequence.push(iterator.clone());   // need to clone the iterator
      }
      if (minTimes < maxTimes) {
         sequence.push(new RepeatFromZero(maxTimes - minTimes, iterator));
      }
      Sequence.call(this, sequence);
   }
}
Repeat.prototype = new Sequence();

正如我之前所說,迭代器可以包裝到枚舉器中.這只是一個循環(huán),您可以隨時中斷.

As I said earlier, an iterator can be wrapped into an enumerator. This is simply a loop that you can break whenever you want.

function Enumerator(iterator) {
   this.iterator = iterator;

   this.each = function(callback) {
      for (this.iterator.first(); this.iterator.ok(); this.iterator.next()) {
         callback(this.iterator.get());
      }
   };
}

是時候把它們放在一起了.讓我們用一些愚蠢的正則表達式:

Time to put it all together. Let's take some silly regular expression:

([ab]{2}){1,2}|[cd](f|ef{0,2}e)

組合迭代器對象非常簡單:

Composing the iterator object is really straightforward:

function GetIterationsAsHtml() {

   var iterator = new Choice([
      new Repeat(1, 2,
         new Repeat(2, 2, new CharacterClass('ab'))),
      new Sequence([
         new CharacterClass('cd'),
         new Choice([
            new Literal('f'),
            new Sequence([
               new Literal('e'),
               new RepeatFromZero(2, new Literal('f')),
               new Literal('e')
            ])
         ])
      ])
   ]);

   var iterations = '<ol>
';
   var enumerator = new Enumerator(iterator);
   enumerator.each(function(iteration) { iterations += '<li>' + iteration + '</li>
'; });
   return iterations + '</ol>';
}

這會產(chǎn)生 28 個匹配項,但我會省去你的輸出.

This yields 28 matches, but I will spare you the output.

如果我的代碼不符合軟件模式、不兼容瀏覽器(在 Chrome 和 Firefox 上運行良好)或 OOP 不佳,我深表歉意.我只是希望它能讓這個概念變得清晰.

My apologies if my code is not compliant to software patterns, is not browser-compatible (works OK on Chrome and Firefox) or suffers from poor OOP. I just hope it makes the concept clear.

為了完整起見,在 OP 的倡議下,我又實現(xiàn)了一個迭代器類:reference.

for completeness, and following OP's initiative, I implemented one more iterator class: the reference.

引用(1 2 等)獲取較早捕獲組的當前匹配項(即括號中的任何內(nèi)容).它的實現(xiàn)與 Literal 非常相似,因為它只有一個匹配項.

A reference (1 2 etc) picks up the current match of an earlier capturing group (i.e. anything in parentheses). Its implementation is very similar to Literal, in that it has exactly one match.

function Reference(iterator) { this.iterator = iterator; }

Reference.prototype.first = function() { this.i = 0; };
Reference.prototype.next  = function() { this.i++; };
Reference.prototype.ok    = function() { return this.i == 0; };
Reference.prototype.get   = function() { return this.iterator.get(); };
Reference.prototype.clone = function() { return new Reference(this.iterator); };

構(gòu)造函數(shù)被賦予一個迭代器,它代表被引用的子模式.以 (foo|bar)([xy])21 為例(產(chǎn)生 fooxxfoo, fooyyfoo, barxxbar, baryybar):

The constructor is given an iterator that represents the referenced subpattern. Taking (foo|bar)([xy])21 as an example (yields fooxxfoo, fooyyfoo, barxxbar, baryybar):

var groups = new Array();

var iterator = new Sequence([
   groups[1] = new Choice([new Literal('foo'), new Literal('bar')]),
   groups[2] = new CharacterClass('xy'),
   new Reference(groups[2]),
   new Reference(groups[1])
]);

在構(gòu)建迭代器類樹時指定捕獲組.我仍然在這里手動執(zhí)行此操作,但最終您希望將其自動化.這只是將您的解析樹映射到類似的迭代器類樹的問題.

Capturing groups are specified as you build up the tree of iterator classes. I am still doing that manually here, but eventually you want this to be automated. That is just a matter of mapping your parse tree to a similar tree of iterator classes.

編輯 2: 這是一個相對簡單的遞歸函數(shù),它將轉(zhuǎn)換由 ret.js 變成一個迭代器.

EDIT 2: here's a relatively simple recursive function that will convert a parse tree produced by ret.js into an iterator.

function ParseTreeMapper() {
    this.capturingGroups = [];
}
ParseTreeMapper.prototype.mapToIterator = function(parseTree) {
    switch (parseTree.type) {
        case ret.types.ROOT:
        case ret.types.GROUP:
            var me = this;
            var mapToSequence = function(parseTrees) {
                return new Sequence(parseTrees.map(function(t) {
                    return me.mapToIterator(t);
                }));
            };
            var group = parseTree.options ?
                new Choice(parseTree.options.map(mapToSequence)) : 
                mapToSequence(parseTree.stack);
            if (parseTree.remember) {
                this.capturingGroups.push(group);
            }
            return group;
        case ret.types.SET:
            return new CharacterClass(this.mapToCharacterClass(parseTree.set));
        case ret.types.REPETITION:
            return new Repeat(parseInt(parseTree.min), parseInt(parseTree.max), this.mapToIterator(parseTree.value));
        case ret.types.REFERENCE:
            var ref = parseInt(parseTree.value) - 1;
            return ref in this.capturingGroups ?
                new Reference(this.capturingGroups[ref]) :
                new Literal('<ReferenceOutOfRange>');
        case ret.types.CHAR:
            return new Literal(String.fromCharCode(parseTree.value));
        default:
            return new Literal('<UnsupportedType>');
    }
};
ParseTreeMapper.prototype.mapToCharacterClass = function(parseTrees) {
    var chars = '';
    for (var i in parseTrees) {
        var tree = parseTrees[i];
        switch (tree.type) {
            case ret.types.CHAR:
                chars += String.fromCharCode(tree.value);
                break;
            case ret.types.RANGE:
                for (var code = tree.from; code <= tree.to; code++) {
                    chars += String.fromCharCode(code);
                }
                break;
        }
    }
    return chars;
};

用法:

var regex = 'b[a-n]{3}';
var parseTree = ret(regex);    // requires ret.js
var iterator = new ParseTreeMapper().mapToIterator(parseTree);

我在這個演示中將所有組件放在一起:http://jsfiddle.net/Pmnwk/3/

I put all components together in this demo: http://jsfiddle.net/Pmnwk/3/

注意:不支持許多正則表達式語法結(jié)構(gòu)(錨、前瞻、后視、遞歸),但我想它已經(jīng)與 invRegex.py 相當.

Note: many regex syntax constructs are not supported (anchors, look-ahead, look-behind, recursion), but I guess it is already pretty much up to par with invRegex.py.

這篇關(guān)于將 invRegex.py 移植到 Javascript (Node.js)的文章就介紹到這了,希望我們推薦的答案對大家有所幫助,也希望大家多多支持html5模板網(wǎng)!

【網(wǎng)站聲明】本站部分內(nèi)容來源于互聯(lián)網(wǎng),旨在幫助大家更快的解決問題,如果有圖片或者內(nèi)容侵犯了您的權(quán)益,請聯(lián)系我們刪除處理,感謝您的支持!

相關(guān)文檔推薦

discord.js v12: How do I await for messages in a DM channel?(discord.js v12:我如何等待 DM 頻道中的消息?)
how to make my bot mention the person who gave that bot command(如何讓我的機器人提及發(fā)出該機器人命令的人)
How to fix Must use import to load ES Module discord.js(如何修復必須使用導入來加載 ES 模塊 discord.js)
How to list all members from a specific server?(如何列出來自特定服務器的所有成員?)
Discord bot: Fix ‘FFMPEG not found’(Discord bot:修復“找不到 FFMPEG)
Welcome message when joining discord Server using discord.js(使用 discord.js 加入 discord 服務器時的歡迎消息)
主站蜘蛛池模板: 黄色一级大片在线免费看国产一 | 黄色大片在线免费观看 | 国产激情网 | 午夜视频免费观看 | 五月开心激情网 | 五月婷婷色 | 欧美日韩在线视频观看 | 国产一区二区三区免费视频 | 欧美精品在线播放 | 国产精品美女视频 | 亚洲精品一二 | 日韩欧美一区二区在线 | 亚洲综合激情网 | 天天天天天操 | 国产91清纯白嫩初高中在线观看 | 爱爱短视频 | www.成人网| 国产成人精品免费 | 成人av一区 | 中文字幕在线观看日本 | 成人午夜又粗又硬又大 | 国产成人区 | 性生活视频网站 | 亚洲一级黄色片 | 国产精品不卡 | 日韩在线高清 | 中文字幕在线观看日本 | 日韩国产在线播放 | 手机av片 | 超碰成人av| 在线播放av网站 | 麻豆一区二区三区 | aaa亚洲| 午夜av片 | 免费毛片在线播放免费 | 四虎成人在线 | 日韩精品大片 | 在线看日韩 | 日本人の夫妇交换 | 理论片中文字幕 | 在线黄网|