Question
In my Python application, I need to write a regular expression that matches a C++ for or while loop that has been terminated with a semi-colon (;). For example, it should match this:
for (int i = 0; i < 10; i++);
...but not this:
for (int i = 0; i < 10; i++)
This looks trivial at first glance, until you realise that the text between the opening and closing parentheses may contain other parentheses, for example:
for (int i = funcA(); i < funcB(); i++);
I'm using Python's re module. Right now my regular expression looks like this (I've left my comments in so you can understand it more easily):
# match any line that begins with a "for" or "while" statement:
^\s*(for|while)\s*
\(  # match the initial opening parenthesis
    # Now make a named group 'balanced' which matches a balanced substring.
    (?P<balanced>
        # A balanced substring is either something that is not a parenthesis:
        [^()]
        | # …or a parenthesised string:
        \( # A parenthesised string begins with an opening parenthesis
            (?P=balanced)* # …followed by a sequence of balanced substrings
        \) # …and ends with a closing parenthesis
    )*  # Look for a sequence of balanced substrings
\)  # Finally, the outer closing parenthesis.
# must end with a semi-colon to match:
\s*;\s*
This works perfectly for all the above cases, but it breaks as soon as you try to make the third part of the for loop contain a function call, like so:
for (int i = 0; i < 10; doSomethingTo(i));
I think it breaks because as soon as you put some text between the opening and closing parentheses, the "balanced" group matches that contained text, and thus the (?P=balanced) part doesn't work any more since it won't match (due to the fact that the text inside the parentheses is different).
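That suspicion fits how Python's re module treats (?P=name): it is a backreference that must repeat the exact text the named group captured, not a way to re-run the group's pattern. A minimal sketch of that behaviour, using a made-up group name tag and made-up test strings:

```python
import re

# (?P=tag) must repeat the exact text captured by (?P<tag>...),
# it does not re-apply the [a-z]+ pattern.
pattern = re.compile(r"(?P<tag>[a-z]+)-(?P=tag)")

print(bool(pattern.fullmatch("abc-abc")))  # True: same text on both sides
print(bool(pattern.fullmatch("abc-xyz")))  # False: same pattern, different text
```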
In my Python code I'm using the VERBOSE and MULTILINE flags, and creating the regular expression like so:
import re

REGEX_STR = r"""# match any line that begins with a "for" or "while" statement:
^\s*(for|while)\s*
\(  # match the initial opening parenthesis
    # Now make a named group 'balanced' which matches
    # a balanced substring.
    (?P<balanced>
        # A balanced substring is either something that is not a parenthesis:
        [^()]
        | # …or a parenthesised string:
        \( # A parenthesised string begins with an opening parenthesis
            (?P=balanced)* # …followed by a sequence of balanced substrings
        \) # …and ends with a closing parenthesis
    )*  # Look for a sequence of balanced substrings
\)  # Finally, the outer closing parenthesis.
# must end with a semi-colon to match:
\s*;\s*"""
REGEX_OBJ = re.compile(REGEX_STR, re.MULTILINE | re.VERBOSE)
Can anyone suggest an improvement to this regular expression? It's getting too complicated for me to get my head around.
Recommended answer
You could write a small, very simple routine that does it, without using a regular expression:
- Set a position counter pos so that it points to just before the opening bracket after your for or while.
- Set an open-brackets counter openBr to 0.
- Now keep incrementing pos, reading the characters at the respective positions, and increment openBr when you see an opening bracket, and decrement it when you see a closing bracket. That will increment it once at the beginning, for the first opening bracket in "for (", increment and decrement it some more for any brackets in between, and set it back to 0 when your for bracket closes.
- So, stop when openBr is 0 again.
The stopping position is the closing bracket of your for(...). Now you can check whether a semicolon follows it or not.
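The counting routine described above could be sketched in Python roughly as follows. The function name is_semicolon_terminated_loop and the small LOOP_START helper pattern are illustrative assumptions, not part of the original answer. The sketch also assumes the loop header sits on a single line and that no parentheses appear inside string literals, character literals, or comments.

```python
import re

# Hypothetical helper pattern: finds "for" or "while" and stops right
# before the opening bracket of the loop header.
LOOP_START = re.compile(r"\s*(?:for|while)\b\s*(?=\()")


def is_semicolon_terminated_loop(line):
    """Return True if line is a for/while header directly followed by ';'."""
    m = LOOP_START.match(line)
    if m is None:
        return False

    pos = m.end()   # index of the opening bracket after for/while
    open_br = 0     # number of currently open brackets

    while pos < len(line):
        ch = line[pos]
        if ch == "(":
            open_br += 1
        elif ch == ")":
            open_br -= 1
            if open_br == 0:
                break  # pos is now the closing bracket of for(...)
        pos += 1
    else:
        return False  # reached the end without the brackets balancing

    # Only whitespace may separate the closing bracket from the semicolon.
    return line[pos + 1:].lstrip().startswith(";")


print(is_semicolon_terminated_loop("for (int i = 0; i < 10; i++);"))               # True
print(is_semicolon_terminated_loop("for (int i = 0; i < 10; i++)"))                # False
print(is_semicolon_terminated_loop("for (int i = 0; i < 10; doSomethingTo(i));"))  # True
```

Because nesting depth is tracked with a plain counter, the routine handles arbitrarily deep parentheses, which is what the backreference-based pattern could not do.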