Regular Expression Groups in C# - 白红宇

Published: 2019-06-19

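When a single regular expression has to extract several distinct parts (subexpressions), capturing groups are the tool to use.

In C#, the regular expression types nest as shown below; Group is the class that represents a captured group.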

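Regex -> MatchCollection (the collection of matches)
          -> Match (a single match)
                -> GroupCollection (the groups/subexpressions contained in a single match)
                      -> Group (the content of one group/subexpression)
                            -> CaptureCollection (every capture made by the group; a group inside a quantifier can capture more than once)
                                  -> Capture (a single captured substring)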
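The hierarchy above can be walked level by level. The following is a minimal console sketch, not taken from the original article: the class name `CaptureDemo`, the helper `CapturesOfGroup1`, and the sample pattern `(\w)+` are assumptions chosen for illustration. It shows that when a group sits inside a quantifier, `Group.Value` holds only the last capture, while `Group.Captures` keeps every capture.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Hypothetical helper: returns every capture made by group 1 of the first match.
    public static List<string> CapturesOfGroup1(string input, string pattern)
    {
        var result = new List<string>();
        Match match = Regex.Match(input, pattern);
        if (!match.Success)
        {
            return result; // no match, so no captures
        }

        Group group = match.Groups[1];               // Match -> GroupCollection -> Group
        foreach (Capture capture in group.Captures)  // Group -> CaptureCollection -> Capture
        {
            result.Add(capture.Value);
        }
        return result;
    }

    public static void Main()
    {
        string input = "abc";
        string pattern = @"(\w)+"; // group 1 is repeated once per character

        Match match = Regex.Match(input, pattern);
        Console.WriteLine(match.Groups[1].Value);                              // prints "c" (the last capture)
        Console.WriteLine(string.Join(",", CapturesOfGroup1(input, pattern))); // prints "a,b,c"
    }
}
```

For a group that is not repeated, `Captures` contains a single entry whose value equals `Group.Value`, so most code only needs the `Group` level.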

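A group can be accessed in two ways:

1. Access by index

The regular expression ((\d+)([a-z]))\s+ yields four groups in total. Groups are numbered from left to right, in the order of their opening parentheses:

Groups[0]    is the match itself, that is, the whole expression ((\d+)([a-z]))\s+

Groups[1]    is the subexpression ((\d+)([a-z]))

Groups[2]    is the subexpression (\d+)

Groups[3]    is the subexpression ([a-z])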

                         
// Requires: using System.Text.RegularExpressions;
// Response.Write assumes the code runs inside an ASP.NET page.
string text = "1A 2B 3C 4D 5E 6F 7G 8H 9I 10J 11Q 12J 13K 14L 15M 16N ffee80 #800080";
Response.Write(text + "<br/>");
string strPatten = @"((\d+)([a-z]))\s+";
Regex rex = new Regex(strPatten, RegexOptions.IgnoreCase);
MatchCollection matches = rex.Matches(text);
// Iterate over the matches
foreach (Match match in matches)
{
    GroupCollection groups = match.Groups;
    Response.Write(string.Format("<br/>{0} has {1} groups: {2}<br/>",
        match.Value, groups.Count, strPatten));
    // List the groups inside this match
    for (int i = 0; i < groups.Count; i++)
    {
        Response.Write(string.Format("Group {0} is {1}, index {2}, length {3}<br/>",
            i, groups[i].Value, groups[i].Index, groups[i].Length));
    }
}
/*
 * Output:

1A 2B 3C 4D 5E 6F 7G 8H 9I 10J 11Q 12J 13K 14L 15M 16N ffee80 #800080
1A has 4 groups: ((\d+)([a-z]))\s+
Group 0 is 1A , index 0, length 3
Group 1 is 1A, index 0, length 2
Group 2 is 1, index 0, length 1
Group 3 is A, index 1, length 1

....

*/
        
      

2. Access by name

Use (?<xxx>subexpression) to give a group a name; the content of that group/subexpression can then be read through Groups["xxx"].

                         
// Requires: using System.Text.RegularExpressions;
// Response.Write assumes the code runs inside an ASP.NET page.
string text = "I've found this amazing URL at , and then find ";
Response.Write(text + "<br/>");
string pattern = @"\b(?<protocol>\S+)://(?<address>\S+)\b";
// Escape the angle brackets so the pattern is displayed correctly in HTML
Response.Write(pattern.Replace("<", "&lt;").Replace(">", "&gt;") + "<br/><br/>");
MatchCollection matches = Regex.Matches(text, pattern);
foreach (Match match in matches)
{
    GroupCollection groups = match.Groups;
    Response.Write(string.Format("URL: {0}; Protocol: {1}; Address: {2} <br/>",
        match.Value,
        groups["protocol"].Value,
        groups["address"].Value));
}
/*
 * Output:

I've found this amazing URL at , and then find

\b(?<protocol>\S+)://(?<address>\S+)\b

URL: ; Protocol: http; Address: www.sohu.com

URL: ; Protocol: ftp; Address: ftp.sohu.comisbetter

*/
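Named access can also be tried outside an ASP.NET page. The following is a minimal console sketch that reuses the article's pattern; the class name `NamedGroupDemo`, the helper `FirstUrlParts`, and the sample text with its example.com and example.org URLs are made up for illustration. It also shows that a named group still has a number, which `Regex.GroupNumberFromName` reports.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Same pattern as in the article: two named groups, "protocol" and "address".
    public const string Pattern = @"\b(?<protocol>\S+)://(?<address>\S+)\b";

    // Hypothetical helper: returns "protocol|address" for the first match,
    // or an empty string when the text contains no match.
    public static string FirstUrlParts(string text)
    {
        Match match = Regex.Match(text, Pattern);
        if (!match.Success)
        {
            return string.Empty;
        }
        return match.Groups["protocol"].Value + "|" + match.Groups["address"].Value;
    }

    public static void Main()
    {
        // Hypothetical sample text; the URLs are placeholders.
        string text = "See https://example.com/docs, or ftp://files.example.org today";

        foreach (Match match in Regex.Matches(text, Pattern))
        {
            Console.WriteLine("URL: {0}; Protocol: {1}; Address: {2}",
                match.Value,
                match.Groups["protocol"].Value,
                match.Groups["address"].Value);
        }

        // Named groups are numbered too: with no unnamed groups in the pattern,
        // "protocol" is group 1 and "address" is group 2.
        var regex = new Regex(Pattern);
        Console.WriteLine(regex.GroupNumberFromName("protocol")); // prints 1
    }
}
```

The trailing `\b` is what keeps the comma after the first URL out of the address group: `\S+` first consumes the comma, then backtracks one character to reach a word boundary.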