', 'Sahara and I-15 East', 'Sands and Wynn South Gate', 'Silverado Ranch and I-15 (west side)']
dirMap = {'N': 'North', 'S': 'South', 'E': 'East', 'W': 'West'}
dirPattern = re.compile(r'[^]([NSEW
The modified regex in this code should do the trick. It also handles cases such as 'W of', 'E of', and the like:
import re

nameList = [
    'Boulder Highway and US 95 NB',
    'Boulder Hwy and US 95 SB',
    'Buffalo and Summerlin N',
    'Charleston and I-215 W',
    'Eastern and I-215 S',
    'Flamingo and NB I-15',
    'S Buffalo and Summerlin',
    'Flamingo and SB I-15',
    'Gibson and I-215 EB',
    'I-15 at 3.5 miles N of Jean',
    'I-15 NB S I-215 (dual)',
    'I-15 SB 4.3 mile N of Primm',
    'I-15 SB S of Russell',
    'I-515 SB at Eastern W',
    'I-580 at I-80 N E',
    'I-580 at I-80 S W',
    'I-80 at E 4TH St Kietzke Ln',
    'I-80 East of W McCarran',
    'LV Blvd at I-215 S',
    'S Buffalo and I-215 W',
    'S Decatur and I-215 WB',
    'Sahara and I-15 East',
    'Sands and Wynn South Gate',
    'Silverado Ranch and I-15 (west side)',
]

# Map each single-letter abbreviation to the full direction name.
dirMap = {'N': 'North', 'S': 'South', 'E': 'East', 'W': 'West'}

dirPattern = re.compile(r'(?:^| )(?<! at )(?<! of )([NSEW])B?(?! of )(?: |$)')

print('name\tdirString\tdirection')
for name in nameList:
    # search() returns only the first match in the name, or None.
    match = dirPattern.search(name)
    direction = None
    dirString = None
    if match:
        dirString = match.group(1)
        direction = dirMap.get(dirString)
    print('> %s\t\t%s\t%s' % (name, dirString, direction))
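The regex can be read as follows:
(?:^| )      start at the beginning of the string or at a space
(?<! at )    not preceded by ' at '
(?<! of )    not preceded by ' of '
([NSEW])     any one of 'N', 'S', 'E', 'W' (this is what ends up in match.group(1))
B?           optionally followed by 'B' (as in "bound")
(?! of )     not followed by ' of '
(?: |$)      end at a space or at the end of the string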
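The breakdown above can also be kept next to the pattern itself. The following is a minimal sketch, assuming Python's re.VERBOSE flag; the names DIR_PATTERN, DIR_MAP and find_direction are illustrative and do not appear in the original code. Verbose mode ignores unescaped whitespace, so each literal space is written as [ ] here.

```python
import re

# Hypothetical names for illustration; the pattern is the same as dirPattern.
DIR_MAP = {'N': 'North', 'S': 'South', 'E': 'East', 'W': 'West'}

DIR_PATTERN = re.compile(r"""
    (?:^|[ ])        # start of the string or a space
    (?<![ ]at[ ])    # not preceded by ' at '
    (?<![ ]of[ ])    # not preceded by ' of '
    ([NSEW])         # the direction letter, captured as group 1
    B?               # optional 'B' (as in "bound")
    (?![ ]of[ ])     # not followed by ' of '
    (?:[ ]|$)        # a space or the end of the string
""", re.VERBOSE)


def find_direction(name):
    """Return the full name of the first direction letter found, or None."""
    match = DIR_PATTERN.search(name)
    return DIR_MAP[match.group(1)] if match else None


print(find_direction('Gibson and I-215 EB'))
print(find_direction('I-80 at E 4TH St Kietzke Ln'))
```

The first call should find the 'E' in 'EB'. The second should find nothing, because the only candidate letter directly follows ' at '.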
The resulting output (name, matched letter, direction) is:
Boulder Highway and US 95 NB          N     North
Boulder Hwy and US 95 SB              S     South
Buffalo and Summerlin N               N     North
Charleston and I-215 W                W     West
Eastern and I-215 S                   S     South
Flamingo and NB I-15                  N     North
S Buffalo and Summerlin               S     South
Flamingo and SB I-15                  S     South
Gibson and I-215 EB                   E     East
I-15 at 3.5 miles N of Jean           None  None
I-15 NB S I-215 (dual)                N     North
I-15 SB 4.3 mile N of Primm           S     South
I-15 SB S of Russell                  S     South
I-515 SB at Eastern W                 S     South
I-580 at I-80 N E                     N     North
I-580 at I-80 S W                     S     South
I-80 at E 4TH St Kietzke Ln           None  None
I-80 East of W McCarran               None  None
LV Blvd at I-215 S                    S     South
S Buffalo and I-215 W                 S     South
S Decatur and I-215 WB                S     South
Sahara and I-15 East                  None  None
Sands and Wynn South Gate             None  None
Silverado Ranch and I-15 (west side)  None  None
Side note: in case a match at the end of the string is not wanted, the regex would be:
dirPattern = re.compile(r'(?:^| )(?<! at )(?<! of )([NSEW])B? (?!of )')
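To see what the variant changes, the two patterns can be run side by side. This is a small sketch rather than part of the original answer; WITH_END, NO_END and first_letter are hypothetical names introduced only for the comparison.

```python
import re

# The main pattern: a match may end at a space or at the end of the string.
WITH_END = re.compile(r'(?:^| )(?<! at )(?<! of )([NSEW])B?(?! of )(?: |$)')
# The side-note variant: the letter (plus optional 'B') must be followed by a space.
NO_END = re.compile(r'(?:^| )(?<! at )(?<! of )([NSEW])B? (?!of )')


def first_letter(pattern, name):
    """Return the first captured direction letter, or None if nothing matches."""
    match = pattern.search(name)
    return match.group(1) if match else None


samples = [
    'Charleston and I-215 W',   # letter only at the end of the string
    'Flamingo and NB I-15',     # letter in the middle
    'S Buffalo and I-215 W',    # letter at the start and at the end
]
for name in samples:
    print(name, first_letter(WITH_END, name), first_letter(NO_END, name), sep='\t')
```

Names whose only direction letter sits at the very end, such as 'Charleston and I-215 W', should match under the first pattern but not under the variant. Names with a letter earlier in the string should behave the same under both.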