Notes on regular expressions
}
</script>
<script language=javascript>
ati('#', 'http://HdhCmsTestcnblogs测试数据/UpLoadFile/Product/20101010162153846.jpg', '加厚青色围脖');
To match the line break in the snippet above, use the following:
<script language=javascript>\s\sati
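Two \s are used here, presumably because the line break is \r\n, which is two characters.
\s*? can be used instead for a more general pattern.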
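The difference between the two forms can be sketched as follows. The sample string and the variable names are made up for illustration; only the two patterns come from the notes above.

```python
import re

# Hypothetical sample with Windows line endings (\r\n) before the call to ati.
crlf_html = "}\r\n</script>\r\n<script language=javascript>\r\nati('#', 'x.jpg', 'scarf');"
# The same text with Unix line endings (\n only).
lf_html = crlf_html.replace("\r\n", "\n")

# Exactly two whitespace characters: fits \r\n, but not a bare \n.
strict = re.compile(r"<script language=javascript>\s\sati")
# Lazy "any amount of whitespace": fits \r\n, \n, or no line break at all.
loose = re.compile(r"<script language=javascript>\s*?ati")

strict_crlf = strict.search(crlf_html) is not None
strict_lf = strict.search(lf_html) is not None
loose_crlf = loose.search(crlf_html) is not None
loose_lf = loose.search(lf_html) is not None

print(strict_crlf, strict_lf, loose_crlf, loose_lf)
```

The \s\s form only succeeds on the \r\n input, which is consistent with the guess that the page uses Windows line endings.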
<tr><td><a href='(?P<link>/Product/Detail_\d*\.html)'[\s\S]*?><img src='(?P<img>[^']*)' width='130' height='130'
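Relying too heavily on [\s\S]* causes excessive backtracking, which can hang the program. The pattern above is my improved version; the earlier one just hung. I did not save that original, which used [\s\S]*? everywhere. I recommend replacing it with a negated character class such as [^']*.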
Use
<div class="goodsItem">[\s\S]*?<a href="(?P<link>[^"]*?)" target="_blank"><img src="(?P<img>[^"]*?)"
rather than
<div class="goodsItem">[\s\S]*?<a href="(?P<link>[\s\S]*?)" target="_blank"><img src="(?P<img>[\s\S]*?)"
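Besides the backtracking cost, a negated class also keeps a capture inside its attribute. The sketch below uses a made-up two-item page in which the first link has no target attribute; the HTML and variable names are assumptions, not the original site's markup.

```python
import re

# Hypothetical page: the first item lacks target="_blank", the second has it.
html = (
    '<div class="goodsItem"><a href="/a.html"><img src="/a.jpg"></a></div>'
    '<div class="goodsItem"><a href="/b.html" target="_blank"><img src="/b.jpg"></a></div>'
)

# Captures with [\s\S]*? may run past the closing quote of the attribute.
loose = re.compile(
    r'<div class="goodsItem">[\s\S]*?<a href="(?P<link>[\s\S]*?)" target="_blank">'
    r'<img src="(?P<img>[\s\S]*?)"'
)
# Captures with [^"]*? can never cross a double quote.
strict = re.compile(
    r'<div class="goodsItem">[\s\S]*?<a href="(?P<link>[^"]*?)" target="_blank">'
    r'<img src="(?P<img>[^"]*?)"'
)

loose_link = loose.search(html).group("link")
strict_link = strict.search(html).group("link")

print(repr(loose_link))   # starts at /a.html and swallows markup up to /b.html
print(repr(strict_link))  # just the href of the item that has target="_blank"
```

With the loose pattern the link group silently absorbs the whole first item, while the negated class restricts it to a single attribute value.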
re.finditer(pattern, string[, flags])
Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.
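A minimal sketch of re.finditer combined with named groups, which is how patterns like the ones above are typically consumed. The HTML fragment is invented for illustration, and the pattern is a simplified variant of the product-list pattern, not the exact one used on the original site.

```python
import re

# Hypothetical product list with two entries.
html = (
    "<tr><td><a href='/Product/Detail_101.html' target='_blank'>"
    "<img src='/UpLoadFile/Product/a.jpg' width='130' height='130'></a></td>"
    "<td><a href='/Product/Detail_102.html' target='_blank'>"
    "<img src='/UpLoadFile/Product/b.jpg' width='130' height='130'></a></td></tr>"
)

# [^>]* skips the remaining attributes of <a>; [^']* stays inside the src value.
pattern = re.compile(
    r"<a href='(?P<link>/Product/Detail_\d*\.html)'[^>]*>"
    r"<img src='(?P<img>[^']*)' width='130' height='130'"
)

# finditer yields one match object per non-overlapping match, left to right.
products = [(m.group("link"), m.group("img")) for m in pattern.finditer(html)]

for link, img in products:
    print(link, img)
```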
7.2.6.9. Raw String Notation
Raw string notation (r"text") keeps regular expressions sane. Without it, every backslash ('\') in a regular expression would have to be prefixed with another one to escape it. For example, the two following lines of code are functionally identical:
>>> re.match(r"\W(.)\1\W", " ff ")
<_sre.SRE_Match object at ...>
>>> re.match("\\W(.)\\1\\W", " ff ")
<_sre.SRE_Match object at ...>
When one wants to match a literal backslash, it must be escaped in the regular expression. With raw string notation, this means r"\\". Without raw string notation, one must use "\\\\", making the following lines of code functionally identical:
>>> re.match(r"\\", r"\\")
<_sre.SRE_Match object at ...>
>>> re.match("\\\\", r"\\")
<_sre.SRE_Match object at ...>
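The equivalence described in the excerpt can be checked directly by comparing the string literals themselves. This is a small sketch; the variable names are chosen here for illustration.

```python
import re

# The raw and the escaped literal denote the very same pattern string.
raw_pattern = r"\W(.)\1\W"
escaped_pattern = "\\W(.)\\1\\W"
same_pattern = raw_pattern == escaped_pattern

# \1 must repeat whatever (.) captured, so " ff " matches with group 1 == "f".
doubled = re.match(raw_pattern, " ff ").group(1)

# A pattern for one literal backslash: two characters in the pattern string.
backslash_raw = r"\\"
backslash_escaped = "\\\\"
same_backslash = backslash_raw == backslash_escaped

# "\\" is a one-character string holding a single backslash.
matched = re.match(backslash_raw, "\\").group(0)

print(same_pattern, doubled, same_backslash, len(backslash_raw), len(matched))
```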
Update, 2010-10-15
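For markup such as
<div class="listPic"><a href="/?mod=goods&do=display&id=2032&sid=f11ee838a106889a37abf4e9227a03fe" target="_blank"><img src='/upload/photobase/2010-09/100924112121_s.jpg' border="0" title="新款 银色小雏菊三叶草满钻白色珍珠开口戒指" /></a>
we can use backreferences, as in the pattern below, so that the closing quote must be the same character as the opening quote. Note that every parenthesized group is a capturing group: link and img are named groups, while ("|') is not explicitly named, but all of them are numbered starting from 1. The quote groups are therefore groups 1 and 3, and link and img are groups 2 and 4, which is why the backreferences are \1 and \3 rather than \1 and \2.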
<div class="listPic"><a[\s\S]*?href=("|')(?P<link>[^"]*?)\1[\s\S]*?<img[\s\S]*?src=("|')(?P<img>[^"]*?)\3
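The group numbering can be verified with a short sketch. The HTML below is a shortened, made-up version of the sample (mixed quote styles kept on purpose), and the variable names are assumptions.

```python
import re

# Hypothetical fragment: href uses double quotes, src uses single quotes.
html = (
    '<div class="listPic"><a href="/?mod=goods&do=display&id=2032" target="_blank">'
    "<img src='/upload/photobase/2010-09/100924112121_s.jpg' border=\"0\" /></a>"
)

# \1 and \3 refer back to the unnamed quote groups; link and img are 2 and 4.
pattern = re.compile(
    r'''<div class="listPic"><a[\s\S]*?href=("|')(?P<link>[^"]*?)\1'''
    r'''[\s\S]*?<img[\s\S]*?src=("|')(?P<img>[^"]*?)\3'''
)

m = pattern.search(html)
numbering = dict(pattern.groupindex)  # maps group name -> group number

print(numbering)
print(m.group(1), m.group("link"))
print(m.group(3), m.group("img"))
```

Named groups can be read by name or by number, so m.group(2) and m.group("link") return the same string.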