13. Removing unwanted characters from a string

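Requirements:

Strip the extra whitespace at the start and end of user input:

'  nick2008@gmail测试数据  '

Remove the \r and \n that appear in text edited on Windows:

'hello world\r\n'

Remove Unicode combining marks (tone marks) from text:

'nǐ hǎo, shì jiè'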

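Solutions:

Use the string methods strip(), lstrip() and rstrip() to remove characters from the ends of a string;

to delete specific characters in the middle of a string, cut the string apart with split() and concatenate the pieces;

use the string method replace() or the regular-expression function re.sub() to delete any substring (by replacing it with '');

use unicodedata.normalize() to decompose Unicode characters so that their combining marks can be removed.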

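About strip():

The strip() method removes the specified characters from the beginning and end of a string (whitespace, including spaces and newlines, by default) and returns a new string. The argument is treated as a set of characters, not as a substring.

Note: this method only removes characters at the beginning or end; it cannot remove characters in the middle.

Similarly, lstrip() removes whitespace or the specified characters from the left end of the string, and rstrip() removes them from the right end.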
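A quick illustration of how the three methods differ. The sample strings here are made up for this sketch and do not come from the examples in this article:

```python
# Hypothetical sample value, chosen only for illustration.
raw = '--hello-world--'

both = raw.strip('-')    # trims both ends
left = raw.lstrip('-')   # trims the left end only
right = raw.rstrip('-')  # trims the right end only

print(both)   # hello-world   (the '-' in the middle is untouched)
print(left)   # hello-world--
print(right)  # --hello-world

# The argument is a set of characters, not a prefix or suffix:
# every leading/trailing character found in the set is removed.
mixed = 'xyxhello worldyx'.strip('xy')
print(mixed)  # hello world
```

Because strings are immutable, each call returns a new string and leaves raw unchanged.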

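About replace():

replace(old, new, count)

The replace() method returns a copy of the string in which occurrences of old (the old substring) are replaced with new (the new substring). The optional count is the maximum number of replacements, counted from the left; if it is omitted, every occurrence is replaced.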
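A small sketch of the effect of count, using a made-up sample string:

```python
text = 'a-b-c-d'  # hypothetical sample value

all_removed = text.replace('-', '')   # no count: every '-' is removed
first_two = text.replace('-', '', 2)  # count=2: only the first two are removed

print(all_removed)  # abcd
print(first_two)    # abc-d

# Deleting Windows line endings is the same idea: replace with ''.
cleaned = 'hello world\r\n'.replace('\r', '').replace('\n', '')
print(repr(cleaned))  # 'hello world'
```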

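About unicodedata.normalize():

unicodedata.normalize(form, unistr)

Returns the normal form form of the Unicode string unistr. Valid values for form are NFC, NFKC, NFD and NFKD. NFD and NFKD decompose a character into a base character followed by combining marks; NFC and NFKC decompose first and then recompose.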
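To see what decomposition means in practice, the following sketch normalizes a single accented letter. The variable names are chosen only for this illustration:

```python
import unicodedata

composed = '\u01ce'  # 'ǎ' stored as one precomposed code point
decomposed = unicodedata.normalize('NFD', composed)

print(len(composed))                    # 1
print(len(decomposed))                  # 2
print(decomposed[0])                    # a
print(unicodedata.name(decomposed[1]))  # COMBINING CARON

# NFC goes the other way and rebuilds the precomposed character.
recomposed = unicodedata.normalize('NFC', decomposed)
print(recomposed == composed)           # True
```

After decomposition the tone mark is a separate code point, which is what makes it possible to drop it.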

Example for solution 1:

s1 = '  nick2008@gmail测试数据  '
s2 = '+-===nick2008@gmail测试数据===-+'
s3 = '+-=  nick2008@gmail测试数据  =-+'
s4 = '  +-=  nick2008@gmail测试数据  =-+  '

print(s1.strip())         # nick2008@gmail测试数据
print(s2.strip('+-='))    # nick2008@gmail测试数据
print(s3.strip('+-= '))   # nick2008@gmail测试数据
print(s4.strip('+-= '))   # nick2008@gmail测试数据

Example for solution 2:

from functools import reduce

s = '    abc:1234+sbd-    ewq=grw\r\n  '

def my_split(s, seps):
    # Split on each separator in turn; sum(..., []) flattens the nested lists
    res = reduce(lambda t, sep: sum(map(lambda ss: ss.split(sep), t), []), seps, [s])
    return res

# Concatenate the pieces back into one string
answer = ''
for i in my_split(s, ':+-=\r\n '):
    answer += i
print(answer)   # abc1234sbdewqgrw

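The split() method cuts a string apart at a given separator. The custom function my_split() applies it once per unwanted character, so concatenating the resulting pieces yields the string with all of those characters removed.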
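It can help to look at what my_split() actually returns before the pieces are concatenated. The sketch below repeats the function so that it runs on its own and uses ''.join() as a shorter alternative to a concatenation loop; the short sample strings are made up for illustration:

```python
from functools import reduce

def my_split(s, seps):
    # Split on each separator in turn and flatten the nested lists.
    return reduce(lambda t, sep: sum(map(lambda ss: ss.split(sep), t), []), seps, [s])

pieces = my_split('a:b+c', ':+')
print(pieces)          # ['a', 'b', 'c']

# Adjacent separators leave empty strings in the list ...
with_gaps = my_split('a::b', ':')
print(with_gaps)       # ['a', '', 'b']

# ... which disappear once the pieces are joined.
joined = ''.join(my_split('  abc:1234+sbd\r\n', ':+\r\n '))
print(joined)          # abc1234sbd
```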

Example for solution 3:

Using replace():

from functools import reduce

s = '    abc:1234+sbd-    ewq=grw\r\n  '

def my_replace(s, seps):
    # Replace each unwanted character in turn with the empty string
    res = reduce(lambda t, sep: t.replace(sep, ''), seps, s)
    return res

answer = my_replace(s, ' :+-=\r\n')
print(answer)   # abc1234sbdewqgrw

Using replace() is simpler than using split(), because replace() returns a string whereas split() returns a list that still has to be joined.

Using the regular-expression function re.sub():

import re

s = '    abc:1234+sbd-    ewq=grw\r\n  '

def my_sub(s, seps):
    # Build one character class from all separators; re.escape() protects
    # characters such as '-' or ']' that are special inside [...]
    return re.sub('[%s]+' % re.escape(seps), '', s)

answer = my_sub(s, ' :+-=\r\n')
print(answer)   # abc1234sbdewqgrw

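Example for solution 4:

import unicodedata

s = 'ní hǎo, shì jiè'

# NFKD splits each accented letter into a base letter plus a combining mark;
# encoding to ASCII with 'ignore' drops the marks, and decode() turns the
# bytes back into a str
answer = unicodedata.normalize('NFKD', s).encode('ascii', 'ignore').decode('ascii')
print(answer)   # ni hao, shi jie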
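Encoding to ASCII also discards every other non-ASCII character, for example Chinese text. If that is not wanted, one possible alternative (a sketch, not part of the original solution) is to decompose the string and filter out only the combining marks with unicodedata.combining(). The helper name strip_marks is hypothetical:

```python
import unicodedata

def strip_marks(text):
    """Remove combining marks but keep all other characters (hypothetical helper)."""
    decomposed = unicodedata.normalize('NFD', text)
    # combining() returns 0 for ordinary characters and non-zero for combining marks
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

plain = strip_marks('n\u01d0 h\u01ceo, sh\u00ec ji\u00e8')  # 'nǐ hǎo, shì jiè'
print(plain)   # ni hao, shi jie

mixed = strip_marks('n\u01d0 h\u01ceo \u4e16\u754c')        # 'nǐ hǎo 世界'
print(mixed)   # ni hao 世界
```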


Updated: 2022-11-28