
Python Cookbook, 3rd Edition, notes (2.1): splitting strings on multiple delimiters

2020-11-09 23:53:23 Giant ship

Splitting strings on multiple delimiters

Problem

You need to split a string into fields, but the delimiters (and the spaces around them) are not consistent throughout the string.

Solution

The split() method of string objects is meant for very simple cases. It does not allow multiple delimiters or account for varying amounts of whitespace around them. When you need more flexibility, use the re.split() function:

>>> line = 'asdf fjdk; afed, fjek,asdf, foo'
>>> import re
>>> re.split(r'[;,\s]\s*', line)
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
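For contrast, here is a quick sketch of what str.split() does with the same line. With no argument it splits only on runs of whitespace, and with a single argument it splits only on that exact string, so the punctuation and stray spaces stay attached to the fields:

```python
import re

line = 'asdf fjdk; afed, fjek,asdf, foo'

# str.split() with no argument splits on runs of whitespace only,
# so ';' and ',' stay glued to the neighbouring words.
by_whitespace = line.split()

# str.split(',') splits on one fixed separator and keeps the
# surrounding spaces and any other separators.
by_comma = line.split(',')

# re.split() takes a pattern, so any of ';', ',' or whitespace,
# followed by optional whitespace, counts as one separator.
by_regex = re.split(r'[;,\s]\s*', line)

print(by_whitespace)  # ['asdf', 'fjdk;', 'afed,', 'fjek,asdf,', 'foo']
print(by_comma)       # ['asdf fjdk; afed', ' fjek', 'asdf', ' foo']
print(by_regex)       # ['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
```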

Discussion

The re.split() function is useful because you can specify multiple patterns for the separator. In the example above, the separator is a comma, a semicolon, or whitespace, followed by any amount of whitespace. Wherever the pattern matches, the text on either side of the match is returned as a field in the result. The result is a list of fields, the same type of value that str.split() returns.
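Because every match marks a boundary between two fields, a separator at the very start or end of the string produces an empty field there, much as str.split(',') does for a trailing comma. A small sketch (the sample strings are made up for illustration):

```python
import re

pattern = r'[;,\s]\s*'

# A trailing separator leaves an empty string as the last field.
trailing = re.split(pattern, 'a, b;')
print(trailing)  # ['a', 'b', '']

# A leading separator leaves an empty string as the first field.
leading = re.split(pattern, ';a, b')
print(leading)   # ['', 'a', 'b']

# If empty fields are unwanted, they can be filtered out afterwards.
non_empty = [f for f in trailing if f]
print(non_empty)  # ['a', 'b']
```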

When using re.split(), pay attention to whether the regular expression pattern contains a capture group enclosed in parentheses. If capture groups are used, the matched text is also included in the result list. For example, look at the output of this code:

>>> fields = re.split(r'(;|,|\s)\s*', line)
>>> fields
['asdf', ' ', 'fjdk', ';', 'afed', ',', 'fjek', ',', 'asdf', ',', 'foo']
>>>
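One more detail when a pattern has more than one capture group: re.split() inserts the text of every group at each split point, and a group that did not take part in a particular match contributes None. A sketch with two alternative groups (the pattern and sample string are illustrative):

```python
import re

# Two separate capture groups: one for ',' and one for ';'.
two_groups = re.split(r'(,)|(;)', 'a,b;c')
print(two_groups)  # ['a', ',', None, 'b', None, ';', 'c']

# A single group around the alternation avoids the None entries.
one_group = re.split(r'(,|;)', 'a,b;c')
print(one_group)   # ['a', ',', 'b', ';', 'c']
```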

Getting the split characters can be useful in certain contexts. For example, you might want to keep the delimiters so that you can use them later to reconstruct an output string:

>>> values = fields[::2]
>>> delimiters = fields[1::2] + ['']
>>> values
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
>>> delimiters
[' ', ';', ',', ',', ',', '']
>>> # Reform the line using the same delimiters
>>> ''.join(v+d for v,d in zip(values, delimiters))
'asdf fjdk;afed,fjek,asdf,foo'
>>>
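The rebuilt line above is not identical to the original: the spaces after each delimiter were consumed by the \s* outside the group, so they were never captured. If an exact round trip matters, one option is to put the trailing whitespace inside the group as well. A minimal sketch:

```python
import re

line = 'asdf fjdk; afed, fjek,asdf, foo'

# The whole separator, including its trailing whitespace, is captured.
fields = re.split(r'([;,\s]\s*)', line)
values = fields[::2]
delimiters = fields[1::2] + ['']

print(delimiters)  # [' ', '; ', ', ', ',', ', ', '']

# Interleave values and delimiters to rebuild the string.
rebuilt = ''.join(v + d for v, d in zip(values, delimiters))
print(rebuilt == line)  # True
```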

If you don't want the separator characters in the result but still need parentheses to group parts of the regular expression, use a non-capturing group, written as (?:...). For example:

>>> re.split(r'(?:,|;|\s)\s*', line)
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
>>>
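Non-capturing groups earn their keep when a separator is longer than one character, since a character class like [;,\s] can only match single characters. A sketch with a made-up input that uses both '->' and ';' as separators:

```python
import re

text = 'a -> b; c'

# (?:...) groups the alternation without creating a capture group,
# so the separators do not appear in the result.
without_seps = re.split(r'\s*(?:->|;)\s*', text)
print(without_seps)  # ['a', 'b', 'c']

# The same pattern with a capturing group keeps the separators.
with_seps = re.split(r'\s*(->|;)\s*', text)
print(with_seps)     # ['a', '->', 'b', ';', 'c']
```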

Copyright notice
This article was written by [Giant ship]. Please include a link to the original when reposting. Thank you.