
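As you said, each block is delimited by ### and $$$. We can treat the text as a sequence of index positions between these markers and use re.finditer to locate the block boundaries.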
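A minimal sketch of what `finditer` returns, run on a small made-up string (`sample` is hypothetical, not the question's text). Instead of sorting and slicing, it pairs each `###` start with the `$$$` end that follows it using `zip`. Like the answer's sort-and-pair step, this assumes blocks are well-formed and never nested.

```python
import re

# Hypothetical two-block input, only for illustration.
sample = "###\nabc\n$$$\n###\nxyz\n$$$"

# Index of the first '#' of each opening marker.
starts = [m.start() for m in re.finditer('###', sample)]
# Index just past the last '$' of each closing marker ('$' must be escaped).
ends = [m.end() for m in re.finditer(re.escape('$$$'), sample)]

# Pair each opening marker with the closing marker that follows it.
pairs = list(zip(starts, ends))
print(pairs)  # [(0, 11), (12, 23)]
```

Because `m.end()` points just past the last `$`, `sample[start:end]` slices out exactly one block, both markers included.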
```python
import re

# `text` is the input string from the question (not reproduced here).

# Where each block opens ("###") and where it closes (just after "$$$").
starts = [s.start() for s in re.finditer('###', text)]
# [0, 105, 349]
ends = [e.end() for e in re.finditer(re.escape('$$$'), text)]  # '$' is a regex special char, so escape it
# [104, 348, 558]

# Merge and sort the boundaries, then pair them up as [start, end] per block.
blocks = sorted(starts + ends)
nBlocks = [blocks[i:i+2] for i in range(0, len(blocks), 2)]
# [[0, 104], [105, 348], [349, 558]]

# Find where the search phrase occurs (str.index raises ValueError if it is absent).
find = '22 feb 2017 21 april'
where = text.index(find)
# 10

# Remove the block that contains the phrase.
cleanText = text
for start, end in nBlocks:
    if start <= where < end:
        # Text before the block + text after it; the +1 also skips the
        # character right after "$$$" (the newline separating blocks).
        cleanText = text[:start] + text[end + 1:]
        break
print(cleanText)
```
Output (the first block, which contains the phrase, has been removed):

```
###
risk true stories people never thought they d dare share
risk true stories people never
true stories people never thought
stories people never thought they
people never thought they d
never thought they d dare
thought they d dare share
$$$
###
everyone hanging out without me mindy kaling non fiction
everyone hanging out without me
hanging out without me mindy
out without me mindy kaling
without me mindy kaling non
me mindy kaling non fiction
$$$
```
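If you would rather let the regex find whole blocks than pair separate start and end positions, a single non-greedy pattern with `re.DOTALL` can do both steps at once. This is an alternative technique, not the answer's method; `remove_block`, `BLOCK` and `sample` are hypothetical names. The optional `\n?` plays the role of the `+1` in the answer's slice, dropping the newline after `$$$`.

```python
import re

# "###", then anything (non-greedy, spanning newlines via re.DOTALL),
# then "$$$" and an optional trailing newline.
BLOCK = re.compile(r'###.*?\$\$\$\n?', re.DOTALL)


def remove_block(text, phrase):
    """Return text with the block containing the first occurrence of phrase removed.

    Hypothetical helper: text is returned unchanged if the phrase is absent
    or lies outside every block.
    """
    where = text.find(phrase)
    if where == -1:
        return text
    for m in BLOCK.finditer(text):
        if m.start() <= where < m.end():
            return text[:m.start()] + text[m.end():]
    return text


# Made-up two-block input for illustration.
sample = "###\nfirst 22 feb 2017 21 april\n$$$\n###\nsecond\n$$$\n"
print(remove_block(sample, '22 feb 2017 21 april'))
```

The non-greedy `.*?` matters: a greedy `.*` would run from the first `###` to the last `$$$` and swallow every block as one match.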