Deleting a block of sentences when the start and end of the block are clearly defined

Python

千巷猫影 2022-06-22 17:48:52

I am using Python 3.6.8. I have a text file, for example:

###
books 22 feb 2017 21 april 2018
books 22 feb 2017 21
22 feb 2017 21 april
feb 2017 21 april 2018
$$$
###
risk true stories people never thought they d dare share
risk true stories people never
true stories people never thought
stories people never thought they
people never thought they d
never thought they d dare
thought they d dare share
$$$
###
everyone hanging out without me mindy kaling non fiction
everyone hanging out without me
hanging out without me mindy
out without me mindy kaling
without me mindy kaling non
me mindy kaling non fiction
$$$

We generate it with:

for line_no, line in enumerate(books):
    tokens = line.split(" ")
    output = list(ngrams(tokens, 5))
    booksWithNGrams.append("###")  # Adding start of block
    booksWithNGrams.append(books[line_no])  # Adding original line
    for x in output:  # Adding n-grams
        booksWithNGrams.append(' '.join(x))
    booksWithNGrams.append("$$$")  # Adding end of block

As you can see, a sentence together with its n-grams starts with ### and ends with $$$, so the start and end of each block are clearly defined.

Given a sentence, I want to delete the block that contains it. For example, if I input 22 feb 2017 21 april, I want to delete:

###
books 22 feb 2017 21 april 2018
books 22 feb 2017 21
22 feb 2017 21 april
feb 2017 21 april 2018
$$$

How can I do this?


1 Answer

catspeake


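As you said, each block is bounded by ### and $$$. We can treat the text as a sequence of character offsets between these markers. Use re.finditer to locate the block boundaries, find the offset of the input sentence, and cut out the block whose range contains that offset.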

import re

# `text` is assumed to hold the whole file as a single string.

# Offsets where each block opens ("###")
starts = [s.start() for s in re.finditer(re.escape('###'), text)]
# [0, 105, 349]

# Offsets just past each closing "$$$"; "$" is a regex metacharacter, hence re.escape
ends = [e.end() for e in re.finditer(re.escape('$$$'), text)]
# [104, 348, 558]

# Merge and sort the offsets, then pair them up as [start, end]
blocks = sorted(starts + ends)
nBlocks = [blocks[i:i+2] for i in range(0, len(blocks), 2)]
# [[0, 104], [105, 348], [349, 558]]

# Find where the input sentence first occurs (raises ValueError if it is absent)
find = '22 feb 2017 21 april'
where = text.index(find)
# 10

# Remove the block that contains that offset, plus the newline after "$$$"
cleanText = text
for start, end in nBlocks:
    if start <= where < end:
        cleanText = text[:start] + text[end + 1:]
        break

print(cleanText)

Output:

###
risk true stories people never thought they d dare share
risk true stories people never
true stories people never thought
stories people never thought they
people never thought they d
never thought they d dare
thought they d dare share
$$$
###
everyone hanging out without me mindy kaling non fiction
everyone hanging out without me
hanging out without me mindy
out without me mindy kaling
without me mindy kaling non
me mindy kaling non fiction
$$$
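
The offset-based approach above removes only the block around the first occurrence of the sentence. As a possible alternative, here is a minimal sketch that matches each ###...$$$ block with one regular expression and drops every block that has the sentence as one of its lines. The names remove_blocks and BLOCK_RE are hypothetical, not from the answer, and the sketch assumes each marker sits on its own line and that matching a whole line (rather than a substring) is what is wanted.

```python
import re

# Hypothetical helper, not part of the original answer.
# Assumes "###" and "$$$" each sit on their own line.
BLOCK_RE = re.compile(r"###\n(.*?)\n\$\$\$\n?", re.DOTALL)


def remove_blocks(text, sentence):
    """Drop every ###...$$$ block that has `sentence` as one of its lines."""
    def keep_or_drop(match):
        lines = match.group(1).split("\n")
        # An empty replacement deletes the block; otherwise keep it unchanged.
        return "" if sentence in lines else match.group(0)

    return BLOCK_RE.sub(keep_or_drop, text)


# Two of the blocks from the question, used as sample input.
block_books = (
    "###\n"
    "books 22 feb 2017 21 april 2018\n"
    "books 22 feb 2017 21\n"
    "22 feb 2017 21 april\n"
    "feb 2017 21 april 2018\n"
    "$$$\n"
)
block_risk = (
    "###\n"
    "risk true stories people never thought they d dare share\n"
    "risk true stories people never\n"
    "true stories people never thought\n"
    "stories people never thought they\n"
    "people never thought they d\n"
    "never thought they d dare\n"
    "thought they d dare share\n"
    "$$$\n"
)
sample = block_books + block_risk

print(remove_blocks(sample, "22 feb 2017 21 april"))
```

Because the replacement is decided per block, a sentence that appears in no block leaves the text unchanged instead of raising ValueError as text.index would.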

Answered 2022-06-22
