
A problem with my for loop combined with yield

Python

慕丝7291255 2021-12-09 18:27:45

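I have a program that joins words split by an asterisk. It removes the asterisk and concatenates the first part of the word (before the asterisk) with the second part (after the asterisk). It works well except for one major problem: the second part (after the asterisk) still appears in the output. For example, the program joins ['presi', '*', 'dent'], but 'dent' is still in the output. I have not been able to work out what is wrong with my code. Here it is:

    from collections import defaultdict
    import nltk
    from nltk.tokenize import word_tokenize
    import re
    import os
    import sys
    from pathlib import Path

    def main():
        while True:
            try:
                file_to_open = Path(input("\nPlease, insert your file path: "))
                with open(file_to_open) as f:
                    words = word_tokenize(f.read().lower())
                    break
            except FileNotFoundError:
                print("\nFile not found. Better try again")
            except IsADirectoryError:
                print("\nIncorrect Directory path.Try again")

        word_separator = '*'
        with open('Fr-dictionary2.txt') as fr:
            dic = word_tokenize(fr.read().lower())

        def join_asterisk(ary):
            for w1, w2, w3 in zip(words, words[1:], words[2:]):
                if w2 == word_separator:
                    word = w1 + w3
                    yield (word, word in dic)
                elif w1 != word_separator and w1 in dic:
                    yield (w1, True)

        correct_words = []
        incorrect_words = []
        correct_words = [w for w, correct in join_asterisk(words) if correct]
        incorrect_words = [w for w, correct in join_asterisk(words) if not correct]
        text = ' '.join(correct_words)

Could someone help me find the error here?

Sample input:

The commitments of the President* of the Republic are also those of the railway company's leaders, he argued before various officials at the Grand-Est meeting at the Elysee Palace. On July 1, 2017, the President of the Republic, Emmanuel Macron (right), with the head of SNCF, Guillaume Pepy, at Montparnasse station in Paris. GEOFFROY VAN DER HASSELT / AFP. SNCF users are sometimes annoyed by train cancellations or service disruptions, and this seems to affect the President of the Republic too. As part of the Great Debate, on Tuesday, February 26, before elected officials at the Elysee Palace, Emmanuel Macron made very harsh remarks about SNCF, which closed the Saint-Dié - Epinal line on December 23, 2018, although the head of state had promised, during a trip to the Vosges in April 2018, that it would continue to operate.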


2 Answers

慕的地8271018

Contributed 1796 experience points, received more than 4 upvotes

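The extra words are (I assume) also in your dictionary. They are therefore yielded a second time, two iterations later in the for loop, once the sliding window has moved on and they have become w1. At that point they match this branch:

    elif w1 != word_separator and w1 in dic:
        yield (w1, True)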
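To make the duplication concrete, here is a minimal sketch that runs the question's sliding-window logic on a short list. The `dic` set and the word list are small stand-ins invented for illustration (the real `Fr-dictionary2.txt` is not available), and the function is renamed `join_asterisk_original` only to mark it as the question's version.

```python
# Stand-in data: `dic` is a tiny hypothetical dictionary, not the real Fr-dictionary2.txt.
word_separator = '*'
dic = {'du', 'président', 'dent', 'de'}
words = ['du', 'prési', '*', 'dent', 'de', 'la']

def join_asterisk_original(words):
    # Same sliding-window logic as in the question.
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        if w2 == word_separator:
            # Window is (first half, '*', second half): join the halves.
            word = w1 + w3
            yield (word, word in dic)
        elif w1 != word_separator and w1 in dic:
            # Two windows later the second half is w1 and lands here.
            yield (w1, True)

result = list(join_asterisk_original(words))
print(result)
# [('du', True), ('président', True), ('dent', True)]
```

'dent' is yielded again because it becomes `w1` in the window ('dent', 'de', 'la'). The same run also shows that the last two words ('de', 'la') never appear as `w1`, since `zip` stops at the shortest of the three slices, so they are silently dropped.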

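Redesigning your join_asterisk function looks like the best approach, because any attempt to patch the existing function so that it skips these words would be very clumsy.

Here is one way to redesign it so that a word already consumed as the second half of a '*'-separated pair is skipped: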

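    incorrect_words = []

    def join_asterisk(array):
        # Pad with two empty strings so ary[i+1] and ary[i+2] are always valid.
        ary = array + ['', '']
        i, size = 0, len(ary)
        while i < size - 2:
            if ary[i+1] == word_separator:
                if ary[i] + ary[i+2] in dic:
                    yield ary[i] + ary[i+2]
                else:
                    incorrect_words.append(ary[i] + ary[i+2])
                # Skip the separator and the second half (3 steps with the i += 1 below).
                i += 2
            elif ary[i] in dic:
                yield ary[i]
            i += 1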

If you would like it to stay closer to your original function, it can be modified to:

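    def join_asterisk(array):
        # Pad with two empty strings so ary[i+1] and ary[i+2] are always valid.
        ary = array + ['', '']
        i, size = 0, len(ary)
        while i < size - 2:
            if ary[i+1] == word_separator:
                concat_word = ary[i] + ary[i+2]
                yield (concat_word, concat_word in dic)
                # Skip the separator and the second half (3 steps with the i += 1 below).
                i += 2
            else:
                yield (ary[i], ary[i] in dic)
            i += 1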
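As a usage sketch for the tuple-yielding version, the snippet below fills both result lists in a single pass instead of running the generator twice as the question does. The `dic` set and the word list are hypothetical stand-ins. A set is used for the dictionary on the assumption that fast membership tests are wanted; the question stores it as a list, which also works.

```python
word_separator = '*'
dic = {'du', 'président', 'de', 'la'}  # hypothetical stand-in dictionary

def join_asterisk(array):
    # Pad with two empty strings so ary[i+1] and ary[i+2] are always valid.
    ary = array + ['', '']
    i, size = 0, len(ary)
    while i < size - 2:
        if ary[i+1] == word_separator:
            concat_word = ary[i] + ary[i+2]
            yield (concat_word, concat_word in dic)
            i += 2  # together with the i += 1 below, jumps past '*' and the second half
        else:
            yield (ary[i], ary[i] in dic)
        i += 1

words = ['du', 'prési', '*', 'dent', 'de', 'la', 'républi', '*', 'que']

correct_words, incorrect_words = [], []
for w, ok in join_asterisk(words):
    # Route each word to the matching list in one pass over the generator.
    (correct_words if ok else incorrect_words).append(w)

print(correct_words)    # ['du', 'président', 'de', 'la']
print(incorrect_words)  # ['république']
```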

2021-12-09

撒科打诨

Contributed 1934 experience points, received more than 2 upvotes

I think this alternative implementation of join_asterisk does what you intend:

    def join_asterisk(words, word_separator):
        if not words:
            return
        # Whether the previous word was a separator
        prev_sep = (words[0] == word_separator)
        # Next word to yield
        current = words[0] if not prev_sep else ''
        # Iterate words
        for word in words[1:]:
            # Skip separator
            if word == word_separator:
                prev_sep = True
            else:
                # If neither this nor the previous word was a separator
                if not prev_sep:
                    # Yield current word and clear
                    yield current
                    current = ''
                # Add word to current
                current += word
                prev_sep = False
        # Yield last word if list did not finish with a separator
        if not prev_sep:
            yield current

    words = ['les', 'engagements', 'du', 'prési', '*', 'dent', 'de', 'la', 'républi', '*', 'que', 'sont', 'aussi', 'ceux', 'des', 'dirigeants', 'de', 'la', 'société', 'ferroviaire']
    word_separator = '*'
    print(list(join_asterisk(words, word_separator)))
    # ['les', 'engagements', 'du', 'président', 'de', 'la', 'république', 'sont', 'aussi', 'ceux', 'des', 'dirigeants', 'de', 'la', 'société', 'ferroviaire']
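This version yields plain words and does not consult the dictionary, so the correct/incorrect split from the question has to happen afterwards. The sketch below shows one way to do that, plus two edge cases. The `dic` set and the short inputs are hypothetical stand-ins chosen for illustration.

```python
def join_asterisk(words, word_separator):
    if not words:
        return
    # Whether the previous word was a separator
    prev_sep = (words[0] == word_separator)
    # Next word to yield
    current = words[0] if not prev_sep else ''
    for word in words[1:]:
        if word == word_separator:
            prev_sep = True
        else:
            if not prev_sep:
                # Previous word is complete: emit it and start a new one.
                yield current
                current = ''
            current += word
            prev_sep = False
    # Yield last word if list did not finish with a separator
    if not prev_sep:
        yield current

dic = frozenset({'les', 'président', 'de'})  # hypothetical stand-in dictionary
words = ['les', 'prési', '*', 'dent', 'de', 'républi', '*', 'que']

joined = list(join_asterisk(words, '*'))
correct_words = [w for w in joined if w in dic]
incorrect_words = [w for w in joined if w not in dic]
print(correct_words)    # ['les', 'président', 'de']
print(incorrect_words)  # ['république']

# Edge cases: chained separators join into one word, and a
# trailing separator drops the pending fragment entirely.
chained = list(join_asterisk(['a', '*', 'b', '*', 'c'], '*'))
trailing = list(join_asterisk(['a', '*'], '*'))
print(chained)   # ['abc']
print(trailing)  # []
```

Whether dropping the fragment before a trailing separator is acceptable depends on the input. If such fragments should be kept, the final `if not prev_sep` check would need to change.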

2021-12-09
