
Data generation in Python

Python

阿晨1998 2023-03-08 11:19:11

I am trying to generate a dataset based on an existing one. I was able to implement a method that randomly changes the contents of a file, but I cannot write all of this back to the file. I also need to write the number of changed words to the file, because I want to use this dataset to train a neural network. Can you help me?

Input: each file has 2 lines of text.

Output: a file with (possibly) 3 lines: the first line unchanged, the second line changed by the method, and the third line showing the number of changed words (if it is better not to do this for a deep learning task, I would be glad to get advice, as I am a beginner).

from random import randrange
import os

Path = "D:\corrected data\\"
filelist = os.listdir(Path)

if __name__ == "__main__":
    new_words = ['consultable', 'partie ', 'celle ', 'également ', 'forte ', 'statistiques ', 'langue ',
                 'cadeaux', 'publications ', 'notre', 'nous', 'pour', 'suivr', 'les', 'vos', 'visitez ',
                 'thème ', 'thème ', 'thème ', 'produits', 'coulisses ', 'un ', 'atelier ', 'concevoir ',
                 'personnalisés ', 'consultable', 'découvrir ', 'fournit ', 'trace ', 'dire ', 'tableau',
                 'décrire', 'grande ', 'feuille ', 'noter ', 'correspondant', 'propre',]
    nb_words_to_replace = randrange(10)
    #with open("1.txt") as file:
    for i in filelist:
        # if i.endswith(".txt"):
        with open(Path + i,"r",encoding="utf-8") as file:
            # for line in file:
            data = file.readlines()
            first_line = data[0]
            second_line = data[1]
            print(f"Original: {second_line}")
            # print(f"FIle: {file}")
            second_line_array = second_line.split(" ")
            for j in range(nb_words_to_replace):
                replacement_position = randrange(len(second_line_array))
                old_word = second_line_array[replacement_position]
                new_word = new_words[randrange(len(new_words))]
                print(f"Position {replacement_position} : {old_word} -> {new_word}")
                second_line_array[replacement_position] = new_word
            res = " ".join(second_line_array)
            print(f"Result: {res}")
        with open(Path + i,"w") as f:
            for line in file:
                if line == second_line:
                    f.write(res)


1 Answer

凤凰求蛊

Contributed 1825 experience points, received more than 4 likes

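In short, you have two problems:

How to correctly replace line 2 (and 3) of the file.
How to keep track of the number of changed words.

How to correctly replace line 2 (and 3) of the file.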

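Your code:

with open(Path + i,"w") as f:
    for line in file:
        if line == second_line:
            f.write(res)

The file is opened with mode "w", which truncates it and does not allow reading, so for line in file does not work. In addition, the new handle is named f, but the loop iterates over file, the handle from the earlier read. To fix this, do the following instead: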

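with open(Path + i, "r+", encoding="utf-8") as file:
    lines = file.read().splitlines()  # splitlines() removes the \n characters
    lines[1] = res.rstrip("\n")  # the modified second line
    file.seek(0)  # rewind, because read() left the position at the end
    file.write("\n".join(lines) + "\n")  # restore the newlines removed above
    file.truncate()  # drop leftover content if the new text is shorter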
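One detail worth checking in any "r+" rewrite: after read() the file position sits at the end of the file, so writing without rewinding appends instead of replacing, and splitlines() strips the newlines that writelines() will not add back. Below is a minimal, self-contained sketch of an in-place replacement on a throwaway file. The helper name replace_second_line and the demo file are hypothetical, introduced only for illustration, and the sketch assumes the file has at least two lines.

```python
import os
import tempfile


def replace_second_line(path, new_text, encoding="utf-8"):
    """Overwrite line 2 of the file at `path` with `new_text`, in place."""
    with open(path, "r+", encoding=encoding) as file:
        lines = file.read().splitlines()  # newline characters are stripped here
        lines[1] = new_text.rstrip("\n")
        file.seek(0)                      # rewind: read() left the position at EOF
        file.write("\n".join(lines) + "\n")
        file.truncate()                   # cut off leftovers if the file got shorter


# Demonstration on a temporary file (hypothetical content)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("first line\na much longer second line\n")
    replace_second_line(demo_path, "short")
    with open(demo_path, encoding="utf-8") as f:
        result = f.read()
```

The truncate() call matters whenever the new content is shorter than the old one; without it, the tail of the old file would remain after the new text.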

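However, you want to add more lines to the file, so I suggest structuring the logic differently.

How to keep track of the number of changed words.

Add a variable changed_words_count and increment it whenever old_word != new_word.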
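A counter incremented on every replacement can overcount: randrange may pick the same position twice, and a later replacement may even restore the original word. As an alternative sketch (not the approach described above), the word list can be copied before the replacements and compared position by position afterwards. The helper count_changed_words is a hypothetical name, and stripping whitespace before comparing is an assumption made here because some entries in new_words carry a trailing space.

```python
def count_changed_words(original_words, modified_words):
    """Count positions whose word differs between two equally long lists."""
    if len(original_words) != len(modified_words):
        raise ValueError("both word lists must have the same length")
    # strip() so that 'un ' and 'un' are treated as the same word (assumption)
    return sum(
        old.strip() != new.strip()
        for old, new in zip(original_words, modified_words)
    )


original = "le chat dort sur le tapis".split(" ")
modified = original.copy()
modified[1] = "chien"   # a real change
modified[1] = "oiseau"  # same position hit again: still one changed word
modified[4] = "le"      # replaced by itself: not a change
changed = count_changed_words(original, modified)
```

With this approach the third line of the output file reports how many words actually differ from the original sentence, which may be a cleaner label if the count is later used as a training target.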

Resulting code:

for i in filelist:
    filepath = Path + i
    # The lines that will be replacing the file
    new_lines = [""] * 3
    with open(filepath, "r", encoding="utf-8") as file:
        data = file.readlines()
    first_line = data[0]
    # Strip the trailing newline so it cannot be lost or duplicated below
    second_line = data[1].rstrip("\n")
    second_line_array = second_line.split(" ")
    changed_words_count = 0
    for j in range(nb_words_to_replace):
        replacement_position = randrange(len(second_line_array))
        old_word = second_line_array[replacement_position]
        new_word = new_words[randrange(len(new_words))]
        # A word replaced does not mean the word has changed.
        # It could be replacing itself.
        # Check if the replacing word is different
        if old_word != new_word:
            changed_words_count += 1
        second_line_array[replacement_position] = new_word
    # Add the lines to the new file lines, each ending with a newline
    new_lines[0] = first_line
    new_lines[1] = " ".join(second_line_array) + "\n"
    new_lines[2] = str(changed_words_count) + "\n"
    print(f"Result: {new_lines[1]}")
    with open(filepath, "w", encoding="utf-8") as file:
        file.writelines(new_lines)

Note: the code is untested.
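For reference, here is a self-contained sketch of the same read, modify, write flow as a single function that can be tried on a temporary file. Several things in it are assumptions rather than part of the original code: the function name augment_file, the use of pathlib (whose Path class would clash with the Path string variable in the question), drawing the number of replacements per file instead of once for all files, stripping the trailing spaces from the replacement words, and counting changes by comparing the word lists at the end. It also assumes every input file has at least two lines.

```python
import random
import tempfile
from pathlib import Path


def augment_file(path, new_words, max_replacements=10, rng=None):
    """Rewrite `path` as three lines: original line 1, altered line 2, change count."""
    rng = rng or random.Random()
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    first_line, second_line = lines[0], lines[1]
    words = second_line.split(" ")
    original_words = words.copy()
    # Replace a random number (0 to max_replacements - 1) of random positions
    for _ in range(rng.randrange(max_replacements)):
        position = rng.randrange(len(words))
        words[position] = rng.choice(new_words).strip()
    # Count positions that really differ once all replacements are done
    changed = sum(old != new for old, new in zip(original_words, words))
    output = "\n".join([first_line, " ".join(words), str(changed)]) + "\n"
    Path(path).write_text(output, encoding="utf-8")
    return changed


# Demonstration on a temporary file (hypothetical content and word list)
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "sample.txt"
    demo.write_text("titre\nvoici une phrase de test\n", encoding="utf-8")
    changed = augment_file(demo, ["thème ", "atelier ", "nous"], rng=random.Random(0))
    out_lines = demo.read_text(encoding="utf-8").splitlines()
```

Passing a seeded random.Random makes a run reproducible, which can help when the generated dataset needs to be regenerated or debugged later.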

Answered 2023-03-08
