How can I count the words in strings read from files?


Python

Smart猫小萌 2022-06-14 17:49:47

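I'm trying to create a program that takes all the text files in a given path and saves all the strings in a list:

import os
import collections

vocab = set()  # distinct words seen across all files
path = 'a\\path\\'
listing = os.listdir(path)
unwanted_chars = ".,-_/()*"
for file in listing:
    #print('Current file : ', file)
    pos_review = open(path+file, "r", encoding='utf8')
    words = pos_review.read().split()
    #print(type(words))
    vocab.update(words)
    pos_review.close()
print(vocab)
pos_dict = dict.fromkeys(vocab, 0)  # every word starts at 0; nothing is counted yet
print(pos_dict)

Input

file1.txt: A quick brown fox.
file2.txt: a quick boy ran.
file3.txt: fox ran away.

Expected output

A : 2
quick : 2
brown : 1
fox : 2
boy : 1
ran : 2
away : 1

So far I've been able to make a dictionary of these strings, but now I don't know how to combine the key/value pairs of strings with their frequencies across all the text files.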


3 Answers

小唯快跑啊


Hope this helps:

import os

path = 'a\\path\\'
listing = os.listdir(path)
whole = []  # every word from every file, in reading order

for file in listing:
    #print('Current file : ', file)
    # "with" closes each file as soon as it has been read
    with open(os.path.join(path, file), "r", encoding='utf8') as pos_review:
        words = pos_review.read().split()
        whole.extend(words)

d = {}  # Creating an empty dictionary
for item in whole:
    if item in d:
        d[item] += 1  # Update count
    else:
        d[item] = 1  # First time this word is seen

print(d)
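The if/else counting loop above can be shortened with dict.get, which returns a default value when the key is missing. The sketch below also lowercases each word and strips leading/trailing punctuation with str.strip(string.punctuation), so that "A"/"a" and "fox."/"fox" are counted together, as the question's expected output suggests. The count_words helper and the sample word list are hypothetical stand-ins for the `whole` list built from the files.

```python
import string

def count_words(whole):
    """Count words case-insensitively, ignoring leading/trailing punctuation."""
    d = {}
    for item in whole:
        # Normalize: lowercase, then strip punctuation at both ends ("fox." -> "fox")
        word = item.lower().strip(string.punctuation)
        if not word:  # skip tokens made only of punctuation, e.g. "--"
            continue
        # dict.get returns 0 the first time a word is seen
        d[word] = d.get(word, 0) + 1
    return d

# Hypothetical sample: the tokens from the question's three files
whole = "A quick brown fox. a quick boy ran. fox ran away.".split()
print(count_words(whole))
# {'a': 2, 'quick': 2, 'brown': 1, 'fox': 2, 'boy': 1, 'ran': 2, 'away': 1}
```

Stripping only the ends keeps inner apostrophes, so "don't" stays one word; whether that is desirable depends on your data.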

2022-06-14

幕布斯7119047


This also works:

import glob
import pandas as pd

files = glob.glob('test*.txt')
txts = []
for f in files:
    with open(f, 'r') as t:
        txt = t.read()
    txts.append(txt)

# Join all file contents, split on whitespace, and count each word
texts = ' '.join(txts)
df = pd.DataFrame({'words': texts.split()})
out = df.words.value_counts().to_dict()
print(out)

2022-06-14

蛊毒传说


Use collections.Counter:

Counter is a dict subclass for counting the hashable items of an iterable.
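A minimal sketch of the Counter behaviour this answer relies on, using a few hypothetical words: counting an iterable, looking up a missing key, adding more counts with update, and ranking with most_common.

```python
from collections import Counter

# Count the items of any iterable of hashable values
c = Counter(["a", "quick", "brown", "fox", "a"])
print(c["a"])        # 2
print(c["missing"])  # 0: missing keys count as zero instead of raising KeyError

# update() adds to the existing counts instead of replacing them
c.update(["fox", "ran"])
print(c["fox"])      # 2

# most_common(n) returns the n highest counts as (item, count) pairs
print(c.most_common(2))  # [('a', 2), ('fox', 2)]
```

The update step is what lets one Counter accumulate words line by line across several files.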

Data

Given 3 files, named t1.txt, t2.txt & t3.txt

Each file contains the following 3 lines of text:

file1 txt A quick brown fox.
file2 txt a quick boy ran.
file3 txt fox ran away.

Code:

Get the files:

Using pathlib:

from pathlib import Path

files = list(Path('e:/PythonProjects/stack_overflow/t-files').glob('t*.txt'))
print(files)

# Output
[WindowsPath('e:/PythonProjects/stack_overflow/t-files/t1.txt'),
 WindowsPath('e:/PythonProjects/stack_overflow/t-files/t2.txt'),
 WindowsPath('e:/PythonProjects/stack_overflow/t-files/t3.txt')]
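To try the pathlib step without the answer's e:/ folder, one option is to create a few files in a temporary directory; the file names and the find_text_files helper below are hypothetical. Sorting the matches is a choice made in this sketch, because Path.glob does not promise any particular order.

```python
import tempfile
from pathlib import Path

def find_text_files(folder, pattern="t*.txt"):
    """Return the files in folder matching pattern, sorted by name."""
    # Path.glob yields matches lazily and in no guaranteed order, so sort them
    return sorted(Path(folder).glob(pattern))

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample files standing in for the answer's t-files folder
    for name in ("t1.txt", "t2.txt", "t3.txt", "notes.md"):
        (Path(tmp) / name).write_text("sample\n", encoding="utf8")
    files = find_text_files(tmp)
    print([f.name for f in files])  # ['t1.txt', 't2.txt', 't3.txt']
```

"notes.md" is left out because it does not match the t*.txt pattern.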

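Collect and count the words:

Create a separate function, clean_string, to clean each line of text:

str.lower for lowercasing

str.translate, str.maketrans & string.punctuation for highly optimized punctuation removal

(from "Best way to strip punctuation from a string")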
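To see what the translation step does on a single line before it is wrapped in a function, here is a minimal sketch on one line of the sample data. str.maketrans with a third argument builds a table that deletes those characters; note that string.punctuation covers only ASCII punctuation, and that inner characters are removed too, so "don't" becomes "dont".

```python
import string

# A translation table that maps every ASCII punctuation character to None (delete)
table = str.maketrans('', '', string.punctuation)

line = "file1 txt A quick brown fox."
print(line.lower().translate(table).split())
# ['file1', 'txt', 'a', 'quick', 'brown', 'fox']
```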

from collections import Counter
import string

def clean_string(value: str) -> list:
    value = value.lower()  # "A" and "a" become the same word
    value = value.translate(str.maketrans('', '', string.punctuation))  # drop punctuation
    value = value.split()  # split on any whitespace
    return value

words = Counter()
for file in files:
    with file.open('r') as f:
        lines = f.readlines()
    for line in lines:
        line = clean_string(line)
        words.update(line)  # add this line's words to the running totals

print(words)

# Output
Counter({'file1': 3,
         'txt': 9,
         'a': 6,
         'quick': 6,
         'brown': 3,
         'fox': 6,
         'file2': 3,
         'boy': 3,
         'ran': 6,
         'file3': 3,
         'away': 3})
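If you also want the counts for each file, not just the grand total, one option is to keep one Counter per file and add them together, since Counter supports +. In this sketch the question's three lines are in-memory stand-ins for file contents, and count_text is a hypothetical helper that mirrors clean_string.

```python
from collections import Counter
import string

TABLE = str.maketrans('', '', string.punctuation)

def count_text(text):
    """Lowercase, drop ASCII punctuation, split, and count one file's text."""
    return Counter(text.lower().translate(TABLE).split())

# Hypothetical contents keyed by file name, standing in for files read from disk
texts = {
    "file1.txt": "A quick brown fox.",
    "file2.txt": "a quick boy ran.",
    "file3.txt": "fox ran away.",
}

per_file = {name: count_text(text) for name, text in texts.items()}

# Summing Counters (starting from an empty one) merges the per-file tallies
total = sum(per_file.values(), Counter())
print(per_file["file2.txt"]["quick"])               # 1
print(total["quick"], total["fox"], total["away"])  # 2 2 1
```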

List of words:

list_words = list(words.keys())
print(list_words)

>>> ['file1', 'txt', 'a', 'quick', 'brown', 'fox', 'file2', 'boy', 'ran', 'file3', 'away']
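To print the totals in the question's "word : count" layout, a short sketch using most_common() (highest counts first) could look like this. The words Counter here holds hypothetical totals for the question's three files; the words are lowercase because clean_string lowercases them, and format_counts is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical totals, e.g. the combined Counter built from the question's files
words = Counter({'a': 2, 'quick': 2, 'brown': 1, 'fox': 2, 'boy': 1, 'ran': 2, 'away': 1})

def format_counts(counter):
    """Render one 'word : count' line per word, most frequent first."""
    return "\n".join(f"{word} : {count}" for word, count in counter.most_common())

print(format_counts(words))
```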

2022-06-14
