
Py3 Pandas read_csv: inserting items into a dictionary

Python

宝慕林4294392 2021-12-26 14:43:38

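I'm trying to do something simple, but I can't figure out how to read the actual rows from the DataFrame. I want to run a regex on each string. The .csv file has no header; it is just a single column full of strings.

csv_data = pd.read_csv('list.csv', sep=',', header=None)
pattern = re.compile(r'(.*\/)(?!\/)(.*)', flags=re.DOTALL)
url_file = { pattern.findall(row)[0]: pattern.findall(row)[1] for index, row in csv_data.iterrows() }

But I just get TypeError: expected string or bytes-like object

Edit 1

I don't think this is a duplicate. The other suggested SO question/solution is in a different context, with a header and multiple columns.

Edit 2

print(csv_data.dtypes)
0 object
dtype: object

print(csv_data.head())
0 https://...
1 https://...
2 https://...
3 https://...
4 https://...

Edit 3

Doing this:

for row in csv_data.iterrows():
    print(row.dtypes)

gives the error AttributeError: 'tuple' object has no attribute 'dtypes'

So the items seem to be tuples, and I just need to figure out how to get the string out of them.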


3 Answers

忽然笑

Contributed 1806 experience points, received more than 5 likes

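Major edit. You're right: Yoshitha's solution isn't ideal, because you specifically want both elements of that regex match.

However, pandas does have a nice built-in way of handling regexes that can help here. Something like this is much tidier: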

matches = csv_data.iloc[:,0].str.extract(r'(.*\/)(?!\/)(.*)', expand=True)

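Then, to get your dictionary representation, we can run: matches.set_index(0, drop=True).to_dict()[1]

This can still cause problems if a URL string in the input does not match the regex at all: str.extract returns NaN for both groups in that row, so the dictionary ends up with a nan: nan entry.

A simple example: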

import pandas as pd

l = ['https://example.s3.amazonaws.com/uploads/full/68518-5df5b5e5t5b.jpg', 'test_with_bad_url']

matches = pd.DataFrame(l).iloc[:,0].str.extract(r'(.*\/)(?!\/)(.*)', expand=True)

your_dict = matches.set_index(0, drop=True).to_dict()[1]

print(your_dict)

{'https://example.s3.amazonaws.com/uploads/full/': '68518-5df5b5e5t5b.jpg', nan: nan}
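If rows that do not match should simply be skipped, one option is to drop them before building the dictionary. This is only a minimal sketch that reuses the sample strings from this answer; the names urls and url_file are illustrative, not part of the original answer:

```python
import pandas as pd

# Same sample data as in the answer: one URL that matches and one string that does not.
urls = pd.Series([
    'https://example.s3.amazonaws.com/uploads/full/68518-5df5b5e5t5b.jpg',
    'test_with_bad_url',
])

# str.extract returns one column per capture group; rows without a match are all NaN.
matches = urls.str.extract(r'(.*\/)(?!\/)(.*)', expand=True)

# Drop the rows that did not match, then map group 1 (prefix) to group 2 (file name).
matches = matches.dropna()
url_file = dict(zip(matches[0], matches[1]))

print(url_file)
```

Keep in mind that dictionary keys are unique, so URLs sharing the same prefix collapse into a single entry, with the last file name winning.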

2021-12-26

湖上湖

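Contributed 2003 experience points, received more than 2 likes

A better option may be to apply a lambda function to this single column, keep the regex logic inside a function, and call it like this (assuming data is the DataFrame and string is the column name):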

data = pd.read_csv('list.csv', sep=',', header=None)

data.columns = ['string']

# regex_function is a function you write yourself; it receives one string at a time
data['string'] = data['string'].apply(lambda x: regex_function(x))
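The answer does not define regex_function. As a rough sketch of what it might look like, assuming it should return the (prefix, file name) pair from the question's pattern and None when a string does not match; the sample frame and the parsed column are stand-ins, not part of the original answer:

```python
import re
import pandas as pd

pattern = re.compile(r'(.*\/)(?!\/)(.*)', flags=re.DOTALL)

def regex_function(text):
    """Hypothetical helper: split a URL into (prefix, file name), or return None."""
    m = pattern.match(text)
    return m.groups() if m else None

# In-memory stand-in for the single-column, header-less list.csv.
data = pd.DataFrame(['https://example.com/a/b.jpg', 'no_slash_here'])
data.columns = ['string']

# apply calls regex_function once per cell of the column.
data['parsed'] = data['string'].apply(regex_function)

# Build the dictionary from the rows that produced a (prefix, file name) tuple.
url_file = dict(p for p in data['parsed'] if isinstance(p, tuple))
print(url_file)
```

Writing the result to a separate column, rather than overwriting string as the answer does, keeps the original URLs available for checking which rows failed to match.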

2021-12-26

小怪兽爱吃肉

Contributed 1852 experience points, received more than 1 like

Or you can try this code:

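import re
import pandas as pd

csv_data = pd.read_csv('list.csv', sep=',', header=None, dtype=str)
csv_data = csv_data.fillna("")

pattern = re.compile(r'(.*\/)(?!\/)(.*)', flags=re.DOTALL)

# iterrows() yields (index, row) pairs; row[0] is the string in the only column.
# (str(row) would give the printed form of the whole row, not the URL itself.)
matches = (pattern.match(row[0]) for index, row in csv_data.iterrows())

# Keep only the rows that matched: group(1) is the URL prefix, group(2) the file name.
url_file = {m.group(1): m.group(2) for m in matches if m}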
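For context, the TypeError in the question comes from iterrows() yielding (index, row) pairs in which row is a pandas Series rather than a string. A minimal sketch of this, using an in-memory frame with made-up URLs in place of list.csv, and reading the strings from column 0 directly:

```python
import re
import pandas as pd

pattern = re.compile(r'(.*\/)(?!\/)(.*)', flags=re.DOTALL)

# In-memory stand-in for pd.read_csv('list.csv', header=None); the URLs are made up.
csv_data = pd.DataFrame(['https://example.com/a/b.jpg', 'https://example.com/c/d.png'])

# Each item from iterrows() is a tuple: (index label, row as a Series).
index, row = next(csv_data.iterrows())
print(type(row).__name__)     # Series
print(type(row[0]).__name__)  # str

# Iterating over the column itself yields the plain strings.
url_file = {}
for url in csv_data[0]:
    m = pattern.match(url)
    if m:
        url_file[m.group(1)] = m.group(2)
print(url_file)
```

With header=None the single column is labelled 0, which is why csv_data[0] and row[0] both reach the string.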

2021-12-26
