Keeping punctuation marks as their own tokens when preprocessing text

Python

天涯尽头无女友 2023-10-06 19:17:28

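What code splits a sentence into a list of its words and punctuation marks? Most text preprocessing routines tend to strip punctuation. For example, if I input:

"Punctuations to be included as its own unit."

the expected output is:

result = ['Punctuations', 'to', 'be', 'included', 'as', 'its', 'own', 'unit', '.']

Thanks a lot!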

2 answers

慕村9548890

Contributed 1884 posts, received more than 4 likes

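You may want to consider the Natural Language Toolkit (nltk), whose word_tokenize function returns punctuation marks as separate tokens.

Try this: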

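import nltk

# word_tokenize needs NLTK's tokenizer data; if it is missing, the call raises
# a LookupError that names the resource to fetch with nltk.download(...).
sentence = "Punctuations to be included as its own unit."
tokens = nltk.word_tokenize(sentence)
print(tokens)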

Output: ['Punctuations', 'to', 'be', 'included', 'as', 'its', 'own', 'unit', '.']
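If NLTK or its tokenizer data may not be available, the call above can be wrapped with a fallback. The following is only a sketch: the function name tokenize_with_fallback is hypothetical, and the regex fallback is a simplification, not a reimplementation of word_tokenize. The two paths agree on the sample sentence but can differ on other input, for example contractions.

```python
import re


def tokenize_with_fallback(sentence):
    """Tokenize with nltk.word_tokenize, or with a simple regex if that fails.

    Hypothetical helper for illustration; not part of NLTK.
    """
    try:
        import nltk
        return nltk.word_tokenize(sentence)
    except (ImportError, LookupError):
        # ImportError: nltk is not installed.
        # LookupError: nltk is installed but its tokenizer data is missing.
        # Fallback: runs of word characters, or one non-word, non-space character.
        return re.findall(r"\w+|[^\w\s]", sentence)


print(tokenize_with_fallback("Punctuations to be included as its own unit."))
```

Catching LookupError keeps the script usable on machines where the tokenizer data has not been downloaded, at the cost of slightly different tokenization rules on that path.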

Answered 2023-10-06

郎朗坤

Contributed 1921 posts, received more than 9 likes

The snippet below uses a regular expression to split the text into a list of words and punctuation marks.

import re
import string

# Match a run of word characters, or a single ASCII punctuation character.
# re.escape keeps characters such as ], \ and ^ literal inside the character class.
pattern = r"\w+|[" + re.escape(string.punctuation) + "]"

content = "Punctuations to be included as its own unit."
words_and_punctuation = re.findall(pattern, content)
print(words_and_punctuation)

Output: ['Punctuations', 'to', 'be', 'included', 'as', 'its', 'own', 'unit', '.']
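Note that string.punctuation only covers ASCII punctuation, so characters such as curly quotes or an ellipsis would be dropped by that pattern. A possible variant, shown here as a sketch with the hypothetical names TOKEN_PATTERN and split_words_and_punctuation, treats any character that is neither a word character nor whitespace as its own token:

```python
import re

# \w+ matches a run of word characters (letters, digits, underscore).
# [^\w\s] matches one character that is neither a word character nor
# whitespace, which includes non-ASCII punctuation.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def split_words_and_punctuation(text):
    """Return words and individual punctuation characters in order of appearance."""
    return TOKEN_PATTERN.findall(text)


print(split_words_and_punctuation("Wait… is “this” kept?"))
# ['Wait', '…', 'is', '“', 'this', '”', 'kept', '?']
```

Because the underscore counts as a word character in Python's re module, a name like snake_case stays in one piece with this pattern.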

Answered 2023-10-06
