How to remove the decimal point from a string using pandas

Python

绝地无双 2021-12-09 15:15:50

I am reading an .xls file and converting it to a CSV file with PySpark on Databricks. In the .xls file my input value is the string 101101114501700, but after I convert it to CSV with pandas and write it to the data lake folder, the value shows up as 101101114501700.0. My code is below. Why am I getting a decimal part in my data?

import os
import time
import pandas as pd

for file in os.listdir("/path/to/file"):
    if file.endswith(".xls"):
        filepath = os.path.join("/path/to/file", file)
        filepath_pd = pd.ExcelFile(filepath)
        names = filepath_pd.sheet_names
        df = pd.concat([filepath_pd.parse(name) for name in names])
        df1 = df.to_csv("/path/to/file"+file.split('.')[0]+".csv", sep=',', encoding='utf-8', index=False)
print(time.strftime("%Y%m%d-%H%M%S") + ": XLS files converted to CSV and moved to folder")

2 Answers

尚方宝剑之说

Contributed 1788 experience points, received more than 4 upvotes

I think the field is automatically parsed as a float when the Excel file is read. I would correct it afterwards:

df['column_name'] = df['column_name'].astype(int)

If your column contains null values, it cannot be converted to a plain integer column, so you need to fill the nulls first:

df['column_name'] = df['column_name'].fillna(0).astype(int)

Then you can concatenate and store the result the way you already do.
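If filling missing values with 0 would be misleading, one alternative is pandas' nullable integer dtype. The following is a minimal sketch, not part of the original answer: it assumes every non-missing value is a whole number, and the frame is built by hand purely for illustration, using the same placeholder `column_name`.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for a column that was read from Excel as float64
# with one missing cell (illustrative data, not the asker's file).
df = pd.DataFrame({"column_name": [101101114501700.0, np.nan]})

# "Int64" (capital I) is the nullable integer dtype: whole numbers are stored
# as integers and missing cells stay missing instead of becoming 0.
df["column_name"] = df["column_name"].astype("Int64")

csv_text = df.to_csv(index=False)
print(df["column_name"].dtype)
print(csv_text)
```

The conversion raises an error if any value has a real fractional part, so it only suits columns that hold identifiers or counts.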

Replied 2021-12-09

繁华开满天机

Contributed 1816 experience points, received more than 4 upvotes

Your problem has nothing to do with Spark or PySpark. It is a pandas issue.

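This happens because pandas automatically interprets and infers the data type of each column. Since every value in your column is numeric, pandas treats the column as float, and a float is written to CSV with a trailing .0.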
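The effect of the column dtype on the CSV output can be reproduced without any Excel file. This is a small sketch with hand-built frames (the column name `id` is made up for the example): the same value is written once from a float64 column and once from a string column.

```python
import pandas as pd

# The same identifier held as a float and as text (illustrative data only).
as_float = pd.DataFrame({"id": [101101114501700.0]})
as_text = pd.DataFrame({"id": ["101101114501700"]})

float_csv = as_float.to_csv(index=False)
text_csv = as_text.to_csv(index=False)

print(as_float["id"].dtype)  # dtype of the numeric column
print(float_csv)             # CSV text produced from the float column
print(text_csv)              # CSV text produced from the string column
```

Only the float column picks up the decimal part, which is why keeping the column as a string while reading avoids the problem.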

To avoid this, the pandas.ExcelFile.parse method accepts a parameter named converters, which you can use to tell pandas the data type of specific columns:

# if you want one specific column as string

df = pd.concat([filepath_pd.parse(name, converters={'column_name': str}) for name in names])

Or

# if you want all columns as strings
# and you have multiple sheets that do not share the same columns;
# this merges all sheets into one DataFrame
def get_converters(excel_file, sheet_name, dt_cols):
    cols = excel_file.parse(sheet_name).columns
    converters = {col: str for col in cols if col not in dt_cols}
    for col in dt_cols:
        converters[col] = pd.to_datetime
    return converters

df = pd.concat([filepath_pd.parse(name, converters=get_converters(filepath_pd, name, ['date_column'])) for name in names]).reset_index(drop=True)

Or

# if you want all columns as strings
# and all your sheets have the same columns
cols = filepath_pd.parse().columns
dt_cols = ['date_column']
converters = {col: str for col in cols if col not in dt_cols}
for col in dt_cols:
    converters[col] = pd.to_datetime

df = pd.concat([filepath_pd.parse(name, converters=converters) for name in names]).reset_index(drop=True)
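For the simple case where every column can stay as text and there are no date columns, the whole conversion loop might be sketched as below. This is not the answer's method: it swaps in `pd.read_excel` with `dtype=str` in place of per-column converters, the helper names `csv_path_for`, `combine_sheets` and `convert_folder` are hypothetical, and reading .xls files assumes a suitable Excel engine (such as xlrd) is installed.

```python
import os

import pandas as pd


def csv_path_for(xls_name, out_dir):
    """Build the output path, swapping only the final extension for .csv."""
    stem, _ext = os.path.splitext(os.path.basename(xls_name))
    return os.path.join(out_dir, stem + ".csv")


def combine_sheets(sheets):
    """Stack a {sheet_name: DataFrame} mapping into one frame with a fresh index."""
    return pd.concat(list(sheets.values()), ignore_index=True)


def convert_folder(in_dir, out_dir):
    """Convert every .xls file in in_dir to a CSV in out_dir, keeping cells as text."""
    for file in os.listdir(in_dir):
        if not file.endswith(".xls"):
            continue
        # sheet_name=None returns a dict with one DataFrame per sheet;
        # dtype=str asks pandas to keep values as strings instead of inferring numbers.
        sheets = pd.read_excel(os.path.join(in_dir, file), sheet_name=None, dtype=str)
        combine_sheets(sheets).to_csv(
            csv_path_for(file, out_dir), index=False, encoding="utf-8"
        )
```

A call such as `convert_folder("/path/to/file", "/path/to/file")` would mirror the question's loop. Because `dtype=str` also turns date columns into text, the converters approach above remains the better fit when some columns must be parsed as dates.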

Replied 2021-12-09
