2 Answers

I think the field is automatically parsed as a float when the Excel file is read. You can correct it afterwards:
df['column_name'] = df['column_name'].astype(int)
If the column contains null values it cannot be cast to integer, so you need to fill the nulls first:
df['column_name'] = df['column_name'].fillna(0).astype(int)
After that you can concatenate and save the data the way you were doing.
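A minimal sketch of that round trip (the frame below just stands in for a sheet read with pd.read_excel; the column name and values are made up):
import pandas as pd

# hypothetical data standing in for an Excel sheet; the None becomes NaN,
# which forces pandas to store the column as float64
df = pd.DataFrame({'column_name': [1.0, 2.0, None]})
print(df['column_name'].dtype)   # float64

# fill the nulls, then cast back to integer
df['column_name'] = df['column_name'].fillna(0).astype(int)
print(df['column_name'].dtype)   # int64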

Your problem is not related to Spark or PySpark; it is related to Pandas.
It happens because Pandas automatically interprets and infers the data type of each column. Since all the values in your column are numeric, Pandas infers a float dtype for it.
To avoid this, the pandas.ExcelFile.parse method accepts a parameter named converters, which you can use to tell Pandas the data type of specific columns:
# if you want one specific column as string
df = pd.concat([filepath_pd.parse(name, converters={'column_name': str}) for name in names])
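As a quick sanity check (a sketch; workbook.xlsx and Sheet1 are made-up names), the converted column should come back holding Python strings:
import pandas as pd

# hypothetical workbook; any sheet with a 'column_name' column would do
filepath_pd = pd.ExcelFile('workbook.xlsx')
df = filepath_pd.parse('Sheet1', converters={'column_name': str})
print(df['column_name'].dtype)   # object, i.e. Python strings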
Alternatively:
# if you want all columns as string
# and you have multi sheets and they do not have same columns
# this merges all sheets into one dataframe
def get_converters(excel_file, sheet_name, dt_cols):
    cols = excel_file.parse(sheet_name).columns
    converters = {col: str for col in cols if col not in dt_cols}
    for col in dt_cols:
        converters[col] = pd.to_datetime
    return converters

df = pd.concat([filepath_pd.parse(name, converters=get_converters(filepath_pd, name, ['date_column'])) for name in names]).reset_index(drop=True)
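One caveat: excel_file.parse(sheet_name) reads the whole sheet just to discover its column names, so every sheet ends up parsed twice. A possible refinement (assuming a pandas version whose parse forwards nrows to read_excel) reads only the header row:
import pandas as pd

def get_converters(excel_file, sheet_name, dt_cols):
    # nrows=0 fetches just the header row, which is enough to list the columns
    cols = excel_file.parse(sheet_name, nrows=0).columns
    converters = {col: str for col in cols if col not in dt_cols}
    for col in dt_cols:
        converters[col] = pd.to_datetime
    return converters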
Or:
# if you want all columns as string
# and all your sheets have same columns
cols = filepath_pd.parse().columns
dt_cols = ['date_column']
converters = {col: str for col in cols if col not in dt_cols}
for col in dt_cols:
    converters[col] = pd.to_datetime
df = pd.concat([filepath_pd.parse(name, converters=converters) for name in names]).reset_index(drop=True)