
Using string formatting in a pandas DataFrame column

Python

幕布斯7119047 2023-08-22 10:34:12

I have the following DataFrame (both columns are of type str):

+------+-----------------+
| year | indicator_short |
+------+-----------------+
| 2020 | ind_1           |
| 2019 | ind_2           |
| 2019 | ind_3           |
| N/A  | ind_4           |
+------+-----------------+

I want to add a new column that concatenates the two existing columns, formatted like this:

+------+-----------------+--------------------+
| year | indicator_short | indicator_full     |
+------+-----------------+--------------------+
| 2020 | ind_1           | Indicator_1 (2020) |
| 2019 | ind_2           | Indicator_2 (2019) |
| 2019 | ind_3           | Indicator_3 (2019) |
| N/A  | ind_4           | Indicator_4 (N/A)  |
+------+-----------------+--------------------+

One thing that came to mind was to use format, for example:

df['indicator_full'][df['indicator_short']=='ind_1'] = 'Indicator_1 ({})'.format(df['year'])

But it gives the wrong result.


3 Answers

烙印99

Contributed 1829 experience points, received more than 13 likes

I would use string concatenation and format the string columns like this:

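# Cast the years to str and strip the trailing ".0" left over from float values
years = '(' + df['year'].astype(str).str.replace(r'\.0$', '', regex=True) + ')'
# years = '(' + df['year'] + ')'  # use this instead if the year column already holds strings
# Keep the part after the last underscore, prefix it, then join it with the years
df['indicator_full'] = ('Indicator_' + df['indicator_short'].str.rsplit('_').str[-1]).str.cat(years, sep=' ')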

print(df)

     year indicator_short      indicator_full
0  2020.0           ind_1  Indicator_1 (2020)
1  2019.0           ind_2  Indicator_2 (2019)
2  2019.0           ind_3  Indicator_3 (2019)
3     NaN           ind_4   Indicator_4 (nan)
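As a side note on why the attempt in the question goes wrong: calling .format(df['year']) passes the whole Series to a single format call, so the text representation of the entire column ends up inside one string. Below is a minimal sketch of that effect, followed by a row-wise alternative that formats one pair of values at a time. The sample frame is rebuilt from the question, and the names `broken`, `short` and `year` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (both columns hold strings).
df = pd.DataFrame({
    "year": ["2020", "2019", "2019", "N/A"],
    "indicator_short": ["ind_1", "ind_2", "ind_3", "ind_4"],
})

# Passing the whole Series to str.format produces ONE Python string that
# contains the printed form of the entire column, not one value per row.
broken = "Indicator_1 ({})".format(df["year"])
print(type(broken))

# Row-wise alternative: format each (indicator_short, year) pair separately.
df["indicator_full"] = [
    "Indicator_{} ({})".format(short.rsplit("_", 1)[-1], year)
    for short, year in zip(df["indicator_short"], df["year"])
]
print(df)
```

The list comprehension is a plain Python loop, so it is likely slower than the vectorised .str methods on large frames, but it keeps the format string readable.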

2023-08-22

慕侠2389804

Contributed 1719 experience points, received more than 6 likes

Use Series.str.extract to pull the integer out of indicator_short, convert the floats in the year column to integers, and finally join the pieces together:

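# Extract the digits from indicator_short (raw string avoids an invalid-escape warning)
i = df['indicator_short'].str.extract(r'(\d+)', expand=False)
# Nullable Int64 drops the ".0"; the missing value prints as '<NA>' and is swapped for 'N/A'
y = df['year'].astype('Int64').astype(str).replace('<NA>', 'N/A')
df['indicator_full'] = 'Indicator_' + i + ' (' + y + ')'
print(df)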

     year indicator_short      indicator_full
0  2020.0           ind_1  Indicator_1 (2020)
1  2019.0           ind_2  Indicator_2 (2019)
2  2019.0           ind_3  Indicator_3 (2019)
3     NaN           ind_4   Indicator_4 (N/A)
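For anyone who wants to run this end to end, here is a self-contained sketch. It assumes the year column was loaded as floats with NaN for the missing entry, which is what the output above suggests. It is a slight variant rather than the exact snippet: it goes through the nullable 'string' dtype and uses fillna instead of replacing the literal text '<NA>', which I believe is less dependent on how a given pandas version prints missing values. The sample frame is an assumption rebuilt from the question.

```python
import numpy as np
import pandas as pd

# Assumed input: year stored as float, with NaN for the missing entry.
df = pd.DataFrame({
    "year": [2020.0, 2019.0, 2019.0, np.nan],
    "indicator_short": ["ind_1", "ind_2", "ind_3", "ind_4"],
})

# Digits from indicator_short, kept in the nullable string dtype.
i = df["indicator_short"].astype("string").str.extract(r"(\d+)", expand=False)

# float -> nullable Int64 (drops the ".0") -> string, then fill the gap.
y = df["year"].astype("Int64").astype("string").fillna("N/A")

df["indicator_full"] = "Indicator_" + i + " (" + y + ")"
print(df)
```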

2023-08-22

鸿蒙传说

Contributed 1865 experience points, received more than 7 likes

Replace ind with Indicator using .str.replace, then use .str.cat() to concatenate the two columns.

# Assumes the year column already holds strings, as in the question
df['indicator_full'] = df['indicator_short'].str.replace('ind', 'Indicator').str.cat('(' + df['year'] + ')', sep=' ')
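One caveat worth keeping in mind: if the year column contains a real missing value (None or NaN) rather than the literal text 'N/A', both the + concatenation and .str.cat() return a missing result for that row. A minimal sketch of one way around it, assuming you want 'N/A' shown in that case; the sample frame with None in it is my assumption, not the asker's data.

```python
import pandas as pd

# Assumed input: string years, with a genuine missing value in the last row.
df = pd.DataFrame({
    "year": ["2020", "2019", "2019", None],
    "indicator_short": ["ind_1", "ind_2", "ind_3", "ind_4"],
})

# Fill the gap BEFORE concatenating, otherwise the missing value propagates.
years = "(" + df["year"].fillna("N/A") + ")"

df["indicator_full"] = (
    df["indicator_short"]
    .str.replace("ind", "Indicator", regex=False)  # literal, non-regex replace
    .str.cat(years, sep=" ")
)
print(df)
```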

2023-08-22
