2 Answers

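Contributed 1821 experience points, received over 5 likes
In general, you do not need to do this yourself, because pandas already does it for you.
In this case, what you need is the unique method, which you can call directly on a Series (pd.Series is, among other things, the abstraction that represents a column). It returns a numpy array containing the unique values in that Series.
If you want the unique values of several columns, you can do the following: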
which_columns = ... # specify the columns whose unique values you want here
uniques = {col: df[col].unique() for col in which_columns}
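
To make the snippet above concrete, here is a minimal sketch on a small DataFrame. The sample data and the variable name pet_values are made up for illustration and are not from the original answer; only unique and the dictionary comprehension come from the answer itself.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "pets": ["cat", "dog", "cat", "monkey"],
    "location": ["San_Diego", "New_York", "New_York", "San_Diego"],
})

# Unique values of a single column, in order of first appearance.
# For plain object columns this is a NumPy array.
pet_values = df["pets"].unique()

# Unique values for several columns at once, keyed by column name.
which_columns = ["pets", "location"]
uniques = {col: df[col].unique() for col in which_columns}

for col, values in uniques.items():
    print(col, list(values))
```

A dictionary is used here rather than a DataFrame because the columns usually have different numbers of unique values, so the results do not line up as rows.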

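Contributed 1836 experience points, received over 5 likes
If you are working with categorical columns, the following code is very useful.
It prints not only the unique values but also the count of each unique value.
categorical_columns = ['col1', 'col2', 'col3', ..., 'coln']  # placeholder: list your own column names here
# Print frequency of categories (df is your DataFrame)
for col in categorical_columns:
    print('\nFrequency of Categories for variable %s' % col)
    print(df[col].value_counts())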
Example:
df
     pets   location     owner
0     cat  San_Diego     Champ
1     dog   New_York       Ron
2     cat   New_York     Brick
3  monkey  San_Diego     Champ
4     dog  San_Diego  Veronica
5     dog   New_York       Ron
categorical_columns = ['pets', 'owner', 'location']
# Print frequency of categories
for col in categorical_columns:
    print('\nFrequency of Categories for variable %s' % col)
    print(df[col].value_counts())
Output:
# Frequency of Categories for variable pets
# dog       3
# cat       2
# monkey    1
# Name: pets, dtype: int64
# Frequency of Categories for variable owner
# Champ       2
# Ron         2
# Brick       1
# Veronica    1
# Name: owner, dtype: int64
# Frequency of Categories for variable location
# New_York     3
# San_Diego    3
# Name: location, dtype: int64
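
If you want to reuse the counts rather than only print them, one possible variation is to store each value_counts result in a dictionary. This is a sketch, not part of the original answer: the DataFrame is rebuilt from the example table above, and the names counts, dog_count and n_distinct are chosen here for illustration.

```python
import pandas as pd

# The example table from the answer, rebuilt as a DataFrame.
df = pd.DataFrame({
    "pets": ["cat", "dog", "cat", "monkey", "dog", "dog"],
    "location": ["San_Diego", "New_York", "New_York",
                 "San_Diego", "San_Diego", "New_York"],
    "owner": ["Champ", "Ron", "Brick", "Champ", "Veronica", "Ron"],
})
categorical_columns = ["pets", "owner", "location"]

# Keep the counts per column instead of printing them.
counts = {col: df[col].value_counts() for col in categorical_columns}

# Look up the count of a single category by its label.
dog_count = counts["pets"]["dog"]

# Number of distinct values in each column.
n_distinct = df[categorical_columns].nunique()

print(dog_count)
print(n_distinct)
```

Each entry in counts is a Series indexed by category, so it can be sorted, filtered or plotted later without recomputing.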