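2 Answers
Contributor: 1856 experience points, 5+ upvotes
Here is how I solved it, using .explode() and .value_counts(). You can also assign the result to a column or use the output however you like. In one line: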
print(df.explode('column')['column'].value_counts())
Full example:
import pandas as pd
data_1 = {'index':[0,1,2,3],'column':[['abc','mno'],['mno','pqr'],['abc','mno'],['mno','pqr']]}
df = pd.DataFrame(data_1)
df = df.set_index('index')
print(df)
           column
index
0      [abc, mno]
1      [mno, pqr]
2      [abc, mno]
3      [mno, pqr]
Here we call .explode() to give each list element its own row, then value_counts() to count how often each unique value occurs:
df_new = df.explode('column')
print(df_new['column'].value_counts())
Output:
mno 4
abc 2
pqr 2
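The remark about assigning the result to a column can be read in more than one way. One possible reading, as a minimal sketch: turn the counts into a small two-column DataFrame. The names "value" and "count" are illustrative assumptions, not something the answer itself uses.

```python
import pandas as pd

df = pd.DataFrame(
    {"column": [["abc", "mno"], ["mno", "pqr"], ["abc", "mno"], ["mno", "pqr"]]}
)

# One row per list element, then count each distinct value.
counts = df["column"].explode().value_counts()

# Convert the Series into a DataFrame with explicit column names.
# "value" and "count" are hypothetical names chosen for this sketch.
counts_df = counts.rename_axis("value").reset_index(name="count")
print(counts_df)
```

Naming the axis before reset_index keeps the column names the same regardless of how the installed pandas version labels the value_counts() result.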
Contributor: 1825 experience points, 4+ upvotes
Use collections.Counter:
from collections import Counter
from itertools import chain
Counter(chain.from_iterable(df.column))
Out[196]: Counter({'abc': 2, 'mno': 4, 'pqr': 2})
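The snippet above relies on the df from the first answer. A self-contained sketch of the same idea, assuming the data is available as a plain list of lists (the name rows is hypothetical), might look like this:

```python
from collections import Counter
from itertools import chain

# Same sample data as the question, as a plain list of lists.
rows = [["abc", "mno"], ["mno", "pqr"], ["abc", "mno"], ["mno", "pqr"]]

# chain.from_iterable flattens the nested lists lazily;
# Counter then tallies each element in a single pass.
counts = Counter(chain.from_iterable(rows))

# most_common() returns (value, count) pairs, highest count first.
print(counts.most_common())
```

Passing df.column instead of rows works the same way, because iterating over a pandas Series yields its elements, here the inner lists.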
Timing with %timeit:
df1 = pd.concat([df]*10000, ignore_index=True)
In [227]: %timeit pd.Series(Counter(chain.from_iterable(df1.column)))
14.3 ms ± 279 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [228]: %timeit df1.column.explode().value_counts()
127 ms ± 3.06 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
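In the timings above, the Counter version is roughly nine times faster than explode().value_counts() on this data. %timeit only works inside IPython or Jupyter, so here is a rough sketch of the same comparison as a plain script using the standard timeit module. The helper names with_counter and with_explode are assumptions made for this sketch, and the numbers will vary by machine and pandas version.

```python
import timeit
from collections import Counter
from itertools import chain

import pandas as pd

df = pd.DataFrame(
    {"column": [["abc", "mno"], ["mno", "pqr"], ["abc", "mno"], ["mno", "pqr"]]}
)
# Repeat the four sample rows to get a 40,000-row frame, as in the answer.
df1 = pd.concat([df] * 10000, ignore_index=True)


def with_counter():
    # Flatten the lists in pure Python and tally with Counter.
    return pd.Series(Counter(chain.from_iterable(df1["column"])))


def with_explode():
    # Let pandas expand the lists into rows, then count values.
    return df1["column"].explode().value_counts()


# Both approaches should produce the same counts.
assert with_counter().to_dict() == with_explode().to_dict()

for fn in (with_counter, with_explode):
    # Best of 3 repeats, 5 calls each, reported per call.
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1e3:.1f} ms per call")
```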
