4 Answers

Contributor: 1862 experience points, 7+ upvotes
Here is one way, using collections.defaultdict.
Example:
from collections import defaultdict
a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'], ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'], ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'], ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'], ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'], ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]
result = defaultdict(int)
seen = set()
for k, v in a:
    key = "{}_{}".format(k, v)
    if key not in seen:
        result[v] += 1
        seen.add(key)
print(list(map(list, result.items())))
Output:
[['referral', 3], ['affiliate', 3], ['cpc', 4], ['orgainic', 2]]
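One caveat with the loop above: the "{}_{}" string key can make two different pairs look identical when a name contains an underscore (for example "a_b" + "c" and "a" + "b_c" both become "a_b_c"), so the second pair is silently skipped. Below is a sketch of the same approach keyed on (user, type) tuples instead. The function name count_unique_users is hypothetical:

```python
from collections import defaultdict


def count_unique_users(pairs):
    """Count distinct users per type; pairs is a list of [user, type] items."""
    result = defaultdict(int)
    seen = set()
    for user, kind in pairs:
        key = (user, kind)  # tuple key: no separator, so no accidental collisions
        if key not in seen:
            seen.add(key)
            result[kind] += 1
    return dict(result)


# Both pairs format to the same string "a_b_c" ...
print("{}_{}".format("a_b", "c") == "{}_{}".format("a", "b_c"))  # True
# ... but as tuples they stay distinct, so both types are counted
print(count_unique_users([["a_b", "c"], ["a", "b_c"]]))  # {'c': 1, 'b_c': 1}
```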

Contributor: 1155 experience points, 0+ upvotes
First, let's make the entries unique (lists are not hashable, so each one is converted to a tuple):
c = {tuple(sublist) for sublist in a}
Now we have the unique (user, type) pairs.
We don't need the user for the count, so let's reduce this to a list containing only the second element:
c = [elem[1] for elem in c]
Now we can count them easily:
from collections import Counter
c = Counter(c)
Result: Counter({'cpc': 4, 'affiliate': 3, 'referral': 3, 'orgainic': 2})
Now put it all together:
from collections import Counter
c = Counter(elem[1] for elem in {tuple(sublist) for sublist in a})
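If you want the same list-of-lists shape as the first answer, the Counter can be converted with Counter.most_common(), which orders the (type, count) pairs by count, highest first. A sketch on a shortened sample built from entries in the question (not the full list), chosen so no two counts tie:

```python
from collections import Counter

# Shortened sample: referral has 2 unique users, cpc 3, affiliate 1
a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'],
     ['user4', 'cpc'], ['user14', 'cpc'], ['user8', 'cpc'], ['user4', 'cpc'],
     ['user7', 'affiliate']]

c = Counter(elem[1] for elem in {tuple(sublist) for sublist in a})
# most_common() returns (type, count) tuples sorted by count, descending
res = [list(pair) for pair in c.most_common()]
print(res)  # [['cpc', 3], ['referral', 2], ['affiliate', 1]]
```

On the full data, affiliate and referral both have 3 users; because the types are counted from an unordered set, those two may come out in either order.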

Contributor: 1797 experience points, 4+ upvotes
defaultdict- and loop-based solution
This can be done with a defaultdict of sets:
from collections import defaultdict
d = defaultdict(set)
for user, category in a:
    d[category].add(user)
res = [[category, len(users)] for category, users in d.items()]
Output (dicts keep insertion order in Python 3.7+):
# [['referral', 3], ['affiliate', 3], ['cpc', 4], ['orgainic', 2]]
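Beyond the counts, the sets stored in d also record which users belong to each category. A sketch on a short sample taken from the question's data (not the full list). The name members is hypothetical:

```python
from collections import defaultdict

a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'],
     ['user4', 'cpc'], ['user4', 'referral'], ['user2', 'cpc']]

d = defaultdict(set)
for user, category in a:
    d[category].add(user)  # duplicate (user, category) pairs are absorbed by the set

# Counts, as in the answer
res = [[category, len(users)] for category, users in d.items()]
# The sets also tell you which users; sort them for a stable display
members = {category: sorted(users) for category, users in d.items()}
print(res)      # [['referral', 3], ['cpc', 2]]
print(members)  # {'referral': ['user1', 'user2', 'user4'], 'cpc': ['user2', 'user4']}
```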
groupby-based solution
Alternatively, this can be done with groupby from itertools:
from itertools import groupby
from operator import itemgetter
a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'], ...]
# Sort the items according to the category so groupby will collect the pairs accordingly
res = {category: len({user for user, _ in pairs})
       for category, pairs in groupby(sorted(a, key=itemgetter(1)), key=itemgetter(1))}
res = [list(pair) for pair in res.items()]
Output:
# [['affiliate', 3], ['cpc', 4], ['orgainic', 2], ['referral', 3]]
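The sort before groupby matters: itertools.groupby only merges consecutive items that share a key, so unsorted input can split one category into several groups. A small illustration on a hypothetical three-item sample:

```python
from itertools import groupby
from operator import itemgetter

a = [['user1', 'referral'], ['user4', 'cpc'], ['user2', 'referral']]

# Without sorting, the two 'referral' rows are not adjacent,
# so groupby yields 'referral' as two separate groups.
unsorted_keys = [k for k, _ in groupby(a, key=itemgetter(1))]
print(unsorted_keys)  # ['referral', 'cpc', 'referral']

# After sorting by category, each category forms exactly one group.
sorted_keys = [k for k, _ in groupby(sorted(a, key=itemgetter(1)), key=itemgetter(1))]
print(sorted_keys)  # ['cpc', 'referral']
```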

Contributor: 1934 experience points, 2+ upvotes
This sounds like a job for pandas, and your list is already in the right shape:
import pandas as pd
a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'], ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'], ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'], ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'], ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'], ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]
df = pd.DataFrame(a)
df.columns = ["user", "type"]
unique_per_type = df.groupby("type")["user"].unique()
Now unique_per_type is:
type
affiliate [user1, user7, user9]
cpc [user4, user14, user2, user8]
orgainic [user3, user2]
referral [user1, user2, user4]
Name: user, dtype: object
You can then do the following:
# access length by key
len(unique_per_type["affiliate"])
# or use it like a dict
for key, val in unique_per_type.items():
    print(key, len(val))
This solution adds pandas, which is a large dependency. However, once your data is in a DataFrame, you can do a lot more with it:
df["user"].unique() # shows all unique users
df.query("user=='user1'") # shows all observations involving user1
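pandas can also count distinct values per group directly with SeriesGroupBy.nunique(), which skips building the arrays of unique users. This is not part of the original answer; here is a minimal sketch on a shortened sample from the question's data:

```python
import pandas as pd

a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'],
     ['user4', 'cpc'], ['user14', 'cpc'], ['user4', 'cpc']]
df = pd.DataFrame(a, columns=["user", "type"])

# nunique() counts distinct users per type; groupby sorts the keys by default
counts = df.groupby("type")["user"].nunique()
# Convert numpy integers to plain ints for a clean dict display
print({k: int(v) for k, v in counts.items()})  # {'cpc': 2, 'referral': 2}
```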