How do I extract certain rows based on a condition?

I am working with a dataset whose first column holds a sentiment or category label. Because the dataset is imbalanced, I need to extract the same number of rows for each category. That is, if there are 10 categories, I want to pick only 100 sample rows from each one, for a result of 1000 rows.

What I have tried:

    def append_new_rows(df, new_df, s):
        c = 0
        for index, row in df.iterrows():
            if s == row[0]:
                if c <= 100:
                    new_df.append(row)
                    c += 1
        return df_2

    for s in sorted(list(set(df.category))):
        new_df = append_new_rows(df, new_df, s)

Dataset

    ----------------------------------
    | category | A   | B   | C   | D   |
    ----------------------------------
    | happy    | ... | ... | ... | ... |
    | ...      | ... | ... | ... | ... |
    | sadness  | ... | ... | ... | ... |

Expected output

    ----------------------------------
    | category | A   | B   | C   | D   |
    ----------------------------------
    | happy    | ... | ... | ... | ... |
    ... 100 samples of happy
    | ...      | ... | ... | ... | ... |
    | sadness  | ... | ... | ... | ... |
    ... 100 samples of sadness
    ...
    ...
    1000 sample rows


1 Answer

Helenr


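    import pandas as pd

    def append_new_df(df, df_2, s, n):
        # Copy at most n rows whose first column equals s from df into df_2.
        c = 1
        for index, row in df.iterrows():
            if s == row.iloc[0]:
                if c <= n:
                    # DataFrame.append was removed in pandas 2.0; pd.concat
                    # also returns a new frame, so the result is reassigned.
                    df_2 = pd.concat([df_2, row.to_frame().T])
                    c += 1
        return df_2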

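You were almost there. In your version the result of new_df.append(row) was discarded and the function returned df_2, which is never defined. You only need something like the function above: it assigns the result back to df_2, returns it, and starts the counter at 1 so that c <= n keeps exactly n rows per category.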
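As an alternative to the row-by-row loop, the same selection can probably be done with a pandas groupby. The sketch below is not part of the original answer. The helper name take_per_category, its parameters, and the demo frame are hypothetical and exist only for illustration, and it assumes the label column is named category as in the question.

```python
import pandas as pd


def take_per_category(df, column="category", n=100, random_state=None):
    """Return up to n rows for each distinct value in `column`.

    With random_state=None the first n rows of every group are kept,
    which is what the loop-based version selects. With an integer
    random_state the rows are shuffled first, so the pick is random
    but reproducible.
    """
    if random_state is not None:
        # Shuffle once, then take the first n rows of each group.
        # Groups smaller than n simply contribute all of their rows.
        df = df.sample(frac=1, random_state=random_state)
    return df.groupby(column, sort=False).head(n)


# Hypothetical toy data: 5 "happy", 3 "sadness", 1 "anger" rows.
demo = pd.DataFrame({
    "category": ["happy"] * 5 + ["sadness"] * 3 + ["anger"],
    "A": range(9),
})

balanced = take_per_category(demo, n=2)
print(balanced)
```

In this sketch, groupby(...).head(n) keeps the rows in their original order and with their original index. If the result should be grouped by label as in the expected output, balanced.sort_values("category", kind="stable") is one way to do it. A category with fewer than n rows contributes only the rows it has, so the total can fall short of n times the number of categories.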

2023-10-11
