2 回答
TA贡献1812条经验 获得超5个赞
您有多个状态行,但它们也位于列上。看来您只是显示了该AL列的示例输出?您可以在State和 然后.groupby Rate和上合并两个数据框Cost。.apply(lambda x: '_'.join(x))然后,返回具有相同费率和成本的州的连接字符串(带有)(因为您按它们分组,所以它们将具有相同的费率和成本):
master_dataframe0 = (pd.merge(df_d[['State', 'AL']], df_m[['State', 'AL']], how='inner', on='State')
.rename({'AL_x' : 'Rate', 'AL_y' : 'Cost'}, axis=1)
.groupby(['Rate', 'Cost'])['State'].apply(lambda x: '_'.join(x)).reset_index()
.sort_values('State'))
master_dataframe0 = master_dataframe0[['State', 'Rate', 'Cost']].assign(State='20_21_RDWY_Purple_' + master_dataframe0['State'])
master_dataframe0
Out[1]:
State Rate Cost
7 20_21_RDWY_Purple_AL_AR_KY_LA_MS_SC_TN 50.80% 120.0
11 20_21_RDWY_Purple_AZ 56.70% 155.0
15 20_21_RDWY_Purple_CA 62.40% 145.0
9 20_21_RDWY_Purple_CO_ND 54.30% 155.0
5 20_21_RDWY_Purple_CT_DE_MN_NE 50.00% 145.0
4 20_21_RDWY_Purple_DC_IA_KS_MD_MI_OH_OK 49.00% 125.0
3 20_21_RDWY_Purple_FL 48.30% 125.0
18 20_21_RDWY_Purple_GA 67.90% 120.0
14 20_21_RDWY_Purple_ID_MT_NV 61.80% 145.0
0 20_21_RDWY_Purple_IL_IN_MO_NC 44.10% 120.0
16 20_21_RDWY_Purple_MA 63.50% 155.0
8 20_21_RDWY_Purple_ME_NH_NM_RI_SD 53.90% 155.0
6 20_21_RDWY_Purple_NJ 50.50% 145.0
13 20_21_RDWY_Purple_NY 61.10% 145.0
17 20_21_RDWY_Purple_OR 64.40% 185.0
2 20_21_RDWY_Purple_PA 47.20% 145.0
10 20_21_RDWY_Purple_TX 56.60% 125.0
1 20_21_RDWY_Purple_UT 45.00% 170.0
12 20_21_RDWY_Purple_VA 57.90% 120.0
TA贡献1875条经验 获得超3个赞
借助 Erickson 的 和 lambda 函数帮助.groupby,我们得出了正确的解决方案:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)
df_d = pd.read_excel(path,
sheet_name=0,
header=0,
index_col=False,
keep_default_na=True)
df_m = pd.read_excel(path2,
sheet_name=0,
header=0,
index_col=False,
keep_default_na=True)
m_col_names = ['AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DC', 'DE', 'FL', 'GA', 'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA',
'MA', 'MD', 'ME', 'MI', 'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM', 'NV', 'NY', 'OH',
'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY', 'AB', 'BC',
'MB', 'NB', 'NF', 'NS', 'ON', 'PE', 'PQ', 'SK']
final_frame = pd.DataFrame()
for state in m_col_names:
master_dataframe0 = (pd.merge(df_d[['State', state]], df_m[['State', state]], how='inner', on='State')
.rename({state + '_x': 'Rate', state + '_y': 'Cost'}, axis=1)
.groupby(['Rate', 'Cost'])['State'].apply(lambda x: '_'.join(x)).reset_index()
.sort_values('State'))
master_dataframe0['Origin'] = state
master_dataframe0 = master_dataframe0[['State', 'Rate', 'Cost', 'Origin']].assign(
State='20_21_RDWY_Purple_' + master_dataframe0['State'])
final_frame = final_frame.append(master_dataframe0)
print(final_frame)
final_frame.to_excel("w3llshipmeright.xlsx")
正确输出:
添加回答
举报