
Using DataFrameMapper() for PolynomialFeatures in sklearn


慕尼黑5688855 2023-08-22 17:16:34
For a housing dataset, I am trying to apply polynomial features to selected columns using DataFrameMapper() from sklearn_pandas. My code:

from sklearn.preprocessing import PolynomialFeatures
from sklearn_pandas import DataFrameMapper

mapper = DataFrameMapper([('houseAge_income', PolynomialFeatures(2)),
                          ('median_income', PolynomialFeatures(2)),
                          (['latitude', 'housing_median_age', 'total_rooms', 'population', 'median_house_value', 'ocean_proximity'], None)])

poly_feature = mapper.fit_transform(housing)

When I try using houseAge_income.reshape(-1, 1) inside DataFrameMapper(), I run into another problem:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'houseAge_income.reshape(-1, 1)'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
5 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

Can anyone tell me what I am missing?

1 Answer

白板的微信


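  • From the documentation:

    • The difference between specifying the column selector as 'column' (a simple string) and as ['column'] (a list with one element) is the shape of the array passed to the transformer. In the first case a one-dimensional array is passed; in the second case a two-dimensional array with one column, i.e. a column vector, is passed.

  • PolynomialFeatures expects two-dimensional input, so here every column is passed with the same type of column selector.

    • In this case that selector is a list, because a list is also needed to keep several columns untransformed.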
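The shape difference quoted from the documentation can be checked without sklearn_pandas. The sketch below is only an illustration: the tiny frame `toy` and the variable names are assumptions made for this example, and the three `median_income` values are taken from the first rows of the housing data. It selects the same column once as a string and once as a one-element list, then passes each result to PolynomialFeatures directly.

```python
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical three-row stand-in for the housing DataFrame.
toy = pd.DataFrame({"median_income": [8.3252, 8.3014, 7.2574]})

# Selecting with a string gives a 1-D array; selecting with a list gives a 2-D column.
one_d = toy["median_income"].to_numpy()
two_d = toy[["median_income"]].to_numpy()
print(one_d.shape)  # (3,)
print(two_d.shape)  # (3, 1)

poly = PolynomialFeatures(2)

# PolynomialFeatures validates its input as 2-D, so the 1-D array is rejected.
try:
    poly.fit_transform(one_d)
    one_d_ok = True
except ValueError:
    one_d_ok = False
print(one_d_ok)

# The 2-D column is expanded into three columns: 1, x, x**2.
expanded = poly.fit_transform(two_d)
print(expanded.shape)  # (3, 3)
```

This is why the string selector fails for the two PolynomialFeatures entries in the question, while the list selector works.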

import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn_pandas import DataFrameMapper

# load data
df = pd.read_csv('https://raw.githubusercontent.com/ageron/handson-ml2/master/datasets/housing/housing.csv')

# create houseAge_income
df['houseAge_income'] = df.housing_median_age.mul(df.median_income)

# configure mapper with all columns passed as lists
mapper = DataFrameMapper([(['houseAge_income'], PolynomialFeatures(2)),
                          (['median_income'], PolynomialFeatures(2)),
                          (['latitude', 'housing_median_age', 'total_rooms', 'population', 'median_house_value', 'ocean_proximity'], None)])

# fit
poly_feature = mapper.fit_transform(df)


# display(pd.DataFrame(poly_feature).head())

   0       1           2  3       4       5      6   7     8     9          10        11
0  1  341.33  1.1651e+05  1  8.3252  69.309  37.88  41   880   322  4.526e+05  NEAR BAY
1  1  174.33       30391  1  8.3014  68.913  37.86  21  7099  2401  3.585e+05  NEAR BAY
2  1  377.38  1.4242e+05  1  7.2574   52.67  37.85  52  1467   496  3.521e+05  NEAR BAY
3  1  293.44       86108  1  5.6431  31.845  37.85  52  1274   558  3.413e+05  NEAR BAY
4  1     200       40001  1  3.8462  14.793  37.85  52  1627   565  3.422e+05  NEAR BAY
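To read the output: columns 0 to 2 are the degree-2 expansion of houseAge_income (bias term 1, the value, its square), columns 3 to 5 are the same expansion for median_income, and columns 6 to 11 are the untransformed columns. As a rough sanity check, the sketch below rebuilds columns 0 to 2 for the five rows shown, using only NumPy and PolynomialFeatures. The input values are copied from columns 7 and 4 of the output; the variable names are assumptions made for this example.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# housing_median_age (column 7) and median_income (column 4) of the five rows shown.
age = np.array([41.0, 21.0, 52.0, 52.0, 52.0])
income = np.array([8.3252, 8.3014, 7.2574, 5.6431, 3.8462])

# Same derived feature as df['houseAge_income'], reshaped to a 2-D column.
house_age_income = (age * income).reshape(-1, 1)

# Degree-2 expansion of a single column yields [1, x, x**2] per row.
block = PolynomialFeatures(2).fit_transform(house_age_income)

# Build the same three columns by hand and compare.
expected = np.column_stack([np.ones(5), age * income, (age * income) ** 2])
matches = np.allclose(block, expected)

print(np.round(block, 2))
print(matches)
```

The printed values should agree with columns 0 to 2 of the table up to display rounding.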




Answered 2023-08-22