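3 Answers
Contributor with 1866 experience points, more than 5 upvotes
I would just use numpy's random functions (randn for the sample data, rand for the mask):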
In [11]: df = pd.DataFrame(np.random.randn(100, 2))
In [12]: msk = np.random.rand(len(df)) < 0.8
In [13]: train = df[msk]
In [14]: test = df[~msk]
And just to check that it worked:
In [15]: len(test)
Out[15]: 21
In [16]: len(train)
Out[16]: 79
Contributor with 1772 experience points, more than 7 upvotes
scikit-learn's train_test_split is a good option.
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size=0.2)
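As a self-contained sketch of the call above, the following builds a sample frame and passes `random_state` so the split is repeatable. The sample data and the value 42 are assumptions for illustration only:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Sample data standing in for the real frame (an assumption for this sketch).
df = pd.DataFrame(np.random.randn(100, 2))

# test_size=0.2 holds out 20% of the rows; random_state makes the shuffle repeatable.
train, test = train_test_split(df, test_size=0.2, random_state=42)

print(len(train), len(test))  # 80 20
```

Unlike a random boolean mask, this gives fixed sizes: with 100 rows and test_size=0.2, the test set has 20 rows and the training set has 80.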
Contributor with 1821 experience points, more than 4 upvotes
I would use scikit-learn's own train_test_split, and generate the split from the index:
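# sklearn.cross_validation was removed from scikit-learn; use model_selection instead
from sklearn.model_selection import train_test_split
y = df.pop('output')  # remove the target column from df and keep it as y
X = df
# Split the index labels rather than the rows themselves
X_train, X_test, y_train, y_test = train_test_split(X.index, y, test_size=0.2)
# X_train holds index labels, so select with .loc (.iloc only matches on a default RangeIndex)
X.loc[X_train]  # returns the training rows as a DataFrame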
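A minimal sketch of the index-based split, assuming a frame whose index is not the default 0..n-1 range. The column name `feature`, the `row0`..`row9` labels, and the `idx_train`/`idx_test` names are hypothetical; only `output` comes from the answer above:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame with string labels, so labels and positions differ.
df = pd.DataFrame(
    {"feature": np.arange(10.0), "output": np.arange(10) % 2},
    index=[f"row{i}" for i in range(10)],
)
y = df.pop("output")  # target column, removed from df
X = df

# Split the index labels; y is shuffled with the same permutation.
idx_train, idx_test, y_train, y_test = train_test_split(
    X.index, y, test_size=0.2, random_state=0
)

# The results are labels, so select rows with .loc rather than .iloc.
X_train = X.loc[idx_train]
X_test = X.loc[idx_test]

print(X_train.shape, X_test.shape)  # (8, 1) (2, 1)
```

Splitting the index instead of the frame keeps the labels on hand, which can be handy when the same partition has to be applied to several frames that share that index.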