
IRIS decision tree classification problem

Python

jeck猫 2022-11-18 16:42:21

I want to write a simple classifier on the IRIS dataset and get its recall and precision scores. I followed a YouTube video, but when I test the accuracy it gives me 100%. I have some guesses about what is wrong but don't know what to do about them. Can you help me extend the code to make it better? And how do I write a recall function for this kind of (multiclass) classification?

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz
import graphviz

iris = load_iris()
x = iris.data    # features
y = iris.target  # target labels

tree_clf = DecisionTreeClassifier()
model = tree_clf.fit(x, y)  # model fitting

dot_data = export_graphviz(tree_clf, out_file=None,
                           feature_names=iris.feature_names,
                           class_names=iris.target_names,
                           filled=True, rounded=True, special_characters=True)
graph = graphviz.Source(dot_data)
graph.render("iris")

accuracy = tree_clf.score(x, y)
print(accuracy)


3 answers

牛魔王的故事


To check your results, you can use sklearn.metrics:

from sklearn.metrics import classification_report

print(classification_report(y, model.predict(x)))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       1.00      1.00      1.00        50
           2       1.00      1.00      1.00        50

    accuracy                           1.00       150
   macro avg       1.00      1.00      1.00       150
weighted avg       1.00      1.00      1.00       150
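The question also asks how to write recall by hand. A minimal sketch follows; the helper name `per_class_recall_precision` is hypothetical and not part of scikit-learn. For each class c, recall is TP / (TP + FN) and precision is TP / (TP + FP), and the hand-computed values can be cross-checked against scikit-learn's `recall_score` and `precision_score` with `average=None`, which return one score per class:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score


def per_class_recall_precision(y_true, y_pred):
    """Hypothetical helper: per-class recall and precision for multiclass labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Sorted union of all labels, the same label order scikit-learn uses
    classes = np.unique(np.concatenate([y_true, y_pred]))
    recall = np.zeros(len(classes))
    precision = np.zeros(len(classes))
    for k, c in enumerate(classes):
        tp = np.sum((y_true == c) & (y_pred == c))  # class c predicted as c
        fn = np.sum((y_true == c) & (y_pred != c))  # class c predicted as something else
        fp = np.sum((y_true != c) & (y_pred == c))  # other classes predicted as c
        # Return 0.0 instead of dividing by zero when the denominator is empty
        recall[k] = tp / (tp + fn) if tp + fn > 0 else 0.0
        precision[k] = tp / (tp + fp) if tp + fp > 0 else 0.0
    return classes, recall, precision


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
classes, rec, prec = per_class_recall_precision(y_true, y_pred)
print("classes:  ", classes)
print("recall:   ", rec, recall_score(y_true, y_pred, average=None))
print("precision:", prec, precision_score(y_true, y_pred, average=None))
```

For these toy labels, recall is 0.5, 1.0 and 0.5 for classes 0, 1 and 2, and precision is 0.5, 2/3 and 1.0. In practice, `classification_report` already computes exactly these numbers, so the hand-written version is mainly useful for understanding them.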

If you doubt the result, inspect the predictions visually:

print(model.predict(x))
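Reading 150 raw predictions is tedious. As a sketch, a confusion matrix gives the same check in compact form: rows are true classes, columns are predicted classes, and every off-diagonal count is a mistake. This reproduces the question's setup (fitting and predicting on the full dataset); `random_state=0` is an added assumption for reproducibility:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x, y = iris.data, iris.target

# Same setup as the question: fit and predict on the full dataset
model = DecisionTreeClassifier(random_state=0).fit(x, y)
pred = model.predict(x)

# Rows are true classes, columns are predicted classes;
# any off-diagonal entry counts misclassified samples
cm = confusion_matrix(y, pred)
print(iris.target_names)
print(cm)

# Indices of the samples the model got wrong (empty for a perfect fit)
wrong = (pred != y).nonzero()[0]
print("misclassified indices:", wrong)
```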

2022-11-18

慕的地10843


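You have made a fundamental machine learning mistake: evaluating the model on the same data that was used to train it. Instead, split the data into two sets, training and test. Train the model on the training data and evaluate it on the test data. See https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html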

Try something like this:

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y)
model = tree_clf.fit(x_train, y_train)
accuracy = tree_clf.score(x_test, y_test)
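Putting this together with the question's goal of precision and recall, a self-contained sketch might look like the following. The 20% test size, `stratify=y` (equal class proportions in both splits) and `random_state=42` are choices made for this example, not requirements:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x, y = iris.data, iris.target

# Hold out 20% (30 of 150 samples) for testing. stratify=y keeps the class
# proportions equal in both splits; random_state makes the split reproducible.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, stratify=y, random_state=42
)

tree_clf = DecisionTreeClassifier(random_state=42)
tree_clf.fit(x_train, y_train)

# Accuracy, precision and recall are all computed on samples the tree never saw
print("test accuracy:", tree_clf.score(x_test, y_test))
print(classification_report(y_test, tree_clf.predict(x_test),
                            target_names=iris.target_names))
```

The test accuracy may still be high, because iris is an easy dataset, but it is now an honest estimate rather than a measure of memorization.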

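To see why this is a problem, consider the extreme case of a "cheating" model that simply memorizes the input data and outputs whatever it memorized. With your code, it would get 100% accuracy while learning nothing.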
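The "cheating" model can be made concrete. The `MemorizingClassifier` below is a hypothetical toy for illustration, not a scikit-learn estimator: it stores every training sample in a lookup table and, for any input it has not seen, falls back to the most common training label:

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


class MemorizingClassifier:
    """Hypothetical toy model: memorizes training samples, learns nothing."""

    def fit(self, x, y):
        # Exact feature vector -> label lookup table
        self.table = {tuple(row): label for row, label in zip(x, y)}
        # Label returned for any feature vector not in the table
        self.fallback = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, x):
        return [self.table.get(tuple(row), self.fallback) for row in x]

    def score(self, x, y):
        pred = self.predict(x)
        return sum(p == t for p, t in zip(pred, y)) / len(y)


iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, stratify=iris.target, random_state=0
)

cheat = MemorizingClassifier().fit(x_train, y_train)
print("accuracy on training data:", cheat.score(x_train, y_train))
print("accuracy on held-out data:", cheat.score(x_test, y_test))
```

On the training data the lookup table scores perfectly, just like the unrestricted tree in the question; on held-out data it mostly returns the single fallback label, and the gap between the two scores exposes the memorization.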

2022-11-18

aluckdog


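So I applied the suggested changes and wrote some code of my own for the 150 data points (120 for training and 30 for testing). My question is: is my classification report implementation correct?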

import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def accuracy(y_true, y_predict):
    # Percentage of predictions that match the true labels
    count = 0
    for i in range(len(y_true)):
        if y_true[i] == y_predict[i]:
            count += 1
    return count * 100.0 / len(y_true)


# Reading training data (columns 0-3 are features, column 4 is the label)
train_data = pd.read_csv("iris_train_data.csv", header=0)
x_train = train_data.iloc[:, 0:4].to_numpy()
y_train = train_data.iloc[:, 4].to_numpy()

# Training the classifier
clf = DecisionTreeClassifier(criterion='entropy')
clf.fit(x_train, y_train)
print('Depth of learnt tree is', clf.tree_.max_depth)
# Query the leaf count instead of hard-coding it
print('Number of leaf nodes in learnt tree is', clf.get_n_leaves(), '\n')

# Reading test data
test_data = pd.read_csv("iris_test_data.csv", header=0)
x_test = test_data.iloc[:, 0:4].to_numpy()
y_test = test_data.iloc[:, 4].to_numpy()

# Training accuracy and test accuracy without pruning
print('Training accuracy of classifier is', accuracy(y_train, clf.predict(x_train)))
print('Test accuracy using classifier is', accuracy(y_test, clf.predict(x_test)), '\n')
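The hand-written `accuracy()` returns a percentage of exact matches, which is what `sklearn.metrics.accuracy_score` computes as a fraction. A sketch that checks the two agree is below; since the CSV files are not available here, it assumes they hold an equivalent 120/30 split and uses a split of the built-in dataset as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def accuracy(y_true, y_predict):
    # Same logic as the hand-written function: percentage of exact matches
    matches = sum(t == p for t, p in zip(y_true, y_predict))
    return matches * 100.0 / len(y_true)


# Stand-in for iris_train_data.csv / iris_test_data.csv (assumed 120/30 split)
iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=30, stratify=iris.target, random_state=0
)

clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(x_train, y_train)
pred = clf.predict(x_test)

# accuracy_score returns a fraction in [0, 1]; scale by 100 to compare
print("hand-written:  ", accuracy(y_test, pred))
print("accuracy_score:", accuracy_score(y_test, pred) * 100)
print("depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())
```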

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report


def accuracy(y_true, y_predict):
    # Percentage of predictions that match the true labels
    count = 0
    for i in range(len(y_true)):
        if y_true[i] == y_predict[i]:
            count += 1
    return count * 100.0 / len(y_true)


def pruning_by_max_leaf_nodes(t):
    # Refit with max_leaf_nodes = t-1, t-2, ..., 2 and report test accuracy
    for i in range(1, t - 1):
        clfnxt1 = DecisionTreeClassifier(criterion='entropy', max_leaf_nodes=t - i)
        clfnxt1.fit(x_train, y_train)
        print('Max_leaf_nodes =', t - i, 'Test Accuracy =', accuracy(y_test, clfnxt1.predict(x_test)))


def pruning_by_max_depth(t):
    # Refit with max_depth = t-1, t-2, ..., 1 and report test accuracy
    for i in range(1, t):
        clfnxt2 = DecisionTreeClassifier(criterion='entropy', max_depth=t - i)
        clfnxt2.fit(x_train, y_train)
        print('Max_depth =', clfnxt2.tree_.max_depth, 'Test Accuracy =', accuracy(y_test, clfnxt2.predict(x_test)))


# Reading training data
train_data = pd.read_csv("iris_train_data.csv", header=0)
x_train = train_data.iloc[:, 0:4].to_numpy()
y_train = train_data.iloc[:, 4].to_numpy()

# Training the classifier
clf = DecisionTreeClassifier(criterion='entropy')
clf.fit(x_train, y_train)
print('Depth of learnt tree is', clf.tree_.max_depth)
n_leaves = clf.get_n_leaves()  # instead of hard-coding 9
print('Number of leaf nodes in learnt tree is', n_leaves, '\n')

# Reading test data
test_data = pd.read_csv("iris_test_data.csv", header=0)
x_test = test_data.iloc[:, 0:4].to_numpy()
y_test = test_data.iloc[:, 4].to_numpy()

# Pruning by reducing max_depth
print('Pruning case 1: by reducing the max_depth of the tree')
pruning_by_max_depth(clf.tree_.max_depth)
print('')

# Pruning by reducing max_leaf_nodes
print('Pruning case 2: by reducing the max_leaf_nodes of the tree')
pruning_by_max_leaf_nodes(n_leaves)

# clf is already fitted, so there is no need to call fit() again here
print(classification_report(y_test, clf.predict(x_test)))
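The `classification_report(y_test, ...)` call itself is used correctly: it scores predictions on the held-out set. One caveat is that choosing `max_depth` or `max_leaf_nodes` by looking at test accuracy lets the test set influence the model. A common alternative, swapped in here and not the poster's method, is to pick these settings with cross-validation on the training data (`GridSearchCV`) and only then report on the test set. The sketch again uses a 120/30 split of the built-in dataset as a stand-in for the CSV files, and the candidate values are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=30, stratify=iris.target, random_state=0
)

# Candidate pruning settings, mirroring the two loops above (None = no limit)
param_grid = {
    "max_depth": [1, 2, 3, 4, 5, None],
    "max_leaf_nodes": [2, 3, 4, 5, 6, 7, 8, 9, None],
}

# 5-fold cross-validation on the training data only;
# the test set is not used until the final report
search = GridSearchCV(
    DecisionTreeClassifier(criterion="entropy", random_state=0),
    param_grid,
    cv=5,
)
search.fit(x_train, y_train)
print("best params:", search.best_params_)
print("mean CV accuracy:", round(search.best_score_, 3))

# best_estimator_ is refit on the whole training set (refit=True by default)
best_tree = search.best_estimator_
print(classification_report(y_test, best_tree.predict(x_test),
                            target_names=iris.target_names))
```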

2022-11-18
