
Gradient descent using Python and NumPy

Python

慕妹3242003 2019-12-09 10:31:59

```python
def gradient(X_norm,y,theta,alpha,m,n,num_it):
    temp=np.array(np.zeros_like(theta,float))
    for i in range(0,num_it):
        h=np.dot(X_norm,theta)
        #temp[j]=theta[j]-(alpha/m)*( np.sum( (h-y)*X_norm[:,j][np.newaxis,:] ) )
        temp[0]=theta[0]-(alpha/m)*(np.sum(h-y))
        temp[1]=theta[1]-(alpha/m)*(np.sum((h-y)*X_norm[:,1]))
        theta=temp
    return theta

X_norm,mean,std=featureScale(X)
#length of X (number of rows)
m=len(X)
X_norm=np.array([np.ones(m),X_norm])
n,m=np.shape(X_norm)
num_it=1500
alpha=0.01
theta=np.zeros(n,float)[:,np.newaxis]
X_norm=X_norm.transpose()
theta=gradient(X_norm,y,theta,alpha,m,n,num_it)
print theta
```

With the code above, theta comes out as 100.2 100.2, but it should be 100.2 61.09, which is what I get (correctly) in MATLAB.


3 answers

catspeake


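I think your code is a bit too complex and it needs more structure, because otherwise you'll get lost in all the equations and operations. In the end, this regression boils down to four operations: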

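1. Calculate the hypothesis h = X * theta
2. Calculate the loss = h - y and maybe the squared cost (loss^2)/2m
3. Calculate the gradient = X' * loss / m
4. Update the parameters theta = theta - alpha * gradient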
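The same four steps written as formulas, with X the m × n design matrix (one row per example, first column all ones), y the vector of targets and α the learning rate. The notation is mine, chosen to mirror the list above:

```latex
\begin{aligned}
h &= X\theta \\
\text{loss} &= h - y \\
J(\theta) &= \frac{1}{2m}\sum_{i=1}^{m} \text{loss}_i^{2} \\
\nabla_\theta J &= \frac{1}{m}\, X^{\top}\, \text{loss} \\
\theta &\leftarrow \theta - \alpha\, \nabla_\theta J
\end{aligned}
```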

In your case, I guess you have confused m with n. Here m denotes the number of examples in your training set, not the number of features.
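A small illustration of the distinction, on made-up data: with one row per example and a leading column of ones, `np.shape(X)` gives `(m, n)`, and both theta and the gradient have n entries. The last lines show a related shape trap that is worth checking in the question's update lines: an `(m, 1)` column such as `h - y` multiplied by a 1-D column like `X_norm[:,1]` broadcasts to an `(m, m)` matrix instead of staying one value per example.

```python
import numpy as np

# Hypothetical toy data: 5 examples of a single feature.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])

# Design matrix: one row per example, first column is the bias feature (all ones).
X = np.column_stack([np.ones(len(x1)), x1])
m, n = np.shape(X)  # m = number of examples, n = number of features (incl. bias)
print(m, n)  # 5 2

theta = np.zeros(n)  # one parameter per feature
gradient = np.dot(X.T, X.dot(theta) - y) / m
print(gradient.shape)  # (2,)

# Shape trap: an (m, 1) column times a 1-D array of length m broadcasts to (m, m),
# so np.sum over it is no longer a sum over the m examples.
h_col = X.dot(theta)[:, np.newaxis]  # shape (m, 1)
print(((h_col - y[:, np.newaxis]) * X[:, 1]).shape)  # (5, 5)
```

Keeping `theta`, `y` and the hypothesis as flat 1-D arrays, as the answer's code below does, sidesteps this.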

Let's look at my variation of your code:

```python
import numpy as np
import random

# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta


def genData(numPoints, bias, variance):
    x = np.zeros(shape=(numPoints, 2))
    y = np.zeros(shape=numPoints)
    # basically a straight line
    for i in range(0, numPoints):
        # bias feature
        x[i][0] = 1
        x[i][1] = i
        # our target variable
        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y


# gen 100 points with a bias of 25 and 10 variance as a bit of noise
x, y = genData(100, 25, 10)
m, n = np.shape(x)
numIterations = 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)
```

First, I create a small random dataset, which should look like this:

[Figure: Linear regression]

As you can see, I also added the generated regression line and the formula calculated by Excel.

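You need to pay attention to the intuition behind regression with gradient descent. As you do a complete batch pass over your data X, you need to reduce the m losses of every example to a single weight update. In this case, that is the average of the sum over the gradients, hence the division by m.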
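To see that the division by m really is just an average, the sketch below computes the gradient two ways on made-up random data: once by looping over the examples and averaging each example's gradient `loss_i * x_i`, and once with the vectorized `np.dot(X.T, loss) / m` used in the answer. Both give the same single update direction.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical random data, for illustration only
m = 50
X = np.column_stack([np.ones(m), rng.uniform(0, 10, size=m)])  # bias column + one feature
y = 3.0 + 2.0 * X[:, 1] + rng.normal(0, 1, size=m)
theta = np.array([0.5, 0.5])

loss = X.dot(theta) - y  # one loss per example, shape (m,)

# Loop version: each example's half-squared-error gradient is loss_i * x_i; average them.
per_example = [loss[i] * X[i] for i in range(m)]
grad_loop = np.sum(per_example, axis=0) / m

# Vectorized version from the answer: X' * loss / m
grad_vec = np.dot(X.T, loss) / m

print(np.allclose(grad_loop, grad_vec))  # True
```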

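Next, you need to keep track of convergence and adjust the learning rate. For that, you should always record the cost at every iteration, and maybe even plot it.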
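A minimal sketch of that bookkeeping, assuming the same batch update as in the answer: the hypothetical helper `run_gd` records the cost at every iteration so the history can be inspected or plotted. On the made-up data below, a small learning rate gives a cost that keeps going down, while a learning rate that is too large makes it blow up, which is exactly what the recorded history reveals.

```python
import numpy as np


def run_gd(X, y, theta, alpha, num_iterations):
    """Batch gradient descent that also returns the cost at every iteration (hypothetical helper)."""
    m = len(y)
    costs = []
    for _ in range(num_iterations):
        loss = X.dot(theta) - y
        costs.append(np.sum(loss ** 2) / (2 * m))  # cost of the current theta
        theta = theta - alpha * X.T.dot(loss) / m
    return theta, costs


# Toy data (made up for illustration): y = 1 + 2x exactly, x = 0..9
x = np.arange(10, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

_, costs_small = run_gd(X, y, np.zeros(2), alpha=0.01, num_iterations=50)
_, costs_large = run_gd(X, y, np.zeros(2), alpha=0.1, num_iterations=50)

print(costs_small[0], costs_small[-1])  # cost goes down
print(costs_large[0], costs_large[-1])  # cost explodes: alpha is too large

# To plot the history (requires matplotlib):
# import matplotlib.pyplot as plt; plt.plot(costs_small); plt.show()
```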

If you run my example, the returned theta will look like this:

```
Iteration 99997 | Cost: 47883.706462
Iteration 99998 | Cost: 47883.706462
Iteration 99999 | Cost: 47883.706462
[ 29.25567368 1.01108458]
```

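This is actually quite close to the equation calculated by Excel (y = x + 30). Note that because we put the bias feature into the first column, the first theta value is the bias weight.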
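As a cross-check that does not need Excel, the closed-form least-squares solution can be computed directly with `np.linalg.lstsq` on the same design matrix; gradient descent run long enough should approach it. The sketch below generates data in the style of the answer's `genData` (bias column first, so `theta[0]` is the intercept and `theta[1]` the slope). The seed, the smaller number of points, the learning rate and the iteration count are my own choices, not taken from the answer.

```python
import numpy as np

# Data in the style of genData: bias column first, y = x + 25 + noise in [0, 10),
# so the fitted intercept should land near 30.
rng = np.random.default_rng(42)  # seed chosen arbitrarily
num_points = 20  # fewer points than the answer, so it runs quickly
x = np.arange(num_points, dtype=float)
X = np.column_stack([np.ones(num_points), x])
y = (x + 25) + rng.uniform(0, 1, size=num_points) * 10

# Plain batch gradient descent, same update rule as gradientDescent above (without printing)
m = num_points
theta = np.ones(2)
alpha = 0.01  # small enough to converge for this x-range
for _ in range(20000):
    loss = X.dot(theta) - y
    theta = theta - alpha * X.T.dot(loss) / m

# Closed-form least-squares fit of the same model
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

print("gradient descent:", theta)  # [intercept, slope]
print("least squares:   ", theta_ls)
print(np.allclose(theta, theta_ls))  # True once gradient descent has converged
```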

2019-12-09

森林海


I know this question has already been answered, but I made some updates to the GD function:

```python
import numpy as np

### COST FUNCTION
def cost(theta, X, y):
    ### Evaluate half MSE (Mean square error)
    m = len(y)
    error = np.dot(X, theta) - y
    J = np.sum(error ** 2) / (2 * m)
    return J


def GD(X, y, theta, alpha):
    cost_histo = []
    theta_histo = []
    # an arbitrary gradient, to pass the initial while() check
    delta = np.repeat(1, len(X))
    # initial cost
    old_cost = cost(theta, X, y)
    while np.max(np.abs(delta)) > 1e-6:
        error = np.dot(X, theta) - y
        delta = np.dot(np.transpose(X), error) / len(y)
        trial_theta = theta - alpha * delta
        trial_cost = cost(trial_theta, X, y)
        # step too long: move trial_theta halfway back toward theta until the cost decreases
        halvings = 0
        while trial_cost >= old_cost and halvings < 60:
            trial_theta = (theta + trial_theta) / 2
            trial_cost = cost(trial_theta, X, y)
            halvings += 1
        if trial_cost >= old_cost:
            break  # no further decrease possible at floating-point precision
        cost_histo.append(trial_cost)
        theta_histo.append(trial_theta)
        old_cost = trial_cost
        theta = trial_theta
    Intercept = theta[0]
    Slope = theta[1]
    return [Intercept, Slope]


# X (design matrix with a leading column of ones), y, theta and alpha are assumed to be defined already
cost(theta, X, y)
res = GD(X, y, theta, alpha)
```

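During the iterations, the function shrinks the effective step size (it halves the trial step whenever the cost would not decrease), which makes it converge faster. See the R example "Estimating linear regression with Gradient Descent (Steepest Descent)"; I applied the same logic in Python.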
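The step-shrinking rule in isolation: take a gradient step of size alpha and, while the cost does not go down, move the trial point halfway back toward the current point, which halves the effective step. The sketch below applies that rule to a one-dimensional quadratic chosen purely for illustration; the name `halving_step` and the `max_halvings` limit are my own, added so the loop always terminates.

```python
import numpy as np


def halving_step(cost_fn, grad, theta, alpha, max_halvings=50):
    """One gradient step, halved until the cost decreases or max_halvings is reached (hypothetical helper)."""
    old_cost = cost_fn(theta)
    trial = theta - alpha * grad
    for _ in range(max_halvings):
        if cost_fn(trial) < old_cost:
            break
        trial = (theta + trial) / 2  # same update as trial_theta = (theta + trial_theta)/2
    return trial


def cost_fn(t):
    # Toy cost J(t) = (t - 3)^2, whose gradient is 2(t - 3)
    return float(np.sum((t - 3.0) ** 2))


theta = np.array([0.0])
for _ in range(30):
    # alpha = 5 is far too large on its own; the halving keeps every step a descent step
    theta = halving_step(cost_fn, 2 * (theta - 3.0), theta, alpha=5.0)
print(theta)  # close to [3.]
```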

2019-12-09
