I wanted to try out the MxNet library and build a simple neural network to learn the XOR function. The problem I'm facing is that the model is not learning.
Here is the complete script: …
Normally an activation layer does not go right after the input; an activation should be applied once the first layer has finished its computation. You can still mimic the XOR function with your old code, but it needs a few adjustments:
You are right that you need to initialize the weights. Which initial weights are best is an ongoing discussion in the deep learning community, but in my experience Xavier initialization works well.
If you want to use softmax, you need to change the number of units in the last hidden layer to 2, because you have two classes: 0 and 1.
After doing these two things, plus some minor optimizations such as removing the transposes of the matrices, we get the following code:
```r
library(mxnet)

# XOR truth table: the first two columns are the inputs, the third is the label
train = matrix(c(0,0,0,
                 0,1,1,
                 1,0,1,
                 1,1,0),
               nrow=4, ncol=3, byrow=TRUE)

train.x = train[,-3]
train.y = train[,3]

data <- mx.symbol.Variable("data")
fc1  <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=2)
act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu")
fc2  <- mx.symbol.FullyConnected(act1, name="fc2", num_hidden=3)
act2 <- mx.symbol.Activation(fc2, name="relu2", act_type="relu")
# two output units, one per class, feeding the softmax
fc3  <- mx.symbol.FullyConnected(act2, name="fc3", num_hidden=2)
softmax <- mx.symbol.Softmax(fc3, name="sm")

mx.set.seed(0)

model <- mx.model.FeedForward.create(
  softmax,
  X = train.x,
  y = train.y,
  num.round = 50,
  array.layout = "rowmajor",
  learning.rate = 0.1,
  momentum = 0.99,
  eval.metric = mx.metric.accuracy,
  # Xavier initialization for the weights
  initializer = mx.init.Xavier(rnd_type = "uniform", factor_type = "avg", magnitude = 3),
  epoch.end.callback = mx.callback.log.train.metric(100))

predict(model, train.x, array.layout = "rowmajor")
```
which gives the following result:
```
Start training with 1 devices
[1] Train-accuracy=NaN
[2] Train-accuracy=0.75
[3] Train-accuracy=0.5
[4] Train-accuracy=0.5
[5] Train-accuracy=0.5
[6] Train-accuracy=0.5
[7] Train-accuracy=0.5
[8] Train-accuracy=0.5
[9] Train-accuracy=0.5
[10] Train-accuracy=0.75
[11] Train-accuracy=0.75
[12] Train-accuracy=0.75
[13] Train-accuracy=0.75
[14] Train-accuracy=0.75
[15] Train-accuracy=0.75
[16] Train-accuracy=0.75
[17] Train-accuracy=0.75
[18] Train-accuracy=0.75
[19] Train-accuracy=0.75
[20] Train-accuracy=0.75
[21] Train-accuracy=0.75
[22] Train-accuracy=0.5
[23] Train-accuracy=0.5
[24] Train-accuracy=0.5
[25] Train-accuracy=0.75
[26] Train-accuracy=0.75
[27] Train-accuracy=0.75
[28] Train-accuracy=0.75
[29] Train-accuracy=0.75
[30] Train-accuracy=0.75
[31] Train-accuracy=0.75
[32] Train-accuracy=0.75
[33] Train-accuracy=0.75
[34] Train-accuracy=0.75
[35] Train-accuracy=0.75
[36] Train-accuracy=0.75
[37] Train-accuracy=0.75
[38] Train-accuracy=0.75
[39] Train-accuracy=1
[40] Train-accuracy=1
[41] Train-accuracy=1
[42] Train-accuracy=1
[43] Train-accuracy=1
[44] Train-accuracy=1
[45] Train-accuracy=1
[46] Train-accuracy=1
[47] Train-accuracy=1
[48] Train-accuracy=1
[49] Train-accuracy=1
[50] Train-accuracy=1
> predict(model,train.x,array.layout="rowmajor")
          [,1]         [,2]         [,3]         [,4]
[1,] 0.9107883 2.618128e-06 6.384078e-07 0.9998743534
[2,] 0.0892117 9.999974e-01 9.999994e-01 0.0001256234
```
The output of softmax is interpreted as "the probability of belonging to a class"; the values produced by the usual arithmetic are not an exact "0" or "1". The answer is to be read as follows: each column is one sample, and the two rows give the probabilities of class 0 and class 1 respectively.
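To turn the printed prediction matrix into hard class labels, one can take the row with the highest probability in each column. A minimal sketch in plain R (no mxnet needed), using the matrix printed above:

```r
# Prediction matrix as printed above: rows = classes (0 and 1), columns = samples
pred <- matrix(c(0.9107883,    0.0892117,
                 2.618128e-06, 9.999974e-01,
                 6.384078e-07, 9.999994e-01,
                 0.9998743534, 0.0001256234),
               nrow = 2, ncol = 4)

# For each sample (column), pick the row with the highest probability;
# row 1 corresponds to class 0, row 2 to class 1
labels <- max.col(t(pred)) - 1
print(labels)  # 0 1 1 0, i.e. the XOR truth table
```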
Well, I experimented a little, and now I have a working XOR example with mxnet in R. The tricky part was not the mxnet API but getting the neural network to work.
So here is the working R code:
```r
library(mxnet)

# XOR truth table: the first two columns are the inputs, the third is the label
train = matrix(c(0,0,0,
                 0,1,1,
                 1,0,1,
                 1,1,0),
               nrow=4, ncol=3, byrow=TRUE)

# column-major layout: samples are columns, hence the transposes
train.x = t(train[,-3])
train.y = t(train[,3])

data <- mx.symbol.Variable("data")
# extra activation layer between the data and the first layer
act0 <- mx.symbol.Activation(data, name="relu1", act_type="relu")
fc1  <- mx.symbol.FullyConnected(act0, name="fc1", num_hidden=2)
# tanh so the hidden layer can produce negative values
act1 <- mx.symbol.Activation(fc1, name="relu2", act_type="tanh")
fc2  <- mx.symbol.FullyConnected(act1, name="fc2", num_hidden=3)
act2 <- mx.symbol.Activation(fc2, name="relu3", act_type="relu")
fc3  <- mx.symbol.FullyConnected(act2, name="fc3", num_hidden=1)
act3 <- mx.symbol.Activation(fc3, name="relu4", act_type="relu")
# squared loss instead of softmax (the variable name is kept from before)
softmax <- mx.symbol.LinearRegressionOutput(act3, name="sm")

mx.set.seed(0)

model <- mx.model.FeedForward.create(
  softmax,
  X = train.x,
  y = train.y,
  num.round = 10000,
  array.layout = "columnmajor",
  learning.rate = 10^-2,
  momentum = 0.95,
  eval.metric = mx.metric.rmse,
  epoch.end.callback = mx.callback.log.train.metric(10),
  lr_scheduler = mx.lr_scheduler.FactorScheduler(1000, factor=0.9),
  initializer = mx.init.uniform(0.5)
)

predict(model, train.x, array.layout = "columnmajor")
```
There are a few differences from the initial code:
I changed the layout of the neural network by putting another activation layer between the data and the first layer. I interpret it as a weighting between the data and the input layer (is that correct?)
I changed the activation function of the hidden layer (with 3 neurons) to tanh, because I guessed that XOR needs negative weights
I changed SoftmaxOutput to LinearRegressionOutput in order to optimize for the squared loss
I fine-tuned the learning rate and momentum
Most importantly: I added a uniform initializer for the weights. I guess the default is to set the weights to zero. Learning was indeed much faster with random initial weights.
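The guess about zero weights can be illustrated with a toy calculation in plain R (this is a hand-rolled sketch, not mxnet code): if every weight starts at zero, the hidden activations, the output, and hence the gradients are all zero as well, so gradient descent has nothing to move.

```r
# Toy illustration of why all-zero initial weights stall learning
x  <- c(1, 0)                        # one XOR input
W1 <- matrix(0, nrow = 2, ncol = 2)  # zero-initialized hidden weights
h  <- tanh(W1 %*% x)                 # both hidden activations are identical (0)
w2 <- c(0, 0)                        # zero-initialized output weights
y  <- sum(w2 * h)                    # output is 0 regardless of the input

# gradient of the squared loss (target 1) w.r.t. the output weights:
# 2 * (y - target) * h, which is zero because h is zero
grad_w2 <- 2 * (y - 1) * h
print(grad_w2)  # 0 0: every weight gets the same (zero) update
```

With a random initializer such as mx.init.uniform(0.5), the hidden units start out different from each other, so this symmetry is broken from the first update.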
Output:
```
Start training with 1 devices
[1] Train-rmse=NaN
[2] Train-rmse=0.706823888574888
[3] Train-rmse=0.705537411582449
[4] Train-rmse=0.701298592443344
[5] Train-rmse=0.691897326795625
...
[9999] Train-rmse=1.07453801496744e-07
[10000] Train-rmse=1.07453801496744e-07
> predict(model,train.x,array.layout="columnmajor")
     [,1]      [,2] [,3] [,4]
[1,]    0 0.9999998    1    0
```
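Since LinearRegressionOutput produces a real number rather than a probability, the predictions come out close to, but not exactly, 0 or 1. Rounding recovers the class labels; a one-line sketch using the row printed above:

```r
# Regression outputs for the four XOR inputs, as printed above
pred <- c(0, 0.9999998, 1, 0)
# round to the nearest integer to get the predicted class
labels <- round(pred)
print(labels)  # 0 1 1 0, matching the XOR truth table
```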