Project author: ixaxaar

Project description:
Decoupled Neural Interfaces Using Synthetic Gradients - under development
Primary language: Python
Project URL: git://github.com/ixaxaar/pytorch-dni.git
Created: 2018-01-16T20:55:54Z
Project community: https://github.com/ixaxaar/pytorch-dni

License: MIT License


Decoupled Neural Interfaces Using Synthetic Gradients

This is an implementation of Decoupled Neural Interfaces using Synthetic Gradients, Jaderberg et al.
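
The core idea of the paper: a small auxiliary network predicts the gradient of the loss with respect to a layer's activations, so that layer can be updated immediately instead of waiting for the full backward pass. Below is a minimal, self-contained sketch of that idea; it is independent of this package's API, and every name in it (layer, grad_predictor, sizes) is purely illustrative.

  import torch
  import torch.nn as nn

  # one layer of a larger model, and a small "gradient predictor" that maps
  # the layer's activations to a predicted (synthetic) gradient dL/dh
  layer = nn.Linear(784, 256)
  grad_predictor = nn.Linear(256, 256)

  x = torch.randn(32, 784)
  h = layer(x)

  # update `layer` right away using the predicted gradient: backward() with an
  # explicit gradient fills layer.weight.grad without needing the true loss
  synthetic_grad = grad_predictor(h)
  h.backward(synthetic_grad.detach())

  # later, when the true gradient dL/dh arrives from downstream, the predictor
  # itself is trained to regress towards it (e.g. with an MSE loss between
  # synthetic_grad and the true gradient)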

Install

  pip install pytorch-dni

From source

  git clone https://github.com/ixaxaar/pytorch-dni
  cd pytorch-dni
  pip install -r ./requirements.txt
  pip install -e .

Architecture

Usage

Constructor Parameters

Following are the constructor parameters of DNI:

Argument      Default   Description
network       NA        Network to be optimized
dni_network   None      DNI network class
dni_params    {}        Parameters to be passed to the dni_network constructor
optim         None      Optimizer for the network
grad_optim    'adam'    DNI module optimizer
grad_lr       0.001     DNI learning rate
hidden_size   10        Hidden size of the DNI network
λ             0.5       How much to mix backprop and synthetic gradients (0 = synthetic only, 1 = backprop only)
recursive     True      Whether to optimize each leaf module separately or treat the network as a single leaf module
gpu_id        -1        GPU ID
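
Putting a few of these together, a typical wrapper call might look like the sketch below. The keyword names are taken from the table above; SomeNet is a placeholder for any nn.Module, and whether the package accepts every one of these exactly as shown has not been verified here.

  from dni import DNI
  import torch.optim as optim

  net = SomeNet()                        # placeholder: any nn.Module
  opt = optim.Adam(net.parameters(), lr=0.001)

  net = DNI(
    net,
    optim=opt,           # optimizer for the wrapped network (per the table)
    grad_optim='adam',   # optimizer for the DNI modules themselves
    grad_lr=0.001,
    hidden_size=10,      # hidden size of each DNI net
    recursive=True,      # wrap every leaf module rather than the network as a whole
    gpu_id=-1            # default from the table
  )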

TLDR: Use DNI to optimize every leaf module of net (including the last layer):

  from dni import DNI
  import torch.optim as optim

  # Parent network, can be anything extending nn.Module
  net = WhateverNetwork(**kwargs)
  opt = optim.Adam(net.parameters(), lr=0.001)

  # use DNI to optimize this network
  net = DNI(net, grad_optim='adam', grad_lr=0.0001)

  # after that we go about our business as usual
  for e in range(epoch):
    opt.zero_grad()
    output = net(input, *args)
    loss = criterion(output, target_output)
    loss.backward()
    # Optional: do this to __also__ update net's weights using backprop
    # opt.step()
    ...

Apply DNI to custom layer

DNI can be applied to any class extending nn.Module.
In this example we specify which layers to apply DNI to via the dni_layers parameter:

  from dni import *
  import torch.nn as nn
  import torch.nn.functional as F
  import torch.optim as optim

  class Net(nn.Module):
    def __init__(self, num_layers=3, hidden_size=256, dni_layers=[]):
      super(Net, self).__init__()
      self.num_layers = num_layers
      self.hidden_size = hidden_size

      # wrap the selected layers in DNI; image_size is assumed to be defined
      # elsewhere (e.g. 28 for MNIST)
      self.net = [self.dni(self.layer(
        image_size*image_size if l == 0 else hidden_size,
        hidden_size
      )) if l in dni_layers else self.layer(
        image_size*image_size if l == 0 else hidden_size,
        hidden_size
      ) for l in range(self.num_layers)]

      self.final = self.layer(hidden_size, 10)

      # bind layers to this class (so that they're searchable by pytorch)
      for ctr, n in enumerate(self.net):
        setattr(self, 'layer'+str(ctr), n)

    def layer(self, input_size, hidden_size):
      return nn.Sequential(
        nn.Linear(input_size, hidden_size),
        nn.BatchNorm1d(hidden_size)
      )

    # create a DNI wrapper layer, recursive=False implies treat this layer as a leaf module
    def dni(self, layer):
      d = DNI(layer, hidden_size=256, grad_optim='adam', grad_lr=0.0001, recursive=False)
      return d

    def forward(self, x):
      output = x.view(-1, image_size*image_size)
      for layer in self.net:
        output = F.relu(layer(output))
      output = self.final(output)
      return F.log_softmax(output, dim=-1)

  net = Net(num_layers=3, dni_layers=[1, 2, 3])

  # use gradient descent to optimize the layers not handled by DNI
  opt = optim.Adam(net.final.parameters(), lr=0.001)

  # after that we go about our business as usual
  for e in range(epoch):
    opt.zero_grad()
    output = net(input)
    loss = criterion(output, target_output)
    loss.backward()
    opt.step()

Apply custom DNI net to all layers

  from dni import *
  import torch.nn as nn
  import torch.nn.functional as F
  import torch.optim as optim

  # Custom DNI network
  class MyCustomDNI(DNINetwork):
    def __init__(self, input_size, hidden_size, output_size, num_layers=2, bias=True):
      super(MyCustomDNI, self).__init__(input_size, hidden_size, output_size)

      self.input_size = input_size
      self.hidden_size = hidden_size * 4
      self.output_size = output_size
      self.num_layers = num_layers
      self.bias = bias

      self.net = [self.layer(
        input_size if l == 0 else self.hidden_size,
        self.hidden_size
      ) for l in range(self.num_layers)]

      # bind layers to this class (so that they're searchable by pytorch)
      for ctr, n in enumerate(self.net):
        setattr(self, 'layer'+str(ctr), n)

      # final layer (yeah, no kidding)
      self.final = nn.Linear(self.hidden_size, output_size)

    def layer(self, input_size, hidden_size):
      return nn.Linear(input_size, hidden_size)

    def forward(self, input, hidden):
      output = input
      for layer in self.net:
        output = F.relu(layer(output))
      output = self.final(output)
      return output, None

  # Custom network, can be anything extending nn.Module
  net = WhateverNetwork(**kwargs)
  opt = optim.Adam(net.parameters(), lr=0.001)

  # use DNI to optimize this network with MyCustomDNI, pass custom params to the DNI nets
  net = DNI(net, grad_optim='adam', grad_lr=0.0001, dni_network=MyCustomDNI,
            dni_params={'num_layers': 3, 'bias': True})

  # after that we go about our business as usual
  for e in range(epoch):
    opt.zero_grad()
    output = net(input, *args)
    loss = criterion(output, target_output)
    loss.backward()

Apply custom DNI net to custom layers

Oh come on.
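
In other words, combine the two previous recipes. As a sketch, the Net.dni() helper from the custom-layer example could also route the custom DNI class and its parameters through; this is a drop-in replacement for that method, not a standalone script, and it assumes MyCustomDNI from the previous example is in scope.

  # replace Net.dni() from the custom-layer example above:
  def dni(self, layer):
    return DNI(
      layer,
      hidden_size=256,
      grad_optim='adam', grad_lr=0.0001,
      dni_network=MyCustomDNI,                     # custom DNI net from the previous example
      dni_params={'num_layers': 3, 'bias': True},  # forwarded to MyCustomDNI's constructor
      recursive=False                              # treat this layer as a leaf module
    )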

DNI Networks

This package ships with the following DNI networks:

  • LinearDNI: Linear -> ReLU * num_layers -> Linear
  • LinearSigmoidDNI: Linear -> ReLU * num_layers -> Linear -> Sigmoid
  • LinearBatchNormDNI: Linear -> BatchNorm1d -> ReLU * num_layers -> Linear
  • RNNDNI: stacked LSTMs, GRUs or RNNs
  • Conv2dDNI: Conv2d -> BatchNorm2d -> MaxPool2d / AvgPool2d -> ReLU * num_layers -> Conv2d -> AvgPool2d
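
To use one of these instead of the default, pass it as dni_network. A short sketch, assuming these classes are importable from the dni package:

  from dni import DNI, LinearBatchNormDNI

  # wrap a network with batch-normalized synthetic-gradient modules
  net = DNI(net, dni_network=LinearBatchNormDNI, grad_optim='adam', grad_lr=0.001)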

Custom DNI Networks

Custom DNI nets can be created using the DNINetwork interface:

  from dni import *

  class MyDNI(DNINetwork):
    def __init__(self, input_size, hidden_size, output_size, **kwargs):
      super(MyDNI, self).__init__(input_size, hidden_size, output_size)
      ...

    def forward(self, input, hidden):
      ...
      return output, hidden

Tasks

MNIST (FCN and CNN)

Refer to tasks/mnist/README.md

Language model

Refer to tasks/word_language_model/README.md

Copy task

The tasks included in this project are the same as those in pytorch-dnc, except that they’re trained here using DNI.

Notable stuff

  • Using a linear SG module makes the implicit assumption that loss is a quadratic function of the activations
  • For best performance one should adapt the SG module architecture to the loss function used. For MSE, a linear SG is a reasonable choice; for log loss, one should use architectures that apply a sigmoid pointwise on top of a linear SG
  • Learning rates on the order of 1e-5 with momentum 0.9 work well for rmsprop; adam works well with 0.001
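
To make the last two notes concrete, a sketch of how one might pair the shipped DNI nets and optimizer settings with the loss in use. regression_net and classifier_net are placeholders for any nn.Module, and it is assumed that LinearDNI and LinearSigmoidDNI are importable from the dni package:

  from dni import DNI, LinearDNI, LinearSigmoidDNI

  # MSE-style (roughly quadratic) loss: a linear SG module is a reasonable choice
  reg_net = DNI(regression_net, dni_network=LinearDNI,
                grad_optim='adam', grad_lr=0.001)

  # log loss (e.g. cross-entropy): prefer a linear SG followed by a pointwise sigmoid;
  # rmsprop with a much smaller learning rate is another option per the notes above
  clf_net = DNI(classifier_net, dni_network=LinearSigmoidDNI,
                grad_optim='rmsprop', grad_lr=1e-5)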