Project author: Mr-TalhaIlyas

Project description: Loss function package for TensorFlow, Keras, and PyTorch

Language:

Repository: git://github.com/Mr-TalhaIlyas/Loss-Functions-Package-Tensorflow-Keras-PyTorch.git

Created: 2021-03-13T05:24:24Z

Project community: https://github.com/Mr-TalhaIlyas/Loss-Functions-Package-Tensorflow-Keras-PyTorch

License: Apache License 2.0


Loss-Functions-Package-Tensorflow-Keras-PyTorch

This repo implements some popular Loss/Cost/Objective functions that you can use to train your deep learning models.

I have provided the implementations in three popular libraries, i.e. TensorFlow, Keras, and PyTorch. Let's get started.

These functions cannot simply be written in NumPy, as they must operate on tensors that also carry gradient information which needs to be propagated through the model during backpropagation. Accordingly, loss functions must be written using backend functions from the respective model library.

With multi-class classification or segmentation, we sometimes use loss functions that calculate the average loss for each class, rather than calculating loss from the prediction tensor as a whole. This kernel is meant as a template reference for the basic code, so all examples calculate loss on the entire tensor, but it should be trivial for you to modify it for multi-class averaging.
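For example, one way to do this is to slice the prediction tensor along the channel axis, apply the binary loss per class, and average the results. A minimal sketch (assuming channels-last, one-hot tensors and any of the Keras-style losses below in place of the hypothetical `loss_fn`):

    # hypothetical helper: average a binary loss function over every class channel
    def multiclass_average(loss_fn, y_true, y_pred):
        num_classes = y_pred.shape[-1]
        per_class = [loss_fn(y_true[..., c], y_pred[..., c]) for c in range(num_classes)]
        return sum(per_class) / num_classes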

For learning-rate schedulers, see Learning-Rate-Schedulers-Packege-Tensorflow-PyTorch-Keras

For evaluation metrics, see Evaluation-Metrics-Package-Tensorflow-PyTorch-Keras

Necessary Imports

You can import the necessary packages as follows:

    # PyTorch
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    # Keras
    import keras
    import keras.backend as K
    # Tensorflow
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import backend as K

Weighted Categorical Cross-Entropy Loss

    # Tensorflow/Keras
    def weighted_categorical_crossentropy(weights):
        """
        A weighted version of keras.objectives.categorical_crossentropy
        weights: numpy array of shape (C,) where C is the number of classes
        e.g. np.array([0.5, 2, 10]) # class one at 0.5, class 2 twice the normal weights, class 3 10x.
        """
        weights = K.variable(weights)

        def loss(y_true, y_pred, from_logits=False):
            if from_logits:
                y_pred = tf.keras.activations.softmax(y_pred, axis=-1)
            # y_pred /= K.sum(y_pred, axis=-1, keepdims=True)
            # clip to prevent NaN's and Inf's
            y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
            # calc
            loss = y_true * K.log(y_pred) * weights
            loss = -K.sum(loss, -1)
            return loss
        return loss

Alternatively, you can take the weighted focal loss below and set gamma=0 and alpha=1; it will then behave the same as weighted categorical cross-entropy.
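As a usage sketch (assuming the imports above and a Keras model with a three-class softmax output; the weight values and `model` itself are placeholders):

    # hypothetical usage: `model` is any Keras model with 3 output classes
    import numpy as np
    class_weights = np.array([0.5, 2.0, 10.0])
    model.compile(optimizer='adam',
                  loss=weighted_categorical_crossentropy(class_weights),
                  metrics=['accuracy'])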

Dice Loss

The Dice coefficient, or Dice-Sørensen coefficient, is a common metric for pixel segmentation that can also be modified to act as a loss function:
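For two binary masks $A$ and $B$ the coefficient is

$$\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}$$

and the implementations below return $1 - \mathrm{Dice}$, with a small smoothing constant added to the numerator and denominator to avoid division by zero.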

    #PyTorch
    class DiceLoss(nn.Module):
        def __init__(self, weight=None, size_average=True):
            super(DiceLoss, self).__init__()

        def forward(self, inputs, targets, smooth=1):
            # comment out if your model contains a sigmoid or equivalent activation layer
            inputs = torch.sigmoid(inputs)
            # flatten label and prediction tensors
            inputs = inputs.view(-1)
            targets = targets.view(-1)

            intersection = (inputs * targets).sum()
            dice = (2.*intersection + smooth) / (inputs.sum() + targets.sum() + smooth)
            return 1 - dice
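These PyTorch losses are used like any other criterion. A minimal sketch (assuming a model that outputs raw logits and binary masks of matching shape; `model`, `images` and `masks` are placeholders for your own network and data):

    # hypothetical training step with placeholder names
    criterion = DiceLoss()
    logits = model(images)            # raw logits; DiceLoss applies the sigmoid internally
    loss = criterion(logits, masks)
    loss.backward()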
    # Tensorflow / Keras
    def DiceLoss(y_true, y_pred, smooth=1e-6):
        # if you are using this loss for multi-class segmentation then uncomment
        # the following lines
        # if y_pred.shape[-1] <= 1:
        #     # activate logits
        #     y_pred = tf.keras.activations.sigmoid(y_pred)
        # elif y_pred.shape[-1] >= 2:
        #     # activate logits
        #     y_pred = tf.keras.activations.softmax(y_pred, axis=-1)
        #     # convert the tensor to one-hot for multi-class segmentation
        #     y_true = K.squeeze(y_true, 3)
        #     y_true = tf.cast(y_true, "int32")
        #     y_true = tf.one_hot(y_true, num_class, axis=-1)

        # cast to float32 datatype
        y_true = K.cast(y_true, 'float32')
        y_pred = K.cast(y_pred, 'float32')
        # flatten label and prediction tensors
        inputs = K.flatten(y_pred)
        targets = K.flatten(y_true)

        intersection = K.sum(targets * inputs)   # element-wise product, then sum
        dice = (2*intersection + smooth) / (K.sum(targets) + K.sum(inputs) + smooth)
        return 1 - dice

BCE-Dice Loss

This loss combines Dice loss with the standard binary cross-entropy (BCE) loss that is generally the default for segmentation models. Combining the two methods allows for some diversity in the loss, while benefitting from the stability of BCE. The equation for multi-class BCE by itself will be familiar to anyone who has studied logistic regression:
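For a single prediction $\hat{y} \in (0, 1)$ with binary target $y$,

$$\mathrm{BCE}(y, \hat{y}) = -\big(y \log \hat{y} + (1 - y)\,\log(1 - \hat{y})\big)$$

and the combined loss below simply adds this term to the Dice loss from the previous section.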

    #PyTorch
    class DiceBCELoss(nn.Module):
        def __init__(self, weight=None, size_average=True):
            super(DiceBCELoss, self).__init__()

        def forward(self, inputs, targets, smooth=1):
            # comment out if your model contains a sigmoid or equivalent activation layer
            inputs = torch.sigmoid(inputs)
            # flatten label and prediction tensors
            inputs = inputs.view(-1)
            targets = targets.view(-1)

            intersection = (inputs * targets).sum()
            dice_loss = 1 - (2.*intersection + smooth) / (inputs.sum() + targets.sum() + smooth)
            BCE = F.binary_cross_entropy(inputs, targets, reduction='mean')
            Dice_BCE = BCE + dice_loss
            return Dice_BCE
    #PyTorch (multi-class Dice)
    class DiceLossMulticlass(nn.Module):
        def __init__(self, weights=None, size_average=False):
            super(DiceLossMulticlass, self).__init__()
            self.weights = weights            # optional per-class weights (assumed to be a torch tensor)
            self.size_average = size_average

        def forward(self, inputs, targets, smooth=1):
            if self.weights is not None:
                assert self.weights.shape == (targets.shape[1], )
                # make a copy so we do not change the default weights stored in the instance
                weights = self.weights.clone()
            else:
                weights = None
            # comment out if your model contains a sigmoid or equivalent activation layer
            inputs = torch.sigmoid(inputs)
            # flatten label and prediction images, leave BATCH and NUM_CLASSES
            # (BATCH, NUM_CLASSES, H, W) -> (BATCH, NUM_CLASSES, H * W)
            inputs = inputs.view(inputs.shape[0], inputs.shape[1], -1)
            targets = targets.view(targets.shape[0], targets.shape[1], -1)

            # per-class intersection and Dice, reduced over batch and pixels
            intersection = (inputs * targets).sum(0).sum(1)
            dice = (2.*intersection + smooth) / (inputs.sum(0).sum(1) + targets.sum(0).sum(1) + smooth)

            if (weights is None) and self.size_average:
                # weight each class by its frequency in the targets
                weights = (targets == 1).sum(0).sum(1).float()
                weights = weights / weights.sum()   # so they sum up to 1
            if weights is not None:
                return 1 - (dice * weights).mean()
            else:
                return 1 - dice.mean()
    #Tensorflow / Keras
    def DiceBCELoss(y_true, y_pred, smooth=1e-6):
        # if you are using this loss for multi-class segmentation then uncomment
        # the following lines
        # if y_pred.shape[-1] <= 1:
        #     # activate logits
        #     y_pred = tf.keras.activations.sigmoid(y_pred)
        # elif y_pred.shape[-1] >= 2:
        #     # activate logits
        #     y_pred = tf.keras.activations.softmax(y_pred, axis=-1)
        #     # convert the tensor to one-hot for multi-class segmentation
        #     y_true = K.squeeze(y_true, 3)
        #     y_true = tf.cast(y_true, "int32")
        #     y_true = tf.one_hot(y_true, num_class, axis=-1)

        # cast to float32 datatype
        y_true = K.cast(y_true, 'float32')
        y_pred = K.cast(y_pred, 'float32')
        # flatten label and prediction tensors
        inputs = K.flatten(y_pred)
        targets = K.flatten(y_true)

        BCE = K.mean(K.binary_crossentropy(targets, inputs))
        intersection = K.sum(targets * inputs)
        dice_loss = 1 - (2*intersection + smooth) / (K.sum(targets) + K.sum(inputs) + smooth)
        Dice_BCE = BCE + dice_loss
        return Dice_BCE

Weighted BCE and Dice Loss

Combines a border-weighted BCE with a weighted Dice loss: an average-pooling pass over the ground-truth mask is used to find border pixels, which are then given extra weight in both terms.

    # Keras/ Tensorflow
    def Weighted_BCEnDice_loss(y_true, y_pred):
        # if you are using this loss for multi-class segmentation then uncomment
        # the following lines
        # if y_pred.shape[-1] <= 1:
        #     # activate logits
        #     y_pred = tf.keras.activations.sigmoid(y_pred)
        # elif y_pred.shape[-1] >= 2:
        #     # activate logits
        #     y_pred = tf.keras.activations.softmax(y_pred, axis=-1)
        #     # convert the tensor to one-hot for multi-class segmentation
        #     y_true = K.squeeze(y_true, 3)
        #     y_true = tf.cast(y_true, "int32")
        #     y_true = tf.one_hot(y_true, num_class, axis=-1)

        y_true = K.cast(y_true, 'float32')
        y_pred = K.cast(y_pred, 'float32')
        # if we want to get the same size of output, the kernel size must be an odd number
        averaged_mask = K.pool2d(
            y_true, pool_size=(11, 11), strides=(1, 1), padding='same', pool_mode='avg')
        border = K.cast(K.greater(averaged_mask, 0.005), 'float32') * K.cast(K.less(averaged_mask, 0.995), 'float32')
        weight = K.ones_like(averaged_mask)
        w0 = K.sum(weight)
        weight += border * 2
        w1 = K.sum(weight)
        weight *= (w0 / w1)
        loss = weighted_dice_loss(y_true, y_pred, weight) + weighted_bce_loss(y_true, y_pred, weight)
        return loss

    def weighted_bce_loss(y_true, y_pred, weight):
        # avoiding overflow
        epsilon = 1e-7
        y_pred = K.clip(y_pred, epsilon, 1. - epsilon)
        logit_y_pred = K.log(y_pred / (1. - y_pred))
        #logit_y_pred = y_pred
        loss = (1. - y_true) * logit_y_pred + (1. + (weight - 1.) * y_true) * \
               (K.log(1. + K.exp(-K.abs(logit_y_pred))) + K.maximum(-logit_y_pred, 0.))
        return K.sum(loss) / K.sum(weight)

    def weighted_dice_loss(y_true, y_pred, weight):
        smooth = 1.
        w, m1, m2 = weight * weight, y_true, y_pred
        intersection = (m1 * m2)
        score = (2. * K.sum(w * intersection) + smooth) / (K.sum(w * (m1**2)) + K.sum(w * (m2**2)) + smooth)  # up till here it is Dice loss with squared terms
        loss = 1. - K.sum(score)  # soft Dice loss
        return loss

HED Loss

It was introduced in the holistically-nested edge detection (HED) paper to detect edges/boundaries of objects: https://arxiv.org/pdf/1504.06375.pdf.

    # Keras/ Tensorflow
    def HED_loss(y_true, y_pred):
        #y_true = y_true * 255 # b/c keras generator normalizes images
        if y_pred.shape[-1] <= 1:
            y_true = y_true[:,:,:,0:1]
        elif y_pred.shape[-1] >= 2:
            y_true = K.squeeze(y_true, 3)
            y_true = tf.cast(y_true, "int32")
            y_true = tf.one_hot(y_true, num_class, axis=-1)  # num_class is defined globally
        y_true = K.cast(y_true, 'float32')
        y_pred = K.cast(y_pred, 'float32')
        loss = sigmoid_cross_entropy_balanced(y_pred, y_true)
        return loss

    def sigmoid_cross_entropy_balanced(logits, label, name='cross_entropy_loss'):
        """
        From:
        https://github.com/moabitcoin/holy-edge/blob/master/hed/losses.py
        Implements Equation [2] in https://arxiv.org/pdf/1504.06375.pdf
        Compute edge pixels for each training sample and set as pos_weights to
        tf.nn.weighted_cross_entropy_with_logits
        """
        y = tf.cast(label, tf.float32)
        count_neg = tf.reduce_sum(1. - y)
        count_pos = tf.reduce_sum(y)
        # Equation [2]
        beta = count_neg / (count_neg + count_pos)
        # Equation [2] divide by 1 - beta
        pos_weight = beta / (1 - beta)
        if int(str(tf.__version__)[0]) == 1:
            cost = tf.nn.weighted_cross_entropy_with_logits(logits=logits, targets=y, pos_weight=pos_weight)
        if int(str(tf.__version__)[0]) == 2:
            cost = tf.nn.weighted_cross_entropy_with_logits(logits=logits, labels=y, pos_weight=pos_weight)
        # Multiply by 1 - beta
        cost = tf.reduce_mean(cost * (1 - beta))
        # if the image has no edge pixels return 0, else return the complete error function
        return tf.where(tf.equal(count_pos, 0.0), 0.0, cost, name=name)

Jaccard/Intersection over Union (IoU) Loss

The IoU metric, or Jaccard Index, is similar to the Dice metric and is calculated as the ratio between the overlap of the positive instances between two sets, and their mutual combined values:
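For two binary masks $A$ and $B$:

$$\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$$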

Like the Dice metric, it is a common means of evaluating the performance of pixel segmentation models.

    #PyTorch
    class IoULoss(nn.Module):
        def __init__(self, weight=None, size_average=True):
            super(IoULoss, self).__init__()

        def forward(self, inputs, targets, smooth=1):
            # comment out if your model contains a sigmoid or equivalent activation layer
            inputs = torch.sigmoid(inputs)
            # flatten label and prediction tensors
            inputs = inputs.view(-1)
            targets = targets.view(-1)

            # intersection is equivalent to the True Positive count
            # union is the mutually inclusive area of all labels & predictions
            intersection = (inputs * targets).sum()
            total = (inputs + targets).sum()
            union = total - intersection

            IoU = (intersection + smooth) / (union + smooth)
            return 1 - IoU
    #Tensorflow / Keras
    def IoULoss(y_true, y_pred, smooth=1e-6):
        # if you are using this loss for multi-class segmentation then uncomment
        # the following lines
        # if y_pred.shape[-1] <= 1:
        #     # activate logits
        #     y_pred = tf.keras.activations.sigmoid(y_pred)
        # elif y_pred.shape[-1] >= 2:
        #     # activate logits
        #     y_pred = tf.keras.activations.softmax(y_pred, axis=-1)
        #     # convert the tensor to one-hot for multi-class segmentation
        #     y_true = K.squeeze(y_true, 3)
        #     y_true = tf.cast(y_true, "int32")
        #     y_true = tf.one_hot(y_true, num_class, axis=-1)

        # cast to float32 datatype
        y_true = K.cast(y_true, 'float32')
        y_pred = K.cast(y_pred, 'float32')
        # flatten label and prediction tensors
        inputs = K.flatten(y_pred)
        targets = K.flatten(y_true)

        intersection = K.sum(targets * inputs)   # element-wise product, then sum
        total = K.sum(targets) + K.sum(inputs)
        union = total - intersection

        IoU = (intersection + smooth) / (union + smooth)
        return 1 - IoU

Focal Loss

Focal Loss was introduced by Lin et al. of Facebook AI Research in 2017 as a means of combating extremely imbalanced datasets where positive cases are relatively rare. Their paper “Focal Loss for Dense Object Detection” is retrievable here: https://arxiv.org/abs/1708.02002. In practice, the researchers used an alpha-modified version of the function, so I have included it in this implementation.
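With $p_t$ denoting the model's estimated probability for the true class, the alpha-balanced focal loss from the paper is

$$\mathrm{FL}(p_t) = -\alpha\,(1 - p_t)^{\gamma}\,\log(p_t)$$

so well-classified examples ($p_t \to 1$) contribute almost nothing and training focuses on the hard ones.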

    #PyTorch
    ALPHA = 0.8
    GAMMA = 2

    class FocalLoss(nn.Module):
        def __init__(self, weight=None, size_average=True):
            super(FocalLoss, self).__init__()

        def forward(self, inputs, targets, alpha=ALPHA, gamma=GAMMA, smooth=1):
            # comment out if your model contains a sigmoid or equivalent activation layer
            inputs = torch.sigmoid(inputs)
            # flatten label and prediction tensors
            inputs = inputs.view(-1)
            targets = targets.view(-1)

            # first compute binary cross-entropy
            BCE = F.binary_cross_entropy(inputs, targets, reduction='mean')
            BCE_EXP = torch.exp(-BCE)
            focal_loss = alpha * (1-BCE_EXP)**gamma * BCE
            return focal_loss
    #Tensorflow / Keras
    def FocalLoss(y_true, y_pred):
        alpha = 0.8
        gamma = 2
        # if you are using this loss for multi-class segmentation then uncomment
        # the following lines
        # if y_pred.shape[-1] <= 1:
        #     # activate logits
        #     y_pred = tf.keras.activations.sigmoid(y_pred)
        # elif y_pred.shape[-1] >= 2:
        #     # activate logits
        #     y_pred = tf.keras.activations.softmax(y_pred, axis=-1)
        #     # convert the tensor to one-hot for multi-class segmentation
        #     y_true = K.squeeze(y_true, 3)
        #     y_true = tf.cast(y_true, "int32")
        #     y_true = tf.one_hot(y_true, num_class, axis=-1)

        # cast to float32 datatype
        y_true = K.cast(y_true, 'float32')
        y_pred = K.cast(y_pred, 'float32')
        # flatten label and prediction tensors
        inputs = K.flatten(y_pred)
        targets = K.flatten(y_true)

        BCE = K.binary_crossentropy(targets, inputs)
        BCE_EXP = K.exp(-BCE)
        focal_loss = K.mean(alpha * K.pow((1-BCE_EXP), gamma) * BCE)
        return focal_loss

Weighted Focal Loss

In development.

    # TensorFlow/Keras
    class WFL():
        '''
        Weighted Focal loss
        '''
        def __init__(self, alpha=0.25, gamma=2, class_weights=None, from_logits=False):
            self.class_weights = class_weights
            self.from_logits = from_logits
            self.alpha = alpha
            self.gamma = gamma

        def __call__(self, y_true, y_pred):
            if self.from_logits:
                y_pred = tf.keras.activations.softmax(y_pred, axis=-1)
            y_pred = K.clip(y_pred, K.epsilon(), 1. - K.epsilon())
            # cast to float32 datatype
            y_true = K.cast(y_true, 'float32')
            y_pred = K.cast(y_pred, 'float32')

            WCCE = y_true * K.log(y_pred) * self.class_weights
            WFL = (self.alpha * K.pow((1-y_pred), self.gamma)) * WCCE
            # reduce_sum  -> reduces the loss over the batch by simply taking the sum over all samples
            # reduce_mean -> reduces the loss over the batch by taking the mean of all samples
            # if axis=-1 is given and the input batch is B * C, then the loss will have shape B * 1
            # if axis is None, then only 1 scalar value is output
            return -tf.math.reduce_sum(WFL, -1)   # use this for a custom training loop, dividing by the global batch size, i.e. * (1/GB)
            # return -tf.reduce_mean(WFL, -1)     # use this with the compile/fit Keras API

Tversky Loss

This loss was introduced in “Tversky loss function for image segmentation using 3D fully convolutional deep networks”, retrievable here: https://arxiv.org/abs/1706.05721. It was designed to optimise segmentation on imbalanced medical datasets by utilising constants that can adjust how harshly different types of error are penalised in the loss function. From the paper:
… in the case of α=β=0.5 the Tversky index simplifies to be the same as the Dice coefficient, which is also equal to the F1 score. With α=β=1, Equation 2 produces Tanimoto coefficient, and setting α+β=1 produces the set of Fβ scores. Larger βs weigh recall higher than precision (by placing more emphasis on false negatives).
To summarise, this loss function is weighted by the constants ‘alpha’ and ‘beta’ that penalise false positives and false negatives respectively to a higher degree in the loss function as their value is increased. The beta constant in particular has applications in situations where models can obtain misleadingly positive performance via highly conservative prediction. You may want to experiment with different values to find the optimum. With alpha==beta==0.5, this loss becomes equivalent to Dice Loss.
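In terms of true positives (TP), false positives (FP) and false negatives (FN), the Tversky index being optimised is

$$\mathrm{TI} = \frac{TP}{TP + \alpha\,FP + \beta\,FN}$$

and the loss returned below is $1 - \mathrm{TI}$ (with a small smoothing constant).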

    #PyTorch
    ALPHA = 0.5
    BETA = 0.5

    class TverskyLoss(nn.Module):
        def __init__(self, weight=None, size_average=True):
            super(TverskyLoss, self).__init__()

        def forward(self, inputs, targets, smooth=1, alpha=ALPHA, beta=BETA):
            # comment out if your model contains a sigmoid or equivalent activation layer
            inputs = torch.sigmoid(inputs)
            # flatten label and prediction tensors
            inputs = inputs.view(-1)
            targets = targets.view(-1)

            # True Positives, False Positives & False Negatives
            TP = (inputs * targets).sum()
            FP = ((1-targets) * inputs).sum()
            FN = (targets * (1-inputs)).sum()

            Tversky = (TP + smooth) / (TP + alpha*FP + beta*FN + smooth)
            return 1 - Tversky
    #Tensorflow / Keras
    def TverskyLoss(y_true, y_pred, smooth=1e-6):
        if y_pred.shape[-1] <= 1:
            alpha = 0.3
            beta = 0.7
            gamma = 4/3 #5.  (gamma is only used in the focal variant below)
            y_pred = tf.keras.activations.sigmoid(y_pred)
            #y_true = y_true[:,:,:,0:1]
        elif y_pred.shape[-1] >= 2:
            alpha = 0.3
            beta = 0.7
            gamma = 4/3 #3.
            y_pred = tf.keras.activations.softmax(y_pred, axis=-1)
            y_true = K.squeeze(y_true, 3)
            y_true = tf.cast(y_true, "int32")
            y_true = tf.one_hot(y_true, num_class, axis=-1)

        y_true = K.cast(y_true, 'float32')
        y_pred = K.cast(y_pred, 'float32')
        # flatten label and prediction tensors
        inputs = K.flatten(y_pred)
        targets = K.flatten(y_true)

        # True Positives, False Positives & False Negatives
        TP = K.sum((inputs * targets))
        FP = K.sum(((1-targets) * inputs))
        FN = K.sum((targets * (1-inputs)))

        Tversky = (TP + smooth) / (TP + alpha*FP + beta*FN + smooth)
        return 1 - Tversky

Focal Tversky Loss

A variant on the Tversky loss that also includes the gamma modifier from Focal Loss.
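With the Tversky index $\mathrm{TI}$ defined above, the focal variant simply raises the loss to the power $\gamma$:

$$\mathrm{FTL} = (1 - \mathrm{TI})^{\gamma}$$

so that values of $\gamma > 1$ focus training on examples where the overlap is still poor.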

    #PyTorch
    ALPHA = 0.5
    BETA = 0.5
    GAMMA = 1

    class FocalTverskyLoss(nn.Module):
        def __init__(self, weight=None, size_average=True):
            super(FocalTverskyLoss, self).__init__()

        def forward(self, inputs, targets, smooth=1, alpha=ALPHA, beta=BETA, gamma=GAMMA):
            # comment out if your model contains a sigmoid or equivalent activation layer
            inputs = torch.sigmoid(inputs)
            # flatten label and prediction tensors
            inputs = inputs.view(-1)
            targets = targets.view(-1)

            # True Positives, False Positives & False Negatives
            TP = (inputs * targets).sum()
            FP = ((1-targets) * inputs).sum()
            FN = (targets * (1-inputs)).sum()

            Tversky = (TP + smooth) / (TP + alpha*FP + beta*FN + smooth)
            FocalTversky = (1 - Tversky)**gamma
            return FocalTversky
    #Tensorflow / Keras
    def FocalTverskyLoss(y_true, y_pred, smooth=1e-6):
        if y_pred.shape[-1] <= 1:
            alpha = 0.3
            beta = 0.7
            gamma = 4/3 #5.
            y_pred = tf.keras.activations.sigmoid(y_pred)
            #y_true = y_true[:,:,:,0:1]
        elif y_pred.shape[-1] >= 2:
            alpha = 0.3
            beta = 0.7
            gamma = 4/3 #3.
            y_pred = tf.keras.activations.softmax(y_pred, axis=-1)
            y_true = K.squeeze(y_true, 3)
            y_true = tf.cast(y_true, "int32")
            y_true = tf.one_hot(y_true, num_class, axis=-1)

        y_true = K.cast(y_true, 'float32')
        y_pred = K.cast(y_pred, 'float32')
        # flatten label and prediction tensors
        inputs = K.flatten(y_pred)
        targets = K.flatten(y_true)

        # True Positives, False Positives & False Negatives
        TP = K.sum((inputs * targets))
        FP = K.sum(((1-targets) * inputs))
        FN = K.sum((targets * (1-inputs)))

        Tversky = (TP + smooth) / (TP + alpha*FP + beta*FN + smooth)
        FocalTversky = K.pow((1 - Tversky), gamma)
        return FocalTversky

Lovasz Hinge Loss

This complex loss function was introduced by Berman, Triki and Blaschko in their paper “The Lovasz-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks”, retrievable here: https://arxiv.org/abs/1705.08790. It is designed to optimise the Intersection over Union score for semantic segmentation, particularly for multi-class instances. Specifically, it sorts predictions by their error before calculating cumulatively how each error affects the IoU score. This gradient vector is then multiplied with the initial error vector to penalise most strongly the predictions that decreased the IoU score the most. This procedure is detailed by jeandebleu in his excellent summary here.

This code is taken directly from the author’s github repo here: https://github.com/bermanmaxim/LovaszSoftmax and all credit is to them.

In this kernel I have implemented the flat variant that uses reshaped rank-1 tensors as inputs for PyTorch. You can modify it accordingly with the dimensions and class number of your data as needed. This code takes raw logits so ensure your model does not contain an activation layer prior to the loss calculation.

I have hidden the researchers’ own code below for brevity; simply load it into your kernel for the losses to function. In the case of their tensorflow implementation, I am still working to make it compatible with Keras. There are differences between the Tensorflow and Keras function libraries that complicate this.

    #PyTorch
    from torch.autograd import Variable
    from itertools import filterfalse as ifilterfalse

    class LovaszSoftmax(nn.Module):
        def __init__(self, classes='present', per_image=False, ignore=None):
            super(LovaszSoftmax, self).__init__()
            self.classes = classes
            self.per_image = per_image
            self.ignore = ignore

        def forward(self, inputs, targets):
            probas = F.softmax(inputs, dim=1)  # B*C*H*W -> from logits to probabilities
            return lovasz_softmax(probas, targets, self.classes, self.per_image, self.ignore)

    def lovasz_softmax(probas, labels, classes='present', per_image=False, ignore=None):
        """
        Multi-class Lovasz-Softmax loss
        probas: [B, C, H, W] Variable, class probabilities at each prediction (between 0 and 1).
                Interpreted as binary (sigmoid) output with outputs of size [B, H, W].
        labels: [B, H, W] Tensor, ground truth labels (between 0 and C - 1)
        classes: 'all' for all, 'present' for classes present in labels, or a list of classes to average.
        per_image: compute the loss per image instead of per batch
        ignore: void class labels
        """
        if per_image:
            loss = mean(lovasz_softmax_flat(*flatten_probas(prob.unsqueeze(0), lab.unsqueeze(0), ignore), classes=classes)
                        for prob, lab in zip(probas, labels))
        else:
            loss = lovasz_softmax_flat(*flatten_probas(probas, labels, ignore), classes=classes)
        return loss

    def lovasz_softmax_flat(probas, labels, classes='present'):
        """
        Multi-class Lovasz-Softmax loss
        probas: [P, C] Variable, class probabilities at each prediction (between 0 and 1)
        labels: [P] Tensor, ground truth labels (between 0 and C - 1)
        classes: 'all' for all, 'present' for classes present in labels, or a list of classes to average.
        """
        if probas.numel() == 0:
            # only void pixels, the gradients should be 0
            return probas * 0.
        C = probas.size(1)
        losses = []
        class_to_sum = list(range(C)) if classes in ['all', 'present'] else classes
        for c in class_to_sum:
            fg = (labels == c).float()  # foreground for class c
            if (classes == 'present' and fg.sum() == 0):
                continue
            if C == 1:
                if len(classes) > 1:
                    raise ValueError('Sigmoid output possible only with 1 class')
                class_pred = probas[:, 0]
            else:
                class_pred = probas[:, c]
            errors = (Variable(fg) - class_pred).abs()
            errors_sorted, perm = torch.sort(errors, 0, descending=True)
            perm = perm.data
            fg_sorted = fg[perm]
            losses.append(torch.dot(errors_sorted, Variable(lovasz_grad(fg_sorted))))
        return mean(losses)

    def flatten_probas(probas, labels, ignore=None):
        """
        Flattens predictions in the batch
        """
        if probas.dim() == 3:
            # assumes output of a sigmoid layer
            B, H, W = probas.size()
            probas = probas.view(B, 1, H, W)
        B, C, H, W = probas.size()
        probas = probas.permute(0, 2, 3, 1).contiguous().view(-1, C)  # B * H * W, C = P, C
        labels = labels.view(-1)
        if ignore is None:
            return probas, labels
        valid = (labels != ignore)
        vprobas = probas[valid.nonzero().squeeze()]
        vlabels = labels[valid]
        return vprobas, vlabels

    def xloss(logits, labels, ignore=None):
        """
        Cross entropy loss
        """
        return F.cross_entropy(logits, Variable(labels), ignore_index=255)

    # --------------------------- HELPER FUNCTIONS ---------------------------
    def isnan(x):
        return x != x

    def mean(l, ignore_nan=False, empty=0):
        """
        nanmean compatible with generators.
        """
        l = iter(l)
        if ignore_nan:
            l = ifilterfalse(isnan, l)
        try:
            n = 1
            acc = next(l)
        except StopIteration:
            if empty == 'raise':
                raise ValueError('Empty mean')
            return empty
        for n, v in enumerate(l, 2):
            acc += v
        if n == 1:
            return acc
        return acc / n

    def lovasz_grad(gt_sorted):
        """
        Computes gradient of the Lovasz extension w.r.t sorted errors
        See Alg. 1 in paper
        """
        p = len(gt_sorted)
        gts = gt_sorted.sum()
        intersection = gts - gt_sorted.float().cumsum(0)
        union = gts + (1 - gt_sorted).float().cumsum(0)
        jaccard = 1. - intersection / union
        if p > 1:  # cover 1-pixel case
            jaccard[1:p] = jaccard[1:p] - jaccard[0:-1]
        return jaccard
    #Keras
    # not working yet
    # def LovaszHingeLoss(inputs, targets):
    #     return lovasz_hinge_loss(inputs, targets)

Combo Loss

This loss was introduced by Taghanaki et al in their paper “Combo loss: Handling input and output imbalance in multi-organ segmentation”, retrievable here: https://arxiv.org/abs/1805.02798. Combo loss is a combination of Dice Loss and a modified Cross-Entropy function that, like Tversky loss, has additional constants which penalise either false positives or false negatives more respectively.
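In the form implemented below, with $\eta$ denoting the CE_RATIO constant and a cross-entropy term whose two sides are weighted by $\alpha$ and $1 - \alpha$,

$$\mathrm{Combo} = \eta \cdot \mathrm{CE}_{\text{weighted}} - (1 - \eta) \cdot \mathrm{Dice}$$

(the Dice term enters with a negative sign because Dice is maximised while the loss is minimised).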

    #PyTorch
    ALPHA = 0.5     # < 0.5 penalises FP more, > 0.5 penalises FN more
    CE_RATIO = 0.5  # weighted contribution of modified CE loss compared to Dice loss

    class ComboLoss(nn.Module):
        def __init__(self, weight=None, size_average=True):
            super(ComboLoss, self).__init__()

        def forward(self, inputs, targets, smooth=1, alpha=ALPHA, ce_ratio=CE_RATIO, eps=1e-9):
            # flatten label and prediction tensors
            inputs = inputs.view(-1)
            targets = targets.view(-1)

            # Dice term
            intersection = (inputs * targets).sum()
            dice = (2. * intersection + smooth) / (inputs.sum() + targets.sum() + smooth)

            # modified (weighted) cross-entropy term
            inputs = torch.clamp(inputs, eps, 1.0 - eps)
            out = - (alpha * ((targets * torch.log(inputs)) + ((1 - alpha) * (1.0 - targets) * torch.log(1.0 - inputs))))
            weighted_ce = out.mean(-1)

            combo = (ce_ratio * weighted_ce) - ((1 - ce_ratio) * dice)
            return combo
    #Tensorflow / Keras
    def Combo_loss(y_true, y_pred, smooth=1):
        e = K.epsilon()
        if y_pred.shape[-1] <= 1:
            ALPHA = 0.8    # < 0.5 penalises FP more, > 0.5 penalises FN more
            CE_RATIO = 0.5 # weighted contribution of modified CE loss compared to Dice loss
            y_pred = tf.keras.activations.sigmoid(y_pred)
        elif y_pred.shape[-1] >= 2:
            ALPHA = 0.3    # < 0.5 penalises FP more, > 0.5 penalises FN more
            CE_RATIO = 0.7 # weighted contribution of modified CE loss compared to Dice loss
            y_pred = tf.keras.activations.softmax(y_pred, axis=-1)
            y_true = K.squeeze(y_true, 3)
            y_true = tf.cast(y_true, "int32")
            y_true = tf.one_hot(y_true, num_class, axis=-1)

        # cast to float32 datatype
        y_true = K.cast(y_true, 'float32')
        y_pred = K.cast(y_pred, 'float32')

        targets = K.flatten(y_true)
        inputs = K.flatten(y_pred)

        intersection = K.sum(targets * inputs)
        dice = (2. * intersection + smooth) / (K.sum(targets) + K.sum(inputs) + smooth)
        inputs = K.clip(inputs, e, 1.0 - e)
        out = - (ALPHA * ((targets * K.log(inputs)) + ((1 - ALPHA) * (1.0 - targets) * K.log(1.0 - inputs))))
        weighted_ce = K.mean(out, axis=-1)
        combo = (CE_RATIO * weighted_ce) - ((1 - CE_RATIO) * dice)
        return combo

Usage

Some tips

  • Tversky and Focal-Tversky loss benefit from very low learning rates, of the order 5e-5 to 1e-4. They would not see much improvement in my kernels until around 7-10 epochs, upon which performance would improve significantly.

  • In general, if a loss function does not appear to be working well (or at all), experiment with modifying the learning rate before moving on to other options.

  • You can easily create your own loss functions by combining any of the above with Binary Cross-Entropy or any combination of other losses (see the sketch after this list). Bear in mind that loss is calculated for every batch, so more complex losses will increase runtime.

  • Care must be taken when writing loss functions for PyTorch. If you call a function to modify the inputs that doesn’t entirely use PyTorch’s numerical methods, the tensor will ‘detach’ from the graph that maps it back through the neural network for the purposes of backpropagation, making the loss function unusable. Discussion of this is available here.
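A minimal sketch of such a combination in Keras (the 50/50 weighting is an arbitrary illustrative choice; `DiceLoss` and `FocalLoss` are the Keras functions defined earlier in this README):

    # hypothetical combined loss: equal-weight sum of the Dice and Focal losses defined above
    def dice_focal_loss(y_true, y_pred):
        return 0.5 * DiceLoss(y_true, y_pred) + 0.5 * FocalLoss(y_true, y_pred)

    # model.compile(optimizer='adam', loss=dice_focal_loss)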

References

RNA Kaggle