Project author: PopovicMilica

Project description:
Implementation of the neural network algorithm for estimating heterogeneous treatment effects and propensity scores described in Farrell, Liang, and Misra (2021)

Language: Python
Repository: git://github.com/PopovicMilica/causal_nets.git
Created: 2019-11-04T16:35:05Z
Project community: https://github.com/PopovicMilica/causal_nets


Feed-forward neural-network-based software for treatment effect and propensity score estimation. Currently, it only supports the case where the target variable is continuous.

Installation

First, navigate to the folder where you want to store the repository and clone it. Then install the dependencies and, finally, the software itself by typing the following commands:

    cd install/path
    git clone https://github.com/PopovicMilica/causal_nets.git
    cd causal_nets
    pip3 install -r requirements.txt
    pip3 install .
    cd ..

Test installation by importing the package:

    import causal_nets

Note: This software supports only Python versions 3.6 and above.

Example

The code below is a short example showcasing the usage of the causal_nets software.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
    from scipy.stats import norm
    from causal_nets import causal_net_estimate

    # Setting the seed
    np.random.seed(3)

    # Generating the fake data
    N = 10000
    X = np.random.uniform(low=0, high=1, size=[N, 10])
    mu0_real = 1.5 + 0.012*X[:, 3] - 0.75*X[:, 5]*X[:, 7] - 0.9*X[:, 4] - np.mean(X, axis=1)
    tau_real = X[:, 2] + 0.04*X[:, 9] - 0.35*np.log(X[:, 3])
    prob_of_T = 0.5
    T = np.random.binomial(size=N, n=1, p=prob_of_T)
    normal_errors = np.random.normal(size=[N, ], loc=0.0, scale=1.0)
    Y = mu0_real + tau_real*T + normal_errors

    # Creating training and validation datasets
    X_train, X_valid, T_train, T_valid, Y_train, Y_valid = train_test_split(
        X, T, Y, test_size=0.2, random_state=42)

    # Getting causal estimates
    tau_pred, mu0_pred, prob_t_pred, psi_0, psi_1, history, history_ps = causal_net_estimate(
        [X_train, T_train, Y_train], [X_valid, T_valid, Y_valid], [X, T, Y], [60, 30],
        dropout_rates=None, batch_size=None, alpha=0., r_par=0., optimizer='Adam',
        learning_rate=0.0009, max_epochs_without_change=30, max_nepochs=10000,
        seed=None, estimate_ps=False, verbose=True)

    # Plotting estimated coefficients vs true coefficients
    plt.figure(figsize=(10, 5))
    plt.clf()
    plt.subplot(1, 2, 1)
    bins = np.linspace(min(float(min(tau_pred)), float(min(tau_real))),
                       max(float(max(tau_pred)), float(max(tau_real))), 15)
    plt.hist(tau_pred, alpha=0.6, label=r'$\tau~_{pred}$', density=True, bins=bins)
    plt.hist(tau_real, label=r'$\tau~_{real}$', histtype=u'step',
             density=True, linewidth=2.5, bins=bins)
    plt.legend(loc='upper right')
    plt.title('CATE (Conditional average treatment effect)')
    plt.xlabel(r'$\tau$', fontsize=14)
    plt.ylabel('Density')
    plt.subplot(1, 2, 2)
    bins = np.linspace(min(float(min(mu0_pred)), float(min(mu0_real))),
                       max(float(max(mu0_pred)), float(max(mu0_real))), 15)
    plt.hist(mu0_pred, alpha=0.7, label=r'$\mu_{0~pred}$', density=True, bins=bins)
    plt.hist(mu0_real, label=r'$\mu_{0~real}$', histtype=u'step',
             density=True, linewidth=2.5, bins=bins)
    plt.legend(loc='upper right')
    plt.title(r'$\mu_0(x)$')
    plt.xlabel(r'$\mu_0$', fontsize=14)
    plt.ylabel('Density')
    plt.tight_layout()
    plt.show()

    # Calculate the average treatment effect
    ate = np.mean(psi_1 - psi_0)

    # Calculate the 95% confidence interval for the average treatment effect
    CI_lowerbound = ate - norm.ppf(0.975)*np.std(psi_1 - psi_0)/np.sqrt(len(psi_0))
    CI_upperbound = ate + norm.ppf(0.975)*np.std(psi_1 - psi_0)/np.sqrt(len(psi_0))
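
Because the data in this example are simulated, the estimated average treatment effect and its confidence interval can be checked against the true ATE implied by the data-generating process. A small sketch, using only the variables already defined above:

    # Compare the estimated ATE against the true simulated ATE
    # (the mean of tau_real from the data-generating process).
    print(f"True ATE (simulated): {np.mean(tau_real):.4f}")
    print(f"Estimated ATE: {ate:.4f}")
    print(f"95% CI: [{CI_lowerbound:.4f}, {CI_upperbound:.4f}]")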

To run the same example directly in R, the additional R packages reticulate, repr, and gensvm need to be installed. Then run the code below:

    library(reticulate)
    library(gensvm)
    library(repr)

    causalNets <- import("causal_nets")
    np <- import("numpy")

    # Generating the fake data
    set.seed(3)
    N <- 10000
    d <- 10
    X <- matrix(runif(N * d), N, d)
    mu0_real <- 1.5 + 0.012*X[, 4] - 0.75*X[, 6]*X[, 8] - 0.9*X[, 5] - rowMeans(X)
    tau_real <- X[, 3] + 0.04*X[, 10] - 0.35*log(X[, 4])
    prob_of_T <- 0.5
    T <- rbinom(N, 1, prob_of_T)
    normal_errors <- rnorm(N)
    Y <- mu0_real + tau_real*T + normal_errors

    # Splitting the data
    split <- gensvm.train.test.split(X, train.size=0.8, shuffle=TRUE,
                                     random.state=42, return.idx=TRUE)
    X_train <- X[split$idx.train, ]
    Y_train <- Y[split$idx.train]
    T_train <- T[split$idx.train]
    X_valid <- X[split$idx.test, ]
    Y_valid <- Y[split$idx.test]
    T_valid <- T[split$idx.test]

    # Converting arrays into ndarrays which are recognized by Python
    X_train <- np$array(X_train)
    T_train <- np$array(T_train)
    Y_train <- np$array(Y_train)
    X_valid <- np$array(X_valid)
    T_valid <- np$array(T_valid)
    Y_valid <- np$array(Y_valid)
    X <- np$array(X)
    T <- np$array(T)
    Y <- np$array(Y)

    # Getting causal estimates
    coeffs <- causalNets$causal_net_estimate(
        list(X_train, T_train, Y_train), list(X_valid, T_valid, Y_valid),
        list(X, T, Y), list(60L, 30L), dropout_rates=NULL, batch_size=NULL,
        alpha=0., r_par=0., optimizer='Adam', learning_rate=0.0009,
        max_epochs_without_change=30L, max_nepochs=10000L, seed=NULL,
        estimate_ps=FALSE, verbose=TRUE)

    # Plotting estimated coefficients vs true coefficients
    tau_pred <- as.vector(coeffs[[1]])
    mu0_pred <- as.vector(coeffs[[2]])
    c1 <- rgb(0, 0, 230, max=255, alpha=70, names="purple")
    c2 <- rgb(0, 230, 0, max=255, alpha=70, names="green")
    options(repr.plot.width=9, repr.plot.height=5)
    par(mfrow=c(1, 2))

    # Plot histograms for tau_pred and tau_real
    breaks <- seq(min(c(tau_pred, tau_real)), max(c(tau_pred, tau_real)), length.out=15)
    hist(tau_pred, main="CATE (Conditional average treatment effect)",
         freq=FALSE, cex.main=0.9, ylab="Density", xlab=expression(tau), col=c1,
         border=FALSE, breaks=breaks)
    hist(tau_real, col=c2, add=TRUE, freq=FALSE, border=FALSE, breaks=breaks)
    legend("topright", legend=c(expression(tau[' pred']), expression(tau[' real'])),
           fill=c(c1, c2), box.lty=0, cex=0.8, border=FALSE, x.intersp=0.5)

    # Plot histograms for mu0_pred and mu0_real
    breaks <- seq(min(c(mu0_pred, mu0_real)), max(c(mu0_pred, mu0_real)), length.out=15)
    hist(mu0_pred, main=expression(mu[0]*"(x)"),
         freq=FALSE, cex.main=0.9, ylab="Density", xlab=expression(mu[0]), col=c1,
         border=FALSE, breaks=breaks)
    hist(mu0_real, col=c2, add=TRUE, freq=FALSE, border=FALSE, breaks=breaks)
    legend("topright", legend=c(expression(mu[0][' pred']), expression(mu[0][' real'])),
           fill=c(c1, c2), box.lty=0, cex=0.8, border=FALSE, x.intersp=0.5)

    # Calculate the average treatment effect
    psi_0 <- as.vector(coeffs[[4]])  # influence function for t=0
    psi_1 <- as.vector(coeffs[[5]])  # influence function for t=1
    ate <- mean(psi_1 - psi_0)

    # Calculate the 95% confidence interval for the average treatment effect
    CI_lowerbound <- ate - qnorm(0.975)*sd(psi_1 - psi_0)/sqrt(length(psi_0))
    CI_upperbound <- ate + qnorm(0.975)*sd(psi_1 - psi_0)/sqrt(length(psi_0))

Explanation of the parameters of the main function causal_net_estimate()

    causal_net_estimate(training_data, validation_data, test_data, hidden_layer_sizes,
                        dropout_rates=None, batch_size=None, alpha=0., r_par=0.,
                        optimizer='Adam', learning_rate=0.0009, max_epochs_without_change=30,
                        max_nepochs=5000, seed=None, estimate_ps=False, verbose=True,
                        hidden_layer_sizes_t=None, dropout_rates_t=None, batch_size_t=None,
                        alpha_t=0., r_par_t=0., optimizer_t='Adam', learning_rate_t=0.0009,
                        max_epochs_without_change_t=30, max_nepochs_t=5000, seed_t=None)
    Parameters
    ----------
    training_data: list of arrays
        Data on which the training of the neural network will be
        performed. It must be given as a list of arrays, in the
        following manner: [X_train, T_train, Y_train].
        Here, `X_train` is an array of input features, `T_train` is
        the treatment array, and `Y_train` is the target array.
    validation_data: list of arrays
        Data on which the validation of the neural network will be
        performed. It has to be composed in the same manner as the
        training data.
    test_data: list of arrays
        Data on which we want to perform estimation. It has to be
        composed in the same manner as the training and validation data.
    hidden_layer_sizes: list of ints
        `hidden_layer_sizes` is a list that defines the depth and width
        of the neural network that estimates causal coefficients. The
        length of the list defines the number of hidden layers, and the
        entries of the list define the number of hidden units in each
        hidden layer. No default value; it needs to be provided.
        E.g. hidden_layer_sizes = [60, 30]
    dropout_rates: list of floats or None, optional
        If it is a list, the values of the list represent the dropout
        rate for each layer of the feed-forward neural network that
        estimates causal coefficients. Each entry of the list has to be
        between 0 and 1, and the list has to be of the same length as
        `hidden_layer_sizes`. If set to None, dropout is not applied.
        Default value is None.
    batch_size: int, optional
        Batch size for the neural network that estimates causal
        coefficients. Default value is None. If batch_size is None,
        the batch size is equal to the length of the training dataset
        for training datasets smaller than 50000 rows and set to 1024
        for larger datasets. Otherwise, it is equal to the value provided.
    alpha: float, optional
        Regularization strength parameter for the neural network that
        estimates causal coefficients. Default value is 0.
    r_par: float, optional
        Mixing ratio of Ridge and Lasso regression for the neural
        network that estimates causal coefficients.
        Has to be between 0 and 1. If r_par = 1, this is equivalent to
        Lasso regression; if r_par = 0, it is equivalent to Ridge
        regression. Default value is 0.
    optimizer: {'Adam', 'GradientDescent', 'RMSprop'}, optional
        Which optimizer to use for the neural network that estimates
        causal coefficients. Default: 'Adam'.
    learning_rate: scalar, optional
        Learning rate for the neural network that estimates causal
        coefficients. Default value is 0.0009.
    max_epochs_without_change: int, optional
        Number of epochs with no improvement in the validation loss to
        wait before stopping the training of the neural network that
        estimates causal coefficients. Default value is 30.
    max_nepochs: int, optional
        Maximum number of epochs for which the neural network that
        estimates causal coefficients will be trained.
        Default value is 5000.
    seed: int or None, optional
        TensorFlow seed number for the neural network that estimates
        causal coefficients. Default value is None.
    estimate_ps: bool, optional
        Whether the propensity scores should be estimated or not. If
        the treatment is randomized, this variable should be set to
        False; if the treatment is not randomized, it should be set to
        True. Default value is False.
    verbose: bool, optional
        Whether the model summary and losses during training should be
        printed. If set to False, the printing behavior is suppressed.
        Default value is True.
    hidden_layer_sizes_t: list of ints or None, optional
        `hidden_layer_sizes_t` is a list that defines the depth and
        width of the neural network that estimates propensity scores.
        The length of the list defines the number of hidden layers, and
        the entries of the list define the number of hidden units in
        each hidden layer. Default value is None, but if `estimate_ps`
        is set to True, a value for this argument needs to be provided.
        E.g. hidden_layer_sizes_t = [60, 30]
    dropout_rates_t: list of floats or None, optional
        If it is a list, the values of the list represent the dropout
        rate for each layer of the neural network that estimates
        propensity scores. Each entry of the list has to be between 0
        and 1, and the list has to be of the same length as
        `hidden_layer_sizes_t`. If set to None, dropout is not applied.
        Default value is None.
    batch_size_t: int, optional
        Batch size for the neural network that estimates propensity
        scores. Default value is None. If batch_size_t is None, the
        batch size is equal to the length of the training dataset for
        training datasets smaller than 50000 rows and set to 1024 for
        larger datasets. Otherwise, it is equal to the value provided.
    alpha_t: float, optional
        Regularization strength parameter for the neural network that
        estimates propensity scores. Default value is 0.
    r_par_t: float, optional
        Mixing ratio of Ridge and Lasso regression for the neural
        network that estimates propensity scores.
        Has to be between 0 and 1. If r_par_t = 1, this is equivalent
        to Lasso regression; if r_par_t = 0, it is equivalent to Ridge
        regression. Default value is 0.
    optimizer_t: {'Adam', 'GradientDescent', 'RMSprop'}, optional
        Which optimizer to use for the neural network that estimates
        propensity scores. Default: 'Adam'.
    learning_rate_t: scalar, optional
        Learning rate for the neural network that estimates propensity
        scores. Default value is 0.0009.
    max_epochs_without_change_t: int, optional
        Number of epochs with no improvement in the validation loss to
        wait before stopping the training of the neural network that
        estimates propensity scores. Default value is 30.
    max_nepochs_t: int, optional
        Maximum number of epochs for which the neural network that
        estimates propensity scores will be trained.
        Default value is 5000.
    seed_t: int or None, optional
        TensorFlow seed number for the neural network that estimates
        propensity scores. Default value is None.

    Returns
    -------
    tau_pred: ndarray
        Estimated conditional average treatment effect.
    mu0_pred: ndarray
        Estimated target value given x in the case of no treatment.
    prob_t_pred: ndarray
        Estimated propensity scores.
    psi_0: ndarray
        Influence function for given x in the case of no treatment.
    psi_1: ndarray
        Influence function for given x in the case of treatment.
    history_dict: dict
        Dictionary that stores validation and training loss values for
        CoeffNet.
    history_ps_dict: dict
        Dictionary that stores validation and training loss values for
        PropensityScoreNet. If estimate_ps is set to False,
        history_ps_dict is set to None as well.
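
When the treatment is not randomized, setting `estimate_ps=True` requires supplying an architecture for the propensity-score network through `hidden_layer_sizes_t`. Below is a minimal sketch of such a call, reusing the arrays from the Python example above; the `[30, 15]` architecture is an arbitrary illustrative choice, not a recommendation, and all other settings are left at their defaults:

    # Sketch: joint estimation of causal coefficients and propensity scores.
    # Assumes X, T, Y and the train/validation splits from the example above;
    # hidden_layer_sizes_t=[30, 15] is an arbitrary illustrative choice.
    tau_pred, mu0_pred, prob_t_pred, psi_0, psi_1, history, history_ps = causal_net_estimate(
        [X_train, T_train, Y_train], [X_valid, T_valid, Y_valid], [X, T, Y],
        hidden_layer_sizes=[60, 30],     # CoeffNet architecture
        estimate_ps=True,                # treatment assignment is not randomized
        hidden_layer_sizes_t=[30, 15])   # required whenever estimate_ps=True

    # prob_t_pred now contains the estimated propensity scores,
    # and history_ps is no longer None.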

References

Farrell, M.H., Liang, T. and Misra, S. (2021), Deep Neural Networks for Estimation and Inference. Econometrica, 89: 181-213. https://doi.org/10.3982/ECTA16901