Project author: difuture-lmu

Project description: ROC-GLM for DataSHIELD
Primary language: R
Repository: git://github.com/difuture-lmu/ds.roc.glm.git
Created: 2021-05-02T09:12:49Z
Project page: https://github.com/difuture-lmu/ds.roc.glm

License: GNU Lesser General Public License v3.0



ROC-GLM for DataSHIELD

THE PACKAGE HAS MOVED TO THE dsBinVal REPOSITORY!

The package provides functionality to conduct and visualize ROC analysis
on decentralized data. It builds on the DataSHIELD
(https://www.datashield.org) infrastructure for distributed computing.
The package calculates the ROC-GLM as well as AUC confidence intervals.
Calculating the ROC-GLM requires pushing models to the servers and
computing their predictions there. This is done automatically by the base
package dsPredictBase. Note that DataSHIELD uses the option
datashield.privacyLevel to specify the minimum number of values an
aggregated value must be based on before it may be shared. Instead of
setting this option, we directly retrieve the privacy level from the
DESCRIPTION file each time a function needs it. The privacy level is set
to 5 by default.
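The privacy rule described above can be illustrated with a small base-R sketch. It assumes, for illustration only, that the DESCRIPTION file carries an `Options` field of the form `datashield.privacyLevel=5`. The helper names `read_privacy_level()` and `guarded_mean()` are hypothetical and not part of dsROCGLM.

```r
# Hypothetical helpers illustrating the privacy-level rule; not dsROCGLM code.

# Assumption: the DESCRIPTION file has an "Options" field such as
# "datashield.privacyLevel=5". Falls back to 5 if no level is found.
read_privacy_level = function(dcf_text, default = 5) {
  con = textConnection(dcf_text)
  on.exit(close(con))
  desc = read.dcf(con)
  if (!"Options" %in% colnames(desc)) return(default)
  opts = trimws(strsplit(desc[1, "Options"], ",")[[1]])
  hit = grep("^datashield\\.privacyLevel=", opts, value = TRUE)
  if (length(hit) == 0) return(default)
  as.numeric(sub("^datashield\\.privacyLevel=", "", hit[1]))
}

# Release an aggregate only if it is based on at least `privacy_level` values.
guarded_mean = function(x, privacy_level) {
  if (length(x) < privacy_level) {
    stop("Aggregation refused: fewer than ", privacy_level, " values.")
  }
  mean(x)
}

desc_text = c("Package: examplePkg", "Options: datashield.privacyLevel=5")
level = read_privacy_level(desc_text)
guarded_mean(c(2, 4, 6, 8, 10), level)  # 5 values, so the mean 6 is returned
```

With fewer than five values, `guarded_mean()` raises an error instead of returning the aggregate.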

Installation

At the moment, there is no CRAN version available. Install the
development version from GitHub:

```r
remotes::install_github("difuture-lmu/dsROCGLM")
```

Register methods

It is necessary to register the assign and aggregate methods in the OPAL
administration. These methods are registered automatically when
publishing the package on OPAL (see
DESCRIPTION).
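The sketch below shows roughly what "registered automatically (see DESCRIPTION)" refers to. It reads comma-separated method lists from DESCRIPTION-style text with base R's `read.dcf()`. The field names `AggregateMethods` and `AssignMethods` are assumed from the usual DataSHIELD server-package convention. The method names in the example are made-up placeholders, and `declared_methods()` is a hypothetical helper. Check the package's actual DESCRIPTION for the real entries.

```r
# Hypothetical helper: list the methods declared in a DESCRIPTION field.
# Field names follow the DataSHIELD convention (assumption).
declared_methods = function(dcf_text, field) {
  con = textConnection(dcf_text)
  on.exit(close(con))
  desc = read.dcf(con, fields = field)
  value = desc[1, field]
  if (is.na(value)) return(character(0))
  # Entries are comma-separated and may continue on indented lines.
  out = trimws(strsplit(value, ",")[[1]])
  out[nzchar(out)]
}

# Placeholder method names, for illustration only.
desc_text = c(
  "Package: examplePkg",
  "AggregateMethods: exampleAggDS,",
  "  anotherAggDS",
  "AssignMethods: exampleAssignDS"
)
declared_methods(desc_text, "AggregateMethods")  # "exampleAggDS" "anotherAggDS"
declared_methods(desc_text, "AssignMethods")     # "exampleAssignDS"
```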

Note that the package needs to be installed in both locations: on the
servers and on the analyst's machine.

Usage

A more sophisticated example is available
here.

```r
library(DSI)
#> Loading required package: progress
#> Loading required package: R6
library(DSOpal)
#> Loading required package: opalr
#> Loading required package: httr
library(dsBaseClient)
library(dsPredictBase)
library(dsROCGLM)
```

Log into the DataSHIELD servers

```r
builder = newDSLoginBuilder()

surl     = "https://opal-demo.obiba.org/"
username = "administrator"
password = "password"

builder$append(
  server   = "ds1",
  url      = surl,
  user     = username,
  password = password
)
builder$append(
  server   = "ds2",
  url      = surl,
  user     = username,
  password = password
)

connections = datashield.login(logins = builder$build(), assign = TRUE)
#>
#> Logging into the collaborating servers
```

Assign iris and a validation vector on the DataSHIELD servers (for testing only)

```r
datashield.assign(connections, "iris", quote(iris))
datashield.assign(connections, "y", quote(c(rep(1, 50), rep(0, 100))))
```
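The validation vector `c(rep(1, 50), rep(0, 100))` works as a truth label only because rows 1 to 50 of the built-in `iris` data set are the setosa flowers. A quick local check in base R, run on the analyst's machine and not through DataSHIELD:

```r
# The first 50 rows of iris are setosa; the remaining 100 are versicolor/virginica.
y_local = c(rep(1, 50), rep(0, 100))
setosa  = ifelse(iris$Species == "setosa", 1, 0)
identical(y_local, setosa)
#> [1] TRUE
```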

Load test model, push to DataSHIELD, and calculate predictions

```r
# The model predicts whether the species of an iris flower is setosa.
iris$y = ifelse(iris$Species == "setosa", 1, 0)
mod = glm(y ~ Sepal.Length, data = iris, family = binomial())

# Push the model to the DataSHIELD servers using `dsPredictBase`:
pushObject(connections, mod)

# Calculate scores and store them on the servers using `dsPredictBase`:
pfun = "predict(mod, newdata = D, type = 'response')"
predictModel(connections, mod, "pred", "iris", predict_fun = pfun)

datashield.symbols(connections)
#> $ds1
#> [1] "iris" "mod"  "pred" "y"
#>
#> $ds2
#> [1] "iris" "mod"  "pred" "y"
```
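The prediction string `pfun` is evaluated on each server. Judging from the string, `D` refers to the data set named in the `predictModel()` call (here `iris`). The sketch below binds the local `iris` to `D` and evaluates the same string locally. This is useful for comparing against the distributed result. It is a local analogue only, not what `predictModel()` executes internally.

```r
# Local analogue of the server-side prediction step; no DataSHIELD involved.
iris$y = ifelse(iris$Species == "setosa", 1, 0)
mod = glm(y ~ Sepal.Length, data = iris, family = binomial())

# Assumption: `D` stands for the data set named in predictModel() ("iris").
D = iris
pfun = "predict(mod, newdata = D, type = 'response')"
pred_local = eval(parse(text = pfun))

# Setosa flowers (y = 1) should receive higher scores on average.
tapply(pred_local, iris$y, mean)
```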

Calculate l2-sensitivity

```r
# In order to securely calculate the ROC-GLM, we have to assess the
# l2-sensitivity to set the privacy parameters of differential
# privacy adequately:
l2s = dsL2Sens(connections, "iris", "pred")
l2s
#> [1] 0.1280699

# Based on the results presented in https://arxiv.org/abs/2203.10828, we set the privacy parameters to
# - epsilon = 0.2, delta = 0.1 if        l2s <= 0.01
# - epsilon = 0.3, delta = 0.4 if 0.01 < l2s <= 0.03
# - epsilon = 0.5, delta = 0.3 if 0.03 < l2s <= 0.05
# - epsilon = 0.5, delta = 0.5 if 0.05 < l2s <= 0.07
# - epsilon = 0.5, delta = 0.5 if 0.07 < l2s, BUT results may be poor!
```
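The threshold rules in the comments above can be written as a small lookup function. `choose_privacy_params()` is a hypothetical helper for illustration. `dsROCGLM()` chooses the parameters internally, and the log in the next step reports the chosen values, but its actual code may differ.

```r
# Hypothetical helper encoding the l2-sensitivity thresholds listed above.
choose_privacy_params = function(l2s) {
  if (l2s <= 0.01) {
    c(epsilon = 0.2, delta = 0.1)
  } else if (l2s <= 0.03) {
    c(epsilon = 0.3, delta = 0.4)
  } else if (l2s <= 0.05) {
    c(epsilon = 0.5, delta = 0.3)
  } else {
    # Same values for 0.05 < l2s <= 0.07 and for l2s > 0.07,
    # but above 0.07 the results may be poor.
    if (l2s > 0.07) warning("l2-sensitivity may be too high for good results!")
    c(epsilon = 0.5, delta = 0.5)
  }
}

# Value reported above: gives epsilon = 0.5, delta = 0.5 plus a warning.
choose_privacy_params(0.1280699)
```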

Calculate ROC-GLM

```r
roc_glm = dsROCGLM(connections, truth_name = "y", pred_name = "pred",
  dat_name = "iris", seed_object = "y")
#>
#> [2022-04-04 12:47:46] L2 sensitivity is: 0.1281
#> Warning in dsROCGLM(connections, truth_name = "y", pred_name = "pred", dat_name
#> = "iris", : l2-sensitivity may be too high for good results! Epsilon = 0.5 and
#> delta = 0.5 is used which may lead to bad results.
#>
#> [2022-04-04 12:47:47] Setting: epsilon = 0.5 and delta = 0.5
#>
#> [2022-04-04 12:47:47] Initializing ROC-GLM
#>
#> [2022-04-04 12:47:47] Host: Received scores of negative response
#> [2022-04-04 12:47:47] Receiving negative scores
#> [2022-04-04 12:47:49] Host: Pushing pooled scores
#> [2022-04-04 12:47:50] Server: Calculating placement values and parts for ROC-GLM
#> [2022-04-04 12:47:52] Server: Calculating probit regression to obtain ROC-GLM
#> [2022-04-04 12:47:53] Deviance of iter1=137.2431
#> [2022-04-04 12:47:54] Deviance of iter2=121.5994
#> [2022-04-04 12:47:56] Deviance of iter3=147.7237
#> [2022-04-04 12:47:57] Deviance of iter4=140.4008
#> [2022-04-04 12:47:58] Deviance of iter5=129.2244
#> [2022-04-04 12:48:00] Deviance of iter6=123.9979
#> [2022-04-04 12:48:01] Deviance of iter7=123.1971
#> [2022-04-04 12:48:02] Deviance of iter8=124.1615
#> [2022-04-04 12:48:04] Deviance of iter9=124.5356
#> [2022-04-04 12:48:05] Deviance of iter10=124.5503
#> [2022-04-04 12:48:06] Deviance of iter11=124.5504
#> [2022-04-04 12:48:08] Deviance of iter12=124.5504
#> [2022-04-04 12:48:08] Host: Finished calculating ROC-GLM
#> [2022-04-04 12:48:08] Host: Cleaning data on server
#> [2022-04-04 12:48:09] Host: Calculating AUC and CI
#> [2022-04-04 12:48:18] Finished!
roc_glm
#>
#> ROC-GLM after Pepe:
#>
#> Binormal form: pnorm(2.51 + 1.55*qnorm(t))
#>
#> AUC and 0.95 CI: [0.86----0.91----0.95]
plot(roc_glm)
```
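To make the log messages above concrete (placement values, then a probit regression), here is a minimal, non-distributed sketch of the ROC-GLM after Pepe on the local `iris` example. It assumes a single pooled data set and adds no differential-privacy noise. The function names and the grid of false positive rates are hypothetical choices, so the coefficients will not exactly match the DataSHIELD output. The AUC uses the closed form for a binormal curve, `pnorm(a / sqrt(1 + b^2))`.

```r
# Minimal local ROC-GLM after Pepe (no DataSHIELD, no differential privacy).
# Hypothetical helper names; not the dsROCGLM implementation.

# Area under the binormal curve pnorm(a + b * qnorm(t)) for t in (0, 1).
binormal_auc = function(a, b) pnorm(a / sqrt(1 + b^2))

roc_glm_local = function(truth, score, n_fpr = 50) {
  pos = score[truth == 1]
  neg = score[truth == 0]

  # Placement value of each positive score: share of negative scores >= it,
  # i.e. the empirical survivor function of the negatives.
  pv = vapply(pos, function(s) mean(neg >= s), numeric(1))

  # Grid of false positive rates strictly inside (0, 1).
  fpr = seq_len(n_fpr) / (n_fpr + 1)

  # One row per (positive, fpr) pair; the response is I(placement value <= fpr).
  dat = expand.grid(i = seq_along(pv), j = seq_along(fpr))
  dat$u = as.integer(pv[dat$i] <= fpr[dat$j])
  dat$x = qnorm(fpr[dat$j])

  # Probit regression: qnorm(ROC(t)) = a + b * qnorm(t).
  fit = glm(u ~ x, family = binomial(link = "probit"), data = dat)
  cf = unname(coef(fit))
  list(a = cf[1], b = cf[2], auc = binormal_auc(cf[1], cf[2]))
}

iris$y = ifelse(iris$Species == "setosa", 1, 0)
mod = glm(y ~ Sepal.Length, data = iris, family = binomial())
pred = predict(mod, newdata = iris, type = "response")

fit = roc_glm_local(iris$y, pred)
fit$auc

# Check with the rounded coefficients printed above:
binormal_auc(2.51, 1.55)  # about 0.913, consistent with the reported AUC of 0.91
```

The probit fit may emit a "fitted probabilities numerically 0 or 1" warning on this small, well-separated example. The confidence interval that `dsROCGLM()` reports is not reproduced here.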

Deploy information:

Built by root (Darwin) on 2022-04-04 12:48:23.

This README is built automatically after each push to the repository.
It therefore also tests whether the package works on the DataSHIELD
servers. The same functionality is tested in
tests/testthat/test_on_active_server.R. The system information of the
local and remote machines is as follows:

  • Local machine:
    • R version: R version 4.1.3 (2022-03-10)
    • Versions of DataSHIELD client packages:

      | Package       | Version |
      |---------------|---------|
      | DSI           | 1.3.0   |
      | DSOpal        | 1.3.1   |
      | dsBaseClient  | 6.1.1   |
      | dsPredictBase | 0.0.1   |
      | dsROCGLM      | 1.0.0   |

  • Remote DataSHIELD machines:
    • R version of ds1: R version 4.1.1 (2021-08-10)
    • R version of ds2: R version 4.1.1 (2021-08-10)
    • Versions of server packages:

      | Package       | ds1 version | ds2 version |
      |---------------|-------------|-------------|
      | dsBase        | 6.1.1       | 6.1.1       |
      | resourcer     | 1.1.1       | 1.1.1       |
      | dsPredictBase | 0.0.1       | 0.0.1       |
      | dsROCGLM      | 1.0.0       | 1.0.0       |
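The client-side information above is directly available in base R. Below is a minimal sketch for the local machine. The helper `pkg_versions()` is hypothetical and not necessarily how the README build collects this information. The server-side columns would have to be gathered on the servers themselves, which this sketch does not cover.

```r
# Hypothetical helper: report installed package versions, NA if not installed.
pkg_versions = function(pkgs) {
  vapply(pkgs, function(p) {
    tryCatch(as.character(utils::packageVersion(p)),
             error = function(e) NA_character_)
  }, character(1))
}

R.version.string
pkg_versions(c("DSI", "DSOpal", "dsBaseClient", "dsPredictBase", "dsROCGLM"))
```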