Project author: NetherlandsForensicInstitute

Project description:
LIR Python Likelihood Ratio Library
Language: Python
Repository: git://github.com/NetherlandsForensicInstitute/lir.git
Created: 2019-02-21T12:01:15Z
Project community: https://github.com/NetherlandsForensicInstitute/lir

License: Apache License 2.0

LIR Python Likelihood Ratio Library

This library provides a collection of scripts to aid in the calibration,
calculation and evaluation of likelihood ratios (LRs).

A simple score-based LR system

A score-based LR system needs a scorer and a calibrator. The most basic setup
uses a training set and a test set. Both the scorer and the calibrator are
fitted on the training set.

  import lir
  import numpy as np
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import train_test_split

  # generate some data randomly from a normal distribution
  X = np.concatenate([np.random.normal(loc=0, size=(100, 1)),
                      np.random.normal(loc=1, size=(100, 1))])
  y = np.concatenate([np.zeros(100), np.ones(100)])

  # split the data into train and test
  X_train, X_test, y_train, y_test = train_test_split(X, y)

  # initialize a scorer and a calibrator
  scorer = LogisticRegression(solver='lbfgs')  # choose any sklearn style classifier
  calibrator = lir.KDECalibrator()  # use plain KDE for calibration
  calibrated_scorer = lir.CalibratedScorer(scorer, calibrator)

  # fit and predict
  calibrated_scorer.fit(X_train, y_train)
  lrs_test = calibrated_scorer.predict_lr(X_test)

  # print the quality of the system as log likelihood ratio cost (lower is better)
  print('The log likelihood ratio cost is', lir.metrics.cllr(lrs_test, y_test), '(lower is better)')
  print('The discriminative power is', lir.metrics.cllr_min(lrs_test, y_test), '(lower is better)')

  # plot calibration
  import lir.plotting
  with lir.plotting.show() as ax:
      ax.pav(lrs_test, y_test)
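For intuition, the calibrated scorer above chains the two fitted components:
the scorer turns each sample into a score, and the calibrator maps scores to
likelihood ratios. The sketch below illustrates that idea by hand; it assumes
the calibrator follows the library's sklearn-style fit/transform convention,
and it is not the actual CalibratedScorer implementation.

  # Sketch only: roughly what a score-based LR pipeline does internally.
  # Assumes the calibrator exposes sklearn-style fit(scores, y) and
  # transform(scores); the real lir.CalibratedScorer may differ in details.
  scorer.fit(X_train, y_train)
  scores_train = scorer.predict_proba(X_train)[:, 1]  # scores on the training set
  calibrator.fit(scores_train, y_train)                # learn the score-to-LR mapping

  scores_test = scorer.predict_proba(X_test)[:, 1]     # scores on the test set
  lrs_manual = calibrator.transform(scores_test)       # likelihood ratios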

The log likelihood ratio cost (Cllr) may be used as a metric of performance.
In this case it should yield a value of around 0.8, although the result is
highly variable due to the small number of samples. Increase the sample size
to get more stable results.
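
For reference, Cllr is the standard log-likelihood-ratio cost (Brümmer and
du Preez): it averages log2(1 + 1/LR) over the same-source samples and
log2(1 + LR) over the different-source samples. The self-contained sketch
below implements that textbook formula; lir.metrics.cllr should agree with it
up to implementation details.

  import numpy as np

  def cllr_sketch(lrs, y):
      # Reference sketch of the standard Cllr formula, not necessarily
      # lir's exact implementation.
      lrs = np.asarray(lrs, dtype=float)
      y = np.asarray(y)
      lrs_h1 = lrs[y == 1]  # same-source (H1) likelihood ratios
      lrs_h2 = lrs[y == 0]  # different-source (H2) likelihood ratios
      penalty_h1 = np.mean(np.log2(1 + 1 / lrs_h1))  # penalizes small LRs under H1
      penalty_h2 = np.mean(np.log2(1 + lrs_h2))      # penalizes large LRs under H2
      return 0.5 * (penalty_h1 + penalty_h2)

  print('Cllr (reference formula):', cllr_sketch(lrs_test, y_test))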