Project author: sidenver

Project description:
Hierarchical Time-Biased Gain (hTBG): A metric for hierarchical ranking
Language: Python
Repository: git://github.com/sidenver/hTBG.git
Created: 2020-04-14T22:46:34Z
Project page: https://github.com/sidenver/hTBG



Hierarchical Time-Biased Gain (hTBG)

An evaluation metric for hierarchical ranking


This is the implementation of Hierarchical Time-Biased Gain (hTBG) from the ACL 2020 paper "A Prioritization Model for Suicidality Risk Assessment" by Han-Chin Shing, Philip Resnik, and Doug Oard.

This repository contains implementations of hTBG and TBG only. Due to privacy concerns, the SuicideWatch dataset is not included. Please see the UMD SuicideWatch Dataset page for how to obtain the data.

Command Line Usage

See below for the expected input and output formats.

    Usage:
        hTBG.py --relevance=<json>
                --prediction=<json>
                [--t_half_lives=<arg>]...
                [options]
        hTBG.py --help

    Options:
        -h --help                show this help message and exit
        -t --tbg                 run TBG, the non-hierarchical version of hTBG, with stopping probabilities set to zero
        -v --verbose             be verbose, print out hyper-parameters
        --p_click_true=<prob>    probability of clicking based on the summary if the target is relevant [default: 0.64]
        --p_click_false=<prob>   probability of clicking based on the summary if the target is not relevant [default: 0.39]
        --p_save_true=<prob>     probability of saving based on detailed review if the target is relevant [default: 0.77]
        --p_save_false=<prob>    probability of saving based on detailed review if the target is not relevant [default: 0.27]
        --t_summary=<sec>        time (in seconds) to read a summary [default: 4.4]
        --t_alpha=<float>        scaling factor from words to seconds [default: 0.018]
        --t_beta=<float>         bias factor from words to seconds [default: 7.8]
        --t_half_lives=<arg>     time (in seconds) of the half-life parameter in the decay function [default: 224. 1800.]

For the detailed meaning of the hyperparameters, see:

Smucker, Mark D., and Charles LA Clarke. "Time-based calibration of effectiveness measures." SIGIR 2012
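To make the hyperparameters concrete, here is a minimal sketch of flat (non-hierarchical) TBG as defined by Smucker and Clarke: each relevant document contributes a gain of P(click | relevant) × P(save | relevant), discounted by an exponential decay of the expected time needed to reach it. Reaching rank k costs t_summary per earlier summary plus, with probability P(click | relevance), t_alpha × length + t_beta seconds for the earlier document. This is an illustration of the published formula under the default values above, not the code in hTBG.py. The function names and the use of word counts for document length are assumptions, and the hierarchical version from the ACL 2020 paper accumulates time differently.

```python
import math
from typing import Sequence

# Default calibration values, copied from the hTBG.py option defaults.
P_CLICK = {True: 0.64, False: 0.39}  # P(click | relevant?)
P_SAVE = {True: 0.77, False: 0.27}   # P(save | relevant?)
T_SUMMARY = 4.4                      # seconds to read one summary
T_ALPHA, T_BETA = 0.018, 7.8         # seconds per word, fixed overhead in seconds


def decay(t: float, half_life: float) -> float:
    """Exponential decay: the weight halves every `half_life` seconds."""
    return math.exp(-t * math.log(2) / half_life)


def tbg(relevant: Sequence[bool], lengths: Sequence[int], half_life: float) -> float:
    """Flat TBG of one ranked list (sketch after Smucker & Clarke, SIGIR 2012).

    relevant[k] and lengths[k] (hypothetically in words) describe the item at rank k.
    """
    score, elapsed = 0.0, 0.0
    for rel, length in zip(relevant, lengths):
        if rel:
            # Only relevant items earn gain: the user must click and then save.
            gain = P_CLICK[True] * P_SAVE[True]
            score += gain * decay(elapsed, half_life)
        # Expected time spent at this rank before moving on to the next one.
        elapsed += T_SUMMARY + P_CLICK[rel] * (T_ALPHA * length + T_BETA)
    return score
```

For example, `tbg([False, True], [100, 100], 224.0)` discounts the single relevant item by the roughly 8 seconds a user is expected to spend on the irrelevant item ranked above it.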

For the detailed description of hTBG, see:

Shing, Han-Chin, Resnik, Philip, and Oard, Doug. "A Prioritization Model for Suicidality Risk Assessment." ACL 2020

Use as a library

    import json
    from pprint import pprint

    from hTBG import hTBG

    htbg_parameters = {
        "p_click_true": 0.64,
        "p_click_false": 0.39,
        "p_save_true": 0.77,
        "p_save_false": 0.27,
        "t_summary": 4.4,
        "t_alpha": 0.018,
        "t_beta": 7.8,
        "t_half_lives": [224., 1800.],
    }

    # relevance and prediction follow the input format specified below;
    # here they are loaded from the bundled toy example files.
    with open('./hTBG_test/truth.json', 'r') as input_file:
        relevance = json.load(input_file)
    with open('./hTBG_test/prediction.json', 'r') as input_file:
        prediction = json.load(input_file)

    htbg = hTBG(relevance=relevance,
                prediction=prediction,
                hierarchical=True,
                verbose=False,
                **htbg_parameters)
    print('evaluation results:')
    pprint(htbg.evaluate())
    print('best possible values:')
    pprint(htbg.evaluate_best())
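Judging by the pprint output in the examples, `evaluate()` and `evaluate_best()` return nested dicts keyed by floating-point half-lives. If you want to save them, note that JSON object keys must be strings, so a plain `json.dump` turns `3.0` into `"3.0"`. The sketch below shows a round trip that restores the float keys. The `results` literal is copied from this README's toy hTBG example and stands in for a real `htbg.evaluate()` call.

```python
import json

# Results in the documented output format {query_name: {t_half_life: score}};
# the numbers are copied from the toy hTBG example in this README.
results = {"q_1": {3.0: 0.5248706964598764, 5.0: 0.588460647126441}}

# JSON object keys must be strings, so json.dumps writes the key 3.0 as "3.0".
text = json.dumps(results, indent=2)
loaded = json.loads(text)

# Convert the keys back to floats to recover the original structure.
restored = {
    query: {float(half_life): score for half_life, score in per_query.items()}
    for query, per_query in loaded.items()
}
```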

Expected Input Format:

The relevance (truth) structure:

    {
        "query_name": {
            "higher_level_name": [
                true_score (0 or 1),
                {
                    "lower_level_name": [stopping_probability, cost_of_lower_level],
                    ...
                }
            ],
            ...
        },
        ...
    }

For an example, see ./hTBG_test/truth.json
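As a sketch of this layout, the snippet below builds a tiny relevance structure in memory and checks its shape before it is passed to the evaluator. The query, user, and post names are hypothetical. So are the cost values: the README does not state the unit of `cost_of_lower_level`, so check ./hTBG_test/truth.json for what the script expects. `check_relevance` is an illustrative helper, not part of hTBG.

```python
from typing import Any, Dict

# Hypothetical relevance structure for one query, following the documented layout:
# higher-level item -> [true_score, {lower-level item: [stopping_probability, cost]}].
relevance: Dict[str, Any] = {
    "q_example": {
        "user_a": [1, {"post_1": [0.2, 120], "post_2": [0.0, 45]}],
        "user_b": [0, {"post_3": [0.1, 300]}],
    }
}


def check_relevance(rel: Dict[str, Any]) -> None:
    """Raise ValueError if `rel` does not match the documented relevance layout."""
    for query, items in rel.items():
        for name, entry in items.items():
            if not (isinstance(entry, list) and len(entry) == 2):
                raise ValueError(f"{query}/{name}: expected [true_score, {{...}}]")
            true_score, lower = entry
            if true_score not in (0, 1):
                raise ValueError(f"{query}/{name}: true_score must be 0 or 1")
            for post, value in lower.items():
                if not (isinstance(value, list) and len(value) == 2):
                    raise ValueError(f"{query}/{name}/{post}: expected [stop_prob, cost]")
                stop_prob = value[0]
                # A stopping probability is a probability, so it must lie in [0, 1].
                if not 0.0 <= stop_prob <= 1.0:
                    raise ValueError(f"{query}/{name}/{post}: stopping probability out of range")


check_relevance(relevance)
```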

The prediction structure:

    {
        "query_name": {
            "higher_level_name": [
                predicted_score (float),
                {
                    "lower_level_name": predicted_lower_level_score (float),
                    ...
                }
            ],
            ...
        },
        ...
    }

For an example, see ./hTBG_test/prediction.json
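The prediction scores induce a two-level ranking: higher-level items (for example, individuals) are ordered among themselves, and lower-level items (for example, their posts) are ordered within each one. The sketch below makes that ordering explicit, assuming that a higher score means higher priority. The names `ranked_view`, `q_example`, `user_a`, and so on are hypothetical and do not come from hTBG.py.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical prediction for one query: higher-level item -> [score, {post: score}].
prediction: Dict[str, Any] = {
    "q_example": {
        "user_a": [0.9, {"post_1": 0.3, "post_2": 0.8}],
        "user_b": [0.4, {"post_3": 0.5}],
    }
}


def ranked_view(pred: Dict[str, Any], query: str) -> List[Tuple[str, List[str]]]:
    """Order higher-level items, and the lower-level items inside each one,
    by descending predicted score (assumes higher score = higher priority)."""
    items = pred[query]
    order = sorted(items, key=lambda name: items[name][0], reverse=True)
    return [
        (name, sorted(items[name][1], key=items[name][1].get, reverse=True))
        for name in order
    ]


print(ranked_view(prediction, "q_example"))
# [('user_a', ['post_2', 'post_1']), ('user_b', ['post_3'])]
```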

Output format

    {
        "query_name": {
            t_half_life: score,
            ...
        },
        ...
    }
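The README does not say how per-query scores should be aggregated. One possible choice, shown here as an assumption rather than the paper's protocol, is to divide each score by the best possible value from `evaluate_best()` and to average across queries for each half-life. The helper names are hypothetical, and the sketch assumes that every query is scored at the same set of half-lives, as the command-line examples do.

```python
from typing import Dict

Scores = Dict[str, Dict[float, float]]  # {query_name: {t_half_life: score}}


def mean_per_half_life(scores: Scores) -> Dict[float, float]:
    """Average each half-life's score across all queries."""
    totals: Dict[float, float] = {}
    for per_query in scores.values():
        for half_life, value in per_query.items():
            totals[half_life] = totals.get(half_life, 0.0) + value
    return {h: total / len(scores) for h, total in totals.items()}


def normalized(scores: Scores, best: Scores) -> Scores:
    """Divide each score by the best achievable score for the same query and half-life."""
    return {
        q: {h: s / best[q][h] for h, s in per_query.items()}
        for q, per_query in scores.items()
    }


# Values at t_half_life = 3 taken from the toy hTBG example in this README.
results = {"q_1": {3.0: 0.5248706964598764}, "q_2": {3.0: 0.06428104166337158}}
best = {"q_1": {3.0: 0.543081360426777}, "q_2": {3.0: 0.5406284846869924}}
print(mean_per_half_life(results))
print(mean_per_half_life(normalized(results, best)))
```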

Example

A toy example for hTBG

    python hTBG.py --relevance ./hTBG_test/truth.json --prediction ./hTBG_test/prediction.json \
        --t_half_lives=3 --t_half_lives=5 --t_half_lives=10

This uses ./hTBG_test/truth.json as the truth and ./hTBG_test/prediction.json as the prediction, and evaluates at t_half_lives = 3, 5, and 10 seconds.

Expected output:

    evaluation results:
    {'q_1': {3.0: 0.5248706964598764,
             5.0: 0.588460647126441,
             10.0: 0.7099210881142366},
     'q_2': {3.0: 0.06428104166337158,
             5.0: 0.17063498548694406,
             10.0: 0.3878882375890544}}
    best possible values:
    {'q_1': {3.0: 0.543081360426777,
             5.0: 0.6180888697456681,
             10.0: 0.7412800897670984},
     'q_2': {3.0: 0.5406284846869924,
             5.0: 0.6143850709821594,
             10.0: 0.7375797438106515}}

A toy example for TBG

Add the -t flag (equivalent to setting all stopping probabilities to zero):

    python hTBG.py --relevance ./hTBG_test/truth.json --prediction ./hTBG_test/prediction.json -t \
        --t_half_lives=3 --t_half_lives=5 --t_half_lives=10

Expected output:

    evaluation results:
    {'q_1': {3.0: 0.5217239318926233,
             5.0: 0.5827130392122296,
             10.0: 0.7032973769997782},
     'q_2': {3.0: 0.06407785194702847,
             5.0: 0.169888953131207,
             10.0: 0.3864369224412395}}
    best possible values:
    {'q_1': {3.0: 0.5364948631648881,
             5.0: 0.607966590522515,
             10.0: 0.731031181438315},
     'q_2': {3.0: 0.5364948631648881,
             5.0: 0.607966590522515,
             10.0: 0.731031181438315}}
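The README describes -t as equivalent to setting the stopping probabilities to zero. The sketch below applies that description directly to a relevance structure in the documented input format, returning a modified copy and leaving the original untouched. `zero_stopping_probabilities` is a hypothetical helper, not part of hTBG.py. Whether evaluating such a copy reproduces the -t numbers exactly is an assumption based only on that one-line description. In the library interface, the `hierarchical` argument seen in the library example presumably serves the same purpose as -t.

```python
import copy
from typing import Any, Dict


def zero_stopping_probabilities(relevance: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of `relevance` with every stopping probability set to 0.0.

    Costs and true scores are preserved; only the first element of each
    [stopping_probability, cost] pair changes.
    """
    flat = copy.deepcopy(relevance)
    for items in flat.values():
        for _true_score, lower in items.values():
            for post, (_stop_prob, cost) in list(lower.items()):
                lower[post] = [0.0, cost]
    return flat


# Hypothetical example in the documented relevance format.
rel = {"q_example": {"user_a": [1, {"post_1": [0.2, 120], "post_2": [0.5, 45]}]}}
print(zero_stopping_probabilities(rel))
```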

Citation

If you use this software, please cite:

Shing et al., "A Prioritization Model for Suicidality Risk Assessment", ACL 2020

    @inproceedings{shing-etal-2020-prioritization,
        title = "A Prioritization Model for Suicidality Risk Assessment",
        author = "Shing, Han-Chin and
          Resnik, Philip and
          Oard, Douglas",
        booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
        month = jul,
        year = "2020",
        address = "Online",
        publisher = "Association for Computational Linguistics",
        url = "https://www.aclweb.org/anthology/2020.acl-main.723",
        pages = "8124--8137",
        abstract = "We reframe suicide risk assessment from social media as a ranking problem whose goal is maximizing detection of severely at-risk individuals given the time available. Building on measures developed for resource-bounded document retrieval, we introduce a well founded evaluation paradigm, and demonstrate using an expert-annotated test collection that meaningful improvements over plausible cascade model baselines can be achieved using an approach that jointly ranks individuals and their social media posts.",
    }