A clinical BERT-based NLP tool for parsing clinical trial abstracts following the PICO framework
It parses PubMed abstracts of randomized controlled trials (RCTs) following the PICO framework to standardize PICO elements.
More modules for comprehensively representing medical evidence from RCT abstracts are coming soon.
Adapted from NCBI-NLP BlueBERT.
Install the dependencies listed in requirements.txt.
Edit parser_config.py to customize your own directories and BERT configuration.
In the input directory, each abstract is stored in its own text file, named with its PMID. Example data is provided in the test folder.
Run the following to start parsing, giving your input directory in --data_dir and your output directory in --output_dir:
python run_bluebert_ner_predict.py --data_dir= --output_dir=
To run examples:
python run_bluebert_ner_predict.py --data_dir=test/txt --output_dir=test/json
Input: test/txt
Parsing results: test/json
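The input layout described above (one text file per abstract, named with its PMID) can be prepared with a few lines of Python. This is a minimal sketch, not part of the tool: the helper name write_abstracts and the .txt extension are assumptions, so check the files in the test folder for the naming the parser actually expects.

```python
from pathlib import Path


def write_abstracts(abstracts, data_dir):
    """Write each abstract to <data_dir>/<pmid>.txt and return the paths.

    `abstracts` maps a PMID string to the abstract text.
    The ".txt" extension is an assumption made for this sketch.
    """
    out = Path(data_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for pmid, text in abstracts.items():
        path = out / f"{pmid}.txt"
        # One abstract per file, ending with a single newline.
        path.write_text(text.strip() + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

The resulting directory can then be passed to the parser through --data_dir.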
Install the dependencies listed in requirements.txt.
Edit parser_config.py to customize your own directories and installation.
Run python Phase1_NER_predict.py to start parsing.
JSON
Input: example.txt, which contains more than 70 abstracts with methods sections.
Parsing results: folder example_json_out
{
"pmid": "11264545",
"sentences": {
"sent_1": {
"Section": "METHODS",
"text": "METHODS AND RESULTS : To determine the relative power of radiographic heart measurements for predicting outcome in dilated cardiomyopathy , we retrospectively studied 88 adult patients with chest radiographs obtained within 35 days of echocardiography .",
"entities": {
"entity_1": {
"text": "radiographic heart measurements",
"class": "Outcome",
"negation": 0,
"UMLS": "C0018787:heart,C1306645:radiograph,",
"index": 1,
"start": 10
},
"entity_2": {
"text": "predicting outcome",
"class": "Outcome",
"negation": 0,
"UMLS": "",
"index": 2,
"start": 14
},
"entity_3": {
"text": "dilated cardiomyopathy",
"class": "Participant",
"negation": 0,
"UMLS": "C0007193:dilated cardiomyopathy,",
"index": 3,
"start": 17
},
"entity_4": {
"text": "chest radiographs",
"class": "Participant",
"negation": 0,
"UMLS": "C1306645:radiographs,C0817096:chest,",
"index": 4,
"start": 27
},
"entity_5": {
"text": "echocardiography",
"class": "Participant",
"negation": 0,
"UMLS": "C0013516:echocardiography,",
"index": 5,
"start": 34
}
},
"relations": {}
},
"sent_2": {
"Section": "METHODS",
"text": "Standard radiographic variables were measured for each patient , and the cardiothoracic ( CT ) ratio , frontal cardiac area , and volume were calculated .",
"entities": {
"entity_6": {
"text": "Standard radiographic variables",
"class": "Outcome",
"negation": 0,
"UMLS": "C0038137:Standard,C1306645:radiograph,",
"index": 1,
"start": 0
},
"entity_7": {
"text": "cardiothoracic ( CT ) ratio",
"class": "Outcome",
"negation": 0,
"UMLS": "",
"index": 2,
"start": 11
},
"entity_8": {
"text": "frontal cardiac area",
"class": "Outcome",
"negation": 0,
"UMLS": "C0018787:cardiac,",
"index": 3,
"start": 17
},
"entity_9": {
"text": "volume",
"class": "Outcome",
"negation": 0,
"UMLS": "",
"index": 4,
"start": 22
}
},
"relations": {}
}
}
}
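The JSON output above can be consumed with the standard library alone. The sketch below groups entity texts by PICO class, splits the UMLS field into (CUI, term) pairs, and checks an entity's position against its sentence. The function names are hypothetical. Reading "start" as a whitespace-token index is an inference from the example rather than documented behavior. SAMPLE is a trimmed copy of the example containing only two entities.

```python
import json
from collections import defaultdict

# Trimmed copy of the example output above (two entities only).
SAMPLE = json.loads("""
{
  "pmid": "11264545",
  "sentences": {
    "sent_1": {
      "Section": "METHODS",
      "text": "METHODS AND RESULTS : To determine the relative power of radiographic heart measurements for predicting outcome in dilated cardiomyopathy , we retrospectively studied 88 adult patients with chest radiographs obtained within 35 days of echocardiography .",
      "entities": {
        "entity_1": {"text": "radiographic heart measurements", "class": "Outcome",
                     "negation": 0, "UMLS": "C0018787:heart,C1306645:radiograph,",
                     "index": 1, "start": 10},
        "entity_3": {"text": "dilated cardiomyopathy", "class": "Participant",
                     "negation": 0, "UMLS": "C0007193:dilated cardiomyopathy,",
                     "index": 3, "start": 17}
      },
      "relations": {}
    }
  }
}
""")


def parse_umls(field):
    """Split 'CUI:term,CUI:term,' into a list of (CUI, term) pairs."""
    pairs = []
    for item in field.split(","):
        if item:  # skip the empty piece left by the trailing comma
            cui, _, term = item.partition(":")
            pairs.append((cui, term))
    return pairs


def entities_by_class(doc):
    """Group entity texts by their 'class' value across all sentences."""
    grouped = defaultdict(list)
    for sent in doc["sentences"].values():
        for ent in sent["entities"].values():
            grouped[ent["class"]].append(ent["text"])
    return dict(grouped)


def span_matches(sentence_text, entity):
    """Check that 'start' points at the entity, assuming it is a token index."""
    tokens = sentence_text.split()
    ent_tokens = entity["text"].split()
    start = entity["start"]
    return tokens[start:start + len(ent_tokens)] == ent_tokens
```

For a file written by the parser, json.load on the opened file yields the same structure as SAMPLE.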
XML
Input: test.txt
Parsing results: temp.xml
A double-blind crossover comparison of pindolol , metoprolol , atenolol and labetalol in mild to moderate hypertension . 1 This study was designed to compare in a double-blind randomized crossover trial , atenolol , labetalol , metoprolol and pindolol . Considerable differences in dose ( atenolol 138 +/- 13 mg daily ; labetalol 308 +/- 34 mg daily ; metoprolol 234 +/- 22 mg daily ; and pindolol 24 +/-2 mg daily were required to produce similar antihypertensive effects .
<abstract>
<sent>
<text>A double-blind crossover comparison of pindolol , metoprolol , atenolol and labetalol in mild to moderate hypertension .</text>
<entity class='Intervention' UMLS='C0031937:pindolol' index='T1' start='5'> pindolol </entity>
<entity class='Intervention' UMLS='C0025859:metoprolol' index='T2' start='7'> metoprolol </entity>
<entity class='Intervention' UMLS='C0004147:atenolol' index='T3' start='9'> atenolol </entity>
<entity class='Intervention' UMLS='C0022860:labetalol' index='T4' start='11'> labetalol </entity>
<entity class='Participant' UMLS='C0020538:hypertension' index='T5' start='13'> mild to moderate hypertension </entity>
</sent>
<sent>
<text>1 This study was designed to compare in a double-blind randomized crossover trial , atenolol , labetalol , metoprolol and pindolol .</text>
<entity class='Intervention' UMLS='C0004147:atenolol' index='T6' start='14'> atenolol </entity>
<entity class='Intervention' UMLS='C0022860:labetalol' index='T7' start='16'> labetalol </entity>
<entity class='Intervention' UMLS='C0025859:metoprolol' index='T8' start='18'> metoprolol </entity>
<entity class='Intervention' UMLS='C0031937:pindolol' index='T9' start='20'> pindolol </entity>
</sent>
<sent>
<text>Considerable differences in dose ( atenolol 138 +/- 13 mg daily ; labetalol 308 +/- 34 mg daily ; metoprolol 234 +/- 22 mg daily ; and pindolol 24 +/-2 mg daily were required to produce similar antihypertensive effects .</text>
<attribute class='modifier' index='T10' start='1'> differences </attribute>
<entity class='Intervention' UMLS='C0004147:atenolol' index='T11' start='5'> atenolol </entity>
<attribute class='measure' index='T12' start='6'> 138 +/- 13 mg daily </attribute>
<entity class='Intervention' UMLS='C0022860:labetalol' index='T13' start='12'> labetalol </entity>
<attribute class='measure' index='T14' start='13'> 308 +/- 34 mg daily </attribute>
<entity class='Intervention' UMLS='C0025859:metoprolol' index='T15' start='19'> metoprolol </entity>
<attribute class='measure' index='T16' start='20'> 234 +/- 22 mg daily </attribute>
<entity class='Intervention' UMLS='C0031937:pindolol' index='T17' start='27'> pindolol </entity>
<attribute class='measure' index='T18' start='28'> 24 +/-2 mg daily </attribute>
<entity class='Outcome' UMLS='C0003364:antihypertensive' index='T19' start='37'> antihypertensive effects </entity>
</sent>
</abstract>
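The XML output above can be read with Python's xml.etree.ElementTree. The following sketch flattens every entity and attribute element into a list of dictionaries. The function name read_annotations is hypothetical, and SAMPLE_XML is a shortened copy of the example that keeps only a few annotations per sentence.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example output above.
SAMPLE_XML = """<abstract>
<sent>
<text>A double-blind crossover comparison of pindolol , metoprolol , atenolol and labetalol in mild to moderate hypertension .</text>
<entity class='Intervention' UMLS='C0031937:pindolol' index='T1' start='5'> pindolol </entity>
<entity class='Participant' UMLS='C0020538:hypertension' index='T5' start='13'> mild to moderate hypertension </entity>
</sent>
<sent>
<text>Considerable differences in dose ( atenolol 138 +/- 13 mg daily ; labetalol 308 +/- 34 mg daily ; metoprolol 234 +/- 22 mg daily ; and pindolol 24 +/-2 mg daily were required to produce similar antihypertensive effects .</text>
<entity class='Intervention' UMLS='C0004147:atenolol' index='T11' start='5'> atenolol </entity>
<attribute class='measure' index='T12' start='6'> 138 +/- 13 mg daily </attribute>
</sent>
</abstract>"""


def read_annotations(xml_string):
    """Return one dict per <entity> or <attribute> element, in document order."""
    root = ET.fromstring(xml_string)
    rows = []
    for sent in root.iter("sent"):
        for node in sent:
            if node.tag in ("entity", "attribute"):
                rows.append({
                    "kind": node.tag,
                    "class": node.get("class"),
                    "index": node.get("index"),
                    "start": int(node.get("start")),
                    # Element text is padded with spaces in the output.
                    "text": (node.text or "").strip(),
                    # <attribute> elements carry no UMLS value.
                    "UMLS": node.get("UMLS", ""),
                })
    return rows
```

To process a file such as temp.xml instead of a string, ET.parse(path).getroot() returns the same root element.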