Knowledge-Based Question Answering
This is a knowledge base QA task using the data from http://tcci.ccf.org.cn/conference/2017/taskdata.php (Task 5: Open Domain Question Answering).
Chinese version introduction: https://blog.csdn.net/m0_37531129/article/details/103321814
Clone or download this 'KB_QA' repository.
Then run 'cd preProcessData'.
Run splitTest.py.
The NLPCC2017 task5 data contains a training set and a test set. We split the test data 1:1 to obtain the test and dev sets.
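The 1:1 split described above can be sketched roughly as follows. This is only an illustration, not the code in splitTest.py: the function name split_half, its parameters, and the idea of holding the examples in a plain list are all assumptions made for the example.

```python
import random


def split_half(records, shuffle=False, seed=0):
    """Split a sequence of examples 1:1 into (test, dev).

    Hypothetical helper for illustration only; splitTest.py may read,
    order, and write the examples differently.
    """
    items = list(records)
    if shuffle:
        # Use a seeded generator so the split is reproducible.
        random.Random(seed).shuffle(items)
    mid = len(items) // 2
    # First half becomes the new test set, the rest becomes the dev set.
    return items[:mid], items[mid:]


if __name__ == "__main__":
    examples = ["q%d" % i for i in range(6)]
    test_set, dev_set = split_half(examples)
    print(len(test_set), len(dev_set))
```

With an odd number of examples the dev set receives the extra one.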
Run preCleanData.py.
There are three functions in this script: getNERData, getDBData and getSimilarityData.
You will get three folders named NERData, DBData and SIMData.
Run uploadDB.py.
(Please create a KB_QA database in MySQL first.)
Running the script uploads the data in the DBData folder to the KB_QA database.
Training the NER model
run NERMain.py --data_dir preProcessData/NERData --vocab_file BertPreTrainedModel/vocab.txt --model_config BertPreTrainedModel/config.json --output_dir output_model --pre_train_model BertPreTrainedModel/pytorch_model.bin --max_seq_length 64 --do_train --train_batch_size 8 --eval_batch_size 8 --gradient_accumulation_steps 16 --num_train_epochs 8
Training the classification model
run SIMMain.py --data_dir preProcessData/SIMData --vocab_file BertPreTrainedModel/vocab.txt --model_config BertPreTrainedModel/config.json --output_dir output_model --pre_train_model BertPreTrainedModel/pytorch_model.bin --max_seq_length 64 --do_train --train_batch_size 8 --eval_batch_size 8 --gradient_accumulation_steps 16 --num_train_epochs 8
python RunTask.py