Project author: qcl
Project description:
Master's degree research; handles projizz I/O.
Language: Python
Project URL: git://github.com/qcl/master-research.git
qcl's research: Detection of Entity Properties in Content Stream
Preprocessing for KBA: rapidly filter target attributes/properties.
Motivation
Linguistic knowledge -> world knowledge
- World knowledge varies with time
- Acquiring knowledge from heterogeneous resources so that it reflects changes in the real world is very important (KBA)
Problem
Given a target entity to be tracked, find its (new) related information from heterogeneous resources effectively, since large volumes of data are being created.
- Target entity
- Different patterns related to the entity type
- e.g. Obama - person type
- extract all patterns related to person
- e.g. MS - org type
- extract all patterns related to org
- Pattern classification
- Entities (entity types)
- Dynamic vs. static
- Related information
- Exact
- Evaluation metrics
- Speed
- Accuracy, precision, recall
- TREC KBA (read the KBA materials for more information)
- Testing Dataset
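The accuracy, precision and recall measures listed above can be sketched as follows. This is only a minimal illustration, not the evaluation code of this project: the function name `evaluate`, its arguments, and the property names in the usage note are hypothetical, and it assumes each run yields a set of predicted properties that is compared against a gold set drawn from a known candidate set.

```python
def evaluate(predicted, gold, candidates):
    """Compute accuracy, precision and recall for one target entity.

    predicted:  properties the system reported (hypothetical input)
    gold:       properties that are actually correct
    candidates: every property that could have been reported
    """
    predicted, gold, candidates = set(predicted), set(gold), set(candidates)
    tp = len(predicted & gold)               # reported and correct
    fp = len(predicted - gold)               # reported but wrong
    fn = len(gold - predicted)               # correct but missed
    tn = len(candidates - predicted - gold)  # correctly left out
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

For example, with candidates `birthPlace`, `spouse`, `employer`, `almaMater`, gold `birthPlace`, `spouse`, and predictions `spouse`, `employer`, each of the four counts is 1, so all three scores are 0.5. Speed would be measured separately, for instance by timing the filtering step over the testing dataset.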
Issues
Target
Target entities have different types, and each type has its own patterns, so different patterns relate to different types.
Note that
- Type
- Pattern
- Feature
- Information
- Information related to human
- How many features are related to human?
- What kinds of features are related to human?
- What kinds of patterns can be used to find the features?
- Distinguish dynamic from static information
- What kinds of features are dynamic, and what kinds of features are static?
- In other words, what knowledge is unchanged?
- Identify the position of the information
- Determine whether the information is related to the targets of interest
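The points above (patterns depend on entity type, features are either dynamic or static, and the position of the information matters) can be sketched as a small lookup. This is a hedged illustration under stated assumptions, not the project's implementation: the `Pattern` class, the pattern strings, the property names, and the dynamic/static labels are all made up for the example, and matching is plain substring search rather than a real pattern matcher.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    text: str         # surface pattern (hypothetical example string)
    prop: str         # property the pattern signals
    entity_type: str  # entity type the property belongs to
    dynamic: bool     # True if the value may change over time


# Illustrative patterns only; the labels are assumptions, not project data.
PATTERNS = [
    Pattern("was born in", "birthPlace", "person", dynamic=False),
    Pattern("is married to", "spouse", "person", dynamic=True),
    Pattern("is headquartered in", "headquarter", "org", dynamic=True),
    Pattern("was founded by", "foundedBy", "org", dynamic=False),
]


def patterns_for(entity_type, dynamic=None):
    """Select patterns for one entity type, optionally only dynamic or static ones."""
    return [
        p for p in PATTERNS
        if p.entity_type == entity_type
        and (dynamic is None or p.dynamic == dynamic)
    ]


def find_properties(sentence, entity_type):
    """Return (property, character offset) pairs for patterns found in the sentence."""
    lowered = sentence.lower()
    hits = []
    for p in patterns_for(entity_type):
        pos = lowered.find(p.text)
        if pos != -1:
            hits.append((p.prop, pos))
    return hits
```

Restricting the search to the patterns of the target's type keeps the number of comparisons per sentence small, which is the point of using this step as a fast pre-filter. Deciding whether a hit really concerns the target entity (mention disambiguation) is not covered by this sketch.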
Efficiency
Effectiveness
- Patterns
- entity types
- dynamic (new) vs. static
- related information, related to some patterns
- exact information, target entities, mention disambiguation
Others (no categorization decided yet)
- Pattern coverage
- Pattern use
Statistics
References
- MongoDB
- Stanford parser (Python)
Dataset Used
- DBpedia
- Download
- DBpedia Ontology
- Raw Infobox Property
- Raw Infobox Property Definitions
- Persondata
- Wikipedia dump
- Wiki API,1961414 (total; the sum of the per-range counts below)
- 01-05,58115
- 06-10,126706
- 11-15,195114
- 15-18,262784
- 19-20,192237
- 21-22,206117
- 23,129647
- 24,142679
- 25,132130
- 26,125104
- 27,390781
- PATTY
- YAGO