Author: vincentlux

Description:
Pilot study on email retrieval system
Language: Python
Repository: git://github.com/vincentlux/Cymantix.git
Created: 2018-10-15T18:10:18Z
Project page: https://github.com/vincentlux/Cymantix

Cymantix

Pilot study on email retrieval system

Run

  • git clone https://github.com/vincentlux/Cymantix.git
  • pip install -r requirements.txt
  • python cymantix_grammar.py --command 'command here'

Part 1: Grammar

EBNF (with regexes):

  1. all ::= space op space sc
  2. sc ::= sc_EMAIL_from | sc_EMAIL_attach | sc_EMAIL_piece | space
  3. sc_EMAIL_piece ::= "EMAIL last" space op_LAST_piece
  4. sc_EMAIL_from ::= "EMAIL from" space op_lit_name
  5. sc_EMAIL_attach ::= "EMAIL" space sc_attach
  6. sc_attach ::= "MSWORD" | "PDF" | "GIF"
  7. op ::= op_trig space op_first space
  8. op_first ::= op_TOTAL | op_LAST | op_lit_ON | op_lit_name
  9. op_lit_ON ::= op_lit_name* op_ON space op_lit_topic space sc_attach* space op_LAST*
  10. op_lit_topic ::= "'" chars "'"
  11. op_lit_name ::= ("'" (chars space)+ "'" space)+
  12. op_ON ::= "ON"
  13. op_LAST ::= "LAST" space (op_LAST_time | op_LAST_piece)
  14. op_LAST_piece ::= ~"[0-9]*"
  15. op_LAST_time ::= ~"[0-9]*" space ~"[a-z]+"
  16. op_TOTAL ::= "TOTAL"
  17. op_trig ::= "?"
  18. space ::= " "*
  19. chars ::= ~"[A-z0-9]*"
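
The rules above can be approximated with one regular expression for quick experimentation. This is only a sketch, not the project's parser (the Done list below mentions parsimonious.NodeVisitor). It covers just the op_lit_ON form, restricts chars to letters and digits, and the names parse_command and COMMAND_RE, the "time" key, and the default of 1 for a bare LAST are assumptions made for illustration.

```python
import re

# Simplified stand-in for the op_lit_ON rule (assumption: one attachment at most).
COMMAND_RE = re.compile(
    r"""
    ^\s*\?\s*                               # op_trig
    (?P<names>(?:'[A-Za-z0-9 ]*'\s*)*)      # zero or more quoted sender names
    ON\s*'(?P<topic>[A-Za-z0-9]*)'\s*       # op_ON followed by a quoted topic
    (?P<attachment>MSWORD|PDF|GIF)?\s*      # optional attachment type
    (?:LAST\s*(?P<count>[0-9]*)\s*(?P<unit>[a-z]+)?)?   # optional LAST clause
    \s*$
    """,
    re.VERBOSE,
)


def parse_command(command):
    """Parse a command of the form ?'name' ON 'topic' [TYPE] [LAST [n [unit]]]."""
    match = COMMAND_RE.match(command)
    if match is None:
        raise ValueError(f"unsupported command: {command!r}")

    parsed = {"topic": match.group("topic")}

    # Pull each quoted name out of the matched name list.
    names = [n.strip() for n in re.findall(r"'([^']*)'", match.group("names"))]
    if names:
        parsed["from"] = names

    if match.group("attachment"):
        parsed["attachment"] = match.group("attachment")

    count, unit = match.group("count"), match.group("unit")
    if count is not None:  # a LAST clause was present
        number = int(count) if count else 1  # assumption: bare LAST means 1
        if unit:
            parsed["time"] = (number, unit)  # e.g. LAST 2 weeks
        else:
            parsed["piece"] = number         # e.g. LAST 3
    return parsed
```

Calling parse_command("?'Mike' ON 'picnic' MSWORD LAST 3") returns a dictionary with the sender list, the topic, the attachment type, and piece set to 3. Inputs outside this subset, such as ?TOTAL, raise ValueError.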

Done:

  • Literature review and base language selection
  • Finish ?TOTAL, "name" (1025)
  • Test example of NodeVisitor (1026)
  • Finish SimpleEvaluator(parsimonious.NodeVisitor) to test (1026, 1029)
  • Finish ?TOTAL, ON "topic", "LAST", "name" (1029)
  • Finish main Cymantix grammar

Todo:

  • Debug:
    • Add double quote compatibility
    • Date issue
    • Instance vs Class issue
  • Convert email to xml
  • Web-based retrieval system

Design notes

  • Desired result

    • Save all emails in MongoDB
    • Input:

      • JSON files, each representing one email as key-value pairs
      • Cymantix Command

        ?"Mike" ON "picnic" MSWORD LAST

    • Output:

      • Parsed Cymantix command

        {'from': "'Mike' ", 'topic': "'Soccer'", 'attachment': 'MSWORD', 'piece': '1'}

      • Use parsed Cymantix command to retrieve corresponding emails
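
To make the last step concrete, here is a minimal in-memory sketch of retrieval driven by a parsed command shaped like the output example above. It does not use MongoDB. The email keys ("from", "subject", "date", "attachments"), the suffix table, and the function names are assumptions for illustration, and dates are assumed to be ISO 8601 strings so that they sort chronologically.

```python
# Hypothetical mapping from Cymantix attachment types to file suffixes.
ATTACHMENT_SUFFIXES = {
    "MSWORD": (".doc", ".docx"),
    "PDF": (".pdf",),
    "GIF": (".gif",),
}


def unquote(value):
    """Strip surrounding whitespace and quotes, e.g. "'Mike' " -> "Mike"."""
    return value.strip().strip("'\"")


def retrieve(emails, parsed):
    """Return the emails matching a parsed command, newest first."""
    sender = unquote(parsed.get("from", "")).lower()
    topic = unquote(parsed.get("topic", "")).lower()
    suffixes = ATTACHMENT_SUFFIXES.get(parsed.get("attachment"), ())

    hits = []
    for email in emails:
        if sender and sender not in email["from"].lower():
            continue
        if topic and topic not in email["subject"].lower():
            continue
        attachments = email.get("attachments", [])
        if suffixes and not any(a.lower().endswith(suffixes) for a in attachments):
            continue
        hits.append(email)

    # ISO 8601 timestamps in the same timezone sort correctly as strings.
    hits.sort(key=lambda e: e["date"], reverse=True)

    piece = parsed.get("piece")
    return hits[: int(piece)] if piece else hits
```

With 'piece': '1', only the most recent matching email is returned. Omitting the key returns every match.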

Part 2: Web app

  • Technologies:
    • Vue.js; Python Flask; Solr

Todo:

  • Add voice (0131-0201)
  • Move miksa3 Solr indexed items to own server (0131)
  • Optimize webpage for refs and hints
  • simple.unc.edu

Done:

  • Node.js workflow (1105-1111)
  • Solr installation on VM (1112)
  • Finish converting all emails to XML (1114-1118)
  • Escape XML fields in mail2xml.py to avoid a Solr XML bug (0111)
  • change-schema.sh (0111)
  • Add personalized schema (0115)
  • Escaped all files on the server (0115)
  • Time conversion in mail2xml.py (0116)
  • Successfully query the latest date
  • Modify Cymantix output grammar
  • Research Solr search grammar
  • Extract attachment information (or just change the search query) (0116-0117)
  • cymantix_solr function to convert Cymantix grammar to Solr grammar
  • JCDL submitted
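
As an illustration of the Cymantix-to-Solr conversion mentioned in the list above, the sketch below turns a parsed command into Solr request parameters (q, sort, rows). It is not the project's cymantix_solr function. The Solr field names (from, subject, attachment_type, date) and the function names are assumptions, and the input is assumed to look like the parsed-command example in the design notes.

```python
from urllib.parse import urlencode


def unquote(value):
    """Strip surrounding whitespace and quotes, e.g. "'Mike' " -> "Mike"."""
    return value.strip().strip("'\"")


def solr_phrase(text):
    """Wrap text as a Solr phrase, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def cymantix_to_solr_params(parsed):
    """Build Solr query parameters from a parsed Cymantix command."""
    clauses = []
    if parsed.get("from"):
        clauses.append("from:" + solr_phrase(unquote(parsed["from"])))
    if parsed.get("topic"):
        clauses.append("subject:" + solr_phrase(unquote(parsed["topic"])))
    if parsed.get("attachment"):
        clauses.append("attachment_type:" + parsed["attachment"])

    # *:* matches every document when no filter was given.
    params = {"q": " AND ".join(clauses) or "*:*"}

    if parsed.get("piece"):
        # LAST n: newest first, limited to n results.
        params["sort"] = "date desc"
        params["rows"] = int(parsed["piece"])
    return params


def build_query_string(parsed):
    """URL-encode the parameters for a request to a Solr select handler."""
    return urlencode(cymantix_to_solr_params(parsed))
```

The resulting string can be appended to the select URL of whichever core holds the indexed emails. Keeping the parameter dictionary separate from the encoding step makes the conversion easy to test without a running Solr instance.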

Schema API

Difference between StrField and TextField

Working with dates
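
The escaping and time-conversion items in the Done list can be sketched as follows, using only the Python standard library. This is not the contents of mail2xml.py. The function names and field names are assumptions, and the sketch relies only on two facts: Solr date fields take UTC timestamps of the form YYYY-MM-DDThh:mm:ssZ, and Solr's XML update format wraps fields in add and doc elements.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime
from xml.sax.saxutils import escape, quoteattr


def to_solr_date(header_value):
    """Convert an email Date header to Solr's UTC timestamp format."""
    parsed = parsedate_to_datetime(header_value)
    if parsed.tzinfo is None:
        # Headers without a usable offset are treated as UTC (assumption).
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def to_solr_doc(fields):
    """Render a dict of field values as a Solr XML update document."""
    lines = ["<add>", "  <doc>"]
    for name, value in fields.items():
        # escape() handles &, < and >, which would otherwise break the XML.
        lines.append(f"    <field name={quoteattr(name)}>{escape(value)}</field>")
    lines += ["  </doc>", "</add>"]
    return "\n".join(lines)
```

For example, to_solr_date("Mon, 15 Oct 2018 14:10:18 -0400") gives "2018-10-15T18:10:18Z", and a subject such as "Q&A <draft>" is written as "Q&amp;A &lt;draft&gt;" inside its field element.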