This repository documents an effort to build an intelligent command-line interface (CLI) for UNIX-based systems.
The database of Linux documentation is obtained by scraping https://linux.die.net/ with the Python library Beautiful Soup (https://www.crummy.com/software/BeautifulSoup/bs4/doc/). The script yields two comma-separated values (CSV) files,
‘cmd-names.csv’ and ‘data.csv’. The first contains the name of each UNIX command and the link to its details; the second contains the complete data for each command.
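The scraping step could look roughly like the sketch below. It is not the repository's actual script: the index URL `https://linux.die.net/man/1/`, the function names, the CSV header (`name`, `link`), and the assumption that every named anchor on the index page is a command link are all illustrative guesses and would need to be checked against the real page layout.

```python
import csv
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Assumed index page listing commands; adjust to the pages actually scraped.
BASE_URL = "https://linux.die.net/man/1/"


def extract_command_links(html, base_url=BASE_URL):
    """Return (name, absolute link) pairs for every anchor that has text."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for anchor in soup.find_all("a", href=True):
        name = anchor.get_text(strip=True)
        if name:  # skip anchors without visible text
            rows.append((name, urljoin(base_url, anchor["href"])))
    return rows


def write_rows(path, header, rows):
    """Write a header line followed by the rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def scrape_index(url=BASE_URL, out_path="cmd-names.csv"):
    """Download the index page and store the command names and links."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    rows = extract_command_links(html, url)
    write_rows(out_path, ["name", "link"], rows)  # hypothetical column names
    return rows
```

Calling `scrape_index()` would write ‘cmd-names.csv’. Building ‘data.csv’ would follow the same pattern, visiting each stored link and extracting the sections of the page.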
This data can then be pre-processed and used to construct the knowledge base.
We use a joint learning model that learns word vector representations from both large text corpora and the knowledge base created in the previous step.
The scraped data goes through extensive pre-processing. The columns are concatenated, all special characters are removed, and the resulting text is stemmed so that the data is uniform across commands. To further reduce the bias caused by some commands having much more text than others, common words of 3 characters or fewer are removed. The result is ready for synonym pair extraction.
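The pre-processing steps described above can be sketched as follows. This is a minimal illustration, not the repository's code: `preprocess` and `crude_stem` are hypothetical names, the suffix-stripping stemmer is a deliberately simple stand-in for a real stemmer such as Porter's, and the sketch drops every token of 3 characters or fewer rather than only the common ones.

```python
import re


def crude_stem(word):
    """Strip a few common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least 3 characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(columns):
    """Concatenate columns, remove special characters, stem, drop short words."""
    text = " ".join(columns).lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # special characters become spaces
    stems = [crude_stem(token) for token in text.split()]
    return [stem for stem in stems if len(stem) > 3]  # drop words of <= 3 chars
```

For example, `preprocess(["ls - list directory contents", "Lists files, sorted!"])` yields the tokens `list`, `directory`, `content`, `list`, `file`, `sort`. The short token `ls` is dropped by the length filter, which shows why that filter should be applied to the descriptive text and not to the command names themselves.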
The pre-processed data is then fed as input to kb.py, which outputs the knowledge base.