Parallel Corpus Crawlers for Machine Translation
This repo contains crawlers for collecting parallel corpora. Unauthorized crawling may cause problems, so before crawling, check whether the website's policy allows crawlers. You also need to make sure that you have the right to use the crawled results. This repo takes no responsibility for the results of any crawling. In other words, the user bears full responsibility for all uses.
Moreover, the source code in this repo is written in a naive way, so optimal operation is not guaranteed.
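One way to check a site's crawler policy is to read its robots.txt. The sketch below is not part of this repo. It uses Python's standard `urllib.robotparser`, and the helper names (`robots_url_for`, `allowed_by_rules`, `allowed_to_crawl`), the example rules, and the `example.com` URLs are all hypothetical. Note that robots.txt is only one part of a site's policy; the terms of use may add further restrictions.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def allowed_by_rules(robots_txt: str, user_agent: str, page_url: str) -> bool:
    """Check page_url against robots.txt content that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


def allowed_to_crawl(page_url: str, user_agent: str = "*") -> bool:
    """Download the site's robots.txt (network access) and check page_url."""
    parser = RobotFileParser()
    parser.set_url(robots_url_for(page_url))
    parser.read()
    return parser.can_fetch(user_agent, page_url)


if __name__ == "__main__":
    # Hypothetical rules, used here so the example runs without network access.
    rules = "User-agent: *\nDisallow: /private/\n"
    print(allowed_by_rules(rules, "my-crawler", "https://example.com/news/1"))
    print(allowed_by_rules(rules, "my-crawler", "https://example.com/private/x"))
```

Calling `allowed_to_crawl` with the URL of a page you intend to crawl performs the same check against the live robots.txt of that site.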
$ python joongang_daily.py [output_fn]
$ python chosun.py