Project author: kh-kim

Project description:
Parallel Corpus Crawlers for Machine Translation
Language: Python
Repository: git://github.com/kh-kim/parallel_corpus_crawler.git
Created: 2018-03-10T13:39:51Z
Project page: https://github.com/kh-kim/parallel_corpus_crawler

License:

Parallel Corpus Crawlers for Machine Translation

This repo contains crawlers for collecting parallel corpora. However, unauthorized crawling may cause problems. Before crawling, users should check the website's policy on crawlers. Users must also make sure that they have the rights to the crawled results. This repo takes no responsibility for the results of any crawling. In other words, the user bears full responsibility for all uses.
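One way to check a site's crawler policy is to read its robots.txt. The following is a minimal sketch using Python's standard `urllib.robotparser`; it is not part of this repo's crawlers. The helper name `is_crawl_allowed`, the sample rules, and the `example.com` URLs are assumptions made for illustration. Note that robots.txt covers only part of a site's policy, so the terms of use should still be read separately.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits user_agent to fetch url.

    Hypothetical helper for illustration; not part of this repo.
    """
    parser = RobotFileParser()
    # parse() takes the robots.txt content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration; a real site's robots.txt will differ.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    print(is_crawl_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/news/1.html"))
    print(is_crawl_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/private/x.html"))
```

For a live site, `RobotFileParser` can also download the file itself: call `set_url()` with the site's robots.txt address, then `read()`, before calling `can_fetch()`.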

Moreover, the source code in this repo is written in a naive way, so optimal operation is not guaranteed.

Usage

  1. $ python joongang_daily.py [output_fn]
  2. $ python chosun.py