Project author: yutkin

Project description:
Corpus of Russian news articles collected from Lenta.Ru
Language: Python
Repository: git://github.com/yutkin/Lenta.Ru-News-Dataset.git
Created: 2017-04-04T06:56:45Z
Project community: https://github.com/yutkin/Lenta.Ru-News-Dataset

License:


Corpus of news articles of Lenta.Ru

  • Size: 337 MB (2 GB uncompressed)
  • News articles: 800K+
  • Dates: 30/08/1999 - 14/12/2019
  • Script for downloading the news (requires Python 3.7+).

Download

Decompression

bzip2 -d lenta-ru-news.csv.bz2
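
As an alternative to decompressing on disk, the archive can probably be read directly from Python. The following is a minimal sketch, not part of the original project: the helper name `read_rows` is hypothetical, the file is assumed to be UTF-8 encoded with a header row, and no column names are assumed because the README does not list them.

```python
import bz2
import csv
from itertools import islice

# Article bodies may be long, so raise the csv module's per-field size limit
# (an assumption about the data, not something the README states).
csv.field_size_limit(10**9)


def read_rows(path, limit=5):
    """Return the header and the first `limit` rows of a bz2-compressed CSV."""
    # bz2.open in text mode decompresses on the fly, so the 2 GB
    # uncompressed file never has to be written to disk.
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)               # first line is assumed to be the header
        rows = list(islice(reader, limit))  # read only the first few records
    return header, rows


# Example usage (assumes the archive is in the current directory):
# header, rows = read_rows("lenta-ru-news.csv.bz2")
# print(header)
```

Reading through `islice` keeps memory use small, which is convenient for a quick look at the columns before loading the whole corpus.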