Project author: hieusydo

Project description:
Create an inverted index structure
Primary language: C++
Repository: git://github.com/hieusydo/IndexBuilder.git
Created: 2018-10-11T03:10:59Z
Project page: https://github.com/hieusydo/IndexBuilder

License: MIT License

IndexBuilder

CS 6913 (Web Search Engine) Assignment - NYU Tandon School of Engineering

Goal

Create an inverted index structure from CommonCrawl data
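An inverted index maps each term to a postings list: the documents that contain the term, kept in docID order. The sketch below is a minimal in-memory illustration, not this repository's code. It assumes a posting is a (docID, frequency) pair, that documents arrive in increasing docID order, and that tokenizing on whitespace is enough. The names `Posting`, `InvertedIndex`, and `addDocument` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical posting layout: a document ID and the term's frequency in it.
struct Posting {
    uint32_t docId;
    uint32_t freq;
};

// Term -> postings list. std::map keeps terms sorted; each list stays sorted
// by docId because documents are assumed to be added in increasing docId order.
using InvertedIndex = std::map<std::string, std::vector<Posting>>;

// Splits text on whitespace only (a simplification: real Common Crawl pages
// need HTML stripping and normalization) and appends one posting per term.
void addDocument(InvertedIndex& index, uint32_t docId, const std::string& text) {
    std::map<std::string, uint32_t> counts;
    std::istringstream in(text);
    std::string term;
    while (in >> term) {
        ++counts[term];
    }
    for (const auto& kv : counts) {
        index[kv.first].push_back({docId, kv.second});
    }
}
```

An in-memory map like this only works while the index fits in RAM. For a full crawl, partial results have to be written to disk and merged later.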

What It Does

  • Uses merge-sort indexing: sorted intermediate postings are merged into the final index
  • Compresses the final index with variable-byte encoding and chunk-wise compression
  • Achieves an end-to-end indexing rate of 430 documents per second
  • Stores the original documents in a database (SQLite3 or Redis)
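Variable-byte encoding, from the list above, stores each integer in as few bytes as possible. It works best on docID gaps (differences between consecutive sorted docIDs), which are usually small. The following is a minimal sketch of one common variant and is not taken from this repository. Three details are assumptions: 7 data bits per byte, the least-significant group first, and a high bit set on every byte except the last. It does not show the chunk-wise layout. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends n using 7 data bits per byte, least-significant group first.
// The high bit is 1 when more bytes follow and 0 on the final byte.
void vbEncode(uint32_t n, std::vector<uint8_t>& out) {
    while (n >= 128) {
        out.push_back(static_cast<uint8_t>((n & 0x7F) | 0x80));
        n >>= 7;
    }
    out.push_back(static_cast<uint8_t>(n));
}

// Decodes one number starting at pos and advances pos past its last byte.
uint32_t vbDecode(const std::vector<uint8_t>& in, std::size_t& pos) {
    uint32_t n = 0;
    int shift = 0;
    while (true) {
        uint8_t b = in[pos++];
        n |= static_cast<uint32_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) break;  // high bit clear: last byte of this number
        shift += 7;
    }
    return n;
}

// Compresses a sorted docID list by encoding the gap to the previous ID.
std::vector<uint8_t> compressDocIds(const std::vector<uint32_t>& docIds) {
    std::vector<uint8_t> out;
    uint32_t prev = 0;
    for (uint32_t id : docIds) {
        vbEncode(id - prev, out);
        prev = id;
    }
    return out;
}

// Reverses compressDocIds by decoding gaps and summing them back into IDs.
std::vector<uint32_t> decompressDocIds(const std::vector<uint8_t>& bytes) {
    std::vector<uint32_t> ids;
    std::size_t pos = 0;
    uint32_t prev = 0;
    while (pos < bytes.size()) {
        prev += vbDecode(bytes, pos);
        ids.push_back(prev);
    }
    return ids;
}
```

For example, the docIDs 3, 7, 200, 100000 become the gaps 3, 4, 193, 99800. These take 1 + 1 + 2 + 3 = 7 bytes, compared with 16 bytes as raw 32-bit integers.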
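The merge-sort indexing in the list above usually works in two phases. First, the parser emits (term, docID) postings into runs that are each sorted. Second, a k-way merge with a min-heap combines the runs into one globally sorted stream. The sketch below shows only the merge phase and assumes in-memory runs. A real indexer would stream the runs from temporary files on disk, and this sketch does not claim to match the repository's on-disk format. `Entry` and `mergeRuns` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical intermediate posting emitted by the parsing phase.
struct Entry {
    std::string term;
    uint32_t docId;
    bool operator<(const Entry& o) const {
        return std::tie(term, docId) < std::tie(o.term, o.docId);
    }
};

// k-way merge of runs that are each already sorted by (term, docId).
std::vector<Entry> mergeRuns(const std::vector<std::vector<Entry>>& runs) {
    // Heap item: (entry, run index, position within that run).
    using Item = std::tuple<Entry, std::size_t, std::size_t>;
    // Inverted comparison so std::priority_queue behaves as a min-heap.
    auto greater = [](const Item& a, const Item& b) {
        return std::get<0>(b) < std::get<0>(a);
    };
    std::priority_queue<Item, std::vector<Item>, decltype(greater)> heap(greater);
    for (std::size_t r = 0; r < runs.size(); ++r) {
        if (!runs[r].empty()) heap.emplace(runs[r][0], r, std::size_t{0});
    }
    std::vector<Entry> merged;
    while (!heap.empty()) {
        Item top = heap.top();  // copy before pop
        heap.pop();
        std::size_t r = std::get<1>(top);
        std::size_t i = std::get<2>(top);
        merged.push_back(std::get<0>(top));
        // Refill the heap from the run that just supplied the smallest entry.
        if (i + 1 < runs[r].size()) heap.emplace(runs[r][i + 1], r, i + 1);
    }
    return merged;
}
```

In the merged output, consecutive entries with the same term form that term's complete postings list, already in docID order. Each list can then be gap-encoded and compressed as it is written out.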