Project author: returnString

Project description:
Native R interface to the MongoDB aggregation pipeline
Primary language: R
Repository: git://github.com/returnString/mongoplyr.git
Created: 2017-08-04T23:14:24Z
Project page: https://github.com/returnString/mongoplyr

License: MIT License

mongoplyr

A native R interface to the MongoDB aggregation pipeline.

Inspired by the pipeline-esque syntax enjoyed when doing in-memory work with a combination of dplyr and magrittr, mongoplyr allows you to construct aggregation pipeline queries for MongoDB without the ugliness and insecurity of pasting strings together.

It uses jeroen/mongolite as a client library to actually communicate with MongoDB servers.
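
To make the "insecurity of pasting strings together" concrete, here is a minimal sketch that does not use mongoplyr at all. It compares a query assembled with paste0 against one built as an R list and serialised with jsonlite. The `borough` value is a hypothetical piece of user input, and using jsonlite here is an assumption made for illustration, not a description of how mongoplyr works internally.

```r
library(jsonlite)

# Hypothetical user input that contains a double quote and extra JSON.
borough <- 'Manhattan", "cuisine": "Pizza'

# Pasting the value into a template lets it change the query's structure:
# the result is valid JSON with an extra "cuisine" key the caller never wrote.
pasted <- paste0('{"borough": "', borough, '"}')

# Building the query as an R list and serialising it escapes the quotes,
# so the whole input stays a single string value.
built <- toJSON(list(borough = borough), auto_unbox = TRUE)

names(fromJSON(pasted))  # two keys: "borough" and "cuisine"
names(fromJSON(built))   # one key: "borough"
```

The same reasoning applies to any query language assembled from strings: keeping the query as structured data until the last moment means values cannot be mistaken for syntax.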

If you have a feature request, e.g. to prioritise an unimplemented portion of the aggregation pipeline, or a bug, please either file an issue on GitHub or poke me on Twitter.

Example usage

    library(mongoplyr)
    library(mongolite)  # provides mongo()

    # using the MongoDB NY restaurant dataset
    conn <- mongo(db = "mydb", collection = "restaurants")

    # top five cuisines among Manhattan restaurants
    MongoPipeline() %>%
      mmatch(.borough == "Manhattan") %>%
      mgroup(by = .cuisine, count = .sum(1)) %>%
      msort(.count = -1) %>%
      mlimit(5) %>%
      mexecute(conn) -> topFiveCuisinesInManhattan

Should result in a data frame like so:

                   id count
    1        American  3205
    2 Café/Coffee/Tea   680
    3         Italian   621
    4         Chinese   510
    5        Japanese   438
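
For comparison, the sketch below writes out by hand the kind of aggregation pipeline that the example above describes, using the standard MongoDB stages `$match`, `$group`, `$sort` and `$limit`. Whether mongoplyr emits exactly this JSON is an assumption; the block only illustrates what the equivalent raw query could look like. It builds the JSON with jsonlite and leaves the mongolite call commented out, since that step needs a running server.

```r
library(jsonlite)

# Each list element is one pipeline stage; backticks allow names such as
# `$match` and `_id` that are not syntactic R identifiers.
pipeline <- toJSON(list(
  list(`$match` = list(borough = "Manhattan")),
  list(`$group` = list(`_id` = "$cuisine", count = list(`$sum` = 1))),
  list(`$sort`  = list(count = -1)),
  list(`$limit` = 5)
), auto_unbox = TRUE)

cat(pipeline, "\n")

# With a live connection, mongolite can run the JSON directly (not run here):
# conn <- mongolite::mongo(db = "mydb", collection = "restaurants")
# topFive <- conn$aggregate(as.character(pipeline))
```

Running a hand-written pipeline like this alongside the mongoplyr version is also one way to cross-check results.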

Installation

The package is not yet available on CRAN. However, it can be installed directly from source with the devtools package:

    devtools::install_github("returnString/mongoplyr")

Currently supported and tested R versions:

  • 3.4.x
  • 3.3.x

This package is also built and tested against the latest development build of R.

With regard to platforms, all changes are officially tested on Linux with Travis CI, and most development currently occurs on Windows. I aim to add automated testing on Windows and OS X to expand this coverage.

Production usage

This package is currently in use at Deep Silver Dambuster Studios, primarily inside of Shiny dashboards for internal cluster telemetry, where we extract and prepare data for further analysis and presentation. This has caused a 99% reduction in our usage of paste0 ;)

That said, due to the relative immaturity of this codebase, please exercise caution and verify your results independently of this package before committing to presenting data.