Project author: mannharleen

Project description:
This repo is a Scala project that helps convert to and from Hadoop file formats without having to use a Hadoop cluster, i.e. in local mode.
Primary language: Scala
Project URL: git://github.com/mannharleen/convertHadoopFileFormatsLocally.git
Created: 2018-02-03T09:37:59Z
Project community: https://github.com/mannharleen/convertHadoopFileFormatsLocally

License: Apache License 2.0


convert Hadoop File Formats Locally

This Scala pet project converts between Hadoop file formats and text formats without requiring a Hadoop cluster, i.e. it runs in local mode.

Motivation:

I developed this utility to convert certain text-like files into Hadoop file formats before ingesting them into HDFS.

Building the code:

  • The build.sbt file shipped with the repo can be used to create an assembly jar if need be. I recommend creating an assembly jar wherever possible.
  • To build an assembly jar:
    sbt assembly
  • To build a jar:
    sbt package
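Note that the sbt assembly command only works when the sbt-assembly plugin is available to the build. If you are setting up a similar project yourself, a typical project/plugins.sbt entry looks like the sketch below (the plugin version shown is illustrative, not taken from this repo):

```scala
// project/plugins.sbt
// Registers the sbt-assembly plugin so that `sbt assembly` can build a
// single fat jar containing the project classes and its dependencies.
// The version number is illustrative; use whichever release matches your sbt.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
```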

Using the JAR in your code:

  • Place the jar on the classpath
  • import csvToParquet._
  • To convert CSV to Parquet, use: readCsvWriteParquet.main(Array("d:\\abc.csv", "d:\\abc.parquet", "string,int,double,string"))
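The third argument is a comma-separated list of column types, one per CSV column. A minimal sketch of how such a schema string could be parsed into Spark SQL type names is shown below; the token-to-type mapping is an assumption for illustration, not code taken from the repo:

```scala
// Illustrative sketch: parsing a schema string such as "string,int,double,string"
// into Spark SQL type names. The supported tokens and their mapping are an
// assumption, not the utility's actual implementation.
object SchemaStringParser {
  // Map one schema token to a (hypothetical) Spark SQL type name.
  def toTypeName(token: String): String = token.trim.toLowerCase match {
    case "string" => "StringType"
    case "int"    => "IntegerType"
    case "double" => "DoubleType"
    case "long"   => "LongType"
    case other    => sys.error(s"unsupported type token: $other")
  }

  // Split the comma-separated schema string and map each token.
  def parse(schema: String): Seq[String] =
    schema.split(",").toSeq.map(toTypeName)

  def main(args: Array[String]): Unit =
    println(parse("string,int,double,string"))
}
```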

Using the JAR as an application:

  • java -cp convertHadoopFileFormatsLocally-assembly-0.1.jar csvToParquet.readCsvWriteParquet d:\abc.csv d:\abc.parquet string,int,double,string

Using the JAR in a Talend job: