Project author: singhrahuldps

Project description:
A Python script for one-hot encoding columns in pandas DataFrames
Language: Python
Repository: git://github.com/singhrahuldps/OneHotEncode.git
Created: 2018-04-18T19:31:43Z
Project page: https://github.com/singhrahuldps/OneHotEncode

License: MIT License


One-Hot Encode

A Python script for one-hot encoding columns in pandas DataFrames

Requirements -> pandas, NumPy

Installation

  1. pip install OneHotEncode

Usage

  1. from OneHotEncode.OneHotEncode import *
  2. df,dropped_cols,all_new_cols,new_col_dict = OneHotEncode(df,Categorical_column_list,check_numerical=False,max_var=20)

Input -> (pandas_dataframe,cols,check_numerical=False,max_var=20)

  1. pandas_dataframe -> The pandas DataFrame that contains the columns you want to one-hot encode
  2. cols -> List of column names in pandas_dataframe that you want to one-hot encode
  3. check_numerical (Default=False) -> A naive check for whether a column contains numerical data or is otherwise unsuitable for one-hot encoding. Set it to True to turn on the detection
  4. max_var (Default=20) -> Maximum number of distinct values allowed in a categorical column
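
To make the two optional parameters concrete, here is a minimal sketch of the kind of check they describe. It is not the package's code: is_suitable_for_one_hot is a hypothetical helper, and treating a column as unsuitable when it has a numeric dtype or more than max_var distinct values is an assumption based on the descriptions above.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def is_suitable_for_one_hot(series, check_numerical=False, max_var=20):
    """Hypothetical helper, NOT part of the OneHotEncode package.

    Assumed rule: reject numeric columns when check_numerical is on,
    and reject any column with more than max_var distinct values.
    """
    if check_numerical and is_numeric_dtype(series):
        return False
    return series.nunique() <= max_var


df = pd.DataFrame({
    "color": ["red", "green", "red", "blue"],
    "price": [1.5, 2.0, 3.25, 4.0],
})

# A string column with 3 distinct values passes the default limit of 20.
print(is_suitable_for_one_hot(df["color"], check_numerical=True))
# A float column is rejected once the numerical check is turned on.
print(is_suitable_for_one_hot(df["price"], check_numerical=True))
# Lowering max_var below the number of distinct values rejects the column.
print(is_suitable_for_one_hot(df["color"], max_var=2))
```

How the real function reacts to an unsuitable column (skipping it, warning, or raising) is not stated here, so the sketch only reports a boolean.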

Returns -> (df,dropped_cols,all_new_cols,new_col_dict)

  1. df -> pandas DataFrame containing the one-hot encoded data, with the original columns dropped
  2. dropped_cols -> List of arrays containing the dropped columns that were originally input
  3. all_new_cols -> List of lists, each holding the names of all new columns created from a single original column
  4. new_col_dict -> List of dictionaries, each holding the names of all new columns created from a single original column
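
The shape of these four return values can be illustrated with a small stand-in built on pandas.get_dummies. This is only a sketch: one_hot_sketch is a hypothetical function, not the package's implementation, and both the column naming (column_value) and the dictionary layout (category value mapped to new column name) are assumptions.

```python
import pandas as pd


def one_hot_sketch(df, cols):
    """Illustrative stand-in using pandas.get_dummies, NOT the package's code."""
    dropped_cols = [df[c].to_numpy() for c in cols]  # original column values
    all_new_cols = []   # one list of new column names per encoded column
    new_col_dict = []   # one dict per encoded column (layout is assumed)
    out = df.drop(columns=cols)
    for c in cols:
        dummies = pd.get_dummies(df[c], prefix=c, dtype=int)
        names = list(dummies.columns)
        all_new_cols.append(names)
        # Assumed layout: category value (as text) -> new column name.
        new_col_dict.append({name[len(c) + 1:]: name for name in names})
        out = pd.concat([out, dummies], axis=1)
    return out, dropped_cols, all_new_cols, new_col_dict


data = pd.DataFrame({"color": ["red", "green", "red"], "size": [1, 2, 3]})
encoded, dropped, new_cols, col_dict = one_hot_sketch(data, ["color"])

print(encoded)    # "size" plus one 0/1 column per color
print(new_cols)   # list of lists of new column names
print(col_dict)   # list of dicts, one per encoded column
```

Keeping one list entry per input column means the i-th element of each returned list corresponds to the i-th name in cols.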

More information will be provided soon on a Medium page.