This is a repository for custom user-defined functions for Apache Hive.
Apache Hive is a data warehouse system that facilitates reading, writing, and managing large datasets that reside in distributed storage and are queried using SQL syntax.
Built on top of Apache Hadoop, Hive gives easy access to data via SQL, enabling data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis.
Apache Hive supports many built-in functions to manipulate and process data.
Although many built-in functions are available, some business use cases need functionality that none of them provide.
Hive lets you create user-defined functions (UDFs) by extending the org.apache.hadoop.hive.ql.exec.UDF class.
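As a rough illustration of what such a class might look like, here is a minimal sketch. The class name ConcatWithDash and its behaviour are hypothetical and are not one of this repository's functions. To keep the sketch compilable without Hive on the classpath, the superclass is only named in a comment; a real UDF would be a public class that extends org.apache.hadoop.hive.ql.exec.UDF.

```java
/**
 * Hypothetical example; the name and behaviour are illustrative only.
 * In a real build against hive-exec the declaration would read:
 *   public class ConcatWithDash extends org.apache.hadoop.hive.ql.exec.UDF
 * The superclass is omitted here so the sketch compiles on its own.
 */
class ConcatWithDash {

    // Hive finds methods named "evaluate" by reflection and matches them
    // against the argument types used in the query.
    public String evaluate(String left, String right) {
        // Return NULL if either input is NULL, as most SQL functions do.
        if (left == null || right == null) {
            return null;
        }
        return left + "-" + right;
    }

    // Local smoke test; Hive itself never calls main.
    public static void main(String[] args) {
        ConcatWithDash udf = new ConcatWithDash();
        System.out.println(udf.evaluate("2024", "01")); // prints 2024-01
    }
}
```

Once a class like this is packaged into a JAR, its fully qualified class name is the value that goes in the '<function_java_class>' part of the create function statement.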
This repository provides a single, unified binary (JAR) that bundles an array of custom plugins and functions.
The idea is to replace ad hoc workarounds in Apache Hive with proper UDFs and to keep all custom UDFs in a single repository.
Below is the list of currently available plugins. Click the URLs below to learn more.
To install and use the UDFs, follow the steps below:
Download hive-custom-udfs-*.jar onto your system or edge node.
Upload the downloaded JAR to an HDFS location using the command below:
$ hadoop fs -copyFromLocal hive-custom-udfs-*.jar <path of your hdfs directory>
Log in to the Hive shell, then add the JAR to the Hive classpath and define a function, as below:
$ beeline -u jdbc:hive2://localhost:10000
hive> create function <function_name> as '<function_java_class>' using JAR 'hdfs:///<path of hdfs directory>/hive-custom-udfs-*.jar';
Use the function in Hive:
hive> SELECT <function_name>(str, str) FROM <table_name>;
See the official Hive documentation for more on deploying Hive UDFs.
To report a problem, open an issue on GitHub.