A set of tools to perform code quality audits and naming convention checks on Kettle (Pentaho Data Integration) jobs and transformations
Kettle (a.k.a. Pentaho Data Integration) is an open source data integration platform.
This repository contains a set of tools to perform checks on code quality and working practices.
The checks are built in Kettle. You’ll need
The main job is jb_pentaho_code_audit.kjb. This job takes two parameters:
The config folder contains a number of files. These are stored in this code repository as filename.properties.template. To configure this framework, create your own copy of each file without the .template extension.
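The copy step above can be scripted. The following is a minimal sketch, not part of this repository: the helper name copy_templates is hypothetical, and it assumes all templates sit directly in one folder and follow the filename.properties.template pattern. Existing copies are left untouched so that local settings are never overwritten.

```python
from pathlib import Path
import shutil


def copy_templates(config_dir):
    """Copy every *.properties.template file to the same name without .template.

    Returns the names of the files that were created. Files that already
    exist are skipped so local configuration is not overwritten.
    """
    created = []
    for template in sorted(Path(config_dir).glob("*.properties.template")):
        # with_suffix("") drops only the last suffix, i.e. ".template"
        target = template.with_suffix("")
        if not target.exists():
            shutil.copyfile(template, target)
            created.append(target.name)
    return created
```

Calling copy_templates("config") from the repository root would create the editable .properties files next to their templates, assuming the folder is named config as described above.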
- kettle-pdi-general.properties: general properties about your Kettle/PDI version, code paths, etc.
  - kettle.pdi.version: your Kettle/PDI version, e.g. 8.2
  - kettle.pdi.locale: your locale, e.g. en_US
  - kettle.pdi.code.tmp.dir: temporary directory to write the results of the audit and working practices checks to
  - kettle.pdi.code.path: path to the jobs and transformations you’d like to check
  - kettle.pdi.code.path.exclude.dir: directory that needs to be excluded from kettle.pdi.code.path
  - know.bi.error.handling.framework: shared error handling framework. Processes all incoming errors based on a provided error code. Error handling and the actions to be taken are driven by a set of business rules, as defined in this example repository (private repository, will be public soon (2020-11-17)).
- kettle-pdi-audit.properties: a set of audit-specific properties.
  - impacted.tables: tables to specifically include for table impact analysis
- kettle-pdi-code-wps.properties: configuration for the working practice checks to be performed.
  - kettle.pdi.conventions.naming.transformation.step.default.avoid (default: ‘Y’): avoid the default step name, e.g. avoid ‘Select Values’. Default step names don’t offer any context and make a transformation unnecessarily hard to read.
  - kettle.pdi.conventions.naming.job.entry.default.avoid (default: ‘Y’): avoid the default job entry name, e.g. avoid ‘Simple Evaluation’ as a job entry name. Default job entry names don’t offer any context and make a job unnecessarily hard to read.
  - pentaho.code.wps.pdi.trans.prefix (default: ‘tr’): filename prefix for transformations
  - kettle.pdi.code.wps.pdi.job.prefix (default: ‘jb’): filename prefix for jobs
  - kettle.pdi.code.wps.pdi.trans.casing (default: ‘lower’): filename casing for transformations (lower, upper, initcap)
  - kettle.pdi.code.wps.pdi.job.casing (default: ‘lower’): filename casing for jobs (lower, upper, initcap)
  - kettle.pdi.code.wps.pdi.trans.hasnote (default: ‘Y’): are notes required for transformations?
  - kettle.pdi.code.wps.pdi.job.hasnote (default: ‘Y’): are notes required for jobs?
  - kettle.pdi.code.wps.pdi.trans.hasdescription (default: ‘Y’): are descriptions required for transformations?
  - kettle.pdi.code.wps.pdi.job.hasdescription (default: ‘Y’): are descriptions required for jobs?
  - kettle.pdi.code.wps.pdi.trans.ignoredefaults (default: ‘’): comma-separated list of step types to ignore for default step name checks.
  - kettle.pdi.code.wps.pdi.job.ignoredefaults (default: ‘Abort job,Success’): comma-separated list of job entry types to ignore for default job entry name checks (there is no added value in providing context for a ‘Start’ or ‘Success’ job entry).

The checks performed in the code audit job are listed below. For each check, a CSV file with the results is written to the ${kettle.pdi.code.tmp.dir}/output/audit folder.
The working practices check downloads the XML and properties files for steps and job entries, and uses those files to compare the actual step and job entry names to the default names.
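As an illustration of the working practices configuration described above, a kettle-pdi-code-wps.properties file that sets every option to its documented default might look like the sketch below. The property names and default values are taken from the list above; writing the values unquoted is an assumption, so check the shipped .template file for the exact syntax.

```properties
# Sketch of kettle-pdi-code-wps.properties using the documented defaults.
# Assumption: values are written without quotes.

# Flag default step and job entry names
kettle.pdi.conventions.naming.transformation.step.default.avoid=Y
kettle.pdi.conventions.naming.job.entry.default.avoid=Y

# Filename prefix and casing (lower, upper, initcap)
pentaho.code.wps.pdi.trans.prefix=tr
kettle.pdi.code.wps.pdi.job.prefix=jb
kettle.pdi.code.wps.pdi.trans.casing=lower
kettle.pdi.code.wps.pdi.job.casing=lower

# Require notes and descriptions
kettle.pdi.code.wps.pdi.trans.hasnote=Y
kettle.pdi.code.wps.pdi.job.hasnote=Y
kettle.pdi.code.wps.pdi.trans.hasdescription=Y
kettle.pdi.code.wps.pdi.job.hasdescription=Y

# Step and job entry types excluded from the default name checks
kettle.pdi.code.wps.pdi.trans.ignoredefaults=
kettle.pdi.code.wps.pdi.job.ignoredefaults=Abort job,Success
```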