Large-scale sparse similarity calculation with tools for parallelisation and incorporation of hierarchical domain knowledge