Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.