A Comparison of Distributed Machine Learning Platforms
Kuo Zhang
University at Buffalo, SUNY
Salem Alqahtani
University at Buffalo, SUNY
Murat Demirbas
University at Buffalo, SUNY
ABSTRACT
The proliferation of big data and big computing boosted the adoption of machine learning across many application domains. Several distributed machine learning platforms emerged recently. We investigate the architectural design of these distributed machine learning platforms, as the design decisions inevitably affect the performance, scalability, and availability of those platforms. We study Spark as a representative dataflow system, PMLS as a parameter-server system, and TensorFlow and MXNet as examples of more advanced dataflow systems. We take a distributed systems perspective, and analyze the communication and control bottlenecks for these approaches. We also consider fault-tolerance and ease-of-development in these platforms.
In order to provide a quantitative evaluation, we also benchmark these platforms with basic machine learning tasks.