This is a simple, fast, fully event-based Kubernetes gang scheduler with a low memory footprint.
This is a very simple implementation of a Gang Scheduler for Kubernetes, built mainly for research purposes. It needs to be revised before being used in production environments.
This scheduler was designed mainly for running Spark jobs with Kubernetes as a resource manager.
The default Kubernetes scheduler was not designed to be aware of relationships between pods (such as the one between a Spark driver and its executors). If it is used to schedule
a Spark job while the cluster is running at capacity, it can schedule only part of the job, which leaves the job unable to make progress because it is not fully scheduled.
This scheduler was designed to avoid that issue. Its main purpose is to schedule a driver pod only if the cluster has enough room for all the executors associated with that job.
The scheduler has an asynchronously updated internal resourceCache, which keeps the state
of the cluster (from a node-resources standpoint). It is essentially a map from nodeName to that node’s resource characteristics.
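As an illustration of the idea, here is a minimal sketch of such a cache in Go. It is not the scheduler's actual code: the type names (`NodeResources`, `ResourceCache`), the fields, and the choice of a read-write mutex to guard asynchronous updates are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// NodeResources is a hypothetical summary of what is still free on one node.
type NodeResources struct {
	FreeMilliCPU int64 // free CPU, in millicores
	FreeMemoryMB int64 // free memory, in MiB
}

// ResourceCache maps a node name to its free resources.
// The mutex is an assumption: it guards the map because updates
// are expected to arrive asynchronously from cluster events.
type ResourceCache struct {
	mu    sync.RWMutex
	nodes map[string]NodeResources
}

// NewResourceCache returns an empty cache.
func NewResourceCache() *ResourceCache {
	return &ResourceCache{nodes: make(map[string]NodeResources)}
}

// Update records the latest known free resources for a node.
func (c *ResourceCache) Update(nodeName string, r NodeResources) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.nodes[nodeName] = r
}

// Get returns the cached resources for a node and whether the node is known.
func (c *ResourceCache) Get(nodeName string) (NodeResources, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	r, ok := c.nodes[nodeName]
	return r, ok
}

func main() {
	cache := NewResourceCache()
	cache.Update("node-a", NodeResources{FreeMilliCPU: 4000, FreeMemoryMB: 8192})
	r, ok := cache.Get("node-a")
	fmt.Println(r.FreeMilliCPU, r.FreeMemoryMB, ok)
}
```

Keeping only per-node free resources, rather than full node objects, is one way such a cache could stay small.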
For each pod to be scheduled, check whether it is a driver or an executor (by looking at its labels).
If it is a driver, check whether all its executors fit in the resourceCache. If so, schedule the driver pod.
If it is an executor, schedule it directly.
For any scheduled pod, launch a bind event (async) and a cluster event.
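The decision steps above can be sketched roughly as follows. This is an illustrative sketch, not the scheduler's real implementation. It assumes that pods carry the `spark-role` label with the value `driver` or `executor`, and that a driver's executor requests are known up front (the `Executors` field is hypothetical). It also uses a simple first-fit check over nodes in name order, which may differ from how the actual fit check works. The bind and cluster events are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// Resources is a hypothetical CPU/memory pair (millicores, MiB).
type Resources struct {
	MilliCPU int64
	MemoryMB int64
}

// Pod carries only the fields this sketch needs.
type Pod struct {
	Name      string
	Labels    map[string]string
	Executors []Resources // for drivers: executor requests, assumed known up front
}

// fits reports whether every request can be placed on some node.
// It works on a private copy of the free-resource map, so the
// caller's cache is left untouched.
func fits(free map[string]Resources, requests []Resources) bool {
	names := make([]string, 0, len(free))
	remaining := make(map[string]Resources, len(free))
	for name, r := range free {
		names = append(names, name)
		remaining[name] = r
	}
	sort.Strings(names) // fixed order keeps the result deterministic

	for _, req := range requests {
		placed := false
		for _, name := range names {
			r := remaining[name]
			if r.MilliCPU >= req.MilliCPU && r.MemoryMB >= req.MemoryMB {
				remaining[name] = Resources{
					MilliCPU: r.MilliCPU - req.MilliCPU,
					MemoryMB: r.MemoryMB - req.MemoryMB,
				}
				placed = true
				break
			}
		}
		if !placed {
			return false
		}
	}
	return true
}

// shouldSchedule applies the rule described above: executors are always
// scheduled, drivers only when all of their executors fit.
func shouldSchedule(pod Pod, free map[string]Resources) bool {
	switch pod.Labels["spark-role"] {
	case "driver":
		return fits(free, pod.Executors)
	case "executor":
		return true
	default:
		return false // other pods are out of scope for this sketch
	}
}

func main() {
	free := map[string]Resources{
		"node-a": {MilliCPU: 2000, MemoryMB: 4096},
		"node-b": {MilliCPU: 1000, MemoryMB: 2048},
	}
	executor := Resources{MilliCPU: 1000, MemoryMB: 2048}
	driver := Pod{
		Name:      "job-1-driver",
		Labels:    map[string]string{"spark-role": "driver"},
		Executors: []Resources{executor, executor, executor},
	}
	fmt.Println("schedule driver:", shouldSchedule(driver, free))
}
```

Holding the driver back until the whole gang fits is what prevents a half-scheduled job: if even one executor cannot be placed, nothing from that job is started.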