Joint multi-level, multi-label text classification, using a multi-head attention mechanism to couple two prediction tasks.
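
The description does not say how the attention mechanism connects the two tasks, so the following is only a minimal sketch of one plausible reading, not the project's actual architecture. It assumes two label levels (called "coarse" and "fine" here), sigmoid outputs for multi-label prediction at each level, and a coupling in which coarse label embeddings, scaled by their predicted probabilities, act as attention queries over the token representations. All names (`joint_forward`, `label_emb`, `W_coarse`, `W_fine`), the dimensions, and the random weights are hypothetical, and only a forward pass is shown.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multi_head_attention(query, key_value, params, num_heads):
    """Scaled dot-product attention: query (Lq, d), key_value (Lk, d) -> (Lq, d)."""
    d = query.shape[-1]
    d_head = d // num_heads
    q = query @ params["Wq"]
    k = key_value @ params["Wk"]
    v = key_value @ params["Wv"]

    def split(x):
        # (L, d) -> (num_heads, L, d_head)
        return x.reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(q), split(k), split(v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, Lq, Lk)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                   # (heads, Lq, d_head)
    merged = heads.transpose(1, 0, 2).reshape(query.shape[0], d)
    return merged @ params["Wo"], weights

def init_params(d, n_coarse, n_fine, rng):
    # Random weights stand in for trained parameters (assumption for illustration).
    def w(*shape):
        return rng.normal(scale=0.1, size=shape)
    return {
        "Wq": w(d, d), "Wk": w(d, d), "Wv": w(d, d), "Wo": w(d, d),
        "W_coarse": w(d, n_coarse),    # coarse-level classifier
        "label_emb": w(n_coarse, d),   # one embedding per coarse label
        "W_fine": w(2 * d, n_fine),    # fine-level classifier
    }

def joint_forward(tokens, params, num_heads=4):
    """tokens: (seq_len, d) token representations from any text encoder."""
    pooled = tokens.mean(axis=0)
    # Task 1: multi-label prediction at the coarse level.
    p_coarse = sigmoid(pooled @ params["W_coarse"])
    # Coupling (assumed): coarse label embeddings, weighted by their predicted
    # probabilities, query the tokens through multi-head attention.
    queries = p_coarse[:, None] * params["label_emb"]
    attended, weights = multi_head_attention(queries, tokens, params, num_heads)
    context = attended.mean(axis=0)
    # Task 2: multi-label prediction at the fine level, conditioned on task 1.
    p_fine = sigmoid(np.concatenate([pooled, context]) @ params["W_fine"])
    return p_coarse, p_fine, weights

rng = np.random.default_rng(0)
d, n_coarse, n_fine, seq_len = 16, 3, 7, 10
params = init_params(d, n_coarse, n_fine, rng)
tokens = rng.normal(size=(seq_len, d))
p_coarse, p_fine, weights = joint_forward(tokens, params)
print(p_coarse.shape, p_fine.shape, weights.shape)  # (3,) (7,) (4, 3, 10)
```

Because each output is an independent sigmoid rather than a softmax, several labels can be active at once on either level. In training, a setup like this would typically sum a binary cross-entropy loss per level so that both tasks are optimized jointly; that choice is likewise an assumption here.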