An implementation of a chunk-level, attention-based temporal aggregation framework for sequence-to-one recognition tasks.
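
As a rough illustration of what "chunk-level attention-based aggregation" can mean for a sequence-to-one task, the sketch below cuts a variable-length sequence of frame features into a fixed number of chunks, summarizes each chunk, and combines the chunk vectors with softmax attention weights into a single vector. It is a minimal NumPy sketch under stated assumptions, not this repository's code: the function names `split_into_chunks` and `attention_aggregate`, the evenly spaced overlapping windows, the mean-pooled chunk summary, and the random scoring vector `w` (standing in for a learned parameter) are all hypothetical choices made for illustration.

```python
import numpy as np


def split_into_chunks(frames, num_chunks, chunk_len):
    """Cut a (T, D) sequence into num_chunks evenly spaced windows of chunk_len frames.

    Assumption: windows may overlap, and sequences shorter than chunk_len
    are padded by repeating the last frame.
    """
    T = frames.shape[0]
    if T < chunk_len:
        pad = np.repeat(frames[-1:], chunk_len - T, axis=0)
        frames = np.concatenate([frames, pad], axis=0)
        T = chunk_len
    # Evenly spaced start indices covering the whole sequence.
    starts = np.linspace(0, T - chunk_len, num_chunks).astype(int)
    return np.stack([frames[s:s + chunk_len] for s in starts])  # (C, L, D)


def attention_aggregate(chunk_vecs, w):
    """Weight (C, D) chunk vectors by softmax(chunk_vecs @ w) and sum them."""
    scores = chunk_vecs @ w                # one scalar score per chunk, shape (C,)
    scores = scores - scores.max()         # subtract the max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ chunk_vecs, weights   # (D,), (C,)


rng = np.random.default_rng(0)
frames = rng.normal(size=(137, 8))         # hypothetical input: 137 frames, 8 features each
chunks = split_into_chunks(frames, num_chunks=5, chunk_len=30)
chunk_vecs = chunks.mean(axis=1)           # stand-in chunk encoder: mean over frames
w = rng.normal(size=8)                     # stand-in for a learned attention vector
pooled, weights = attention_aggregate(chunk_vecs, w)
print(chunks.shape, pooled.shape)
```

Because the number of chunks is fixed, the aggregated output has the same shape regardless of the input length, which is what lets a single classifier head sit on top of it. In a trained model the mean-pooling step would typically be replaced by a learned chunk encoder.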