Implementation of joint text-image representations trained with VSE++ losses, plus an implementation of t-SNE.
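
For orientation, the VSE++ loss is a ranking loss that penalizes only the hardest negative in each batch (the "max of hinges" form). The sketch below is a minimal pure-Python illustration, not the code in this repository. The function name `vsepp_loss`, the margin of 0.2, and summing over the batch are assumptions made for the example. It assumes `scores[i][j]` is the similarity between image `i` and caption `j`, with matching pairs on the diagonal.

```python
def vsepp_loss(scores, margin=0.2):
    """Hardest-negative (max-of-hinges) ranking loss on a square similarity matrix.

    scores[i][j]: similarity between image i and caption j (assumed layout).
    margin: hypothetical default, chosen only for illustration.
    """
    n = len(scores)
    total = 0.0
    for i in range(n):
        pos = scores[i][i]  # similarity of the matching pair
        # Hardest negative caption for image i (row i, off-diagonal entries).
        hard_cap = max(
            (margin + scores[i][j] - pos for j in range(n) if j != i),
            default=0.0,
        )
        # Hardest negative image for caption i (column i, off-diagonal entries).
        hard_img = max(
            (margin + scores[j][i] - pos for j in range(n) if j != i),
            default=0.0,
        )
        # Hinge: only violations of the margin contribute.
        total += max(0.0, hard_cap) + max(0.0, hard_img)
    return total


if __name__ == "__main__":
    # Toy 2x2 similarity matrix; image 0 is closer to the wrong caption.
    toy_scores = [[0.5, 0.9],
                  [0.1, 0.5]]
    print(vsepp_loss(toy_scores))
```

In practice the similarity matrix would come from the dot product of normalized image and caption embeddings, and the loss would be computed on tensors in an autodiff framework. The loop form here only makes the hardest-negative selection explicit.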