Modeling And Decoding Semantic Representations In Language And Vision