Learning Generalizable Visual Representations Towards Novel Viewpoints, Scenes and Vocabularies