An investigation into visual content understanding based on deep learning and natural language processing