Towards multi-modal face recognition in the wild