A Human-In-The-Loop Framework To Assess Multimodal Machine Learning Models