Towards Post-Hoc Human-Interpretability of Multimodal Neural Networks for Healthcare Applications