## Description

*LLaVA* (Large Language and Vision Assistant) is a multimodal model that handles both images and text within a conversation.