Welcome to the multimodal documentation

multimodal is a Python library that provides tools for vision and language research. It includes visual features commonly used for image captioning and Visual Question Answering (VQA) tasks, as well as datasets such as VQA.

This library was developed by Corentin Dancette. If you have a feature request or want to report a bug, please open an issue on the GitHub tracker or submit a pull request.