Multimodal Neural Language Models
Abstract:
Automatically generating descriptive captions for images has long been regarded as a challenging task that integrates visual perception with the ability to understand language. One must not only correctly recognize what appears in an image, but also describe the interactions of the observed objects in a natural language such as English.
In this project, we will work our way from distributed representations of words to recurrent neural network language models. You will have the opportunity to implement an image description generation model based on convolutional neural networks and neural language models.
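To make the ingredients above concrete, here is a minimal sketch of one step of such a model: a word embedding lookup feeding a recurrent language model whose hidden state is conditioned on image features. All names, sizes, and weight shapes are hypothetical, chosen only for illustration; it is written in plain NumPy rather than Theano, and the image feature vector simply stands in for the output of a convolutional network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical, for illustration only)
vocab_size, embed_dim, hidden_dim, image_dim = 10, 8, 16, 32

# Word embedding table: each row is the distributed representation of a word
E = rng.normal(0, 0.1, (vocab_size, embed_dim))

# Recurrent language-model parameters; image features enter through W_img
W_xh = rng.normal(0, 0.1, (embed_dim, hidden_dim))
W_hh = rng.normal(0, 0.1, (hidden_dim, hidden_dim))
W_img = rng.normal(0, 0.1, (image_dim, hidden_dim))
W_hy = rng.normal(0, 0.1, (hidden_dim, vocab_size))

def softmax(z):
    z = z - z.max()           # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def step(word_id, h, img_feat):
    """One RNN step: embed the previous word, update the hidden state
    conditioned on the image features, and predict the next word."""
    x = E[word_id]
    h = np.tanh(x @ W_xh + h @ W_hh + img_feat @ W_img)
    p = softmax(h @ W_hy)     # probability distribution over the next word
    return h, p

img_feat = rng.normal(0, 1.0, image_dim)  # stands in for CNN image features
h = np.zeros(hidden_dim)
h, p = step(word_id=0, h=h, img_feat=img_feat)
```

A caption would be generated by repeating `step`, feeding back the sampled word at each position; the project will implement and train such a model properly in Theano.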
Project prerequisites:
- Math
- Basics of linear algebra
- Basics of probability theory
- Basics of supervised machine learning
- First order optimization methods
- Basics of natural language processing
- Programming: Python with the Theano library
Planned lectures:
- Theano tutorial (2-4h)
- Word embeddings and neural language modeling
- Convolutional neural networks for image processing
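As a taste of the convolutional-networks lecture, the core operation can be sketched in a few lines of plain NumPy. This is a naive, loop-based version of valid-mode 2D cross-correlation (what deep-learning libraries usually call "convolution"); the toy image and edge-detecting kernel are illustrative only.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation: slide the kernel over the image
    and take the elementwise product-sum at each position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: left half dark (0), right half bright (1)
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# A horizontal-difference filter responds to vertical edges
kernel = np.array([[-1.0, 1.0]])
edges = conv2d(image, kernel)
```

The output is strongest exactly at the dark-to-bright boundary and zero elsewhere; a CNN learns banks of such filters from data instead of hand-designing them.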
Associated topics:
natural language processing, image classification, convolutional neural networks, recurrent neural networks
About lecturer:
Mr. Sergii Gavrylov,
Junior research engineer at Grammarly.