Our work instead focuses on learning a contextual correspondence between image regions and textual captions describing people and their depicted interactions. However, the addition of the two branches enables supervision for head and feet locations, which is crucial for letting the network choose which image features to concentrate on. Although Independence Day is celebrated on […]