Two recent studies of Large Vision-Language Models (LVLMs) in referential communication have reached conflicting conclusions about whether the models can coordinate efficient referring expressions. Jones et al. report that LVLMs coordinate efficiently when explicitly prompted to do so, but fail to infer the need for efficiency from implicit prompts. Zeng et al. report that LVLMs struggle with the interactive generation and resolution of referring expressions, which points to a deficit in modeling the common ground that human-like collaboration depends on. Both studies use referential communication experiments to test these abilities. Both papers were published on arXiv.
- arXiv
- Hugging Face
- Jones et al.
- LVLMs
- Zeng et al.
AI-generated summary · Google Gemini · from 3 sources.