Cross-view object geo-localization (CVOGL) determines the geographic location of an object in a satellite-view reference image. The object is indicated by a point prompt in a ground- or drone-view ...
Vision-language models (VLMs) offer flexible object detection through natural language prompts, but their performance varies with prompt phrasing. In this paper, we introduce a ...