🤔 We identify several limitations in coordinate-generation-based methods (i.e., methods that output screen positions as text tokens, e.g., x=..., y=...) for GUI grounding, including ...