arXiv:2512.14157v2 Announce Type: replace Abstract: Recent medical MLLMs have made significant progress in generating step-by-step textual reasoning chains. However, they still struggle with complex clinical tasks that necessitate dynamic and iterative focusing on fine-grained visual regions. To close this gap, we introduce Ophiuchus, a versatile, tool-augmented framework that equips an MLLM to (i) decide when fine-grained visual evidence is needed, (ii) determine where to probe and ground within the medical image, and (iii) seamlessly weave the relevant sub-image content back into an interlea

Source: arXiv cs.AI — read the full report at the original publisher.

This is a curated wire item. The Continuum Brief does not republish full third-party articles; this entry links to the original source.