arXiv:2407.17200v3 Announce Type: replace-cross Abstract: Many real-world decision problems require solving, again and again, combinatorial optimization instances drawn from a common distribution. A recent line of structured learning methods exploits this regularity by learning policies that pair a statistical model with a tractable combinatorial oracle, instead of solving each instance independently. Training such policies is notoriously difficult, however: the resulting empirical risk is piecewise constant in the model parameters, which hinders gradient-based optimization, and only a few the
Source: arXiv cs.LG — read the full report at the original publisher.
