-based technique is used as a hyper-heuristic to generate heuristics for an online combinatorial problem. It observes and learns from the actions of a known-expert heuristic on small instances, but has the advantage of producing a general heuristic that works well on other larger instances. Specifically, we generate heuristic policies for online bin packing problem by using expert near-optimal policies produced by a hyper-heuristic on small instances, where learning is fast. The ”expert” is a policy matrix that defines an index policy, and the apprenticeship learning is based on observation of the action of the expert policy together with a range of features of the bin being considered, and then applying a k-means classification. We show that the generated policy often performs better than the standard best-fit heuristic even when applied to instances much larger than the training set.
Bitte loggen Sie sich ein, um Zugang zu diesem Inhalt zu erhalten