1 Introduction

During the last decades professional sports leagues worldwide have turned into million or sometimes even billion dollar businesses. Soccer in Europe as well as American Football, basketball, baseball or ice hockey in North America attracts thousands of fans inside the stadiums and millions of spectators around the world. A crucial contribution to the success of a season lies in the timetable or schedule of the league which determines which games are arranged when and at which arenas. In doing so, the planers of those leagues have to balance not only the expectations of the fans but also many requests made by clubs and TV stations.

In this paper we will focus on the Traveling Tournament Problem (TTP) introduced by Easton et al. (2001). It is a quite well-known and practically difficult optimization problem inspired by Major League Baseball. Especially North American sports leagues have an incentive to minimize the travel distance of the participants of a tournament due to the vast expanse of the continent.

1.1 Sports scheduling and the traveling tournament problem

Sports scheduling in general deals with the design of tournaments. A single round robin tournament on n teams where n is an even number consists of (n−1) rounds (also called slots). In each round n/2 games which are themselves ordered pairs of teams take place. Every team has to participate in one game per round and must meet every other team exactly once. It is standard to assume n to be even since in sports leagues with n being odd, usually a dummy team is introduced, and whoever plays it has a day off, which is called a bye. For scheduling single round robin tournaments a rather general and useful scheme called the canonical schedule has been known in sports scheduling literature for at least 30 years (de Werra 1981). It is based on the polygon/circle method, which was first suggested by Kirkman (1847). One can think of Kirkman’s method as a long table at which n players sit such that n/2 players on one side face the other players seated on the other side of the table. Every player plays a match against the person seated directly across the table. The next round of the schedule is obtained when everyone moves one chair to the right with the crucial exception that there exists one person at the end of the table who never moves and always maintains the seat from his or her first round. Note that this method only specifies who plays whom when and not where. The canonical schedule introduced by de Werra defines for each of the encounters specified by the method described above, at whose site they take place such that the number of successive home or away games is minimized (de Werra 1981).

A double round robin tournament on n teams consists of 2(n−1) rounds and every team must meet every other team twice: once at its own home venue (home game) and once at the other team’s venue (away game). A popular policy in practice is to obtain a double round robin tournament from a single round robin tournament by mirroring, that is repeating the matches of round k for k=1,…,n−1 in round k+n−1 with changed home field advantage. Consecutive home games are called a home stand and consecutive away games form a road trip. The length of a home stand or road trip is the number of opponents played (and not the distance traveled).

The Traveling Tournament Problem (TTP) as introduced in Easton et al. (2003) is then defined as follows:

Input::
  • A set V={1,2,…,n} of n teams with n even.

  • An n×n integer distance matrix D containing the metric travel distances between the home venues of all teams.

  • Integers L,k, representing lower and upper bounds for the lengths of the teams’ home stands and road trips.

Output::

A double round robin tournament on V satisfying:

  • The length of every home stand and road trip is between L and k inclusive.

  • No pair of teams plays both of their matches against each other in two successive rounds.

  • The total distance traveled by all teams is minimized.

In this paper, we assume that L=1 which is common in literature and means that we forget about L. This assumption is reasonable since it is hard to imagine a sports league planner who will insist on forbidding home stands or road trips of length 1 when facing his many conflictive objectives.

1.2 Previous work

So far, most efforts concerning the TTP have led to a variety of algorithms aiming to minimize the total distance driven by the teams. Kendall et al. (2010) and Rasmussen and Trick (2008) provide a good overview of the work done on the TTP and sports scheduling in general. Just to mention a very few examples, hybrid algorithms with constraint programming (CP) exist by Benoist et al. (2001) who additionally use Lagrange relaxation. Easton et al. (2003) merge CP with integer programming while Henz (2004) combines CP with large neighborhood search. Anagnostopoulos et al. (2003), van Hentenryck and Vergados (2006), Di Gaspero and Schaerf (2007) and Lim et al. (2006) propose neighborhood search-based algorithms, whereas Ribeiro and Urrutia (2007) focus on the special class of constant distance TTP where break maximization is equivalent to travel distance minimization.

Miyashiro et al. (2008) provide a 2+(9/4)/(n−1) approximation for the intensively studied special case k=3 by means of the Modified Circle Method, a variation of the canonical schedule. Yamaguchi et al. (2009) obtain an algorithm with approximation ratio (2k−1)/k+O(k/n) for k≤5 and (5k−7)/(2k)+O(k/n) for k>5. Again they make use of the canonical schedule, now refined such that the teams are ordered around the ‘table’ such that most of the distances driven are part of a near optimal traveling salesman tour which clearly has positive effects on the length of many distances traveled. As kn−1, they showed this way that a constant factor approximation for any choice of k and n exists. However, they did not show how this factor looks exactly.

The complexity has been settled quite recently by Thielen and Westphal (2011) who showed that the TTP is strongly NP-hard. They use a reduction from 3-satisfiability (3-SAT) by showing that for any instance ϕ of 3-SAT there is an instance of TTP(3) and some ξ such that there is a feasible schedule of cost at most ξ if and only if ϕ is solvable.

1.3 Our results

Our aim in this work is to approximate the TTP by a constant ratio for arbitrary choices of k and n.

Applying the canonical schedule mentioned above, we choose a specific orientation of the underlying graph which ensures that home stands and road trips do not contain more than k matches and for which the total distance traveled is not too long. Whereas it is common practice to derive the second half of the season by repeating the first half’s games in the same order but with changed home field advantage, it is not suitable here, as road trips or home stands might become too long. Thus, we derive the second half in a slightly different way. Finally, we show that the plan we construct approximates the optimal solution by a factor of 2+2k/n+k/(n−1)+3/n+3/(2⋅k). For the case of k=3 this guarantees an approximation ratio of 5/2+12/(n−1) which is actually not an improvement on the ratio of Miyashiro et al. cited above. But for any choice of k≥4 (and thus n≥6) this yields an approximation ratio of less than 5.875, which is the first constant factor approximation for k>3.

Furthermore, we show that this algorithm is also applicable to real-world applications as it produces solutions of high quality in a very short amount of time. We were also able to find solutions for the benchmark instances Galaxy22 to Galaxy40 which are better than the previously known ones.

2 Lower bounds

The objective of the TTP, minimizing the total travel distance of all teams during a double round robin tournament, can be estimated by various bounds. One of them is called Independent Lower Bound (ILB) (Easton et al. 2001) and consists of finding the shortest tour for each team individually, independent of the other team constraints (primarily that B has to be at home when A visits B during one of A’s road trips). Finding an ILB is equivalent to solving a capacitated vehicle routing problem. In this paper we will use an even coarser version of ILB where we focus only on a traveling salesman tour traversing all venues.

Theorem 1

Let ρ be the length of a TSP in G. Every solution of the TTP has a total length of at least nρ.

Proof

Every team has to visit all the other teams. Thus, each team has to travel at least a distance of ρ which gives a total distance of nρ. □

As in Miyashiro et al. (2008), we denote the sum of the distances of all ordered pairs of teams as Δ=∑ i,jT d(i,j). Miyashiro et al. (2008) showed a lower bound of 2/3⋅Δ for the objective function of TTP with k=3. We generalize this result for arbitrary k:

Theorem 2

Every solution of the TTP has a total length of at least 2/kΔ.

Proof

Consider an arbitrary solution and suppose team i plays lk consecutive away games at teams t 1,t 2,…,t l . The distance \(\tilde{d}_{i}\) driven thereby is

$$\tilde{d}_i= d(i,t_1) + \sum_{j=1}^{l-1} d(t_j,t_{j+1}) + d(t_l,i)$$

Because of the triangle inequality we have \(\tilde{d}_{i} \geq 2\cdot d(i,t_{j})\) for all j and thus we have

$$l\cdot \tilde{d}_i\geq 2\cdot \sum_{j=1}^{l} d(i,t_j)\quad \Longrightarrow \quad \tilde{d}_i\geq \frac{2}{k}\cdot \sum_{j=1}^{l} d(i,t_j)$$

Summing up over all tours driven yields the desired lower bound of 2/kΔ for the total distance driven by the teams in any solution. □

3 Construction of the tournament

For iV let s(i)=∑ jV d(i,j) be the star-weight of i. Since

$$\sum_{i \in V} s(i) = \sum_{i \in V, j \in V} d(i,j) = {\varDelta}$$

there has to be one jV for which s(i)≤Δ/n. Let T heu be a tour through all of the teams’ venues which has been found by applying the well known heuristic by Christofides (1976). Therefore, we know that this tour is not more than 1.5 times longer than the shortest possible tour. We furthermore assume that the teams are named in a way such that T heu traverses them in the order 1,2,…,n and that n is the team with minimum star weight. Given this tour we construct a solution of the TTP in the following way. For n=20 the games of the first two rounds of the season are displayed in Figs. 1 and 2. The figures corresponding to other choices of n can be derived analogously. A solid arc (u,v) in this digraph means that team u is playing against team v in the arena of team v. The games of the other rounds can be derived analogously by changing the positions of the teams counterclockwise. The only arc which changes its orientation during one half of the season is the arc incident to node n which changes its orientation every kth match. This way, the season starts for team 4 with a tour visiting the teams 16,17,18 and 19 before coming home and then playing against the teams 1,2 and 3. Then, it starts off again to play against 20,5,6,7, and has then a home stand again consisting of matches against 8,9,10,11. Finally, there is a last road trip including 12 and 13 and a last home stand with 14 and 15. It is clear that no team has home stands or road trips which are longer than k matches. And it is also clear that every two teams have met each other during this first n−1 games. The full schedule for the first half of the season is shown in Table 1.

Fig. 1
figure 1

Example for slot 1 with n=20, k=4 and l=2

Fig. 2
figure 2

Example for slot 2 with n=20, k=4 and l=2

Table 1 Whole schedule for the first half of the season according to the example with n=20, k=4 and l=2

In order to construct a full tournament, it remains to construct the second half of the season. If we just repeated the first n−1 matches with changed locations (changed the orientation of the arcs), we would obtain a solution, in which every pair of teams met twice and these two games took place at different sites. Furthermore, no half of the season would contain a road trip or a home stand longer than k matches. However, this solution could contain road trips and home stands being longer than k. For example, the team 4 we considered above would start into the second half of the tournament with a home stand of length 4 after having ended the first half with two home stands. In order to get rid of this problem, we start the second half with the match of round n−2, succeeded by the matches of the rounds n−1,1,2,…,n−3 in this order (see Figs. 3 and 4). The double round robin tournament obtained this way contains neither road trips nor home stands longer than k. To see this, assume for the sake of a contradiction that there is a team t which has a road trip longer than k. It is clear from construction that no half of the season completely contains such a tour. Thus, the tour has to include the rounds n−1 and n. In case t has away-games at both of these rounds, the other matches involving these opponents will be home-games for t. By construction, these games will take place on the rounds n−2 and n+1 which means that the road trip had only a length of 2, contradicting the assumption. The case for home stands that are too long follows along the same lines.

Fig. 3
figure 3

Example for slot n−1 with n=20, k=4 and l=2

Fig. 4
figure 4

Example for slot n with n=20, k=4 and l=2

By looking at the figures presented above, one can see that every home stand or road trip is defined by a set of consecutive arcs pointing in the same direction. We call such a set of arcs a block. Furthermore, any orientation of the arcs defining the schedule gives rise to a feasible schedule, as long as the blocks do not contain more than k arcs. The leftmost block is not even allowed to contain more than k−1 arcs because of the games team n is involved in. As long as we obey these rules for the maximum sizes of blocks stated above, we will always obtain a feasible plan for any choice of orientations of the arcs defining the tournament.

In the following, we consider k different orientations. The main difference between them is the width of the rightmost block. For l∈{1,…,k} let O l be the orientation in which the rightmost block has width l, the blocks in the middle all have width k and the leftmost block contains the rest (see Fig. 5). In case this leads to the leftmost block containing exactly k arcs, we change the orientation of the edge (u 1,v 1), such that the arc incident to team n cannot prolong the road trips induced by this block to have a length of k+1 matches. The left- and rightmost arcs in a block always define the first and the last match of a trip.

Fig. 5
figure 5

The blocks defined by the orientation O 2

4 Costs of the tournament

In this section we will prove an upper bound for the total length of the tours defined by the tournament constructed in the previous section.

We assume that every team t having an away game against team n will drive home first before driving to team n’s site and drives home after having played that match. By construction, t has a home game before or after that game anyway. We just obtain one more visit home this way. By the triangle inequality, the costs incurred this way are only higher than before. Furthermore, we will apply the triangle inequality a second time by assuming that every team drives home after the last game of the first half if it is not already at home. Let the nodes of the underlying graph be denoted as u 1,u 2,…,u n/2−2 and v 1,v 2,…,v n/2−2 (see Fig. 5).

In the following we will estimate the distances related to the constructed tournament separately:

  1. 1.

    C h —the costs related to home-games of team n

  2. 2.

    C a —the costs related to away-games of team n

  3. 3.

    C s —the costs related to the first rounds of the season-halves and the costs of returning home after the last rounds of the season-halves

  4. 4.

    C l —the other costs incurred by the edge (u 1,v 1)

  5. 5.

    C r —the other costs incurred by the edge (u n/2−1,v n/2−1)

  6. 6.

    C o —the other costs

C h —The costs related to home-games of team n:

Every other team plays against team n once. As we can assume by application of the triangle inequality that all teams come from their home venues to play against team n and return to their home venues after the game, we know that the cost incurred thereby is at most

$$C_h\leq \sum_{i=1}^{n-1} d(i,n)+d(n,i)=2\cdot s(n)\leq 2\cdot {\varDelta}/n$$

where the last follows from the assumption of n being the node with the smallest star-weight.

C a —The costs related to away-games of team n:

Analogously, to the estimation of the home-games of team n, we can upper bound the costs incurred by the away games by first assuming that team n always returns home after each away-game. This way, we derive the same upper bound of 2⋅Δ/n for the costs C a incurred by the away-games of team n.

C s —The costs related to the first rounds of the season-halves and the costs of returning home after the last rounds of the season-halves:

In the first round of the season, n/2 teams have to travel to their opponents. We do not consider the game that team n is involved in, as we have already taken care of these costs above. So, there are n/2−1 distances traveled left which correspond directly to the vertical arcs of Fig. 1. After the games of round n−1 the first half of the season is over, and we assume that all teams drive home. The second half of the season starts with the matches which have already taken place at round n−2 and it ends with the second leg of the game of round n−3. Observe, that the orientation of the arcs does not have an effect on the total distance driven. It only affects the question who is driving which is not of interest here. In the example mentioned above, for team 4 these are the teams 16,15,14 and 13. If team 4 did not start the season this way but with a match against team 15, then we would need to consider the distances to the teams 15,14,13 and 12. This way we obtain n−1 different choices for the first and last trips of the two halves of the season. Furthermore, it is easy to see that each edge of ({1,…,n−1}×{1,…,n−1}) is part of at most four of these choices. So, summing up the distances of the n−1 different possible choices for round 1, we obtain a total of at most

$$\sum_{i=1}^{n-1} \sum_{j=i+1}^{n-1} 4 d(i,j) =2 {\varDelta} - 4\cdot s(n)$$

So, there has to be a choice for which we can estimate

$$C_s\leq 2\cdot \bigl({\varDelta} - 2\cdot s(n)\bigr)/(n-1)$$

C l —The costs incurred by the edge (u 1,v 1):

As we assumed that every team’s trip to team n starts at the home-site and leads back there after the match, there is always a trip ending or starting with a trip along the edge (u 1,v 1). Apparently, these are always trips between teams being neighbors on the heuristically obtained tour T Heu . As these teams will meet in both halves of the games, the edges have to be counted twice and the cost incurred on that arc can thus be estimated as

$$C_l\leq 2d(n-1,1) + 2\sum_{i=1}^{n-2} d(i,i+1)\leq 2\cdot d(T_{\mathit{Heu}})$$

C r —The costs incurred by the edge (u n/2−1,v n/2−1):

In the first half of the season, the edge (u n/2−1,v n/2−1) always marks the end of a trip, whereas it stands for the beginning of a trip in the second half of the season. The costs incurred in both halves together can be estimated as follows.

(1)

with opt i denoting the length of team i driven in an optimal solution of total length opt. Every possible solution has to contain a trip for any team i∈{1,…,n/2} which covers team i+n/2−1. For the length of this trip is not longer than d(i,i+n/2−1)+d(i+n/2−1,i) and we can make similar observations for the other teams as well, inequality 1 follows.

C o —The other costs:

As already mentioned earlier in this paper, we do not only consider the orientation of the arcs as displayed in Figs. 13. Instead, we will consider k different orientations. The difference between them is the width of the rightmost block, the block including the arc (u n/2−1,v n/2−1) or resp. (v n/2−1,u n/2−1). For l∈{1,…,k} let O l be the orientation in which the rightmost block has width l, the blocks in the middle have width k and the leftmost block contains the rest. In case, this leads to the leftmost block containing exactly k arcs, we change the orientation of the edge (u 1,v 1), such that the arc incident to team n cannot prolong the road trips induced by this block to have a length of k+1 matches.

In every half, every team i is associated to one of the nodes v 1,v 2,…,v n/2−1 exactly once. When it is associated to node v j it plays against the team (i+j−1)mod(n−1)+1 which is associated to node u j at that time. In case the edge (u j ,v j ) marks the first or the last game of a road trip in the first or the second half of the tournament, we call this edge a home-edge (the dashed arcs in Fig. 5). If the home-edge corresponds to the beginning of a trip in the first half of the season, it marks the end of a tour in the second half of the season. Therefore, the distance associated with this edge is driven exactly twice in the corresponding tournament. Let us have a closer look at the costs which are being incurred by teams traveling along the home-edges. Since every direct travel from or to i’s home site can only happen via exactly one home-edge, and as there are at most two orientations in which some edge (u j ,v j ) is a home-edge, the overall costs incurred by the home edges is at most 2Δ. It still remains to estimate the distances traveled which are not from or to the traveling teams’ home sites. A trip which visits l teams consists of two drives along home-edges and l−1 drives inbetween. By construction, these l−1 rides are driven along edges which are part of the heuristically obtained tour T Heu . Let u j be a node which does not represent the beginning of a trip. Whenever a team i is assigned to this node, there is another team l visiting i after having played an away match at the team i−1, the predecessor of i in T Heu . Thus, for any node u j or v j which does not represent the beginning of a trip, we can estimate the sum of the distances driven to get to the teams assigned to this node as no more than d(T Heu ). Since there are no more than n/2−2 such nodes, the distances driven here are not more than (n−2)d(T Heu ).

For there are k different orientations, there has to be one with total distance incurred by the home-edges not more than

$$C_o \leq \frac{2{\varDelta} + (n-2) d(T_{\mathit{Heu}})}{ k}$$

5 The approximation ratio

If we choose the parameters in the above mentioned ways, we obtain an approximation ratio of

As kn−1, this bound cannot be larger than \(5 + \frac{3}{n} +3/(2\cdot k )\) which is not more than 5.875 for k≥4 and n≥6.

6 Computational results

In order to demonstrate this algorithm’s applicability to real-world instances, we applied it to the well-known benchmark-instances provided on Trick’s site (Trick 2010). At the core of every timetable generated this way lies the solution of a Traveling Salesman Problem. For that task, we applied the software LKH in the version 2.0.3, a very effective implementation of the Lin-Kernighan heuristic developed by Helsgaun (2012). After the generation of the timetables, we finally tried to improve the obtained solutions by changing the venues of single matches. More precisely, we check for every pair of teams whether it would be less expensive and feasible if they swapped their home/away roles for the matches between them. Whenever this is the case, we apply this change and look for more moves of that kind until no further improvement is possible this way. This neighborhood has been applied very successfully in several other approaches (Anagnostopoulos et al. 2003; Ribeiro and Urrutia 2007; Di Gaspero and Schaerf 2007) where it is called SwapHomes. We found it could help to improve our schedules, too. ling Salesman Problem. For that task, we applied the software LKH in the version 2.0.3, a very effective implementation of the Lin-Kernighan heuristic developed by Helsgaun (2012). After the generation of the timetables, we finally tried to improve the obtained solutions by changing the venues of single matches. More precisely, we check for every pair of teams whether it would be less expensive and feasible if they swapped their home/away roles for the matches between them. Whenever this is the case, we apply this change and look for more moves of that kind until no further improvement is possible this way. This neighborhood has been applied very successfully in several other approaches (Anagnostopoulos et al. 2003; Ribeiro and Urrutia 2007; Di Gaspero and Schaerf 2007) where it is called SwapHomes. It could help to improve our schedules, too.

The results of our computational experiments are displayed in Table 2. The first three columns contain the instances’ names, the value of their best known feasible solutions (UB) and the corresponding lower bounds (LB) as they could be found at Trick (2010) on October 27, 2010. Next, there are two columns which show the total distance driven according to the approximation and the relative gap compared to the best solutions known so far. Finally, the last two columns show the objective function after the execution of the simple greedy 2-Opt-heuristic mentioned above.

Table 2 Computational results for TTP(3)

The running time is very reasonable. Only Galaxy40, Galaxy38, Galaxy36, and Brazil24 took between one and two seconds. The others were solved in shorter time. The computation of all the schedules together took only 13.61 seconds on a regular desktop computer with 2.4 GHz and 4 GB RAM using only a single thread. The running time of the Lin-Kernighan heuristic is already included in these figures.

Comparing the values of our solution with the known lower bounds, we observe that the optimality gap we experience here seems to be somewhere between 15% and 20% and that this gap does not seem to increase with the number of teams increasing. (Actually, in case of the NFL-instances, it does decrease from 21% for NFL16 to 15% for NFL32.)

This behavior is very different to that of the approaches considered so far. The instances with only a few number of teams are by now all solved optimally, but with the number of teams increasing, the gaps between the known upper and lower bounds grow as well. For this reason, our algorithm does not provide better solutions than the ones already known for leagues with only a few teams. But with the number of teams increasing this situation changes slightly. For the well studied problem NFL32 our generated solution is only 5% more expensive than the best solution known so far. We could even find better solutions for the instances Galaxy22, Galaxy24, Galaxy26, Galaxy28, Galaxy30, Galaxy32, Galaxy34, Galaxy36, Galaxy38, and Galaxy40.

We also generated schedules for TTP(k) with k=4,5,6 (see Table 3). In these cases, we could lower the total distances driven on average by 13% in the case of k=4, and by 18% and 23% for k=5 and k=6, respectively.

Table 3 Comparison of computational results for TTP(k) with k∈{3,4,5,6}

7 Conclusions and outlook

In this paper, we present the first constant factor approximation for TTP(k) with k>3. Furthermore, we show that this algorithm is also applicable to real-world applications as it produces solutions of high quality in a very short amount of time. It was able to find solutions for the benchmark instances Galaxy22, Galaxy24, Galaxy26, Galaxy28, Galaxy30, Galaxy32, Galaxy34, Galaxy36, Galaxy38, and Galaxy40 which are better than the previously known ones.

The reason for this behavior might be that for leagues with only a small number of teams, it is possible to scan a big portion of the search space in a sophisticated way and this is basically how many algorithmic approaches for the TTP work. As our algorithm only does one smart guess, we cannot hope to find solutions as good as those which can be found by enumerative approaches for a small number of teams, but we can expect to find reasonable results even for big leagues in a very short amount of time.

For the future, it would be promising to implement some more local-search techniques. We have only implemented a greedy heuristic making use of SwapHomes which is just one out of several neighborhoods.

Furthermore, one could think about how to apply these concepts to cases in which other constraints are placed on the schedule. For example, how does the approximation ratio change, if certain matches have to be put in certain rounds or some teams cannot play at home at the same time?