My kids are on a summer swim team in the local league. It's a great activity--exercise, a bit of healthy competition, and all four of my kids and step-kids are on the same team for one season out of the year. It's a lot of fun for everyone.
It's complicated, though. There are five age groups for both boys and girls for a total of ten competition groups. There are 4 strokes (fly, back, breast, and freestyle). Each swimmer is assigned to one of three classes for each stroke. The A class has the faster kids, the B class the next faster kids, and the C class has the rest. First, second, and third place in each stroke-class (for each age/gender group) earn points for the team. The A, B, and C classes all count equally, in the spirit of the league. It's 5 points for a 1st, 3 for a 2nd, and 1 for a 3rd, regardless of class. This way, nearly everyone's performance can affect the outcome of the meet.
There's only one strategic variable in the meet. Each swimmer can only swim in 3 of the 4 stroke events in the meet. In other words, each swimmer has to skip one of the four strokes for the day. The manager of each team seeds the meet a couple days beforehand. You'd think that the best strategy is to have each swimmer participate in their 3 best strokes. But that's not the case.
Consider a swimmer who can take 2nd place in his age group/class in his best stroke, and who can take 3rd place in his worst. His manager may want to have him skip his best stroke if the team's next fastest swimmers were poised to take 3rd and 4th, (sliding up to 2nd and 3rd). The swimmer's absence in his best stroke would not cost his team any points, and his addition in his slowest stroke would add a point for his team. With 110 swimmers, 10 age/gender groups, and 3 classes in each group per team, there could be several convoluted and counter-intuitive possibilities--too many for any one person to think through.
I thought this could be conquered quickly by using Excel's its 'Solver' optimization add-in. But after a lot of work I learned that Solver doesn't play nice with problems like this. The swim meet scoring function is non-linear and non-continuous. There are only binary options (does a swimmer participate in a certain event or not?). Each option can have a large impact on the outcome. After investigating and experimenting, I realized Solver just can't handle complex binary problems.
It's a fun problem, so I pursued it further by coding my own solution. The swim league isn't intended to be super-competitive, and each kid should have a fair opportunity to compete in each stroke. But occasionally our team is challenged by our cross-town rivals for the championship, and the team wants to go all-out. With the swimmers' times posted online, all the necessary data is available to analyze.
There are two parts to the optimization problem. The first is computational complexity, and the second is strategic and related to game theory.
I needed way to run through all possible combinations of event participation for each group, and identify which combination provides the optimum score. This sounds easy, and it is except for one thing--it's exponentially complex. There are 4 possible combinations of events for every swimmer in a group. If there are 6 swimmers in an age/gender group, that's 4^6 = 4,096 different combinations we need to check for the optimum score. Any computer can do that in a flash. What if there are 8 swimmers in a group? That's 65,536, which is still very crunchable in a few seconds. What about 12 swimmers in a group, which is very common? That's 4^12 = 16,777,216 permutations, which takes the Advanced NFL Stats supercomputer over 2 hours to crunch.
The toughest challenges are caused by the largest groups. The 8-10 girls age group on our team has 20 swimmers, which means there are 1,099,511,627,776 combinations of swimmer-events, just for one team. Uh-oh. That would take 5,461 days to crunch. We can do some triage to exclude some swimmers who rank low enough in every stroke/class not to matter, but directly computing the optimum would still take far too much time. Using a compiled executable program or multi-threaded coding could also accelerate things, but not to the degree necessary for the larger groups.
From xkcd.com (If you get this right away, you're way ahead of me.)
The second part of the problem has to do with game theory. This isn't the typical engineering or transportation optimization problem. There's a thinking human on the other side of equation trying to win. The value of my team's optimum lineup depends on the opponent's lineup and vice versa. In the optimization problem, I can only control my own team's lineup. Otherwise, the algorithm would make the other team swim all their worst strokes.
The algorithm needs a starting point to represent the opponent's lineup of events. A team's lineup needs to be drafted without knowledge of the other's. There are a few options here. We can start with a randomly generated opponent lineup, and then iterate. Once I've optimized against a random opponent lineup, I could optimize the opponent lineup vs my own team's, and then optimize my own lineup again to counter the opponent's optimization. Presuming there isn't another sports-analytics-obsessed parent on the other team, this might be a good approach.
Further, we can continue to iterate until we reach a Nash Equilibrium where both teams are optimized against each other's optimizations, and no team could benefit from changing its lineup. Although it's not realistic to ever expect that level of overkill by managers, it's still interesting to see where the equilibrium is and what score it represents. This approach is feasible for smaller groups of swimmers, but not for larger groups--at least using the direct computational method. It requires the optimization to be run multiple times.
Another method is to start with an intuitively reasonable opponent lineup. A very good heuristic is to have each swimmer participate in his three most competitive strokes. This would be reasonably optimum except for combinations like those mentioned previously--when it makes sense for a competitive swimmer to allow slower swimmers on his own team to score just as many points in order to allow him to place in another stroke. I'm not the manager of our team, but I imagine this is how most managers seed their lineups.
A classic example of computationally complex optimization problem is the traveling salesman problem, aka the TSP. Consider a salesman who must visit 10 cities and return to his starting point. There is known time/distance between each city, and the salesman needs to visit each city exactly once in the minimum amount of time. There are 362,880 combinations of paths he could travel. Add an 11th city, and there are now 3,628,800 combinations. Every day in the working life of a FedEx or UPS driver is one big TSP.
One of the ways to crack the TSP is to use a genetic algorithm. The power of evolution's selection effect can be harnessed to quickly approximate the optimum solution, and often find the precise optimum solution. For extremely complex problems, the tradeoff between computational time and the level of optimality usually makes sense. I thought this approach was super-cool, so I thought I'd try to apply it to the swim meet problem.
Here's how I approached the swim meet optimization problem using genetic concepts. In every age/gender group, think of the lineup of events that each swimmer skips as a genome--a strand of DNA. But instead of GCTACTCAGGCA, it's Fly-Breast-Fly-Fly-Back-Free-Fly-Breast-Free... I created an original generation of 30 "individuals" (combinations of own-team swimmer-event lineups) randomly. Each different combination of swimmer-event "DNA" will produce a fitness level, determined by the meet scoring rules.
Then I spawned a new generation of 30 individuals (lineups) based on three different methods. First, I kept the 10 best lineups from the previous generation, defined as producing the highest meet scores. These are known as "the elite." The other 20 lineups die off and are replaced by children of the elite. This ensures the evolution process doesn't randomly regress into less fit populations.
Ten of the next-generation "children" are created by asexual mutation. Each of the elite is randomly mutated slightly to create a batch of new children, some of which may happen be more 'fit' than their parent. [Yes, that's right. I wasted an unknown amount of time making imaginary mathematical 'asexual mutant children.']
I used a set of data from our team's closest meet this year as a test. I selected an age group that happened to have 12 swimmers from each team. This provided a useful test because it's enough swimmers to make things tough but also solvable by brute force. That way I know the true optimum as the answer key for testing the genetic algorithm.
The brute-force optimization approach took 2 hours and 14 minutes to evaluate every possible lineup. The score was 46 for the Barracudas, our home team, to 62 for the Lightning, our rival. (We happened to be outgunned in this age group.) This result was based on a single iteration optimization versus the 'intuitively reasonable' opponent lineup heuristic, where each opponent swimmer simply swims his three most competitive strokes. It began from a standing start, initialized with a population of randomly chosen lineups.
The genetic algorithm took a fraction of the time of the brute force computation, averaging 17 seconds to reach optimum and then continue for another 1000 iterations in vain search for a higher score. In the dozens of times I ran it, the algorithm never failed to produce a lineup worth the known optimum of 46 points. It reached the optimum after an average of 75 generations, which equates to 75 * 30 = 2,250 potential solutions, and never exceeded 184 generations.
The impractically large group of 20 swimmers produced a (probable) optimum in a very steady average of only 24 seconds. I wrote probable because I'm not going to try to run the program for 12 years to search all the possible combinations. The genetic algorithm consistently produced the same "best" score every time I ran it, but I can't prove it's the optimum. I think the reason it worked so well with the very large group was because the larger groups still has the same number of swimmers as the smaller groups that matter in terms of scoring, and the rest of the swimmers can choose any event combination without an impact on the meet score. (Kind of like real-world 'junk DNA', I guess.)
I could probably tune the constants in the model to improve the speed. The number of 'stall generations' allowed where there is no longer any improvement was set to 1000, which could be safely lowered to quicken things. The biggest help would be to initialize the population with reasonably high scoring lineups instead of random ones, but that wouldn't have been a fair comparison with the brute force method. The number of elites, mutants, and children could also be tweaked to improve things, but I'm satisfied with it as is. Besides, this is more of a learning exercise than a real attempt to squeeze out an extra few points at the meet.