There’s a CodeWars kata called Twice linear. It’s one of those problems where a naive solution is just not feasible. Not to mention the first tests aren’t indicative of what’s expected so by the time I hit the wall of very big numbers, I already had a solidified (and naive) mental model. Getting rid of that isn’t an easy task.

Generating the list of numbers isn’t difficult. Keeping it sorted and not adding duplicates makes it a bit more tricky. I tried using Ruby’s SortedSet at first, then Enumerators and plain loops, but everything crapped out at bigger numbers. All tests are supposed to run within 12 seconds on CodeWars, and my code at that point was taking 20+ minutes to run for 60000.

I realized that the real problem is generating the list only to the number we need, without all the later numbers that would make the list needlessly huge. I tried throwing away early numbers that the loop’s already passed, I tried figuring out some rule between the number sought and when it gets its final position on the list (due to sorting that kept changing), I tried memoizing using hashes, but nothing really worked.

I tried going a different way then, trying to figure out the maths of the sequence, without much luck. I even tried reading a related paper on Hamming’s problem by Dijkstra, but I couldn’t really understand it.

Then I found a Python solution. Translating it to Ruby wasn’t a problem, but figuring out why and how it worked was exciting. The trick is to keep adding the numbers to the list in order, so that it doesn’t grow to ridiculous sizes and I don’t have to worry about uniqueness or sorting.

Keep the two series (the doubles and the triples) indexed separately, and only add triples once the next double would be bigger than the next triple. This means that the numbers are added to the list in order and I only need to check if the next double is also the next triple to ensure uniqueness. In other words, I just needed to combine the two linear sequences.

This way I could generate the list only to the number I was looking for, reducing my 20+ minute 60000 run to less than a second for all tests. I didn’t even have to use memoization or any particular optimizations either, but that could make it even faster (as multiples wouldn’t have to be calculated at every iteration).