Basic Mazes

Frank Lynam

My grandfather introduced me to my first algorithm when I was six or seven: a basic maze. It was in QBasic, although I'm certain the approach was ported from an earlier BASIC. He'd print out long sheets of dot matrix printer paper and I'd solve the mazes by hand on the kitchen table.

Even today, implementing what I remember of that original algorithm here reeks of the tight memory constraints of 8-bit (or maybe even 4-bit?) computing.

*inhales deeply*

I love that smell!

The Idea

This is the basic idea:

Neat, right? I think so. Let's talk about what's going on here.

I call it the "worm" algorithm, because you basically just pick a cell and wander around over and over, making little worms.

When the first worm runs into its own trail (a Tron-style dead end), it finalizes itself. It becomes the basis for the rest of your map.

Since every part of the map gets added from one of these worms, we can readily see the first interesting feature of this algorithm: there's gotta be only one way to get between any two points in the completed maze.

When there's just one worm, we don't connect it to itself, so we can know it won't have any loops. We connect the next worm to it at just one point, so we still have no loops. Finish the whole space like this, and there's only one way to get from any point to any other.

All the worms after the first only become "real" if they hit a "real" part of the map that's already connected to the first worm. This is what makes the algorithm non-deterministic: worms can, and do, get eliminated if they crash into themselves instead of part of the real map! Why? Well, that's the next interesting thing I want to talk about...
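Here's a minimal sketch of that whole loop in Python, using the 4-bit cell encoding described under Memory Management below (one nibble per cell: a null bit, a final bit, and down/right passage bits). The function names, constants, and exact control flow are my reconstruction, not the original BASIC:

```python
import random

# Nibble layout (see the bit table under Memory Management); these
# names and values are my reconstruction, not the original code.
NULL, FINAL, DOWN, RIGHT = 0b1000, 0b0100, 0b0010, 0b0001

def worm_maze(w, h, seed=None):
    rng = random.Random(seed)
    grid = [NULL] * (w * h)                    # every cell starts null
    first_worm = True
    while any(c & NULL for c in grid):
        # start a new worm on a random still-empty cell
        head = rng.choice([i for i, c in enumerate(grid) if c & NULL])
        grid[head] = 0                         # prospective worm cell
        while True:
            x, y = head % w, head // w
            # each move is (target, passage bit, cell that stores the bit)
            moves = []
            if x > 0:     moves.append((head - 1, RIGHT, head - 1))
            if x < w - 1: moves.append((head + 1, RIGHT, head))
            if y > 0:     moves.append((head - w, DOWN, head - w))
            if y < h - 1: moves.append((head + w, DOWN, head))
            nxt, bit, owner = rng.choice(moves)
            if grid[nxt] & NULL:               # empty: extend the worm
                grid[nxt] = 0
                grid[owner] |= bit
                head = nxt
            elif grid[nxt] & FINAL:            # hit the real map: join it
                grid[owner] |= bit
                finalize_worm(grid)
                break
            else:                              # crashed into itself
                if first_worm:
                    finalize_worm(grid)        # first worm becomes the root
                    first_worm = False
                else:
                    kill_worm(grid)            # bad worm: erase it
                break
    return grid

def finalize_worm(grid):
    # promote every prospective cell (null and final bits clear) to final
    for i, c in enumerate(grid):
        if not c & (NULL | FINAL):
            grid[i] = c | FINAL

def kill_worm(grid):
    # reset every prospective cell back to null, wiping its passages
    for i, c in enumerate(grid):
        if not c & (NULL | FINAL):
            grid[i] = NULL
```

Note that a finalized worm of k new cells always carves k - 1 internal passages, plus (after the first worm) exactly one connecting passage, so the finished grid ends up with one fewer passage than it has cells: a spanning tree, which is the loop-free, single-route property described above.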

The Outback Kids' Menu Problem

We have, on occasion, taken the family to the Outback Steakhouse. For a while (I don't know if this is still the case), their kids' menu had a maze with unreachable portions. My kids actually figured this out themselves by coloring in all the reachable paths and then going "what the heck!?" I was pleasantly surprised, and said "I know exactly what they did wrong."

They didn't kill the bad worms. Watch:

Just like with the crayons, the colors make it pretty obvious there are (usually) parts of this maze you just can't reach. This illustrates one of the neat features of the worm algorithm, when done right: everywhere is reachable. But only if you kill the bad worms (the ones that crash into themselves).

Memory Management

A more nuanced feature of this algorithm is the super weird way it stores the maze. You can see in the code if you want more details, but I'll give the high-level explanation here.

The browser you're reading this on probably uses 100-1,000x more memory to display this simple text file than the computer my grandfather introduced this algorithm to me on even had. We used to care a lot about using memory well, because we didn't have much of it. In fact, the idea is back in vogue lately. Whoever wrote this algorithm seemed to care very much.

The array elements I'm using to store the cells of this maze are probably 8 bytes, or 64 bits. Maybe more, probably not less these days. It depends on your computer, since you're the one running the code. But this algorithm, this algorithm was designed for 4 bits: not even bytes, nibbles!

You know how far back you need to go to use nibbles? 4-bit computers! There are still 4-bit chips out there, but this algorithm would probably have worked with this bitwise design back on the TTL-based computer my grandfather built out of wood and discrete components for his electrical engineering degree! Wild!

You can check out the code; I've more-or-less reconstructed the logic I can recall from memory, trying to stay true to the bitwise operations. The idea is that you can build this whole maze using 4 bits per cell:

Bit 0 (Null): whether this cell is still empty (no worm has claimed it yet)
Bit 1 (Final): whether this cell has been connected to the root worm
Bit 2 (Down): whether you can go down from this cell
Bit 3 (Right): whether you can go right from this cell

For the uninitiated, a bit is a single 1 or 0 in memory: the smallest unit of data storage possible, it can store a yes or a no. This algorithm, and the logic in it, was designed to use every single bit in the nibble to store a yes or a no that mattered to the way the algorithm works. That's at least 16x more efficient than how your computer is doing this math right now, just in terms of data storage.

Directions

Another dimension of this clever trick is those last two bits: down and right. Where's up and left? Well, if you wanna know if you can go up, check the cell one square up and see if you can go down! Neat trick!
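As a tiny illustration of that trick (the function names here are mine, not from the original), checking "up" or "left" is just peeking at a neighbor's down or right bit:

```python
DOWN, RIGHT = 0b10, 0b01  # the two passage bits from the table above

def can_right(grid, w, x, y):
    return bool(grid[y * w + x] & RIGHT)

def can_down(grid, w, x, y):
    return bool(grid[y * w + x] & DOWN)

def can_left(grid, w, x, y):
    # "left from (x, y)" is just "right from (x - 1, y)"
    return x > 0 and can_right(grid, w, x - 1, y)

def can_up(grid, w, x, y):
    # "up from (x, y)" is just "down from (x, y - 1)"
    return y > 0 and can_down(grid, w, x, y - 1)
```

The boundary checks fall out for free: at the left edge there is no cell to the left, so `can_left` is simply false.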

Compare this information density with how I'd store a simple circular map in one of my first programs back when I was 8:

1 1 1 1 1
1 0 0 0 1
1 0 1 0 1
1 0 0 0 1
1 1 1 1 1

The 1s are walls and the 0s are places you can walk. Each of these would probably be stored in a 2-byte integer, so 16 bits. Now that same maze using the two-bit technique here:

3 2
1 0

We go from 25 2-byte array elements to 4 2-bit elements, which is 400 bits down to 8 with the same total information. Even giving me undeserved credit for imagining I could use individual bits for my 5x5 map, we're still looking at 8 bits here vs 25 bits there.
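Just to make the packing concrete, here's one way those four 2-bit cells (3, 2, 1, 0) could share a single byte in Python; the helper names are mine:

```python
cells = [3, 2, 1, 0]

# pack four 2-bit cells into one byte, cell 0 in the lowest two bits
packed = 0
for i, c in enumerate(cells):
    packed |= (c & 0b11) << (2 * i)
# packed is now 0b00011011 (27): all four cells in a single byte

def unpack(packed, i):
    # pull cell i's two bits back out
    return (packed >> (2 * i)) & 0b11
```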

Logic

Beyond the clever direction tricks, the entire algorithm logic is basically stored in those first two bits. For the non-programmers out there, they might seem a little odd, so let me dig in here as well.

The first bit tells us whether we care about the direction bits. When you first store the map in memory, you'll have to set all the cells to something if you don't just want a big mess, and I do that with a -1. A -1 in binary is typically stored as all 1s (that's two's complement). So, our direction bits initially store "right" and "down" when you first build the array. How do we know that isn't part of the final map, or part of a prospective worm? From the first bit! If the first bit is 1, then the cell is null: it doesn't have valid data yet.

The second bit gets unset when we put data in as part of a prospective worm, setting the whole nibble to something from 0 (0000) to 3 (0011). When we finalize a worm, we set that second bit, giving us a value from 4 (0100) to 7 (0111). This lets us know when a prospective worm has collided with a finalized portion of the map.

With these bits, we can keep loads of info about what parts of the map mean what in a tiny space, and it seems to be just enough to implement this algorithm. The only extra info I end up needing to store elsewhere is where the head of the current prospective worm is, and whether this is the first worm or not. No stack, no heap, no FIFO, nothing. Just the 4-bit map, the current head of the current worm, and whether or not it's the first one.
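The lifecycle of a single cell, as described above, looks something like this in Python (a sketch of the bit-twiddling, with my own constant names):

```python
NULL, FINAL, DOWN, RIGHT = 0b1000, 0b0100, 0b0010, 0b0001

cell = -1 & 0xF         # initialize: -1 is all 1s, masked to a nibble -> 0b1111
assert cell & NULL      # the null bit is set, so the direction bits are junk

cell = RIGHT            # a prospective worm claims the cell and carves right
assert 0 <= cell <= 3   # prospective cells: 0 (0000) through 3 (0011)

cell |= FINAL           # the worm got finalized: set the second bit
assert 4 <= cell <= 7   # final cells: 4 (0100) through 7 (0111)
```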

Determinism

An obvious issue with this algorithm is that it's non-deterministic, meaning it can take longer because sometimes the worms have to die. Watch it next to a deterministic one:

I gave them both the same starting cell to help illustrate, but the deterministic solution is almost always much faster regardless. And this difference gets worse non-linearly (much, much worse) as the size of the map scales up, because it just keeps getting harder for the new worms to find the old ones.

But, I think there's something special about the worm approach. The maps look different. How different is probably a complex enough question for a separate post, but suffice it to say for this discussion, the difference is pretty self-evident.

This deterministic model works on a rough concept of nucleation. A single cell is selected as a nucleation site, and all adjacent cells are stored in a list as a crystallization region. Cells from the crystallization region are then selected, one at a time, to nucleate and connect back to the completed cells, until the whole map is filled.

This implementation of the nucleation approach is not memory efficient, since it stores the crystallization region in a separate array. You could trade computation time for memory efficiency, though, so arguably this maze could be just as memory dense as the other. Maybe even more memory dense.
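For comparison, here's a sketch of the nucleation approach in Python. It's essentially a randomized Prim's algorithm; the names, and my choice of a set for the crystallization region, are assumptions rather than the original implementation:

```python
import random

DOWN, RIGHT = 0b10, 0b01        # reuse the two passage bits per cell

def nucleation_maze(w, h, seed=None):
    rng = random.Random(seed)
    grid = [None] * (w * h)     # None = not yet crystallized

    def neighbors(i):
        # (neighbor, passage bit, cell that stores the bit)
        x, y = i % w, i // w
        if x > 0:     yield (i - 1, RIGHT, i - 1)
        if x < w - 1: yield (i + 1, RIGHT, i)
        if y > 0:     yield (i - w, DOWN, i - w)
        if y < h - 1: yield (i + w, DOWN, i)

    start = rng.randrange(w * h)                    # the nucleation site
    grid[start] = 0
    frontier = {n for n, _, _ in neighbors(start)}  # crystallization region
    while frontier:
        cell = rng.choice(sorted(frontier))  # sorted so the seed reproduces
        frontier.discard(cell)
        grid[cell] = 0
        # connect back to one randomly chosen already-crystallized neighbor
        done = [m for m in neighbors(cell) if grid[m[0]] is not None]
        n, bit, owner = rng.choice(done)
        grid[owner] |= bit
        # newly exposed empty neighbors join the crystallization region
        frontier.update(n for n, _, _ in neighbors(cell) if grid[n] is None)
    return grid
```

The `frontier` set here is exactly the extra memory the worm version never needs; replacing it with a rescan of the grid for empty cells adjacent to finished ones would be the time-for-memory trade mentioned above.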

But its mazes are... boring? They have many short hallways. Not nearly as much space to get lost. Faster, but, worse.

We'll explore how to measure that in the next post =]

© 2025 Frank Lynam. All Rights Reserved.