Algorithms: Date Range Coalescing

I’ve done a lot of odd jobs in the software business over the years, including, but not limited to, automated UI testing, database management, package maintainer, network administration, system administration, SELinux/SEAndroid policy writing, and download script writing. My favorite thing to do, though, is algorithm writing. I don’t get to do much of it, but I have fond-yet-frustrating memories of sitting on the floor of my cubical with a pad of paper furiously drawing out ideas so I can get them out of my head and put them into code.

One of the early ones I did was a simple algorithm to coalesce a group of date ranges. The application for this algorithm would be for any situation where you needed to group blocks of time together, such as for a calendar scheduler. For example, if you need to put in a new appointment on the calendar of several, and you needed to see all available spots or simply find the first available time slot that all parties are available.

When first approached with this problem, my colleague and I had the simple approach of just grouping everything into half hour or even fifteen minute buckets, and we would just scrape along and find a time bucket everyone had unoccupied. I didn’t like that idea, as it meant that we would need to store discrete blocks of time in the database somehow. I thought a more graceful solution would be to find a way to locate availability by analyzing times.

Visually speaking, this is an easy problem to solve if all your appointments are laid out next to each other like this:

Crappy image courtesy of my Motorola Xoom and my bad handwriting.
Everyone has free time from 10-11.

Sure, we can see that everyone has an open slot from 10 to 11. But usually, we have our data presented to the application like so:

{Chris: [ [9:00, 10:00], [12:00, 13:00] ],
Steve: [ [11:00, 12:00] ],
Jen: [ [11:00, 12:00] ],
Anne: [ [9:00, 9:30], [11:30, 12:30] ]}

This is just a small example; if you have a much larger dataset, you can imagine this would be not as trivial to do. You have to approach this from a programmatic standpoint.

Attack Strategy

The general plan of attack is to try to group overlapping blocks of time into superblocks. You’ll notice that the open times sit in between the end of the last superblock and the beginning of the next superblock.

So how do we find the superblocks? We can borrow the simple approach from a basic parenthesis matching algorithm, which is:

  1. Start with i=0
  2. Traverse your string with the following rules:
    if the character is an opening parenthesis, then i = i+1;
    if the character is a closing parenthesis, then i=i-1;
  3. Once i==0, then the parentheses are balanced

When I first heard this algorithm, I thought it was so obvious that it wasn’t worth even mentioning. Then, I found myself doing it regularly, and I realize I probably wouldn’t have come up with it on my own.

Anyhow, we can treat our date tuples like opening and closing parentheses, and when we reach zero, then we know that we’ve concluded a superblock.

Algorithm

The first thing we need to do is to make sure our data structures are all in place. The data type you use will largely depend on the language you are using, but anything that is sortable will do. You can use datetime objects or UNIX epoch seconds as integers. For the sake of simplicity, we will use simple integers here with Python.

dates = [(1,4),(2,8),(12,16),(16,21)]

 

Now remember, we don’t care who these values are associated with or how they’re currently sorted; you can pull them all from your database as one big query and dump it into your local variable.

First, we need to convert these into a flat array of date values with some sort of distinguisher between an opening or closing. There are many ways to do this, such as using a list of objects that derive from a base class as a StartingDate or EndingDate, or an object with a Value and a Type, but I will choose to simply make it a tuple, with the first value being the date value, and the second being a 1 (starting) or -1 (ending). Then, we will sort this list.

parsed_dates = []
for date in dates:
    parsed_dates.extend([(date[0], 1),(date[1], -1)])
parsed_dates.sort(key = lambda d: d[0])

Next, we iterate through our new parsed_dates structure and build our coalesced superblock list. The general rules we apply as such — for each date value:

  • if our counter is zero before processing the date value, then create a new superblock object and set the starting value to the current value.
  • increment or decrement our counter
  • if our counter is zero after processing the date value, then set the end value of the current superblock object and append it to our coalesced superblock list.

The code is as follows:

count = 0
coalesced = []
for date in parsed_dates:
    if count == 0:
        current_block = [date[0], None]
    count += date[1]
    if count == 0:
        current_block[1] = date[0]
        coalesced.append((current_block[0], current_block[1]))

Running this will yield the result:

[(1, 8), (12, 16), (16, 21)]

Perfect! Wait a minute… it shows two adjacent superblocks (12, 16) and (16, 21). The gap between these values is 0, which provides very little benefit to the end user, as no one will try to schedule an appointment from 4:00 to 4:00. We’ll have to improvise this a little.

To fix this problem, we’re going to add in a quick adjacent superblock detection guard. If the end of the last superblock is the same as the start of the current superblock, then simply remove the last superblock and retain the value of the last superblock. The end will be overridden the next time a closing action brings the value back down to zero.

 for date in parsed_dates:
        if count == 0:
            if not coalesced or (coalesced[-1][1] != date[0]):
                current_block = [date[0], None]
            else:
                coalesced.pop()
        count += date[1]
        if count == 0:
            current_block[1] = date[0]
            coalesced.append((current_block[0], current_block[1]))

We have to get a little cute with our if guard. The coalesced[-1][1] != date[0] checks the last inserted element, seeing if the values are different. The not coalesced statement handles the edge condition on the first iteration, when the coalesced list is empty so we don’t hit a range exception. If the conditions are not met (meaning the values are indeed the same), then we remove the last element from our return list. The current_block should still have the start date from the last iteration and will be added back when we close the current block.

The complete codebase can be found on my GitHub page. You’ll also find the first approach I took with it, involving two separate start/end arrays. It works just the same, but this way presented here is more straight forward.

Leave a Reply

Your email address will not be published. Required fields are marked *