{"id":20,"date":"2017-07-09T04:28:25","date_gmt":"2017-07-09T04:28:25","guid":{"rendered":"http:\/\/edspace.com\/blog\/?p=20"},"modified":"2017-07-09T04:31:20","modified_gmt":"2017-07-09T04:31:20","slug":"algorithms-date-range-coalescing","status":"publish","type":"post","link":"http:\/\/edspace.com\/blog\/2017\/07\/09\/algorithms-date-range-coalescing\/","title":{"rendered":"Algorithms: Date Range Coalescing"},"content":{"rendered":"<p>I&#8217;ve done a lot of odd jobs in the software business over the years, including, but not limited to, automated UI testing, database management, package maintainer, network administration, system administration, SELinux\/SEAndroid policy writing, and download script writing. My favorite thing to do, though, is algorithm writing. I don&#8217;t get to do much of it, but I have fond-yet-frustrating memories of sitting on the floor of my cubical with a pad of paper furiously drawing out ideas so I can get them out of my head and put them into code.<\/p>\n<p>One of the early ones I did was a simple algorithm to coalesce a group of date ranges. The application for this algorithm would be for any situation where you needed to group blocks of time together, such as for a calendar scheduler. For example, if you need to put in a new appointment on the calendar of several, and you needed to see all available spots or simply find the first available time slot that all parties are available.<\/p>\n<p><!--more-->When first approached with this problem, my colleague and I had the simple approach of just grouping everything into half hour or even fifteen minute buckets, and we would just scrape along and find a time bucket everyone had unoccupied. I didn&#8217;t like that idea, as it meant that we would need to store discrete blocks of time in the database somehow. I thought a more graceful solution would be to find a way to locate availability by analyzing times.<\/p>\n<p>Visually speaking, this is an easy problem to solve if all your appointments are laid out next to each other like this:<\/p>\n<figure id=\"attachment_25\" aria-describedby=\"caption-attachment-25\" style=\"width: 1280px\" class=\"wp-caption alignleft\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"25\" data-permalink=\"http:\/\/edspace.com\/blog\/2017\/07\/09\/algorithms-date-range-coalescing\/sampleschedule\/\" data-orig-file=\"https:\/\/i0.wp.com\/edspace.com\/blog\/wp-content\/uploads\/2017\/06\/SampleSchedule.png?fit=1280%2C663\" data-orig-size=\"1280,663\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"SampleSchedule\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/edspace.com\/blog\/wp-content\/uploads\/2017\/06\/SampleSchedule.png?fit=660%2C342\" class=\"wp-image-25 size-full\" src=\"https:\/\/i0.wp.com\/edspace.com\/blog\/wp-content\/uploads\/2017\/06\/SampleSchedule.png?resize=660%2C342\" alt=\"Crappy image courtesy of my Motorola Xoom and my bad handwriting.\" width=\"660\" height=\"342\" srcset=\"https:\/\/i0.wp.com\/edspace.com\/blog\/wp-content\/uploads\/2017\/06\/SampleSchedule.png?w=1280 1280w, https:\/\/i0.wp.com\/edspace.com\/blog\/wp-content\/uploads\/2017\/06\/SampleSchedule.png?resize=300%2C155 300w, https:\/\/i0.wp.com\/edspace.com\/blog\/wp-content\/uploads\/2017\/06\/SampleSchedule.png?resize=768%2C398 768w, https:\/\/i0.wp.com\/edspace.com\/blog\/wp-content\/uploads\/2017\/06\/SampleSchedule.png?resize=700%2C363 700w\" sizes=\"auto, (max-width: 660px) 100vw, 660px\" \/><figcaption id=\"caption-attachment-25\" class=\"wp-caption-text\">Everyone has free time from 10-11.<\/figcaption><\/figure>\n<p>Sure, we can see that everyone has an open slot from 10 to 11. But usually, we have our data presented to the application like so:<\/p>\n<pre class=\"lang:js decode:true \">{Chris: [ [9:00, 10:00], [12:00, 13:00] ],\r\nSteve: [ [11:00, 12:00] ],\r\nJen: [ [11:00, 12:00] ],\r\nAnne: [ [9:00, 9:30], [11:30, 12:30] ]}<\/pre>\n<p>This is just a small example; if you have a much larger dataset, you can imagine this would be not as trivial to do. You have to approach this from a programmatic standpoint.<\/p>\n<h4>Attack Strategy<\/h4>\n<p>The general plan of attack is to try to group overlapping blocks of time into superblocks. You&#8217;ll notice that the open times sit in between the end of the last superblock and the beginning of the next superblock.<\/p>\n<p><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"30\" data-permalink=\"http:\/\/edspace.com\/blog\/2017\/07\/09\/algorithms-date-range-coalescing\/sampleschedule-superblocks\/\" data-orig-file=\"https:\/\/i0.wp.com\/edspace.com\/blog\/wp-content\/uploads\/2017\/07\/SampleSchedule-superblocks.png?fit=1280%2C663\" data-orig-size=\"1280,663\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"SampleSchedule-superblocks\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/edspace.com\/blog\/wp-content\/uploads\/2017\/07\/SampleSchedule-superblocks.png?fit=660%2C342\" class=\"wp-image-30 size-full alignleft\" src=\"https:\/\/i0.wp.com\/edspace.com\/blog\/wp-content\/uploads\/2017\/07\/SampleSchedule-superblocks.png?resize=660%2C342\" alt=\"\" width=\"660\" height=\"342\" srcset=\"https:\/\/i0.wp.com\/edspace.com\/blog\/wp-content\/uploads\/2017\/07\/SampleSchedule-superblocks.png?w=1280 1280w, https:\/\/i0.wp.com\/edspace.com\/blog\/wp-content\/uploads\/2017\/07\/SampleSchedule-superblocks.png?resize=300%2C155 300w, https:\/\/i0.wp.com\/edspace.com\/blog\/wp-content\/uploads\/2017\/07\/SampleSchedule-superblocks.png?resize=768%2C398 768w, https:\/\/i0.wp.com\/edspace.com\/blog\/wp-content\/uploads\/2017\/07\/SampleSchedule-superblocks.png?resize=700%2C363 700w\" sizes=\"auto, (max-width: 660px) 100vw, 660px\" \/><\/p>\n<p>So how do we find the superblocks? We can borrow the simple approach from a basic parenthesis matching algorithm, which is:<\/p>\n<ol>\n<li>Start with <em>i=0<\/em><\/li>\n<li>Traverse your string with the following rules:<br \/>\nif the character is an opening parenthesis, then <em>i = i+1;<\/em><br \/>\nif the character is a closing parenthesis, then <em>i=i-1;<\/em><\/li>\n<li>Once <em>i==0<\/em>, then the parentheses are balanced<\/li>\n<\/ol>\n<p>When I first heard this algorithm, I thought it was so obvious that it wasn&#8217;t worth even mentioning. Then, I found myself doing it regularly, and I realize I probably wouldn&#8217;t have come up with it on my own.<\/p>\n<p>Anyhow, we can treat our date tuples like opening and closing parentheses, and when we reach zero, then we know that we&#8217;ve concluded a superblock.<\/p>\n<h4>Algorithm<\/h4>\n<p>The first thing we need to do is to make sure our data structures are all in place. The data type you use will largely depend on the language you are using, but anything that is sortable will do. You can use\u00a0<em>datetime<\/em> objects or UNIX epoch seconds as integers. For the sake of simplicity, we will use simple integers here with Python.<\/p>\n<pre class=\"lang:default decode:true \">dates = [(1,4),(2,8),(12,16),(16,21)]<\/pre>\n<p>&nbsp;<\/p>\n<p>Now remember, we don&#8217;t care who these values are associated with or how they&#8217;re currently sorted; you can pull them all from your database as one big query and dump it into your local variable.<\/p>\n<p>First, we need to convert these into a flat array of date values with some sort of distinguisher between an opening or closing. There are many ways to do this, such as using a list of objects that derive from a base class as a <em>StartingDate<\/em> or <em>EndingDate<\/em>, or an object with a <em>Value<\/em> and a <em>Type<\/em>, but I will choose to simply make it a tuple, with the first value being the date value, and the second being a 1 (starting) or -1 (ending). Then, we will sort this list.<\/p>\n<pre class=\"lang:python decode:true \">parsed_dates = []\r\nfor date in dates:\r\n    parsed_dates.extend([(date[0], 1),(date[1], -1)])\r\nparsed_dates.sort(key = lambda d: d[0])<\/pre>\n<p>Next, we iterate through our new <em>parsed_dates<\/em> structure and build our coalesced superblock list. The general rules we apply as such &#8212; for each date value:<\/p>\n<ul>\n<li>if our counter is zero before processing the date value, then create a new superblock object and set the starting value to the current value.<\/li>\n<li>increment or decrement our counter<\/li>\n<li>if our counter is zero after processing the date value, then set the end value of the current superblock object and append it to our coalesced superblock list.<\/li>\n<\/ul>\n<p>The code is as follows:<\/p>\n<pre class=\"lang:default decode:true\">count = 0\r\ncoalesced = []\r\nfor date in parsed_dates:\r\n    if count == 0:\r\n        current_block = [date[0], None]\r\n    count += date[1]\r\n    if count == 0:\r\n        current_block[1] = date[0]\r\n        coalesced.append((current_block[0], current_block[1]))<\/pre>\n<p>Running this will yield the result:<\/p>\n<pre class=\"lang:default decode:true \">[(1, 8), (12, 16), (16, 21)]<\/pre>\n<p>Perfect! Wait a minute&#8230; it shows two adjacent superblocks <em>(12, 16)<\/em> and <em>(16, 21)<\/em>. The gap between these values is 0, which provides very little benefit to the end user, as no one will try to schedule an appointment from 4:00 to 4:00. We&#8217;ll have to improvise this a little.<\/p>\n<p>To fix this problem, we&#8217;re going to add in a quick adjacent superblock detection guard. If the end of the last superblock is the same as the start of the current superblock, then simply remove the last superblock and retain the value of the last superblock. The end will be overridden the next time a closing action brings the value back down to zero.<\/p>\n<pre class=\"lang:python decode:true \"> for date in parsed_dates:\r\n        if count == 0:\r\n            if not coalesced or (coalesced[-1][1] != date[0]):\r\n                current_block = [date[0], None]\r\n            else:\r\n                coalesced.pop()\r\n        count += date[1]\r\n        if count == 0:\r\n            current_block[1] = date[0]\r\n            coalesced.append((current_block[0], current_block[1]))<\/pre>\n<p>We have to get a little cute with our <em>if<\/em> guard. The <em>coalesced[-1][1] != date[0]<\/em> checks the last inserted element, seeing if the values are different. The <em>not coalesced<\/em> statement handles the edge condition on the first iteration, when the <em>coalesced<\/em> list is empty so we don&#8217;t hit a range exception. If the conditions are not met (meaning the values are indeed the same), then we remove the last element from our return list. The <em>current_block<\/em> should still have the start date from the last iteration and will be added back when we close the current block.<\/p>\n<p>The complete codebase can be found on <a href=\"https:\/\/github.com\/edjmao\/algorithms\/blob\/master\/date_collapse.py\" target=\"_blank\" rel=\"noopener\">my GitHub page<\/a>. You&#8217;ll also find the first approach I took with it, involving two separate start\/end arrays. It works just the same, but this way presented here is more straight forward.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve done a lot of odd jobs in the software business over the years, including, but not limited to, automated UI testing, database management, package maintainer, network administration, system administration, SELinux\/SEAndroid policy writing, and download script writing. My favorite thing to do, though, is algorithm writing. I don&#8217;t get to do much of it, but &hellip; <a href=\"http:\/\/edspace.com\/blog\/2017\/07\/09\/algorithms-date-range-coalescing\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Algorithms: Date Range Coalescing<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[6],"tags":[],"class_list":["post-20","post","type-post","status-publish","format-standard","hentry","category-algorithms"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/paDvvN-k","jetpack-related-posts":[],"_links":{"self":[{"href":"http:\/\/edspace.com\/blog\/wp-json\/wp\/v2\/posts\/20","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/edspace.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/edspace.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/edspace.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/edspace.com\/blog\/wp-json\/wp\/v2\/comments?post=20"}],"version-history":[{"count":13,"href":"http:\/\/edspace.com\/blog\/wp-json\/wp\/v2\/posts\/20\/revisions"}],"predecessor-version":[{"id":36,"href":"http:\/\/edspace.com\/blog\/wp-json\/wp\/v2\/posts\/20\/revisions\/36"}],"wp:attachment":[{"href":"http:\/\/edspace.com\/blog\/wp-json\/wp\/v2\/media?parent=20"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/edspace.com\/blog\/wp-json\/wp\/v2\/categories?post=20"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/edspace.com\/blog\/wp-json\/wp\/v2\/tags?post=20"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}