Merging Tables with Different Date Formats in PowerPivot
Posted by Alan on March 25, 2011
So in my prior post I detailed what it took to merge different data sources with different layouts and columns into a single table within PowerPivot. While I was doing this one of the problems I ran into was the fact that there was no consistency within my data sources in regards to the date format. Some of the date formats I found were:
On top of that I also had:
- Empty Date Fields
- Date Fields with Invalid Dates (i.e., 00 for the Month or Day)
- Date Fields with Impossible Dates (i.e., 2601 for the Year)
Althought my imports worked fine, I found out that when you bring in dates in a mix of formats or with invalid values if you try to format the date column as a date PowerPivot errors out. It is understandable if you think about it. If you had 08/12/2010 and 12/08/2010 then your day and month might be wrong but the system could deal with it. What it can’t seem to deal with is something like 03/28/2010 and 28/03/2010. I also found out it definitely can’t deal with blank dates or months/days that are 00.
So my colleague Richard Mintz suggested we might try a CASE statement to put some logic into the query to adjust for the different formats. But no matter what we tried the CASE statement kept erroring out. Richard’s thought is that because of the fact that the process we had to use to merge the different Excel sheets utilizes the Access/Jet driver and Access does not support CASE this was the problem. If anyone can confirm or deny this then I would love to hear from you.
In this whole process I am trying to avoid manually manipulating the source files each time we get a new data dump. And since I could not deal with this issue in the query that meant dealing with it within the PowerPivot end-user environment. As a reminder here is a portion of my original query:
SELECT [DataSource1$].XX_ID as ‘UniqueID’,
[DataSource1$].TITLE as ‘Salutation’,
[DataSource1$].FNAME as ‘First Name’,
[DataSource1$].LNAME as ‘Last Name’,
” as ‘Title’,
” as ‘Company ID’,
[DataSource1$].COMPANY as ‘Company Name’,
[DataSource1$].ADDR1 as ‘Address 1′,
[DataSource1$].ADDR2 as ‘Address 2′,
[DataSource1$].CITY as ‘City’,
[DataSource1$].PROVINCE as ‘Province’,
[DataSource1$].POSTAL as ‘Postal Code’,
” as ‘Phone’,
[DataSource1$].LAST_CONTACT_DATE as ‘Imported Date’,
‘M/D/Y’ as ‘Imported Date Format’,
[DataSource1$].SALE_AMOUNT as ‘Sale Amount’,
‘DataSource1′ as ‘Source’
UNION ALL (repeat with next data source and modified as necessary)……
Highlighted in red are two lines around the date issue that may seem a little strange. I brought in the dates into a field I called “Imported Date” and with each data source I flagged the date format using the “Imported Date Format” field. Lets shift gears and move to the PowerPivot environment to complete the work. Take all I do in PowerPivot with a grain of salt as I am a novice at PowerPivot and this whole problem could possibly have been handled much easier than my approach.
The screenshots show the whole process already completed, we will be working from left to right. In the first screenshot below you can see how the raw dates come into the system. I know that everything in this picture seems to be the same, but that is due to the fact that you are only seeing data from the first import source. If you were to scroll down you would see the mess I am dealing with in my scenario. So after I brought my data in the first thing I did was to highlight my column and change the data type from Text to Date.
However when I tried that, I got the following lovely error:
So begins the process to fix my data. The first thing I did was to deal with my dates that were blank. For those I decided I would assign them a date that was so outrageous that everyone would immediately recognize that this was a plug. For this example I used “12/31/9999″. It is nice to know that PowerPivot has some serious life left in it in terms of the dates it can handle.
The next two steps were to deal with the situation where I had months or days that were stored as “00”. For these I decided to change the value to “01”. I don’t have the option of doing something like I did for the year with “9999” so this was the next best thing.
After that I had to deal with the fact that the year was sometimes coming in as 2 digits (MM/DD/YY) and sometimes as 4 digits (MM/DD/YYYY).
After that fix, the final step was to bring it all together. This was where I was able to adjust for the MM/DD/YYYY vs. DD/MM/YYYY issue and finally turn my text column into a date column.
I like doing my cleanup in steps in different columns as it is easier for me to troubleshoot, but there is nothing keeping you from doing this all in one step in one really big Excel function. Again, if anyone has a better suggestion for how to deal with this issue, I would love to hear from you.