Alan Whitehouse's Ramblings

Continuing to work until my heavy investment in lottery tickets finally pays off….

  • Categories

  • Archives

  • Deep Thought

    Historically speaking, all true change in the world has come thanks to leaders emerging, them taking charge and giving the masses someone to rally around. Can an intentionally "leaderless" movement survive or will it just slowly fade away?

  • Subscribe

  • Enter your email address to subscribe to this blog and receive notifications of new posts by email.

    Join 19 other followers

  • Me!

  • Disclaimer

    The views expressed in this blog, while intelligent and always right, are strictly my own, and do not necessarily reflect the views of anyone else with which I am in any way affiliated. And don't forget, I own the rights to all information on this blog (except for the stuff I stole from other people).
  • Current Rant

    *START OF RANT*

    You want change, then get involved. Vote, run for office, go to shareholder meetings and contact advertisers or investors. Sitting around banging drums, singing kumbaya, smoking weed and having a camp out under the stars is not going to get you the change you want.

    *END OF RANT*
  • Admin Stuff

Merging Tables with Different Date Formats in PowerPivot

Posted by Alan on March 25, 2011

So in my prior post I detailed what it took to merge different data sources with different layouts and columns into a single table within PowerPivot.  While I was doing this one of the problems I ran into was the fact that there was no consistency within my data sources in regards to the date format.  Some of the date formats I found were:

  • MM/DD/YYYY
  • DD/MM/YYYY
  • MM/DD/YY
  • DD/MM/YY

On top of that I also had:

  • Empty Date Fields
  • Date Fields with Invalid Dates (i.e., 00 for the Month or Day)
  • Date Fields with Impossible Dates (i.e., 2601 for the Year)

Althought my imports worked fine, I found out that when you bring in dates in a mix of formats or with invalid values if you try to format the date column as a date PowerPivot errors out.  It is understandable if you think about it.  If you had 08/12/2010 and 12/08/2010 then your day and month might be wrong but the system could deal with it.  What it can’t seem to deal with is something like 03/28/2010 and 28/03/2010.  I also found out it definitely can’t deal with blank dates or months/days that are 00.

So my colleague Richard Mintz suggested we might try a CASE statement to put some logic into the query to adjust for the different formats.  But no matter what we tried the CASE statement kept erroring out.  Richard’s thought is that because of the fact that the process we had to use to merge the different Excel sheets utilizes the Access/Jet driver and Access does not support CASE this was the problem.   If anyone can confirm or deny this then I would love to hear from you.

In this whole process I am trying to avoid manually manipulating the source files each time we get a new data dump.  And since I could not deal with this issue in the query that meant dealing with it within the PowerPivot end-user environment.  As a reminder here is  a portion of my original query:

SELECT ['MasterData$'].*
FROM ['MasterData$']
UNION ALL
SELECT [DataSource1$].XX_ID as ‘UniqueID’,
[DataSource1$].TITLE as ‘Salutation’,
[DataSource1$].FNAME as ‘First Name’,
[DataSource1$].LNAME as ‘Last Name’,
” as ‘Title’,
” as ‘Company ID’,
[DataSource1$].COMPANY as ‘Company Name’,
[DataSource1$].ADDR1 as ‘Address 1′,
[DataSource1$].ADDR2 as ‘Address 2′,
[DataSource1$].CITY as ‘City’,
[DataSource1$].PROVINCE as ‘Province’,
[DataSource1$].POSTAL as ‘Postal Code’,
” as ‘Phone’,
[DataSource1$].LAST_CONTACT_DATE as ‘Imported Date’,
‘M/D/Y’ as ‘Imported Date Format’,
[DataSource1$].SALE_AMOUNT as ‘Sale Amount’,
‘DataSource1′ as ‘Source’
FROM C:\file_path\UglyDataSource1.xlsx.[DataSource1$]
UNION ALL (repeat with next data source and modified as necessary)……

Highlighted in red are two lines around the date issue that may seem a little strange.  I brought in the dates into a field I called “Imported Date” and with each data source I flagged the date format using the “Imported Date Format” field.  Lets shift gears and move to the PowerPivot environment to complete the work.   Take all I do in PowerPivot with a grain of salt as I am a novice at PowerPivot and this whole problem could possibly have been handled much easier than my approach. 

The screenshots show the whole process already completed, we will be working from left to right.  In the first screenshot below you can see how the raw dates come into the system.  I know that everything in this picture seems to be the same, but that is due to the fact that you are only seeing data from the first import source.  If you were to scroll down you would see the mess I am dealing with in my scenario.  So after I brought my data in the first thing I did was to highlight my column and change the data type from Text to Date.

 However when I tried that, I got the following lovely error:

 

So begins the process to fix my data.   The first thing I did was to deal with my dates that were blank.  For those I decided I would assign them a date that was so outrageous that everyone would immediately recognize that this was a plug.  For this example I used “12/31/9999″.   It is nice to know that PowerPivot has some serious life left in it in terms of the dates it can handle.



The next two steps were to deal with the situation where I had months or days that were stored as “00″.  For these I decided to change the value to “01″.  I don’t have the option of doing something like I did for the year with “9999″ so this was the next best thing.

After that I had to deal with the fact that the year was sometimes coming in as 2 digits (MM/DD/YY) and sometimes as 4 digits (MM/DD/YYYY).

 

After that fix, the final step was to bring it all together.  This was where I was able to adjust for the MM/DD/YYYY vs. DD/MM/YYYY issue and finally turn my text column into a date column.

I like doing my cleanup in steps in different columns as it is easier for me to troubleshoot, but there is nothing keeping you from doing this all in one step in one really big Excel function.  Again, if anyone has a better suggestion for how to deal with this issue, I would love to hear from you.

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Follow

Get every new post delivered to your Inbox.

%d bloggers like this: