tag:blogger.com,1999:blog-1479757307568145576.post4593267143213944314..comments2024-03-04T23:57:43.870-05:00Comments on Bentley Historical Library Curation Team Blog: Normalizing Dates with OpenRefineUniversity of Michigan Bentley Historical Libraryhttp://www.blogger.com/profile/09456699983075988941noreply@blogger.comBlogger2125tag:blogger.com,1999:blog-1479757307568145576.post-29221059896525932822015-07-06T11:55:23.741-04:002015-07-06T11:55:23.741-04:00Thanks for the comment! Glad you enjoyed the post....Thanks for the comment! Glad you enjoyed the post.<br /><br />That is a really helpful tip. I experimented a bit with the toDate function initially, but OpenRefine did not seem to reliably and accurately recognize the starting format. If I had known that you could include the format in the command, and then converted the date back into a string, it could have saved me some time and effort, at least with some of the more accurate and uniform kinds of dates. <br /><br />One of the benefits of using the somewhat more labor intensive method described in this post was that I was able to take a really close look at the dates by faceting on months, years, days, etc. to seek out incorrect data. We had a lot of dates that were invalid (e.g., "April 31"), others that were incorrectly typed (e.g., the begin date came after the end date). and others still that were not dates at all (e.g., four-digit addresses that had been improperly identified as dates by an encoding script many years ago). I'm not sure how toDate and toString would have handled those, but I will definitely be putting those functions to use for projects like this in the future.<br /><br />Thanks again for the great comment!<br />Anonymoushttps://www.blogger.com/profile/02478420937302320293noreply@blogger.comtag:blogger.com,1999:blog-1479757307568145576.post-41325308945875902502015-07-04T03:55:36.945-04:002015-07-04T03:55:36.945-04:00Really interesting post - thanks for writing this....Really interesting post - thanks for writing this.<br /><br />In case it is helpful, another strategy you can adopt with dates in OpenRefine is using the 'toDate' function. This converts things into OpenRefine's Date format. Once you have a date in this format you can convert it back into a string specifying the format you want for the string. When you us 'toDate' on a value OpenRefine will do its best to recognise the starting format, but sometimes you have to prompt it by including the format in the command.<br /><br />To take a simple example if you start with "10 May 1976" you can use:<br /><br />value.toDate("dd MMM yyyy").toString("yyyy-MM-dd")<br /><br />The 'dd MMM yyyy' tells OpenRefine the format of the date you are starting with and the 'yyyy-MM-dd' tells it the format you want to end up with - so in this case you'd get "1976-05-10".<br /><br />To work a real example - you have "May 1909-July 1968" and want "1909-05/1968-07". You can do this with:<br /><br />forEach(value.split("-"),v,v.toDate("MMM yyyy").toString("yyyy-MM")).join("/")<br /><br />This splits the original string into an array (using 'split') then for each member of the array it tries to convert it to a Date, then back to the correctly formatted string.<br /><br />Thanks again for the postostephenshttps://www.blogger.com/profile/09316058230103850251noreply@blogger.com