Tuesday, September 6, 2016

When Data Has a Mind of Its Own

When good data goes bad.

Can’t hurt to post another example of dirty data, and this one is pretty cool (or not, if you’re a geneticist).

BBC News, Aug 2016

“Researchers trying to raise awareness of the issue claim that the spreadsheet software automatically converts the names of certain genes into dates.”

“Gene symbols like SEPT2 (Septin 2) were found to be altered to "September 2".”

“The researchers claimed the problem is present in "approximately one-fifth of papers" that collated data in Excel documents.”

“Excel's automatic renaming of certain genes was first cited by the scientific community back in 2004, the Baker IDI study claims. Since then the problem has "increased at an annual rate of 15%" over the past five years.”
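If you want to check a gene list before it ever touches a spreadsheet, here is a minimal Python sketch. It rests on an assumption that is mine, not the study's: a symbol is at risk when it is a month name or abbreviation followed by one or two digits, like the SEPT2 example above. The function names are made up for illustration, and the pattern is only a rough heuristic, not a reproduction of Excel's actual conversion rules.

```python
import re

# Assumption: month names/abbreviations that a spreadsheet might read as a date.
MONTH_PREFIXES = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "APRIL", "MAY", "JUN", "JUNE",
    "JUL", "JULY", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# Longest prefixes first so "SEPT" is tried before "SEP".
_ALTERNATION = "|".join(sorted(MONTH_PREFIXES, key=len, reverse=True))

# Heuristic: month-like prefix, optional hyphen, then one or two digits.
DATE_LIKE = re.compile(rf"^(?:{_ALTERNATION})-?\d{{1,2}}$", re.IGNORECASE)


def looks_like_date(symbol: str) -> bool:
    """Return True if the symbol matches the date-like heuristic."""
    return DATE_LIKE.match(symbol.strip()) is not None


def flag_risky_symbols(symbols):
    """Return the symbols that may be auto-converted to dates, in input order."""
    return [s for s in symbols if looks_like_date(s)]


if __name__ == "__main__":
    genes = ["SEPT2", "TP53", "MARCH1", "BRCA1"]
    print(flag_risky_symbols(genes))
```

Anything this flags is worth a second look before the file gets opened in a spreadsheet. The heuristic will miss other kinds of auto-conversion, so treat a clean result as a hint rather than a guarantee.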

