Sketchplanations

Chihuahua syndrome illustration: an analyst wonders at the number of dog breeds when most of them are misspellings of "Chihuahua"

Chihuahua syndrome

Chihuahua syndrome refers to messy data caused by variations in spelling or input, and "Chihuahua" is easy to misspell. The quality of your data matters: errors can creep in anywhere, particularly when people type data in by hand. Garbage in, garbage out.

Here's Chris Groskopf, quoted in Seeing with Fresh Eyes: Meaning, Space, Data, Truth by Edward Tufte:

"There is no worse way to screw up data than to let a single human type it in, without validation. I acquired a complete dog licensing database. Instead of requiring people registering their dog to choose a breed from a list, the system gave dog owners a text field to type into, so this database had 250 spellings of Chihuahua. Even the best tools can't save messy data. Beware of human-entered data."

—Chris Groskopf
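The fix Groskopf points to is validation at the point of entry: make people pick from a list rather than type freely. As a minimal sketch in Python, assuming a hypothetical three-item ALLOWED_BREEDS list (a real registry would load a full breed list and show it as a dropdown), a server-side check could accept only listed breeds and reject everything else:

```python
# Hypothetical canonical list for illustration; a real system would load the full breed list.
ALLOWED_BREEDS = ["Chihuahua", "Labrador Retriever", "Poodle"]

# Map a case-folded key to the canonical spelling for quick lookups.
_LOOKUP = {name.casefold(): name for name in ALLOWED_BREEDS}


def validate_breed(raw: str) -> str:
    """Return the canonical breed name, or raise ValueError if it is not on the list."""
    # Ignore surrounding whitespace and letter case, but nothing else.
    key = raw.strip().casefold()
    if key not in _LOOKUP:
        raise ValueError(f"Unknown breed {raw!r}; choose one of {ALLOWED_BREEDS}")
    return _LOOKUP[key]


if __name__ == "__main__":
    print(validate_breed("  chihuahua "))  # prints Chihuahua
    try:
        validate_breed("Chiwawa")
    except ValueError as err:
        print(err)  # the typo is rejected instead of stored as a new "breed"
```

Rejecting at entry time is cheap; every misspelling that gets through has to be found and merged later, one variant at a time.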

Capital letters, extra spaces, misspellings, hyphens, numbers stored as text, letters typed in place of digits (I for 1, O for 0), accents, straight vs curly apostrophes, dates in mixed day/month orders, languages, dialects, abbreviations, and more can all split one value into many and mislead your analysis.
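When the data has already been collected, a cleanup pass can close off several of these routes before you count anything. The sketch below is one possible approach, not a complete one: normalize() handles capitals, extra spaces, hyphens, accents, and curly apostrophes, and match_breed() uses Python's standard difflib module to suggest the closest entry from a hypothetical CANONICAL list. The 0.8 cutoff is an assumption to tune, and fuzzy matches are suggestions for a person to review, not corrections to apply blindly. Languages, dialects, abbreviations, and numbers stored as text would need their own handling.

```python
import difflib
import re
import unicodedata
from typing import Optional

# Hypothetical reference list of breeds, already in normalized form.
CANONICAL = ["chihuahua", "labrador retriever", "poodle"]


def normalize(raw: str) -> str:
    """Reduce cosmetic differences so equal values compare equal."""
    # Decompose accented characters, then drop the combining marks (á -> a).
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Straighten curly apostrophes and treat hyphens/underscores as spaces.
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    text = re.sub(r"[-_]+", " ", text)
    # Lowercase and collapse runs of whitespace into single spaces.
    return " ".join(text.casefold().split())


def match_breed(raw: str, cutoff: float = 0.8) -> Optional[str]:
    """Suggest the closest canonical breed, or None if nothing is similar enough."""
    cleaned = normalize(raw)
    matches = difflib.get_close_matches(cleaned, CANONICAL, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For example, match_breed("Labrador-Retriever") returns "labrador retriever" and match_breed("Chihuahau") returns "chihuahua", while a heavily mangled spelling such as "Chiwawa" falls below the cutoff and returns None, which flags it for a look by hand.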

Spend time with your data.

The name "Chihuahua syndrome" comes from Edward Tufte.

You’re welcome to use and share this image and text for non-commercial purposes with attribution. Go wild!