The SSA’s baby name data isn’t perfect. In fact, when it comes to the earliest years on record, some of the data is downright misleading.
I think most of us who write about names regularly are aware of the issues with the dataset, but we don’t mention them as often as we should. (Me included, of course.)
So I’m grateful that blogger David Taylor has created graphics to illustrate the problematic aspects of the U.S. Baby Names dataset. I can’t improve upon his explanations, so I’ll just embed his slideshow below and recommend that everyone go check out the original post.
4 thoughts on “Problems with the SSA’s baby name data”
Interesting on the history of the data, but I expected numbers on more modern records as well, like how many babies are given names below the top 1000 (and the names given to fewer than five babies), combining name spellings, etc. I don’t care too much about the data anyway and think people overthink it. But I’m curious about more matters.
What an interesting read – thanks for re-posting it :)
That was really interesting and included some new info about the data as well as confirming some ideas I’ve had for a long time. One is the use of nicknames as full names in the earlier years–it’s easy to get the impression from the SSA data that a large number of men were officially named Joe, Tom, Bill, Bob, etc. I’ve found it helpful to compare the SSA lists to counts that were made in earlier, pre-Internet years (for example, in the books by Dunkling & Gosling). There you don’t find Joe at all, though it’s #28 on the SSA list for 1890 and #34 in 1925. That leads me to conclude that most of those Joes were officially Josephs. On the other hand, these lists DO indicate that Harry was considerably more popular than Harold in the 1880s and 1890s, but Harold had taken the lead by the 1920s.
Thanks for posting this!