Sunday, February 2, 2014

What is a Goldilocks, Timeless name?

Since moving to San Francisco a few years ago, seven of our friends in the Bay Area have given birth. Five of the baby names I had never heard before, and one of the two recognizable names is so rare that fewer than 100 other people got it the year he was born. Only one of those seven names is remotely common; specifically, it's been a top-50 name for its gender every year since 1970. For the other five names, the total numbers of American births ever recorded with those names are fewer than 1000, fewer than 50, about 500, about 600, and fewer than 50. If you live in Israel or speak Hebrew, these names are all very easy to remember, but here, in English, I maintain a wall chart that I hide when my friends visit.

A decade ago I served as naming consultant to a couple who are not native English speakers. We came up with a list of requirements, such as recognizability to English speakers, unambiguous spelling and pronunciation, pronounceability by non-English-speaking grandparents, pleasing sound, the actual meaning of the name if any, and so forth. This got me thinking about names and what they mean and what they are for. Of course other people use your name to identify you; they also project onto you properties of other people with that name whom they've met. You project onto yourself properties of your name. Ideally one would choose a name to match one's already-developed personality, which means waiting until perhaps age 25 to choose. But your temporary name may influence your personality, and just imagine having to get new checks all the time.

For the purpose of being identified by others, then, a good name is rare enough that you are usually the only person around with that name, but common enough that most people can recognize, spell, and pronounce it. And most names have a popularity spike, which means that the name is common in only one generation. So a good name is one that is, over an extended period of time, never too rare and also never too popular.  Let's call these properties Goldilocks and Timeless.  Can we quantify them?

The Social Security Administration provides a list of all birth names in the United States since 1880. The Baby Name Wizard does a very good job of providing charting tools based on this data (for example, the chart for my own name). But I wanted to do a more comprehensive investigation.

First, how can we define Too Popular and Too Rare?  My instinct is to declare Too Popular to be in the top 50, and Too Rare to be out of the top 500.  If we check the data, does this seem right?  For a pseudo-random sample, let's look at the 5th, 50th, 500th, and 5000th most popular female names in the last year of each decade since 1880:

I recognize all of the #5 names, although some sound very dated.  I recognize all of the #50s.  I've probably seen all of the #500s at least once, or they are alternate spellings of something I've seen.  The #5000s are completely novel to me, although a few seem like distant variants of more popular names. By the way, merging similar names (Sara, Sarah) is something that the Baby Name Wizard does very well.  I'm simply going to ignore the issue in my data analysis.

Note that US names are getting more diverse over time; before the 1950s, there simply weren't enough unique female names in the data to get down to #5000. The big caveat here is that the SSA raw data, for purposes of privacy, excludes a name in any year when fewer than five people were born with it. So there were actually more than 5000 unique female names in 1940, but down in the 5000s each name belonged to only about two people that year, too few to be reported. Similarly, by the time we get to #500, and especially #5000, there are a lot of ties, so the #5000s are effectively random picks from among all the names with, e.g., 5 or 7 or 10 owners. In this sample, the average #5 name has 19,000 people; the average #50 has under 5000, #500 under 300, and #5000 only 15.
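For anyone who wants to reproduce a sample like this, here is a minimal Python sketch of picking the name at a given rank. It assumes the per-year files from the SSA national data download, where each line reads Name,Sex,Count (e.g. yob1940.txt); the function names are hypothetical, and breaking ties alphabetically is an arbitrary choice, which is exactly why the #5000s amount to random picks among equally common names.

```python
import csv
import io


def ranked_names(lines, sex="F"):
    """Return the names of one sex, ordered from most to least popular.

    `lines` is an iterable of text lines in the assumed SSA format
    Name,Sex,Count. Equal counts are ordered alphabetically, an arbitrary
    tie-break: deep in the list, many names share the same small count.
    """
    rows = []
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        name, row_sex, count = row
        if row_sex == sex:
            rows.append((name, int(count)))
    rows.sort(key=lambda r: (-r[1], r[0]))
    return [name for name, _ in rows]


def names_at_ranks(lines, ranks=(5, 50, 500, 5000), sex="F"):
    """Map each 1-based rank to the name holding it, or None if too few names."""
    ordered = ranked_names(lines, sex)
    return {r: ordered[r - 1] if r <= len(ordered) else None for r in ranks}


if __name__ == "__main__":
    # Toy rows with made-up counts, not real SSA figures.
    sample = io.StringIO("Mary,F,700\nAnna,F,260\nEmma,F,200\nJohn,M,900\n")
    print(names_at_ranks(sample, ranks=(1, 2, 3, 4)))
    # {1: 'Mary', 2: 'Anna', 3: 'Emma', 4: None}
```

With a real file opened in a `with` block, the same call returns None for rank 5000 in any year whose published list is shorter than that.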

I think this data validates my targets of 50 and 500.  With a top 5 name, you will know other people with your name.  With a top 50 name, if it were geographically evenly spread, there would still be a hundred people per state with your name and your same birth year, so you probably will still meet people with your name, but not every day.  At #500, I'm guessing you don't meet people with your exact name very often.  At #5000, everybody in the country with your name and birth year would fit into a van.
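The per-state figure is just division. Here is a quick sketch, assuming a perfectly even spread across 50 states and using the rough averages quoted above (the #50 and #500 values are upper bounds, so the real per-state numbers are a bit lower):

```python
# Rough average births per name at each sampled rank, from the figures above.
# The #50 and #500 values are upper bounds ("under 5000", "under 300").
AVERAGE_BIRTHS = {5: 19_000, 50: 5_000, 500: 300, 5000: 15}
STATES = 50


def per_state(births, states=STATES):
    """Births per state in one year, assuming a perfectly even geographic spread."""
    return births / states


for rank, births in AVERAGE_BIRTHS.items():
    print(f"#{rank}: about {per_state(births):g} per state")
```

That works out to about 380, 100, 6, and 0.3 people per state per birth year for the #5, #50, #500, and #5000 names respectively.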

So I'm going to stick with my gut definition of Goldilocks: a popularity rank below the top 50 but within the top 500. And Timeless would mean Goldilocks over an extended period of time, maybe even all the way back to 1880. Are there any Timeless, Goldilocks names?
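In code, these two definitions might look like the following sketch. It assumes each name's yearly 1-based rank is available as a dict from year to rank, with a missing year meaning the name didn't appear in that year's data; the function names are hypothetical.

```python
TOO_POPULAR = 50  # ranks 1..50 are Too Popular
TOO_RARE = 500    # ranks beyond 500 (or no rank at all) are Too Rare


def is_goldilocks(rank):
    """True if a single year's rank is in the zone: 51 through 500 inclusive.

    `rank` is None when the name does not appear in that year's data.
    """
    return rank is not None and TOO_POPULAR < rank <= TOO_RARE


def is_timeless(ranks_by_year, years):
    """True if the name was Goldilocks in every one of `years`."""
    return all(is_goldilocks(ranks_by_year.get(year)) for year in years)


# Example with made-up ranks:
print(is_timeless({1880: 120, 1881: 300, 1882: 75}, range(1880, 1883)))  # True
print(is_timeless({1880: 120, 1881: 30, 1882: 75}, range(1880, 1883)))   # False
```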

There are 63,246 distinct female names in the data. 61,373 have never been in the top 500 even once. Of the remaining 1873, 1594 have never been in the top 50. The Y axis shows how many years since 1880 the name has been out of the top 50, so those 1594 are the ones crammed up against the top so tightly that you can't even read them. Most of those are Too Rare: not only have they never been in the top 50, they've also spent most years outside the top 500. That's shown by the X axis: the number of years that a name made the top 500. The Too Rare names have spent fewer than a hundred years in the top 500. Examples include Danika, Zelma, Sophronia, and Sue.
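Both chart axes can be computed in one pass over the data. Here is a sketch, assuming the ranks have already been loaded into a dict of year -> {name: rank} (that structure and the function names are hypothetical). A name absent from a year simply counts as outside both the top 50 and the top 500.

```python
from collections import Counter


def chart_coordinates(ranks_by_year):
    """Return {name: (x, y)} for every name that ever reached the top 500.

    x = number of years the name was in the top 500
    y = number of years the name was NOT in the top 50
    ranks_by_year maps year -> {name: 1-based rank}.
    """
    n_years = len(ranks_by_year)
    in_top500 = Counter()
    in_top50 = Counter()
    for ranks in ranks_by_year.values():
        for name, rank in ranks.items():
            if rank <= 500:
                in_top500[name] += 1
            if rank <= 50:
                in_top50[name] += 1
    # Counter returns 0 for names that never made the top 50.
    return {name: (in_top500[name], n_years - in_top50[name]) for name in in_top500}


def never_top50(coords, n_years):
    """Names whose y value equals the number of years: never once in the top 50."""
    return sorted(name for name, (_, y) in coords.items() if y == n_years)


# Toy data with made-up names and ranks, not real SSA figures.
toy = {
    1880: {"Alpha": 1, "Beta": 250, "Gamma": 600},
    1881: {"Alpha": 1, "Beta": 40, "Gamma": 450},
}
coords = chart_coordinates(toy)
print(coords)                         # {'Alpha': (2, 0), 'Beta': (2, 1), 'Gamma': (1, 2)}
print(never_top50(coords, len(toy)))  # ['Gamma']
```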

Sue seems common to me, suggesting the limitations of this kind of analysis where I don't merge Sue, Susan, and other related names.  Let's look in more detail.  Sue has never been top-50, and has 95 top-500 years since 1880. Pretty good.  But:

Basically Sue ranked comfortably in the 200s for fifty years but then got greedy in the 30s and 40s, peaking just below the top 50. And then everybody got sick of Sue, and it declined and then disappeared by the 1980s. (Possibly it was also tainted by association with Susan, which has 40 very peaky top-50 years centered around the 1950s.)

Meanwhile, a nearby cluster of names, the one-hit wonders, have been popular for a few years but invisible the rest of the time. For example, Harper (2010s), Caitlin (1980s/90s), Noreen (1940s), and Daisy (1890s, but making a comeback in the 2010s to become a two-hit wonder).

Some names are always popular.  Always too popular. Mary was the #1 female name from 1880 to 1946, and then 1 or 2 for another two decades, and then still top 50 for three more decades, until finally dropping out of the top 100 in 2009.  Elizabeth, Anna, Margaret, and Sarah are some other perennials.

Finally, let's close in on the Timeless Goldilocks names. 80 female names have been popular enough (in the top 500) in at least 100 years and rare enough (outside the top 50) in at least 100 years. Some of these, though, are still really peaky, like Alma or Teresa. Let's narrow it down to names that have been in the zone almost every single year. Claudia, for example, is basically timeless. We might do a bit better with a more targeted definition of Goldilocks, such as a mathematical expression of how far the name deviates from the average year by year. By our original definition of Goldilocks, there are exactly 10 female names that have always been in the top 500 but never been in the top 50. I'll zoom in on the chart so that you can see them (I've added a bit of jitter so that names that overlap exactly are shifted randomly and stay legible):
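Given per-name coordinates like the chart's (years in the top 500, years out of the top 50), both the looser 100-year filter and the strict every-year filter are one-liners, and jitter is just a small random offset. Here is a sketch, assuming such a coordinate dict already exists; the function names, the 0.4 jitter width, and the toy names are all hypothetical.

```python
import random


def broadly_goldilocks(coords, min_years=100):
    """Names with >= min_years in the top 500 AND >= min_years outside the top 50.

    coords maps name -> (years_in_top500, years_out_of_top50).
    """
    return sorted(n for n, (x, y) in coords.items() if x >= min_years and y >= min_years)


def strictly_goldilocks(coords, n_years):
    """Names in the top 500 but outside the top 50 in every one of n_years years."""
    return sorted(n for n, (x, y) in coords.items() if x == n_years and y == n_years)


def jitter(coords, amount=0.4, seed=None):
    """Nudge each point by up to +/- amount on both axes so exact overlaps separate."""
    rng = random.Random(seed)
    return {
        n: (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for n, (x, y) in coords.items()
    }


# Made-up coordinates over a hypothetical 133-year span.
toy = {"Alpha": (133, 133), "Beta": (133, 100), "Gamma": (120, 110), "Delta": (50, 133)}
print(broadly_goldilocks(toy))        # ['Alpha', 'Beta', 'Gamma']
print(strictly_goldilocks(toy, 133))  # ['Alpha']
```

Jitter only changes where a point is drawn; the filters should always run on the unjittered counts.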

Oh yes, and I've blurred out these names because, if this blog post were to become popular, then by definition these names would all be ruined. If you want to find the most Timeless, most Goldilocks names in the United States, you can do the analysis yourself. I'm sharing my scripts at my GitHub ssn_names project. By the way, in addition to the ten female names, there are 31 male names that have met the same criteria every single year since 1880. Joel is one of them.

For what it's worth, Goldilocks has never been a name in the US, but Goldie has been around for a long time as a rare male and female name.

One more thing. When I started hanging around a lot more Jews, I caught on that Semitic names ending in El have religious meanings.
In Northwest Semitic usage El was both a generic word for any "god" and the special name or title of a particular God who was distinguished from other gods as being "the god", or in the monotheistic sense, God.
So Samuel means "God has heard", Michael "Who is like God?" (how would you like your name to be a rhetorical question?), Angel "messenger of God", Ariel "lion of God", and so forth. And then there are names like Joseph, meaning "God will increase". And you might wonder, where's the El? In English, we translate many historical names for a monotheistic god into God, but historically that god had several names, including Yahweh/Yehowah/Jehovah. So the Jo in Joseph is the Hebrew god YHWH. Which means that my name, Joel, is "Y El", or "YHWH Elohim" or "God God", or "Yahweh is El" or "It has become politically expedient for it to have always been the case that this one tribal god called Yahweh is and has always been the same entity as that other tribal god, El, and anybody who says differently is going to get stabbed." This is the root content of the Sh'ma, arguably the holiest Jewish prayer.

My given name is not and has never been a reflection of my values.

I have therefore been experimenting for several years with using my middle name, S. My mother and sister are willing to use it consistently, and I've gotten to the point where I respond to it more often than not, and now seems like as good a time as any to start a wider rollout. Please feel free to refer to me as either Joel or S, with my personal preference being S.

The origin of my middle name, for the curious, is that my parents couldn't agree on one of two different names beginning with S. Same as Harry S Truman. Unlike Truman, who was only joking when he told reporters it should be spelled without a period, my middle name really is spelled without a period:
Wrong: S.
Right: S
Just like on my birth certificate:

Damn it!

I've never tended to get upset when someone mispronounces "Joel". I use a one-syllable American pronunciation, but having lived in various countries I'm used to anything from "Jo-el" to "Zholl". What does bother me is when somebody spells my middle name incorrectly, usually somebody filling out official paperwork and adding a period to it. As you can see, the very first time that happened was the very first time it could have happened. This document was filed when my parents noticed the problem, a year after my birth:

See also part 2 and part 3, about the quest for a better mathematical definition of timeless.
