Even more name MADness

We have been looking (previously, previouslier) for names that are both Timeless and Goldilocks, meaning neither too popular nor too rare.  The initial definition of Timeless was "stays-in-range": the number of years in which a name was in the top 500 but not in the top 50.  Is Median Absolute Deviation, or MADness, a better mathematical definition of Timelessness?  It's hard to say, in part because MAD is mixed up with popularity.
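To make the two candidate definitions concrete, here is a minimal Python sketch.  It assumes a name's history is just a list of yearly ranks.  The function names `stays_in_range` and `mad` and the two sample histories are made up for illustration (they are not taken from the ssn_names project), and "top 500 but not top 50" is read here as ranks 51 through 500.

```python
from statistics import median

def stays_in_range(ranks, low=50, high=500):
    """Count the years in which a rank is in the top `high` but not the top `low`."""
    return sum(1 for r in ranks if low < r <= high)

def mad(ranks):
    """Median absolute deviation: the median distance of each rank from the median rank."""
    center = median(ranks)
    return median([abs(r - center) for r in ranks])

# Two hypothetical five-year rank histories (not real names or real data).
steady = [120, 130, 125, 140, 135]   # hovers around rank 130
faddish = [900, 300, 40, 20, 450]    # shoots up the charts, then falls back

print(stays_in_range(steady), mad(steady))
print(stays_in_range(faddish), mad(faddish))
```

The steady history stays in range every year and has a small MAD; the faddish one is in range for only two of its five years and has a large MAD.  On a toy example the two measures agree, which is why it takes the full data set to tell them apart.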

This graph shows years of popularity (rank), MADness, and median rank for 49,150 female names, which is all of the names used in the US since 1880 that show up in at least two different years:

We have three different variables going on here, so it's kind of busy.  MAD is shown on the Y axis.  A more consistent (Timeless?) name has a lower MAD; on this graph, up is good, so the highest dots should be the most Timeless.  However, a name that only shows up a few times could also score very well on MAD, so the X axis shows how many different years a name shows up in the data.  Names that only showed up once have no deviation, hence a perfect but meaningless MAD score, so they are omitted.  By the way, all of the dots are randomly wiggled by up to half a point to make the graph more readable, since otherwise the dots stack on top of and hide each other.
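The wiggling is what plotting people usually call jitter.  A minimal sketch of the idea in Python, assuming a uniform random offset of at most half a point in either direction (the actual distribution used for these graphs isn't stated, and `jitter` is a made-up helper name):

```python
import random

def jitter(value, max_offset=0.5, rng=random):
    """Return `value` nudged by a uniform random amount in [-max_offset, +max_offset]."""
    return value + rng.uniform(-max_offset, max_offset)

# Three names that would otherwise land on exactly the same dot (hypothetical values).
rng = random.Random(0)  # seeded only so the sketch is repeatable
points = [(133, 12.0)] * 3
wiggled = [(jitter(x, rng=rng), jitter(y, rng=rng)) for x, y in points]
print(wiggled)
```

Because the offset never exceeds half a point, a wiggled dot never ends up closer to a neighboring integer position than to its own.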

So the cluster up the left side is names that have been around all 133 years; if we had data going back a few more centuries, they would probably smooth out, though I suppose there must be a statistical cluster of names that are millennia old, like Muhammad or Aaron.  And the cluster on the right side is all the one-hit wonders, with bogus high MAD scores.

The third variable we are looking at is popularity, which is shown with color.  This is the median rank of a name, so lower median rank (blacker) is more popular.  By the time you can see blue a name is out of the top 500 (e.g., Theodora), and red or yellow mean it's way out there (Adonica).  So a Timeless (as measured by MAD), Goldilocks name would be on the left side, basically black, and somewhere in that top left area.

With this popularity scale, the top 1000 are all the same color black, so let's zoom in by cutting unpopular names (as measured by median rank).  Where should we cut?  At median rank 2000, the cut falls between Hortensia (1999.5) and Lollie (2000).  But Norah (2004) is out, and Andrew (1996) is in.  Hmm.  Norah stands out as a rare (down at this ranking) full-133-year name.  The lowest full-133-year name is Isa, at 3152 (Nevada is at 2441).  Let's lower the cut to median rank 3152.
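The rule for picking the cut can be sketched in a few lines of Python.  The median ranks for the named entries come from the text above; the year counts for Hortensia and Lollie, and the whole `ExampleRare` entry, are placeholders I made up so the sketch has something to exclude.  The function names are also hypothetical.

```python
TOTAL_YEARS = 133

# name -> (years present in the data, median rank)
names = {
    "Norah": (133, 2004.0),
    "Nevada": (133, 2441.0),
    "Isa": (133, 3152.0),
    "Hortensia": (90, 1999.5),      # year count is a placeholder
    "Lollie": (60, 2000.0),         # year count is a placeholder
    "ExampleRare": (12, 9000.0),    # entirely hypothetical entry
}

def popularity_cut(names, total_years=TOTAL_YEARS):
    """Median rank of the least popular name that appears in every year."""
    return max(rank for years, rank in names.values() if years == total_years)

def apply_cut(names, cut):
    """Keep only names whose median rank is at or better than the cut."""
    return {name: stats for name, stats in names.items() if stats[1] <= cut}

cut = popularity_cut(names)
kept = apply_cut(names, cut)
print(cut, sorted(kept))
```

Setting the cut at the least popular full-span name guarantees that no name with a complete 133-year record gets thrown away, which a round number like 2000 does not.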

We are getting somewhere, but remember that the question we are trying to answer is, is MAD a better definition of Timeless than stays-in-range?  Stays-in-range gave us a very clear cluster of names, ten names with a perfect record of rank between 50 and 500 for 133 straight years.

Let's keep zooming in on MAD and look for a similar cluster.

If there is such a cluster, I don't see it in this plot.  Let's flip around our axes for a different view.  What's the relationship between MAD and popularity (median rank)?  In this chart, MAD is on the Y axis as before; Median Rank moves to the X axis, with lower (left side) being a more popular name, and color showing number of years. A Timeless, Goldilocks name will be blue or black, higher than the other names (more consistent) but not all the way on the left (too popular).  Is there a cluster?

Maybe?  Nothing as obvious as when we use stays-in-range, but let's look at the actual names:

I'm still not seeing it.  Those names floating far above are yellow dots in the previous graph, meaning they've only been around for maybe 5 or 10 years, so their MAD is excellent but invalid. I'm ready to throw in the towel on MAD.

Let's check the male names just in case.  Here's the big picture:

Very similar.  One big difference between male and female names in the US is that for female names, Mary is so far ahead of the other names that there isn't a second or third place.  Mary doesn't even show up on these graphs because it breaks the math.  Mary's median rank is 1.  Mary's MAD is 0.  Mary was the #1 name for 76 years and #2 for 10 more.  Mary fell out of the top 50 ten years ago and that still hasn't touched its median rank.  The second most popular name is Elizabeth, in literally tenth place (median rank = 10), with no names in two through nine.  For the male names, it's not quite so extreme; James and John are neck and neck at the top, and William, Robert, Michael, and Charles all have single-digit median ranks.
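Why does Mary's MAD come out to exactly 0?  Because 76 of the 133 years is more than half: the median rank is 1, and more than half of the deviations from that median are 0, so the median deviation is 0 no matter what happened in the other years.  A quick Python check, where the 76 years at #1 and 10 years at #2 come from the figures above and the remaining 47 ranks are a placeholder value I made up:

```python
from statistics import median

def mad(ranks):
    """Median absolute deviation of a list of ranks."""
    center = median(ranks)
    return median([abs(r - center) for r in ranks])

# 76 years at #1 and 10 years at #2 are from the text;
# the other 47 years use a placeholder rank, since their real values don't matter here.
PLACEHOLDER_RANK = 60
mary = [1] * 76 + [2] * 10 + [PLACEHOLDER_RANK] * 47

print(len(mary), median(mary), mad(mary))
```

Swapping the placeholder for any other rank leaves both medians unchanged, which is a compact illustration of how MAD ignores everything outside the majority of years.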

An obvious cluster of Timeless, Goldilocks names is no more evident here than with the female names.  Let's zoom again.  This time, the least popular in-all-133-years name is West, at median rank 1834, so let's cut there.  Just missing the cut is Kathleen, at 1834.5, followed by Gaylen and Kelsey.  Just making the cut are Fate and Sylvan.  This is very interesting because with the female names, the names around the cut (see above) were not completely unfamiliar, even though that cut line was at a less popular point, but these male names sound really weird to me.  I'm sure this means something, but all of these names are well outside any reasonable Goldilocks range so it's moot.  Let's see the male names that make the cut at median rank 1834:

Again, I don't see any kind of clear Timeless, Goldilocks cluster.  Okay, MAD loses by a knockout to stays-in-range. 

I previously showed the best female names by the stays-in-range method; here are the male names:

And here's a zoom on the Timeless, Goldilocks male names:

I have, of course, blurred the absolute best ones, so that they don't get ruined.  Once again, you can look for yourself using tools from my GitHub ssn_names project.

