ICYMI: Tips for searching on AmericanAncestors.org

[Editor’s note: This post originally appeared in Vita Brevis on 17 July 2014. Since the time of that posting, we have made enhancements to our search functionality on AmericanAncestors.org that return broader results without using wildcards.  The wildcard strategy still works as advertised, however.]

When we were deciding how our AmericanAncestors.org database search would work, one of the key considerations was that we didn’t want to return search results that contained a lot of ‘noise.’ On other websites, the database architects allowed for a certain (sometimes significant) number of irrelevant search results. This was undoubtedly intended to be helpful, but it is actually quite frustrating. So we decided to do ‘exact’ searches with a couple of twists. The goal was to give results that were exactly what you searched for. We spent quite a lot of time tuning our search algorithm, trying different approaches and analyzing the results. We’re pretty happy with our final approach, but it’s definitely helpful to understand how it works and what the twists are.

More precisely, the ‘exact’ portion of the search applies only to the surname, year range, record type, location, and any specific database or database type specified.

Twist #1 is that first names (which can include middle names and initials as well as maiden names) are searched with an ‘any match’ algorithm. So you will get a search result ‘hit’ by searching for given name ‘jon’ where ‘jon’ makes up any separate part of the given/middle/maiden name. For instance, searching our American Canadian Genealogical Society Index of Baptisms, Marriages, and Burials, 1840-2000, with ‘jon’ as a first name (and no other search fields filled in) will return hits for not only ‘Jon’ but also ‘Michael Jon,’ ‘Jon Alfred,’ ‘Jon Robert Jr,’ etc. Why did we do this? Well, often, a middle name or initial or suffix is not known by the searcher, and we feel that requiring an exact match in this case is too limiting. Any of those ‘Jons’ may be the one you’re looking for.
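The ‘any match’ idea can be sketched in a few lines of Python. This is only an illustration, not the site's actual code: the function name `given_name_matches` is made up, and it assumes that a ‘separate part’ means a whole whitespace-separated word compared without regard to case.

```python
def given_name_matches(query: str, given_names: str) -> bool:
    """Return True if `query` equals any separate part of `given_names`.

    Assumption: parts are whitespace-separated words, compared
    case-insensitively, with trailing periods and commas ignored.
    """
    parts = [part.strip(".,").lower() for part in given_names.split()]
    return query.strip().lower() in parts


# Sample given-name fields, including two that should NOT match 'jon'.
names = ["Jon", "Michael Jon", "Jon Alfred", "Jon Robert Jr", "Jonathan", "John"]
hits = [name for name in names if given_name_matches("jon", name)]
print(hits)
```

Under these assumptions ‘Jonathan’ and ‘John’ are not returned, because ‘jon’ is not a whole separate part of either name.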

But what if you don’t want to do an exact search for last names? It is certainly the case that some names have common spelling variations. And, in many cases, last names appear with the ‘as written’ spelling, often phonetically, as recorded by a town clerk or census enumerator who simply wrote ‘what they heard.’ Sometimes when these phonetic surnames are indexed, we add spelling variations to make it possible to find the surname without having to resort to Twist #2. In our original database of Yarmouth, Massachusetts Vital Records to 1850,* we indexed last names as written. Those last names often contained many spelling variations, with ‘Eldridge’ being a prime example. In the Yarmouth records, ‘Eldridge’ was occasionally spelled as such, and a search for ‘eldridge’ in the original Yarmouth to 1850 VR database results in 271 hits for that spelling. However, the surname variations of ‘Eldredg,’ ‘Eldred,’ ‘Eldreg,’ ‘Eldredge’ and others also appear in those Yarmouth records.

Twist #2 involves the use of ‘wildcards.’ A search wildcard is a special character that represents any single character (a question mark, or ‘?’) or a sequence of any characters (an asterisk, or ‘*’). In the Eldridge example, with the original Yarmouth Vital Records to 1850 database, if you search for ‘eld*’ as a last name, you’ll get more hits, because the asterisk allows all the various spellings of the name that start with ‘eld’ to be found.*
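For readers who like to see the mechanics, here is a rough sketch of how such wildcards can be interpreted, by translating the search term into a regular expression. It is not the site's implementation. The helper names `wildcard_to_regex` and `wildcard_match` are invented for this example, and it assumes that ‘?’ stands for exactly one character, that ‘*’ stands for zero or more characters, and that case is ignored.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a search term with '?' and '*' into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")    # exactly one character (assumed)
        elif ch == "*":
            parts.append(".*")   # zero or more characters (assumed)
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_match(pattern: str, name: str) -> bool:
    """True if the whole of `name` matches the wildcard `pattern`."""
    return wildcard_to_regex(pattern).fullmatch(name) is not None


# Spellings mentioned above, plus one unrelated surname for contrast.
surnames = ["Eldridge", "Eldredg", "Eldred", "Eldreg", "Eldredge", "Baker"]

exact_hits = [s for s in surnames if wildcard_match("eldridge", s)]
eld_hits = [s for s in surnames if wildcard_match("eld*", s)]
print(exact_hits)
print(eld_hits)
```

In this sketch the plain term ‘eldridge’ finds only that one spelling, while ‘eld*’ finds all five variants and still leaves out ‘Baker’.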

My own last name ‘Sturgis’ is frequently spelled as ‘Sturges.’ Some branches of the family traditionally used the ‘e’ spelling, but mine usually used the ‘i’ spelling. And sometimes the same individual’s name was spelled with both variations. When I’m researching my own family, I always use the ‘?’ wildcard as part of the last name: ‘sturg?s’. This gets me both spelling variations.

The ‘?’ and ‘*’ wildcards can be used in any of the text fields, including First Name, Last Name, and Keyword. If you haven’t tried using wildcards in searches, now’s the time to try them!
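To show how the two twists might work together, here is a small sketch that applies wildcards to both the first-name and last-name fields. Everything in it is illustrative: the `search` and `field_matches` functions and the sample records are made up, and it assumes that a wildcard in the first-name field is tested against each separate part of the given name. It uses `fnmatchcase` from Python's standard `fnmatch` module, which treats ‘?’ and ‘*’ in the way described above (it also gives square brackets a special meaning, which is ignored here).

```python
from fnmatch import fnmatchcase


def field_matches(pattern: str, value: str) -> bool:
    """Case-insensitive wildcard test of one field value against a pattern."""
    return fnmatchcase(value.lower(), pattern.lower())


def search(records, first: str = "", last: str = ""):
    """Return the (given names, surname) pairs matching the search terms.

    Assumptions: the surname must match the whole `last` term, and the
    `first` term may match any one separate part of the given names.
    An empty term places no restriction on that field.
    """
    hits = []
    for given, surname in records:
        if last and not field_matches(last, surname):
            continue
        if first and not any(field_matches(first, part) for part in given.split()):
            continue
        hits.append((given, surname))
    return hits


# Made-up sample records, for illustration only.
records = [
    ("Samuel", "Sturgis"),
    ("Samuel", "Sturges"),
    ("Edward Samuel", "Sturgis"),
    ("Thomas", "Sturgeon"),
    ("Sam", "Dunn"),
    ("John", "Dunne"),
]

print(search(records, last="sturg?s"))
print(search(records, first="sam*", last="sturg?s"))
print(search(records, last="dunn?"))
print(search(records, last="dunn*"))
```

In this sketch ‘sturg?s’ picks up both the ‘i’ and ‘e’ spellings but not ‘Sturgeon,’ ‘dunn?’ finds only ‘Dunne’ because the question mark requires exactly one more character, and ‘dunn*’ finds both ‘Dunn’ and ‘Dunne.’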

 

*Our original Yarmouth, Massachusetts Vital Records to 1850 database has been revised with the addition of first names and last name spelling variations and added to our Massachusetts Vital Records to 1850 database. A large number of previously ‘stand-alone’ databases have been given the same treatment.

 

About Sam Sturgis

Sam was born and raised in Ann Arbor, Michigan. He received B.S. and M.S. degrees in Psychology from Eastern Michigan University and worked as a Human Factors researcher in automotive safety for 13 years. He entered the field of commercial software development in 1983 and acted as software developer and development manager at Wang Laboratories and The Foxboro Company. Sam joined the NEHGS staff in 2005. Sam's interest in genealogy began shortly after moving to Massachusetts, when he and his family chanced upon the Sturgis Library in Barnstable, during a vacation on Cape Cod. There he discovered that he is a descendant of the Sturgis family that settled on Cape Cod in the 1630s. Sam and his wife Gail live in Medway, Massachusetts. They have two grown children: Katie, a Registered Nurse in Wrentham, and David, a software developer in Somerville.

6 thoughts on “ICYMI: Tips for searching on AmericanAncestors.org”

  1. I have the most trouble with surnames where the first letter is mistranscribed. No search method seems to allow for the first character to be a wildcard. The best tactic seems to be to leave out the last name and look for families with the first names only.

    1. A very good point. Why not let us search the way the Scandinavians can–with the searching option “includes”? Then “ristian” will turn up CHRISTIAN as well as KRISTIAN.

  2. Thanks for all the heavy lifting. Much more lifting to do.

    Query: can one use ? several times in a word, or just once? Ditto for the *, though I suspect no, as it stands for a string of characters.

    Query: what is the point of returning hits as “Relevant” when such a return never relates to the desired search outcome at all in my experience? Why not just present results by default to, say, Last Name? In other words, what are the assumptions behind ranking and presenting hits that match the SPECIFIC criteria inputted? I see no logic to any Relevant output except when I’ve already separated the wheat from the chaff.

    Note: using Oldest to Newest as a sort alternative is defeated when a record has no assigned year date, or defaults to the source publication date, or is just treated as a null and so placed first in the results chain. And even if a decision is made to enter a year date, which year would it be? If only 1 year date available, well it’s got to be that. But if 2 from, say, a title reading “Ralph Paine (circa 1650–1727)”, which should it be or could it be both? Whatever the deciding criteria, when that option is selected, the assumptions should appear as a note on the results page.

    COMPLAINT: I have strong issues with the inaccurate information that is presented in many many results for Torrey’s NEM. As this information in that format is shared with Ancestry, new “wrong information” is being circulated. That is NOT a goal of the Society.

    To wit: see entry for Stephen Hopkins marrying Elizabeth Fisher. While the body of the text as presented as a hit clearly states just when and where they got married in England, the summary (which had to be typed in by someone for the Society) states 1617/18 in PLYMOUTH, MASSACHUSETTS.

    This is not an isolated case. It is true for ALL Great Migrators and others who married elsewhere. When typing the summary information in, a decision was made by someone that the First Identified Place in the US in the entry would be the place of marriage, regardless of what Torrey himself had written (in this case London) or the simple logic of the event. The John Carver entry summary is even more ludicrous.

    Sure, part of the problem is the short title: New England Marriages. We know that it is actually a list of all people who were married BEFORE they came and/or married here after they arrived. But the searchable summaries make EVERY marriage strictly a New England one. HUH?

    Incorrect information in dbs is one thing. Sifting that out is part of the research we do. But creating false information is a disservice to what we are all trying to do.

    Given your priorities and resources, how can this be fixed?

  3. Truly appreciate the discussions here. The tips will be helpful and it is good to know I’m not the only one surprised at some returns. That said, NEHGS is a superb resource for the serious researcher and my vote is KEEP IT UP! You are doing terrific work.

  4. Searching Kincaid/Kincade family, if I use “Kinc*d” will results include the Kincades or do I need to use “Kinc*d?”. Will using this last search word miss names that do not have a letter following the ‘d’?

  5. I searched the 1910 Fed Census in Massachusetts for first John Dunn (236 matches) then second for John Dunn? (14 matches). note all fields made explicit.

    I think it is safe to say that the addition of the trailing ? informs the search engine to return only surnames five characters in length.

    Stated another way, none of the existing 4-char Dunn matches are returned when the search calls for Dunn?.

    Tacking on a * instead of a ? at the end of the surname returns 279 matches, the Dunn, Dunne, and 43 others as well.
