Everyone who searches newspapers online will, sooner or later, fail to find something. It happens incredibly often.
I have often heard researchers say “I can’t find a single article about a person or an event, even though I have searched for hours!”
Setting competency aside as a factor, the biggest reason is that scanning one- and two-hundred-year-old newspapers, whether from paper or from microfilm, produces far less than optimal results.
More importantly, searching an index created by humans who have read the source material and typed up the entries is far superior to searching text that software has extracted from a scan of a dusty old newspaper using optical character recognition (OCR). Yet the massive size of newspaper collections makes it impractical to create such an index manually. You must expect inferior results and set your expectations accordingly.
Please take a look at the following list. Hopefully some of these errors and anomalies will give you hints for working around them and actually finding what you are looking for. There are many others, but these are ones that I have personally experienced:
So don’t be discouraged by “lack of results” from doing online newspaper searches. You just need to “outsmart” OCR and try various combinations to get to those elusive ancestors. Be persistent.
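For readers comfortable with a little scripting, here is a minimal sketch of what “trying various combinations” can look like. It takes a search term and generates misreadings that OCR might plausibly have produced, so you can paste each one into a newspaper site's search box. The substitution table is my own illustrative assumption, not a complete or authoritative list, and the function name `ocr_variants` is hypothetical. Adjust the table to match the errors you actually see in your own searches.

```python
from itertools import product

# Assumed, illustrative OCR confusions: each key is the correct letter and
# each value lists strings that a scan might have produced instead.
CONFUSIONS = {
    "e": ["c", "o"],
    "l": ["1", "i"],
    "m": ["rn"],
    "h": ["b"],
    "s": ["f"],  # the old long "s" is often read as "f"
}


def ocr_variants(term, max_swaps=1):
    """Return sorted spellings of `term` with at most `max_swaps` letters misread."""
    term = term.lower()
    # For every letter, the choices are the letter itself plus its misreadings.
    options = [[ch] + CONFUSIONS.get(ch, []) for ch in term]
    variants = set()
    for combo in product(*options):
        swaps = sum(1 for original, chosen in zip(term, combo) if original != chosen)
        if swaps <= max_swaps:
            variants.add("".join(combo))
    return sorted(variants)


if __name__ == "__main__":
    for spelling in ocr_variants("Smith"):
        print(spelling)
```

Keeping `max_swaps` small keeps the list short enough to search by hand. Because the sketch tries every combination before filtering, it is only suited to short terms such as surnames.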
One crowdsourcing approach that I have personally used is correcting the text directly on the online newspaper site, such as the aforementioned California Digital Newspaper Collection. There, registered users can submit corrected text that is then incorporated into future searches. It is a kind of newspaper-related “pay it forward.” This capability is available on that site and many others thanks to the fine folks at DL Consulting, who created the software used by the California collection as well as many other online newspaper sites.
For many more details about scanning, OCR, and related subjects, please read an older but very informative article, “Analysing and Improving OCR Accuracy in Large Scale Historic Newspaper Digitisation Programs,” from the March/April 2009 issue of D-Lib Magazine.
Good luck – be persistent and have reasonable expectations.