The results of a Google search are often unpredictable, unreproducible, and incomprehensible. For consumers, who generally look only at the first five results, that is not really a problem. For researchers, however, it is different. Even though the Google search engine was never created for researchers, it is still one of the most relevant search tools due to the enormous size of its index. But how do you get more relevant results from that index? In this short blog post I will discuss how a counter-intuitive use of Boolean operators can help to deal with the ‘Booleish’ logic of the Google search index.
Searching for an individual
Let’s assume we are investigating an individual and search for his or her name in the Google search engine as an exact string, that is, in quotation marks. With any luck we’ll find a few thousand hits. I will use a search for my own name as an easy example here.
Getting 8.270 results for your name in Google is good for your ego, of course. However, that feeling never lasts long, because no matter how hard you try, you will never reach that 8.270th result. Usually after 10 to 15 pages you have reached the end of the results and only a few (hundred) accessible results remain:
The same happens with more common names. Look, for example, at the results when I search for the most common Dutch name, “Jan Jansen”:
Initially we get 715.000 results; however, when we try to reach the last page of the results, this time we get stuck on page 19 with ‘only’ 185 results:
Obviously, that cannot be correct. I know that there are far more than 103 pages indexed by Google where my name appears, and certainly there are way more than 185 pages where the name Jan Jansen appears. So we can conclude that Google is culling the results to what it thinks ‘normal’ people are looking for. So how can we reveal these hidden results?
A first step could be to ‘repeat the search with the omitted results included’, as Google suggests to us at the bottom of the last page:
So let’s try that with the search for my own name again, which then gives a total of 8.960 results on the first page and ultimately 302 results on page 31:
302 results is already a bit better for my ego, but still far from the initial total of 8.960 results. In addition, these 302 results likely contain more (near) duplicates. Lastly, I happen to know that there are pages in the Google index where my name appears which are not shown among these 302 results.
Now let’s trick Google
This is where we have to start using Boolean operators counter-intuitively. If we add a search term, normally we would expect that the number of results will go down. Searching for “Ludo Block” should produce more results than searching for “Ludo Block” AND [keyword]. Following Boolean logic, the results of the latter search should be a subset of the results of the first search.
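To make that expectation concrete, here is a minimal sketch of strict Boolean AND on a toy index. The four pages and the `search` helper are invented for illustration and say nothing about how Google itself works; they only show what "subset" means here.

```python
# Toy index: page id -> text. All pages are made up for this illustration.
PAGES = {
    1: "ludo block writes about terrorism research",
    2: "ludo block on due diligence",
    3: "ludo block discusses police cooperation and terrorism",
    4: "a page about something else entirely",
}


def search(*terms):
    """Return the ids of pages containing every term (strict Boolean AND)."""
    return {pid for pid, text in PAGES.items() if all(t in text for t in terms)}


broad = search("ludo block")
narrow = search("ludo block", "terrorism")

print(sorted(broad))    # pages matching the name only
print(sorted(narrow))   # pages matching the name AND the keyword
print(narrow <= broad)  # under strict Boolean logic this is always True
```

In a strictly Boolean engine the narrower search can never return a page that the broader search misses.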
And indeed, if we search for “Ludo Block” AND terrorism and again repeat the search with the omitted results included, the initial total number of results that Google returns is lower: about 25% of the 8.960 total results we got when searching without the added keyword:
However, if we now try to reach the last page, something interesting happens:
What? The number of accessible results is higher than when searching without the added keyword?
Obviously, under Boolean logic this result cannot be correct. It confirms that Google shows only a small part of the total actual results in its index. In fact, among the results shown after searching for “Ludo Block” AND terrorism I have seen quite a number that did not show up when searching more broadly for “Ludo Block”.
In other words, the effect of searching with added keywords, counter-intuitively connected with the AND (and not the OR) operator, is that (partly) different subsets of the total potential results from the Google index are shown. Hence, multiple searches with fewer potential results (due to an added keyword) in the end provide more unique results than one broader search. Less is indeed more.
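Combining the results of several narrowed searches comes down to merging lists and removing duplicates. The sketch below assumes the result URLs have already been collected by hand for each query; it does not call any search API. The names `normalise`, `merge_results` and `collected` are hypothetical, and the example.org URLs are placeholders, not real results.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to host + path so trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(results_by_query):
    """Map each unique page to the list of queries that surfaced it."""
    seen = {}
    for query, urls in results_by_query.items():
        for url in urls:
            seen.setdefault(normalise(url), []).append(query)
    return seen


# Hypothetical, hand-collected result lists (placeholder URLs).
collected = {
    '"Ludo Block"': [
        "https://www.example.org/profile",
        "https://example.org/article-1/",
    ],
    '"Ludo Block" AND terrorism': [
        "https://example.org/article-1",
        "https://example.org/conference-paper",
    ],
}

merged = merge_results(collected)
print(len(merged), "unique pages")
for page, queries in merged.items():
    print(page, "<-", queries)
```

Note that this normalisation ignores the scheme and the query string, so it may merge pages that are in fact different. For sites that put the page identifier in the query string, a stricter key would be needed.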
Which keywords to add?
The question that remains is which keywords to add to tap into the ‘hidden’ part of the Google index. Obviously the keywords need to have some relation to the subject of the search. If not, Google starts to take matters into its own hands. For example, look at this:
I obviously have nothing to do with offshore platforms (really, I do not); however, Google does not like to show ‘0’ results, so it starts amending your query and showing results for what it thinks you may have meant. For a researcher these are (almost) all false positives.
To conclude, the keyword should be chosen wisely and could, for example, be derived from already known parts of your subject’s life. That may be a trial-and-error process, but the most important lesson is that there is a way to obtain more relevant results from the Google index.
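For those who want to work through a list of keywords systematically, the query strings can be generated rather than typed. This is only a sketch: `build_queries` is a hypothetical helper, and the keywords shown are placeholders rather than anything known about a real person.

```python
def build_queries(name, keywords):
    """Return one exact-phrase query per keyword, joined with AND."""
    phrase = f'"{name}"'
    queries = []
    for keyword in keywords:
        keyword = keyword.strip()
        if not keyword:
            continue  # skip empty entries
        # Quote multi-word keywords so they are searched as a phrase too.
        term = f'"{keyword}"' if " " in keyword else keyword
        queries.append(f"{phrase} AND {term}")
    return queries


# Placeholder keywords; in practice they come from what is already
# known about the subject.
for query in build_queries("Jan Jansen", ["Utrecht", "football club", ""]):
    print(query)
```

Each generated string can then be pasted into the search box one at a time, and the results compared by hand.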