Key Word Searching – What Is It? And How Do I Do It (Well)?
December 09, 2016
A key word search is a basic search technique that involves searching for one or more words within or across a collection of documents/files. Typically, the purpose of a key word search in litigation is to limit the universe of potentially responsive data that one must process and review in order to prosecute or defend a lawsuit. (Note: this post relates to processing and reviewing data. It presupposes that counsel has preserved data in a robust and comprehensive manner.) Limiting the volume of data you process and review can result in tremendous cost savings to one’s client. However, it must be done well so as to avoid pitfalls and the resulting costs. Below are some tips that can be helpful when thinking about key word search terms. In particular, one wants to give thought to key words so as to avoid, to the greatest extent possible, “false hits” (a search term “hit” within a document, but not for the meaning that was intended). For example, the term “comp*” would return “compensation” and “comp” (the intended terms), but would also return “computer” (not intended). “Computer” would be the “false hit.”
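The “comp*” example can be reproduced in a few lines of Python. This is only a rough sketch: it assumes that “*” stands for any run of word characters and that matching is whole-word and case-insensitive, which may not be how your review platform handles wildcards. The helper names `wildcard_to_regex` and `hits` are made up for illustration.

```python
import re

def wildcard_to_regex(term: str) -> re.Pattern:
    """Turn a simple '*' wildcard term into a whole-word, case-insensitive regex.

    Assumption: '*' means "zero or more word characters". Real review
    platforms may define wildcards differently.
    """
    # Escape the term, then turn the escaped '*' back into a wildcard.
    body = re.escape(term).replace(r"\*", r"\w*")
    return re.compile(rf"\b{body}\b", re.IGNORECASE)

def hits(term: str, text: str) -> list[str]:
    """Return every word in the text that the search term would hit on."""
    return [m.group(0) for m in wildcard_to_regex(term).finditer(text)]

sample = "Her comp package and compensation plan were saved on the computer."
comp_hits = hits("comp*", sample)
print(comp_hits)  # ['comp', 'compensation', 'computer']
```

The third hit, “computer,” is the false hit described above. Running the same check on a handful of real documents is a quick way to see what a wildcard term actually pulls in.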
1. Terms of four or fewer characters often result in false hits. Consider, for example, using “IT” as a search term in the hope of getting hits relevant to information technology. “IT” appears in so many other words that the results of such a search would include many false hits. This, in turn, would increase the cost and time required to review the hits for responsiveness.
2. Searching for numbers can return unwanted results. This can be relevant if a patent is at issue or zip codes are being searched, etc. If the numerical term is not quoted properly, the results may be skewed. In addition, as with tip 1, a search for 1,000 will also return 1,000,000.
3. Avoid using wildcards. If you want to find “contract” or “contracts,” then don’t use “contract*” as a search term; simply provide both variations of the word. If you must use a wildcard, refrain from leading with a wildcard character. A leading wildcard may return the result you are looking for, but it will also bring a lot of unwanted hits with it.
4. Searching for custodian names is ill-advised, especially if that individual is part of the collection. Think about it: if you search for John Doe and all of John’s emails were collected as part of the process, you are going to get “hits” on ALL of John’s documents.
5. Sample documents with the proposed terms. Before agreeing on search terms with the opposing party, run the proposed terms against a sample of documents and review the hits.
6. Give some thought to your search hit expectations. For example, did you expect a 20% return rate but are getting 90%, or vice versa? If so, reconsider your terms.
7. Always consider a “file type exclusion” list. For example, if there are no audio files or photos at issue in the lawsuit, then eliminate .wav and .jpeg files. Other file types to consider excluding are EXE, DLL, and system files.
8. Use the “w/2” proximity search between the first and last names of persons. The search (John w/2 Doe) will pull back “John Doe”; “Doe, John”; “John P. Doe”; and “Doe, John P.”
9. Expand first names with known nicknames. “Bill Johnson” could be searched with ((Bill OR William OR Will) w/2 Johnson). You will obviously need to gather any special nicknames from the client (this is also true of project names or code names assigned to different contracts, etc.).
10. Use domain names when searching for/identifying potentially privileged documents. The term (“farrellfritz.com”), for example, would pick up all email addresses from that domain. This is a great search for identifying communications with outside counsel.
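To make tips 8 and 9 concrete, here is a minimal Python sketch of a “w/2”-style proximity test with nickname expansion. It assumes “w/2” means the two terms fall within two words of each other, in either order, after punctuation is dropped. Actual proximity semantics differ from platform to platform, and the function names `tokenize` and `within` are hypothetical.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def within(text: str, first_terms: set[str], last_terms: set[str], n: int = 2) -> bool:
    """Return True if any first-name term is within n words of any last-name term.

    Assumption: distance is counted in word positions and order does not matter.
    """
    tokens = tokenize(text)
    first_pos = [i for i, tok in enumerate(tokens) if tok in first_terms]
    last_pos = [i for i, tok in enumerate(tokens) if tok in last_terms]
    return any(0 < abs(i - j) <= n for i in first_pos for j in last_pos)

# Equivalent in spirit to ((Bill OR William OR Will) w/2 Johnson).
FIRST = {"bill", "william", "will"}
LAST = {"johnson"}

examples = [
    "Bill Johnson signed.",
    "Johnson, William T.",
    "William T. Johnson",
    "Will forwarded the draft to Ms. Johnson",
]
for text in examples:
    print(within(text, FIRST, LAST), "-", text)
```

Under these assumptions the first three examples are hits and the last is not, because “Will” and “Johnson” are six words apart. Note that “Will” as a nickname can still produce false hits whenever the ordinary word “will” lands near the surname, which is one more reason to sample the results.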
Developing effective key words is very much an iterative and thought-intensive process. These tips will be helpful, but I strongly advise sampling “hits” before committing to search terms.
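For those who want to put numbers on the sampling advice, a hit rate for a proposed term can be estimated with a short script. The sketch below assumes the documents are already available as plain-text strings and that a hit is a whole-word, case-insensitive match. The function name `hit_rate` and the tiny sample corpus are invented for illustration; a real review platform will have its own reporting for this.

```python
import random
import re

def hit_rate(documents: list[str], term: str, sample_size: int, seed: int = 0) -> float:
    """Estimate the share of documents that a term hits on, using a random sample."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn
    sample = rng.sample(documents, min(sample_size, len(documents)))
    if not sample:
        return 0.0
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    hit_count = sum(1 for doc in sample if pattern.search(doc))
    return hit_count / len(sample)

# A made-up four-document corpus, sampled in full for the demonstration.
corpus = [
    "The contract was signed.",
    "Lunch on Friday?",
    "Contracts are attached.",
    "See the contract draft.",
]
rate = hit_rate(corpus, "contract", sample_size=4)
print(f"{rate:.0%} of sampled documents hit on 'contract'")
```

If the rate is far from what you expected (tip 6), that is a signal to revisit the term before committing to it. Reading the sampled hits themselves remains the real test of whether they are responsive.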