0 votes

I performed a search through 939 PDF files using the expression:

(transition NEAR education) OR (LIKE transicao NEAR LIKE educacao)

The result list was narrowed down to 309 files.

[Screenshot: Search with failures]

However, I am concerned about the "warnings generated during search".

I noticed that a couple of the files listed under "warnings" (and not included in the main result list) do fit the search criteria. In other words, these files seem to have been missed by FileLocator (false negatives).

I confirmed the problem in two ways:

1- From the same search window (without changing any criteria) I simply clicked the "Start" button again, and this time the results included the missed file. I repeated this a couple of times: sometimes the same file appeared in the results, and other times it was missed and listed as a "warning".

2- To double-check, I created a new search with the file itself selected in "Look in", so that it was the only file searched. The result listed the file, and the "Hits" tab showed the expected text.

Is there any setting in FileLocator that improves search accuracy, so that files which should be included in the result are not reported as "warnings" (false negatives)?

Are there any other tips for handling "warnings generated during search"?

asked by (510 points)

1 Answer

0 votes

Update 4 May 2015: The specific non-ASCII PDF name issue identified here has been fixed in the latest version of FileLocator Pro (build 2092).


If any error occurs during PDF-to-text conversion (e.g. the PDF disallows text extraction), FLPro reports a warning and tries to search the PDF file as raw text, which probably won't work. You'll need to look at the PDFs and try to figure out why FLPro can't extract the text from them.

PDF reading can fail if the file name contains unusual characters. Looking at your file names you can see the problem characters in this screenshot:

[Screenshot: Problem file names]

If you remove the unusual characters the PDFs are searched just fine. (We're working on making the PDF reader more robust with unusual characters in the file name.)
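As a rough way to spot such file names before running a search, the sketch below lists PDFs whose names contain any non-ASCII character. It is not part of FileLocator; the function names `non_ascii_chars` and `find_suspect_pdfs` are made up for illustration, and it assumes that "unusual" can be approximated as "outside the ASCII range", which may flag names that are actually harmless.

```python
import unicodedata
from pathlib import Path


def non_ascii_chars(name: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for each non-ASCII character in name."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in name
        if ord(ch) > 127
    ]


def find_suspect_pdfs(folder: str) -> dict[str, list[tuple[str, str]]]:
    """Map each PDF file name under folder to its non-ASCII characters, if any.

    Assumption: the "*.pdf" pattern is matched case-sensitively on some
    platforms, so files ending in ".PDF" may need a second pass there.
    """
    suspects = {}
    for path in Path(folder).rglob("*.pdf"):
        found = non_ascii_chars(path.name)
        if found:
            suspects[path.name] = found
    return suspects


if __name__ == "__main__":
    # "." is a placeholder; point this at the folder used in "Look in".
    for file_name, chars in find_suspect_pdfs(".").items():
        print(file_name, chars)
```

Any file it reports is a candidate for renaming, as described above.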

It might look like these names are the same (apart from the year):

1. Abbott‐Chapman - 2006
2. Abbott-Chapman - 2011

but if you look really closely, '2' contains a normal ASCII hyphen (hyphen-minus, U+002D) whereas '1' doesn't. See this binary dump:

[Screenshot: Hyphen differences]

The bytes E2 80 90 are the UTF-8 encoding of the Unicode code point U+2010 (HYPHEN).
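The same comparison can be reproduced without a hex editor. The following is a minimal sketch in Python; the helper names `describe_char` and `first_difference` are hypothetical, and the two strings are the names quoted above, with the first hyphen written as an explicit `\u2010` escape so it cannot be confused with the ASCII one.

```python
import unicodedata


def describe_char(ch: str) -> tuple[str, str, str]:
    """Return (code point, Unicode name, UTF-8 bytes in hex) for one character."""
    code_point = f"U+{ord(ch):04X}"
    utf8_hex = ch.encode("utf-8").hex(" ").upper()
    return code_point, unicodedata.name(ch, "UNKNOWN"), utf8_hex


def first_difference(a: str, b: str) -> int:
    """Return the index of the first differing character, or -1 if none
    is found within the shared length of the two strings."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return -1


name_2006 = "Abbott\u2010Chapman - 2006"  # U+2010 HYPHEN, written as an escape
name_2011 = "Abbott-Chapman - 2011"       # plain ASCII hyphen-minus

i = first_difference(name_2006, name_2011)
print(i, describe_char(name_2006[i]), describe_char(name_2011[i]))
```

The two names first differ at the hyphen position, well before the year, which is why they look identical on screen but are treated as different strings.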

answered by (57.8k points)
Actually, I individually searched a PDF file that was formerly classified as a "warning", and FileLocator could search it and find the text in the PDF. So it seems there is no problem with the PDF itself. The "warning" problem appears when the search runs through thousands of PDFs. Is there any relation between the number of files being searched and the likelihood of a false warning?
The 'Additional information' has been cut off. Could you please paste the full error message (especially for those with Error code: 1)?
They seem normal and allow text to be copied (not protected), so I am wondering why they are getting errors.
It's the file names; please see the updated answer.
I realized that all the problems are related to the special character hyphen (-).

But as you can see in the image below, FLPro accepts many results with a hyphen in the name, including another PDF by the same author, Abbott-Chapman, that also contains a hyphen.

Why is "Abbott‐Chapman - 2006" rejected (warning) while "Abbott-Chapman - 2011" is accepted as valid by FLPro?
If you look REALLY closely you'll see that there are two different characters there.
I've updated the answer.
Many thanks for your attention and for such a detailed reply! When you mention "we're working on making the PDF reader more robust with unusual characters in the file name", is that an improvement expected for the next major release of FLPro?
Yes, if not sooner.
...