0 votes

I'm trying to search for files that contain text followed immediately by a line feed (\n), with nothing between the text and the line feed (so ABC\n should return a match but DEF\r\n should not). Other files in the same path will have the ABC\r\n or the DEF\r pattern, and I do not want to match them.

What I'm finding is that even when I turn off CR (by itself) as an EOL indicator in the configuration, the search will not match on \n (line feed); I must still use \r (carriage return) to indicate the EOL, even though the two are not the same. It also still returns a match for ABC\n when the text in question is actually ABC\r\n.

Lastly, does the software as a whole use a different regular expression engine than the tester? When I test all of these expressions in the Regular Expressions Tester, they work exactly as desired: \n is treated as distinct from \r, and a search for ABC\n does not return a match when the text being searched is actually ABC\r\n.

Am I missing something?

by (50 points)

1 Answer

+1 vote

Version 8 or higher

The only Containing Text expression type that doesn't remove the CR or LF characters is Multi-line RegEx. If you choose that option you should be able to search for CR and LF, e.g.

Containing Text: ABC\n

would match ABC followed by a LF but not ABC followed by CR LF.
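The same distinction can be checked outside FileLocator Pro with any regex engine that sees the raw characters. The sketch below uses Python's re module purely as a stand-in (it is not the engine FileLocator Pro uses), and the sample strings are made up for illustration.

```python
import re

# Pattern from the answer: "ABC" followed immediately by a line feed.
pattern = re.compile(r"ABC\n")

samples = {
    "lf_only": "ABC\n",    # LF directly after the text
    "crlf": "ABC\r\n",     # a CR sits between the text and the LF
    "cr_only": "ABC\r",    # no LF at all
}

# Only the LF-only sample has "\n" directly after "ABC".
results = {name: bool(pattern.search(text)) for name, text in samples.items()}
print(results)
```

Only the lf_only entry comes back True, which is the behaviour the Multi-line RegEx option is expected to give.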

Previous answer (i.e. earlier versions)

FileLocator Pro does not pass the raw text to the regex engine. The text files are loaded, valid EOL characters are replaced with \r\n, and invalid EOL characters are removed.
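To see why ABC\n and ABC\r\n become indistinguishable after this step, here is a rough Python model of the normalization described above. It is a sketch under assumptions, not FileLocator Pro's code: the function name normalize_eols and the valid_eols parameter are hypothetical, and dropping a CR or LF that is not configured as a valid EOL is only my reading of "invalid EOL characters are removed".

```python
import re

def normalize_eols(text, valid_eols=("\r\n", "\n", "\r")):
    """Replace each valid EOL with CR LF and drop any other CR or LF.

    Hypothetical model of the pre-processing described in the answer.
    """
    def fix(match):
        eol = match.group(0)
        return "\r\n" if eol in valid_eols else ""

    # Try CR LF first so it counts as one EOL, then a lone CR or lone LF.
    return re.sub(r"\r\n|\r|\n", fix, text)

# With lone CR switched off as an EOL (as in the question), both inputs
# end up with the same CR LF ending, so a search cannot tell them apart.
lf = normalize_eols("ABC\n", valid_eols=("\r\n", "\n"))
crlf = normalize_eols("ABC\r\n", valid_eols=("\r\n", "\n"))
print(lf == crlf)
```

Under this model the regex engine never sees a bare \n, which would explain why the search and the Regular Expressions Tester behave differently.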

This happens at quite a low level in the file processing pipeline. You could write a COM component to replace the file reading step with your own. You'd need to implement IExtDataInterpreter, which has three methods:

HRESULT Open([in] BSTR bstrPathName);
HRESULT GetNextLine([out] BSTR* pbstrText, [out] LONG* pnLine);
HRESULT Close(void);

The text from GetNextLine is passed directly to the search engine without any further processing. If you'd like more information please contact Tech Support.
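For a feel of the Open / GetNextLine / Close contract, here is a plain Python analogue. It is not a COM component and cannot be plugged into FileLocator Pro; the class name RawLineReader, the UTF-8 encoding, and the use of None to signal end of file are all assumptions made for the illustration. The point it shows is that a reader can hand back each line with its original EOL characters intact.

```python
class RawLineReader:
    """Python analogue of the three-method contract (Open, GetNextLine, Close).

    Illustration only: each line is returned with its original EOL intact.
    """

    def __init__(self):
        self._file = None
        self._line_no = 0

    def open(self, path_name):
        # newline="" leaves "\r", "\n" and "\r\n" untranslated in returned lines.
        self._file = open(path_name, "r", encoding="utf-8", newline="")
        self._line_no = 0

    def get_next_line(self):
        """Return (text, line_number), or None at end of file (an assumption)."""
        line = self._file.readline()
        if line == "":
            return None
        self._line_no += 1
        return line, self._line_no

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None
```

A caller would call open once, loop on get_next_line until it returns None, then call close. A real implementation would follow the COM interface above; contact Tech Support for the details, as suggested.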

Another option would be to write a Custom Extension that converts each lone \n into a character above the 0x1f range before searching.
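The conversion itself could look something like the Python sketch below. How it would be wired into a Custom Extension is not covered here; the function name mark_lone_lf and the marker character U+2424 (SYMBOL FOR NEWLINE) are hypothetical choices, and any character above 0x1f that never occurs in the files would do. As a design choice, the sketch keeps the LF after the marker so the line break itself survives.

```python
import re

# Hypothetical marker: any character above 0x1f that the files never contain.
LONE_LF_MARKER = "\u2424"

def mark_lone_lf(text, marker=LONE_LF_MARKER):
    """Put a visible marker before every LF that is not preceded by a CR."""
    # (?<!\r)\n matches an LF only when the previous character is not CR.
    return re.sub(r"(?<!\r)\n", lambda m: marker + "\n", text)

marked = mark_lone_lf("ABC\nDEF\r\n")
print(repr(marked))
```

After this conversion, searching for ABC followed by the marker character finds only the lines that originally ended in a bare \n.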

by (30.3k points)
Are there any other characters that are modified before they hit the regex engine?  Also, does the API do this as well, or just the program proper?  If the API doesn't do it, I might be able to work around it by writing my own front-end.
All characters in the range 0x00-0x1f are removed from the input file (apart from 0x09, the tab character). I'll update the answer with some additional information about the API.
Thank you for your help!  :)
Updated with version 8 information.