I have a client with an unindexed, uncatalogued collection of over 250,000 text documents (doc, docx, pdf), any of which may or may not contain a Social Security number. The collection is stored on a file server, and the client urgently needs to know which of those documents contain a Social Security number.
What I want to do is build a solution around Agent Ransack or FileLocator Pro that searches for possible Social Security numbers, returns each string that is found, and stores the filename, filepath and found string in a database. The collection is static, so the tool would only be used once, but it would save tons of manual labor.
Does the API provide what I need to make this possible? I intend to use MS Visual Studio and C#.
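To make the goal concrete, here is a rough sketch of the matching and storage step I have in mind. It is written in Python only to keep it short; the real tool would be C#, and I would expect the pattern to carry over to .NET with little change. It does not call Agent Ransack or FileLocator Pro at all: it assumes the document text has already been extracted by the search tool. The pattern (hyphenated numbers only), the table name `ssn_hits`, the function names and the sample path are all my own assumptions for illustration.

```python
import re
import sqlite3
from pathlib import PureWindowsPath

# Assumed pattern: hyphenated form AAA-GG-SSSS only.
# Area numbers 000, 666 and 900-999, group 00 and serial 0000 are never
# issued, so they are excluded to cut down on false positives.
SSN_PATTERN = re.compile(
    r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b"
)


def find_ssn_candidates(text):
    """Return every substring of text that looks like a hyphenated SSN."""
    return SSN_PATTERN.findall(text)


def create_table(conn):
    """Create the (hypothetical) results table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ssn_hits ("
        "filename TEXT, filepath TEXT, found_string TEXT)"
    )


def record_matches(conn, filepath, text):
    """Store one row per candidate found in text; return the number of rows."""
    path = PureWindowsPath(filepath)
    rows = [(path.name, str(path), hit) for hit in find_ssn_candidates(text)]
    conn.executemany(
        "INSERT INTO ssn_hits (filename, filepath, found_string) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # a real run would use a file or server DB
    create_table(conn)
    sample = "Employee SSN: 123-45-6789, office phone 555-123-4567."
    record_matches(conn, r"S:\records\hr\letter.docx", sample)
    for row in conn.execute("SELECT * FROM ssn_hits"):
        print(row)
```

Numbers written without hyphens or with spaces would be missed by this pattern, so it would probably need loosening, at the cost of more false positives to review by hand.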