0 votes

I'm trying to do a Regex search to find either Ř [Latin Capital R with Caron] or Ě [Latin Capital E with Caron].

I was able to make it work using this: [\x{0158}\x{011A}]

However, this does not work: [ŘĚ]

It appears that Regex is transliterating Ř to R and Ě to E, for purposes of the search.

Is it not possible to use these letters with diacritical marks directly in the pattern to find them in text?

by (730 points)

1 Answer

0 votes

[ŘĚ] does work, i.e. it can search for those code points. What may be confusing is that the search is also compiled for non-Unicode searching in the local code page, and in that form the class translates to [RE].

In many cases this might be the desired behaviour, but when you are searching for a specific character rather than a word it is not what you want. So the only way to search for just those characters is to specify the code points in hex, as you have done.
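To make the difference concrete, here is a rough sketch in Python's re module. It is not the search engine discussed in this thread, and Python itself does no code-page folding. The helper strip_marks is a hypothetical stand-in that only imitates a best-fit conversion by dropping combining marks, so treat it as an assumption about the general effect, not as the tool's actual mechanism.

```python
import re
import unicodedata

text = "Řeka, REKA, Ěra, Era"

# Explicit code points: matches only U+0158 (Ř) and U+011A (Ě).
explicit = re.compile("[\u0158\u011A]")
print(explicit.findall(text))

def strip_marks(s):
    """Hypothetical stand-in for a code-page best-fit conversion:
    decompose each character, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Imitate a search compiled twice: once for the Unicode characters,
# once for their folded forms, which turns [ŘĚ] into [RE].
folded_class = "[" + strip_marks("ŘĚ") + "]"
combined = re.compile("[ŘĚ]|" + folded_class)
print(combined.findall(text))
```

The first pattern finds only the two caron letters, while the second also picks up plain R and E, which is the kind of extra match described in the question.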

by (31.4k points)
You are correct. So am I correct that using the hex values is the only way to ensure that one gets only those values?