

Java regex predefined character classes

In this lesson we look at regular expressions (regex) and how we can use regular expression patterns for matching data. A regular expression is a string containing normal characters as well as metacharacters, which together make a pattern we can use to match data. The metacharacters are used to represent concepts such as positioning, quantity and character types. Searching through data for specific characters or groups of characters is known as pattern matching and is generally done from the left to the right of the input character sequence. Regular expressions are a large topic which you could write an entire book on, and people have, but here we will just cover the basics of pattern matching to get a feel for how to use them.

The following are among the regular expression metacharacter constructs used in the examples in this lesson; a more complete, but not exhaustive, table with examples is listed at the end of the lesson.

    \   Used to give a special meaning to a character that is otherwise treated literally within regular expressions (as in \d), or alternatively to remove the special meaning of a metacharacter (as in \.).
    -   You can specify a range of characters by using a hyphen inside a character class (as in [a-z]).

Perl, like Java and Python, has \s, the special regex character that matches whitespace, in addition to other special characters. In Perl, the following would not be valid:

    my $sentence = "The End";
    my $subStr = "\s"; # Does NOT work, needs to be "\\s" or '\s'

In Java, this would be valid:

    String s = "The End";

Both Java and Perl seem to treat special regex characters encapsulated by "" the same. In Python, one could use either "\s" or '\s'. I looked up Predefined Character Classes (Java), and it simply said: "If you are using an escaped construct within a string literal, you must precede the backslash with another backslash for the string to compile."

Why do both Java and Perl treat escape sequences differently than special regex characters (when they're both encapsulated by ""), yet Python doesn't? That is, why did the designers choose to require one backslash for escape sequences like \n or \t, but two for predefined character classes like \s (while in "")? Is this a consequence of something else? Or does it in some way simplify some sort of interaction? I'm going to assume that it wasn't arbitrary.

Python only requires \ either way, yet Perl and Java mandate \\ when dealing with "". Besides being a little confusing, it's just messy. So, I assume that there's a good reason for this decision.

Java, Perl, and Python all use C-style backslashes for escapes. Regex also uses C-style backslashes for escapes. This leads to problems in all three languages and, in fact, in many other languages.

For example, all three languages will convert '\\' into a single backslash, '\n' into a newline, etc., before the string ever reaches the regex compiler. The difference lies in what happens to an unknown escape sequence like '\s'. In Python it resolves to itself, a backslash followed by s. In a double-quoted Perl string it resolves to just 's'. In Java it is not accepted at all: an unrecognized escape in a string literal is a compile-time error, which is why the Java documentation says the extra backslash is needed "for the string to compile" (since Java 15, \s is itself a string escape that produces a space, which is still not the regex whitespace class). So in Python you don't need '\\s', while in Java and Perl you need to escape the backslash. Treating unknown escape sequences as errors, as Java does, is the third possible choice, and other languages make it too.

So, if you have the list of known escapes memorized, you can sometimes get away with not escaping backslashes in Python. But you shouldn't. Why not? Because, even if you're absolutely sure you've memorized the escape sequences, do you really want to make that a requirement for anyone who wants to read (or maintain) your code? When I see "abc\\sdef" or r"abc\sdef", I immediately know exactly what it means. When I see unescaped "abc\sdef", I think I know, but I may be wrong, and I have to go look it up or try it in the interpreter to find out.

The right thing to do is to escape your backslashes, or use the appropriate raw-string or regex-literal syntax for your language.

If you're wondering why Python made a different design choice for unknown escapes from Perl and Java: as far as I know, that's not covered in the official Design FAQ and hasn't been directly addressed by Guido. In general, Perl went with maximal compatibility with C (and Java with C++) as a high priority in many areas, where Python put more priority on what made more intuitive sense to a programming teacher. (I suspect if Python were re-designed from scratch today, or even way back when raw strings were added, it would go with the error.)
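The two layers of escaping discussed here can be checked with a short Python sketch. It is only an illustration: the variable names and the `split_on_whitespace` helper are hypothetical and do not come from the original question or answer. It shows that an escaped backslash and a raw string hand the same two characters to the regex compiler, and that a known escape such as \n is converted by the string literal before the regex engine sees it.

```python
import re

# Layer 1: the string literal is processed by the language.
# Layer 2: the resulting characters are handed to the regex compiler.
escaped = "\\s"   # escaped backslash inside an ordinary literal
raw = r"\s"       # raw string: the backslash is left alone

# Both spellings produce the same two characters: a backslash and 's'.
assert escaped == raw
assert len(raw) == 2

# A known escape is converted by the literal itself, so the regex
# compiler receives a real newline character (length 1). In a raw
# string it receives backslash + 'n' (length 2) and interprets that
# as a newline on its own.
newline_from_literal = "\n"
newline_for_regex = r"\n"
assert len(newline_from_literal) == 1
assert len(newline_for_regex) == 2


def split_on_whitespace(text):
    """Hypothetical helper: split text on runs of regex whitespace."""
    return re.split(r"\s+", text)


print(split_on_whitespace("The End"))
print(re.findall(newline_from_literal, "a\nb") == re.findall(newline_for_regex, "a\nb"))
```

Writing the pattern as a raw string keeps the first layer out of the way, so the only escape rules a reader has to keep in mind are those of the regex engine.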
