Regular Expression

This page covers commonly used regular expression syntax.


Basic Regex.
1. A literal character matches exactly that character. Matching is case sensitive.
For example, /s/ matches every "s" in "cats" and "sands" but does not match the "S" in "Superman".

2.  $ ^ * + ? . ( ) [ ] { } | \ 
The characters above are special. To match one of them literally, escape it with a backslash. Escaping any of them is always safe and never causes an error.

A space (" ") is not a special character and is matched without a backslash.

3.  /./
A period matches any single character. To match a literal "." only, escape it with a backslash: /\./

4. A regex matches its elements in the order they are written. /cat / matches "cat" followed by exactly one space. This is called concatenation, and the order of the elements matters.

5. To match one of several alternatives, separate them with a pipe: /(cat|dog|rabbit)/ matches either "cat", "dog" or "rabbit". This is called alternation.

6. /launch/i
For a case-insensitive regex, put an "i" after the last forward slash.
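
The six points above can be tried out with a short Ruby sketch. The helper name hit?, the constant PET and the sample strings are made up for illustration; only the patterns come from the notes.

```ruby
# Hypothetical helper: true when the pattern matches anywhere in the text.
def hit?(pattern, text)
  pattern.match?(text)
end

# 1. Literals are case sensitive.
puts "sands".scan(/s/).length         # 2
puts hit?(/s/, "Superman")            # false

# 2. Escaped special characters match literally; spaces need no escape.
puts hit?(/\$5\.00/, "Total: $5.00")  # true
puts hit?(/big cat/, "a big cat")     # true

# 3. A bare period matches any character; an escaped one only a period.
puts hit?(/c.t/, "cat")               # true
puts hit?(/c\.t/, "cat")              # false

# 4. Concatenation: /cat / needs the trailing space.
puts hit?(/cat /, "cat food")         # true
puts hit?(/cat /, "cats")             # false

# 5. Alternation: the first alternative found, scanning left to right, is returned.
PET = /(cat|dog|rabbit)/
puts PET.match("my dog chased a cat")[1]  # dog

# 6. The i flag ignores case.
puts hit?(/launch/, "LAUNCH")         # false
puts hit?(/launch/i, "LAUNCH")        # true
```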

Character Class
A character class matches a single character out of the set listed inside []. It works much like alternation between single characters.

1. Only ^ \ - [ and ] are special characters inside a character class. They need to be escaped with a backslash ("\").

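2. Range of characters. [a-z] matches any single lowercase letter from a to z. Use [A-Z] for capitals. Don't use [a-Z] or [A-z]: in ASCII the capitals come before the lowercase letters, so [a-Z] is a reversed, invalid range, and [A-z] also matches the punctuation characters [ \ ] ^ _ ` that sit between "Z" and "a". Use [a-zA-Z0-9] to match any ASCII alphanumeric character.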

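3. Range of characters. Don't build ranges such as [*-^] from non-alphanumeric characters. What falls inside such a range depends on character-code order and is hard to predict.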

4. Negation. Use ^ as the first character inside [] to mean "other than". E.g. [^a-z] matches any character other than a to z.
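
A small Ruby sketch of the character class rules above. The constant names and the sample strings are illustrative assumptions, not part of the original notes.

```ruby
LOWER     = /[a-z]/
ALNUM     = /[a-zA-Z0-9]/
NOT_LOWER = /[^a-z]/
TOO_WIDE  = /[A-z]/       # also covers [ \ ] ^ _ ` which sit between "Z" and "a"
ESCAPED   = /[\^\-\\\[]/  # a literal ^, -, \ or [

p "Ruby 3".scan(LOWER)      # ["u", "b", "y"]
p "Ruby 3".scan(ALNUM)      # ["R", "u", "b", "y", "3"]
p "Ruby 3".scan(NOT_LOWER)  # ["R", " ", "3"]
p TOO_WIDE.match?("_")      # true, which is rarely what was intended
p "a^b-c".scan(ESCAPED)     # ["^", "-"]
```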


Character Class - short-cut
Shortcuts for commonly used character classes.

1. A period (.) represents any character except a newline. Inside [], as in [.], it represents a literal period.

2. \s = whitespace character; \S = non-whitespace character.
Whitespace is space (" "), tab ("\t"), vertical tab ("\v"), carriage return ("\r"), line feed ("\n") and form feed ("\f").

3. Additional shortcuts
Shortcut    Meaning
\d          Any decimal digit (0-9)
\D          Any character but a decimal digit
\h          Any hexadecimal digit (0-9, A-F, a-f) (Ruby only)
\H          Any character but a hexadecimal digit (Ruby only)
4. \w matches a word character; \W matches a non-word character.
A word character is any letter, any digit or the underscore (_).
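
The shortcuts can be checked against a sample string, as in this Ruby sketch. SAMPLE and the other strings are made-up test data.

```ruby
SAMPLE = "Room 42b,\tfloor_3"  # contains a space, a comma and a tab

p SAMPLE.scan(/\d/)      # ["4", "2", "3"]
p SAMPLE.scan(/\s/)      # [" ", "\t"]
p SAMPLE.scan(/\w+/)     # ["Room", "42b", "floor_3"]
p SAMPLE.scan(/\W/)      # [" ", ",", "\t"]

p "0xBEEF".scan(/\h+/)   # ["0", "BEEF"], since "x" is not a hex digit

p "a.b".scan(/./)        # ["a", ".", "b"]
p "a.b".scan(/[.]/)      # ["."]
p "a\nb".scan(/./)       # ["a", "b"], the newline is skipped
```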

Anchors
Anchors don't match any characters. They ensure that the regex only matches at a certain position in the string: the beginning or the end of the string, the beginning or the end of a line, or a word/non-word boundary.


1. ^ means the beginning of a line. $ means the end of a line.

2. \A means the start of the string. \z means the end of the string.

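3. \b means word boundary ( |word| ). Each pipe marks a boundary: a position between a word character and a non-word character, or at the start or end of the string next to a word character. Word boundaries are what bound words.

    \B means non-word boundary ( w|o|r|d ). Each pipe marks a position that is not a word boundary, such as between two letters inside a word.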
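
To see how the anchors differ, the following Ruby sketch counts matches in a three-line string. TEXT is made-up sample data with the lines "cat", "catalog" and "bobcat".

```ruby
TEXT = "cat\ncatalog\nbobcat"

p TEXT.scan(/^cat/).length     # 2: "cat" and "catalog" start a line with cat
p TEXT.scan(/cat$/).length     # 2: "cat" and "bobcat" end a line with cat
p TEXT.scan(/\Acat/).length    # 1: only the very start of the string
p TEXT.scan(/cat\z/).length    # 1: only the very end of the string
p TEXT.scan(/\bcat\b/).length  # 1: the standalone word on the first line
p TEXT.scan(/\Bcat/).length    # 1: the cat inside "bobcat"
p TEXT.scan(/cat\B/).length    # 1: the cat inside "catalog"
```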

Quantifiers
Quantifiers are useful for matching repeated patterns.

Zero or more
*    matches 0 or more occurrences of the pattern to its immediate left.
It really does mean zero or more: /x*/ also matches the empty string, because zero occurrences of x count as a match.

Use /(123)*/ to match 0 or more occurrences of the sequence 123.
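
A quick Ruby sketch of the star quantifier, including the empty match. STAR, GROUP and the sample strings are illustrative names and data.

```ruby
STAR  = /ab*/     # "a" followed by zero or more "b"
GROUP = /(123)*/  # zero or more repetitions of the sequence 123

p STAR.match("a")[0]            # "a"
p STAR.match("abbb")[0]         # "abbb"
p GROUP.match("123123abc")[0]   # "123123"

# Zero occurrences still count: /x*/ matches the empty string at the start.
p(/x*/.match("hello")[0])       # ""
```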

One or more
+    matches 1 or more occurrences of the pattern to its immediate left.


Zero or one
?    matches 0 or 1 occurrences of the pattern to its immediate left.
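
The + and ? quantifiers can be compared side by side in Ruby. PLUS, OPTIONAL and the test words are made up for this sketch.

```ruby
PLUS     = /\Ago+gle\z/   # one or more "o"
OPTIONAL = /\Acolou?r\z/  # the "u" may appear once or not at all

p PLUS.match?("ggle")         # false: + needs at least one "o"
p PLUS.match?("gogle")        # true
p PLUS.match?("gooogle")      # true

p OPTIONAL.match?("color")    # true
p OPTIONAL.match?("colour")   # true
p OPTIONAL.match?("colouur")  # false: ? allows at most one "u"
```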

Any number of occurrences
{m} exactly m occurrences of the pattern to its left.

{m,n} from m up to n occurrences of the pattern to its left. Write it without spaces inside the braces.

{m,} m or more occurrences of the pattern to its left.
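
The counted quantifiers can be sketched in Ruby as follows. The constant names are hypothetical, and the \A and \z anchors are added so that the whole string has to fit the count.

```ruby
EXACT    = /\A\d{4}\z/    # exactly four digits
RANGE    = /\A\d{2,4}\z/  # two to four digits
AT_LEAST = /\A\d{2,}\z/   # two or more digits

p EXACT.match?("2024")      # true
p EXACT.match?("123")       # false

p RANGE.match?("123")       # true
p RANGE.match?("1")         # false
p RANGE.match?("12345")     # false

p AT_LEAST.match?("12345")  # true
p AT_LEAST.match?("1")      # false
```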


















