This is a summary of commonly used regular expression syntax.
Basic Regex.
1. A literal character matches itself exactly. Matching is case sensitive.
For example, /s/ matches every "s" in "cats" and "sands", but not the "S" in "Superman".
2. $ ^ * + ? . ( ) [ ] { } | \
The characters above are special and must be escaped with a backslash when you want to match them literally. Escaping any of these special characters is always safe and never causes an error.
A space (" ") is not a special character and is matched without a backslash.
3. /./
A period matches any single character. To match a literal ".", escape it with a backslash: /\./
4. A regex matches its elements in the order you type them. /cat / matches "cat" followed by exactly one space (" "). This is known as concatenation. The order of the elements matters.
5. To match one of several choices, separate them with a pipe: /(cat|dog|rabbit)/ matches either "cat", "dog" or "rabbit". This is called alternation.
6. /launch/i
For a case-insensitive regex, add an 'i' after the last forward slash.
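The basic rules above can be tried out with a few one-liners. This is only a minimal sketch in Ruby (the language the shortcuts later in this post refer to); count_matches is a hypothetical helper name, not part of any library.

```ruby
# Hypothetical helper: counts how many times a pattern matches in a text.
def count_matches(text, pattern)
  text.scan(pattern).length
end

# 1. Literals are case sensitive.
count_matches("cats sands", /s/)    # => 3
count_matches("Superman", /s/)      # => 0

# 2-3. An unescaped period matches any character; an escaped one is literal.
"3x14".match?(/3.14/)               # => true
"3x14".match?(/3\.14/)              # => false

# 4. Concatenation: the trailing space is part of the pattern.
"cat food".match?(/cat /)           # => true
"catfood".match?(/cat /)            # => false

# 5. Alternation: String#[] returns the text that matched.
"I have a dog"[/(cat|dog|rabbit)/]  # => "dog"

# 6. The i flag makes the match case insensitive.
"LAUNCH now".match?(/launch/i)      # => true
```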
Character Class
A character class matches any single character listed inside [ ]. It is similar to alternation between single characters.
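1. Only these characters are special inside a character class [ ]: ^ \ - [ ]. They need to be escaped with a backslash ("\").
2. Range of characters. [a-z] matches any single character from a to z. Use [A-Z] for capital letters. Don't use [a-Z] or [A-z]: [a-Z] is not a valid range, because "a" comes after "Z" in ASCII order, and [A-z] also matches the punctuation characters that sit between "Z" and "a" in ASCII ([ \ ] ^ _ `). Use [a-zA-Z0-9] to match any alphanumeric character.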
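3. Don't build ranges with non-alphanumeric characters, such as [*-^]. Such a range matches every character between its endpoints in ASCII order, which is rarely what you intend.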
4. Negation. Use ^ as the first character inside [ ] to mean "other than". For example, [^a-z] matches any character other than a to z.
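The character class rules above might be exercised as follows. This is a rough Ruby sketch; letters_only is a hypothetical helper introduced only for illustration.

```ruby
# Hypothetical helper: keeps only the ASCII letters of a string.
def letters_only(text)
  text.scan(/[a-zA-Z]/).join
end

letters_only("R2-D2 & C-3PO")   # => "RDCPO"

# Negation: every character that is not a lowercase letter.
"abc123".scan(/[^a-z]/)         # => ["1", "2", "3"]

# The [A-z] pitfall: the underscore sits between "Z" and "a" in ASCII.
"_".match?(/[A-z]/)             # => true
"_".match?(/[a-zA-Z]/)          # => false

# Escaped special characters inside a class are matched literally.
"x]y-z".scan(/[\]\-]/)          # => ["]", "-"]
```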
Character Class - shortcuts
Shortcuts stand in for commonly used character classes.
1. A period (.) matches any character except a newline. Inside [ ], as in [.], it matches a literal period.
2. \s = whitespace character; \S = non-whitespace character.
Whitespace is space (" "), tab ("\t"), vertical tab ("\v"), carriage return ("\r"), line feed ("\n") and form feed ("\f").
3. Additional shortcuts
Shortcut | Meaning |
---|---|
\d | Any decimal digit (0-9) |
\D | Any character but a decimal digit |
\h | Any hexadecimal digit (0-9, A-F, a-f) (ruby only) |
\H | Any character but a hexadecimal digit (ruby only) |
4. \w matches a word character; \W matches a non-word character.
A word character is any letter, any digit or the underscore (_).
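A short Ruby sketch of the shortcuts above. The helper name digits is an assumption made for this example; \h is used here because the table marks it as Ruby-specific.

```ruby
# Hypothetical helper: extracts the decimal digits of a string.
def digits(text)
  text.scan(/\d/).join
end

digits("Order 66 shipped")   # => "66"

# \s matches spaces, tabs and line breaks alike.
"a b\tc\n".split(/\s/)       # => ["a", "b", "c"]

# \h (Ruby only): hexadecimal digits.
'#FFaa0Z'.scan(/\h/)         # => ["F", "F", "a", "a", "0"]

# \W: the underscore is a word character, the hyphen is not.
"snake_case-99".scan(/\W/)   # => ["-"]

# A period inside [] is literal; outside it matches anything but a newline.
"a.b".scan(/[.]/)            # => ["."]
"\n".match?(/./)             # => false
```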
Anchors
Anchors don't match any characters. They ensure that the regex matches only at a certain position in the string: the beginning or the end of the string, the beginning or the end of a line, or a word/non-word boundary.
1. ^ means beginning of a line. $ is the end of a line.
2. \A means start of the string. \z means end of the string.
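3. \b means word boundary ( |word| ), where each pipe marks a boundary. A word boundary is the position between a word character and a non-word character, or between a word character and the start or end of the string.
\B means non-word boundary ( w|o|r|d ), where each pipe marks a position that is not a boundary, for example between two word characters inside a word.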
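The difference between line anchors, string anchors and word boundaries can be sketched in Ruby as below. The three helper names are hypothetical, and the + quantifier used in \w+ is explained in the Quantifiers section.

```ruby
# Hypothetical helpers contrasting line anchors with string anchors.
def first_word_of_each_line(text)
  text.scan(/^\w+/)
end

def first_word_of_string(text)
  text.scan(/\A\w+/)
end

# Hypothetical helper: true only if word appears as a whole word.
def whole_word?(text, word)
  text.match?(/\b#{Regexp.escape(word)}\b/)
end

first_word_of_each_line("first line\nsecond line")  # => ["first", "second"]
first_word_of_string("first line\nsecond line")     # => ["first"]
"first line\nsecond line".scan(/\w+\z/)             # => ["line"]

whole_word?("cat concat cats", "cat")    # => true
whole_word?("concat cats", "cat")        # => false

# \B: only the "cat" inside "concat" is preceded by a non-boundary.
"cat concat cats".scan(/\Bcat/).length   # => 1
```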
Quantifiers
Quantifiers are useful for matching repeated patterns.
Zero or more
* matches 0 or more occurrences of the pattern immediately to its left.
It really does mean zero or more: /x*/ matches the empty string, because zero occurrences of x is a valid match.
Use a group to repeat a sequence: (123)* matches 0 or more occurrences of the sequence 123.
One or more
+ matches 1 or more occurrences of the pattern immediately to its left.
Zero or one
? matches 0 or 1 occurrence of the pattern immediately to its left.
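The three quantifiers so far can be compared side by side. A minimal Ruby sketch, where first_match is a hypothetical helper that simply wraps String#[]:

```ruby
# Hypothetical helper: returns the first match of a pattern, or nil.
def first_match(text, pattern)
  text[pattern]
end

# * accepts zero occurrences, so it matches the empty string.
first_match("aaa", /x*/)             # => ""
# + needs at least one occurrence, so there is no match here.
first_match("aaa", /x+/)             # => nil

# A group lets the quantifier repeat a whole sequence.
first_match("123123abc", /(123)*/)   # => "123123"
first_match("goooal", /o+/)          # => "ooo"

# ? makes the "u" optional.
"color colour".scan(/colou?r/)       # => ["color", "colour"]
```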
Specific number of occurrences
{m} matches exactly m occurrences of the pattern to its left.
{m,n} matches from m up to n occurrences of the pattern to its left.
{m,} matches m or more occurrences of the pattern to its left.
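The counted quantifiers might be used like this. This is a rough Ruby sketch; iso_date_shape? is a hypothetical helper that only checks the digit layout, not whether the date is valid.

```ruby
# Hypothetical helper: checks the shape NNNN-NN-NN using {m}.
def iso_date_shape?(text)
  text.match?(/\A\d{4}-\d{2}-\d{2}\z/)
end

iso_date_shape?("2024-01-15")        # => true
iso_date_shape?("24-1-15")           # => false

# {m,n}: runs of two or three a's, bounded by word boundaries.
"a aa aaa aaaa".scan(/\ba{2,3}\b/)   # => ["aa", "aaa"]

# {m,}: runs of three or more a's.
"a aa aaa aaaa".scan(/\ba{3,}\b/)    # => ["aaa", "aaaa"]
```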