Regular Expression

These are the most commonly used pieces of regular expression syntax.

Basic Regex
1. A literal character matches itself. Matching is case sensitive.
For example, /s/ matches every "s" in "cats" and "sands", but does not match the "S" in "Superman".

2. $ ^ * + ? . ( ) [ ] { } | \
The characters above are special characters. To match one of them literally, escape it with a backslash. It is always safe to escape any of these special characters, even where the escape is not strictly needed.

Spaces (" ") are not special characters; they are matched literally without a backslash.

3. /./
A period matches any single character. To match a literal "." only, escape it with a backslash: /\./

4. A regex matches its patterns in the order you type them. /cat / matches "cat" followed by exactly one space " ". This is known as concatenation, and the order of the patterns matters.

5. To match one pattern or another, separate the options with a pipe: /(cat|dog|rabbit)/ matches either "cat", "dog" or "rabbit". This is called alternation.

6. /launch/i
For a case-insensitive regex, add an "i" after the closing forward slash.
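
The six basics above can be tried out with a minimal Ruby sketch. Ruby is assumed here because these notes mention Ruby-only shortcuts further down. The sample strings and variable names are made up for illustration.

```ruby
# Sample strings and variable names are illustrative assumptions.

# 1. Literals match themselves and are case sensitive.
lowercase_s = "cats and sands".scan(/s/)   # => ["s", "s", "s"]
capital_s   = "Superman".match?(/s/)       # => false

# 2. Escape a special character to match it literally.
escaped_plus   = "1+1".match?(/1\+1/)      # => true  (literal "+")
unescaped_plus = "11".match?(/1+1/)        # => true  ("+" acts as a quantifier)

# 3. A period matches any character; an escaped period matches only ".".
any_char = "a.b".scan(/./)                 # => ["a", ".", "b"]
only_dot = "a.b".scan(/\./)                # => ["."]

# 4. Concatenation: the trailing space is part of the pattern.
with_space    = "cat food".match?(/cat /)  # => true
without_space = "catfood".match?(/cat /)   # => false

# 5. Alternation: scan returns one array per match when the regex
#    has a capture group, so flatten the result.
pets = "a dog chased a rabbit".scan(/(cat|dog|rabbit)/).flatten
# => ["dog", "rabbit"]

# 6. The i flag makes the match case insensitive.
any_case = "LAUNCH".match?(/launch/i)      # => true
```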

Character Class
A character class matches a single occurrence of any one of the characters listed inside [ ]. It is similar to alternation, but works on single characters.

1. Only ^ \ - [ ] are special characters inside a character class [ ]. To match them literally, escape them with a backslash "\".

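2. Range of characters. [a-z] matches any single lowercase letter from a to z. Use [A-Z] for capitals. Don't use [a-Z] or [A-z]: [a-Z] is an inverted range, which regex engines such as Ruby's reject as an error, and [A-z] also matches the punctuation characters that sit between Z and a in ASCII ([ \ ] ^ _ `). Use [a-zA-Z0-9] to match all alphanumeric characters.

3. Don't build ranges such as [*-^] from non-alphanumeric characters. What they cover depends on character-code order and is hard to predict.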

4. Negation. Use ^ as the first character inside the brackets to mean "other than". E.g. [^a-z] matches any single character other than a to z.
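
The character class rules above can be checked with a short Ruby sketch. The sample strings and variable names are assumptions chosen for illustration, not part of the original notes.

```ruby
# Sample strings and variable names are illustrative assumptions.

# Ranges: lowercase and uppercase letters are separate ranges.
lowercase = "Hello World 42".scan(/[a-z]/).join   # => "elloorld"
uppercase = "Hello World 42".scan(/[A-Z]/)        # => ["H", "W"]

# [A-z] is too wide: it also covers [ \ ] ^ _ ` in ASCII order.
too_wide  = "a_b^c".scan(/[A-z]/)                 # => ["a", "_", "b", "^", "c"]
letters   = "a_b^c".scan(/[a-zA-Z]/)              # => ["a", "b", "c"]

# Negation: ^ as the first character means "anything except".
not_lower = "abc123".scan(/[^a-z]/)               # => ["1", "2", "3"]

# Escaping: ^ and - are special inside [ ], so escape them
# to match them literally.
symbols   = "a-b^c".scan(/[\^\-]/)                # => ["-", "^"]
```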

Character Class - short-cut
These are short-cuts for commonly used character classes.

1. A period (.) represents any character except a newline. Inside [ ], as in [.], it represents a literal period.

2. \s = whitespace character; \S = non-whitespace character.
Whitespace is space (" "), tab ("\t"), vertical tab ("\v"), carriage return ("\r"), line feed ("\n") and form feed ("\f").

3. Additional short-cuts

Shortcut	Meaning
\d	Any decimal digit (0-9)
\D	Any character but a decimal digit
\h	Any hexadecimal digit (0-9, A-F, a-f) (Ruby only)
\H	Any character but a hexadecimal digit (Ruby only)

4. \w matches a word character; \W matches a non-word character.
A word character is any letter, any digit or an underscore (_).
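
The short-cuts above can be seen in action in the following Ruby sketch. The sample strings and variable names are assumptions for illustration. \h and \H are used as the notes describe them, so this part is Ruby-specific.

```ruby
# Sample strings and variable names are illustrative assumptions.

# A period skips newlines; [.] matches only a literal period.
no_newline  = "a\nb".scan(/./)                 # => ["a", "b"]
literal_dot = "a.b".scan(/[.]/)                # => ["."]

# \s and \S: whitespace and non-whitespace.
spaces      = "a b\tc\n".scan(/\s/)            # => [" ", "\t", "\n"]
non_spaces  = "a b\tc\n".scan(/\S/)            # => ["a", "b", "c"]

# \d and \D: decimal digits and everything else.
digits      = "a1b2".scan(/\d/)                # => ["1", "2"]
non_digits  = "a1b2".scan(/\D/)                # => ["a", "b"]

# \h and \H: hexadecimal digits and everything else (Ruby only).
hex         = "ff 0G".scan(/\h/)               # => ["f", "f", "0"]
non_hex     = "ff 0G".scan(/\H/)               # => [" ", "G"]

# \w and \W: word characters (letters, digits, underscore) and the rest.
word_chars  = "user_name-42!".scan(/\w/).join  # => "user_name42"
non_word    = "user_name-42!".scan(/\W/)       # => ["-", "!"]
```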

Anchors
Anchors don't match any characters. They ensure that the regex matches only at a certain position in the string: the beginning or end of the string, the beginning or end of a line, or a word/non-word boundary.

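1. ^ means the beginning of a line. $ means the end of a line.

2. \A means the start of the string. \z means the end of the string.

3. \b means word boundary ( |word| ), where each pipe marks a boundary: the position between a word character and a non-word character, or the start or end of the string next to a word character.

\B means non-word boundary ( w|o|r|d ), where each pipe marks a non-boundary position, such as the gap between two word characters inside a word.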
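
The difference between line anchors, string anchors and word boundaries shows up clearly on a multi-line string. This is a minimal Ruby sketch; the three-line sample text and the variable names are assumptions for illustration.

```ruby
# Sample text and variable names are illustrative assumptions.
text = "cat\ncatalog\nbobcat"

# ^ and $ work per line.
line_starts = text.scan(/^cat/).size     # => 2  ("cat" and "catalog")
line_ends   = text.scan(/cat$/).size     # => 2  ("cat" and "bobcat")

# \A and \z work on the whole string.
string_start = text.scan(/\Acat/).size   # => 1  (first line only)
string_end   = text.scan(/cat\z/).size   # => 1  (last line only)

# \b requires a word boundary; \B requires the absence of one.
whole_word = text.scan(/\bcat\b/).size   # => 1  (the standalone "cat")
word_tail  = text.scan(/\Bcat\b/).size   # => 1  (the end of "bobcat")
word_head  = text.scan(/\bcat\B/).size   # => 1  (the start of "catalog")
```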

Quantifiers
Quantifiers are useful for matching repeated patterns.

Zero or more
* matches 0 or more occurrences of the pattern to its immediate left.
It really does mean zero or more: /x*/ matches an empty string, because zero occurrences of x is a valid match.

For example, (123)* matches 0 or more occurrences of the sequence 123.
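
The "zero" case is the surprising one, so here is a small Ruby sketch of it. The sample strings and variable names are assumptions for illustration. String#[] with a regex returns the first match as a string.

```ruby
# Sample strings and variable names are illustrative assumptions.

# With x present, * takes as many as it can.
some_x  = "xxxy"[/x*/]           # => "xxx"

# With no x at the start, * still succeeds by matching zero x's,
# which gives an empty match rather than no match.
zero_x  = "abc"[/x*/]            # => ""
empty   = "".match?(/x*/)        # => true

# Parentheses make * apply to the whole sequence 123.
repeats = "123123!"[/(123)*/]    # => "123123"
none    = "!"[/(123)*/]          # => ""
```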

One or more
+ matches 1 or more occurrences of the pattern to its immediate left.

Zero or one
? matches 0 or 1 occurrence of the pattern to its immediate left.

Any number of occurrences
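{m} matches exactly m occurrences of the pattern to its immediate left.

{m,n} matches from m up to n occurrences of the pattern to its immediate left. Write the braces without spaces, as spaces inside them generally stop the braces from being read as a quantifier.

{m,} matches m or more occurrences of the pattern to its immediate left.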
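
The remaining quantifiers can be compared side by side in a short Ruby sketch. The sample strings and variable names are assumptions for illustration. The \b anchors in the last two patterns keep each run of "a" from matching only in part.

```ruby
# Sample strings and variable names are illustrative assumptions.

# + needs at least one occurrence.
one_or_more = "caaat"[/ca+t/]                      # => "caaat"
needs_one   = "ct".match?(/ca+t/)                  # => false

# ? makes the preceding pattern optional.
optional_u  = "color colour".scan(/colou?r/)       # => ["color", "colour"]

# {m} matches an exact count.
date_like   = "2024-01-05".match?(/\A\d{4}-\d{2}-\d{2}\z/)  # => true

# {m,n} and {m,} match a bounded or open-ended count.
runs        = "a aa aaa aaaa"
two_to_three  = runs.scan(/\ba{2,3}\b/)            # => ["aa", "aaa"]
three_or_more = runs.scan(/\ba{3,}\b/)             # => ["aaa", "aaaa"]
```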
