RegEx Series: Alternative Characters
Alternate characters in regular expressions can be very useful in addition to the many ways I have previously shown you in this series to match patterns. I’ll first start with the | (pipe) symbol, which works much the same way as it does in our conditional statements, functioning as ‘or’. Second, I’ll show you how to use parentheses () in a regular expression to create capture groups, which let us group part of a pattern so that it is treated as a single unit.
Again, if you want to follow along, I will be using the free site regex101.com.
The pipe symbol in a regular expression lets you search for one pattern or another, just like an ‘or’ in a conditional statement. The regular expression looks like this:
/<insertPatternHere>|<insertPatternHere>/<optional flags here>
The two patterns are separated by the pipe in the middle. Unlike the logical ‘or’ we might use in an if-else statement, it is a single pipe | rather than a double pipe ||. For example, if we are looking for the string ‘cat’ or ‘dog’ within a larger string, we can set our regular expression to look for either one.
Let’s see what this looks like.
I have put the pipe symbol between my two strings, giving /cat|dog/, and as you can see it has already matched ‘cat’.
And if I type ‘dog’ into the test string, it matches as well.
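If you’d like to try the same idea in code, here is a minimal sketch using Python’s built-in re module (my choice of language; this series otherwise works in regex101). Python writes the pattern as a plain string instead of between slashes like /cat|dog/, and the test sentence is made up for illustration.

```python
import re

# Alternation: the pipe lets the pattern match "cat" OR "dog".
pattern = re.compile(r"cat|dog")

text = "My cat chased the dog down the street."
# The pattern has no capture groups, so findall returns the full matches.
matches = pattern.findall(text)
print(matches)  # ['cat', 'dog']
```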
Now what if I wanted to match something that ends with ‘at’ but starts with either ‘d’ or ‘c’? Previously I mentioned that you could do this with a character set such as /[cd]at/. However, we can also match it with the pipe symbol, combined with parentheses to form a capture group.
Parentheses Symbol => Capture Groups
To match something that ends with ‘at’ but starts with either ‘d’ or ‘c’, we create what is called a capture group, such as /(d|c)at/. The parentheses bundle what is inside them into a single unit, almost its own small regular expression (it reminds me of operator precedence), so the pipe only chooses between ‘d’ and ‘c’, and the rest of the pattern, ‘at’, follows the group. Without the parentheses, /d|cat/ would mean ‘d’ or ‘cat’. Let me show you.
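Here is a small Python sketch of that difference, again using the re module; the test string "cat dat bat hat" is just an illustrative assumption. It compares the grouped alternation, the equivalent character set, and the same pipe without parentheses. It uses finditer with group(0) because re.findall returns only the group’s contents when a pattern contains a capture group.

```python
import re

text = "cat dat bat hat"

# Parentheses limit the alternation to the group: "d" or "c", followed by "at".
grouped = [m.group(0) for m in re.finditer(r"(d|c)at", text)]
print(grouped)  # ['cat', 'dat']

# A character set matches the same strings in this case.
char_set = re.findall(r"[cd]at", text)
print(char_set)  # ['cat', 'dat']

# Without parentheses, the pipe splits the whole pattern: "d" OR "cat".
ungrouped = [m.group(0) for m in re.finditer(r"d|cat", text)]
print(ungrouped)  # ['cat', 'd']
```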
As you can see in the image above, the ‘d’ and the ‘c’ matched by the capture group are highlighted in green and the rest of each match is in blue. The blue shows the full match, while the green shows that the capture group has also matched. The image below may make this easier to understand:
Another great thing about capture groups is that they let you extract what the group matched separately from the whole match.
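As a rough illustration in Python (the test string is an assumption), each match object exposes the whole match as group(0) and the first capture group as group(1):

```python
import re

text = "cat dat bat hat"

pairs = []
for m in re.finditer(r"(d|c)at", text):
    # group(0) is the whole match; group(1) is only what the capture group matched.
    pairs.append((m.group(0), m.group(1)))

print(pairs)  # [('cat', 'c'), ('dat', 'd')]

# With exactly one group in the pattern, findall returns just the group's contents.
letters = re.findall(r"(d|c)at", text)
print(letters)  # ['c', 'd']
```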
You can also apply a quantifier to a capture group. The quantifier then repeats everything inside the group, instead of only the single character right before it, which is very useful.
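A short Python sketch of the difference, using a made-up test string: without a group, + repeats only the "a"; with a group, it repeats the whole "ha".

```python
import re

text = "hahaha haaa"

# Without a group, + repeats only the "a" immediately before it.
no_group = re.findall(r"ha+", text)
print(no_group)  # ['ha', 'ha', 'ha', 'haaa']

# With a group, + repeats the whole "ha" sequence.
with_group = [m.group(0) for m in re.finditer(r"(ha)+", text)]
print(with_group)  # ['hahaha', 'ha']
```

One thing to keep in mind: when a capture group is repeated, Python keeps only its last repetition, so group(1) of the first match is 'ha', not 'hahaha'.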
Now, the example above may seem arbitrary, so let’s try something more realistic. Take the string “The quick brown fox jumps over the lazy dog.” What if we didn’t know the animal would be a fox, and we wanted to match a ‘fox’, a ‘cat’ or a ‘bear’? We can do this with a capture group and the pipe symbol.
This is what our regular expression looks like, and it matches all three examples.
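The exact expression from the screenshot isn’t reproduced in the text, so here is a hypothetical Python version that puts the three alternatives inside one capture group and runs it against three variations of the sentence:

```python
import re

# Hypothetical pattern: one capture group holding three alternatives.
animal = re.compile(r"(fox|cat|bear)")

sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown cat jumps over the lazy dog.",
    "The quick brown bear jumps over the lazy dog.",
]

found = []
for sentence in sentences:
    match = animal.search(sentence)  # first position where any alternative matches
    if match:
        found.append(match.group(1))

print(found)  # ['fox', 'cat', 'bear']
```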
I hope this was helpful in learning how to match patterns using the | (‘or’) symbol and capture groups.