UltimateRegexResource

📝 A compilation of Regex syntax and resources for the Google DSC Regex Event


Watch a recording of the Regex presentation here!

Regex, or regular expressions, are patterns used to match strings. Regex is commonly used for searching/filtering strings for information, input validation, and web scraping. "Real-world" examples include everything from validating email addresses to formatting class names in a grades app.

Regex is incredibly powerful, but due to its seemingly unintelligible nature, it's also often intimidating to learn and difficult to remember.

For that reason, I've compiled a selection of the most helpful and commonly used regex syntax and some regex resources for your use below!

📄 Table of Contents

This repo contains a PowerPoint presentation that can be viewed online here.

The lab file in this repo contains real-world practice problems and links to gamified resources for the DSC event. You are welcome to try your hand at some of them.

The Redoku folder of this repo contains the app "Redoku," a simple React Native application created for this event that allows you to hone your Regex skills through sudoku-like puzzles. This was heavily based on redoku, an awesome website with the same name. Thank you to @padolsey for granting permission to use the name "Redoku!" Download it below!

iOS Android
Get it on the App Store Get it on Google Play

Notes

  • Regex has different flavors depending on the language you are using. Different engines support different features and some patterns have different meanings. While this resource attempts to cover as much as possible, there may be slight differences.
  • UltimateRegexResource uses JavaScript as the default regex engine. Where languages differ, I attempt to note it. For a full review of which regex patterns are legal in each language, check out this awesome gist.
  • Anywhere used below, character represents either a letter, digit, or symbol.

✒️ "Balderdash" Basics (of Regex)

  • Regular expressions are usually written between "delimiters." For example, JavaScript regex literals are wrapped in "slash" characters /, and Python regex usually begins with r" and ends with ". (Python doesn't have regex literals per se, but regex is easier to write in raw strings because you don't have to worry about string escapes.)
  • Patterns return the first case-sensitive match they find by default.

Therefore: given the sample string I scream, you scream, we all SCREAM for ice cream, /scream/ matches the first instance of "scream."

This behavior can be modified with flags.
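The default behavior described above can be sketched in JavaScript as follows. This is a minimal illustration using the sample string from the text; the variable names are only for demonstration.

```javascript
// Sample string from the text above.
const text = "I scream, you scream, we all SCREAM for ice cream";

// With no flags, match() returns only the first case-sensitive match.
const first = text.match(/scream/);
console.log(first[0]);    // "scream"
console.log(first.index); // 2 (position of the first match)

// Matching is case-sensitive by default, so a different casing finds nothing.
const none = text.match(/Scream/);
console.log(none);        // null
```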

🚩 "Flapdoodle" Flags

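| Syntax | Flag | Behavior | Example |
| --- | --- | --- | --- |
| `g` | global | Returns all matches instead of only the first | `/foo/g` |
| `i` | insensitive | Allows case-insensitive matches | `/foo/i` |
| `x` | verbose | Ignores whitespace & allows comments (not supported in JS) | `/foo/x` |
| `u` | unicode | Expressions are treated as Unicode (UTF-16) | `/foo/u` |
| `s` | singleline | Treats entire string as one line (allows `.` to match newline) | `/foo/s` |
| `m` | multiline | Start & end anchors now trigger on each line | `/foo/m` |
| `n` | no capture | Unnamed groups `(...)` no longer capture, in flavors such as Perl and .NET (not supported in JS) | `/foo/n` |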

Regex includes several flags that are appended to the end of the expression to change behavior. Using the string I scream, you scream, we all SCREAM for ice cream, the updated regex /scream/gi will now return scream scream SCREAM.
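As a rough sketch of the flag example above in JavaScript (variable names are illustrative), combining g and i returns every match regardless of case, while g alone skips the uppercase occurrence:

```javascript
const text = "I scream, you scream, we all SCREAM for ice cream";

// g returns every match; i ignores case.
const all = text.match(/scream/gi);
console.log(all);       // ["scream", "scream", "SCREAM"]

// With only g, the uppercase SCREAM is skipped.
const lowerOnly = text.match(/scream/g);
console.log(lowerOnly); // ["scream", "scream"]
```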

✏️ "Gibberish" Characters

| Syntax | Character | Matches | Example String | Example Expression | Example Match |
| --- | --- | --- | --- | --- | --- |
| `.` | any | Any character (except line break) | `a-c1-3` | `a.c` | `a-c` |
| `\w` | word | ASCII letter, digit, or underscore (or Unicode word character in Python & C#) | `a-c1-3` | `\w-\w` | `a-c` |
| `\d` | digit | Digit 0-9 (or Unicode digit in Python & C#) | `a-c1-3` | `\d-\d` | `1-3` |
| `\s` | whitespace | Space, tab, vertical tab, newline, carriage return (or Unicode separator in Python, C#, & JS) | `a b` | `a\sb` | `a b` |
| `\W` | NOT word | Anything `\w` does not match | `a-c1-3` | `\W` | `-` |
| `\D` | NOT digit | Anything `\d` does not match | `a-c1-3` | `\D-\D` | `a-c` |
| `\S` | NOT whitespace | Anything `\s` does not match | `a-c1-3` | `\S-\S` | `a-c` |
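A few of the character shorthands above can be tried out in JavaScript like this. It is a small sketch using the same sample strings as the table; `sample` is just an illustrative name.

```javascript
const sample = "a-c1-3";

// . matches any single character except a line break.
const anyChar = sample.match(/a.c/)[0];
console.log(anyChar);    // "a-c"

// \d matches digits, \D matches everything else.
const digits = sample.match(/\d-\d/)[0];
const nonDigits = sample.match(/\D-\D/)[0];
console.log(digits);     // "1-3"
console.log(nonDigits);  // "a-c"

// \W is the complement of \w, so here it only finds the hyphens.
const nonWord = sample.match(/\W/g);
console.log(nonWord);    // ["-", "-"]

// \s matches whitespace such as a space.
const spaced = "a b".match(/a\sb/)[0];
console.log(spaced);     // "a b"
```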

🖋️ "Bafflegab" Special Characters

| Syntax | Special Character | Matches | Example String | Example Expression | Example Match |
| --- | --- | --- | --- | --- | --- |
| `\` | escape | The following when preceding them: `[{()}].*+?$^/\` | `)$[]*{` | `\[\]` | `[]` |

| Syntax | Substitute | Behavior |
| --- | --- | --- |
| `\n` | newline | Insert a newline character |
| `\t` | tab | Insert a tab character |
| `\r` | carriage return | Insert a carriage return character |
| `\f` | form-feed | Insert a form feed character |
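To illustrate escaping in JavaScript, here is a minimal sketch. The helper `escapeRegex` is a hypothetical function written for this example (it is not part of this repo or of the language); it prefixes each special character with a backslash so that arbitrary text can be matched literally.

```javascript
// Escaped brackets match literal [ and ] characters.
const brackets = ")$[]*{".match(/\[\]/)[0];
console.log(brackets); // "[]"

// Hypothetical helper: backslash-escape every regex special character.
function escapeRegex(input) {
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Without escaping, $ and . would be treated as an anchor and a wildcard.
const literal = new RegExp(escapeRegex("$5.00"));
console.log(literal.test("Total: $5.00")); // true
console.log(literal.test("Total: $5x00")); // false

// \n inside a pattern matches a newline character.
const lines = "line1\nline2".split(/\n/);
console.log(lines); // ["line1", "line2"]
```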

🖌️ "Rigmarole" Ranges

| Syntax | Range | Matches | Example String | Example Expression | Example Match |
| --- | --- | --- | --- | --- | --- |
| `[pog]` | word list | Either p, o, or g | awesomePOSSUM123 | `[awesum]+` | awes |
| `[^pog]` | NOT word list | Any character except p, o, or g | awesomePOSSUM123 | `[^awesum]+` | o |
| `[a-z]` | word range | Any character between a and z, inclusive | awesomePOSSUM123 | `[a-z]+` | awesome |
| `[^a-z]` | NOT word range | Any character not between a and z, inclusive | awesomePOSSUM123 | `[^a-z]+` | POSSUM123 |
| `[0-9]` | digit range | Any character between 0 and 9, inclusive | awesomePOSSUM123 | `[0-9]+` | 123 |
| `[^0-9]` | NOT digit range | Any character not between 0 and 9, inclusive | awesomePOSSUM123 | `[^0-9]+` | awesomePOSSUM |
| `[a-zA-Z]` | word range | Any character between a and z or between A and Z, inclusive | awesomePOSSUM123 | `[a-zA-Z]+` | awesomePOSSUM |
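Several of the ranges above can be checked in JavaScript with a short sketch like the one below (the variable names are illustrative). Each call returns the first match only, since no g flag is used.

```javascript
const sample = "awesomePOSSUM123";

// A list matches any one of the listed characters; + repeats it.
const fromList = sample.match(/[awesum]+/)[0];
console.log(fromList);  // "awes" (stops at "o", which is not in the list)

// Ranges match any character between the two endpoints, inclusive.
const lower = sample.match(/[a-z]+/)[0];
const digits = sample.match(/[0-9]+/)[0];
console.log(lower);     // "awesome"
console.log(digits);    // "123"

// ^ at the start of the brackets negates the range.
const notDigits = sample.match(/[^0-9]+/)[0];
console.log(notDigits); // "awesomePOSSUM"

// Multiple ranges can be combined in one set of brackets.
const letters = sample.match(/[a-zA-Z]+/)[0];
console.log(letters);   // "awesomePOSSUM"
```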

There are also a few (mostly) semantically identical patterns in Golang and PHP. These do not appear to be supported in JS or Python:

| Syntax | Range | Matches | Example String | Example Expression | Example Match |
| --- | --- | --- | --- | --- | --- |
| `[[:alpha:]]` | alpha class | Any letter between a and z, inclusive, not case sensitive | Woodchuck could chuck 33 wood logs. | `[[:alpha:]]+` | Woodchuck |
| `[[:digit:]]` | digit class | Any digit 0-9 | Woodchuck could chuck 33 wood logs. | `[[:digit:]]+` | 33 |
| `[[:alnum:]]` | alphanumeric class | Any letter between a and z, inclusive, not case sensitive, and any digit 0-9 | Woodchuck could chuck 33 wood logs. | `[[:alnum:]]+` | Woodchuck |
| `[[:punct:]]` | punctuation class | Punctuation characters such as ?!.,:; | Woodchuck could chuck 33 wood logs. | `[[:punct:]]+` | . |

In some flavors of regex, the above are also called "Character Classes."

🖊️ "Jargon" Quantifiers

| Syntax | Quantifier | Matches | Example String | Example Expression | Example Match |
| --- | --- | --- | --- | --- | --- |
| `?` | optional | 0 or 1 of the preceding expression | ccc | `c?` | c |
| `{X}` | X | X of the preceding expression | ccc | `c{2}` | cc |
| `{X,}` | X+ | X or more of the preceding expression | ccc | `c{2,}` | ccc |
| `{X,Y}` | range | Between X and Y of the preceding expression | ccc | `c{1,3}` | ccc |

Beyond standard quantifiers, there are a few additional modifiers: greedy, lazy, and possessive.

| Syntax | Quantifier | Matches | Example String | Example Expression | Example Match |
| --- | --- | --- | --- | --- | --- |
| `*` | 0+ greedy | 0 or more of the preceding expression, using as many chars as possible | abccc | `bc*` | bccc |
| `+` | 1+ greedy | 1 or more of the preceding expression, using as many chars as possible | abccc | `c+` | ccc |
| `*?` | 0+ lazy | 0 or more of the preceding expression, using as few chars as possible | abccc | `bc*?` | b |
| `+?` | 1+ lazy | 1 or more of the preceding expression, using as few chars as possible | abccc | `c+?` | c |
| `*+` | 0+ possessive | 0 or more of the preceding expression, using as many chars as possible, without backtracking (not supported in JS; Python only from 3.11) | abccc | `bc*+` | bccc |
| `++` | 1+ possessive | 1 or more of the preceding expression, using as many chars as possible, without backtracking (not supported in JS; Python only from 3.11) | abccc | `c++` | ccc |

Put simply, greedy quantifiers match as much as possible, lazy as little as possible and possessive as much as possible without backtracking.

In practice, a possessive quantifier either returns the same match as its greedy counterpart or, where the greedy version would need to backtrack, fails to match at that position. Therefore, possessive quantifiers should be used when you know backtracking is not necessary, which can improve performance.
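The difference between greedy and lazy quantifiers can be sketched in JavaScript as below. Possessive quantifiers are left out because JavaScript does not support them, and the `html` string is an illustrative example that is not taken from this resource.

```javascript
const sample = "abccc";

// Greedy: take as many c characters as possible.
const greedy = sample.match(/c+/)[0];
console.log(greedy);    // "ccc"

// Lazy: take as few c characters as possible (at least one for +?).
const lazy = sample.match(/c+?/)[0];
console.log(lazy);      // "c"

// The difference matters most when something must follow the quantifier.
const html = "<b>bold</b> and <i>italic</i>";
const greedyTag = html.match(/<.+>/)[0];
const lazyTag = html.match(/<.+?>/)[0];
console.log(greedyTag); // "<b>bold</b> and <i>italic</i>"
console.log(lazyTag);   // "<b>"
```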

🖍️ "Gobbledygook" Groups

Groups allow you to pull out specific parts of a match. For example, given the string Peter Piper picked a peck of pickled peppers and the regex [peck]+ of (\w+), the whole match is peck of pickled, and an additional "capturing group," group 1, is returned containing pickled.

By default, the whole match is group 0, and capturing groups are then numbered 1, 2, 3, and so on, in the order their opening parentheses appear.

| Syntax | Group | Matches | Example String | Example Expression | Example Match |
| --- | --- | --- | --- | --- | --- |
| `\|` | alternate | Either the preceding or following expression | truly rural | `truly\|rural` | truly |
| `(...)` | isolate | Everything enclosed; treats as separate capture group | truly rural | `truly (rural)` | truly rural (group 1: rural) |
| `(?:...)` | include | Everything enclosed, without capturing; enables using quantifiers on part of regex | truly ruralrural | `truly (?:rural)+` | truly ruralrural |
| `(?\|...)` | combine | Everything enclosed; treats all matches as same group (not supported in JS or Python) | truly rural | `(?\|(rural)\|(truly))` | truly |
| `(?>...)` | atomic | Longest possible string without backtracking (not supported in JS; Python only from 3.11) | truly rural | `(?>rur)` | rur |
| `(?#...)` | comment | Everything enclosed; treats as comment and ignores (not supported in JS) | truly #rural | `truly (?#rural)` | truly |
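Group numbering can be explored in JavaScript with a sketch like the following. It reuses the Peter Piper example from this section; the variable names are illustrative. Only the group types that JavaScript supports are shown.

```javascript
const text = "Peter Piper picked a peck of pickled peppers";

// Index 0 of the result is the whole match; index 1 is the first capturing group.
const m = text.match(/[peck]+ of (\w+)/);
console.log(m[0]); // "peck of pickled"
console.log(m[1]); // "pickled"

// A non-capturing group can be quantified but adds no numbered group.
const nc = "truly ruralrural".match(/truly (?:rural)+/);
console.log(nc[0]);     // "truly ruralrural"
console.log(nc.length); // 1 (only the whole match)

// Alternation matches either side; with g, both words are found.
const either = "truly rural".match(/truly|rural/g);
console.log(either);    // ["truly", "rural"]
```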

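⚓ "Malarkey" Anchors

| Syntax | Anchor | Matches | Example String | Example Expression | Example Match |
| --- | --- | --- | --- | --- | --- |
| `^` | start | Start of string | she sells seashells | `^\w+` | she |
| `$` | end | End of string | she sells seashells | `\w+$` | seashells |
| `\b` | word boundary | Between a character matched and not matched by `\w` | she sells seashells | `s\b` | s (the last letter of sells) |
| `\B` | NOT word boundary | Any position that is not a word boundary, such as between two characters matched by `\w` | she sells seashells | `s\B` | s (the first letter of she) |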
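The anchors above can be sketched in JavaScript as follows. The two-line string `twoLines` is an illustrative addition used to show how the m flag changes ^.

```javascript
const text = "she sells seashells";

// ^ and $ pin the match to the start and end of the string.
const firstWord = text.match(/^\w+/)[0];
const lastWord = text.match(/\w+$/)[0];
console.log(firstWord); // "she"
console.log(lastWord);  // "seashells"

// \b requires a word boundary, so the first hit is the final s of "sells".
const atBoundary = text.match(/s\b/).index;
console.log(atBoundary); // 8

// \B requires that there is no boundary, so the first hit is the s of "she".
const insideWord = text.match(/s\B/).index;
console.log(insideWord); // 0

// With the m flag, ^ also matches at the start of each line.
const twoLines = "one\ntwo";
const withoutM = twoLines.match(/^\w+/g);
const withM = twoLines.match(/^\w+/gm);
console.log(withoutM);  // ["one"]
console.log(withM);     // ["one", "two"]
```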

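There are additional anchors available that are unaffected by multiline mode m. JavaScript does not support them.

| Syntax | Anchor | Matches | Example String | Example Expression | Example Match |
| --- | --- | --- | --- | --- | --- |
| `\A` | multi-start | Start of string | she sees cheese | `\A\w+` | she |
| `\Z` | multi-end | End of string, or just before a final trailing newline | she sees cheese | `\w+\Z` | cheese |
| `\z` | absolute end | Absolute end of string, including any trailing newline | she sees cheese | `\w+\z` | cheese |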

📌 "Mumbo Jumbo" Regex Resources

👥 "Codswallop" Contributing

  1. Fork UltimateRegexResource here
  2. Create a branch with your improvements (git checkout -b improvement/fooBar)
  3. Commit your changes (git commit -am 'Add some fooBar')
  4. Push to the branch (git push origin improvement/fooBar)
  5. Create a new Pull Request

Meta

Created by @GoldinGuy for the FAU Google DSC Regex Event.
