Regular expression problems from HackerRank (with Solutions)

sinjoysaha/regex

REGular EXpressions from HackerRank (with Solutions in Python 3)

Click on the subheadings to view the question on the HackerRank website.

The highlighted part of the Test String is the part matched by the regex pattern.

Regex Pattern - wikipedia

Test String - https://en.wikipedia.org/

Regex_Pattern = r'hackerrank'	# Do not delete 'r'.

import re
Test_String = input()
match = re.findall(Regex_Pattern, Test_String)
print("Number of matches :", len(match))

dot - The dot (.) matches anything (except for a newline).

Regex Pattern - A.B.C.D.

Test String - A+B+C=DE
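
As a quick check of the sample above, the following Python sketch runs the pattern with the standard `re` module. The variable names are illustrative and not part of the original solution.

```python
import re

pattern = r"A.B.C.D."
m = re.search(pattern, "A+B+C=DE")
print(m.group())  # A+B+C=DE

# Without the re.DOTALL flag, the dot does not match a newline.
newline_case = re.search(r"A.B", "A\nB")
print(newline_case)  # None
```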

regex_pattern = r"^...\....\....\....$"	# Do not delete 'r'.

import re
test_string = input()
match = re.match(regex_pattern, test_string) is not None
print(str(match).lower())

\d - The expression \d matches any digit [0-9].

\D - The expression \D matches any character that is not a digit.

Regex Pattern - \D\D\D\D

Test String - Hack101
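
A minimal sketch of the sample above, using illustrative variable names: `\D` picks out the non-digit run and `\d` picks out the digits.

```python
import re

text = "Hack101"
non_digits = re.search(r"\D\D\D\D", text).group()
digits = re.findall(r"\d", text)

print(non_digits)  # Hack
print(digits)      # ['1', '0', '1']
```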

Regex_Pattern = r"\d\d\D\d\d\D\d\d\d\d"	# Do not delete 'r'.

import re
print(str(bool(re.search(Regex_Pattern, input()))).lower())

\s - \s matches any whitespace character [ \r\n\t\f ].

\S - \S matches any non-white space character.

Regex Pattern - \s

Test String - A B
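
The sample can be checked with a short sketch (variable names are illustrative): `\s` finds the single space, and `\S` finds everything else.

```python
import re

text = "A B"
whitespace = re.findall(r"\s", text)
non_whitespace = re.findall(r"\S", text)

print(whitespace)      # [' ']
print(non_whitespace)  # ['A', 'B']
```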

Regex_Pattern = r"\S\S\s\S\S\s\S\S"	# Do not delete 'r'.

import re
print(str(bool(re.search(Regex_Pattern, input()))).lower())

\w - The expression \w will match any word character. Word characters include alphanumeric characters (a-z, A-Z and 0-9) and underscores ( _ ).

\W - \W matches any non-word character. Non-word characters include characters other than alphanumeric characters (a-z, A-Z and 0-9) and underscore ( _ ).

Regex Pattern - \w\w\w

Test String - $one
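
A small sketch of the sample above, with illustrative names. The extra string `a_1` is an assumption added to show that the underscore counts as a word character.

```python
import re

word = re.search(r"\w\w\w", "$one").group()
non_word = re.findall(r"\W", "$one")
with_underscore = re.findall(r"\w", "a_1")

print(word)             # one
print(non_word)         # ['$']
print(with_underscore)  # ['a', '_', '1']
```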

Regex_Pattern = r"\w\w\w\W\w\w\w\w\w\w\w\w\w\w\W\w\w\w"	# Do not delete 'r'.

import re
print(str(bool(re.search(Regex_Pattern, input()))).lower())

^ - The ^ symbol matches the position at the start of a string.

$ - The $ symbol matches the position at the end of a string.

Regex Pattern - ^123

Test String - 123456
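
The anchors can be tried out with the sketch below. The second and third inputs are assumptions chosen to show a failing `^` and a succeeding `$`.

```python
import re

at_start = re.search(r"^123", "123456")
not_at_start = re.search(r"^123", "0123456")
at_end = re.search(r"456$", "123456")

print(at_start is not None)  # True
print(not_at_start)          # None
print(at_end is not None)    # True
```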

Regex_Pattern = r"^\d\w\w\w\w.$"	# Do not delete 'r'.

import re
print(str(bool(re.search(Regex_Pattern, input()))).lower())

[] - The character class [] matches only one out of several characters placed inside the square brackets.

Regex Pattern - [aeiou] is a vowel

Test String - o is a vowel | e is a vowel

Regex_Pattern = r'^[123][120][xs0][30Aa][xsu][.,]$'	# Do not delete 'r'.

import re
print(str(bool(re.search(Regex_Pattern, input()))).lower())

[^] - The negated character class [^] matches any character that is not in the square brackets.

Regex Pattern - [^aeiou] is not a vowel

Test String - k is not a vowel | p is not a vowel
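
A short sketch contrasting a character class with its negation. The input strings are written out in the code and are assumptions for illustration.

```python
import re

negated = r"[^aeiou] is not a vowel"

consonant = re.search(negated, "k is not a vowel")
vowel = re.search(negated, "a is not a vowel")
positive = re.search(r"[aeiou] is a vowel", "o is a vowel")

print(consonant is not None)  # True
print(vowel)                  # None: 'a' is excluded by [^aeiou]
print(positive is not None)   # True
```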

Regex_Pattern = r'^[\D][^aeiou][^bcDF][\S][^AEIOU][^.,]$'  # OR r'^[^\d][^aeiou][^bcDF][^\s][^AEIOU][^.,]$'	# Do not delete 'r'.

import re
print(str(bool(re.search(Regex_Pattern, input()))).lower())

A hyphen (-) inside a character class specifies a range of characters where the left and right operands are the respective lower and upper bounds of the range. For example:

  • [a-z] is the same as [abcde...wxyz].
  • [A-Z] is the same as [ABCDE...WXYZ].
  • [0-9] is the same as [0123456789].

In addition, if you use a caret (^) as the first character inside a character class, it will match anything that is not in that range. For example, [^0-9] matches any character that is not a digit in the inclusive range from 0 to 9. It's important to note that, when used outside of (immediately preceding) a character or character class, the caret matches the first character in the string against that character or set of characters.

Regex Pattern - [x-z][4-8][A-K]

Test String - x5F
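
The ranges in the sample can be exercised as follows. The inputs `w5F` and `a1b2` are assumptions added to show a character outside the range and a negated range.

```python
import re

pattern = r"[x-z][4-8][A-K]"

in_range = re.search(pattern, "x5F")
out_of_range = re.search(pattern, "w5F")   # 'w' is outside x-z
non_digits = re.findall(r"[^0-9]", "a1b2")

print(in_range.group())  # x5F
print(out_of_range)      # None
print(non_digits)        # ['a', 'b']
```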

Regex_Pattern = r'^[a-z][1-9][^a-z][^A-Z][A-Z]'	# Do not delete 'r'.

import re
print(str(bool(re.search(Regex_Pattern, input()))).lower())

{x} - The tool {x} will match exactly x repetitions of character/character class/group.

Regex Pattern - \w{4}

Test String - H_ck
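
A minimal sketch of an exact repetition count. `re.fullmatch` is used here (an illustrative choice) so that strings with too few or too many characters are rejected.

```python
import re

exact = re.search(r"\w{4}", "H_ck").group()
too_short = re.fullmatch(r"\w{4}", "H_c")
too_long = re.fullmatch(r"\w{4}", "H_cks")

print(exact)      # H_ck
print(too_short)  # None
print(too_long)   # None
```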

Regex_Pattern = r'^[a-zA-Z02468]{40}[13579\s]{5}$'	# Do not delete 'r'.

import re
print(str(bool(re.search(Regex_Pattern, input()))).lower())

{x,y} - The {x,y} tool will match between x and y (both inclusive) repetitions of character/character class/group.

Regex Pattern - \w{1,4}\d{4,}

Test String - Hk132156545654654654 | Hack1021
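
The sketch below runs the sample pattern on both test strings. The input `Hack102` is an assumption added to show a string with too few trailing digits.

```python
import re

pattern = r"\w{1,4}\d{4,}"
strings = ["Hk132156545654654654", "Hack1021"]

# Each sample string is matched in full.
matches = [re.search(pattern, s).group() for s in strings]
too_few_digits = re.search(pattern, "Hack102")

print(matches)
print(too_few_digits)  # None
```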

Regex_Pattern = r'^\d{1,2}[a-zA-Z]{3,}\.{0,3}$'	# Do not delete 'r'.

import re
print(str(bool(re.search(Regex_Pattern, input()))).lower())

* - The * tool will match zero or more repetitions of character/character class/group.

Regex Pattern - Ab*s

Test String - As | Abbbbbs

Regex_Pattern = r'^\d{2,}[a-z]*[A-Z]*$'	# Do not delete 'r'.

import re
print(str(bool(re.search(Regex_Pattern, input()))).lower())

+ - The + tool will match one or more repetitions of character/character class/group.

Regex Pattern - Ab+s

Test String - As | Abbbbbs
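
The difference between `*` and `+` shows up on this test string: `Ab*s` accepts `As` (zero repetitions of `b`), while `Ab+s` does not. A short sketch, with illustrative names:

```python
import re

text = "As | Abbbbbs"

star_matches = re.findall(r"Ab*s", text)
plus_matches = re.findall(r"Ab+s", text)

print(star_matches)  # ['As', 'Abbbbbs']
print(plus_matches)  # ['Abbbbbs']
```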

Regex_Pattern = r'^\d+[A-Z]+[a-z]+$'	# Do not delete 'r'.

import re
print(str(bool(re.search(Regex_Pattern, input()))).lower())

$ - The $ boundary matcher matches an occurrence of a character/character class/group at the end of a line.

Regex Pattern - \w*s$

Test String - Challenges | Hints
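
A quick sketch of the sample pattern. The input `Hint` is an assumption added to show a word that does not end in `s`.

```python
import re

pattern = r"\w*s$"

ends_with_s = [re.search(pattern, w).group() for w in ["Challenges", "Hints"]]
no_trailing_s = re.search(pattern, "Hint")

print(ends_with_s)    # ['Challenges', 'Hints']
print(no_trailing_s)  # None
```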

Regex_Pattern = r'^[a-zA-Z]*[sS]$'	# Do not delete 'r'.

import re
print(str(bool(re.search(Regex_Pattern, input()))).lower())

\b - \b asserts position at a word boundary.

Regex Pattern - \bcat\b

Test String - Acat | A cat
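
The sample can be checked as follows: in `Acat` there is no boundary between `A` and `c`, so only `A cat` matches. Variable names are illustrative.

```python
import re

pattern = r"\bcat\b"

glued = re.search(pattern, "Acat")
separate = re.search(pattern, "A cat")

print(glued)            # None
print(separate.span())  # (2, 5)
```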

Regex_Pattern = r'\b[aeiouAEIOU][a-zA-Z]*\b'	# Do not delete 'r'.

import re
print(str(bool(re.search(Regex_Pattern, input()))).lower())

() - Parentheses () around a regular expression group that part of the regex together. This allows us to apply different quantifiers to that group.

These parentheses also create a numbered capturing group. It stores the part of the string matched by the part of the regex inside the parentheses.

These numbered capturing groups can be used for backreferences.

Regex Pattern - It is (not)? your fault

Test String - It is not your fault | It is your fault
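
A hedged sketch of grouping and capturing. Note one assumption: the space is moved inside the group, `(not )?`, because with the space outside the group the shorter string would need two consecutive spaces to match.

```python
import re

# Assumption: the optional unit is "not " including its trailing space.
pattern = r"It is (not )?your fault"

with_not = re.search(pattern, "It is not your fault")
without_not = re.search(pattern, "It is your fault")

print(repr(with_not.group(1)))  # 'not '
print(without_not.group(1))     # None: the group did not participate

# A quantifier applied to a group repeats the whole group.
three_oks = re.search(r"(ok){3,}", "okokok")
two_oks = re.search(r"(ok){3,}", "okok")
print(three_oks is not None, two_oks)  # True None
```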

Regex_Pattern = r'(ok){3,}'	# Do not delete 'r'.

import re
print(str(bool(re.search(Regex_Pattern, input()))).lower())

| - Alternations, denoted by the | character, match a single item out of several possible items separated by the vertical bar. When used inside a character class, it will match characters; when used inside a group, it will match entire expressions (i.e., everything to the left or everything to the right of the vertical bar). We must use parentheses to limit the use of alternations.

Regex Pattern - (and|AND|And)

Test String - And the award goes to A and D company
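
Running the sample with `re.findall` shows which alternatives are picked up. Because the pattern has one capturing group, `findall` returns the text captured by that group.

```python
import re

text = "And the award goes to A and D company"
found = re.findall(r"(and|AND|And)", text)

print(found)  # ['And', 'and']
```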

Regex_Pattern = r'^(Mr\.|Mrs\.|Ms\.|Dr\.|Er\.)[a-zA-Z]+$'	# Do not delete 'r'.

import re
print(str(bool(re.search(Regex_Pattern, input()))).lower())

\group_number - This tool (\1 references the first capturing group) matches the same text as previously matched by the capturing group.

Regex Pattern - (\w)(\w)(\w)(\w)y\4\3\2\1

Test String - malayalam
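
A sketch of the palindrome sample: the four captured characters are replayed in reverse order by the backreferences. The input `malayalim` is an assumption added as a non-matching case.

```python
import re

pattern = r"(\w)(\w)(\w)(\w)y\4\3\2\1"

palindrome = re.search(pattern, "malayalam")
not_palindrome = re.search(pattern, "malayalim")

print(palindrome.group())   # malayalam
print(palindrome.groups())  # ('m', 'a', 'l', 'a')
print(not_palindrome)       # None
```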

Regex_Pattern = r'^([a-z])(\w)(\s)(\W)(\d)(\D)([A-Z])([a-zA-Z])([aeiouAEIOU])(\S)\1\2\3\4\5\6\7\8\9\10$'	# Do not delete 'r'.

import re
print(str(bool(re.search(Regex_Pattern, input()))).lower())

A backreference to a capturing group that matches nothing is different from a backreference to a capturing group that did not participate in the match at all.

Capturing group that matches nothing

Regex Pattern - (\b?)o\1

Test String - o

Here, \b? is optional and matches nothing. Thus, (\b?) matches successfully and captures nothing. o is matched with o, and \1 successfully matches the nothing captured by the group.

Capturing group that didn't participate in the match at all

Regex Pattern - (\b)?o\1

Test String - o

In most regex flavors (excluding JavaScript), (\b)?o\1 fails to match o. Here, (\b) fails to match at all. Since the whole group is optional, the regex engine proceeds to match o. The engine then arrives at \1, which references a group that did not participate in the match attempt at all. Thus, the backreference fails to match.
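
The distinction can be observed in Python with the sketch below. It swaps `\b` for a literal `q` and tests against `b`, which is an assumption made for illustration: `\b` is zero-width and can itself succeed at the start of `o`, so a literal makes the non-participating case unambiguous.

```python
import re

# Group participates and captures the empty string: \1 matches "nothing".
empty_capture = re.search(r"(q?)b\1", "b")

# Group does not participate at all: in Python's re, \1 then fails.
no_participation = re.search(r"(q)?b\1", "b")

print(repr(empty_capture.group(1)))  # ''
print(no_participation)              # None
```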

Regex_Pattern = r"^(\d\d)(-?)(\d\d)\2(\d\d)\2(\d\d)$"	# Do not delete 'r'.

import re
print(str(bool(re.search(Regex_Pattern, input()))).lower())

NOTE - Branch reset group is supported by Perl, PHP, Delphi and R.

(?|regex) - A branch reset group consists of alternations and capturing groups: (?|(regex1)|(regex2)). Alternatives in a branch reset group share the same capturing group numbers.

Regex Pattern - (?|(Haa)|(Hee)|(bye)|(k))\1

Test String - HaaHaa | kk

Given below is the Perl code.

$Regex_Pattern = '^(\d\d)(?|(---)|(-)|(\.)|(:))(\d\d)\2(\d\d)\2(\d\d)$';

$Test_String = <STDIN> ;
if($Test_String =~ /$Regex_Pattern/){
    print "true";
} else {
    print "false";
}
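
Python's standard `re` module does not support `(?|...)`. For this particular pattern, a rough Python equivalent can be sketched by putting all separators into a single capturing group, so that `\2` still refers to whichever separator matched. The function name `is_valid` is hypothetical.

```python
import re

# One group with alternation stands in for the branch reset group.
pattern = re.compile(r"^(\d\d)(---|-|\.|:)(\d\d)\2(\d\d)\2(\d\d)$")

def is_valid(s: str) -> bool:
    """Return True if s uses the same separator between all digit pairs."""
    return pattern.search(s) is not None

print(is_valid("12-34-56-78"))  # True
print(is_valid("12-34.56-78"))  # False: separators differ
```

This works here because every alternative is a single capture. Patterns whose alternatives contain several groups each would need a different rewrite.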

NOTE - Forward reference is supported by JGsoft, .NET, Java, Perl, PCRE, PHP, Delphi and Ruby regex flavors.

A forward reference is a backreference to a group that appears later in the regex. Forward references are only useful inside a repeated group, because on a later iteration the regex engine may evaluate the backreference after the group has already matched.

Regex Pattern - (\2amigo|(go!))+

Test String - go!go!amigo

Given below is the Perl code.

$Regex_Pattern = '^(\2tic|(tac))+$';

$Test_String = <STDIN> ;
if($Test_String =~ /$Regex_Pattern/){
    print "true";
} else {
    print "false";
}
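
Python's standard `re` module is not among the flavors listed above and rejects the forward reference at compile time. The sketch below first confirms that, then uses a hand-written rewrite that should accept the same strings as the Perl pattern: the first repetition must be `tac`, and each later one is `tac` or `tactic`. The rewrite and the function names are assumptions for illustration, not part of the original solution.

```python
import re

def supports_forward_reference() -> bool:
    """Try to compile the Perl pattern; re raises re.error for \\2 here."""
    try:
        re.compile(r"^(\2tic|(tac))+$")
    except re.error:
        return False
    return True

# Hypothetical rewrite without a forward reference.
rewritten = re.compile(r"^tac(?:tactic|tac)*$")

def is_valid(s: str) -> bool:
    return rewritten.search(s) is not None

print(supports_forward_reference())  # False
print(is_valid("tactactic"))         # True
print(is_valid("tactic"))            # False
```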