Inspired by a question on Stack Overflow.
Someone on Stack Overflow was trying to match dates with a regular expression in PHP. I decided to construct that regex myself, for fun.
This post is a slightly longer version of my answer to the question.
Construct a regex to match dates of the form YYYY-MMM-DD.
Conditions to satisfy
- YYYY is between 1900 and 9999 inclusive.
- MMM is one of JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, DEC.
- 00 should not be a valid DD.
- Only valid dates for each month should work.
- FEB-29 should be matched only for leap years. Keep in mind that the 100 and 400 divisibility rules for leap year detection should also be satisfied.
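For comparison, here is a minimal sketch of the same conditions written as ordinary code rather than as a regex. It is not part of the original answer; the names `MONTH_DAYS`, `is_leap`, and `is_valid_date` are made up for illustration, and the input is assumed to be an uppercase string such as `2000-FEB-29`.

```python
# Days per month in a non-leap year (hypothetical helper table).
MONTH_DAYS = {
    "JAN": 31, "FEB": 28, "MAR": 31, "APR": 30, "MAY": 31, "JUN": 30,
    "JUL": 31, "AUG": 31, "SEP": 30, "OCT": 31, "NOV": 30, "DEC": 31,
}


def is_leap(year: int) -> bool:
    """Divisible by 4, except century years, which must be divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_valid_date(text: str) -> bool:
    """Check a YYYY-MMM-DD string against the conditions listed above."""
    parts = text.split("-")
    if len(parts) != 3:
        return False
    year_s, month_s, day_s = parts
    # The year must be exactly four ASCII digits and the day exactly two.
    if len(year_s) != 4 or len(day_s) != 2:
        return False
    if not (year_s.isascii() and year_s.isdigit()
            and day_s.isascii() and day_s.isdigit()):
        return False
    if month_s not in MONTH_DAYS:
        return False
    year, day = int(year_s), int(day_s)
    if not 1900 <= year <= 9999:
        return False
    last_day = MONTH_DAYS[month_s]
    if month_s == "FEB" and is_leap(year):
        last_day = 29
    # Day 00 is rejected here because the lower bound is 1.
    return 1 <= day <= last_day
```

A function like this is a handy oracle when checking what the regex accepts: 1900 and 2100 are not leap years, while 2000 and 2400 are.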
The final regex is 238 characters long. I could probably cut it down a little more if I removed the capture groups for the components and dropped the ?: from some groups.
This explanation is intended for people who already have familiarity with regex rules. If that’s not you, you might want to learn those first. You could start with rexegg.com.
    ^                                  // beginning of line
    (                                  // capture group for year
      (?!                              // match leap date for multiples of 400 but not for other multiples of 100
        (?:
          [02468][^048]|[13579][^26]   // clever hack based on a pattern in 4's multiples
        )
        00-FEB-29
      )
      (?:19|[2-9]\d)                   // first two digits are 19 or 20–99, so years run from 1900 to 9999
      (?!                              // do not match FEB-29 when the last two digits are not divisible by 4
        (?:
          [02468][^048]|[13579][^26]   // same hack as earlier
        )
        -FEB-29
      )
      \d\d
    )
    -                                  // separator
    // do not match dates beyond the month's last
    (?!FEB-3)
    (?!(?:APR|JUN|SEP|NOV)-31)
    (                                  // capture group for month
      JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC
    )
    -                                  // separator
    (?!00)                             // do not match a zero date
    (                                  // capture group for date
      (?:[0-2][0-9]|3[01])             // 00–29 or 30–31
    )
    $                                  // end of line
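To try the pattern outside PHP, here is a rough Python sketch using `re.VERBOSE` so the regex can stay split across lines. Some assumptions are mine: the names `DATE_RE` and `parse_date` are hypothetical, the leap-year alternation is written out in full as `[02468][^048]|[13579][^26]` (an even digit followed by a digit other than 0, 4, 8, or an odd digit followed by a digit other than 2, 6), and `re.ASCII` is added so `\d` only matches 0-9.

```python
import re

DATE_RE = re.compile(
    r"""
    ^
    (                                               # group 1: year
      (?!(?:[02468][^048]|[13579][^26])00-FEB-29)   # XY00 is leap only if XY is a multiple of 4
      (?:19|[2-9]\d)                                # first two digits: 19, or 20 to 99
      (?!(?:[02468][^048]|[13579][^26])-FEB-29)     # FEB-29 needs last two digits divisible by 4
      \d\d
    )
    -
    (?!FEB-3)                                       # no FEB-30 or FEB-31
    (?!(?:APR|JUN|SEP|NOV)-31)                      # 30-day months
    (JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)   # group 2: month
    -
    (?!00)                                          # no zero date
    ((?:[0-2][0-9]|3[01]))                          # group 3: date
    $
    """,
    re.VERBOSE | re.ASCII,
)


def parse_date(text: str):
    """Return (year, month, date) as strings, or None if the text is rejected."""
    # fullmatch is used because Python's $ also matches just before a trailing newline.
    match = DATE_RE.fullmatch(text)
    return match.groups() if match else None


if __name__ == "__main__":
    for sample in ["2000-FEB-29", "1900-FEB-29", "2023-FEB-29", "2024-APR-31"]:
        print(sample, parse_date(sample))
```

Only the first sample should come back as a tuple; the other three are a non-leap century year, a non-leap ordinary year, and a date past the end of April.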
I didn’t explain the “clever hack” for leap year detection above. I’ll do that now.
All two-digit multiples of 4 are either an even digit followed by one of 0, 4, 8 or an odd digit followed by one of 2, 6. Since we’re using a negative lookahead to check for invalid years, we negate this condition to arrive at [02468][^048]|[13579][^26]. This pattern is used twice: once for the first two digits of the year (because XY00 years are leap only when XY is a multiple of 4) and once for the last two digits of the year (because a year is not leap if its last two digits are not divisible by four).
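The digit-pair claim is easy to check by brute force. The sketch below assumes the negated pattern is `[02468][^048]|[13579][^26]` (even digit plus a digit other than 0, 4, 8, or odd digit plus a digit other than 2, 6) and compares it against plain modulo arithmetic for every two-digit string from 00 to 99. The names `NOT_MULTIPLE_OF_4` and `flagged` are hypothetical.

```python
import re

# Matches two-digit strings that are NOT multiples of 4.
NOT_MULTIPLE_OF_4 = re.compile(r"[02468][^048]|[13579][^26]")


def flagged(two_digits: str) -> bool:
    """True if the pattern classifies the two-digit string as a non-multiple of 4."""
    return NOT_MULTIPLE_OF_4.fullmatch(two_digits) is not None


# Collect every value where the regex disagrees with n % 4.
mismatches = [n for n in range(100) if flagged(f"{n:02d}") != (n % 4 != 0)]
print(mismatches)  # prints []
```

The reason it works: 20 is a multiple of 4, so an even tens digit contributes nothing modulo 4 and the units digit must be 0, 4 or 8, while an odd tens digit leaves a remainder of 2, which only a units digit of 2 or 6 cancels out.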
Was that fun? Absolutely! Is this approach practical? Hell no!
Play with this regex on my RegExr demo.