
Regular expressions : Tricks you should know

This week I’ve been using regular expressions quite a lot, and I wanted to share some very useful tricks I’ve learnt over the years.

I firmly believe that regular expressions should be taught at the very beginning of any CS course, without focusing too much on the underlying automata theory.

It would save a lot of space on StackOverflow and it would save you a lot of time.

Together, we will explore these topics (in JavaScript) :

  • Input validation
  • Smart Find & Replace in code editor
  • Real use-case

By the end of this article you will be able to use regular expressions as smoothly as conditional statements. It will save you time.

Note that regular expressions are almost language-agnostic, so most of what follows applies no matter which programming language you’re working with, although a few constructs differ between engines.

First things first !

Skip this part if you’re familiar with regular expressions already.

  • A regex is structured like this : /expression/flags
  • Most common flags :
    - i : case insensitive
    - g : global (doesn't stop after first match)
    - m : multi-line
  • Most common anchors :
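    - ^ : Start of string (or start of each line with the m flag)
    - $ : End of string (or end of each line with the m flag)
    - \A : Start of string, not affected by the m flag (PCRE, Python, Ruby and others; not supported in JavaScript)
    - \Z : End of string, not affected by the m flag (same engines; not supported in JavaScript)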
  • Most common quantifiers :
    - {n} : Exactly n times
    - {n,} : At least n times
    - {m,n} : Between m and n times
    - ? : Zero or one time
    - + : One or more times
    - * : Zero or more times
  • Most common meta sequences :
    - . : Any character but \n and \r
    - \w | \W : Any word character | Any non-word character
    - \d | \D : Any digit character | Any non-digit character
    - \s | \S : Any whitespace character | Any non-whitespace character
  • Character set :
    - [abc] : Will match either a, b or c
    - [1-9] : Will match any digit from 1 to 9
    - [a-zA-Z] : Will match any ASCII letter
  • Match any character but :
    - [^abc] : Matches anything but a, b or c
  • Escape a character :
    - \character (example : escaping + => \+)
  • Group characters (groups are also used for capturing, see further down) :
    - (group of characters) (example : /(he)+/ will match 'hehehe')
  • One group or another :
    - | : /^h((ello)|(ola))$/ will match both 'hello' and 'hola'
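Here is a quick tour of the cheat sheet above in runnable form. It is only a sketch: the sample strings and variable names are made up for illustration.

```javascript
// Flags: i ignores case, g finds every match instead of stopping at the first.
const allHellos = 'Hello hello HELLO'.match(/hello/gi);
console.log(allHellos); // [ 'Hello', 'hello', 'HELLO' ]

// m flag: ^ and $ now match at the start and end of each line.
const lines = 'one\ntwo'.match(/^\w+$/gm);
console.log(lines); // [ 'one', 'two' ]

// Anchors and quantifiers: exactly four digits and nothing else.
const isYear = /^\d{4}$/;
console.log(isYear.test('2024'));  // true
console.log(isYear.test('20245')); // false

// Negated character set: drop everything that is not a digit.
const digitsOnly = 'a1b2c3'.replace(/[^0-9]/g, '');
console.log(digitsOnly); // 123

// Groups and alternation, using the regex from the list above.
const helloOrHola = /^h((ello)|(ola))$/;
console.log(helloOrHola.test('hello'), helloOrHola.test('hola')); // true true
```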

Input validation

In JavaScript you can check whether a string matches a regular expression using test :

 

const str = '123456789';
if (/^\d{9}$/.test(str)) {
  console.log('1 -> 9, you got it');
}

// Prints : 1 -> 9, you got it
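One thing to watch out for with test : a regex carrying the g flag remembers where its last match ended (its lastIndex property), so calling test twice on the same regex object can give two different answers. A small sketch of the pitfall, with made-up variable names:

```javascript
// Without the g flag, test() is stateless and safe to call repeatedly.
const nineDigits = /^\d{9}$/;
console.log(nineDigits.test('123456789')); // true
console.log(nineDigits.test('123456789')); // true

// With the g flag, the regex keeps lastIndex between calls,
// so the second call starts searching after the previous match.
const nineDigitsGlobal = /\d{9}/g;
console.log(nineDigitsGlobal.test('123456789')); // true
console.log(nineDigitsGlobal.test('123456789')); // false
```

For plain validation, it is usually simplest to leave the g flag out.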

 

For instance, let’s check whether a string is a phone number.

We will work with French ones, but feel free to adapt this to whatever best suits your needs.

A French phone number has two common formats :

  • +33 6 68 56 23 05
  • 06 68 56 23 05

We will build a regex to match both !

 

const phone1 = '+33 6 68 56 23 05';
const phone2 = '06 68 56 23 05';

const regex = /^((\+33 \d)|(0\d))( \d{2}){4}$/;

if (regex.test(phone1) && regex.test(phone2)) {
  console.log('Call me maybe');
}

// Prints : Call me maybe

 

 

We’ve got two possible beginnings :

  • +33 6 : matched with (\+33 \d)
  • 06 : matched with (0\d)
  • Combined via the OR operator : |

Followed by 4 pairs of digits which are separated by a blank space:
( \d{2}){4}


There are countless ways to implement input validation with regular expressions. In our scenario we could have chosen to use hyphens instead of blank spaces as digit separators.
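As a sketch of that idea, here is a hypothetical variant of the phone regex that accepts either a blank space or a hyphen between groups of digits. The name flexiblePhone is mine, not part of any standard.

```javascript
// Hypothetical variant: [ -] accepts a space or a hyphen as separator.
const flexiblePhone = /^((\+33[ -]\d)|(0\d))([ -]\d{2}){4}$/;

console.log(flexiblePhone.test('+33 6 68 56 23 05')); // true
console.log(flexiblePhone.test('06-68-56-23-05'));    // true
console.log(flexiblePhone.test('0668562305'));        // false (no separators)
```

Note that this version also accepts mixed separators such as '06 68-56 23-05'. Whether that is acceptable depends on how strict you want to be.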

One of the best parts of regular expressions is that you can easily modify the rules over time, whereas doing the same thing with a custom hand-written algorithm would be MUCH harder.


Smart Find & Replace in code editor

We will use capturing groups !

- /(group)/ : This captures a group and stores it in a 1-indexed array of captures
- \1, \2, ... (written \n for the nth group) : This matches again whatever was captured by the nth group

This feature is VERY important.

Example !

Let’s say you’re working on a chat app in which everyone has to type their ID. Your app is very strict about IDs : they must start and end with the same digit.

Each ID follows this scheme : X-PSEUDO-X where X is a digit.

This is how you solve the problem :

 

const uid = '6-DAVID-6';
const regex = /^(\d)-.+-\1$/;

if (regex.test(uid)) {
  console.log('It is a valid uid');
}

// Prints : It is a valid uid

 

What happened ?

We captured the first digit with (\d) and it was stored in the array of captured groups.

Then we used -.+- to match any pseudonym bounded by hyphens.

Finally, we used \1 to reuse the first captured digit (which equals 6).
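To actually look at that array of captured groups, you can use match instead of test. In this sketch I added a second group around the pseudonym (an assumption of mine, the regex above does not capture it) to show that groups are numbered from 1, while index 0 holds the whole match.

```javascript
// Same idea as above, with an extra (hypothetical) group around the pseudonym.
const uidPattern = /^(\d)-(.+)-\1$/;

const result = '6-DAVID-6'.match(uidPattern);
console.log(result[0]); // 6-DAVID-6  (the whole match)
console.log(result[1]); // 6          (first captured group)
console.log(result[2]); // DAVID      (second captured group)

// The last digit differs from the first one, so \1 fails to match.
console.log('6-DAVID-7'.match(uidPattern)); // null
```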

 

Here is the interesting part !

Using $n in the replacement string of a function that accepts a regular expression, such as replace, lets us reuse captured groups (n is the number of the captured group).

Example !

 

// We will implement a Prettier option :
// Remove parentheses on ES6 functions using only a single parameter
// (arg) => { } becomes arg => { }

const es6Function = '(arg) => { }';
const regex = /\((.+)\)/;
const cleanedSingleParameter = es6Function.replace(regex, '$1');

console.log(cleanedSingleParameter);

// Prints : arg => { }
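One caveat : (.+) matches anything between parentheses, so the regex above would also strip the parentheses around several parameters. Here is a slightly stricter sketch, not the rule Prettier itself uses, which assumes the parameter is a plain identifier and uses a lookahead to require the arrow right after it.

```javascript
// (\w+) accepts a single identifier only; (?=\s*=>) checks that an arrow follows
// without including it in the match.
const singleParam = /\((\w+)\)(?=\s*=>)/g;

console.log('(arg) => { }'.replace(singleParam, '$1'));  // arg => { }
console.log('(a, b) => { }'.replace(singleParam, '$1')); // (a, b) => { }
```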

 

 

You could work with two groups and use $1 and $2 for instance.
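For example, two groups are enough to swap the two halves of a string. The sample name below is just an illustration.

```javascript
// $1 is the first captured word, $2 the second one.
const fullName = 'Lovelace, Ada';
const swapped = fullName.replace(/^(\w+), (\w+)$/, '$2 $1');

console.log(swapped); // Ada Lovelace
```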

The best part is that you can do the same in the Find & Replace tool of most code editors.

For instance, using VS Code :

 

I want to truncate the decimal part of each variable.

Done in seconds !
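The exact pattern used in the editor is not reproduced here, so the following is only a guess at an equivalent in JavaScript. It assumes the variables hold simple decimal literals; the sample source string is made up.

```javascript
// Assumed input: a few declarations with decimal literals.
const source = 'const a = 1.25;\nconst b = 42.9;\nconst c = 7;';

// Keep the integer part (group 1) and drop the dot and the decimals.
const truncated = source.replace(/(\d+)\.\d+/g, '$1');

console.log(truncated);
// const a = 1;
// const b = 42;
// const c = 7;
```

In VS Code, the same idea would be to enable regex mode in Find & Replace, search for (\d+)\.\d+ and replace with $1.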

Some tips :

  • Don’t forget to escape parentheses and braces
  • Always try to capture the largest groups so that you don’t have to rewrite too much of a string.
  • Use \s+ and \n+ when dealing with multiple lines in a code editor, as the m flag is rarely supported.
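A small sketch of the first and last tips, written in JavaScript rather than in an editor. The strings are made up.

```javascript
// Unescaped parentheses create a group; escaped ones match literal characters.
const unescaped = /(arg)/;
const escaped = /\(arg\)/;
console.log(unescaped.exec('(arg)')[0]); // arg
console.log(escaped.exec('(arg)')[0]);   // (arg)

// \s+ also matches line breaks, so it can span several lines.
const collapsed = 'foo(\n  a\n)'.replace(/\(\s+a\s+\)/, '(a)');
console.log(collapsed); // foo(a)
```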

Real use-case

I was working on a JS project which had not been updated to ES6 syntax and one of the things I did was to convert all the functions to their ES6 equivalent.

This is what I wanted :

// Old school
function doSomething(a,b,c) { console.log(a,b,c); }

// ES6
const doSomething = (a,b,c) => { console.log(a,b,c); };

 

This is how I did it in VS Code :

I built my regular expression by hand :

  • First, we need to remove the function keyword
  • Then, we capture the name of the function (.+)
  • Then, we capture the args enclosed by parentheses (\(.+\))
  • Then, we capture the function body enclosed by braces (\{.+\})
  • Finally, we can replace everything using the 3 captured groups
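The steps above can be sketched in JavaScript as follows. The three groups come straight from the list; the literal spaces between them and the replacement string are my assumptions about how the pieces were glued together.

```javascript
const legacy = 'function doSomething(a,b,c) { console.log(a,b,c); }';

// Group 1: function name, group 2: args with parentheses, group 3: body with braces.
const toArrow = /function (.+)(\(.+\)) (\{.+\})/;

const converted = legacy.replace(toArrow, 'const $1 = $2 => $3;');

console.log(converted);
// const doSomething = (a,b,c) => { console.log(a,b,c); };
```

Because .+ is greedy and does not cross line breaks, this sketch only suits one-line functions like the one above. Longer bodies would need a more careful pattern.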

Regular expressions are already accepted by a lot of built-in functions (split, for instance) and they are very handy to use.
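For instance, split accepts a regex as its separator, which helps when the delimiters are inconsistent. A minimal sketch with a made-up input :

```javascript
// Split on commas or semicolons, swallowing any whitespace around them.
const csvLine = 'a, b;c ,d';
const fields = csvLine.split(/\s*[,;]\s*/);

console.log(fields); // [ 'a', 'b', 'c', 'd' ]
```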

If I had one piece of advice : become comfortable working with regular expressions, as it is a skill which will save you a lot of time.
