Indie Dev


Regular Expressions (in JavaScript)

JibstaMan

Towns Guard
Xy$
0.00
Basics
Regular Expressions, aka regex and RegExp, are quite tough. I'm not an expert on the topic, but I have used regular expressions for over a year now and I guess I can write something useful about them.

In this tutorial, I've started with the "basics". If things go well, I might expand the tutorials with two additions:

Programming Regular Expression: once you can write regular expressions, what are the different ways they can be used within JavaScript?
Regular Expression Advanced: goes from theory to more practical examples and hopefully will make you able to figure things out by yourself.

If the tutorial goes too fast on certain topics, try to follow along until the first example. There are many things going on in a single regular expression, so I start with quite a bit of theory before giving a real example. Let me know if more examples would be good.

Why regular expressions?
Regular expressions are able to match parts within a larger body of text (called a string). You can either extract those matches, remove those matches or simply test whether a match is present. For example, plugin writers use regular expressions to extract the parts of the object notes that are relevant to their plugin.
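Those three uses can be sketched roughly like this. The note text and the <hp: 50> tag format are made up for illustration and don't come from any particular plugin. Don't worry about the details of the regex yet, the rest of the tutorial explains them.

```javascript
// Hypothetical object note; the <hp: 50> tag format is an assumption for this sketch.
var note = "A sturdy shield. <hp: 50> Found in the first dungeon.";
var tagRegex = /<hp: (\d+)>/;

// 1. Test whether a match is present.
var hasTag = tagRegex.test(note); // true

// 2. Extract the match (index 1 holds the part between the parentheses).
var found = note.match(tagRegex);
var hp = found ? Number(found[1]) : 0; // 50

// 3. Remove the match from the string.
var cleaned = note.replace(tagRegex, "");
```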

What are regular expressions?
Well, besides gibberish? They are a sequence of characters which specify what to match within the string. You could see it as a requirements specification. The spec is used to find matches. Anything that deviates from the spec will not match and is ignored.

JavaScript:
/(s.*?characters).*?specify (.+)/i;
This is just a useless example to give you a general idea of what regular expressions can look like. This regex can do the following:

Input string:
Code:
They are a sequence of characters which specify what to match within the string.
Matches:
Code:
"sequence of characters", "what to match within the string."
Testing regular expressions
There's a great online resource for testing regular expressions: regex101.com. Besides testing a regex, you can also see what it does, what the different characters within the regex do individually and all the matches found (highlighted within the string and shown separately at the right side of the screen).

However, as you'll (likely) see (after I've added it to the tutorial), using regular expressions within code gives you a few options. Knowing which option works best for your use case is at times quite challenging. To experiment with the different options, you can use an online JavaScript playground, like jsfiddle, jsbin, or codepen. I generally use jsfiddle, but that's just because I'm accustomed to it. Feel free to pick your own.

Construction
There are two ways of creating a regular expression:
  1. Literal notation: /body/modifiers
  2. Constructor function: new RegExp("body", "modifiers")
Generally, your regular expressions will be static, so the literal notation is preferred. Sometimes, however, you want to be able to dynamically add things to the regular expression. In these cases, building a string and passing it to the constructor function is the way to do it. Note that when writing a regex within a string, you need to escape the \, e.g. \\n. Otherwise, the string itself interprets the backslash and changes the meaning of the character n (into a newline character), instead of placing a backslash within the regex.
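A small sketch of both notations side by side. The currency variable is a hypothetical value, it only stands in for "something you don't know until the code runs".

```javascript
// Literal notation: the pattern is fixed when the script is written.
var literal = /\d+ gold/i;

// Constructor function: the pattern is built from a string at runtime.
// The backslash is doubled, because "\d" inside a string would lose its backslash.
var currency = "gold"; // hypothetical value, e.g. read from plugin parameters
var dynamic = new RegExp("\\d+ " + currency, "i");

var text = "You found 25 Gold!";
var literalResult = literal.test(text); // true
var dynamicResult = dynamic.test(text); // true

// Both notations end up with the same regex body.
var sameBody = literal.source === dynamic.source; // true
```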

Capture groups
First a bit of theory. Everything within the regular expression is about matching a part of a string or the entire string, right? Sometimes, you'll want to find and extract parts of that string. To do this, you need to capture those parts.

Capture groups are created by surrounding parts within the regex with (). Let's take my useless example again:

JavaScript:
/(s.*?characters).*?specify (.+)/i;
There are two capture groups in here. For this regex to match, the words characters and specify need to be present within the string. However, specify isn't within a capture group, so even though it is matched, it's not captured.
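To see the difference between matched and captured, here's a short sketch that runs the example through exec. Index 0 of the result holds everything that was matched, the indexes after that hold the capture groups in order.

```javascript
var regex = /(s.*?characters).*?specify (.+)/i;
var input = "They are a sequence of characters which specify what to match within the string.";

var result = regex.exec(input);

var fullMatch = result[0]; // everything that was matched, including "specify"
var first = result[1];     // "sequence of characters"
var second = result[2];    // "what to match within the string."
```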

Character groups
Character groups are ways to allow multiple characters at a specific position within the string. Character groups are created by surrounding characters with [].

Let's make my example even more atrocious, shall we?

JavaScript:
/(s[a-z ]*?characters).*?specify (.+)/i;
Now we have introduced a character group within the regex. Note that it replaced the . from before. This character group allows any lower-case letter and spaces. (Because of the i modifier at the end, upper-case letters are allowed as well.)
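A quick sketch of what that change does in practice. The input string is made up for this comparison: it has a digit in it, which the . accepts but the character group doesn't.

```javascript
var text = "sets of 2 small characters which specify rules";

// The . allows the digit, so the capture starts at the very first s.
var withDot = /(s.*?characters).*?specify (.+)/i.exec(text);
var dotCapture = withDot[1]; // "sets of 2 small characters"

// [a-z ] doesn't allow the digit, so the capture can only start at an s after the 2.
var withGroup = /(s[a-z ]*?characters).*?specify (.+)/i.exec(text);
var groupCapture = withGroup[1]; // "small characters"
```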
Quantity and greediness
By now, you must be addicted to my awesome example! One of the challenging things with regex is that many characters have multiple meanings in different contexts. Let's zoom in on some specific features within the regex: *, + and ?.

In the regex, there are three uses with different contexts:
  1. [a-z ]*?characters
  2. .*?specify
  3. (.+)
The first context could be spelled out as "a character group repeated as few times as possible". In technical terms, this is a quantifier from 0 to ∞, but it's also being lazy about it. The * means it matches 0 to ∞ of the previous character. Since the previous character is the closing bracket (]) of a character group, the entire character group is used as the previous character. So we're telling the system to match any lower-case letter or space as many times as possible. However, part of this meaning is changed by the ? immediately after the *: it makes the quantifier lazy, so it matches as few times as possible instead. Combined, it will match lower-case letters or spaces only until the system finds characters as a whole. Remember: if characters isn't in the string, the regex won't match anything within the string.

The same is true for the second context. It allows any character except newlines (\n) and repeatedly allows those characters until the first occurrence of specify is found within the string. More about this later.

The last context is different, since it uses a + and doesn't have the ?. A + is very similar to *, but it doesn't allow 0 characters. So it matches any character (except newlines) 1 to ∞ times.

Time to start getting your hands dirty. Let's test the first context with a few different strings. To make this easy, we're using the g modifier, which returns all matches, not just the first.

I've made an example for you to toy with. It contains some instructions to see the above explanation in action.
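If you'd rather try this in your own playground, here's a minimal sketch of the first context with and without the ?. The test string is made up and deliberately contains characters twice.

```javascript
var text = "a b characters and c characters";

// Greedy: * grabs as much as possible, so it runs on to the LAST "characters".
var greedy = text.match(/[a-z ]*characters/g);
// ["a b characters and c characters"]

// Lazy: *? stops at the FIRST "characters" it reaches, then g looks for more matches.
var lazy = text.match(/[a-z ]*?characters/g);
// ["a b characters", " and c characters"]

// * is happy with zero characters, + needs at least one.
var star = /x*/.test("abc"); // true, it matches an empty string
var plus = /x+/.test("abc"); // false
```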

Character escape
Before I tell you about what really fills a regular expression, there's one more thing I need you to know and that is escaping characters. \ allows you to escape other characters, but what does escaping mean? Escaping a character changes the meaning of that character. Special characters lose their meaning, while normal characters receive meaning.

Consider the capture groups and character groups. To make those, we use these characters: ( ) [ ]. Those characters have special meaning within a regular expression. But what if you want to match that character? What if you are looking for a parenthesis within your string? In this case, you need to remove the special meaning by prefixing it with a backslash:

JavaScript:
/\(explanation\)/
There's also the example of adding meaning, which we will explain in more detail in the next chapter. A quick example is \d. \d stands for "digit", it is synonymous with [0-9]. When you have a d within your regex, it simply matches the character d. If you prefix it with a \, its meaning is changed to match a single digit.
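Both directions of escaping can be checked with a few test calls. The input strings are just made up for this sketch.

```javascript
// Unescaped parentheses form a capture group; they don't match "(" or ")" themselves.
var group = /(explanation)/.test("an explanation"); // true

// Escaped parentheses lose their special meaning and match literal "(" and ")".
var literalMissing = /\(explanation\)/.test("an explanation"); // false
var literalFound = /\(explanation\)/.test("an (explanation)");  // true

// A plain d matches the letter d, an escaped \d matches any single digit.
var plainD = /d/.test("42"); // false
var digit = /\d/.test("42"); // true
```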

Special characters (and pre-made character groups)
There is quite an extensive list of special characters for regular expressions. This isn't a complete list, but it's a good, basic starting point. You can find complete lists elsewhere.



Modifiers
Modifiers (can also be called flags) are added at the end of the regex. They change certain properties of how the regular expression functions. These are quite important, especially when writing more advanced regular expressions with many repeating parts.
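As a small illustration, here are the two modifiers used elsewhere in this tutorial: i (ignore case) and g (global, find all matches). The strings are made up for this sketch.

```javascript
// i: ignore the difference between upper and lower case.
var withoutI = /hello/.test("Hello World"); // false
var withI = /hello/i.test("Hello World");   // true

// g: find all matches instead of stopping at the first one.
var firstOnly = "a1b2c3".match(/\d/);  // array containing only "1"
var allDigits = "a1b2c3".match(/\d/g); // ["1", "2", "3"]
```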



Now that we've got the basics out of the way, time to dive into the more difficult concepts and options at our disposal.

Programming
RegExp.test
The easiest function when using regular expressions is the test function. It tests whether there is at least one match within the string. If there is, test returns true.
JavaScript:
if (/((hello) (world))/.test("hello world"))
{
    alert("hello world");
}

RegExp.$n (or in regex style: /RegExp\.\$\d+/)
Note that the test function returns a boolean, so any capture groups seem to be ignored. Well, that's not the case. You can still access the captured data (individually). They are present in the global RegExp object, until you perform another regular expression.

Taking our previous hello world example:
JavaScript:
var val0 = RegExp.$1; // = hello world

var val1 = RegExp.$2; // = hello

var val2 = RegExp.$3; // = world
See this working.
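The RegExp.$n properties are legacy features (MDN lists them as deprecated), so as an alternative, here's a sketch that reads the same values from the array returned by exec. The array is shifted by one compared to RegExp.$n, since index 0 holds the entire match.

```javascript
var match = /((hello) (world))/.exec("hello world");

var whole = match[0]; // the entire match: "hello world"
var val0 = match[1];  // same as RegExp.$1: "hello world"
var val1 = match[2];  // same as RegExp.$2: "hello"
var val2 = match[3];  // same as RegExp.$3: "world"
```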

Advanced
Non-capture groups
There's also a way to group characters together, but without actually capturing what is within the group. This is done similarly to capture groups: (?:). Note that, technically speaking, you don't need to use non-capture groups. You could use capture groups and ignore any captured data you're not interested in. However, not capturing unnecessary data should make things easier for you to understand and save memory.

There are currently two use cases I can think of where using non-capture groups makes sense:
  1. Make part of the regular expression optional, without capturing it.
  2. Making an options statement, without capturing it.
Optional group which isn't captured
This allows you to create optional parts within the regular expression, without cluttering the captured information.

JavaScript:
/((?:https?:\/\/)?(?:www\.)?rpgmakermv\.co)/gi
See this in action. Note the match information to the right. Only the entire matches are visible. Try removing the ?: within the non-capture groups and you'll see lots of other things being captured, which we might not need.
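The same comparison can be done in code. This is only a sketch with a made-up input string; it runs the regex once with the ?: in place and once without.

```javascript
var text = "Visit http://www.rpgmakermv.co or just rpgmakermv.co";

// With non-capture groups: only the one outer capture group is reported.
var lean = /((?:https?:\/\/)?(?:www\.)?rpgmakermv\.co)/i.exec(text);
// lean[0] and lean[1] are both "http://www.rpgmakermv.co"

// With ?: removed: the protocol and the www. part are captured as well.
var cluttered = /((https?:\/\/)?(www\.)?rpgmakermv\.co)/i.exec(text);
// cluttered[2] is "http://", cluttered[3] is "www."

// With the g modifier, String.match lists every full match.
var all = text.match(/((?:https?:\/\/)?(?:www\.)?rpgmakermv\.co)/gi);
// ["http://www.rpgmakermv.co", "rpgmakermv.co"]
```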

Non-capture group with multiple options
Instead of making the entire group optional, we allow multiple options within the regular expression.

JavaScript:
/((?:https?:\/\/)?(?:www\.)?(?:rpgmakermv|rmmv)\.co)/gi
See this in action. Note that both rpgmakermv and rmmv are matched, since they are options within the non-capture group. Options are separated by a pipe (|).
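A short sketch of the options in code, using a made-up sentence that contains both allowed names and one name that isn't in the list.

```javascript
var regex = /((?:https?:\/\/)?(?:www\.)?(?:rpgmakermv|rmmv)\.co)/gi;
var text = "Both www.rpgmakermv.co and https://rmmv.co match, but rpgmaker.co doesn't.";

// rpgmaker.co is skipped, since rpgmaker is not one of the options.
var found = text.match(regex);
// ["www.rpgmakermv.co", "https://rmmv.co"]
```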

Examples
I've been working on a program that will make things easier when it comes to the MV Generator. The most important feature is importing files. This is done by taking a folder as input, looping over the contents and copying and renaming the files.

There are two things to keep in mind here:

1. While renaming, I need to make sure the file names don't already exist. I keep record of the highest number within the file names, so I can use that when renaming files.
2. The highest number depends on gender, since the generator has fewer male hair options than female ones. It also depends on the category (FrontHair, RearHair, etc).

Considering these two things, I have two problems:

1. I need to know which gender and category a specific file belongs to. This is needed to get the next unused number for that gender and category.
2. I need to rename the file, so I can perform the copy (technically, we copy the contents of the file to a new location, which has the renamed file name).

To accomplish this, I have two regular expressions:

1. /(Male|Female)(?:\\|\/).*?_(.*?)_/
2. /_p(\d+)/i

The first regular expression contains a few advanced themes. Remember, we care about the gender and category of the file. So we want the file path to contain either Male or Female. Anything that doesn't have that should be ignored. The non-capture group after that allows both Windows and Unix style file path separators. We don't need them to be captured, but we need to group the allowed options. Then we tell the system to allow anything. We don't care what it is. We don't care whether it's Face, SV, TV, TVD or Variation. What we do care about is that the system stops at the first underscore (_) occurrence, so we use the ? for non-greediness. Then we expect the category to be within the string. The category is between underscores, so we again use non-greediness to capture up to the next occurrence of an underscore. That's it, we got the two things we needed.

When using the first regular expression, I used RegExp.exec, since I want the captured data.
JavaScript:
var fileRegex = /(Male|Female)(?:\\|\/).*?_(.*?)_/;
...
var match = fileRegex.exec(sourceFilePath);
if (!match)
{
    // ignore the file
    return;
}

var gender = match[1],
    category = match[2];
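To try the first regular expression on its own, here's a self-contained sketch. The function name and the file paths are hypothetical; they only imitate the kind of paths described above.

```javascript
var fileRegex = /(Male|Female)(?:\\|\/).*?_(.*?)_/;

// Hypothetical helper: returns the gender and category,
// or null when the path should be ignored.
function getGenderAndCategory(sourceFilePath)
{
    var match = fileRegex.exec(sourceFilePath);
    if (!match)
    {
        return null;
    }
    return { gender: match[1], category: match[2] };
}

// Hypothetical paths, one per separator style.
var unixResult = getGenderAndCategory("Generator/Face/Female/FG_FrontHair_p01.png");
// { gender: "Female", category: "FrontHair" }
var windowsResult = getGenderAndCategory("Generator\\TV\\Male\\TV_RearHair_p03.png");
// { gender: "Male", category: "RearHair" }
var ignored = getGenderAndCategory("Generator/Face/Kid/FG_FrontHair_p01.png");
// null, since the path contains neither Male nor Female
```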
The second regular expression is much easier. We simply capture the number, which comes after _p. There's one important thing to consider here. We DO NOT perform this regular expression on the full path. The full path might have _p\d+ in it, which we don't want. We perform the regex on the file name, excluding all folders within the path. This is to remove the chance of any false positives. We also ignore case, on the off chance that an author has the file name upper-cased, namely the p.
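Here's a sketch of such a false positive. The path is hypothetical and constructed so that a folder name happens to contain _p12. Splitting on both separator styles and taking the last part is just one possible way to get the file name.

```javascript
var numberRegex = /_p(\d+)/i;

// Hypothetical path where a folder name also contains _p followed by digits.
var fullPath = "Imports/pack_p12/Female/FG_FrontHair_P01.png";

// Strip the folders by splitting on either separator and keeping the last part.
var fileName = fullPath.split(/[\\\/]/).pop(); // "FG_FrontHair_P01.png"

var fromPath = numberRegex.exec(fullPath)[1]; // "12", the false positive
var fromName = numberRegex.exec(fileName)[1]; // "01", the number we want
```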

Since we're renaming a file, we want to modify the existing string. This can easily be done with the String.replace function.
JavaScript:
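var renameRegex = /_p(\d+)/i;

function getRenamedFileName(source, number)
{
    number = number || 1;
    // Pad single digits with a leading zero, e.g. 7 becomes "07".
    number = (number < 10) ? "0" + number : String(number);
    // Keep the matched "_p" (or "_P") and replace only the digits after it.
    return source.replace(renameRegex, function (match)
    {
        return match.slice(0, 2) + number;
    });
}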
PS: this example is as of yet untested. Things are getting a little complicated, so I'm calling it a day.

Further reading:

If you find this useful, I'd appreciate you mark it as helpful :)
 

LTN Games

Master Mind
Resource Team
Xy$
0.01
This was a great tutorial, it explains in great detail what does what, and with your examples I found it even easier to understand. When I first started learning Reg Ex I had issues because the only thing I was learning from was the Online tester, and they leave a small description for each symbol but it's not quite enough without a good explanation. Thank you for creating this it has helped me quite a bit, I feel a little more confident with it now.(cheeky)
 

JibstaMan

Towns Guard
Xy$
0.00
Added some minor information to the tutorial:

1. Programming -> RegExp.test
2. Examples

I'm having difficulty finding useful examples to explain the rest of the RegExp functions... I guess I'm just too darn perfectionistic and should accept any example over no content... (slanty)
 