Express Yourself: Writing REGEX Functions for Administrators
Salesforce is only as good as the quality of data that is captured. Without good controls on data entry, users cannot trust in their database to provide actionable intelligence about business operations. This is also why it is of paramount importance to use the right validations at the right time so that our users can be guided to provide appropriate information.
While a good foundation can be built by properly using standard field types, in some cases admins must allow free text so they use custom formatting. To control this custom formatting we use validation rules to ensure that users are entering the right data in the right places. One of the most powerful functions that can be used to create these rules is REGEX(). This tool allows us to create a phrase that is used to parse data and search for various criteria within a string. Unlike regular functions, REGEX allows us to state criteria in linear terms rather than trying to build long strings of if, and, or or statements.
Building a Simple REGEX Function
REGEX functions are like a checklist that is applied to a string of characters. To create a full statement we need both a string to check and an expression to check the string with:
REGEX([string to check],”[reference expression]”)
To write the reference expression we have to step through what we would like to check the string for. For example, if we check for the group of characters “Salesforce” using a REGEX function we would first find a capital-‘S’ character. Then we would check to see if the next letter is an ‘a’, after that an ‘l’, and so on and so forth. The expression would move from the start to the end of the string checking off criteria. Ultimately it would result in a true or false statement depending on whether or not the checklist was completed. The function to accomplish this is as follows:
REGEX(“Salesforce”,” Salesforce”)
This function returns true because Salesforce = Salesforce. To create a validation rule we use the NOT() function to reverse the Boolean:
NOT(REGEX(“Salesforce”,” Salesforce”))
Using Special Characters in REGEX Functions
To check if “Salesforce” is contained within a longer phrase, we must tell the function to look for an optional group of miscellaneous characters before and after. To do this we use what is called a character class. Specifically we use the wild card character class ‘.’ followed by the quantifier ‘*’. The asterisk used in conjunction with the wild card denotes that any character should be matched zero or more times before moving on to the next criteria. Therefore the following function will not fire a validation error because “Salesforce” is captured within the string:
NOT(REGEX(“We think Salesforce is a great SaaS”,”.*Salesforce.*”))
Note that the following will also return false and subsequently not cause an error message:
NOT(REGEX(“Salesforce”,”.*Salesforce.*”)
As we have just seen, some characters have special powers when included in an expression. We can stop these operators from applying their custom criteria by using the breakout character ‘’. In Salesforce we actually use a double backslash because a single backslash is an escape character for the rest of the system. To check for the string “Salesforce.” we must breakout the period character using a double backslash:
NOT(REGEX(“Salesforce.”,”.* Salesforce ..*”)
The same could be done for the asterisk and any other character that has special meaning in the REGEX toolkit. A list of these characters can be found in the links below. It is helpful to know as many of these as possible as they will make for more elegant and clean expressions.
Groupings, Ranges, and Quantifiers in REGEX Functions
We can further expand the range of possibilities that are valid by using groupings, ranges and quantifiers. Groups allow us to capture a group and apply different criteria to it. For example if we want to make sure that “Salesforce” is in the string and the word “Rocks” is optional we could do the following:
NOT(REGEX(“Salesforce”,”Salesforces(Rocks)?”)
Note here that we checked for a space using the character class ‘s’, captured “Rocks” using parenthesis and made it optional using the quantifier ‘?’. We should also observe that the quantifier requires that the immediately previous input be repeated a measure of times. The question mark and the asterisk are examples that define ranges of repetitions: zero to one and zero to infinity respectively. To create custom ranges we can use braces and up to two criteria:
NOT(REGEX(“Salesforce Salesforce Salesforce”,”(Salesforces?){3,7}”)
This expression will not fire a validation error if the word “Salesforce” is repeated three to seven times with zero or one spaces between. We can also define ranges of characters that are available for use using brackets. For example, to validate that a user has entered a date that follows the format ‘mm/dd’ we can use the following:
NOT(REGEX(“12/25”,”((0[1-9]|1[012])/([123]0|[012][1-9]|31))”)
Here we have said that the first group of numbers can be from 01-09 or 10, 11, or 12. A forward slash and then a second group that can be 10, 20, 30, 01-09, 11-19, 21-29 or 31 must follow the first group. Using this expression the users cannot enter months that are out of the standard 12 months or 31 days. We could further refine this by adding in a negative look-back criteria ‘?<!’:
NOT(REGEX(“12/25”, “((0[1-9]|1[012])/([123]0|[012][1-9]|31))(?<!(02/29|02/30|02/31))”
This last expression limits the user from entering invalid dates in February by using a combination of standard characters and the special ‘|’ character that denotes an or statement.
Using REGEX functions to create validations for custom data types can become extremely complicated and difficult to read. The benefit, however, is that the logic is linear and can be updated much more easily than long nested logic statements.
The diehard programmers and computer scientists reading this have probably noticed that some of the criteria seems odd or inefficient. This may be true, but it has all been developed to comply with the engine that is given to Salesforce Administrators rather than Developers. A couple limitations that are important to keep in mind are the following
- Formula REGEX automatically searches for the beginning of the string at the beginning of the expression. To search in the middle of a string you must include optional wild cards on either side of the search expression.
- Formula REGEX doesn’t handle spaces and new lines very well in rich/long text areas. The result of this is that reading across line breaks is difficult if not impossible.
- There is no replace or modify functionality. The function solely returns a Boolean value.
- Formula REGEX cannot be used in formula fields or custom buttons and links.
Useful Resources
Regular Expression Language – Quick Reference (Microsoft)
RegexOne: Learn Regular Expressions with simple, Interactive Exercises