Why doesn’t Split work right in APEX?

Doing a split (breaking a string into an array of Strings on a delimiter) is a staple feature in most programming languages. It is extremely useful for parsing out a list of values, or isolating a specific part of a string that you need.

Recently I tried to remove the extension from a file name, but ran into a few issues. Below is my use case and solution for fellow developers who may run into this problem.

Problem: I needed to remove the extension from a file name.

Solution: Split the file name on the “dot” and take the first string of the resulting array to get the filename, discarding the extension. Below is example code showing the split:

String FullFileName = 'MyFile.ext';
List<String> parts = FullFileName.split('.');
This should yield a list containing [MyFile] and [ext].

Problem: After implementing that, my filename was not getting a value, even though the code said to split the string on the period character. What’s not working?

Solution: As it turns out, Split does not look for a literal string. Instead, it looks for a Regular Expression (RegEx) match. In RegEx, a period is a special character; it means “match any single character.” That is why my split came back empty: every character in the file name counted as a delimiter, so nothing was left over. To match a literal period, we need an escape character (a backslash, \) before the period so RegEx treats it as a period and not as “any single character.” And because Apex string literals also treat the backslash as an escape character, we need a double backslash in the source code, even though the pure RegEx pattern uses a single one: the first backslash escapes the second, and the resulting single backslash escapes the period. The solution is:

List<String> parts = FullFileName.split('\\.');

Mystery solved! With the escaped Split argument, I’m able to get both parts of the filename, before and after the “dot.”
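To see the difference side by side, here is a minimal sketch you could paste into Execute Anonymous. The variable names are only for illustration.

```apex
String fullFileName = 'MyFile.ext';

// Unescaped: '.' is the RegEx "any single character", so every character
// is treated as a delimiter. Only empty strings remain, and split drops them.
List<String> unescaped = fullFileName.split('.');
System.debug(unescaped.size()); // 0

// Escaped: '\\.' in Apex source becomes the RegEx \. (a literal period).
List<String> escaped = fullFileName.split('\\.');
String fileName = escaped[0];   // text before the first period
String extension = escaped[1];  // text after it
System.debug(fileName + ' / ' + extension);
```

Keep in mind that taking the first element only keeps the text before the first period, so a name with several periods would lose more than its extension.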

It is important to understand why split uses regular expressions. It’s not just to make things trickier; it is a big functionality boost that lets you do more complicated parsing than a simple break on a substring.
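As a small illustration of that extra power, the sketch below splits a list whose values are separated by either commas or semicolons, with uneven spacing around them. The input string is made up for the example.

```apex
// Hypothetical input: mixed delimiters and inconsistent whitespace.
String raw = 'red, green;blue ,  yellow';

// \\s*  optional whitespace before the delimiter
// [,;]  a comma or a semicolon
// \\s*  optional whitespace after the delimiter
List<String> colors = raw.split('\\s*[,;]\\s*');

// colors now holds four trimmed values: red, green, blue, yellow
System.debug(colors);
```

A plain substring split would need several passes plus trimming to get the same result.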

OpFocus put my theory to the test:

Problem:
On an inbound email, OpFocus wanted to parse out the client ID from the body of the message. Since emails from the outside world are inherently uncontrolled in terms of formatting, this could get tricky.

However, from the use case, we knew we would have the string “Client Id:” followed by a string (the client number). This could potentially have any text before and after it including:

  • The entire number on a line, with a line-break after it
  • The entire number on a line with other information
  • A space between the colon and the actual ID
  • A tab
  • A lowercase or uppercase D in “ID”

Solution:
For each use case, we could find the client ID by first splitting the email’s body on the literal string value, “Client ID.”  This returned an array with two strings: one with everything before the “Client ID” (that we didn’t need) and one with everything after it, starting with the actual client ID. Then, we split on the next whitespace character to get the desired ID.

The answers to the quiz:

The first split:

String body = 'Hello Bob, Here\'s your email.\nClient Id: 1234A And some closing text.\nSincerily, Us';
List<String> lstBody = body.split('Client ID:');

This will work, unless the capitalizations don’t match. Also, we want a whole word match, so look for whitespace before and after.

lstBody = body.split('(?i)\\sClient Id:\\s');

Note: The (?i) modifier signifies we want a case-insensitive match. The \s matches a whitespace character (space, tab, line break, etc.). Always remember to use a double backslash in Apex source!
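The effect of the modifier is easy to check. This sketch runs both versions of the first split against the same sample body; the names strict and relaxed are just illustrative.

```apex
String body = 'Hello Bob, Here\'s your email.\nClient Id: 1234A And some closing text.\nSincerily, Us';

// Case-sensitive: "Client ID:" (capital D) never occurs in the body,
// so the whole body comes back as a single element.
List<String> strict = body.split('Client ID:');
System.debug(strict.size()); // 1

// (?i) ignores case, and each \\s consumes one whitespace character
// on either side of the label.
List<String> relaxed = body.split('(?i)\\sClient Id:\\s');
System.debug(relaxed.size()); // 2
System.debug(relaxed[1]);     // starts with the client ID
```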

Now to get the client ID from the 2nd part:

List<String> lstBody2 = lstBody[1].split('\\s');

This will work, unless the ID is followed by a period instead of whitespace such as a carriage return.

lstBody2 = lstBody[1].split('\\.|\\s');

The \. will find a literal period and the | (pipe) means OR.  So this will find a period or whitespace.
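Putting the two splits together, the whole extraction might look like the sketch below. The class name ClientIdParser and the method extractClientId are hypothetical, and the null handling is an assumption about what a caller would want rather than what our actual code did.

```apex
public class ClientIdParser {
    // Hypothetical helper: returns the client ID, or null if no label is found.
    public static String extractClientId(String body) {
        if (body == null) {
            return null;
        }
        // First split: case-insensitive label with whitespace on both sides.
        // The limit of 2 keeps everything after the first label in one piece.
        List<String> parts = body.split('(?i)\\sClient Id:\\s', 2);
        if (parts.size() < 2) {
            return null;
        }
        // Second split: the ID ends at the first period or whitespace character.
        List<String> tokens = parts[1].split('\\.|\\s');
        if (tokens.isEmpty()) {
            return null;
        }
        return tokens[0];
    }
}
```

For example, ClientIdParser.extractClientId('Note.\nclient ID:\t77B. Thanks') should return 77B. Because the pattern requires one whitespace character on each side of the label, a body that begins with the label, or has no whitespace after the colon, would not match as written.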

This information should help you understand why the Split function doesn’t seem to work in some scenarios and how to leverage some of the RegEx versatility that the good folks at Salesforce so cleverly put in for you.

For more introductory information on RegEx, check out RexEgg. Then, head over to RegEx101.com to experiment with RegEx in real time.

Comment below or contact us with any questions.