STRSPLIT


The STRSPLIT function splits its input String argument into separate substrings, according to the specified delimiter or regular expression. By default, the positions of the substrings are returned. Set the EXTRACT keyword to have STRSPLIT return an array containing the substrings themselves.

Syntax

Result = STRSPLIT( String [, Pattern] [, COUNT=variable] [, ESCAPE=string | , /REGEX [, /FOLD_CASE]] [, /EXTRACT | , LENGTH=variable] [, /PRESERVE_NULL] )

Return Value

Returns an array containing either the positions of the substrings or the substrings themselves (if the EXTRACT keyword is specified).

Arguments

String

A scalar string to be split into substrings.

Pattern

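A scalar string that can contain one of two types of information: either one or more single characters, each of which is treated as a separator (the simple character-matching mode), or, if the REGEX keyword is set, a single regular expression that is matched against String to locate the separators.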

In either case, if the EXTRACT keyword is specified, the separator characters are not included in the result.

Note
Pattern is an optional argument. If it is not specified, STRSPLIT defaults to splitting on spans of whitespace (space or tab characters) in String.
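As a minimal sketch of the simple character-matching mode, the following splits a made-up input string (the variable names and sample text are illustrative only) on either of two separator characters:

```idl
; Hypothetical input that mixes two different separator characters.
line = 'alpha,beta;gamma'
; Each character in ',;' is treated as its own single-character separator.
parts = STRSPLIT(line, ',;', /EXTRACT)
PRINT, STRJOIN(parts, ' ')
; Expected: alpha beta gamma
```

Because the characters in Pattern form a set, the order in which they are listed should not affect the result.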

Keywords

COUNT

Set this keyword to a named variable that will contain the number of matched substrings returned by STRSPLIT. This value will be 0 if either of the String or Pattern arguments is null. Otherwise, it will contain the number of elements in the Result array.
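A short sketch of COUNT, assuming IDL 6.0 or later and using an illustrative input string:

```idl
; Hypothetical input; with no Pattern, the split is on spans of whitespace.
line = 'alpha beta gamma'
parts = STRSPLIT(line, /EXTRACT, COUNT=n)
PRINT, n
; n is 3, the same value as N_ELEMENTS(parts)
```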

ESCAPE

When doing simple pattern matching, the ESCAPE keyword can be used to specify any characters that should be considered to be "escape" characters. Preceding any character with an escape character prevents STRSPLIT from treating it as a separator character even if it is found in Pattern.

Note that if the EXTRACT keyword is set, STRSPLIT will automatically remove the escape characters from the resulting substrings. If EXTRACT is not specified, STRSPLIT cannot perform this editing, and the returned positions and lengths will include the escape characters.

For example:

print, STRSPLIT('a\,b,c', ',', ESCAPE='\', /EXTRACT) 

IDL prints:

a,b c 

ESCAPE cannot be specified with the FOLD_CASE or REGEX keywords.

EXTRACT

By default, STRSPLIT returns an array of character offsets into String that indicate where the substrings are located. These offsets, along with the lengths available from the LENGTH keyword, can be used later with STRMID to extract the substrings. Set EXTRACT to bypass this step and cause STRSPLIT to return the substrings directly. EXTRACT cannot be specified with the LENGTH keyword.

FOLD_CASE

Indicates that the regular expression matching should be done in a case-insensitive fashion. FOLD_CASE can only be specified if the REGEX keyword is set, and cannot be used with the ESCAPE keyword.
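For illustration, a minimal sketch of case-insensitive splitting, using a made-up string in which the separator appears in both upper and lower case:

```idl
; Hypothetical input: 'X' and 'x' should both act as separators.
line = 'oneXtwoxthree'
; FOLD_CASE requires REGEX; the pattern 'x' then matches either case.
parts = STRSPLIT(line, 'x', /REGEX, /FOLD_CASE, /EXTRACT)
PRINT, STRJOIN(parts, ' ')
; Expected: one two three
```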

LENGTH

Set this keyword to a named variable to receive the lengths of the substrings. Together with the result of this function, LENGTH can be used with the STRMID function to extract the matched substrings. The LENGTH keyword cannot be used with the EXTRACT keyword.
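The offset-and-length workflow might be sketched as follows. The variable names are illustrative, and a FOR loop is used here only to keep each STRMID call scalar:

```idl
str = 'STRSPLIT chops up strings.'
; Without EXTRACT, the result holds the starting offset of each substring,
; and LENGTH receives the matching substring lengths.
pos = STRSPLIT(str, LENGTH=len)
; Recover each substring from its offset and length.
FOR i = 0, N_ELEMENTS(pos) - 1 DO PRINT, STRMID(str, pos[i], len[i])
; Prints STRSPLIT, chops, up and strings. on separate lines
```

For this input, pos should hold the offsets 0, 9, 15 and 18, and len the lengths 8, 5, 2 and 8.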

PRESERVE_NULL

Normally, STRSPLIT will not return null length substrings unless there are no non-null values to report, in which case STRSPLIT will return a single null string. Set PRESERVE_NULL to cause all null substrings to be returned.
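A small sketch of the difference, using an illustrative string with two adjacent separators (COUNT is assumed to be available, that is, IDL 6.0 or later):

```idl
; Hypothetical input with an empty field between the two commas.
line = 'a,,b'
; Default behavior: the null-length substring is dropped.
packed = STRSPLIT(line, ',', /EXTRACT, COUNT=n_packed)
; With PRESERVE_NULL, the empty field is kept as a null string.
full = STRSPLIT(line, ',', /EXTRACT, /PRESERVE_NULL, COUNT=n_full)
PRINT, n_packed, n_full
; Expected counts: 2 and 3
```

Keeping null substrings can be useful when the position of each field matters, as in delimiter-separated records.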

REGEX

For complex splitting tasks, the REGEX keyword can be specified. In this case, Pattern is taken to be a regular expression to be matched against String to locate the separators. If REGEX is specified and Pattern is not, the default Pattern is the regular expression:

'[ ' + STRING(9B) + ']+' 

which means "any series of one or more space or tab characters" (9B is the byte value of the ASCII TAB character).

Note that the default Pattern contains a space after the [ character.

The REGEX keyword cannot be used with the ESCAPE keyword.

Note
If Pattern specifies a single multi-character separator pattern (as contrasted with a string of two or more individual separator characters), you must specify the REGEX keyword.

Examples

Example 1

To split a string on spans of whitespace and replace them with hyphens:

Str = 'STRSPLIT chops up strings.'
print, STRJOIN(STRSPLIT(Str, /EXTRACT), '-')

IDL prints:

STRSPLIT-chops-up-strings. 

Example 2

As an example of a more complex splitting task that can be handled with the simple character-matching mode of STRSPLIT, consider a sentence describing different colored ampersand characters. For unknown reasons, the author used commas to separate all the words, and used ampersands or backslashes to escape the commas that actually appear in the sentence (which therefore should not be treated as separators). The unprocessed string looks like:

Str = 'There,was,a,red,&&&,,a,yellow,&&\,,and,a,blue,\&&.'

We use STRSPLIT to break this line apart, and STRJOIN to reassemble it as a standard blank-separated sentence:

S = STRSPLIT(Str, ',', ESCAPE='&\', /EXTRACT)
PRINT, STRJOIN(S, ' ')

IDL prints:

There was a red &, a yellow &, and a blue &. 

Example 3

Strings separated by multi-character delimiters cannot be split using the simple character matching mode of STRSPLIT. Such delimiters require the use of a regular expression. For instance, consider splitting the following string on double ampersand boundaries.

str = 'red&&blue&&yellow&&odds&ends'

The desired result of such splitting would be four strings, with the values 'red', 'blue', 'yellow', and 'odds&ends'. You might be tempted to use STRSPLIT as follows:

PRINT, STRSPLIT(str,'&&',/EXTRACT)

which causes IDL to print:

red blue yellow odds ends 

IDL split the string on single ampersand boundaries, yielding 5 strings instead of the desired 4. When using the simple character matching mode of STRSPLIT, the characters in the Pattern argument specify a set of possible single character delimiters. The order of these characters is unimportant, and specifying a character more than once has no effect (the extras are ignored).

To properly split the above string using a regular expression:

PRINT, STRSPLIT(str, '&&', /EXTRACT, /REGEX)

producing the desired IDL output:

red blue yellow odds&ends 

Example 4

Finally, suppose you had a complicated string, in which every token was preceded by the count of characters in that token, with the count enclosed in angle brackets:

str = '<4>What<1>a<7>tangled<3>web<2>we<6>weave.'

This is too complex to handle with simple character matching, but can be easily handled using the regular expression '<[0-9]+>' to match the separators. This regular expression can be read as "an opening angle bracket, followed by one or more numeric characters between 0 and 9, followed by a closing angle bracket." The STRJOIN function is used to glue the resulting substrings back together:

S = STRSPLIT(str,'<[0-9]+>',/EXTRACT,/REGEX)
PRINT, STRJOIN(S, ' ')

IDL prints:

What a tangled web we weave. 

Version History

5.3: Introduced

6.0: Added COUNT keyword

See Also

STRCMP, STRJOIN, STRMATCH, STREGEX, STRMID, STRPOS