STRSPLIT
The STRSPLIT function splits its input String argument into separate substrings, according to the specified delimiter or regular expression. By default, it returns the positions of the substrings. Set the EXTRACT keyword to make STRSPLIT return an array containing the substrings themselves.
Syntax
Result = STRSPLIT( String [, Pattern] [, COUNT=variable] [, ESCAPE=string | , /REGEX [, /FOLD_CASE]] [, /EXTRACT | , LENGTH=variable] [, /PRESERVE_NULL] )
Return Value
Returns an array containing either the positions of the substrings or the substrings themselves (if the EXTRACT keyword is specified).
Arguments
String
A scalar string to be split into substrings.
Pattern
A scalar string that can contain one of two types of information:
- One or more single characters, each of which is considered to be a separator. String will be split when any of the characters is detected. For example, if Pattern is " ," String will be split whenever either a space or a comma is detected. In this case, IDL performs a simple string search for the specified characters. This method is simple and fast.
- If the REGEX keyword is specified, Pattern is considered to be a single regular expression (as implemented by the STREGEX function). This method is slower and more complex, but can handle extremely complicated Pattern strings.
In either case, if the EXTRACT keyword is specified, the separator characters are not included in the result.
Note
Pattern is an optional argument. If it is not specified, STRSPLIT defaults to splitting on spans of whitespace (space or tab characters) in String.
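The two simple-matching cases described above can be sketched as follows. The sample strings are illustrative assumptions, not taken from the original reference:

```idl
; Each character in Pattern is an independent separator:
; here the string is split on a space OR a comma.
parts = STRSPLIT('a,b c', ' ,', /EXTRACT)
PRINT, N_ELEMENTS(parts)   ; 3 substrings: a, b, c

; With no Pattern, a run of spaces and tabs acts as one separator.
words = STRSPLIT('one   two' + STRING(9B) + 'three', /EXTRACT)
PRINT, N_ELEMENTS(words)   ; 3 substrings: one, two, three
```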
Keywords
COUNT
Set this keyword to a named variable that will contain the number of matched substrings returned by STRSPLIT. This value will be 0 if either of the String or Pattern arguments is null. Otherwise, it will contain the number of elements in the Result array.
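A minimal sketch of COUNT in use, assuming an illustrative input string and a hypothetical variable name n:

```idl
; n receives the number of substrings found.
parts = STRSPLIT('one two three', /EXTRACT, COUNT=n)
PRINT, n                   ; 3, the same as N_ELEMENTS(parts)
```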
ESCAPE
When doing simple pattern matching, the ESCAPE keyword can be used to specify any characters that should be considered to be "escape" characters. Preceding any character with an escape character prevents STRSPLIT from treating it as a separator character even if it is found in Pattern.
Note that if the EXTRACT keyword is set, STRSPLIT will automatically remove the escape characters from the resulting substrings. If EXTRACT is not specified, STRSPLIT cannot perform this editing, and the returned offsets and lengths will include the escape characters.
For example:
IDL prints:
ESCAPE cannot be specified with the FOLD_CASE or REGEX keywords.
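The escape behavior described in this section can be sketched as follows. The input string and variable names are illustrative assumptions chosen for this sketch:

```idl
str = 'a\,b,c'

; With EXTRACT, the escaped comma is kept and the backslash is removed.
parts = STRSPLIT(str, ',', ESCAPE='\', /EXTRACT)
PRINT, N_ELEMENTS(parts)   ; 2
PRINT, parts[0]            ; a,b

; Without EXTRACT, the offsets and lengths still span the escape character.
pos = STRSPLIT(str, ',', ESCAPE='\', LENGTH=len)
PRINT, STRMID(str, pos[0], len[0])   ; a\,b
```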
EXTRACT
By default, STRSPLIT returns an array of character offsets into String that indicate where the substrings are located. These offsets, along with the lengths available from the LENGTH keyword, can be used later with STRMID to extract the substrings. Set EXTRACT to bypass this step and cause STRSPLIT to return the substrings directly. EXTRACT cannot be specified with the LENGTH keyword.
FOLD_CASE
Indicates that the regular expression matching should be done in a case-insensitive fashion. FOLD_CASE can only be specified if the REGEX keyword is set, and cannot be used with the ESCAPE keyword.
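As a short sketch of the difference FOLD_CASE makes (the input string is an illustrative assumption):

```idl
str = 'oneANDtwoandthree'

; Case-sensitive: only the lowercase 'and' is treated as a separator.
a = STRSPLIT(str, 'and', /REGEX, /EXTRACT)
PRINT, N_ELEMENTS(a)       ; 2 substrings: oneANDtwo, three

; Case-insensitive: 'AND' and 'and' both separate.
b = STRSPLIT(str, 'and', /REGEX, /FOLD_CASE, /EXTRACT)
PRINT, N_ELEMENTS(b)       ; 3 substrings: one, two, three
```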
LENGTH
Set this keyword to a named variable to receive the lengths of the substrings. Together with the result of this function, LENGTH can be used with the STRMID function to extract the matched substrings. The LENGTH keyword cannot be used with the EXTRACT keyword.
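A minimal sketch of combining the returned offsets with LENGTH and STRMID, assuming an illustrative input string. The loop extracts one substring at a time for clarity:

```idl
str = 'alpha beta  gamma'

; pos receives the offsets 0, 6, 12; len receives the lengths 5, 4, 5.
pos = STRSPLIT(str, LENGTH=len)

; Recover each substring from its offset and length.
FOR i = 0, N_ELEMENTS(pos) - 1 DO PRINT, STRMID(str, pos[i], len[i])
```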
PRESERVE_NULL
Normally, STRSPLIT will not return null length substrings unless there are no non-null values to report, in which case STRSPLIT will return a single null string. Set PRESERVE_NULL to cause all null substrings to be returned.
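The effect of PRESERVE_NULL can be sketched with two adjacent separators (the input string is an illustrative assumption):

```idl
str = 'a,,b'

; By default, the empty substring between the two commas is dropped.
PRINT, N_ELEMENTS(STRSPLIT(str, ',', /EXTRACT))                   ; 2

; With PRESERVE_NULL, the empty substring is returned as well.
PRINT, N_ELEMENTS(STRSPLIT(str, ',', /EXTRACT, /PRESERVE_NULL))   ; 3
```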
REGEX
For complex splitting tasks, the REGEX keyword can be specified. In this case, Pattern is taken to be a regular expression to be matched against String to locate the separators. If REGEX is specified and Pattern is not, the default Pattern is the regular expression:
'[ ' + STRING(9B) + ']+'
which means "any series of one or more space or tab characters" (9B is the byte value of the ASCII TAB character).
Note that the default Pattern contains a space after the [ character.
The REGEX keyword cannot be used with the ESCAPE keyword.
Note
If Pattern specifies a single multi-character separator pattern (as contrasted with a string of two or more individual separator characters), you must specify the REGEX keyword.
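As a sketch of the REGEX behavior described in this section, the default whitespace expression can be written out explicitly and compared with the no-Pattern default, and a multi-character separator can be passed as a regular expression. The sample strings and variable names are illustrative assumptions:

```idl
; Explicit form of the default whitespace pattern (space or tab, one or more).
ws  = '[ ' + STRING(9B) + ']+'
str = 'red  blue' + STRING(9B) + 'green'
a = STRSPLIT(str, /EXTRACT)
b = STRSPLIT(str, ws, /REGEX, /EXTRACT)
PRINT, N_ELEMENTS(a), N_ELEMENTS(b)   ; both 3

; A two-character separator must be given as a regular expression.
c = STRSPLIT('a::b:c', '::', /REGEX, /EXTRACT)
PRINT, N_ELEMENTS(c)                  ; 2 substrings: a, b:c
```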
Examples
Example 1
To split a string on spans of whitespace and replace them with hyphens:
Str = 'STRSPLIT chops up strings.'
print, STRJOIN(STRSPLIT(Str, /EXTRACT), '-')
IDL prints:
STRSPLIT-chops-up-strings.
Example 2
As an example of a more complex splitting task that can be handled with the simple character-matching mode of STRSPLIT, consider a sentence describing different colored ampersand characters. For unknown reasons, the author used commas to separate all the words, and used ampersands or backslashes to escape the commas that actually appear in the sentence (which therefore should not be treated as separators). The unprocessed string looks like:
Str = 'There,was,a,red,&&&,,a,yellow,&&\,,and,a,blue,\&&.'
We use STRSPLIT to break this line apart, and STRJOIN to reassemble it as a standard blank-separated sentence:
S = STRSPLIT(Str, ',', ESCAPE='&\', /EXTRACT)
PRINT, STRJOIN(S, ' ')
IDL prints:
There was a red &, a yellow &, and a blue &.
Example 3
Strings separated by multi-character delimiters cannot be split using the simple character matching mode of STRSPLIT. Such delimiters require the use of a regular expression. For instance, consider splitting the following string on double ampersand boundaries.
str = 'red&&blue&&yellow&&odds&ends'
The desired result of such splitting would be four strings, with the values 'red', 'blue', 'yellow', and 'odds&ends'. You might be tempted to use STRSPLIT as follows:
PRINT, STRSPLIT(str,'&&',/EXTRACT)
which causes IDL to print:
red blue yellow odds ends
IDL split the string on single ampersand boundaries, yielding 5 strings instead of the desired 4. When using the simple character matching mode of STRSPLIT, the characters in the Pattern argument specify a set of possible single character delimiters. The order of these characters is unimportant, and specifying a character more than once has no effect (the extras are ignored).
To properly split the above string using a regular expression:
print, strsplit(str,'&&',/EXTRACT, /REGEX)
producing the desired IDL output:
red blue yellow odds&ends
Example 4
Finally, suppose you had a complicated string, in which every token was preceded by the count of characters in that token, with the count enclosed in angle brackets:
str = '<4>What<1>a<7>tangled<3>web<2>we<6>weave.'
This is too complex to handle with simple character matching, but can be easily handled using the regular expression '<[0-9]+>' to match the separators. This regular expression can be read as "an opening angle bracket, followed by one or more numeric characters between 0 and 9, followed by a closing angle bracket." The STRJOIN function is used to glue the resulting substrings back together:
S = STRSPLIT(str,'<[0-9]+>',/EXTRACT,/REGEX)
PRINT, STRJOIN(S, ' ')
IDL prints:
What a tangled web we weave.
Version History
See Also
STRCMP, STRJOIN, STRMATCH, STREGEX, STRMID, STRPOS