SoFunction
Updated on 2025-05-17

Decomposing strings in Java: substring, tokenizing, and trimming methods

Key points

  • String processing is an indispensable part of Java development and is widely used in data parsing and formatting.
  • The substring() method extracts a precise portion of a string; watch the index range to avoid exceptions.
  • String.split() is the preferred method for tokenizing; it accepts regular expressions and is highly flexible.
  • trim() and strip() remove leading and trailing whitespace; strip() is better suited to Unicode whitespace characters.
  • These methods are simple to use, but in performance-sensitive code you need to watch memory use and efficiency.

Why do you need to decompose a string?

In Java programming, string processing is a core task of daily development. Whether you are extracting information from user input, parsing file paths, or processing the query parameters of a network request, decomposing strings is a common requirement.

Java provides a variety of ways to achieve this, including extracting substrings, tokenizing and trimming. These methods are simple and powerful, and can help developers process string data efficiently.

Extract substring: Use substring()

The substring() method lets you extract a specific part of a string. You can specify just the start index, or both the start and end indexes. Importantly, the end index is not included, following Java's half-open interval convention.

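Example: Extract the protocol and domain name from a URL:

String url = "https://example.com"; // placeholder URL for illustration
String protocol = url.substring(0, url.indexOf(":")); // "https"
String domain = url.substring(8); // "example.com" (8 = length of "https://")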

Note: If an index is out of range, a StringIndexOutOfBoundsException is thrown. It is therefore advisable to check that the indexes are valid before calling substring().

Tokenizing: Use split() and StringTokenizer

Tokenizing splits a string into small pieces called tokens. String.split() is the most commonly used method; it accepts a regular expression as the delimiter and is extremely flexible.

Example: Divide sentences into words:

String sentence = "Java   is\tgreat";
String[] words = sentence.split("\\s+");
for (String word : words) {
    System.out.println(word);
}
// Output:
// Java
// is
// great

For more complex scenarios, such as extracting numbers, you can combine regular expressions:

import java.util.regex.*;

String input = "Courses 471 and 472, 570";
Matcher matcher = Pattern.compile("\\d+").matcher(input);
while (matcher.find()) {
    System.out.println(matcher.group());
}
// Output:
// 471
// 472
// 570

StringTokenizer can also tokenize strings, but it is a legacy class whose use is discouraged in new code; prefer split().

Remove blanks: trim() and strip() series methods

Removing whitespace characters at the beginning and end of a string is a common requirement. Java provides the following methods:

  • trim(): Removes leading and trailing characters whose code point is ≤ U+0020 (the space and ASCII control characters).
  • strip(): Removes leading and trailing Unicode whitespace, as defined by Character.isWhitespace() (Java 11+).
  • stripLeading() and stripTrailing(): Remove whitespace from only the start or only the end, respectively (Java 11+).
  • stripIndent(): Normalizes the indentation of multi-line strings (Java 15+).

Example: Clean up user input:

String input = "   Hello World   ";
System.out.println("[" + input.trim() + "]"); // [Hello World]
System.out.println("[" + input.strip() + "]"); // [Hello World]

Handling Unicode whitespace

String unicodeSpace = "\u2000Hello World\u2000";
System.out.println("[" + unicodeSpace.trim() + "]"); // [\u2000Hello World\u2000] (unchanged)
System.out.println("[" + unicodeSpace.strip() + "]"); // [Hello World]

Java string decomposition techniques: substring, tokenizing and trimming methods

String processing is one of the core skills of Java programming. From parsing user input to processing complex data formats, string decomposition is everywhere in development. This article explores three key techniques for decomposing strings in Java: substring() for extracting substrings, split() and StringTokenizer for tokenizing, and the trim() and strip() family of methods for removing whitespace. With detailed code examples and practical suggestions, it aims to help developers master these techniques and improve code efficiency and readability.

Extract substrings: Extract strings accurately

The substring() method is a basic tool of Java string processing, used to extract a specific part of the original string. Java provides two overloads:

  • substring(int beginIndex): Extracts from the specified index to the end of the string.
  • substring(int beginIndex, int endIndex): Extracts the characters from beginIndex to endIndex - 1.

Note that Java follows the half-open range rule, meaning the end index is not included. The same convention appears elsewhere in the JDK, for example in Arrays.copyOfRange() and List.subList().
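To make the half-open rule concrete, here is a minimal sketch. The class name HalfOpenDemo and the helper splitAt are illustrative and not part of the original article. It shows that two adjacent ranges rejoin without overlap and that substring(begin, end) yields end - begin characters.

```java
import java.util.Arrays;
import java.util.List;

final class HalfOpenDemo {
    // Hypothetical helper: splits text at index 'mid' using half-open ranges.
    static String[] splitAt(String text, int mid) {
        String head = text.substring(0, mid); // indices 0 .. mid-1
        String tail = text.substring(mid);    // indices mid .. length-1
        return new String[] { head, tail };
    }

    public static void main(String[] args) {
        String text = "JavaString";
        String[] parts = splitAt(text, 4);
        System.out.println(parts[0]);                       // Java
        System.out.println(parts[1]);                       // String
        System.out.println(text.substring(2, 6).length());  // 4, i.e. 6 - 2

        // The same convention applies to other JDK range APIs.
        int[] middle = Arrays.copyOfRange(new int[] {10, 20, 30, 40}, 1, 3);
        List<String> sub = List.of("a", "b", "c", "d").subList(1, 3);
        System.out.println(Arrays.toString(middle));        // [20, 30]
        System.out.println(sub);                            // [b, c]
    }
}
```

Because the end index is exclusive, the split point appears once as an end index and once as a start index, and no character is duplicated or lost.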

Example: parse URL

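Suppose we need to extract the protocol and domain name from a URL:

String url = "https://example.com"; // placeholder URL for illustration
String protocol = url.substring(0, url.indexOf(":")); // "https"
String domain = url.substring(8); // "example.com" (8 = length of "https://")
System.out.println("Protocol: " + protocol);
System.out.println("Domain name: " + domain);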

Example: Extract file extension

Extracting file extensions is another common scenario:

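String fileName = "report.txt"; // illustrative file name
int dotIndex = fileName.lastIndexOf('.');
if (dotIndex != -1) {
    String extension = fileName.substring(dotIndex + 1); // "txt"
    System.out.println("Extension: " + extension);
} else {
    System.out.println("No extension");
}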

Things to note

  • Index range: Make sure beginIndex and endIndex are within the valid range; otherwise a StringIndexOutOfBoundsException is thrown. Prefer computing indexes dynamically with indexOf() or lastIndexOf().
  • Performance: Since Java 7u6, substring() copies the selected characters into a new array instead of sharing the original string's character array. This avoids the memory leaks that sharing could cause, at the cost of some extra memory and copying.
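One way to follow the index-range advice is to wrap the checks in small helpers. The sketch below is an illustration under assumptions: the class SafeSubstring and its methods before and clamped are hypothetical names, and clamping out-of-range indexes is only one possible policy (rejecting them is another).

```java
import java.util.Optional;

final class SafeSubstring {
    // Returns the text before the first occurrence of 'marker',
    // or Optional.empty() if the marker is absent.
    static Optional<String> before(String text, String marker) {
        int idx = text.indexOf(marker);
        return idx < 0 ? Optional.empty() : Optional.of(text.substring(0, idx));
    }

    // Returns text[begin, end) after clamping both indexes into [0, length],
    // so the call never throws StringIndexOutOfBoundsException.
    static String clamped(String text, int begin, int end) {
        int b = Math.max(0, Math.min(begin, text.length()));
        int e = Math.max(b, Math.min(end, text.length()));
        return text.substring(b, e);
    }

    public static void main(String[] args) {
        System.out.println(before("key=value", "="));  // Optional[key]
        System.out.println(before("novalue", "="));    // Optional.empty
        System.out.println(clamped("hello", 2, 99));   // llo
        System.out.println(clamped("hello", -3, 2));   // he
    }
}
```

Returning Optional makes the "marker not found" case explicit, which avoids passing the -1 result of indexOf() straight into substring().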

Tokenizing: Split the string into tokens

Tokenizing is the process of dividing a string into multiple independent parts. Java provides two main approaches: String.split() and StringTokenizer. Of these, split() is the first choice in modern development because it supports regular expressions and is more flexible.

Using String.split()

The split(String regex) method splits a string into an array around matches of the given regular expression. Here are some common uses:

Example: Split sentences into words

String sentence = "Java   is\tgreat";
String[] words = sentence.split("\\s+"); // Match one or more whitespace characters
for (String word : words) {
    System.out.println("Word: " + word);
}
// Output:
// Word: Java
// Word: is
// Word: great

The regular expression \s+ matches one or more whitespace characters (spaces, tabs, line breaks, and so on), so runs of consecutive whitespace are treated as a single delimiter.
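One detail worth knowing when splitting on \s+ is that whitespace at the start of the input produces an empty first element, while trailing empty elements are dropped. The sketch below illustrates this. The class WhitespaceSplit and its words helper are hypothetical names, and stripping before splitting is one possible remedy, not the only one.

```java
import java.util.Arrays;

final class WhitespaceSplit {
    // Splits on runs of whitespace after stripping both ends, so that
    // leading whitespace does not produce an empty first element.
    static String[] words(String text) {
        String stripped = text.strip(); // Java 11+
        // "".split(...) would return one empty string, so handle it explicitly.
        return stripped.isEmpty() ? new String[0] : stripped.split("\\s+");
    }

    public static void main(String[] args) {
        String raw = "  Java   is\tgreat ";
        System.out.println(Arrays.toString(raw.split("\\s+"))); // [, Java, is, great]
        System.out.println(Arrays.toString(words(raw)));        // [Java, is, great]
    }
}
```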

Example: parse URL query parameters

Suppose we need to parse the query parameters of the URL:

String url = "?page=1&sort=asc";
int queryStart = url.indexOf('?');
if (queryStart != -1) {
    String query = url.substring(queryStart + 1); // "page=1&sort=asc"
    String[] params = query.split("&");
    for (String param : params) {
        String[] keyValue = param.split("=");
        if (keyValue.length == 2) {
            System.out.println(keyValue[0] + ": " + keyValue[1]);
        }
    }
}
// Output:
// page: 1
// sort: asc
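The loop above silently skips parameters that have no value or whose value contains an equals sign. A slightly more tolerant variant might collect the pairs into a map, as in the following sketch. The class QueryParams and its parse method are hypothetical, and URL decoding is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class QueryParams {
    // Parses "k1=v1&k2=v2" into an insertion-ordered map.
    // No URL decoding is performed; that is outside this sketch.
    static Map<String, String> parse(String query) {
        Map<String, String> result = new LinkedHashMap<>();
        if (query.isEmpty()) {
            return result;
        }
        for (String pair : query.split("&")) {
            // Limit 2 keeps any further '=' inside the value
            // and preserves a trailing empty value ("k=").
            String[] kv = pair.split("=", 2);
            if (kv[0].isEmpty()) {
                continue; // skip fragments such as "&&" or "=x"
            }
            result.put(kv[0], kv.length == 2 ? kv[1] : "");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("page=1&sort=asc"));       // {page=1, sort=asc}
        System.out.println(parse("token=a=b&flag&empty=")); // {token=a=b, flag=, empty=}
    }
}
```

For real request handling, a URL or HTTP library that also decodes percent-escapes is usually the safer choice.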

Use regular expressions for advanced word segmentation

For more complex scenarios, specific patterns can be extracted in combination with regular expressions. For example, extract all numbers in a string:

import java.util.regex.*;

String input = "Courses 471 and 472, 570";
Matcher matcher = Pattern.compile("\\d+").matcher(input);
while (matcher.find()) {
    String number = matcher.group();
    System.out.println("Course number: " + number);
}
// Output:
// Course number: 471
// Course number: 472
// Course number: 570

The regular expression \d+ matches one or more digits. This approach works well for input whose format is not fixed.
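On Java 9 and later, the same extraction can be written as a stream pipeline with Matcher.results(). The sketch below is one possible formulation. The class name NumberExtractor is illustrative, and it assumes every digit run fits in an int (a longer run would throw NumberFormatException).

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class NumberExtractor {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns every run of digits in 'text' as an Integer, in order of appearance.
    static List<Integer> extract(String text) {
        return DIGITS.matcher(text)
                .results()                 // Stream<MatchResult>, Java 9+
                .map(MatchResult::group)   // matched text of each hit
                .map(Integer::valueOf)     // assumes the value fits in an int
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(extract("Courses 471 and 472, 570")); // [471, 472, 570]
    }
}
```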

StringTokenizer: a legacy alternative

StringTokenizer is an early Java class for tokenizing. It is still available, but its use is discouraged in new code. By default it splits on whitespace (space, tab, newline, carriage return, and form feed), and other delimiters can be specified.

Example: Using StringTokenizer

import java.util.StringTokenizer;

String input = "Hello,World,of,Java";
StringTokenizer st = new StringTokenizer(input, ",");
while (st.hasMoreTokens()) {
    System.out.println("Token: " + st.nextToken());
}
// Output:
// Token: Hello
// Token: World
// Token: of
// Token: Java

Why avoid using StringTokenizer?

  • Limited functionality: Regular expressions are not supported, so complex delimiter patterns cannot be handled.
  • Legacy status: It is retained for compatibility and is unlikely to gain new features.
  • More powerful alternatives: String.split() and regular expressions offer greater flexibility and readability.
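A practical difference that follows from these limitations is the treatment of empty fields. StringTokenizer treats consecutive delimiters as one, whereas split() can preserve empty fields. The sketch below, with the illustrative class name EmptyFieldDemo, compares the two on the same input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

final class EmptyFieldDemo {
    // Collects tokens with the legacy StringTokenizer; empty fields are skipped.
    static List<String> withTokenizer(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, ",");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // Collects tokens with split(); limit -1 also keeps trailing empty fields.
    static List<String> withSplit(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    public static void main(String[] args) {
        String line = "alice,,42,";
        System.out.println(withTokenizer(line)); // [alice, 42]
        System.out.println(withSplit(line));     // [alice, , 42, ]
    }
}
```

When column positions matter, as in delimited records, the split() result keeps each field at its original index.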

Notes on word segmentation

  • Regular expression escaping: Because split() takes a regular expression, special characters such as . or | must be escaped. For example, to split on a literal dot use split("\\.") rather than split(".").
  • Complex formats: For complex data such as CSV, use a dedicated library (such as OpenCSV) to handle quotes and escape characters.
  • Performance: split() is very efficient for small strings, but for very large strings or frequent calls the performance impact should be measured.
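The escaping and performance notes can be combined in a short sketch. Pattern.quote() escapes a delimiter so it is matched literally, and a precompiled Pattern can be reused across many calls. The class LiteralSplit and its helper splitOnPipe are illustrative names, and whether precompiling helps measurably depends on the workload.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class LiteralSplit {
    // Precompiled for repeated use; Pattern.quote escapes the delimiter
    // so that "|" is matched literally instead of meaning alternation.
    private static final Pattern PIPE = Pattern.compile(Pattern.quote("|"));

    static String[] splitOnPipe(String line) {
        return PIPE.split(line);
    }

    public static void main(String[] args) {
        String version = "1.20.3";
        // An unescaped "." matches every character, so all fields are empty
        // and are dropped as trailing empty strings.
        System.out.println(version.split(".").length);              // 0
        System.out.println(Arrays.toString(version.split("\\."))); // [1, 20, 3]
        System.out.println(Arrays.toString(splitOnPipe("a|b|c")));  // [a, b, c]
    }
}
```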

Remove blanks: trim() and strip() series methods

Removing whitespace characters at the beginning and end of a string is a common requirement in many scenarios, such as cleaning up user input or formatting output. Java provides the following methods:

Method | Function | Whitespace definition
trim() | Removes leading and trailing whitespace | Characters with code point ≤ U+0020
strip() | Removes leading and trailing whitespace | Unicode whitespace
stripLeading() | Removes leading whitespace only | Unicode whitespace
stripTrailing() | Removes trailing whitespace only | Unicode whitespace
stripIndent() | Normalizes multi-line string indentation | Based on common indentation

Example: Clean up user input

String input = "   Hello World   ";
System.out.println("trim: [" + input.trim() + "]"); // trim: [Hello World]
System.out.println("strip: [" + input.strip() + "]"); // strip: [Hello World]

Handling Unicode whitespace

The main difference between trim() and strip() is how they define whitespace. trim() removes only characters up to U+0020, whereas strip() also removes Unicode whitespace characters such as \u2000.

Example: Handling Unicode blanks

String unicodeSpace = "\u2000Hello World\u2000";
System.out.println("trim: [" + unicodeSpace.trim() + "]"); // trim: [\u2000Hello World\u2000] (unchanged)
System.out.println("strip: [" + unicodeSpace.strip() + "]"); // strip: [Hello World]
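The two definitions differ in both directions, which the following sketch illustrates. The class TrimVersusStrip and its lengths helper are hypothetical names. The NUL character is below U+0020 but is not Unicode whitespace, and the no-break space U+00A0 is excluded by Character.isWhitespace(), so neither method removes it.

```java
final class TrimVersusStrip {
    // Returns { length after trim(), length after strip() } for comparison.
    static int[] lengths(String s) {
        return new int[] { s.trim().length(), s.strip().length() };
    }

    public static void main(String[] args) {
        int[] control = lengths("\u0000data\u0000"); // NUL on both sides
        int[] nbsp = lengths("\u00A0data\u00A0");    // no-break space on both sides

        // trim() removes NUL (code point below U+0020); strip() keeps it.
        System.out.println(control[0] + " " + control[1]); // 4 6
        // Neither method removes the no-break space.
        System.out.println(nbsp[0] + " " + nbsp[1]);       // 6 6
    }
}
```

If no-break spaces can appear in the input, they need separate handling, for example an explicit replace() before stripping.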

Example: Handling multi-line strings

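Text blocks, standard since Java 15, already have their incidental indentation removed by the compiler. stripIndent() (also Java 15+) applies the same normalization to ordinary strings, and calling it on a text block leaves the already-normalized content unchanged: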

String text = """
        Line one
        Line two
        Line three
        """;
System.out.println(text.stripIndent());
// Output:
// Line one
// Line two
// Line three
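stripIndent() is most useful on strings that do not come from a text block, such as text assembled at run time or read from a file. The sketch below uses a hypothetical class StripIndentDemo with a normalize helper, and the SQL-like input is only sample data.

```java
final class StripIndentDemo {
    // Removes the common leading indentation from a multi-line string (Java 15+).
    static String normalize(String raw) {
        return raw.stripIndent();
    }

    public static void main(String[] args) {
        // An ordinary string literal, so the compiler's text-block processing
        // does not apply and the indentation is still present at run time.
        String raw = "    select *\n      from users\n    where id = 1";
        System.out.println(normalize(raw));
        // select *
        //   from users
        // where id = 1
    }
}
```

Note that if the string ends with a line terminator, the final empty line counts as having zero indentation, so no leading indentation is removed. Dropping the trailing newline first avoids this.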

Choose the right trimming method

  • Prefer strip() over trim(): On Java 11 and later, strip() is the better choice because it recognizes Unicode whitespace and applies more widely.
  • Targeted removal: If you only need to remove leading or trailing whitespace, use stripLeading() or stripTrailing().
  • Multi-line strings: stripIndent() is designed with text blocks in mind and is suitable for formatting multi-line content.

Best practices and practical advice

For best results in string decomposition, here are some practical suggestions:

  1. Verify index ranges: Always check index validity when using substring(). For example, compute indexes dynamically with indexOf() or lastIndexOf() rather than hard-coding them.
  2. Prefer split(): For tokenizing tasks, split() combined with regular expressions is usually the most flexible and readable option. Avoid StringTokenizer.
  3. Handle Unicode whitespace: In internationalized applications, use the strip() family of methods so that Unicode whitespace characters are handled correctly.
  4. Performance optimization: When performing many string operations, consider assembling results with StringBuilder to reduce the creation of temporary String objects.
  5. Exception handling: In production code, add appropriate error handling, such as catching StringIndexOutOfBoundsException or validating regular expressions (an invalid pattern throws PatternSyntaxException).
  6. Use dedicated libraries: For complex data formats such as CSV or JSON, prefer mature libraries (such as OpenCSV or Jackson) over manual parsing.
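As an illustration of the exception-handling advice, a regular expression that comes from configuration or user input can be validated before use. The sketch below is one possible approach. The class RegexGuard and its methods tryCompile and splitOrWhole are hypothetical, and returning the unsplit text as a fallback is an assumption made for the example.

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class RegexGuard {
    // Compiles a caller-supplied regex, returning Optional.empty() if it is invalid.
    static Optional<Pattern> tryCompile(String regex) {
        try {
            return Optional.of(Pattern.compile(regex));
        } catch (PatternSyntaxException e) {
            return Optional.empty();
        }
    }

    // Splits 'text' with the given regex, or returns the text unsplit
    // (as a single-element array) if the regex is invalid.
    static String[] splitOrWhole(String text, String regex) {
        return tryCompile(regex)
                .map(p -> p.split(text))
                .orElseGet(() -> new String[] { text });
    }

    public static void main(String[] args) {
        System.out.println(splitOrWhole("a1b22c", "\\d+").length); // 3
        System.out.println(splitOrWhole("a1b22c", "[0-9").length); // 1 (invalid regex)
    }
}
```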

Summary

Java's string decomposition techniques (substring(), split(), and trim()/strip()) give developers powerful tools for a wide range of string manipulation needs. By understanding how these methods work and where they apply, you can write more efficient and robust code. Whether you are extracting substrings, tokenizing, or removing whitespace, these methods are simple to use yet powerful. Try them out in practice, combining regular expressions and dedicated libraries to further improve the flexibility and efficiency of string processing.

References:

  • OpenCSV Official Website
  • Jackson JSON Processor GitHub
