SoFunction
Updated on 2025-05-21

How to extract numbers in strings in Python

Extract numbers from string

Hey, friends! Have you ever had a pile of text data in hand and struggled to pick out the numbers hidden in it? Don't worry: today we will look at how to extract numbers from strings in Python. Whether you are processing log files, analyzing user comments, or scraping web pages, this small technique can save you a lot of effort. Don't let tedious data processing trip you up; follow along and learn this simple, practical Python skill!

Method 1: Use Regular Expressions

Extracting integers from a string with a regular expression

Regular expressions are powerful tools for handling strings and can match specific patterns. In Python, the re module provides support for regular expressions.

import re  # Import Python's regular expression module

text = "abc123def456ghi789"  # A string containing letters and digits

# re.findall() returns every substring that matches the pattern r'\d+':
# \d matches a single digit character, and + means "one or more of the preceding item",
# so r'\d+' matches one or more consecutive digits
numbers = re.findall(r'\d+', text)  # A list of all matching digit runs (as strings)

# Convert each digit string to an integer with a list comprehension
numbers_int = [int(num) for num in numbers]

print(numbers)      # Output: ['123', '456', '789'], the digit runs as strings
print(numbers_int)  # Output: [123, 456, 789], the same numbers as integers

Import module:

First, the code imports Python's regular expression module with the import re statement, so that the functions it provides, such as re.findall(), can be used.

Define string:

Next, the code defines a string variable named text that contains letters and digits. Our goal is to extract the runs of digits.

Find a sequence of numbers:

The code then calls re.findall() with the regular expression r'\d+' to find every substring of text that matches the pattern. re.findall() returns a list of all non-overlapping matches in the order they appear; in this example, each match is a run of consecutive digit characters.

Convert number type:

Next, the code uses a list comprehension to iterate over the numbers list (a list of digit strings) and convert each element to an integer with int(). The converted integers are collected in a new list, numbers_int.

Print result:

Finally, the code uses print() to display both lists: numbers (the digit strings) and numbers_int (the integers), so we can check the results of the extraction and the conversion.
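re.findall() returns only the matched text. If you also need to know where each number sits in the string, the same module provides re.finditer(), which yields match objects instead of plain strings. A minimal sketch on the same sample text:

```python
import re

text = "abc123def456ghi789"

# re.finditer() yields one match object per hit; group() is the matched text,
# start() and end() are its slice bounds in the original string
spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r'\d+', text)]

print(spans)  # [('123', 3, 6), ('456', 9, 12), ('789', 15, 18)]
```

Since text[3:6] == '123', the bounds can be used directly to slice or annotate the original string.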

Extracting integers and decimals from a string with a regular expression

import re

text = "abc123.456def-789ghi0.987jkl4567mno123.00"

# -?          optional minus sign
# \d+         one or more digits (integer part)
# (?:\.\d+)?  optional decimal part: a decimal point followed by one or more digits
pattern = r'-?\d+(?:\.\d+)?'

# Use re.findall() to find all matching numbers
numbers = re.findall(pattern, text)

# Convert the numeric strings (possibly negative) to floating point numbers (if needed)
numbers_float = [float(num) for num in numbers]

print(numbers)        # Output: ['123.456', '-789', '0.987', '4567', '123.00']
print(numbers_float)  # Output: [123.456, -789.0, 0.987, 4567.0, 123.0]
  • -?: Match an optional minus sign.
  • \d+: Match one or more digits (the integer part).
  • (?:\.\d+)?: Match the optional fractional part, where \. is a literal decimal point and \d+ is one or more digits. The whole fractional part is wrapped in a group and marked as optional (?). The group is written as (?:...), a non-capturing group, because when a pattern contains a capturing group (...), re.findall() returns only the text of that group (for example '.456' or '') instead of the whole match.

Note that this regular expression also matches negative numbers, because we added -? at the beginning of the pattern. If you don't want to match negative numbers, remove that part of the pattern.
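Whether to keep -? depends on the text. The small comparison below, run on a made-up sentence containing a date, shows the trade-off: with -? the hyphens in 2023-10-01 are read as minus signs, and without it the genuine negative number loses its sign. The third pattern adds a negative lookbehind, (?<!\d), so that - is treated as a sign only when it does not directly follow a digit; this is one heuristic assumed for illustration, not a universal rule.

```python
import re

text = "date 2023-10-01, delta -4.5"  # made-up sample text

with_sign = re.findall(r'-?\d+(?:\.\d+)?', text)
without_sign = re.findall(r'\d+(?:\.\d+)?', text)
# Heuristic: only treat '-' as a sign when it is not directly preceded by a digit
standalone_sign = re.findall(r'(?<!\d)-?\d+(?:\.\d+)?', text)

print(with_sign)        # ['2023', '-10', '-01', '-4.5']
print(without_sign)     # ['2023', '10', '01', '4.5']
print(standalone_sign)  # ['2023', '10', '01', '-4.5']
```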

Method 2: Use a list comprehension and string methods

If the string has a relatively simple structure, you can combine a list comprehension with the string method str.isdigit().

text = "abc123def456ghi789"

# Replace every non-digit character with a space, then split on whitespace,
# so that each run of digits becomes a separate list element
numbers = ''.join([char if char.isdigit() else ' ' for char in text]).split()

# Convert the list of digit strings to a list of integers (if required)
numbers_int = [int(num) for num in numbers]

print(numbers)      # Output: ['123', '456', '789']
print(numbers_int)  # Output: [123, 456, 789]
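One caveat with this approach: str.isdigit() also returns True for characters such as the superscript ², which int() cannot convert. A sketch of a variant that uses str.isdecimal() instead, wrapped in a hypothetical helper named split_digit_runs:

```python
def split_digit_runs(text, is_digit=str.isdecimal):
    # Hypothetical helper: keep characters accepted by is_digit, turn the rest into
    # spaces, then split so that each run of kept characters becomes one element
    return ''.join(ch if is_digit(ch) else ' ' for ch in text).split()

text = "x²=9"

loose = split_digit_runs(text, str.isdigit)  # ['²', '9']
strict = split_digit_runs(text)              # ['9']

try:
    [int(n) for n in loose]
except ValueError as exc:
    # int('²') raises ValueError even though '²'.isdigit() is True
    print("isdigit() let through a non-decimal digit:", exc)

print([int(n) for n in strict])  # [9]
```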

Method 3: Use a generator function

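A generator function walks through the string one character at a time and yields each number as soon as it is complete, so callers can consume the numbers lazily instead of building the whole list up front.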

text = "abc123def456ghi789"

def extract_numbers(s):
    number = []  # Digits of the number currently being read
    for char in s:
        if char.isdigit():
            number.append(char)
        else:
            if number:
                yield int(''.join(number))  # A number just ended: yield it
                number = []
    if number:  # The string ended with a digit
        yield int(''.join(number))

numbers_gen = extract_numbers(text)
numbers_int = list(numbers_gen)

print(numbers_int)  # Output: [123, 456, 789]
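Because extract_numbers() is a generator, nothing is computed until a value is requested. The sketch below repeats the function in slightly condensed form so it runs on its own, then uses itertools.islice() to take only the first two numbers, and passes a fresh generator straight to sum() without building an intermediate list.

```python
from itertools import islice

def extract_numbers(s):
    """Yield each run of consecutive digits in s as an int."""
    number = []
    for char in s:
        if char.isdigit():
            number.append(char)
        elif number:  # a non-digit ends the current number
            yield int(''.join(number))
            number = []
    if number:  # the string ended with a digit
        yield int(''.join(number))

text = "abc123def456ghi789"

first_two = list(islice(extract_numbers(text), 2))  # stops requesting after two numbers
total = sum(extract_numbers(text))  # sum() pulls the numbers one at a time

print(first_two)  # [123, 456]
print(total)      # 1368
```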

Method 4: Use filter() and str.isdigit (extract single digits only)

If you only need the individual digit characters, you can use the built-in filter() function.

text = "abc123def456ghi789"

# Keep only the characters for which str.isdigit() returns True
digits = filter(str.isdigit, text)

# filter() returns a lazy iterator; turn it into a list (the items are still strings)
digits_list = list(digits)

# If you need a list of integers, convert each character
numbers_int = [int(digit) for digit in digits_list]

print(digits_list)  # Output: ['1', '2', '3', '4', '5', '6', '7', '8', '9']
print(numbers_int)  # Output: [1, 2, 3, 4, 5, 6, 7, 8, 9]

Note that this method extracts each digit character separately rather than complete multi-digit numbers: "123" yields 1, 2 and 3, not 123.
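Extracting single digits is still handy when all you want is the digits themselves, for example to normalize a phone number typed with punctuation. A minimal sketch using a made-up number; the result is kept as a string because converting it to int would drop any leading zeros:

```python
raw = "(555) 123-4567"  # made-up phone number

# filter() keeps the digit characters; join() glues them back into one string
digits_only = ''.join(filter(str.isdigit, raw))

print(digits_only)  # 5551234567
```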

Method 5: Use the isnumeric() method

In Python, isnumeric() is a string method that returns True if every character in the string is a numeric character and the string contains at least one character. It is broader than isdigit(): besides 0-9 it also accepts characters such as fractions ("½") and Chinese numerals ("三"), which int() cannot convert.

text = "abc123def456ghi789"
numbers = []
current_number = ""  # Digits collected for the number currently being read

for char in text:
    if char.isnumeric():
        current_number += char
    else:
        if current_number:  # A non-empty current_number means we just finished reading a number
            numbers.append(int(current_number))  # Convert it to an integer and add it to the list
            current_number = ""  # Reset current_number to collect the next number

# Handle the last number (if the string ends with a digit)
if current_number:
    numbers.append(int(current_number))

print(numbers)  # Output: [123, 456, 789]

In this example, we iterate over each character in text and use the isnumeric() method to check whether it is a numeric character. If it is, we append it to the current_number string. When we encounter a non-numeric character, we check whether current_number is empty; if it is not, we have just finished reading a number, so we convert it to an integer, append it to the numbers list, and reset current_number. Finally, after the loop ends, we check whether current_number still holds digits, which happens when the string ends with a number.
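Which of the three str methods you use matters as soon as the text contains non-ASCII characters. The sketch below compares isdecimal(), isdigit() and isnumeric() on a few sample strings picked for illustration. int() accepts as digits only characters for which isdecimal() is True, while unicodedata.numeric() can return the value of a single numeric character such as ½ or 三.

```python
import unicodedata

samples = ["123", "²", "½", "三", "١٢"]  # ASCII, superscript, fraction, Chinese, Arabic-Indic

# (isdecimal, isdigit, isnumeric) for each sample
checks = {s: (s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples}
for s, flags in checks.items():
    print(repr(s), *flags)
# '123' True True True
# '²' False True True
# '½' False False True
# '三' False False True
# '١٢' True True True

print(int("١٢"))                  # 12: int() accepts Unicode decimal digits
print(unicodedata.numeric("½"))   # 0.5
print(unicodedata.numeric("三"))  # 3.0
```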

Application scenarios for extracting numbers from strings

1. Data cleaning and preprocessing

In data analytics and machine learning projects, data is often derived from various text formats such as log files, user comments, social media posts, etc. In these texts, numbers may represent key information such as timestamps, scores, quantities, etc. By extracting these numbers, more efficient data cleaning and preprocessing can be performed to provide accurate and structured data for subsequent analysis and modeling.

2. Log analysis

System logs often contain a large amount of numeric and textual information, such as error codes, user IDs, and response times. By extracting these numbers, you can quickly locate problems, analyze system performance, and generate useful reports. For example, you can extract response times to evaluate how the system's response times are distributed, or extract error codes to count how often each type of error occurs.
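As a rough illustration of the log-analysis case, the sketch below counts status codes and averages response times. The log format (status=... and time=...ms fields) is an assumption made up for this example; anchoring each pattern on its field label keeps the timestamps from being picked up as numbers.

```python
import re
from collections import Counter
from statistics import mean

# Hypothetical log lines; real formats vary, so adjust the patterns to your logs
log_lines = [
    "2025-05-21 10:00:01 GET /api/items status=200 time=120ms",
    "2025-05-21 10:00:02 GET /api/items status=500 time=340ms",
    "2025-05-21 10:00:03 POST /api/cart status=200 time=95ms",
    "2025-05-21 10:00:04 GET /api/items status=404 time=15ms",
]

# The capturing group (\d+) extracts only the number after each label
status_re = re.compile(r'status=(\d+)')
time_re = re.compile(r'time=(\d+)ms')

status_counts = Counter()
response_times = []
for line in log_lines:
    status = status_re.search(line)
    if status:
        status_counts[int(status.group(1))] += 1
    elapsed = time_re.search(line)
    if elapsed:
        response_times.append(int(elapsed.group(1)))

print(dict(status_counts))   # {200: 2, 500: 1, 404: 1}
print(mean(response_times))  # 142.5
```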

3. Text analysis and mining

In text mining and natural language processing (NLP) tasks, extracting numbers from strings can help capture the meaning of a text. For example, extracting stock prices, economic data, or match results from a news article can give readers a valuable summary. In social media analysis, extracted numbers can also reveal user behavior patterns, such as how often users post and how many likes and comments their posts receive.

4. Financial data processing

In the financial field, text-formatted financial data (such as financial reports, press releases, social media comments, etc.) often contains key financial information, such as stock prices, price-to-earnings ratios, earnings forecasts, etc. By extracting these numbers, financial analysis and forecasting can be conducted to provide decision support to investors.
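Financial text often writes amounts with thousands separators, which r'\d+' would split into pieces (1,234 becomes 1 and 234). A sketch of one way to keep them together, assuming comma separators and a period as the decimal point (many locales use the opposite convention); the sentence is made up for illustration:

```python
import re

# Hypothetical sentence from a financial report
text = "Revenue rose 12.5% to $1,234,567.89, while net income was $98,000."

# First alternative: 1-3 digits followed by one or more ",ddd" groups and an optional
# decimal part; second alternative: a plain integer or decimal without separators
amount_re = re.compile(r'\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?')

# Strip the separators before converting to float
values = [float(m.replace(',', '')) for m in amount_re.findall(text)]

print(values)  # [12.5, 1234567.89, 98000.0]
```

Note that the % and $ signs are not captured; if you need to know which value is a percentage, match those symbols as part of the pattern.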

5. User input analysis

In an interactive application, user input may mix numbers and text. For example, a user might enter "I want to book a room worth $150, with the check-in date October 1, 2023". By extracting these numbers, the application can work out what the user wants and act on it, for example by calculating fees, checking availability, or generating booking confirmations.
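Using the booking sentence above, here is a minimal sketch that pulls out the price and the check-in date. It assumes prices are written as $ followed by digits and dates as "Month D, YYYY"; real user input is far less regular, so treat these patterns as a starting point.

```python
import re
from datetime import datetime

user_input = "I want to book a room worth $150, with the check-in date October 1, 2023"

# Assumption: prices are written as "$" followed by digits, with an optional decimal part
price_match = re.search(r'\$(\d+(?:\.\d+)?)', user_input)
price = float(price_match.group(1)) if price_match else None

# Assumption: dates use the "Month D, YYYY" format with an English month name
date_match = re.search(r'[A-Z][a-z]+ \d{1,2}, \d{4}', user_input)
check_in = datetime.strptime(date_match.group(), '%B %d, %Y').date() if date_match else None

print(price)     # 150.0
print(check_in)  # 2023-10-01
```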

6. Web crawling and data scraping

Extracting numbers from strings helps collect useful information in web crawling and scraping tasks. For example, extracting numeric information such as prices, ratings, and stock levels from product pages can supply data to price comparison websites, product recommendation systems, and similar services.

This is the end of this article about how to extract numbers from strings in Python.