When writing network programs or processing network data, we often need to extract or validate IP addresses in text. Python's regular expressions (the re module) are a powerful tool for this task. But do you know how to write a pattern that accurately matches every legal IP address? Today we will discuss this in detail.
Why do we need regular expressions for IP addresses?
Suppose you are analyzing a server log and need to extract the IP addresses, or you are developing a network tool that must check whether a user-supplied IP is legal. Parsing IP addresses by hand is tedious and error-prone, and this is where regular expressions come in handy.
Basic structure of an IP address
A legal IPv4 address consists of four decimal numbers, each from 0 to 255, separated by dots. For example:
- Legal: 192.168.1.1, 10.0.0.1
- Illegal: 256.1.1.1 (a number exceeds 255), 192.168.1 (only 3 segments)
Writing a basic regular expression
Let's first look at the simplest IP matching rule:
import re

pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
text = "The server IP is 192.168.1.1 and 10.0.0.1"
ips = re.findall(pattern, text)  # all non-overlapping matches, left to right
print(ips)  # Output: ['192.168.1.1', '10.0.0.1']
This pattern matches IP addresses, but it has an obvious problem: it cannot rule out numbers above 255. For example, "300.1.1.1" is matched as well.
Matching numbers from 0 to 255 exactly
To match exactly 0-255, we need a more complex expression. The trick is to split the numbers into several cases:
- 0-199: [01]?\d?\d
- 200-249: 2[0-4]\d
- 250-255: 25[0-5]
Combined, this gives:
num = r"(25[0-5]|2[0-4]\d|[01]?\d?\d)"
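One way to gain confidence in the octet pattern is to try it against every integer in a range. The following is a small sanity check rather than part of the original recipe; the names OCTET and is_octet are illustrative.

```python
import re

# Octet pattern from the text: 250-255, 200-249, then 0-199.
OCTET = r"(25[0-5]|2[0-4]\d|[01]?\d?\d)"


def is_octet(s):
    """Return True if the whole string is accepted by the octet pattern."""
    return re.fullmatch(OCTET, s) is not None


# Every integer from 0 to 255 should be accepted, and 256-999 rejected.
accepted = [n for n in range(1000) if is_octet(str(n))]
print(accepted == list(range(256)))  # True

# Leading zeros are accepted too, because [01] and the first \d are optional.
print(is_octet("007"), is_octet("256"))  # True False
```

Note that re.fullmatch is used here so that the pattern has to cover the whole string; with re.match, "256" would be reported as a match because the prefix "25" satisfies the pattern.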
Complete IP regular expressions
Combine the above number patterns and add the dot separator:
ip_pattern = r"(25[0-5]|2[0-4]\d|[01]?\d?\d)\.(25[0-5]|2[0-4]\d|[01]?\d?\d)\.(25[0-5]|2[0-4]\d|[01]?\d?\d)\.(25[0-5]|2[0-4]\d|[01]?\d?\d)"
This matches legal IPv4 addresses accurately. The expression is rather long, though, so we can use the {3} quantifier to simplify the repetition:
ip_pattern = r"((25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(25[0-5]|2[0-4]\d|[01]?\d?\d)"
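To avoid typing the octet twice, the pattern can also be assembled from a single definition. This is a sketch under the assumption that an f-string is acceptable in your code base; OCTET and IP_PATTERN are illustrative names, and the groups are written as non-capturing (?:...) groups.

```python
import re

# One octet, written once. (?:...) groups without capturing.
OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d?\d)"

# Three "octet + dot" repetitions followed by a final octet.
# The doubled braces {{3}} produce a literal {3} inside an f-string.
IP_PATTERN = rf"(?:{OCTET}\.){{3}}{OCTET}"

print(IP_PATTERN)
print(bool(re.fullmatch(IP_PATTERN, "10.0.0.1")))    # True
print(bool(re.fullmatch(IP_PATTERN, "10.0.0.256")))  # False
```

The non-capturing groups matter once the pattern is passed to re.findall: when a pattern contains capturing groups, findall returns the group contents instead of the full matched text.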
A function for verifying IP addresses
We can wrap this regular expression in a function:
import re

def is_valid_ip(ip):
    # ^ and $ anchor the pattern to the start and end of the string
    pattern = r"^((25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(25[0-5]|2[0-4]\d|[01]?\d?\d)$"
    return bool(re.match(pattern, ip))

print(is_valid_ip("192.168.1.1"))  # True
print(is_valid_ip("256.1.1.1"))  # False
Note the ^ and $ added here: they make sure the pattern matches the entire string, not just a part of it.
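One subtlety worth knowing: in Python, $ also matches just before a newline at the very end of the string, so input with a trailing newline still passes the anchored check. If that matters, re.fullmatch is a possible alternative. The sketch below illustrates the difference; is_valid_ip_strict is a hypothetical name, not a function from this article.

```python
import re

OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d?\d)"
IP_RE = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}")
ANCHORED_RE = re.compile(rf"^(?:{OCTET}\.){{3}}{OCTET}$")


def is_valid_ip_strict(ip):
    """Hypothetical variant: the whole string must be an IPv4 address."""
    return IP_RE.fullmatch(ip) is not None


# "$" also matches just before a trailing newline, so this passes:
print(bool(ANCHORED_RE.match("192.168.1.1\n")))  # True
# fullmatch requires the match to cover the entire string:
print(is_valid_ip_strict("192.168.1.1\n"))  # False
print(is_valid_ip_strict("192.168.1.1"))    # True
```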
Extracting IP addresses from text
If you want to extract the IP addresses contained in a piece of text, you can write it like this:
import re

text = "Accesses are from 192.168.1.1 and 10.0.0.1, invalid IPs such as 300.1.1.1"
# (?:...) groups do not capture, so findall returns the full matches
pattern = r"\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\b"
ips = re.findall(pattern, text)
print(ips)  # Output: ['192.168.1.1', '10.0.0.1']
The \b added here denotes a word boundary. It prevents the pattern from matching a fragment of a longer run of digits, for example "192.168.1.25" inside "192.168.1.256".
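The effect of \b is easiest to see by running the same pattern with and without it. The sample text below is made up for illustration, and UNBOUNDED and BOUNDED are illustrative names.

```python
import re

OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d?\d)"
UNBOUNDED = rf"(?:{OCTET}\.){{3}}{OCTET}"
BOUNDED = rf"\b{UNBOUNDED}\b"

text = "bad: 300.1.1.1 and 192.168.1.256, good: 10.0.0.1"

# Without \b, fragments of the invalid addresses are picked up.
print(re.findall(UNBOUNDED, text))
# ['00.1.1.1', '192.168.1.25', '10.0.0.1']

# With \b, a match may not start or end in the middle of a digit run.
print(re.findall(BOUNDED, text))
# ['10.0.0.1']
```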
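Common problems and pitfalls
- Forgetting boundary matching: without ^ and $, or \b, the pattern may produce partial matches
- Ignoring leading zeros: the patterns above accept addresses like "192.168.01.1"; whether that counts as legal depends on the consumer, since some parsers reject leading zeros or read them as octal
- Performance issues: overly complex rules may slow down matching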
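If leading zeros should be rejected, the octet can be tightened. The variant below is one possible sketch, not the pattern used elsewhere in this article; STRICT_OCTET and STRICT_IP_RE are illustrative names.

```python
import re

# Assumed stricter octet: 250-255, 200-249, 100-199, then 0-99
# with no leading zero ("0" itself is still allowed).
STRICT_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
STRICT_IP_RE = re.compile(rf"(?:{STRICT_OCTET}\.){{3}}{STRICT_OCTET}")

for candidate in ["192.168.1.1", "192.168.01.1", "0.0.0.0", "10.00.0.1"]:
    print(candidate, STRICT_IP_RE.fullmatch(candidate) is not None)
# 192.168.1.1 True
# 192.168.01.1 False
# 0.0.0.0 True
# 10.00.0.1 False
```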
IPv6 address matching
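Although IPv4 is still mainstream, IPv6 is becoming more and more important. Regular expressions for IPv6 are more complex. The pattern below covers only the full form with eight groups; abbreviated addresses that use "::" need a more elaborate pattern: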
ipv6_pattern = r"([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}"
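For validation (as opposed to extraction), the standard-library ipaddress module is an alternative to a regex. The sketch below swaps in that module and compares it with the pattern above on a full and an abbreviated spelling of the same address; is_valid_ipv6 is an illustrative helper name.

```python
import ipaddress
import re

ipv6_pattern = r"([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}"


def is_valid_ipv6(address):
    """Validate with the standard library instead of a regex."""
    try:
        ipaddress.IPv6Address(address)
        return True
    except ValueError:  # raised for anything that is not an IPv6 address
        return False


full = "2001:0db8:0000:0000:0000:ff00:0042:8329"
short = "2001:db8::ff00:42:8329"  # same address, abbreviated with "::"

for addr in (full, short):
    print(addr, bool(re.fullmatch(ipv6_pattern, addr)), is_valid_ipv6(addr))
# 2001:0db8:0000:0000:0000:ff00:0042:8329 True True
# 2001:db8::ff00:42:8329 False True
```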
Practical application cases
Suppose we want to analyze the Nginx log and extract the client IP:
import re

log_line = '127.0.0.1 - - [10/Oct/2023:13:55:36 +0800] "GET / HTTP/1.1" 200 612'
ip_pattern = r"\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\b"
# search() returns the first match anywhere in the line
ip = re.search(ip_pattern, log_line).group()
print(ip)  # Output: 127.0.0.1
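With many log lines, two details become relevant: re.search and re.match return None when nothing matches, so calling .group() directly would raise an AttributeError, and you usually want a tally rather than a single address. The sketch below assumes the client address is the first field of each line, as in the sample above; the log lines are made up for illustration and CLIENT_IP_RE is an illustrative name.

```python
import re
from collections import Counter

OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d?\d)"
# match() only tries the pattern at the start of the line,
# so no leading \b is needed here.
CLIENT_IP_RE = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}\b")

# Made-up sample lines for illustration only.
log_lines = [
    '127.0.0.1 - - [10/Oct/2023:13:55:36 +0800] "GET / HTTP/1.1" 200 612',
    '10.0.0.7 - - [10/Oct/2023:13:55:37 +0800] "GET /a HTTP/1.1" 200 85',
    '127.0.0.1 - - [10/Oct/2023:13:55:38 +0800] "GET /b HTTP/1.1" 404 153',
    'malformed line without an address',
]

hits = Counter()
for line in log_lines:
    m = CLIENT_IP_RE.match(line)
    if m is None:  # skip lines that do not start with an IPv4 address
        continue
    hits[m.group()] += 1

print(hits.most_common())  # [('127.0.0.1', 2), ('10.0.0.7', 1)]
```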
Performance optimization suggestions
- Precompile the regular expression:
ip_regex = re.compile(r"...Long Expression...")
- Consider using a generator when matching large amounts of data
- If necessary, use string methods to do a cheap preliminary filter first
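The three suggestions can be combined in a few lines. This is a minimal sketch, not a benchmarked implementation; iter_ips is a hypothetical helper name, and the "." check is just one example of a cheap pre-filter.

```python
import re

OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d?\d)"
# Compiled once at import time and reused for every line.
IP_RE = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")


def iter_ips(lines):
    """Yield IPv4-looking strings lazily, one at a time."""
    for line in lines:
        # Cheap string check first: a line without a dot cannot match.
        if "." not in line:
            continue
        for m in IP_RE.finditer(line):
            yield m.group()


sample = ["no address here", "from 192.168.1.1 to 10.0.0.1", "v1.2 only"]
print(list(iter_ips(sample)))  # ['192.168.1.1', '10.0.0.1']
```

Because iter_ips is a generator, it can be fed a file object directly (for example, the result of open() on a log file) without loading the whole file into memory.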
Summary
In this article we have covered:
- How a regular expression for IPv4 addresses is built
- How to accurately match number segments from 0-255
- The importance of boundary matching
- Tips for using the patterns in practical applications
Remember: regular expressions are powerful, but choose a level of complexity that fits your actual needs. For simple IP verification, the expressions in this article are sufficient; more complex requirements may call for further adjustments. I hope this article helps you get twice the result with half the effort the next time you process IP addresses!