SoFunction
Updated on 2024-12-15

How to quickly count lines of text using Python

Normally we'd use wc -l to count the lines in a file, but it is also easy to do in Python.

To count the lines in a text file quickly, what you actually count is the number of line breaks in the file. To maximize speed, read as much data as possible at a time and process each chunk in one go. The built-in count method of bytes can be used to count the line breaks in each chunk.

The code is as follows:

from __future__ import print_function
import time

if __name__ == '__main__':
    import sys
    start = time.time()
    with open(sys.argv[1], 'rb') as f:
        count = 0
        last_data = b'\n'
        while True:
            data = f.read(0x400000)
            if not data:
                break
            count += data.count(b'\n')
            last_data = data
        if last_data[-1:] != b'\n':
            count += 1 # Remove this if a wc-like count is needed
    end = time.time()
    print(count)
    print((end-start) * 1000)

In the code above, a trailing fragment at the end of the file that has no line break is counted as one line, which differs slightly from wc -l. To match wc -l, delete the final if check together with the commented count += 1 line.
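The difference between the two counting modes can be sketched as a small function. This is only an illustrative sketch: the names count_lines, wc_compatible, chunk_size, and demo are hypothetical and do not appear in the original script.

```python
import os
import tempfile


def count_lines(path, wc_compatible=False, chunk_size=0x400000):
    """Count lines in a file by counting b'\\n' in fixed-size chunks.

    wc_compatible and chunk_size are hypothetical parameters added for
    this sketch; they are not part of the original script.
    """
    count = 0
    last_byte = b'\n'  # so that an empty file is reported as 0 lines
    with open(path, 'rb') as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            count += data.count(b'\n')
            last_byte = data[-1:]
    if not wc_compatible and last_byte != b'\n':
        count += 1  # count the unterminated trailing line
    return count


def demo():
    # Write a small file whose last line has no trailing line break.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b'first\nsecond\nthird')
        path = tmp.name
    try:
        return count_lines(path), count_lines(path, wc_compatible=True)
    finally:
        os.remove(path)


if __name__ == '__main__':
    print(demo())  # (3, 2)
```

For a file that ends with a line break, both modes return the same number.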

This code does not handle universal newlines, skipping blank lines, or similar logic. If you need these features, the program becomes a little more complex.
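As a rough sketch of what such a version might look like, the function below opens the file in text mode so that Python's universal newline handling treats \n, \r\n, and \r as line endings, and it skips lines that contain only whitespace. The function name count_nonblank_lines and the UTF-8 default are assumptions made for this example. Because it decodes the text and loops over lines in Python, it will most likely be slower than the byte-counting approach.

```python
import os
import tempfile


def count_nonblank_lines(path, encoding='utf-8'):
    """Count non-blank lines, treating \\n, \\r\\n and \\r as line endings."""
    count = 0
    # newline=None (the default) enables universal newlines in text mode.
    with open(path, 'r', encoding=encoding, newline=None) as f:
        for line in f:
            if line.strip():  # skip empty and whitespace-only lines
                count += 1
    return count


def demo():
    # Mixed line endings, one empty line, one whitespace-only line,
    # and a final line without a line break.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b'a\r\n\r\nb\rc\n   \nd')
        path = tmp.name
    try:
        return count_nonblank_lines(path)
    finally:
        os.remove(path)


if __name__ == '__main__':
    print(demo())  # 4
```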

The test used three text files with 10 million, 160 million, and 640 million lines. For each file, wc -l was run twice first, followed by the Python script.

Run results (the Python script prints the line count, then its own elapsed time in milliseconds):

[root@yz- test]# docker run -it --rm -v `pwd`:/opt/workspace python:3 bash -c "cd /opt/workspace && time wc -l  && time wc -l  && time python3  "
10000000 

real    0m0.086s
user    0m0.072s
sys     0m0.013s
10000000 

real    0m0.080s
user    0m0.060s
sys     0m0.019s
10000000
64.38159942626953

real    0m0.150s
user    0m0.100s
sys     0m0.033s
[root@yz- test]# docker run -it --rm -v `pwd`:/opt/workspace python:3 bash -c "cd /opt/workspace && time wc -l  && time wc -l  && time python3  "
160000000 

real    0m1.322s
user    0m0.991s
sys     0m0.318s
160000000 

real    0m1.313s
user    0m0.966s
sys     0m0.341s
160000000
838.7012481689453

real    0m0.908s
user    0m0.595s
sys     0m0.297s
[root@yz- test]# docker run -it --rm -v `pwd`:/opt/workspace python:3 bash -c "cd /opt/workspace && time wc -l  && time wc -l  && time python3  "
640000000 

real    0m5.805s
user    0m4.349s
sys     0m1.455s
640000000 

real    0m5.787s
user    0m4.342s
sys     0m1.445s
640000000
3323.5926628112793

real    0m3.399s
user    0m2.255s
sys     0m1.108s

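As you can see, Python is actually faster than wc -l on the two larger files (on the 10-million-line file it is slower, 0.150 s versus about 0.08 s, presumably because interpreter startup dominates such a short run). The main reason is that very few steps run as pure Python; most of the time is spent in read(), count(), and similar calls that are implemented in C. As for why wc is slower, my guess is that its default buffer is smaller, so it needs more read() calls.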
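The buffer-size guess can be explored on the Python side with a small experiment. The sketch below does not measure wc itself; it only shows how the chunk size passed to read() affects this counting approach on your own machine. The function names, the chunk sizes, and the generated test file are all assumptions made for this example.

```python
import os
import tempfile
import time


def count_newlines(path, chunk_size):
    """Count b'\\n' bytes, issuing one unbuffered read per chunk."""
    count = 0
    # buffering=0 returns a raw file object, so each read() call maps
    # to roughly one read request to the operating system.
    with open(path, 'rb', buffering=0) as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            count += data.count(b'\n')
    return count


def compare_chunk_sizes(path, chunk_sizes=(4096, 65536, 0x400000)):
    """Return {chunk_size: (line_count, elapsed_ms)} for each size."""
    results = {}
    for size in chunk_sizes:
        start = time.perf_counter()
        lines = count_newlines(path, size)
        elapsed_ms = (time.perf_counter() - start) * 1000
        results[size] = (lines, elapsed_ms)
    return results


def demo(num_lines=200000):
    # Build a throwaway test file; the tests above used far larger files.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b'some sample text\n' * num_lines)
        path = tmp.name
    try:
        return compare_chunk_sizes(path)
    finally:
        os.remove(path)


if __name__ == '__main__':
    for size, (lines, ms) in demo().items():
        print(size, lines, round(ms, 3))
```

Every chunk size should report the same line count; only the timings differ, and they will vary with the machine, the file size, and the state of the file cache.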
