Summary of common Python file read and write operations with examples [text, JSON, CSV, PDF, and so on]

This article describes common Python file read and write operations with examples. It is shared for your reference, as follows:

Read and write files

Reading and writing files is the most common I/O operation. Python has built-in functions for reading and writing files, and their usage is similar to that of C.

Before reading or writing files, we must understand that file access on disk is provided by the operating system. Modern operating systems do not allow ordinary programs to operate on the disk directly, so reading or writing a file means asking the operating system to open a file object (usually called a file descriptor) and then using the interfaces the operating system provides to read data from that file object (reading the file) or write data to it (writing the file).
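As a small illustration of the point above, a Python file object wraps a handle issued by the operating system. The sketch below is only an assumption-laden demo: the file name demo.txt and the temporary directory are hypothetical and not part of the original examples. It prints the integer descriptor behind the file object.

```python
import os
import tempfile

# Hypothetical file name inside a throwaway directory, used only for illustration.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")

    with open(demo_path, "w", encoding="utf-8") as f:
        # fileno() returns the integer descriptor the operating system assigned.
        descriptor = f.fileno()
        f.write("hello world!")

print("file descriptor:", descriptor)
```

The exact number varies from run to run; what matters is that every open file corresponds to one such handle managed by the operating system.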

1. Reading files

To open a file object in read mode, use Python's built-in open() function, passing in the filename and the mode identifier.

f = open("","r",encoding="utf-8")

The identifier 'r' indicates read.

If the file does not exist, the open() function raises a FileNotFoundError (a subclass of OSError, of which IOError is an alias in Python 3) with an error code and a message telling you that the file does not exist.

f = open("","r",encoding="utf-8")

Traceback (most recent call last):
  File "D:/Learn/python/day14/", line 1, in <module>
    f = open("","r",encoding="utf-8")
FileNotFoundError: [Errno 2] No such file or directory: ''
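Rather than letting the program stop with a traceback, the error can be caught. The following is a minimal sketch, not the article's own code: the helper name read_text_or_none and the file name missing.txt are hypothetical, and the path is assumed not to exist.

```python
import os
import tempfile

def read_text_or_none(path):
    """Return the file's text, or None if the file does not exist."""
    try:
        f = open(path, "r", encoding="utf-8")
    except FileNotFoundError as exc:
        # exc.errno and exc.strerror carry the error code and message.
        print("could not open file:", exc.errno, exc.strerror)
        return None
    text = f.read()
    f.close()
    return text

# A fresh temporary directory is empty, so this hypothetical file is missing.
with tempfile.TemporaryDirectory() as tmp_dir:
    missing_path = os.path.join(tmp_dir, "missing.txt")
    result = read_text_or_none(missing_path)
```

Catching the specific FileNotFoundError, rather than a bare except, leaves other problems such as permission errors visible.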

If the file is opened successfully, calling the read() method reads the entire contents of the file at once; Python reads the contents into memory and represents them as a str object.

print(f.read())

Output:

hello world!

The last step is to call the close() method to close the file. A file must be closed after use, because file objects take up operating system resources, and the operating system limits the number of files that can be open at the same time.

f.close()

f = open(r"file path", "mode", encoding="utf-8")

"r": open in read-only mode
encoding: the encoding used when reading the file
f.read(): read the entire contents of the file at once
f.close(): close the file
r"file path": a raw string, in which the escape character "\" has no effect

Because reading and writing files may raise an IOError, and once an error occurs the subsequent f.close() will not be called, we can use try ... finally to make sure the file is closed correctly whether or not an error occurs.

f = None
try:
    f = open("", "r", encoding="utf-8")
    print(f.read())
finally:
    # f is still None if open() itself failed
    if f:
        f.close()

But writing this every time is too cumbersome, so Python provides the with statement, which calls the close() method for us automatically.

with open("", "r", encoding="utf-8") as f:
    print(f.read())

This is the same as the previous try ... finally, but the code is more concise and you don't have to call f.close() yourself.
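To check that the with statement really closes the file, the closed attribute of the file object can be inspected. This is a small sketch under assumptions: the file name demo.txt and the temporary directory are hypothetical, and the file is created first so that there is something to read.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    # Create the file first so there is something to read.
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("hello world!")

    with open(demo_path, "r", encoding="utf-8") as f:
        content = f.read()
        closed_inside = f.closed   # the file is still open inside the block

    closed_after = f.closed        # closed automatically on leaving the block

print(content)
print(closed_inside, closed_after)
```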

Note: read() reads the entire contents of the file at once, so if the file is particularly large, say 5 GB, memory will be exhausted. To be on the safe side, you can call read(size) repeatedly, which reads at most size characters at a time (size bytes in binary mode). Calling readline() reads one line at a time, and calling readlines() reads everything at once and returns a list of lines. Decide which to call as needed.

If the file is very small, a single read() is the most convenient; if you cannot determine the size of the file, calling read(size) repeatedly is safer; if it is a configuration file, calling readlines() is the most convenient.

for line in f.readlines():
    # Remove the '\n' at the end of each line
    print(line.strip())
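The repeated read(size) approach can be sketched as a loop that stops when read() returns an empty string. The helper name read_in_chunks, the chunk size of 4, and the file name demo.txt are hypothetical choices for this illustration.

```python
import os
import tempfile

def read_in_chunks(path, size=4):
    """Yield successive pieces of at most `size` characters from a text file."""
    with open(path, "r", encoding="utf-8") as f:
        while True:
            chunk = f.read(size)
            if chunk == "":      # an empty string signals end of file
                break
            yield chunk

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("line one\nline two\n")

    chunks = list(read_in_chunks(demo_path, size=4))

    # Iterating over the file object yields one line at a time,
    # without loading the whole file into memory.
    with open(demo_path, "r", encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]

print(chunks)
print(lines)
```

For real files a much larger chunk size would normally be chosen; 4 is used here only to make the chunking visible.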

2. Binary files

The examples so far read text files, specifically UTF-8 encoded text files. To read binary files, such as pictures or videos, open the file in 'rb' mode.

f = open("", "rb")
print(f.read())
b'\xff\xd8\xff\xe1\x00\x18Exif\x00\x00...' # Bytes in hexadecimal representation
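Because binary mode returns bytes rather than str, the leading bytes of a file can be compared directly against a known signature. The sketch below assumes a stand-in file (demo.bin is a hypothetical name and its content is not a real image) that starts with the same bytes as the output shown above.

```python
import os
import tempfile

JPEG_MAGIC = b"\xff\xd8\xff"  # the first three bytes of a JPEG file

with tempfile.TemporaryDirectory() as tmp_dir:
    image_path = os.path.join(tmp_dir, "demo.bin")  # hypothetical file name

    # Write a few raw bytes; this is only a stand-in, not a real image.
    with open(image_path, "wb") as f:
        f.write(JPEG_MAGIC + b"\xe1\x00\x18Exif\x00\x00")

    with open(image_path, "rb") as f:
        header = f.read(3)       # in binary mode, read() returns bytes

looks_like_jpeg = header == JPEG_MAGIC
print(type(header), header, looks_like_jpeg)
```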

3. Character encoding

To read a text file that is not UTF-8 encoded, pass the encoding parameter to the open() function, for example, to read a GBK encoded file:

f = open('/user/demo/','r',encoding = 'gbk')
f.read()
'Testing'

When you come across a file whose encoding is inconsistent, you may encounter a UnicodeDecodeError, because the text file may contain some illegally encoded characters. In this case, the open() function also accepts an errors parameter, which specifies how to handle encoding errors when they are encountered. The simplest way is to ignore them:

f = open('/users/demo/','r',encoding = 'gbk',errors = 'ignore')
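The effect of errors='ignore' can be seen on a deliberately damaged file. This is a sketch under assumptions: the file name gbk_demo.txt is hypothetical, and the file is built by appending one stray byte (0xff, which is not valid GBK) to some GBK encoded text.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    gbk_path = os.path.join(tmp_dir, "gbk_demo.txt")  # hypothetical file name

    # GBK encoded text followed by one stray byte that cannot be decoded as GBK.
    with open(gbk_path, "wb") as f:
        f.write("中文".encode("gbk") + b"\xff")

    # Default (strict) error handling: decoding fails.
    try:
        with open(gbk_path, "r", encoding="gbk") as f:
            f.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # errors="ignore" silently drops the bytes that cannot be decoded.
    with open(gbk_path, "r", encoding="gbk", errors="ignore") as f:
        ignored = f.read()

print(strict_failed, ignored)
```

Ignoring errors loses data without any warning, so it is best kept for cases where a few damaged characters do not matter.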

4. Writing files

Writing a file works the same way as reading one; the only difference is that when calling the open() function you pass the identifier 'w' or 'wb' to write a text file or a binary file.

f = open("/users/demo/",'w')
f.write('hello, world!')
f.close()

You can call write() repeatedly to write to the file, but be sure to call f.close() to close it.

When we write a file, the operating system often does not write the data to disk immediately; it puts the data in an in-memory cache and writes it out later when idle. Only when the close() method is called does the operating system guarantee that all unwritten data is written to disk. If you forget to call close(), only part of the data may have been written to disk and the rest is lost, so it is safer to use the with statement.

with open('/users/demo/', 'w') as f:
    f.write('hello, world')
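The buffering described above can be observed by reading the file through a second handle while the first one is still open. This is only a sketch: the helper read_back and the file name demo.txt are hypothetical, and the "before" result depends on Python's own write buffer, which for a small write like this one normally has not been emptied yet.

```python
import os
import tempfile

def read_back(path):
    """Read the file through a separate handle to see what has been written out."""
    with open(path, "r", encoding="utf-8") as reader:
        return reader.read()

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    f = open(demo_path, "w", encoding="utf-8")
    f.write("hello, world!")
    before_flush = read_back(demo_path)   # usually still empty: data sits in a buffer
    f.flush()                             # hand the buffered data to the operating system
    after_flush = read_back(demo_path)
    f.close()                             # close() flushes as well, then releases the file

print(repr(before_flush))
print(repr(after_flush))
```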

To write to a text file with a specific encoding, pass the encoding parameter to the open() function, which automatically converts the string to the specified encoding.

When writing a file in 'w' mode, if the file already exists, it will be overwritten directly (equivalent to deleting it and then writing a new file), but what if we want to append to the end of the file? You can pass 'a' to write in append mode.

with open('/users/demo/', 'a') as f:
    f.write('hello, world')
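The difference between 'w' and 'a' is easiest to see by writing to the same file several times. The sketch below assumes a hypothetical file name, log.txt, in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "log.txt")  # hypothetical file name

    with open(log_path, "w", encoding="utf-8") as f:
        f.write("first\n")

    with open(log_path, "w", encoding="utf-8") as f:
        f.write("second\n")   # 'w' truncates the file, so "first" is gone

    with open(log_path, "a", encoding="utf-8") as f:
        f.write("third\n")    # 'a' keeps the existing content and adds to the end

    with open(log_path, "r", encoding="utf-8") as f:
        final_text = f.read()

print(final_text)
```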

5. StringIO

In many cases, data is not necessarily read from or written to a file; it can also be read and written in memory.

StringIO, as the name suggests, reads and writes str in memory.

To write str to a StringIO, we first create a StringIO and then write to it as if it were a file.

from io import StringIO
f = StringIO()
f.write("hello")
f.write(" ")
f.write('world')
# Get the str after writing
print(f.getvalue())

Output:

hello world

To read a StringIO, initialize the StringIO with a str, and then, read it as if it were a file.

from io import StringIO
f = StringIO("Hello\nHi\nGoodBye!")
while True:
    s = f.readline()
    if s == '':
        break
    # Remove line breaks
    print(s.strip())

Output:

Hello
Hi
GoodBye!

6. BytesIO

StringIO can only operate on str; to work with binary data, you need to use BytesIO.

BytesIO implements reading and writing bytes in memory. We create a BytesIO and write some bytes:

from io import BytesIO
f = BytesIO()
f.write("中文".encode('utf-8'))
print(f.getvalue())

Output:

b'\xe4\xb8\xad\xe6\x96\x87'

Note: It is not str, but UTF-8 encoded bytes.

Similar to StringIO, you can initialize a BytesIO with bytes and then read it like a file:

from io import BytesIO
f = BytesIO(b'\xe4\xb8\xad\xe6\x96\x87')
d = f.read()
print(d)
print(d.decode())

Output:

b'\xe4\xb8\xad\xe6\x96\x87'
中文

StringIO and BytesIO are classes for manipulating str and bytes in memory, and they provide the same interface as reading and writing files.
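One practical consequence of the shared interface is that a function written for file objects also works on in-memory data. The sketch below is an illustration under assumptions: the function count_nonempty_lines and the file name demo.txt are hypothetical.

```python
import os
import tempfile
from io import StringIO

def count_nonempty_lines(stream):
    """Count non-blank lines from any object that can be iterated line by line."""
    return sum(1 for line in stream if line.strip())

sample_text = "Hello\n\nHi\nGoodBye!"

# The same function accepts an in-memory StringIO ...
in_memory_count = count_nonempty_lines(StringIO(sample_text))

# ... and a real file opened from disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write(sample_text)
    with open(demo_path, "r", encoding="utf-8") as f:
        on_disk_count = count_nonempty_lines(f)

print(in_memory_count, on_disk_count)
```

This is also why StringIO is handy in tests: code that expects a file can be exercised without touching the disk.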

7. Serialization

While a program is running, all variables live in memory, for example when defining a dict:

dict1 = {"name": "lili", "age": 18}

Suppose I change the name to "leilei". Once the program ends, the memory occupied by the variable is reclaimed by the operating system, and if I have not saved the modified name to disk, the name will be initialized to "lili" again the next time the program runs.

We call the process of turning a variable in memory into something that can be stored or transferred serialization, known as pickling in Python. After serialization, we can write the serialized content to disk or transfer it over the network to another machine. Conversely, reading the contents of a variable from a serialized object back into memory is called deserialization, or unpickling.

python provides the pickle module for serialization.

import pickle
d = dict({"name":"lili","age":18})
# pickle.dumps() serializes any object into bytes, which can then be written to a file.
print(pickle.dumps(d))
# Write the serialized object to a file
f = open("",'wb')
# Parameter one: the object to be written, parameter two: the file object to write to
pickle.dump(d,f)
f.close()
# Read serialized objects from a file
f2 = open("","rb")
# pickle.load() deserializes an object
d = pickle.load(f2)
f2.close()
print(d)
print(type(d))

Output:

{'name': 'lili', 'age': 18}
<class 'dict'>

Note: pickle can only be used in Python, and data pickled by one version of Python may not be readable by another, so use pickle only to save unimportant data, where it does not matter if deserialization fails.
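The "lili" to "leilei" scenario described earlier can be sketched as a save and reload cycle. The file name state.pkl is a hypothetical choice, and a temporary directory stands in for wherever the program would really keep its data.

```python
import os
import pickle
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    state_path = os.path.join(tmp_dir, "state.pkl")  # hypothetical file name

    record = {"name": "lili", "age": 18}
    record["name"] = "leilei"            # the change we want to survive a restart

    # Save the modified dict to disk.
    with open(state_path, "wb") as f:
        pickle.dump(record, f)

    # Later (for example on the next run), load it back.
    with open(state_path, "rb") as f:
        restored = pickle.load(f)

print(restored)
```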

8. JSON

If we need to pass objects between different programming languages, we have to serialize them into a standard format such as XML. A better choice is JSON, because JSON is represented as a string, can be read by all languages, and is convenient to store on disk or transfer over the network. JSON is not only a standard format but is also generally faster to process than XML, and it can be read directly in web pages, which is very convenient.

JSON type	Python type
{}	dict
[]	list
"string"	str
1234.56	int or float
true/false	True/False
null	None
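The mapping in the table can be checked by decoding one value of each JSON type and looking at the resulting Python type. The key names in the sample document below are made up for this illustration.

```python
import json

# One value of each JSON type; the key names are arbitrary.
json_text = (
    '{"obj": {}, "arr": [], "text": "string", '
    '"whole": 1234, "frac": 1234.56, "flag": true, "nothing": null}'
)

decoded = json.loads(json_text)
python_types = {key: type(value).__name__ for key, value in decoded.items()}
print(python_types)

# Going the other way, True and None become true and null.
round_trip = json.dumps({"flag": True, "nothing": None})
print(round_trip)
```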

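Converting a Python dict object to JSON

import json
dict1 = {"name":"lili","age":18}
# json.dumps() serializes the dict into a JSON string
ji = json.dumps(dict1)
print(type(ji))
with open("","w") as f:
    # json.dump() writes the JSON directly to a file object
    json.dump(dict1,f)
with open("","r",encoding="utf-8") as f:
    # json.load() reads JSON from a file object
    du = json.load(f)
    print(du)
    print(type(du))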

Output:

<class 'str'>
{'name': 'lili', 'age': 18}
<class 'dict'>

Serialize a class object to json

import json
class Student(object):
    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score
# Convert student object to dict
def student2dict(std):
    return {
        'name': std.name,
        'age': std.age,
        'score': std.score
    }
s = Student('Bob', 20, 88)
# Parameter one: the object to serialize, parameter two: the function that converts the object to a dict
d = json.dumps(s, default=student2dict)
# Convert dict to object
def dict2student(d):
    return Student(d['name'], d['age'], d['score'])
jsonStr = '{"age": 20, "score": 88, "name": "Bob"}'
# json deserialize to an object
# Parameter one: json string, parameter two: dict to object function
print(json.loads(jsonStr, object_hook=dict2student))
# Write the file
with open("","w") as f:
    f.write(d)
    # or just use the following statement
    # json.dump(s,f,default=student2dict)
# Read the file
with open("","r",encoding="utf-8") as f:
    du = json.load(f,object_hook=dict2student)
print(du)

Output:

<__main__.Student object at 0x000002CA795AA0F0>
<__main__.Student object at 0x000002CA7975B0F0>
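Since the printed objects above only show memory addresses, it can help to check the attributes after a round trip. The sketch below is a variation, not the code above: instead of a hand-written converter it uses the built-in vars() to obtain the attribute dict, which is an assumption that only holds for simple classes without __slots__. The helper names to_dict and to_student are hypothetical.

```python
import json

class Student:
    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score

def to_dict(obj):
    # vars(obj) returns the instance's attribute dict; this stands in
    # for a hand-written converter and works for simple classes only.
    return vars(obj)

def to_student(d):
    return Student(d["name"], d["age"], d["score"])

encoded = json.dumps(Student("Bob", 20, 88), default=to_dict)
decoded = json.loads(encoded, object_hook=to_student)

print(encoded)
print(decoded.name, decoded.age, decoded.score)
```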

9. Reading and writing CSV files

① Reading a CSV file

A CSV file is itself a plain text file, and the format is often used for exchanging data between different programs.

Demo:

Requirement: read a file

Description: the rows could be printed directly; here they are collected into a list instead

import csv
def readCsv(path):
    # List that collects the rows
    infoList = []
    # Open the file as read-only (newline='' lets the csv module handle line endings)
    with open(path, 'r', newline='') as f:
        # Read the contents of the file
        allFileInfo = csv.reader(f)
        # Append the fetched content row by row to the list
        for row in allFileInfo:
            infoList.append(row)
    return infoList
path = r"C:\Users\xlg\Desktop\"
info = readCsv(path)

② Writing a CSV file

Demo:

Requirement: write to a file

import csv
# Open the file for writing
def writeCsv(path,data):
    with open(path,'w',newline='') as f:
        writer = csv.writer(f)
        for rowData in data:
            print("rowData =", rowData)
            # Write row by row
            writer.writerow(rowData)
path = r"C:\Users\xlg\Desktop\"
writeCsv(path,[[1,2,3],[4,5,6],[7,8,9]])
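One detail worth seeing in a round trip is that the csv module reads every cell back as a string, so numbers have to be converted again. The sketch below assumes a hypothetical file name, demo.csv, in a temporary directory.

```python
import csv
import os
import tempfile

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "demo.csv")  # hypothetical file name

    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

    with open(csv_path, "r", newline="", encoding="utf-8") as f:
        raw_rows = list(csv.reader(f))

# Every cell comes back as str; convert explicitly where numbers are needed.
typed_rows = [[int(cell) for cell in row] for row in raw_rows]

print(raw_rows)
print(typed_rows)
```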

10. Reading PDF files

pip is a tool for installing and managing Python packages.

Before the code demonstration, first install the PDF-related tools.

a. In cmd, enter the following command: pip list [Purpose: list all the packages installed with pip].

b. Install pdfminer3k by entering the following command: pip install pdfminer3k

c. Code demonstration

import sys
import importlib
importlib.reload(sys)
# Module paths below follow the pdfminer3k package layout
from pdfminer.pdfparser import PDFParser, PDFDocument
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter # Interpreter
from pdfminer.converter import PDFPageAggregator # Converter
from pdfminer.layout import LTTextBoxHorizontal, LAParams # Layout
from pdfminer.pdfinterp import PDFTextExtractionNotAllowed # Raised when text extraction is not allowed
def readPDF(path, toPath):
    # Open the pdf file in binary mode
    f = open(path, "rb")
    # Create a pdf document parser
    parser = PDFParser(f)
    # Create the pdf document object
    pdfFile = PDFDocument()
    # Link the parser and the document object
    parser.set_document(pdfFile)
    pdfFile.set_parser(parser)
    # Provide the initialization password
    pdfFile.initialize()
    # Check whether the document allows text extraction
    if not pdfFile.is_extractable:
        raise PDFTextExtractionNotAllowed
    else:
        # Parse the data
        # Resource manager
        manager = PDFResourceManager()
        # Create a PDF device object
        laparams = LAParams()
        device = PDFPageAggregator(manager, laparams=laparams)
        # Interpreter object
        interpreter = PDFPageInterpreter(manager, device)
        # Process the document one page at a time
        for page in pdfFile.get_pages():
            interpreter.process_page(page)
            layout = device.get_result()
            for x in layout:
                if isinstance(x, LTTextBoxHorizontal):
                    # Append each text box to the output file
                    with open(toPath, "a") as out:
                        text = x.get_text()
                        #print(text)
                        out.write(text + "\n")
path = r"0319Start.pdf"
toPath = r""
readPDF(path, toPath)

I hope that what I have said in this article will help you in Python programming.