SoFunction
Updated on 2024-11-21

Easily converting source file encodings with Python

I changed jobs not long ago, so I haven't had much time to write things up. Most of my time goes to getting familiar with the new company's business and code framework, and there is a lot to learn: I used to do mainly PHP backend development, and here I am picking up front-end work halfway through and learning C++ as well. In short, the days are very full, and I sleep soundly when I get home (in one sentence: eating well, sleeping well). As for the timing of the job change: at the start of this year I had officially been out of school for half a year and felt my skills were growing quickly, but programmers at my old company had lower standing than the operations staff, so I decided to move. I interviewed at three companies (two large, one small) and got offers from all of them. I picked one of the large companies after weighing salary, the work itself, the commute and so on. The whole process went smoothly, much more easily than when I graduated. The harder I work, the luckier I get, and the luckier I get, the harder I work! Starting this week I will keep the blog up to date, before laziness turns into a habit.

Once I had settled in, my boss gave me a job migrating and modifying code. Honestly, this kind of work is boring: reading other people's code, changing a variable here and a file name there, all tedious tasks that take little skill. Still, migrating code is a decent way to get to know the environment. Enough rambling; today's topic is converting the encoding of source files. For various reasons the code has to move from server room A to server room B, and the two cannot reach each other. For historical reasons, all the code in room A is UTF-8 encoded, while room B requires GBK. Let's see how to solve this.

Encoding issues
First, why do encoding problems arise at all? Take the example above: the databases in server room B are all GBK encoded, so any data read from them is GBK. To display that data without garbled characters and without converting it, the header that is sent must declare GBK, and the output files (html, tpl, etc.) must be GBK as well. The following flow makes this clearer:

DB (GBK) => php, etc. (any encoding, but if the code file contains Chinese characters, the file must be GBK encoded or those characters must be converted to GBK on output) => header (GBK) => html, tpl (GBK)

The other option is to convert the data from GBK to UTF-8 in the code as soon as it is read. In general UTF-8 is more popular and causes fewer problems:

DB (GBK) => php, etc. (UTF-8, with data read from the database converted to UTF-8) => header (UTF-8) => html, tpl (UTF-8)
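
The conversion step in the second flow is a single decode/encode round trip. Here is a minimal Python 3 sketch, assuming the database driver hands back raw GBK bytes. The `gbk_to_utf8` helper and the sample string are illustrative and not part of the original setup:

```python
def gbk_to_utf8(raw):
    """Re-encode GBK bytes as UTF-8 bytes (hypothetical helper)."""
    return raw.decode("gbk").encode("utf-8")

# Simulate a value coming out of a GBK database.
row = "编码".encode("gbk")
converted = gbk_to_utf8(row)

print(len(row))        # 4: GBK uses 2 bytes per Chinese character here
print(len(converted))  # 6: UTF-8 uses 3 bytes for each of these characters
```

The byte counts differ, which is why mixing the two encodings in one page produces garbled output.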

As long as the encodings are kept consistent in one of the two ways above, garbled characters will not appear. I have tested the first approach and it works, so I expect the second does too. Now for a small script that converts file encodings:

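#!/usr/bin/python
# -*- coding: utf-8 -*-
#Filename:
import os
import sys

def ChangeEncode(file, fromEncode, toEncode):
    # Re-encode one file in place. Returns 0 on success, -1 on failure.
    try:
        with open(file, "rb") as f:
            s = f.read()
        u = s.decode(fromEncode)
        s = u.encode(toEncode)
        # The file is reopened for writing only after the conversion
        # succeeded, so a failed conversion leaves it untouched.
        with open(file, "wb") as f:
            f.write(s)
        return 0
    except (IOError, UnicodeError):
        return -1

def Do(dirname, fromEncode, toEncode):
    # Walk the directory tree and convert every file found.
    for root, dirs, files in os.walk(dirname):
        for _file in files:
            _file = os.path.join(root, _file)
            if ChangeEncode(_file, fromEncode, toEncode) != 0:
                print("[Conversion failed:]" + _file)
            else:
                print("[Success:]" + _file)

def CheckParam(dirname, fromEncode, toEncode):
    # Returns 0 if the arguments are valid, otherwise a key of the error table.
    encode = ["UTF-8", "GBK", "gbk", "utf-8"]
    if fromEncode not in encode or toEncode not in encode:
        return 2
    if fromEncode.lower() == toEncode.lower():
        return 3
    if not os.path.isdir(dirname):
        return 1
    return 0

if __name__ == "__main__":
    error = {1: "The first argument is not a valid folder.",
             2: "The encoding must be one of: UTF-8, GBK.",
             3: "Source and target encodings are the same."}
    if len(sys.argv) != 4:
        print("Usage: target_dir fromEncode toEncode")
        sys.exit(1)
    dirname = sys.argv[1]
    fromEncode = sys.argv[2]
    toEncode = sys.argv[3]
    ret = CheckParam(dirname, fromEncode, toEncode)
    if ret != 0:
        print(error[ret])
    else:
        Do(dirname, fromEncode, toEncode)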

The script is simple to use:


./ target_dir fromEncode toEncode

Note how several common encodings relate to each other:

us-ascii is a subset of UTF-8. This comes from *; the original text reads: "ASCII is a subset of UTF-8, so all ASCII files are already UTF-8 encoded."

I tried it and it holds: a file with no Chinese characters is reported as us-ascii, and once Chinese characters are added it is reported as utf-8.
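
The subset relationship is easy to check in Python 3. This is a small sketch; the `is_pure_ascii` helper and the sample strings are made up for illustration:

```python
def is_pure_ascii(data):
    """Return True if the bytes decode as ASCII (illustrative helper)."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False

plain = "print 'hello'".encode("utf-8")
with_chinese = "print '你好'".encode("utf-8")

# ASCII-only text produces identical bytes under both encodings.
same_bytes = plain == "print 'hello'".encode("ascii")

print(same_bytes)                   # True
print(is_pure_ascii(plain))         # True
print(is_pure_ascii(with_chinese))  # False
```

A file that contains only ASCII characters is therefore already valid UTF-8 and needs no conversion in that direction.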

There is also the ANSI encoding, which stands for the system's local encoding. On a Simplified Chinese operating system, for example, ANSI means GBK. Keep this in mind as well.
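
To see which encoding the current system treats as its local one, Python's standard `locale` module can be queried. This is a rough Python 3 sketch; the variable names are illustrative, and the reported value depends entirely on the machine's locale settings (a Simplified Chinese Windows system typically reports the GBK code page, while most Linux systems report UTF-8):

```python
import codecs
import locale

# Encoding the operating system prefers for the current locale.
local_name = locale.getpreferredencoding(False)

# Map it to Python's canonical codec name, if Python knows the codec.
try:
    codec_name = codecs.lookup(local_name).name
except LookupError:
    codec_name = None

print(local_name, codec_name)
```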

Finally, the command for viewing a file's encoding on Linux is:


file -i *

This shows the encoding of each file in the current directory.
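
A rough equivalent can be sketched in Python 3 by trying candidate codecs in order. This is only a heuristic and not how `file` works internally; the `guess_encoding` helper and its candidate list are assumptions made for this example:

```python
def guess_encoding(data, candidates=("ascii", "utf-8", "gbk")):
    """Return the first candidate codec that decodes data, else None."""
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None

ascii_guess = guess_encoding(b"import os")
utf8_guess = guess_encoding("编码".encode("utf-8"))
gbk_guess = guess_encoding("编码".encode("gbk"))

print(ascii_guess, utf8_guess, gbk_guess)
```

The order of the candidates matters: many UTF-8 byte sequences also happen to decode under GBK, so the stricter codecs are tried first.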

Of course, files that contain special characters may fail to convert, but ordinary program files are fine.
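
Two kinds of failure are possible: the bytes are not valid in the source encoding, or a character has no representation in the target encoding. A dry run in memory can tell them apart before any file is touched. This is a Python 3 sketch; the `check_conversion` helper and the sample inputs are hypothetical:

```python
def check_conversion(data, from_encode, to_encode):
    """Dry-run a conversion; return None on success or an error message."""
    try:
        data.decode(from_encode).encode(to_encode)
    except UnicodeDecodeError as exc:
        return "cannot decode as %s: %s" % (from_encode, exc.reason)
    except UnicodeEncodeError as exc:
        return "cannot encode as %s: %s" % (to_encode, exc.reason)
    return None

# Ordinary Chinese text converts cleanly.
ok = check_conversion("普通代码".encode("utf-8"), "utf-8", "gbk")
# 0xFF can never appear in valid UTF-8.
bad_source = check_conversion(b"\xff\xfe broken", "utf-8", "gbk")
# An emoji is valid UTF-8 but has no GBK representation.
no_target = check_conversion("emoji \U0001F600".encode("utf-8"), "utf-8", "gbk")

print(ok)
print(bad_source)
print(no_target)
```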

That is all for this article. I hope it helps you learn Python.
