1. Preface
During development we inevitably run into cases where we transmit too much data or the resources being transmitted are too large, which is why data compression technology exists. There are many data compression algorithms, each with its own characteristics and usage scenarios. This time I want to briefly talk about data compression.
Why did I start thinking about this? Because I ran into a scenario where, it seems, people either find the concept of data compression too vague and dare not use it, or assume it hurts performance too much to be worth using. I had a requirement to splice parameters onto a link, jump to that link, and on the other end read the spliced parameters back out of it; in effect it's a GET request. But the spliced link ends up long and ugly, url?a=xxx&b=xxx&c=xxx..., with parameters splattered on madly, the whole object decomposed on one side and reassembled on the other. So why not convert the object to JSON and then compress it?
Do you think strings can't be compressed? Did compression simply never occur to you during design? Or does your decades of development intuition tell you that using compression will cause big problems?
2. About compression
First of all, what is data compression? To give a simple example: turning the string AAABBBCCC into 3A3B3C is one idea of compression.
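That particular trick is run-length encoding. Here is a tiny illustrative sketch (my own code, with made-up names, just to show the idea):

public class RunLengthDemo {
    // Count each run of repeated characters and emit "countChar"
    public static String encode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int count = 0;
            while (i < s.length() && s.charAt(i) == c) { count++; i++; }
            out.append(count).append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("AAABBBCCC")); // prints 3A3B3C
    }
}

Of course, this only pays off when the input actually has long runs; general-purpose algorithms like the one below are much smarter about finding redundancy.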
Let's write a demo to demonstrate using Java's Deflater to compress a string:
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

import android.util.Base64;
import android.util.Log;

public class Test {
    public static String compress(String str) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(str.getBytes());
        deflater.finish();
        final byte[] bytes = new byte[256];
        ByteArrayOutputStream bos = new ByteArrayOutputStream(256);
        while (!deflater.finished()) {
            int length = deflater.deflate(bytes);
            bos.write(bytes, 0, length);
        }
        deflater.end();
        // Base64-encode the compressed bytes so they can be carried as a string
        String result = Base64.encodeToString(bos.toByteArray(), Base64.NO_PADDING);
        Log.d("mmp", "Compressed result: " + result);
        return result;
    }
}
Calling it externally:

String str = "ABCDEABCDEABCDEABCDEABCDEABCDEABCDEABCDEABCDEABCDE";
String result = Test.compress(str);
You can see the result:

After compression: eNpzdHJ2cXUkhQAATY4NFw

The difference before and after compression is significant: 50 characters in, 22 out.
Someone may think at this point: oh, so it's Base64 doing the compression. That is a misunderstanding. Friends with some development experience or a certain foundation will know this, but some newbies may not, and I have also written about it before: Base64 is not compression but a kind of encoding. If you use Base64 alone, the data will only get longer.
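You can verify that yourself in a few lines (using java.util.Base64 here for a plain-JVM check; the demo above uses android.util.Base64):

import java.nio.charset.StandardCharsets;

public class Base64LengthDemo {
    public static void main(String[] args) {
        byte[] raw = "AAABBBCCC".getBytes(StandardCharsets.UTF_8);
        // Base64 maps every 3 input bytes to 4 output characters
        String encoded = java.util.Base64.getEncoder().encodeToString(raw);
        System.out.println(raw.length + " bytes -> " + encoded.length() + " chars"); // 9 bytes -> 12 chars
    }
}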
Then why use Base64 here at all? Base64 converts a byte array into a string. The objects of data compression and decompression are byte arrays, which is also why compression can handle either a string or a file: underneath, it is all byte[].
Some people will say: if that's all it is, then I'll use Deflater to compress pictures and videos too. But this is really a different thing. Compression is divided into lossy and lossless compression. The Deflater compression we used above is lossless and therefore reversible. Pictures and videos are more often compressed lossily; video in particular achieves high compression ratios precisely because lossy compression can throw data away, which makes it essentially irreversible. So which compression method to use for your data and resources depends on the specific scenario. If you compressed a string lossily, wouldn't the decompressed string differ from the original content?
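Since Deflater compression is reversible, here is the matching decompression step for the compress() demo above (a sketch; the class name is mine, and it assumes the same Base64 flags as the compress side):

import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

import android.util.Base64;

public class TestDecompress {
    public static String decompress(String compressed) throws DataFormatException {
        // Undo the Base64 step first to recover the raw compressed bytes
        byte[] input = Base64.decode(compressed, Base64.NO_PADDING);
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream bos = new ByteArrayOutputStream(256);
        byte[] buffer = new byte[256];
        while (!inflater.finished()) {
            int length = inflater.inflate(buffer);
            bos.write(buffer, 0, length);
        }
        inflater.end();
        return new String(bos.toByteArray());
    }
}

Feeding it the eNpzdHJ2cXUkhQAATY4NFw result from earlier should return the original ABCDE... string unchanged.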
I believe that after reading this far, you have a rough understanding of the concept of data compression.
3. Deflate algorithm
As mentioned above, there are many kinds of algorithms; you could even design a set of your own and file a patent. Deflate (which Java exposes through the Deflater class) is a commonly used lossless data compression algorithm.
It is easy to look up that the Deflate compression algorithm = LZ77 + Huffman coding, meaning that internally the algorithm is implemented with LZ77 followed by Huffman coding.
I won't go into the implementation process and principles of these algorithms for now, because there is a lot of content. If I have time later I will write about them separately and implement them by hand in code (they are usually written in C). Here I will just give a brief introduction so you have the concept.
LZ77
LZ77 encoding is a dictionary-based lossless compression algorithm with "sliding window".
Simply put, as the window slides along, previously seen substrings go into the dictionary. When the window reaches a substring that has appeared before, it is replaced by the position and length of the earlier occurrence.
For example, ABCDEFABCDZZZ → ABCDEF(6,4)ZZZ

Here (6,4) means: go back 6 positions and copy 4 characters.
Of course, this is just a simple example that reflects the idea. In reality it is definitely not that simple: how to search for matching substrings, how to slide the window, and so on.
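To make the idea concrete, here is a naive brute-force LZ77-style sketch (my own illustrative code: real implementations use a bounded window and hash chains for the search, and also allow matches to overlap the current position):

import java.util.ArrayList;
import java.util.List;

public class NaiveLz77 {
    static final int MIN_MATCH = 3; // shorter matches aren't worth a token

    public static List<String> compress(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int bestLen = 0, bestDist = 0;
            // Brute-force search of the already-seen prefix for the longest match
            for (int j = 0; j < i; j++) {
                int len = 0;
                while (j + len < i
                        && i + len < input.length()
                        && input.charAt(j + len) == input.charAt(i + len)) {
                    len++;
                }
                if (len > bestLen) { bestLen = len; bestDist = i - j; }
            }
            if (bestLen >= MIN_MATCH) {
                tokens.add("(" + bestDist + "," + bestLen + ")"); // back-reference token
                i += bestLen;
            } else {
                tokens.add(String.valueOf(input.charAt(i)));       // literal character
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints ABCDEF(6,4)ZZZ, matching the example above
        System.out.println(String.join("", compress("ABCDEFABCDZZZ")));
    }
}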
Huffman coding
Huffman coding involves the Huffman tree and the greedy algorithm. The method constructs prefix-free codewords (no codeword is a prefix of any other) with the shortest average length, based entirely on the probability of each character occurring.
Because this requires building a Huffman tree from the frequency of each character's appearance, it is difficult to demonstrate simply and intuitively. Here I will borrow a demo written by someone else to show the effect directly.
Original string: BCAADDDCCACACAC
After converting to binary:
10000100100001101000001010000010100010001000100010001000100001101000011010000010100001101000001010000110100000101000011
After encoding: 1000111110110110110110110110
It can be seen that the compression effect is very obvious: the encoded output is 28 bits, far shorter than the raw binary.
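If you want to play with this yourself, here is a minimal Huffman-coding sketch (my own illustrative code, not the demo quoted above): count frequencies, greedily merge the two least-frequent nodes using a priority queue, then walk the tree to assign prefix-free codes.

import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class HuffmanDemo {
    static class Node implements Comparable<Node> {
        final int freq;
        final Character ch;   // null for internal nodes
        final Node left, right;
        Node(int freq, Character ch, Node left, Node right) {
            this.freq = freq; this.ch = ch; this.left = left; this.right = right;
        }
        public int compareTo(Node o) { return Integer.compare(freq, o.freq); }
    }

    static Map<Character, String> buildCodes(String text) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : text.toCharArray()) freq.merge(c, 1, Integer::sum);

        PriorityQueue<Node> pq = new PriorityQueue<>();
        for (Map.Entry<Character, Integer> e : freq.entrySet())
            pq.add(new Node(e.getValue(), e.getKey(), null, null));

        while (pq.size() > 1) {              // the greedy step: merge the two rarest
            Node a = pq.poll(), b = pq.poll();
            pq.add(new Node(a.freq + b.freq, null, a, b));
        }

        Map<Character, String> codes = new HashMap<>();
        assign(pq.poll(), "", codes);
        return codes;
    }

    static void assign(Node n, String prefix, Map<Character, String> codes) {
        if (n.ch != null) { codes.put(n.ch, prefix.isEmpty() ? "0" : prefix); return; }
        assign(n.left, prefix + "0", codes);
        assign(n.right, prefix + "1", codes);
    }

    public static void main(String[] args) {
        String text = "BCAADDDCCACACAC";
        Map<Character, String> codes = buildCodes(text);
        StringBuilder encoded = new StringBuilder();
        for (char c : text.toCharArray()) encoded.append(codes.get(c));
        // The exact 0/1 assignment can differ from the demo above,
        // but the encoded length for this input works out to 28 bits either way
        System.out.println(codes + " -> " + encoded);
    }
}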
Summary
The Deflate algorithm is a commonly used data compression algorithm, internally implemented with LZ77 and Huffman coding. Compression algorithms are generally platform-independent; they are a kind of computation, a kind of idea. Java has the Deflater class, PHP has corresponding libraries, and so does Go. Once you know the principle, you could even write the implementation yourself, although that is very troublesome, since anything involving algorithms is still hard. So in general, during development you have to know that such a thing exists, what it does, and how it is used. Of course, it is best to also know its principles and how it is implemented. That is not useless knowledge: once you learn it, you will definitely gain something.
You can extend this further. For example, the quality compression of pictures is a lossy compression method; H.264 and H.265 video encoding are likewise lossy processes. You need a clear idea of whether your data must be reversible or whether size matters most. If it must be reversible, use a lossless compression algorithm; if you want the smallest possible size and irreversibility doesn't matter, use a lossy one. The same goes for data transmission: if size is the concern, compress it; if security is the concern, encrypt it. Development is that simple!
GZIP
GZIP is also a compression technology, and I believe many people have heard of it. Our HTTP request can send the header Accept-Encoding: gzip, and the server then responds with Content-Encoding: gzip, meaning the data it returns has been gzip-compressed. So what's the use? If you have a large file with a huge number of bytes, transmission will be slow. After gzip compression, the compression ratio is high and the number of bytes transmitted is much smaller, so transmission is fast.
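The JDK ships gzip support in java.util.zip, so a round trip is easy to sketch (illustrative code; class and method names are my own):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipDemo {
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(bos)) {
            gos.write(data);  // closing the stream flushes the gzip trailer
        }
        return bos.toByteArray();
    }

    static byte[] gunzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[256];
            int n;
            while ((n = gis.read(buf)) != -1) bos.write(buf, 0, n);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) sb.append("ABCDE"); // highly repetitive input
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] zipped = gzip(original);
        // Repetitive data compresses drastically; exact numbers will vary
        System.out.println(original.length + " bytes -> " + zipped.length + " bytes");
        System.out.println(Arrays.equals(original, gunzip(zipped))); // true: round trip intact
    }
}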
Some people will also say that compression and decompression take time. Well said, but I suggest you don't just take anyone's word on principle: try it in practice. Transmit with GZIP compression and without it, and see which is faster. Of course, test with reasonably large data. You will find that even with the time spent compressing and decompressing, it is still faster than transmitting the raw data directly.
The implementation of GZIP also builds on the Deflate algorithm. So you can see that although there are many compression algorithms, they are closely intertwined, and most of them rely on LZ77 and Huffman coding. Why? Because these work well: if you can't write something better than them, there is no need to replace them.