Remote control of hosts based on Python

Preface:

This paper is one of the selected topics of Cyberspace Security Design and Practice I in the HITwh Cyberspace Security Program. It implements remote monitoring of the desktop and network activity of hosts on a local area network, simple keyboard and mouse control, remote network disconnection (ARP attack), and encrypted data transmission. Because the article was produced by copying the Word document directly into Typora, the formatting is not guaranteed to be correct even after some cleanup, and many of the pictures are blurred; I have no fix for this at the moment. If you need the original Word document, send me an e-mail and I will send it to you.

1. Outline design

The goal of this design is to remotely control a host. The remote host I chose is a Windows 10 virtual machine running in VMware Workstation Pro 16. The system environment is Windows 10, the language is Python 3.9, and the editor is VS Code.

Key features include a graphical interface, video monitoring, remote control of the mouse and keyboard, recording of the monitoring duration, monitoring of hardware resource usage, monitoring of network activity, and interruption of network access. For better performance, a multi-threaded model is used.

The functional structure is shown in Figure 1:

The graphical interface is provided by PyQt5. After starting, the program shows the main interface (GUI I) and then blocks, waiting for user input. In this interface the user enters the IP of the target host and then starts three threads: the video monitoring thread, the hardware resource usage monitoring thread, and the monitoring duration recording thread. The first two use socket communication. The video monitoring thread continuously receives packets from the target host, converts their contents into images, and displays them in the main interface. The hardware resource usage monitoring thread receives the CPU utilization, memory utilization, and total memory of the target host on a second socket, formats them, and displays them in the main interface. The monitoring duration recording thread records how long the current video monitoring session has lasted, formats it, and displays it in the main interface.

Keyboard and mouse control runs in the main thread. When the user clicks a position in the video area, the click coordinates are recorded and converted to coordinates on the target host's desktop according to the scaling ratio. The keyboard or mouse operation is encoded as integers and sent to the target host together with the coordinates, and the target host performs the operation after receiving them. The integer representing the key operation is encrypted with RSA. Video monitoring can be ended at any time, and the current screen can also be captured and saved.

During video monitoring, the user can optionally turn on the network activity monitoring thread. To do so, the user first configures the filter in the settings interface (GUI II) and selects which packets to capture; the filter can be set by network interface, protocol type, source host, and destination host. Once the thread is started, the local machine captures packets, processes them, and displays them in the main interface. Capturing can be paused or terminated at any time, and after termination the captured packets can be saved in pcap format for detailed inspection in Wireshark. The user can also choose to interrupt the target host's network access. This is based on an ARP attack: the option starts a thread that continuously sends ARP packets to the target host.

Whenever there is an error due to improper use by the user, an error window (GUI III) will pop up, which will indicate exactly what went wrong.

2. Detailed design

Python's multithreading support is very user-friendly. With the threading library it is easy to create and start a new thread, and to coordinate running and blocking between multiple threads through threading.Event objects. Graphical interfaces can also be written very easily with PyQt5.

The main principle of video monitoring is that the target host captures a screenshot of its current desktop and sends it to the local machine over a socket; the local machine receives it, converts the picture into the required format, and displays it in the main interface. One major problem is that each screenshot captured by the target host is usually larger than 500 KB, and some reach 1 MB. If ten pictures are captured per second and all of them are sent, the required bandwidth is 5 MB/s to 10 MB/s, which is clearly too much for the network to bear.

Differential transmission solves this. The target host still captures ten pictures per second, that is, one every 100 milliseconds. When it has just established a connection with the local machine, the first picture is sent in full. After that, the target host compares each new picture with the previous one. If nothing has changed, nothing is sent and the next picture is captured 100 milliseconds later. If there is a change, the two pictures are subtracted to find the part that differs, and only that differential part is sent. For example, if the only difference between the current picture and the previous one is the cursor, only the cursor is sent. As a further optimization, the size of the differential part is compared with the size of the current full screenshot, and whichever is smaller is sent. So that the local machine can tell whether a received picture is complete or differential, a flag is set in the packet: 1 for a complete picture and 0 for a differential picture. The pseudo-code for differential sending on the target host is shown in Figure 2, and the flowchart for the local machine's side of picture transmission is shown in Figure 3.
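The differential scheme described above can be sketched roughly as follows. This is only an illustration under several assumptions: frames are modelled as plain lists of pixel rows so that the sketch runs without OpenCV, the differing part is approximated by its bounding box, and "size" is approximated by pixel count rather than encoded bytes. The function names are hypothetical and are not taken from the appendix source.

```python
def changed_region(prev, curr):
    """Bounding box (top, left, bottom, right) of the pixels that differ
    between two equally sized frames, or None if they are identical.
    bottom and right are exclusive."""
    rows = [y for y, (a, b) in enumerate(zip(prev, curr)) if a != b]
    if not rows:
        return None
    cols = [x for y in rows
            for x, (a, b) in enumerate(zip(prev[y], curr[y])) if a != b]
    return rows[0], min(cols), rows[-1] + 1, max(cols) + 1


def build_packet(prev, curr):
    """Sender side: decide what to send for the current frame.

    Returns None when nothing changed, otherwise (flag, box, pixels),
    where flag is 1 for a complete picture and 0 for a differential one.
    """
    height, width = len(curr), len(curr[0])
    full = (1, (0, 0, height, width), curr)
    if prev is None:                     # first frame after connecting
        return full
    box = changed_region(prev, curr)
    if box is None:                      # no change: skip this frame
        return None
    top, left, bottom, right = box
    # Send whichever is smaller (assumption: compared by pixel count).
    if (bottom - top) * (right - left) >= height * width:
        return full
    patch = [row[left:right] for row in curr[top:bottom]]
    return (0, box, patch)


def apply_packet(screen, packet):
    """Receiver side: rebuild the current frame from a packet."""
    flag, (top, left, bottom, right), pixels = packet
    if flag == 1:                        # complete picture replaces everything
        return [row[:] for row in pixels]
    updated = [row[:] for row in screen]
    for offset, row in enumerate(pixels):
        updated[top + offset][left:right] = row
    return updated
```

In a real sender the loop would call build_packet every 100 milliseconds and skip the send whenever it returns None.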

Keyboard and mouse control is also implemented over a socket. The resolution of the target host is 1920*1080. To fit the target host's screen on the local display, I scale the image to 0.6 times its original size before showing it in the main interface. The widget that displays the image is a subclass of QLabel in which I overrode the mouse and keyboard event handlers (the MyQLabel class in the appendix source program). When the user performs a mouse operation on this QLabel (that is, on the displayed monitoring screen), the overridden mouse event handler immediately obtains the current coordinates, the operation (press or release), and which button was used (left or right). It then calls the send function to send the mouse button, the operation, and the coordinates (divided by 0.6) to the target host as one packet. After receiving this information, the target host moves the mouse to the corresponding coordinates and performs the operation. Keyboard operations work similarly, but because the keyboard library used is not compatible with the key codes returned by PyQt5's keyboard events, only English letters are guaranteed to be typed correctly and key combinations are not supported. The flowchart of local keyboard and mouse control is shown in Figure 4. The flowchart and pseudo-code for the target host's side are omitted because they are very simple; refer to the source program in the appendix.
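As a small illustration of the coordinate conversion, the sketch below maps a click on the scaled display back to the target desktop and packs it with the button and operation codes. The integer codes and the four-integer wire layout are assumptions made for this example; the actual encoding is in the appendix source.

```python
import struct

SCALE = 0.6  # displayed image size relative to the 1920*1080 target desktop


def to_target_coords(x, y, scale=SCALE):
    """Convert a click position on the scaled image to desktop coordinates."""
    return round(x / scale), round(y / scale)


def pack_mouse_event(button, action, x, y):
    """Pack button, action, and converted coordinates as four network-order ints.

    Hypothetical codes: button 1 = left, 2 = right; action 0 = press, 1 = release.
    """
    tx, ty = to_target_coords(x, y)
    return struct.pack("!4i", button, action, tx, ty)


# A left-button press in the centre of the 1152*648 display area
payload = pack_mouse_event(1, 0, 576, 324)
print(struct.unpack("!4i", payload))
```

The target host would unpack the same four integers and replay the operation at the received coordinates.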

Monitoring of the target host's hardware resources is also implemented over a socket. The target host sends its CPU utilization, memory utilization, and total memory once every two seconds. The local machine receives them in a loop, converts the format and units, and fills the values into the corresponding positions in the main interface. The flowchart for the local machine is shown in Figure 5; the flowchart for the target host sending the hardware resource information is very simple and is omitted.
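The format and unit conversion on the local machine might look like the sketch below. The wire format (two floats and one unsigned 64-bit integer) and the function names are assumptions for illustration only; the article does not specify how the three values are encoded.

```python
import struct

# Hypothetical wire format: CPU %, memory % (floats), total memory in bytes.
REPORT = struct.Struct("!ffQ")


def pack_report(cpu_percent, mem_percent, total_bytes):
    """Target host side: serialize one hardware report."""
    return REPORT.pack(cpu_percent, mem_percent, total_bytes)


def format_report(payload):
    """Local side: decode a report and convert it to display strings."""
    cpu, mem, total = REPORT.unpack(payload)
    return {
        "cpu": f"{cpu:.1f}%",
        "memory": f"{mem:.1f}%",
        "total_memory": f"{total / 1024 ** 3:.1f} GB",  # bytes to GiB
    }


print(format_report(pack_report(12.5, 40.0, 8 * 1024 ** 3)))
```

Each decoded dictionary would then be written into the corresponding labels of the main interface.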

The monitoring duration is recorded mainly with Python's time library. When monitoring of a host starts, the program gets the current time and starts a new thread that records the elapsed time, updating it once per second. The pseudo-code is shown in Figure 6.
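A minimal sketch of the duration formatting, assuming an HH:MM:SS display (the exact display format is not stated in the article, and the function names are hypothetical):

```python
import time


def format_duration(seconds):
    """Format a number of elapsed seconds as HH:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def elapsed_since(start, now=None):
    """Elapsed time since `start` (a time.time() value) as HH:MM:SS."""
    now = time.time() if now is None else now
    return format_duration(now - start)


start = time.time()          # taken when monitoring begins
print(elapsed_since(start))  # the recording thread would call this once per second
```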

The screenshot function captures the current desktop of the target host. It is very easy to implement: before a picture is displayed in the main interface it must be stored in a variable img, so when the screenshot button is pressed the program simply takes the current img and saves it as a PNG file.

Video monitoring can be ended at any time during the monitoring process, and the threads that were started for it then block or exit. These threads are coordinated by a thread event named event_monitor. When event_monitor is set (by calling event_monitor.set()), all threads start and run normally, and each thread checks whether the event is set by calling event_monitor.is_set(). When the user chooses to end monitoring, event_monitor.clear() is called to clear the event and the socket connection is closed; the corresponding threads then either block (the monitoring duration recording thread) or exit (the picture receiving thread and the hardware usage information receiving thread). The flowchart for ending video monitoring is shown in Figure 7.
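The blocking and exiting behaviour can be sketched with threading.Event as follows. The worker functions are simplified stand-ins, and the extra stop event is an assumption added so that the blocking thread in this example can be shut down cleanly.

```python
import threading
import time

event_monitor = threading.Event()


def duration_worker(ticks, stop):
    """Records ticks while event_monitor is set; idles while it is cleared."""
    while not stop.is_set():
        # wait() returns True once the event is set, False on timeout
        if event_monitor.wait(timeout=0.05):
            ticks.append(time.time())
            time.sleep(0.01)


def receiver_worker(frames):
    """Exits as soon as event_monitor is cleared."""
    while event_monitor.is_set():
        frames.append(b"frame")  # stands in for receiving from the socket
        time.sleep(0.01)


if __name__ == "__main__":
    stop, ticks, frames = threading.Event(), [], []
    event_monitor.set()                      # start monitoring
    workers = [
        threading.Thread(target=duration_worker, args=(ticks, stop)),
        threading.Thread(target=receiver_worker, args=(frames,)),
    ]
    for w in workers:
        w.start()
    time.sleep(0.2)
    event_monitor.clear()                    # end monitoring
    stop.set()
    for w in workers:
        w.join()
    print(len(ticks) > 0, len(frames) > 0)
```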

At this point, the basic functions of the host control module have been introduced. The following section describes the Network Activity Monitor module.

The main function of the network activity monitoring module is to capture packets, which can be done with the sniff function in Python's Scapy library on a specific NIC. However, packets captured by sniff are only parsed up to the TCP/UDP layer; anything above that is labeled Raw. The application protocol can still be identified from the port number, as explained later. Before capturing, we also want to apply a simple filter. To do this, the user clicks the Settings button to open the settings interface and sets the capture options: network interface, protocol type, host, source host, and destination host. Since I run this against my own virtual machine, which uses NAT mode, the network interface defaults to "VMware Network Adapter VMnet8". The protocol type is selected from a drop-down box: Any, TCP, UDP, IP, HTTP, HTTPS, SMTP, and so on. The host defaults to the IP of the host currently under video monitoring, and the source and destination hosts can be specified by the user. Setting these options actually constructs a filter statement that is passed as an argument to sniff. The pseudo-code for constructing the filter statement is shown in Figure 8.
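Building the filter statement might look roughly like this. The sketch assumes the statement uses BPF syntax, which is what Scapy's sniff accepts in its filter argument, and that HTTP, HTTPS, and SMTP are expressed through their TCP ports; build_filter and its parameters are hypothetical names, not the author's code.

```python
def build_filter(protocol="Any", host="", src="", dst=""):
    """Build a BPF-style filter string from the capture settings."""
    # Application protocols are expressed through their well-known TCP ports.
    named_ports = {
        "HTTP": "tcp port 80",
        "HTTPS": "tcp port 443",
        "SMTP": "tcp port 25",
    }
    parts = []
    proto = protocol.upper()
    if proto in named_ports:
        parts.append(named_ports[proto])
    elif proto != "ANY":
        parts.append(proto.lower())       # e.g. "tcp", "udp", "ip"
    if host:
        parts.append(f"host {host}")
    if src:
        parts.append(f"src host {src}")
    if dst:
        parts.append(f"dst host {dst}")
    return " and ".join(parts)            # empty string means "capture everything"


print(build_filter("HTTP", host="192.168.1.5"))
```

The resulting string would be passed as sniff(filter=..., iface=...) together with the selected network interface.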

Once the filter statement is constructed, packet capture can start; the flowchart is shown in Figure 9. First, the button states are updated; for example, the Save button is disabled while packets are being captured. Then sniff is called to capture packets, and each captured packet is parsed layer by layer.

The first layer is the Ethernet frame, whose protocol type may be IPv4, ARP, IPv6, and so on. packet[Ether].type returns a 2-byte number, usually written in hexadecimal, that identifies the protocol type; for example, 0x0800 means IPv4. A dictionary can store the mapping from this number to the protocol type. For an IP-layer datagram, packet[IP].proto likewise returns an integer that identifies a protocol type, for example 4 for IP (IP-in-IP encapsulation), 6 for TCP, and 17 for UDP. This mapping can also be stored in a dictionary. The source and destination IPs are obtained with packet[IP].src and packet[IP].dst. TCP and UDP segments are handled similarly: the port numbers are obtained with packet[TCP].sport and packet[TCP].dport (or packet[UDP].sport and packet[UDP].dport). The port number then determines more specific protocol types such as HTTP (TCP port 80), HTTPS (TCP port 443), SMTP (TCP port 25), Telnet (TCP port 23), and SNMP (UDP port 161). The mapping between these protocol types and port numbers is also kept in a dictionary. Once the specific protocol type, source address, destination address, port, message length, message content, and other information have been determined, they are displayed in the main interface.
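The dictionary lookups can be sketched as below. To keep the example runnable without Scapy, the function takes the already extracted field values (what packet[Ether].type, packet[IP].proto, and the port fields would return); the dictionaries hold only a few entries and the function name is hypothetical.

```python
ETHER_TYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x86DD: "IPv6"}
IP_PROTOS = {1: "ICMP", 6: "TCP", 17: "UDP"}
PORTS = {
    ("TCP", 80): "HTTP",
    ("TCP", 443): "HTTPS",
    ("TCP", 25): "SMTP",
    ("TCP", 23): "Telnet",
    ("UDP", 161): "SNMP",
}


def classify(ether_type, ip_proto=None, sport=None, dport=None):
    """Return the most specific protocol name the dictionaries can give."""
    name = ETHER_TYPES.get(ether_type, hex(ether_type))
    if name != "IPv4" or ip_proto is None:
        return name
    transport = IP_PROTOS.get(ip_proto, f"IP proto {ip_proto}")
    # Either end of the connection may be the well-known port.
    for port in (dport, sport):
        app = PORTS.get((transport, port))
        if app:
            return app
    return transport


print(classify(0x0800, 6, 51000, 443))
```

Note that a port-based lookup is a heuristic: traffic on a non-standard port falls back to the transport protocol name.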

Packet capture can be paused at any time. Pausing sets the variable flag_pause to True (it is initially False); capturing itself continues, but packets are no longer parsed or displayed. If the Start button is clicked again while flag_pause is True, the variable is set back to False, and captured packets are parsed and displayed again. When the Pause button is pressed, the Start button is enabled and the Pause button itself is disabled. The process is simple enough that no flowchart is shown.
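The pause logic amounts to a flag checked in the per-packet callback. The class below is a hypothetical stand-in for the GUI code: button states are modelled as booleans, and the callback receives an already summarized packet.

```python
class CaptureController:
    """Minimal model of the Start/Pause behaviour described above."""

    def __init__(self):
        self.flag_pause = False
        self.shown = []              # packets that were parsed and displayed
        self.start_enabled = False   # capture is assumed to be running already
        self.pause_enabled = True

    def on_packet(self, summary):
        """Called for every captured packet (e.g. as the sniff callback)."""
        if self.flag_pause:
            return                   # still captured, but not parsed or shown
        self.shown.append(summary)

    def pause(self):
        self.flag_pause = True
        self.start_enabled, self.pause_enabled = True, False

    def start(self):
        self.flag_pause = False      # resume parsing and display
        self.start_enabled, self.pause_enabled = False, True


ctrl = CaptureController()
ctrl.on_packet("pkt 1")
ctrl.pause()
ctrl.on_packet("pkt 2")              # ignored while paused
ctrl.start()
ctrl.on_packet("pkt 3")
print(ctrl.shown)
```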

To terminate packet capture, press the Terminate button, which does the following: it calls event_stop_capture.set(), which blocks the capture process; it updates the button states, for example enabling the Save button; and it clears the packet list. All captured packets are saved in a list, so after the list is cleared, the next click on Start begins a fresh capture. The process is very simple, so no flowchart or pseudo-code is shown.

Whether we are attacking a target host or want to protect it when it is under attack, it may be necessary to forcibly cut off its network access. This can be done through an ARP attack; the flowchart is shown in Figure 10. First, the IP of the target host and the gateway IP must be known. Scapy's getmacbyip() and get_if_hwaddr() functions are then used to obtain the target host's MAC address and the gateway's MAC address, respectively, and an ARP packet is constructed: ARP(hwsrc=gateway MAC address, psrc=gateway IP, hwdst=target host MAC address, pdst=target host IP). The source of this ARP packet is the gateway's IP and MAC address, and the destination is the target host's IP and MAC address. The Ethernet frame is then constructed: Ether(src=gateway MAC address, dst=target host MAC address). Once constructed, the frame is sent to the target host every second, so that the target host cannot access the network properly. This function also requires a new thread, which is coordinated through the thread event event_cutoff.

When the user wants to allow network access to the target host, just click the Restore Network button, so that the event_cutoff event will be cleared, the corresponding thread will stop sending ARP packets, and the network access to the target host will be restored to normal.

This completes the description of the two main modules of the remote control program. The remaining parts are the encryption function and the error alert function.

To prevent the mouse and keyboard operations of the remote control module from being intercepted by others, I encrypted the key operations. Each operation on the local machine is sent to the target host over a socket as four numbers: the integer identifying the left or right mouse button or a keyboard key, the integer indicating press or release, the horizontal coordinate, and the vertical coordinate. I encrypt the first integer with RSA. I used Python's rsa module to generate the public and private keys and saved them to files; the public key is placed on the local machine and the private key on the target host. Every time a key operation is performed, its integer is encrypted with the RSA public key into a 16-bit hexadecimal number, which is sent to the target host along with the other three numbers. The target host receives the packet, extracts the first sixteen bits, decrypts them with the private key, and then reproduces the corresponding operation on its own screen. The flowchart is shown in Figure 11.
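To illustrate the idea of encrypting only the first of the four numbers, here is a toy sketch using textbook RSA with tiny fixed primes and Python's built-in pow. This is not the rsa module used in the project and it is not secure (no padding, trivially small key); the primes, the function names, and the four-integer packing are all assumptions for demonstration.

```python
import struct

# Toy key (assumption): real keys are far larger and generated randomly.
P, Q, E = 61, 53, 17
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))   # modular inverse, Python 3.8+


def encrypt_code(code):
    """Encrypt the key/button integer with the public key (E, N)."""
    if not 0 <= code < N:
        raise ValueError("code must be smaller than the modulus")
    return pow(code, E, N)


def decrypt_code(cipher):
    """Decrypt with the private key (D, N)."""
    return pow(cipher, D, N)


def pack_event(code, action, x, y):
    """Local side: encrypt the first number, send the other three in clear."""
    return struct.pack("!4i", encrypt_code(code), action, x, y)


def unpack_event(payload):
    """Target side: decrypt the first number and return all four."""
    cipher, action, x, y = struct.unpack("!4i", payload)
    return decrypt_code(cipher), action, x, y


print(unpack_event(pack_event(65, 1, 960, 540)))
```

Because only the first integer is encrypted, the operation type and coordinates still travel in clear text, which matches the scope described above.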

The error alert function mainly checks whether the program is being used correctly. If the user starts packet capture, video monitoring, or the ARP attack without specifying the target host IP, a pop-up window prompts the user to set the IP.

3. Debugging analysis

The first problem I ran into while programming was that Scapy could only identify the TCP/UDP layer and could not tell whether a packet was HTTP, HTTPS, SMTP, and so on. My initial idea was to parse one layer deeper, the Raw layer, and look for strings such as 'Http' to determine the specific protocol type. After searching through various sources I still found no feasible solution, and then I realized that the port number can be used instead. For example, HTTP packets use TCP port 80 and HTTPS packets use TCP port 443, and Scapy makes it easy to get port numbers and similar information. So I use a dictionary to store the mapping between port numbers (or other integers) and protocol types, with the port number (or other integer) as the key and the protocol type as the value. When a packet is captured, the port number is looked up in the dictionary to determine the protocol type.

When building the video monitoring module, I originally planned to encrypt every transmitted picture: use a randomly generated picture as the key, XOR the key picture with every pixel of each picture to be sent, and send the encrypted result. When the local machine receives a picture, it XORs it with the key picture again and obtains the original. However, because of the complexity of the format conversions involved, I worked on it for nearly three days without making any progress, so I had to give up and encrypt the keyboard and mouse operations instead.

Another issue concerns sending pictures. As described earlier, I initially sent every picture in full, which put too much load on the network. I considered sacrificing some clarity to reduce the load, but scaling the image to 0.6 times its original size for local display already loses a lot of clarity; compressing it further would make the image hard to read on the local computer, and the monitoring would lose its purpose. Later I got the idea of differential transmission while looking at another project. With it, the network load is basically no longer a concern.

Looking back over the whole design process, it mainly involved socket programming, the principle of ARP attacks, the Scapy library, the multi-threaded model, image processing (with OpenCV), RSA encryption, and basic knowledge of computer networks. I had done packet capture experiments in my computer networks course, but those used Wireshark; in this course project I wrote my own packet capture tool and gained a deeper understanding of how packet capture works. I had not used image processing, the OpenCV library, or the Pillow library before, but through this coursework I gained a general understanding of their basic usage. Socket programming is something I only learned this semester. Although Python's socket operations are much simpler than C's, the project was still useful as a refresher and gave me more experience with sockets.

4. Test results

Next I will show the final functions of the remote control course design: the interface, video monitoring, remote control, monitoring of hardware resource usage, starting, pausing, and terminating network activity monitoring, saving screenshots and packets, and so on.

First is the main interface after the program is run, as shown below.
