SoFunction
Updated on 2024-11-12

Python uses pandas to put tabular data through its paces.

preamble

Mission Description:

Currently there is an excel spreadsheet data, there are missing values, the need for missing data to the es database to find and to make the missing data to complete.

The excel sheet data is shown below:

I. Constructing data in the es library

1.1 Creation of indexes

# Create physical_examination indexes
PUT /physical_examination
{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "1"
    }
  },
  "mappings": {
    "properties": {
      "nums": {
        "type": "integer"
      },
      "name": {
        "type": "text"
      },
      "sex": {
        "type": "text"
      },
      "phone": {
        "type": "integer"
      },
      "result": {
        "type": "text"
      }
    }
  }
}

1.2 Insertion of data

Attention:json data can not be formatted line feeds, or report an error]

# Add data to the physical_examination index
POST physical_examination/_bulk
{"index":{"_id":"1"}}
{"nums":1,"name":"Liu Yi","sex":"Male.","phone":1234567891,"result":"Excellent."}
{"index":{"_id":"2"}}
{"nums":2,"name":"Chen Er","sex":"Male.","phone":1234567892,"result":"Excellent."}
{"index":{"_id":"3"}}
{"nums":3,"name":"Zhang San","sex":"Male.","phone":1234567893,"result":"Excellent."}
{"index":{"_id":"4"}}
{"nums":4,"name":"Li Si.","sex":"Male.","phone":1234567894,"result":"Excellent."}
{"index":{"_id":"5"}}
{"nums":5,"name":"Wang Wu","sex":"Male.","phone":1234567895,"result":"Excellent."}

1.3 Query data

Attention:Default query for all data under the index]

# Query all data in the index
GET physical_examination/_search
{
  "query": {
    "match_all": {}
  }
}

Second, the excel form of data processing operations

2.1 Exporting data for es query

  • Method I:Copy and paste into a document directly from the results of a kibana or postman query.
  • Method II:Use kibana to export data.
  • Method Three:Use postman to export the data and save it locally.

Use python to process the data and get the needed data.

Sample code:

# Read medical exam information in json
with open('./data/physical_examination.json', 'r', encoding='utf-8') as f:
    data_json = ()
print(data_json)
 
# Handle exception data in json data
if 'false' in data_json:
    data_json = data_json.replace('false', "False")
 
data_json = eval(data_json)
print(data_json)
 
print(data_json['hits']['hits'])
print('*' * 100)
 
valid_data = data_json['hits']['hits']
need_data = []
for data in valid_data:
    print(data['_source'])
    need_data.append(data['_source'])
print(need_data)

Read the excel sheet with the missing data and fill in the missing data.

# Read tables that need to be filled with data
data_xlsx = pd.read_excel('. /data/medical-examination-form.xlsx', sheet_name='Sheet1')
# print(data_xlsx)
 
# Get rows and columns of an excel sheet
row, col = data_xlsx.shape
print(row, col)
 
# Modify data in tables
for i in range(row):
    bb = data_xlsx.iloc[i]
    print(bb['Name'], bb['Cell phone number'])
    if (bb['Cell phone number']):
        bb['Cell phone number'] = '666'
        for cc in need_data:
            if cc['name'] == bb['Name']:
                bb['Cell phone number'] = cc['phone']
            data_xlsx.iloc[i, 3] = bb['Cell phone number']
    print(bb['Name'], bb['Cell phone number'])
    print("-" * 100)
print(data_xlsx)

Save the final processed data in a newly created file.

# Save data in a new file
data_xlsx.to_excel('./data/new_data.xlsx', sheet_name='Sheet1', index=False, header=True)

The full code is below:

import pandas as pd
 
# Read medical exam information in json
with open('./data/physical_examination.json', 'r', encoding='utf-8') as f:
    data_json = ()
print(data_json)
 
# Handle exception data in json data
if 'false' in data_json:
    data_json = data_json.replace('false', "False")
 
data_json = eval(data_json)
print(data_json)
 
print(data_json['hits']['hits'])
print('*' * 100)
 
valid_data = data_json['hits']['hits']
need_data = []
for data in valid_data:
    print(data['_source'])
    need_data.append(data['_source'])
print(need_data)
 
# Read tables that need to be filled with data
data_xlsx = pd.read_excel('. /data/medical-examination-form.xlsx', sheet_name='Sheet1')
# print(data_xlsx)
 
# Get rows and columns of an excel sheet
row, col = data_xlsx.shape
print(row, col)
 
# Modify data in tables
for i in range(row):
    bb = data_xlsx.iloc[i]
    print(bb['Name'], bb['Cell phone number'])
    if (bb['Cell phone number']):
        bb['Cell phone number'] = '666'
        for cc in need_data:
            if cc['name'] == bb['Name']:
                bb['Cell phone number'] = cc['phone']
            data_xlsx.iloc[i, 3] = bb['Cell phone number']
    print(bb['Name'], bb['Cell phone number'])
    print("-" * 100)
print(data_xlsx)
 
# Save data in a new file
data_xlsx.to_excel('./data/new_data.xlsx', sheet_name='Sheet1', index=False, header=True)

Running the effect, the final processed data is shown below:

to this article on the use of Python pandas table data processing article is introduced to this, more related pandas table data processing content please search for my previous posts or continue to browse the following related articles I hope you will support me in the future!