1. Introduction
Pandas is a professional data analysis tool built on top of NumPy. It can handle a wide variety of data sets flexibly and efficiently, and it will be invaluable for the analysis examples later on. It provides two main data structures, DataFrame and Series: you can roughly think of a DataFrame as a table in Excel, and of a Series as a single column of that table.
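To make that relationship concrete, here is a minimal sketch; the column names score and rank are invented for illustration and do not come from the examples below.

import pandas

# A Series is a single labelled column of values
scores = pandas.Series([66, 77, 88], name='score')
print(scores)

# A DataFrame is a table; each of its columns is a Series
table = pandas.DataFrame({'score': [66, 77, 88], 'rank': [3, 2, 1]})
print(table)
print(type(table['score']))  # <class 'pandas.core.series.Series'>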
2. Create a DataFrame
# -*- encoding=utf-8 -*-
import pandas

if __name__ == '__main__':
    test_stu = pandas.DataFrame(
        {'Higher Mathematics': [66, 77, 88, 99, 85],
         'Big stuff': [88, 77, 85, 78, 65],
         'English': [99, 84, 87, 56, 75]},
    )
    print(test_stu)

    stu = pandas.DataFrame(
        {'Higher Mathematics': [66, 77, 88, 99, 85],
         'Big stuff': [88, 77, 85, 78, 65],
         'English': [99, 84, 87, 56, 75]},
        index=['Little Red', 'Little Lee', 'White', 'Blackie', 'Little Green']  # Specify the index
    )
    print(stu)
Output:

   Higher Mathematics  Big stuff  English
0                  66         88       99
1                  77         77       84
2                  88         85       87
3                  99         78       56
4                  85         65       75
              Higher Mathematics  Big stuff  English
Little Red                    66         88       99
Little Lee                    77         77       84
White                         88         85       87
Blackie                       99         78       56
Little Green                  85         65       75
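Since each column of a DataFrame is a Series, the stu table built above can also be taken apart again. A small follow-up sketch, reusing only names defined in the code above:

print(stu['English'])  # One column, returned as a Series
print(stu.loc['Little Red'])  # One row, selected by its index label
print(stu.loc['Little Red', 'English'])  # A single value: 99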
3. Read CSV or Excel (.xlsx) files and perform simple operations (add, delete, modify, query)
# -*- encoding=utf-8 -*-
import pandas

if __name__ == '__main__':
    data = pandas.read_csv('', engine='python')  # Read the csv file with the python parsing engine

    print(data.head(5))  # Show the first 5 rows
    print(data.tail(5))  # Show the last 5 rows
    print(data)  # Show all data
    print(data['height'])  # Show the height column
    print(data[['height', 'weight']])  # Show the height and weight columns

    data.to_csv('')  # Save to a csv file
    data.to_excel('')  # Save to an xlsx file

    data.info()  # View data information (total number of rows, missing values, data types)
    print(data.describe())  # count of non-null values, mean, std (standard deviation), min, max, 25%/50%/75% quantiles

    data['New Columns'] = range(0, len(data))  # Add a column, just like assigning a new key in a dictionary
    print(data)

    new_data = data.drop('New Columns', axis=1, inplace=False)  # Drop a column; if inplace is True the source data is changed and None is returned, otherwise a new DataFrame is returned and the source data is untouched
    print(new_data)

    data['Weight + Height'] = data['height'] + data['weight']
    print(data)

    data['remark'] = data['remark'].str.replace('to', '')  # Manipulate strings
    print(data['remark'])

    data['birth'] = pandas.to_datetime(data['birth'])  # Convert to a datetime type
    print(data['birth'])
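The snippet above expects a CSV file on disk, but the file path has been left out, so it will not run as written. As a self-contained sketch of the same operations, you can build a small DataFrame in memory instead; the column names height, weight, remark and birth mirror the ones used above, while the values are invented for illustration. Reading and writing Excel (.xlsx) works the same way through pandas.read_excel / DataFrame.to_excel, which usually need an extra engine such as openpyxl installed.

import pandas

# Invented sample data with the same column names as in the article
data = pandas.DataFrame({
    'height': [166, 175, 180],
    'weight': [55, 70, 80],
    'remark': ['to be confirmed', 'ok', 'to check'],
    'birth': ['1995-01-01', '1996-02-02', '1997-03-03'],
})

data.info()  # Row count, non-null counts, data types
print(data.describe())  # Summary statistics for the numeric columns
data['New Columns'] = range(0, len(data))  # Add a column
new_data = data.drop('New Columns', axis=1)  # Return a copy without that column
data['remark'] = data['remark'].str.replace('to', '')  # String manipulation on a whole column
data['birth'] = pandas.to_datetime(data['birth'])  # Convert strings to datetimes
print(data)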
4. Filter and slice data by condition
# -*- encoding=utf-8 -*-
import pandas

if __name__ == '__main__':
    data = pandas.read_csv('', engine='python')  # Read the csv file with the python parsing engine

    a = data.loc[:12, :]  # Take rows 0-12, all columns
    # print(a)
    b = data.iloc[:, [1, 3]]  # All rows, only columns 1 and 3
    # print(b)
    c = data.iloc[0:12, 0:4]  # Rows 0-12, columns 0-4
    # print(c)

    d = data['sex'] == 1  # Boolean mask: sex equals 1 (male)
    # print(d)
    f = data.loc[data['sex'] == 1, :]  # Rows where sex is 1 (male)
    # print(f)
    g = data.loc[:, ['weight', 'height']]  # Select the weight and height columns
    # print(g)
    h = data.loc[data['height'].isin([166, 175]), :]  # Rows where height is 166 or 175
    # print(h)
    h1 = data.loc[data['height'].isin([166, 175]), ['weight', 'height']]  # Same rows, only the weight and height columns
    # print(h1)

    i = data['height'].mean()  # Mean
    j = data['height'].std()  # Standard deviation
    k = data['height'].median()  # Median
    l = data['height'].min()  # Minimum
    m = data['height'].max()  # Maximum
    # print(i)
    # print(j)
    # print(k)
    # print(l)
    # print(m)

    n = data.loc[
        (data['height'] > data['height'].mean()) & (data['weight'] > data['weight'].mean()), :]
    # Height greater than the mean height AND weight greater than the mean weight;
    # note that "and" does not work here, use & instead (and | for "or")
    print(n)
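To experiment with the selection syntax without a CSV file, the following self-contained sketch applies the same .loc / .iloc patterns to a small invented DataFrame (the sex, height and weight columns mirror the ones above; the values are made up):

import pandas

data = pandas.DataFrame({
    'sex': [1, 0, 1, 0, 1],
    'height': [166, 158, 175, 160, 182],
    'weight': [60, 48, 72, 50, 80],
})

males = data.loc[data['sex'] == 1, :]  # Boolean filtering on rows
tall_and_heavy = data.loc[
    (data['height'] > data['height'].mean()) &
    (data['weight'] > data['weight'].mean()), :]  # Combine conditions with & (and |)
by_position = data.iloc[0:3, 0:2]  # Slice by integer position
print(males)
print(tall_and_heavy)
print(by_position)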
5. Clean NaN data, deduplicate, group, and merge
# -*- encoding=utf-8 -*-
import pandas

if __name__ == '__main__':
    sheet1 = pandas.read_excel('', sheet_name='Sheet1')  # Read Sheet1
    # print(sheet1)
    sheet2 = pandas.read_excel('', sheet_name='Sheet2')  # Read Sheet2
    # print(sheet2)

    a = pandas.concat([sheet1, sheet2])  # Merge the two sheets
    # print(a)

    b = a.dropna()  # Drop rows that contain NaN values
    # print(b)
    b1 = a.dropna(subset=['weight'])  # Drop rows where the specified column is NaN
    # print(b1)

    c = b.drop_duplicates()  # Drop duplicate rows
    # print(c)
    d = b.drop_duplicates(subset=['weight'])  # Drop duplicates in the specified column
    # print(d)
    e = b.drop_duplicates(subset=['weight'], keep='last')  # Drop duplicates in the specified column, keeping the last occurrence
    # print(e)

    f = a.sort_values(['weight'], ascending=False)  # Sort by weight from largest to smallest
    # print(f)

    g = a.groupby(['sex']).sum()  # Group by sex, then sum
    # print(g)
    g1 = a.groupby(['sex'], as_index=False).sum()  # Group by sex, then sum, but keep sex as a column instead of the index
    # print(g1)
    g2 = a.groupby(['sex', 'weight']).sum()  # Group by sex, then by weight, then sum
    # print(g2)

    h = pandas.cut(c['weight'], bins=[80, 90, 100, 150, 200])  # Bin weight into intervals
    print(h)
    c['Split according to weight'] = h  # This triggers a warning, unresolved here, but it does not affect the result
    print(c)
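Here is a self-contained sketch of the same pipeline that needs no Excel file; the two small DataFrames stand in for Sheet1 and Sheet2, the column names are reused from above, and the values are invented:

import pandas

sheet1 = pandas.DataFrame({'sex': [1, 0, 1], 'weight': [85, None, 95]})
sheet2 = pandas.DataFrame({'sex': [0, 1, 1], 'weight': [92, 120, 85]})

a = pandas.concat([sheet1, sheet2])  # Stack the two tables
b = a.dropna(subset=['weight'])  # Drop rows whose weight is NaN
c = b.drop_duplicates(subset=['weight'])  # Keep the first row for each weight
g = c.groupby(['sex'], as_index=False).sum()  # Total weight per sex
bins = pandas.cut(c['weight'], bins=[80, 90, 100, 150, 200])  # Label each weight with its interval
c = c.copy()  # Work on an explicit copy before adding a column
c['weight band'] = bins
print(a)
print(g)
print(c)

The warning mentioned in the comment above is most likely pandas' SettingWithCopyWarning, which fires when a column is assigned to a DataFrame that may be a view of another one; taking an explicit .copy() of the filtered frame first, as in this sketch, is one common way to avoid it.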
That is all for this article; I hope it helps with your learning.