I. Introduction
Recently, a software engineer named Gabriele Venturi released the pandasai project on GitHub. According to its official description, pandasai is a third-party Python library that adds the generative capabilities of artificial intelligence to the pandas package, making data analysis conversational. In practice the package is quite powerful. It calls OpenAI's API and analyzes the data in a DataFrame according to natural-language commands, so it can filter, calculate, and analyze whatever data the user asks for, and it can even draw charts. At the time of writing, the project has 68,000 stars, 447 forks, and 18 released versions on GitHub.
With this tool, even users who are not familiar with Python commands can describe their data analysis needs in detail and let pandasai and the OpenAI API produce the required data or charts. This shortens the data analysis process and lowers the barrier to using Python for computation and charting.
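The description above is fairly abstract. The rough sketch below illustrates one plausible shape of the idea. It assumes that the model returns pandas code as text and that this code is then run against the DataFrame. This is not pandasai's actual implementation. The function fake_llm is a hypothetical stand-in for the OpenAI call, and it hard-codes one answer so the sketch runs offline.

```python
import pandas as pd


def fake_llm(prompt, columns):
    """Hypothetical stand-in for an LLM call: returns pandas code as a string."""
    # A real model would read the prompt and column names and write the code
    # itself; this stub only knows how to answer one question.
    if "total gdp" in prompt.lower() and "gdp" in columns:
        return "result = df['gdp'].sum()"
    raise ValueError("this stub only understands the 'total GDP' question")


def ask(df, prompt):
    """Turn a question into code with the (fake) model, then run that code on df."""
    code = fake_llm(prompt, list(df.columns))
    namespace = {"df": df, "pd": pd}
    exec(code, namespace)  # the generated code stores its answer in `result`
    return namespace["result"]


df = pd.DataFrame({
    "country": ["France", "Germany"],
    "gdp": [2411255037952, 3435817336832],
})
print(ask(df, "What is the total GDP?"))  # → 5847072374784
```

Running model-generated code with exec is what makes this approach flexible. It is also why such tools should only be pointed at data and environments you trust.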
II. Getting started with pandasai
1. pandasai installation
With Python 3.8 or higher installed, install the package with pip:
pip install pandasai
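Before running the samples, it can help to confirm that the interpreter meets the Python 3.8 requirement and that pip installed pandasai into the same environment. The helper check_environment below is a hypothetical name used only for illustration. It uses importlib.util.find_spec, which locates a package without importing it.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 8)):
    """Return (python_ok, pandasai_found) for the current interpreter."""
    # Compare only (major, minor) against the minimum required version
    python_ok = sys.version_info[:2] >= min_version
    # find_spec returns None when the top-level package cannot be found
    pandasai_found = importlib.util.find_spec("pandasai") is not None
    return python_ok, pandasai_found


python_ok, pandasai_found = check_environment()
print(f"Python >= 3.8: {python_ok}; pandasai installed: {pandasai_found}")
```

If pandasai_found is False even though pip reported success, pip probably installed into a different Python environment than the one running the script.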
2. Official code samples
import pandas as pd
from pandasai import PandasAI

# Sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

# Instantiate a LLM
from pandasai.llm.openai import OpenAI
llm = OpenAI(api_token="YOUR_API_TOKEN")

pandas_ai = PandasAI(llm, conversational=False)
pandas_ai(df, prompt='Which are the 5 happiest countries?')
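An LLM's answer is not guaranteed to be correct, so it is worth checking it against plain pandas at least once. The sketch below answers the same question without any API call by using DataFrame.nlargest on the sample data. The gdp column is omitted because this question does not need it.

```python
import pandas as pd

# Same countries and happiness scores as the official sample above
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})

# nlargest keeps the n rows with the highest values, in descending order
top5 = df.nlargest(5, "happiness_index")["country"].tolist()
print(top5)  # → ['Canada', 'Australia', 'United Kingdom', 'Germany', 'United States']
```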
3. Sample code modifications
To run this sample, you first need an OpenAI API key (the free quota no longer appears to be available). If you need to apply for one, see the following article:
Second, you need to be able to reach the API. To access it successfully, use the fourth method described in the following article:
The modified sample code is as follows:
import pandas as pd
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI
import openai

openai.api_base = "/v1"  # Set your own URL here

# Sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

# Instantiate a LLM
llm = OpenAI(api_token="Your Own Open_API_Key")  # Put your own API key here
pandas_ai = PandasAI(llm, conversational=False)
print(pandas_ai.run(df, prompt='Which are the 5 happiest countries?'))
Compared with the official sample, this version adds import openai, sets openai.api_base to point at your own base URL, and wraps the call in print() so that the result is actually displayed. Testing showed that questions can also be asked in Chinese.
Generated results
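The modified sample hard-codes the API key in the script, which makes it easy to leak the key when sharing the code. A common alternative is to read the key from an environment variable. In the sketch below, get_openai_key is a hypothetical helper. OPENAI_API_KEY is the variable name the openai library conventionally uses, but any name works.

```python
import os


def get_openai_key(var_name="OPENAI_API_KEY"):
    """Read the API key from an environment variable instead of the source code."""
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a clear message rather than sending an empty token
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


# Usage with the modified sample (commented out so this runs without pandasai):
# llm = OpenAI(api_token=get_openai_key())
```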
4. Visualization
To generate a chart, describe it in the prompt. For example, change the last line of the modified sample code above to:
print(pandas_ai(df,"Plot the histogram of countries showing for each the gdp, using different colors for each bar"))
This asks for a chart of each country's GDP. The result is shown below:
pandasai generates charts
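For comparison, or when the API is not reachable, a similar chart can be drawn directly with matplotlib. Although the prompt says "histogram", it really describes a bar chart with one bar per country. The sketch below is not the code pandasai generates. The figure size, the "C0" to "C9" default-cycle colors, and the output filename gdp_by_country.png are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to show a window
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832,
            1745433788416, 1181205135360, 1607402389504, 1490967855104,
            4380756541440, 14631844184064],
})

fig, ax = plt.subplots(figsize=(10, 5))
# "C0".."C9" are matplotlib's default cycle colors: a different color per bar
colors = [f"C{i}" for i in range(len(df))]
bars = ax.bar(df["country"], df["gdp"], color=colors)
ax.set_xlabel("Country")
ax.set_ylabel("GDP")
ax.set_title("GDP by country")
ax.tick_params(axis="x", labelrotation=45)  # keep long country names readable
fig.tight_layout()
fig.savefig("gdp_by_country.png")
```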
5. Analyzing local spreadsheets
To analyze a local Excel spreadsheet instead of the hand-built DataFrame, read it with pd.read_excel("") and assign the result directly to the variable df. Note that with a large amount of data, reading may be somewhat slow.
import pandas as pd
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI
import openai

openai.api_base = "/v1"  # Set your own URL here

# Read the local Excel file into a DataFrame
df = pd.read_excel("")

# Instantiate a LLM
llm = OpenAI(api_token="Your Own Open_API_Key")
pandas_ai = PandasAI(llm, conversational=False)
print(pandas_ai(df, prompt='Which are the 5 happiest countries?'))
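Because pd.read_excel can be slow on large workbooks, one option is to pay that cost once and cache the resulting DataFrame. Reloading it with pd.read_pickle is usually much faster. The helper load_excel_cached and the file names in the usage line are hypothetical. Reading .xlsx files also requires the openpyxl package.

```python
import os

import pandas as pd


def load_excel_cached(excel_path, cache_path):
    """Hypothetical helper: read an Excel file once, then reuse a pickle copy."""
    cache_is_fresh = os.path.exists(cache_path) and (
        not os.path.exists(excel_path)
        # Re-read the workbook if it was modified after the cache was written
        or os.path.getmtime(cache_path) >= os.path.getmtime(excel_path)
    )
    if cache_is_fresh:
        return pd.read_pickle(cache_path)
    df = pd.read_excel(excel_path)  # .xlsx files need the openpyxl package
    df.to_pickle(cache_path)
    return df


# Usage (hypothetical file names):
# df = load_excel_cached("your_data.xlsx", "your_data.pkl")
```

Only load pickle files you created yourself, because unpickling untrusted files can execute arbitrary code.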
III. Post-learning reflections
- pandasai combines the functionality of pandas and ChatGPT, which lowers the learning cost of data analysis. It can serve as an important part of a data analysis workflow.
- pandasai's analysis results are fairly accurate. However, because access to the API is slow, the program runs slowly overall even though the code itself is short.
- Since OpenAI has removed the free API credits, using the API requires a paid account, which is a real barrier for ordinary users.