
Converting between pandas and Spark DataFrames, with detailed examples

This article explains, with detailed sample code, how to convert a pandas DataFrame to a Spark DataFrame and back again. It may serve as a useful reference for study or work; readers who need it can follow along below.

Pandas's dataframe to Spark's dataframe

from pyspark.sql import SparkSession
# Initialize a Spark session
spark = SparkSession \
  .builder \
  .getOrCreate()
spark_df = spark.createDataFrame(pandas_df)
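
As a quick, self-contained sketch (assuming a local PySpark installation; the column names and values below are made up for illustration):

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A small pandas DataFrame with made-up example data
pandas_df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 30]})

# Convert to a Spark DataFrame; the schema is inferred from the pandas dtypes
spark_df = spark.createDataFrame(pandas_df)
spark_df.printSchema()
spark_df.show()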

Spark's dataframe to pandas's dataframe

import pandas as pd
pandas_df = spark_df.toPandas()
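
A quick round-trip check, assuming the spark_df created in the previous step. Note that toPandas() pulls all rows back to the driver, so it is only suitable when the result fits in driver memory:

# Bring the Spark DataFrame back to the driver as a pandas DataFrame
pandas_df = spark_df.toPandas()
print(pandas_df.head())
print(pandas_df.dtypes)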

Since toPandas() collects all the data onto the driver, i.e. it is a single-machine (standalone) approach, you can follow breeze_lsw and switch to a distributed version:

import pandas as pd

def _map_to_pandas(rdds):
  # Build a pandas DataFrame from the rows of a single partition
  return [pd.DataFrame(list(rdds))]

def topas(df, n_partitions=None):
  # Optionally repartition so each partition fits comfortably in memory
  if n_partitions is not None: df = df.repartition(n_partitions)
  # Convert each partition to pandas in parallel, then collect the pieces on the driver
  df_pand = df.rdd.mapPartitions(_map_to_pandas).collect()
  df_pand = pd.concat(df_pand)
  # Restore the original column names lost in the per-partition conversion
  df_pand.columns = df.columns
  return df_pand

pandas_df = topas(spark_df)
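
A minimal usage sketch, assuming the spark_df built earlier; the partition count of 4 is arbitrary and should be tuned so that each partition fits in memory:

# Convert with an explicit number of partitions (value chosen arbitrarily)
pandas_df = topas(spark_df, n_partitions=4)
print(pandas_df.shape)
print(pandas_df.columns.tolist())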

This is the entire content of this article.