SoFunction
Updated on 2024-11-14

PySpark Linear Regression with Gradient Descent and Cross-Validation: Key Points Explained

I'm trying to perform cross-validation on an SGD model in pyspark. I'm using LinearRegressionWithSGD from pyspark.mllib.regression, and ParamGridBuilder and CrossValidator, both from the pyspark.ml.tuning library.

After following the documentation on the Spark website, I was hoping that the code below would work:

Documentation reference: /docs/2.1.0/

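from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.mllib.regression import LinearRegressionWithSGD  # RDD-based mllib API

lr = LinearRegressionWithSGD()
pipeline = Pipeline(stages=[lr])

# This is where it fails: lr has no stepSize attribute to put in the grid.
paramGrid = ParamGridBuilder()\
    .addGrid(lr.stepSize, [0.1, 0.01])\
    .build()

crossval = CrossValidator(estimator=pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=RegressionEvaluator(),
                          numFolds=10)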

But LinearRegressionWithSGD() doesn't have a stepSize property (and I haven't had any luck trying other names either).

I can set lr to LinearRegression instead, but then I can't use SGD in the model while cross-validating it.

There is a kFold method in Scala, but I'm not sure how to access it from pyspark.

Answer

You can use the step parameter of LinearRegressionWithSGD to define the step size, but that will not make the code work, because you are mixing two incompatible libraries: pyspark.ml and pyspark.mllib. Specifically, you cannot use LinearRegressionWithSGD with the pyspark.ml library; you must use pyspark.ml.regression.LinearRegression instead. Unfortunately, I don't know how to cross-validate with SGD optimization in the ml library, and I'd like to know myself.

The good news is that LinearRegression has a setSolver property that can apparently be set to use 'gd'. So you may be able to configure the 'gd' optimizer to run as SGD, but I'm not sure where the solver documentation is or how to set the solver's properties (e.g. batch size). The API shows that the LinearRegression object calls Param(), but I'm not sure whether it uses the optimizer. If anyone knows how to set the solver properties, that would answer your question: you could then use the Pipeline, ParamGridBuilder, and CrossValidator ml classes for model selection on LinearRegression, with SGD optimization for parameter tuning.
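Until someone confirms how to do this inside Spark, it may help to see what the question is really asking for: k-fold cross-validation over the SGD step size. The sketch below does that in plain Python with no Spark at all. The function names (sgd_linear_regression, rmse, cross_validate_step_size), the synthetic data, and the epoch count are all made up for illustration and are not part of any Spark API; only the two candidate step sizes, 0.1 and 0.01, come from the question.

```python
import random


def sgd_linear_regression(rows, step_size, epochs=50, seed=0):
    """Fit y ~ w.x + b by stochastic gradient descent on squared error.

    Hypothetical helper for illustration; rows is a list of (features, label).
    """
    rng = random.Random(seed)
    rows = list(rows)  # copy so shuffling does not reorder the caller's list
    w = [0.0] * len(rows[0][0])
    b = 0.0
    for _ in range(epochs):
        rng.shuffle(rows)
        for x, y in rows:
            # Prediction error on a single example drives one update.
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            w = [wi - step_size * err * xi for wi, xi in zip(w, x)]
            b -= step_size * err
    return w, b


def rmse(model, rows):
    """Root mean squared error of a (w, b) model on rows."""
    w, b = model
    squared = sum(
        (sum(wi * xi for wi, xi in zip(w, x)) + b - y) ** 2 for x, y in rows
    )
    return (squared / len(rows)) ** 0.5


def cross_validate_step_size(rows, step_sizes, num_folds=5, seed=0):
    """Return (best_step, scores), where scores maps step size to mean RMSE."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    folds = [rows[i::num_folds] for i in range(num_folds)]
    scores = {}
    for step in step_sizes:
        fold_errors = []
        for i in range(num_folds):
            # Train on every fold except fold i, then score on fold i.
            train = [r for j, f in enumerate(folds) if j != i for r in f]
            model = sgd_linear_regression(train, step)
            fold_errors.append(rmse(model, folds[i]))
        scores[step] = sum(fold_errors) / num_folds
    best_step = min(scores, key=scores.get)
    return best_step, scores


# Synthetic data (an assumption for the demo): y = 3x + 1 plus small noise.
data_rng = random.Random(42)
data = []
for _ in range(200):
    x = data_rng.uniform(-1.0, 1.0)
    data.append(([x], 3.0 * x + 1.0 + data_rng.gauss(0.0, 0.1)))

best_step, cv_scores = cross_validate_step_size(data, [0.1, 0.01])
print("best step size:", best_step)
print("mean RMSE per step size:", cv_scores)
```

This runs on a single machine and does not scale the way Spark does, so treat it as a description of the procedure rather than a replacement for CrossValidator.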
