PySpark - Interview Questions
Does PySpark provide a machine learning API?
Yes. Like Spark itself, PySpark provides a machine learning API known as MLlib, which supports various ML algorithms through modules such as:

* mllib.classification : This supports different methods for binary or multiclass classification, such as logistic regression, linear SVM and Naive Bayes. Tree-based methods like Decision Tree and Random Forest live in the separate mllib.tree module.

* mllib.clustering : This is used for solving clustering problems, which aim to group subsets of entities with one another based on their similarity.

* mllib.fpm : FPM stands for Frequent Pattern Mining. This library is used to mine frequent items, itemsets, subsequences or other structures when analyzing large datasets.

* mllib.linalg : This provides linear algebra utilities, such as vectors and matrices, that the other modules build on.

* mllib.recommendation : This is used for collaborative filtering and in recommender systems.

* spark.mllib : This is the name of the RDD-based package as a whole. Its recommendation support is model-based collaborative filtering, in which a small set of latent factors is learned with the Alternating Least Squares (ALS) algorithm and then used to predict missing entries.

* mllib.regression : This provides regression algorithms, which model the relationships and dependencies between variables.
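
To make one of the modules above concrete, here is a minimal sketch of what mllib.fpm is used for. The basket data and both helper names (`frequent_itemsets_plain`, `frequent_itemsets_mllib`) are made up for illustration. The first helper is a brute-force plain-Python reference that only shows what "frequent itemsets" means. The second asks the same question through MLlib's `FPGrowth.train`, and assumes pyspark and a JVM are installed.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets_plain(transactions, min_support):
    """Brute-force reference (not MLlib): return {itemset: count} for every
    itemset found in at least a min_support fraction of the transactions."""
    min_count = min_support * len(transactions)
    counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        # Count every non-empty subset of the basket once.
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


def frequent_itemsets_mllib(transactions, min_support):
    """Same question answered with MLlib's FP-growth.
    Assumes pyspark and a JVM are available; imported lazily for that reason."""
    from pyspark import SparkContext
    from pyspark.mllib.fpm import FPGrowth

    sc = SparkContext.getOrCreate()
    rdd = sc.parallelize(transactions, 2)
    model = FPGrowth.train(rdd, minSupport=min_support, numPartitions=2)
    return {frozenset(fi.items): fi.freq for fi in model.freqItemsets().collect()}


# Hypothetical shopping baskets, made up for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

# Keep itemsets that appear in at least half of the baskets.
frequent = frequent_itemsets_plain(baskets, min_support=0.5)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The brute-force version enumerates every subset of every basket, so it is only suitable for tiny inputs. FP-growth exists to avoid that blow-up on large, distributed datasets, which is the case mllib.fpm targets.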