Does PySpark provide a machine learning API?
Yes. Like Spark's Scala and Java APIs, PySpark exposes Spark's machine learning library, MLlib. Its RDD-based package, pyspark.mllib, includes modules such as:
* mllib.classification
: This supports binary and multiclass classification with methods such as logistic regression, linear SVM and Naive Bayes. Tree-based methods such as Decision Tree and Random Forest are provided by mllib.tree.
* mllib.clustering
: This is used for solving clustering problems, which aim to group entities into subsets based on their similarity to one another.
* mllib.fpm
: FPM stands for Frequent Pattern Mining. This library is used to mine frequent items, itemsets, subsequences or other substructures, which is often a first step in analyzing large datasets.
* mllib.linalg
: This provides the vector and matrix types and utilities used for linear algebra problems.
* mllib.recommendation
: This is used for collaborative filtering in recommender systems. It supports model-based collaborative filtering, where users and products are described by a small set of latent factors. These factors are learned with the Alternating Least Squares (ALS) algorithm and are then used to predict missing entries.
* mllib.regression
: This is used for regression problems, where algorithms model the relationship between a dependent variable and one or more independent variables.