What is Support Vector Regression?
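Support Vector Regression (SVR) is the regression form of the Support Vector Machine (SVM), a supervised machine learning algorithm that can be used for both classification and regression. SVMs can model linear and non-linear relationships and work well on many practical problems. They rely on a technique called the kernel trick, which implicitly maps the data into a higher-dimensional space. For classification, the algorithm then finds an optimal boundary between the classes; for regression, it fits a function that keeps most observations within a small margin of error.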
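To make the kernel idea a little more concrete, here is a minimal sketch of the radial (Gaussian) kernel, which is the kernel used later in this post. The helper name rbf_kernel and the gamma value are chosen for illustration only; svm() computes the kernel internally, so you never need to write this yourself.

```r
# Radial (Gaussian) kernel: similarity decays with the squared distance.
# rbf_kernel is an illustrative helper, not a function from e1071.
rbf_kernel <- function(u, v, gamma = 1) {
  exp(-gamma * sum((u - v)^2))
}

rbf_kernel(6.5, 6.5)  # identical points give the maximum similarity of 1
rbf_kernel(6.5, 10)   # distant points give a value close to 0
```

A larger gamma makes the similarity fall off faster, so each observation influences a smaller neighbourhood.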
Let’s understand Support Vector Regression using the Position_Salaries data set which is available on Kaggle. This data set consists of a list of positions in a company along with the band levels and their associated salary. The data set includes columns for Position with values ranging from Business Analyst, Junior Consultant to CEO, Level ranging from 1–10, and finally the Salary associated with each position ranging from $45000 to $1000000.
Required R packages
First, install the e1071 and ggplot2 packages and load them with library(); after that you can run the rest of the code in this post. So let’s start implementing our non-linear regression model.
Import libraries
install.packages('e1071')
install.packages('ggplot2')
library(e1071)
library(ggplot2)
Note: Packages only need to be installed once, whether or not you use RStudio. After that, loading them with library() in each new R session is enough.
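If you rerun the script often, one possible way to avoid reinstalling every time is to install only what is missing. This is a sketch, not part of the original code; install_if_missing is a hypothetical helper name, and it relies on the base R functions requireNamespace() and install.packages().

```r
# Hypothetical helper: install only the packages that are not yet available.
install_if_missing <- function(pkgs) {
  # requireNamespace() returns TRUE when a package can be loaded
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!is_installed]
  if (length(to_install) > 0) install.packages(to_install)
  invisible(to_install)  # names of the packages that had to be installed
}
```

For this post you would call install_if_missing(c('e1071', 'ggplot2')) once at the top of the script, before the library() calls.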
Importing the dataset
dataset <- read.csv('../input/position-salaries/Position_Salaries.csv')
dataset <- dataset[2:3]
dim(dataset)
The read.csv() function reads the CSV file, and the dim() function reports how many rows and columns the data set contains. In this data set, the Position and Level columns carry the same information, so we keep only Level (along with Salary). Also, the data set is very small, so we don’t split it into training and test sets.
The problem statement is that a candidate at level 6.5 claims a previous salary of 160000. Before hiring the candidate for a new role, the company would like to check whether that claim is honest. To do this, we will use Support Vector Regression to predict the salary that corresponds to the candidate’s level.
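The column selection can be tried without the CSV file. The sketch below uses a tiny stand-in data frame with the same three column names; the rows are placeholder values for illustration (the 'Manager' row in particular is made up), not the actual Kaggle data.

```r
# Placeholder stand-in for the CSV: same column names, illustrative rows only.
toy <- data.frame(
  Position = c("Business Analyst", "Junior Consultant", "Manager", "CEO"),
  Level    = c(1, 2, 5, 10),
  Salary   = c(45000, 50000, 110000, 1000000)
)

toy <- toy[2:3]  # keep columns 2 and 3 (Level and Salary), drop Position
dim(toy)         # number of rows, then number of columns
str(toy)         # column names and types, to confirm what was kept
```

Selecting by name, as in toy[c("Level", "Salary")], does the same thing and is a little more robust if the column order ever changes.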
Apply Support Vector Regression to the data set
regressor <- svm(formula = Salary ~ ., data = dataset, type = 'eps-regression', kernel = 'radial')
The svm() function creates a Support Vector Regression model. Note that arguments are passed with = rather than <-; using <- inside a function call would create global variables and pass the values by position instead of by name. If you look at the data set, we have one dependent variable, Salary, and one independent variable, Level. The notation formula = Salary ~ . therefore means that Salary is modeled as a function of Level; the dot stands for all the other columns in the data set. The second argument is the data set on which you want to train the regression model. The third argument, type, specifies whether the SVM model is used for regression or classification. Here we’re building a non-linear regression model, so we choose eps-regression. The final argument is the kernel. If you don’t specify a kernel, the radial (Gaussian) kernel is selected by default.
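To see what the type and kernel arguments contribute, the sketch below fits one model with both arguments spelled out and one with neither, then compares their predictions. It assumes the documented e1071 behaviour that a numeric response defaults to eps-regression and that the default kernel is radial. The data frame toy is made-up data with a non-linear trend, not the Position_Salaries data.

```r
library(e1071)

# Made-up data with a non-linear trend; a stand-in for the real data set.
toy <- data.frame(Level = 1:10)
toy$Salary <- 40000 + 1000 * toy$Level^3

# Same model twice: once with explicit arguments, once relying on defaults.
explicit <- svm(Salary ~ ., data = toy, type = "eps-regression", kernel = "radial")
defaults <- svm(Salary ~ ., data = toy)

# TRUE when both fits produce the same predictions
all.equal(predict(explicit, toy), predict(defaults, toy))

# Hyperparameters the fit actually used (stored on the model object)
c(cost = defaults$cost, gamma = defaults$gamma, epsilon = defaults$epsilon)
```

Spelling the arguments out, as the post does, is still a reasonable choice because it documents the intent and protects against a response column that is accidentally read as a factor.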
Predicting a new result with Support Vector Regression
y_pred <- predict(regressor, data.frame(Level = 6.5))
This code predicts the salary for level 6.5 according to the Support Vector Regression model. The prediction is close to 160k, so the candidate’s claim looks plausible.
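predict() accepts any number of rows, so the same check can be run for several levels at once and compared with the claimed salary. This is a sketch on made-up data (toy), so the numbers it prints say nothing about the real data set; the column names Predicted and Gap are chosen here for illustration.

```r
library(e1071)

# Made-up training data; a stand-in for the real data set.
toy <- data.frame(Level = 1:10)
toy$Salary <- 40000 + 1000 * toy$Level^3
model <- svm(Salary ~ ., data = toy, type = "eps-regression", kernel = "radial")

# New data must use the same column name as the predictor used in training.
candidates <- data.frame(Level = c(6, 6.5, 7))
predicted <- predict(model, candidates)

claimed <- 160000  # the salary stated by the candidate in the post
candidates$Predicted <- predicted
candidates$Gap <- claimed - predicted  # positive: claim is above the prediction
candidates
```

A small gap relative to the salary itself supports the claim; how small is small enough is a judgment call for the hiring team rather than something the model decides.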
Visualize the Support Vector Regression results
ggplot() +
  # observed salaries (aes() arguments are named with =, not <-)
  geom_point(aes(x = dataset$Level, y = dataset$Salary), colour = 'red') +
  # salaries predicted by the SVR model
  geom_line(aes(x = dataset$Level, y = predict(regressor, dataset)), colour = 'blue') +
  ggtitle('Support Vector Regression') +
  xlab('Level') +
  ylab('Salary')
The blue curve shows the predictions of the Support Vector Regression model. It fits the data well: the predicted values are very close to the observed salaries (the red points), except for the outlier.
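Because the plot above predicts only at the observed levels, the blue line is drawn as straight segments between them. One optional refinement is to predict on a finer grid of levels so that the curve looks smooth. The sketch below assumes made-up data (toy) and a step of 0.1, both chosen for illustration.

```r
library(e1071)
library(ggplot2)

# Made-up data; a stand-in for the real data set.
toy <- data.frame(Level = 1:10)
toy$Salary <- 40000 + 1000 * toy$Level^3
model <- svm(Salary ~ ., data = toy, type = "eps-regression", kernel = "radial")

# Finer grid of levels between the smallest and largest observed level
grid <- data.frame(Level = seq(min(toy$Level), max(toy$Level), by = 0.1))
grid$Salary <- predict(model, grid)

p <- ggplot() +
  geom_point(data = toy, aes(x = Level, y = Salary), colour = "red") +
  geom_line(data = grid, aes(x = Level, y = Salary), colour = "blue") +
  ggtitle("Support Vector Regression (fine grid)") +
  xlab("Level") +
  ylab("Salary")
p
```

Passing a data frame through the data argument and referring to columns by name inside aes() is the more conventional ggplot2 style than dataset$Level, and it keeps the two layers clearly separated.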
The code is available on my GitHub account.
The previous parts of this series (part 1, part 2 and part 3) covered Linear Regression, Multiple Linear Regression and Polynomial Linear Regression.
If you like the blog or found it helpful please leave a clap!
Thank you.