R Markdown

This R Markdown document is part of SMU’s Master’s in Data Science program, DS 6306 “Doing Data Science.” Students are given a data set and asked to make predictions using data science methods and techniques learned in the course. For this case study, we assume that we have been hired by DDSAnalytics, a company that specializes in talent management. The company wants to gain a competitive edge by providing its customers with accurate predictions regarding attrition (employee turnover) and monthly salary.

We will start by importing the following data for analysis:

  1. CaseStudy2-Data.csv

Later we will import the data sets that will be used to make our predictions for the class contest.
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.0     v stringr 1.4.0
## v tidyr   1.1.3     v forcats 0.5.1
## v readr   1.4.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
## corrplot 0.84 loaded
## Loading required package: grid
## Loading required package: rpart
## 
## Attaching package: 'BBmisc'
## The following object is masked from 'package:grid':
## 
##     explode
## The following objects are masked from 'package:dplyr':
## 
##     coalesce, collapse
## The following object is masked from 'package:base':
## 
##     isFALSE

Load Theme for plots

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Data Preparation #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

##   ID Age Attrition    BusinessTravel DailyRate             Department
## 1  1  32        No     Travel_Rarely       117                  Sales
## 2  2  40        No     Travel_Rarely      1308 Research & Development
## 3  3  35        No Travel_Frequently       200 Research & Development
## 4  4  32        No     Travel_Rarely       801                  Sales
## 5  5  24        No Travel_Frequently       567 Research & Development
## 6  6  27        No Travel_Frequently       294 Research & Development
##   DistanceFromHome Education   EducationField EmployeeCount EmployeeNumber
## 1               13         4    Life Sciences             1            859
## 2               14         3          Medical             1           1128
## 3               18         2    Life Sciences             1           1412
## 4                1         4        Marketing             1           2016
## 5                2         1 Technical Degree             1           1646
## 6               10         2    Life Sciences             1            733
##   EnvironmentSatisfaction Gender HourlyRate JobInvolvement JobLevel
## 1                       2   Male         73              3        2
## 2                       3   Male         44              2        5
## 3                       3   Male         60              3        3
## 4                       3 Female         48              3        3
## 5                       1 Female         32              3        1
## 6                       4   Male         32              3        3
##                  JobRole JobSatisfaction MaritalStatus MonthlyIncome
## 1        Sales Executive               4      Divorced          4403
## 2      Research Director               3        Single         19626
## 3 Manufacturing Director               4        Single          9362
## 4        Sales Executive               4       Married         10422
## 5     Research Scientist               4        Single          3760
## 6 Manufacturing Director               1      Divorced          8793
##   MonthlyRate NumCompaniesWorked Over18 OverTime PercentSalaryHike
## 1        9250                  2      Y       No                11
## 2       17544                  1      Y       No                14
## 3       19944                  2      Y       No                11
## 4       24032                  1      Y       No                19
## 5       17218                  1      Y      Yes                13
## 6        4809                  1      Y       No                21
##   PerformanceRating RelationshipSatisfaction StandardHours StockOptionLevel
## 1                 3                        3            80                1
## 2                 3                        1            80                0
## 3                 3                        3            80                0
## 4                 3                        3            80                2
## 5                 3                        3            80                0
## 6                 4                        3            80                2
##   TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany
## 1                 8                     3               2              5
## 2                21                     2               4             20
## 3                10                     2               3              2
## 4                14                     3               3             14
## 5                 6                     2               3              6
## 6                 9                     4               2              9
##   YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
## 1                  2                       0                    3
## 2                  7                       4                    9
## 3                  2                       2                    2
## 4                 10                       5                    7
## 5                  3                       1                    3
## 6                  7                       1                    7
## The following object is masked from package:vcd:
## 
##     JobSatisfaction
## 'data.frame':    870 obs. of  36 variables:
##  $ ID                      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Age                     : int  32 40 35 32 24 27 41 37 34 34 ...
##  $ Attrition               : chr  "No" "No" "No" "No" ...
##  $ BusinessTravel          : chr  "Travel_Rarely" "Travel_Rarely" "Travel_Frequently" "Travel_Rarely" ...
##  $ DailyRate               : int  117 1308 200 801 567 294 1283 309 1333 653 ...
##  $ Department              : chr  "Sales" "Research & Development" "Research & Development" "Sales" ...
##  $ DistanceFromHome        : int  13 14 18 1 2 10 5 10 10 10 ...
##  $ Education               : int  4 3 2 4 1 2 5 4 4 4 ...
##  $ EducationField          : chr  "Life Sciences" "Medical" "Life Sciences" "Marketing" ...
##  $ EmployeeCount           : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ EmployeeNumber          : int  859 1128 1412 2016 1646 733 1448 1105 1055 1597 ...
##  $ EnvironmentSatisfaction : int  2 3 3 3 1 4 2 4 3 4 ...
##  $ Gender                  : chr  "Male" "Male" "Male" "Female" ...
##  $ HourlyRate              : int  73 44 60 48 32 32 90 88 87 92 ...
##  $ JobInvolvement          : int  3 2 3 3 3 3 4 2 3 2 ...
##  $ JobLevel                : int  2 5 3 3 1 3 1 2 1 2 ...
##  $ JobRole                 : chr  "Sales Executive" "Research Director" "Manufacturing Director" "Sales Executive" ...
##  $ JobSatisfaction         : int  4 3 4 4 4 1 3 4 3 3 ...
##  $ MaritalStatus           : chr  "Divorced" "Single" "Single" "Married" ...
##  $ MonthlyIncome           : int  4403 19626 9362 10422 3760 8793 2127 6694 2220 5063 ...
##  $ MonthlyRate             : int  9250 17544 19944 24032 17218 4809 5561 24223 18410 15332 ...
##  $ NumCompaniesWorked      : int  2 1 2 1 1 1 2 2 1 1 ...
##  $ Over18                  : chr  "Y" "Y" "Y" "Y" ...
##  $ OverTime                : chr  "No" "No" "No" "No" ...
##  $ PercentSalaryHike       : int  11 14 11 19 13 21 12 14 19 14 ...
##  $ PerformanceRating       : int  3 3 3 3 3 4 3 3 3 3 ...
##  $ RelationshipSatisfaction: int  3 1 3 3 3 3 1 3 4 2 ...
##  $ StandardHours           : int  80 80 80 80 80 80 80 80 80 80 ...
##  $ StockOptionLevel        : int  1 0 0 2 0 2 0 3 1 1 ...
##  $ TotalWorkingYears       : int  8 21 10 14 6 9 7 8 1 8 ...
##  $ TrainingTimesLastYear   : int  3 2 2 3 2 4 5 5 2 3 ...
##  $ WorkLifeBalance         : int  2 4 3 3 3 2 2 3 3 2 ...
##  $ YearsAtCompany          : int  5 20 2 14 6 9 4 1 1 8 ...
##  $ YearsInCurrentRole      : int  2 7 2 10 3 7 2 0 1 2 ...
##  $ YearsSinceLastPromotion : int  0 4 2 5 1 1 0 0 0 7 ...
##  $ YearsWithCurrManager    : int  3 9 2 7 3 7 3 0 0 7 ...
## [1] 870  36
## integer(0)
##                       ID                      Age                Attrition 
##                        0                        0                        0 
##           BusinessTravel                DailyRate               Department 
##                        0                        0                        0 
##         DistanceFromHome                Education           EducationField 
##                        0                        0                        0 
##            EmployeeCount           EmployeeNumber  EnvironmentSatisfaction 
##                        0                        0                        0 
##                   Gender               HourlyRate           JobInvolvement 
##                        0                        0                        0 
##                 JobLevel                  JobRole          JobSatisfaction 
##                        0                        0                        0 
##            MaritalStatus            MonthlyIncome              MonthlyRate 
##                        0                        0                        0 
##       NumCompaniesWorked                   Over18                 OverTime 
##                        0                        0                        0 
##        PercentSalaryHike        PerformanceRating RelationshipSatisfaction 
##                        0                        0                        0 
##            StandardHours         StockOptionLevel        TotalWorkingYears 
##                        0                        0                        0 
##    TrainingTimesLastYear          WorkLifeBalance           YearsAtCompany 
##                        0                        0                        0 
##       YearsInCurrentRole  YearsSinceLastPromotion     YearsWithCurrManager 
##                        0                        0                        0
##        ID             Age         Attrition         BusinessTravel    
##  Min.   :  1.0   Min.   :18.00   Length:870         Length:870        
##  1st Qu.:218.2   1st Qu.:30.00   Class :character   Class :character  
##  Median :435.5   Median :35.00   Mode  :character   Mode  :character  
##  Mean   :435.5   Mean   :36.83                                        
##  3rd Qu.:652.8   3rd Qu.:43.00                                        
##  Max.   :870.0   Max.   :60.00                                        
##    DailyRate       Department        DistanceFromHome   Education    
##  Min.   : 103.0   Length:870         Min.   : 1.000   Min.   :1.000  
##  1st Qu.: 472.5   Class :character   1st Qu.: 2.000   1st Qu.:2.000  
##  Median : 817.5   Mode  :character   Median : 7.000   Median :3.000  
##  Mean   : 815.2                      Mean   : 9.339   Mean   :2.901  
##  3rd Qu.:1165.8                      3rd Qu.:14.000   3rd Qu.:4.000  
##  Max.   :1499.0                      Max.   :29.000   Max.   :5.000  
##  EducationField     EmployeeCount EmployeeNumber   EnvironmentSatisfaction
##  Length:870         Min.   :1     Min.   :   1.0   Min.   :1.000          
##  Class :character   1st Qu.:1     1st Qu.: 477.2   1st Qu.:2.000          
##  Mode  :character   Median :1     Median :1039.0   Median :3.000          
##                     Mean   :1     Mean   :1029.8   Mean   :2.701          
##                     3rd Qu.:1     3rd Qu.:1561.5   3rd Qu.:4.000          
##                     Max.   :1     Max.   :2064.0   Max.   :4.000          
##     Gender            HourlyRate     JobInvolvement     JobLevel    
##  Length:870         Min.   : 30.00   Min.   :1.000   Min.   :1.000  
##  Class :character   1st Qu.: 48.00   1st Qu.:2.000   1st Qu.:1.000  
##  Mode  :character   Median : 66.00   Median :3.000   Median :2.000  
##                     Mean   : 65.61   Mean   :2.723   Mean   :2.039  
##                     3rd Qu.: 83.00   3rd Qu.:3.000   3rd Qu.:3.000  
##                     Max.   :100.00   Max.   :4.000   Max.   :5.000  
##    JobRole          JobSatisfaction MaritalStatus      MonthlyIncome  
##  Length:870         Min.   :1.000   Length:870         Min.   : 1081  
##  Class :character   1st Qu.:2.000   Class :character   1st Qu.: 2840  
##  Mode  :character   Median :3.000   Mode  :character   Median : 4946  
##                     Mean   :2.709                      Mean   : 6390  
##                     3rd Qu.:4.000                      3rd Qu.: 8182  
##                     Max.   :4.000                      Max.   :19999  
##   MonthlyRate    NumCompaniesWorked    Over18            OverTime        
##  Min.   : 2094   Min.   :0.000      Length:870         Length:870        
##  1st Qu.: 8092   1st Qu.:1.000      Class :character   Class :character  
##  Median :14074   Median :2.000      Mode  :character   Mode  :character  
##  Mean   :14326   Mean   :2.728                                           
##  3rd Qu.:20456   3rd Qu.:4.000                                           
##  Max.   :26997   Max.   :9.000                                           
##  PercentSalaryHike PerformanceRating RelationshipSatisfaction StandardHours
##  Min.   :11.0      Min.   :3.000     Min.   :1.000            Min.   :80   
##  1st Qu.:12.0      1st Qu.:3.000     1st Qu.:2.000            1st Qu.:80   
##  Median :14.0      Median :3.000     Median :3.000            Median :80   
##  Mean   :15.2      Mean   :3.152     Mean   :2.707            Mean   :80   
##  3rd Qu.:18.0      3rd Qu.:3.000     3rd Qu.:4.000            3rd Qu.:80   
##  Max.   :25.0      Max.   :4.000     Max.   :4.000            Max.   :80   
##  StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance
##  Min.   :0.0000   Min.   : 0.00     Min.   :0.000         Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.: 6.00     1st Qu.:2.000         1st Qu.:2.000  
##  Median :1.0000   Median :10.00     Median :3.000         Median :3.000  
##  Mean   :0.7839   Mean   :11.05     Mean   :2.832         Mean   :2.782  
##  3rd Qu.:1.0000   3rd Qu.:15.00     3rd Qu.:3.000         3rd Qu.:3.000  
##  Max.   :3.0000   Max.   :40.00     Max.   :6.000         Max.   :4.000  
##  YearsAtCompany   YearsInCurrentRole YearsSinceLastPromotion
##  Min.   : 0.000   Min.   : 0.000     Min.   : 0.000         
##  1st Qu.: 3.000   1st Qu.: 2.000     1st Qu.: 0.000         
##  Median : 5.000   Median : 3.000     Median : 1.000         
##  Mean   : 6.962   Mean   : 4.205     Mean   : 2.169         
##  3rd Qu.:10.000   3rd Qu.: 7.000     3rd Qu.: 3.000         
##  Max.   :40.000   Max.   :18.000     Max.   :15.000         
##  YearsWithCurrManager
##  Min.   : 0.00       
##  1st Qu.: 2.00       
##  Median : 3.00       
##  Mean   : 4.14       
##  3rd Qu.: 7.00       
##  Max.   :17.00
Data summary

| | |
|:---|:---|
| Name | df |
| Number of rows | 870 |
| Number of columns | 36 |
| Column type frequency: | |
| character | 9 |
| numeric | 27 |
| Group variables | None |

Variable type: character

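| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|:---|---:|---:|---:|---:|---:|---:|---:|
| Attrition | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
| BusinessTravel | 0 | 1 | 10 | 17 | 0 | 3 | 0 |
| Department | 0 | 1 | 5 | 22 | 0 | 3 | 0 |
| EducationField | 0 | 1 | 5 | 16 | 0 | 6 | 0 |
| Gender | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
| JobRole | 0 | 1 | 7 | 25 | 0 | 9 | 0 |
| MaritalStatus | 0 | 1 | 6 | 8 | 0 | 3 | 0 |
| Over18 | 0 | 1 | 1 | 1 | 0 | 1 | 0 |
| OverTime | 0 | 1 | 2 | 3 | 0 | 2 | 0 |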

Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| ID | 0 | 1 | 435.50 | 251.29 | 1 | 218.25 | 435.5 | 652.75 | 870 | ▇▇▇▇▇ |
| Age | 0 | 1 | 36.83 | 8.93 | 18 | 30.00 | 35.0 | 43.00 | 60 | ▂▇▇▃▂ |
| DailyRate | 0 | 1 | 815.23 | 401.12 | 103 | 472.50 | 817.5 | 1165.75 | 1499 | ▇▇▇▇▇ |
| DistanceFromHome | 0 | 1 | 9.34 | 8.14 | 1 | 2.00 | 7.0 | 14.00 | 29 | ▇▅▂▂▂ |
| Education | 0 | 1 | 2.90 | 1.02 | 1 | 2.00 | 3.0 | 4.00 | 5 | ▂▅▇▆▁ |
| EmployeeCount | 0 | 1 | 1.00 | 0.00 | 1 | 1.00 | 1.0 | 1.00 | 1 | ▁▁▇▁▁ |
| EmployeeNumber | 0 | 1 | 1029.83 | 604.79 | 1 | 477.25 | 1039.0 | 1561.50 | 2064 | ▇▇▇▇▇ |
| EnvironmentSatisfaction | 0 | 1 | 2.70 | 1.10 | 1 | 2.00 | 3.0 | 4.00 | 4 | ▅▆▁▇▇ |
| HourlyRate | 0 | 1 | 65.61 | 20.13 | 30 | 48.00 | 66.0 | 83.00 | 100 | ▇▇▆▇▇ |
| JobInvolvement | 0 | 1 | 2.72 | 0.70 | 1 | 2.00 | 3.0 | 3.00 | 4 | ▁▃▁▇▁ |
| JobLevel | 0 | 1 | 2.04 | 1.09 | 1 | 1.00 | 2.0 | 3.00 | 5 | ▇▇▃▂▁ |
| JobSatisfaction | 0 | 1 | 2.71 | 1.11 | 1 | 2.00 | 3.0 | 4.00 | 4 | ▅▅▁▇▇ |
| MonthlyIncome | 0 | 1 | 6390.26 | 4597.70 | 1081 | 2839.50 | 4945.5 | 8182.00 | 19999 | ▇▅▂▁▁ |
| MonthlyRate | 0 | 1 | 14325.62 | 7108.38 | 2094 | 8092.00 | 14074.5 | 20456.25 | 26997 | ▇▇▇▇▇ |
| NumCompaniesWorked | 0 | 1 | 2.73 | 2.52 | 0 | 1.00 | 2.0 | 4.00 | 9 | ▇▃▂▂▁ |
| PercentSalaryHike | 0 | 1 | 15.20 | 3.68 | 11 | 12.00 | 14.0 | 18.00 | 25 | ▇▅▃▂▁ |
| PerformanceRating | 0 | 1 | 3.15 | 0.36 | 3 | 3.00 | 3.0 | 3.00 | 4 | ▇▁▁▁▂ |
| RelationshipSatisfaction | 0 | 1 | 2.71 | 1.10 | 1 | 2.00 | 3.0 | 4.00 | 4 | ▅▅▁▇▇ |
| StandardHours | 0 | 1 | 80.00 | 0.00 | 80 | 80.00 | 80.0 | 80.00 | 80 | ▁▁▇▁▁ |
| StockOptionLevel | 0 | 1 | 0.78 | 0.86 | 0 | 0.00 | 1.0 | 1.00 | 3 | ▇▇▁▂▁ |
| TotalWorkingYears | 0 | 1 | 11.05 | 7.51 | 0 | 6.00 | 10.0 | 15.00 | 40 | ▇▇▂▁▁ |
| TrainingTimesLastYear | 0 | 1 | 2.83 | 1.27 | 0 | 2.00 | 3.0 | 3.00 | 6 | ▂▇▇▂▃ |
| WorkLifeBalance | 0 | 1 | 2.78 | 0.71 | 1 | 2.00 | 3.0 | 3.00 | 4 | ▁▃▁▇▂ |
| YearsAtCompany | 0 | 1 | 6.96 | 6.02 | 0 | 3.00 | 5.0 | 10.00 | 40 | ▇▃▁▁▁ |
| YearsInCurrentRole | 0 | 1 | 4.20 | 3.64 | 0 | 2.00 | 3.0 | 7.00 | 18 | ▇▃▂▁▁ |
| YearsSinceLastPromotion | 0 | 1 | 2.17 | 3.19 | 0 | 0.00 | 1.0 | 3.00 | 15 | ▇▁▁▁▁ |
| YearsWithCurrManager | 0 | 1 | 4.14 | 3.57 | 0 | 2.00 | 3.0 | 7.00 | 17 | ▇▂▅▁▁ |
## 'data.frame':    870 obs. of  38 variables:
##  $ ID                      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Age                     : int  32 40 35 32 24 27 41 37 34 34 ...
##  $ Attrition               : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ BusinessTravel          : Factor w/ 3 levels "Non-Travel","Travel_Frequently",..: 3 3 2 3 2 2 3 3 3 2 ...
##  $ DailyRate               : int  117 1308 200 801 567 294 1283 309 1333 653 ...
##  $ Department              : Factor w/ 3 levels "Human Resources",..: 3 2 2 3 2 2 2 3 3 2 ...
##  $ DistanceFromHome        : int  13 14 18 1 2 10 5 10 10 10 ...
##  $ Education               : int  4 3 2 4 1 2 5 4 4 4 ...
##  $ EducationField          : Factor w/ 6 levels "Human Resources",..: 2 4 2 3 6 2 4 2 2 6 ...
##  $ EnvironmentSatisfaction : int  2 3 3 3 1 4 2 4 3 4 ...
##  $ Gender                  : Factor w/ 2 levels "Female","Male": 2 2 2 1 1 2 2 1 1 2 ...
##  $ HourlyRate              : int  73 44 60 48 32 32 90 88 87 92 ...
##  $ JobInvolvement          : int  3 2 3 3 3 3 4 2 3 2 ...
##  $ JobLevel                : int  2 5 3 3 1 3 1 2 1 2 ...
##  $ JobRole                 : Factor w/ 9 levels "Healthcare Representative",..: 8 6 5 8 7 5 7 8 9 1 ...
##  $ JobSatisfaction         : int  4 3 4 4 4 1 3 4 3 3 ...
##  $ MaritalStatus           : Factor w/ 3 levels "Divorced","Married",..: 1 3 3 2 3 1 2 1 2 2 ...
##  $ MonthlyIncome           : int  4403 19626 9362 10422 3760 8793 2127 6694 2220 5063 ...
##  $ MonthlyRate             : int  9250 17544 19944 24032 17218 4809 5561 24223 18410 15332 ...
##  $ NumCompaniesWorked      : int  2 1 2 1 1 1 2 2 1 1 ...
##  $ OverTime                : Factor w/ 2 levels "No","Yes": 1 1 1 1 2 1 2 2 2 1 ...
##  $ PercentSalaryHike       : int  11 14 11 19 13 21 12 14 19 14 ...
##  $ PerformanceRating       : int  3 3 3 3 3 4 3 3 3 3 ...
##  $ RelationshipSatisfaction: int  3 1 3 3 3 3 1 3 4 2 ...
##  $ StockOptionLevel        : int  1 0 0 2 0 2 0 3 1 1 ...
##  $ TotalWorkingYears       : int  8 21 10 14 6 9 7 8 1 8 ...
##  $ TrainingTimesLastYear   : int  3 2 2 3 2 4 5 5 2 3 ...
##  $ WorkLifeBalance         : int  2 4 3 3 3 2 2 3 3 2 ...
##  $ YearsAtCompany          : int  5 20 2 14 6 9 4 1 1 8 ...
##  $ YearsInCurrentRole      : int  2 7 2 10 3 7 2 0 1 2 ...
##  $ YearsSinceLastPromotion : int  0 4 2 5 1 1 0 0 0 7 ...
##  $ YearsWithCurrManager    : int  3 9 2 7 3 7 3 0 0 7 ...
##  $ iJobRole                : int  8 6 5 8 7 5 7 8 9 1 ...
##  $ iDepartment             : int  3 2 2 3 2 2 2 3 3 2 ...
##  $ iMaritalStatus          : int  1 3 3 2 3 1 2 1 2 2 ...
##  $ iBusinessTravel         : int  3 3 2 3 2 2 3 3 3 2 ...
##  $ iEducation              : int  4 3 2 4 1 2 5 4 4 4 ...
##  $ iAttrition              : int  1 1 1 1 1 1 1 1 1 1 ...
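
The second `str()` listing no longer contains `EmployeeCount`, `EmployeeNumber`, `Over18`, or `StandardHours`, stores the text columns as factors, and gains integer-coded copies such as `iJobRole`. A minimal sketch of how such a preparation step could look in base R follows. It uses the first six rows of a few columns from the `head()` output above, and it assumes the columns were dropped because they hold a single value; the code that actually produced the listing is not shown in this document.

```r
# First six rows of two text columns, copied from the head() output above,
# plus two of the constant columns.
df <- data.frame(
  JobRole = c("Sales Executive", "Research Director", "Manufacturing Director",
              "Sales Executive", "Research Scientist", "Manufacturing Director"),
  MaritalStatus = c("Divorced", "Single", "Single", "Married", "Single", "Divorced"),
  Over18 = rep("Y", 6),
  StandardHours = rep(80L, 6),
  stringsAsFactors = FALSE
)

# Drop columns that hold a single value; they carry no information for a model.
is_constant <- vapply(df, function(col) length(unique(col)) == 1, logical(1))
df <- df[, !is_constant, drop = FALSE]

# Convert the remaining character columns to factors.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], factor)

# Integer codes follow the alphabetical order of the factor levels.
df$iJobRole <- as.integer(df$JobRole)
df$iMaritalStatus <- as.integer(df$MaritalStatus)

str(df)
```

With only six rows, `JobRole` has four levels instead of nine, so its codes differ from the listing; the `iMaritalStatus` codes agree with it because all three levels are present.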

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Exploratoration into Data #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

##   0%  25%  50%  75% 100% 
##   18   30   35   43   60
##       No      Yes 
## 37.41233 33.78571
##   0%  25%  50%  75% 100% 
##    0    1    2    4    9
##       No      Yes 
## 2.660274 3.078571
##   0%  25%  50%  75% 100% 
##   11   12   14   18   25
##       No      Yes 
## 15.17534 15.32857
##   0%  25%  50%  75% 100% 
##    0    6   10   15   40
##        No       Yes 
## 11.602740  8.185714
##      0%     25%     50%     75%    100% 
##  1081.0  2839.5  4945.5  8182.0 19999.0
##       No      Yes 
## 6702.000 4764.786
##   0%  25%  50%  75% 100% 
##    0    2    3    7   18
##       No      Yes 
## 4.453425 2.907143
##   0%  25%  50%  75% 100% 
##    0    2    3    7   17
##       No      Yes 
## 4.369863 2.942857
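
The pairs of outputs above look like a five-number summary of a numeric variable followed by its mean within each `Attrition` level. A minimal sketch of that pattern in base R follows, assuming `quantile()` and `tapply()` (the original calls are not shown). Because the first ten `Attrition` values in the `str()` output are all "No", the sketch groups by `OverTime` instead, purely as a stand-in.

```r
# First ten values of Age and OverTime, copied from the str() output above.
age <- c(32, 40, 35, 32, 24, 27, 41, 37, 34, 34)
overtime <- factor(c("No", "No", "No", "No", "Yes", "No", "Yes", "Yes", "Yes", "No"))

# Five-number summary: minimum, quartiles, and maximum.
quantile(age)

# Mean of the numeric variable within each level of the grouping factor.
group_means <- tapply(age, overtime, mean)
group_means
```

Replacing `overtime` with the full `Attrition` column and `age` with any numeric column would give output in the same shape as the blocks above.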

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

##      0%     25%     50%     75%    100% 
##  1081.0  2839.5  4945.5  8182.0 19999.0
##   0%  25%  50%  75% 100% 
##    0    2    3    7   18
##       No      Yes 
## 4.453425 2.907143
##   0%  25%  50%  75% 100% 
##    0    2    3    7   18
##       No      Yes 
## 4.453425 2.907143
##   0%  25%  50%  75% 100% 
##    0    3    5   10   40
##       No      Yes 
## 7.301370 5.192857
## integer(0)
## 'data.frame':    870 obs. of  43 variables:
##  $ ID                        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Age                       : int  32 40 35 32 24 27 41 37 34 34 ...
##  $ Attrition                 : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ BusinessTravel            : Factor w/ 3 levels "Non-Travel","Travel_Frequently",..: 3 3 2 3 2 2 3 3 3 2 ...
##  $ DailyRate                 : int  117 1308 200 801 567 294 1283 309 1333 653 ...
##  $ Department                : Factor w/ 3 levels "Human Resources",..: 3 2 2 3 2 2 2 3 3 2 ...
##  $ DistanceFromHome          : int  13 14 18 1 2 10 5 10 10 10 ...
##  $ Education                 : int  4 3 2 4 1 2 5 4 4 4 ...
##  $ EducationField            : Factor w/ 6 levels "Human Resources",..: 2 4 2 3 6 2 4 2 2 6 ...
##  $ EnvironmentSatisfaction   : int  2 3 3 3 1 4 2 4 3 4 ...
##  $ Gender                    : Factor w/ 2 levels "Female","Male": 2 2 2 1 1 2 2 1 1 2 ...
##  $ HourlyRate                : int  73 44 60 48 32 32 90 88 87 92 ...
##  $ JobInvolvement            : int  3 2 3 3 3 3 4 2 3 2 ...
##  $ JobLevel                  : int  2 5 3 3 1 3 1 2 1 2 ...
##  $ JobRole                   : Factor w/ 9 levels "Healthcare Representative",..: 8 6 5 8 7 5 7 8 9 1 ...
##  $ JobSatisfaction           : int  4 3 4 4 4 1 3 4 3 3 ...
##  $ MaritalStatus             : Factor w/ 3 levels "Divorced","Married",..: 1 3 3 2 3 1 2 1 2 2 ...
##  $ MonthlyIncome             : int  4403 19626 9362 10422 3760 8793 2127 6694 2220 5063 ...
##  $ MonthlyRate               : int  9250 17544 19944 24032 17218 4809 5561 24223 18410 15332 ...
##  $ NumCompaniesWorked        : int  2 1 2 1 1 1 2 2 1 1 ...
##  $ OverTime                  : Factor w/ 2 levels "No","Yes": 1 1 1 1 2 1 2 2 2 1 ...
##  $ PercentSalaryHike         : int  11 14 11 19 13 21 12 14 19 14 ...
##  $ PerformanceRating         : int  3 3 3 3 3 4 3 3 3 3 ...
##  $ RelationshipSatisfaction  : int  3 1 3 3 3 3 1 3 4 2 ...
##  $ StockOptionLevel          : int  1 0 0 2 0 2 0 3 1 1 ...
##  $ TotalWorkingYears         : int  8 21 10 14 6 9 7 8 1 8 ...
##  $ TrainingTimesLastYear     : int  3 2 2 3 2 4 5 5 2 3 ...
##  $ WorkLifeBalance           : int  2 4 3 3 3 2 2 3 3 2 ...
##  $ YearsAtCompany            : int  5 20 2 14 6 9 4 1 1 8 ...
##  $ YearsInCurrentRole        : int  2 7 2 10 3 7 2 0 1 2 ...
##  $ YearsSinceLastPromotion   : int  0 4 2 5 1 1 0 0 0 7 ...
##  $ YearsWithCurrManager      : int  3 9 2 7 3 7 3 0 0 7 ...
##  $ iJobRole                  : int  8 6 5 8 7 5 7 8 9 1 ...
##  $ iDepartment               : int  3 2 2 3 2 2 2 3 3 2 ...
##  $ iMaritalStatus            : int  1 3 3 2 3 1 2 1 2 2 ...
##  $ iBusinessTravel           : int  3 3 2 3 2 2 3 3 3 2 ...
##  $ iEducation                : int  4 3 2 4 1 2 5 4 4 4 ...
##  $ iAttrition                : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Age.Group                 : Factor w/ 4 levels "Senior","Undergrad",..: 4 3 3 4 4 4 3 3 4 4 ...
##  $ MonthlyIncome.Group       : Factor w/ 4 levels "Above.Avg","Avg",..: 2 3 3 3 2 3 4 1 4 1 ...
##  $ YearsWithCurrManager.Group: Factor w/ 4 levels "2thru4","4thru6",..: 1 3 4 3 1 3 1 4 4 3 ...
##  $ YearsInCurrentRole.Group  : Factor w/ 4 levels "5&above","Lessthan2",..: 2 1 2 1 3 1 2 2 2 2 ...
##  $ YearsAtCompany.Group      : Factor w/ 4 levels "10&above","3thru5",..: 2 1 4 1 3 3 2 4 4 3 ...
## [1] 43
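
The five new `.Group` columns bin a numeric variable into four labeled ranges. A minimal sketch of such binning with `cut()` follows. The level names come from the `str()` output and the decision tree rules, but the cut points are assumptions chosen only so that the first ten ages map to the codes shown above; the thresholds actually used are not stated in this document.

```r
# First ten values of Age, copied from the str() output above.
age <- c(32, 40, 35, 32, 24, 27, 41, 37, 34, 34)

# Hypothetical cut points, for illustration only.
breaks <- c(-Inf, 22, 34, 50, Inf)
labels <- c("Undergrad", "Young-Professional", "Veteran", "Senior")

# cut() assigns each age to a right-closed interval, e.g. (22, 34].
age_group <- cut(age, breaks = breaks, labels = labels)

# Rebuilding the factor with sorted labels puts the levels in alphabetical
# order, which is the order the str() output shows.
age_group <- factor(as.character(age_group), levels = sort(labels))

table(age_group)
as.integer(age_group)
```

Binning trades detail for interpretability: tree rules such as "Age.Group is Veteran" are easier to read, but any threshold choice discards the variation inside each bin.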

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Prepare Data for Modeling: Train/Test Split #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## [1] 870  17
## [1] 609  17
## [1] 261  17
## 'data.frame':    609 obs. of  17 variables:
##  $ Attrition              : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Age.Group              : Factor w/ 4 levels "Senior","Undergrad",..: 3 3 4 4 4 3 1 3 4 4 ...
##  $ DistanceFromHome       : int  9 11 8 24 5 2 29 1 10 3 ...
##  $ MonthlyIncome.Group    : Factor w/ 4 levels "Above.Avg","Avg",..: 3 4 4 2 2 3 2 1 2 1 ...
##  $ TotalWorkingYears      : int  24 5 8 6 7 20 9 10 5 6 ...
##  $ OverTime               : Factor w/ 2 levels "No","Yes": 1 2 1 1 2 1 1 1 2 1 ...
##  $ YearsAtCompany         : int  1 2 3 4 6 19 6 9 5 2 ...
##  $ StockOptionLevel       : int  0 1 0 0 2 0 0 0 1 0 ...
##  $ JobRole                : Factor w/ 9 levels "Healthcare Representative",..: 4 3 2 3 7 1 7 1 7 8 ...
##  $ JobLevel               : int  5 1 1 1 1 3 1 2 1 2 ...
##  $ JobInvolvement         : int  2 3 4 3 4 3 3 4 3 1 ...
##  $ Education              : int  2 4 2 3 2 4 3 4 4 2 ...
##  $ EnvironmentSatisfaction: int  4 4 4 4 1 3 3 2 4 4 ...
##  $ WorkLifeBalance        : int  3 3 3 3 2 3 2 2 4 3 ...
##  $ YearsInCurrentRole     : int  0 2 2 3 2 6 5 7 3 2 ...
##  $ YearsAtCompany.Group   : Factor w/ 4 levels "10&above","3thru5",..: 4 4 4 2 3 1 3 3 2 4 ...
##  $ YearsWithCurrManager   : int  1 2 2 2 5 8 3 8 0 2 ...
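
The three `dim()` results show 870 rows divided into 609 training rows and 261 test rows, which is a 70/30 split. A minimal sketch of a simple random split in base R follows. The seed value and the one-column stand-in data frame are assumptions, and the original may have used a different sampling function.

```r
set.seed(42)  # arbitrary seed (an assumption) so the split is repeatable

# Stand-in data frame with one row per employee ID, 1 to 870.
df <- data.frame(ID = 1:870)

# Sample 70% of the row indices for training; the remaining rows form the test set.
train_idx <- sample(seq_len(nrow(df)), size = round(0.7 * nrow(df)))
train <- df[train_idx, , drop = FALSE]
test  <- df[-train_idx, , drop = FALSE]

dim(train)
dim(test)
```

A purely random split does not guarantee the same share of "Yes" cases in both parts, which matters here because attrition is the minority class.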

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # find important Variables #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Type 'citation("pROC")' for a citation.
## 
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
##                            No       Yes
## Age.Group           0.5103284 0.5103284
## DistanceFromHome    0.5786201 0.5786201
## MonthlyIncome.Group 0.6164635 0.6164635
## TotalWorkingYears   0.6683950 0.6683950
## OverTime            0.6629622 0.6629622
## YearsAtCompany      0.6513220 0.6513220
##                                No       Yes
## Age.Group               0.5103284 0.5103284
## DistanceFromHome        0.5786201 0.5786201
## MonthlyIncome.Group     0.6164635 0.6164635
## TotalWorkingYears       0.6683950 0.6683950
## OverTime                0.6629622 0.6629622
## YearsAtCompany          0.6513220 0.6513220
## StockOptionLevel        0.6624561 0.6624561
## JobRole                 0.5880397 0.5880397
## JobLevel                0.6551126 0.6551126
## JobInvolvement          0.6287647 0.6287647
## Education               0.5766267 0.5766267
## EnvironmentSatisfaction 0.5530366 0.5530366
## WorkLifeBalance         0.5292605 0.5292605
## YearsInCurrentRole      0.6472423 0.6472423
## YearsAtCompany.Group    0.6029952 0.6029952
## YearsWithCurrManager    0.6325759 0.6325759
##  [1] 0.6683950 0.6629622 0.6624561 0.6551126 0.6513220 0.6472423 0.6325759
##  [8] 0.6287647 0.6164635 0.6029952 0.5880397 0.5786201 0.5766267 0.5530366
## [15] 0.5292605 0.5103284
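
Each score above lies between 0.5 and 1 and is identical for both classes, which is consistent with a per-predictor area under the ROC curve (AUC). A minimal base R sketch of that idea follows, using the rank-based (Mann-Whitney) form of the AUC. The helper name `auc_importance` is hypothetical, and this is not necessarily the function that produced the table. Because the first ten `Attrition` values are all "No", `OverTime` stands in as the outcome.

```r
# Hypothetical helper: AUC of a single predictor against a two-level factor.
auc_importance <- function(x, y, positive = levels(y)[2]) {
  x <- as.numeric(x)            # factors are replaced by their level codes
  r <- rank(x)                  # tied values receive average ranks
  n_pos <- sum(y == positive)
  n_neg <- sum(y != positive)
  # Mann-Whitney U statistic for the positive class.
  u <- sum(r[y == positive]) - n_pos * (n_pos + 1) / 2
  auc <- u / (n_pos * n_neg)
  max(auc, 1 - auc)             # the direction of the effect is ignored
}

# First ten values of TotalWorkingYears and OverTime from the str() output above.
total_working_years <- c(8, 21, 10, 14, 6, 9, 7, 8, 1, 8)
overtime <- factor(c("No", "No", "No", "No", "Yes", "No", "Yes", "Yes", "Yes", "No"))

score <- auc_importance(total_working_years, overtime)
score
```

A score of 0.5 means the predictor separates the two classes no better than chance, so the values near 0.51 to 0.55 at the bottom of the sorted list indicate weak predictors when taken one at a time.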

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Begin Modeling #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # 1. Support Vector Model #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  No Yes
##        No  215  45
##        Yes   0   1
##                                           
##                Accuracy : 0.8276          
##                  95% CI : (0.7762, 0.8714)
##     No Information Rate : 0.8238          
##     P-Value [Acc > NIR] : 0.4746          
##                                           
##                   Kappa : 0.0353          
##                                           
##  Mcnemar's Test P-Value : 5.412e-11       
##                                           
##             Sensitivity : 1.00000         
##             Specificity : 0.02174         
##          Pos Pred Value : 0.82692         
##          Neg Pred Value : 1.00000         
##              Prevalence : 0.82375         
##          Detection Rate : 0.82375         
##    Detection Prevalence : 0.99617         
##       Balanced Accuracy : 0.51087         
##                                           
##        'Positive' Class : No              
## 
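
The statistics in this block can be reproduced by hand from the four counts, which also makes the class imbalance easier to see. A minimal sketch in base R follows, with the counts copied from the confusion matrix above. Because the positive class is "No", sensitivity describes employees who stayed and specificity describes employees who left.

```r
# Counts from the confusion matrix above (rows = prediction, columns = reference).
cm <- matrix(c(215, 0, 45, 1), nrow = 2,
             dimnames = list(Prediction = c("No", "Yes"),
                             Reference  = c("No", "Yes")))

n  <- sum(cm)
tp <- cm["No", "No"];  fn <- cm["Yes", "No"]    # "No" is the positive class
fp <- cm["No", "Yes"]; tn <- cm["Yes", "Yes"]

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)      # share of stayers predicted to stay
specificity <- tn / (tn + fp)      # share of leavers predicted to leave
balanced    <- (sensitivity + specificity) / 2
nir         <- max(colSums(cm)) / n  # accuracy of always predicting the majority class

# Cohen's kappa: observed agreement corrected for chance agreement.
p_chance    <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - p_chance) / (1 - p_chance)

round(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity,
        balanced = balanced, nir = nir, kappa = cohen_kappa), 4)
```

The accuracy of 0.8276 is barely above the no-information rate of 0.8238, and the specificity shows that only 1 of the 46 employees who left was identified.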

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # 2. Model Decesion Tree #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Attrition is 0.00 when
##     OverTime is Yes
##     StockOptionLevel < 1
##     MonthlyIncome.Group is Above.Avg or Avg or High
##     JobRole is Healthcare Representative or Human Resources or Laboratory Technician or Manufacturing Director or Research Director or Research Scientist
##     YearsAtCompany.Group is 10&above or 3thru5
## 
## Attrition is 0.06 when
##     OverTime is Yes
##     StockOptionLevel >= 1
##     JobLevel >= 2
## 
## Attrition is 0.07 when
##     OverTime is No
##     TotalWorkingYears >= 3
## 
## Attrition is 0.08 when
##     OverTime is No
##     StockOptionLevel >= 1
##     TotalWorkingYears < 3
## 
## Attrition is 0.15 when
##     OverTime is Yes
##     StockOptionLevel >= 1
##     DistanceFromHome < 13
##     JobLevel < 2
## 
## Attrition is 0.19 when
##     OverTime is Yes
##     StockOptionLevel < 1
##     MonthlyIncome.Group is Above.Avg or Avg or High
##     JobRole is Healthcare Representative or Human Resources or Laboratory Technician or Manufacturing Director or Research Director or Research Scientist
##     YearsAtCompany.Group is 5thru10 or LessThan3
##     JobInvolvement >= 3
## 
## Attrition is 0.38 when
##     OverTime is Yes
##     StockOptionLevel < 1
##     MonthlyIncome.Group is Above.Avg or Avg or High
##     JobRole is Manager or Sales Executive or Sales Representative
##     DistanceFromHome < 8
## 
## Attrition is 0.38 when
##     OverTime is Yes
##     StockOptionLevel < 1
##     MonthlyIncome.Group is Low
##     Age.Group is Veteran
## 
## Attrition is 0.62 when
##     OverTime is No
##     StockOptionLevel < 1
##     TotalWorkingYears < 3
## 
## Attrition is 0.67 when
##     OverTime is Yes
##     StockOptionLevel < 1
##     MonthlyIncome.Group is Above.Avg or Avg or High
##     JobRole is Healthcare Representative or Human Resources or Laboratory Technician or Manufacturing Director or Research Director or Research Scientist
##     YearsAtCompany.Group is 5thru10 or LessThan3
##     JobInvolvement < 3
## 
## Attrition is 0.75 when
##     OverTime is Yes
##     StockOptionLevel >= 1
##     DistanceFromHome >= 13
##     JobLevel < 2
## 
## Attrition is 0.88 when
##     OverTime is Yes
##     StockOptionLevel < 1
##     MonthlyIncome.Group is Above.Avg or Avg or High
##     JobRole is Manager or Sales Executive or Sales Representative
##     DistanceFromHome >= 8
## 
## Attrition is 0.89 when
##     OverTime is Yes
##     StockOptionLevel < 1
##     MonthlyIncome.Group is Low
##     Age.Group is Undergrad or Young-Professional
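
Rules in this layout can be printed from a fitted classification tree. A minimal sketch follows, assuming the `rpart` and `rpart.plot` packages are installed. It uses the `kyphosis` data set that ships with `rpart` as a stand-in for the attrition training data, so the variables and rules differ from those above.

```r
library(rpart)       # recursive partitioning trees
library(rpart.plot)  # provides rpart.rules()

# Fit a classification tree on the stand-in data set.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# style = "tall" prints one condition per line, as in the rules above.
rpart.rules(fit, style = "tall")
```

For a two-level response, the number after "is" appears to be the fitted probability of the second factor level, so in the rules above we read "Attrition is 0.89" as an estimated 89% chance of "Yes" for employees matching those conditions.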

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  No Yes
##        No  201  30
##        Yes  14  16
##                                           
##                Accuracy : 0.8314          
##                  95% CI : (0.7804, 0.8748)
##     No Information Rate : 0.8238          
##     P-Value [Acc > NIR] : 0.41022         
##                                           
##                   Kappa : 0.3275          
##                                           
##  Mcnemar's Test P-Value : 0.02374         
##                                           
##             Sensitivity : 0.9349          
##             Specificity : 0.3478          
##          Pos Pred Value : 0.8701          
##          Neg Pred Value : 0.5333          
##              Prevalence : 0.8238          
##          Detection Rate : 0.7701          
##    Detection Prevalence : 0.8851          
##       Balanced Accuracy : 0.6414          
##                                           
##        'Positive' Class : No              
## 

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # 3. KNN Model #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  No Yes
##        No  209  43
##        Yes   6   3
##                                           
##                Accuracy : 0.8123          
##                  95% CI : (0.7595, 0.8578)
##     No Information Rate : 0.8238          
##     P-Value [Acc > NIR] : 0.7192          
##                                           
##                   Kappa : 0.0546          
##                                           
##  Mcnemar's Test P-Value : 2.706e-07       
##                                           
##             Sensitivity : 0.97209         
##             Specificity : 0.06522         
##          Pos Pred Value : 0.82937         
##          Neg Pred Value : 0.33333         
##              Prevalence : 0.82375         
##          Detection Rate : 0.80077         
##    Detection Prevalence : 0.96552         
##       Balanced Accuracy : 0.51866         
##                                           
##        'Positive' Class : No              
## 
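
K-nearest neighbors classifies each test row by a majority vote among the closest training rows, so features measured on larger scales can dominate the distance. A minimal sketch follows, assuming the `class` package and using made-up two-feature points for illustration only; they are not taken from the case study data, and the value of `k` is arbitrary.

```r
library(class)  # provides knn()

# Made-up points for illustration: three "No" rows and three "Yes" rows.
train_x <- matrix(c(0, 0,  0, 1,  1, 0,
                    5, 5,  5, 6,  6, 5), ncol = 2, byrow = TRUE)
train_y <- factor(c("No", "No", "No", "Yes", "Yes", "Yes"))
test_x  <- matrix(c(0.5, 0.5,  5.5, 5.5), ncol = 2, byrow = TRUE)

# Standardize with the training means and standard deviations, then apply
# the same transformation to the test rows.
train_s <- scale(train_x)
test_s  <- scale(test_x,
                 center = attr(train_s, "scaled:center"),
                 scale  = attr(train_s, "scaled:scale"))

# Each test row takes the majority class of its k nearest training rows.
pred <- knn(train = train_s, test = test_s, cl = train_y, k = 3)
pred
```

With roughly 82% of the reference cases being "No", a majority vote among neighbors tends to favor "No", which may partly explain the low specificity in the confusion matrix above.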

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Hyper Parameter tunning #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
