Miron B. Kursa, Witold R. Rudnicki, Feature Selection with the Boruta Package, Journal of Statistical Software, Vol. 36, Issue 11, Sept 2010.
2015-03-08 2356
library(Boruta)
Loading required package: randomForest
randomForest 4.6-10
Type rfNews() to see new features/changes/bug fixes.
Loading required package: rFerns
dim(iris)
[1] 150 5
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
Bor.iris <- Boruta(Species~.,data=iris,doTrace=2)
1. run of importance source...
2. run of importance source...
3. run of importance source...
4. run of importance source...
5. run of importance source...
6. run of importance source...
7. run of importance source...
8. run of importance source...
9. run of importance source...
Confirmed 4 attributes: Petal.Length, Petal.Width, Sepal.Length, Sepal.Width.
print(Bor.iris)
Boruta performed 9 iterations in 0.5438509 secs.
4 attributes confirmed important: Petal.Length, Petal.Width,
Sepal.Length, Sepal.Width.
No attributes deemed unimportant.
print(getSelectedAttributes(Bor.iris))
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
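Why did the iris run stop after exactly 9 iterations? Assume that Boruta 4.0.0 decides with one-sided binomial tests on each attribute's hit count. Under this assumption, the hit probability under the null is 0.5, the default is pValue = 0.01, and a Bonferroni correction over the number of attributes applies because mcAdj = TRUE. An attribute that beats the best shadow in every run can then be confirmed only once n_attributes * 0.5^runs < 0.01. For the 4 iris predictors, that first happens at run 9. The helper first_decisive_run() is hypothetical and not part of Boruta. It also matches the first decisions reported later in this transcript, for iris.extended, Ozone, HouseVotes84 and Sonar, with 8, 12, 16 and 60 predictors.

```r
# Hypothetical helper (not part of Boruta): smallest number of runs after
# which an attribute that beat the best shadow in EVERY run passes the
# binomial test Boruta is assumed to use (Bonferroni over n_attributes).
first_decisive_run <- function(n_attributes, p_value = 0.01, max_runs = 99) {
  for (runs in seq_len(max_runs)) {
    # P(X >= runs) for X ~ Binomial(runs, 0.5), which equals 0.5^runs
    p <- pbinom(runs - 1, runs, 0.5, lower.tail = FALSE)
    # Bonferroni-adjusted p-value, capped at 1
    if (min(1, n_attributes * p) < p_value) return(runs)
  }
  NA_integer_
}

# iris, iris.extended, Ozone, HouseVotes84, Sonar
sapply(c(4, 8, 12, 16, 60), first_decisive_run)
# expected: 9 10 11 11 13
```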
set.seed(777)
iris.extended <- data.frame(iris,apply(iris[,-5],2,sample))
names(iris.extended)[6:9] <- paste("Nonsense",1:4,sep="")
dim(iris.extended)
[1] 150 9
head(iris.extended)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Nonsense1
1 5.1 3.5 1.4 0.2 setosa 6.3
2 4.9 3.0 1.4 0.2 setosa 6.1
3 4.7 3.2 1.3 0.2 setosa 6.4
4 4.6 3.1 1.5 0.2 setosa 6.3
5 5.0 3.6 1.4 0.2 setosa 5.8
6 5.4 3.9 1.7 0.4 setosa 4.9
Nonsense2 Nonsense3 Nonsense4
1 3.6 4.0 1.5
2 3.4 1.4 0.4
3 3.7 6.6 1.2
4 2.8 3.3 0.2
5 2.4 3.8 0.2
6 3.2 5.0 1.4
Boruta.iris.extended <- Boruta(Species~., data=iris.extended, doTrace=2)
1. run of importance source...
2. run of importance source...
3. run of importance source...
4. run of importance source...
5. run of importance source...
6. run of importance source...
7. run of importance source...
8. run of importance source...
9. run of importance source...
10. run of importance source...
Confirmed 4 attributes: Petal.Length, Petal.Width, Sepal.Length, Sepal.Width.
Rejected 4 attributes: Nonsense1, Nonsense2, Nonsense3, Nonsense4.
print(Boruta.iris.extended)
Boruta performed 10 iterations in 0.7415659 secs.
4 attributes confirmed important: Petal.Length, Petal.Width,
Sepal.Length, Sepal.Width.
4 attributes confirmed unimportant: Nonsense1, Nonsense2,
Nonsense3, Nonsense4.
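The Nonsense columns are built the same way Boruta builds its shadow attributes. Each column is permuted independently, so its distribution is kept but any link to Species is destroyed. In every run, Boruta records a hit for an attribute that beats the best shadow.

The base-R sketch below walks through one such run. It uses the one-way ANOVA F statistic as a cheap stand-in for importance, which is an assumption: Boruta's default getImpRfZ uses random-forest importance Z-scores instead. The names f_importance, shadows and hits are illustrative only.

```r
# Rebuild iris.extended exactly as in the transcript
set.seed(777)
iris.extended <- data.frame(iris, apply(iris[, -5], 2, sample))
names(iris.extended)[6:9] <- paste("Nonsense", 1:4, sep = "")

x <- iris.extended[, -5]   # 4 real + 4 nonsense predictors
y <- iris.extended$Species

# Shadow attributes: an independently permuted copy of every predictor
shadows <- as.data.frame(lapply(x, sample))
names(shadows) <- paste0("shadow.", names(x))

# ASSUMPTION: one-way ANOVA F statistic as a stand-in importance measure
f_importance <- function(v, g) {
  unname(oneway.test(v ~ g, var.equal = TRUE)$statistic)
}
imp_real   <- sapply(x, f_importance, g = y)
imp_shadow <- sapply(shadows, f_importance, g = y)

# A hit: the attribute beats the best (maximal) shadow in this run
hits <- imp_real > max(imp_shadow)
print(round(imp_real, 2))
print(hits)
```

Boruta repeats this with fresh shadows in every run and accumulates the hits. The decisions printed above come from testing those accumulated counts.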
Boruta.ferns.irisE <- Boruta(Species~., data=iris.extended, getImp=getImpFerns)
print(Boruta.ferns.irisE)
Boruta performed 24 iterations in 0.4926431 secs.
4 attributes confirmed important: Petal.Length, Petal.Width,
Sepal.Length, Sepal.Width.
4 attributes confirmed unimportant: Nonsense1, Nonsense2,
Nonsense3, Nonsense4.
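The getImp argument is what lets Boruta swap random forest for rFerns in the call above. As far as we understand the package documentation, any function can be plugged in if it takes the predictor table x and the decision y (plus ...) and returns one numeric importance per column of x. getImpAnovaF below is a hypothetical example built on the one-way ANOVA F statistic. It only makes sense for numeric predictors and a factor decision, as in iris.extended.

```r
# Hypothetical importance source for Boruta's getImp argument.
# ASSUMPTION: Boruta calls getImp(x, y, ...) and expects one numeric
# importance per column of x, with larger values meaning more important.
getImpAnovaF <- function(x, y, ...) {
  vapply(seq_len(ncol(x)), function(j) {
    v <- x[, j]   # works for data frames and matrices alike
    unname(oneway.test(v ~ y, var.equal = TRUE)$statistic)
  }, numeric(1))
}

# Stand-alone check on the four iris measurements
imp <- getImpAnovaF(iris[, -5], iris$Species)
names(imp) <- names(iris)[1:4]
print(round(imp, 2))
```

If that interface assumption holds, Boruta(Species~., data=iris.extended, getImp=getImpAnovaF) would run the same selection with this F statistic in place of ferns.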
library(mlbench)
data(Ozone)
ozo <- na.omit(Ozone)
dim(ozo)
[1] 203 13
head(ozo)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
5 1 5 1 5 5760 3 51 54 45.32 1450 25 57.02 60
6 1 6 2 6 5720 4 69 35 49.64 1568 15 53.78 60
7 1 7 3 4 5790 6 19 45 46.40 2631 -33 54.14 100
8 1 8 4 4 5790 3 25 55 52.70 554 -28 64.76 250
9 1 9 5 6 5700 3 73 41 48.02 2083 23 52.52 120
12 1 12 1 6 5720 3 44 51 54.32 111 9 63.14 150
Boruta.ozone <- Boruta(V4~., data=ozo, doTrace=2)
1. run of importance source...
2. run of importance source...
3. run of importance source...
4. run of importance source...
5. run of importance source...
6. run of importance source...
7. run of importance source...
8. run of importance source...
9. run of importance source...
10. run of importance source...
11. run of importance source...
Confirmed 8 attributes: V1, V10, V11, V12, V5 and 3 more.
Rejected 2 attributes: V2, V3.
12. run of importance source...
13. run of importance source...
14. run of importance source...
15. run of importance source...
Rejected 1 attributes: V6.
16. run of importance source...
17. run of importance source...
18. run of importance source...
Confirmed 1 attributes: V13.
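The late decisions in this Ozone run can be reproduced from hit counts. Assume Boruta applies one-sided binomial tests (hit probability 0.5, pValue = 0.01, Bonferroni over the 12 predictors). The normHits column in the attStats output that follows gives V6 1 hit in 18 runs, so V6 had a single hit when it was rejected at run 15. It gives V13 16 hits in 18 runs, and V13 was confirmed at run 18. The helper boruta_decision() is hypothetical and not part of Boruta.

```r
# Hypothetical helper (not part of Boruta): the decision Boruta is assumed
# to reach for an attribute with `hits` wins over the best shadow in `runs`
# runs, using one-sided binomial tests with a Bonferroni correction.
boruta_decision <- function(hits, runs, n_attributes, p_value = 0.01) {
  # Chance of at least `hits` wins if the attribute were no better than noise
  p_accept <- pbinom(hits - 1, runs, 0.5, lower.tail = FALSE)
  # Chance of at most `hits` wins under the same null hypothesis
  p_reject <- pbinom(hits, runs, 0.5)
  if (min(1, n_attributes * p_accept) < p_value) return("Confirmed")
  if (min(1, n_attributes * p_reject) < p_value) return("Rejected")
  "Tentative"
}

# Ozone has 12 predictors
boruta_decision(1, 14, 12)   # 1 hit after 14 runs: "Tentative"
boruta_decision(1, 15, 12)   # 1 hit after 15 runs (V6): "Rejected"
boruta_decision(16, 18, 12)  # 16 hits after 18 runs (V13): "Confirmed"
```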
attStats(Boruta.ozone)
meanZ medianZ minZ maxZ normHits decision
V1 12.3428940 11.9139695 10.160204 14.48634559 1.00000000 Confirmed
V2 -3.0707839 -3.2246235 -4.377842 -1.14621754 0.00000000 Rejected
V3 -1.1109717 -1.1676011 -3.009919 0.02966894 0.00000000 Rejected
V5 6.6882642 6.6477693 5.777770 8.19098434 1.00000000 Confirmed
V6 -0.1250837 -0.1822801 -2.311239 1.87249337 0.05555556 Rejected
V7 8.4799403 8.1277497 6.215962 10.95424927 1.00000000 Confirmed
V8 17.9399462 18.0489732 16.409877 19.01645870 1.00000000 Confirmed
V9 20.0051741 19.8764410 17.906561 23.32511637 1.00000000 Confirmed
V10 8.8794936 8.8031797 7.457024 10.84540473 1.00000000 Confirmed
V11 9.1148697 8.5780434 5.682347 12.23283564 1.00000000 Confirmed
V12 13.6171744 13.8619421 11.570615 15.88157343 1.00000000 Confirmed
V13 6.7598285 6.8431383 5.032704 8.45258710 0.88888889 Confirmed
getSelectedAttributes(Boruta.ozone)
[1] "V1" "V5" "V7" "V8" "V9" "V10" "V11" "V12" "V13"
plot(Boruta.ozone, las=2, cex.axis=0.75)
cat('Random forest run on all attributes:\n')
Random forest run on all attributes:
print(randomForest(V4~., data=ozo))
Call:
randomForest(formula = V4 ~ ., data = ozo)
Type of random forest: regression
Number of trees: 500
No. of variables tried at each split: 4
Mean of squared residuals: 21.69685
% Var explained: 67.49
cat('Random forest run only on confirmed attributes:\n')
Random forest run only on confirmed attributes:
print(randomForest(ozo[,getSelectedAttributes(Boruta.ozone)],ozo$V4))
Call:
randomForest(x = ozo[, getSelectedAttributes(Boruta.ozone)], y = ozo$V4)
Type of random forest: regression
Number of trees: 500
No. of variables tried at each split: 3
Mean of squared residuals: 16.99056
% Var explained: 74.54
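The two forests above can be compared directly. As far as we know, randomForest reports % Var explained for regression as 100 * (1 - MSE / var_y), where MSE is the out-of-bag mean of squared residuals and var_y is the variance of V4. The arithmetic below uses only the printed numbers. It checks that both summaries imply the same variance of V4 (about 66.7) and that the confirmed-only forest cuts the out-of-bag MSE by roughly 22%. Both estimates use the same 203 rows on which Boruta selected the attributes, so the gain may be somewhat optimistic.

```r
# Figures copied from the two randomForest summaries above
mse_all <- 21.69685; pve_all <- 67.49   # all 12 predictors
mse_sel <- 16.99056; pve_sel <- 74.54   # 9 Boruta-confirmed predictors

# ASSUMPTION: % Var explained = 100 * (1 - MSE / var_y), so var_y can be
# recovered from either summary; both should agree up to rounding.
var_y_all <- mse_all / (1 - pve_all / 100)
var_y_sel <- mse_sel / (1 - pve_sel / 100)
print(c(var_y_all, var_y_sel))

# Relative reduction of the out-of-bag MSE after selection, in percent
reduction_pct <- 100 * (mse_all - mse_sel) / mse_all
print(round(reduction_pct, 1))
```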
library(mlbench)
data(HouseVotes84)
hvo <- na.omit(HouseVotes84)
dim(hvo)
[1] 232 17
head(hvo)
Class V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
6 democrat n y y n y y n n n n n n y y y y
9 republican n y n y y y n n n n n y y y n y
20 democrat y y y n n n y y y n y n n n y y
24 democrat y y y n n n y y y n n n n n y y
26 democrat y n y n n n y y y y n n n n y y
27 democrat y n y n n n y y y n y n n n y y
Bor.hvo <- Boruta(Class~., data=hvo, doTrace=2)
1. run of importance source...
2. run of importance source...
3. run of importance source...
4. run of importance source...
5. run of importance source...
6. run of importance source...
7. run of importance source...
8. run of importance source...
9. run of importance source...
10. run of importance source...
11. run of importance source...
Confirmed 11 attributes: V11, V12, V13, V14, V15 and 6 more.
12. run of importance source...
13. run of importance source...
14. run of importance source...
15. run of importance source...
Rejected 1 attributes: V6.
16. run of importance source...
17. run of importance source...
18. run of importance source...
19. run of importance source...
20. run of importance source...
21. run of importance source...
22. run of importance source...
Rejected 1 attributes: V16.
23. run of importance source...
24. run of importance source...
25. run of importance source...
26. run of importance source...
27. run of importance source...
28. run of importance source...
29. run of importance source...
30. run of importance source...
31. run of importance source...
32. run of importance source...
33. run of importance source...
34. run of importance source...
35. run of importance source...
36. run of importance source...
37. run of importance source...
38. run of importance source...
39. run of importance source...
40. run of importance source...
41. run of importance source...
42. run of importance source...
43. run of importance source...
44. run of importance source...
45. run of importance source...
46. run of importance source...
47. run of importance source...
Rejected 2 attributes: V1, V10.
48. run of importance source...
49. run of importance source...
50. run of importance source...
51. run of importance source...
52. run of importance source...
53. run of importance source...
54. run of importance source...
55. run of importance source...
56. run of importance source...
57. run of importance source...
58. run of importance source...
59. run of importance source...
60. run of importance source...
61. run of importance source...
62. run of importance source...
63. run of importance source...
Rejected 1 attributes: V2.
print(Bor.hvo)
Boruta performed 63 iterations in 7.438322 secs.
11 attributes confirmed important: V11, V12, V13, V14, V15 and 6
more.
5 attributes confirmed unimportant: V1, V10, V16, V2, V6.
plot(Bor.hvo, las=2, cex.axis=0.75)
library(mlbench)
data(Sonar)
Bor.son <- Boruta(Class~.,data=Sonar,doTrace=2)
1. run of importance source...
2. run of importance source...
3. run of importance source...
4. run of importance source...
5. run of importance source...
6. run of importance source...
7. run of importance source...
8. run of importance source...
9. run of importance source...
10. run of importance source...
11. run of importance source...
12. run of importance source...
13. run of importance source...
Confirmed 11 attributes: V10, V11, V12, V21, V36 and 6 more.
Rejected 6 attributes: V3, V55, V56, V57, V60 and 1 more.
14. run of importance source...
15. run of importance source...
16. run of importance source...
17. run of importance source...
Confirmed 8 attributes: V13, V16, V20, V27, V28 and 3 more.
Rejected 4 attributes: V29, V40, V50, V53.
18. run of importance source...
19. run of importance source...
20. run of importance source...
21. run of importance source...
Confirmed 6 attributes: V1, V15, V17, V23, V46 and 1 more.
Rejected 1 attributes: V58.
22. run of importance source...
23. run of importance source...
24. run of importance source...
Rejected 2 attributes: V24, V41.
25. run of importance source...
26. run of importance source...
27. run of importance source...
Rejected 1 attributes: V33.
28. run of importance source...
29. run of importance source...
30. run of importance source...
31. run of importance source...
32. run of importance source...
33. run of importance source...
Confirmed 1 attributes: V5.
34. run of importance source...
35. run of importance source...
36. run of importance source...
Confirmed 1 attributes: V22.
37. run of importance source...
38. run of importance source...
39. run of importance source...
40. run of importance source...
41. run of importance source...
42. run of importance source...
Confirmed 1 attributes: V18.
Rejected 1 attributes: V38.
43. run of importance source...
44. run of importance source...
45. run of importance source...
46. run of importance source...
47. run of importance source...
48. run of importance source...
Rejected 1 attributes: V6.
49. run of importance source...
50. run of importance source...
51. run of importance source...
52. run of importance source...
53. run of importance source...
Confirmed 1 attributes: V31.
54. run of importance source...
55. run of importance source...
56. run of importance source...
Confirmed 1 attributes: V35.
57. run of importance source...
58. run of importance source...
59. run of importance source...
60. run of importance source...
61. run of importance source...
62. run of importance source...
63. run of importance source...
64. run of importance source...
65. run of importance source...
66. run of importance source...
67. run of importance source...
68. run of importance source...
69. run of importance source...
70. run of importance source...
71. run of importance source...
Confirmed 1 attributes: V19.
72. run of importance source...
73. run of importance source...
74. run of importance source...
75. run of importance source...
76. run of importance source...
77. run of importance source...
78. run of importance source...
79. run of importance source...
80. run of importance source...
81. run of importance source...
82. run of importance source...
83. run of importance source...
84. run of importance source...
85. run of importance source...
86. run of importance source...
87. run of importance source...
88. run of importance source...
89. run of importance source...
90. run of importance source...
91. run of importance source...
92. run of importance source...
93. run of importance source...
94. run of importance source...
95. run of importance source...
96. run of importance source...
97. run of importance source...
98. run of importance source...
99. run of importance source...
print(Bor.son)
Boruta performed 99 iterations in 43.115 secs.
31 attributes confirmed important: V1, V10, V11, V12, V13 and 26
more.
16 attributes confirmed unimportant: V24, V29, V3, V33, V38 and
11 more.
13 tentative attributes left: V14, V2, V25, V26, V30 and 8 more.
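Thirteen Sonar attributes are still tentative because Boruta reached its run limit (99 iterations under the default maxRuns = 100) before their hit counts became extreme enough. The sketch below makes the same assumptions about the test: one-sided binomial tests, pValue = 0.01 and Bonferroni over the 60 predictors. The hypothetical helper tentative_hits() lists the hit counts that stay undecided after 99 runs. The normHits values in the attStats output below correspond to 41 of 99 hits for V2 and 65 of 99 for V26, both tentative, and 77 of 99 for V1, which is confirmed.

```r
# Hypothetical helper (not part of Boruta): hit counts that remain
# "Tentative" after `runs` runs under the assumed Bonferroni-corrected
# one-sided binomial tests.
tentative_hits <- function(runs, n_attributes, p_value = 0.01) {
  hits <- 0:runs
  # Bonferroni-adjusted p-values for "better than shadows" / "worse than shadows"
  p_accept <- pmin(1, n_attributes * pbinom(hits - 1, runs, 0.5, lower.tail = FALSE))
  p_reject <- pmin(1, n_attributes * pbinom(hits, runs, 0.5))
  hits[p_accept >= p_value & p_reject >= p_value]
}

band <- tentative_hits(99, 60)
print(range(band))
c(V2 = 41, V26 = 65, V1 = 77) %in% band
```

Boruta also provides TentativeRoughFix() for a rough final call on such attributes. A larger maxRuns gives the test more runs to settle them.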
plotImpHistory(Bor.son)
stats <- attStats(Bor.son)
print(stats)
meanZ medianZ minZ maxZ normHits decision
V1 3.6530401 3.6444853 1.0475925 5.607688 0.77777778 Confirmed
V2 2.5164998 2.4508908 0.8796985 4.228693 0.41414141 Tentative
V3 0.9728750 0.8289368 -0.1268530 2.498122 0.00000000 Rejected
V4 5.2293782 5.2200279 3.3027666 7.235532 0.96969697 Confirmed
V5 3.7991123 3.7885119 1.4984098 5.545281 0.82828283 Confirmed
V6 1.7588173 1.5946677 -0.7450579 4.082650 0.11111111 Rejected
V7 0.8212300 0.8330158 -1.0083771 1.931514 0.00000000 Rejected
V8 2.5420991 2.4982483 0.5594706 5.257646 0.51515152 Tentative
V9 9.2427887 9.1948693 7.7791041 10.889831 1.00000000 Confirmed
V10 8.3376870 8.4323230 6.4904884 9.942388 1.00000000 Confirmed
V11 11.6021924 11.6211270 9.4979233 13.915700 1.00000000 Confirmed
V12 10.5115664 10.4512945 8.9831347 11.919861 1.00000000 Confirmed
V13 5.5651409 5.5388691 4.0350668 7.437915 0.98989899 Confirmed
V14 2.4597012 2.4310227 -0.5710810 4.074063 0.43434343 Tentative
V15 4.1801372 4.2442941 2.2013720 5.757296 0.89898990 Confirmed
V16 4.9904459 4.9985006 2.8683712 6.595888 0.97979798 Confirmed
V17 4.7842586 4.8574533 2.7231177 6.443001 0.96969697 Confirmed
V18 4.0492518 4.1422510 0.9480474 6.214764 0.87878788 Confirmed
V19 3.2088130 3.1912297 1.1842832 4.954614 0.70707071 Confirmed
V20 5.1132866 5.1194746 3.3387492 7.221378 0.97979798 Confirmed
V21 5.8193289 5.8004745 4.2212633 7.852021 1.00000000 Confirmed
V22 3.4730057 3.5076470 1.2618156 5.701728 0.80808081 Confirmed
V23 3.9977221 3.9749232 2.0282683 5.441874 0.87878788 Confirmed
V24 1.9017648 1.9363122 0.6611199 3.001546 0.03030303 Rejected
V25 2.1458338 2.2710539 -0.9757495 4.403309 0.35353535 Tentative
V26 3.1852047 3.2345693 0.9420935 4.759913 0.65656566 Tentative
V27 4.8434860 4.9352506 2.7065727 6.513909 0.97979798 Confirmed
V28 5.8507918 5.8376752 4.1993870 7.235571 0.98989899 Confirmed
V29 1.6785179 1.7669140 0.3409043 3.091791 0.01010101 Rejected
V30 2.5058045 2.5717359 0.6134807 4.713400 0.50505051 Tentative
V31 3.6770831 3.6133115 1.8553407 5.695155 0.79797980 Confirmed
V32 2.7319080 2.6865332 0.3348958 4.889188 0.46464646 Tentative
V33 1.7417104 1.6706347 -0.6025743 3.291508 0.04040404 Rejected
V34 2.2755731 2.2638458 -0.1782045 4.338294 0.38383838 Tentative
V35 3.2136774 3.1821189 1.0815490 4.690416 0.69696970 Confirmed
V36 6.9834258 7.0275938 5.3932208 8.634432 1.00000000 Confirmed
V37 5.7356023 5.7894117 3.9050333 7.784060 1.00000000 Confirmed
V38 1.6488811 1.8484526 -1.2942596 3.213088 0.09090909 Rejected
V39 2.5708872 2.5723870 -1.6331352 4.628588 0.52525253 Tentative
V40 1.5908988 1.7276400 -0.3855557 2.929240 0.01010101 Rejected
V41 1.1441840 1.2501144 -1.2553489 2.935398 0.03030303 Rejected
V42 2.2455157 2.2416167 -1.0185840 4.543826 0.34343434 Tentative
V43 2.9948703 3.0628737 0.6409093 4.748722 0.59595960 Tentative
V44 4.0675020 4.1099889 0.9265881 5.924562 0.90909091 Confirmed
V45 6.6879136 6.7006100 4.2407357 7.801320 1.00000000 Confirmed
V46 5.0840104 5.0049570 3.1328679 7.025135 0.96969697 Confirmed
V47 6.2091197 6.2263018 4.3085419 8.018038 0.98989899 Confirmed
V48 7.4061335 7.4382954 5.6249894 8.724833 1.00000000 Confirmed
V49 7.2091690 7.1748353 5.5874753 8.969070 1.00000000 Confirmed
V50 0.9282474 0.9112855 -0.7401568 3.006821 0.01010101 Rejected
V51 4.4947075 4.5465692 2.5353791 6.398363 0.94949495 Confirmed
V52 4.4478706 4.4574139 2.6404774 6.340599 0.95959596 Confirmed
V53 1.1062882 1.2696305 -0.8989411 3.201559 0.01010101 Rejected
V54 2.5583742 2.5540529 0.2077052 4.848674 0.45454545 Tentative
V55 0.7711991 0.6657143 -1.1346994 2.472689 0.00000000 Rejected
V56 0.3004477 0.1389055 -0.5401898 1.767219 0.00000000 Rejected
V57 0.2224416 0.1632599 -0.8937907 1.580834 0.00000000 Rejected
V58 1.1226998 1.2754091 -0.9577494 2.990236 0.02020202 Rejected
V59 2.4163296 2.4651994 0.1615830 4.459343 0.40404040 Tentative
V60 0.6233877 0.4664053 -0.7282950 1.994414 0.00000000 Rejected
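In this version of attStats, normHits appears to be the number of runs in which an attribute beat the best shadow, divided by the total number of runs (99 here). Multiplying back should therefore recover whole hit counts. The check below uses three rows of the table above.

```r
# normHits values copied from the attStats table above (Sonar, 99 runs)
norm_hits <- c(V1 = 0.77777778, V19 = 0.70707071, V2 = 0.41414141)
runs <- 99

# Recovered hit counts should be whole numbers up to print rounding
hits <- norm_hits * runs
print(round(hits, 6))
```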
plot(normHits~meanZ, col=stats$decision, data=stats)
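In the plot call above, col=stats$decision works because a factor's integer codes are used as palette indices. Without a legend, though, the colours are hard to read. The sketch below adds one, using six rows of the attStats table as data. It assumes that Boruta orders the decision levels as Tentative, Confirmed, Rejected.

```r
# Six rows copied from the attStats table above
stats <- data.frame(
  meanZ    = c(3.6530401, 2.5164998, 0.9728750, 9.2427887, 3.1852047, 1.6488811),
  normHits = c(0.77777778, 0.41414141, 0, 1, 0.65656566, 0.09090909),
  # ASSUMPTION: Boruta's level order is Tentative, Confirmed, Rejected
  decision = factor(c("Confirmed", "Tentative", "Rejected",
                      "Confirmed", "Tentative", "Rejected"),
                    levels = c("Tentative", "Confirmed", "Rejected")),
  row.names = c("V1", "V2", "V3", "V9", "V26", "V38")
)

# The factor's integer codes index the palette, so spell them out in a legend
cols <- as.integer(stats$decision)
plot(normHits ~ meanZ, data = stats, col = cols, pch = 19)
legend("bottomright", legend = levels(stats$decision),
       col = seq_len(nlevels(stats$decision)), pch = 19)
```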
plot(Bor.son,sort=FALSE, las=2, cex.axis=0.75)
packageVersion("Boruta")
[1] '4.0.0'
efg
2015-03-08 2357