RNotes

Miron B. Kursa, Witold R. Rudnicki, Feature Selection with the Boruta Package, Journal of Statistical Software, Vol. 36, Issue 11, Sept 2010.


2015-03-08 2356

library(Boruta)
Loading required package: randomForest
randomForest 4.6-10
Type rfNews() to see new features/changes/bug fixes.
Loading required package: rFerns

Iris dataset

dim(iris)
[1] 150   5
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Run Boruta on the original data. It fits many random forests, so larger datasets can take a while.

Bor.iris <- Boruta(Species~.,data=iris,doTrace=2)
 1. run of importance source...
 2. run of importance source...
 3. run of importance source...
 4. run of importance source...
 5. run of importance source...
 6. run of importance source...
 7. run of importance source...
 8. run of importance source...
 9. run of importance source...
Confirmed 4 attributes: Petal.Length, Petal.Width, Sepal.Length, Sepal.Width.
print(Bor.iris)
Boruta performed 9 iterations in 0.5438509 secs.
 4 attributes confirmed important: Petal.Length, Petal.Width,
Sepal.Length, Sepal.Width.
 No attributes deemed unimportant.
print(getSelectedAttributes(Bor.iris))
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 

Add four nonsense attributes to the iris dataset by independently shuffling each of the four original attributes

set.seed(777)
iris.extended <- data.frame(iris,apply(iris[,-5],2,sample))
names(iris.extended)[6:9] <- paste("Nonsense",1:4,sep="")

dim(iris.extended)
[1] 150   9
head(iris.extended)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Nonsense1
1          5.1         3.5          1.4         0.2  setosa       6.3
2          4.9         3.0          1.4         0.2  setosa       6.1
3          4.7         3.2          1.3         0.2  setosa       6.4
4          4.6         3.1          1.5         0.2  setosa       6.3
5          5.0         3.6          1.4         0.2  setosa       5.8
6          5.4         3.9          1.7         0.4  setosa       4.9
  Nonsense2 Nonsense3 Nonsense4
1       3.6       4.0       1.5
2       3.4       1.4       0.4
3       3.7       6.6       1.2
4       2.8       3.3       0.2
5       2.4       3.8       0.2
6       3.2       5.0       1.4
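
The shuffling step can be checked directly. This is a minimal sketch using only base R; the names `shuffled` and `same.values` are illustrative and not part of the original session. Because `apply(..., 2, sample)` permutes each column independently, every nonsense column keeps exactly the values of its source column but loses its alignment with Species.

```r
set.seed(777)
# Permute each numeric column of iris independently, as in the step above.
shuffled <- apply(iris[, -5], 2, sample)

# Each shuffled column holds the same set of values as its source column.
same.values <- sapply(1:4, function(j)
  identical(sort(unname(shuffled[, j])), sort(iris[[j]])))
print(same.values)

# Row alignment is broken: count the rows whose value happens to be unchanged.
print(colSums(shuffled == as.matrix(iris[, -5])))
```

The shuffled columns therefore have the same marginal distribution as the real ones, which makes them a fair test of whether Boruta can tell informative attributes from noise.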

Run Boruta on this data

Boruta.iris.extended <- Boruta(Species~., data=iris.extended, doTrace=2)
 1. run of importance source...
 2. run of importance source...
 3. run of importance source...
 4. run of importance source...
 5. run of importance source...
 6. run of importance source...
 7. run of importance source...
 8. run of importance source...
 9. run of importance source...
 10. run of importance source...
Confirmed 4 attributes: Petal.Length, Petal.Width, Sepal.Length, Sepal.Width.
Rejected 4 attributes: Nonsense1, Nonsense2, Nonsense3, Nonsense4.

All four nonsense attributes should be rejected, and the summary confirms that they are

print(Boruta.iris.extended)
Boruta performed 10 iterations in 0.7415659 secs.
 4 attributes confirmed important: Petal.Length, Petal.Width,
Sepal.Length, Sepal.Width.
 4 attributes confirmed unimportant: Nonsense1, Nonsense2,
Nonsense3, Nonsense4.
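
As described in the Kursa and Rudnicki paper, Boruta adds shuffled "shadow" copies of the attributes and counts a hit for an attribute in a given run only when its importance exceeds the best shadow importance. The following is a rough single-round sketch of that comparison using randomForest directly. It is not Boruta's actual implementation, which repeats the round many times and applies a statistical test to the hit counts. The names `x`, `shadow`, `imp`, `shadow.max` and `hit` are illustrative.

```r
library(randomForest)

set.seed(777)
# Rebuild the extended predictors: 4 real attributes plus 4 shuffled ones.
x <- data.frame(iris[, -5], apply(iris[, -5], 2, sample))
names(x)[5:8] <- paste("Nonsense", 1:4, sep = "")

# Shadow attributes: an independently shuffled copy of every column of x.
shadow <- as.data.frame(lapply(x, sample))
names(shadow) <- paste("shadow", names(x), sep = ".")

# One importance round on the real and shadow attributes together.
rf <- randomForest(cbind(x, shadow), iris$Species, importance = TRUE)
imp <- importance(rf, type = 1)[, 1]  # scaled mean decrease in accuracy

# An attribute scores a hit when it beats the best shadow attribute.
shadow.max <- max(imp[names(shadow)])
hit <- imp[names(x)] > shadow.max
print(round(imp[names(x)], 2))
print(hit)
```

A single round is noisy, so an attribute can win or lose by chance. Repeating the round is what lets Boruta confirm or reject attributes with some confidence.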

Boruta using rFerns importance (getImpFerns) instead of the default random forest importance

Boruta.ferns.irisE <- Boruta(Species~., data=iris.extended, getImp=getImpFerns)
print(Boruta.ferns.irisE)
Boruta performed 24 iterations in 0.4926431 secs.
 4 attributes confirmed important: Petal.Length, Petal.Width,
Sepal.Length, Sepal.Width.
 4 attributes confirmed unimportant: Nonsense1, Nonsense2,
Nonsense3, Nonsense4.

Ozone data from mlbench package

library(mlbench)
data(Ozone)
ozo <- na.omit(Ozone)
dim(ozo)
[1] 203  13
head(ozo)
   V1 V2 V3 V4   V5 V6 V7 V8    V9  V10 V11   V12 V13
5   1  5  1  5 5760  3 51 54 45.32 1450  25 57.02  60
6   1  6  2  6 5720  4 69 35 49.64 1568  15 53.78  60
7   1  7  3  4 5790  6 19 45 46.40 2631 -33 54.14 100
8   1  8  4  4 5790  3 25 55 52.70  554 -28 64.76 250
9   1  9  5  6 5700  3 73 41 48.02 2083  23 52.52 120
12  1 12  1  6 5720  3 44 51 54.32  111   9 63.14 150

Takes some time, so be patient

Boruta.ozone <- Boruta(V4~., data=ozo, doTrace=2)
 1. run of importance source...
 2. run of importance source...
 3. run of importance source...
 4. run of importance source...
 5. run of importance source...
 6. run of importance source...
 7. run of importance source...
 8. run of importance source...
 9. run of importance source...
 10. run of importance source...
 11. run of importance source...
Confirmed 8 attributes: V1, V10, V11, V12, V5 and 3 more.
Rejected 2 attributes: V2, V3.
 12. run of importance source...
 13. run of importance source...
 14. run of importance source...
 15. run of importance source...
Rejected 1 attributes: V6.
 16. run of importance source...
 17. run of importance source...
 18. run of importance source...
Confirmed 1 attributes: V13.
attStats(Boruta.ozone)
         meanZ    medianZ      minZ        maxZ   normHits  decision
V1  12.3428940 11.9139695 10.160204 14.48634559 1.00000000 Confirmed
V2  -3.0707839 -3.2246235 -4.377842 -1.14621754 0.00000000  Rejected
V3  -1.1109717 -1.1676011 -3.009919  0.02966894 0.00000000  Rejected
V5   6.6882642  6.6477693  5.777770  8.19098434 1.00000000 Confirmed
V6  -0.1250837 -0.1822801 -2.311239  1.87249337 0.05555556  Rejected
V7   8.4799403  8.1277497  6.215962 10.95424927 1.00000000 Confirmed
V8  17.9399462 18.0489732 16.409877 19.01645870 1.00000000 Confirmed
V9  20.0051741 19.8764410 17.906561 23.32511637 1.00000000 Confirmed
V10  8.8794936  8.8031797  7.457024 10.84540473 1.00000000 Confirmed
V11  9.1148697  8.5780434  5.682347 12.23283564 1.00000000 Confirmed
V12 13.6171744 13.8619421 11.570615 15.88157343 1.00000000 Confirmed
V13  6.7598285  6.8431383  5.032704  8.45258710 0.88888889 Confirmed
getSelectedAttributes(Boruta.ozone)
[1] "V1"  "V5"  "V7"  "V8"  "V9"  "V10" "V11" "V12" "V13"
plot(Boruta.ozone, las=2, cex.axis=0.75)

cat('Random forest run on all attributes:\n')
Random forest run on all attributes:
print(randomForest(V4~., data=ozo))

Call:
 randomForest(formula = V4 ~ ., data = ozo) 
               Type of random forest: regression
                     Number of trees: 500
No. of variables tried at each split: 4

          Mean of squared residuals: 21.69685
                    % Var explained: 67.49
cat('Random forest run only on confirmed attributes:\n')
Random forest run only on confirmed attributes:
print(randomForest(ozo[,getSelectedAttributes(Boruta.ozone)],ozo$V4))

Call:
 randomForest(x = ozo[, getSelectedAttributes(Boruta.ozone)],      y = ozo$V4) 
               Type of random forest: regression
                     Number of trees: 500
No. of variables tried at each split: 3

          Mean of squared residuals: 16.99056
                    % Var explained: 74.54
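
The two "% Var explained" figures can also be pulled out programmatically rather than read off the printed summaries. This is a minimal sketch: the `selected` vector is copied from the getSelectedAttributes() output above instead of rerunning Boruta, and the names `rf.all`, `rf.sel` and `var.explained` are illustrative. Exact numbers will differ from the session above because random forests are stochastic.

```r
library(randomForest)
library(mlbench)

data(Ozone)
ozo <- na.omit(Ozone)

# Confirmed attributes, copied from the getSelectedAttributes() output above.
selected <- c("V1", "V5", "V7", "V8", "V9", "V10", "V11", "V12", "V13")

set.seed(1)
rf.all <- randomForest(V4 ~ ., data = ozo)
rf.sel <- randomForest(ozo[, selected], ozo$V4)

# rsq holds the out-of-bag pseudo R-squared after each tree; take the last value.
var.explained <- 100 * c(all = tail(rf.all$rsq, 1),
                         selected = tail(rf.sel$rsq, 1))
print(round(var.explained, 2))
```

Because the attributes were selected on the same rows that are used for the out-of-bag estimate, the improvement for the reduced model is likely somewhat optimistic.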

Boruta on the HouseVotes84 data from mlbench

library(mlbench)
data(HouseVotes84)

hvo <- na.omit(HouseVotes84)
dim(hvo)
[1] 232  17
head(hvo)
        Class V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
6    democrat  n  y  y  n  y  y  n  n  n   n   n   n   y   y   y   y
9  republican  n  y  n  y  y  y  n  n  n   n   n   y   y   y   n   y
20   democrat  y  y  y  n  n  n  y  y  y   n   y   n   n   n   y   y
24   democrat  y  y  y  n  n  n  y  y  y   n   n   n   n   n   y   y
26   democrat  y  n  y  n  n  n  y  y  y   y   n   n   n   n   y   y
27   democrat  y  n  y  n  n  n  y  y  y   n   y   n   n   n   y   y

Takes some time, so be patient

Bor.hvo <- Boruta(Class~., data=hvo, doTrace=2)
 1. run of importance source...
 2. run of importance source...
 3. run of importance source...
 4. run of importance source...
 5. run of importance source...
 6. run of importance source...
 7. run of importance source...
 8. run of importance source...
 9. run of importance source...
 10. run of importance source...
 11. run of importance source...
Confirmed 11 attributes: V11, V12, V13, V14, V15 and 6 more.
 12. run of importance source...
 13. run of importance source...
 14. run of importance source...
 15. run of importance source...
Rejected 1 attributes: V6.
 16. run of importance source...
 17. run of importance source...
 18. run of importance source...
 19. run of importance source...
 20. run of importance source...
 21. run of importance source...
 22. run of importance source...
Rejected 1 attributes: V16.
 23. run of importance source...
 24. run of importance source...
 25. run of importance source...
 26. run of importance source...
 27. run of importance source...
 28. run of importance source...
 29. run of importance source...
 30. run of importance source...
 31. run of importance source...
 32. run of importance source...
 33. run of importance source...
 34. run of importance source...
 35. run of importance source...
 36. run of importance source...
 37. run of importance source...
 38. run of importance source...
 39. run of importance source...
 40. run of importance source...
 41. run of importance source...
 42. run of importance source...
 43. run of importance source...
 44. run of importance source...
 45. run of importance source...
 46. run of importance source...
 47. run of importance source...
Rejected 2 attributes: V1, V10.
 48. run of importance source...
 49. run of importance source...
 50. run of importance source...
 51. run of importance source...
 52. run of importance source...
 53. run of importance source...
 54. run of importance source...
 55. run of importance source...
 56. run of importance source...
 57. run of importance source...
 58. run of importance source...
 59. run of importance source...
 60. run of importance source...
 61. run of importance source...
 62. run of importance source...
 63. run of importance source...
Rejected 1 attributes: V2.
print(Bor.hvo)
Boruta performed 63 iterations in 7.438322 secs.
 11 attributes confirmed important: V11, V12, V13, V14, V15 and 6
more.
 5 attributes confirmed unimportant: V1, V10, V16, V2, V6.
plot(Bor.hvo, las=2, cex.axis=0.75)

Boruta on the Sonar data from mlbench

library(mlbench)
data(Sonar)

Takes some time, so be patient

Bor.son <- Boruta(Class~.,data=Sonar,doTrace=2)
 1. run of importance source...
 2. run of importance source...
 3. run of importance source...
 4. run of importance source...
 5. run of importance source...
 6. run of importance source...
 7. run of importance source...
 8. run of importance source...
 9. run of importance source...
 10. run of importance source...
 11. run of importance source...
 12. run of importance source...
 13. run of importance source...
Confirmed 11 attributes: V10, V11, V12, V21, V36 and 6 more.
Rejected 6 attributes: V3, V55, V56, V57, V60 and 1 more.
 14. run of importance source...
 15. run of importance source...
 16. run of importance source...
 17. run of importance source...
Confirmed 8 attributes: V13, V16, V20, V27, V28 and 3 more.
Rejected 4 attributes: V29, V40, V50, V53.
 18. run of importance source...
 19. run of importance source...
 20. run of importance source...
 21. run of importance source...
Confirmed 6 attributes: V1, V15, V17, V23, V46 and 1 more.
Rejected 1 attributes: V58.
 22. run of importance source...
 23. run of importance source...
 24. run of importance source...
Rejected 2 attributes: V24, V41.
 25. run of importance source...
 26. run of importance source...
 27. run of importance source...
Rejected 1 attributes: V33.
 28. run of importance source...
 29. run of importance source...
 30. run of importance source...
 31. run of importance source...
 32. run of importance source...
 33. run of importance source...
Confirmed 1 attributes: V5.
 34. run of importance source...
 35. run of importance source...
 36. run of importance source...
Confirmed 1 attributes: V22.
 37. run of importance source...
 38. run of importance source...
 39. run of importance source...
 40. run of importance source...
 41. run of importance source...
 42. run of importance source...
Confirmed 1 attributes: V18.
Rejected 1 attributes: V38.
 43. run of importance source...
 44. run of importance source...
 45. run of importance source...
 46. run of importance source...
 47. run of importance source...
 48. run of importance source...
Rejected 1 attributes: V6.
 49. run of importance source...
 50. run of importance source...
 51. run of importance source...
 52. run of importance source...
 53. run of importance source...
Confirmed 1 attributes: V31.
 54. run of importance source...
 55. run of importance source...
 56. run of importance source...
Confirmed 1 attributes: V35.
 57. run of importance source...
 58. run of importance source...
 59. run of importance source...
 60. run of importance source...
 61. run of importance source...
 62. run of importance source...
 63. run of importance source...
 64. run of importance source...
 65. run of importance source...
 66. run of importance source...
 67. run of importance source...
 68. run of importance source...
 69. run of importance source...
 70. run of importance source...
 71. run of importance source...
Confirmed 1 attributes: V19.
 72. run of importance source...
 73. run of importance source...
 74. run of importance source...
 75. run of importance source...
 76. run of importance source...
 77. run of importance source...
 78. run of importance source...
 79. run of importance source...
 80. run of importance source...
 81. run of importance source...
 82. run of importance source...
 83. run of importance source...
 84. run of importance source...
 85. run of importance source...
 86. run of importance source...
 87. run of importance source...
 88. run of importance source...
 89. run of importance source...
 90. run of importance source...
 91. run of importance source...
 92. run of importance source...
 93. run of importance source...
 94. run of importance source...
 95. run of importance source...
 96. run of importance source...
 97. run of importance source...
 98. run of importance source...
 99. run of importance source...
print(Bor.son)
Boruta performed 99 iterations in 43.115 secs.
 31 attributes confirmed important: V1, V10, V11, V12, V13 and 26
more.
 16 attributes confirmed unimportant: V24, V29, V3, V33, V38 and
11 more.
 13 tentative attributes left: V14, V2, V25, V26, V30 and 8 more.
plotImpHistory(Bor.son)
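
The run above stopped with 13 attributes still tentative. Two ways to handle that are sketched below, both using functions from the Boruta package: `getSelectedAttributes()` with `withTentative = TRUE`, and `TentativeRoughFix()`, which decides each remaining tentative attribute by comparing its median importance with the median importance of the best shadow attribute. The short `maxRuns = 20` run and the names `bor`, `with.tentative` and `fixed` are assumptions made only to keep the example quick; raising `maxRuns` is a third option.

```r
library(Boruta)
library(mlbench)

data(Sonar)
set.seed(1)
# A deliberately short run so that some attributes are likely to stay Tentative.
bor <- Boruta(Class ~ ., data = Sonar, maxRuns = 20)
print(table(bor$finalDecision))

# Option 1: keep tentative attributes when extracting the selection.
with.tentative <- getSelectedAttributes(bor, withTentative = TRUE)
print(length(with.tentative))

# Option 2: force a Confirmed/Rejected decision for every tentative attribute.
fixed <- TentativeRoughFix(bor)
print(table(fixed$finalDecision))
```

The rough fix is a weaker test than the one Boruta applies during the run, so its decisions are best treated as provisional.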

stats <- attStats(Bor.son)
print(stats)
         meanZ    medianZ       minZ      maxZ   normHits  decision
V1   3.6530401  3.6444853  1.0475925  5.607688 0.77777778 Confirmed
V2   2.5164998  2.4508908  0.8796985  4.228693 0.41414141 Tentative
V3   0.9728750  0.8289368 -0.1268530  2.498122 0.00000000  Rejected
V4   5.2293782  5.2200279  3.3027666  7.235532 0.96969697 Confirmed
V5   3.7991123  3.7885119  1.4984098  5.545281 0.82828283 Confirmed
V6   1.7588173  1.5946677 -0.7450579  4.082650 0.11111111  Rejected
V7   0.8212300  0.8330158 -1.0083771  1.931514 0.00000000  Rejected
V8   2.5420991  2.4982483  0.5594706  5.257646 0.51515152 Tentative
V9   9.2427887  9.1948693  7.7791041 10.889831 1.00000000 Confirmed
V10  8.3376870  8.4323230  6.4904884  9.942388 1.00000000 Confirmed
V11 11.6021924 11.6211270  9.4979233 13.915700 1.00000000 Confirmed
V12 10.5115664 10.4512945  8.9831347 11.919861 1.00000000 Confirmed
V13  5.5651409  5.5388691  4.0350668  7.437915 0.98989899 Confirmed
V14  2.4597012  2.4310227 -0.5710810  4.074063 0.43434343 Tentative
V15  4.1801372  4.2442941  2.2013720  5.757296 0.89898990 Confirmed
V16  4.9904459  4.9985006  2.8683712  6.595888 0.97979798 Confirmed
V17  4.7842586  4.8574533  2.7231177  6.443001 0.96969697 Confirmed
V18  4.0492518  4.1422510  0.9480474  6.214764 0.87878788 Confirmed
V19  3.2088130  3.1912297  1.1842832  4.954614 0.70707071 Confirmed
V20  5.1132866  5.1194746  3.3387492  7.221378 0.97979798 Confirmed
V21  5.8193289  5.8004745  4.2212633  7.852021 1.00000000 Confirmed
V22  3.4730057  3.5076470  1.2618156  5.701728 0.80808081 Confirmed
V23  3.9977221  3.9749232  2.0282683  5.441874 0.87878788 Confirmed
V24  1.9017648  1.9363122  0.6611199  3.001546 0.03030303  Rejected
V25  2.1458338  2.2710539 -0.9757495  4.403309 0.35353535 Tentative
V26  3.1852047  3.2345693  0.9420935  4.759913 0.65656566 Tentative
V27  4.8434860  4.9352506  2.7065727  6.513909 0.97979798 Confirmed
V28  5.8507918  5.8376752  4.1993870  7.235571 0.98989899 Confirmed
V29  1.6785179  1.7669140  0.3409043  3.091791 0.01010101  Rejected
V30  2.5058045  2.5717359  0.6134807  4.713400 0.50505051 Tentative
V31  3.6770831  3.6133115  1.8553407  5.695155 0.79797980 Confirmed
V32  2.7319080  2.6865332  0.3348958  4.889188 0.46464646 Tentative
V33  1.7417104  1.6706347 -0.6025743  3.291508 0.04040404  Rejected
V34  2.2755731  2.2638458 -0.1782045  4.338294 0.38383838 Tentative
V35  3.2136774  3.1821189  1.0815490  4.690416 0.69696970 Confirmed
V36  6.9834258  7.0275938  5.3932208  8.634432 1.00000000 Confirmed
V37  5.7356023  5.7894117  3.9050333  7.784060 1.00000000 Confirmed
V38  1.6488811  1.8484526 -1.2942596  3.213088 0.09090909  Rejected
V39  2.5708872  2.5723870 -1.6331352  4.628588 0.52525253 Tentative
V40  1.5908988  1.7276400 -0.3855557  2.929240 0.01010101  Rejected
V41  1.1441840  1.2501144 -1.2553489  2.935398 0.03030303  Rejected
V42  2.2455157  2.2416167 -1.0185840  4.543826 0.34343434 Tentative
V43  2.9948703  3.0628737  0.6409093  4.748722 0.59595960 Tentative
V44  4.0675020  4.1099889  0.9265881  5.924562 0.90909091 Confirmed
V45  6.6879136  6.7006100  4.2407357  7.801320 1.00000000 Confirmed
V46  5.0840104  5.0049570  3.1328679  7.025135 0.96969697 Confirmed
V47  6.2091197  6.2263018  4.3085419  8.018038 0.98989899 Confirmed
V48  7.4061335  7.4382954  5.6249894  8.724833 1.00000000 Confirmed
V49  7.2091690  7.1748353  5.5874753  8.969070 1.00000000 Confirmed
V50  0.9282474  0.9112855 -0.7401568  3.006821 0.01010101  Rejected
V51  4.4947075  4.5465692  2.5353791  6.398363 0.94949495 Confirmed
V52  4.4478706  4.4574139  2.6404774  6.340599 0.95959596 Confirmed
V53  1.1062882  1.2696305 -0.8989411  3.201559 0.01010101  Rejected
V54  2.5583742  2.5540529  0.2077052  4.848674 0.45454545 Tentative
V55  0.7711991  0.6657143 -1.1346994  2.472689 0.00000000  Rejected
V56  0.3004477  0.1389055 -0.5401898  1.767219 0.00000000  Rejected
V57  0.2224416  0.1632599 -0.8937907  1.580834 0.00000000  Rejected
V58  1.1226998  1.2754091 -0.9577494  2.990236 0.02020202  Rejected
V59  2.4163296  2.4651994  0.1615830  4.459343 0.40404040 Tentative
V60  0.6233877  0.4664053 -0.7282950  1.994414 0.00000000  Rejected
plot(normHits~meanZ, col=stats$decision, data=stats)

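Plot with sort=FALSE to keep the attributes in their original order (V1 to V60), which shows which sonar frequency bands are important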

plot(Bor.son,sort=FALSE, las=2, cex.axis=0.75)


packageVersion("Boruta")
## [1] '4.0.0'

efg

2015-03-08 2357