The broom package

2017/10/18


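The broom package converts the output of statistical models into tidy data frames. The results of an analysis can then be handled like any other data.frame, which makes downstream work much easier. See the package vignette: https://cran.r-project.org/web/packages/broom/vignettes/broom.html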
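A minimal sketch of that advantage follows; it is not taken from the vignette. It fits the same regression separately for each cylinder count and stacks the tidy() results into one data frame. The grouping variable cyl is only an illustrative choice, and dplyr::bind_rows() is assumed for the stacking step.

```r
library(broom)
library(dplyr)

# Split mtcars by number of cylinders and fit mpg ~ wt within each group
fits <- lapply(split(mtcars, mtcars$cyl), function(d) lm(mpg ~ wt, data = d))

# tidy() turns each model into a data frame; bind_rows() stacks them and
# records the list names ("4", "6", "8") in a new `cyl` column
coefs_by_cyl <- bind_rows(lapply(fits, tidy), .id = "cyl")

# Ordinary data-frame verbs now work on model results, e.g. keep only the slopes
slopes <- filter(coefs_by_cyl, term == "wt")
print(slopes)
```

Because every model ends up as rows of a single table, sorting, filtering, or plotting the coefficients needs no model-specific code.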

# packages
library(tidyverse)
library(broom)
library(beginr) 

Example

The data set used here is datasets::mtcars. Its first six rows and its descriptive statistics are shown below.

head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
broom::tidy(mtcars)
column  n       mean          sd  median     trimmed         mad    min     max   range       skew   kurtosis         se
mpg    32  20.090625   6.0269481  19.200  19.6961538   5.4114900 10.400  33.900  23.500  0.6106550 -0.3727660  1.0654240
cyl    32   6.187500   1.7859216   6.000   6.2307692   2.9652000  4.000   8.000   4.000 -0.1746119 -1.7621198  0.3157093
disp   32 230.721875 123.9386938 196.300 222.5230769 140.4763500 71.100 472.000 400.900  0.3816570 -1.2072119 21.9094727
hp     32 146.687500  68.5628685 123.000 141.1923077  77.0952000 52.000 335.000 283.000  0.7260237 -0.1355511 12.1203173
drat   32   3.596563   0.5346787   3.695   3.5792308   0.7042350  2.760   4.930   2.170  0.2659039 -0.7147006  0.0945187
wt     32   3.217250   0.9784574   3.325   3.1526923   0.7672455  1.513   5.424   3.911  0.4231465 -0.0227108  0.1729685
qsec   32  17.848750   1.7869432  17.710  17.8276923   1.4158830 14.500  22.900   8.400  0.3690453  0.3351142  0.3158899
vs     32   0.437500   0.5040161   0.000   0.4230769   0.0000000  0.000   1.000   1.000  0.2402577 -2.0019376  0.0890983
am     32   0.406250   0.4989909   0.000   0.3846154   0.0000000  0.000   1.000   1.000  0.3640159 -1.9247414  0.0882100
gear   32   3.687500   0.7378041   4.000   3.6153846   1.4826000  3.000   5.000   2.000  0.5288545 -1.0697507  0.1304266
carb   32   2.812500   1.6152000   2.000   2.6538462   1.4826000  1.000   8.000   7.000  1.0508738  1.2570431  0.2855297
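Several columns of this table can be reproduced with base R, which helps when reading it. The sketch below does so for mpg. The definitions it uses are inferred from the printed numbers rather than taken from the broom documentation: a 10% trimmed mean, the MAD scaled by 1.4826, and se = sd / sqrt(n).

```r
# Recompute some columns of broom::tidy(mtcars) for mpg with base R
# (assumed definitions, checked against the printed table)
x <- mtcars$mpg
n <- length(x)

stats_mpg <- c(
  n       = n,
  mean    = mean(x),
  sd      = sd(x),
  median  = median(x),
  trimmed = mean(x, trim = 0.1),  # drops floor(0.1 * n) = 3 values from each end
  mad     = mad(x),               # median absolute deviation, scaled by 1.4826
  range   = max(x) - min(x),
  se      = sd(x) / sqrt(n)       # standard error of the mean
)
print(round(stats_mpg, 7))
```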

The relationship between car weight (wt) and fuel economy (mpg) can be plotted as follows.

beginr::plotlm(x = mtcars$wt, y = mtcars$mpg, xlab = 'wt', ylab = 'mpg', plot.title = 'mtcars')

## [[1]]
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 37.285126   1.877627 19.857575 8.241799e-19
## x           -5.344472   0.559101 -9.559044 1.293959e-10
## 
## [[2]]
## [1] 0.7528328
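The list printed by beginr::plotlm() appears to hold two things: the coefficient table and the R-squared of an ordinary least-squares regression of y on x. Assuming that is what it computes, the same numbers can be read from an lm fit in base R. Only the row label differs (wt instead of x).

```r
# Assumption: plotlm() fits y ~ x by OLS; reproduce its printed list with lm()
fit <- lm(mpg ~ wt, data = mtcars)

coef_table <- coef(summary(fit))  # columns: Estimate, Std. Error, t value, Pr(>|t|)
r_squared  <- summary(fit)$r.squared

print(coef_table)
print(r_squared)
```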
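# scatter plot of mpg against wt; each point is labelled with the car name
p <- ggplot(mtcars, aes(wt, mpg, label = rownames(mtcars)))

p +
  geom_point() +
  geom_text(vjust = 0, nudge_y = 0.5) +                             # car name just above each point
  geom_text(aes(label = paste0(wt, "^(", mpg, ")")), parse = TRUE)  # "wt^(mpg)" drawn as a plotmath expression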
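The fitted regression line can be drawn on the same scatter plot with geom_smooth() and method = "lm". This fits y ~ x inside ggplot2 and shades a confidence band. It is a sketch added for illustration, not part of the original workflow.

```r
library(ggplot2)

# Scatter plot of mpg against wt with the least-squares line and its confidence band
p_fit <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)

print(p_fit)
```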

lmfit <- stats::lm(formula = mpg ~ wt, data = mtcars)
lmfit
## 
## Call:
## stats::lm(formula = mpg ~ wt, data = mtcars)
## 
## Coefficients:
## (Intercept)           wt  
##      37.285       -5.344
summary(lmfit)
## 
## Call:
## stats::lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
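The headline numbers in this summary can be recomputed from the residuals, which shows where they come from:

- The residual standard error is sqrt(RSS / df).
- R-squared is 1 - RSS / TSS.
- With a single predictor, F = (TSS - RSS) / (RSS / df).

A minimal check in R:

```r
lmfit <- lm(mpg ~ wt, data = mtcars)

rss    <- sum(residuals(lmfit)^2)                  # residual sum of squares
tss    <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total sum of squares
df_res <- df.residual(lmfit)                       # n - 2 = 30

sigma_hat <- sqrt(rss / df_res)                    # residual standard error
r_squared <- 1 - rss / tss                         # multiple R-squared
f_stat    <- ((tss - rss) / 1) / (rss / df_res)    # 1 numerator df: one predictor

print(c(sigma = sigma_hat, r.squared = r_squared, F = f_stat))
```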
broom::tidy(lmfit)
term         estimate std.error statistic      p.value
(Intercept) 37.285126  1.877627 19.857575 8.241799e-19
wt          -5.344472  0.559101 -9.559044 1.293959e-10
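For lm objects, tidy() can also attach confidence intervals through its conf.int and conf.level arguments. A short sketch comparing them with base R's confint():

```r
library(broom)

lmfit <- lm(mpg ~ wt, data = mtcars)

# One row per coefficient, with conf.low / conf.high columns added
coefs <- tidy(lmfit, conf.int = TRUE, conf.level = 0.95)
print(coefs)

# The same intervals from base R, for comparison
ci <- confint(lmfit, level = 0.95)
print(ci)
```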
broom::augment(lmfit)
.rownames            mpg    wt   .fitted   .se.fit     .resid      .hat   .sigma   .cooksd .std.resid
Mazda RX4           21.0 2.620 23.282611 0.6335798 -2.2826106 0.0432690 3.067494 0.0132741 -0.7661677
Mazda RX4 Wag       21.0 2.875 21.919770 0.5714319 -0.9197704 0.0351968 3.093068 0.0017240 -0.3074305
Datsun 710          22.8 2.320 24.885952 0.7359177 -2.0859521 0.0583757 3.072127 0.0154394 -0.7057525
Hornet 4 Drive      21.4 3.215 20.102650 0.5384424  1.2973499 0.0312502 3.088268 0.0030206  0.4327511
Hornet Sportabout   18.7 3.440 18.900144 0.5526562 -0.2001440 0.0329218 3.097722 0.0000760 -0.0668188
Valiant             18.1 3.460 18.793254 0.5552829 -0.6932545 0.0332355 3.095184 0.0009211 -0.2314831
Duster 360          14.3 3.570 18.205363 0.5734244 -3.9053627 0.0354426 3.008664 0.0313139 -1.3055222
Merc 240D           24.4 3.190 20.236262 0.5386565  4.1637381 0.0312750 2.996697 0.0311392  1.3888971
Merc 230            22.8 3.150 20.450041 0.5397522  2.3499593 0.0314024 3.066058 0.0099619  0.7839269
Merc 280            19.2 3.440 18.900144 0.5526562  0.2998560 0.0329218 3.097435 0.0001706  0.1001080
Merc 280C           17.8 3.440 18.900144 0.5526562 -1.1001440 0.0329218 3.090979 0.0022962 -0.3672871
Merc 450SE          16.4 4.070 15.533127 0.7191881  0.8668731 0.0557518 3.093520 0.0025325  0.2928865
Merc 450SL          17.3 3.730 17.350247 0.6100029 -0.0502472 0.0401086 3.097938 0.0000059 -0.0168379
Merc 450SLC         15.2 3.780 17.083024 0.6236291 -1.8830236 0.0419205 3.077286 0.0087273 -0.6315997
Cadillac Fleetwood  10.4 5.250  9.226650 1.2576087  1.1733496 0.1704766 3.088702 0.0183826  0.4229607
Lincoln Continental 10.4 5.424  8.296712 1.3461693  2.1032876 0.1953319 3.067203 0.0719252  0.7697987
Chrysler Imperial   14.7 5.345  8.718926 1.3058069  5.9810744 0.1837942 2.843585 0.5319056  2.1735331
Fiat 128            32.4 2.200 25.527289 0.7831923  6.8727113 0.0661166 2.802362 0.1929858  2.3349021
Honda Civic         30.4 1.615 28.653805 1.0451849  1.7461954 0.1177498 3.078657 0.0248603  0.6103569
Toyota Corolla      33.9 1.835 27.478021 0.9418946  6.4219792 0.0956265 2.832808 0.2598750  2.2170827
Toyota Corona       21.5 2.465 24.111004 0.6832345 -2.6110037 0.0503168 3.057740 0.0204982 -0.8796401
Dodge Challenger    15.5 3.520 18.472586 0.5644203 -2.9725862 0.0343383 3.046600 0.0175365 -0.9931363
AMC Javelin         15.2 3.435 18.926866 0.5520329 -3.7268663 0.0328476 3.016967 0.0262873 -1.2441801
Camaro Z28          13.3 3.840 16.762355 0.6412083 -3.4623553 0.0443172 3.027336 0.0313496 -1.1627910
Pontiac Firebird    19.2 3.845 16.735633 0.6427306  2.4643670 0.0445279 3.062374 0.0159643  0.8277197
Fiat X1-9           27.3 1.935 26.943574 0.8965906  0.3564263 0.0866487 3.097178 0.0007112  0.1224441
Porsche 914-2       26.0 2.140 25.847957 0.8078823  0.1520430 0.0703510 3.097814 0.0001014  0.0517719
Lotus Europa        30.4 1.513 29.198941 1.0944578  1.2010593 0.1291136 3.088720 0.0132349  0.4225427
Ford Pantera L      15.8 3.170 20.343151 0.5390886 -4.5431513 0.0313252 2.977005 0.0371361 -1.5154971
Ferrari Dino        19.7 2.770 22.480940 0.5936730 -2.7809399 0.0379899 3.052884 0.0171095 -0.9308693
Maserati Bora       15.0 3.570 18.205363 0.5734244 -3.2053627 0.0354426 3.038092 0.0210945 -1.0715194
Volvo 142E          21.4 2.780 22.427495 0.5913398 -1.0274952 0.0376919 3.091840 0.0023159 -0.3438822
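augment() returns one row per observation, so regression diagnostics become ordinary data-frame operations. The sketch below does two things:

- It checks that .cooksd agrees with stats::cooks.distance().
- It finds the most influential cars, assuming augment() keeps the row order of mtcars.

The 4 / n cutoff is a common rule of thumb, not a broom feature.

```r
library(broom)

lmfit <- lm(mpg ~ wt, data = mtcars)
aug   <- augment(lmfit)

# .cooksd should match base R's Cook's distance
cooks_match <- isTRUE(all.equal(unname(aug$.cooksd), unname(cooks.distance(lmfit))))

# Rows come back in mtcars order, so the row names can be reused
most_influential <- rownames(mtcars)[which.max(aug$.cooksd)]

# Rule-of-thumb flag (assumption): Cook's distance above 4 / n
flagged <- rownames(mtcars)[aug$.cooksd > 4 / nrow(mtcars)]

print(most_influential)
print(flagged)
```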
broom::glance(lmfit)
r.squared adj.r.squared    sigma statistic      p.value df    logLik      AIC      BIC deviance df.residual
0.7528328     0.7445939 3.045882  91.37533 1.293959e-10  2 -80.01471 166.0294 170.4266 278.3219          30
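glance() returns a one-row data frame per model, so comparing candidate models amounts to stacking rows. In the sketch below, the second model (mpg ~ wt + hp) is a hypothetical alternative chosen only for illustration.

```r
library(broom)
library(dplyr)

models <- list(
  wt    = lm(mpg ~ wt, data = mtcars),
  wt_hp = lm(mpg ~ wt + hp, data = mtcars)  # hypothetical alternative model
)

# One row per model; the list names become a `model` column
comparison <- bind_rows(lapply(models, glance), .id = "model")
print(select(comparison, model, r.squared, adj.r.squared, AIC, BIC))
```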
