Bioestatística
Prof. Dr. Edson Zangiacomi Martinez
Faculdade de Medicina de Ribeirão Preto
Universidade de São Paulo (USP)
Esta página está em construção!
Todo seu conteúdo não é definitivo...
Regressão logística simples
Regressão logística múltipla
Diagnósticos
Regressão logística simples
Seja o seguinte banco de dados:
ID: Identification Code
LOW: Low Birth Weight (0 = Birth Weight >= 2500g, 1 = Birth Weight < 2500g)
AGE: Age of the Mother in Years
LWT: Weight in Pounds at the Last Menstrual Period
RACE: Race (1 = White, 2 = Black, 3 = Other)
SMOKE: Smoking Status During Pregnancy (1 = Yes, 0 = No)
PTL: History of Premature Labor (0 = None 1 = One, etc.)
HT: History of Hypertension (1 = Yes, 0 = No) HT
UI: Presence of Uterine Irritability (1 = Yes, 0 = No) UI
FTV: Number of Physician Visits During the First Trimester (0 = None, 1 = One, 2 = Two, etc.)
BWT: Birth Weight in Grams BWT
Clicar aqui para obter o banco de dados em Excel.
SOURCE: Hosmer and Lemeshow (2000) Applied Logistic Regression: Second
Edition. These data are copyrighted by John Wiley & Sons Inc. and must
be acknowledged and used accordingly. Data were collected at Baystate
Medical Center, Springfield, Massachusetts during 1986.
# Diretório onde os dados estão salvos
setwd("E:\\Pessoal")
# Leitura do banco de dados
dados <- read.csv2("lowbwt.csv")
# Ou então, use estas linhas para ler os dados do GitHub:
urlfile <- "https://raw.githubusercontent.com/edsonzmartinez/cursoR/main/lowbwt.csv"
dados <- read.csv2(urlfile,head=TRUE)
# Variáveis
names(dados)
[1] "ID" "LOW" "AGE" "LWT" "RACE" "SMOKE" "PTL" "HT" "UI"
[10] "FTV" "BWT"
# Resumos dos dados
summary(dados)
ID LOW AGE LWT
Min. : 4.0 Min. :0.0000 Min. :14.00 Min. : 80.0
1st Qu.: 68.0 1st Qu.:0.0000 1st Qu.:19.00 1st Qu.:110.0
Median :123.0 Median :0.0000 Median :23.00 Median :121.0
Mean :121.1 Mean :0.3122 Mean :23.24 Mean :129.8
3rd Qu.:176.0 3rd Qu.:1.0000 3rd Qu.:26.00 3rd Qu.:140.0
Max. :226.0 Max. :1.0000 Max. :45.00 Max. :250.0
RACE SMOKE PTL HT
Min. :1.000 Min. :0.0000 Min. :0.0000 Min. :0.00000
1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000
Median :1.000 Median :0.0000 Median :0.0000 Median :0.00000
Mean :1.847 Mean :0.3915 Mean :0.1958 Mean :0.06349
3rd Qu.:3.000 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:0.00000
Max. :3.000 Max. :1.0000 Max. :3.0000 Max. :1.00000
UI FTV BWT
Min. :0.0000 Min. :0.0000 Min. : 709
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:2414
Median :0.0000 Median :0.0000 Median :2977
Mean :0.1481 Mean :0.7937 Mean :2945
3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:3475
Max. :1.0000 Max. :6.0000 Max. :4990
# Frequências para a variável RACE
table(dados$RACE)
1 2 3
96 26 67
# Frequências para a variável PTL (partos prematuros)
table(dados$PTL)
0 1 2 3
159 24 5 1
# Criando uma nova variável, PTL.C, que assume
# 0 se não houveram partos prematuros
# 1 se 1 ou mais partos prematuros
dados$PTL.C <- ifelse(dados$PTL==0,0,1)
table(dados$PTL.C)
0 1
159 30
# Frequências para a variável FTV (visitas médicas no primeiro trimestre)
table(dados$FTV)
0 1 2 3 4 6
100 47 30 7 4 1
# Criando uma nova variável, FTV.C, que assume
# 0 se não houveram visitas médicas no primeiro trimestre
# 1 se 1 ou mais visitas médicas no primeiro trimestre
dados$FTV.C <- ifelse(dados$FTV==0,0,1)
table(dados$FTV.C)
0 1
100 89
# A variável dependente é LOW (baixo peso ao nascer)
table(dados$LOW)
0 1
130 59
# Regressão logística
# Uma única variável independente, FTV.C
# função glm(): generalized linear models
model1 <- glm(LOW ~ as.factor(FTV.C),family = binomial(link = 'logit'),data = dados)
model1
Call: glm(formula = LOW ~ as.factor(FTV.C), family = binomial(link = "logit"),
data = dados)
Coefficients:
(Intercept) as.factor(FTV.C)1
-0.5754 -0.4788
Degrees of Freedom: 188 Total (i.e. Null); 187 Residual
Null Deviance: 234.7
Residual Deviance: 232.4 AIC: 236.4
summary(model1)
Call:
glm(formula = LOW ~ as.factor(FTV.C), family = binomial(link = "logit"),
data = dados)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.9448 -0.9448 -0.7733 1.4294 1.6451
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.5754 0.2083 -2.762 0.00575 **
as.factor(FTV.C)1 -0.4788 0.3194 -1.499 0.13389
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 232.39 on 187 degrees of freedom
AIC: 236.39
Number of Fisher Scoring iterations: 4
# O odds ratio (OR) para FTV é estimado por exp(-0.4788)
exp(cbind(coef(model1), confint(model1)))
Waiting for profiling to be done...
2.5 % 97.5 %
(Intercept) 0.5625000 0.3704727 0.8408013
as.factor(FTV.C)1 0.6195286 0.3282280 1.1526472
exp(-0.4788)
[1] 0.6195264
# O pacote epiDisplay é útil para mostrar os resultados da regressão logística
# de uma forma mais apropriada
# Instalando o pacote epiDisplay
install.packages("epiDisplay")
library(epiDisplay)
# Convertendo a variável FTV.C em fator
dados$FTV.factor <- factor(dados$FTV.C)
model2 <- glm(LOW ~ FTV.factor,family = binomial(link = 'logit'),data = dados)
# Usando a função logistic.display() do pacote epiDisplay
logistic.display(model2)
Logistic regression predicting LOW
OR(95%CI) P(Wald's test) P(LR-test)
FTV.factor: 1 vs 0 0.62 (0.33,1.16) 0.134 0.131
Log-likelihood = -116.1969
No. of observations = 189
AIC value = 236.3938
Regressão logística múltipla
model3 <- glm(LOW ~ as.factor(FTV.C) + as.factor(RACE),family = binomial(link = 'logit'),data = dados)
summary(model3)
Call:
glm(formula = LOW ~ as.factor(FTV.C) + as.factor(RACE), family = binomial(link = "logit"),
data = dados)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.1223 -0.8698 -0.6815 1.3434 1.7743
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.9471 0.2906 -3.258 0.00112 **
as.factor(FTV.C)1 -0.3947 0.3271 -1.207 0.22749
as.factor(RACE)2 0.8160 0.4658 1.752 0.07981 .
as.factor(RACE)3 0.5648 0.3535 1.598 0.11005
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 228.19 on 185 degrees of freedom
AIC: 236.19
Number of Fisher Scoring iterations: 4
model4 <- glm(LOW ~ as.factor(HT) + as.factor(UI),family = binomial(link = 'logit'),data = dados)
summary(model4)
Call:
glm(formula = LOW ~ as.factor(HT) + as.factor(UI), family = binomial(link = "logit"),
data = dados)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.3232 -0.7673 -0.7673 1.1774 1.6531
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.0719 0.1879 -5.703 1.17e-08 ***
as.factor(HT)1 1.4084 0.6150 2.290 0.0220 *
as.factor(UI)1 1.0719 0.4221 2.539 0.0111 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 224.32 on 186 degrees of freedom
AIC: 230.32
Number of Fisher Scoring iterations: 4
exp(cbind(coef(model4), confint(model4)))
Waiting for profiling to be done...
2.5 % 97.5 %
(Intercept) 0.3423423 0.2339676 0.489891
as.factor(HT)1 4.0894737 1.2343010 14.545492
as.factor(UI)1 2.9210526 1.2717694 6.737588
# Usando a função logistic.display() do pacote epiDisplay
logistic.display(model4)
OR lower95ci upper95ci Pr(>|Z|)
as.factor(HT)1 4.089474 1.225204 13.649807 0.02200750
as.factor(UI)1 2.921053 1.277126 6.681056 0.01110257
# Convertendo para fatores
dados$HT.factor <- factor(dados$HT)
dados$UI.factor <- factor(dados$UI)
model5 <- glm(LOW ~ HT.factor + UI.factor,family = binomial(link = 'logit'),data = dados)
logistic.display(model5)
Logistic regression predicting LOW
crude OR(95%CI) adj. OR(95%CI) P(Wald's test) P(LR-test)
HT.factor: 1 vs 0 3.37 (1.02,11.09) 4.09 (1.23,13.65) 0.022 0.022
UI.factor: 1 vs 0 2.58 (1.14,5.83) 2.92 (1.28,6.68) 0.011 0.012
Log-likelihood = -112.1603
No. of observations = 189
AIC value = 230.3206
model6 <- glm(LOW ~ as.factor(FTV.C) + as.factor(UI),family = binomial(link = 'logit'),data = dados)
summary(model6)
Call:
glm(formula = LOW ~ as.factor(FTV.C) + as.factor(UI), family = binomial(link = "logit"),
data = dados)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.2522 -0.8823 -0.7298 1.2940 1.7048
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.7426 0.2258 -3.288 0.00101 **
as.factor(FTV.C)1 -0.4442 0.3238 -1.372 0.17008
as.factor(UI)1 0.9168 0.4194 2.186 0.02882 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 227.69 on 186 degrees of freedom
AIC: 233.69
Number of Fisher Scoring iterations: 4
# Incluindo um termo de interação no modelo
model7 <- glm(LOW ~ as.factor(FTV.C) + as.factor(UI) + as.factor(FTV.C):as.factor(UI),
family = binomial(link = 'logit'),data = dados)
summary(model7)
Call:
glm(formula = LOW ~ as.factor(FTV.C) + as.factor(UI) + as.factor(FTV.C):as.factor(UI),
family = binomial(link = "logit"), data = dados)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.3321 -0.8669 -0.7472 1.4224 1.6806
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.7850 0.2367 -3.317 0.00091 ***
as.factor(FTV.C)1 -0.3481 0.3544 -0.982 0.32590
as.factor(UI)1 1.1416 0.5467 2.088 0.03677 *
as.factor(FTV.C)1:as.factor(UI)1 -0.5681 0.8725 -0.651 0.51495
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 227.26 on 185 degrees of freedom
AIC: 235.26
Number of Fisher Scoring iterations: 4