top of page

Regressão logística simples

Regressão logística múltipla

Diagnósticos

Regressão logística simples

Seja o seguinte banco de dados:

ID: Identification Code

LOW: Low Birth Weight (0 = Birth Weight >= 2500g, 1 = Birth Weight < 2500g)

AGE: Age of the Mother in Years

LWT: Weight in Pounds at the Last Menstrual Period

RACE: Race (1 = White, 2 = Black, 3 = Other)

SMOKE: Smoking Status During Pregnancy (1 = Yes, 0 = No)

PTL: History of Premature Labor (0 = None 1 = One, etc.)

HT: History of Hypertension (1 = Yes, 0 = No) HT

UI: Presence of Uterine Irritability (1 = Yes, 0 = No) UI

FTV: Number of Physician Visits During the First Trimester (0 = None, 1 = One, 2 = Two, etc.)

BWT: Birth Weight in Grams BWT

Clicar aqui para obter o banco de dados em Excel.

SOURCE: Hosmer and Lemeshow (2000) Applied Logistic Regression: Second

Edition. These data are copyrighted by John Wiley & Sons Inc. and must

be acknowledged and used accordingly. Data were collected at Baystate

Medical Center, Springfield, Massachusetts during 1986.

# Diretório onde os dados estão salvos

setwd("E:\\Pessoal")

# Leitura do banco de dados

dados <- read.csv2("lowbwt.csv")

 

# Ou então, use estas linhas para ler os dados do GitHub:

urlfile <- "https://raw.githubusercontent.com/edsonzmartinez/cursoR/main/lowbwt.csv"

dados   <- read.csv2(urlfile,head=TRUE)


# Variáveis

names(dados)

 

 [1] "ID"    "LOW"   "AGE"   "LWT"   "RACE"  "SMOKE" "PTL"   "HT"    "UI"   
[10] "FTV"   "BWT"


# Resumos dos dados

summary(dados)

       ID             LOW              AGE             LWT       
Min.   :  4.0   Min.   :0.0000   Min.   :14.00   Min.   : 80.0  
1st Qu.: 68.0   1st Qu.:0.0000   1st Qu.:19.00   1st Qu.:110.0  
Median :123.0   Median :0.0000   Median :23.00   Median :121.0  
Mean   :121.1   Mean   :0.3122   Mean   :23.24   Mean   :129.8  
3rd Qu.:176.0   3rd Qu.:1.0000   3rd Qu.:26.00   3rd Qu.:140.0  
Max.   :226.0   Max.   :1.0000   Max.   :45.00   Max.   :250.0  
     RACE           SMOKE             PTL               HT         
Min.   :1.000   Min.   :0.0000   Min.   :0.0000   Min.   :0.00000  
1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000  
Median :1.000   Median :0.0000   Median :0.0000   Median :0.00000  
Mean   :1.847   Mean   :0.3915   Mean   :0.1958   Mean   :0.06349  
3rd Qu.:3.000   3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.00000  
Max.   :3.000   Max.   :1.0000   Max.   :3.0000   Max.   :1.00000  
      UI              FTV              BWT      
Min.   :0.0000   Min.   :0.0000   Min.   : 709  
1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:2414  
Median :0.0000   Median :0.0000   Median :2977  
Mean   :0.1481   Mean   :0.7937   Mean   :2945  
3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:3475  
Max.   :1.0000   Max.   :6.0000   Max.   :4990
 

# Frequências para a variável RACE

table(dados$RACE)

 1  2  3
96 26 67

# Frequências para a variável PTL (partos prematuros)

table(dados$PTL)

  0   1   2   3
159  24   5   1

# Criando uma nova variável, PTL.C, que assume

# 0 se não houveram partos prematuros

# 1 se 1 ou mais partos prematuros

dados$PTL.C <- ifelse(dados$PTL==0,0,1)


table(dados$PTL.C)

  0   1
159  30

# Frequências para a variável FTV (visitas médicas no primeiro trimestre)

table(dados$FTV)


  0   1   2   3   4   6
100  47  30   7   4   1


# Criando uma nova variável, FTV.C, que assume

# 0 se não houveram visitas médicas no primeiro trimestre

# 1 se 1 ou mais visitas médicas no primeiro trimestre

dados$FTV.C <- ifelse(dados$FTV==0,0,1)
 

table(dados$FTV.C)

  0   1
100  89

# A variável dependente é LOW (baixo peso ao nascer)

table(dados$LOW)


  0   1
130  59

 

# Regressão logística

 

# Uma única variável independente, FTV.C

# função glm(): generalized linear models

 

model1 <- glm(LOW ~ as.factor(FTV.C),family = binomial(link = 'logit'),data = dados)

model1

Call:  glm(formula = LOW ~ as.factor(FTV.C), family = binomial(link = "logit"),
   data = dados)

Coefficients:
     (Intercept)  as.factor(FTV.C)1  
         -0.5754            -0.4788  

Degrees of Freedom: 188 Total (i.e. Null);  187 Residual
Null Deviance:      234.7
Residual Deviance: 232.4        AIC: 236.4

summary(model1)

Call:
glm(formula = LOW ~ as.factor(FTV.C), family = binomial(link = "logit"),
   data = dados)

Deviance Residuals:
   Min       1Q   Median       3Q      Max  
-0.9448  -0.9448  -0.7733   1.4294   1.6451  

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)   
(Intercept)        -0.5754     0.2083  -2.762  0.00575 **
as.factor(FTV.C)1  -0.4788     0.3194  -1.499  0.13389   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

   Null deviance: 234.67  on 188  degrees of freedom
Residual deviance: 232.39  on 187  degrees of freedom
AIC: 236.39

Number of Fisher Scoring iterations: 4

 

# O odds ratio (OR) para FTV é estimado por exp(-0.4788)


exp(cbind(coef(model1), confint(model1)))


Waiting for profiling to be done...
                               2.5 %    97.5 %
(Intercept)       0.5625000 0.3704727 0.8408013
as.factor(FTV.C)1 0.6195286 0.3282280 1.1526472


exp(-0.4788)

 

[1] 0.6195264

# O pacote epiDisplay é útil para mostrar os resultados da regressão logística

# de uma forma mais apropriada

# Instalando o pacote epiDisplay

install.packages("epiDisplay")

 

library(epiDisplay)

# Convertendo a variável FTV.C em fator

dados$FTV.factor <- factor(dados$FTV.C)

model2 <- glm(LOW ~ FTV.factor,family = binomial(link = 'logit'),data = dados)

# Usando a função logistic.display() do pacote epiDisplay

logistic.display(model2)

Logistic regression predicting LOW
 
                  OR(95%CI)         P(Wald's test) P(LR-test)
FTV.factor: 1 vs 0 0.62 (0.33,1.16)  0.134          0.131     
                                                             
Log-likelihood = -116.1969
No. of observations = 189
AIC value = 236.3938

 

 

 

Regressão logística múltipla

model3 <- glm(LOW ~ as.factor(FTV.C) + as.factor(RACE),family = binomial(link = 'logit'),data = dados)

summary(model3)

Call:
glm(formula = LOW ~ as.factor(FTV.C) + as.factor(RACE), family = binomial(link = "logit"),
   data = dados)

Deviance Residuals:
   Min       1Q   Median       3Q      Max  
-1.1223  -0.8698  -0.6815   1.3434   1.7743  

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)   
(Intercept)        -0.9471     0.2906  -3.258  0.00112 **
as.factor(FTV.C)1  -0.3947     0.3271  -1.207  0.22749   
as.factor(RACE)2    0.8160     0.4658   1.752  0.07981 .
as.factor(RACE)3    0.5648     0.3535   1.598  0.11005   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

   Null deviance: 234.67  on 188  degrees of freedom
Residual deviance: 228.19  on 185  degrees of freedom
AIC: 236.19

Number of Fisher Scoring iterations: 4


 

model4 <- glm(LOW ~ as.factor(HT) + as.factor(UI),family = binomial(link = 'logit'),data = dados)
 

summary(model4)

Call:
glm(formula = LOW ~ as.factor(HT) + as.factor(UI), family = binomial(link = "logit"),
   data = dados)

Deviance Residuals:
   Min       1Q   Median       3Q      Max  
-1.3232  -0.7673  -0.7673   1.1774   1.6531  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -1.0719     0.1879  -5.703 1.17e-08 ***
as.factor(HT)1   1.4084     0.6150   2.290   0.0220 *  
as.factor(UI)1   1.0719     0.4221   2.539   0.0111 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

   Null deviance: 234.67  on 188  degrees of freedom
Residual deviance: 224.32  on 186  degrees of freedom
AIC: 230.32

Number of Fisher Scoring iterations: 4

exp(cbind(coef(model4), confint(model4)))

Waiting for profiling to be done...
                            2.5 %    97.5 %
(Intercept)    0.3423423 0.2339676  0.489891
as.factor(HT)1 4.0894737 1.2343010 14.545492
as.factor(UI)1 2.9210526 1.2717694  6.737588

 

# Usando a função logistic.display() do pacote epiDisplay

logistic.display(model4)
 
                    OR lower95ci upper95ci   Pr(>|Z|)
as.factor(HT)1 4.089474  1.225204 13.649807 0.02200750
as.factor(UI)1 2.921053  1.277126  6.681056 0.01110257

 

# Convertendo para fatores

 

dados$HT.factor <- factor(dados$HT)
dados$UI.factor <- factor(dados$UI)

model5 <- glm(LOW ~ HT.factor + UI.factor,family = binomial(link = 'logit'),data = dados)

 

 

logistic.display(model5)

Logistic regression predicting LOW
 
                 crude OR(95%CI)    adj. OR(95%CI)     P(Wald's test) P(LR-test)
HT.factor: 1 vs 0 3.37 (1.02,11.09)  4.09 (1.23,13.65)  0.022          0.022     
                                                                                
UI.factor: 1 vs 0 2.58 (1.14,5.83)   2.92 (1.28,6.68)   0.011          0.012     
                                                                                
Log-likelihood = -112.1603
No. of observations = 189
AIC value = 230.3206

 

model6 <- glm(LOW ~ as.factor(FTV.C) + as.factor(UI),family = binomial(link = 'logit'),data = dados)
 

summary(model6)

Call:
glm(formula = LOW ~ as.factor(FTV.C) + as.factor(UI), family = binomial(link = "logit"),
   data = dados)

Deviance Residuals:
   Min       1Q   Median       3Q      Max  
-1.2522  -0.8823  -0.7298   1.2940   1.7048  

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)   
(Intercept)        -0.7426     0.2258  -3.288  0.00101 **
as.factor(FTV.C)1  -0.4442     0.3238  -1.372  0.17008   
as.factor(UI)1      0.9168     0.4194   2.186  0.02882 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

   Null deviance: 234.67  on 188  degrees of freedom
Residual deviance: 227.69  on 186  degrees of freedom
AIC: 233.69

Number of Fisher Scoring iterations: 4

 

# Incluindo um termo de interação no modelo

model7 <- glm(LOW ~ as.factor(FTV.C) + as.factor(UI) + as.factor(FTV.C):as.factor(UI),

  family = binomial(link = 'logit'),data = dados)
 

summary(model7)

 

Call:
glm(formula = LOW ~ as.factor(FTV.C) + as.factor(UI) + as.factor(FTV.C):as.factor(UI),
   family = binomial(link = "logit"), data = dados)

Deviance Residuals:
   Min       1Q   Median       3Q      Max  
-1.3321  -0.8669  -0.7472   1.4224   1.6806  

Coefficients:
                                Estimate Std. Error z value Pr(>|z|)    
(Intercept)                       -0.7850     0.2367  -3.317  0.00091 ***
as.factor(FTV.C)1                 -0.3481     0.3544  -0.982  0.32590    
as.factor(UI)1                     1.1416     0.5467   2.088  0.03677 *  
as.factor(FTV.C)1:as.factor(UI)1  -0.5681     0.8725  -0.651  0.51495    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

   Null deviance: 234.67  on 188  degrees of freedom
Residual deviance: 227.26  on 185  degrees of freedom
AIC: 235.26

Number of Fisher Scoring iterations: 4

bottom of page