Article, Critical Care

Predictive performance of the SOFA and mSOFA scoring systems for predicting in-hospital mortality in the emergency department

Abstract

Background: The Sequential Organ Failure Assessment (SOFA) score and the modified SOFA (mSOFA) score are risk stratification systems that incorporate the respiratory, coagulation, liver, cardiovascular, renal, and neurologic systems to quantify the overall severity of acute disorders in the intensive care unit.

Objective: To evaluate the prognostic performance of the SOFA and mSOFA scores at arrival for predicting in-hospital mortality in the emergency department (ED).

Methods: Adult patients with an Emergency Severity Index (ESI) of 1-3 presenting to the ED of Imam Reza Hospital in northeastern Iran from March 2016 to March 2017 were included. The predictive performance of the SOFA and mSOFA scores was expressed in terms of overall accuracy (Brier score [BS] and Brier skill score [BSS]), discrimination (area under the receiver operating characteristic curve [AUC]), and calibration.

Results: A total of 2205 patients (mean age 61.8 ± 18.5 years, 53% male) were included. The overall in-hospital mortality was 19%. For SOFA and mSOFA, the BS was 0.209 and 0.192 and the BSS was 0.11 and 0.09, respectively. The estimated AUCs of the SOFA and mSOFA models were 0.751 and 0.739, respectively, with no significant difference between them (P = 0.186). The Hosmer-Lemeshow test did not indicate a significant deviation of the predicted probabilities from the observed ones, and the calibration plots showed good agreement between the observed and predicted probabilities.

Conclusion: The SOFA and mSOFA scores demonstrated fair discrimination and good calibration in predicting in-hospital mortality when applied in the ED. However, further external validation studies are needed before they are used in routine clinical care.

(C) 2018

1. Introduction

Emergency departments (EDs) are high-pressure, often overcrowded units of the hospital service delivery system that treat patients who are highly heterogeneous in disease and severity [1]. Mortality is the most important outcome in the ED, which highlights the need for accurate and reliable triage systems to make optimal use of limited resources and to select patients for early interventions [2]. A patient's death is usually preceded by cumulative deterioration of vital signs due to organ dysfunction, and these vital signs form the basis of most calculations in available prioritization systems [3,4].

* Corresponding author at: Department of Medical Informatics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.

E-mail address: [email protected] (S. Eslami).

Prediction models are potentially useful decision-making tools that help clinicians quantify the level of organ dysfunction by analyzing past observations and extrapolating possible future events with an acceptable error. Several prediction models have been designed to assess organ failure in critical care settings [5-8]. Although these risk stratification systems have shown acceptable predictive performance in the intensive care unit (ICU) [9-11], few studies have evaluated their predictive performance when adapted to the ED [12].

https://doi.org/10.1016/j.ajem.2018.09.011

0735-6757/(C) 2018

The Sequential Organ Failure Assessment (SOFA) score is an objective scoring system that incorporates six organ systems (respiratory, coagulation, liver, cardiovascular, renal, and neurologic), each scored from 0 to 4, yielding an ordinal total score from 0 to 24 in which a higher score reflects a more severe acute disorder [8]. The modified SOFA (mSOFA) is a simplified version of SOFA that requires only one laboratory measurement and has previously been found to be as effective as SOFA for predicting in-hospital mortality among patients admitted to the ICU [13].
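As a rough illustration of how mSOFA simplifies SOFA, the sketch below scores the two components that change (respiratory points from SpO2/FiO2 instead of PaO2/FiO2, and liver points from clinical jaundice instead of serum bilirubin) and adds cardiovascular, neurologic, and renal points, which are assumed here to be scored with the SOFA cut-offs. The platelet-based coagulation component is assumed to be dropped, leaving creatinine as the only laboratory value and a 0 to 20 range. The cut-offs are our reading of Grissom et al. [13], and all function names are hypothetical; this is a sketch to check against the original publication, not a clinical calculator.

```python
def msofa_respiratory(spo2_percent: float, fio2_fraction: float) -> int:
    """Respiratory points from the SpO2/FiO2 ratio (FiO2 given as a fraction, e.g. 0.21)."""
    ratio = spo2_percent / fio2_fraction
    # Assumed mSOFA cut-offs: <=150 -> 4, <=235 -> 3, <=315 -> 2, <=400 -> 1, else 0
    for cutoff, points in ((150, 4), (235, 3), (315, 2), (400, 1)):
        if ratio <= cutoff:
            return points
    return 0


def msofa_liver(jaundice: bool) -> int:
    """Liver points: scleral icterus or clinical jaundice scores 3 (assumed), otherwise 0."""
    return 3 if jaundice else 0


def msofa_total(spo2_percent: float, fio2_fraction: float, jaundice: bool,
                cardiovascular: int, neurologic: int, renal: int) -> int:
    """Sum of five mSOFA components (no coagulation component), range 0-20.

    Cardiovascular (MAP/vasopressors), neurologic (GCS) and renal (creatinine)
    points are passed in already scored, assuming they follow the SOFA cut-offs.
    """
    for name, pts in (("cardiovascular", cardiovascular),
                      ("neurologic", neurologic), ("renal", renal)):
        if not 0 <= pts <= 4:
            raise ValueError(f"{name} points must be between 0 and 4, got {pts}")
    return (msofa_respiratory(spo2_percent, fio2_fraction) + msofa_liver(jaundice)
            + cardiovascular + neurologic + renal)


# Hypothetical patient: SpO2 92% on FiO2 0.40 (ratio about 230), no jaundice,
# MAP < 70 without vasopressors (1 point), GCS 13 (1 point), creatinine 2.1 mg/dL (2 points)
print(msofa_total(92, 0.40, False, cardiovascular=1, neurologic=1, renal=2))  # prints 7
```

Scoring the two mSOFA-specific components separately keeps the shared components in one place, which is convenient when both scores are computed for the same patient, as in this study.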

The present study aims to assess the predictive value of the SOFA and mSOFA scores for in-hospital mortality among patients presenting to the ED.

2. Methods

2.1. Study design

This retrospective study was performed from March 2016 to March 2017 at the ED of Imam Reza Hospital in Mashhad, northeastern Iran. At the time of the study, this hospital was the largest in eastern Iran, and its 60-bed ED received more than 180,000 patient visits annually. The study was approved by the institutional review board of Mashhad University of Medical Sciences (Number: IR.MUMS.fm.REC.1395.16) and conformed to the principles of the Declaration of Helsinki.

2.2. Study population

Patients aged 18 years or older with a higher acuity level at arrival (Emergency Severity Index [ESI] of 1, 2, or 3) were included. Patients who were discharged within 4 h, who died on arrival, or who were admitted because of trauma, a gynecologic disorder, or poisoning were excluded. In addition, patients with at least one missing assessment, patients without identification, and readmitted patients were excluded. Fig. 1 details the patient inclusion process.

2.3. Data collection and study variables

To evaluate the respiratory component of SOFA and mSOFA, the partial pressure of arterial oxygen (PaO2), pulse oxygen saturation (SpO2), and the fraction of inspired oxygen (FiO2) were measured to calculate the PaO2/FiO2 (SOFA) and SpO2/FiO2 (mSOFA) ratios, respectively. Platelet count, bilirubin, creatinine, and urine output were also measured to quantify the function of the coagulation, liver, and renal systems. The mean arterial pressure (MAP) and the doses of vasopressor agents (epinephrine, norepinephrine, dopamine, and dobutamine) were recorded to assess the cardiovascular system, and the Glasgow Coma Scale (GCS) score was recorded to assess the neurologic system of each patient. All covariates except bilirubin were measured in routine clinical care; bilirubin was measured specifically for the study, using the same serum sample, to calculate the liver component in both models. The endpoint used for validating the SOFA and mSOFA scores was all-cause in-hospital mortality after ED admission.
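The way these measurements feed into the six SOFA components can be sketched as below. The cut-offs are the SOFA thresholds as they are commonly tabulated from Vincent et al. [8] (PaO2 in mmHg, FiO2 as a fraction, platelets in 10^3/µL, bilirubin and creatinine in mg/dL, vasopressor doses in µg/kg/min, urine output in mL/day). The function names and the `ventilated` flag are our own assumptions rather than the study's code, and this is an illustrative sketch, not a validated clinical calculator.

```python
from typing import Optional


def respiratory(pao2: float, fio2: float, ventilated: bool) -> int:
    """PaO2 in mmHg and FiO2 as a fraction; 3 or 4 points require respiratory support."""
    ratio = pao2 / fio2
    if ventilated and ratio < 100:
        return 4
    if ventilated and ratio < 200:
        return 3
    if ratio < 300:
        return 2
    return 1 if ratio < 400 else 0


def coagulation(platelets: float) -> int:
    """Platelet count in 10^3/uL."""
    for cutoff, points in ((20, 4), (50, 3), (100, 2), (150, 1)):
        if platelets < cutoff:
            return points
    return 0


def liver(bilirubin: float) -> int:
    """Total bilirubin in mg/dL."""
    for cutoff, points in ((12.0, 4), (6.0, 3), (2.0, 2), (1.2, 1)):
        if bilirubin >= cutoff:
            return points
    return 0


def cardiovascular(map_mmhg: float, dopamine: float = 0.0, dobutamine: float = 0.0,
                   epinephrine: float = 0.0, norepinephrine: float = 0.0) -> int:
    """Vasopressor doses in ug/kg/min; any dobutamine dose scores 2."""
    if dopamine > 15 or epinephrine > 0.1 or norepinephrine > 0.1:
        return 4
    if dopamine > 5 or epinephrine > 0 or norepinephrine > 0:
        return 3
    if dopamine > 0 or dobutamine > 0:
        return 2
    return 1 if map_mmhg < 70 else 0


def neurologic(gcs: int) -> int:
    """Glasgow Coma Scale: 15 scores 0, below 6 scores 4."""
    for cutoff, points in ((6, 4), (10, 3), (13, 2), (15, 1)):
        if gcs < cutoff:
            return points
    return 0


def renal(creatinine: float, urine_ml_per_day: Optional[float] = None) -> int:
    """Creatinine in mg/dL; a low daily urine output can raise the score to 3 or 4."""
    has_uo = urine_ml_per_day is not None
    if creatinine >= 5.0 or (has_uo and urine_ml_per_day < 200):
        return 4
    if creatinine >= 3.5 or (has_uo and urine_ml_per_day < 500):
        return 3
    if creatinine >= 2.0:
        return 2
    return 1 if creatinine >= 1.2 else 0


def sofa_total(pao2: float, fio2: float, ventilated: bool, platelets: float,
               bilirubin: float, map_mmhg: float, gcs: int, creatinine: float,
               urine_ml_per_day: Optional[float] = None, **vasopressors: float) -> int:
    """Sum of the six organ-system scores (0-24); vasopressor doses are passed by name."""
    return (respiratory(pao2, fio2, ventilated) + coagulation(platelets)
            + liver(bilirubin) + cardiovascular(map_mmhg, **vasopressors)
            + neurologic(gcs) + renal(creatinine, urine_ml_per_day))


# Hypothetical patient, not study data
print(sofa_total(pao2=80, fio2=0.40, ventilated=False, platelets=90, bilirubin=1.0,
                 map_mmhg=65, gcs=13, creatinine=2.5))  # prints 8
```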

2.4. Statistical analyses

Two logistic regression models were developed to predict the outcome from the SOFA and mSOFA scores, separately. Predictive performance was assessed in three dimensions. Overall accuracy was assessed using the Brier score (BS), the mean squared difference between the predicted probabilities and the actual outcomes. The BS ranges from 0 to 1, with 0 for a perfect model, 0.25 for a non-informative model that assigns a probability of 0.5 to every patient, and 1 for a model with completely incorrect predictions [14]. In addition, the Brier skill score (BSS) measures the proportional improvement of the SOFA and mSOFA forecasts over a non-informative model that assigns the prior probability (overall prevalence) to all patients; its maximum value of 1 indicates a perfect deterministic forecast [15]. Discrimination between surviving and deceased patients was measured by the area under the receiver operating characteristic curve (AUC), with its confidence interval estimated from 1000 bootstrap samples to correct for optimism. The same method was used to calculate the confidence interval of the difference in AUC between the SOFA and mSOFA models, and the P value for the comparison of the AUCs was calculated using DeLong's method [16]. The Youden index was used to determine the optimal thresholds, at which the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were calculated. Calibration, which gauges the agreement between the observed and predicted outcomes, was evaluated using the Hosmer-Lemeshow test [17]. Because this test has limitations, we also assessed calibration graphically, using calibration plots based on 1000 bootstrap replicates. When all points lie on the 45° (y = x) line, the model fits the data well [18]. The independent-samples t-test, Fisher's exact test, and the chi-square test were used to identify significant differences in baseline characteristics between surviving and deceased patients. All analyses were performed in RStudio using the pROC, Hmisc, rms, and ResourceSelection packages.

Fig. 1. Flowchart of patient selection for inclusion.

Table 1
Baseline demographic and clinical characteristics of patients admitted to the emergency department.

| Characteristic | Dead (N = 426) | Alive (N = 1779) | Total (N = 2205) | P value |
|---|---|---|---|---|
| Age | 67.9 ± 15.9 | 60.4 ± 18.8 | 61.8 ± 18.5 | <0.001 (a) |
| Gender, male | 232 (20%) | 944 (80%) | 1176 (53%) | 0.62 (b) |
| ESI | | | | 0.001 (c) |
| ESI level 1 | 77 (39%) | 121 (61%) | 198 (9%) | |
| ESI level 2 | 199 (24%) | 632 (76%) | 831 (38%) | |
| ESI level 3 | 150 (13%) | 1026 (87%) | 1176 (53%) | |
| PO2 | 91.5 ± 6.9 | 93.7 ± 5.1 | 93.3 ± 5.6 | <0.001 (a) |
| FiO2 | 29.5 ± 15.3 | 24 ± 8.3 | 25.0 ± 10.2 | <0.001 (a) |
| PCO2 | 39.5 ± 18.1 | 39.1 ± 12.4 | 151.8 ± 101 | 0.623 (a) |
| HCO3 | 20.0 ± 8.0 | 22.7 ± 6.0 | 7.4 ± 0.1 | <0.001 (a) |
| SpO2/FiO2 | 359.9 ± 109.2 | 415.5 ± 79.4 | 404.7 ± 88.7 | <0.001 (a) |
| MAP | 89.9 ± 20.8 | 94.9 ± 18.4 | 93.9 ± 19 | <0.001 (a) |
| Urea | 113.2 ± 91.0 | 66.5 ± 59.7 | 75.5 ± 69.3 | <0.001 (a) |
| Creatinine | 2.5 ± 2.5 | 1.9 ± 2.3 | 34.8 ± 8.8 | <0.001 (a) |
| Urine output | 1301.4 ± 405.0 | 1428.2 ± 274.7 | 11.9 ± 13.8 | <0.001 (a) |
| Total bilirubin | 3.4 ± 7.3 | 1.8 ± 3.7 | 4.3 ± 1.0 | <0.001 (a) |
| Platelets | 206.5 ± 143.1 | 226.9 ± 130.4 | 223.0 ± 133 | 0.007 (a) |
| GCS | 13.51 ± 2.15 | 14.6 ± 1.0 | 39.1 ± 13.7 | <0.001 (a) |
| SOFA | 5.0 ± 3.0 | 2.5 ± 2.2 | 3.0 ± 2.5 | <0.001 (a) |
| mSOFA | 3.8 ± 2.6 | 1.7 ± 1.8 | 2.1 ± 2.1 | <0.001 (a) |
| MV | 93 (76%) | 30 (24%) | 123 (6%) | <0.001 (b) |
| EMS | 244 (24%) | 775 (76%) | 1019 (46%) | <0.001 (b) |
| Diabetes mellitus | 114 (20%) | 444 (80%) | 558 (25%) | 0.45 (b) |
| Smoking | 47 (17%) | 229 (83%) | 276 (13%) | 0.32 (b) |
| Icteric | 59 (32%) | 126 (68%) | 185 (8%) | <0.001 (b) |
| Diagnosis | | | | <0.001 (c) |
| Certain infectious and parasitic diseases | 59 (14%) | 117 (7%) | 176 (7%) | |
| Neoplasms, diseases of the blood | 90 (21%) | 193 (11%) | 352 (16%) | |
| Diseases of the circulatory system | 58 (14%) | 221 (12%) | 279 (16%) | |
| Diseases of the respiratory system | 60 (14%) | 236 (13%) | 296 (13%) | |
| Diseases of the digestive system | 79 (19%) | 470 (26%) | 549 (25%) | |
| Diseases of the genitourinary system | 41 (10%) | 194 (11%) | 235 (10%) | |
| Symptoms | 13 (3%) | 132 (7%) | 173 (7%) | |
| Other reasons | 26 (6%) | 216 (12%) | 145 (6%) | |

Abbreviations: ESI, Emergency Severity Index; PaO2, partial pressure of arterial oxygen; FiO2, fraction of inspired oxygen; PCO2, partial pressure of carbon dioxide; HCO3, bicarbonate; SpO2, pulse oxygen saturation; MAP, mean arterial pressure; GCS, Glasgow Coma Scale; SOFA, Sequential Organ Failure Assessment; mSOFA, modified SOFA; MV, mechanical ventilation; EMS, emergency medical services.

Mean ± SD is presented for continuous variables and N (%) for categorical variables.

(a) Analysis by independent-samples t-test.

(b) Analysis by Fisher's exact test.

(c) Analysis by chi-square test.
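The overall-accuracy and calibration measures described in the statistical analyses can be sketched in a few lines. The code below is a minimal illustration, assuming 0/1 outcomes and predicted probabilities as plain lists: the Brier score, the Brier skill score against a prevalence-only forecast, and a Hosmer-Lemeshow statistic computed over equal-size groups of sorted predicted risk. The equal-size grouping is a simplification (the ResourceSelection package forms groups from quantiles of the fitted values), and the upper-tail chi-square probability uses the closed form that holds only for an even number of degrees of freedom, which covers the usual 10 groups and 8 degrees of freedom.

```python
import math
from typing import Sequence, Tuple


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def brier_skill_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """1 - BS / BS_ref, where the reference model gives every patient the prevalence."""
    prevalence = sum(outcomes) / len(outcomes)
    bs_ref = prevalence * (1 - prevalence)  # Brier score of the prevalence-only forecast
    return 1.0 - brier_score(probs, outcomes) / bs_ref


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    term = total = 1.0
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total


def hosmer_lemeshow(probs: Sequence[float], outcomes: Sequence[int],
                    groups: int = 10) -> Tuple[float, int, float]:
    """H-L statistic over equal-size groups of sorted predicted risk; df = groups - 2."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        members = order[g * n // groups:(g + 1) * n // groups]
        observed = sum(outcomes[i] for i in members)
        expected = sum(probs[i] for i in members)
        mean_p = expected / len(members)
        # (O - E)^2 / (n_g * p_g * (1 - p_g)); undefined if a group's mean risk is 0 or 1
        stat += (observed - expected) ** 2 / (len(members) * mean_p * (1 - mean_p))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)


# Upper-tail probability for the mSOFA statistic chi2(8) = 3.9
print(round(chi2_sf_even_df(3.9, 8), 3))  # 0.866
```

The printed value is close to the P value of 0.865 reported for mSOFA in Table 2; the small difference is consistent with the statistic being reported to one decimal place.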

3. Results

A total of 3064 patients meeting the inclusion criteria were enrolled. After the exclusion criteria were applied, 2205 patients remained, with an overall in-hospital mortality of 19% (N = 426) (Table 1). Patients were predominantly elderly (median age 64 years, IQR 50-77, range 18-98), with a fairly equal gender distribution (53% male). Among patients who died, the most frequent admission diagnoses were neoplasms and diseases of the blood (21%), digestive (19%), respiratory (14%), circulatory (14%), and infectious (14%) disorders; these categories may also include patients with sepsis. The frequencies of ESI levels 1, 2, and 3 were 9%, 38%, and 53%, respectively. The mean SOFA and mSOFA scores for all patients were 3.0 ± 2.53 and 2.1 ± 2.13, respectively. About 46% of patients were brought in by emergency medical services, and only 6% required mechanical ventilation. As shown in Table 1, surviving and deceased patients differed significantly in all inspected variables except gender, diabetes mellitus, smoking, and partial pressure of carbon dioxide (PCO2) (P ≥ 0.32).

The linear predictors of the logistic regression models were -2.781 + 0.375 × SOFA and -2.529 + 0.419 × mSOFA, respectively. As shown in Table 2, for SOFA and mSOFA the BS was 0.209 and 0.192 and the BSS was 0.11 and 0.09, respectively. The estimated AUCs of the SOFA and mSOFA models were 0.751 (95% CI 0.735 to 0.769) and 0.739 (95% CI 0.736 to 0.765), respectively, with no significant difference between them (95% CI of the difference -0.027 to 0.006, P = 0.186) (Fig. 2). The best discriminative thresholds for SOFA and mSOFA were 3.5 and 2.5, respectively; further performance indices are reported in Table 3. The Hosmer-Lemeshow test did not show significant deviations between the predicted probabilities and the observed proportions of mortality (SOFA: χ2(8) = 2.5, P = 0.963; mSOFA: χ2(8) = 3.9, P = 0.865). The calibration plots also revealed good agreement between the observed and predicted probabilities (Fig. 3).

Table 2
Performance measures of the SOFA and mSOFA models for predicting in-hospital mortality in the emergency department.

| Model | Brier score | Brier skill score | AUC (95% CI) | H-L test | H-L P value |
|---|---|---|---|---|---|
| SOFA | 0.209 | 0.11 | 0.751 (0.735 to 0.769) | χ2(8) = 2.5 | 0.963 |
| mSOFA | 0.192 | 0.09 | 0.739 (0.736 to 0.765) | χ2(8) = 3.9 | 0.865 |

Overall accuracy: Brier score and Brier skill score; discrimination: AUC; calibration: Hosmer-Lemeshow test.

Abbreviations: AUC, area under the receiver operating characteristic curve; CI, confidence interval; H-L, Hosmer-Lemeshow; SOFA, Sequential Organ Failure Assessment; mSOFA, modified SOFA.

Fig. 2. Receiver operating characteristic curves of the SOFA (AUC 0.751) and mSOFA (AUC 0.739) models in the emergency department.

Table 3
Sensitivity, specificity, PPV, NPV, and accuracy of the SOFA and mSOFA models for predicting in-hospital mortality in the emergency department.

| Model | Threshold | Sensitivity | Specificity | PPV | NPV | Accuracy |
|---|---|---|---|---|---|---|
| SOFA | 3.5 | 0.68 | 0.69 | 0.34 | 0.90 | 0.69 |
| mSOFA | 2.5 | 0.66 | 0.69 | 0.34 | 0.89 | 0.68 |

Abbreviations: PPV, positive predictive value; NPV, negative predictive value; SOFA, Sequential Organ Failure Assessment; mSOFA, modified SOFA.

4. Discussion

4.1. Main findings

Overcrowding and understaffing of EDs have become a topic of discussion, especially in developing countries, where limited resources are a serious problem. Under such circumstances, an accurate and easy-to-use risk stratification scoring system is a sensible way to prioritize the heterogeneous and medically complex patients admitted to the ED. We have shown that the SOFA and mSOFA models have fair to good accuracy for predicting in-hospital mortality when applied to ED patients. Among patients who died in hospital, the mean initial SOFA and mSOFA scores were significantly higher than among survivors (5.0 vs. 2.5 and 3.8 vs. 1.7, respectively).

Furthermore, in addition to the SOFA and mSOFA variables, surviving and deceased patients differed significantly with respect to age, ESI level, bicarbonate (HCO3), urea, need for mechanical ventilation, arrival by emergency medical services, and diagnosis (P < 0.007). These variables might be considered as potential predictors of in-hospital mortality for inclusion in future ED-specific versions of SOFA.

Quantitative performance measures showed that both models performed comparably in predicting in-hospital mortality in the ED (AUC: SOFA 0.751, mSOFA 0.739), which is consistent with previous findings [13]. Because mSOFA requires only one laboratory measurement, it is easier to implement in the time-limited ED setting [19]. At the best thresholds (SOFA 3.5, mSOFA 2.5), sensitivity (SOFA 0.68, mSOFA 0.66) and specificity (0.69 for both) were fair. Moreover, NPVs of 0.89 to 0.90, against a PPV of 0.34, indicate that these models identify survivors better than non-survivors.
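To make these numbers concrete, the sketch below converts a score into a predicted probability with the logistic models reported in the Results (linear predictors -2.781 + 0.375 × SOFA and -2.529 + 0.419 × mSOFA), and shows one way to pick a Youden-optimal cut-off and compute the Table 3 measures from a 2 × 2 classification. Candidate cut-offs are assumed to be midpoints between adjacent observed scores, which is consistent with thresholds such as 3.5 and 2.5 for integer scores. The helper names and the toy data are hypothetical and are not the study's data or code.

```python
import math
from typing import Dict, Sequence

# (intercept, slope) of the two logistic models reported in the Results
SOFA_MODEL = (-2.781, 0.375)
MSOFA_MODEL = (-2.529, 0.419)


def predicted_mortality(score: float, model=SOFA_MODEL) -> float:
    """Inverse logit of the linear predictor intercept + slope * score."""
    intercept, slope = model
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


def _safe_div(a: int, b: int) -> float:
    return a / b if b else float("nan")


def classification_metrics(scores: Sequence[float], died: Sequence[int],
                           threshold: float) -> Dict[str, float]:
    """Scores at or above the threshold are classified as predicted deaths."""
    tp = fp = tn = fn = 0
    for score, y in zip(scores, died):
        predicted_death = score >= threshold
        if predicted_death and y:
            tp += 1
        elif predicted_death:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _safe_div(tp, tp + fn),
        "specificity": _safe_div(tn, tn + fp),
        "ppv": _safe_div(tp, tp + fp),
        "npv": _safe_div(tn, tn + fn),
        "accuracy": (tp + tn) / len(died),
    }


def youden_threshold(scores: Sequence[float], died: Sequence[int]) -> float:
    """Cut-off maximizing J = sensitivity + specificity - 1 over midpoints of observed scores."""
    levels = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(levels, levels[1:])]

    def youden_j(t: float) -> float:
        m = classification_metrics(scores, died, t)
        return m["sensitivity"] + m["specificity"] - 1

    return max(candidates, key=youden_j)


print(round(predicted_mortality(3.5, SOFA_MODEL), 3))   # 0.187
print(round(predicted_mortality(2.5, MSOFA_MODEL), 3))  # 0.185

# Toy data (not study data): perfectly separated integer scores
toy_scores = [0, 1, 2, 3, 4, 5]
toy_died = [0, 0, 0, 1, 1, 1]
print(youden_threshold(toy_scores, toy_died))  # 2.5
```

With the reported coefficients, the cut-offs of 3.5 (SOFA) and 2.5 (mSOFA) both correspond to a predicted in-hospital mortality of about 0.19.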

4.2. Comparison to similar studies

The inability of comorbidities and risk factors (diabetes mellitus and smoking) to differentiate between deceased and surviving patients is consistent with a recent study by Arnold et al. [20], which showed that diabetes and drug abuse are not accurate predictors of in-hospital mortality in patients admitted to the ED with a non-metabolic disorder. Gender has also previously been found not to be a significant risk factor for in-hospital mortality in the emergency setting [21,22]. As shown in Table 4, most similar studies have evaluated the predictive performance of SOFA for in-hospital mortality in the ED among patients with infection [21-25]. In these studies, the AUC of SOFA ranged from 0.68 to 0.78, which is in line with the findings of the present study.

Fig. 3. Smoothed calibration plots of the SOFA (left) and mSOFA (right) models using 1000 bootstrap replicates of 2205 patients admitted to the emergency department.

Table 4
Published evaluation studies of SOFA and mSOFA in the emergency department.

| Study | Year | Country | Patients (N) | Male gender | Age | Common diagnosis | Score, SOFA-mSOFA | AUC, SOFA-mSOFA |
|---|---|---|---|---|---|---|---|---|
| [21] | 2009 | United States | 248 | 48% | 57 ± 16 | Sepsis | 7.1 | 0.75 |
| [23] | 2014 | Australia | 240 | 64% | 56 | Sepsis | 5 | 0.78 |
| [24] | 2016 | Iran | 140 | 46% | 68 ± 18 | Sepsis and infection | 7 | 0.73 |
| [25] | 2016 | China | 477 | 38% | 73 | Infection | 4 | 0.68 |
| [22] | 2017 | Turkey | 200 | 19% | 74 ± 15 | Sepsis | 4 | 0.68 |
| Current study | 2017 | Iran | 2205 | 53% | 62 ± 19 | Case-mix (8% sepsis) | 3-2.1 | 0.75-0.74 |

Abbreviations: SOFA, Sequential Organ Failure Assessment; mSOFA, modified SOFA.

4.3. Limitations and strengths

This study has several limitations. First, because no previously published models provide probabilities based on SOFA and mSOFA scores, we developed logistic regression models and validated them internally. In terms of discrimination, our results effectively amount to an external validation, because a logistic regression model based solely on the SOFA or mSOFA score has the same discrimination ability as the score itself. In terms of calibration, however, the generalizability of the results to new samples may be more limited. Second, the models were evaluated at a single time point, whereas a time-series design with multiple time points would have provided a more complete assessment of the models' behavior over time. Third, the inclusion of patients from a single ED limits the generalizability of the results.
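The argument that a logistic model built on a single score inherits the score's discrimination can be checked directly: the AUC depends only on how deaths and survivors are ranked, and the inverse logit with a positive slope preserves that ranking, including ties. The minimal sketch below computes a rank-based (Mann-Whitney) AUC for a toy score and for the probabilities obtained with the SOFA model coefficients reported in the Results; the data are invented for illustration.

```python
import math
from typing import Sequence


def auc(values: Sequence[float], died: Sequence[int]) -> float:
    """Probability that a random death has a higher value than a random survivor (ties count 1/2)."""
    deaths = [v for v, y in zip(values, died) if y]
    survivors = [v for v, y in zip(values, died) if not y]
    wins = 0.0
    for d in deaths:
        for s in survivors:
            wins += 1.0 if d > s else 0.5 if d == s else 0.0
    return wins / (len(deaths) * len(survivors))


def inverse_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Toy data (not study data): integer scores and in-hospital deaths
scores = [0, 1, 1, 2, 3, 3, 4, 5, 6, 8]
died = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
probs = [inverse_logit(-2.781 + 0.375 * s) for s in scores]

print(auc(scores, died), auc(probs, died))  # 0.8 0.8
```

Calibration, by contrast, depends on the fitted intercept and slope, which is why the concern about generalizability applies to calibration rather than discrimination.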

5. Conclusions

The SOFA and mSOFA scores had comparable, fair discrimination and good calibration in predicting in-hospital mortality when applied to all-cause emergency admissions. However, using these models for individual patients in clinical practice is not yet warranted and would require external validation and impact assessment.

Conflicts of interest and source of funding

No conflicts of interest have been declared by the authors. This study was part of the first author's MSc thesis, which was supported by a grant from the Mashhad University of Medical Sciences Research Council (Khorasan Razavi, Mashhad, Iran; Number: 941594).

References

  1. Eitel DR, Rudkin SE, Malvehy MA, Killeen JP, Pines JM. Improving service quality by understanding emergency department flow: a white paper and position statement prepared for the American Academy of Emergency Medicine. J Emerg Med 2010;38:70-9.
  2. Fromm Jr RE, Gibbs LR, McCallum WG, et al. Critical care in the emergency department: a time-based study. Crit Care Med 1993;21:970-6.
  3. Buist M, Bernard S, Nguyen TV, Moore G, Anderson J. Association between clinically abnormal observations and subsequent in-hospital mortality: a prospective study. Resuscitation 2004;62:137-41.
  4. Kause J, Smith G, Prytherch D, Parr M, Flabouris A, Hillman K. A comparison of antecedents to cardiac arrests, deaths and emergency intensive care admissions in Australia and New Zealand, and the United Kingdom–the ACADEMIA study. Resuscitation 2004;62:275-82.
  5. Zimmerman JE, Kramer AA, McNair DS, Malila FM. Acute physiology and chronic health evaluation (APACHE) IV: hospital mortality assessment for today's critically ill patients. Crit Care Med 2006;34:1297-310.
  6. Moreno RP, Metnitz PG, Almeida E, et al. SAPS 3–from evaluation of the patient to evaluation of the intensive care unit. Part 2: development of a prognostic model for hospital mortality at ICU admission. Intensive Care Med 2005;31:1345-55.
  7. Le Gall JR, Klar J, Lemeshow S, et al. The Logistic Organ Dysfunction system. A new way to assess organ dysfunction in the intensive care unit. ICU Scoring Group. JAMA 1996;276:802-10.
  8. Vincent JL, Moreno R, Takala J, et al. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the working group on sepsis-related problems of the European Society of Intensive Care Medicine. Intensive Care Med 1996;22:707-10.
  9. Varghese YE, Kalaiselvan MS, Renuka MK, Arunkumar AS. Comparison of acute physiology and chronic health evaluation II (APACHE II) and acute physiology and chronic health evaluation IV (APACHE IV) severity of illness scoring systems, in a multidisciplinary ICU. J Anaesthesiol Clin Pharmacol 2017;33:248-53.
  10. Metnitz PG, Moreno RP, Almeida E, et al. SAPS 3–from evaluation of the patient to evaluation of the intensive care unit. Part 1: objectives, methods and cohort description. Intensive Care Med 2005;31:1336-44.
  11. Minne L, Abu-Hanna A, de Jonge E. Evaluation of SOFA-based models for predicting mortality in the ICU: a systematic review. Crit Care 2008;12:R161.
  12. Jones AE, Fitch MT, Kline JA. Operational performance of validated physiologic scoring systems for predicting in-hospital mortality among critically ill emergency department patients. Crit Care Med 2005;33:974-8.
  13. Grissom CK, Brown SM, Kuttler KG, et al. A modified sequential organ failure assessment score for critical care triage. Disaster Med Public Health Prep 2010;4:277-84.
  14. Brier GW. Verification of forecasts expressed in terms of probability. Mon Weather Rev 1950;78:1-3.
  15. Murphy AH. A new vector partition of the probability score. J Appl Meteorol 1973;12:595-600.
  16. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837-45.
  17. Hosmer DW, Hosmer T, Le Cessie S, Lemeshow S. A comparison of goodness-of-fit tests for the logistic regression model. Stat Med 1997;16:965-80.
  18. Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology 2010;21:128-38.
  19. Rubinson L, Knebel A, Hick JL. MSOFA: an important step forward, but are we spending too much time on the SOFA? Disaster Med Public Health Prep 2010;4:270-2.
  20. Arnold RC, Sherwin R, Shapiro NI, et al. Multicenter observational study of the development of progressive organ dysfunction and therapeutic interventions in normotensive sepsis patients in the emergency department. Acad Emerg Med 2013;20:433-40.
  21. Jones AE, Trzeciak S, Kline JA. The Sequential Organ Failure Assessment score for predicting outcome in patients with severe sepsis and evidence of hypoperfusion at the time of emergency department presentation. Crit Care Med 2009;37:1649-54.
  22. Gunes Ozaydin M, Guneysel O, Saridogan F, Ozaydin V. Are scoring systems sufficient for predicting mortality due to sepsis in the emergency department? Turk J Emerg Med 2017;17:25-8.
  23. Macdonald SP, Arendts G, Fatovich DM, Brown SG. Comparison of PIRO, SOFA, and MEDS scores for predicting mortality in emergency department patients with severe sepsis and septic shock. Acad Emerg Med 2014;21:1257-63.
  24. Safari S, Shojaee M, Rahmati F, et al. Accuracy of SOFA score in prediction of 30-day outcome of critically ill patients. Turk J Emerg Med 2016;16:146-50.
  25. Wang J-Y, Chen Y-X, Guo S-B, Mei X, Yang P. Predictive performance of quick Sepsis-related Organ Failure Assessment for mortality and ICU admission in patients with infection at the ED. Am J Emerg Med 2016;34:1788-93.