Publications by year
In Press
Ukoumunne OC, Hyde C, Ozolins M, Zhelev Z, Errington S, Taylor RS, Benton C, Moody J, Cocking L, Watson J, et al (In Press). A directly comparative two-gate case-control diagnostic accuracy study of the pure tone screen and HearCheck Screener tests for identifying hearing impairment in school children.
BMJ Open
Zhelev Z, Ohtake H, Iwata M, Terasawa T, Rogers M, Peters J, Hyde C (In Press). Diagnostic accuracy of contemporary and high-sensitivity cardiac troponin assays used in serial testing, versus single-sample testing as a comparator, to triage patients suspected of acute non-ST-segment elevation myocardial infarction: a systematic review protocol.
BMJ Open
Abstract:
Diagnostic accuracy of contemporary and high-sensitivity cardiac troponin assays used in serial testing, versus single-sample testing as a comparator, to triage patients suspected of acute non-ST-segment elevation myocardial infarction: a systematic review protocol
Introduction: Although the new generation of cardiac troponin assays has revolutionised the diagnosis of myocardial infarction (MI), their application in triaging patients with suspected acute coronary syndrome requires further investigation. The objectives of the current systematic review are to evaluate the diagnostic accuracy of contemporary and high-sensitivity cardiac troponin assays used in serial testing, versus single-sample testing as a comparator, to identify patients with non-ST-segment elevation MI in the emergency department (ED).
Methods and analysis: We will conduct systematic searches of MEDLINE, EMBASE, Science Citation Index, the Cochrane Database of Systematic Reviews and the CENTRAL database covering the period from 1 January 2006 to the present, with no restrictions on language or publication status. Two review authors will independently screen studies for inclusion, extract data from eligible studies and assess their methodological quality using the Quality Assessment of Diagnostic Accuracy Studies version 2 (QUADAS-2) tool. Studies will be included if they evaluate contemporary or high-sensitivity cardiac troponin assays used in serial testing in patients presenting to the ED with suspected MI. Estimates of sensitivity and specificity from each study will be presented in forest plots and in receiver operating characteristic (ROC) space. If appropriate, we will pool the results using Bayesian hierarchical models that allow correction for an imperfect reference standard. We will obtain summary estimates of sensitivity and specificity for alternative testing protocols and compare their accuracy. We will also investigate the impact of prespecified sources of heterogeneity and of methodological quality items. If pooling of results is considered inappropriate, we will present our findings in tables and diagrams and describe them narratively.
Ethics and dissemination: No formal ethical approval will be sought, but we will report on the ethical approval of the included studies. Findings will be disseminated through publications in peer-reviewed journals, presentations at conferences and the websites of the universities.
Hardwick RJL, Heaton J, Griffiths G, Vaidya B, Child S, Fleming S, Hamilton WT, Tomlinson J, Zhelev Z, Patterson A, et al (In Press). Exploring reasons for variation in ordering thyroid function tests in primary care: a qualitative study.
Quality in Primary Care,
22, 256-261.
Abstract:
Exploring reasons for variation in ordering thyroid function tests in primary care: a qualitative study
Background: The ordering of thyroid function tests (TFTs) is increasing, but there has been no similar increase in thyroid disorders in the general population, leading some to question whether inappropriate testing is taking place. Inconsistent clinical practice is thought to be a cause of this, but there is little evidence on the views of general practitioners, practice nurses or practice managers on the reasons for variation in the ordering of TFTs. Aim: To find out the reasons for variation in the ordering of TFTs from the perspective of primary healthcare professionals. Methods: Fifteen semi-structured interviews were carried out with primary healthcare professionals (general practitioners, practice nurses, practice managers) who used one laboratory of a general hospital in South West England for TFTs. Framework Analysis was used to analyse views on test ordering variation at the societal, practice, individual practitioner and patient levels. Results: A number of reasons for variation in ordering across practices were suggested. These related to: primary healthcare professionals' awareness of and adherence to national policy changes; practices having different protocols on TFT ordering; the set-up and use of computer systems in practices; the range of practice healthcare professionals able to order TFTs; greater risk aversion amongst general practitioners and changes in their training; and, finally, how primary healthcare staff responded to patients who were perceived to seek help more readily than in the past. Conclusion: The reasons for variation in TFT ordering are complex and interdependent. Interventions to reduce variation in TFT ordering need to consider multiple behavioural and contextual factors to be most effective.
2023
Zhelev Z, Peters J, Rogers M, Allen M, Kijauskaite G, Seedat F, Wilkinson E, Hyde C (2023). Test accuracy of artificial intelligence-based grading of fundus images in diabetic retinopathy screening: a systematic review.
J Med Screen,
30(3), 97-112.
Abstract:
Test accuracy of artificial intelligence-based grading of fundus images in diabetic retinopathy screening: a systematic review.
OBJECTIVES: To systematically review the accuracy of artificial intelligence (AI)-based systems for grading of fundus images in diabetic retinopathy (DR) screening. METHODS: We searched MEDLINE, EMBASE, the Cochrane Library and ClinicalTrials.gov from 1st January 2000 to 27th August 2021. Accuracy studies published in English were included if they met the pre-specified inclusion criteria. Selection of studies for inclusion, data extraction and quality assessment were conducted by one author, with a second reviewer independently screening and checking 20% of titles. Results were analysed narratively. RESULTS: Forty-three studies evaluating 15 deep learning (DL) and 4 machine learning (ML) systems were included. Nine systems were evaluated in a single study each. Most studies were judged to be at high or unclear risk of bias in at least one QUADAS-2 domain. Sensitivity for referable DR and higher grades was ≥85% while specificity varied and was
2022
Thompson G, Zhelev Z, Hunt H, Hyde C (2022). It was not easy to identify the study design from the title and abstract of articles indexed as diagnostic (test) accuracy studies in EMBASE in 2012 and 2019.
J Clin Epidemiol,
144, 102-110.
Abstract:
It was not easy to identify the study design from the title and abstract of articles indexed as diagnostic (test) accuracy studies in EMBASE in 2012 and 2019.
OBJECTIVE: To quantify the use of shorthand descriptions of research design in the titles and abstracts of diagnostic test accuracy studies, comparing 2012 and 2019. STUDY DESIGN AND SETTING: Joint examination by two investigators, using pre-specified criteria, of 320 randomly sampled articles indexed as "diagnostic (test) accuracy studies" in EMBASE in 2012 and 2019. RESULTS: The percentage of abstracts with shorthand descriptions of study design was 11% in 2012 and 15% in 2019, a difference of 4% (95% CI -3, 12). Although use of the term accuracy in the abstract did increase (58% in 2012 to 74% in 2019, difference 16% (95% CI 5, 26)), accuracy was used to convey purpose or design in only 49% (95% CI 43, 56) of abstracts where the term appeared (2012+2019). CONCLUSION: It is difficult to identify the study design of test evaluations from information in the title and abstract. This is important because bias is associated with different study designs. Developing a limited number of standardised, widely understood study design descriptions could greatly improve the clarity of the only freely available information on many pieces of medical research. It may also help if shorthand descriptions state that a study addresses test accuracy.
Ohtake H, Terasawa T, Zhelev Z, Iwata M, Rogers M, Peters JL, Hyde C (2022). Serial high-sensitivity cardiac troponin testing for the diagnosis of myocardial infarction: a scoping review.
BMJ Open,
12(11), e066429.
Abstract:
Serial high-sensitivity cardiac troponin testing for the diagnosis of myocardial infarction: a scoping review
Objectives: We aimed to assess the diversity and practices of existing studies on several assays and algorithms for serial measurements of high-sensitivity cardiac troponin (hs-cTn) for risk stratification and the diagnosis of myocardial infarction (MI) and 30-day outcomes in patients suspected of having non-ST-segment elevation MI (NSTEMI). Methods: We searched multiple databases, including MEDLINE, EMBASE, Science Citation Index, the Cochrane Database of Systematic Reviews and CENTRAL, for studies published between January 2006 and November 2021. Studies that assessed the diagnostic accuracy of serial hs-cTn testing in patients suspected of having NSTEMI in the emergency department (ED) were eligible. Data were analysed using the scoping review method. Results: We included 86 publications, mainly from research centres in Europe, North America and Australasia. Two hs-cTn assays, manufactured by Abbott (43/86) and Roche (53/86), dominated the evaluations. The studies most commonly measured hs-cTn concentrations at two time points, at presentation and a few hours thereafter, to assess two-strata or three-strata algorithms for diagnosing or ruling out MI. Although data from 83 studies (97%) were prospectively collected, in the 53 studies (62%) that reported relevant data, 0%–90% of eligible patients were excluded from the analysis because of missing blood samples or the lack of a final diagnosis. Only 19 studies (22%) reported head-to-head comparisons of alternative assays. Conclusion: Evidence on the accuracy of serial hs-cTn testing was largely derived from selected research institutions and relied on two specific assays. The proportions of eligible patients excluded from the studies raise concerns about directly applying the findings to clinical practice in frontline EDs. PROSPERO registration number: CRD42018106379.
Allen M, James C, Frost J, Liabo K, Pearn K, Monks T, Zhelev Z, Logan S, Everson R, James M, et al (2022). Using simulation and machine learning to maximise the benefit of intravenous thrombolysis in acute stroke in England and Wales: the SAMueL modelling and qualitative study.
Health and Social Care Delivery Research,
10(31), 1-148.
Abstract:
Using simulation and machine learning to maximise the benefit of intravenous thrombolysis in acute stroke in England and Wales: the SAMueL modelling and qualitative study
Background: Stroke is a common cause of adult disability. Expert opinion is that about 20% of patients should receive thrombolysis to break up the clot causing the stroke. Currently, 11–12% of patients in England and Wales receive this treatment, ranging from 2% to 24% across hospitals.
Objectives: We sought to enhance the national stroke audit by providing further analysis of the key sources of inter-hospital variation, to determine how a target of 20% of stroke patients receiving thrombolysis may be reached.
Design: We modelled three aspects of the thrombolysis pathway using machine learning and clinical pathway simulation. In addition, the project had a qualitative research arm, with the objective of understanding clinicians' attitudes to the use of modelling and machine learning applied to the national stroke audit.
Participants and data source: Anonymised data were collected for 246,676 emergency stroke admissions to acute stroke teams in England and Wales between 2016 and 2018, obtained from the Sentinel Stroke National Audit Programme.
Results: Use of thrombolysis could be predicted with 85% accuracy for those patients with a chance of receiving thrombolysis (i.e. those arriving within 4 hours of stroke onset). Machine learning models allowed prediction of the likely treatment choice for each patient at every hospital. A clinical pathway simulation predicted hospital thrombolysis use with an average absolute error of 0.5 percentage points. We found that about half of the inter-hospital variation in thrombolysis use came from differences in local patient populations, and half from in-hospital processes and decision-making. Three changes were applied to all hospitals in the model: (1) arrival to treatment in 30 minutes, (2) proportion of patients with determined stroke onset times set to at least the national upper quartile and (3) thrombolysis decisions made based on the majority vote of a benchmark set of 30 hospitals. Any single change alone was predicted to increase national thrombolysis use from 11.6% to between 12.3% and 14.5% (with clinical decision-making having the most effect). Combined, these changes would be expected to increase thrombolysis to 18.3% (and to double the clinical benefit of thrombolysis, as speed increases also improve clinical benefit independently of the proportion of patients receiving thrombolysis); however, there would still be significant variation between hospitals depending on the local patient population. For each hospital, the effect of each change could be predicted alone or in combination. Qualitative research with 19 clinicians showed that engagement with, and trust in, the model was greatest among physicians from units with higher thrombolysis rates. Physicians also wanted a machine learning model that predicts outcomes, including the probability of adverse effects of thrombolysis, to counter the fear that driving up thrombolysis use may cause more harm than good.
Limitations: Models could be built using only data available in the Sentinel Stroke National Audit Programme. Not all factors affecting the use of thrombolysis are contained in Sentinel Stroke National Audit Programme data; the model therefore provides information on patterns of thrombolysis use in hospitals, but is not suitable for, or intended as, a decision aid for thrombolysis.
Conclusions: Machine learning and clinical pathway simulation may be applied at scale to national audit data, allowing extended use and analysis of audit data. Stroke thrombolysis rates of at least 18% look achievable in England and Wales, but each hospital should have its own target.
Future work: Future studies should extend machine learning modelling to predict patient-level outcomes and the probability of adverse effects of thrombolysis, and should apply co-production techniques, with clinicians and other stakeholders, to communicate model outputs.
Funding: This project was funded by the National Institute for Health and Care Research (NIHR) Health and Social Care Delivery Research programme and will be published in full in Health and Social Care Delivery Research; Vol. 10, No. 31. See the NIHR Journals Library website for further project information.
2021
Thompson G, Zhelev Z, Peters J, Khalid S, Briscoe S, Shaw L, Nunns M, Ludman S, Hyde C (2021). Symptom scores in the diagnosis of pediatric cow's milk protein allergy: a systematic review.
Pediatr Allergy Immunol,
32(7), 1497-1507.
Abstract:
Symptom scores in the diagnosis of pediatric cow's milk protein allergy: a systematic review.
BACKGROUND: Cow's milk protein allergy (CMPA) is an immune-mediated allergic response to proteins in milk that is common in infants. The broad range of CMPA symptoms makes diagnosis a challenge, particularly in primary care. Symptom scores may improve a clinician's awareness of symptoms, indicating a need for further testing. This systematic review examined the development and evaluation of such symptom scores for use in infants. METHODS: CENTRAL, MEDLINE, EMBASE and CINAHL databases were searched from inception to 3 December 2019 (updated 14 November 2020) for diagnostic accuracy studies, randomised controlled trials, observational studies, economic evaluations, qualitative studies and studies reporting development of the tools. Data were not suitable for meta-analysis due to clinical and methodological heterogeneity, so were narratively synthesised. RESULTS: We found two symptom scores, evaluated in one and fourteen studies, respectively. Estimated sensitivity ranged from 37% to 98% and specificity from 38% to 93%. The evaluations of each tool were at high risk of bias or failed to address issues such as clinical and cost-effectiveness. CONCLUSIONS: Estimates of accuracy of symptom scores for CMPA offered so far should be interpreted cautiously. Rigorous, conflict-free research based on well-defined roles for the tools is urgently required.
2020
Thompson G, Zhelev Z, Peters J, Khalid S, Briscoe S, Shaw L, Nunns M, Ludman S, Hyde C (2020). A comprehensive evaluation of symptom scores designed to inform the triage and diagnosis of cow’s milk protein allergy in children: a systematic review of the research evidence.
Rakshasbhuvankar AA, Nagarajan L, Zhelev Z, Rao SC (2020). Amplitude-integrated electroencephalography compared with conventional video-electroencephalography for detection of neonatal seizures.
Cochrane Database of Systematic Reviews,
2020(3).
Abstract:
Amplitude-integrated electroencephalography compared with conventional video-electroencephalography for detection of neonatal seizures
This is a protocol for a Cochrane Review (Diagnostic test accuracy). The objectives are as follows. Our primary objective is to assess the accuracy of amplitude-integrated electroencephalography (aEEG) against the reference standard of conventional video-electroencephalography (cEEG) for detection of 'neonates with seizures' and 'individual seizures'. Detection of 'neonates with seizures' refers to the ability of the index test to correctly identify a neonate as 'seizure positive' or 'seizure negative' based on the detection of at least one seizure episode in the entire aEEG recording of the neonate. Detection of 'individual seizures' refers to the ability of the index test to correctly identify each individual seizure episode within the same neonate, rather than just classifying the neonate as 'seizure positive' or 'seizure negative'. Diagnosis of individual seizure episodes is important for optimal management of seizures. If data are available, we will perform a subgroup analysis of seizure detection where the duration of monitoring is six hours or less. This subgroup is particularly important because six hours is the cut-off point for deciding whether infants with hypoxic ischaemic encephalopathy require therapeutic hypothermia (Shankaran 2005).
Mallett S, Allen AJ, Graziadio S, Taylor S, Sakai NS, Green K, Suklan J, Hyde C, Shinkins B, Zhelev Z, et al (2020). At what times during infection is SARS-CoV-2 detectable and no longer detectable using RT-PCR based tests?: a systematic review of individual participant data.
Mallett S, Allen AJ, Graziadio S, Taylor SA, Sakai NS, Green K, Suklan J, Hyde C, Shinkins B, Zhelev Z, et al (2020). At what times during infection is SARS-CoV-2 detectable and no longer detectable using RT-PCR-based tests? a systematic review of individual participant data.
BMC Medicine,
18(1).
Abstract:
At what times during infection is SARS-CoV-2 detectable and no longer detectable using RT-PCR-based tests? a systematic review of individual participant data
Background: Tests for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) viral ribonucleic acid (RNA) using reverse transcription polymerase chain reaction (RT-PCR) are pivotal to detecting current coronavirus disease (COVID-19) and to determining the duration of detectable virus, which indicates potential infectivity. Methods: We conducted an individual participant data (IPD) systematic review of longitudinal studies of RT-PCR test results in symptomatic SARS-CoV-2. We searched PubMed, LitCOVID, medRxiv, and COVID-19 Living Evidence databases. We assessed risk of bias using a QUADAS-2 adaptation. Outcomes were the percentage of positive test results by time and the duration of detectable virus, by anatomical sampling site. Results: Of 5078 studies screened, we included 32 studies with 1023 SARS-CoV-2 infected participants and 1619 test results, from −6 to 66 days post-symptom onset and hospitalisation. The highest percentage of virus detection was from nasopharyngeal sampling between 0 and 4 days post-symptom onset, at 89% (95% confidence interval (CI) 83 to 93), dropping to 54% (95% CI 47 to 61) after 10 to 14 days. On average, the duration of detectable virus was longer with lower respiratory tract (LRT) sampling than with upper respiratory tract (URT) sampling. The duration of faecal and respiratory tract virus detection varied greatly within individual participants. In some participants, virus was still detectable at 46 days post-symptom onset. Conclusions: RT-PCR fails to detect some people with SARS-CoV-2 infection; early sampling minimises false-negative diagnoses. Beyond 10 days post-symptom onset, LRT or faecal sampling may be preferred. The included studies are open to substantial risk of bias, so the positivity rates are probably overestimated.
2019
Walker GB, Zhelev Z, Henschke N, Fridhandler J, Yip S (2019). Prehospital Stroke Scales as Screening Tools for Early Identification of Stroke and Transient Ischemic Attack.
Stroke,
50(10), e285-e286.
Zhelev Z, Walker G, Henschke N, Fridhandler J, Yip S (2019). Prehospital stroke scales as screening tools for early identification of stroke and transient ischemic attack.
Cochrane Database Syst Rev,
4(4).
Abstract:
Prehospital stroke scales as screening tools for early identification of stroke and transient ischemic attack.
BACKGROUND: Rapid and accurate detection of stroke by paramedics or other emergency clinicians at the time of first contact is crucial for timely initiation of appropriate treatment. Several stroke recognition scales have been developed to support initial triage. However, their accuracy remains uncertain and there is no agreement on which of the scales performs better. OBJECTIVES: To systematically identify and review the evidence on the test accuracy of validated stroke recognition scales, as used in a prehospital or emergency room (ER) setting to screen people suspected of having stroke. SEARCH METHODS: We searched CENTRAL, MEDLINE (Ovid), Embase (Ovid) and the Science Citation Index to 30 January 2018. We handsearched the reference lists of all included studies and other relevant publications and contacted experts in the field to identify additional studies or unpublished data. SELECTION CRITERIA: We included studies evaluating the accuracy of stroke recognition scales used in a prehospital or ER setting to identify stroke and transient ischemic attack (TIA) in people suspected of stroke. The scales had to be applied to actual people and the results compared to a final diagnosis of stroke or TIA. We excluded studies that applied scales to patient records, enrolled only screen-positive participants or lacked complete 2 × 2 data. DATA COLLECTION AND ANALYSIS: Two review authors independently conducted a two-stage screening of all publications identified by the searches, extracted data and assessed the methodologic quality of the included studies using a tailored version of QUADAS-2. A third review author acted as an arbiter. We recalculated study-level sensitivity and specificity with 95% confidence intervals (CI), and presented them in forest plots and in receiver operating characteristic (ROC) space.
When a sufficient number of studies reported the accuracy of the test in the same setting (prehospital or ER) and the level of heterogeneity was relatively low, we pooled the results using the bivariate random-effects model. We plotted the results in summary ROC (SROC) space, presenting a summary point (mean sensitivity and specificity) with 95% CI and prediction regions. Because of the small number of studies, we did not conduct meta-regression to investigate between-study heterogeneity and the relative accuracy of the scales. Instead, we summarized the results in tables and diagrams, and presented our findings narratively. MAIN RESULTS: We selected 23 studies for inclusion (22 journal articles and one conference abstract). We evaluated the following scales: Cincinnati Prehospital Stroke Scale (CPSS; 11 studies), Recognition of Stroke in the Emergency Room (ROSIER; eight studies), Face Arm Speech Time (FAST; five studies), Los Angeles Prehospital Stroke Scale (LAPSS; five studies), Melbourne Ambulance Stroke Scale (MASS; three studies), Ontario Prehospital Stroke Screening Tool (OPSST; one study), Medic Prehospital Assessment for Code Stroke (MedPACS; one study) and PreHospital Ambulance Stroke Test (PreHAST; one study). Nine studies compared the accuracy of two or more scales. We considered 12 studies at high risk of bias and one with applicability concerns in the patient selection domain; 14 at unclear risk of bias and one with applicability concerns in the reference standard domain; and the risk of bias in the flow and timing domain was high in one study and unclear in another 16. We pooled the results from five studies evaluating ROSIER in the ER and five studies evaluating LAPSS in a prehospital setting. The studies included in the meta-analysis of ROSIER were of relatively good methodologic quality and produced a summary sensitivity of 0.88 (95% CI 0.84 to 0.91), with the prediction interval ranging from approximately 0.75 to 0.95.
This means that the test will miss on average 12% of people with stroke/TIA, which, depending on the circumstances, could range from 5% to 25%. We could not obtain a reliable summary estimate of specificity due to extreme heterogeneity in study-level results. The summary sensitivity of LAPSS was 0.83 (95% CI 0.75 to 0.89) and the summary specificity 0.93 (95% CI 0.88 to 0.96). However, we were uncertain about the validity of these results, as four of the studies were at high and one at unclear risk of bias. We did not report summary estimates for the rest of the scales, as the number of studies per test per setting was small, the risk of bias was high or uncertain, the results were highly heterogeneous, or a combination of these. Studies comparing two or more scales in the same participants reported that ROSIER and FAST had similar accuracy when used in the ER. In the field, CPSS was more sensitive than MedPACS and LAPSS, but had similar sensitivity to that of MASS; and MASS was more sensitive than LAPSS. In contrast, MASS, ROSIER and MedPACS were more specific than CPSS; and the difference in the specificities of MASS and LAPSS was not statistically significant. AUTHORS' CONCLUSIONS: In the field, CPSS consistently had the highest sensitivity and, therefore, should be preferred to other scales. Further evidence is needed to determine its absolute accuracy and whether alternative scales, such as MASS and ROSIER, which might have comparable sensitivity but higher specificity, should be used instead to achieve better overall accuracy. In the ER, ROSIER should be the test of choice, as it was evaluated in more studies than FAST and showed consistently high sensitivity. In a cohort of 100 people of whom 62 have stroke/TIA, the test will miss on average seven people with stroke/TIA (ranging from three to 16). We were unable to obtain an estimate of its summary specificity.
Because of the small number of studies per test per setting, high risk of bias, substantial differences in study characteristics and large between-study heterogeneity, these findings should be treated as provisional hypotheses that need further verification in better-designed studies.
Olsen M, Zhelev Z, Hunt H, Peters JL, Bossuyt P, Hyde C (2019). Use of test accuracy study design labels in NICE's diagnostic guidance.
Diagn Progn Res,
3.
Abstract:
Use of test accuracy study design labels in NICE's diagnostic guidance.
BACKGROUND: A variety of study designs are available to evaluate the accuracy of tests, but the terms used to describe these designs seem to lack clarity and standardization. We investigated whether this was the case in the diagnostic guidance of the National Institute for Health and Care Excellence (NICE), an influential source of advice on the value of tests. OBJECTIVES: To describe the range of study design terms and labels used to distinguish study designs in NICE Diagnostic Guidance and the underlying evidence reports. METHODS: We examined all NICE Diagnostic Guidance developed from its inception in 2011 until 2018 and the corresponding diagnostic assessment reports that summarized the evidence, focusing on guidance where tests were considered for diagnosis. We abstracted the labels used to describe study designs and investigated which labels were used, in relevant sections, when studies were weighted differently because of their design (in terms of validity of evidence). We conducted a descriptive analysis to assess the range of labels and also categorized labels by design features. RESULTS: Of a total of 36 pieces of guidance, 20 (56%) were eligible and 17 (47%) were included in our analysis. We identified 53 unique design labels, of which 19 (36%) were specific to diagnostic test accuracy designs. These referred to a total of 12 study design features. Labels were used in assigning different weights to studies in seven of the reports (41%) but never in the guidance documents. CONCLUSION: Our study confirms a lack of clarity and standardization of test accuracy study design terms. There seems to be scope to reduce and harmonize the number of terms while still capturing the design features that were deemed influential by those compiling the evidence reports. This should help decision makers quickly identify subgroups of included studies that should be weighted differently because their designs are more susceptible to bias.
2018
Rachuba S, Salmon A, Zhelev Z, Pitt M (2018). Redesigning the diagnostic pathway for chest pain patients in emergency departments.
Health Care Management Science,
21(2), 177-191.
Abstract:
Redesigning the diagnostic pathway for chest pain patients in emergency departments
Patients presenting with chest pain at an emergency department in the United Kingdom receive troponin tests to assess the likelihood of an acute myocardial infarction (AMI). Until recently, serial testing with two blood samples separated by at least six hours was necessary in order to analyse the change in troponin levels over time. New high-sensitivity troponin tests, however, allow the inter-test time to be shortened from six to three hours. Recent evidence also suggests that the new generation of troponin tests can be used to rule out AMI on the basis of a single test if patients at low risk of AMI present with very low cardiac troponin levels more than three hours after onset of worst pain. This paper presents a discrete event simulation model to assess the likely impact on the number of hospital admissions if emergency departments adopt strategies for serial and single testing based on the use of high-sensitivity troponin. Data sets from acute trusts in the South West of England are used to quantify the resulting benefits.
2017
Chisnell J, Marshall T, Hyde C, Zhelev Z, Fleming LE (2017). A content analysis of the representation of statins in the British newsprint media.
BMJ Open,
7(8).
Abstract:
A content analysis of the representation of statins in the British newsprint media.
OBJECTIVE: This study reviewed the news media coverage of statins, seeking to identify specific trends or differences in viewpoint between media outlets and to examine common themes. DESIGN: The study is a content analysis of the frequency and content of the reporting of statins in a selection of the British newsprint media. It involved an assessment of the number, timing and thematic content of articles, followed by a discourse analysis examining the underlying narratives. The sample was the output of four UK newspapers, covering a broad-spectrum readership, over a six-month period from 1 October 2013 to 31 March 2014. RESULTS: A total of 67 articles included reference to statins. The majority (39, 58%) were reporting or responding to publication of a clinical study. The ratio of negative to positive coverage was greater than 2:1 overall. In the more politically right-leaning newspapers, 67% of coverage was predominantly negative (30/45 articles), compared with 32% in the more left-leaning papers (7/22 articles). Common themes were the perceived 'medicalisation' of the population; the balance between lifestyle modification and medical treatments in the primary prevention of heart disease; side effects and effectiveness of statins; pharmaceutical sponsorship and implications for the reliability of evidence; and trust between the public and government, institutions, research organisations and the medical profession. CONCLUSIONS: Newsprint media coverage of statins was substantially influenced by the publication of national guidance and by medical journal coverage of clinical studies and comment. Statins received a predominantly negative portrayal, notably in the more right-leaning press.
There were shared themes: concern about the balance between medication and lifestyle change in the primary prevention of heart disease; the adverse effects of treatment; and a questioning of the reliability of evidence from research institutions, scientists and clinicians in the light of their potential allegiances and funding.
Walker GB, Zhelev Z, Fridhandler J, Henschke N, Yip S (2017). Abstract TP252: Prehospital Stroke Scales as a Tool for Early Identification of Stroke and Transient Ischemic Attacks: a Cochrane Systematic Review. Stroke, 48(suppl_1).
2016
Fortnum H, Ukoumunne OC, Hyde C, Taylor RS, Ozolins M, Errington S, Zhelev Z, Pritchard C, Benton C, Moody J, et al (2016). A programme of studies including assessment of diagnostic accuracy of school hearing screening tests and a cost-effectiveness model of school entry hearing screening programmes. Health Technol Assess, 20(36), 1-178.
Abstract:
BACKGROUND: Identification of permanent hearing impairment at the earliest possible age is crucial to maximise the development of speech and language. Universal newborn hearing screening identifies the majority of the 1 in 1000 children born with a hearing impairment, but later onset can occur at any time and there is no optimum time for further screening. A universal but non-standardised school entry screening (SES) programme is in place in many parts of the UK, but its value is questioned. OBJECTIVES: To evaluate the diagnostic accuracy of hearing screening tests and the cost-effectiveness of the SES programme in the UK. DESIGN: Systematic review, case-control diagnostic accuracy study, comparison of routinely collected data for services with and without a SES programme, parental questionnaires, observation of practical implementation and cost-effectiveness modelling. SETTING: Second- and third-tier audiology services; community. PARTICIPANTS: Children aged 4-6 years and their parents. MAIN OUTCOME MEASURES: Diagnostic accuracy of two hearing screening devices, referral rate and source, yield, age at referral and cost per quality-adjusted life-year. RESULTS: The review of diagnostic accuracy studies concluded that research to date demonstrates marked variability in the design, methodological quality and results. The pure-tone screen (PTS) (Amplivox, Eynsham, UK) and HearCheck (HC) screener (Siemens, Frimley, UK) devices had high sensitivity (PTS ≥ 89%, HC ≥ 83%) and specificity (PTS ≥ 78%, HC ≥ 83%) for identifying hearing impairment. The rate of referral for hearing problems was 36% lower with SES (Nottingham) relative to no SES (Cambridge) [rate ratio 0.64, 95% confidence interval (CI) 0.59 to 0.69; p
Zhelev Z, Patel K, Youngman E, Peters J, Lowe J, Cooper C, Shields B, Hattersley A, Hyde C (2016). Autoantibody status as a predictor of future insulin deficiency in patients with diabetes: a systematic review.
Zhelev Z, Abbott R, Rogers M, Fleming S, Patterson A, Hamilton WT, Heaton J, Thompson Coon J, Vaidya B, Hyde C, et al (2016). Effectiveness of interventions to reduce ordering of thyroid function tests: a systematic review. BMJ Open, 6(6).
Abstract:
OBJECTIVES: To evaluate the effectiveness of behaviour changing interventions targeting ordering of thyroid function tests. DESIGN: Systematic review. DATA SOURCES: MEDLINE, EMBASE and the Cochrane Database up to May 2015. ELIGIBILITY CRITERIA FOR SELECTING STUDIES: We included studies evaluating the effectiveness of behaviour change interventions aiming to reduce ordering of thyroid function tests. Randomised controlled trials (RCTs), non-randomised controlled studies and before and after studies were included. There were no language restrictions. STUDY APPRAISAL AND SYNTHESIS METHODS: Two reviewers independently screened all records identified by the electronic searches and reviewed the full text of any deemed potentially relevant. Study details were extracted from the included papers and their methodological quality assessed independently using a validated tool. Disagreements were resolved through discussion and arbitration by a third reviewer. Meta-analysis was not used. RESULTS: Twenty-seven studies (28 papers) were included. They evaluated a range of interventions including guidelines/protocols, changes to funding policy, education, decision aids, reminders and audit/feedback; often intervention types were combined. The most common outcome measured was the rate of test ordering, but the effect on appropriateness, test ordering patterns and cost were also measured. Four studies were RCTs. The majority of the studies were of poor or moderate methodological quality. The interventions were variable and poorly reported. Only 4 studies reported unsuccessful interventions, but there was no clear pattern to link effect and intervention type or other characteristics. CONCLUSIONS: The results suggest that behaviour change interventions are effective, particularly in reducing the volume of thyroid function tests.
However, due to the poor methodological quality and reporting of the studies, the likely presence of publication bias and the questionable relevance of some interventions to current day practice, we are unable to draw strong conclusions or recommend the implementation of specific intervention types. Further research is thus justified. TRIAL REGISTRATION NUMBER: CRD42014006192.
Rachuba S, Salmon A, Zhelev Z, Pitt M (2016). Simulating single test rule-out strategies for chest pain patients at emergency departments.
2015
Zhelev Z, Hyde C, Youngman E, Rogers M, Fleming S, Slade T, Coelho H, Jones-Hughes T, Nikolaou V (2015). Diagnostic accuracy of single baseline measurement of Elecsys Troponin T high-sensitive assay for diagnosis of acute myocardial infarction in emergency department: systematic review and meta-analysis. BMJ, 350.
Abstract:
OBJECTIVE: To obtain summary estimates of the accuracy of a single baseline measurement of the Elecsys Troponin T high-sensitive assay (Roche Diagnostics) for the diagnosis of acute myocardial infarction in patients presenting to the emergency department. DESIGN: Systematic review and meta-analysis of diagnostic test accuracy studies. DATA SOURCES: Medline, Embase, and other relevant electronic databases were searched for papers published between January 2006 and December 2013. STUDY SELECTION: Studies were included if they evaluated the diagnostic accuracy of a single baseline measurement of Elecsys Troponin T high-sensitive assay for the diagnosis of acute myocardial infarction in patients presenting to the emergency department with suspected acute coronary syndrome. STUDY APPRAISAL AND DATA SYNTHESIS: The first author screened all titles and abstracts identified through the searches and selected all potentially relevant papers. The screening of the full texts, the data extraction, and the methodological quality assessment, using the adapted QUADAS-2 tool, were conducted independently by two reviewers with disagreements being resolved through discussion or arbitration. If appropriate, meta-analysis was conducted using the hierarchical bivariate model. RESULTS: Twenty-three studies reported the performance of the evaluated assay at presentation. The results for 14 ng/L and 3-5 ng/L cut-off values were pooled separately. At 14 ng/L (20 papers), the summary sensitivity was 89.5% (95% confidence interval 86.3% to 92.1%) and the summary specificity was 77.1% (68.7% to 83.7%). At 3-5 ng/L (six papers), the summary sensitivity was 97.4% (94.9% to 98.7%) and the summary specificity was 42.4% (31.2% to 54.5%).
This means that if 21 of 100 consecutive patients have the target condition (21%, the median prevalence across the studies), 2 (95% confidence interval 2 to 3) of 21 patients with acute myocardial infarction will be missed (false negatives) if 14 ng/L is used as a cut-off value and 18 (13 to 25) of 79 patients without acute myocardial infarction will test positive (false positives). If the 3-5 ng/L cut-off value is used,
Zhelev Z, Hyde C, Fitzgerald JE, Ukoumunne O, Briscoe S, Chisnell J, Grigore B (2015). Tests for screening for hearing loss in children about to start school. Cochrane Database of Systematic Reviews, 2015(11).
Abstract:
This is the protocol for a review and there is no abstract. The objectives are as follows: to investigate the accuracy of hearing screening tests, individually or in combination, used in children at or around school entry age (four to eight years old), excluding children with known hearing loss and those unable to perform the test due to significant developmental delay. A secondary objective is to assess the relative diagnostic accuracy of different hearing screening tests when directly compared (within the same study).
Hunt HA, Stanworth S, Curry N, Woolley T, Cooper C, Ukoumunne O, Zhelev Z, Hyde C (2015). Thromboelastography (TEG) and rotational thromboelastometry (ROTEM) for trauma-induced coagulopathy in adult trauma patients with bleeding. Cochrane Database of Systematic Reviews, 2015(2).
2014
Walker G, Yip S, Zhelev Z, Henschke N (2014). Prehospital stroke scales as screening tools for early identification of stroke and transient ischemic attack. Cochrane Database of Systematic Reviews, 2014(12).
Abstract:
This is the protocol for a review and there is no abstract. The objectives are as follows: Among the currently validated prehospital stroke scales, what is the diagnostic accuracy of the index tests for the diagnosis of stroke in prehospital screening of patients suspected of having a stroke? To assess the influence of the following potential sources of heterogeneity: patient demographics (e.g. age, gender); type of event (transient ischemic attack (TIA), ischemic or hemorrhagic); the definition of TIA used by the study; level of training of paramedic staff; and items from the methodological quality checklist.
2013
Zhelev Z, Garside R, Hyde C (2013). A qualitative study into the difficulties experienced by healthcare decision makers when reading a Cochrane diagnostic test accuracy review. Syst Rev, 2.
Abstract:
BACKGROUND: Cochrane reviews are one of the best known and most trusted sources of evidence-based information in health care. While steps have been taken to make Cochrane intervention reviews accessible to a diverse readership, little is known about the accessibility of the newcomer to the Cochrane library: diagnostic test accuracy reviews (DTARs). The current qualitative study explored how healthcare decision makers, who varied in their knowledge and experience with test accuracy research and systematic reviews, read and made sense of DTARs. METHODS: A purposive sample of clinicians, researchers and policy makers (n = 21) took part in a series of think-aloud interviews, using as interview material the first three DTARs published in the Cochrane library. Thematic qualitative analysis of the transcripts was carried out to identify patterns in participants' 'reading' and interpretation of the reviews and the difficulties they encountered. RESULTS: Participants unfamiliar with the design and methodology of DTARs found the reviews largely inaccessible and experienced a range of difficulties stemming mainly from the mismatch between background knowledge and level of explanation provided in the text. Experience with systematic reviews of interventions did not guarantee better understanding and, in some cases, led to confusion and misinterpretation. These difficulties were further exacerbated by poor layout and presentation, which affected even those with relatively good knowledge of DTARs and had a negative impact not only on their understanding of the reviews but also on their motivation to engage with the text. Comparison between the readings of the three reviews showed that more accessible presentation, such as presenting the results as natural frequencies, significantly increased participants' understanding.
CONCLUSIONS: The study demonstrates that authors and editors should pay more attention to the presentation as well as the content of Cochrane DTARs, especially if the reports are aimed at readers with various levels of background knowledge and experience. It also raises the question as to the anticipated target audience of the reports and suggests that different groups of healthcare decision-makers may require different modes of presentation.
2003
Freeman P, Taylor A, Proykov T (2003). The reform of child welfare services in Bulgaria. Journal of Social Work in Europe, 10(3).
2002
Zhelev Z, Bakalova R (2002). Effect of vitamins E and C on transplant-associated atherosclerosis. Lancet, 360(9331).