Published at MetaROR
March 20, 2025
Systematic review: The reliability of indicators that may differentiate between suicidal, homicidal, and accidental sharp force wounds
Jason M. Chin1, Stephanie Clayton2, Stephen Cordner3, Gary Edmond4, Bethany Growns5, Kylie Hunter6, Bernard I’Ons7, Kristy A. Martire8, Gianni Ribeiro9, Stephanie Summerby10
1. College of Law, Australian National University
2. New South Wales Health
3. The Victorian Institute of Forensic Medicine
4. Law & Justice, University of New South Wales
5. School of Psychology, Speech, and Hearing, University of Canterbury
6. National Health and Medical Research Council Clinical Trials Centre, University of Sydney
7. New South Wales Health
8. School of Psychology, University of New South Wales
9. Criminology, University of Southern Queensland
10. Office of the Chief Forensic Scientist, Victoria Police Forensic Services Department
Originally published on October 2, 2004 at:
Editors
Ludo Waltman
Jennifer Anne Byrne
Editorial Assessment
by Jennifer Anne Byrne
This protocol aims to address two questions: (1) What do we know about the science underlying impactful legal decisions? (2) How can we assess this evidence efficiently and accurately, such that it is usable for courts? The protocol has been reviewed by three reviewers (reviewer 2 in fact represents a team of three individuals). The reviewers mention various strengths of the protocol. Reviewer 1 emphasises the importance and timeliness of the research questions and praises the interdisciplinary nature of the research team. Reviewer 3 considers the protocol to be thoughtful and detailed, and reviewer 2 notes that the protocol presages an important effort. The reviewers do not see any major shortcomings in the protocol, but they do highlight opportunities to strengthen the protocol, such as considering studies published in languages other than English and adding more detail on how team disagreements will be resolved.
Peer Review 1
Summary
This protocol describes the plan for a systematic review of the literature on stab wounds. The focus is on the types of observations made in such cases, and whether there are any (types of) observations that can be considered “indicators” of the manner of death, to help distinguish between cases of self-inflicted injury and those inflicted by others.
Strong points of this research plan
The authors present compelling arguments for the need for the proposed meta-research project; they refer to a recent case in the High Court of Australia (Lang v the Queen). The arguments highlight the importance and timeliness of the research questions. More generally, the field of forensic pathology and its perception and use by the legal community seems to be an area with great research potential: see for example the problematic cases involving the testimony of Colin Manock in South Australia (e.g. the Keogh case, where the examination of bruises was an issue).
Overall, the research plan is well informed: the authors have conducted a preliminary review of existing relevant studies and reviews. They use the findings from this preliminary review to critically inform the design of their study.
The research team is interdisciplinary, with members from law, psychology and pathology, and appears to be suitably qualified to carry out the proposed research.
The research plan is sufficiently detailed and transparent in terms of search procedures, eligibility criteria, outcome variables, data management and open access policy, which should make the research results widely accessible and reproducible.
Comments, suggestions, critiques
The title includes the term “reliability”, but it is never defined in the text. While this term can be taken in its common sense interpretation, this may not be sufficient for a scientific study. Do the authors mean “reliability” as used, for example, by the US FRE? Or do they understand the term to be similar to the PCAST’s use of the term “validity”?
The plan is not clear (enough) about how – conceptually – to characterise the potential of an observation (made by a pathologist) to provide information about a selected question of interest (e.g., manner of death, the way in which an injury was inflicted, etc.). Formally, the diagnosticity of an observation (or type of observation) is defined in terms of a likelihood ratio. In other words, for an observation to have diagnostic value with respect to a given proposition (hypothesis), the probability of the observation of interest given the proposition of interest must be higher than given an alternative proposition. Thus, whatever this study will reveal about medico-legal observations (in stab wound cases), an inferential framework is needed to assess diagnosticity and, more broadly, reliability. The research plan is silent on this aspect. Instead, most of the effort is spent on descriptive statistics. There is nothing wrong with descriptive statistics, but they will not help to address the main question posed in the title of the proposed research. As an aside, the reference to “confidence intervals” (p. 15 and 19) is unfortunate in the sense that frequentist statistics, although (still) ubiquitous, are problematic for a variety of reasons.
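The likelihood-ratio formulation the reviewer refers to can be sketched as follows (notation assumed here for illustration: E the observation, H1 and H2 the competing propositions, e.g. self-inflicted versus inflicted by another person):

```latex
% Diagnosticity of an observation E with respect to two competing
% propositions H_1 and H_2, expressed as a likelihood ratio:
\[
  \mathrm{LR} = \frac{\Pr(E \mid H_1)}{\Pr(E \mid H_2)}
\]
% LR > 1: E supports H_1 over H_2; LR < 1: E supports H_2;
% LR = 1: E is non-diagnostic with respect to this pair of propositions.
```

On this reading, a descriptive frequency alone (how often an injury pattern appears) cannot establish diagnosticity; it is the comparison of conditional probabilities under the competing propositions that carries the inferential weight.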
To some extent, the research proposal is too uncritical and passive with respect to terminology that appears to be standard in the field in which the literature review is to be conducted. Consider, for example, the terms “defense injuries” and “tentative injuries” (p. 7). These terms are problematic because they mix observations (e.g., cuts) with ground truth (i.e., self-inflicted or third-party inflicted). Since the ground truth cannot be known in actual cases, “defense injury” cannot meaningfully serve as a descriptor. Moreover, the use of such terms can mislead: suppose an examiner talks about “tentative injuries”. This could suggest to the recipient of expert information that the observed injury is necessarily self-inflicted. Of course, the authors’ intention might be to determine how diagnostic the expert’s utterance of “tentative injury” is with respect to the proposition of self-inflicted injury (without assuming that the utterance of “tentative injury” necessarily implies self-inflicted injury). Nevertheless, this doesn’t solve the problem of confusing terminology. Therefore, this research project could be strengthened by not limiting itself to the descriptive adoption of standard terminology, but by including a critical analysis and discussion of terminology. In fact, the problem of testimony in this field is not limited to the (currently unknown) diagnosticity of observations made during pathological examinations. It also depends on the coherence of foundational terminology (i.e., its logic) used in this field, as well as on the soundness of the reasoning methods used (e.g., the crucial distinction between findings/observations and unobservable ground truth states).
On p. 15, the research plan states: “We will attempt to quantitively synthesise cases by first separating them into four groups: those classified by study authors as suicides, homicides, accidents or inconclusives. Then, we will list the frequency with which the case variables listed above appear in each group.” Treating the data in this way will lead to useful statistics: i.e., the probability of different observations given different case types (suicides, homicides, etc.). Such statistics characterise the diagnosticity of the various observations (“case variables”). However, a major problem arises here: how – if at all – can one know that the reported classification of cases into suicides, homicides etc. was correct? For obvious reasons, none of the case reports in the literature involve experiments under controlled conditions. However, there may be other information or evidence in a case (e.g., video surveillance) that supports particular classifications. Will the project control for this complication, and if so, how?
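To make concrete how per-group frequencies translate into the conditional probabilities and likelihood ratio discussed above, here is a minimal sketch. The indicator name and all counts are entirely hypothetical, invented only to illustrate the arithmetic; they do not come from the protocol or any case data.

```python
# Hypothetical counts of an indicator (e.g. "clothing damage present")
# in two groups of case reports. All numbers are illustrative only.
counts = {
    "homicide": {"present": 80, "absent": 20},
    "suicide":  {"present": 15, "absent": 85},
}

def p_given(group, value="present"):
    """Estimate P(indicator value | group) from the raw counts."""
    total = sum(counts[group].values())
    return counts[group][value] / total

p_h = p_given("homicide")   # P(present | homicide) = 0.80
p_s = p_given("suicide")    # P(present | suicide)  = 0.15

# Likelihood ratio: how much more probable the observation is
# under "homicide" than under "suicide".
lr = p_h / p_s
print(round(p_h, 2), round(p_s, 2), round(lr, 2))  # 0.8 0.15 5.33
```

The sketch also makes the reviewer's worry visible: the group labels in `counts` are taken from the study authors' own classifications, so any misclassification propagates directly into the estimated probabilities and the likelihood ratio.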
It would be valuable for this research to include normative considerations, as opposed to a purely descriptive perspective, of what it means for an observation – be it in pathology or any other forensic field – to be “indicative” or discriminative with respect to selected (disputed) propositions. This relates to the notion of inferential framework mentioned above, which is largely established in the philosophy of science (see e.g. Howson/Urbach, Scientific Reasoning, 2005), and which could serve as an additional reference point against which to evaluate the current literature. It remains unclear to the reader why this research project refrains from taking a firmer position on the logic of evaluative thinking, which has now become inseparable from sound evaluation procedures in forensic science. Reviewing and synthesising existing literature is one thing, challenging the current state of the art is another. Combining the two is a valuable opportunity that this project could seize.
Peer Review 2
Anonymous User
On behalf of the Center for Integrity in Forensic Sciences and its Executive Director, Katherine H. Judson, as well as its co-founder, Professor Emeritus Keith A. Findley, I am pleased to submit these comments on the above-cited draft work of Jason Chin, Stephanie Clayton, and their colleagues. Thank you for soliciting our views. You may learn more about the Center for Integrity in Forensic Sciences at http://www.cifsjustice.org
The authors’ explanation of their planned systematic review is helpful and presages an important effort. We commend the authors for their thoughtful study design, their transparency, and their initial research into source materials listed in Appendix A.
Two minor methodological concerns appear to us initially. One, we do not fully understand the intention, described in four places (pages 11, 18, and 19), to use two independent reviewers of data and to resolve disagreements “by discussion.” It is not clear whether that discussion is to occur between the pair of reviewers only, or whether others will join the adjudicative discussion. In either event, it may be useful to consider an odd number of adjudicators for purposes of breaking a deadlock, if necessary. Two, the intended systematic review excludes studies not published in English (see page 10). While the lack of proficiency in other languages among the research team is understandable (and rightly acknowledged), the availability of reliable translations today should allow inclusion of studies published in other languages, we suspect.
Our two principal substantive concerns are broader, though. First, this systematic review appears to overlook risks of availability bias and confirmation bias in information gathering by pathologists, who often rely on information passed along by law enforcement officers and others invested in a particular outcome or conclusion. Relatedly, forensic pathologists themselves often are closely aligned professionally and attitudinally with law enforcement personnel. Indeed, the pathologists may be employed by prosecutorial and investigative agencies of the government, and therefore professionally and financially dependent on their sources of information. We predict that the research team will encounter frequently—perhaps almost uniformly—the absence of pre-existing protocols that Cochrane raises as a concern and that the authors rightly note at page 18 of this draft. That common absence of a known protocol, established in advance and subject to compliance assessment later, may be both caused in part by and an effect of the availability and confirmation (or tunnel vision) biases we discuss here.
Second, the systematic review does not seem designed to consider the normative question of which systemic actor or actors are best equipped and most appropriate to make manner of death determinations for judicial, as opposed to statistical, purposes. We hope that the researchers will recommend that such determinations by pathologists or other biomedical experts should be limited to statistical purposes, for use in allocating public resources. In the end, regardless of how reliable their opinions, pathologists and biomedical practitioners are no better positioned than jurors or judges to make adjudicative determinations of suicide or homicide, as the factfinders in a judicial system should have access to all information—presented to them in a more transparent, testable form in court—that the pathologist has in drawing conclusions. And as a normative matter, those adjudicative conclusions are assigned to jurors and judges, not to pathologists or other biomedical experts.
With these caveats, we again welcome this initial work and description of the meta-analysis to come. Especially if confined to assessing and advancing the reliability of manner of death determinations in cases of sharp force wounds for statistical purposes, and thus as an aid in allocating public resources outside the judicial system, the eventual systematic review may be quite valuable.
Finally, for a pertinent and longer discussion of related issues, see Keith A. Findley & Dean A. Strang, Ending Manner of Death Testimony and Other Opinion Determinations of Crime, 60 Duquesne Law Review 302 (2022). The authors themselves cite this article at footnotes 5 and 7 of their draft. Again, thank you for the opportunity to offer these comments.
Peer Review 3
Thank you for the opportunity to review this protocol. My expertise is in systematic review methods, generally relating to health interventions, and as such I should note that I do not have expertise in forensic pathology or medico-legal issues.
This paper outlines the protocol for a systematic review of characteristics that may allow forensic experts to distinguish between suicide and homicide in cases of sharp force wounds, in the context of contributing to criminal prosecution. Interestingly, the protocol outlines the development of preliminary approaches to novel methodology adapted for use in this field, including approaches to assessing risk of bias and certainty in the evidence, which have primarily been developed for intervention research.
I commend the authors for a thoughtful and detailed protocol. In my view, this is a strong piece of work and will contribute findings of interest to the field, as well as contributing to the exploration of methods for the assessment of a category of research for which such methods are currently lacking. I have made a few suggestions below for consideration by the authors that may strengthen the protocol.
Rationale
- It may be helpful to international readers to clarify in the text of the Rationale that R v Lang is a case in the High Court of Australia, and to spell HCA out in full in the footnote. With regard to readers looking for details on this case, are these published on a website for which a URL can be provided?
- It would be helpful for readers without a background in legal proceedings to discuss the extent to which research evidence and systematic reviews are or are not commonly presented in legal proceedings, in contrast to expert opinion.
- Where you discuss the debate about the role of cause of death findings, it would be helpful to explicitly state in which jurisdictions these discussions have occurred, so that readers can understand whether and how this topic relates to their own jurisdiction or where there may be differences. It may further be helpful to elaborate briefly on why cause of death determinations may be considered unreliable.
Methods
- It is a limitation of the review to only include studies published in English. The proficiency of automated translation is currently such that screening of potentially relevant studies in multiple languages is often possible, and assistance from multilingual colleagues or communities such as Cochrane Engage can enable the inclusion of studies in additional languages.
- Regarding grey literature, both of the listed organisations appear to be based in the USA (although this is not stated for the OSCAC) – could you provide a rationale for only using US institutions to identify relevant data? For example, there may be organisations in Australia (which is the jurisdiction of interest for the legal aspects of this review) or in countries with comparable criminal legal systems (such as the UK, Europe or elsewhere).
- Will a software tool be used to support study selection, such as Covidence or similar? This may contribute to your analysis of time and process.
- Injury severity score – will injury severity be captured if other measures of severity are used, or not at all? There are methods available to consider results across different measures of similar outcomes, if these would be considered valid alternatives.
- In the rationale and the methods relating to risk of bias, you note that it may be relevant to capture (if available) information such as whether witness evidence, video evidence or a confession was available to support the conclusion of cause of death. Should this kind of characteristic be added to the data collected?
- The methods provided for data synthesis, risk of bias assessment and the certainty/quality of the evidence (based on GRADE) all currently read as if all your included studies will be case series or case studies. As your included studies also include observational studies that may give effect estimates such as odds ratios rather than individual counts of characteristics, methods should be provided for handling and perhaps quantitatively synthesising this kind of data, where appropriate. Risk of bias methods and GRADE methods may more closely correspond to the existing methods for this kind of study, and require less adaptation.
- GRADE methodology generally refers to “certainty in the evidence” rather than confidence, to avoid confusion with risk of bias assessment.
- You note in the rationale that you plan to collect data on the review process, such as time taken to complete different tasks. I’d suggest putting this detail in the methods section.
- I would recommend giving some further thought to how you will draw conclusions from the data you find in this review. Assuming that sufficient data can be found, and that you have a set of either percentages from case studies/series or effect estimates from observational studies, it is likely that you will wish to discuss which factors appear to be associated with different causes of death, or which are most effective at discriminating between causes. I would strongly recommend considering what thresholds for associations or differences between causes of death would underpin such conclusions, and specifying these in advance. I’d recommend speaking to a statistician to draft these methods appropriately and avoid errors in interpreting the estimates found.