AI-Assisted Monitoring Validation for Neovascular AMD Treatment in England

Principal findings

This study provides the first insights into the effectiveness and efficiency of AI-enabled treatment monitoring for nAMD relative to current NHS services. Using an internationally recognised ophthalmic reading centre as a reference standard, similar and substantial opportunities to reduce the undertreatment and overtreatment arising from assessments of disease activity in nAMD care were identified at two independent NHS centres. This finding is at odds with the assumption that reading centres would ‘overcall’ disease relative to real-world care. However, it mirrors findings from a local independent consultant panel benchmarking process for a separate NHS nAMD service evaluation in Exeter [22]. The present study also found that the same thresholds of change in AI-derived volumes of IRF and SRF optimised diagnostic accuracy at both centres and that simple heuristic thresholds offer equivalent performance to thresholds derived through simple machine learning techniques. Whilst treating any increase in IRF, SRF or SHRM between clinic visits as disease activity demonstrated non-inferior levels of sight-threatening undertreatment for AI compared to standard care, using higher thresholds for disease activity also enabled reductions in demand-generating overtreatment. AI-led assessments of nAMD disease activity and those made in real-world care commonly disagreed with the reference standard for different clinical cases. For decision support use cases, this presents an opportunity for further gains in diagnostic accuracy if modes of clinician-AI interaction can be designed that mitigate the risks in each individual approach, e.g., with the clinical team identifying unsuitable imaging quality and non-anatomical segmentations (Fig. 2) [23]. The AI system’s performance appears robust outside of its training dataset and subgroup analysis did not identify any concerns for biased performance between the different age, sex or ethnic groups presented.
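The difference between treating any increase in fluid as disease activity and applying a non-zero threshold can be sketched in a few lines of Python. This is a minimal illustration only: the `Visit` structure, the `is_active` function, the nanolitre units and the threshold values are all hypothetical and are not the thresholds evaluated in this study. SHRM is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """AI-derived fluid volumes at one clinic visit (hypothetical units: nanolitres)."""
    irf_nl: float  # intraretinal fluid volume
    srf_nl: float  # subretinal fluid volume


def is_active(previous: Visit, current: Visit,
              irf_threshold_nl: float = 0.0,
              srf_threshold_nl: float = 0.0) -> bool:
    """Flag disease activity when the increase in either fluid volume
    between visits exceeds its threshold.

    With both thresholds at 0.0 this reduces to the 'any increase' rule.
    """
    irf_change = current.irf_nl - previous.irf_nl
    srf_change = current.srf_nl - previous.srf_nl
    return irf_change > irf_threshold_nl or srf_change > srf_threshold_nl


# Illustrative volumes only; not patient data.
previous = Visit(irf_nl=10.0, srf_nl=5.0)
current = Visit(irf_nl=12.0, srf_nl=5.0)

# 'Any increase' rule: a 2 nl rise in IRF is flagged as active.
any_increase = is_active(previous, current)

# Hypothetical non-zero thresholds: the same rise is not flagged.
thresholded = is_active(previous, current,
                        irf_threshold_nl=5.0, srf_threshold_nl=5.0)
```

Under these assumed values, the same small rise in IRF is flagged by the 'any increase' rule but not by the thresholded rule. This is the mechanism by which higher thresholds could reduce overtreatment.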

Comparison to wider literature

At the time of writing, three Artificial Intelligence as a Medical Device (AIaMD) products regulated for OCT analysis are listed on the UK Medicines and Healthcare products Regulatory Agency’s (MHRA) Public Access Registration Database (PARD) [24]. A few others are approved for similar use cases in other jurisdictions [16]. These all differ from the specific AI system evaluated in this study, but represent current opportunities to implement AIaMD for this use case in the NHS and healthcare services in other regulatory jurisdictions. A key similarity across these products is that they do not have regulatory approval for autonomous use cases, and so AI-enabled healthcare pathways for nAMD treatment monitoring must retain suitably qualified healthcare professionals as the responsible clinical decision maker [16]. This appears appropriate for the current evidence base, which focuses on the technical validity of segmentation outputs rather than the clinical validity of treatment decisions based upon those segmentations [25, 26]. However, evidence presented here suggests that the quantitative analysis of IRF and SRF enabled by AIaMD will not deliver its full potential without the application of non-zero decision thresholds, distinct from the existing paradigm of qualitative OCT interpretation in nAMD care. This need for non-zero decision thresholds to be used to deliver value from AIaMD in such use cases has been independently reported with other technologies [27]. If these thresholds are adequately simple, such as those presented here, then clinicians may choose to apply such thresholds to AIaMD outputs themselves to inform their practice as they do with other guidelines or heuristics, e.g., a threshold of 400 µm central subfield thickness to initiate treatment in diabetic macular oedema [28]. 
In doing so, they and their employing healthcare provider could expect to absorb more of the liability for any clinical errors that breach the duty of care they hold to their patients [29, 30]. This liability could be distributed more toward AIaMD manufacturers if thresholds (or the ability for users to set them) were to be incorporated into the AIaMD and their regulated intended use statements. Then, threshold-based nAMD treatment monitoring could more clearly fall within the intended use of the AIaMD, rather than the discretion of the clinical user. However, this would appear to represent a change in the intended use and risk profile against which currently available AIaMD achieved their regulatory approval, requiring new certification supported by an appropriate evidence base. In the current UK and European regulatory landscape, such a recertification process takes a median of approximately 18 months even when adequate clinical evidence is available [31]. This delay reflects the current capacity of regulatory authorities to carry out their certification processes.

Limitations

The assignment of ‘diagnostic error’ status to decisions made in real-world care is both reductionist and opaque within this retrospective study. It is entirely possible that so-called ‘undertreatment’ simply represented a holistic decision by a clinician delaying a follow-up appointment to accommodate competing demands on a patient’s time. This question over the study’s findings is partly mitigated by the apparent good performance of AI across all intuitive rule sets evaluated here, suggesting it is not simply over-fitting to the reference standard. It is, however, inherent to this retrospective study design, which was the most appropriate design given the lack of prior evidence to justify the clinical risk of an interventional study of AI-led assessment. Interventional research will be needed to determine the impact of any form of AI-enabled nAMD treatment monitoring on visual outcomes.

A key value proposition of AI-enabled care is presumed to be time-saving for users. Without simulated or actual AI-enabled clinical workflows, the present retrospective study had no means of measuring this. Given the relatively short period of time clinicians typically spend assessing OCT imaging once displayed, it seems unlikely that decision support AIaMD will reduce the clinician time required to review an individual case. If such efficiencies do arise, they will likely come from the platforms on which AIaMD is hosted rather than the AI technologies themselves. Autonomous use cases of future AIaMD may, however, come to reduce the clinician time spent on individual reviews. The precedent for such autonomous products has recently been set with the certification of a class III AI-enabled dermatoscopic image interpretive software, listed on PARD [24]. This study illustrated a separate potential efficiency saving at the service level, rather than the individual case level. This efficiency saving is directly attributable to the AI technology and comes from the reduction of overtreatment in an nAMD service (measured by PPV). This value proposition will clearly depend on how users incorporate AIaMD outputs into their decision-making and may not be stable over the course of patients’ treatment. Clarity on this value proposition will require interventional evaluation.
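For readers less familiar with the metrics, the sketch below shows how PPV and NPV could be computed from paired assessments against a reference standard. It uses the standard definitions, under which a lower PPV means more visits called active that the reference standard judged inactive (overtreatment). A lower NPV means more visits called inactive that the reference standard judged active, which would correspond to undertreatment. The function name and the example labels are hypothetical and are not study data.

```python
def ppv_npv(predicted_active, reference_active):
    """Return (PPV, NPV) for paired boolean assessments.

    predicted_active: disease-activity calls under evaluation (AI or clinic).
    reference_active: reference-standard calls for the same visits.
    """
    pairs = list(zip(predicted_active, reference_active))
    tp = sum(p and r for p, r in pairs)            # called active, truly active
    fp = sum(p and not r for p, r in pairs)        # called active, truly inactive
    tn = sum(not p and not r for p, r in pairs)    # called inactive, truly inactive
    fn = sum(not p and r for p, r in pairs)        # called inactive, truly active

    # Guard against division by zero when no positive or negative calls exist.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Illustrative labels only; not patient data.
predicted = [True, True, True, False, False, False]
reference = [True, True, False, False, False, True]

ppv, npv = ppv_npv(predicted, reference)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

In this toy example one of three 'active' calls and one of three 'inactive' calls disagree with the reference, so both values are two thirds.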

The MEH dataset substantially improved on the ethnic diversity offered by the NEC dataset. However, the ethnic diversity of the validation data remains low overall. Similarly, the absence of labels for high myopia, a characteristic identified in some failure cases, limits the assurance this study can provide on robust performance. Both gaps must be addressed in future evaluations.

Further work

A strength of this external validation study is its inclusion of two independent centres, using different OCT imaging equipment suppliers, compared to the same reference standard. To confirm the generalisability of the potential to improve PPV and NPV with this AI system and the decision thresholds specified, further replication studies at different sites would be valuable. Similarly, replication of the study with different AIaMD for OCT segmentation would also be valuable to understand if and how the value proposition of these technologies differs and whether or not optimal decision thresholds are consistent across them.

With current AIaMD approved only for decision support purposes, human-computer interaction and cognitive biases will hold a strong influence over the clinical impact of their use [32]. There is little evidence to understand this influence or what kinds of user training or workflows may help to optimise it [23]. Simulated or interventional studies of AI-enabled nAMD treatment monitoring will be required if patients and services are to unlock the value of AIaMD currently available for use [22].

To fully deliver the value proposition of AI-enabled nAMD treatment monitoring, it may be that AIaMD with regulatory approval for decision thresholds to be applied to segmentation outputs are required, potentially with approval for those outputs to be acted upon without immediate clinician review of each output. Whilst patient acceptance of autonomous AI decisions for their real-world care is largely untested, recent qualitative research suggests that patients and other stakeholders would find autonomous contributions from AI to nAMD treatment decisions acceptable if appropriately validated [7]. Interventional clinical trials and implementation research will be required to generate this evidence, to be followed by regulatory submissions that include an appropriate intended use statement and enable clinical application [15, 33].

Concluding remarks

This study highlights a replicable opportunity to reduce the clinical demand associated with patient care, without compromising the accuracy of disease activity detection, in NHS nAMD services. If AI-enabled nAMD treatment monitoring is to improve the quality and efficiency of NHS services, it will depend upon the application of quantitative decision thresholds to OCT segmentation outputs that take opportunities to safely increase the frequency of treatment interval extension. Such an approach may be feasible with existing AIaMD, but further evidence is required and future AIaMD with explicit regulatory approval for autonomous use may be beneficial.
