Abstract
Non-canonical (NC) base pairs play crucial roles in RNA structural stability, function, and recognition. Despite their importance, accurate NC base pair prediction has been hindered by a lack of benchmark resources and dedicated deep learning approaches. We introduce NCBench, a comprehensive benchmark dataset containing 925 RNA sequences with 6,708 high-quality NC annotations curated from protein-RNA complexes and RNA-only 3D structures. NCBench features rigorous annotation standards, redundancy reduction, and coverage of multiple RNA types. Additionally, we present NCfold, a novel deep learning framework that fuses sequence features with structural priors from RNA foundation models. NCfold employs a dual-branch architecture with Representative Embedding Fusion (REF) to integrate multiple RNA foundation models and uses Base Pair Motif energy matrices as structural priors. Extensive experiments demonstrate that NCfold achieves state-of-the-art performance, with the AttnMatFusion_net variant significantly outperforming existing methods. Our analysis reveals that longer training sequences and careful foundation model selection improve performance, particularly for G-U wobble pairs and multi-branch loops. Together, NCBench and NCfold provide valuable resources and methodologies for RNA structural biology research.
Performance comparison of NCfold with existing methods. NCfold achieves state-of-the-art performance on NC base pair prediction.
IsoScore distribution across different RNA foundation models. IsoScore is used to select the most representative embeddings for fusion.
Top-k foundation model selection strategy. Selecting the top models by IsoScore improves prediction performance.
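The top-k selection step described in the caption above can be illustrated with a minimal sketch. It assumes each foundation model has already been assigned a single IsoScore value and that a higher score ranks first; the function name `select_top_k`, the model names, and the score values are all hypothetical placeholders, not taken from NCfold or its results.

```python
from typing import Dict, List


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the names of the k models with the highest score.

    Assumption: a higher score means a more representative embedding.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Sort by score in descending order; ties are broken by model name
    # so that the result is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical scores for placeholder model names (not values from the paper).
example_scores = {
    "model_a": 0.12,
    "model_b": 0.47,
    "model_c": 0.31,
    "model_d": 0.05,
}

selected = select_top_k(example_scores, k=2)
print(selected)  # ['model_b', 'model_c']
```

In a pipeline of this kind, only the embeddings of the selected models would be passed on to the fusion stage; how NCfold actually computes the scores and fuses the embeddings is not shown here.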
BibTeX
@article{NCBench,
  author = {Zhu, Heqin and Li, Ruifeng and Chang, Ao and Li, Mingqian and Chen, Hongyang and Xiong, Peng and Zhou, Shaohua Kevin},
  title = {Toward Accurate RNA Non-Canonical Structure Prediction: The NC-Bench Benchmark and the NCfold Framework},
  elocation-id = {2025.11.16.688746},
  year = {2025},
  doi = {10.1101/2025.11.16.688746},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/early/2025/11/17/2025.11.16.688746},
  eprint = {https://www.biorxiv.org/content/early/2025/11/17/2025.11.16.688746.full.pdf},
  journal = {bioRxiv}
}