Publications

(* indicates equal contribution)

When Thinking Backfires: Mechanistic Insights into Reasoning-Induced Misalignment
H. Yan, H. Xu*, S. Qi, S. Yang, Y. He
ICLR26 | Paper
Representation Interpretability

Spectrum Projection Score: Aligning Retrieved Summaries with Reader Models in Retrieval-Augmented Generation
Z. Hu, Q. Zhu, S. Qi, Y. He, H. Yan, L. Gui
AAAI25, Oral | Paper
application Representation

GraphMind: Interactive Novelty Assessment System for Accelerating Scientific Discovery
I. Silva, H. Yan, L. Gui, Y. He
EMNLP25, Demo | Paper
application

CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation
Z. Shen, H. Yan, L. Zhang, Y. Du, Y. He
EMNLP25 | Paper
application Representation

Position: LLMs Need a Bayesian Meta-Reasoning Framework for More Robust and Generalizable Reasoning
H. Yan, L. Zhang, J. Li, Z. S, Y. He
ICML25, Position Track | Paper
application

Drift: Enhancing LLM Faithfulness in Rationale Generation via Dual-Reward Probabilistic Inference
J. Li, H. Yan, Y. He
ACL25, Main | Paper
application Interpretability

Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration
Q. Zhu, R. Zhao, H. Yan, Y. He, Y. Chen, L. Gui
ICML25, Spotlight | Paper
Representation application

Direct Preference Optimization Using Sparse Feature-Level Constraints
Q. Yin, C. Leong, H. Zhang, M. Zhu, H. Yan, Q. Zhang, Y. He, W. Li, J. Wang, Y. Zhang, L. Yang
ICML25 | Paper
Interpretability Representation

Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective
H. Yan, Y. Xiang, G. Chen, Y. Wang, L. Gui, Y. He
EMNLP24, Main | Paper
Interpretability Representation

Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems
I. Silva, H. Yan, L. Gui, Y. He
EMNLP24, Main | Paper
Causality application

The Mystery and Fascination of LLMs: A Comprehensive Survey on the Interpretation and Analysis of Emergent Abilities
Y. Zhou, J. Li, Y. Xiang, H. Yan, L. Gui, Y. He
EMNLP24, Main | Paper
Interpretability

Mirror: A Multiple-perspective Self-Reflection Method for Knowledge-rich Reasoning
H. Yan, Q. Zhu, X. Wang, L. Gui, Y. He
ACL24, Main | Paper
application

Addressing Order Sensitivity of In-Context Demonstration Examples in Causal Language Models
Y. Xiang, H. Yan, L. Gui, Y. He
ACL24, Findings | Paper
Representation

Counterfactual Generation with Identifiability Guarantee
H. Yan, L. Kong, L. Gui, Y. Chi, E. Xing, Y. He, K. Zhang
NeurIPS23, Main | Paper
Causality Representation application

Explainable Recommender with Geometric Information Bottleneck
H. Yan, L. Gui, M. Wang, K. Zhang and Y. He
TKDE | Paper
Interpretability application

Hierarchical Interpretation of Neural Text Classification
H. Yan, L. Gui and Y. He
Computational Linguistics, Presented at EMNLP23 | Paper
Interpretability application

Addressing Token Uniformity in Transformers via Singular Value Transformation
H. Yan, L. Gui, W. Li and Y. He
UAI22, Spotlight | Paper
Representation

Distinguishability Calibration to In-Context Learning
H. Li, H. Yan, L. Gui, W. Li and Y. He
EACL23, Findings | Paper
Representation

A Knowledge-Aware Graph Model for Emotion Cause Extraction
H. Yan, L. Gui and Y. He
ACL21, Oral | Paper
Causality application