PhD student at The Ohio State University working on understanding and controlling large language models. I also vibe code research prototypes, developer tools, and AI systems.
Understanding Linear Steering (ongoing)
Investigating the geometry, linearity, and causal structure of steering directions in LLM representation space.
AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features (arXiv, OpenReview)
Developed a principled proximal-gradient framework that unifies SAE variants (ReLU, JumpReLU, TopK) and shows that their non-negativity constraints prevent a single feature from representing both directions of a semantic axis. Proposed AbsTopK, a magnitude-based sparse operator that recovers complete semantic axes and improves interpretability and steering in LLMs.
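To illustrate the difference between sign-constrained and magnitude-based sparsity, here is a minimal NumPy sketch. It assumes a TopK activation that keeps the k largest pre-activations and clips them at zero, and an AbsTopK activation that keeps the k largest-magnitude pre-activations with their signs. The function names are hypothetical, and this is not the paper's implementation.

```python
import numpy as np


def topk_activation(z: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest pre-activations and clip them at zero (non-negative codes)."""
    out = np.zeros_like(z)
    idx = np.argsort(z)[-k:]             # indices of the k largest signed values
    out[idx] = np.maximum(z[idx], 0.0)   # non-negativity: negative evidence is discarded
    return out


def abs_topk_activation(z: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude pre-activations, preserving their signs."""
    out = np.zeros_like(z)
    idx = np.argsort(np.abs(z))[-k:]     # indices of the k largest |z|
    out[idx] = z[idx]                    # signed codes: one feature can cover both ends of an axis
    return out


z = np.array([0.5, -2.0, 1.0, -0.1])
print(topk_activation(z, 2))      # strong negative entry (-2.0) is dropped
print(abs_topk_activation(z, 2))  # strong negative entry is kept with its sign
```

In this toy vector, the strongest signal is the negative entry at index 1. TopK never selects it, while AbsTopK keeps it as -2.0, which is the kind of bidirectional information the paragraph above says non-negative SAEs lose.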
From Emergence to Control: Probing and Modulating Self-Reflection in Language Models (arXiv)
Showed that linear directions in representation space can enable and control self-reflection behavior in pretrained LLMs without finetuning.
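As a rough illustration of what controlling behavior with a linear direction can look like, here is a generic additive-steering sketch in NumPy. It assumes the direction is estimated as a normalized difference of mean activations between two sets of examples (for instance, with and without self-reflection) and is added to a hidden state with a scalar strength `alpha`. The function names and the difference-of-means choice are hypothetical; the paper's actual procedure may differ.

```python
import numpy as np


def mean_difference_direction(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Hypothetical: unit vector pointing from the negative-class mean to the positive-class mean."""
    v = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return v / np.linalg.norm(v)


def steer(hidden: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Shift a hidden state along the direction; alpha > 0 strengthens it, alpha < 0 suppresses it."""
    return hidden + alpha * direction


# Toy 2-D "activations" standing in for a model's residual stream.
pos = np.array([[2.0, 0.0], [4.0, 0.0]])
neg = np.array([[0.0, 0.0], [0.0, 0.0]])
v = mean_difference_direction(pos, neg)
h = np.array([0.0, 1.0])
print(steer(h, v, 2.0))
```

Because the direction is unit-norm, steering changes the hidden state's projection onto it by exactly `alpha` and leaves orthogonal components untouched. No model weights are modified, which matches the "without finetuning" setting.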
If you are interested in collaboration, feel free to open an issue or connect with me.
GitHub stats cards are powered by github-readme-stats. Many thanks to the authors for building and maintaining it.