Anna Min | 闵安娜
Hi! I am a senior undergraduate at Tsinghua University.
I appreciate the opportunities to work with or learn from all of my kind and amazing collaborators and mentors. I also love chatting about research or life with people from different backgrounds. Feel free to reach out for any reason!
Email: anna.min1754@gmail.com (if you prefer an edu email, use annamin@csail.mit.edu, which is valid for several more months; my Tsinghua email sometimes experiences delays)
Google Scholar /
Email /
Twitter /
Github /
Linkedin /
Wechat
Updates
- Feb. 2025: One paper was accepted to CVPR 2025. See you in Nashville, TN. Feel free to reach out!
- Sep. 2024: I gave a talk at MIT ML Tea Time titled "Multi-sensory Perception from Top to Down." [link]
- Feb. 2024: During my internship at Pika Labs, the multi-modal generation feature built on the algorithm and demo I designed gained over 246k views on Twitter.
Research
My previous research lies at the intersection of machine learning, computer vision, and signal processing.
How can machine intelligence form concepts, think, and combine ideas? Humans and animals do this through inheritance, by integrating multiple sensory inputs and types of reasoning, and by blending these with individual experiences to create diverse pathways for thought.
Currently, I have a broad interest in multimodal perception and interactive generation, whether machine-centered or human-centered. I have worked with natural signals, such as vision and audio, sometimes exploring them through text as an intermediary.
In my research, I apply bold imagination, principled thinking and rigorous empirical studies.
Supervising Sound Localization Using In-the-wild Ego-motion
Anna Min,
Ziyang Chen,
Hang Zhao,
Andrew Owens
CVPR 2025
Learn to localize sound sources in the wild using ego-motion signals derived from visual cues with limited perspectives.
A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation
Anna Min*,
Chenxu Hu*,
Yi Ren,
Hang Zhao
Interspeech 2024 / [Paper]
/ code and datasets coming soon
Propose a dataset and pipeline for lesser-spoken languages and dialects, with aligned bilingual audio tracks that share similar emotions, without using text as an intermediate.
Selected Awards
- Tsinghua University Academic Excellence Award (2/103), 2023
- Tsinghua University Research Excellence Award (2/103), 2022-2024
- Spark Innovative Talent Cultivation Program (50/3900 undergraduates in Tsinghua for research performance), 2022
- Meritorious Winner of Mathematical Contest in Modeling (Beijing), 2021
Service
- Reviewer: ICASSP 2025, CVPR 2025, NeurIPS 2024
- Volunteered with the Program Buddy Group at Tsinghua University, working with underrepresented students interested in coding, 2022
Misc: Art Creation
I like reading. Previously, I was an amateur illustrator with a passion for portraiture and landscapes. As a piano player for over 15 years, I have ventured into composing music and writing poetry. Some of my pieces include Chilling Spring (composed and written by Anna, performed by Vocaloid and Aoi Sakura from The University of Tokyo) and Hushed Ripples Under the Moon (lyrics written by Anna and accompanied by AI generation; another version is Hush_dual by Anna and Aoi Sakura). Recently, I have been deeply intrigued by the principles and potential of generative models.
In the past, I designed and developed video games (e.g., A Player vs AI Strategy Game).
My Chinese name is pronounced Ān Nà, which is similar to Anna.