Anna Min | 闵安娜

Hi! I am a senior undergraduate at Tsinghua University. I work with Prof. Jim Glass, Prof. Andrew Owens and Prof. Hang Zhao.

I am grateful for the opportunities to work with and learn from all of my kind and amazing collaborators and mentors. I also love chatting about research or life with people from different backgrounds. Feel free to reach out for any reason!

I am applying for PhD positions starting in 2025. Please contact me by email if you are interested in my research!

Email: anna.min1754@gmail.com (if you prefer an edu email, use annamin@csail.mit.edu, valid for several more months; my Tsinghua email sometimes experiences delays)

Google Scholar  /  Email  /  Twitter  /  GitHub  /  LinkedIn  /  WeChat


Updates

  • Sep. 2024: I gave a talk at MIT ML Tea Time titled "Multi-sensory Perception from Top to Down." [link]
  • Feb. 2024: A multi-modal generation feature, based on an algorithm and demo I designed during my internship at Pika Labs, gained over 246k views on Twitter.

Research


My previous research lies at the intersection of machine learning, computer vision, and signal processing.

How can machine intelligences form concepts, think, and combine ideas? Humans and animals do this through inherited abilities, integrating multiple sensory inputs and types of reasoning, and blending these with individual experiences to create diverse pathways for thought.

Currently, I have a broad interest in multimodal perception and interactive generation, whether machine-centered or human-centered. I have worked with natural signals, such as vision and audio, sometimes exploring them through text as an intermediary.

In my research, I apply bold imagination, principled thinking, and rigorous empirical study.

Supervising Sound Localization Using In-the-wild Ego-motion
Anna Min, Ziyang Chen, Hang Zhao, Andrew Owens
in submission

Learns to localize spatial sound sources in the wild, supervised by ego-motion signals derived from visual cues with limited perspectives.

A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation
Anna Min*, Chenxu Hu*, Yi Ren, Hang Zhao
Interspeech 2024 / [Paper] / code and datasets coming soon

Proposes a dataset and pipeline of aligned bilingual audio tracks that share similar emotions, enabling direct speech-to-speech translation without text as an intermediate for lesser-spoken languages and dialects.

Selected Awards

  • Tsinghua University Academic Excellence Award (2/103), 2023
  • Tsinghua University Research Excellence Award (2/103), 2022-2024
  • Spark Innovative Talent Cultivation Program (50 of 3,900 Tsinghua undergraduates, selected for research performance), 2022
  • Meritorious Winner of Mathematical Contest in Modeling (Beijing), 2021

Service

  • Reviewer: ICASSP 2025
  • Volunteer: Program Buddy Group at Tsinghua University, working with underrepresented students interested in coding, 2022

Last updated: 2025-01-25 20:39:41

Template from Jonathan Barron.