Ha-Jin Yu is a visiting scholar from the University of Seoul in South Korea, where he has taught in the School of Computer Science since 2002. Yu is visiting the Department of Engineering for one year to research a deep learning algorithm with applications for speaker identification. EWU professor Min-Sung Koh is hosting Yu and working alongside him within the department, with additional support from the Office of Global Initiatives.
Can you talk about the research you’re doing?
My research area is speaker authentication, which is about verifying humans by voice. I’m working with Professor Koh in the Electrical Engineering Department, and he’s doing some research on the signal processing. We need signal processing for processing the speech signals, so I’m working with him.
I started this research more than 20 years ago, and Professor Koh has some ideas about applying his research to my work. I’ve known him for years because we have similar interests, and he’s working with signal processing, while I’m doing speech processing using some of the techniques in his field. We started working together years ago, and since we’ve been far apart, I wanted to meet with him face-to-face so that we can work together closely.
What led you into this field of study?
Speech processing has a long history, and speech recognition is a major goal of artificial intelligence. These days, artificial intelligence is emerging in the industry. It’s an interesting area, and I started researching artificial intelligence in 1990, so I’ve been working in this field for a long time.
What are some of the biggest challenges you’re facing?
The success of speech recognition is based on large amounts of data from many speakers. Google collected a lot of speech data, and using it, they created a machine that can recognize human voices. But you can’t collect a lot of data from one human. If you want a machine to recognize your voice, it needs a lot of data from you, but users don’t want to speak to the machine at length because it takes a lot of time and it’s annoying. The machine has to recognize you with only a small amount of speech, which is very difficult.
If you know somebody, for example a family member or friend, you can recognize that person by their voice. But in that case you have listened to their voice a lot, so you’re familiar with it. If you talk with someone for just 10 seconds, it would be very hard to recognize their voice, and this is what machines must be able to do.
Do you have any hobbies outside of the academic realm?
My hobbies involve playing musical instruments. I like to play the cello, but I didn’t bring my cello here so I can’t play here. I also like to sing.
I’ve been playing the cello for about 15 years and I learned to sing traditional Italian songs several years ago and it’s fun. I suppose I could have the chance here to join an orchestra or learn about new music. Maybe learn how to sing country.