These last two weeks I did nothing but work. I had a “block” (i.e. intensive) course on voicebuilding (text-to-speech synthesis) that lasted the whole two weeks, and I also had two deadlines, one for a paper and one for a programming project, at the end of those two weeks. This means all this beautiful spring weather was wasted on me.
The voicebuilding course covered many things similar to what I did at my last job, where I worked for a speech recognition company. It was a bit of a flashback in that way, but whereas at my last company we had homegrown tools, this time we used a bunch of different open-source software. As a result, the project ended up being a massive, sprawling mess of interconnected systems. We worked in groups, and I actually got to record my voice for synthesis in the studio, since I was one of the native English speakers. That was neat. Some of the tools I got exposure to were:
- MaryTTS – the text-to-speech synthesis platform using unit selection,
- HTK – the Hidden Markov Model Toolkit, used here for HMM-based TTS synthesis,
- webMAUS – a web-based forced alignment tool for wav files,
- Kaldi – a more modern speech recognition toolkit that can be used for forced alignment as well,
- Praat scripting – for annotating and editing wave files; of course I’ve used Praat before but never scripted in it (it’s clunky),
- SoX – a command line tool for audio manipulation,
- Gradle – a build tool which we used to keep the project together and automate everything,
- Docker – a tool for creating portable, containerized environments.
Although I got some exposure to the above from the class, I would not say that I got an in-depth understanding of all of them. We worked the most with Gradle throughout the process, and I actually enjoyed that, since it seemed very practical, but it did occasionally add a layer of complexity that didn’t seem entirely justified for a small class project. In terms of Praat scripting, I can see how it makes sense to use it for specific tasks, but in general, I would try to avoid it as much as possible. Docker seems great at scale, but for small class projects, it’s a bit of overkill. Juggling all the different formats and phonetic symbol sets between the forced alignment tools and the TTS synthesis tools is a nightmare, as usual, so half of the tasks revolve around just getting this right. Keeping everything in Git is, as ever, a good idea.
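To make the symbol-set juggling concrete, here is a minimal sketch of the kind of glue code this involves. It is not code from the course project: the ARPAbet-to-SAMPA pairing, the deliberately partial table, and the `convert_phones` helper are all assumptions chosen for illustration, and a real voicebuilding pipeline would need the full phone inventory of whichever tools are actually in use.

```python
import re

# Partial, illustrative ARPAbet -> SAMPA table (an assumption for this sketch,
# not the mapping used in the course). A real table covers every phone.
ARPABET_TO_SAMPA = {
    "SH": "S", "ZH": "Z", "TH": "T", "DH": "D",
    "NG": "N", "CH": "tS", "JH": "dZ", "AE": "{",
    "K": "k", "T": "t", "S": "s",
}


def convert_phones(phones, table=ARPABET_TO_SAMPA):
    """Map a list of ARPAbet phones to SAMPA, failing loudly on gaps."""
    converted = []
    for phone in phones:
        # CMUdict-style vowels carry a trailing stress digit (AE1 -> AE).
        base = re.sub(r"\d$", "", phone)
        if base not in table:
            # Silent fallbacks hide alignment bugs, so raise instead.
            raise KeyError(f"no mapping for phone {phone!r}")
        converted.append(table[base])
    return converted
```

Raising on an unmapped phone, rather than passing it through unchanged, is a deliberate choice here: a symbol that one tool silently fails to recognize tends to surface much later as a confusing alignment or synthesis error.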
Apart from the voicebuilding course, I had to write a sort of literature-review-style essay on natural language generation. I’m not entirely sure if I did it the way I was supposed to. The problem was that the papers I was meant to write my essay on all use neural networks (the prevailing machine learning model used for nearly everything in computational linguistics these days), and although I worked with neural networks at my last job, I never really learned how they worked.
So instead of talking out of my ass about things I didn’t know, I decided to take a good couple of weekends to just focus on learning. My essay became a sort of “intro to neural networks,” plus a little bit of discussion about the contributions my citations made to the field. Basically, I covered feedforward networks including backpropagation, CNNs, RNNs, LSTMs, and word embeddings. I don’t know if I’ll get a good grade for this or not, but I know it was worth learning it this way. Now I know a little bit more about the math behind neural networks. In the future, I hope I will have the chance to try to implement one on my own (or with a little bit of help), because I think this would be very instructive.
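To give a flavor of what implementing one involves, here is a minimal sketch of a feedforward network with one hidden layer and a hand-written backpropagation step, in plain Python. It is not taken from the essay or from any of the papers it covered: the function names, the sigmoid activation, the squared-error loss, the single output unit, and the lack of bias terms are all simplifying assumptions for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(w_hidden, w_out, x):
    """Return (hidden activations, output) for one input vector."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def loss(w_hidden, w_out, x, target):
    """Squared error for a single training example."""
    _, output = forward(w_hidden, w_out, x)
    return 0.5 * (output - target) ** 2


def backprop(w_hidden, w_out, x, target):
    """Gradients of the loss with respect to both sets of weights."""
    hidden, output = forward(w_hidden, w_out, x)
    # Error signal at the output unit (chain rule through the sigmoid).
    delta_out = (output - target) * output * (1.0 - output)
    grad_out = [delta_out * h for h in hidden]
    # Push the error signal back through each hidden unit.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    grad_hidden = [[d * xi for xi in x] for d in delta_hidden]
    return grad_hidden, grad_out


def train_step(w_hidden, w_out, x, target, lr=0.5):
    """One gradient descent update; returns new weights, leaves inputs alone."""
    grad_hidden, grad_out = backprop(w_hidden, w_out, x, target)
    new_hidden = [[w - lr * g for w, g in zip(row, g_row)]
                  for row, g_row in zip(w_hidden, grad_hidden)]
    new_out = [w - lr * g for w, g in zip(w_out, grad_out)]
    return new_hidden, new_out
```

A handy sanity check for any hand-written backpropagation is to compare one entry of the gradient against a finite-difference estimate, i.e. nudge a single weight by a tiny amount and see how much `loss` changes.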
Finally, I had a programming project to finish. In this case I was trying to re-implement (using Python 3) Keshava & Pitler’s algorithm for unsupervised morpheme induction. I wish I had more time to work on this, because I don’t feel like the final product turned out as I would like, but the deadline came up on me, and with everything else, I had to just throw a few things together. First of all, I’m pretty sure this thing is fucking riddled with bugs. Also, its performance is shit, and it’s hard to tell if that’s because of a failure in my implementation (very possible), or because I manually threw together a crappy evaluation corpus (since it would have taken me too long to convert other corpora), or because there was some important implementation detail not mentioned in the paper, or because the algorithm is legitimately not that great. I needed to explore all of this a lot more, but I just didn’t have time. Long story short…. =(
So basically in the last two weeks, the voicebuilding course went on from 10am to 5pm most days, and the rest of the time I just worked on those other deadlines. Since I had been traveling for the entire start of March, I only had a basic outline of work done for both of the deadlines, so these two weeks I had to really knuckle down. I basically didn’t go outside nearly this whole time (except for one day when my resolve could not withstand the siren call of the spring sunshine), didn’t exercise, didn’t eat very well, and didn’t sleep much. There was a stint of days there where I rapidly decreased the amount of sleep I was getting from 6 hours, to 4 hours, to finally just 2 hours at the end.
I turned everything in on Friday, but I couldn’t go home yet because I had to wait around on campus for some appointments. The weather was nice (yay, spring!), so I went outside, found a nice patch of grass, and crashed out. It was fantastic: the birds singing in the forest behind the campus, the light breeze, the soft mossy grass, and the sweet relief of a peaceful slumber, knowing that there were no more upcoming deadlines.
In fact, the birds have been going crazy all spring, and it’s been so nice.
Yesterday, I finally exercised, did laundry, cooked, and just basically took care of myself. Now I am off to Paris to see my family and my husband, who are visiting for two weeks. We will travel around Paris, Geneva, and Munich over the next couple of weeks. It’s gonna be great.
- €85 – groceries (but honestly, lots of crappy snacks to avoid cooking)
- €4 – laundry
- €34 – eating out
- €10 – bouldering
- Total: €133