Course Text
Schmandt, C. Voice Communication with Computers: Conversational Systems. New York, NY: Van Nostrand Reinhold, 1993. ISBN: 9780442239350.
Note: This book is now out of print, and is provided below in downloadable PDF form.
Complete book (PDF - 61.5 MB)
Individual chapters
- Front matter: Table of contents, preface, introduction (PDF - 3.1 MB)
- Chapter 1: Speech as communication (PDF - 2.8 MB)
- Chapter 2: Speech production and perception (PDF - 3.0 MB)
- Chapter 3: Speech coding (PDF - 4.1 MB)
- Chapter 4: Applications and editing of stored voice (PDF - 4.0 MB)
- Chapter 5: Speech synthesis (PDF - 3.5 MB)
- Chapter 6: Interactive voice response (PDF - 6.3 MB)
- Chapter 7: Speech recognition (PDF - 4.0 MB)
- Chapter 8: Using speech recognition (PDF - 5.2 MB)
- Chapter 9: Higher levels of linguistic knowledge (PDF - 5.8 MB)
- Chapter 10: Basics of telephones (PDF - 3.7 MB)
- Chapter 11: Telephones and computers (PDF - 6.9 MB)
- Chapter 12: Desktop audio (PDF - 5.5 MB)
- Chapter 13: Toward more robust communication (PDF - 1.6 MB)
- Bibliography and index (PDF - 2.5 MB)
Reading Assignments
In this table, the article titles are linked to publisher abstract pages; if you have subscriber or site license access for the publisher, you will be able to click through to the paper. Additionally, where a public version of the paper has been provided by the author, that is also linked.
| LEC # | TOPICS | READINGS | 
|---|---|---|
| 1 | Introduction and genres of conversation | |
| 2 | Components of conversation and speech production | Chapanis, A. “Interactive Human Communication.” Scientific American 232 (1975): 36-42. Chalfonte, B. L., R. S. Fish, and R. E. Kraut. “Expressive Richness: A Comparison Of Speech and Text As Media For Revision.” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, New Orleans LA, 1991. Isaacs, E. A., and J. C. Tang. “What Video Can and Cannot Do For Collaboration: A Case Study.” Proceedings of the First ACM International Conference on Multimedia, Anaheim, CA, 1993. Schmandt, chapter 1. Look at these wideband and narrow band spectrogram examples (PDF) | 
| 3 | Speech production and hearing | Arons, B. “Techniques, Perception and Applications of Time-Compressed Speech.” Proceedings of AVIOS 1992. (PDF - 1.9 MB) (Courtesy of Barry Arons. Used with permission.) Schmandt, chapter 2. Also: play with Interactive Sagittal Section of the Head. Look at Quileute Alphabet (PDF) |
| 4 | Speech coding | Schmandt, chapter 3. (This is longer than the previous chapters - read with care) | 
| 5 | Accessing recorded speech | Arons, B. “A Review of the Cocktail Party Effect.” Journal of the American Voice I/O Society, July 1992. (PDF - 1.7 MB) (Courtesy of Barry Arons. Used with permission.) Mullins, M., and C. Schmandt. “AudioStreamer: Exploring Simultaneity for Listening.” Proceedings of CHI 1995. Kobayashi, M., and C. Schmandt. “Dynamic Soundscape: Mapping Time to Space for Audio Browsing.” Proceedings of CHI 1997. (PDF) Schmandt, chapter 4. (skim) | 
| 6 | Speech synthesis | Spiegel, M. “The Difficulties with Names.” Speech Technology Magazine, June 2003. Pisoni, D. B., H. C. Nusbaum, and B. G. Greene. “Perception of Synthetic Speech Generated by Rule.” Proceedings of the IEEE 73 (1985): 1665-1676. Lai, J., D. Wood, and M. Considine. “The Effect of Task Conditions on the Comprehensibility of Synthetic Speech.” Proceedings of CHI 2000, The Hague, Netherlands. Gong, L., and J. Lai. “Shall We Mix Synthetic Speech and Human Speech?: Impact on User’s Performance, Perception, and Attitude.” Proceedings of CHI 2001, Seattle, WA. Lai, J., K. Cheng, P. Green, and O. Tsimhoni. “On the Road and on the Web?: Comprehension of Synthetic and Human Speech While Driving.” Proceedings of CHI 2001, Seattle, WA. Schmandt, chapters 5 and 6. Also, listen to these historical speech synthesis samples compiled by Dennis Klatt. | 
| 7 | Review of problem sets and speech recognition intro | Schmandt, chapter 7. | 
| 8 | Hidden Markov models | Rudnicky, A., and A. Hauptmann. “Models for Evaluating Interaction Protocols in Speech Recognition.” Proceedings of CHI 1991, New Orleans, LA. Suhm, B., et al. “A Comparative Study of Speech in the Call Center: Natural Language Call Routing vs. Touch-tone Menus.” Proceedings of CHI 2002, Minneapolis, MN. Walker, M. A., et al. “What can I say? Evaluating a spoken language interface to email.” Proceedings of CHI 1998, Los Angeles, CA. (PDF) Yankelovich, N., et al. “Designing SpeechActs: Issues in Speech User Interfaces.” Proceedings of CHI 1995, Denver, CO. (PDF) Marx, M., and C. Schmandt. “MailCall: Message Presentation and Navigation in a Nonvisual Environment.” Proceedings of CHI 1996, Vancouver, BC. Schmandt, chapter 8. |
| 9 | Applications of recognition: retrieval in voice docs | Vemuri, S., et al. “Improving Speech Playback Using Time-compression and Speech Recognition.” Proceedings of CHI 2004, Vienna, Austria. Ranjan, A., et al. “Searching in Audio: The Utility of Transcripts, Dichotic Presentation, and Time Compression.” Proceedings of CHI 2006, Montréal, Québec. Whittaker, S., et al. “SCANMail: A Voicemail Interface that Makes Speech Browsable, Readable, and Searchable.” Proceedings of CHI 2003, Minneapolis, MN. Vemuri, S., and W. Bender. “Next-generation Personal Memory Aids.” BT Technology Journal 22, no. 4 (October 2004): 125-138. (PDF) Tucker, S., and S. Whittaker. “Time is of the Essence: An Evaluation of Temporal Compression Algorithms.” Proceedings of CHI 2006, Montréal, Québec, Canada. |
| 10 | Discourse | Schmandt, chapter 9. Duncan, S. “Some Signals and Rules for Taking Speaking Turns in Conversations.” Journal of Personality and Social Psychology 23, no. 2 (1972): 283-292. Goffman, Erving. “Replies and Responses.” Chapter 1 in Forms of Talk. Philadelphia, PA: University of Pennsylvania Press, 1981, pp. 5-77. ISBN: 9780812211122. [Preview this chapter in Google Books.] Grosz, B., and C. Sidner. “Attention, Intentions, and the Structure of Discourse.” Computational Linguistics 12, no. 3 (July-September, 1986): 176-204. (PDF - 2.9 MB) Clark, H., and S. Brennan. “Grounding in Communication.” Chapter 7 in Perspectives on Socially Shared Cognition. Edited by L. Resnick, J. Levine, and S. Teasley. Washington DC: American Psychological Association, 1991. (PDF) | 
| 11 | Identity, community, and participation | Nowak, K., and C. Rauh. “The Influence of the Avatar on Online Perceptions of Anthropomorphism, Androgyny, Credibility, Homophily, and Attraction.” Journal of Computer Mediated Communication 11, no. 1 (2005). Lampel, J., and A. Bhalla. “The Role of Status Seeking in Online Communities: Giving the Gift of Experience.” Journal of Computer Mediated Communication 12, no. 2 (2007). Ren, Y., R. Kraut, and S. Kiesler. “Applying Common Identity and Bond Theory to Design of Online Communities.” Organization Studies 28 (2007): 377. (PDF) Davis, F. “Do Clothes Speak? What Makes them Fashion?” Chapter 1 in Fashion, Culture, and Identity. Chicago, IL: University of Chicago Press, 1994. ISBN: 9780226138091. [Preview this chapter in Google Books.] Yahoo! Design Pattern Library. Skim through the social section of this site. It’s not earth-shattering, but it’s a nice aggregation of common community techniques for representing identity and incenting participation that I think complement the papers in this section nicely from a very practical perspective. |
| 12 | Presence and being there | Zhao, S. “Towards a Taxonomy of Copresence.” Presence 12, no. 5 (October 2003): 445-455. (PDF) Casanueva, J., and E. Blake. “The Effects of Group Collaboration on Presence in a Collaborative Virtual Environment.” Proceedings of Eurographics Workshop on Virtual Environments, 2000. (PDF) Hollan, J., and S. Stornetta. “Beyond Being There.” Proceedings of the Conference on Human Factors in Computing Systems 1992, Monterey, CA. Smith, I., and S. Hudson. “Low Disturbance Audio for Awareness and Privacy in Media Spaces.” Proceedings of ACM Multimedia, 1995. (Note: the link to audio samples given in the paper is still good.) Neustaedter, C., S. Greenberg, and M. Boyle. “Blur Filtration Fails to Preserve Privacy for Home-based Video Conferencing.” ACM Transactions on Computer Human Interaction 13 (March 2006): 1-36. |
| 13 | Managing interruption | Fogarty, J., et al. “Predicting Human Interruptibility with Sensors.” ACM Transactions on Computer-Human Interaction 12, no. 1 (March 2005): 119-146. Avrahami, D., and S. Hudson. “Responsiveness in Instant Messaging: Predictive Models Supporting Interpersonal Communication.” Proceedings of CHI 2006, Montréal, Québec. Horvitz, E., P. Koch, and J. Apacible. “BusyBody: Creating and Fielding Personal Models of the Cost of Interruption.” Proceedings of CSCW 2004, Chicago, IL. Hsieh, G., et al. “Can Markets Help? Applying Market Mechanisms to Improve Synchronous Communication.” Proceedings of CSCW 2008, San Diego, CA. |