Course Text

Schmandt, C. Voice Communication with Computers: Conversational Systems. New York, NY: Van Nostrand Reinhold, 1993. ISBN: 9780442239350.

Note: This book is now out of print, and is provided below in downloadable PDF form.

Complete book (PDF - 61.5 MB)

Individual chapters

Reading Assignments

In this table, the article titles are linked to publisher abstract pages; if you have subscriber or site license access for the publisher, you will be able to click through to the paper. Additionally, where a public version of the paper has been provided by the author, that is also linked.

1 Introduction and genres of conversation  
2 Components of conversation and speech production

Chapanis, A. "Interactive Human Communication." Scientific American 232 (1975): 36-42.

Chalfonte, B. L., R. S. Fish, and R. E. Kraut. "Expressive Richness: A Comparison of Speech and Text as Media for Revision." Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, New Orleans, LA, 1991.

Isaacs, E. A., and J. C. Tang. "What Video Can and Cannot Do For Collaboration: A Case Study." Proceedings of the First ACM International Conference on Multimedia, Anaheim, CA, 1993.

Schmandt, chapter 1.

Look at these wideband and narrow band spectrogram examples (PDF)

3 Speech production and hearing

Arons, B. "Techniques, Perception and Applications of Time-Compressed Speech." Proceedings of AVIOS 1992. (PDF - 1.9 MB) (Courtesy of Barry Arons. Used with permission.)

Schmandt, chapter 2.

Also: play with Interactive Sagittal Section of the Head

Look at Quileute Alphabet (PDF)

4 Speech coding

Schmandt, chapter 3. (This chapter is longer than the previous ones; read with care.)

5 Accessing recorded speech

Arons, B. "A Review of the Cocktail Party Effect." Journal of the American Voice I/O Society, July 1992. (PDF - 1.7 MB) (Courtesy of Barry Arons. Used with permission.)

Mullins, M., and C. Schmandt. "AudioStreamer: Exploring Simultaneity for Listening." Proceedings of CHI 1995.

Kobayashi, M., and C. Schmandt. "Dynamic Soundscape: Mapping Time to Space for Audio Browsing." Proceedings of CHI 1997. (PDF)

Schmandt, chapter 4. (skim)

6 Speech synthesis

Spiegel, M. "The Difficulties with Names." Speech Technology Magazine, June 2003.

Pisoni, D. B., H. C. Nusbaum, and B. G. Greene. "Perception of Synthetic Speech Generated by Rule." Proceedings of the IEEE 73 (1985): 1665-1676.

Lai, J., D. Wood, and M. Considine. "The Effect of Task Conditions on the Comprehensibility of Synthetic Speech." Proceedings of CHI 2000, The Hague, Netherlands.

Gong, L., and J. Lai. "Shall We Mix Synthetic Speech and Human Speech?: Impact on User's Performance, Perception, and Attitude." Proceedings of CHI 2001, Seattle, WA. (PDF)

Lai, J., K. Cheng, P. Green, and O. Tsimhoni. "On the Road and on the Web?: Comprehension of Synthetic and Human Speech While Driving." Proceedings of CHI 2001, Seattle, WA.

Schmandt, chapters 5 and 6.

Also, listen to these historical speech synthesis samples compiled by Dennis Klatt.

7 Review of problem sets and speech recognition intro

Schmandt, chapter 7.

8 Hidden Markov models

Rudnicky, A., and A. Hauptmann. "Models for Evaluating Interaction Protocols in Speech Recognition." Proceedings of CHI 1991, New Orleans, LA.

Suhm, B., et al. "A Comparative Study of Speech in the Call Center: Natural Language Call Routing vs. Touch-tone Menus." Proceedings of CHI 2002, Minneapolis, MN.

Walker, M. A., et al. "What Can I Say? Evaluating a Spoken Language Interface to Email." Proceedings of CHI 1998, Los Angeles, CA. (PDF)

Yankelovich, N., et al. "Designing SpeechActs: Issues in Speech User Interfaces." Proceedings of CHI 1995, Denver, CO. (PDF)

Marx, M., and C. Schmandt. "MailCall: Message Presentation and Navigation in a Nonvisual Environment." Proceedings of CHI 1996, Vancouver, BC.

Schmandt, chapter 8.

9 Applications of recognition: retrieval in voice docs

Vemuri, S., et al. "Improving Speech Playback Using Time-compression and Speech Recognition." Proceedings of CHI 2004, Vienna, Austria.

Ranjan, A., et al. "Searching in Audio: The Utility of Transcripts, Dichotic Presentation, and Time Compression." Proceedings of CHI 2006, Montréal, Québec.

Whittaker, S., et al. "SCANMail: A Voicemail Interface that Makes Speech Browsable, Readable, and Searchable." Proceedings of CHI 2003, Minneapolis, MN.

Vemuri, S., and W. Bender. "Next-generation Personal Memory Aids." BT Technology Journal 22, no. 4 (October 2004): 125-138. (PDF)

Tucker, S., and S. Whittaker. "Time is of the Essence: An Evaluation of Temporal Compression Algorithms." Proceedings of CHI 2006, Montréal, Québec, Canada.

10 Discourse

Schmandt, chapter 9.

Duncan, S. "Some Signals and Rules for Taking Speaking Turns in Conversations." Journal of Personality and Social Psychology 23, no. 2 (1972): 283-292.

Goffman, Erving. "Replies and Responses." Chapter 1 in Forms of Talk. Philadelphia, PA: University of Pennsylvania Press, 1981, pp. 5-77. ISBN: 9780812211122. [Preview this chapter in Google Books.]

Grosz, B., and C. Sidner. "Attention, Intentions, and the Structure of Discourse." Computational Linguistics 12, no. 3 (July-September, 1986): 176-204. (PDF - 2.9 MB)

Clark, H., and S. Brennan. "Grounding in Communication." Chapter 7 in Perspectives on Socially Shared Cognition. Edited by L. Resnick, J. Levine, and S. Teasley. Washington DC: American Psychological Association, 1991. (PDF)

11 Identity, community, and participation

Nowak, K., and C. Rauh. "The Influence of the Avatar on Online Perceptions of Anthropomorphism, Androgyny, Credibility, Homophily, and Attraction." Journal of Computer-Mediated Communication 11, no. 1 (2005).

Lampel, J., and A. Bhalla. "The Role of Status Seeking in Online Communities: Giving the Gift of Experience." Journal of Computer-Mediated Communication 12, no. 2 (2007).

Ren, Y., R. Kraut, and S. Kiesler. "Applying Common Identity and Bond Theory to Design of Online Communities." Organization Studies 28 (2007): 377. (PDF)

Davis, F. "Do Clothes Speak? What Makes Them Fashion?" Chapter 1 in Fashion, Culture, and Identity. Chicago, IL: University of Chicago Press, 1994. ISBN: 9780226138091. [Preview this chapter in Google Books.]

Yahoo! Design Pattern Library. Skim through the social section of this site. It's not earth-shattering, but it's a nice aggregation of common community techniques for representing identity and encouraging participation, which I think complement the papers in this section nicely from a very practical perspective.

12 Presence and being there

Zhao, S. "Towards a Taxonomy of Copresence." Presence 12, no. 5 (October 2003): 445-455. (PDF)

Casanueva, J., and E. Blake. "The Effects of Group Collaboration on Presence in a Collaborative Virtual Environment." Proceedings of Eurographics Workshop on Virtual Environments, 2000. (PDF)

Hollan, J., and S. Stornetta. "Beyond Being There." Proceedings of the Conference on Human Factors in Computing Systems 1992, Monterey, CA.

Smith, I., and S. Hudson. "Low Disturbance Audio for Awareness and Privacy in Media Spaces." Proceedings of ACM Multimedia, 1995. (Note: the link to audio samples given in the paper is still good.)

Neustaedter, C., S. Greenberg, and M. Boyle. "Blur Filtration Fails to Preserve Privacy for Home-based Video Conferencing." ACM Transactions on Computer-Human Interaction 13 (March 2006): 1-36.

13 Managing interruption

Fogarty, J., et al. "Predicting Human Interruptibility with Sensors." ACM Transactions on Computer-Human Interaction 12, no. 1 (March 2005): 119-146.

Avrahami, D., and S. Hudson. "Responsiveness in Instant Messaging: Predictive Models Supporting Interpersonal Communication." Proceedings of CHI 2006, Montréal, Québec.

Horvitz, E., P. Koch, and J. Apacible. "BusyBody: Creating and Fielding Personal Models of the Cost of Interruption." Proceedings of CSCW 2004, Chicago, IL.

Hsieh, G., et al. "Can Markets Help? Applying Market Mechanisms to Improve Synchronous Communication." Proceedings of CSCW 2008, San Diego, CA.