# 5.2 Turning Tweets into Knowledge: An Introduction to Text Analytics

## Quick Question

Exercise 1

Given a corpus in R, how many commands do you need to run to clean up the irregularities (removing capital letters and punctuation)?

Exercise 2

How many commands do you need to run to stem the document?

Explanation

In R, you can clean up the irregularities with two lines:

corpus = tm_map(corpus, content_transformer(tolower))

corpus = tm_map(corpus, removePunctuation)

And you can stem the document with one line:

corpus = tm_map(corpus, stemDocument)
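The three commands above can be tried end to end on a toy corpus. The following is a minimal sketch, assuming the `tm` and `SnowballC` packages are installed; the sample tweets and the `tweets` and `cleaned` variable names are invented for illustration. Recent versions of `tm` expect base R functions such as `tolower` to be wrapped in `content_transformer()`, so the sketch does that.

```r
library(tm)         # corpus handling and tm_map transformations
library(SnowballC)  # provides the stemmer used by stemDocument

# Hypothetical sample tweets, invented for illustration
tweets = c("I LOVE my new phone!!!", "Worst. Update. Ever...")

# Build a corpus from the character vector
corpus = VCorpus(VectorSource(tweets))

# Clean up the irregularities (two commands)
corpus = tm_map(corpus, content_transformer(tolower))
corpus = tm_map(corpus, removePunctuation)

# Stem the documents (one command)
corpus = tm_map(corpus, stemDocument)

# Pull the processed text back out as a character vector
cleaned = sapply(corpus, as.character)
print(cleaned)
```

After these steps the text should contain no capital letters or punctuation, and words are reduced to their stems.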
