Quick Question
Given a corpus in R, how many commands do you need to run to clean up the irregularities (converting all text to lowercase and removing punctuation)?
Exercise 1
How many commands do you need to run to stem the document?
Exercise 2
Explanation
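In R, you can clean up the irregularities with two lines, one to convert the text to lowercase and one to remove punctuation. In tm 0.6 and later, tolower should be wrapped in content_transformer so that the documents keep their structure:
corpus = tm_map(corpus, content_transformer(tolower))
corpus = tm_map(corpus, removePunctuation)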
And you can stem the document with one line:
corpus = tm_map(corpus, stemDocument)
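To see the three commands working together, here is a minimal sketch that assumes the tm and SnowballC packages are installed. The two sample sentences and the names texts and cleaned are invented for illustration and are not part of the exercise. The sketch wraps tolower in content_transformer, which is what tm 0.6 and later expect.

```r
library(tm)         # provides VCorpus, tm_map, removePunctuation, stemDocument
library(SnowballC)  # stemming backend that stemDocument relies on

# Hypothetical two-document corpus, made up for this example
texts <- c("The Cats are RUNNING, quickly!", "Apple's new phones: amazing?")
corpus <- VCorpus(VectorSource(texts))

# Cleaning command 1: convert everything to lowercase
corpus <- tm_map(corpus, content_transformer(tolower))

# Cleaning command 2: strip punctuation characters
corpus <- tm_map(corpus, removePunctuation)

# Stemming command: reduce each word to its stem
corpus <- tm_map(corpus, stemDocument)

# Pull the processed text back out as a plain character vector
cleaned <- unname(sapply(corpus, as.character))
print(cleaned)
```

Printing cleaned should show text with no capital letters or punctuation, with words shortened to their stems. The exact stems depend on the installed stemmer version.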