Given a corpus in R, how many commands do you need to run to clean up the irregularities (removing capital letters and punctuation)?
How many commands do you need to run to stem the document?
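In R, you can clean up the irregularities with two lines (in tm 0.6 and later, base functions such as tolower must be wrapped in content_transformer so the documents keep their type):
corpus = tm_map(corpus, content_transformer(tolower))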
corpus = tm_map(corpus, removePunctuation)
And you can stem the document with one line (stemDocument relies on the SnowballC package being installed):
corpus = tm_map(corpus, stemDocument)
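To see the three commands working together, here is a minimal sketch on a tiny made-up corpus. The sample sentences and the variable names `texts` and `cleaned` are assumptions for illustration, and the `content_transformer` wrapper assumes a recent version of tm (0.6 or later); older versions accepted `tolower` directly.

```r
library(tm)  # stemDocument also needs the SnowballC package installed

# Hypothetical two-document corpus, for illustration only
texts = c("Hello, World! Running dogs.", "Cleaning TEXT is easy, right?")
corpus = VCorpus(VectorSource(texts))

# Two commands to clean up the irregularities
corpus = tm_map(corpus, content_transformer(tolower))
corpus = tm_map(corpus, removePunctuation)

# One command to stem the documents
corpus = tm_map(corpus, stemDocument)

# Pull the processed text back out of each document and inspect it
cleaned = sapply(corpus, as.character)
print(cleaned)
```

The order matters a little: lowercasing and stripping punctuation first means the stemmer sees plain lowercase words rather than tokens like "dogs." with trailing punctuation.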