5.2 Turning Tweets into Knowledge: An Introduction to Text Analytics

Quick Question

Given a corpus in R, how many commands do you need to run to clean up the irregularities (that is, to remove capital letters and punctuation)?

Exercise 1

How many commands do you need to run to stem the document?

Exercise 2

Explanation

In R, you can clean up the irregularities with two lines:

corpus = tm_map(corpus, content_transformer(tolower))

corpus = tm_map(corpus, removePunctuation)

And you can stem the document with one line:

corpus = tm_map(corpus, stemDocument)
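To see the three commands in context, here is a minimal sketch that builds a small corpus and applies them in order. It assumes the tm and SnowballC packages are installed (stemDocument relies on SnowballC) and that tm is version 0.6 or later, where base R functions such as tolower are wrapped in content_transformer. The two example tweets and the variable names tweets and cleaned are made up for illustration and are not part of the course data.

```r
library(tm)
library(SnowballC)  # provides the stemmer that stemDocument uses

# Hypothetical example tweets, not taken from the course data set
tweets = c("I LOVE my new iPhone!!!",
           "Apple's customer service is disappointing...")

# Build a corpus from the character vector
corpus = VCorpus(VectorSource(tweets))

# Clean up the irregularities: two commands
corpus = tm_map(corpus, content_transformer(tolower))
corpus = tm_map(corpus, removePunctuation)

# Stem the documents: one command
corpus = tm_map(corpus, stemDocument)

# Pull the processed text back out as a character vector for inspection
cleaned = sapply(corpus, content)
print(cleaned)
```

After these steps, the text in cleaned should contain no uppercase letters or punctuation, and each word should be reduced to its stem.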
