15.071 | Spring 2017 | Graduate

The Analytics Edge

5.2 Turning Tweets into Knowledge: An Introduction to Text Analytics


Quick Question

 

Given a corpus in R, how many commands do you need to run to clean up the irregularities (converting capital letters to lowercase and removing punctuation)?

Exercise 1 (numerical response)

 

How many commands do you need to run to stem the document?

Exercise 2 (numerical response)

 

Explanation

In R, using the tm package, you can clean up the irregularities with two commands:

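corpus = tm_map(corpus, content_transformer(tolower))

corpus = tm_map(corpus, removePunctuation)

In recent versions of tm, passing tolower directly to tm_map can turn the documents into plain character strings, which breaks later steps such as DocumentTermMatrix; wrapping it in content_transformer keeps them valid text documents.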
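To see what these two steps do to the raw text, here is a minimal base-R sketch that applies the same operations to a plain character vector rather than a tm corpus. The two tweets are hypothetical, and the gsub call over the POSIX [[:punct:]] class is only an approximation of removePunctuation's default behavior, not the tm implementation itself.

```r
# Hypothetical example tweets (not taken from the course data set)
tweets = c("I LOVE my new #iPhone!!!",
           "@Apple, why won't my phone charge?")

# Step 1: convert every character to lowercase
clean = tolower(tweets)

# Step 2: delete every run of punctuation characters;
# [[:punct:]] covers symbols such as # @ , ' ! ?
clean = gsub("[[:punct:]]+", "", clean)

print(clean)
```

Removing punctuation this way also merges contractions ("won't" becomes "wont") and strips the # and @ markers, so "#iPhone" and "iPhone" end up as the same word.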

And you can stem the documents with one command (stemDocument requires the SnowballC package to be installed):

corpus = tm_map(corpus, stemDocument)
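Putting the three commands together, the following sketch runs the whole preprocessing pipeline on a small corpus. It assumes the tm and SnowballC packages are installed, builds the corpus with VCorpus(VectorSource(...)) from a hypothetical vector of tweets in place of the course data, and uses content() to read back each document's processed text.

```r
library(tm)        # Corpus tools: VCorpus, VectorSource, tm_map, ...
library(SnowballC) # Stemming backend that stemDocument calls

# Hypothetical tweets standing in for the course data set
tweets = c("Arguing about ARGUED arguments!",
           "She argues; they argue.")

# Build a corpus with one document per tweet
corpus = VCorpus(VectorSource(tweets))

# Clean up irregularities: lowercase, then drop punctuation
# (content_transformer keeps each element a text document)
corpus = tm_map(corpus, content_transformer(tolower))
corpus = tm_map(corpus, removePunctuation)

# Reduce each word to its stem, so variants such as
# "argue" and "argues" map to a shared stem
corpus = tm_map(corpus, stemDocument)

# Collect the processed text of every document
processed = sapply(seq_along(corpus), function(i) content(corpus[[i]]))
print(processed)
```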


 
