Extended Usage Example

To show how text analysis works in practice, we will use a corpus of State of the Union addresses delivered by American presidents. The code assumes the speeches are stored as files in a directory named sotu.

using TextAnalysis, MultivariateStats, Clustering

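# Build a corpus from the files in the "sotu" directory
crps = DirectoryCorpus("sotu")

# Convert every document to a StringDocument so they all share one type
standardize!(crps, StringDocument)

# Keep only the first 30 documents
crps = Corpus(crps[1:30])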

# Normalize the text: lowercase everything and strip punctuation
remove_case!(crps)
prepare!(crps, strip_punctuation)

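# Build the lexicon (term counts) and the inverse index (term => documents)
update_lexicon!(crps)
update_inverse_index!(crps)

# Look up the indices of the documents that contain "freedom"
crps["freedom"]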

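# Build the document-term matrix (rows are documents, columns are terms)
m = DocumentTermMatrix(crps)

# Extract it as a dense matrix
D = dtm(m, :dense)

# Reweight the raw counts with TF-IDF
T = tf_idf(D)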

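# Clustering.jl treats each column as one observation, so transpose the
# documents × terms matrix to cluster the documents rather than the terms
cl = kmeans(Matrix(T'), 5)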
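
To make the TF-IDF step more concrete, here is a minimal sketch of one common TF-IDF formulation applied to a tiny hand-made count matrix. The helper toy_tf_idf and the toy data are hypothetical and exist only for illustration. The weighting shown (term frequency times the log of inverse document frequency) is an assumption and may differ in detail from what tf_idf in TextAnalysis computes.

```julia
# Toy documents × terms count matrix (hypothetical data, not from the corpus):
# 3 documents (rows), 4 terms (columns).
counts = [2 0 1 0;
          0 3 1 0;
          1 1 1 1]

# Hypothetical helper: one common TF-IDF formulation. It assumes every
# document has at least one term and every term occurs in some document.
function toy_tf_idf(counts::AbstractMatrix{<:Real})
    n_docs = size(counts, 1)
    # Term frequency: each count divided by its document's total term count.
    tf = counts ./ sum(counts, dims=2)
    # Document frequency: number of documents in which each term occurs.
    df = sum(counts .> 0, dims=1)
    # Inverse document frequency: rarer terms receive larger weights.
    idf = log.(n_docs ./ df)
    # Broadcast the 1 × terms idf row across every document row.
    return tf .* idf
end

weights = toy_tf_idf(counts)
```

In this toy matrix the third term occurs in every document, so its weight is zero everywhere, while the fourth term occurs in only one document and gets the largest idf factor. This is why a TF-IDF matrix can be a better clustering input than raw counts: terms shared by all documents stop contributing to the distances between them.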