% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/sort_tf.R
\name{sort_tf}
\alias{sort_tf}
\title{Find High Frequency Terms}
\usage{
sort_tf(x, top = 10, type = "dtm", todf = FALSE, must_exact = FALSE)
}
\arguments{
\item{x}{a matrix, or an object created by \code{\link{corp_or_dtm}}, 
\code{\link[tm]{DocumentTermMatrix}}, or \code{\link[tm]{TermDocumentMatrix}}.
A data frame is not allowed. If \code{x} is a matrix, its column names (if \code{type} is "dtm") 
or row names (if \code{type} is "tdm") are taken to be the terms; see the \code{type} argument. If the names 
are \code{NULL}, terms are automatically set to "term1", "term2", "term3", and so on.}

\item{top}{a length 1 integer. Terms are sorted in decreasing 
order of term frequency, and this argument decides how many of the top terms are returned.
The default is 10. If the number of terms is smaller than \code{top}, all terms are returned.
When terms are tied at the cutoff frequency, more than \code{top} terms may be returned; see Details.}

\item{type}{a character string starting with "D" or "d" for a document term matrix, 
or with "T" or "t" for a term document matrix.
It is only used when \code{x} is a matrix. The default is "dtm".}

\item{todf}{should be \code{TRUE} or \code{FALSE}. If it is \code{FALSE} (default), 
terms and their frequencies are pasted together with "&" and messaged on the screen, and nothing is 
returned. Otherwise, terms and frequencies are returned as a data frame.}

\item{must_exact}{should be \code{TRUE} or \code{FALSE} (default). It decides whether 
the number of returned terms must be exactly equal to \code{top}. See Details.}
}
\value{
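If \code{todf} is \code{FALSE}, nothing is returned and the result is messaged on the screen.
If \code{todf} is \code{TRUE}, a data frame of terms and their frequencies is returned.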
}
\description{
Given a matrix, a document term matrix, or a term document matrix, this function sums
the frequency of each term and outputs the top n terms. The result can be messaged on the screen, so 
that you can manually copy it to other places (e.g., Excel).
}
\details{
Sometimes more terms are returned than specified by \code{top}. For example, suppose you ask 
for the top 5 terms and the frequency of the 5th term is 20, but two more terms also 
have a frequency of 20. In that case, \code{sort_tf} may return 7 terms. If you want 
exactly 5 terms, set \code{must_exact} to \code{TRUE}.
}
\examples{
require(tm)
x <- c(
  "Hello, what do you want to drink?", 
  "drink a bottle of milk", 
  "drink a cup of coffee", 
  "drink some water", 
  "hello, drink a cup of coffee")
dtm <- corp_or_dtm(x, from = "v", type = "dtm")
# Argument top is 5, but more than 5 terms are returned
sort_tf(dtm, top = 5)
# Set must_exact to TRUE, return exactly 5 terms
sort_tf(dtm, top = 5, must_exact = TRUE)
# Input is a matrix and terms are not specified
m <- as.matrix(dtm)
colnames(m) <- NULL
mt <- t(m)
sort_tf(mt, top = 5, type = "tdm")
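# The tie handling described in Details can be sketched in base R.
# This is only an illustration under assumptions: the matrix toy and its
# term names are made up, and the code does not claim to mirror the
# internals of sort_tf.
toy <- matrix(
  c(2, 3, 1, 2, 1, 1, 2, 0, 0, 1), nrow = 2,
  dimnames = list(NULL, c("drink", "a", "cup", "of", "milk")))
# Sum each column (term) and sort in decreasing order
freq <- sort(colSums(toy), decreasing = TRUE)
top <- 3
# Keep every term tied with the 3rd one, so more than 3 terms can remain
with_ties <- freq[freq >= freq[top]]
# Keep exactly 3 terms, as must_exact = TRUE is documented to do
exact <- head(freq, top)
length(with_ties)
length(exact)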
}
