What is Text Mining

Businesses collect many types of text data, but it’s difficult to get value out of this information when it has to be processed by hand. Manual processing cannot keep up with the volume that many companies handle, and text mining technology provides the automation needed to close that gap.

At its most basic level, text mining is an automated method of extracting information from written data. There are three major categories that text mining can fall under:

Information extraction: The text analysis software identifies and pulls information directly from the text, which is often presented in natural language form. It finds the most important data by structuring the written input and identifying any patterns that show up in the data set.

Text analysis: This type of text mining analyzes the written input for trends and patterns and prepares the data for reporting purposes. It relies heavily on natural language processing, as well as other types of automated analysis. The business receives actionable insights into its unstructured data.

Knowledge discovery and extraction: Machine learning processes the data contained in unstructured sources, allowing companies to quickly track down relevant and useful information across a variety of resources.
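To make the information extraction category concrete, here is a minimal Python sketch that turns one free-text comment into a small structured record. The two patterns and the `extract_fields` name are assumptions made for this illustration; commercial tools rely on much richer language models than a pair of regular expressions.

```python
import re

# Hypothetical patterns for this sketch only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
ORDER_ID = re.compile(r"\border\s+#(\d+)", re.IGNORECASE)

def extract_fields(comment: str) -> dict:
    """Pull email addresses and order numbers out of one comment."""
    return {
        "emails": EMAIL.findall(comment),
        "order_ids": ORDER_ID.findall(comment),
    }

comment = "Order #48213 arrived damaged. Please reply to jane.doe@example.com."
record = extract_fields(comment)
print(record)
```

Once every comment is reduced to a record like this, the results can be counted, filtered, and reported on like any other structured data.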

Learn more about how text mining impacts business.

How is Text Mining Used

One of the most common ways to use text mining in a business environment is for sentiment analysis. This use case allows customer experience teams, research professionals, human resource teams, marketers, and other professionals to understand how their audience feels about specific questions or topics. Net Promoter Scores and similar surveys significantly benefit from text mining.

In sentiment analysis, each piece of text is classified as positive, negative, or neutral. The software also determines how strongly the sentiment leans in that direction. You can use this information to guide your decision-making, respond to feedback from customers or employees, and strengthen the data that you collect from other sources.
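The idea of a sentiment label plus a strength can be sketched in a few lines of Python. This is a toy word-list approach, assumed here for illustration; the word lists, the `score_sentiment` name, and the -1.0 to 1.0 scale are not taken from any particular product, and real tools use trained language models.

```python
import re
from typing import Tuple

# Toy word lists, assumed for illustration only.
POSITIVE = {"great", "love", "helpful", "fast", "excellent"}
NEGATIVE = {"slow", "broken", "rude", "terrible", "confusing"}

def score_sentiment(text: str) -> Tuple[str, float]:
    """Return (label, score), where score runs from -1.0 to 1.0."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return "neutral", 0.0
    # Share of sentiment words that lean positive minus those that lean negative.
    score = (pos - neg) / (pos + neg)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score

for comment in [
    "The support team was helpful and fast, but the app is slow.",
    "The delivery arrived on Tuesday.",
    "Terrible, confusing checkout.",
]:
    print(comment, "->", score_sentiment(comment))
```

The label answers "which direction", and the score answers "how far", which is the distinction a survey team needs when deciding which comments to act on first.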

The type of text mining needed for your business depends on the data that you’re working with, the information that you’re trying to get out of your written sources, and the end use of that analysis. Text mining tools come in many shapes and forms, and the right solution depends on many factors. Here are a few options to consider.

Best Free Text Mining Tools

When you’re not sure exactly how you want to use text mining for your organization, working with a free tool makes a lot of sense. You can experiment with a variety of options to see the ones that provide the best utility and will work with your current infrastructure. Here are some of the best free text mining tools on the market.


Aylien

Aylien is an API designed for analyzing text contained in Google Sheets and other text sources. You can set this up as a business intelligence tool that’s capable of performing sentiment analysis, labeling documents, suggesting hashtags, and detecting the language that a particular data set is in. One particularly useful feature is that it can use URLs as a source as well, and it’s designed to extract only the text from a web page rather than pulling in all of the content. Your organization would need a development team that can work with the API.


Keatext

Keatext is a cloud-based text mining platform that works with larger unstructured data sets. The system can do sentiment analysis without your company needing to configure a complete text mining solution, which can involve a lot of work on the backend if it’s not cloud-based. In addition to picking up on customer sentiment in the text, it also categorizes responses into broad topics: suggestions, questions, praise, and problems.
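Sorting responses into the four broad topics named above can be illustrated with a simple rule-based sketch in Python. This is not how Keatext works internally, which the source does not describe; the cue phrases, the `RULES` table, and the `categorize` function are hypothetical and only show the shape of the task.

```python
# Hypothetical cue phrases; the first category with a matching cue wins.
RULES = [
    ("questions", ("?", "how do i", "can i")),
    ("suggestions", ("should", "would be nice", "please add")),
    ("problems", ("error", "crash", "does not work", "broken")),
    ("praise", ("love", "great", "thank")),
]

def categorize(comment: str) -> str:
    """Assign one broad topic to a comment, or 'uncategorized'."""
    lowered = comment.lower()
    for category, cues in RULES:
        if any(cue in lowered for cue in cues):
            return category
    return "uncategorized"

for comment in [
    "How do I export my report?",
    "You should add a dark mode.",
    "The app keeps crashing on login.",
    "Love the new dashboard, thank you!",
]:
    print(categorize(comment), "<-", comment)
```

Hand-written rules like these break down quickly on real feedback, which is the gap that trained models in hosted platforms are meant to fill.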


Datumbox

Datumbox is not strictly a text mining solution. Instead, it’s an open source machine learning framework written in Java, with many capabilities that your company can put to use for text mining. The platform groups its services into different applications, and these are the ones most relevant to text analysis:

  • Text Extraction
  • Language Detection
  • Sentiment Analysis
  • Topic Classification
  • Keyword Extraction

This robust framework offers a REST API for using these functions in your custom development projects. It includes many algorithms and models for working with unstructured data. It’s relatively straightforward to work with, although it is better suited to organizations that have custom development resources available.


KHCoder

KHCoder provides text mining capabilities that support a range of languages. Many text analysis tools offer limited language support, which is not ideal for companies that operate globally or in regions where more than one language is spoken natively. KHCoder covers 13 major languages, ranging from Dutch to Simplified Chinese.

RapidMiner Text Mining Extension

RapidMiner Text Mining Extension is part of a comprehensive data science platform. This solution is designed for advanced users, such as data scientists and data engineers. You can extract useful information from written resources, including social media updates, research journals, reviews, and others. However, this platform may be overkill for organizations that are not trying to get into the nuts and bolts of data science. It’s easy to get overwhelmed with the functionality, which results in a long implementation and training period. 


Textable

Textable is open source text mining software that focuses on visualizing the insights that you gain. For its basic text analysis functions, you can filter segments, create random text for sampling, and put expressions in place to automate segmenting text data.

For more advanced text mining, you can use complex algorithms that include clustering, look at segment distribution, and leverage linguistic complexity analysis. It can also recode the text that you input into it as needed.

This software is flexible and extendable, although it’s limited to smaller sets of data overall, making it better for smaller businesses than larger organizations. It’s compatible with many technologies, allowing you to use Python for additional scripting. It supports practically any text data format and encoding.

Unlike many other free text mining solutions, Textable is relatively user-friendly and offers a visual interface and built-in functions that cover your typical text mining operations. It has significant support from the developer community, so if your company has questions, support is readily available. This is a relatively beginner-friendly option, with some good features for intermediate users as well. 

Google Cloud Natural Language API

Google offers its own cloud-based Natural Language API for companies that are looking to leverage a robust set of functionalities offered by this tech giant. This is a machine learning platform that supports classification, extraction, and sentiment analysis.

Google has pre-trained its natural language processing models in this solution, so you don’t have to go through that step before extracting your data. It’s deeply integrated with Google’s cloud storage solutions, which can be handy if you’re already using that as a key part of your infrastructure or you need an accessible location to store large volumes of text data.
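Because the models are pre-trained, a sentiment request only needs to carry the text itself. The Python sketch below builds the JSON body for the API's REST sentiment method without sending it; authentication, the HTTP call, and error handling are left out, and the `build_sentiment_request` helper is a name invented for this example. Check Google's current documentation before relying on the endpoint version shown.

```python
import json

# REST endpoint for sentiment requests (v1 at the time of writing;
# verify the version against Google's documentation).
ENDPOINT = "https://language.googleapis.com/v1/documents:analyzeSentiment"

def build_sentiment_request(text: str) -> str:
    """Return the JSON body for a plain-text sentiment request."""
    body = {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }
    return json.dumps(body)

payload = build_sentiment_request("The onboarding was smooth and the staff were kind.")
print("POST", ENDPOINT)
print(payload)
```

The response to such a request reports a sentiment score and a magnitude for the document, which map onto the direction and strength of sentiment described earlier in this article.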

This text mining solution also supports audio analysis through the Speech-to-Text API and optical character recognition to quickly analyze documents scanned into the system. Another useful integration is the Google Translation API, which lets you run sentiment analysis on data sources that contain multiple languages.

This solution is best for mid-sized to large companies that have advanced text analysis needs and the development team to support custom solutions and models. Smaller companies may not have the resources they need to get enough value out of this platform, and may not have users with enough technical knowledge to get everything set up and operational.

General Architecture for Text Engineering

General Architecture for Text Engineering, otherwise known as GATE, is a comprehensive text processing toolkit that equips development teams with the resources they need for text mining.

This toolkit requires a team capable of implementing it in the organization through custom development, and there is an active community surrounding it. If you want a strong toolkit for implementing text mining into your own applications, this is a good place to get started. 

The Benefits of Premium Text Mining Tools

Free text mining solutions are useful for discovering what type of capabilities you need for your organization, but they often require a significant outlay to scale up to the level that your organization needs in a production environment. Premium text mining tools, such as the Ascribe Intelligence Suite, deliver a strong ROI that more than balances out the costs associated with implementing them. Here are several of the benefits that this premium text analysis suite brings to the table.

Code Verbatim Comments Quickly and Accurately

Ascribe Coder simplifies the process of categorizing large sets of verbatim comments. This computer-assisted coding solution empowers your staff with the capabilities they need to be highly productive when working their way through survey responses, email messages, social media text, and other sources. Some ways that you can implement this in your organization are testing advertising copy, studying your Net Promoter Score, working through employee engagement surveys, and doing a deep dive into what people are really saying in customer satisfaction surveys.

Gain Actionable Insights From Customer Feedback

Ascribe CX Inspector goes through sets of verbatim comments and provides complete analysis and visualization, allowing your organization’s decision-makers to quickly act on this information. The automated functionality cuts down the time it takes to get usable insights from your unstructured data sets. An Instant Dashboard makes it easy to look at the information with different types of visualization, and the advanced natural language processing technology is incredibly powerful. Whether you want to use this to go through employee engagement surveys or fuel your Voice of the Customer studies, you have a robust tool on hand.

Automatically Learn More About Your Customer Experience

CX Inspector offers a highly customizable solution for researchers looking for an advanced text analytics utility. Both sentiment analysis and topic classification are included in this part of the suite, powered by natural language processing and AI. You get to better understand the “Why” behind the score your customers gave you, and how you can improve.

It includes many advanced features, such as removal of personally identifiable information, support for custom rulesets, handling of both unstructured and structured data, an API connector for creating customized applications, and results comparison.

The interface is user-friendly and includes drag-and-drop control for developing custom taxonomies.

It’s especially useful for companies that have many languages in their verbatim comments, since this solution supports 100 languages with automatic translation.

Better Understand Your Text Analysis Through Powerful Visualizations

Getting a lot of insights out of your text mining is only the first step toward using them to fuel a data-driven action plan. Ascribe Illustrator allows you to take the next step by providing your organization with powerful and flexible data visualizations that help non-technical stakeholders understand the analytics produced by text mining platforms.

Here are a few of the many visualizations that you can use with this premium text mining software:

  • Real-time visualization
  • Correlation matrix
  • Dynamic visualization
  • Granular report detail
  • Mirror charts
  • Powerful filters
  • Structured data integration
  • Easily import and export data from multiple sources
  • User-friendly dashboards
  • Enhanced Word Clouds
  • Heat maps
  • Co-occurrence charts

While paying for premium text analytics software may require an upfront investment, you end up getting significantly higher value out of a platform like Ascribe than you would with free solutions. Your verbatim feedback and other text data contain some of the most valuable information available in your organization. It makes sense to work with a premium product that allows everyone from non-technical stakeholders to data scientists to use it effectively.