We are going to talk about taxonomies as used for text analytics.  But to get to that, let’s first look at what we mean by a taxonomy.  In general terms, taxonomies classify things.  You may remember this from biology classes.  Biologists classify humans as primates, primates as mammals, and mammals as members of the animal kingdom (skipping a few levels along the way).  A human is a primate, a primate is a mammal, and a mammal is an animal.  Taxonomies are an “is-a” hierarchy.  Each level belongs to the next level up with an “I am an instance of” relationship.
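The “is-a” chain from the biology example can be sketched as a simple parent map. This is only an illustrative sketch: the names `IS_A`, `ancestors`, and `is_a` are hypothetical, and the hierarchy skips levels just as the prose does.

```python
# Hypothetical parent map: each entry reads "key is-a value".
# Levels are skipped on purpose, mirroring the simplified biology example.
IS_A = {
    "human": "primate",
    "primate": "mammal",
    "mammal": "animal",
}


def ancestors(term):
    """Return every broader category above term, nearest first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def is_a(term, category):
    """True if category appears anywhere above term in the hierarchy."""
    return category in ancestors(term)


print(ancestors("human"))
print(is_a("human", "animal"))
```

Walking up the map one step at a time is what makes the relationship transitive: a human is an animal because a human is a primate, a primate is a mammal, and a mammal is an animal.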

Taxonomies are an important tool in natural language processing in general, and in text analytics in particular.

While taxonomies can have several levels (Animal => Mammal => Primate => Human), in text analytics it is common to have only two.  For example, we could have a taxonomy that looks like this:

  • Government agency
    • Internal Revenue Service
    • United States Treasury
    • Office of Management and Budget
  • Commercial enterprise
    • Apple
    • IBM
    • Tesla
  • Non-Profit
    • American Red Cross
    • Sierra Club
    • Doctors Without Borders

This taxonomy classifies organizations by type.  In text analytics for customer experience management, we are more likely to want to classify specific topics into more general topics.  We can think of this as placing topics into groups.  Suppose we are an airline; we might want a taxonomy like this:

  • Baggage
    • Bag
    • Luggage
    • Suitcase
  • In-flight technology
    • Wi-Fi
    • Video
    • Movies
  • Airport amenities
    • Lounge
    • Gate
    • Tram

The intent of the taxonomy is to group customer mentions so that we can consider them as a whole.  Suppose we used text analytics to find the topics mentioned in an airline customer satisfaction survey.  The taxonomy above could be used to group those topics into the general areas we want to consider together.

This application of the concept of a taxonomy is really a way of grouping topics into broader concepts that make sense for our particular business.  It is a domain-specific method of grouping or aggregating, and a powerful technique for tailoring text analytics findings to a specific business vertical.

When used in this way we can view the taxonomy as a list of groups, each of which has one or more synonyms.  The taxonomy maps synonyms into groups.
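A minimal sketch of this synonym-to-group mapping, using the airline taxonomy from earlier, might look like the following. The names `TAXONOMY`, `SYNONYM_TO_GROUP`, and `group_counts` are hypothetical, as is the sample list of extracted topics; a real text analytics product would supply its own topic output.

```python
from collections import Counter

# The airline taxonomy from the text: each group lists its synonyms.
TAXONOMY = {
    "Baggage": ["bag", "luggage", "suitcase"],
    "In-flight technology": ["wi-fi", "video", "movies"],
    "Airport amenities": ["lounge", "gate", "tram"],
}

# Invert the taxonomy so each synonym points at its group.
SYNONYM_TO_GROUP = {
    synonym: group
    for group, synonyms in TAXONOMY.items()
    for synonym in synonyms
}


def group_counts(topics):
    """Count mentions per group; topics outside the taxonomy are skipped."""
    counts = Counter()
    for topic in topics:
        group = SYNONYM_TO_GROUP.get(topic.lower())
        if group is not None:
            counts[group] += 1
    return counts


# Hypothetical topics extracted from survey responses.
topics = ["Luggage", "Wi-Fi", "suitcase", "Lounge", "bag", "legroom"]
result = group_counts(topics)
print(result)
```

Here three separate mentions (luggage, suitcase, bag) roll up into a single Baggage count, while "legroom" is ignored because no group claims it. Whether unmatched topics should be dropped or collected under a catch-all group is a design choice this sketch leaves open.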

As you can imagine from the brief examples, real taxonomies can become very large.  There are techniques to manage this.  First, some software products are capable of creating taxonomies automatically.  These use algorithms that examine the results of the text analytics and attempt to create taxonomies suitable for the analyzed text.  Second, the taxonomy may allow more powerful means of specifying synonyms than simple text matching.  For example, the software may allow the use of regular expressions to specify synonyms.
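As a rough illustration of the regular expression approach, the sketch below replaces each group's synonym list with a single pattern, using Python's standard `re` module. The patterns and the `groups_for` helper are assumptions made for illustration and do not reflect any particular product's syntax.

```python
import re

# Hypothetical patterns: one regular expression per group.
# Optional plurals and the optional hyphen in "wi-fi" cover variants
# that would otherwise each need their own synonym entry.
PATTERNS = {
    "Baggage": re.compile(r"\b(?:bags?|luggage|suitcases?)\b", re.IGNORECASE),
    "In-flight technology": re.compile(r"\b(?:wi-?fi|videos?|movies?)\b", re.IGNORECASE),
    "Airport amenities": re.compile(r"\b(?:lounges?|gates?|trams?)\b", re.IGNORECASE),
}


def groups_for(text):
    """Return the groups whose pattern matches anywhere in the text."""
    return [group for group, pattern in PATTERNS.items() if pattern.search(text)]


print(groups_for("My suitcases were lost and the wifi was down"))
```

A single pattern such as `wi-?fi` stands in for both "wi-fi" and "wifi", which keeps the taxonomy shorter than an exhaustive synonym list, at the cost of patterns that need more care to write and test.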