USING TEXT MINING IN A QUALITATIVE SYSTEMATIC REVIEW OF DIGITAL HEALTH ENGAGEMENT AND RECRUITMENT – HOW TO SEARCH AND PRIORITISE LARGE TEXT DATASETS
We thought we would share a recent example to illustrate this and perhaps encourage you to tap into the often overlooked and unused potential that text mining offers businesses looking to learn from customer experience and improve market share.
Take a survey form like this. We have all been asked to complete these at the end of a conference or workshop, and they almost always contain a section where one ranks an element of the event on a sliding scale corresponding to a satisfaction number. Maybe we had a bad experience and scored “Likelihood to re-attend our events” as 2 = Quite Poor.
This is very helpful information and definitely something to be addressed as a priority, but with a score alone how do you go about responding to this feedback to make sure people do re-attend? It is impossible to take meaningful action in light of the score alone because we don't know what exactly they found off-putting about the event.
Luckily there is usually a free text box as well. It is often left blank, but when completed it represents the richest source of customer feedback one could wish for. This is where we get comments like:
“I would have preferred demonstrations to be longer and more detailed to explore features more thoroughly.”
“I really didn't really understand the benefit or applications of it.”
Two points that can be quickly resolved to improve the chances of re-attendance. And that might be all that is necessary unless there are many thousands of responses, too many to read, process, and distil anything meaningful with a high degree of confidence.
Now jump from this low-volume example to something more voluminous, say an online customer satisfaction survey or call centre logs, where it is not unusual to have to process tens of thousands of records. Without text mining, all that can be done is evaluate the scores.
Say you are a high street retailer concerned about losing market share, so you decide to run a survey containing a free text box that says: “Tell us what we can do to improve our products and services”.
Well, this is a genuine story and here are some real comments we got back:
“I suffered terrible service the previous visit so stayed away ever since.”
“The online service was awful. The website just wouldn't let me place orders, repeatedly. The customer service help, just wasn't any help. Was left with the feeling that you really just didn't want my money.”
“Absolutely terrible, placed order in plenty of time for delivery when it arrived the contents were not useable due to insufficient packaging.”
“I was quite disappointed by a staff member in the local store who was very rude and miserable and when I asked for a new rewards card was told they were too busy to do this.”
And on the positive side:
“The products was just as delightful, scrumptious and definitely worth every penny and the boxes have many uses.”
“Very good, excellent discounts and a good variety of products.”
In this example we got nearly 3.5K snippets like this. A great way of processing them is to automatically read each statement, classify it as expressing positive sentiment, negative sentiment or indeed both, and assign a strength value to it. Then determine the features of the statement to which the sentiment is directed (packaging, service, website, etc.) and create some simple charts to convey the results.
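To make the idea concrete, here is a minimal Python sketch of that kind of processing. It is not the pipeline we ran for the retailer: the word lists, the strength values and the function names `analyse` and `summarise` are all hypothetical, and a real system would use far richer linguistic analysis. Note also that it credits the whole comment's sentiment to every feature mentioned, which is a deliberate simplification.

```python
import re
from collections import defaultdict

# Hypothetical, hand-picked lexicons for illustration only.
SENTIMENT = {
    "terrible": -3, "awful": -3, "rude": -2, "disappointed": -2,
    "good": 2, "excellent": 3, "delightful": 3,
}
FEATURES = {
    "service": {"service"},
    "website": {"website", "online"},
    "packaging": {"packaging", "boxes"},
    "staff": {"staff"},
    "products": {"products", "product"},
}

def analyse(comment):
    """Return (positive strength, negative strength, features mentioned)."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    # Sum positive and negative word scores separately, so a single
    # comment can express both kinds of sentiment at once.
    pos = sum(SENTIMENT[t] for t in tokens if SENTIMENT.get(t, 0) > 0)
    neg = sum(-SENTIMENT[t] for t in tokens if SENTIMENT.get(t, 0) < 0)
    token_set = set(tokens)
    feats = sorted(name for name, words in FEATURES.items() if words & token_set)
    return pos, neg, feats

def summarise(comments):
    """Total sentiment strength per feature, ready to be charted."""
    totals = defaultdict(lambda: {"positive": 0, "negative": 0})
    for comment in comments:
        pos, neg, feats = analyse(comment)
        for feat in feats:
            totals[feat]["positive"] += pos
            totals[feat]["negative"] += neg
    return dict(totals)
```

Feeding the per-feature totals from `summarise` into a bar chart gives the kind of simple picture described above.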
By doing so we were able to advise the retailer that they had some significant generic problems with the unavailability of items customers had traveled a long way to purchase, that their on-line buying experience was frustrating to many customers and the delivery firm they used was letting them down. With this analysis we were also able to pinpoint a number of stores where the attitude of staff was turning customers away.
We hope you agree this is impossible to do with the scale ranking element of the survey alone, and only by processing the comments in free text do we get an enlightening and sometimes stunning view of customer experience.
Do you get frustrated by the lack of access to the knowledge locked away in your company? Do you hope in vain that the intranet will return the document you are looking for? Have you ever wondered if there was a better way of finding the information you need to do a proper job?
Well, so have we, and in response data scientists at Text Mining Solutions have combined the power of text mining with highly reliable search and navigation software to help you get straight to the information that matters to you. Whether it be for customer experience management, business and competitive intelligence, horizon scanning, general document management, etc., text mining offers exciting new opportunities.
“Organisations embracing text mining all reported having an epiphany moment when they suddenly know more than before.” Russom, P. (2007). BI Search and Text Analytics. TDWI.
Our process is simple and designed with one purpose in mind, helping our customers get the job done. Providing access to up to date information, with a clear and organised presentation style, we provide effective search functions and simple navigation to text based insights delivering quality content every time.
Now you can take a look for yourself. Just click on the following link and you will be directed to a demonstration index that contains a small number of corporate documents including memos, technical manuals, marketing brochures, annual reports and health and safety sheets, etc., the kind of things you will have lying around in your own company.
The documents are listed in the main pane, and you can scroll down and view the next page at will. You can also search for a specific document by typing your query into the “Find” box; words and phrases are accepted. Give it a try by typing "accidental unauthorized copying" and you will find a document entitled “paper06.pdf”. Clicking on any of the associated links opens the full text document. By the way, that's not a very helpful document name if you ever want to find it again, but if you know the subject matter you are looking for, retrieval becomes more straightforward.
So far so good, but the clever bit is how these documents have been processed and classified based on certain facets. In this case we used text mining to identify organisations, locations and the type of document in question, plus a date range facet. Hence we can now do a facet search and pick out all the business development type documents, or all the docs that mention a specific company/organisation like the EU, Apple or Facebook, for instance. Every time we apply a filter a shorter list of documents is returned, making what you are looking for very hard to miss.
But wait, the big news is that you don't have to take what you are given. No! You can now specify what facets are of interest to your business and they can be incorporated in your own customised index. You might like a facet search on person names, or maybe sentiment scores if the documents are survey results, or how about chemical/drug names or doses? So whether you need to extract detailed information on earnings per share, operating expenses, capital expenditure, EBITDA, etc., from financial reports; are managing an investment/IP portfolio and need some market and competitor analysis; or just want to find out if there is any evidence for the presence of Pine Martens on the Isle of Mull, we've got it covered!
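For readers who like to see the mechanics, facet filtering can be sketched in a few lines of Python. This is an illustration only, not the software behind the demonstration index: the `Document` class, the three sample records and their facet values are all made up, and the date range facet is left out for brevity.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Document:
    name: str
    doc_type: str
    organisations: frozenset = field(default_factory=frozenset)
    locations: frozenset = field(default_factory=frozenset)

# Hypothetical index entries; in practice text mining would fill in the facets.
INDEX = [
    Document("memo01.txt", "memo", frozenset({"Apple"}), frozenset({"London"})),
    Document("report02.pdf", "annual report", frozenset({"Apple", "EU"}), frozenset({"Brussels"})),
    Document("brochure03.pdf", "marketing brochure", frozenset({"Facebook"}), frozenset({"London"})),
]

def facet_search(docs, doc_type=None, organisation=None, location=None):
    """Keep only documents matching every facet value that was supplied."""
    return [
        d for d in docs
        if (doc_type is None or d.doc_type == doc_type)
        and (organisation is None or organisation in d.organisations)
        and (location is None or location in d.locations)
    ]

def facet_counts(docs):
    """Count documents per organisation, one common way to label facet values."""
    return Counter(org for d in docs for org in d.organisations)
```

Calling `facet_search(INDEX, organisation="Apple")` keeps two of the three sample documents, and adding `location="London"` narrows the list to one, which is the "shorter list every time" behaviour described above.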
Don't put up with poor quality search and navigation, see the bigger picture with TMS and experience that epiphany moment for yourself.
Systematic reviews are necessary to inform evidence-based practice in a wide range of disciplines, with health care being among the most established. Systematic reviews are often time and resource intensive. There are a number of tools available to assist in producing systematic reviews, but it can be challenging to keep up-to-date with the most recent developments in this area.
The Systematic Review Toolbox (SR Toolbox) is the first and, to date, only dedicated web-based catalogue of tools to support systematic reviews. The resource was developed in response to a lack of easily accessible information about what tools were currently available. Since its launch in May 2014, the SR Toolbox has been received positively by the academic community (particularly across social media) and is actively used by many research staff and students in healthcare. The SR Toolbox has developed a high profile within the systematic review community, most notably establishing links with the Cochrane Collaboration, whose work is recognised as the international gold standard for systematic reviews. The resource was cited in the 2014 #CochraneTech symposium editorial (Elliot et al. 2014) and has been presented at a number of conferences and seminars. Furthermore, a webpage on support tools maintained by Cochrane is no longer updated, and now refers visitors to the Systematic Review Toolbox instead.
The Toolbox will be of interest to anyone involved in producing systematic reviews, because it provides easy access to tools which might improve the efficiency of review production. The tools also have potential to be useful in supporting other types of research. The SR Toolbox supports all systematic reviews and is a multi-disciplinary resource and it can also be used as a teaching and learning resource.
York Health Economics Consortium (YHEC), in association with the Systematic Review Toolbox, have organised two linked workshops in York, UK in October 2016, for researchers to learn about software tools currently available, and to share experiences of using tools in practice. The first workshop (Day 1) will review both commercial and not-for-profit systematic review management packages, with sessions from representatives of Covidence, DistillerSR and EPPI-Reviewer. The second workshop (Day 2) will review a range of free and commercial tools to support single tasks within the systematic review process.
The Toolbox editors welcome suggestions of new resources to add to the website and any other contributions to the development of the site: Contact Chris Marshall at email@example.com
While GATE's raison d'être is to create document features for use in machine learning algorithms, the new Learning Framework is so elegant that it makes the use of GATE for machine learning tasks a highly attractive proposition.
The new plugin facilitates three core applications in addition to export and evaluation features.
Regression, Classification, and Chunking applications are available to choose from depending on the problem you are trying to solve, and each comes with an impressive array of algorithms to select and evaluate. Tuning models can be done by directly specifying an algorithm parameter (cost, for example), by specifying a Java Class for the algorithm to use, by scaling, or by amending the n-gram features used for classification or the sequence features used for chunking.
GATE have helpfully incorporated an array of algorithms from LIBSVM, WEKA and Mallet including CRF, Decision Trees, Max Entropy, Regression algorithms and WEKA's deep learning Multilayer Perceptron algorithm.
For those familiar with GATE it is all very straightforward: just select the required PR (processing resource), set the run-time parameters and train a model on your reference corpus before evaluating performance and deploying the application on your specific task. For each run-time parameter, tool tips are provided to guide the user.
We have just recently used the new Learning Framework to classify scientific abstracts in the field of health economics using the Classification PR. We also obtained very encouraging results when applying the Regression PR to customer surveys in order to understand the level of customer satisfaction.
For the economic evaluation work we used the WEKA J48 classification model as this gave better results in evaluation mode than LIBSVM generating the following document statistics:
Observed Agreement    Cohen's Kappa    Pi's Kappa
0.9545                0.9545           0.9416
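For anyone unfamiliar with these measures, the Python sketch below shows how such agreement figures are commonly calculated when comparing a model's labels with a reference set. It is not GATE's own code: the function name `agreement_stats` is hypothetical, and we are assuming that "Pi's Kappa" refers to the Scott's pi style statistic, in which chance agreement is estimated from the pooled label distribution.

```python
from collections import Counter

def agreement_stats(reference, predicted):
    """Return (observed agreement, Cohen's kappa, pi) for two label lists.

    Assumes both lists have the same length and that chance agreement
    is below 1 (i.e. more than one label is actually in use).
    """
    n = len(reference)
    observed = sum(r == p for r, p in zip(reference, predicted)) / n

    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    labels = set(ref_counts) | set(pred_counts)

    # Cohen: chance agreement from each labelling's own distribution.
    chance_cohen = sum((ref_counts[l] / n) * (pred_counts[l] / n) for l in labels)
    # Pi: chance agreement from the pooled distribution of both labellings.
    chance_pi = sum(((ref_counts[l] + pred_counts[l]) / (2 * n)) ** 2 for l in labels)

    kappa = (observed - chance_cohen) / (1 - chance_cohen)
    pi = (observed - chance_pi) / (1 - chance_pi)
    return observed, kappa, pi
```

Both statistics correct the raw agreement for the agreement expected by chance, which is why they are generally preferred to observed agreement alone when class sizes are uneven.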
This is a brilliant addition to the existing suite of plugins from GATE. You can either clone the latest version of the Learning Framework from GitHub at the following link and build it using Ant, or alternatively activate the Learning Framework from the CREOLE Plugins Manager.
Learn more about training Machine Learning Models within GATE at the 9th GATE training course from 6-10 June 2016 at the University of Sheffield. https://gate.ac.uk/conferences/fig/fig9.html
While recently exploring the use of D3 for visualising text mining results, we generated a few infographics to illustrate the potential of this technique. The pie chart above highlights the most frequently mentioned diseases and disorders described in PubMed abstracts when querying on "Disease". The interactive chord diagram below shows the correlation of key terms in the results set of a telemedicine systematic literature review.
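A chord diagram in D3 is driven by a square matrix, with one row and one column per term. As a rough sketch of how such a matrix might be prepared, the Python below counts how often pairs of terms appear in the same document. Simple co-occurrence counting is our stand-in here for the correlation measure, and the function name, the substring matching and the sample inputs are all assumptions made for illustration.

```python
import json
from itertools import combinations

def cooccurrence_matrix(documents, terms):
    """Square matrix where cell [i][j] counts documents mentioning both terms."""
    index = {term: i for i, term in enumerate(terms)}
    matrix = [[0] * len(terms) for _ in terms]
    for text in documents:
        lowered = text.lower()
        # Naive substring matching: "cost" will also match "costs".
        present = [term for term in terms if term in lowered]
        for a, b in combinations(present, 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return matrix

# Hypothetical mini results set.
docs = [
    "Telemedicine improved access for rural patients.",
    "Rural patients reported lower cost.",
    "Telemedicine reduced cost.",
]
terms = ["telemedicine", "rural", "cost"]
matrix_json = json.dumps(cooccurrence_matrix(docs, terms))
```

The resulting JSON string can be handed to the browser, where the matrix becomes the input to the chord layout.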
Text Mining Solutions and York Health Economics team up to deliver another exciting training event. On this occasion we will address the challenges of developing search strategies to capture complex topics in large bibliographic databases using different conceptual combinations and search techniques.
If you missed it, the Advanced Search Strategy Design for Complex Topics training is scheduled to run again in October 2016, and can also be delivered at workplaces on demand (please email firstname.lastname@example.org for details).
Step 1: Utilise Stop-Motion software.
This can be in app form, such as Stop Motion Studio or software downloaded onto a desktop or laptop. You will need to bear in mind how you will take the photos for your video; if it is on your phone then you will need to download the software straight onto this device. The software will help you create a professional outcome by allowing you to extend certain frames, add sounds and transition effects.
Step 2: Find a blank canvas (literally) to create your video.
You will need somewhere that has a white or at least a plain background so that you can manipulate it to your needs.
Step 3: Get acquainted with a flexible tripod.
This will help to keep your shots steady so that the movement is focused on your characters and words rather than on your background.
Step 4: Lights, camera, action.
Although light is normally our friend, when filming it becomes the sworn enemy. Rather than relying purely on natural light or artificial lights from above try and use lamps that you can control and move in relation to your shots. This will help you to stop any shadows appearing (unless of course it is part of your creative image).
Step 5: Release your inner photographer.
Take lots of photos. The end product will look far more realistic if you capture the intricacies of movement, such as the process of bending down to pick something up. So instead of taking two snapshots, one bending down and one standing up, take 6-8 snapshots of the character going to bend down and then straightening up again.
If you would like to see how Stop-Motion filming could be used effectively for your business have a look at our recent blog post on 'How to be your own film director'.
The Eureka moment
Relying on the eureka moment to happen seems a little far-fetched in the real world, yet when it comes to unleashing our inner artist we all find ourselves waiting for that very cliché light bulb moment to arrive. Instead, what we should all be doing is experimenting with any and every idea that pops into our heads.
By stripping back my ideas I discovered the filming process, ‘stop-motion’.
What is stop motion filming?
Stop-motion filming is a technique that may initially draw a blank with many, but in reality we have far more contact with it than we realise. Examples range from the 1960s Magic Roundabout children's series to Wallace and Gromit to television adverts.
The technique of stop-motion filming captures movements through individual photos that are then merged together to form frames that are a fraction of a second in length. This creates the illusion that your items/characters have been brought to life.
How can stop-motion filming be used?
As you may have guessed by reading to this point, stop-motion, as a filming technique, is not for the faint-hearted, with over 900 photos being captured for a two-minute video. Stop-motion is a process that has to be stuck with until the bitter end. By moving characters or letters slightly in each frame you are able to manipulate the video to exactly your needs. This not only brings your storyboard to life in a relatively foolproof manner but also gives you a unique marketing tool for your business.
Could prostate cancer risk be linked to coffee consumption? Visualisations of text mining analysis of 19,000 medical publications suggest some link is possible, but coffee is just one of many factors that could influence the risk of developing this most common form of cancer in men.
Prostate cancer is a complex topic. A large number of scientific papers have been written as the research continues towards identifying a cure. A key part of the search strategy is to explore the relationship between prostate cancer and other potential influencers, such as forms of treatment or risk factors. For example, is there any evidence to suggest a link between coffee consumption and prostate cancer? Many of these linkages may be obvious, with large bodies of evidence to support or refute the connection. But what about those curious, unexpected relationships that may lie buried deep within the body of literature, rendered virtually invisible by their rarity or counter-intuitive nature?
Sophisticated analysis of documents by text mining techniques, followed by visualisation of results by ‘knot analysis’, can reveal surprising insights into any topic. ‘Knot analysis’ is one way to illustrate the connections between topics in large, complex document sets. Coloured traces on the diagram relate to specific words or phrases in the texts. All traces start at the same point, and at each instance of the specified word (i.e. ‘prostate’ or ‘men’ in this example), the trace makes a turn. Tightly knotted traces indicate many occurrences of the word in the texts, whereas traces with long straight lines indicate lower incidences of the selected word. Traces which lie over each other suggest linkages between topics in the texts, while traces which are separate indicate no linkage between topics (e.g. ‘women’ and ‘prostate’ in the diagram below).
Applying this analysis to large document sets can throw up some intriguing insights, as shown in the second diagram. While prostate cancer, leptin and non-steroidal anti-inflammatory drugs (NSAIDs) are all tightly linked as shown by overlapping traces, the C-reactive protein (CRP) marker for inflammation is also clearly visible.
Innovation often arises through the detection and exploration of unexpected avenues. As humans, we face physical limits on how much we can read and how many patterns we can identify in information, and this imposes a limit on the speed of innovation. Removing the barrier imposed by physical reading through the use of text mining to detect links and relationships in huge libraries of documents, and subsequent visualisation of the output, can reveal insights that would not be visible using conventional methods. The illustrations used in this article give a glimpse of what is possible. While it is true that text mining will not provide a cure for prostate cancer, it could well play a role in accelerating the progress of research in novel and ultimately successful directions.
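The way a single trace is drawn can be sketched in Python as follows. This is a simplified reading of the description above rather than the software used to produce the diagrams: we assume the trace advances one unit per word of text and turns by a fixed angle at each occurrence of the chosen word, and both the step length and the 90 degree default are our own illustrative choices.

```python
import math
import re

def knot_trace(text, word, turn_degrees=90.0):
    """Return the list of (x, y) points visited by the trace for one word."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]  # every trace starts at the same origin
    for token in re.findall(r"[a-z]+", text.lower()):
        if token == word:
            # Each occurrence of the chosen word makes the trace turn.
            heading += math.radians(turn_degrees)
        # Move one unit forward along the current heading.
        x += math.cos(heading)
        y += math.sin(heading)
        points.append((x, y))
    return points
```

With these assumptions a frequently occurring word turns the trace so often that it curls into a tight knot near the origin, while a rare word leaves long straight runs. Plotting the traces for several words on the same axes shows where they overlap.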