Over the past decade, there has been a steep rise in
data-driven analysis in major areas of medicine, such as clinical
decision support systems, survival analysis, patient similarity analysis,
and image analytics. Most of the data in the field are well-structured
and available in numerical or categorical formats that can be used
for experiments directly. At the opposite end of the spectrum, however,
there exists a wide expanse of unstructured data that is intractable for
direct analysis. It takes the form of discharge summaries, clinical notes,
and procedural notes, which are written as human narrative and have
neither a relational model nor a standard grammatical structure. An important step
in the utilization of these texts for such studies is to transform
and process the data to retrieve structured information from the
haystack of irrelevant data using information retrieval and data mining
techniques. To address this problem, the authors present Q-Map, a
simple yet robust system that can sift through
massive datasets with unregulated formats to retrieve structured
information aggressively and efficiently. It is backed by an effective
mining technique, based on a string matching algorithm
indexed on curated knowledge sources, that is both fast
and configurable. The authors also briefly compare its
performance with that of MetaMap, one of the most reputed tools for medical
concept retrieval, and present the advantages Q-Map displays over it.
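
To make the idea of string matching against an indexed knowledge source concrete, the following is a minimal sketch in Python. It is not the Q-Map implementation: the tiny dictionary, the labels, and the names `KNOWLEDGE_SOURCE`, `build_index`, `tokenize`, and `find_terms` are all hypothetical and chosen for illustration only. It assumes that terms are matched token by token and that the longest match at a position is preferred.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mini knowledge source: surface term -> concept label.
# Illustrative entries only, not taken from any real vocabulary.
KNOWLEDGE_SOURCE: Dict[str, str] = {
    "chest pain": "FINDING",
    "pain": "FINDING",
    "myocardial infarction": "DISORDER",
    "aspirin": "DRUG",
}

END = "$end"  # marker key; cannot collide with a token, which is alphanumeric


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(terms: Dict[str, str]) -> dict:
    """Build a token-level trie so lookups never rescan the whole dictionary."""
    root: dict = {}
    for term, label in terms.items():
        node = root
        for token in tokenize(term):
            node = node.setdefault(token, {})
        node[END] = label  # a complete term ends at this node
    return root


def find_terms(text: str, index: dict) -> List[Tuple[str, str]]:
    """Return (term, label) pairs, taking the longest match at each position."""
    tokens = tokenize(text)
    matches: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        node = index
        best = None  # (end position, label) of the longest term seen so far
        j = i
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if END in node:
                best = (j, node[END])
        if best is not None:
            end, label = best
            matches.append((" ".join(tokens[i:end]), label))
            i = end  # continue after the matched term
        else:
            i += 1
    return matches


NOTE = "Pt c/o chest pain, given aspirin. No myocardial infarction."
index = build_index(KNOWLEDGE_SOURCE)
print(find_terms(NOTE, index))
# [('chest pain', 'FINDING'), ('aspirin', 'DRUG'), ('myocardial infarction', 'DISORDER')]
```

Building the index once and reusing it across notes is what keeps such a lookup fast, and swapping the dictionary is what makes it configurable. The sketch does no negation or abbreviation handling, so "No myocardial infarction" is still reported as a match.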