Unlocking Information from Unstructured Data

CogStack is a lightweight, distributed ecosystem of tools for unlocking information from unstructured data such as Electronic Health Records (EHRs). Unstructured data can arrive in many formats, including PDF, Word, RTF and XML. The ecosystem includes Natural Language Processing (NLP) engines that annotate unstructured data and make it searchable by clinicians and researchers through a search engine, i.e. the Elastic Stack. The annotated data can then be used for further analysis, such as patient identification for clinical trials, patient classification and alerting systems.

Monash Helix, Monash Data Science & AI Platform and Monash eResearch have engaged with King’s College London to adapt their CogStack platform for the Australian context, and have developed collaborations with technical and clinical experts in Australian health organisations to implement CogStack. These collaborations include 1) understanding hospital data, 2) identifying use cases that would most benefit from NLP and AI, 3) deploying CogStack in the hospital environment, and 4) developing AI algorithms that use unstructured data to generate meaningful insights for health organisations.

Key Features and Benefits

  • Lightweight distributed ecosystem for data processing.
  • Scalable even on standard IT hardware. Can be deployed as a platform-as-a-service on premises or in the cloud.
  • Composed of microservices that can be used as needed, including Elasticsearch, Kibana, clinical NLP tools for named entity extraction and linking such as GATE, Bio-YODIE and MedCAT, Tesseract for OCR, and Apache Tika for document-to-text conversion.
  • Uses Apache NiFi as a data workflow engine that can be configured to call the services required for document processing and NLP.
  • Deployed on NeCTAR Research Cloud (Local Monash Instance).
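
The flow described above (convert documents to text, annotate them, then make them searchable) can be illustrated with a small toy sketch. This is not CogStack code: the function names, the concept codes and the exact-match annotator are all hypothetical stand-ins for services such as Apache Tika, MedCAT and Elasticsearch, and the in-memory dictionary only mimics the idea of a searchable index.

```python
from dataclasses import dataclass, field

# Hypothetical concept dictionary with placeholder codes (not real
# terminology identifiers). A real deployment would call an NLP service
# rather than rely on exact string matching.
CONCEPTS = {
    "diabetes": "C-DIABETES",
    "hypertension": "C-HYPERTENSION",
}


@dataclass
class Document:
    doc_id: str
    text: str
    annotations: list = field(default_factory=list)


def extract_text(raw: bytes) -> str:
    """Stand-in for the document-to-text step (Tika or OCR in the real stack)."""
    return raw.decode("utf-8", errors="replace")


def annotate(doc: Document) -> Document:
    """Toy annotator: record the first occurrence of each known term."""
    lowered = doc.text.lower()
    for term, code in CONCEPTS.items():
        start = lowered.find(term)
        if start != -1:
            doc.annotations.append({"term": term, "code": code, "start": start})
    return doc


def build_index(docs):
    """Stand-in for the search engine: map each concept code to document ids."""
    index = {}
    for doc in docs:
        for ann in doc.annotations:
            index.setdefault(ann["code"], set()).add(doc.doc_id)
    return index


def run_pipeline(raw_docs):
    """Chain the three stages, as a workflow engine would route documents."""
    docs = [
        annotate(Document(doc_id, extract_text(raw)))
        for doc_id, raw in raw_docs.items()
    ]
    return build_index(docs)


raw_docs = {
    "note-1": b"Patient has a history of hypertension.",
    "note-2": b"Type 2 diabetes with hypertension, both well controlled.",
}
index = run_pipeline(raw_docs)
print(sorted(index["C-HYPERTENSION"]))  # ['note-1', 'note-2']
```

Looking up a concept code returns the ids of every document annotated with it, which is the kind of query that supports tasks such as patient identification. A toy matcher like this one ignores negation, context and spelling variants, which is the gap that dedicated clinical NLP tools are meant to fill.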

To date, we have deployed the ecosystem in multiple health organisations, pre-processed over 13 million EMR documents and annotated approximately 10 million data items using NLP methods.

