

Natural Language Processing in Mental Health Research

By Ryan S. Sultan, MD
Assistant Professor of Clinical Psychiatry, Columbia University Irving Medical Center
March 28, 2026

Dr. Ryan Sultan applies natural language processing (NLP) to psychiatric research at Columbia University, where he directs the Mental Health Informatics Lab. Trained under Carol Friedman -- the National Academy of Medicine member who created the MedLEE NLP system at NewYork-Presbyterian -- Sultan uses computational methods to extract clinically meaningful patterns from unstructured psychiatric documentation, including detecting substance use in clinical notes, predicting treatment outcomes, and identifying at-risk patients.


Why NLP Matters for Psychiatry

Psychiatry is, arguably, the medical specialty most dependent on language. We do not have blood tests for depression. There is no imaging study that diagnoses ADHD. The fundamental data of psychiatry -- what patients report, what clinicians observe, what changes over time -- lives in words.

Those words are captured in clinical notes. Progress notes, intake assessments, discharge summaries, therapy documentation, medication management records -- the electronic health record contains millions of pages of unstructured psychiatric text that collectively represent the richest available source of real-world clinical data in mental health.

The problem is that this information is locked inside free text. It is not coded, not structured, not queryable. A clinician's observation that a patient "has been using cannabis daily since the breakup, reporting paranoid thoughts and difficulty sleeping" contains critically important clinical information -- substance use pattern, psychotic symptoms, sleep disturbance, psychosocial stressor -- but none of it is captured in the structured data fields that researchers and quality improvement teams typically analyze.

Natural language processing is the branch of artificial intelligence that solves this problem. NLP enables computers to read, interpret, and extract structured information from human language -- transforming the narrative richness of psychiatric documentation into analyzable data.

What Makes Psychiatric NLP Uniquely Challenging

Applying NLP to psychiatric text is harder than applying it to most other medical specialties. Psychiatric notes are dominated by subjective, narrative language rather than standardized terminology; symptoms are often recorded in the patient's own words; negation and hedging are pervasive ("denies suicidal ideation," "possible hypomania"); and the same phrase can carry very different clinical meaning depending on context.

These challenges are exactly why my training background matters.


Training Under Carol Friedman: The MedLEE Legacy

Carol Friedman, PhD, is a member of the National Academy of Medicine and one of the foundational figures in clinical NLP. She created MedLEE -- the Medical Language Extraction and Encoding System -- at NewYork-Presbyterian Hospital. MedLEE was among the first NLP systems capable of reliably extracting structured clinical information from the unstructured text of medical records.

What made MedLEE groundbreaking was its clinical grounding. Carol Friedman did not build an abstract language processing tool and then try to apply it to medicine. She studied how clinicians actually write -- the shorthand, the implicit knowledge, the contextual assumptions -- and built a system that understood medical language on its own terms. MedLEE could parse radiology reports, clinical notes, and discharge summaries with a level of accuracy that demonstrated NLP's potential to transform clinical research.

Working with Carol Friedman taught me several principles that continue to guide my NLP research:

Lessons from the MedLEE Tradition:

  • Clinical validity comes first. An NLP system that achieves high accuracy on benchmark datasets but fails on real clinical text is useless. Validation must happen on actual clinical data, in the clinical context where the tool will be deployed.
  • Domain expertise is non-negotiable. You cannot build reliable clinical NLP without clinicians deeply involved in system design, training data annotation, and output validation. NLP researchers who treat clinical text as "just another corpus" produce systems that miss clinically important distinctions.
  • Error analysis matters more than accuracy metrics. Understanding where and why an NLP system fails reveals more about the system -- and about clinical language itself -- than aggregate performance statistics.
  • Practical utility is the measure of success. If an NLP system does not answer a question that clinicians or researchers actually need answered, its technical sophistication is irrelevant.

Research Advisors and Collaborators

My NLP research at Columbia is supported by a network of computational and clinical experts who bring complementary expertise:

Adler Perotte, MD, MA -- Machine Learning and Clinical Phenotyping

Adler Perotte is a research advisor at Columbia DBMI (Department of Biomedical Informatics) who specializes in machine learning, clinical phenotyping, and EHR analysis. His work on computational phenotyping -- using algorithmic methods to identify clinical conditions and patient cohorts from electronic health record data -- directly informs my approach to extracting psychiatric patterns from clinical notes.

The intersection of Adler's machine learning expertise and my clinical knowledge of psychiatry is where the most productive work happens. He understands what the algorithms can do; I understand what the clinical questions are. Together, we can build systems that are both technically sound and clinically meaningful.

Thomas McCoy, MD -- Computational Psychiatry and NLP Phenotyping

Thomas McCoy serves as a grant advisor and is based at Massachusetts General Hospital and Harvard Medical School. He is a leader in computational psychiatry, specifically in the application of NLP to psychiatric phenotyping. His pioneering work using NLP to identify psychiatric phenotypes -- depression subtypes, treatment response patterns, suicidality risk markers -- from EHR data has been instrumental in shaping my research methodology.

Tom's work at MGH demonstrated that NLP applied to psychiatric notes could identify clinical patterns that were invisible to traditional research methods relying on billing codes and structured data. His influence on my research is methodological: how to design NLP studies that produce clinically actionable results, how to validate NLP-derived phenotypes against clinical ground truth, and how to navigate the regulatory and ethical complexities of working with psychiatric text data.

Noemie Elhadad, PhD -- DBMI Department Chair

Noemie Elhadad is the chair of Columbia's Department of Biomedical Informatics, with deep expertise in NLP and clinical informatics. As department chair, she has built an environment at Columbia where clinical researchers and computer scientists collaborate on problems that neither discipline could solve independently.

Noemie's own research in patient-generated health data, information extraction from clinical text, and health language understanding has created the intellectual infrastructure at DBMI that makes my NLP work possible. Her leadership ensures that clinical informaticians at Columbia have access to computational resources, interdisciplinary collaboration opportunities, and the institutional support needed for this kind of translational research.


Applications: What NLP Can Do for Psychiatry

The practical applications of NLP in psychiatry are substantial, and most remain underexplored. My Mental Health Informatics Lab at Columbia focuses on several high-impact areas:

Detecting Substance Use Patterns in Clinical Notes

Substance use information is among the most poorly captured data in structured EHR fields. Clinicians frequently document detailed substance use histories in their progress notes -- types of substances, frequency, quantity, route of administration, triggers, consequences -- but this information rarely makes it into the coded data that researchers typically analyze.

This matters enormously for my cannabis research. Understanding population-level patterns of cannabis use -- how potency, frequency, and age of onset relate to psychiatric outcomes -- requires data that exists in clinical notes but not in structured fields. NLP enables extraction of this information across millions of clinical encounters, creating datasets for epidemiological analysis that would be impossible to construct through chart review.

For example, an NLP system can identify that a clinical note contains information about daily cannabis use of high-potency concentrates with onset of paranoid ideation -- and extract the specific details (daily frequency, concentrate form, paranoid symptoms) into structured variables suitable for statistical analysis. Doing this manually across even thousands of notes would be prohibitively expensive and slow. NLP makes it feasible across millions.
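The extraction step described above can be sketched as a toy rule-based pass. The pattern tables and variable names below are invented for illustration; a production system would rely on trained models and curated clinical lexicons rather than a handful of regexes:

```python
import re

# Hypothetical pattern tables mapping text cues to structured labels.
FREQUENCY = {r"\bdaily\b": "daily", r"\bweekly\b": "weekly"}
FORM = {r"\bconcentrates?\b": "concentrate", r"\bflower\b": "flower"}
SYMPTOMS = {
    r"\bparanoi(a|d)\b": "paranoid_ideation",
    r"\binsomnia\b|difficulty sleeping": "sleep_disturbance",
}

def extract_cannabis_variables(note: str) -> dict:
    """Map free text to structured variables (rule-based sketch)."""
    note_lower = note.lower()
    out = {"frequency": None, "form": None, "symptoms": []}
    for pat, label in FREQUENCY.items():
        if re.search(pat, note_lower):
            out["frequency"] = label
    for pat, label in FORM.items():
        if re.search(pat, note_lower):
            out["form"] = label
    for pat, label in SYMPTOMS.items():
        if re.search(pat, note_lower):
            out["symptoms"].append(label)
    return out

note = ("Patient reports daily cannabis use, mostly high-potency "
        "concentrates, with new-onset paranoid ideation.")
print(extract_cannabis_variables(note))
```

Each extracted field becomes a column suitable for statistical analysis, which is exactly what makes this approach scale where manual chart review cannot.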

Predicting Treatment Outcomes

The language clinicians use in progress notes contains implicit signals about treatment trajectory that are not captured in structured data. Subtle shifts in how a clinician describes a patient -- word choice, level of detail, tone, emphasis -- can predict treatment response, relapse risk, and clinical deterioration before these outcomes become evident through standard monitoring.

This is an active area of research in my lab. We are investigating whether NLP-derived features from longitudinal clinical notes can improve prediction of treatment outcomes for patients with mood disorders, ADHD, and substance use disorders. Early results suggest that documentation patterns do carry predictive information that complements -- and in some cases outperforms -- structured clinical data alone.
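To make the idea of documentation-pattern features concrete, here is a minimal sketch of the kind of longitudinal note features a predictive model might consume alongside structured data. The term list and feature names are invented for illustration, not taken from the lab's actual pipeline:

```python
# Illustrative documentation-pattern features over a sequence of notes.
WORSENING_TERMS = {"worsening", "deteriorating", "relapse", "missed"}

def note_features(notes: list[str]) -> dict:
    lengths = [len(n.split()) for n in notes]
    worsening = sum(
        1 for n in notes
        for w in n.lower().split()
        if w.strip(".,") in WORSENING_TERMS
    )
    return {
        "mean_note_length": sum(lengths) / len(lengths),
        # Crude trend proxy: change in note length from first to last visit.
        "length_trend": lengths[-1] - lengths[0],
        "worsening_term_count": worsening,
    }

notes = [
    "Patient doing well on current regimen, sleeping normally.",
    "Missed last appointment. Reports worsening mood and poor sleep.",
]
print(note_features(notes))
```

In practice such hand-built features would be only a starting point; the research question is whether richer language representations add predictive signal beyond structured variables.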

Identifying At-Risk Patients

Suicidal ideation, psychotic symptoms, and other high-risk presentations are often documented in narrative clinical notes but missed by structured screening tools. This gap is particularly dangerous because the patients at highest risk may be the ones least likely to endorse symptoms on standardized questionnaires -- they may disclose to a clinician in conversation but not check a box on a form.

NLP-based surveillance systems can continuously scan clinical documentation for language patterns associated with elevated risk -- references to suicidal thoughts, command auditory hallucinations, homicidal ideation, acute substance intoxication -- and flag cases for clinical review. This is not about replacing clinical judgment. It is about ensuring that no patient who has disclosed risk to a clinician falls through the cracks because the information was buried in an unread note.
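A surveillance pass of this kind must handle negation, since psychiatric notes routinely record absent symptoms ("denies SI"). The sketch below is a toy version with invented pattern lists; real systems use validated, NegEx-style negation scoping rather than a fixed character window:

```python
import re

RISK_PATTERNS = [r"suicidal ideation", r"\bSI\b",
                 r"command hallucinations", r"homicidal ideation"]
NEGATION_CUES = [r"denies", r"no evidence of", r"without"]

def flag_note(note: str) -> bool:
    """Flag a note for clinical review unless the risk mention is negated."""
    for pat in RISK_PATTERNS:
        for m in re.finditer(pat, note, flags=re.IGNORECASE):
            # Look a short distance back for a preceding negation cue.
            window = note[max(0, m.start() - 30):m.start()].lower()
            if not any(re.search(cue, window) for cue in NEGATION_CUES):
                return True
    return False

print(flag_note("Patient endorses passive suicidal ideation without plan."))
print(flag_note("Denies SI, HI, and AVH. Mood stable."))
```

The point of the design is the one made above: the system routes cases to a clinician for review; it never acts on a flag by itself.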

Research Phenotyping

For my ADHD and cannabis research, NLP enables identification of clinical cohorts from EHR data with far greater precision than billing codes alone. ICD codes are imprecise -- a code for "cannabis use disorder" tells you nothing about potency, frequency, route of administration, age of onset, or clinical severity. NLP can extract these details from clinical notes, creating research cohorts defined by clinically meaningful characteristics rather than billing artifacts.

This capability is essential for the large-scale observational studies that form a core part of my research program at the Sultan Lab. The difference between a study that defines "cannabis users" by ICD code and one that uses NLP to identify daily high-potency users with paranoid symptoms is the difference between noise and signal.
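The contrast between code-defined and NLP-defined cohorts can be shown in a few lines. The records and field names here are synthetic illustrations of NLP-derived variables, not real data:

```python
# Hypothetical patient records: an ICD code plus NLP-derived variables.
patients = [
    {"id": 1, "icd": "F12.20", "frequency": "daily", "form": "concentrate",
     "symptoms": ["paranoid_ideation"]},
    {"id": 2, "icd": "F12.20", "frequency": "monthly", "form": "flower",
     "symptoms": []},
]

def icd_cohort(records):
    # Billing-code definition: any cannabis-related F12 code qualifies.
    return [r["id"] for r in records if r["icd"].startswith("F12")]

def nlp_cohort(records):
    # Phenotype definition: daily high-potency use with paranoid symptoms.
    return [r["id"] for r in records
            if r["frequency"] == "daily"
            and r["form"] == "concentrate"
            and "paranoid_ideation" in r["symptoms"]]

print(icd_cohort(patients))  # both patients share the billing code
print(nlp_cohort(patients))  # only patient 1 matches the clinical phenotype
```

Two patients with identical billing codes can represent entirely different clinical exposures; the NLP-defined cohort separates them.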


The Mental Health Informatics Lab

My Mental Health Informatics Lab at Columbia operates at the intersection of clinical psychiatry and computational methods. The lab's mission is to apply NLP, machine learning, and clinical informatics to psychiatric data in ways that produce clinically actionable insights.

The lab benefits from Columbia's position as one of the largest academic medical centers in the country. NewYork-Presbyterian Hospital generates an enormous volume of clinical documentation daily, and the Columbia DBMI infrastructure provides the computational tools and data governance frameworks needed to work with this data responsibly.

Current lab projects span the application areas described above: substance use extraction from clinical notes, treatment outcome prediction from longitudinal documentation, risk surveillance, and NLP-based research phenotyping.


Ethical Considerations

NLP applied to psychiatric data raises ethical questions that require careful attention. I take these seriously because psychiatric notes contain the most sensitive health information that exists -- trauma histories, substance use details, sexual behavior, suicidal thoughts, family conflicts, legal problems. The stakes of mishandling this data are not abstract.

Algorithmic Bias

NLP systems trained on clinical data will inevitably reflect the biases present in that data. Racial disparities in psychiatric diagnosis are well-documented -- Black patients are more likely to receive psychotic disorder diagnoses and less likely to receive mood disorder diagnoses for equivalent symptom presentations. Gender biases exist in ADHD documentation, with women's symptoms more frequently attributed to anxiety or depression. Socioeconomic factors influence the detail and quality of clinical documentation.

An NLP system that learns from this data without explicit bias mitigation will reproduce and potentially amplify these disparities. Bias auditing across demographic groups is not optional -- it is an ethical requirement for any NLP tool intended for clinical use.
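The most basic form of such an audit is a comparison of the tool's flag rate across demographic groups. The sketch below uses synthetic records and an invented schema; a real audit would also test calibration and error rates, not just flag rates:

```python
from collections import defaultdict

def flag_rates(records):
    """Per-group flag rate for an NLP tool's binary output."""
    counts = defaultdict(lambda: [0, 0])  # group -> [flags, total]
    for r in records:
        counts[r["group"]][0] += r["flagged"]
        counts[r["group"]][1] += 1
    return {g: flags / total for g, (flags, total) in counts.items()}

# Synthetic audit data: 100 patients per group.
records = (
    [{"group": "A", "flagged": 1}] * 30 + [{"group": "A", "flagged": 0}] * 70
    + [{"group": "B", "flagged": 1}] * 10 + [{"group": "B", "flagged": 0}] * 90
)
rates = flag_rates(records)
print(rates)  # a 3x disparity between groups, worth investigating
```

A disparity like this does not by itself prove bias, but it is the trigger for the deeper error analysis described above.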

Privacy

Psychiatric notes contain information that, if disclosed, could cause severe harm to patients -- employment discrimination, insurance denial, custody loss, social stigma. NLP systems that process psychiatric text must operate under the strictest data protections, with de-identification, access controls, and audit trails that exceed the minimum requirements of HIPAA.
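For a sense of what de-identification involves at the simplest level, here is a deliberately minimal scrubbing pass. Real pipelines use validated de-identification tools with human review; three regexes are nowhere near sufficient for psychiatric text:

```python
import re

# Extremely simplified PHI patterns (SSN, slash-dates, provider names).
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\bDr\.\s+[A-Z][a-z]+\b"), "[PROVIDER]"),
]

def scrub(text: str) -> str:
    """Replace matched identifiers with placeholder tokens."""
    for pat, token in PATTERNS:
        text = pat.sub(token, text)
    return text

print(scrub("Seen by Dr. Smith on 3/14/2025; SSN 123-45-6789 on file."))
```

The hard cases in psychiatric notes, such as names embedded in trauma narratives, are precisely the ones that simple patterns miss, which is why de-identification must be layered with access controls and audit trails.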

Clinical Validation

NLP-derived findings must be validated against clinical ground truth before they are used to inform patient care. This means expert clinician review of NLP output, not just comparison to other automated systems. In psychiatry, where clinical judgment is inherently subjective, validation requires particular care -- multiple clinician reviewers, clear annotation guidelines, and transparent reporting of inter-rater agreement.
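Inter-rater agreement is typically reported as Cohen's kappa, which corrects raw agreement for chance. The implementation below is the standard formula applied to toy annotations from two hypothetical clinician reviewers:

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two raters over the same items."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    labels = set(a) | set(b)
    # Expected chance agreement from each rater's label frequencies.
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Toy binary annotations: did the note contain the target phenotype?
rater1 = [1, 1, 0, 0, 1, 0, 1, 1]
rater2 = [1, 0, 0, 0, 1, 0, 1, 1]
print(round(cohens_kappa(rater1, rater2), 3))  # 0.75
```

Reporting kappa alongside the annotation guidelines lets readers judge how stable the "ground truth" itself is, which matters in a field where clinical judgment is inherently subjective.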

Transparency

Clinicians and patients have a right to understand how NLP tools reach their conclusions. Black-box systems that flag patients as "high risk" without explaining why are not acceptable in psychiatric care, where clinical trust depends on understanding and shared decision-making. NLP tools must be designed with interpretability as a core requirement, not an afterthought.


Frequently Asked Questions

What is natural language processing in mental health?

Natural language processing (NLP) in mental health is the use of computational methods to analyze, interpret, and extract meaningful clinical information from unstructured psychiatric text. This includes clinical notes, progress reports, intake assessments, and discharge summaries. Because psychiatry relies heavily on narrative documentation rather than lab values or imaging, NLP is uniquely valuable for unlocking clinical insights embedded in psychiatric records that are invisible to traditional data analysis.

Who is Carol Friedman and what is MedLEE?

Carol Friedman, PhD, is a member of the National Academy of Medicine and a pioneer of clinical NLP. She created MedLEE (Medical Language Extraction and Encoding System) at NewYork-Presbyterian Hospital -- one of the first NLP systems capable of reliably extracting structured information from unstructured medical text. MedLEE was foundational to the entire field of clinical NLP. I trained under Carol Friedman at Columbia, and her emphasis on clinical grounding, rigorous validation, and practical utility continues to shape my research approach.

How does Dr. Sultan use NLP in psychiatric research?

I apply NLP to electronic health records at Columbia to detect substance use patterns in clinical notes that are not captured in structured data, predict treatment outcomes based on longitudinal documentation language, identify at-risk patients whose symptoms were disclosed in narrative notes but not flagged by structured screening tools, and perform research phenotyping that defines clinical cohorts by meaningful clinical characteristics rather than imprecise billing codes. This work is conducted through my Mental Health Informatics Lab at Columbia.

What are the ethical considerations of NLP in psychiatry?

The key ethical considerations are algorithmic bias (NLP systems can reproduce racial, gender, and socioeconomic disparities present in clinical documentation), patient privacy (psychiatric notes contain the most sensitive health information), clinical validation (NLP-derived findings must be validated by expert clinicians before informing care), and transparency (clinicians and patients must understand how NLP tools reach their conclusions). These are not secondary concerns -- they are fundamental requirements for responsible NLP deployment in mental health.

