Research Projects

I have worked in many different research areas. Below is a description of my primary projects in each of these areas: deepfake detection, machine learning for computational biology, geometric combinatorics, chronic pain, and molecular immunology.

Deepfake Detection

Research conducted at TrueMedia.org with Dr. Oren Etzioni.

Benchmarking & creation of a real-world deepfake dataset

Deepfakes pose a major threat to democracy. Existing academic datasets and SOTA detection methods claim generalizability to real-world deepfakes. Here we show that existing models perform poorly on the modern deepfakes currently circulating on social media. We present a new in-the-wild dataset of deepfakes collected from social media in 2024. Our dataset includes AI-generated images, videos, and audio. We then evaluate modality-specific SOTA models on this dataset. I led this project, from dataset labeling decisions to model analysis. I also implemented, trained, and benchmarked SOTA audio models.

Pre-print.

Using contrastive audio models for efficient audio deepfake detection

t-SNE plot of LAION-CLAP embeddings of audio clips of Donald Trump. Embeddings of AI-generated audio (orange) cluster separately from real audio samples.

Current audio deepfake detection models are computationally intensive. I developed a new audio deepfake detection model that drastically reduces inference time by harnessing large contrastive audio-language models. First, I observed that pretrained audio model embeddings of fake and real audio from the same person cluster separately. I then built a simple binary classifier that takes pretrained audio model embeddings as input and found that its accuracy was comparable to SOTA commercial models, with significantly faster inference. Inspired by UFD image detection, I hypothesized that this model would generalize better to deepfakes generated by models unseen during training. I am currently testing this hypothesis.
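As a rough illustration of the "simple binary classifier on embeddings" idea, here is a minimal sketch. It is not the project's actual model: the embeddings are synthetic Gaussian vectors standing in for pretrained audio embeddings, and the embedding size, learning rate, and function names are assumptions made for the example.

```python
import numpy as np

def train_logistic_classifier(X, y, lr=0.1, epochs=500):
    """Fit logistic regression with batch gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = probs - y                      # gradient of the log loss w.r.t. logits
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def predict(X, w, b):
    """Label 1 (fake) when the logit is positive, else 0 (real)."""
    return (X @ w + b > 0).astype(int)

rng = np.random.default_rng(0)
dim = 16                                      # hypothetical embedding size

# Synthetic stand-ins for embeddings of real (label 0) and fake (label 1) audio.
real = rng.normal(loc=-1.0, scale=1.0, size=(200, dim))
fake = rng.normal(loc=1.0, scale=1.0, size=(200, dim))
X = np.vstack([real, fake])
y = np.concatenate([np.zeros(200), np.ones(200)])

# Shuffle, then hold out a quarter of the clips for evaluation.
order = rng.permutation(len(y))
train_idx, test_idx = order[:300], order[300:]

w, b = train_logistic_classifier(X[train_idx], y[train_idx])
accuracy = (predict(X[test_idx], w, b) == y[test_idx]).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the expensive step (computing embeddings) happens once per clip in a frozen pretrained model, the classifier itself is only a single matrix-vector product at inference time.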

Ongoing work. Pre-print coming in early 2025.

Detecting AI-generated images using diffusion inversion trajectories

Past work has shown that AI-generated images can be differentiated from real images by inverting and reconstructing the images. This approach, known as DIffusion Reconstruction Error (DIRE), measures the difference between an original image and the same image after it has been inverted to a noise latent space and reconstructed using Denoising Diffusion Implicit Models (DDIMs). We asked whether the trajectory over which the inversion occurs provides additional signal that improves detection. To test this, I concatenated CLIP and DINO embeddings of the reconstructed images and used them as input to a variety of model architectures, from MLPs to feed-forward attention networks. Preliminary results suggested that this approach could effectively discriminate between fake and real images. Further, some results suggested that the model could discriminate between images that were included in the diffusion model training dataset and those that were not. Unfortunately, due to compute limitations, we were not able to continue this line of experimentation.
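The feature construction described above could look roughly like the following sketch. It assumes one CLIP embedding and one DINO embedding per trajectory step, which the description does not spell out. The random arrays are placeholders for real encoder outputs, and the dimensions and the function name `trajectory_features` are hypothetical.

```python
import numpy as np

def trajectory_features(clip_embs, dino_embs):
    """Concatenate per-step embeddings, then flatten the whole trajectory.

    clip_embs: array of shape (T, d_clip), one row per trajectory step.
    dino_embs: array of shape (T, d_dino), one row per trajectory step.
    Returns a 1-D vector of length T * (d_clip + d_dino).
    """
    if clip_embs.shape[0] != dino_embs.shape[0]:
        raise ValueError("both embedding arrays need one row per trajectory step")
    per_step = np.concatenate([clip_embs, dino_embs], axis=1)
    return per_step.reshape(-1)

rng = np.random.default_rng(0)
num_steps, d_clip, d_dino = 5, 8, 6           # hypothetical sizes

# Placeholders for embeddings of the reconstructed image at each step.
clip_embs = rng.normal(size=(num_steps, d_clip))
dino_embs = rng.normal(size=(num_steps, d_dino))

features = trajectory_features(clip_embs, dino_embs)
print(features.shape)
```

A flattened vector like this suits an MLP. An attention-based model would more naturally consume the unflattened per-step array, treating each step as one token.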

LLM semantic analysis for audio deepfake detection

We take a novel approach to audio deepfake detection based on semantics. We find that semantic analysis of audio transcripts detects deepfakes with >80% accuracy on real-world deepfake data, performing on par with or better than both academic and commercially available audio deepfake detection models. To our knowledge, this is the first use of semantic analysis for deepfake detection. I led the technical implementation of this project. I compared the performance of different LLMs on the task of semantic analysis and optimized a prompt using DSPy, which increased accuracy by ~20% compared to manual prompt engineering. To reduce the cost of automated prompt optimization, I ran experiments showing that evaluating prompts on sub-sampled test data can provide accurate comparisons between prompts without evaluating on the full test set. This semantic analysis approach was productionized and used by clients in the TrueMedia.org application.
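The sub-sampling idea can be illustrated with a small simulation. This is a sketch on made-up data, not the project's experiments: the per-example correctness arrays, the accuracies of 0.80 and 0.60, and the sample size are all assumptions, and no LLM or DSPy call is involved.

```python
import numpy as np

def ranking_agreement(correct_a, correct_b, sample_size, trials, rng):
    """Fraction of random subsamples that rank two prompts the same way
    as the full test set does."""
    full_a_better = correct_a.mean() > correct_b.mean()
    agree = 0
    for _ in range(trials):
        # Score both prompts on the same subsample so the comparison is paired.
        idx = rng.choice(len(correct_a), size=sample_size, replace=False)
        sub_a_better = correct_a[idx].mean() > correct_b[idx].mean()
        agree += int(sub_a_better == full_a_better)
    return agree / trials

rng = np.random.default_rng(0)
n_examples = 2000

# Simulated per-example correctness (True = correct) for two hypothetical prompts.
correct_a = rng.random(n_examples) < 0.80
correct_b = rng.random(n_examples) < 0.60

agreement = ranking_agreement(correct_a, correct_b, sample_size=200, trials=500, rng=rng)
print(f"subsample ranking matches the full-set ranking in {agreement:.0%} of trials")
```

The closer two prompts are in true accuracy, the larger the subsample needed to rank them reliably, so a check like this is most useful for choosing a sample size before an expensive optimization run.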

Machine Learning for Computational Biology

Research conducted with Dr. Sara Mostafavi at the University of Washington Allen School of Computer Science. This research was supported by the Washington State Research Foundation, the Goldwater Foundation, and the Mary Gates Endowment for Students.

Decoding gene regulation of immune cells with deep learning

Accurately predicting the cellular consequences of genetic variants is necessary to understand genetic disease susceptibility and develop personalized treatment systems. I developed bpAI-TAC, a deep learning model that finds DNA sequence patterns regulating chromatin accessibility across a large collection of immune cell types. Using propagation-based feature attribution methods, I identified the biological sequence patterns the model learns. This allowed me to evaluate the model’s performance by comparing learned patterns to known regulatory protein binding sites, and to uncover novel biological insights about gene regulation in immune cells.
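Sequence-based deep learning models typically take DNA as a one-hot matrix, and feature attributions are then read off per base and position. The sketch below shows only that common input encoding. Whether bpAI-TAC uses exactly this encoding is an assumption, and the function name and the handling of unknown bases are illustrative choices.

```python
import numpy as np

BASES = "ACGT"

def one_hot_encode(sequence):
    """Encode a DNA string as a (length, 4) array with columns A, C, G, T.

    Characters outside ACGT (for example N) become all-zero rows.
    """
    encoded = np.zeros((len(sequence), len(BASES)), dtype=np.float32)
    for position, base in enumerate(sequence.upper()):
        column = BASES.find(base)
        if column >= 0:
            encoded[position, column] = 1.0
    return encoded

example = one_hot_encode("ACGTN")
print(example)
```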

Preprint: Nuria Alina Chandra, Yan Hu, Jason D. Buenrostro, Sara Mostafavi, Alexander Sasse. Refining the cis-regulatory grammar learned by sequence-to-activity models by increasing model resolution. bioRxiv 2025.01.24.634804. Jan. 24, 2025.

Preliminary results were also presented at MLCB2023:

Nuria Alina Chandra, Alexander Sasse, Sara Mostafavi. Base-pair resolution learning improves regional chromatin accessibility prediction in immune cells. [Poster]. Machine Learning in Computational Biology (MLCB2023). Nov. 30, 2023. Accepted extended abstract available here.

Geometric Combinatorics

Research conducted with Dr. Rekha R. Thomas in the University of Washington Department of Mathematics.

Graphical Designs of Path Graphs

Graphs are important for modeling data, such as networks and biological systems. Algorithms derive important information from data in graphs, but running most algorithms on large graphs is infeasibly slow. We need methods for creating smaller, more usable graphs. Graphical designs are a way of finding a subset of vertices of a graph that retains characteristics of the original graph. I discovered a closed form for the extremal graphical designs of path graphs. I wrote scripts to model designs for path graphs, observed patterns through experimentation, and collaborated with another student to prove that the observed patterns hold for all path graphs. We then condensed our findings into theorems and wrote up our results so that they could be used in further research.
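The kind of script used for this sort of experimentation might resemble the sketch below. It is not the project's code, and it rests on two assumptions: the combinatorial Laplacian (degree matrix minus adjacency matrix) is used as the graph operator, and an unweighted subset "averages" an eigenvector when the mean over the subset equals the mean over all vertices. Function names are hypothetical.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian (degree minus adjacency) of the path graph on n vertices."""
    adjacency = np.zeros((n, n))
    for i in range(n - 1):
        adjacency[i, i + 1] = adjacency[i + 1, i] = 1.0
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency

def averaged_eigenvectors(n, subset, tol=1e-9):
    """Indices (sorted by eigenvalue) of the Laplacian eigenvectors whose mean
    over `subset` equals their mean over all n vertices."""
    _, vectors = np.linalg.eigh(path_laplacian(n))
    subset = list(subset)
    hits = []
    for k in range(n):
        phi = vectors[:, k]
        if abs(phi[subset].mean() - phi.mean()) < tol:
            hits.append(k)
    return hits

# Vertices are numbered 0..n-1 along the path; try the two middle vertices of a 4-vertex path.
hits = averaged_eigenvectors(4, [1, 2])
print(hits)
```

Looping this check over many values of `n` and many candidate subsets is one way to spot patterns worth proving.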

Chronic Pain

Research conducted with Dr. Jennifer Rabbitts at Seattle Children’s Hospital. This research was supported by the Mary Gates Endowment for Students and the SCAN Designs Foundation.

Predicting post-surgical chronic pain outcomes in children

It is estimated that 80% of children experience moderate to severe pain two weeks post-surgery, and around 20% of those children develop chronic pain, affecting their long-term health and quality of life. Due to enhanced recovery approaches, patients are being discharged from the hospital sooner. Providers need a tool to identify patients at risk for poor acute pain-related recovery prior to discharge. I analyzed data collected as part of a longitudinal study to determine whether a patient’s functional ability during hospitalization, as measured by the Youth Acute Pain Functional Ability Questionnaire (YAPFAQ), can predict pain and recovery at two weeks post-surgery. I compared the predictive ability of the YAPFAQ to the standard-of-care self-reported numerical rating score of pain intensity used in the hospital. I worked with my mentors to decide the project aims, organized and cleaned the data, conducted the statistical analyses, and drafted the statistical methods, results, and figures for publication.

Powelson EB, Chandra NA, Jessen-Fiddick T, Zhou C, Rabbitts J. August 31 2022. A Brief Measure Assessing Adolescents’ Daily In-Hospital Function Predicts Pain and Health Outcomes at Home After Major Surgery. Pain Medicine. 23(9):1469-1475.

A systematic review & meta-analysis of post-traumatic chronic pain

Previous studies have reported prevalence of chronic pain following traumatic musculoskeletal injury (TMsI) ranging from 11% to over 80%. Musculoskeletal chronic pain is associated with lowered quality of life and comorbid mental health disorders. We conducted a systematic review of chronic musculoskeletal pain after trauma to estimate prevalence with meta-analysis, and to provide a new conceptual model for the development of chronic pain and disability following TMsI. I drafted the project objectives and sought input from my mentors and collaborators across disciplines to decide study inclusion and exclusion criteria. I designed the database search of four scientific databases to identify relevant studies. I tested different database queries and evaluated the outputs based on quantity and quality of produced matches; created a data management system for extracting information to determine inclusion eligibility; and reviewed titles and abstracts for inclusion.

Nuria Alina Chandra, Elisabeth B Powelson, Brittany N Rosenbloom, Jennifer A Rabbitts. Prevalence and Predictors of Chronic Pain Following Traumatic Musculoskeletal Injury: A Systematic Review [Talk]. University of Washington Undergraduate Research Program Summer Symposium, Aug. 2020.

Molecular Immunology

Research conducted with Dr. Naeha Subramanian at the Institute for Systems Biology.

Engineering an NLR Expression System

NOD-like receptors (NLRs) are a family of cytosolic sensors involved in the innate immune response, the body’s first line of defense against pathogens. Genome-wide association studies have identified NLR gene mutations as a risk factor for a multitude of diseases; however, the biological functions of many NLR proteins remain unknown. The goal of my project was to develop an over-expression system to mimic ligand activation and ultimately to explore the function of four poorly characterized NLRs: NLRX1, NLRP7, NLRP9, and NLRP12. With another high school intern, I used a multi-step cloning process to create a drug-inducible lentiviral expression system for the NLRs, using restriction-enzyme-based ligation and Gateway cloning. We performed a transient transfection of a human cell line with our lentiviral vector and found that our engineered constructs stably expressed the NLRs (without excessive protein degradation) and that expression was successfully drug-inducible.

Read a blog post about this work here: link

Nuria Alina Chandra, B. Ozhan, Leah Rommereim, Naeha Subramanian. Engineering an NLR Expression System [Poster]. Institute for Systems Biology summer intern symposium. September 2018. Seattle, WA.