Paper Presentations

Paper presentations are the heart of a PharmaSUG conference. Here is the list of confirmed paper selections so far. Papers are organized into 14 academic sections and cover a variety of topics and experience levels.  Sections not listed have no confirmed papers yet.

Note: This information is subject to change.  Last updated 28-Dec-2025.

Sections


AI in Pharma, Biotech and Clinical Data Science

Paper No.  Author(s)  Paper Title

AI-101 Sri Pavan Vemuri How to Train your Dragon – An AI-Powered Clinical Trials Plot Generator for Oncology Studies
AI-103 Mayank Singh Data Without Borders: CDISC Data Hub for Multi-Language Clinical Analytics & AI
AI-123 Samiul Haque Building a Model Context Protocol Server for AI-Driven SAS Workflow Automation
AI-125 Pavan Kumar Tatikonda, Ryo Nakaya & Sravan Kongara Enhancing ADaM Specification Validation and Generation of SAS Codes Using LLM through Amazon Bedrock: A Practical Framework
AI-135 Kevin Lee & Nathan Lee The Next Frontier of Statistical Programming: Vibe Coding with AI Coding Agents into SAS, R & Python
AI-141 Nattawit Pewngam, Chotika Chatgasem & Titipat Achakulvisut Accelerating CDISC-SEND Conversion with LLM-RAG
AI-157 Louise Hadden Tips and Considerations for Preparing Health Data for Efficient and Accurate AI and LLM Modelling


Advanced Programming

Paper No.  Author(s)  Paper Title

AP-011 Richann Watson Take CoMmanD of Your Log: Using CMD to Check Your SAS® Program Logs
AP-108 Jennifer McGrogan & Mario Widel The Problems Surrounding Rounding
AP-124 Bart Jablonski SAS Packages – an Ask About Anything Game
AP-127 Keith Shusterman & Mario Widel Implementing Laboratory Toxicity Grading for CTCAE Version 6 and Beyond
AP-128 Sharad Chhetri & Ryo Nakaya proc pharmaforest data=open_source out=- ;
AP-136 Stephen Sloan 2026 Efficiency Techniques in SAS 9.4
AP-148 Lisa Mendez & Richann Watson The Tipsy Hangover: Avoiding Indent Headaches in SAS Reports
AP-158 Jayanth Iyengar Applications of PROC COMPARE to parallel programming and other projects


Advanced Statistical Methods

Paper No.  Author(s)  Paper Title

AS-160 Sumanjali Mangalarapu, Chuqing Chen & Anilkumar Anksapur Oncology Solid Tumor Subcutaneous vs Intravenous Late-stage Study Analysis


Career Development, Leadership & Soft Skills

Paper No.  Author(s)  Paper Title

CD-138 Zhen (Laura) Li Perspectives on Leading Effectively in Platform Trials: Leadership and Technical Approaches


Data Standards Implementation (CDISC, SEND, ADaM, SDTM)

Paper No.  Author(s)  Paper Title

DS-131 Elizabeth Dennis & Grace Fawcett CTCAE v6.0: The Good, the Bad, and the Ugly
DS-154 Murali Kanakenahalli & Vamsi Kandimalla Navigating the Statistical Programming Strategies for Cytokine Release Syndrome (CRS) and ICANS in Oncology Clinical Trials


Data Visualization & Interactive Analytics

Paper No.  Author(s)  Paper Title

DV-105 Chunting Zheng, Margaret Huang & Xindai Hu A Standardized R Graph Library for Production-Ready Analysis Figures
DV-120 Ilya Krivelevich Swimmer Plots – Some Practical Advice


Emerging Technologies (R, Python, GitHub etc.)

Paper No.  Author(s)  Paper Title

ET-140 Kevin Lee Unleash the R-volution: A Blueprint for Building Package Validation Capabilities in our own organization


Study Data Integration & Analysis

Paper No.  Author(s)  Paper Title

SI-109 Jingyuan Chen Insights and Experience Sharing with Patient-Reported Outcome Data Analysis in FDA’s Submission


Tools, Tech & Innovation

Paper No.  Author(s)  Paper Title

TT-118 Shih-Che (Danny) Hsu & Wei Qian Automated Quality Checks for SDTM and ADaM Datasets Using R Shiny
TT-130 Jason Su My DIY Swiss Army Knife of SAS® Procedures: A Macro Approach of Forging with My Favorite PROCs
TT-132 Zhongan Chen A fully automated PDF solution using SAS without third-party PDF tools
TT-139 Jyoti (Jo) Agarwal From ChatGPT to Copilot: Evolving AI Support in SAS and Beyond
TT-147 James Sun Bridging the Gap: Table-Driven SAS Programming as a Pathway to AI in Clinical Trials Statistical Programming
TT-156 Jyoti (Jo) Agarwal SAS to R: A Practical Bridge for Programmers


Abstracts

AI in Pharma, Biotech and Clinical Data Science

AI-101 : How to Train your Dragon – An AI-Powered Clinical Trials Plot Generator for Oncology Studies
Sri Pavan Vemuri, Regeneron

This paper introduces a framework for reliable AI-powered code generation, focused on oncology clinical trial swimmer plots. The architecture uses structured prompt engineering, validation checkpoints, and guardrails to reduce hallucinations and ensure reproducible, compliant outputs. While demonstrated with Python for oncology visualizations, the approach extends to statistical programming and other domains requiring consistent AI results. This work represents the first step in a broader project that aims to cover all oncology trial visualizations.

AI-103 : Data Without Borders: CDISC Data Hub for Multi-Language Clinical Analytics & AI
Mayank Singh, Johnson and Johnson MedTech

In the evolving landscape of clinical research, fragmented data environments hinder rapid insights, cross-study analysis, and regulatory compliance. This paper introduces a scalable and flexible approach for centralized clinical data repositories, designed to be agnostic to the underlying relational database system – our implementation leverages Amazon Redshift (AWS). The framework employs structured SDTM schemas and a dynamic approach for ADaM. Automated, language-agnostic ETL (Extraction, Transformation, Loading) pipelines facilitate seamless data access across SAS, R, Python, and SQL, supporting advanced analytics, machine learning, and meta-analyses. By transforming traditional study-specific storage into an integrated ecosystem, this framework addresses data inconsistency and silos, promotes collaboration among multidisciplinary teams, and ensures compliance with industry standards. The proposed solution empowers clinical organizations to accelerate scientific discovery, foster innovation, and adapt to evolving data standards – paving the way for a future of truly borderless clinical data analytics.

AI-123 : Building a Model Context Protocol Server for AI-Driven SAS Workflow Automation
Samiul Haque, SAS Institute

This paper introduces sastool-mcp, a lightweight Model Context Protocol (MCP) server that enables clinical programmers and statisticians to interact with SAS Viya through AI assistants such as Claude Desktop (Sonnet 4.5) and GitHub Copilot. Built using FastMCP and SASPy, the implementation exposes SAS capabilities as MCP tools – allowing AI systems to execute SAS programs, inspect libraries, debug errors, and iteratively refine code in real time. In this work, we demonstrate a fully customizable and open approach that clinical programmers and engineering teams can reuse to build their own MCP servers, integrating SAS with AI safely within enterprise environments. The design pattern is simple and extensible: define Python functions as MCP tools, route them through SASPy for SAS execution, and securely return LOG and ODS outputs. Teams can easily extend this template to add new AI-powered automation such as ADaM and SDTM validation tools, TFL generation routines, log quality checks, SAS macro helpers, CDISC compliance services, and MLOps workflows in Viya. By bridging modern AI orchestration with regulated SAS analytics, sastool-mcp provides a practical foundation for building AI-assisted clinical programming infrastructure that accelerates delivery, improves code quality, and enhances reproducibility – while still preserving control, auditability, and compliance expectations.

AI-125 : Enhancing ADaM Specification Validation and Generation of SAS Codes Using LLM through Amazon Bedrock: A Practical Framework
Pavan Kumar Tatikonda, Takeda
Ryo Nakaya, Associate Director
Sravan Kongara, Takeda

Accurate and well-documented ADaM specifications are crucial for reliable clinical data analysis, yet their manual creation and validation remain time-consuming, subjective, and error-prone. As clinical data complexity grows and regulatory standards change, there is an increasing demand for intelligent, scalable tools that help programmers validate derivation logic and maintain programming consistency. This paper introduces a practical GenAI-powered framework that automates the review and improvement of ADaM programming specifications using Claude through Amazon Bedrock. The framework reads ADaM specifications from an Excel source, queries the large language model to assess derivation logic, and provides structured validation feedback along with SAS code suggestions. Designed to emulate the output standards of traditional SAS workflows, the tool also generates a complete SAS program (.sas) for direct testing and logs results in a log file (.log) to enhance auditability. The approach balances flexibility and automation – handling real-world variation in specification formats while ensuring traceability and reproducibility. This paper outlines the technical design, prompt engineering strategies, and error-handling techniques developed to incorporate large language model (LLM) capabilities into the validation workflow. Lessons learned from practical application are shared, highlighting both opportunities and limitations of using Generative AI in clinical programming. By connecting statistical programming and GenAI, this work provides an early glimpse into how modern tools can improve the quality, consistency, and efficiency of clinical deliverables in a regulated setting.

AI-135 : The Next Frontier of Statistical Programming: Vibe Coding with AI Coding Agents into SAS, R & Python
Kevin Lee, Clinvia
Nathan Lee, Clinvia

The rise of Gen AI is revolutionizing how coding is performed across industries, and “vibe coding” stands at the forefront of this transformation. Coined from Andrej Karpathy’s idea of “embracing the vibes” of AI-assisted coding, vibe coding represents a seamless flow between human logic and AI coding agents – where programmers prompt, review, and collaborate with AI to produce code efficiently and intelligently. As an example, Microsoft CEO Satya Nadella estimated in April 2025 that 20% to 30% of Microsoft’s code was generated by AI. The presentation explores how vibe coding will reshape programming in Biometrics, where SAS, R, and Python remain essential. It illustrates how conversational AI tools (ChatGPT, Gemini, Claude), AI-native IDEs (Cursor, Windsurf, GitHub Copilot), and customized AI coding agents and agentic workflows enhance traditional workflows – automating coding, debugging, and validation processes while preserving scientific and regulatory rigor. The talk introduces customized vibe coding agents and agentic workflows built for Biometrics, integrating CDISC, ADaM, and TLF standards with GxP-compliant validation frameworks. Real-world examples demonstrate how these systems accelerate programming cycles by 25–40%, improve documentation, and lower technical barriers across SAS, R, and Python. While the benefits – such as productivity gains, democratization of coding, and cross-functional collaboration – are profound, the presentation also addresses risks such as AI hallucination, compliance, and over-reliance without human oversight. Finally, it offers a forward-looking view of the “AI-augmented biometrics team,” where statistical programmers evolve into AI managers and collaborators, driving innovation and quality in clinical research and development.

AI-141 : Accelerating CDISC-SEND Conversion with LLM-RAG
Nattawit Pewngam, Ravis Technology
Chotika Chatgasem, Ravis Technology
Titipat Achakulvisut, Department of Biomedical Engineering, Faculty of Engineering, Mahidol University

The Standard for Exchange of Nonclinical Data (SEND), developed by the Clinical Data Interchange Standards Consortium (CDISC), defines the standardized structure and format for submitting nonclinical study data to regulatory authorities. Converting extensive unstructured study materials, often consisting of reports, tables, and scanned documents, into SEND-compliant data sets remains a manual, error-prone, and time-consuming process that relies on repetitive data entry. This inefficiency reduces consistency, traceability, and overall regulatory readiness. Here, we introduce the CDISC-SEND Conversion platform, an automated framework that integrates large language models (LLMs) with retrieval-augmented generation (RAG) to streamline and standardize this data transformation. Our platform rapidly normalizes and maps unstructured study content into SEND structures defined in the SEND Implementation Guide (SENDIG v3.1.1). Controlled terminology and sponsor metadata are retrieved dynamically to produce traceable, auditable, and standards-compliant mappings that demonstrate conformance and regulatory alignment. An expert review stage enables human validation and ensures accuracy before final data set approval. Results show that the workflow reduces preparation time from several weeks to less than a day while improving data consistency and strengthening the key quality dimensions of completeness, structure, conformance, and format. Although originally developed for nonclinical SEND, the same architecture extends to the clinical Study Data Tabulation Model (SDTM), providing a scalable and regulatory-aligned framework for AI-driven data standardization.

AI-157 : Tips and Considerations for Preparing Health Data for Efficient and Accurate AI and LLM Modelling
Louise Hadden, Cormac Corporation

Large language models (LLMs) and AI systems present new opportunities for pharmaceutical research and clinical insight generation from data derived from electronic medical record systems (EMRs). However, health-related data – especially data streams with open-ended narratives and diverse coding systems – requires rigorous preparation for use in compliant, accurate, and efficient AI workflows. This paper outlines a practical framework for preparing transcoded EMR data for AI models and LLM use within health analytics pipelines. Topics include data cleaning, normalization, de-identification, prompt engineering, and iterative refinement cycles. Two use cases are explored: (1) detecting behavioral health issues from free-text ‘other, specify’ fields, and (2) linking disparate medical coding systems (SNOMED, ICD-10, Common Formats, FHIR/HL7, etc.). Implementations using SAS, AWS AI tools, and open-source software are discussed.
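
A tiny illustration of use case (1): screening free-text fields for possible behavioral-health mentions. This is a hypothetical keyword sketch (the dataset, field name, and keyword list are all placeholders), not the paper's pipeline, which involves far richer preparation.

    /* Flag possible behavioral-health mentions in an "other, specify" free-text field */
    data flagged;
      set emr.other_specify;   /* hypothetical input with a character field OTHERTXT */
      bh_flag = prxmatch('/\b(anxiet|depress|insomn|suicid|ptsd)/i', othertxt) > 0;
    run;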

Advanced Programming

AP-011 : Take CoMmanD of Your Log: Using CMD to Check Your SAS® Program Logs
Richann Watson, DataRich Consulting

Regardless of the industry, part of writing a SAS® program is ensuring that the log is free of any unwanted messages. When running a program in an interactive SAS session, we can review the log as we execute the program, and SAS is good about highlighting ERROR and WARNING messages with colors that draw the eye. Other types of unwanted log messages, such as INFO messages, uninitialized-variable notes, and character-to-numeric conversion notes, may not be so easily spotted. When running programs in batch, each log needs to be opened and scanned for unwanted messages, which is tedious and prone to oversight. Several papers have illustrated macros that check logs by parsing them after the programs have executed. While these macros are great when you are running a lot of programs for a deliverable and need to check all the logs, they are not necessarily ideal during development, which is exactly when we need to ensure the program runs clean. Although we could use the same macro that checks all the programs and filter it to one program, that would require running an extra program. What if there is an easier way? This paper demonstrates using the command line interpreter to execute a program in batch, check its log, and provide a summary.
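
A minimal sketch of the log-scanning half of the idea, in plain SAS rather than the CMD-based approach the paper develops (the path and message list are placeholders):

    /* Scan a saved batch log for unwanted messages and print any hits */
    data _null_;
      infile "C:\myproject\myprog.log" truncover end=eof;
      input line $char256.;
      if index(line, "ERROR") = 1
         or index(line, "WARNING") = 1
         or indexw(upcase(line), "UNINITIALIZED")
         or index(line, "Character values have been converted") then do;
        hits + 1;
        put "UNWANTED> " line;
      end;
      if eof then put "Total unwanted log messages: " hits;
    run;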

AP-108 : The Problems Surrounding Rounding
Jennifer McGrogan, Biogen
Mario Widel, Independent

Rounding is not a novel concept – examples can be found in ancient civilizations, such as approximations made by the Mesopotamians. Since then, the need for rounding has not diminished; it has only increased. Rounding was necessary long before computers were used for numerical calculations, and their introduction has made the rounding process both more complex and more indispensable. Based on the referenced literature and our own experience, we will show typical problems, including: 1. Rounding is unavoidable, for (a) table readability, (b) validation of results, (c) keeping results within reasonable precision, and (d) ensuring accurate assignment of CTCAE toxicity grades. 2. Computer representation of numeric results may (a) affect number precision, (b) be inadvertently rounded, (c) introduce calculation errors due to rounding, or (d) cause compare differences on numbers that appear identical. Throughout this paper we will cover these impacts as well as mitigations and solutions with respect to clinical analysis.
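
One concrete example of the representation problem described above (standard IEEE double-precision behavior in SAS, not code from the paper):

    /* Two values that print identically can still compare unequal */
    data _null_;
      x = 0.1 + 0.2;
      y = 0.3;
      if x = y then put "x equals y";
      else put "x does NOT equal y";        /* this branch runs */
      put x= hex16. y= hex16.;              /* the binary representations differ */
      if round(x, 1e-12) = round(y, 1e-12) then
        put "equal after rounding";         /* a common mitigation */
    run;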

AP-124 : SAS Packages – an Ask About Anything Game
Bart Jablonski, yabwon

Modern data-focused languages, like R or Python, have vast ecosystems for building packages. Those environments allow their users to share their ideas, inventions, and code in an easy, almost seamless way. Unfortunately SAS, with its profound and historically well-established impact on data analysis, has not embraced such a marvelous idea yet. The article covers the following topics: what SAS packages are, how to use and develop them, how to make code-sharing a piece of cake, and, of course, what opportunities, possibilities, and benefits SAS packages bring to the community of SAS programmers. Additionally, the article provides a list of frequently asked questions about SAS packages, along with answers.
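
For readers new to the topic, basic usage looks roughly like this (a sketch following the publicly documented SAS Packages Framework conventions; the directory path and package name are illustrative):

    /* Point the framework at a local packages folder, enable it, then install and load a package */
    filename packages "/home/me/SASPackages";
    %include packages(SPFinit.sas);   /* enables the framework's macros      */
    %installPackage(SQLinDS)          /* fetches the package into the folder */
    %loadPackage(SQLinDS)             /* makes its contents available        */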

AP-127 : Implementing Laboratory Toxicity Grading for CTCAE Version 6 and Beyond
Keith Shusterman, Disc Medicine
Mario Widel, Independent

CTCAE Version 6.0 has been released, further clarifying toxicity grade terms and definitions from Version 5.0. However, CTCAE is not the only standard for toxicity grading. Other criteria also exist, including the grading criteria from the Health and Human Services Division of AIDS (DAIDS) for adult and pediatric adverse events, as well as the FDA guidance on the toxicity grading scale for healthy adult and adolescent volunteers enrolled in preventive vaccine clinical trials. We will show a flexible method for deriving CTCAE grades that can handle cases where the grading derivation requires information external to the lab value itself, such as the FDA toxicity grading guidance, as well as any existing version of CTCAE, including versions up to 6.0.
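
To make the grading idea concrete, here is a minimal sketch (thresholds loosely modeled on a typical liver-enzyme criterion; the dataset, variables, and boundaries are illustrative, not the authors' implementation):

    /* Grade a "high" lab toxicity from the ratio of result to upper limit of normal */
    data adlb_tox;
      set adlb;                        /* hypothetical input with AVAL and ANRHI (ULN) */
      if nmiss(aval, anrhi) = 0 and anrhi > 0 then do;
        ratio = aval / anrhi;
        if      ratio > 20 then atoxgrh = 4;
        else if ratio >  5 then atoxgrh = 3;
        else if ratio >  3 then atoxgrh = 2;
        else if ratio >  1 then atoxgrh = 1;
        else                    atoxgrh = 0;
      end;
      /* criteria that depend on information beyond the lab value (e.g., fasting
         status under the FDA vaccine-trial scale) would join in external flags here */
    run;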

AP-128 : proc pharmaforest data=open_source out=- ;
Sharad Chhetri, Takeda
Ryo Nakaya, Associate Director

The recent surge in the use of R and open-source software in drug development has sparked a powerful movement focused on reducing redundancy, fostering collaboration, and driving innovation. Notably, this shift has produced an unexpected – but positive – side effect: it has inspired many long-time SAS users to adopt the same spirit of openness and community sharing. Increasingly, SAS professionals are recognizing the strength of collective knowledge and the value of contributing their own work as open-source code. Emerging from this momentum is PharmaForest – a unified repository of SAS packages built upon the SAS Packages Framework (SPF). This initiative is dedicated to accelerating open collaboration among SAS users and cultivating a vibrant, sustainable community – one where ideas, code, and best practices can flourish together, much like a living forest. This presentation will introduce PharmaForest, explore how it bridges the traditional SAS ecosystem with the open-source movement, and highlight its transformative role in the future of collaborative pharmaceutical programming.

AP-136 : 2026 Efficiency Techniques in SAS 9.4
Stephen Sloan, Dawson D R

Using space and time efficiently has always been important. We want to use available space without having to obtain new servers or other resources, and without deleting variables or observations to make SAS data sets fit into the available space. We want our jobs to run more quickly to reduce waiting times and to ensure that scheduled job streams finish on time and successor jobs are not unnecessarily delayed. Internal mainframe billing algorithms have always rewarded efficiency. As we move toward cloud computing, efficiency will become even more important because the billing algorithms in cloud environments charge for every byte and CPU second, putting an additional financial premium on efficiency. Sometimes we are in a hurry to get our jobs done, so we don’t pay attention to efficiency; sometimes we don’t know at the start how much time and space our jobs will use (and the important time is the time allocated to our assignment); and sometimes we’re asked to go into existing jobs and make changes that are seemingly incremental but cause large increases in the space and/or time required. Finally, there can be jobs that have been running for a long time under “if it ain’t broke, don’t fix it,” because we don’t want to cause the programs to stop working, especially if they’re not well documented. With a good knowledge of Base SAS®, we can help our organizations optimize the use of space and time without losing observations or variables or changing program results.
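
As a flavor of the kinds of techniques involved (generic Base SAS idioms with hypothetical dataset names, not an excerpt from the paper):

    /* Subset rows and columns while reading, and compress the output */
    data work.alt_high (compress=yes keep=usubjid visit aval);
      set big.labs (keep=usubjid visit paramcd aval
                    where=(paramcd = "ALT"));   /* filter on input, not afterwards */
    run;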

AP-148 : The Tipsy Hangover: Avoiding Indent Headaches in SAS Reports
Lisa Mendez, Army MEDCOM
Richann Watson, DataRich Consulting

Creating visually appealing SAS reports shouldn’t feel like recovering from a formatting hangover – but when it comes to hanging indents in the REPORT procedure, things can get a little… tipsy. In this paper, we serve up a curated cocktail of techniques, using a real-world example, to help you straighten out those stubborn hanging indents. We’ll explore multiple ways to achieve indentation in PROC REPORT using ODS options and clever preprocessing methods. You’ll learn why twips (twentieths of a point) matter more than you think, and how understanding them can help you fine-tune your layout with precision. We’ll also explain why ODS ESCAPECHAR and the SPLIT option should never share a drink, and how to pre-process variables with line breaks and non-breaking spaces for smoother text flow. As always, we provide a curated list of references to help with other PROC REPORT formatting tips. Whether you’re formatting footnotes, crafting multi-line cells, or just trying to avoid the dreaded indent misalignment, this paper offers tips, tricks, and troubleshooting advice to keep your reports sober and your formatting headache-free.
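
One ingredient the abstract alludes to, sketched minimally (hypothetical data; the paper covers several alternatives plus the twips-level tuning):

    /* Prefix continuation rows with non-breaking spaces, then preserve them with ASIS=ON */
    data indented;
      set terms;                                                /* hypothetical TXT and LEVEL */
      if level > 1 then txt = repeat('A0'x, 3) || strip(txt);   /* four non-breaking spaces   */
    run;

    proc report data=indented style(column)={asis=on};
      column txt;
      define txt / display "Term";
    run;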

AP-158 : Applications of PROC COMPARE to parallel programming and other projects
Jayanth Iyengar, Data Systems Consultants LLC

PROC COMPARE is a valuable Base SAS® procedure that is used heavily in the pharma industry and other areas. At its core, PROC COMPARE reconciles two data sets to determine whether they have equivalent sets of records and variables. In the clinical field and elsewhere, PROC COMPARE is often used to validate data sets in projects that involve parallel programming, where programmers independently perform the same tasks. In this paper, I discuss the role PROC COMPARE plays in different SAS tasks, including DATA step merges, parallel programming, generation data sets, and more.
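
The canonical double-programming call looks like this (a routine example with hypothetical library and dataset names):

    /* Reconcile independently produced production and QC datasets */
    proc compare base=prod.adsl compare=qc.adsl
                 listall              /* report variables/observations found in only one dataset */
                 criterion=1e-10;     /* tolerance for floating-point fuzz */
      id usubjid;                     /* both datasets sorted by USUBJID */
    run;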

Advanced Statistical Methods

AS-160 : Oncology Solid Tumor Subcutaneous vs Intravenous Late-stage Study Analysis
Sumanjali Mangalarapu, Merck
Chuqing Chen, Merck
Anilkumar Anksapur, Merck

This paper presents practical programming approaches for analyzing a subcutaneously versus intravenously administered therapeutic, with pharmacokinetics (PK) designated as the primary endpoint. It focuses on the creation of the key Analysis Data Model (ADaM) datasets, their mock shells, and the programming logic used for the key summary tables supporting the endpoints. Key content covers primary and secondary endpoints. Because PK as the primary endpoint is uncommon for late-stage solid tumor oncology trials, the programming required tailored analytic strategies, including model-based PK parameter estimation and specific endpoint derivations. The paper concludes that effective ADaM implementation and analysis require structured cross-functional collaboration between the analysis and reporting (A&R) and PK/PD programming teams, with oversight from the statisticians of the respective programming groups.

Career Development, Leadership & Soft Skills

CD-138 : Perspectives on Leading Effectively in Platform Trials: Leadership and Technical Approaches
Zhen (Laura) Li, AstraZeneca

Platform trials – run under a master protocol with sub-study-specific protocols – accelerate innovation by adaptively evaluating multiple therapies within a unified, evolving framework, presenting distinctive technical and leadership challenges. This presentation draws on my experience as the product lead programmer for an in-house platform trial and shares how I addressed these challenges while leading a study programming team. On the technical side, establishing and maintaining programs and specifications that balance consistency with adaptability is foundational to high-quality deliverables. Practical examples will show how these approaches flex to diverse analysis requirements and evolving protocols. On the leadership side, effective study management is grounded in comprehensive planning, thoughtful resource allocation, and proactive, solution-focused communication, supported by fit-for-purpose tools that streamline workflow and collaboration. Cultivating a collaborative, adaptive, and growth-oriented culture, alongside structured support for study programmers at varying experience levels, helps the team navigate steep learning curves, resolve blockers, manage pressure, and sustain continuous learning. Real-world examples will illustrate these approaches in action, offering practical guidance for leading a study programming team toward resilience, productivity, and continuous improvement in adaptive platform trials.

Data Standards Implementation (CDISC, SEND, ADaM, SDTM)

DS-131 : CTCAE v6.0: The Good, the Bad, and the Ugly
Elizabeth Dennis, EMB Statistical Solutions, LLC
Grace Fawcett, Syneos Health

The National Cancer Institute’s Common Terminology Criteria for Adverse Events (CTCAE) is an important tool for reporting the severity of adverse events as grades. In many clinical trials, these grades are applied to lab results and are often programmatically determined in the production of the ADaM lab results dataset. In 2025, version 6.0 of CTCAE was released. Compared to version 5.0, it contains additions, deletions, and revisions. The release included an Excel version with a tab that shows a comparison to version 5.0 and the type of each change. These criteria have evolved over time, with many of them becoming clearer. Unfortunately, some ambiguities remain, which leave questions about how the grades should be programmatically determined. This paper will walk through some of the grades that have been revised and now have clear guidelines, and others where the interpretation is murky. An overview of the ADaM bi-directional toxicity variables and their possible limitations in common lab shift tables will also be discussed.

DS-154 : Navigating the Statistical Programming Strategies for Cytokine Release Syndrome (CRS) and ICANS in Oncology Clinical Trials
Murali Kanakenahalli, Kite Pharma
Vamsi Kandimalla, Kite Pharma

Cytokine Release Syndrome (CRS) and Immune Effector Cell-Associated Neurotoxicity Syndrome (ICANS) are two of the most critical and often dose-limiting toxicities associated with novel immunotherapies, particularly CAR T-cell therapy within oncology clinical trials. Accurate and standardized reporting of these adverse events (AEs) is paramount for assessing the risk-benefit profile of these revolutionary treatments and ensuring patient safety. This paper provides a comprehensive review, from the statistical programming perspective, of the evolution of data collection, standardization, and reporting of CRS and ICANS. We first establish the clinical context and critical importance of these syndromes. We then delve into the challenging pre-MedDRA era, illustrating how programmers employed complex, symptom-based logic to link Adverse Event (AE) data with separate, customized syndrome-tracking Case Report Forms (CRFs) to generate meaningful safety metrics. The paper then details the pivotal shift following the inclusion of explicit CRS and ICANS terms in the Medical Dictionary for Regulatory Activities (MedDRA), which necessitated major redesign of CRFs, moving from separate CRFs to codified severity scales. Crucially, we outline robust programming strategies for mapping these complex data structures into regulatory submission standards, specifically detailing the best practices for Study Data Tabulation Model (SDTM) mapping, Analysis Data Model (ADaM) derivations (including time-to-onset and maximum severity calculations), and the construction of standardized Tables, Listings, and Figures (TFLs) required for regulatory submission packages. The insights shared are designed to equip statistical programmers with the necessary framework to handle the complexities of ensuring data integrity, traceability, and effective communication of these high-stakes safety endpoints.
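
As a small illustration of the ADaM-level derivations mentioned above (time-to-onset and maximum severity), using generic ADaM naming conventions rather than the authors' code:

    /* Per-subject time to first CRS onset and worst CRS grade */
    proc sql;
      create table crs_summary as
      select usubjid,
             min(astdy)   as crs_onset_dy,   /* analysis study day of first onset          */
             max(atoxgrn) as crs_max_gr      /* assumes a numeric toxicity grade variable  */
      from adae
      where aedecod = "Cytokine release syndrome"
      group by usubjid;
    quit;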

Data Visualization & Interactive Analytics

DV-105 : A Standardized R Graph Library for Production-Ready Analysis Figures
Chunting Zheng, Syneos Health
Margaret Huang, Vertex Pharmaceuticals, Inc.
Xindai Hu, Vertex Pharmaceuticals, Inc.

Production of Clinical Study Report (CSR) figures requires strict adherence to internal standards, including exact specifications for fonts, layouts, and aesthetics. Traditionally, programmers re-implement the same code blocks for each new figure, leading to redundancy, inefficiency, and increased risk of inconsistency. To address this challenge, we developed a comprehensive R graph library that automates the creation of analysis figures from standardized templates for common graph types such as scatter, bar, box, swimmer, line, spaghetti, forest, and Kaplan-Meier plots. Each template encapsulates a uniform structure and theme, ensuring consistency across projects while minimizing the need to manually reproduce code. At the same time, flexibility is preserved through high-level parameters (e.g., point size, opacity, text size, model type, formula) and customized arguments, which accept additional layers for plot-specific customization. The design is based on the R ggplot2 package; it allows users to produce production-ready CSR figures with concise, readable code while maintaining alignment with industry practices. To demonstrate the utility of this approach, we provide side-by-side comparisons of the R and SAS code. Our R graph library reduced code complexity, improved reproducibility, and improved readability across many CSR graph scenarios. This templated, automation-ready system balances efficiency, flexibility, and regulatory compliance, ultimately streamlining the generation of high-quality CSR graphics. Keywords: R programming, ggplot2, automation, uniform, CSR figures, reproducibility, efficiency, accurate, data visualization

DV-120 : Swimmer Plots – Some Practical Advice
Ilya Krivelevich, Eisai Inc.

Graphs are an integral part of modern data analysis of clinical trials. Viewing data in a graph along with the tabular results of a statistical analysis can greatly improve understanding of the collected data. Visualized data can very often be the most informative way to understand the insights from the results. Swimmer plots are an effective graphical presentation of subject status and longitudinal data such as duration of treatment, dose changing, occurrences and durations of events. This type of graph is usually very popular in the early phases of drug development (Phase I / Phase II). An essential objective for medical monitors is to make it possible to visually review when specific medications were administered in response to specific safety and efficacy information throughout the study duration. This visual representation is crucial for tracking patient responses and treatment safety/efficacy. Each subject is represented by an individual horizontal bar (lane). There are many possibilities, with the main restrictions being considerations of readability and not overloading the plot with too much information. This paper aims to provide some practical advice on how to overcome such restrictions and make enhanced swimmer plots more readable and informative.
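
In SAS, the basic lane-per-subject skeleton can be built from HIGHLOW and SCATTER overlays (a minimal sketch with hypothetical variables; the paper's advice concerns what to layer on top without overloading the display):

    /* One horizontal bar (lane) per subject, with event markers overlaid */
    proc sgplot data=swim;
      highlow y=subjid low=trtsdy high=trtedy / type=bar barwidth=0.5;  /* treatment duration   */
      scatter y=subjid x=respdy / markerattrs=(symbol=circlefilled);    /* e.g., response onset */
      xaxis label="Study Day";
      yaxis label="Subject";
    run;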

Emerging Technologies (R, Python, GitHub etc.)

ET-140 : Unleash the R-volution: A Blueprint for Building Package Validation Capabilities in our own organization
Kevin Lee, Clinvia

As the use of R continues to expand in clinical trial programming, ensuring that R packages are validated for regulatory compliance, reproducibility, and traceability has become essential. This presentation provides a step-by-step blueprint for building robust R package validation capabilities within your organization. We will begin by clarifying what R and R packages are – from the base and recommended packages maintained by the R Core Team to community-contributed and in-house packages. The session then focuses on why validation matters, emphasizing principles such as accuracy, reproducibility, and data integrity to meet FDA and EMA expectations for reliable, auditable results. A central part of the presentation introduces a risk-based validation framework that classifies packages according to their purpose, maintenance quality, community usage, and testing rigor. Tools like the R Validation Hub’s riskmetric package and Sanofi’s risk.assessr package will be demonstrated for risk scoring and assessment. We’ll also cover unit testing practices using testthat and show how to generate comprehensive validation reports covering purpose, environment, documentation, dependencies, and testing results. Finally, the presentation outlines how to integrate validation into your computing environment through Installation (IQ), Operational (OQ), and Performance (PQ) Qualifications, ensuring traceable and reproducible results. Attendees will gain practical insights and best practices for establishing an internal, sustainable framework to confidently leverage R in regulated environments.

Study Data Integration & Analysis

SI-109 : Insights and Experience Sharing with Patient-Reported Outcome Data Analysis in FDA’s Submission
Jingyuan Chen, Genentech

Recent years have seen growing interest in the inclusion of Patient-Reported Outcome (PRO) data in confirmatory trials. This presentation will explore key insights from the FDA’s guidance on ‘Submitting Patient-Reported Outcome Data in Cancer Clinical Trials,’ with a focus on critical definitions and analytical approaches. We will share our experiences in addressing FDA information requests related to PRO data, including successful cases where PRO analyses contributed to labeling during NDA submissions. Additionally, we will outline aligned strategies for PRO data analysis within our molecule program, highlighting their impact on future filing studies.

Tools, Tech & Innovation

TT-118 : Automated Quality Checks for SDTM and ADaM Datasets Using R Shiny
Shih-Che (Danny) Hsu, Pfizer
Wei Qian, Pfizer

In clinical trial analysis programming, SDTM and ADaM data sets are essential to ensuring the accuracy and reliability of clinical trial data reporting, and managing a project that involves numerous SDTM and ADaM datasets along with their corresponding log files can be complex and resource-intensive. Identifying data quality issues – such as inconsistent timestamps, missing values, or log errors – often requires manual inspection across multiple files, which is time-consuming and prone to oversights. To address this challenge, we developed an R Shiny application that automates overall dataset quality checks and presents the results in a centralized, interactive dashboard. Users interact with the application by specifying a folder path containing SDTM and ADaM datasets along with the associated log files. Upon submission, the application performs key validations, including timestamp consistency checks, log issue detection, and structural integrity assessments across both SDTM and ADaM datasets. With a single click, programming leads are presented with a visualized report that highlights potential issues, can be filtered by dataset or domain, and displays detailed summaries for further investigation. This tool not only streamlines the review process but also enhances transparency, traceability, and reproducibility in clinical data workflows. By integrating R Shiny’s dynamic interface with robust, customizable back-end logic, the application empowers teams to proactively monitor data quality and reduce the risk of downstream reporting errors. This presentation will showcase the application’s design, core functionalities, and real-world impact on improving efficiency and accuracy in clinical programming review cycles.

TT-130 : My DIY Swiss Army Knife of SAS® Procedures: A Macro Approach of Forging with My Favorite PROCs
Jason Su, Daiichi Sankyo, Inc.

Here I take advantage of the SAS® macro facility to forge the following four extremely popular procedures into one Swiss-Army-knife (SAK)-styled macro, %pfs (the acronym): PROC PRINT, PROC CONTENTS (not in the acronym), PROC FREQ, and PROC SQL. Controlled by a mode-switch parameter (MODE), the macro can fashion any one of the four procedures in a succinct version supporting popular options, such as OBS, FIRSTOBS, WHERE, SHORT, VAR, etc. The macro can carry out my most frequent jobs from such procedures, such as selectively printing records from a dataset, displaying its data structure, quickly deriving variable frequencies, counting variables, etc. In this spirit, fellow programmers are encouraged to create their own version of %pfs. When called with different modes, the SAK macro can perform any of the procedures and immediately release programmers from much of the repetitive syntax-typing work. Additionally, the tool's functionality can be expanded with innovative capabilities, such as performing fuzzy searching on ID variables or automatically saving counting results into a macro variable for later use, so the macro becomes smart and surprisingly powerful.
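
A stripped-down sketch of the mode-switch idea (parameters and defaults are hypothetical; the authors' %pfs supports many more options):

    %macro pfs(mode=P, data=, var=, obs=10);
      %if %upcase(&mode) = P %then %do;          /* PRINT    */
        proc print data=&data(obs=&obs);
          %if %length(&var) %then %do; var &var; %end;
        run;
      %end;
      %else %if %upcase(&mode) = C %then %do;    /* CONTENTS */
        proc contents data=&data short; run;
      %end;
      %else %if %upcase(&mode) = F %then %do;    /* FREQ     */
        proc freq data=&data; tables &var; run;
      %end;
    %mend pfs;

    %pfs(mode=F, data=sashelp.class, var=sex)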

TT-132 : A fully automated PDF solution using SAS without third-party PDF tools
Zhongan Chen, Metsera

In clinical trials, statistical programmers often produce well-bookmarked PDF packages of TFLs for review or submission purposes. Traditional approaches often involve a manual process, using Adobe Acrobat or other third-party PDF tools (Python, LibreOffice, Sejda, etc.) to convert RTF to PDF and then combine the PDF files into a package. These approaches often come with extra software license costs and can be time-consuming. This paper outlines a novel, free, fast, and fully automated approach to generating bookmarked PDF packages with tables of contents (TOCs) using SAS. The only software needed other than SAS is Microsoft Word, which should already be available to most users. No third-party PDF tool, such as Adobe Acrobat, is needed in this process. Everything is done with SAS macros and is fully automated.
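
The core trick is presumably along these lines: SAS writes a small script that asks Word to export each RTF as a PDF, then shells out to run it. A hedged sketch (paths, names, and the exact mechanism are assumptions, not the authors' macros):

    /* Generate a VBScript that drives Word, then execute it from SAS (Windows) */
    data _null_;
      file "C:\temp\rtf2pdf.vbs";
      put 'Set w = CreateObject("Word.Application")';
      put 'Set d = w.Documents.Open("C:\out\table1.rtf")';
      put 'd.ExportAsFixedFormat "C:\out\table1.pdf", 17';   /* 17 = wdExportFormatPDF */
      put 'd.Close False';
      put 'w.Quit';
    run;

    options noxwait xsync;
    x 'cscript //nologo "C:\temp\rtf2pdf.vbs"';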

TT-139 : From ChatGPT to Copilot: Evolving AI Support in SAS and Beyond
Jyoti (Jo) Agarwal, Gilead Sciences

Building on insights from PharmaSUG 2025 Paper SI 294, which showcased ChatGPT as a transformative assistant for SAS programming workflows, this 2026 submission expands the conversation to a broader ecosystem of AI-powered tools, focusing on GitHub Copilot and Microsoft Copilot and their practical applications in statistical programming. As the pharmaceutical industry moves beyond prompt-based experimentation, Copilot tools are redefining how programmers write, debug, and optimize code across SAS, R, Python, and other platforms. This paper presents real-world use cases where Copilot enhances productivity in clinical trial programming, including automating repetitive tasks, generating documentation, and facilitating cross-language translation. It highlights Copilot’s integration into development environments, enabling seamless code suggestions, intelligent error detection, and contextual learning tailored to clinical data standards. Through comparative analysis, the paper clarifies the distinct roles of ChatGPT, Microsoft Copilot, and GitHub Copilot. Prompt engineering remains central to maximizing AI utility, and this paper offers refined strategies for crafting effective prompts that yield accurate, reproducible, and audit-ready outputs. It also addresses critical considerations such as data privacy, model bias, and the validation protocols essential for deploying AI tools in regulated environments. By showcasing practical implementations and lessons learned from integrating Copilot into statistical programming pipelines, this paper provides a roadmap for programmers and organizations seeking to harness the next generation of AI tools. The future of clinical programming is not just about faster code; it is about smarter, safer, and more collaborative development powered by AI.

TT-147 : Bridging the Gap: Table-Driven SAS Programming as a Pathway to AI in Clinical Trials Statistical Programming
James Sun, Rocket Pharmaceuticals

General-purpose Large Language Models (LLMs) demonstrate robust, out-of-the-box knowledge of clinical trial processes and industry data standards, and they can produce decent SAS code for tasks such as generating SDTM data. However, the challenge for genuine AI integration lies in adapting these models to specific programming requirements while also accounting for the highly specific, proprietary business rules unique to an individual company. This paper establishes table-driven programming as the critical pathway to overcoming this barrier. By systematically externalizing hard-coded logic into clean, declarative metadata tables, this technique simplifies maintenance, enhances code flexibility, and, most importantly, creates the structured, high-quality “fuel” necessary for accurate LLM training. This modernized environment not only improves programming standards today but also enables LLMs to automatically generate logic specifications from complex regulatory text (e.g., FDA rules) and extract crucial metadata from documents like protocols and SAPs, fully integrating AI with clinical trial statistical programming.
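
To make the table-driven idea concrete, a toy sketch (hypothetical rules and dataset names; real metadata tables would be generated from specifications):

    /* Keep derivation logic as data, then let CALL EXECUTE assemble the program */
    data rules;
      length logic $200;
      logic = 'if armcd = "A" then trt01p = "Drug X";'; output;
      logic = 'else trt01p = "Placebo";';               output;
    run;

    data _null_;
      set rules end=last;
      if _n_ = 1 then call execute('data adsl; set raw.dm; length trt01p $20;');
      call execute(strip(logic));
      if last then call execute('run;');
    run;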

TT-156 : SAS to R: A Practical Bridge for Programmers
Jyoti (Jo) Agarwal, Gilead Sciences

As the analytics landscape evolves, R has emerged as a versatile and powerful tool for data analysis, visualization, and statistical modeling. While SAS remains a trusted-platform in many industries, R offers unique advantages, including flexibility, open-source accessibility, and a rich-ecosystem of packages for advanced analytics. For SAS programmers, transitioning to R can be challenging due to differences in syntax, data structures, and programming paradigms. Building on insights from the 2025 SI-95 paper on SAS programming efficiency, this paper provides a practical, hands-on bridge for SAS users entering the R environment. It guides readers through environment setup, console operations, variable assignments, and arithmetic and sequence operations, highlighting parallels and key differences with SAS. The discussion extends to R’s data structures: vectors, matrices, arrays, data frames, and lists; and explains data type coercion, helping users understand how R manages heterogeneous data. The paper emphasizes modern R workflows, including data wrangling with tidyverse functions, creating variables, handling missing data, and reverse coding, reflecting SAS data-step operations. Visualization is another key focus: barplots, histograms, boxplots, scatterplots, line-charts, and clustering dendrograms, with examples demonstrating how R enables more customizable, visually appealing, and interactive analyses compared to SAS. Finally, the paper highlights the use of R packages, and functional programming approaches that simplify complex workflows. Through step-by-step examples and real-world applications, SAS programmers gain actionable insights, enabling them to leverage R’s capabilities while building on existing SAS expertise. This paper serves as a comprehensive guide for users seeking to expand their data analysis toolkit beyond SAS.