Paper Presentations

Paper presentations are the heart of a PharmaSUG conference. Here is the current list of confirmed paper selections. Papers are organized into 14 academic sections and cover a variety of topics and experience levels.

Note: This information is subject to change. Last updated 14-Mar-2026.

Sections

 

AI in Pharma, Biotech and Clinical Data Science

Paper No. Author(s) Paper Title
AI-101 Sri Pavan Vemuri How to Train Your Dragon – Embedding AI in Clinical Workflow. Illustrated through Oncology Swimmer Plots
AI-103 Mayank Singh Data Without Borders: CDISC Data Hub for Multi-Language Clinical Analytics & AI
AI-123 Samiul Haque Building a Model Context Protocol Server for AI-Driven Workflow Automation
AI-125 Pavan Kumar Tatikonda
& Ryo Nakaya
& Sravan Kongara
Enhancing ADaM Specification Validation and Generation of SAS Codes Using LLM through Amazon Bedrock: A Practical Framework
AI-135 Kevin Lee
& Hohyun Lee
The Next Frontier of Statistical Programming: Vibe Coding with AI Coding Agents into SAS, R & Python
AI-141 Nattawit Pewngam
& Chotika Chatgasem
& Titipat Achakulvisut
Accelerating CDISC SEND Conversion with AI: From Raw Preclinical Data to Regulatory-Ready Datasets
AI-157 Louise Hadden Tips and Considerations for Preparing Health Data for Efficient and Accurate AI and LLM Modelling
AI-164 Phil Bowsher Agentic R in Clinical Trials: Empowering Statistical Programmers with Open Source LLM Packages & Positron Tools
AI-185 Xi Jiang
& Scott McClain
From Molecular Subtypes to Bedside Decisions: An AI Approach for Personalized Critical Care Recommendations
AI-201 Jaime Yan
& Jason Zhang
Eliminating QC Programming Duplication Through Claude AI-Assisted Independent Code Generation: A Practical Framework for Regulatory-Compliant Validation
AI-206 Siqi Wang
& Toshio Kimura
& Songgu Xie
& Weiming Du
An Agentic AI Framework That Reads Statistical Analysis Plans and Generates TFL Table of Contents
AI-240 Saikrishnareddy Yengannagari DoxySAS: An End-to-End AI-Powered SAS Documentation Pipeline
AI-259 Vihar Patel AI-Augmented SDTM Review: A Practical Framework Enhanced by a Structured Prompt Library
AI-278 Tingting Zeng
& Jia He
& Yaohui Zhu
& Shuang Gao
& Jinling Li
& Nana Yang
AI-Driven Intelligent Platforms for ADaM Specification and Code: Empowering Clinical Data Analysis
AI-305 Junze Zhang
& Amy Zhang
Improving AI SAS-to-R Code Migration via an Intermediate Design Document Layer
AI-311 Vitaliya Lysenko
& Gargee Vaidya
& Lluis Gabarro
SpecToSAS: Translating ADaM Specs to SAS code using RAG
AI-316 Ryan Lafler A Practical Roadmap for the 2026 Enterprise Generative AI Stack: AI Agent Architectures, Frameworks, and Secure Deployment
AI-328 Tyler Rowsell
& Nandini Thampi
& Dishank Jani
& Hao Xu
Leveraging LLMs in Data Science Web Applications: Beyond the Chat Interface
AI-330 Devendra Toshniwal
& Sanjay Koshatwar
& Gopal Joshi
Eligibility Meets Intelligence: Using AI to Bridge Protocol Complexity and Patient Access
AI-332 Chunqiu Xia
& Feiyang Du
A Human-in-the-Loop AI-Assisted Framework for ADaM Standardization
AI-339 Sachin Heerah
& Darren Jeng
Automating Table Generation in Real-World Data Programming: An AI-Assisted Approach
AI-344 Sundaresh Sankaran
& Sherrine Eid
Protocol Analysis, Optimisation and Generation: Artificial Intelligence Enables a Unified View
AI-345 Sherrine Eid Beyond Today’s Evidence: How AI-Enabled RWE Will Transform Development Speed, Study Quality, and Real-World Impact
AI-349 Rohit Kadam
& Saurabh Das
& Niketan Panchal
AI-Powered Risk Management: Enhancing RBQM with Generative AI and Real-Time Insights
AI-352 Jianlin Li
& Andy Shen
Practical Lessons in AI-Assisted Metadata Conversion: From Database Specs to EDC, SDTM, and P21 – Successes, Pitfalls, and Regulatory Considerations
AI-362 Kishore Reddy Rollakanti Using Large Language Models to Validate TLF Outputs Against Statistical Review Comments: An End-to-End Python Framework
AI-363 Dushyant Rao An AI Framework for CSR Using Microsoft Copilot: Traceable Side-by-Side Efficacy Comparisons
AI-420 Rostislav Markov A specification-driven approach to improve the reliability of AI-generated SDTM transformation programs
AI-431 Lucas Liu
& Bo Ci
AI-Powered Multiple-Agent Pipeline for Automating ADaM Dataset Generation
AI-432 Ajay Gupta
& Misikir Tesfaye
Evaluation of the Azure OpenAI ChatGPT API as a Code Assistance Tool for Statistical Programming in SAS, R and Python
AI-438 Lida Gharibvand Operationalizing Generative AI in Regulated Analytics: Applied Implementation Patterns for Enterprise Deployment

 

Advanced Programming

Paper No. Author(s) Paper Title
AP-011 Richann Watson Take CoMmanD of Your Log: Using CMD to Check Your SAS® Program Logs
AP-108 Jennifer McGrogan
& Mario Widel
The Problems Surrounding Rounding
AP-116 Chengxin Li ARMed AutoTable Macro Agents: An ARM-Driven Framework for Automated Analysis Table Generation
AP-124 Bart Jablonski SAS Packages – an Ask About Anything Game
AP-127 Keith Shusterman
& Mario Widel
Implementing Laboratory Toxicity Grading for CTCAE Version 6 and Beyond
AP-128 Sharad Chhetri
& Ryo Nakaya
Unleashing Open-Source Potential in SAS: The PharmaForest Ecosystem (proc pharmaforest data=open_source out=- 😉
AP-136 Stephen Sloan 2026 Efficiency Techniques in SAS 9.4
AP-146 Song Liu AI-Assisted Modular Design for R Markdown Report Generation: A Hybrid Architecture for Enhanced Maintainability and Cross-Study Scalability
AP-148 Lisa Mendez
& Richann Watson
The Tipsy Hangover: Avoiding Indent Headaches in SAS Reports
AP-158 Jayanth Iyengar Applications of PROC COMPARE to parallel programming and other projects
AP-159 Alice Cheng Speeding Up Your Validation Process is As Easy As 1, 2 and 3
AP-208 Weiwei Guo Programming Challenges in Developing PRO Analysis Datasets Under FDA’s New Submission Guidance
AP-211 Jaime Yan
& Ming Yang
Schema-Preserving Generation of Clinical TLF Templates and Executable R Code via Iterative LLM-Guided Debugging
AP-220 Brendan Bartley
& Megan Hinger
Do One Task, Get Another Done for Free: Use DEFX Tags in Comments to Fill Out Your define.xml
AP-222 Derek Morgan
& Taiana Kazakova
Finding Macros Called Within A Directory of SAS® Programs
AP-228 Carleigh Crabtree Timing, Masking, and Resolution: Understanding and Debugging SAS Macros
AP-233 Yi Guo Detecting Abnormal Page Breaks Using Grayscale Pixel-Density Analysis in R Shiny
AP-237 Jayanth Iyengar SAS® Programming techniques for efficiency and code optimization
AP-246 Brian Varney Appreciating PROC SUMMARY/MEANS in Many NWAYS vs Summary Function in R
AP-251 Jim Box
& Mary Dolegowski
Modernizing Clinical Research Analytics with Cloud-Optimized SAS Procedures
AP-256 Ceng Qian
& Xu Wen
& Wei Lei
& Ling Han
Advanced SAS Graph Template Language (GTL) with Practical Examples from Oncology Trials
AP-257 Qingwei Hu
& Zhaoyu Xie
& Ji Qi
Patient-Level, Dose-Stratified Swimmer Plots for Comparative Adverse Event Time-Course Assessment in Clinical Trials Using SAS® PROC TEMPLATE
AP-258 Vihar Patel From Tables to Tolerances: The Evolving Role of Statistical Programmers in Risk-Based Quality Management (RBQM)
AP-267 Charu Shankar The Log Whisperer: Still Reading SAS Logs? Start Ranking Them
AP-271 Charu Shankar Stop Guessing. Start Matching. High-Impact SAS PRX Patterns in 20 minutes
AP-274 Shelby Taylor Experience of an R Programmer Incorporating R in a SAS Studio Flow
AP-295 Zhongqing He
& Man Li
User-defined Functions for Programming Population PK and PKPD Datasets
AP-313 Steve Black Having Your Cake and Eating It Too: Automated Log Analysis Without Losing the SAS EG Log Window
AP-317 Nikhil Jadhav
& Kanchan Gund
Analysis Unlocked: Demystifying Kaplan-Meier (KM) Through a 9-Day Fasting Challenge for Programmers and Beginners
AP-342 Ruth Rivera Barragan
& Isaac Vazquez
Has your SAS been ‘MEAN’ to your data yet?
AP-375 John LaBore
& Josh Horstman
One Word Can Make All the Difference(s): Strengthening Validation Practices with PROC COMPARE
AP-376 Miao Fu
& Toshio Kimura
Advanced Parameterization Enables Advanced Statistical Programming
AP-418 Troy Hughes Code Hard and Put away Wet: Replacing Hardcoded SAS® Software Quality Checks with Data-Driven Design and Defensive Programming Techniques That Validate Code and Control Data

 

Advanced Statistical Methods

Paper No. Author(s) Paper Title
AS-160 Sumanjali Mangalarapu
& Chuqing Chen
& Anilkumar Anksapur
Oncology Solid Tumor Subcutaneous vs Intravenous Late-stage Study Analysis
AS-166 Liangwei He Quantifying Expression Divergence to Identify Candidate RNA-Binding Proteins Modulating Nonsense-Mediated Decay Across Human Tissues Using t-SNE Embedding Analysis
AS-190 Yida Bao
& Philippe Gaillard
& Wei Yao
& Zheng Zhang
& Rui Wang
Implementing Dynamic Time Warping in SAS 9.4 Using PROC IML: An Alternative Approach for Time-Series Model Evaluation
AS-269 Lihui Deng
& Kylie Fan
& Xiaoting Qin
Tipping Point Analysis: An Illustration of Sensitivity Analysis on Non-Administrative Censoring for Progression-Free Survival (PFS)
AS-315 Leon Davoody Machine Learning Models for Predicting Diabetes Using the PIMA Indians Dataset
AS-359 Sundaresh Sankaran
& Pritesh Desai
& Sherrine Eid
Beyond Imitation: Selecting Synthetic Data with Purpose and Precision
AS-365 James Austrow A Faster Algorithm for the Finkelstein-Schoenfeld Test and Win Ratio in Hierarchical Composite Endpoint Analysis
AS-413 Bala Niharika Pillalamarri
& Yayu Li
& Hongbing Jin
Construction and Evaluation of External Controls Using Propensity Score Methods
AS-422 Prayag Shah Linear versus Log-Log Confidence Intervals for Kaplan-Meier Survival Estimates: Statistical Rationale and Practical Implementation using SAS 9.4

 

Career Development, Leadership & Soft Skills

Paper No. Author(s) Paper Title
LS-138 Zhen (Laura) Li Perspectives on Leading Effectively in Platform Trials: Leadership and Technical Approaches
LS-149 Lisa Mendez Authentic Leadership for SAS Programming Leaders
LS-173 Xiaohan Zou
& Wei Shao
& Yi Yan
From Programmer to Influencer: Strategic Leadership for Statistical Programmers in Clinical Development
LS-182 Kirk Lafler Soft Skills to Gain a Competitive Edge in the 21st Century Job Market
LS-264 Christine Reiff Drowning in Trackers? Let R Be Your Lifeboat!
LS-284 Ingrid Shu
& Xinhui Zhang
Early-Careers Essentials: Practical Checklists for Manual TLF Review
LS-319 Lida Gharibvand Women Leadership in AI-Driven Clinical Programming: Navigating Intersectionality and Innovation
LS-347 Yuka Tanaka-Chambers Closing the Expectation Gap: A Leadership Framework for Clinical Programming Success
LS-393 Kevin Lee AI-First Leadership in Biometrics: Redefining Strategy, Teams and Execution in the Agentic AI Era
LS-407 John LaBore
& Josh Horstman
& Robert Goodloe
When Code Isn’t Enough: Communication and Leadership Skills for Statisticians and Programmers
LS-408 Steven Tan Navigating Career Transitions: From Programmer to Executive Leader
LS-410 Steven Tan Delegation as a Quality Strategy: Building Accountable Biometrics Teams
LS-411 Steven Tan Leading with Presence from Afar: Building Trust and Engagement in Distributed Biometrics Teams
LS-424 Shivani Gupta From Statistical Programmer to Analytical Partner: Navigating the Future of Biostatistics
LS-426 Anbu Damodaran Strategic Change Management Framework: AI-Driven SOP Integration and Alignment Post-M&A
LS-441 Priscilla Gathoni Strategic AI Coaching for Life Sciences: A Framework for Industry Leaders and Managers
LS-442 Ravi Tejeshwar Reddy Gaddameedi Effective Collaboration Between Business, Technical and Quality Units in Building GCP Systems

 

Data Standards Implementation (CDISC, SEND, ADaM, SDTM)

Paper No. Author(s) Paper Title
DS-111 Sunil Gupta
& Tomás Sabat Stofsel
Ready for Next Level SDTMs and ADaMs Compliance with End-to-End Processing?
DS-131 Elizabeth Dennis
& Grace Fawcett
CTCAE v6.0: The Good, the Bad, and the Ugly
DS-134 Sandra Minjoe
& Mario Widel
Human Beings Still Needed: Manual ADaM Checks that AI Can’t Do
DS-144 Richann Watson
& Karl Miller
Picking Up the Pieces: Implementation of (the forgotten) ADaM Naming Fragments
DS-154 Murali Kanakenahalli
& Vamsi Kandimalla
Navigating the Statistical Programming Strategies for Cytokine Release Syndrome (CRS) and ICANS in Oncology Clinical Trials.
DS-168 Wenhao Dong Automating ADaM Dataset Generation with Dynamic Variable Length Adjustment and Cross-Domain Consistency Checks
DS-174 Derek Morgan
& Marckenley Mercie
ISO 8601 and SAS® – and R! A Practical Approach
DS-196 Pankaj Attri
& Matt Becker
Friction to Flow: LLM-Based Automation of Clinical Data Workflows
DS-217 David Bosak Minimally Invasive Analysis Results: The “ards” Package
DS-265 Kristin Kelly SDTMIG v4.0: Are You Ready For It?
DS-280 Hardik Sheth
& Eldho Alias
From Manual to Automated: A SAS® and R-Based Toolkit for Scalable SDTM Generation
DS-286 James Zhao
& Joshua Lin
Approaches Integrating SAS LSAF and Pinnacle 21 Enterprise for SDTM/ADaM Dataset Validation
DS-341 Inka Leprince
& Troy Hughes
DOSEON: Fuzzy Matching DOSE Date Intervals ON Analysis Dates Across SAS (PROC SQL, SAS Macro, DATA Step, and PROC FCMP)
DS-350 Alyssa Wittle
& Brian Harris
Analysis Concepts' Role within the CDISC 360i Vision
DS-354 Soumya Rajesh SUPP to NSV: Transforming Data Representation for Improved Reviewer Utility
DS-360 Pallavi Sadhab Improving Risk-Based QA in Outsourced Studies Using Cross-Domain ADaM Derivation Comparisons
DS-364 Youlan Shen
& Leah Suttner
Two Approaches to Phase-Specific TRTEMFL in ADAE: A Neoadjuvant-Adjuvant Case with Surgery Between Phases
DS-368 Song Liu Beyond WHODrug: Insight into Concomitant Medication Data Analysis
DS-372 Laura Williams
& Andrea Gardani
Handling multiple screenings and multiple enrollments in SDTM: CDISC and FDA Guidance
DS-401 Bhavin Busa Why Standards Matter More Than Code in the Age of GenAI
DS-423 Prasoon Sangwan Intelligent Implementation of Data Standards: A New Era of Efficiency and Consistency
DS-440 Lynn Xiuling Zhang
& Jacques Lanoue
& Ulf Nielsen
From Specification to SDTM at Speed: Deploying the SDTM Engine in Production

 

Data Visualization & Interactive Analytics

Paper No. Author(s) Paper Title
DV-105 Chunting Zheng
& Margaret Huang
& Xindai Hu
A Standardized R Graph Library for Production-Ready Analysis Figures
DV-120 Ilya Krivelevich Swimmer Plots – Some Practical Advice
DV-151 Louise Hadden The (ODS) Output of Your Desires: Creating Designer Reports and Data Sets
DV-155 Richann Watson
& Louise Hadden
A Map to Success with Data Visualization Using ODS Statistical Graphics
DV-161 Rohit Kadam
& Saurabh Das
& Niketan Panchal
From Reactive to Proactive: Transformation of clinical trial monitoring through Agentic AI for Smarter, Safer Clinical Trials
DV-165 Phil Bowsher Voice-driven Data Science: Real-Time Data Analysis with R
DV-181 Kirk Lafler Ten Rules for Better Charts, Figures and Visuals
DV-188 Frances Gillespie
& Laura Watson
Boston Breakthroughs: A Dashboard-Driven Approach to Metadata and Audit Trails with SAS Clinical Acceleration
DV-214 David Bosak Introduction to Plotting with the PROCS Package
DV-229 Girish Kankipati
& Bala Rajesh Jakka
Python for Survival Analysis: Kaplan-Meier and Reverse KM Plots Made Easy
DV-232 Yi Guo AI-Recommended Color Palettes with QC for R Shiny Figures
DV-294 Jesse Pratt
& Rayce Wiggins
Composite TLFs – A Combined Approach to Data Visualization
DV-298 Raghava Pamulapati Dynamic Patient Profile Plot Development with SAS Graph Template Language
DV-304 Michelle Harwood
& Austin Taylor
Path to Consistent Clinical Graphics: An R Shiny Gallery
DV-308 Chen Li
& Hong Wang
& Shu Chen
& Xuan Jiang
Designing a modular and interactive visualization tool for DMC
DV-314 Leon Davoody From Exploratory Data Analysis to Machine Learning – Continuing My Python Journey
DV-357 Mary Dolegowski
& Robert Collins
Three Ways to Over-Engineer Your SAS Custom Steps
DV-371 Nishanth Chinthala Interactive Safety Data Visualization Platform: Transforming Adverse Event Analysis Review Through Dynamic Dashboards in Clinical Trials
DV-382 Reneta Hermiz
& Jing Ji
& Amrit Pradhan
& Martin Sandel
From Static to Dynamic: Leveraging R Shiny for Tumor Response Data Review
DV-383 Jun Yang
& Yuying Jin
Closing the Loop: An Interactive R Shiny Dashboard for EDC Data Visualization and Real-Time Review Tracking
DV-409 LeRoy Bessler Be a Multi-Media Wizard: Make Your Output Dance and Sing
DV-419 Troy Hughes From Word Clouds to Phrase Clouds to Amaze Clouds: A Data-Driven Python Programming Solution To Building Configurable Taxonomies That Standardize, Categorize, and Visualize Phrase Frequency
DV-434 LeRoy Bessler The Best Data Dashboard Alternative: More Efficient But Equally Effective Performance Monitoring and Reporting
DV-443 Neharika Sharma From Static Outputs to Living the Data – A Visualization framework transforming Clinical Data into a Continuous Asset

 

Emerging Technologies (R, Python, GitHub etc.)

Paper No. Author(s) Paper Title
ET-140 Kevin Lee Unleash the R-volution: A Blueprint for Building Package Validation Capabilities in our own organization
ET-162 Phil Bowsher Accelerating Open-Source AI with AWS Bedrock: Architecting LLM Integration with Posit Workbench & Positron
ET-163 Phil Bowsher Current Review of Open Source in New Drug Applications: R & Python
ET-178 David Bosak Easy Code Generation in R: The “macro” package
ET-179 Patel Mukesh
& Nilesh Patel
Building ADaM Datasets from Scratch using R: A SAS programmer’s Perspective
ET-189 Danielle Stephenson
& Audrey Chin
& Madeleine Penniston
Breaking the Shell: Validated R Workflows To Meet FDA Standards
ET-192 Poornima Alavandi Transitioning from SAS to R: Implementing Reproducible R Workflows for TLF Validation
ET-195 Vicky Yuan Creating Reproducible Clinical Output with the SASSY Reporter Package
ET-197 Prabhakara Rao Burma
& Latha Donapati
Automated Delta Detection: A Scalable R-Shiny Framework for Comparing Clinical Datasets
ET-205 Kirk Lafler
& Ryan Lafler
& Joshua Cook
& Stephen Sloan
& Anna Wade
The Open Source Advantage: Powering Innovation in the 21st Century
ET-207 Ryan Lafler
& Miguel Bravo Martinez del Valle
Enhancing Your SAS Viya Workflows with Python: Integrating Python’s Open-Source Libraries with SAS using PROC PYTHON
ET-212 Sydney Hyde
& Tamara Martin
A Novel Approach to Inter-Collaboration using IDEs and GitHub
ET-215 Chen Ling
& David Bosak
Reproducing the SAS DATE and TIME formats with the {fmtr} package in R
ET-218 Zheyuan Yu
& Kirk Lafler
& Zichun Gao
& Jiaxin Xu
& Zeqi Li
& Ruochen Shao
Regression Analysis Made Easy Using R
ET-223 Ramesh Potluri The Evolution of Open-Source Technologies in the Pharmaceutical Industry: Python as a Cost-Effective Solution for Clinical Statistical Programming
ET-252 Mydhili Chelikani
& Ajay Kumar Tirkey
Reviewing and identifying issues in TFL macro parameter values with an R Shiny tool
ET-279 Madeleine Penniston
& Alyssa Wittle
Goodbye SAS, Hello R: Practical Workflows for CDISC Standards
ET-283 Manish Bhagchandani An Innovative solution for Interactive Dashboards Using Python Flask Framework for Clinical Data Analytics
ET-285 Gabriela Piasecki
& Laura Frederick
mkheader: An R Package for Automated Generation and Management of Program Headers in Clinical Trial Programming
ET-288 Ishwar Singh Chouhan Survival Analysis, K-M curve, Hazard Ratio and Data Visualization using R Programming: A comprehensive approach
ET-292 Indraneel Chakraborty Engineering secure and reproducible R-based clinical programming systems using open-source DevSecOps practices
ET-296 Huijun An
& Blazej Neradilek
& Chenchen Yu
& Shannon Grant
Building for the Long Haul: Managing Scope, Refactoring, and CI/CD in Internal R Packages
ET-300 Shelby Taylor Modern Data Science with SAS Viya Workbench: Unified Development with SAS, Python, and R
ET-301 Hardik Sheth
& Roshan Stanly
R-Based Translation of Japanese Characters in Clinical Datasets for Regulatory Reporting
ET-327 Ryan Lafler Building Better Data Science Workflows: Best Practices with Git, GitHub, and Data Version Control (DVC) for Effective Collaboration
ET-358 Lleyton Seymour Automating Git Workflows in SAS with Git Functions
ET-370 Peng Zhang
& Tai Xie
& Peilin Zhou
& Christine Matakovich
Challenges for Small- to Mid-Size Organizations in Building a GxP-Compliant R Environment (CRE)
ET-374 Jing Yu A Governed Git Workflow Using Azure Repos for GxP Compliant Statistical Programming
ET-378 Dickson Wanjau AI-Enhanced R Shiny App for Real-Time Clinical TLF Coding and Preview
ET-381 Jun Yang
& Robert Stemplinger
Bridging the Gap: A Python-Word Integration for Detecting Ghost Page Breaks in SAS-Generated RTFs
ET-390 David Ward A Practical Roadmap for Modernizing Legacy Clinical Applications
ET-397 Radhika Etikala
& Valeria Duran
Trusting Your R Packages: A Practical, Risk-Based Approach to External Package Validation
ET-399 Sundaresh Sankaran
& Mary Dolegowski
Chatting with Your Data, Wherever It Lives: Unlock Insights through Duck DB and Open File Formats
ET-412 David Franklin An Experience Using R, SASSY and Tidyverse For Clinical Trial Analysis, From A SAS Programmer Perspective
ET-416 Steve Nicholas It’s a Wonderful Lifecycle: Translating Statistical Programming into Modern Analytics Development

 

Hands-On Training

Paper No. Author(s) Paper Title
HT-369 Sangeeta Shabadi
& Nitesh Patil
& Jonathan Henshaw
Virtual Data, Real Standards – Leveraging Data Simulation for Smarter Clinical Trials

 

PK/PD/ADA and Quantitative Pharmacology

Paper No. Author(s) Paper Title
PK-216 Shweta Vadhavkar
& Jing Su
Early Unblinding to Pharmacometrics (PMx) Data: Challenges, Practices, and Benefits
PK-219 Sridevi Balaraman Dissecting PK/PD Data from Analysis to Submission: Breaking the Black Box for Efficient Programming
PK-234 Prasannanjaneyulu Narisetty Visualizing PK and ADA Data at Scale: A Parameter-Driven SAS Macro for Box Plot Generation
PK-250 Diyu Yang
& Sandeep Meesala
Navigating Early Career Challenges in PK/ADA Statistical Programming
PK-272 Qi Liu
& Shweta Vadhavkar
An End-to-End R-Based Pharmacokinetics (PK) Workflow for Regulatory Submission: The INAVO120 Study
PK-276 Jeffrey Rathbun
& Rebecca Humphrey
SAS and R for Expanding Data for Pharmacometrics (PMx) Analysis Data Sets
PK-282 Naveen Muppalla
& Shibani Harite
From Chaos to Clarity: A Programmer’s Perspective on Standardizing Population Pharmacokinetic (PopPK) Data for Regulatory Success
PK-310 Jianli Ping
& Karthik Sankepelli
Enhancing ADaM PK Datasets to Automate PK TFLs Generation
PK-334 Ashok Abburi
& Rakhe Jacob
& Shibani Harite
Two Paths, One PK Journey: The Art of Balancing ADPC and ADNCA
PK-361 Dheeraj Rupani
& Srinivas Bachina
& Kiran Kode
Early Restricted Unblinded PK Data Access (ERUPA): Framework to Accelerate PopPK and ERES Deliverables Through Data-Centric, Firewalled Workflows
PK-387 Prema Sukumar
& Renuka Hegde
& Erin Dombrowsky
& Neelima Thanneer
A systematic approach for imputing missing dose information in population pharmacokinetic analysis datasets
PK-437 Anbu Damodaran Bridging the Gap: The Strategic Evolution of Real-World Evidence in Clinical Pharmacology and Regulatory Decision-Making

 

Panel Discussion

Paper No. Author(s) Paper Title
PN-236 Amy Gillespie
& Daniel Schramek
& Francis Kendall
& Qin Li
& Mariann Micsinai-Balan
The Impact of AI and Automation on Statistical Programming: Opportunities, Risks, and the Path Forward
PN-245 Sandra Minjoe
& Nancy Brucken
& Alyssa Wittle
& Nate Freimark
& Richann Watson
& Paul Slagle
& Tatiana Sotingco
ADaM Pet Peeves Part 2: More Things Programmers Do That Make Us Crazy
PN-446 Eunice Ndungu
& Alice Cheng
& John LaBore
& Josh Horstman
& Troy Hughes
Panel Discussion: QC & Validation: Beyond the Basics (Participants: Authors of AP-159, AP-375, and AP-418)

 

Real-World Data (RWD) and Real-World Evidence (RWE)

Paper No. Author(s) Paper Title
RW-126 Lihai Song Streamlining Workflows in Real-World Evidence Studies with an R-Based Automation Tool
RW-186 Xi Jiang
& Scott McClain
Corticosteroids in severe COVID-19 across molecular endotypes and vaccination status: an Emulated Target Trial approach to benchmark to and extend upon findings from RECOVERY
RW-210 Li Liu dbLoadTable: A Robust and Efficient Solution for Bulk Data Transfer in Real-World Evidence Analytics
RW-248 Shuo Cao
& Venkat Rajagopal
Real World Data and CDISC – An Evolving Journey
RW-253 Laura Watson
& Sherrine Eid
Embracing Novel Approaches to Automated Causal Inference Framework
RW-335 Sanjeev Kumar
& Sanjay Koshatwar
& Gopal Joshi
Engineering a Scalable Centralized Statistical Monitoring Engine: The Architecture of the Risk Assessment & Mitigation Platform (RAMP)
RW-340 Sanjeev Kumar
& Gopal Joshi
& Sanjay Koshatwar
Architecting a Unified Healthcare Data Lakehouse: Leveraging Spark, OMOP, and FHIR for Multi-Source Integration
RW-343 Sherrine Eid From Automation to Evidence: Governing LLMs and AI Agents in Real-World Outcomes Research
RW-367 Ginger Barlow PSMATCH: Propensity Score Matching of Clinical Trial Data with External Control Arms
RW-379 Shankar Srinivasan
& Helen Guo
& Yvonne Buttkewitz
Contrast Effects in Curated Observational Data
RW-384 Robert Collins Assessing Quality of Real-World Data Sources
RW-429 Anbu Damodaran Biostatistical Foundations 201: Privacy Preserving Patient Linkage Across Real World Data Sources
RW-435 Anbu Damodaran Operationalizing Real World Data for External Control Arms: An End to End Framework for Rare Disease and Oncology Trials

 

Study Data Integration & Analysis

Paper No. Author(s) Paper Title
SI-109 Jingyuan Chen Insights and Experience Sharing with Patient-Reported Outcome Data Analysis in FDA’s Submission
SI-227 Ang Xu A Statistical Programmer’s Guide to Tipping Point Analysis in SAS
SI-293 Nikita Joseph
& Gayathri Mahadevan
From Data to Dossier: Lessons from Cross-Company Regulatory Submissions
SI-306 Ryan Hernandez-Cancela
& Jeff Cheng
& Sandeep Meesala
An Approach for Generating Tumor Biopsy Datasets for Drug Development
SI-348 Shankar Srinivasan
& Yvonne Buttkewitz
& Regina Uttenreuther
& Baldeep Chani Talwar
& Manjari Dissanayake
Algorithms to align the distribution of follow-up across independently collected cohorts when comparing time-to-event endpoints using conventional Kaplan-Meier and Cox regression methods
SI-353 Rohit Kadam
& Saurabh Das
& Niketan Panchal
Simplifying Clinical Site Oversight with GenAI Site Narratives: Transforming Raw Data into Actionable, Inspection-Ready Insights
SI-373 Isaac Vazquez
& Jose Hernandez Rivero
From Chaos to Consistency: Standardizing External Clinical Data with Excel Power Query
SI-414 Bhargav Koduru
& Kavitha Guddam
& Santosh Reddy Lekkala
Harmonizing History: A Framework for Deriving Line of Therapy in Complex Integrated Summaries.

 

Submission Standards for Global Health Authorities

Paper No. Author(s) Paper Title
SS-114 Sampath Madanu Statistical Programmer’s role in using Lorenz eValidator to validate contributions to eCTD
SS-176 Priyanka Sawant Continuous GxP: Implementing Change Control and Revalidation for a Living Open Source Library
SS-193 Tingting Tian
& Chao Su
& Erica Davis
Structure for Success: Delivering a Complex, Accelerated NDA with Evolving Scope
SS-209 Jeff Xia Lessons to Guardrails: Operationalizing Early Checks for FDA Submission Readiness
SS-243 Srivathsa Ravikiran
& Sri Raghunadh Kakani
& Yang Xu
Efficiency in Action: Automating Bookmarking for CRFs and Other Regulatory Submission Documents
SS-254 Liyuan Huang
& Amanda Plaisted
& Sreedhar Bodepudi
A Practical Approach to Multiple-Period CLINSITE Preparation
SS-255 Vikas Patil
& Ramesh Sundaram
Investigator-Initiated Trials: Navigating Statistical Programming Challenges with Practical Solutions
SS-261 Pietro Belligoli
& Constantin Weberpals
& Yarhy Flores Lopez
Closing the Loop: Validating AI-Generated SDTM Mappings using CDISC CORE and Synthetic Data
SS-268 Yunyi Jiang
& Christine Teng
Post-DBL Programming Update Tracker: Automating Revision Capture to Strengthen Audit Readiness, Oversight, and Compliance Across Programs and ADaM Specifications
SS-277 Vijaya Lakshmi Cherakam
& Latha Donapati
Optimizing Clinical TFL Review with Python and Power BI: A Reproducible Workflow to Reduce QC Time and Improve Traceability
SS-307 Seiko Yamazaki From Data Flood to Insight: Efficient SDTM Validation for High-Frequency Sources
SS-336 Devendra Toshniwal
& Gopal Joshi
& Sanjay Koshatwar
Beyond the First Draft: How Generative AI Is Redefining Medical Writing in Pharma

 

Tools, Tech & Innovation

Paper No. Author(s) Paper Title
TT-102 Mayank Singh Sync & Scale: Empowering Cloud Hub & Team Synergy with SAS Bridge
TT-106 Weishan Song
& Wanting Jin
& Weiyu Zhou
& Margaret Huang
Automating Bioresearch Monitoring (BIMO) Listings Using R
TT-118 Shih-Che (Danny) Hsu
& Wei Qian
Automated Quality Checks for SDTM and ADaM Datasets Using R Shiny
TT-121 Praneeth Adidela
& Sagar Koona
R Package Management in LSAF: Challenges and Solutions
TT-130 Jason Su My DIY Swiss Army Knife of SAS® Procedures: A Macro Approach of Forging with My Favorite PROCs
TT-132 Zhongan Chen A fully automated PDF solution using SAS without third-party PDF tools
TT-139 Jyoti (Jo) Agarwal From ChatGPT to Copilot: Evolving AI Support in SAS and Beyond
TT-147 James Sun Bridging the Gap: Table-Driven SAS Programming as a Pathway to AI in Clinical Trials Statistical Programming
TT-152 Carlo Radovsky aCRF In Full: A Complete Solution for Relatively Little Work
TT-156 Jyoti (Jo) Agarwal SAS to R: A Practical Bridge for Programmers
TT-175 Ratheesh Gunda Enhancing Quality and Efficiency in Clinical Programming with a Python-Based Automated File Comparison Tool
TT-204 Kevin Lee From SAS Servers to AI Agentic SCEs: Integrating Agentic AI into GxP-Compliant Biometrics Workflows
TT-241 Saikrishnareddy Yengannagari TOON Format: A Token-Efficient Data Exchange Solution for AI-Enhanced Clinical Programming
TT-263 Charu Shankar Boolean Rhapsody: 50 Shades of True – Is this the real code? Is this just fantasy?
TT-275 Shelby Taylor SAS for Microsoft 365: Integrating SAS Programs, Data, and Reports Across the Microsoft 365 Ecosystem
TT-309 Zhuo Chen Global Macro for Master Tracker
TT-312 Steve Black Trust but Verify: How ChatGPT and SAS Can Be Comrades!
TT-391 David Ward
& Troy Wolfe
Taming Polyglot Analytics: Simplifying Cross-Language Workflows in a Unified IDE
TT-394 Zun Wang
& Juntao Yan
Structure-Preserving Preprocessing of Clinical Documents for Large Language Model Analysis
TT-403 Bhavin Busa From SAP to CSR: A Metadata-Driven TFL Workflow
TT-417 Troy Hughes A Fantasy in Three with PROC FCMP: Memoization of Resource-Intensive Calculations, in-Memory Hash-Object Storage and Retrieval Operations, and Disk-Based Persistent Data Set Modification and Preservation
TT-439 Xinran Hu From Suspicion to Evidence: Automating Character Truncation Risk Audits with a Parameterized SAS Macro and Review-Ready Excel Output

 

ePosters

Paper No. Author(s) Paper Title
PO-007 Xianhua Zeng DefinePageChecker: A Python Tool for Verifying Page Number Hyperlinks in Define.xml
PO-009 Xianhua Zeng Word2PDF: A Python Tool for Converting and Merging Word or PDF Files into a Single PDF
PO-117 Shih-Che (Danny) Hsu From Learner to Innovator: A Journey in R Empowered by AI to Enhance Narrative Review in Clinical Studies
PO-122 Yang Gao Implementation of Quality Tolerance Limits in Statistical Programming
PO-143 Cindy Stroupe-Davis
& Trevor Mankus
& Tatiana Sotingco
& Alyssa Wittle
Consolidation of CDISC ADaM
PO-187 Jose Hernandez Rivero
& Ruth Rivera Barragan
An Alternative Option to Create XPT Files with a SAS Function
PO-194 Hong Wang
& Shu Chen
& Chen Li
Managing Unblinded Activities Internally: The Independent Statistical Analysis Team (iSAT) Model
PO-213 Michael Garside %Compare_counts: A Macro for Speeding Up the QC Process When Proc Compare Slows it Down
PO-221 Himanshu Patel
& Chintan Pandya
One Study, Many Regulators: Submission-Ready Data Package for Multi-Region Filings
PO-230 Carleigh Crabtree Automating Character Variable Length Updates Using SAS Macros
PO-244 Haoran Li
& Junruo Xia
& Yixuan Zhang
Dynamic table generation for ongoing studies with unstable or changing cohorts
PO-266 Ming Yang
& Ludmila Navolodskaya
A Modular SAS Macro Library for Clinical Trial Table Generation: From SAS Foundation to R-Ready Architecture
PO-270 Jumin Geng SAS Viya Dynamic Visualization of Data
PO-273 Max Hu Why SAS Viya Speeds Up Analytics
PO-287 Joshna Reddy Nimmala
& Yu Feng
Maintaining a Multi-Lingual Code Inventory in Real-World Evidence
PO-331 Julie Ann Hood Hero-in-the-Loop: A Super Squad Approach to SDTM Creation & Validation
PO-377 Srinivas Bachina
& Dheeraj Rupani
& Hasi Mondal
& Kiran Kode
Pharmacometric Analysis Dataset Generation Process: Data, Roles & Tools from a Programmer's Perspective
PO-386 Praseeda Rajan
& Renuka Hegde
& Erin Dombrowsky
& Neelima Thanneer
& Yue Zhao
Case Study: Integrating ADPPK CDISC Standards into Pharmacometric Programming and Analysis Workflows
PO-392 David Ward Running Python from SAS: A Practical Comparison of Available Approaches
PO-395 Hohyun Lee Implementing AI Agent-Driven Tools to Accelerate Clinical Research Workflows
PO-406 Christina Scienski
& Christine Rossin
Smarter, Faster, Better: GenAI-Driven Authoring for Data Reviewer's Guides
PO-415 Ashwini Yermal Shanbhogue Traceability in Real-World Trials – Just an aERD Away
PO-433 Ajay Gupta
& Misikir Tesfaye
Enhancing CDISC Standards Implementation (SDTM and ADaM) with PROC FCMP, PROC IML and Macro Loop Integration in Oncology Clinical Trials.

 


Abstracts

AI in Pharma, Biotech and Clinical Data Science

AI-101 : How to Train Your Dragon – Embedding AI in Clinical Workflow. Illustrated through Oncology Swimmer Plots
Sri Pavan Vemuri, Regeneron

This paper introduces a framework for reliable AI-powered code generation, focused on oncology clinical trial swimmer plots. The architecture uses structured prompt engineering, validation checkpoints, and guardrails to reduce hallucinations and ensure reproducible, compliant outputs. While demonstrated with Python and oncology visualizations, the approach extends to statistical programming and other domains requiring consistent AI results. This work represents the first step in a broader project that aims to cover all oncology trial visualizations.

AI-103 : Data Without Borders: CDISC Data Hub for Multi-Language Clinical Analytics & AI
Mayank Singh, Johnson and Johnson MedTech (Neurovascular)

In the evolving landscape of clinical research, fragmented data environments hinder rapid insights, cross-study analysis, and regulatory compliance. This paper introduces a scalable and flexible approach for centralized clinical data repositories, designed to be agnostic to the underlying relational database system; our implementation leverages Amazon Redshift (AWS). The framework employs structured SDTM schemas and a dynamic approach for ADaM. Automated, language-agnostic ETL (Extraction, Transformation, Loading) pipelines facilitate seamless data access across SAS, R, Python, and SQL, supporting advanced analytics, machine learning, and meta-analyses. By transforming traditional study-specific storage into an integrated ecosystem, this framework addresses data inconsistency and silos, promotes collaboration among multidisciplinary teams, and ensures compliance with industry standards. The proposed solution empowers clinical organizations to accelerate scientific discovery, foster innovation, and adapt to evolving data standards, paving the way for a future of truly borderless clinical data analytics.

AI-123 : Building a Model Context Protocol Server for AI-Driven Workflow Automation
Samiul Haque, SAS Institute

This paper introduces sastool-mcp, a lightweight Model Context Protocol (MCP) server that enables clinical programmers and statisticians to interact with SAS Viya through AI assistants such as Claude Desktop (Sonnet 4.5) and GitHub Copilot. Built using FastMCP and SASPy, the implementation exposes SAS capabilities as MCP tools, allowing AI systems to execute SAS programs, inspect libraries, debug errors, and iteratively refine code in real time. In this work, we demonstrate a fully customizable and open approach that clinical programmers and engineering teams can reuse to build their own MCP servers, integrating SAS with AI safely within enterprise environments. The design pattern is simple and extensible: define Python functions as MCP tools, route them through SASPy for SAS execution, and securely return LOG and ODS outputs. Teams can easily extend this template to add new AI-powered automation such as ADaM and SDTM validation tools, TFL generation routines, log quality checks, SAS macro helpers, CDISC compliance services, and MLOps workflows in Viya. By bridging modern AI orchestration with regulated SAS analytics, sastool-mcp provides a practical foundation for building AI-assisted clinical programming infrastructure that accelerates delivery, improves code quality, and enhances reproducibility, while still preserving control, auditability, and compliance expectations.
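
The design pattern described above can be sketched in a few lines of Python; the sketch below is illustrative rather than sastool-mcp's actual code, and the tool names, return shapes, and a working SASPy configuration are all assumptions:

# Minimal sketch of the pattern: expose SAS execution as MCP tools
# using FastMCP and SASPy. Assumes a configured saspy connection
# profile; tool names and return shapes are illustrative.
from fastmcp import FastMCP
import saspy

mcp = FastMCP("sas-tools")
sas = saspy.SASsession()  # connects via the default saspy config

@mcp.tool()
def run_sas(code: str) -> dict:
    """Submit a SAS program and return its LOG and listing output."""
    result = sas.submit(code)  # dict with "LOG" and "LST" keys
    return {"log": result["LOG"], "listing": result["LST"]}

@mcp.tool()
def list_tables(libref: str = "WORK") -> list:
    """List datasets in a SAS library so an AI agent can inspect it."""
    return sas.list_tables(libref)

if __name__ == "__main__":
    mcp.run()  # serve the tools to MCP clients such as Claude Desktop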

AI-125 : Enhancing ADaM Specification Validation and Generation of SAS Codes Using LLM through Amazon Bedrock: A Practical Framework
Pavan Kumar Tatikonda, Takeda
Ryo Nakaya, Associate Director
Sravan Kongara, Takeda

Accurate and well-documented ADaM specifications are crucial for reliable clinical data analysis, yet their manual creation and validation remain time-consuming, subjective, and error-prone. As clinical data complexity grows and regulatory standards change, there is an increasing demand for intelligent, scalable tools that help programmers validate derivation logic and maintain programming consistency. This paper introduces a practical GenAI-powered framework that automates the review and improvement of ADaM programming specifications using Claude through Amazon Bedrock. The framework reads ADaM specifications from an Excel source, queries the large language model to assess derivation logic, and provides structured validation feedback along with SAS code suggestions. Designed to emulate the output standards of traditional SAS workflows, the tool also generates a complete SAS program (.sas) for direct testing and logs results in a log file (.log) to enhance auditability. The approach balances flexibility and automation, handling real-world variation in specification formats while ensuring traceability and reproducibility. This paper outlines the technical design, prompt engineering strategies, and error-handling techniques developed to incorporate large language model (LLM) capabilities into the validation workflow. Lessons learned from practical application are shared, highlighting both opportunities and limitations of using Generative AI in clinical programming. By connecting statistical programming and GenAI, this work provides an early glimpse into how modern tools can improve the quality, consistency, and efficiency of clinical deliverables in a regulated setting.
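
As a rough illustration of the read-and-query loop such a framework might use, the following sketch reads spec rows from Excel and calls Claude through Bedrock's Converse API; the model ID, spreadsheet layout, column names, and prompt wording are assumptions, not the authors' implementation:

# Sketch: read ADaM spec rows from Excel and ask Claude (via Amazon
# Bedrock's Converse API) to validate each derivation and suggest SAS
# code. Model ID, spec columns, and prompt wording are assumptions.
import boto3
import pandas as pd

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
spec = pd.read_excel("adsl_spec.xlsx", sheet_name="Variables")

for _, row in spec.iterrows():
    prompt = (
        f"Variable: {row['Variable']}\n"
        f"Derivation: {row['Derivation']}\n"
        "Assess whether this ADaM derivation logic is complete and "
        "unambiguous, then propose SAS code implementing it."
    )
    resp = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 1024, "temperature": 0},
    )
    feedback = resp["output"]["message"]["content"][0]["text"]
    print(f"--- {row['Variable']} ---\n{feedback}\n")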

AI-135 : The Next Frontier of Statistical Programming: Vibe Coding with AI Coding Agents into SAS, R & Python
Kevin Lee, Clinvia
Hohyun Lee, Clinvia

The rise of Gen AI is revolutionizing how coding is performed across industry, and “vibe coding” stands at the forefront of this transformation. Coined from Andrej Karpathy’s idea of “embracing the vibes” of AI-assisted coding, vibe coding represents a seamless flow between human logic and AI coding agents, where programmers prompt, review, and collaborate with AI to produce code efficiently and intelligently. As an example, Microsoft CEO Satya Nadella estimated in April 2025 that 20% to 30% of Microsoft’s code was generated by AI. The presentation explores how vibe coding will reshape programming in Biometrics, where SAS, R, and Python remain essential. It illustrates how conversational AI tools (ChatGPT, Gemini, Claude), AI-native IDEs (Cursor, Windsurf, GitHub Copilot), and customized AI coding agents and agentic workflows enhance traditional workflows, automating coding, debugging, and validation processes while preserving scientific and regulatory rigor. The talk introduces customized vibe coding agents and an agentic workflow built for Biometrics, integrating CDISC, ADaM, and TLF standards with GxP-compliant validation frameworks. Real-world examples demonstrate how these systems accelerate programming cycles by 25–40%, improve documentation, and lower technical barriers across SAS, R, and Python. While the benefits are profound, such as productivity gains, democratization of coding, and cross-functional collaboration, the presentation also addresses risks such as AI hallucination, compliance, and over-reliance without human oversight. Finally, it offers a forward-looking view of the “AI-augmented biometrics team,” where statistical programmers evolve into AI managers and collaborators, driving innovation and quality in clinical research and development.

AI-141 : Accelerating CDISC SEND Conversion with AI: From Raw Preclinical Data to Regulatory-Ready Datasets
Nattawit Pewngam, Ravis Technology
Chotika Chatgasem, Ravis Technology
Titipat Achakulvisut, Department of Biomedical Engineering, Mahidol University

The Standard for Exchange of Nonclinical Data (SEND), developed by the Clinical Data Interchange Standards Consortium (CDISC), defines the standardized structure and format for submitting nonclinical study data to regulatory authorities. Converting extensive unstructured study materials, often consisting of reports, tables, and scanned documents, into SEND-compliant data sets remains a manual, error-prone, and time-consuming process that relies on repetitive data entry. This inefficiency reduces consistency, traceability, and overall regulatory readiness. Here, we introduce the CDISC-SEND Conversion platform, an automated framework that integrates large language models (LLMs) with retrieval-augmented generation (RAG) to streamline and standardize this data transformation. Our platform rapidly normalizes and maps unstructured study content into SEND structures defined in the SEND Implementation Guide (SENDIG v3.1.1). Controlled terminology and sponsor metadata are retrieved dynamically to produce traceable, auditable, and standards-compliant mappings that demonstrate conformance and regulatory alignment. An expert review stage enables human validation and ensures accuracy before final data set approval. Results show that the workflow reduces preparation time from several weeks to less than a day while improving data consistency and strengthening the key quality dimensions of completeness, structure, conformance, and format. Although originally developed for nonclinical SEND, the same architecture extends to the clinical Study Data Tabulation Model (SDTM), providing a scalable and regulatory-aligned framework for AI-driven data standardization.
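
The retrieval step that grounds such a platform can be illustrated roughly as follows; embed() stands in for an embedding client (an assumption), and the terminology snippet is illustrative rather than an actual SENDIG codelist:

# Sketch of the retrieval step: embed controlled terminology once, then
# map each raw verbatim term to its nearest CT entry by cosine
# similarity. embed() is a placeholder for any embedding model; random
# vectors keep the sketch runnable.
import numpy as np

def embed(texts: list) -> np.ndarray:
    # Placeholder: call your embedding model here, one vector per text.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 64))

ct_terms = ["BLOOD", "SERUM", "PLASMA", "URINE"]  # illustrative codelist
ct_vecs = embed(ct_terms)
ct_vecs /= np.linalg.norm(ct_vecs, axis=1, keepdims=True)

def map_to_ct(verbatim: str) -> str:
    v = embed([verbatim])[0]
    v /= np.linalg.norm(v)
    scores = ct_vecs @ v  # cosine similarity against every CT entry
    return ct_terms[int(np.argmax(scores))]

print(map_to_ct("whole blood sample"))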

AI-157 : Tips and Considerations for Preparing Health Data for Efficient and Accurate AI and LLM Modelling
Louise Hadden, Cormac Corporation

Large language models (LLMs) and AI systems present new opportunities for pharmaceutical research and clinical insight generation from data derived from electronic medical record systems (EMRs). However, health-related data, especially data streams with open-ended narratives and diverse coding systems, requires rigorous preparation for use in compliant, accurate, and efficient AI workflows. This paper outlines a practical framework for preparing transcoded EMR data for AI models and LLM use within health analytics pipelines. Topics include data cleaning, normalization, de-identification, prompt engineering, and iterative refinement cycles. Two use cases are explored: (1) detecting behavioral health issues from free-text ‘other specify’ fields, and (2) linking disparate medical coding systems (SNOMED, ICD-10, Common Formats, FHIR/HL7, etc.). Implementations using SAS, AWS AI tools, and open-source software are discussed.
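
One preparation step mentioned above, masking direct identifiers in free-text fields before they reach an LLM, can be sketched as follows; the patterns are deliberately minimal, and a validated production rule set would need to be far more comprehensive:

# Sketch: mask direct identifiers in free-text EMR fields. The patterns
# are illustrative only; production de-identification requires a
# validated, far broader rule set or a dedicated service.
import re

PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE":  re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "MRN":   re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE),
}

def deidentify(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Pt seen 3/14/2026, MRN: 0042317, callback 617-555-0199."
print(deidentify(note))
# -> "Pt seen [DATE], [MRN], callback [PHONE]."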

AI-164 : Agentic R in Clinical Trials: Empowering Statistical Programmers with Open Source LLM Packages & Positron Tools
Phil Bowsher, RStudio Inc.

This talk reviews a new generation of open-source R packages that enable statistical programmers and data scientists to move beyond basic code generation and build reliable, agentic LLM workflows. It introduces a suite of cutting-edge R tools (chores, predictive, btw, gander, mcptools, etc.) that are driving R and LLM interfacing, and demonstrates how these open-source packages integrate with R. Positron Assistant and Databot will also be reviewed, along with how open-source packages provide additional support such as Model Context Protocol (MCP)-driven governance. Together, these tools help clinical trial teams accelerate LLM innovation while supporting the R-centric workflows often used in GxP-regulated development.

AI-185 : From Molecular Subtypes to Bedside Decisions: An AI Approach for Personalized Critical Care Recommendations
Xi Jiang, SAS Institute
Scott McClain, SAS Institute

The COVID-19 pandemic highlighted the limitations of a one-size-fits-all approach for viral illnesses, which affect patients differently. Personalized treatment strategies are critical, but in ICU settings, decisions must often be made within hours, making real-time personalization challenging. While genetic data can guide therapy in some settings, it is rarely available in the clinical setting. Machine learning (ML) offers an opportunity by leveraging large clinical datasets and real-world expertise to provide individualized, potentially lifesaving treatment recommendations. We developed an AI-powered framework for personalized therapy in three steps. First, we identified two molecular subgroups of severe COVID-19 patients predicted to respond differently to therapies, patterns that likely extend to other critically ill patients with acute lung injury. Second, we created an analytic template to clean, process, and transform real-world electronic health record (EHR) data into model-ready features, integrating clinical expertise throughout. Third, we validated a predictive model capable of real-time patient stratification using EHR data alone, which was evaluated and interpreted through a combination of data science and clinical insights. By integrating AI with EHR data in a clinically informed manner, this approach translates complex molecular information into actionable insights quickly, offering a practical strategy to reproducibly improve patient outcomes and pandemic readiness for infectious diseases.

AI-201 : Eliminating QC Programming Duplication Through Claude AI-Assisted Independent Code Generation: A Practical Framework for Regulatory-Compliant Validation
Jaime Yan, Merck
Jason Zhang, 126 E Lincoln Ave

Quality Control (QC) programming in clinical trials requires independent recreation of production programs, resulting in substantial code duplication and inefficient use of resources. Traditional QC approaches require validation programmers to reimplement identical logic without referencing production code, increasing effort without improving independence. This paper presents a practical framework that uses an AI assistant deployed on company-managed AWS infrastructure to generate independent QC programs directly from specifications while maintaining regulatory compliance and reviewer independence. The framework applies a separated AI workflow to ensure true independence: one AI instance supports production programming, while a separate instance generates QC code using specifications only, with no access to production implementation details. Integration with automated, version-controlled processes enables QC code generation triggered by controlled updates, ensuring that each production program has corresponding independent validation code. The validation process includes AI-generated QC code, programmer review against specifications, and standard output comparison to identify discrepancies. The framework was implemented across multiple Phase 2 and Phase 3 clinical studies and demonstrated meaningful reductions in QC programming time while preserving full discrepancy detection capability. Key success factors include well-structured specifications, validation of AI-generated code by qualified programmers, and documentation of the QC process in study records. The AWS-hosted deployment ensures secure handling of study artifacts with complete audit trails and compliance with corporate data security policies. The paper also provides implementation templates, prompt libraries, validation checklists, and integration guidance to support adoption across clinical programming teams.
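
The final output-comparison step might look roughly like this sketch, which diffs independently produced production and QC outputs; the file names and the use of pandas in place of PROC COMPARE are illustrative assumptions:

# Sketch: compare production output against independently generated QC
# output. Assumes both exports share the same columns and row count;
# any cell-level discrepancy is surfaced for programmer review.
import pandas as pd

prod = pd.read_csv("adsl_production.csv").sort_values("USUBJID").reset_index(drop=True)
qc = pd.read_csv("adsl_qc.csv").sort_values("USUBJID").reset_index(drop=True)

diffs = prod.compare(qc)  # empty DataFrame means a clean match
if diffs.empty:
    print("PASS: QC output matches production.")
else:
    print(f"FAIL: {len(diffs)} row(s) differ")
    print(diffs)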

AI-206 : An Agentic AI Framework That Reads Statistical Analysis Plans and Generates TFL Table of Contents
Siqi Wang, Arcsine Analytics
Toshio Kimura, Arcsine Analytics
Songgu Xie, Regeneron Pharmaceuticals
Weiming Du, Alnylam Pharmaceuticals

What if AI can read and interpret a Statistical Analysis Plan (SAP) and learn about endpoints, analysis sets, and analysis methods? Can AI then generate the TFL table of contents (TOC)? We developed an AI-based solution that automatically generates a TFL TOC by reading and processing the SAP. The solution is implemented using Python and large language models (ChatGPT). After the SAP is uploaded, AI agents systematically process the document, extract structured metadata, and generate a complete TFL TOC. The solution consists of three main stages. First, the SAP is preprocessed to create an AI-ready document suitable for large-context analysis. Next, specialized AI agents collect structured metadata from the SAP, including endpoints, analysis populations, and analysis methods. Finally, the collected metadata is used by the AI system to generate the TFL TOC in a consistent manner. We will demonstrate the AI solution and its ability to generate a TFL TOC directly from an SAP. Key learnings from this work include the need for scalable strategies when processing large-context documents such as SAPs, the effectiveness of specialized AI agents for targeted metadata extraction, and the importance of combining AI with traditional programming approaches to achieve robust and reliable results. This presentation is intended for statisticians, statistical programmers, and data scientists involved in clinical trial analysis and reporting who are familiar with SAPs and TFL deliverables. If AI can generate the TFL TOC, this work invites discussion on what other downstream clinical trial processes may be automated using similar approaches.
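
The extraction and TOC-assembly stages can be sketched roughly as below; call_llm() is a placeholder returning a canned response (the authors use ChatGPT), and the metadata fields mirror those named in the abstract:

# Sketch: an extraction agent returns structured SAP metadata as JSON,
# and the TOC is then assembled deterministically from it.
import itertools
import json

def call_llm(prompt: str) -> str:
    # Placeholder for a real chat-model call; a canned response keeps
    # the sketch runnable end to end.
    return json.dumps({
        "endpoints": ["Change from Baseline in HbA1c", "Overall Survival"],
        "analysis_sets": ["Full Analysis Set", "Safety Analysis Set"],
        "methods": ["MMRM", "Kaplan-Meier"],
    })

EXTRACT_PROMPT = (
    "From the SAP text below, return JSON with keys 'endpoints', "
    "'analysis_sets', and 'methods', each a list of strings.\n\nSAP:\n{sap}"
)

def build_toc(sap_text: str) -> list:
    meta = json.loads(call_llm(EXTRACT_PROMPT.format(sap=sap_text)))
    numbering = itertools.count(1)
    return [
        f"Table 14.{next(numbering)}: {endpoint} - {popn}"
        for endpoint in meta["endpoints"]
        for popn in meta["analysis_sets"]
    ]

for entry in build_toc("(SAP text here)"):
    print(entry)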

AI-240 : DoxySAS: An End-to-End AI-Powered SAS Documentation Pipeline
Saikrishnareddy Yengannagari, BMS

What if your SAS macro library could document itself and answer programmer questions? DoxySAS makes this possible by combining AI-powered documentation generation with interactive chat capabilities, revolutionizing how pharmaceutical organizations manage macro documentation. DoxySAS is a web-based tool that analyzes SAS macro code using Azure OpenAI to automatically generate comprehensive Doxygen-compatible headers including descriptions, parameter documentation, usage examples, and called macro references. The tool intelligently distinguishes positional from keyword parameters, processes bulk uploads simultaneously, and produces consistent, professional documentation in seconds rather than hours. The complete pipeline transforms raw macros into searchable HTML documentation deployed via RStudio Connect, with AI chat injection enabling programmers to interactively query the macro library. This paper presents the architecture, implementation challenges, and real-world results from deploying DoxySAS across an enterprise SAS macro repository, demonstrating 95% reduction in documentation time while significantly improving accessibility and usability.
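
The header-generation step might be sketched as follows, assuming an Azure OpenAI deployment; the endpoint variables, deployment name, and prompt are placeholders rather than DoxySAS internals:

# Sketch: ask an Azure OpenAI deployment to write a Doxygen-compatible
# header for a SAS macro. Endpoint, key, and deployment name come from
# the environment; the system prompt paraphrases the header fields the
# paper describes.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

def document_macro(sas_source: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # your deployment name
        messages=[
            {"role": "system",
             "content": "Write a Doxygen-style SAS header comment with "
                        "@brief, @param for each macro parameter "
                        "(note positional vs keyword), and @example."},
            {"role": "user", "content": sas_source},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content

print(document_macro("%macro freqrep(ds, var=, out=); ... %mend;"))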

AI-259 : AI-Augmented SDTM Review: A Practical Framework Enhanced by a Structured Prompt Library
Vihar Patel, PPD, part of Thermo Fisher Scientific

SDTM review is foundational to clinical trial data quality and regulatory readiness. Historically, this has been a manual and time-intensive process involving extensive cross-checking of CRFs, mapping specifications, the Protocol, SAP, and SDTM datasets themselves. As the volume and complexity of clinical data grow, conventional review approaches struggle to scale effectively. Recent advancements in AI, particularly Large Language Models (LLMs), offer a new paradigm: AI-Augmented SDTM Review, where AI assists by performing structured data checks, semantic reasoning, document alignment, and anomaly detection while human programmers retain responsibility for adjudication and interpretation. This aligns with emerging regulatory expectations on responsible AI use, which emphasize human-centric design, clear context of use, transparency, and governance when AI supports data used in regulatory decision-making. A key innovation in this paper is a curated library of more than 10 specialized AI prompts designed to guide review workflows, including CRF reconciliation, protocol-window validation, safety logic checks, derivation readiness, and submission-readiness assessment. The proposed framework is positioned as a pre-rule-based screening and decision-support tool that enhances data quality early in the review process and complements deterministic validation tools such as Pinnacle 21. Human oversight remains essential to resolve ambiguity, manage edge cases, and ensure reproducibility. This paper presents a practical, implementable workflow supported by detailed fabricated examples, diagrams, expanded AI prompts, evaluation comparisons, and both SAS and R pseudocode illustrating how responsible AI integration can improve SDTM review efficiency, accuracy, and regulatory confidence.
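
A structured prompt library of the kind described can be kept as plain, versionable data; the two entries below are illustrative paraphrases, not the paper's actual prompts:

# Sketch: named, parameterized review prompts stored as data so they
# can be versioned, reviewed, and reused across studies.
REVIEW_PROMPTS = {
    "crf_reconciliation": (
        "Compare the annotated CRF fields below with the {domain} dataset "
        "variables and flag any collected field with no SDTM destination.\n"
        "CRF fields:\n{crf_fields}\nDataset variables:\n{variables}"
    ),
    "protocol_window_check": (
        "Given visit windows from the protocol:\n{windows}\n"
        "List subject visits in {domain} whose study day falls outside "
        "its window, with USUBJID, VISIT, and --DY."
    ),
}

def render(name: str, **params) -> str:
    return REVIEW_PROMPTS[name].format(**params)

print(render("protocol_window_check",
             windows="Week 4: day 22-36", domain="SV"))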

AI-278 : AI-Driven Intelligent Platforms for ADaM Specification and Code: Empowering Clinical Data Analysis
Tingting Zeng, BeOne Medicines
Jia He, BeOne Medicines
Yaohui Zhu, BeOne Medicines
Shuang Gao, BeOne Medicines
Jinling Li, BeOne Medicines
Nana Yang, BeiGene

Traditional ADaM implementations encounter critical challenges, including excessive complexity, time constraints, poor reusability, lack of version control, and complicated permission management. To address these challenges, we have developed two innovative AI-driven platforms, SpecMaster (ADaM Specification Management) and ACIRA (ADaM Code Intelligence for Rapid Analysis), which transform clinical data analysis workflows through:

Centralized Resource Repository: A repository with multi-dimensional spec and code pools enhances standardization and reusability, featuring optimized AI prompts for improved accuracy and metadata databases for traceability.

Multi-Agent Collaboration for Intelligent Spec and Code Generation: The Spec Creator parses documents and generates reliable specifications with human confirmation. The Code Generator analyzes project requirements and produces optimized code snippets by utilizing resource libraries strategically. The Reviewer identifies code defects with configurable inspection strategies and provides repair solutions to ensure quality and integration readiness. Additionally, advanced agents are being developed: the Debugger executes code, diagnoses issues, and optimizes rules while enabling human-AI collaboration for complex scenarios; the Reflector monitors modifications and optimizes prompts based on user feedback.

Metadata Flow Intelligence: Visualization of the entire metadata flow from CRF to SDTM, ADaM, and TFL facilitates identification of upstream changes and their impact, supporting AI-assisted decision-making and dynamic dataset generation based on analytical scenarios.

Visual Human-Machine Interactive Interface: An intelligent rule generation system provides real-time credibility monitoring and risk-logic confirmation, a collaborative coding workspace supports code generation and debugging, and metadata flow and project monitoring enhance team collaboration, ensuring flexibility, efficiency, and quality at critical checkpoints.

AI-305 : Improving AI SAS-to-R Code Migration via an Intermediate Design Document Layer
Junze Zhang, Merck & Co., Inc.
Amy Zhang, Merck & Co.

The pharmaceutical industry is increasingly migrating clinical programming workflows from SAS to R. SAS and R resemble one another and perform similar functions, but they also differ in some fundamental ways. “Direct” AI translation from SAS to R often reverts to block-by-block literal conversion. This can produce non-idiomatic R code, higher maintenance burden, and subtle logic defects, especially when SAS programs rely on sort-dependent BY-group semantics (e.g., first./last., retain, and merge) and SAS-specific missing-value behavior. We propose a design-first, two-step workflow that inserts an intermediate design document layer between SAS and R. First, a large language model (LLM) analyzes SAS programs and generates a structured design document capturing purpose, inputs, outputs, transformation steps, and key assumptions (e.g., ordering requirements for baseline/last-value derivations, join keys and expected cardinality, missing-value handling, deduplication rules, and validation checks). A high-level human review, rather than in-depth line-by-line checks, refines the design to ensure requirement alignment. Then, the LLM generates R code directly from the reviewed design document, enabling a vernacular-driven implementation that is more idiomatic, modular, and testable. We demonstrate this approach using an SDTM-to-ADaM derivation example and compare its results to those from a direct translation. Outcomes are evaluated by output alignment with SAS, number of iteration cycles, and maintainability indicators. This intermediate design layer decouples understanding from implementation, reduces literal-translation artifacts, and improves migration reliability for regulated clinical deliverables.
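
The two-step workflow can be sketched as a pair of LLM calls separated by a human review; call_llm() is a placeholder for an approved model client, and the prompt wording is an assumption:

# Skeleton of the design-first migration: one call derives a design
# document from SAS source, a human reviews it, and a second call
# generates R only from the reviewed design.
def call_llm(prompt: str) -> str:
    # Placeholder: route to your approved LLM endpoint here.
    raise NotImplementedError

DESIGN_PROMPT = """Summarize the SAS program below as a design document
with sections: Purpose, Inputs, Outputs, Transformation Steps (note any
sort/BY-group ordering the logic depends on), Missing-Value Handling,
Deduplication Rules, and Validation Checks. Do not write any R code.

SAS source:
{sas_code}"""

CODEGEN_PROMPT = """Write idiomatic, modular, testable R implementing
exactly the reviewed design document below. Do not mirror SAS structure
line by line.

Design document:
{design}"""

def migrate_sas_to_r(sas_code: str, review) -> str:
    design = call_llm(DESIGN_PROMPT.format(sas_code=sas_code))
    approved = review(design)  # high-level human sign-off, per the paper
    return call_llm(CODEGEN_PROMPT.format(design=approved))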

AI-311 : SpecToSAS: Translating ADaM Specs to SAS code using RAG
Vitaliya Lysenko, AstraZeneca
Gargee Vaidya, AstraZeneca
Lluis Gabarro, AstraZeneca

Creating ADaM datasets requires translating detailed clinical specifications into SAS code. These specifications typically define over a hundred variables with multi-step derivations and dataset-specific rules. Writing code from scratch, or even relying on reference code, takes time, and inconsistencies can easily emerge, making the code difficult to understand, maintain, and verify. This exposes the code to a higher chance of misinterpretation, extended team review, and reduced time for programmers to work on higher-priority tasks. SpecToSAS converts Excel-based ADaM specifications into SAS programs through an AI-assisted approach. It blends Python automation with a multi-agent AI framework to cover both simple mappings and complex derivations, while preserving transparency and human oversight. Using retrieval-augmented generation (RAG), SpecToSAS draws on internal company code repositories, so the output follows established standards and reuses reference programs and macros. In early use across 14 safety ADaM datasets in oncology trials, SpecToSAS correctly programmed 75-95% of variables. Code generation took 5-15 minutes, followed by up to 1 hour of set-up and human review to make corrections and reach 100% accuracy. Although human validation remains essential, the system substantially cuts initial coding effort, letting programmers focus on quality checks and tackling complex issues rather than repetitive tasks.

AI-316 : A Practical Roadmap for the 2026 Enterprise Generative AI Stack: AI Agent Architectures, Frameworks, and Secure Deployment
Ryan Lafler, Premier Analytics Consulting, LLC

Generative AI is rapidly reshaping how organizations search, reason over, and contextualize information, yet deploying these systems on private, confidential, and sensitive data introduces distinct architectural, performance, and governance challenges. This paper presents a practical roadmap for the 2026 enterprise generative AI stack, with emphasis on agentic AI architectures built on retrieval-augmented generation (RAG), vector embeddings, and secure model deployment. Core concepts including encoding, similarity search, and vector databases are introduced to explain how knowledge is stored, retrieved, and reused across AI agents, alongside API-driven patterns that enable tools for search, reasoning, and contextualization. The discussion contrasts proprietary large language models (LLMs) and open-source small language models (SLMs), highlighting trade-offs in output quality, performance, and deployment, while examining how ecosystems such as Hugging Face support localized inference and domain-specific adaptation. Implementation considerations are presented primarily in Python, with extensions to R and SAS® Viya®, focusing on secure, reproducible analytics workflows. This paper concludes with prompt engineering strategies and ethical considerations for responsible generative AI use on sensitive enterprise data in the life sciences, pharmaceutical, and healthcare domains.

AI-328 : Leveraging LLMs in Data Science Web Applications: Beyond the Chat Interface
Tyler Rowsell, AstraZeneca
Nandini Thampi, AstraZeneca
Dishank Jani, AstraZeneca Inc.
Hao Xu, AstraZeneca

Large Language Models (LLMs) have become ubiquitous in modern web applications, yet their implementation is often understood to be limited to conversational chat interfaces. This narrow consideration overlooks the transformative potential of LLMs as backend analytical services that can power data science workflows through strategic prompt engineering and response parsing. This paper explores architectural patterns for embedding LLM capabilities into analytical web applications, where the model operates as a backend service rather than solely a user-facing chatbot. We demonstrate how structured prompt engineering, combining explicit output schemas, delimiter-based serialization, and multi-stage API orchestration, enables LLMs to perform complex subroutines. As representative use cases, we present this approach through two complementary AI capabilities within an R Shiny application. First, we implement semantic annotation detection, where an LLM parses TLF shells to extract population definitions, analysis subsets, and formatting constraints, transforming unstructured specifications into structured metadata. Second, we leverage this metadata to power interactive code generation with iterative refinement, where the LLM generates executable R code, tests it against live data, and autonomously corrects errors through multi-iteration debugging cycles. These examples demonstrate that LLMs can serve as programmable reasoning engines when their outputs are constrained through careful prompt design. By treating LLM responses as structured data rather than conversational text, developers can create more powerful, context-aware applications that leverage natural language understanding while maintaining the reliability and predictability required for data science workflows.

AI-330 : Eligibility Meets Intelligence: Using AI to Bridge Protocol Complexity and Patient Access
Devendra Toshniwal, Circulants Inc.
Sanjay Koshatwar, Circulants
Gopal Joshi, Senior Scientist

Patient enrollment continues to pose significant operational and financial challenges in clinical trials, with recruitment delays affecting approximately 80% of studies and contributing to premature terminations. A primary driver is the manual, often inconsistent interpretation of complex protocol inclusion and exclusion criteria, particularly those involving nuanced clinical judgment and unstructured free-text elements, which traditional rule-based systems struggle to operationalize at scale. This paper presents a pragmatic, GxP-compliant framework for AI-assisted eligibility screening, harnessing Generative AI (GenAI) and Retrieval-Augmented Generation (RAG) techniques to augment clinical teams. By integrating protocol documents with clinical data sources (e.g., Electronic Data Capture systems, EHRs, or SDTM-structured datasets), the framework enables automated pre-screening, structured logic derivation from natural language criteria, consistent eligibility evaluation, and transparent documentation of screen-failure rationales. Positioned explicitly as a decision-support tool, GenAI incorporates human-in-the-loop oversight to preserve clinical accountability and regulatory integrity. Key implementation elements include risk-based validation, controlled prompt engineering, bidirectional traceability to source data, bias mitigation strategies, and robust governance aligned with emerging FDA guidance on AI in drug development. Drawing on recent real-world evidence, such as randomized trials demonstrating nearly doubled enrollment rates and 24-50% improvements in eligible patient identification, we illustrate quantifiable gains in recruitment efficiency, site-level consistency, and operational burden reduction.

AI-332 : A Human-in-the-Loop AI-Assisted Framework for ADaM Standardization
Chunqiu Xia, Merck & Co., Inc.
Feiyang Du, Merck & Co.

Standardized ADaM (Analysis Data Model) implementations play a critical role in ensuring consistency, traceability, and regulatory compliance in clinical trial analysis. When organizations acquire clinical studies from external companies, statistical programming teams often receive ADaM SAS code and specifications developed under different programming standards and conventions. These externally developed deliverables typically require substantial manual rework before teams can integrate them into internal workflows and reuse standard Tables, Listings, and Figures (TLF) SAS programs. This paper explores a human-controlled, template-driven framework that integrates large language model assistance, via an LLM API, into the ADaM standardization workflow for acquired studies. The framework treats internal ADaM specifications and SAS template programs as the authoritative reference and supports the transformation of both non-standard ADaM SAS code and specifications into internal standard formats. The approach focuses on structural alignment, derivation normalization, and specification reconciliation under programmer oversight rather than fully automated dataset generation, with collaboration between statistical programmers and statisticians to review and confirm derivation logic where data collection methods or internal standard interpretations differ. By producing standardized ADaM datasets aligned with internal templates, the proposed framework enables efficient reuse of standard TLF programs, improves consistency across acquired studies, reduces study-specific rework, and meets regulatory expectations for compliant and traceable ADaM deliverables.

AI-339 : Automating Table Generation in Real-World Data Programming: An AI-Assisted Approach
Sachin Heerah, Pfizer
Darren Jeng, Pfizer

Real-world data (RWD) programming in pharmaceutical research requires standardized, reproducible generation of statistical tables for regulatory submissions and outcomes research. The Pfizer RWD programming teams developed a structured framework for creating consistently formatted tables. While this framework standardizes workflows, programmers still manually construct complex function calls for table generation. This paper presents voxr_oroutput(), an AI-assisted function that leverages large language models (LLMs) and retrieval-augmented generation (RAG) to automate R code generation from natural language specifications. By embedding comprehensive documentation in a vector database, the system constrains LLM outputs to validated functions, limiting hallucination while maintaining organizational standards. The function generates complete R scripts with project metadata, database connections, table generation code, and preserved natural language queries for auditability. Human-in-the-loop validation requires programmers to explicitly approve generated code before execution, ensuring quality control. This implementation demonstrates how constrained AI assistance can enhance RWD programming efficiency while maintaining the transparency, rigor, and human oversight essential for pharmaceutical research.

AI-344 : Protocol Analysis, Optimisation and Generation: Artificial Intelligence Enables a Unified View
Sundaresh Sankaran, SAS Institute
Sherrine Eid, SAS Institute

A successful clinical trial maximises multiple objectives defined by relevant primary and secondary endpoints, enrollment and retention, statistically significant results, and rigorous evaluation. To design a comprehensive, executable, and data-driven protocol, inputs are sourced from clinicians, epidemiologists, statisticians, programmers, and compliance officers on a protocol review board to ensure that all requirements are met. These activities involve significant financial and quality costs, and each step along the way can introduce quantifiable error. Existing solutions tend to be siloed and only address one task at a time. We propose a unified approach to minimise errors and inefficiencies in clinical trials through an end-to-end solution for protocol generation which allows seamless handoffs. The solution uses Artificial Intelligence (AI) and Agentic workflows on SAS Viya and Retrieval Agent Manager (RAM) to automate and optimize repeated tasks for data ingestion, search and retrieval, benchmarking, simulation, and generation. Agents are capable of reflection and automatically retrieve information, trigger simulation, and generate consumable insights. Controlled by humans in the loop, results are standardised and reviewed and enable generation of a draft protocol with maximum first-pass yield. Output drafts are editable and can be collaboratively reviewed prior to finalisation. This session gives you a recipe for automated and efficient, data-driven protocol design and generation. It suggests a Generative AI framework as an enabler to increase your productivity and unlock time and bandwidth for high-value tasks.

AI-345 : Beyond Today’s Evidence: How AI-Enabled RWE Will Transform Development Speed, Study Quality, and Real-World Impact
Sherrine Eid, SAS Institute

Real-world evidence (RWE) has evolved from a supplementary data source to a foundational pillar of modern drug development, regulatory decision-making, and value demonstration. Global regulators, including the FDA, the EMA, and the PMDA, now formally recognize RWE as a critical component of integrated evidence packages supporting approvals, label expansions, safety evaluations, and post-market commitments. As adoption accelerates, the life sciences industry faces a defining challenge: how to deliver faster, richer insights without compromising scientific rigor, transparency, or trust. We examine how AI-enabled analytics are reshaping evidence generation across the product lifecycle. Grounded in regulatory guidance expanding use of RWD, the session positions RWE as a governed complement to RCTs that strengthens development decisions. The next era of RWE is driven by the convergence of multimodal data, advanced algorithms, and scalable compute. Electronic health records, claims, registries, imaging, genomics, and unstructured clinical text, combined with AI-enabled signal detection, causal inference, and machine learning, are enabling faster feasibility assessments, more confident study designs, and stronger external control methodologies. Regulators emphasize that governance is essential. AI-enabled RWE must be transparent, reproducible, and fit-for-purpose, with clear assumptions, bias mitigation, and continuous performance monitoring. This session demonstrates how modern analytics platforms operationalize governance-by-design, embedding data lineage, explainability, and human oversight directly into the RWE lifecycle. The session concludes with a forward-looking perspective on how governed AI will define the future of RWE, accelerating development while delivering evidence that regulators trust and patients deserve.

AI-349 : AI-Powered Risk Management: Enhancing RBQM with Generative AI and Real-Time Insights
Rohit Kadam
Saurabh Das, Tata Consultancy Services
Niketan Panchal, Tata Consultancy Services

Risk-Based Quality Management (RBQM) helps ensure patient safety and data quality in clinical trials by identifying and managing risks early. Traditional RBQM tools rely on dashboards, reports, and manual risk identification, and suffer from siloed systems and delayed detection of quality issues, which can make them slow and hard to interpret. Generative AI (GenAI) offers a way to address these challenges by analyzing both structured and unstructured data, identifying emerging risks, and automating routine quality tasks. This paper introduces an AI-powered chatbot that makes RBQM interactive and easy to use. The chatbot allows study teams to ask questions, view risk summaries, and take action, all in real time. This paper presents a practical framework for applying GenAI to RBQM. The approach covers key components: risk identification, risk assessment and scoring, risk control with KPIs, centralized monitoring and signal detection, and issue management. The workflow includes data ingestion from CTMS, EDC, eTMF, safety systems, and site performance metrics; pattern detection using large language models; generation of automated insights; and real-time dashboards for oversight. It also uses trend analysis and Central Statistical Monitoring (CSM) to detect outliers. Users can see heatmaps and charts, run quick analytics, and start CAPA workflows directly from chat. The paper demonstrates how this approach improves speed, collaboration, and audit readiness. By turning risk management into a conversation, this solution makes RBQM more accessible and effective for modern trials. The benefits for study teams include reduced manual review workload, faster detection of compliance risks, improved site oversight, and higher inspection readiness.

AI-352 : Practical Lessons in AI-Assisted Metadata Conversion: From Database Specs to EDC, SDTM, and P21 - Successes, Pitfalls, and Regulatory Considerations
Jianlin Li, Evermedix
Andy Shen, SJ Biopharm Solutions

As artificial intelligence tools rapidly enter the clinical data lifecycle, sponsors and programmers are exploring new ways to automate metadata transformation, such as converting proprietary database specifications into EDC specifications, generating CDASH-to-SDTM mapping workbooks, and deriving P21/Define-XML metadata. While early results are promising, real-world experience reveals both powerful advantages and non-obvious risks. This paper presents a practitioner’s perspective on using large language models (LLMs) to accelerate metadata workflows across multiple steps: DB Spec → EDC Spec, EDC Spec → SDTM Spec, and SDTM Spec → Pinnacle 21 Spec. The content is informed by extensive hands-on experimentation and multi-iteration refinement using AI-assisted transformations. We highlight where AI performs exceptionally well: normalization of vendor specs, structural conversions, first-draft mappings, derivation suggestions, and generation of SAS code scaffolding. We then discuss the major pitfalls, including hallucinated CDISC variables, inconsistent domain generation, misapplication of mapping rules, instability across iterations, and difficulties handling multi-select fields or sponsor-specific conventions. A dedicated section contrasts metadata-level automation (which is feasible and highly productive) with data-level automation (raw data → SDTM data), which currently suffers from fundamental limitations. We explain why black-box AI execution of SDTM transformations is generally unsuitable in regulated environments due to lack of determinism, traceability, and reproducibility, resulting in workflows that are not submission-safe. The paper concludes with a practical framework for incorporating AI responsibly: where it accelerates legitimate work, where humans must remain in control, and how teams can build a repeatable, reviewable, and regulator-friendly process.

AI-362 : Using Large Language Models to Validate TLF Outputs Against Statistical Review Comments: An End-to-End Python Framework
Kishore Reddy Rollakanti, Cytel Inc

Quality control of clinical Tables, Listings, and Figures (TLFs) requires verifying that statistical review comments are fully and correctly addressed in the final outputs, a process that is typically manual, time-consuming, and susceptible to inconsistency. Large studies often involve hundreds of review items, multiple programming revisions, and extensive output packages, making it challenging to ensure that each comment has been resolved and documented consistently. This paper presents an end-to-end Python framework that leverages Large Language Models (LLMs) to automate validation of statistical review comments against final TLF outputs. The solution combines structured review metadata (output identifier, review finding, and programmer resolution) from an Excel tracker with automated RTF-to-text extraction from individual output files packaged as a zip archive. For each review item with a reviewer comment, the framework prompts an LLM to assess whether the stated resolution is supported by the updated output text, and classifies the item as Addressed, Partially Addressed, Not Addressed, Error, or N/A. The implementation is delivered as a Streamlit application that produces an auditable summary report for team leaders and supports iterative review cycles. The approach demonstrates a practical and scalable method to reduce manual reconciliation effort, improve consistency, and strengthen traceability in clinical reporting workflows. Design considerations, reproducible setup instructions, and complete reference code are provided to enable readers to implement and extend the solution in their own environments.

AI-363 : An AI Framework for CSRs Using Microsoft Copilot: Traceable Side-by-Side Efficacy Comparisons
Dushyant Rao, Independent

Clinical study reports (CSRs) routinely require side-by-side comparisons of longitudinal efficacy across treatment arms and baseline subgroups. These comparisons are often assembled manually from CSR tables and PDFs, creating inefficiencies, transcription errors, and audit challenges. We describe a practical framework using Microsoft Copilot to generate audit-ready Excel comparison workbooks directly from finalized CSR output PDFs, without recalculation, statistical inference, or data transformation. Using predefined extraction rules, Copilot captures least squares means, 95% confidence intervals, treatment and percentage differences, and p-values across time points and subgroups. Each extracted value is traceably linked to its originating CSR table and exact source location. A case study demonstrates the creation of a multi-tab, side-by-side workbook aligned by treatment arm and baseline subgroup, significantly reducing manual effort while improving consistency and traceability. We also outline regulatory guardrails to support controlled use in submission-ready workflows and present a reusable prompt pattern that statistical programmers can apply across studies. This approach enables faster, more reliable efficacy comparisons while preserving auditability and reviewer confidence, supporting efficient CSR review and downstream regulatory activities.

AI-420 : A specification-driven approach to improve the reliability of AI-generated SDTM transformation programs
Rostislav Markov, Amazon

Clinical data transformation pipelines are increasingly augmented with artificial intelligence (AI)-assisted code generation. However, in regulated environments, the uncontrolled use of AI can introduce unacceptable risks, including incorrect transformations, inconsistent traceability and reproducibility, and increased validation burden. This paper presents a practical, specification-driven approach to governing AI-assisted function development using software development lifecycle (SDLC) artifacts including Behavior-Driven Development (BDD) specifications, test contracts, and validation rules. We evaluate how different levels of AI prompt governance affect the correctness, compliance, and human effort required to produce transformation functions. Using real production artifacts from an SDTM transformation framework, we compare loosely specified AI prompts against increasingly governed prompts grounded in formal syntax, structured test contracts, curated examples, and explicit validation criteria. All AI-generated code is evaluated programmatically using an existing testing framework. Results show that specification-driven code governance dramatically improves correctness and reduces rework, while additional governance layers further reduce failure modes and manual intervention. The approach requires no model fine-tuning and integrates seamlessly into existing SDLC workflows. This paper demonstrates that specification quality, not model sophistication, is the primary determinant of safe and scalable AI-assisted development in regulated clinical data pipelines.

AI-431 : AI-Powered Multiple-Agent Pipeline for Automating ADaM Dataset Generation
Lucas Liu, Yesod AI, Inc
Bo Ci, Yesod AI, Inc

Preparing Analysis Data Model (ADaM) datasets for each clinical trial is a resource-intensive and time-consuming task, often taking several months. To address this challenge, we are developing an AI-driven, multi-agent pipeline that automatically generates and executes statistical programming code to produce ready-to-use ADaM datasets. The system also includes built-in quality assurance and error handling, significantly reducing manual effort and turnaround time. This approach has the potential to substantially lower operational costs and accelerate project timelines.

AI-432 : Evaluation of the Azure OpenAI ChatGPT API as a Code Assistance Tool for Statistical Programming in SAS, R, and Python
Ajay Gupta, Daiichi Sankyo
Misikir Tesfaye, Daiichi Sankyo Inc.

Statistical programming in pharmaceutical research requires high levels of accuracy, responsibility, and traceability. Recent advances in Generative Artificial Intelligence (GenAI) have introduced large language models capable of assisting with code generation through application programming interfaces (APIs). This paper presents a structured evaluation of the ChatGPT model, accessed via Azure OpenAI, as a code assistance tool for statistical programming in SAS, R, and Python. The evaluation was conducted across three execution platforms, using six predefined prompts grouped by task complexity, ranging from dummy data creation to macro reuse and function generation. The same prompt groups were applied consistently across platforms. A total of 18 language-platform execution scenarios were evaluated. All code generated by ChatGPT was executed without manual modification. Successful execution was achieved in 100% of test cases, with correct output generation and creation of corresponding log files in user-specified directories. These findings are consistent with previously published peer-reviewed studies that report high rates of syntactically correct code generation by large language models under controlled prompt conditions. This result suggests that API-based GenAI can serve as a reliable code assistance tool for statistical programming workflows when used with appropriate human oversight and governance controls.

AI-438 : Operationalizing Generative AI in Regulated Analytics: Applied Implementation Patterns for Enterprise Deployment
Lida Gharibvand, Loma Linda University

Generative artificial intelligence is moving quickly from experimentation to deployment, including in regulated pharmaceutical and life sciences environments, where auditability, reproducibility, and controlled access to data are paramount. SAS® Viya® is widely accepted for validated analytics workflows, and the overarching practical difficulty lies in integrating GenAI capabilities, such as natural language interaction, automated narrative generation, and code assistance, without compromising enterprise governance. The present manuscript serves as a hands-on blueprint for operationalizing Generative AI in SAS Viya using agile, enterprise-grade implementation patterns. A modular architecture is defined in which (1) the CAS/SAS Visual Analytics-curated analytical assets remain the definitive sources for structured results, (2) retrieval-augmented generation (RAG) grounds LLM responses in approved documents and tables, and (3) a validation layer applies policy checks, deterministic post-processing, human-in-the-loop review, and logging to enable regulated deployment. We describe three high-impact use cases appealing to the PharmaSUG audience: conversational analytics over curated clinical/safety outputs, automated regulatory and executive reporting with attribution to the original source tables of cited materials, and SAS code scaffolding to expedite reproducible workflows. Operational experience across production-style deployments suggests directional benefits: generally less analyst effort to address ad-hoc requests, more rapid development of stakeholder-ready summary material, and greater consistency in narratives across teams, all while remaining mindful of security controls. Keywords: SAS Viya, Generative AI, Retrieval-Augmented Generation (RAG), Conversational Analytics, Governance, Regulated Analytics

Advanced Programming

AP-011 : Take CoMmanD of Your Log: Using CMD to Check Your SAS® Program Logs
Richann Watson, DataRich Consulting

Regardless of the industry, part of writing a SAS® program is ensuring that the log is free of any unwanted messages. When running the program in an interactive SAS session, we can review the log as we execute the program, and SAS is good about highlighting ERROR and WARNING messages using colors to draw the eye. Other types of unwanted log messages, such as INFO, uninitialized variable, or character-to-numeric conversion notes, may not be so easily spotted. When running programs in batch, each log needs to be opened and scanned for unwanted messages, which is tedious and prone to overlooking something. There have been several papers illustrating macros that check the logs by parsing them after the programs have been executed. While these macros are great when you are running a lot of programs for a deliverable and need to check all the logs, they are not necessarily ideal during development. It is during development that we need to ensure the program is running clean. Although we could possibly use the same macro that checks all the programs and filter it to run on one program, that would require us to run an extra program. What if there is an easier way? This paper demonstrates the use of the command line interpreter to execute the program in batch as well as check the log and provide a summary.
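
As a flavor of the log-scan half of the idea (a minimal sketch, not the paper's actual utility), the snippet below assumes a Windows environment with XCMD allowed and a hypothetical log path; findstr does the pattern matching that would otherwise be left to manual review:

    filename logchk pipe
      'findstr /i /n "ERROR WARNING: UNINITIALIZED" "C:\myproj\demog.log"';

    data _null_;
      infile logchk truncover;
      input line $char256.;
      putlog 'Log issue> ' line;   /* one line per suspicious log message */
    run;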

AP-108 : The Problems Surrounding Rounding
Jennifer McGrogan, Biogen
Mario Widel, Independent

Rounding is not a novel concept; examples can be found in ancient civilizations, such as approximations made by the Mesopotamians. Since then, the need for rounding has not diminished; it has rather increased. It was necessary long before the use of computers for numerical calculations, and introducing their use has made the rounding process more complex and indispensable. Based on the referenced literature and our own experience, we will show typical problems, including:
1. Rounding is unavoidable, for (a) table readability, (b) validation of results, (c) keeping results within reasonable precision, and (d) ensuring accurate assignment of CTCAE toxicity grades.
2. Computer representation of numeric results may (a) affect number precision, (b) be inadvertently rounded, (c) introduce calculation errors due to rounding, or (d) cause compare differences on numbers that appear identical.
Throughout this paper we will cover some impacts as well as mitigations and solutions with respect to clinical analysis.
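
For instance, the classic binary floating-point surprise below (a minimal illustration, not taken from the paper) shows why two numbers that print identically can still fail an equality check until they are deliberately rounded:

    data _null_;
      x = 0.1 + 0.2;
      y = 0.3;
      if x = y then put 'exactly equal';
      else put 'not equal: ' x= hex16. ' vs ' y= hex16.;   /* bit patterns differ */
      if round(x, 1e-9) = round(y, 1e-9) then
        put 'equal after ROUND to a sensible precision';
    run;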

AP-116 : ARMed AutoTable Macro Agents: An ARM-Driven Framework for Automated Analysis Table Generation
Chengxin Li, AutoCheng Clinical Data Services LLC

Analysis Results Metadata (ARM) serves as table specifications, commonly used for annotating table shells and generating ARM-Define.XML for regulatory submission. Another important ARM application is automated table generation through the AutoTable framework. AutoTable is implemented entirely in SAS macros, where parameters are directly aligned with the extended ARM fields to enable transparent, metadata-driven automation. Targeting non-inferential analyses, four core macros (%ADSLEVL, %BDSSTAT, %BDSSHIFT, and %OCCFREQ) have been developed. Each macro invocation produces a complete and traceable output package that includes a table (RTF, PDF, or XLSX), plain executable submission-ready SAS code, a final result dataset (RDS), an ARM dataset (ARMDS facilitating Define-XML generation), and a log file. The macro toolsets are designed to adapt to any table shell by capturing its core features and display patterns through option settings, eliminating the need for shell standardization or digitization. The design aims to achieve 95% analytical coverage and 85% layout coverage for non-inferential analyses in any study, while ongoing development extends the framework to inferential macros such as %MMRM, %ANCOVA, %CMH, and %TTE. The approach markedly enhances programming efficiency compared to legacy “copy-and-modify” workflows. The paper will present the design and implementation of AutoTable, with detailed examples of the shift table macro to demonstrate its flexibility and multi-layout capability. Looking ahead, AutoTable’s roadmap incorporates AI/ML-powered automation with AI Agent and R/Python interoperability, reflecting the continued evolution of modern statistical programming ecosystems.

AP-124 : SAS Packages – an Ask About Anything Game
Bart Jablonski, yabwon

Modern data-focused languages, like R or Python, have vast ecosystems for building packages. Those environments allow their users to share their ideas, inventions, and code in an easy, almost seamless way. Unfortunately, SAS, with its profound and historically well-established impact on data analysis, has not embraced such a marvelous idea yet. The article covers the following topics: what SAS packages are, how to use and develop them, how to make code-sharing a piece of cake, and, of course, what opportunities, possibilities, and benefits SAS packages bring to the community of SAS programmers. Additionally, the article will provide a list of frequently asked questions about SAS packages, along with answers to those questions.
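
For readers new to the framework, day-to-day use looks roughly like the sketch below, following the published SAS Packages Framework conventions; the directory path and the package name are placeholders:

    filename packages "/home/user/SASPackages";  /* folder holding packages */
    %include packages(SPFinit.sas);              /* enable framework macros */

    %installPackage(macroArray)   /* fetch a package into the folder        */
    %loadPackage(macroArray)      /* compile its content into the session   */
    %helpPackage(macroArray)      /* browse the package documentation       */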

AP-127 : Implementing Laboratory Toxicity Grading for CTCAE Version 6 and Beyond
Keith Shusterman, Disc Medicine
Mario Widel, Independent

CTCAE Version 6.0 has been released, further clarifying toxicity grade terms and definitions from Version 5.0. However, CTCAE is not the only standard for toxicity grading. Other criteria also exist, including the grading criteria from the Health and Human Services Division of AIDS (DAIDS) for adult and pediatric adverse events, as well as the FDA guidance on the toxicity grading scale for healthy adult and adolescent volunteers enrolled in preventive vaccine clinical trials. We will show a flexible method for deriving CTCAE grades that can handle cases where the grading derivation requires information external to the lab value itself, such as the FDA toxicity grading guidance, as well as any existing version of CTCAE, including versions up to 6.0.
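
A minimal sketch of the general shape of such a derivation, expressed in upper-limit-of-normal multiples the way grading criteria are typically phrased; the test code and thresholds below are placeholders for illustration, not actual CTCAE v6.0 limits:

    data graded;
      set work.lb;
      attrib atoxgrh length=8 label='High-direction toxicity grade';
      select (lbtestcd);
        when ('GLUC') do;                 /* illustrative thresholds only */
          if      lbstresn > 3 * anrhi then atoxgrh = 3;
          else if lbstresn > 2 * anrhi then atoxgrh = 2;
          else if lbstresn > 1 * anrhi then atoxgrh = 1;
          else if n(lbstresn, anrhi) = 2 then atoxgrh = 0;
        end;
        otherwise;                        /* add further tests per criteria */
      end;
    run;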

AP-128 : Unleashing Open-Source Potential in SAS: The PharmaForest Ecosystem (proc pharmaforest data=open_source out=😉)
Sharad Chhetri, Takeda
Ryo Nakaya, Associate Director

The recent surge in the use of R and open-source software in drug development has sparked a powerful movement focused on reducing redundancy, fostering collaboration, and driving innovation. Notably, this shift has produced an unexpected, but positive, side effect: it has inspired many long-time SAS users to adopt the same spirit of openness and community sharing. Increasingly, SAS professionals are recognizing the strength of collective knowledge and the value of contributing their own work as open-source code. Emerging from this momentum is PharmaForest, a unified repository of SAS packages built upon the SAS Packages Framework (SPF). This initiative is dedicated to accelerating open collaboration among SAS users and cultivating a vibrant, sustainable community, one where ideas, code, and best practices can flourish together, much like a living forest. This presentation will introduce PharmaForest, explore how it bridges the traditional SAS ecosystem with the open-source movement, and highlight its transformative role in the future of collaborative pharmaceutical programming.

AP-136 : 2026 Efficiency Techniques in SAS 9.4
Stephen Sloan, Dawson D R

Using space and time efficiently has always been important. We want to use available space without having to obtain new servers or other resources, and without deleting variables or observations to make the SAS data sets fit into the available space. We want our jobs to run more quickly to reduce waiting times and ensure that scheduled job streams finish on time and successor jobs are not unnecessarily delayed. Internal mainframe billing algorithms have always rewarded efficiency. As we move toward cloud computing, efficiency will become even more important because the billing algorithms in cloud environments charge for every byte and CPU second, putting an additional financial premium on efficiency. Sometimes we are in a hurry to get our jobs done, so we don't pay attention to efficiency; sometimes we don't know at the start how much time and space our jobs will use (and the important time is the time allocated to our assignment); and sometimes we're asked to go into existing jobs and make changes that are seemingly incremental but cause large increases in the space and/or time required. Finally, there can be jobs that have been running for a long time and "if it ain't broke, don't fix it" applies, because we don't want to cause the programs to stop working, especially if they're not well-documented. With a good knowledge of SAS® Base, we can help our organizations optimize the use of space and time without causing any loss of observations or variables or change in program results.

AP-146 : AI-Assisted Modular Design for R Markdown Report Generation: A Hybrid Architecture for Enhanced Maintainability and Cross-Study Scalability
Song Liu, Merck & Co., Inc

Automated report generation systems for scientific study reports in R often begin as sequential codebases that become increasingly difficult to maintain and adapt as complexity grows. This paper presents a comprehensive methodology for designing AI-friendly modular architectures using Chemistry, Manufacturing, and Controls (CMC) Process Characterization (PC) study report as a case study. Our approach employs a hybrid architecture combining child R Markdown documents with functional programming patterns to create modular components, unified through a structured project environment serving as a controlled namespace for sharing data between modules. The main R Markdown file retains YAML headers and setup chunks and orchestrates execution by hosting the minimal set of parent chunks that invoke the specialized child modules with detailed documentation specifying inputs, outputs, purposes, and dependencies. Comprehensive documentation serves dual purposes: guiding human developers and providing instructions for AI systems. This modular architecture enables efficient AI-assisted development by allowing AI systems to focus on single modules rather than entire codebases. Documentation acts as executable specifications, enabling AI to generate appropriate code quickly; AI typically updates or creates one small, well-scoped module at a time. The methodology addresses critical challenges: (1) functional isolation prevents cascading failures and supports independent development, (2) controlled variable sharing via a custom project environment promotes reuse, (3) AI-assisted error handling surfaces contextual, user-facing messages at the point of failure and proposes remedies, and (4) centralized content management improves maintainability. Together, these practices enhance statistical programmer efficiency, reliability, transparency in failure modes, and scalability across diverse study types.

AP-148 : The Tipsy Hangover: Avoiding Indent Headaches in SAS Reports
Lisa Mendez, Army MEDCOM
Richann Watson, DataRich Consulting

Creating visually appealing SAS reports shouldn’t feel like recovering from a formatting hangover, but when it comes to hanging indents in the REPORT procedure, things can get a little… tipsy. In this paper, we serve up a curated cocktail of techniques using a real-world example to help you straighten out those stubborn hanging indents. We’ll explore multiple ways to achieve indentation in PROC REPORT using ODS options and clever preprocessing methods. You’ll learn why TWIPS (twentieths of a point) matter more than you think, and how understanding them can help you fine-tune your layout with precision. We’ll also explain why ODS ESCAPECHAR and the SPLIT option should never share a drink, and how to pre-process variables with line breaks and non-breaking spaces for smoother text flow. As always, we provide a curated list of references to help with other PROC REPORT formatting tips. Whether you’re formatting footnotes, crafting multi-line cells, or just trying to avoid the dreaded indent misalignment, this paper offers tips, tricks, and troubleshooting advice to help you sober up your reports and keep your formatting headache-free.
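
As a taste of the non-breaking-space technique mentioned above (a minimal sketch; the dataset and variable names are hypothetical), prepending an ODS escape sequence keeps the indent from being stripped in RTF/PDF output:

    ods escapechar = '^';

    data indented;
      set work.terms;
      /* indent child rows with non-breaking spaces that survive ODS */
      if level > 1 then term = '^{nbspace 4}' || strip(term);
    run;

    proc report data=indented;
      column term count;
      define term  / display 'Preferred Term';
      define count / analysis 'n';
    run;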

AP-158 : Applications of PROC COMPARE to parallel programming and other projects
Jayanth Iyengar, Data Systems Consultants LLC

PROC COMPARE is a valuable BASE SAS® procedure which is used heavily in the Pharma industry and other areas. By default, the capability of PROC COMPARE is to reconcile two data sets to determine if they have equivalent sets of records and sets of variables. In the clinical field and elsewhere, PROC COMPARE is often used to validate data sets in projects which involve parallel programming, where programmers independently perform the same tasks. In this paper, I will discuss the role PROC COMPARE plays in different SAS tasks, including DATA STEP merges, parallel programming, generation data sets, and more.

AP-159 : Speeding Up Your Validation Process is As Easy As 1, 2 and 3
Alice Cheng, Independent

In their 2016 paper, Alice Cheng et al. introduced tips and techniques to speed up the validation process by means of the COMPARE procedure in SAS®. Also introduced was the use of &SYSINFO, an automatic macro variable generated by PROC COMPARE. This single value held by &SYSINFO lets us know precisely the result of the comparison. This paper begins with an introduction to &SYSINFO and the meaning of its values. It then demonstrates how one can take advantage of its features to speed up the validation of numerous deliverables such as datasets, tables, listings, and figures, and how one can use an Excel spreadsheet to make the results easier to read. Armed with the techniques introduced in this paper, speeding up the validation process is as easy as 1, 2 and 3. KEYWORDS: PROC COMPARE, validation, &SYSINFO, Speed Up
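
The core idea looks roughly like this minimal sketch (librefs and dataset names are placeholders): capture &SYSINFO immediately after PROC COMPARE, because the next step resets it.

    %macro qc_compare(base=, comp=);
      proc compare base=&base compare=&comp noprint;
      run;
      %local rc;
      %let rc = &sysinfo;   /* grab it before anything else runs */
      %if &rc = 0 %then %put NOTE: &base vs &comp match exactly.;
      %else %put NOTE: &base vs &comp differ, SYSINFO=&rc (decode the bits).;
    %mend qc_compare;

    %qc_compare(base=prod.adsl, comp=qcout.adsl)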

AP-208 : Programming Challenges in Developing PRO Analysis Datasets Under FDA’s New Submission Guidance
Weiwei Guo, Merck

Patient-Reported Outcomes (PRO) data are measurements of a patient’s health status, symptoms, functional abilities, health-related quality of life, and treatment satisfaction that are captured directly from the patient, without clinician interpretation. Because PRO collection depends on active patient participation, missing data are common, posing substantial challenges for statistical programming and downstream analysis. In November 2023, the U.S. Food and Drug Administration (FDA) issued “Submitting Patient-Reported Outcome (PRO) Data in Cancer Clinical Trials,” which outlines expectations for CDISC SDTM and ADaM datasets and recommends specific tables, listings, and figures (TLFs) for oncology submissions. In September 2025, our study team received an FDA information request to provide PRO TLFs in accordance with this guidance. This paper describes programming challenges and practical solutions for constructing analysis-ready PRO datasets, focusing on handling missing item-level responses and derived summary scores, and on deriving key visit-level flags to support the TLFs requested in the FDA’s new guidance.

AP-211 : Schema-Preserving Generation of Clinical TLF Templates and Executable R Code via Iterative LLM-Guided Debugging
Jaime Yan, Merck
Ming Yang, Kura Oncology

Manual authoring of Tables, Listings, and Figures (TLFs) for Clinical Study Reports (CSRs) is slow, error-prone, and demands 800- 3,200 expert hours per Phase III study. We evaluate large language model (LLM) approaches to automate TLF template creation and translation into executable R code. In 1,999 bootstrap experiments across five methods and three LLM providers, a hybrid retrieval-augmented generation (RAG) with reranking significantly outperformed direct prompting (mean quality 85.7 vs. 81.7, p < 0.05), consistently across therapeutic areas. For code generation, zero-shot success was low, but iterative LLM-guided debugging achieved 70% success within 3- 5 rounds, with higher-fidelity templates requiring fewer iterations. These findings support a multi-agent framework coupling schema-grounded template generation with automated debugging to improve compliance with ICH E3 and CDISC ADaM and potentially shorten CSR timelines by 3- 6 months. All analysis code is available for reproducibility.

AP-220 : Do One Task, Get Another Done for Free: Use DEFX Tags in Comments to Fill Out Your define.xml
Brendan Bartley, Harvard T.H. Chan School of Public Health
Megan Hinger, Rho Inc

Have you ever wanted to take care of two tasks while only doing one? If so, this is the commenting system for you. We all know we need to comment our programs, but sometimes are lax about this. I suspect this may happen less often if the comment would save time and effort filling out the define.xml at the end of a project. At our workplace, code review is also a big part of our validation process, so our comments must ensure that the intent of the code matches what is in the code. The DEFX tag system will save time and effort finding programs and writing comments for the define.xml while encouraging the good coding practice of commenting one’s code.
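
To make the idea concrete, here is one hypothetical shape such a tag and its harvester could take; the actual DEFX syntax is defined in the paper, so treat both the tag format and the regular expressions below as illustrative only:

    /* inside adsl.sas:
       DEFX[ADSL.TRT01P]: Assigned from DM.ARM per SAP Section 9.2 */

    data defx_comments;
      length target $100 comment $400;
      infile "/study/prod/adsl.sas" truncover;
      input line $char500.;
      if prxmatch('/DEFX\[/', line) then do;
        target  = prxchange('s/.*DEFX\[([^\]]+)\].*/$1/', 1, line);
        comment = prxchange('s/.*DEFX\[[^\]]+\]:\s*//', 1, line);
        comment = prxchange('s/\s*\*\/\s*$//', 1, comment);  /* drop comment close */
        output;
      end;
      keep target comment;
    run;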

AP-222 : Finding Macros Called Within A Directory of SAS® Programs
Derek Morgan, Bristol Myers Squibb
Taiana Kazakova, Parexel

As an organization heavily dependent on standard SAS macros, BMS needed a way to efficiently document the use of those macros in its programs. In the past, this was a time-consuming, manual task requiring significant effort, and was prone to errors. Either the list was compiled post-programming, or it was the programmer’s job to document macro use in a header for each program. It isn’t difficult for one or two programs, but if you have two hundred, it becomes onerous. Also, the manual process isn’t good at capturing macros called from within other macros. This method is used to find all the macros and nested macros for a given study and produces an RTF table that can be copy-pasted into an ADRG. The methodology described in this paper only uses base SAS, and only needs read access to files. It does not modify any of the code it inspects. How long it takes is dependent on how much code is to be processed. In benchmark testing, it processed 66,422 lines across 245 files, 177 of which were report-generation programs, in approximately two minutes.
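
A minimal sketch of the detection core for a single file (the path is a placeholder; the paper's utility adds directory traversal, nested-macro resolution, and the RTF report):

    data macro_calls;
      length macname $32;
      infile "/study/prod/adsl.sas" truncover;
      input line $char500.;
      if _n_ = 1 then rx = prxparse('/%(\w+)/');
      retain rx;
      start = 1; stop = length(line);
      call prxnext(rx, start, stop, line, pos, len);
      do while (pos > 0);
        macname = substr(line, pos + 1, len - 1);
        /* filter out macro-language keywords, keep candidate macro calls */
        if upcase(macname) not in ('MACRO','MEND','LET','PUT','IF','THEN',
                                   'ELSE','DO','END','TO','EVAL') then output;
        call prxnext(rx, start, stop, line, pos, len);
      end;
      keep macname;
    run;

    proc sort data=macro_calls nodupkey; by macname; run;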

AP-228 : Timing, Masking, and Resolution: Understanding and Debugging SAS Macros
Carleigh Crabtree, SAS

Special characters and unexpected text resolution are a common source of frustration when working with the SAS macro language. While functions such as %STR and %NRSTR are often introduced as simple solutions, misunderstanding when and how macro text is resolved can still lead to subtle and difficult-to-debug issues. This paper explores the mechanics of macro timing, masking, and resolution to explain why certain macro variables behave unexpectedly, particularly when special characters are involved. Beginning with %NRSTR, we examine how masking affects text stored in the macro symbol table and how masked characters can become unmasked at execution time. The paper then demonstrates how macro functions can inadvertently trigger premature resolution, reintroducing masked characters and causing incorrect results. To address these issues, macro Q-functions are introduced as a mechanism for preserving masking during macro function execution. Through practical examples and log analysis, this paper provides a deeper understanding of how the SAS macro processor handles text. By understanding why macros behave the way they do, users can more effectively prevent resolution errors, select the appropriate masking technique, and write more robust and predictable macro code.
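
The contrast at the heart of the paper can be seen in three lines; this is a minimal sketch, with the expected log output noted in the comments:

    %let x = %nrstr(&sysdate);    /* ampersand masked at compile time */

    %put raw ....: &x;            /* mask intact: prints &sysdate literally */
    %put upcase .: %upcase(&x);   /* non-Q function unquotes: &SYSDATE resolves */
    %put qupcase : %qupcase(&x);  /* Q-function keeps the mask: prints &SYSDATE */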

AP-233 : Detecting Abnormal Page Breaks Using Grayscale Pixel-Density Analysis in R Shiny
Yi Guo, Pfizer Inc.

Unexpected page breaks in tables, listings, and figures (TLFs) are difficult to prevent because page capacity is hard to estimate in advance. For example, overlong or newly added footnotes and constrained column widths may lead to truncation, resulting in more rows or larger figures than can fit on a single page. Since each TLF has a unique layout, programmers must still visually review outputs case by case even after dataset validation. While page break issues are relatively easy to detect in shorter outputs, manual review becomes increasingly time-consuming and less reliable as outputs span many pages, making such issues more likely to be overlooked. Inspired by color science, this paper presents a standalone R Shiny tool that applies grayscale pixel-density analysis to support detection of abnormal page breaks, assessing page layout through content distribution rather than visual appearance. The tool converts uploaded RTF or PDF files into page-level images, transforms them into grayscale representations, and evaluates pixel-density distributions to identify unusually large blank regions indicative of unintended page breaks. The approach does not rely on predefined layouts and is applicable across different TLF types. This work demonstrates how image-based analytical techniques can be repurposed as a practical and reusable QC strategy for managing output-level quality risks. Within existing QC workflows, it provides near real-time, page-level visual feedback, improving manual review efficiency and reducing the chance of missing abnormal pages. At the same time, this type of interactive feedback is naturally well supported by R and Shiny-based workflows.

AP-237 : SAS® Programming techniques for efficiency and code optimization
Jayanth Iyengar, Data Systems Consultants LLC

There are multiple ways to measure efficiency in SAS® programming: programmers’ time, processing or execution time, memory, input/output (I/O), and storage space considerations. As data sets grow larger in size, efficiency techniques play a larger and larger role in the programmer's toolkit. This has been compounded further by the need to access and process data stored in the cloud and, since the pandemic, by programmers finding themselves working remotely in distributed teams. As a criterion for evaluating code, efficiency has become as important as producing a clean log or expected output. This paper explores best practices in efficiency from a processing standpoint, as well as others.

AP-246 : Appreciating PROC SUMMARY/MEANS in Many NWAYS vs Summary Function in R
Brian Varney, Experis

The SUMMARY/MEANS procedure in SAS is a very powerful and flexible procedure for summarizing continuous analysis variables and returning simple summary statistics. This paper examines in detail what happens when the NWAY option is left off PROC SUMMARY and the problems that can arise when you have many class variables. We will examine a SAS macro that mimics PROC SUMMARY without the NWAY option. We will also compare the same approach in R using the Tidyverse summary function.
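
A minimal sketch of the behavior in question, using SASHELP.CLASS so it runs anywhere; the _TYPE_ values show what leaks through when NWAY is omitted:

    /* NWAY keeps only the fully crossed SEX*AGE cells */
    proc summary data=sashelp.class nway;
      class sex age;
      var height;
      output out=stats_nway mean= / autoname;
    run;

    /* without NWAY, _TYPE_=0,1,2,3 rows coexist (overall, AGE,
       SEX, SEX*AGE); forgetting to filter on _TYPE_ double counts */
    proc summary data=sashelp.class;
      class sex age;
      var height;
      output out=stats_all mean= / autoname;
    run;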

AP-251 : Modernizing Clinical Research Analytics with Cloud-Optimized SAS Procedures
Jim Box, SAS Institute
Mary Dolegowski, SAS

As clinical research organizations continue migrating analytical workloads to the cloud, they gain access to scalable, multi-threaded computing environments capable of accelerating processing across datasets of any size. This session explores a new generation of cloud-based SAS procedures designed to take advantage of these modern architectures. We will examine how these procedures differ from their traditional SAS 9 counterparts, highlight performance and usability improvements, and discuss practical considerations for adopting them in real-world clinical analytics workflows.

AP-256 : Advanced SAS Graph Template Language (GTL) with Practical Examples from Oncology Trials
Ceng Qian, Gilead Sciences, Inc.
Xu Wen, Gilead Sciences
Wei Lei, Gilead Sciences
Ling Han, Gilead

With the increasing complexity of clinical trials, the business need for high-dimensional and dynamic data visualization becomes more crucial to support decision-making and drug development. Compared to the traditional SAS SG procedures, the SAS Graph Template Language (GTL) provides more user-defined features and flexibility. In this paper, we illustrate several practical examples of customized figures using advanced SAS GTL code, including multi-layer waterfall plots, data-driven heatmaps, longitudinal swimmer plots, and cycle-based adverse event (AE) incidence plots from oncology trials. The purpose of this paper is to motivate SAS programmers to utilize the powerful GTL and create high-quality graphs.
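
As a flavor of the GTL style involved (a minimal waterfall-plot sketch, not the paper's code; variable names such as PCHG and BOR are hypothetical):

    proc template;
      define statgraph waterfall;
        begingraph;
          entrytitle 'Best Percent Change from Baseline';
          layout overlay / yaxisopts=(label='Change from baseline (%)')
                           xaxisopts=(display=(label) label='Subjects');
            barchartparm category=subjid response=pchg / group=bor;
            referenceline y=-30 / lineattrs=(pattern=dash);  /* response cut  */
            referenceline y=20  / lineattrs=(pattern=dash);  /* progression   */
          endlayout;
        endgraph;
      end;
    run;

    proc sort data=work.resp; by descending pchg; run;
    proc sgrender data=work.resp template=waterfall; run;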

AP-257 : Patient-Level, Dose-Stratified Swimmer Plots for Comparative Adverse Event Time-Course Assessment in Clinical Trials Using SAS® PROC TEMPLATE
Qingwei Hu, VeraMed Inc
Zhaoyu Xie, VeraMed Inc
Ji Qi, BioPier LLC a Veramed Company

Background: Patient-level adverse event (AE) data are critical in clinical safety assessment for new drug applications. Yet standard summary tables and listings provide limited visibility into AE time course, including onset, duration, recurrence, and resolution, and can hinder intuitive comparisons between active treatment and control. Methods: We describe a reproducible swimmer-plot methodology implemented with SAS® ODS Graphics using PROC TEMPLATE to visualize longitudinal AE trajectories at the individual participant level while incorporating treatment arm and dose exposure. Each participant is represented by a horizontal timeline spanning treatment and follow-up, with grouping by randomized arm and dose-stratified ordering or faceting within active-treatment groups. AE episodes are rendered as time-stamped intervals, enabling representation of multiple occurrences per participant. Event-level attributes (e.g., severity grade, seriousness, action taken, outcome/resolution status) can be encoded via symbols, line properties, and annotations to support clinical interpretability. Results: The resulting displays facilitate rapid review of AE frequency and heterogeneity across participants and enable visual assessment of arm- and dose-related patterns in timing (early vs. late onset), persistence (short-lived vs. prolonged), recurrence, and final outcome (resolved vs. ongoing). This patient-level perspective complements conventional outputs and supports efficient safety signal evaluation and clear communication in clinical and regulatory review. Conclusion: Patient-level, dose-aware swimmer plots generated with PROC TEMPLATE provide a robust, publication-quality visualization approach that enhances comparative AE time-course assessment for new drug development and approval submissions. Keywords: patient-level data; adverse events; safety; dose; treatment comparison; swimmer plot; PROC TEMPLATE; SAS ODS Graphics

AP-258 : From Tables to Tolerances: The Evolving Role of Statistical Programmers in Risk-Based Quality Management (RBQM)
Vihar Patel, PPD, part of Thermo Fisher Scientific

Risk-Based Quality Management (RBQM) has become a regulatory and operational expectation in modern clinical trials with the evolution of ICH E6(R2) and E6(R3), together with the Quality by Design (QbD) framework introduced in ICH E8(R1). While RBQM is often viewed as the responsibility of Clinical Operations, Data Management, and Quality Assurance, statistical programmers play an increasingly vital role in translating risk concepts into measurable and actionable insights. They also act as integrators across Biostatistics, Data Management, and external vendors by assessing the feasibility of key metrics such as Key Risk Indicators (KRIs) and Quality Tolerance Limits (QTLs), as well as the suitability and availability of underlying data sources. This paper presents a role-specific competency model describing the skill sets required for statistical programmers to effectively support RBQM activities. Using a fully synthetic Phase II clinical trial dataset, the paper illustrates the programming frameworks used to compute KRIs, QTLs, and statistical risk signals used in Centralized Statistical Monitoring (CSM). The paper also describes collaboration points with cross-functional teams during risk assessment and review. These examples show how programmers can elevate RBQM from a conceptual framework to a practical, data-driven system that strengthens study oversight, supports regulatory expectations, and improves overall trial quality.

AP-267 : The Log Whisperer: Still Reading SAS Logs? Start Ranking Them
Charu Shankar, SAS Institute

SAS logs are vital for QC, but review is often slow and inconsistent: manual scanning, uneven triage, and too much time lost to noise. This paper introduces a SAS-only “Log Whisperer” that turns log review into a repeatable workflow using a rules control table, DATA step parsing, PRX pattern matching, and PROC SQL rollups. A lightweight rules dataset defines patterns, severity, and weight. The engine scans a folder of .log files, captures line-level hits, and rolls results up to a file score with stoplight status (RED/AMBER/YELLOW/GREEN). A rule-frequency view highlights the most common root causes across the project. Rather than reading every log line, reviewers get a ranked queue that focuses attention where it matters most, reducing QC thrash and improving consistency without a heavy framework. Takeaway: a reusable SAS utility that helps focus reviewer attention where it counts, reduce QC thrash, and bring consistency to log review, without a heavy framework. The solution is shared as four modular “Lego block” programs that can be run in sequence and reused across studies or projects.
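
A minimal sketch of the rules-table idea (patterns, weights, and paths below are placeholders; the full utility adds the severity rollups and stoplight logic):

    data rules;
      length pattern $40 severity $8;
      input pattern severity weight;
      datalines;
    ERROR RED 100
    WARNING AMBER 10
    uninitialized YELLOW 5
    ;
    run;

    data loglines;
      length logline $400;
      infile "/study/logs/demog.log" truncover;
      input logline $char400.;
      line_no = _n_;
    run;

    proc sql;
      /* line-level hits: every rule tested against every log line */
      create table hits as
      select r.severity, r.weight, l.line_no, l.logline
      from loglines as l, rules as r
      where prxmatch('/' || strip(r.pattern) || '/i', l.logline) > 0;

      /* roll hits up to a single file score */
      create table filescore as
      select sum(weight) as score
      from hits;
    quit;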

AP-271 : Stop Guessing. Start Matching. High-Impact SAS PRX Patterns in 20 minutes
Charu Shankar, SAS Institute

Messy text is everywhere in clinical programming: IDs embedded in free text, inconsistent visit labels, unpredictable separators, and key-value strings that require precise parsing. SAS PRX brings Perl-compatible regular expressions into the DATA step, enabling accurate validation and extraction without brittle cascades of INDEX/SUBSTR logic. This 20-minute, code-forward session is delivered as three fast demo segments: 1. Anchors and boundaries to validate common values (such as USUBJID and visit labels) and eliminate false positives. 2. Capturing groups to extract structured components, such as USUBJID and visit numbers, from unstructured strings. 3. Lookarounds to target exactly the value you want (for example, extracting AESER from key-value text) without consuming labels or delimiters. We finish by layering a small set of standardization patterns (normalizing separators, collapsing whitespace, and applying safe character filtering) to make downstream derivations and reporting more reliable. Attendees will leave with a concise PRX recipe card and a compile-once/apply-many template they can drop into SDTM/ADaM preparation, mapping support, reporting pipelines, and QC workflows to reduce review churn and improve robustness.
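A compile-once/apply-many sketch of the three techniques, assuming an input dataset raw_text with variables usubjid and txt, a hypothetical USUBJID layout (ABC-123-0001), and an invented AESER= key-value pattern; none of these are the session's exact recipes:

data parsed;
   set raw_text;
   retain rx_id rx_kv;
   if _n_ = 1 then do;
      /* anchors plus capture groups: validate and split a USUBJID */
      rx_id = prxparse('/^([A-Z]{3})-(\d{3})-(\d{4})$/');
      /* lookbehind: grab the value after "AESER=" without consuming it */
      rx_kv = prxparse('/(?<=AESER=)[YN]/');
   end;
   if prxmatch(rx_id, strip(usubjid)) then do;
      study = prxposn(rx_id, 1, strip(usubjid));
      site  = prxposn(rx_id, 2, strip(usubjid));
      subj  = prxposn(rx_id, 3, strip(usubjid));
   end;
   if prxmatch(rx_kv, txt) then aeser = prxposn(rx_kv, 0, txt);
run;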

AP-274 : Experience of an R Programmer Incorporating R in a SAS Studio Flow
Shelby Taylor, SAS Institute

Integration of open-source tools into enterprise analytics workflows enhances flexibility and reproducibility for statistical programmers, especially in regulated industries such as pharmaceuticals. This presentation shares the experience of an R programmer adopting the R Runner custom step within SAS Studio Flows on SAS Viya, illustrating how native R code can be executed seamlessly in a low-code pipeline environment. R Runner, accessible from the SAS Studio Custom Steps GitHub repository, enables users to embed R scripts or inline R code directly into analytic flows, expanding SAS’s traditional capabilities with the extensive ecosystem of R packages. Practical examples demonstrate fitting regression models using base R, leveraging tidyverse for data transformation, and producing visualizations with ggplot2, all executed within a SAS flow and interoperable with core SAS datasets. The talk also highlights setup considerations, such as configuring R and the rpy2 bridge in the Viya environment, and best practices for using script files versus inline code. By enabling hybrid workflows where SAS datasets feed into R and results (including new tables and graphics) are written back into the SAS environment, R Runner empowers analysts to combine strengths of both platforms without disrupting existing processes. Attendees will gain insights into extending SAS workflows with R, improving analytical productivity, and fostering collaboration across language boundaries.

AP-295 : User-defined Functions for Programming Population PK and PKPD Datasets
Zhongqing He, Regeneron
Man Li, Regeneron

The Population PK (ADPPK) and Pharmacodynamic (ADPPKPD) datasets are the analysis-ready, CDISC-compliant datasets for population PK and exposure-response modeling analysis. These datasets are longitudinal, consist of various data elements drawn from across the clinical database domains, and involve many derivations of covariates (baseline and/or time-varying) in addition to dosing, PK, and PD. One covariate often requires different formulations, or different covariates are needed, so that modelers can explore different analysis approaches through the model development and selection process. We developed a set of user-defined functions to cope with the challenges of timeline constraints and frequent change requests in derivations. We will share several key functions that have proven to be easy to use, accurate, and efficient in pharmacometrics data programming and output generation.
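As a flavor of the approach (not the authors' actual library), a minimal PROC FCMP sketch defining one hypothetical derivation, relative time from first dose:

proc fcmp outlib=work.funcs.ppk;
   /* hours between a record's datetime and the first-dose datetime */
   function reltime_hr(dtm, firstdose_dtm);
      if nmiss(dtm, firstdose_dtm) then return(.);
      return((dtm - firstdose_dtm) / 3600);
   endsub;
run;

options cmplib=work.funcs;

data adppk;
   set pkraw;                          /* hypothetical input dataset */
   atfd = reltime_hr(adtm, trtsdtm);   /* actual time from first dose */
run;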

AP-313 : Having Your Cake and Eating It Too: Automated Log Analysis Without Losing the SAS EG Log Window
Steve Black, Neurocrine Biosciences

For many SAS programmers, the SAS Enterprise Guide (EG) log presents a familiar challenge: while the log is fully visible in the EG interface, it is difficult to programmatically scan for subtle but important conditions such as uninitialized variables, problematic MERGE BY statements, and implicit type conversions. Redirecting the log to an external file enables automated review but removes it from the EG log window, forcing an uncomfortable trade-off. This paper demonstrates that with a little fancy footwork, we can have our cake and eat it too! Leveraging SAS Enterprise Guide’s ability to execute custom code before and after program submission, the SAS log can be temporarily redirected to a physical file for automated analysis and then seamlessly restored and replayed into the EG log window. This approach preserves the interactive EG log experience while enabling robust, programmatic log checks. The solution also generates structured HTML reports, including consolidated summaries when multiple programs are run.
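A stripped-down sketch of the core redirect-and-restore moves (the EG-specific replay into the log window and the HTML reporting are omitted; the path is illustrative):

/* before the program: send the log to a physical file */
proc printto log="C:\temp\capture.log" new;
run;

/* ... the program of interest runs here, log captured to the file ... */

/* after the program: restore the default log destination */
proc printto log=log;
run;

/* scan the captured file for conditions the log window hides */
data logcheck;
   infile "C:\temp\capture.log" truncover;
   input line $char256.;
   if prxmatch('/uninitialized|repeats of BY values|Invalid data/i', line)
      then output;
run;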

AP-317 : Analysis Unlocked: Demystifying Kaplan-Meier (KM) Through a 9-Day Fasting Challenge for Programmers and Beginners
Nikhil Jadhav, Syneos Health
Kanchan Gund, Syneos Health

Time-to-event analysis is widely used in clinical research to evaluate how long it takes for an event to occur. In today’s era, the industry is heavily focused on automation tools, and many professionals without a statistical background, or those focused only on programming, miss the underlying concepts and struggle to interpret KM survival plots, which may appear to be simple staircases. Yet each step represents meaningful information about patient risk and censoring, concepts often hidden inside a black-box process. This paper bridges that gap using a real-world example, a 9-day communal fasting challenge, to demystify survival analysis. Completing the fast represents the event, while early withdrawal is treated as censoring. We manually derive the at-risk set and conditional survival probabilities for each day, then build the KM estimator step-by-step to show how survival evolves over time. These manually computed KM values and the resulting curve are compared with outputs from automated tools to demonstrate alignment between human logic and software calculations. We also clarify key features of the KM plot, highlighting that drops occur only when someone breaks the fast, while tick marks indicate censored participants. This approach transforms programmers and non-statisticians from “code executors” into informed “data interpreters”. By breaking down the KM method through a relatable example, attendees gain confidence in validating results and effectively communicating survival findings to clinical teams. Particularly valuable for clinical SAS programmers, data analysts, and early-career professionals, this paper empowers readers to understand the logic behind survival curves and apply time-to-event concepts with clarity and confidence.
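A minimal sketch of the hand calculation alongside PROC LIFETEST, assuming a pre-tabulated day_summary dataset (day, n_risk, n_event) and a subject-level fast dataset with day and status; both are hypothetical:

data km;
   set day_summary;
   retain surv 1;
   /* KM recursion: S(day) = S(previous day) * (1 - events / at-risk) */
   surv = surv * (1 - n_event / n_risk);
run;

proc lifetest data=fast;
   time day*status(0);   /* status = 0 marks censored participants */
run;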

AP-342 : Has your SAS been ‘MEAN’ to your data yet?
Ruth Rivera Barragan, Ephicacy Consulting Group
Isaac Vazquez, Ephicacy Consulting Group

Daily work in the pharmaceutical industry requires extensive data handling and report generation, including the creation of standardized datasets (SDTM) and analysis datasets (ADaM) to support protocol and SAP requirements. During data preparation, statistical programmers routinely apply formats, perform mathematical calculations, and structure data to ensure meaningful and accurate presentation. At the reporting stage, SAS procedures such as PROC MEANS and PROC UNIVARIATE are commonly used to summarize and analyze data. However, under certain data conditions, these procedures may produce unexpected or inconsistent results. Such behavior can be difficult to detect and may have downstream implications for reporting accuracy and interpretation. This paper presents a practical example illustrating how seemingly minor data attributes, specifically the presence and handling of a sorting character variable, can influence the output of standard SAS summary procedures. The example highlights how small differences in data ordering can lead to variations in results, emphasizing the importance of careful data preparation and validation prior to analysis and reporting.
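The abstract does not spell out the exact scenario, but one plausible illustration of how a character sorting variable can reorder grouped output: lexical sorting places visit "10" before visit "2":

data labs;
   input visitnum $ aval;
   datalines;
1 5.1
2 5.4
10 6.2
;
run;

proc sort data=labs;
   by visitnum;          /* character sort order: 1, 10, 2 */
run;

proc means data=labs mean;
   by visitnum;          /* BY groups appear in lexical, not visit, order */
   var aval;
run;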

AP-375 : One Word Can Make All the Difference(s): Strengthening Validation Practices with PROC COMPARE
John LaBore, SAS Institute
Josh Horstman, PharmaStat LLC

PROC COMPARE is a cornerstone of validation in pharmaceutical processes, especially for confirming independently programmed outputs, verifying transformations, and supporting submission-ready analyses. Programmers often rely on the message: “Note: No unequal values were found. All values compared are exactly equal.” Yet users who are unaware of the nuances of PROC COMPARE may make false assumptions based on this statement. In several common clinical-data scenarios, PROC COMPARE’s default output can mask discrepancies, particularly when comparing derived variables, handling missing or special values, or validating large analysis datasets. This paper highlights four specific situations where these implicit assumptions may lead to misleading conclusions in double-programming and regulatory validation contexts. Through practical, reproducible examples, including code, log excerpts, and comparison output, we illustrate how (possibly) unnoticed differences can slip through standard PROC COMPARE checks. We also demonstrate how a single keyword option, introduced in Version 9, can be added to the PROC COMPARE statement as a best practice to ensure that these cases are better identified and handled within the pharmaceutical context.
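The specific keyword the paper recommends is not named in the abstract; as a generic sketch of the false-comfort scenario, here the compare dataset silently lacks a variable, the default note still reports all compared values equal, and LISTALL plus the &SYSINFO return code (two commonly cited safeguards) expose the difference:

data base;
   do usubjid = 1 to 3;
      aval = usubjid * 10;
      extra = 1;           /* present only in BASE */
      output;
   end;
run;

data qc;
   set base(drop=extra);
run;

proc compare base=base compare=qc listall;
run;

%put NOTE: SYSINFO=&sysinfo (nonzero means the data sets differ in some way);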

AP-376 : Advanced Parameterization Enables Advanced Statistical Programming
Miao Fu, Arcsine Analytics
Toshio Kimura, Arcsine Analytics

As statistical programming evolves alongside advances in artificial intelligence, automation, and large-scale analytics, analytical workflows have become increasingly complex. Modern statistical programming in the pharmaceutical industry therefore demands tools that are not only robust and reproducible, but also flexible, expressive, and scalable. Advanced analytics and their correspondingly flexible outputs require equally advanced and flexible parameterization. Whether parameterizing a SAS macro or an R package/program, parameterization serves as the user interface, and, therefore, parameterization is a crucial part of the user experience. As such, SAS macro/R package/program development benefits from a design perspective that treats parameterization not as isolated inputs, but as structured and extensible components capable of representing richer analytical intent to meet the increasing demands of complex AI-driven and automated workflows. To ground this broader discussion, the paper will present a sub-parameter-based input interface using SAS as an operational example. This design illustrates how grouping related inputs and supporting nested sub-parameters can provide a flexible and expressive structure, offering one possible representation of how fundamental programming units may adapt to increasing analytical complexity. Additionally, the concept of a parameter dataset (PDS) will be introduced to store parameterization for both documentation and future reuse (with modifications). Overall, this work illustrates a broader design philosophy, showing how evolving even small programming components can enable more expressive, configurable, and automation-ready analytical frameworks. The session is accessible to statistical programmers of all experience levels, with the greatest benefit for those with prior SAS macro programming experience and an interest in advanced programming design.
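To make the sub-parameter idea concrete, a minimal sketch with an invented STATS= grammar of name(option=value) groups; this illustrates the design pattern, not the authors' actual interface or PDS:

%macro summarize(data=, var=, stats=);
   %local i stat name opts;
   %let i = 1;
   %let stat = %scan(&stats, &i, %str( ));
   %do %while (%length(&stat));
      %let name = %scan(&stat, 1, %str(%());        /* e.g., mean  */
      %let opts = %scan(&stat, 2, %str(%(%)));      /* e.g., dec=2 */
      %put NOTE: statistic=&name options=&opts;
      /* ... dispatch to the appropriate PROC/format using &name and &opts ... */
      %let i = %eval(&i + 1);
      %let stat = %scan(&stats, &i, %str( ));
   %end;
%mend summarize;

%summarize(data=adlb, var=aval, stats=mean(dec=2) median(dec=1));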

AP-418 : Code Hard and Put away Wet: Replacing Hardcoded SAS® Software Quality Checks with Data-Driven Design and Defensive Programming Techniques That Validate Code and Control Data
Troy Hughes, Data Llama Analytics

You wouldn’t ride your pony hard and put her away wet, so why subject your SAS® software to the same ill treatment?! Defensive programming describes a risk management strategy that aims to identify threats to software functionality and/or performance before they occur and, where possible, to identify pathways to programmatically mitigate those risks (or to communicate realized threats to stakeholders). This talk introduces various defensive programming techniques that can be implemented before software executes, such as verifying SAS program file state (i.e., availability, accessibility) and program file metadata (e.g., filename, checksum, create date, version). Data-driven software design further espouses the reality that “software” comprises not only code but also the underlying control data that drive that code’s functionality and flexibility. Thus, defensive data-driven design requires a fuller risk management strategy that evaluates risks not only to code but also to control data, including lookup tables, control tables, configuration files, and other control files. Finally, although not considered to be “software,” domain data sets and other data sources can be no less essential to ensuring software success, so the state and quality of these transactional data can be evaluated as well. Given this more expansive view, this talk further demonstrates defensive programming techniques that evaluate the state and metadata of required input data, be they SAS data sets, spreadsheets, CSV files, XML, JSON, or other interoperable formats. Defensive programming methods should be implemented where robust software must execute reliably, and data-driven software should be designed to incorporate these best practices.
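A minimal pre-flight sketch of two of the checks named above, data set and control file availability, using %SYSFUNC(EXIST) and %SYSFUNC(FILEEXIST); the member and path names are illustrative:

%macro preflight(ds=, file=);
   %if not %sysfunc(exist(&ds)) %then %do;
      %put ERROR: required data set &ds is missing - aborting.;
      %abort cancel;
   %end;
   %if not %sysfunc(fileexist(&file)) %then %do;
      %put ERROR: required control file &file is missing - aborting.;
      %abort cancel;
   %end;
   %put NOTE: pre-flight checks passed for &ds and &file;
%mend preflight;

%preflight(ds=sashelp.class, file=/project/control/rules.csv);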

Advanced Statistical Methods

AS-160 : Oncology Solid Tumor Subcutaneous vs Intravenous Late-stage Study Analysis
Sumanjali Mangalarapu, Merck
Chuqing Chen, Merck
Anilkumar Anksapur, Merck & Co., Inc

This paper presents practical programming approaches for analyzing a subcutaneous versus intravenously administered therapeutic, with pharmacokinetics (PK) designated as the primary endpoint. It focuses on the creation of the key Analysis Data Model (ADaM) datasets, their mock shells, and the programming logic used for key summary tables supporting the endpoints. Key content covers primary and secondary endpoints. Because PK as the primary endpoint is uncommon for late-stage solid tumor oncology trials, the programming required tailored analytic strategies, including model-based PK parameter estimation and specific endpoint derivations. This paper concludes that effective ADaM implementation and analyses require structured cross-functional collaboration between analysis and reporting (A&R) and PK/PD programming teams, with oversight from statisticians from the respective programming groups.

AS-166 : Quantifying Expression Divergence to Identify Candidate RNA-Binding Proteins Modulating Nonsense-Mediated Decay Across Human Tissues Using t-SNE Embedding Analysis
Liangwei He, University of Southern California

Nonsense-mediated mRNA decay (NMD) is a critical post-transcriptional regulatory mechanism that degrades aberrant transcripts. While its core mechanism is well characterized, the set of RNA-binding proteins (RBP) that modulate NMD across tissues and individuals remains poorly defined. In this study, we developed a computational framework to identify candidate NMD factors by leveraging large-scale transcriptomic data from the GTEx database. We introduced a divergence-based Ratio Score, which quantifies how strongly an RBP’s expression stratifies individual samples in the expression level of known NMD-target transcripts using the t-SNE embedding. This was compared against a traditional correlation-based baseline score, which computes the average Pearson correlation between each RBP and NMD-targets across individuals. Applying both methods across 31 human tissues, we found a moderate global correlation between the two scores, but significant divergence in their top-ranked RBPs. Tissue-specific analyses revealed consistent patterns in some tissues (e.g., Adipose, Muscle), but notable method-specific differences in others (e.g., Brain, Vagina). Statistical comparison of pairwise distances among high and low RBP expression groups confirmed that high-expression samples tend to form more consistent clusters in NMD transcript space, supporting the significant relevance of the Ratio Score. Top candidate RBPs were prioritized based on recurrence across tissues and passed on for downstream validation using expression datasets. Our findings suggest that inter-individual transcriptomic variation can be used to uncover novel factors of RNA decay pathways, and demonstrate the utility of divergence-based methods in post-transcriptional network inference.

AS-190 : Implementing Dynamic Time Warping in SAS 9.4 Using PROC IML: An Alternative Approach for Time-Series Model Evaluation
Yida Bao, University of Wisconsin Stout
Philippe Gaillard, Augusta University
Wei Yao, University of Wisconsin-Stout
Zheng Zhang, Murray State University
Rui Wang

Dynamic Time Warping (DTW) is a useful method for comparing time-series signals when timing shifts or delays are present. Common evaluation metrics such as Mean Squared Error (MSE) focus on point-by-point differences and often fail to reflect similarity in overall shape or timing. As a result, models that visually match the behavior of a biological signal may still receive poor numerical scores. In this paper, we implement DTW in SAS 9.4 using PROC IML and provide a practical way to compute dynamic distances between time-series signals directly within SAS. Using both simple examples and real ECG data, we show that DTW captures waveform similarity more effectively than traditional error-based measures when temporal misalignment exists. This work provides SAS users with a straightforward and interpretable tool for evaluating time-dependent data.
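A minimal PROC IML sketch of the classic O(n*m) dynamic program (an illustration, not necessarily the authors' implementation), using absolute difference as the local cost:

proc iml;
start dtw(x, y);
   n = nrow(x);  m = nrow(y);
   D = j(n+1, m+1, 1e300);          /* "infinity" border */
   D[1,1] = 0;
   do i = 1 to n;
      do j = 1 to m;
         cost = abs(x[i] - y[j]);
         /* cheapest of insertion, deletion, or match */
         D[i+1,j+1] = cost + min(D[i,j+1], D[i+1,j], D[i,j]);
      end;
   end;
   return(D[n+1,m+1]);
finish;

x = {0, 1, 2, 3, 2, 1};             /* a signal            */
y = {0, 0, 1, 2, 3, 2};             /* same shape, delayed */
print (dtw(x, y))[label="DTW distance"];
quit;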

AS-269 : Tipping Point Analysis: An Illustration of Sensitivity Analysis on Non-Administrative Censoring for Progression-Free Survival (PFS)
Lihui Deng, Bristol Myers Squibb
Kylie Fan, BMS
Xiaoting Qin, Bristol Myers Squibb

Missing data due to unbalanced non-administrative censoring, such as disproportionate loss to follow-up or unequal rates of treatment discontinuation between study arms, can introduce substantial bias in time-to-event clinical trials and compromise the validity of treatment effect estimates. Regulatory agencies emphasize the need for sensitivity analyses to address these concerns, with tipping point analysis serving as a key tool to test result robustness under informative censoring. This paper presents a practical tipping point analysis workflow in a Phase 3, open-label, randomized trial setting comparing an experimental treatment arm to the Standard of Care arm, focusing on progression-free survival (PFS). The approach is model-free and places emphasis on arm imbalance. The process starts by identifying participants who were censored for non-administrative reasons. It then compares the rates of non-administrative censoring between the two randomized arms to detect early or imbalanced censoring, which may indicate the presence of informative censoring. Survival times for censored subjects are imputed, followed by sequential analyses using stratified Cox models and log-rank tests. For each imputed survival time, the dataset is updated with imputed values and statistical significance is evaluated; the tipping point is reached when significance is lost, indicating sensitivity to non-administrative censoring assumptions and the potential impact on efficacy results. Kaplan-Meier estimates quantify the probability of reaching this threshold. This structured analytic framework enhances transparency and the reliability of clinical trial conclusions, supporting regulatory decisions and comprehensive benefit-risk assessments.
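A skeletal sketch of the sequential re-analysis loop; the variable names (an ADTTE with AVAL, CNSR, CENSRSN, TRTP, STRAT1) are assumptions, and the shift-by-&s rule is a deliberately simplistic stand-in for the paper's imputation method:

%macro tipping(maxshift=10);
   %do s = 1 %to &maxshift;
      data tip;
         set adtte;
         /* crude illustrative imputation: turn non-administrative
            censorings into events &s days after the censoring time */
         if cnsr = 1 and censrsn = "NON-ADMINISTRATIVE" then do;
            aval = aval + &s;
            cnsr = 0;
         end;
      run;

      ods output ParameterEstimates=pe_&s;
      proc phreg data=tip;
         class trtp / ref=first;
         model aval*cnsr(1) = trtp;
         strata strat1;
      run;
   %end;
   /* stack pe_1..pe_&maxshift and locate the first shift with p >= 0.05 */
%mend tipping;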

AS-315 : Machine Learning Models for Predicting Diabetes Using the PIMA Indians Dataset
Leon Davoody, Student

Diabetes mellitus is a major global health problem, and early identification of high-risk individuals can help prevent complications. This study compares two commonly used machine learning approaches for diabetes prediction using the PIMA Indians Diabetes dataset (n = 768). After replacing physiologically impossible zeros with missing values and applying mean imputation, models were evaluated using an 80/20 stratified split. We report accuracy, precision, recall, specificity, F1-score, and AUC-ROC. On the held-out test set, Logistic Regression achieved accuracy 69.48% and AUC 0.812, and Random Forest achieved accuracy 74.68% and AUC 0.809, with glucose and BMI ranking as the most influential predictors. These results show that routinely collected clinical variables can support diabetes risk prediction, while highlighting the importance of transparent preprocessing and validation. Keywords: diabetes prediction; machine learning; PIMA Indians dataset; logistic regression; random forest; ROC curve

AS-359 : Beyond Imitation: Selecting Synthetic Data with Purpose and Precision
Sundaresh Sankaran, SAS Institute
Pritesh Desai, SAS Institute
Sherrine Eid, SAS Institute

Synthetic data has applications for various areas of life sciences such as clinical trials, data sharing, real-world data, and data imbalances. While synthesization approaches have been variously studied as machine learning, simulation or anonymization problems, they tend to rest on assumptions involving original data distribution, which may not be sufficient or completely known. Through an illustrative example involving SAS Data Maker and SAS/OR (Operations Research) on SAS Viya, we demonstrate a method which views synthetic data as a “selection” problem. In this method, several synthetic observations serve as contenders for selection in a desired dataset and are “selected” based on specified target objectives and rules defining constraints. This method involves augmenting conventional machine learning-based generation algorithms with optimization techniques that solve user-specified objective functions under constraints. We find two benefits from this approach – one, it guards against stochastically generated synthetic data breaking logical rules governing real data, and two, it helps tackle “unknown knowns”, where only aggregations and summaries are known but not individual observations. This is especially the case when addressing rare and orphan diseases and when data access is restricted due to privacy policies and regulations. Awareness of this method gives stakeholders access to an additional technique for synthetic data generation in scenarios involving insufficient relevant data volumes for an application.
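A toy sketch of selection-as-optimization with PROC OPTMODEL: pick 100 rows from a synthetic pool so that the selected mean age matches a published summary value; the synth_pool input (obsid, age) and the target 52.3 are invented:

proc optmodel;
   set OBS;
   num age {OBS};
   read data synth_pool into OBS=[obsid] age;

   var pick {OBS} binary;
   var dev >= 0;

   con size: sum {i in OBS} pick[i] = 100;
   /* linearization of |sum(age)/100 - 52.3| <= dev */
   con d_up: sum {i in OBS} pick[i]*age[i] - 100*52.3 <= 100*dev;
   con d_dn: 100*52.3 - sum {i in OBS} pick[i]*age[i] <= 100*dev;

   min obj = dev;
   solve with milp;

   create data selected from [obsid] = {i in OBS: pick[i].sol > 0.5};
quit;

Additional constraints would encode the logical rules the abstract mentions (for example, forbidding clinically impossible covariate combinations).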

AS-365 : A Faster Algorithm for the Finkelstein-Schoenfeld Test and Win Ratio in Hierarchical Composite Endpoint Analysis
James Austrow, Cleveland Clinic

Composite outcomes based on the Finkelstein-Schoenfeld test have emerged as a popular efficacy endpoint in clinical trials, especially the Win Ratio. Unfortunately, these methods are computationally hamstrung by the need to perform a full pairwise comparison against the entire study population. We present a novel algorithm that achieves significant asymptotic speedup. The method addresses the computational challenge of pairwise comparisons in large datasets, where the naive approach requires O(n²) time complexity. The efficiency gain is particularly impactful during the repeated Monte Carlo simulations typically required for estimating sample size with these endpoints. An implementation of the algorithm is provided in Python, along with empirical results that demonstrate real-world time savings of over 10x against existing R and C++ solutions.

AS-413 : Construction and Evaluation of External Controls Using Propensity Score Methods
Bala Niharika Pillalamarri, LLX Solutions, LLC
Yayu Li, LLX Solutions
Hongbing Jin, LLX Solutions, LLC

In randomized studies, subjects are assigned to treated or control groups, ensuring balanced covariates and unbiased estimation of treatment effects. However, in rare diseases and early oncology development, randomized controlled trials are often infeasible due to ethical and logistical constraints. As a result, external control (EC) arms constructed from historical registry or prior trial data are increasingly used. An EC arm requires alignment of baseline clinical characteristics with the treated group to reduce bias, commonly through propensity score (PS) methods. This paper describes the construction of an EC arm and evaluates multiple PS-based adjustment strategies, including trimming, matching, Average Treatment Effect (ATE), and Average Treatment Effect on the Treated (ATT). Propensity scores were estimated using PROC PSMATCH based on key baseline covariates, with covariate balance assessed using graphical diagnostics. These graphical diagnostics provide an intuitive assessment of the adequacy of covariate balance and the degree of common support between treatment groups. They enable early detection of residual imbalance, extreme weights, or poor overlap that could compromise the validity and stability of treatment effect estimates. Visual inspection complements numerical balance metrics by highlighting patterns that may not be evident from summary statistics alone. In regulatory and exploratory settings, these plots support transparent justification of modeling choices and sensitivity analyses. Collectively, they enhance confidence in the robustness and interpretability of EC-based causal inferences.
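A minimal PROC PSMATCH sketch of the steps described above, PS estimation, 1:1 greedy matching with a caliper, and graphical balance diagnostics; the cohort variable and covariates are illustrative:

proc psmatch data=pooled region=allobs;
   class cohort sex;
   psmodel cohort(Treated="TRIAL") = age sex ecog baseline;
   match method=greedy(k=1) distance=lps caliper=0.25;
   /* balance and overlap diagnostics on the logit of the PS */
   assess lps var=(age ecog baseline) / plots=(boxplot barchart);
   output out(obs=match)=matched matchid=_mid;
run;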

AS-422 : Linear versus Log-Log Confidence Intervals for Kaplan-Meier Survival Estimates: Statistical Rationale and Practical Implementation using SAS 9.4
Prayag Shah, Revolution Medicines

Kaplan-Meier (KM) methods are routinely applied to time-to-event endpoints in oncology clinical trials, including Duration of Response (DoR), Progression-Free Survival (PFS), etc. While the Kaplan-Meier estimator itself is invariant to the choice of confidence interval (CI) method, the construction of pointwise confidence limits has important implications for statistical validity and interpretability. Linear confidence intervals, derived directly from Greenwood’s standard error, may violate probability boundaries by extending below 0 or above 1, particularly when survival probabilities are near the extremes. Log-log confidence intervals apply a variance-stabilizing transformation to the survival function, ensuring valid probability limits and improved coverage. This paper compares linear and log-log confidence intervals using SAS® PROC LIFETEST, demonstrates their behavior through manual calculation and empirical examples, explains SAS OUTSURV behavior, and discusses implications for DoR/PFS analyses using an ADTTE dataset.
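A side-by-side sketch using CDISC-style ADTTE variables (AVAL, CNSR, TRTP); CONFTYPE= switches the CI construction, and OUTSURV= captures the limits for inspection:

proc lifetest data=adtte conftype=loglog outsurv=surv_loglog;
   time aval*cnsr(1);
   strata trtp;
run;

proc lifetest data=adtte conftype=linear outsurv=surv_linear;
   time aval*cnsr(1);   /* linear limits can fall below 0 or exceed 1 */
   strata trtp;
run;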

Career Development, Leadership & Soft Skills

LS-138 : Perspectives on Leading Effectively in Platform Trials: Leadership and Technical Approaches
Zhen (Laura) Li, AstraZeneca

Platform trials, run under a master protocol with substudy-specific protocols, accelerate innovation by adaptively evaluating multiple therapies within a unified, evolving framework, presenting distinctive technical and leadership challenges. This presentation draws on my experience as a product lead programmer for an in-house platform trial and shares how I addressed these challenges while leading a study programming team. On the technical side, establishing and maintaining programs and specifications that balance consistency with adaptability is foundational to high-quality deliverables. Practical examples will show how these approaches flex to diverse analysis requirements and evolving protocols. On the leadership side, effective study management is grounded in comprehensive planning, thoughtful resource allocation, and proactive, solution-focused communication, and supported by fit-for-purpose tools that streamline workflow and collaboration. Cultivating a collaborative, adaptive, and growth-oriented culture, alongside structured support for study programmers at varying experience levels, helps the study programming team navigate steep learning curves, resolve blockers, manage pressure, and sustain continuous learning. Real-world examples will illustrate these approaches in action. The presentation shares practical experiences and approaches for leading a study programming team toward resilience, productivity, and continuous improvement in adaptive platform trials.

LS-149 : Authentic Leadership for SAS Programming Leaders
Lisa Mendez, Army MEDCOM

Leading a team of SAS programmers isn’t just about knowing your PROC SQL from your DATA steps; it’s about showing up as a real human. In this paper, I’ll share my take on authentic leadership in technical environments, where precision, autonomy, and complexity often dominate the landscape. Authentic leadership, to me, means leading as yourself, communicating transparently, and listening like it actually matters. We’ll explore how these three attributes, being yourself, transparent communication, and active listening, can transform the way we lead analytical minds. I’ll share stories from the trenches, including how showing vulnerability helped me navigate a tough interpersonal moment, and how listening (without jumping in to “fix”) sparked a fresh idea from a quiet team member. We’ll also talk about the challenges of leading introverted, detail-oriented personnel who thrive on clarity but don’t always speak up. I’ll offer practical strategies for mentoring junior programmers, building psychological safety, and encouraging authenticity without oversharing. This paper is about real habits that help leaders foster trust, creativity, and collaboration in data-driven teams. You’ll leave with reflection questions to challenge your current leadership style and maybe even a nudge to ditch the “manager mask” and lead with more heart. Because at the end of the day, SAS programmers don’t just need a boss who knows how to code, they need a leader who gets them.

LS-173 : From Programmer to Influencer: Strategic Leadership for Statistical Programmers in Clinical Development
Xiaohan Zou, BMS
Wei Shao, Bristol Myers Squibb
Yi Yan, Bristol Myers Squibb

The role of statistical programmers is evolving from technical executors to strategic influencers shaping clinical development. Building on strategies for bridging functional gaps discussed at PharmaSUG 2025, where strengthening collaboration between programming, biostatistics, and regulatory teams proved critical for delivering high-quality, compliant outputs, this paper takes the next step. We explore how programmers can leverage these collaborative foundations to transition from technical contributors to influential leaders, shaping processes, driving innovation, and guiding organizational change in an increasingly complex regulatory environment. We showcase a recent initiative led by statistical programmers to design an end-to-end FDA Bioresearch Monitoring (BIMO) submission process. Acting as strategic leaders, programmers influenced cross-functional alignment with biostatistics, regulatory, and clinical operations, driving proactive adherence to the FDA BIMO Technical Conformance Guide (TCG). By championing automation through metadata-driven SAS macros and Python integration, they transformed fragmented workflows into a streamlined, compliant process, eliminating manual effort and accelerating timelines. A practical framework (Initiation, Engagement, Influence, Leadership) will guide attendees toward leadership development and process harmonization, illustrated through this real-world case. The session concludes with a look at future-ready competencies, including AI-enabled workflows, open-source integration, and strategies for driving innovation in clinical programming.

LS-182 : Soft Skills to Gain a Competitive Edge in the 21st Century Job Market
Kirk Lafler, sasNerd

Today’s economy requires members of the workforce to develop two essential categories of skills: hard skills and soft skills. Hard skills refer to job-specific knowledge and technical abilities that enable individuals to perform specific responsibilities effectively. Examples include SAS, R, Python, and other technical, programming, data analysis, project management, and market research techniques. Soft skills, on the other hand, are less tangible and often difficult to measure. They encompass the personal qualities, attributes, and interpersonal traits that shape how individuals interact and collaborate with others in the workplace. Soft skills are typically developed through life experiences and workplace interactions. The encouraging news is that soft skills can be learned and mastering them provides a significant competitive advantage in today’s fast-paced and rapidly evolving job market.

LS-264 : Drowning in Trackers? Let R Be Your Lifeboat!
Christine Reiff, Ephicacy Consulting Group, Inc.

R is a powerful, open-source programming language that is becoming more prevalent in Clinical Programming. Using packages that have been developed and posted to CRAN, programmers are able to use R to generate everything required for a submission, from SDTM to ADaM to TLFs. But what about the efforts surrounding the submission programming? If you are anything like me, you have trackers. Trackers for deliverables, trackers of staff allocation, trackers of team members, trackers of documents, trackers for training: Excel is certainly getting a workout! But combining spreadsheets in Excel is not easy; it requires VBA code or manual manipulation to get data aligned and matched properly, which takes time and effort. But we can do better: we can use R in conjunction with our trackers to bring the data together and allow us to collate and analyze the data in one place.

LS-284 : Early-Careers Essentials: Practical Checklists for Manual TLF Review
Ingrid Shu, Merck
Xinhui Zhang, Merck & Co. Inc

In clinical reporting, Tables, Listings, and Figures (TLFs) must be accurate in wording, counts, and formatting in order to support meaningful, high-quality deliverables. New programmers may be tempted to rely on error-free logs and the mere existence of output as evidence of readiness, only to discover that superficial checks miss critical defects, leading to rework and extra review for stakeholders. This paper presents a practical checklist for finalizing TLFs, helping elevate outputs from “ran successfully” to “ready for delivery.” Additionally, these checklists help new programmers develop intuition for the critical items to inspect and correct in TLFs. The checklists for TLF review are built around three critical elements: (1) labeling consistency, (2) numeric coherence, and (3) aesthetic factors. This paper first presents a master checklist that applies to all TLFs, regardless of therapeutic area, then provides oncology-specific TLF checks within five categories: Baseline & Exposure, Safety, Efficacy, Public Disclosure, and Appendices/Listings. By adopting these checklists into everyday workflows, teams can reduce rework, improve reviewer trust, and expedite the path from first draft to delivery-ready TLFs. They can be especially helpful for guiding new programmers as they develop sharper review skills.

LS-319 : Women Leadership in AI-Driven Clinical Programming: Navigating Intersectionality and Innovation
Lida Gharibvand, Loma Linda University

The increasing use of artificial intelligence (AI) and machine learning (ML) in clinical programming is reshaping leadership within the pharmaceutical industry. Women, who remain underrepresented in technical leadership roles, face both opportunities and challenges in this evolving landscape. This qualitative study explores how intersecting identities influence women’s leadership in AI/ML project development. Semi-structured interviews were conducted with eight female leaders from CROs and consulting firms across North America, Europe, and Asia, representing managerial, director, and executive roles. Using thematic analysis through an intersectional lens, four key themes emerged: challenges to technical credibility, the impact of intersecting identities on leadership, the role of inclusive leadership in successful AI/ML implementation, and the importance of sponsor-mentor relationships. The findings offer practical recommendations for building credibility, fostering innovation, and advancing women’s leadership in AI/ML-driven clinical programming.

LS-347 : Closing the Expectation Gap: A Leadership Framework for Clinical Programming Success
Yuka Tanaka-Chambers, Phastar

Successful clinical trial delivery depends on effective collaboration across cross-functional teams, whether operating within a sponsor company or across multiple vendors. However, many projects fail not because of technical limitations, but because of fundamental misalignment in expectations between programmers, non-programmers, sponsors, and CROs. Non-programmers often view programming as an assembly-line process: provide the inputs and a perfect output should appear on a predictable schedule. In reality, clinical programming is complex knowledge work, filled with evolving requirements, hidden dependencies, and technical uncertainty. At the same time, sponsors who are accustomed to internal environments where data and outputs are easily accessible frequently expect rapid, informal access to results. This clashes directly with the CRO delivery model, where data access is tightly controlled and formal deliverables cannot be produced safely until key documents such as the protocol and SAP are finalized and the raw data are stable. These mismatched assumptions create constant tension, unstable timelines, growing quality risk, and unsustainable pressure on programming teams. This presentation introduces a practical leadership framework to understand and close the expectation gap. Drawing from the perspective of a programmer who has worked in both sponsor and CRO environments, the session demonstrates how leaders can translate technical uncertainty into business risk, reset stakeholder expectations, and build delivery systems that protect both quality and people while improving project outcomes.

LS-393 : AI-First Leadership in Biometrics: Redefining Strategy, Teams and Execution in the Agentic AI Era
Kevin Lee, Clinvia

As AI evolves from passive tools to autonomous, goal-driven systems, Biometrics faces a fundamental leadership inflection point. Agentic AI, AI agents capable of planning, reasoning, executing tasks, and collaborating with humans, requires Biometrics leaders to move beyond AI adoption toward AI-first leadership, where AI actively participates in how work is designed and delivered. The presentation reframes AI in biometrics from single-use Gen AI applications to Agentic workflows that orchestrate complex tasks. Through practical examples, including statistical programming, exploratory data analysis, clinical documentation development, quality control, and cross-functional coordination, the presentation demonstrates how Agentic AI can function as a digital team member that initiates tasks, manages dependencies, and escalates decisions while preserving human scientific oversight. The presentation introduces an AI-first leadership framework structured around system, process, and people. From a systems perspective, leaders must design validated, compliant computing environments that safely host agentic capabilities alongside existing statistical platforms. At the process level, traditional linear workflows are reimagined as agent-driven pipelines, where humans focus on review, judgment, and regulatory accountability. From a people perspective, AI-first leadership emphasizes role evolution, capability development, and trust, preparing Biometrics teams to supervise, collaborate with, and govern AI agents rather than compete with them. The presentation concludes with actionable guidance for Biometrics leaders seeking to operationalize Agentic AI while maintaining data integrity, auditability, and regulatory compliance. Attendees will leave with a leadership-oriented roadmap for building scalable, resilient Biometrics organizations where humans and AI agents work together to accelerate clinical development in the AI-first era.

LS-407 : When Code Isn’t Enough: Communication and Leadership Skills for Statisticians and Programmers
John LaBore, SAS Institute
Josh Horstman, PharmaStat LLC
Robert Goodloe, Indiana University

Statisticians and statistical programmers working in SAS® and R and other software are entering a period of rapid change, driven by automation and Artificial Intelligence (AI)-assisted development. While these tools can generate code and accelerate analyses, they cannot replace the human skills that turn results into decisions and insights into action. As technical barriers continue to fall, communication and leadership skills are becoming the primary differentiators for career growth. Whether you aspire to be a trusted technical expert, lead analytics teams, or expand your influence as a consultant, your ability to clearly explain results, defend analytical choices, and guide others now matters more than ever. In an AI-driven world, those who can communicate with confidence and lead with clarity will stand out. Toastmasters International offers a practical, supportive way for technical professionals to deliberately build these skills. Drawing on the authors’ combined 40+ years of Toastmasters experience, this paper challenges statisticians and statistical programmers to invest in themselves and take ownership of the skills that will define their impact, and their careers, when code alone is no longer enough.

LS-408 : Navigating Career Transitions: From Programmer to Executive Leader
Steven Tan, Wu Consulting Group

Career progression in biometrics often follows an unspoken assumption: technical excellence naturally leads to leadership. In practice, the transition from programmer or scientist to executive leader requires an entirely new skill set, one rarely taught in formal training. This presentation examines that transition through the real-world experience of launching and leading a biometrics FSP organization in the life sciences industry. It traces the evolution from hands-on technical contributor to organizational leader, highlighting the critical inflection points where technical instincts must give way to leadership judgment, communication, and strategic thinking. Key topics include redefining success beyond individual output, building trust while delegating technical work, mentoring high-potential staff, and communicating complex analytical concepts to executives, clients, and cross-functional partners. Attendees will gain practical insights into how leadership presence, emotional intelligence, and clarity of expectations directly affect quality, retention, and delivery in biometrics teams. Examples may reference commonly used tools such as SAS®, R, and dashboarding platforms (e.g., R Shiny), but the focus is on people and processes rather than software. There are no operating system or software version dependencies. This session is designed for programmers, statisticians, and scientists at any career stage who are considering leadership roles, as well as current managers looking to better support technical professionals transitioning into people leadership.

LS-410 : Delegation as a Quality Strategy: Building Accountable Biometrics Teams
Steven Tan, Wu Consulting Group

High-performing biometrics teams are built not only on technical excellence, but on trust, clarity, and shared accountability. Leaders who struggle to delegate often find themselves caught between protecting quality and empowering their teams. This presentation examines how delegation, when done well, strengthens scientific quality rather than weakens it. Using real-world leadership experiences from building biometrics teams, the session explores how expectation-setting, structured feedback, and coaching create a culture where quality is owned at every level. Attendees will learn how to assess readiness for delegation, tailor oversight to individual experience levels, and use review processes as learning tools rather than compliance checkpoints. The discussion highlights how transparent communication and consistent feedback reduce errors, improve morale, and support long-term retention. Examples may reference biometrics tools such as SAS® and R for context only. No specific software knowledge is required, and there are no operating system or version dependencies. This session is designed for current and aspiring leaders in biometrics, clinical programming, and analytics who want to improve team performance while maintaining scientific and regulatory standards.

LS-411 : Leading with Presence from Afar: Building Trust and Engagement in Distributed Biometrics Teams
Steven Tan, Wu Consulting Group

Distributed and remote work offers unprecedented flexibility for biometrics teams, yet it also challenges traditional leadership models that rely on visibility and proximity. Building trust, engagement, and accountability becomes more complex when teams rarely, or never, meet in person. This presentation focuses on the human side of leading remote and global biometrics teams. Using real-world leadership experiences, the session explores how communication, empathy, and clarity replace physical presence as the primary tools of effective leadership. Attendees will learn strategies for creating psychological safety, fostering inclusion across cultures, and supporting career development in virtual environments. The discussion emphasizes how consistent feedback, intentional check-ins, and shared norms help distributed teams maintain high performance and morale over time. References to biometrics tools such as SAS® and R are included for context only; no software instruction is provided. There are no operating system or software version dependencies. This session is designed for current and aspiring leaders in biometrics, clinical programming, and analytics seeking to strengthen team culture in remote settings.

LS-424 : From Statistical Programmer to Analytical Partner: Navigating the Future of Biostatistics
Shivani Gupta, Clymb Clinical

The field of biostatistics and statistical programming is undergoing rapid transformation driven by advances in artificial intelligence (AI), increased automation, and the adoption of modern analytical tools. These changes are reshaping traditional programming roles and raising concerns among statistical programmers regarding long-term career sustainability, potential job displacement, and evolving skill requirements over the next five to ten years. This paper explores the future role of the statistical programmer within pharmaceutical and clinical research settings, with a particular focus on those working in regulated environments. It reviews current and emerging trends in AI-enabled analytics, automation of routine programming tasks, and the growing integration of complementary tools such as R and Python into established SAS/R-based workflows. The paper distinguishes between tasks that are becoming increasingly automated, such as standard code generation and repetitive analyses, and those that continue to require human expertise, including interpretation of regulatory requirements, study-specific problem solving, validation, and quality oversight. In addition, this paper identifies key competencies that statistical programmers should develop to remain relevant and effective. These include advanced SAS/R programming skills, cross-language proficiency, understanding of clinical data standards, familiarity with AI-assisted development tools, and the ability to adapt to evolving regulatory expectations. By focusing on practical guidance and real-world implications, this paper aims to help SAS and R programmers proactively navigate industry changes and position themselves as critical contributors to the future of biostatistics and clinical research.

LS-426 : Strategic Change Management Framework: AI-Driven SOP Integration and Alignment Post-M&A
Anbu Damodaran, Alexion, AstraZeneca Rare Disease

Mergers and acquisitions (M&A) in the pharmaceutical and life sciences industry often trigger extensive change management challenges, particularly in harmonizing Standard Operating Procedures (SOPs) across organizations. Misaligned processes can lead to compliance risks, operational inefficiencies, and cultural friction. This paper presents a practical blueprint for streamlining SOP integration and alignment post-M&A through AI-driven strategies. We explore how advanced language models, document clustering, and robotic process automation can accelerate SOP comparison, gap analysis, and harmonization while maintaining regulatory compliance. Real-world use cases and AI prompt examples demonstrate how organizations can reduce manual effort, improve accuracy, and foster agility during change management. By leveraging AI as a strategic enabler, companies can transform SOP integration from a labor-intensive process into a structured, efficient, and less disruptive transition, ensuring both compliance and business continuity.

LS-441 : Strategic AI Coaching for Life Sciences: A Framework for Industry Leaders and Managers
Priscilla Gathoni, Wakanyi Enterprises Inc.

As we stand on the cusp of the Fourth Industrial Revolution, leaders are facing a “virtual thunderstorm of change” propelled by the rapid acceleration of Artificial Intelligence (AI). To remain competitive in this high-stakes landscape, the life sciences industry must move beyond traditional management toward a holistic hybrid coaching model. This paper introduces the Farhan-5As framework, a systematic roadmap of Analysis, Architecture, Application, Ascertainment, and Adjustment, designed to harmonize AI-driven efficiency with a necessary company-wide shift in mindset. While technology often triggers a sense of things “falling apart,” this framework proves that strategic AI integration actually brings organizational structures together for a greater good. We explore the transformative power of CoachBots and generative AI assistants, which empower leaders to shift their focus from routine technical tasks to high-value social and personal competencies. By positioning leaders as the primary “designers” of human-AI interaction, organizations can ensure that digital transformation remains human-centered, ethical, and transparent. Join me to discover how to turn technological overload into a sustainable competitive advantage, ensuring your leadership remains resilient in the age of the Singularity.

LS-442 : Effective Collaboration Between Business, Technical and Quality Units in Building GCP Systems
Ravi Tejeshwar Reddy Gaddameedi, Merck Inc

In pharmaceutical product development, effective stewardship of regulated analytics environments (e.g., SAS/R tools, clinical data platforms, metadata repositories, and submission pipelines) benefits from a cross-functional governance model that unites Technical (programming/engineering, biostatistics, data management, etc.), Business (clinical operations, portfolio, finance), and Quality (GCP, CSV, QA) leaders under a single accountable framework. Rather than sequential handoffs, these groups co-own lifecycle controls from architecture and supplier qualification through requirements, risk assessment, design, change control, computerized system validation, access/security, and continued process verification so that technical modernization (cloud, automation, reusable code, standard data models) is matched with business outcomes (cycle-time reduction, reuse, scalability, inspection readiness) and embedded compliance (ALCOA+ data integrity, GxP traceability, audit trails, validation evidence). Joint steering and clear RACI across design authorities, change advisory boards, and validation boards enable consistent standards, faster decisions, and proportional risk-based controls, while harmonized KPIs (e.g., release lead time, deviation/defect rates, inspection observations, submission timeliness) align incentives. This integrated leadership replaces siloed workflows with a unified quality system, ensuring data products and analytics are robust, reproducible, and inspection-ready, support reliable clinical decision-making and submissions (eCTD, SDTM/ADaM, TLFs), and advance organizational strategy without compromising GCP compliance or patient safety. This paper explores how aligning leadership among these groups enables organizations to balance technical advancement, business value, and Good Clinical Practice (GCP) regulatory compliance.

Data Standards Implementation (CDISC, SEND, ADaM, SDTM)

DS-111 : Ready for Next Level SDTMs and ADaMs Compliance with End-to-End Processing?
Sunil Gupta, Gupta Programming
Tomás Sabat Stofsel, Verisian

From specifications to results, smarter tools enable better SDTM and ADaM compliance for higher quality submissions. This is achieved by anticipating common issues and applying checklists during the SDTM and ADaM mapping process. While many organizations have SOPs for QC, smarter organizations incorporate continuous monitoring of program development. First, SDTM and ADaM specifications and SAPs are imported for metadata processing and compliance checking. This metadata is essential for cross-referencing the logic of all derived SDTM and ADaM variables. While SDTM and ADaM structures and rules must pass the litmus test, the single story told by the raw data must be transparent and consistent with the SDTMs and ADaMs. Finally, SDTM and ADaM required variables must be created and all raw data must be mapped to SDTMs. With this end-to-end system, onboarding new members to the programming team becomes easier because the documentation and all related programs can be quickly explained.

DS-131 : CTCAE v6.0: The Good, the Bad, and the Ugly
Elizabeth Dennis, EMB Statistical Solutions, LLC
Grace Fawcett, Syneos Health

The National Cancer Institute’s Common Terminology Criteria for Adverse Events (CTCAE) is an important tool for reporting the severity of adverse events as grades. In many clinical trials, these grades are applied to lab results, and often are programmatically determined in the production of the ADaM lab results dataset. In 2025, version 6.0 of CTCAE was released. As compared to version 5.0, it contains additions, deletions, and revisions. The release included an Excel version, with a tab that shows a comparison to version 5.0, and the type of change. These criteria have evolved over time, with many of the criteria becoming clearer. Unfortunately, some ambiguities remain, which leave questions on how the grades should be programmatically determined. This paper will walk through some of the grades that have been revised and now have clear guidelines, and others where the interpretation is murky. An overview of the ADaM bi-directional toxicity variables and the possible limitations in common lab shift tables will also be discussed.
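As a minimal illustration of programmatic grading (the thresholds shown follow the familiar CTCAE-style ALT multiples of ULN but should be confirmed against the v6.0 file before use; the adlb variable names are CDISC-style assumptions):

data adlb_gr;
   set adlb;
   where paramcd = "ALT";
   ratio = aval / anrhi;       /* multiple of the upper limit of normal */
   if ratio > 20      then atoxgrh = "4";
   else if ratio > 5  then atoxgrh = "3";
   else if ratio > 3  then atoxgrh = "2";
   else if ratio > 1  then atoxgrh = "1";
   else atoxgrh = "0";
run;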

DS-134 : Human Beings Still Needed: Manual ADaM Checks that AI Can’t Do
Sandra Minjoe, ICON PLC
Mario Widel, Independent

ADaM Rules v5.0 contains almost 2000 spreadsheet rows of automatable checks. Running that many checks might seem like it would be sufficient to ensure that data is ADaM-compliant. However, the ADaM Rules Guide states that “conformance rules are machine-readable (i.e., programmable within computer software) and capable of being implemented by ADaM users”, and we are told that manual review should also be done. But what constitutes a manual review? This paper and presentation provide many checks that are not fully automatable (they need a human component), though many of them are partially automatable. They are designed to find potential ADaM conformance issues that the fully automatable checks cannot. Adding these types of checks to your review will aid you in determining whether your ADaM data is compliant with both ADaM rules and fundamental principles.

DS-144 : Picking Up the Pieces: Implementation of (the forgotten) ADaM Naming Fragments
Richann Watson, DataRich Consulting
Karl Miller, IQVIA

In working with ADaM standards, building data sets and their variables can be a repetitive process that becomes tedious when creating meaningful variable names within the 8-character limit. As opinions vary, and even our own ideas can change from day to day, it is good to be reminded that ADaM fragments exist and are required to help with consistency of naming conventions and regulatory review efficiency. Earlier versions or “client specifications” veer off path over time, so it is worthwhile to revisit the more common fragments from time to time, as well as to look at the less common fragments.

DS-154 : Navigating the Statistical Programming Strategies for Cytokine Release Syndrome (CRS) and ICANS in Oncology Clinical Trials
Murali Kanakenahalli, Kite Pharma
Vamsi Kandimalla, Kite Pharma, A Gilead Company

Cytokine Release Syndrome (CRS) and Immune Effector Cell-Associated Neurotoxicity Syndrome (ICANS) are two of the most critical and often dose-limiting toxicities associated with novel immunotherapies, particularly CAR T-cell therapy within oncology clinical trials. Accurate and standardized reporting of these adverse events (AEs) is paramount for assessing the risk-benefit profile of these revolutionary treatments and ensuring patient safety. This paper provides a comprehensive review, from the statistical programming perspective, of the evolution of data collection, standardization, and reporting of CRS and ICANS. We first establish the clinical context and critical importance of these syndromes. We then delve into the challenging pre-MedDRA era, illustrating how programmers employed complex, symptom-based logic to link Adverse Event (AE) data with separate, customized syndrome-tracking Case Report Forms (CRFs) to generate meaningful safety metrics. The paper then details the pivotal shift following the inclusion of explicit CRS and ICANS terms in the Medical Dictionary for Regulatory Activities (MedDRA), which necessitated major redesign of CRFs, moving from separate CRFs to codified severity scales. Crucially, we outline robust programming strategies for mapping these complex data structures into regulatory submission standards, specifically detailing the best practices for Study Data Tabulation Model (SDTM) mapping, Analysis Data Model (ADaM) derivations (including time-to-onset and maximum severity calculations), and the construction of standardized Tables, Listings, and Figures (TFLs) required for regulatory submission packages. The insights shared are designed to equip statistical programmers with the necessary framework to handle the complexities of ensuring data integrity, traceability, and effective communication of these high-stakes safety endpoints.

DS-168 : Automating ADaM Dataset Generation with Dynamic Variable Length Adjustment and Cross-Domain Consistency Checks
Wenhao Dong, Edwards Lifesciences

Creating ADaM datasets often requires manually defining variable lengths, updating specifications, and verifying consistency with predecessor datasets and SDTM metadata. These steps are time-consuming, repetitive, and prone to human error. To address these challenges, we developed a reusable SAS macro that automates several key tasks in ADaM dataset generation. Specifically, the macro: Dynamically adjusts variable lengths based on the actual data; Automatically updates the corresponding length information in ADaM specifications; Performs cross-checks to ensure variable label and length consistency with SDTM; and Validates variable types and formats to prevent data inconsistencies. When applied in real-world studies, this macro has demonstrated notable gains in efficiency and accuracy, reducing manual QC burden while improving alignment between ADaM datasets and their specifications. The approach promotes standardization and provides a scalable framework that can be readily applied across studies and therapeutic areas. This work highlights how process automation and integrated validation can enhance the quality and reproducibility of statistical programming deliverables in clinical data analysis.

DS-174 : ISO 8601 and SAS® – and R! A Practical Approach
Derek Morgan, Bristol Myers Squibb
Marckenley Mercie, Bristol Myers Squibb

The ISO 8601 standard for dates and times has long been adopted for clinical data by regulatory agencies around the world. While there are many homemade solutions for working with this standard, SAS has many built-in tools, from formats and informats to the CALL IS8601_CONVERT routine, which painlessly handles durations and intervals. As R gains popularity, several packages are available to handle this mandated standard. This paper will detail both SAS and R capabilities around the standard.
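
As a small taste of the R side, the sketch below uses the lubridate package; the function choices here are illustrative, not necessarily the ones the paper covers.

    # Hedged sketch of ISO 8601 handling in R with lubridate
    library(lubridate)

    dt <- ymd_hms("2026-03-14T09:30:00")   # parse a full ISO 8601 datetime

    # Partial ISO 8601 dates (common in --DTC variables) via candidate orders
    pd <- parse_date_time("2026-03", orders = c("ymd", "ym", "y"))

    # Build a simple ISO 8601 duration string from two dates
    start <- ymd("2026-03-01"); end <- ymd("2026-03-14")
    iso_dur <- sprintf("P%dD", as.integer(end - start))  # "P13D"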

DS-196 : Friction to Flow: LLM-Based Automation of Clinical Data Workflows
Pankaj Attri, SAS
Matt Becker, SAS

Clinical trial data submissions for FDA review are often hampered by complex, manual processes. We will be demonstrating an LLM-powered solution that automates the clinical data pipeline and, importantly, establishes end-to-end traceability. The solution automates the conversion of raw source data into standardized SDTM, generates analysis-ready ADaM datasets, and creates final TLFs for submission, along with the associated code (used to create datasets and analytical outputs). The other important deliverable of the solution is the lineage graph view. By interpreting the Statistical Analysis Plan, the system attempts to create a direct, visual link between SAP requirements and the datasets that fulfill the requirements. Each TLF is linked to its source ADaM datasets, which in turn trace back through SDTM to the original raw data. This automates traceability to the study design, providing verifiable proof that protocol and SAP requirements have been met, ensuring compliance and confidence for regulatory submissions.

DS-217 : Minimally Invasive Analysis Results: The “ards” Package
David Bosak, r-sassy.org

The CDISC Analysis Results Data Standard (ARDS) is a machine-readable way to store the results of an analysis. The ARDS initiative is an attempt to separate the analysis from the display of the analysis. The separation of analysis and display brings many benefits, and those benefits are well understood. The problem comes in trying to implement it. Many people assume the implementation of ARDS requires a complete overhaul of the way TFLs are produced. This paper will question that assumption. Does an ARDS implementation really require throwing away everything we know about creating TFLs? This paper will discuss alternative designs of an ARDS implementation, and explore whether it is possible to produce ARDS datasets in a minimally invasive way – without throwing away everything. Specifically, the paper will introduce the “ards” R package, which was developed for just such a minimally invasive ARDS implementation.
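
To give a flavor of the "minimally invasive" idea, the sketch below follows the init/add/get workflow described in the package documentation; the argument names are assumptions drawn from the vignette and should be verified against the current release.

    # Hedged sketch of capturing existing analysis results with "ards"
    library(dplyr)
    library(ards)

    init_ards(studyid = "ABC-123")            # start collecting analysis results

    smry <- iris %>%                          # existing analysis code, unchanged
      group_by(Species) %>%
      summarize(n = n(), mean = mean(Sepal.Length))

    add_ards(smry, statvars = c("n", "mean")) # capture the statistics as ARDS rows

    result <- get_ards()                      # retrieve the accumulated dataset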

DS-265 : SDTMIG v4.0: Are You Ready For It?
Kristin Kelly, Pinnacle 21 by Certara

The imminent release of the CDISC SDTMIG v4.0 / SDTM v3.0 represents a pivotal moment for SDTM standards. As a major version update, it introduces significant changes that sponsors should begin preparing for now. Key updates include support for Multiple Subject Instances (MSI), the transition from SUPPQUAL to Non-Standard Variables (NSVs), and metadata restructuring aligned with SDTM Model v3.0. The --BLFL variable has been removed, and Sections 1–4, which provide guidance across SDTM domains, have been reorganized for improved clarity and usability. Variables are now organized into Variable Groups, enhancing structure and interpretation. New domains such as Event Adjudication (EA) and Gastrointestinal Findings (GI) further expand the ability to handle more types of data. Also noteworthy are updates to the Protocol Deviations (DV) domain and specimen-based Findings domains. This paper will explore what’s new in SDTMIG v4.0 and provide practical guidance to help sponsors navigate this era of transition.

DS-280 : From Manual to Automated: A SAS® and R-Based Toolkit for Scalable SDTM Generation
Hardik Sheth, Catalyst Clinical Research
Eldho Alias, Catalyst Clinical Research

Clinical trials require the submission of SDTM datasets; however, manual programming of SDTM domains is time-consuming and resource-intensive. Fully automated solutions often lack flexibility and efficiency when customization is needed to accommodate varying standards or non-standardized data. This paper presents the development of SAS- and R-based toolkits that automate SDTM generation to improve efficiency and reduce manual effort. The toolkits are built on standardized case report forms (CRFs) aligned with CDASH standards, while retaining flexibility to accommodate study-specific requirements. Standard mapping processes are automated, with options for programmers to perform pre- and post-processing as needed. Implementation of the toolkit reduced SDTM development time by over 30%, while decreasing manual programming effort by over 40% of the traditional total hours spent across multiple studies. The automated approach also improved consistency and data quality, resulting in fewer validation findings and improved compatibility with data visualization dashboards. The toolkits support a broad range of RAVE data across multiple sponsors, rather than relying on a single sponsor-specific standard. This paper describes the toolkit development methodology and presents measured efficiency gains observed during real-world study implementations.

DS-286 : Approaches Integrating SAS LSAF and Pinnacle 21 Enterprise for SDTM/ADaM Dataset Validation
James Zhao, Eikon Therapeutics, Inc.
Joshua Lin, Eikon Therapeutics

Early and continuous validation of clinical trial datasets is essential for ensuring regulatory compliance, improving data quality, and reducing late-stage rework. While industry practice has traditionally focused on validating transport (XPT) files immediately prior to submission, this approach delays the identification of structural and standards-related issues. This paper describes enterprise-grade approaches for integrating SAS Life Science Analytics Framework (LSAF) with Pinnacle 21 Enterprise to enable automated validation of SAS XPT datasets throughout SDTM and ADaM development. The four most commonly used architectural patterns for connecting LSAF and Pinnacle 21 are discussed: Pattern A, a straightforward manual file handoff in which a SAS programmer exports datasets from LSAF to a shared location, runs Pinnacle 21 manually (either on the desktop or server), and then uploads the resulting PDF/Excel reports back into LSAF; Pattern B, a batch/command-line execution triggered from LSAF using a system API key (common and practical); Pattern C, a pipeline integration with CI/CD concepts (quality gate); and Pattern D, a service-oriented integration (Enterprise P21 + metadata/workflow systems). Governance considerations and audit-readiness practices are also discussed, with emphasis on scalable and inspection-ready implementations.

DS-341 : DOSEON: Fuzzy Matching DOSE Date Intervals ON Analysis Dates Across SAS (PROC SQL, SAS Macro, DATA Step, and PROC FCMP)
Inka Leprince, InkaStat Solutions, LLC
Troy Hughes, Data Llama Analytics

In clinical trial analysis data sets, the treatment variable DOSEON, the treatment dose at record start, was introduced in ADaM OCCDS v1.1 for use in non-ADSL ADaM data sets. DOSEON is typically derived from exposure data by determining if and when an analysis (start) date falls within a treatment dose level start-end interval. Although the definition appears simple, performing the merge of exposure data with another analysis data set is complex because the CDISC data structures do not provide a natural primary key for joining these data sets. Instead, the derivation of DOSEON requires matching a single date to a date range. This paper explores procedural and functional approaches to solving this fuzzy matching lookup operation. Procedural approaches require one or more DATA steps and/or SAS procedures, as demonstrated with PROC SQL joins, a SAS macro, and a hash object lookup within a DATA step. In contrast, functional approaches leverage PROC FCMP to compile a user-defined function that is called in a single SAS statement – one line of code – to achieve equivalent, efficient, reusable functionality. Finally, this is the first published white paper to declare a hash iterator object as a parameter (or pass a hash iterator object as an argument), and the second to declare a hash object as a parameter (or pass a hash object as an argument).
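
The paper's implementations are SAS-based; purely as a point of comparison, the same date-within-interval lookup can be sketched in R with a dplyr 1.1 non-equi join. Dataset and variable names below are illustrative.

    # Hedged R analogue of the DOSEON fuzzy match: join on subject plus
    # "analysis date falls within the dose interval"
    library(dplyr)

    ex <- tibble::tribble(
      ~USUBJID, ~EXSTDT,                ~EXENDT,                ~EXDOSE,
      "01-001", as.Date("2026-01-01"),  as.Date("2026-01-14"),  10,
      "01-001", as.Date("2026-01-15"),  as.Date("2026-02-28"),  20
    )

    adxx <- tibble::tibble(USUBJID = "01-001", ADT = as.Date("2026-01-20"))

    adxx <- adxx %>%
      left_join(ex, by = join_by(USUBJID, between(ADT, EXSTDT, EXENDT))) %>%
      rename(DOSEON = EXDOSE)   # ADT of 20JAN matches the 20-unit interval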

DS-350 : Analysis Concepts' Role within the CDISC 360i Vision
Alyssa Wittle, Atorus Research
Brian Harris, AstraZeneca

Since January 2025, a CDISC working group has been working on defining a model for Analysis Concepts to address a gap in enabling the end-to-end linking, automation, and interoperability envisioned by the CDISC 360i initiative. Just as biomedical concepts (BC) serve as an essential semantic framework that connects and aligns various foundational standards in clinical research, analysis concepts can provide a similar semantic framework that not only extends into analysis and reporting components but also ensures a clear link to the protocol (as represented by the Unified Study Definition Model (USDM)) as well as the existing BC library. Analysis Concepts aim to: 1) provide a standardized, configurable framework for expressing how clinical questions are translated into analytical outputs, 2) enhance the statistical consideration component of the USDM and support creation of a digital, machine-readable Statistical Analysis Plan, and 3) inform both data collection needs and programming logic for analysis. Importantly, this work is being developed in alignment with other CDISC standards including USDM, CDASH, SDTM, ADaM, and ARS. Analysis Concepts focus on the clinical intent and analytical approach, which is then realized in these different standards. This presentation will share the following: 1) the overall scope and vision of CDISC 360i, including the key gap that Analysis Concepts seeks to fill and the linkages to existing CDISC standards, 2) the early model and draft controlled terminology developed by the working group, 3) a breast cancer use case, and 4) future directions for standardizing analysis planning and enabling traceable, automated pipelines.

DS-354 : SUPP to NSV: Transforming Data Representation for Improved Reviewer Utility
Soumya Rajesh, IQVIA

Ever since the advent of SDTM standards, non-standard data has been mapped using the vertical structure outlined in Supplemental Qualifier (SUPPQUAL or SUPP--) datasets. This has sometimes caused issues when the data needs to be merged back with parent domains and used for analysis further down the data reporting process. That will change significantly with the release of the new SDTM v3.0 and SDTMIG v4.0. This paper discusses the transformation from a vertical to a horizontal structure for mapping non-standard data to the newer, more efficient Non-Standard Variable (NS--) datasets in SDTM, which allows each NSV to have its own variable-level metadata, not just value-level metadata (VLM). We can now have numeric non-standard variables in addition to character ones, and an easier merge criterion for non-standard data with parent domains. This leads to more efficient management of NSVs (potentially leveraging them for inclusion in the NSV registry and perhaps even future promotion to standard variables). It also streamlines submissions by reducing structural limitations and promoting consistency within SDTM standards.

DS-360 : Improving Risk-Based QA in Outsourced Studies Using Cross-Domain ADaM Derivation Comparisons
Pallavi Sadhab, AstraZeneca

In outsourced clinical trials, sponsor oversight commonly relies on a risk-based quality assurance approach rather than full independent reprogramming. While this model effectively verifies the correctness of derived variables within individual ADaM datasets, it often relies on implicit assumptions that analytically related derivations implemented across datasets remain consistent. As a result, cross-domain consistency is frequently trusted rather than explicitly verified. This paper identifies a structural gap in standard risk-based QA practices and proposes a targeted sense-check strategy: systematic comparison of conceptually linked derivations across ADaM domains. In many studies, key derivations – such as analysis window end dates, first qualifying event dates, and censoring indicators – are repeatedly implemented across multiple time-to-event parameters, including composite and component endpoints. Although each derivation may adhere to its respective specification, subtle differences in logic (e.g., handling of early withdrawal, partial dates, or analysis window boundaries) can lead to analytical incoherence that is not detected by traditional domain-level validation. Using a detailed time-to-event example, this paper demonstrates how cross-domain consistency checks can identify discrepancies that standard risk-based QA does not capture, even when all individual datasets are technically correct. A practical, output-based framework is presented to formalize these commonly assumed checks, enabling sponsors to verify analytical coherence without re-derivation or expansion of validation scope. Incorporating cross-domain derivation comparisons into risk-based QA enhances sponsor oversight, improves confidence in complex analyses, and supports inspection readiness while remaining aligned with fit-for-purpose validation principles for outsourced studies.
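
As a concrete, hypothetical illustration of such a sense-check, the sketch below compares censoring dates derived independently for a composite endpoint and one of its components; the parameter codes and comparison rule are assumptions for illustration only, not the paper's own example.

    # Hedged sketch: flag subjects censored for both a composite endpoint (PFS)
    # and a component endpoint (TTP) but on different analysis dates
    library(dplyr)

    adtte <- tibble::tribble(
      ~USUBJID, ~PARAMCD, ~ADT,                   ~CNSR,
      "01-001", "PFS",    as.Date("2026-02-01"),  1,
      "01-001", "TTP",    as.Date("2026-01-15"),  1
    )

    mismatch <- adtte %>% filter(PARAMCD == "PFS") %>%
      inner_join(adtte %>% filter(PARAMCD == "TTP"),
                 by = "USUBJID", suffix = c("_pfs", "_ttp")) %>%
      filter(CNSR_pfs == 1, CNSR_ttp == 1, ADT_pfs != ADT_ttp)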

DS-364 : Two Approaches to Phase-Specific TRTEMFL in ADAE: A Neoadjuvant–Adjuvant Case with Surgery Between Phases
Youlan Shen, Merck & Co., Inc
Leah Suttner, Merck & Co., Inc.

Most ADAE implementations assume a single treatment phase. The standard TRTEMFL derivation works well in a single-phase setting, but it does not provide the flexibility to attribute adverse events (AEs) to distinct clinical phases (e.g., neoadjuvant vs. adjuvant). In multi-phase trials, treatments, mechanisms, and risk windows often differ by phase; merging all AEs under a single TRTEMFL obscures phase-specific risk. This paper uses a neoadjuvant–surgery–adjuvant oncology case to show how to define TRTEMFL in a multi-phase study without duplicating AE occurrences across phases. The paper proposes two specification approaches for separating treatment-emergent AEs by phase: (1) assign APERIOD and derive a single TRTEMFL; and (2) derive phase-specific flags (TRTEM0xFL) alongside an overall TRTEMFL. Both approaches support phase-specific AE tables. With a practical specification and SAS code example from a neoadjuvant–adjuvant case, the paper explains the similarities and differences of the two approaches.
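
The paper's examples are in SAS; a language-neutral sketch of approach (2) might look like the following, where the phase boundary dates and flag names are illustrative assumptions.

    # Hedged sketch of phase-specific treatment-emergent flags
    library(dplyr)

    adae <- tibble::tibble(
      USUBJID = "01-001",
      ASTDT   = as.Date(c("2026-01-10", "2026-04-05")),  # AE start dates
      TRTSDT  = as.Date("2026-01-01"),                   # neoadjuvant start
      AP02SDT = as.Date("2026-04-01")                    # adjuvant start
    )

    adae <- adae %>%
      mutate(
        TRTEMFL   = if_else(ASTDT >= TRTSDT, "Y", NA_character_),
        TRTEM01FL = if_else(ASTDT >= TRTSDT & ASTDT < AP02SDT, "Y", NA_character_),
        TRTEM02FL = if_else(ASTDT >= AP02SDT, "Y", NA_character_)
      )
    # Each AE is flagged in exactly one phase, avoiding duplicate occurrences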

DS-368 : Beyond WHODrug: Insight into Concomitant Medication Data Analysis
Song Liu, Novo Nordisk

Concomitant medication (conmed) data are not just WHODrug B3/C3 coding. ICH guidance requires sponsors to collect conmeds. It is important to assess their impact on subject safety, drug-related adverse events, confounding or additive effects on efficacy endpoints, and population pharmacokinetics (PopPK) modeling. Conmeds such as systemic anti-cancer therapy can be intercurrent events under the ICH E9(R1) addendum, affecting the estimated treatment effect on disease progression. In cardiovascular trials, both baseline and on-treatment diuretic use can affect treatment effects in heart failure subjects. This paper will share the PHUSE conmed data collection standard, strategies for avoiding unreliable and dirty data, the ADaM data standard for connecting conmeds with medical history and adverse events, common statistical analyses and displays involving conmeds, and pragmatic methods for deriving average daily dose for study-specific conmeds.

DS-372 : Handling multiple screenings and multiple enrollments in SDTM: CDISC and FDA Guidance
Laura Williams, Alira Health
Andrea Gardani, Alira Health

It is becoming increasingly common for clinical studies to allow multiple screenings and/or multiple enrollments for the same subject. There is no well-established guidance from CDISC or FDA about how these subjects should be handled in SDTM for regulatory submissions. Currently, there is a proposal from CDISC to handle such cases in a special-purpose domain, Demographics as Collected (DC), that could be added to future versions of SDTM. This seems to align with FDA’s suggestions in the Technical Conformance Guide; however, FDA does not give formal direction either. Still, there are some tips that can be taken from current guidance and used in practice, keeping in mind that sponsors should discuss the approach with their FDA reviewer prior to submission. Consider these three possible scenarios: a subject is screened multiple times and fails each time; a subject is screened and fails the first time(s) but upon rescreening is eligible and enrolls; or a subject is enrolled multiple times within the same study. This paper will summarize the guidance that is currently available, what is proposed by both CDISC and FDA, and practical examples in SDTM.

DS-401 : Why Standards Matter More Than Code in the Age of GenAI
Bhavin Busa, Clymb Clinical

The growing adoption of generative AI (GenAI) in statistical programming and clinical reporting has raised important questions around reliability, reproducibility, and regulatory acceptability. While large language models (LLMs) are inherently probabilistic, their outputs become significantly more predictable when guided by well-defined, machine-readable specifications. CDISC Analysis Results Standards (ARS), together with emerging Analysis Concepts (AC), provide a structured foundation for enabling standards-driven automation in GenAI-enabled workflows. This paper demonstrates how ARS-based metadata can be used as formal input to LLMs to support rapid and reliable generation of Tables, Figures, and Listings (TFL) code. By supplying LLMs with ARS metadata, explicit analysis intent, and predefined SAS macros and R functions, TFL programs can be generated in seconds with minimal post-generation refinement. The approach significantly reduces manual programming effort while improving consistency and traceability between specifications and outputs. The work further illustrates how this standards-driven use of GenAI shifts the focus of statistical programming away from manual code development toward design, validation, and governance of analysis specifications. Overall, this paper demonstrates how CDISC standards can play a critical role in enabling reliable, controlled, and regulator-ready use of generative AI in clinical reporting.

DS-423 : Intelligent Implementation of Data Standards: A New Era of Efficiency and Consistency
Prasoon Sangwan, TCS

Clinical data standards such as CDISC SDTM, ADaM, and ARS are essential for ensuring regulatory compliance, data consistency, and interoperability in clinical research. However, implementation of clinical data standards remains a resource-intensive and error-prone process. Advances in artificial intelligence (AI) and metadata-driven automation offer a transformative opportunity to simplify this process. This paper describes an AI-enabled approach to streamline and enhance standards implementation across the clinical data submission pipeline. It presents an AI-enabled framework that automates the interpretation of standards, accelerates metadata and variable mapping, and generates standardized specifications and outputs. The approach reduces manual effort, strengthens traceability, and supports continuous compliance with evolving regulatory expectations. While human oversight remains critical for scientific judgment and regulatory interpretation, AI serves as a powerful augmentation to traditional implementation workflows. This approach can help organizations achieve scalable consistency and future-ready standards adoption in an increasingly complex clinical research environment.

DS-440 : From Specification to SDTM at Speed: Deploying the SDTM Engine in Production
Lynn Xiuling Zhang, Merck & Co., Inc.
Jacques Lanoue, Merck
Ulf Nielsen, MSD

Clinical research faces a persistent clinical data bottleneck: manual, fragmented workflows delay compliant SDTM delivery. This paper introduces the SDTM Engine, a metadata-driven, multilanguage platform (Python, SAS, SQL) that unifies ingestion, specification authoring, and generation of audit-ready SDTM outputs. Centralizing mappings and derivations in a machine-understandable specification (MUS), the Engine collapses mapping and programming into a single, natural-language-like authoring step, reducing Excel-based instruction overhead and accelerating delivery with traceable, executable specifications. Key capabilities include pre-defined and extendable derivation functions, Bring Your Own Function (BYOF) with governed workflows via GitHub, comprehensive lineage and traceability reporting, and embedded automation for validation and observability. Cloud-native architecture and a centralized metadata store enable scaling across heterogeneous sources (EDC, labs, operational systems) while ensuring completeness, consistency, and compliance. To maintain speed without compromising GxP policy, the program adopts a gated SDLC with automated testing and parallel BYOF development under a separate SOP, yielding predictable, audit-ready releases. Beyond technical advances, the Engine catalyzes business transformation by redefining roles (Study Lead, Developer/System Architect, Metadata Analyst), formalizing BYOF lifecycle artifacts, and investing in training for MUS authoring and multilanguage proficiency. The results demonstrate faster, safer SDTM delivery; reduced handoffs; improved transparency; and enhanced reuse, positioning teams to meet evolving regulatory expectations with higher data quality and scalability. The paper provides practical guidance on architecture, workflow patterns, features, and validation strategies to modernize SDTM automation across complex trials.

Data Visualization & Interactive Analytics

DV-105 : A Standardized R Graph Library for Production-Ready Analysis Figures
Chunting Zheng, Syneos Health
Margaret Huang, Vertex Pharmaceuticals, Inc.
Xindai Hu, Vertex Pharmaceuticals Inc

Production of Clinical Study Report (CSR) figures requires strict adherence to internal standards, including exact specifications for fonts, layouts and aesthetics. Traditionally, programmers implement the same code blocks for each new figure, leading to redundancy, inefficiency, and increased risk of inconsistency. To address this challenge, we developed a comprehensive R graph library that automates the creation of actual analysis figures from standardized templates for common graph types such as scatter, bar, box, swimmer, line, spaghetti, forest and Kaplan-Meier plots. Each template encapsulates a uniform structure and theme, ensuring consistency across projects while minimizing the need to manually reproduce code. At the same time, flexibility is preserved through high-level parameters (e.g., point size, opacity, text size, model type, formula) and customized arguments, which accept additional layers for plot-specific customization. This design is based on the R ggplot2 package; however, it allows users to produce production-ready CSR figures with concise, readable code, while maintaining alignment with industry practices. To demonstrate the utility of this approach, we provide side-by-side comparisons showing the difference between R and SAS code. Our R graph library reduced code complexity, improved reproducibility, and improved readability across many CSR graph scenarios. Our templated, automation-ready system balances efficiency, flexibility, and regulatory compliance, ultimately streamlining the generation of high-quality CSR graphics. Keywords: R programming, ggplot2, automation, uniform, CSR figures, reproducibility, efficiency, accurate, data visualization
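
The template idea can be pictured as a function that encapsulates the house theme while exposing a few high-level knobs. The sketch below is a hypothetical reduction of such a library to a single scatter template; the parameter set is an assumption, not the authors' actual API.

    # Hedged sketch of a standardized scatter template
    library(ggplot2)

    csr_scatter <- function(data, x, y, group,
                            point_size = 2, alpha = 0.8, base_size = 9) {
      ggplot(data, aes(x = {{ x }}, y = {{ y }}, color = {{ group }})) +
        geom_point(size = point_size, alpha = alpha) +
        theme_bw(base_size = base_size) +                 # uniform CSR look
        theme(legend.position = "bottom",
              panel.grid.minor = element_blank())
    }

    p <- csr_scatter(iris, Sepal.Length, Sepal.Width, Species) +
      labs(x = "Sepal Length", y = "Sepal Width")         # plot-specific layer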

DV-120 : Swimmer Plots – Some Practical Advice
Ilya Krivelevich, Eisai Inc.

Graphs are an integral part of modern data analysis of clinical trials. Viewing data in a graph along with the tabular results of a statistical analysis can greatly improve understanding of the collected data. Visualized data can very often be the most informative way to understand the insights from the results. Swimmer plots are an effective graphical presentation of subject status and longitudinal data such as duration of treatment, dose changing, occurrences and durations of events. This type of graph is usually very popular in the early phases of drug development (Phase I / Phase II). An essential objective for medical monitors is to make it possible to visually review when specific medications were administered in response to specific safety and efficacy information throughout the study duration. This visual representation is crucial for tracking patient responses and treatment safety/efficacy. Each subject is represented by an individual horizontal bar (lane). There are many possibilities, with the main restrictions being considerations of readability and not overloading the plot with too much information. This paper aims to provide some practical advice on how to overcome such restrictions and make enhanced swimmer plots more readable and informative.
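
For readers new to the plot type, a bare-bones swimmer lane might be sketched in ggplot2 as below (hypothetical data; the paper's enhancements go well beyond this starting point).

    # Hedged sketch: one lane per subject, bar for duration, markers for events
    library(ggplot2)

    swim   <- data.frame(subj  = c("1001", "1002", "1003"),
                         weeks = c(24, 10, 36))
    events <- data.frame(subj = c("1001", "1003"),
                         week = c(8, 20),
                         type = c("Response", "Dose reduced"))

    ggplot(swim, aes(y = reorder(subj, weeks))) +
      geom_segment(aes(x = 0, xend = weeks, yend = reorder(subj, weeks)),
                   linewidth = 4, color = "grey70") +
      geom_point(data = events, aes(x = week, y = subj, shape = type), size = 3) +
      labs(x = "Weeks on treatment", y = "Subject", shape = NULL) +
      theme_minimal()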

DV-151 : The (ODS) Output of Your Desires: Creating Designer Reports and Data Sets
Louise Hadden, Cormac Corporation

SAS® procedures can convey an enormous amount of information – sometimes more information than is needed. Most SAS procedures generate ODS objects behind the scenes. SAS uses these objects with style templates that have custom buckets for certain types of output to produce the output that we see in all destinations (including the SAS listing). By tracing output objects and ODS templates using ODS TRACE (DOM) and by manipulating procedural output and ODS OUTPUT objects, we can pick and choose just the information that we want to see. We can then harness the power of SAS data management and reporting procedures to coalesce the information collected and present the information accurately and attractively. This presentation is suitable for all levels of proficiency and will be useful for programmers working in all industries. Examples shown were run using SAS 9.4 Maintenance Release 8 on a Unix Server platform, using Display Manager, Enterprise Guide, and batch processing.

DV-155 : A Map to Success with Data Visualization Using ODS Statistical Graphics
Richann Watson, DataRich Consulting
Louise Hadden, Cormac Corporation

Creating custom graphics does not have to be a daunting experience. Anyone who has produced a graph using ODS Graphics has unknowingly used the Graph Template Language (GTL). We take you on a guided tour of how to create a truly custom graph. Our first stop starts with an illustration of a basic plot with little complexity produced with Statistical Graphics (SG) procedures. We then make a pit stop with the TMPLOUT option to help convert the simple plot to GTL. On our road to create a custom graph we need to get out our map to build a map. Our last stop of this adventure takes us to the combining of these two graphs to illustrate the power of GTL to truly customize your graphs.

DV-161 : From Reactive to Proactive: Transformation of clinical trial monitoring through Agentic AI for Smarter, Safer Clinical Trials
Rohit Kadam, Tata Consultancy Services
Saurabh Das, Tata Consultancy Services
Niketan Panchal, Tata Consultancy Services

The clinical trial ecosystem is undergoing rapid transformation driven by increasing protocol complexity, stringent regulatory requirements, and the pressure for accelerated drug development. In this evolving landscape, effective trial monitoring is critical to safeguarding patient safety and ensuring data integrity. Yet, conventional monitoring approaches remain largely retrospective, resource-heavy, and slow to detect emerging risks, leading to delayed interventions and reactive decision-making. To overcome these limitations, we introduce an AI-powered Clinical and Operational Monitoring solution built on an Agentic AI framework. This innovative system leverages multiple specialized AI agents to continuously analyze patient-, site-, and study-level data across diverse sources and visits. By automating interpretation and summarizing complex datasets, it significantly reduces cognitive burden, enhances monitoring efficiency, and ensures compliance with Risk-Based Quality Management (RBQM) principles. Most importantly, it enables proactive, real-time risk identification and mitigation, transforming trial oversight from a reactive process into a predictive, intelligence-driven paradigm. This approach not only accelerates decision-making but also sets a new benchmark for smarter, safer, and more agile clinical trials. Key highlights: (1) Agentic AI framework: a multi-agent architecture enabling continuous, intelligent monitoring across patient, site, and study levels; (2) Proactive risk detection: real-time identification and mitigation of emerging risks, shifting from reactive to predictive oversight; (3) Efficiency and compliance: automated data interpretation reduces cognitive load, accelerates decision-making, and aligns with RBQM principles; (4) Transformative impact: sets a new benchmark for smarter, safer, and more agile clinical trials through AI-driven operational excellence.

DV-165 : Voice-driven Data Science: Real-Time Data Analysis with R
Phil Bowsher, RStudio Inc.

Voice-driven interaction is a popular way to work with LLMs. This presentation introduces shinyrealtime, an open-source package integrating OpenAI’s Realtime API with Shiny (R and Python) to create conversational analyses and workflows. This talk will demonstrate how statistical programmers can interact with datasets, generate visualizations, and execute queries using natural speech. The talk will focus on the packages and architecture for using and deploying voice interfaces and integrating the Realtime API’s conversational tools. Attendees will gain practical insight into current innovations for building open-source workflows that prioritize voice-driven analysis, with discussion of topics applicable to pharma.

DV-181 : Ten Rules for Better Charts, Figures and Visuals
Kirk Lafler, sasNerd

The production of charts, figures, and visuals requires presenting data in the most effective way possible. However, this process is neither simple nor automatic. The same dataset can be represented using numerous visual formats – histograms, scatter plots, bar charts, line graphs, pie charts, and more. In addition, the effectiveness of a visual can be significantly influenced by design choices such as color, shading, gradients, and other stylistic elements. To ensure the creation of clear and impactful graphics, it is essential to follow key principles: choose the most appropriate visual for the data, present information in a clear and non-confusing manner, and ensure that the final product is neat, readable, and easy to interpret. This presentation builds upon the work of Nicolas P. Rougier, Michael Droettboom, and Philip E. Bourne by highlighting and expanding on ten rules for improving the quality and effectiveness of charts, figures, and visuals.

DV-188 : Boston Breakthroughs: A Dashboard-Driven Approach to Metadata and Audit Trails with SAS Clinical Acceleration
Frances Gillespie, SAS Institute
Laura Watson, SAS Institute

Clinical trial organizations face increasing demands for transparency, traceability, and regulatory compliance. This presentation introduces the concept of growing regulatory expectations in clinical trials, followed by an overview of the importance of metadata and audit trails in ensuring data integrity and submission readiness. An interactive dashboard built using SAS Visual Analytics on Viya will be demonstrated, leveraging metadata and audit trail data from a clinical trial stored in SAS Clinical Acceleration Repository on Viya. The dashboard visualizes metadata changes and audit trail logs, illustrating why understanding and interpreting these elements is critical for compliance and efficient workflows. The session highlights how dashboard driven insights enable clinical programmers, data managers, and regulatory professionals to conduct faster data reviews, improve traceability, and support regulatory submissions through enhanced transparency.

DV-214 : Introduction to Plotting with the PROCS Package
David Bosak, r-sassy.org

The “procs” R package aims to simulate several SAS procedures in R. For instance, the package has functions like “proc_freq()”, “proc_ttest()” and “proc_reg()” that replicate the basic functionality of the corresponding SAS procedures. A major feature missing from these replicas has been the ability to generate plots. That missing piece has been rectified in a recent release of the “procs” package. This paper will discuss which functions now support plots, and how to generate them. The paper will give several examples demonstrating plot options, and also show how they can be exported and used in a report.

DV-229 : Python for Survival Analysis: Kaplan-Meier and Reverse KM Plots Made Easy
Girish Kankipati, Pfizer Inc
Bala Rajesh Jakka, Pfizer

Survival analysis is a key component of clinical trials, providing insights into time-to-event outcomes such as overall survival and progression-free survival. The Kaplan-Meier (KM) estimator is the most widely used method for estimating and visualizing survival probabilities over time. This paper presents a practical approach to generating KM plots using Python, a language increasingly adopted for clinical data analysis. Techniques for plotting KM curves for single and multiple treatment groups, adding confidence intervals, and customizing charts for regulatory reporting are discussed. Python libraries such as Matplotlib and Plotly are used to create both static and interactive visualizations, supporting reproducible and automated workflows. The paper also highlights the Reverse Kaplan-Meier (RKM) method, which estimates follow-up time by treating censoring as an event. RKM is widely used to report median follow-up in clinical trials and complements standard KM analysis. Practical examples demonstrate how to implement RKM plots in Python, calculate median follow-up, and present results in a clear and compliant format. All examples use reproducible Python code and simulate clinical trial data, making them easy to adapt for real-world applications. Leveraging Python for KM and RKM visualization provides flexibility, automation, and high-quality outputs that meet industry standards and support better decision-making in clinical research.

DV-232 : AI-Recommended Color Palettes with QC for R Shiny Figures
Yi Guo, Pfizer Inc.

In the pharmaceutical industry, the safe and controlled use of artificial intelligence (AI) remains an active topic of discussion. If AI is viewed as a programmer, concerns often focus on unforeseen bugs and outputs. When viewed as an artist, AI can offer unexpected inspiration and expand creative possibilities. In this paper, we use AI as an artist-inspired tool to extend the dynamic color selection feature of an existing R Shiny figures application. Using the {openai} package, the app calls a large language model (LLM) via an API to generate candidate color palettes based on user-defined themes. Within the app, each subgroup is assigned a unique color picker with a default color and an option for manual adjustment, while natural language prompts are used to guide the LLM in generating candidate color palettes for subgroup-level visualization. However, this alone is not sufficient, as an artist’s aesthetic choices may not always meet regulatory submission requirements. To address this, we introduce a QC-oriented evaluation approach based on established color metrics, with primary emphasis on minimum thresholds for luminance, contrast ratio, and color difference (ΔE), and optional consideration of color difference for colorblindness (ΔE(cb)) as a supplementary reference. Rather than treating AI outputs as final results, these metrics provide quantitative QC references to support interactive, human-in-the-loop color selection in regulated visualization workflows. By complementing visual judgment with reference values, this approach improves efficiency, reduces reviewer-to-reviewer variability, and lowers the chance of missing good color options.
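
The QC side need not depend on the LLM at all: for example, WCAG-style relative luminance and contrast ratio can be computed directly from hex colors, as in the sketch below. The formulas follow WCAG 2.x; the pass/fail thresholds themselves would come from a team's own criteria, which this sketch does not assume.

    # Hedged sketch: WCAG 2.x relative luminance and contrast ratio in base R
    rel_luminance <- function(hex) {
      rgb <- grDevices::col2rgb(hex) / 255
      lin <- ifelse(rgb <= 0.03928, rgb / 12.92, ((rgb + 0.055) / 1.055)^2.4)
      as.numeric(0.2126 * lin[1, ] + 0.7152 * lin[2, ] + 0.0722 * lin[3, ])
    }

    contrast_ratio <- function(hex1, hex2) {
      l <- sort(c(rel_luminance(hex1), rel_luminance(hex2)), decreasing = TRUE)
      (l[1] + 0.05) / (l[2] + 0.05)            # WCAG contrast ratio, 1 to 21
    }

    contrast_ratio("#1B9E77", "#FFFFFF")       # QC a palette color against white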

DV-294 : Composite TLFs – A Combined Approach to Data Visualization
Jesse Pratt, PPD, part of Thermo Fisher Scientific
Rayce Wiggins, Thermo Fisher Scientific

Clinical trial reviewers are routinely required to interpret results across separate tables, listings, and figures (TLFs). While each output serves a specific purpose, reviewing them independently introduces inefficiencies, increases cognitive burden, and raises the risk of missed signals. This paper introduces Composite TLFs, a unified reporting approach that programmatically integrates tables, listings, and figures into single, cohesive outputs. Composite TLFs consolidate logically related summary, subject-level, and graphical information into one output, allowing reviewers to assess results holistically without navigating between files. Examples are presented using simulated data and implemented in both SAS® and R, demonstrating that this approach is software-agnostic and readily adoptable across clinical programming environments.

DV-298 : Dynamic Patient Profile Plot Development with SAS Graph Template Language
Raghava Pamulapati, Merck

This paper presents a SAS Graphics Template Language (GTL) macro that generates dynamic, integrated patient profile plots to enhance safety review, especially for drug-induced liver injury (DILI). Unlike traditional listings, the approach visualizes multi-domain patient data on a common study-duration axis, consolidating treatment exposure, key laboratory parameters with reference ranges, adverse events with timing and severity, and optional concomitant medications. The result is a clear, time-aligned visualization that supports rapid detection and contextual interpretation of potential safety signals. The solution addresses key automation challenges: harmonizing heterogeneous ADaM domains (ADSL, ADEX, ADLB, ADAE, ADCM), reconciling differing structures and timing conventions, and balancing readability amid varying data density. Core components include a data integration layer, a dynamic space allocation engine for weight-based panel sizing, an intelligent pagination system to manage overflow, subject-specific GTL template generation, and output management for publication-quality RTF/PDF deliverables. The profiles are aligned with FDA Safety Tables and Figures (SSTF) guidance. Graphical profiles emphasize critical lab trajectories relevant to DILI – annotating baseline, onset, peak, and recovery, and highlighting thresholds (e.g., ALT > 3 × ULN) – while correlating labs with adverse events and concomitant medications. Innovations such as conditional template logic and automated panel spacing produce consistent, interpretable outputs across subjects, streamlining patient-level review and improving signal detection.

DV-304 : Path to Consistent Clinical Graphics: An R Shiny Gallery
Michelle Harwood, Quantitative Sciences, Alexion, AstraZeneca Rare Disease
Austin Taylor, AstraZeneca RDU

Consistency in clinical reporting is critical, yet figure styles often vary across studies and teams. Limited awareness of modern plotting capabilities often leads teams to revert to outdated or simple visualizations. We developed an R Shiny application that serves as a centralized gallery of commonly used clinical figures. Each figure is paired with interactive customization and replicable R code. Our goal is to standardize figures and coding practices to improve cohesion, reproducibility, and efficiency across projects. The application provides a gallery of commonly used figures, such as the Kaplan-Meier, forest, and waterfall plots, while allowing users to tune statistical layers (confidence intervals, censoring marks) and layout options (faceting, annotations) through an intuitive interface. Every selection dynamically updates both the plot and the underlying R code snippet, enabling all users, including new R users, to adopt best practice templates, learn by example, and rapidly produce fit-for-purpose figures. A key design principle is that the code operates directly on CDISC Pilot ADaM data, increasing reproducibility and seamless integration into existing analysis workflows. This paper will describe the application architecture, standardization guidelines, and the customization options available to users. We will also share lessons learned on building the application and onboarding new R users. This approach encourages best practices and cohesion, enabling teams and new R users to produce consistent, reproducible clinical graphics with confidence.

DV-308 : Designing a modular and interactive visualization tool for DMC
Chen Li, Boehringer Ingelheim
Hong Wang, Boehringer Ingelheim
Shu Chen, Boehringer Ingelheim
Xuan Jiang, Brown University

During DMC meetings, it is important to cross-check both aggregated data and individual records from areas such as adverse events, lab results, and other domains to better understand safety concerns. Traditionally, static reports require manual searches through patient profiles or the creation of extra listings. R Shiny can address this need more efficiently by providing dynamic visualization. We have developed a modular R Shiny application featuring summary tables, listings, figures, and patient profiles that are logically interconnected, which makes the review process more interactive and streamlined. The modular development approach allows us to incorporate both internal and external packages, with the flexibility to add custom features as needed. Challenges include coordinating between various packages, linking IDs across modules, and troubleshooting nested functions. By thoroughly reviewing the source code, we fixed errors in the log and resolved the compatibility issues. After receiving positive feedback from clinical trial teams, we are now preparing for a pilot trial to evaluate the shiny app’s functionality.

DV-314 : From Exploratory Data Analysis to Machine Learning – Continuing My Python Journey
Leon Davoody, Student

This paper builds on my previous work with the Summer Olympics 2016 dataset, showing how Python can be used not just to analyze data, but to make predictions and discover patterns. I’ll share what I learned about machine learning algorithms, data visualization, and how these skills helped me understand sports performance in new ways.

DV-357 : Three Ways to Over-Engineer Your SAS Custom Steps
Mary Dolegowski, SAS
Robert Collins, SAS Institute

Custom Steps in SAS Viya® offer a way to create reusable, user-friendly code for specific tasks. Custom Steps are created as flows which can include SAS and open source code as well as other custom steps or macros. In addition, Custom Steps allow you to create a flexible graphical user interface to guide users when setting parameters and options. This paper illustrates three ways to enhance your Custom Steps by: implementing custom warnings, errors, and debug logic; managing dependencies; and using hidden reference tables.

DV-371 : Interactive Safety Data Visualization Platform: Transforming Adverse Event Analysis Review Through Dynamic Dashboards in Clinical Trials
Nishanth Chinthala, AstraZeneca Pharmaceuticals

Traditional safety data presentations in clinical study reports rely heavily on static tables, requiring manual review of extensive tabulations to identify adverse events (AEs) and serious adverse events (SAEs) of special interest. This process is time-intensive, tedious, and prone to oversight, particularly given the regulatory guidance requiring comprehensive safety endpoint presentations including confidence intervals, Kaplan-Meier estimates, and treatment comparisons at System Organ Class (SOC) and Preferred Term (PT) levels. We developed an R Shiny web application featuring interactive adverse event safety tables with advanced filtering, searching, and collapsible hierarchical display capabilities. The platform was built using pre-analyzed datasets from statistical computing environments without performing in-app derivations. The interface prioritizes minimalism and usability to reduce cognitive load, with dynamic filtering, drill-down, and search that streamline exploration of lengthy outputs. Users can efficiently explore AE data by SOC and PT levels, apply multiple filters simultaneously, and focus on key safety signals without manual table navigation. The interactive safety data visualization platform represents a significant advancement in clinical trial safety analysis, addressing regulatory guidance requirements while dramatically improving reviewer efficiency. The minimal viable product establishes a foundation for future enhancements, including Kaplan-Meier curve visualization and expansion to additional safety datasets. This approach demonstrates the transformative potential of interactive data visualization in regulatory science and clinical decision-making.

DV-382 : From Static to Dynamic: Leveraging R Shiny for Tumor Response Data Review
Reneta Hermiz, Pfizer Inc.
Jing Ji, Pfizer Inc
Amrit Pradhan, Pfizer Inc.
Martin Sandel, Pfizer Inc.

In oncology studies that follow RECIST guidelines, programmatically derived tumor response is commonly compared to investigator-assessed tumor response to identify and reconcile discrepancies, ensuring adherence to RECIST. However, the review process can be time-consuming due to the volume of data and the need for multiple review cycles. This paper presents the development of an R Shiny application whose purpose is to bypass the tedious and error-prone process of manual review. The application will include the following components: Subset View – a set of custom filters such as discrepant response, stratification, or other categories of interest to subset a selectable Patient ID list. Patient ID selection will trigger display of: 1. Patient Level – displays data that are unchanged across cycles (randomization date, Best Overall Response (BOR)/Progression-Free Survival (PFS) comparisons). Discrepant comparisons will be flagged with color. 2. Assessment Level – a selectable listing of investigator and derived response at each assessment, along with other data of interest such as imaging modality deviations and partial lesion measurements. Discrepancies between investigator and derived response will be highlighted. Selecting an assessment will trigger display of the Lesion Level. a. Lesion Level – a listing of individual lesions at the selected and prior assessments. Details such as lesion measurements, modality, and location will be displayed. Source Data View – SDTM data listings that can be used to facilitate review or for further exploration if requested by the reviewer. This integrated approach improves transparency and usability and allows real-time access to data updates, streamlining the oncology data review process.

DV-383 : Closing the Loop: An Interactive R Shiny Dashboard for EDC Data Visualization and Real-Time Review Tracking
Jun Yang, Organon LLC
Yuying Jin, Organon

Clinical data review often involves a fragmented workflow where medical monitors and data managers switch between EDC systems, static PDF listings, and Excel trackers. This silos the data from the review process, making it difficult to track query status in real-time. There is a critical need for a centralized platform that combines data visualization with a formal “review and comment” loop. This paper introduces a “Full-Stack” R Shiny application designed to bridge the gap between raw EDC data and structured data review. The application utilizes the shinydashboard framework to provide a multi-tab interface for loading raw datasets (CSV/SAS7BDAT). Interactive visualizations, including patient profiles and safety trend plots, are generated using Plotly and ggplot2. The core innovation is the integration of the DT (DataTables) package with a MySQL database backend. By leveraging DBI and pool packages, the app allows users to select specific rows, mark them as “Reviewed,” and enter comments directly within the UI. These entries are immediately persisted to a MySQL table, ensuring a single, auditable source of truth for the review status. Implementing this dashboard significantly reduces the time from data entry to insight. By automating the link between visualization and documentation, study teams can identify outliers faster and maintain a transparent audit trail of the review process. The session will conclude with a demonstration of the “Write-Back” logic and database schema considerations for clinical environments.
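
The write-back step itself is compact. A hedged sketch of persisting a review comment through a pooled MySQL connection follows; the table and column names are hypothetical, and in the full app this logic would run inside a Shiny observer.

    # Hedged sketch of the review "write-back" via DBI/pool
    library(pool)
    library(DBI)

    pool <- dbPool(RMariaDB::MariaDB(),
                   dbname = "review_db", host = "db-host",
                   user = "reviewer", password = Sys.getenv("DB_PWD"))

    save_review <- function(pool, usubjid, status, comment) {
      dbExecute(pool,
        "INSERT INTO review_log (usubjid, status, comment, ts)
         VALUES (?, ?, ?, NOW())",
        params = list(usubjid, status, comment))
    }

    # e.g., inside observeEvent(input$save_btn, ...):
    # save_review(pool, selected$USUBJID, "Reviewed", input$comment)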

DV-409 : Be a Multi-Media Wizard: Make Your Output Dance and Sing
LeRoy Bessler, Bessler Consulting and Research

Static and simple presentation of information usually communicates adequately, but SAS does offer capabilities to take advantage of more communication channels and methods. Among the possibilities are audio, video, animation, a marquee for your web page (the analogue of the traveling text at the bottom of a television news program, where it is called a chyron, not a marquee), dozens of things that you can do with images (there are so many different ways you can put them on a web page), examples of 3D graphics that actually communicate versus the bad examples that ALWAYS distort the information, and even the odd phenomenon of making a line of text twinkle.

DV-419 : From Word Clouds to Phrase Clouds to Amaze Clouds: A Data-Driven Python Programming Solution To Building Configurable Taxonomies That Standardize, Categorize, and Visualize Phrase Frequency
Troy Hughes, Data Llama Analytics

Word clouds visualize the relative frequencies of words in some body of text, such as a website, white paper, blog, or book. They are useful in identifying contextual focus and keywords; however, word clouds – as commonly defined and implemented – suffer numerous limitations. First, multi-word phrases such as “data set” or “Base SAS” are unfortunately segmented into single words: “data,” “set,” “Base,” and “SAS.” Second, desired capitalization often cannot be specified, such as visualizing “PROC PRINT” even when its lowercase “proc print” is observed in text or code. Third, spelling variations (e.g., singular and plural nouns, various verb conjugations, abbreviations and acronyms) are not mapped to each other. Similarly, and fourth, comparable words or phrases (e.g., “PROC PRINT” and “PRINT procedure”) are not mapped to each other, representing a further lack of entity resolution. This text and its Python Pandas solution seek to overcome the data quality, data integrity, and data standardization issues that plague word clouds by defining and applying configurable taxonomies – data models that can impart more meaning and precision to the ultimate word/phrase cloud visualizations. The result is a phrase cloud that amazes – an amaze cloud!
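
The paper's implementation is in Python Pandas, but the taxonomy idea is language-agnostic. A toy R sketch of the core move – mapping raw variants to canonical phrases before counting – is shown below; the taxonomy contents are illustrative only.

    # Hedged sketch: count phrase variants, then roll them up to canonical terms
    taxonomy <- c("proc print"      = "PROC PRINT",
                  "print procedure" = "PROC PRINT",
                  "data set"        = "data set",
                  "dataset"         = "data set")

    text <- tolower("The PRINT procedure ... proc print ... a dataset or data set")

    counts <- sapply(names(taxonomy), function(pat)
      lengths(regmatches(text, gregexpr(pat, text, fixed = TRUE))))

    tapply(counts, taxonomy[names(counts)], sum)   # e.g., PROC PRINT = 2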

DV-434 : The Best Data Dashboard Alternative: More Efficient But Equally Effective Performance Monitoring and Reporting
LeRoy Bessler, Bessler Consulting and Research

You can build a web and email based Performance Monitoring and Reporting package with just Base SAS, ODS Graphics, and ODS HTML5, for an incisive package to make best use of the time, and seize the attention, of its users. Show Them What’s Important! An email, triggered by conditions in the data, can attach (or link to) an Exception Report. It can link to all other relevant information, or you can attach a zip file of the entire package. Why would anybody bother to look at a report when there are no exceptions at the latest monitoring date? A Sufficient and Necessary Reporting Structure (all parts interlinked) consists of: (a) Exception Report (produced only when any occur, OR one line: “No Exceptions”); (b) Regularly scheduled Summary Listing of Actual versus Standard for Every Measurement Being Monitored at the Current Reporting Cycle (each Actual has a link to its Trend) – for users who want to see this every time, attach it or link it in the email; (c) For Each Measurement, Trend Plot of Actual Start Date through Current (with a reference line at the monitoring standard) – these are linkable from each of the measurement entries in the Exception and Summary Reports. If the user of your deliverable REALLY needs extra visual gadgets, like dials and gauges, you could pay for an extra license for SAS/GRAPH to use PROC GKPI, in order to report on the “Key Performance Indicators” as an add-on to this inherently sufficient communication package for the essential monitoring information.

DV-443 : From Static Outputs to Living the Data – A Visualization framework transforming Clinical Data into a Continuous Asset
Neharika Sharma, GlaxoSmithKline Pharmaceuticals

Keeping unmet patient needs at the center of drug development, delays from the pace of reporting processes directly translate into lost opportunities. Where power is often said to lie in the data, in the drive for speed it is worth reflecting: are we truly learning from the data to its full potential? Reliance on labor-intensive static TLFs, beyond its cost, slows adaptation and constrains rapid insight generation and decision-making. This presentation introduces an R Shiny and Java based data visualization framework: a dynamic, interactive approach that re-imagines clinical data consumption. By shifting from static deliverables to curated, self-service dashboards, the approach enables teams to efficiently explore data through filtering, drill-downs, and participant-level interrogation. This provides Safety, Clinical Scientists, and Medical Writers immediate access to interactive safety/efficacy views, faster signal evaluation, enhanced collaboration, and earlier regulatory initiation. It reshapes the role of Biostatistics from ad-hoc output generator to scalable, high-value data enabler. The framework is embedded across the study lifecycle, reducing redundant outputs, minimizing rework, and prioritizing analyses that directly support reporting, with real-life examples of its impact during high-pressure regulatory interactions. The audience will learn what it takes to develop a framework for dynamically studying the data and how adopting a dynamic visualization mindset can materially shorten cycle times, improve cross-functional transparency, and move organizations toward upper-quartile performance. Ultimately, the presentation demonstrates how clinical data can evolve from static documentation into a continuously accessible asset, accelerating insights, enhancing quality, and better serving patients awaiting new therapies.

Emerging Technologies (R, Python, GitHub etc.)

ET-140 : Unleash the R-volution: A Blueprint for Building Package Validation Capabilities in our own organization
Kevin Lee, Clinvia

As the use of R continues to expand in clinical trial programming, ensuring that R packages are validated for regulatory compliance, reproducibility, and traceability has become essential. This presentation provides a step-by-step blueprint for building robust R package validation capabilities within your organization. We will begin by clarifying what R and R packages are, ranging from the base and recommended packages maintained by the R Core Team to community-contributed and in-house packages. The session then focuses on why validation matters, emphasizing principles such as accuracy, reproducibility, and data integrity to meet FDA and EMA expectations for reliable, auditable results. A central part of the presentation introduces a risk-based validation framework that classifies packages according to their purpose, maintenance quality, community usage, and testing rigor. Tools like the R Validation Hub's riskmetric and Sanofi's risk.assessr packages will be demonstrated for risk scoring and assessment. We'll also cover unit testing practices using testthat and show how to generate comprehensive validation reports including purpose, environment, documentation, dependencies, and testing results. Finally, the presentation outlines how to integrate validation into your computing environment through Installation (IQ), Operational (OQ), and Performance (PQ) Qualifications, ensuring traceable and reproducible results. Attendees will gain practical insights and best practices for establishing an internal, sustainable framework to confidently leverage R in regulated environments.
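
As a minimal illustration of the testthat unit-testing practice the abstract mentions (an editor's sketch, not from the paper; derive_bmi is a hypothetical in-house function):

    library(testthat)

    # Hypothetical in-house function undergoing validation
    derive_bmi <- function(weight_kg, height_cm) {
      weight_kg / (height_cm / 100)^2
    }

    test_that("derive_bmi returns expected values", {
      expect_equal(round(derive_bmi(70, 175), 1), 22.9)  # 70 kg, 175 cm
      expect_true(is.na(derive_bmi(NA, 175)))            # NA propagates
    })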

ET-162 : Accelerating Open-Source AI with AWS Bedrock: Architecting LLM Integration with Posit Workbench & Positron
Phil Bowsher, RStudio Inc.

This session presents how to integrate enterprise-grade AI into Positron Workbench, the open-source environment for statistical programming in R and Python, by connecting it to AWS Bedrock foundation models. Posit will demonstrate how to leverage Bedrock in Positron to help drive clinical reporting LLM pipelines. The talk will cover practical applications, such as accelerating clinical workflows and data science activities, while supporting enterprise AI and human-in-the-loop review via Positron AI tools like Databot and the Assistant.

ET-163 : Current Review of Open Source in New Drug Applications: R & Python
Phil Bowsher, RStudio Inc.

The use of R in pharma, especially in clinical trials, has increased rapidly over the last 3 years. This briefing reviews updates for New Drug Applications (NDAs) and other regulatory submissions using R and Python. The talk will analyze current trends and provide examples of open-source code usage across various submission components, reviewing public new drug applications and discussing where open source appears within different areas of the submissions. Attendees will gain a current understanding of where open source is used in NDAs and how a polyglot open-source strategy in clinical trial analysis is helping to drive new innovations and progress. This is a great opportunity to learn about current trends in using open-source languages like R, Julia, and Python in submissions.

ET-178 : Easy Code Generation in R: The “macro” package
David Bosak, r-sassy.org

You can write a program. Or you can write a program that writes a program. That is called "code generation". Code generation is advantageous in some scenarios. One advantage is that you can parameterize the outer program and still produce concise code in the generated program. Another advantage is that the code can be generated in such a way that it is complete, transparent, and readable: as if a human programmer wrote it. This paper will explain how to generate such code in R using the "macro" package. The "macro" package provides a meta-language inspired by the SAS Macro Language, motivated specifically by the need to generate code easily and efficiently. This package is well suited to creating standard analyses, safety tables, and reusable modules of all kinds. The paper will give an overview of the system and show some simple examples to illustrate how it works.
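
To make the code-generation idea concrete, here is a base-R sketch (an editor's illustration only; it does not use or depict the "macro" package's actual syntax): a parameterized outer function emits a complete, readable inner program as text.

    # Base-R sketch of code generation: the outer function is
    # parameterized, the generated code is plain and readable.
    gen_freq_program <- function(dataset, var) {
      sprintf(
        "library(dplyr)\n%s |>\n  count(%s) |>\n  mutate(pct = 100 * n / sum(n))",
        dataset, var
      )
    }

    cat(gen_freq_program("adsl", "SEX"))
    # The emitted text can be saved to a .R file or evaluated
    # with eval(parse(text = ...)).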

ET-179 : Building ADaM Datasets from Scratch using R: A SAS programmer’s Perspective
Patel Mukesh, Merck & Co., Inc.
Nilesh Patel, Merck & Co. Inc.

In clinical trials, SAS has traditionally been the primary tool for statistical programming, analysis, and reporting. However, the industry is increasingly embracing a language-agnostic approach, encouraging statistical programmers to develop proficiency in multiple programming languages. This shift is driven by the growing adoption of open-source tools, with R emerging as a powerful alternative due to its flexibility, extensive package ecosystem, and cost-effectiveness. This paper documents a SAS programmer’s transition to building ADaM datasets in R. In this paper, we will share our experience as seasoned SAS programmers developing Analysis Data Model datasets entirely from scratch using R for the first time. We will provide a detailed, step-by-step walkthrough of variable derivation in R, highlighting the similarities and differences compared to traditional SAS programming methods. The discussion includes practical challenges encountered during the transition, strategies employed to overcome these obstacles, and insights into leveraging R’s unique capabilities for clinical data analysis. Through this comparative exploration, we aim to offer valuable guidance for statistical programmers navigating the evolving landscape of clinical trial programming, emphasizing the benefits and considerations of adopting a multilingual programming approach.
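
A minimal sketch of the kind of variable derivation such a walkthrough covers, using dplyr (editor's illustration; the input data and variable names here are hypothetical, and the paper's own derivations may differ):

    library(dplyr)

    # Hypothetical input with the columns an ADSL-style derivation might use
    dm_raw <- tibble(
      USUBJID = c("001", "002"),
      AGE     = c(54, 71),
      TRTSDT  = as.Date(c("2025-01-01", "2025-01-05")),
      TRTEDT  = as.Date(c("2025-03-01", "2025-02-10"))
    )

    adsl <- dm_raw |>
      mutate(
        TRTDURD = as.numeric(TRTEDT - TRTSDT) + 1,   # SAS: trtedt - trtsdt + 1
        AGEGR1  = if_else(AGE >= 65, ">=65", "<65")  # SAS: if age >= 65 then ...
      )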

ET-189 : Breaking the Shell: Validated R Workflows To Meet FDA Standards
Danielle Stephenson, Atorus Research
Audrey Chin, Atorus Research
Madeleine Penniston, Atorus Research

For years, open-source programming in clinical research has hovered on the edge of regulatory workflow: admired for its flexibility, yet questioned for its reliability. Atorus set out to challenge that perception by building the tables and listings specified in the FDA's "Standard Safety Tables and Figures: Integrated Guide" entirely in R using rigorously validated open-source tools. By tapping into the {pharmaverse} ecosystem, our team developed a complete set of reproducible template table and listing programs that align with FDA expectations. In addition to adopting community-driven standards, we also created a set of internal R-based templates to support consistency and efficiency across projects. With these internal templates, we expanded beyond the scope of the FDA templates by constructing additional template programs for figures and efficacy outputs. This paper outlines our approach, validation strategy, and lessons learned from this process. Our team is demonstrating that R can produce submission-ready outputs for all core safety and efficacy study deliverables. We highlight how an open-source framework can combine validation, standardization, and automation to enable a transparent, collaborative future in regulatory programming. Our results position R not just as a viable alternative to traditional tools, but as a foundation for the next wave of compliant, innovative clinical reporting.

ET-192 : Transitioning from SAS to R: Implementing Reproducible R Workflows for TLF Validation
Poornima Alavandi, Pfizer

Validation of Tables, Listings, and Figures (TLF) is a critical step in the clinical reporting lifecycle, ensuring accuracy, reproducibility, and regulatory compliance. As organizations increasingly adopt R for statistical programming and data analytics, the need for robust and efficient validation workflows within the R ecosystem has grown substantially. This paper presents a comprehensive framework for TLF validation in R, leveraging standardized programming practices, automated comparison tools, and reproducible reporting processes. Our approach integrates widely used R packages to build a structured, reproducible system for validating Tables, Listings, and Figures. It supports visual and numerical comparisons and generates validation summaries suitable for regulatory documentation. By adopting a fully scriptable R-based pipeline, clinical programming teams can enhance the efficiency, transparency, and scalability of the validation workflow. Keywords: R Adoption, R Packages, TLF Validation
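
One widely used building block for such automated comparisons is the diffdf package; a minimal sketch with hypothetical data (an editor's illustration; the paper's framework may use different tools):

    library(diffdf)

    prod_table <- data.frame(PARAM = c("AGE", "BMI"), MEAN = c(54.2, 27.1))
    qc_table   <- data.frame(PARAM = c("AGE", "BMI"), MEAN = c(54.2, 27.3))

    # Itemizes differences by key, in the spirit of PROC COMPARE
    diffdf(base = prod_table, compare = qc_table, keys = "PARAM")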

ET-195 : Creating Reproducible Clinical Output with the SASSY-Reporter Package
Vicky Yuan, Incyte Corporation

As the R programming language continues to gain traction in the pharmaceutical and clinical research industries, many SAS® programmers are seeking robust and user-friendly reporting solutions that align with established clinical and company standards. While numerous open-source reporting packages are available in R, identifying one that offers both flexibility and familiarity can be challenging for those transitioning from SAS®. The SASSY-Reporter package stands out as an exceptional choice, particularly for SAS® users, due to its intuitive design and functional similarities to the widely used PROC REPORT procedure in SAS®. This paper highlights the key features of the SASSY-Reporter package, emphasizing its ease of use, adaptability to clinical reporting requirements, and the convenience it offers to programmers accustomed to SAS®. Through practical examples, we demonstrate how the SASSY-Reporter package streamlines the creation of high-quality, standardized reports, making it an ideal tool for SAS® programmers transitioning to R. The package’s clear syntax and comprehensive functionality not only facilitate a smoother learning curve but also ensure that clinical reporting in R can meet rigorous industry standards with efficiency and confidence.
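
A minimal sketch of the reporter workflow the abstract describes, assuming a current version of the package (file name, data, and titles here are illustrative):

    library(reporter)

    # Define a table, attach it to a report, and write RTF,
    # in a style reminiscent of PROC REPORT
    tbl <- create_table(mtcars[1:5, 1:4]) |>
      titles("Listing 1.1: First Five Records")

    rpt <- create_report("listing.rtf", output_type = "RTF") |>
      add_content(tbl)

    write_report(rpt)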

ET-197 : Automated Delta Detection: A Scalable R-Shiny Framework for Comparing Clinical Datasets
Prabhakara Rao Burma, Ephicacy Consulting Group Inc.
Latha Donapati, Ephicacy Consulting Group

In the modern clinical trial landscape, data is characterized by its high velocity and increasing complexity. As studies move toward decentralized models and real-time Electronic Data Capture (EDC) integration, the ability to rapidly identify "deltas," discrepancies between successive data transfers, has become a mission-critical task for Data Monitoring Committees (DMCs) and Statistical Programming teams. Traditional methodologies, primarily anchored in static SAS® PROC COMPARE outputs, often present a significant bottleneck; these text-heavy listings are difficult for non-programming stakeholders to interpret and lack the interactivity required for high-volume reconciliation. This paper presents a scalable R-Shiny framework designed to modernize the delta detection process. The application utilizes a format-agnostic extraction layer to ingest clinical datasets (SAS7BDAT, CSV, and XLSX) and implements a robust "Delta Engine" powered by dplyr and tidyr. By leveraging reactive left_join logic with many-to-many relationship handling, the tool ensures precise subject-level alignment even across longitudinal domains. Key innovations include a JavaScript-driven "Visual Diff" layer providing multi-tiered feedback: green highlighting for new records and red highlighting for modified values. This dual approach significantly reduces the time-to-insight and provides a transparent audit trail from previous to current transfer, ultimately accelerating the path to database lock.
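
The core of such a delta engine can be sketched with dplyr joins (an editor's simplification of the approach described above; the data are hypothetical):

    library(dplyr)

    prev <- data.frame(USUBJID = c("01", "02"), AVAL = c(5.1, 6.2))
    curr <- data.frame(USUBJID = c("01", "02", "03"), AVAL = c(5.1, 6.8, 7.0))

    # New records: keys present only in the current transfer
    new_recs <- anti_join(curr, prev, by = "USUBJID")

    # Modified records: same key, changed value
    modified <- inner_join(curr, prev, by = "USUBJID",
                           suffix = c("_curr", "_prev")) |>
      filter(AVAL_curr != AVAL_prev)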

ET-205 : The Open Source Advantage: Powering Innovation in the 21st Century
Kirk Lafler, sasNerd
Ryan Lafler, Premier Analytics Consulting, LLC
Joshua Cook, University of West Florida (UWF)
Stephen Sloan, Dawson D R
Anna Wade, Emanate Biostats

Organizations worldwide are undergoing a fundamental shift in how software is developed, deployed, and sustained. The rapid emergence of new technologies – combined with the accelerating adoption of open-source solutions – has created an ecosystem where innovation thrives alongside increased complexity. As diverse tools, platforms, and communities converge, organizations must balance opportunity with risk. This presentation examines the benefits, challenges, and emerging opportunities of open-source technologies in the 21st century. Participants will explore how organizations and user communities are addressing critical considerations, including system integration and compatibility, security vulnerabilities, intellectual property concerns, licensing and warranties, and the variability of development and support practices. Attendees will gain practical insight into the evolving roles of Python, R, SQL, modern database platforms, cloud computing, and open software standards, as well as the collaborative culture that fuels global open-source innovation. Join us for a forward-looking discussion on how open-source software is reshaping technology strategy, collaboration models, and innovation across industries.

ET-207 : Enhancing Your SAS Viya Workflows with Python: Integrating Python’s Open-Source Libraries with SAS using PROC PYTHON
Ryan Lafler, Premier Analytics Consulting, LLC
Miguel Bravo Martinez del Valle, Premier Analytics LLC

Data scientists, statistical programmers, machine learning engineers, and researchers are increasingly leveraging a growing number of open-source tools, libraries, and programming languages that can enhance and seamlessly integrate with their existing data workflows. One of these integrations, built into SAS® Viya®, is its pre-configured Python runtime integration, PROC PYTHON, which gives SAS programmers access to Python's open-source data science libraries for wrangling and modeling structured and unstructured data alongside the validated procedures provided in SAS. This paper demonstrates how to install and import external Python libraries into SAS Viya sessions; generate Python scripts containing methods that can import, process, visualize, and analyze data; and execute those Python methods and scripts using SAS Viya's PYTHON procedure. By integrating the added functionalities of Python's libraries for data processing and modeling with SAS procedures, SAS programmers can enhance their existing data workflows with Python's open-source data solutions.

ET-212 : A Novel Approach to Inter-Collaboration using IDEs and GitHub
Sydney Hyde, Bristol Myers Squibb
Tamara Martin, Bristol Myers Squibb

In the pharmaceutical industry, SAS has long been the dominant programming standard. However, open-source technologies such as R and Python have seen increasing adoption, driven by active user communities, and rapid innovation. As a result, new graduates entering the industry bring increasingly diverse programming backgrounds. In a fast-paced environment, it is critical to leverage both existing expertise and the skills of the next generation of programmers. This goal is often hindered by limited overlap in programming language experience across teams, which can constrain projects to a subset of available talent and lead to duplicated effort. This paper presents a collaborative system, built on existing platforms, that enables cross-language collaboration. Using practical examples, we demonstrate how combining GitHub repositories with modern interactive development environments (IDEs) enables seamless collaboration across SAS, R, and Python, allowing R and Python to be called from SAS and vice versa. This approach allows organizations to fully utilize the strengths of all team members, promote cross-language code reuse, and accelerate development without requiring programmers to abandon their language of choice.

ET-215 : Reproducing the SAS DATE and TIME formats with the {fmtr} package in R
Chen Ling, AbbVie
David Bosak, r-sassy.org

As pharmaceutical organizations transition from SAS® to R for clinical reporting, preserving SAS-compatible date, time, and datetime formats remains a critical challenge for regulatory submissions. In current workflows, SAS formats can be applied during XPT export; however, this approach restricts formatting to the transport file and does not allow SAS-consistent formatted values to be generated or reviewed directly within the R environment. As a result, programmers cannot fully inspect, validate, or quality-control formatted outputs in R prior to export. The open-source {fmtr} R package addresses this gap by providing the first complete, in-R replication of the SAS DATEw., TIMEw.d, and DATETIMEw.d display formats. These implementations allow SAS-compatible formatting to be applied natively within R environments, enabling consistent presentation throughout tables, listings, figures, and validation workflows, not solely at the point of XPT creation. This paper introduces the DATEw., TIMEw.d, and DATETIMEw.d formats implemented in {fmtr}, supporting a wide range of widths and precision levels. These implementations enable reliable, SAS-compatible DATE, TIME, and DATETIME formatting directly within R, improving reproducibility and confidence in open-source clinical reporting workflows.
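
For orientation, this is what replicating the SAS DATE9. display means in base R (an editor's sketch for comparison; the {fmtr} syntax itself is not shown here, and the examples assume an English locale):

    # SAS DATE9. renders 2026-03-14 as 14MAR2026
    x <- as.Date("2026-03-14")
    toupper(format(x, "%d%b%Y"))   # "14MAR2026"

    # An hh:mm:ss rendering comparable to SAS TIME/DATETIME styles
    format(as.POSIXct("2026-03-14 09:05:30", tz = "UTC"), "%H:%M:%S")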

ET-218 : Regression Analysis Made Easy Using R
Zheyuan Yu, Walter
Kirk Lafler, sasNerd
Zichun Gao, Stevens Institute of Technology
Jiaxin Xu, Franklin and Marshall College
Zeqi Li, Columbia University
Ruochen Shao, High School Student

Regression analysis is a foundational technique in data science and statistical modeling, enabling analysts to quantify and interpret relationships between a dependent variable and one or more independent variables. This paper presents a practical introduction to the concepts, common types, and real-world applications of regression analysis using the R statistical programming language. Through clear explanations, simple examples, and reproducible R code, readers learn how to develop, interpret, and evaluate regression models, including simple linear regression, multiple linear regression, and logistic regression. Emphasis is placed on model assumptions, interpretation of results, and practical considerations relevant to applied analytics.
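
A minimal, self-contained taste of the models the paper covers, using base R's built-in mtcars data:

    # Multiple linear regression
    fit <- lm(mpg ~ wt + hp, data = mtcars)
    summary(fit)        # coefficients, R-squared, residual diagnostics

    # Logistic regression (binary outcome: transmission type)
    logit <- glm(am ~ wt, data = mtcars, family = binomial)
    exp(coef(logit))    # coefficients expressed as odds ratios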

ET-223 : The Evolution of Open-Source Technologies in the Pharmaceutical Industry: Python as a Cost-Effective Solution for Clinical Statistical Programming
Ramesh Potluri, Servier Pharmaceutical

SAS has traditionally been the standard platform for statistical analysis and clinical reporting in the pharmaceutical industry. However, advancements in open-source technologies and evolving regulatory perspectives have driven increased adoption of alternative tools that support automation and advanced analytics. Python offers a powerful and cost-effective solution for automated tool development and machine learning integration. This paper describes the setup of Python and associated development environments, reviews key libraries and functions that enable SAS-equivalent functionality and presents practical examples of demographic and adverse event table generation aligned with clinical reporting requirements.

ET-252 : Reviewing and identifying issues in TFL macro parameter values with an R Shiny tool
Mydhili Chelikani, Merck & Co., Inc.
Ajay Kumar Tirkey, Merck & Co., Inc.

This paper describes an R Shiny application that reads TFL parameter-value SAS datasets from a user-specified folder and enables interactive retrieval and review of macro parameter-value information. Users can select one or more macro programs or process all available programs and filter results using flexible search patterns to locate specific parameter-value datasets. The application performs configurable QC checks via the UI, allowing lead programmers and statisticians to review TFL filter conditions and identify issues much faster than with manual inspection. Manual review of parameter-value information is especially time-consuming when hundreds of TFLs are involved; by automating retrieval, filtering, and preliminary validation, this tool substantially reduces review time and improves the efficiency of the validation workflow.
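
Reading parameter-value SAS datasets from a folder, as the app does on startup, can be sketched with the haven package (an editor's illustration; the folder name is hypothetical):

    library(haven)

    files  <- list.files("params", pattern = "\\.sas7bdat$", full.names = TRUE)
    params <- lapply(files, read_sas)   # one data frame per macro program
    names(params) <- basename(files)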

ET-279 : Goodbye SAS, Hello R: Practical Workflows for CDISC Standards
Madeleine Penniston, Atorus Research
Alyssa Wittle, Atorus Research

The adoption of open-source tools in the pharmaceutical industry is rapidly transforming how clinical data is managed, analyzed, and delivered for regulatory review. Among these tools, R has become the leading choice, but what does programming in R really mean for today’s statistical programmer? This paper presents an end-to-end CDISC workflow demonstrating how statistical programmers can manage the lifecycle of clinical trial data using R as the primary analytical engine. Beginning with SDTM development, the process illustrates techniques using CDISC standards and demonstrates controlled-terminology mapping using packages such as sdtm.oak. Using a consistent programming approach, raw CRF inputs are transformed into standardized datasets across a variety of domain types. The paper then transitions into ADaM dataset creation, highlighting how R can easily accommodate the Subject-Level Analysis (ADSL), Basic Data Structure (BDS), and the Occurrence Data Structure (OCCDS) datasets. Within each example, practical functions in the admiral package are introduced to derive baseline values, flags, and other key analysis variables. Each coding stage highlights reproducibility strategies and environment control through the use of renv. The advantages of R are also explored, including easier transposing techniques, simple row-wise operations, and the absence of fixed length variable requirements. These allow programmers to reduce the risk of data truncation and other structural errors. Together, these components provide a comprehensive view of how R can support a submission ready clinical programming workflow for SDTM and ADaM development.
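
The renv-based environment control mentioned above boils down to three calls (a sketch; see the renv documentation for project specifics):

    renv::init()      # create a project-local package library
    renv::snapshot()  # record exact package versions in renv.lock
    renv::restore()   # rebuild the identical environment elsewhere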

ET-283 : An Innovative solution for Interactive Dashboards Using Python Flask Framework for Clinical Data Analytics
Manish Bhagchandani, Ephicacy Lifescience Analytics Pvt Ltd

Traditional clinical reporting tools and built-in dashboard solutions often have fixed structures and limited flexibility, which can restrict deeper exploration of clinical trial data. This paper demonstrates how the Python Flask framework can be used to build interactive and customizable dashboards using SDTM and ADaM datasets. By moving away from rigid, pre-packaged tools to a lightweight web-based framework, Flask enables greater control over layout, logic, and user interaction while integrating smoothly with the Pharmaverse ecosystem. An end-to-end workflow is presented, including data preparation using Pandas, interactive visualizations created with Plotly, and enhanced user interactions using JavaScript. These dashboards support data validation, study review, safety monitoring, and exploratory analysis by statistical programmers, clinical scientists, and study teams. They promote effective collaboration by providing timely and easy-to-interpret views of clinical data. Implemented within a secure environment, this framework offers a scalable and practical solution for enhancing clinical data review while aligning with regulatory expectations.

ET-285 : mkheader: An R Package for Automated Generation and Management of Program Headers in Clinical Trial Programming
Gabriela Piasecki, Merck & Co.
Laura Frederick, Merck & Co., Inc.

In clinical trial programming, maintaining accurate and up-to-date program headers is essential for regulatory compliance, traceability, and audit readiness. Program headers capture critical metadata such as study name, input datasets, authorship, and revision history. As R gains traction in pharmaceutical programming, the lack of standardized tools for automated header generation within R leads to inefficient, error-prone, manual processes that increase compliance risks. The mkheader R package addresses these challenges by automating the creation and management of program headers tailored specifically for clinical trial programming. It automatically extracts metadata – including study, author details, program date, and R version – and seamlessly integrates user inputs to generate consistent, comprehensive headers. Key features include interactive header creation and maintenance via RStudio Addins, automated header updates, and seamless integration with existing R programming workflows. By eliminating manual header maintenance, mkheader enhances programming efficiency and ensures headers remain accurate, consistent, and audit-ready throughout the study lifecycle. This package offers a practical solution to improve documentation quality and regulatory compliance in clinical trial programming.
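
For flavor, a generic header generator might look like the sketch below (an editor's illustration only; this is not mkheader's interface):

    # Illustrative header builder; mkheader automates and extends this idea
    make_header <- function(program, author) {
      paste0(
        "# Program : ", program, "\n",
        "# Author  : ", author, "\n",
        "# Date    : ", format(Sys.Date(), "%d%b%Y"), "\n",
        "# R ver.  : ", getRversion(), "\n"
      )
    }
    cat(make_header("adsl.R", "Jane Doe"))  # hypothetical program and author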

ET-288 : Survival Analysis, K-M curve, Hazard Ratio and Data Visualization using R Programming: A comprehensive approach
Ishwar Singh Chouhan, Senior Scientist at Merck & Co.

This paper offers a comprehensive guide to survival analysis using R programming, with a particular focus on Kaplan-Meier estimation, hazard ratios, and visualization through the ggplot2 package. Survival analysis is a statistical technique used to investigate the time to an event of interest, such as death, failure, or relapse, and is widely applicable in fields like medicine, engineering, and social sciences. The Kaplan-Meier curve, a non-parametric method, is used to estimate and visualize survival probabilities over time while accounting for censored data. This method allows for the comparison of survival rates across different groups. In this paper, we outline the process of implementing the Kaplan-Meier method in R, covering data preparation, survival curve plotting, and result interpretation. Additionally, the paper incorporates the Cox proportional hazards model to calculate hazard ratios (HR) and assess the impact of covariates on survival time. The hazard ratio quantifies the relative risk of an event occurring in one group versus another. Using the survival and ggplot2 packages, we demonstrate how to fit a Cox model, compute hazard ratios, and create enhanced, visually informative survival curves. The ggplot2 package adds customization options to improve the clarity and aesthetic quality of survival plots, aiding in better result interpretation. The analysis also includes model diagnostics and checks for the proportional hazards assumption. This integrated approach not only provides valuable insights into survival distributions but also emphasizes the role of effective data visualization and interpretation for informed decision-making in clinical trial research.
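
A compact sketch of the analyses described, using the survival package's built-in lung data:

    library(survival)

    # Kaplan-Meier estimate by group
    km <- survfit(Surv(time, status) ~ sex, data = lung)
    plot(km, xlab = "Days", ylab = "Survival probability")

    # Cox model: hazard ratio for sex, adjusted for age
    cox <- coxph(Surv(time, status) ~ sex + age, data = lung)
    summary(cox)$conf.int   # HR with 95% confidence interval
    cox.zph(cox)            # proportional hazards assumption check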

ET-292 : Engineering secure and reproducible R based clinical programming systems using open source DevSecOps practices.
Indraneel Chakraborty, Ephicacy Lifescience Analytics Pvt Ltd

R is increasingly being adopted in clinical programming, evolving from isolated scripts into reusable functions, packages, and automated pipelines that generate analysis-ready datasets and TLF outputs. As reuse expands across studies and teams, risks emerge, including accidental exposure of credentials, result drift across machines, dependency changes that alter outputs or introduce vulnerabilities, automation failures in non-interactive environments, and unintended leakage of sensitive data through logs or artifacts. This session presents an engineering roadmap for moving from scripts to packages to pipelines while making security, auditability, and reproducibility inherent qualities of the system. We emphasize platform-neutral DevSecOps practices that support clinical traceability and consistent execution, including git-based version history, automated quality gates such as tests and style checks, and standardized run configurations that behave reliably across laptops, CI systems, and scheduling platforms. Reproducibility is framed as an end-to-end runtime property rather than only dependency management. We compare approaches for capturing and restoring project dependencies to maintain analytical stability, alongside Dockerfile-based containers that bundle OS libraries, R, and system components to ensure consistent behavior in controlled batch environments, enabling durable reruns months later. Security considerations span the full lifecycle, covering separation of secrets from code, least-privilege access, reduced network exposure in automation, dependency health monitoring, and generating outputs with evidence linking deliverables to exact code versions, configurations, and execution contexts. A brief discussion on LLM-related risks highlights potential leakage pathways. Attendees will gain practical, transferable patterns for building secure, reproducible, and maintainable R-based clinical programming systems that teams can confidently rely on.
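
One of the simplest practices listed, separating secrets from code, looks like this in R (a sketch; the environment variable name is hypothetical):

    # Read credentials from the environment at runtime; the value lives in
    # a CI secret store or local .Renviron, never in version control
    api_key <- Sys.getenv("SERVICE_API_KEY")
    if (!nzchar(api_key)) stop("SERVICE_API_KEY is not set")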

ET-296 : Building for the Long Haul: Managing Scope, Refactoring, and CI/CD in Internal R Packages
Huijun An, Fred Hutchinson Cancer Center
Blazej Neradilek, SCHARP at Fred Hutch
Chenchen Yu, Fred Hutchinson Cancer Center
Shannon Grant, SCHARP at Fred Hutch

How do you successfully carry an internal R package with a small team in a non-profit organization when the needs and team change? We share benefits and challenges in developing and using an internal R package for figures in immunological reports in clinical trials. With our plotting functions used in over 300 reports, we chronicle a transition from a single statistician developer to a developer team over 10+ years. We discuss lessons in scope creep and process improvements, including documentation, bug tracking and management, traceability, code re-factoring and continuous integration and continuous delivery (CI/CD). We illustrate our re-factoring effort to extend functionality and collaborate more easily. We also share our CI/CD approach, including unit tests for plots (vdiffr package). Overall, the lessons highlight the importance of good software engineering and developmental practices for evolving internal R packages.
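
A minimal example of the vdiffr-based plot unit tests mentioned above (an editor's sketch, assuming testthat, ggplot2, and vdiffr are installed):

    library(testthat)
    library(ggplot2)

    # Snapshot test: fails in CI if the rendered plot changes
    test_that("scatter plot is stable", {
      p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
      vdiffr::expect_doppelganger("wt-vs-mpg scatter", p)
    })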

ET-300 : Modern Data Science with SAS Viya Workbench: Unified Development with SAS, Python, and R
Shelby Taylor, SAS Institute

SAS Viya Workbench is a modern, code-first analytics environment that brings SAS, R, and Python together within the SAS Viya ecosystem. This session provides an overview of SAS Viya Workbench and demonstrates how it supports both learning and professional analytics workflows through flexible interfaces and integrated tooling. Attendees will learn how to get started with SAS Viya Workbench, including launching sessions, selecting compute resources, and navigating the project-based file system. The presentation explores the available development interfaces (VS Code-style IDEs, JupyterLab, and Jupyter Notebooks) and shows how to execute SAS programs, review logs and results, manage output files, and configure SAS Autoexec settings. The session also highlights collaboration and reproducibility by demonstrating how to connect SAS Viya Workbench to GitHub for version control, code review, and traceable workflows. SAS notebooks are introduced as a user-friendly option for interactive analysis, including techniques for modularizing large SAS programs and transitioning traditional SAS code to notebook-based workflows. In addition, attendees will see how SAS Viya Workbench enables seamless integration of SAS and R within the same project. The presentation covers configuring an R environment, accessing SAS datasets from R, and sharing data across languages using a common file system, culminating in a hybrid SAS and R workflow. Finally, the session briefly discusses how analyses developed in Workbench can be registered in SAS Viya Model Manager as part of a broader analytics lifecycle.

ET-301 : R-Based Translation of Japanese Characters in Clinical Datasets for Regulatory Reporting
Hardik Sheth, Catalyst Clinical Research
Roshan Stanly, Genpro Research Inc.

Clinical studies conducted in Japan and multinational trials frequently generate clinical datasets containing unique, region-specific characters across data sources such as electronic data capture (EDC), SDTM, ADaM, and analysis datasets. Although modern systems support Unicode, many legacy SAS-based clinical reporting and submission workflows remain constrained by encoding limitations, restricting reliable processing of non-ASCII text. These constraints introduce risks to data consistency, traceability, and timely delivery of analysis outputs. This paper presents an R-based approach for translating Japanese characters into English to support integration with established SAS- and R-based clinical programming pipelines. The approach minimizes manual data manipulation, enhances reproducibility, and enables controlled handling of multilingual data, particularly for legacy studies, regulatory-driven ad-hoc analyses, and environments where system-level encoding changes are not feasible, supporting data readiness for downstream reporting and regulatory submissions.
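
As one ingredient of such a pipeline, transliteration (romanization, as opposed to the full semantic translation the paper describes) is available via the stringi package; a sketch:

    library(stringi)

    # Transliterate Japanese text to ASCII-safe Latin characters
    stri_trans_general("カタカナ", "Any-Latin; Latin-ASCII")   # "katakana"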

ET-327 : Building Better Data Science Workflows: Best Practices with Git, GitHub, and Data Version Control (DVC) for Effective Collaboration
Ryan Lafler, Premier Analytics Consulting, LLC

This paper presents a practical framework for building reliable, collaborative data science workflows using Git, GitHub, and Data Version Control (DVC). It begins by establishing why version control matters in data science, introducing Git as the foundation for tracking code changes, GitHub as the collaboration layer for shared repositories and team-based projects, and DVC as a lightweight extension for efficiently tracking and versioning datasets alongside code. We then present concrete examples that highlight best practices for maintaining a healthy primary codebase, applying structured branching strategies, writing purposeful commits, managing work in progress (WIP) safely, and reducing merge conflicts in team environments. DVC is incorporated to show how dataset changes can be efficiently tracked, compared, and restored without bloating source repositories. Together, these practices provide an actionable roadmap for improving collaboration, reproducibility, and scalable data science workflows.

ET-358 : Automating Git Workflows in SAS with Git Functions
Lleyton Seymour, SAS

Whether you are an independent developer or part of a larger organization, Git has become a foundational component of the modern development workflow. SAS users are no exception to this; they will typically interact with a Git client in one of two ways: through a command-line interface or through a point-and-click GUI. While both approaches are sufficient, when working with scheduled jobs, flows, or automated pipelines, neither is ideal. This becomes especially apparent in regulated settings where objects, such as models, may be produced by scheduled or automated workflows without a traceable history. Enter Git Functions; these functions enable developers to execute Git operations directly within their SAS code, eliminating the need for manual intervention. By embedding version control directly into SAS programs, Git Functions allow the automated tracking of model artifacts, the execution of conditional operations, and even the ability to produce structured reporting based on your commit history. This paper provides a foundational introduction to Git Functions and how they are incorporated into everyday workflows, leaving attendees with reusable approaches for automating version control in their own environments.

ET-370 : Challenges for small to mid-size organizations building a GxP Compliant R Environment (CRE)
Peng Zhang, CIMS Global
Tai Xie, CIMS Global
Peilin Zhou, CIMS Global
Christine Matakovich, CIMS Global

In recent years, R-based approaches for clinical trial analysis, reporting, and regulatory submission have been rapidly emerging. The R-based approach, supported by rich community-contributed resources (CRAN, Bioconductor, pharmaverse), demonstrates the feasibility of innovations including interactive results, connections with LLM/AI, and automation tooling. To ensure the statistical validity and reliability of such practices, a compliant environment may be needed for these adoptions in clinical trials, covering the R working session, R package management, and a Shiny deployment platform. While large organizations have sufficient resources to build their own statistical computation environments, small to mid-size organizations may lack both knowledge of R usage and expertise from IT and QA. In this session, we will discuss the requirements of a compliant R environment (CRE) and introduce how to leverage open-source R packages in internal development with good practice and validated evidence, including preferred packages, the development cycle, validation methods (the PHUSE-suggested validation framework of {valtools}), and potential usage in pharmaceutical contexts (e.g., R Shiny, Quarto, LLM).

ET-374 : A Governed Git Workflow Using Azure Repos for GxP Compliant Statistical Programming
Jing Yu, Novo Nordisk

Effective version control is essential for collaborative statistical programming, reproducible analyses, and validated deliverables in GxP-regulated pharmaceutical environments. This paper describes a standardized Git-based workflow implemented at a large pharmaceutical company that uses Azure DevOps (Azure Repos) as the centralized repository for SAS and R programming. Azure DevOps provides integrated capabilities for code collaboration, automated builds and tests, and traceable deployment pipelines. In our server-centric Git model, Azure Repos serves as the authoritative store for code and full history; developers maintain local working copies of files and synchronize with the server on a daily cadence (git pull at the start of the day and git push at the end of the day), while task- or analysis-specific branches are created and managed on the central repository. Governance is enforced through branch policies, peer review via pull requests, and CI checks in Azure Pipelines. This workflow enforces consistent repository structures, reduces reliance on shared network folders, and provides robust audit trails and rollback capability, thereby improving maintainability and reproducibility across studies. Adoption begins with cloning the remote repository into a local development environment. Developers may utilize SAS Enterprise Guide (EG), VS Code, or RStudio, adhering to established Git standards throughout their workflow. The approach enables statistical programming teams to scale collaboration while maintaining validation awareness, operational consistency, and regulatory compliance.

ET-378 : AI-Enhanced R Shiny App for Real-Time Clinical TLF Coding and Preview
Dickson Wanjau, Merck & Co., Inc.

In this paper, we present an interactive R Shiny application that streamlines the creation, preview, and refinement of RTF-based TLFs for clinical reporting using the open-source r2rtf R package. The application merges a live code editor with a real-time PDF preview, enabling users to iteratively write or modify code and immediately preview results. Users begin by uploading "table-ready" datasets in various formats, e.g., CSV, Excel, SAS (.xpt or .sas7bdat), or R (.RData, .rda). The app previews data, offers column filtering, and auto-generates starter r2rtf code which users can refine in a shinyAce editor. Upon code submission, the app executes the user's script in a controlled environment, updates the preview with a rendered PDF, and logs any warnings or errors for debugging. To support adoption and learning, we have embedded a retrieval-augmented generation (RAG) AI chatbot trained specifically on the r2rtf package documentation. This facilitates on-demand guidance about functions, syntax, and best practices while building submission-ready tables. Additionally, the user can debug some common syntax errors or warnings. Together, these features lower the barrier to TLF creation in R, enhance transparency and reproducibility, and provide a unified interface for programmers across experience levels. We discuss implementation logic, user workflow, error handling strategies, and integration of AI assistance, and we demonstrate real clinical examples to illustrate practical value. The tool supports reproducible reporting and enhances productivity for statistical programmers.
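
A minimal r2rtf pipeline of the kind the app auto-generates as starter code (an editor's sketch using a built-in dataset; the output file name is illustrative):

    library(r2rtf)

    head(iris, 4) |>
      rtf_title("Table 1: Sample Listing") |>
      rtf_colheader("Sepal Length | Sepal Width | Petal Length | Petal Width | Species") |>
      rtf_body() |>
      rtf_encode() |>
      write_rtf("table1.rtf")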

ET-381 : Bridging the Gap: A Python-Word Integration for Detecting Ghost Page Breaks in SAS-Generated RTFs
Jun Yang, Organon LLC
Robert Stemplinger, Organon

Generating Tables, Figures, and Listings (TFLs) via SAS® ODS PDF is the industry standard for clinical trial reporting. However, SAS frequently introduces "extra" page breaks or "orphaned" headers due to margin constraints and font rendering discrepancies. In addition, a customizable solution that integrates business logic, such as using abnormal indentation in the first row to identify page breaks, can fit varied requirements across scenarios. Manually reviewing hundreds of pages to ensure no empty pages or split tables exist is time-consuming and prone to human error. This paper introduces a novel Python-based application designed to automate the detection of pagination anomalies. While standard PDF text-extraction libraries often fail to capture the visual "flow" of a document, this tool leverages Microsoft Word's rendering engine as a diagnostic bridge. By utilizing the pywin32 library to interface with the Word COM object, the application programmatically opens the SAS output and analyzes the document's internal pagination structure. The specific mechanism compares expected page-break triggers against Word's actual rendered layout to identify "ghost" breaks: instances where content ends prematurely or headers appear without corresponding data. The resulting application provides a user-friendly interface for programmers to batch-process study outputs. This automation significantly reduces the Quality Control (QC) burden, ensuring that submission-ready PDFs are free of formatting defects. By combining the data-processing power of Python with the layout intelligence of Word, programmers can achieve a higher level of precision in clinical reporting.

ET-390 : A Practical Roadmap for Modernizing Legacy Clinical Applications
David Ward, Triam Ltd

Many pharmaceutical organizations depend on long-standing analytical applications that are increasingly difficult to maintain, validate, and extend. While these systems often remain mission-critical, accumulated technical debt can hinder productivity, increase operational risk, and limit the adoption of modern analytical techniques. This paper presents a practical, incremental roadmap for modernizing legacy clinical applications while maintaining continuity with existing SAS-based and regulated workflows. The discussion begins with guidance on project planning and assessing modernization readiness, then moves to strategies for tool and language selection. Attendees will gain a clearer understanding of how alternative products and open-source languages can be evaluated and introduced alongside established platforms, minimizing disruption and regulatory risk. The paper also explores software tools that support the migration process itself, including capabilities for documentation generation, code discovery and cataloging, dependency management, and integration with version control, release, and deployment practices. This paper is intended for data engineers, statistical programmers, and technical leads responsible for maintaining or modernizing legacy analytical systems. Participants will leave with a clear, actionable framework for planning modernization efforts that balance innovation, operational stability, and regulatory confidence.

ET-397 : Trusting Your R Packages: A Practical, Risk-Based Approach to External Package Validation
Radhika Etikala, Statistical Center for HIV/AIDS Research and Prevention (SCHARP) at Fred Hutch
Valeria Duran, Statistical Center for HIV/AIDS Research and Prevention at Fred Hutch

R packages offer powerful features, but they can raise questions about accuracy and reliability, especially in regulated or high-stakes environments. In settings such as pharmaceutical and clinical research, where reproducibility and confidence in results are critical, it is essential to manage the risks associated with external R packages and hold them to a similar standard as internally developed code. At our workplace, we developed a practical, risk-based approach for performance qualification (PQ) of external R packages to support study-level use within a validated R environment. As a statistical center reliant on R for preclinical and early-phase clinical research, we needed a solution that provides meaningful assurance while remaining proportionate to available resources. We describe a package assessment process focused on programmer-driven evaluation, which considers package characteristics, usage context, and study-specific needs. This process produces structured PQ outputs, including package risk metrics. Our approach is guided by principles from the R Validation Hub White Paper and shaped by practical lessons learned during implementation. These include challenges with interpreting riskmetric results, assessing package dependencies, managing package versions, and navigating differences in testing packages sourced from CRAN, Bioconductor, and GitHub. We walk through our end-to-end assessment process, including PQ of external packages and how PQ outputs are used as inputs to a study-specific risk assessment maintained by statistical leads to guide study-level package risk decisions. Finally, we share lessons learned and discuss future enhancements to our external package validation process.
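
The riskmetric workflow referenced above follows a three-step pattern (a sketch; interpreting the resulting scores is where the paper's process adds structure):

    library(riskmetric)

    # Reference, assess, then score an external package
    pkg_ref("dplyr") |>
      pkg_assess() |>
      pkg_score()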

ET-399 : Chatting with Your Data, Wherever It Lives: Unlock Insights through Duck DB and Open File Formats
Sundaresh Sankaran, SAS Institute
Mary Dolegowski, SAS

Enterprises have shifted their approach to accessing data for analysis and insights. They require rapid insights preferably with minimal data movement. There’s greater adoption of open formats like Parquet that reduce data footprint and enable efficient retrieval, and increased inclination to use a multitude of formats like JSON, CSV and Parquet. As part of recent technology trends, we witness the emergence of Duck DB, an open-source, lightweight query processing engine optimised for analytical workloads. Duck DB should be viewed as a multipurpose query processing engine rather than a traditional enterprise database and can analyse a range of data formats residing in several source areas, making use of native readers without having to copy data. Analytical platforms and packages such as SAS, R and Python offer Duck DB drivers and related capabilities. In this session, we demonstrate how to use Duck DB to query clinical study tables in a mixture of open file formats, highlighting features such as query optimisation, predicate pushdown, parallelism and parsing complex nested structures such as JSON. For clinical programmers, statisticians and data scientists, benefits include faster time to insights, simplicity and convenient interfaces while maintaining high accuracy. We also share an open-source repository which provides code snippets and queries based on examples from clinicaltrials.gov which can be extended further.
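
A minimal DuckDB query against a Parquet file from R (an editor's sketch; the file and column names are hypothetical):

    library(DBI)
    library(duckdb)

    con <- dbConnect(duckdb())   # in-memory analytical engine

    # Query the Parquet file in place: no import step, no data copy
    dbGetQuery(con, "
      SELECT USUBJID, AVG(AVAL) AS mean_aval
      FROM 'adlb.parquet'
      WHERE PARAMCD = 'ALT'
      GROUP BY USUBJID
    ")

    dbDisconnect(con, shutdown = TRUE)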

ET-412 : An Experience Using R, SASSY and Tidyverse For Clinical Trial Analysis, From A SAS Programmer Perspective
David Franklin, TheProgrammersCabin.com

In recent years there has been a move to, at the very least, consider doing some of the clinical trial submission analysis using other software packages, for example R. Some groups have looked at SDTM and/or ADaM, and some have looked at specialty tables, listings, and graphs. Few have done a complete package. This paper looks at using R, tidyverse, and sassy to put raw data into SDTM and ADaM datasets, then using this data to create tables, a listing, and a figure as one software solution. Along the way we will look at some techniques using these three packages to produce the outputs as .rtf and .xpt, read .xpt- and .csv-formatted data files, and look at challenges including dates and times.

ET-416 : It’s a Wonderful Lifecycle: Translating Statistical Programming into Modern Analytics Development
Steve Nicholas, Atorus Research

Emerging technologies such as Shiny and open-source analytics frameworks are increasingly being introduced into clinical programming environments that have long relied on established statistical programming processes. While the tools may be new, many of the principles required for successful adoption are already familiar. This paper reframes modern analytics applications through a direct comparison of the software development lifecycle (SDLC) and traditional statistical programming workflows. Using a one-to-one mapping, it translates artifacts such as the Statistical Analysis Plan (SAP), mock shells, ADaM specifications, dry runs, and QC programming into analogous stages of analytics application development. User stories and feature requirements serve a role similar to the SAP, while wireframes function as the equivalent of mock shells, enabling early alignment on purpose, scope, and decision making before development begins. Iterative development cycles mirror dry runs and interim outputs, and testing phases align naturally with independent QC programming and validation practices long embedded in regulated clinical workflows. Through real world examples of enterprise analytics applications, this paper demonstrates how applying a disciplined lifecycle-based approach leads to more sustainable, transparent, and scalable tools. By viewing pharma deliverables through a software lens, we can reimagine how we collaborate, document, and deliver, not just faster, but smarter and with greater transparency.

Hands-On Training

HT-369 : “Virtual Data, Real Standards – Leveraging Data Simulation for Smarter Clinical Trials”
Sangeeta Shabadi, Jazz Pharmaceuticals
Nitesh Patil, Cytel Inc
Jonathan Henshaw, Jazz Pharmaceuticals

The Study Data Tabulation Model (SDTM) is central to standardized clinical trial data submissions across global regulatory agencies. While traditionally focused on data organization and compliance, SDTM is emerging as a strategic tool for simulation in clinical trial analysis. As clinical data grows in volume and complexity, automation becomes critical. SDTM simulation delivers fast, flexible creation of CDISC-compliant synthetic datasets, customized for any study design or therapeutic area. This paper explores how data simulation for SDTM involves generating synthetic SDTM-like datasets using random sampling and domain-specific rules to mimic real clinical data. Simulation techniques were used to create realistic subject profiles, dosing patterns, adverse events, and lab values, enforcing cross-domain logic and SDTM structural requirements. This allowed us to develop and test SDTM-to-ADaM-to-TFL workflows early, validate derivations, surface edge-case issues, and improve code robustness without waiting for actual trial data. This approach is increasingly valuable in modern statistical programming because it allows development, testing, and validation before real data are available. Our paper demonstrates how R and SAS can simulate SDTM domains, empowering faster, more reliable, and more innovative programming through early code development, stronger validation, and risk-free experimentation without dependence on real clinical data, while showcasing seamless integration and compliance with CDISC standards.
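
A toy version of the simulation idea in R (an editor's sketch; real SDTM simulation adds controlled terminology and cross-domain rules):

    set.seed(2026)
    n <- 50

    # Simulate a minimal DM-like domain with simple domain rules
    dm <- data.frame(
      STUDYID = "STUDY01",
      DOMAIN  = "DM",
      USUBJID = sprintf("STUDY01-%03d", 1:n),
      AGE     = round(rnorm(n, mean = 55, sd = 10)),
      SEX     = sample(c("M", "F"), n, replace = TRUE),
      ARM     = sample(c("Placebo", "Drug A"), n, replace = TRUE)
    )
    head(dm)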

PK/PD/ADA and Quantitative Pharmacology

PK-216 : Early Unblinding to Pharmacometrics (PMx) Data: Challenges, Practices, and Benefits
Shweta Vadhavkar, Genentech-Roche
Jing Su, Merck & Co., Inc

Pharmacometrics (PMx) analyses play a critical role in Model-Informed Drug Development (MIDD), supporting dose selection, trial design, benefit-risk assessment, and regulatory decision-making. However, PMx analyses depend on pharmacokinetic (PK) concentration data, exposure, and baseline covariates of interest. Of these, PK data can act as a surrogate for treatment assignment, resulting in restricted access until formal study unblinding. This paper describes the scientific rationale, business need, benefits, and risks associated with controlled early access to PMx data prior to unblinding. Drawing on industry practices shared through the ISOP PMx Data Programming Special Interest Group (SIG), we summarize governance models, firewalled team structures, and standard operating procedures (SOPs) used to mitigate risks of unintentional unblinding and bias. Practical considerations, including documentation, approval pathways, secure environments, vendor engagement, and data reconciliation strategies, are discussed. We will also discuss the increasing need for safety/efficacy data in Model-Informed Drug Development (MIDD) decisions. When implemented with appropriate controls, early access to PMx data enables parallel workflows, improves data quality, and supports timely regulatory submissions while preserving study integrity.

PK-219 : Dissecting PK/PD Data from Analysis to Submission: Breaking the Black Box for Efficient Programming
Sridevi Balaraman, REDBOCK

Pharmacokinetic and pharmacodynamic (PK/PD) analyses rely on high-quality, well-structured, and traceable data. Transforming clinical trial data into analysis-ready PK/PD datasets involves multiple steps from analysis planning through regulatory submission and requires close collaboration among programming, clinical pharmacology, and biostatistics teams. At the center of this process, the PK/PD data flow is often treated as a "black box," leading to inefficiencies, rework, and challenges during submission. This paper explores how greater transparency in the transition from SDTM and ADaM (ADNCA) to NONMEM datasets can enable smooth and reproducible programming for PopPK and PK/PD analyses. Key focus areas include standardized datasets, clear documentation of data lineage and derivations, and independent programming-based QC. These practices allow programmers to visualize complete profiles and efficiently manage complex dosing and PK concentration data. Further, by addressing the PK/PD data "black box" at its core, this paper demonstrates improvements in programming efficiency, reductions in late-stage rework, and increased confidence in PopPK and PK/PD analysis outputs used for decision-making and regulatory submissions across development stages.
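
The SDTM/ADaM-to-NONMEM step can be pictured as stacking dosing and observation records into one time-ordered stream; a simplified sketch using common NONMEM conventions (EVID, AMT, DV) with hypothetical values:

    library(dplyr)

    dosing <- data.frame(USUBJID = "01", TIME = 0,
                         AMT = 100, DV = NA, EVID = 1)
    conc   <- data.frame(USUBJID = "01", TIME = c(1, 2, 4),
                         AMT = NA, DV = c(1.8, 2.4, 1.1), EVID = 0)

    # One record stream per subject, dose records first at tied times
    nonmem <- bind_rows(dosing, conc) |>
      arrange(USUBJID, TIME, desc(EVID))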

PK-234 : Visualizing PK and ADA Data at Scale: A Parameter-Driven SAS Macro for Box Plot Generation
Prasannanjaneyulu Narisetty

Effective visualization of Pharmacokinetic (PK) and Anti-drug Antibody (ADA) data is vital for interpreting complex clinical trial data. This presentation introduces a dynamic, user-friendly SAS macro designed to create high-quality box plots for immunogenicity analyses, with a particular focus on ADA evaluations. The macro creates two different visualizations: (1) antibody titer by study visit, and (2) ADA sample result versus predose serum concentration by study visit and ADA status. This paper demonstrates how the macro generates these two plots through extensive, parameter-driven customization. Users can define the population and analysis datasets, apply filters, set X and Y axis variables, integrate ADA status, and include statistical summaries. The macro validates that the specified X and Y variables exist in the input dataset and checks that color specifications align consistently with data groupings. It also detects missing cycle values and duplicate X-axis tick labels, and it provides clear, actionable log messages when required inputs are missing or incorrectly set.

PK-250 : Navigating Early Career Challenges in PK/ADA Statistical Programming
Diyu Yang, Merck
Sandeep Meesala, Merck & Co. Inc

Entering the pharmacokinetic (PK) and anti-drug antibody (ADA) programming field presents unique challenges distinct from traditional clinical programming. One of the primary difficulties is handling complex data structures from multiple sources: new programmers must develop a deep understanding of drug concentration-versus-time data, the different types of ADA variables, the PK/ADA analytical process, and the study design parameters that influence data integration. Additionally, PK/ADA data are often non-standardized and require extensive preprocessing before analysis. Balancing programming accuracy, efficiency, and scientific understanding can be difficult without sufficient domain knowledge. Another common challenge involves mastering efficient coding practices, debugging, and validation techniques. Many new programmers are still refining their programming logic and may struggle to write optimized, reusable, and compliant code. A limited understanding of the end-to-end clinical data workflow, from data collection to statistical analysis and reporting, can also hinder a programmer's ability to appreciate the broader context and impact of their work. Addressing these challenges requires a combination of structured training, continuous learning, and practical exposure. This paper summarizes the key obstacles faced by new statistical programmers and proposes practical strategies to overcome them, thereby facilitating smoother integration into the clinical programming environment.

PK-272 : An End-to-End R-Based Pharmacokinetics (PK) Workflow for Regulatory Submission: The INAVO120 Study
Qi Liu, Genentech
Shweta Vadhavkar, Genentech-Roche

The INAVO120 clinical study supported the development and regulatory approval of inavolisib and represented Roche's first end-to-end regulatory submission executed entirely in R using OCEAN (One CEntralised ANalytics platform), an open-source, cloud-enabled analytics environment. This work emphasizes modern programming practices for clinical pharmacology analyses, including pharmacokinetics (PK), pharmacodynamics (PD), and model-informed drug development. The Clinical Pharmacology, Modeling and Simulation Analyst (MSA) team, in collaboration with PD Data Sciences (PDDS) and OCEAN platform teams, redesigned the clinical study reporting process by transitioning PK analysis datasets, Analysis Data Model (ADaM) deliverables, Tables, Listings, and Graphs (TLGs), and Modeling & Simulation outputs from SAS to R. PK ADaMs were programmed in alignment with the latest CDISC standards for Non-compartmental Analysis (NCA), enabling standardized, submission-ready outputs and early PK data access. The Inavolisib New Drug Application (NDA), submitted in March 2024 and approved by the FDA in October 2024, served as the inaugural use case for this approach. Despite challenges associated with adopting new tools, automation, and GitLab-based collaborative workflows, the MSA team successfully delivered CSR-ready PK and M&S datasets. This case study demonstrates how open-source tools (R), automation, and CDISC-aligned standards can be integrated to support scalable, regulatory-ready PK/PD and model-informed drug development workflows.

PK-276 : SAS and R for Expanding Data for Pharmacometrics (PMx) Analysis Data Sets
Jeffrey Rathbun, Simulations Plus
Rebecca Humphrey, Simulations Plus, Inc.

Pharmacometrics (PMx) analysis is performed with a time-ordered data set. The data set will contain observation records (Pharmacokinetic (PK) concentrations and/or Pharmacodynamic (PD) endpoint values), dosing records, and any demographic, vital sign, laboratory test, or additional classifying/qualifying data of interest. Typically, these data will be contained in SDTM and/or ADaM domains that need to be assembled into the PMx analysis data set. SAS and R are two programming languages that can be used to accomplish this. This paper will focus on expanding data in PMx analysis data set builds and how that is handled by SAS and R. Data are often provided on specific dates or over a range of dates instead of as daily records. Dosing information, subject data (e.g., weight), and simulated patient data are examples where expanding data is needed. R methods examined will include use of the tidyverse package and the combination of several functions. SAS methods examined will include macro programming, RETAIN statements, and use of the Program Data Vector (PDV). There are advantages and disadvantages to using each language.
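
As a simple illustration of the SAS side of this expansion, the DATA step below turns each interval-level dosing record into daily records by looping and writing from the PDV; the dataset and variable names (dosing, ASTDT, AENDT) are assumptions for the sketch.

/* Expand one record per dosing interval into one record per day */
data dose_daily;
  set dosing;
  do adt = astdt to aendt;   /* emit a record for each day in the range */
    output;
  end;
  format adt yymmdd10.;
  drop astdt aendt;
run;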

PK-282 : From Chaos to Clarity: A Programmer’s Perspective on Standardizing Population Pharmacokinetic (PopPK) Data for Regulatory Success
Naveen Muppalla, Exelixis, Inc
Shibani Harite, Exelixis Inc.

Population Pharmacokinetics (PopPK) modeling typically follows a multi-stage process encompassing base, intermediate, and final models. This paper emphasizes the critical role of the statistical programmer in streamlining PopPK data workflows and preparing datasets and documents for regulatory submissions. Integrating Phase I dose-escalation studies with Phase III pivotal trials requires robust data harmonization and adherence to regulatory standards. We explored two strategies for New Drug Application (NDA) submission: the legacy approach, which included creation of NONMEM-ready PopPK CSV files, a define.pdf, and a reviewer's guide; and the CDISC approach, which included creation of an adppk.xpt dataset leveraging the ADaM PopPK Implementation Guide (Version 1.0) with machine-readable define.xml metadata. This paper compares these two approaches from a statistical programmer's perspective by highlighting key challenges such as handling complex dosing schedules, imputing missing/partial exposure data, reconstructing Pharmacokinetic (PK) timepoints, and deriving interval-based concomitant medication covariates. Source data included SDTM domains for ongoing studies and ADaM datasets from completed studies. This approach minimized dependencies and allowed early model development. This paper also outlines a blueprint for the Analysis Data Reviewer's Guide (ADRG) and the eCTD Module 5.3.3.5 structure, ensuring data and model traceability. By mastering these two approaches, statistical programmers can streamline the regulatory submission workflow and align with FDA technical conformance expectations, thus reducing submission review time.

PK-310 : Enhancing ADaM PK Datasets to Automate PK TFLs Generation
Jianli Ping, Gilead Sciences Inc
Karthik Sankepelli, Gilead

Automated tools for programming outputs have been increasingly applied to generate safety tables, figures, and listings, with demonstrated improvements in programming efficiency. Implementing this programming methodology for PK data poses specific challenges, primarily due to differences in how treatment effects are defined when generating PK TFLs. Specifically, the key input variables for PK output automation differ from the conventional treatment variables used in most other ADaM datasets. This discrepancy arises because PK analyses require treatment effects to be further stratified by reference dosing days, food effect, and other study-specific factors. Currently, these PK-specific treatment variables are typically derived and managed at the TFL programming level. In this paper, we propose PK group variables (TRTAGxN) and their descriptions that can fit the needs of PK TFL programming by adding them into ADPC and ADPP. These group treatment variables could be considered key variables for automation and ensure consistency of PK treatment variables among ADPC, ADPP, and TFLs. The treatment formats can also be updated dynamically by taking values from the variable description. These new PK treatment variables can be used to automate the PK TFL programs for study designs such as SAD/MAD, food effect, and drug-drug interaction studies to generate PK outputs. This can improve PK programming timeliness, efficiency, and accuracy. Additionally, this paper will illustrate how the updated PK ADaM datasets can be applied together with macro parameter adjustments and dynamic creation of different formats to automate PK TFL generation.
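
A hedged sketch of the dynamic-format idea, assuming the proposed TRTAG1N/TRTAG1 code and description pair and a placeholder dataset name adpc, could read:

/* Build a treatment-group format from the data itself, so TFL programs
   pick up study-specific PK groups without hard-coded format values */
proc sort data=adpc(keep=trtag1n trtag1) out=grp nodupkey;
  by trtag1n;
run;

data fmt;
  set grp;
  retain fmtname "TRTAG1F" type "N";   /* numeric format for the group code */
  rename trtag1n=start trtag1=label;
run;

proc format cntlin=fmt;
run;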

PK-334 : Two Paths, One PK Journey: The Art of Balancing ADPC and ADNCA
Ashok Abburi, Exelixis, Inc.
Rakhe Jacob, Exelixis
Shibani Harite, Exelixis Inc.

Selecting the appropriate analysis dataset for pharmacokinetic (PK) evaluation is essential for meeting regulatory expectations and aligning with evolving CDISC standards. While ADPC has historically been used to represent observed concentration-time data and support descriptive PK outputs, it is increasingly viewed as less suitable for formal regulatory submissions. In contrast, the FDA encourages the use of ADNCA, an ADaM-compliant structure specifically designed for noncompartmental analysis (NCA). ADNCA emphasizes traceability, standardized derivations, controlled terminology, and complete metadata, making it more suitable for regulatory review and inclusion in define.xml and the Analysis Data Reviewer's Guide (ADRG). We examine practical scenarios to guide dataset selection, including transitions from legacy workflows to ADNCA, reuse of historical ADPC in new submissions, and studies that require both descriptive PK summaries and formal NCA. The analysis highlights that ADNCA strengthens transparency and reproducibility by preserving clear SDTM lineage and maintaining consistent documentation in define.xml and the ADRG. For dual ADPC-ADNCA submissions, we describe when providing both datasets is truly necessary and propose ways to prevent overlap that could complicate regulatory review. The paper further provides actionable guidance on adopting ADNCA, strengthening end-to-end traceability, and optimizing metadata to streamline regulatory review. Emphasis is placed on clear dataset purpose statements, harmonized derivation rules, and disciplined use of controlled terminology. By articulating a principled approach to dataset design and documentation, this work equips sponsors and programmers to select datasets that enhance data quality, support regulatory readiness, and align with evolving standards while minimizing submission complexity.

PK-361 : Early Restricted Unblinded PK Data Access (ERUPA): Framework to Accelerate PopPK and ERES Deliverables Through Data-Centric, Firewalled Workflows
Dheeraj Rupani, AstraZeneca PLC
Srinivas Bachina, AZ
Kiran Kode, AstraZeneca PLC

AstraZeneca's accelerated submissions initiative emphasizes delivery of high-quality Population PK (PopPK) and Exposure-Response (ERES) outputs through a data-centric, connected workflow. Early Restricted Unblinded PK Data Access (ERUPA) is a GxP-compliant framework that enables a firewalled team to access unblinded PK concentration data and prespecified covariates (e.g., demographics, dosing, labs) prior to clinical data lock (CDL), while strictly excluding safety and efficacy endpoints to maintain study integrity. ERUPA operationalizes governance via SOP-driven charters, role-based access controls, secure audit-trailed repositories, and defined interaction rules between blinded and unblinded personnel. Paired with Earlier SDTM/ADaM Access for PK and ERES (ESAP) post-CDL, ERUPA shifts PopPK off the critical path, creating capacity for timely ERES modeling and report finalization. We present the end-to-end process design, governance, and technical enablers: standardized dataset specifications via the Pharmacometric Data Request Tool (PMX DaRT); harmonized raw PK data models; global CRT/eData packaging standards for first-time-right submissions; and cross-functional training across Clinical Pharmacology, Biometrics, Data Management, and Integrated Bioanalysis. Implementation experience demonstrates reduced ambiguity and rework, accelerated access to analysis-ready data, improved audit readiness, and reproducible coding practices. Case examples highlight ERUPA suitability decision criteria (including covariate constraints and partnered programs), resource models (internal/external), contract language for early data transfers, and practical data cut-off planning to ensure representativeness while preserving blinding of efficacy and safety.

PK-387 : A systematic approach for imputing missing dose information in population pharmacokinetic analysis datasets
Prema Sukumar, Bristol Myers Squibb
Renuka Hegde, Bristol Myers Squibb
Erin Dombrowsky, Bristol Myers Squibb
Neelima Thanneer, Bristol-Myers Squibb

Accurate recording of dosing events is essential for population pharmacokinetic (popPK) analysis. Missing dose dates or times frequently occur due to limitations in case report forms or incomplete data reconciliation, potentially compromising data integrity and analysis quality. Standardized imputation rules have been developed for oral, intravenous (IV), and subcutaneous (SC) doses to ensure complete dosing histories for popPK datasets. Dose records lacking both start and end dates are excluded. For oral, IV, and SC doses, PK collection times and adjacent dosing times are used to impute missing times. IV and SC studies also use protocol-defined durations for imputation when dates/times are missing. The ADDL and II variables are also used in oral studies for missing dose records. All imputed records are flagged, and the algorithm is documented in both the dataset specifications and the pharmacometric reports. This systematic approach enables pharmacometric programmers to consistently handle data deficiencies, improving analysis quality and reproducibility. This paper will illustrate comprehensive examples for IV, SC, and oral dosing scenarios, serving as a reference for adapting standard imputation rules in popPK dataset preparation. Adoption of these standards increases efficiency and aligns with the Clinical Data Interchange Standards Consortium (CDISC) Analysis Data Model popPK Implementation Guide, supporting industry-wide best practices.
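
A minimal sketch of one such rule, borrowing the time of day from the subject's adjacent dose and flagging the imputation, is shown below; the variable names are illustrative, not the authors' specification.

/* Impute a missing dose time from the subject's previous dose time */
data dose_imp;
  set dosing;               /* assumed sorted by usubjid and dose date */
  by usubjid;
  length impfl $1;
  retain _lasttm;
  if first.usubjid then _lasttm = .;
  if missing(dosetm) and not missing(_lasttm) then do;
    dosetm = _lasttm;       /* adjacent-dose time imputation           */
    impfl  = "Y";           /* every imputed record is flagged         */
  end;
  if not missing(dosetm) then _lasttm = dosetm;
  format dosetm time8.;
  drop _lasttm;
run;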

PK-437 : Bridging the Gap: The Strategic Evolution of Real-World Evidence in Clinical Pharmacology and Regulatory Decision-Making
Anbu Damodaran, Alexion, AstraZeneca Rare Disease

While Randomized Controlled Trials (RCTs) remain the gold standard for establishing drug efficacy, they often fail to capture the complexities of diverse patient populations, long-term outcomes, and real-world prescribing patterns. This paper explores the burgeoning role of Real-World Data (RWD) and Real-World Evidence (RWE) from a pharmaceutical industry perspective, focusing on its integration into clinical pharmacology and drug development. We categorize key application areas, including the assessment of Drug-Drug Interactions (DDIs) in polypharmacy environments, dosing optimization for pediatric and organ-impaired populations, and the enrichment of Model-Informed Drug Development (MIDD). Furthermore, we examine the use of RWD in creating synthetic control arms for rare diseases and supporting regulatory label expansions. By addressing data source limitations such as missingness and loss to follow-up through advanced techniques like patient tokenization, this paper outlines a roadmap for clinical pharmacologists to lead cross-functional efforts. Ultimately, we demonstrate how RWE serves not just as a supplement to RCTs, but as a critical driver for more inclusive, efficient, and data-driven regulatory approvals.

Panel Discussion

PN-236 : The Impact of AI and Automation on Statistical Programming: Opportunities, Risks, and the Path Forward
Amy Gillespie, Merck & Co., Inc.
Daniel Schramek, GSK
Francis Kendall, Biogen
Qin Li, Regeneron Pharmaceuticals
Mariann Micsinai-Balan, Genentech/Roche

Automation and artificial intelligence (AI) are rapidly reshaping statistical programming within regulated clinical development environments. Advances in automated workflows, AI-assisted coding, and intelligent validation promise efficiency gains, yet they also introduce new challenges related to quality, reproducibility, governance, and regulatory compliance. As organizations operate under GCP and inspection-readiness expectations, statistical programming leaders must carefully balance innovation with rigor, ensuring that emerging technologies enhance trust in clinical trial deliverables. Equally important is understanding how these technologies affect the people who design, execute, and oversee statistical analyses.

PN-245 : ADaM Pet Peeves Part 2: More Things Programmers Do That Make Us Crazy
Sandra Minjoe, ICON PLC
Nancy Brucken, IQVIA
Alyssa Wittle, Atorus Research
Nate Freimark, The Griesser Group
Richann Watson, DataRich Consulting
Paul Slagle, IQVIA
Tatiana Sotingco, J&J Innovative Medicine

The authors have been actively involved in the Clinical Data Interchange Standards Consortium (CDISC) Analysis Data Model (ADaM) team for many years, and they include past and future CDISC ADaM Team Leads, CDISC ADaM Sub-team leads, and authorized CDISC ADaM trainers. Because of our extensive ADaM expertise, we each end up reviewing a lot of ADaM submissions before they are sent to regulatory agencies. Following the companion paper presented at PharmaSUG in 2025, this paper highlights additional common issues that we’ve seen ADaM developers make. For each topic, we explain the issue and provide a better and/or more conformant approach.

PN-446 : Panel Discussion: QC & Validation: Beyond the Basics. Participants: Authors of AP-159, AP-375, and AP-418
Eunice Ndungu, Merck & Co Inc.
Alice Cheng, Independent
John LaBore, SAS Institute
Josh Horstman, PharmaStat LLC
Troy Hughes, Data Llama Analytics

Quality control (QC) and validation are foundational to clinical trial programming, ensuring the accuracy, consistency, and regulatory readiness of SDTM, ADaM, and reporting outputs. This panel explores modern, practical approaches to QC that go beyond traditional dataset comparison workflows, with a focus on improving efficiency, reliability, and risk detection in real-world pharmaceutical programming environments. The session brings together complementary perspectives on the use of SAS® tools and custom automation for validation, including PROC COMPARE, macro-based comparison utilities, the &SYSINFO automatic macro variable, and enhanced comparison strategies designed to detect discrepancies that may be overlooked by default procedures. Panelists will demonstrate how standard comparison methods can mask issues in derived variables, missing values, special value handling, and large-scale analysis datasets, and how targeted options, automation, and alternative comparison techniques can strengthen validation outcomes. All examples and demonstrations will use SAS® software (Base SAS®), with compatibility across SAS 9.x environments (including SAS 9.4). No operating system-specific dependencies are required, and the concepts apply across Windows and Unix/Linux SAS deployments. The intended audience includes clinical programmers, statistical programmers, data standards teams, and validation specialists involved in double programming, regulatory submission preparation, and QC automation. Attendees should have a basic working knowledge of SAS programming and PROC COMPARE. The session is suitable for intermediate-level programmers seeking to improve QC efficiency and robustness, as well as experienced professionals looking to modernize validation workflows and reduce risk in submission-critical deliverables.
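
As a flavor of the &SYSINFO pattern the panel covers, the sketch below reads the PROC COMPARE return code immediately after the step, where zero indicates the datasets matched on all compared attributes; the dataset names are placeholders.

proc compare base=adsl compare=qc_adsl criterion=1e-10 listall;
run;

%macro chkcomp;
  /* &SYSINFO is a bit mask of difference types; read it before any
     subsequent step resets it */
  %if &sysinfo = 0 %then %put NOTE: Base and compare datasets match.;
  %else %put WARNING: PROC COMPARE returned code &sysinfo - review required.;
%mend chkcomp;
%chkcomp;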

Real-World Data (RWD) and Real-World Evidence (RWE)

RW-126 : Streamlining Workflows in Real-World Evidence Studies with an R-Based Automation Tool
Lihai Song, Merck & Co.

Routine analyses, such as summary statistics, regression, survival analysis, meta-analysis, and classification metrics (e.g., F1 score), are foundational to real-world evidence (RWE) studies but often require repetitive, time-consuming work. We developed an R-based automation tool that increases efficiency, reduces errors, and produces publication-ready outputs. R users incorporate the tool as an R package to standardize syntax, reduce typographical and case-sensitivity errors, and streamline workflows. Non-R users access the same functionality via a Shiny web application that enables point-and-click analysis: upload a dataset, choose variables and analysis type, and generate tables and figures downloadable in multiple formats. The tool simplifies routine RWE analyses, improving productivity and reproducibility across user skill levels. This paper describes the tool's architecture and interface, includes application screenshots, and presents example use cases demonstrating typical workflows and outputs.

RW-186 : Corticosteroids in severe COVID-19 across molecular endotypes and vaccination status: an Emulated Target Trial approach to benchmark to and extend upon findings from RECOVERY
Xi Jiang, SAS Institute
Scott McClain, SAS Institute

Corticosteroids are standard care for severe COVID-19, but acute lung injury is molecularly heterogeneous, and treatment response may vary by vaccination status and patient-specific factors. We emulated a target trial to evaluate corticosteroid effectiveness in severe COVID-19 respiratory failure and to explore heterogeneity of treatment effects across vaccination status and clinically derived molecular endotypes. Using a retrospective cohort of 5,000 hospitalized patients at the University of North Carolina Hospital (January 2020 to December 2022), we applied an emulated target trial framework with inverse probability of treatment weighting to address confounding and approximate randomized comparisons. Outcomes in mechanically ventilated patients were compared with the RECOVERY trial and stratified by vaccination status and molecular endotypes. We successfully recapitulated the benefit of corticosteroids observed in RECOVERY among patients requiring invasive mechanical ventilation. At the clinical level, there were no differences in treatment effect across vaccination status. However, stratification by clinically linked molecular endotypes revealed distinct populations of responders and non-responders to corticosteroids. This approach demonstrated how rigorous causal inference methods applied to real-world electronic health record data can provide insights into treatment effectiveness, validate patient subgroups likely to benefit, and support more precise, personalized therapy strategies in critical COVID-19 illness. The framework also enables continuous evaluation of treatments over time, can inform clinical trial design, and anchors the value of molecular patient profiling.

RW-210 : dbLoadTable: A Robust and Efficient Solution for Bulk Data Transfer in Real-World Evidence Analytics
Li Liu, Merck & Co., Inc.

The rwdtools R package streamlines Real-World Data (RWD) analysis by offering functions tailored to the scale and complexity of Real-World Evidence (RWE) database studies. This presentation highlights one key function, dbLoadTable, which efficiently uploads large local data tables to remote databases. In RWE workflows, analysts often create intermediate datasets locally that contain partial patient or study information. To enrich these datasets with additional variables, such as demographics or outcomes, from other database tables, the local data must first be placed into the same database source, because tables can be joined only within the same database. Benchmark comparisons show that dbLoadTable dramatically outperforms existing functions such as dplyr::copy_to, DBI::dbWriteTable, and BOPM::rwdex_save (BOPM is a Merck-internal R package) in both speed and reliability. For example, when processing a 7.6 MB dataset, dbLoadTable is approximately 300 times faster than the alternatives. For datasets over 40 MB, it is the only function among those tested that consistently completes the task without failure. This capability is critical for enabling scalable, efficient data integration in RWE studies.

RW-248 : Real World Data and CDISC – An Evolving Journey
Shuo Cao, Ephicacy
Venkat Rajagopal, Ephicacy Consulting Group

Real-World Evidence (RWE) is redefining the generation of clinical evidence by capturing insights from routine healthcare data, and recent research shows that its integration with structured standards is actively shaping the evolution of data models themselves. To fully harness the potential of real-world data (RWD), which often varies in structure and semantics, harmonization with globally recognized, semantically interoperable standards is essential to improve data quality, sharing, and evidence credibility. Aligning RWD with research standards such as CDISC, while bridging with interoperability frameworks like HL7 FHIR, unlocks synergies, identifies gaps, and enhances the utility of diverse observational sources. Comparative studies demonstrate that CDISC standards can align effectively with disease-focused research datasets while complementing healthcare exchange standards to capture richer clinical context and enable bidirectional harmonization. This co-evolutionary perspective reimagines CDISC not as a static endpoint but as a living framework that adapts to the nuances of real-world practice and interoperability advances, guiding the transformation of heterogeneous RWD into evidence that is both submission-ready and robust for broader discovery. The journey from unstructured clinical encounters to CDISC-aligned RWE reflects a synthesis of semantic mapping, standards convergence, and collaborative innovation, pointing toward a future where real-world insights and structured evidence are seamlessly interwoven to accelerate discovery, enhance reproducibility, and support evidence-driven healthcare. This paper will elucidate how the convergence of real-world data and clinical research standards represents a methodological inflection point, fostering a standards-driven ecosystem that advances the reproducibility, regulatory relevance, and translational value of real-world evidence in contemporary healthcare.

RW-253 : Embracing Novel Approaches to Automated Causal Inference Framework
Laura Watson, SAS Institute
Sherrine Eid, SAS Institute

Our industry perhaps has the deepest understanding of the statement that correlation does not imply causation. Causal claims are directly actionable for policy implementation in a way that traditional association-based claims might not be. This leaves us with a rather glaring and consequential question: how can we infer causation? As technology and methods evolve, we find ourselves at an exciting time for innovation, with the potential to exponentially impact causal analysis in our industry. This paper outlines an applied framework for performing causal analysis on observational data in SAS Viya, leveraging AI and automation with the trusted logic in SAS and novel open-source packages. Researchers can now robustly perform causal inference on real-world data, as well as visually analyze the insights generated by best-in-class analytics on an end-to-end platform. Moreover, low-code/no-code and programming interfaces allow easier collaboration, giving research study teams access to advanced analytics such as machine learning models to generate evidence for a regulatory dialogue. This solution allows epidemiologists and researchers to assess what-if scenarios and comparisons in real time to generate intelligence more quickly. As a motivating example, we leverage this framework against real-world data from the U.S. National Health and Nutrition Examination Survey (NHANES) to estimate the causal effect of smoking cessation on long-term weight change.
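
For programmers wanting a concrete starting point, one way to sketch the motivating example in SAS is PROC CAUSALTRT with inverse probability weighting; the NHANES-style variable names below (qsmk for cessation, wt82_71 for weight change) are placeholders, not the authors' implementation.

/* IPW estimate of the average causal effect of smoking cessation */
proc causaltrt data=nhanes method=ipw;
  class qsmk sex education;                        /* categorical terms  */
  psmodel qsmk(ref="0") = sex age education wt71;  /* treatment model    */
  model wt82_71;                                   /* outcome: wt change */
run;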

RW-335 : Engineering a Scalable Centralized Statistical Monitoring Engine: The Architecture of the Risk Assessment & Mitigation Platform (RAMP)
Sanjeev Kumar, Circulant Solutions Inc
Sanjay Koshatwar, Circulants
Gopal Joshi, Senior Scientist

Traditional clinical monitoring relies heavily on manual oversight, often failing to detect subtle data anomalies indicative of fraud, misconduct, or systemic error. We present the architecture and technical implementation of RAMP, a scalable platform for Centralized Statistical Monitoring (CSM). Leveraging a big data technology stack (Python, SQL), RAMP ingests and harmonizes high-dimensional clinical study data. The core innovation lies in its hybrid analytical engine, which combines rule-based Key Risk Indicators (KRIs) with unsupervised machine learning algorithms (e.g., Clustering, Isolation Forests) to detect multivariate outliers. The system features an interactive visualization layer that prioritizes high-risk sites for clinical monitors. This paper details the data pipeline, the statistical methodologies used for calculating risk scores, and the feedback loop for tracking mitigation effectiveness. Performance benchmarks highlight the platform’s ability to process multi-center trial data efficiently, providing a robust technical foundation for data-driven clinical oversight.

RW-340 : Architecting a Unified Healthcare Data Lakehouse: Leveraging Spark, OMOP, and FHIR for Multi-Source Integration
Sanjeev Kumar, Circulant Solutions Inc
Gopal Joshi, Senior Scientist
Sanjay Koshatwar, Circulants

Integrating heterogeneous healthcare data streams demands robust infrastructure capable of handling the extreme “Variety” and “Volume” of big data. This paper presents a high-performance “Lakehouse” architecture designed to standardize and analyze disjointed datasets: FAERS (spontaneous reporting), administrative Claims (structured billing), and IoT Device Sensors (unstructured time-series). Built on PySpark for distributed processing and Delta Lake for ACID-compliant storage, the system employs a hybrid standardization strategy. We leverage the OMOP Common Data Model to normalize diverse medical terminologies in claims and safety reports, while simultaneously modeling high frequency sensor data into FHIR-compliant structures. To validate this architecture, we linked these datasets via temporal proxies and employed specialized R libraries to generate high-dimensional visualizations of cross-domain findings. Our results demonstrate that harmonizing open standards with modern data engineering effectively resolves data fragmentation, paving the way for comprehensive, multidimensional healthcare insights.

RW-343 : From Automation to Evidence: Governing LLMs and AI Agents in Real-World Outcomes Research
Sherrine Eid, SAS Institute

Introduction: Large language models (LLMs) and AI agents are increasingly used in patient outcomes research (POR) to accelerate feasibility assessment, cohort construction, phenotype extraction, literature reviews, and analytic code generation. While these tools offer efficiency gains, their statistical validity, reproducibility, and governance remain insufficiently characterized, particularly when applied to real-world data (RWD) supporting regulatory and health technology assessment decisions. This study evaluates governance considerations and methodological tradeoffs of publicly available LLMs and AI agent frameworks for POR. Methods: We conducted a structured technical review and benchmarking of publicly available LLMs (OpenAI GPT-4, Google Gemini, Anthropic Claude, and open-weight models such as Meta Llama 3.1) and agent frameworks (retrieval-augmented generation [RAG]). Evaluation criteria included statistical reproducibility, data governance and privacy controls, transparency and auditability, bias and drift management, and fitness-for-purpose in regulated POR workflows. Assessments were aligned with guidance from the NIST AI Risk Management Framework, TRIPOD-AI, CONSORT-AI, and FDA RWD/E considerations. Results: LLMs performed well for NLP-driven phenotype extraction, protocol summarization, and exploratory analyses when deployed with RAG and human-in-the-loop validation. However, model opacity and vendor-driven version changes introduced reproducibility risks without formal versioning. Open-weight models enabled greater statistical control and auditability but required substantial MLOps investment. Agentic workflows improved analytical throughput while increasing error-propagation risk. Conclusions: LLMs and AI agents can enhance POR efficiency when governed as statistical instruments. Model selection, deployment architecture, and governance controls should be treated as core methodological decisions to ensure transparency, validity, and regulatory readiness.

RW-367 : PSMATCH: Propensity Score Matching of Clinical Trial Data with External Control Arms
Ginger Barlow, Prilenia Therapeutics

Pharmaceutical and biotech companies have major financial incentives to lower research and development costs and get products to market faster. Patients have urgent needs for these treatments to diagnose, cure, or alleviate symptoms of diseases and conditions. External control arms using Real World Data, such as registries and platforms, can facilitate research into rare diseases when a randomized controlled trial would be challenging to perform. Using these external control groups provides a more reliable comparison for evaluating safety and efficacy than a single-arm trial would. Prilenia has utilized external control arms for studies in the treatment of Huntington's disease and amyotrophic lateral sclerosis (ALS). Analyzing data from these hybrid-controlled trials follows methodology similar to that of internal clinical trials, with patient selection from the control arm needed in advance to ensure the control group matches the clinical profiles of the treated group. This paper discusses how we used SAS PROC PSMATCH to create a matched group of treated and control patients by computing observation weights from propensity scores, reducing the differences in baseline covariates between treated and control groups, and performing analysis using the external arm as the control. This knowledge will be useful as an introduction for intermediate-level programmers and statisticians to these types of studies and analyses.
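
A hedged sketch of the PROC PSMATCH flow described above, under assumed dataset and variable names (trtgrp as the treated/external-control indicator; the covariates are placeholders):

proc psmatch data=hybrid region=allobs;
  class trtgrp sex;
  psmodel trtgrp(Treated="ACTIVE") = age sex basescore disdur;
  match method=greedy(k=1) distance=lps caliper=0.25;   /* 1:1 greedy match */
  assess lps var=(age basescore disdur);                /* balance check    */
  output out(obs=match)=matched attwgt=attwt matchid=_matchid;
run;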

RW-379 : Contrast Effects in Curated Observational Data
Shankar Srinivasan, Bayer
Helen Guo, Bayer
Yvonne Buttkewitz, Evidenze Germany GmbH

We provide details on weighted analyses to help remove baseline biases in contexts where there are multiple groups and where there is interest in comparisons involving multiple treatment groups, such as pairwise comparisons or comparisons of one group or a combination of groups to another group or combination. For instance, in a data set with outcomes on 5 therapies with 2 in one drug class and 3 in another drug class, there might be interest in comparison of a therapy of interest to other therapies in the drug class or to one or all the members of the other drug class, or in a comparison of the two drug classes. These call for the use of a composite linear string of treatment effects called a contrast. For such composite inferences, we describe modifications to the weighting typically used in the two-group setting, which lead to near-optimal estimates of composite effects and provide stabilized weights. Variants on theoretical results supporting the use of propensity scores in observational settings are used to obtain expressions for the weights. This extended framework also validates the use of stabilized weights in the two-group setting. A simple pedagogical example is used to clarify the computation of this weight, and simulations in SAS® are used to demonstrate the utility in estimating effects. Both the simulation code and the simpler code for analysis will be provided. Similar results have been previously presented by the author at the BASS conference but have not yet been published.

RW-384 : Assessing Quality of Real-World Data Sources
Robert Collins, SAS Institute

As recently as December 2025, the FDA continues to update guidance on the use of real-world data (RWD) and evidence (RWE) for drug and biologic products and medical devices. While the regulator continues to be increasingly open to the use of RWD sources, which have flourished as a result of technical advancements and requirements such as medical data interoperability, sponsors and regulators believe there are still significant opportunities for the use of RWD as long as the quality and provenance of the sources are demonstrable. In this session, we present a discussion, applicable to anyone interested in the use of RWE, of the "first-person" and "third-person" views of data held by the generating institutions and the consumers of pooled data, and establish that source-based quality methodologies have the greatest potential for improving data. We next review the factors that give rise to problematic data at the source. We then look at practical approaches for a third-party data consumer to assess and document data quality. These methods present and expand on traditional methods of exploratory data analysis and consider the use of machine learning and artificial intelligence to provide new approaches to assessing data quality.

RW-429 : Biostatistical Foundations 201: Privacy Preserving Patient Linkage Across Real World Data Sources
Anbu Damodaran, Alexion, AstraZeneca Rare Disease

As integrated evidence generation becomes the industry standard, unifying disparate Real-World Data (RWD) sources, such as EHRs, claims, and registries, is essential for a longitudinal patient view. However, the absence of universal identifiers and the necessity of HIPAA compliance require sophisticated Privacy-Preserving Record Linkage (PPRL). This paper explores the biostatistical foundations of patient linkage, progressing from Deterministic and Probabilistic (Fellegi-Sunter) models to modern Tokenization and Bloom filter encoding. We move beyond basic matching to address the critical biostatistical challenge of Match Quality Estimation. By analyzing the trade-offs between precision and recall, we demonstrate how linkage errors can propagate into downstream analyses, potentially compromising study validity. Furthermore, we examine Linkage Bias, discussing how non-random missingness in quasi-identifiers can lead to systematic under-representation of specific demographic subgroups. Attendees will learn to evaluate data convergence using overlap metrics and temporal alignment, while gaining familiarity with tools like Splink and Dedupe. By bridging the gap between data engineering and clinical statistics, this session equips biostatisticians, epidemiologists and statistical programmers with the framework to accurately and ethically "connect the dots" across the RWD ecosystem.

RW-435 : Operationalizing Real World Data for External Control Arms: An End to End Framework for Rare Disease and Oncology Trials
Anbu Damodaran, Alexion, AstraZeneca Rare Disease

As oncology and rare disease development increasingly confront ethical and operational limits of randomized placebo arms, real-world data (RWD) offers a path to create external or synthetic control arms. However, transforming registry and electronic health record (EHR) data into regulatory-grade evidence requires more than concept; it demands disciplined engineering. This paper consolidates a practical framework to architect trust in real-world evidence (RWE) by integrating (1) protocol-to-registry semantic harmonization, (2) automated, audit-ready quality control (QC) with ML-assisted checks for unstructured data, and (3) principled causal inference designs (e.g., target trial emulation, propensity-based balancing, doubly-robust estimation) aligned to ICH E9(R1) estimands. We describe an end-to-end pipeline emphasizing provenance, versioning, pre-specification, diagnostics, and sensitivity analyses. An illustrative oncology use case outlines cohort construction, endpoint derivation (e.g., ORR, PFS, OS), overlap weighting, hierarchical modeling for site effects, and adjudication workflows for radiology NLP outputs. The result is a submission-ready blueprint that elevates RWD from observational noise to high-fidelity operational evidence, improving ethics, efficiency, and interpretability for small-sample, high-need populations.

Study Data Integration & Analysis

SI-109 : Insights and Experience Sharing with Patient-Reported Outcome Data Analysis in FDA’s Submission
Jingyuan Chen, Genentech

Recent years have seen a growing interest in the inclusion of Patient-Reported Outcomes (PRO) data in confirmatory trials. This presentation will explore key insights from the FDA's guidance on 'Submitting Patient-Reported Outcome Data in Cancer Clinical Trials,' with a focus on critical definitions and analytical approaches. We will share our experiences in addressing FDA information requests related to PRO data, including successful cases where PRO analyses contributed to labeling during NDA submissions. Additionally, we will outline the aligned strategies for PRO data analysis within our molecule program, highlighting their impact on future filing studies.

SI-227 : A Statistical Programmer’s Guide to Tipping Point Analysis in SAS
Ang Xu, Boehringer Ingelheim

Handling missing data under potentially unverifiable assumptions remains a challenge in longitudinal clinical trial analyses. Primary analyses frequently rely on the 'follow the reference' (FTR) assumption, whereby patients discontinuing randomized treatment are assumed to follow the response trajectory of the placebo arm. From a statistical programmer's perspective, this presentation demonstrates a SAS-based implementation of a tipping point analysis to assess the robustness of primary conclusions to departures from the FTR assumption, using longitudinal forced vital capacity (FVC) data. Non-monotone missing FVC values are first imputed using Markov Chain Monte Carlo (MCMC) methods to generate datasets with a monotone missingness pattern, a prerequisite for sequential regression imputation in SAS. The number of imputations is selected to minimize Monte Carlo error, and reproducibility is ensured through use of a fixed random seed. The tipping point analysis is conducted using a Multiple Delta Adjustment approach. Within each monotone dataset, missing values are sequentially imputed by referencing placebo-arm data, with delta adjustments applied to reflect varying degrees of departure from the FTR assumption. Adjustments are specified as linear functions of time since treatment discontinuation to represent declining persistence of efficacy. Each completed dataset is analyzed using a random slope and intercept model fitted via restricted maximum likelihood, and treatment differences at the time of interest are combined using Rubin's rules. Results are summarized to identify tipping points where study conclusions change.
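
A compact sketch of the two imputation stages described, with illustrative names (fvc12-fvc52 for visit-level FVC, trt for arm) and a single fixed delta; in practice the delta is looped over a grid and each analysis is combined via Rubin's rules.

/* Stage 1: MCMC partial imputation to a monotone missingness pattern */
proc mi data=fvc nimpute=100 seed=20260314 out=fvc_mono;
  mcmc impute=monotone;
  var fvc12 fvc26 fvc52;
run;

/* Stage 2: monotone regression imputation referencing the placebo arm,
   with a delta shift applied to the active arm (delta=-2 shown) */
proc mi data=fvc_mono nimpute=1 seed=20260314 out=fvc_delta;
  by _imputation_;
  class trt;
  monotone reg;
  mnar model(fvc52 / modelobs=(trt="PLACEBO"))
       adjust(fvc52 / delta=-2 adjustobs=(trt="ACTIVE"));
  var trt fvc12 fvc26 fvc52;
run;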

SI-293 : From Data to Dossier: Lessons from Cross-Company Regulatory Submissions
Nikita Joseph, AstraZeneca
Gayathri Mahadevan, Daiichi Sankyo, Inc

Cross-company pooling and submission present significant challenges requiring agile strategy adaptation, unique problem-solving skills, and predictive insight. Drawing on personal experiences collaborating on cross-company projects, this paper will share practical lessons for driving a successful regulatory submission for products co-developed by pharma companies. We will examine the complexities around such pooling strategies and how shifting company-driven priorities impact programming deliverables. We will explore data challenges caused by differing company standards through examples from adverse events, exposure, pharmacokinetics, and immunogenicity data. In particular, challenges around data integration and alignment, standards adherence, and maintaining analysis consistency will be discussed. We will then shift our focus to programming review of submission documents such as the pooling strategy, SAP, CSR, and SmPC. Building on these experiences, we will demonstrate how early alignment, proactive risk monitoring, and systematic review processes improve efficiency and quality. Attendees will learn strategies to drive successful submissions in the collaborative drug development space.

SI-306 : An Approach for Generating Tumor Biopsy Datasets for Drug Development
Ryan Hernandez-Cancela, Merck & Co., Inc.
Jeff Cheng, Merck & Co., Inc.
Sandeep Meesala, Merck & Co. Inc

Drugs that target tumor-associated antigens (TAAs) represent an important domain within oncology, due in part to their proven clinical benefit. To support in-house development of these drugs, certain exploratory analyses were planned. These analyses will help identify potential TAA targets, and they require data from tumor biopsies collected in a study. However, comprehensive datasets that include all pertinent biopsy data were not available. Methods were needed for generating these datasets in an accurate, robust, and timely manner. Therefore, a suite of tools was developed for creating biopsy datasets. An algorithm was designed to match biopsies to clinical data from a specific CRF. Programs were written in SAS to derive the values of certain CDISC variables (e.g., smoking status) at the biopsy date. Supportive documents, such as dataset specifications and issue logs, were created in Excel to provide traceability and help track issues. Finally, a general workflow was designed for generating datasets as efficiently as possible. As a result, comprehensive biopsy datasets can now be created within reasonable timelines. Data linked to biopsies are either directly applicable or provided with enough context to assess relevance. Sufficient documentation is available to justify mappings between requested data and available clinical data, as well as answer other stakeholder questions. In this paper, we explain the process for creating these exploratory biopsy datasets and how it addresses certain challenges that we encountered.

SI-348 : Algorithms to align the distribution of follow-up across independently collected cohorts when comparing time-to-event endpoints using conventional Kaplan-Meier and Cox regression methods
Shankar Srinivasan, Bayer
Yvonne Buttkewitz, Evidenze Germany GmbH
Regina Uttenreuther, Evidenze Germany GmbH
Baldeep Chani Talwar, Syneos Health
Manjari Dissanayake, Bayer Corporation

Using data from two independently collected cohorts, such as a historical control from a completed randomized trial and a 1:1 propensity score matched treated cohort from a prospective single-arm trial, one can often have differences in the accrual time and in follow-up post accrual. Such a context requires special approaches for time-to-event endpoints. We will describe an algorithm and provide SAS® code to implement it, together with a motivation for its use based on masked data (publication of the study results is pending). In our analyses, we matched cohorts when the enrollment of our prospective cohort was completed and before the availability of complete outcome data in the prospective cohort. A team blinded to outcome data from both cohorts conducted the matching. We then tallied events across cohorts while our trial was ongoing until the event thresholds for analyses were reached. To address differential follow-up distributions, an algorithm was used to provide follow-up in the control equivalent to that in the treated arm, both during the study and at study closure. Conventional methods under proportional hazards could have led to biases through differential accrual and follow-up durations across the groups compared: a paradox where short durations and long durations are associated with hazards of events of differing magnitudes, despite both being associated with the same ratio of hazards across groups. Our protocol-prespecified algorithm randomly paired control patients with similar follow-up distributions in a manner agnostic to censoring or event status, using potential observation times until data cut-off.

SI-353 : Simplifying Clinical Site Oversight with GenAI Site Narratives: Transforming Raw Data into Actionable, Inspection-Ready Insights
Rohit Kadam, Tata Consultancy Services
Saurabh Das, Tata Consultancy Services
Niketan Panchal, Tata Consultancy Services

Risk-Based Quality Management (RBQM) demands timely, standardized insights into site performance. Yet today, site narratives are manually compiled from disparate systems such as EDC, CTMS, safety databases, and labs, making oversight inconsistent and resource intensive. We present an agentic AI framework that automates inspection-ready site narratives by orchestrating specialized agents to analyze enrollment, protocol deviations, SAE trends, and data quality metrics. Outputs are synthesized into structured narratives using a generative layer, with full traceability and configurable business rules. In a pilot, this approach reduced manual effort by 70-80% and surfaced emerging risks weeks earlier than traditional reviews. Built on CDISC standards, the solution ensures interoperability and aligns with ICH E6(R3) expectations. This session will share the implementation architecture, validation strategies, and lessons learned from scaling across studies. Attendees will gain practical insights into integrating AI-driven narratives into RBQM workflows to enhance quality oversight and regulatory readiness. Methods: Data sources include EDC, CTMS, safety databases, lab systems, and query logs. Agentic AI design: specialized agents analyze enrollment velocity, protocol compliance, SAE trends, query aging, and lab outliers. Narrative synthesis: a generative layer converts agent outputs into standardized narratives with full audit traceability and configurable business rules. Results: Pilot implementations reduced manual effort by 70-80%, standardized interpretation across sites, and surfaced emerging risks weeks earlier than traditional reviews. Conclusion: Automated site narratives represent a novel RBQM capability, bridging the gap between raw data and actionable insights. This session will share architecture diagrams, validation strategies, and deployment best practices for attendees.

SI-373 : From Chaos to Consistency: Standardizing External Clinical Data with Excel Power Query
Isaac Vazquez, Ephicacy Consulting Group
Jose Hernandez Rivero, Principal Statistical Programmer

Clinical and real-world data integrations increasingly depend on external data sources that fall outside traditional clinical data pipelines. These sources, such as laboratory vendor files, Clinical Endpoint Committee (CEC) datasets, and operational reports, are often delivered in heterogeneous formats including plain text, CSV, and JSON files. Inconsistent structures and formats frequently lead to manual preprocessing, custom scripts, and increased risk of error prior to SAS integration. This paper presents a practical and scalable approach using Microsoft Excel Power Query to ingest, transform, and standardize external data from multiple heterogeneous sources into a single, structured Excel output designed for direct consumption in SAS. Power Query is used to perform repeatable data ingestion, parsing, data cleansing, variable standardization, and reshaping while maintaining transparency and traceability. The proposed solution is built as a reusable Power Query framework embedded within an Excel file. Once configured, the framework automatically refreshes and regenerates a standardized output whenever updated source files are received, without requiring modifications to the transformation logic. The resulting dataset adheres to predefined structural and naming conventions, enabling seamless import into SAS and downstream integration with SDTM, ADaM, or other analysis-ready datasets. Use cases demonstrated include the integration of external laboratory results, adjudicated adverse event data from CECs, and other non-CRF data sources commonly encountered in clinical trials. This approach reduces programming overhead, improves reproducibility, and provides a controlled preprocessing layer that complements existing SAS workflows.
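
On the SAS side, consuming the standardized workbook then reduces to a single step; the path and sheet name below are placeholders.

libname xl xlsx "/data/external/standardized_lab.xlsx";

data work.ext_lab;
  set xl.lab_std;   /* sheet produced by the Power Query framework */
run;

libname xl clear;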

SI-414 : Harmonizing History: A Framework for Deriving Line of Therapy in Complex Integrated Summaries.
Bhargav Koduru, Medidata Solutions
Kavitha Guddam, Dassault Systemes
Santosh Reddy Lekkala, Medidata Inc

The Integrated Summary of Safety (ISS) and the Integrated Summary of Efficacy (ISE) are vital components of submissions to the FDA and EMA. They have become even more important following the FDA's 2026 guidance titled "Use of Bayesian Methodology in Clinical Trials of Drugs and Biologics." This guidance addresses frameworks that incorporate data from previous studies, Real-World Evidence (RWE), and non-concurrent controls to improve study design and predictive outcomes. A key factor in successfully harmonizing these diverse data sources is the accurate determination of Line of Therapy (LOT). LOT refers to the order or sequence in which treatments are administered as the disease progresses and plays a crucial role as a stratification factor, especially in oncology. Its impact is twofold: in terms of efficacy, LOT helps distinguish the probabilities of success, as patients in earlier lines of therapy generally show better responses; regarding safety, it informs pooling strategies for dose bridging, recognizing that later-line patients may need higher dosing regimens. Additionally, LOT status can affect the statistical correlation between Progression-Free Survival (PFS) and Overall Survival (OS), thereby validating the selection of endpoints. However, utilizing LOT necessitates consistent data normalization to standardize treatment sequences, responses, and durations across all data sources. This paper outlines our journey in implementing LOT standardization. We will discuss the specific challenges encountered in harmonizing data for the ISS and ISE, and the methodologies used to derive accurate LOT.
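
As a deliberately simplified illustration of the derivation (not the authors' algorithm), a first-pass LOT number can be assigned by ordering each subject's regimen start dates; production derivations add rules for combination regimens, restarts, and gaps. All names are hypothetical.

proc sort data=regimens out=reg_srt;
  by usubjid regstdt;
run;

data lot;
  set reg_srt;
  by usubjid;
  if first.usubjid then lotn = 0;
  lotn + 1;    /* each new regimen opens the next line of therapy */
run;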

Submission Standards for Global Health Authorities

SS-114 : Statistical Programmer’s role in using Lorenz eValidator to validate contributions to eCTD
Sampath Madanu, AstraZeneca

The Electronic Common Technical Document (eCTD) is the standard electronic format for pharmaceutical companies submitting New Drug Applications (NDAs), Biologics License Applications (BLAs), and other reports to regulatory agencies globally, such as the FDA, EMA, and PMDA, which require technical compliance with eCTD for their regulatory submissions. Lorenz eValidator is one of the most commonly used eCTD validation tools among pharma companies and regulatory agencies. It can perform study data checks, PDF document checks (aCRF, reviewer's guide, complex algorithm bookmarks, hyperlinks, annotations), and key file checks. In the past, publishing teams were the main users of Lorenz eValidator, but with the focus on accelerating submissions and getting medicines to patients faster, some aspects of eCTD technical validation are shifting earlier in the process. Statistical programmers are now using Lorenz eValidator to technically validate deliverables (PDFs, datasets) before passing them to publishing, to avoid identifying issues later and needing to redeliver. The Lorenz eValidator tool is separate and different from the Pinnacle 21 validation tool that many programmers are familiar with. This article details the statistical programmer's role in eCTD validation using Lorenz eValidator, the different validation criteria, errors encountered during validation, and how to resolve these errors.

SS-176 : Continuous GxP: Implementing Change Control and Revalidation for a Living Open Source Library
Priyanka Sawant, Sycamore Informatics

The rapid innovation cycles in the R and Python Open Source (OS) ecosystems present a significant dilemma for GxP-regulated environments. While continuous integration of new features is desired, the traditional, slow, and expensive study-by-study validation process is fundamentally unsustainable against weekly package updates. This conflict creates an innovation bottleneck, forcing organizations to operate on outdated, sub-optimal codebases. This presentation addresses this critical gap by detailing a Continuous Validation Strategy enabled by collaborative technology and services expertise. We will demonstrate a solution that shifts the focus from reactive, full-scope validation to proactive, risk-mitigated change control managed jointly by a GxP-compliant platform and a validation partner. The process is built around three pillars: 1) Risk-Based Triage: new package versions are automatically assessed using objective metrics (like riskmetric) combined with a services-expert regulatory review, efficiently scoping the revalidation effort to only critical changes. 2) Modular Validation: the system leverages existing, pre-approved validation artifacts, executing only "delta" test cases that target modified functionalities. This significantly accelerates the release of validated package versions. 3) Version Management: crucially, the platform utilizes containerization to ensure stability, allowing ongoing clinical studies to remain locked down on their currently validated package versions while the central library is updated. This systematic, risk-mitigated approach transforms the validation burden into a streamlined, continuous, audit-ready process.

SS-193 : Structure for Success: Delivering a Complex, Accelerated NDA with Evolving Scope
Tingting Tian, Merck
Chao Su, Merck
Erica Davis, Merck

Late-stage regulatory submissions such as New Drug Applications (NDAs) require tight coordination among clinical, statistical, programming, and other functional teams, while timelines and scope evolve in real time. This paper describes the planning, execution, and lessons learned from an NDA comprising nine components: a supportive Phase II study, four pivotal Phase III studies, three pooling packages, and a combined bioanalytical/BIMO package. Approximately 1,300 tables, figures, and listings (TFLs) were produced, with a Safety Update Report (SUR) following. Timelines for each of the Phase III database locks and the target submission date shifted earlier by 2-5 months, reducing development and review cycles. Key challenges included managing a compressed and shifting timeline, incorporating late-breaking requests for an Integrated Summary of Efficacy and extensive post-hoc sensitivity analyses, and delivering complex model-based efficacy analyses that required advanced handling and computationally intensive methods. To address these challenges, the team implemented staggered development, centralized standards under a lead statistical programmer, regularly scheduled cross-functional check-ins, automation-enabled QC, structured intake and prioritization of post-DBL requests, and enhanced validation strategies such as four-way validation. The paper concludes with practical recommendations on minimum development windows, risk-based dry runs, modular programming standards, and governance of resources and requests for accelerated submissions.

SS-209 : Lessons to Guardrails: Operationalizing Early Checks for FDA Submission Readiness
Jeff Xia, Merck & Co.

Managing lessons learned from numerous submissions can be challenging and difficult to track consistently. This paper introduces a practical method to transform lessons into guardrails through a phase-based checklist embedded between Last Subject Out (LSO) and Database Lock (DBL), enhancing quality and consistency in studies with filing potential. The checklist was developed by extracting insights from past oncology submissions, staff meetings, monthly Lunch-and-Learns, and periodic Spotlight Projects; converting these insights into actionable statements; and assigning each item an owner, timing, and verifiable actions. The framework addresses common pitfalls, including CSR/TFL alignment, metadata currency, aCRF readiness, investigational drug mapping consistency, disciplined dry runs, unblinded DMC workflows, compliance checks, log hygiene, and submission readiness across eCTD components. Attendees will learn a repeatable process to convert lessons into guardrails, review concrete examples and evidence artifacts, and take away ready-to-use templates and a "Top Pitfalls" checklist they can deploy immediately.

SS-243 : Efficiency in Action: Automating Bookmarking for CRFs and Other Regulatory Submission Documents
Srivathsa Ravikiran, Agios Pharmaceuticals
Sri raghunadh Kakani, Agios Pharmaceuticals
Yang Xu, Agios Pharmaceuticals

Bookmarking Case Report Forms (CRFs) for regulatory submissions is a critical but often laborious task for statistical programmers. Manual bookmarking in Adobe Acrobat Pro is time-consuming, repetitive, and prone to human error, especially for large, multi-level documents. This paper presents a detailed, step-by-step workflow for automating CRF bookmarking using the TRS toolbox (formerly ISI toolbox) plug-in for Adobe Acrobat Pro. By leveraging the toolbox’s import/export functionality and Excel templates, programmers can efficiently generate, edit, and apply complex bookmark structures with minimal manual intervention. This approach not only accelerates the submission preparation process but also enhances accuracy, reproducibility, and compliance with regulatory requirements. The workflow is illustrated with practical procedures, template examples, and guidance for integrating screenshots, making it accessible for programmers seeking to streamline their CRF bookmarking tasks.

SS-254 : A Practical Approach to Multiple-Period CLINSITE Preparation
Liyuan Huang, Alnylam Pharmaceuticals, Inc.
Amanda Plaisted, Alnylam Pharmaceuticals
Sreedhar Bodepudi, Alnylam

The preparation of the summary-level clinical site dataset (e.g., CLINSITE) for a Bioresearch Monitoring (BIMO) package in an NDA submission requires consistent, traceable, and regulatory-compliant data structuring across study periods. In the study described, two periods were included: a double-blind (DB) treatment period, which contained two primary efficacy endpoints and safety evaluations for two different population sets, and an open-label extension (OLE) period, which focused exclusively on safety assessments at the time of the primary analysis database lock. To maintain consistency in data review and downstream validation without introducing any variable that is not in the FDA BIMO updated technical conformance guide (TCG, Version 3.0, August 11, 2022), the variable STUDYID in CLINSITE was structured to incorporate both DB and OLE data. ENDPOINT and ENDPTYPE were set to the same values in the DB period, while TRTEFFR1 and TRTEFFR2 were intentionally set to missing for the OLE period because no formal efficacy endpoints were evaluated at that time. This paper outlines the technical and regulatory challenges encountered and presents practical solutions implemented to prepare a CLINSITE dataset aligned with TCG version 3.0.
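
As a minimal illustration of the period handling described above (dataset and variable names beyond the TCG variables are hypothetical), the OLE rows can be handled with a single conditional:

    * Sketch: keep the TCG efficacy variables but blank them for OLE rows;
    data clinsite;
      set site_frame;                        /* hypothetical site-level input with a PERIOD variable */
      if period = 'OLE' then
        call missing(trteffr1, trteffr2);    /* no formal efficacy endpoints evaluated in OLE */
    run;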

SS-255 : Investigator-Initiated Trials: Navigating Statistical Programming Challenges with Practical Solutions
Vikas Patil, Servier Pharmaceuticals
Ramesh Sundaram, Servier

Investigator-initiated trials (IITs) are essential for advancing clinical research, particularly in exploring novel therapeutic approaches and addressing unmet medical needs. Unlike industry-sponsored trials, IITs are typically driven by academic researchers and often operate under constrained budgets, limited operational infrastructure, and diverse data collection systems. From a statistical programming perspective, these constraints can translate into heightened risk across the end-to-end pipeline, ranging from data ingestion and transformation to analysis, reporting, and inspection readiness. This paper outlines key programming challenges in IITs, including data integration, statistical methodology, compliance requirements, and resource limitations. Practical solutions include adopting minimum data standards (e.g., CDISC), using modular and reusable programming frameworks, implementing version control and automated checks, and leveraging templated documentation for traceability. Applying modern programming principles and risk-based validation enables IITs to deliver inspection-ready outputs efficiently, ensuring reproducibility and credibility without replicating the complexity of large-scale industry trials.

SS-261 : Closing the Loop: Validating AI-Generated SDTM Mappings using CDISC CORE and Synthetic Data
Pietro Belligoli, Technical University of Munich (TUM)
Constantin Weberpals, TUM
Yarhy Flores Lopez, Technical University of Munich

Generative AI approaches to SDTM mapping often focus on prediction accuracy but lack tight integration with conformance validation, leaving specification and logic errors undetected until late in the data collection or analysis phases. We describe a closed-loop framework that combines generative AI SDTM mapping with automated CDISC CORE conformance validation using synthetic data to enable earlier detection of specification issues. The proposed methodology extracts detailed field-level metadata from CRFs and complementary non-EDC inputs, including data transfer specifications, and generates SDTM variable mappings using large language models grounded in SDTMIG specifications. To proactively assess conformance, the system generates synthetic SDTM datasets based on the proposed mappings and executes CDISC CORE validation rules against these datasets. Validation failures automatically trigger an iterative refinement loop, in which the AI model revises mappings and derivation logic based on specific CDISC CORE rule violations and contextual feedback. This closed-loop process continues until conformance criteria are satisfied or remaining non-conformance is identified as expected and appropriately documented. Early results demonstrate the framework's ability to detect logic-based derivation errors, context-dependent Value Level Metadata violations, and cross-variable inconsistencies such as mismatched Test Codes and Units that are difficult to identify through manual review alone. By integrating mapping generation and conformance validation into a single automated workflow, this approach demonstrates the feasibility of shifting CDISC compliance earlier into the specification process, reducing downstream rework and improving the overall quality and reliability of SDTM deliverables.

SS-268 : Post-DBL Programming Update Tracker: Automating Revision Capture to Strengthen Audit Readiness, Oversight, and Compliance Across Programs and ADaM Specifications
Yunyi Jiang, Merck & Co., Inc.
Christine Teng, Merck

Regulatory inspections are a critical component of the submission process, serving as a safeguard to confirm the safety, efficacy, and compliance of pharmaceutical products. For management and governance stakeholders, demonstrating compliance, performing pre-inspection checks, and proactively avoiding CAPAs (Corrective and Preventive Actions) are essential. After Database Lock (DBL), even minor updates to code or specifications can create significant audit and compliance risks if documentation is incomplete or inconsistent. Audit readiness depends on maintaining a single, traceable record of what changed, when it changed, and why. This paper presents a SAS-based tool that tracks post-DBL updates to programming and ADaM specifications. It converts technical logs into oversight-ready evidence, including standardized Excel outputs, discrepancy alerts for risk-based review, and a non-destructive update mode that preserves user-added commentary while appending new findings. This tool can scan multiple folders, check program and specification views, and produce comprehensive summaries within minutes. This paper explains the tool's design and workflow, highlights example alerts and outputs, and shares metrics from SAS implementations, demonstrating automated summary runtimes, time savings, accuracy improvements, and reduced maintenance effort. Attendees will learn how to implement automated post-DBL change tracking, create governance-friendly evidence trails, and minimize manual documentation while preserving essential context and accountability.
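
As a rough sketch of the folder-scanning step (the path is hypothetical, and the actual tool adds discrepancy alerts, Excel formatting, and the non-destructive update mode), file-level metadata can be captured with SAS directory functions:

    data post_dbl_files;
      length fname $200 modified $64;
      rc  = filename('dir', '/studies/xyz/programs');          /* hypothetical program folder */
      did = dopen('dir');
      do i = 1 to dnum(did);
        fname = dread(did, i);
        if lowcase(scan(fname, -1, '.')) = 'sas' then do;
          rc2 = filename('f', catx('/', '/studies/xyz/programs', fname));
          fid = fopen('f');
          modified = finfo(fid, 'Last Modified');              /* OS-level timestamp */
          rc2 = fclose(fid);
          output;
        end;
      end;
      rc = dclose(did);
      keep fname modified;
    run;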

SS-277 : Optimizing Clinical TFL Review with Python and Power BI: A Reproducible Workflow to Reduce QC Time and Improve Traceability
Vijaya Lakshmi Cherakam, Ephicacy Consulting Group
Latha Donapati, Ephicacy Consulting group

Clinical Programming and Biostatistics teams frequently operate under significant time constraints while being responsible for producing accurate, consistent, and submission-ready Tables, Figures, and Listings (TFLs). Traditional manual TFL review processes are resource-heavy and susceptible to oversight, particularly in large or complex clinical studies where multiple data transfers are expected. This paper presents a practical, open-source-driven framework leveraging Python and Power BI to enhance and streamline the TFL validation process in clinical trials. The proposed approach integrates automated data and metadata validation using Python, along with independent cross-validation of statistical summaries through interactive Power BI dashboards. These dashboards enable reviewers to rapidly identify discrepancies, trends, and anomalies across outputs. The framework supports reproducible validation, reduces manual review cycles, and improves collaboration and communication among programming, statistics, and study teams. Examples based on typical clinical trial deliverables demonstrate how these open-source workflows can be incorporated into existing processes without disrupting established standards. This paper provides attendees with actionable methods to improve TFL review efficiency and data quality within real-world project timelines. Implementation details, representative code snippets, governance and validation considerations, and a discussion of limitations and future enhancements are included. The approach enhances traceability, supports inspection readiness, and complements rather than replaces validated SAS®/R processes used for final regulatory submissions.

SS-307 : From Data Flood to Insight: Efficient SDTM Validation for High-Frequency Sources
Seiko Yamazaki, Certara

In recent years, the increasing use of high-frequency data such as eDiaries and wearable devices has led to cases where direct conversion into SDTM results in an extreme increase in record counts and very large datasets. Under these circumstances, running CDISC compliance checks using validation tools can require excessive execution time or even fail to complete due to memory limitations or timeouts. Regulatory authorities may encounter similar challenges when receiving and reviewing such datasets, potentially causing delays in the regulatory review process. From this perspective, this presentation will discuss practical strategies that can be implemented at the SDTM design and mapping stage to reduce validation runtime, enabling attendees to manage the validation process smoothly and efficiently while avoiding memory, timeout, or execution errors.

SS-336 : Beyond the First Draft: How Generative AI Is Redefining Medical Writing in Pharma
Devendra Toshniwal, Circulants INC.
Gopal Joshi, Senior Scientist
Sanjay Koshatwar, Circulants

The increasing complexity of clinical development and regulatory expectations has placed significant pressure on traditional medical writing workflows. Documents such as patient narratives, deviation reports, clinical study reports, and regulatory submissions require high levels of accuracy, consistency, traceability, and compliance, often under tight timelines. Conventional, manual authoring approaches struggle to scale efficiently while maintaining quality and regulatory rigor. This paper presents a practical and scalable approach to reinventing medical writing workflows using Generative Artificial Intelligence (GenAI), focusing on its application across clinical and regulatory documentation. The proposed framework demonstrates how GenAI can be integrated with structured clinical data sources (e.g., SDTM, ADaM, and operational datasets) to assist in generating patient narratives, deviation summaries, and regulatory-ready content while preserving auditability and data integrity. The discussion highlights real-world use cases where GenAI acts as an intelligent co-author, automating first-draft generation, standardizing language, improving consistency across documents, and significantly reducing authoring timelines. Special emphasis is placed on governance mechanisms such as human-in-the-loop validation, controlled prompt design, traceability to source data, and compliance with GxP and regulatory expectations. In addition, the paper explores architectural considerations, data security controls, and validation strategies required to responsibly operationalize GenAI in regulated environments. Through practical examples and implementation insights, this session demonstrates how organizations can move from manual, document-centric processes to intelligent, AI-assisted workflows that improve efficiency, quality, and decision-making.

Tools, Tech & Innovation

TT-102 : Sync & Scale: Empowering Cloud Hub & Team Synergy with SAS Bridge
Mayank Singh, Johnson and Johnson MedTech (Neurovascular)

In the clinical industry, efficient and secure data sharing is critical for clinical, regulatory, and research workflows. This paper presents a modular SAS 9.4 macro framework that automates secure transfers between Amazon S3 and Microsoft SharePoint/Teams Online. The implementation leverages PROC S3 for cloud storage operations and Microsoft Graph API calls via PROC HTTP for SharePoint/Teams integration, supporting upload, download, update, and safe deletion workflows. The approach streamlines document management, enhances collaboration, and ensures compliance with industry standards. The SAS_S3_SharePoint_DataMover streamlines the data transfer process by enabling efficient uploading and downloading, thereby reducing manual intervention and decreasing the likelihood of errors. Practical applications illustrate how this integrated workflow facilitates prompt data sharing, expedites review cycles, and ensures robust audit trails. This scalable and flexible framework provides a valuable solution for SAS programmers and data managers aiming to enhance data governance, improve collaboration, and maintain compliance within regulated environments.
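
For flavor, a hedged sketch of the SharePoint upload leg via PROC HTTP and the Microsoft Graph API (the macro variables &siteid and &token, holding a site ID and an OAuth bearer token, are assumed to have been obtained beforehand; the paper's framework wraps such calls in macros with error handling):

    filename src '/outputs/adsl.xpt';   /* hypothetical file to upload */
    filename resp temp;

    proc http
      url="https://graph.microsoft.com/v1.0/sites/&siteid./drive/root:/Clinical/adsl.xpt:/content"
      method='PUT'
      in=src
      out=resp;
      headers 'Authorization' = "Bearer &token."
              'Content-Type'  = 'application/octet-stream';
    run;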

TT-106 : Automating Bioresearch Monitoring (BIMO) Listings Using R
Weishan Song, Vertex Pharmaceuticals
Wanting Jin, University of North Carolina at Chapel Hill
Weiyu Zhou, Vertex Pharmaceuticals, Inc
Margaret Huang, Vertex Pharmaceuticals, Inc.

The U.S. Food and Drug Administration (FDA) requires a BIMO package for pivotal studies to support clinical site inspections. A key component is the subject-level data line listings by site, which provide site-specific safety and efficacy subject data. These listings are often produced resource-intensively by manually programming outputs from SDTM/ADaM datasets for each site and each listing. Programming individually can also introduce variability and misalignment with the clinical study report (CSR). We developed a tool using R that automates the generation of these subject-level listings by site using an output-to-output approach (creating BIMO listings directly from corresponding CSR listings). The tool ingests the finalized CSR listings; extracts content, page layout, and formatting; and uses this information to reproduce the required site-specific BIMO listings without reprogramming. This (1) preserves the CSR's structure, (2) maintains consistency between CSR and BIMO outputs, and (3) standardizes presentation of the BIMO listings. This tool meets regulatory requirements and standards, reduces manual effort, simplifies quality control, and improves traceability from CSR to BIMO deliverables.

TT-118 : Automated Quality Checks for SDTM and ADaM Datasets Using R Shiny
Shih-Che (Danny) Hsu, Pfizer
Wei Qian, Pfizer, Inc

In clinical trial analysis programming, SDTM and ADaM data sets are an essential component in the process of ensuring accuracy and reliability of clinical trial data reporting; managing a project that involves numerous SDTM and ADaM datasets along with their corresponding log files can be complex and resource-intensive. Identifying data quality issues, such as inconsistent timestamps, missing values, or log errors, often requires manual inspection across multiple files, which is time-consuming and prone to oversights. To address this challenge, we developed an R Shiny application that automates overall dataset quality checks and presents the results in a centralized, interactive dashboard. Users interact with the application by specifying a folder path containing SDTM and ADaM datasets along with associated log files. Upon submission, the application performs key validations including timestamp consistency checks, log issue detection, and structural integrity assessments across both SDTM and ADaM datasets. With a single click, programming leads are presented with a visualized report that highlights potential issues and can be filtered by dataset or domain and display detailed summaries for further investigation. This tool not only streamlines the review process but also enhances transparency, traceability, and reproducibility in clinical data workflows. By integrating R Shiny's dynamic interface with robust and customizable back-end logic, the application empowers teams to proactively monitor data quality and reduce the risk of downstream reporting errors. This presentation will showcase the application's design, core functionalities, and real-world impact on improving efficiency and accuracy in clinical programming review cycles.

TT-121 : R Package Management in LSAF: Challenges and Solutions
Praneeth Adidela, ICON plc
Sagar Koona, acldigital

R is increasingly recognized as a powerful tool for statistical analysis and reporting in clinical research. The SAS® Life Science Analytics Framework (LSAF) provides an integrated system for managing, analyzing, reporting, and reviewing clinical research data. With the release of version 5.3, R is now integrated alongside SAS. However, in regulated environments, direct downloads of R packages from CRAN or other external sources are often restricted. This paper presents a workflow for installing and managing R packages in LSAF using internal CRAN-like repositories and local libraries. It outlines key challenges and offers practical solutions for effective R package management within LSAF.

TT-130 : My DIY Swiss Army Knife of SAS® Procedures: A Macro Approach of Forging with My Favorite PROCs
Jason Su, Daiichi Sankyo, Inc.

Here I take advantage of the SAS® macro facility and forge the following four (4) extremely popular procedures into one (1) Swiss-Army-knife (SAK)-styled macro %pfs (the acronym): PROC PRINT, PROC CONTENTS (not in the acronym), PROC FREQ, and PROC SQL. Controlled by a mode-switch parameter (MODE), the macro can fashion out any one of the 4 procedures in a succinct version supporting popular options of the various procedures, such as OBS, FIRSTOBS, WHERE, SHORT, VAR, etc. The new macro has the capacity to carry out my most frequent jobs for such procedures, such as selectively printing some records from a dataset, displaying its data structure, quickly deriving a variable's frequencies, counting certain variables, etc. In the same spirit, fellow programmers are encouraged to create their own version of the macro %pfs. Upon being called with different modes, the SAK macro can perform any of the procedures and immediately release programmers from much of the repetitive syntax-typing work. Additionally, the functionality of the tool can be expanded and many innovative capacities can be added, such as performing fuzzy searching on ID variables or automatically saving counting results into a macro variable for later use, so that the macro becomes smart and surprisingly powerful.
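
A stripped-down sketch of such a mode-switch macro (parameter names are illustrative; the author's %pfs supports many more options):

    %macro pfs(mode=P, data=, obs=10, var=, where=%str(1=1));
      %if %upcase(&mode) = P %then %do;          /* PROC PRINT mode */
        proc print data=&data(obs=&obs); where &where; run;
      %end;
      %else %if %upcase(&mode) = C %then %do;    /* PROC CONTENTS mode */
        proc contents data=&data short; run;
      %end;
      %else %if %upcase(&mode) = F %then %do;    /* PROC FREQ mode */
        proc freq data=&data; tables &var / missing; where &where; run;
      %end;
      %else %if %upcase(&mode) = S %then %do;    /* PROC SQL mode */
        proc sql outobs=&obs; select * from &data where &where; quit;
      %end;
    %mend pfs;

    %pfs(mode=F, data=sashelp.class, var=sex)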

TT-132 : A fully automated PDF solution using SAS without third-party PDF tools
Zhongan Chen, Pfizer

In clinical trials, statistical programmers often produce well-bookmarked PDF packages of TFLs for review or submission purposes. Traditional approaches often involve a manual process, using Adobe Acrobat or other third-party PDF tools (Python, LibreOffice, Sejda, etc.) to convert RTF to PDF and then combine the PDF files into a package. They often come with extra costs in software licenses and can be time-consuming. This paper outlines a novel, free, fast, and fully automated approach to generating bookmarked PDF packages with TOCs (Tables of Contents) using SAS. The only software needed other than SAS is Microsoft Word, which should already be available to most users. No third-party PDF tools, such as Adobe Acrobat, are needed in this process. Everything is done with SAS macros and is fully automated.
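
One hedged way to script the Word conversion leg from SAS (paths are hypothetical, 17 is Word's wdFormatPDF constant, and the paper's full solution also builds bookmarks and TOCs):

    * Write a small VBScript that asks Word to save an RTF as PDF;
    data _null_;
      file "%sysfunc(pathname(work))/rtf2pdf.vbs";
      put 'Set w = CreateObject("Word.Application")';
      put 'Set d = w.Documents.Open(WScript.Arguments(0))';
      put 'd.SaveAs2 WScript.Arguments(1), 17';
      put 'd.Close False';
      put 'w.Quit';
    run;

    options noxwait;
    x "cscript //nologo ""%sysfunc(pathname(work))/rtf2pdf.vbs"" ""C:\out\t1.rtf"" ""C:\out\t1.pdf""";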

TT-139 : From ChatGPT to Copilot: Evolving AI Support in SAS and Beyond
Jyoti (Jo) Agarwal, Gilead Sciences

Building on insights from PharmaSUG 2025 Paper SI 294, which showcased ChatGPT as a transformative assistant for SAS programming workflows, this 2026 submission expands the conversation to a broader ecosystem of AI-powered tools, focusing on GitHub Copilot and Microsoft Copilot and their practical applications in statistical programming. As the pharmaceutical industry moves beyond prompt-based experimentation, Copilot tools are redefining how programmers write, debug, and optimize code across SAS, R, Python, and other platforms. This paper presents real-world use cases where Copilot enhances productivity in clinical trial programming, including automating repetitive tasks, generating documentation, and facilitating cross-language translation. It highlights Copilot's integration into development environments, enabling seamless code suggestions, intelligent error detection, and contextual learning tailored to clinical data standards. Through comparative analysis, the paper clarifies the distinct roles of ChatGPT, Microsoft Copilot, and GitHub Copilot. Prompt engineering remains central to maximizing AI utility, and this paper offers refined strategies for crafting effective prompts that yield accurate, reproducible, and audit-ready outputs. It also addresses critical considerations such as data privacy, model bias, and validation protocols essential for deploying AI tools in regulated environments. By showcasing practical implementations and lessons learned from integrating Copilot into statistical programming pipelines, this paper provides a roadmap for programmers and organizations seeking to harness the next generation of AI tools. The future of clinical programming is not just about faster code; it is about smarter, safer, and more collaborative development powered by AI.

TT-147 : Bridging the Gap: Table-Driven SAS Programming as a Pathway to AI in Clinical Trials Statistical Programming
James Sun, Constat System

General-purpose Large Language Models (LLMs) demonstrate robust, out-of-the-box knowledge of the clinical trial process and industry data standards, and can produce decent SAS code for tasks like SDTM data generation. However, the challenge for genuine AI integration lies in adapting these models to specific programming requirements while also considering the impact of the highly specific, proprietary business rules unique to an individual company. This paper establishes table-driven programming as the critical pathway to overcome this barrier. By systematically externalizing hard-coded logic into clean, declarative metadata tables, this technique simplifies maintenance, enhances code flexibility, and, most importantly, creates the structured, high-quality "fuel" necessary for accurate LLM training. This modernized environment not only improves programming standards today but also enables LLMs to automatically generate logic specifications from complex regulatory text (e.g., FDA rules) and extract crucial metadata from documents like Protocols and SAPs, fully integrating AI with clinical trial statistical programming.
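
The core idea can be sketched in a few lines: derivation logic lives in a metadata table (hypothetical here) and a generic driver executes it, so the rules change while the program does not:

    data rules;                                  /* externalized business rules */
      length target $8 logic $200;
      target='AGEGR1'; logic="if age < 65 then agegr1='<65'; else agegr1='>=65';"; output;
      target='SAFFL';  logic="if trtsdt > . then saffl='Y'; else saffl='N';";      output;
    run;

    data _null_;                                 /* generic driver: turns metadata into code */
      set rules end=last;
      if _n_ = 1 then call execute('data adsl2; set adsl;');
      call execute(strip(logic));
      if last then call execute('run;');
    run;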

TT-152 : aCRF In Full: A Complete Solution for Relatively Little Work
Carlo Radovsky, Immanant

Leveraging Python and SAS, users are able to extract, analyze, and apply PDF comment annotations in accordance with CDISC conventions. Past solutions were a mix of manual and automated processes, often relying on the inconsistent and limited representation of the annotations in FDF or XFDF formats. By leveraging Python and its PDF-related libraries, a far more robust and flexible approach is possible. The walkthrough will also explore the pros and cons of using AI to help build the code to support the process, and will detail how to produce a fully submission-ready aCRF document including annotations, bookmarks, and page references in support of Define-XML integration.

TT-156 : SAS to R: A Practical Bridge for Programmers
Jyoti (Jo) Agarwal, Gilead Sciences

As the analytics landscape evolves, R has emerged as a versatile and powerful tool for data analysis, visualization, and statistical modeling. While SAS remains a trusted platform in many industries, R offers unique advantages, including flexibility, open-source accessibility, and a rich ecosystem of packages for advanced analytics. For SAS programmers, transitioning to R can be challenging due to differences in syntax, data structures, and programming paradigms. Building on insights from the 2025 SI-95 paper on SAS programming efficiency, this paper provides a practical, hands-on bridge for SAS users entering the R environment. It guides readers through environment setup, console operations, variable assignments, and arithmetic and sequence operations, highlighting parallels and key differences with SAS. The discussion extends to R's data structures: vectors, matrices, arrays, data frames, and lists; and explains data type coercion, helping users understand how R manages heterogeneous data. The paper emphasizes modern R workflows, including data wrangling with tidyverse functions, creating variables, handling missing data, and reverse coding, reflecting SAS data-step operations. Visualization is another key focus: barplots, histograms, boxplots, scatterplots, line charts, and clustering dendrograms, with examples demonstrating how R enables more customizable, visually appealing, and interactive analyses compared to SAS. Finally, the paper highlights the use of R packages and functional programming approaches that simplify complex workflows. Through step-by-step examples and real-world applications, SAS programmers gain actionable insights, enabling them to leverage R's capabilities while building on existing SAS expertise. This paper serves as a comprehensive guide for users seeking to expand their data analysis toolkit beyond SAS.

TT-175 : Enhancing Quality and Efficiency in Clinical Programming with a Python-Based Automated File Comparison Tool
Ratheesh Gunda, Kite Pharma, a Gilead company

Ensuring consistency and accuracy of files across directories and subdirectories is a critical step in statistical programming workflows, so that the latest datasets are included for up-to-date analyses. Discrepancies in file presence or versioning can lead to outdated analyses and quality issues. This paper introduces a Python-based automated file comparison tool with a graphical user interface (GUI) designed to streamline the validation process between hierarchical folder structures and flat directories. The tool performs three core functions: 1) verifies file presence or absence between folders, 2) detects timestamp changes to identify updated or modified files, and 3) generates a detailed Excel report summarizing differences for easy review and documentation. With its intuitive GUI, users can select source and target folders, initiate comparisons, and instantly review results, eliminating the need for manual checks or repetitive SAS library setups. Built using Python's standard libraries and common open-source packages, the tool efficiently traverses subdirectories, extracts file metadata, and presents results in a clear, audit-ready format that highlights missing, modified, or newly added files. Unlike traditional SAS-based workflows that require manual library management for multiple subfolders, this Python solution offers greater flexibility and automation, allowing large-scale comparisons to be completed within minutes. By reducing manual effort and enhancing traceability, the tool supports regulatory compliance, data integrity, and quality control across programming teams. This paper will describe the tool's design, workflow, and reporting capabilities, providing practical guidance for implementing automated file validation processes to improve efficiency and reproducibility in programming.

TT-204 : From SAS Servers to AI Agentic SCEs: Integrating Agentic AI into GxP-Compliant Biometrics Workflows
Kevin Lee, Clinvia

Biometrics teams in clinical trials are currently navigating a significant transition as traditional Statistical Computing Environments (SCEs) move from local SAS servers toward the integration of Agentic AI within modern, multi-lingual ecosystems in SAS, R, and Python. The paper demonstrates how Agentic AI could be integrated into the SCE environment and how an SCE could be built and validated in GxP and 21 CFR Part 11 compliance. First, the paper introduces the concepts of Agentic AI, AI agents, and Agentic workflows, and use cases such as SDTM/ADaM/TFL development and validation, clinical artifacts development, and synthetic data creation. Then, the paper provides a detailed look at the implementation and validation of a cloud-based SCE that is fully GxP and 21 CFR Part 11 compliant. It also illustrates the paradigm shift offered by Agentic AI, where autonomous agents use programming languages as tools to plan, reason, and execute statistical tasks within the SCE. Key insights provided in this paper include: (1) a validated multi-lingual SCE, a robust architecture supporting SAS, R, and Python that ensures compliance through IQ/OQ/PQ and risk-based validation; (2) the integration of Agentic AI, showing how Agentic AI automates processes in the SCE; and (3) future-proofing compliance, with strategies for maintaining "audit-ready" status in the SCE. Ultimately, this paper provides biometrics leaders with a roadmap to transform their SCE from a passive storage and execution environment into an intelligent Agentic AI platform that not only manages data but actively drives clinical insights.

TT-241 : TOON Format: A Token-Efficient Data Exchange Solution for AI-Enhanced Clinical Programming
Saikrishnareddy Yengannagari, BMS

As pharmaceutical companies increasingly adopt Large Language Models (LLMs) for clinical programming tasks, token-based pricing creates significant cost challenges. A typical CDISC laboratory dataset with 100,000 records generates approximately 10 million tokens in JSON format, resulting in substantial API expenses. This paper introduces TOON (Token-Oriented Object Notation), a compact data format that reduces token consumption by 50-90% compared to JSON while preserving complete SAS metadata including variable labels, formats, and types. We present two open-source SAS macros, %sas2toon and %toon2sas, implemented entirely in BASE SAS requiring no additional licenses. These macros enable seamless bidirectional conversion between SAS datasets and TOON format with 100% round-trip fidelity. Real-world testing demonstrates that a 500-subject ADLB dataset reduced from 5,850,000 tokens (JSON) to 920,000 tokens (TOON), representing over 80% cost savings per LLM query. The format is human-readable, Git-friendly, and immediately applicable to existing clinical programming workflows.
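
To make the idea concrete, a toy sketch of the tabular encoding (the real %sas2toon additionally serializes labels, formats, and types for round-trip fidelity; the output path is hypothetical):

    data _null_;
      file "%sysfunc(pathname(work))/adlb.toon";
      set adlb;
      length line $400;
      if _n_ = 1 then put 'adlb{usubjid,paramcd,aval}:';   /* field names appear once, not per record as in JSON */
      line = '  ' || catx(',', usubjid, paramcd, strip(put(aval, best.)));
      put line;
    run;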

TT-263 : Boolean Rhapsody: 50 Shades of True. Is this the real code? Is this just fantasy?
Charu Shankar, SAS Institute

SAS logic isn't always black and white, especially when missing values get involved. This session breaks down how SAS evaluates Boolean expressions across IF, WHERE, and IFN/IFC, so your filters and flags behave exactly as intended. We'll walk through what SAS treats as true, false, and "it depends," how AND/OR/NOT behave with missing data, and the classic logic traps that quietly flip results. You'll also leave with a practical Boolean Truth Table Cheat Sheet (TRUE/FALSE/MISSING outcomes for common operators) to sanity-check conditions fast. Expect practical examples you can use immediately for cleaner derivations, safer filtering, and more defensible QC. We'll cover: what SAS considers "true," "false," and "wait, what?"; how AND, OR, and NOT really behave (especially when missing values crash the party); performance tips for cleaner logic and fewer misunderstandings; and real-world examples where Boolean confusion led to code heartbreak. Takeaway: sharper Boolean instincts, fewer surprises, and a ready-to-use reference you can share with your team. Target audience: whether you're a SAS beginner or a seasoned pro who's had one too many logic fails, this session is your backstage pass to the world of conditional code. Spoiler alert: it's not just 0 and 1; it's 50 shades of TRUE.
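
A two-line taste of the classic trap (missing values sort below every number, so an unguarded comparison quietly evaluates true):

    data traps;
      do val = ., 0, 1, 2;
        lt2      = (val < 2);                       /* 1 even when val is missing! */
        lt2_safe = (not missing(val) and val < 2);  /* guarded comparison */
        output;
      end;
    run;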

TT-275 : SAS for Microsoft 365: Integrating SAS Programs, Data, and Reports Across the Microsoft 365 Ecosystem
Shelby Taylor, SAS Institute

SAS for Microsoft 365 bridges the gap between advanced SAS analytics and everyday productivity tools by integrating SAS Viya directly into Microsoft Excel, Word, PowerPoint, and Outlook. This paper provides an overview of SAS for Microsoft 365 and demonstrates how users can prepare data and generate insights in SAS Viya, then seamlessly explore, update, and share results within familiar Microsoft applications. Through a guided demonstration, attendees will see how SAS Visual Analytics reports can be accessed, filtered, and embedded as live objects in Excel, enabling users to insert charts and tables that remain linked to the underlying SAS data. The paper also highlights inserting SAS data tables into Excel for local exploration, uploading Excel-based data back into SAS Viya, and executing SAS programs and jobs directly from Microsoft applications with results embedded in documents and spreadsheets. Examples include updating report objects based on new filters, enhancing inserted data with native Excel features, and regenerating results after code changes. In addition, the presentation showcases how SAS for Microsoft 365 simplifies communication by embedding report objects, attaching customized PDF reports, and inserting live report links directly into Outlook emails, as well as creating dynamic Word documents and PowerPoint presentations powered by SAS analytics. By combining the analytical strength of SAS Viya with the accessibility of Microsoft 365, SAS for Microsoft 365 enables more efficient analysis, collaboration, and reporting workflows across teams.

TT-309 : Global Macro for Master Tracker
Zhuo Chen, BridgeBio Pharma

In a clinical study, it is important to create and maintain a master tracker for SDTM, ADaM, and Tables, Listings, and Figures (TLF) development and validation for each delivery. A clinical study normally has hundreds or even thousands of TLF outputs, so creating the tracker manually for each delivery is very time-consuming. This paper describes a global macro that automates the master tracker effort. It also checks for various issues and flags them in the master tracker, helping the team guarantee accuracy, gain efficiency, and avoid human errors.
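
A hedged sketch of the status-flagging step (dataset and variable names are hypothetical; the actual macro also scans the file system and tracks validation status):

    proc sql;
      create table tracker as
      select a.output_id, a.title,
             case when b.fname is not null then 'Produced'
                  else 'MISSING' end as status     /* flag gaps for the team */
      from expected_tlfs as a                      /* planned outputs from the shells */
      left join output_files as b                  /* files found in the output folder */
        on a.output_id = scan(b.fname, 1, '.');
    quit;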

TT-312 : Trust but Verify: How ChatGPT and SAS Can Be Comrades!
Steve Black, Neurocrine Biosciences

SAS programmers in clinical development are under constant pressure to move faster while upholding uncompromising standards for quality and compliance. ChatGPT introduces a secure, governed large-language-model platform that can help navigate this tension when used thoughtfully. This paper presents practical, real-world examples of how ChatGPT has been applied to SAS programming workflows, including SDTM and ADaM production, derivation logic clarification, TFL output creation, and validation efforts, highlighting its use within a regulated environment while ensuring reproducibility, traceability, and the continuing necessity of good programmer judgment. Rather than automating decisions or replacing established roles, ChatGPT is positioned as a trusted comrade (useful, disciplined, and effective) working alongside SAS programmers without taking anyone's job hostage.

TT-391 : Taming Polyglot Analytics: Simplifying Cross-Language Workflows in a Unified IDE
David Ward, Triam Ltd
Troy Wolfe, Triam

Modern analytical workflows increasingly span multiple programming languages, each selected for its strengths in data access, transformation, modeling, or visualization. While this polyglot approach can be powerful, it often introduces significant complexity for statistical programmers and analysts, including fragmented tooling, inconsistent workflows, and increased maintenance overhead. Managing these challenges across separate editors, runtimes, and execution environments can reduce productivity and increase operational risk. This paper demonstrates how cross-language analytical workflows can be simplified using the Siemens Analytic Workbench, an integrated development environment that enables analysts to work with multiple programming languages from a single, unified interface. The presentation will demonstrate a project in which four different languages – SAS, Python, R, and SQL – are written and executed from within one development environment, highlighting various personal and organizational benefits. This paper is intended for statistical programmers and analysts working in increasingly heterogeneous analytical environments. Participants will gain a clearer understanding of how unified tooling can reduce complexity in polyglot workflows while preserving flexibility and improving productivity and job satisfaction.

TT-394 : Structure-Preserving Preprocessing of Clinical Documents for Large Language Model Analysis
Zun Wang, R&G US Inc
Juntao Yan, Eli Lilly and Company

Clinical trial documents are essential to drug development but are often lengthy and labor intensive to align or compare within or across studies due to strict traceability and governance requirements. Although large language models (LLMs) are increasingly used to improve document analysis efficiency, directly applying LLMs to flat text presents practical challenges. Document length can exceed model context limits, and loss of section boundaries prevents LLMs from recognizing the inherent hierarchical structure of clinical documents, compromising both accuracy and traceability. To address these challenges, this paper introduces a structure-preserving preprocessing workflow that reconstructs clinical documents (e.g., DOCX and PDF files) into an explicit hierarchical representation prior to LLM analysis. By chunking documents according to semantic sections rather than processing full documents, the workflow supports more efficient LLM interaction while preserving section-level traceability. This design also reduces the risk of mixing information across unrelated sections, keeping outputs grounded in clearly scoped context. In practice, this approach enables faster and more reliable clinical document review through targeted, section-level analysis. Limiting analysis to relevant sections reduces the number of LLM calls required, improving processing efficiency and turnaround time. The workflow applies across common clinical trial documents, including protocols and SAPs, and supports practical use cases such as version control and protocol-SAP alignment within a study. Explicit section alignment allows differences to be evaluated within matched structural context, reducing manual review effort during iterative document updates.

TT-403 : From SAP to CSR: A Metadata-Driven TFL Workflow
Bhavin Busa, Clymb Clinical

Clinical trial reporting workflows are often fragmented across disconnected tools and manual handoffs, resulting in inefficiencies, limited traceability, and challenges in maintaining consistency from analysis planning through Clinical Study Report (CSR) delivery. While standards such as CDISC ADaM and the recently published Analysis Results Standard (ARS) provide a strong foundation, many organizations still struggle to operationalize these standards across the full analysis results lifecycle. This paper presents a metadata-driven framework that connects key stages of the analysis results pipeline: from Statistical Analysis Plan (SAP) specification through TFL design, automated result generation, centralized review, and downstream CSR integration. The proposed workflow demonstrates how structured metadata derived from the SAP and protocol can be used to prospectively define TFL shells, drive generation of Analysis Results Datasets (ARDs), automate production of tables, figures, and listings, and support collaborative review and version control in a centralized environment. By treating analysis results metadata as a single source of truth, this approach improves traceability between planned analyses, executed programs, and reported outputs, while enabling automation, reproducibility, and reuse. The framework reduces manual programming effort, shortens review cycles, and establishes a scalable foundation for downstream statistical reporting and CSR authoring. Practical considerations for implementation and incremental adoption within existing programming environments are also discussed.

TT-417 : A Fantasy in Three with PROC FCMP: Memoization of Resource-Intensive Calculations, in-Memory Hash-Object Storage and Retrieval Operations, and Disk-Based Persistent Data Set Modification and Preservation
Troy Hughes, Data Llama Analytics

The hash object, an in-memory data structure, has long been the companion of any SAS practitioner seeking to maximize software efficiency. Hash objects facilitate a host of in-memory operations that minimize I/O processing and runtime, including key-value lookups, duplicate value identification, deduplication, transposition, sorting, and even frequency calculation! Moreover, this versatile data structure can be embedded inside reusable SAS user-defined functions and subroutines constructed using PROC FCMP, the SAS Function Compiler, and this design can improve the readability, modularity, and maintainability of your software. This talk showcases the efficiency of hash and demonstrates the 13 hash methods newly added to PROC FCMP in SAS Viya 2024.11 (but which remain unavailable in SAS 9.4 PROC FCMP), including: DO_OVER, EQUALS, FIND_NEXT, HAS_NEXT, HAS_PREV, REF, REMOVE_DUP, REPLACEDUP, RESET_DUP, SUMDUP, SUM, and SETCUR. From a functionality perspective, the welcome addition of these methods to the SAS Viya FCMP procedure means that developers now can build and utilize hash objects equivalently within both the SAS Viya DATA step and the SAS Viya FCMP procedure. However, from a software portability perspective, this stark divergence in syntax between SAS Viya and SAS 9.4 introduces risk, because hash methods built on SAS Viya may no longer be portable to the SAS 9.4 FCMP procedure.
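
For readers new to the pattern, a minimal FCMP-plus-hash lookup sketch that runs on both SAS 9.4 and Viya (the lookup table work.param_lu with columns PARAMCD and LABEL is hypothetical; note the hash is re-declared on each call, a cost this sketch does not optimize away):

    proc fcmp outlib=work.funcs.lookup;
      function param_label(paramcd $) $ 40;
        length label $40;
        declare hash h(dataset: 'work.param_lu');  /* key/value pairs loaded from a dataset */
        rc = h.defineKey('paramcd');
        rc = h.defineData('label');
        rc = h.defineDone();
        if h.find() = 0 then return (label);       /* key found: return the stored label */
        return ('UNKNOWN');
      endsub;
    run;

    options cmplib=work.funcs;
    data labeled;
      set adlb;                                    /* hypothetical input */
      plabel = param_label(paramcd);
    run;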

TT-439 : From Suspicion to Evidence: Automating Character Truncation Risk Audits with a Parameterized SAS Macro and Review-Ready Excel Output
Xinran Hu, Merck

Character truncation issues in clinical analysis datasets are often detected late, when resolution is expensive and traceability back to upstream specifications or source representations can be limited. Even when truncation cannot be conclusively proven post hoc, repeated observations of values reaching the declared character length represent a practical signal that a variable may be operating at its storage boundary and warrants targeted review. This paper presents a parameterized SAS macro utility designed as a risk-based audit for character variables. The macro scans one or more datasets, computes variable-level length metrics (defined length, maximum observed value length, and at-limit frequency), and prioritizes variables using configurable thresholds (minimum hit count and/or proportion among non-missing records, tolerance to allow near-limit screening, and top-N selection). The primary deliverable is an Excel workbook intended to support standards-driven QC and traceability. Each dataset tab reports only variables meeting a strict boundary condition (defined length equals maximum observed length), and provides evidence needed for efficient follow-up: distinct at-limit values, their counts, and a simple end-character cue (final letter/number/punctuation) to help reviewers recognize patterns consistent with hard cuts versus expected formatting. A consolidated Summary tab provides a cross-dataset view for prioritization and documentation. The paper discusses common false-positive scenarios in clinical analysis datasets, recommended thresholding strategies, and a practical review workflow that ties flagged findings to specifications, controlled terminology expectations, and derivation logic. The result is a lightweight, explainable screening framework that strengthens routine analysis-dataset QC and helps surface potential length-boundary risks before they propagate downstream.
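
The length-metric computation at the heart of the macro can be sketched with a pair of arrays (single-dataset version; the full utility parameterizes the thresholds and writes the Excel workbook):

    data audit;
      set work.adae end=last;                      /* hypothetical dataset to audit */
      array cvars  {*}   _character_;              /* all character variables in the dataset */
      array maxlen {500} _temporary_;              /* running max observed lengths, sized generously */
      length varname $32;
      do i = 1 to dim(cvars);
        maxlen{i} = max(maxlen{i}, lengthn(cvars{i}));
      end;
      if last then do i = 1 to dim(cvars);
        varname  = vname(cvars{i});
        defined  = vlength(cvars{i});              /* declared storage length */
        observed = maxlen{i};                      /* longest value actually stored */
        at_limit = (observed = defined);           /* boundary condition worth review */
        output;
      end;
      keep varname defined observed at_limit;
    run;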

ePosters

PO-007 : DefinePageChecker: A Python Tool for Verifying Page Number Hyperlinks in Define.xml
Xianhua Zeng, Taimei Intelligence Pharmaceutical

The accuracy of page number hyperlinks in Define.xml files is crucial for regulatory submissions in clinical trials. These files often include page number hyperlinks to ensure accurate cross-referencing in the clinical trial report. Discrepancies in these hyperlinks may lead to incorrect or broken links, compromising the integrity of the report. This paper introduces DefinePageChecker, a Python tool designed to automate the verification of page number hyperlinks in Define.xml files. The tool checks whether hyperlinks point to the correct page numbers and flags missing annotation pages, ensuring that the report's navigation structure remains intact and reliable. The application is available for download on GitHub: https://github.com/XianhuaZeng/PharmaSUG/raw/master/2026/DefinePageChecker.zip

PO-009 : Word2PDF: A Python Tool for Converting and Merging Word or PDF Files into a Single PDF
Xianhua Zeng, Taimei Intelligence Pharmaceutical

As part of clinical trial reporting, large numbers of RTF/PDF outputs are created, and at the completion of a study milestone it is often necessary to convert and/or merge all reports into a single user-friendly document for easy delivery and review. This paper presents a Windows desktop application built with Python to convert and/or merge multiple RTF/PDF files into a single bookmarked PDF document in a user-defined order. The app can be executed on Microsoft® Windows 7 or later with Microsoft® Word installed. In addition, it can also be used to convert and/or merge DOC and DOCX files, hence the name Word2PDF. The app and test files are available for download on my GitHub: https://github.com/XianhuaZeng/PharmaSUG/raw/master/2026/Word2PDF.zip

PO-117 : From Learner to Innovator: A Journey in R Empowered by AI to Enhance Narrative Review in Clinical Studies
Shih-Che (Danny) Hsu, Pfizer

In early 2025, I embarked on a six-month journey to learn R programming, beginning with Pfizer's internal training series. These materials provided step-by-step examples and hands-on exercises that built a solid foundation in R. A learning buddy offered guidance and support, making the experience collaborative and enriching. Throughout this journey, I leveraged AI, specifically Microsoft Copilot, as an additional learning companion. Copilot provided contextual assistance, explained complex R concepts, and offered real-time feedback, making the learning process more intuitive and efficient. I then completed a six-week Posit Academy course, which deepened my understanding of functions, loops, and Quarto for reproducible reporting. Exposure to R Shiny examples sparked my interest in interactive applications. Copilot continued to assist by suggesting optimized code structures and helping debug exercises. It clarified concepts, troubleshot code, and guided implementation strategies, accelerating my learning curve and boosting confidence in applying R to real-world problems. Leveraging Copilot's support, I developed an Interactive Patient Profiles dashboard using R Shiny. This tool enables clinical reviewers and medical writers to explore patient-level data dynamically, enhancing the narrative review process and helping identify trends and anomalies more efficiently. Copilot's contributions included generating UI components, optimizing server logic, and ensuring data integrity, which collectively enhanced the dashboard's functionality and usability. This abstract highlights how structured learning, mentorship, and AI-assisted development, particularly through Copilot, can empower statistical programmers to innovate and improve clinical workflows. Copilot's integration into every phase illustrates how AI can transform learning and development in statistical programming.

PO-122 : Implementation of Quality Tolerance Limits in Statistical Programming
Yang Gao, Pfizer Inc.

Quality Tolerance Limits (QTLs), as outlined in ICH GCP E6(R2), support clinical trials by proactively identifying systematic issues that may compromise participant safety or the reliability of trial outcomes. As a requirement for all clinical studies, QTLs are fundamental to protecting trial integrity, ensuring patient safety, and maintaining the credibility of study endpoints. To comply with these regulatory expectations, sponsors must select critical parameters and establish appropriately justified thresholds that align with the study’s primary objectives. QTLs function as early warning indicators, empowering study teams to respond promptly and effectively with mitigation strategies whenever deviations are detected. The implementation of QTLs is a collaborative, cross-functional process in which statistical programmers play an important role throughout the entire QTL lifecycle, especially in ongoing monitoring and final reporting. This paper highlights essential considerations from the statistical programming perspective and presents a case study illustrating the effective application of QTLs within a pharmaceutical organization.

PO-143 : Consolidation of CDISC ADaM
Cindy Stroupe-Davis, Data Dynamo
Trevor Mankus, Pinnacle 21
Tatiana Sotingco, J&J Innovative Medicine
Alyssa Wittle, Atorus Research

In early 2022, the CDISC ADaM team began reviewing normative and informative content across existing documentation. This review revealed inconsistencies in terminology, structure, and guidance. To address these issues, a dedicated team was formed to consolidate the ADaM standard and its Implementation Guide (IG) into a single, unified document. This paper will discuss the scope of the document consolidation, the rationale behind the effort, the general changes being introduced, the timeline for implementation, and key questions and considerations moving forward.

PO-187 : An Alternative Option to Create XPT Files with a SAS Function
Jose Hernandez Rivero, Principal Statistical Programmer
Ruth Rivera Barragan, Ephicacy Consulting Group

Creating SAS transport (XPT) files remains a critical step in preparing datasets for clinical and regulatory submissions. Programmers still face the long-standing limitations of PROC COPY, PROC CPORT, and the DATA step, all bound by the SAS Version 5 format restricting variable names to eight characters and labels to forty. These constraints are reasonable when datasets are strictly CDISC-compliant, but in many real-world scenarios (such as raw data transfers, EDC extracts, medical device studies, or sponsor-specific requests) they become impractical and even counterproductive. Recognizing this gap, we explored the LOC2XPT SAS function as a modern alternative that enables greater flexibility while maintaining submission integrity. LOC2XPT generates fully functional XPT files without the usual truncation or metadata loss, preserving variable and label names exactly as intended. Beyond solving a technical limitation, this approach streamlines workflows, reduces rework, and supports diverse data exchange needs that standard methods often fail to accommodate. Through implementation tests and validation results, LOC2XPT proved both reliable and efficient, producing compliant, high-quality transport files compatible with regulatory expectations. Its simplicity, adaptability, and consistency make it an appealing enhancement for teams managing complex or non-standard data pipelines. In short, LOC2XPT bridges the gap between the rigor of regulatory requirements and the flexibility that modern programming demands.
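
Usage is pleasantly small; a hedged example of the SAS-supplied %loc2xpt autocall macro interface that the abstract refers to (the output path is hypothetical):

    * Write WORK.ADSL to a transport file, letting SAS choose V5 vs V8 as needed;
    %loc2xpt(libref=work, memlist=adsl, filespec='/outputs/adsl.xpt', format=AUTO)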

PO-194 : Managing Unblinded Activities Internally: The Independent Statistical Analysis Team (iSAT) Model
Hong Wang, Boehringer Ingelheim
Shu Chen, Boehringer Ingelheim
Chen Li, Boehringer Ingelheim

The independent Statistical Analysis Team (iSAT) was established in 2017 as a sponsor-employed, fully unblinded, and independent analysis group to manage unblinded activities internally. Initially created to support Pharmacovigilance for anticipated event reporting, iSAT's role has expanded to include Data Monitoring Committees (DMCs) and Interim Analyses (IAs). The team operates under strict firewalls (physical, electronic, and organizational) to ensure confidentiality and maintain study integrity. Processes include secure data environments, hierarchical separation, and controlled communication. For anticipated event analyses, iSAT receives limited datasets excluding efficacy data, while IA workflows involve blinded teams providing programs that iSAT executes post-unblinding. For DMCs, iSAT functions as both independent statistician and programmer, enabling rapid turnaround times and confidential handling of additional requests. Over eight years, this model has supported more than 30 trials, demonstrating advantages such as expert internal resources, streamlined communication, elimination of external data transfers, and improved efficiency. While external perception of internal unblinding remains a potential concern, restricting iSAT involvement to non-pivotal studies mitigates this risk. Overall, the iSAT model offers a robust, efficient, and secure approach for managing unblinded activities within a sponsor organization.

PO-213 : %Compare_counts: A Macro for Speeding Up the QC Process When Proc Compare Slows it Down
Michael Garside, Phastar

Quality control (QC) is a critical component of clinical trial programming and is necessary to ensure the accuracy and consistency of SDTM, ADaM, and output datasets. Current tools (such as Proc Compare) are useful and necessary. However, they can be inefficient in certain scenarios. This paper introduces the %Compare_counts SAS macro, a macro designed to directly compare the frequency of value combinations across specified variables between production and QC datasets. As a complementary QC tool, %Compare_counts provides a fast and effective approach for pinpointing discrepancies, thereby streamlining the QC process.
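
A condensed sketch of the idea (macro and dataset names are hypothetical): summarize each dataset with PROC FREQ, then compare the small count tables instead of the full data:

    %macro compare_counts(base=, comp=, vars=);
      %local tbl;
      %let tbl = %sysfunc(translate(&vars, *, %str( )));    /* VAR1 VAR2 -> VAR1*VAR2 */
      proc freq data=&base noprint;
        tables &tbl / list missing out=_base_n(drop=percent);
      run;
      proc freq data=&comp noprint;
        tables &tbl / list missing out=_comp_n(drop=percent);
      run;
      proc compare base=_base_n compare=_comp_n listall;    /* tiny tables compare fast */
      run;
    %mend compare_counts;

    %compare_counts(base=prod.adsl, comp=qc.adsl, vars=trt01p saffl)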

PO-221 : One Study, Many Regulators: Submission-Ready Data Package for Multi-Region Filings
Himanshu Patel, Merck & Co., Inc.
Chintan Pandya, Merck & Co., Inc

Global clinical trials are routinely submitted to multiple health regulatory authorities, including the FDA (US), EMA (EU), PMDA (Japan), and NMPA (China). While CDISC standards provide a unified technical framework, differences in regional regulatory requirements, such as eCTD structure, data expectations, validation requirements, and review-tool sensitivities, can create significant challenges during submission preparation. This paper presents a practical, programmer-led framework for building a submission-ready data package that supports multi-region filings with minimal region-specific changes. Using oncology studies as an example, the paper discusses strategies for designing region-agnostic datasets, managing region-specific requirements through metadata and documentation, and maintaining robust traceability from raw -> SDTM -> ADaM -> TLFs. Additionally, it includes recommendations for folder structures, quality checklists, validation workflows, traceability artifacts commonly expected during regulatory reviews or inspections, and common mistakes and how to prevent them. By shifting the mindset from "analysis-ready" to "submission-ready" early in the study lifecycle, programmers can significantly reduce downstream regulatory risk, improve submission efficiency, and support parallel global filings.

PO-230 : Automating Character Variable Length Updates Using SAS Macros
Carleigh Crabtree, SAS

Character variable lengths in SAS data sets are often defined larger than necessary, leading to increased storage requirements and reduced processing efficiency, particularly for large data sets. Because character variables are padded with blanks to their defined length, identifying columns where the allocated length exceeds the longest stored value can present opportunities for optimization. This paper introduces two SAS macros designed to analyze and improve character variable efficiency. The first macro, %CharCheck, examines all character variables in a specified data set and reports each variable’s defined length alongside the maximum length of its stored values. The second macro, %CharResize, uses this information to create a copy of the data set with reduced character lengths where appropriate. Together, these macros provide a systematic approach to identifying oversized character columns and safely resizing them. The paper explores the underlying SAS functions and macro techniques used to access data set metadata, evaluate variable attributes, and dynamically generate LENGTH statements. Practical examples demonstrate how these tools can reduce data set size while preserving data integrity. By incorporating simple character variable checks into routine workflows, SAS users can create more efficient data sets and improve overall program performance.
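
To make the mechanics concrete, here is a minimal sketch of the resizing idea (names and details are assumed; the paper's macros are more complete): scan every character variable for its longest stored value, build a LENGTH statement from the results, and rebuild the dataset with it.

%macro charresize(data=, out=);
  /* Hypothetical sketch; assumes &data contains at least one character variable */
  %local lenstmt;

  /* pass 1: record the longest stored value of each character variable */
  data _null_;
    set &data end=_last;
    array _c{*} _character_;
    array _max{999} _temporary_;          /* assumes < 1000 character variables */
    do _i = 1 to dim(_c);
      _max{_i} = max(_max{_i}, length(_c{_i}));
    end;
    if _last then do;
      length _stmt $32767;
      do _i = 1 to dim(_c);
        _stmt = catx(' ', _stmt, vname(_c{_i}), cats('$', max(_max{_i}, 1)));
      end;
      call symputx('lenstmt', _stmt);
    end;
  run;

  /* pass 2: rebuild with right-sized lengths; note that the LENGTH    */
  /* statement moves character variables to the front of the PDV order */
  data &out;
    length &lenstmt;
    set &data;
  run;
%mend charresize;

%charresize(data=work.adsl_big, out=work.adsl_small);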

PO-244 : Dynamic Table Generation for Ongoing Studies with Unstable or Changing Cohorts
Haoran Li, ClinChoice
Junruo Xia, ClinChoice Canada Inc.
Yixuan Zhang, ClinChoice, LLC

Summary tables are essential for presenting key statistics in clinical trial reporting, yet they often require frequent updates in ongoing studies due to staggered enrollment, cohort expansion, or evolving dose levels. Traditional programming approaches require frequent manual code modifications for each dry run, increasing workload and maintenance risk. To address this challenge, we propose a SAS macro-driven solution that automates the generation of dynamic summary tables. This macro framework dynamically constructs treatment group column names and definitions for PROC REPORT based on the latest ADSL data. The macro automatically extracts treatment group information, calculates Big N values, supports multiple commonly used column header display formats, and allocates column widths proportionally based on a predefined total width. Dynamic column names and definitions are stored in global macro variables, COL_NAME (for column headers) and COL_DEF (for column attribute specifications in PROC REPORT), and seamlessly integrated into reporting programs. The approach was applied to a baseline characteristics table as a use case. Results showed that summary tables were automatically updated when treatment groups changed, without requiring manual code modifications. The solution enhances programming efficiency, reduces maintenance burden, and improves code conciseness. The macro can be easily extended to other summary tables, such as demographic, disposition, and findings tables, and provides a practical and flexible framework for dynamic clinical trial reporting.
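
As a hypothetical sketch of this pattern (macro, dataset, and column names are assumed here, not taken from the authors' implementation), the COL_NAME and COL_DEF variables might be built and consumed like this:

%macro make_cols(adsl=);
  %global col_name col_def;
  %local trts bigns ntrt i trt n;

  /* one row per treatment group with its Big N */
  proc sql noprint;
    select trt01p, count(*)
      into :trts separated by '|', :bigns separated by '|'
      from &adsl
      group by trt01p;
  quit;
  %let ntrt = &sqlobs;

  %let col_name = ;
  %let col_def  = ;
  %do i = 1 %to &ntrt;
    %let trt = %scan(&trts, &i, |);
    %let n   = %scan(&bigns, &i, |);
    %let col_name = &col_name _trt&i;
    %let col_def  = &col_def define _trt&i / display "&trt~(N=&n)" %str(;);
  %end;
%mend make_cols;

%make_cols(adsl=adam.adsl);

/* FINAL is assumed to hold one report row per record, with columns _TRT1.._TRTn */
proc report data=final split='~';
  column rowlbl &col_name;
  define rowlbl / display "Characteristic";
  &col_def
run;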

PO-266 : A Modular SAS Macro Library for Clinical Trial Table Generation: From SAS Foundation to R-Ready Architecture
Ming Yang, Kura Oncology
Ludmila Navolodskaya, Kura Oncology

The pharmaceutical industry is experiencing a transformative shift in statistical programming, driven by regulatory acceptance of open-source tools, increasing demand for efficiency, and the rise of AI-assisted development workflows. This paper presents a comprehensive SAS macro library comprising six interconnected macros designed for generating regulatory-compliant clinical trial tables, listings, and figures (TLFs). The library implements a modular architecture with standardized interfaces, enabling rapid production of demographics tables, adverse event summaries, laboratory safety analyses, and efficacy endpoints. By adhering to CDISC ADaM standards and ICH E3 guidelines, these macros reduce programming time from hours to minutes while maintaining strict regulatory compliance. Furthermore, this paper discusses how the library’s modular design serves as a blueprint for migration to R using Pharmaverse packages such as rtables, tern, and tidytlg. We demonstrate practical applications through oncology trial examples and provide guidance for organizations navigating the SAS-to-R transition, positioning statistical programmers to leverage both traditional and emerging tools in the evolving regulatory landscape.

PO-270 : SAS Viya Dynamic Visualization of Data
Jumin Geng, Teva Pharmaceutical Industries, Ltd

This e-Poster demonstrates the powerful dynamic visualization features of SAS Viya Analytics through its point-and-click application. Dynamic visualization methods allow for more effective display of high-dimensional data, which can improve the interpretation of the many dimensions of clinical trial data. For example, AEs, including both their toxicity-specific aggregate direction and magnitude, can be easily visualized in SAS Viya. In this e-Poster, several chart types based on clinical trial data were created using SAS Viya.

PO-273 : Why SAS Viya Speeds Up Analytics
Max Hu, ICON Clinical Research

As technology has advanced and data volumes have grown, SAS has continually evolved to meet the needs of data scientists. This e-Poster highlights how SAS Viya leverages the power of Cloud Analytic Services (CAS) to perform advanced analytic computations with impressive speed. Processing time is significantly reduced when running comparisons in SAS Viya. The poster presents a side-by-side comparison of code execution from four aspects, demonstrating the efficiency gains achieved.
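
For readers new to CAS, the basic pattern looks like the following minimal sketch (illustrative only, not taken from the poster): start a CAS session, load a table into memory, and run the summary where the data lives.

cas mysess;                              /* start a CAS session           */
caslib _all_ assign;                     /* expose caslibs as SAS librefs */

proc casutil;
  load data=sashelp.heart outcaslib="casuser" casout="heart" replace;
quit;

proc mdsummary data=casuser.heart;       /* executes in CAS, in parallel  */
  var cholesterol;
  output out=casuser.heart_stats;
run;

cas mysess terminate;                    /* shut the session down         */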

PO-287 : Maintaining a Multi-Lingual Code Inventory in Real-World Evidence
Joshna Reddy Nimmala, Merck & Co., Inc., Rahway, NJ
Yu Feng, Merck & Co., Inc.

As healthcare claims databases continue to evolve in structure and complexity, statistical programmers are increasingly challenged to maintain consistency, efficiency, and compliance across Real-World Data (RWD) analysis. To address these challenges, programmers rely on multi-lingual code inventories (including SAS, R, and Python scripts) and adapt them as needed to support Real-World Evidence (RWE) generation. However, without a centralized and dynamic approach, these inventories can become fragmented, outdated, and difficult to manage. This paper presents a scalable framework for maintaining a centralized RWE code inventory that supports detection of structural changes in RWD sources. By identifying key structural triggers, such as additions or changes in variable names, coding systems, or data schema, this framework enables intelligent searches across the code inventory to flag impacted scripts and suggest updates. This automation reduces manual effort, accelerates programming workflows, and minimizes the risk of errors. Additionally, the framework incorporates traceability and version control mechanisms to ensure reproducibility and audit readiness, aligning with ICH E6(R2) Good Clinical Practice guidelines. This approach not only enhances operational efficiency but also supports regulatory compliance and cross-study consistency, making it an asset in the evolving RWE landscape.
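
As one illustration of what such an impact search can look like in practice (a hypothetical sketch assuming a Unix SAS session with grep available; the paths and the renamed variable are invented for the example):

/* flag inventory scripts that still reference a variable renamed in the latest cut */
filename progs pipe "grep -l 'PAT_ID_OLD' /rwe/inventory/*.sas /rwe/inventory/*.R /rwe/inventory/*.py";

data impacted_scripts;
  infile progs truncover;
  input script $char256.;                /* one matching file path per line */
run;

proc print data=impacted_scripts; run;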

PO-331 : Hero-in-the-Loop: A Super Squad Approach to SDTM Creation & Validation
Julie Ann Hood, Pinnacle 21 by Certara

Behind every high-quality, CDISC-compliant SDTM dataset lies a team of data-driven heroes, each with their own unique skillsets and battle-tested expertise. In today’s clinical trials, creating and validating submission-ready datasets is rarely a solo mission. It’s a full-scale, multi-role operation that calls upon the powers of Data Managers, Standards Managers, Clinical Programmers, and Biostatisticians, each bringing something special to the fight for clean, conformant data. Our heroes may hail from different domains, but they share a common trait: a working knowledge of CDISC standards. Whether it’s SDTM variables on an aCRF, CDASH-structured EDC systems, or routine exposure to standardized specs, each role is already familiar with the tools of the trade. This foundational knowledge becomes their secret weapon: breaking down long-standing silos and enabling smarter, more collaborative workflows than ever before. In a world of evolving standards, AI-assisted processes, and increasing expectations for speed and accuracy, the path to submission success isn’t always linear. There’s no single “right way” to create SDTM datasets, but there are more efficient ways. Understanding the strengths of each role opens up opportunities to rethink traditional processes, distribute tasks more effectively, and innovate workflows that better leverage individual expertise. This poster will present our SDTM Squad, outlining the signature skills and superpowers each role brings to the clinical trial data lifecycle, supercharged by emerging contributions of AI. By highlighting the strengths of this dynamic team, we’ll demonstrate that with the right strategy and collaboration, regulatory compliance can be both faster and more heroic.

PO-377 : Pharmacometric Analysis Dataset Generation Process: Data, Roles & Tools from a Programmer's Perspective
Srinivas Bachina, AZ
Dheeraj Rupani, AstraZeneca PLC
Hasi Mondal, AstraZeneca
Kiran Kode, AstraZeneca PLC

Pharmacometric analyses (PopPK, PK-Safety, PK-QT, PK-AE, PK-PD, PK-Efficacy) rely on high-quality, analysis-ready datasets. We describe a standardized programming workflow that clarifies roles and quality controls to improve delivery predictability and regulatory readiness across therapeutic areas. Methods: The process spans intake to closeout. Intake begins with a work request, resourcing, and a kickoff to align scope, timelines, and study-specific considerations. Specifications are developed via a standardized order form aligned to defined data standards. Source discovery and access are coordinated with study programmers. Programming leverages template code for common dataset types and validated macro libraries, with specialized routines for complex domains (e.g., concomitant medications via a curated DDI resource) and safeguards for early unblinding scenarios. Quality is embedded through agreed QC strategies, diagnostics, and a structured issue log. Validation is documented in a formal report, and deliverables are provisioned to designated collaboration areas for pharmacometric analysis. Study documents (protocol, CRF, SAP, CSR), specifications, and validation records follow controlled lifecycle management to ensure traceability and auditability. Results: Clear PM-Programming touchpoints improved line of sight for resource planning and reduced rework through early joint review of design and data availability. Templates and macros accelerated development and enhanced consistency. A defined QC/validation framework strengthened data integrity, while disciplined data movement simplified downstream analysis and compliance. Conclusions: A well-defined programming workflow enables faster, higher-quality, reproducible pharmacometric datasets that support internal decisions and regulatory submissions. Enhancements will broaden exposure-response standards, automate diagnostics and validation, and deepen integration with collaboration platforms.

PO-386 : Case Study: Integrating ADPPK CDISC Standards into Pharmacometric Programming and Analysis Workflows
Praseeda Rajan, Bristol Myers Squibb
Renuka Hegde, Bristol Myers Squibb
Erin Dombrowsky, Bristol Myers Squibb
Neelima Thanneer, Bristol-Myers Squibb
Yue Zhao, Bristol Myers Squibb

With the release of the CDISC ADaM Population Pharmacokinetic (popPK) Implementation Guide (IG) and anticipated FDA requirements, pharmacometric programmers must adapt to new standards for creating ADPPK datasets. This case study provides a comprehensive guide for integrating ADPPK CDISC standards into pharmacometric programming and analysis workflows. Achieving compliance necessitated extensive workflow changes, from dataset preparation to submission, including accurate metadata alignment and updating internal tools. A key challenge was harmonizing and pooling multiple studies into a single CDISC-compliant ADPPK dataset, requiring mapping of historical variables and inclusion of standardized variables (e.g., PARAM/PARAMCD, AVAL/AVALC, RACE, SEX). Analysis-specific variables (e.g., ARACE, AGEGR1) were derived as recommended, and model files were updated for compatibility with new variable requirements. Permissible and conditional flag variables were leveraged for record exclusion within model files. The result was a single dataset suitable for both modeling and regulatory submission, supporting automation and tool development. Conformance to the ADPPK IG enhances data quality and consistency and facilitates seamless interactions with health authorities. This case study serves as a practical guide for programmers and modelers to achieve robust compliance with emerging regulatory standards.

PO-392 : Running Python from SAS: A Practical Comparison of Available Approaches
David Ward, Triam Ltd

As Python continues to grow in popularity for data science and analytics, many SAS programmers seek ways to incorporate Python capabilities without abandoning established SAS workflows. Fortunately, the SAS language provides multiple mechanisms for executing Python code from within SAS programs, each with different strengths, limitations, and operational considerations. This paper provides a practical, comparative overview of the primary methods for running Python code within SAS environments. Covered techniques include PROC FCMP, PROC PYTHON (SAS Viya and Siemens SLC), system calls, and REST APIs, along with a sidebar on Python packages such as SASPy. For each method, we discuss setup requirements, execution models, data exchange patterns, and common use cases, along with known limitations and pitfalls. The presentation is aimed at advanced SAS programmers who want to integrate Python into existing SAS codebases in a controlled and supportable manner. Emphasis is placed on practical decision-making rather than advocacy, helping attendees understand when each approach is appropriate and when it is not. Attendees will leave with clear guidance and best practices for selecting and implementing Python-in-SAS techniques that align with their technical goals, operational constraints, and organizational standards.
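
As a taste of one covered technique, here is a minimal sketch of the PROC FCMP route (available from SAS 9.4M6 onward when the Python integration is configured; the function and values are invented for illustration):

proc fcmp;
  declare object py(python);             /* Python object inside FCMP   */
  submit into py;
def bmi(weight_kg, height_m):
    "Output: bmi_val"
    return weight_kg / (height_m ** 2)
endsubmit;

  rc = py.publish();                     /* compile the submitted code  */
  rc = py.call("bmi", 80, 1.80);         /* pass SAS values to Python   */
  result = py.results["bmi_val"];        /* retrieve the returned value */
  put result=;
run;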

PO-395 : Implementing AI Agent-Driven Tools to Accelerate Clinical Research Workflows
Hohyun Lee, Clinvia

With the rapid emergence of large language models (LLMs), generative AI has evolved from a conversational tool into a powerful platform that has accelerated and changed the ways industries use automation. Beyond prompt-level interaction, LLMs now enable the development of agentic AI tools and workflows to augment statistical programming and increase automation within clinical research. This paper explores the usage of these AI agent-driven tools and the design/workflow of such systems. Specifically, the primary systems this paper focuses on are a Statistical Analysis Plan (SAP) generator, a Clinical Trial Protocol generator, and a TFL (Tables, Figures, and Listings) validator. This paper first outlines how to design workflows for building such AI tools. Then, it explains the internal logic of each application. Next, it describes the steps taken to communicate effectively with both the LLMs and the user. Finally, it demonstrates how to test and analyze these applications to generate consistent, regulatory-aligned SAP and protocol content, as well as validating TFLs through generated Python code. In addition, the development and iteration of these AI agent systems are accelerated through the use of Cursor, an AI-assisted coding environment. This paper also explains how Cursor can be used to generate and refactor code, explain functions and workflows, and even reflect on and resolve errors the applications face. Altogether, this work demonstrates how AI agents, combined with AI-augmented development tools, can help modernize, accelerate, and transform the statistical field by reducing manual effort, improving consistency, and enhancing autonomy.

PO-406 : Smarter, Faster, Better: GenAI-Driven Authoring for Data Reviewer’s Guides
Christina Scienski, Pfizer
Christine Rossin, Pfizer

Background: Use generative AI (GenAI), leveraging existing regulatory documents and internal metadata files, to populate key sections of the Data Reviewer’s Guide for regulatory submissions (e.g., FDA, PMDA, EMA). Aims: 1. Streamline the writing process by creating uniform documentation across all therapeutic areas and assets. 2. Establish consistency in content and quality. 3. Improve authoring efficiency, i.e., resources and time. Methods: 1. Launch GenAI within Word. 2. Utilize the PHUSE DRG templates (ADRG, cSDRG) embedded for the user. 3. Ingest source files depending on the template type. For the cSDRG, the required documents are the protocol, the exported Pinnacle 21 cSDRG document, the final annotated CRF, and the upstream data management metadata file for the inclusion/exclusion database build. For the ADRG, the required documents are the protocol, the exported Pinnacle 21 ADRG document, the Statistical Analysis Plan (SAP), and the system-level path for submitted programs, plus answers to two questions regarding intermediate dataset submission and imputation rules. 4. Enter the sponsor name, protocol number, and title. Once documents are ingested and the questions are answered, the user initiates the authoring process. The system generates draft content from the source documents, which the user can review and accept. The document can then be saved as a first draft for further review, refinement, and finalization. Conclusion: This GenAI approach improves speed, consistency, and quality across Data Reviewer’s Guides while reducing the time required for drafting, review, and validation.

PO-415 : Traceability in Real-World Trials: Just an aERD Away
Ashwini Yermal Shanbhogue, None

Traceability, or provenance, of data is an essential requirement for the regulatory review of clinical trial data. Ensuring the ability to trace data from its source to tabulation datasets to analysis datasets to analysis results is therefore one of the priorities of a sponsor submitting said data. In a traditional, gold-standard randomized controlled trial (RCT), where clinical trial data is collected using Case Report Forms (CRFs), one of the ways to ensure and/or demonstrate this traceability is the annotated CRF, or aCRF. In recent years, however, the increase in availability of Real World Data (RWD) and the evolution of tools available to analyze it have resulted in the rise of a new kind of clinical trial: one that utilizes Real World Evidence (RWE), or evidence generated from the analysis of RWD, to assess the effectiveness and safety of a therapeutic product. RWD, however, is collected in electronic databases rather than in CRFs. Currently, there is no regulatory submission document that demonstrates the traceability of RWD present in electronic databases to analysis results. Therefore, I would like to propose an annotated entity relationship diagram (aERD) as a solution to this problem.

PO-433 : Enhancing CDISC Standards Implementation (SDTM and ADaM) with PROC FCMP, PROC IML, and Macro Loop Integration in Oncology Clinical Trials
Ajay Gupta, Daiichi Sankyo
Misikir Tesfaye, Daiichi Sankyo Inc.

Oncology clinical trials require standardized SDTM and ADaM datasets for regulatory submission, but the complexity of tumor measurements and response evaluation leads to time-consuming and error-prone manual processes. This poster demonstrates the application of PROC IML, PROC FCMP, and macro loop integration to efficiently create SDTM (TU, AE) and ADaM (ADRS, ADAE) datasets, ensuring compliance with CDISC (Clinical Data Interchange Standards Consortium) standards (SDTMIG v3.4, ADaMIG v1.2). PROC IML was used for matrix operations to transform tumor data per RECIST 1.1 criteria, creating SDTM TU datasets and deriving Best Overall Response (BOR) for ADaM datasets. PROC FCMP standardized variables (e.g., ISO 8601 dates, severity mappings) for SDTM AE and added analysis variables (e.g., treatment-emergent flags) for ADaM ADAE. A macro loop automated the process across multiple domains, reducing processing time by roughly 60% to 70% per domain and achieving a 35% to 40% reduction in mapping errors. Results show that the integrated approach not only enhances efficiency but also ensures regulatory compliance, making it a valuable tool for oncology trial data management. Future applications include extending the framework to additional domains and integrating with AI for predictive analytics. This methodology provides a scalable solution for clinical trial programmers, improving the speed and accuracy of regulatory submissions in oncology research.
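
To illustrate the PROC FCMP piece with a small, hedged example (function and variable names are assumed, not the authors' code): a reusable function that renders a numeric SAS date as an ISO 8601 string, applied to an AE start date.

proc fcmp outlib=work.funcs.sdtm;
  /* hypothetical helper: numeric SAS date -> ISO 8601 character date */
  function iso8601dt(dt) $ 10;
    if missing(dt) then return ('');
    return (putn(dt, 'yymmdd10.'));
  endsub;
run;

options cmplib=work.funcs;

data sdtm_ae;
  set raw_ae;                            /* assumed source with numeric AESTDT */
  length aestdtc $10;
  aestdtc = iso8601dt(aestdt);
run;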