PharmaSUG 2024 Paper Presentations
Paper presentations are the heart of a PharmaSUG conference. Here is the list including the next batch of confirmed paper selections. Papers are organized into 12 academic sections and cover a variety of topics and experience levels.
Note: This information is subject to change. Last updated 14-May-2024.
Sections
Advanced Programming
Data Standards
Data Visualization and Reporting
Hands-on Training
| Paper No. | Author(s) | Paper Title (click for abstract) |
| HT-101 | Mathura Ramanathan & Nancy Brucken | Deep Dive into the BIMO (Bioresearch Monitoring) Package Submission |
| HT-111 | Bart Jablonski | A Gentle Introduction to SAS Packages |
| HT-118 | Philip Holland | The Art of Defensive SAS Programming |
| HT-143 | Charu Shankar | The New Shape Of SAS Code |
| HT-152 | Phil Bowsher | GenAI to Enhance Your Statistical Programming |
| HT-157 | Jayanth Iyengar | Understanding Administrative Healthcare Datasets using SAS programming tools |
| HT-197 | Dan Heath | Building Complex Graphics from Simple Plot Types |
| HT-201 | Ashley Tarasiewicz & Chelsea Dickens | Transitioning from SAS to R |
| HT-413 | Richann Watson & Josh Horstman | Complex Custom Clinical Graphs Step by Step with SAS® ODS Statistical Graphics |
| HT-459 | Troy Hughes | Hands-on Python PDFs: Using the pypdf Library To Programmatically Design, Complete, Read, and Extract Data from PDF Forms Having Digital Signatures |
Leadership Skills
Metadata Management
| Paper No. | Author(s) | Paper Title (click for abstract) |
| MM-225 | Kang Xie | Variable Subset Codelist |
| MM-226 | Jeetender Chauhan & Madhusudhan Ginnaram & Sarad Nepal & Jaime Yan | Methodology for Automating TOC Extraction from Word Documents to Excel |
| MM-240 | Avani Kaja | Managing a Single Set of SDTM and ADaM Specifications across All Your Phase 1 Trials |
| MM-245 | Trevor Mankus | Relax with Pinnacle 21’s RESTful API |
| MM-267 | Xiangchen Cui & Min Chen & Jessie Wang | A Practical Approach to Automating SDTM Using a Metadata-Driven Method That Leverages CRF Specifications and SDTM Standards |
| MM-358 | Lakshmi Mantha & Purvi Kalra & Arunateja Gottapu | Optimizing Clinical Data Processes: Harnessing the Power of Metadata Repository (MDR) for Innovative Study Design (ISD) and Integrated Summary of Safety (ISS) / Efficacy (ISE) |
| MM-447 | Vandita Tripathi & Manas Saha | Automating third party data transfer through digitized Electronic DTA Management |
Real World Evidence and Big Data
Solution Development
Statistics and Analytics
Strategic Implementation & Innovation
Submission Standards
ePosters
Abstracts
Advanced Programming
AP-102 : Creating Dated Archives Automatically with SAS®
Derek Morgan, Bristol Myers Squibb
Monday, 8:00 AM – 8:20 AM, Location: Key Ballroom 4
When creating patient profiles, it can be useful for clinical scientists to compare current data with previous data in real time without having to request those data from an Information Technology (IT) source. This is a method for using SAS® to perform the archiving via a scheduled daily job. The primary advantage of SAS over an operating-system script is its date handling, which replaces many difficult calculations with intervals and functions. This paper details an application that creates dated archive folders and copies SAS data sets into those dated archives, with automated aging and deletion of old data and folders. The application allows clinical scientists to customize their archive frequency (within certain limits). It also keeps storage requirements to a minimum as defined by IT. This replaced a manual process that required study programmers to create the archives, eliminating the possibility of missed or incorrectly dated archives. The flexibility required for this project, and the conditions under which it ran, called for SAS date and time intervals and their functions. SAS was also used to manipulate the files and directories.
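A minimal sketch of the date-interval approach described above (the archive root, libref, folder naming, and 14-day retention window are hypothetical, not taken from the author's application):

```sas
/* Hypothetical paths and retention window; not the author's application. */
%let archroot = /projects/study01/archive;

libname profdat "/projects/study01/data";      /* data sets to be archived */

data _null_;
   length newdir $200;
   /* Folder named for today, e.g. .../archive/20240514 */
   newdir = dcreate(put(date(), yymmddn8.), "&archroot/");
   call symputx('newdir', newdir);
   /* Aging: folders dated before intnx('day', date(), -14) would be deleted. */
run;

libname arch "&newdir";

proc copy in=profdat out=arch memtype=data;    /* snapshot into the archive */
run;
```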
AP-108 : Macro Variable Arrays Made Easy with macroArray SAS package
Bart Jablonski, yabwon
Monday, 10:30 AM – 11:20 AM, Location: Key Ballroom 4
A macro variable array is a jargon term for a list of macro variables with a common prefix and numerical suffixes. Macro arrays are valued by advanced SAS programmers and often used as “driving” lists, allowing sequential metadata for complex or iterative programs. Use of macro arrays requires advanced macro programming techniques based on indirect reference (aka, using multiple ampersands &&), which may intimidate less experienced programmers. The aim of the paper is to introduce the macroArray SAS package. The package facilitates a solution that makes creation and work with macro arrays much easier. It also provides a “DATA-step-arrays-like” interface that allows use of macro arrays without complications that arise from indirect referencing. Also, the concept of a macro dictionary is presented, and all concepts are demonstrated through use cases and examples.
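For readers new to the idiom, here is a bare-bones sketch of a macro variable array and indirect (&&) referencing written in plain macro code; it illustrates the concept only and is not the macroArray package API:

```sas
/* Illustrative only: build NAME1-NAMEn macro variables from a data set,  */
/* then drive a loop with indirect (&&) referencing.                      */
data _null_;
   set sashelp.class end=last;
   call symputx(cats('name', _n_), name);          /* name1, name2, ...   */
   if last then call symputx('name_n', _n_);       /* number of elements  */
run;

%macro drive;
   %local i;
   %do i = 1 %to &name_n;
      %put Processing subject &&name&i;            /* resolves to name&i  */
   %end;
%mend drive;
%drive
```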
AP-135 : LAST CALL to Get Tipsy with SAS®: Tips for Using CALL Subroutines
Lisa Mendez, Catalyst Clinical Research
Richann Watson, DataRich Consulting
Tuesday, 10:00 AM – 10:20 AM, Location: Key Ballroom 4
This paper provides an overview of six SAS CALL subroutines that are frequently used by SAS® programmers but are less well-known than SAS functions. The six CALL subroutines are CALL MISSING, CALL SYMPUTX, CALL SCAN, CALL SORTC/SORTN, CALL PRXCHANGE, and CALL EXECUTE. Instead of using multiple IF-THEN statements, the CALL MISSING subroutine can be used to quickly set multiple variables of various data types to missing. CALL SYMPUTX creates a macro variable that is either local or global in scope. CALL SCAN looks for the nth word in a string. CALL SORTC/SORTN is used to sort a list of values within a variable. CALL PRXCHANGE can redact text, and CALL EXECUTE lets SAS write your code based on the data. This paper will explain how those six CALL subroutines work in practice and how they can be used to improve your SAS programming skills.
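A toy data step touching each of the six routines (illustrative only; variable names and values are invented):

```sas
/* Toy data step touching each routine; values are invented. */
data demo;
   length a b c $8 text $40;
   a = 'zebra'; b = 'apple'; c = 'mango';
   call sortc(a, b, c);                         /* a=apple b=mango c=zebra */
   text = 'SSN 123-45-6789 on file';
   call prxchange(prxparse('s/\d{3}-\d{2}-\d{4}/XXX-XX-XXXX/'), -1, text);
   call scan(text, 2, pos, len);                /* position/length of word 2 */
   call symputx('word2pos', pos);               /* data-driven macro variable */
   call missing(a, b, c);                       /* reset all three to missing */
run;

data _null_;
   set sashelp.class;
   /* CALL EXECUTE queues code that SAS runs after this step finishes. */
   call execute('%put Subject: ' || strip(name) || ';');
run;
```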
AP-138 : An Introduction to the SAS Transpose Procedure and its Options
Timothy Harrington, Navitas Data Sciences
Monday, 1:30 PM – 1:50 PM, Location: Key Ballroom 4
PROC TRANSPOSE is a SAS® procedure for arranging the contents of a dataset column from a vertical to a horizontal layout based on selected BY variables. This procedure is particularly useful for efficiently manipulating clinical trials data with a large number of observations and groupings, as is often found in laboratory analysis or vital signs data. The use of PROC TRANSPOSE is illustrated with examples showing different modes of arranging the data. Possible problems which can occur when using this procedure, and their solutions, are also discussed.
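A small sketch of the vertical-to-horizontal rearrangement described, assuming a hypothetical vital-signs dataset VS with USUBJID, VISIT, VSTESTCD, and VSSTRESN:

```sas
/* VS is a hypothetical vital-signs data set (USUBJID, VISIT, VSTESTCD, VSSTRESN). */
proc sort data=vs out=vs_sorted;
   by usubjid visit;
run;

proc transpose data=vs_sorted out=vs_wide(drop=_name_) prefix=vs_;
   by usubjid visit;        /* one output row per subject and visit       */
   id vstestcd;             /* SYSBP, DIABP, ... become column suffixes   */
   var vsstresn;
run;
```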
AP-144 : SAS® Super Duo: The Program Data Vector and Data Step Debugger
Charu Shankar, SAS Institute
Tuesday, 11:00 AM – 11:50 AM, Location: Key Ballroom 4
Whether you are a self-taught SAS learner with a lot of experience, or a novice just entering the SAS universe, you may not have spent a lot of time delving into two fantastic SAS® superpowers. The Program Data Vector (PDV) is where SAS processes one observation at a time, in memory. The Data Step Debugger is an excellent tool for actually seeing the observation being held in memory and watching the movement of data from input to memory to output. Combining these two tools gives SAS practitioners a lot of utility to “get under the hood” of how SAS code works in practice to ingest and analyze data during program operations. Once you know the specifics of what happens during compile time and execution, joins, and creating arrays, efficient SAS code will be at your fingertips. Action-packed with animations, live demos, and a great hands-on section, this presentation will likely be a resource that you will use and reuse now and in the future.
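The debugger is started with the DEBUG option on the DATA statement (available in the SAS windowing environment); a toy example, not taken from the presentation:

```sas
/* Toy example only; run in the SAS windowing environment to open the debugger. */
data class_flag / debug;
   set sashelp.class;
   if age >= 13 then teen = 1;
   else teen = 0;
run;
/* Debugger commands: examine _all_;  watch teen;  step;  quit; */
```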
AP-175 : Tips for Completing Macros Prior to Sharing
Jeffrey Meyers, Regeneron Pharmaceuticals
Monday, 4:00 PM – 4:20 PM, Location: Key Ballroom 4
SAS macros are a programmer’s best friend when written well, and their worst nightmare when not. Macros are a powerful tool within SAS for automating complicated analyses or completing repetitive tasks. The next step after building a capable tool is to share it with others. The creator of the macro has only a short window to win over the user: multiple errors, missing documentation or guides, and a lack of intuitive features quickly push the user away from the macro. This paper will focus on completing a macro prior to sharing it, to give the user the best possible experience.
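One small example of the kind of finishing touch the paper advocates, shown here as a hypothetical macro that validates its parameters before doing any work:

```sas
/* Hypothetical macro: validate parameters and fail with a clear message  */
/* instead of letting the user hit cryptic downstream errors.             */
%macro summarize(data=, var=);
   %if %length(&data) = 0 or %length(&var) = 0 %then %do;
      %put ERROR: [summarize] Both DATA= and VAR= are required.;
      %put ERROR: [summarize] Example: %nrstr(%summarize(data=adsl, var=age));
      %return;
   %end;
   %if not %sysfunc(exist(&data)) %then %do;
      %put ERROR: [summarize] Data set &data does not exist.;
      %return;
   %end;
   proc means data=&data n mean std min max;
      var &var;
   run;
%mend summarize;
```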
AP-191 : Comprehensive Evaluation of Large Language Models (LLMs) Such as ChatGPT in Biostatistics and Statistical Programming
Songgu Xie, Regeneron Pharmaceuticals
Michael Pannucci, Arcsine Analytics
Weiming Du, Alnylam Pharmaceuticals
Huibo Xu, Greenwich High School
Toshio Kimura, Arcsine Analytics
Monday, 9:00 AM – 9:20 AM, Location: Key Ballroom 4
Generative artificial intelligence using large language models (LLMs) such as ChatGPT is an emerging trend. However, discussion of using LLMs in biostatistics and statistical programming has been somewhat limited. This paper provides a comprehensive evaluation of major LLMs (ChatGPT, Bing AI, Google BARD, Anthropic Claude 2) in their utility within biostatistics and statistical programming (SAS and R). We tested major LLMs across several challenges: 1) Conceptual Knowledge, 2) Code Generation, 3) Error Catching/Correcting, 4) Code Explanation, and 5) Programming Language Translation. Within each challenge, we asked easy, medium, and advanced difficulty level questions related to three topics: Data, Statistical Analysis, and Display Generation. After providing the same prompts to each LLM, responses were captured and evaluated. For some prompts, LLMs provided incorrect responses, also known as “hallucinations.” Although LLMs replacing biostatisticians and statistical programmers may be overhyped, there are nevertheless use cases where LLMs are helpful in assisting statistical programmers.
AP-212 : R Shiny and SAS Integration: Execute SAS Procs from Shiny Application
Samiul Haque, SAS Institute
Jim Box, SAS Institute
Monday, 4:30 PM – 4:50 PM, Location: Key Ballroom 4
The integration of different programming languages and tools is pivotal for translational data science. R Shiny is the most popular tool for building web applications in R. However, biostatisticians and data scientists often prefer to leverage SAS Procs or macros for clinical decision making. The worlds of R Shiny and SAS do not need to be decoupled: R Shiny applications can incorporate SAS procs and analytics. In this work, we present mechanisms for integrating R Shiny and SAS. We demonstrate how SAS Procs and macros can be executed from an R Shiny front end, and how SAS logs and results can be printed within the Shiny app.
AP-218 : Potentials and Caveats When Using ChatGPT for Enhanced SAS Macro Writing
Xinran Luo, Everest Clinical Research
Weijie Yang, Everest Clinical Research
Tuesday, 8:30 AM – 8:50 AM, Location: Key Ballroom 4
AI language models like ChatGPT have impressed and even intimidated programmers. There are discussions of ChatGPT with examples of simple SAS steps, and there are descriptions of various usages of ChatGPT without examples, but few papers discuss the use of ChatGPT in SAS macro development with examples. This paper explores the utility of ChatGPT in enhancing the process of writing SAS macros from scratch, using an example of checking SAS logs in batch on Windows, and compares the process with using conventional search engines. The focus is not only on utilizing ChatGPT’s capabilities to provide programmers with initial ideas of program structure when they encounter unusual work requests, but also on demonstrating its application in developing a robust macro by showing key steps of the conversations between programmers and ChatGPT. Although ChatGPT proves invaluable in offering insights and suggestions, it is imperative to acknowledge certain caveats. Not all responses provided by ChatGPT are infallible, especially in the context of technical domains like SAS programming. Emphasizing the importance of independent verification, this paper underscores the need for users, especially new learners of SAS, to scrutinize and validate the suggestions before implementation. This paper aims to empower SAS practitioners by showcasing how ChatGPT can complement their macro-writing endeavors. By highlighting both the potentials and limitations of leveraging AI language models like ChatGPT, this paper contributes to fostering a balanced and discerning approach towards utilizing AI-driven assistance in SAS programming and macro development.
AP-229 : Create a Shift Summary of Laboratory Values in CTCAE Grade to the Worst Grade Abnormal Value using R and SASSY System
Vicky Yuan, Incyte Corporation
Tuesday, 8:00 AM – 8:20 AM, Location: Key Ballroom 4
A shift summary of laboratory values in CTCAE grade to the worst-grade abnormal value is often required for most laboratory data analysis and submission. The purpose of a CTCAE grade shift table is to present how results vary from baseline to post-baseline visits in the study. This paper will illustrate how to report a shift table using R and packages from the SASSY system. It will start from an example and explain its anatomy, then give a step-wise explanation of how to report the table in a .doc file. The example is interesting because it contains “internal” footnotes that can change on every page. The R product used in this paper is the SASSY package version 1.2.0 running in an RStudio environment.
AP-252 : Externally Yours – Adeptly Managing Data Outside Your EDC System
Frank Canale, SoftwaRx, LLC
Monday, 3:00 PM – 3:20 PM, Location: Key Ballroom 4
Programmers in the pharmaceutical industry are used to working with data that is entered into, and extracted from, a system commonly known as an EDC (Electronic Data Capture) system. When using data that is sourced from one of these systems, you can reliably count on the type of data you’ll receive (normally SAS datasets) and, if the EDC is set up well, a standard structure that provides output data containing CDISC/CDASH variable names. But what does one do when receiving data that is sourced outside the EDC system and received from other vendors? How do you manage this data: retrieve it, validate its structure, and even export it to a format allowing you to merge it with other, more conventional SAS datasets?
AP-253 : Build Your Own PDF Generator: A Practical Demonstration of Free and Open-Source Tools
James Austrow, Cleveland Clinic
Monday, 5:00 PM – 5:20 PM, Location: Key Ballroom 4
The PDF is one of the most ubiquitous file formats and can be read on nearly every computing platform. So how, in the year 2024, can it still be so inconvenient to perform basic editing tasks such as concatenating and merging files, inserting page numbers, and creating bookmarks? These features are often locked behind paid licenses in proprietary software or require that the documents be uploaded to a web server, the latter of which poses unacceptable security risks. In fact, the PDF is a public standard and there exist free, open-source libraries that make it easy to build in-house solutions for these and many other common use cases. In this paper, we demonstrate how to use Python to assemble and customize PDF documents into a final, polished deliverable. We will also lay the foundation for automating these tasks, which can save countless hours on reports that have to be prepared on a regular basis.
AP-256 : Leveraging ChatGPT in Statistical Programming in the Pharmaceutical Industry
Ian Sturdy, Eli Lilly and Company
Tuesday, 3:00 PM – 3:20 PM, Location: Key Ballroom 4
This paper explores the potential benefits of incorporating ChatGPT, a state-of-the-art natural language processing model, in statistical programming within the pharmaceutical industry. By leveraging ChatGPT’s capabilities, this technology can save time, money, and most importantly, your sanity. Programming often leads to frustration, anxiety, and sleepless nights trying to solve complex problems. Various practical applications and techniques that harness the power of ChatGPT will be described to reduce all of these. In a world where Artificial Intelligence threatens to take our jobs, this paper suggests methods of tapping into the untapped potential of ChatGPT to empower programmers with innovative tools, thereby increasing our value. When programming issues arise, no longer will you need to worry about judgement or hostility from others on online forums. ChatGPT is a powerful tool we have yet to fully leverage, and its benefits extend well beyond our imaginations, let alone this paper.
AP-268 : A New Approach to Automating the Creation of the Subject Visits (SV) Domain
Xiangchen Cui, CRISPR Therapeutics
Jessie Wang, CRISPR Therapeutics
Min Chen, CRISPR Therapeutics
Tuesday, 1:30 PM – 2:20 PM, Location: Key Ballroom 4
The creation of the subject visits (SV) domain is one of the most challenging tasks of SDTM programming. Aside from the small portion of mapping from raw dataset variables to SV variables, SV programming mainly consists of a more complex derivation process, which is totally different from that of other SDTM domains. The dynamic parts of the SV programming process, such as identifying raw datasets and their variables with both date/time and clinical visits, cause manual development of a SAS program to be time-consuming and error-prone. Hence, automating its code generation would enhance efficiency and accuracy. This paper will present a new approach for SV automation based on the SDTM automation done in our previous paper, which leveraged CRF specifications from an EDC database and SDTM standards [1]. It will introduce the standard SV programming logic flow with 10 sequential steps, which led us to develop an additional SAS-based macro named %SV_Code_Generator as an expansion to the macro introduced in [1]. The output of this macro (SV.sas) achieves 100% automation of the SV domain for the raw data collected per CRFs in a clinical study. This new approach guarantees that all raw dataset variables related to subject visits are accounted for in SV programming, thanks to the sequential programming automation. This automation allows the generation of the SV dataset to occur very early in the programming development cycle and makes developing programmatic quality checks for clinical data review and data cleaning more efficient and economically feasible.
AP-289 : Programming with SAS PROC DS2: Experience with SDTM/ADaM
Jianfeng Wang, University of Minnesota Twin Cities, Minneapolis, Minnesota
Li Cheng, Vertex Pharmaceuticals Inc.
Tuesday, 2:30 PM – 2:50 PM, Location: Key Ballroom 4
PROC DS2 is a procedure introduced with SAS Base 9.4. This procedure provides opportunities for SAS programmers to apply Object-Oriented Programming (OOP) and multithreading techniques in SAS programming and is a critical connection between ‘traditional’ SAS programming and programming on the SAS Viya platform. The goal of this paper is to pilot the use of PROC DS2 in the work of preparing clinical trial CDISC datasets. In this paper, PROC DS2 is tested in the programming of SDTM/ADaM on a server with the SAS Base 9.4 M3 release. After converting SDTM/ADaM programs written in the ‘traditional’ SAS programming language into PROC DS2 code, this paper presents the lessons learned and the notes taken as the obstacles were overcome or bypassed. Furthermore, OOP and multithreading techniques are explored for application to SDTM/ADaM programming. Programming setups with a standard folder structure are discussed, and the performance of using OOP and multithreading techniques is also evaluated.
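A minimal PROC DS2 data program for orientation (a simple derived variable on SASHELP.CLASS, not the authors' SDTM/ADaM code):

```sas
/* A simple derived variable in DS2; not the authors' SDTM/ADaM code. */
proc ds2;
   data work.class_bmi / overwrite=yes;
      dcl double bmi having label 'Body Mass Index';
      method run();
         set sashelp.class;
         bmi = round(703 * weight / (height ** 2), 0.1);
      end;
   enddata;
   run;
quit;
```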
AP-295 : Replicating SAS® Procedures in R with the PROCS Package
David Bosak, r-sassy.org
Tuesday, 10:30 AM – 10:50 AM, Location: Key Ballroom 4
The “procs” package aims to simulate some commonly used SAS® procedures in R. The purpose of simulating SAS procedures is to make R easier to use and match statistical results. Another important motivation is to provide stable tools to work with in the pharmaceutical industry. The package replicates several of the most frequently used procedures, such as PROC FREQ, PROC MEANS, PROC TTEST, and PROC REG. The package also contains some data manipulation procedures like PROC TRANSPOSE and PROC SORT. This paper will present an overview of the package and provide demonstrations for each function.
AP-298 : Comparison of Techniques in Merging Longitudinal Datasets with Errors on Date Variable: Fuzzy Matching versus Clustering Analysis
Huitong Niu, Master of Science Student, Biostatistics, Fielding School of Public Health, University of California, Los Angeles
Yan Wang, Adjunct Assistant Professor, Public and Population Health, School of Dentistry, University of California, Los Angeles
Tuesday, 9:00 AM – 9:20 AM, Location: Key Ballroom 4
This paper examines effective techniques for merging longitudinal datasets with key variable inaccuracies, focusing on date errors. Traditional SAS methods, like the DATA Step MERGE or PROC SQL JOIN, require exact matches on key variables, which is challenging in datasets containing errors. Our paper compares fuzzy matching and clustering analysis within SAS, assessing their effectiveness in reconciling datasets with inconsistencies in date variables. We simulate a longitudinal dataset of approximately 2,000 observations, representing about 500 patients with repeated measurements. The dataset is used to simulate two datasets including normally (or uniformly) distributed errors on date, manually introduced errors (e.g., typing “12” as “21”), and missing date information (e.g., entering “06/23” instead of “12/06/2023”). For each scenario, we use fuzzy matching and clustering analysis to merge two datasets, evaluating the accuracy of each technique. Preliminary results show varied effectiveness depending on the type of error on the date variable. For datasets with normally (or uniformly) distributed errors on date, clustering analysis significantly outperforms fuzzy matching with a 94.9% accuracy rate compared to 54.1%. In the case of manually introduced errors, both methods achieve high accuracy, around 98%. However, for datasets with missing date information, fuzzy matching is more effective, attaining an 84.4% accuracy rate as opposed to 45.2% for clustering analysis. The paper concludes with a discussion of these findings, offering insights for researchers on selecting appropriate methods for merging datasets with errors on date.
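As a point of reference only, here is one generic flavor of fuzzy date matching in SAS (hypothetical datasets VISITS_A and VISITS_B keyed by PATID and VISDT); it is not the authors' simulation or clustering code:

```sas
/* Hypothetical datasets VISITS_A and VISITS_B (PATID, VISDT); accept the   */
/* closest date within a +/- 7-day tolerance rather than an exact match.    */
proc sql;
   create table matched as
   select a.patid,
          a.visdt as visdt_a format=date9.,
          b.visdt as visdt_b format=date9.,
          abs(a.visdt - b.visdt) as day_diff
   from visits_a as a inner join visits_b as b
     on a.patid = b.patid
    and abs(a.visdt - b.visdt) <= 7                            /* tolerance */
   group by a.patid, a.visdt
   having abs(a.visdt - b.visdt) = min(abs(a.visdt - b.visdt)); /* closest  */
quit;
```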
AP-349 : Just Stringing Along: FIND Your Way to Great User-Defined Functions
Richann Watson, DataRich Consulting
Louise Hadden, Abt Global Inc.
Monday, 11:30 AM – 11:50 AM, Location: Key Ballroom 4
SAS® provides a vast number of functions and subroutines (sometimes referred to as CALL routines). These useful routines are an integral part of the programmer’s toolbox, regardless of the programming language. Sometimes, however, pre-written functions are not a perfect match for what needs to be done, or for the platform that the required work is being performed upon. Luckily, SAS has provided a solution in the form of the FCMP procedure, which allows SAS practitioners to design and execute User-Defined Functions (UDFs). This paper presents two case studies for which the character or string functions SAS provides were insufficient for work requirements and goals, and demonstrates the design process for custom functions and how to achieve the desired results.
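A small user-defined character function built with PROC FCMP, shown for orientation; it is not one of the paper's case studies:

```sas
/* Illustrative user-defined string function; not one of the paper's case studies. */
proc fcmp outlib=work.funcs.strings;
   function initialcaps(instr $) $ 200;
      return (propcase(strip(instr)));
   endsub;
run;

options cmplib=work.funcs;

data _null_;
   name = initialcaps('  richann WATSON ');
   put name=;                      /* name=Richann Watson */
run;
```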
AP-361 : Efficient Repetitive Task Handling in SAS Programming Through Macro Loops
Chary Akmyradov, Arkansas Children’s Research Institute
Monday, 10:00 AM – 10:20 AM, Location: Key Ballroom 4
This paper delves into the optimization of repetitive tasks in SAS programming, a common challenge faced by data analysts and programmers. The primary focus is on harnessing the power of SAS macro programming techniques, specifically through the implementation of do loops within macros. Initially, the paper introduces the basics of SAS macros, outlining their significance in automating repetitive sequences of code, and providing a foundational understanding of macro variables and syntax. The discussion then progresses to the implementation of simple do loops within macros, highlighting their practicality in routine data manipulation tasks. Through a series of practical examples and use-case scenarios, the paper demonstrates the effectiveness of these loops in real-world applications. Addressing the limitations of these simple implementations, the paper further explores the generalization of do loops, presenting advanced methods to create dynamic, parameter-driven macros capable of handling a variety of tasks and parameters. This advanced approach is exemplified through complex scenarios and case studies, showcasing the adaptability and efficiency of generalized do loops in diverse data analysis contexts. By the conclusion, the paper provides a comprehensive insight into the role of macro programming in SAS, offering a valuable resource for SAS programmers seeking to streamline their coding workflow and enhance efficiency in data processing tasks. This work not only serves as a practical guide for current SAS users but also contributes to the broader conversation on the future of macro programming in data analysis.
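A bare-bones example of the parameter-driven %DO-loop pattern discussed (the dataset list and the PROC FREQ body are placeholders):

```sas
/* Placeholder analysis: run PROC FREQ over a space-delimited list of data sets. */
%macro run_freqs(dsets=);
   %local i ds;
   %do i = 1 %to %sysfunc(countw(&dsets, %str( )));
      %let ds = %scan(&dsets, &i, %str( ));
      proc freq data=sashelp.&ds;
         tables _character_ / missing;
      run;
   %end;
%mend run_freqs;

%run_freqs(dsets=class cars heart)
```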
AP-420 : Generation of Synthetic Data for Clinical Trials in Base SAS using a 2-Phase Discrete-Time Markov and Poisson Rare Event Framework
Adam Yates, Data Coordinating and Analysis Center (DCAC), HJF-MHRP
Misti Paudel, Brigham and Women’s Hospital Division of Rheumatology, Inflammation, and Immunity, Harvard School of Medicine
Fengming Hu, Data Coordinating and Analysis Center (DCAC), HJF-MHRP
Monday, 2:00 PM – 2:50 PM, Location: Key Ballroom 4
Synthetic data for clinical trials independent of human participants has growing utility in clinical and epidemiologic fields, but a persistent concern has been the viability and reliability of producing synthetic data which conforms to the complex nature of biomedical data. Recent successes in synthetic clinical trial data include the use of Synthetic Control Arm (SCA) applications, but the generation of treatment-related data necessarily faces additional scrutiny. While synthetic data cannot replace trial data for scientific discovery, the planning and development phases of clinical trials can benefit from the use of synthetic treatment and control data. This paper describes a novel program developed in Base SAS which generates synthetic data that was used in clinical trial development, design, and report programming. We developed a stochastically grounded process which generates synthetic data of population-specific enrollment characteristics, as well as longitudinal local and systemic reactogenicity, unsolicited events, and adverse events. We implement a discrete-time Markov process framework to generate longitudinal observation time, incorporating a Poisson-based probability of events within each state. This 2-phase stochastic generation process results in data across observation time which conform to biologically natural and realistic behaviors. Key to our process is that reaction frequency may be modulated based on expert experience or historical expectations, but the generated data do not rely directly on existing clinical data. Potential applications and extensions in a machine learning context will be discussed. This paper is intended for individuals with an interest in clinical trial data and a basic to intermediate command of SAS macro processing.
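A toy illustration of the general idea, with made-up transition probabilities, event rates, and follow-up window; it is not the authors' framework:

```sas
/* Made-up parameters throughout: 2 states, 28 days, Poisson event counts. */
data synth;
   call streaminit(20240514);
   do subjid = 1 to 100;
      state = 1;                                  /* 1=reactogenic, 2=quiescent */
      do day = 1 to 28;
         lambda  = ifn(state = 1, 0.40, 0.05);    /* daily event rate per state */
         n_event = rand('poisson', lambda);
         output;
         p_switch = ifn(state = 1, 0.30, 0.10);   /* transition probability     */
         if rand('uniform') < p_switch then state = 3 - state;
      end;
   end;
run;
```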
AP-424 : Adding the missing audit trail to R
Magnus Mengelbier, Limelogic AB
Monday, 8:30 AM – 8:50 AM, Location: Key Ballroom 4
The R language is used more extensively across the Life Science industry for GxP workloads. The basic architecture of R makes it near impossible to add a generic audit trail method and mechanism. Different strategies have been developed to provide some level of auditing, from logging conventions to file system audit utilities, but each has its drawbacks and lessons learned. The ultimate goal is to provide an immutable audit trail compliant with ICH Good Clinical Practice, FDA 21 CFR Part 11, and EU Annex 11, regardless of the R environment. We consider different approaches to implementing auditing functionality with R and how we can incorporate audit trail functionality natively in R or with existing and available external tools and utilities that completely support Life Science best practices, processes, and standard procedures for analysis and reporting. We also briefly consider how the same principles can be extended to other languages such as Python, SAS, Java, etc.
Data Standards
DS-109 : Analyzing your SAS log with user defined rules using an app or macro.
Philip Mason, Wood Street Consultants
Monday, 3:00 PM – 3:20 PM, Location: Key Ballroom 2
SAS provides some pretty basic help with the logs it produces, typically just linking to errors and warnings. Many people build log checkers to look for particular things of interest in their logs, which usually involves capturing the log and then running some SAS code against it. I made a way to define rules in JSON format which can be read by a SAS macro and used to look for things in a log. This means different rules can be used for different use cases. They can be used via a macro or via a web application I built. The web app can switch between rules, provides summaries, draws diagrams of the code, provides performance stats, and more. Hopefully this functionality might one day be built into SAS, but in the meantime it works well as an addition.
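A stripped-down illustration of the underlying idea (a small table of rule patterns scanned against a captured log); the paper's JSON rule format, macro, and web application are not reproduced here:

```sas
/* Rule patterns (regular expressions) kept as data; the paper stores these in JSON. */
data rules;
   infile datalines truncover;
   input severity $ 1-8 pattern $ 10-60;
   datalines;
ERROR    ^ERROR
WARNING  ^WARNING
NOTE     uninitialized
NOTE     repeats of BY values
;
run;

/* Read a captured log (path is hypothetical) and flag matching lines. */
data loglines;
   infile "/logs/ae_tables.log" truncover;
   input line $300.;
   lineno = _n_;
run;

proc sql;
   create table findings as
   select r.severity, l.lineno, l.line
   from loglines as l, rules as r
   where prxmatch(cats('/', r.pattern, '/i'), l.line) > 0
   order by l.lineno;
quit;
```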
DS-130 : SDTM Specifications and Datasets Review Tips
Wanchian Chen, AstraZeneca
Monday, 4:30 PM – 4:50 PM, Location: Key Ballroom 2
SDTM requirements are spread across various sources such as the SDTM Implementation Guide (SDTMIG) domain specifications section, the SDTMIG domain assumptions section, and the FDA Study Data Technical Conformance Guide. While Pinnacle 21 can assist in identifying issues with SDTM data, it is important to note that data is often limited at the early stages of a study. The most efficient process would be to review SDTM specifications before the creation of SDTM programs, to minimize program modifications and save time. Programmers often seek guidance on conducting a comprehensive review of SDTM but are unsure where to start. In this presentation, I will provide a concise summary of frequently seen findings, both domain-specific and general, observed in multiple studies when reviewing SDTM. I will show which issues can be seen in the Pinnacle 21 report and which ones are missed. I will also cover situations where variables are not applicable to your study but may still pass Pinnacle 21 checks. This presentation is designed to benefit programmers involved in the SDTM review process.
DS-150 : Assurance in the Digital Age: Automating MD5 Verification for uploading data into a Cloud based Clinical Repository
Laura Elliott, SAS Institute Inc.
Ben Bocchicchio, SAS
Monday, 1:30 PM – 1:50 PM, Location: Key Ballroom 2
Utilization of a cloud-based repository has become increasingly common with large clinical trials. Verifying the integrity of data moved into the cloud for clinical trials is of utmost importance. Normally, this process requires manual intervention to verify that the local source data matches the data stored in the cloud-based system. This paper discusses a process that automates the creation of a verification report comparing MD5 checksums from source to destination. The process, written in Python, generates a .csv file of checksums from the source data, then uses an input file containing the folder paths to be uploaded to the cloud via REST APIs to migrate the data. The source MD5 checksums are also uploaded. The Python code then calls the REST APIs to execute a script in the cloud which compares the source and destination MD5s using SAS code. The result of the process is a .pdf report that summarizes the comparison of the source and destination MD5 checksums. This process offers a completely automated way to prove data integrity for migration of local source data into a cloud-based clinical repository.
DS-154 : Exploit the Window of Opportunity: Exploring the Use of Analysis Windowing Variables
Richann Watson, DataRich Consulting
Elizabeth Dennis, EMB Statistical Solutions, LLC
Karl Miller, IQVIA
Monday, 2:30 PM – 2:50 PM, Location: Key Ballroom 2
For analysis purposes, dataset records are often assigned to an analysis timepoint window rather than simply using the visits or timepoints from the collected data. The rules for analysis timepoint windows are usually defined in the Statistical Analysis Plan (SAP) and can involve complicated derivations to determine which record(s) best fulfils the analysis window requirements. For traceability, there are ADaM standard variables available to help explain how records are assigned to the analysis windows. This paper will explore these ADaM variables and provide examples on how they may be applied.
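A hedged sketch of populating the ADaM windowing variables, with invented window boundaries; the rule for selecting among multiple records per window is left to the SAP:

```sas
/* Window boundaries are invented; ADLB and ADY are assumed to exist. */
data adlb_win;
   set adlb;
   length awrange $20 awu $10;
   awu = 'DAYS';
   if      1  <= ady <= 10 then do; awrange='1-10';  awtarget=7;  awlo=1;  awhi=10; end;
   else if 11 <= ady <= 21 then do; awrange='11-21'; awtarget=14; awlo=11; awhi=21; end;
   else if 22 <= ady <= 35 then do; awrange='22-35'; awtarget=28; awlo=22; awhi=35; end;
   if not missing(awtarget) then awtdiff = abs(ady - awtarget);
   /* When several records land in one window, the SAP rule (e.g., closest */
   /* to AWTARGET, latest if tied) selects the analysis record.            */
run;
```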
DS-188 : Automated Harmonization: Unifying ADaM Generation and Define.xml through ADaM Specifications
Wei Shao, Bristol Myers Squibb
Xiaohan Zou, Bristol Myers Squibb
Monday, 10:30 AM – 10:50 AM, Location: Key Ballroom 2
In electronic submission packages, ADaM datasets and Define.xml stand as pivotal components. Ensuring consistency between these elements is critical. However, despite this importance, the current method still heavily depends on manual checks. To address this challenge, we introduce an innovative automated approach driven by ADaM specifications. Our solution involves a suite of SAS® macros engineered to streamline the translation from ADaM specifications to both ADaM datasets and Define.xml. These macros orchestrate a seamless automation process, facilitating the generation of ADaM datasets while concurrently fortifying consistency between ADaM datasets and Define.xml. The automated processes include format creation, core variable addition, variable attribute generation, dynamic length adjustment based on actual values, and automatic ADaM specification updates from actual data. These macros act as dynamic tools, constructing datasets with precision, adjusting variable attributes, and, most importantly, syncing Define.xml with actual data. Our automated tool system not only expedites ADaM dataset creation but also ensures inherent consistency with Define.xml. This amalgamation of automation and specification-based integrity significantly reduces manual errors, enhances data quality, and fortifies the efficiency of the submission process.
DS-193 : Around the Data DOSE-y Doe, How Much Fun Can Your Data Be: Using DOSExx Variables within ADaM Datasets
Inka Leprince, PharmaStat, LLC
Richann Watson, DataRich Consulting
Tuesday, 11:30 AM – 11:50 AM, Location: Key Ballroom 2
In the intricate dance of clinical trials that involve multiple treatment groups and varying dose levels, subjects pirouette through planned treatments – each step assigned with precision. Yet, in the realms of pediatric, oncology, and diabetic trials, the challenge arises when planned doses twirl in the delicate arms of weight adjustments. How can data analysts choreograph Analysis Data Model (ADaM) datasets to capture these nuanced doses? There is a yearning to continue with the normal dance routine of analyzing subjects based on their protocol-specified treatments, yet at times it is necessary to learn a new dance step, so as not to overlook the weight-adjusted doses the subjects actually received. The treatment variables TRTxxP/N in the Subject-Level Analysis Dataset (ADSL) and their partners TRTP/N in Basic Data Structure (BDS) and Occurrence Data Structure (OCCDS) are elegantly designed to ensure each treatment glides into its designated column in the summary tables. But we also need to preserve the weight-adjusted dose level on a subject- and record-level basis. DOSExxP and DOSExxA, gracefully twirl in the ADSL arena, while their counterparts, the dashing DOSEP and DOSEA, lead the waltz in the BDS and OCCDS datasets. Together, these harmonious variables pirouette across the ADaM datasets, capturing the very essence of the weight-adjusted doses in a dance that seamlessly unfolds.
DS-204 : ADaM Discussion Topics: PARQUAL, ADPL, Nadir
Sandra Minjoe, ICON PLC
Tuesday, 2:00 PM – 2:20 PM, Location: Key Ballroom 2
This paper and presentation will cover three topics that have been under varying levels of discussion within the CDISC ADaM team but are not part of the standard. First is the parameter-qualifier variable PARQUAL, which can be found in a couple of Therapeutic Area User Guides (TAUGs) and went out for public review as part of ADaMIG v1.2, but currently breaks BDS rules because it never made it into a final publication. Second is ADPL, a one-record-per-subject-per-participation dataset that might be useful for studies where subjects can enroll more than once or have multiple screening attempts, similar to the proposed SDTM DC domain. Third is Nadir variables, like Change from Nadir and Percent Change from Nadir, which are not currently allowed in a BDS structure. In each case, the paper and presentation will summarize CDISC ADaM team discussions and give personal (not CDISC-authorized) recommendations of when and how to implement these concepts in order to meet analysis needs.
DS-205 : A New Way to Automate Data Validation with Pinnacle 21 Enterprise CLI in LSAF
Crystal Cheng, SAS
Tuesday, 2:30 PM – 2:50 PM, Location: Key Ballroom 2
Pinnacle 21 Enterprise is software that checks data for compliance with CDISC standards, controlled terminology, and dictionaries when users prepare a clinical data submission to regulatory agencies. By validating clinical data early and frequently during the conduct of the clinical trial, it helps users discover and address data issues in advance, ensuring the quality of submission data. There are different ways to execute validations in P21 Enterprise. Users can either manually run the validation via the P21 user interface or, for a more automated process, execute a process flow in SAS Life Science Analytics Framework (LSAF) to invoke the Enterprise Command Line Interface (ECLI) from P21. Integrating LSAF with P21 and setting up the validation process via a process flow saves programmers time and is less prone to errors during packaging and uploading datasets for P21 validation. This paper will focus on the detailed steps to set up the automated process flow of the Pinnacle 21 validation in SAS Life Science Analytics Framework (LSAF) and explore the benefits of automating the validation process.
DS-271 : Programming Considerations in Deriving Progression-Free Survival on Next-Line Therapy (PFS2)
Alec McConnell, BMS
Yun Peng
Tuesday, 8:00 AM – 8:20 AM, Location: Key Ballroom 2
Historically, oncology clinical trials have relied on Overall Survival (OS) and Progression-Free Survival (PFS) as primary efficacy endpoints. While OS is often the most desired estimate, it requires many years of follow-up to derive an unbiased estimate from the study. Additionally, even with follow-up, OS estimates are subject to confounding due to subsequent therapies, which are commonplace in the treatment of cancer. As a proxy for OS, the EMA has recommended the evaluation of Progression-Free Survival 2 (PFS2). According to the EMA, “PFS2 is defined as the time from randomization (or registration, in non-randomized trials) to second objective disease progression, or death from any cause, whichever first.” Despite this definition, PFS2 requires complex data collection and derivation. Within our oncology team at Bristol-Myers Squibb (BMS), different studies approach the derivation differently. In this paper, we will share how our team at BMS collects the relevant data to derive the PFS2 endpoint with a consistent approach in both the advanced and early settings. Furthermore, we will explain how we structure our ADaM datasets to assist in our derivation of the endpoint.
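For orientation only, a generic time-to-event shape for such an endpoint (variable names like PD2DT and LSTASSDT are hypothetical; this is not BMS's collection or derivation approach):

```sas
/* PD2DT (second objective progression), DTHDT, LSTASSDT, RANDDT are assumed. */
data adtte_pfs2;
   set adsl;
   length paramcd $8 param $60;
   paramcd = 'PFS2';
   param   = 'Progression-Free Survival on Next-Line Therapy (Days)';
   evtdt = min(pd2dt, dthdt);                /* earlier of 2nd progression/death */
   if not missing(evtdt) then do;
      cnsr = 0; adt = evtdt;                 /* event                            */
   end;
   else do;
      cnsr = 1; adt = lstassdt;              /* censor at last adequate assessment */
   end;
   aval = adt - randdt + 1;
run;
```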
DS-274 : Guidance Beyond the SDTM Implementation Guide
Kristin Kelly, Pinnacle 21 by Certara
Michael Beers, Pinnacle 21
Monday, 9:00 AM – 9:20 AM, Location: Key Ballroom 2
A common misconception among preparers of SDTM data seems to be that it is sufficient to just follow the SDTM Implementation Guide when creating the datasets. The truth is that it is more complicated than that. A preparer of SDTM datasets needs to be aware of all the industry guidance available when preparing for regulatory submission, not only from CDISC and the regulatory agencies, but from other organizations as well. This presentation will discuss some of the lesser-known guidance documents in the industry and why they should be referenced, as well as some of the impacts of not using these documents in the creation of SDTM datasets.
DS-276 : Your Guide to Successfully Upversioning CDISC Standards
Soumya Rajesh, CSG LLC, an IQVIA Business
Tuesday, 8:30 AM – 8:50 AM, Location: Key Ballroom 2
As of 2023, newer versions of the CDISC standards (i.e., SDTM v2.0, SDTMIG v3.4, SDTM v1.7, SDTM IG v3.3, and Define.xml v2.1) are either required or supported by the industry’s regulatory agencies. This paper relays challenges and best practices the authors have experienced while up-versioning to these standards. Not all these practices are found in published standards. This paper will bring together the resources and lessons learned in one place, so that readers can skillfully navigate through the challenges of adopting these new standards. Highlights include strategies for dealing with value level metadata for variables with multiple codelist references, a new domain class, new domains, and domains referenced in TAUGs not seen in the IGs. We’ll discuss best practices for data modeling: when to use new variables, supplemental qualifiers, and targeting the appropriate domains. We’ll include experiences interpreting and dispositioning validation output from the applicable conformance rules.
DS-280 : I Want to Break Free: CRF Standardization Unleashing Automation
Laura Fazio, Formation Bio
Andrew Burd, Formation Bio
Emily Murphy, Formation Bio
Melanie Hullings, Formation Bio
Tuesday, 9:00 AM – 9:20 AM, Location: Key Ballroom 2
Achieving efficient and impactful Case Report Form (CRF) standardization in the pharmaceutical industry demands intense cross-functional collaboration and a shared understanding of the benefits. This foundation is crucial for improved data quality as well as downstream analysis and reporting automation. Deviations from standards cause manual review, increased errors, and added inefficiencies in downstream code development. To address these challenges, an internal Standards Committee led by Data Management and Systems Analytics teams was formed to gain diverse cross-functional alignment through a comprehensive charter. The charter mandates that study teams adhere to standards during study startup, with deviations requiring justification and approval from the Committee. While CRF standards are typically developed by Medical and Clinical teams, we additionally include roles with a focus on downstream analysis and reporting including our Data Science, Statistical Programming, and Clinical Analytics teams. This paper advocates for an inclusive approach to standards development, emphasizing that resulting datasets should be versatile for all downstream purposes. Such an approach unlocks the power of automation, minimizes reactivity, and fosters efficiency and continuity across clinical studies.
DS-287 : ADaM Design for Prostate Cancer Efficacy Endpoints Based on PCWG3
Lihui Deng, Bristol Myers Squibb
Kylie Fan, BMS
Jia Li, BMS
Tuesday, 10:30 AM – 10:50 AM, Location: Key Ballroom 2
Unlike other types of solid tumors that use the RECIST 1.1 tumor response criteria, due to the particularity of prostate cancer, some common oncology efficacy endpoints, such as rPFS, ORR, time to response, and duration of response are usually based on the PCWG3 criteria. Additionally, other specific prostate cancer endpoints like PSA response rate and time to PSA progression are also based on PCWG3, involving more complex data collection and derivation than RECIST 1.1. In this paper, we will share efficacy endpoints in prostate cancer, such as PSA response and time to PSA progression. We will explain the ADaM design and data flow, and how to ensure traceability and data dependency in derivation. We successfully implemented programming for these complex endpoints, enhancing the speed and quality of effective analysis through the development of macros.
DS-305 : Guideline for Creating Unique Subject Identifier in Pooled studies for SDTM
Vibhavari Honrao, NMIMS University, Mumbai
Monday, 11:30 AM – 11:50 AM, Location: Key Ballroom 2
The Demographics dataset is the parent dataset which includes a set of essential standard variables that describe each subject in a clinical study. One of these key variables is the Unique Subject Identifier (USUBJID). The SDTM IG does not provide any guidance on the creation of USUBJID for pooled studies, so it becomes necessary for statistical programmers to understand the programming steps involved. In clinical trials, there are cases wherein subjects are re-enrolled in different studies for the same compound, and it can be difficult to identify the subject while maintaining CDISC compliance. For ISS analysis, pooling of studies becomes challenging due to multiple SUBJID, RFICDTC, RFSTDTC, RFENDTC, etc. within the same USUBJID from different studies. This paper demonstrates the steps and programming logic involved in developing the Demographics dataset, using hypothetical examples from multiple studies to create pooled datasets.
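One common construction, shown for illustration only and not as a CDISC-mandated rule (librefs are hypothetical, and the input demographics are assumed not to contain USUBJID yet):

```sas
/* Hypothetical librefs STUDY1/STUDY2 with raw demographics (no USUBJID yet). */
data dm_pooled;
   length usubjid $40;
   set study1.dm_raw study2.dm_raw;
   usubjid = catx('-', studyid, subjid);     /* e.g., ABC-101-1001              */
   /* Linking the same person re-enrolled across studies still needs a         */
   /* sponsor-level key (e.g., a screening ID captured in both studies).       */
run;
```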
DS-310 : Converting FHIR to CDASH using SAS
Pritesh Desai, SAS
Mary Liang, SAS
Monday, 8:00 AM – 8:20 AM, Location: Key Ballroom 2
With the growing diversity of standards for collecting and presenting Real World Evidence (RWE), there is an escalating demand for the conversion of these standards into more actionable datasets. This paper demonstrates the transformation from FHIR (Fast Healthcare Interoperability Resource) to CDASH using various methods within SAS Viya. The outlined methods are easily adaptable to other standards or datasets initially presented in JSON format. Moreover, recognizing the need for accessible processes, we will highlight the creation of low/no code procedures to enhance access to these updated datasets, including the transformation of conversion work into SAS Viya Custom Steps.
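One low-code way to land a FHIR JSON bundle in SAS tables is the JSON LIBNAME engine; the file name below is hypothetical, and the generated member names depend on the bundle's structure:

```sas
/* File name is hypothetical; generated member names depend on the bundle. */
filename fhir "/data/raw/patient_bundle.json";
libname fhirlib json fileref=fhir;

proc datasets library=fhirlib nolist;    /* inspect the tables the engine built */
   contents data=_all_;
run;
quit;

data work.entries;
   set fhirlib.alldata;                  /* flattened path/value view of the JSON */
   where p1 = 'entry';
run;
```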
DS-342 : CDISC Therapeutic Area User Guides and ADaM Standards Guidance
Karin LaPann, CDISC independent contractor
Monday, 8:30 AM – 8:50 AM, Location: Key Ballroom 2
One of the frequently overlooked yet immensely valuable resources for implementing standards is the set of CDISC Therapeutic Area User Guides (TAUGs). Presently the CDISC website hosts 49 of these guides, 23 of which incorporate ADaM sections. These guides are created by groups of CDISC standards volunteers across the industry and include medical professionals and researchers with experience in the respective disease areas. In the first few years of development, these TAUGs concentrated on the collection of the data and the implementation of the SDTM to contain it. In 2014 the first TAUG with an analysis section using ADaM was published. Many TAUGs are now developed with additional implementation of the analysis datasets, with ADaM-compliant examples. This provides a utility to the programming community by illustrating how the SDTM datasets are further arranged for analysis. The latest initiative has been to expand these TAUGs through grants from organizations representing various diseases. One of these is the recently released Rare Diseases Therapeutic Area User Guide, partially sponsored by a grant from the National Organization for Rare Disorders (NORD) https://rarediseases.org/. This paper will describe the TAUGs developed with ADaM standards, highlighting their distinctions from prior versions. We will suggest how to use the TAUGs as a reference for conducting studies within various disease areas.
DS-353 : Protocol Amendments and EDC Updates: Downstream impact on Clinical Trial Data
Anbu Damodaran, Alexion Pharmaceuticals
Ram Gudavalli, Alexion Pharmaceuticals
Kumar Bhimavarapu, Alexion Pharmaceuticals
Tuesday, 11:00 AM – 11:20 AM, Location: Key Ballroom 2
This paper investigates the impact of continuous database updates during ongoing studies, particularly emphasizing EDC migrations and Protocol amendments. Through examination of practical examples, it reveals the cascading effects on CDISC datasets, as well as the resulting modifications in reporting. Moreover, the paper scrutinizes the downstream impacts of subject transfers across studies or sites, uncovering intricacies related to re-screening subjects who initially did not meet inclusion/exclusion criteria. By unraveling the complexities of these processes, the paper offers valuable insights to improve data integrity and ensure compliance with regulatory guidelines in clinical research.
DS-360 : A quick guide to SDTM and ADaM mapping of liquid Oncology Endpoints.
Swaroop Kumar Koduri, Ephicacy Lifescience Analytics Pvt Ltd
Shashikant Kumar, Ephicacy Lifescience Analytics
Sathaiah Sanga, Ephicacy Lifescience Analytics
Tuesday, 10:00 AM – 10:20 AM, Location: Key Ballroom 2
Cancer is a disease where some of the body’s cells mutate, grow out of control, and spread to other body parts. The mutated cells possess the ability to infiltrate and destroy healthy body tissue all over the body. Liquid tumors (blood cancers) commonly occur in the bone marrow and the lymphatic system. In oncology clinical trials, response and progression are key to measuring survival and remission rates. In accordance with the response criteria guidelines, oncology studies are divided into one of three subtypes. The first subtype, the Solid Tumor study, usually follows RECIST (Response Evaluation Criteria in Solid Tumors) or irRECIST (immune-related RECIST). The second subtype, the Lymphoma study, usually follows Cheson 1997 or 2007. Lastly, Leukemia studies follow study-specific guidelines (e.g., IWCLL for Chronic Lymphocytic Leukemia). This paper will focus on the blood cancers (Lymphoma and Leukemia) and will show, with examples, which SDTM and ADaM domains are used to collect the different data points in each type. It will also show how standards are used to capture disease response and how CDISC streamlines the development of clinical trial artifacts in liquid oncology studies.
DS-367 : Handling of Humoral and Cellular Immunogenicity Data in SDTM
Wei Duan, Moderna Therapeutics
Tuesday, 1:30 PM – 1:50 PM, Location: Key Ballroom 2