PharmaSUG 2024 Paper Presentations

Paper presentations are the heart of a PharmaSUG conference. Here is the current list, including the latest batch of confirmed paper selections. Papers are organized into 12 academic sections and cover a variety of topics and experience levels.

Note: This information is subject to change. Last updated 14-May-2024.

Sections

Advanced Programming

Paper No. Author(s) Paper Title
AP-102 Derek Morgan Creating Dated Archives Automatically with SAS®
AP-108 Bart Jablonski Macro Variable Arrays Made Easy with macroArray SAS package
AP-135 Lisa Mendez
& Richann Watson
LAST CALL to Get Tipsy with SAS®: Tips for Using CALL Subroutines
AP-138 Timothy Harrington An Introduction to the SAS Transpose Procedure and its Options
AP-144 Charu Shankar SAS® Super Duo: The Program Data Vector and Data Step Debugger
AP-175 Jeffrey Meyers Tips for Completing Macros Prior to Sharing
AP-191 Songgu Xie
& Michael Pannucci
& Weiming Du
& Huibo Xu
& Toshio Kimura
Comprehensive Evaluation of Large Language Models (LLMs) Such as ChatGPT in Biostatistics and Statistical Programming
AP-212 Samiul Haque
& Jim Box
R Shiny and SAS Integration: Execute SAS Procs from Shiny Application
AP-218 Xinran Luo
& Weijie Yang
Potentials and Caveats When Using ChatGPT for Enhanced SAS Macro Writing
AP-229 Vicky Yuan Create a Shift Summary of Laboratory Values in CTCAE Grade to the Worst Grade Abnormal Value using R and SASSY System
AP-252 Frank Canale Externally Yours – Adeptly Managing Data Outside Your EDC System
AP-253 James Austrow Build Your Own PDF Generator: A Practical Demonstration of Free and Open-Source Tools
AP-256 Ian Sturdy Leveraging ChatGPT in Statistical Programming in the Pharmaceutical Industry
AP-268 Xiangchen Cui
& Jessie Wang
& Min Chen
A New Approach to Automating the Creation of the Subject Visits (SV) Domain
AP-289 Jianfeng Wang
& Li Cheng
Programming with SAS PROC DS2: Experience with SDTM/ADaM
AP-295 David Bosak Replicating SAS® Procedures in R with the PROCS Package
AP-298 Huitong Niu
& Yan Wang
Comparison of Techniques in Merging Longitudinal Datasets with Errors on Date Variable: Fuzzy Matching versus Clustering Analysis
AP-349 Richann Watson
& Louise Hadden
Just Stringing Along: FIND Your Way to Great User-Defined Functions
AP-361 Chary Akmyradov Efficient Repetitive Task Handling in SAS Programming Through Macro Loops
AP-420 Adam Yates
& Misti Paudel
& Fengming Hu
Generation of Synthetic Data for Clinical Trials in Base SAS using a 2-Phase Discrete-Time Markov and Poisson Rare Event Framework
AP-424 Magnus Mengelbier Adding the missing audit trail to R

Data Standards

Paper No. Author(s) Paper Title
DS-109 Philip Mason Analyzing your SAS log with user defined rules using an app or macro.
DS-130 Wanchian Chen SDTM Specifications and Datasets Review Tips
DS-150 Laura Elliott
& Ben Bocchicchio
Assurance in the Digital Age: Automating MD5 Verification for uploading data into a Cloud based Clinical Repository
DS-154 Richann Watson
& Elizabeth Dennis
& Karl Miller
Exploit the Window of Opportunity: Exploring the Use of Analysis Windowing Variables
DS-188 Wei Shao
& Xiaohan Zou
Automated Harmonization: Unifying ADaM Generation and Define.xml through ADaM Specifications
DS-193 Inka Leprince
& Richann Watson
Around the Data DOSE-y Doe, How Much Fun Can Your Data Be: Using DOSExx Variables within ADaM Datasets
DS-204 Sandra Minjoe ADaM Discussion Topics: PARQUAL, ADPL, Nadir
DS-205 Crystal Cheng A New Way to Automate Data Validation with Pinnacle 21 Enterprise CLI in LSAF
DS-271 Alec McConnell
& Yun Peng
Programming Considerations in Deriving Progression-Free Survival on Next-Line Therapy (PFS2)
DS-274 Kristin Kelly
& Michael Beers
Guidance Beyond the SDTM Implementation Guide
DS-276 Soumya Rajesh Your Guide to Successfully Upversioning CDISC Standards
DS-280 Laura Fazio
& Andrew Burd
& Emily Murphy
& Melanie Hullings
I Want to Break Free: CRF Standardization Unleashing Automation
DS-287 Lihui Deng
& Kylie Fan
& Jia Li
ADaM Design for Prostate Cancer Efficacy Endpoints Based on PCWG3
DS-305 Vibhavari Honrao Guideline for Creating Unique Subject Identifier in Pooled studies for SDTM
DS-310 Pritesh Desai
& Mary Liang
Converting FHIR to CDASH using SAS
DS-342 Karin LaPann CDISC Therapeutic Area User Guides and ADaM Standards Guidance
DS-353 Anbu Damodaran
& Ram Gudavalli
& Kumar Bhimavarapu
Protocol Amendments and EDC Updates: Downstream impact on Clinical Trial Data
DS-360 Swaroop Kumar Koduri
& Shashikant Kumar
& Sathaiah Sanga
A quick guide to SDTM and ADaM mapping of liquid Oncology Endpoints.
DS-367 Wei Duan Handling of Humoral and Cellular Immunogenicity Data in SDTM
DS-374 Reshma Radhakrishnan Implementation of composite estimands for responder analysis based on change from baseline in non-solid tumours
DS-388 Rubha Raghu
& Sugumaran Muthuraj
& Vijayakumar Radhakrishnan
& Nithiyanandhan Ananthakrishnan
Advancing the Maturation of Standardized CRF Design
DS-398 Varsha Mithun Patil
& Mrityunjay Kumar
Streamlining Patient-reported outcome (PRO) data standardization & analysis
DS-400 Steve Ross
& Ilan Carmeli
AI and the Clinical Trial Validation Process – Paving a Rocky Road
DS-406 Santosh Ranjan Game changer! The new CDISC ADaM domain ADNCA for PK/PD data analysis

Data Visualization and Reporting

Paper No. Author(s) Paper Title
DV-127 Louise Hadden The Missing(ness) Piece: Building Comprehensive, Data Driven Missingness Reports and Codebooks Dynamically
DV-155 Jeffrey Meyers Combining Functions and the POLYGON Plot to Create Unavailable Graphs Including Sankey and Sunburst Charts
DV-170 Kirk Paul Lafler Creating Custom Excel Spreadsheets with Built-in Autofilters Using SAS® Output Delivery System (ODS)
DV-186 Ilya Krivelevich
& Cixin He
& Binbin Zhang-Wiener
& Wenyin Lin
Enhanced Spider Plot in Oncology
DV-216 Margaret Wishart
& Tamara Martin
Utilizing Data Visualization for Continuous Safety and Efficacy Monitoring within Early Development
DV-222 Mrityunjay Kumar
& Shashikant Kumar
Kaplan-Meier Graph: a comparative study using SAS vs R
DV-246 Indraneel Narisetty AutoVis Oncology Presenter: Automated Python-Driven Statistical Analysis and Visualizations for Powerful Presentations
DV-278 Kuldeep Sen Standardization of the Patient Narrative Using a Metadata-driven Approach
DV-283 Tongda Che
& Danfeng Fu
Exploring the Application of FDA Medical Query (FMQ) in Visualizing Adverse Event Data
DV-293 Dave Hall Splashy Graphics Suitable for Publication? ODS LAYOUT Can Do It!
DV-313 Kostiantyn Drach
& Iryna Kotenko
Visual discovery in Risk-Based Monitoring using topological models
DV-323 Chevell Parker Tales From A Tech Support Guy: The Top Ten Most Impactful Reporting and Data Analytic Features for the SAS Programmer
DV-327 Junze Zhang
& Chuanhai Tang
& Xiaohui Wang
An R Markdown Structure for Automatically Generating Presentation Slides
DV-328 Chevell Parker Next level Reporting: ODS and Open Source
DV-331 Kirk Paul Lafler Ten Rules for Better Charts, Figures and Visuals
DV-348 Murali Kanakenahalli
& Annette Bove
& Smita Sehgal
Periodic safety reports of clinical trials
DV-380 Tracy Sherman
& Aakar Shah
Amazing Graph Series: Swimmer Plot – Visualizing the Patient Journey: Adverse Event Severity, Medications, and Primary Endpoint
DV-382 Helena Belloff
& William Lee
& Melanie Hullings
A ‘Shiny’ New Perspective: Unveiling Next-Generation Patient Profiles for Medical and Safety Monitoring
DV-389 Vijayakumar Radhakrishnan
& Nithiya Ananthakrishnan
Automation and integration of data visualization using R ESQUISSE & R SHINY
DV-395 Pradeep Acharya
& Anurag Srivastav
Pictorial Representation of Adverse Events (AE) Summary – A new perspective to look at the AE data in Clinical Trials
DV-396 Yun Ma
& Yifan Han
Piloting data visualization and reporting with Rshiny apps
DV-433 Steve Wade
& Sudhir Kedare
& Matt Travell
& Chen Yang
& Jagan Mohan Achi
Interactive Data Analysis and Exploration with composR: See the Forest AND the Trees
DV-438 Kevin Viel Exploring DATALINEPATTERNS, DATACONTRASTCOLORS, DATASYMBOLS, the SAS System® REGISTRY procedure, and Data Attribute Maps (ATTRMAP) to assign invariant attributes to subjects and arms throughout a project
DV-455 Vandita Tripathi
& Manas Saha
Reimagining reporting and Visualization during clinical data management
DV-456 Joshua Cook An introduction to Quarto: A Versatile Open-source Tool for Data Reporting and Visualization
DV-458 Joshua Cook
& Kirk Paul Lafler
Quarto 1.4: Revolutionizing Open-source Dashboarding Capabilities

Hands-on Training

Paper No. Author(s) Paper Title
HT-101 Mathura Ramanathan
& Nancy Brucken
Deep Dive into the BIMO (Bioresearch Monitoring) Package Submission
HT-111 Bart Jablonski A Gentle Introduction to SAS Packages
HT-118 Philip Holland The Art of Defensive SAS Programming
HT-143 Charu Shankar The New Shape Of SAS Code
HT-152 Phil Bowsher GenAI to Enhance Your Statistical Programming
HT-157 Jayanth Iyengar Understanding Administrative Healthcare Datasets using SAS® programming tools.
HT-197 Dan Heath Building Complex Graphics from Simple Plot Types
HT-201 Ashley Tarasiewicz
& Chelsea Dickens
Transitioning from SAS to R
HT-413 Richann Watson
& Josh Horstman
Complex Custom Clinical Graphs Step by Step with SAS® ODS Statistical Graphics
HT-459 Troy Hughes Hands-on Python PDFs: Using the pypdf Library To Programmatically Design, Complete, Read, and Extract Data from PDF Forms Having Digital Signatures

Leadership Skills

Paper No. Author(s) Paper Title
LS-134 Patrick Grimes Recruiting Neurodivergent Candidates using the Specialisterne Approach
LS-167 Kirk Paul Lafler Soft Skills to Gain a Competitive Edge in the 21st Century Job Market
LS-176 Jeff Xia
& Simiao Ye
Effectively Manage the Programming Team Using MS Teams
LS-286 Priscilla Gathoni Unlock Your Greatness: Embrace the Power of Coaching
LS-304 Diana Avetisian Translation from statistical to programming: effective communication between programmers and statisticians
LS-317 LaNae Schaal What Being a Peer-to-Peer Mentor Offers – Perspective from an Individual Project Level Contributor
LS-335 Monali Khanna Creating a Culture of Engagement – Role of a Manager
LS-345 Christiana Hawn
& Lily Ray
Leadership Lessons from Another Life: How my Previous Career Helped Me as a Statistician
LS-351 Anbu Damodaran
& Neha Srivastava
A Framework for Risk-Based Oversight for Fully Outsourced Clinical Studies
LS-357 Purvi Kalra
& Varsha Patil
Harmony in Motion: Nurturing Work-Life Balance for Sustainable Well-being
LS-371 Dilip Raghunathan Go Get ’Em: Manager’s Guide to Make a Winning Business Proposal for Technology Solutions
LS-383 Mathura Ramanathan Ongoing Trends and Strategies to Fine-tune the CRO/Sponsor Partnership – Perspectives from Statistical Programming
LS-410 Josh Horstman
& Richann Watson
Adventures in Independent Consulting: Perspectives from Two Veteran Consultants Living the Dream
LS-443 Melanie Hullings
& Andrew Burd
& Helena Belloff
& Emily Murphy
Data Harmony Revolution: Rocking Trials with Clinical Data Literacy

Metadata Management

Paper No. Author(s) Paper Title
MM-225 Kang Xie Variable Subset Codelist
MM-226 Jeetender Chauhan
& Madhusudhan Ginnaram
& Sarad Nepal
& Jaime Yan
Methodology for Automating TOC Extraction from Word Documents to Excel
MM-240 Avani Kaja Managing a Single Set of SDTM and ADaM Specifications across All Your Phase 1 Trials
MM-245 Trevor Mankus Relax with Pinnacle 21’s RESTful API
MM-267 Xiangchen Cui
& Min Chen
& Jessie Wang
A Practical Approach to Automating SDTM Using a Metadata-Driven Method That Leverages CRF Specifications and SDTM Standards
MM-358 Lakshmi Mantha
& Purvi Kalra
& Arunateja Gottapu
Optimizing Clinical Data Processes: Harnessing the Power of Metadata Repository (MDR) for Innovative Study Design (ISD) and Integrated Summary of Safety (ISS) / Efficacy (ISE)
MM-447 Vandita Tripathi
& Manas Saha
Automating third party data transfer through digitized Electronic DTA Management

Real World Evidence and Big Data

Paper No. Author(s) Paper Title
RW-125 Ajay Gupta
& Natalie Dennis
Reconstruction of Individual Patient Data (IPD) from Published Kaplan-Meier Curves Using Guyot’s Algorithm: Step-by-Step Programming in R
RW-227 Yu Feng A SAS® Macro Approach: Defining Line of Therapy Using Real-World Data in Oncology
RW-275 Catherine Briggs
& Sherrine Eid
& Samiul Haque
& Robert Collins
Win a PS5! How to Run and Compare Propensity Score Matching Performance Across Multiple Algorithms in Five Minutes or Less
RW-390 Ryan Lafler
& Anna Wade
Unraveling the Layers within Neural Networks: Designing Artificial and Convolutional Neural Networks for Classification and Regression Using Python’s Keras & TensorFlow
RW-421 Sherrine Eid
& Robert Collins
& Samiul Haque
Applications of Machine Learning and Artificial Intelligence in Real World Data in Personalized Medicine for Non-Small Cell Lung Cancer Patients
RW-450 Lorraine Johnson
& Lara Kassab
& Jingyi Liu
& Deanna Needell
& Mira Shapiro
Towards understanding Neurological manifestations of Lyme disease through a machine learning approach with patient registry data
RW-453 Joshua Cook
& Achraf Cohen
Interfacing with Large-scale Clinical Trials Data: The Database for Aggregate Analysis of ClinicalTrials.gov

Solution Development

Paper No. Author(s) Paper Title
SD-141 Kevin Lee “Prompt it”, not “Google it”: Prompt Engineering for Statistical Programmers and Biostatisticians
SD-165 Kirk Paul Lafler
& Ryan Lafler
& Joshua Cook
& Stephen Sloan
Benefits, Challenges, and Opportunities with Open-Source Software Integration
SD-166 Kirk Paul Lafler The 5 CATs in the Hat – Sleek Concatenation String Functions
SD-179 Jim Box
& Samiul Haque
Developing Web Apps in SAS Visual Analytics
SD-198 Chengxin Li AutoSDTM Design and Implementation With SAS Macros
SD-200 Illia Skliar Bridging AI and Clinical Research: A New Era of Data Management with ChatGPT
SD-211 Matt Maloney Utility Macros for Data Exploration of Clinical Libraries
SD-217 William Wei
& Shunbing Zhao
Semi-Automated and Modularized Approach to Generate Tables for Clinical Study – Categorical Data Report
SD-239 Hong Qi
& Mary Varughese
Automation of Report Generation Beyond Macro
SD-243 Lakshmi Mantha
& Inbasakaran Ramesh
Unravelling the SDTM Automation Process through the Utilization of SDTM Transformation Template
SD-255 Danfeng Fu
& Dickson Wanjau
& Ben Gao
Define-XML Conversion: A General Approach on Content Extraction Using Python
SD-262 Bart Jablonski Integration of SAS GRID environment and SF-36 Health Survey scoring API with SAS Packages
SD-266 Amy Zhang
& Huei-Ling Chen
A Tool for Automated Comparison of Core Variables Across ADaM Specifications Files
SD-318 Yunsheng Wang
& Erik Hansen
& Chao Wang
& Tina Wu
Streamlined EDC data to SDTM Mapping with Medidata RAVE ALS
SD-343 Benjamin Straub Two hats, one noggin: Perspectives on working as a developer and as a user of the admiral R package for creating ADaMs.
SD-356 Kevin Viel Standardizing Validation Data Sets (VALDS) as matrices indexed by Page, Section, Row, and Columns (PSRC) to improve Validation and output creation and revisions.
SD-365 Zhihao Luo Readlog Utility: Python based Log Tool and the First Step of a Comprehensive QC System
SD-370 Yongjiang (Jerry) Xu
& Karen Xu
& Suzanne Viselli
Enhancing FDA Debarment List Compliance through Automated Data Analysis Using Python and SAS
SD-397 Saurabh Das
& Rohit Kadam
& Rajasekhar Gadde
& Niketan Panchal
& Saroj Sah
Advancing Regulatory Intelligence with conversational and generative AI
SD-401 Xinran Hu
& Jeff Xia
Excel Email Automation Tool: Streamlining Email Creation and Scheduling
SD-412 Sherrine Eid
& Sundaresh Sankaran
Safety Signals from Patient Narratives PLUS: Augmenting Artificial Intelligence to Enhance Generative AI Value
SD-426 Rajprakash Chennamaneni
& Sudhir Kedare
& Jagan Mohan Achi
Shift gears with ‘gt’: Finely tuned clinical reporting in R using the “gt” and “gtsummary” packages
SD-429 Bill Zhang
& Jun Yang
Build up Your Own ChatGPT Environment with Azure OpenAI Platform
SD-431 Steve Wade
& Sudhir Kedare
& Matt Travell
& Chen Yang
& Jagan Mohan Achi
inspectoR: QC in R? No Problem!
SD-444 Troy Hughes Five Reasons To Swipe Right on PROC FCMP, the SAS Function Compiler for Building Modular, Maintainable, Readable, Reusable, Flexible, Configurable User-Defined Functions and Subroutines

Statistics and Analytics

Paper No. Author(s) Paper Title
ST-113 Girish Kankipati
& Jai Deep Mittapalli
Multiple Logistic Regression Analysis using Backward Selection Process on Objective Response Data with SAS®
ST-164 Kirk Paul Lafler Data Literacy 101: Understanding Data and the Extraction of Insights
ST-192 Igor Goldfarb
& Sharma Vikas
Generative Artificial Intelligence in sample size estimation – challenges, pitfalls, and conclusions
ST-199 Yuting Peng
& Ruohan Wang
Demystifying Incidence Rates: A Step-by-Step Guide to Adverse Event Analysis for Novice Programmers
ST-208 Vadym Kalinichenko Bayesian Methods in Survival Analysis: Enhancing Insights in Clinical Research
ST-251 Isabella Wang
& Jin Xie
& Lauren George
Dealing with Missing Data: Practical Implementation in SAS and R
ST-297 Christiana Hawn
& Dhruv Bansal
Relative Dose Intensity in Oncology Trials: A Discussion of Two Approaches
ST-303 Ibrahim Priyana Hardjawidjaksana
& Els Janssens
& Ellen Winckelmans
Source Data Quality Issues in PopPK/PD Dataset Programming: a Systematic Approach to Handle Duplicates
ST-334 Ethan Brockmann
& Dong Xi
Versatile and efficient graphical multiple comparison procedures with {graphicalMCP}
ST-338 Michael Lamm Bayesian Additive Regression Trees for Counterfactual Prediction and Estimation
ST-339 Fang Chen
& Yi Gong
Bayesian Hierarchical Models with the Power Prior Using PROC BGLIMM
ST-366 Chuck Kincaid MLNR or Machine Learning in R
ST-381 Peng Zhang
& Lizhong Liu
& Tai Xie
Opportunities and Challenges for R as an open-sourced solution for statistical analysis and reporting, from vendor’s perspective
ST-414 Richard Moreton
& Lata Maganti
Estimating Time to Steady State Analysis in SAS
ST-425 Sudhir Kedare
& Steve Wade
& Chen Yang
& Matthew Travell
& Jagan Mohan Achi
iCSR: A Wormhole to Interactive Data Exploration Universe
ST-461 Ishwar Chouhan Oncology ADaM Datasets Creation Using R Programming: A Comprehensive Approach

Strategic Implementation & Innovation

Paper No. Author(s) Paper Title
SI-136 Ke Xiao Agile, Collaborative, Efficient (ACE): A New Perspective on Data Monitoring Committee Data Review Preparation
SI-140 Kevin Lee A fear of missing out and a fear of messing up: A Strategic Roadmap for ChatGPT Integration at Company Level
SI-160 Jason Zhang
& Jaime Yan
LLM-Enhanced Training Agent for Statistical Programming
SI-185 Binal Mehta
& Patel Mukesh
The Role of the Blinded Programmer in Preparation of Data Monitoring Committee Packages (for Clinical Trials)
SI-189 Vidya Gopal Automating the annotation of TLF mocks Using Generative AI
SI-190 Ruohan Wang
& Chris Qin
Navigating Success: Exploring AI-Assisted Approaches in Predicting and Evaluating Outcome of Clinical Trials and Submissions
SI-230 Todd Case
& Margaret Huang
Quality Assurance within Statistical Programming: A Systemic Way to Improve Quality Control
SI-269 Juliane Manitz
& Anuja Das
& Antal Martinecz
& Jaxon Abercrombie
& Doug Kelkhoff
Validating R for Pharma – Streamlining the Validation of Open-Source R Packages within Highly Regulated Pharmaceutical Work
SI-291 Nancy Brucken
& Mary Nilsson
& Greg Ball
PHUSE Safety Analytics Working Group – Overview and Deliverables Update
SI-319 Lydia King A Change is Gonna Come: Maintaining Company Culture, Managing Time Zones, and Integrating Teams after a Global Acquisition
SI-346 Chaitanya Pradeep Repaka
& Santhosh Karra
aCRF Copilot: Pioneering AI/ML Assisted CRF Annotation for Enhanced Clinical Data Management Efficiency
SI-362 Karma Tarap
& Nicole Thorne
& Tamara Martin
& Derek Morgan
& Pooja Ghangare
SASBuddy: Enhancing SAS Programming with Large Language Model Integration
SI-391 Chaitanya Pradeep Repaka
& Santhosh Karra
Facilitating Seamless SAS-to-R Transition in Clinical Data Analysis: A Finetuned LLM Approach
SI-408 Manuela Koska
& Veronika Csom
Agile Sponsor Oversight of Statistical Programming Activities
SI-446 Shilpa Sood
& Sridhar Vijendra
One size does not fit all: The need and art of customizing SCE and MDR for end users
SI-452 Amit Javkhedkar
& Sridhar Vijendra
Embracing Diversity in Statistical Computing Environments: A Multi-Language Approach

Submission Standards

Paper No. Author(s) Paper Title
SS-132 Jai Deep Mittapalli
& Girish Kankipati
BIMO Brilliance: Your Path to Compliance Resilience
SS-133 Jai Deep Mittapalli
& Jinit Mistry
& Venkatesulu Salla
Cultivating Success with Non-standard Investigator-sponsored Trial Data for FDA Submissions
SS-137 David Izard Study Start Date – Let’s Get it Right!
SS-213 Sandra Minjoe Is a Participation-Level ADaM Dataset a Solution for Submitting Integration Data to FDA?
SS-263 Vicky Yuan Creating Adverse Event Tables using R and SASSY System
SS-290 Robin Wu
& Lili Li
& Steven Huang
Combine PDFs in Submission-ready Format Quick and Easy
SS-306 Swaroop Neelapu Leveraging SAS and Adobe Plug-in for CRF Bookmark Generation (Rave studies)
SS-311 Yilan Xu
& Hu Qu
& Tina Wu
How to generate a submission ready ADaM for complex data
SS-333 Hanne Ellehoj
& Veeresh Namburi
Lead-in and extension trials: how we documented datapoint traceability
SS-344 Benjamin Straub Piloting into the Future: Publicly available R-based Submissions to the FDA
SS-363 Hiba Najeeb
& Raghavender Ranga
A Programmer’s Insight into an Alternative to TQT Study Data Submission
SS-368 Rashmi Gundaralahalli Ramesh
& Jeffrey Lavenberg
Design Considerations for ADaM Protocol Deviations Dataset in Vaccine Studies
SS-376 André Veríssimo
& Ismael Rodriguez
Experimenting with Containers and webR for Submissions to FDA in the Pilot 4
SS-377 Wei Duan Challenges and Considerations When Building e-Submission SDTM Data Packages
SS-422 Flora Mulkey Submitting Patient-Reported Outcome Data in Cancer Clinical Trials – Guidance for Industry Technical Specifications Document

ePosters

Paper No. Author(s) Paper Title
PO-106 Xianhua Zeng No LEAD Function? Let’s Create It!
PO-123 Kevin Sun Enhancing Define-XML Generation: Based on SAS Programming and Pinnacle 21 Community
PO-128 Louise Hadden A Deep Dive into Enhancing SAS/GRAPH® and SG Procedural Output with Templates, Styles, Attributes, and Annotation
PO-129 Varsha Ganagalla
& Natalie Johnson
The Survival Mode
PO-145 Jason Su Integrity, Please: Three Techniques for One-Step Solution in Pharmaceutical Programming
PO-158 Jayanth Iyengar If it’s not broke, don’t fix it; existing code and the programmers’ dilemma
PO-194 Elizabeth Li
& Carl Chesbrough
& Inka Leprince
Updates on Preparing a BIMO Data Package
PO-195 Yi Guo A Simple Way to Make Adaptive Pages in Listings and Tables
PO-196 Yi Guo Comparing SAS® and R Approaches in Creating Multicell Dot Plots in Statistical Programming
PO-231 Michael Stout Best Function Ever: PROC FCMP
PO-258 Madhavi Gundu
& Vivek Jayesh Mandaliya
An approach to make a Data Validation and Reporting tool using R Shiny for Clinical Data Validation
PO-292 Julie Ann Hood
& Jennifer Manzi
Elevate Your Game: Leveling Up SDTM Validation with the Magic of Data Managers
PO-299 Chen Li
& Hong Wang
& Ke Xiao
Upholding Blinding in Clinical Trials: Strategies and Considerations for Minimizing Bias
PO-324 David Franklin Plotting Data by US ZIP Code
PO-440 Oliver Lu
& Katie Watson
The SAS Genome – Genetic Sequencing
PO-451 Vandita Tripathi
& Manas Saha
Simplifying Edit Check Configuration

Abstracts

Advanced Programming

AP-102 : Creating Dated Archives Automatically with SAS®
Derek Morgan, Bristol Myers Squibb
Monday, 8:00 AM – 8:20 AM, Location: Key Ballroom 4

When creating patient profiles, it can be useful for clinical scientists to compare current data with previous data in real time without having to request those data from an Information Technology (IT) source. This is a method for using SAS® to perform the archiving via a scheduled daily job. The primary advantage of SAS over an operating system script is its date handling ability, removing many difficult calculations in favor of intervals and functions. This paper details an application that creates dated archive folders and copies SAS data sets into those dated archives, with automated aging and deletion of old data and folders. The application allows clinical scientists to customize their archive frequency (within certain limits). It also keeps storage requirements to a minimum as defined by IT. This replaced a manual process that required study programmers to create the archives, eliminating the possibility of missed or incorrectly dated archives. The flexibility required for this project and the conditions under which it ran required using SAS date and time intervals and their functions. SAS was used to manipulate the files and directories.
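
As a taste of the date-interval approach the abstract describes, here is a minimal sketch (not the authors' application; the paths, library names, and 30-day retention window are hypothetical):

%let archroot = /projects/study01/archive;      /* hypothetical archive root */

data _null_;
   today  = date();
   dirnm  = put(today, yymmddd10.);             /* dated folder name, e.g. 2024-05-14 */
   newdir = dcreate(dirnm, "&archroot/");       /* create the dated archive folder */
   cutoff = intnx('day', today, -30);           /* folders dated before this could be aged out
                                                   (deletion logic omitted for brevity) */
   call symputx('archdir', newdir);
run;

libname src  "/projects/study01/data";          /* hypothetical source library */
libname arch "&archdir";

proc copy in=src out=arch;                      /* snapshot the current data sets */
run;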

AP-108 : Macro Variable Arrays Made Easy with macroArray SAS package
Bart Jablonski, yabwon
Monday, 10:30 AM – 11:20 AM, Location: Key Ballroom 4

A macro variable array is a jargon term for a list of macro variables with a common prefix and numerical suffixes. Macro arrays are valued by advanced SAS programmers and often used as “driving” lists, supplying sequential metadata to complex or iterative programs. Use of macro arrays requires advanced macro programming techniques based on indirect reference (aka, using multiple ampersands &&), which may intimidate less experienced programmers. The aim of the paper is to introduce the macroArray SAS package. The package provides a solution that makes creating and working with macro arrays much easier. It also provides a “DATA-step-arrays-like” interface that allows use of macro arrays without complications that arise from indirect referencing. Also, the concept of a macro dictionary is presented, and all concepts are demonstrated through use cases and examples.
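
For context, the indirect-referencing pattern the package is designed to hide looks roughly like this (a minimal sketch using SASHELP data, not the package's own syntax):

data _null_;
   set sashelp.class end=last;
   call symputx(cats('name', _n_), name);      /* creates name1, name2, ... */
   if last then call symputx('name_n', _n_);   /* array length */
run;

%macro loop_names;
   %do i = 1 %to &name_n;
      %* double ampersands resolve to name1, name2, ... in turn;
      %put Processing &&name&i;
   %end;
%mend loop_names;
%loop_names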

AP-135 : LAST CALL to Get Tipsy with SAS®: Tips for Using CALL Subroutines
Lisa Mendez, Catalyst Clinical Research
Richann Watson, DataRich Consulting
Tuesday, 10:00 AM – 10:20 AM, Location: Key Ballroom 4

This paper provides an overview of six SAS CALL subroutines that are frequently used by SAS® programmers but are less well-known than SAS functions. The six CALL subroutines are CALL MISSING, CALL SYMPUTX, CALL SCAN, CALL SORTC/SORTN, CALL PRXCHANGE, and CALL EXECUTE. Instead of using multiple IF-THEN statements, the CALL MISSING subroutine can be used to quickly set multiple variables of various data types to missing. CALL SYMPUTX creates a macro variable that is either local or global in scope. CALL SCAN looks for the nth word in a string. CALL SORTC/SORTN is used to sort a list of values within a variable. CALL PRXCHANGE can redact text, and CALL EXECUTE lets SAS write your code based on the data. This paper will explain how those six CALL subroutines work in practice and how they can be used to improve your SAS programming skills.
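
To give a flavor of several of the six subroutines, a minimal illustrative data step (toy values, not from the paper):

data example;
   length c $10 w $10;
   a = 1; b = 2; c = 'text';
   call missing(a, b, c);                      /* set mixed-type variables to missing */
   call symputx('n_obs', 42, 'G');             /* create a global macro variable */
   call scan('alpha beta gamma', 2, pos, len); /* position and length of the 2nd word */
   w = substrn('alpha beta gamma', pos, len);  /* -> beta */
   x1 = 3; x2 = 1; x3 = 2;
   call sortn(x1, x2, x3);                     /* values sorted across variables: 1, 2, 3 */
run;

CALL PRXCHANGE and CALL EXECUTE follow the same calling pattern but operate on regular expressions and generated code, respectively.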

AP-138 : An Introduction to the SAS Transpose Procedure and its Options
Timothy Harrington, Navitas Data Sciences
Monday, 1:30 PM – 1:50 PM, Location: Key Ballroom 4

PROC TRANSPOSE is a SAS® procedure for arranging the contents of a dataset column from a vertical to a horizontal layout based on selected BY variables. This procedure is particularly useful for efficiently manipulating clinical trials data with a large number of observations and groupings, as is often found in laboratory analysis or vital signs data. The use of PROC TRANSPOSE is illustrated with examples showing different modes of arranging the data. Possible problems that can occur when using this procedure, and their solutions, are also discussed.
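
A typical vertical-to-horizontal call looks like this (the vitals dataset and its variable names are hypothetical):

proc sort data=vitals out=vitals_s;
   by usubjid;
run;

proc transpose data=vitals_s out=vitals_wide prefix=VISIT_;
   by usubjid;      /* one output row per subject */
   id visitnum;     /* columns named VISIT_1, VISIT_2, ... */
   var sysbp;       /* the measurement being transposed */
run;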

AP-144 : SAS® Super Duo: The Program Data Vector and Data Step Debugger
Charu Shankar, SAS Institute
Tuesday, 11:00 AM – 11:50 AM, Location: Key Ballroom 4

Whether you are a self-taught SAS learner with a lot of experience, or a novice just entering the SAS universe, you may not have spent a lot of time delving into two fantastic SAS® superpowers. The Program Data Vector (PDV) is where SAS processes one observation at a time, in memory. The Data Step Debugger is an excellent tool to actually see the observation being held in memory and watch the movement of data from input to memory to output. Combining these two tools gives SAS practitioners plenty of utility to “get under the hood” of how SAS code works in practice to ingest and analyze data during program operations. Once you know the specifics of what happens during compile time and execution, joins, and creating arrays, efficient SAS code will be at your fingertips. Action-packed with animations, live demos, and a great hands-on section, this presentation will likely be a resource that you will use and reuse now and in the future.
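
Invoking the debugger is a one-option change to the DATA statement (interactive sessions only); a minimal sketch:

data work.demo / debug;
   set sashelp.class;
   bmi = (weight / (height * height)) * 703;
run;

/* At the DEBUGGER> prompt, STEP executes one statement at a time and
   EXAMINE _ALL_ displays the current contents of the PDV. */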

AP-175 : Tips for Completing Macros Prior to Sharing
Jeffrey Meyers, Regeneron Pharmaceuticals
Monday, 4:00 PM – 4:20 PM, Location: Key Ballroom 4

SAS macros are a programmer’s best friend when written well, and their worst nightmare when not. Macros are a powerful tool within SAS for automating complicated analyses or completing repetitive tasks. The next step after building a capable tool is to share it with others. The creator of a macro does not have much time to catch the attention of the user: encountering multiple errors, missing documentation or guides, and a lack of intuitive features quickly pushes the user away from the macro. This paper will focus on completing a macro to give the user the best possible experience prior to sharing.
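
One habit in that spirit is validating parameters up front so users see a clear message instead of a wall of errors; a minimal sketch (the macro and its checks are hypothetical examples):

%macro summarize(data=, var=);
   %if %length(&data) = 0 or %length(&var) = 0 %then %do;
      %put ERROR: [summarize] Both DATA= and VAR= are required.;
      %return;
   %end;
   %if not %sysfunc(exist(&data)) %then %do;
      %put ERROR: [summarize] Data set &data does not exist.;
      %return;
   %end;
   proc means data=&data n mean std;
      var &var;
   run;
%mend summarize;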

AP-191 : Comprehensive Evaluation of Large Language Models (LLMs) Such as ChatGPT in Biostatistics and Statistical Programming
Songgu Xie, Regeneron Pharmaceuticals
Michael Pannucci, Arcsine Analytics
Weiming Du, Alnylam Pharmaceuticals
Huibo Xu, Greenwich High School
Toshio Kimura, Arcsine Analytics
Monday, 9:00 AM – 9:20 AM, Location: Key Ballroom 4

Generative artificial intelligence using large language models (LLMs) such as ChatGPT is an emerging trend. However, discussion of LLMs in biostatistics and statistical programming has been somewhat limited. This paper provides a comprehensive evaluation of major LLMs (ChatGPT, Bing AI, Google Bard, Anthropic Claude 2) in their utility within biostatistics and statistical programming (SAS and R). We tested major LLMs across several challenges: 1) Conceptual Knowledge, 2) Code Generation, 3) Error Catching/Correcting, 4) Code Explanation, and 5) Programming Language Translation. Within each challenge, we asked easy, medium, and advanced difficulty level questions related to three topics: Data, Statistical Analysis, and Display Generation. After providing the same prompts to each LLM, responses were captured and evaluated. For some prompts, LLMs provided incorrect responses, also known as “hallucinations.” Although LLMs replacing biostatisticians and statistical programmers may be overhyped, there are nevertheless use cases where LLMs are helpful in assisting statistical programmers.

AP-212 : R Shiny and SAS Integration: Execute SAS Procs from Shiny Application
Samiul Haque, SAS Institute
Jim Box, SAS Institute
Monday, 4:30 PM – 4:50 PM, Location: Key Ballroom 4

The integration of different programming languages and tools is pivotal for translational data science. R Shiny is the most popular tool for building web applications in R. However, biostatisticians and data scientists often prefer to leverage SAS procs or macros for clinical decision making. The worlds of R Shiny and SAS do not need to be decoupled: R Shiny applications can incorporate SAS procs and analytics. In this work, we present mechanisms for integrating R Shiny and SAS. We demonstrate how SAS procs and macros can be executed from the R Shiny front end, and how SAS logs and results can be displayed within the Shiny app.

AP-218 : Potentials and Caveats When Using ChatGPT for Enhanced SAS Macro Writing
Xinran Luo, Everest Clinical Research
Weijie Yang, Everest Clinical Research
Tuesday, 8:30 AM – 8:50 AM, Location: Key Ballroom 4

AI language models like ChatGPT have impressed and even intimidated programmers. There are discussions of ChatGPT with examples of simple SAS steps, and there are descriptions of various usages of ChatGPT without examples, but few papers discuss the use of ChatGPT in SAS macro development with examples. This paper explores the utility of ChatGPT in enhancing the process of writing SAS macros from scratch, using the example of checking SAS logs in batch on Windows, and compares the process with using conventional search engines. The focus is not only on utilizing ChatGPT’s capabilities to provide programmers with initial ideas of program structure when they encounter unusual work requests, but also on demonstrating its application in developing a robust macro by showing key steps of the conversations between programmers and ChatGPT. Although ChatGPT proves invaluable in offering insights and suggestions, it is imperative to acknowledge certain caveats. Not all responses provided by ChatGPT are infallible, especially in the context of technical domains like SAS programming. Emphasizing the importance of independent verification, this paper underscores the need for users, especially new learners of SAS, to scrutinize and validate the suggestions before implementation. This paper aims to empower SAS practitioners by showcasing how ChatGPT can complement their macro-writing endeavors. By highlighting both the potentials and limitations of leveraging AI language models like ChatGPT, this paper contributes to fostering a balanced and discerning approach towards utilizing AI-driven assistance in SAS programming and macro development.

AP-229 : Create a Shift Summary of Laboratory Values in CTCAE Grade to the Worst Grade Abnormal Value using R and SASSY System
Vicky Yuan, Incyte Corporation
Tuesday, 8:00 AM – 8:20 AM, Location: Key Ballroom 4

A shift summary of laboratory values in CTCAE grade to the worst grade abnormal value is often required for laboratory data analysis and submission. The purpose of a CTCAE grade shift table is to present how results vary from baseline to post-baseline visits in the study. This paper will illustrate how to report a shift table using R and packages from the SASSY system. It will start from an example and explain its anatomy, then give a step-wise explanation of how to report the table in a .doc file. The example is interesting because it contains “internal” footnotes that can change on every page. The R product used in this paper is the SASSY package version 1.2.0 running in the RStudio environment.

AP-252 : Externally Yours – Adeptly Managing Data Outside Your EDC System
Frank Canale, SoftwaRx, LLC
Monday, 3:00 PM – 3:20 PM, Location: Key Ballroom 4

Programmers in the pharmaceutical industry are used to working with data that is entered into, and extracted from, a system commonly known as an EDC (Electronic Data Capture) system. When using data that is sourced from one of these systems, you can reliably count on the type of data you’ll receive (normally SAS datasets) and, if the EDC is set up well, a standard structure that provides output data containing CDISC/CDASH variable names. But what does one do when receiving data that is sourced outside the EDC system and received from other vendors? How do you manage this data: retrieve it, validate the structure, and even export it to a format allowing you to merge it with other, more conventional SAS datasets?

AP-253 : Build Your Own PDF Generator: A Practical Demonstration of Free and Open-Source Tools
James Austrow, Cleveland Clinic
Monday, 5:00 PM – 5:20 PM, Location: Key Ballroom 4

The PDF is one of the most ubiquitous file formats and can be read on nearly every computing platform. So how, in the year 2024, can it still be so inconvenient to perform basic editing tasks such as concatenating and merging files, inserting page numbers, and creating bookmarks? These features are often locked behind paid licenses in proprietary software or require that the documents be uploaded to a web server, the latter of which poses unacceptable security risks. In fact, the PDF is a public standard and there exist free, open-source libraries that make it easy to build in-house solutions for these and many other common use cases. In this paper, we demonstrate how to use Python to assemble and customize PDF documents into a final, polished deliverable. We will also lay the foundation for automating these tasks, which can save countless hours on reports that have to be prepared on a regular basis.

AP-256 : Leveraging ChatGPT in Statistical Programming in the Pharmaceutical Industry
Ian Sturdy, Eli Lilly and Company
Tuesday, 3:00 PM – 3:20 PM, Location: Key Ballroom 4

This paper explores the potential benefits of incorporating ChatGPT, a state-of-the-art natural language processing model, in statistical programming within the pharmaceutical industry. By leveraging ChatGPT’s capabilities, this technology can save time, money, and most importantly, your sanity. Programming often leads to frustration, anxiety, and sleepless nights trying to solve complex problems. Various practical applications and techniques that harness the power of ChatGPT will be described to reduce all of these. In a world where Artificial Intelligence threatens to take our jobs, this paper suggests methods of tapping into the untapped potential of ChatGPT to empower programmers with innovative tools, thereby increasing our value. When programming issues arise, no longer will you need to worry about judgement or hostility from others on online forums. ChatGPT is a powerful tool we have yet to fully leverage, and its benefits extend well beyond our imaginations, let alone this paper.

AP-268 : A New Approach to Automating the Creation of the Subject Visits (SV) Domain
Xiangchen Cui, CRISPR Therapeutics
Jessie Wang, CRISPR Therapeutics
Min Chen, CRISPR Therapeutics
Tuesday, 1:30 PM – 2:20 PM, Location: Key Ballroom 4

The creation of the subject visits (SV) domain is one of the most challenging tasks of SDTM programming. Aside from the small portion of mapping from raw dataset variables to SV variables, SV programming mainly consists of a more complex derivation process, which is totally different from that of other SDTM domains. The dynamic parts of the SV programming process, such as identifying raw datasets and their variables with both date/time and clinical visits, cause manual development of a SAS program to be time-consuming and error-prone. Hence, automating its code generation would enhance efficiency and accuracy. This paper will present a new approach for SV automation based on the SDTM automation done in our previous paper, which leveraged CRF specifications from an EDC database and SDTM standards [1]. It will introduce the standard SV programming logic flow with 10 sequential steps, which leads us to develop an additional SAS-based macro named %SV_Code_Generator as an expansion to the macro introduced in [1]. The output of this macro (SV.sas) achieves 100% automation of the SV domain for the raw data collected per CRFs in a clinical study. This new approach guarantees all raw dataset variables related to subject visits are accounted for in SV programming thanks to the sequential programming automation. This automation allows the generation of the SV dataset to occur very early in the programming development cycle and makes developing programmatic quality checks for clinical data review and data cleaning more efficient and economically feasible.

AP-289 : Programming with SAS PROC DS2: Experience with SDTM/ADaM
Jianfeng Wang, University of Minnesota Twin Cities, Minneapolis, Minnesota
Li Cheng, Vertex Pharmaceuticals Inc.
Tuesday, 2:30 PM – 2:50 PM, Location: Key Ballroom 4

PROC DS2 is a procedure introduced with SAS Base 9.4. This procedure provides opportunities for SAS programmers to apply Object Oriented Programming (OOP) and multithread techniques in SAS programming and is a critical connection between ‘traditional’ SAS programming and programming on the SAS Viya platform. The goal of this paper is to pilot the use of PROC DS2 in the work of preparing clinical trial CDISC datasets. In this paper, PROC DS2 is tested in the programming of SDTM/ADaM on a server with the SAS Base 9.4 M3 release. After converting SDTM/ADaM programs written in the ‘traditional’ SAS programming language into PROC DS2 code, this paper presents the lessons learned and the notes taken as obstacles were overcome or bypassed. Furthermore, OOP and multithread techniques are explored for application to SDTM/ADaM programming. Programming setups with a standard folder structure are discussed, and the performance of using OOP and multithread techniques is also evaluated.
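
For readers who have not seen DS2, the basic shape of a DS2 data program, with explicit declarations and a RUN method, looks like this (a minimal runnable sketch, not the authors' SDTM/ADaM code):

proc ds2;
   data work.class2 / overwrite=yes;
      dcl double bmi;                  /* explicit declaration, unlike the DATA step */
      method run();
         set sashelp.class;
         bmi = (weight / (height * height)) * 703;
      end;
   enddata;
   run;
quit;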

AP-295 : Replicating SAS® Procedures in R with the PROCS Package
David Bosak, r-sassy.org
Tuesday, 10:30 AM – 10:50 AM, Location: Key Ballroom 4

The “procs” package aims to simulate some commonly used SAS® procedures in R. The purpose of simulating SAS procedures is to make R easier to use and match statistical results. Another important motivation is to provide stable tools to work with in the pharmaceutical industry. The package replicates several of the most frequently used procedures, such as PROC FREQ, PROC MEANS, PROC TTEST, and PROC REG. The package also contains some data manipulation procedures like PROC TRANSPOSE and PROC SORT. This paper will present an overview of the package and provide demonstrations for each function.

AP-298 : Comparison of Techniques in Merging Longitudinal Datasets with Errors on Date Variable: Fuzzy Matching versus Clustering Analysis
Huitong Niu, Master of Science Student, Biostatistics, Fielding School of Public Health, University of California, Los Angeles
Yan Wang, Adjunct Assistant Professor, Public and Population Health, School of Dentistry, University of California, Los Angeles
Tuesday, 9:00 AM – 9:20 AM, Location: Key Ballroom 4

This paper examines effective techniques for merging longitudinal datasets with key variable inaccuracies, focusing on date errors. Traditional SAS methods, like the DATA Step MERGE or PROC SQL JOIN, require exact matches on key variables, which is challenging in datasets containing errors. Our paper compares fuzzy matching and clustering analysis within SAS, assessing their effectiveness in reconciling datasets with inconsistencies in date variables. We simulate a longitudinal dataset of approximately 2,000 observations, representing about 500 patients with repeated measurements. The dataset is used to simulate two datasets including normally (or uniformly) distributed errors on date, manually introduced errors (e.g., typing “12” as “21”), and missing date information (e.g., entering “06/23” instead of “12/06/2023”). For each scenario, we use fuzzy matching and clustering analysis to merge two datasets, evaluating the accuracy of each technique. Preliminary results show varied effectiveness depending on the type of error on the date variable. For datasets with normally (or uniformly) distributed errors on date, clustering analysis significantly outperforms fuzzy matching with a 94.9% accuracy rate compared to 54.1%. In the case of manually introduced errors, both methods achieve high accuracy, around 98%. However, for datasets with missing date information, fuzzy matching is more effective, attaining an 84.4% accuracy rate as opposed to 45.2% for clustering analysis. The paper concludes with a discussion of these findings, offering insights for researchers on selecting appropriate methods for merging datasets with errors on date.
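
A simple form of the fuzzy-matching arm of this comparison is a tolerance join that keeps the closest date per record (an illustrative sketch; the dataset and variable names are hypothetical, and the tolerance is arbitrary):

proc sql;
   create table matched as
   select a.patid, a.visdate as date_a, b.visdate as date_b,
          abs(a.visdate - b.visdate) as day_diff
   from ds_a as a inner join ds_b as b
     on a.patid = b.patid
    and abs(a.visdate - b.visdate) <= 3                     /* +/- 3 day tolerance */
   group by a.patid, a.visdate
   having calculated day_diff = min(calculated day_diff);   /* closest match wins */
quit;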

AP-349 : Just Stringing Along: FIND Your Way to Great User-Defined Functions
Richann Watson, DataRich Consulting
Louise Hadden, Abt Global Inc.
Monday, 11:30 AM – 11:50 AM, Location: Key Ballroom 4

SAS® provides a vast number of functions and subroutines (sometimes referred to as CALL routines). These useful routines are an integral part of the programmer’s toolbox, regardless of the programming language. Sometimes, however, pre-written functions are not a perfect match for what needs to be done, or for the platform that the required work is being performed upon. Luckily, SAS has provided a solution in the form of the FCMP procedure, which allows SAS practitioners to design and execute User-Defined Functions (UDFs). This paper presents two case studies for which the character or string functions SAS provides were insufficient for work requirements and goals, and demonstrates the design process for custom functions and how to achieve the desired results.
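
The general shape of an FCMP string UDF, shown with a hypothetical function that collapses repeated blanks and upcases the result:

proc fcmp outlib=work.funcs.strings;
   function tidy_upcase(s $) $ 200;
      return (upcase(compbl(s)));
   endsub;
run;

options cmplib=work.funcs;    /* tell SAS where to find the compiled function */

data _null_;
   out = tidy_upcase('hello   there');
   put out=;                  /* HELLO THERE */
run;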

AP-361 : Efficient Repetitive Task Handling in SAS Programming Through Macro Loops
Chary Akmyradov, Arkansas Children’s Research Institute
Monday, 10:00 AM – 10:20 AM, Location: Key Ballroom 4

This paper delves into the optimization of repetitive tasks in SAS programming, a common challenge faced by data analysts and programmers. The primary focus is on harnessing the power of SAS macro programming techniques, specifically through the implementation of do loops within macros. Initially, the paper introduces the basics of SAS macros, outlining their significance in automating repetitive sequences of code, and providing a foundational understanding of macro variables and syntax. The discussion then progresses to the implementation of simple do loops within macros, highlighting their practicality in routine data manipulation tasks. Through a series of practical examples and use-case scenarios, the paper demonstrates the effectiveness of these loops in real-world applications. Addressing the limitations of these simple implementations, the paper further explores the generalization of do loops, presenting advanced methods to create dynamic, parameter-driven macros capable of handling a variety of tasks and parameters. This advanced approach is exemplified through complex scenarios and case studies, showcasing the adaptability and efficiency of generalized do loops in diverse data analysis contexts. By the conclusion, the paper provides a comprehensive insight into the role of macro programming in SAS, offering a valuable resource for SAS programmers seeking to streamline their coding workflow and enhance efficiency in data processing tasks. This work not only serves as a practical guide for current SAS users but also contributes to the broader conversation on the future of macro programming in data analysis.
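
A representative parameter-driven loop of the kind described, iterating over a space-separated variable list (an illustrative macro, not from the paper):

%macro freq_all(data=, vars=);
   %local i var;
   %let i = 1;
   %do %while(%length(%scan(&vars, &i, %str( ))) > 0);
      %let var = %scan(&vars, &i, %str( ));
      proc freq data=&data;
         tables &var / missing;
      run;
      %let i = %eval(&i + 1);
   %end;
%mend freq_all;

%freq_all(data=sashelp.heart, vars=sex status smoking)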

AP-420 : Generation of Synthetic Data for Clinical Trials in Base SAS using a 2-Phase Discrete-Time Markov and Poisson Rare Event Framework
Adam Yates, Data Coordinating and Analysis Center (DCAC), HJF-MHRP
Misti Paudel, Brigham and Women’s Hospital Division of Rheumatology, Inflammation, and Immunity, Harvard School of Medicine
Fengming Hu, Data Coordinating and Analysis Center (DCAC), HJF-MHRP
Monday, 2:00 PM – 2:50 PM, Location: Key Ballroom 4

Synthetic data for clinical trials independent of human participants has growing utility in clinical and epidemiologic fields, but a persistent concern has been the viability and reliability of producing synthetic data which conforms to the complex nature of biomedical data. Recent successes in synthetic clinical trial data include the use of Synthetic Control Arm (SCA) applications, but the generation of treatment-related data necessarily faces additional scrutiny. While synthetic data cannot replace trial data for scientific discovery, planning and development phases of clinical trials can benefit from the use of synthetic treatment and control data. This paper describes a novel program developed in Base SAS which generates synthetic data that was used in clinical trial development, design, and report programming. We developed a stochastically grounded process which generates synthetic data of population-specific enrollment characteristics, as well as longitudinal local and systematic reactogenicity, unsolicited events, and adverse events. We implement a discrete-time Markov process framework to generate longitudinal observation time, incorporating a Poisson-based probability of events within each state. This 2-phase stochastic generation process results in data across observation time that conform to biologically natural and realistic behaviors. Key to our process is that reaction frequency may be modulated based on expert experience or historical expectations, but the generated data do not rely directly on existing clinical data. Potential applications and extensions in a machine learning context will be discussed. This paper is intended for individuals with an interest in clinical trial data and a basic to intermediate command of SAS macro processing.
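
A toy version of the two-phase idea, with a discrete-time Markov step for the state and a Poisson draw for event counts within the state (the transition probabilities and rates here are invented for illustration, not the authors' values):

data synth;
   call streaminit(2024);
   array p[2,2] _temporary_ (0.90 0.10     /* state 1 -> states 1,2 */
                             0.30 0.70);   /* state 2 -> states 1,2 */
   do subjid = 1 to 5;
      state = 1;
      do day = 1 to 28;
         /* Phase 1: draw the next state from the current row of the matrix */
         state = rand('table', p[state,1], p[state,2]);
         /* Phase 2: draw the event count given the state */
         n_events = rand('poisson', ifn(state = 1, 0.05, 0.40));
         output;
      end;
   end;
run;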

AP-424 : Adding the missing audit trail to R
Magnus Mengelbier, Limelogic AB
Monday, 8:30 AM – 8:50 AM, Location: Key Ballroom 4

The R language is used more extensively across the Life Science industry for GxP workloads. The basic architecture of R makes it nearly impossible to add a generic audit trail method and mechanism. Different strategies have been developed to provide some level of auditing, from logging conventions to file system audit utilities, but each has its drawbacks and lessons learned. The ultimate goal is to provide an immutable audit trail compliant with ICH Good Clinical Practice, FDA 21 CFR Part 11, and EU Annex 11, regardless of the R environment. We consider different approaches to implementing auditing functionality with R and how we can incorporate audit trail functionality natively in R or with existing and available external tools and utilities that completely support Life Science best practices, processes, and standard procedures for analysis and reporting. We also briefly consider how the same principles can be extended to other languages such as Python, SAS, Java, etc.

Data Standards

DS-109 : Analyzing your SAS log with user defined rules using an app or macro.
Philip Mason, Wood Street Consultants
Monday, 3:00 PM – 3:20 PM, Location: Key Ballroom 2

SAS provides some pretty basic help with the logs that are produced, typically just linking to errors and warnings. Many people build log checkers to look for particular things of interest in their logs, which usually involves capturing the log and then running some SAS code against it. I made a way to define rules in JSON format which can be read by a SAS macro and used to look for things in a log. This means different rules can be used for different use cases. They can be used via a macro or via a web application I built. The web app can switch between rules, provides summaries, draws diagrams of the code, provides performance stats, and more. Hopefully this functionality might one day be built into SAS, but in the meantime it works well as an addition.
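
A bare-bones version of rule-driven log scanning (rules hard-coded here, whereas the paper reads them from JSON; the log path is a hypothetical example):

data log_findings;
   infile '/projects/study01/logs/run1.log' truncover;
   input line $char256.;
   length rule $40;
   if index(line, 'ERROR') = 1          then rule = 'Error message';
   else if index(line, 'WARNING') = 1   then rule = 'Warning message';
   else if index(line, 'uninitialized') then rule = 'Uninitialized variable';
   else delete;
   lineno = _n_;
run;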

DS-130 : SDTM Specifications and Datasets Review Tips
Wanchian Chen, AstraZeneca
Monday, 4:30 PM – 4:50 PM, Location: Key Ballroom 2

SDTM requirements are spread across various sources such as the SDTM Implementation Guide (SDTMIG) domain specifications section, the SDTMIG domain assumptions section, and the FDA Study Data Technical Conformance Guide. While Pinnacle 21 can assist in identifying issues with SDTM data, it is important to note that data is often limited at the early stages of a study. The most efficient process would be to review SDTM specifications before the creation of SDTM programs, to minimize program modifications and save time. Programmers often seek guidance on conducting a comprehensive review of SDTM but are unsure where to start. In this presentation, I will provide a concise summary of frequently seen findings, both domain-specific and general, observed in multiple studies when reviewing SDTM. I will show which issues can be seen in the Pinnacle 21 report and which ones are missed. I will also cover situations where variables are not applicable to your study but still may pass Pinnacle 21 checks. This presentation is designed to benefit programmers involved in the SDTM review process.

DS-150 : Assurance in the Digital Age: Automating MD5 Verification for uploading data into a Cloud based Clinical Repository
Laura Elliott, SAS Institute Inc.
Ben Bocchicchio, SAS
Monday, 1:30 PM – 1:50 PM, Location: Key Ballroom 2

Utilization of a cloud-based repository has become increasingly common with large clinical trials. Verifying the integrity of data moved into the cloud for clinical trials is of utmost importance. Normally, this process requires manual intervention to verify that the local source data matches the data stored in the cloud-based system. This paper discusses a process that automates the creation of a verification report comparing MD5 checksums from source to destination. The process, written in Python, generates a .csv file of checksums from the source data, then uses an input file containing the folder paths to be uploaded to the cloud via REST APIs to migrate the data. The source MD5 checksums are also uploaded. The Python code then calls the REST APIs to execute a script in the cloud which compares the source and destination MD5s using SAS code. The result of the process is a .pdf report that summarizes the comparison of the source and destination MD5 checksums. This process offers a completely automated way to prove data integrity for migration of local source data into a cloud-based clinical repository.
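
The final comparison step might look like this once both checksum listings are available as SAS data sets (a minimal sketch; the dataset and variable names are hypothetical):

proc sort data=src_md5;  by filepath; run;
proc sort data=dest_md5; by filepath; run;

data md5_compare;
   merge src_md5 (rename=(md5=md5_src))
         dest_md5(rename=(md5=md5_dst));
   by filepath;
   length status $12;
   if missing(md5_src) or missing(md5_dst) then status = 'MISSING';
   else if md5_src = md5_dst then status = 'MATCH';
   else status = 'MISMATCH';
run;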

DS-154 : Exploit the Window of Opportunity: Exploring the Use of Analysis Windowing Variables
Richann Watson, DataRich Consulting
Elizabeth Dennis, EMB Statistical Solutions, LLC
Karl Miller, IQVIA
Monday, 2:30 PM – 2:50 PM, Location: Key Ballroom 2

For analysis purposes, dataset records are often assigned to an analysis timepoint window rather than simply using the visits or timepoints from the collected data. The rules for analysis timepoint windows are usually defined in the Statistical Analysis Plan (SAP) and can involve complicated derivations to determine which record(s) best fulfill the analysis window requirements. For traceability, there are ADaM standard variables available to help explain how records are assigned to the analysis windows. This paper will explore these ADaM variables and provide examples of how they may be applied.
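
One common windowing rule, sketched with ADaM-style names (the selection rule itself is illustrative, not a standard; AWLO, AWHI, and AWTARGET bound and target the window):

proc sql;
   create table adlb_win as
   select a.*, abs(a.ady - a.awtarget) as dist
   from adlb as a
   where a.ady between a.awlo and a.awhi           /* record falls inside the window */
   group by usubjid, paramcd, avisitn
   having calculated dist = min(calculated dist);  /* closest to the target day */
quit;

proc sort data=adlb_win;
   by usubjid paramcd avisitn ady;
run;

data adlb_win;
   set adlb_win;
   by usubjid paramcd avisitn;
   if first.avisitn;                               /* break ties toward the earlier record */
run;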

DS-188 : Automated Harmonization: Unifying ADaM Generation and Define.xml through ADaM Specifications
Wei Shao, Bristol Myers Squibb
Xiaohan Zou, Bristol Myers Squibb
Monday, 10:30 AM – 10:50 AM, Location: Key Ballroom 2

In electronic submission packages, ADaM datasets and Define.xml stand as pivotal components. Ensuring consistency between these elements is critical. However, despite this importance, the current method still heavily depends on manual checks. To address this challenge, we introduce an innovative automated approach driven by ADaM specifications. Our solution involves a suite of SAS® macros engineered to streamline the translation from ADaM specification to both ADaM datasets and Define.xml. These macros orchestrate a seamless automation process, facilitating the generation of ADaM datasets while concurrently fortifying consistency between ADaM datasets and Define.xml. The automated processes include format creation, core variable addition, variable attributes generation, dynamic length adjustment based on actual values, and automatic ADaM specification updates from actual data. These macros act as dynamic tools, constructing datasets with precision, adjusting variable attributes, and most importantly, syncing Define.xml with actual data. Our automated tool system not only expedites ADaM datasets creation but also ensures an inherent consistency with Define.xml. This amalgamation of automation and specification-based integrity significantly reduces manual errors, enhances data quality, and fortifies the efficiency of the submission process.

DS-193 : Around the Data DOSE-y Doe, How Much Fun Can Your Data Be: Using DOSExx Variables within ADaM Datasets
Inka Leprince, PharmaStat, LLC
Richann Watson, DataRich Consulting
Tuesday, 11:30 AM – 11:50 AM, Location: Key Ballroom 2

In the intricate dance of clinical trials that involve multiple treatment groups and varying dose levels, subjects pirouette through planned treatments – each step assigned with precision. Yet, in the realms of pediatric, oncology, and diabetic trials, the challenge arises when planned doses twirl in the delicate arms of weight adjustments. How can data analysts choreograph Analysis Data Model (ADaM) datasets to capture these nuanced doses? There is a yearning to continue with the normal dance routine of analyzing subjects based on their protocol-specified treatments, yet at times it is necessary to learn a new dance step, so as not to overlook the weight-adjusted doses the subjects actually received. The treatment variables TRTxxP/N in the Subject-Level Analysis Dataset (ADSL) and their partners TRTP/N in Basic Data Structure (BDS) and Occurrence Data Structure (OCCDS) are elegantly designed to ensure each treatment glides into its designated column in the summary tables. But we also need to preserve the weight-adjusted dose level on a subject- and record-level basis. DOSExxP and DOSExxA, gracefully twirl in the ADSL arena, while their counterparts, the dashing DOSEP and DOSEA, lead the waltz in the BDS and OCCDS datasets. Together, these harmonious variables pirouette across the ADaM datasets, capturing the very essence of the weight-adjusted doses in a dance that seamlessly unfolds.

DS-204 : ADaM Discussion Topics: PARQUAL, ADPL, Nadir
Sandra Minjoe, ICON PLC
Tuesday, 2:00 PM – 2:20 PM, Location: Key Ballroom 2

This paper and presentation will cover three topics that have been under varying levels of discussion within the CDISC ADaM team but are not part of the standard. First is the parameter-qualifier variable PARQUAL, which can be found in a couple of Therapeutic Area User Guides (TAUGs) and went out for public review as part of ADaMIG v1.2, but currently breaks BDS rules because it never made it into a final publication. Second is ADPL, a one-record-per-subject-per-participation dataset that might be useful for studies where subjects can enroll more than once or have multiple screening attempts, similar to the proposed SDTM DC domain. Third is Nadir variables, such as Change from Nadir and Percent Change from Nadir, which are not currently allowed in a BDS structure. In each case, the paper and presentation will summarize CDISC ADaM team discussions and give personal (not CDISC-authorized) recommendations of when and how to implement these concepts to meet analysis needs.
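
For the Nadir topic, a minimal sketch of one common derivation is shown below, assuming a BDS-style ADLB sorted by subject, parameter, and analysis date; the post-baseline-only rule and the variable names CHGNAD/PCHGNAD are illustrative, and actual rules would come from the SAP:

  proc sort data=adlb;
    by usubjid paramcd adt;
  run;

  data adlb_nadir;
    set adlb;
    by usubjid paramcd;
    retain nadir;
    if first.paramcd then nadir = .;
    /* change from the lowest post-baseline value seen so far */
    if not missing(nadir) and not missing(aval) then do;
      chgnad = aval - nadir;
      if nadir ne 0 then pchgnad = 100 * chgnad / nadir;
    end;
    /* update the running nadir with the current post-baseline value */
    if ablfl ne 'Y' and not missing(aval) and
       (missing(nadir) or aval < nadir) then nadir = aval;
  run;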

DS-205 : A New Way to Automate Data Validation with Pinnacle 21 Enterprise CLI in LSAF
Crystal Cheng, SAS
Tuesday, 2:30 PM – 2:50 PM, Location: Key Ballroom 2

Pinnacle 21 Enterprise is software that checks clinical data for compliance with CDISC standards, controlled terminology, and dictionaries as users prepare clinical data submissions to regulatory agencies. By validating clinical data early and frequently during the conduct of a clinical trial, it helps users discover and address data issues in advance, ensuring the quality of submission data. There are different ways to execute validations in P21 Enterprise. Users can either run a validation manually via the P21 user interface or, for a more automated process, execute a process flow in SAS Life Science Analytics Framework (LSAF) to invoke the Enterprise Command Line Interface (ECLI) from P21. Integrating LSAF with P21 and setting up the validation process via a process flow saves programmers time and is less prone to errors during packaging and uploading of datasets for P21 validation. This paper will focus on the detailed steps to set up an automated process flow for Pinnacle 21 validation in SAS Life Science Analytics Framework (LSAF) and explore the benefits of automating the validation process.
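
The general shape of such a call from a SAS program can be sketched as below; the command name p21client and its options are placeholders, since the actual ECLI syntax must be taken from the Pinnacle 21 Enterprise and LSAF documentation:

  /* Hypothetical CLI invocation; requires an environment where OS
     commands are allowed (e.g., XCMD enabled) */
  filename cli pipe 'p21client --action validate --source /sdtm/dm.xpt';

  data _null_;
    infile cli;
    input;
    put _infile_;   /* echo the CLI output to the SAS log */
  run;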

DS-271 : Programming Considerations in Deriving Progression-Free Survival on Next-Line Therapy (PFS2)
Alec McConnell, BMS
Yun Peng
Tuesday, 8:00 AM – 8:20 AM, Location: Key Ballroom 2

Historically, oncology clinical trials have relied on Overall Survival (OS) and Progression-Free Survival (PFS) as primary efficacy endpoints. While OS is often the most desired estimate, it requires many years of follow-up to derive an unbiased estimate from the study. Additionally, even with follow-up, OS estimates are subject to confounding from subsequent therapies, which are commonplace in the treatment of cancer. As a proxy for OS, the EMA has recommended the evaluation of Progression-Free Survival 2 (PFS2). According to the EMA, “PFS2 is defined as the time from randomization (or registration, in non-randomized trials) to second objective disease progression, or death from any cause, whichever occurs first.” Despite the apparent simplicity of this definition, PFS2 requires complex data collection and derivation. Within our oncology team at Bristol-Myers Squibb (BMS), different studies approach the derivation differently. In this paper, we will share how our team at BMS collects the relevant data to derive the PFS2 endpoint with a consistent approach in both the advanced and early settings. Furthermore, we will explain how we structure our ADaM datasets to assist in our derivation of the endpoint.
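
A deliberately simplified sketch of the EMA definition is shown below; the input dataset and the variables RANDDT, PD2DT, DTHDT, and LSTALVDT are hypothetical, and real derivations apply event hierarchies and censoring rules from the SAP:

  data pfs2;
    set adtte_src;   /* assumed to hold RANDDT, PD2DT, DTHDT, LSTALVDT as
                        numeric dates per subject */
    evntdt = min(pd2dt, dthdt);     /* earlier of 2nd progression, death */
    if not missing(evntdt) then do;
      cnsr = 0;
      aval = evntdt - randdt + 1;   /* PFS2 in days */
    end;
    else do;
      cnsr = 1;
      aval = lstalvdt - randdt + 1; /* censor at last known alive date */
    end;
  run;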

DS-274 : Guidance Beyond the SDTM Implementation Guide
Kristin Kelly, Pinnacle 21 by Certara
Michael Beers, Pinnacle 21
Monday, 9:00 AM – 9:20 AM, Location: Key Ballroom 2

A common misconception among preparers of SDTM data is that it is sufficient simply to follow the SDTM Implementation Guide when creating the datasets. The truth is more complicated. A preparer of SDTM datasets needs to be aware of all the industry guidance available when preparing for regulatory submission, not only from CDISC and the regulatory agencies but from other organizations as well. This presentation will discuss some of the lesser-known guidance documents in the industry, why they should be referenced, and some of the impacts of not using them in the creation of SDTM datasets.

DS-276 : Your Guide to Successfully Upversioning CDISC Standards
Soumya Rajesh, CSG LLC – an IQVIA Business
Tuesday, 8:30 AM – 8:50 AM, Location: Key Ballroom 2

As of 2023, newer versions of the CDISC standards (i.e., SDTM v2.0, SDTMIG v3.4, SDTM v1.7, SDTMIG v3.3, and Define.xml v2.1) are either required or supported by the industry’s regulatory agencies. This paper relays challenges and best practices the authors have experienced while upversioning to these standards. Not all these practices are found in published standards. This paper will bring together the resources and lessons learned in one place, so that readers can skillfully navigate through the challenges of adopting these new standards. Highlights include strategies for dealing with value-level metadata for variables with multiple codelist references, a new domain class, new domains, and domains referenced in TAUGs not seen in the IGs. We’ll discuss best practices for data modeling: when to use new variables or supplemental qualifiers, and how to target the appropriate domains. We’ll include experiences interpreting and dispositioning validation output from the applicable conformance rules.

DS-280 : I Want to Break Free: CRF Standardization Unleashing Automation
Laura Fazio, Formation Bio
Andrew Burd, Formation Bio
Emily Murphy, Formation Bio
Melanie Hullings, Formation Bio
Tuesday, 9:00 AM – 9:20 AM, Location: Key Ballroom 2

Achieving efficient and impactful Case Report Form (CRF) standardization in the pharmaceutical industry demands intense cross-functional collaboration and a shared understanding of the benefits. This foundation is crucial for improved data quality as well as downstream analysis and reporting automation. Deviations from standards cause manual review, increased errors, and added inefficiencies in downstream code development. To address these challenges, an internal Standards Committee led by Data Management and Systems Analytics teams was formed to gain diverse cross-functional alignment through a comprehensive charter. The charter mandates that study teams adhere to standards during study startup, with deviations requiring justification and approval from the Committee. While CRF standards are typically developed by Medical and Clinical teams, we additionally include roles focused on downstream analysis and reporting, including our Data Science, Statistical Programming, and Clinical Analytics teams. This paper advocates for an inclusive approach to standards development, emphasizing that resulting datasets should be versatile for all downstream purposes. Such an approach unlocks the power of automation, minimizes reactivity, and fosters efficiency and continuity across clinical studies.

DS-287 : ADaM Design for Prostate Cancer Efficacy Endpoints Based on PCWG3
Lihui Deng, Bristol Myers Squibb
Kylie Fan, BMS
Jia Li, BMS
Tuesday, 10:30 AM – 10:50 AM, Location: Key Ballroom 2

Unlike other solid tumors, which use the RECIST 1.1 tumor response criteria, prostate cancer is a special case: common oncology efficacy endpoints such as rPFS, ORR, time to response, and duration of response are usually based on the PCWG3 criteria. Additionally, prostate-cancer-specific endpoints like PSA response rate and time to PSA progression are also based on PCWG3, involving more complex data collection and derivation than RECIST 1.1. In this paper, we will share efficacy endpoints in prostate cancer, such as PSA response and time to PSA progression. We will explain the ADaM design and data flow, and how to ensure traceability and data dependency in the derivations. We successfully implemented programming for these complex endpoints, using macros to improve the speed and quality of the analysis.
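
As one hedged illustration of a confirmed PSA response flag, based on a common reading of PCWG3 (a decline from baseline of at least 50%, confirmed by a second qualifying value at least three weeks later); the dataset and variable names here are hypothetical:

  proc sort data=adpsa;
    by usubjid adt;
  run;

  data psa_resp;
    set adpsa;            /* assumed per-subject PSA records: BASE, AVAL, ADT */
    by usubjid;
    length psarsp $1;
    retain firstdcl;
    if first.usubjid then firstdcl = .;
    if base > 0 and not missing(aval) then
      pchg = 100 * (aval - base) / base;
    if not missing(pchg) and pchg <= -50 then do;
      if missing(firstdcl) then firstdcl = adt;      /* first >=50% decline */
      else if adt >= firstdcl + 21 then psarsp = 'Y'; /* confirmed >=3 weeks later */
    end;
  run;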

DS-305 : Guideline for Creating Unique Subject Identifier in Pooled studies for SDTM
Vibhavari Honrao, NMIMS University, Mumbai
Monday, 11:30 AM – 11:50 AM, Location: Key Ballroom 2

The Demographics dataset is the parent dataset that includes the set of essential standard variables describing each subject in a clinical study. One of these key variables is the Unique Subject Identifier (USUBJID). The SDTM IG does not provide guidance on creating USUBJID for pooled studies, so statistical programmers need to understand the programming steps involved. In clinical trials, there are cases where subjects re-enroll in different studies of the same compound, and it can be difficult to identify such a subject while maintaining CDISC compliance. For ISS analyses, pooling of studies becomes challenging due to multiple values of SUBJID, RFICDTC, RFSTDTC, RFENDTC, etc. within the same USUBJID across studies. This paper demonstrates the steps and programming logic involved in developing the Demographics dataset, using hypothetical examples from multiple studies to create pooled datasets.
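
One possible approach, using invented identifiers, is a sponsor-maintained link file that maps re-enrolled subjects to a single pooled identifier:

  data link;
    input studyid $ subjid $ poolid $;
    datalines;
  ABC-001 1001 P0001
  ABC-002 2045 P0001
  ;
  run;

  /* DM is the assumed stacked demographics data from the pooled studies;
     unlinked subjects fall back to a study-qualified identifier */
  proc sql;
    create table dm_pool as
    select a.*,
           coalescec(b.poolid, catx('-', a.studyid, a.subjid)) as usubjid_pool
    from dm as a
         left join link as b
           on a.studyid = b.studyid and a.subjid = b.subjid;
  quit;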

DS-310 : Converting FHIR to CDASH using SAS
Pritesh Desai, SAS
Mary Liang, SAS
Monday, 8:00 AM – 8:20 AM, Location: Key Ballroom 2

With the growing diversity of standards for collecting and presenting Real World Evidence (RWE), there is an escalating demand for the conversion of these standards into more actionable datasets. This paper demonstrates the transformation from FHIR (Fast Healthcare Interoperability Resources) to CDASH using various methods within SAS Viya. The outlined methods are easily adaptable to other standards or datasets initially presented in JSON format. Moreover, recognizing the need for accessible processes, we will highlight the creation of low/no-code procedures to enhance access to these updated datasets, including the transformation of conversion work into SAS Viya Custom Steps.
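
A minimal sketch of the starting point, using the SAS JSON libname engine available in SAS 9.4M4+ and Viya; the member name ENTRY_RESOURCE is hypothetical, since the engine's auto-generated table names depend on the bundle's structure:

  filename fhir 'patient_bundle.json';       /* assumed FHIR bundle file */
  libname fhirlib json fileref=fhir;         /* auto-flattens the JSON */

  proc contents data=fhirlib._all_;          /* inspect generated tables */
  run;

  data cdash_dm;
    set fhirlib.entry_resource;              /* hypothetical table name */
    /* map FHIR Patient fields to CDASH DM collection fields here */
  run;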

DS-342 : CDISC Therapeutic Area User Guides and ADaM Standards Guidance
Karin LaPann, CDISC independent contractor
Monday, 8:30 AM – 8:50 AM, Location: Key Ballroom 2

One of the frequently overlooked yet immensely valuable resources for implementing standards is the set of CDISC Therapeutic Area User Guides (TAUGs). Presently the CDISC website hosts 49 of these guides, 23 of which incorporate ADaM sections. These guides are created by groups of CDISC standards volunteers across the industry, including medical professionals and researchers with experience in the respective disease areas. In the first few years of development, TAUGs concentrated on the collection of data and the implementation of the SDTM to contain it. In 2014 the first TAUG with an analysis section using ADaM was published. Many TAUGs are now developed with additional implementation of the analysis datasets, with ADaM-compliant examples. This gives the programming community a useful illustration of how the SDTM datasets are further arranged for analysis. The latest initiative has been to expand these TAUGs through grants by organizations representing various diseases. One example is the recently released Rare Diseases Therapeutic Area User Guide, partially sponsored by a grant from the National Organization for Rare Disorders (NORD), https://rarediseases.org/. This paper will describe the TAUGs developed with ADaM standards, highlighting their distinctions from prior versions. We will suggest how to use the TAUGs as a reference for conducting studies within various disease areas.

DS-353 : Protocol Amendments and EDC Updates: Downstream impact on Clinical Trial Data
Anbu Damodaran, Alexion Pharmaceuticals
Ram Gudavalli, Alexion Pharmaceuticals
Kumar Bhimavarapu, Alexion Pharmaceuticals
Tuesday, 11:00 AM – 11:20 AM, Location: Key Ballroom 2

This paper investigates the impact of continuous database updates during ongoing studies, particularly emphasizing EDC migrations and protocol amendments. Through examination of practical examples, it reveals the cascading effects on CDISC datasets, as well as the resulting modifications in reporting. Moreover, the paper scrutinizes the downstream impacts of subject transfers across studies or sites, uncovering intricacies related to re-screening subjects who initially did not meet inclusion/exclusion criteria. By unraveling the complexities of these processes, the paper offers valuable insights to improve data integrity and ensure compliance with regulatory guidelines in clinical research.

DS-360 : A quick guide to SDTM and ADaM mapping of liquid Oncology Endpoints.
Swaroop Kumar Koduri, Ephicacy Lifescience Analytics Pvt Ltd
Shashikant Kumar, Ephicacy Lifescience Analytics
Sathaiah Sanga, Ephicacy Lifescience Analytics
Tuesday, 10:00 AM – 10:20 AM, Location: Key Ballroom 2

Cancer is a disease in which some of the body’s cells mutate, grow out of control, and spread to other parts of the body. The mutated cells can infiltrate and destroy healthy tissue throughout the body. Liquid tumors (blood cancers) commonly occur in the bone marrow and the lymphatic system. In oncology clinical trials, response and progression are key to measuring survival and remission rates. In accordance with the response criteria guidelines, oncology studies are divided into one of three subtypes. The first subtype, solid tumor studies, usually follows RECIST (Response Evaluation Criteria in Solid Tumors) or irRECIST (immune-related RECIST). The second subtype, lymphoma studies, usually follows Cheson 1997 or 2007. Lastly, leukemia studies follow study-specific guidelines (e.g., IWCLL for Chronic Lymphocytic Leukemia). This paper will focus on the blood cancers (lymphoma and leukemia) and specifically show, with examples, how SDTM and ADaM domains are used to collect the different data points in each type. It will also show how standards are used to capture disease response and how CDISC streamlines the development of clinical trial artifacts in liquid oncology studies.

DS-367 : Handling of Humoral and Cellular Immunogenicity Data in SDTM
Wei Duan, Moderna Therapeutics
Tuesday, 1:30 PM – 1:50 PM, Location: Key Ballroom 2