PharmaSUG 2024 Paper Presentations
Paper presentations are the heart of a PharmaSUG conference. Here is the list including the next batch of confirmed paper selections. Papers are organized into 12 academic sections and cover a variety of topics and experience levels.
Note: This information is subject to change. Last updated 14-May-2024.
Sections
Advanced Programming
Data Standards
Data Visualization and Reporting
Hands-on Training
| Paper No. | Author(s) | Paper Title (click for abstract) |
| HT-101 | Mathura Ramanathan & Nancy Brucken | Deep Dive into the BIMO (Bioresearch Monitoring) Package Submission |
| HT-111 | Bart Jablonski | A Gentle Introduction to SAS Packages |
| HT-118 | Philip Holland | The Art of Defensive SAS Programming |
| HT-143 | Charu Shankar | The New Shape Of SAS Code |
| HT-152 | Phil Bowsher | GenAI to Enhance Your Statistical Programming |
| HT-157 | Jayanth Iyengar | Understanding Administrative Healthcare Datasets using SAS programming tools |
| HT-197 | Dan Heath | Building Complex Graphics from Simple Plot Types |
| HT-201 | Ashley Tarasiewicz & Chelsea Dickens | Transitioning from SAS to R |
| HT-413 | Richann Watson & Josh Horstman | Complex Custom Clinical Graphs Step by Step with SAS® ODS Statistical Graphics |
| HT-459 | Troy Hughes | Hands-on Python PDFs: Using the pypdf Library To Programmatically Design, Complete, Read, and Extract Data from PDF Forms Having Digital Signatures |
Leadership Skills
Metadata Management
| Paper No. | Author(s) | Paper Title (click for abstract) |
| MM-225 | Kang Xie | Variable Subset Codelist |
| MM-226 | Jeetender Chauhan & Madhusudhan Ginnaram & Sarad Nepal & Jaime Yan | Methodology for Automating TOC Extraction from Word Documents to Excel |
| MM-240 | Avani Kaja | Managing a Single Set of SDTM and ADaM Specifications across All Your Phase 1 Trials |
| MM-245 | Trevor Mankus | Relax with Pinnacle 21’s RESTful API |
| MM-267 | Xiangchen Cui & Min Chen & Jessie Wang | A Practical Approach to Automating SDTM Using a Metadata-Driven Method That Leverages CRF Specifications and SDTM Standards |
| MM-358 | Lakshmi Mantha & Purvi Kalra & Arunateja Gottapu | Optimizing Clinical Data Processes: Harnessing the Power of Metadata Repository (MDR) for Innovative Study Design (ISD) and Integrated Summary of Safety (ISS) / Efficacy (ISE) |
| MM-447 | Vandita Tripathi & Manas Saha | Automating third party data transfer through digitized Electronic DTA Management |
Real World Evidence and Big Data
Solution Development
Statistics and Analytics
Strategic Implementation & Innovation
Submission Standards
ePosters
Abstracts
Advanced Programming
AP-102 : Creating Dated Archives Automatically with SAS®
Derek Morgan, Bristol Myers Squibb
Monday, 8:00 AM – 8:20 AM, Location: Key Ballroom 4
When creating patient profiles, it can be useful for clinical scientists to compare current data with previous data in real time without having to request those data from an Information Technology (IT) source. This is a method for using SAS® to perform the archiving via a scheduled daily job. The primary advantage of SAS over an operating-system script is its date handling, which replaces many difficult calculations with intervals and functions. This paper details an application that creates dated archive folders and copies SAS data sets into those dated archives, with automated aging and deletion of old data and folders. The application allows clinical scientists to customize their archive frequency (within certain limits). It also keeps storage requirements to a minimum as defined by IT. This replaced a manual process that required study programmers to create the archives, eliminating the possibility of missed or incorrectly dated archives. The flexibility required for this project, and the conditions under which it ran, called for SAS date and time intervals and their functions. SAS was also used to manipulate the files and directories.
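A minimal sketch of the date-interval approach described above (the archive root, libref, folder naming, and 14-day retention window are hypothetical, not taken from the author's application):

```sas
/* Hypothetical paths and retention window; not the author's application. */
%let archroot = /projects/study01/archive;

libname profdat "/projects/study01/data";      /* data sets to be archived */

data _null_;
   length newdir $200;
   /* Folder named for today, e.g. .../archive/20240514 */
   newdir = dcreate(put(date(), yymmddn8.), "&archroot/");
   call symputx('newdir', newdir);
   /* Aging: folders dated before intnx('day', date(), -14) would be deleted. */
run;

libname arch "&newdir";

proc copy in=profdat out=arch memtype=data;    /* snapshot into the archive */
run;
```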
AP-108 : Macro Variable Arrays Made Easy with macroArray SAS package
Bart Jablonski, yabwon
Monday, 10:30 AM – 11:20 AM, Location: Key Ballroom 4
A macro variable array is a jargon term for a list of macro variables with a common prefix and numerical suffixes. Macro arrays are valued by advanced SAS programmers and often used as “driving” lists, allowing sequential metadata for complex or iterative programs. Use of macro arrays requires advanced macro programming techniques based on indirect reference (aka, using multiple ampersands &&), which may intimidate less experienced programmers. The aim of the paper is to introduce the macroArray SAS package. The package facilitates a solution that makes creation and work with macro arrays much easier. It also provides a “DATA-step-arrays-like” interface that allows use of macro arrays without complications that arise from indirect referencing. Also, the concept of a macro dictionary is presented, and all concepts are demonstrated through use cases and examples.
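For readers new to the idiom, here is a bare-bones sketch of a macro variable array and indirect (&&) referencing written in plain macro code; it illustrates the concept only and is not the macroArray package API:

```sas
/* Illustrative only: build NAME1-NAMEn macro variables from a data set,  */
/* then drive a loop with indirect (&&) referencing.                      */
data _null_;
   set sashelp.class end=last;
   call symputx(cats('name', _n_), name);          /* name1, name2, ...   */
   if last then call symputx('name_n', _n_);       /* number of elements  */
run;

%macro drive;
   %local i;
   %do i = 1 %to &name_n;
      %put Processing subject &&name&i;            /* resolves to name&i  */
   %end;
%mend drive;
%drive
```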
AP-135 : LAST CALL to Get Tipsy with SAS®: Tips for Using CALL Subroutines
Lisa Mendez, Catalyst Clinical Research
Richann Watson, DataRich Consulting
Tuesday, 10:00 AM – 10:20 AM, Location: Key Ballroom 4
This paper provides an overview of six SAS CALL subroutines that are frequently used by SAS® programmers but are less well-known than SAS functions. The six CALL subroutines are CALL MISSING, CALL SYMPUTX, CALL SCAN, CALL SORTC/SORTN, CALL PRXCHANGE, and CALL EXECUTE. Instead of using multiple IF-THEN statements, the CALL MISSING subroutine can be used to quickly set multiple variables of various data types to missing. CALL SYMPUTX creates a macro variable that is either local or global in scope. CALL SCAN looks for the nth word in a string. CALL SORTC/SORTN is used to sort a list of values within a variable. CALL PRXCHANGE can redact text, and CALL EXECUTE lets SAS write your code based on the data. This paper will explain how those six CALL subroutines work in practice and how they can be used to improve your SAS programming skills.
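A toy data step touching each of the six routines (illustrative only; variable names and values are invented):

```sas
/* Toy data step touching each routine; values are invented. */
data demo;
   length a b c $8 text $40;
   a = 'zebra'; b = 'apple'; c = 'mango';
   call sortc(a, b, c);                         /* a=apple b=mango c=zebra */
   text = 'SSN 123-45-6789 on file';
   call prxchange(prxparse('s/\d{3}-\d{2}-\d{4}/XXX-XX-XXXX/'), -1, text);
   call scan(text, 2, pos, len);                /* position/length of word 2 */
   call symputx('word2pos', pos);               /* data-driven macro variable */
   call missing(a, b, c);                       /* reset all three to missing */
run;

data _null_;
   set sashelp.class;
   /* CALL EXECUTE queues code that SAS runs after this step finishes. */
   call execute('%put Subject: ' || strip(name) || ';');
run;
```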
AP-138 : An Introduction to the SAS Transpose Procedure and its Options
Timothy Harrington, Navitas Data Sciences
Monday, 1:30 PM – 1:50 PM, Location: Key Ballroom 4
PROC TRANSPOSE is a SAS® procedure for arranging the contents of a dataset column from a vertical to a horizontal layout based on selected BY variables. This procedure is particularly useful for efficiently manipulating clinical trials data with a large number of observations and groupings, as is often found in laboratory analysis or vital signs data. The use of PROC TRANSPOSE is illustrated with examples showing different modes of arranging the data. Possible problems which can occur when using this procedure, and their solutions, are also discussed.
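A small sketch of the vertical-to-horizontal rearrangement described, assuming a hypothetical vital-signs dataset VS with USUBJID, VISIT, VSTESTCD, and VSSTRESN:

```sas
/* VS is a hypothetical vital-signs data set (USUBJID, VISIT, VSTESTCD, VSSTRESN). */
proc sort data=vs out=vs_sorted;
   by usubjid visit;
run;

proc transpose data=vs_sorted out=vs_wide(drop=_name_) prefix=vs_;
   by usubjid visit;        /* one output row per subject and visit       */
   id vstestcd;             /* SYSBP, DIABP, ... become column suffixes   */
   var vsstresn;
run;
```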
AP-144 : SAS® Super Duo: The Program Data Vector and Data Step Debugger
Charu Shankar, SAS Institute
Tuesday, 11:00 AM – 11:50 AM, Location: Key Ballroom 4
Whether you are a self-taught SAS learner with a lot of experience, or a novice just entering the SAS universe, you may not have spent a lot of time delving into two fantastic SAS® superpowers. The Program Data Vector (PDV) is where SAS processes one observation at a time, in memory. The Data Step Debugger is an excellent tool for actually seeing the observation being held in memory and watching the movement of data from input to memory to output. Combining these two tools gives SAS practitioners a lot of utility to “get under the hood” of how SAS code works in practice to ingest and analyze data during program operations. Once you know the specifics of what happens during compile time and execution, joins, and creating arrays, efficient SAS code will be at your fingertips. Action-packed with animations, live demos, and a great hands-on section, this presentation will likely be a resource that you will use and reuse now and in the future.
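The debugger is started with the DEBUG option on the DATA statement (available in the SAS windowing environment); a toy example, not taken from the presentation:

```sas
/* Toy example only; run in the SAS windowing environment to open the debugger. */
data class_flag / debug;
   set sashelp.class;
   if age >= 13 then teen = 1;
   else teen = 0;
run;
/* Debugger commands: examine _all_;  watch teen;  step;  quit; */
```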
AP-175 : Tips for Completing Macros Prior to Sharing
Jeffrey Meyers, Regeneron Pharmaceuticals
Monday, 4:00 PM – 4:20 PM, Location: Key Ballroom 4
SAS macros are a programmer’s best friend when written well, and their worst nightmare when not. Macros are a powerful tool within SAS for automating complicated analyses or completing repetitive tasks. The next step after building a capable tool is to share it with others. The creator of the macro has only a short window to win over the user: multiple errors, missing documentation or guides, and a lack of intuitive features quickly push the user away from the macro. This paper will focus on completing a macro prior to sharing it, to give the user the best possible experience.
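One small example of the kind of finishing touch the paper advocates, shown here as a hypothetical macro that validates its parameters before doing any work:

```sas
/* Hypothetical macro: validate parameters and fail with a clear message  */
/* instead of letting the user hit cryptic downstream errors.             */
%macro summarize(data=, var=);
   %if %length(&data) = 0 or %length(&var) = 0 %then %do;
      %put ERROR: [summarize] Both DATA= and VAR= are required.;
      %put ERROR: [summarize] Example: %nrstr(%summarize(data=adsl, var=age));
      %return;
   %end;
   %if not %sysfunc(exist(&data)) %then %do;
      %put ERROR: [summarize] Data set &data does not exist.;
      %return;
   %end;
   proc means data=&data n mean std min max;
      var &var;
   run;
%mend summarize;
```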
AP-191 : Comprehensive Evaluation of Large Language Models (LLMs) Such as ChatGPT in Biostatistics and Statistical Programming
Songgu Xie, Regeneron Pharmaceuticals
Michael Pannucci, Arcsine Analytics
Weiming Du, Alnylam Pharmaceuticals
Huibo Xu, Greenwich High School
Toshio Kimura, Arcsine Analytics
Monday, 9:00 AM – 9:20 AM, Location: Key Ballroom 4
Generative artificial intelligence using large language models (LLMs) such as ChatGPT is an emerging trend. However, discussion of using LLMs in biostatistics and statistical programming has been somewhat limited. This paper provides a comprehensive evaluation of major LLMs (ChatGPT, Bing AI, Google BARD, Anthropic Claude 2) in their utility within biostatistics and statistical programming (SAS and R). We tested major LLMs across several challenges: 1) Conceptual Knowledge, 2) Code Generation, 3) Error Catching/Correcting, 4) Code Explanation, and 5) Programming Language Translation. Within each challenge, we asked easy, medium, and advanced difficulty level questions related to three topics: Data, Statistical Analysis, and Display Generation. After providing the same prompts to each LLM, responses were captured and evaluated. For some prompts, LLMs provided incorrect responses, also known as “hallucinations.” Although LLMs replacing biostatisticians and statistical programmers may be overhyped, there are nevertheless use cases where LLMs are helpful in assisting statistical programmers.
AP-212 : R Shiny and SAS Integration: Execute SAS Procs from Shiny Application
Samiul Haque, SAS Institute
Jim Box, SAS Institute
Monday, 4:30 PM – 4:50 PM, Location: Key Ballroom 4
The integration of different programming languages and tools is pivotal for translational data science. R Shiny is the most popular tool for building web applications in R. However, biostatisticians and data scientists often prefer to leverage SAS Procs or macros for clinical decision making. The worlds of R Shiny and SAS do not need to be decoupled: R Shiny applications can incorporate SAS procs and analytics. In this work, we present mechanisms for integrating R Shiny and SAS. We demonstrate how SAS Procs and macros can be executed from an R Shiny front end, and how SAS logs and results can be printed within the Shiny app.
AP-218 : Potentials and Caveats When Using ChatGPT for Enhanced SAS Macro Writing
Xinran Luo, Everest Clinical Research
Weijie Yang, Everest Clinical Research
Tuesday, 8:30 AM – 8:50 AM, Location: Key Ballroom 4
AI language models like ChatGPT have impressed and even intimidated programmers. There are discussions of ChatGPT with examples of simple SAS steps, and there are descriptions of various usages of ChatGPT without examples, but few papers discuss the use of ChatGPT in SAS macro development with examples. This paper explores the utility of ChatGPT in enhancing the process of writing SAS macros from scratch, using an example of checking SAS logs in batch on Windows, and compares the process with using conventional search engines. The focus is not only on utilizing ChatGPT’s capabilities to provide programmers with initial ideas of program structure when they encounter unusual work requests, but also on demonstrating its application in developing a robust macro by showing key steps of the conversations between programmers and ChatGPT. Although ChatGPT proves invaluable in offering insights and suggestions, it is imperative to acknowledge certain caveats. Not all responses provided by ChatGPT are infallible, especially in the context of technical domains like SAS programming. Emphasizing the importance of independent verification, this paper underscores the need for users, especially new learners of SAS, to scrutinize and validate the suggestions before implementation. This paper aims to empower SAS practitioners by showcasing how ChatGPT can complement their macro-writing endeavors. By highlighting both the potentials and limitations of leveraging AI language models like ChatGPT, this paper contributes to fostering a balanced and discerning approach towards utilizing AI-driven assistance in SAS programming and macro development.
AP-229 : Create a Shift Summary of Laboratory Values in CTCAE Grade to the Worst Grade Abnormal Value using R and SASSY System
Vicky Yuan, Incyte Corporation
Tuesday, 8:00 AM – 8:20 AM, Location: Key Ballroom 4
A shift summary of laboratory values in CTCAE grade to the worst-grade abnormal value is often required for most laboratory data analysis and submission. The purpose of a CTCAE grade shift table is to present how results vary from baseline to post-baseline visits in the study. This paper will illustrate how to report a shift table using R and packages from the SASSY system. It will start from an example and explain its anatomy, then give a step-wise explanation of how to report the table in a .doc file. The example is interesting because it contains “internal” footnotes that can change on every page. The R product used in this paper is the SASSY package version 1.2.0 running in an RStudio environment.
AP-252 : Externally Yours – Adeptly Managing Data Outside Your EDC System
Frank Canale, SoftwaRx, LLC
Monday, 3:00 PM – 3:20 PM, Location: Key Ballroom 4
Programmers in the pharmaceutical industry are used to working with data that is entered into, and extracted from, a system commonly known as an EDC (Electronic Data Capture) system. When using data that is sourced from one of these systems, you can reliably count on the type of data you’ll receive (normally SAS datasets) and, if the EDC is set up well, a standard structure that provides output data containing CDISC/CDASH variable names. But what does one do when receiving data that is sourced outside the EDC system and received from other vendors? How do you manage this data: retrieve it, validate its structure, and even export it to a format allowing you to merge it with other, more conventional SAS datasets?
AP-253 : Build Your Own PDF Generator: A Practical Demonstration of Free and Open-Source Tools
James Austrow, Cleveland Clinic
Monday, 5:00 PM – 5:20 PM, Location: Key Ballroom 4
The PDF is one of the most ubiquitous file formats and can be read on nearly every computing platform. So how, in the year 2024, can it still be so inconvenient to perform basic editing tasks such as concatenating and merging files, inserting page numbers, and creating bookmarks? These features are often locked behind paid licenses in proprietary software or require that the documents be uploaded to a web server, the latter of which poses unacceptable security risks. In fact, the PDF is a public standard and there exist free, open-source libraries that make it easy to build in-house solutions for these and many other common use cases. In this paper, we demonstrate how to use Python to assemble and customize PDF documents into a final, polished deliverable. We will also lay the foundation for automating these tasks, which can save countless hours on reports that have to be prepared on a regular basis.
AP-256 : Leveraging ChatGPT in Statistical Programming in the Pharmaceutical Industry
Ian Sturdy, Eli Lilly and Company
Tuesday, 3:00 PM – 3:20 PM, Location: Key Ballroom 4
This paper explores the potential benefits of incorporating ChatGPT, a state-of-the-art natural language processing model, in statistical programming within the pharmaceutical industry. By leveraging ChatGPT’s capabilities, this technology can save time, money, and most importantly, your sanity. Programming often leads to frustration, anxiety, and sleepless nights trying to solve complex problems. Various practical applications and techniques that harness the power of ChatGPT will be described to reduce all of these. In a world where Artificial Intelligence threatens to take our jobs, this paper suggests methods of tapping into the untapped potential of ChatGPT to empower programmers with innovative tools, thereby increasing our value. When programming issues arise, no longer will you need to worry about judgement or hostility from others on online forums. ChatGPT is a powerful tool we have yet to fully leverage, and its benefits extend well beyond our imaginations, let alone this paper.
AP-268 : A New Approach to Automating the Creation of the Subject Visits (SV) Domain
Xiangchen Cui, CRISPR Therapeutics
Jessie Wang, CRISPR Therapeutics
Min Chen, CRISPR Therapeutics
Tuesday, 1:30 PM – 2:20 PM, Location: Key Ballroom 4
The creation of the subject visits (SV) domain is one of the most challenging tasks of SDTM programming. Aside from the small portion of mapping from raw dataset variables to SV variables, SV programming mainly consists of a more complex derivation process, which is totally different from that of other SDTM domains. The dynamic parts of the SV programming process, such as identifying raw datasets and their variables with both date/time and clinical visits, cause manual development of a SAS program to be time-consuming and error-prone. Hence, automating its code generation would enhance efficiency and accuracy. This paper will present a new approach for SV automation based on the SDTM automation done in our previous paper, which leveraged CRF specifications from an EDC database and SDTM standards [1]. It will introduce the standard SV programming logic flow with 10 sequential steps, which led us to develop an additional SAS-based macro named %SV_Code_Generator as an expansion to the macro introduced in [1]. The output of this macro (SV.sas) achieves 100% automation of the SV domain for the raw data collected per CRFs in a clinical study. This new approach guarantees that all raw dataset variables related to subject visits are accounted for in SV programming, thanks to the sequential programming automation. This automation allows the generation of the SV dataset to occur very early in the programming development cycle and makes developing programmatic quality checks for clinical data review and data cleaning more efficient and economically feasible.
AP-289 : Programming with SAS PROC DS2: Experience with SDTM/ADaM
Jianfeng Wang, University of Minnesota Twin Cities, Minneapolis, Minnesota
Li Cheng, Vertex Pharmaceuticals Inc.
Tuesday, 2:30 PM – 2:50 PM, Location: Key Ballroom 4
PROC DS2 is a procedure introduced with SAS Base 9.4. This procedure provides opportunities for SAS programmers to apply Object-Oriented Programming (OOP) and multithreading techniques in SAS programming and is a critical connection between ‘traditional’ SAS programming and programming on the SAS Viya platform. The goal of this paper is to pilot the use of PROC DS2 in the work of preparing clinical trial CDISC datasets. In this paper, PROC DS2 is tested in the programming of SDTM/ADaM on a server with the SAS Base 9.4 M3 release. After converting SDTM/ADaM programs written in the ‘traditional’ SAS programming language into PROC DS2 code, this paper presents the lessons learned and the notes taken as the obstacles were overcome or bypassed. Furthermore, OOP and multithreading techniques are explored for application to SDTM/ADaM programming. Programming setups with a standard folder structure are discussed, and the performance of using OOP and multithreading techniques is also evaluated.
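A minimal PROC DS2 data program for orientation (a simple derived variable on SASHELP.CLASS, not the authors' SDTM/ADaM code):

```sas
/* A simple derived variable in DS2; not the authors' SDTM/ADaM code. */
proc ds2;
   data work.class_bmi / overwrite=yes;
      dcl double bmi having label 'Body Mass Index';
      method run();
         set sashelp.class;
         bmi = round(703 * weight / (height ** 2), 0.1);
      end;
   enddata;
   run;
quit;
```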
AP-295 : Replicating SAS® Procedures in R with the PROCS Package
David Bosak, r-sassy.org
Tuesday, 10:30 AM – 10:50 AM, Location: Key Ballroom 4
The “procs” package aims to simulate some commonly used SAS® procedures in R. The purpose of simulating SAS procedures is to make R easier to use and match statistical results. Another important motivation is to provide stable tools to work with in the pharmaceutical industry. The package replicates several of the most frequently used procedures, such as PROC FREQ, PROC MEANS, PROC TTEST, and PROC REG. The package also contains some data manipulation procedures like PROC TRANSPOSE and PROC SORT. This paper will present an overview of the package and provide demonstrations for each function.
AP-298 : Comparison of Techniques in Merging Longitudinal Datasets with Errors on Date Variable: Fuzzy Matching versus Clustering Analysis
Huitong Niu, Master of Science Student, Biostatistics, Fielding School of Public Health, University of California, Los Angeles
Yan Wang, Adjunct Assistant Professor, Public and Population Health, School of Dentistry, University of California, Los Angeles
Tuesday, 9:00 AM – 9:20 AM, Location: Key Ballroom 4
This paper examines effective techniques for merging longitudinal datasets with key variable inaccuracies, focusing on date errors. Traditional SAS methods, like the DATA Step MERGE or PROC SQL JOIN, require exact matches on key variables, which is challenging in datasets containing errors. Our paper compares fuzzy matching and clustering analysis within SAS, assessing their effectiveness in reconciling datasets with inconsistencies in date variables. We simulate a longitudinal dataset of approximately 2,000 observations, representing about 500 patients with repeated measurements. The dataset is used to simulate two datasets including normally (or uniformly) distributed errors on date, manually introduced errors (e.g., typing “12” as “21”), and missing date information (e.g., entering “06/23” instead of “12/06/2023”). For each scenario, we use fuzzy matching and clustering analysis to merge two datasets, evaluating the accuracy of each technique. Preliminary results show varied effectiveness depending on the type of error on the date variable. For datasets with normally (or uniformly) distributed errors on date, clustering analysis significantly outperforms fuzzy matching with a 94.9% accuracy rate compared to 54.1%. In the case of manually introduced errors, both methods achieve high accuracy, around 98%. However, for datasets with missing date information, fuzzy matching is more effective, attaining an 84.4% accuracy rate as opposed to 45.2% for clustering analysis. The paper concludes with a discussion of these findings, offering insights for researchers on selecting appropriate methods for merging datasets with errors on date.
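As a point of reference only, here is one generic flavor of fuzzy date matching in SAS (hypothetical datasets VISITS_A and VISITS_B keyed by PATID and VISDT); it is not the authors' simulation or clustering code:

```sas
/* Hypothetical datasets VISITS_A and VISITS_B (PATID, VISDT); accept the   */
/* closest date within a +/- 7-day tolerance rather than an exact match.    */
proc sql;
   create table matched as
   select a.patid,
          a.visdt as visdt_a format=date9.,
          b.visdt as visdt_b format=date9.,
          abs(a.visdt - b.visdt) as day_diff
   from visits_a as a inner join visits_b as b
     on a.patid = b.patid
    and abs(a.visdt - b.visdt) <= 7                            /* tolerance */
   group by a.patid, a.visdt
   having abs(a.visdt - b.visdt) = min(abs(a.visdt - b.visdt)); /* closest  */
quit;
```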
AP-349 : Just Stringing Along: FIND Your Way to Great User-Defined Functions
Richann Watson, DataRich Consulting
Louise Hadden, Abt Global Inc.
Monday, 11:30 AM – 11:50 AM, Location: Key Ballroom 4
SAS® provides a vast number of functions and subroutines (sometimes referred to as CALL routines). These useful routines are an integral part of the programmer’s toolbox, regardless of the programming language. Sometimes, however, pre-written functions are not a perfect match for what needs to be done, or for the platform that the required work is being performed upon. Luckily, SAS has provided a solution in the form of the FCMP procedure, which allows SAS practitioners to design and execute User-Defined Functions (UDFs). This paper presents two case studies for which the character or string functions SAS provides were insufficient for work requirements and goals, and demonstrates the design process for custom functions and how to achieve the desired results.
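A small user-defined character function built with PROC FCMP, shown for orientation; it is not one of the paper's case studies:

```sas
/* Illustrative user-defined string function; not one of the paper's case studies. */
proc fcmp outlib=work.funcs.strings;
   function initialcaps(instr $) $ 200;
      return (propcase(strip(instr)));
   endsub;
run;

options cmplib=work.funcs;

data _null_;
   name = initialcaps('  richann WATSON ');
   put name=;                      /* name=Richann Watson */
run;
```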
AP-361 : Efficient Repetitive Task Handling in SAS Programming Through Macro Loops
Chary Akmyradov, Arkansas Children’s Research Institute
Monday, 10:00 AM – 10:20 AM, Location: Key Ballroom 4
This paper delves into the optimization of repetitive tasks in SAS programming, a common challenge faced by data analysts and programmers. The primary focus is on harnessing the power of SAS macro programming techniques, specifically through the implementation of do loops within macros. Initially, the paper introduces the basics of SAS macros, outlining their significance in automating repetitive sequences of code, and providing a foundational understanding of macro variables and syntax. The discussion then progresses to the implementation of simple do loops within macros, highlighting their practicality in routine data manipulation tasks. Through a series of practical examples and use-case scenarios, the paper demonstrates the effectiveness of these loops in real-world applications. Addressing the limitations of these simple implementations, the paper further explores the generalization of do loops, presenting advanced methods to create dynamic, parameter-driven macros capable of handling a variety of tasks and parameters. This advanced approach is exemplified through complex scenarios and case studies, showcasing the adaptability and efficiency of generalized do loops in diverse data analysis contexts. By the conclusion, the paper provides a comprehensive insight into the role of macro programming in SAS, offering a valuable resource for SAS programmers seeking to streamline their coding workflow and enhance efficiency in data processing tasks. This work not only serves as a practical guide for current SAS users but also contributes to the broader conversation on the future of macro programming in data analysis.
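A bare-bones example of the parameter-driven %DO-loop pattern discussed (the dataset list and the PROC FREQ body are placeholders):

```sas
/* Placeholder analysis: run PROC FREQ over a space-delimited list of data sets. */
%macro run_freqs(dsets=);
   %local i ds;
   %do i = 1 %to %sysfunc(countw(&dsets, %str( )));
      %let ds = %scan(&dsets, &i, %str( ));
      proc freq data=sashelp.&ds;
         tables _character_ / missing;
      run;
   %end;
%mend run_freqs;

%run_freqs(dsets=class cars heart)
```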
AP-420 : Generation of Synthetic Data for Clinical Trials in Base SAS using a 2-Phase Discrete-Time Markov and Poisson Rare Event Framework
Adam Yates, Data Coordinating and Analysis Center (DCAC), HJF-MHRP
Misti Paudel, Brigham and Women’s Hospital Division of Rheumatology, Inflammation, and Immunity, Harvard School of Medicine
Fengming Hu, Data Coordinating and Analysis Center (DCAC), HJF-MHRP
Monday, 2:00 PM – 2:50 PM, Location: Key Ballroom 4
Synthetic data for clinical trials independent of human participants has growing utility in clinical and epidemiologic fields, but a persistent concern has been the viability and reliability of producing synthetic data which conforms to the complex nature of biomedical data. Recent successes in synthetic clinical trial data include the use of Synthetic Control Arm (SCA) applications, but the generation of treatment-related data necessarily faces additional scrutiny. While synthetic data cannot replace trial data for scientific discovery, the planning and development phases of clinical trials can benefit from the use of synthetic treatment and control data. This paper describes a novel program developed in Base SAS which generates synthetic data that was used in clinical trial development, design, and report programming. We developed a stochastically grounded process which generates synthetic data of population-specific enrollment characteristics, as well as longitudinal local and systemic reactogenicity, unsolicited events, and adverse events. We implement a discrete-time Markov process framework to generate longitudinal observation time, incorporating a Poisson-based probability of events within each state. This 2-phase stochastic generation process results in data across observation time which conform to biologically natural and realistic behaviors. Key to our process is that reaction frequency may be modulated based on expert experience or historical expectations, but the generated data do not rely directly on existing clinical data. Potential applications and extensions in a machine learning context will be discussed. This paper is intended for individuals with an interest in clinical trial data and a basic to intermediate command of SAS macro processing.
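A toy illustration of the general idea, with made-up transition probabilities, event rates, and follow-up window; it is not the authors' framework:

```sas
/* Made-up parameters throughout: 2 states, 28 days, Poisson event counts. */
data synth;
   call streaminit(20240514);
   do subjid = 1 to 100;
      state = 1;                                  /* 1=reactogenic, 2=quiescent */
      do day = 1 to 28;
         lambda  = ifn(state = 1, 0.40, 0.05);    /* daily event rate per state */
         n_event = rand('poisson', lambda);
         output;
         p_switch = ifn(state = 1, 0.30, 0.10);   /* transition probability     */
         if rand('uniform') < p_switch then state = 3 - state;
      end;
   end;
run;
```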
AP-424 : Adding the missing audit trail to R
Magnus Mengelbier, Limelogic AB
Monday, 8:30 AM – 8:50 AM, Location: Key Ballroom 4
The R language is used more extensively across the Life Science industry for GxP workloads. The basic architecture of R makes it near impossible to add a generic audit trail method and mechanism. Different strategies have been developed to provide some level of auditing, from logging conventions to file system audit utilities, but each has its drawbacks and lessons learned. The ultimate goal is to provide an immutable audit trail compliant with ICH Good Clinical Practice, FDA 21 CFR Part 11, and EU Annex 11, regardless of the R environment. We consider different approaches to implementing auditing functionality with R and how we can incorporate audit trail functionality natively in R or with existing and available external tools and utilities that completely support Life Science best practices, processes, and standard procedures for analysis and reporting. We also briefly consider how the same principles can be extended to other languages such as Python, SAS, Java, etc.
Data Standards
DS-109 : Analyzing your SAS log with user defined rules using an app or macro.
Philip Mason, Wood Street Consultants
Monday, 3:00 PM – 3:20 PM, Location: Key Ballroom 2
SAS provides some pretty basic help with the logs it produces, typically just linking to errors and warnings. Many people build log checkers to look for particular things of interest in their logs, which usually involves capturing the log and then running some SAS code against it. I made a way to define rules in JSON format which can be read by a SAS macro and used to look for things in a log. This means different rules can be used for different use cases. They can be used via a macro or via a web application I built. The web app can switch between rules, provides summaries, draws diagrams of the code, provides performance stats, and more. Hopefully this functionality might one day be built into SAS, but in the meantime it works well as an addition.
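A stripped-down illustration of the underlying idea (a small table of rule patterns scanned against a captured log); the paper's JSON rule format, macro, and web application are not reproduced here:

```sas
/* Rule patterns (regular expressions) kept as data; the paper stores these in JSON. */
data rules;
   infile datalines truncover;
   input severity $ 1-8 pattern $ 10-60;
   datalines;
ERROR    ^ERROR
WARNING  ^WARNING
NOTE     uninitialized
NOTE     repeats of BY values
;
run;

/* Read a captured log (path is hypothetical) and flag matching lines. */
data loglines;
   infile "/logs/ae_tables.log" truncover;
   input line $300.;
   lineno = _n_;
run;

proc sql;
   create table findings as
   select r.severity, l.lineno, l.line
   from loglines as l, rules as r
   where prxmatch(cats('/', r.pattern, '/i'), l.line) > 0
   order by l.lineno;
quit;
```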
DS-130 : SDTM Specifications and Datasets Review Tips
Wanchian Chen, AstraZeneca
Monday, 4:30 PM – 4:50 PM, Location: Key Ballroom 2
SDTM requirements are spread across various sources such as the SDTM Implementation Guide (SDTMIG) domain specifications section, the SDTMIG domain assumptions section, and the FDA Study Data Technical Conformance Guide. While Pinnacle 21 can assist in identifying issues with SDTM data, it is important to note that data is often limited at the early stages of a study. The most efficient process would be to review SDTM specifications before the creation of SDTM programs, to minimize program modifications and save time. Programmers often seek guidance on conducting a comprehensive review of SDTM but are unsure where to start. In this presentation, I will provide a concise summary of frequently seen findings, both domain-specific and general, observed in multiple studies when reviewing SDTM. I will show which issues can be seen in the Pinnacle 21 report and which ones are missed. I will also cover situations where variables are not applicable to your study but may still pass Pinnacle 21 checks. This presentation is designed to benefit programmers involved in the SDTM review process.
DS-150 : Assurance in the Digital Age: Automating MD5 Verification for uploading data into a Cloud based Clinical Repository
Laura Elliott, SAS Institute Inc.
Ben Bocchicchio, SAS
Monday, 1:30 PM – 1:50 PM, Location: Key Ballroom 2
Utilization of a cloud-based repository has become increasingly common with large clinical trials. Verifying the integrity of data moved into the cloud for clinical trials is of utmost importance. Normally, this process requires manual intervention to verify that the local source data matches the data stored in the cloud-based system. This paper discusses a process that automates the creation of a verification report comparing MD5 checksums from source to destination. The process, written in Python, generates a .csv file of checksums from the source data, then uses an input file containing the folder paths to be uploaded to the cloud via REST APIs to migrate the data. The source MD5 checksums are also uploaded. The Python code then calls the REST APIs to execute a script in the cloud which compares the source and destination MD5s using SAS code. The result of the process is a .pdf report that summarizes the comparison of the source and destination MD5 checksums. This process offers a completely automated way to prove data integrity for migration of local source data into a cloud-based clinical repository.
DS-154 : Exploit the Window of Opportunity: Exploring the Use of Analysis Windowing Variables
Richann Watson, DataRich Consulting
Elizabeth Dennis, EMB Statistical Solutions, LLC
Karl Miller, IQVIA
Monday, 2:30 PM – 2:50 PM, Location: Key Ballroom 2
For analysis purposes, dataset records are often assigned to an analysis timepoint window rather than simply using the visits or timepoints from the collected data. The rules for analysis timepoint windows are usually defined in the Statistical Analysis Plan (SAP) and can involve complicated derivations to determine which record(s) best fulfils the analysis window requirements. For traceability, there are ADaM standard variables available to help explain how records are assigned to the analysis windows. This paper will explore these ADaM variables and provide examples on how they may be applied.
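A hedged sketch of populating the ADaM windowing variables, with invented window boundaries; the rule for selecting among multiple records per window is left to the SAP:

```sas
/* Window boundaries are invented; ADLB and ADY are assumed to exist. */
data adlb_win;
   set adlb;
   length awrange $20 awu $10;
   awu = 'DAYS';
   if      1  <= ady <= 10 then do; awrange='1-10';  awtarget=7;  awlo=1;  awhi=10; end;
   else if 11 <= ady <= 21 then do; awrange='11-21'; awtarget=14; awlo=11; awhi=21; end;
   else if 22 <= ady <= 35 then do; awrange='22-35'; awtarget=28; awlo=22; awhi=35; end;
   if not missing(awtarget) then awtdiff = abs(ady - awtarget);
   /* When several records land in one window, the SAP rule (e.g., closest */
   /* to AWTARGET, latest if tied) selects the analysis record.            */
run;
```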
DS-188 : Automated Harmonization: Unifying ADaM Generation and Define.xml through ADaM Specifications
Wei Shao, Bristol Myers Squibb
Xiaohan Zou, Bristol Myers Squibb
Monday, 10:30 AM – 10:50 AM, Location: Key Ballroom 2
In electronic submission packages, ADaM datasets and Define.xml stand as pivotal components. Ensuring consistency between these elements is critical. However, despite this importance, the current method still heavily depends on manual checks. To address this challenge, we introduce an innovative automated approach driven by ADaM specifications. Our solution involves a suite of SAS® macros engineered to streamline the translation from ADaM specifications to both ADaM datasets and Define.xml. These macros orchestrate a seamless automation process, facilitating the generation of ADaM datasets while concurrently fortifying consistency between ADaM datasets and Define.xml. The automated processes include format creation, core variable addition, variable attribute generation, dynamic length adjustment based on actual values, and automatic ADaM specification updates from actual data. These macros act as dynamic tools, constructing datasets with precision, adjusting variable attributes, and, most importantly, syncing Define.xml with actual data. Our automated tool system not only expedites ADaM dataset creation but also ensures inherent consistency with Define.xml. This amalgamation of automation and specification-based integrity significantly reduces manual errors, enhances data quality, and fortifies the efficiency of the submission process.
DS-193 : Around the Data DOSE-y Doe, How Much Fun Can Your Data Be: Using DOSExx Variables within ADaM Datasets
Inka Leprince, PharmaStat, LLC
Richann Watson, DataRich Consulting
Tuesday, 11:30 AM – 11:50 AM, Location: Key Ballroom 2
In the intricate dance of clinical trials that involve multiple treatment groups and varying dose levels, subjects pirouette through planned treatments – each step assigned with precision. Yet, in the realms of pediatric, oncology, and diabetic trials, the challenge arises when planned doses twirl in the delicate arms of weight adjustments. How can data analysts choreograph Analysis Data Model (ADaM) datasets to capture these nuanced doses? There is a yearning to continue with the normal dance routine of analyzing subjects based on their protocol-specified treatments, yet at times it is necessary to learn a new dance step, so as not to overlook the weight-adjusted doses the subjects actually received. The treatment variables TRTxxP/N in the Subject-Level Analysis Dataset (ADSL) and their partners TRTP/N in Basic Data Structure (BDS) and Occurrence Data Structure (OCCDS) are elegantly designed to ensure each treatment glides into its designated column in the summary tables. But we also need to preserve the weight-adjusted dose level on a subject- and record-level basis. DOSExxP and DOSExxA, gracefully twirl in the ADSL arena, while their counterparts, the dashing DOSEP and DOSEA, lead the waltz in the BDS and OCCDS datasets. Together, these harmonious variables pirouette across the ADaM datasets, capturing the very essence of the weight-adjusted doses in a dance that seamlessly unfolds.
DS-204 : ADaM Discussion Topics: PARQUAL, ADPL, Nadir
Sandra Minjoe, ICON PLC
Tuesday, 2:00 PM – 2:20 PM, Location: Key Ballroom 2
This paper and presentation will cover three topics that have been under varying levels of discussion within the CDISC ADaM team but are not part of the standard. First is the parameter-qualifier variable PARQUAL, which can be found in a couple of Therapeutic Area User Guides (TAUGs) and went out for public review as part of ADaMIG v1.2, but currently breaks BDS rules because it never made it into a final publication. Second is ADPL, a one-record-per-subject-per-participation dataset that might be useful for studies where subjects can enroll more than once or have multiple screening attempts, similar to the proposed SDTM DC domain. Third is Nadir variables, like Change from Nadir and Percent Change from Nadir, which are not currently allowed in a BDS structure. In each case, the paper and presentation will summarize CDISC ADaM team discussions and give personal (not CDISC-authorized) recommendations of when and how to implement these concepts in order to meet analysis needs.
DS-205 : A New Way to Automate Data Validation with Pinnacle 21 Enterprise CLI in LSAF
Crystal Cheng, SAS
Tuesday, 2:30 PM – 2:50 PM, Location: Key Ballroom 2
Pinnacle 21 Enterprise is software that checks data for compliance with CDISC standards, controlled terminology, and dictionaries when users prepare a clinical data submission to regulatory agencies. By validating clinical data early and frequently during the conduct of the clinical trial, it helps users discover and address data issues in advance, ensuring the quality of submission data. There are different ways to execute validations in P21 Enterprise. Users can either manually run the validation via the P21 user interface or, for a more automated process, execute a process flow in SAS Life Science Analytics Framework (LSAF) to invoke the Enterprise Command Line Interface (ECLI) from P21. Integrating LSAF with P21 and setting up the validation process via a process flow saves programmers time and is less prone to errors during packaging and uploading datasets for P21 validation. This paper will focus on the detailed steps to set up the automated process flow of the Pinnacle 21 validation in SAS Life Science Analytics Framework (LSAF) and explore the benefits of automating the validation process.
DS-271 : Programming Considerations in Deriving Progression-Free Survival on Next-Line Therapy (PFS2)
Alec McConnell, BMS
Yun Peng
Tuesday, 8:00 AM – 8:20 AM, Location: Key Ballroom 2
Historically, oncology clinical trials have relied on Overall Survival (OS) and Progression-Free Survival (PFS) as primary efficacy endpoints. While OS is often the most desired estimate, it requires many years of follow-up to derive an unbiased estimate from the study. Additionally, even with follow-up, OS estimates are subject to confounding due to subsequent therapies, which are commonplace in the treatment of cancer. As a proxy for OS, the EMA has recommended the evaluation of Progression-Free Survival 2 (PFS2). According to the EMA, “PFS2 is defined as the time from randomization (or registration, in non-randomized trials) to second objective disease progression, or death from any cause, whichever first.” Despite this definition, PFS2 requires complex data collection and derivation. Within our oncology team at Bristol-Myers Squibb (BMS), different studies approach the derivation differently. In this paper, we will share how our team at BMS collects the relevant data to derive the PFS2 endpoint with a consistent approach in both the advanced and early settings. Furthermore, we will explain how we structure our ADaM datasets to assist in our derivation of the endpoint.
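For orientation only, a generic time-to-event shape for such an endpoint (variable names like PD2DT and LSTASSDT are hypothetical; this is not BMS's collection or derivation approach):

```sas
/* PD2DT (second objective progression), DTHDT, LSTASSDT, RANDDT are assumed. */
data adtte_pfs2;
   set adsl;
   length paramcd $8 param $60;
   paramcd = 'PFS2';
   param   = 'Progression-Free Survival on Next-Line Therapy (Days)';
   evtdt = min(pd2dt, dthdt);                /* earlier of 2nd progression/death */
   if not missing(evtdt) then do;
      cnsr = 0; adt = evtdt;                 /* event                            */
   end;
   else do;
      cnsr = 1; adt = lstassdt;              /* censor at last adequate assessment */
   end;
   aval = adt - randdt + 1;
run;
```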
DS-274 : Guidance Beyond the SDTM Implementation Guide
Kristin Kelly, Pinnacle 21 by Certara
Michael Beers, Pinnacle 21
Monday, 9:00 AM – 9:20 AM, Location: Key Ballroom 2
A common misconception among preparers of SDTM data seems to be that it is sufficient to just follow the SDTM Implementation Guide when creating the datasets. The truth is that it is more complicated than that. A preparer of SDTM datasets needs to be aware of all the industry guidance available when preparing for regulatory submission, not only from CDISC and the regulatory agencies, but from other organizations as well. This presentation will discuss some of the lesser-known guidance documents in the industry and why they should be referenced, as well as some of the impacts of not using these documents in the creation of SDTM datasets.
DS-276 : Your Guide to Successfully Upversioning CDISC Standards
Soumya Rajesh, CSG LLC, an IQVIA Business
Tuesday, 8:30 AM – 8:50 AM, Location: Key Ballroom 2
As of 2023, newer versions of the CDISC standards (i.e., SDTM v2.0, SDTMIG v3.4, SDTM v1.7, SDTM IG v3.3, and Define.xml v2.1) are either required or supported by the industry’s regulatory agencies. This paper relays challenges and best practices the authors have experienced while up-versioning to these standards. Not all these practices are found in published standards. This paper will bring together the resources and lessons learned in one place, so that readers can skillfully navigate through the challenges of adopting these new standards. Highlights include strategies for dealing with value level metadata for variables with multiple codelist references, a new domain class, new domains, and domains referenced in TAUGs not seen in the IGs. We’ll discuss best practices for data modeling: when to use new variables, supplemental qualifiers, and targeting the appropriate domains. We’ll include experiences interpreting and dispositioning validation output from the applicable conformance rules.
DS-280 : I Want to Break Free: CRF Standardization Unleashing Automation
Laura Fazio, Formation Bio
Andrew Burd, Formation Bio
Emily Murphy, Formation Bio
Melanie Hullings, Formation Bio
Tuesday, 9:00 AM – 9:20 AM, Location: Key Ballroom 2
Achieving efficient and impactful Case Report Form (CRF) standardization in the pharmaceutical industry demands intense cross-functional collaboration and a shared understanding of the benefits. This foundation is crucial for improved data quality as well as downstream analysis and reporting automation. Deviations from standards cause manual review, increased errors, and added inefficiencies in downstream code development. To address these challenges, an internal Standards Committee led by Data Management and Systems Analytics teams was formed to gain diverse cross-functional alignment through a comprehensive charter. The charter mandates that study teams adhere to standards during study startup, with deviations requiring justification and approval from the Committee. While CRF standards are typically developed by Medical and Clinical teams, we additionally include roles with a focus on downstream analysis and reporting including our Data Science, Statistical Programming, and Clinical Analytics teams. This paper advocates for an inclusive approach to standards development, emphasizing that resulting datasets should be versatile for all downstream purposes. Such an approach unlocks the power of automation, minimizes reactivity, and fosters efficiency and continuity across clinical studies.
DS-287 : ADaM Design for Prostate Cancer Efficacy Endpoints Based on PCWG3
Lihui Deng, Bristol Myers Squibb
Kylie Fan, BMS
Jia Li, BMS
Tuesday, 10:30 AM – 10:50 AM, Location: Key Ballroom 2
Unlike other types of solid tumors that use the RECIST 1.1 tumor response criteria, due to the particularity of prostate cancer, some common oncology efficacy endpoints, such as rPFS, ORR, time to response, and duration of response are usually based on the PCWG3 criteria. Additionally, other specific prostate cancer endpoints like PSA response rate and time to PSA progression are also based on PCWG3, involving more complex data collection and derivation than RECIST 1.1. In this paper, we will share efficacy endpoints in prostate cancer, such as PSA response and time to PSA progression. We will explain the ADaM design and data flow, and how to ensure traceability and data dependency in derivation. We successfully implemented programming for these complex endpoints, enhancing the speed and quality of effective analysis through the development of macros.
DS-305 : Guideline for Creating Unique Subject Identifier in Pooled studies for SDTM
Vibhavari Honrao, NMIMS University, Mumbai
Monday, 11:30 AM – 11:50 AM, Location: Key Ballroom 2
The Demographics dataset is the parent dataset which includes a set of essential standard variables that describe each subject in a clinical study. One of these key variables is the Unique Subject Identifier (USUBJID). The SDTM IG does not provide any guidance on the creation of USUBJID for pooled studies, so it becomes necessary for statistical programmers to understand the programming steps involved. In clinical trials, there are cases wherein subjects are re-enrolled in different studies for the same compound, and it can be difficult to identify the subject while maintaining CDISC compliance. For ISS analysis, pooling of studies becomes challenging due to multiple SUBJID, RFICDTC, RFSTDTC, RFENDTC, etc. within the same USUBJID from different studies. This paper demonstrates the steps and programming logic involved in developing the Demographics dataset, using hypothetical examples from multiple studies to create pooled datasets.
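One common construction, shown for illustration only and not as a CDISC-mandated rule (librefs are hypothetical, and the input demographics are assumed not to contain USUBJID yet):

```sas
/* Hypothetical librefs STUDY1/STUDY2 with raw demographics (no USUBJID yet). */
data dm_pooled;
   length usubjid $40;
   set study1.dm_raw study2.dm_raw;
   usubjid = catx('-', studyid, subjid);     /* e.g., ABC-101-1001              */
   /* Linking the same person re-enrolled across studies still needs a         */
   /* sponsor-level key (e.g., a screening ID captured in both studies).       */
run;
```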
DS-310 : Converting FHIR to CDASH using SAS
Pritesh Desai, SAS
Mary Liang, SAS
Monday, 8:00 AM – 8:20 AM, Location: Key Ballroom 2
With the growing diversity of standards for collecting and presenting Real World Evidence (RWE), there is an escalating demand for the conversion of these standards into more actionable datasets. This paper demonstrates the transformation from FHIR (Fast Healthcare Interoperability Resource) to CDASH using various methods within SAS Viya. The outlined methods are easily adaptable to other standards or datasets initially presented in JSON format. Moreover, recognizing the need for accessible processes, we will highlight the creation of low/no code procedures to enhance access to these updated datasets, including the transformation of conversion work into SAS Viya Custom Steps.
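One low-code way to land a FHIR JSON bundle in SAS tables is the JSON LIBNAME engine; the file name below is hypothetical, and the generated member names depend on the bundle's structure:

```sas
/* File name is hypothetical; generated member names depend on the bundle. */
filename fhir "/data/raw/patient_bundle.json";
libname fhirlib json fileref=fhir;

proc datasets library=fhirlib nolist;    /* inspect the tables the engine built */
   contents data=_all_;
run;
quit;

data work.entries;
   set fhirlib.alldata;                  /* flattened path/value view of the JSON */
   where p1 = 'entry';
run;
```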
DS-342 : CDISC Therapeutic Area User Guides and ADaM Standards Guidance
Karin LaPann, CDISC independent contractor
Monday, 8:30 AM – 8:50 AM, Location: Key Ballroom 2
One of the frequently overlooked yet immensely valuable resources for implementing standards is the set of CDISC Therapeutic Area User Guides (TAUGs). Presently the CDISC website hosts 49 of these guides, 23 of which incorporate ADaM sections. These guides are created by groups of CDISC standards volunteers across the industry and include medical professionals and researchers with experience in the respective disease areas. In the first few years of development, these TAUGs concentrated on the collection of the data and the implementation of the SDTM to contain it. In 2014 the first TAUG with an analysis section using ADaM was published. Many TAUGs are now developed with additional implementation of the analysis datasets, with ADaM-compliant examples. This provides a utility to the programming community by illustrating how the SDTM datasets are further arranged for analysis. The latest initiative has been to expand these TAUGs through grants from organizations representing various diseases. One of these is the recently released Rare Diseases Therapeutic Area User Guide, partially sponsored by a grant from the National Organization for Rare Disorders (NORD) https://rarediseases.org/. This paper will describe the TAUGs developed with ADaM standards, highlighting their distinctions from prior versions. We will suggest how to use the TAUGs as a reference for conducting studies within various disease areas.
DS-353 : Protocol Amendments and EDC Updates: Downstream impact on Clinical Trial Data
Anbu Damodaran, Alexion Pharmaceuticals
Ram Gudavalli, Alexion Pharmaceuticals
Kumar Bhimavarapu, Alexion Pharmaceuticals
Tuesday, 11:00 AM – 11:20 AM, Location: Key Ballroom 2
This paper investigates the impact of continuous database updates during ongoing studies, particularly emphasizing EDC migrations and Protocol amendments. Through examination of practical examples, it reveals the cascading effects on CDISC datasets, as well as the resulting modifications in reporting. Moreover, the paper scrutinizes the downstream impacts of subject transfers across studies or sites, uncovering intricacies related to re-screening subjects who initially did not meet inclusion/exclusion criteria. By unraveling the complexities of these processes, the paper offers valuable insights to improve data integrity and ensure compliance with regulatory guidelines in clinical research.
DS-360 : A quick guide to SDTM and ADaM mapping of liquid Oncology Endpoints.
Swaroop Kumar Koduri, Ephicacy Lifescience Analytics Pvt Ltd
Shashikant Kumar, Ephicacy Lifescience Analytics
Sathaiah Sanga, Ephicacy Lifescience Analytics
Tuesday, 10:00 AM – 10:20 AM, Location: Key Ballroom 2
Cancer is a disease where some of the body’s cells mutate, grow out of control, and spread to other body parts. The mutated cells possess the ability to infiltrate and destroy healthy body tissue all over the body. Liquid tumors (blood cancers) commonly occur in the bone marrow and the lymphatic system. In oncology clinical trials, response and progression are key to measuring survival and remission rates. In accordance with the response criteria guidelines, oncology studies are divided into one of three subtypes. The first subtype, the Solid Tumor study, usually follows RECIST (Response Evaluation Criteria in Solid Tumors) or irRECIST (immune-related RECIST). The second subtype, the Lymphoma study, usually follows Cheson 1997 or 2007. Lastly, Leukemia studies follow study-specific guidelines (e.g., IWCLL for Chronic Lymphocytic Leukemia). This paper will focus on the blood cancers (Lymphoma and Leukemia) and will show, with examples, which SDTM and ADaM domains are used to collect the different data points in each type. It will also show how standards are used to capture disease response and how CDISC streamlines the development of clinical trial artifacts in liquid oncology studies.
DS-367 : Handling of Humoral and Cellular Immunogenicity Data in SDTM
Wei Duan, Moderna Therapeutics
Tuesday, 1:30 PM – 1:50 PM, Location: Key Ballroom 2