PharmaSUG 2025 Paper Presentations
Paper presentations are the heart of a PharmaSUG conference. Here is the list of confirmed paper selections so far. Papers are organized into 15 academic sections and cover a variety of topics and experience levels.
Note: This information is subject to change. Last updated 28-May-2025.
Sections
Advanced Programming
Artificial Intelligence and Machine Learning
Data Standards
Data Visualization and Reporting
Hands-On Training
| Paper No. | Author(s) | Paper Title (click for abstract) |
| HT-012 | Kirk Lafler | SAS® Macro Programming Tips and Techniques |
| HT-115 | Bart Jablonski & Quentin McMullen | Route Sixty-Six to SAS Programming, or 63 (+3) Syntax Snippets for a Table Look-up Task, or How to Learn SAS by Solving Only One Exercise! |
| HT-187 | Louise Hadden | The (ODS) Output of Your Desires: Creating Designer Reports and Data Sets |
| HT-190 | Toshio Kimura & Siqi Wang & Weiming Du & Songgu Xie | AI Performing Statistical Analysis: A Major Breakthrough in Clinical Trial Data Analysis |
| HT-353 | Troy Hughes | What’s black and white and sheds all over? The Python Pandas DataFrame, the Open-Source Data Structure Supplanting the SAS® Data Set |
| HT-377 | James Joseph | Scoring Real-World Data Reliability for Clinical Investigations |
| HT-397 | Phil Bowsher | Trying Out Positron: New IDE for Statistical Programming |
| HT-398 | Bhavin Busa | Hands-on Training: eTFL Portal & TFL Designer Community |
| HT-399 | Jim Box | Introduction to SAS Viya |
Leadership and Professional Development
Metadata Management
R, Python, and Open Source Technologies
Real World Evidence and Big Data
Solution Development
Statistics and Analytics
Strategic Implementation & Innovation
Submission Standards
e-Posters
Abstracts
Advanced Programming
AP-002 : Worried about that Second Date with ISO®? Using PROC FCMP to Convert and Impute ISO 8601 Dates to Numeric Dates
Richann Watson, DataRich Consulting
Monday, 9:00 AM – 9:20 AM, Location: Aqua Salon AB
Within the life sciences, programmers frequently have to deal with dates in order to determine durations and number of days from a reference point. Within Study Data Tabulation Model (SDTM) domains, Clinical Data Interchange Standards Consortium (CDISC) has implemented the use of the International Organization for Standardization (ISO) format, ISO® 8601, for datetimes. While this helps to standardize how dates and times are captured so that there is no confusion as to what the date represents, it does not help from an analysis perspective. Within SDTM, these ISO dates are character dates and sometimes have missing components. For analysis purposes, the dates need to be converted to numeric dates, and sometimes imputation of partial dates is needed. There are formats that can convert a complete ISO date to a numeric date, but what if the date is a partial date? While SAS® has myriad functions, there is no function that will help with this conversion and imputation. Fortunately, with the use of the FCMP procedure, we are able to create our own custom functions to help achieve our desired goal. This paper illustrates the process of building a custom function that will take a date that is captured in the appropriate ISO format in SDTM (–DTC) and convert it to a numeric date while also imputing any missing components. In addition, it will set the corresponding date imputation variable (–DTF) to the correct level.
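To make the idea concrete, the following is a minimal PROC FCMP sketch of this kind of conversion and imputation routine. It is not the author’s function: the subroutine name iso2num, the impute-to-first-of-period rule, and the test values are illustrative assumptions.

```sas
/* Minimal illustrative sketch (not the paper's implementation): convert a   */
/* possibly partial ISO 8601 --DTC value to a numeric date, imputing missing */
/* month/day to 01 and returning an ADaM-style imputation flag (--DTF).      */
proc fcmp outlib=work.funcs.dates;
   subroutine iso2num(dtc $, adt, adtf $);
      outargs adt, adtf;
      adtf = ' ';
      yyyy = input(scan(dtc, 1, '-'), ?? 4.);
      mm   = input(scan(dtc, 2, '-'), ?? 2.);
      dd   = input(scan(dtc, 3, '-'), ?? 2.);
      if missing(yyyy) then adt = .;        /* nothing usable to impute from */
      else do;
         if missing(mm) then do; mm = 1; adtf = 'M'; end;
         if missing(dd) then do; dd = 1; if adtf = ' ' then adtf = 'D'; end;
         adt = mdy(mm, dd, yyyy);
      end;
   endsub;
run;

options cmplib=work.funcs;

data example;
   length aestdtc $19 aestdtf $1;
   input aestdtc $;
   call iso2num(aestdtc, astdt, aestdtf);   /* full, month-level, year-level */
   format astdt yymmdd10.;
   datalines;
2024-05-17
2024-05
2024
;
```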
AP-006 : Calculating the Physical Length of a PDF File String with ‘Times New Roman’ Font by SAS
Yueming Wu, Astex Pharmaceuticals
Steven Li, Medtronic PLC.
Monday, 2:00 PM – 2:10 PM, Location: Aqua Salon AB
When using the PROC REPORT procedure with the ODS PDF statement for output table/listing generation, people often wish to know the ‘physical length’ of a string, rather than the number of characters in the string. The physical length of a string can be used to decide the column’s minimum width without word wrapping or overlapping into the next column. This paper presents a custom SAS function, ‘pwidth()’, created through PROC FCMP to calculate the physical length of a string in the ‘Times New Roman’ font in a PDF file. It may be useful for generating a clean FDA BIMO (Bioresearch Monitoring) PDF listing.
AP-033 : Four roads lead to Outputting Datasets Dynamically (ODDy) & Beyond
Jason Su, Daiichi Sankyo, Inc.
Monday, 1:30 PM – 1:50 PM, Location: Aqua Salon AB
Outputting Datasets Dynamically (ODDy) is a common programming task: partitioning an input table based on its contents on the fly. Although the WHERE statement can be used to create subset datasets individually, the most efficient way might be to create them all with just one pass of the source (master) dataset using OUTPUT statements. Then the question is: how do we automate the whole process and create these OUTPUT statements dynamically? Here three popular methods are provided: MACRO, FILENAME+%INCLUDE, and the CALL EXECUTE routine. Additionally, the paper gives two variants of the first solution, namely a macro program and a macro function. Finally, besides the three generalized solutions, there is a special solution just for ODDy, which is to output with the HASH object. More importantly, from the perspective of data-driven programming, the paper summarizes the advantages and disadvantages of each technique and gives a short insight into the circumstances that suit each one. In the end, readers will be more familiar with these powerful solutions for data-driven tasks and their primary differences, and when facing other data-driven programming tasks in the future, they can immediately pick the best of the four techniques based on the specific circumstances.
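As a flavor of the CALL EXECUTE variant, here is a minimal sketch that splits SASHELP.CLASS by SEX in a single pass of the source table; the dataset and the generated statement layout are illustrative assumptions, not the paper’s code.

```sas
/* Build one generated DATA step with an OUTPUT statement per BY value, so   */
/* the source table is read only once.                                       */
proc sort data=sashelp.class(keep=sex) out=levels nodupkey;
   by sex;
run;

data _null_;
   length dslist ifcode $2000;
   retain dslist ifcode;
   set levels end=eof;
   dslist = catx(' ', dslist, cats('out_', sex));
   ifcode = catx(' ', ifcode,
                 cats('if sex="', sex, '" then output out_', sex, ';'));
   if eof then call execute(
      'data ' || strip(dslist) || '; set sashelp.class; ' ||
      strip(ifcode) || ' run;');
run;
```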
AP-035 : Develop a SAS® Macro for Dataset/Program Timestamp Version Control
Chen Wang, Merck
Monday, 2:15 PM – 2:25 PM, Location: Aqua Salon AB
In the context of various clinical trial study deliverables, programmers who support study CSR, eDMC, etc., are tasked with managing a multitude of files, including programs, datasets, and Tables, Listings, and Figures (TLFs). Ensuring version control of these study files is increasingly crucial, presenting a growing challenge for the clinical study team in maintaining and tracking version control. The absence of a tool for checking version control can easily lead to issues such as copying or updating incorrect files. To address this challenge, this paper introduces a SAS macro that checks file consistency by comparing modified timestamps and produces a comprehensive list of files between the source and target directories, including their sub-directories. Implementing this macro will empower study programmers to effectively monitor file versioning and avoid potential errors.
AP-037 : Listing Shell to SAS Program Automation Tool
Balaji Ayyappan, ICON PLC
Johnny Tai, Kite Pharma, Inc
Monday, 4:00 PM – 4:20 PM, Location: Aqua Salon AB
Listings in clinical reports are a required part of the documentation for clinical trials and are used for statistical review by regulatory authorities. We created a tool that can automatically generate the SAS programs to produce multiple listings when we pass it the TFL shell document (Tables, Figures, and Listings shells). The tool can identify the listing shell pages from the shell and uses the shell annotation information, metadata, and value-level information from the datasets while generating the SAS code. SAS and VBA applications were used to develop this tool. Based on the annotated variables, the tool can, for example, subset the required subjects from the analysis dataset using the analysis flag in the title, convert character or numeric date variables to the company-specified character date format for reporting, and automatically determine column widths in the PROC REPORT procedure and the page-number variable that controls the number of lines printed on each page. If any column reports two or more variables, the tool derives the concatenated variable based on the symbol present in the column header. If the dataset contains Y/N values while the shell shows Yes/No, the tool reads both the dataset and the shell information and automatically generates formats. If any column requires customized coding, that can also be included in the code using keywords. This tool can evolve; we can add more features to it, and it can be extended to produce standard tables and figures.
AP-039 : How Not to SAS: Avoiding Common Pitfalls and Bad Habits
Melodie Rush, SAS
Tuesday, 11:30 AM – 11:50 AM, Location: Aqua Salon AB
Using SAS software effectively isn’t just about knowing the syntax: it’s about developing habits and strategies that enhance your efficiency and accuracy. In this 50-minute session, we’ll take a humorous look at some of the worst practices in SAS programming, curated with the help of ChatGPT, and explain why these missteps can lead to frustration, inefficiency, or even disastrous results. By highlighting these “bad tips,” we’ll demonstrate best practices to avoid these mistakes, organize your work, optimize performance, and debug like a pro. Whether you’re a beginner or an experienced SAS user, this session will give you practical advice to sharpen your skills and streamline your SAS journey.
AP-053 : Implementing Data Checks for Patient Registries — A Modular Approach
John Gerlach, Navitas Life Sciences
Elizabeth Loisel, Sanofi
Monday, 4:30 PM – 4:50 PM, Location: Aqua Salon AB
The pharmaceutical industry collects data on patients to evaluate the safety and effectiveness of their products. These patient registries contain copious amounts of longitudinal data on thousands of patients, collected over many years. In order to ensure data integrity, numerous data checks must be performed such that any data issues are reported to Data Management. This paper explains a modular approach to implementing data checks that facilitates extensibility and includes macros whose names have a suffixed macro variable, which greatly decreases the lines of SAS® code.
AP-059 : Validate the Code, not just the data: A system for SAS program validation
Jayanth Iyengar, Data Systems Consultants LLC
Tuesday, 9:00 AM – 9:20 AM, Location: Aqua Salon AB
Regardless of the industry they work in, SAS programmers are focused on validating data, and devote a considerable amount of attention to the quality of data, whether it’s raw source data, submitted SAS data sets, or SAS output, including figures and listings. No less important is the validity of code and the SAS programs which extract, manipulate, and analyze data. Although code validity can be assessed through the SAS log, there are other ways to produce metrics on code validity. This paper introduces a system for SAS program validation which produces useful information on lines of code, number of data steps, total run time and CPU time, and other metrics for project-related SAS programs.
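A stripped-down sketch of one such metric pass is shown below; it simply reads a program file as text and counts lines and step boundaries. The path and the pattern list are placeholders, and the paper’s system goes well beyond this.

```sas
/* Rough code metrics for a single program: total lines plus counts of DATA  */
/* and PROC step statements (the file path is a placeholder).                */
data code_metrics;
   infile "/myproject/programs/adsl.sas" truncover end=eof;
   input line $char256.;
   n_lines + 1;                                   /* sum statements retain */
   if prxmatch('/^\s*data\s/i', line) then n_datasteps + 1;
   if prxmatch('/^\s*proc\s/i', line) then n_procs + 1;
   if eof then output;
   keep n_lines n_datasteps n_procs;
run;
```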
AP-078 : Writing SAS MACROs in R? R functions can help!
Chen Ling, AbbVie
Yachen Wang, AbbVie
Monday, 10:00 AM – 10:20 AM, Location: Aqua Salon AB
Duplicated programming is an annoying task that greatly impacts work efficiency. To avoid repetitive work, standard macros are developed in SAS. Currently, many companies are exploring the transition from SAS to R, and programmers are starting to embrace R functions, similar to SAS macros, to handle repetitive programming and hence reduce workload and improve efficiency. This paper provides a detailed comparison of SAS MACROs and R functions, demonstrating how R functions can serve as an effective alternative for creating dynamic, reusable and robust code. In this paper, the first section explores foundational aspects, comparing SAS MACROs and R functions in terms of overall structure, calling methods, and the usage of parameters or arguments. This section also discusses lazy evaluation in setting default values, scope of variables, and implementing conditional logic. Example SAS and R code is provided to illustrate the functionality. The second section offers practical demonstrations using R functions for generating Adverse Event (AE) tables, which are crucial in clinical trial reporting. This paper aims to help readers understand the unique advantages of using R for automating their reporting tasks and provides practical examples to guide readers to develop R functions from scratch, thus facilitating a smooth transition from SAS to R.
AP-081 : Efficiency Techniques in SAS® 9.
Stephen Sloan, Dawson D R
Tuesday, 3:00 PM – 3:20 PM, Location: Aqua Salon AB
Using space and time efficiently has always been important to organizations and programmers. We want to be able to use available space without having to obtain new servers or other resources, and without having to delete variables or observations to make the SAS data sets fit into the available space. We also want our jobs to run more quickly, both to reduce waiting times and to ensure that scheduled jobs finish on time and successor jobs are not unnecessarily delayed. As we move toward cloud computing, efficiency will become even more important because billing algorithms in cloud environments charge for every byte and CPU second. Sometimes we are in a hurry to get our jobs done on time, so we don’t pay attention to efficiency; sometimes we don’t know how much time and space our jobs will use; and sometimes we’re asked to go into existing jobs and make changes that are seemingly incremental but can cause large increases in the amount of required space and/or time. Finally, there can be jobs that have been running for a long time and we take the attitude “if it ain’t broke, don’t fix it” because we don’t want the programs to stop working, especially if they’re not well-documented. With a reasonably good knowledge of SAS® Base, there are things that can help our programs optimize the use of space and time and run more quickly without causing any loss of observations or variables and without changing the results of the programs.
AP-097 : Better, Faster, Stronger: A Case Study in Updating SAS Macro Code with Run-time Optimization in Mind
Valerie Finnemeyer, Harvard TH Chan School of Public Health
Monday, 10:30 AM – 10:50 AM, Location: Aqua Salon AB
Often, macro code is first written with a particular purpose and then “macrotized” to be used more generally. Over time, additional features are tacked onto the original code, resulting in “Frankencode” that does the job, but not necessarily in the best way. In an ideal world, this is the point at which you might start over from scratch, building a new version of the code from the ground up. However, with a large, complex macro, going back to the drawing board might not be practical. Worse, it may not offer enough improvement to justify the rewriting time. This paper provides a framework for how to tackle updating these complex macros with a particular goal in mind: run-time efficiency. This framework is introduced and applied to an example macro written at CBAR to perform and aggregate descriptive statistics from a CDISC ADaM BDS or ADSL dataset in order to generate publication-ready summary tables. We walk through the entire process, from brainstorming an initial list of potential improvements, to conducting experiments to evaluate their impact, and ultimately deciding on an approach to update the macro that minimizes macro programmer effort while maximizing efficiency gain for users. While this case study takes place using SAS code, the framework is applicable to all programming languages.
AP-121 : SET Statement Considered Harmful
Bart Jablonski, yabwon
Monday, 11:30 AM – 11:50 AM, Location: Aqua Salon AB
The SET statement is the primary way SAS programmers access data (I would bet $5 it covers over 90% of cases). Since data processing in SAS is I/O oriented, the majority of optimization can be done by reducing I/O, i.e., by the proper use of the SET statement (and of course also MERGE, INPUT, DATA=, etc.) in the code. The article is aimed at beginner SAS programmers and is education focused. In the paper we present a group of examples showing how (mis)use of the SET statement can be (and is) harmful for data processing and how the grim situation can be fixed.
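As a small illustration of the kind of SET-statement I/O point the paper makes, compare the two steps below; the library and variable names (ADAM.ADAE, AESER) are assumptions for the example.

```sas
/* Wasteful: every variable and every row is pushed through the PDV, and the */
/* subsetting IF discards most of them afterwards.                           */
data ser_ae_all;
   set adam.adae;
   if aeser = 'Y';
run;

/* Leaner: KEEP= limits the variables brought in and WHERE= filters rows as  */
/* they are read, so far less data moves through the step.                   */
data ser_ae;
   set adam.adae(keep=usubjid aedecod astdt aeser
                 where=(aeser = 'Y'));
run;
```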
AP-141 : A New Gateway to Open Source in SAS Viya
Jim Box, SAS Institute
Mary Dolegowski, SAS
Monday, 3:00 PM – 3:20 PM, Location: Aqua Salon AB
PROC GATEWAY is a new procedure in SAS Viya that allows you to run code written in other programming languages directly from within SAS. We will show how GATEWAY can be used to integrate R and Python code in your SAS programs, allowing you to take advantage of new open source procedures to solve more complex coding tasks. We’ll also show you how to take advantage of Viya’s cloud capabilities to submit open source programs in a way that takes advantage of multi-threading, allowing you to really leverage large data or complex analyses.
AP-151 : Sugar Rush: SQL Master Class for Pharma Professionals
Charu Shankar, SAS Institute
Tuesday, 2:00 PM – 2:20 PM, Location: Aqua Salon AB
Join Charu Shankar, a seasoned SAS presenter, for a dynamic presentation exploring SQL’s sweet role in pharmaceutical data analysis, with a focus on diabetes clinical trials. This session tackles industry-specific challenges using sugar-rich datasets like DCCT, NHANES, and TDS, offering hands-on practice with real-world glucose data. Participants will learn to query, summarize, and report clinical data, including HbA1c levels, demographics, and treatment outcomes. Key skills covered: – Data Exploration: Sifting through the sugar to assess diabetes progression. – Data Preparation: Blending datasets into a smooth analytical base. – Advanced SQL: Sweetening your queries with averages, correlations, and metrics. – Data Reporting: Serving up clear insights on treatment outcomes. By the end, attendees will confidently analyze clinical trial data, uncover trends in sweetness, and drive data-driven decisions in diabetes research.
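For readers new to PROC SQL in this setting, a query of the kind the session covers might look like the sketch below; the ADaM-style dataset and variable names are illustrative assumptions.

```sas
/* Summarize Week 24 HbA1c by planned treatment (names are illustrative).    */
proc sql;
   create table hba1c_summary as
   select trt01p,
          count(distinct usubjid) as n_subjects,
          mean(aval)              as mean_hba1c format=8.2,
          std(aval)               as sd_hba1c   format=8.2
   from adam.adlb
   where paramcd = 'HBA1C'
     and avisit  = 'Week 24'
   group by trt01p
   order by trt01p;
quit;
```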
AP-157 : Macro to Compare ADaM Spec Derivations Against Available SDTM Data
Ingrid Shu, Merck
Kexin Guan, Merck & Co., Inc.
Jeff Xia, Merck
Tuesday, 10:00 AM – 10:10 AM, Location: Aqua Salon AB
An ADaM specifications file establishes the groundwork for developing ADaM datasets while maintaining CDISC standards. The specifications are comprehensive, and of these, the ADaM variable derivation rules are particularly critical and susceptible to errors. Ensuring traceability between the source data (SDTM) and the analysis data is a core principle of ADaM specifications. Oftentimes, the programmer will use a global standard ADaM spec and adjust the derivation rules to fit their specific study needs. This process can be time-intensive and requires extensive cross-checking between specification derivations and the actual available values of raw SDTM data. This SAS macro offers a solution by detecting most, if not all, cases where the specification derivations are incongruent with the existing SDTM data. It systematically reviews the entire ADaM specification and identifies value mismatches by using regular expressions. If incorrect derivations are overlooked, it can lead to severe downstream consequences in the quality of data analysis. This tool can greatly assist programmers in verifying the accuracy of the ADaM specifications and reduce the likelihood of such errors occurring.
AP-161 : Get Tipsy with Debugging Tips for SAS® Code: The After Party
Lisa Mendez, Army MEDCOM
Richann Watson, DataRich Consulting
Tuesday, 4:30 PM – 4:50 PM, Location: Aqua Salon AB
You’ve spent hours crafting your SAS® code, or perhaps you’ve inherited a jumbled mess from someone else, and when you run it, it “explodes” like an ill-fated cocktail experiment and leaves a bad taste in your mouth! Mastering the art of debugging not only makes coding a smoother experience but also spares you from needing a stiff drink to cope with the frustration. This paper will discuss tips, tricks, and techniques for debugging your SAS code. It will show you how to use the debugger in both SAS Display Manager and SAS Enterprise Guide, akin to a bartender knowing just the right tools to mix the perfect drink. It will also describe how to use PUTLOG and PUT statements, along with their key differences, like choosing between a whiskey neat and a whiskey sour. Additional debugging aids such as using OPTIONS, code versioning, and macro variables will be discussed, ensuring you have a well-stocked bar of techniques at your disposal. As a bonus, the paper will also provide some common error messages and their meanings: your personal cocktail menu of debugging solutions.
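A tiny example of the PUTLOG technique mentioned above is shown here; the dataset and variable names follow common ADaM conventions but are assumptions for illustration.

```sas
/* Write targeted diagnostics to the log while scanning a dataset.           */
data _null_;
   set adam.adsl;
   if missing(trt01p) then
      putlog 'WARNING: planned treatment is missing for ' usubjid=;
   if n(trtsdt, trtedt) = 2 and trtedt < trtsdt then
      putlog 'ERROR: treatment end precedes start for ' usubjid= trtsdt= trtedt=;
run;
```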
AP-162 : Mining Data from PDF Files Using SAS
Michael Stout, Johnson & Johnson Medical Device Companies
Brian Knepple, J&J MedTech Orthopaedics
Tuesday, 10:30 AM – 10:40 AM, Location: Aqua Salon AB
SAS is a powerful software tool and can read and process data from multiple file formats. SAS can efficiently read non-SAS datasets, such as Text, EXCEL, and CSV files. It is much more challenging to read and process files in PDF (portable document format) format. This paper describes an approach and pitfalls of mining data embedded in PDF files. Reading and processing data from PDF files can be challenging, but the gain is worth the pain. This process has reduced the amount of time and improved accuracy of manually collating data for summary reports.
AP-192 : A Macro to Automatically Check SAS Logs for Common Issues
Hailin Yu, Clinical Outcomes Solutions
Tuesday, 10:15 AM – 10:25 AM, Location: Aqua Salon AB
Manually reviewing SAS logs can be time-consuming and error-prone, especially in complex clinical trial programming. This paper presents a macro that automatically scans SAS logs for common issues like errors, warnings, unresolved macro variables, and merge statements with missing BY variables. By automating log checking, programmers can save time and ensure high-quality outputs.
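The core of such a log check can be sketched in a few lines, as below; the log path and the list of patterns are illustrative, and the paper’s macro covers more cases.

```sas
%macro checklog(logfile=);
   /* Read a saved log and keep lines matching common problem patterns. */
   data log_issues;
      infile "&logfile" truncover;
      input logline $char400.;
      if index(logline, 'ERROR:')                        or
         index(logline, 'WARNING:')                      or
         index(logline, 'uninitialized')                 or
         index(logline, 'MERGE statement has more than') or
         index(logline, 'Apparent symbolic reference')   then output;
   run;

   proc print data=log_issues noobs;
      title "Potential issues found in &logfile";
   run;
%mend checklog;

%checklog(logfile=/myproject/logs/adae.log)
```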
AP-211 : You’ve Got Options: Five-Star SAS® System Option Hacks
Louise Hadden, Cormac Corporation
Monday, 8:30 AM – 8:50 AM, Location: Aqua Salon AB
SAS® provides myriad opportunities for customizing programs and processes, including a wide variety of system options that can control and enhance SAS code from start to finish. This paper and presentation demonstrate methods of obtaining information on SAS system options, and move on to fully explicate ten SAS system option hacks, from COMPRESS to XWAIT. System options are highly dependent on platforms, security concerns, SAS versions and products: dependencies and defaults will be discussed. SAS practitioners will gain a deeper understanding of the powerful SAS system options they’ve seen, used, and automatically included in their code. This presentation is suitable for all skill and experience levels; platform differences are part of the discussion.
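Two quick, generic ways to inspect a system option (here COMPRESS, one of the options the paper discusses) are shown below as a simple starting point.

```sas
/* Documentation-style information about one option ...                      */
proc options option=compress define value;
run;

/* ... and capturing its current value programmatically.                     */
%let current_compress = %sysfunc(getoption(compress));
%put NOTE: COMPRESS is currently set to &current_compress;
```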
AP-213 : Programmatic Annotation of Case Report Forms
Matthew Finnemeyer, Vertex Pharmaceuticals Inc.
Monday, 11:00 AM – 11:20 AM, Location: Aqua Salon AB
Annotation of Case Report Forms (CRFs) is a critical task in clinical trials and can often be a time-consuming process; manual creation or import of annotations as Comments to a PDF can be tedious and error prone. This paper outlines a robust macro suite that: imports pre-existing PDF annotations into the SAS environment, parses key information for user review and modification, “up-versions” annotations to align with the latest standards, aligns annotations with their “target page” of the Destination CRF, and exports the finalized annotations to a Forms Data Format (.FDF) file for direct import by the user. This suite has been shown to reduce internal programmer-hours for CRF annotation by 50% or more in the creation of new or modified aCRFs.
AP-217 : Automating SAS Program Header Updates with Macros
Kexin Guan, Merck & Co., Inc.
Tuesday, 4:00 PM – 4:20 PM, Location: Aqua Salon AB
Maintaining accurate and comprehensive documentation in pharmaceutical programming is essential for audit trails and program traceability. As clinical programming projects increase in complexity, manually updating program headers becomes a challenging and tedious task. Programmers often face difficulties in tracking multiple input datasets, output files, and macro calls across various SAS programs. This paper presents a solution as a macro designed to automate the generation and updating of program headers. The macro retrieves essential metadata such as input datasets, macro calls, program outputs, logs, and program flows from existing SAS programs, and seamlessly integrates them into the program header. The macro can process individual files or entire directories, providing flexibility across diverse programming environments. Other key features of this macro include automatic version date updates, preservation of revision history, and selective management of existing header information. By automating the header generation process, this tool reduces manual effort, minimizes errors, and ensures up-to-date information, significantly enhancing documentation efficiency and accuracy in programming workflows.
AP-226 : Advanced ODS Excel Tricks: Password Protected Workbooks and Clickable Navigation
Jeffrey Meyers, Regeneron Pharmaceuticals
Tuesday, 2:30 PM – 2:50 PM, Location: Aqua Salon AB
The Microsoft Excel destination was added to the SAS output delivery system (ODS) in version 9.4M3 and offered an improvement over the EXCELXP tagset destination. The Excel destination allows SAS users to create workbooks in the XLSX format instead of the XML format, which reduces the file size and unlocks additional functionality. This paper focuses on two advanced tricks when using the Excel destination: password protection and clickable navigation. The Excel destination can create a “protected” workbook but does not include an option to set a password, which means any user can simply remove the protection from within Excel. This paper will show techniques to add password protection to the Excel file and how to selectively choose which worksheets or columns are protected. The second focus of this paper, clickable navigation, allows the user to click on a cell within a worksheet to jump to another cell anywhere in the workbook, leading to very fluid and dynamic reports.
AP-238 : When PROC SORT Isn’t Enough: Implementing Horizontal Sorting in SAS
Krutika Parvatikar, Merck & Co, Inc.
Jeff Xia, Merck
Monday, 2:30 PM – 2:40 PM, Location: Aqua Salon AB
Sorting data is a fundamental task in data management and analysis, often accomplished in SAS using the robust PROC SORT procedure. However, unique data scenarios occasionally arise where PROC SORT alone may not suffice, particularly when sorting values at the record level, i.e., sorting variables horizontally within an observation in a dataset. This paper explores the implementation of algorithms in SAS to address such challenges. We demonstrate how bubble sort can efficiently sort arrays in memory, enabling precise control over sorting logic and customization. We also demonstrate how array sorting can be used to achieve the same goal. One compelling real-world application in clinical research involves assembling citations for Tables, Listings, and Figures (TLFs). This process entails creating a structured citation that lists all ADaM or SDTM datasets used to generate a specific TLF in a clear, well-defined format. By incorporating hyperlinks to each dataset, the citation allows reviewers to access the relevant datasets with a single click, streamlining the review process and enhancing data traceability. This innovative approach not only improves efficiency but also supports better collaboration and transparency during regulatory submissions. This paper highlights the versatility of SAS as a programming tool, emphasizing innovation in handling complex data scenarios. Attendees will gain insight into how tailored sorting techniques can enhance efficiency, expand SAS’s capabilities, and address niche data requirements in the life sciences industry.
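For orientation, a compact sketch of horizontal (within-observation) sorting is shown below: SAS’s built-in CALL SORTN next to a hand-rolled bubble sort. The input dataset HAVE with SCORE1-SCORE5 is an assumption for the example.

```sas
data sorted;
   set have;
   array s {5} score1-score5;
   array b {5} bub1-bub5;

   /* keep a copy so both approaches can be shown side by side */
   do i = 1 to dim(s);
      b[i] = s[i];
   end;

   /* built-in routine: sorts the values across variables within the row */
   call sortn(of s[*]);

   /* bubble sort on the copy: handy when custom comparison rules are needed */
   do i = 1 to dim(b) - 1;
      do j = 1 to dim(b) - i;
         if b[j] > b[j+1] then do;
            _t     = b[j];
            b[j]   = b[j+1];
            b[j+1] = _t;
         end;
      end;
   end;
   drop i j _t;
run;
```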
AP-263 : Why SASSY?
David Bosak, r-sassy.org
Tuesday, 8:30 AM – 8:50 AM, Location: Aqua Salon AB
R has several weaknesses with regard to its use in the pharmaceutical industry. It lacks many of the basic capabilities of SAS®. It is also numerically inconsistent with SAS. R is furthermore conceptually very different from SAS, and quite difficult for a SAS programmer to learn and use. Lastly, the stability and backward compatibility enjoyed by SAS programmers generally do not exist in R. Most R package developers do not give priority to these values. The purpose of this paper is to explain how the SASSY system of R packages can solve all of these problems. The paper will first clarify the above problems. It will then give a quick overview of the SASSY ecosystem. Finally, the paper will elaborate on how SASSY addresses the above issues in a very simple and direct way. The end result is a small, stable set of R packages that replicates the basic functionality of SAS, without any of the weaknesses normally associated with R.
AP-279 : Using the ODS EXCEL Engine to Create Customized MS Excel Workbooks
Marlene Scott, AtriCure, Inc.
Tuesday, 10:45 AM – 10:55 AM, Location: Aqua Salon AB
Ever been asked to provide data to non-SAS programmers? This paper proposes using Base SAS tools, namely the DATA step, PROC SQL, and PROC REPORT, to create customized MS Excel workbooks to display data. The paper will provide additional details on options programmers can use to create multiple spreadsheets, ensure better control over how text is displayed, add filters to columns, freeze row headers, modify column widths, and control how data are displayed in cells utilizing TAGATTR attributes along with PROC REPORT. Additionally, this paper will discuss ways to selectively highlight results using the COMPUTE/ENDCOMP block within PROC REPORT. Finally, this paper will also present a novel way of creating metadata for SAS datasets, describing how to use PROC SQL and DICTIONARY.COLUMNS to dynamically create data dictionaries of the dataset attributes typically generated using PROC CONTENTS. A step-by-step approach will be used to demonstrate the programming concepts outlined in the paper. All programs displayed in this paper will be developed using Base SAS, v9.4.
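A hedged, minimal example of the ODS EXCEL features the paper covers (sheet naming, frozen headers, autofilters, and a TAGATTR cell format) is sketched below using SASHELP.CLASS; the file path is a placeholder.

```sas
ods excel file="/myproject/output/class_report.xlsx"
          options(sheet_name="Demographics"
                  frozen_headers="on"
                  autofilter="all");

proc report data=sashelp.class nowd;
   columns name sex age height weight;
   define name   / display "Name";
   define weight / display "Weight (lbs)"
                   style(column)={tagattr='format:#,##0.0'};
run;

ods excel close;
```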
AP-312 : How Not to Drown in Code When Pooling Data
Scott Burroughs, Orbis Clinical
Tuesday, 11:00 AM – 11:10 AM, Location: Aqua Salon AB
At some point in our pharma careers, we’ll likely have to pool data from multiple studies. It could be for FDA-required deliverables such as an ISS/ISE, or for an internal initiative such as an aggregated database to potentially be used for experimental research. Even in the CDISC/ADaM world, the same analysis data set can (unfortunately) vary widely across studies. That means that each data set may not have the same set of variables and/or variables may not have been derived the same way for every study. Can we pool such variant data sets in an efficient manner and, while we’re at it, maybe make the process robust to changes?
AP-347 : Last Observation Carried Forward (LOCF) in Longitudinal Patient Studies: A Functional Approach to Imputing Missing Values Using PROC FCMP, the SAS® Function Compiler
Troy Hughes, Data Llama Analytics
Tuesday, 1:30 PM – 1:50 PM, Location: Aqua Salon AB
Last observation carried forward (LOCF) is a ubiquitous method of imputing missing values in longitudinal studies, and is commonly implemented when a subject misses a scheduled visit and data cannot be collected (or generated). In general, the last “valid” value from a previous visit is retained for the later visit for which the data could not be obtained, and this conservative estimation succeeds in cases where the actual value would have been little changed. Study protocols may stipulate which prior values count as “valid” (e.g., after the start of treatment) as well as for how long (e.g., how many days, visit weeks, consecutive missed assessments) a value can be used to impute other values. Given these complexities, LOCF solutions implemented in SAS® historically adopt a procedural approach, and often require multiple DATA steps and/or procedures to impute data both within subjects and across observations. Conceptually, however, a functional approach can be envisioned in which LOCF could be calculated using a function call, delivering the same functionality through a single line of code while hiding (abstracting) the complexity of the calculation inside the function’s definition. The FCMP procedure enables SAS practitioners to build user-defined functions, even those that perform inter-observation calculations, and this text demonstrates a user-defined subroutine that dynamically calculates LOCF while relying on CDISC and ADaM standards, data structures, and nomenclature. The software design concepts herein are adapted from the renowned SAS Press book: PROC FCMP User-Defined Functions: An Introduction to the SAS® Function Compiler (Hughes, 2024).
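For contrast with the FCMP-based functional approach described above, the conventional procedural LOCF pattern the paper refers to looks roughly like the RETAIN-based step below; the BDS-style names are illustrative and the details are not the paper’s code.

```sas
proc sort data=adam.adlb out=adlb_srt;
   by usubjid paramcd avisitn;
run;

data adlb_locf;
   length dtype $8;
   set adlb_srt;
   by usubjid paramcd avisitn;
   retain _lastval .;
   if first.paramcd then _lastval = .;       /* reset per subject/parameter   */
   if not missing(aval) then _lastval = aval;
   else if not missing(_lastval) then do;
      aval  = _lastval;                      /* carry last observation forward */
      dtype = 'LOCF';                        /* flag the imputed record        */
   end;
   drop _lastval;
run;
```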
AP-367 : SAS Macro to Calculate Standardized Mean Difference
Tong Zhao, LLX Solutions, LLC
Tian Gu, LLX Solutions, LLC
Monday, 2:45 PM – 2:55 PM, Location: Aqua Salon AB
The standardized mean difference (SMD) is a measure of effect size between two groups. It is widely used in externally controlled clinical trials, comparing the baseline characteristics of the treated group and the propensity score (PS)-weighted or PS-matched control group. This paper details the methodology for calculating SMD for both continuous and binary variables in such analyses. Additionally, we provide a SAS macro that implements these calculations directly from the underlying formulas, offering researchers a flexible alternative to standard procedures.
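The balance-diagnostic formulas such a macro typically implements are easy to state; the small worked step below shows them with purely illustrative numbers (it is not the authors’ macro).

```sas
data smd_example;
   /* continuous: d = (mean1 - mean2) / sqrt((sd1**2 + sd2**2) / 2) */
   mean1 = 52.3; sd1 = 10.1;     /* treated group (illustrative values)     */
   mean2 = 49.8; sd2 = 11.4;     /* weighted/matched control (illustrative) */
   smd_cont = (mean1 - mean2) / sqrt((sd1**2 + sd2**2) / 2);

   /* binary: d = (p1 - p2) / sqrt((p1*(1-p1) + p2*(1-p2)) / 2) */
   p1 = 0.42; p2 = 0.37;
   smd_bin = (p1 - p2) / sqrt((p1*(1 - p1) + p2*(1 - p2)) / 2);
run;
```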
Artificial Intelligence and Machine Learning
AI-136 : AI-Assisted Transition: from SAS to RStudio for PK Summary Analysis
Shiqi Lin, Merck & Co., Inc., Rahway, NJ, USA
Monday, 5:00 PM – 5:10 PM, Location: Indigo 204
SAS has traditionally been the primary tool for data analysis and reporting in the pharmaceutical industry. However, the increasing popularity of RStudio and its open-source nature have driven a need for program migration. Manual migration is time-consuming, error-prone, and requires significant expertise. This paper presents an AI-assisted approach to streamline the transition from SAS to RStudio. Using a pharmacokinetic (PK) summary table program as a case study, we decompose a SAS macro into key modules, convert it to RStudio code with the help of an AI tool, and compare results between SAS and RStudio to assess the accuracy of this approach.
AI-137 : Traceability and AI for Improved Understanding, Communication and QC of Clinical Trials
Tomás Sabat Stofsel, Verisian
Monday, 8:30 AM – 8:50 AM, Location: Indigo 204
Data standardization and clinical trial code generation, validation and maintenance are resource- and time-intensive. Statistical code complexity is enormous, as one study may contain dozens of files with thousands of cross-references. This makes understanding how data is analyzed in a study non-trivial. In this talk, we explain how we can turn a trial’s SAS analysis logs and metadata into a graph that powers logical inference and AI to provide a new level of study understanding and validation. We discuss how code traceability can enable existing processes across the statistical analysis value chain, spanning study building, results validation, specification validation, study pooling, and Define-XML automation, vastly simplifying processes across a complex stakeholder landscape.
AI-194 : Visualize High Dimension Data Using t-SNE
Jun Yang, Avidity Bioscience
Tuesday, 10:00 AM – 10:20 AM, Location: Indigo 204
I introduce a novel technique named “t-SNE”, designed for visualizing high-dimensional data by assigning each data point a location in a two- or three-dimensional map. This method, a variant of Stochastic Neighbor Embedding (Hinton and Roweis, 2002), offers improved ease of optimization and generates markedly superior visualizations by embedding points from the high-dimensional space into the low-dimensional one while trying to preserve each point’s neighborhood. t-SNE excels in creating a unified map that unveils structures across various scales, a crucial attribute when dealing with high-dimensional data distributed across distinct yet interrelated low-dimensional manifolds, such as images depicting objects from diverse classes and viewpoints.
AI-209 : Integrating Generative AI into Medical Writing: Building an Interactive Drafting and Search Framework
Tadashi Matsuno, Shionogi & Co., Ltd.
Monday, 1:30 PM – 1:50 PM, Location: Indigo 204
The increasingly lengthy clinical development timelines have elevated the need to shorten the medical writing process, particularly for key documents like Clinical Study Reports (CSRs) and protocols. At Shionogi, we aim to enhance drafting efficiency by 15-25% through an in-house Generative AI solution deployed on a cloud platform. This solution leverages a Large Language Model (LLM) and references organizational templates and historical trial documents stored in a secure cloud environment. However, due to issues such as hallucination, relying solely on LLMs to produce perfect documents is unrealistic. Therefore, we are developing a web-based front-end application that enables medical writers to interactively review and refine AI-generated text, focusing on creating only the initial draft and leaving finalization to human oversight. In parallel, we are advancing the development of a document search application that vectorizes CSRs, protocols, and regulatory meeting notes, enabling natural language queries for various use cases within the Development organization. Through these initiatives, we plan to embed Generative AI at earlier stages of clinical development planning. This presentation will delve into our technical framework, demonstrate how human intervention complements automated processes, and discuss the key insights and challenges we have encountered thus far.
AI-214 : Generating Synthetic Clinical Trial Data with AI: Methods, Challenges, and Insights
Max Ma, Everest Clinical Research Corporation
Weijie Yang, Everest Clinical Research Inc.
Emily Ma, University of Toronto
Monday, 9:00 AM – 9:20 AM, Location: Indigo 204
In the clinical trial industry, the ability to generate realistic synthetic data is essential for programming preparation, validation, data simulation, and training. Artificial Intelligence (AI) offers powerful tools to achieve this by generating structured and meaningful synthetic data that aligns with predefined specifications. This paper discusses the use of Python and OpenAI’s API to generate synthetic clinical trial data, utilizing the Pydantic library to enforce structured outputs. A comprehensive workflow is presented, covering the entire process from specification preparation to exporting data for Electronic Data Capture (EDC) systems or further analysis. All the examples referenced in this paper can be accessed at the following GitHub repository: https://github.com/Emily93643/AI-Synthetic-Data.
AI-239 : Gen AI Assisted Code Conversion: From SAS to R Standard ADaM Templates
Jeff Cheng, Merck & Co., Inc.
Srinivas Malipeddi, Merck & Co., Inc.
Gurubaran Veeravel, Merck & Co., Inc.
Jaime Yan, Merck
Suhas Sanjee, Merck
Monday, 11:30 AM – 11:50 AM, Location: Indigo 204
Generative AI (Gen AI) holds significant promise for streamlining the conversion of code between programming languages. Given the recent shift towards open-source languages in clinical trial Analysis & Reporting (A&R), we evaluated Gen AI’s ability to convert a SAS standards library to R. Our global standards library includes template programs in SAS that serve as starting points for creating ADaM datasets and involve minimal SAS macro code. Thus, we focused on using Gen AI to convert SAS-based ADaM templates to R, with the aim of providing study programmers hands-on experience in validating ADaM datasets with R. This transition also facilitates statisticians’ review and understanding of ADaM dataset creation, as they are generally more comfortable with R. So far, we have converted approximately 70% of the ADaM templates in our global standards library, achieving a 60% reduction in manual code conversion efforts and improving the consistency and accuracy of the output. This not only enhances the adaptability of statistical programmers but also streamlines operations by reducing the need for additional resources through Gen AI. In this paper, we will discuss our approach, the challenges we faced, and how we successfully navigated them to achieve our objectives.
AI-240 : Automation of Trial Design Domains generation using AI
Prasoon Sangwan, TCS
Monday, 3:00 PM – 3:20 PM, Location: Indigo 204
Trial Design Domains are a crucial component of the Study Data Tabulation Model (SDTM), specifically designed to encapsulate study design information. These datasets play a vital role in creating a comprehensive trial design knowledge base, offering a quick overview of key design features such as treatment comparisons, subject criteria, and trial pathways. Processing Trial Design Domains can be quite challenging, as most of the necessary information is derived from the protocol, an unstructured document, rather than from the data collected directly from subjects. This paper will explore how Artificial Intelligence (AI) and Machine Learning (ML) can streamline the generation of these datasets by extracting data from the protocol, Case Report Forms (CRFs), and various other SDTM domains. It will detail how an ensemble of models can be employed to classify different sections of the protocol and extract specific elements pertinent to each trial design domain. Additionally, the paper will discuss the introduction of the Unified Study Definition Model (USDM) to facilitate the creation of Trial Design domains, strengthening the path toward digital data flow.
AI-242 : Translation from SAS® to R Using ChatGPT®
David Bosak, r-sassy.org
Brian Varney, Experis
Monday, 10:00 AM – 10:50 AM, Location: Indigo 204
Recent years have seen an explosion of use cases for generative AI. People are using AI to do research, write papers, summarize articles, and write code. At first, the quality of the output was not very good. Yet with each new release, the output is getting better and better. It is now feasible to perform tasks that previously would not be considered. In this paper, we will explore the topic of code translation from SAS® to R using ChatGPT®. The paper will first provide an overview of how to perform such a translation. We will then elaborate on a list of tips and tricks to make the machine translation better. Finally, we will point out some limitations of the technology and tell you what to watch out for. The paper will hopefully present you with enough information to try translating some code yourself.
AI-246 : Incorporating LLMs into SAS Workflows
Samiul Haque, SAS Institute
Sundaresh Sankaran, SAS Institute
Monday, 2:00 PM – 2:20 PM, Location: Indigo 204
In this work, we explore how Python integration with SAS offers a seamless way to incorporate Large Language Models (LLMs) into SAS Studio. By leveraging Python’s extensive libraries and the robust capabilities of the SAS platform, developers can embed LLM frameworks into their projects without disrupting existing workflows. We introduce a Retrieval Augmented Generation (RAG) framework tailored for SAS Studio, enabling straightforward adoption by any SAS Viya user. Our approach details how the core components of the RAG architecture (retrieval, augmentation, and generation) can be deployed within the SAS environment using both proprietary and open-source technologies. We also highlight how this flexible architecture addresses diverse use cases, including adverse event detection, structured data extraction, data mapping, protocol search, and patient matching. By uniting Python’s advanced language processing capabilities with SAS’s powerful analytics, data scientists can enhance their projects with more efficient and intelligent data handling. Finally, we provide a comprehensive blueprint for customizing RAG solutions in SAS Studio, allowing practitioners to adapt the framework to their specific needs. By following this guide, data scientists and developers can harness the full potential of LLMs in the SAS ecosystem, ultimately enabling deeper insights and more scalable analytics solutions, all within a familiar SAS workflow.
AI-286 : Get Started with ChatGPT and DeepSeek in SAS
James Austrow, Cleveland Clinic
Monday, 2:30 PM – 2:50 PM, Location: Indigo 204
If you’ve been feeling like a wallflower at the LLM party, this paper is for you. Whether you’re intimidated by the jargon, a SAS programmer marooned in a sea of Python tutorials, or simply don’t know where to begin, start here. Follow along with this walkthrough to build, from scratch, the foundations of an AI-powered SAS application. You’ll get properly acquainted with the language model lingo: prompt engineering, zero-shot, few-shot, chain-of-thought, and prompt chaining; all receive practical treatment. Don’t just have them explained, see how you can apply them to solve a real data processing problem. The AI landscape is constantly evolving. Two of the most well-known language model services, ChatGPT and DeepSeek, receive specific focus here, but the concepts are generalizable and apply to all the major providers. Advanced SAS users stand to get the most out of this paper, but anyone, even non-programmers, with an interest in prompt engineering can benefit as well.
AI-299 : Leveraging Gen AI (ChatGPT, Gemini) API for Advanced Biometric Analysis in SAS, Python and R
Kevin Lee, Clinvia
Monday, 11:00 AM – 11:20 AM, Location: Indigo 204
Since its release, ChatGPT has rapidly gained popularity, reaching 100 million users within 2 months. Not only can we use ChatGPT for prompting, but we can also build applications and code using its API. The paper explores the integration of Gen AI (e.g., OpenAI, Gemini) APIs into biometric data analysis workflows using the SAS, Python, and R programming languages. The paper will display the workflow of Gen AI integration and provide sample code for SAS, Python, and R programming. First, the paper will show how to create an API key in OpenAI (paid) and Gemini (free version). Second, it will show how to create the input data containing the prompts and questions for Gen AI. Third, it will create code (e.g., SAS, Python, R) to read the input data and to generate output data. Fourth, it will show how to read the output data to obtain the results from ChatGPT. The paper also presents a document-based question-answering system leveraging LangChain, Retrieval-Augmented Generation (RAG), and OpenAI models. It will show how we could build a simple chatbot for the ADaM IG. Finally, the paper will explore the potential use cases of Gen AI applications to revolutionize the biometrics function, outlining innovative solutions and strategies for leading this technological advancement, such as code conversion, code generation, prompt-driven data visualization, content development (e.g., SAP, CSR, narratives), AI-agent-driven automation, and many more. Furthermore, it will address the inherent risks associated with Generative AI-generated output and propose mitigation strategies.
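To give a sense of the SAS side of such an integration, the sketch below posts one prompt to the OpenAI chat completions endpoint with PROC HTTP and reads the JSON reply with the JSON libname engine. The model name and prompt are placeholders, the API key is assumed to be available in a macro variable (&openai_key), and this is an illustration, not the paper’s code.

```sas
filename req  temp;
filename resp temp;

/* Build a minimal request body. */
data _null_;
   file req;
   put '{"model": "gpt-4o-mini",';
   put ' "messages": [{"role": "user",';
   put '   "content": "Summarize the key safety findings in one sentence."}]}';
run;

proc http url="https://api.openai.com/v1/chat/completions"
          method="POST"
          in=req
          out=resp;
   headers "Content-Type"  = "application/json"
           "Authorization" = "Bearer &openai_key";
run;

/* Parse the JSON response; ALLDATA holds every element for inspection. */
libname respjsn json fileref=resp;
proc print data=respjsn.alldata;
run;
```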
AI-324 : Code Smarter, Not Harder: The 5 C’s of ChatGPT for the SASsy Professional
Charu Shankar, SAS Institute
Kirk Lafler, sasNerd
Monday, 4:00 PM – 4:50 PM, Location: Indigo 204
In today’s fast-paced analytics world, efficiency is everything. This session explores how ChatGPT and SAS complement each other across the 5 C’s to enhance productivity: Communicate – Automate structured email reports with SAS, or use ChatGPT for dynamic, polished messaging. Learn when to use each for maximum impact. Code – Speed up SAS programming with ChatGPT’s syntax suggestions, debugging tips, and code generation, all while maintaining best practices. Collaborate – Use SAS for version control and shared repositories, while ChatGPT streamlines teamwork with summaries, documentation, and code explanations. Customize – Enhance efficiency with SAS macros and fine-tune ChatGPT prompts for tailored reporting and automation. Create – Harness AI-driven insights for problem-solving while ensuring accuracy and compliance in SAS analytics. Featuring hands-on demos and real-world examples, this session will equip you with practical strategies to code smarter, not harder.
AI-331 : Streamlining and Accelerating Clinical Research through AI and GenAI driven Insights and safety review of Medical and Scientific Literature
Rohit Kadam, Mr.
Saurabh Das, Tata Consultancy Services
Rajasekhar Gadde, Tata Consultancy Services
Niketan Panchal, Mr.
Alejandra Guerchicoff, Phd
Tuesday, 8:30 AM – 8:50 AM, Location: Indigo 204
Literature search in the medical life sciences domain involves systematically exploring and reviewing existing academic and research literature to gather information, evidence, and insights related to a specific topic or research question, and also to identify and report safety events. With the advancements in technology and the rapid growth of the Life Sciences domain, the amount of data and information generated every day is enormous. Every week there are new releases and updates of medical journals and papers. To get the correct articles and papers it is important to identify and choose appropriate databases for the search. In the medical field, databases such as PubMed, MEDLINE, and Embase are commonly used. Search is done on these databases using a custom search strategy. The relevant articles are selected based on the search strategy, and information from them is extracted and stored in a structured manner. Insights are derived from the selected articles based on a custom user query; this user query, along with the extracted information, is given to a state-of-the-art language model. The solution goes beyond analysis to determine the nature of the articles we’ve fetched. Specifically, we’re focused on classifying them into three categories: Valid Individual Case Safety Report (ICSR), Potential ICSR, or not a valid ICSR for the product under scrutiny. Achieving this classification involves a custom strategy, one that can be adapted to meet the unique needs of our users. This strategy is designed to provide actionable insights and is a testament to the flexibility of our system.
AI-345 : Productive Safety Signal Detection and Analysis using In-context Learning
Sundaresh Sankaran, SAS Institute
Samiul Haque, SAS Institute
Sherrine Eid, SAS Institute
Tuesday, 11:00 AM – 11:20 AM, Location: Indigo 204
Analysis of Adverse Event (AE) narratives enables better patient outcomes by providing rich safety signals. In addition to time-consuming manual review, life sciences organisations and regulators have used Natural Language Processing (NLP) to classify and extract signals. However, NLP’s value can be constrained due to its complex nature, interpretability, and difficulty to scale. In this session, we demonstrate how NLP, combined with In-context Learning, a Generative AI technique, can efficiently structure AE narratives into easily interpretable datasets. In-context learning helps us specify an instruction template and provide context to Large Language Models (LLMs), using a customisable checklist containing numerous dimensions on which safety reviewers analyse AEs. Checklist responses are assessed and consolidated into scorecards, which are then surfaced through SAS reports and visualisations along with insights and patterns. In addition, SAS Viya’s NLP techniques provide an efficient filtration and pattern identification mechanism by standardising information contained in AE reports. Configurable parameters help us adjust our analysis for different groups and patient segments. At the same time, we also highlight caveats and risks posed by AI, and suggest guardrail mechanisms which help preserve trust in results. Through this session, we will spark discussion and ideas on how AI can enhance safety signal analyses while, at the same time, maintain high quality output and transparent workflows.
AI-349 : Harnessing AI & CDISC ARS for Effortless Statistical Reporting & CSR Writing
Bhavin Busa, Clymb Clinical
Navin Dedhia, Clymb Clinical
Tuesday, 10:30 AM – 10:50 AM, Location: Indigo 204
Imagine a future where writing Clinical Study Reports (CSRs) is faster, more accurate, and nearly effortless: this is the promise of our Gen-AI-driven model leveraging the CDISC Analysis Results Standard (ARS). We present a model fine-tuned on ARS and Analysis Results Data (ARD) metadata, streamlining the interpretation of analysis results and generating narrative summaries for CSR safety and efficacy sections. By processing pre-structured ARD datasets, the model identifies key findings and produces detailed summaries, significantly reducing manual effort. This approach enhances efficiency, saving time for biostatisticians, clinicians, and medical writers, while ensuring accuracy and consistency across TFL outputs and CSR text. We also discuss the pros and cons of building this solution, including scalability, regulatory alignment, and data privacy challenges. By feeding in template CSRs, ADaM datasets, and ARDs, we were able to train the model to write CSR sections. The ARD generates displays in the layout as prompted and can also produce in-text tables for the CSR. The model supports validation of numbers when ADaM datasets are fed into it. With APIs, we were able to directly plug the Gen-AI tool-generated CSR sections into the template document. This integration minimizes manual interventions, enhancing the overall process and making the structuring of displays easy when feeding ARD. This is made possible by leveraging Gen-AI and the CDISC ARS model/ARD.
AI-356 : Application of advanced GenAI tools in sample size estimation – questions and thoughts
Igor Goldfarb, Accenture
Ella Zelichonok, Naxion
Tuesday, 9:00 AM – 9:20 AM, Location: Indigo 204
The last couple of years have been characterized by a storming expansion of Generative Artificial Intelligence (GenAI), and leading pharma companies plan to invest billions of dollars in the development of GenAI in upcoming years. The goal of this work is to discuss prospective benefits and to increase awareness among investigators, scientists, and statisticians about the application of GenAI to calculation of the sample size for a prospective clinical trial. The authors assessed the performance, benefits, and risks of using one of the latest versions (4o) of ChatGPT for sample size calculations in clinical study design. ChatGPT (Chat Generative Pretrained Transformer) was chosen as one of the popular GenAI products. The publicly available (ClinicalTrials.gov) results (sample size used in the completed study) were replicated using the commercial software nQuery and ChatGPT (version 4o). The authors found that this advanced version of ChatGPT performs much better than the previous ones, while still demonstrating some errors. The authors also analyzed reproducibility, including cases when ChatGPT (version 4o) returns different results in response to the same request asked multiple times. While the use of GenAI is very promising and prospective benefits for the statistical community are clearly visible, there are still many challenges and limitations in its application at its current stage (e.g., bias, hallucinations, complexity, replicability). In general, scientists and statisticians should exercise caution in using GenAI. Companies planning a clinical study are recommended to hire experienced biostatisticians for sample size estimation.
AI-362 : AI-Powered Data Issue Tracker for Efficient Data Issue Tracking and Resolution
Bharath Donthi, Statistics & Data Corporation
Tuesday, 11:30 AM – 11:50 AM, Location: Indigo 204
Traditionally, data issues identified by statistical programmers, statisticians, and sponsors are documented in an Excel file and shared with clinical data managers and clinical data associates for resolution. This process is inefficient: it leads to the same or similar issues being reported by different reviewers, makes it difficult to track issue status and detect duplicates, and lacks a proper audit trail to ensure comprehensive resolution. To address these challenges, a new system has been developed. This system enables users to enter issues, assign tasks to individuals, provide real-time status updates, and maintain an audit trail. The system leverages AI to detect duplicate issues and prompts users to verify information and add issues only if needed. The AI can also identify subjects with similar issues and assist in generating SQL queries to help users create necessary reports. The AI-powered data issue tracker significantly reduces manual effort and ensures faster and more reliable responses, ultimately enhancing the quality of research data.
Data Standards
DS-028 : Name That ADaM Dataset Structure
Nancy Brucken, IQVIA
Monday, 10:00 AM – 10:20 AM, Location: Indigo 202
This presentation will give a brief overview of the standard ADaM dataset classes, followed by an interactive segment where a commonly-used table shell will be displayed, and the audience will be encouraged to name the standard ADaM dataset class and variables that could be used to most easily generate that table. Alternative approaches will be discussed where appropriate, including the use of the ADaM Other class and when it is acceptable to produce a single table from multiple ADaM datasets. The goal is to get the audience thinking about designing ADaM datasets based on table requirements, instead of on the structure of the SDTM domains feeding into those datasets.
DS-063 : The Winding Road to ADSL
Elizabeth Dennis, EMB Statistical Solutions, LLC
Grace Fawcett, Syneos Health
Monday, 9:00 AM – 9:20 AM, Location: Indigo 202
The typical study will use SDTM data as the sole origin for the ADaM datasets. ADSL is derived using SDTM.DM, EX, and other domains. But what’s the best process to use when the roadmap to ADSL contains twists and turns? There are times when the typical process is not robust enough to produce a well-documented ADSL that meets all the analysis needs. This presentation will go through three scenarios where details of the study and the analysis require a more complicated process to produce ADSL: 1. The derivations for a flag variable (usually a population flag variable) are so complicated that an intermediate dataset is helpful. 2. The derivations for a flag variable (usually a population flag variable) require clinical adjudication. 3. There are multiple participations per subject.
DS-065 : Which ADaM Data Structure Is Most Appropriate? Gray Areas in BDS and OCCDS.
Veronica Gonzalez, Biogen Inc
Monday, 10:30 AM – 10:50 AM, Location: Indigo 202
Choosing the ADaM dataset structure, based on the analysis requirements, is usually straightforward. Statistical analyses involving change from baseline, like those from Lab or Vital Signs, can only be done in an ADaM Basic Data Structure (BDS) dataset. Analyses that involve counts/frequencies of hierarchical dictionary data, like Adverse Events, Concomitant Medications, and Medical History, can only be completed in an ADaM Occurrence Data Structure (OCCDS) dataset. However, there are times that the ADaM structure needed to support the statistical analysis is not well defined and can be completed using either BDS or OCCDS. This paper outlines an instance where both BDS and OCCDS ADaM datasets could be appropriate to achieve the statistical analysis. The strengths and weaknesses of each data structure will be scrutinized, utilizing one in-depth example, to illustrate that an analysis structure choice is not always straightforward and that either could be used to execute the analysis requirements.
DS-067 : Multiple Imputation Techniques in the Context of ADaM Datasets
Shunbing Zhao, Merck & Co.
Linping Li, Merck & Co.
Wednesday, 8:00 AM – 8:20 AM, Location: Indigo 202
Multiple Imputation (MI) is an effective and increasingly popular method for handling missing data in regulatory clinical studies. With the MI method, each missing value is replaced with a set of plausible values. It becomes challenging to build a CDISC-compliant ADaM dataset from such multiply imputed data. In this paper, from a statistical programmer’s perspective, we use two detailed examples to illustrate how to implement multiple imputation techniques in the generation of CDISC-compliant ADaM datasets. In the first example, we elaborate on how to build an ADaM Basic Data Structure (BDS) dataset to perform a sensitivity analysis using a control-based pattern mixture model. In this example, we show how to use the SAS® MI procedure to generate an analysis dataset and how to transform it into an ADaM-standardized dataset step by step. In the second example, we demonstrate how to build a time-to-event BDS dataset using a simulation-based jump to reference MI approach. Additionally, we explain how to handle and comply with Pinnacle 21 validation rules. The sharing of hands-on experiences in this paper is intended to assist readers in preparing CDISC-compliant ADaM datasets to facilitate MI analysis in regulatory clinical trials, and further to support FDA submissions.
DS-075 : Controlling attributes of .xpt files generated by R
Yachen Wang, AbbVie
Chen Ling, AbbVie
Wednesday, 10:15 AM – 10:25 AM, Location: Indigo 202
In the process of electronic data submission to the Food & Drug Administration (FDA), the most common approach is to use SAS to generate SAS Transport (.xpt) files. Currently, more and more companies are exploring the use of R to generate statistical outputs for regulatory submission, and programmers are trying to generate .xpt files in R. However, due to setup differences between SAS and R, the attributes (length, label, format) of variables are not always handled well. As we know, label, length, and format are indispensable components of SDTM and ADaM datasets, and they should follow CDISC standards. In this paper, we provide a comprehensive guide to checking and assigning variable attributes (length, labels, and formats) while developing .xpt files in R. Moreover, we also introduce efficient ways to assign attributes to variables in R-generated .xpt files based on specification documents. Through practical examples, we compare these attribute-control methods, demonstrating how to ensure that datasets exported from R meet the requirements.
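As one possible illustration of the approach described, the minimal R sketch below assigns labels and a SAS display format through attributes recognized by the haven package before writing a version 5 transport file; the dataset and attribute values are hypothetical, and variable lengths typically need additional, metadata-driven handling (for example, via the pharmaverse xportr package).

```r
# Minimal sketch: set variable attributes, then write an .xpt file with haven.
library(haven)

dm <- data.frame(
  USUBJID = c("STUDY-001", "STUDY-002"),
  AGE     = c(54, 61)
)

# Variable labels are written to the transport file from the "label" attribute
attr(dm$USUBJID, "label") <- "Unique Subject Identifier"
attr(dm$AGE, "label")     <- "Age"

# A SAS display format can be carried in the "format.sas" attribute
attr(dm$AGE, "format.sas") <- "8."

# Write a SAS version 5 transport file with member name DM
write_xpt(dm, "dm.xpt", version = 5, name = "DM")
```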
DS-079 : BDS Dataset with PARQUAL and PARQTYP Variables for Time-to-Safety Events Analysis
Kang Xie, AbbVie
Wednesday, 8:30 AM – 8:50 AM, Location: Indigo 202
The analysis of adverse events can employ the ADaM datasets, specifically ADAE or ADTTE. ADAE follows the OCCDS data structure, while ADTTE, structured as a BDS with additional time-to-event (TTE) variables, specifically supports survival analysis involving censoring, making CNSR a required variable. This paper describes an approach to creating a BDS dataset named ADSTOI, which is designed to analyze the time to various events related to safety topics of interest (STOI) by grade and cycle within the oncology therapeutic area across different scenarios that do not involve censoring factors. The ADSTOI design aims to 1) facilitate the direct derivations of the time to various events, and 2) explicitly present the variables involved in these time calculations. Additionally, this paper introduces the use of PARQUAL and PARQTYP within the BDS ADSTOI dataset, demonstrating how these variables apply to the analysis of time to various events related to STOI, supporting decision-making regarding the standard criteria for PARQUAL and PARQTYP in ADaM.
DS-103 : Time-to-Deterioration for Patients Reported Outcomes
Christine Teng, Merck
Wednesday, 9:45 AM – 10:05 AM, Location: Indigo 202
In oncology studies, time-to-deterioration (TTD) in Patient-Reported Outcomes (PRO) assessments serves as an important endpoint, measuring how cancer treatment affects a patient’s quality of life and symptom burden. TTD is typically defined as the duration from a specified starting point, such as the initiation of treatment or a baseline assessment, until a patient indicates a clinically meaningful decline in their health status. This decline may be reflected by heightened symptoms, diminished quality of life, or an increased need for medical intervention. The definition of TTD can vary depending on the primary focus of the specific disease area and the intended objectives of the assessment. The two commonly applied definitions of TTD for PRO evaluation are (1) time to first deterioration and (2) time to first confirmed deterioration. This paper discusses a method for designing ADaM analysis datasets (ADPRO/ADPROTTE) that effectively support both types of TTD analyses.
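As a rough illustration of the first definition (time to first deterioration), the dplyr sketch below flags the earliest study day at which change from baseline crosses a threshold and censors subjects without an event at their last assessment; the 10-point threshold, variable names, and censoring rule are hypothetical and instrument-specific, not the authors’ specification.

```r
# Illustrative time-to-first-deterioration derivation (hypothetical threshold).
library(dplyr)
library(tibble)

adpro <- tribble(
  ~USUBJID, ~ADY, ~CHG,
  "001",      28,    3,
  "001",      56,   12,   # crosses the illustrative threshold at day 56
  "002",      28,    5
)

adprotte <- adpro %>%
  group_by(USUBJID) %>%
  summarise(
    first_det = suppressWarnings(min(ADY[CHG >= 10])),    # Inf if no event
    CNSR      = if_else(is.infinite(first_det), 1L, 0L),  # 1 = censored
    AVAL      = if_else(CNSR == 1L, max(ADY), first_det), # event or censoring day
    .groups   = "drop"
  )
```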
DS-105 : ADaM Pet Peeves: Things Programmers Do That Make Us Crazy
Sandra Minjoe, ICON PLC
Nancy Brucken, IQVIA
Tuesday, 10:00 AM – 10:20 AM, Location: Indigo 202
The two of us have been actively involved in the CDISC ADaM team for a long time – one for more than 20 years, and the other for over 10 years. During that time, we’ve led and contributed to a variety of ADaM documents, and we are both certified CDISC ADaM trainers. Because of our extensive ADaM expertise, we each end up reviewing a lot of ADaM submissions before they are sent to regulatory agencies. This paper and presentation highlight some of the common mistakes that we’ve seen ADaM developers make. For each issue, we provide a better and/or more conformant approach.
DS-109 : Impact of Drug Accountability on Drug Compliance and Dose Intensity in Clinical Trials
Vishal Gandhi, Merck & Co.
Milan Adesara, Merck & Co.
Wednesday, 9:00 AM – 9:20 AM, Location: Indigo 202
In clinical trials involving oral drug administration with varying dosing frequencies, drug accountability is critical for assessing drug exposure, compliance, and treatment efficacy. Patients are typically dispensed a specific quantity of medication, with any unused tablets returned at subsequent visits. The returned medication is analyzed to determine key parameters, including the Actual Consumed Dose, Actual Dose Intensity, and Relative Dose Intensity. These parameters provide a comprehensive understanding of patient adherence to the prescribed regimen, enabling the identification of potential deviations that may impact pharmacokinetic and study outcomes. Ensuring accurate measurement of drug compliance directly influences the reliability of study results. Evaluating Actual Dose Intensity and Relative Dose Intensity allows reviewers to compare the administered dose with the intended dose and assess the degree to which patients adhere to the treatment plan. This information is essential for assessing the drug’s efficacy and safety in real-world conditions. Thus, drug accountability serves as a cornerstone for the successful execution and interpretation of clinical trials. This presentation will thoroughly explore various scenarios we faced while creating the ADEXSUM dataset. By addressing this complexity, we aim to improve comprehension and outline a logical approach for deriving these parameters.
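The sketch below illustrates one common set of working formulas for these parameters in R; the variable names, tablet strength, and planned dose are hypothetical, and the authors’ ADEXSUM derivations may differ.

```r
# Illustrative drug accountability derivations (hypothetical data and formulas).
library(dplyr)
library(tibble)

adexsum <- tibble(
  USUBJID   = c("001", "002"),
  DISPENSED = c(56, 56),   # tablets dispensed
  RETURNED  = c(4, 14),    # tablets returned
  DOSE_MG   = 50,          # mg per tablet
  TRTDURD   = c(28, 28),   # days on treatment
  PLANNED   = 100          # planned daily dose (mg)
) %>%
  mutate(
    ACTDOSE  = (DISPENSED - RETURNED) * DOSE_MG,    # actual consumed dose (mg)
    ADOSEINT = ACTDOSE / TRTDURD,                   # actual dose intensity (mg/day)
    RDI      = 100 * ADOSEINT / PLANNED,            # relative dose intensity (%)
    COMPLY   = 100 * ACTDOSE / (PLANNED * TRTDURD)  # compliance vs planned exposure (%)
  )
```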
DS-123 : Decoding Laboratory Toxicity Grading: Unlocking the Potential and Overcoming Challenges of CTCAE
Xiaoting Wu, Vertex Pharmaceuticals
Lei Zhao, Vertex Pharmaceuticals
Tuesday, 9:00 AM – 9:10 AM, Location: Indigo 202
Toxicity grading is a critical tool for assessing the severity of adverse events and ensuring clinical trial safety. The Common Terminology Criteria for Adverse Events (CTCAE), developed by the National Cancer Institute (NCI), provides standardized criteria for grading adverse events. Originally designed for oncology, CTCAE has since been applied across various therapeutic areas. Despite its widespread use, significant challenges remain in interpreting and implementing CTCAE criteria, particularly in the context of laboratory toxicity grading. Existing publications provide limited guidance on these complexities, including the concepts of CTCAE for laboratory toxicity, the integration of CTCAE grading into analysis data models, and the application of CTCAE criteria to laboratory results at baseline or post-baseline. This paper aims to address these gaps by comparing CTCAE to other dictionaries such as MedDRA, examining the contents of CTCAE, exploring its practical implementation in ADLB data and shift tables, and proposing strategies to address the challenges in laboratory toxicity grading. By doing so, it provides a critical framework for improving the accuracy and consistency of laboratory toxicity assessments.
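To make the ADLB integration concrete, the sketch below shows the general shape of a grade derivation with case_when; the cut-offs are illustrative placeholders only and are not CTCAE criteria, which also depend on the direction of change, baseline values, and clinical qualifiers.

```r
# Illustrative lab toxicity grade derivation; thresholds are NOT CTCAE criteria.
library(dplyr)
library(tibble)

adlb <- tibble(
  USUBJID = c("001", "001", "002"),
  PARAMCD = "HGB",
  AVAL    = c(11.5, 9.2, 7.4),   # analysis value (g/dL)
  ANRLO   = 12.0                 # lower limit of normal
) %>%
  mutate(
    ATOXGR = case_when(
      AVAL >= ANRLO               ~ 0L,  # within normal range
      AVAL < ANRLO & AVAL >= 10.0 ~ 1L,  # placeholder grade 1 cut
      AVAL < 10.0  & AVAL >= 8.0  ~ 2L,  # placeholder grade 2 cut
      AVAL < 8.0                  ~ 3L,  # placeholder grade 3 cut
      TRUE                        ~ NA_integer_
    )
  )
```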
DS-126 : ADaM Fundamental Principles vs. Rules: Which to Follow, When, and Why?
Sandra Minjoe, ICON PLC
Mario Widel, IQVIA
Monday, 11:00 AM – 11:20 AM, Location: Indigo 202
The two of us have been actively involved in the CDISC ADaM team for a long time – one for more than 20 years, and the other for over 15 years. During that time, we have led and contributed to a variety of ADaM documents, and we are both authorized CDISC ADaM trainers. In our years of working with ADaM, we’ve seen many different interpretations of ADaM documents – some interpretations are correct, others are not. This paper and presentation aim to address the confusion we have seen on ADaM Fundamental Principles vs. Rules, describe when each is needed, propose a hierarchy, and recommend how to handle compliance. We provide examples, focusing on tricky cases where there can be multiple correct options for implementation.
DS-169 : A discussion of the new ADaM guidance (ADNCA and ADPPK) for pharmacokinetics.
Luke Reinbolt, Navitas Data Sciences
Tuesday, 9:15 AM – 9:25 AM, Location: Indigo 202
Pharmacokinetics (PK) is the study of the effect of the body on a drug. Different mathematical methods are used to calculate PK parameters that describe, characterize, and quantify this effect. Two common ways to determine PK parameters are non-compartmental analysis (NCA) and population PK (PopPK). Non-compartmental analysis (NCA) is one class of mathematical methods for studying the level of exposure following administration of a drug and is commonly used to analyze serial drug concentration data from clinical trials by individual subjects. Population PK (PopPK) is a model-based representation of PK processes with a statistical component, enabling identification of the sources of inter- and intraindividual variability. Recent guidance has been published by CDISC which directly affects the PK process. The two guidances are the Analysis Data Model Implementation Guide for Non-compartmental Analysis Input Data (ADNCA) and Basic Data Structure for ADaM PopPK Implementation Guide (ADPPK). The purpose of this presentation is to discuss both ADNCA and ADPPK and their impact on the PK process.
DS-256 : What comes first, the chicken (ADSL) or the egg (ADNCA)? Modularize your covariate creation to support flexible analysis dataset implementation!
David Izard, Merck
Monday, 11:30 AM – 11:50 AM, Location: Indigo 202
Good Programming Practice dictates that you should create a variable once and then share it with the other locations where it is needed. In the context of analysis datasets following the CDISC ADaM standard, this typically involves creating your covariates during the implementation of the ADSL dataset and then merging them on to other analysis datasets at subsequent points in the development sequence. ADNCA, the ADaM BDS dataset designed to support non-compartmental PK analysis, would utilize these covariates from ADSL but is usually needed and implemented before ADSL is available. This paper explores modular covariate creation and how it supports both the subtle details of covariate differences between ADSL and ADNCA and good programming practices.
DS-316 : Cytokine Release Syndrome (CRS) – Data Collection, Clinical Database Integration and Analyses in a Dose Escalation Cell Therapy Trial
Cyrille Correia, PPD
Venky Chakravarthy, Takeda
Tuesday, 11:00 AM – 11:20 AM, Location: Indigo 202
Cell therapy is a promising new therapeutic option in treating cancer. It has the potential to treat patients in other therapeutic areas such as autoimmune diseases. However, Cytokine Release Syndrome (CRS) remains a significant safety concern in these clinical trials. This presentation details the integration of CRS data into clinical study databases and subsequent analyses. We emphasize the process from data collection to CDISC-compliant datasets. CRS data is collected via electronic Case Report Forms (eCRFs) and integrated into Study Data Tabulation Model (SDTM) datasets. We will focus on SDTM.AE/SUPPAE for adverse events, SDTM.CE/SUPPCE for CRS signs and symptoms, and SDTM.CM/SUPPCM for treating CRS with drugs like tocilizumab. The relationships between these datasets are mapped using SDTM.RELREC to ensure accurate linkage of CRS events, treatments, and symptoms. We then discuss the creation of Analysis Data Model (ADaM) datasets, specifically ADaM.ADAE and ADaM.ADCE, which facilitate detailed CRS analyses. Key variables such as Time to CRS onset since last treatment infusion, Time to resolution of CRS and Last dosing date/time before CRS onset are derived to support comprehensive analysis. These variables enable the examination of the timing, duration, and severity of CRS events hence the safety of interventions. By showcasing this structured pathway from data integration to analysis, we highlight the critical role of robust statistical programming in enhancing the understanding and treatment of CRS in cell therapy trials. This methodology ensures data integrity and compliance while providing valuable insights for monitoring patient safety.
DS-336 : Creating JSON Transport Data for Regulatory Life Sciences
Mary Dolegowski, SAS
Matt Becker, SAS
Tuesday, 10:30 AM – 10:50 AM, Location: Indigo 202
The capacity to convert data into a JSON structure has become indispensable as the use of APIs in regulatory life sciences becomes more critical. This capability facilitates interoperability, data sharing, and conformance among systems. In this context, we illustrate the process of converting SDTM (Study Data Tabulation Model) data to the Dataset-JSON format, which is a critical prerequisite for regulatory submissions to agencies such as the FDA. We emphasize two SAS methods for this conversion: PROC JSON and the CASL2JSON function within PROC CAS. Although Dataset-JSON is employed to illustrate these methods, the underlying processes are inherently data-agnostic, allowing their application to other regulatory data structures. This flexibility enables seamless adaptation to changing industry standards and submission requirements.
DS-338 : Efficient CDISC Controlled Terminology Mapping: An R-Based Automation Solution
Yunsheng Wang, ClinChoice
Tuesday, 8:30 AM – 8:50 AM, Location: Indigo 202
This paper presents a method for automating the generation of SDTM Controlled Terminology (CT) terms using an R Shiny application. The system connects to the CDISC API or a custom CT library to map EDC raw terms to their corresponding SDTM CT terms. Using similarity-based matching algorithms, the system ensures that all EDC raw terms are automatically mapped to corresponding SDTM CT terms. When exact matches are not found, the web-based interface allows users to review and modify the suggested CT terms to align with sponsor requirements. The tool supports exporting the mapped terms as datasets, Excel files, or program code. The paper demonstrates the methodology, system architecture, and practical applications, highlighting the tool’s potential to improve clinical data management, data visualization, and data standardization in clinical trials.
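A minimal sketch of the similarity-matching idea is shown below, using the stringdist package and Jaro-Winkler similarity; the raw terms, codelist values, and 0.85 cut-off are hypothetical, and the paper’s tool additionally pulls CT from the CDISC API and offers a Shiny review interface.

```r
# Illustrative fuzzy matching of raw EDC terms to CT submission values.
library(stringdist)

raw_terms <- c("Oral", "intra-venous", "subcut.")
ct_terms  <- c("ORAL", "INTRAVENOUS", "SUBCUTANEOUS", "TOPICAL")

# Jaro-Winkler distance is in [0, 1]; convert to a similarity score
sim  <- 1 - stringdistmatrix(toupper(raw_terms), ct_terms, method = "jw")
best <- apply(sim, 1, which.max)

data.frame(
  RAW_TERM   = raw_terms,
  CT_TERM    = ct_terms[best],
  SIMILARITY = round(apply(sim, 1, max), 2),
  AUTO_MAP   = apply(sim, 1, max) >= 0.85   # below the cut-off -> manual review
)
```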
DS-346 : Getting under the ‘umbrella’ of Specimen-based Findings Domains!
Soumya Rajesh, CSG Llc. – an IQVIA Business
Kapila Patel, IQVIA
Tuesday, 11:30 AM – 11:50 AM, Location: Indigo 202
The Study Data Tabulation Model Implementation Guide (SDTM IG) 3.4 has brought together domains that collect observations based on tests or examinations performed on collected biological specimens under one umbrella called “Specimen-based Findings Domains”. These include the Biospecimen Findings (BS), Cell Phenotype Findings (CP), Genomics Findings (GF), Immunogenicity Specimen Findings (IS), Laboratory Test Results (LB), Microbiology, and Pharmacokinetic domains. Each domain is defined to group measures of a common topic (e.g., microbiology susceptibility, microscopic findings, pharmacokinetic concentrations). This paper presents a comprehensive guide on how to determine the appropriate domain for various specimen-based findings. By examining the specific variables and criteria outlined in SDTM IG 3.4, we offer practical insights and examples to help programmers accurately categorize and report their findings. It also discusses how to use controlled terminology for these domains to ensure clear and consistent data submission.
Data Visualization and Reporting
DV-018 : Visually Exploring Kaplan-Meier Curves Using SAS GTL and R survminer
Yang Gao, Pfizer Inc.
Tuesday, 1:30 PM – 1:40 PM, Location: Aqua Salon E
Kaplan-Meier curves are commonly used to visually summarize time-to-event data and present primary trial findings in clinical trials. A Kaplan-Meier curve shows the survival probability of an event at a certain time interval. In the SAS system, the LIFETEST and SGPLOT procedures are used to generate simple survival plots. SAS programmers can use the SAS Graph Template Language (GTL) directly to design and customize their graphs using the TEMPLATE and SGRENDER procedures. In addition to SAS, R, a free programming language, is widely used for data manipulation, statistical modeling, and visualization, and it has highly advanced graphical capabilities. This paper illustrates the procedures to generate Kaplan-Meier plots using two approaches: SAS GTL and the survminer R package. By providing methods in both languages, readers have options when creating Kaplan-Meier curves for regulatory submission.
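For readers new to the R side of this comparison, the sketch below produces a basic Kaplan-Meier plot with survminer using the lung dataset shipped with the survival package; the plot options shown are illustrative rather than the paper’s exact settings.

```r
# Basic Kaplan-Meier plot with survminer (illustrative options).
library(survival)
library(survminer)

fit <- survfit(Surv(time, status) ~ sex, data = lung)

ggsurvplot(
  fit,
  data        = lung,
  risk.table  = TRUE,             # number-at-risk table under the curves
  conf.int    = TRUE,             # confidence bands
  pval        = TRUE,             # log-rank p-value annotation
  xlab        = "Days",
  legend.labs = c("Male", "Female")
)
```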
DV-029 : Jazz Up Your Profile: Perfect Patient Profiles in SAS® using ODS Statistical Graphics
Josh Horstman, PharmaStat LLC
Richann Watson, DataRich Consulting
Tuesday, 2:30 PM – 3:20 PM, Location: Aqua Salon E
Patient profiles are often used to monitor the conduct of a clinical trial, detect safety signals, identify data entry errors, and catch protocol deviations. Each profile combines key data collected regarding a single subject – everything from dosing to adverse events to lab results. In this presentation, two experienced statistical programmers share how to leverage the SAS Macro Language, Output Delivery System (ODS), the REPORT procedure, and ODS Statistical Graphics to blend both tabular and graphical elements. The result is beautiful, highly-customized, information-rich patient profiles that meet the requirements for managing a modern clinical trial.
DV-034 : A Case Study on Visualization Methods in R
Margaret Huang, Vertex Pharmaceuticals, Inc.
Chunting Zheng, Vertex Pharmaceuticals, Inc.
Lei Zhao, Vertex Pharmaceuticals, Inc
Todd Case, Vertex Pharmaceuticals, Inc
Monday, 8:30 AM – 8:50 AM, Location: Aqua Salon E
This paper presents a case study on using R to efficiently generate high-quality visualizations for clinical trial data, based on a summer intern project at Vertex Pharmaceuticals. The project aimed to replicate complex figures originally created in SAS for the EXAGAMGLOGENE AUTOTEMCEL (EXA-CEL; CTX001) submission for sickle cell disease. Beyond simply replicating SAS figures, we assessed R’s flexibility in customizing complex visualizations. By showcasing a range of examples, including swimmer plots, bar charts, histograms, box plots, line plots, dual-axis plots, and others, we demonstrated R’s versatility and flexibility in producing informative and customizable figures. Leveraging R’s robust ecosystem of packages, such as ggplot2, gridExtra, and ggpubr, we were able to closely match SAS-generated visuals and explore additional visualization possibilities. These packages offer extensive control over elements like layout, aesthetics, and annotations, making R an ideal tool for creating publication-ready graphics that meet clinical and regulatory standards. This case study emphasizes the practical advantages of R for clinical trial data visualization, offering insights into its use for pharmaceutical applications.
DV-071 : Visualizing oncology data through 3D bar charts in R and Python
Girish Kankipati, Pfizer Inc
Venkatesulu Salla, Seagen
Monday, 9:00 AM – 9:20 AM, Location: Aqua Salon E
3D bar charts are powerful tools for visualizing complex data sets, particularly in clinical trial analyses. This paper explores the use of R and Python for creating 3D bar charts to effectively present oncology data, facilitating better analysis and decision making. The visualization of clinical trial results and patient outcomes data in oncology often requires sophisticated tools to highlight patterns and trends. R and Python, known for their robust data analysis and visualization capabilities and for versatile libraries such as ggplot2, plotly, and matplotlib, are excellent options for supporting the development of 3D visualizations. This paper demonstrates practical workflows for implementing 3D bar charts in both programming languages, including data preparation, chart customization, and advanced interactivity. Through comparative analysis, the paper highlights the strengths and limitations of each platform, focusing on their performance, ease of use, and adaptability for different oncology datasets. Examples include visualizing patient response rates across treatment groups or assessing biomarker distributions across demographic categories. By leveraging R and Python, researchers and clinicians can create visually appealing and interactive 3D bar charts to communicate findings effectively. This not only enhances understanding among stakeholders but also aids in identifying critical insights that might be obscured in raw datasets. The study underscores the importance of integrating advanced visualization techniques into oncology workflows, paving the way for more informed decision-making in clinical and research settings.
DV-148 : Risk Mitigation Solution for Kaplan-Meier Plot in Drug Labeling
Bingjun Wang, Merck & Co.
Jane Liao, Merck & Co., Inc.
Suhas Sanjee, Merck
Jeff Xia, Merck
Tuesday, 1:45 PM – 1:55 PM, Location: Aqua Salon E
Efficacy results are often included in the drug label, and statistical programmers are required to deliver related figures to support this. The stakeholders in regulatory affairs typically modify the annotations on the figures manually to translate them into other languages for worldwide regulatory submissions. This non-programmatic process is challenging and error prone. It is important to allow modifications only for certain annotations and ensure that statistical values and graphic elements are non-editable. In this paper, we propose methods for creating Kaplan-Meier (K-M) plots, which are commonly included in oncology studies for drug labeling. We describe two methods to fulfill these requirements. The first approach utilizes SAS and VBA tools; the second uses an R Shiny application. Both approaches assure accuracy, consistency, and reproducibility. These approaches also provide traceability and eliminate the potential risk of inadvertent changes to the statistics annotated on the figures.
DV-150 : Amazing Graph Series: Advanced SAS® Visualization to Wake Up Your Data – From Static to Fantastic
Tracy Sherman, Optimal Analysis Inc.
Aakar Shah, Acadia Pharmaceuticals Inc.
Monday, 10:00 AM – 10:20 AM, Location: Aqua Salon E
In an era where data drives decision-making, the ability to interact dynamically with visualizations is crucial for uncovering insights and streamlining workflows. SAS® 9.4 provides advanced functionality within the SGPLOT procedure to create interactive visualizations with features such as hover-over data points, drill-down capabilities, and zoom out options using ODS HTML with imagemap capabilities. These advanced techniques empower users to explore data more effectively, facilitating routine data reviews with increased efficiency and precision. This paper demonstrates the step-by-step creation of these interactive visualizations, emphasizing their practical application and visual aesthetics. By exploring the capabilities of the SGPLOT procedure, this paper aims to equip users with the tools to transform static graphs into engaging, actionable visualizations suitable for routine data reviews.
DV-158 : Efficiently Creating Multiple Graphs on One Page utilizing SAS: Comparing PROC GREPLAY and ODS LAYOUT Approaches
Jenny Zhang, Merck & Co., Inc
Tuesday, 10:30 AM – 10:50 AM, Location: Aqua Salon E
In clinical research, programmers often need to produce a variety of graphs as part of data analysis. Generating multiple graphs on one page is a challenging task. This paper will compare two different methods of creating multiple graphs on a single page, highlighting their advantages and limitations. The first method examines traditional techniques such as employing PROC GPLOT followed by PROC GREPLAY to arrange the graphs. This older approach often results in a complex workflow that requires users to manage separate graphics and can lead to challenges in displaying the outputs as desired. The second method adopts a modern approach, leveraging the capabilities of ODS LAYOUT in conjunction with PROC SGPLOT or PROC SGPANEL. This contemporary method simplifies the procedure for generating multiple graphs on one page, allowing users to easily integrate their graphs into various ODS output formats (HTML, PDF, RTF, etc.) without requiring any additional steps. By contrasting these two approaches, we aim to guide SAS users in selecting the best practices for effective data visualization, ultimately fostering better data-driven decision-making in analytical processes.
DV-160 : An Integrated R Shiny Solution for Dynamic Subgroup Adjustments and Customizations
Yi Guo, Pfizer Inc.
Monday, 10:30 AM – 10:50 AM, Location: Aqua Salon E
A major challenge in using SAS macros and traditional R tools such as ggplot2 for figure generation arises when categories within a variable change with dataset updates. Such changes, including the addition of new categories as data accumulate or the exclusion of existing ones due to updated filtering criteria, require code adjustments to subgroup settings like ordering, colors, markers, or other style options. Done manually, this process is inefficient, error-prone, and particularly challenging in clinical trial reporting, where data get frequent updates and accurate and timely visualizations are critical. While effective for static visualizations, these tools lack flexibility for dynamic, no-code customization, limiting their scalability. To address this, we introduce an integrated set of interactive features powered by R Shiny, including automatic subgroup detection, interactive legend reordering, and customizable color pickers. These features enable users to dynamically adjust subgroup settings, reorder legends through the interface, and freely select reordered subgroup colors, offering greater design flexibility. All updates are synchronized to ensure consistency, accuracy, and an intuitive workflow for creating visualizations. This integrated approach bridges the gap between manual adjustments and dynamic customization, improving efficiency and adaptability for dynamic datasets. This paper details the implementation and demonstrates applications through real-world examples.
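A stripped-down sketch of the automatic subgroup detection and color picker idea is shown below, using base Shiny and the colourpicker package with a built-in dataset; the package choice and layout are assumptions for illustration, not the authors’ implementation.

```r
# Illustrative Shiny sketch: detect subgroups in the data and render one
# color picker per subgroup, then apply the chosen colors to the plot.
library(shiny)
library(colourpicker)
library(ggplot2)

ui <- fluidPage(
  uiOutput("pickers"),
  plotOutput("plot")
)

server <- function(input, output, session) {
  dat    <- reactive(iris)                            # stand-in dataset
  groups <- reactive(levels(factor(dat()$Species)))   # automatic subgroup detection

  output$pickers <- renderUI({
    lapply(groups(), function(g) {
      colourInput(paste0("col_", g), label = g, value = "steelblue")
    })
  })

  output$plot <- renderPlot({
    cols <- vapply(groups(), function(g) {
      val <- input[[paste0("col_", g)]]
      if (is.null(val)) "grey50" else val             # default until pickers render
    }, character(1))
    ggplot(dat(), aes(Sepal.Length, Sepal.Width, color = Species)) +
      geom_point(size = 2) +
      scale_color_manual(values = setNames(cols, groups()))
  })
}

shinyApp(ui, server)
```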
DV-188 : Beyond Basic SG Procedures: Enhancing Visualizations in SAS Graphic
Vicky Yuan, Incyte Corporation
Fengying Miao, Incyte Corporation
Tuesday, 11:00 AM – 11:20 AM, Location: Aqua Salon E
SAS statistical graphics (SG) procedures, particularly PROC SGPLOT, provide powerful tools for generating high-quality graphs. However, users often encounter limitations when customizing plots beyond the standard options. This paper introduces SG annotation and SG attribute maps, advanced techniques that extend the capabilities of SG procedures by allowing users to precisely control graphical elements and add custom labels, lines, shapes, and other enhancements. Through practical examples, we demonstrate how SG annotation and SG attribute maps can be applied to various plot types, showing techniques to refine presentation graphics and tailor statistical displays to specific analytical needs. By leveraging SG annotation, users can create graphics that carry more information and are visually compelling and highly customizable. The SAS code used in this paper was developed with SAS® 9.4 in a SAS Enterprise environment.
DV-219 : Zero to Hero: The Making of a Comprehensive R Shiny Figures App
Yi Guo, Pfizer Inc.
Matthew Salzano, Pfizer Inc.
Nicholas Sun, Pfizer Inc.
Tuesday, 8:30 AM – 8:50 AM, Location: Aqua Salon E
Creation of an interactive R Shiny application from scratch may seem like assembling a complex puzzle, but with the right approach, it becomes an achievable and rewarding journey. This paper will take you through the development process for an enterprise-level application designed for the generation of high-quality figures for clinical research and development, including publications, presentations and regulatory submissions. Development was done with a unique eye towards integration, as a small step in a larger transformative and integrated strategy of delivering dynamic outputs. The app utilized the R Shiny framework and Golem’s modular architecture to deliver a no-code interface for users to create customizable and dynamic visualizations from clinical data. It is centered around three core types of modules: data upload, figure generation, and download. Users can upload and filter their data, select from 15 commonly used plot types, and download the finalized plot along with the corresponding code for reproducibility and traceability. This paper examines the app development process, including app design and production of individual modules, with a focus on future integration, user experience and statistical analysis. Specifically, we discuss how a detailed understanding of each figure’s purpose guided the design of user-friendly widgets. We provide practical examples of relevant R packages and code used to effectively implement various functionalities. By sharing our experience and practical insights, we aim to help others simplify their development process, overcome common challenges, and fully leverage the capabilities of R Shiny to create effective and user-friendly applications.
DV-222 : Enhance Safety Data Analysis Using SafetyGraphic R Package
Vicky Yuan, Incyte Corporation
Monday, 11:00 AM – 11:20 AM, Location: Aqua Salon E
Safety data analysis presents numerous challenges in both interpretation and evaluation. Traditionally, safety data analysis has relied on tabular summaries and statistical tests to detect potential safety signals, such as generating Adverse Event (AE) tables and listings, summarizing laboratory statistics, and monitoring key parameters of vital signs and ECG. These approaches make it difficult to detect complex patterns, and large volumes of static data can sometimes lead to delayed signal detection. The safetyGraphics package plays a crucial role in improving the communication of safety results to regulators, investigators, data monitoring committees, and other stakeholders. Visual analytics can present complex safety data more concisely and effectively than tables and listings, facilitating a clearer understanding of potential safety concerns. This paper will provide examples illustrating the impact of visual tools in ongoing safety assessments and discuss enhancements that can further improve their effectiveness in risk-benefit evaluation.
DV-244 : Shining a Light on Adverse Event Monitoring with R Shiny
Abigail Zysk, PPD, part of Thermo Fisher Scientific
Joe Lorenz, PPD, part of Thermo Fisher Scientific
Monday, 11:30 AM – 11:50 AM, Location: Aqua Salon E
In pharmacovigilance (PVG), timely and precise reporting of serious adverse events (SAEs) is vital for patient safety and regulatory compliance. To enhance the display and usability of SAE data, we developed an R Shiny dashboard for the automated reporting of SAEs. R Shiny offers several advantages over traditional tools like SAS for dynamic and interactive data visualization and reporting. Key features of the R Shiny dashboard include: 1. Real-time Data Integration: Seamless extraction and processing of SAE data from existing SAS programs. 2. Interactive Visualization: Dynamic charts and graphs providing immediate insights into SAE trends. 3. Customizable Alerts: Configurable notification settings to alert users of specific SAE criteria. 4. Automated Reporting: Scheduled and on-demand report generation with automated email delivery. 5. User-friendly Interface: An intuitive interface that accommodates users with varying technical expertise. We discuss the technical architecture, including the integration of R packages for data manipulation, visualization, and email automation. Additionally, we share insights from the deployment process, highlighting challenges and customization solutions to ensure robust performance. The implementation of an R Shiny dashboard represents a significant enhancement in SAE reporting, offering a dynamic and efficient solution for PVG teams. By improving the accessibility and quality of SAE reports, this tool supports better decision-making which ultimately contributes to improved patient safety. This paper aims to provide valuable insights for statistical programmers, data scientists, and PVG professionals interested in leveraging R Shiny for developing automated reporting solutions in the pharmaceutical industry.
DV-251 : Integrated R scripts to Power BI
Jun Yang, Avidity Bioscience
Tuesday, 9:00 AM – 9:20 AM, Location: Aqua Salon E
Visualization plays a critical role in reviewing, estimating data, and identifying outliers in clinical trials. By combining the capabilities of Power BI and R, researchers can create a user-friendly and efficient platform for developing professional visualization packages with data automation and interactive features. Power BI, developed by Microsoft, is a powerful business analytics tool that enables users to transform raw data into actionable insights through interactive dashboards and reports. On the other hand, R is a widely used programming language in clinical trials, known for its open-source nature, robust statistical capabilities, and extensive community support. Together, these tools offer an effective solution for addressing the data visualization and analytical needs of the pharmaceutical and healthcare industries.
DV-268 : A Customizable Framework in R for Presenting BICR Data in a User-Friendly Format
Reneta Hermiz, Pfizer, Inc.
Jing Ji, Pfizer, Inc.
Tuesday, 4:00 PM – 4:20 PM, Location: Aqua Salon E
In oncology trials, Blinded Independent Central Review (BICR) data plays a crucial role in providing unbiased and standardized assessments of imaging data. BICR data ensures the consistency and reliability of imaging assessments compared to investigator assessments. This independent review process contributes to the robustness and quality control of trial outcomes. BICR data, although crucial, can be complex and fragmented across multiple SDTM domains and can pose significant challenges for researchers and clinicians. This paper outlines a process in R programming for generating a single source dataset that contains records from multiple SDTM domains. Then, we utilize “reactable” and “shiny” R packages to produce a web-based application with drillable tables according to three hierarchical levels: lesion level, visit level, and subject level. By linking all tables to a single source dataset, our approach provides a tool to simplify BICR data review. The integration and harmonization of BICR data into a cohesive dataset enhances data transparency and accessibility, promoting more efficient data analysis and decision-making processes. This paper introduces a flexible and reproducible workflow for presenting BICR data.
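The sketch below shows the drill-down mechanism in its simplest form: a subject-level reactable whose expandable rows reveal visit-level records. The datasets and column names are hypothetical, and the paper’s application adds a lesion level and a Shiny front end.

```r
# Illustrative drillable table: expanding a subject row shows its visit records.
library(reactable)

subj  <- data.frame(USUBJID = c("001", "002"), BESTRESP = c("PR", "SD"))
visit <- data.frame(
  USUBJID  = c("001", "001", "002"),
  AVISIT   = c("Week 8", "Week 16", "Week 8"),
  OVRLRESP = c("SD", "PR", "SD")
)

reactable(
  subj,
  details = function(index) {
    # visit-level rows belonging to the expanded subject
    sub <- visit[visit$USUBJID == subj$USUBJID[index], ]
    reactable(sub, outlined = TRUE)
  }
)
```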
DV-297 : Swanky Sankey Enhancements: Transforming a Graph with Pretty Curves to a Research Tool uncovering Deeper Scientific Insights
Siqi Wang, Arcsine Analytics
Toshio Kimura, Arcsine Analytics
Tuesday, 2:00 PM – 2:10 PM, Location: Aqua Salon E
Sankey plots show the flow of data from one state to another over time. Existing implementations are available displaying the main bar graph with curvy paths flowing from one category to another. However, reviewers are left wondering how many patients are actually going to and coming from each category. Additionally, the relative spacing on the x-axis representing time is not maintained; therefore, 4-week and 12-week intervals are represented with equal spacing. We propose the following enhancements to improve upon these shortcomings. First, we will introduce sidebars along with the option to display n and percent for the number of patients going to and coming from each category. This will proactively address the most commonly asked question from clinical and medical writing colleagues. Second, we will use relative spacing for the x-axis representing time so that unequal time intervals will be appropriately displayed. For example, a 4-week interval will be one third the space of a 12-week interval. Third, a summary table showing all of the values will be generated. This table can be used to determine the value for any part of the Sankey plot which the medical writers can use within a report or publication. These improvements address the most widely cited shortcomings of the current Sankey implementation and will facilitate a wider adoption of the Sankey plot. These enhancements greatly improve interpretability and will transform the Sankey plot from a visually appealing graph with pretty curvy lines to an informative figure that provides researchers with deeper scientific insight.
DV-335 : Unleashing Oncology Revenue Insights: Advanced Forecasting Frameworks in Action
Naquan Ishman, SAS Institute
Dave Kestner, SAS Institute
Tuesday, 10:00 AM – 10:20 AM, Location: Aqua Salon E
In the evolving landscape of oncology therapeutics, accurate revenue forecasting is critical yet challenging. This presentation introduces an innovative hybrid framework combining SAS and open-source tools to tackle complex forecasting needs, including dynamic competitive landscapes, patient flow modeling, and regulatory compliance. Through a live demo and case study, we showcase how this approach improved forecast accuracy for a recent oncology launch, addressing data privacy, version control, and algorithm validation challenges. Attendees will gain actionable insights to enhance forecasting processes, supporting informed decision-making for oncology product launches and strategic planning. Oncology revenue forecasting is challenging due to dynamic competitive landscapes, complex patient flows, and strict regulatory requirements. Existing approaches often fail to integrate diverse datasets, adapt to market shifts, and ensure accuracy and transparency. These limitations hinder informed decision-making, strategic planning, and successful product launches. Our solution addresses these challenges by combining SAS and open-source tools for accurate, adaptable, and compliant forecasting. We developed a hybrid forecasting framework combining SAS’ advanced analytics with the flexibility of Python and R to address oncology forecasting challenges. SAS ensured compliance, data security, and auditability, while open-source tools enabled custom algorithm development. Using SAS Viya, we integrated diverse datasets like patient demographics, treatment pathways, and competitive data. Predictive models in SAS Visual Forecasting analyzed patient flows and revenue projections, while SAS Visual Analytics enabled real-time scenario adjustments. This approach improved accuracy, adaptability, and transparency.
DV-337 : Combined Waterfall and Swimmer Plot using R for Visualization of Tumor Response Data
Akshata Salian, Ephicacy LifeScience Analytics Pvt Ltd
Tuesday, 11:30 AM – 11:50 AM, Location: Aqua Salon E
Visualizing tumor response data is crucial in oncology studies to assess the effectiveness of treatments and monitor patient outcomes. A Waterfall plot typically displays the percentage change in tumor size from baseline for each patient, and a Swimmer plot provides a detailed, patient-level representation of treatment timelines, showing duration of treatment and clinical events over time. The combined visualization integrates key elements of both plots and offers a more comprehensive understanding of treatment outcomes. Tumor progression can be directly linked to the length of therapy, offering insights into how response evolves over time. It also facilitates comparisons between subgroups (e.g., treatment arms) by incorporating multiple data dimensions into the plot, and it reduces the need for multiple plots, making the data easier to present and interpret at first sight. SAS offers options to combine the two for effective interpretation; in this paper, we present a method for creating a combined Waterfall and Swimmer plot using an open-source platform, enabling a unified visualization of tumor shrinkage and treatment duration. Using publicly available tumor response datasets, we demonstrate the step-by-step implementation of this combined plot in R, including data preprocessing, customization, and annotation. The combined plot incorporates tumor response as bars and overlays patient timelines, treatment events, and response status as lines and markers. We demonstrate the implementation using publicly available R packages, such as ggplot2, dplyr, and ggpubr, and provide a reproducible workflow for statistical programmers.
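As a rough outline of the approach, the sketch below builds a waterfall panel and a swimmer panel with ggplot2, shares the subject ordering between them, and arranges the two with ggpubr; the data, the -30% reference line, and the side-by-side layout are hypothetical stand-ins for the paper’s worked example.

```r
# Illustrative waterfall + swimmer combination with a shared subject order.
library(ggplot2)
library(dplyr)
library(tibble)
library(ggpubr)

resp <- tibble(
  USUBJID = c("01", "02", "03", "04"),
  PCHG    = c(-65, -30, 10, 25),   # best % change in tumor size
  TRTDUR  = c(14, 10, 6, 4)        # months on treatment
) %>%
  mutate(USUBJID = factor(USUBJID, levels = USUBJID[order(-PCHG)]))  # order by response

waterfall <- ggplot(resp, aes(USUBJID, PCHG, fill = PCHG <= -30)) +
  geom_col(show.legend = FALSE) +
  geom_hline(yintercept = -30, linetype = "dashed") +   # illustrative reference line
  labs(x = NULL, y = "Best % change from baseline")

swimmer <- ggplot(resp, aes(y = USUBJID)) +
  geom_segment(aes(x = 0, xend = TRTDUR, yend = USUBJID), linewidth = 3) +
  labs(x = "Months on treatment", y = NULL)

ggarrange(waterfall, swimmer, ncol = 2)
```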
DV-357 : Breaking Down Silos: Empowering Pharma Commercial Teams Through Integrated Data Insights
Hector Campos, DataPharma, LLC
Naquan Ishman, SAS Institute
Mike Turner, SAS
Tuesday, 4:30 PM – 4:50 PM, Location: Aqua Salon E
Effective data visualization is crucial for navigating the complexities of pharmaceutical and biotech operations. AI-driven analytics and integrated visualization transform large datasets into actionable insights, enhancing decision-making across commercial, customer engagement, and market access strategies. The problem is not complexity: it is integration. In pharmaceutical commercial operations, data fragmentation across prescriptions, market data, and CRM systems limits visibility, making it difficult to execute cohesive, data-driven engagement strategies. The challenge is not just data volume: it is fragmentation. Without a unified, real-time data strategy, commercial teams lack the visibility needed to execute effective engagement strategies. Disconnected insights carry a cost: without integration, executive teams operate in an information void, where physician engagement is driven more by intuition than by insight. The lack of timely, relevant, and actionable data results in missed opportunities for precision targeting, ineffective omnichannel coordination, and reduced market access efficiency. Integration, by contrast, is powerful: by addressing data quality and data augmentation, organizations can unlock a fully connected ecosystem that drives strategic decision-making. This approach: – Enables precision-targeted HCP engagement with AI-powered segmentation and next-best-action insights. – Enhances data accuracy, integration, and consistency across commercial operations. – Maximizes prescription lift while optimizing costs. – Seamlessly links rep interactions, digital marketing, and patient journeys. This paper explores how integrated data ecosystems, along with algorithmic bias identification and management, can transform siloed information into real-time, actionable insights. Attendees will gain practical strategies for integrating commercial data ecosystems to enhance customer engagement, improve market access, and drive better patient outcomes.
DV-381 : Breaking Barriers in Clinical Trials: Insights into Platform Designs
Chetankumar Patel, GSK
Tuesday, 2:15 PM – 2:25 PM, Location: Aqua Salon E
Platform trials are changing how clinical research is done by allowing multiple treatments to be tested at the same time within one trial, especially in oncology. These trials use a shared structure called a master protocol, which makes them flexible and efficient. A single control group can be used for multiple treatments, saving time and resources. One of the defining features of platform trials is their adaptive design. New treatments can be added as the trial progresses, and ineffective treatments can be discontinued based on interim data, allowing researchers to rapidly focus on promising therapies. As medical research moves forward, platform trials are becoming a key tool for improving the way new treatments are developed and tested. I will share the challenges we encountered while working on a platform trial, the strategies and steps we took to address and resolve them, and the success we achieved through these efforts and experiences.
DV-394 : Clinical Data Explorer: Transforming Clinical Data into Actionable Insights
Karma Tarap, BMS
Oriana Esposito, BMS
Maheshkumar Umbarkar, Bristol Myers Squibb Hyderabad
Ramnath Dhadage, Ephicacy Life Science and Analytics
Tamara Martin, Bristol Myers Squibb
Tuesday, 5:00 PM – 5:20 PM, Location: Aqua Salon E
Early Development operates in a dynamic, high-stakes space where rapid, data-informed decisions are essential. With a “go fast, kill fast” approach, innovative study designs, continuous data monitoring, and dose optimization strategies drive efficient decision-making. Clinical Data Explorer (CDEx) is a near real-time (NRT) platform providing aggregated clinical and biomarker data through a self-guided, customizable interface. Designed for OneClinical teams and other users, CDEx enables rapid visualization and analysis of safety and efficacy trends, supporting faster dose escalations and expansions. The foundation of CDEx was built on teal, an open-source R Shiny package created by Roche. CDEx integrates SDTM and SDTM+ data, ensuring consistent updates with automated refreshes of SDTM. Key benefits of CDEx include: – Rapid access to clinical data for faster, informed decision-making. – Real-time identification of safety and efficacy trends, optimizing dose selection. – Cross-functional collaboration, driving continuous improvements to meet evolving clinical needs. This session will showcase how CDEx enhances data exploration and accelerates critical decisions, ultimately improving patient outcomes.
Hands-On Training
HT-012 : SAS® Macro Programming Tips and Techniques
Kirk Lafler, sasNerd
Tuesday, 10:00 AM – 11:30 AM, Location: Aqua 310
The SAS® Macro Language is a powerful tool for extending the capabilities of the SAS System. This presentation teaches essential macro coding concepts, techniques, tips and tricks to help beginning users learn the basics of how the Macro language works. Using a collection of proven Macro Language coding techniques, attendees learn how to write and process macro statements and parameters; replace text strings with macro (symbolic) variables; generate SAS code using macro techniques; manipulate macro variable values with macro functions; create and use global and local macro variables; construct logical expressions; interface the macro language with the SQL procedure; store and reuse macros; troubleshoot and debug macros; and develop efficient and portable macro language code.
HT-115 : Route Sixty-Six to SAS Programming, or 63 (+3) Syntax Snippets for a Table Look-up Task, or How to Learn SAS by Solving Only One Exercise!
Bart Jablonski, yabwon
Quentin McMullen, Siemens Healthineers
Monday, 8:00 AM – 9:30 AM, Location: Aqua 310
It is said that a good programmer should be lazy. And what about a good programming teacher? We dare to say the same is true. This article will show that you can be lazy and also be able to teach an entire SAS course at the same time! The aim of the article is to present a variety of examples of how to do one of the most common data processing programming tasks, table look-up, in SAS (all flavours included, i.e. both SAS9 and SAS Viya). We don’t assess these methods from a benchmarking or performance perspective but rather present them as an intellectual puzzle. Our goal is to explore how much SAS syntax (statements, PROCs, functions, etc.) could be taught using only one exercise. If you are a fan of unorthodox SAS programming, curious to learn about the variety and flexibility of the SAS language, or an innovative SAS teacher – this presentation is for you!
HT-187 : The (ODS) Output of Your Desires: Creating Designer Reports and Data Sets
Louise Hadden, Cormac Corporation
Monday, 2:30 PM – 3:30 PM, Location: Aqua 310
SAS® procedures can convey an enormous amount of information – sometimes more information than is needed. Most SAS procedures generate ODS objects behind the scenes. SAS uses these objects with style templates that have custom buckets for certain types of output to produce the output that we see in all destinations (including the SAS listing). By tracing output objects and ODS templates using ODS TRACE (DOM) and by manipulating procedural output and ODS OUTPUT objects, we can pick and choose just the information that we want to see. We can then harness the power of SAS data management and reporting procedures to coalesce the information collected and present the information accurately and attractively.
HT-190 : AI Performing Statistical Analysis: A Major Breakthrough in Clinical Trial Data Analysis
Toshio Kimura, Arcsine Analytics
Siqi Wang, Arcsine Analytics
Weiming Du, Alnylam Pharmaceuticals
Songgu Xie, Regeneron Pharmaceuticals
Monday, 1:30 PM – 2:30 PM, Location: Aqua 310
Statisticians and statistical programmers in the pharmaceutical industry are dreaming about AI performing analysis, but this has not yet been demonstrated. Companies are only experimenting with AI-generated code. AI is not integrated into system workflows or connected to analysis infrastructure – until now. This paper will demonstrate a proof of concept (PoC) showing AI performing statistical analysis. The chat interface will be used to initiate, parameterize, and execute statistical analysis. To achieve this, we mapped out a detailed workflow of the current process, including the dialog between the statistician and statistical programmer, and then designed and developed an AI solution to operationalize this workflow. The solution uses Python to seamlessly integrate ChatGPT through APIs and runs SAS as the analysis engine. AI-generated responses are used in downstream processing, and with the AI system connected to SAS, it will directly execute the analysis and deliver results back to the user. This accomplishment represents a major breakthrough and a significant milestone in the use of Generative AI for clinical trial data analysis in the pharmaceutical industry.
HT-353 : What’s black and white and sheds all over? The Python Pandas DataFrame, the Open-Source Data Structure Supplanting the SAS® Data Set
Troy Hughes, Data Llama Analytics
Monday, 10:00 AM – 11:30 AM, Location: Aqua 310
Python is a general-purpose, object-oriented programming (OOP) language, consistently rated among the most popular and widely utilized languages, owing to its powerful processing and an abundant open-source community of developers. The Pandas library has become Python’s predominant analytic toolkit, in large part due to the flexibility and success of its principal built-in data structure, the Pandas DataFrame. Akin to the SAS® data set, the DataFrame stores tabular data in columns and rows (i.e., variables and observations, in SAS parlance). And just as built-in SAS procedures, functions, subroutines, and statements manipulate and transform SAS data sets to deliver analytic insight and business value, so, too, do built-in Pandas methods, functions, and statements deliver similar functionality. However, where the similarities end, the DataFrame inarguably outshines and outpaces the SAS data set. A data set supports only character data, numeric data, and the hash object, whereas a DataFrame additionally can contain built-in lists, sets, tuples, and dictionaries, complex data structures unsupported by SAS. Moreover, whereas the sorting, transformation, and analysis of a SAS data set might require a PROC SORT, then a DATA step, and then another procedure, Python can deliver this functionality in a single line of readable code through method chaining, a basic OOP syntactical design. This hands-on workshop is intended for SAS practitioners interested in exploring Python data transformation and analysis, and leverages US Census data and the Centers for Disease Control and Prevention (CDC) United States Diabetes Surveillance System (USDSS) dashboard to ingest, clean, transform, investigate, and analyze diabetes and obesity data.
HT-377 : Scoring Real-World Data Reliability for Clinical Investigations
James Joseph, EDA CLINICAL
Wednesday, 9:45 AM – 10:45 AM, Location: Aqua Salon AB
Real-world data (RWD) is often unstructured and inconsistent. Assessing its reliability is necessary before use in clinical investigations. Several methods exist to measure RWD reliability: conformance checks determine whether variables match expected formats, lengths, and values; completeness assessments quantify missing data and track accrual patterns over time; and data linkages are evaluated to detect orphan records and inconsistencies in relational data. Scores are assigned to each RWD source based on these measures of reliability. This paper presents an approach to applying these assessments to RWD datasets and producing structured outputs for decision-making. It shows that SAS procedures can be used to produce audit trails, automate assessments, and generate reports that can be used to compare RWD sources and determine their suitability for regulatory submissions. This work is based on the Structured Process to Identify Fit-for-Purpose Study Design and Data (SPIFD2) framework and insights from FDA and Duke-Margolis workshops on real-world evidence.
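Although the paper implements these assessments with SAS procedures, the base R sketch below conveys the shape of two of the checks, completeness and format conformance, on a hypothetical extract; the variables, the ISO 8601 date rule, and the scorecard layout are assumptions for illustration.

```r
# Illustrative completeness and conformance checks on a hypothetical RWD extract.
rwd <- data.frame(
  patient_id = c("A1", "A2", "A3", "A4"),
  birth_date = c("1970-02-14", "1985-13-40", NA, "1990-07-01"),
  diagnosis  = c("E11", NA, "I10", "E11"),
  stringsAsFactors = FALSE
)

# Completeness: percent non-missing per variable
completeness <- sapply(rwd, function(x) 100 * mean(!is.na(x)))

# Conformance: share of recorded birth dates that parse as ISO 8601 (YYYY-MM-DD)
parsed      <- as.Date(rwd$birth_date, format = "%Y-%m-%d")
conformance <- 100 * mean(!is.na(parsed[!is.na(rwd$birth_date)]))

# Simple reliability scorecard combining both checks
data.frame(
  check = c(paste0("completeness_", names(completeness)), "birth_date_iso8601"),
  score = round(c(completeness, conformance), 1)
)
```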
HT-397 : Trying Out Positron: New IDE for Statistical Programming
Phil Bowsher, RStudio Inc.
Monday, 4:00 PM – 5:30 PM, Location: Aqua 310
Positron is a next-generation data science IDE. As an extensible tool, it is built to facilitate exploratory data analysis, reproducible authoring, and statistical programming. Positron currently supports workflows in Python, R, or both, and is designed with a forward-looking architecture that can support other data science languages in the future. Posit/RStudio will be providing a hands-on session to try out Positron. This session will provide an opportunity to use the new IDE and see how it can empower stakeholders and statistical programmers. This talk will explore and discuss areas such as TLG creation with Python and provide examples for getting started.
HT-398 : Hands-on Training: eTFL Portal & TFL Designer Community
Bhavin Busa, Clymb Clinical
Wednesday, 8:00 AM – 9:30 AM, Location: Aqua Salon AB
With the introduction of CDISC Analysis Results Standards (ARS) and Analysis Results Data (ARD), the industry is moving toward greater standardization and automation in Tables, Figures, and Listings (TFLs). This hands-on training session will introduce participants to the eTFL Portal, a CDISC-hosted repository launched in October 2024 that provides examples and guidance for implementing ARS, and the Community version of TFL Designer, which was used to generate machine-readable mock-up shells and ARS metadata for eTFL artifacts. The eTFL Portal serves as a reference hub, offering standardized TFL templates, ARS metadata artifacts, and implementation examples to help organizations operationalize CDISC standards. Attendees will explore how to leverage these resources to align their workflows with ARS and ARD. The TFL Designer Community portion of the training will focus on using templates from the eTFL Portal to generate study-specific artifacts, including machine-readable TFL mock-up shells and ARS metadata. Participants will gain hands-on experience applying these artifacts to automate TFL development. By the end of this session, attendees will have practical insights into using the eTFL Portal as a reference for ARS and ARD adoption and the Community version of TFL Designer as a tool for generating study artifacts, streamlining TFL standardization and automation.
HT-399 : Introduction to SAS Viya
Jim Box, SAS Institute
Tuesday, 8:00 AM – 9:30 AM, Location: Aqua 310
Heard about Viya and not sure what that means? Want to see how SAS has evolved and what new capabilities exist that you might leverage? This Hands-on Workshop will be an overview of SAS Viya. We’ll explore the programming environment and see what’s new in coding assistance. We’ll also look at the Visualization capabilities and the automated model building. There will be a hands-on example for you to explore the data and determine the root cause of a medical procedure failure.
Leadership and Professional Development
LS-010 : A Quick Start Guide to Writing a Technical or Research Paper
Kirk Lafler, sasNerd
Monday, 1:30 PM – 1:50 PM, Location: Aqua Salon E
Writing a technical or research paper is an essential skill needed by academics, researchers, scientists, and many professionals as it permits the effective communication of literature research, methodologies, and findings to others in your profession and to the world. So, how do you begin writing a technical or research paper? This quick start guide will provide step-by-step instructions, along with a few valuable tips and examples, to help you effectively navigate the writing of a technical or research paper. Regardless of whether you’re already a skilled writer or just starting out, you will learn the elements and structure that are essential to effectively communicate your research and findings to others in a technical or research paper.
LS-043 : From Null to Notable: Creating a Brand on Social Media
Inka Leprince, PharmaStat, LLC
Tuesday, 8:30 AM – 8:50 AM, Location: Aqua Salon F
Navigating social media as a programmer requires a blend of technical knowledge, personal branding, and community engagement. The foci of this presentation are the best social media practices and strategies for enhancing visibility through tagging individuals/companies, establishing regular posting schedules, and leveraging fun and interactive content. Resources for free design software will be provided as well as practical tips for integrating branding elements in the creation of visually striking graphics. By following these guidelines, you can effectively promote yourself and engage your followers.
LS-082 : Building Resilient Teams
Patrick Grimes, Parexel
Sajin Johnny, Parexel
Monday, 4:00 PM – 4:20 PM, Location: Aqua Salon E
In the ever-evolving pharmaceutical landscape, characterized by Volatility, Uncertainty, Complexity, and Ambiguity (VUCA), the ability to build and maintain resilient teams is crucial for success in clinical research. This paper explores strategies for leaders to cultivate team resilience, enabling organizations to navigate challenges and seize opportunities in an unpredictable environment. We begin by examining the unique VUCA factors affecting the pharmaceutical industry, from regulatory shifts to global health crises. The discussion then investigates the key characteristics of resilient teams, emphasizing adaptability, diverse skill sets, and strong communication. Central to our exploration are practical leadership strategies for fostering resilience. These include creating a culture of continuous learning and implementing agile methodologies. The paper also addresses the critical balance between building resilience and maintaining team well-being, providing insights on preventing burnout in high-pressure scenarios. Additionally, we explore tools for measuring and improving team resilience, enabling leaders to track progress and implement targeted improvements. Finally, we look ahead, discussing how to future-proof teams by anticipating industry challenges and developing skills for emerging technologies. This forward-looking approach ensures that teams are not just resilient in the face of current challenges but are prepared for the evolving landscape of pharmaceutical research and development.
LS-093 : Bridging the Gap: Leadership of Statistical Programmers in Clinical Trials
Allison Covucci, Bristol Myers Squibb
Xiaohan Zou, BMS
Tuesday, 11:30 AM – 11:50 AM, Location: Aqua Salon F
In the complex environment of clinical trials, various functions collaborate to deliver more medications to more patients faster, more effectively, and more efficiently: from protocol development, clinical trial startup, data collection, and database lock, to submission to Health Authorities (HA), disseminating scientific information, and ensuring the accessibility of new therapies to the public. Each function often operates from its own perspective, without a comprehensive understanding of how and where the data they handle will be utilized to support various objectives through statistical analyses. This is where the role of statistical programmers becomes pivotal. As the final users of the data, they possess a unique vantage point that allows them to see the entire process from data collection to statistical analyses. This presentation highlights the critical leadership qualities that statistical programmers possess to effectively act as a bridge between various functions. Statistical programmers work closely with statisticians to ensure that the data collected is effectively transformed into meaningful statistical results. They engage proactively during the development of the protocol, eCRFs, and CRF completion guidance to ensure the collection of relevant and necessary data for effective clinical review and statistical analysis. They collaborate with data management to address data issues, with clinical teams to combine data analytics with the clinicians’ medical insights, and with regulatory teams to ensure regulatory compliance. Throughout the entire process, statistical programmers streamline processes, develop standards, introduce automation, push boundaries, and drive successful clinical trials.
LS-100 : Strategies for Thriving During Change and Uncertainty
Maria Dalton, Takeda Pharmaceuticals
Tuesday, 3:00 PM – 3:20 PM, Location: Aqua Salon F
Frequent change and uncertainty have become a fact of life for life science and health research professionals. Examples of changes are organizational reorgs and upheavals, new programming languages, emerging technologies such as AI/ML, new resourcing models and global disruptions such as the COVID-19 pandemic. This presentation will first describe general techniques that everyone can practice to build resilience, optimism and proactivity during periods of change and uncertainty. The presentation will then give advice to leaders on best practices for supporting and guiding their teams during change. The presentation will focus on practical advice that has worked for the author and will include references to helpful resources.
LS-118 : Unveiling Paradoxical Pathways: A Counterintuitive Compass for Strategic Decision-Making
Anbu Damodaran, Alexion Pharmaceuticals
Monday, 2:00 PM – 2:20 PM, Location: Aqua Salon E
In the dynamic and unpredictable business landscape, strategic decision-making often hinges on our ability to anticipate and navigate counterintuitive outcomes. This paper explores the fascinating realm of statistical paradoxes – situations where seemingly sound logic leads to unexpected results. We examine a diverse range of paradoxes, including Simpson’s Paradox, Berkson’s Paradox, the Monty Hall Problem, and others, demonstrating their profound implications for effective management across various domains. From market analysis and resource allocation to risk management and competitive strategy, these paradoxes expose hidden biases and potential pitfalls in conventional decision-making processes. By understanding how seemingly positive trends can reverse at the subgroup level (Simpson’s Paradox), how selection bias can distort our perception of reality (Berkson’s Paradox), and how seemingly simple choices can have surprising consequences (Monty Hall Problem), managers can gain a critical edge. This paper provides concrete examples of how these paradoxes manifest in real-world business scenarios and offers actionable strategies for mitigating their associated risks. By embracing a paradoxical mindset, business leaders can unlock new insights, avoid costly errors, and drive innovation in an increasingly complex and competitive environment. This paper offers a unique perspective, providing a practical guide to navigating complexity and making more informed, data-driven decisions in the broader management context.
LS-134 : Empowering the Next Generation of Professionals: Merck’s Approach to Rising Talent Engagement and Leadership Development in Statistical Programming
Aston Smith, Merck & Co., Inc., Rahway, NJ, USA
Sarah Alavi, Merck
Jeff Xia, Merck
Tarak Patel, Merck & Co.
Tuesday, 2:30 PM – 2:50 PM, Location: Aqua Salon F
According to research from 2023, 40% of business leaders think recent college graduates are not prepared to enter the workforce. To create a workplace environment supportive of the growth of rising professionals, the pharmaceutical industry must proactively implement targeted talent development initiatives that equip these emerging professionals with the necessary skills, knowledge, and leadership capabilities to drive innovation and success in statistical programming. This paper aims to examine Merck’s Rising Professionals Club, an innovative talent development program designed to engage and nurture rising professionals within our statistical programming organization. In addressing these challenges, the Rising Professionals Club offers a multi-faceted approach, equipping our rising talent with the skills and knowledge essential for success in both statistical programming and future leadership roles. Participants expressed appreciation for the program’s comprehensive approach and highlighted the value of the skill development and knowledge sharing components crucial to developing their statistical programming skills and leadership potential. The Rising Professionals Club at Merck represents a proactive approach to addressing the challenges of developing rising professionals in statistical programming and leadership roles within the pharmaceutical industry.
LS-139 : Active Social Engagement in Remote Working Environments
Christine Reiff, Ephicacy Consulting Group, Inc.
Tuesday, 2:00 PM – 2:20 PM, Location: Aqua Salon F
Many companies have moved to a Work From Home model, where employees spend little or no time in an office with their peers. This model works well for hiring and retaining talented staff, especially as fewer people are willing to relocate, but can lead to feelings of disconnection and isolation. Gone are the days when we would meet in the break room to catch up with our coworkers over a cup of coffee or enjoy the donuts someone brought to celebrate Friday. These interactions allowed us to see our coworkers as more than just resources, but also as human beings with joys and struggles of their own. Today we use tools like email, Teams, or Slack to communicate, but conversations tend to focus on work, and our coworkers turn into just a pair of hands behind a keyboard. Engaging employees on a social level builds team bonds and personal connections. It helps us to understand each other, and from that understanding comes increased compassion, kindness and collaboration. However, it can be difficult to get people to interact on a more personal level, especially for global teams. Presented here are some benefits and challenges to social engagement for remote workers, as well as methodologies to increase engagement.
LS-149 : Mastering Modern Leadership Through Authenticity, Empathy, Purpose, and Influence
Jyoti (Jo) Agarwal, Gilead Sciences
Monday, 4:30 PM – 4:50 PM, Location: Aqua Salon E
This paper explores the essential traits, styles, behaviors, responsibilities, and priorities that define effective leadership in today’s complex and rapidly evolving world. By focusing on core elements such as authenticity, empathy, and purpose, it examines how these qualities shape a leader’s ability to inspire, motivate, and connect with others. Drawing on established leadership models, such as those by Bill George and Simon Sinek, the paper delves into the importance of leading with a clear sense of purpose and cultivating genuine relationships with followers. It highlights how leaders who embrace authenticity and empathy foster a culture of trust, loyalty, and collaboration, thereby enhancing organizational performance and satisfaction. Additionally, the paper explores how these leadership qualities contrast with traditional approaches and emphasizes the value of vulnerability and compassion in overcoming challenges. Through practical insights, it also offers actionable steps that leaders can take to develop and refine these traits. Ultimately, the paper aims to show how modern leadership goes beyond position and power, focusing instead on the profound influence that stems from leading with heart and clarity of purpose. Outline: Introduction (overview of leadership in the modern context; the importance of authenticity, empathy, and purpose in leadership); Leadership Traits (authenticity, empathy, and purpose, and how these traits contribute to effective leadership); Leadership Styles (the authentic leadership model, including Bill George’s five characteristics, compared with traditional leadership models); Leadership Behaviors (practical behaviors such as leading with heart, cultivating relationships, and consistency; the role of self-discipline and emotional intelligence); Leadership Responsibilities and Priorities (building trust, fostering collaboration, and leading with integrity); Conclusion.
LS-156 : Essential Elements for an Effective New-Hire Training Handbook
Ingrid Shu, Merck
Xinhui Zhang, Merck
Monday, 5:00 PM – 5:10 PM, Location: Aqua Salon E
New hires entering the pharmaceutical industry possess the technical skills required to excel, but if industry-specific concepts and workflows are not adequately introduced during onboarding, the learning curve may remain steep. This results in new hires struggling to unlock their full potential and experiencing hardship in deriving meaning from their work. To reduce growing pains for both mentors and new hires, teams can adopt an informal training handbook that pays special attention to their team’s unique processes. This handbook does not replace official resources; instead, it complements them, using simplified language and presentation specifically tailored for programmers transitioning into their role for the first time. To facilitate effective onboarding, the handbook should be aligned with 3 key principles: centralization, relevance, and foundational knowledge. This paper delves into the importance of these best practices and showcases examples from a SharePoint site designed for new hires in the Early-Stage Oncology Analysis & Reporting statistical programming team we belong to. Developing accessible new-hire training materials is a rewarding endeavor for the entire team. When hiring junior employees, managers can be confident that this tool will effectively prepare them for their roles. For junior staff, this resource reduces confusion and provides clear direction, fostering empowerment and lowering turnover rates. Ultimately, both senior and junior programmers experience improved performance and productivity, as less time will be spent on training. Not only does it smoothly orient new hires, but it also serves as an ongoing valuable reference even as they grow more accustomed to their role.
LS-170 : Optimized Resourcing Strategies in Statistical Programming within CROs and Considerations for AI Integration in Resource Management
Vihar Patel, PPD, part of Thermo Fisher Scientific
Tuesday, 5:00 PM – 5:20 PM, Location: Aqua Salon F
In the dynamic environment of Contract Research Organizations (CROs), an optimized resourcing strategy is crucial for efficient and successful statistical analysis in clinical trials. The pharmaceutical industry is rapidly evolving, with accelerated drug development processes and overlapping submissions necessitating innovative approaches to manage timelines, quality, resources, and risks. Resourcing managers (RMs) play a pivotal role in this process, ensuring that the right people are allocated to the right projects at the right time and successful execution of clinical trials within CROs. Their strategic oversight and management of resources ensure that projects are delivered on time, within budget, and to the highest quality standards. This paper explores the various roles and responsibilities of RMs and their impact on the success of statistical programming teams within CROs. It covers strategies for balancing personal growth and business demands, determining priorities, skillset-based resourcing, mentoring, cross-regional collaboration, and reducing staff turnover. Additionally, it discusses the tools and potential AI integration that can optimize resourcing strategies and tasks for efficient resource management.
LS-232 : AI is coming for you: New Biometric Leadership in the era of Gen AI
Kevin Lee, Clinvia
Monday, 2:30 PM – 2:50 PM, Location: Aqua Salon E
In the contemporary landscape of technological advancements, the integration of Gen AI tools like ChatGPT has ushered in a new era of innovation and development. This paper will discuss how these cutting-edge technologies influence the biometrics department, and how biometric leadership can lead this innovation. The paper will start with an introduction to Gen AI, then present use cases of ChatGPT in biometric tasks such as coding, data analysis and exploration, content development, and more. It will also show how ChatGPT has contributed to a remarkable 40% increase in productivity among skilled workers, indicating its transformative power. To harness this transformative potential, biometric leadership must evolve beyond traditional management paradigms. The paper will explore how effective leaders are reimagining their roles as AI integration architects, focusing on three critical dimensions: system, process, and people. First, leaders must establish a robust system that seamlessly integrates Gen AI tools with existing statistical computing and regulatory compliance systems. Second, leaders need to redesign workflows to optimize the synergy between AI and traditional biometric methodologies, ensuring human oversight and regulatory compliance. Third, leadership must prioritize change management, fostering a culture of innovation while addressing concerns about AI integration and ensuring that AI upskills, rather than replaces, human expertise in biometrics. Finally, the paper outlines how biometric leadership can optimize the impact of Gen AI integration, ensuring sustainable transformation while maintaining current biometrics tasks, and offers valuable insights into the transformative possibilities and future directions of the symbiotic relationship between Gen AI and biometrics.
LS-248 : Embracing Continuous Learning in the Life Sciences Ecosystem
Iuliana Constantin, CoE Pharma
Tuesday, 4:00 PM – 4:20 PM, Location: Aqua Salon F
In our high-tech era, continuous training and upskilling have never been more crucial. We are fortunate to have a wide array of options available, whether you’re a trainee or an employer. These include online or in-person certifications, workshops, and webinars, some of which are free, while others come at a cost. Numerous professional non-profit and for-profit organizations offer these training opportunities. However, with the abundance of options, selecting the right training becomes a challenge. It’s essential to consider which training aligns most closely with the business needs of the company or the professional growth of an individual. Factors such as cost, training duration, quality, and expected outcomes play a significant role in this decision-making process. In my research on training, I have identified several options that could be beneficial to the life sciences ecosystem, including apprenticeships, co-ops, upskilling, and certifications. I will highlight the pros and cons of each method to aid in the selection process.
LS-249 : Leading Through Change: Motivating Programming Teams During Mergers and Integrations
Shefalica Chand, Pfizer, Inc.
Tuesday, 11:00 AM – 11:20 AM, Location: Aqua Salon F
When a larger pharmaceutical company acquires a smaller one, the integration process often brings challenges for the acquired company’s programming leadership. During this transition, it is crucial that the smaller company’s leaders maintain motivation, morale, and productivity among their teams. This paper discusses strategies to keep teams engaged, focused on critical projects, and aligned with the new organizational goals. A key strategy is transparent and empathetic communication. Leaders should address concerns, provide clarity on future direction, and help employees understand their role in the broader context of the merger. Building trust is paramount; leaders should consistently acknowledge the contributions of team members and reassure them about the value they bring to the new organization. Encouraging a sense of continuity is also essential. By emphasizing shared goals and outcomes, leaders can help employees see how their efforts directly contribute to the larger mission. Identifying “quick wins” early in the integration phase can build confidence and demonstrate the benefits of the merger, while maintaining focus on critical ongoing projects. Moreover, leaders should advocate for their teams, ensuring they have the resources and support needed to navigate the changes. Providing professional development opportunities and involving employees in decision-making processes can also foster a sense of ownership and alignment with the new corporate culture. This paper outlines practical leadership strategies to maintain motivation and engagement during mergers for a smooth integration process. By creating a positive, supportive environment, leaders can help their teams remain focused and productive, driving success through the transition and beyond.
LS-266 : Rising Above Artificial Intelligence: Feeling Confident With Your Human Intelligence
Priscilla Gathoni, Wakanyi Enterprises Inc.
Monday, 3:00 PM – 3:20 PM, Location: Aqua Salon E
In a rapidly advancing world dominated by technological innovation, AI is often seen as a threat to human intelligence and job security. However, I argue that rather than replacing human capacity, AI can complement and elevate human skills. By embracing AI, humans can focus on what truly sets them apart: creativity, emotional intelligence, and the ability to think critically. Despite AI’s impressive capabilities, the uniqueness of human thought cannot be replicated. I propose the value of investing in our innate creative energy and the concept of “cosmic data”: the wealth of unique human experience and intuition AI cannot replicate. Through reskilling, rebranding, and overcoming the fear of AI, I offer strategies for individuals to adapt and thrive alongside technological advancements. By reimagining the role of humans in the workforce and embracing AI, we can ensure a harmonious and sustainable future where both humans and machines coexist, complementing each other’s strengths. You are not going to lose your job to AI but to someone who embraces AI. As humans, each day is a new life, a new beginning with new thoughts, words, and deeds. Human Intelligence goes above all things, and our ability to think separates us from the animal kingdom and any form of AI. Being superior beings, we have the ability to discern and accept the creations of our minds instead of being fearful. Reskilling and rebranding will be the way to face AI with fervency and zeal.
LS-285 : Programming Challenges in Master Protocols
Vijaya Jonnalagadda, Revolution Medicines
Tuesday, 4:30 PM – 4:50 PM, Location: Aqua Salon F
Master protocols are becoming increasingly vital in clinical trials, as they facilitate the simultaneous evaluation of multiple treatments or hypotheses within a single, unified trial framework. However, the adoption of master protocols presents unique programming challenges that must be carefully considered by management teams. In clinical trials, master protocols allow for the concurrent testing of several treatments or strategies, optimizing resource use and enabling real-time adjustments based on interim analyses. This paper explores the statistical programming challenges that arise during the design, execution, and analysis phases of such trials, including the development of dynamic programming and the integration of adaptive designs. The paper also highlights the role of statistical programmers in addressing issues such as data consistency, the complexity of managing data from single or separate databases, flexibility in programming, and compliance with regulatory guidelines: issues that are magnified in master protocol designs. The paper concludes by offering recommendations for statistical programming best practices, emphasizing the importance of automation, robust validation, and thorough documentation, and discusses how current and future leaders can effectively navigate these complexities to mitigate risks and improve efficiency, ensuring the successful implementation and scalability of master protocol trials. Furthermore, the paper highlights how adopting these best practices can significantly improve professional development by providing current and future programming leaders with the knowledge to enhance their expertise and stay abreast of industry advancements.
LS-291 : Middle Manager’s Playbook: How to Build and Lead a Strong Remote Team
Yuka Tanaka-Chambers, Phastar
Tuesday, 10:00 AM – 10:20 AM, Location: Aqua Salon F
In a global company, building a strong team in a remote environment presents both unique opportunities and challenges. Remote work allows access to a wider pool of global talent, and employees can benefit from working in the safe and comfortable environment of their homes. However, motivating employees, identifying disengagement, fostering genuine connections, and cultivating a sense of belonging are among the key challenges leaders face in remote settings. Without the traditional face-to-face interactions of a shared workspace, these efforts require intentional strategies, strong communication, and a deep understanding of team dynamics. In this presentation, I will share practical solutions, including the effective use of tools like Microsoft Teams, hosting engaging virtual events, encouraging meaningful small talk, and adopting key leadership practices. This session aims to equip leaders and managers with actionable insights to build resilient, motivated, and connected teams in a virtual landscape.
LS-315 : Follow the Yellow Brick Road: Finding the Critical Path for a Study’s Lifespan
Jake Gallagher, Catalyst Clinical Research, LLC
Tuesday, 1:30 PM – 1:50 PM, Location: Aqua Salon F
Imagine how confusing the Wizard of Oz would have been if Dorothy didn’t have the yellow brick road to follow. What if she found the Wiz before her other friends? She might not have gotten back home to Kansas! When a study is assigned to a lead programmer, a plethora of tasks has now been placed in their lap. Documents to fill out, timelines to create, programs to make (oh my!) – all these items have a due date, but in what order should they be done? Should documents be made first, then programs to follow suit? Or should outputs be the focus, documenting as you go? Without a critical path to follow, a lead programmer can easily become overwhelmed when looking at their to-do list, falling into roadblocks and inefficiencies. A critical path is the most optimized task execution list. This paper outlines how to create the critical path for a study’s lifespan to streamline task execution, reduce potential bottlenecks and organize an otherwise overwhelming task list to have the most efficient path toward study completion.
LS-360 : Data management and biostatistics synergy: how to achieve and what can be expected
Diana Avetisian, IQVIA
Tuesday, 10:30 AM – 10:50 AM, Location: Aqua Salon F
The conduct of clinical trials can be a complicated process, especially when it comes to data collection and statistical analysis. To achieve high-quality results, meet all expectations, and have a successful study submission, all functions have to work in sync. Each group has its own rules, limitations, and standards to consider, so to succeed we have to find a way, and the right time, to communicate and help one another. Specifically, when it comes to data collection, data management and biostatistics have to work together and understand each other’s needs. Obviously, the key to the best results is effective communication, but is it enough? What background knowledge should we have to understand the other function and speak the same language? Even though both departments are actively involved in the data collection process, the responsibilities are different, and it can be challenging to understand each other’s logic and language. Synergy between the groups can help achieve tremendous progress in data collection, reduce the number of data issues, save study budget, prevent questions during submission, and reduce stress levels. So how can all of us help reach such an ambitious but nevertheless necessary goal? To answer these questions, it is useful to have some guidance for understanding each other’s needs; that is why the author discusses: the process of database creation and input from different departments; common questions from both groups; tips and tricks for understanding the requirements of data collection and possible downstream effects ahead of time; and mutual expectations between departments for efficient collaboration.
LS-365 : Proc LIFE-REFLECT: Surviving the Leadership Journey
Steve Nicholas, Atorus Research
Tuesday, 9:00 AM – 9:20 AM, Location: Aqua Salon F
The journey from intern to leader in the life sciences industry is rarely smooth – it’s often filled with challenges, unexpected obstacles, and invaluable lessons. The path to success lies beyond technical expertise: the most successful leaders develop a deep understanding of people, culture, and the power of connection. This paper explores the critical skills needed to navigate leadership effectively, from fostering trust and accountability to creating a culture of collaboration and engagement. Great leaders don’t just lead; they mentor. Great leadership requires a balance of strategic vision and personal connection, and the ability to create high-performing, cohesive teams while minimizing distractions and prioritizing continuous self-growth. By investing in meaningful relationships, developing communication skills, and embracing mentorship, leaders empower not only themselves but also those around them. Reflecting on key lessons learned throughout my own journey, this paper offers practical insights into building a strong leadership foundation while discussing actions and mindsets that contribute to becoming an essential team member. It highlights the importance of viewing challenges from multiple perspectives, leveraging emotional intelligence, exploring new opportunities, and creating an environment where individuals and teams can thrive. Wherever you are on your professional journey, these insights will help shape a more intentional and impactful leadership style. So, if you’re ready to level up, this one’s for you!
Metadata Management
MM-023 : Enhancing Dictionary Management and Automation with NCI EVS: A Deeper Dive for the CDISC Community
Anthony Chow, CDISC
Tuesday, 4:30 PM – 5:20 PM, Location: Indigo 204
For many in the CDISC community, the National Cancer Institute’s Enterprise Vocabulary Services (NCI EVS) may primarily serve as a source for downloading CDISC Controlled Terminology (CT) files or setting submission values. Yet, this view underestimates the critical, long-standing partnership between CDISC and NCI EVS in developing semantic frameworks that underpin our standards. Through advanced semantic management tools, NCI EVS maintains CDISC’s code lists and controlled terms in a structured and automated environment, far beyond the spreadsheet-based systems of the past. Thanks to a comprehensive API, users can readily access these resources without navigating complex backend processes. This paper will explore two core NCI EVS tools that can support and enhance CDISC workflows: the NCI EVS Explorer, an interactive tool for concept browsing, and the API driving this robust application. We will highlight practical use cases that illustrate their value to standards professionals. For instance, in curating CDISC Biomedical Concepts (BCs), NCI EVS allows users to group concepts into meaningful taxonomies, facilitating systematic concept curation. Another powerful application is for the CT Relationships deliverables. We use these NCI EVS tools to validate content, such as code lists and terms, ensuring consistency and accuracy across CDISC domains in scope. By utilizing these freely available resources, standard implementers can significantly improve the precision, efficiency, and automation of terminology management, streamlining processes for enhanced compliance and data integrity.
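As a hedged sketch of what programmatic access can look like in R (the base URL, endpoint path, and concept code below are assumptions to be verified against the published EVS REST API documentation; this is not content from the paper):

# Assumed endpoint shape for the NCI EVS REST API; verify against the official
# API documentation before relying on it.
library(httr)
library(jsonlite)

evs_concept <- function(code, terminology = "ncit") {
  base <- "https://api-evsrest.nci.nih.gov/api/v1"                  # assumption
  resp <- GET(sprintf("%s/concept/%s/%s", base, terminology, code))
  stop_for_status(resp)
  fromJSON(content(resp, as = "text", encoding = "UTF-8"))
}

# str(evs_concept("C66731"), max.level = 1)   # C66731: illustrative NCIt code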
MM-083 : Change the way you think of Codelist – Optimize managing Organization Controlled Terminology with a meta-model and its features to manage Codelist in MDR
Kairav Tarmaster, Sycamore Informatics
Wednesday, 9:00 AM – 9:20 AM, Location: Indigo 204
Each business function (Data Collection, SDTM, ADaM) in a Data Standards team governs and releases the metadata for its respective data model very well. To realize the full benefits of Controlled Terminology (CT), the various business functions (Data Collection, SDTM, ADaM, and ARS) must collaborate to deliver an Organization CT. The aim should be to maximize the use of the NCI dictionary with extensions across studies, eliminate redundancy with unique codelist definitions used across data models, help standards and study teams assess the impact of changing a codelist, assist in collecting clean data with codelist subsets and extensions, establish conformant mappings between Data Collection and Submission for metadata-driven data transformation, identify extended code terms in submission deliverables (define.xml) in studies, and have the flexibility to extend the CT as new models (USDM) are implemented in your organization. A well-defined meta-model helps meet the objective of effective, collaborative CT management and eases governance with technology solutions such as an MDR. This paper describes the aspects of defining the meta-model and lists the processes for CT management in an organization.
MM-085 : Use Gen AI to program rules in R & Python to generate and validate metadata for data standards and study specifications.
Priyanka Sawant, Sycamore Informatics
Kairav Tarmaster, Sycamore Informatics
Wednesday, 9:45 AM – 10:05 AM, Location: Indigo 204
A common and frequent challenge with metadata is that it needs to be entered or created, and a human making these entries is time-consuming and error-prone. A simple answer is that the metadata is created and validated based on rules well defined by the industry (CDISC and regulatory conformance rules), a company, a specific business function, or a system. For example, for a new CRF, study team members design and finalize the CRF structure and enter essential, minimal metadata for the CRF items. The remaining metadata necessary to use the CRF in an EDC system can be automated by an R or Python application based on business rules. Similarly, define.xml specifications are mostly reverse engineered from datasets and then manually updated. The define.xml is validated for conformance to CDISC and FDA rules, and errors and warnings are reported. Authoring the define.xml specifications within the framework of rules, with real-time feedback and early detection of errors, will save time and reduce validation cost by avoiding multiple iterations. Likewise, the overall metadata workflow is enhanced with plug-and-play applications that fit in a Metadata Repository (MDR) context. Companies are envisioning and expanding the capabilities for metadata generation, validation, and conformance, as well as gaining insights with innovative applications in a low-code/no-code environment to improve the end-user experience of teams governing standards and study metadata. This paper shows the use of Gen AI to develop R and Python applications, with examples, to evolve our thoughts on handling and automating metadata.
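As a hypothetical illustration of the kind of rule such an application might encode (the column names and the rule itself are assumptions for illustration, not the authors' implementation), a simple variable-level conformance check in R could look like this:

# Hypothetical rule: every variable in a define.xml specification needs a
# non-blank label and a recognized data type. Column names are assumptions.
library(dplyr)

check_variable_metadata <- function(spec) {
  valid_types <- c("text", "integer", "float", "date", "datetime")
  spec %>%
    mutate(
      missing_label = is.na(Label) | trimws(Label) == "",
      invalid_type  = !(tolower(DataType) %in% valid_types)
    ) %>%
    filter(missing_label | invalid_type) %>%
    select(Dataset, Variable, Label, DataType, missing_label, invalid_type)
}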
MM-132 : Accelerating Data Discovery and Governance: Unlocking Insights with Metadata Management, Data Catalogs, and LLM Integration for Streamlined Regulatory Approval in Clinical Trials
Pritesh Desai, SAS
Samiul Haque, SAS Institute
Tuesday, 3:00 PM – 3:20 PM, Location: Indigo 204
Data is the fuel that drives analytical decision engines. Both raw and curated data must first be discovered, prepared, managed, and made accessible to the appropriate users in an analysis-ready format. Effective metadata management and comprehensive data catalogs form the foundational pillars of agile data governance, and accessible, transparent data catalogs will be a cornerstone of digital transformation strategies. A modern SCE platform significantly accelerates metadata management and analysis, drastically reducing the time and effort analysts spend on data discovery and preparation. Data catalogs aim at managing metadata to deliver a concrete classification of data across your organization so it can be easily accessed and consumed by businesspeople (self-service). Information about the localization of data is key: examples include where it comes from (lineage), how datasets can be used, and whether they are recommended or certified by other users. This presentation gives you a glimpse of out-of-the-box solutions that provide enterprise data access and integration between SAS and third-party databases. These capabilities enable you to read, write, and update data no matter what native databases or platforms you use. In addition, this presentation sheds light on how LLMs can be leveraged to rapidly search through cataloged metadata; integrating Large Language Models (LLMs) at every step of data analysis may become a pivotal enabler of automation in clinical data processing.
MM-147 : The Many Use Cases of Standardized Data and Metadata
Sanjiv Ramalingam, Biogen Idec
Tuesday, 4:00 PM – 4:10 PM, Location: Indigo 204
Study Data Tabulation Model (SDTM) datasets are predominantly Clinical Study Report (CSR) driven and primarily created to support analysis dataset creation, and the data flow is typically unidirectional. The SDTM group has pioneered a process that enables creation of SDTM datasets within days of First-Patient-In (FPI), with automatic refreshes enabling wider use of standardized data across the organization. With this new way of working, the SDTM group is able to serve and enable efficiencies across multiple functions, such as Data Management, Data Standards and Governance, Biomarker, and Clinical Operations, rather than just Statistical Programming. The use cases for each of these groups are discussed in this paper.
MM-227 : Does SDTM Validation Really Require Double Programming?
Sunil Gupta, Gupta Programming
Tomás Sabat Stofsel, Verisian
Tuesday, 2:00 PM – 2:20 PM, Location: Indigo 204
While double programming for SDTM validation is the gold standard, it may be outdated given advances in technology! The key purpose of double programming is not to duplicate efforts but to assure independent programming that reproduces the same SDTMs based on interpretations of common specifications and raw data. The expected outcome is to prevent false positive and false negative results as well as to minimize the likelihood of coding errors or biases. The FDA requires that sponsors validate SDTMs, ADaMs, and TLGs, but does not prescribe how to validate the results, which means that 100% double programming is not required. In fact, the FDA has guidelines for applying a risk-based approach to the validation process and for the use of AI to support clinical trial development. This article will explore the concept of double programming and alternative, technology-based methods to validate SDTMs, as well as sponsor oversight management methods for CRO deliverables.
MM-265 : AI Empowered Metadata Governance
Prasoon Sangwan, TCS
Wednesday, 8:00 AM – 8:20 AM, Location: Indigo 204
Efficient metadata governance is pivotal for seamless digital data flow, ensuring streamlined data collection, analysis, and standardized transformations. It cultivates standardization, harmonization, reusability, and automation. Despite the evolving importance of metadata, complex trial designs often complicate governance, making it an intricate process. This leads to siloed standards, and protocol-specific nuances further exacerbate the challenges, justifying deviations from established norms. This paper explores the transformative role of Artificial Intelligence (AI) and Machine Learning (ML) in enriching metadata management. AI and ML present a paradigm shift through metadata discovery: creating sponsor-aligned metadata standards, facilitating rapid transitions between multiple versions of standards, and maintaining the lineage between standards. The discussion encompasses the ease of maintenance of standards, generation of traceability and transformation specifications, identification of redundancies, and accommodation of protocol-specific changes within the defined organizational hierarchy. It underscores the invaluable benefits of AI/ML in elevating metadata quality, minimizing complexities, and boosting reusability, traceability, and automation.
MM-277 : Automating Define-XML Updates: A SAS-Based Framework for Submission Readiness
Qiong Wei, BioPier LLC (a Veramed Company)
Lixin Gao, BioPier LLC (a Veramed Company)
Tuesday, 2:30 PM – 2:50 PM, Location: Indigo 204
The data definition file (define-xml), which describes the metadata of the submitted clinical data, is one of the most critical components of electronic data submission for regulatory review. Therefore, it is essential to ensure that the Define-XML file is consistently aligned with the latest datasets and complies with the most current standards. In practice, however, various factors can lead to changes in datasets after the Define-XML has been created. Under these circumstances, updating an existing Define-XML becomes a crucial but complex and time-consuming task in preparing metadata for regulatory submissions. This paper introduces a SAS-based framework designed to automate the Define-XML update process, providing an efficient and reliable solution for metadata management. The framework utilizes SAS macros to extract metadata from the existing Define-XML, integrate it with updated dataset specifications, and generate a revised Define-XML that adheres to the latest standards. This approach ensures accurate metadata updates while reducing manual effort, minimizing errors, and accelerating the overall process.
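The framework itself is implemented as SAS macros; purely to illustrate the extraction step in open-source terms (a sketch under the assumption of a standard ODM-based define.xml, not the authors' code), ItemDef metadata can be pulled into a data frame with the {xml2} package:

# Illustration only (the paper's framework is SAS macro based): read ItemDef
# attributes from an existing define.xml into a data frame.
library(xml2)

extract_itemdefs <- function(path) {
  doc <- read_xml(path)
  xml_ns_strip(doc)                       # drop ODM/def namespaces for simple XPath
  items <- xml_find_all(doc, ".//ItemDef")
  data.frame(
    OID      = xml_attr(items, "OID"),
    Name     = xml_attr(items, "Name"),
    DataType = xml_attr(items, "DataType"),
    Length   = xml_attr(items, "Length"),
    stringsAsFactors = FALSE
  )
}

# itemdefs <- extract_itemdefs("define.xml")   # path is illustrative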
MM-384 : The Model Maketh the Metadata
Carlo Radovsky, Independent Consultant
Tuesday, 1:30 PM – 1:50 PM, Location: Indigo 204
All submission dataset automation solutions begin with a metadata reference, be it the published standards, spreadsheets, or a robust repository. To date, most organizations have looked to CDISC materials as a starting point, quickly finding that, as published, these metadata sources are insufficient both in content and model, leaving everyone to build proprietary solutions. This paper proposes a more robust representation, establishing a comprehensive model that supports both robust internal standards and study variability, while also streamlining version management, upversioning, and end-to-end processes.
MM-401 : Enhancing Health Equity Outcomes through Comprehensive Data Collection of Marginalized Populations including Sexual Orientation, Gender Identity and Intersex Status (SOGI)
Donna Sattler, SGM Alliance
Wednesday, 8:30 AM – 8:50 AM, Location: Indigo 204
The collection of comprehensive data from marginalized populations is essential for advancing health equity in clinical trials. Historically, clinical trials have underrepresented these populations, leading to disparities in health outcomes and access to medical advancements. This abstract aims to evaluate the impact of broader data collection on health equity outcomes by incorporating diverse subject characteristic data, such as Sexual Orientation, Gender Identity and Intersex Status, from marginalized groups. It will also demonstrate how real-time data visualizations can aid in managing your trial operations. A data-driven approach will enable clinical trial teams to execute and report out their diversity plans faster.
R, Python, and Open Source Technologies
OS-007 : Benefits, Challenges, and Opportunities with Open Source Technologies in the 21st Century
Kirk Lafler, sasNerd
Ryan Lafler, Premier Analytics Consulting, LLC
Joshua Cook, University of West Florida (UWF)
Stephen Sloan, Dawson D R
Anna Wade, Emanate Biostats
Wednesday, 8:00 AM – 8:20 AM, Location: Aqua Salon C
Organizations around the globe are facing a paradigm shift in the types of software available, the quantity and availability of software technologies, including open source, and the creative ways these many technologies live, play, and thrive in the same sandbox together. We’ll explore the many benefits, challenges, and opportunities with open source technologies in the 21st century. We’ll also describe the challenges facing user communities as they find ways to integrate open source software and technologies, handle compatibility and vulnerability issues, address security limitations, manage intellectual property and warranty issues, and address inconsistent development practices. Plan to join us for an informative presentation about the benefits, challenges, and opportunities confronting open source user communities around the world, including the application and current state of Python, R, SQL, database systems, cloud computing, software standards, and the collaborative nature of community in the 21st century.
OS-024 : An End-to-End Workflow for TFL Generation using R: MMRM Applications and Comparative Insights with SAS
Kai Lei, Vertex Pharmaceuticals, Inc.
Jiaqiang Zhu, Vertex Pharmaceuticals, Inc.
Margaret Huang, Vertex Pharmaceuticals, Inc.
Monday, 8:30 AM – 8:50 AM, Location: Aqua Salon C
Tables, Figures, and Listings (TFLs) are essential for presenting clinical trial data in regulatory submissions, ensuring transparency and compliance. Traditionally, TFLs are created using SAS, a trusted industry standard. However, the rise of R as an open-source, flexible, and cost-effective alternative is reshaping the landscape of clinical trial programming. This paper presents an end-to-end workflow for TFL creation in R, using Mixed-Effects Models for Repeated Measures (MMRM) as a representative application example. Our workflow encompasses data preparation, model implementation, TFL generation, and regulatory-compliant output formatting, leveraging R packages like dplyr, ggplot2 and r2rtf. A side-by-side comparison of R’s mmrm package with SAS’s PROC MIXED highlights the differences in syntax, flexibility, and visualization capabilities, demonstrating R’s advanced customization options for graphics and layout. This paper provides practical guidance for clinical trial programmers and statisticians interested in adopting R for TFL creation and offers recommendations for a smooth transition from SAS. Future directions include expanding R’s capabilities through standardized functions for clinical reporting and exploring additional applications such as count data models.
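As a minimal sketch of the R side of that comparison, the {mmrm} package fits such a model in a single call; the example below uses the demonstration dataset shipped with the package rather than any study data, and the rough SAS analogue would be PROC MIXED with a REPEATED statement and TYPE=UN.

# Minimal MMRM fit with the {mmrm} package, using its bundled example data.
library(mmrm)

fit <- mmrm(
  formula = FEV1 ~ ARMCD * AVISIT + us(AVISIT | USUBJID),  # unstructured covariance
  data    = fev_data
)
summary(fit)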
OS-048 : Unlocking Insights: Comparative Analysis of CRF Changes with R Shiny
Mayank Singh, Johnson and Johnson MedTech
Monday, 9:00 AM – 9:20 AM, Location: Aqua Salon C
In the clinical research industry, Case Report Forms (CRFs) undergo multiple updates throughout the course of study execution. It is the responsibility of the Biostatistics and Programming group to ensure that the downstream programs remain synchronized with any modifications to the CRFs, including the addition of forms, updates to fields, and changes to code lists. However, identifying these CRF updates is a manual and labor-intensive process, often prone to errors that can result in overlooking critical changes. This paper presents an R Shiny application designed to facilitate the comparison of various versions of CRF metadata, whether within a single study or across studies. The application generates a comprehensive comparison report that is accessible via an R Shiny dashboard or available for download in Excel format. Additionally, it enables the team to leverage programming code from similar studies, enhancing efficiency in their current research.
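A hedged sketch of the comparison logic such an application might wrap (the column names and keys below are assumptions for illustration, not the app's actual schema):

# Hypothetical schema: each metadata version is a data frame with Form, Field,
# Label, and Codelist columns. Returns added, removed, and changed fields.
library(dplyr)

compare_crf <- function(old, new, keys = c("Form", "Field")) {
  list(
    added   = anti_join(new, old, by = keys),
    removed = anti_join(old, new, by = keys),
    changed = inner_join(old, new, by = keys, suffix = c(".old", ".new")) %>%
      filter(Label.old != Label.new | Codelist.old != Codelist.new)
  )
}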
OS-049 : Reach for R Low Hanging Fruit for Faster Results
Sunil Gupta, Gupta Programming
Tuesday, 8:30 AM – 8:50 AM, Location: Aqua Salon C
With the pharma industry expanding packages within pharmaverse, is your organization ready to reach for R’s low-hanging fruit for faster results? This presentation will show how R packages and functions are ‘out of the box’ production ready for fast results, without having to invest in siloed SAS resources or expenses. While some R mentoring is required to understand key differences between SAS and R fundamental concepts, smarter organizations can get a jump start on their submission-ready deliverables. R packages in data management and graphics enable ‘plug-n-play’ queries and plots. The R packages used to create Clinical Study Report (CSR) tables and listings offer R script templates that can be customized. CDISC submission support packages include those for the define.xml and tidyCDISC. Finally, popular pharmaverse packages rtables, admiral, and teal will also be featured.
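To give a flavour of the ‘out of the box’ experience described here, a minimal {rtables} layout (shown with a small made-up dataset, not material from the presentation) can produce a demographics-style summary in a few lines:

# Small, self-contained taste of {rtables}; the data frame is illustrative only.
library(rtables)

adsl <- data.frame(
  ARM = rep(c("Drug A", "Placebo"), each = 5),
  AGE = c(34, 45, 52, 61, 29, 38, 47, 55, 41, 63)
)

lyt <- basic_table() |>
  split_cols_by("ARM") |>
  analyze("AGE", afun = mean, format = "xx.x")

build_table(lyt, adsl)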
OS-076 : TLFQC: A High-compatible R Shiny based Platform for Automated and Codeless TLFs Generation and Validation
Chen Ling, AbbVie
Yachen Wang, AbbVie
Monday, 11:00 AM – 11:20 AM, Location: Aqua Salon C
SAS-R integration and transition is a trending topic in the industry, and more and more companies are incorporating R programming. Even though many R functions have been developed, generating Tables, Listings, and Figures (TLFs) still requires substantial repetitive work, and the use of R for validating SAS-generated TLFs is very limited. It can also be hard for SAS users with no R experience to code directly in R. Taking advantage of the open-source software R, we are able to automate the process and make it easy to use for everyone, hence improving efficiency. In this paper, we will introduce our application demo, TLFQC, which leverages the R Shiny framework to automate the generation and validation of TLFs, with the following features: 1. TLFQC can generate multiple TLFs simultaneously once the required parameters are entered; 2. users can customize the TLF report with or without wording from the TOC, ensuring the final output adheres to required specifications; 3. the in-app data manipulation capabilities streamline data review processes; 4. the validation feature provides a comprehensive comparison between R- and SAS-generated TLFs, with an interactive dashboard for intuitively checking validation results as well as overall and detailed reports for enhanced QC; 5. TLFQC supports high compatibility, enabling other developers to include their own TLF-generating and quality control functions. Along with these features, we will also introduce the basic structure of R Shiny and how to build the skeleton of the app with the {shinydashboard} package. Code and examples will be shared in this paper, and we will walk you through every detail of developing an interactive Shiny app.
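As a hedged sketch of the kind of {shinydashboard} skeleton the paper refers to (a generic starting point with placeholder inputs, not the TLFQC source):

# Generic shinydashboard skeleton of the sort described above; not TLFQC itself.
library(shiny)
library(shinydashboard)

ui <- dashboardPage(
  dashboardHeader(title = "TLF generation demo"),
  dashboardSidebar(
    sidebarMenu(
      menuItem("Generate TLFs", tabName = "generate"),
      menuItem("Validation",    tabName = "validate")
    )
  ),
  dashboardBody(
    tabItems(
      tabItem("generate",
              fileInput("adam", "Upload ADaM dataset"),
              actionButton("run", "Run")),
      tabItem("validate", tableOutput("compare"))
    )
  )
)

server <- function(input, output, session) {
  output$compare <- renderTable(NULL)   # placeholder for comparison results
}

shinyApp(ui, server)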
OS-077 : Comparing SAS® and R Approaches in Reshaping data
Yachen Wang, AbbVie
Chen Ling, AbbVie
Tuesday, 10:00 AM – 10:10 AM, Location: Aqua Salon C
Data reshaping is a fundamental process in data management. Both SAS® and R offer robust capabilities to transform data between wide and long formats, enabling researchers to manipulate and reorganize datasets for effective analysis. This paper compares SAS® PROC TRANSPOSE and R’s pivot_longer() and pivot_wider() from the {tidyr} package for data reshaping. Beyond their basic data transformation capabilities, both tools can customize variable names, handle duplicates, and deal with unused variables. SAS® is preferred when handling labels in data reshaping, due to its unique labeling feature. R, on the other hand, offers additional flexibility with its useful options, enhancing the ability to manage more complex data scenarios. Practical example code in SAS® and R is provided to illustrate these features and differences. These examples will help users understand the application of each method and demonstrate the strengths and limitations of each approach. By examining these approaches and providing illustrative examples, we offer insights on selecting the appropriate tool for various data transformation tasks in data analysis.
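As a minimal sketch of the R side only (a generic, made-up example rather than the paper's code), reshaping a small lab-style dataset from long to wide and back looks like this:

# Generic long <-> wide reshaping with {tidyr}; the data are illustrative only.
# A rough SAS analogue would use PROC TRANSPOSE with BY USUBJID and ID PARAM.
library(tidyr)

long <- data.frame(
  USUBJID = c("01", "01", "02", "02"),
  PARAM   = c("ALT", "AST", "ALT", "AST"),
  AVAL    = c(30, 25, 41, 37)
)

wide <- pivot_wider(long, names_from = PARAM, values_from = AVAL)
back <- pivot_longer(wide, cols = c(ALT, AST),
                     names_to = "PARAM", values_to = "AVAL")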
OS-094 : Unlocking Success: Lessons from Building R-Based Statistical Packages in Pharma
Sydney Hyde, Bristol Myers Squibb
Yirong Cao, Bristol Myers Squibb
Monday, 4:00 PM – 4:20 PM, Location: Aqua Salon C
Packages are essential tools that enhance R programming’s capabilities by incorporating custom functions, datasets, and fully compiled code. At BMS, we develop two types of statistical packages: one for exploratory statistical analysis and another for Health Authority submissions. Through our experience in building R-based statistical packages, we have identified several key lessons. Firstly, ensuring the accuracy, traceability, and reproducibility of internal and submission packages is critical. This includes rigorous unit testing and validation of dependency packages, ensuring compliance with regulatory requirements. Secondly, effective version control is vital in a collaborative environment. While we use corporate GitHub for version control during development, it is not used for submissions. Instead, we provide R Markdown (Rmd) files with the R code, knitted HTML files as logs, and RTF files with results for clear documentation. Thirdly, handling data securely is a significant challenge. Packages must manage data appropriately, including secure reading and writing from the Statistical Computing Environment, to maintain data integrity and confidentiality. Fourthly, a modular design approach, coupled with thorough documentation, is critical for maintenance, collaboration, and submission processes, facilitating easier updates and clearer understanding among team members. Lastly, establishing a robust internal R user community is essential. This community enhances collaboration between statisticians and programmers, supports knowledge sharing, and improves the overall development, maintenance, and utilization of statistical packages. By addressing these areas, BMS ensures the development of robust, compliant, and efficient R-based statistical packages that meet the needs of both internal stakeholders and regulatory authorities. Keywords: R, R Packages, Pharma
OS-096 : A Gentle Introduction to creating graphs in Python, R and SAS
Dane Korver, RTI International
Tuesday, 10:15 AM – 10:25 AM, Location: Aqua Salon C
If you have read my previous papers, I have used Wordle and temperature data to teach how to create various graphs in SAS and R. This paper uses a very small dataset to grow our abilities in graphing data with Python, R, and SAS and to visualize trends over time.
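As a simple illustration of the kind of trend-over-time graph the paper works toward, here is a minimal R sketch with {ggplot2}; the tiny dataset shown is invented for illustration and is not the paper's data.

```r
library(ggplot2)

# Tiny illustrative dataset of a value measured over time
dat <- data.frame(
  day   = 1:7,
  value = c(3, 4, 4, 6, 5, 7, 8)
)

# A simple trend-over-time plot; the paper builds equivalents in Python and SAS
ggplot(dat, aes(x = day, y = value)) +
  geom_line() +
  geom_point() +
  labs(x = "Day", y = "Measurement", title = "Trend over time")
```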
OS-107 : Clinical Data Quality Assurance: An Interactive Application for Data Discrepancy Detection
Yushan Wang, Merck
Monday, 5:00 PM – 5:10 PM, Location: Aqua Salon C
In today’s data-driven clinical environment, ensuring the quality of clinical data is essential for evaluating trial outcomes. This paper introduces an R Shiny application designed for statistical programmers to facilitate rigorous data checks and addresses frequent inquiries from a wide range of stakeholders, including biomarker statisticians, PKPD modelers, and others involved in clinical research. The application streamlines the operational process by incorporating four key modules: Missing Key Variables, Duplicates, Date Issues, and Record Discrepancy. The Missing Key Variables module enables users to identify records lacking critical data elements, ensuring completeness for the analysis; for example, it can highlight records where critical dosing information or concentration results are missing, enabling statisticians / modelers to address data gaps. The Duplicates section identifies instances of repeated data entries, allowing users to pinpoint redundancies. The Date Issues component identifies inconsistencies in date entries, such as verifying that a start date precedes an end date, which is essential for ensuring reliable analysis in clinical settings. Finally, the Record Discrepancy section provides comparative results between two datasets, allowing modelers to examine merging results from sources such as ADA, PK, and lab data. By integrating these functionalities into a user-friendly interface, the application empowers programmers to systematically address frequent data discrepancies encountered during programming and analysis/modelling.
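The kinds of checks the modules perform can be expressed in a few lines of {dplyr}; the sketch below is a minimal illustration with invented variable names, not the application's actual code or inputs.

```r
library(dplyr)

# Illustrative concentration records; names are placeholders
pc <- tibble::tibble(
  USUBJID  = c("01", "01", "02", "03"),
  PCDTC    = c("2024-01-01", "2024-01-01", "2024-01-05", NA),
  PCSTRESN = c(1.2, 1.2, NA, 3.4)
)

# Missing key variables: records lacking a date or a result
missing_key <- pc %>% filter(is.na(PCDTC) | is.na(PCSTRESN))

# Duplicates: repeated entries on the chosen key
dups <- pc %>%
  count(USUBJID, PCDTC) %>%
  filter(n > 1)
```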
OS-111 : Integrating Collaborative Programming with Automated Traceability and Reproducibility in Pharma Studies and Real-World Data Projects by Adapting DevOps Best-Practices
Ariel Asper, Graticule
Sundeep Bath, Graticule
Jennifer Dusendang, Graticule
Yuval Koren, Graticule
Silvia Orozco, Graticule
Wednesday, 8:30 AM – 8:50 AM, Location: Aqua Salon C
To enhance integrity of research and study findings, data scientists should ensure that studies are traceable and reproducible, which involves meticulous management of datasets, tracking code changes, and robust storage of results. Without infrastructure to support reproducibility efforts, documentation, dependency management, and version control processes can be manual, unreliable, and unclear. This creates problems with determining when analysis changes occurred, which version of study results were produced by which version of code, and whether all study steps are processed in proper order and appropriately documented. Implementing procedures and technical infrastructure helps to maintain and automate reproducibility and traceability. To ensure that code can be executed consistently across multiple compute environments, we structure analysis scripts into parameterized pipelines within an isolated Docker container environment which specifies all versions and dependencies. We integrate Continuous Integration (CI) and Continuous Delivery (CD) into analysis pipelines to enable automatic rerunning of analyses following code modifications and storage of results in the cloud. Our process integrates and improves collaborative programming by providing code reviewers with the validated outputs that are produced by the code. By design, study close-out and compliance activities are incorporated within our infrastructure. In this paper we will discuss how we implemented and adapted DevOps best-practices like CI/CD in a collaborative coding environment to work for epidemiological studies and real-world data projects. Although the concepts discussed are applicable to many tools, our implementation uses Git, GitHub Actions, SQL, Python, R, Docker, and AWS S3. This content is applicable for all skill levels.
OS-120 : Streamlining BIMO and Patient Profile Generation: A Python-Based Semi-Automated Approach Integrated with CSR Development
Dmytro Skorba, Intego Clinical
Mykyta Vysotskyi, Intego Clinical
Wednesday, 9:00 AM – 9:20 AM, Location: Aqua Salon C
Preparing Bioresearch Monitoring (BIMO) and Patient Profile (PP) listings traditionally occurs after Clinical Study Report (CSR) development, often leading to duplicate effort, inconsistencies, and time pressure. This paper presents an innovative semi-automated approach that integrates BIMO and PP requirements during the initial CSR development phase. By leveraging Python’s capabilities alongside existing SAS infrastructure, we demonstrate how to transform CSR listings into FDA-compliant BIMO and PP outputs while maintaining data consistency and reducing validation burden. The approach combines organizational strategies for early planning with technical solutions for automated data transformation and formatting. Practical examples illustrate how this methodology significantly reduces development time, minimizes manual intervention, and ensures alignment across all submission deliverables. Special attention is given to validation efficiency, showing how proper integration can eliminate redundant quality control steps while maintaining regulatory compliance.
OS-122 : Streamlining Validation Review and SAS® Program Management with R
Huei-Ling Chen, Merck & Co.
Monday, 10:00 AM – 10:20 AM, Location: Aqua Salon C
Unlike SAS, R treats nearly everything as an object that can be easily modified. In addition to familiar objects such as datasets, the R language can also work with lists, vectors, and matrices. This flexibility in handling various objects dramatically expedites the workflow. This paper illustrates how to leverage the R language to import large SAS files into a list for rapid collective processing within seconds. It highlights three specific applications that optimize post-processing tasks: (1) examining large-scale SAS validation output files, (2) automating the trimming of SAS syntax text, and (3) adding SAS syntax text to append data attributes from a central location for all analysis datasets.
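A minimal sketch of the "read many SAS files into one list" pattern, assuming {haven} and {purrr}; the folder path is a placeholder.

```r
library(haven)
library(purrr)

# Collect every SAS dataset in a folder (path is a placeholder)
sas_files <- list.files("path/to/validation/output",
                        pattern = "\\.sas7bdat$", full.names = TRUE)

# Read them all into a single named list for collective processing
datasets <- set_names(
  map(sas_files, read_sas),
  tools::file_path_sans_ext(basename(sas_files))
)

# Example of processing the whole collection at once: row counts per file
row_counts <- map_int(datasets, nrow)
```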
OS-128 : How to compute common clinical trials statistics in R
Oleksandr Babych, Intego Group LLC
Monday, 10:30 AM – 10:50 AM, Location: Aqua Salon C
Nowadays, R has become an indispensable tool for clinical trial analysis. Yet many clinical programmers struggle to learn R or try to avoid it altogether. In addition, the growing number of R packages offering equivalent computations can make the learning process even more confusing. This paper aims to fill the gap on how to calculate various statistics in R and help you acquire this necessary skill. We discuss different R packages and show how they can be used to compute various statistics, from basic descriptive statistics to more complicated and vital efficacy statistics such as p-values, hazard ratios with confidence intervals, and odds ratios. Moreover, we look at different methods for these statistics as well as stratified analysis.
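For readers who want a starting point, a short sketch of a few such computations using base R and the {survival} package is shown below; it uses the built-in lung dataset rather than trial data, and the specific packages the paper covers may differ.

```r
library(survival)

# Descriptive statistics
summary(lung$age)

# Hazard ratio with 95% CI from a Cox model (sex used as the "treatment" here)
fit <- coxph(Surv(time, status) ~ sex, data = lung)
summary(fit)$conf.int        # HR with confidence limits

# p-value from a log-rank test
survdiff(Surv(time, status) ~ sex, data = lung)

# Odds ratio from logistic regression (Wald confidence interval)
glm_fit <- glm(I(status == 2) ~ sex, data = lung, family = binomial)
exp(cbind(OR = coef(glm_fit), confint.default(glm_fit)))
```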
OS-145 : Code Switching: Parallels between Human Languages and Multilingual Programming
Danielle Stephenson, Atorus Research
Laura Mino, Atorus Research
Tuesday, 10:30 AM – 10:50 AM, Location: Aqua Salon C
As multilingualism gains importance in the pharma industry, we’ve observed that the processes for learning programming languages closely mirror those for learning human languages. What insights can we draw from the study of natural languages to enhance our approach to multilingual programming? What similarities can we find between a SAS® programmer learning R and a native Mandarin speaker learning English? Join us, two polyglots who enjoy both people-talk and code-talk, as we explore natural and programming languages to see how techniques for understanding one can be leveraged to understand the other.
OS-164 : Catch Page Overflow Issues Quick and Easy – A Simple Python Solution
Junze Zhang, Merck & Co., Inc.
Huei-Ling Chen, Merck & Co.
Tuesday, 4:30 PM – 4:50 PM, Location: Aqua Salon C
Clinical study reports often contain many tables, listings, and figures. When content spills over the edges of a page, it results in overflow issues. While these issues may be easily identified in smaller files by manually reviewing the document, scrolling through hundreds of pages to manually check for overflow is mission impossible. Therefore, it is essential to develop an efficient quality control tool to detect these overflow problems early in the process, reducing the chance of having to perform re-runs. Many pharmaceutical companies and contract research organizations use the Rich Text Format (RTF) for clinical study reports. This paper introduces a straightforward Python function designed to automate this task. The function efficiently checks RTF files in batches, offering rapid verification even when processing a large number of tables, listings, and figures (TLFs).
OS-167 : Advanced Programming with R: Leveraging Tidyverse and Admiral for ADaM Dataset Creation with a Comparison to SAS
Joshua Cook, University of West Florida (UWF)
Richann Watson, DataRich Consulting
Tuesday, 11:00 AM – 11:20 AM, Location: Aqua Salon C
The transition from SDTM to ADaM datasets is a critical step in clinical trial data preparation, requiring precise programming solutions to ensure compliance with CDISC standards. This paper presents a two-stage approach to ADSL derivation: first, we manually derive key variables using base R and the tidyverse, demonstrating the detailed logic required for compliance and flexibility in handling real-world data challenges. Then, we explore the admiral package, showcasing how its specialized functions significantly simplify and streamline this process. By leveraging admiral, we automate derivations such as treatment start and end dates (TRTSDT, TRTEDT) and treatment duration (TRTDUR) while addressing duplicates, validating dataset integrity, and enhancing reproducibility. We highlight the power of admiral in minimizing code redundancy, integrating custom logic, and ensuring seamless compliance with industry standards. To aid in the transition of code from SAS to R, this paper includes an SAS version of the code for comparison purposes. Designed for intermediate to advanced programmers, this paper illustrates the benefits of transitioning to open-source tools and the pharmaverse ecosystem for robust clinical data workflows.
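The manual (tidyverse) stage of the derivation described above might look roughly like the sketch below; the EX records are invented for illustration and the logic is a simplification of what a real ADSL derivation requires.

```r
library(dplyr)
library(lubridate)

# Illustrative EX records (one row per dosing record); not real study data
ex <- tibble::tibble(
  USUBJID = c("01", "01", "02"),
  EXSTDTC = c("2024-01-01", "2024-01-15", "2024-02-01"),
  EXENDTC = c("2024-01-14", "2024-01-28", "2024-02-10")
)

# Manual derivation of TRTSDT, TRTEDT, and TRTDUR per subject
adsl_trt <- ex %>%
  mutate(EXSTDT = ymd(EXSTDTC), EXENDT = ymd(EXENDTC)) %>%
  group_by(USUBJID) %>%
  summarise(
    TRTSDT = min(EXSTDT, na.rm = TRUE),
    TRTEDT = max(EXENDT, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(TRTDUR = as.integer(TRTEDT - TRTSDT) + 1)
```

The admiral stage of the paper replaces this hand-written logic with the package's dedicated derivation functions.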
OS-179 : Slice, Dice, Analyze: Revolutionizing SDTM cuts with datacutr package
Diego Madrigal Viquez, Intego Clinical
Monday, 2:00 PM – 2:20 PM, Location: Aqua Salon C
Processing SDTM data is a key step in clinical trial analysis, and doing it efficiently can save significant time and effort. The datacutr package, built in R, provides a practical solution for applying cuts to SDTM domains. With different methods, it simplifies data preparation for analysis by automating much of the process. This article focuses on real-world examples of using datacutr to handle SDTM data. We will demonstrate step-by-step how the package works, showing how different cuts can be applied based on specific needs. By automating these tasks, datacutr reduces manual effort, minimizes errors, and speeds up the workflow, making it easier to prepare datasets for regulatory submissions or internal use. As an R-based tool, datacutr highlights the growing importance of R in the clinical industry. With its flexibility, open-source nature, and powerful capabilities for data manipulation and analysis, R is becoming an essential skill for programmers and statisticians. This article will also touch on the benefits of using R and how adopting it can support the future of clinical programming.
OS-182 : Interactive Longitudinal Data Analysis and Visualization in Clinical Research Using R.
Maria Gomez Ramirez, Intego Clinical
Tuesday, 1:30 PM – 1:50 PM, Location: Aqua Salon C
Analyzing a longitudinal study with clinical data allows us to understand changes that have occurred over time in patient outcomes (such as biomarkers, the effects of treatments, and disease progression, among others). Graphical visualization of the data is crucial, as it enables decision-makers to interpret analytics in a visual format, making it easier to grasp complex concepts or identify new patterns. Moreover, when users can interact dynamically with the application presenting the data, they can deepen their understanding of an event or uncover specific insights. The use of R and its many freely available packages makes it possible to create user-friendly data visualization applications that allow for diverse forms of interaction. This paper will demonstrate an example of an application using R packages such as shiny, matrix, lme4, plotly, ggplot2, and DT to present an interactive visualization of a longitudinal data analysis using R’s publicly available sleepstudy dataset. The shiny package provides a wide range of features that facilitate user interactivity with the information displayed on the screen via an HTML page. This package allows the creation of filters, dataset displays, information downloads, data uploads, and many other functionalities. Several of these features are utilized to develop this application, which enables data visualization from a longitudinal analysis. This tool lets clinicians and researchers explore longitudinal data by combining analytics with interactive visualization, and it serves as a demonstration of a resource for improving clinical research workflows.
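A minimal sketch of the analysis and plotting pieces behind such an app, using the sleepstudy dataset named in the abstract; the Shiny wiring and any app-specific details are omitted.

```r
library(lme4)
library(ggplot2)
library(plotly)

# Random-intercept and random-slope model on the sleepstudy data
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
summary(fit)

# Interactive spaghetti plot of the raw subject trajectories
p <- ggplot(sleepstudy, aes(Days, Reaction, group = Subject)) +
  geom_line(alpha = 0.4) +
  geom_smooth(aes(group = 1), method = "lm", se = FALSE)
ggplotly(p)   # plotly makes the ggplot interactive for use inside Shiny
```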
OS-195 : Building a Scalable Training Platform for R: Empowering Analytics Excellence in Corporate Initiatives
Michelle Page-Lopez, Syneos Health
Martyn Walker, Syneos Health
Monday, 4:30 PM – 4:50 PM, Location: Aqua Salon C
Building a scalable training platform for R within a Contract Research Organization (CRO) addresses the growing demand for advanced analytical tools in clinical research. As the industry shifts toward open-source programming, equipping staff with the necessary R skills becomes critical. This paper highlights the creation of a flexible, accessible, and comprehensive training framework that fosters skill development while meeting organizational needs. The training platform leverages modern e-learning technologies, combining self-paced modules, interactive coding exercises, and real-world case studies. We showcase the {learnr} package in R, which enables the creation of hands-on training content by combining narrative text, R code, quizzes, and visualizations. This approach is ideal for creating tutorials that are customizable, scalable, and accessible through cloud-based infrastructure. Tailored to diverse proficiency levels, the curriculum covers foundational R programming, Study Data Tabulation Models (SDTM), Analysis Data Models (ADaMs), and Tables, Figures, and Listings (TFLs). Finally, we examine the challenges associated with developing an in-house training program from the ground up and explore potential future applications of such a platform. Training staff in R is vital for clinical research, equipping teams with the skills to leverage its powerful analytical capabilities, meet organizational needs, and ensure data-driven decision-making.
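For readers unfamiliar with {learnr}, the two calls below show how an interactive tutorial is discovered and launched locally; they use the example tutorial that ships with the package, not the CRO's proprietary curriculum.

```r
library(learnr)

# List tutorials bundled with an installed package
available_tutorials("learnr")

# Launch the "hello" example tutorial that ships with {learnr};
# a training module would be authored as a similar R Markdown tutorial
# with exercise chunks and quizzes.
run_tutorial("hello", package = "learnr")
```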
OS-203 : R Programming in SAS® LSAF: How to Generate Clinical Reports Using an R Session
Praneeth Adidela, ICON plc
Tuesday, 11:30 AM – 11:40 AM, Location: Aqua Salon C
The SAS® Life Science Analytics Framework (LSAF) provides an integrated system for transforming, analyzing, reporting, and reviewing clinical research data. LSAF is a cloud-native, single solution for clinical analysis and submission and offers integrated features for regulatory compliance, quality control, job manifests, version control, audit trails, e-signatures, and documentation support. It enables automation through workflows, supports integrations, and ensures the proper implementation and management of data standards and controlled terminology. LSAF version 5.3 and later introduces integration with R. The program editor within LSAF allows users to code in SAS or R on the same flexible, open platform. Users can develop R programs in LSAF within the R session using an interface similar to the SAS session. The repository and workspace store R program files (*.r), data files (*.rdata or *.rds), and log files (*.rlog) like their SAS counterparts. Users can add R programs to jobs and incorporate SAS and R programs within the same job. The processes for running and debugging R programs are similar to those for SAS programs. Compliance, performance, and traceability are the key benefits of using R in LSAF. This paper will focus on utilizing R within LSAF, exploring the coding process on the platform to generate clinical reports, and offering strategies for working efficiently within LSAF.
OS-215 : {sdtm.oak} V0.1 on CRAN, sponsored by CDISC COSA, part of Pharmaverse, is an EDC and Data Standard agnostic solution for developing SDTM datasets in R.
Rammprasad Ganapathy, Genentech
Tuesday, 9:00 AM – 9:20 AM, Location: Aqua Salon C
{sdtm.oak} is an EDC (Electronic Data Capture) system- and data standard-agnostic solution that enables the pharmaceutical programming community to develop CDISC SDTM datasets in R. The reusable-algorithms concept in {sdtm.oak} provides a framework for modular programming and can potentially automate SDTM creation based on a standard SDTM specification. In this presentation, we introduce the {sdtm.oak} package, showcase its features, and present a roadmap for future releases.
OS-229 : Identifying Breaking Changes in R Packages: pkgdiff
David Bosak, r-sassy.org
Monday, 2:30 PM – 2:50 PM, Location: Aqua Salon C
SAS® software is famous for its stability and backward compatibility: SAS programs written in the 1980s will still work today, often with few or no changes. R is a different story. In R, programs you wrote a few months ago that were working perfectly can suddenly produce errors or warnings. After some investigation, you realize a package you were using has changed and broken your program. This is called a “breaking change”. The purpose of this paper is to explore an R package named “pkgdiff”, which aims to identify and help manage breaking changes. The paper summarizes the major functions of the package and explains how it can be used to reduce breakages in your code.
OS-255 : Debugging Options in R: Applications and Usage in Clinical Programming
Madhusudhan Ginnaram, Merck
Bingjun Wang, Merck & Co.
Jeetender Chauhan, Merck & Co., Inc.
Sarad Nepal, Merck
Tuesday, 11:45 AM – 11:55 AM, Location: Aqua Salon C
Debugging is a critical component of R programming, especially in clinical data work, where data integrity and accuracy are paramount. R, a programming language widely used for statistical computing and data analysis, provides a variety of debugging options that help developers identify and resolve coding errors effectively. This paper explores the debugging options available in R, including the browser(), debug(), trace(), and recover() functions, and discusses their practical application. Additionally, we examine the significance of proper debugging in ensuring reliable statistical outputs, regulatory compliance, and overall data quality in clinical trials. By detailing the usage of these debugging options, this paper aims to enhance the understanding of R’s capabilities in clinical programming and underscore the importance of effective debugging practices.
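The base R tools named above can be exercised with a few lines; the function below is an invented example for illustration.

```r
# Illustrative function with a potential bug (division by zero when n is 0)
mean_per_visit <- function(total, n) {
  browser()          # pause here and inspect total and n interactively
  total / n
}

# Flag an existing function for step-through debugging
debug(mean_per_visit)     # every call now enters the debugger
undebug(mean_per_visit)   # turn it off again

# Run extra code on entry without editing the source
trace(mean_per_visit, tracer = quote(message("n = ", n)))
untrace(mean_per_visit)

# After an unexpected error, drop into any frame of the call stack
options(error = recover)
```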
OS-288 : Putting the ‘R’ in RWD: Leveraging R and Posit to enhance Real World Data Programming
Darren Jeng, Pfizer
Sachin Heerah, Pfizer
Tuesday, 2:00 PM – 2:10 PM, Location: Aqua Salon C
The Pfizer Real World Data (RWD) programming team has leveraged R and Posit services to enhance the capabilities of its programmers. We have designed an R package, Shiny apps and even a Quarto website to support all programmers with varying backgrounds, including those with only SAS experience. Our R package is designed to simplify database queries, utilize both R and SAS variable syntax, and standardize deliverables. We have leveraged Posit’s RStudio features such as code snippets to make code templates readily accessible for all users within the IDE. Code guides are also presented as snippets to allow all users to load example data and explore standard RWD programming workflows. Overall, embracing and leveraging the features available to us in R and Posit is enhancing our workflow through integrated resources and documentation. This paper is applicable to all levels.
OS-290 : Packing PDF TFL with Table of Contents and Bookmarks Using Python
Jun Yang, Avidity Bioscience
Yan Moore, Avidity Bioscience
Tuesday, 2:30 PM – 2:50 PM, Location: Aqua Salon C
Packaging tables, listings, and figures (TLFs) is a common requirement in the pharmaceutical industry. Rich Text Format (RTF) is widely used for generating individual reports, as SAS offers powerful and flexible functions for RTF output. Consequently, RTF is often preferred for combining outputs, and numerous standard programs and macros have been developed for this purpose. However, RTF has certain limitations. Its display may vary across different operating systems and even between versions of Microsoft Word. Additionally, RTF files can be thousands of pages long, leading to large file sizes that are slow to open, browse, and transfer. Portable Document Format (PDF) addresses these challenges: a converted and merged PDF file offers a smaller, more consistent, and more manageable solution. This paper introduces an alternative approach to converting RTF outputs into a single, well-structured PDF file, with two options for generating a combined PDF with a table of contents (TOC). The first option analyzes and extracts key titles from the PDF content to generate the TOC and bookmarks. The second option reads a TLF metadata file that includes the order of the RTF files, their filenames, and the corresponding titles. Leveraging Python’s extensive developer community and rich set of libraries, this approach enables the efficient creation of a refined and well-formatted output.
OS-293 : Achieving Reliable Data Verification with R: Proven Tools, Best Practices, and Innovative Workflows
Valeria Duran, Statistical Center for HIV/AIDS Research and Prevention at Fred Hutch
Xuehan Zhang, Fred Hutch Cancer Center
Tuesday, 2:15 PM – 2:25 PM, Location: Aqua Salon C
In the pharmaceutical industry, ensuring the accuracy and reliability of data is critical, particularly when the output data can influence the analysis result. For SAS programmers transitioning to R or current R programmers exploring the language’s capabilities in verification, identifying specific tools and best practices can be challenging. This paper presents the tools we can use for effective verification. We will focus on three levels of data verification: complete independent verification (double programming), targeted independent verification, and peer review. We discuss how these levels align with varying risk levels associated with data complexity and the criticality of study endpoints. We will highlight our verification practices in R, showcasing how R functions from packages such as testthat and purrr can act as alternatives to SAS’s PROC COMPARE. Additionally, we will explore a proposed peer review verification process, offering possible approaches to code and data review procedures. This paper aims to inform both R and SAS programmers about the nuances of data verification and provide practical guidance for integrating R effectively into their data verification workflows.
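As a flavor of how {testthat} and {purrr} can stand in for PROC COMPARE, a minimal sketch follows; the objects adsl_prod, adsl_qc, prod_list, and qc_list are placeholders for a production dataset, its independently programmed QC copy, and lists of such pairs, not names used by the paper.

```r
library(testthat)
library(purrr)

# Compare a production dataset against its independently programmed QC copy;
# expect_equal() reports row- and column-level differences when they differ.
test_that("ADSL matches the QC version", {
  expect_equal(adsl_prod, adsl_qc, ignore_attr = TRUE)
})

# Batch comparison across many dataset pairs
results <- map2(prod_list, qc_list, ~ all.equal(.x, .y))
```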
OS-323 : Python with Hermione: Unleash Your Inner Coding Witch & Dragon
Charu Shankar, SAS Institute
Jim Box, SAS Institute
Tuesday, 3:00 PM – 3:20 PM, Location: Aqua Salon C
Join Hermione Granger on a thrilling 60-minute coding journey where you’ll learn to slay coding challenges like a true wizard, using Python as your magic wand! Just as Hermione faces down dragons and dark spells with wit and determination, you’ll tackle the essentials of Python programming with her strategic approach to problem-solving. In this fun and fast-paced class, you’ll learn to master Python’s core elements: variables, functions, and loops, all with the same precision Hermione uses when crafting complex spells. Python’s readable syntax will make you feel like you’re casting spells in no time, whether you’re automating tasks or working with data. By the end of this adventure, you’ll be equipped to conquer coding challenges like a seasoned wizard, ready to take on any dragon (or program) that comes your way. Grab your wand (keyboard) and let’s start coding like Hermione!
OS-330 : The {teal} Adoption Playbook: Strategies, Tools, and Learning Paths for Open Source
Vedha Viyash, Appsilon
Monday, 11:30 AM – 11:50 AM, Location: Aqua Salon C
As the industry adopts open-source technologies, pharma companies are exploring ways to integrate R-based solutions into their workflows. {teal}, an open-source framework for building interactive clinical data review applications in Shiny, offers a powerful alternative to proprietary tools. However, adoption comes with challenges: adapting existing workflows, integrating clinical data sources, ensuring regulatory compliance, and upskilling teams to effectively use {teal}. This talk presents a practical playbook for clinical teams looking to adopt {teal}. We will discuss strategies to streamline data integration, including the creation of data connector packages tailored to organizational data structures. Participants will be introduced to key practices for creating flexible Shiny apps with {teal} while maintaining regulatory standards. A detailed learning path focused on R and {teal} skills will also be shared to help teams adopt and scale the solution effectively. At the end of this session, attendees will have a roadmap for overcoming adoption obstacles and harnessing {teal} to drive efficiency and innovation in clinical trial data exploration.
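A minimal sketch following the quick-start pattern in the {teal} documentation; the dataset and the bundled example module are placeholders, not a clinical configuration, and the exact API may differ between {teal} releases.

```r
library(teal)

# Wrap analysis data in a teal_data object; iris stands in for study data
data <- teal_data(IRIS = iris)

# Assemble an app from review modules; example_module() ships with {teal}
app <- init(
  data    = data,
  modules = modules(example_module())
)

shinyApp(app$ui, app$server)
```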
OS-351 : Geocoding with the Google Maps API: Using PROC FCMP To Call User-Defined SAS® and Python Functions That Geocode Coordinates into Addresses, Calculate Routes, and More!
Troy Hughes, Data Llama Analytics
Monday, 3:00 PM – 3:20 PM, Location: Aqua Salon C
Software interoperability describes the ability of software systems, components, and languages to communicate effectively with each other, and must be prioritized within today’s multilingual development environments. PROC FCMP, the SAS® Function Compiler, enables Python functions to be wrapped in (and called from) SAS user-defined functions in both SAS 9.4 and SAS Viya. Productivity and the pace of development are maximized when existing Python code can be run natively rather than having to be needlessly recoded into the SAS language. This talk demonstrates SAS and Python function calls to the Google Maps API that geocode latitude/longitude coordinates into street addresses and calculate walking and driving distances between locations. Examples will demonstrate how these geocoding functions can be utilized for clinical trials and pharmaceutical applications.
OS-364 : Building Extensible Python Classes for Analysis and Research : It’s Easier Than You Think!
Sundaresh Sankaran, SAS Institute
Samiul Haque, SAS Institute
Tuesday, 4:00 PM – 4:20 PM, Location: Aqua Salon C
Data Analytics in the life sciences industry requires iteration and access to a wide range of methods and techniques for trusted, stable and robust results. The Python open-source ecosystem provides a rich array of such methods through a number of packages and modules that promote rapid and flexible experimentation. However, such variety also has its downsides such as package dependencies, strict version support and a clutter of similar packages. Data scientists and programmers require a common framework which packages many capabilities in a seamless manner. In this session, we provide a design and example of a Python class which encapsulates established packages and pipelines for data management and outcome predictions, making them available from a single instance. These methods cover a range of operations across the analytics life cycle, and are extensible to include new methods and classes. Every instance can be associated with the source dataset and analytical artifacts created in-process, and can be encapsulated into a single package for porting to other environments, thus enabling easy promotion of analysis. We also make available source code and examples for working with this class, and explain how this can be customised for your organisation’s specific needs. This session provides the audience valuable tools and knowledge on how to organise their Python code in a structured framework, and gain efficiency and productivity benefits.
OS-395 : Shiny & LLMs: Landscape and Applications in Pharma
Phil Bowsher, RStudio Inc.
Monday, 1:30 PM – 1:50 PM, Location: Aqua Salon C
The landscape of GenAI is changing quickly. Posit/RStudio will present the current landscape regarding Shiny and large language models. This talk will discuss opportunities and applications for Shiny as an interface into LLMs, as well as other use cases applicable to pharma that empower stakeholders and statistical programmers. The talk will explore areas such as the integration of Shiny with LLMs and where they are being used, terminology, current research, and more.
Real World Evidence and Big Data
RW-042 : Leveraging Health Technology Assessment (HTA) for Market Access: A Statistical Programming Perspective on German HTA Submission
Rachana Agarwal, Servier Pharmaceutical
Wendy Wang, Servier Pharmaceuticals
Tuesday, 1:30 PM – 1:50 PM, Location: Indigo 206
Background: Health Technology Assessment (HTA) evaluates the real-world value of drugs in European markets, addressing pricing, patient access, and reimbursement beyond regulatory approval. In Germany, the Gemeinsamer Bundesausschuss (G-BA) leads this process, using evidence on quality of life (QoL), safety, and efficacy to assess a drug’s value. Successful HTA outcomes can secure favorable pricing, broad reimbursement, and improved access to innovative therapies for patients. Objective: To showcase the role of statistical programmers in preparing German HTA submissions and emphasize distinctions between regulatory and HTA requirements, particularly in statistical methodologies. Methods: Statistical programming techniques (such as SAS or R) were utilized to process data, generate reports, and visualize results for a German HTA application. Tasks included time-to-event analyses for safety and QoL outcomes and detailed subgroup analyses to meet G-BA standards. These methods aimed to substantiate the drug’s clinical and economic value, enhancing its prospects for market access. Results: The submission successfully demonstrated the drug’s value proposition, leading to reimbursement approval. Detailed analyses of QoL and safety metrics facilitated proactive engagement with HTA agencies and highlighted the need for tailored statistical approaches in HTA submissions. Conclusion: Statistical programming is pivotal in HTA submissions, bridging regulatory and market access needs. By effectively analyzing and presenting evidence, programmers ensure equitable and timely patient access to therapies. This work underscores their strategic importance in achieving successful HTA outcomes in the German market.
RW-057 : Conducting Survival Analysis in SAS using Medicare Claims as a Real-world data source.
Jayanth Iyengar, Data Systems Consultants LLC
Tuesday, 4:00 PM – 4:20 PM, Location: Indigo 206
Applications of survival analysis as a statistical technique extend to longitudinal studies and other studies in health research. The SAS/STAT package contains multiple procedures for running a survival analysis, the most well-known of which are PROC LIFETEST and PROC PHREG. As a data source, Medicare claims are often used in real-world evidence studies and observational research. In this paper, survival analysis and the SAS procedures for performing it will be explored, and survival analyses will be conducted using Medicare claims data sets to assess patients’ prognoses among Medicare beneficiaries.
RW-110 : Beyond Tokenization: Considerations for Linking Healthcare Data Sets for Scientific Research
Jennifer Dusendang, Graticule
Yuval Koren, Graticule
Tuesday, 5:00 PM – 5:20 PM, Location: Indigo 206
Linking data from disparate sources for scientific studies is highly valuable when using real-world data. A common example is linking electronic medical record (EMR) data to medical claims data or data from specialized providers outside the EMR system. Although creating privacy preserving record linkage (PPRL) tokens is part of the linkage process, additional methods and considerations are necessary to produce a reliable and usable linked data set for scientific research. Linkage rates provide an upper bound of data set patient coverage and usability for analyses. However, these are typically basic calculations of how often PPRL tokens match between two data sets, without regard to availability of key data elements, study period overlap, or issues with duplicate or low-specificity tokens. In particular, not identifying an appropriate time-frame in which linked data is applicable for a study can lead to unexpected decreases in sample size and limited study feasibility. Additionally, the study cohort that results from linking likely has different characteristics than the original, unlinked cohort. To produce a linked data set that is appropriate for scientific studies, researchers and programmers should consider expected overlap of base populations within the data sets, reduction of linkage rates due to lack of data during relevant study periods, and using stable patient characteristics to handle false-positive linked patients. Although these concepts are applicable across programming languages, examples use SQL, Python, or R. This content is applicable for all skill levels.
RW-154 : Addressing early challenges in RWD data standardization for analysis and reporting of RWE studies
Xingshu Zhu, Merck
Li Ma, Merck
Bo Zheng, Merck
Tuesday, 2:00 PM – 2:20 PM, Location: Indigo 206
The increasing reliance on real-world data (RWD) in medical research necessitates efficient methods for data transformation that make use of modular, standardized code. The fragmented, non-standard format of RWD makes it inefficient and complicated to generate reproducible RWE. This paper explores a process for converting RWD into a Basic Data Structure (BDS) format similar to the Study Data Tabulation Model (SDTM). It aims to explore a methodology for identifying common data elements in RWE data and mapping them to an SDTM-like BDS specification format, facilitating the use of existing standard code for RWE analysis. These efforts lay the groundwork for reproducible generation and analysis of RWE.
RW-184 : Addressing Challenges in Real-World Evidence Generation: The AI-SAS for Real-World Evidence Approach
Takuji Komeda, Shionogi Co., Ltd.
Yuki Yoshida, Shionogi & Co., Ltd.
Yohei Komatsu, TIS Inc.
Yoshitake Kitanishi, Shionogi & Co., Ltd.
Tuesday, 2:30 PM – 2:50 PM, Location: Indigo 206
Shionogi has developed a product, AI-SAS, that semi-automates programming tasks for clinical trials, leading to a 33% reduction in working hours. This achievement, including AI-SAS, earned first place in the Innovative Problem Solver category at the 2024 SAS Customer Recognition Awards. Additionally, Shionogi is offering AI-SAS externally as part of our social contribution initiatives. In July 2024, the FDA issued guidance enabling the use of Real-World Evidence (RWE) in drug approval applications. Shionogi is expanding AI-SAS to generate RWE, with plans to integrate generative AI technology. We are developing the system on SAS Viya, which facilitates the implementation of machine learning and deep learning. To improve transparency, we identified key requirements: predefining analysis content in the protocol and statistical analysis plan (SAP) before conducting the analysis, creating analysis result reports, and ensuring consistency between document creation and analysis timings. To improve document creation efficiency, we use generative AI technology, which significantly assists researchers in drafting protocols and SAPs from their research questions. For executing analysis tasks, we use AI-SAS, which semi-automatically generates programs from past specifications and mock-ups. These processes are recorded in the system using GitHub. This approach addresses both efficiency and transparency. The semi-automation covers protocol, SAP, and specification creation, takes the FDA guidance into account, and can reduce working time. This presentation is intended for those interested in standardizing the process of analyzing RWD and for programmers who develop SAS macros efficiently.
RW-234 : Going from PROC SQL to PROC FedSQL for CAS Processing – Common mistakes to avoid.
Vijayasarathy Govindarajan, SAS Institute
Wednesday, 9:00 AM – 9:20 AM, Location: Indigo 206
SAS 9 customers are increasingly looking at moving to SAS Viya to harness the power of the new distributed, in-memory Cloud Analytic Services (CAS) engine. This often helps to speed up existing processes many times over and run analytics on huge datasets faster. One of the key areas of this migration involves updating SAS 9 PROC SQL code to take advantage of the processing capabilities of CAS. This is made possible by a new(er) procedure in the SAS arsenal: PROC FedSQL. There are many differences between PROC SQL and PROC FedSQL for CAS, from supported data types and available functions to applying formats, quoting strings, and referencing macro variables. In my experience, users new to SAS Viya often make mistakes while migrating code to FedSQL that arise from a few basic misconceptions. This paper aims to clarify the key differences between PROC SQL and PROC FedSQL for CAS. It will also highlight common mistakes when adapting SQL code for CAS, offering guidance on how to avoid them. The goal is to help users leverage the power of CAS effectively without getting bogged down by a lengthy process of fixing small, easily preventable errors when converting code to FedSQL.
RW-289 : Practical Process in SAS of Using External Controls
Hui Mao, BioPier Inc.
Na Wang, BioPier
Lixin Gao, BioPier
Wednesday, 8:00 AM – 8:20 AM, Location: Indigo 206
The use of real-world data (RWD) or historical data as external controls (EC) for assessing treatment effect in rare disease or early oncology drug development is increasingly accepted by regulatory agencies. Various methods have been developed to construct an external control arm for a single-arm trial or to supplement the control arm of the current trial. In this paper, we present practical SAS implementations of the process of generating well-matched ECs, along with analytic methods for different settings, as well as relevant SAS code and summary outputs.
RW-328 : An Introduction to Using PROC S3 in SAS to Access and Manage Objects in Amazon S3
Kevin Russell, SAS
Russ Tyndall, SAS
Wednesday, 8:30 AM – 8:50 AM, Location: Indigo 206
Amazon S3 (AWS S3) is an object storage service that offers its customers the ability to store, manage, analyze, and protect their data. Millions of customers worldwide now utilize S3’s scalability, data availability, security, and performance to store their data. Many of the companies taking advantage of S3 storage are also SAS customers, so we have provided multiple ways to interface with AWS S3. One of the primary methods SAS provides is PROC S3, a Base SAS procedure that lets you perform object management in AWS S3. PROC S3 allows you to create buckets, directories, and files in S3. It also allows you to list, copy, and delete objects. The purpose of this paper is to provide an overview of PROC S3 and the basic steps needed to start using the procedure. It will also discuss and provide examples of various tasks that can be performed with PROC S3. Lastly, it will cover the best method for gathering information about the execution of the procedure from the log to assist in debugging when necessary. In SAS Technical Support, we have seen a surge in the number of customers taking advantage of AWS S3’s ability to manage and store data. Understanding PROC S3 will allow you to incorporate your AWS S3 environment into SAS.
RW-340 : You don’t have to handle the truth! Three Things to Know about Synthetic Data
Catherine Briggs, SAS
Robert Collins, SAS Institute
Sundaresh Sankaran, SAS Institute
Tuesday, 3:00 PM – 3:20 PM, Location: Indigo 206
Organizations face challenges to analytics innovation when confronted with data that is sensitive, restricted, imbalanced in classes, or insufficient in volume. Synthetic data generation is an effective tool to help mitigate these challenges. First, we outline use cases for synthetic data, such as providing researchers more freedom in using novel methods, collaboration, and data sharing without the fear of violating data privacy regulations. Additionally, this approach can also be used to “scale up” a small population or underrepresented segment of a dataset when additional source data may not be available. Second, we discuss methods for generating, evaluating, and testing synthetic data for potential privacy leaks. Legacy anonymization methods diffuse the fidelity and utility of the source data while being prone to reversal attacks. We show techniques and results in multiple coding environments to mitigate these risks. Finally, we establish two perspectives on synthetic data generation: first, as an operation that provides usable, high-quality data, and second, as a robust and comprehensive process with important upstream and downstream considerations. Through this session we hope to educate the audience on how to implement a robust and trustworthy synthetic data practice.
RW-352 : Time to Event Analysis from Sample Size Considerations to Results Interpretation in Simple Words
Iryna Kotenko, Intego Group LLC
Tuesday, 4:30 PM – 4:50 PM, Location: Indigo 206
This paper was inspired by multiple requests from colleagues seeking guidance on the key considerations when performing time-to-event analysis. Thus, the paper is a response to those requests, offering a clear and complete overview of the process. It explains when and why this type of analysis is useful, how to decide on the right sample size, and what statistical methods can be used. It also provides practical tips for understanding the results and avoiding common mistakes. Real-world and industry-specific examples are included to make the concepts easy to understand, not just for statisticians but for professionals from various fields.
RW-374 : The Many Ways to Build Cohorts to Effectively Generate Real World Evidence and Bring Drugs to Patients Faster
Sherrine Eid, SAS Institute
Mary Dolegowski, SAS
Wednesday, 10:00 AM – 10:20 AM, Location: Indigo 206
The process of defining and building patient cohorts in pharma today involves a mix of automated and manual methods that leverage structured data, unstructured data, and advanced analytics. Each approach carries trade-offs between scalability, accuracy, interpretability, and resource requirements. The importance of these methods lies in their ability to produce reliable real-world evidence that informs clinical practice, supports regulatory decisions, and ultimately improves patient outcomes. Properly defining patient cohorts affects study validity and reliability, as accurate definitions reduce bias and confounding, ensuring that the evidence generated is robust and valid. They enable meaningful comparisons between treatments and help in drawing reliable conclusions about effectiveness and safety. Regulators and payers increasingly rely on RWE to complement randomized clinical trials (RCTs) in decision-making. There is a plethora of examples illustrating how well-defined cohorts support regulatory submissions, label expansions, and post-marketing surveillance. Identifying distinct patient subgroups can lead to targeted therapies and more personalized treatment strategies. This enables the exploration of treatment effects in real-world subpopulations that might be underrepresented in RCTs. Standardized and reproducible cohort definitions streamline the research process, making studies more efficient and scalable, and facilitate data sharing and collaboration across institutions, which is crucial for large-scale observational studies. This session will explore the many options researchers have at their disposal to efficiently and accurately define cohorts and accelerate evidence generation in support of drug development and integrated evidence packages.
Solution Development
SD-044 : Running the CDISC Open Rules Engine (CORE) in BASE SAS©
Lex Jansen, CDISC
Monday, 10:00 AM – 10:50 AM, Location: Aqua Salon D
CDISC Conformance Rules are an integral part of the Foundational Standards and serve as the specific guidance to industry for the correct implementation of the Standards in clinical studies. The overall goal of the CORE Initiative is to provide a governed set of unambiguous and executable Conformance Rules for each Foundational Standard, and to provide an open-source execution engine for the executable Rules available from the CDISC Library. The source code of the CORE engine is available on the GitHub repository. A CLI (Command Line Interface) available in the repository allows users to run the rules under Windows, Mac, and Linux. Users who want to run the engine in their own Python environment or tooling can do so, as it is available on PyPI (the Python Package Index). For SAS users, however, running applications from a command line interface is not always an option. The presentation will begin with a brief overview of the CDISC CORE concept, followed by the CORE Engine. The presentation will then describe a proof of concept in which the CDISC CORE CLI commands have been implemented into SAS processes as Python functions in PROC FCMP, passing parameters and code to the Python interpreter and returning the results to SAS. These Python functions can be called and executed by user-defined SAS functions, which can be called from the DATA step or any context where SAS functions are available.
SD-054 : Automated Word Report Generation for Standardized CMC PC Study with Programmatically Inserted Contents in R
Song Liu, Merck & Co., Inc
Jiannan Kang, Merck
Monday, 9:00 AM – 9:20 AM, Location: Aqua Salon D
Biologics process characterization (PC) study reports have prescriptive formats and contents. The current BPC-Stat tool (a JMP add-in) can produce Word reports containing statistical analysis results, but this output does not have the reference linkage and required outline structure. Manual effort and additional content, such as descriptions of the model effects, are still required from the project scientists. This paper demonstrates a new tool that uses R, R Markdown, and R Shiny to automatically generate the Word report file with standard formatting, consistent language, auto-populated interpretation text, and an auto-generated TOC, TOT, TOF, header/footer, content-driven cross-references, and prompts. With this tool, the PC Study Report is automatically generated with a few user inputs (document number, document title, etc.) and one click of the download button in the R Shiny app shared by the developer. This tool is a great way to reduce the likelihood of human errors that often arise while performing numerous repetitive tasks and to divert resources from tedious copy/paste exercises to challenging problem solving.
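At its core, this kind of pipeline renders a parameterized R Markdown template to Word; a minimal sketch is shown below, where the template file name and parameter names are placeholders rather than the tool's actual interface, and the template's YAML would declare a word_document output with toc: true and a matching params block.

```r
library(rmarkdown)

# Render a parameterized R Markdown template to Word.
# "pc_study_report.Rmd" and the params names are illustrative placeholders.
render(
  input       = "pc_study_report.Rmd",
  output_file = "PC_Study_Report.docx",
  params      = list(doc_number = "DOC-001",
                     doc_title  = "Process Characterization Study")
)
```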
SD-098 : A Method to Add Additional Necessary Datasets to Existing Define.XML
Jeff Xia, Merck
Sandeep Meesala, Merck & Co. Inc.
Monday, 1:30 PM – 1:40 PM, Location: Aqua Salon D
Define-XML is required by global health agencies for every study in each electronic submission to inform regulators about the datasets, variables, controlled terms, and other specified metadata utilized. Once a define-XML file is created, there are scenarios that necessitate updating the existing file to add additional datasets. For example, ADBASE may be added to include further baseline characteristics in response to agency information requests following the initial submission, or PK/PD-related ADaM datasets may be included for specific country requests or based on study specific needs. The manual creation and updating of define-XML files can be time-consuming and prone to errors, particularly when managing multiple datasets. This paper presents a method to automate the incorporation of additional datasets into existing define-XML files, significantly reducing preparation time and effort. This automation enhances the accuracy and consistency of metadata, thereby improving the quality of submissions and ensuring compliance with agency standards.
SD-116 : Use of SAS Packages in the Pharma Industry – Opportunities, Possibilities and Benefits
Bart Jablonski, yabwon
Tuesday, 10:00 AM – 10:50 AM, Location: Aqua Salon D
When working with SAS code, especially complex code, there is a moment when you decide to break it into small pieces. You create separate files for macros, formats/informats, functions, or data too. Eventually the code is ready and tested, and sooner or later you will want to share it with another SAS programmer. You have developed a program using local PC SAS, but the deployment is on a server with a different OS. Your code is complex (with dependencies such as multiple macros, formats, datasets, etc.) and is difficult to share. Often when you try to share code, the receiver will quickly encounter an error because of a missing macro, missing format, or whatever... a small challenge, isn't it? In this article we discuss a solution to the problem: the idea of SAS Packages, what they are, how to use and develop them, and how to make code sharing a piece of cake. And of course, the opportunities, possibilities, and benefits that SAS Packages bring to the pharmaceutical (among other) industries are also discussed.
SD-131 : Enhance your Coding Experience with the SAS Extension for VS Code
Jim Box, SAS Institute
Monday, 8:30 AM – 8:50 AM, Location: Aqua Salon D
Visual Studio Code (VS Code) is an open-source code editor that is very popular among developers for its ease of use across all programming languages, driven by a robust extension ecosystem. The SAS VS Code extension is an open-source, freely available add-on that allows you to use VS Code to connect to any modern SAS environment, from SAS 9.4 on your local machine to SAS Viya in the cloud. The key features include syntax highlighting, code completion, syntax help, a data viewer, and my favorite, SAS Notebooks, which offer an exciting way to share content and comments. We’ll take a look at the extension, how to use it, and explore how you can get involved with the direction in which this product evolves.
SD-138 : Using SAS with Microsoft 365: A Programming Approach
Chris Hemedinger, SAS
Tuesday, 11:00 AM – 11:50 AM, Location: Aqua Salon D
In today’s cloud-connected world, traditional methods of accessing Excel data from SAS are becoming obsolete. With more content stored in SharePoint Online and OneDrive (hosted in Microsoft 365), it can be challenging to get SAS to read these files and publish new ones to these locations. This paper guides you through the steps of connecting your SAS programs to Microsoft 365, enabling you to read and write files to SharePoint folders, OneDrive folders, and Microsoft Teams. You will learn how to use SAS to connect to Microsoft 365 using the Microsoft Graph APIs. Additionally, we will introduce SAS macros developed to simplify common tasks: listing your files, reading files into SAS, and publishing new files from SAS. By the end of this session, you will have the tools and knowledge to seamlessly integrate SAS with Microsoft 365, enhancing your data management and collaboration capabilities.
SD-142 : Share your Macros and Programs with SAS Studio Steps and Flows
Jim Box, SAS Institute
Pritesh Desai, SAS
Tuesday, 1:30 PM – 1:50 PM, Location: Aqua Salon D
You’ve probably got a huge macro and programming library at your disposal, but no way of leveraging all of that capital in any way but inside a SAS program. We’ll look at how using SAS Custom Steps and Flows will allow you to unlock the full potential of your work. By leveraging these capabilities, non-programmers can seamlessly run analyses and other complex processes that you built with minimal training and intervention on your part. This integration not only enhances collaboration but also democratizes access to powerful analysis tools, enabling a broader audience to contribute to data-driven decision-making. Through practical examples and detailed guidance, this paper showcases the simplicity and efficiency of converting SAS macros into user-friendly tools, ultimately fostering an inclusive and efficient research environment.
SD-143 : Low-Code Solutioning in SAS Viya for Automated Clinical Data Quality, Decisioning and Harmonization
Mary Dolegowski, SAS
Scott McClain, SAS Institute
Monday, 2:00 PM – 2:20 PM, Location: Aqua Salon D
Fact-checking clinical data quality is a ubiquitous need in drug trials. It is typically a very manual, code-heavy, time-consuming process, and complications increase with multiple data vendors. Pharma customers demand a reduction in human error and faster data processing. Our objective was to automate quality review using statistical models, language models, and business rules to reduce inaccuracies and time. A full application interface was created to allow human review at critical points in decision making and data harmonization. The result is an automated, end-to-end, low-code data pipeline that reduces human-in-the-loop manual review with Viya out-of-the-box capabilities, supporting better time-to-registration for drug development.
SD-155 : Integrating SAS DDE to Automate Excel Task Tracking in Pharmaceutical Statistical Programming
Amy Zhang, Merck & Co.
Huei-Ling Chen, Merck & Co.
Monday, 1:45 PM – 1:55 PM, Location: Aqua Salon D
Statistical programming teams in pharmaceutical companies often use Excel spreadsheets to align team members and keep track of the deliverables and programming tasks assigned to programmers and statisticians. Each of these files serves its own purpose, but also shares related information with the others. For instance, the analysis dataset specification file contains spreadsheets with the metadata for all the datasets. Another spreadsheet holds the details for the tables, listings, and figures (TLFs) needed for a deliverable, including TLF titles, output file names, macros, and their calling programs. An additional programming tracking spreadsheet outlines the programming tasks for each SAS program, covering development, validation, and peer review. It is important to ensure that the information in these files remains consistent. Several options exist for creating and populating new spreadsheets, both within SAS and externally. However, many of these Excel files have built-in formulas or templates that need to be kept intact when updating them for new studies. This paper presents a SAS macro that generates a programming and validation task tracker directly from the already available ADaM dataset specifications and the TLF deliverable list using the SAS Dynamic Data Exchange (DDE) method.
SD-176 : AI Search LOG
Zhuo Chen, BridgeBio Pharma
Martha Cao, BridgeBio Pharma, LLC
Ted Lystig, BridgeBio Pharma, LLC
Sateesh Arjula, BridgeBio Pharma, LLC
Monday, 2:30 PM – 2:50 PM, Location: Aqua Salon D
In a clinical study, it is important to make sure the LOG files generated by SAS programs are clean. Searching LOG files may be needed daily during the study program development cycle for SDTM, ADaM, and Tables, Listings, and Figures, among others. This paper describes an automated method that reduces the human labor spent searching LOGs and therefore helps the team gain efficiency.
SD-177 : Bridging RStudio and LSAF: A Framework for Faster and Smarter Task Execution
Jake Adler, Alexion AstraZeneca Rare Disease
Ben Howell, SAS
Lindsey Barden, Alexion
Monday, 3:00 PM – 3:20 PM, Location: Aqua Salon D
Open-source technologies are transforming today’s clinical trials. Flexible open-source editors like RStudio are the perfect place to build dashboards that accelerate exploratory programming work. RShiny dashboards built on REST APIs connect directly to clinical data repositories. With available data, users can quickly create multiple TLFs and view them in real time. Any number of input parameter fields allow control over which studies, deliveries, domains, and variables are used. This paper will discuss how to create the dashboard, show examples of specific use cases where the dashboard helps save time, and offer inspiration for future enhancements to make the dashboard a more comprehensive, efficient clinical programming tool.
SD-196 : Optimizing SAS Programming Pipelines Using the %Unpack and %SearchReplace Macros for Version Control and Customization
Ning Ning, PROMETRIKA LLC
Assir Abushouk, PROMETRIKA, LLC
Monday, 4:30 PM – 4:50 PM, Location: Aqua Salon D
This paper presents a two-part macro solution designed to optimize SAS programming workflows in pharmaceutical companies and contract research organizations (CROs). The first macro, %Unpack, extracts files from a zip directory, ensuring quick access to the most up-to-date versions of SAS macros, programming shells, and directory structures. The second macro, %SearchReplace, automates the process of searching and replacing specific strings within SAS programs across multiple folders, allowing users to easily customize code for different needs. We demonstrate the benefits of this integrated solution for regulatory compliance and pipeline efficiency, and explore the potential for using each part of the macro independently for various tasks across different environments. This solution is implemented in SAS 9.4 and is designed to work in environments with standard operating systems.
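The core pattern behind a search-and-replace utility of this kind can be sketched as follows (this is not the authors' %SearchReplace macro, just the kind of step it presumably wraps; the folder, subfolder, and strings are illustrative):

```sas
/* A bare-bones version of the search-and-replace idea (not the authors'
   %SearchReplace macro): read every .sas file in a folder, swap one string
   for another, and write edited copies to an "updated" subfolder.
   Paths and strings are illustrative.                                     */
data _null_;
   length fname outname $260 line $32767;
   infile "/studies/xyz/programs/*.sas" filename=fname truncover lrecl=32767;
   input line $char32767.;
   /* mirror each source file into the updated/ subfolder */
   outname = cats("/studies/xyz/programs/updated/", scan(fname, -1, "/\"));
   file edited filevar=outname lrecl=32767;
   line = tranwrd(line, "OLD-STUDY-ID", "NEW-STUDY-ID");
   put line;
run;
```

Writing edited copies to a separate subfolder keeps the original programs intact for comparison before they are promoted.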
SD-218 : How to get your SAS/Python/R workout on a new SAS Viya Workbench.
Pritesh Desai, SAS
Samiul Haque, SAS Institute
Monday, 4:00 PM – 4:20 PM, Location: Aqua Salon D
The rapid evolution of technology demands flexible, efficient, and high-performance development environments for modelers and data scientists. SAS Viya Workbench addresses these needs by providing a cloud-based platform that supports experimentation, innovation, and integration with SAS, Python, and popular IDEs like Visual Studio Code and Jupyter Notebook. Learn how SAS Viya Workbench simplifies data access, accelerates development, streamlines deployment, and optimizes infrastructure costs through scalable, configurable environments. This presentation will showcase some key features including seamless data import, fast provisioning, version control via Git, and support for advanced analytics and AI/ML modeling. See how SAS Viya Workbench empowers users to build and deploy models faster, with trusted outputs and minimized IT dependency, enabling efficient and cost-effective project execution.
SD-220 : Building Robust R Workflows: Renv for Version Control and Environment Reproducibility
Junze Zhang, Merck & Co., Inc.
Joshua Cook, University of West Florida (UWF)
Tuesday, 2:30 PM – 2:50 PM, Location: Aqua Salon D
Reproducibility and consistency are vital in pharmaceutical analytics, where analyses must meet rigorous industry standards. Managing R environments effectively ensures consistent and reproducible results across projects, teams, and systems. This paper focuses on the use of renv, a powerful R package designed to achieve version control and environment reproducibility. We will demonstrate how renv allows users to create isolated project environments, lock package versions, and restore dependencies seamlessly across systems and teams, ensuring uniformity in analytical outputs. Attendees will learn practical workflows for initializing and managing renv environments, sharing them among team members, and resolving compatibility challenges. The session will also explore how renv integrates with version control systems, such as git, enabling teams to manage R environment changes alongside code revisions, thus reducing errors and enhancing collaboration. Additionally, we will showcase how renv can be combined with reporting tools like Quarto to create dynamic, reproducible outputs tailored for regulatory submissions with an emphasis on team collaboration. Through real-world examples and actionable insights, participants will leave equipped to harness renv for improved version control and workflow consistency in their pharmaceutical analytics projects. Whether working with clinical trial data or preparing regulatory reports, this session offers the tools needed to elevate R workflows to industry-leading standards. Join us to explore the transformative potential of renv for reproducible and efficient analytics in pharmaceutical research.
SD-228 : A SAS® System 7-zip macro that creates a zip archive and a file archive macro: versioning in the context of a private library of programs.
Kevin Viel, Navitas Data Sciences
Tuesday, 2:00 PM – 2:20 PM, Location: Aqua Salon D
Versioning of files is essential in any project, but especially in a regulated industry, when the project has multiple resources (programmers, project managers, biostatisticians, et cetera), and in a fast-paced environment. When a programmer obtains a program from another project/delivery or is in the midst of a complicated update with changing specifications, keeping a version of the updating file, a snapshot, is wise and helpful, just like saving a file while working on it. While mature, well-developed versioning platforms are freely available, they may not be approved for use and have a learning curve. The goal of this paper is to describe a SAS® System macro that uses 7-zip, a file archiver with a high compression ratio (https://www.7-zip.org/), a macro based on it to create a zip archive and its corresponding table of contents file (CSV), and a macro to demonstrate the creation of an archive. The advantage of 7-zip over the ZIP engine to the FILENAME statement is that the datetime of the target file is preserved, including after extraction. One use would be for a programmer to create a local archive and, thus, a personal library of files (programs) including their statuses (final or ongoing) for a given folder (project) and annotations such as the (first) use of a macro, technique, or derivation, while providing the ability to quickly and easily find a candidate program to adapt to a new delivery or project or to roll back to an earlier version while keeping other versions of the file.
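As a point of reference, invoking 7-Zip from SAS can be as simple as the sketch below (the executable path, archive name, and file pattern are illustrative, not the author's macros):

```sas
/* A minimal sketch of driving 7-Zip from SAS on Windows; the executable
   path, archive name, and file pattern are illustrative, not the author's
   macros.  7-Zip preserves file datetimes inside the archive.             */
options noxwait xsync;

data _null_;
   length cmd $500;
   cmd = '"C:\Program Files\7-Zip\7z.exe" a '
      || '"C:\projects\studyABC\archive\programs_' || put(today(), yymmddn8.) || '.7z" '
      || '"C:\projects\studyABC\programs\*.sas"';
   rc = system(cmd);                   /* 7-Zip returns 0 on success */
   put 'NOTE: 7-Zip return code ' rc=;
run;
```

A table-of-contents CSV can then be built by piping the output of the 7-Zip list command (7z.exe l) back into a DATA step.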
SD-230 : A SAS® System PowerShell macro to report directory-level metrics by Owner (programmer).
Kevin Viel, Navitas Data Sciences
Tuesday, 4:00 PM – 4:20 PM, Location: Aqua Salon D
In the Windows operating system, the Owner of a file is typically the person who last saved the file, which we expect to be the author or responsible person for the file. The SAS® System includes the FINFO() function, which returns file metadata, but not the Owner on the Windows operating system. Windows PowerShell, however, does return the Owner. While auditing a directory or set of directories, such information is required by a Lead Programmer or other manager and can be used to provide metrics, such as the distribution of times required to execute programs or the use of programming techniques or code, such as macros. The goal of this paper is to describe a SAS macro, MAC_U_FINFO, to abstract file metadata such as datetime (of last modification) or bytes, and a SAS macro, MAC_PS_GETCHILDITEM, which, when run on the Windows operating system, abstracts the Owner of a file. Such data can help compile programmer-specific metrics, assure that the program header correctly lists the responsible author, or determine how often a specific macro or code was used.
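The essential trick is to pipe a PowerShell command into SAS; a minimal sketch (Windows only; the directory path is illustrative, and this is not the author's MAC_PS_GETCHILDITEM macro) looks like this:

```sas
/* A minimal sketch of pulling file Owner into SAS via a PowerShell pipe
   (Windows only; the directory path is illustrative).                    */
filename ps pipe
   'powershell -NoProfile -Command "Get-ChildItem ''C:\projects\studyABC\programs'' -Filter *.sas | ForEach-Object { Write-Output ($_.FullName + ''|'' + (Get-Acl $_.FullName).Owner) }"';

data file_owners;
   length fullname $300 owner $100;
   infile ps dlm='|' truncover;
   input fullname owner;
run;
```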
SD-257 : Enhancing Clarity and Efficiency in Clinical Statistical Programming Task Management: An Automated Integrated Task Reporting Solution with Excel and Python
Xinran Hu, Merck
Tuesday, 3:00 PM – 3:20 PM, Location: Aqua Salon D
Effective task management is critical in clinical statistical programming, where multiple studies and tight deadlines create challenges in tracking deliverables. This paper presents an automated reporting solution using Python and Excel to streamline task updates. The system extracts data from an Excel tracker, categorizes tasks by due date and priority, and generates automated reports for team members and leaders at scheduled intervals. Built with pandas, pywin32, and matplotlib, it runs on Windows and integrates with Task Scheduler for full automation. A case study demonstrates its impact on efficiency, accountability, and proactive task management.
SD-334 : A codelist generator for define.xml using a SAS Studio macro and RStudio function
Valerie Cadorett, Pfizer Inc.
Tuesday, 8:30 AM – 8:50 AM, Location: Aqua Salon D
A define.xml codelist should contain all possible terminology for a variable, not just the terms present in a dataset. The Import Validation Metadata feature in Pinnacle 21 Enterprise (P21E) has two options for updating a define.xml: merging changes or overwriting it completely. Merging changes retains all metadata but does not create codelists for variables with more than 30 terms and does not remove unused terms. Overwriting the define.xml will only keep terms that are present in the dataset and will remove the other terms. The objective of this paper is to share a SAS Studio macro and RStudio function designed to aid reconciliation and validation of codelists and generate define.xml codelists based on study data. The generator addresses limitations in P21E when updating codelists through the Import Validation Metadata feature. The CDISC Library, which is a metadata repository, is used to access and download Controlled Terminology. Key benefits include retention of unused metadata in the define.xml during codelist updates, creation of codelists for variables with more than 30 terms, and simplification of updates to ensure compliance with CDISC standards.
SD-339 : Improve Your CRF Review Process: A Python-Based Approach to Capturing CRFs via Browser Automation
Andrew Herndon, Spark Therapeutics
Tuesday, 9:00 AM – 9:20 AM, Location: Aqua Salon D
Case Report Form (CRF) design and review is a foundational process for a successful clinical trial. It typically occurs following protocol finalization and requires cross-functional feedback from multiple stakeholders such as Clinical Operations, Clinical Development, Pharmacovigilance, Statistics, and Statistical Programming, often under expedited timelines. However, the default annotated CRFs generated from Electronic Data Capture (EDC) systems are poorly formatted and unreflective of the actual CRF appearance, leading to inefficient reviews that can jeopardize study start-up timelines. We present an automated solution, with complete Python code and implementation steps, that captures CRFs directly from EDC using Python and open-source libraries like Selenium. Our approach authenticates into the EDC system, processes CRF URLs from Excel, removes unnecessary UI elements, incorporates dropdown options as text, and generates well-formatted PDFs that accurately represent the EDC forms. Implementation during a recent Rave EDC study build demonstrated significant improvements in review efficiency and stakeholder satisfaction. This approach can be implemented by any team to improve CRF review quality and maintain critical study timelines.
SD-371 : SAS Program for Backup Zipping
Tong Zhao, LLX Solutions, LLC
Monday, 5:00 PM – 5:10 PM, Location: Aqua Salon D
In our everyday work, when a study is completed or some documents and datasets in the folders are out of date, we usually create zip files for archives to save storage space. It can be time consuming when there are dozens of folders that need to be zipped separately with their original names. This SAS program is developed to replace the folders with zip files of the same names within the directory. Everything is finished with a batch run of this SAS program. This paper provides an example of its usage and explains the program step by step. The program covers some aspects of using command-line tools in SAS, which can serve as a small practice example for people who are interested in incorporating different software in SAS programming.
SD-400 : Unleash Your Coding Potential: SAS PRX Functions for Next-Level String Manipulations
John LaBore, SAS Institute
Monday, 11:00 AM – 11:50 AM, Location: Aqua Salon D
Explore the power of SAS PRX functions to elevate your string manipulation skills and enhance coding efficiency. These essential tools, available in both SAS 9 and SAS Viya, allow advanced SAS programmers to achieve complex data analysis with fewer lines of code and tackle challenges beyond traditional functions. Syntax examples and valuable resources are provided to help kickstart your usage.
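Two of the PRX workhorses, PRXMATCH and PRXCHANGE, are illustrated in the sketch below; the patterns and data are purely illustrative:

```sas
/* Two of the PRX workhorses on made-up data: PRXMATCH to validate a
   pattern and PRXCHANGE to restructure a string.                         */
data prx_demo;
   length rawdt $10 reordered $11;
   input rawdt :$10.;
   /* PRXMATCH: does the value already look like an ISO 8601 date?        */
   is_iso = prxmatch('/^\d{4}-\d{2}-\d{2}$/', strip(rawdt)) > 0;
   /* PRXCHANGE: reorder a DDMONYYYY-style value into YYYY-MON-DD form    */
   reordered = prxchange('s/^(\d{2})([A-Za-z]{3})(\d{4})$/$3-$2-$1/', -1, strip(rawdt));
   datalines;
04JUL2024
2024-07-04
;
run;
```

When the pattern is passed as a constant string, SAS compiles it once, so a separate PRXPARSE call is not needed for simple one-off uses.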
Statistics and Analytics
SA-004 : Promising Zone Designs for Sample Size Re-estimation in Clinical Trials: Graphical Approaches Using SAS
Zhao Yang, Bicara Therapeutics
Shivani Nanda, HUTCHMED
Monday, 1:30 PM – 1:40 PM, Location: Aqua Salon F
The promising zone design is a highly valuable tool for achieving more efficient and effective drug development. It allows for modifications of sample size based on unblinded interim data, enhancing trial efficiency and the likelihood of success. During the design stage of a trial, clear communication among various functional stakeholders is crucial to understand and align on the proposed promising zone design. Graphical displays can effectively facilitate this communication process. However, there are currently no resources in SAS to readily create these desired graphical displays. This paper presents SAS programs for producing these graphical displays and includes brief yet informative key underlying technical details. An example is provided to demonstrate its implementation. Hopefully, these accessible SAS programs will encourage broader application of promising zone designs, which are expected to play an increasingly important role in developing new therapies.
SA-101 : Sensitivity Analysis for Overall Survival
Binal Mehta, Merck & Co.
Patel Mukesh, Merck & Co INC
Monday, 8:30 AM – 8:50 AM, Location: Aqua Salon F
Overall survival (OS) refers to the length of time from either the date of diagnosis or the start of treatment for a disease, such as cancer, during which patients diagnosed with the disease remain alive. It is an objective endpoint that directly evaluates the effectiveness of new treatments in prolonging patient life and is often a crucial criterion used in regulatory approvals for new treatments. Sensitivity analysis for overall survival is a statistical method used to assess how various assumptions and parameters such as handling of censoring, missing data, extreme data points, covariates and subgroups, or model components impact the results. The purpose of sensitivity analysis in this context is to gauge the reliability of conclusions drawn from survival data. This helps statisticians and clinicians determine whether their survival estimates remain robust under various conditions. In this paper, we will describe two widely used methods at our organization: inverse probability of censoring weighting (IPCW) and two-stage analysis, focusing on how they address the handling of censoring, covariates, and subgroups. This paper will outline details on how IPCW and two-stage analysis are carried out in late-stage clinical trials. It will review the specification and development of core variables, the step-by-step derivation of the key variables required in the analysis dataset, the programming complexity, and the problems and proposed solutions.
SA-108 : Automating Superscript Display for Upper and Lower Limit of Quantification Values in Pharmacodynamic Tables
Anu Eldho, Kite Pharma
Monday, 1:45 PM – 1:55 PM, Location: Aqua Salon F
Pharmacodynamics plays a critical role in clinical trial oncology, as it helps to understand and evaluate the therapeutic effects and mechanisms of action of anticancer drugs. This paper aims to develop a macro that automates an essential and time-consuming part of superscripting the upper limit of quantification (ULOQ) and lower limit of quantification (LLOQ) values in the cytokine table outputs. This tool simplifies the task of biostatisticians, pharmacodynamic scientists, and medical writers in superscripting the outputs manually, which is time-consuming and prone to human errors. Scientists can accurately contextualize drug safety and efficacy conclusions by clearly labeling values below or above the validated range. ULOQ, the highest concentration of cytokine values, and LLOQ, the lowest concentration of cytokine values, define the range within which the analytical method can reliably and accurately quantify the concentration of a particular cytokine in a sample. Establishing an appropriate limit of quantification values is crucial in cytokine analysis to maintain the quantification range, data interpretation, dilutional integrity, and quality control. Manual assignment of the superscripts to flag these values is tedious and impractical. This macro helps to compare the actual results values in the data transfer with the cytokine limit of quantification values in the lookup table provided by scientists and flag the consistent values in the TFL outputs. Any updates to a specific cytokine assay will trigger the lookup update, and the programmers can refresh the table outputs without any macro updates.
SA-124 : A Step-by-Step Guide to Calculating Relative Dose Intensity in Solid Tumor Studies
Yan Xu, Abbvie
Pingping Xia, AbbVie
Jagadesh Mudapaka, AbbVie
Monday, 9:00 AM – 9:20 AM, Location: Aqua Salon F
The Relative Dose Intensity (RDI) is an essential parameter used to measure the actual dose over a specific period relative to the planned dose. Calculating RDI in oncology solid-tumor studies is critical for evaluating treatment efficacy and patient outcomes. This paper demonstrates a practical application of the RDI formula. Using a solid tumor study as an example, this paper illustrates how to derive each component in the RDI formula and accurately compute RDI, even in complex study conditions. It also discusses resetting the weight baseline for significant weight changes, while keeping the units of the numerator and denominator in the RDI formula consistent with the weight variable. The objective of this paper is to provide a detailed, step-by-step guide for calculating RDI in a solid tumor study, including discussions on scenarios with varying cycle lengths, incomplete cycles, and different units for the actual and planned doses.
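At its simplest, once the actual and planned components have been derived, the final calculation reduces to a ratio of dose intensities; a minimal sketch (with illustrative variable names, not the paper's actual ADaM variables) follows:

```sas
/* A minimal sketch of the final RDI arithmetic, assuming one record per
   subject with already-derived totals; TOTDOSE, PLANDOSE, TRTDURD, and
   PLANDURD are illustrative names, not the paper's ADaM variables.       */
data rdi;
   set work.exposure;
   actual_di  = totdose  / trtdurd;      /* actual cumulative dose / actual days   */
   planned_di = plandose / plandurd;     /* planned cumulative dose / planned days */
   if planned_di > 0 then rdi_pct = 100 * actual_di / planned_di;
   label rdi_pct = 'Relative Dose Intensity (%)';
run;
```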
SA-130 : An Introduction to Obtaining Test Statistics and P-Values from SAS® and R for Clinical Reporting
Brian Varney, Experis
Monday, 4:00 PM – 4:20 PM, Location: Aqua Salon F
Getting values of test statistics and p-values out of SAS and R is quite easy in each of the software packages but also quite different from each other. This paper intends to compare the SAS and R methods for obtaining these values from tests involving Chi-Square and Analysis of Variance such that they can be leveraged in tables, listings, and figures. Topics covered include, but are not limited to, SAS ODS TRACE, PROC FREQ, and PROC GLM on the SAS side, and the stats::chisq.test() and stats::aov() functions and the broom package functions on the R side. The audience for this paper is intended to be programmers familiar with SAS and R but not at an advanced level.
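On the SAS side, the key step is routing the statistics table to a data set with ODS OUTPUT; a hedged sketch (data set and variable names are illustrative) is shown below. On the R side, stats::chisq.test() paired with broom::tidy() yields the same statistic and p-value as a data frame, which is the kind of comparison the paper develops.

```sas
/* A hedged SAS-side sketch: route the chi-square statistics table from
   PROC FREQ to a data set via ODS OUTPUT (data set and variable names are
   illustrative; ODS TRACE reveals the table name).                       */
ods trace on;
proc freq data=work.adsl;
   tables trt01p*sex / chisq;
   ods output ChiSq=work.chisq_stats;    /* columns include Statistic, Value, Prob */
run;
ods trace off;

proc print data=work.chisq_stats noobs;
   where Statistic = 'Chi-Square';
run;
```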
SA-168 : The Allowable Total Difference Zone: A construction method using the ATDzone SAS® Macro
Jesse Canchola, Roche Diagnostics Solutions
Natasha Oza, Roche Diagnostics Solutions
Monday, 4:30 PM – 4:50 PM, Location: Aqua Salon F
When comparing two systems using the same item or “sample” to produce at least two “paired” results (one on each system), for example, a new versus an older system for a molecular assay, the typical method comparison methodologies used are Ordinary Least Squares (OLS), Deming, or Passing-Bablok regression, and bias plots that include Bland-Altman and Error Grid Analysis (EGA) (Passing & Bablok, 1983; Linnet, 1998; Linnet, 1993; Bland & Altman, 1986; Clark et al., 1987; Parkes et al., 2000). One additional enhancement to most of these methods, not typically used for this type of analysis, called the Allowable Total Difference (ATD) Zone (CLSI EP21-A, 2003; Krouwer, 2008), utilizes the reproducibility results of the older system (for example, from package/product inserts or from the product requirements document) to construct the boundaries or limits within which 95% of the differences between the two repeated measurements by the older system should fall. Two applications of the ATD Zone include using it in a scatterplot [New system (Y) vs. Older system (X)] and/or in a bias plot [viz., Difference (Y – X) vs. X (Krouwer bias plot; Krouwer, 1987; Krouwer & Cembrowski, 1991) OR Difference (Y-X) vs. average of (X, Y) (i.e., Bland-Altman bias plot; Bland & Altman, 1986)]. Producing ATD Zone plots can be a challenging programming endeavor. However, the authors introduce a SAS® macro, ATDzone, that simplifies their creation for any method comparison task at hand with minimal inputs.
SA-171 : Super Learner for Predictive Modeling and Causal Analysis
Honghe Zhao, SAS Institute
Clay Thompson, SAS Institute
Michael Lamm, SAS Institute
Wednesday, 8:00 AM – 8:20 AM, Location: Aqua Salon D
Model misspecification is a common challenge in causal analysis, where statistical models are used to understand the complex relationships among outcomes, exposures, and baseline characteristics in real-world observational data. Misspecified models can easily lead to biased estimates of causal effects and incorrect conclusions. The super learner algorithm tackles this problem by enabling you to consider multiple candidate models with different specifications and combine them into a single, robust model. The super learner leverages the strengths of these candidate models to learn the complex relationships between the variables, reducing the risk of overfitting and model misspecification. This makes it well-suited for applications such as patient outcome predictions and causal analysis using real-world observational data, where complex interactions and dependencies among the variables are present. This paper discusses the implementation of the super learner algorithm in SAS® Visual Statistics software by using the SUPERLEARNER procedure. It also illustrates how you can use this procedure to build ensemble models for both predictive and causal analysis tasks.
SA-173 : Calculating exact posterior probabilities and credible intervals from Bayesian borrowing robust mixture priors for binary, count and continuous outcomes in R and SAS
Darren Scott, AstraZeneca
Armando Turchetta, AstraZeneca
Monday, 8:00 AM – 8:20 AM, Location: Aqua Salon F
Recently, Bayesian statistical methods have been developed that leverage historical clinical trial information to assist in the discovery, development, and delivery of medicines. “Dynamic” models preserve the primacy of the data in the target study by discounting the external information according to how closely it matches the target data. A popular approach for “dynamic borrowing” is a robust mixture prior (RMP), where one or more informative components are combined with a vague component and a prior weight. Typically, in the Bayesian paradigm, inference is performed through various Monte Carlo (MC) sampling schemes. A prudent choice of RMP leads to a conjugate mixture posterior for the control and treatment parameters. In this article we provide insights into how simple mathematical concepts lead to the application of numerical integration in R and SAS to obtain fast and accurate inference from the posteriors after Bayesian borrowing. Exact probabilities also simplify and reduce the time required for the validation process for clinical trial reporting. We use an example with binary outcome data to calculate the exact probability of a treatment effect, the prior weight tipping point, and credible intervals on the control rate, treatment rate, and treatment effect. Finally, we detail the approach for count and continuous data in relation to the binary example, with code which can be implemented by statistical programmers.
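For the binary case, conjugacy means the posterior is itself a mixture of Beta distributions with an updated weight, so exact probabilities need only the Beta function and Beta CDF. A minimal single-arm sketch under a two-component robust mixture prior (all hyperparameters, counts, and the threshold are illustrative, not the paper's example) follows:

```sas
/* Exact posterior for a binary endpoint under a two-component robust
   mixture prior w*Beta(a1,b1) + (1-w)*Beta(a0,b0); all numbers below are
   illustrative, not from the paper.                                      */
data rmp_posterior;
   a1 = 12;  b1 = 8;      /* informative component (historical borrowing) */
   a0 = 1;   b0 = 1;      /* vague component                              */
   w  = 0.7;              /* prior weight on the informative component    */
   x  = 14;  n = 40;      /* observed responders / subjects               */
   p0 = 0.30;             /* threshold of interest, e.g. P(p > 0.30)      */

   /* marginal likelihood of each component (binomial coefficient cancels) */
   m1 = beta(a1 + x, b1 + n - x) / beta(a1, b1);
   m0 = beta(a0 + x, b0 + n - x) / beta(a0, b0);

   /* updated mixture weight and exact posterior tail probability */
   w_post  = w*m1 / (w*m1 + (1 - w)*m0);
   prob_gt =     w_post *(1 - cdf('beta', p0, a1 + x, b1 + n - x))
           + (1 - w_post)*(1 - cdf('beta', p0, a0 + x, b0 + n - x));
   put w_post= prob_gt=;
run;
```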
SA-183 : Leveraging Python for Statistical Analysis in Public Health: Techniques and Visualizations for Life Sciences Professionals
Michael Carnival, University of West Florida
Wednesday, 10:00 AM – 10:20 AM, Location: Aqua Salon D
Public health is of vital importance to society, aiming to protect and improve the health of individuals and communities through disease prevention, health promotion, and equitable access to healthcare services. A robust public health system not only fosters healthy lifestyles but also ensures the availability of skilled healthcare professionals, maintains high-quality standards, and addresses health disparities. To achieve these goals, statistical methods and analytical techniques are essential for identifying trends, addressing challenges, and evaluating the effectiveness of public health interventions. This work focuses on leveraging Python for statistical analysis using real-world public health data. Key topics include generating visualizations such as boxplots, histograms, scatterplots, and pie charts, as well as performing statistical analyses like correlation coefficient computation, normality testing, ANOVA, and goodness-of-fit tests. These techniques and visualizations provide meaningful insights that can inform decision-making for stakeholders in the healthcare sector. By the end of this content, readers will gain practical skills in applying Python libraries to analyze data, interpret results, and address public health challenges in their respective fields or industries.
SA-189 : Deciphering Exposure-Response Analysis Datasets: A Programmer’s Perspective for Oncology Studies
Sabarinath Sundaram, Pfizer
Wednesday, 9:45 AM – 9:55 AM, Location: Aqua Salon D
Exposure-Response (E-R) evaluation is essential in drug development and regulatory reviews by informing decision-making towards optimized trial design, dose and regimen selection, and benefit-risk assessments in both early and late-stage trials. Analyzing the relationship between drug exposure and treatment outcomes using E-R data provides a level of granularity to support the primary evidence of a drug’s safety (identifying negative effects) and/or efficacy (positive effects). The preparation of high-quality E-R datasets is a key step in this space, which can get challenging especially in oncology studies which are quite complex and involve multiple factors and mechanisms. The creation of these E-R analysis datasets requires a comprehensive mix of data sources, including drug exposure, patient demographics, key covariates, PK/PD data, and key primary or secondary endpoints of the clinical trial. This intricate process demands strong programming expertise and a deep understanding of PK and PD, as it requires ongoing collaboration with PK modeling scientists to ensure accurate and meaningful analysis. This paper will explore the role of E-R analysis datasets in regulatory submissions, address key challenges in their creation, and examine the FDA’s guidance on E-R analysis. We will also discuss the development of ADaM standard E-R datasets and present masked dummy data and models to illustrate the practical application of E-R analyses. Ultimately, this paper emphasizes the importance of E-R evaluations in advancing drug development and optimizing therapeutic outcomes.
SA-198 : A SAS Macro Calculating Confidence Intervals of the Difference in Binomial Proportions from Stratified Analysis using the Miettinen & Nurminen Method with Cochran-Mantel-Haenszel Weights
Mikhail Melikov, Cytel
Brian Mosier, EMB Statistical Solutions
Monday, 10:00 AM – 10:20 AM, Location: Aqua Salon F
The Miettinen & Nurminen (MN) method with the Cochran-Mantel-Haenszel (CMH) weighting strategy is often recommended for constructing confidence intervals for the common risk difference in stratified analysis. However, it has not been implemented into SAS 9.4/SAS Studio for stratified analysis, although it is available in SAS Viya. We acknowledge that many Biostatisticians and Statistical programmers in the industry are working with SAS 9.4/SAS Studio, therefore we implemented the MN method with CMH weights (Lu, 2008) in a SAS Macro to make this analysis available for a broad SAS user audience. The SAS macro calculates the common risk difference in stratified analysis, confidence intervals at user defined significance levels, and the corresponding p-value. We do not expect the SAS macro to be dependent on operating system or SAS software version.
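For context, the unstratified Miettinen & Nurminen score interval is available directly in PROC FREQ in SAS 9.4, while the CMH-weighted common risk difference across strata is what the macro adds; a hedged sketch with illustrative data set and variable names:

```sas
/* For context only: the unstratified Miettinen-Nurminen score interval is
   available directly in PROC FREQ (SAS 9.4); the macro described above
   extends this to the CMH-weighted common risk difference across strata.
   Data set and variable names are illustrative.                          */
proc freq data=work.adrs;
   tables trt01p*resp / riskdiff(cl=mn) alpha=0.05;
   weight count;                          /* cell counts, if pre-summarized */
run;
```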
SA-205 : Programming Perspectives for Efficient and Accurate C-QTc Analysis
Lingjie Zhang, Merck
Lata Maganti, Merck
Richard Moreton, Merck
Runcheng Li, Merck
Monday, 10:30 AM – 10:50 AM, Location: Aqua Salon F
The significance of Phase I concentration-QTc (C-QTc) analysis in clinical trials lies in its ability to assess potential drug-induced cardiac effects, critical for ensuring patient safety. This paper aims to provide detailed recommendations on planning and conducting QTc assessments using C-QTc modeling, highlighting study design features, modeling objectives, and best practices based on current scientific literature and the authors’ personal experiences. Also, this paper provides an overview of how ADaM datasets, namely ADSL, ADPC, ADEG, and ADPKQT, facilitate data processing and statistical analysis in the context of C-QTc analysis. The objectives include presenting techniques, challenges, and solutions integral to programming for C-QTc analysis, ultimately enhancing the reliability of findings in drug safety evaluations.
SA-207 : Clopper Pearson CI? get your data ready for it!
Ruth Rivera Barragan, Ephicacy Consulting Group
Isaac Vazquez, Ephicacy Consulting Group
Monday, 2:00 PM – 2:10 PM, Location: Aqua Salon F
This paper addresses the challenge of calculating confidence intervals (CIs) for binomial proportions using the Clopper-Pearson method when faced with incomplete datasets. The Clopper-Pearson method is known for providing exact CIs, especially in situations with small sample sizes or extreme proportions. However, missing data can compromise the accuracy of these intervals. The primary focus of this paper is on the programming aspects of handling incomplete data in SAS, aiming to automate the process of data completion while ensuring the correct calculation of CIs. This work is particularly relevant for statistical programmers who work with SAS and need to handle missing values in binomial data without diving deeply into the statistical theories behind the Clopper-Pearson method. The paper outlines a step-by-step workflow that addresses data gaps and integrates the Clopper-Pearson method for CI calculation. Through practical examples and SAS code snippets, it demonstrates how programmers can effectively manage missing data and compute reliable CIs. This approach offers a practical solution for real-world scenarios where data completeness is crucial, especially in clinical trials and research involving binomial data. The results show that this methodology provides an efficient and accurate way to handle incomplete datasets while maintaining the integrity of statistical analyses. In summary, this paper provides statistical programmers with a useful tool for completing data in SAS and calculating exact confidence intervals using the Clopper-Pearson method, contributing to more robust statistical analyses.
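A minimal sketch of the two ideas combined, completing the data so that every treatment/response cell exists and requesting the exact interval from PROC FREQ (all counts and names are made up, not the paper's workflow):

```sas
/* (1) complete the data so every treatment/response cell exists, even with
   a zero count, and (2) request the exact (Clopper-Pearson) interval.     */
data responders;
   input trt $ resp $ count;
   datalines;
A Yes 18
A No  22
B Yes  0
B No  40
;
run;

proc freq data=responders;
   by trt;
   tables resp / binomial(level='Yes' cl=clopperpearson);
   weight count / zeros;                  /* ZEROS keeps the empty cell in the analysis */
run;
```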
SA-231 : Mastering the Maze of Oncology Endpoints: A Unified SAS Approach for Randomized Controlled Trial Analysis
Yuxin Wang, LLX Solutions, LLC
Kelly Chao, LLX Solutions, LLC
Wenqing Yu, LLX Solutions, LLC
Hongbing Jin, LLX Solutions, LLC
Monday, 11:00 AM – 11:20 AM, Location: Aqua Salon F
In oncology clinical trials, particularly in randomized controlled studies, the analysis of diverse endpoints is crucial for evaluating treatment efficacy compared to the control arm. This paper presents a comprehensive, unified approach to analyzing three fundamental types of endpoints in oncology research: time-to-event, categorical and continuous. By consolidating these methodologies into a single, accessible resource, we aim to provide an invaluable reference for statisticians, programmers, and researchers in the field. Our work focuses on three key areas of statistical analysis, each tailored to a specific endpoint type. Time-to-Event Endpoints: We examine both non-parametric (Kaplan-Meier) and semi-parametric (Cox proportional hazards regression) methods for analyzing endpoints like overall survival (OS) and progression-free survival (PFS). The paper introduces PROC LIFETEST for survival curve generation and estimation, and PROC PHREG for Cox regression and model evaluation. Categorical Endpoints: We explore methods for evaluating endpoints such as Objective Response Rate (ORR) and Disease Control Rate (DCR). We detail statistical tests for comparing response rates including Cochran-Mantel-Haenszel (CMH) test, Chi-square test, and Fisher’s exact test. This includes calculating confidence intervals for response rates and between-group differences using PROC FREQ with the RISKDIFF option. Continuous Endpoints: We elaborate on the Mixed Model Repeated Measures (MMRM) approach for analyzing continuous endpoints like functional tests in patient-reported outcomes (PRO). This section addresses the critical issue of missing data in longitudinal studies, demonstrating the use of PROC MIXED and PROC MI for data imputation under missing at random (MAR) and missing not at random (MNAR) assumptions.
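For the time-to-event piece, the core PROC LIFETEST and PROC PHREG calls look roughly like the sketch below (ADTTE-style names such as AVAL, CNSR, and TRT01P follow common ADaM conventions but are illustrative here):

```sas
ods graphics on;
proc lifetest data=adtte plots=survival(atrisk cb);
   time aval*cnsr(1);                     /* CNSR=1 flags censored records */
   strata trt01p / test=logrank;
run;

proc phreg data=adtte;
   class trt01p / ref=first;
   model aval*cnsr(1) = trt01p / ties=efron risklimits;
   hazardratio trt01p / diff=ref;
run;
```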
SA-245 : Practical considerations for Intercurrent Events and Multiple Imputation
David Bushnell, Cytel
Wednesday, 8:30 AM – 8:50 AM, Location: Aqua Salon D
Estimands in clinical trials define the effect of treatment given the possibilities of intercurrent events (ICE). ICE occur after initial treatment and are defined in terms of how they affect study assessments or the interpretation of final results. However, ICE are not merely incidental missing data; they are defined by their relation to study treatment and the clinical question posed by the trial. When the estimand and ICE are conceived, it is important to have both clinician and statistical input; ICE strategies other than ‘treatment policy’ should be justified. Practical consideration must be given to implementing ICE along with other/incidental missing data, ensuring that ICE properly account for assessment failures that could be related to treatment. The multiple imputation process must be consistent with the ICE strategy. Additionally, an effort must be made to collect ‘retrieved dropouts’ so that ICE can be properly modeled.
SA-267 : Handling Missing Data in External Control Arms: Best Practices, Recommendations, and SAS Code Examples
Yutong Zhang, LLX Solutions, LLC
Monday, 2:15 PM – 2:25 PM, Location: Aqua Salon F
Missing data is a common challenge in external control arm (ECA) studies, where data comes from sources like real-world data (RWD), electronic health records (EHR), and past clinical trials. Unlike randomized controlled trials (RCTs), where missingness can be managed through study design, ECA studies face unique issues. These include selection bias, non-random missingness, and inconsistent data collection timelines. Properly handling missing data is key to ensuring valid treatment effect estimates and regulatory acceptance. This paper provides a practical guide to handling missing data in external control studies. We compare commonly used methods, including complete case analysis, multiple imputation (MI), predictive mean matching (PMM), and Bayesian approaches. Each method’s strengths and weaknesses are discussed, with a focus on real-world applications. We also outline a structured workflow to help researchers choose the best imputation strategy based on data characteristics and regulatory expectations. To support implementation, SAS code examples (PROC MI, PROC PHREG, and others) are provided for each method. The presentation is designed for biostatisticians, statistical programmers, and clinical researchers with an intermediate understanding of missing data. By the end, a clear roadmap will be available for handling missing data in external control studies effectively.
SA-270 : Simulating Optimal Sample Sizes for Canine Jaws Using SAS®
Chary Akmyradov, Arkansas Children’s Research Institute
Lida Gharibvand, Loma Linda University
Wednesday, 10:30 AM – 10:50 AM, Location: Aqua Salon D
In the realm of veterinary clinical research, ensuring animal welfare while achieving statistically significant results is paramount. This study presents an advanced approach to sample size calculation for a clinical study involving canine subjects, with a focus on dental health. The unique design of this study employs dogs as both cases and controls by longitudinally comparing treated and untreated teeth within the same animal, thus minimizing the number of subjects required and reducing animal suffering. A pivotal aspect of this research is the optimization of the number of dogs and the number of teeth extracted per dog. The goal is to minimize both, ensuring minimal discomfort to participating animals. Tooth growth in dogs is monitored at three distinct time points to assess the development of treated versus untreated teeth. To achieve a robust and reliable study design, a simulation-based approach was adopted. This involved simulating canine tooth growth trajectories based on pilot studies and existing literature using a DATA step. Power analysis was conducted using the simulated data, utilizing the PROC GLIMMIX and PROC FREQ procedures. Additionally, the entire simulation process was streamlined and automated using a custom SAS® macro. Lastly, the results are visualized as a heat map using PROC SGPANEL. This paper highlights the delicate balance between ethical considerations and the need for scientific rigor in veterinary research. The methodology outlined here serves as a blueprint for future studies requiring minimal animal subjects while ensuring reliable and ethically sound outcomes. This paper is developed using SAS 9.4.
SA-287 : Roadmap to Efficacy Analysis for Early Phase Oncology studies
Dhruv Bansal, Catalyst Clinical Research
Christiana Hawn, Catalyst Clinical Research
Chris Kelly, Catalyst Clinical Research
Monday, 3:00 PM – 3:20 PM, Location: Aqua Salon F
Oncology studies are among the most complex and challenging therapeutic areas for statistical programmers. Navigating the efficacy analysis for these studies can be particularly daunting for those new to the field. This paper aims to provide a comprehensive overview of the most common efficacy outputs requested for early phase oncology trials, along with basic codes and techniques to assist in programming these outputs. The paper will discuss common efficacy outputs generated for solid tumor studies, which are the most prevalent type of oncology trials. Solid tumor studies typically follow RECIST (Response Evaluation Criteria in Solid Tumor) guidelines. This paper will focus on the roadmap to efficacy analysis, including the creation of efficacy datasets (ADTR, ADEFF, ADTTE) and the parameters used to create outputs for duration of response, Kaplan-Meier estimates for survival analysis, cumulative incidence, waterfall plots, spider plots, and swimmer plots. Finally, the paper will provide relevant SAS® code for all these methods, equipping readers with the knowledge and tools needed to create tables and figures for oncology studies.
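As one example from the figure side, a basic waterfall plot of best percent change in target lesions can be sketched as follows (data set and variable names are illustrative):

```sas
/* Waterfall plot sketch: one bar per subject, ordered by best % change,
   colored by best overall response; names are illustrative.              */
proc sort data=adtr out=waterfall;
   by descending pchg;
run;

proc sgplot data=waterfall;
   vbarparm category=usubjid response=pchg / group=bor;
   refline -30 20 / axis=y lineattrs=(pattern=shortdash);   /* RECIST PR / PD cut-offs */
   xaxis display=(nolabel novalues noticks) discreteorder=data;
   yaxis label='Best % Change from Baseline in Target Lesions';
run;
```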
SA-321 : Efficacy Endpoints Related to CNS in Oncology Studies within ADaM
Lihui Deng, Bristol Myers Squibb
Kylie Fan, Bristol Myers Squibb
Wednesday, 9:00 AM – 9:20 AM, Location: Aqua Salon D
In oncology trials, RECIST 1.1 serves as the international standard for assessing disease response in patients. However, in recent trials of targeted therapies, such as those for non-small cell lung cancer (NSCLC), specialized response patterns related to the central nervous system (CNS) have been observed. These include hypo-responsiveness or hyper-responsiveness in CNS versus extra-CNS regions. Consequently, CNS response can significantly impact the widely used endpoints derived from RECIST 1.1. This paper explores the unique aspects of CNS response and proposes a definition for time to CNS progression (CNS TTP). Additionally, it introduces endpoints for patients with brain metastases, such as intracranial objective response rate (icORR), intracranial duration of response (icDOR), intracranial progression-free survival (icPFS), and intracranial time to progression (icTTP). Furthermore, the structure of related ADaM datasets is discussed, offering insights into the derivation of these CNS-specific endpoints and their incorporation into clinical trial analyses.
SA-343 : Implementation of the inverse probability of censoring weighting (IPCW) Model in Oncology Trials
Mei Huang, Exelixis, Inc
Shibani Harite, Exelixis Inc.
Haijun Ma, Exelixis, Inc.
Linsong (Athena) Zhang, Exelixis Inc.
Monday, 11:30 AM – 11:50 AM, Location: Aqua Salon F
Crossover is a common feature in oncology clinical trial designs that allows subjects in the control arm to switch to the experimental arm, often upon disease progression. Crossover can bias the treatment effect estimate for long-term endpoints, such as overall survival. The inverse probability of censoring weighting (IPCW) model is a widely used technique to estimate the treatment effect on overall survival as if no crossover had occurred. Although the implementation of the IPCW model in SAS has been previously discussed (Mosier, 2023), applying the IPCW model remains challenging in practice and requires careful consideration. This paper provides a step-by-step guide to implementing the IPCW model, explaining the method in layman’s terms and highlighting technical issues and solutions. For instance, there is limited literature on preparing multiple time-varying covariates in a counting process format, which can be cumbersome. This paper introduces macros that can be readily used to generate counting process data with multiple time-varying covariates. And we modified the macro presented by Mosier (2023) to handle switching or use of new anti-cancer therapies in both arms. Additionally, the paper examines the processes and statistical challenges associated with implementing the IPCW model to offer guidance for practitioners.
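Once the counting-process data with time-varying covariates and weights have been prepared, the final weighted Cox fit is the comparatively simple step; a hedged sketch (variable names are illustrative, not the paper's macros) is shown below:

```sas
/* IPCW-weighted Cox fit on counting-process data (one record per subject
   per interval, with time-varying weights); names are illustrative.      */
proc phreg data=ipcw_counting covs(aggregate);
   class trt01p / ref=first;
   model (tstart, tstop)*event(0) = trt01p / ties=efron risklimits;
   weight ipcw;                  /* stabilized inverse-censoring weights   */
   id usubjid;                   /* robust sandwich variance by subject    */
run;
```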
SA-368 : Incorporating Frailty into Time-to-Event Analysis: A Practical Approach with R frailtypack
Sunil Kumar Pusarla, Omeros Corporation
Avani Alla, Omeros Corporation
Monday, 2:30 PM – 2:50 PM, Location: Aqua Salon F
Mixed models are commonly used in clinical trials with continuous and binary/ordinal endpoints involving repeated measures data, where both fixed and random effects are used to estimate the treatment effect. However, in oncology and rare disease trials with time-to-event endpoints, random effects are not routinely incorporated into the Cox multivariable regression model. This omission overlooks unobservable covariates that may influence outcomes. Additionally, correlated outcomes are rarely considered in a single survival analysis model. Frailty models extend the Cox model by accounting for unmeasured risk factors, particularly in clustered or recurrent event data. While SAS allows frailty inclusion in PROC PHREG via the RANDOM statement in a univariate analysis, it cannot handle multivariate joint frailty models. Moreover, it does not provide patient-level predicted frailty scores. Individual predicted frailty scores allow for more precise risk assessment, as population-level estimates alone may obscure individual risk levels, potentially impacting treatment and follow-up strategies. The R frailtypack package addresses this gap by fitting shared, nested, joint, and additive frailty models using penalized likelihood estimation. By generating individual predicted frailty scores, it facilitates the identification of optimal cutoffs for stratifying patients into risk groups such as low, medium, and high. This approach enhances personalized care by ensuring that high-risk patients receive intensive treatment while low-risk patients may benefit from conservative management. Additionally, it improves prognostic accuracy, disease prevention, healthcare efficiency, and overall patient outcomes.
SA-372 : Landmark Analysis: A Method for Accurate Prediction of Time-Dependent Clinical Risks and Their Effect on Patient Outcomes
Sunil Kumar Pusarla, Omeros Corporation
Avani Alla, Omeros Corporation
Monday, 5:00 PM – 5:20 PM, Location: Aqua Salon F
Estimating the effects of time-dependent variables on patient outcomes is challenging due to their changing values over time. Landmark analysis offers a practical solution by assessing clinical outcomes at predefined time points, transforming time-dependent covariates into fixed covariates for clearer interpretation. The Cox model with time-dependent covariates approximates a dynamic landmark analysis with infinitely many time points. If the parameter for the landmark time point is zero, both methods may yield similar results. However, unlike standard Cox regression, which can introduce immortal time bias when analyzing post-baseline variables, landmark analysis ensures fair comparisons by evaluating all patients at the same time points. Landmark analysis can be performed using the SAS PHREG procedure. By examining multiple landmark time points, landmark analysis enables graphical visualization of time-dependent covariate effects using SAS SGPANEL, providing clinically relevant insights. Patients included at a given landmark time remain in subsequent analyses if still available, allowing sequential risk assessment. Thus, landmark models facilitate dynamic risk prediction by defining specific landmark times, with a separate survival model developed for each time point. Typically, landmark models follow a two-stage approach: first, longitudinal data is processed to define fixed covariates at each landmark time; then, survival models such as Cox proportional hazards, Fine-Gray, or cause-specific hazard models are applied to assess risk. This method assumes proportional hazards only within shorter intervals rather than across the entire study period, thereby enhancing clinical relevance by providing more precise risk estimates at different time points, ultimately leading to well-informed patient management strategies.
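A minimal sketch of a single 6-month landmark analysis, under the usual convention that only subjects still at risk at the landmark are kept and the clock is reset to the landmark (data set and variable names, including RESP_DY, are illustrative):

```sas
%let lm = 182.625;                               /* 6 months in days */

data landmark6;
   set adtte;
   if aval > &lm;                                /* still at risk at the landmark        */
   aval_lm = aval - &lm;                         /* reset the time origin                */
   respby6mo = (not missing(resp_dy) and resp_dy <= &lm);  /* covariate fixed at landmark */
run;

proc phreg data=landmark6;
   class respby6mo / ref=first;
   model aval_lm*cnsr(1) = respby6mo / risklimits;
run;
```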
Strategic Implementation & Innovation
SI-046 : The Current State of Teaching Biostatistics in Academia: Challenges and Software Solutions
Lida Gharibvand, Loma Linda University
Monday, 4:00 PM – 4:20 PM, Location: Indigo 206
The traditional approaches to teaching and learning biostatistics have gone through evolutionary stages. This is mainly due to the advancement of computing and modern approaches to data analytics, including but not limited to the methodological impacts of statistical machine learning. However, the growth and impact of biostatistics programs, both at the undergraduate and graduate levels, are heavily reliant upon building strong and relevant foundations. From that perspective, it becomes necessary and essential to further evaluate the interaction of the main areas of mathematics, statistics, and computing. In this paper, we discuss the relevance and importance of incorporating modern approaches to data analytics, specifically in conjunction with biostatistics training and research. Chiefly, we aim to discuss the importance of computing and statistical thinking as building blocks of the ideas associated with tackling projects having real-life implications, and specifically thinking about design of experiments and surveys, data collection, data visualization, modeling, model interpretation, and decision-making. We review these concepts from structural, logistical, and budgetary points of view. This paper explores these challenges and discusses the emerging role of software in improving biostatistics education.
SI-061 : Generative AI in Biometrics: Transforming Clinical Trials with Supercharged Efficiency an Innovation
Kevin Lee, Clinvia
Monday, 10:00 AM – 10:20 AM, Location: Indigo 206
The pharmaceutical industry stands at the beginning of a transformative era, with Generative AI (e.g., ChatGPT) revolutionizing clinical trial development. The paper will explore the integration of Generative AI in biometrics, highlighting its potential to streamline workflows, redefine clinical trial development, and lead innovation. The paper will start with an introduction to Generative AI and its wide-ranging applications in biometrics, such as information query, code generation (e.g., SAS, R & Python), code conversion (e.g., SAS to R/Python), document generation (e.g., SAP, CSR), data analysis, data visualization, patient profiling, and many more. The paper will also explore the tools, systems, processes, and people that are reshaping clinical trials with Gen AI integration. Looking toward the future, the paper will evaluate the lasting impact of Generative AI on the Biometrics department. By strategically adopting these cutting-edge technologies, biometrics teams can dramatically enhance operational efficiency, optimize trial outcomes, and expedite regulatory approval processes. The paper will culminate in a forward-looking exploration of how biometrics teams can evolve into “super biometrics teams,” leveraging Generative AI to achieve unprecedented levels of innovation, precision, and effectiveness in clinical trial development.
SI-064 : Strategies to Encourage Adoption and Innovation in Statistical Programming
Archana Gundamraju, Biogen
Tuesday, 9:00 AM – 9:20 AM, Location: Indigo 206
Pharmaceutical companies can focus on efficiency by being open to new tools and processes. It is vital to adapt and ensure a smooth transition from a traditional or familiar way of working to a new way of working. Sometimes, change is difficult and is resisted. This could be due to constraints such as timelines, budget, lack of time to train, or a fear of the unknown. Incorporation of change in a team or a department requires proper planning, evaluation, and implementation. One can be an effective leader in Statistical Programming by combining technical skills and leadership skills with a growth mindset. One can also contribute to organizational growth by being receptive to the concerns of the team, providing proper guidance, and giving feedback about tools or processes during adoption. This paper presents two strategies used within statistical programming teams to navigate changes and encourage innovation that could help in meeting standards and improving efficiency. The first strategy followed Change Management Principles for the adoption of a home-built tool within a team by a Statistical Programming lead while working towards a pivotal deliverable. The second strategy was used to evaluate the effectiveness of incorporating a commercially available tool into a new process by adopting a Minimum Viable Product approach within a data standards team. Both strategies share common goals such as improving efficiency and reducing human error by following certain standards.
SI-066 : Navigating Compliance Excellence: ISO Standards & Data Privacy Implementation
Ashwini Kanade, Ephicacy Consulting Group
Syamala Schoemperlen, Ephicacy Consulting Group
Tuesday, 8:30 AM – 8:50 AM, Location: Indigo 206
Organizations face challenges aligning operational practices with global quality, security, and privacy standards in today’s data-driven world. This paper explores the foundational principles and implementation strategies for ISO standards, including ISO 9001 (Quality Management), ISO 27001 (Information Security), ISO 22301 (Business Continuity), ISO 27701 (Privacy Information Management), and the General Data Protection Regulation (GDPR) and other global privacy regulations. The focus is on Ephicacy’s approach to integrating these standards into its operational framework, emphasizing practical methods for ensuring compliance while fostering efficiency and resilience. Topics include structured management systems, risk assessment frameworks, data governance policies, and stakeholder engagement. We also offer insights into the implementation process, highlighting best practices, challenges, and solutions tailored to a global, cross-functional environment. Additionally, the paper focuses on the critical role of Human Resources (HR) in integrating complex regulatory frameworks into actionable organizational practices. By examining how HR policies strategically align with international standards like ISO regulations and GDPR, this paper highlights how compliance is embedded into corporate culture, making it a core organizational value. Key areas include ensuring employee awareness and training on quality, security, and privacy protocols, aligning HR policies with ISO standards and global regulations, and fostering a culture of compliance and accountability across the workforce. This work guides organizations to build robust, compliant processes that uphold data security, privacy, and operational excellence. With ISO standards, global data privacy regulations, and HR and IT best practices, businesses can achieve regulatory compliance while gaining a competitive edge in trust, quality, and reliability.
SI-084 : Approaches to Developing Multiple Imputation ADaM Datasets
Kang Xie, AbbVie
Tuesday, 11:30 AM – 11:50 AM, Location: Indigo 206
Multiple Imputation (MI) is a statistical technique used to handle missing data by generating multiple complete datasets. These datasets are created by imputing missing values based on observed data, and each dataset is analyzed separately. The results from these analyses are then combined to provide a final inference. This paper describes two examples of multiple imputation ADaM datasets generated by incorporating the imputed values into the ADaM dataset structure, maintaining the integrity of the analysis-ready format. Each example corresponds to a distinct multiple imputation method. The ADaM datasets include: 1) variables used in the imputation process and variables to identify the multiple imputation method, ensuring clear traceability, and 2) concatenated imputed values derived from the multiple imputation process. Furthermore, this paper introduces the use of the PARQUAL and PARQTYP variables in the multiple imputation ADaM datasets. It illustrates how these variables apply to the analysis of multiple imputation, supporting decision-making regarding the standard criteria for PARQUAL and PARQTYP in ADaM.
SI-112 : Decoding the Role of Statistical Programming – A Decade of Keytruda Submissions and Approvals
Mary Varughese, Merck & Co., Inc.
Hong Qi, Merck & Co., Inc.
Monday, 8:30 AM – 8:50 AM, Location: Indigo 206
The contribution of statistical programming (SP) to clinical development and subsequent regulatory submission of biopharmaceutical products is essential. Keytruda (pembrolizumab), a human PD-1 (programmed death receptor-1)-blocking antibody, was first approved by the US Food and Drug Administration (FDA) in September 2014. Since then, over 40 indications have received approval from regulatory agencies worldwide. Each of these approvals involves quite a complex planning and preparation process, during which SP acts as a catalyst to ensure the accuracy of all submitted documents and the efficiency of the process and responses to regulatory inquiries. This paper is intended to provide insights into the pivotal role of SP in the development and regulatory submission of Keytruda. By reviewing the extensive engagement of SP in this process, this paper will first discuss the challenges in the preparation of regulatory submissions and in responses to regulatory inquiries, followed by the strategies to address these challenges. The review will focus on the impact of analysis and reporting (A&R) SP on the submission process and the successful approval of Keytruda for its various indications, including those granted through expedited pathways.
SI-159 : Navigating the transition of legacy processes for SDTM creation
Jyothi Ketavarapu, Consultant, AbbVie
Tuesday, 11:00 AM – 11:20 AM, Location: Indigo 206
Transitioning from legacy systems and processes into clinical trial data management is a critical yet complex undertaking. With industry leaning towards open-source technologies, product-based solutions, and targeted automation, organizations face the challenge of selecting or building scalable and compliant systems. This paper explores key considerations for transforming Electronic Data Capture (EDC) data into Submission Data Tabulation Model (SDTM) datasets, emphasizing the importance of balancing innovation with operational efficiency and regulatory compliance. A successful transition begins with a thorough understanding of the end-to-end process, from data collection to SDTM creation. Pain points, such as delays caused by a lack of standardization, redundant steps, or disconnected tools, must be addressed. Solutions range from point automations to complete system overhauls, with a focus on ensuring interoperability, data governance, and quality. Standardized frameworks like CDASH and CDISC principles enhance traceability and regulatory submission readiness. Key considerations include regulatory compliance with ICH E6 and FDA 21 CFR Part 11, data governance through ALCOA+ principles, scalability for evolving trial designs, and operational efficiency via automation. Organizations must also evaluate security, risk management, and the potential of AI-driven solutions for query generation and dataset creation. Metrics and KPIs offer insights into system performance and areas for improvement, while industry benchmarks provide a roadmap for innovation. By addressing these elements, organizations can modernize their processes, optimize resources, and achieve streamlined, high-quality clinical trial data management.
SI-180 : SDTM Transformation through Artificial Intelligence (AI) and Human in the Loop (HITL): Lessons Learnt from Abbvie Case Study
Aman Thukral, Abbvie
Sanjay Bhardwaj, Abbvie
Tuesday, 10:30 AM – 10:50 AM, Location: Indigo 206
The Study Data Tabulation Model (SDTM) is a standard for organizing and formatting clinical trial data. Submitting clinical datasets in SDTM is a regulatory mandate in many jurisdictions, including the United States and the European Union, for drug approval. SDTM helps to ensure the quality and accuracy of data and the consistency of the data across multiple studies. It also helps to improve the efficiency of data review and analysis and the speed of the regulatory review process. The traditional approach to SDTM transformation is time-consuming and error-prone. AI and Human in the Loop (HITL) approaches are emerging as promising solutions to address these challenges. These approaches can automate many tasks, while humans can provide oversight and insights. Using high-quality training data and monitoring and evaluating the performance of AI SDTM models are essential. In short, AI and humans can work together to improve the accuracy and efficiency of SDTM transformation.
SI-206 : Build vs. Buy: Strategic Considerations for Implementing AI Solutions in Pharma and Biotech Companies
Rajesh Hagalwadi, MaxisIT Inc
Monday, 2:30 PM – 3:20 PM, Location: Indigo 206
The rapid expansion of the AI market, projected to reach $407 billion by 2027, presents both opportunities and complexities for pharma and biotech companies. A critical decision these enterprises face is whether to build custom AI solutions or buy pre-existing ones. This paper explores the strategic considerations involved in the build vs. buy dilemma, emphasizing the impact on speed, cost, and effectiveness of AI implementation. It discusses the advantages of buying pre-built solutions, such as immediate value, compliance with regulatory demands, and vendor support. Conversely, it highlights the benefits of building custom solutions, including tailored functionality and competitive advantage in strategic applications like drug discovery. The paper also addresses the importance of data readiness, business objectives, and the need for a dedicated team and infrastructure for successful AI deployment. By examining case studies and industry recommendations, this paper aims to provide a comprehensive guide for pharma and biotech companies to make informed decisions that align with their goals and resources. The intended audience includes professionals with a background in AI, data science, and strategic planning within the pharmaceutical and biotechnology sectors.
SI-224 : Elevating Clinical Research: Strategic Implementation of CDASH and SDTM Standards
Hayden Patel, DaiichiSankyo
Monday, 4:30 PM – 4:50 PM, Location: Indigo 206
In the ever-evolving landscape of pharmaceutical research and clinical trials, harmonizing data standards is crucial for efficient and reliable data management. This paper presents a comprehensive strategy for implementing CDASH and SDTM to standardize clinical data management. Clinical research involves extensive data collection, necessitating high data quality, traceability, reusability, and cost-effectiveness. CDASH and SDTM standards serve as a common language for data exchange and reporting, facilitating seamless communication among stakeholders. Effective governance is ensured through the Governance Team, consisting of members from key functional areas responsible for CDASH and SDTM compliance. The three-phase solution includes standardizing Case Report Forms, with a focus on Safety CRFs and Therapeutic Area-specific CRFs, using CDASH and associated metadata. SDTM standardization involves standard CRFs and metadata, along with the development of CDISC SDTM standards, encompassing global, compound, and study-specific SDTM standards while incorporating external data standards. A standardized approach for generating Global SDTM Specification Mapping Files enhances data consistency, and change management and approval processes ensure documentation and traceability. Before initiating the project, a thorough analysis of past studies and metadata informs the creation of standardized templates and guidelines. Effective data standards management remains at the core of this initiative, facilitated by automation tools such as SAS, R, and Python, ensuring adherence to standard approaches and simplifying updates, reviews, and compliance checks. The benefits include improved data quality, traceability, reusability, and cost savings. This proposed strategy champions data standardization, empowering pharmaceutical organizations to address modern research challenges with enhanced efficiency and data integrity.
SI-225 : Stop Making More Physical Copies of Your Data: A Modern Approach to Traceability and Fidelity
Anthony Chow, CDISC
Monday, 9:00 AM – 9:20 AM, Location: Indigo 206
The pharma industry faces significant challenges with data traceability and fidelity due to the pervasive creation of physical copies of data and derivatives. This practice diminishes data quality, increases regulatory risks, and creates inefficiencies in compliance and governance processes. Our approach proposes a paradigm shift: treating all datasets as traceable views derived from a single source of truth. By leveraging modern technologies such as data lakes, event streams, and unified analytics platforms, organizations can eliminate unnecessary redundancies, enhance traceability, and improve data fidelity. We highlight how tools like dbt and Apache Spark enable end-to-end logging of transformation logic, ensuring compliance and reproducibility. This framework offers key benefits, including eliminating data silos, enabling scalable automation, and fostering collaboration through a unified data platform. Additionally, it provides future-proofing by integrating real-time and historical insights while reducing labor-intensive manual processes. Our presentation will showcase practical implementations of these strategies, demonstrating how they streamline data workflows, enhance regulatory compliance, and ultimately empower organizations to make data-driven decisions efficiently.
SI-253 : Transitioning External Clinical Studies to Internal: A Framework for Knowledge Transfer and Operational Excellence
Chunqiu Xia, Merck & Co., Inc.
Hong Zhang, Merck & Co
Xiaohui Wang, Merck & Co., Inc.
Monday, 2:00 PM – 2:20 PM, Location: Indigo 206
This paper presents a framework for transitioning ongoing clinical studies acquired from external pharmaceutical companies into internal statistical programming operations. The primary challenge lies in ensuring regulatory compliance with the CDISC (Clinical Data Interchange Standards Consortium) standards, a critical requirement for data integrity and regulatory submissions. To address this, the framework includes extensive efforts to align external data structures to internal formats, standardize deliverables such as Tables, Listings, and Figures (TLFs), and conduct rigorous compliance checks. Additional efforts involve creating centralized knowledge repositories, re-mapping external datasets, and tailoring statistical programming to internal requirements. Operational tools, such as delivery trackers and structured meetings, further enhance the efficiency and coordination of the transition process. By focusing on regulatory compliance while implementing scalable methodologies, this framework ensures a seamless integration of external studies into internal workflows, supporting high-quality and compliant operations.
SI-292 : Integration Contemplation: Considerations for a Successful ISS/ISE from Planning to Execution
Jennifer McGrogan, Biogen
Mario Widel, IQVIA
Monday, 1:30 PM – 1:50 PM, Location: Indigo 206
During the clinical development of an investigational drug, there are many potential reasons for ceasing advancement of the product. However, when the drug is successful, part of the submission process for approval by regulatory agencies, like the FDA, will require integrated summaries of safety and efficacy (ISS/ISE). Even if every study participating in the integration has achieved its goals to perfection, the studies are ultimately not designed for the purpose of integration, and the pooling will present challenges. Thus, for proactive, and perhaps optimistic, study planning based on the assumption that the drug will be successful, it would be prudent to plan ahead for an ISS/ISE submission. Programmers tasked with the integration of various studies often find that the studies are not immediately suitable for integration, which presents various technical challenges. In this paper, we will include multiple references to technical guidance documents that will enable solutions for anticipated problems programmers may encounter. Programming managers may be further challenged with determining how to mitigate some of the programmers' technical issues and minimize the time required to complete the work. We will also present different strategies to streamline the integration process by anticipating problems and preventing them when feasible.
SI-294 : A Game Changer for Efficient SAS Programming using ChatGPT
Jyoti (Jo) Agarwal, Gilead Sciences
Monday, 10:30 AM – 10:50 AM, Location: Indigo 206
This paper explores the transformative potential of integrating ChatGPT, a cutting-edge natural language processing model developed by OpenAI, into SAS programming within the pharmaceutical industry. Leveraging ChatGPT’s capabilities can significantly enhance productivity by assisting with code generation, debugging, and workflow optimization. By demonstrating practical applications and techniques, this paper aims to illustrate how ChatGPT can reduce programmer frustration, save time and resources, and ultimately empower programmers with innovative tools to tackle common challenges in statistical programming environments. The pharmaceutical industry is increasingly embracing advanced technologies to streamline operations and boost efficiency. One such promising technology is ChatGPT, a generative AI model created by OpenAI, which has shown substantial potential in various domains, including statistical programming, a critical component for tasks such as clinical trial data analysis, reporting, and regulatory submissions. This paper investigates how ChatGPT can be effectively integrated into SAS programming workflows to enhance productivity and alleviate common frustrations faced by programmers. SAS programming serves as a cornerstone for data analysis in numerous sectors, including pharmaceuticals, finance, and healthcare. However, SAS programmers often encounter challenges such as complex syntax, debugging difficulties, and the necessity for efficient code documentation. Through practical examples and a discussion of potential pitfalls, this paper provides a comprehensive guide for SAS programmers looking to leverage ChatGPT to streamline their coding practices and improve overall workflow efficiency.
SI-342 : Comparing SQL and Graph Database Query Methods for Answering Clinical Trial Questions with LLM-Powered Pipelines
Jaime Yan, Merck
Tuesday, 10:00 AM – 10:20 AM, Location: Indigo 206
This paper compares two advanced methods for querying clinical trial data, specifically ADaM datasets, using different database technologies: SQL and Neo4j graph databases. Both methods employ a query pipeline built with LangChain and Large Language Models (LLMs) to translate natural language questions into executable queries, enabling users to answer clinical trial-related questions without requiring specialized query skills. In the SQL-based approach, we created a reusable relational database schema to store ADaM datasets, allowing the pipeline to generate SQL queries for data retrieval. In the Neo4j-based approach, a reusable graph schema was developed to represent ADaM datasets within a graph database, where the pipeline generates Cypher queries following a graph similarity search to identify relevant schema elements. This study evaluates both methods in terms of accuracy and flexibility, particularly for handling ad hoc queries. By comparing these two approaches, we provide valuable insights into their strengths and limitations, helping users choose the most effective solution for answering clinical trial-related questions.
SI-359 : An Introduction to the Role of Statistical Programming in Medical Affairs
Nagadip Rao, Alnylam Pharmaceuticals, Inc.
Monday, 11:00 AM – 11:20 AM, Location: Indigo 206
Medical Affairs (MA) is a specialized division within research-based pharmaceutical companies, dedicated to scientific communication, education, and ensuring the safe and effective use of products. MA plays a pivotal role in supporting the commercialization process and overseeing post-market activities. The MA team comprises interdisciplinary professionals, with statistics and statistical programming teams being integral to its operations. This paper provides an overview of MA's functionality within the industry, highlighting the role of statistical programming. Additionally, it discusses the nature of deliverables and the challenges faced, and presents a real-life example of how statistical programmers support MA in achieving its objectives.
SI-380 : Towards an Integrated Submission-Ready Data Pipeline: Unifying Compliance, Automation, and Open-Source Innovation
Shivani Gupta, Clymb Clinical
Bhavin Busa, Clymb Clinical
Monday, 11:30 AM – 11:50 AM, Location: Indigo 206
Can we develop a fully integrated solution for regulatory submissions, one that aligns CDISC compliance checks (CDISC Open Rules Engine – CORE), regulatory guidance (e.g., Study Data Technical Conformance Guide from FDA and PMDA), CDISC Metadata Submission Guidelines (define.xml, aCRF), eCTD validation, and other key requirements into a seamless, automated workflow? Regulatory submissions are becoming increasingly complex, with stringent expectations for automation, compliance, and efficiency. We must ensure alignment with evolving regulatory standards while reducing manual effort and costly rework. At Clymb Clinical, we envision a comprehensive solution that integrates all submission components into a unified framework, leveraging open-source initiatives. By embedding automated validation, QC checks, and submission packaging directly into the dataset development lifecycle, we can achieve: regulatory-aligned, submission-ready data with built-in quality control; automated compliance with CDISC and regulatory expectations; and efficiency gains through reduced manual intervention and accelerated review cycles. This paper will explore and discuss how a collaborative, scalable approach can streamline submission processes, enhance data quality, and reduce regulatory bottlenecks. Organizations such as the CDISC Open-Source Alliance (COSA), sponsors, and vendors must work together to drive this initiative forward. This is not just a vision; it is a call to action for the industry to come together and build the future of submission automation.
Submission Standards
SS-070 : Identification of Domains Containing Screen Failure Participants in SDTMs and ADaMs for Reviewer’s Guides
Sumanjali Mangalarapu, Merck
Tuesday, 3:00 PM – 3:10 PM, Location: Indigo 202
This paper addresses the challenge of efficiently identifying screen failure participants within submission datasets, as outlined in the clinical Study Data Reviewer's Guide (cSDRG) and the Analysis Data Reviewer's Guide (ADRG). Statisticians and programmers often face time-consuming manual reviews of numerous SDTM and ADaM domains to pinpoint screen failure participants. Pinnacle 21, the popular software that performs compliance checks, cannot trace screen failures across all SDTM/ADaM datasets. This paper presents a programmatic solution that automates the detection of screen failure data, significantly reducing the workload for data analysts while ensuring the reliability of results. This approach will benefit statistical programming departments in both clinical research organizations and the pharmaceutical industry. The macro demonstrates how automation enhances data management in clinical trials, improving the overall quality of submissions and ensuring compliance with regulatory requirements.
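As an illustration of the kind of automation described (not the paper's macro), the hedged sketch below scans a SAS library for datasets that contain screen-failure participants, assuming screen failures can be identified in DM via ARMCD='SCRNFAIL'; the library reference, macro name, and identification rule are all assumptions.

proc sql noprint;
  create table scrnfail as                            /* screen-failure participants from DM */
  select distinct usubjid from sdtm.dm where armcd = 'SCRNFAIL';

  select distinct memname into :doms separated by ' ' /* every SDTM dataset with a USUBJID variable */
  from dictionary.columns
  where libname = 'SDTM' and upcase(name) = 'USUBJID';
quit;

%macro find_sf;
  %do i = 1 %to %sysfunc(countw(&doms));
    %let dom = %scan(&doms, &i);
    proc sql noprint;
      select count(distinct a.usubjid) into :nsf trimmed
      from sdtm.&dom as a inner join scrnfail as b
        on a.usubjid = b.usubjid;
    quit;
    %if &nsf > 0 %then %put NOTE: &dom contains &nsf screen failure participant(s).;
  %end;
%mend find_sf;
%find_sf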
SS-087 : A collaborative and agile approach for end to end Standards Governance and Release
Kairav Tarmaster, Sycamore Informatics
Pratiksha Wani, Sycamore Informatics
Monday, 3:00 PM – 3:20 PM, Location: Indigo 202
The Data Standards team manages and governs individual data models such as Data Collection, SDTM, and ADaM for an organization. When an organization sees each data model as a separate business function, the result is disconnected and unharmonized standards. Like a symphony, the Data Standards team should be a single unit bringing together different experiences and expertise with a common goal: to release standards that are connected, consistent, and error-free to enable faster study builds with improved quality. While defining the process and procedures, it is important to understand the various types of requests received by the standards team and their scope, define the roles and responsibilities of its people, and identify the touchpoints and requirements from a Metadata Repository solution for managing, governing, and releasing standards. This presentation aims to articulate the various aspects and best practices for governing standards in a collaborative environment using agile methodology.
SS-127 : PDFs Done Right: The Statistical Programmer’s Guide to Flawless Regulatory Submissions
Srivathsa Ravikiran, Agios Pharmaceuticals
Sri Raghunadh Kakani, Agios Pharmaceuticals
Monday, 2:30 PM – 2:50 PM, Location: Indigo 202
In the pharmaceutical industry, regulatory submissions require meticulous preparation to ensure compliance with stringent guidelines. A key component of this process involves preparing Case Report Tabulation (CRT) packages, which include critical PDF documents such as annotated case report forms (aCRF), define.pdf, the study data reviewer’s guide (sdrg.pdf), and the analysis data reviewer’s guide (adrg.pdf). These documents play a pivotal role in Module-5 submissions, where consistency, quality, and technical accuracy are imperative for smooth health authority reviews. This paper outlines a standardized framework for optimizing PDF deliverables used in CRT packages. Specifically, we focus on enhancing the quality and compliance of the PDFs generated within programming for submission. Our proposed approach integrates the use of the Lorenz eValidator to evaluate and validate PDFs according to the technical attributes mandated by FDA guidelines and other regulatory agencies. Additionally, Adobe Acrobat Pro is employed to resolve any identified issues, ensuring that the PDFs meet submission standards and are ready for review. This method minimizes last-minute adjustments, enhances efficiency, and mitigates the risk of non-compliance. To support statistical programmers, we offer a comprehensive checklist and toolkit based on industry best practices. This resource aims to streamline the preparation process, reducing technical issues in the final submission package. Through real-world case studies, we demonstrate the importance of adhering to standardized PDF preparation techniques and how these practices facilitate seamless reviews by health authorities. By adopting this framework, organizations can achieve greater efficiency, regulatory alignment, and confidence in their CRT submissions.
SS-135 : Evaluation of the Process to Create CLINSITE define.xml: Macro Approach vs. ADCLIN Spec
Yizhuo Zhong, Merck
Yunyi Jiang, Merck & Co., Inc.
Christine Teng, Merck
Tuesday, 5:00 PM – 5:20 PM, Location: Indigo 202
The Bioresearch Monitoring (BIMO) program is an initiative established by the U.S. Food and Drug Administration (FDA). This program conducts inspections and audits of clinical trial sites, sponsors, and contract research organizations (CROs) to ensure compliance with Good Clinical Practice (GCP) regulations. Within the BIMO submission package, Clinical Site Data (CLINSITE) pertains to the information gathered at specific locations where clinical trials are carried out. The define.xml file for the CLINSITE dataset is a crucial deliverable that describes the key information collected from each site. To streamline the process of preparing the BIMO package, this paper will assess two proposed approaches using P21 Enterprise (P21E) to generate the CLINSITE define.xml file. The first approach involves employing a SAS macro to transform sponsor-specific CLINSITE specifications into the latest P21 export format. This macro will include specific input parameters. The second approach entails developing the CLINSITE specification using the ADaM specification format (known as ADCLIN), which can be directly imported into P21E. The ADCLIN specification serves as input for both the CLINSITE data creation program and P21E for end-to-end traceability. We will evaluate the advantages and disadvantages of each approach.
SS-146 : Handling Health Regulatory Information Requests: Best Practices and Strategies
Himanshu Patel, Merck & Co.
Chintan Pandya, Merck & Co.
Tuesday, 1:30 PM – 1:50 PM, Location: Indigo 202
Addressing Information Requests (IRs) from health regulatory agencies such as the FDA, EMA, MHRA, and PMDA is critical during clinical trials. These agencies send IRs for clarification, additional information, or verification of the data collected during the trial. For statistical programmers, responding to these IRs requires a detailed, step-by-step process, including information collection, resource and timeline planning, validation activities, maintaining documentation, and tracking all requests. This paper explores the statistical programming team's roles in handling IRs, examining best practices and strategies for ensuring accuracy, efficiency, and compliance throughout the IR process, from information gathering to final submission. It helps programmers understand the technical and procedural aspects of IRs and provides insight into navigating the best practices and strategies to enhance productivity and accuracy.
SS-165 : A Little Bit of This and That: Use cases, implementation, and documentation when using multiple CDISC standards, CTs, and regulatory guidances in an SDTM study
Charity Quick, Emergent BioSolutions, Inc.
Monday, 4:00 PM – 4:20 PM, Location: Indigo 202
In the past, organizations typically selected and adhered to a single version of the SDTM Implementation Guide and CDISC CT when creating tabulation data for a study. However, the landscape of clinical data standards has evolved significantly with the introduction of Therapeutic Area User Guides (TAUGs), QRS supplements, and regulatory recommendations like “Submitting Patient-Reported Outcome Data in Cancer Clinical Trials”. Additionally, frequent updates to the FDA “Study Data Technical Conformance Guide” may contain agency recommendations that have not found their way into CDISC standards that are updated less frequently. This evolution has led to the increasingly common and encouraged practice of implementing multiple CDISC standards and regulatory guidances within a single study. The adoption of Define-XML 2.1 has provided a robust framework for documenting the use of multiple Implementation Guides and standards in the submission required define.xml file. However, mixing standards, versions, supplements, and regulatory documentation on a trial presents unique challenges for ensuring regulatory compliance, data quality, and proper documentation. This paper explores practical use cases and provides solutions for studies using a combination of Implementation Guides, supplements, and regulatory guidance to create the SDTM data and submission documentation. We’ll discuss best practices for creating high-quality data packages that clearly document which standards and guidances were implemented while maintaining data integrity, ensuring traceability, and preventing inconsistencies where information and metadata exist in multiple locations.
SS-254 : Key guidelines, Tricks and Experiences for PMDA and comparison with FDA and CDE submission
Ramesh Potluri, Servier Pharmaceutical
Tuesday, 2:00 PM – 2:20 PM, Location: Indigo 202
Submitting documents to regulatory bodies like the Pharmaceuticals and Medical Devices Agency (PMDA) is a complex task that requires meticulous preparation and a comprehensive understanding of regulatory guidelines. This paper outlines essential guidelines, highlights the differences between submissions to the PMDA, FDA (U.S. Food and Drug Administration), and CDE (Center for Drug Evaluation), and provides practical tips and experiences to assist programmers and regulatory teams in efficiently navigating the submission process, including preparation for PMDA inspections.
SS-258 : The Show Must Go On: Best Practices for Submitting SDTM Data for Ongoing Studies
Kristin Kelly, Pinnacle 21 by Certara
Tuesday, 2:30 PM – 2:50 PM, Location: Indigo 202
Though the CDISC SDTM Implementation Guide provides advice on how to prepare SDTM datasets for completed studies, there is little guidance on what to do when the study is ongoing, leading to varied implementation practices across the industry. At times, it may be difficult for a regulatory reviewer to readily determine that a study is still in progress without looking in the Clinical Study Data Reviewer’s Guide (cSDRG). The recent addition of the ONGOSIND (Ongoing Study Indicator) parameter in the FDA Study Data Technical Conformance Guide (sdTCG) allows sponsors to clearly specify within the data whether a study is ongoing. In this paper, some considerations for preparing domains such as Demographics (DM), Disposition (DS), and Trial Summary (TS) for an ongoing study as well as strategies to ensure data transparency across the SDTM submission package will be discussed.
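As a hedged illustration only (consult the current sdTCG for the exact expected representation), an ongoing-study indicator is commonly conveyed as a Trial Summary parameter; the sketch below builds one such record, and the study identifier and variable lengths are illustrative assumptions.

data ts_ongoing;
  length studyid $ 20 domain $ 2 tsseq 8 tsparmcd $ 8 tsparm $ 40 tsval $ 20;
  studyid  = 'STUDY01';                 /* illustrative study identifier                  */
  domain   = 'TS';
  tsseq    = 1;
  tsparmcd = 'ONGOSIND';                /* Ongoing Study Indicator parameter code         */
  tsparm   = 'Ongoing Study Indicator';
  tsval    = 'Y';                       /* study is still ongoing at the data cutoff      */
run;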
SS-269 : Exploring the Upcoming Integrated cSDRG!
Srinivas Kovvuri, ADC Therapeutics USA
Christine McNichol, Fortrea Inc.
Randi McFarland, Ephicacy Consulting Group, Inc.
Kiran Kundarapu, Eli Lilly and Company
Satheesh Avvaru, Alexion AstraZeneca Rare Disease
Tuesday, 3:15 PM – 3:25 PM, Location: Indigo 202
The Integrated Summary of Safety (ISS) and Integrated Summary of Efficacy (ISE) are essential for the approval of New Drug Applications (NDAs) and Biologics License Applications (BLAs). Sponsors must submit data supporting these integrated analyses and a Reviewer’s Guide. There are multiple strategies to create integrated analysis datasets. One strategy is first to develop integrated Study Data Tabulation Model (CDISC SDTM) datasets to use as the source to build integrated analysis datasets. To facilitate the documentation of integrated SDTM, the PHUSE Optimizing the Use of Data Standards Working Group has developed a template for the integrated clinical Study Data Reviewer’s Guide (icSDRG). An example document has been created to clarify the use of this template to meet regulatory agencies’ (RA) recommendations and ensure consistent data submissions across the industry. This paper illustrates the example of the icSDRG, discusses differences from a study-level cSDRG, and offers practical guidance on preparing integrated SDTM documentation for submission.
SS-276 : Updating Define-XML packages: Tips and A Comprehensive Checklist
Qiong Wei, BioPier LLC (a Veramed Company)
Ji Qi, BioPier LLC (a Veramed Company)
Lixin Gao, BioPier LLC (a Veramed Company)
Monday, 2:00 PM – 2:20 PM, Location: Indigo 202
The Define-XML package, which provides a truthful representation of clinical trial data, is a critical component of clinical submissions to regulatory agencies such as the FDA and PMDA, in accordance with the electronic Common Technical Document (eCTD). However, as a study progresses, factors like protocol amendments, dataset changes, and new submission requirements often necessitate updates to an existing Define-XML package. This paper presents a comprehensive checklist designed to guide clinical submission professionals in updating Define-XML packages to ensure submission readiness and compliance with regulatory requirements. The checklist focuses on key tasks, including updating metadata to accurately reflect changes in protocol, statistical analysis plans (SAP), datasets or controlled terminology; maintaining consistency between annotations and metadata; verifying alignment with the appropriate Define-XML standards (v2.0 or v2.1) and regulatory agency-specific submission requirements (FDA or PMDA). Additional steps include ensuring accurate dataset-documentation links and utilizing automated validation tools such as Pinnacle 21 to enhance submission readiness. By following this checklist, organizations can streamline the update process, reduce errors, and produce high-quality Define-XML packages that meet regulatory standards for successful submission.
SS-306 : ADaM and TFLs for Drug-induced Liver Injury (DILI) Analysis
Song Liu, Novo Nordisk
Rambabu Sura, Novo Nordisk
Tuesday, 4:30 PM – 4:50 PM, Location: Indigo 202
In August 2022, FDA CDER released guidance on standard safety tables. Drug-induced liver injury (DILI) is part of the analysis supporting a new drug application, and sponsors often use Hy's Law criteria to create tables and figures for this analysis. The paper will explain the Hy's Law criteria and present how to set up ADaM data from SDTM.LB. The maximum post-baseline result should be used to create not only a Ratio to Analysis Range Upper Limit flag for subjects whose baseline results are within the normal range, but also a Ratio to Baseline Value flag for subjects whose baseline is above the normal range. Sponsors may use the first ratio for the analysis, or both ratios when subjects are enrolled in the study with abnormal ALT/AST and total bilirubin results. An ADaM data specification, illustrated with blinded dummy data, will be presented to show how Hy's Law can identify potential DILI cases. A potential Hy's Law case is identified when any post-baseline total bilirubin is ≥ 2 x ULN on or within 30 days after a post-baseline peak ALT or AST ≥ 3 x ULN, while ALP is < 2 x ULN. The figures show different combinations of ALT/AST and ALP versus total bilirubin, making it clear whether there are truly potential cases and, if so, what the AEs or symptoms were like in the 30 days before or after the laboratory toxicity was identified. Laboratory results help identify potential DILI, but the decision to stop the trial requires assessment by the safety team.
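To make the screening rule concrete, the sketch below flags potential Hy's Law cases from a hypothetical one-record-per-subject summary of peak post-baseline ratios to ULN; the dataset and variable names are assumptions for illustration, not the ADaM specification presented in the paper.

data potential_hyslaw;
  set lb_peaks;                                    /* hypothetical peak-ratio summary        */
  length hylfl $ 1;
  days_gap = bili_dt - aminopeak_dt;               /* bilirubin date minus ALT/AST peak date */
  if max(alt_r_uln, ast_r_uln) >= 3                /* ALT or AST >= 3 x ULN                  */
     and bili_r_uln >= 2                           /* total bilirubin >= 2 x ULN             */
     and 0 <= days_gap <= 30                       /* on or within 30 days after the peak    */
     and alp_r_uln < 2 then hylfl = 'Y';           /* ALP < 2 x ULN                          */
  else hylfl = 'N';
run;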
SS-314 : Checking Outside the Box: A Framework for Submission Success
Julie Ann Hood, Pinnacle 21 by Certara
Seiko Yamazaki, Pinnacle 21 by Certara
Monday, 1:30 PM – 1:50 PM, Location: Indigo 202
Preparing SDTM and ADaM data packages for submissions can be a daunting task. With all the guidance documentation and checks for files and data needed for submission, it’s easy to get overwhelmed. This presentation will highlight essential documents to reference for submissions to both FDA and PMDA. A focus will be placed on key review items for traditionally non-automated checks, as well as critical cross-checks spanning both study data and submission documents to serve as a foundational resource. Attendees will be able to leverage this framework to create a more comprehensive checklist that will enhance their organization’s own submission process.
SS-333 : Leveraging previous study data for Extension studies: Structuring Subject and Participant Level Analysis datasets using CRF data
Priyanka Thumuganti, GlaxoSmithKline plc
Monday, 5:00 PM – 5:10 PM, Location: Indigo 202
Purpose: This paper explains how data from extension studies can be integrated with previous study data to strengthen the evidence required for regulatory approval. Description: The primary reason for conducting an extension study is to analyze the long-term impact of an investigational drug, using the previous study data as a reference. Often, participants from one or more previous studies are enrolled onto the extension study. In most cases, there is a need to derive baseline or analyze safety/efficacy data from the continuous start of treatment. One of the critical challenges faced while integrating data is the requirement to effectively link subject data between the previous study and the extension study. For this purpose, the previous study's subject identifier should be retained in the subject-level (ADSL) and participant-level (ADDQ) analysis datasets. This paper explains what subject characteristics data is collected in Case Report Forms (CRFs), how it is collected, and how this CRF data is structured at the ADaM level. Conclusion: The paper concludes that integration of subject-level data plays a crucial role in evaluating the safety and efficacy of an investigational drug in a long-term extension study. A structured approach to maintaining and linking data across studies significantly contributes to the analysis needed for regulatory submissions.
SS-344 : Ensuring Data Integrity: Techniques for Validating SDTM Datasets in Clinical Research
Amy Welsh, Catalyst Clinical Research
Tuesday, 4:00 PM – 4:20 PM, Location: Indigo 202
In clinical research, programmers often encounter studies where the Study Data Tabulation Model (SDTM) datasets are already complete. In such scenarios, the newly assigned programmer is tasked with ensuring the reliability and accuracy of these datasets. This paper explores various techniques adapted to help the programmer confirm the integrity of SDTM datasets. The methods discussed will: 1) Provide an overview of the study data. 2) Identify and address unexpected values within the datasets. 3) Detect missing values or variables. 4) Facilitate the rapid tracing of raw data through the SDTM process. 5) Ensure that all records from the raw data are accurately represented in the SDTM datasets. By employing these techniques, the newly assigned programmer can confidently ensure that the data they are now responsible for is reliable.
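The sketch below illustrates two checks of the kind listed above (surfacing unexpected or missing coded values, and confirming that raw records carried through to SDTM), assuming a generic SDTM library and a hypothetical RAW library; it is not the paper's code, and the matching keys are assumptions.

proc freq data=sdtm.ae;
  tables aeser aesev aeout / missing;    /* surface unexpected or missing coded values */
run;

proc sql;
  /* count raw AE records that did not carry through to SDTM.AE */
  select count(*) as n_raw_only
  from raw.ae as r
  where not exists
    (select 1 from sdtm.ae as s
     where s.usubjid = r.usubjid and upcase(s.aeterm) = upcase(r.aeterm));
quit;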
SS-348 : Common Issues in BIMO Clinical Site Dataset Packages
Michael Beers, Pinnacle 21 by Certara
Monday, 4:30 PM – 4:50 PM, Location: Indigo 202
When preparing a submission package for the Bioresearch Monitoring (BIMO) Clinical Site (CLINSITE) Dataset, it is important to ensure compliance with the FDA's specifications and guidance. It is often the case, however, that issues exist with the CLINSITE dataset and the associated documentation. This paper will review some of the most common issues seen across the industry and how they should be addressed.
e-Posters
PO-052 : ADaM implementation for Anti-drug Antibody Data
Jiannan Kang, Merck
Luke Reinbolt, Navitas Data Sciences
Monday, 10:00 AM – 10:50 AM, Location: ePoster Station 2
Anti-drug antibodies (ADA) are generated following administration of therapeutic proteins (TP) as an immunogenicity response. ADA can significantly influence drug exposure, bioavailability, safety, and efficacy, and ADA analyses are becoming more prevalent in drug development. The CDISC ADaM team has created a sub-team to investigate the implementation and standardization of the ADA analysis dataset. This poster shares the proposed ADaM implementation of the ADA analysis dataset. It covers an overview of multi-tier immunogenicity assessments, the key data points included in ADA analysis, and a proposed new structure, with highlights of the dataset metadata, analysis parameters, variable metadata, data examples, and a summary of reporting/analysis.
PO-062 : A Macro to Automate Data Visualization using SAS Graph Template Language (GTL)
Ming Yang, Keros Therapeutics
Tuesday, 10:00 AM – 10:50 AM, Location: ePoster Station 2
When analyzing clinical trial data, we often engage in data exploration to identify trends, detect outliers, and compare various subgroups. These exploratory analyses must be conducted quickly while maintaining quality and relevance. This process may involve deriving new variables and presenting the data in a clear and meaningful way. The paper introduces a powerful SAS macro that facilitates data exploration in the pharmaceutical industry by supporting three commonly used figure types: line plots, spaghetti plots, and box plots. This macro includes input parameters that control figure types, formats, subgroups, and various other customizations. While a few input parameters are required to produce results, most have default values. By simply replacing the parameters, the macro can generate multiple figures with the desired formats and layouts. It supports any type of analysis data, such as laboratory results, vital signs, or questionnaire responses. With this tool, programmers can minimize the time spent adjusting visualization formats and focus instead on data analysis and interpretation. The macro is designed to require minimal knowledge of GTL, making it easy to use and customize further based on user needs. Additionally, it functions independently across different SAS platforms.
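As context for readers new to GTL, the sketch below shows the kind of template such a macro can wrap for a spaghetti plot; it is not the macro described in the paper, and the ADLB-style dataset, template name, and variable names (USUBJID, AVISITN, AVAL, PARAM) are assumptions.

proc template;
  define statgraph spaghetti;
    begingraph;
      entrytitle "Lab Result Over Time by Subject";
      layout overlay / xaxisopts=(label="Analysis Visit")
                       yaxisopts=(label="Analysis Value");
        seriesplot x=avisitn y=aval / group=usubjid
                   lineattrs=(thickness=1) display=(line);   /* one line per subject */
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=adlb template=spaghetti;
  where param = "Alanine Aminotransferase (U/L)";            /* illustrative parameter */
run;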
PO-095 : Understanding HIV: At-Risk Populations, Treatment, Prevention and Standardized Data Representation in HIV Studies
Jyoti (Jo) Agarwal, Gilead Sciences
Tuesday, 10:00 AM – 10:50 AM, Location: ePoster Station 5
Human Immunodeficiency Virus (HIV) and its progression to Acquired Immunodeficiency Syndrome (AIDS) pose significant global health challenges. HIV is classified into two types, HIV-1 and HIV-2, with HIV-1 being the most common and virulent. The virus weakens the immune system by reducing CD4+ T-cell counts, increasing susceptibility to infections. Populations at high risk include sex workers, transgender individuals, and infants born to HIV-positive mothers. The standard of care for HIV treatment is Antiretroviral Therapy (ART), while Pre-Exposure Prophylaxis (PrEP) is used for prevention in high-risk individuals. Accurate diagnosis and management require extensive data collection, including immunogenicity tests, microbiology reports, and clinical event tracking. This paper provides a foundational understanding of HIV and AIDS, explores the challenges faced by at-risk populations, discusses existing treatment and prevention strategies, and highlights the role of structured data representation in advancing HIV research. A significant focus is given to CDISC SDTM standards and their role in clinical trial data management, facilitating interoperability, and ensuring regulatory compliance.
PO-133 : Checking SDTM Datasets from Biostatistical Perspective
Inka Leprince, PharmaStat, LLC
Elizabeth Li, PharmaStat, LLC
Monday, 10:00 AM – 10:50 AM, Location: ePoster Station 3
The adoption of CDISC SDTM standards has significantly improved data conformance across the clinical research industry. Conformance alone, however, does not guarantee that SDTM datasets support analyses. This paper examines critical considerations for evaluating the fitness of SDTM datasets for statistical analysis, emphasizing the importance of ensuring data quality to achieve robust and reliable results. At PharmaStat, we emphasize a proactive, biostatistics-driven approach to improving source data and SDTM data quality. By aligning SDTM datasets with analytical requirements, the data analysis process is streamlined and the overall quality of clinical study outcomes is enhanced.
PO-140 : Considerations of R submission in Japan
Yuichi Nakajima, GlaxoSmithKline K.K.
Yasutaka Moriguchi, GlaxoSmithKline K.K.
Monday, 10:00 AM – 10:50 AM, Location: ePoster Station 1
R adoption and submission are key topics in the pharmaceutical programming industry, and Japan is no exception. Although SAS users still constitute a large population in Japan, interest in R has grown significantly in recent years. Industry communities, such as the Japan Pharmaceutical Manufacturers Association (JPMA) and PHUSE, have conducted surveys on R adoption and initiated communication with the Pharmaceuticals and Medical Devices Agency (PMDA), the Japanese health authority. The PMDA has shown openness to accepting R, and sponsors need to consider the best approaches for R submissions, including understanding PMDA-specific requirements (e.g., PMDA consultations, PMDA gateway submissions following folder structure guidelines, etc.). In this presentation, I will briefly introduce the NDA process with PMDA and highlight key points for preparing PMDA communications and R submission deliverables.
PO-172 : In-house Data Monitoring Committee Report Programming
Lingjiao Qi, Alnylam
Amanda Plaisted, Alnylam Pharmaceuticals
Sreedhar Bodepudi, Alnylam Pharmaceuticals
Tuesday, 10:00 AM – 10:50 AM, Location: ePoster Station 6
A Data Monitoring Committee (DMC) is an independent group of experts tasked with overseeing the safety and/or efficacy of an ongoing double-blind clinical trial. The DMC periodically reviews accumulated data in unblinded study reports and provides recommendations on the study’s conduct. While DMC report activities in pivotal trials are typically external to the sponsor, for non-registrational trials (e.g., Phase 2 trials), it is acceptable for an internal, independent group to manage DMC report activities. This internal approach can streamline processes and enhance efficiencies in both timelines and costs. This paper shares the authors’ experience in establishing the first in-house DMC, where an independent group of statisticians and programmers was formed within the organization. It discusses the processes and responsibilities established between the blinded and unblinded programming teams to maintain data integrity and ensure appropriate access for relevant stakeholders. The paper also highlights key planning activities involving data management teams, statisticians, and cross-functional teams. Additionally, it addresses the special handling required for unblinded IRT, PK, and PD files during the transfer of SDTM, ADaM, and TLF programs from the blinded to the unblinded team.
PO-208 : Our Second Brain: A Guide to Building Note System for SAS Programmer in Clinical Trial
Zeyu Li, Arrowhead Pharmaceuticals
Tuesday, 2:00 PM – 2:50 PM, Location: ePoster Station 3
Starting as a clinical trial SAS programmer can feel overwhelming, with countless details required to implement programs and analyses. On top of that, you may be asked months later by statisticians or data managers to explain data issues from past studies. With multiple studies to manage daily, it's impossible to remember every detail from each program and project. That's why having a clear and organized note-taking system is essential. A well-designed note system acts as your personal knowledge base, allowing you to quickly retrieve key information about studies, even long after they're completed. It also boosts efficiency by providing easy access to reference code, SAS shortcuts, and reusable templates. By developing and maintaining a structured note system, you can reduce stress, stay organized, and streamline your programming workflow. It's a powerful tool that enables you to navigate the complexities of clinical trial programming with greater productivity and efficiency.
PO-212 : Embracing an Extra Step as the Ultimate Shortcut: Leveraging ADDATES for Enhanced Efficiency and Traceability in Oncology Studies
Chia-Lu Lee, AstraZeneca
Wanchian Chen, AstraZeneca
Anna Chen, AstraZeneca
Tuesday, 2:00 PM – 2:50 PM, Location: ePoster Station 4
In oncology efficacy analysis, figuring out RECIST criteria can be like solving a complex puzzle. It involves integrating various data points, such as the sum of target lesion sizes, non-target lesion responses, and the appearance of new lesions. That’s where ADDATES comes in, an approach first suggested in the CDISC Prostate Cancer TAUG back in 2017, to improve traceability and transparency. This poster illustrates how ADDATES connects existing tumor result and visit response datasets, simplifying the process of deriving critical dates and supporting endpoint analyses in ADEFF and ADTTE. It provides a step-by-step guide with examples for linking ADaM datasets using ADDATES, ideal for programmers and statisticians working on oncology studies, particularly those building efficacy endpoints based on RECIST criteria from scratch. By organizing essential dates, ADDATES creates a clear timeline for each subject, making it easier to identify significant events like confirmed responses. A couple of innovative analysis flags and analysis categories are also introduced that help assess whether progression or death occurs after specific gaps or if the disease remains stable over specific time frames. ADDATES not only simplifies the programming process by reducing interdependencies among variables, but also aligns with FDA’s recommendations for submission. Overall, this poster demonstrates how ADDATES minimizes errors, enhances efficiency, and clarifies the data, making it an invaluable intermediate step for dealing with the complexities of RECIST criteria.
PO-235 : Guidance from the FDA authored “Submitting Patient-Reported Outcome Data in Cancer Clinical Trials” and recommendations for their implementation in study CDISC COA data.
Charity Quick, Emergent BioSolutions, Inc.
Monday, 2:00 PM – 2:50 PM, Location: ePoster Station 3
In November 2023, the U.S. Food & Drug Administration (FDA) released a 48-page document titled "Submitting Patient-Reported Outcome Data in Cancer Clinical Trials". This document provides technical specifications for submitting patient-reported outcome (PRO) data collected in cancer clinical trials to support oncology studies. Given the breadth and specificity of the guidance, the increasing prevalence of Clinical Outcome Assessments (COAs) in the work we do, and the direct link of the content to FDA publications such as the "Study Data Technical Conformance Guide", adoption of the standards contained in "Submitting Patient-Reported Outcome Data in Cancer Clinical Trials" should be considered regardless of the COA therapeutic area. This paper and the accompanying poster seek to provide a brief summary of these FDA recommendations as they relate to study data collection, tabulation, analysis, and submission.
PO-236 : From Tides to Transformation: Making Waves with Your Move to Viya
Morgan Halleen, SAS
Tania Morado, SAS
Monday, 10:00 AM – 10:50 AM, Location: ePoster Station 4
Transitioning from SAS 9 to SAS Viya can seem daunting, but with the right support, change becomes an opportunity for growth. This session will explore how Customer Success partners with organizations to ensure a smooth transition by creating joint success plans, offering abundant resources, and addressing common anxieties to help drive SAS Viya adoption and value realization. Join us to discover how Customer Success enables a successful move to Viya, empowering your organization every step of the way.
PO-252 : Methodology for AI-driven Outcome Prediction for Patients with Atrial Fibrillation After Transcatheter Aortic Valve Implantation (TAVI)
Felix Just, Daiichi Sankyo
Krishna Padmanabhan, Cytel
Parth Jinger, Cytel
Amanda Borrow, Daiichi Sankyo
Rüdiger Smolnik, Daiichi Sankyo
Eva-Maria Fronk, Daiichi Sankyo
Neelam Yadav, Daiichi Sankyo
Sergei Krivtcov, Daiichi Sankyo
Tuesday, 10:00 AM – 10:50 AM, Location: ePoster Station 3
The identification of risk factors for adverse clinical outcomes using traditional statistical methods (e.g., Cox regression) has provided physicians with valuable insights for treatment decision-making. However, these methods are often restricted to capturing only linear correlations within the data. AI/machine learning (ML) overcomes this limitation, offering the potential to uncover complex, non-linear relationships, thereby providing powerful new insights at the patient level. Clinicians must balance stroke and bleeding risks when prescribing anticoagulation for atrial fibrillation (AF) patients. A patient-tailored ML approach using patient-level data from the randomized controlled ENVISAGE-TAVI AF trial will inform clinicians seeking to optimize anticoagulation treatment in patients with AF who have undergone a successful TAVI procedure. We introduce an ML approach to develop predictive models for four key endpoints: ischemic stroke, major gastrointestinal bleeding, major or non-major clinically relevant bleeding, and net adverse clinical events. Our methodology involves a pipeline where 11 ML algorithms are trained, optimized, and evaluated using cross-validation. The best model for each endpoint is chosen by its f1-score and validated on a separate hold-out set. The selected models are analyzed using the SHAP (SHapley Additive exPlanations) explainable AI framework to extract insights on the predictive factors of outcomes at both the cohort and patient level. Final models showed moderate f1-scores in the range of 0.08 to 0.39, and the identified predictive factors are largely consistent with previous clinical knowledge. The applied methodology can easily be adapted to similar scenarios with minimal adjustments and represents a potential step towards patient-tailored treatment strategies.
PO-261 : Avoiding the Ouch: Mastering Time to Pain Progression Analysis
Kavitha Boinapally, Pfizer
Sai Krishna Pavan Nakirikanti, Pfizer
Hank Dennis, Pfizer
Yang Wang, Pfizer
Tuesday, 2:00 PM – 2:50 PM, Location: ePoster Station 5
In oncology studies, time to pain progression (TTPP) is an endpoint that measures the time from randomization or treatment start to when a patient experiences a significant worsening of their pain. While progression-free survival (PFS) and overall survival (OS) are the typical primary endpoints, TTPP can be selected as a secondary endpoint to assess disease progression and treatment efficacy. In this paper, we will present hypothetical study data to demonstrate how patient-reported outcome (PRO) data from the Brief Pain Inventory Short Form (BPI-SF) can be used to assess the TTPP endpoint. We describe how BPI-SF data is collected, mapped to SDTM and ADaM domains, and used to derive and analyze pain worsening. We will define worsening of pain as an increase of 2 or more points on BPI-SF Question 3 (pain at its worst in the last 24 hours) or initiation of a new opioid recorded in Question 7 (what treatment/medications are you receiving for pain) for at least two consecutive assessments. Additionally, we will discuss the challenges and solutions involved in identifying opioid use from the pain medication data collected on the PRO BPI-SF.
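A minimal sketch of the two-consecutive-assessment confirmation is shown below, assuming a hypothetical ADaM-style dataset ADPAIN with one record per subject per assessment, a change from baseline in the Q3 worst-pain score (CHG), and a new-opioid flag (NEWOPIFL); the reference point for worsening and all names are assumptions, not the paper's definitions.

proc sort data=adpain; by usubjid avisitn; run;

data pain_prog;
  set adpain;
  by usubjid;
  length ttppfl $ 1;
  retain prev_worse;
  if first.usubjid then prev_worse = 0;
  worse = (chg >= 2 or newopifl = 'Y');        /* worsening at this assessment              */
  if worse and prev_worse then ttppfl = 'Y';   /* confirmed at two consecutive assessments  */
  else ttppfl = 'N';
  prev_worse = worse;                          /* carry this assessment forward to the next */
run;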
PO-271 : Streamlining Your Workflow: Creating Portable and Automated SAS® Enterprise Guide Project Using Project Name
Chary Akmyradov, Arkansas Children’s Research Institute
Monday, 10:00 AM – 10:50 AM, Location: ePoster Station 5
In this presentation, I will demonstrate how to leverage the SAS Enterprise Guide (EG) project name and directory location to automate the definition of permanent libraries and the creation of project-related folders. The session will cover the use of an autoexec workflow, the DLCREATEDIR option, the automatic macro variable &_ClientProjectPath, and essential SAS functions such as DEQUOTE, FIND, and SUBSTR. Attendees will learn how to automate the following routine tasks: 1. Establish one or more libraries using a portion of the SAS EG project name. 2. Assign an automatically created (if not existing) library folder. 3. Create additional folders for exports, reports, and more. A key feature of this approach is that when the SAS EG project is executed from a new directory (in the case of a directory change), the library location is updated automatically, ensuring the project's portability. This session will provide practical insights and step-by-step instructions to enhance your SAS project management and workflow efficiency. This demonstration is suitable for all levels of SAS programmers.
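A minimal autoexec-style sketch of this approach is shown below; it assumes the code runs inside SAS Enterprise Guide on Windows (where &_ClientProjectPath exists and paths use backslashes), and the library and folder names are illustrative.

options dlcreatedir;                                                 /* create folders referenced by LIBNAME if missing */
%let projpath = %sysfunc(dequote(&_ClientProjectPath));              /* full path of the .egp file                       */
%let slashpos = %sysfunc(find(&projpath,%str(\),-%length(&projpath))); /* position of the last backslash                 */
%let projdir  = %substr(&projpath, 1, &slashpos);                    /* project directory, with trailing backslash       */
%let projname = %scan(%scan(&projpath,-1,%str(\)), 1, .);            /* project name without the .egp extension          */

libname projlib "&projdir.&projname._data";                          /* permanent library named after the project        */
libname exports "&projdir.&projname._exports";                       /* additional folder for exports and reports        */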
PO-274 : Automating Recurring Data Reconciliation for Serious Adverse Events Using SAS
Patrick Dowe, PROMETRIKA
Gina Hird, PROMETRIKA, LLC.
Tuesday, 2:00 PM – 2:50 PM, Location: ePoster Station 6
Sometimes while working on a project, data reconciliation is required. This task may present itself as a regular, recurring task or a one-time need. In our use case, we show a program built specifically for recurring data reconciliation of serious adverse events; however, the applications for this SAS program are much broader and could be applied to a host of situations. The saereconciliation.sas program has two major components. First, it reconciles the SAE vendor's pharmacovigilance safety database, in Excel format in this example, with a SAS dataset from the clinical database containing the adverse event data. The program does a series of cascading matches by key variables until all records have been accounted for. After the matching is complete, the differences between data points are highlighted for the clinical safety team so that they can follow up with the SAE vendor and the sites. Second, it compares the reconciled SAE/clinical data with the results from the prior run of the process in order to maintain the historical comments and notes previously made by the clinical safety team and the external vendor. The program uses a second series of cascading matches by key variables until all records have been accounted for. After matching is complete, any records whose previous discrepancies have been resolved are identified, and any other changes that have occurred between report runs are highlighted.
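To show what a single pass of a cascading match can look like (the actual saereconciliation.sas program is not reproduced here), the sketch below matches on a full key and routes non-matches to datasets that a subsequent pass would retry on fewer keys; the dataset and key names are assumptions.

proc sort data=safetydb; by usubjid aeterm aestdtc; run;   /* vendor safety database, imported from Excel */
proc sort data=ae_clin;  by usubjid aeterm aestdtc; run;   /* adverse events from the clinical database   */

data matched1 vendor_only clin_only;
  merge safetydb (in=in_v) ae_clin (in=in_c);
  by usubjid aeterm aestdtc;                 /* pass 1: full key                                   */
  if in_v and in_c then output matched1;
  else if in_v     then output vendor_only;  /* retried in pass 2 on USUBJID and AETERM only, etc. */
  else                  output clin_only;
run;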
PO-311 : Statistical Programming Outputs Comparison and Reporting Using Python
Vinayak Mane, Inference Inc.
Adel Solanki, Inference Inc.
Anindita Bhattacharjee, Inference Inc.
Tanusree Bhattacharyya, Inference Inc.
Monday, 10:00 AM – 10:50 AM, Location: ePoster Station 6
This paper showcases the functionality and workflow of an automated Python tool to compare different versions of clinical trial outputs pertaining particularly to statistical programming. This tool, named the "Outputs Comparison Tool", is a user-friendly executable (.exe) application designed to streamline the process of comparing and analyzing original and revised outputs. Iterations of datasets, tables, listings, and figures (TLFs) are routine in a programming workflow, typically stemming from updates in work scope or analysis requirements. This leads to multiple intermediate submissions for review. A harrowing task for statistical programmers is to ensure consistency in outputs across these iterations, which is mostly carried out manually, making the process tedious and error-prone. Tailored for statistical programmers, the "Outputs Comparison Tool" automates and offers the following key features: Count of outputs – the total number of files from the two versions are compared and reported. Table of Contents comparison – the presence of each output is checked in the two versions. Content differences – updates in the output contents are highlighted across versions. Consolidated Comparison Report – a comprehensive list of compared outputs, along with references to the corresponding pages with changes, is generated. This lightweight and robust application empowers statistical programmers to enhance productivity and maintain precision in output review processes. Designed with simplicity in mind, the tool does not require any programming expertise and supports standard file formats like .rtf, .txt, .doc, .docx, and .pdf, commonly used in statistical reporting workflows, making it a powerful resource.
PO-350 : A Novel ADEFF Design with Varying Baseline Type Selections
Tingting Tian, Merck
Chao Su, Merck
Fan Wang, MSD China, Beijing, China
Pengfei Zhu, MSD China, Beijing, China
Monday, 2:00 PM – 2:50 PM, Location: ePoster Station 4
This paper outlines a novel ADEFF design used to analyze different baseline selections for primary and secondary endpoints. The percent change from baseline in a given lab parameter, measured at different time points, can contribute to both primary and key secondary endpoints. Since the observed value for the lab parameter can be collected through different computation or test methods, results using the same method at baseline and at the post-baseline time point of interest are selected when available. To compute the change score for the lab parameter at different post-baseline time points of interest, and to compare results using different computation methods as a sensitivity analysis, the proposed approach first pairs the same computation/test method between the baseline and the time point of interest, then selects the appropriate computation/test method for the post-baseline time points of interest. A "baseline type" was developed to clearly identify the lab method chosen for each analysis, because the baseline lab method may differ between the primary and key secondary endpoints.
PO-361 : Efficient Strategies for Handling Data Issues in Clinical Trial Submissions
Kate Sun, Mirum Pharmaceuticals
Monday, 2:00 PM – 2:50 PM, Location: ePoster Station 1
Clinical trial data is often complex and prone to issues such as missing data, outliers, inconsistencies, and unit conversion errors. Efficiently identifying and resolving these issues is critical for regulatory submissions. This paper presents a SAS-based approach to identify and handle data issues using data visualizations, macros, and efficient data cleaning techniques. We demonstrate the use of SAS to detect anomalies, resolve issues, and ensure data integrity. The paper includes practical examples, SAS code, and visualizations to guide statistical programmers in implementing these strategies. Clinical trial data, particularly in the SDTM.LB domain (laboratory data), often contains issues such as unit mismatches (inconsistent units, e.g., mg/dL vs. mmol/L); out-of-range values (values outside the normal range, e.g., ALT > 1000 U/L); missing data (lab results not recorded); and outliers (extreme values due to data entry errors). This paper provides a step-by-step SAS-based framework to identify and resolve these issues efficiently, ensuring data quality for regulatory submissions.
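As a minimal sketch of the kinds of checks described (the cutoffs, variable usage, and libref are illustrative, not the paper’s framework), a single DATA step can flag several of the issue types listed above:

```sas
/* Flag common laboratory data issues; thresholds are illustrative only */
data lb_issues;
   set sdtm.lb;
   length issue $40;
   if lbtestcd = 'ALT' and lbstresn > 1000             then issue = 'Out-of-range value';
   else if missing(lbstresn) and not missing(lborres)  then issue = 'Result not standardized';
   else if lbtestcd = 'GLUC' and not missing(lbstresu)
           and lbstresu ne 'mmol/L'                    then issue = 'Unexpected standard unit';
   else if missing(lborres)                            then issue = 'Missing result';
   if not missing(issue);   /* keep flagged records for review */
run;

proc freq data=lb_issues;
   tables lbtestcd * issue / list missing;   /* summarize issues by test */
run;
```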
PO-366 : Slamming Datasets Together, Without Hurt, Using SAS, R and Excel
David Franklin, TheProgrammersCabin.com
Monday, 2:00 PM – 2:50 PM, Location: ePoster Station 5
Bringing two datasets together to get a better picture of the data is a common task for programmers, usually to put the data into a form suitable for either storage or analysis. This paper will look at one-to-one, one-to-many, and many-to-many situations in SAS, including the common approach using the MERGE statement, and will also touch on PROC FORMAT, PROC SQL, the POINT= and KEY= options of the SET statement, and hash tables. On our journey we will also touch on one-to-one, one-to-many, and many-to-many combinations in R, and the more limited one-to-one and one-to-many merging available in Excel.
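To ground two of the SAS techniques named above, here is a hedged one-to-many example (data set and variable names are illustrative): the MERGE statement with a BY variable, followed by the equivalent PROC SQL join.

```sas
/* One-to-many combine: DM has one row per SUBJID, AE has many rows per SUBJID */
proc sort data=dm; by subjid; run;
proc sort data=ae; by subjid; run;

data ae_dm;
   merge ae (in=in_ae) dm;
   by subjid;
   if in_ae;                 /* keep only subjects who reported adverse events */
run;

/* The same join without pre-sorting, using PROC SQL */
proc sql;
   create table ae_dm_sql as
   select a.*, d.age, d.sex
   from ae as a
        left join dm as d
        on a.subjid = d.subjid;
quit;
```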
PO-382 : The World is Not Enough: Base SAS Visualizations and Geolocations
Louise Hadden, Cormac Corporation
Tuesday, 10:00 AM – 10:50 AM, Location: ePoster Station 1
Geographic processing in SAS has recently undergone some major changes: as of Version 9.4 Maintenance Release M5, many procedures formerly part of SAS/GRAPH are now available in Base SAS. At the same time, SAS has added new graphics procedures such as PROC SGMAP that build on the functionality of SAS/GRAPH’s PROC GMAP and incorporate ODS Graphics techniques, including attribute maps and image annotation. This paper and poster will use PROC SGMAP to replicate a world map the author originally created with SAS/GRAPH’s PROC GMAP and the annotate facility, mapping three different metrics. New SAS mapping and SG procedure techniques will be demonstrated, following Agent 007’s adventures across the globe.
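For readers new to the procedure, a minimal, hedged PROC SGMAP sketch of a world choropleth follows, assuming the MAPSGFK.WORLD map data set shipped with SAS and a hypothetical response data set WORLD_METRICS containing the map ID variable and one metric per country.

```sas
/* Choropleth of one metric on a world map; WORLD_METRICS is hypothetical */
title 'Metric 1 by Country';
proc sgmap mapdata=mapsgfk.world maprespdata=world_metrics;
   choromap metric1 / mapid=id;   /* ID links the map and response data sets */
run;
title;
```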
PO-391 : ExCITE-ing! Build Your Paper’s Reference Section Programmatically Using Lex Jansen’s Website and SAS
Louise Hadden, Cormac Corporation
Tuesday, 2:00 PM – 2:50 PM, Location: ePoster Station 1
One challenge in writing a SAS white paper is creating the perfect reference section, properly acknowledging those who have inspired and paved the way. Luckily, with clever use of such tools as Lex Jansen’s website, SAS’s ability to read in and manipulate varied data sources, and Microsoft Word’s citation manager, every author can succeed in properly referencing their white papers. This paper and ePoster will demonstrate how to accomplish this goal.
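As a hedged illustration of the reading-in step only (the URL and the filtering logic are placeholders, not the author’s program), PROC HTTP can pull a proceedings page from lexjansen.com into SAS for parsing:

```sas
/* Pull a proceedings page and keep lines that reference papers (illustrative) */
filename resp temp;

proc http
   url="https://www.lexjansen.com/pharmasug/2024/"   /* placeholder page */
   method="GET"
   out=resp;
run;

data paper_lines;
   infile resp lrecl=32767 truncover;
   input line $char32767.;
   if index(lowcase(line), '.pdf') then output;   /* crude filter for paper links */
run;
```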
PO-402 : Association of Alzheimer’s Disease with Cardiovascular Disease and Depression in Older Adults: Findings from the 2018 Nationwide Inpatient Sample
Prayag Shah, Drexel University
Monday, 2:00 PM – 2:50 PM, Location: ePoster Station 6
Alzheimer’s Disease (AD) and Alzheimer’s Disease-Related Dementias (ADRD) significantly impact 6.9 million American older adults, particularly those with major cardiovascular diseases (CVD) such as stroke, congestive heart failure (CHF), and coronary heart disease (CHD), along with the comorbidities hypertension (HTN) and depression. Yet their influence is not fully understood, particularly in hospitalized populations. Most prior studies have examined a single form of CVD or outpatient cohorts, offering limited insight into how multiple CVD forms and comorbidities together influence dementia prevalence and in-hospital mortality risk. There is a critical need to clarify these associations in acute-care settings, where disease severity and outcomes may differ. This study uses data from the 2018 Nationwide Inpatient Sample (NIS), a nationally representative dataset, to investigate how major CVD forms and comorbidities affect the odds of AD/ADRD and to examine their impact on the risk of in-hospital mortality among hospitalized older adults. The findings will help inform strategies for clinical management and prevention for at-risk older adults.