Paper presentations are the heart of a PharmaSUG conference. PharmaSUG 2019 will feature over 200 paper presentations, posters, and hands-on workshops. Papers are organized into 12 academic sections and cover a variety of topics and experience levels.
Note: This information is subject to change. Last updated 12-Mar-2019.
|Paper No.|Author(s)|Paper Title|
|---|---|---|
|HT-046|Charu Shankar|Express yourself with Python & R|
| |& Sanjay Matange|Developing Custom SAS Studio Tasks for Clinical Trial Graphs|
|HT-067|Vince DelGobbo|Integrating SAS and Microsoft Excel: Exploring the Many Options Available to You|
|HT-089|Sanjay Matange|Build Popular Clinical Graphs using SAS|
|HT-145|Kevin Lee|Hands-on Training for Machine Learning Programming|
| |& Mario Widel|Value-Level Metadata Done Properly|
|HT-177|Bill Coar|Sample Size Determination with SAS® Studio|
|HT-188|Phil Bowsher|Creating & Sharing Shiny Apps & Gadgets|
| |& Richann Watson| |
Leadership and Career Development
Real World Evidence
|Paper No.|Author(s)|Paper Title|
|---|---|---|
| |& Andrea Coombs|Using Real-World Evidence to Affect the Opioid Crisis|
| |& Kriss Harris| |
| |& Seydou Moussa Coulibaly|Stratified COX Regression: Five-year follow-up of attrition risk among HIV positive adults, Bamako|
|RW-199|Youngjin Park|Patterns of risk factors and drug treatments among Hypertension patients|
|RW-232|Charan Kumar Kuyyamudira Janardhana|Artificial Intelligence and Real World Evidence - it takes two to tango|
|RW-238|Srinivasa Rao Mandava|Innovative Technologies utilization in 21st Novel Clinical Research programs towards Generation of Real World Data.|
|RW-310|Jennifer Popovic|Real-world data as real-world evidence: Establishing the meaning of data as a prerequisite to determining secondary-use value|
|RW-345|Karen Ooms|Applications and Their Limitations of Real-World Data in Gene Therapy Trials|
Reporting and Data Visualization
Statistics and Analytics
Strategic Implementation, Business Administration, Support Resources
Advanced Programming
AP-001 : Get Smart! Eliminate Kaos and Stay in Control - Creating a Complex Directory Structure with the DLCREATEDIR Statement
Louise Hadden, Abt Associates Inc.
An organized directory structure is an essential cornerstone of data analytic development. Programmers involved in repetitive processing of any sort control their software and data quality with directory structures that can be easily replicated for different time periods, different drug trials, etc. Practitioners (including the author) often use folder and subfolder templates or shells to create identical complex folder structures for new date spans of data or projects, or use manual processing or external code submitted from within a SAS® process to run a series of MKDIR and CHDIR commands from a command prompt to create logical folders. Desired changes have to be made manually, offering opportunities for human error. Since the advent of the DLCREATEDIR system option in SAS version 9.3, practitioners can create single folders, if they do not already exist, from within a SAS process. Troy Hughes describes a process using SAS macro language, the DLCREATEDIR option, and control tables to facilitate and document the logical folder creation process. This paper describes a technique that wraps another layer of macro processing around that process, isolating and expanding the recursive logical folder assignment to create a complex, hierarchical folder structure, which the author uses for a project requiring monthly data intake, processing, quality control, and delivery of thousands of files. The paper also discusses analyzing the prior month's folder structure to inform development of control tables and build executable code.
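The core idea, generating every folder in a hierarchy from a table of per-level names and creating only the folders that do not yet exist, can be sketched outside SAS. The following minimal Python sketch is an illustration under stated assumptions, not the author's macro: the function name `build_folder_tree` and the list-of-lists "control table" are hypothetical, and `os.makedirs(..., exist_ok=True)` stands in for the create-if-missing behavior that DLCREATEDIR provides in SAS.

```python
import os
import tempfile
from itertools import product


def build_folder_tree(root, levels):
    """Create every nested folder combination described by `levels`.

    `levels` is a hypothetical stand-in for a control table: a list of lists,
    one inner list of folder names per level of the hierarchy. Only folders
    that do not exist yet are created, and their paths are returned.
    """
    created = []
    for combo in product(*levels):
        path = os.path.join(root, *combo)
        if not os.path.isdir(path):
            # exist_ok=True also tolerates intermediate folders that exist.
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Example control table: two monthly intakes, three work areas each.
        levels = [["2019_01", "2019_02"], ["raw", "qc", "delivery"]]
        print(len(build_folder_tree(root, levels)))  # 6 leaf folders created
        print(len(build_folder_tree(root, levels)))  # 0 on a second run
```

Because a second run creates nothing, the same call can be repeated safely each month, which is the property that makes table-driven folder creation repeatable.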
AP-018 : Ensuring Programming Integrity with Python: Dynamic Code Plagiarism Detection
Michael Stackhouse, Covance
Integrity in your team's programming is of the utmost importance, not only when preparing for a regulatory submission, but simply to ensure the quality of the analysis throughout your drug's development. A common validation technique used throughout the industry is double programming - where two programmers work independently to obtain identical results. But how do you guarantee independence? And what if you suspect programming independence was violated? Linux tools like diff may not be sufficient to identify harmful similarities. This paper will explore a Python based tool that can bring these issues to light. The tool is dynamic enough to locate similarities at any point in your programs, and can allow flexibility for minor syntactic changes, such as dataset name changes or reordering of statements. All results are gathered into easily reviewable files to help ensure that your team's work can uphold the integrity and reputation that you rely on.
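One way to make such a comparison tolerant of renamed datasets and reordered statements is to replace identifiers with a placeholder and compare the sorted list of normalized statements. The minimal Python sketch below illustrates that idea only; it is not the tool presented in the paper, the `KEYWORDS` set is a small hypothetical subset of SAS keywords, and statements are assumed to end in semicolons. It uses `difflib.SequenceMatcher` from the Python standard library.

```python
import difflib
import re

# Small illustrative subset of SAS keywords; every other name is treated as
# an identifier and replaced by "ID", so renaming data sets or variables does
# not hide copied logic.
KEYWORDS = {"data", "set", "merge", "by", "if", "then", "else", "do", "end",
            "run", "proc", "sort", "out", "where", "keep", "drop", "output"}


def is_identifier(token):
    return (token[0].isalpha() or token[0] == "_") and token not in KEYWORDS


def normalize(code):
    """Split SAS code on ';' and return one normalized string per statement."""
    statements = []
    for stmt in code.lower().split(";"):
        tokens = re.findall(r"[a-z_][a-z0-9_]*|\d+|\S", stmt)
        if tokens:
            statements.append(" ".join("ID" if is_identifier(t) else t for t in tokens))
    return statements


def similarity(code_a, code_b):
    """Return a 0..1 score that ignores names and statement order."""
    a, b = sorted(normalize(code_a)), sorted(normalize(code_b))
    return difflib.SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    original = "data adsl; set dm; if age > 65 then old = 1; run;"
    renamed = "data final; set source; if age > 65 then old = 1; run;"
    print(similarity(original, renamed))  # 1.0: same logic, different names
```

Sorting the statements trades sequence information for order tolerance; comparing both the sorted and unsorted forms is one way to keep both signals.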
AP-038 : One Macro to Produce Descriptive Statistic Summary Tables with P-Values
Rajaram Venkatesan, Cognizant Technology Solutions
Harinarayanan Gopichandran, Cognizant Technology Solutions
In clinical trial reporting, the most common tables are those that present descriptive statistics (n, mean, SD, median, min and max, CI, and p-values), or frequency counts (%) together with descriptive statistics for categorical and continuous variables. These are the bread and butter of reporting. Producing these tables is conceptually simple but often cumbersome and time-consuming, because many variables and many conditions may be requested. The solution presented here is a simple, easy-to-understand macro that allows the user to produce descriptive summary tables within minutes. It can be used to produce or validate most safety tables, and it allows users to create many types of tables (demographics and baseline characteristics, laboratory, vital signs, and ECG data) with minimal effort. It also means that when statisticians want to change a table afterward, the change can be made with minimal effort. This not only saves a lot of time but also improves quality.
AP-042 : Excel in SAS
Charu Shankar, SAS Institute
Excel and SAS are universally loved, and both have their strengths. Excel has been around a long time, and many non-SAS users use the spreadsheet to enter and manage their transactions. SAS is great for analyzing data. Why not marry the strengths of both: get data from Excel into SAS, complete the analysis in SAS, and then send the results back to Excel? This is a great help for colleagues who don't have SAS on their desktops. Come learn the many ways to move data from Excel to SAS and from SAS to Excel. This session will cover the following, and much more: 1. PROC IMPORT to read Excel into SAS; 2. SAS/ACCESS engines to read Excel into SAS; 3. ODS tagsets to take SAS output to Excel pivot tables; 4. PROC EXPORT to export SAS data to Excel. Come watch some magic in the shaping of a pivot table right before your eyes in this session!
AP-047 : Confessions of a SAS PROC SQL Instructor
Charu Shankar, SAS Institute
After more than 10 years of teaching at SAS to thousands of learners, this instructor has collected many best practices from helping customers with real-world business problems. Hear all about her confessions on making life easy with mnemonics to recall the order of clauses in SQL. Learn about the DATA step diehard who now loves SQL thanks to a little-known gem in PROC SQL. Hear about the ways in which ANSI SQL falls short and PROC SQL picks up the slack. In short, there are many confessions and so little time. This session is open to all who are interested in improving their SQL knowledge and performance.
AP-071 : Ushering SAS Emergency Medicine into the 21st Century: Toward Exception Handling Objectives, Actions, Outcomes, and Comms
Troy Hughes, Datmesis Analytics
Emergency medicine comprises a continuum of care that often commences with first aid, basic life support (BLS), or advanced life support (ALS). First responders, including firefighters, EMTs, and paramedics, are often the first to triage the sick, injured, and ailing, rapidly assessing the situation, providing curative and palliative care, and transporting patients to medical facilities. Emergency medical services (EMS) treatment protocols and SOPs ensure that, despite the singular nature of every patient as well as potential complications, trained personnel have an array of tools and techniques to provide varying degrees of care in a standardized, repeatable, and responsible manner. Just as EMS providers must assess patients to prescribe an effective course of action, software too should identify and assess process deviation or failure, and similarly prescribe its commensurate course of action. Exception handling describes both the identification and resolution of adverse, unexpected, or untimely events that can occur during software execution, and should be implemented in SAS® software that demands robustness. The goal of exception handling is always to reroute process control back to the originally intended process path that delivers full business value. When insurmountable events do occur, however, exception handling routines should instruct the process, program, or session to gracefully terminate to avoid damage or other untoward effects. Several exception resolution paths exist (in addition to program termination) that can deliver full or partial business value. This text demonstrates these paths and discusses various internal and external modalities for communicating exceptions to SAS users, developers, and other stakeholders.
AP-074 : Where Is the Information We Are Looking For?
Jeff Xia, Merck
Finding information efficiently and effectively is essential in many challenging situations, such as responding to agency requests after a submission. Often, the assigned programmer has no prior knowledge of the study involved in the request because of business or resource constraints. It has therefore become an increasingly important business need to help programmers quickly gain a clear understanding of the data flow, including all the steps from the collected data in the database to the final information in a completed CSR. This paper presents 4 innovative methods to programmatically find the information of interest: 1) finding files of interest in a folder and its subfolders, where the file name can be partial or a pattern; 2) finding all records and fields that contain certain text strings in all datasets in a SAS library, which is handy when we need to understand the SDTM mapping from raw datasets, etc.; 3) finding a variable name in all datasets in a SAS library, which is useful when working on legacy studies whose variables do not follow the CDISC standard naming convention; 4) finding a text string in all text files, including subdirectories, which is extremely useful when there is a need to locate a certain CSR table among hundreds of tables, or to understand which SAS program was used to generate a specific table and which macros were involved. These 4 methods are widely used in our daily programming activities and have significantly improved our capability to work on time-sensitive tasks such as agency requests.
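Method 4, searching every text file under a folder tree for a string, can be sketched in a few lines. The Python version below is an illustration under stated assumptions rather than the paper's SAS implementation: the function name `find_text` and the default extension list are hypothetical, and matching is case-insensitive.

```python
import os


def find_text(root, needle, extensions=(".sas", ".log", ".lst", ".txt")):
    """Yield (path, line_number, line) for each line that contains `needle`.

    Walks `root` and all of its subdirectories; only files whose names end in
    one of `extensions` are read. The comparison ignores case.
    """
    needle_lower = needle.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the search going through odd encodings.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle_lower in line.lower():
                        yield path, number, line.rstrip("\n")
```

For example, iterating over `find_text(study_root, "Table 14.3.1")`, where `study_root` is a hypothetical study folder, lists every program, log, or listing line that mentions that table.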
AP-093 : A Macro to Deal with Non-Printable Special Characters
Harish Yeluguri, Industry
Creating SDTM datasets from raw datasets is a painstaking process, since we may encounter many unexpected data issues. Raw datasets often contain non-printable special characters, and ignoring them may lead to unanticipated analysis results. It is therefore good practice to check each raw dataset in a given folder for non-printable special characters. Finding these characters manually is cumbersome, because each folder holds numerous raw datasets, each with numerous variables. To overcome this challenge, I created a macro called NPSC (given in this paper) that finds non-printable special characters wherever they exist in the raw datasets and also removes them.
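The detect-and-remove idea can be sketched in Python by treating each dataset as a list of dicts. This is an illustration, not the NPSC macro: the function names are hypothetical, and "non-printable" is assumed here to mean ASCII control characters (codes 0 to 31 and 127), whereas SAS's NOTPRINT function may classify characters differently depending on the session encoding.

```python
import re

# Assumption: "non-printable" means ASCII control characters (0-31 and 127).
# Widen the pattern if your data's encoding needs a different definition.
NON_PRINTABLE = re.compile(r"[\x00-\x1f\x7f]")


def find_non_printable(records):
    """Return (row_index, variable, offending_characters) for each hit."""
    hits = []
    for row_index, record in enumerate(records):
        for variable, value in record.items():
            if isinstance(value, str):
                found = NON_PRINTABLE.findall(value)
                if found:
                    hits.append((row_index, variable, "".join(found)))
    return hits


def strip_non_printable(records):
    """Return new records with non-printable characters removed from strings."""
    return [
        {variable: NON_PRINTABLE.sub("", value) if isinstance(value, str) else value
         for variable, value in record.items()}
        for record in records
    ]


if __name__ == "__main__":
    raw_ae = [{"AETERM": "HEADACHE\x0b", "AESEQ": 1},
              {"AETERM": "NAUSEA", "AESEQ": 2}]
    print(find_non_printable(raw_ae))  # [(0, 'AETERM', '\x0b')]
    print(strip_non_printable(raw_ae)[0]["AETERM"])  # HEADACHE
```

Reporting before cleaning keeps an audit trail of which values were changed, and the cleaning step returns copies rather than modifying the input.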
AP-114 : The Art of Defensive Programming: Coping with Unseen Data
Philip Holland, Holland Numerics Ltd
This paper discusses how you cope with the following data scenario: the input data set is defined in so far as the variable names and lengths are fixed, but the content of each variable is in some way uncertain. How do you write a SAS program that can cope appropriately with data uncertainty?
AP-116 : Excel with MS Excel and X Commands: SAS® Programmers' Quick Tips Guide to Useful Advanced Excel Functions and X Commands.
Shefalica Chand, Seattle Genetics, Inc.
SAS® is the bread and butter for clinical/statistical programmers in the pharmaceutical and biotech industry. Nevertheless, there are many functionalities, tools, and resources that can be used in conjunction with SAS® to provide a better user experience for biostatistics and other cross-functional partners. In this paper, we will share some useful advanced MS Excel functions, like VLOOKUP, HYPERLINK, CONCATENATE, etc., to improve efficiency, remove ambiguity and redundancy, and enhance accuracy by eliminating manual intervention (for example, when creating specifications and tracking sheets, managing review comments, and tracking resolutions and implementations). Here is a sneak peek at these Excel functions: VLOOKUP: This function retrieves values from one Excel sheet into another by matching common values in an index column. HYPERLINK: This function creates hyperlinks to external files. CONCATENATE: This function combines values from multiple cells and/or text strings; the items can be separated by commas, or a range of Excel cells can be selected. We will also discuss SAS® X commands that can make some manual processes simpler and more accurate, for example, using X commands to copy files from one location to another, automatically create folders/directories at a given location, convert SAS® program files to TXT files in bulk, remove files or folders, and perform a few other similar helpful actions. In addition, we will explore practical implementation of these tools and resources in our industry, environment, and day-to-day activities.
AP-123 : Automation of Review Process
Shunbing Zhao, Merck & Co.
Statistical outputs for a clinical study report are produced as separate tables, listings, and figures (TLFs) in Rich Text Format (RTF). The output review process usually involves a huge number of TLFs and all kinds of back-and-forth comments. It becomes more and more challenging to track the status of each individual comment from reviewers, and very often a few important comments are left unattended or overlooked until a very late stage. This paper presents a few useful SAS macros to streamline the review process in a well-controlled way. First, a SAS macro combines all outputs in RTF format into a single RTF file for reviewers to comment on; the original RTF file name is kept in the header of each page of the combined file for later reference. Second, reviewers can use the MS Word "Review" feature to put comments freely on each page as necessary. Last, another SAS macro extracts all comments on each page of the combined file into a well-formatted Excel spreadsheet, which has a separate column for each applicable field, i.e., the comment itself, the reviewer's name, the page number, and the RTF file name. This information is very useful for identifying the stakeholder and locating the corresponding table, listing, or graph for a given comment, as well as for tracking the status of each individual comment. This innovative approach greatly improves communication among stakeholders and significantly optimizes the statistical review process.
AP-125 : Auto-Annotate Case Report Forms with SAS® and SAS XML Mapper
Yating Gu, Seattle Genetics, Inc.
A common approach to annotating blank case report forms (CRFs) is to manually draw text boxes in a PDF and then type in annotations that document the location of the data, with the names of the corresponding SDTM datasets and of the variables included in the submitted datasets. The CRF design is similar across many studies, and there are usually only minor updates between CRF versions within a study; however, manually copying and pasting annotations across CRFs is very time-consuming. Therefore, an approach that auto-annotates a new blank CRF based on an existing annotated CRF could save a tremendous amount of time and effort. This paper introduces a method to import annotations from an annotated CRF into SAS using SAS XML Mapper, manage the annotations, and incorporate them back into a new blank CRF using SAS. Relevant programming details and examples are provided in this paper.
AP-143 : PROC SORT (then and) NOW
Derek Morgan, PAREXEL International
The SORT procedure has been an integral part of SAS® since its creation. The sort-in-place paradigm made the most of the limited resources at the time, and almost every SAS program had at least one PROC SORT in it. The biggest choices at the time were whether to use something other than the IBM procedure SYNCSORT as the sorting algorithm, and whether you were sorting ASCII or EBCDIC data. These days, PROC SORT has fallen out of favor; after all, PROC SQL enables merging without using PROC SORT first, while the performance advantages of HASH sorting cannot be overstated. This leads to the question: Is the SORT procedure still relevant to anyone other than the SAS novice or the terminally stubborn who refuse to HASH? The answer is a surprisingly clear "yes". PROC SORT has been enhanced to accommodate twenty-first century needs, and this paper discusses those enhancements.
AP-154 : Unleash the Power of Less Well Known but Useful SAS® DATA Step Functions
Timothy Harrington, SAS Programmer
SAS Version 9.4 has over 600 documented SAS functions and procedures. Some SAS users may have grown accustomed to using only a few of the better-known ones, and have had to invest significant time and extra effort manipulating code to achieve a desired result. This paper describes a selection of less well-known SAS functions and how they can be used to obtain specific results with minimal coding. Solutions to a variety of programming challenges are demonstrated, and the advantages of these techniques are discussed and compared with more traditional means. The functions included here are available in SAS 9.4; some may not be available in earlier versions.
AP-156 : Generate Customized Table in RTF Format by Using SAS Without ODS RTF - RTF Table File Demystified
Kai Koo, Abbott Vascular
Rich Text Format (RTF), developed and maintained by Microsoft, is a popular document file format supported by many word processors on different operating systems. Because an RTF file is human-readable, experienced programmers can write their own RTF documents, including tables and graphs, directly from the RTF specification. Although skilled SAS users can generate high-quality RTF tables through ODS RTF and related procedures (e.g., PROC REPORT/TEMPLATE), RTF tables can also be produced by a SAS DATA step without any ODS RTF code. This article introduces a SAS DATA step-based macro for RTF table preparation and explains the basic concept of coding tables in RTF. It also demonstrates the advanced techniques used in the macro for setting margins, header, footer, title, footnote, font types, highlighting, borders, column widths, and cell merging, as well as different approaches to inserting/embedding image files. SAS users can apply these concepts and this logic to develop their own SAS code, or even macros, as an ODS RTF alternative for preparing customized tables for thesis writing, presentations, publications, and regulatory submissions.
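The central point, that an RTF table is plain text built from control words, can be illustrated without SAS. The minimal Python sketch below writes one bordered table using standard RTF table control words: `\trowd` starts a row definition, `\cellx` gives each cell's right boundary in twips, `\intbl` and `\cell` mark cell text, and `\row` ends the row. It is not the author's DATA step macro; the function names are hypothetical, and margins, headers, footers, cell merging, and images are left out.

```python
def rtf_escape(text):
    """Escape the characters that have special meaning in RTF text."""
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def rtf_table(rows, col_widths_twips):
    """Return a minimal RTF document containing one bordered table.

    rows: list of rows, each a list of cell strings.
    col_widths_twips: width of each column in twips (1 inch = 1440 twips).
    """
    parts = [r"{\rtf1\ansi\deff0{\fonttbl{\f0 Times New Roman;}}\f0\fs20"]
    for row in rows:
        parts.append(r"\trowd\trgaph108")
        right = 0
        for width in col_widths_twips:
            right += width  # \cellx takes the cumulative right boundary
            parts.append(r"\clbrdrt\brdrs\clbrdrb\brdrs\cellx%d" % right)
        cells = "".join(r"\pard\intbl %s\cell " % rtf_escape(cell) for cell in row)
        parts.append(cells + r"\row")
    parts.append(r"\pard\par}")
    return "\n".join(parts)


if __name__ == "__main__":
    print(rtf_table([["Parameter", "Placebo", "Active"],
                     ["n", "12", "13"]], [2880, 1440, 1440]))
```

Saving the returned string with an `.rtf` extension gives a file a word processor can open; features such as cell merging or shading would be added with further cell-definition control words in the same loop.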
AP-157 : Practices in CDISC End-to-End Streamlined Data Processing
Chengxin Li, Luye Pharma Group
From a programming perspective, this paper describes principles and practices of end-to-end streamlined data processing under the CDISC umbrella. Besides compliance, several practices are common across the clinical data processing lifecycle: traceability, controlled terminology, an end-in-mind philosophy, structured design, and reusable solutions (auto-generation of SDTM, ADaM, and TLFs). The components of end-to-end streamlined data processing are also introduced: data collection, SDTM transformation, ADaM development, and TLF generation. The ISS/ISE programming model, MDR, and production harmonized with submission are depicted as well. To illustrate the concepts, two examples are discussed: one on a specific data element (AE start date), and another on efficacy analyses in the pain therapeutic area. End-to-end streamlined data processing with CDISC is the optimized programming model for delivering with high quality and high efficiency.
AP-158 : Six Useful Data Tool Macros
Ting Sa, Cincinnati Children's Hospital Medical Center
In this paper, six macros are introduced that work as helpful tools for some common data tasks. The macro "HelpConsistency" detects fields that have the same name but different data lengths or data types across SAS data sets, and fixes the data length inconsistencies automatically. The macro "SelectVars" selects variables in batches from a SAS data set, for example, variables that share a naming pattern such as the same suffix or middle part. The macro "ExportExcelWithFormat" exports formatted SAS data to Excel files without losing the formats. The macro "FindFiles" helps users find and access their folders and files easily. The macro "SearchReplace" searches for and replaces any string in SAS programs. The macro "checkCharVarType" provides more information about character variables, such as whether a variable contains only numeric values, contains no values, or contains date and datetime values. The paper includes the SAS code for all six macros.
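The kind of profiling described for "checkCharVarType" can be sketched in Python for a single character column. This is an illustration under stated assumptions, not the macro from the paper: the function name and category labels are hypothetical, and only ISO 8601 date and datetime layouts (as used in CDISC data) are recognized.

```python
from datetime import datetime

# Assumed layouts: ISO 8601 dates and datetimes only; extend as needed.
DATE_FORMATS = ("%Y-%m-%d",)
DATETIME_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%dT%H:%M")


def _matches_any(value, formats):
    for fmt in formats:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def _is_number(value):
    # Note: float() also accepts strings such as "nan" and "inf".
    try:
        float(value)
        return True
    except ValueError:
        return False


def classify_char_values(values):
    """Label a character column as empty, numeric, date, datetime, or text."""
    filled = [v.strip() for v in values if v is not None and v.strip()]
    if not filled:
        return "empty"
    if all(_is_number(v) for v in filled):
        return "numeric"
    if all(_matches_any(v, DATETIME_FORMATS) for v in filled):
        return "datetime"
    if all(_matches_any(v, DATE_FORMATS) for v in filled):
        return "date"
    return "text"
```

Blank values are ignored before classifying, so a column that mixes blanks and numbers is still reported as numeric.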
AP-187 : Tidyverse for Clinical Data Wrangling
Phil Bowsher, RStudio Inc.
RStudio will be presenting an overview of the Tidyverse for the R user community at PharmaSUG. This is a great opportunity to learn about and get inspired by new capabilities for clinical data wrangling in R. No prior knowledge of R or RStudio is needed. This short paper provides an introduction to flexible and powerful tools for managing data as part of your research and reporting, covering data processing with R and an overview of the packages dplyr, magrittr, tidyr, and ggplot2, with applications in drug development. A live environment will be available for attendees to explore the visualizations in real time.
AP-189 : From Static ggplot2 Output to Interactive Plotly Visualization to Shiny App
Phil Bowsher, RStudio Inc.
RStudio will be presenting an overview of creating interactive adverse event visualizations, dashboards, and Shiny apps. The session will cover the data visualization landscape in R, which includes packages like ggplot2, crosstalk, plotly, and various htmlwidgets. Adding interactive visualizations to R Markdown reports and Shiny apps will be covered, with applications in drug development. RStudio will showcase several compelling examples as well as learning resources. As part of the short course, some available research-related R Shiny apps and R Markdown reports will be illustrated. Data from the openFDA APIs, including adverse events, will be used for the session visualizations. Moreover, RxNorm (created by the U.S. National Library of Medicine (NLM) to provide a normalized naming system for clinical drugs) will be used for drug analytics. A live environment will be available for attendees to explore the visualizations they create in real time.
AP-191 : Creating In-line Style Macro Functions
Arthur Li, City of Hope
Macro functions that are used in our program are defined by the macro facility. By providing values for the function parameters, the macro function generates a result. We can insert the result directly into a macro statement in our program. Programmers seldom know that we can create a user-defined macro function as well. This paper will focus on the methods of creating an in-line style macro function via various examples.
AP-212 : Python-izing the SAS Programmer
Mike Molter, Wright Ave Partners
More and more, Biostats departments are contemplating and asking questions about converting certain clinical data and/or metadata processing tasks to Python. For many who have spent a career writing SAS code, the learning curve for this vast new frontier may appear to be a daunting task. In this paper we'll begin a gradual transition into Python data processing by looking at the Python DataFrame, and seeing how simple SAS Data Step tasks such as merging, sorting, and others work in the Python environment.
AP-216 : Using SAS® ODS EXCEL Destination "Print Features" to Format Your Excel Worksheets for Printing as You Create Them
William Benjamin, Owl Computer Consultancy LLC
This presentation will demonstrate features of the SAS® ODS EXCEL destination that let you use various ODS EXCEL sub-options to prepare a worksheet for printing as SAS writes it out. The sub-options that affect page-level features are the following: "Choose to print in color or Black and White only"; "Center the output either horizontally or vertically"; "Print in draft quality or normally"; "Print horizontally or vertically"; "Arrange the data across the page differently"; "Select where to put the output"; "Print headers, footers, and row breaks"; "Adjust the size of headers and footers"; "Adjust the print scale to improve the look of lines in graphs".
AP-241 : Minimally Invasive Surgery to Merge Data
Many professional papers on the topic of combining data sources have been presented at SAS User conferences over the decades. Most focus on the issue of combining data through the Data Step (set, merge, update) or on joining in SQL. However, there is another, completely different technique using SAS Component Language functions that is optimal for a small range of applications. This paper will present the technique (used in a Data Step and in macro language), explain how it differs from traditional data combination techniques, and offer specific examples illustrating where it might be useful. The technique can also be used to extract either metadata or actual data from SAS data sources. Former developers of SAS/AF will recognize the technique; Base and macro programmers will be amazed!
AP-290 : PROC SQL to the Rescue: When a Data Step Just Won't Do Anymore
John O'Leary, Department of Veterans Affairs
This paper is written for beginning and intermediate SAS® Base users who are more comfortable using the traditional DATA step but would benefit from the efficiencies gained with PROC SQL. When processing datasets with 100 million records or more, there comes a point when a programmer is forced to try new tools if they want to get home in time for supper. This paper provides some PROC SQL examples and other time-saving programming tips, and also explores the SQL Procedure Pass-Through Facility. Although PROC SQL can interface directly with various databases such as Oracle, DB2, or MySQL, examples of working with Microsoft® SQL Server data explicitly from SAS® via SAS/ACCESS® to Microsoft® OLEDB are presented for your time-saving pleasure.
AP-298 : Automating SAS Program Table of Contents for Your FDA Submission Package
Lingjiao Qi, Statistics & Data Corporation
Bharath Donthi, Statistics & Data Corporation
To submit a complete and compliant data package to the FDA for product approval, the submission must include the SAS programs that generated the analysis datasets, tables, and figures. Including these programs in the submission package helps the FDA data reviewers to understand the process by which the variables for the respective analyses were created, and to confirm the analysis algorithms. Organizing these SAS programs into a Table of Contents is highly recommended as it serves as an easy reference for the FDA reviewers, verifies that each expected file is included in the submission, and provides the FDA with easily-accessible details for each program. The Table of Contents is usually compiled manually by reviewing each individual program and typing the required information into a word processor. This time-consuming process is inefficient and error-prone. To increase accuracy and efficiency, we have developed an in-house macro tool to automatically generate a Table of Contents by reading each submitted SAS program and its associated files. This easy-to-use macro tool can be fully executed within SAS, and it dramatically reduces documentation preparation time. To produce a complete and detailed Table of Contents, the macro tool extracts from the submitted programs: metadata (program name, size, descriptions, etc.), input datasets, output file names, and macros. This paper will provide a detailed description of our time-saving macro tool to assist SAS users in automatically generating the Table of Contents for their FDA data submission packages.
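The extraction step can be approximated with regular expressions. The Python sketch below is a heuristic under stated assumptions, not the authors' macro: it reports two-level data set names (libref.member) on SET and MERGE statements, quoted FILE= targets (as in ODS RTF FILE=), and %name macro calls, after removing /* */ comments. The function name and the `MACRO_KEYWORDS` exclusion list are hypothetical, and names built from macro variables will be missed.

```python
import re

# Macro-language keywords and functions that look like %name calls but are
# not user macros; a hypothetical, incomplete list.
MACRO_KEYWORDS = {"macro", "mend", "let", "if", "then", "else", "do", "end",
                  "to", "put", "global", "local", "include", "eval",
                  "sysfunc", "str", "nrstr", "upcase", "scan", "substr"}


def scan_sas_program(text):
    """Return sorted input data sets, output files, and macro calls."""
    # Drop /* ... */ comments so commented-out code is not reported.
    text = re.sub(r"/\*.*?\*/", " ", text, flags=re.DOTALL)

    # Two-level names (libref.member) on SET and MERGE statements.
    inputs = set()
    for clause in re.findall(r"\b(?:set|merge)\b([^;]*);", text, flags=re.IGNORECASE):
        names = re.findall(r"\b[a-z_]\w*\.[a-z_]\w*\b", clause, flags=re.IGNORECASE)
        inputs.update(name.lower() for name in names)

    # Quoted FILE= targets, e.g. ods rtf file="t_ae.rtf".
    outputs = set(re.findall(r"\bfile\s*=\s*[\"']([^\"']+)[\"']", text, flags=re.IGNORECASE))

    # %name tokens, minus macro-language keywords.
    calls = re.findall(r"%([a-z_]\w*)", text, flags=re.IGNORECASE)
    macros = {name.lower() for name in calls} - MACRO_KEYWORDS

    return {"inputs": sorted(inputs), "outputs": sorted(outputs), "macros": sorted(macros)}
```

Running such a scan over every submitted program and writing one row per program gives the raw material for a Table of Contents; program-level metadata such as file size could be added with a directory listing.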
Application Development
AD-032 : Extending the umbrella of compliance: LSAF with R Distributed Computing
Ben Bocchicchio, SAS Institute
Sandeep Juneja, SAS Institute
Rick Wachowiak, SAS Institute Inc.
RStudio is being used today to perform more and more highly specialized analyses in the health and life sciences industry. The R user's ability to 'pull' the latest package from a web-based R repository makes it easy to develop with the latest R code packages. To provide repeatability, once the R code is production-ready, the R code and data are stored in the Life Science Analytics Framework (LSAF) under version and audit control. Users can extend LSAF's compliant environment to include R code processing on a remote system; all that is needed is the LSAF API and a deployed web service. The process works by running a SAS program in LSAF that passes instructions to the remote R server through the web service to pull the R code and data from LSAF to the remote server. A batch file then runs the R code in RStudio. Once the R processing is complete, the execution log and output are copied back into LSAF via the API. The BAT/script file then cleans up any local files that were generated on the remote server. The log that is posted back to LSAF contains the results of the entire run: the version of the data used, the version of the R code used, and the versions of the R packages used to generate the output. With all this information, the output can be regenerated at a later date, supporting the needs of compliance.
AD-048 : Reimagining Statistical Reports with R Shiny
Sudharsan Dhanavel, Cognizant Technology Solutions
Harinarayan Gopichandran, Cognizant Technology Solutions
How often, after you provide your outputs for review, does the reviewer come back and ask, "I would like to see this with a change; could you make this adjustment?" A statistical programmer quite often receives multiple requests that result in many programming hours spent altering report parameters. Wouldn't it be great to point the reviewer to a dashboard and say, "Here you go! It's an interactive Shiny app; you can alter the parameters, filter, or subset as you wish, and the report appears on screen right away"? The Shiny package, by RStudio, lets you specify input parameters in graphical user interface controls such as sliders, drop-downs, and text fields, and incorporate plots, tables, and summary outputs. The app logic, written in R, updates the outputs immediately when the inputs are altered in the user interface. This presentation demonstrates how R Shiny makes it simple for a statistical programmer to turn data analyses into an interactive web app that generates the TLF outputs for a clinical trial that are traditionally generated using SAS, extending the possibilities for data review and visualization, and discusses the advantages of using R Shiny over SAS.
AD-052 : Text Analysis of QC/Issue Tracker Using Natural Language Processing (NLP) Tools
Huijuan Zhang, Columbia University
Todd Case, Vertex Pharmaceuticals
Many statistical programming groups use QC and issue trackers, which typically include text that describes discrepancies or other notes documented during programming and QC, or 'data about the data'. Each text field is associated with one specific question or problem and/or is manually classified into a category and subcategory by programmers (either as free text or via a drop-down box of pre-specified common issues, such as 'Specs', 'aCRF', 'SDTM', etc.). Our goal is to examine this text data using Natural Language Processing (NLP) tools. NLP tools give us an opportunity to be more objective in finding high-level (and even granular) themes in our data, using algorithms that parse out categories, typically either into a pre-specified number of categories or by summarizing the most common words. Most importantly, with NLP we do not have to read the text line by line; instead, it gives us an idea of what the entire text is telling us about our processes and issues at a high level. It even lets us check whether problems were classified correctly and classify problems that were not classified previously (e.g., a category was forgotten, or the problem did not fit a pre-specified category). Such techniques provide unique insights into our data about data and the possibility of replacing manual work, thus improving efficiency.
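As a rough illustration of the "most common words" summary mentioned above, the standard-library sketch below tokenizes tracker comments and counts the words that are not stop words. The stop-word list, the tokenizer, and the sample comments are all assumptions made for illustration; they are not the authors' tools or data, and a real analysis would likely use a dedicated NLP library.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list used only for this sketch.
STOP_WORDS = {"the", "a", "an", "is", "in", "of", "to", "for", "and",
              "not", "does", "be", "should", "on"}

def top_terms(comments, n=5):
    """Return the n most frequent non-stop-word tokens across all comments."""
    counts = Counter()
    for text in comments:
        # Lowercase, then keep runs of letters/digits as tokens.
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)

# Made-up QC tracker comments.
comments = [
    "AESTDTC not in ISO 8601 format",
    "Label for AESTDTC does not match specs",
    "Specs missing derivation for ADY",
    "ADY derivation does not match specs",
]
print(top_terms(comments, 3))
```

A frequency table like this is only a first pass; grouping comments into themes would need clustering or topic modeling on top of it.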
AD-054 : Generating Color Coded Patient ID Spreadsheet (PID list) To Make It Easier for the Reviewers.
Ranjith Kalleda, Pfizer, Inc.
Ashok Abburi, Pfizer
In general, we generate a patient ID list (PID list) needed for patient profiles and narratives based on requirements gathered from the users, but these PID lists are generated repeatedly for different analyses, such as the interim analysis, primary analysis, 90/120-day safety update analysis, and supplemental/final analysis. After the first PID list is generated, each subsequent PID list simply adds new data on top of the previously existing data. For subsequent reviews, the reviewers may not want to look at all the data; they just want to review the new data. To make this easier, we can use color-coding techniques to highlight the new data so that reviewers need to review only the color-coded rows. In this paper we discuss the techniques we can use to generate the color-coded new data. We also discuss how to make the PID list generation process efficient and easier across different portfolios within the organization.
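A minimal Python sketch of the underlying idea: flag only the rows that are new since the previous delivery, so that a spreadsheet writer can shade them. The key columns (USUBJID, EVENT) and the dictionary-per-row layout are assumptions for illustration, not the authors' implementation; the color fill itself is left to whatever tool writes the spreadsheet.

```python
def flag_new_rows(previous, current, key=("USUBJID", "EVENT")):
    """Return a copy of `current` where each row has NEW=True if its key
    values did not appear in the `previous` PID list."""
    seen = {tuple(row[k] for k in key) for row in previous}
    # {**row, ...} builds a new dict, so the input rows are not modified.
    return [{**row, "NEW": tuple(row[k] for k in key) not in seen}
            for row in current]

# Hypothetical PID lists from two successive deliveries.
previous = [
    {"USUBJID": "1001", "EVENT": "SAE"},
    {"USUBJID": "1002", "EVENT": "DEATH"},
]
current = previous + [{"USUBJID": "1003", "EVENT": "SAE"}]

for row in flag_new_rows(previous, current):
    print(row["USUBJID"], row["EVENT"], "NEW" if row["NEW"] else "")
```

Matching on key values rather than on row position keeps the flags correct even if the list is re-sorted between deliveries.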
AD-064 : %LOGBINAUTO: A SAS 9.4 Macro for Automated Forward Selection of Log-Binomial Models
Matthew Finnemeyer, Vertex Pharmaceuticals Inc.
In a study relying on log-binomial models to generate estimates of relative risk for clinical outcomes, there was a need to select from a plethora of clinically relevant covariates to produce parsimonious explanatory models. While SAS 9.4 allows for automated selection (forward, backward, stepwise) for some non-linear models, it does not extend this functionality to all non-linear models, including log-binomial models. Because coding this by hand for each model would be very time-consuming, the %LOGBINAUTO macro was created to automate forward selection for log-binomial models and was implemented for the study. This macro uses nested %DO loops and non-sequentially manages a list of user-supplied covariates, using measures of statistical significance and improved model fit, to select covariates for inclusion in a log-binomial model. This paper outlines the code and implementation of the %LOGBINAUTO macro for SAS 9.4 and is designed for an audience familiar with the SAS macro language and the GENMOD procedure.
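The forward-selection loop that the macro automates can be sketched generically. In this Python sketch, `fit` is a hypothetical stand-in for fitting a log-binomial model (in SAS, PROC GENMOD with DIST=BINOMIAL and LINK=LOG) and returning a p-value for each covariate; only a significance criterion is shown, whereas the abstract also mentions model-fit measures.

```python
def forward_select(candidates, fit, alpha=0.05):
    """Greedy forward selection.

    At each step, try adding every remaining covariate, keep the one with the
    smallest p-value if it is below alpha, and stop when none qualifies.
    `fit(covariates)` must return {covariate: p_value} for the fitted model.
    """
    selected, remaining = [], list(candidates)
    while remaining:
        best, best_p = None, alpha
        for cov in remaining:
            p = fit(selected + [cov])[cov]
            if p < best_p:
                best, best_p = cov, p
        if best is None:          # nothing left is significant at alpha
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy stand-in for a model fit: fixed p-values that ignore the other covariates.
TOY_P = {"age": 0.01, "sex": 0.30, "bmi": 0.02, "smoker": 0.04}
print(forward_select(TOY_P, lambda covs: {c: TOY_P[c] for c in covs}))
```

With a real fit, each covariate's p-value changes as others enter the model, which is why every remaining candidate is re-tested at each step.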
AD-077 : An Application of Supervised Machine Learning Models on Natural Language Processing: Classifying QC Comments into Categories
Xiaoyu Tang, Vertex Pharmaceuticals
Todd Case, Vertex Pharmaceuticals
Quality Control (QC) trackers provide text data that describe issues documented during QC. When documenting these issues, programmers classify their questions into general categories, such as ADaM and Specs, SDTM and Specs, TLF, SAP, etc., and into more specific categories (e.g., AE, LB, ADSL). Manual classification by programmers is subjective and time-consuming, so it is important to have an objective and efficient way to decide which category each question belongs to. Such a classifier not only lets us classify questions in one click but also tells us the frequency of each category and which categories are of most concern. Our goal is to use natural language processing (NLP) tools and machine learning methods to build such a classifier from QC tracker text. We investigate the relationship between categories and questions and compare several models, such as a support vector classifier and a random forest, using cross-validation to estimate each model's prediction accuracy. We aim to choose the model with the highest prediction accuracy that is most valuable for predicting the categories of future free text from QC trackers. We used Python for data manipulation and analysis. The intended audience is expected to have some knowledge of statistics.
AD-078 : ADME Study PK SDTM/ADaM And Graph
Fan Lin, Gilead Sciences
Yihan Jiang, Gilead Sciences
An ADME study is usually conducted in the early clinical drug development stage to understand the routes of excretion of a drug and its metabolites in the human body. It measures the concentrations of the parent drug and its metabolite(s) and determines the amount of radioactivity in plasma, urine, and feces. Because the specimens include urine and feces and subjects are discharged at different times, creating CDISC-compliant SDTM and ADaM datasets is more complex than for other PK studies. In this paper we introduce the complexities of ADME PK studies and our approaches to resolving these challenges. The paper presents the ADME study PKMERGE/PC/ADPC/TF process in a flow chart, describes the details of each step, and then lists the challenges that exist in the current industry. Because ADME PK studies collect urine and feces specimens, a derived specimen type "URINE+FECES" needs to be calculated in PKMERGE, and we discuss how to put the weights of the derived specimens, and the sum of the weights, into CDISC standard data to facilitate analysis. Other challenges include carrying the last record forward for early-discharge subjects to the maximum discharge visit in the dataset, and representing this in a cumulative graph. This paper provides detailed steps for resolving each challenge to create CDISC-compliant data modules and analysis presentations. With reference to other industry white papers, our process meets industry standards and produces high-quality plots using the powerful SAS Graph Template Language.
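To make the cumulative-excretion and carry-forward ideas concrete, here is a small Python sketch for one subject. It assumes recovered amounts are already expressed as percent of dose per collection interval, that the derived URINE+FECES value is the plain sum of the two specimens, and that any interval after the subject's last collection (early discharge) contributes nothing, so the last cumulative value is carried forward to the study's maximum interval. These are illustrative assumptions, not the authors' derivation rules.

```python
def cumulative_recovery(urine, feces, all_intervals):
    """Cumulative % of dose recovered per interval for URINE, FECES and the
    derived URINE+FECES specimen.

    urine, feces: {interval: percent of dose recovered in that interval}
    all_intervals: ordered intervals up to the maximum discharge visit.
    Intervals with no collection add 0, which carries the previous
    cumulative value forward (assumed to occur only after discharge).
    """
    out = {"URINE": [], "FECES": [], "URINE+FECES": []}
    cum_u = cum_f = 0.0
    for interval in all_intervals:
        cum_u += urine.get(interval, 0.0)
        cum_f += feces.get(interval, 0.0)
        out["URINE"].append(cum_u)
        out["FECES"].append(cum_f)
        out["URINE+FECES"].append(cum_u + cum_f)
    return out

# Hypothetical subject discharged after 48 h; the study's last interval is 48-72h.
intervals = ["0-24h", "24-48h", "48-72h"]
result = cumulative_recovery({"0-24h": 40.0, "24-48h": 10.0},
                             {"0-24h": 5.0, "24-48h": 25.0}, intervals)
print(result["URINE+FECES"])
```

Treating a missing interval as zero is only safe after discharge; a missed sample during the stay would need its own handling rule.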
AD-094 : A SAS and VBScript cyborg to send emails effectively.
Nikita Sathish, Seattle Genetics
Clinical trial monitoring requires that SAS® programmers periodically generate statistical analysis reports and distribute them to cross-functional teams for review. Most reports are distributed via email or an enterprise content management system. Manually composing emails, however, can introduce errors: recipients can be omitted, formatting errors can occur, and reports may not be sent in a timely manner. SAS® programmers can avoid these limitations by sending reports directly from SAS® to large, customized email distribution lists, and can further improve report distribution by combining SAS® with VBScript. VBScript lets programmers overcome security concerns associated with SAS® because they do not have to store login credentials or modify the configuration file. We recommend that programmers send emails from an SMTP server through SAS® using VBScript. This approach allows programmers to send emails with attachments, pass SAS® macro variables through to VBScript, and build dynamic email content.
AD-104 : A Simple SAS Utility to Combine Existing RTF Tables/Figures and Create a Multi-level Bookmark Hierarchy and a Hyperlinked TOC
Lugang Larry Xie, Johnson & Johnson
RTF is a popular format that most sponsors in the pharmaceutical industry adopt for tables, listings and figures. However, there is an unmet need to concatenate the outputs into a single presentable RTF file independent of the computing system. This paper presents a simple SAS approach to concatenate any SAS-generated RTF files, including both tables/listings and figures, and to create a multi-level bookmark hierarchy and a hyperlinked TOC without modifying the macros/programs that create the individual outputs. It can be executed on any platform with SAS installed. In addition, the paper introduces a simple approach to converting the combined RTF file to PDF format.
AD-107 : Auto-generation of Clinical Laboratory Unit Conversions
Alan Meier, Cytel
When mapping local labs to SDTM data structures, a major headache is dealing with the free text that sites use to indicate the original reporting units, in order to generate conversion factors to the sponsor's standard reporting units. Most lab units have an amount-per-volume structure, where amounts are normally some multiple of mass (grams) or substance (moles), counts (cells), International Units, or Enzyme Units, and volumes are mainly in liters. This paper looks at a strategy for automating much of this task with SAS through a three-step process: step 1, parse and categorize the amounts and volumes; step 2, normalize to a published conversion factor; and step 3, calculate the final conversion factor.
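The three steps can be sketched in Python for the simple mass- or mole-per-volume case. Assumptions: only the prefixes in `PREFIX` and only g, mol, and L are recognized; SI prefix arithmetic stands in for the published conversion-factor lookup of step 2; and mass-to-mole conversions, which need a molecular weight, are rejected rather than handled. Counts and International/Enzyme Units are out of scope for this sketch.

```python
import re

# Multipliers for the prefixes handled in this sketch (an assumed subset).
PREFIX = {"": 1.0, "d": 1e-1, "c": 1e-2, "m": 1e-3, "mc": 1e-6,
          "u": 1e-6, "µ": 1e-6, "n": 1e-9, "p": 1e-12}
AMOUNT_RE = re.compile(r"^(mc|[dcmuµnp])?(g|mol)$")
VOLUME_RE = re.compile(r"^([dcmuµ])?l$")

def parse_unit(text):
    """Step 1: split 'amount/volume' free text into
    (amount multiplier, base amount unit, volume multiplier)."""
    amount, volume = text.strip().replace(" ", "").split("/")
    a = AMOUNT_RE.match(amount.lower())
    v = VOLUME_RE.match(volume.lower())
    if not (a and v):
        raise ValueError(f"unrecognized unit: {text!r}")
    return PREFIX[a.group(1) or ""], a.group(2), PREFIX[v.group(1) or ""]

def conversion_factor(source, target):
    """Steps 2-3: factor such that value_in_source * factor = value_in_target."""
    s_amt, s_base, s_vol = parse_unit(source)
    t_amt, t_base, t_vol = parse_unit(target)
    if s_base != t_base:
        raise ValueError("g <-> mol needs a molecular weight; not handled here")
    return (s_amt / s_vol) / (t_amt / t_vol)

print(conversion_factor("mg/dL", "g/L"))
print(conversion_factor("mcg/mL", "mg/L"))
```

Because the factors come from floating-point arithmetic, results should be compared with a tolerance (or rounded) rather than tested for exact equality.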
AD-109 : Tool Development Methods for Implementing and Converting to New Controlled Terminology in SDTM datasets
Martha O'Brien, Covance
Keith Shusterman, Reata
Controlled terminology (CT) for SDTM datasets allows for easier review by committee members, other programmers, consultants, consulting companies, the FDA, and many others. This may ultimately reduce the time it takes to bring a drug or device to market. Having a proper method for choosing and implementing a newer version of CT is not only necessary but vital to submission acceptance. Currently the FDA requires only that CT versions be 2011-06-10 or later, and with quarterly releases there are many versions to choose from. This can make the choice difficult when starting a study, and a sponsor may also decide after a study starts that up-versioning the CT is necessary. Before implementing a new version, sponsors will also need to harmonize the new CT with the sponsor-specific values they have added to extensible codelists. A partially automated process for converting to a newer CT not only eases time constraints but also reduces the possibility of human error. This paper provides a process for creating tools for up-versioning CT during an ongoing study.
AD-111 : Learning SAS® GTL Basics with Practical Examples
Jinit Mistry, Seattle Genetics
Over time, business needs for data visualization have evolved with the increasing complexity of clinical trials. There is increased demand for complex graphs in clinical study reports, based on statistical analysis plans. For efficacy endpoints of oncology trials, complex graphs such as KM plots, forest plots, waterfall plots, spider plots, and box plots require more robust and customizable SAS® code. To address this need, SAS® introduced the statistical graphics (SG) procedures and the Graph Template Language (GTL). The SG procedures use an associated GTL template in the background, but their ability to create complex graphs is limited. This makes GTL critical to understand in order to create customizable templates and associate them with the data to accomplish the goals of the analyses, because GTL can create more customizable graphs than the SG procedures. This paper discusses key GTL statements and how a single GTL template can work for different scenarios. Some relevant industry examples are also shared.
AD-118 : Automated Dynamic Data Exchange (DDE) Replacement Solution for SAS® GRID
Ajay Gupta, PPD Inc
In the pharmaceutical and CRO industries, Excel® is widely used for mapping specs and reporting. The SAS® System offers several ways to export data to a Microsoft Excel spreadsheet and apply Excel formatting, but Dynamic Data Exchange (DDE) is the only technique that provides total control over the Excel output. DDE uses a client/server relationship to enable a client application to request information from a server application. SAS is always the client: it requests data from server applications, sends data to server applications, or sends commands to server applications. Unfortunately, DDE is not supported on SAS GRID computing. This paper explores a replacement solution for DDE on SAS GRID using SAS stored processes, Visual Basic for Applications (VBA), and the SAS Add-In for Microsoft Office. The paper then explores automating the process and extending the solution to format Microsoft Word® documents.
AD-121 : The Application of SAS-Excel Handshake in DDT
Maggie Ci Jiang, Teva Pharmaceuticals
Executing SAS code embedded in Excel is always challenging because the handshake takes place between two different applications, SAS and Excel. In clinical trials, when we perform a CDISC data conversion, for example, instead of writing separate SAS programs to convert the Define Definition Table (DDT) to SAS datasets, we would like to write the SAS code in Excel so that we can compare the SAS code side by side with the mapping rules and definitions recorded in the Excel file that serves as the DDT. This approach is particularly convenient for clinical programming development because it makes the code mapping more visible and gives reviewers an easier way to trace the consistency and accuracy of the conversion between the rules and the real-time SAS code. This paper presents the challenges of this SAS-Excel handshake with a complete real-world CDISC SDTM data conversion example. The individual SAS-Excel handshake examples in this paper include simple SAS code, SAS built-in functions, and external customized macros. The benefits and drawbacks of using such a SAS-Excel handshake are also discussed.
AD-129 : Automate the Mundane: Using Python for Text Mining
Nathan Kosiba, Rho, Inc
As programmers and biostatisticians, we have a number of tasks that boil down to a "copy and paste from document A to document B" scenario. These tasks range from copying the name of the study and sponsor into a SAP to pulling inclusion/exclusion criteria for the TI domain. We perform these tasks ad nauseam. While SAS® is very good at dealing with structured data and analyses, it is not well equipped to deal with unstructured text such as a protocol. Python, on the other hand, has numerous tools for handling unstructured text and breaking it into meaningful pieces of information. While Python can do a vast array of things, we focus on how to process the information in a protocol to populate metadata and create Trial Design datasets. Using Python and regular expressions, we can process large amounts of information held in a protocol or other document and retrieve what we need without much manual effort. These tools are often written with user interfaces included, so no programming knowledge is required of the end user. In addition, Python integrates with a variety of existing systems and/or produces output that is easily used as input for these systems. This flexibility supports nearly fully automated results. This presentation explores some of the Python packages that facilitate this process and shows how Python can be used to automate mundane tasks.
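A minimal regular-expression sketch of pulling numbered inclusion/exclusion criteria out of protocol text into TI-style records. The heading pattern, the "number-dot-text" criterion layout, the sample protocol text, and the INCL01/EXCL01 test-code convention are assumptions about one particular layout; real protocols vary and usually need more patterns (sub-bullets, wrapped lines, tables).

```python
import re

# A section heading such as "5.1 Inclusion Criteria" (optional numbering).
HEADING_RE = re.compile(r"^\s*[\d.]*\s*(inclusion|exclusion)\s+criteria\s*$",
                        re.IGNORECASE)
# A numbered criterion such as "1. Male or female aged 18 to 65 years."
ITEM_RE = re.compile(r"^\s*(\d+)\.\s+(.+?)\s*$")

def extract_ti(protocol_text, studyid):
    """Return TI-style records (one dict per criterion) found in the text."""
    records, category = [], None
    for line in protocol_text.splitlines():
        heading = HEADING_RE.match(line)
        if heading:
            category = heading.group(1).upper()   # INCLUSION or EXCLUSION
            continue
        item = ITEM_RE.match(line)
        if item and category:
            prefix = "INCL" if category == "INCLUSION" else "EXCL"
            records.append({
                "STUDYID": studyid,
                "DOMAIN": "TI",
                "IETESTCD": f"{prefix}{int(item.group(1)):02d}",
                "IETEST": item.group(2),
                "IECAT": category,
            })
    return records

PROTOCOL = """\
5.1 Inclusion Criteria
1. Male or female aged 18 to 65 years.
2. Body mass index between 18 and 32 kg/m2.
5.2 Exclusion Criteria
1. History of hepatic disease.
"""
for rec in extract_ti(PROTOCOL, "ABC-101"):
    print(rec["IETESTCD"], rec["IETEST"])
```

Anchoring the heading pattern to the whole line keeps a criterion such as "3. Meets all inclusion criteria" from being mistaken for a section heading.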
AD-136 : Why we should learn Python
Kevin Lee, Clindata Insight
Python is one of the most popular languages today. It can be used to build just about anything, and it is a great language for data analysis, scientific computing, application development, back-end web development, machine learning in particular, and much more. Python is currently featured in 70% of introductory programming courses at US universities, and the latest report from Forbes states that Python grew by more than 450 percent in 2017. As statistical programmers, we have been using SAS, and because of Python's popularity, we wonder whether we should learn Python. The paper starts with the current state of Python implementation and its future. It then shows basic concepts of Python programming and their similarities to and differences from SAS programming. It introduces the benefits of learning Python, including opportunities in data science and machine learning, career opportunities, and high salaries, and it discusses Python's weaknesses, such as regulatory restrictions and a lack of metadata. Finally, the paper discusses the future of statistical programming using Python.
AD-159 : Automate the Process to Ensure the Compliance with FDA Business Rules in SDTM Programming for FDA Submission
Xiangchen (Bob) Cui, Alkermes, Inc
Hao Guan, Alkermes Inc
Min Chen, Alkermes
Letan (Cleo) Lin, Alkermes Inc.
The FDA has published the "FDA Business Rules" and expects sponsors to submit SDTM datasets that comply with these rules as well as with the CDISC IG. These rules assess whether the data support regulatory review and analysis. Some of them are specific to FDA internal processes rather than to CDISC SDTM standards. Pinnacle 21 is the tool most commonly used by both industry and the FDA to check compliance with both FDA Business Rules and CDISC rules. However, Pinnacle 21 is usually used at a late stage of the SDTM programming development cycle, and even when it is used at a very early stage, it cannot help users resolve its "Error" and/or "Warning" findings. This paper presents a systematic approach to automating the SDTM programming process to ensure compliance with the FDA Business Rules. It covers study data collection design, data collection (edit checks), a standard SDTM programming process, and in-house macros that automatically report and/or fix issues of non-compliance with the FDA Business Rules. This approach avoids the inefficient use of resources on repeated verification of compliance and/or resolution of Pinnacle 21 findings for these rules. In fact, some of these non-compliance issues are very costly, or too late, to fix at a late stage. These hands-on experiences are shared to help readers apply this methodology to prepare SDTM datasets that comply with both the FDA Business Rules and CDISC standards for FDA submission, ensuring technical accuracy and submission quality in addition to cost-effectiveness and efficiency.
Raj Kiran Boddu, Takeda
AD-203 : Large-scale TFL Automation for regulated Pharmaceutical trials using CDISC Analysis Results Metadata (ARM)
Stuart Malcolm, Frontier Science (Scotland) Ltd
The creation of a Clinical Study Report (CSR) for a Phase II/III pharmaceutical clinical trial involves the production of several hundred Tables, Listings and Figures (TFL). This can be time-consuming when each TFL is programmed manually, which makes it a natural candidate for an automated solution that 'batch processes' all the TFL. However, although many TFL will be standard 'variations on a theme', there will also be many study-specific TFL that may have been developed in collaboration between Investigators and Statisticians, and these require custom programming and a deeper understanding of the data. In addition, Sponsors often require CDISC Analysis Results Metadata (ARM) in the define.xml for submission to regulators, and this is often produced as an additional work package at the end of the trial, once the TFL have been delivered. This paper outlines an approach to TFL automation that creates the CDISC Analysis Results Metadata at the start of the process, not the end, and then uses the metadata to generate the TFL with a SAS program structure that allows standardised TFL to be created while also providing the flexibility to easily incorporate study-specific analyses.
AD-208 : Define-XML with ARM
Lei Jing, FMD K&L
Chao Wang, FMD K&L
Currently, Analysis Results Metadata (ARM) is a required component of the PMDA data submission package. ARM assists the reviewer by providing traceability from key efficacy and safety analysis results to the analysis datasets and dataset-related elements, which also adds significant value to a regulatory submission. However, developing ARM in Define-XML can be very time-consuming if it is handled manually. This paper presents an effective approach to integrating ARM automatically into an existing Define-XML v2.0 file. A macro converts all the required information from the ARM metadata into valid XML syntax and then inserts the XML code into the existing Define-XML. This automated process shortens the development cycle and increases package quality.
AD-211 : Validating Hyperlinks in SDTM define.xml Using Python
Brandon Welch, Rho, Inc
Greg Weller, Rho, Inc.
As a one-stop location for a clinical trial's metadata, the define.xml file is a vital piece of an FDA submission. This file contains many hyperlinks. Some links are internal to the XML file, while others point to external locations. Of particular interest in SDTM submissions are the links that map externally to the study's annotated case report form (CRF), a PDF document. For a particular SDTM variable, a user clicks the hyperlink and the annotated CRF opens to the variable's origin page. Depending on how these links are created in the define.xml, a page hyperlink occasionally fails to open the correct annotated CRF page. Manually testing each hyperlink is tedious and error-prone. Fortunately, there are powerful Python modules for analyzing PDF and XML files. In this paper, we describe a technique using the Python programming language that checks each define.xml link against each page in the CRF PDF document. The techniques presented offer a good overview of basic Python techniques that will educate programmers at all levels.
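The cross-check can be sketched with the standard library's ElementTree. This sketch assumes a Define-XML 2.0 file whose CRF origins use def:PDFPageRef with physical page numbers in PageRefs, and it assumes the text of each annotated CRF page has already been extracted into a dictionary (a PDF library would do that part; it is not shown). A link is reported when the referenced page's text does not mention the variable name, which is a simple heuristic rather than the authors' exact rule; the sample define.xml and page texts are made up.

```python
import xml.etree.ElementTree as ET

NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3",
      "def": "http://www.cdisc.org/ns/def/v2.0"}

def crf_page_refs(define_xml):
    """Yield (variable name, page number) for each CRF page reference."""
    root = ET.fromstring(define_xml)
    for item in root.iter(f"{{{NS['odm']}}}ItemDef"):
        for ref in item.iterfind("def:Origin/def:DocumentRef/def:PDFPageRef", NS):
            for page in ref.get("PageRefs", "").split():
                yield item.get("Name"), int(page)

def suspect_links(define_xml, page_text):
    """Return (variable, page) pairs whose target page never mentions the variable.
    page_text: {physical page number: extracted annotation text} (assumed input)."""
    return [(name, page) for name, page in crf_page_refs(define_xml)
            if name not in page_text.get(page, "")]

DEFINE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:def="http://www.cdisc.org/ns/def/v2.0">
  <Study><MetaDataVersion>
    <ItemDef OID="IT.AE.AETERM" Name="AETERM" DataType="text">
      <def:Origin Type="CRF"><def:DocumentRef leafID="LF.blankcrf">
        <def:PDFPageRef PageRefs="12" Type="PhysicalRef"/></def:DocumentRef></def:Origin>
    </ItemDef>
    <ItemDef OID="IT.AE.AESEV" Name="AESEV" DataType="text">
      <def:Origin Type="CRF"><def:DocumentRef leafID="LF.blankcrf">
        <def:PDFPageRef PageRefs="13" Type="PhysicalRef"/></def:DocumentRef></def:Origin>
    </ItemDef>
  </MetaDataVersion></Study>
</ODM>"""

pages = {12: "AETERM  AESEV  AESTDTC", 13: "VSORRES  VSPOS"}
print(suspect_links(DEFINE, pages))
```

Page references given as named destinations, or as FirstPage/LastPage ranges, would need extra handling before `int(page)` is applied.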
AD-215 : Use of SAS Merge in Adverse Events Reporting for DSUR
Andrew Wang, Celgene
In reporting adverse events for the Development Safety Update Report (DSUR), statistical programmers often need to separate AEs already reported in previous years from AEs present in the current database. Identifying the adverse events already reported can be challenging because there is no universally accepted way to identify an adverse event, and all types of data issues may be present in the data. Different programmers might use different identification variables, which presents a special challenge for validation as well. For an ongoing study, the database may change over time: AEs that were uncoded last time may be coded this time, ongoing AEs will likely have a later end date, and an AE could be deleted from the database or split into multiple AEs. Many scenarios can occur. Based on the author's experience, this paper discusses how to use the SAS merge to pick out already-reported AEs. Many-to-many, one-to-many, and one-to-one merges are discussed in detail for this particular application. A special case illustrates that even a one-to-one merge can lead to wrong results, and suggestions for selecting the ID variables are given.
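The core of the task (matching current AEs against previously reported ones on a chosen set of ID variables, and checking for duplicate keys before trusting the match) can be sketched in Python. The key variables shown (USUBJID, AEDECOD, AESTDTC) and the sample records are assumptions; as the abstract notes, choosing the ID variables is exactly what needs care, and values that change between data cuts, such as an end date, make poor keys.

```python
from collections import Counter

def split_reported(previous, current, keys):
    """Split `current` AE records into (already_reported, new, duplicate_keys).

    Duplicate key values on either side are returned so they can be reviewed:
    they are the situations in which a merge becomes one-to-many or
    many-to-many and records can be matched to the wrong partner.
    """
    def key_of(rec):
        return tuple(rec[k] for k in keys)

    duplicate_keys = {
        side: sorted(k for k, n in Counter(map(key_of, recs)).items() if n > 1)
        for side, recs in (("previous", previous), ("current", current))
    }
    reported = set(map(key_of, previous))
    already = [r for r in current if key_of(r) in reported]
    new = [r for r in current if key_of(r) not in reported]
    return already, new, duplicate_keys

KEYS = ("USUBJID", "AEDECOD", "AESTDTC")
previous = [{"USUBJID": "01", "AEDECOD": "Headache", "AESTDTC": "2018-01-02", "AEENDTC": ""}]
current = [
    {"USUBJID": "01", "AEDECOD": "Headache", "AESTDTC": "2018-01-02", "AEENDTC": "2018-03-01"},
    {"USUBJID": "01", "AEDECOD": "Nausea", "AESTDTC": "2018-05-10", "AEENDTC": ""},
]
already, new, dups = split_reported(previous, current, KEYS)
print(len(already), len(new), dups)
```

Here the headache is still recognized as already reported even though its end date was filled in, because AEENDTC is deliberately left out of the key.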
AD-228 : User-Defined Multithreading with the SAS® DS2 Procedure: Performance Testing DS2 Against Functionally Equivalent DATA Steps
Troy Hughes, Datmesis Analytics
The Data Step 2 (DS2) procedure represents the first opportunity that developers have had to build custom, multithreaded processes in Base SAS®. Multithreaded processing debuted in SAS 9, when built-in procedures such as SORT, SQL, and MEANS were threaded to reduce runtime. Despite this advancement, and in contrast with languages such as Java and Python, SAS 9 still did not give developers the ability to create custom, multithreaded processes. This limitation was overcome in SAS 9.4 with the introduction of the DS2 procedure: a threaded, object-oriented version of the DATA step. However, because DS2 relies on methods and packages (neither of which had previously been available in Base SAS), both DS2 instruction and literature have predominantly fixated on these object-oriented aspects rather than on DS2 multithreading. This text is the first to focus solely on DS2 multithreading and its performance advantages. Common DATA step tasks such as data cleaning, transformation, and analysis are demonstrated, after which functionally equivalent DS2 code is introduced. Each paired example concludes with performance metrics that inarguably demonstrate faster runtimes with the DS2 language, even on a stand-alone laptop. All examples can be run in Base SAS and do not require in-database processing or the purchase of the DS2 Code Accelerator or other optional SAS components.
AD-234 : Camouflage your Clinical Trial with Machine Learning and AI
Ajith Baby Sadasivan, Genpro Life Sciences
Limna Salim, Genpro Life Sciences
Akhil Vijayan, Genpro Life Sciences
Anoop Ambika, Genpro Life Sciences
The internal focus for most pharmaceutical companies today is on reviewing and implementing significant changes in R&D strategy. To aid this, greater transparency and sharing of clinical study reports and patient-level data for further research is crucial. In July 2018, the US Food and Drug Administration published guidance that facilitates the use of Electronic Health Record (EHR) data in clinical investigations. With EHR data being made available for analysis and the growing advocacy of Big Data in healthcare, issues of data privacy and provenance arise, which can be overcome through data anonymization. Clinical data in its various forms, such as individual patient data, data from EHRs, or the CSR, is extremely complex, and developing the tools required to analyze such data remains a challenge. This paper explores the possibility of developing a dynamic software framework using AngularJS, Python®, SAS®, Rasa NLU, and spaCy, in compliance with EMA Policy 0070, for easy and effective anonymization/pseudonymization by building a trainable named entity recognition system, or similar counterparts, that does not need programming work to improve. The biggest challenge is identifying the personal identifiers and quasi-identifiers to be anonymized in the data and masking them so that the original character of the data is not altered. The paper describes methods to overcome this with the help of AI and ML methodologies and gives the user the authority to approve the masking of required IDs proposed by the tool itself after proper training.
AD-278 : Creating a DOS Batch File to Run SAS® Programs
David Franklin, IQVIA
We often have many SAS programs to run in a directory. While it is possible to run each one individually, it is better to create a DOS batch file listing the programs to run and the order in which to run them. This paper looks at a SAS macro that takes the list of SAS programs in a user-specified directory and creates a DOS batch file that can then be run to execute all the SAS programs. Also presented is a small SAS program that you can run at the end to send yourself an email saying when the programs finished and whether there are any issues in the SAS log to review!
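The paper's tool is a SAS macro; as a language-neutral illustration of the same idea, this Python sketch builds the text of a DOS batch file that runs each .sas program in a directory in alphabetical order, and scans a SAS log for lines starting with ERROR or WARNING. The `sas` executable name, the file names, and the use of the `-sysin`/`-log` options are assumptions to adapt to the local installation; the email step is not shown.

```python
from pathlib import Path

def batch_text(programs, sas_exe="sas"):
    """Return DOS batch file text that runs each SAS program in sorted order,
    writing each log next to its program (program.sas -> program.log)."""
    lines = ["@echo off"]
    for prog in sorted(programs):
        log = Path(prog).with_suffix(".log")
        lines.append(f'"{sas_exe}" -sysin "{prog}" -log "{log}"')
    return "\r\n".join(lines) + "\r\n"          # DOS line endings

def write_batch_file(program_dir, name="run_all.bat", sas_exe="sas"):
    """Write the batch file for all .sas files found in program_dir."""
    programs = [str(p) for p in Path(program_dir).glob("*.sas")]
    target = Path(program_dir) / name
    # newline="" stops Python from translating the \r\n already in the text.
    with open(target, "w", newline="") as fh:
        fh.write(batch_text(programs, sas_exe))
    return target

def log_issues(log_text):
    """Return the lines of a SAS log that begin with ERROR or WARNING."""
    return [ln for ln in log_text.splitlines()
            if ln.startswith(("ERROR", "WARNING"))]

print(batch_text(["t_dm.sas", "t_ae.sas"]))
print(log_issues("NOTE: fine\nWARNING: check this\nERROR: failed"))
```

Sorting by file name is the simplest ordering rule; if programs depend on each other (for example, ADaM before TLFs), an explicit list from the user is safer.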
AD-279 : Statistical application in Image Processing by Integrating C and SAS
Vidhyavathi Venkataraman, Biomarin Pharmaceuticals
Srinand Ponnathapura Nandakumar, Alder Pharmaceuticals
Anupama Datta, TransUnion
Image processing is the analysis and manipulation of image data after it has been decoded into a numerical format. Several filters based on statistical principles are applied to this numerical data to improve the quality of the image. Beginning with SAS 9.2, a unique procedure is available to incorporate the flexibility of C programs into SAS. In this paper, we discuss how to customize C functions and integrate them into SAS. This feature is illustrated by addressing common image-analysis issues such as image size standardization, magnification, and pixel transformation/contrasting. Statistical tools such as weighted regression (bilinear interpolation), the Expectation Maximization (EM) algorithm in the Region of Interest (RoI), and Histogram Equalization techniques are detailed for further image enhancement.
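Of the enhancement techniques listed, histogram equalization is the easiest to show compactly. The paper implements its filters as C functions called from SAS; the Python sketch below is only a stand-alone illustration of the standard formula, mapping each gray level v to round((cdf(v) - cdf_min) / (N - cdf_min) * (L - 1)), for a flat, non-empty list of integer pixel values with L gray levels.

```python
from collections import Counter

def equalize(pixels, levels=256):
    """Histogram-equalize a non-empty flat list of integer gray levels in [0, levels-1]."""
    counts = Counter(pixels)
    cdf, running = {}, 0
    for value in sorted(counts):          # cumulative count per observed level
        running += counts[value]
        cdf[value] = running
    n = len(pixels)
    cdf_min = cdf[min(cdf)]               # count of the darkest observed level
    if n == cdf_min:                      # a single gray level: nothing to spread
        return list(pixels)
    scale = (levels - 1) / (n - cdf_min)
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]

# A tiny, low-contrast "image" stretched to the full 0-255 range.
print(equalize([50, 50, 100, 150]))
```

A 2-D image would be flattened row by row before the call and reshaped afterwards; the mapping itself depends only on the gray-level histogram.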
AD-299 : Best Practices for ISS/ISE Dataset Development
Bharath Donthi, Statistics & Data Corporation
Lingjiao Qi, Statistics & Data Corporation
The integrated summary of safety (ISS) and integrated summary of efficacy (ISE) are vital components of a successful submission for regulatory approval in the pharmaceutical industry. ISS and ISE allow reviewers to easily compare individual outcomes, tracking subjects' results across the entire clinical development lifespan of the investigational product. Furthermore, ISS/ISE facilitate broad views of the investigational product's overall efficacy and safety profiles. However, building integrated datasets is a challenging task, as it requires the programmer to achieve consistent structures and formats while also ensuring that each dataset is CDISC-compliant. This paper provides best practices for ISS and ISE dataset development to guide the efficient design and production of integrated analysis datasets. First, we discuss best practices for ensuring the consistency of integrated datasets by up-versioning all data to the same coding dictionaries (MedDRA, CTCAE, WHO, etc.) and by harmonizing all variable attributes (variable names, types, formats, labels, codes and decodes for categorical and ordinal variables, ranges for continuous variables, etc.). Next, we discuss CDISC requirements regarding the mapping of SDTM and ADaM. Then, we talk about how to handle some complex cases in developing integrated datasets, such as when one subject participates in multiple clinical studies included in the ISS/ISE. Finally, we touch on key points of analysis, involving consistent flag assignment across studies and proper application of integration methods for safety and efficacy analysis. This step-by-step guide enables the efficient and accurate creation of ISS and ISE datasets.
AD-316 : A Utility to Automate Reconciliation of Report Numbers and Titles
Valerie Williams, ICON Clinical Research
Ganesh Prasad, ICON Clinical Research
In clinical trials, analysis reports are generated in the form of tables, listings and figures (TLFs) for incorporation into Clinical Study Reports (CSRs) that are submitted to regulatory bodies for review and approval of a study drug or device. These TLFs are distinguished from one another by their unique report numbers and titles. A change in either of these two important pieces of metadata gives the output a completely different meaning, so it is very important that they are accurate. Before CSR submission, lead programmers need to identify and fix cases of missing TLFs and/or incorrect TLF report numbers or titles. A utility program was developed to automate this process and reduce the time required to reconcile the report header information. This paper describes the following main features of the report number and title reconciliation utility: (1) reading report numbers and titles from a .csv file of consolidated TLFs (created from bundled .pdf report files) or from an ISOTOP.xml file containing titles and footnotes; (2) comparing them with the List of Tables (LoT) .xlsx file; (3) identifying discrepancies between the two files; and (4) generating an Excel file with a summary report of the highlighted differences. This utility program was developed in PC SAS and runs in Windows SAS 9.3/9.4 and Unix SAS 9.4 SAS GRID environments.
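Steps 2 and 3 of the utility (comparing report numbers and titles and identifying discrepancies) reduce to a dictionary comparison. In this Python sketch both sources are assumed to have been read already into {report number: title} mappings; reading the .csv/ISOTOP.xml/.xlsx files and writing the Excel summary are not shown, and the sample report numbers and titles are made up. Titles are compared after collapsing runs of whitespace, a design choice that ignores spacing-only differences but still flags case or wording changes.

```python
def reconcile(outputs, lot):
    """Compare {report number: title} from the produced outputs with the List of Tables."""
    def norm(title):
        return " ".join(title.split())   # collapse whitespace only

    produced, planned = set(outputs), set(lot)
    return {
        "missing_outputs": sorted(planned - produced),   # in LoT, not produced
        "not_in_lot": sorted(produced - planned),        # produced, not in LoT
        "title_mismatch": sorted(n for n in produced & planned
                                 if norm(outputs[n]) != norm(lot[n])),
    }

outputs = {"14.1.1": "Demographics", "14.3.1": "Adverse  Events",
           "14.3.9": "Extra Table"}
lot = {"14.1.1": "Demographics", "14.1.2": "Disposition",
       "14.3.1": "Adverse Events by SOC"}
print(reconcile(outputs, lot))
```

Report numbers are sorted as plain strings here, so "14.1.10" sorts before "14.1.2"; a numeric sort key would be needed if the order of the summary matters.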
AD-326 : Interactive TLFs - A Smarter Way to Review your Statistical Outputs
Bhavin Busa, Vita Data Sciences (a division of Softworld, Inc.)
In the clinical industry, data is represented and analyzed in the form of tables, listings and figures (TLFs), which are typically generated using SAS®. For Biometrics teams, plotting data and presenting statistical information using SAS ODS and SAS graphics procedures is not new. However, the output generated using SAS ODS and SAS graphics procedures is for the most part static: it lets end users "look" at the output but not explore it or drill down further. With the rapidly increasing availability of data visualization and analytic tools, the industry landscape is shifting. End users now want to 'see' their data more interactively, identify trends, visualize patient profiles, and review results at a high level while still being able to drill down to get a complete picture. They also want access to their ongoing study data on 'Day 1' rather than waiting for the study CSR TLFs to be programmed before they can use the data for review or monitoring. In this paper, we describe how we have used the power of SAS® and TIBCO Spotfire® to build "Interactive TLFs" from SDTM datasets to meet these demands. We will demonstrate through a case study how a clinical team can use this platform to review typical statistical outputs/TLFs (e.g., demographics, disposition, AEs, concomitant medications, laboratory and vital signs) more interactively, thereby avoiding flipping through hundreds if not thousands of static pages.
Data Standards
DS-049 : Streamlining the Metadata Management Process Using SAS® Life Science Analytics Framework
Alex Ford, SAS
It was not long into my clinical programming career before I discovered that CDISC is truly an acronym for "Can Do It Somewhat Correctly". Each run of a validation report uncovered new warnings or errors, followed by tracking down the source of those issues and logging and reporting them for a define.xml. The latest release of the SAS® Life Science Analytics Framework (LSAF) provides a centralized framework where standards can be imported and live alongside a study and its data, managed through a graphical user interface. By associating a data standard, controlled terminology, and dictionaries with a study, team leads have the data and information necessary to produce a define.xml at the click of a button. Join us as we explore the metadata management features available in LSAF 5.1 that enable programmers of all levels to manage data standards correctly the first time, saving studies both time and money.
DS-055 : Achieving Zen: A Journey to ADaM Compliance
Kjersten Offenbecker, Clinical Solutions Group (CSG)
Alice Ehmann, Clinical Solutions Group (CSG)
Kirsty Lauderdale, Clinical Solutions Group
For many programmers and statisticians, creating compliant ADaM specifications, programs and datasets is confusing and a bit overwhelming. What should be included, and what should be left out? What level of traceability is needed? What information should be presented in the specs, and what efficiencies can be used in the code? Follow us as we work through some common pitfalls and map out a path that will help you navigate this winding road and produce a more compliant product that is clearer for everyone to follow and understand. We will shine a light on how to create compliant specifications that lead to compliant datasets that are everything the FDA is looking for.
DS-080 : Considerations in Time to Event Analysis for a Drug-Device Study
Wenying Tian, Shire
Karin LaPann, Shire
A device can be a drug delivery system in a clinical trial. In this case, device functionality is a safety measurement in the study. Device data can be viewed from two points of view: the device level and the subject level. At the device level, a device can have many events: device implant, device malfunction/failure, device adjustment, and device explant. At the subject level, one subject can have more than one device over the life cycle of the study, and for some periods of time the subject may have no device at all. Snapshots at various time points, as well as time periods, are important to the safety measurement. After data collection, the device-related information is mapped into seven Device domains. Device-specific ADaM datasets are needed for device safety reporting. This paper discusses three device-specific ADaM datasets: ADDL, ADDPR and ADTTE. We start by creating a Device-Level Analysis Dataset (ADDL) to capture device-level characteristics. The purpose of ADDPR is to look at the device at the subject level, summarizing device functionality for each subject. The third dataset needed is a time-to-event dataset (ADTTE). Two events could cause device explant: device failure or an AE. We will discuss how to use ADTTE and ADDL flag variables to graph the two types of time to event and create two different figures. In addition, this paper covers the QC steps needed for this ADTTE dataset.
DS-082 : Incremental Changes: ADaMIG v1.2 Update
Nancy Brucken, Syneos Health
Brian Harris, MedImmune
Terek Peterson, Covance
Alyssa Wittle, Covance
Deb Goodfellow, Covance
The ADaM Implementation Guide (ADaMIG) has been available to industry since 2009, providing a standardized way to communicate and analyze study data. Improvements and clarifications were added in 2016 with the release of v1.1. Since that release, the ADaM team has been working on items that were not ready for v1.1 but are now ready for the next release, v1.2. These items include important clarifications to existing text, standard nomenclature for stratification variables within ADSL, and a recommended approach for bi-directional toxicity grades. In addition, an update on the removal of PARQUAL, the new permissible variable suggested for the Basic Data Structure (BDS), will be discussed. ADaMIG v1.2 will be discussed from the perspective of both the changes from v1.1 and the changes made since the public review of v1.2.
DS-087 : Strategy to Evaluate the Quality of Clinical Data from CROs
Charley Wu, Atara Biotherapeutics
More and more pharmaceutical and biotech companies (Sponsors) are using CROs for data management. High-quality data is key to statistical analysis, FDA data submission and drug approval. Because most clinical data management work is now done by CROs, evaluating the quality of the clinical data they produce is a big challenge for all sponsors. Owing to limited in-house resources, many sponsors only review the data manually or perform sporadic checks, which leaves many data issues unidentified before database lock. This paper introduces a comprehensive approach that includes both automatic and manual data review. Automatic review consists of 1) data structure checks, 2) new data checks, 3) edit checks, 4) SAE reconciliation, 5) PK reconciliation, 6) lab data normalization and reports, 7) critical variable checks, 8) statistical checks, and 9) ad-hoc reports, all implemented through SAS programming. Manual review is done by the medical monitor, Pharmacovigilance, Clinical Operations and Data Management. Manual review normally identifies 5-10% of data issues, while automatic review can identify over 90%. Many manual review findings can be turned into additional SAS programmed checks, further reducing the burden of manual review. For each data transfer, manual review usually takes about 1-2 weeks, while automatic review takes about 1-2 hours. This comprehensive approach greatly improves data quality and enables sponsors to lock the database with confidence.
DS-088 : Pacemaker Guy: De-Mystifying a Business Use Case for SDTM Standard and Medical Device Domains
Carey Smoak, S-Cubed
Donna Sattler, Bristol Myers Squibb
Fred Wood, Data Standards Consulting Group
Medical Device Standards can be applied to even the most complicated Medical Device clinical research studies. Many papers have been written on how to map certain kinds of data, such as Exposure or Lab data, but few on how to incorporate Medical Device data into the SDTM core standards. Considerations were made for both simple and complex data points when mapping to the SDTM standards. We learned that, just as in biologics, you need to plan for the unexpected with Medical Device studies. We take you through a subject's experience, showing not only the data mappings but also the procedure(s) and how to visually map the data. The goal is to leave the participant/reader curious enough to map their own Medical Device data to the SDTM standards sooner than currently expected. The more the Medical Device industry uses the SDTM MD standards, the more we can influence regulatory agencies' expectations and their tools for Medical Device domains.
DS-119 : Common Pinnacle 21 Report Issues: Shall we Document or Fix?
Ajay Gupta, PPD Inc
Pinnacle 21, also known as OpenCDISC Validator, provides great compliance checks against CDISC outputs such as SDTM, ADaM, SEND and Define.xml. This validation tool produces a report in Excel or CSV format that lists errors, warnings, and notices. At the initial stage of clinical programming, when the data is not very clean, this report can be very large and tedious to review. A programmer who is fairly new to this report might not be aware of some common issues and will have to depend fully on an experienced programmer to pave the road. Indirectly, this adds review time to the budget and might distract the programmer from real issues that affect data quality. In this paper, I will discuss some common issues with the Pinnacle 21 report messages generated when validating SDTM datasets and propose some solutions based on my experience. I will also discuss some scenarios in which it is better to document the issue in the reviewer's guide than to write workaround programming. While there is no one-size-fits-all solution, my intention is to give programmers a direction that might help them find the right solution for their situation.
DS-146 : Considerations when Representing Multiple Subject Enrollments in SDTM
Kristin Kelly, Pinnacle 21
Mike Hamidi, CDISC
In clinical trials, it has become more common for a study design to allow subjects to re-enroll in the same study or subsequent studies within a submission. For studies that allow subjects to re-screen for the same study, it may be difficult to determine how to represent the data for multiple enrollments in SDTM. There are a number of approaches seen in industry, but many pose issues. An example of this is creating multiple records in DM with the same USUBJID to represent each enrollment. Though this may seem the most straightforward approach, many tools used at FDA are configured to expect one record per subject and thus, the data may not readily load into their tools. Another approach is to assign different USUBJID values for the same subject within a study and across studies. This also creates issues for review because it is difficult to track the same subject across studies. This paper will focus on examples from industry as well as proposed solutions for representing this data in SDTM.
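Because many FDA tools expect one DM record per subject, a simple pre-submission check can flag any USUBJID that appears on more than one DM record. The Python sketch below assumes DM has already been read into a list of dictionaries keyed by SDTM variable names; the function name and the sample records are hypothetical, and the check does not address the opposite problem of one subject carrying several USUBJIDs.

```python
"""Hypothetical pre-submission check: one DM record per USUBJID."""
from collections import Counter
from typing import Dict, Iterable, Mapping


def duplicate_usubjids(dm_records: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Return each USUBJID that appears on more than one DM record, with its count."""
    counts = Counter(record["USUBJID"] for record in dm_records)
    return {usubjid: n for usubjid, n in counts.items() if n > 1}


if __name__ == "__main__":
    dm = [
        {"STUDYID": "ABC-101", "USUBJID": "ABC-101-001", "SUBJID": "001"},
        {"STUDYID": "ABC-101", "USUBJID": "ABC-101-002", "SUBJID": "002"},
        # Illustrative re-enrollment stored as a second DM record with the same USUBJID.
        {"STUDYID": "ABC-101", "USUBJID": "ABC-101-002", "SUBJID": "002R"},
    ]
    print(duplicate_usubjids(dm))  # {'ABC-101-002': 2}
```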
DS-148 : 7 Habits of Highly Effective (Validation Issue) Managers
Amy Garrett, Pinnacle 21
Pinnacle 21 Validator identifies problems in data; however, diagnostics, assessment and resolution of reported validation issues may feel like a complicated, never-ending process. In this presentation, we will discuss common challenges in managing data validation issues and how to handle them effectively. We will show you how to identify the source of validation issues, and how to classify them to understand when to fix or when to explain. We will also discuss cross-team collaboration, ways to improve your process, and habits that lead to faster issue resolution.
DS-151 : Bidirectionality to LOINC: Handling the Nitty Gritty of Lab Data
Bhargav Koduru, Seattle Genetics Inc
Laboratory data is often challenging to work with during analysis data set creation. This paper presents solutions to some of the complexities encountered when building the LB (SDTM) and ADLB (ADaM) data sets, including:
- How to use the Model Permissible variables of the Findings class in the LB SDTM data set (e.g., --SPCCND, --SPCUFL)
- How to identify the tests that can be graded by CTCAE 4.0, and determine their directionality
- How to plan and execute the ADaM dataset to support the shift tables associated with bidirectional tests (e.g., Glucose-High and Low)
- How to derive "Treatment Emergent" records in the ADLB ADaM datasets, bidirectional tests in particular
- How to associate the preferred terms between CTCAE and MedDRA to establish the clinical connection between an adverse event and a lab result (e.g., Neutropenia (MedDRA) and Neutrophil Count Decreased (CTCAE) can both be linked to the lab test "Neutrophils")
With LOINC (Logical Observation Identifiers Names and Codes) required for studies that start after March 15, 2020 for NDAs, ANDAs and certain BLAs, and after March 15, 2021 for certain INDs, I would like to share some thoughts on LOINC implementation in the LB data. For example, glucose measured in serum/plasma and glucose measured in urine would both have the same LBTESTCD, "GLUC", and often the same units (mg/dL); only the LBSPEC variable differs between them. LOINC is proposed to address this potential confusion by giving each test a unique six-part name.
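For the treatment-emergent item, one way to handle a bidirectional test such as glucose is to evaluate each direction separately against its own baseline grade. The Python sketch below is illustrative only; the `LabResult` class, the per-direction grade fields, the rule "grade worse than baseline in that direction", and treating a missing baseline as grade 0 are all our assumptions, not definitions from the paper or from CTCAE.

```python
"""Hypothetical sketch: direction-specific treatment-emergent flags for a bidirectional lab test.

Assumptions (not from the paper): each result carries one grade per direction,
0 means not graded/normal, a missing baseline counts as grade 0, and a result is
treatment-emergent in a direction when its grade in that direction exceeds baseline.
"""
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class LabResult:
    visit: str
    is_baseline: bool
    grade_high: int  # grade in the "high" direction (e.g., hyperglycemia)
    grade_low: int   # grade in the "low" direction (e.g., hypoglycemia)


def treatment_emergent_directions(results: List[LabResult]) -> Dict[str, Set[str]]:
    """Map each post-baseline visit to the directions in which the grade worsened."""
    baseline = next((r for r in results if r.is_baseline), None)
    base_high = baseline.grade_high if baseline else 0
    base_low = baseline.grade_low if baseline else 0

    flags: Dict[str, Set[str]] = {}
    for r in results:
        if r.is_baseline:
            continue
        directions: Set[str] = set()
        if r.grade_high > base_high:
            directions.add("HIGH")
        if r.grade_low > base_low:
            directions.add("LOW")
        if directions:
            flags[r.visit] = directions
    return flags
```

Keeping the two directions separate means a subject with a grade 1 high glucose at baseline can still be flagged for a new low-glucose event, which a single "worst grade" comparison would miss.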
DS-173 : Forewarned is forearmed or how to deal with ADSL issues
Anastasiia Oparii, Experis / Intego Group
The Subject-Level Analysis Dataset (ADSL) is an essential part of each study, helping to review information about the patient across a clinical trial. Moreover, it provides traceability between all the analysis datasets and the source data. Therefore, ADSL should be derived with special care. If you are really lucky, you won't even realize how many tricky questions it can raise when raw data is not clear enough or something simply goes wrong. This paper summarizes some common issues related to ADSL programming and suggests potential solutions to avoid problems in advance. In particular, it focuses on handling partially or completely missing dates and their connections with other derived variables, along with additional validation for variables selected from SDTM domains. Furthermore, it walks through some useful examples and provides SAS macros to identify issues.
DS-185 : Leveraging Intermediate Data Sets to Achieve ADaM Traceability
Yun (Julie) Zhuo, PRA Health Sciences
Traceability, a fundamental principle of ADaM, provides transparency and increases confidence for the FDA reviewers. Building traceability could be a daunting task for a complex ADaM data set especially when it involves multiple data sources and multi-step derivations. In this paper, we illustrate the benefits of using an intermediate data set to achieve traceability using example SDTM and ADaM data sets from a Phase III oncology study. Through the intermediate data set, along with its metadata, it is possible to trace the final analysis value to a record in the intermediate data set, and then from there to the source domains.
DS-196 : Practical Guide for Creating ADaM Datasets in Cross-over Studies
Neha Sakhawalkar, Rang Technologies
Kamlesh Patel, Rang Technologies
Analysis datasets (ADaM) are categorized into Subject Level Analysis Data (ADSL), Basic Data Structure (BDS), Occurrence Data Structure (OCCDS) and Other ADaM data structures. The first three are the most common and are used to analyze data in most parallel studies. Implementing the ADaM standard in such studies is relatively straightforward for experienced programmers and biostatisticians. However, for a variety of reasons, cross-over designs are used in far fewer clinical trials than parallel designs; hence, many programmers are less familiar with implementing ADaM datasets for cross-over studies. This paper focuses on ADaM datasets for cross-over studies, with details on the variables involved, the differences in how these variables are derived across the various data structures (ADSL, BDS and OCCDS), and an example of each data structure.
DS-202 : This Paper focuses on CDISC Questionnaires, Ratings and Scales (QRS) supplements and types of FDA Clinical Outcome Assessments
Shrishaila Patil, Quanticate International Ltd
CDISC develops SDTM (tabulation) and ADaM (analysis) QRS supplements that provide information on how to structure the data in a standard format for public domain and copyright-approved instruments. An instrument is a series of questions, tasks or assessments used in clinical research to provide a qualitative or quantitative assessment of a clinical concept or task-based observation. Controlled Terminology is also developed for use with the supplements. CDISC creates supplements for three types of instruments:
- Questionnaires
- Functional Tests
- Clinical Classifications
This paper is an effort to:
- Understand QRS supplements and how they are developed
- Understand how to model ratings and scales other than questionnaires
- Understand the ADaM structure to be used for QRS supplements
- Understand the different types of FDA Clinical Outcome Assessments (COAs):
  - Clinician-reported outcome (ClinRO)
  - Observer-reported outcome (ObsRO)
  - Patient-reported outcome (PRO)
  - Performance outcome (PerfO)
- Understand how CDISC QRS supplements assist in structuring Clinical Outcome Assessment (COA) data so that it is collected and reported in a standardized format
DS-223 : Homogenizing Unique and Complex data into Standard SDTM Domains with TAUGs
Sowmya Srinivasa Mukundan, Ephicacy Lifescience Analytics
Charumathy Sreeraman, Ephicacy Lifescience Analytics
Clinical research supports the discovery of new and better ways to detect, diagnose, treat, and prevent disease. The core focus of each therapeutic area (TA) is the research and development of treatments, together with the prevention of specific diseases. Each TA demands different ways of collecting, measuring and analysing data, depending on the focus of the research. SDTM is one of the pioneering CDISC foundational standards. It defines and underpins the strategy for submitting data tabulations to regulatory authorities. The SDTMIG organizes and formats data to support streamlined data collection and analysis across different TAs. Therapeutic Area User Guides (TAUGs) extend the Foundational Standards to represent data that pertains to specific disease areas. They support pharmaceutical and biotech companies in implementing these CDISC standards for a specific disease and help resolve the mapping of additional or unique data points needed to support analysis in any given TA. In this paper, we explore TAUGs (focusing on two different therapeutic areas) and discuss in detail how to map unique and custom data to the standard SDTM domains with the help of TAUGs.
DS-239 : More Traceability: Clarity in ADaM Metadata and Beyond
Wayne Zhong, Accretion Softworks
Richann Watson, DataRich Consulting
Daphne Ewing, CSL Behring
Jasmine Zhang, Boehringer Ingelheim
One of the fundamental principles of ADaM is that datasets and associated metadata must include traceability to facilitate understanding of the relationships between analysis results, ADaM datasets, and SDTM datasets. The existing ADaM documents contain isolated elements of traceability, such as including SDTM sequence numbers, creating new records to capture derived analysis values, and providing excerpts of define.xml documentation. An ADaM sub-team is currently developing a Traceability Examples Document with the goal of bringing these separate elements of traceability together and demonstrating how they function in detailed, complete examples. The examples cover a wide variety of practical scenarios; some expand on content from other CDISC documents, while others are developed specifically for the Traceability Examples Document. As members of the Traceability Examples ADaM sub-team, we include in this PharmaSUG paper a selection of examples to show how traceability can bring transparency and clarity to your analyses.
DS-250 : Timing is Everything: Defining ADaM Period, Subperiod and Phase
Nancy Brucken, Syneos Health
The CDISC Analysis Data Model Implementation Guide (ADaMIG) provides several timing variables for modeling clinical trial designs in analysis datasets. APHASE, APERIOD and ASPER can be used in conjunction with related treatment variables to meet a variety of analysis requirements, from single-period parallel studies to much more complicated situations involving multiple treatment periods and even different studies. The goal of this paper is to illustrate how some of these study designs may be handled in ADaM, and provide guidelines for selecting when to use the different timing variables that are available.
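As an illustration of period assignment, the Python sketch below derives an APERIOD value for a record by checking its analysis date against per-period start and end dates, in the spirit of the ADSL APxxSDT/APxxEDT variables. It assumes inclusive, non-overlapping windows and leaves records outside every window (for example, during a washout) unassigned; the function name and dates are hypothetical, and real studies may define windows differently.

```python
"""Hypothetical sketch: assign APERIOD from analysis-period date windows.

Assumes non-overlapping, inclusive [start, end] windows for one subject.
"""
from datetime import date
from typing import Dict, Optional, Tuple


def assign_aperiod(adt: date, periods: Dict[int, Tuple[date, date]]) -> Optional[int]:
    """Return the period number whose [start, end] window contains adt, else None."""
    for aperiod, (start, end) in sorted(periods.items()):
        if start <= adt <= end:
            return aperiod
    return None  # e.g., a washout day that belongs to no analysis period


if __name__ == "__main__":
    periods = {
        1: (date(2019, 1, 1), date(2019, 1, 28)),
        2: (date(2019, 2, 11), date(2019, 3, 10)),
    }
    for adt in (date(2019, 1, 15), date(2019, 2, 1), date(2019, 3, 10)):
        print(adt, assign_aperiod(adt, periods))
```

Returning no period for out-of-window dates, rather than forcing the nearest period, keeps the washout visible for analyses that need to exclude or summarize it separately.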
DS-254 : Findings About: De-mystifying the When and How
Soumya Rajesh, Syneos Health
Michael Wise, Experis
CDISC offers Findings About (FA) and Supplemental Qualifiers (SUPPQUAL) to handle information that doesn't fit into standard domains, or 'non-standard variables'. They are, however, quite distinct from each other, and the appropriate use of each may still cause confusion. "When should FA be created?" or "When is it best to use SUPPQUAL?" These important questions can only be answered by asking additional questions about the data. When the data does not fit into the parent domain, it may only be mapped to SUPPQUAL if it relates to one parent record. Almost all other situations are covered by FA, such as when data relates to multiple records or when a two-way relationship is needed. FA is the right approach in those cases because it offers versatility beyond what SUPPQUAL provides. For example, FA provides a way of storing symptoms along with the time they began and relating each back to the AETERM in the AE dataset. In addition, FA as a stand-alone domain is the only place to store information about an event or intervention that has not been captured within any specific domain. This paper presents examples from a few different therapeutic areas and domain relationships to highlight the proper use of FA. Another scenario looks at how FA accommodates a many-to-many relationship. Such examples should clarify the mysteries surrounding when and how best to use or create FA.
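The rule of thumb in this abstract can be written as a small decision function. The Python sketch below is a simplification for illustration only, not a CDISC algorithm: the `choose_target` name and its two inputs are hypothetical, and it assumes that data needing its own timing (such as symptom onset) goes to FA because SUPPQUAL, as a name/value qualifier of a parent record, is not designed to carry separate timing.

```python
"""Hypothetical encoding of the FA-versus-SUPPQUAL rule of thumb (simplified)."""
from enum import Enum


class Target(Enum):
    SUPPQUAL = "SUPPQUAL"
    FA = "FA"


def choose_target(n_parent_records: int, needs_own_timing: bool = False) -> Target:
    """Pick where a non-standard data point should go.

    n_parent_records: number of parent-domain records the data point relates to
        (0 means the information is not captured in any other domain).
    needs_own_timing: True if the data point carries its own timing, e.g., symptom onset.
    """
    if n_parent_records == 1 and not needs_own_timing:
        # A simple qualifier of exactly one parent record.
        return Target.SUPPQUAL
    # Stand-alone information, several related records, or its own timing.
    return Target.FA
```

In practice the decision also depends on the SDTMIG text and the specific data, so a function like this is best seen as a checklist aid rather than a rule engine.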
DS-261 : Raising the Bar: CDASH implementation in a biometrics CRO
Julie Barenholtz, Cytel
Since its first publication in 2011, CDASH has improved the reliability of data collected in clinical trials. 67% of current CDASH maps directly to SDTM, providing increased traceability between data collection and data analysis. An advantage of having data management and statistical programming under the same roof is the ability to create process efficiencies that lead to more timely submissions and ultimately get novel treatments to patients more quickly. Having a standard set of CDASH-compliant CRFs can greatly reduce the time needed for CRF design and the time to go live. The CDASH CRF questions and their intentions are comprehensive and widely understood across the industry. The variable naming is designed to be understood by an SDTM statistical programmer. This allows more efficient annotation of the blank CRF, which helps place data more quickly and accurately into the right SDTM domain. The goal of this paper is to share our efficiencies and methods for creating a CDASH library, as well as our partnership with statistical programming to create standard TLF shells and programs.
DS-268 : Best Practices in Data Standards Governance
Melissa Martinez, SAS Institute
Most organizations working in the pharmaceutical and biotechnology industries have adopted CDISC submission data standards by now, but the challenge of effective governance and compliance within an organization remains high. CDISC data standards are notoriously open to interpretation, and the assumptions and understanding of the data standards can vary widely among users. Adopting CDISC standards is more than just making CDISC-style submission data sets and using tools to help with the programming and compliance checks. Each organization needs to define its own interpretation of CDISC standards to ensure consistency among its studies, put in place workflows and processes to facilitate the governance process, determine what it means to be compliant with the CDISC data standards, and find tools to help with the governance process and compliance determination.
DS-303 : Confessions of an ADaM specs writer: How to write a clean first draft of ADaM specifications
Anbu Damodaran, Covance
Writing ADaM specifications is a daunting task for most statistical programmers and some biostatisticians. Even though it has been ten years since the Clinical Data Interchange Standards Consortium (CDISC) Analysis Data Model (ADaM) Implementation Guide (ADaMIG) was published, critical parts of ADaM specifications are, more often than not, written ambiguously, which results in multiple iterations of rework at the time of database lock. In this paper, we discuss how spec writers can explicitly apply a generic thought process to write a clean first draft of ADaM specs and minimize the impact of late dataset updates, which affect the time, scope and cost of the project.
DS-304 : Considerations and Updates in the Use of Timing Variables in Submitting SDTM-Compliant Datasets
Jerry Salyers, TalentMine
Often, the appropriate use of Timing variables can present many challenges for sponsors when converting their operational database or legacy data to an SDTM-compliant format. This paper will discuss common scenarios encountered when converting operational data to SDTM. One such scenario involves CRFs that allow checking an "ongoing" box in lieu of providing an end date. In such cases, the SDTM-based datasets require that these data points be represented through correct use of the Relative Timing variables (i.e., --STRF, --ENRF, --STTPT, --STRTPT, --ENTPT, and --ENRTPT). When doing so, sponsors must address questions such as: 1) ongoing as of what point in time? and 2) is the comparison to the study reference period more appropriate, or is there an alternative anchor or reference time point that would be better suited? From controlled terminology, what are the best choices for these different scenarios where an end date might be blank? We will also explore the appropriate use of other Timing variables, such as when a sponsor may need to express an evaluation interval in plain text rather than in ISO 8601 format (--EVINTX). We will also introduce the new Timing variables that will support the new Trial Milestone and Subject Milestone domains. Finally, we will revisit areas where we have seen continued issues with data that require variables defining sample-collection time points (i.e., --TPT, --TPTNUM, and --ELTM), along with the anchors that identify the "reference" or baseline for these collections (i.e., --TPTREF and --RFTDTC).
DS-308 : Using CDISC Standards with an MDR for EDC to Submission Traceability
Paul Slagle, Syneos Health
Eric Larson, Syneos Health
CDISC Standards have long promised a way to add clarity to a submission through traceability. This includes the ability to trace the data collected in an EDC system through to the analysis provided to the regulatory authority. The challenge is that managing all of this traceability is a documentation headache. With a metadata repository, you can develop screens that can be pushed into an ODM-compliant EDC system, such as Medidata Rave. From that push you can receive the raw data and transform it, using SAS, into a format compliant with SDTM standards while maintaining the connections to where the data came from in EDC. Adding Results Metadata to the metadata repository then allows the creation of ADaM datasets built on the SDTM data collected from the EDC system. This paper will demonstrate how an off-the-shelf MDR tool can be used with SAS to build these connections while maintaining compliance with CDISC standards.
DS-311 : What's New in the SDTMIG v3.3 and the SDTM v1.7
Fred Wood, Data Standards Consulting Group
The SDTMIG v3.3 and the SDTM v1.7 were released in November of 2018. These versions have been published as HTML documents, rather than the typical PDFs of the past. While the SDTMIG v3.2, which was published in 2013, contained 398 pages, the SDTMIG v3.3 would be more than 600 pages if formatted properly and printed to PDF. A number of new morphology/physiology domains have been added. Other new domains and new concepts have been added, and several domains that had undergone public review in 2014 were subsequently combined with existing domains. The Disease Milestones concept, introduced for the TAUG-Diabetes in 2014 is now included in this latest version of the SDTMIG. A new Section 9 includes Study References, with added models for Device Identifiers, Non-host Organism Identifiers, and Pharmacogenomic/Genetic Biomarker Identifiers. As with any new version of the SDTMIG, there is a corresponding version of the SDTM, which contains new variables in addition to the concept of domain-specific variables. This presentation will summarize the additions to the SDTM and SDTMIG described above, as well as other changes that might affect implementation.
DS-319 : Updates on validation of ADaM data
Sergiy Sirichenko, Pinnacle 21
Analysis data is critical to the regulatory review process. It helps reviewers understand the details of the analyses performed and reproduce the results reported by sponsors. Clinical study analysis data is required to be submitted in CDISC ADaM format to both FDA and PMDA. Therefore, validating analysis data for compliance with the CDISC ADaM standard and with additional business rules from regulatory agencies is an important step in preparing study data for regulatory submissions. In this presentation, we will provide an overview of the ADaM validation implemented by Pinnacle 21 and used by both FDA and PMDA. It will cover changes related to the new ADaM IG 1.1 standard and the updated version of validation rules from the CDISC team. The presentation will also detail additional regulatory business rules, including data and define.xml consistency, validation of ADaM OTHER datasets, SDTM/ADaM traceability, and integrated data.
DS-336 : Metadata Repository V1.0 - A Case Study of Standards Governance
Aparna Venkataraman, Celgene
Bharatkumar Palakurthi, Celgene
Over the last couple of decades, Metadata Repository (MDR) tools have played a growing role in the Pharmaceutical/Bio-Technology space. Regulatory agencies and sponsor companies have long recognized, for various reasons, the need for standards in Protocol Development, Data Collection and Submission. However, we still face tremendous challenges in maintaining and managing Clinical Data Standards that fall far beyond the capabilities of any single MDR tool currently available on the market. In this paper, we will explore the fundamentals of a strong Governance Model that can leverage any potential MDR tool while setting up a robust framework to handle Industry and Internal standards, Versioning, and System limitations, and providing high-quality, reusable Clinical Data Standards to internal study teams in a timely manner. Together, we will explore the tools that need to be in the Standards Manager's toolbelt to successfully dispense End-to-End Clinical Data Standards across the company. In the multi-dimensional world of Clinical Data Standards, a nimble Governance Model plays an undeniable role in getting that critical drug into the hands of the patient at the right time.
DS-344 : Panel Discussion: Medical Devices - Implementation Through Submission
Fred Wood, Data Standards Consulting Group
Carey Smoak, S-Cubed
Donna Sattler, Bristol Myers Squibb
Mike Lozano, Eli Lilly and Company
Karl Miller, Syneos Health
Currently CDRH recommends, but does not require, that medical device companies submit data conforming to CDISC standards such as the SDTM and ADaM. Pharmaceutical companies were at a similar stage many years ago. Adoption of CDISC standards by pharmaceutical companies occurred gradually over more than a decade until the standards became required for studies starting after December 2016. Most major pharmaceutical companies were prepared well before the requirement, but that preparation took many years for most of them. What can medical device companies learn from the experience of pharmaceutical companies? Implementing CDISC standards takes time, and it is better to develop a plan before they are required. In addition, pharmaceutical companies that participated in developing the standards also benefited by 1) providing significant input into the new standards and 2) getting an early look at what the standards would look like. It is recommended that medical device companies follow suit: begin implementing CDISC standards now, and participate in further developing standards for medical devices. One new area for both pharmaceutical and medical device companies to get involved in is real-world evidence. So come and listen to the panelists as they discuss these important issues.
Hands-on Training
HT-046 : Express yourself with Python & R
Charu Shankar, SAS Institute
With the entry of several new open source languages, users feel the need to learn them and to understand the differences and commonalities among them. Come learn how to express your data needs by writing Python (pandas) code and basic R code. These are very different languages, so learn how each one stacks up against SAS code through a side-by-side comparison. You will also learn how to write native Python code and submit it in your SAS session. No software purchase is necessary; the instructor will provide links to all necessary software for this informative education seminar.
HT-063 : Developing Custom SAS Studio Tasks for Clinical Trial Graphs
Olivia Wright, SAS
SAS Studio provides point-and-click tasks for basic statistics, biostatistics models, and statistical graphs. SAS Studio users can create their own Custom Tasks by modifying existing tasks or writing new ones using simple text commands from Apache's Velocity Template Language. Custom Tasks can simplify sharing of code by turning complex SAS graphics code into point-and-click tasks. The same functionality can be integrated into a more complete analytic modeling workflow by adding graphing options to an existing built-in task. In this workshop, we will work hands-on to create a graphical user interface for code-heavy SGPLOT clinical trial graphs.
HT-067 : Integrating SAS and Microsoft Excel: Exploring the Many Options Available to You
Vince DelGobbo, SAS
This presentation explains some techniques available to you when working with SAS and Microsoft Excel data. You learn how to import Excel data into SAS using the IMPORT procedure, the SAS DATA step, SAS Enterprise Guide, and other methods. Exporting data and analytical results from SAS to Excel is performed using the EXPORT procedure, the SAS DATA step, SAS Enterprise Guide, the SAS Output Delivery System (ODS), and other tools. The material is appropriate for all skill levels, and the techniques work with various versions of SAS software running on the Windows, UNIX (including Linux), and z/OS operating systems. Some techniques require only Base SAS and others require the SAS/ACCESS Interface to PC Files.
HT-089 : Build Popular Clinical Graphs using SAS
Survival plots, forest plots, waterfall charts, and swimmer plots are some of the most popular, frequently requested graphs in clinical research. These graphs are easy to build with the SGPLOT procedure. Once you understand how SGPLOT works, you can develop a plan, prepare the data according to that plan, and then use the right plot statements to create almost any graph. This hands-on workshop will take you step by step through the process needed to create these graphs. You will learn how to analyze the graph and make a plan, then put together a data set with all the needed information, and finally layer the right plot statements in the right order to build the graph. Once you master the process for these graphs, you can use the same process to build almost any other graph. Come and learn how to use the SGPLOT procedure like a pro.
HT-145 : Hands-on Training for Machine Learning Programming
Kevin Lee, Clindata Insight
The most popular buzzword in the technology world nowadays is "Machine Learning (ML)." Most economists and business experts foresee Machine Learning changing every aspect of our lives in the next 10 years by automating and optimizing processes such as self-driving vehicles; online recommendations on Netflix and Amazon; fraud detection in banks; image and video recognition; natural language processing; question-answering machines (e.g., IBM Watson); and many more. This is leading many organizations to seek experts who can implement Machine Learning in their businesses. Hands-on Training for Machine Learning Programming is intended for statistical programmers and biostatisticians who want to learn how to conduct simple Machine Learning projects. The training will go through the following steps: 1. identify the problems to solve; 2. collect the data; 3. understand the data through data visualization and metadata analysis; 4. prepare the data (training and test data); 5. perform feature engineering; 6. select an algorithm; 7. train the algorithm; 8. validate the trained model; 9. predict with the trained model. The training will use the most popular Machine Learning programming language, Python, and the most popular Machine Learning platform, Jupyter Notebook/Lab. During the training, programmers will run simple Machine Learning projects with actual Python code in Jupyter notebooks and will be introduced to popular Machine Learning modules: scikit-learn, TensorFlow, and Keras.
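Steps 4 and 6 through 9 of that workflow can be sketched in plain Python. This is a toy illustration, not the course material: a hand-rolled nearest-centroid classifier stands in for the scikit-learn, TensorFlow, or Keras models the training covers, and the two-cluster data set is invented for the example.

```python
import random
from statistics import mean


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Step 4: shuffle reproducibly and hold out a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_nearest_centroid(train):
    """Steps 6-7: the chosen 'algorithm' is nearest centroid; training computes
    the mean feature vector for each label."""
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append(features)
    return {label: tuple(mean(col) for col in zip(*feats))
            for label, feats in by_label.items()}


def predict(model, features):
    """Step 9: return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def accuracy(model, rows):
    """Step 8: share of held-out rows predicted correctly."""
    return sum(predict(model, f) == y for f, y in rows) / len(rows)


# Invented toy data: two well-separated groups of 10 points each.
rows = ([((1 + i * 0.1, 1 + i * 0.05), "low") for i in range(10)]
        + [((5 + i * 0.1, 5 - i * 0.05), "high") for i in range(10)])
train, test = train_test_split(rows)
model = fit_nearest_centroid(train)

if __name__ == "__main__":
    print(f"test accuracy: {accuracy(model, test):.2f}")
    print(predict(model, (1.2, 1.1)))
```

Steps 1, 2, 3, and 5 (problem framing, data collection, exploration, and feature engineering) are omitted because they depend on the actual project data.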
HT-171 : Value-Level Metadata Done Properly
Sandra Minjoe, PRA Health Sciences
Mario Widel, Independent
Value-level metadata is nothing mysterious. It is simply a way to describe how a variable is derived when that derivation differs based on some circumstances. When done properly, value-level metadata makes a define.xml more reviewer-friendly. A common use in ADaM for value-level metadata is when AVAL is derived based on PARAM, but this is not the only time to use value-level metadata. This Hands-on Training (HoT) will include examples and exercises of value-level metadata in SDTM Findings, in ADaM BDS, and more. Additionally, it will provide guidance to help attendees decide when to use value-level metadata.
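The "AVAL derived based on PARAM" case can be pictured as one derivation rule per parameter, where each rule is exactly what a value-level metadata entry would document. The sketch below is illustrative only: the parameter codes, the input field names (`WEIGHT_KG`, `HEIGHT_CM`), and the derivation text are hypothetical, and a real define.xml stores these entries as WhereClause and ValueList definitions rather than strings.

```python
def aval_weight(rec):
    """AVAL WHERE PARAMCD = 'WEIGHT': weight in kg, taken as collected."""
    return rec["WEIGHT_KG"]


def aval_bmi(rec):
    """AVAL WHERE PARAMCD = 'BMI': weight (kg) / height (m) squared, 1 decimal."""
    return round(rec["WEIGHT_KG"] / (rec["HEIGHT_CM"] / 100) ** 2, 1)


# One entry per WHERE condition on AVAL: (derivation text for the reviewer, function).
AVAL_BY_PARAMCD = {
    "WEIGHT": ("Weight in kg as collected", aval_weight),
    "BMI": ("WEIGHT (kg) / (HEIGHT (m))**2, rounded to 1 decimal", aval_bmi),
}


def derive_aval(paramcd, rec):
    """Apply the derivation that applies to this PARAMCD."""
    return AVAL_BY_PARAMCD[paramcd][1](rec)


def value_level_metadata():
    """Render each rule as a human-readable value-level metadata line."""
    return [f"AVAL WHERE PARAMCD = '{code}': {text}"
            for code, (text, _) in AVAL_BY_PARAMCD.items()]


if __name__ == "__main__":
    subject = {"WEIGHT_KG": 70, "HEIGHT_CM": 175}
    for code in AVAL_BY_PARAMCD:
        print(code, derive_aval(code, subject))
    print("\n".join(value_level_metadata()))
```

Keeping the derivation and its description side by side is one way to make sure the define.xml text and the program never drift apart.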
HT-177 : Sample Size Determination with SAS® Studio
Bill Coar, Axio Research
Experiments are designed to answer specific questions. For the results to have a reasonable amount of certainty and statistical validity, a sufficient number of observations is required. To achieve this, the concepts of Type I error, power, and sample size are introduced into the experimental design. Real-world constraints such as budget and feasibility are equally important. Thus, statisticians often re-evaluate sample size under varying sets of assumptions, sometimes on the fly. SAS Studio provides a variety of tasks for determining sample sizes using a point-and-click approach to enter assumptions, and it also provides the ability to save the underlying SAS code so that it can be refined or reused later. The purpose of this Hands-on Workshop is to introduce some of the features of SAS Studio for sample size determination. A number of examples will be introduced, including tests associated with proportions, means, and survival analysis. Each exercise will start with a research question, a proposed methodology, and a list of the requirements needed for estimating the sample size. Attendees will then have the opportunity to work through the exercise using SAS Studio and to discuss adaptations that may be necessary for its use in their everyday programming environment.
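As a rough illustration of how Type I error, power, and sample size interact, the sketch below uses the textbook normal-approximation formula for a two-sided, two-sample comparison of means with equal group sizes, n per group = 2((z_{1-α/2} + z_{1-β})σ/δ)². It is not the code SAS Studio generates; t-based calculations such as those in PROC POWER typically give a slightly larger n. The effect sizes and standard deviations are invented for the example.

```python
import math
from statistics import NormalDist


def n_per_group_two_means(delta, sigma, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample test of means.

    Normal approximation, equal allocation, common standard deviation sigma,
    true difference delta. Rounded up to the next whole subject.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the Type I error rate
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


if __name__ == "__main__":
    # Re-evaluating under varying assumptions, as a statistician might on the fly.
    for sigma in (8, 10, 12):
        print(f"sigma={sigma}: n per group = {n_per_group_two_means(delta=5, sigma=sigma)}")
```

With δ = 5, σ = 10, α = 0.05, and 80% power, the formula gives 63 per group.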
HT-188 : Creating & Sharing Shiny Apps & Gadgets
Phil Bowsher, RStudio Inc.
HT-329 : Interactive Animations
Kriss Harris, SAS Specialists Ltd
Richann Watson, DataRich Consulting
This paper demonstrates how you can use interactive animation in SAS® 9.4 to assess and report your safety and efficacy data. The interactive visualizations shown include animations of patients' laboratory results, vital sign results, adverse event counts, and electrocardiogram results over time. In addition, you will see how to display "details-on-demand" when you hover over a point. Animating your data will bring it to life and help improve lives!
Leadership and Career Development
LD-023 : Attracting and Retaining the Best!
Kelly Spak, Covance Inc.
A great manager recognizes the skill needed to attract and retain the best employees. This presentation will identify ways that you can work with your recruiting team that will help set you above the competition by attracting and retaining the best employees. Topics will include recruiting, candidate experience, onboarding, training and continued career development.
LD-106 : Considering Job Changes in an Ever-Changing Environment
Kathy Bradrick, Triangle Biostatistics, LLC
Ed Slezinger, Omeros Corporation
What is changing in our industry? The simple answer is: everything. Technology is changing quickly, employment laws are changing, and sponsor companies are seeing more merger and acquisition activity than ever, as are CROs and service providers. So how do you, as a SAS programmer, navigate all of these changes as you contemplate career and job changes? Hear perspectives from executives at both pharma companies and service providers as they explore the changes in our industry that are affecting hiring decisions and company strategies.
LD-150 : The human side of programming: Empathetic leaders build better teams.
Bhargav Koduru, Seattle Genetics Inc
Balavenkata Pitchuka, Seattle Genetics Inc.
People management is an important aspect of project management. The motivation and passion of team members play a key role in the success of a project. An empathetic leader can better understand the emotional needs of the team and keep members motivated and productive. Employees who are emotionally satisfied work harder and are more likely to stay. In turn, companies benefit from higher productivity and lower turnover. Beyond productivity, compassion also drives innovation. If employees are worried about the potential consequences of their mistakes, they may hesitate to take risks and to develop new solutions. An open environment encourages innovation and experimentation without fear of failure. Like any other skill, empathy can be developed through constant practice. In this paper, we share ways to identify areas for improvement for an employee while creating a safe environment where he or she can open up and feel free to share, and we provide recommendations on how to practice empathy to help team members improve further. Practicing empathy helps bring the team together and creates an engaging, active environment that can churn out high-quality deliverables, driving the company's mission and future.
LD-155 : Working from a Home Office Versus Working On-Site
Timothy Harrington, SAS Programmer
This paper is a discussion of the benefits and challenges of working remotely as opposed to working on site for an employer or a client. The primary issues addressed include communication, productivity, knowledge and skill sharing, health and well-being, and social and economic factors. The impacts of telecommuting are considered from the standpoints of the remote worker, the employer, and the environment and society as a whole.
LD-174 : It's a wonderful day in this neighbourhood - Managing a large virtual programming team
Victoria Holloway, Covance
Programming within a large virtual team can be lonely, isolated work. Whether in the office or at home, everyone needs to be part of something bigger. Creating a virtual, neighborhood-style community center enables programmers to interact with each other as neighbors. Community is the first step in building trust and respect through shared experience and successes. Important public works projects, including infrastructure such as electricity, water, phone, and internet, are essential to the community and can be likened to training, code sharing, work breaks, and process improvements. Nothing happens unless there are funds to support these endeavors, so linking the team to the financial aspects is also key to running successful projects. Counting the beans covers how to track the work and which project management tools to use. This presentation will share my experience as a new manager, where neighborhood concepts helped ensure that the work was done on time, within budget, and by a competent team that was happy and able to work together. Mr. Rogers had it right: creating "a wonderful day in the neighborhood" allows for successful projects even under difficult circumstances.
LD-200 : Find Your Story
Adam Sales, PRA Health Sciences
Healthcare and medical treatment are something we all engage in. As such, those of us who work in the industry have an extraordinary opportunity to connect to our work and its impact. Ironically, many leaders do not prioritize this bond, failing to help team members develop a meaningful relationship with their work. Instead, we often hear about simple motivational encouragement. Leaders promote recognition, instant awards, advancement, and personalized approaches based on employee interests, even though they have minimal control over many of these incentives and all of them are barter systems. In losing sight of the impact of our work, any job can become rote. In contrast, by keeping the significance and impact of our work front and center, the task itself can become motivational. When team members are intrinsically motivated, they understand and are inspired by the impact of their contribution, no longer investing in their career simply because they are committed employees or are looking for external rewards. This presentation will focus on how to help team members find their story: the means by which they connect with their role in a deep and meaningful way. The impact of job fulfillment naturally cascades to engagement, motivation, and retention.
LD-207 : Improving the Relationship between Statisticians and Programmers in Clinical Trial Studies
Mai Ngo, Catalyst Clinical Research, LLC
Mary Grovesteen, Triangle Biostatistics,LLC
Vaughn Eason, TriStats
Successful delivery of analysis outputs for a clinical trial depends on a strong biostatistics team, which typically includes a study statistician and a programming team of several statistical programmers. As trials get more complex and biostatistics teams face increased pressure to produce outputs efficiently and on time, a strong working relationship between the study statistician and the programming team is vital to the success of the analysis project. Yet with time pressure, the increased complexity of the analyses, and the challenging data issues common in clinical studies, communication between the statistician and the programmers tends to break down when it is needed most. This results in frustration on both sides, avoidable inefficiencies, and stressful last-minute work and rework. Having worked both as statistical programmers and as statisticians, we have been fortunate to gain valuable hands-on perspectives from both sides. Based on personal reflections as well as conversations with our colleagues, we will present some of the key areas of frustration in the working relationship between a study statistician and the programming team, touch on perspectives from both the programmer and the statistician, and offer suggestions for alleviating these issues.
LD-224 : Pains and Gains in software development - from PoC to Market
Reshma Rajput, Ephicacy LifeScience Analytics
Software development in the clinical domain has its own challenges in developing products and deploying new technologies. The challenges can be internal, such as hiring, training, and managing expectations within the development, testing, and related teams, or external, such as meeting customer expectations. During the Proof of Concept (PoC) phase of a project, extensive demonstrations of the product and the technologies used are presented to prospective clients. The major issue here is aligning the product knowledge with prospective clients who have very little knowledge of the product's use cases and, in a few cases, of the domain as well; business considerations then take priority, weakening the case for the technology's long-term value after implementation. Another significant challenge is the lack of adequate documentation during the different phases of the project, from PoC to deployment. This can occur at the client end, the supplier end, or both, affecting current and future releases of the project. Hiring and training are crucial factors in resolving workforce-related challenges within an organisation. All internal functions need to comprehend the relevance of having adequate documentation for user requirements, user stories, the trace matrix, and various plans such as the project plan, implementation plan, risk and mitigation plan, and migration plan. The client also needs to emphasise the importance of documentation. This lays the building blocks for a successful project and a delighted customer. This paper will share insights on addressing these challenges amicably while meeting the needs of both internal employees and customers.
LD-252 : Statistical Programming Roles - Time to Reevaluate Job Profiles & Career Ladders
Vijay Moolaveesala, PPD, Inc
Ajay Gupta, PPD Inc
Whether at a CRO or a pharma company, traditional statistical programming and analyst roles have always centered on programmers supporting everything from data programming to TLF (Tables, Listings and Figures) programming. Job profiles, job postings, screening, and hiring have always focused on evaluating individuals on their well-rounded experience across all the tasks expected of the programming department. Job profiles have been written to hire generalists with experience in all aspects of programming who can support the department's diverse programming needs, and hiring specialists into these generic roles has been perceived as a bottleneck for the department. On the contrary, drastic changes in the computing environment, industry-wide data standard implementations, complex study designs, and regulatory process requirements may make generic programmers and programming skills more of a bottleneck in the future. To understand these challenges and approaches to supporting the career growth of specialists, the authors take the case of Ajay Gupta, Technical Programming Manager at PPD, as an example to illustrate the career growth of specialist programmers. He is one of the programming specialists at PPD, with experience and a background in information technology, non-clinical, Phase I, Phase II-IV, and CDISC. This paper will shed light on some common challenges faced by specialists and provide a roadmap to support their hiring and career growth. These approaches will help develop a pool of resources within the department to handle specialized tasks and, in turn, cultivate a sense of well-being in the employee's work environment.
LD-288 : Project Management Fundamentals for Programmers and Statisticians
Jennifer Sniadecki, Covance
In the clinical trial environment, nearly everyone, from individual contributors to team leaders and managers, needs to use project management skills. Because most programmers and statisticians lack formal training in project management, it can be challenging to apply and effectively integrate project management principles into their day-to-day job duties. This paper will extract the fundamental concepts from A Guide to the Project Management Body of Knowledge (PMBOK® Guide) that are most relevant to programmers and statisticians and provide real-world examples where applicable.
LD-295 : Advance Your Career with PROC TM!
John LaBore, Consultant
Josh Horstman, Nested Loop Consulting
Most of us would like to advance our careers in one way or another. You may wish to become a respected technical expert in your field, move up into a leadership role, or even go independent as a consultant or small business owner. Regardless of your career aspirations, leadership and communication skills are critical to your success. Toastmasters International provides an avenue to develop these skills in a constructive and supportive environment. This paper will provide a summary of the Toastmasters program based on the authors' combined 20 years of experience with the program. We'll discuss how you can execute "PROC TM" and the benefits it can bring to your career.
LD-296 : Time to COMPARE Programmer to Analyst: Examine the Differences and Decide Best Path Forward
Ginger Barlow, UBC
Carol Matthews, UBC
There are almost as many titles for SAS programmers in the pharmaceutical industry as there are programmers: Statistical Programmer, Clinical Programmer, Scientific Data Analyst, SDTM Implementer, Study Lead Programmer, and many more. Regardless of the label, one thing all of these roles have in common is that these people are writing code to turn data into knowledge. In contrast, the same title in different organizations can have very different expectations on how much any given programmer (or analyst) is expected to contribute to the overall scientific process. Programmers and analysts actually have very different, distinct roles. We will explore what makes each unique, how to know which one you are, how to know which one you may be interviewing, and what the future could hold for each. We will conclude with a discussion on strategies for developing programming teams to meet business quality and financial objectives.
LD-313 : Schoveing Series 4: Inspirational Leadership: Grow Yourself into a Class Act and an Unforgettable Leader!
Priscilla Gathoni, AstraZeneca, Statistical Programming
Inspirational leadership is the human side of leadership, with effects that draw people to appreciate passion, tenacity, and enthusiasm around them. Would you like to discover this proven approach to leadership, characterized by traits such as business ethics, values-based leadership, corporate social responsibility, and sustainability? What are the distinguishing qualities or characteristics that typically belong to inspirational leaders? What kind of leader is needed in geographically dispersed teams? Why are inspirational leaders more likely to be successful today in all industries? Is emotional intelligence a catalyst for inspirational leadership? What are the secrets to developing and applying unique ideas and new leadership methods to achieve higher performance and excellence within your company? As a leader, are you able to translate broad strategies into specific objectives and action plans? What are the keys to assuming leadership, arousing enthusiasm in the people you work with, developing new techniques for managing change, coaching, influencing, and encouraging continuous improvement, innovation, and risk-taking in any organization? This paper will unlock your ability to answer these very important questions. The goal is to grow and become a better leader who is inspirational, well-informed, and organized and who applies the latest skills and knowledge, in the hope of making yourself and the world around you a better place. You should proudly say, "I am an inspirational leader, a class act and unforgettable person, in the past, present, and in the future!"
LD-335 : Something Old, Something New: A little programming management can go a long way
Janet Li, Pfizer
We propose a statistical programming project tracking tool that will help manage and track programming progress, deliverables, and timelines at the study level. The tool can be used across studies to prepare aggregate reports for upper management and can be updated to fit the needs of any statistical programming organization or team.
LD-342 : Panel Discussion: Speaking Data Science, Who is Ready to Listen?
Priscilla Gathoni, AstraZeneca, Statistical Programming
David D'Attilio, TalentMine
Phil Bowsher, RStudio Inc.
Faisal Khan, AstraZeneca Pharmaceuticals
We are seeing the rise of an economy powered by automation, mobile internet, and artificial intelligence. How well can we measure or find clues about the future by looking at trends in human behavior on the smartphone, the Internet of Things, and cloud computing? These trends are affecting every industry and are unique in the degree to which they are affecting white-collar as well as blue-collar jobs. The computing industrial revolution is gathering steam. What can you do to re-skill or up-skill in these times? How can you ensure that you continue to attract the best talent while also educating your own workforce? One emerging discipline is data science. Data science is a fusion of multiple disciplines, including statistics, computer science, information technology, and domain-specific fields. With the availability of expressive data analysis software and computing power, we are seeing an incredible evolution of exploratory data analysis. Analytic capabilities are now regarded as enablers of improved organizational performance, and aligning them with the business strategy is inevitable. Conversely, could the data science field be viewed as a forum that propagates "surveillance capitalism", where companies scoop up the data we leave behind (our leftover data trails) as we go about our digital lives? Please join the Leadership and Career Development Panel Discussion session to address and discuss these pertinent questions about data science. Bring your questions.
Posters
PO-020 : Tame your SHARE with a PYTHON and SAS
Michael Stackhouse, Covance
Terek Peterson, Covance
Staying up to date with standards is of the utmost importance, and subtle changes can push your data out of compliance. With a little bit of SAS and a little Python, you can easily automate the extraction of CDISC standards metadata using the new SHARE API. These standards files can then be used in SDTM and ADaM development to ensure that you have access to the latest metadata available for specification writing, quality control, and custom conformance checks. Embedded CDISC metadata opens up a world of possibilities when it is available directly to your programming team. This poster will explore how you can implement and automate this process to make sure your team never falls behind.
PO-022 : In the Style Of David Letterman's "Top Ten" Lists, Our "Top Ten" PROC SQL Statements To Use in Your SAS Program
Margie Merlino, Janssen Research and Development
One of the challenges in using PROC SQL is the notion that programmers must first immerse themselves in Structured Query Language (SQL) before they can use PROC SQL. Certainly, there is syntax that a programmer must adhere to for successful execution of a PROC SQL statement, but we propose that there are many simple, and some not so simple, statements that can be used without a robust knowledge of all the intricacies of SQL. What follows is a "Top Ten List" à la David Letterman's popular feature. We list ten scenarios, each with a PROC SQL example that addresses the scenario and can be used to manipulate SAS data and display a result or provide an output dataset for further manipulation. Our goal is to provide code examples that a programmer can easily copy and paste into their own SAS program.
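As an illustration of the kind of scenario such a list covers, the sketch below counts subjects with at least one adverse event per treatment arm, a query that combines a LEFT JOIN with COUNT(DISTINCT ...). It runs the SQL through Python's built-in `sqlite3` module purely as a stand-in; the table and column names and the data are invented, and PROC SQL syntax for these clauses is similar but not guaranteed to be identical in every detail.

```python
import sqlite3


def ae_subject_counts():
    """Return (arm, n_with_ae, n_subjects) rows from a tiny invented DM/AE example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dm (usubjid TEXT, arm TEXT)")
    conn.execute("CREATE TABLE ae (usubjid TEXT, aedecod TEXT)")
    conn.executemany("INSERT INTO dm VALUES (?, ?)",
                     [("01", "Placebo"), ("02", "Placebo"),
                      ("03", "Drug A"), ("04", "Drug A")])
    conn.executemany("INSERT INTO ae VALUES (?, ?)",
                     [("01", "Headache"), ("01", "Nausea"),
                      ("02", "Headache"), ("03", "Headache")])
    # LEFT JOIN keeps subjects with no AEs in the denominator;
    # COUNT(DISTINCT ...) counts each subject once, not once per AE record,
    # and ignores the NULLs produced for subjects without AEs.
    rows = conn.execute("""
        SELECT dm.arm,
               COUNT(DISTINCT ae.usubjid) AS n_with_ae,
               COUNT(DISTINCT dm.usubjid) AS n_subjects
        FROM dm LEFT JOIN ae ON dm.usubjid = ae.usubjid
        GROUP BY dm.arm
        ORDER BY dm.arm
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for arm, n_with_ae, n_subjects in ae_subject_counts():
        print(f"{arm}: {n_with_ae}/{n_subjects} subjects with an AE")
```

Subject 01 has two AE records but is counted once, which is the usual pitfall this pattern avoids.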
PO-037 : Advanced Project Management beyond Microsoft Project, Using PROC CPM, PROC GANTT, and Advanced Graphics
Stephen Sloan, Accenture
Lindsey Puryear, SAS Institute
The Challenge: Instead of managing a single project, we had to craft a solution that would manage hundreds of higher- and lower-priority projects, taking place in different locations and different parts of a large organization, all competing for common pools of resources. Our Solution: Develop a Project Optimizer tool using the CPM procedure to schedule the projects, and using the GANTT procedure to display the resulting schedule. The Project Optimizer harnesses the power of the delay analysis feature of PROC CPM and its coordination with PROC GANTT to resolve resource conflicts, improve throughput, clearly illustrate results and improvements, and more efficiently take advantage of available people and equipment.
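The scheduling idea underneath PROC CPM is the critical path method: a forward pass computes each task's earliest start, a backward pass computes its latest start, and tasks with zero slack form the critical path. The sketch below shows only that core calculation for a single project in Python (3.9+ for `graphlib`). It does not model resource constraints or the delay analysis the Project Optimizer relies on, and the task names and durations are invented.

```python
from graphlib import TopologicalSorter


def critical_path(tasks):
    """tasks: dict of name -> (duration, list of predecessor names).

    Returns (project_end, slack per task, critical tasks in topological order).
    """
    graph = {name: set(preds) for name, (_, preds) in tasks.items()}
    order = list(TopologicalSorter(graph).static_order())

    # Forward pass: a task starts once its latest-finishing predecessor is done.
    early_start, early_finish = {}, {}
    for name in order:
        duration, preds = tasks[name]
        early_start[name] = max((early_finish[p] for p in preds), default=0)
        early_finish[name] = early_start[name] + duration
    project_end = max(early_finish.values())

    successors = {name: [] for name in tasks}
    for name, (_, preds) in tasks.items():
        for p in preds:
            successors[p].append(name)

    # Backward pass: a task must finish before its earliest-starting successor.
    late_start = {}
    for name in reversed(order):
        late_finish = min((late_start[s] for s in successors[name]), default=project_end)
        late_start[name] = late_finish - tasks[name][0]

    slack = {name: late_start[name] - early_start[name] for name in tasks}
    critical = [name for name in order if slack[name] == 0]
    return project_end, slack, critical


if __name__ == "__main__":
    tasks = {
        "spec": (3, []),
        "sdtm": (5, ["spec"]),
        "tlf_shells": (2, ["spec"]),
        "adam": (4, ["sdtm"]),
        "tlfs": (6, ["adam", "tlf_shells"]),
    }
    end, slack, critical = critical_path(tasks)
    print(end, slack, critical)
```

In this example the project takes 18 days and `tlf_shells` can slip 7 days without delaying delivery; resource-constrained scheduling across many projects, as in the abstract, builds on top of this calculation.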
PO-050 : Put on the SAS® Sorting Hat and Discover Which Sort is Best for You!
Louise Hadden, Abt Associates Inc.
Charu Shankar, SAS Institute
Sorting in SAS® is an expensive process in terms of both time and resources consumed. In this session, prepare to explore some of the common and lesser-known sorts that SAS provides. Become like the sorting hat in Harry Potter! Instead of waiting with bated breath for your team (or data) to be sorted, get the inside scoop and learn about the dynamic processes that go on behind the scenes during sorting, so that you can pick the very best sort for your circumstances. Learn about some fantastical, magical SAS sorting teams: bubble, quick, threaded, and serpentine sorts. Behold the effervescent bubble sort! In a hurry? Take a look at the quick sort. Looking for superior efficiency? Consider the threaded sort. See how the hissing serpentine sort in SAS, like the slithering serpent Nagini sliding surreptitiously through walls, can come in handy! Which sort will you choose, or which sort will choose you?
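For readers new to the algorithms behind two of those names, here are generic textbook versions of bubble sort and quicksort in Python. They illustrate the algorithms themselves, not how SAS implements sorting internally; the threaded and serpentine sorts are not shown.

```python
def bubble_sort(items):
    """Repeatedly swap adjacent out-of-order pairs; O(n^2) comparisons in general."""
    data = list(items)  # work on a copy so the input is left unchanged
    for end in range(len(data) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
        if not swapped:  # no swaps means the data is already sorted: stop early
            break
    return data


def quick_sort(items):
    """Partition around a pivot and sort each side; O(n log n) on average."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    greater = [x for x in items if x > pivot]
    return quick_sort(less) + equal + quick_sort(greater)


if __name__ == "__main__":
    values = [5, 3, 8, 1, 9, 2, 2]
    print(bubble_sort(values))
    print(quick_sort(values))
```

The early exit in `bubble_sort` is why it can be surprisingly quick on data that is already nearly sorted, while quicksort's advantage shows on large, unordered inputs.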
PO-066 : Using Pinnacle 21 Enterprise for define.xml Creation: Tips and Tricks from a CRO Perspective.
Frank Menius, Covance
Creating a regulatory-compliant define.xml 2.0 document can be time-consuming and fraught with hazards. It is widely known that Pinnacle 21 Enterprise (P21 Enterprise) is an efficient and useful tool that can speed up the creation process and ensure compliance with regulatory bodies, including the United States Food and Drug Administration (FDA) and the Japanese Pharmaceuticals and Medical Devices Agency (PMDA). But how can a preparer make sure they are taking advantage of all the available benefits and getting the most out of the product? We will answer this key question and detail some common issues that arise while using P21 Enterprise, along with ways to work around them. Additionally, the paper will give an overview of the define.xml creation process within P21 Enterprise for those who are new to the product, highlighting tips, tricks, and work-arounds.
PO-108 : Developing Analysis & Reporting Standards For Pharmaco-Epidemiology Observational Studies
Bo Zheng, Merck & Co., Inc.
Xingshu Zhu, Merck & Co., Inc.
In the blossoming world of modern data analytics, there is an unfulfilled need for standardization within real world evidence (RWE). Lack of standardization often leads to longer and more frustrating program development cycles. This paper discusses our experience with developing standards across RWE primary data collection (PDC) studies. We developed a process for Pharmacoepidemiology PDC studies by standardizing variables based on existing CDISC conventions, developing data quality review tools, and creating a set of modular macros that create a unified TFL deliverables package. Having a core standardization process in place helps reduce the time it takes to identify and resolve data issues and to get deliverables to customers. As RWE continues to expand, implementing standards provides a unique opportunity to help guide its path toward even greater acceptance within the scientific community.
PO-110 : Raw data sets tracker: Time and project management based on the volume of available clinical data using SAS® software
Girish Kankipati, Seattle Genetics Inc
Time management and the availability of clinical data play an important role in the successful execution of a project. To plan programming activities and resources, it is very important to understand how much clinical data is available. The volume of raw data depends on several factors, including the enrollment speed and the type of clinical trial (Phase I, II, or III). Enrollment can be slow, leaving little data available and hindering programming activities, since programming with limited data can be challenging. To address this issue, it is important to establish a robust procedure for tracking raw data so that a project can be completed within a given timeline. This paper discusses how to track raw data availability by creating a raw data set tracker with a dynamic SAS® program, which will be demonstrated. The proposed tracker identifies the data sets that are programmable on a weekly basis. It gives summary statistics on the number of subjects and the total number of records present in each raw data set during a particular week, displayed as pie and bar charts. The tracker thus helps the programmer plan activities efficiently (for example, Week 1: DM, AE, EX; Week 2: DS, MH).
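To make the tracking idea concrete, here is a minimal sketch in Python rather than the SAS® program the paper demonstrates. It assumes a hypothetical input of (week, data set, subject ID) tuples, one per raw record, and a hypothetical rule that a data set counts as "programmable" once it holds at least a chosen number of subjects; neither detail comes from the paper.

```python
from collections import defaultdict


def weekly_tracker(records):
    """Summarize raw data per week: {week: {dataset: (n_subjects, n_records)}}.

    records: iterable of (week, dataset, subject_id) tuples (hypothetical layout).
    """
    subjects = defaultdict(set)  # (week, dataset) -> distinct subject IDs
    counts = defaultdict(int)    # (week, dataset) -> number of records
    for week, dataset, subject in records:
        subjects[(week, dataset)].add(subject)
        counts[(week, dataset)] += 1
    summary = defaultdict(dict)
    for (week, dataset), n_records in counts.items():
        summary[week][dataset] = (len(subjects[(week, dataset)]), n_records)
    return dict(summary)


def programmable(summary, week, min_subjects=1):
    """Data sets with at least `min_subjects` subjects in `week` (threshold is an assumption)."""
    return sorted(ds for ds, (n_subj, _) in summary.get(week, {}).items()
                  if n_subj >= min_subjects)
```

The per-week dictionary is what a chart step would then plot as pie or bar charts.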
PO-130 : When biomarker drives primary endpoint: An oncology case study of SDTM design using multiple myeloma.
Girish Kankipati, Seattle Genetics Inc
Bhargav Koduru, Seattle Genetics Inc
Oncology studies are often driven by imaging, which led to the creation of the tumor-specific TU and TR domains in the SDTM IG 3.1.2, where the capture of the scan details and results is described. These domains are usually linked to the RS domain, which contains the overall tumor response in an oncology study. There are, however, a few oncology conditions, such as multiple myeloma, that are driven not by imaging but by specific biomarkers. This information is captured in the LB domain rather than in TU and TR. Biomarkers play an important role in indicating normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention. In multiple myeloma, serum free light chain (SFLC), serum and urine protein electrophoresis (SPEP and UPEP), and immunofixation are the key biomarker-related tests that define the standard response criteria. In this paper, we share how we mapped the efficacy biomarker tests, as well as the safety-related tests, in the LB SDTM domain while maintaining a clear demarcation between them by using LBCAT or LBSCAT and other additional variables allowed under the Findings observation class (SDTM v1.4). This helps maintain the distinction and also eases the design of the efficacy- and safety-related ADaM datasets. At the SDTM level, we also leveraged RELREC to create traceability between the efficacy data in LB and the responses captured in the RS domain.
PO-137 : A Practical SAS® Macro for Converting RTF to PDF Files
Kaijun Zhang, FMD K&L Inc
Xiao Xiao, FMD K&L Inc.
Brian Wu, FMD K&L Inc.
As part of clinical trial reporting, there is a rising demand to create large numbers of RTF outputs when a study reaches its major milestones. CROs often receive requests from clients to convert all reports into a user-friendly file format, usually PDF, to ease delivery and facilitate the review process. One solution for RTF-to-PDF conversion is an off-the-shelf tool. However, if the RTF file is extremely large, conversion with this approach can be very time-consuming, especially when multiple large output files need to be converted and combined. The task may take hours to finish, adding an extra challenge to timely delivery. The SAS System for OS/2 and Windows can communicate with other PC applications through Dynamic Data Exchange (DDE). Specifically, DDE enables the creation of fully customized Microsoft Word documents from a Base SAS program and is therefore ideal for automating the conversion of RTF files to high-quality PDF files. This paper presents an alternative, effective method in which a practical SAS macro was developed and used in two scenarios: (a) quickly converting an extremely large RTF file into a PDF file; and (b) reliably converting a queue of RTF files into PDF files. The step-by-step development of the macro is also described and discussed.
PO-144 : SDSP (Study Data Standardization Plan) Case Studies and Considerations
Kiran Kundarapu, Merck & Co., Inc.
This poster covers a sample SDSP and SDSP case studies focusing on different stages of the drug development lifecycle. The lifecycles and stage gates considered are: new programs (pre-IND, IND); ongoing programs (retrospective/End of Phase II/Type C meeting/pre-NDA/pre-BLA); and already approved programs (sNDA/sBLA). In addition, the poster considers when a program should have a single SDSP versus multiple SDSPs, based on IND, indication, and population. Additional SDSP topics and considerations include: critical items required to develop an SDSP; nuances in CDER versus CBER SDSP recommendations/requirements; key CBER Appendix recommendations for completion; versioning the SDSP, covering updates and checks; and consistency checks between the SDSP and other submission documents.
PO-198 : Highlight changes: An extension to PROC COMPARE
Abhinav Srivastva, Gilead Sciences
Although version control of files, datasets, or any document can be challenging, the COMPARE procedure provides an easy way to compare two files and indicate the differences between them. This paper takes the comparison results from PROC COMPARE and builds them into a SAS® macro that highlights changes between files (additions, deletions, or updates to a record) in a convenient Excel format. Common examples where this utility can be useful include comparing CDISC Controlled Terminology (CT) release versions, comparing medical dictionary versions such as MedDRA or WHODrug, and comparing certain Case Report Form (CRF) data, such as Adverse Events (AE), to review new events reported at various timepoints for data monitoring purposes.
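The add/delete/update classification the macro reports can be illustrated with a short Python sketch. This is not the paper's macro and does not use PROC COMPARE; it assumes each version is a list of records (dicts) that share a hypothetical unique key column, such as a CT code.

```python
def highlight_changes(old, new, key):
    """Classify records of two versions as added, deleted, or updated.

    old, new: lists of dicts; key: column that uniquely identifies a record.
    Returns (added_keys, deleted_keys, {key: sorted list of changed columns}).
    """
    old_by_key = {row[key]: row for row in old}
    new_by_key = {row[key]: row for row in new}
    added = [k for k in new_by_key if k not in old_by_key]
    deleted = [k for k in old_by_key if k not in new_by_key]
    updated = {}
    for k, new_row in new_by_key.items():
        old_row = old_by_key.get(k)
        if old_row is None:
            continue  # already reported as added
        # Compare over the union of columns so dropped or new columns count as changes.
        changed = sorted(col for col in set(old_row) | set(new_row)
                         if old_row.get(col) != new_row.get(col))
        if changed:
            updated[k] = changed
    return added, deleted, updated
```

Keying on an identifier, rather than on row position, keeps a single inserted record from making every later row look "updated".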
PO-218 : A Cloud-based Framework for Exploring Medical Study Data
Peter Schaefer, VCA-Plus, Inc
The poster presents a cloud-based framework used to implement tools for exploring medical study data. The framework makes it easy to integrate the kind of scripts used by the FDA in its JumpStart service, or scripts that create the data analyses and TLFs (tables, listings, figures) suggested in the white papers published by the PhUSE working group "Standard Analyses & Code Sharing". First, the poster explains the underlying cloud-based platform, which allows a data-driven implementation of the framework. Second, it shows the concepts of the framework and how metadata about the analyses and TLFs are integrated and drive the user interface of the resulting applications. Finally, we show examples based on the scripts that the FDA released to the PhUSE "Standard Analyses & Code Sharing" working group. Interested parties will have the option to see a demo and discuss the framework concepts in detail with the author.
PO-221 : Why Wait Longer to Check the Log File While a SAS Program Is Executing? Let's Find Bugs Early!!
Prakash Subramanian, Anna University
Thamarai Selvan, Anna University
Prompt email alerts sent when an error is encountered help users save run time and prevent subsequent code from creating empty datasets. They also make it possible to monitor the status of submitted SAS programs without being in front of the computer. While running a program, if SAS finds an error in the current line, it skips the current statement and starts executing the next one, and the process completes only after the last statement of the program has executed. Some programs take more than a day to complete. In such cases, the user has to open the log file in read-only mode very frequently to check for errors, warnings, and unexpected notes. The user has to terminate the program manually if any such messages are found; otherwise, the user sees the errors in the log file only at the end of execution. We propose running a parallel utility program alongside the production program that checks the log file of the current SAS program and notifies the user by email when it encounters an error, warning, or unexpected note. The execution can also be terminated automatically, and the user notified, if any potential messages are identified.
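The core of such a utility is a log scanner that can be re-run on the text appended since its last check. Below is a minimal Python sketch of that piece only, not the authors' utility; the list of "unexpected" NOTE patterns is an assumption to be adapted per project, and the email and termination steps are deliberately left out.

```python
import re

# Assumed patterns: ERROR/WARNING lines plus a few NOTEs commonly treated as problems.
ISSUE_PATTERNS = [
    re.compile(r"^ERROR"),
    re.compile(r"^WARNING"),
    re.compile(r"^NOTE: Variable .* is uninitialized"),
    re.compile(r"^NOTE: MERGE statement has more than one data set with repeats of BY values"),
    re.compile(r"^NOTE: Missing values were generated"),
]


def scan_log(lines):
    """Return (line_number, text) for each log line matching an issue pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in ISSUE_PATTERNS):
            hits.append((number, line.rstrip()))
    return hits


def scan_new_text(path, offset=0):
    """Scan only text appended to a growing log since `offset`.

    Returns (hits, new_offset); line numbers in hits are relative to the new chunk.
    """
    with open(path, "r", errors="replace") as log:
        log.seek(offset)
        text = log.read()
        new_offset = log.tell()
    return scan_log(text.splitlines()), new_offset
```

A polling loop would call `scan_new_text` every few minutes, passing back the returned offset, and send a notification whenever `hits` is non-empty.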
PO-225 : Badge in Batch with Honeybadger: Generating Conference Badges with Quick Response (QR) Codes Containing Virtual Contact Cards (vCards) for Automatic Smart Phone Contact List Upload
Troy Hughes, Datmesis Analytics
Quick Response (QR) codes are widely used to encode information such as uniform resource locators (URLs) for websites, flight passenger data on airline tickets, attendee information on concert tickets, or product information on product packaging. The proliferation of QR codes is due in part to the broad dissemination of smart phones and the accessibility of free QR code scanning applications. With the ease of self-scanning QR codes has come another common QR code usage: the identification of conference attendees. Conference badges emblazoned with an attendee-specific QR code can communicate attendee contact and other personal information to other conference goers, including organizers, vendors, potential customers or employers, and other attendees. Unfortunately, some conference organizers choose not to include QR codes on conference badges because of the complexity and price involved in producing them. To that end, this text introduces flexible Base SAS® software that overcomes this limitation by dynamically creating attendee QR codes from a data set containing contact and other information. Furthermore, the flexible, data-driven approach creates attendee badges that can be maintained and printed by conference organizers. When a badge QR code is scanned by a fellow conference goer, the attendee's personal information is delivered as a vCard (VCF) file that can be uploaded automatically into a smart phone's contact list. Conference organizers can customize and configure badge format and content through a CSS file that dynamically alters badges without the need to modify the underlying code.
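The payload a badge QR code carries is plain vCard text. As an illustration only (in Python, not the paper's Base SAS® code), the sketch below builds a vCard 3.0 payload from one attendee record; the function name and fields are hypothetical, and rendering the text as a QR image would need a separate QR library, which is not shown.

```python
def _escape(value):
    """Escape backslash, comma, semicolon, and newline per vCard 3.0 text rules."""
    return (value.replace("\\", "\\\\").replace(",", "\\,")
                 .replace(";", "\\;").replace("\n", "\\n"))


def make_vcard(first, last, org=None, email=None, phone=None):
    """Build vCard 3.0 text for one attendee, using CRLF line endings."""
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        f"N:{_escape(last)};{_escape(first)};;;",  # family;given;additional;prefix;suffix
        f"FN:{_escape(first)} {_escape(last)}",
    ]
    if org:
        lines.append(f"ORG:{_escape(org)}")
    if email:
        lines.append(f"EMAIL;TYPE=INTERNET:{email}")
    if phone:
        lines.append(f"TEL;TYPE=WORK,VOICE:{phone}")
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"
```

Generating one such string per row of the attendee data set is what makes the badge run data-driven.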
PO-253 : F2Plots: Visualizing relative treatment effects in cancer clinical trials
Yellareddy Badduri, QUARTESIAN
Every year, many clinical trials are conducted on different types of cancer. With cancer trials increasingly reporting non-time-to-event outcomes, data visualization has evolved to incorporate parameters such as response to therapy, duration and degree of response, and novel representations of underlying tumor biology. Graphs and figures are excellent tools for data visualization: they display data visually and enable rapid interpretation. F2 plots (forest and funnel) were initially developed for presenting the results of meta-analyses. A forest plot is an intuitive, convenient display used to show the relative treatment effect of an intervention between groups within the larger cohort. It is easily understood: it consists of several horizontal lines, which represent 95% confidence intervals, each with a central symbol representing a point estimate, usually the median or mean. Funnel plots are scatter plots of the effect estimates from individual studies against some measure of each study's size or precision. Other advantages of funnel plots are that there is no spurious ranking of institutions, the eye is naturally drawn to important points that lie outside the funnel, there is allowance for the increased variability of smaller units, and they are easy to produce with standard spreadsheet software. This presentation explains different SAS programming approaches for producing both forest and funnel plots, and the representations used to illustrate treatment effects.
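The "outside the funnel" idea can be made concrete with control limits around a target value. The sketch below, in Python rather than SAS, assumes one common variant: unit-level proportions compared with a pooled target proportion `p0` using normal-approximation limits, so the limits narrow as sample size grows. This is a swapped-in illustration, not the presentation's method.

```python
import math


def funnel_limits(p0, n, z=1.96):
    """Normal-approximation control limits for a proportion at sample size n."""
    half_width = z * math.sqrt(p0 * (1 - p0) / n)
    return max(0.0, p0 - half_width), min(1.0, p0 + half_width)


def outside_funnel(units, p0, z=1.96):
    """Names of units whose observed proportion falls outside the funnel.

    units: list of (name, events, n) tuples.
    """
    flagged = []
    for name, events, n in units:
        lower, upper = funnel_limits(p0, n, z)
        if not lower <= events / n <= upper:
            flagged.append(name)
    return flagged
```

Plotting `events / n` against `n`, with the two limit curves drawn across the range of `n`, gives the funnel shape.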
PO-274 : A Macro to Expand Encrypted Zip Files on the SAS LSAF Environment
Steven Hege, Alexion Pharmaceuticals, Inc.
This paper introduces a short SAS macro that uses Java objects to expand zip archive files located on the SAS Life Science Analytics Framework (LSAF). The macro runs on LSAF, stores the extracted files directly in a specified folder on the environment, and can expand password-encrypted archives. Our group has found this macro useful when handling data archives uploaded directly into LSAF.
PO-277 : Flagging On-Treatment Events in a Study with Multiple Treatment Periods
David Franklin, IQVIA
Typically, for data such as Adverse Events, the common practice is to flag events that occur on treatment using a definition such as "Event starting on or after the date of first treatment and no later than 30 days after the date of last treatment". This is fairly easy to program, but when a cycle of treatment may have multiple periods due to patient events, the usual approach does not work. This paper introduces the ADaM variable APHASE, describes how it solved a structural problem with the data, and presents a macro that was very useful in producing the solution.
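The multi-period problem is easiest to see with explicit windows. Here is a minimal Python sketch, not the paper's macro: it assumes each treatment period is a hypothetical (label, first dose, last dose) tuple, applies the 30-day window per period, and lets a later period win when two windows overlap. The returned label stands in for the kind of value one might store in APHASE.

```python
from datetime import date, timedelta


def on_treatment_phase(event_start, periods, window_days=30):
    """Return the label of the treatment period an event falls in, or None.

    periods: list of (label, first_dose, last_dose) dates, in time order.
    A period's window runs from first_dose through last_dose + window_days.
    When windows overlap, the later period takes precedence (an assumption).
    """
    assigned = None
    for label, first_dose, last_dose in periods:
        if first_dose <= event_start <= last_dose + timedelta(days=window_days):
            assigned = label
    return assigned
```

With a single period this reduces to the usual on-treatment flag; with several, it also says which period the event belongs to.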
PO-282 : Joining the SDTM and the SUPPxx Datasets
David Franklin, IQVIA
When creating ADaM datasets from SDTM datasets, programmers often do not understand why the supplemental SUPPxx datasets exist or how to deal with their structure. This paper briefly explains why we have these supplemental datasets and, more importantly, introduces a macro that merges them back onto their 'parent' domain and makes their use easier to handle.
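The join itself follows from the SUPPxx layout: each qualifier row names its subject, the parent variable used as the link (IDVAR), that variable's value as text (IDVARVAL), and a QNAM/QVAL pair. Below is a minimal Python sketch of that merge, not the paper's macro. It assumes records are dicts, treats a blank IDVAR as a subject-level qualifier, and compares parent values by their string form, so a numeric sequence stored as `1.0` would not match `"1"`.

```python
from collections import defaultdict


def merge_supp(parent, supp):
    """Attach SUPPxx qualifiers (QNAM -> QVAL) as new columns on parent records.

    parent: list of dicts for one domain (e.g. AE rows with USUBJID, AESEQ).
    supp:   list of dicts with USUBJID, IDVAR, IDVARVAL, QNAM, QVAL.
    """
    quals = defaultdict(dict)  # (USUBJID, IDVAR, IDVARVAL) -> {QNAM: QVAL}
    for q in supp:
        idvar = q.get("IDVAR") or None
        idval = q.get("IDVARVAL") if idvar else None
        quals[(q["USUBJID"], idvar, idval)][q["QNAM"]] = q["QVAL"]
    idvars = {idvar for (_, idvar, _) in quals if idvar}
    merged = []
    for row in parent:
        out = dict(row)
        # Subject-level qualifiers (blank IDVAR) apply to every record of the subject.
        out.update(quals.get((row["USUBJID"], None, None), {}))
        for idvar in idvars:
            if idvar in row:
                out.update(quals.get((row["USUBJID"], idvar, str(row[idvar])), {}))
        merged.append(out)
    return merged
```

This is effectively a transpose of SUPPxx by QNAM followed by a left join on the parent, which is why parent records without qualifiers pass through unchanged.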
PO-305 : Merging Sensor Data with Patient Records in Clinical Trials - Systems and Benefits
Surabhi Dutta, Eliassen Biometrics and Data Solutions
Patient care involves data capture in all phases of care delivery, including clinical trials, provider visits, lab work, and sensor data from wearable sensors and handheld devices. We are accustomed to devices that generate health indicator data in large volumes and at a rapid rate. This paper discusses the benefits, challenges, and methods of merging these disparate data sets, which differ in type, volume, and velocity.
PO-324 : Bayesian Methods for Treatment Design in Rare Diseases
Ruohan Wang, Ms.
During the last thirty years, Bayesian methods have developed rapidly along with the explosive growth of high-speed computers. Bayesian methods are quite useful, and their results are easier to interpret through graphical displays of modeled treatment effects. In rare diseases, we often do not have access to samples large enough for a parametric data analysis. Under these circumstances, Bayesian methods may offer a more flexible framework for modeling treatments for rare diseases. In this article, section I introduces the challenges of rare diseases. Section II introduces Bayesian methods and how to use them for treatment prediction in rare diseases, and section III shows Bayesian examples. The BAYES statement in SAS can be used to address small-sample problems in these diseases. PROC PHREG combines the likelihood with different assumed prior distributions, and based on the results we can judge whether variables should be included when analyzing the efficacy of the medicines.
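The abstract's examples use the BAYES statement with PROC PHREG. A simpler, swapped-in illustration of why Bayesian updating suits small samples is the conjugate beta-binomial model for a response rate, sketched in Python below. The prior, the data, and the threshold are all hypothetical.

```python
import math


def beta_binomial_update(alpha, beta, responders, n):
    """Posterior Beta parameters after observing `responders` out of `n` patients."""
    return alpha + responders, beta + (n - responders)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def prob_rate_exceeds(alpha, beta, threshold):
    """P(rate > threshold) under Beta(alpha, beta), for integer parameters only.

    Uses the identity P(X <= x) = P(Y >= alpha) with Y ~ Binomial(alpha + beta - 1, x).
    """
    n = alpha + beta - 1
    p_le = sum(math.comb(n, k) * threshold**k * (1 - threshold)**(n - k)
               for k in range(alpha, n + 1))
    return 1 - p_le
```

For example, a flat Beta(1, 1) prior and 3 responders out of 10 give a Beta(4, 8) posterior with mean 1/3. The probability that the true rate exceeds 20% is about 0.84, a direct statement that remains interpretable even with 10 patients.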
Programming Techniques
BP-029 : Lit Value Locator
Varunraj Khole, Thomas Jefferson University
This paper describes a method to locate any alphanumeric text string, in character or numeric format, within any dataset in the default as well as user-defined SAS libraries. Oftentimes, data comes to us in different formats and is stored in multiple locations. To ensure that all the required data is captured, a user needs smart code that can find the required data and report its location. This method is very useful when searching for a particular data string across multiple SAS libraries, and it can be an essential tool for a data analyst working in finance, pharmaceuticals, life sciences, healthcare, banking, and various other industries. By combining the powerful SAS macro language, procedures, efficient DO loops, and a handful of extremely useful SAS functions, this method provides the user with a detailed description of the search value along with summary statistics for it. It demonstrates that an extremely complicated task can be achieved with very simple and efficient code.
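The search itself is a nested loop over libraries, data sets, rows, and columns. The following Python sketch shows that logic under stated assumptions; it is not the paper's SAS macro. Libraries are modeled as a hypothetical `{libname: {dataset: [row dicts]}}` structure, and numeric values are matched through their plain string form, which ignores any SAS display format.

```python
def locate_value(libraries, target, case_sensitive=False):
    """Find every (library, dataset, row_index, column) whose value contains `target`.

    libraries: {libname: {dataset_name: list of row dicts}} (hypothetical layout).
    Numeric values are compared through str(value), an assumption about formatting.
    """
    needle = target if case_sensitive else target.lower()
    hits = []
    for lib, datasets in libraries.items():
        for ds, rows in datasets.items():
            for i, row in enumerate(rows):
                for col, value in row.items():
                    text = "" if value is None else str(value)
                    if not case_sensitive:
                        text = text.lower()
                    if needle in text:
                        hits.append((lib, ds, i, col))
    return hits
```

Counting the hits by library and data set then gives the kind of summary statistics the abstract mentions.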
BP-036 : Reducing the space requirements of SAS® data sets without sacrificing any variables or observations
Stephen Sloan, Accenture
The efficient use of space is very important when working with large SAS data sets, which could have millions of observations and hundreds of variables. We are often constrained to fit the data sets into a fixed amount of available space. Many SAS data sets are created by importing Excel or Oracle data sets or delimited text files and the default length of the variables can be much larger than necessary. When the data sets don't fit into the available space, we sometimes need to make choices about which variables and observations to keep, which files to zip, and which data sets to delete and recreate later. There are things that we can do to make the SAS data sets more compact and use our space more efficiently. They can be done in a way that allows us to keep all the desired data sets without sacrificing variables or observations. SAS has compression algorithms that can shrink the space of the entire data set. In addition, there are tests that we can run that allow us to shrink the length of different variables and evaluate whether they are more efficiently stored as numeric or as character variables. These techniques often save a significant amount of space; sometimes as much as 90% of the original space is recouped. We can use macros so that data sets with large numbers of variables can have their space reduced by applying the above tests to all the variables in an automated fashion.
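One of the length tests described above amounts to measuring the longest value actually stored in each character variable. A minimal Python sketch of that measurement is shown below; it is not the paper's macro. It assumes rows are dicts, measures byte length in a chosen encoding after dropping trailing blanks, and never returns less than 1, since a SAS character variable needs a length of at least 1.

```python
def minimal_char_lengths(rows, columns, encoding="utf-8"):
    """Smallest byte length that holds every value of each character column."""
    lengths = {}
    for col in columns:
        longest = 0
        for row in rows:
            value = row.get(col)
            if value is not None:
                # Trailing blanks are padding, so they do not count toward the length.
                longest = max(longest, len(str(value).rstrip().encode(encoding)))
        lengths[col] = max(longest, 1)
    return lengths
```

The resulting lengths could drive generated LENGTH statements for every variable in a data set, which is where automating the process across hundreds of variables pays off.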
BP-039 : It's All About the Base-Procedures
Jane Eslinger, SAS Institute
As a Base SAS® programmer, you spend your day manipulating data and creating reports. You know there is a procedure that can give you what you want. As a matter of fact, there is probably more than one procedure to accomplish the task. Which one should you use? How do you remember which procedure is best for which task? This paper is all about the Base procedures. It explores the strengths of the commonly used, nongraphing procedures. It discusses the challenges of using each procedure and compares it to other procedures that accomplish similar tasks. The first section of the paper looks at utility procedures that gather and structure data: APPEND, COMPARE, CONTENTS, DATASETS, FORMAT, SORT, SQL, and TRANSPOSE. The next section discusses the Base SAS procedures that work with statistics: FREQ, MEANS/SUMMARY, and UNIVARIATE. The final section provides information about reporting procedures: PRINT, REPORT, and TABULATE.
BP-040 : Running Parts of a Program while Preserving the Entire Program
Stephen Sloan, Accenture
The Challenge: We have long programs that accomplish a number of different objectives. We often want to run only parts of the programs while preserving the entire programs for documentation or future use. Some of the reasons for selectively running parts of a program are: (1) part of it has already run and the program timed out or encountered an unexpected error, and because it takes a long time to run, we don't want to re-run the parts that ran successfully; (2) we don't want to recreate data sets that were already created, which can take a considerable amount of time and resources and can also occupy additional space while the data sets are being created; (3) we currently need only some of the results from the program, but we want to preserve the entire program; and (4) we want to test new scenarios that only require subsets of the program.
BP-057 : The Power of PROC FORMAT
Jonas Bilenas, A Bank Near You
Kajal Tahiliani, GSK
The FORMAT procedure in SAS® is a very powerful and productive tool, yet many beginning programmers rarely make use of it. The FORMAT procedure provides a convenient way to do a table lookup in SAS. User-generated FORMATS can be used to assign descriptive labels to data values, create new variables, and find unexpected values. PROC FORMAT can also be used to generate data extracts and to merge data sets. This paper provides an introductory look at PROC FORMAT for the beginning user and provides sample code that illustrates the power of PROC FORMAT in a number of applications. Remember, SQL performs a table join, not a table lookup; a FORMAT table lookup uses a binary search method that is very powerful and more efficient than SQL. Additional examples and applications of PROC FORMAT can be found in the SAS® Press book titled "The Power of PROC FORMAT."
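The binary-search lookup behind a range format can be mimicked with Python's `bisect` module. This sketch only illustrates the lookup idea and does not reproduce PROC FORMAT. The age bands below are hypothetical, written as inclusive lower bounds in the spirit of `0-<18`, `18-<65`, `65-high`, with an "other" label for values outside the ranges.

```python
from bisect import bisect_right

# Hypothetical range "format": inclusive lower bounds and their labels.
BOUNDS = [0, 18, 65]
LABELS = ["Pediatric", "Adult", "Elderly"]


def format_value(value, bounds=BOUNDS, labels=LABELS, other="Unknown"):
    """Binary-search the range containing `value`, like applying a range format."""
    if value is None or value < bounds[0]:
        return other
    # bisect_right finds how many lower bounds are <= value in O(log n) steps.
    return labels[bisect_right(bounds, value) - 1]
```

Each lookup costs a logarithmic number of comparisons in the number of ranges, and no join or sort of the data being labeled is needed.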
BP-061 : Proc Sort Revisited
Alex Chaplin, Bank of America
Proc sort can do more than sort your data. Revisit proc sort to see how you can select records and fields, rename and format fields, compress the output to save space, reuse space in your input dataset, and remove and save off duplicate records and keys. Understand the difference between the nodupkey and noduprecs options and how they affect your results at the aggregate and detail level. The software is Base SAS. The example code I've written to accompany the presentation can run in SAS University Edition or SAS OnDemand for Academics because it uses SASHELP datasets as inputs. The intended audience is beginner to intermediate SAS programmers.
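The nodupkey/noduprecs difference is easiest to see on a tiny example. The Python sketch below models the two options as we understand them; it is not SAS itself. The sort is stable, as with SORT's default EQUALS behavior. NODUPKEY keeps the first record per BY value, and NODUPRECS drops a record only when it is identical to the record written just before it.

```python
def sort_by(rows, key):
    """Stable sort on the BY key (equal keys keep their input order)."""
    return sorted(rows, key=key)


def nodupkey(rows, key):
    """Keep only the first record for each BY-key value."""
    out, seen = [], set()
    for row in sort_by(rows, key):
        k = key(row)
        if k not in seen:
            seen.add(k)
            out.append(row)
    return out


def noduprecs(rows, key):
    """Drop a record only if it is identical to the previous record kept."""
    out = []
    for row in sort_by(rows, key):
        if not out or row != out[-1]:
            out.append(row)
    return out
```

With rows `(1, "a"), (1, "b"), (1, "a")` sorted by the first field, nodupkey keeps one record. Noduprecs keeps all three, because the two identical records are not adjacent after sorting. This is why full-record de-duplication usually sorts by all variables.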
BP-065 : ODS Magic: Using Lesser Known Features of the ODS statement
Michael Stout, Johnson & Johnson Medical Device Companies
It is no illusion: the ODS output destination is a powerful tool that every SAS programmer should know how to use. This paper provides details on lesser-known aspects of ODS that are magical. With a sleight of hand, this paper shows how to send desired SAS output to multiple output destinations. Learn techniques to combine tables, listings, and figures into a single output file. What you thought was impossible may be easy using lesser-known features of ODS.
BP-079 : Performing Analytics on Free-Text Data Fields: A Programer's Wurst Nitemare
Michael Rimler, GlaxoSmithKline
Matt Pitlyk, 1904labs
Case report forms (CRFs) often contain free-text fields for collecting patient information when standard responses do not apply, e.g. 'Other, specify' or 'Reason for ...'. Furthermore, the analysis plan may require analyses to be performed on this non-standardized information. Free-text fields are notoriously difficult to incorporate programmatically into analyses, because human nature injects spelling and grammatical errors and languages differ across global sites. These fields also tend to be very difficult to monitor. Requesting sites to 'modify' content to support cleaner analyses is time consuming, even if the site is responsive and agreeable. This paper compares methodologies for classifying a record into a binary response based on information collected via a free-text field, for example, identifying all collected concomitant medication records taken for Chronic Obstructive Pulmonary Disease (COPD) when 'Reason for Therapy' is collected as free text. Methodologies include typical techniques in SAS (brute-force string search and fuzzy matching), natural language processing (stemming n-grams and Named Entity Recognition), and machine learning (clustering and classification algorithms). The methodologies are compared on (i) degree of code complexity, (ii) incidence of Type I/Type II error, and (iii) the maintenance burden of the code along the study life cycle over iterative data cuts.
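The gap between the first two methodologies is easy to demonstrate. The Python sketch below uses the standard-library `difflib` as a stand-in for SAS fuzzy matching, so it is not the authors' code. It compares an exact substring search with a fuzzy match of word n-grams against the COPD terms. The term list and the 0.85 similarity cutoff are assumptions.

```python
import difflib
import re

TERMS = ["copd", "chronic obstructive pulmonary disease"]  # assumed search terms


def brute_force(text, terms=TERMS):
    """Exact substring search after lower-casing."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def fuzzy(text, terms=TERMS, cutoff=0.85):
    """Match word n-grams (n = words in the term) against each term by similarity ratio."""
    words = re.findall(r"[a-z]+", text.lower())
    for term in terms:
        n = len(term.split())
        for i in range(len(words) - n + 1):
            candidate = " ".join(words[i:i + n])
            if difflib.SequenceMatcher(None, candidate, term).ratio() >= cutoff:
                return True
    return False
```

A misspelled entry such as "chronic obstructive pulmonery disese" is missed by the brute-force search but caught by the fuzzy match. Lowering the cutoff trades such Type II errors for more Type I errors.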
BP-084 : Useful SAS techniques in Efficacy Analysis for Oncology studies
Joy Zeng, Pfizer
Oncology refers to the research on prevention, diagnosis and treatment of cancer. Oncology studies are, in general, more complex than studies in other therapeutic fields. This paper summarizes the primary sources of complexity, including endpoints, data collection, AE reporting, tumor assessment under RECIST, oncology-specific domains, and special statistical analysis for efficacy data. This paper also discusses multiple oncology-related statistical methods (e.g. Cox regression, Kaplan-Meier, log-rank tests) and graphical data representations (e.g. waterfall plots, bar charts, mean standard error plots, spaghetti plots, and forest plots). Finally, the relevant SAS code is given for all of these methods and representations, with the goal of providing statistical programmers the necessary knowledge and tools for creating and validating tables and figures from oncology studies.
BP-105 : ADaM 1.1 Compliant ADEVENT and ADTTE development in a Cardiovascular Study
Chao Su, Merck
Many major endpoints, such as death, stroke, and myocardial infarction, are observed in cardiovascular (CV) studies. Besides these major endpoints, additional detailed information about these events can also be collected at different levels. In CV studies, results from different evaluators, typically investigators and adjudicators, are collected and applied to the same event in analysis reports. This makes it more complicated to build the corresponding ADaM datasets that store these data for analysis. In this paper, a CDISC-compliant Basic Data Structure (BDS) dataset, ADEVENT, is developed to store all collected major endpoints and the corresponding detailed information about these events. This dataset supports a table of concordance between investigators and adjudicators. The time-to-event summary is derived from ADEVENT and stored in the dataset ADTTE. The traceability between ADEVENT and ADTTE is also described and discussed in this paper.
BP-115 : Freq Out - Proc Freq's Quick Answers to Common Questions
Christine McNichol, Covance
Proc freq, true to its name, gives frequency counts as well as other informative statistics, and those frequency values can be output to a dataset. But proc freq can do more than just count. Its ability to provide a unique list and its flexibility to use unsorted data can save both time and keystrokes in a variety of scenarios. Combining these features with the out= option provides another method for the programming arsenal: a way to grab a list of subjects or parameters, investigate a difference, or do a quick comparison. This paper looks at how proc freq and its functionality can help with a quick response to common questions such as: What subjects were included in this count? What and how many subjects/records are impacted by this data issue? What does the data show for these problem subjects in another dataset? Is the data unique by these variables? Though it might not be the obvious choice, a single proc freq can take the place of multiple steps, including proc sorts, data steps, and prints, to answer these questions. Additionally, the output generated by the proc freq method can very easily be exported by rows, columns, or selections to provide clean and clear responses to ad-hoc requests.
BP-124 : A Quick Way to Cross Check Listing Outputs
Shunbing Zhao, Merck & Co.
Listings are very often required to support summary reports in clinical trials. Normally these listings are in RTF format, and each one could be tens or hundreds of pages long. Since a single study could involve dozens of listings, it becomes very challenging to detect potential issues in the listing generation process or to cross-check the summary report against the corresponding listing. This paper presents a convenient way to find out how many subjects are included in each listing, which is a good indicator for cross-checking against summary reports. A macro was developed to go through each listing and produce an overall report, which includes a comprehensive summary tab with information such as the file name, the title of the listing, and the number of subjects in that listing; it also includes a separate tab for each listing that displays the list of subjects involved. Additionally, a hyperlink is built into the file name in the summary tab so that reviewers can navigate to the corresponding list of subjects and dig further if any discrepancy occurs.
BP-127 : Importing EXCEL Data in Different SAS Maintenance Release Versions
Huei-Ling Chen, Merck
Chao-Min Hoe, Merck & Co.
It was noted that when PROC IMPORT is used to convert the same Excel data file to a SAS dataset, the outputs can be inconsistent across computers. The objective of this manuscript is to investigate these inconsistencies and to provide explanations and solutions. Several Excel data files were tested, and the different outcomes were found to be due to the SAS version or the maintenance release on different PCs.
BP-128 : Implementing Laboratory Toxicity Grading for CTCAE Version 5
Keith Shusterman, Reata
Mario Widel, Independent
Laboratory toxicity grading has been an important part of safety reporting since the FDA began accepting electronic data in 1999. Staying up to date with the various CTCAE versions can be a challenge. CTCAE Version 5.0 adds a layer of complexity with new grading criteria dependent on baseline measurements. We will present a practical method for deriving toxicity grades in the SDTM LB domain based on the new CTCAE, as well as reporting toxicity events in an OCCDS dataset derived separately from the BDS dataset with the laboratory findings.
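To show what "grading criteria dependent on baseline measurements" means in code, here is a minimal Python sketch for one test, ALT increased. It is not the authors' SAS method. The thresholds reflect our reading of the CTCAE v5.0 criteria and must be checked against the official CTCAE document before any use. Treating a missing baseline as normal is our own assumption.

```python
def grade_alt_increased(value, uln, baseline=None):
    """Grade an ALT result with baseline-dependent thresholds (CTCAE v5.0 reading; verify).

    Normal baseline:   Grade 1 >ULN-3.0xULN, 2 >3.0-5.0x, 3 >5.0-20.0x, 4 >20.0xULN.
    Abnormal baseline: Grade 1 1.5-3.0x baseline, 2 >3.0-5.0x, 3 >5.0-20.0x, 4 >20.0x.
    A missing baseline is treated as normal (an assumption).
    """
    baseline_normal = baseline is None or baseline <= uln
    ratio = value / uln if baseline_normal else value / baseline
    if ratio > 20.0:
        return 4
    if ratio > 5.0:
        return 3
    if ratio > 3.0:
        return 2
    grade1 = ratio > 1.0 if baseline_normal else ratio >= 1.5
    return 1 if grade1 else 0
```

The same ALT value of 100 U/L, with ULN 40, is Grade 1 for a patient with a normal baseline of 30. It is Grade 0 for a patient whose baseline was already 80, which is exactly the new layer of complexity Version 5 adds.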
BP-132 : Code Generators: Friend or Foe
Janet Stuelpner, SAS
Good code generators are invaluable tools. Or are they? SAS is constantly changing, adding new features and functions and new tools to the tool bag while making the manipulation of data and the creation of tables more efficient. Many code generators exist in SAS. Some are embraced quickly; others are not, as programmers cling to the old methodologies. This presentation shows how code generators have grown over time and what is available now to make the task of programming easier, quicker, and more efficient.
BP-133 : A SAS® Macro to Provide Survival Functions along with Cox Regression Model Efficiently
Chia-Ling Ally Wu, Seattle Genetics
Depending on study design and analysis needs, multiple statistics generated from different SAS® procedures may be needed for time-to-event analyses. This paper describes a SAS® macro that combines those SAS® procedures to generate survival functions and the Cox proportional hazards ratio in a single call, which can help users save coding time and generate customized output quickly and easily. Details of the execution and application of the macro are demonstrated through examples. The macro call returns two different output layouts: optional estimates in rows and independent variables in columns, or conversely, independent variables in rows and estimates in columns. Macro users can choose the optional estimates, such as the number of subjects at risk, the number of events and censored observations, the quartiles of the survival function, the coefficients of predictors, the Cox proportional hazards ratio, and the corresponding p-value. The macro applies the SAS® LIFETEST procedure to compute the survival function and compare groups with the log-rank and Wilcoxon tests, and runs the PHREG procedure based on the Cox proportional hazards regression model to estimate the effect of predictors on the hazard ratio.
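Several of the optional estimates (subjects at risk, event counts, the survival function) come from the product-limit (Kaplan-Meier) estimate. The Python sketch below shows that calculation for illustration only; the macro itself relies on PROC LIFETEST. Following the usual convention, censorings tied with an event time are still counted as at risk at that time.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimates at each distinct event time.

    times: follow-up times; events: 1 = event, 0 = censored.
    Returns a list of (time, n_at_risk, n_events, survival) tuples.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    table = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_censored = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            table.append((t, n_at_risk, n_events, survival))
        n_at_risk -= n_events + n_censored  # everyone at t leaves the risk set
    return table
```

For times 1, 2, 2, 3, 4 with the observations at 2 (second) and 4 censored, the estimate steps to 0.8, 0.6, and 0.3 at times 1, 2, and 3.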
BP-141 : The Knight's Tour in 3-Dimensional Chess
John R Gerlach, Dataceutics, Inc.
Three-dimensional chess uses two or more chess boards such that a chess piece can traverse the several boards according to the rules for that piece. Thus, the knight can remain on the board where it resides, or move one or two steps to a successive board and then make its remaining steps. In three-dimensional chess, the Knight's Tour is a sequence of moves on multiple 8x8 chess boards such that a knight visits each square exactly once. Thus, for three boards, all 192 squares would be visited exactly once. The paper The Knight's Tour in Chess - Implementing a Heuristic Solution (Gerlach 2015) explains a SAS® solution for finding the knight's tour on a single board, starting from any square on the board. This paper discusses several scenarios and solutions for generating the knight's tour in three-dimensional chess.
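The move rule above can be read as: the knight's absolute offsets across (board, row, column) are some ordering of 0, 1 and 2. Under that assumption, this minimal Python sketch generates legal moves and checks whether a candidate path is a complete tour. It does not reproduce the heuristic search from Gerlach (2015), and the function names are hypothetical.

```python
from itertools import permutations, product


def knight_moves(square, n_boards=3, size=8):
    """Legal 3-D knight moves from square = (board, row, col), 0-based.

    Assumes the absolute offsets over (board, row, col) are some ordering
    of (0, 1, 2): a board offset of 0 is the usual 2-D knight move, and a
    board offset of 1 or 2 is followed by the remaining step(s) on the
    destination board.
    """
    b, r, c = square
    moves = set()  # a set removes duplicates produced by signing a 0 offset
    for db, dr, dc in permutations((0, 1, 2)):
        for sb, sr, sc in product((1, -1), repeat=3):
            nb, nr, nc = b + sb * db, r + sr * dr, c + sc * dc
            if 0 <= nb < n_boards and 0 <= nr < size and 0 <= nc < size:
                moves.add((nb, nr, nc))
    return sorted(moves)


def is_valid_tour(path, n_boards=3, size=8):
    """True if path visits every square exactly once using legal moves."""
    if len(path) != n_boards * size * size or len(set(path)) != len(path):
        return False
    return all(nxt in knight_moves(cur, n_boards, size)
               for cur, nxt in zip(path, path[1:]))
```

With the defaults, a complete tour must contain 3 * 8 * 8 = 192 squares, matching the count given above.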
BP-147 : Patient Profile with Color-Coded Track Changes Since Last Review
Himanshu Patel, Merck & Co.
Jeff Xia, Merck
A patient profile is a summary of the events experienced by a patient during the conduct of a clinical trial. It gives a clear view of patient encounters and supports a more focused review by helping to recognize abnormalities in the data. To ensure patient safety and monitor significant clinical events, study clinical scientists are responsible for reviewing subject patient profiles periodically during the conduct of the trial. However, the patient profiles generated in the existing data management system display subject data cumulatively, which means clinical scientists have to review many data records again even if they have already reviewed them in previous rounds. It becomes an increasingly time-consuming and attention-demanding task for clinical scientists to find the information of interest, such as newly emerging data, data altered or updated since the last round of review, or records that have been removed in the data cleaning process. This paper presents an innovative way to compare the current data extraction against the previous one programmatically and to generate patient profiles with color-coded changes. Records with no change since the last round of review have the entire row set to Cyan, to let clinical scientists know there is no need to review them again. New records are set to Yellow to draw reviewers' attention, and removed records are set to Green.
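The comparison step can be sketched as a keyed diff of two extractions. This minimal Python version assumes each record has a stable key (for example a hypothetical (subject, domain, sequence) tuple) and compares the remaining field values as a tuple. The row colors are the ones named in the abstract; records whose values changed are classified but left uncolored, because the abstract does not state their color.

```python
def classify_records(previous, current):
    """Classify each record as new, unchanged, changed or removed.

    previous, current -- dicts mapping a record key to a tuple of values
    """
    status = {}
    for key, values in current.items():
        if key not in previous:
            status[key] = "new"
        elif previous[key] == values:
            status[key] = "unchanged"
        else:
            status[key] = "changed"
    # Keys present only in the earlier extraction were removed.
    for key in previous.keys() - current.keys():
        status[key] = "removed"
    return status


# Row colors named in the abstract; none is given for changed records.
ROW_COLOR = {"unchanged": "Cyan", "new": "Yellow", "removed": "Green"}
```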
BP-166 : MACRO TRIO IN SAS - work with ease Presenting..SANTA MACRO
Nagaraju Mancha, Ephicacy Life Sciences Analytics Pvt Ltd
Jingle bells! Jingle bells! Jingle all the way - automation is the talk, and yes, SANTA MACRO is on the way! The programming world is aware of the power that macros yield in daily work. The magic wand of macros automates repeated tasks, and SAS provides a great platform for creating user-friendly macros. This paper illustrates three amazing macros:
1. Comparing datasets of two libraries and creating a report. Validation teams must validate datasets by comparing validation datasets with production datasets to identify issues or updates in the production datasets. Comparing individual datasets might be easy, but comparing all datasets at once is a challenge. SANTA MACRO helps improve data quality and trace updates by comparing the datasets of two libraries. The output report of this program contains information on variable, structural, label and data differences.
2. Compressing the datasets ready to be uploaded, with an auto-timer. Uploading files manually can lead to issues such as a file being missed through human error. SANTA MACRO helps upload the programmed datasets to any server location, including SharePoint, PINNACLE 21, etc., thus reducing manual errors.
3. Creating a folder and backing up the datasets. Programmers executing programs to create datasets might end up overwriting the existing datasets. SANTA MACRO helps create a folder with a datetime stamp and place the datasets in that folder for future reference. There is a back-up!
BP-175 : Create publication-ready variable summary table using SAS macro
Geliang Gan, Yale Center for Analytical Sciences
A variable summary table is an important tool, not only for getting to know the data but also for helping data management by picking up outlying data points. Even though SAS provides all the procedures needed, making one is not a fun task: creating a variable summary table involves numerous calls to certain SAS procedures. The purpose of this paper is to share a practical way of automatically generating a variable summary table with p-values using a carefully designed, easy-to-use SAS macro. Before the macro calls any SAS procedure and performs calculations, a series of foolproof checks on the data and parameter assignments are run to ensure successful macro execution. After that, the macro uses the procedures necessary to produce and collect all information needed according to the parameter values specified. By default, frequencies and percentages are tallied for categorical variables, and a Chi-square or Fisher's exact test is performed for p-values. For continuous variables requesting p-values with a parametric method, the mean and standard deviation are computed, and a t-test or ANOVA is conducted for p-values. For continuous variables requesting p-values with a non-parametric method, the median and interquartile range are provided, and p-values are calculated using the Wilcoxon rank sum or Kruskal-Wallis test. The final report, a publication-ready summary table, is presented in HTML format in the SAS internal web browser or as a Rich Text Format (RTF) file, which usually opens in Microsoft Word. The macro call, parameter value assignments and tips for special purposes are discussed in detail with examples.
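The default choice of statistics and tests can be expressed as a small decision function. Two details are assumptions not stated in the abstract: a t-test or Wilcoxon rank sum test for two groups versus ANOVA or Kruskal-Wallis for more than two, and a common rule of thumb that switches to Fisher's exact test when more than 20% of expected cell counts are below 5. The function name and parameters are hypothetical.

```python
def default_summary(var_type, n_groups, method="parametric", expected_counts=None):
    """Return (summary statistics, test) for one variable.

    var_type        -- "categorical" or "continuous"
    n_groups        -- number of comparison groups
    method          -- "parametric" or "nonparametric" (continuous only)
    expected_counts -- expected cell counts (categorical only, optional)
    """
    if n_groups < 2:
        raise ValueError("p-values need at least two groups")
    if var_type == "categorical":
        # Assumed rule of thumb: Fisher's exact test when more than 20%
        # of expected counts are below 5; Chi-square otherwise.
        counts = expected_counts or []
        small = sum(1 for e in counts if e < 5)
        use_fisher = bool(counts) and small > 0.2 * len(counts)
        return ("n (%)", "Fisher's exact" if use_fisher else "Chi-square")
    if var_type == "continuous":
        if method == "parametric":
            return ("mean (SD)", "t-test" if n_groups == 2 else "ANOVA")
        if method == "nonparametric":
            test = "Wilcoxon rank sum" if n_groups == 2 else "Kruskal-Wallis"
            return ("median (IQR)", test)
    raise ValueError(f"unsupported combination: {var_type!r}, {method!r}")
```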
BP-181 : Sum Fun with Flags! Sum Any Flagged Occurrence Data with FLAGSUM and Report It with FLAGRPT or PROC REPORT.
Brendan Bartley, Harvard T.H. Chan School of Public Health
In 2016, CDISC created the Occurrence Data Structure IG, which covers domains other than Adverse Events (AE), although AE is the most popular in the PharmaSUG arena. Why leave the other domains out when a similar table is needed for reporting them? The same set of flags (AOCCFL, AOCCSFL, AOCCPFL, AOCCIFL, AOCCSIFL, and AOCCPIFL) might be created for the ADaM dataset equivalents of these other domains (CE, MH, CM, etc.). If the output table is going to be similar for each, why code a macro specifically for one domain? Let's be flexible. In our group, there are other studies that do not have CDISC data but will need to adapt to this model in the future. This is a way to learn the process while using the data from their study. FLAGSUM and FLAGRPT can help.
BP-182 : Compare and conquer SDTM coding
Phaneendhar Gondesi, TechData Service Company LLC
Appropriate reporting of raw data in a CDISC-compliant format is very important in a submission. Mapping and coding SDTM domains can be very challenging, especially for a new SDTM programmer. This paper aims to ease SDTM mapping and coding by a) giving a general layout of overall SDTM domain programming, b) identifying common trends for coding within each class, and c) making a broad coding comparison between classes.
BP-194 : Macro Templates - Industry Specific SAS Programming Standardization
Tabassum Ambia, PPD (Pharmaceutical Product Development)
Companies across the industry are increasingly focusing on developing their own sets of SAS macros for datasets (SDTM, ADaM) and Tables, Listings, and Figures (TLFs), and advancing toward their own standardization. The practice of using a specific standard set of macros has a significant impact on timelines, the amount and nature of resources required, the expertise of the programmers using the macros, debugging time and the quality of output. Standardized macro templates have pros and cons. They automate the process, which greatly cuts down the effort and time required for programming, helps complete a huge amount of work within a short time frame, keeps outputs consistent and reduces probable programming errors. This highly demanding trend toward automation also has its downsides: a lack of flexibility for outputs that must be created outside the standard templates already defined within the industry, the time required to debug errors in the background when extensive changes are required, almost mandatory training for new programmers because the templates are industry specific, and, after years of depending entirely on these macros, limits on the logical thinking and open coding approach a programmer would otherwise apply to create outputs independently and remain open to a wide range of programming needs. This paper discusses different aspects of the extensive use of industry-standardized macros.
BP-219 : Making the Days Count: Counting Distinct Days in Overlapping or Disjoint Date Intervals
Noory Kim, Synteract
This paper presents an approach to, and an implementation of, counting the number of distinct days spanned by multiple date intervals that collectively have gaps or overlaps, and discusses how this implementation is more concise and easier to follow than other implementations presented in the past. One case where this count was necessary was reporting the number of days during a study on which a subject was also on standard-of-care treatments in addition to study medication.
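One common way to count distinct days is to sort the intervals, merge the overlapping ones, and sum the lengths of the merged spans. The Python sketch below assumes inclusive start and end dates; it illustrates the general technique and is not necessarily the implementation the paper presents. The function name is hypothetical.

```python
from datetime import date


def count_distinct_days(intervals):
    """Count calendar days covered by at least one (start, end) interval.

    Both endpoints are inclusive, so a one-day interval has start == end.
    Overlapping intervals are merged so shared days are counted once;
    gaps between intervals contribute nothing.
    """
    total = 0
    span_start = span_end = None
    for start, end in sorted(intervals):
        if end < start:
            raise ValueError(f"interval ends before it starts: {start} to {end}")
        if span_end is not None and start <= span_end:
            # Overlaps the current merged span: extend it if needed.
            span_end = max(span_end, end)
        else:
            # First interval, or a gap: close out the previous span.
            if span_end is not None:
                total += (span_end - span_start).days + 1
            span_start, span_end = start, end
    if span_end is not None:
        total += (span_end - span_start).days + 1
    return total


if __name__ == "__main__":
    meds = [(date(2019, 1, 1), date(2019, 1, 10)),
            (date(2019, 1, 5), date(2019, 1, 12)),
            (date(2019, 2, 1), date(2019, 2, 1))]
    print(count_distinct_days(meds))
```

Adjacent intervals (one ending the day before the next starts) are counted as separate spans, which gives the same total as merging them.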
BP-227 : From Lesion size to Best Response - Implementing RECIST through programming
Ankit Pathak, Rang Technologies
RECIST (Response Evaluation Criteria in Solid Tumors) serves as guidance for assessing tumor shrinkage and disease progression, an important endpoint in many oncology clinical trials. Investigators, cooperative groups, industry, and government authorities use RECIST, first published in 2000. The current revised version is RECIST v1.1, published in 2008 (Eisenhauer E.A. et al.). This paper looks at RECIST v1.1 from a programmer's perspective to derive a subject's Best Overall Response from lesion data collected across multiple visits in an ongoing study, using SAS as the programming language.
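As a simplified illustration of the target-lesion part of RECIST v1.1, the sketch below classifies each follow-up visit from the sum of target-lesion diameters: PD for an increase of at least 20% and at least 5 mm over the nadir (the smallest sum so far, including baseline), PR for a decrease of at least 30% from baseline, CR when all target lesions have disappeared, and SD otherwise. It deliberately ignores lymph-node short-axis rules, non-target and new lesions, and response confirmation, all of which a real Best Overall Response derivation needs. The function name is hypothetical.

```python
def target_lesion_responses(sums_mm):
    """Classify each post-baseline visit from sums of target-lesion diameters.

    sums_mm[0] is the baseline sum; later entries are follow-up visits.
    Simplified target-lesion rules only (see the caveats in the text).
    """
    baseline = sums_mm[0]
    nadir = baseline  # smallest sum observed so far, including baseline
    responses = []
    for current in sums_mm[1:]:
        increase = current - nadir
        if current == 0:
            responses.append("CR")
        elif 5 * increase >= nadir and increase >= 5:
            # >= 20% relative and >= 5 mm absolute increase over the nadir
            responses.append("PD")
        elif 10 * (baseline - current) >= 3 * baseline:
            # >= 30% decrease from baseline
            responses.append("PR")
        else:
            responses.append("SD")
        nadir = min(nadir, current)
    return responses
```

The thresholds are checked with multiplication rather than percentages so that changes of exactly 20% or 30% are not misclassified by floating-point rounding.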
BP-255 : End of Computing Chores with Automation: SAS© Techniques That Generate SAS© Codes
Yun (Julie) Zhuo, PRA Health Sciences
SAS programmers often have to confront computing chores that require repetitive typing. Manual coding is time-consuming and inefficient, and it almost inevitably introduces human errors that compromise quality. While SAS has no shortage of automation techniques, it does not always occur to SAS programmers to write SAS code that generates SAS code. In this paper, we focus on three code-generating techniques. Each technique is demonstrated with a simple practical example: automating the creation of variable labels. Code samples are provided. This paper also provides programming tips, explores other applications, and compares the three techniques on simplicity, flexibility, and efficiency.
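The idea of code that writes code is language-agnostic. As a hedged sketch, this Python function generates a PROC DATASETS step that applies variable labels from a name-to-label mapping. The paper's three techniques are SAS-based and are not reproduced here, and the function name is hypothetical. Embedded double quotes are doubled, which is how SAS escapes them inside a double-quoted string.

```python
def generate_label_code(libref, member, labels):
    """Return SAS PROC DATASETS text that applies variable labels.

    libref -- SAS library reference, e.g. "work"
    member -- dataset name within the library
    labels -- dict mapping variable name -> label text
    """
    lines = [f"proc datasets lib={libref} nolist;",
             f"  modify {member};",
             "    label"]
    for var, text in labels.items():
        escaped = text.replace('"', '""')  # SAS doubles embedded quotes
        lines.append(f'      {var} = "{escaped}"')
    lines.append("    ;")
    lines.append("quit;")
    return "\n".join(lines)
```

Labels inside double quotes are still subject to macro resolution in SAS, so text containing `&` or `%` may need single quotes instead.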
BP-260 : SMQ SAS Dataset Macro
Mi Young Kwon, Regeneron, Inc.
Ishan Shah, PRA Health Sciences
A Standardised MedDRA Query (SMQ) is a grouping of terms from one or more SOCs that relate to a defined medical condition or area of interest. SMQs are created to standardize the identification and retrieval of safety data. SMQs are part of each new MedDRA release, which is maintained by the MSSO and JMO, and correspond to the terms present in that version of MedDRA. SMQs have been applied in safety and medical reviews, focused searches, signal detection, case alerts, and periodic updates for clinical trials and post-marketing analyses and reports. With each new MedDRA release, the MedDRA and SMQ ASCII files are downloaded from the MedDRA dictionary. These ASCII files have a well-defined hierarchical structure. However, SMQs with child SMQs may not link directly to Preferred Term (PT)/Lower Level Term (LLT) codes, so these files cannot be used directly in SAS programming for queries. It is necessary to convert the original SMQ files to SAS datasets with a user-friendly structure. The SAS macro introduced in this paper converts SMQ ASCII files to SAS datasets. The SMQ SAS dataset is restructured so that parent and child SMQs are directly linked to their corresponding PT/LLT codes. The macro produces a single SAS dataset including all SMQ definitions for one MedDRA version. The SMQ dataset will be used to support medical and safety reviews, to generate CDISC-compliant datasets, and to support clinical study analysis.
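The restructuring step amounts to resolving each parent SMQ through its child SMQs down to term codes. This Python sketch assumes a simplified, hypothetical input layout (each SMQ code maps to a list of (code, level) pairs, with level "SMQ" marking a child SMQ) rather than the actual MedDRA ASCII file format, and it ignores term scope and status. The function name is hypothetical.

```python
def flatten_smq(smq_content):
    """Resolve every SMQ to the PT/LLT codes it contains, directly or
    through child SMQs.

    smq_content -- dict: smq_code -> list of (code, level) pairs, where
                   level is "SMQ" for a child SMQ and "PT" or "LLT" for terms
    Returns a dict: smq_code -> sorted list of (code, level) term pairs.
    """
    resolved = {}

    def resolve(smq, path):
        if smq in resolved:
            return resolved[smq]
        if smq in path:
            raise ValueError(f"circular SMQ reference at {smq}")
        terms = set()
        for code, level in smq_content.get(smq, []):
            if level == "SMQ":
                # Pull in everything the child SMQ resolves to.
                terms.update(resolve(code, path | {smq}))
            else:
                terms.add((code, level))
        resolved[smq] = sorted(terms)
        return resolved[smq]

    for smq in smq_content:
        resolve(smq, frozenset())
    return resolved
```

Writing one row per (SMQ, term) pair from the result gives a single long table in which parent and child SMQs both link directly to their terms.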
BP-286 : One More Paper on Dictionary Tables and Yes, I Think it Is Worth Reading
Vladlen Ivanushkin, DataFocus GmbH
Before writing this paper on dictionary tables, I did some research on what was already out there so that I would not duplicate someone else's work. I found quite a number of papers, but I still decided to write my own and to concentrate on how programmers can benefit from using dictionary tables in their everyday work. In this paper I would like to share the tasks I actually faced as a statistical programmer and show how dictionary tables make them much easier to deal with. They cover quite a variety, from creating macros to programming SDTM domains.
BP-289 : Let Your Log do the Work for You
Yuliia Bahatska, inVentiv Health Clinical
Vladlen Ivanushkin, DataFocus GmbH
Of course, there is no magic trick that will completely free you from writing code; at least, the authors of this paper are not aware of one. However, the tips we suggest can help programmers avoid struggling through technical details and let them concentrate on more sophisticated tasks that require a deep understanding of the subtleties of programming. In this paper we show different scenarios in which information such as ODS table names or template specifics can be written to the log with a couple of lines of code. This means that you can use that part of your brain's CPU to learn something else.
BP-301 : Practical Proc SQL for Clinical SAS Programmers
Anbu Damodaran, Covance
The vast majority of clinical SAS programmers avoid or rarely use Proc SQL in their programs. Some of them dream of adding Proc SQL to their repertoire of tools. They buy the most recent book on Proc SQL, open it, and get excited that they now possess the latest gospel on the subject. However, after spending a few days looking over several examples, they inevitably shelve it alongside all their other unread programming books. Others prefer to read every Proc SQL paper they can find on the internet, but the result is the same: most of them still rarely use Proc SQL in their programs. By offering Proc SQL scripts that are tailor-made to fit programmers' daily programming activities, this paper makes Proc SQL easy to grasp and extremely practical for clinical SAS programmers. The recipes in this paper run the gamut from simple to complicated. This paper will not hold the programmer's hand and walk him or her through the language, but it will provide the reader with interesting and useful resources to become better acquainted with the language and further upgrade his or her skill set.
BP-302 : History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies
Mark Keintz, Wharton Research Data Services
Many programming tasks require merging time series of varying frequency. For instance, you might have three datasets (YEAR, QTR, and MONTH), each with its eponymous frequency and sorted by common ID and date variables. Producing a monthly file with the most recent quarterly and yearly data is a hierarchical last-observation-carried-forward (LOCF) task. Or you may have three irregular time series (ADMISSIONS, SERVICES, TESTRESULTS) in which you want to capture the latest data from each source at every date encountered (event-based LOCF). This presentation shows how to use conditional SET statements to update specific portions of the program data vector (i.e., the YEAR variables or the QTR variables) to carry low-frequency data forward to multiple subsequent high-frequency records. A similar approach works just as well for carrying forward data from irregular time series. We'll also show how to use "sentinel variables" to control the maximum length of time data is carried forward, i.e., how to remove historical data that has become "stale." Finally, we will demonstrate how to modify these techniques to carry future observations backward, without re-sorting the data.
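The hierarchical LOCF idea can be sketched outside the DATA step as an "as of" lookup: for each high-frequency record, take the latest low-frequency record for the same ID dated on or before it, optionally discarding it when it is older than a staleness limit (playing the role of the sentinel variables). This Python sketch illustrates the concept only; it is not the conditional-SET technique the presentation teaches, and the names are hypothetical.

```python
from bisect import bisect_right
from datetime import date, timedelta


def carry_forward(high_freq, low_freq, max_age=None):
    """Attach the latest low-frequency payload to each high-frequency record.

    high_freq -- list of (id, date, payload) records, e.g. monthly data
    low_freq  -- list of (id, date, payload) records, e.g. quarterly data
    max_age   -- optional timedelta; older low-frequency data is "stale"
    Returns a list of (id, date, payload, carried_payload_or_None).
    """
    # Index the low-frequency series by id, with dates in ascending order.
    by_id = {}
    for key, when, payload in sorted(low_freq, key=lambda rec: (rec[0], rec[1])):
        dates, payloads = by_id.setdefault(key, ([], []))
        dates.append(when)
        payloads.append(payload)

    out = []
    for key, when, payload in high_freq:
        dates, payloads = by_id.get(key, ([], []))
        # Number of low-frequency dates on or before `when`.
        i = bisect_right(dates, when)
        carried = None
        if i and (max_age is None or when - dates[i - 1] <= max_age):
            carried = payloads[i - 1]
        out.append((key, when, payload, carried))
    return out


if __name__ == "__main__":
    monthly = [("A", date(2018, 1, 31), 1.0), ("A", date(2018, 4, 30), 2.0)]
    quarterly = [("A", date(2017, 12, 31), "Q4"), ("A", date(2018, 3, 31), "Q1")]
    for row in carry_forward(monthly, quarterly, max_age=timedelta(days=30)):
        print(row)
```

For the YEAR/QTR/MONTH case, the function would be applied once with the quarterly data and once with the yearly data. Carrying future data backward is the mirror image: `bisect_left` finds the first low-frequency date on or after each record.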
BP-307 : An Overview of Three New Output Delivery System Procedures in SAS® 9.4: ODSTABLE, ODSLIST and ODSTEXT
Lynn Mullins, PPD
The SAS® Output Delivery System (ODS) enables programmers to create and manipulate predefined ODS objects in a DATA step to create highly customized output. ODS gives you great flexibility in generating, storing, and reproducing SAS® procedure and DATA step output, with a wide range of formatting options. You can use ODS to accomplish the following tasks:
- Create reports for viewers or browsers
- Customize the report contents
- Customize the presentation
- Create more accessible SAS output
By default, ODS output is formatted according to instructions that a PROC step or DATA step defines. However, ODS provides ways for you to customize the output. You can customize a single table or graph, or the style for all your output. SAS® 9.4 contains many ODS enhancements, among them three new ODS procedures:
- PROC ODSTABLE
- PROC ODSLIST
- PROC ODSTEXT
These new ODS procedures allow for the creation of specific types of output. You can create your own tabular output templates with ODSTABLE, ODSLIST creates bulleted lists, and ODSTEXT can be used to create text block templates to generate lists and paragraphs for your output. This paper discusses these three new ODS procedures in SAS® 9.4 and gives examples of how to use them.
BP-315 : Using the PRXCHANGE Function to Remove Dictionary Code Values from the Coded Text Terms
Lynn Mullins, PPD
SAS® Perl regular expression (PRX) functions and CALL routines use a modified version of the Perl programming language as a pattern-matching language to parse character strings. You can perform the following tasks:
- Search for a pattern of characters within a string
- Extract a substring from a string
- Search and replace text with other text
- Parse large amounts of text, such as Web logs or other text data
You can write SAS programs that do not use regular expressions and still produce the same results as you do with Perl regular expressions. However, code without regular expressions requires more function calls to handle character positions in a string and to manipulate parts of the string. Perl regular expressions combine most, if not all, of these steps into one expression. The resulting code is less prone to error, easier to maintain, and clearer to read. This paper discusses how to use Perl regular expressions to remove those pesky codes that are sometimes at the end of dictionary-coded text terms.
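The clean-up can be sketched with Python's `re` module. The code format is an assumption, since the abstract does not show it: a run of digits in parentheses or square brackets at the end of the term, such as "HEADACHE (12345678)" (a made-up code). In SAS, a roughly equivalent call, not tested here, would be `term = prxchange('s/\s*[(\[]\d+[)\]]\s*$//', -1, term);`. The function name below is hypothetical.

```python
import re

# Assumed format: digits in parentheses or square brackets, optionally
# surrounded by whitespace, at the very end of the coded term.
TRAILING_CODE = re.compile(r"\s*[(\[]\d+[)\]]\s*$")


def strip_code(term):
    """Remove a trailing bracketed numeric code from a coded text term."""
    return TRAILING_CODE.sub("", term)
```

Because the pattern is anchored with `$`, digits elsewhere in the term (as in "TYPE 2 DIABETES") are left alone.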
BP-340 : Programmatically mapping source variables to output SDTM variables based upon entries in a standard specifications Excel file
Frederick Cieri, CSG
Rama Arja, MedImmune
Zev Kahn, CSG
Ramesh Karuppusamy, Theorem Clinical Research
Based on programmatically reading the entries in the standard worksheets of an SDTM (Study Data Tabulation Model) specifications Excel workbook, this paper details a macro application that maps source variables to output SDTM variables. Once users learn how to fill out the Excel file properly, the macro should reduce errors and programming time by directly implementing the mappings of source variables to output variables. To create the output variables, the macro first reads the Excel worksheets to create the metadata of the variable mappings. With the variable metadata, the macro reads the raw source data files, maps raw source variables to output variables by variable rename or format transformation, and creates an output dataset of merged and appended raw source data files. For variable traceability, every raw source variable is kept in the output dataset, under its original name if the variable is not mapped, or under its remapped name plus an added variable containing the original values if a format transformation occurs. For dataset traceability, a variable is added to the output dataset identifying the source dataset or datasets used to create each row. With the Excel file variable entries, the macro should be able to create about 50% or more of the output variables.
Real World Evidence
RW-179 : Using Real-World Evidence to Affect the Opioid Crisis
Sherrine Eid, SAS Institute
Andrea Coombs, SAS Institute
Introduction: The opioid crisis is growing daily. Overreliance on opioids for pain management has led to the worst drug crisis in American history. Of the nearly 64,000 American deaths in 2016 due to drug overdoses, nearly two-thirds (66%) involved a prescription or illicit opioid. The CDC estimates that the total economic burden of prescription opioid misuse in the US is $78.5 billion a year, including the costs of health care, lost productivity, addiction treatment, and criminal justice involvement. Prevention and access to treatment for opioid addiction and to overdose reversal drugs are critical to fighting this epidemic. Primary care settings have increasingly become a gateway to better care for individuals with both behavioral health (including substance use) and primary care needs. To prevent new opioid use disorder cases, the Center for Drug Evaluation and Research has approved ongoing expansion of opioid Risk Evaluation and Mitigation Strategies and education about appropriate pain management. It is evaluating the benefits and risks of currently approved opioids and additional methods to improve prescribing practices.
Methods: Over 100,000 deidentified, state-level patient claims records were analyzed to assess the likelihood of an opioid complication after receiving a prescription for opioids.
Results: Our study showed that patients who have a behavioral health diagnosis are almost 55 times more likely to have an opioid complication if prescribed opioids than patients without such a diagnosis (OR=54.79, CI: [16.50, 339.14]).
Conclusion: Electronic medical records should identify patients who had a previous behavioral health diagnosis so that they can receive alternative therapies to opioids for pain management.
RW-192 : Stratified COX Regression: Five-year follow-up of attrition risk among HIV positive adults, Bamako
Mamadou Dakouo, DATASTEPS
Kriss Harris, SAS Specialists Ltd.
Seydou Moussa COULIBALY Coulibaly, HOSPITAL
This study evaluated the association between longer distance to hospital and the attrition rate (loss to follow-up and death) in a cohort of HIV-positive adults initiating HAART at the University Teaching Hospital, Point G, Bamako. We included all patients who initiated HAART at the University Teaching Hospital between July 19, 2004 and July 31, 2009. Patients were considered to be in attrition if they did not show up for consultation within 90 days of their expected visit date. The stratified Cox model is a modification of the Cox proportional hazards (PH) model that is appropriate when a predictor does not satisfy the PH assumption. Our predictor does not satisfy the PH assumption; thus, the stratified Cox model was used to estimate the risk of attrition among patients living further from the hospital. The analysis was adjusted for age, sex, profession, and coinfection. All analyses were performed using SAS V.9.4 (SAS Institute). Of 3042 patients included, 79.5% experienced attrition. Attrition was highest during the first six months of HAART. More frequent attrition was found among individuals living further from the hospital (reference: Bamako; out of Bamako HR 1.24 [1.11; 1.32]), males (HR 1.21 [1.12; 1.31]), certain professions (reference: civil servant; unemployed HR 1.17 [1.04; 1.32], worker HR 1.39 [1.21; 1.59], other HR 1.15 [1.00; 1.32]), and those aged over 50 (HR 1.17 [1.02; 1.32]). This study detected an increased risk of attrition among patients living further from the hospital.
RW-199 : Patterns of risk factors and drug treatments among Hypertension patients
Youngjin Park, SAS
The common comorbidities of hypertension are heart disease and diabetes. These comorbidities are generally highly correlated. They may also be related through disease progression, with one disease progressing to the other, or through the co-occurrence of the two diseases. Monitoring the characteristics of hypertension patients is a way to prevent the progression of each comorbidity or the co-occurrence of the two diseases. Using SAS/RWE, a cohort of hypertension patients is constructed with an industry-standard episode-of-care definition during a specified time period. Outcome measures such as comorbidities, risk factors, and drugs managing hypertension are created along with the cohort, and we monitor these outcome measures over the specified time period. We use 3 years' worth of administrative claims data from the publicly available CMS SynPUF Medicare data. Incidence rates of comorbidities in hypertension patients are computed by sex and age. Risk factors are analyzed over time for each comorbidity, as are the drugs managing hypertension. We identify unknown data patterns and characteristics associated with belonging to certain trajectory groups. We use the Add-in model feature of SAS/RWE for the statistical analyses. This Add-in model is a template that can be applied generally to any other cohort of interest and any other outcome measures.
RW-232 : Artificial Intelligence and Real World Evidence - it takes two to tango
Charan Kumar Kuyyamudira Janardhana, Ephicacy Lifescience Analytics Pvt. Ltd.
Stakeholders and decision makers are increasingly using real-world evidence (RWE) and technology to solve problems in human health. Real-world data (RWD) and RWE will play a big role in health care decision making. Data sources can include claims data, electronic medical record data, genomics data, imaging data, sensors, wearables and many others. As big data gathered in real-world healthcare settings becomes more prevalent and robust, it is increasingly being used across the entire healthcare system for evidentiary purposes. This data has the potential to help us create better study designs and answer open questions in trial set-up. The results of analytics can further serve as inputs to medical product development. The utility of AI comes from its application to the huge volume of data in the RWE information bank. Natural Language Processing (NLP), an AI tool, can help chart unstructured data and provide contextual meaning. Machine Learning (ML) is being used to search through volumes of data, looking for complex relationships using a library of algorithms. ML will strengthen the way predictive and prescriptive analytics are being transformed by data. Deep learning (DL) will automate the generation of predictive features and have an impact on analyzing data related to image processing, speech recognition and language translation. With innovation in the form of AI coupled with big data, real-world evidence that is more dynamic, appropriate, illustrative, complete and cost-effective can be generated. This paper focuses on areas where AI can be applied to ensure fruitful RWE outcomes.
RW-238 : Innovative Technologies utilization in 21st Novel Clinical Research programs towards Generation of Real World Data.
Srinivasa Rao Mandava, Merck
R&D budgets of the pharmaceutical industry have been increasing year after year, with oncology and metabolic disease drug development as the lead engines. However, the return on R&D investment fell from 10.1% in 2010 to 3.2% in 2017. At the same time, technology utilization is increasing for better handling and for cost and time reduction. The current $60 billion clinical research market is extremely slow and thereby expensive in some ways. Outdated data techniques, confusion and confrontation over eligibility, and high subject drop-out rates are the prime reasons for long trial times of about 10 years. For the last few years, the use of digital technologies across the industry has been increasing, with three focus areas (engage, innovate and execute) in dealing with three key groups (patients, providers and payers). This is key to acquiring more efficient and accurate data collection through all stages of the trial life cycle in the current era of 21st-century novel clinical research. Digital technology allows passive collection of data from a variety of sources, including wearables that measure vitals, physical activity and amount of sleep. In this paper, we will discuss the utilization of digital and other innovative technologies such as AI, ML and blockchain methodologies in trial processes and data articulation in order to achieve the objectives of 21st-century novel clinical research.
RW-310 : Real-world data as real-world evidence: Establishing the meaning of data as a prerequisite to determining secondary-use value
Jennifer Popovic, RTI International
The U.S. Food and Drug Administration (FDA) defines real world data (RWD) as data about patient health status that are routinely generated and collected through a variety of sources, such as through provision of clinical care. Real world evidence (RWE) is defined as the clinical evidence gathered through analysis of RWD. Data cannot become evidence until their meaning and value have been established. This is especially true when making secondary-use of data gathered for other primary purposes, as is often the case for use of RWD. 'Meaning' and 'value' are distinct constructs and should be evaluated as such. 'Meaning' is objective, factual and agnostic to data's use. 'Value' is subjective and situational, pegged to data's intended use. This paper introduces a framework that can be applied to discover and evaluate the meaning of data, by focusing on attributes and questions within five distinct data-related categories: provenance, governance, measurement, quality and validity. This paper provides examples of the application of this framework to both traditional and emerging RWD sources that have been used in a secondary-use manner as evidence or are being explored for their secondary-use potential.
RW-345 : Applications and Their Limitations of Real-World Data in Gene Therapy Trials
Karen Ooms, Quanticate
Approximately 7,000 distinct rare diseases affect 350 million people worldwide, and approximately 80% of those rare diseases are caused by faulty genes. Scientific advances such as the CRISPR/Cas9 genome-engineering system have simplified the pharmaceutical and biotech industry's ability to develop gene therapies, especially for single-gene mutation disorders. The FDA has more than 700 active INDs for gene and cell therapies, and in 2017 it approved two cell-based gene therapies, chimeric antigen receptor T-cells (CAR-T), as well as the first gene-therapy product to be administered in vivo, which was also the first to target a specific rare genetic disease. Collins and Gottlieb, of the NIH and FDA respectively, have stated that 'it seems reasonable to envision a day when gene therapy will be a mainstay of treatment for many diseases'. There are unique challenges associated with gene therapy trials, especially in rare disease indications. These challenges include small patient numbers, lack of detailed knowledge of disease progression, and the definition of suitable endpoints. During the presentation, we will discuss how the analysis of Real-World Data can provide insight and help overcome these challenges, and discuss some of the limitations that reduce its acceptance by the regulatory authorities. We will give careful consideration to the following statistical aspects of a trial:
- definition of the study population, given the likely phenotypic heterogeneity of the disease, based on data from registry or natural history studies of different disease stages/severity
- use of controls, including historical control data
- endpoint choice
- identification and validation of suitable biomarkers for accelerated approval
Reporting and Data Visualization
DV-002 : Order, Order! Four Ways To Reorder Variables with SAS®, Ranked by Elegance and Efficiency.
Louise Hadden, Abt Associates Inc.
SAS® practitioners are frequently required to present variables in an output data file in a particular order, or standards may require variables in a production data file to be in a particular order. This paper and presentation offer several methods for reordering variables in a data file, encompassing both DATA step and procedural methods. Relative efficiency and elegance of the solutions will be discussed.
DV-003 : With a Trace: Making Procedural Output and ODS Output Objects Work For You
Louise Hadden, Abt Associates Inc.
The Output Delivery System (ODS) delivers what used to be printed output in many convenient forms. What most of us don't realize is that "printed output" from procedures (whether the destination is PDF, RTF, or HTML) is the result of SAS® packaging a collection of items that come out of a procedure, which most people want to see in a predefined order (a.k.a. a template). With tools such as ODS TRACE, PROC CONTENTS and PROC PRINT, this paper explores the many buried treasures of procedural output and ODS output objects and demonstrates how to use these objects to get exactly the information that is needed, in exactly the format wanted.
DV-005 : Back to the Future: Heckbert's Labeling Algorithm
Chris Smith, Cytel Inc.
There are several different axis tick mark algorithms in existence. We will discover one of these by time travelling back to 1990 to learn about Heckbert's nice numbers algorithm for labeling graph axes. Then, we will travel back to the future to proactively apply this algorithm to clinical data, using ODS Graphics and DYNAMIC variables from the Graph Template Language (GTL). Lastly, we will explore other uses of DYNAMIC variables to provide data-driven solutions for graphical outputs. SAS® 9.4 M2 was used in the examples presented. This paper is written for intermediate to advanced SAS users; in particular, it assumes familiarity with the SGPLOT procedure and GTL.
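Heckbert's "nice numbers" idea rounds both the data range and the tick spacing to 1, 2, 5, or 10 times a power of ten, so axis labels land on round values. The paper applies it through GTL DYNAMIC variables; the Python sketch below only illustrates the arithmetic of the "loose" labeling variant. The function names and the default of five ticks are illustrative choices of ours, not taken from the paper.

```python
import math


def nice_num(x: float, round_result: bool) -> float:
    """Return a 'nice' number (1, 2, 5 or 10 times a power of ten) close to x."""
    exponent = math.floor(math.log10(x))
    fraction = x / 10 ** exponent  # mantissa in [1, 10)
    if round_result:
        # Round to the nearest nice fraction.
        if fraction < 1.5:
            nice = 1
        elif fraction < 3:
            nice = 2
        elif fraction < 7:
            nice = 5
        else:
            nice = 10
    else:
        # Take the smallest nice fraction that is >= the input (a ceiling).
        if fraction <= 1:
            nice = 1
        elif fraction <= 2:
            nice = 2
        elif fraction <= 5:
            nice = 5
        else:
            nice = 10
    return nice * 10 ** exponent


def loose_label(data_min: float, data_max: float, n_ticks: int = 5) -> list:
    """Tick values whose span covers [data_min, data_max] ('loose' labeling)."""
    if data_max <= data_min:
        raise ValueError("data_max must be greater than data_min")
    span = nice_num(data_max - data_min, round_result=False)
    step = nice_num(span / (n_ticks - 1), round_result=True)
    graph_min = math.floor(data_min / step) * step
    graph_max = math.ceil(data_max / step) * step
    decimals = max(-math.floor(math.log10(step)), 0)  # digits needed in the labels
    count = round((graph_max - graph_min) / step) + 1
    # Rounding removes floating-point noise such as 0.6000000000000001.
    return [round(graph_min + i * step, decimals) for i in range(count)]


print(loose_label(0, 97))  # [0, 20, 40, 60, 80, 100]
```

With five requested ticks, a 0 to 97 range gets a step of 20 and an axis that runs from 0 to 100, which is the kind of round axis the algorithm is meant to produce.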
DV-021 : Applying an Experimental GTL Feature to CONSORT Diagrams
Shane Rosanbalm, Rho, Inc
SAS added an experimental feature to the TEXTPLOT statement in GTL as part of 9.4 M3. When the OUTLINE option was invoked, this experimental feature allowed the user to capture a dataset with information about where the outline was being drawn using the OUTFILE and OUTID options. This paper is about the application of the experimental OUTFILE and OUTID options in an attempt to make the creation of CONSORT diagrams a little less labor intensive.
DV-024 : Free Duplicates Here! Get Your Free Duplicates!
Kristen Harrington, Rho, Inc.
In clinical trials, it is common to produce an overall summary as well as several subset versions of that summary. The overall summary is typically referred to as the "unique" and the subset versions as "duplicates". For traceability, a separate program is required for each individual output; cramming the creation of the unique and duplicates into a single program won't pass muster. The most common approach to this problem is to create a macro, which is called by each of the separate unique and duplicate programs to produce the output files. The brute-force method of creating these separate programs is to copy the unique program manually once for each duplicate to be produced, renaming files and changing the values of macro subsetting parameters. This paper will explore an automated approach to creating the duplicate SAS programs.
DV-090 : Effective Graphical Representation of Tumor Data
In recent months, there has been increasing interest in combining "Duration of Treatment" data with "Tumor Response" information for the subjects in a study in a single graph. Traditionally, this information has been displayed in separate graphs in which the subjects may be sorted by different criteria, so investigators have to work harder to match each subject across the graphs. Displaying the data together, sorted by tumor response, makes this information easier for investigators to understand. This paper will show you how to build effective graphical representations of tumor data using SAS and how these graphs can be extended to display additional subject data.
DV-131 : An Innovative Efficacy Table Programming to Automate Its Figure Generation to Ensure Both High Quality and Efficiency
Xiangchen (Bob) Cui, Alkermes, Inc
Letan (Cleo) Lin, Alkermes Inc.
Efficacy table programming and figure programming are a key part of Statistical Programming in support of the Clinical Study Report (CSR). Typically, efficacy table programming and figure programming are two totally independent processes. However, they share common SAS code to generate statistics for table reporting and figure creation, namely reading the efficacy ADaM datasets to subset the records and select the population, and calling SAS statistical procedures to generate statistics. Hence, consistency between efficacy table programming and figure programming is crucial to achieving quality. Since programming for CSR reporting may undergo many changes until very late in the preparation stage, maintaining this consistency requires a lot of resources. This paper presents a new approach that changes these two parallel processes into a "sequential" process: the efficacy table programs use the "common SAS code" to output extra permanent SAS datasets, which are used directly in figure programming to automate figure creation. "Consistency" is then maintained automatically. Furthermore, the workload for validating figure programming can be dramatically reduced, from double programming to a less resource-intensive process. There is growing recognition that the multiple imputation (MI) method can be used to handle missing values in clinical trials; for such analyses, the approach can dramatically reduce SAS program running time and support the programming team's final delivery, especially key data readouts. We illustrate the approach with examples of forest plots from subgroup analyses to show how it automates the creation of figures efficiently.
DV-164 : Heat Map and Map Chart using TIBCO Spotfire®
Ajay Gupta, PPD Inc
TIBCO Spotfire is an analytics and business intelligence platform that enables interactive data visualization. Users can create heat maps and map charts using built-in functions in Spotfire. The easiest way to understand a heat map is to think of a cross table or spreadsheet that contains colors instead of numbers. The default color gradient sets the lowest value in the heat map to dark blue, the highest value to bright red, and mid-range values to light gray, with a corresponding transition (or gradient) between these extremes. Heat maps are well suited for visualizing large amounts of multi-dimensional data and can be used to identify clusters of rows with similar values, as these are displayed as areas of similar color. Patterns in heat maps are clear because colors are used to display the frequency of observations in each cell of the graph. A map chart can also be useful for showing population density on a world map. This paper will demonstrate some basic heat maps and map charts created using Spotfire.
DV-169 : The Power of Data Visualization in R
Oleksandr Babych, Experis Clinical
The ability to build beautiful and meaningful graphs is a valuable skill for a data analyst. R has become a popular programming language in the field of data analysis. Among its many advantages, two are essential: it is easy to learn, and it has a powerful visualization package, ggplot2. Therefore, learning how to make high-end graphs in R does not require a great amount of time. In this article we will look at how to build graphs with ggplot2 and consider its underlying concept, the Grammar of Graphics. The central idea of ggplot2 is to construct plots by combining different layers on top of each other and to use aesthetic mappings to define the graph. The main objective of this paper is to highlight the advantages of this approach by providing various examples and to demonstrate how R can be a powerful tool in the skill set of any data analyst.
DV-184 : Figure it out! Using significant figures from reported lab data to format TLF output
Elizabeth Thomas, Everest Clinical Research, Inc.
Lauren Williams, Everest Clinical Research, Inc.
Laboratory data comes from a variety of sources with differing levels of precision and often needs to be manipulated and/or transformed before the results are reported in a table, listing, or figure. While reporting precision is well-established for many standard lab parameters, pre-specification of reporting precision for a novel lab value can be challenging. One option is to let the data, as reported, determine the precision. This paper gives a short refresher on the rules of significant figures (sig figs). You will learn how to calculate sig figs given a character-formatted result, how to keep track of sig figs for transformed data (sums, averages, ratios, and transformation by a constant), and how to format a result given an unrounded value and sig fig.
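The two core steps described here, counting significant figures in a character-formatted result and formatting an unrounded value to a given number of significant figures, can be sketched in Python as below. This is an illustration only, not the authors' SAS implementation. Two conventions are our assumptions: trailing zeros in a whole number written without a decimal point (such as "1200") are treated as not significant, and ties are rounded half-up, which is common in clinical reporting. The paper's handling of sums, averages, ratios, and constant transformations is not shown.

```python
from decimal import Decimal, ROUND_HALF_UP


def count_sig_figs(result: str) -> int:
    """Count significant figures in a character-formatted, non-zero lab result.

    Assumption: trailing zeros in a whole number without a decimal point
    (e.g. "1200") are treated as NOT significant.
    """
    mantissa = result.strip().lstrip("+-").lower().split("e")[0]
    if "." in mantissa:
        # Leading zeros never count; trailing zeros after a decimal point do.
        return len(mantissa.replace(".", "").lstrip("0"))
    return len(mantissa.lstrip("0").rstrip("0"))


def format_sig_figs(value: float, sig_figs: int) -> str:
    """Round an unrounded value to sig_figs significant figures (half-up)."""
    d = Decimal(str(value))
    if d == 0:
        # Assumed convention for zero: show sig_figs - 1 decimal places.
        return format(Decimal(0).quantize(Decimal(1).scaleb(1 - sig_figs)), "f")
    decimals = sig_figs - 1 - d.adjusted()  # adjusted() = exponent of the leading digit
    q = d.quantize(Decimal(1).scaleb(-decimals), rounding=ROUND_HALF_UP)
    if q.adjusted() > d.adjusted():
        # Rounding carried into a new leading digit (e.g. 9.996 -> 10.00): drop one place.
        q = d.quantize(Decimal(1).scaleb(1 - decimals), rounding=ROUND_HALF_UP)
    return format(q, "f")


print(count_sig_figs("0.0450"))     # 3
print(format_sig_figs(0.04567, 3))  # 0.0457
print(format_sig_figs(9.996, 3))    # 10.0
```

Using `decimal` instead of the built-in `round()` avoids binary floating-point surprises; for example, `round(2.675, 2)` gives 2.67, whereas the sketch formats 2.675 as "2.68".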
DV-214 : DOMinate your ODS Output with PROC TEMPLATE, ODS Cascading Style Sheets (CSS), and the ODS Document Object Model (DOM)
Louise Hadden, Abt Associates Inc.
Troy Hughes, Datmesis Analytics
SAS® practitioners are frequently forced to produce SAS output in mandatory formats, such as those using a company logo, corporate or regulated government templates, and/or cascading style sheets (CSS). SAS provides several tools to enable the production of customized output. Among these tools are the ODS Document Object Model, cascading style sheets, PROC TEMPLATE, and ODS style overrides (usually applied in procedures and/or in the originating data). This paper and presentation look "under the hood" of the Output Delivery System destinations and the REPORT procedure, and investigate how mastering ODS TRACE DOM and controlling styles with the CSSSTYLE= option, PROC TEMPLATE, and style overrides can satisfy client requirements and enhance ODS output.
DV-220 : How to Build a Complicated Patient Profile Graph by Using Graph Template Language: Turn Mystery to a LEGO Game
Ruohan Wang, Ms.
Chris Qin, Mr.
Patient profiles typically include various data that are associated with each other. A visual report, such as a graphical patient narrative, can improve the readability of correlated data, meeting reviewers' needs and helping them interpret clinical results and findings efficiently. Such reports are desirable for submission and publication in today's pharmaceutical industry. Patient profile graphs usually combine multiple forms of data that share a common factor, such as a time frame. Graph Template Language (GTL) can generate such complicated graphs with power and flexibility. However, GTL might be a mystery to graph beginners and may seem time-consuming to learn. This paper works as an easy-to-follow set of building instructions. It shows you how to become a master builder of patient profile graphs from scratch by using GTL like playing a LEGO game. A couple of projects from daily work are given as examples using dummy but practical data. The examples are generated with SAS® 9.4 and are of publication quality.
DV-246 : Profiling Patients for Fun and Profit
Troy Kukulka, Biorasi
Patient profiles have, for years, been the go-to tool for post-mortem review of patients who die during a clinical trial. However, patient profiling is also a powerful tool for reviewing the overall quality of your data and the health of your study. As with post-mortem reviews, clinical monitors, study sponsors, and project managers can benefit greatly from the ability to quickly pull and review patient information. Regular review of study subjects, both at random and in tandem with a Risk-Based Monitoring approach, allows earlier detection of issues in both the study and the data quality than the after-the-fact reviews that are normally performed. While many solutions exist on the market to generate general patient profiles, they generally provide rigid reports that frequently fail to meet the needs of all parties interested in the reports. By developing a more flexible profile and then presenting it in Excel, you allow your audience to drill more readily into the data they are most interested in reviewing.
DV-249 : Better understanding of Clinical Reports
Sairam Veeramalla, GCE SOLUTIONS
Most programmers create datasets and reports as part of their daily routine, but we tend to forget why we need to create those reports and how important they are in the drug discovery process. Once we understand that clearly, we avoid programming mistakes and are better able to deliver high-quality reports. Through this conference presentation, I would like to explain key points for creating a few common reports and how to interpret them.
DV-276 : Not That Dummy Data
Yuliia Bahatska, inVentiv Health Clinical
When it comes to data visualization, it is expected that the data are presented the way they are stored in the analysis dataset. Labels may be changed, formats may be applied, or some derivations may be made, but programmers are certainly not expected to add values to the data. But is that always true? SAS has quite a wide range of ways to create figures, and usually at least one of the graphical procedures or GTL can provide the required output, but sometimes you still need a workaround. In this paper I will concentrate on cases where, in order to present the data in the desired way, it is helpful to add artificial or dummy values.
DV-280 : A Sassy substitute to represent the longitudinal data-The Lasagna Plot
Vishnu Bapunaidu Vantakula, Mr
Soujanya Konda, Miss
Data interpretation becomes complex when the data contain thousands of digits, pages, and variables. Such data are generally represented in graphical form, but graphs can be tedious to interpret because of the volume of data, which may include many parameters and/or subjects. Trend analysis is much sought after, both to maximize the benefits from a product and to minimize background research such as subject selection. Additionally, dynamic representation and sorting of visual data will be used for exploratory data analysis. The Lasagna plot makes this job easier and presents the data in a visually pleasing way. This paper explains the basics of the Lasagna plot.
DV-285 : Forest Plots for Beginners
Savithri Jajam, Chiltern
Olesya Masucci, Chiltern
Forest plots have been created and published in various forms for many years and have been used frequently over the last decade. They are very useful because they can display the results of meta-analyses, subgroup analyses, and sensitivity analyses on a log or linear scale. Although programming techniques have improved, forest plots can still be very difficult to create. This paper will cover the history of forest plots, explaining how and when they were invented, what a forest plot is, and why they have maintained their popularity. It will then explain, step by step, the essentials of creating forest plots using the Graph Template Language (GTL) with SAS.
DV-294 : SAFETY UPDATE: Development Safety Update Report (DSUR) & how a centralized database from DSUR helps other Regulatory Submissions
Gouthami Kanduri, Gilead Sciences, Inc
The Development Safety Update Report (DSUR) is a document containing a comprehensive annual review and evaluation of safety information for drugs under development, including marketed drugs that are being studied further. The DSUR can take the place of annual reports. This paper will discuss the DSUR process, the outputs produced, and a few issues encountered when working on pooled studies. It also presents a time-saving approach in which a centralized database developed from the DSUR supports other regulatory submissions and ad hoc requests within a company.
DV-297 : Bridging Applications with Automation: Automating the creation of high quality outputs in SAS and R for Study Trial Reports
Anastasia Alexeeva, Eli Lilly and Co
Mei Zhao, Eli Lilly and Co
William Martersteck, Eli Lilly and Co
Besides the usual portfolio work statistical programmers perform throughout the course of a clinical trial, an organization may ask that study teams create additional safety and efficacy analyses or reports across numerous trials. The motivation for these requests may be to understand clinical data better, and to use results of analyses to make important business decisions about a compound or indication. In order to do this, analysts have to write new programs that generate the outputs, create and validate input datasets, and write up the specifications for input datasets and outputs. Whether individual datasets or analyses are executed in SAS, R, or other software depends on the preference of the programmer or the business need. In this paper, we demonstrate an example of how to automate the generation of SAS and R programs by the use of a trial-specific file that contains global variable assignments. An R program reads this file and passes its contents to functions that create all the R, SAS, and Rmd files needed for the project. The global variables provide the key study-specific information to populate the program headers, as well as everything needed to create and validate the data, and generate the output and the specifications for proper documentation. This strategy allows us to seamlessly leverage the benefits of R and SAS in one project, and offers other teams the opportunity to easily apply reference code to another study without having to make tedious modifications to many programs in order to generate results.
DV-323 : Fine-tuning your swimmer plot: another example from oncology
Steve Almond, Bayer Inc.
Swimmer plots are an effective graphical presentation of longitudinal data such as periods of treatment/observation, occurrences of events, and the durations of events or subject status. These types of plots are particularly popular for oncology trials, where the treatment and follow-up periods are displayed along with tumor assessment results and the duration of various responder criteria. Swimmer plots have become easier to create with the ODS Graphics procedures in successive SAS versions. This paper reviews the basics of constructing the elements of such plots, and provides tips and tricks for implementing aesthetic touches simply, using the main SGPLOT procedure (SAS 9.4) statements and without annotations or the Graph Template Language.
DV-332 : Visualize Overall Survival and Progression Free Survival at the Same Time!
Kriss Harris, SAS Specialists Ltd
Usually we produce Kaplan-Meier plots to show the Overall Survival (OS) and/or the Progression-Free Survival (PFS) profile. It is rare that we see the OS and the PFS on the same graph; the endpoints are usually in two different files. This paper will show you how to produce an animated visualization that you can use to visualize the OS and PFS together which can help you to understand the treatment efficacy better.
Statistics and Analytics
ST-058 : Logistic and Linear Regression Assumptions: Violation Recognition and Control
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine
Karlen Bader, Henry M Jackson Foundation for the Advancement of Military Medicine
Regression analyses are one of the first steps (aside from data cleaning, preparation, and descriptive analyses) in any analytic plan, regardless of plan complexity. Therefore, it is worth acknowledging that choosing and implementing the wrong type of regression model, or violating its assumptions, can have detrimental effects on the results and future directions of any analysis. Considering this, it is important to understand the assumptions of these models and to be aware of the processes that can be used to test whether these assumptions are being violated. Given that logistic and linear regression are two of the most popular types of regression models used today, these are the ones covered in this paper. Logistic regression assumptions that will be reviewed include dependent variable structure, independence of observations, absence of multicollinearity, linearity of independent variables and log odds, and large sample size. For linear regression, the assumptions that will be reviewed include linearity, multivariate normality, absence of multicollinearity and autocorrelation, homoscedasticity, and measurement level. This paper is intended for any level of SAS® user. It is also written for an audience with a background in theoretical and applied statistics, though the information will be presented in such a way that readers at any level of statistical or mathematical knowledge will be able to understand the content.
ST-059 : Regulation Techniques for Multicollinearity: Lasso, Ridge, and Elastic Nets
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine
Karlen Bader, Henry M Jackson Foundation for the Advancement of Military Medicine
Multicollinearity can be briefly described as the phenomenon in which two or more identified predictor variables are linearly related, or codependent. Its presence can have a negative impact on an analysis as a whole and can severely limit the conclusions of a research study. In this paper, we will briefly review how to detect multicollinearity and, once it is detected, which regularization techniques would be most appropriate to combat it. The nuances and assumptions of L1 (Lasso), L2 (Ridge Regression), and Elastic Nets will be covered in order to provide adequate background for appropriate analytic implementation. This paper is intended for any level of SAS® user. It is also written for an audience with a background in theoretical and applied statistics, though the information will be presented in such a way that readers at any level of statistical or mathematical knowledge will be able to understand the content.
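A common way to detect multicollinearity is the variance inflation factor, VIF = 1 / (1 - R²), where R² comes from regressing one predictor on all the others. We do not know which diagnostics the paper itself uses; the Python sketch below only covers the special case of exactly two predictors, where that R² reduces to the squared Pearson correlation r². The example values are hypothetical, and the often-quoted cutoffs of 5 or 10 are rules of thumb rather than firm thresholds.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r**2).

    With more predictors, R**2 must instead come from regressing each
    predictor on all of the others.
    """
    r = pearson_r(x1, x2)
    if r ** 2 == 1:
        return math.inf  # perfect collinearity
    return 1 / (1 - r ** 2)


# Hypothetical body weight (kg) and BMI for five subjects: strongly related.
print(vif_two_predictors([60, 72, 80, 95, 110], [21, 24, 27, 31, 36]))
```

A VIF far above 10, as in this toy example, is the situation in which ridge, lasso, or elastic net penalties are typically considered.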
ST-060 : Jump Start your Oncology knowledge
Xiaoyin Zhong, GSK
Feng Liu, AstraZeneca
As pharmaceutical programmers in oncology, we must understand and interpret the disease jargon and statistical analyses in order to communicate effectively with statisticians, physicians/clinicians, and other study personnel. The purpose of this paper is to introduce key concepts, study design, and analysis in oncology to jump-start the knowledge base of new oncology programmers and of experienced programmers who are new to oncology. This paper will highlight the definition of oncology, the uniqueness of oncology drug development, and emerging treatment options in oncology, as well as give an overview of study design, including master protocol design. In addition, oncology efficacy evaluation and data standards will be presented.
ST-081 : Let's Flip: An Approach to Understand Median Follow-up by the Reverse Kaplan-Meier Estimator from a Programmer's Perspective
Nikita Sathish, Seattle Genetics
Chia-Ling Ally Wu, Seattle Genetics
In time-to-event analysis, sufficient follow-up time to capture enough events is the key element for adequate statistical power. The follow-up time needed may depend on the severity and prognosis of the disease. The median follow-up is the median observation time to the event of interest and is an indicator of the length of follow-up. There are several methods to calculate median follow-up; in this paper we have chosen the more robust and code-efficient reverse Kaplan-Meier (KM) estimator. Median follow-up is a less commonly presented descriptive statistic in reporting survival analysis results, which can make it challenging to understand. This paper aims to explain the concept of median follow-up, its statistical interpretation both numerically and visually, and the use of the SAS® LIFETEST procedure to draw survival plots and compute the survival function with the reverse KM estimator. We present a simple and robust approach to calculating median follow-up using the reverse Kaplan-Meier estimator by flipping the meaning of event and censoring (Schemper and Smith, 1996); that is, events are treated as censored observations, while censored observations become the endpoint.
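The "flip" can be illustrated without SAS: compute an ordinary Kaplan-Meier median, then call the same function with the event indicator inverted. The Python sketch below is ours, not the authors' PROC LIFETEST code, and the eight subjects are made-up data. It uses the common convention that the median is the first time at which the estimated curve drops to 0.5 or below; software packages differ in how they treat a curve that sits exactly at 0.5.

```python
def km_median(times, events):
    """Kaplan-Meier median: first time at which estimated survival is <= 0.5.

    times  : observed times
    events : 1 if the event of interest occurred, 0 if censored
    Returns None if the curve never reaches 0.5.
    """
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)  # subjects censored at t are still at risk
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - d / at_risk
        if survival <= 0.5:
            return t
    return None


def median_follow_up(times, events):
    """Reverse KM: censoring becomes the 'event' and events become censored."""
    return km_median(times, [1 - e for e in events])


# Hypothetical data: months on study and death indicator (1 = died, 0 = censored).
times = [2, 3, 5, 7, 8, 10, 12, 15]
deaths = [1, 0, 1, 0, 0, 1, 0, 0]
print(km_median(times, deaths))         # 10 (median overall survival)
print(median_follow_up(times, deaths))  # 12 (median follow-up)
```

Note that the median follow-up (12) differs from the simple median of all observed times (7.5), because the reverse KM treats deaths as censored follow-up rather than as completed follow-up.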
ST-091 : Equivalence, Superiority and Non-inferiority with Classical Statistical Tests: Implementation and Interpretation
Marina Komaroff, Noven Pharmaceuticals
Determining superiority of an experimental treatment over a standard of care has been a popular objective of randomized controlled trials in the pharmaceutical industry. Superiority is determined by the statistical and clinical significance of a clinical endpoint. However, researchers often ask why non-significant p-values cannot be viewed as evidence that the two treatments are equivalent. They might say: if the p-value < 0.05 we assume the null hypothesis is false, so it must also follow that if the null hypothesis is true, the p-value must be ≥ 0.05. It is not clear whether this reasoning is a poor attempt to salvage a failed study or reflects a misunderstanding of null hypothesis testing. This paper will clarify the meaning of p-values and demonstrate how p-values can be used to conclude equivalence (no difference) in treatment effects. Many papers review statistical methods for analyzing equivalence, superiority, and non-inferiority trials using the POWER, TTEST (including its TOST option), and FREQ procedures in SAS/STAT® software. Nonetheless, subtle and confusing issues arise in the application and interpretation of such methods. The author will present simulated data to demonstrate visually how changes in boundary margins affect sample size and power calculations. The goal is to help researchers thoughtfully choose boundary parameters, plan the operating characteristics of an equivalence/non-inferiority clinical trial, and correctly interpret the results. The paper provides guidance not only for implementing such studies but also for better understanding these designs when critically reviewing published research that used such methods.
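The two one-sided tests (TOST) procedure reverses the usual null hypothesis: equivalence is concluded only when both "difference ≤ -margin" and "difference ≥ +margin" are rejected, so a large superiority p-value alone proves nothing. The Python sketch below illustrates that logic under a simplifying assumption of ours: a large-sample normal approximation with a known standard error. PROC TTEST's TOST option uses the t distribution instead, and the numbers in the example are hypothetical.

```python
from statistics import NormalDist


def tost_equivalence(diff: float, se: float, margin: float, alpha: float = 0.05):
    """Two one-sided tests (TOST) for equivalence, normal approximation.

    H0: |true difference| >= margin   vs   H1: |true difference| < margin
    Returns (p_value, equivalent); p_value is the larger of the two one-sided p-values.
    """
    z = NormalDist()
    p_lower = 1 - z.cdf((diff + margin) / se)  # tests H0: difference <= -margin
    p_upper = z.cdf((diff - margin) / se)      # tests H0: difference >= +margin
    p_value = max(p_lower, p_upper)
    return p_value, p_value < alpha


# Same observed difference and standard error, two different equivalence margins.
print(tost_equivalence(diff=0.2, se=0.5, margin=1.5))  # equivalence concluded
print(tost_equivalence(diff=0.2, se=0.5, margin=1.0))  # equivalence not concluded
```

At alpha = 0.05 this decision matches checking whether the 90% confidence interval for the difference lies entirely inside (-margin, +margin), which is why the choice of margin drives sample size and power.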
ST-096 : A macro of evaluating the performance of the log-rank test using different weight for enrichment studies
Chuanwu Zhang, University of Kansas
Byron Gajewski, University of Kansas Medical Center
Jianghua(Wendy) He, University of Kansas Medical Center
The log-rank test is a traditional nonparametric hypothesis test often used to compare time-to-event data between two groups. Applying different weight functions to the log-rank test leads to different tests, e.g., the log-rank, Gehan, Peto-Peto, and Fleming-Harrington tests. Little is known about the performance of the log-rank test with different weight functions when one group contains subgroups with different hazard rates, as can be seen in enrichment studies. For example, compared to the control group, the male "treatment" subgroup may benefit from the treatment gradually, while the female subgroup may benefit immediately and then lose the benefit. How can we know which weight function for the log-rank test is the most efficient (i.e., performs best) for analyzing such data? In this paper, we develop a macro to examine the performance of the log-rank test with different weight functions in terms of empirical type I error and power, based on simulated data. Our macro contains three parts. First, data are simulated with a basket of factors that includes the distribution type of the time-to-event/censoring data, the proportions of observed event times and censoring, the hazard rate pattern (increasing, decreasing, or hump-shaped) for a subgroup, the proportion of subgroups within the treatment group, etc. Second, the macro executes the log-rank test using the different weight functions. Finally, the test results are presented in a table and a graph to compare performance. Users can invoke our macro to determine which weight function in the log-rank test best meets their needs.
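All of these tests share one statistic: at each distinct event time, take observed minus expected events in one group, multiply by a weight w, and divide the squared sum by the summed w²-weighted hypergeometric variance. Only w changes: 1 for the log-rank test, the number at risk for Gehan, and S(t-)^p (1 - S(t-))^q for Fleming-Harrington. The Python sketch below is our own illustration of that statistic, not the authors' macro; Peto-Peto is omitted because it uses a slightly modified survival estimate.

```python
import math


def weighted_logrank(times, events, groups, weight="logrank", p=0.0, q=0.0):
    """Two-sample weighted log-rank chi-square statistic (1 df) and p-value.

    groups: 1 for the group being compared, 0 for the other group.
    weight: "logrank" (w = 1), "gehan" (w = number at risk),
            or "fh" (Fleming-Harrington: w = S(t-)**p * (1 - S(t-))**q,
            using the pooled Kaplan-Meier estimate just before t).
    """
    num = 0.0      # sum of w * (observed - expected) in group 1
    var = 0.0      # sum of w**2 * hypergeometric variance
    s_minus = 1.0  # pooled KM survival just before the current event time
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e == 1 and g == 1)
        if weight == "logrank":
            w = 1.0
        elif weight == "gehan":
            w = float(n)
        elif weight == "fh":
            w = s_minus ** p * (1 - s_minus) ** q
        else:
            raise ValueError(f"unknown weight: {weight}")
        expected = d * n1 / n
        v = d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1) if n > 1 else 0.0
        num += w * (d1 - expected)
        var += w ** 2 * v
        s_minus *= 1 - d / n  # update pooled KM after this event time
    chi2 = num ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


# Tiny hypothetical example: group 1 events at 1 and 3, group 0 events at 2 and 4.
print(weighted_logrank([1, 3, 2, 4], [1, 1, 1, 1], [1, 1, 0, 0]))
print(weighted_logrank([1, 3, 2, 4], [1, 1, 1, 1], [1, 1, 0, 0], weight="gehan"))
```

Because Gehan's weight is largest early, when many subjects are at risk, it emphasizes early differences; Fleming-Harrington with q > 0 shifts emphasis toward later times, which is why the best choice depends on the hazard pattern being simulated.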
ST-103 : SAS ® V9.4 MNAR statement for multiple imputations for missing not at random in longitudinal clinical trials
Lingling Li, ACADIA Pharmaceuticals Inc.
Missing data are a common problem in longitudinal clinical trials. The primary analysis commonly used in clinical trials relies on the untestable assumption of missing at random (MAR), so sensitivity analyses under plausible missing not at random (MNAR) assumptions are needed to examine the robustness of the statistical inference from the primary analysis against departures from MAR. Multiple imputation within the framework of pattern mixture models (PMM) is widely used to implement sensitivity analyses under MNAR assumptions. Two MNAR approaches, control-based pattern imputation and delta adjustment, are regarded as clinically plausible, transparent, and easy to implement. In SAS® Version 9.4, PROC MI provides an MNAR statement with two options, MODEL and ADJUST, that allows these two approaches to be implemented conveniently. The MNAR statement works with the MONOTONE statement to handle monotone missing data and with the FCS statement to handle arbitrary missing data. This paper discusses the implementation of sensitivity analyses under MNAR assumptions within the PMM framework in longitudinal clinical trial data using the MNAR statement. Simulated longitudinal clinical trial data and sample SAS code are provided for illustration.
ST-149 : Application of R Functions in SAS to Estimate Dose Limiting Toxicity Rates for Early Oncology Dose Finding
Huei-Ling Chen, Merck
Zhen Zeng, Merck
The pool-adjacent-violators algorithm (PAVA) is a solution to isotonic regression that is used to estimate dose limiting toxicity (DLT) rates in early oncology trials. In the current version of SAS, there is no ready-to-use statistical procedure that implements PAVA. However, multiple R packages are available for PAVA estimation. In most pharmaceutical companies, SAS is the mainstream working environment for statistical analysis and reporting. It would be very helpful to exploit the bridge between SAS and R to take advantage of both worlds: R's versatile statistical analysis tools, and SAS's well-validated existing statistical procedures and mature reporting system. This paper presents a SAS macro that accomplishes this collaborative task by embedding an available R PAVA function in a SAS program. It details three important features used to streamline the integration. The first is how to prepare a SAS environment that enables SAS code to communicate with R. The second is how to import and export data between SAS and R; key PROC IML syntax will be provided for demonstration. The third is how to make an R package recognize a SAS macro parameter value, so that the R package can easily become a nested sub-macro inside a SAS macro. This macro has been implemented in real-life studies to support the reporting of DLT summary statistics.
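To show what PAVA itself computes, independent of the SAS-to-R bridge the paper describes, here is a small standalone Python sketch. Observed DLT rates that decrease with dose are pooled with their neighbours until the sequence is non-decreasing. Weighting each dose level by its number of patients is our assumption; with that choice, a pooled block equals total DLTs divided by total patients. The dose-level counts are hypothetical.

```python
def pava(values, weights):
    """Weighted pool-adjacent-violators algorithm for a non-decreasing fit."""
    blocks = []  # each block: [weighted mean, total weight, number of levels pooled]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def isotonic_dlt_rates(n_dlt, n_patients):
    """DLT rates per dose level, constrained to be non-decreasing in dose."""
    raw = [d / n for d, n in zip(n_dlt, n_patients)]
    return pava(raw, n_patients)


# Hypothetical escalation: DLTs / patients = 0/3, 1/3, 0/3, 2/6.
print(isotonic_dlt_rates([0, 1, 0, 2], [3, 3, 3, 6]))
```

Here the raw rates 0, 1/3, 0, 1/3 violate monotonicity at the third dose, so dose levels 2 and 3 are pooled to 1/6 each, giving 0, 1/6, 1/6, 1/3.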
ST-160 : Experiences in Building CDISC Compliant ADaM Dataset to Support Multiple Imputation Analysis for Clinical Trials
Xiangchen (Bob) Cui, Alkermes, Inc
Multiple imputation (MI) is becoming an increasingly popular method to address the missing data problem in regulatory clinical trials, especially when the outcome variables come from repeated assessments. The SAS procedures PROC MI and PROC MIANALYZE apply multiple imputation techniques to generate multiple imputations for incomplete multivariate data and to analyze results from multiply imputed data sets, respectively. Using PROC MI to build a CDISC-compliant ADaM dataset that supports MI analysis is a new ADaM programming technique. This paper illustrates, through an example, how to apply the ADaM BDS data structure to build such a dataset. We will not present how to use the PROC MI procedure, since it has been very well explained in the SAS user manual and other papers. However, we do provide a brief tutorial on the related statistical concepts to help statistical programmers better understand this procedure and apply it in ADaM programming. We mainly focus on the ADaM programming logic flow, key variable derivations for the imputed data including ADaM specification writing, and the independent programming validation process. The tips and pitfalls provided in this paper can save time and help you achieve technical accuracy and operational efficiency in your programming. These hands-on experiences are shared to help readers prepare CDISC-compliant ADaM datasets that facilitate MI analysis in regulatory clinical trials and further support FDA submission.
ST-176 : Practical Perspective in Sample Size Determination
Bill Coar, Axio Research
Experiments advance science. They are designed to answer specific questions. A well-designed experiment accompanied by appropriate statistical methodology should yield scientifically sound results that answer those questions with a reasonable amount of certainty. To achieve this, the concepts of type 1 error, power, and sample size are introduced into the experimental design. While these abstract concepts are based solely on assumptions, they are critical to the integrity of study results. SAS® provides numerous procedures to assist with these parts of experimental design. Designs in clinical research range in complexity, as do the SAS procedures that support them. This presentation approaches sample size determination from a practical perspective. Even if a set of assumptions is reasonable, it may not result in a feasible sample size. We use a placebo-controlled clinical study to demonstrate how the sample size and study design can evolve due to practical considerations. The endpoint is 28-day survival for an often fatal medical condition. It is assumed that 50% of patients will die within 28 days under the standard of care. However, a medical procedure could possibly increase 28-day survival to 80%. This presentation will discuss the use of PROC POWER, PROC SEQDESIGN, and even simulation to assist with the study design, and how practical considerations may cause the design to evolve.
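The abstract does not show the calculation itself, so the following Python sketch only illustrates the kind of fixed-design arithmetic that PROC POWER automates for the 50% versus 80% survival example. It assumes a two-sided alpha of 0.05, 1:1 allocation, and the textbook normal-approximation formula for two proportions without continuity correction; PROC POWER's other test options, and group sequential designs from PROC SEQDESIGN, can give different numbers. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_arm_two_proportions(p_control, p_treat, alpha=0.05, power=0.90):
    """Per-arm sample size for comparing two proportions (1:1 allocation),
    normal approximation without continuity correction."""
    z = NormalDist()                    # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided type 1 error
    z_beta = z.inv_cdf(power)           # power = 1 - type 2 error
    p_bar = (p_control + p_treat) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_control * (1 - p_control)
                                      + p_treat * (1 - p_treat))) ** 2
    return math.ceil(numerator / (p_control - p_treat) ** 2)

# 28-day survival: 50% under standard of care vs. a hoped-for 80%
for pw in (0.80, 0.90):
    print(pw, n_per_arm_two_proportions(0.50, 0.80, power=pw))
```

Under these assumptions the formula gives 39 patients per arm at 80% power and 52 per arm at 90% power; a less optimistic survival rate under the new procedure would raise these numbers sharply, which is the kind of practical tension the presentation addresses.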
ST-183 : Cluster Analysis - What it is and How to use it
Alyssa Wittle, Covance, Inc.
Michael Stackhouse, Covance
Cluster analysis is a great way of looking across several related data points to find relationships in your data that you may not have expected. The basic approach is as follows: transform the results of a series of related variables into a standardized value, such as Z-scores, then combine these values and determine whether trends across the data suggest that the data divide into separate, distinct groups, or "clusters". A cluster is assigned at the subject level and can be used as a grouping variable or even as a response variable. Once the clusters have been determined and assigned, they can be used in your analysis model to test whether the results of these clusters differ significantly across various parameters. For example, is a certain age group more likely to give more positive answers across all questionnaires in a study or integration? Cluster analysis can also be a good way of determining exploratory endpoints or of focusing an analysis on a certain number of categories for a set of variables. This paper describes approaches to cluster analysis, how the results can be interpreted, and how clusters can be determined and analyzed using several programming methods and languages, including SAS and Python. Examples of cluster analyses and their interpretations are also provided.
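The standardize-then-group approach described above can be sketched in Python, one of the languages the paper mentions. This is a minimal illustration, not the authors' code: it assumes k-means as the clustering method (the abstract does not name one), complete numeric data with no constant variables, and hypothetical questionnaire scores in the example.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Convert each variable (column) to Z-scores: (x - mean) / SD."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two subjects' standardized scores."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all existing centroids
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        # Assign each subject to its nearest centroid
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (keep it if the cluster is empty)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([mean(dim) for dim in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels

# Hypothetical subject-level totals on two related questionnaires
rows = [[10, 5], [11, 6], [30, 20], [31, 21], [12, 5], [29, 19]]
print(kmeans(standardize(rows), k=2))
```

The resulting subject-level labels could then serve as the grouping or response variable described in the abstract; in practice the number of clusters would itself need to be chosen and justified.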
ST-213 : Statistical Assurance in SAS: An Introduction and User's Guide
Jonathan L Moscovici, IQVIA
Milena Kurtinecz, GSK
A long-standing concept in the planning of clinical trials is statistical power. However, the traditional calculation of power requires specifying several unknown quantities, such as the treatment effect between groups. The power obtained is therefore a conditional quantity that can be misleading if the underlying assumptions in the calculation are inaccurate. This can lead to a miscalculation of the required sample size and other important trial determinants. Statistical assurance is a method of calculating the unconditional probability of success of a trial by assigning probability distributions to unknown parameters such as the treatment effect, rather than using a single "best guess" estimate. These distributions can be based on expert opinion, available pilot data and clinical considerations. Although not a new concept, statistical assurance has been adopted slowly in the biopharmaceutical community, partly due to the lack of software implementations. This paper introduces the basic concepts, clinical cases and implementations in SAS.
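The difference between conditional power and assurance can be sketched in a few lines. The following Python example is a minimal illustration, not the authors' SAS implementation: it assumes a two-arm trial with a normally distributed endpoint, a known standard deviation, a z-test at two-sided alpha, and a Normal(mu, tau^2) prior on the true treatment difference. The function names and planning values are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def power(delta, sigma, n_per_arm, alpha=0.05):
    """Conditional power for a fixed true difference `delta` (z-test,
    two-sided alpha, counting only rejections in the favorable direction)."""
    se = sigma * math.sqrt(2.0 / n_per_arm)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(delta / se - z_crit)

def assurance_mc(mu, tau, sigma, n_per_arm, alpha=0.05, draws=100_000, seed=1):
    """Assurance by simulation: average conditional power over draws of
    delta from a Normal(mu, tau^2) prior."""
    rng = random.Random(seed)
    return sum(power(rng.gauss(mu, tau), sigma, n_per_arm, alpha)
               for _ in range(draws)) / draws

def assurance_exact(mu, tau, sigma, n_per_arm, alpha=0.05):
    """Closed form for the same normal-normal model, useful as a check."""
    se = sigma * math.sqrt(2.0 / n_per_arm)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf((mu - z_crit * se) / math.sqrt(se ** 2 + tau ** 2))

# Hypothetical planning values: best-guess effect 0.5 SD, 86 patients per arm
print("power     :", round(power(0.5, 1.0, 86), 3))
print("assurance :", round(assurance_mc(0.5, 0.25, 1.0, 86), 3))
```

With these assumed inputs, averaging over the prior gives a lower probability of success than the single-point power calculation, which is the behavior the abstract describes; `assurance_exact` should agree with the simulated value up to Monte Carlo error.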
ST-229 : SAS® Playoffs: Lay Your Bets on PROC FREQ Versus PROC DS2
Troy Hughes, Datmesis Analytics
Charu Shankar, SAS Institute
In these playoffs, the authors compare and contrast parallel processing methods in Base SAS® that analyze data frequency. The FREQ procedure is the out-of-the-box Base SAS solution for producing frequency tables, and it can save these results as output (e.g., HTML, PDF) or as a SAS data set. Unfortunately, because FREQ is not multithreaded, it performs more slowly than a theoretical multithreaded solution, which SAS has not created or released. This paper provides two innovative approaches that process data in parallel to overcome this performance deficit. The first is a distributed processing solution that uses SYSTASK to spawn multiple instances of the SAS application to run several FREQ procedures in parallel; this solution has been demonstrated to perform up to four times faster than the FREQ procedure. The second relies on the multithreading available in the DS2 procedure to also produce results faster than FREQ. Both solutions demonstrate how programmatic enhancements such as parallel processing (including multithreading and distributed processing) can improve performance even when hardware and other aspects of the SAS environment cannot be modified or improved. And, of course, this talk is not to be missed if you're interested in finding out which solution took the prize!
ST-259 : Enhancing Randomization Methodology Decision-Making with SAS Simulations
Kevin Venner, Almac Clinical Technologies
Jennifer Ross, Almac Clinical Technologies
Graham Nicholls, Almac Clinical Technologies
Kyle Huber, Almac Clinical Technologies
This paper illustrates how SAS is an effective tool for conducting simulations to make data-driven randomization methodology decisions. SAS can be used to develop simulation programs that investigate expected treatment balance and other randomization goals when comparing various randomization methods (stratified blocked randomization, minimization, etc.) and associated parameters (block size, biased-coin probability, etc.). A case study shows how simulations can evaluate the expected treatment balance for the different randomization methodologies and parameterizations being considered, and how simulations can investigate other randomization goals, such as the minimum number of subjects that must be randomized at a site to ensure that both treatment arms are represented. The paper also illustrates how SAS simulation programs can be developed with configurable macros that are readily adapted to each individual protocol. Because the simulation programs include macros, different randomization design scenarios can be simulated efficiently through minor macro variable updates, allowing swift delivery of statistically sound results. Incorporating study-specific details (expected subject and strata distributions) can enhance the precision of the results, and SAS macro programming allows varying subject distributions to be re-evaluated as a means of testing the robustness of the simulation results. Treatment balance can be critical for establishing treatment effectiveness in a clinical trial. The components of the randomization design that affect treatment balance (e.g., methodology, stratification factors and levels, block size) should be carefully considered at the protocol design stage. Simulation results can help make impactful design decisions to achieve optimal treatment balance and meet randomization goals.
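As a toy version of the site-level question raised in the case study (how many subjects must be randomized at a site before both arms are represented), the following Python sketch simulates 1:1 permuted-block randomization within a single site. It is an illustration under stated assumptions, not the authors' SAS macros: it assumes two arms labeled A and B, a fixed even block size, and blocks that are not shared across sites; minimization and biased-coin methods are not covered, and all names are hypothetical.

```python
import random

def permuted_block_sequence(n_subjects, block_size, rng):
    """1:1 permuted-block randomization list for one site (arms 'A' and 'B')."""
    seq = []
    while len(seq) < n_subjects:
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)
        seq.extend(block)
    return seq[:n_subjects]  # the last block may be left incomplete

def simulate_site(n_subjects, block_size, n_sims=10_000, seed=2019):
    """Estimate P(both arms represented) and the worst |#A - #B| seen at a site."""
    rng = random.Random(seed)
    both_arms = 0
    worst_imbalance = 0
    for _ in range(n_sims):
        seq = permuted_block_sequence(n_subjects, block_size, rng)
        n_a = seq.count("A")
        both_arms += 0 < n_a < n_subjects
        worst_imbalance = max(worst_imbalance, abs(2 * n_a - n_subjects))
    return both_arms / n_sims, worst_imbalance

for n in (1, 2, 3, 4, 6):
    p_both, worst = simulate_site(n, block_size=4)
    print(f"{n} subjects: P(both arms) ~ {p_both:.3f}, worst imbalance = {worst}")
```

Because a permuted block of size 4 always contains two subjects per arm, the worst imbalance can never exceed 2, and a site that completes a full block is guaranteed to have both arms; the simulation is most informative for sites expected to enroll fewer subjects than one block.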
ST-314 : Application of Criterion I2 in Clinical Trials Using SAS®
Igor Goldfarb, Accenture
Mitchell Kotler, Accenture
The goal of this paper is to demonstrate how meta-analysis criteria (specifically, I2 and Cochran's Q) can be calculated using SAS® software and applied to examine heterogeneity across subgroups. Some European regulatory authorities require an analysis of subgroup effects estimated using the I2 criterion, which Higgins and colleagues developed to give researchers a better measure of the consistency between trials in a meta-analysis. The I2 criterion was originally developed to estimate heterogeneity across studies, whereas the authorities required it to be applied across subgroups of the same study. I2 describes the percentage of total variation across studies that is due to heterogeneity rather than chance. Negative values of I2 are set to zero, so that I2 lies between 0% and 100%. A value of 0% indicates no observed heterogeneity, and larger values indicate increasing heterogeneity. SAS® programs were developed to implement the algorithm for calculating I2 and similar parameters (e.g., Cochran's Q). An analysis showed that the SAS® code successfully reproduced published results that had been produced using other statistical software. After verification, the SAS® code was run on the clinical data, and the results were successfully submitted to the authority.
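The I2 calculation described above reduces to a few lines once each subgroup's effect estimate and variance are available. The following Python sketch is illustrative only, not the authors' SAS programs: it assumes fixed-effect inverse-variance weights, computes Cochran's Q about the weighted mean, and applies I2 = (Q - df)/Q truncated at zero; the example numbers are hypothetical.

```python
def cochran_q_i2(estimates, variances):
    """Cochran's Q and the I2 statistic (percent) for k subgroup estimates.

    estimates: effect estimate for each subgroup (e.g., log hazard ratio)
    variances: variance (squared standard error) of each estimate
    """
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I2 = (Q - df) / Q; negative values are set to 0 so I2 lies in [0%, 100%]
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2

# Hypothetical subgroup log hazard ratios and their variances
print(cochran_q_i2([0.1, 0.3, 0.5], [0.01, 0.01, 0.01]))
```

For these three hypothetical subgroups Q is 8 on 2 degrees of freedom, so I2 = (8 - 2)/8 = 75%; estimates that agree more closely would push Q below its degrees of freedom and I2 to 0%.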
ST-321 : PROC MIXED: Calculate Correlation Coefficients in the Presence of Repeated Measurements
Qinlei Huang, Merck
Radha Railkar, Merck
In drug and medical device development, it is often necessary to evaluate the correlation between two technologies, platforms, or devices in a clinical trial setting with repeated measurements. This problem has been studied by many authors. Bland and Altman (1995) addressed it by partitioning the correlation into two components: between-subject and within-subject correlations. Lam, Webb and O'Donnell (1999) approached it using maximum likelihood estimation in the case where the replicate measurements are linked over time. Roy (2006, 2015) solved the problem by considering different correlation structures on the repeated measures. This paper first reviews statistical methods for estimating Pearson's correlation coefficient between two measures when multiple observations are available on the same subject, and then presents how to use PROC MIXED in SAS to obtain the parameter estimates of interest. It includes example SAS code, macro programming, and examples of hands-on data analysis and outputs. Keywords: PROC MIXED, Correlation Coefficient, Repeated Measurements
ST-325 : Machine Learning Approaches to Identify Rare Diseases
Ruohan Wang, Ms.
Rare diseases are much more difficult to identify and diagnose than other diseases because there are not enough data or experts in rare diseases. Better availability of patient data and improvements in machine learning algorithms empower us to tackle this problem computationally. In this paper, we adapt state-of-the-art machine learning algorithms, such as k-nearest neighbors, support vector machines, neural networks and naive Bayes, to perform this classification, and we use R to train and test the models. We find that these machine learning methods can identify people with rare diseases with a low misclassification rate.
ST-339 : An Introduction to the Process of Improving a Neural Network
YuTing Tian, 7326090713
The top-level goal of this paper is to lay out a process for building a neural network in SAS®. It is hoped that readers can use the process shown in this paper as a template for building a deep neural network. A lower-level goal is to build a network that can outperform a network in a paper by one of my professors. Deep learning is a kind of neural network and a specific kind of machine learning (which is itself a branch of artificial intelligence). It is a recent and powerful machine learning approach that enables a computer to build a multi-layer non-linear model. Even though deep neural networks are popular, few papers discuss the overall process of building a neural network in SAS. This paper explores a practical application of this process in SAS Enterprise Miner.
Strategic Implementation, Business Administration, Support Resources
SI-017 : Avoiding Disaster: Manager's Guide on How to Rescue a Failing Outsourced Project
Dilip Raghunathan, Insmed
Outsourcing has become a large part of the business model for many pharmaceutical companies. It allows the sponsor to focus on its core competencies, keep its workforce lean and scale up when needed. However, an unfortunate by-product of this growing trend is the occasional failure of the outsourced vendor to meet the timelines and/or quality of the deliverable. Failure to meet the agreed objectives for a deliverable results in a loss of time, money, resources and morale, and strains the relationship between the sponsor and the vendor. Furthermore, such failures in studies or projects that are critical to the organization end up severely hurting the chances of regulatory approval and affecting time to market. In this paper, I present a tested, step-by-step, pragmatic approach, from the sponsor's perspective, for identifying a failed outsourced project, and provide a mechanism to rescue and/or salvage the work. I also discuss ways to prevent such failures in the future and increase the chances of success when outsourcing critical studies or projects.
SI-062 : Get to the Meat on Machine Learning
Aadesh Shah, GSK
You've probably heard of machine learning and artificial intelligence, but are you sure you know what they are? If you're struggling to make sense of them, you're not alone. There's a lot of buzz that makes it hard to tell what's science and what's science fiction. For many of us, machine learning seems futuristic and scary. Recently, though, it has been showing up in many new presentations at conferences such as PhUSE, the CDISC Interchange and, most famously, PharmaSUG. YouTube knows which videos you would like to watch on your home page, Facebook recommends local events in your area as well as friends, and LinkedIn recommends that you connect with your ex-boss. And while that's all exciting, some of us are still wondering what exactly machine learning is. This paper walks you through the basics of the process, how it works in practice, and the difference between machine learning and artificial intelligence.
SI-075 : SHIONOGI Global SAS System Renewal Project - How to Improve Statistical Programming Platform
Yura Suzuki, Shionogi & Co., Ltd.
Yoshimi Ishida, Shionogi Digital Science
Malla Reddy Boda, Shionogi Inc.
Yoshitake Kitanishi, Shionogi & Co., Ltd.
SHIONOGI Global SAS (G-SAS) System is a statistical programming platform that supports Shionogi & Co., Ltd. (Japan) and Shionogi Inc. (US) in developing and validating statistical deliverables for clinical trials, such as SDTM, ADaM and TLFs. G-SAS is the core system for efficient collaboration on programming activities between Japan and the US, and it enables round-the-clock operation. Increasing dataset sizes, multiple languages and varied statistical analyses were causing shortages of memory and hard disk space. The SHIONOGI G-SAS System Renewal Project was therefore established to address these shortages and to upgrade from SAS 9.2 to SAS 9.4 on a new, higher-specification physical server with a 64-bit architecture that can handle both SAS versions and larger datasets. In addition, the method of controlling access to specific folders after database lock for double-blind studies was reconsidered, in order to restrict user access to analysis results that include actual treatment codes and potentially unblinded data and so avoid the risk of insider trading. To address this risk, the G-SAS Project Team developed a SAS macro that removes user groups from, or grants them access to, specific folders. This macro has reduced the workload of the System Operation Team. As a result, SHIONOGI's global statistical programming platform has been improved: it can handle large datasets and multiple versions of SAS, and it can efficiently control access rights to specific folders and users. The G-SAS System continues to support collaboration on statistical programming activities across multiple SHIONOGI locations, and the preparation of data submissions to regulatory agencies in various countries, including FDA and PMDA.
SI-099 : ISS Challenges and Solutions for a Compound with Multiple Submissions in Parallel
Aiming Yang, Merck & Co., Inc.
An Integrated Summary of Safety (ISS) is required for new and supplemental drug or biologic applications. For an oncology compound with multiple indications and tens of ongoing trials, the challenges of finding sound strategies for integrating trials, and of executing them, are numerous. In this paper, we share some of these challenges and the solutions we successfully executed. The paper has two parts. In part I, we share the challenges and solutions of establishing a centralized ISS team. To address the needs of the multiple submissions and filings for this compound, we first established a dedicated ISS team. This specialization enables the same team to work on multiple ISS packages with successive or concurrent timelines and turnarounds of several weeks. In part II, we share some essential programming techniques we have established. Our ISS programs can accommodate multiple versions of SDTM source datasets and meet today's CDISC standards filing requirements. The techniques presented include 1) stacking datasets within each SDTM version and then integrating all studies. The validated stacking programs are reused, making the stacking of dozens of trials a routine rerun that can be completed within one hour. 2) Aligning the ISS implementation for consistency with the submission CSR and integration needs. The discussion includes details on integrating at the SDTM or ADaM level, baseline derivation, and general principles for deriving essential variables such as TR01SDT, TR01EDT, and TRT01A. It is satisfying to see the team deliver again and again while working towards simplified, streamlined solutions.
SI-142 : Sponsor oversight: Proof is in the documents
Shailendra Phadke, Servier US
An increasing number of biotech and pharmaceutical companies are conducting clinical trials by partially or completely outsourcing clinical trial activities to third-party organizations (e.g., Contract Research Organizations (CROs) and Clinical Trial Units (CTUs)). Good Clinical Practice (GCP) clearly requires the sponsor to have systems and procedures in place to ensure adequate sponsor oversight. Failure to comply in this area can result in critical findings in regulatory inspections and can also prevent the organization from sponsoring further trials until the issues are resolved. This paper discusses in detail the documents that the sponsor's biostatistics and statistical programming team can use as evidence of sponsor oversight. These documents help the sponsor conduct oversight and also improve the sponsor's inspection readiness. Examples include issue logs, QC plans, standard checklists used to check the quality of deliverables, and data review plans. This paper also briefly discusses good documentation practices for maintaining these documents, and how these practices can improve the effectiveness of sponsor oversight and provide evidence of it.
SI-161 : Lessons Learned from Teaching 250+ Life Science Analytics (LSAF) Classes to Our Colleagues at Janssen Research and Development.
Margie Merlino, Janssen Research and Development
Jeanne Keagy, Janssen
In 2014, Janssen purchased Statistical Drug Development (SDD), later renamed Life Sciences Application Framework (LSAF), from SAS Institute and successfully implemented the system in its Data Management (DM) processes for converting raw clinical trial study data into SDTM datasets. The following year, Janssen expanded the use of LSAF to its Data Analysis (DA) processes for creating ADaM datasets, TFLs, and ad-hoc analyses and simulations with study data. Before using LSAF, Janssen's DA group used Windows SAS and an in-house user interface to manage activities related to clinical trial deliverables and ad-hoc analyses. Using LSAF for both DM and DA had many advantages; for example, both groups could access one central repository to view all study data, including interim and database locks. However, despite the success of LSAF within DM, the DA transition to the LSAF environment did not go well. From 2016 through 2018, Jeanne and Margie taught over 250 LSAF classes to their co-workers at Janssen. Jeanne conducted both DM and DA training, and Margie conducted DA training. They will present some of the technical and not-so-technical lessons learned from teaching students from a variety of technical and scientific backgrounds how to develop their SAS programs and jobs in the cloud-based LSAF environment.
SI-168 : Embedded Processes - evolving face of QUALITY in the world of Robotic Process Automation (RPA).
Charan Kumar Kuyyamudira Janardhana, Ephicacy Lifescience Analytics Pvt. Ltd.
Systems and processes have been the backbone of every emerging and existing organization in the clinical domain. The password to a disease-free life is the 'most wanted', i.e., the 'drug'. In search of this 'most wanted' fugitive, drug companies are investing in their best possible detectives. Quality and compliance management is key to ensuring subject safety and data integrity in our domain in line with regulatory requirements. The evolution from paper-based quality systems to electronic systems led to the emergence of regulations and guidelines such as 21 CFR Part 11 and GxP. As we evolve further towards robotic process automation (RPA), the problem statement is how quality will be perceived and implemented. RPA does not involve any physical robots; instead, software robots mimic human activities by interacting with applications in the same way that a person does. Everyone will have the tools to configure their own software robots to put an end to automation challenges. Ethics and human resources will be important, along with risk and change management, feedback management and root cause analysis, from a compliance oversight point of view. Transparency, cybersecurity and platform resilience are the critical risk areas requiring high-impact controls. Process efficiency and efficient compliance methodologies will be important in the development, testing, deployment and integration phases, and in owning the process. This paper focuses on the priorities of regulatory bodies, employment opportunities, the scope for innovation and the organizational strategy needed to embrace the new culture of 'embedded process'.
SI-190 : Integrating programming workflow into computing environments: A closer look
Tyagrajan Swaminathan, Ephicacy Lifescience Analytics
Sridhar Vijendra, Ephicacy Lifescience Analytics
System-level workflows are well known to improve process compliance, increase traceability and enhance audit trails in various domains, including document management. It is time that programming workflows were made integral to statistical programming for clinical trials, to enhance productivity, reduce management overhead, ease the tracking of large programming deliverables produced by globally dispersed teams and, consequently, reduce time-to-market for every drug. System-level workflows are mapped to the real-world processes that they model, but the conventional statistical computing environments (SCEs) that most programmers are used to hardly allow automatic record keeping of processes taking place within the environment. What is recorded in a separate programming status tracker is prone to human error, since there may be a lag and/or a gap between what is done in the programming environment and what is recorded in the tracker. This is where modern web- or cloud-based SCEs make a difference, by bringing in integrated, customizable workflows to streamline statistical programming activities. We need workflows that integrate with our programming environment to keep track of programming tasks and record their completion without convoluted user actions. But is it possible to mirror real-world processes exactly within such programming environments? Are human-error-prone processes suitable for machine-compatible checks, or will we be left to force-fit workflows into systems that do not want to cooperate?
SI-243 : Dataset Specifications: Recipes for Efficiency and Quality
Dave Scocca, Rho, Inc.
Specifications are the backbone of dataset programming; quality specifications are the recipes with which we produce quality datasets. While we frequently talk about programming strategies and techniques, we spend much less time focusing on dataset specifications. Having high-quality specifications can greatly improve programmer efficiency and speed up validation. Developing specifications and programs hand-in-hand can reduce overall review time and avoid a great deal of unnecessary rework. Questions addressed in this presentation include: How do dataset specifications fit into the clinical trial process? What makes a high-quality dataset specification? How can we make the overall process of dataset production (from specifications through programs) as efficient as possible? Are there ways to standardize the specification development process?
SI-257 : Using freelancers in the programming world - Challenges and opportunities
Vijay Moolaveesala, PPD, Inc
Ever since the industry opened its gates to the global resource pool more than a decade ago, I have been wondering when we would widely embrace the concept of using freelancers in the programming area. The industry has been using freelancers in clinical, statistical and scientific communications for some time. Although the practice may not be widespread, the idea of using freelancers in these areas has been given a hearing; the same cannot be said for programming. There are, however, many challenges in using a temporary workforce, whether through onsite/offsite contractors or staff augmentation models, and using freelancers adds further challenges that can be even more difficult to navigate. At the same time, when the industry is spending a lot of time recruiting, training and retaining top programming talent, every resourcing option should be on the table. The author provides approaches, identifies areas of service where freelancers could be used more effectively, and offers a framework to make this model sustainable.
SI-292 : Throw Away the Key: Blockchain-ed Healthcare Data
Kathy Zhai, GSK
In SAS blogs and other social media outlets that follow the latest pharmaceutical trends, the words "blockchain" and "bitcoin" have been prevalent. How are these words associated with healthcare and patient data? Like other programmers working in the pharma industry, I found my curiosity growing beyond just procs and data steps. Many can agree that the credibility of clinical outputs can be undermined by a plethora of common issues, including incomplete, missing, or inaccurate data. After multiple layers of data manipulation, how can we be certain that what is submitted for publication is an undistorted picture of the benefits and risks of these drugs? This paper gives the audience a glimpse of how blockchain technology, whose implementation is cryptographically validated by a network, has enough potential and momentum to emerge in the healthcare industry and stick around for quite some time.
SI-300 : Don't Just Rely on Processes; Support Local Subject Matter Expertise
Deidre Kreifels, Reata
Steve Kirby, Reata Pharmaceuticals
Mario Widel, Independent
Since the dawn of time (or at least since we started analyzing data from clinical trials) companies engaged in clinical research have developed processes designed to ease the burden of programmatic data review, data standardization and data analysis tasks. These processes typically include standards for input data, standards for outputs and code to get from one to the other. In many cases people who have a limited understanding of the process (or the subject matter) can quickly generate outputs that are right most of the time. But how can we efficiently manage cases where the process does not produce accurate results? And how can we be sure that people following the process are able to evaluate whether the results are accurate? A big part of the answer is ensuring that process users have sufficient subject matter expertise. If users understand (in detail) how the process works, and what it is designed to accomplish, they will be able to address most cases where the process does not work or produces inaccurate results. We will share a few representative examples of where subject matter expertise is needed to ensure a process works as intended and produces accurate results. We will also suggest training methods designed to help process users gain the subject matter expertise they need to be successful.
SI-320 : Vendor's Guide to Consistent, Reliable, and Timely CDISC Deliverables
Dharmendra Tirumalasetti, Vita Data Sciences
Santosh Lekkala, Vita Data Sciences
Bhavin Busa, Vita Data Sciences (a division of Softworld, Inc.)
When working for a CRO or an FSP vendor, programmers work on multiple clinical studies across different Sponsor companies and various data collection systems. In addition, even though submission standards are common across the industry, the requirements and expectations for CDISC deliverables can differ from one Sponsor to another. The differences can be based on multiple factors such as therapeutic areas, internal data standards and study-specific needs. Also, the interpretation and handling of the data may differ between Sponsor companies. If the vendor does not understand and document these differences early, they can cause rework at a later stage, resulting in delayed deliverables, rework time and extra cost. To meet each Sponsor's specific needs and expectations, it is highly recommended to have, and follow, effective processes within the organization. This paper describes the processes our organization follows before and during specification development and programming of the CDISC datasets to achieve consistent, reliable and timely deliverables, benefiting both the Sponsor and the vendor. It also details the pre-processing steps that can be followed before writing the specification, the quality checks and data handling during dataset development, and the notes provided when delivering datasets to the Sponsor to ensure the expected outcome.
SI-327 : Begin with the End of Validation: Adapting QbD Approach in Statistical Programming to Achieve Quality and Compliance Excellence
Linghui Zhang, PRA Health Sciences
In many ways, quality and compliance are essential in the pharmaceutical industry. Statistical programming conducts a variety of data management, analysis and reporting activities across the entire data flow in drug development, including preclinical and clinical research, regulatory submission and post-marketing surveillance. Quality and compliance are both crucial components of statistical programming. To achieve high quality, a validation process is developed to identify data issues, and it is usually performed before data and data-related products are finally released. However, the validation process is time- and resource-consuming, and it is quite challenging to fix data issues after validation in complex clinical trial programs under tight timelines and temporary workforce shortages. To "get it right first time", Quality-by-Design (QbD), a process-oriented method, was applied to manage risks to quality and compliance and to advance product and process quality in statistical programming. QbD is a systematic approach to development that begins with predefined objectives and emphasizes product and process understanding and control based on sound science and quality risk management. QbD is also a regulatory expectation: FDA and ICH have published several guidelines to direct pharmaceutical manufacturers. This paper discusses adapting the QbD approach in statistical programming to prospectively identify critical issues relevant to quality and compliance. The QbD elements and steps are introduced, followed by the challenges of implementing the QbD approach in statistical programming. This paper is designed for programming leads and managers, but programmers at all levels can benefit from learning the QbD approach.
Submission Standards
SS-014 : China NMPA (National Medical Products Administration) reform and new regulations/guidelines/requirements
Yi Yang, Novartis
Historically, the regulatory environment in China has been highly challenging. However, since 2015 the China health authority has been reforming the regulatory environment to bring Chinese medical products up to international standards of efficacy, safety, and quality, in order to better meet public needs for drugs and to improve access to innovative drugs and therapies from around the world. The reforms are building smoother processes for innovative drug development by adopting global standards and technical requirements, increasing review and approval transparency, and accelerating new drug review and approval. The China health authority has refined old regulations to clearly define the requirements for clinical trial operation, multi-regional clinical trial design, biostatistics principles, electronic data capture, data management and statistical analysis reporting, and on-site inspection. It has also released guidelines on communication for drug development and technical evaluation, electronic common technical document implementation, and post-approval safety surveillance, as well as regulations covering priority review and approval, a data protection regime, imported drug registration, and new chemical drug classification to encourage innovative drug development. The reform of the China regulatory environment is ongoing. Since the reform, more INDs and NDAs have been submitted for innovative drugs developed locally or globally than ever before. China's regulations will be further aligned with global standards and requirements, and simultaneous development and approval with the US and Europe can be achieved in the near future.
SS-027 : Progression-Free Survival (PFS) Analysis in Solid Tumor Clinical Studies
Na Li, Na Li Clinical Programming Services
Progression-free survival (PFS) is commonly used as a primary endpoint in Phase III solid tumor oncology clinical studies. PFS is defined as the time from randomization or start of study treatment until objective tumor progression or death, depending on the study protocol. This paper describes the PFS concept and its analysis methods following the ADaM standard to generate ADDATES and ADTTE analysis datasets. It also discusses some of the challenges encountered in defining progression events (PD) and censoring events. In addition, it explains some statistical methods commonly used to estimate the distribution of PFS duration, including PROC LIFETEST to provide Kaplan-Meier estimates and PROC PHREG to provide hazard ratio estimates.
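To make the Kaplan-Meier idea above concrete, here is a minimal Python sketch of the product-limit estimator that PROC LIFETEST computes. It is not the author's SAS implementation. It assumes one record per subject in an ADTTE-like layout, with `aval` holding the PFS time and `cnsr` following the ADaM convention of 0 for an event (progression or death) and a positive value for censoring; the function name `kaplan_meier` and the data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(aval: Sequence[float], cnsr: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit (Kaplan-Meier) survival estimate.

    aval: time to event or censoring for each subject (e.g. days of PFS).
    cnsr: ADaM-style censor flag, 0 = event (progression or death),
          any positive value = censored.
    Returns (time, survival probability) at each distinct event time.
    """
    if len(aval) != len(cnsr):
        raise ValueError("aval and cnsr must have the same length")

    records = sorted(zip(aval, cnsr))
    n_at_risk = len(records)
    survival = 1.0
    curve: List[Tuple[float, float]] = []

    i = 0
    while i < len(records):
        t = records[i][0]
        events = 0
        leaving = 0
        # Collect every subject whose time equals t. Subjects censored at t
        # are still counted in the risk set for events occurring at t.
        while i < len(records) and records[i][0] == t:
            if records[i][1] == 0:
                events += 1
            leaving += 1
            i += 1
        if events > 0:
            survival *= 1.0 - events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Five hypothetical subjects: events at days 2, 3 and 5; censored at days 3 and 8.
print(kaplan_meier([2, 3, 3, 5, 8], [0, 0, 1, 0, 1]))
```

With these hypothetical data the estimate steps down to roughly 0.8, 0.6 and 0.3 at days 2, 3 and 5. The subject censored on day 3 still counts in the day-3 risk set, which is the usual convention for ties between events and censoring.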
SS-030 : Clinical Development Standards for FDA Bioresearch Monitoring (BIMO) Submissions
Denis Michel, Janssen R&D
Julie Maynard, Janssen R&D
The Food and Drug Administration (FDA) published the Bioresearch Monitoring Technical Conformance Guide in February 2018. The document provides specifications for clinical data submitted by pharmaceutical companies for use in planning FDA Bioresearch Monitoring (BIMO) inspections. Three types of information are required: clinical study-level information, subject-level data line listings by clinical site, and a summary-level clinical site dataset. The clinical study-level information is provided as PDF files related to the clinical trial. The subject-level data line listings by clinical site are typically generated as SAS-programmed PDF files. The guide states that FDA will be able to generate the listings in the future from submitted clinical datasets compliant with CDISC SDTM and ADaM standards. The summary-level clinical site dataset is provided as a SAS V5 transport file named clinsite.xpt with a data definition table named define.pdf. This paper describes the effort to standardize the generation of clinsite.xpt across different therapeutic areas of a pharmaceutical company. Topics include standardizing the SAS variable attributes via metadata, importing clinical site information from Excel to SAS, handling special characters in imported data, and developing standard SAS macros and programs that process data in different formats.
SS-056 : Practical Guidance for ADaM Dataset Specifications and Define.xml
Jack Shostak, DCRI
The goal of this paper is to provide practical guidance on how to specify ADaM analysis datasets within the confines of define.xml nomenclature. The guidance is presented in a tool- and software-agnostic manner. This paper is intended for staff who produce ADaM datasets and the associated define.xml file and who have little to moderate prior experience doing so. First, define file specifications are compared and contrasted with programmer ETL specifications in detail. Then, a series of practical issues is explored with suggested courses of action. Guidance is presented on the appropriate length of object derivations, and on when and how links should be employed for those objects. For Origin definitions, "derived" versus "assigned" is discussed at length, including which to use when. A suggested specification strategy for logically linked and synonymous items, such as PARAM/PARAMCD/PARAMN, is presented. How much parameter value-level metadata is needed is a frequent question, so this is discussed, followed by how to reconcile parameter-level metadata with variable-level metadata for the same objects. Guidance is given on when and why to create user-defined controlled terminology. For date and datetime variables, format selection is discussed. Appropriate text for object derivations and comments is explored. Finally, guidance is suggested for other miscellaneous issues around specifying ADaM datasets.
SS-068 : Enforcing Standards in an Organization: A Practical 6 Step-Approach
Priscilla Gathoni, AstraZeneca, Statistical Programming
Dany Guerendo Christian, STATProg Inc.
Is your organization struggling to enforce standards? Are complacency and siloed programming within functional units haunting your organization? This paper explores a practical 6-step approach for organizations to assess standards using CDISC and FDA guidance, shows the importance of standards and the possible repercussions to institute for lack of standards adherence, shows the importance of a gatekeeper for capturing standards adherence metrics, and finally presents a generic macro utility for checking the usage of standards in a study folder. Clear communication about the location and type of standards available within an organization helps eliminate excuses for not using standards. Additionally, an explicit message on the value that standardization brings (increasing efficiency, reducing the need for mundane tasks, and using resources efficiently) is explored. Creative methods for encouraging the use of standards are also described. Who is best suited to be the gatekeeper in your organization? We investigate the role of the gatekeeper, which is crucial in bridging the gap between the data acquisition stage and the practical implementation of standards. Several utilities whose main purpose is to check adherence to standards are available; we present a generic utility that can be adopted, with a few adjustments, to complement your organization's platform. Furthermore, we propose that organizations periodically assess the effectiveness of the standards, tools, and utilities in use. In conclusion, we recommend that organizations use this 6-step approach and build on it to suit their standards enforcement needs.
SS-083 : Define.xml Review : Failing to Plan is Planning to Fail
Robin Mann, GCE Solutions
Imagine you are in the market to purchase a well-publicized book by a renowned author. You get the book at a nearby bookstore, and it has a nice outer cover with a beautifully written synopsis on the back. You start going through the Table of Contents to get a better idea of the book's contents. Wait a minute, what is this? The Table of Contents is riddled with spelling mistakes, incorrect section titles, and wrong page numbers. How would you feel? Would you still purchase the book, or would you now have doubts about its contents? This is the same sort of feeling regulatory reviewers have when they are provided with an incorrect Define.xml. Define.xml is the Table of Contents of the submission package and the most useful document describing the structure and content of the submitted data. A properly created, well-defined Define.xml can improve the efficiency of the regulatory review process, keeping both the reviewers and the submission team happy, whereas a poorly created one will hamper a speedy review with many follow-up questions. Creating Define.xml is a daunting task, and developers are prone to making mistakes in it. To avoid potential issues, a proper review of Define.xml is required before the package is handed over to regulatory authorities. This paper discusses an approach for foolproof planning and execution of a Define.xml review. This step-by-step approach can be very handy for detecting even the smallest errors.
SS-092 : A Practical Guide to the Issues Summary in the Data Conformance Summary of Reviewer's Guides
Gary Moore, Moore Computing Services, Inc.
In the Reviewer's Guides for SDTM (SDRG) and ADaM (ADRG), the Data Conformance Summary is based on validation tools like the Pinnacle 21 Validator (formerly OpenCDISC) and/or proprietary in-house validators. This paper will discuss the purpose of a Reviewer's Guide. Using real life examples, it will illustrate when issues should be resolved by data modifications and when it is appropriate to provide explanations for non-compliance. The paper will contain ideas on the proper way to describe non-compliance issues.
SS-117 : Have You Met Define.xml 2.0?
Christine McNichol, Covance
Define.xml is critical for a reviewer getting to know a study's datasets. The define.xml facilitates the building of this relationship with the study data from casual introduction through the innermost workings of the datasets. To provide the best possible define.xml for the reviewer's use, it is important to be comfortable with define.xml and how it communicates information about the datasets. Though define.xml might be intimidating at first, fear not. Following the flow through define.xml to get to know the data is like getting to know a new friend. Define.xml reveals levels of information about the study datasets from structure to details about where the values came from, even linking to documentation of issues encountered and decisions made. The anatomy of a define.xml, its purpose, what is special about each section and what can be learned about the study datasets from each will be discussed. Meet define.xml - friend, not foe.
SS-120 : An Automated, Metadata Approach to Electronic Dataset Submissions
Janette Garner, Kite Pharma, A Gilead Company
In 2014, the Food and Drug Administration (FDA) provided guidance regarding Section 745A(a), an amendment to the Federal Food, Drug, and Cosmetic Act that requires regulatory submissions (eg, new drug applications [NDAs] or biologics license applications [BLAs]) to be submitted in electronic format. The guidance took effect at the end of 2016. This paper presents a metadata-driven solution that facilitates the generation of the dataset package for electronic dataset submission that is compliant with the FDA expectations based on published FDA guidance documents.
SS-126 : Considerations in Effectively Generating PK Analysis Input Datasets
Jianli Ping, Gilead Sciences Inc
Krishna Sivakumar, Gilead Sciences Inc
Generating a high-quality pharmacokinetic (PK) analysis input dataset, or PK merge dataset, in a timely manner is critical for PK parameter generation and for downstream programming of CDISC-compliant PK/PP datasets. While guidance for SDTM PC/PP and ADaM ADPC/ADPP is detailed in the CDISC implementation guides, general standards for creating PK analysis input datasets are lacking. The different formats of source data captured from the CRF and the PK lab can create further challenges during PK merge dataset creation. In this paper, general approaches for PK merge dataset creation are proposed that can facilitate PK analysis and the generation of CDISC-compliant PK datasets, and that can also provide source data for NONMEM programming. This paper also discusses, through case studies and excerpts of SAS programs, the core PK merge dataset variables (with commonly selected matrices) that should be included and their derivation. Dataset checks after PK merge dataset generation are recommended to help identify possible data or programming issues.
SS-134 : Next Innovation in Pharma - CDISC data and Machine Learning
Kevin Lee, Clindata Insight
The most popular buzzword in the technology world nowadays is "Machine Learning (ML)." Most economists and business experts foresee Machine Learning changing every aspect of our lives in the next 10 years by automating and optimizing processes. This is leading many organizations, including drug companies, to explore and implement Machine Learning in their own businesses. The presentation will discuss how Machine Learning can lead the next innovation in pharma with CDISC data. It will start with an introduction to the most innovative companies and how they innovate and lead their industries using Machine Learning and data. Then it will show how pharma can learn from them to innovate using Machine Learning and CDISC data. The presentation will also introduce the basic concepts of Machine Learning and the importance of data. In a Machine Learning/AI-driven process, data is considered the most important component: 80 to 90% of the work in Machine Learning is preparing the data. Since FDA mandated CDISC standards for submissions, clinical trial data for submission are prepared in CDISC SDTM and ADaM formats. The presentation will show how CDISC data can be the perfect partner of Machine Learning for the next innovation in the pharmaceutical industry. Finally, it will discuss how biometrics departments can prepare for the next innovation and lead this data-driven Machine Learning process in the pharmaceutical industry.
SS-153 : Exploring Common CDISC ADaM Conformance Findings
Trevor Mankus, Pinnacle 21
Having analysis data that is compliant with the CDISC ADaM standard is critically important for the regulatory review process. ADaM data are required in both FDA and PMDA submissions because they allow those agencies to better understand the details of the performed analyses and to reproduce the results for further validation. Validation of ADaM data is a primary focus for regulatory agencies so that they can begin their review of the results. This presentation will review some of the more commonly occurring validation rules found across customer data packages validated with our automated software, and discuss potential reasons why these rules fired.
SS-162 : Multiple Studies BIMO Submission Package - A Programmer's Perspective
Ramanjulu Valluru, Accenture
Harsha Dyavappa, Accenture
As support documentation for its Bioresearch Monitoring (BIMO) activities, FDA's Center for Drug Evaluation and Research (CDER) requests that sponsors of new drug applications (NDAs), biologics license applications (BLAs), and NDA or BLA supplemental applications containing clinical data provide the following three items: I. Clinical Study-Level Information; II. Subject-Level Data Line Listings by Clinical Site; III. Summary-Level Clinical Site Dataset. Recent papers (Singh Kahlon et al., Lin et al.) give details on how to create a BIMO submission package containing these three items. This paper focuses on submissions that include multiple studies and on aligning the submission with FDA's requirements in these situations. It expands on how to create a single clinsite dataset, define.xml, and BIMO reviewer's guide instead of one version of each of these documents per study. Specific considerations are discussed for the single clinsite dataset, define.xml, and BIMO reviewer's guide used to submit this package in eCTD module 188.8.131.52. We will share our experiences supporting successful FDA applications in several therapeutic areas with multiple studies involving BIMO site-level data.
SS-172 : Pharmacokinetic Parameters for Sparse and Intensive Sampling - Nonclinical and Clinical Studies
Shallabh Mehta, Pharmaceutical Product Development Inc. (PPD)
Sparse sampling is very common in toxicokinetic studies, where a single blood sample can be collected on a given study day from each animal in a treatment group. A similar situation arises in clinical studies where no more than one sample can be taken from a subject per study or per study day. The purpose of this paper is to show how the process of transforming pharmacokinetic (PK) parameters to the Clinical Data Interchange Standards Consortium (CDISC) Standard for Exchange of Nonclinical Data (SEND) Pharmacokinetics Parameters (PP) domain can also be used for the CDISC Study Data Tabulation Model Version 1.5 (SDTM) PP domain, specifically how pooled PK parameters are formatted into the SEND and SDTM PP domains using SEND Implementation Guide 3.1 and SDTM Implementation Guide 3.2.
SS-217 : Process optimization for efficient and smooth e-data submissions to both FDA and PMDA
Eri Sakai, SHIONOGI & CO., LTD.
Malla Boda, Shionogi Inc.
Akari Kamitani, SHIONOGI & CO., LTD.
Yoshitake Kitanishi, SHIONOGI & CO., LTD.
Electronic data submission is already mandatory for NDAs submitted to the FDA, and it will also become mandatory for applications to the PMDA after April 2020. We need to design and build a process for e-data submission to FDA and PMDA simultaneously with a single data package. SHIONOGI is headquartered in Japan and recently established group companies in the U.S. and Europe to promote drug development globally. As a result, we will be making e-data submissions not only to either FDA or PMDA but to both authorities simultaneously. Currently, we prepare SDTM and ADaM datasets and other related submission deliverables (e.g., define.xml, SDRG, and ADRG) according to CDISC standards in all our clinical trials, and we have established a global programming process to ensure compliance with CDISC standards. However, some rules for e-data submission differ between FDA and PMDA, so we prepare separate data packages for submission to FDA and PMDA, which we consider very inefficient. As a global company, we are trying to optimize the process so that we prepare only one package that meets the rules of both authorities, from the viewpoint of the greatest common divisor or least common multiple. Against that background, we would like to share our recent experience in establishing the global process for smooth e-data submissions to both authorities, and the results. In addition, we suggest efficient ways for Japan and U.S. team members to communicate during the project.
SS-226 : De-Identification of Data & Its Techniques
Shabbir Bookseller, Pune University at Pune
In the past few years, with the emergence of modern technologies such as big data, privacy concerns have grown widely. Nowadays, sharing data is easier than saying hello. De-identification is a tool that organizations can use to remove personal information from data that they collect, use, archive, and share with other organizations. For many reasons, not least patient privacy, any shared data must first be de-identified. De-identification is not a single technique for securing data, but a collection of approaches, algorithms, and tools that can be applied to various kinds of data with differing levels of effectiveness in protecting an individual's privacy. In general, privacy protection improves as more aggressive de-identification techniques are employed, but less utility remains in the resulting dataset. De-identification is especially important for government agencies, businesses, and other organizations, including the healthcare industry, that seek to make data available to outsiders. For example, significant medical research resulting in societal benefit is made possible by the sharing of de-identified patient information under the framework established by the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule, the primary US regulation providing for the privacy of medical records. This paper provides an overview of data de-identification, data protection procedures, and de-identification techniques, i.e., k-anonymity, l-diversity, and t-closeness.
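As a minimal illustration of the first technique named above, the Python sketch below checks k-anonymity: a dataset is k-anonymous with respect to a chosen set of quasi-identifiers if every combination of their values is shared by at least k records. The records, the variable names (`AGE`, `AGEGR`, `SEX`, `DIAG`), and the helper functions are hypothetical. l-diversity and t-closeness, which additionally constrain the sensitive values within each group, are not implemented here.

```python
from collections import Counter
from typing import Dict, List, Sequence


def smallest_class_size(records: List[Dict[str, object]],
                        quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group of records sharing identical quasi-identifier values."""
    if not records:
        raise ValueError("records must not be empty")
    groups = Counter(tuple(rec[q] for q in quasi_identifiers) for rec in records)
    return min(groups.values())


def is_k_anonymous(records: List[Dict[str, object]],
                   quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if every quasi-identifier combination occurs in at least k records."""
    return smallest_class_size(records, quasi_identifiers) >= k


# Hypothetical records: exact AGE is generalized into the coarser AGEGR band.
records = [
    {"AGE": 34, "AGEGR": "30-39", "SEX": "F", "DIAG": "Asthma"},
    {"AGE": 37, "AGEGR": "30-39", "SEX": "F", "DIAG": "Diabetes"},
    {"AGE": 52, "AGEGR": "50-59", "SEX": "M", "DIAG": "Asthma"},
    {"AGE": 55, "AGEGR": "50-59", "SEX": "M", "DIAG": "Hypertension"},
]

print(is_k_anonymous(records, ["AGE", "SEX"], k=2))    # False: exact ages are unique
print(is_k_anonymous(records, ["AGEGR", "SEX"], k=2))  # True: age bands form pairs
```

Generalizing AGE into age bands is what turns the check from failing to passing, which mirrors the privacy/utility trade-off described above: the dataset becomes safer but less precise.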
SS-240 : Sponsor Considerations for Building a Reviewer's Guide to Facilitate BIMO (Bioresearch Monitoring) Review
Kiran Kundarapu, Merck & Co., Inc.
Janet Low, Merck & Co., Inc.
Majdoub Haloui, Merck & Co., Inc.
CDER's Bioresearch Monitoring (BIMO) team has specific responsibility for verifying the integrity of clinical data submitted in regulatory applications and supplements, and for determining whether trial conduct complies with FDA regulations and statutory requirements. As described in the FDA Draft Guidance for Industry, CDER's BIMO inspectors and the Office of Regulatory Affairs (ORA) identify sites of interest from all major pivotal studies within the submission. BIMO released a Technical Conformance Guide (TCG) in 2018 to facilitate site selection and review, but it gave sponsors limited information to consider when building a BIMO Reviewer's Guide. No reviewer's guide template is available in the industry, and there is insufficient guidance on the types of information that should be included in such a guide. This paper shares a suggested structure and considerations for authoring a BIMO Reviewer's Guide.
SS-273 : The Need for Therapeutic Area User Guide Implementation
Michael Beers, Pinnacle 21
As regulatory agencies, specifically PMDA at the moment, begin to do cross-product analysis of their accumulated study data, the need for standardization increases. CDISC standards cover much of the data common to clinical trials, but gaps exist. Therapeutic Area User Guides (TAUGs) are created to fill some of these gaps. However, consistent implementation of these provisional guides is lacking, and this will impact the ability of regulatory agencies to analyze data across products. This paper will discuss some possible reasons for the slow adoption of TAUGs, attempt to show why it is increasingly important that the industry implement the TAUGs, and show how the implementation of the TAUGs could be enforced.
SS-275 : The Future with Define V2
Vishnu Bapunaidu Vantakula, Mr
Soujanya Konda, Miss
Defining data with standards helps to organize the data with clear traceability to the source, and the Define-XML guidelines laid the foundation for this. Implementation of the Define-XML v2 guidelines is expected for studies started after March 15th, 2018. This paper emphasizes the differences between Define.xml V1 and Define.xml V2 and the impact of the recent version on the ADRG (Analysis Data Reviewer's Guide). The aim is to increase efficiency while maintaining quality through a consistent approach. The outcome is better CRT packages.
SS-291 : Information Requests During An FDA Review
Hong Qi, Merck & Co., Inc.
Lei Xu, Merck & Co., Inc.
Mary N. Varughese, Merck & Co., Inc.
Filing a marketing application is a pivotal and exciting milestone in the long-term effort of drug development. Even though there are industry standards to follow, submissions vary among drugs, indications, and sponsoring companies. Despite extraordinary efforts in preparing the submission package, it is common for regulatory agencies to issue information requests (IRs) from the pre-supplemental biologics license application (pre-sBLA) meeting through the approval review and labeling process. IRs may arise from different aspects, including the target indication, the patient population, the drug safety profile, the reviewers' scientific interest in further information on the potential benefits of the medicine, additional case report forms, and even collected prior therapies that were not included in the ADaM datasets. This paper discusses the data preparation and submission pertaining to IRs received from the pre-sBLA meeting and during the sBLA review, the approaches we use, and thoughts on future strategies.
SS-306 : Making Lab Toxicity Tables Less Toxic on Your Brain
Lindsey Xie, Kite Pharma, a Gilead Company
Jinlin Wang, Kite Pharma, a Gilead company
Jennifer Sun, Kite Pharma, a Gilead company
Rita Lai, Kite Pharma, a Gilead company
Processing and presenting lab data is always challenging, especially when lab limits are assessed in two directions. Lab data processing becomes even more complicated when multiple baselines are required, either because of different analysis criteria or because they are inherent in the study design. This paper applies the ADaMIG v1.2 guidance on lab data bi-directionality, using additional variables to make ADLB easy to interpret and related summary tables easy to produce. The paper focuses on the lab CTCAE toxicity grade summary, taking into account lab tests with abnormal assessments in either the increased or decreased direction. The authors explain and provide examples showing how ADaMIG v1.2 variables such as ATOXDIR, ATOXGRH(L)N, BTOXGRH(L)N, SHIFTy, MCRITy, ADYPCATy, BASETYPE, and DTYPE can be used and implemented appropriately. In addition, this paper explains how to handle the baseline toxicity grade when there are multiple baselines for the same collected data but different analysis sets in ADLB.
SS-309 : Expediting Drug Approval: Real Time Oncology Review Pilot Program
Laxmi Samhitha Bontha, Lamar University
The health care industry aims to provide the right treatment and immediate care for patients. New drug approval in the United States takes an average of 12 years from preclinical testing to approval, with the approval process alone averaging around two and a half years. It is important to provide patients with new, potentially lifesaving therapies as early as possible. To achieve this, FDA launched the Real Time Oncology Review (RTOR) pilot program, which allows FDA to review data earlier, before the applicant formally submits the complete application. To be selected for evaluation under RTOR, a drug should meet criteria such as easily interpreted endpoints, a straightforward study design, substantial improvement over available therapy, and a previous Breakthrough Therapy designation. The RTOR process is designed to take about 20 weeks. Currently, the RTOR pilot program is being used for supplemental applications for already-approved cancer drugs; FDA could later expand the pilot to new drug applications and original biologics license applications for cancer drugs. RTOR may encourage faster data publication and greater clarity of analysis, benefiting patients, manufacturers, and FDA. This paper discusses how RTOR is carried out, the challenges faced in RTOR, and why it is being used in the oncology therapeutic area.
SS-317 : Non-Clinical (SEND) Reference Guide for Clinical (SDTM) Programmers
Dharmendra Tirumalasetti, Vita Data Sciences
Bhavin Busa, Vita Data Sciences (a division of Softworld, Inc.)
The U.S. FDA now requires the use of a standardized data format, SEND (Standard for Exchange of Nonclinical Data), for non-clinical data submissions. Many Sponsor companies have started preparing SEND datasets for their upcoming submissions, although they still lack the much-needed expertise to get their data submission-ready. In our experience, one reason may be the lack of available resources and subject matter experts in the non-clinical team within an organization. One solution to overcome these resourcing challenges is to utilize the existing pool of Clinical (SDTM) Programmers. In this paper, our intent is to provide a quick reference guide for Clinical (SDTM) Programmers developing SEND domains for non-clinical studies. We will present commonalities and differences between SDTM and SEND domains. In addition, we will summarize our experience and lessons learned in mapping and standardizing non-clinical legacy studies to make them submission-ready.
SS-318 : How to use SUPPQUAL for specifying natural key variables in define.xml?
Sergiy Sirichenko, Pinnacle 21
Define.xml must identify natural keys for each dataset to specify uniqueness for records and sort order. Sometimes standard SDTM/SEND variables are not enough to completely describe the structure of collected study data. In this presentation, we will show examples and provide recommendations on when it is appropriate to use SUPPQUAL variables in the natural key and when to use other common alternatives. We will also provide guidance on how to document SUPPQUAL natural keys in define.xml and the Reviewer's Guide.
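A natural key is only valid if no two records share the same combination of key values. The minimal Python sketch below, which is not the presenter's tool, reports key combinations that occur more than once, so you can see whether the standard variables suffice or whether a supplemental qualifier is needed. It assumes the SUPPQUAL value has already been merged onto its parent records as an ordinary column; the qualifier name `REPLIC`, the helper `duplicate_keys`, and the records are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def duplicate_keys(records: List[Dict[str, object]],
                   key_vars: Sequence[str]) -> Dict[Tuple[object, ...], int]:
    """Return each key-value combination that occurs more than once, with its count."""
    counts = Counter(tuple(rec[v] for v in key_vars) for rec in records)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical LB records: two replicate glucose results at the same visit,
# distinguished only by a supplemental qualifier merged in as REPLIC.
lb = [
    {"USUBJID": "001", "LBTESTCD": "GLUC", "VISITNUM": 1, "REPLIC": "1"},
    {"USUBJID": "001", "LBTESTCD": "GLUC", "VISITNUM": 1, "REPLIC": "2"},
    {"USUBJID": "001", "LBTESTCD": "ALT", "VISITNUM": 1, "REPLIC": "1"},
]

standard_keys = ["USUBJID", "LBTESTCD", "VISITNUM"]
print(duplicate_keys(lb, standard_keys))               # standard keys are not unique
print(duplicate_keys(lb, standard_keys + ["REPLIC"]))  # adding the qualifier resolves it
```

An empty result for the extended key list is the signal that the combination can serve as the natural key documented in define.xml; whether a SUPPQUAL variable is the right choice, rather than another alternative, is the judgment the presentation addresses.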
SS-328 : Framework for German Dossier Submissions
Kriss Harris, SAS Specialists Ltd
When submitting drug benefit assessments to the German authority or other regulatory agencies, you need to provide your reports in a specific format. These reports are usually in a non-English language, and the characters will differ from the English characters you are used to: they contain accents and other special characters, and numerical results use commas where you expect decimal points, and vice versa. Usually, to produce such a report, a medical writer uses Microsoft Word to copy the results from a Clinical Study Report (CSR) into the document and applies the appropriate formatting and translations. The copying has to be done very carefully, and the process is error-prone and exhausting. The German authorities may also have follow-up questions, which makes this process inefficient. There is a way to make the process more automated, and this paper demonstrates those methods. First, it shows a framework you can use for submitting a German Dossier, including time-to-event macros, subgroup analyses, and the processes you can use to get your data into the right format. Second, it shows how to output the results in the exact format needed for the German Dossier. You will be shown how to use encoding to read and write special characters, and how to use PROC REPORT along with style attributes to get your outputs in the correct format.
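The decimal-comma issue can be illustrated with a small Python sketch; the paper itself works in SAS with PROC REPORT, so this is only an illustration of the idea. It formats a number with German conventions (a comma as the decimal separator, a period as the thousands separator) by swapping the separators produced by Python's standard format specification, which avoids depending on a German locale being installed. The function name `format_de` and the example values are hypothetical.

```python
def format_de(value: float, decimals: int = 1) -> str:
    """Format a number German-style: '.' for thousands, ',' for decimals."""
    us_style = f"{value:,.{decimals}f}"  # e.g. '1,234.5'
    # Swap the two separators via a placeholder so that neither
    # replacement clobbers the other.
    return us_style.replace(",", "_").replace(".", ",").replace("_", ".")


# Hypothetical results: a hazard ratio and a count.
print(format_de(0.68, 2))  # 0,68
print(format_de(1234.5))   # 1.234,5
```

Rounding happens in the standard format step before the separators are swapped, so the displayed digits are the same as in the English-style output; only the punctuation changes.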
SS-331 : A Standardized Data Sample: Key to Improving the Submission Strategy
Prafulla Girase, Biogen
Joanna Koft, Biogen
As mentioned in FDA's Study Data Technical Conformance Guide, the agency offers a process for submitting sample standardized datasets for validation. Although sample submissions are tests only and are not considered official submissions, they can be of great value to sponsors in various ways. This paper is based on the sponsor's practical experience submitting sample submissions for five different compounds that are currently approved therapies on the market. The paper walks through the key parts of a sample submission, as well as how to plan and implement one. It also discusses excerpts from the regulatory feedback received on sample submissions and how that feedback helped continuously improve the sponsor's submission strategy.
SS-334 : The Anatomy of Clinical Trials Data: A Beginner's Guide
Venky Chakravarthy, BioPharma Data Services
We are so preoccupied with our own little programming world that we often forget that we are part of a long, complex process in the discovery and development of drugs. In this gentle introduction to the pharmaceutical world, you will learn about the various stages of drug development. You will also learn about the complications in drug development and the chances of a drug receiving Food and Drug Administration (FDA) approval. The FDA is a US-specific regulatory agency, but the process is similar for other agencies such as the European Medicines Agency (EMA) or the Pharmaceuticals and Medical Devices Agency (PMDA) in Japan. You will then learn about the human trials stage, where the bulk of SAS programming occurs. At this stage, you will get to know the types of documents that a programmer needs to be familiar with. Next comes SAS data, followed by a review of the evolution of standards and the current data standards. You will gain an appreciation of these data standards in analyzing data and generating SAS reports.
SS-337 : Do-It-Yourself CDISC! A Case Study of Westat's Successful Implementation of CDISC Standards on a Fixed Budget
Rick Mitchell, Westat
Rachel Brown, Westat
Jennifer Fulton, Westat
Marie Alexander, Westat
Stephen Black, Westat
Do you find the prospect of implementing CDISC within the confines of a tight timeline and a fixed budget overwhelming? Westat, a premier research organization, offers a continuum of support services for clinical research studies, such as data management, data analysis, regulatory affairs, and site monitoring. Clinical research is dynamic, and it is imperative to comply with ever-changing federal guidance and regulations. This paper describes how Westat's statisticians, programmers, and data managers approached the implementation of CDISC standards on a fixed budget. It discusses how we overcame challenges by creating tools and processes to improve efficiency, developing training programs, and reimagining how project teams collaborate to ensure that data sets submitted to the FDA are compliant.
SS-343 : Panel Discussion: Recent Submission Experiences around the World
Carey Smoak, S-Cubed
Kriss Harris, SAS Specialists Ltd
Marianne Caramés, Novo Nordisk
Yi Yang, Novartis
David Izard, Chiltern
Karin LaPann, Shire
Heather Archambault, Urovant Sciences, Inc.
Global submissions are of interest to pharmaceutical companies that plan to submit their product to more than one regulatory authority, such as the FDA (US), PMDA (Japan), EMA (Europe), NMPA (China), and others. Various pharmaceutical companies have given presentations on global submissions at conferences. One company attempted a single global submission to the FDA and PMDA and concluded that one submission could not satisfy both agencies; their differing requirements necessitated two separate submissions. The question of global submissions has been raised at conferences with representatives from the FDA and PMDA present. In each case, the representatives advised sponsors to ask them questions when attempting a global submission. Thus, regulators may be willing to be flexible on their requirements to help companies that want to make a global submission. The panelists will share their experiences, and audience participation is encouraged.