Paper presentations are the heart of a PharmaSUG conference. PharmaSUG 2019 will feature over 200 paper presentations, posters, and hands-on workshops. Papers are organized into 12 academic sections and cover a variety of topics and experience levels.
Note: This information is subject to change. Last updated 12-Jun-2019.
|Paper No.||Author(s)||Paper Title (click for abstract)|
|FDA-G1||Sara Jimenez||Study Data Topics at FDA/CDER|
|FDA-G2||Elaine Thompson||CBER Data Standards Update|
|FDA-G3||Helena Sviglin||An Overview of How FDA Business Rules, FDA Validator Rules, and Others Fit Together|
|FDA-G4||FDA Panel Discussion: Evidence, Data and Review: Continuing the Discussion with FDA|
|FDA-K||Benjamin Vali||Keynote Address: Regulatory Submissions in the PDUFA Era|
|Paper No.||Author(s)||Paper Title (click for abstract)|
|FUF-01||Panel Discussion: Future Forum|
|Paper No.||Author(s)||Paper Title (click for abstract)|
& Sanjay Matange
|Developing Custom SAS Studio Tasks for Clinical Trial Graphs|
|HT-067||Vince DelGobbo||Integrating SAS and Microsoft Excel: Exploring the Many Options Available to You|
|HT-089||Sanjay Matange||Build Popular Clinical Graphs using SAS|
|HT-145||Kevin Lee||Hands-on Training for Machine Learning Programming|
& Mario Widel
|Value-Level Metadata Done Properly|
|HT-177||Bill Coar||Sample Size Determination with SAS® Studio|
& Kelly O'Briant
|Creating & Sharing Shiny Apps & Gadgets|
& Richann Watson
|HT-347||Charu Shankar||The Shape of SAS® Code|
Leadership and Career Development
Real World Evidence
|Paper No.||Author(s)||Paper Title (click for abstract)|
& Andrea Coombs
|Using Real-World Evidence to Affect the Opioid Crisis|
& Kriss Harris
& Seydou Moussa COULIBALY Coulibaly
|Stratified COX Regression: Five-year follow-up of attrition risk among HIV positive adults, Bamako|
|RW-199||Youngjin Park||Patterns of risk factors and drug treatments among Hypertension patients|
|RW-232||Charan Kumar Kuyyamudira Janardhana||Artificial Intelligence and Real World Evidence - it takes two to tango|
|RW-238||Srinivasa Rao Mandava||Innovative Technologies utilization in 21st Novel Clinical Research programs towards Generation of Real World Data.|
|RW-310||Jennifer Popovic||Real-world data as real-world evidence: Establishing the meaning of data as a prerequisite to determining secondary-use value|
|RW-345||Karen Ooms||Applications and Their Limitations of Real-World Data in Gene Therapy Trials|
Reporting and Data Visualization
Statistics and Analytics
Strategic Implementation, Business Administration, Support Resources
Advanced Programming
AP-001 : Get Smart! Eliminate Kaos and Stay in Control - Creating a Complex Directory Structure with the DLCREATEDIR Statement
Louise Hadden, Abt Associates Inc.
Tuesday, 3:30 PM - 3:50 PM, Location: Franklin 3
An organized directory structure is an essential cornerstone of data analytic development. Those programmers who are involved in repetitive processing of any sort control their software and data quality with directory structures that can be easily replicated for different time periods, different drug trials, etc. Practitioners (including the author) often use folder and subfolder templates or shells to create identical complex folder structures for new date spans of data or projects, or use manual processing or external code submitted from within a SAS® process to run a series of MKDIR and CHDIR commands from a command prompt to create logical folders. Desired changes have to be made manually, offering opportunities for human error. Since the advent of the DLCREATEDIR system option in SAS version 9.3, practitioners can create single folders if they do not exist from within a SAS process. Troy Hughes describes a process using SAS macro language, the DLCREATEDIR option, and control tables to facilitate and document the logical folder creation process. This paper describes a technique that wraps another layer of macro processing around that process, isolating and expanding the recursive logical folder assignment to create a complex, hierarchical folder structure used by the author for a project requiring monthly data intake, processing, quality control and delivery of thousands of files. Analysis of the prior month's folder structure to inform development of control tables and build executable code is discussed.
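The control-table idea in this abstract can be illustrated outside SAS. The following is a minimal Python sketch, not the author's macro and not a use of DLCREATEDIR: it builds a hierarchical folder tree from a list of relative paths. The folder names and the `build_tree` helper are hypothetical and chosen only for illustration.

```python
from pathlib import Path
import tempfile

# Hypothetical control table: one relative folder path per row.
CONTROL_TABLE = [
    "2019_06/intake/raw",
    "2019_06/processing/qc",
    "2019_06/delivery",
]


def build_tree(root, relative_paths):
    """Create every folder listed in the control table under root.

    Returns the folders that did not exist before the call, so a rerun
    on an existing structure returns an empty list.
    """
    created = []
    for rel in relative_paths:
        target = Path(root) / rel
        if not target.exists():
            # parents=True also creates any missing intermediate folders.
            target.mkdir(parents=True, exist_ok=True)
            created.append(target)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for folder in build_tree(tmp, CONTROL_TABLE):
            print(folder.relative_to(tmp))
```

Keeping the folder list as data rather than as a series of hard-coded commands means that a new month or project only requires a new control table, which is the point the abstract makes about avoiding manual changes.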
AP-018 : Ensuring Programming Integrity with Python: Dynamic Code Plagiarism Detection
Michael Stackhouse, Covance
Monday, 2:00 PM - 2:20 PM, Location: Franklin 3
Integrity in your team's programming is of the utmost importance, not only when preparing for a regulatory submission, but simply to ensure the quality of the analysis throughout your drug's development. A common validation technique used throughout the industry is double programming - where two programmers work independently to obtain identical results. But how do you guarantee independence? And what if you suspect programming independence was violated? Linux tools like diff may not be sufficient to identify harmful similarities. This paper will explore a Python based tool that can bring these issues to light. The tool is dynamic enough to locate similarities at any point in your programs, and can allow flexibility for minor syntactic changes, such as dataset name changes or reordering of statements. All results are gathered into easily reviewable files to help ensure that your team's work can uphold the integrity and reputation that you rely on.
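To make the idea of tolerating renamed datasets and reordered statements concrete, here is a small Python sketch. It is an assumption-laden illustration, not the tool presented in the paper: it masks every non-keyword name, sorts the statements, and compares the results with the standard library's `difflib.SequenceMatcher`. The keyword list and the function names are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Small illustrative keyword list; every other name is treated as user-chosen.
KEYWORDS = {"data", "set", "run", "proc", "sort", "by", "if", "then",
            "else", "merge", "keep", "drop"}


def normalize(program):
    """Split a program into statements and mask user-chosen names."""
    statements = []
    for raw in program.split(";"):
        tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", raw.lower())
        if not tokens:
            continue
        masked = [
            tok if (tok in KEYWORDS or not re.match(r"[a-z_]", tok)) else "ID"
            for tok in tokens
        ]
        statements.append(" ".join(masked))
    # Sorting makes the comparison insensitive to statement order.
    return sorted(statements)


def similarity(program_a, program_b):
    """Return a ratio between 0 and 1; higher means more similar."""
    return SequenceMatcher(None, normalize(program_a), normalize(program_b)).ratio()


if __name__ == "__main__":
    original = "data adsl; set dm; if age > 65 then old = 1; run;"
    renamed = "data work1; set demog; if years > 65 then flag = 1; run;"
    print(similarity(original, renamed))
```

Sorting the statements is a deliberately blunt choice: it catches reordering but also discards genuine ordering information, so a real tool would likely need something more careful.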
AP-038 : One Macro to Produce Descriptive Statistic Summary Tables with P-Values
Rajaram Venkatesan, Cognizant Technology Solutions
Harinarayanan Gopichandran, Cognizant Technology Solutions
Monday, 1:30 PM - 1:50 PM, Location: Franklin 3
In clinical trial reporting, the most common tables are those that present descriptive statistics (n, mean, SD, median, min and max, CI, and p-values) or frequency counts (%) together with descriptive statistics for categorical and continuous variables. These are the bread and butter of reporting. However, producing these tables, while conceptually simple, can be cumbersome and time-consuming, as many variables and many conditions might be requested. The solution is to create a simple, easy-to-understand macro that allows the user to develop and produce descriptive summary tables within minutes. It can be used to produce or validate most safety tables, and it allows users to create many types of tables (demographics and baseline characteristics, laboratory, vital signs, and ECG data) with minimal effort. It also means that when statisticians would like to change a table afterward, the change can be made with minimal effort. This will not only save a lot of time but also improve quality.
AP-042 : Excel in SAS
Charu Shankar, SAS
Monday, 3:00 PM - 3:20 PM, Location: Franklin 3
Excel and SAS are universally loved, and both have their strengths. Excel has been around a long time, and many non-SAS users use the spreadsheet to enter and manage their transactions. SAS is great for analyzing data. Why not marry the strengths of both? Get data from Excel into SAS, complete the analysis in SAS, and then send the results to Excel. This would be a great help for colleagues who don't have SAS on their desktops. Come learn the many ways to move data from Excel to SAS and from SAS to Excel. This session will cover the following and much, much more: 1. PROC IMPORT, to read Excel into SAS; 2. SAS/ACCESS engines, to read Excel into SAS; 3. ODS tagsets, to take SAS output to cool Excel pivot tables; 4. PROC EXPORT, to export SAS to Excel. Come watch some magic in the shaping of a pivot table right before your eyes in this session!
AP-047 : Confessions of a SAS PROC SQL Instructor
Charu Shankar, SAS
Tuesday, 2:30 PM - 3:20 PM, Location: Franklin 3
After teaching at SAS for over 10 years to thousands of learners, this instructor has collected many best practices from helping customers with real-world business problems. Hear all about her confessions on making life easy with mnemonics to recall the order of statements in SQL. Learn about the DATA step diehard user who now loves SQL thanks to a little-known secret gem in PROC SQL. Hear about the ways in which ANSI SQL falls short and PROC SQL picks up the slack. In short, there are many confessions and so little time. This session is open to all interested in improving their SQL knowledge and performance.
AP-071 : Ushering SAS Emergency Medicine into the 21st Century: Toward Exception Handling Objectives, Actions, Outcomes, and Comms
Troy Hughes, Datmesis Analytics
Monday, 3:30 PM - 3:50 PM, Location: Franklin 3
Emergency medicine comprises a continuum of care that often commences with first aid, basic life support (BLS), or advanced life support (ALS). First responders, including firefighters, EMTs, and paramedics, are often the first to triage the sick, injured, and ailing, rapidly assessing the situation, providing curative and palliative care, and transporting patients to medical facilities. Emergency medical services (EMS) treatment protocols and SOPs ensure that, despite the singular nature of every patient as well as potential complications, trained personnel have an array of tools and techniques to provide varying degrees of care in a standardized, repeatable, and responsible manner. Just as EMS providers must assess patients to prescribe an effective course of action, software too should identify and assess process deviation or failure, and similarly prescribe its commensurate course of action. Exception handling describes both the identification and resolution of adverse, unexpected, or untimely events that can occur during software execution, and should be implemented in SAS® software that demands robustness. The goal of exception handling is always to reroute process control back to the originally intended process path that delivers full business value. When insurmountable events do occur, however, exception handling routines should instruct the process, program, or session to gracefully terminate to avoid damage or other untoward effects. Several exception resolution paths exist (in addition to program termination) that can deliver full or partial business value. This text demonstrates these paths and discusses various internal and external modalities for communicating exceptions to SAS users, developers, and other stakeholders.
AP-074 : Where Is the Information We Are Looking For?
Jeff Xia, Merck & Co., Inc.
Monday, 4:00 PM - 4:20 PM, Location: Franklin 3
Finding information efficiently and effectively is essential in many challenging situations, such as responding to agency requests after a submission. Often the assigned programmer has no prior knowledge of the study involved in the request because of business or resource constraints. It is therefore an increasingly important business need to enable programmers to understand the data flow quickly, including all the steps from the collected data in the database to the final information in a completed CSR. This paper presents four methods to programmatically find the information of interest: 1) finding files of interest in a folder and its subfolders, where the file name can be partial or a pattern; 2) finding all records and fields that contain certain text strings in all datasets in a SAS library, which is handy when we need to understand the SDTM mapping from raw datasets; 3) finding a variable name in all datasets in a SAS library, which is useful when working on legacy studies where variables do not follow the CDISC standard naming convention; 4) finding a text string in all text files, including those in subdirectories, which is extremely useful when there is a need to locate a certain CSR table out of hundreds of tables, or to understand which SAS program was used to generate a specific table and which macros were involved. These four methods have been widely used in our daily programming activities and have significantly improved our ability to work on time-sensitive tasks such as agency requests.
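The first and fourth search methods listed in this abstract (files by name pattern, and text strings inside text files) can be sketched in a few lines of Python. This is an illustrative stand-in rather than the authors' SAS implementation; the function names and the default list of file extensions are assumptions.

```python
import fnmatch
import os
import tempfile


def find_files(root, pattern):
    """Return paths under root whose file name matches a wildcard pattern."""
    matches = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            # Compare in lowercase so the match ignores case on any platform.
            if fnmatch.fnmatch(name.lower(), pattern.lower()):
                matches.append(os.path.join(folder, name))
    return sorted(matches)


def find_text(root, needle, extensions=(".sas", ".txt", ".log")):
    """Return (path, line number, line) for each line containing needle."""
    hits = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(folder, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle.lower() in line.lower():
                        hits.append((path, number, line.rstrip("\n")))
    return sorted(hits)
```

For example, `find_text(study_root, "Table 14.3.1")` would list every program or log line that mentions that table number, where `study_root` is whatever folder holds the study's programs.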
AP-114 : The Art of Defensive Programming: Coping with Unseen Data
Philip Holland, Holland Numerics Ltd
Tuesday, 9:00 AM - 9:50 AM, Location: Franklin 3
This paper discusses how you cope with the following data scenario: the input data set is defined in so far as the variable names and lengths are fixed, but the content of each variable is in some way uncertain. How do you write a SAS program that can cope appropriately with data uncertainty?
AP-116 : Excel with MS Excel and X Commands: SAS® Programmers' Quick Tips Guide to Useful Advanced Excel Functions and X Commands.
Shefalica Chand, Seattle Genetics, Inc.
Monday, 4:30 PM - 4:50 PM, Location: Franklin 3
SAS® is the bread and butter for clinical/statistical programmers in the pharmaceutical and biotech industry. Nevertheless, there are many functionalities, tools, and resources that can be used in conjunction with SAS® to facilitate a better user experience for biostatistics and other cross-functional partners. In this paper, we will share some useful advanced MS Excel functions, such as VLOOKUP, HYPERLINK, and CONCATENATE, that improve efficiency, remove ambiguity and redundancies, and enhance accuracy by eliminating manual intervention (for example, when creating specifications and tracking sheets, managing review comments, and tracking resolutions and implementations). Here is a sneak peek at these Excel functions: VLOOKUP: This function helps search values based on a particular column (index column) from one Excel sheet to another by matching common values in the index columns. HYPERLINK: This function can help create hyperlinks to external files. CONCATENATE: This function combines values from multiple cells and/or text strings; the arguments can be separated by commas, or a range of Excel cells can be selected. We will also discuss the SAS® X Commands that can help make some of the manual processes more accurate and simpler, for example, using X Commands to copy files from one location to another, automatically create folders/directories at a given location, quickly convert SAS® program files to TXT files as a bulk action, remove files or folders, and perform a few other similar helpful actions. In addition, we will explore practical implementation of these tools and resources in our industry, environment, and day-to-day activities.
AP-123 : Automation of Review Process
Shunbing Zhao, Merck & Co., Inc.
Monday, 5:00 PM - 5:10 PM, Location: Franklin 3
Statistical outputs for a clinical study report are produced as separate tables, listings and figures (TLFs) in Rich Text Format (RTF). The output review process usually involves a large number of TLFs and all kinds of comments going back and forth. It becomes more and more challenging to track the status of each individual comment from reviewers. Very often a few important comments are left unattended or overlooked until a very late stage. This paper presents a few useful SAS macros to streamline the review process in a well-controlled way. First, a SAS macro was developed to combine all outputs in RTF format into a single RTF file for reviewers to comment on; the original RTF file name is kept in the header of each page of the combined file for later reference. Second, reviewers can use the MS Word "Review" feature to put comments freely on each page as necessary. Last, another SAS macro was developed to extract all comments on each page of the combined file into a well-formatted Excel spreadsheet, which has a separate column for each applicable field, i.e., the comment itself, the reviewer's name, the page number, and the RTF file name. This information is very useful for identifying the stakeholder and locating the corresponding table, listing or graph for a given comment, as well as for tracking the status of each individual comment. This approach greatly improves communication among stakeholders and significantly streamlines the statistical review process.
AP-125 : Auto-Annotate Case Report Forms with SAS® and SAS XML Mapper
Yating Gu, Seattle Genetics, Inc.
Tuesday, 8:00 AM - 8:20 AM, Location: Franklin 3
A common approach to annotating blank case report forms (CRFs) is to manually create and draw text boxes in PDF, and then type in the annotations that document the location of the data with the corresponding names of the SDTM datasets and the names of those variables included in the submitted datasets. The CRF design is similar across many studies, and there are usually only minor updates between CRF versions within a study. However, it is very time-consuming to manually copy and paste annotations across CRFs. Therefore, an approach that could help auto-annotate a new blank CRF based on an existing annotated CRF could save a tremendous amount of time and effort. This paper introduces a method to import annotations from an annotated CRF to SAS using SAS XML Mapper, manage annotations, and incorporate annotations back into a new blank CRF using SAS. Some relevant programming details and examples will be provided in this paper.
AP-143 : PROC SORT (then and) NOW
Derek Morgan, PAREXEL International
Tuesday, 8:30 AM - 8:50 AM, Location: Franklin 3
The SORT procedure has been an integral part of SAS® since its creation. The sort-in-place paradigm made the most of the limited resources at the time, and almost every SAS program had at least one PROC SORT in it. The biggest options at the time were to use something other than the IBM procedure SYNCSORT as the sorting algorithm, or whether you were sorting ASCII data versus EBCDIC data. These days, PROC SORT has fallen out of favor; after all, PROC SQL enables merging without using PROC SORT first, while the performance advantages of HASH sorting cannot be overstated. This leads to the question: Is the SORT procedure still relevant to any other than the SAS novice or the terminally stubborn who refuse to HASH? The answer is a surprisingly clear "yes". PROC SORT has been enhanced to accommodate twenty-first century needs, and this paper discusses those enhancements.
AP-154 : Unleash the Power of Less Well Known but Useful SAS® DATA Step Functions
Timothy Harrington, SAS Programmer
Tuesday, 10:00 AM - 10:50 AM, Location: Franklin 3
SAS Version 9.4 has over 600 documented SAS functions and procedures. Some SAS users may have grown accustomed to using only a few of the more widely known of these, and have had to invest significant time and extra effort manipulating code to achieve a desired result. This paper describes a selection of the less well-known SAS functions and how they can be used to obtain specific results with minimal coding. Solutions to a variety of programming challenges are demonstrated, and the advantages of these techniques are discussed and compared to more traditional means. The functions included here are available in SAS v9.4; some may not be available in earlier versions.
AP-156 : Generate Customized Table in RTF Format by Using SAS Without ODS RTF - RTF Table File Demystified
Kai Koo, Abbott Vascular
Tuesday, 5:00 PM - 5:20 PM, Location: Franklin 3
Rich Text Format (RTF), developed and maintained by Microsoft, is a popular document file format supported by many word processors on different operating systems. Because an RTF file is human-readable, experienced programmers can write their own RTF documents, including tables and graphs, directly from the RTF specification. Although skilled SAS users can generate high-quality RTF tables through ODS RTF and related procedures (e.g., PROC REPORT/TEMPLATE), tables in RTF format can also be produced by a SAS DATA step without any ODS RTF code involved. A SAS DATA step-based macro for RTF table preparation will be introduced in this article. The basic concept of coding tables in RTF format will be explained. The advanced techniques used in this macro for setting margins, header, footer, title, footnote, font types, highlight, border, column width, and cell merging, as well as different approaches to inserting or embedding image files, will also be demonstrated. SAS users can use the concepts and logic presented here to develop their own SAS code or macros as an ODS RTF alternative to prepare customized tables for thesis writing, presentation, publication, and regulatory submission.
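The basic concept of writing a table directly in RTF can be shown with a short sketch. The Python below is only an illustration of the file format, not the SAS DATA step macro described in the abstract: it emits a minimal RTF document in which each row is opened with `\trowd`, cell boundaries are declared with `\cellx`, cell text is marked with `\intbl` and closed with `\cell`, and the row ends with `\row`. The function names, font, and default column width are assumptions, and non-ASCII text is not handled.

```python
def escape(text):
    """Escape the three characters that are special in RTF."""
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def rtf_table(rows, col_width_twips=2880):
    """Return a minimal RTF document containing one table.

    Widths are in twips (1440 twips = 1 inch), so the default column
    is two inches wide.
    """
    parts = [r"{\rtf1\ansi\deff0{\fonttbl{\f0 Courier New;}}\f0\fs20"]
    for row in rows:
        # \trowd starts a row definition; each \cellx gives the position
        # of a cell's right edge measured from the left margin.
        parts.append(r"\trowd")
        for index in range(len(row)):
            parts.append(r"\cellx%d" % (col_width_twips * (index + 1)))
        # \intbl marks the text as table content; \cell ends each cell.
        for value in row:
            parts.append(r"\intbl " + escape(str(value)) + r"\cell")
        parts.append(r"\row")
    parts.append("}")
    return "\n".join(parts)


if __name__ == "__main__":
    document = rtf_table([["Parameter", "N"], ["Age {years}", 42]])
    with open("example_table.rtf", "w", encoding="ascii") as handle:
        handle.write(document)
```

Borders, merged cells, headers, and footers are all additional control words layered onto this same row structure, which is why a hand-written generator can offer fine-grained control over layout.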
AP-157 : Practices in CDISC End-to-End Streamlined Data Processing
Chengxin Li, Luye Pharma Group
Tuesday, 11:00 AM - 11:20 AM, Location: Franklin 3
From a programming perspective, this paper describes principles and practices of end-to-end streamlined data processing under the CDISC umbrella. Besides compliance, there are several common practices across the clinical data processing lifecycle: traceability, controlled terminology, an end-in-mind philosophy, structured design, and reusable solutions (auto-generation for SDTM, ADaM, and TLF). The components of end-to-end streamlined data processing are also introduced: data collection, SDTM transformation, ADaM development, and TLF generation. The ISS/ISE programming model, MDR, and production harmonized with submission are depicted as well. To illustrate the concepts, two examples are discussed, one on a specific data element (AE start date) and another on efficacy analyses in the pain therapeutic area. End-to-end streamlined data processing with CDISC is presented as an optimized programming model for achieving high-quality, high-efficiency deliveries.
AP-158 : Six Useful Data Tool Macros
Ting Sa, Cincinnati Children's Hospital Medical Center
Tuesday, 1:30 PM - 2:20 PM, Location: Franklin 3
In this paper, six macros are introduced that can work as helpful tools for some common data tasks. The macro "HelpConsistency" can detect fields that have the same name but different data lengths or data types among SAS data sets and fix the data length inconsistencies automatically. The macro "SelectVars" can select variables in batches from a SAS data set, for example, variables with similar naming patterns such as the same suffix or middle part. The macro "ExportExcelWithFormat" can export formatted SAS data to Excel files without losing the formats. The macro "FindFiles" can help users find and access their folders and files very easily. The macro "SearchReplace" can search and replace any string in SAS programs. The macro "checkCharVarType" can provide more information about your character variables, such as whether a variable contains only numeric values, contains no values, or contains date and datetime values. The paper includes the SAS code for all six macros.
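The kind of check attributed to "checkCharVarType" can be sketched as follows. This Python is a hypothetical illustration and does not reproduce the paper's SAS macro: it labels a list of character values as empty, numeric, date, or text. The function names and the list of accepted date formats are assumptions.

```python
from datetime import datetime


def _is_number(text):
    """True if text parses as a float (note: this also accepts 'nan' and 'inf')."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def _is_date(text, formats):
    """True if text matches at least one of the given strptime formats."""
    for fmt in formats:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False


def classify_values(values, date_formats=("%Y-%m-%d", "%d%b%Y")):
    """Label a character column as 'empty', 'numeric', 'date', or 'text'."""
    # Ignore missing and blank entries when deciding the column's type.
    present = [v.strip() for v in values if v is not None and v.strip()]
    if not present:
        return "empty"
    if all(_is_number(v) for v in present):
        return "numeric"
    if all(_is_date(v, date_formats) for v in present):
        return "date"
    return "text"
```

The numeric test runs before the date test, so a value such as `20190612` is reported as numeric; a stricter tool might report both possibilities.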
AP-187 : Tidyverse for Clinical Data Wrangling
Phil Bowsher, RStudio Inc.
Tuesday, 11:30 AM - 11:50 AM, Location: Franklin 3
RStudio will be presenting an overview of the Tidyverse for the R user community at PharmaSUG. This is a great opportunity to learn and get inspired about new capabilities for clinical data wrangling in R. No prior knowledge of R or RStudio is needed. This short paper will provide an introduction to flexible and powerful tools for managing data as part of your research and reporting. The paper will provide an introduction to data processing with R and include an overview of the packages dplyr, magrittr, tidyr and ggplot2 with applications in drug development. A live environment will be available for attendees to explore the visualizations in real time.
AP-189 : From Static ggplot2 Output to Interactive Plotly Visualization to Shiny App
Kelly O'Briant, RStudio
Wednesday, 11:00 AM - 11:50 AM, Location: Franklin 3
RStudio will be presenting an overview of creating interactive adverse events visualizations, dashboards and Shiny apps. The session will cover the data visualization landscape in R, which includes packages like ggplot2, crosstalk, plotly and various htmlwidgets. Adding interactive visualizations to R Markdown reports and Shiny apps will be covered with applications in drug development. RStudio will be showcasing several compelling examples as well as learning resources. As part of the short course, some available research-related R Shiny apps and R Markdown reports will be illustrated. Data from the OpenFDA APIs, including adverse events, will be used for the session visualizations. Moreover, RxNorm (created by the U.S. National Library of Medicine (NLM) to provide a normalized naming system for clinical drugs) will be used for drug analytics. A live environment will be available for attendees to explore the visualizations they create in real time.
AP-191 : Creating In-line Style Macro Functions
Arthur Li, City of Hope
Monday, 2:30 PM - 2:50 PM, Location: Franklin 3
Macro functions that are used in our program are defined by the macro facility. By providing values for the function parameters, the macro function generates a result. We can insert the result directly into a macro statement in our program. Programmers seldom know that we can create a user-defined macro function as well. This paper will focus on the methods of creating an in-line style macro function via various examples.
AP-205 : Quick Tips and Tricks: Perl Regular Expressions in SAS
Pratap Kunwar, EMMES
Jinson Erinjeri, EMMES
Monday, 5:15 PM - 5:25 PM, Location: Franklin 3
Programming with text strings or patterns in SAS® can be complicated without knowledge of Perl regular expressions. Just knowing the basics of regular expressions (PRX functions) will sharpen anyone's programming skills. Having attended a few SAS conferences lately, we have noticed that there are few presentations on this topic and that many programmers tend to avoid learning and applying regular expressions. Also, many of them are not aware of the capabilities of these functions in SAS. In this presentation, we present quick tips on these expressions with various applications, which will enable anyone to learn this topic with ease.
AP-212 : Python-izing the SAS Programmer
Mike Molter, Wright Ave Partners
Tuesday, 4:00 PM - 4:20 PM, Location: Franklin 3
More and more, Biostats departments are contemplating and asking questions about converting certain clinical data and/or metadata processing tasks to Python. For many who have spent a career writing SAS code, the learning curve for this vast new frontier may appear to be a daunting task. In this paper we'll begin a gradual transition into Python data processing by looking at the Python DataFrame, and seeing how simple SAS Data Step tasks such as merging, sorting, and others work in the Python environment.
AP-216 : Using SAS® ODS EXCEL Destination "Print Features" to Format Your Excel Worksheets for Printing as You Create Them.
William Benjamin, Owl Computer Consultancy LLC
Wednesday, 10:00 AM - 10:50 AM, Location: Franklin 3
This presentation will demonstrate features of the SAS® ODS EXCEL Destination that allow you to use various ODS EXCEL sub-options to prepare a worksheet to be printed, as it is written out by SAS. The sub-options that impact page level features are the following. "Choose to print in color or Black and White only"; "Center the output either horizontally or vertically"; "Print in draft quality or normally"; "Print horizontally or vertically"; "Arrange the data across the page differently"; "Select where to put the output"; "Print headers, footers, and row breaks"; "Adjust the size of headers and footers"; "Adjust the print scale to improve the look of lines in graphs"
AP-241 : Minimally Invasive Surgery to Merge Data
Tuesday, 4:30 PM - 4:50 PM, Location: Franklin 3
Many professional papers on the topic of combining data sources have been presented at SAS User conferences over the decades. Most focus on the issue of combining data through the Data Step (set, merge, update) or on joining in SQL. However, there is another, completely different technique using SAS Component Language functions that is optimal for a small range of applications. This paper will present the technique (used in a Data Step and in macro language), explain how it differs from traditional data combination techniques, and offer specific examples illustrating where it might be useful. The technique can also be used to extract either metadata or actual data from SAS data sources. Former developers of SAS/AF will recognize the technique; Base and macro programmers will be amazed!
AP-290 : PROC SQL to the Rescue: When a Data Step Just Won't Do Anymore
John O'Leary, Department of Veterans Affairs
Wednesday, 9:00 AM - 9:20 AM, Location: Franklin 3
This paper is written for beginning and intermediate Base SAS® users who are more comfortable using the traditional DATA step but would benefit from the efficiencies gained by PROC SQL, particularly when it is used explicitly. When processing datasets with 100 million records or more, there comes a point when programmers are forced to try new tools if they want to get home in time for supper. This paper provides some PROC SQL examples and other time-saving programming tips with emphasis on the SQL Procedure Pass-Through Facility. Although PROC SQL can be run implicitly within the SAS® Application Server, it can also interface directly with various relational database management systems (RDBMS) such as ORACLE, DB2, MySQL and Microsoft® SQL Server so that the SQL code runs directly on the RDBMS, thereby reducing the amount of data that needs to be transferred back to SAS®. The examples for this paper are limited to the syntax applicable to Microsoft® SQL Server and require the installation of SAS/ACCESS® Interface to Microsoft® OLEDB.
AP-298 : Automating SAS Program Table of Contents for Your FDA Submission Package
Lingjiao Qi, Statistics & Data Corporation
Bharath Donthi, Statistics & Data Corporation
Wednesday, 9:30 AM - 9:50 AM, Location: Franklin 3
To submit a complete and compliant data package to the FDA for product approval, the submission must include the SAS programs that generated the analysis datasets, tables, and figures. Including these programs in the submission package helps the FDA data reviewers to understand the process by which the variables for the respective analyses were created, and to confirm the analysis algorithms. Organizing these SAS programs into a Table of Contents is highly recommended as it serves as an easy reference for the FDA reviewers, verifies that each expected file is included in the submission, and provides the FDA with easily-accessible details for each program. The Table of Contents is usually compiled manually by reviewing each individual program and typing the required information into a word processor. This time-consuming process is inefficient and error-prone. To increase accuracy and efficiency, we have developed an in-house macro tool to automatically generate a Table of Contents by reading each submitted SAS program and its associated files. This easy-to-use macro tool can be fully executed within SAS, and it dramatically reduces documentation preparation time. To produce a complete and detailed Table of Contents, the macro tool extracts from the submitted programs: metadata (program name, size, descriptions, etc.), input datasets, output file names, and macros. This paper will provide a detailed description of our time-saving macro tool to assist SAS users in automatically generating the Table of Contents for their FDA data submission packages.
Application Development
AD-032 : Extending the umbrella of compliance: LSAF with R Distributed Computing
Ben Bocchicchio, SAS
Sandeep Juneja, SAS
Rick Wachowiak, SAS
Monday, 10:30 AM - 10:50 AM, Location: Salon C
RStudio is being used today to perform more and more highly specialized analyses in the health and life sciences industry. The R user's ability to 'pull' the latest package from a web-based R repository makes it easy to develop with the latest R code packages. To provide repeatability for R coding, once the R code is production ready, the R code and data are stored in Life Science Analytics Framework (LSAF) under version and audit control. Users can extend LSAF's compliant environment to include R code processing on a remote system; all that is needed is the LSAF API and a deployed web service. The process works by running a SAS program in LSAF, which passes instructions to the remote R server through the web service to pull the R code and data from LSAF to the remote server. A batch file then runs the R code in RStudio. Once the R processing is complete, the execution log and output are copied back into LSAF via the API. The BAT/script file then cleans up any local files that were generated on the remote server. The log that is posted back to LSAF will contain the results of the entire run: the version of the data used, the version of the R code used, and the versions of the R packages used to generate the output. With all this information, the output can be regenerated at a later date, supporting the needs of compliance.
AD-048 : Reimagining Statistical Reports with R Shiny
Sudharsan Dhanavel, Cognizant Technology Solutions
Harinarayan Gopichandran, Cognizant Technology Solutions
Tuesday, 11:30 AM - 11:50 AM, Location: Salon C
How often do you provide outputs for review, only for the reviewer to come back and ask, "I would like to see this with a change; would you make this adjustment?" A statistical programmer often receives multiple such requests, which result in several programming hours spent altering report parameters. Wouldn't it be cool if you could present your output in a dashboard and say, "Here you go! It's an interactive Shiny app: you can alter the parameters, filter, or subset as you wish, and the report appears on screen right away"? The Shiny package, by RStudio, lets you specify input parameters through graphical user interface controls such as sliders, drop-downs, and text fields, and incorporate plots, tables, and summary outputs. The app logic written in R updates the outputs immediately when the inputs are altered in the user interface. This presentation demonstrates how R Shiny makes it simple for a statistical programmer to turn data analyses into an interactive web app that generates the clinical trial TLF outputs traditionally generated using SAS, extending the possibilities for data review and visualization, and discusses the advantages of using R Shiny over SAS.
AD-052 : Text Analysis of QC/Issue Tracker Using Natural Language Processing (NLP) Tools
Huijuan Zhang, Columbia University
Todd Case, Vertex Pharmaceuticals Inc.
Monday, 3:30 PM - 3:50 PM, Location: Salon C
Many Statistical Programming groups use QC and issue trackers, which typically include text that describe discrepancies or other notes documented during programming and QC, or 'data about the data'. Each text field is associated with one specific question or problem and/or manually classified into a category and a subcategory by programmers (either by free text or a drop-down box with pre-specified issues that are common, such as 'Specs', 'aCRF', 'SDTM', etc.). Our goal is to look at this text data using some Natural Language Processing (NLP) tools. Using NLP tools allows us an opportunity to be more objective about finding high-level (and even granular) themes about our data by using algorithms that parse out categories, typically either by pre-specified number of categories or by summarizing the most common words. Most importantly, using NLP we do not have to look at text line by line, but it rather provides us an idea about what this entire text is telling us about our processes and issues at a high-level, even checking whether the problems were classified correctly and classifying problems that were not classified previously (e.g., maybe a category was forgotten, it didn't fit into a pre-specified category, etc.). Such techniques provide unique insights into data about our data and the possibility to replace manual work, thus improving work efficiency.
AD-054 : Generating Color Coded Patient ID Spreadsheet (PID list) To Make It Easier for the Reviewers.
Ranjith Kalleda, Pfizer
Ashok Abburi, Pfizer
Monday, 11:45 AM - 11:55 AM, Location: Salon C
In general, we generate the patient ID list (PID list) needed for patient profiles and narratives based on requirements gathered from the users, but these PID lists are generated repeatedly for different analyses, such as interim analysis, primary analysis, 90/120-day safety update analysis, and supplemental/final analysis. After the first PID list is generated, for all subsequent PID lists we just add new data on top of the existing data. For subsequent reviews, the reviewers may not want to look at all the data and may want to review only the new data. To make this easier, we can color code the new data so that reviewers can review only the color-coded records. In this paper we discuss techniques for generating the color-coded new data. We also discuss how to make the PID list generation process efficient and easier across different portfolios within the organization.
AD-064 : %LOGBINAUTO: A SAS 9.4 Macro for Automated Forward Selection of Log-Binomial Models
Matthew Finnemeyer, Vertex Pharmaceuticals Inc.
Monday, 10:00 AM - 10:20 AM, Location: Salon C
In a study, relying on log-binomial models to generate estimates of relative risk for clinical outcomes, there was a need to select from a plethora of clinically-relevant covariates to produce parsimonious explanatory models. While SAS 9.4 allows for automated selection (forward, back, stepwise) for some non-linear models, it does not extend this functionality to all non-linear models, including log-binomial models. As it would be very time-consuming to code this for each model, the %LOGBINAUTO macro was created to automate forward selection for log-binomial models and implemented for the study. This macro utilizes nested %DO loops and non-sequentially manages a list of user-supplied covariates, using measures of statistical significance and improved model-fit, to select for their inclusion into a log-binomial model. This paper will outline the code and implementation of the %LOGBINAUTO macro for SAS 9.4, designed for an audience familiar with SAS MACRO language and the PROC GENMOD procedure.
AD-077 : An Application of Supervised Machine Learning Models on Natural Language Processing: Classifying QC Comments into Categories
Xiaoyu Tang, Vertex Pharmaceuticals Inc.
Todd Case, Vertex Pharmaceuticals Inc.
Monday, 11:00 AM - 11:20 AM, Location: Salon C
Quality Control (QC) trackers provide us with text data that describe issues documented during QC. When documenting these issues, programmers classify their questions into general categories, such as ADaM and Specs, SDTM and Specs, TLF, SAP, etc., and more specific categories (e.g., AE, LB, ADSL, etc.). Classification by programmers is subjective and time-consuming. Hence, it is important to have an objective and efficient way to determine which category a question belongs to. Such a tool not only enables us to classify those questions in just one click, but also provides information on the frequency of each category and the categories of most concern. Our goal is to use natural language processing (NLP) tools and machine learning methods to build such a classifier using data from the QC Tracker text. We will investigate the relationship between categories and questions, and compare several models, such as support vector classifier and random forest, using cross-validation to obtain the prediction accuracy of each model. Our goal is to choose the model that has the highest prediction accuracy and is the most valuable for future prediction of free text from QC Trackers. We used Python for our data manipulation and data analysis. The intended audience is expected to have some knowledge of statistics.
AD-078 : ADME Study PK SDTM/ADaM And Graph
Fan Lin, Gilead Sciences, Inc.
Monday, 1:30 PM - 1:50 PM, Location: Salon C
An ADME study is usually conducted in the early stage of clinical drug development to understand the route of excretion of the drug and its metabolites in the human body. It measures the concentrations of the parent/metabolite(s) and determines the amount of radioactivity in plasma, urine and feces. Because the specimens include urine and feces and subjects are discharged at different times, creating CDISC-compliant SDTM and ADaM datasets is more complex than for other PK studies. In this paper we introduce the complexities of ADME PK studies and our approaches to resolving these challenges. The paper presents the ADME study PKMERGE/PC/ADPC/TF process in a flow chart, then describes the details of each step, followed by a list of challenges that exist in the industry today. Because ADME PK collects urine and feces specimens, the derived type "URINE+FECES" needs to be calculated in PKMERGE; we discuss how to put the weight of the derived specimen and the sum of weights into CDISC standard data to facilitate analysis. Other challenges include carrying the last record forward to the maximum discharge visit for subjects who are discharged early, and representing this in a cumulative graph. This paper provides detailed steps for resolving each challenge to create CDISC-compliant data and analysis presentations. With reference to other industry white papers, our process meets industry standards and produces high-quality plots using the SAS Graph Template Language.
AD-094 : A SAS and VBScript cyborg to send emails effectively.
Nikita Sathish, Seattle Genetics, Inc.
Monday, 11:30 AM - 11:40 AM, Location: Salon C
Clinical trial monitoring requires that SAS® programmers periodically generate statistical analysis reports and distribute the reports to cross-functional teams for review. Most reports are distributed via email or an enterprise content management system. Manual generation of email, however, can introduce errors. For example, recipients can be omitted, formatting errors can occur, and reports may not be sent in a timely manner. SAS® programmers can avoid these limitations by sending reports directly from SAS® to large, customized email distribution lists. SAS® programmers can further improve report distribution by combining SAS® with VBScript. VBScript allows programmers to overcome security concerns associated with SAS®, by not having to store login credentials or tweak the configuration file. We recommend that programmers send emails from SMTP server through SAS® using VBScript. This approach allows programmers to send emails with attachments, pass through SAS® macro variables to VBScript, and build dynamic email contents.
AD-104 : A Simple SAS Utility to Combine Existing RTF Tables/Figures and Create a Multi-level Bookmark Hierarchy and a Hyperlinked TOC
Lugang Larry Xie, Johnson & Johnson
Monday, 2:00 PM - 2:20 PM, Location: Salon C
RTF is a popular format that most sponsors adopt to create tables, listings and figures in the pharmaceutical industry. However, there is an unmet need to concatenate the outputs into a presentable single RTF file independent of the computing system. This paper presents a simple SAS approach to concatenate any SAS-generated RTF files, including both tables/listings and figures, and create a multi-level bookmark hierarchy and a hyperlinked TOC, without having to modify the macros/programs that create the individual outputs. It can be executed on any platform with SAS installed. In addition, a simple approach to convert the combined RTF file into PDF format is introduced in this paper.
AD-107 : Auto-generation of Clinical Laboratory Unit Conversions
Alan Meier, Cytel
Monday, 4:30 PM - 4:50 PM, Location: Salon C
When mapping local labs to SDTM data structures, dealing with the free-text used by sites to indicate the original reporting units in order to generate conversion factors to sponsor's standard reporting units is often a major headache. Most lab units are in an amount per volume structure, where amounts are normally in some multiplier of mass(grams/moles), counts(cells), International Units or Enzyme Units and volumes are mainly in liters. This paper will look at a strategy to automate with SAS, much of the task through a three-step process: step 1 - parse and categorize the amounts and volumes, step 2 - normalize to a published conversion factor, and step 3 - calculate the final conversion factor.
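The three-step process described above can be illustrated with a minimal Python sketch. It is not the author's SAS utility: the prefix table, the function names, and the restriction to simple "amount/volume" units with gram or mole amounts and liter volumes are all assumptions made for illustration, and step 2 is interpreted here as normalizing each unit to base amount per liter.

```python
# Hypothetical SI prefix table; a real utility would also handle cells, IU, etc.
PREFIX = {"": 1.0, "d": 1e-1, "c": 1e-2, "m": 1e-3, "u": 1e-6, "n": 1e-9, "p": 1e-12}


def parse_unit(unit):
    """Step 1: split e.g. 'mg/dL' into (amount prefix, amount base, volume prefix)."""
    amount, volume = unit.strip().split("/")
    if not volume.endswith("L"):
        raise ValueError(f"unsupported volume in {unit!r}")
    for base in ("mol", "g"):
        if amount.endswith(base):
            return amount[: -len(base)], base, volume[:-1]
    raise ValueError(f"unsupported amount in {unit!r}")


def to_base(unit):
    """Step 2: factor that turns a value in `unit` into base amount per liter."""
    a_prefix, base, v_prefix = parse_unit(unit)
    return base, PREFIX[a_prefix] / PREFIX[v_prefix]


def conversion_factor(original, standard):
    """Step 3: multiplier that converts `original` units to `standard` units."""
    base_o, factor_o = to_base(original)
    base_s, factor_s = to_base(standard)
    if base_o != base_s:
        # Mass <-> mole conversions need an analyte-specific molecular weight.
        raise ValueError("amount types differ; a molecular weight is required")
    return factor_o / factor_s
```

For example, `conversion_factor("mg/dL", "g/L")` is approximately 0.01. Free-text variants such as "MG/DL" or "mg per dL" would need a cleaning pass before step 1, which this sketch does not attempt.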
AD-109 : Tool Development Methods for Implementing and Converting to New Controlled Terminology in SDTM datasets
Martha O'Brien, Reata Pharmaceuticals
Keith Shusterman, Reata Pharmaceuticals
Tuesday, 4:30 PM - 4:50 PM, Location: Salon C
Controlled terminology (CT) for SDTM datasets allows for easier review for committee members, other programmers, consultants, consulting companies, the FDA, and many others. This may ultimately reduce the time it takes to get the drug or device to market. Ensuring a proper method of choosing and implementing a newer version of CT is not only necessary but vital to submission acceptance. Currently the FDA only requires CT versions of 2011-06-10 or later and with quarterly outputs there are many options to choose. This can make it difficult when starting a study or after a study starts a sponsor may decide that up versioning the CT is necessary. Sponsors will also need to harmonize the new CT with their own specific values that have been added to the extensible codelists prior to implementing new versions. Having a partially automated process to convert to a newer CT eases not only the time constraint but also reduces the possibility of human error. This paper will provide a process for creating tools when up versioning CT during an ongoing study.
AD-111 : Learning SAS® GTL Basics with Practical Examples
Jinit Mistry, Seattle Genetics, Inc.
Monday, 9:30 AM - 9:50 AM, Location: Salon C
Over time, the business needs for data visualization have evolved due to the increased complexity of clinical trials. There has been an increased demand for complex graphs as part of clinical study reports, based on statistical analysis plans. For efficacy endpoints of oncology trials, complex graphs such as KM plots, forest plots, waterfall plots, spider plots, and box plots require creating more robust and customizable SAS® code. To address this need, various statistical graphics (SG) procedures and Graph Template Language (GTL) have been introduced in SAS®. SAS® SG procedures use an associated GTL template in the background because SG procedures have limited ability to create graphs. This makes GTL critical to understand in order to create customizable templates and associate them with the data to accomplish the goals of the analyses. GTL can create more customizable graphs over SG procedures to address such complex graphical needs. This paper talks about GTL key statements and how GTL uses a template that can work for different scenarios. Some relevant industry examples will also be shared in this paper.
AD-118 : Automated Dynamic Data Exchange (DDE) Replacement Solution for SAS® GRID
Ajay Gupta, PPD
Monday, 4:00 PM - 4:20 PM, Location: Salon C
In the pharmaceutical and CRO industries, Excel® is widely used to create mapping specs and reports. The SAS® System offers several ways to export data into a Microsoft Excel spreadsheet and apply Excel formatting, but Dynamic Data Exchange (DDE) is the only technique that provides total control over the Excel output. DDE uses a client/server relationship to enable a client application to request information from a server application. SAS is always the client. In this role, SAS requests data from server applications, sends data to server applications, or sends commands to server applications. Unfortunately, DDE is not supported on SAS GRID computing. This paper will explore a replacement solution for DDE on SAS Grid using SAS stored processes, Visual Basic for Applications (VBA), and the SAS Add-In for Microsoft Office. Later, the paper will explore the automation process and extend the solution to format Microsoft Word® documents.
AD-121 : The Application of SAS-Excel Handshake in DDT
Maggie Ci Jiang, Teva Pharmaceuticals
Monday, 2:30 PM - 2:50 PM, Location: Salon C
Executing SAS code embedded in Excel is always challenging because the handshake takes place between two different software utilities, SAS and Excel. In clinical trials, when we do CDISC data conversion, for example, instead of writing separate SAS programs to convert the Define Definition Table (DDT) to SAS datasets, we would like to have the SAS code written in Excel so that we can compare the SAS code mapping side by side with the rules and definitions recorded in the Excel file that serves as the DDT. This approach is particularly convenient for clinical programming development because it makes the code mapping more visible and gives reviewers an easier way to trace the consistency and accuracy of the conversion between the rules and the real-time SAS code. This paper presents the challenge of this SAS-Excel handshake with a complete real-world CDISC SDTM data conversion example. The individual SAS-Excel handshake examples in this paper include simple SAS code, SAS built-in functions, and external customized macros. The benefits and drawbacks of using such a SAS-Excel handshake are also discussed.
AD-129 : Automate the Mundane: Using Python for Text Mining
Nathan Kosiba, Rho, Inc.
Tuesday, 8:00 AM - 8:20 AM, Location: Salon C
As programmers and biostatisticians, we have a number of tasks that are broken down to a "copy and paste from document A to document B" scenario. These tasks range from copying the name of the study and sponsor into a SAP to pulling Inclusion/Exclusion criteria for the TI domain. We perform these tasks ad nauseam. While SAS® is very good at dealing with structured data and analyses, it is not well equipped to deal with unstructured text such as a protocol. Python, on the other hand, has numerous tools for dealing with unstructured text and breaking it into meaningful pieces of information. While Python can do a vast amount of different things, we will be focusing on how to process information in a protocol to populate Metadata and create Trial Design datasets. Using Python and regular expressions, we can process large amounts of information held in a protocol or other document and retrieve what we need without much manual effort. These tools are often written with user interfaces included so there is no programming knowledge required for the end-user. In addition, Python integrates with a variety of existing systems and/or produces output that is easily used as input for these systems. This flexibility supports nearly fully automated results. This presentation will explore some of the Python packages that facilitate this process and how Python can be used to automate mundane tasks.
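As a rough illustration of the regular-expression approach described above, the following Python sketch pulls numbered inclusion/exclusion criteria out of plain protocol text and shapes them into TI-like rows. The sample text, the function name, and the `IN01`/`EX01` test-code scheme are hypothetical; a real protocol would first need to be converted from Word or PDF to text, and the paper's own tools may work quite differently.

```python
import re

# Hypothetical protocol excerpt, already converted to plain text.
PROTOCOL_TEXT = """\
Inclusion Criteria
1. Age 18 years or older
2. Signed informed consent
Exclusion Criteria
1. Pregnant or breastfeeding
"""


def extract_criteria(text):
    """Return TI-like rows (IETESTCD, IECAT, IETEST) from numbered criteria."""
    rows = []
    category = None
    for line in text.splitlines():
        # A section header switches the current category.
        header = re.match(r"\s*(Inclusion|Exclusion)\s+Criteria\s*$", line, re.IGNORECASE)
        if header:
            category = header.group(1).upper()
            continue
        # A numbered item under a known category becomes one row.
        item = re.match(r"\s*(\d+)\.\s+(.*\S)", line)
        if item and category:
            code = f"{category[:2]}{int(item.group(1)):02d}"  # assumed naming scheme
            rows.append({"IETESTCD": code, "IECAT": category, "IETEST": item.group(2)})
    return rows
```

Calling `extract_criteria(PROTOCOL_TEXT)` yields three rows, the first with `IETESTCD` equal to `IN01`. Multi-line criteria and nested sub-items are not handled by this sketch.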
AD-136 : Why we should learn Python
Kevin Lee, Clindata Insight
Tuesday, 8:30 AM - 8:50 AM, Location: Salon C
Python is one of the most popular languages nowadays. Python can be used to build just about anything, and it is a great language for data analysis, scientific computing, application development, backend web development, and especially machine learning. Python is currently featured in 70% of introductory programming courses at US universities, and the latest report from Forbes states that Python grew by more than 450 percent in 2017. We statistical programmers have been using SAS, and because of the popularity of Python, we wonder if we should learn Python. The paper will start with the current state of Python implementation and its future. The paper will also show basic concepts of Python programming and its similarities to and differences from SAS programming. The paper will introduce the benefits of learning Python, including opportunities in Data Science and Machine Learning, career opportunities, and a high salary. The paper will also discuss the weaknesses of Python, such as regulatory restrictions and a lack of metadata. Finally, the paper will discuss the future of statistical programming with Python.
AD-159 : Automate the Process to Ensure the Compliance with FDA Business Rules in SDTM Programming for FDA Submission
Xiangchen (Bob) Cui, Alkermes Inc.
Hao Guan, Alkermes Inc.
Min Chen, Alkermes Inc.
Letan (Cleo) Lin, Alkermes Inc.
Monday, 8:00 AM - 8:50 AM, Location: Salon C
FDA has published the "FDA Business Rules" and expects sponsors to submit SDTM datasets that comply with these rules as well as with the CDISC IG. These rules assess whether the data support regulatory review and analysis. Some of them are specific to FDA internal processes rather than to CDISC SDTM standards. Pinnacle 21 is the tool most commonly used by both industry and FDA to check compliance with both FDA business rules and CDISC rules. However, Pinnacle 21 is usually used at a late stage of the SDTM programming development cycle, and it cannot help users resolve its "Error" and/or "Warning" findings, even if it is used at a very early stage. This paper presents a systematic approach to automating the SDTM programming process to ensure compliance with FDA Business Rules. It covers study data collection design, data collection (edit-checking), a standard SDTM programming process, and in-house macros for automatically reporting and/or fixing issues of non-compliance with the "FDA Business Rules". It avoids inefficient use of resources on repeated compliance verification and/or resolution of Pinnacle 21 findings for these rules. In fact, some of these non-compliance issues are often very "costly" and/or too late to fix at a late stage. The hands-on experiences shared here are intended to help readers apply this methodology to prepare SDTM datasets that comply with both FDA Business Rules and CDISC standards for FDA submission, ensuring technical accuracy and submission quality in addition to cost-effectiveness and efficiency.
Raj Kiran Boddu, Takeda
Tuesday, 9:00 AM - 9:50 AM, Location: Salon C
AD-203 : Large-scale TFL Automation for regulated Pharmaceutical trials using CDISC Analysis Results Metadata (ARM)
Stuart Malcolm, Frontier Science (Scotland) Ltd
Tuesday, 2:00 PM - 2:50 PM, Location: Salon C
The creation of a Clinical Study Report (CSR) for Phase II/III pharmaceutical clinical trials involves the production of several hundred Tables, Listings and Figures (TFL). This can be a time-consuming activity when each TFL is programmed manually, and it is a natural candidate for an automated solution that will 'batch process' all the TFL. However, although many TFL will be standard 'variations on a theme', there will also be many study-specific TFL, which may have been developed in collaboration between Investigators and Statisticians and which require custom programming and a deeper understanding of the data. In addition, Sponsors often require CDISC Analysis Results Metadata (ARM) define.xml for submission to regulators, and this is often performed as an additional work-package at the end of the trial once the TFL have been delivered. This paper outlines an approach to TFL automation that creates the CDISC Analysis Results Metadata at the start of the process, not the end, and then uses the metadata to generate the TFL using a SAS program structure that allows standardised TFL to be created while also providing flexibility to easily incorporate study-specific analyses.
AD-208 : Define-XML with ARM
Lei Jing, FMD K&L Inc.
Chao Wang, FMD K&L Inc.
Monday, 5:00 PM - 5:10 PM, Location: Salon C
Currently, Analysis Results Metadata (ARM) is a required component in PMDA data submission package. ARM assists the reviewer by providing traceability from key efficacy and safety analysis results to analysis datasets and dataset related elements, which adds significant value to a regulatory submission as well. However, the process of developing ARM in Define-XML may be very time-consuming if it is handled manually. This paper presents an effective approach to integrate ARM automatically into existing Define-XML v2.0. A macro is designed to convert all required information from ARM metadata into valid XML syntax and then insert the XML codes into existing Define-XML. This automating process will reduce the development cycle and increase package quality.
AD-211 : Validating Hyperlinks in SDTM define.xml Using Python
Brandon Welch, Rho, Inc.
Greg Weller, Rho, Inc.
Tuesday, 10:00 AM - 10:20 AM, Location: Salon C
As a one-stop location for a clinical trial's metadata, the define.xml file is a vital piece of an FDA submission. Held within this file are many hyperlinks. Some links are internally specific to the XML file, while others point to external locations. Of particular interest in SDTM submissions are the links that externally map to the study's annotated case report form (CRF) - a PDF document. For a particular SDTM variable, a user clicks on the hyperlink and the annotated CRF opens to the variable's origin page. Depending on how these links are created in the define.xml, occasionally the page hyperlink fails to open the correct annotated CRF page. Manually testing each hyperlink is tedious and error prone. Fortunately, there are powerful Python modules for analyzing PDF and XML files. In this paper, we describe a technique using the Python programming language that checks each define.xml link against each page in the CRF PDF document. The techniques presented offer a good overview of basic Python techniques that will educate programmers at all levels.
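A simplified version of such a check can be sketched with the Python standard library alone. The sketch below only verifies that physical page references fall within the CRF's page range; it does not open the PDF or inspect page content as the paper describes. The namespace URI and the `def:PDFPageRef` attribute names (`Type`, `PageRefs`) are assumed to follow Define-XML 2.0, and the page count is passed in rather than read from the PDF.

```python
import xml.etree.ElementTree as ET

# Assumed Define-XML 2.0 namespace for the "def:" prefix.
DEF_NS = "http://www.cdisc.org/ns/def/v2.0"


def bad_page_refs(define_xml, crf_page_count):
    """Return page references that cannot exist in a CRF of the given length."""
    root = ET.fromstring(define_xml)
    problems = []
    for ref in root.iter(f"{{{DEF_NS}}}PDFPageRef"):
        if ref.get("Type") != "PhysicalRef":
            continue  # named destinations need a PDF library to resolve
        for token in ref.get("PageRefs", "").split():
            if not token.isdigit() or not 1 <= int(token) <= crf_page_count:
                problems.append(token)
    return problems


# Hypothetical fragment standing in for a full define.xml.
SAMPLE = """<ODM xmlns:def="http://www.cdisc.org/ns/def/v2.0">
  <def:PDFPageRef Type="PhysicalRef" PageRefs="3 12"/>
  <def:PDFPageRef Type="PhysicalRef" PageRefs="41"/>
</ODM>"""
```

With a 40-page CRF, `bad_page_refs(SAMPLE, 40)` reports the reference to page 41. Confirming that the target page actually carries the variable's annotation would require a PDF-parsing module on top of this.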
AD-215 : Use of SAS Merge in Adverse Events Reporting for DSUR
Andrew Wang, Celgene
Tuesday, 10:30 AM - 10:50 AM, Location: Salon C
In reporting adverse events for DSUR, statistical programmers often need to separate AEs already reported in previous years from AEs present in current database. The task of identifying the adverse events already reported could present a challenge to the programmer due to the fact that there is no universally accepted way to identify an adverse event and all types of data issues could be present in the data. Different programmers might use different identification variables, and that presents a special challenge for validation as well. For an ongoing study, database might get changed from time to time, un-coded AEs last time could become coded this time, ongoing AEs will likely have an end date later, an AE could get deleted from the database or be split into multiple AEs. Many scenarios can happen. This paper will talk about how to use SAS merge to pick out already reported AEs based on the author's experience. The situations of many-to-many merge, one-to-many merge, and one-to-one merge will be discussed in detail for this particular application. A special case will be shown to illustrate that even one-to-one merge could lead to wrong results and suggestions for the selection of the ID variables will be given.
AD-228 : User-Defined Multithreading with the SAS® DS2 Procedure: Performance Testing DS2 Against Functionally Equivalent DATA Steps
Troy Hughes, Datmesis Analytics
Tuesday, 3:00 PM - 3:50 PM, Location: Salon C
The Data Step 2 (DS2) procedure represents the first opportunity that developers have had to build custom, multithreaded processes in Base SAS®. Multithreaded processing debuted in SAS 9, when built-in procedures such as SORT, SQL, and MEANS were threaded to reduce runtime. Despite this advancement, and in contrast with languages such as Java and Python, SAS 9 still did not provide developers the ability to create custom, multithreaded processes. This limitation was overcome in SAS 9.4 with the introduction of the DS2 procedure, a threaded, object-oriented version of the DATA step. However, because DS2 relies on methods and packages (neither of which has been previously available in Base SAS), both DS2 instruction and literature have predominantly fixated on these object-oriented aspects rather than DS2 multithreading. This text is the first to focus solely on DS2 multithreading and the performance advantages thereof. Common DATA step tasks such as data cleaning, transformation, and analysis are demonstrated, after which functionally equivalent DS2 code is introduced. Each paired example concludes with performance metrics that inarguably demonstrate faster runtimes with the DS2 language, even on a stand-alone laptop. All examples can be run in Base SAS and do not require in-database processing or the purchase of the DS2 Code Accelerator or other optional SAS components.
AD-234 : Camouflage your Clinical Trial with Machine Learning and AI
Ajith Baby Sadasivan, Genpro Life Sciences
Limna Salim, Genpro Life Sciences
Akhil Vijayan, Genpro Life Sciences
Anoop Ambika, Genpro Life Sciences
Tuesday, 11:00 AM - 11:20 AM, Location: Salon C
The internal focus of most pharmaceutical companies today is on reviewing and implementing significant changes in R&D strategies. To aid this, greater transparency and sharing of clinical study reports and patient-level data for further research is crucial. Also, in July 2018, the US Food and Drug Administration published a guidance that facilitates the use of Electronic Health Record (EHR) data in clinical investigations. With EHR data being made available for analysis and with the growing advocacy of Big Data in healthcare, there arise issues related to data privacy and provenance, which can be overcome through the art of Data Anonymization. Clinical data in its various forms, such as Individual Patient Data, data from EHR, or the CSR, is extremely complex, and it remains a challenge to develop the tools required to analyze such data. This paper explores the possibilities of developing a dynamic software framework using Angular JS, Python®, SAS®, Rasa NLU and spaCy, in compliance with EMA Policy 0070, for easy and effective anonymization/pseudonymization by generating a trainable named entity recognition system or similar counterparts that do not need programming work to improve. The biggest challenge is to identify the personal or quasi-identifiers to be anonymized in our data and to mask them such that the original demeanor is not altered. The paper describes methods to overcome this with the help of AI and ML methodologies, and gives the user the authority to approve the masking of the IDs proposed by the tool itself after proper training.
AD-278 : Creating a DOS Batch File to Run SAS® Programs
David Franklin, IQVIA
Monday, 5:15 PM - 5:25 PM, Location: Salon C
We often have many SAS programs to run in a directory. While it is possible to run each individually, it is better to create a DOS batch file listing the programs to run and the order in which they run. This paper looks at a SAS macro that takes the list of SAS programs in a directory, as specified by the user, and creates a DOS batch file that can then be run to execute all the SAS programs. Also presented is a small SAS program that you can run at the end to send yourself an email saying when the programs finished and whether there are any issues in the SAS log to review.
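The idea can be sketched in Python rather than as the SAS macro the paper presents. The function names and the default `sas.exe` path are hypothetical; `-sysin` and `-log` are the standard SAS command-line options for naming the program to run and its log file.

```python
from pathlib import Path


def list_programs(directory):
    """Return the .sas file names in `directory`, sorted by name."""
    return sorted(p.name for p in Path(directory).glob("*.sas"))


def build_batch(programs, sas_exe="sas.exe"):
    """Return batch-file text that runs each program in the order given."""
    lines = ["@echo off"]
    for prog in programs:
        log = str(Path(prog).with_suffix(".log"))  # a.sas -> a.log
        lines.append(f'"{sas_exe}" -sysin "{prog}" -log "{log}"')
    return "\n".join(lines) + "\n"
```

Writing `build_batch(list_programs("."))` to a `.bat` file gives one line per program. Passing an explicit list instead of `list_programs` lets the user control the run order, which matters when later programs depend on earlier outputs.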
AD-279 : Statistical application in Image Processing by Integrating C and SAS
Vidhyavathi Venkataraman, Biomarin Pharmaceuticals
Srinand Ponnathapura Nandakumar, Alder Pharmaceuticals
Anupama Datta, TransUnion
Monday, 9:00 AM - 9:20 AM, Location: Salon C
Analysis and manipulation of image data after it has been decoded into a numerical format is known as image processing. Several filters utilizing statistical principles are applied to this numerical data to improve the quality of the image. Beginning with SAS 9.2, a unique procedure is available to incorporate the flexibility of C programs into SAS. In this paper, we discuss the procedure for customizing C functions and integrating them into SAS. This feature is illustrated by addressing common issues in image analysis, such as image size standardization, magnification, and pixel transformation/contrasting. Statistical tools such as weighted regression (bi-linear interpolation), the Expectation Maximization (EM) algorithm in the Region of Interest (RoI), and Histogram Equalization techniques are detailed for further image enhancement.
AD-299 : Best Practices for ISS/ISE Dataset Development
Bharath Donthi, Statistics & Data Corporation
Lingjiao Qi, Statistics & Data Corporation
Tuesday, 1:30 PM - 1:50 PM, Location: Salon C
The integrated summary of safety (ISS) and integrated summary of efficacy (ISE) are vital components of a successful submission for regulatory approval in the pharmaceutical industry. ISS and ISE allow reviewers to easily compare individual outcomes, tracking subjects' results across the entire clinical development lifespan of the investigational product. Furthermore, ISS/ISE facilitate broad views of the investigational product's overall efficacy and safety profiles. However, building integrated datasets is a challenging task, as it requires the programmer to achieve consistent structures and formats while also ensuring that each dataset is CDISC-compliant. This paper provides best practices for ISS and ISE dataset development to guide integrated analysis dataset design and production in an efficient manner. First, we discuss best practices to ensure the consistency of integrated datasets by up-versioning all data with the same coding dictionaries (MedDRA, CTCAE, WHO, etc.) and by harmonizing all variable attributes (variable names, types, formats, labels, CODE and DECODE values for categorical and ordinal variables, ranges for continuous variables, etc.). Next, we discuss CDISC requirements regarding the mapping of SDTM and ADaM. Then, we talk about how to handle some complex cases in developing integrated datasets, such as when one subject participates in multiple clinical studies included in the ISS/ISE. Finally, we touch on key points of analysis involving consistent flag assignment across studies and proper application of integration methods for safety and efficacy analysis. This step-by-step guide enables the efficient and accurate creation of ISS and ISE datasets.
AD-316 : A Utility to Automate Reconciliation of Report Numbers and Titles
Valerie Williams, ICON Clinical Research
Ganesh Prasad, ICON Clinical Research
Tuesday, 5:00 PM - 5:20 PM, Location: Salon C
In clinical trials, analysis reports are generated in the form of tables, listings and figures (TLFs) for incorporation into Clinical Study Reports (CSRs) that are submitted to regulatory bodies for review and approval of a study drug or device. These TLFs are distinguished from one another by their unique report numbers and titles. A change in either of these two important pieces of metadata conveys a completely different meaning to the output, so it is very important that they are accurate. Lead programmers need to identify and fix cases of missing TLFs and/or incorrect TLF report numbers or titles prior to CSR submission. A utility program was developed to automate this process and reduce the time required to reconcile the report header information. This paper will describe the following main features of the report number and titles reconciliation utility: 1) Reading report numbers and titles from a .csv file of consolidated TLFs (created from bundled .pdf report files) or from an ISOTOP.xml file containing titles and footnotes; 2) Comparing them with the List of Tables (LoT) .xlsx file; 3) Identifying discrepancies between the two files; and 4) Generating an Excel file with a summary report of the highlighted differences. This utility program was developed on PC SAS and runs on Windows SAS 9.3/9.4 and Unix SAS 9.4 SAS GRID operating system environments.
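The comparison step described above (steps 2 and 3) can be sketched in a few lines. This is an illustrative Python sketch, not the authors' SAS utility: the function name `reconcile`, the dictionary inputs, and the finding labels are all hypothetical, and titles are assumed to match when they differ only in case or whitespace.

```python
def reconcile(tlf_titles, lot_titles):
    """Compare two {report_number: title} mappings and list discrepancies.

    tlf_titles: numbers/titles read from the produced outputs (hypothetical input).
    lot_titles: numbers/titles read from the List of Tables (hypothetical input).
    """
    def norm(title):
        # Assumption: case and repeated whitespace are not meaningful differences.
        return " ".join(title.split()).casefold()

    findings = []
    # Report numbers are compared as plain strings and sorted lexically.
    for number in sorted(set(tlf_titles) | set(lot_titles)):
        if number not in tlf_titles:
            findings.append((number, "missing from TLF output"))
        elif number not in lot_titles:
            findings.append((number, "not in List of Tables"))
        elif norm(tlf_titles[number]) != norm(lot_titles[number]):
            findings.append((number, "title mismatch"))
    return findings


produced = {
    "14.1.1": "Demographics",
    "14.2.1": "Adverse Events  by SOC",
    "14.3.1": "Lab Shift",
}
planned = {
    "14.1.1": "Demographics and Baseline",
    "14.2.1": "adverse events by soc",
    "14.4.1": "Vitals",
}
for number, issue in reconcile(produced, planned):
    print(number, issue)
```

A real utility would read the two mappings from the .csv/.xml and .xlsx sources and write the findings to a spreadsheet; those I/O steps are left out here.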
AD-326 : Interactive TLFs - A Smarter Way to Review your Statistical Outputs
Bhavin Busa, Vita Data Sciences
Tuesday, 4:00 PM - 4:20 PM, Location: Salon C
In the clinical industry, the data is represented and analyzed in the form of tables, listings and figures (TLFs) which are typically generated using SAS®. For the Biometrics teams, plotting data and presenting statistical information using SAS ODS and SAS graphics procedures is not new. However, the output that is generated using SAS ODS and SAS graphics procedures is for the most part static and only gives end users an ability to "look" at the output without giving them an ability to explore and drill down further. With the rapidly increasing availability of data visualization and analytic tools, the industry landscape is shifting. In addition, the end users now want to 'see' their data more interactively, identify trends, visualize the patient profiles and review results at a high level while still being able to drill down to get a complete picture. They also wish to have access to their ongoing study data on 'Day 1' and don't want to wait for the study CSR TLFs to be programmed before they can use the data for their review or monitoring needs. In this paper, we will talk about how we have used the power of SAS® and TIBCO Spotfire® to build "Interactive TLFs" using SDTM datasets to meet these demands. We will demonstrate through a case study how a clinical team can use this platform to review typical statistical outputs/TLFs (e.g. demographics, disposition, AEs, concomitant medications, laboratory and vital signs) more interactively and thereby avoid flipping through hundreds if not thousands of static pages.
AD-341 : Time Since Last Dose-Anatomy of a SQL Query
Derek Morgan, PAREXEL International
Monday, 3:00 PM - 3:20 PM, Location: Salon C
Even though much of what we need to accomplish as statistical programmers can be accomplished using the DATA step, SQL can provide an alternative to large amounts of data manipulation. One such case is determining time since last (or first) dose. This paper will walk you through a SQL solution in place of multiple SORTs, MERGEs, TRANSPOSEs, and use of the LAG function. Not only is the code economical, but so is its execution. This may also help to increase your understanding of the interaction between subqueries, WHERE statement vs. WHERE option, and the SQL HAVING and GROUP BY clauses.
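To make the idea concrete, here is a minimal sketch of a "time since last dose" query. It is not the query from the paper: it uses SQLite through Python's `sqlite3` module rather than SAS PROC SQL, and the table names (`ex`, `ae`), the columns, and the sample rows are assumptions chosen for illustration. A join restricted to doses at or before the event, combined with GROUP BY and MAX, stands in for the sort/merge/LAG approach.

```python
import sqlite3


def hours_since_last_dose(doses, events):
    """Return (usubjid, aeseq, hours since most recent prior dose) per event.

    doses:  iterable of (usubjid, exstdtc) with ISO 8601 date-times.
    events: iterable of (usubjid, aeseq, aestdtc) with ISO 8601 date-times.
    Events with no prior dose get None.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ex (usubjid TEXT, exstdtc TEXT)")
    con.execute("CREATE TABLE ae (usubjid TEXT, aeseq INTEGER, aestdtc TEXT)")
    con.executemany("INSERT INTO ex VALUES (?, ?)", doses)
    con.executemany("INSERT INTO ae VALUES (?, ?, ?)", events)
    rows = con.execute(
        """
        SELECT ae.usubjid,
               ae.aeseq,
               -- julianday() differences are in days; convert to hours
               ROUND((julianday(ae.aestdtc) - julianday(MAX(ex.exstdtc))) * 24, 2)
        FROM ae
        LEFT JOIN ex
               ON ex.usubjid = ae.usubjid
              AND ex.exstdtc <= ae.aestdtc   -- only doses at or before the event
        GROUP BY ae.usubjid, ae.aeseq
        ORDER BY ae.usubjid, ae.aeseq
        """
    ).fetchall()
    con.close()
    return rows


doses = [("001", "2019-01-01T08:00"), ("001", "2019-01-02T08:00")]
events = [("001", 1, "2019-01-01T07:00"), ("001", 2, "2019-01-02T14:00")]
print(hours_since_last_dose(doses, events))
```

The string comparison in the join condition relies on all date-times being complete and in the same ISO 8601 layout; partial dates would need to be handled before a query like this is run.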
Data Standards
DS-049 : Streamlining the Metadata Management Process Using SAS® Life Science Analytics Framework
Alex Ford, SAS
Monday, 9:00 AM - 9:20 AM, Location: Salon B
It was not long into my clinical programming career before I discovered that CDISC is truly an acronym for "Can Do It Somewhat Correctly". Each run of a validation report uncovered new warnings or errors followed by tracking down the source of those issues to log and report for a define.xml. The latest release of the SAS® Life Science Analytics Framework (LSAF) provides a centralized framework where standards can be imported and live alongside a study and its data, managed by a graphical user interface. By associating a data standard, controlled terminology, and dictionaries with a study, team leads have the data and information necessary to produce a define.xml at the click of a button. Join us as we explore the metadata management features available in LSAF 5.1 which enable programmers of all levels to manage data standards correctly the first time, saving studies both time and money.
DS-055 : Achieving Zen: A Journey to ADaM Compliance
Kjersten Offenbecker, Clinical Solutions Group (CSG)
Alice Ehmann, Clinical Solutions Group (CSG)
Kirsty Lauderdale, Clinical Solutions Group (CSG)
Tuesday, 3:30 PM - 3:50 PM, Location: Salon B
For many programmers and statisticians, creating compliant ADaM specifications, programs and datasets is confusing and a bit overwhelming. What should be included versus what to leave out? What level of traceability is needed? What information should be presented in the specs, and what efficiencies can be utilized in the code? Follow us as we work through some of the common pitfalls and map out a path which will help you navigate this winding road to come out with a more compliant product which is clearer for everyone to follow and understand. We will shine light on how to create compliant specifications which will lead you to compliant datasets that are everything the FDA is looking for.
DS-082 : Incremental Changes: ADaMIG v1.2 Update
Nancy Brucken, Syneos Health
Brian Harris, MedImmune
Terek Peterson, Covance
Alyssa Wittle, Covance
Deb Goodfellow, Covance
Monday, 10:00 AM - 10:20 AM, Location: Salon B
The ADaM Implementation Guide (ADaMIG) has now been available to industry since 2009, providing a standardized way to communicate and analyze study data. Improvements and clarifications were added in 2016 with the release of v1.1. Since that release, the ADaM team has been working on some items that were not yet ready for v1.1 but are now ready for the next release, v1.2. These items include important clarifications to existing text, standard nomenclature for stratification variables within ADSL, and a recommended approach for bi-directional toxicity grades. In addition, an update on the removal of the new suggested permissible variable within the Basic Data Structure (BDS) called PARQUAL will be discussed. The ADaMIG v1.2 will be discussed from both the perspective of changes from v1.1 as well as changes made since the public review of v1.2.
DS-087 : Strategy to Evaluate the Quality of Clinical Data from CROs
Charley Wu, Atara Biotherapeutics
Tuesday, 9:00 AM - 9:20 AM, Location: Salon B
More and more pharmaceutical and biotech companies (Sponsors) are using CROs for data management. High-quality data is key for statistical analysis, FDA data submission and drug approval. As most Clinical Data Management work is now done by CROs, evaluating the quality of clinical data produced by CROs is a big challenge for all sponsors. Many sponsors just review the data manually or do only sporadic checking due to limited in-house resources. As a result, many data issues go unidentified before database lock. This paper introduces a comprehensive approach that includes both automatic and manual data review. Auto review consists of 1) data structure check, 2) new data check, 3) edit check, 4) SAE reconciliation, 5) PK reconciliation, 6) lab data normalization and reports, 7) critical variable check, 8) statistical check, and 9) ad-hoc reports. Auto review is achieved by SAS programming. Manual review is done by the medical monitor, pharmacovigilance, clinical operations and data management. Manual review normally can identify 5-10% of data issues while auto review can identify over 90% of issues. Many manual review findings can lead to more SAS programming review and thus reduce the burden of manual review. Manual review usually takes about 1-2 weeks while auto review takes about 1-2 hours for each data transfer. This comprehensive approach greatly improves data quality and enables sponsors to lock the database with confidence.
DS-088 : Pacemaker Guy: De-Mystifying a Business Use Case for SDTM Standard and Medical Device Domains
Carey Smoak, S-Cubed
Donna Sattler, Bristol Myers Squibb
Fred Wood, Data Standards Consulting Group
Monday, 11:00 AM - 11:20 AM, Location: Salon B
Medical Device Standards can be applied to even the most complicated Medical Device clinical research studies. There are many papers written on how to map certain kinds of data like Exposure Data or Lab Data, but not too many on how to incorporate Medical Device data with these SDTM Core Standards. Considerations were made for simple and complex data points when mapping to the SDTM standards. We learned that, just like in biologics, you need to plan for the unexpected event with Medical Device studies. We take you through a subject experience by showing the mappings of the data but also illustrate the procedure(s) and how to visually map the data. The goal is to leave the participant/reader with a curiosity to want to map their own Medical Device data to the SDTM standards sooner than what the current expectation is. The more the Medical Device Industry uses the SDTM MD Standards, the more we can influence the regulatory agencies' expectations and their tools for Medical Device domains.
DS-119 : Common Pinnacle 21 Report Issues: Shall we Document or Fix?
Ajay Gupta, PPD
Tuesday, 4:00 PM - 4:20 PM, Location: Salon B
Pinnacle 21, also known as OpenCDISC Validator, provides great compliance checks against CDISC outputs like SDTM, ADaM, SEND and Define.xml. This validation tool provides a report in Excel or CSV format which contains information categorized as errors, warnings, and notices. At the initial stage of clinical programming, when the data is not very clean, this report can sometimes be very large and tedious to review. A programmer who is fairly new to this report might not be aware of some common issues and will have to depend fully on an experienced programmer to pave the road for them. Indirectly, this will add more review time to the budget and might distract the programmer from real issues which affect the data quality. In this paper, I will discuss some common issues with the Pinnacle 21 report messages created from running against SDTM datasets and propose some solutions based on my experience. Also, I will discuss some scenarios when it is better to document the issue in the reviewer's guide than to do workaround programming. While I fully agree that there is no one-size-fits-all solution, my intention is to give programmers a direction which might help them find the right solutions for their situation.
DS-146 : Considerations when Representing Multiple Subject Enrollments in SDTM
Kristin Kelly, Pinnacle 21
Mike Hamidi, CDISC
Tuesday, 10:30 AM - 10:50 AM, Location: Salon B
In clinical trials, it has become more common for a study design to allow subjects to re-enroll in the same study or subsequent studies within a submission. For studies that allow subjects to re-screen for the same study, it may be difficult to determine how to represent the data for multiple enrollments in SDTM. There are a number of approaches seen in industry, but many pose issues. An example of this is creating multiple records in DM with the same USUBJID to represent each enrollment. Though this may seem the most straightforward approach, many tools used at FDA are configured to expect one record per subject and thus, the data may not readily load into their tools. Another approach is to assign different USUBJID values for the same subject within a study and across studies. This also creates issues for review because it is difficult to track the same subject across studies. This paper will focus on examples from industry as well as proposed solutions for representing this data in SDTM.
DS-148 : 7 Habits of Highly Effective (Validation Issue) Managers
Amy Garrett, Pinnacle 21
Monday, 3:30 PM - 3:50 PM, Location: Salon B
Pinnacle 21 Validator identifies problems in data; however, diagnostics, assessment and resolution of reported validation issues may feel like a complicated, never-ending process. In this presentation, we will discuss common challenges in managing data validation issues and how to handle them effectively. We will show you how to identify the source of validation issues, and how to classify them to understand when to fix or when to explain. We will also discuss cross-team collaboration, ways to improve your process, and habits that lead to faster issue resolution.
DS-151 : Bidirectionality to LOINC: Handling the Nitty Gritty of Lab Data
Bhargav Koduru, Seattle Genetics, Inc.
Tuesday, 2:30 PM - 2:50 PM, Location: Salon B
Laboratory data is often challenging to work with during analysis data set creation. This paper will include solutions to some of the complexities encountered when structuring the LB (SDTM) and ADLB (ADaM) data sets, including:
- How to use the Model Permissible variables of the Findings class in the LB SDTM data set (Ex: __SPCCND, __SPCUFL)
- How to identify the tests that could be graded by CTCAE 4.0, and determine the directionality
- How to plan and execute the ADaM dataset to support the shift tables associated with bidirectional tests (Ex: Glucose-High and Low)
- How to derive "Treatment Emergent" records in the ADLB ADaM datasets, bidirectional tests in particular
- How to associate the preferred terms between CTCAE and MedDRA to establish the clinical connection between the adverse event and the lab result. (Ex: Neutropenia (MedDRA) and Neutrophil Count Decreased (CTCAE) can both be linked to the lab test "Neutrophils")
With the requirement for LOINC (Logical Observation Identifiers Names and Codes) beginning for studies that start after March 15, 2020 for NDAs, ANDAs and certain BLAs, and on March 15, 2021 for certain INDs, I would like to share some thoughts on the LOINC implementation in the LB data. For example, glucose identified in serum/plasma or urine would both have the same TESTCD "GLUC" and often the units are also the same (mg/dL); only the variable LBSPEC would differ between them. LOINC is proposed to address potential confusion by having a unique 6-part name.
DS-173 : Forewarned is forearmed or how to deal with ADSL issues
Anastasiia Oparii, Experis / Intego Group
Monday, 2:30 PM - 2:50 PM, Location: Salon B
The Subject-Level Analysis Dataset (ADSL) is an important and essential part of each study which helps to review information about the patient across a clinical trial. Moreover, it provides traceability between all the analysis datasets and source data. Therefore, ADSL should be derived with special care. If you are really lucky, you may not even realize how many tricky questions it can raise when raw data is not clear enough or something just goes wrong. This paper summarizes some common issues related to ADSL programming and suggests potential solutions to avoid problems in advance. In particular, it focuses on dealing with partial or completely missing dates and their connections with other derived variables along with additional validation for variables which are selected from SDTM domains. Furthermore, it walks through some useful examples and provides SAS macros to identify issues.
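As an illustration of the partial-date problem mentioned above, the following Python sketch classifies an ISO 8601 date string and imputes missing parts. It is not taken from the paper, which provides SAS macros. The function name `impute_start_date` is hypothetical, and the imputation rule (first day of the month, or 1 January when the month is also missing) is an assumption; a real study would follow the rule in its analysis plan. The returned flag follows the ADaM date imputation flag convention, where "D" means the day was imputed and "M" means the month and day were imputed.

```python
import re
from datetime import date

# Year, then optional month, then optional day; any time part is ignored.
_DTC = re.compile(r"^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?")


def impute_start_date(dtc):
    """Return (imputed date or None, imputation flag) for an ISO 8601 string.

    Assumed rule: missing day -> 1st of month (flag "D");
    missing month -> 1 January (flag "M"); complete date -> flag "".
    Completely missing or unparseable values return (None, "").
    """
    if not dtc:
        return None, ""
    match = _DTC.match(dtc)
    if match is None:
        return None, ""
    year, month, day = match.groups()
    if month is None:
        return date(int(year), 1, 1), "M"
    if day is None:
        return date(int(year), int(month), 1), "D"
    return date(int(year), int(month), int(day)), ""


for value in ["2019-03-15", "2019-03", "2019", ""]:
    print(repr(value), impute_start_date(value))
```

Values that are well formed but not valid calendar dates (for example a 30th of February) raise `ValueError` from `date`, which is a convenient way to surface such records for review.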
DS-185 : Leveraging Intermediate Data Sets to Achieve ADaM Traceability
Yun (Julie) Zhuo, PRA Health Sciences
Monday, 3:00 PM - 3:20 PM, Location: Salon B
Traceability, a fundamental principle of ADaM, provides transparency and increases confidence for the FDA reviewers. Building traceability could be a daunting task for a complex ADaM data set especially when it involves multiple data sources and multi-step derivations. In this paper, we illustrate the benefits of using an intermediate data set to achieve traceability using example SDTM and ADaM data sets from a Phase III oncology study. Through the intermediate data set, along with its metadata, it is possible to trace the final analysis value to a record in the intermediate data set, and then from there to the source domains.
DS-196 : Practical Guide for Creating ADaM Datasets in Cross-over Studies
Neha Sakhawalkar, Rang Technologies
Kamlesh Patel, Rang Technologies
Tuesday, 11:30 AM - 11:40 AM, Location: Salon A
Analysis datasets (ADaM) are categorized into Subject Level Analysis Data (ADSL), Basic Data Structure (BDS), Occurrence Data Structure (OCCDS) and Other ADaM data structures. The first three are most common and are used to analyze data in most parallel studies. Implementation of the ADaM standard in such studies is relatively straightforward for experienced programmers and biostatisticians. However, for many reasons cross-over designs are used in fewer clinical trials than parallel designs; hence, many programmers are not familiar with implementing ADaM datasets for a cross-over study design. This paper focuses on ADaM datasets for cross-over studies, with details regarding variables, differences between derivations of these variables across the various data structures (ADSL, BDS and OCCDS), and an example of each data structure.
DS-202 : This Paper focuses on CDISC Questionnaires, Ratings and Scales (QRS) supplements and types of FDA Clinical Outcome Assessments
Shrishaila Patil, Quanticate
Tuesday, 3:00 PM - 3:20 PM, Location: Salon B
CDISC develops SDTM (tabulation) and ADaM (analysis) QRS supplements that provide information on how to structure the data in a standard format for public domain and copyright-approved instruments. An instrument is a series of questions, tasks or assessments used in clinical research to provide a qualitative or quantitative assessment of a clinical concept or task-based observation. Controlled Terminology is also developed to be used with the supplements. CDISC creates supplements for three types of instruments:
- Questionnaires
- Functional Tests
- Clinical Classifications
This paper is an effort to:
- Understand QRS supplements and how they are developed
- Understand how to model ratings and scales other than questionnaires
- Understand the ADaM structure to be used for QRS supplements
- Understand the different types of FDA Clinical Outcome Assessments (COAs):
  - Clinician-reported outcome (ClinRO)
  - Observer-reported outcome (ObsRO)
  - Patient-reported outcome (PRO)
  - Performance outcome (PerfO)
- Understand how CDISC QRS Supplements assist in structuring Clinical Outcome Assessment (COA) data so that it is collected and reported in a standardized format.
DS-223 : Homogenizing Unique and Complex data into Standard SDTM Domains with TAUGs
Sowmya Srinivasa Mukundan, Ephicacy Lifesciences Analytics
Charumathy Sreeraman, Ephicacy Lifesciences Analytics
Monday, 11:30 AM - 11:50 AM, Location: Salon B
Clinical research supports discovery of new and better ways to detect, diagnose, treat, and prevent disease. Furthermore, the core focus of each therapeutic area (TA) is on research and development of treatments, together with prevention of specific diseases. Each TA demands different ways of collecting, measuring and analysing data based on the focus of the research. SDTM is one of the pioneer CDISC foundational standards. It defines and underpins the strategy for submitting data tabulations to regulatory authorities. The SDTMIG organizes and formats data to support streamlined data collection and analysis across different TAs. TAUGs extend the Foundational Standards to represent data that pertains to specific disease areas. They support pharmaceutical / biotech companies with implementation of these CDISC standards for a specific disease and facilitate resolutions for mapping additional or unique data points needed to support any given TA for their analysis. In this paper we will explore the TAUGs (focusing on two different Therapeutic Areas) and discuss in detail how to map unique and custom data to the standard SDTM domains with the help of a TAUG.
DS-239 : More Traceability: Clarity in ADaM Metadata and Beyond
Wayne Zhong, Accretion Softworks
Richann Watson, DataRich Consulting
Daphne Ewing, CSL Behring
Jasmine Zhang, Boehringer Ingelheim
Monday, 4:30 PM - 5:20 PM, Location: Salon B
One of the fundamental principles of ADaM is that datasets and associated metadata must include traceability to facilitate the understanding of the relationships between analysis results, ADaM datasets, and SDTM datasets. The existing ADaM documents contain isolated elements of traceability, such as including SDTM sequence numbers, creating new records to capture derived analysis values, and providing excerpts of define.xml documentation. An ADaM sub-team is currently developing a Traceability Examples Document with the goal of bringing these separate elements of traceability together and demonstrating how they function in detailed and complete examples. The examples cover a wide variety of practical scenarios; some expand on content from other CDISC documents, while others are developed specifically for the Traceability Examples Document. As members of the Traceability Examples ADaM sub-team, we are including in this PharmaSUG paper a selection of examples to show how traceability can bring transparency and clarity to your analyses.
DS-250 : Timing is Everything: Defining ADaM Period, Subperiod and Phase
Nancy Brucken, Syneos Health
Tuesday, 8:00 AM - 8:20 AM, Location: Salon B
The CDISC Analysis Data Model Implementation Guide (ADaMIG) provides several timing variables for modeling clinical trial designs in analysis datasets. APHASE, APERIOD and ASPER can be used in conjunction with related treatment variables to meet a variety of analysis requirements, from single-period parallel studies to much more complicated situations involving multiple treatment periods and even different studies. The goal of this paper is to illustrate how some of these study designs may be handled in ADaM, and provide guidelines for selecting when to use the different timing variables that are available.
DS-254 : Findings About: De-mystifying the When and How
Soumya Rajesh, Syneos Health
Michael Wise, Syneos Health
Tuesday, 10:00 AM - 10:20 AM, Location: Salon B
CDISC offers Findings About (FA) and Supplemental Qualifiers (SUPPQUAL) to handle information that doesn't fit into standard domains - or 'Non-standard variables'. They are, however, quite distinct from each other, and the appropriate use for each may still lead to confusion. "When should FA be created?" or "When is it best to use SUPPQUAL?" These are important questions that can only be answered by asking additional data questions. When the data does not fit into the parent domain, it may only be mapped to SUPPQUAL if it relates to one parent record. However, almost all other situations are covered by FA - wherein data relates to multiple records, or when a two-way relationship is needed, etc. FA would be the right approach then, because it has versatility beyond what's offered by SUPPQUAL. For example, FA would provide a way of storing symptoms along with the time that they began and relating each back to the AETERM in the AE dataset. In addition, FA as a stand-alone domain is also the only place to store information surrounding an event or intervention that has not been captured within any specific domain. This paper will present examples from a few different therapeutic areas or domain relationships to highlight the proper use of FA. Another scenario will look into how FA accommodates a many-to-many relationship. Such examples should clarify mysteries surrounding when and how to best use or create FA.
DS-261 : Raising the Bar: CDASH implementation in a biometrics CRO
Julie Barenholtz, Cytel
Tuesday, 8:30 AM - 8:50 AM, Location: Salon B
Since its first publication in 2011, CDASH has improved reliability in data collected in clinical trials. 67% of current CDASH maps directly to SDTM, providing increased traceability between data collection and data analysis. An advantage to having data management and statistical programming under the same roof is the ability to create efficiencies in processes that lead to more timely submissions and ultimately get novel treatments to patients more quickly. Having a standard set of CDASH-compliant CRFs can greatly reduce the time for CRF design and the time to go-live. The CDASH CRF questions and their intentions are comprehensive and widely understood across the industry. The variable naming is designed in a way that is understood by an SDTM statistical programmer. This allows for a more efficient annotation of the BlankCRF, which helps to place data more quickly and accurately into the right domain for SDTM. The goal of this paper is to share our efficiencies and methods for creating not only a CDASH library but also the partnership with statistical programming to create standard TLF shells and programs.
DS-268 : Best Practices in Data Standards Governance
Melissa Martinez, SAS
Monday, 4:00 PM - 4:20 PM, Location: Salon B
Most organizations working in the pharmaceutical and biotechnology industries have adopted CDISC submission data standards by now, but the challenge of effective governance and compliance within an organization remains high. CDISC data standards are notoriously open for interpretation and the assumptions and understanding of the data standards can vary widely among users. Adopting CDISC standards is more than just making CDISC-style submission data sets and using tools to help with the programming and compliance checks. Each organization needs to define its own interpretation of CDISC standards to ensure consistency among its studies, put in place workflows and processes to facilitate the governance process, determine what it means to be compliant with the CDISC data standards, and find tools to help with the governance process and compliance determination.
DS-304 : Considerations and Updates in the Use of Timing Variables in Submitting SDTM-Compliant Datasets
Jerry Salyers, TalentMine
Tuesday, 1:30 PM - 2:20 PM, Location: Salon B
Often, the appropriate use of Timing variables can present many challenges for sponsors when converting their operational database or legacy data to an SDTM-compliant format. This paper will discuss common scenarios encountered when converting operational data to SDTM. One of the scenarios involves occasions where the CRF allows for checking an "ongoing" box in lieu of providing an end date. In such cases, the SDTM-based datasets require that these data points be represented by the correct use of the Relative-Timing variables (i.e., --STRF, --ENRF, --STTPT, --STRTPT, --ENTPT, and --ENRTPT). When doing so, sponsors must address questions such as: 1) ongoing as of what point in time and 2) is the comparison to the study reference period more appropriate or is there an alternative anchor or reference time point that would be better suited? From controlled terminology, what are the best choices for these different scenarios where an end date might be blank? We will also explore the appropriate use of other Timing variables such as when a sponsor may need to express an evaluation interval in plain text rather than in ISO 8601 format (--EVINTX). We will also introduce the new Timing variables that will support the new Trial Milestone and Subject Milestone domains. And finally, we will give an update on areas where we have seen continued issues with data that require variables to define sample-collection time points (i.e., --TPT, --TPTNUM, and --ELTM), along with the anchors that identify the "reference" or baseline for these collections (i.e., --TPTREF and --RFTDTC).
DS-308 : Using CDISC Standards with an MDR for EDC to Submission Traceability
Paul Slagle, Syneos Health
Eric Larson, Syneos Health
Monday, 8:00 AM - 8:50 AM, Location: Salon B
CDISC Standards have long promised a way to add clarity to a submission through the use of traceability. This includes the ability to trace the data collected in an EDC system through to the analysis provided to the regulatory authority. The challenge is that managing all of the traceability is a documentation headache. With the use of a metadata repository, you can develop screens that can be pushed into an ODM-compliant EDC system, such as Medidata Rave. From that push you can receive the raw data and transform it, using SAS, into a format compliant with SDTM standards while maintaining the connections to where the data came from in EDC. Adding Results Metadata, when entered into the metadata repository, then allows the creation of ADaM datasets which are built on SDTM data collected from the EDC system. This paper will demonstrate how an off-the-shelf MDR tool can be used with SAS to build these connections while maintaining compliance with CDISC standards.
DS-311 : What's New in the SDTMIG v3.3 and the SDTM v1.7
Fred Wood, Data Standards Consulting Group
Monday, 1:30 PM - 2:20 PM, Location: Salon B
The SDTMIG v3.3 and the SDTM v1.7 were released in November of 2018. These versions have been published as HTML documents, rather than the typical PDFs of the past. While the SDTMIG v3.2, which was published in 2013, contained 398 pages, the SDTMIG v3.3 would be more than 600 pages if formatted properly and printed to PDF. A number of new morphology/physiology domains have been added. Other new domains and new concepts have been added, and several domains that had undergone public review in 2014 were subsequently combined with existing domains. The Disease Milestones concept, introduced for the TAUG-Diabetes in 2014 is now included in this latest version of the SDTMIG. A new Section 9 includes Study References, with added models for Device Identifiers, Non-host Organism Identifiers, and Pharmacogenomic/Genetic Biomarker Identifiers. As with any new version of the SDTMIG, there is a corresponding version of the SDTM, which contains new variables in addition to the concept of domain-specific variables. This presentation will summarize the additions to the SDTM and SDTMIG described above, as well as other changes that might affect implementation.
DS-319 : Updates on validation of ADaM data
Sergiy Sirichenko, Pinnacle 21
Monday, 10:30 AM - 10:50 AM, Location: Salon B
Analysis data is critical for the regulatory review process. It helps reviewers understand the details of the performed analysis and reproduce results reported by sponsors. Clinical study analysis data is required to be submitted in CDISC ADaM format to both FDA and PMDA. Therefore, validation of analysis data for compliance with the CDISC ADaM standard and additional business rules from regulatory agencies is an important step in preparation of study data for regulatory submissions. In this presentation we will provide an overview of ADaM validation implemented by Pinnacle 21 and used by both FDA and PMDA. It will cover changes related to the new ADaM IG 1.1 standard and the updated version of validation rules from the CDISC team. The presentation will also detail additional regulatory business rules, including data and define.xml consistency, validation of ADaM OTHER datasets, SDTM/ADaM traceability, and integrated data.
DS-336 : Metadata Repository V1.0 - A Case Study in Standards Governance
Aparna Venkataraman, Celgene
Monday, 9:30 AM - 9:50 AM, Location: Salon B
Over the last couple of decades Metadata Repository (MDR) tools have been playing a growing role in the Pharmaceutical/Bio-Technology space. Regulatory agencies and sponsor companies have, for various reasons, long recognized the need for integrating data standards starting at Protocol Development and Data Collection all the way through to Submission. However, we still face tremendous challenges in maintaining and managing Clinical Data Standards that fall far beyond the capabilities of any single MDR tool that is currently available in the market. We will explore in this paper the tools that need to be in the Standards Manager's toolbelt to successfully dispense End-to-End Clinical Data Standards across the company. In the multi-dimensional world of Clinical Data Standards, a nimble Governance Model will play an undeniable role in getting that critical drug into the hands of the patient at the right time.
DS-344 : Panel Discussion: Medical Devices - Implementation Through Submission
Tuesday, 11:00 AM - 11:50 AM, Location: Salon B
Moderator(s): Karl Miller, Syneos Health and Mike Lozano, Eli Lilly and Company
- Carey Smoak, S-Cubed
- Donna Sattler, Bristol Myers Squibb
- Fred Wood, Data Standards Consulting Group
- Karin LaPann, Takeda
DS-346 : Implementing CDISC Standards for Device-Drug Studies
Karin LaPann, Takeda
Wenying Tian, Takeda
Tuesday, 9:30 AM - 9:50 AM, Location: Salon B
Devices can be part of a drug or biologic submission and not just stand-alone studies for device development. In the case of the device being developed as a method to deliver drug, it can be included as an ancillary part of a drug submission. The regulatory agencies request that the device be studied separately and as a drug-device combination product. The study teams are then able to apply these standards to create standardized collection forms and datasets beginning-to-end, from CDASH to ADaM. In this case study of an implementation we will describe how we created ADaM device dataset standards using ADaM principles. The data standards at our company were developed using a "beginning to end" approach. That means we started by designing appropriate data collection forms using the CDISC CDASH principles. The device data collection forms were developed in 2012, for our CDASH forms library, and are still in use today. Then we created standards and guidance for mapping the seven SDTM device domains. These followed the guidance of SDTMIG-Medical Devices v1.0 of 2012-12-04. Once the SDTM domains were available, the programming teams were able to develop custom specifications for the device analyses. These were not yet standardized for ADaM in an internal guidance. Using the lessons learnt from these first device-drug studies, we were able to standardize our metadata templates and added examples to our internal ADaM standards Toolkit and Implementation Guide.
FDA Wednesday
FDA-G1 : Study Data Topics at FDA/CDER
Sara Jimenez, Mathematical Statistician, FDA
Wednesday, 9:00 AM - 9:20 AM, Location: Salon A - C
Abstract not available
FDA-G2 : CBER Data Standards Update
Elaine Thompson, Senior Staff Fellow, FDA/CBER
Wednesday, 9:30 AM - 9:50 AM, Location: Salon A - C
Abstract not available
FDA-G3 : An Overview of How FDA Business Rules, FDA Validator Rules, and Others Fit Together
Helena Sviglin, Regulatory Information Specialist, FDA
Wednesday, 10:00 AM - 10:50 AM, Location: Salon A - C
Abstract not available
FDA-G4 : FDA Panel Discussion: Evidence, Data and Review: Continuing the Discussion with FDA
Wednesday, 11:00 AM - 11:50 AM, Location: Salon A - C
Moderator(s): Steve Wilson, Senior Staff Fellow, CDER, FDA
- Benjamin Vali, Regulatory Affairs Project Manager, FDA
- Elaine Thompson, CBER, FDA
- Helena Sviglin, Regulatory Information Specialist, FDA
- Sara Jimenez, Mathematical Statistician, FDA
FDA-K : Keynote Address: Regulatory Submissions in the PDUFA Era
Benjamin Vali, Regulatory Affairs Project Manager, FDA
Wednesday, 8:00 AM - 8:50 AM, Location: Salon A - C
Abstract not available
Future Forum
FUF-01 : Panel Discussion: Future Forum
Monday, 4:30 PM - 5:20 PM, Location: Franklin 4
Moderator(s): Paul Slagle, Syneos Health
- Bill Donovan, Wright Avenue
- Matt Becker, SAS
- Patrick Nadolny, SCDM
- Yuri Pinzon, Teradata
Hands-on Training
HT-063 : Developing Custom SAS Studio Tasks for Clinical Trial Graphs
Olivia Wright, SAS
Sanjay Matange, SAS, LinkedIn
Monday, 8:00 AM - 9:50 AM, Location: Salon D
SAS Studio provides point-and-click tasks for basic statistics, biostatistics models, and statistical graphs. SAS Studio users can create their own Custom Tasks by modifying existing tasks or writing new ones using simple text commands from Apache's Velocity Template Language. Custom Tasks can simplify sharing of code by turning complex SAS graphics code into point-and-click tasks. The same functionality can be integrated into a more complete analytic modeling workflow by adding additional graphing options to an existing built-in task. In this workshop, we will work hands-on to create a graphical user interface for code-heavy SGPLOT clinical trial graphs.
HT-067 : Integrating SAS and Microsoft Excel: Exploring the Many Options Available to You
Vince DelGobbo, SAS
Monday, 3:30 PM - 5:20 PM, Location: Salon D
This presentation explains some techniques available to you when working with SAS and Microsoft Excel data. You learn how to import Excel data into SAS using the IMPORT procedure, the SAS DATA step, SAS Enterprise Guide, and other methods. Exporting data and analytical results from SAS to Excel is performed using the EXPORT procedure, the SAS DATA step, SAS Enterprise Guide, the SAS Output Delivery System (ODS), and other tools. The material is appropriate for all skill levels, and the techniques work with various versions of SAS software running on the Windows, UNIX (including Linux), and z/OS operating systems. Some techniques require only Base SAS and others require the SAS/ACCESS Interface to PC Files.
HT-089 : Build Popular Clinical Graphs using SAS
Sanjay Matange, SAS, LinkedIn
Tuesday, 1:30 PM - 3:20 PM, Location: Salon D
Survival Plots, Forest Plots, Waterfall Charts and Swimmer Plots are some of the popular, frequently requested graphs in clinical research. These graphs are easy to build with the SGPLOT procedure. Once you understand how SGPLOT works, you can develop a plan, prepare the data according to this plan, and then use the right plot statements to create almost any graph. This hands-on workshop will take you step by step through the process needed to create these graphs. You will learn how to analyze the graph and make a plan. Then, you will put together the data set with all the needed information. Finally, you will layer the right plot statements in the right order to build the graph. Once you master the process for these graphs, you can use the same process to build almost any other graph. Come and learn how to use the SGPLOT procedure like a pro.
HT-145 : Hands-on Training for Machine Learning Programming
Kevin Lee, Clindata Insight
Tuesday, 10:00 AM - 11:50 AM, Location: Salon D
The most popular buzzword nowadays in the technology world is "Machine Learning (ML)." Most economists and business experts foresee Machine Learning changing every aspect of our lives in the next 10 years through automating and optimizing processes such as: self-driving vehicles; online recommendations on Netflix and Amazon; fraud detection in banks; image and video recognition; natural language processing; question answering machines (e.g., IBM Watson); and many more. This is leading many organizations to seek experts who can implement Machine Learning in their businesses. Hands-on Training for Machine Learning Programming is intended for statistical programmers and biostatisticians who want to learn how to conduct simple Machine Learning projects. The training will go through the following simple steps: 1. Identify the problems to solve 2. Collect the data 3. Understand the data through data visualization and metadata analysis 4. Prepare data - training and test data 5. Feature engineering 6. Select algorithm 7. Train algorithm 8. Validate the trained model 9. Predict with the trained model. The training will use the most popular Machine Learning programming language, Python, and the most popular Machine Learning platform, Jupyter Notebook/Lab. During the hands-on training, programmers will use actual Python code in Jupyter notebooks to run simple Machine Learning projects. Programmers will also be introduced to popular Machine Learning modules: scikit-learn, TensorFlow and Keras.
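The nine steps listed in this abstract can be sketched in a few lines of Python. The example below is only an illustration and is not taken from the workshop material: it uses the standard library alone rather than scikit-learn, TensorFlow or Keras, the data are synthetic, and the nearest-centroid classifier and all function names are assumptions chosen to keep the sketch self-contained.

```python
import random
from statistics import mean

# Steps 1-2: a hypothetical problem (assign a record to group 0 or 1 from
# two numeric features) with synthetic data standing in for collected data.
def make_data(n_per_class, seed=0):
    rng = random.Random(seed)
    rows = []
    for label, center in ((0, (0.0, 0.0)), (1, (4.0, 4.0))):
        for _ in range(n_per_class):
            features = (rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0))
            rows.append((features, label))
    rng.shuffle(rows)
    return rows

# Step 4: hold out a fraction of the rows as test data.
def train_test_split(rows, test_fraction=0.25):
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

# Steps 6-7: the "algorithm" here is a nearest-centroid classifier;
# training just computes the mean feature vector of each class.
def train(rows):
    centroids = {}
    for label in {lab for _, lab in rows}:
        points = [x for x, lab in rows if lab == label]
        centroids[label] = tuple(mean(col) for col in zip(*points))
    return centroids

# Step 9: predict the class whose centroid is closest to the new point.
def predict(centroids, x):
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(centroids, key=lambda lab: squared_distance(centroids[lab]))

# Step 8: validate on the held-out rows.
def accuracy(centroids, rows):
    return mean(1.0 if predict(centroids, x) == lab else 0.0 for x, lab in rows)

data = make_data(100)
train_rows, test_rows = train_test_split(data)
model = train(train_rows)
print(f"test accuracy: {accuracy(model, test_rows):.2f}")
print("prediction for (3.5, 4.2):", predict(model, (3.5, 4.2)))
```

Steps 3 and 5 (visualization and feature engineering) are omitted because the synthetic features need no preparation; with real data they would sit between `make_data` and `train_test_split`.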
HT-171 : Value-Level Metadata Done Properly
Sandra Minjoe, PRA Health Sciences
Mario Widel, Independent
Tuesday, 3:30 PM - 5:20 PM, Location: Salon D
Value-level metadata is nothing mysterious. It is simply a way to describe how a variable is derived when that derivation differs based on some circumstances. When done properly, value-level metadata makes a define.xml more reviewer-friendly. A common use in ADaM for value-level metadata is when AVAL is derived based on PARAM, but this is not the only time to use value-level metadata. This hands-on training will include examples and exercises of value-level metadata in SDTM Findings, in ADaM BDS, and more. Additionally, it will provide guidance to help attendees decide when to use value-level metadata.
HT-177 : Sample Size Determination with SAS® Studio
Bill Coar, Axio Research
Wednesday, 9:00 AM - 10:50 AM, Location: Salon D
Experiments are often designed to answer specific questions. In order for the results to have a reasonable amount of certainty and statistical validity, a sufficient number of observations is required. To achieve this, the concepts of type 1 error, power, and sample size are introduced into the experimental design. Real-world constraints such as budget and feasibility are equally important. Thus, statisticians often re-evaluate sample size under varying sets of assumptions, sometimes on the fly. SAS Studio provides a variety of tasks associated with determining sample sizes using a point-and-click approach to enter assumptions, yet it also provides the ability to save the underlying SAS code so that it can be either refined or used at a later time. The purpose of this Hands-on Workshop is to introduce some of the features of SAS Studio for sample size determination. A number of examples will be introduced, including tests associated with proportions, means, and survival analysis. Each exercise will start with a research question, proposed methodology, and a list of requirements needed for estimating the sample size. The attendees will then have the opportunity to work through the exercise using SAS Studio, with discussion of adaptations that may be necessary for its use in their everyday programming environment.
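To make the relationship between type 1 error, power, and sample size concrete, here is a small sketch of the textbook normal-approximation formula for comparing two means with equal group sizes, n per group = 2 * ((z_(1-alpha/2) + z_power) * sigma / delta)^2. It is written in Python rather than SAS and is not the workshop's SAS Studio task; the function name and the example inputs are assumptions for illustration. An exact t-based calculation typically returns a slightly larger n.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample
    comparison of means (normal approximation, equal allocation).

    delta: smallest difference in means worth detecting
    sigma: assumed common standard deviation
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided type 1 error
    z_power = standard_normal.inv_cdf(power)          # 1 - type 2 error
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)

# Re-evaluating the sample size under varying assumptions:
for assumed_power in (0.80, 0.90):
    n = n_per_group(delta=5, sigma=10, power=assumed_power)
    print(f"power={assumed_power:.2f}: n per group = {n}")
```

Wrapping the formula in a function makes it easy to re-run under different assumptions, which is the kind of on-the-fly re-evaluation the abstract describes.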
HT-188 : Creating & Sharing Shiny Apps & Gadgets
Phil Bowsher, RStudio Inc.
Kelly O'Briant, RStudio
Tuesday, 8:00 AM - 9:50 AM, Location: Salon D
HT-329 : Interactive Graphs
Kriss Harris, SAS Specialists Ltd.
Richann Watson, DataRich Consulting
Monday, 1:30 PM - 3:20 PM, Location: Salon D
This paper demonstrates how you can use interactive graphics in SAS® 9.4 to assess and report your safety data. The interactive visualizations that you will be shown include the adverse event and laboratory results. In addition, you will be shown how to display "details-on-demand" when you hover over a point. Adding interactivity to your graphs will bring your data to life and help improve lives!
HT-347 : The Shape of SAS® Code
Charu Shankar, SAS
Monday, 10:00 AM - 11:50 AM, Location: Salon D
There are many languages that co-exist in the ecosystem of your SAS® toolbox. This Hands-On Workshop teaches you how to use four SAS languages - Base SAS, PROC SQL, Perl language elements, and the SAS® Macro Language - to help you manipulate and investigate your data. Learn to leverage these powerful languages to check your data with simple, yet elegant techniques such as Boolean logic in PROC SQL, operators such as the SOUNDS-LIKE operator in the DATA step and PROC step, functions such as the SCAN function in the DATA step, efficient checking of your data with Perl regular expressions, and last but not least, the amazing marriage between PROC SQL and the SAS Macro Language to hold data you just found in a variable that you can use over and over again. This workshop focuses on coding techniques for data investigation and manipulation using Base SAS.
Leadership and Career Development
LD-023 : Attracting and Retaining the Best!
Kelly Spak, Covance
Tuesday, 8:30 AM - 8:50 AM, Location: Franklin 4
A great manager recognizes the skill needed to attract and retain the best employees. This presentation will identify ways that you can work with your recruiting team that will help set you above the competition by attracting and retaining the best employees. Topics will include recruiting, candidate experience, onboarding, training and continued career development.
LD-106 : Considering Job Changes in an Ever-Changing Environment
Kathy Bradrick, Triangle Biostatistics, LLC
Ed Slezinger, Omeros Corporation
Tuesday, 1:30 PM - 1:50 PM, Location: Franklin 4
What is changing in our industry? The simple answer is... everything. Technology is quickly changing, employment laws are changing, sponsor companies are seeing more merger and acquisition activity than ever, and so are CROs and service providers. So how do you as the SAS programmer navigate all of these changes as you contemplate career and job changes? See perspectives from executives in both pharma and service providers as they explore changes in our industry that are affecting hiring decisions and company strategies.
LD-150 : The human side of programming: Empathetic leaders build better teams.
Bhargav Koduru, Seattle Genetics, Inc.
Balavenkata Pitchuka, Seattle Genetics, Inc.
Tuesday, 3:30 PM - 3:50 PM, Location: Franklin 4
People management is an important aspect of project management. Motivation and passion of team members play a key role in the success of a project. An empathetic leader can understand the emotional needs of the team better, to keep members motivated and productive. Employees who are emotionally satisfied will work harder and are more likely to stay put. In turn, companies will benefit from the members with higher productivity and lower turnover. Besides productivity, compassion also drives innovation. If employees are concerned about the potential consequence of their mistakes, they may feel hesitant to take risks and to derive new solutions. An open environment encourages innovation and experimentation, without fear of failure. Like any other skills, empathy can be developed by constant practice. In this paper, we would like to share ways in which one can identify areas for improvement for an employee while creating a safe environment where he/she can open up and feel free to share, and also to provide recommendation on how to practice empathy to help the member to improve further. Practicing empathy will help to bring the team together and also to create an engaging and active environment that can churn out high-quality deliverables, driving the company's mission and future.
LD-155 : Working from a Home Office Versus Working On-Site
Timothy Harrington, SAS Programmer
Tuesday, 11:00 AM - 11:20 AM, Location: Franklin 4
This paper is a discussion of the benefits and challenges of working remotely as opposed to working on site for an employer or a client. The primary issues addressed include communication, productivity, knowledge and skill sharing, health and well-being, and social and economic factors. The impacts of telecommuting are considered from the standpoint of each of the remote worker, their employer, and the environment and society as a whole.
LD-174 : It's a wonderful day in this neighbourhood - Managing a large virtual programming team
Victoria Holloway, Covance
Tuesday, 8:00 AM - 8:20 AM, Location: Franklin 4
Programming can be lonely, isolated work within a large virtual team. Whether in the office or at home, everyone needs to be part of something bigger. Creating a virtual neighborhood-style community center enables the programmers to interact with each other as neighbors. Community is the first step for building trust and respect through experience and successes. Important public works projects including infrastructure, such as electricity, water, phone, and internet are essential to the community and can be likened to training, code sharing, work breaks and process improvements. Nothing happens unless there are funds to support these endeavors, so linking the team to the financial aspects is also key to running successful projects. Counting the beans includes how to track the work and project management tools. This presentation will share my experience as a new manager where neighborhood concepts helped ensure that the work was done on time, within budget and by a competent team happy and able to work together. Mr. Rogers had it right: creating "a wonderful day in the neighborhood" allows for successful projects even under difficult circumstances.
LD-200 : Find Your Story
Adam Sales, PRA Health Sciences
Tuesday, 4:00 PM - 4:20 PM, Location: Franklin 4
Healthcare and medical treatment is something we all engage in. As such, we who work in the industry have an extraordinary opportunity to connect to our work and its impact. Ironically, many leaders do not prioritize this bond, failing to help team members develop a meaningful relationship with their work. Instead we often hear about simple motivational encouragement. Leaders promote recognition, instant awards, advancement, and personalized approaches based on employee interests, even though they have minimal control over many of these incentives and all of them are barter systems. In losing sight of the impact of our work, any job can become rote. In contrast, by keeping the significance and impact of our work front and center, the task itself can become motivational. When team members are intrinsically motivated, they understand and are inspired by the impact of their contribution, no longer investing in their career simply because they are committed employees or looking for external rewards. This presentation will focus on how to help team members find their story: the means by which they connect with their role in a deep and meaningful way. The impact of job fulfillment naturally cascades to engagement, motivation, and retention.
LD-207 : Improving the Relationship between Statisticians and Programmers in Clinical Trial Studies
Mai Ngo, Catalyst Clinical Research, LLC
Mary Grovesteen, Triangle Biostatistics, LLC
Vaughn Eason, Catalyst Clinical Research, LLC
Tuesday, 10:00 AM - 10:20 AM, Location: Franklin 4
Successful delivery of analysis outputs for a clinical trial study depends on a strong biostatistics team, which typically includes a study statistician and a programming team with several statistical programmers. As trials get more complex and biostatistics teams face increased pressure to produce outputs efficiently and on a timely basis, a strong working relationship between the study statistician and the programming team is vital to the success of the analysis project. Yet with the time pressure and the increased complexity of the analysis, as well as challenging data issues common in clinical studies, the communication between statistician and programmers tends to break down when it is needed the most. This results in frustration on both sides, inefficiencies that could have been avoided, and stressful last-minute work and rework. Having worked both as statistical programmers and statisticians, we have been fortunate to gain valuable hands-on perspectives from both sides. Based on personal reflections as well as conversations with our colleagues, we will present some of the key areas of frustration in the working relationship between a study statistician and the programming team, touch on perspectives from both the programmer and statistician, and offer suggestions for alleviating these issues.
LD-224 : Pains and Gains in software development - from PoC to Market
Reshma Rajput, Ephicacy Lifesciences Analytics
Charan Kumar Kuyyamudira Janardhana, Ephicacy Lifesciences Analytics
Tuesday, 10:30 AM - 10:50 AM, Location: Franklin 4
Software development in the clinical domain has its challenges in developing products and deploying new technologies. The challenges can be internal, such as hiring, training, and managing expectations within the development, testing, and related teams. They can also be external, such as meeting customer expectations. During the Proof of Concept (PoC) phase of a project, extensive demonstrations of the product and the technologies used are presented to prospective clients. A major issue here is aligning product knowledge with prospective clients who have very little knowledge of the product's use cases and, in a few cases, of the domain as well; business considerations then take priority, weakening the case for technological value addition in the long run after implementation. Another significant challenge is the lack of adequate documentation during the different phases of the project, from PoC to deployment. This can occur at the Client end, the Supplier end, or both, and it impacts the current and future releases of the project. Hiring and training are crucial factors in resolving workforce-related challenges within an organisation. All internal functions need to comprehend the relevance of having adequate documentation for User Requirements, User Stories, Trace Matrix, and various plans such as the Project Plan, Implementation Plan, Risk & Mitigation Plan, Migration Plan, etc. The Client also needs to emphasise the gravity of documentation. This lays the building blocks for a successful project and a delighted customer. This paper will share insights on addressing these challenges in an amicable manner that meets the needs of both internal employees and customers.
LD-252 : Statistical Programming Roles - Time to Reevaluate Job Profiles & Career Ladders
Vijay Moolaveesala, PPD
Ajay Gupta, PPD
Tuesday, 9:00 AM - 9:20 AM, Location: Franklin 4
Whether at a CRO or in the pharma industry, traditional statistical programming/analyst roles have always been centered on a programmer supporting everything from data programming to TLF (Tables, Listings and Figures) programming. Job profiles, job postings, screenings and hiring have always focused on evaluating individuals on their well-rounded experience in all tasks expected of the programming department. Job profiles have always been created to hire a generalist who has experience in all aspects of programming, to support the diverse programming needs of the department. Hiring specialists into these generic roles has been perceived as a bottleneck for the department. On the contrary, drastic changes in computing environments, industry-wide data standard implementations, complex study designs, and regulatory process requirements can make generic programmers and programming skills more of a bottleneck in the future. To understand these challenges and approaches to supporting the career growth of specialists, the authors take the case of Ajay Gupta, Technical Programming Manager at PPD, as an example to illustrate the career growth of specialist programmers. He is one of the programming specialists we have at PPD. Ajay comes with experience and a background in Information Technology, Non-clinical, Phase I, Phase II-IV, and CDISC. This paper will shed light on some common challenges faced by specialists and provide a roadmap to support their hiring and career growth. These approaches will help to develop a pool of resources within the department to handle specialized tasks and in turn cultivate a sense of well-being in the employee's work environment.
LD-288 : Project Management Fundamentals for Programmers and Statisticians
Jennifer Sniadecki, Covance
Tuesday, 2:00 PM - 2:20 PM, Location: Franklin 4
In the clinical trial environment, nearly everyone, from individual contributors to team leaders and managers, needs to use project management skills. Because most programmers and statisticians lack formal training in the project management field, it can be challenging to apply and effectively integrate project management principles as part of their day-to-day job duties. This paper will extract the fundamental concepts from A Guide to the Project Management Body of Knowledge (PMBOK® Guide) that are most relevant to programmers and statisticians and provide real-world examples where applicable.
LD-295 : Advance Your Career with PROC TM!
John LaBore, Consultant
Josh Horstman, Nested Loop Consulting
Tuesday, 3:00 PM - 3:20 PM, Location: Franklin 4
Most of us would like to advance our careers in one way or another. You may wish to become a respected technical expert in your field, move up into a leadership role, or even go independent as a consultant or small business owner. Regardless of your career aspirations, leadership and communication skills are critical to your success. Toastmasters International provides an avenue to develop these skills in a constructive and supportive environment. This paper will provide a summary of the Toastmasters program based on the authors' combined 20 years of experience with the program. We'll discuss how you can execute "PROC TM" and the benefits it can bring to your career.
LD-296 : Time to COMPARE Programmer to Analyst: Examine the Differences and Decide Best Path Forward
Ginger Barlow, UBC
Carol Matthews, UBC
Tuesday, 9:30 AM - 9:50 AM, Location: Franklin 4
There are almost as many titles for SAS programmers in the pharmaceutical industry as there are programmers: Statistical Programmer, Clinical Programmer, Scientific Data Analyst, SDTM Implementer, Study Lead Programmer, and many more. Regardless of the label, one thing all of these roles have in common is that these people are writing code to turn data into knowledge. In contrast, the same title in different organizations can have very different expectations on how much any given programmer (or analyst) is expected to contribute to the overall scientific process. Programmers and analysts actually have very different, distinct roles. We will explore what makes each unique, how to know which one you are, how to know which one you may be interviewing, and what the future could hold for each. We will conclude with a discussion on strategies for developing programming teams to meet business quality and financial objectives.
LD-313 : Schoveing Series 4: Inspirational Leadership: Grow Yourself into a Class Act and an Unforgettable Leader!
Priscilla Gathoni, AstraZeneca Pharmaceuticals
Tuesday, 11:30 AM - 11:50 AM, Location: Franklin 4
Inspirational leadership is the human side of leadership with effects that draw people to appreciate passion, tenacity, and enthusiasm around them. Would you like to discover this proven approach to leadership characterized by traits such as business ethics, values-leadership, corporate social responsibility, and sustainability? What are the distinguishing qualities or characteristics that typically belong to inspirational leaders? What kind of leader is needed in geographically dispersed teams? Why are inspirational leaders more likely to be successful today in all industries? Is emotional intelligence a catalyst for inspirational leadership? What are the secrets to developing and applying unique ideas and new leadership methods to achieve higher performance and excellence within your company? As a leader, are you able to translate broad strategies into specific objectives and action plans? What are the keys to assuming leadership, arousing enthusiasm with people you work with, developing new techniques for managing change, coaching, influencing, and encouraging continuous improvement, innovation, and risk-taking in any organization? This paper will unlock your ability to answer these very important questions. The goal is to grow and become a better leader who is inspirational, well informed, organized, and applies the latest skills and knowledge, with the hope to make yourself and the world around you a better place. You should proudly say, "I am an inspirational leader, a class act and unforgettable person, in the past, present, and in the future!"
LD-335 : Something Old, Something New: A little programming management can go a long way
Janet Li, Pfizer
Tuesday, 2:30 PM - 2:50 PM, Location: Franklin 4
We propose a statistical programming project tracking tool that will help manage and track programming progress, deliverables, and timelines at the study level. The tool can be used across studies to prepare aggregate reports for upper management and can be updated to fit the needs of any statistical programming organization or team.
LD-342 : Panel Discussion: Speaking Data Science, Who is Ready to Listen?
Tuesday, 4:30 PM - 5:20 PM, Location: Franklin 4
Moderator(s): David D’Attilio, TalentMine and Priscilla Gathoni, AstraZeneca Pharmaceuticals
- Faisal Khan, AstraZeneca Pharmaceuticals
- Kevin Lee, Clindata Insight
- Phil Bowsher, RStudio Inc.
- Yuri Pinzon, Teradata
Posters
PO-020 : Tame your SHARE with a PYTHON and SAS
Michael Stackhouse, Covance
Terek Peterson, Covance
Tuesday, 3:30 PM - 3:50 PM, Location: e-Poster Station 2
Staying up to date with standards is of the utmost importance, and subtle changes can deviate your data from compliance. With a little bit of SAS, and a little Python, you can easily automate the extraction of CDISC standards metadata using the new SHARE API. These standards files can then be used in SDTM and ADaM development to ensure that you have access to the latest metadata available for specification writing, quality control, and custom conformance checks. Embedded CDISC metadata opens up doors of possibilities when available directly to your Programming team. This poster will explore how you can implement and automate this information to make sure your team never falls behind.
PO-022 : In the Style Of David Letterman's "Top Ten" Lists, Our "Top Ten" PROC SQL Statements To Use in Your SAS Program
Margie Merlino, Janssen Research and Development
Monday, 10:00 AM - 10:20 AM, Location: e-Poster Station 4
One of the challenges in using PROC SQL is the notion that programmers must first immerse themselves in learning Structured Query Language (SQL) before they can use PROC SQL. Certainly, there is syntax that a programmer must adhere to for successful execution of a PROC SQL statement, but we propose that there are many simple and some not-so-simple statements that can be used without a robust knowledge of all the intricacies of SQL. What follows is a "Top Ten List" à la David Letterman's popular feature. We list ten scenarios, each with a PROC SQL example that addresses the scenario and can be used to manipulate the SAS data and display a result or provide an output dataset for further manipulation. Our goal is to provide code examples that a programmer can easily copy/paste into their own SAS program.
PO-037 : Advanced Project Management beyond Microsoft Project, Using PROC CPM, PROC GANTT, and Advanced Graphics
Stephen Sloan, Accenture
Lindsey Puryear, SAS
Monday, 3:30 PM - 3:50 PM, Location: e-Poster Station 1
The Challenge: Instead of managing a single project, we had to craft a solution that would manage hundreds of higher- and lower-priority projects, taking place in different locations and different parts of a large organization, all competing for common pools of resources. Our Solution: Develop a Project Optimizer tool using the CPM procedure to schedule the projects, and using the GANTT procedure to display the resulting schedule. The Project Optimizer harnesses the power of the delay analysis feature of PROC CPM and its coordination with PROC GANTT to resolve resource conflicts, improve throughput, clearly illustrate results and improvements, and more efficiently take advantage of available people and equipment.
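The critical path method that gives PROC CPM its name can be illustrated with a short sketch. The Python example below is not the authors' Project Optimizer and does not reproduce PROC CPM's resource-constrained scheduling or delay analysis; it only shows the basic forward and backward passes on a hypothetical task list, with task names and durations invented for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def critical_path(durations, predecessors):
    """Return (project_length, critical_tasks) for a task network.

    durations: task -> duration (integers, so the equality check below is exact)
    predecessors: task -> list of tasks that must finish first
    """
    graph = {task: predecessors.get(task, ()) for task in durations}
    order = list(TopologicalSorter(graph).static_order())

    # Forward pass: earliest finish of each task.
    early_finish = {}
    for task in order:
        early_start = max((early_finish[p] for p in graph[task]), default=0)
        early_finish[task] = early_start + durations[task]
    project_length = max(early_finish.values())

    # Backward pass: latest finish that does not delay the project.
    late_finish = {task: project_length for task in order}
    for task in reversed(order):
        late_start = late_finish[task] - durations[task]
        for p in graph[task]:
            late_finish[p] = min(late_finish[p], late_start)

    # Tasks with zero slack form the critical path.
    critical = [t for t in order if late_finish[t] == early_finish[t]]
    return project_length, critical

# Hypothetical project: durations in days.
durations = {"spec": 3, "sdtm": 5, "adam": 4, "tlf": 6, "define": 2}
predecessors = {"sdtm": ["spec"], "adam": ["sdtm"], "tlf": ["adam"], "define": ["sdtm"]}
length, critical = critical_path(durations, predecessors)
print(length, critical)
```

In this toy network the `define` task has slack, so it can be delayed without extending the project; resolving competition for shared resources, as the poster describes, requires scheduling logic beyond this sketch.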
PO-050 : Put on the SAS® Sorting Hat and Discover Which Sort is Best for You!
Louise Hadden, Abt Associates Inc.
Charu Shankar, SAS
Monday, 3:30 PM - 3:50 PM, Location: e-Poster Station 4
Sorting in SAS® is an expensive process in terms of both time and resources consumed. In this session, prepare to explore some of the common and lesser-known sorts that SAS provides. Become like the sorting hat in Harry Potter! Instead of waiting with bated breath for your team (or data) to be sorted, get the inside scoop and learn about the dynamic processes that go on behind the scenes during sorting that will enable you to pick the very best sort for your circumstances. Learn about some fantastical, magical SAS sorting teams: bubble sort, quick, threaded and serpentine. Behold the effervescent bubble sort! In a hurry? Take a look at the quick sort. Looking for superior efficiency? Consider the threaded sort. See how the hissing serpentine sort in SAS, like the slithering serpent Nagini sliding surreptitiously through walls, can come in handy! Which sort will you choose - or which sort will choose you?
PO-066 : Using Pinnacle 21 Enterprise for define.xml Creation: Tips and Tricks from a CRO Perspective.
Frank Menius, Covance
Monday, 10:00 AM - 10:20 AM, Location: e-Poster Station 3
Creating a regulatory compliant define.xml 2.0 document can be time consuming and fraught with hazards. It is widely known that Pinnacle 21 Enterprise (P21 Enterprise) is an efficient and useful tool that can speed up the creation process and help ensure compliance with regulatory bodies, including the United States Food and Drug Administration (FDA) and the Japanese Pharmaceuticals and Medical Devices Agency (PMDA). But how can a preparer make sure they are taking advantage of all the available benefits and getting the most out of the product? We will answer this key question and detail some common issues that arise while using P21 Enterprise, along with ways to work around them. Additionally, the paper will give an overview of the define.xml creation process within P21 Enterprise for those who are new to the product, highlighting tips, tricks, and work-arounds.
PO-108 : Developing Analysis & Reporting Standards For Pharmaco-Epidemiology Observational Studies
Bo Zheng, Merck & Co., Inc.
Xingshu Zhu, Merck & Co., Inc.
Monday, 3:30 PM - 3:50 PM, Location: e-Poster Station 3
In the blossoming world of modern data analytics, there is an unfulfilled need for standardization within real world evidence (RWE). Lack of standardization often leads to longer and more frustrating program development cycles. This paper discusses our experience with developing standards across RWE primary data collection (PDC) studies. We developed a process for Pharmacoepidemiology PDC studies by standardizing variables based on existing CDISC conventions, developing data quality review tools, and creating a set of modular macros that create a unified TFL deliverables package. Having a core standardization process in place reduces the time it takes to identify and resolve data issues and to get deliverables to customers. As RWE continues to expand, implementing standards provides a unique opportunity to help guide its path towards even greater acceptance within the scientific community.
PO-110 : Raw data sets tracker: Time and project management based on the volume of available clinical data using SAS® software
Girish Kankipati, Seattle Genetics, Inc.
Monday, 10:00 AM - 10:20 AM, Location: e-Poster Station 2
Time management and the availability of clinical data play an important role in the successful execution of a project. In order to plan programming activities and resources, it is very important to understand the availability of clinical data. The volume of raw data depends on several factors, including the enrollment speed and the type of clinical trial (Phase I, II, or III). Sometimes enrollment is slow, which causes data unavailability issues and hinders programming activities, as programming can be challenging with limited data. To address this issue, it is important to establish a robust procedure for tracking raw data so that the project can be completed within a given timeline. This paper will discuss how to track raw data availability by creating a raw data set tracker with a dynamic SAS® program, which will be demonstrated. The proposed tracker identifies the number of data sets that are programmable on a weekly basis. It gives summary statistics on the number of subjects and the total number of records present in each raw data set during a particular week, displayed as pie and bar charts. Thus, the tracker application will help the programmer plan activities efficiently (for example, Week 1: DM AE EX; Week 2: DS MH).
PO-130 : When biomarker drives primary endpoint: An oncology case study of SDTM design using multiple myeloma.
Girish Kankipati, Seattle Genetics, Inc.
Bhargav Koduru, Seattle Genetics, Inc.
Tuesday, 10:00 AM - 10:20 AM, Location: e-Poster Station 3
Oncology studies are often driven by imaging, which led to the creation of the tumor-specific TU and TR domains in the SDTM IG 3.1.2, where the capture of the scan details and results is described. These domains usually are linked to the RS domain, which contains the overall tumor response in an oncology study. There are, however, a few oncology conditions, like multiple myeloma, that are driven not by imaging but by specific biomarkers. This information would be captured in the LB domain, in contrast to TU and TR. Biomarkers play an important role in indicating normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention. In multiple myeloma, serum free light chain (SFLC), serum and urine protein electrophoresis (SPEP and UPEP), and immunofixation are the key biomarker-related tests that define the standard response criteria. In this paper, we would like to share how we have mapped the efficacy biomarker tests in the LB SDTM domain, as well as the safety-related tests, while maintaining a clear demarcation between the two by using LBCAT or LBSCAT and other additional variables allowed under the findings observation class (SDTM v1.4). This helps to maintain the distinction and also eases the design of the efficacy- and safety-related ADaM datasets. At the SDTM level, we have also leveraged RELREC to create traceability between the efficacy data in the LB domain and that captured in the RS domain.
PO-137 : A Practical SAS® Macro for Converting RTF to PDF Files
Kaijun Zhang, FMD K&L Inc.
Xiao Xiao, FMD K&L Inc.
Brian Wu, FMD K&L Inc.
Monday, 3:30 PM - 3:50 PM, Location: e-Poster Station 2
In clinical trial reporting, there is a rising demand to create large numbers of RTF outputs when a study reaches its major milestones. CROs often receive requests from clients to convert all reports into a user-friendly file format, usually PDF, to ease delivery and facilitate the review process. One solution for RTF to PDF conversion is to use a ready-made tool. However, if the RTF file is extremely large, conversion with this approach can be very time-consuming, especially when multiple large output files need to be converted and combined. It may take hours to finish the task, adding an extra challenge to timely delivery. SAS for OS/2 and Windows can communicate with other PC applications through Dynamic Data Exchange (DDE). Specifically, DDE enables the creation of fully customized Microsoft Word documents from a Base SAS program, and is therefore ideal for automating the conversion of RTF files to high-quality PDF files. This paper presents an effective alternative method in which a practical SAS macro was developed and used under two scenarios: (a) quickly converting an extremely large RTF file into a PDF file; (b) reliably converting a queue of RTF files into PDF files. The step-by-step development of the macro is also described and discussed.
PO-144 : SDSP (Study Data Standardization Plan) Case Studies and Considerations
Kiran Kundarapu, Merck & Co., Inc.
Tuesday, 3:30 PM - 3:50 PM, Location: e-Poster Station 3
This poster is intended to cover a sample SDSP and SDSP case studies focusing on different stages within a drug development lifecycle. Drug development lifecycles and stage gates considered:
• New programs (pre-IND, IND)
• Ongoing programs (retrospective/End of phase II/Type-C meeting/pre-NDA/pre-BLA)
• Already approved programs (sNDA/sBLA)
In addition, the poster covers scenarios in which single versus multiple SDSPs for a program should be considered, based on IND, indication, and population. Additional SDSP topics and considerations will cover:
• Critical items required to develop an SDSP
• Nuances in CDER versus CBER SDSP recommendations/requirements
• CBER Appendix key recommendations for completion
• Versioning the SDSP, covering updates and checks
• Consistency checks between the SDSP and other submission documents
PO-218 : A Cloud-based Framework for Exploring Medical Study Data
Peter Schaefer, VCA-Plus, Inc
Tuesday, 10:00 AM - 10:20 AM, Location: e-Poster Station 2
The poster will present a cloud-based framework used to implement tools for exploring medical study data. The framework makes it easy to integrate the kind of scripts that are used by the FDA in their JumpStart service, or scripts that create the type of data analyses and TLFs (tables, listings, figures) suggested in the white papers published by the PhUSE working group "Standard Analyses & Code Sharing". On one side, the poster will explain the underlying cloud-based platform, which allows for a data-driven implementation of the framework itself. On the other side, the poster will show the concepts of the framework and how metadata about the analyses and the TLFs is integrated and drives the user interface of the resulting applications. Finally, we will show some examples based on the scripts that the FDA released to the PhUSE "Standard Analyses & Code Sharing" working group. Interested parties will have the option to see a demo and discuss the framework concepts in detail with the author.
PO-221 : Why waiting longer to check log file when SAS program in Execution? Lets Find bugs Early!!
Prakash Subramanian, Quartesian Clinical Research
Thamarai Selvan, Quartesian Clinical Research
Tuesday, 10:00 AM - 10:20 AM, Location: e-Poster Station 4
Prompt email alerts sent as soon as an error is encountered help the user save run time and prevent subsequent code from creating empty datasets. They also make it possible to monitor the status of submitted SAS programs without being in front of the computer. When SAS finds an error in the current statement, it skips that statement and starts executing the next one, and the process completes only after the final statement of the program has executed. Some programs take more than a day to complete. In such cases, the user has to open the log file in read-only mode very frequently to check for errors, warnings, and unexpected notes. The user must terminate the program manually if any such messages are found; otherwise, the errors in the log file come to light only at the end of execution. Our proposal is to run a parallel utility program alongside the production program that checks the log file of the running SAS program and notifies the user by email when an error, warning, or unexpected note appears in the log. Execution can also be terminated automatically, with the user notified, if any such messages are identified.
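As a rough illustration of the log-checking idea in the abstract above, the following Python sketch scans log lines for messages that would trigger an alert. It is not the authors' SAS utility: the function name `scan_log`, the list of "unexpected" notes, and the choice of Python are assumptions made for illustration, and the email and termination steps are left out.

```python
import re

# SAS log messages of interest start with "ERROR:" or "WARNING:",
# optionally with a message number such as "ERROR 22-322:".
ALERT_PATTERN = re.compile(r"^(ERROR|WARNING)(\s+\d+-\d+)?:")

# Hypothetical examples of notes a team might treat as unexpected;
# a real list would be project-specific.
UNEXPECTED_NOTES = ("uninitialized", "repeats of BY values")


def scan_log(lines):
    """Return (line_number, text) pairs for log lines that should raise an alert."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if ALERT_PATTERN.match(text):
            hits.append((number, text))
        elif text.startswith("NOTE:") and any(key in text for key in UNEXPECTED_NOTES):
            hits.append((number, text))
    return hits
```

A monitoring job could call `scan_log` on the growing log file at intervals and send a notification as soon as the returned list is non-empty.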
PO-225 : Badge in Batch with Honeybadger: Generating Conference Badges with Quick Response (QR) Codes Containing Virtual Contact Cards (vCards) for Automatic Smart Phone Contact List Upload
Troy Hughes, Datmesis Analytics
Monday, 10:00 AM - 10:20 AM, Location: e-Poster Station 1
Quick Response (QR) codes are widely used to encode information such as uniform resource locators (URLs) for websites, flight passenger data on airline tickets, attendee information on concert tickets, or product information that can appear on product packaging. The proliferation of QR codes is due in part to the broad dissemination of smart phones and the accessibility of free QR code scanning applications. With the ease of self-scanning QR codes has come another common QR code usage: the identification of conference attendees. Conference badges, emblazoned with an attendee-specific QR code, can communicate attendee contact and other personal information to other conference goers, including organizers, vendors, potential customers or employers, and other attendees. Unfortunately, some conference organizers choose not to include QR codes on conference badges because of the complexity and price involved in producing and including the QR codes. To that end, this text introduces flexible Base SAS® software that overcomes this limitation by dynamically creating attendee QR codes from a data set containing contact and other information. Furthermore, the flexible, data-driven approach creates attendee badges that can be maintained and printed by conference organizers. When a badge QR code is scanned by a fellow conference goer, the attendee's personal information is loaded into a virtual contact file (VCF, or vCard) that can be uploaded automatically into a smart phone's contact list. Conference organizers are able to customize and configure badge format and content through a CSS file that dynamically alters badges without the necessity to modify the underlying code.
PO-253 : F2Plots: Visualizing relative treatment effects in cancer clinical trials
Yellareddy Badduri, QUARTESIAN
Tuesday, 3:30 PM - 3:50 PM, Location: e-Poster Station 1
Every year, many clinical trials are conducted on different types of cancer. With cancer trials increasingly reporting non-time-to-event outcomes, data visualization has evolved to incorporate parameters such as response to therapy, duration and degree of response, and novel representations of underlying tumor biology. Graphs and figures are excellent tools for data visualization: they display data figuratively and enable rapid interpretation. F2 plots (forest and funnel) were initially developed for presenting the results of meta-analyses. A forest plot is an intuitive, convenient way to show the relative treatment effect of an intervention between groups within a larger cohort. It is easily understood and consists of several horizontal lines, each representing a 95% confidence interval, with a central symbol in the middle of the line segment representing a point estimate, usually the median or mean. Funnel plots are scatter plots of the effect estimates from individual studies against some measure of each study's size or precision. Further advantages of funnel plots are that there is no spurious ranking of institutions, the eye is naturally drawn to important points that lie outside the funnel, there is allowance for the increased variability of smaller units, and they are easy to produce with a standard spreadsheet. This presentation will explain different SAS programming approaches for producing both forest and funnel plots, and the representations used to illustrate treatment effects.
PO-274 : A Macro to Expand Encrypted Zip Files on the SAS LSAF Environment
Steven Hege, Alexion Pharmaceuticals, Inc.
Monday, 10:00 AM - 10:20 AM, Location: e-Poster Station 6
This paper will introduce a short SAS macro that uses Java objects to expand zip archive files located on the SAS Life Science Analytics Framework (LSAF). The macro runs on LSAF, stores the expanded files directly on the environment in a specified folder, and can expand password-encrypted archives. Our group has found this macro useful when handling data archives uploaded directly into LSAF.
PO-277 : Flagging On-Treatment Events in a Study with Multiple Treatment Periods
David Franklin, IQVIA
Tuesday, 3:30 PM - 3:50 PM, Location: e-Poster Station 4
Typically, for data like Adverse Events, the common practice is to flag events that occur on treatment with a definition of something like "Event starting on or after date of first treatment and up to 30 days after date of last treatment". This is fairly easy to program, but when a cycle of treatment may have multiple periods due to patient events, this usual approach does not work. This paper introduces the ADaM variable APHASE, shows how it solved a structural problem with the data, and presents a macro that was very useful in producing the solution.
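To make the multi-period problem concrete, here is a minimal Python sketch that flags an event as on treatment if it starts within any treatment period or within a follow-up window after that period's last dose. It does not reproduce the paper's macro or the APHASE derivation; the function name `on_treatment`, the tuple layout of `periods`, and the 30-day default are assumptions for illustration.

```python
from datetime import date, timedelta


def on_treatment(event_start, periods, window_days=30):
    """Return True if event_start falls inside any treatment period.

    periods is a list of (first_dose_date, last_dose_date) tuples, one per
    treatment period; each period is extended by window_days after its last dose.
    """
    window = timedelta(days=window_days)
    return any(first <= event_start <= last + window for first, last in periods)
```

With a single (first dose, last dose) pair spanning the whole study, an event that starts in a long gap between two periods would be flagged; checking each period separately avoids that.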
PO-282 : Joining the SDTM and the SUPPxx Datasets
David Franklin, IQVIA
Monday, 10:00 AM - 10:20 AM, Location: e-Poster Station 5
When creating ADaM datasets from SDTM datasets, programmers often do not understand why we have the supplemental SUPPxx datasets or how to deal with the structure of the data in these datasets. This paper will briefly explain why we have these supplemental datasets and, more importantly, introduce a macro that will bring them back to their 'parent' and make them easier to use.
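The general idea of bringing supplemental qualifiers back to the parent domain can be sketched as follows. This is a Python illustration under simplifying assumptions, not the paper's macro: records are plain dictionaries, only the standard SUPPxx variables USUBJID, IDVAR, IDVARVAL, QNAM, and QVAL are used, and the function name `merge_supp` is hypothetical.

```python
def merge_supp(parent_rows, supp_rows):
    """Attach each SUPPxx QNAM/QVAL pair to its parent record as a new column."""
    merged = [dict(row) for row in parent_rows]  # copy so the inputs stay untouched
    for supp in supp_rows:
        for row in merged:
            if row["USUBJID"] != supp["USUBJID"]:
                continue
            idvar = supp["IDVAR"]
            # An empty IDVAR means the qualifier applies to every record
            # of that subject in the parent domain.
            # IDVARVAL is stored as character, so compare as strings.
            if idvar == "" or str(row[idvar]) == supp["IDVARVAL"]:
                row[supp["QNAM"]] = supp["QVAL"]
    return merged
```

Each distinct QNAM becomes a column on the matching parent records, which is the "wide" shape most ADaM derivations expect.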
PO-305 : Merging Sensor Data with Patient Records in Clinical Trials - Systems and Benefits
Surabhi Dutta, Eliassen Biometrics and Data Solutions
Shubhranshu Dutta, Student at Biomedical Sciences Academy, HCVSD
Tuesday, 10:00 AM - 10:20 AM, Location: e-Poster Station 5
Patient care involves data capture in all phases of care delivery, including clinical trials, provider visits, lab work, and sensor data from wearable sensors and handheld devices. We are accustomed to devices that generate health indicator data in large volumes and at a rapid rate. This paper discusses the benefits, challenges and methods of merging these disparate data sets characterized by different types, volumes and velocities.
PO-324 : Bayesian Methods for Treatment Design in Rare Diseases
Xuan Sun, Ultragenyx
Ruohan Wang, Ultragenyx
Tuesday, 10:00 AM - 10:20 AM, Location: e-Poster Station 1
Over the last thirty years, Bayesian methods have developed rapidly with the explosive growth of high-speed computing. Bayesian methods are quite useful, and their results are easier to interpret when the modeled treatment effects are displayed graphically. In rare diseases, we do not have access to the large samples needed for a parametric data analysis. Under these circumstances, Bayesian methods may offer a more flexible framework for modeling rare disease treatments. In this article, the challenges of rare diseases are introduced in Section I. Bayesian methods and their use in rare disease treatment prediction are introduced in Section II, and examples are shown in Section III. The SAS BAYES statement can be used to address small-sample problems in rare diseases. PROC PHREG uses the likelihood method together with assumptions about different prior distributions, on the basis of which we can determine whether variables should be included when analyzing the efficacy of the medicines.
Programming Techniques
BP-029 : Lit Value Locator: An efficient way to pinpoint data.
Varunraj Khole, Thomas Jefferson University
Tuesday, 1:30 PM - 1:40 PM, Location: Franklin 1
This paper describes a method to locate any alphanumeric text string, in character or numeric format, within any dataset present in default as well as user-defined SAS libraries. Oftentimes, the data presented to us comes in different formats and is stored in multiple locations. To ensure that all the required data is captured, a user needs to implement smart coding that can find the required data and give its location. This method would be very useful when searching for a particular data string while working with multiple SAS libraries. It can be an essential tool for a data analyst working in finance, pharmaceuticals, life sciences, healthcare, banking and various other industries. By combining the powerful SAS macro language, procedures, efficient DO loops and a handful of extremely useful SAS functions, this method provides the user with a detailed description of the search value and summary statistics for it. The method shows that an extremely complicated task can be achieved with very simple and efficient coding.
BP-036 : Reducing the space requirements of SAS® data sets without sacrificing any variables or observations
Stephen Sloan, Accenture
Monday, 2:00 PM - 2:20 PM, Location: Franklin 1
The efficient use of space is very important when working with large SAS data sets, which could have millions of observations and hundreds of variables. We are often constrained to fit the data sets into a fixed amount of available space. Many SAS data sets are created by importing Excel or Oracle data sets or delimited text files and the default length of the variables can be much larger than necessary. When the data sets don't fit into the available space, we sometimes need to make choices about which variables and observations to keep, which files to zip, and which data sets to delete and recreate later. There are things that we can do to make the SAS data sets more compact and use our space more efficiently. They can be done in a way that allows us to keep all the desired data sets without sacrificing variables or observations. SAS has compression algorithms that can shrink the space of the entire data set. In addition, there are tests that we can run that allow us to shrink the length of different variables and evaluate whether they are more efficiently stored as numeric or as character variables. These techniques often save a significant amount of space; sometimes as much as 90% of the original space is recouped. We can use macros so that data sets with large numbers of variables can have their space reduced by applying the above tests to all the variables in an automated fashion.
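One of the tests mentioned above, checking whether character variables are declared longer than their contents require, can be sketched in Python as follows. This is an illustration only, not the author's SAS macros: the function names, the dictionary-based rows, and the assumption of a single-byte encoding (one character per byte) are all hypothetical.

```python
def minimal_char_lengths(rows, declared_lengths):
    """For each character variable, report the longest value actually stored."""
    needed = {}
    for name in declared_lengths:
        longest = max(
            (len(str(row.get(name, "")).rstrip()) for row in rows),
            default=0,
        )
        needed[name] = max(longest, 1)  # a character variable needs at least length 1
    return needed


def bytes_saved_per_row(declared_lengths, needed):
    """Bytes recovered per observation if each variable shrinks to its needed length."""
    return sum(declared_lengths[name] - needed[name] for name in declared_lengths)
```

For imported data where every text column defaults to a generous length, the per-row saving multiplied by millions of observations is where the large reductions come from.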
BP-039 : It's All About the Base-Procedures
Jane Eslinger, SAS
Monday, 2:30 PM - 3:20 PM, Location: Franklin 1
As a Base SAS® programmer, you spend your day manipulating data and creating reports. You know there is a procedure that can give you what you want. As a matter of fact, there is probably more than one procedure to accomplish the task. Which one should you use? How do you remember which procedure is best for which task? This paper is all about the Base procedures. It explores the strengths of the commonly used, nongraphing procedures. It discusses the challenges of using each procedure and compares it to other procedures that accomplish similar tasks. The first section of the paper looks at utility procedures that gather and structure data: APPEND, COMPARE, CONTENTS, DATASETS, FORMAT, SORT, SQL, and TRANSPOSE. The next section discusses the Base SAS procedures that work with statistics: FREQ, MEANS/SUMMARY, and UNIVARIATE. The final section provides information about reporting procedures: PRINT, REPORT, and TABULATE.
BP-040 : Running Parts of a Program while Preserving the Entire Program
Stephen Sloan, Accenture
Tuesday, 1:45 PM - 1:55 PM, Location: Franklin 1
The Challenge: We have long programs that accomplish a number of different objectives. We often only want to run parts of the programs while preserving the entire programs for documentation or future use. Some of the reasons for selectively running parts of a program are:
• Part of it has run already and the program timed out or encountered an unexpected error. It takes a long time to run so we don't want to re-run the parts that ran successfully.
• We don't want to recreate data sets that were already created. This can take a considerable amount of time and resources, and can also occupy additional space while the data sets are being created.
• We only need some of the results from the program currently, but we want to preserve the entire program.
• We want to test new scenarios that only require subsets of the program.
BP-057 : The Power of PROC FORMAT
Jonas Bilenas, A Bank Near You
Kajal Tahiliani, GlaxoSmithKline
Monday, 3:30 PM - 4:20 PM, Location: Franklin 1
The FORMAT procedure in SAS® is a very powerful and productive tool, yet many beginning programmers rarely make use of it. The FORMAT procedure provides a convenient way to do a table lookup in SAS. User-generated formats can be used to assign descriptive labels to data values, create new variables, and find unexpected values. PROC FORMAT can also be used to generate data extracts and to merge data sets. This paper provides an introductory look at PROC FORMAT for the beginning user and provides sample code that illustrates the power of PROC FORMAT in a number of applications. Remember that SQL performs a table join, not a table lookup. A FORMAT table lookup uses a binary search method that is very powerful and more efficient than SQL. Additional examples and applications of PROC FORMAT can be found in the SAS® Press book titled "The Power of PROC FORMAT."
BP-061 : Proc Sort Revisited
Alex Chaplin, Bank of America
Monday, 4:30 PM - 4:50 PM, Location: Franklin 1
Proc sort can do more than sort your data. Revisit proc sort to see how you can select records and fields, rename and format fields, compress the output to save space, reuse space in your input dataset, remove and save off duplicate records and keys. Understand the difference between using the nodupkey and noduprecs options and how they affect your results at the aggregate and detail level. Software is base SAS. Example code I've written to accompany the presentation can run in SAS University Edition or SAS On Demand for Academics because it uses SASHELP datasets as inputs. Intended audience is beginner to intermediate level SAS programmers.
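The difference between the two options can be modeled in a few lines of Python. This sketch is not the presenter's example code; it assumes the commonly documented behavior that NODUPKEY keeps the first observation for each combination of BY values, while NODUPRECS compares all variables of each observation only with the previous observation written to the output. The function names and sample rows are hypothetical.

```python
def sort_nodupkey(rows, keys):
    """Sort by keys and keep the first row for each distinct key combination."""
    ordered = sorted(rows, key=lambda r: tuple(r[k] for k in keys))
    kept, last_key = [], None
    for row in ordered:
        key = tuple(row[k] for k in keys)
        if key != last_key:
            kept.append(row)
            last_key = key
    return kept


def sort_noduprecs(rows, keys):
    """Sort by keys and drop a row only if it equals the previously kept row in every variable."""
    ordered = sorted(rows, key=lambda r: tuple(r[k] for k in keys))
    kept = []
    for row in ordered:
        if not kept or row != kept[-1]:
            kept.append(row)
    return kept
```

Because the comparison in the second function only looks at the adjacent row, identical records that are not next to each other after sorting survive, which is why the choice of BY variables matters when removing duplicate records.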
BP-065 : ODS Magic: Using Lesser Known Features of the ODS statement
Michael Stout, Johnson & Johnson Medical Device Companies
Tuesday, 2:00 PM - 2:10 PM, Location: Franklin 1
It is no illusion: the ODS output destination is a powerful tool that every SAS programmer should know how to use. This paper will provide details on lesser known aspects of ODS that are magical. With a sleight of hand, this paper will show how to send desired SAS output to multiple output destinations. Learn techniques to combine tables, listings and figures into a single output file. What you thought was impossible may be easy using lesser known features of ODS.
BP-079 : Performing Analytics on Free-Text Data Fields: A Programer's Wurst Nitemare
Michael Rimler, GlaxoSmithKline
Matt Pitlyk, 1904labs
Monday, 5:00 PM - 5:20 PM, Location: Franklin 1
Case report forms (CRFs) often contain free-text fields for collecting patient information when standard responses do not apply, e.g. 'Other specify' or 'Reason for ...'. Furthermore, the analysis plan may require analyses to be performed on this non-standardized information. Free-text fields are renowned for their difficulty to be programmatically incorporated into analyses due to human nature injecting spelling/grammatical errors or differences in languages across global sites. These fields also tend to be very difficult to monitor. Requesting sites to 'modify' content to support cleaner analyses is time consuming, even if the site is responsive and agreeable. We compare methodologies for classifying a record into a binary response based on information collected via a free-text field. For example, identifying all collected concomitant medication records taken for Chronic Obstructive Pulmonary Disease (COPD) when 'Reason for Therapy' is collected via free-text. Methodologies include typical techniques in SAS (brute force string search and fuzzy matching) and machine learning (clustering and classification algorithms). The basis of methodology comparison includes (i) degree of code complexity, (ii) measures of inaccuracy such as precision and recall, and (iii) the maintenance burden of code along the study life-cycle over iterative data cuts. In our problem, machine learning algorithms using logistic regression and random forest classifiers exhibit the lowest incidence of false negatives on the test data. The brute force technique also exhibits a low incidence of false negatives, but code maintenance with this method is arguably more burdensome than the two machine learning algorithms as new data is observed.
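The contrast between brute-force string search and fuzzy matching, and the precision and recall measures used to compare methods, can be sketched as follows. This Python example is illustrative only and is not the authors' code: the keyword list, the 0.8 similarity threshold, and the function names are assumptions, and the standard-library `difflib` similarity stands in for whatever fuzzy-matching technique the paper uses.

```python
from difflib import SequenceMatcher


def brute_force_match(text, keywords):
    """Flag a record if any keyword appears verbatim in the free text."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def fuzzy_match(text, keywords, threshold=0.8):
    """Flag a record if any word is similar enough to any keyword (tolerates typos)."""
    words = text.lower().split()
    return any(
        SequenceMatcher(None, word, keyword).ratio() >= threshold
        for word in words
        for keyword in keywords
    )


def precision_recall(predicted, actual):
    """Compute precision and recall from parallel lists of booleans."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

A misspelled reason such as "emphysma" is missed by the verbatim search but caught by the similarity check, at the cost of choosing and maintaining a threshold.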
BP-084 : Useful SAS techniques in Efficacy Analysis for Oncology studies
Joy Zeng, Pfizer
Tuesday, 8:00 AM - 8:20 AM, Location: Franklin 1
Oncology refers to the research on prevention, diagnosis and treatment of cancer. Oncology studies are, in general, more complex than studies in other therapeutic fields. This paper summarizes the primary sources of complexity, including endpoints, data collection, AE reporting, tumor assessment under RECIST, oncology-specific domains, and special statistical analysis for efficacy data. This paper also discusses multiple oncology-related statistical methods (e.g. Cox regression, Kaplan-Meier, log-rank tests) and graphical data representations (e.g. waterfall plots, bar charts, mean standard error plots, spaghetti plots, and forest plots). Finally, the relevant SAS code is given for all of these methods and representations, with the goal of providing statistical programmers the necessary knowledge and tools for creating and validating tables and figures from oncology studies.
BP-105 : ADaM 1.1 Compliant ADEVENT and ADTTE development in a Cardiovascular Study
Chao Su, Merck & Co., Inc.
Tuesday, 8:30 AM - 8:50 AM, Location: Franklin 1
Many major endpoints such as Death, Stroke and Myocardial Infarction are observed in cardiovascular (CV) studies. Besides these major endpoints, additional detailed information about these events can also be collected at different levels. In CV studies, results from different evaluators, typically investigators and adjudicators, are collected and applied to the same event in analysis reports. It is therefore more complicated to build the corresponding ADaM datasets that store the data used for analysis. In this paper, a CDISC compliant Basic Data Structure (BDS) dataset, ADEVENT, is developed to store all collected major endpoints and the corresponding detailed information about these events. This dataset is used to support a table of concordance between investigators and adjudicators. The summary of time to event is derived from ADEVENT and stored in ADTTE. The traceability between ADEVENT and ADTTE is also described and discussed in this paper.
BP-115 : Freq Out - Proc Freq's Quick Answers to Common Questions
Christine McNichol, Covance
Tuesday, 2:15 PM - 2:25 PM, Location: Franklin 1
Proc freq, true to its name, gives frequency counts, as well as other informative statistics, and those frequency values can be output to a dataset. But proc freq can do more than just count. Its ability to provide a unique list and its flexibility to use unsorted data can save both time and keystrokes in a variety of scenarios. Combining these features with the out= option adds another method to the programming arsenal for grabbing a list of subjects or parameters, investigating a difference, or doing a quick comparison. This paper will look at how proc freq and its functionality can help with a quick response to common questions such as: What subjects were included in this count? What and how many subjects/records are impacted by this data issue? What does the data show for these problem subjects in another dataset? Is there uniqueness within the data by these variables? Though it might not be the obvious choice, one proc freq can take the place of multiple steps, including proc sorts, data steps and prints, to answer these questions. Additionally, the output generated from the proc freq method can very easily be exported by rows, columns, or selections to provide clean and clear responses to ad-hoc requests.
BP-124 : A Quick Way to Cross Check Listing Outputs
Shunbing Zhao, Merck & Co., Inc.
Tuesday, 2:30 PM - 2:40 PM, Location: Franklin 1
Very often listings are required to support summary reports in clinical trials. Normally these listings are in RTF format, and each one could be tens or hundreds of pages in length. Since a single study could involve dozens of listings, it becomes very challenging to detect potential issues in the listing generation process or to cross-check between a summary report and the corresponding listing. This paper presents a convenient way to find out how many subjects are included in each listing, which is a good indicator for cross-checking against summary reports. A macro was developed to go through each listing and produce an overall report, which includes a comprehensive summary tab with information such as the file name, the title of the listing, and the number of subjects in that listing, as well as a separate tab for each listing that displays the list of subjects involved. Additionally, a hyperlink is built into the file name in the summary tab so that reviewers can navigate to the corresponding list of subjects and dig further if any discrepancy occurs.
BP-127 : Importing EXCEL Data in Different SAS Maintenance Release Version
Huei-Ling Chen, Merck & Co., Inc.
Chao-Min Hoe, Merck & Co., Inc.
Tuesday, 2:45 PM - 2:55 PM, Location: Franklin 1
It was noted that when the PROC IMPORT procedure is used to convert the same Excel data file to a SAS dataset, the output can be inconsistent across computers. The objective of this manuscript is to investigate the inconsistencies and to provide explanations and solutions. Several Excel data files were tested. It was observed that the different outcomes were due to the SAS version or the maintenance release installed on the different PCs.
BP-128 : Implementing Laboratory Toxicity Grading for CTCAE Version 5
Keith Shusterman, Reata Pharmaceuticals
Mario Widel, Independent
Tuesday, 9:00 AM - 9:20 AM, Location: Franklin 1
Laboratory toxicity grading has been an important part of safety reporting since the FDA began accepting electronic data in 1999. Staying up to date with the various CTCAE versions can be a challenge. CTCAE Version 5.0 adds a layer of complexity with new grading criteria dependent on baseline measurements. We will present a practical method for deriving toxicity grades in the SDTM LB domain based on the new CTCAE, as well as reporting toxicity events in an OCCDS dataset derived separately from the BDS dataset with the laboratory findings.
BP-132 : Code Generators: Friend or Foe
Janet Stuelpner, SAS
Tuesday, 9:30 AM - 9:50 AM, Location: Franklin 1
Good code generators are invaluable tools. Or are they? SAS is constantly changing: adding new features and functions, and adding new tools to the tool bag, while making the manipulation of data and the creation of tables more efficient. Many code generators exist in SAS. Some are embraced quickly; others are not, as programmers cling to the old methodologies. This presentation will show how code generators have grown over time and what is available now to make the task of programming easier, quicker and more efficient.
BP-133 : A SAS® Macro to Provide Survival Functions along with Cox Regression Model Efficiently
Chia-Ling Ally Wu, Seattle Genetics, Inc.
Tuesday, 10:00 AM - 10:20 AM, Location: Franklin 1
Depending on study design and analysis needs, multiple statistics generated from different SAS® procedures may be needed for time-to-event analyses. This paper describes a SAS® macro that combines those SAS® procedures to generate survival functions and the Cox proportional hazard ratio in one shot, which can help users save time in coding and generate a customized output quickly and easily. Details of the execution and application of the macro will be demonstrated through examples. The macro call returns two different output layouts, i.e., optional estimates in rows and independent variables in columns, or, conversely, the independent variables in rows and estimates in columns. Macro users can choose the optional estimates, such as the number of subjects at risk, the number of events and censored observations, the quartiles of the survival function, the coefficient of predictors, the Cox proportional hazard ratio, and the corresponding p-value. This macro applies the SAS® PROC LIFETEST procedure to compute the survival function by the log-rank test and the Wilcoxon test, and runs the PROC PHREG procedure based on the Cox proportional regression model to estimate the effect of predictors on the hazard ratio.
BP-141 : The Knight's Tour in 3-Dimensional Chess
John R Gerlach, Dataceutics, Inc.
Scott M Gerlach, Dartmouth College
Tuesday, 11:00 AM - 11:50 AM, Location: Franklin 1
Three-dimensional chess uses two or more chess boards such that a chess piece can traverse the several boards according to the rules for that piece. Thus, the knight can remain on the board where it resides or move one or two steps to a successive board, then move its remaining steps. In three-dimensional chess, the Knight's Tour is a sequence of moves on multiple 8x8 chess boards such that a knight visits each square only once. Thus, for three boards, there would be 192 squares visited only once. The paper, The Knight's Tour in Chess - Implementing a Heuristic Solution (Gerlach 2015), explains a SAS® solution for finding the knight's tour on a single board, starting from any square on the board. This paper discusses several scenarios and solutions for generating the knight's tour in three-dimensional chess.
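The move rule described in the abstract can be sketched outside SAS. The following Python sketch is only an illustration under one reading of that rule: the knight takes one step along one axis and two steps along a different axis, where the axes are board, row, and column. The function names and the (board, row, column) coordinate layout are hypothetical and are not taken from the paper.

```python
from itertools import permutations, product

BOARD_SIZE = 8  # each board is 8x8, as stated in the abstract


def knight_offsets_3d():
    """Return every (d_board, d_row, d_col) step: one axis moves by 1, another by 2."""
    offsets = set()
    # Pick an ordered pair of distinct axes: the first moves 1 step, the second 2 steps.
    for one_axis, two_axis in permutations((0, 1, 2), 2):
        for sign_one, sign_two in product((-1, 1), repeat=2):
            step = [0, 0, 0]
            step[one_axis] = sign_one
            step[two_axis] = 2 * sign_two
            offsets.add(tuple(step))
    return sorted(offsets)


def legal_moves(pos, n_boards=3):
    """List the squares reachable from pos = (board, row, col) without leaving the boards."""
    board, row, col = pos
    moves = []
    for d_board, d_row, d_col in knight_offsets_3d():
        b, r, c = board + d_board, row + d_row, col + d_col
        if 0 <= b < n_boards and 0 <= r < BOARD_SIZE and 0 <= c < BOARD_SIZE:
            moves.append((b, r, c))
    return moves


if __name__ == "__main__":
    print(len(knight_offsets_3d()))    # number of distinct step vectors
    print(legal_moves((0, 0, 0)))      # moves available from a corner of the first board
```

With a single board (n_boards=1) the same function reduces to the ordinary two-dimensional knight moves, which makes it easy to check against the single-board case.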
BP-147 : Patient Profile with Color-Coded Track Changes Since Last Review
Himanshu Patel, Merck & Co., Inc.
Jeff Xia, Merck & Co., Inc.
Tuesday, 10:30 AM - 10:50 AM, Location: Franklin 1
A patient profile is a summary of events experienced by a patient during the conduct of a clinical trial. It gives a clear understanding of patient encounters and offers a more focused methodology by recognizing abnormalities in the data. To ensure patient safety and monitor significant clinical events, study clinical scientists have the responsibility to review patient profiles periodically during the conduct of the trial. However, the existing patient profiles generated in the data management system display subject data cumulatively, which means clinical scientists have to review many data records again even if they have reviewed them multiple times in previous rounds. It becomes an increasingly time-consuming and attention-demanding task for clinical scientists to find the information of interest, such as newly emerging data, data altered or updated since the last round of review, or records that have been removed in the data cleaning process. This paper presents an innovative way to compare the current data extraction against the last one programmatically and generate patient profiles with color-coded changes. For records with no change since the last round of review, the entire row will be set to cyan to let clinical scientists know there is no need to review them again. On the other hand, new records will be set to yellow to draw reviewers' attention, and removed records will be set to green. See attached for details.
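The comparison step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the record keys, the dictionary layout, and the function name are hypothetical. The three colors come from the abstract; the abstract does not state a color for altered records, so the sketch only labels them "changed" and leaves the color unset.

```python
# Colors stated in the abstract; "changed" has no stated color, so it maps to None here.
STATUS_COLOR = {"unchanged": "cyan", "new": "yellow", "removed": "green", "changed": None}


def classify_records(previous, current):
    """Compare two extractions, each a dict mapping a record key to its content."""
    status = {}
    for key, record in current.items():
        if key not in previous:
            status[key] = "new"
        elif previous[key] == record:
            status[key] = "unchanged"
        else:
            status[key] = "changed"
    # Records present last time but absent now were removed during data cleaning.
    for key in previous:
        if key not in current:
            status[key] = "removed"
    return status


if __name__ == "__main__":
    last_review = {("001", "AE", 1): "HEADACHE", ("001", "AE", 2): "NAUSEA"}
    this_review = {("001", "AE", 1): "HEADACHE", ("001", "AE", 3): "RASH"}
    for key, state in classify_records(last_review, this_review).items():
        print(key, state, STATUS_COLOR[state])
```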
BP-175 : Create publication-ready variable summary table using SAS macro
Geliang Gan, Yale Center for Analytical Sciences
Tuesday, 3:00 PM - 3:10 PM, Location: Franklin 1
A variable summary table is an important tool not only for getting to know the data but also for helping data management by picking up outlying data points. Even though SAS equips us with all the procedures needed, it is not a fun task to make one. Creating a variable summary table involves numerous calls to certain SAS procedures. The purpose of this paper is to share a practical way of automatically generating a variable summary table with p-values using a carefully designed and easy-to-use SAS macro. Before the macro starts calling any SAS procedure and conducting calculations, a series of foolproof checks against the data and parameter assignments are run to ensure successful macro execution. After that, the macro uses the procedures necessary to produce and collect all information needed according to the parameter values specified. By default, frequencies and percentages are tallied for categorical variables, and a Chi-square or Fisher's Exact test is performed for p-values. For continuous variables requesting p-values with a parametric method, the mean and standard deviation are computed and a t-test or ANOVA is conducted. For continuous variables requesting p-values with a non-parametric method, the median and interquartile range are provided and p-values are calculated using the Wilcoxon Rank Sum or Kruskal-Wallis test. The final report, a publication-ready summary table, is presented in HTML format using the SAS internal web browser or as a Rich Text Format (RTF) file, usually opened with Microsoft Word. The macro call, parameter value assignments, and tips for special purposes are discussed in detail with examples.
BP-181 : Sum Fun with Flags! Sum Any Flagged Occurrence Data with FLAGSUM and Report It with FLAGRPT or PROC REPORT.
Brendan Bartley, Harvard T.H. Chan School of Public Health
Wednesday, 10:30 AM - 10:50 AM, Location: Franklin 1
In 2016, CDISC created the ADaM Structure for Occurrence Data Structure Implementation Guide, which describes the data structure and content for counting any events in the CDISC world. One of the most commonly flagged SDTM domains as it moves to ADaM is the Adverse Event (AE) domain, although other domains could be flagged as they move to ADaM such as CE, CM, and MH. The concepts behind flagged records in the AE/ADAE domain (AOCCFL, AOCCSFL, AOCCPFL, AOCCIFL, AOCCSIFL, and AOCCPIFL) can apply to the creation of flags for other ADaM dataset equivalent domains (CE, CM, MH, etc.). Instead of creating several reporting macros coded specifically for individual domains, let's be flexible. We describe and demonstrate two macros, FLAGSUM and FLAGRPT, which are flexible enough to handle occurrence data in various ADaM equivalent domains. These macros have the added bonus of being capable of handling non-CDISC data, which allow users to learn and adopt CDISC ways with non-CDISC data. Abbreviated Abstract version put on website: In 2016, CDISC created the Occurrence Data Structure IG, which includes other domains than just Adverse Events (AE), although it is the most popular in the PHARMASUG arena. Why leave the other domains out when a similar table is needed for reporting in other domains? FLAGSUM and FLAGRPT can help.
BP-182 : Compare and conquer SDTM coding
Phaneendhar Gondesi, TechData Service Company LLC
Tuesday, 3:15 PM - 3:25 PM, Location: Franklin 1
Appropriate reporting of raw data in CDISC-compliant format is very important in a submission. Mapping and coding of SDTM domains can be very challenging, especially for a new SDTM programmer. This paper aims to ease SDTM mapping and coding by a) giving a general layout of overall SDTM domain programming, b) identifying common coding trends within each class, and c) making a broad coding comparison between classes.
BP-194 : Macro Templates - Industry Specific SAS® Programming Standardization
Tabassum Ambia, Alnylam Pharmaceuticals, Inc
Wednesday, 9:00 AM - 9:20 AM, Location: Franklin 1
Industries are increasingly focusing on developing their own sets of SAS® macros to support development of datasets (SDTM, ADaM) and Tables, Listings, and Figures (TLFs), and are proceeding towards their own standardization. The use of a job-specific standard set of macros affects production time, resources required, programmers' learning opportunities, debugging time and the quality of output. There are pros and cons to using standardized macro templates. This is a quasi-automated process that largely cuts down the effort and time required for programming, which helps to complete a huge amount of work within a short time frame, keeps outputs consistent, and reduces probable programming errors. Stringent use of standardized automated macros also has its challenges, which include limitations when outputs differ from the standard ones and programmers need to circumvent the macros, the time required to debug errors, and the training required for new programmers. Being entirely dependent on these macros for a long duration may limit the logical thinking and open coding skills programmers need to develop outputs independently. These skills are essential to remain open to a wide range of programming needs. This paper discusses different aspects of the extensive use of industry standardized macros.
BP-219 : Making the Days Count: Counting Distinct Days in Overlapping or Disjoint Date Intervals
Noory Kim, Synteract
Tuesday, 3:30 PM - 3:40 PM, Location: Franklin 1
Suppose you want to combine multiple date intervals. How do you avoid double-counting calendar days when the intervals overlap? Previous papers on this topic have followed a sensible "combine-then-count" method in principle, but in practice have presented code implementations whose calculations are not easy to follow. This paper presents a macro implementation that follows the "combine-then-count" paradigm more closely for the purpose of having calculations that are easier to follow and verify.
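As a rough illustration of the "combine-then-count" idea (not the paper's macro), the Python sketch below first merges overlapping intervals and only then counts calendar days. It assumes each interval is a (start, end) pair of dates with both endpoints inclusive and start on or before end; the function name is hypothetical.

```python
from datetime import date


def count_distinct_days(intervals):
    """Count calendar days covered by (start, end) date pairs, endpoints inclusive."""
    merged = []
    # Combine: sort by start date, then fold each interval into the previous one if they overlap.
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            if end > merged[-1][1]:
                merged[-1][1] = end
        else:
            merged.append([start, end])
    # Count: the merged intervals are disjoint, so their lengths can simply be summed.
    return sum((end - start).days + 1 for start, end in merged)


if __name__ == "__main__":
    exposure = [
        (date(2019, 1, 1), date(2019, 1, 10)),
        (date(2019, 1, 5), date(2019, 1, 15)),   # overlaps the first interval
        (date(2019, 2, 1), date(2019, 2, 3)),    # disjoint from the others
    ]
    print(count_distinct_days(exposure))
```

Because the counting happens only after the intervals are disjoint, each step can be checked on its own: the merged list can be printed and inspected before any day is counted.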
BP-227 : From Lesion size to Best Response - Implementing RECIST through programming
Ankit Pathak, Rang Technologies
Tuesday, 3:45 PM - 3:55 PM, Location: Franklin 1
RECIST stands for Response Evaluation Criteria in Solid Tumors and serves as guidance for assessing tumor shrinkage and disease progression, an important endpoint in many oncology clinical trials. Investigators, cooperative groups, industry, and government authorities use RECIST, first published in 2000. Currently the revised version is RECIST v1.1, published in 2008 (Eisenhauer E.A. et al). This paper looks at RECIST v1.1 from a programmer's perspective to derive the Best Overall Response of a subject using lesion data collected across multiple visits in an ongoing study, with SAS as the programming language.
BP-255 : End of Computing Chores with Automation: SAS© Techniques That Generate SAS© Codes
Yun (Julie) Zhuo, PRA Health Sciences
Tuesday, 4:00 PM - 4:10 PM, Location: Franklin 1
SAS programmers often have to confront computing chores that require repetitive typing. Manual coding is time consuming and inefficient. It also almost inevitably introduces human errors which compromise quality. While SAS has no shortage of automation techniques, it does not always occur to SAS programmers to write SAS codes that generate SAS codes. In this paper, we focus on three code-generating techniques. Each technique will be demonstrated using a simple practical example of automating variable label creation. Code samples will be provided. This paper also provides programming tips, explores other applications, and compares the three techniques using the simplicity, flexibility, and efficiency metrics.
BP-260 : SMQ SAS Dataset Macro
Mi Young Kwon, Regeneron, Inc.
Ishan Shah, PRA Health Sciences
Wednesday, 9:30 AM - 9:50 AM, Location: Franklin 1
A Standardised MedDRA Query (SMQ) is a grouping of terms from one or more SOCs that relate to a defined medical condition or area of interest. SMQs are created to standardize identification and retrieval of safety data. SMQs are part of each new MedDRA release, which is maintained by MSSO and JMO, and correspond to the terms present in that version of MedDRA. SMQs have been applied in safety and medical reviews, focused searches, signal detection, case alerts, and periodic updates for clinical trials and post-marketing analyses and reports. At each new MedDRA release, the MedDRA and SMQ ASCII files are downloaded from the MedDRA dictionary. These ASCII files have a well-defined hierarchical structure. However, SMQs with child SMQs may not link directly to Preferred Term (PT)/Lowest Level Term (LLT) codes, so these files cannot be used directly in SAS programming for queries. It is necessary to convert the original SMQ files to SAS datasets with a user-friendly structure. The SAS macro introduced in this paper converts SMQ ASCII files to SAS datasets. The SMQ SAS dataset is restructured so that the parent and child SMQs are directly linked to their corresponding PT/LLT codes. The macro produces a single SAS dataset including all SMQ definitions for one MedDRA version. The SMQ dataset will be used to support medical and safety reviews, to generate CDISC-compliant datasets, and to support clinical study analysis.
BP-286 : One More Paper on Dictionary Tables and Yes, I Think it Is Worth Reading
Vladlen Ivanushkin, DataFocus GmbH
Wednesday, 10:00 AM - 10:20 AM, Location: Franklin 1
Before writing this paper on dictionary tables I did some research on what was already out there so that I would not duplicate someone else's work. I found quite a number of papers, but I still decided to write my own and to concentrate on how programmers can benefit from using dictionary tables in their everyday life. In this paper I would like to share the tasks I actually faced during my work as a statistical programmer and show how using dictionary tables makes them so much easier to deal with. There is quite a variety of them, from creating macros to programming SDTMs.
BP-289 : Let Your Log Do the Work for You
Yuliia Bahatska, PRA Health Sciences
Vladlen Ivanushkin, DataFocus GmbH
Tuesday, 4:15 PM - 4:25 PM, Location: Franklin 1
Of course, there is no magic trick that will completely free you from writing code; at least, the authors of this paper are not aware of one. However, the tips we suggest can help programmers avoid struggling through technical details and let them concentrate on more sophisticated tasks that require a deep understanding of the subtleties of programming. In this paper we will show different scenarios where information such as ODS table names or template specifics can be put into the log with a couple of lines of code. This means that you can use this part of your brain's CPU to learn something else.
BP-302 : History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies
Mark Keintz, Wharton Research Data Services
Wednesday, 11:00 AM - 11:20 AM, Location: Franklin 1
Many programming tasks require merging time series of varying frequency. For instance, you might have three datasets (YEAR, QTR, and MONTH), each with eponymous frequency and sorted by common id and date variables. Producing a monthly file with the most recent quarterly and yearly data is a hierarchical last-observation-carried-forward (LOCF) task. Or you may have three irregular time series (ADMISSIONS, SERVICES, TESTRESULTS), in which you want to capture the latest data from each source at every date encountered (event-based LOCF). This presentation shows how to use conditional SET statements to update specific portions of the program data vector (i.e. the YEAR variables or the QTR variables) to carry forward low frequency data to multiple subsequent high frequency records. A similar approach works just as well for carrying forward data from irregular time series. We'll also show how to use "sentinel variables" as a means of controlling the maximum length of time data is carried forward, i.e. how to remove historical data that has become "stale." Finally, we will demonstrate how to modify these techniques to carry future observations backward, without re-sorting data.
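The carry-forward logic itself can be illustrated independently of the DATA step. The Python sketch below is not the conditional SET technique from the presentation; it swaps in a binary-search lookup to show the same outcome: each high-frequency date picks up the most recent low-frequency value, and an optional maximum age drops values that have become stale. Dates are represented as plain integers (for example, month numbers), and the function and parameter names are hypothetical.

```python
from bisect import bisect_right


def carry_forward(target_dates, source, max_age=None):
    """For each target date, return the latest source value dated at or before it.

    source is a list of (date, value) pairs sorted by date. If max_age is given,
    a value older than max_age (in the same integer units) is treated as stale.
    """
    source_dates = [d for d, _ in source]
    result = []
    for t in target_dates:
        i = bisect_right(source_dates, t) - 1   # index of the last source date <= t
        if i < 0:
            result.append(None)                  # nothing to carry forward yet
        elif max_age is not None and t - source_dates[i] > max_age:
            result.append(None)                  # value exists but is stale
        else:
            result.append(source[i][1])
    return result


if __name__ == "__main__":
    months = [1, 2, 3, 4, 5, 6]
    quarterly = [(1, "Q1 value"), (4, "Q2 value")]
    print(carry_forward(months, quarterly))
    print(carry_forward(months, quarterly, max_age=1))
```

Carrying future observations backward would use the mirror-image lookup (the first source date at or after the target), again without re-sorting either series.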
BP-307 : An Overview of Three New Output Delivery System Procedures in SAS® 9.4: ODSTABLE, ODSLIST and ODSTEXT
Lynn Mullins, PPD
Tuesday, 4:30 PM - 4:40 PM, Location: Franklin 1
The SAS® Output Delivery System (ODS) enables programmers to create and manipulate predefined ODS objects in a DATA step to create highly customized output. ODS gives you great flexibility in generating, storing, and reproducing SAS® procedure and DATA step output, with a wide range of formatting options. You can use ODS to accomplish the following tasks: create reports for viewers or browsers; customize the report contents; customize the presentation; and create more accessible SAS output. By default, ODS output is formatted according to instructions that a PROC step or DATA step defines. However, ODS provides ways for you to customize the output. You can customize a single table, a graph, or the style for all your output. SAS® 9.4 contains many ODS enhancements. Among these enhancements are three new ODS procedures: PROC ODSTABLE, PROC ODSLIST, and PROC ODSTEXT. These new ODS procedures allow for the creation of specific types of output. You can create your own new tabular output templates with ODSTABLE, ODSLIST creates bulleted lists, and ODSTEXT can be used to create text block templates to generate lists and paragraphs for your output. This paper will discuss these three new ODS procedures using SAS® version 9.4, with examples of how to use them.
BP-315 : Using the PRXCHANGE Function to Remove Dictionary Code Values from the Coded Text Terms
Lynn Mullins, PPD
Tuesday, 4:45 PM - 4:55 PM, Location: Franklin 1
SAS® Perl regular expression (PRX) functions and CALL routines refer to a group of functions and CALL routines that use a modified version of the Perl programming language as a pattern-matching language to parse character strings. You can perform the following tasks: search for a pattern of characters within a string, extract a substring from a string, search and replace text with other text, and parse large amounts of text, such as Web logs or other text data. You can write SAS programs that do not use regular expressions to produce the same results as you do when you use Perl regular expressions. However, the code without the regular expressions requires more function calls to handle character positions in a string and to manipulate parts of the string. Perl regular expressions combine most, if not all, of these steps into one expression. The resulting code is less prone to error, easier to maintain, and clearer to read. This paper will discuss how to use Perl regular expressions to remove those pesky codes that are sometimes at the end of dictionary coded text terms.
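The search-and-replace idea can be sketched with Python's re module, whose pattern syntax is close to Perl's. The abstract does not say what the trailing codes look like, so this sketch assumes a numeric code in parentheses or square brackets at the end of the term, for example "HEADACHE (10019211)". The pattern and function name are hypothetical, and this is not the PRXCHANGE call from the paper.

```python
import re

# Assumed format: optional whitespace, then digits wrapped in () or [], at the end of the string.
TRAILING_CODE = re.compile(r"\s*[\(\[]\d+[\)\]]\s*$")


def strip_trailing_code(term):
    """Remove a trailing bracketed numeric code from a coded term, if one is present."""
    return TRAILING_CODE.sub("", term)


if __name__ == "__main__":
    for term in ["HEADACHE (10019211)", "NAUSEA [10028813]", "VITAMIN B12"]:
        print(repr(strip_trailing_code(term)))
```

Anchoring the pattern to the end of the string keeps digits that belong to the term itself, such as the "12" in "VITAMIN B12", from being removed.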
BP-340 : Programmatically mapping source variables to output SDTM variables based upon entries in a standard specifications Excel file
Frederick Cieri, Clinical Solutions Group (CSG)
Rama Arja, MedImmune
Zev Kahn, Clinical Solutions Group (CSG)
Ramesh Karuppusamy, Theorem Clinical Research
Monday, 1:30 PM - 1:50 PM, Location: Franklin 1
This paper details a macro application that maps source variables to output SDTM (Study Data Tabulation Model) variables by programmatically reading the entries from standard worksheets of an SDTM specifications Excel workbook. Once users have learned how to fill out the Excel file properly, the macro should reduce errors and programming time by directly implementing the mappings of the source variables to the output variables. To create the output variables, the macro first reads the Excel file worksheets to create the metadata of the variable mappings. With the variable metadata, the macro reads the raw source data files, maps raw source variables to output variables by variable rename or format transformation, and creates an output dataset of merged and appended raw source data files. For variable traceability, all the raw source variables will be in the output dataset, with the original name if the variable is not mapped, or with the remapped name plus an added variable containing the original values if a format transformation occurs. For dataset traceability, a variable is added to the output dataset detailing the source dataset or datasets used to create the row. With the Excel file variable entries, the macro should be able to create about 50% or more of the variable outputs.
Real World Evidence
RW-179 : Using Real-World Evidence to Affect the Opioid Crisis
Sherrine Eid, SAS
Andrea Coombs, SAS
Monday, 10:00 AM - 10:20 AM, Location: Franklin 3
Introduction: The opioid crisis is growing daily. Overreliance on opioids for pain management has led to the worst drug crisis in American history. Of the nearly 64,000 American deaths in 2016 due to drug overdoses, nearly two-thirds (66%) involved a prescription or illicit opioid. The CDC estimates the total economic burden of prescription opioid misuse in the US is $78.5 billion a year, including the costs of health care, lost productivity, addiction treatment, and criminal justice involvement. Prevention and access to treatment for opioid addiction and overdose reversal drugs are critical to fighting this epidemic. Primary care settings have increasingly become a gateway to better care for individuals with both behavioral health (including substance use) and primary care needs. In order to prevent new opioid use disorder cases, the Center for Drug Evaluation and Research has approved ongoing expansion of opioid Risk Evaluation and Mitigation Strategies and education about appropriate pain management. They are evaluating benefits and risks of currently approved opioids and additional methods to improve prescribing practices. Methods: Over 100,000 deidentified, state-level patient claims records were analyzed to assess the likelihood of an opioid complication after receiving a prescription for opioids. Results: Our study showed that patients who have a behavioral health diagnosis are almost 55 times more likely to have an opioid complication if prescribed opioids than patients without such a diagnosis (OR=54.79, CI: [16.50, 339.14]). Conclusion: Electronic medical records should identify patients who had a previous behavioral health diagnosis to receive alternative therapies to opioids for pain management.
RW-192 : Stratified COX Regression: Five-year follow-up of attrition risk among HIV positive adults, Bamako
Mamadou Dakouo, DATASTEPS
Kriss Harris, SAS Specialists Ltd.
Seydou Moussa COULIBALY Coulibaly, HOSPITAL
Monday, 8:00 AM - 8:20 AM, Location: Franklin 3
This study evaluated the association between longer distance to hospital and attrition (loss to follow-up and death) rate in a cohort of HIV positive adults initiating HAART at the University Teaching Hospital, Point G, Bamako. We included all patients who initiated HAART between July 19, 2004 and July 31, 2009 at the University Teaching Hospital. Patients were considered to be in attrition if they did not show up for consultation within 90 days of their expected visit date. The Stratified Cox model is a modification of the Cox proportional hazards (PH) model that is appropriate when a predictor does not satisfy the PH assumption. Our predictor does not satisfy the PH assumption. Thus, the Stratified Cox model was used to estimate the risk of attrition among patients living further from the hospital. The analysis was adjusted for age, sex, profession, and coinfection. All analyses were performed using SAS V.9.4 (SAS Institute). Of 3042 patients included, 79.5% experienced attrition. Attrition was highest during the first six months of HAART. More frequent attrition was found among individuals living further from the hospital (reference: Bamako; out of Bamako, HR 1.24[1.11;1.32]); male (HR 1.21[1.12;1.31]); profession (reference: civil servant; unemployed HR 1.17[1.04;1.32]; worker HR 1.39[1.21;1.59]; other HR 1.15[1.00;1.32]); age > 50 (HR 1.17[1.02;1.32]). This study detected an increased risk of attrition among patients living further from the hospital.
RW-199 : Patterns of risk factors and drug treatments among Hypertension patients
Youngjin Park, SAS
Monday, 11:00 AM - 11:20 AM, Location: Franklin 3
The common comorbidities of hypertension are heart disease and diabetes. These comorbidities are generally highly correlated. They can also be related through disease progression from one disease to the other, or through co-occurrence of the two diseases. Monitoring the characteristics of hypertension patients is a way to prevent disease progression of each comorbidity or co-occurrence of the two diseases. Using SAS/RWE, a cohort of hypertension patients will be constructed using an industry-standard episode-of-care definition during a specified time period. Outcome measures such as comorbidities, risk factors, and drugs managing hypertension will also be created while the cohort is created. We monitor these outcome measures over time during a specified time period. We use 3 years' worth of administrative claims data obtained from the publicly available CMS SynPUF Medicare data. The incidence rates of comorbidities in hypertension patients of different sex and age are computed. The risk factors are analyzed over time for each comorbidity. The drugs managing hypertension are also analyzed over time. We identify unknown data patterns and characteristics associated with belonging to certain trajectory groups. We use the Add-in model feature of SAS/RWE for the statistical analyses. This Add-in model is a template that can generally be applied to any other cohort population of interest and any other outcome measures.
RW-232 : Artificial Intelligence and Real World Evidence - it takes two to tango
Charan Kumar Kuyyamudira Janardhana, Ephicacy Lifesciences Analytics
Monday, 8:30 AM - 8:50 AM, Location: Franklin 3
Stakeholders and decision makers are increasingly using real-world evidence (RWE) and technology to solve the problems of human health. Real World Data (RWD) and RWE will play a big role in health care decision making. Data sources can include claims data, electronic medical record data, genomics data, imaging data, sensors, wearables and many others. As big data gathered in real-world healthcare settings becomes more prevalent and robust, it is increasingly being used across the entire healthcare system for evidentiary purposes. This data has the potential to guide us in creating better study designs and to answer unanswered queries in trial set-up. The data from analytics can further form inputs to medical product development. The utility of AI comes from its application to the huge volume of data arising from the RWE information bank. Natural Language Processing (NLP), an AI tool, can help in charting unstructured data and providing contextual meaning. Machine Learning (ML) is being used to search through volumes of data, looking for complex relationships using a library of algorithms. ML will strengthen the way predictive and prescriptive analytics are being transformed with data. Deep learning (DL) concepts will automate the generation of predictive features and have an impact on analyzing data related to image processing, speech recognition and language translation. With innovation in the form of AI coupled with big data, real-world evidence that is more dynamic, appropriate, illustrative, complete and cost-effective can be generated. This paper focuses on areas of application of AI in ensuring fruitful RWE outcomes.
RW-238 : Innovative Technologies utilization in 21st Novel Clinical Research programs towards Generation of Real World Data.
Srinivasa Rao Mandava, Merck & Co., Inc.
Monday, 10:30 AM - 10:50 AM, Location: Franklin 3
R&D budgets in the pharmaceutical industry have been increasing year after year, with oncology and metabolic disease drug development as the lead engines. However, return on R&D investment fell from 10.1% in 2010 to 3.2% in 2017. On the other hand, technology use has been increasing over the same period to improve data handling and reduce cost and time. The current $60 billion clinical research market is extremely slow and thereby expensive in some ways. Outdated data techniques, confusion and confrontation over eligibility, and high subject drop-out rates are the prime reasons for long trial times of about 10 years. For the last few years, the use of digital technologies across the industry has been increasing, with three focus areas (engage, innovate, and execute) in dealing with three key groups (patients, providers, and payers). It is the key to more efficient and accurate data collection through all stages of the trial life cycle in the current era of 21st century novel clinical research. Digital technology allows passive collection of data from a variety of different sources, including wearables that measure vitals, physical activity and amount of sleep. In this paper, we will discuss the use of digital and other innovative technologies such as AI, ML and blockchain methodologies in trial processes and data articulation in order to achieve 21st century novel clinical research objectives.
RW-310 : Real-world data as real-world evidence: Establishing the meaning of data as a prerequisite to determining secondary-use value
Jennifer Popovic, RTI International
Monday, 9:00 AM - 9:50 AM, Location: Franklin 3
The U.S. Food and Drug Administration (FDA) defines real world data (RWD) as data about patient health status that are routinely generated and collected through a variety of sources, such as through provision of clinical care. Real world evidence (RWE) is defined as the clinical evidence gathered through analysis of RWD. Data cannot become evidence until their meaning and value have been established. This is especially true when making secondary-use of data gathered for other primary purposes, as is often the case for use of RWD. 'Meaning' and 'value' are distinct constructs and should be evaluated as such. 'Meaning' is objective, factual and agnostic to data's use. 'Value' is subjective and situational, pegged to data's intended use. This paper introduces a framework that can be applied to discover and evaluate the meaning of data, by focusing on attributes and questions within five distinct data-related categories: provenance, governance, measurement, quality and validity. This paper provides examples of the application of this framework to both traditional and emerging RWD sources that have been used in a secondary-use manner as evidence or are being explored for their secondary-use potential.
RW-345 : Applications and Their Limitations of Real-World Data in Gene Therapy Trials
Karen Ooms, Quanticate
Monday, 11:30 AM - 11:50 AM, Location: Franklin 3
There are approximately 7,000 distinct rare diseases affecting 350 million people worldwide, and approximately 80% of those rare diseases are caused by faulty genes. Scientific advances such as the CRISPR/Cas9 genome-engineering system have simplified the pharmaceutical and biotech industry's ability to develop gene therapies, especially for single-gene mutation disorders. The FDA has more than 700 active INDs for gene and cell therapies, and in 2017 it approved two cell-based gene therapies (chimeric antigen receptor T-cells, CAR-T) as well as the first gene-therapy product to be administered in vivo, which was also the first to target a specific rare-disease genetic condition. Collins and Gottlieb, of the NIH and FDA respectively, have stated that 'it seems reasonable to envision a day when gene therapy will be a mainstay of treatment for many diseases'. There are unique challenges associated with gene therapy trials, especially in indications that are rare diseases. These challenges include small patient numbers, lack of detailed knowledge of the disease progression, and definition of suitable endpoints. During the presentation, we will discuss how the analysis of Real-World Data can provide insight and help overcome these challenges, and discuss some of the limitations that reduce its acceptance by the regulatory authorities. We will give careful consideration to the following statistical aspects of a trial: definition of the study population, given the likely phenotypic heterogeneity of the disease, based on data from registry or natural history studies of different disease stages/severity; use of controls, including historical control data; endpoint choice; and identification and validation of suitable biomarkers for accelerated approval.
Reporting and Data Visualization
DV-002 : Order, Order! Four Ways To Reorder Variables with SAS®, Ranked by Elegance and Efficiency.
Louise Hadden, Abt Associates Inc.
Monday, 8:00 AM - 8:20 AM, Location: Franklin 4
SAS® practitioners are frequently required to present variables in an output data file in a particular order, or standards may require variables in a production data file to be in a particular order. This paper and presentation offer several methods for reordering variables in a data file, encompassing both DATA step and procedural methods. Relative efficiency and elegance of the solutions will be discussed.
DV-003 : With a Trace: Making Procedural Output and ODS Output Objects Work For You
Louise Hadden, Abt Associates Inc.
Monday, 11:00 AM - 11:20 AM, Location: Franklin 4
The Output Delivery System (ODS) delivers what used to be printed output in many convenient forms. What most of us don't realize is that "printed output" from procedures (whether the destination is PDF, RTF, or HTML) is the result of SAS® packaging a collection of items that come out of a procedure that most people want to see in a predefined order (aka template.) With tools such as ODS TRACE, PROC CONTENTS and PROC PRINT, this paper explores the many buried treasures of procedural output and ODS output objects and demonstrates how to use these objects to get exactly the information that is needed, in exactly the format wanted.
DV-005 : Back to the Future: Heckbert's Labeling Algorithm
Chris Smith, Cytel
Monday, 8:30 AM - 8:50 AM, Location: Franklin 4
There are several different axis tick mark algorithms in existence. We will discover one of these by time travelling back to 1990 to learn about Heckbert's nice numbers algorithm for labeling graph axes. Then, we will travel back to the future to proactively apply this algorithm to clinical data, using ODS Graphics and DYNAMIC variables from the Graph Template Language (GTL). Lastly, we explore other uses of DYNAMIC variables to provide data-driven solutions to graphical outputs. SAS® 9.4 M2 was used in the examples presented. This paper is written for the intermediate to advanced SAS users. In particular, it assumes familiarity with the SGPLOT procedure and GTL.
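As a rough illustration of the "nice numbers" idea mentioned in this abstract, the following Python sketch follows the loose-labeling scheme as it is commonly described from Heckbert's 1990 write-up. It is not the paper's SAS/GTL implementation; the function names `nice_num` and `loose_label` and the default of five ticks are assumptions made for this example.

```python
import math

def nice_num(x, do_round):
    """Return a 'nice' number (1, 2, 5 or 10 times a power of ten) near x.

    With do_round=True the nearest nice number is chosen; otherwise the
    smallest nice number that is >= x is returned.
    """
    exp = math.floor(math.log10(x))   # power of ten just below x
    f = x / 10 ** exp                 # fractional part, 1 <= f < 10
    if do_round:
        nf = 1 if f < 1.5 else 2 if f < 3 else 5 if f < 7 else 10
    else:
        nf = 1 if f <= 1 else 2 if f <= 2 else 5 if f <= 5 else 10
    return nf * 10 ** exp

def loose_label(lo, hi, nticks=5):
    """Tick positions that enclose [lo, hi] using a nice step size."""
    rng = nice_num(hi - lo, False)
    step = nice_num(rng / (nticks - 1), True)
    graph_min = math.floor(lo / step) * step
    graph_max = math.ceil(hi / step) * step
    nfrac = max(-math.floor(math.log10(step)), 0)  # decimals worth showing
    count = int(round((graph_max - graph_min) / step))
    return [round(graph_min + i * step, nfrac) for i in range(count + 1)]

ticks = loose_label(105, 543)
print(ticks)
```

For data ranging from 105 to 543, the sketch picks a step of 100 and ticks from 100 to 600, so the axis fully encloses the data.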
DV-021 : Applying an Experimental GTL Feature to CONSORT Diagrams
Shane Rosanbalm, Rho, Inc.
Monday, 9:00 AM - 9:20 AM, Location: Franklin 4
SAS added an experimental feature to the TEXTPLOT statement in GTL as part of 9.4 M3. When the OUTLINE option was invoked, this experimental feature allowed the user to capture a dataset with information about where the outline was being drawn using the OUTFILE and OUTID options. This paper is about the application of the experimental OUTFILE and OUTID options in an attempt to make the creation of CONSORT diagrams a little less labor intensive.
DV-024 : Free Duplicates Here! Get Your Free Duplicates!
Kristen Harrington, Rho, Inc.
Monday, 9:30 AM - 9:50 AM, Location: Franklin 4
In clinical trials, it is common to produce an overall summary as well as several subset versions of that summary. The overall summary is typically referred to as "unique" and the subset versions as "duplicates". For traceability, a separate program is required for each individual output; cramming the creation of the unique and duplicates into a single program won't pass muster. The most common approach to this problem is to create a macro. This macro is called by each of the separate unique and duplicate programs to produce the output files. The brute force method to create these separate programs is manually copying the unique program once for each duplicate to be produced, renaming files and changing values of macro subsetting parameters. This paper will explore an automated approach to creating the duplicate SAS programs.
DV-090 : Effective Graphical Representation of Tumor Data
Sanjay Matange, SAS, LinkedIn
Monday, 1:30 PM - 1:50 PM, Location: Franklin 4
In recent months, there has been an increasing interest in combining the "Duration of Treatment" data with the "Tumor Response" information for subjects in a study in one graph. Traditionally, this information has been displayed in separate graphs where the subjects may be sorted by different criteria. In such a case, the investigators have to work harder to associate the subject across the graphs. Displaying the data together, sorted by the tumor response makes it easier for the investigators to understand this information. This paper will show you how to build effective graphical representation of tumor data using SAS and how these graphs can be extended to display additional subject data.
DV-131 : An Innovative Efficacy Table Programming to Automate Its Figure Generation to Ensure Both High Quality and Efficiency
Xiangchen (Bob) Cui, Alkermes Inc.
Letan (Cleo) Lin, Alkermes Inc.
Monday, 10:00 AM - 10:50 AM, Location: Franklin 4
Efficacy table programming and the corresponding figure programming are a key part of Statistical Programming to support the Clinical Study Report (CSR). Typically, efficacy table programming and its figure programming are two totally independent processes. However, they share common SAS code to generate statistics for table reporting and figure creation, namely the reading of efficacy ADaM datasets to subset the records and select the population, and the calling of SAS statistical procedures to generate statistics. Hence, consistency between efficacy table programming and its figure programming is crucial to achieving quality. Since programming for CSR reporting may undergo many changes until very late in the preparation stage, maintaining this consistency requires a lot of resources. This paper presents a new approach that changes these two parallel processes into a "sequential" process by leveraging efficacy table programming from the "common SAS code" to output extra permanent SAS datasets, which are directly used in figure programming for the automation of figure creation. "Consistency" is then maintained automatically. Furthermore, the workload for validating figure programming can be dramatically reduced from double programming to a less resource-intensive process. There is a growing recognition that the multiple imputation (MI) method can be used to handle missing values in clinical trials. The approach can dramatically reduce SAS program running time and helps the programming team's final delivery, especially key data readouts. We illustrate it by providing examples of forest plots from subgroup analysis to show how it automates the creation of figures efficiently.
DV-164 : Heat Map and Map Chart using TIBCO Spotfire®
Ajay Gupta, PPD
Monday, 11:30 AM - 11:50 AM, Location: Franklin 4
TIBCO Spotfire is an analytics and business intelligence platform which enables data visualization in an interactive mode. Users can create heat maps and map charts using built-in functions in Spotfire. The easiest way to understand a heat map is to think of a cross table or spreadsheet which contains colors instead of numbers. The default color gradient sets the lowest value in the heat map to dark blue, the highest value to a bright red, and mid-range values to light gray, with a corresponding transition (or gradient) between these extremes. Heat maps are well-suited for visualizing large amounts of multi-dimensional data and can be used to identify clusters of rows with similar values, as these are displayed as areas of similar color. Patterns in heat maps are clear, because colors are used to display the frequency of observations in each cell of the graph. Map charts can also be useful, for example to show population density on a world map. This paper will demonstrate some basic heat maps and map charts created using Spotfire.
DV-169 : The Power of Data Visualization in R
Oleksandr Babych, Experis Clinical
Monday, 3:30 PM - 3:50 PM, Location: Franklin 4
The ability to build beautiful and meaningful graphs is a valuable skill for the data analyst. R has become a popular programming language in the field of data analysis. Among many of its advantages, it has two essential ones: it is easy to learn and it has a powerful visualization package - ggplot2. Therefore, it wouldn't require a great amount of time learning how to make high-end graphs in R. In this article we will take a look at how to build graphs via ggplot2 and we will consider its underlying concept called the Grammar of Graphics. The central idea of ggplot2 consists in constructing plots by combining different layers on top of each other and using aesthetic mappings to define the graph. The main objective of this paper is to highlight the advantages of this approach by providing various examples and to demonstrate how R can be a powerful tool in the skill set of any data analyst.
DV-184 : Figure it out! Using significant figures from reported lab data to format TLF output
Elizabeth Thomas, Everest Clinical Research, Inc.
Lauren Williams, Everest Clinical Research, Inc.
Monday, 2:30 PM - 2:50 PM, Location: Franklin 4
Laboratory data comes from a variety of sources with differing levels of precision and often needs to be manipulated and/or transformed before the results are reported in a table, listing, or figure. While reporting precision is well-established for many standard lab parameters, pre-specification of reporting precision for a novel lab value can be challenging. One option is to let the data, as reported, determine the precision. This paper gives a short refresher on the rules of significant figures (sig figs). You will learn how to calculate sig figs given a character-formatted result, how to keep track of sig figs for transformed data (sums, averages, ratios, and transformation by a constant), and how to format a result given an unrounded value and sig fig.
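The two basic operations this abstract describes (counting sig figs in a character-formatted result, and formatting an unrounded value to a given number of sig figs) can be sketched in Python as follows. This is an illustration only, not the paper's SAS code. The helper names are hypothetical, and the treatment of trailing zeros in whole numbers (counted as not significant) is an assumption, since that case is ambiguous under the usual rules.

```python
import math

def count_sig_figs(text):
    """Count significant figures in a character-formatted numeric result.

    Assumption: trailing zeros in a value with no decimal point
    (e.g. "1200") are NOT treated as significant.
    """
    s = text.strip().lstrip("+-")
    digits = s.replace(".", "").lstrip("0")   # leading zeros never count
    if "." not in s:
        digits = digits.rstrip("0")           # ambiguous case, see docstring
    return max(len(digits), 1)                # report "0" as one sig fig

def format_sig(value, n):
    """Format an unrounded value to n significant figures."""
    if value == 0:
        return "0"
    exponent = math.floor(math.log10(abs(value)))
    decimals = n - 1 - exponent               # may be negative for large values
    rounded = round(value, decimals)
    return f"{rounded:.{max(decimals, 0)}f}"

print(count_sig_figs("0.0120"), count_sig_figs("12.50"), count_sig_figs("1200"))
print(format_sig(3.14159, 3), format_sig(0.012345, 2), format_sig(1234.5, 2))
```

The sketch does not handle scientific notation, and a value that rounds up across a power of ten (such as 9.99 to two sig figs) would need an extra adjustment step.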
DV-214 : DOMinate your ODS Output with PROC TEMPLATE, ODS Cascading Style Sheets (CSS), and the ODS Document Object Model (DOM)
Louise Hadden, Abt Associates Inc.
Troy Hughes, Datmesis Analytics
Wednesday, 9:30 AM - 9:50 AM, Location: Franklin 4
SAS® practitioners are frequently required to produce SAS output in mandatory formats, such as using a company logo, corporate or regulated government templates, and/or a cascading style sheet (CSS). SAS provides several tools to enable the production of customized output. Among these tools are the ODS Document Object Model, cascading style sheets, PROC TEMPLATE, and ODS style overrides (usually applied in procedures and/or in originating data). This paper and presentation look "under the hood" of the Output Delivery System destinations and PROC REPORT, and investigate how mastering ODS TRACE DOM and controlling styles with the CSSSTYLE= option, PROC TEMPLATE, and style overrides can satisfy client requirements and enhance ODS output.
DV-220 : How to Build a Complicated Patient Profile Graph by Using Graph Template Language: Turn Mystery to a LEGO Game
Ruohan Wang, Ultragenyx
Chris Qin, Ultragenyx
Monday, 2:00 PM - 2:20 PM, Location: Franklin 4
Patient profiles typically include various data that are associated with each other. A visual report, such as a graphical patient narrative, can improve the readability of correlated data, meeting reviewers' needs and helping them interpret clinical results and findings efficiently. Such reports are in demand for submissions and publications in the pharmaceutical industry today. Patient profile graphs usually combine multiple forms of data that share a common factor such as a time frame. Graph Template Language (GTL) can generate such complicated graphs with power and flexibility. However, GTL might be a mystery to graph beginners and may strike them as time-consuming to learn. This paper works as an easy-to-follow building instruction. It shows you how to become a master builder of patient profile graphs from scratch by using GTL like playing a LEGO game. A couple of projects from daily work are given as examples using dummy but practical data. The examples are generated by SAS® 9.4 and are of publication quality.
DV-276 : Not That Dummy Data
Yuliia Bahatska, PRA Health Sciences
Monday, 3:00 PM - 3:20 PM, Location: Franklin 4
When it comes to data visualization it is expected that the data is presented the way it is stored in the analysis dataset. Labels may be changed, formats may be applied, or some derivations may be made, but definitely programmers are not expected to add the values to the data. But is it always true? SAS has quite a wide range of ways to create figures and usually at least one of the graphical procedures or GTL can provide you with the required output, but sometimes you still need a workaround. In this paper I will concentrate on the cases when in order to present the data in the desired way it is helpful to add artificial or dummy values.
DV-285 : Forest Plots for Beginners
Savithri Jajam, Covance
Olesya Masucci, Covance
Wednesday, 9:00 AM - 9:20 AM, Location: Franklin 4
For years, forest plots have been created and published in various forms, and frequently used over the last decade because of their popularity. They are very useful plots in that they can be used to display the results of meta-analysis, subgroup analysis, and sensitivity analysis on log scale and linear scale. Although programming techniques have been improved, it is still very difficult to create them. This paper will cover the history of forest plots, explaining how and when forest plots were invented, what a forest plot is, and why they have maintained their popularity. It will then go on to explain step by step the essentials for creating forest plots using the Graph Template Language (GTL) with SAS.
DV-297 : Building Automations for Generating R and SAS Code Supporting Visualizations Across Multiple Therapeutic Areas
Anastasia Alexeeva, Eli Lilly and Company
Mei Zhao, Eli Lilly and Company
William Martersteck, Eli Lilly and Company
Wednesday, 10:30 AM - 10:50 AM, Location: Franklin 4
Besides the usual portfolio work statistical programmers perform throughout the course of a clinical trial, an organization may ask that study teams create additional safety and efficacy analyses or reports across numerous trials. The motivation for these requests may be to understand clinical data better, and to use results of analyses to make important business decisions about a compound or indication. In order to do this, analysts have to write new programs that generate the outputs, create and validate input datasets, and write up the specifications for input datasets and outputs. Whether individual datasets or analyses are executed in SAS, R, or other software depends on the preference of the programmer or the business need. In this paper, we demonstrate an example of how to automate the generation of SAS and R programs by the use of a trial-specific file that contains global variable assignments. An R program reads this file and passes its contents to functions that create all the R, SAS, and Rmd files needed for the project. The global variables provide the key study-specific information to populate the program headers, as well as everything needed to create and validate the data, and generate the output and the specifications for proper documentation. This strategy allows us to seamlessly leverage the benefits of R and SAS in one project, and offers other teams the opportunity to easily apply reference code to another study without having to make tedious modifications to many programs in order to generate results.
DV-323 : Fine-tuning your swimmer plot: another example from oncology
Steve Almond, Bayer Inc.
Wednesday, 11:00 AM - 11:20 AM, Location: Franklin 4
Swimmer plots are an effective graphical presentation of longitudinal data such as periods of treatment/observation, occurrences of events, and the durations of events or subject status. These types of plots are particularly popular for oncology trials, where the treatment and follow-up periods are displayed along with tumor assessment results and the duration of various responder criteria. The ease of creating swimmer plots has increased in successive SAS versions within the ODS Graphics procedures. This paper reviews the basics for constructing the elements of such plots, and provides tips and tricks for implementing aesthetic touches simply with the main SGPLOT procedure (SAS 9.4) statements and without the use of annotations or the Graphics Template Language.
DV-332 : Visualize Overall Survival and Progression Free Survival at the Same Time!
Kriss Harris, SAS Specialists Ltd.
Wednesday, 10:00 AM - 10:20 AM, Location: Franklin 4
Usually we produce Kaplan-Meier plots to show the Overall Survival (OS) and/or the Progression-Free Survival (PFS) profile. It is rare that we see the OS and the PFS on the same graph; the endpoints are usually in two different files. This paper will show you how to produce an animated visualization that you can use to visualize the OS and PFS together which can help you to understand the treatment efficacy better.
DV-349 : Power Up Your Reporting Using the SAS® Output Delivery System
Chevell Parker, SAS
Monday, 4:00 PM - 4:20 PM, Location: Franklin 4
Making sense of a large amount of data is one of the most important aspects of a reporting system. Reporting helps you and others in your organization discover important insights into trends, business strengths and weaknesses, and the overall health of a company. Therefore, report output should be in a format that anyone can understand easily. To create such output, you need to use the correct reporting tools. This paper, written for data analysts, discusses techniques to power up (amplify) the effectiveness of your reporting. These techniques use SAS® Output Delivery System (ODS) destinations (especially the ODS Excel destination) to generate functional, presentation-ready Microsoft Excel worksheets.
Statistics and Analytics
ST-058 : Logistic and Linear Regression Assumptions: Violation Recognition and Control
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine
Karlen Bader, Henry M Jackson Foundation for the Advancement of Military Medicine
Tuesday, 8:00 AM - 8:50 AM, Location: Franklin 2
Regression analyses are one of the first steps (aside from data cleaning, preparation, and descriptive analyses) in any analytic plan, regardless of plan complexity. Therefore, it is worth acknowledging that the choice and implementation of the wrong type of regression model, or the violation of its assumptions, can have detrimental effects on the results and future directions of any analysis. Considering this, it is important to understand the assumptions of these models and be aware of the processes that can be utilized to test whether these assumptions are being violated. Given that logistic and linear regression techniques are two of the most popular types of regression models utilized today, these are the ones that will be covered in this paper. Some logistic regression assumptions that will be reviewed include: dependent variable structure, observation independence, absence of multicollinearity, linearity of independent variables and log odds, and large sample size. For linear regression, the assumptions that will be reviewed include: linearity, multivariate normality, absence of multicollinearity and auto-correlation, homoscedasticity, and measurement level. This paper is intended for any level of SAS® user. This paper is also written to an audience with a background in theoretical and applied statistics, though the information within will be presented in such a way that any level of statistics/mathematical knowledge will be able to understand the content.
ST-059 : Regulation Techniques for Multicollinearity: Lasso, Ridge, and Elastic Nets
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine
Karlen Bader, Henry M Jackson Foundation for the Advancement of Military Medicine
Tuesday, 1:30 PM - 2:20 PM, Location: Franklin 2
Multicollinearity can be briefly described as the phenomenon in which two or more identified predictor variables are linearly related, or codependent. The presence of this phenomenon can have a negative impact on an analysis as a whole and can severely limit the conclusions of a research study. In this paper, we will briefly review how to detect multicollinearity, and once it is detected, which regularization techniques would be the most appropriate to combat it. The nuances and assumptions of L1 (Lasso), L2 (Ridge Regression), and Elastic Nets will be covered in order to provide adequate background for appropriate analytic implementation. This paper is intended for any level of SAS® user. This paper is also written to an audience with a background in theoretical and applied statistics, though the information within will be presented in such a way that any level of statistics/mathematical knowledge will be able to understand the content.
ST-060 : Jump Start your Oncology knowledge
Xiaoyin Zhong, GlaxoSmithKline
Feng Liu, AstraZeneca Pharmaceuticals
Tuesday, 9:00 AM - 9:20 AM, Location: Franklin 2
For pharmaceutical programmers in oncology, it is critical to understand and interpret the disease jargon and statistical analysis in order to communicate effectively with statisticians, physicians/clinicians, and other study personnel. The purpose of this paper is to introduce key concepts, study design, and analysis in oncology to jump start the knowledge base of new oncology programmers or experienced programmers who are new to oncology. This paper will highlight the definition of oncology, the uniqueness of oncology drug development, and emerging treatment options for oncology, as well as an overview of study design including master protocol design. In addition, oncology efficacy evaluation and data standards will be presented.
ST-081 : Let's Flip: An Approach to Understand Median Follow-up by the Reverse Kaplan-Meier Estimator from a Programmer's Perspective
Nikita Sathish, Seattle Genetics, Inc.
Chia-Ling Ally Wu, Seattle Genetics, Inc.
Wednesday, 9:00 AM - 9:20 AM, Location: Franklin 2
In time-to-event analysis, sufficient follow-up time to capture enough events is the key element in achieving adequate statistical power. Achieving an adequate follow-up time may depend on the severity and prognosis of the disease. The median follow-up is the median observation time to the event of interest, which is an indicator of the length of follow-up. There are several methods to calculate median follow-up, and we have chosen to use the more robust and code-efficient reverse Kaplan-Meier (KM) estimator in our paper. Median follow-up is a less commonly presented descriptive statistic in reporting survival analysis results, which can make it challenging to understand. This paper aims to explain the concept of median follow-up, its statistical interpretation both numerically and visually, and the use of the SAS® LIFETEST procedure to draw survival plots and compute the survival function using the reverse KM estimator. We present a simple and robust approach to calculating median follow-up using the reverse Kaplan-Meier estimator by flipping the meaning of event and censor (Schemper and Smith, 1996), i.e., event becomes censor while censor becomes the endpoint.
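To make the "flip" concrete, here is a small pure-Python sketch of a Kaplan-Meier median with the event indicator reversed. It is not the paper's PROC LIFETEST code. The function names are hypothetical, and the median is simplified to the first time at which the estimated survival drops to 0.5 or below, without any averaging or confidence limits.

```python
def km_median(times, events):
    """Smallest time at which the Kaplan-Meier estimate is <= 0.5.

    events: 1 = event observed, 0 = censored. Returns None if the
    estimate never reaches 0.5.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data if tt == t]   # all subjects with time t
        deaths = sum(tied)
        if deaths > 0:
            surv *= 1 - deaths / at_risk          # KM product-limit step
            if surv <= 0.5:
                return t
        at_risk -= len(tied)
        i += len(tied)
    return None

def median_follow_up(times, events):
    """Reverse KM: censored subjects become 'events' and vice versa."""
    flipped = [1 - e for e in events]
    return km_median(times, flipped)

times = [2, 4, 6, 8, 10]
events = [1, 0, 1, 0, 0]       # 1 = event, 0 = censored (toy data)
print(median_follow_up(times, events))
```

In this toy data the ordinary KM median is not reached, while the reverse KM estimate gives a median follow-up of 8, which shows why the two quantities answer different questions.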
ST-091 : Equivalence, Superiority and Non-inferiority with Classical Statistical Tests: Implementation and Interpretation
Marina Komaroff, Noven Pharmaceuticals
Tuesday, 10:00 AM - 10:20 AM, Location: Franklin 2
Determining superiority of an experimental treatment versus a standard of care has been a popular objective of randomized controlled trials in the pharmaceutical industry. Superiority is determined by the statistical and clinical significance of a clinical endpoint. However, researchers often question why non-significant p-values cannot be viewed as evidence that the two treatments were equivalent. They might say: if the p-value < 0.05 we assume the null hypothesis is false, so it must also follow that if the null hypothesis is true, the p-value must be ≥ 0.05. It is not clear whether this reasoning is a poor attempt to salvage a failed study or reflects a misunderstanding of null hypothesis testing. This paper will clarify the meaning of p-values and demonstrate how p-values can be used to conclude equivalence (no difference) in treatment effects. There are many papers that review statistical methods for analyzing equivalence, superiority and non-inferiority trials utilizing the POWER, TTEST, TOST and FREQ procedures in SAS/STAT® software. Nonetheless, subtle and confusing issues arise in the application and interpretation of such methods. The author will present simulated data to visually demonstrate how changes in boundary margins affect sample size and power calculations. The goal is to help researchers thoughtfully choose boundary parameters, plan the operating characteristics of an equivalence/inferiority clinical trial, and correctly interpret the results. The paper will provide guidance not only for implementing such studies, but also for better understanding the designs when critically reviewing published research that utilized such methods.
ST-096 : A macro of evaluating the performance of the log-rank test using different weight for enrichment studies
Chuanwu Zhang, University of Kansas
Byron Gajewski, University of Kansas Medical Center
Jianghua(Wendy) He, University of Kansas Medical Center
Tuesday, 10:30 AM - 10:50 AM, Location: Franklin 2
The log-rank test is a traditional nonparametric hypothesis test often used to compare the time-to-event data of two arms. Different weight functions applied to the log-rank test lead to different tests, e.g., the log-rank, Gehan, Peto-Peto, Fleming-Harrington, etc. Little is known regarding the performance of the log-rank test with different weight functions when one arm contains subgroups with different hazard rates, which can be seen in enrichment studies. For example, in a study with two arms and two subgroups, compared to the control arm, one treatment subgroup may benefit gradually, while the other treatment subgroup may benefit immediately and then lose the benefit. How can we know which weight function gives the most efficient (i.e., best-performing) log-rank test for analyzing such data? In this paper, we will develop a macro to examine the performance of the log-rank test using different weight functions in terms of empirical power based on simulated data. Our macro contains three parts. First, data are simulated with a basket of factors: the sample sizes and number of simulations, the type of distribution for the data, the proportion of subgroups, the proportion of censoring within each arm, and the pattern of the hazard function for a subgroup. Second, the macro executes the log-rank test using different weight functions. Finally, the testing results are presented in a table and a graph to compare the performances. Users can invoke our macro to determine which weight function in the log-rank test should be used to meet their needs.
ST-103 : SAS® V9.4 MNAR statement for multiple imputations for missing not at random in longitudinal clinical trials
Lingling Li, Independent
Tuesday, 11:00 AM - 11:20 AM, Location: Franklin 2
Missing data is a common problem in longitudinal clinical trials. The primary analysis commonly used in clinical trials relies on the untestable assumption of missing at random (MAR), and a sensitivity analysis under plausible assumptions of missing not at random (MNAR) is needed to examine the robustness of the statistical inference obtained from the primary analysis against departures from the MAR assumption. Multiple imputation within the framework of pattern mixture models (PMM) is widely used for implementing sensitivity analysis under MNAR assumptions. Two MNAR assumptions, control-based pattern imputation and delta adjustment, are regarded as clinically plausible, transparent, and easy to implement. SAS® Version 9.4 PROC MI provides an MNAR statement, with two options, MODEL and ADJUST, that allows convenient implementation of the two assumptions. The MNAR statement works with the MONOTONE statement to handle monotone missing data and with the FCS statement to handle arbitrary missing data. This paper discusses the implementation of sensitivity analysis under MNAR assumptions within the PMM framework in longitudinal clinical trial data using the MNAR statement. Simulated longitudinal clinical trial data and sample SAS code are described for illustration.
ST-149 : Application of R Functions in SAS to Estimate Dose Limiting Toxicity Rates for Early Oncology Dose Finding
Huei-Ling Chen, Merck & Co., Inc.
Zhen Zeng, Merck & Co., Inc.
Tuesday, 11:30 AM - 11:50 AM, Location: Franklin 2
The pool-adjacent-violators algorithm (PAVA) is a solution to isotonic regression used to estimate dose limiting toxicity (DLT) rates in early oncology trials. In the current version of SAS, there is no ready-to-use statistical procedure to implement PAVA. However, there are multiple R packages available for PAVA estimation. In most pharmaceutical companies, SAS is the mainstream working environment for statistical analysis and reporting. It would be very helpful to exploit the bridge between SAS and R to take advantage of both worlds: R's versatile statistical analysis tools, and SAS's well-validated existing statistical procedures and mature reporting system. This paper presents a SAS macro to accomplish this collaborative task by embedding the available R PAVA function into a SAS program. This paper provides details on three important features used to streamline the process of integration. The first feature is how to prepare a SAS environment which enables SAS code to communicate with R. The second feature is how to import and export data between SAS and R; key PROC IML syntax will be provided for demonstration. The third feature is how to make an R package recognize a SAS macro parameter value so that the R package can easily become a nested sub-macro inside a SAS macro. This macro has been implemented in real-life projects to support the reporting of DLT summary statistics.
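For readers unfamiliar with PAVA itself, the following is a minimal weighted pool-adjacent-violators sketch in Python. It is unrelated to the R packages or the SAS macro the paper describes. The function name `pava` and the toy DLT counts are assumptions made for this illustration.

```python
def pava(y, w=None):
    """Weighted isotonic (non-decreasing) fit via pool-adjacent-violators.

    y: observed values in dose order; w: weights (e.g. patients per dose).
    Returns the fitted values, one per input.
    """
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the monotone order is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

# Toy example (hypothetical data): DLTs / patients at four dose levels.
dlt = [0, 2, 1, 3]
n = [3, 6, 6, 6]
observed = [d / k for d, k in zip(dlt, n)]
print(pava(observed, n))
```

Here the observed rates at the second and third dose levels (2/6 and 1/6) violate monotonicity, so they are pooled to 3/12 = 0.25, while the first and last estimates are unchanged.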
ST-160 : Experiences in Building CDISC Compliant ADaM Dataset to Support Multiple Imputation Analysis for Clinical Trials
Xiangchen (Bob) Cui, Alkermes Inc.
Tuesday, 2:30 PM - 3:20 PM, Location: Franklin 2
Multiple imputation (MI) is becoming an increasingly popular method to address the missing data problem in regulatory clinical trials, especially when the outcome variables come from repeated assessments. The SAS procedures PROC MI and PROC MIANALYZE apply multiple imputation techniques to generate multiple imputations for incomplete multivariate data and to analyze results from multiply imputed data sets, respectively. Using PROC MI to build a CDISC-compliant ADaM dataset to support MI analysis is a new ADaM programming technique. This paper illustrates, through an example, how to apply the ADaM BDS data structure to build such a dataset. We will not present how to use PROC MI, for it has been very well explained in the SAS user manual and other papers. However, we do provide a short tutorial on related statistical concepts to help statistical programmers better understand this procedure and apply it in ADaM programming. We mainly focus on the ADaM programming logic flow and key variable derivations for the imputed data, including ADaM specification writing and the independent validation process. The tips and pitfalls provided in this paper could save time and help you achieve technical accuracy and operational efficiency in your programming. The hands-on experiences shared in this paper are intended to help readers prepare CDISC-compliant ADaM datasets to facilitate MI analysis in regulatory clinical trials, and further to support FDA submission.
ST-176 : Practical Perspective in Sample Size Determination
Bill Coar, Axio Research
Tuesday, 3:30 PM - 4:20 PM, Location: Franklin 2
Experiments advance science. They are designed to answer specific questions. A well designed experiment accompanied by appropriate statistical methodology should yield scientifically sound results to answer specific questions with a reasonable amount of certainty. To achieve this, the concepts of type 1 error, power, and sample size are introduced into the experimental design. While these abstract concepts are based solely on assumptions, they are critical to the integrity of study results. SAS® provides numerous procedures to assist with these parts of experimental design. The designs in clinical research range in complexity, as do the SAS procedures that support them. This presentation will approach sample size determination from a practical perspective. Even if a set of assumptions is reasonable, it may not result in a feasible sample size. We use a placebo-controlled clinical study to demonstrate how the sample size and study design can evolve due to practical considerations. The endpoint is 28-day survival for an often fatal medical condition. It is assumed that 50% of patients will die within 28 days under the standard of care. However, a medical procedure could possibly increase survival to 80%. This presentation will discuss the use of PROC POWER, PROC SEQDESIGN, and even simulation to assist with the study design, and how practical considerations may cause the design to evolve.
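As a rough illustration of the starting point for such a design, the sketch below applies the standard normal-approximation formula for comparing two proportions to the 50% versus 80% survival scenario. It is not the output of PROC POWER; the two-sided alpha of 0.05, the power targets, and the function name `n_per_arm` are assumptions, since the abstract does not state them.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treatment) / 2
    # Variance term under the null (pooled) and under the alternative.
    null_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * sqrt(
        p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    )
    return ceil((null_term + alt_term) ** 2 / (p_control - p_treatment) ** 2)


# 28-day survival: 50% under standard of care, 80% hoped for with the procedure.
n80 = n_per_arm(0.50, 0.80, power=0.80)
n90 = n_per_arm(0.50, 0.80, power=0.90)
print(n80, n90)
```

Rerunning the function with a smaller assumed improvement shows how quickly the required sample size grows, which is the kind of practical pressure that can force a design to evolve.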
ST-183 : Cluster Analysis - What it is and How to use it
Alyssa Wittle, Covance
Michael Stackhouse, Covance
Tuesday, 4:30 PM - 4:50 PM, Location: Franklin 2
A Cluster Analysis is a great way of looking across several related data points to find possible relationships within your data which you may not have expected. The basic approach of a cluster analysis is to do the following: transform the results of a series of related variables into a standardized value such as Z-scores, then combine these values and determine if there are trends across the data which may lend the data to divide into separate, distinct groups, or "clusters". A cluster is assigned at a subject level, to be used as a grouping variable or even as a response variable. Once these clusters have been determined and assigned, they can be used in your analysis model to observe if there is a significant difference between the results of these clusters within various parameters. For example, is a certain age group more likely to give more positive answers across all questionnaires in a study or integration? Cluster analysis can also be a good way of determining exploratory endpoints or focusing an analysis on a certain number of categories for a set of variables. This paper will instruct on approaches to a clustering analysis, how the results can be interpreted, and how clusters can be determined and analyzed using several programming methods and languages, including SAS and Python. Examples of clustering analyses and their interpretations will also be provided.
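The standardize-then-group approach described above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' SAS or Python code: the questionnaire scores are made up, k-means is one clustering method among many, and initializing the centers with the first `k` subjects is a simplification chosen to keep the example deterministic.

```python
from statistics import mean, pstdev


def z_scores(column):
    """Standardize one variable to mean 0 and standard deviation 1."""
    mu, sd = mean(column), pstdev(column)
    return [(x - mu) / sd for x in column]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, iterations=20):
    """Basic k-means; returns one cluster label per point."""
    # Assumption: the first k points serve as the initial cluster centers.
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [mean(dim) for dim in zip(*members)]
    return labels


# Hypothetical total scores on two questionnaires for six subjects.
q1 = [10, 12, 11, 30, 32, 31]
q2 = [5, 6, 5, 20, 22, 21]
points = list(zip(z_scores(q1), z_scores(q2)))
clusters = k_means(points, k=2)
print(clusters)
```

The resulting label per subject can then be used as a grouping variable in a downstream model, as the abstract describes.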
ST-213 : Statistical Assurance in SAS: An Introduction and User's Guide
Jonathan L Moscovici, IQVIA
Milena Kurtinecz, GlaxoSmithKline
Tuesday, 5:00 PM - 5:20 PM, Location: Franklin 2
A long-standing concept in the planning of clinical trials has been that of statistical power. However, the traditional calculation of power requires specification of several unknown quantities, such as the treatment effect between groups. The power obtained is therefore a conditional quantity that can be misleading if the underlying assumptions in the calculation are inaccurate. This leads to a miscalculation of the required sample size and other important trial determinants. Statistical assurance is a method of calculating the unconditional probability of success of a trial by assigning probability distributions to unknown parameters such as the treatment effect, rather than just one "best guess" estimate. These distributions can be based on expert opinion, available pilot data and clinical considerations. Although not a new concept, statistical assurance has been adopted slowly in the biopharmaceutical community, partly due to the lack of software implementation. This paper will introduce the basic concepts, clinical cases and implementations in SAS.
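The distinction between conditional power and assurance can be illustrated with a small Monte Carlo sketch. This is not the paper's SAS implementation: it assumes a two-arm trial with a normally distributed endpoint, a known standard deviation, a normal prior on the treatment effect, and "success" defined as a significant result in favor of treatment. All numeric inputs are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power(delta, sigma, n_per_arm, alpha=0.05):
    """Power of a two-sided z-test given a fixed true effect `delta`.

    The negligible chance of rejecting in the wrong direction is ignored.
    """
    se = sigma * sqrt(2 / n_per_arm)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(delta / se - z_crit)


def assurance(prior_mean, prior_sd, sigma, n_per_arm, draws=20000, seed=2019):
    """Average the power over draws of the effect from a normal prior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        delta = rng.gauss(prior_mean, prior_sd)
        total += power(delta, sigma, n_per_arm)
    return total / draws


# Hypothetical design: best-guess effect 5, prior SD 3, outcome SD 10, 64 per arm.
conditional_power = power(5, sigma=10, n_per_arm=64)
unconditional = assurance(5, 3, sigma=10, n_per_arm=64)
print(round(conditional_power, 3), round(unconditional, 3))
```

In this setup the assurance is noticeably lower than the power computed at the single best-guess effect, because the prior places weight on smaller effects where the trial is unlikely to succeed.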
ST-259 : Enhancing Randomization Methodology Decision-Making with SAS Simulations
Kevin Venner, Almac Clinical Technologies
Jennifer Ross, Almac Clinical Technologies
Graham Nicholls, Almac Clinical Technologies
Kyle Huber, Almac Clinical Technologies
Wednesday, 9:30 AM - 9:50 AM, Location: Franklin 2
This paper will illustrate how SAS is an effective tool to conduct simulations for making data-driven randomization methodology decisions. SAS can be used to develop simulation programs that investigate the expected treatment balance and other randomization goals by comparing various randomization methods (stratified blocked randomization, minimization, etc.) and associated parameters (block size, biased-coin probability, etc.). A case study will be provided to show how simulations can evaluate the expected treatment balance for the different randomization methodologies and parameterizations being considered, and how simulations can investigate other randomization goals, such as the minimum number of subjects that must be randomized at a site to ensure both treatment arms are represented. This paper will illustrate how SAS simulation programs can be developed with configurable macros that are readily adapted for each individual protocol. By including macros in the SAS simulation programs, different randomization design scenarios are efficiently simulated through minor macro variable updates, allowing for swift delivery of statistically sound results. Incorporating study-specific details (expected subject / strata distributions) can enhance the precision of the results. SAS macro programming allows for re-evaluation of varying subject distributions as a means of testing the robustness of the simulation results. Treatment balance for a clinical trial can be critical for establishing treatment effectiveness. The components of the randomization design that impact treatment balance (e.g., methodology, stratification factors / levels, block size, etc.) should be carefully considered at the protocol design stage. Simulation results can help make impactful design decisions to achieve the optimal treatment balance and randomization goals.
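The kind of comparison described above can be sketched in a few lines. The example below is a simplified Python illustration, not the authors' SAS macros: it compares simple (coin-flip) randomization with permuted-block randomization for a single stratum with two arms, and the sample size, block size, number of simulations, and function names are all assumptions.

```python
import random


def simple_randomization(n, rng):
    """Assign each subject to arm A or B with equal probability."""
    return [rng.choice("AB") for _ in range(n)]


def permuted_block(n, rng, block_size=4):
    """Assign subjects using shuffled blocks with equal A/B counts."""
    assignments = []
    while len(assignments) < n:
        block = list("A" * (block_size // 2) + "B" * (block_size // 2))
        rng.shuffle(block)
        assignments.extend(block)
    # The final block may be only partially used.
    return assignments[:n]


def mean_abs_imbalance(method, n, simulations, seed, **kwargs):
    """Average |count(A) - count(B)| over repeated simulated trials."""
    rng = random.Random(seed)
    total = 0
    for _ in range(simulations):
        arms = method(n, rng, **kwargs)
        total += abs(arms.count("A") - arms.count("B"))
    return total / simulations


simple = mean_abs_imbalance(simple_randomization, n=50, simulations=2000, seed=1)
blocked = mean_abs_imbalance(permuted_block, n=50, simulations=2000, seed=1,
                             block_size=4)
print(simple, blocked)
```

Changing `n`, `block_size`, or the randomization function plays the same role as updating macro variables in the approach the paper describes: each scenario is rerun with a small parameter change and the balance metrics are compared.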
ST-314 : Application of Criterion I2 in Clinical Trials Using SAS®
Igor Goldfarb, Accenture
Mitchell Kotler, Accenture
Wednesday, 11:30 AM - 11:50 AM, Location: Franklin 2
The goal of this paper is to demonstrate how meta-analysis criteria (specifically, I2 and Cochran's Q) can be calculated using SAS® software and applied to the examination of heterogeneity across subgroups. Some European regulatory authorities require an analysis of subgroup effects estimated using the criterion I2, which was developed and suggested by Higgins and his colleagues to provide a researcher with a better measure of the consistency between trials in a meta-analysis. Originally the criterion I2 was developed to estimate heterogeneity across studies, whereas the authorities required its application to subgroups of the same study. The criterion I2 describes the percentage of total variation across studies that is due to heterogeneity rather than chance. Negative values of I2 are set equal to zero so that I2 lies between 0% and 100%. A value of 0% indicates no observed heterogeneity, and larger values show increasing heterogeneity. SAS® programs were developed to implement the algorithm to calculate I2 and similar parameters (e.g., Cochran's Q). An analysis showed that the SAS® code successfully reproduced published results that were produced using other statistical software. After verification, the SAS® code was run on the clinical data and the results obtained were successfully submitted to the authority.
ST-321 : PROC MIXED: Calculate Correlation Coefficients in the Presence of Repeated Measurements
Qinlei Huang, Merck & Co., Inc.
Radha Railkar, Merck & Co., Inc.
Wednesday, 10:30 AM - 10:50 AM, Location: Franklin 2
In drug and medical device development, it is often necessary to evaluate the correlation between two technologies, platforms, or devices in the setting of a clinical trial with repeated measurements. The solution to this problem has been studied by many authors. Bland and Altman (1995) considered the problem by separating the correlation into two components: between-subject and within-subject correlations. Lam, Webb and O'Donnell (1999) approached this problem by using maximum likelihood estimation in the case where the replicate measurements are linked over time. Roy (2006, 2015) solved the problem by considering different correlation structures on the repeated measures. This paper first reviews the statistical methods to estimate Pearson's correlation coefficient between two measures in settings where multiple observations are available on the same subject, and then presents how to use PROC MIXED in SAS to obtain the parameter estimates of interest. It includes example SAS code and macros, as well as examples of hands-on data analysis and outputs. Keywords: PROC MIXED, Correlation Coefficient, Repeated Measurements
ST-325 : Machine Learning Approaches to Identify Rare Diseases
Xuan Sun, Ultragenyx
Ruohan Wang, Ultragenyx
Wednesday, 11:00 AM - 11:20 AM, Location: Franklin 2
Rare diseases are more difficult to identify and diagnose than other diseases, since there are not enough data or experts in rare diseases. Better availability of patient data and improvements in machine learning algorithms empower us to tackle this problem computationally. In this paper, we adapt state-of-the-art machine learning algorithms, such as K-nearest neighbors, Support Vector Machine, Neural Networks and Naive Bayes, to make this classification. We use R to train and test the models. We find that, using these machine learning methods, we can identify people with rare diseases with a low misclassification rate.
ST-339 : An Introduction to the Process of Improving a Neural Network
YuTing Tian, Student
Wednesday, 10:00 AM - 10:20 AM, Location: Franklin 2
The top-level goal of this paper is to lay out a process for building a neural network in SAS®. It is hoped that a reader can use the process shown in this paper as a template for building a deep neural network. A lower-level goal is to build a network that can outperform a network in a paper by one of my professors. Deep learning is a kind of neural network and a specific kind of machine learning (e.g. artificial intelligence). Deep learning is a recent and powerful machine learning algorithm that enables a computer to build a multi-layer non-linear model. Even though deep neural networks are popular, not many papers discuss the overall process of building a neural network in SAS. This paper explores a practical application of the process of building a deep neural network in SAS Enterprise Miner.
ST-348 : Comparing SAS® Viya® and SAS® 9.4: How Their Features Complement Each Other
Amy Peters, SAS
Tuesday, 9:30 AM - 9:50 AM, Location: Franklin 2
SAS® Viya® extends the SAS® Platform in a number of ways and has opened the door for new SAS software to take advantage of its capabilities. SAS® 9.4 continues to be a foundational component of the SAS Platform, not only providing the backbone for a product suite that has matured over the last forty years, but also delivering direct interoperability with the next-generation analytics engine in SAS Viya. Learn about the core capabilities shared between SAS Viya 3.4 and SAS 9.4, and about how they are unique. See how the capabilities complement each other in a common environment. In addition to these core capabilities, see how the SAS software product lines stack up within each one, including analytics, visualization, and data management. Some products, like SAS® Visual Analytics, have one version aligned with SAS Viya and a different version with SAS 9.4. Other products, like SAS® Econometrics, leverage the in-memory, distributed processing in SAS Viya while at the same time including SAS 9.4 functionality like Base SAS® and SAS/ETS®. Still other products target one engine or the other. Learn which products are available on each, and see functional comparisons between the two. In general, gain a better understanding of the similarities and differences between these two engines behind the SAS Platform, and the ways in which products leverage them.
Strategic Implementation, Business Administration, Support Resources
SI-017 : Avoiding Disaster: Manager's Guide on How to Rescue a Failing Outsourced Project
Dilip Raghunathan, Insmed
Monday, 8:00 AM - 8:50 AM, Location: Franklin 2
Outsourcing has become a large part of the business model for many pharmaceutical companies. It allows the sponsor to focus on its core competencies, keep its workforce lean and scale up when needed. However, an unfortunate by-product of this growing trend is the occasional failure of the outsourced vendor to meet the timelines and/or quality of the deliverable. The failure to meet agreed objectives on a deliverable results in loss of time, money, resources and morale, and strains the relationship between the sponsor and the vendor. Furthermore, such failures in studies/projects that are critical to the organization end up severely hurting the chances of regulatory approval and affect time to market. In this paper, I will present a tested, step-by-step pragmatic approach, from the perspective of the sponsor, to identifying a failed outsourced project, and provide a mechanism to rescue and/or salvage the work. I will also discuss ways to prevent such failures in the future and increase the chances of success when outsourcing critical studies/projects.
SI-062 : Get to the Meat on Machine Learning
Aadesh Shah, GlaxoSmithKline
Monday, 11:30 AM - 11:40 AM, Location: Franklin 2
You've probably heard of machine learning and artificial intelligence, but are you sure you know what they are? If you're struggling to make sense of them, you're not alone. There's a lot of buzz that makes it hard to tell what's science and what's science fiction. For many of us, machine learning seems futuristic and scary. Recently, though, it has been showing up everywhere, and we hear many new presentations about machine learning at conferences like PhUSE, CDISC Interchange and, most famously, PharmaSUG. YouTube knows which videos you would like to watch in your home section, Facebook recommends local events in your area as well as friends, and LinkedIn recommends that you connect with your ex-boss. And while that's all exciting, some of us are still wondering what exactly machine learning is. This paper will walk you through the basics of the process, how it works in practice, and machine learning versus artificial intelligence.
SI-075 : SHIONOGI Global SAS System Renewal Project - How to Improve Statistical Programming Platform
Yura Suzuki, Shionogi & Co., Ltd.
Yoshimi Ishida, Shionogi Digital Science
Malla Reddy Boda, Shionogi Inc.
Yoshitake Kitanishi, Shionogi & Co., Ltd.
Monday, 3:00 PM - 3:20 PM, Location: Franklin 2
The SHIONOGI Global SAS (G-SAS) System is a statistical programming platform that supports Shionogi & Co., Ltd. (Japan) and Shionogi Inc. (US) in developing and validating statistical deliverables of clinical trials such as SDTM, ADaM and TLFs. G-SAS is the core system for efficient collaboration on programming activities between Japan and the US, and it enables around-the-clock operation. Increasing dataset sizes, multiple languages and various statistical analyses were causing shortages of memory and hard disk space. The SHIONOGI G-SAS System Renewal Project was established to address this and to upgrade from the current SAS 9.2 to SAS 9.4 on a new physical server with higher specifications and 64-bit architecture, able to handle both SAS versions and larger datasets. In addition, the method of access control for specific folders after database lock for double-blind studies was reconsidered, in order to control user access to analysis results that include actual treatment codes and potentially unblinded data, and so avoid the risk of insider trading. To avoid this potential risk, the G-SAS Project Team developed a SAS macro to remove user groups from, or grant them access to, specific folders. This macro has reduced the workload of the System Operation Team. As a result, SHIONOGI's global statistical programming platform has been improved so that it can handle large datasets and multiple versions of SAS, and can control access rights to specific folders and users efficiently. The G-SAS System continues to contribute to the collaboration of statistical programming activities between multiple locations in SHIONOGI, and to the preparation of data submissions to regulatory agencies of various countries, including FDA and PMDA.
SI-099 : ISS Challenges and Solutions for a Compound with Multiple Submissions in Parallel
Aiming Yang, Merck & Co., Inc.
Monday, 11:00 AM - 11:20 AM, Location: Franklin 2
An Integrated Summary of Safety (ISS) is required for new and supplemental drug or biologic applications. For an oncology compound with multiple indications and tens of ongoing trials, the challenges in finding sound strategies for integrating trials, and in executing them, are numerous. In this paper, we share some challenges and successfully executed solutions. There are two parts in the paper. In part I, we share the challenges and solutions of establishing a centralized ISS team. To address the needs of the multiple submissions and filings of this compound, we first established a dedicated ISS team. The specialization enables the same team to work on multiple ISS packages, with successive or concurrent timelines and turnarounds of several weeks. In part II, we share some essential established programming techniques. The ISS programs we use can accommodate multiple versions of SDTM source datasets and meet the needs of today's CDISC standards filing requirements. The presented techniques include 1) stacking datasets in each SDTM version and then integrating all studies. The validated stacking programs are re-used, making the stacking of dozens of trials a routine re-run that can be accomplished within one hour. 2) Alignment of the ISS implementation for consistency with the submission CSR, and integration needs. The discussion includes details on integrating at the SDTM or ADaM level, baseline derivation, and general principles for deriving essential variables such as TR01SDT, TR01EDT, and TRT01A. It is satisfying to see the team deliver again and again while working towards simplified, streamlined solutions.
SI-142 : Sponsor oversight: Proof is in the documents
Shailendra Phadke, Servier US
Monday, 5:00 PM - 5:20 PM, Location: Franklin 2
An increasing number of biotech and pharmaceutical companies are conducting clinical trials by either partially or completely outsourcing clinical trial activities to third party organizations (e.g. Contract Research Organizations (CROs) and Clinical Trial Units (CTUs)). There is a clear requirement according to GCP (Good Clinical Practice) that the sponsor must have systems and procedures in place to ensure adequate sponsor oversight. Failure to comply in this area can result in critical findings in regulatory inspections and can also prevent the organization from sponsoring any further trials until the issues are resolved. This paper will discuss in detail the different documents that the biostatistics and statistical programming team at the sponsor can use as evidence of sponsor oversight. These documents will help the sponsor conduct oversight and will improve the sponsor's inspection readiness. Examples of such documents are issue logs, QC plans, standard checklists used to check the quality of the deliverables, data review plans, etc. This paper will also briefly discuss good documentation practices for maintaining these documents and how they can improve the effectiveness of sponsor oversight and provide evidence of it.
SI-161 : Lessons Learned from Teaching 250+ Life Science Analytics (LSAF) Classes to Our Colleagues at Janssen Research and Development.
Margie Merlino, Janssen Research and Development
Jeanne Keagy, Janssen Research and Development
Monday, 11:45 AM - 11:55 AM, Location: Franklin 2
In 2014 Janssen purchased Statistical Drug Development (SDD), later renamed Life Sciences Application Framework (LSAF), from SAS Institute and successfully implemented the system into their Data Management (DM) processes for converting raw clinical trial study data into SDTM datasets. The following year, Janssen expanded the use of LSAF to their Data Analysis (DA) processes for creating ADaM datasets, TFLs, and ad-hoc analyses and simulations with study data. Prior to the use of LSAF, Janssen's DA group used Windows SAS and an in-house developed user interface to manage the activities related to clinical trial deliverables and ad-hoc analyses. Using LSAF for both DM and DA had many advantages; for example, both groups could access one central repository to view all study data, interim and database locks. However, despite the success of LSAF within DM, the DA transition to the LSAF environment did not go well. From 2016 through 2018, Jeanne and Margie taught over 250 LSAF classes to their co-workers at Janssen Pharmaceutical Research company. Jeanne conducted both the DM and DA training and Margie conducted DA training. They will present some of the technical and not-so-technical lessons learned from teaching students from a variety of technical and scientific backgrounds how to develop their SAS programs and jobs in the cloud-based LSAF environment.
SI-168 : Embedded Processes - evolving face of QUALITY in the world of Robotic Process Automation (RPA).
Charan Kumar Kuyyamudira Janardhana, Ephicacy Lifesciences Analytics
Monday, 10:00 AM - 10:50 AM, Location: Franklin 2
Systems and processes have been the backbone of any emerging and existing organization in the clinical domain. The password to live a disease-free life is the 'most wanted' i.e., 'drug'. In search of this 'most wanted' fugitive, drug companies are investing in their best possible detectives. Quality and compliance management is the key to ensuring subject safety and data integrity in our domain, in line with the regulatory requirements. Evolution from a paper-based quality system to electronic systems led to the emergence of regulations and guidelines such as 21 CFR Part 11 and GxP. As we evolve further towards robotic process automation (RPA), the problem statement is how quality will be perceived and implemented. RPA does not involve any form of physical robots; instead, software robots mimic human activities by interacting with applications in the same way that a person does. Everyone will have the tools to configure their own software robots to put an end to automation challenges. Ethics and human resources will be important, along with risk and change management, feedback management and root cause analysis, from a compliance oversight point of view. Transparency, cybersecurity and platform resilience are the critical risk areas requiring high impact controls. Enhancing process efficiency and efficient methodologies for compliance will be important in the phases of development, testing, deployment and integration, and in owning the process too. The focus of regulatory bodies, employment opportunities, scope for innovation and organizational strategy to embrace the new culture of 'embedded process' will be the focus of this paper.
SI-190 : Integrating programming workflow into computing environments: A closer look
Tyagrajan Swaminathan, Ephicacy Lifesciences Analytics
Sridhar Vijendra, Ephicacy Lifesciences Analytics
Monday, 2:30 PM - 2:50 PM, Location: Franklin 2
System-level workflows are well known to improve process compliance, increase traceability and enhance audit trails in various domains including document management. It is time programming workflows are made integral to the domain of statistical programming for clinical trials to enhance productivity, reduce management overhead, ease tracking of large programming deliverables produced by globally dispersed teams and consequently, reduce time-to-market for every drug. System-level workflows are mapped based on the corresponding real-world processes that they model but conventional statistical computing environments (SCEs) that most programmers are used to hardly allow for automatic record keeping of processes taking place within the environment. What is recorded in a disparate programming status tracker has scope for human error since there is a possible lag and/or a gap between what is done in the programming environment and what is mentioned in the status tracker. This is where modern-day web or cloud-based SCEs make a difference, by bringing in integrated customizable workflows to streamline statistical programming activities. There is a need for workflows that integrate with our programming environment to keep track of programming tasks and record their completion without convoluted user actions. But, is it possible to exactly mirror the real-world processes within such programming environments? Are human-error-prone processes suitable for machine-compatible checks or will we be left to force-fit workflows into systems that do not want to co-operate?
SI-243 : Dataset Specifications: Recipes for Efficiency and Quality
Dave Scocca, Rho, Inc.
Monday, 1:30 PM - 2:20 PM, Location: Franklin 2
Specifications are the backbone of dataset programming; quality specifications are the recipes with which we produce quality datasets. While we frequently talk about programming strategies and techniques, we spend much less time focusing on dataset specifications. Having high quality specifications can greatly improve programmer efficiency and speed up validation. Developing specifications and programs hand-in-hand can reduce overall review time and avoid a great deal of unnecessary rework. Questions addressed in this presentation include: How do dataset specifications fit into the clinical trial process? What makes a high quality dataset specification? How can we make the overall process of dataset production- from specifications through programs- as efficient as possible? Are there ways to standardize the specification development process?
SI-257 : Using freelancers in the programming world - Challenges and opportunities
Vijay Moolaveesala, PPD
Monday, 9:00 AM - 9:20 AM, Location: Franklin 2
Ever since the industry opened its gates to a global resource pool more than a decade ago, I have been wondering when we would widely embrace the concept of using freelancers in the programming area. The industry has been using freelancers in clinical, statistical and scientific communications for some time. Even though it may not be widespread, the idea of using freelancers in these areas has at least been given an ear; the same cannot be said of freelancing in programming. There are many challenges in using a temporary workforce, whether through onsite/offsite contractors or staff augmentation models, and adding freelance resourcing on top of these challenges can be even more difficult to navigate. At the same time, when the industry is spending a lot of time recruiting, training and retaining top programming talent, every resourcing option should be on the table. The author would like to propose approaches and areas of service that could use freelancers more effectively, and to provide a framework to make this model a sustainable one.
SI-292 : Throw Away the Key: Blockchain-ed Healthcare Data
Kathy Zhai, GlaxoSmithKline
Monday, 4:00 PM - 4:20 PM, Location: Franklin 2
After following SAS blogs and other social media outlets that correspond to the latest pharmaceutical trends, the words "blockchain" and "bitcoin" have been prevalent. How are any of these words associated with healthcare and patient data? Like other programmers working in the pharma industry, my curiosity grew beyond just procs and data steps. Many can agree that the credibility of clinical outputs can be undermined by a plethora of common issues including incomplete, missing, or inaccurate data. After multiple layers of data manipulation, how is it certain that what is being submitted to publications is an undistorted version of the benefits and risks of these drugs? This paper will give the audience a glimpse of how blockchain technology, whose implementation is cryptographically validated by a network, has enough potential and momentum to emerge into the healthcare industry and stick around for quite some time.
SI-300 : Don't Just Rely on Processes; Support Local Subject Matter Expertise
Deidre Kreifels, Reata Pharmaceuticals
Steve Kirby, Reata Pharmaceuticals
Mario Widel, Independent
Monday, 4:30 PM - 4:50 PM, Location: Franklin 2
Since the dawn of time (or at least since we started analyzing data from clinical trials) companies engaged in clinical research have developed processes designed to ease the burden of programmatic data review, data standardization and data analysis tasks. These processes typically include standards for input data, standards for outputs and code to get from one to the other. In many cases people who have a limited understanding of the process (or the subject matter) can quickly generate outputs that are right most of the time. But how can we efficiently manage cases where the process does not produce accurate results? And how can we be sure that people following the process are able to evaluate whether the results are accurate? A big part of the answer is ensuring that process users have sufficient subject matter expertise. If users understand (in detail) how the process works, and what it is designed to accomplish, they will be able to address most cases where the process does not work or produces inaccurate results. We will share a few representative examples of where subject matter expertise is needed to ensure a process works as intended and produces accurate results. We will also suggest training methods designed to help process users gain the subject matter expertise they need to be successful.
SI-320 : Vendor's Guide to Consistent, Reliable, and Timely CDISC Deliverables
Dharmendra Tirumalasetti, Vita Data Sciences
Santosh Lekkala, Vita Data Sciences
Bhavin Busa, Vita Data Sciences
Monday, 9:30 AM - 9:50 AM, Location: Franklin 2
When working for a CRO or an FSP vendor, programmers have to work on multiple clinical studies across different Sponsor companies and various data collection systems. In addition, even though the submission standards are common across the industry, the requirements and expectations for CDISC deliverables can differ from one Sponsor to another. The differences can be based on multiple factors such as therapeutic areas, internal data standards, and study-specific needs. The interpretation and handling of the data may also differ between Sponsor companies. If the vendor does not understand and document these differences early, they can cause rework at a later stage, which adds up to delayed deliverables, rework time and extra cost. To meet the Sponsor's specific needs and expectations, it is highly recommended to have and follow effective processes within the organization. This paper will describe the processes we follow at our organization before and during specification development and programming of the CDISC datasets in order to achieve consistent, reliable and timely deliverables, thus benefiting both the Sponsor and the vendor. This paper also provides details on the pre-processing steps that can be followed before writing the specification, the quality checks and data handling during development of the datasets, and the informed notes provided when delivering datasets to the Sponsor to ensure the expected outcome.
SI-327 : Begin with the End of Validation: Adapting QbD Approach in Statistical Programming to Achieve Quality and Compliance Excellence
Linghui Zhang, PRA Health Sciences
Monday, 3:30 PM - 3:50 PM, Location: Franklin 2
In many ways, quality and compliance are essential in the pharmaceutical industry. Statistical programming covers a variety of data management, analysis, and reporting activities across the entire data flow in drug development, including preclinical and clinical research, regulatory submission, and post-marketing surveillance. Quality and compliance are both crucial components of statistical programming. To achieve high quality, a validation process is developed to identify data issues, and it is usually performed before data and data-related products are finally released. However, the validation process is time- and resource-consuming. It is quite challenging to fix data issues after validation in complex clinical trial programs, under tough timelines and temporary workforce shortages. In order to "get it right the first time", Quality-by-Design (QbD), a process-oriented method, was applied to manage risks in quality and compliance, and to advance product and process quality in statistical programming. QbD is a systematic approach to development that begins with predefined objectives and emphasizes product and process understanding and control based on sound science and quality risk management. QbD is also a regulatory expectation. FDA and ICH have published several guidelines to direct pharmaceutical manufacturers. The paper discusses adapting the QbD approach in statistical programming to prospectively identify critical issues relevant to quality and compliance. The QbD elements and steps will be introduced, followed by the challenges of implementing the QbD approach in statistical programming. This paper is intended for programming leads and managers, but programmers at all levels can benefit from learning the QbD approach.
Submission Standards
SS-014 : China NMPA (National Medical Products Administration) reform and new regulations/guidelines/requirements
Yi Yang, Novartis
Monday, 11:30 AM - 11:50 AM, Location: Salon A
Historically, the regulatory environment in China has been highly challenging. However, since 2015 the China health authority has been reforming the regulatory environment to bring China's medical products up to international standards in terms of efficacy, safety and quality, so as to better meet public needs for drugs and to improve access to innovative drugs and therapies from around the world. Reforms are building smoother processes for innovative drug development by adopting global standards and technical requirements, increasing review and approval transparency, and accelerating new drug review and approval. The China health authority has refined old regulations to clearly define the requirements for clinical trial operation, multi-regional clinical trial design, biostatistics principles, electronic data capture, data management and statistical analysis reporting, and on-site inspection. It has also released guidelines for drug development covering communication for drug development and technical evaluation, electronic common technical document implementation, and post-approval safety surveillance. It has also released regulations, including priority review and approval, a data protection regime, imported drug registration and new chemical drug classification, to encourage innovative drug development. The reform of the China regulatory environment is ongoing. Since the reform, more INDs and NDAs have been submitted for innovative drugs developed locally or globally than ever before. China's regulatory requirements will be further aligned with global standards and requirements. Simultaneous development and approval with the US and Europe can be achieved in the near future.
SS-027 : Progression-Free Survival (PFS) Analysis in Solid Tumor Clinical Studies
Na Li, Na Li Clinical Programming Services
Monday, 4:30 PM - 4:50 PM, Location: Salon A
Progression-free survival (PFS) is commonly used as a primary endpoint in Phase III solid tumor oncology clinical studies. PFS is defined as the time from randomization or start of study treatment until objective tumor progression or death, depending on the study protocol. This paper describes the PFS concept and its analysis methods following the ADaM standard to generate the ADDATES and ADTTE analysis datasets. This paper also discusses some of the challenges encountered in defining progression (PD) events and censoring events. In addition, this paper explains some statistical methods that are commonly used to estimate the distribution of PFS duration. Such methods include PROC LIFETEST to provide Kaplan-Meier estimates and PROC PHREG to provide the hazard ratio estimate.
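For readers unfamiliar with the Kaplan-Meier estimate mentioned in this abstract, the following is a minimal sketch of the product-limit calculation. It is written in Python purely for illustration and is not the paper's SAS code or a substitute for PROC LIFETEST. The function name `kaplan_meier` and the sample durations are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : PFS durations (any consistent unit)
    events : 1 = progression or death observed, 0 = censored
    """
    events_at = Counter(t for t, e in zip(times, events) if e == 1)
    leaving_at = Counter(times)      # events plus censored records per time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = events_at.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone with this time leaves the risk set afterwards.
        at_risk -= leaving_at[t]
    return curve


# Hypothetical PFS data in months (illustration only).
pfs_months = [2, 3, 3, 5, 8, 8, 12]
event_flag = [1, 1, 0, 1, 0, 1, 0]
km_curve = kaplan_meier(pfs_months, event_flag)
for t, s in km_curve:
    print(f"t={t:>2}  S(t)={s:.3f}")
```

Censored subjects contribute to the risk set up to their censoring time but never trigger a drop in the curve, which is why the censoring rules discussed in the abstract matter so much for the resulting estimate.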
SS-030 : Clinical Development Standards for FDA Bioresearch Monitoring (BIMO) Submissions
Denis Michel, Janssen Research and Development
Julie Maynard, Janssen Research and Development
Monday, 10:00 AM - 10:20 AM, Location: Salon A
The Food and Drug Administration (FDA) published the Bioresearch Monitoring Technical Conformance Guide in February 2018. The document provides specifications for clinical data submission by pharmaceutical companies used in planning of FDA Bioresearch Monitoring (BIMO) inspections. Three types of information are required: clinical study level information, subject level data line listings by clinical site, and a summary level clinical site dataset. The clinical study level information are PDF files related to the clinical trial. The subject level data line listings by clinical site are typically generated as SAS programmed PDF files. The guide states that FDA will be able to generate the listings in the future from submitted clinical datasets compliant with CDISC SDTM and ADaM standards. The summary level clinical site dataset is provided as a SAS V5 transport file named clinsite.xpt with a data definition table named define.pdf. This paper describes the effort to standardize the generation of clinsite.xpt across different therapeutic areas of a pharmaceutical company. Topics include standardizing the SAS variable attributes via metadata, importing clinical site information from Excel to SAS, handling special characters in imported data, and developing standard SAS macros and programs that process data in different formats.
SS-056 : Practical Guidance for ADaM Dataset Specifications and Define.xml
Jack Shostak, DCRI
Monday, 3:30 PM - 4:20 PM, Location: Salon A
The goal of this paper is to provide practical guidance for how to specify ADaM analysis datasets within the confines of Define-XML nomenclature. This paper's guidance is presented in a tool- and software-agnostic manner. This paper is intended for individuals who specify and produce ADaM datasets and the associated define.xml file and who have little to moderate prior experience in doing so. A number of practical issues are explored. Source/derivation/comment text rendering is examined. Then the contrast between define file specifications and programmer ETL specifications is addressed. ADaM source and origin type is explored, followed by a deeper dive into parameter value level metadata. How to reconcile variable and parameter value level metadata is discussed. Further issues around comments, controlled terminology, dates, lengths, object order, and results metadata are explored to close out the paper.
SS-068 : Enforcing Standards in an Organization: A Practical 6 Step-Approach
Priscilla Gathoni, AstraZeneca Pharmaceuticals
Dany Guerendo Christian, STATProg Inc.
Monday, 5:00 PM - 5:20 PM, Location: Salon A
Is your organization struggling to enforce standards? Are complacency and siloed programming within functional units haunting your organization? This paper will explore a 6-step practical approach for organizations to assess standards using CDISC and FDA guidance, show the importance of standards, describe the possible repercussions to institute for lack of standards adherence, show the importance of a gatekeeper for capturing standards adherence metrics, and finally present a generic macro adherence utility for checking the usage of standards in a study folder. Clear communication about the location and type of standards available within an organization will help eliminate excuses for not using standards. Additionally, an explicit message on the value that standardization brings in increasing efficiency, reducing the need for mundane tasks, and utilizing resources efficiently is explored. Further, creative methods for encouraging the use of standards are mentioned in the paper. Who is the best suited person to be the gatekeeper in your organization? We investigate the role of a gatekeeper, which is crucial in bridging the gap between the data acquisition stage and the practical implementation of standards. Several utilities whose main purpose is to check adherence to standards are available. We present a generic utility that can be adopted with a few adjustments to complement your organization's platform. Furthermore, we propose that organizations periodically assess the effectiveness of the standards, tools, and utilities in use. In conclusion, we recommend that organizations utilize this 6-step approach and build on it to suit their standards enforcement needs.
SS-092 : A Practical Guide to the Issues Summary in the Data Conformance Summary of Reviewer's Guides
Gary Moore, Moore Computing Services, Inc.
Monday, 3:00 PM - 3:20 PM, Location: Salon A
In the Reviewer's Guides for SDTM (SDRG) and ADaM (ADRG), the Data Conformance Summary is based on validation tools like the Pinnacle 21 Validator (formerly OpenCDISC) and/or proprietary in-house validators. This paper will discuss the purpose of a Reviewer's Guide. Using real life examples, it will illustrate when issues should be resolved by data modifications and when it is appropriate to provide explanations for non-compliance. The paper will contain ideas on the proper way to describe non-compliance issues.
SS-117 : Have You Met Define.xml 2.0?
Christine McNichol, Covance
Tuesday, 9:00 AM - 9:20 AM, Location: Salon A
Define.xml is critical for a reviewer getting to know a study's datasets. The define.xml facilitates the building of this relationship with the study data from casual introduction through the innermost workings of the datasets. To provide the best possible define.xml for reviewer's use, it is important to be comfortable with define.xml and how it communicates information about the datasets. Though define.xml might be intimidating at first, fear not. Following the flow through define.xml to get to know the data is like getting to know a new friend. Define.xml reveals levels of information about the study datasets from structure to details about where the values came from, even linking to documentation of issues encountered and decisions made. The anatomy of a define.xml, its purpose, what is special about each section and what can be learned about the study datasets from each will be discussed. Meet define.xml - friend, not foe.
SS-120 : An Automated, Metadata Approach to Electronic Dataset Submissions
Janette Garner, Kite Pharma, A Gilead Company
Monday, 2:00 PM - 2:20 PM, Location: Salon A
In 2014, the Food and Drug Administration (FDA) provided guidance regarding Section 745A(a), an amendment to the Federal Food, Drug, and Cosmetic Act that requires regulatory submissions (eg, new drug applications [NDAs] or biologics license applications [BLAs]) to be submitted in electronic format. The guidance took effect at the end of 2016. This paper presents a metadata-driven solution that facilitates the generation of the dataset package for electronic dataset submission that is compliant with the FDA expectations based on published FDA guidance documents.
SS-126 : Considerations in Effectively Generating PK Analysis Input Datasets
Jianli Ping, Gilead Sciences, Inc.
Tuesday, 10:30 AM - 10:50 AM, Location: Salon A
Generating a high-quality pharmacokinetic (PK) analysis input or PK merge dataset in a timely manner is critical for PK parameter generation and downstream programming on CDISC compliant PK/PP datasets. While there is guidance for SDTM PC/PP and ADaM ADPC/ADPP detailed in the CDISC implementation guides, there is a lack of general standards for creating a PK analysis input dataset. The different formats of source data as captured from the CRF and PK Lab can result in further challenges during PK merge dataset creation. In this paper, general approaches for PK merge dataset creation are proposed that can facilitate PK analysis and generation of PK CDISC compliant datasets, and that can also serve as source data for NonMEM programming. This paper also discusses the core PK merge dataset variables, with commonly selected matrices, that should be included, and their derivation, through case studies and excerpts of SAS programs. Dataset checks after PK merge dataset generation are recommended, which can help identify possible data/programming issues.
SS-134 : Next Innovation in Pharma - CDISC data and Machine Learning
Kevin Lee, Clindata Insight
Monday, 1:30 PM - 1:50 PM, Location: Salon A
The most popular buzzword nowadays in the technology world is "Machine Learning (ML)." Most economists and business experts foresee Machine Learning changing every aspect of our lives in the next 10 years through automating and optimizing processes. This is leading many organizations, including drug companies, to explore and implement Machine Learning in their own businesses. The presentation will discuss how Machine Learning can lead the next innovation in pharma with CDISC data. The presentation will start with an introduction to the most innovative companies and how they innovate and lead the industry using Machine Learning and data. Then, the presentation will show how pharma can learn from them to innovate using Machine Learning and CDISC data. The presentation will also introduce the basic concepts of Machine Learning and the importance of data. In a Machine Learning/AI driven process, data is considered the most important component. 80 to 90% of the work in Machine Learning is preparing the data. Since FDA mandated submission in CDISC standards, all clinical trial data are prepared in CDISC SDTM and ADaM data formats. The presentation will show how CDISC data will be the perfect partner of Machine Learning for the next innovation in the pharmaceutical industry. Finally, the presentation will discuss how biometrics departments can prepare for the next innovation and lead this data-driven Machine Learning process in the pharmaceutical industry.
SS-153 : Exploring Common CDISC ADaM Conformance Findings
Trevor Mankus, Pinnacle 21
Monday, 2:30 PM - 2:50 PM, Location: Salon A
Having analysis data that is compliant with the CDISC ADaM standard is critically important for the regulatory review process. ADaM data are required to be provided in both FDA and PMDA submissions because the data allows those agencies to better understand the details of the performed analyses and reproduce the results for further validation. Validation of ADaM data is a primary focus for regulatory agencies so they can begin their review of the results. This presentation will review some of the more commonly occurring validation rules which were found across all of our customer data packages which were validated using our automated software and discuss potential reasons for why these rules fired.
SS-162 : Multiple Studies BIMO Submission Package - A Programmer's Perspective
Ramanjulu Valluru, Accenture
Harsha Dyavappa, Accenture
Tuesday, 8:00 AM - 8:20 AM, Location: Salon A
As support documentation of its Bioresearch Monitoring (BIMO) activities, FDA's Center for Drug Evaluation and Research (CDER) requests that sponsors of new drug applications (NDAs), biologics license applications (BLAs), and NDA or BLA supplemental applications containing clinical data provide the following three items: I. Clinical Study-Level Information; II. Subject-Level Data Line Listings by Clinical Site; III. Summary-Level Clinical Site Dataset. Recent papers (Singh Kahlon et al., Lin et al.) give details on how to create the BIMO submission package containing the three items above. This paper focuses on submissions that span multiple studies and on aligning them with FDA's requirements in these situations. It expands on how to create a single clinsite dataset, define.xml, and BIMO reviewer's guide instead of one version of each of these documents per study. Specific consideration is given to the single clinsite dataset, define.xml, and BIMO reviewer's guide used to submit this package in eCTD module 188.8.131.52. We will share our experiences supporting successful FDA applications in several therapeutic areas, working on BIMO site-level data across multiple studies.
SS-172 : Pharmacokinetic Parameters for Sparse and Intensive Sampling - Nonclinical and Clinical Studies
Shallabh Mehta, PPD
Tuesday, 11:00 AM - 11:20 AM, Location: Salon A
Sparse sampling is very common in toxicokinetic studies, where a single blood sample can be collected on a given study day from each animal in a treatment group. A similar case can be seen in a clinical study where no more than one sample can be taken from a subject in each study or on each study day. The purpose of this paper is to present how the process of transforming pharmacokinetic (PK) parameters to the Clinical Data Interchange Standards Consortium (CDISC) Standard for Exchange of Nonclinical Data (SEND) Pharmacokinetics Parameters (PP) domain can be used for the CDISC Study Data Tabulation Model Version 1.5 (SDTM) PP domain, specifically how the pooled PK parameters are formatted to the SEND and SDTM PP domains using SEND Implementation Guide 3.1 and SDTM Implementation Guide 3.2.
SS-217 : Process optimization for efficient and smooth e-data submissions to both FDA and PMDA
Eri Sakai, Shionogi & Co., Ltd.
Malla Boda, Shionogi Inc.
Akari Kamitani, SHIONOGI & CO., LTD.
Yoshitake Kitanishi, SHIONOGI & CO., LTD.
Tuesday, 2:30 PM - 2:50 PM, Location: Salon A
Electronic data submission for NDAs to FDA is already mandatory, and it will also become mandatory for applications to PMDA after April 2020. We need to design a process for e-data submission to FDA and PMDA simultaneously with a single data package. SHIONOGI is headquartered in Japan and recently established group companies in the U.S. and Europe to promote drug development globally. As a result, we will be making e-data submissions not only to either FDA or PMDA but also to both authorities simultaneously. Currently, we prepare SDTM and ADaM datasets and the other related submission deliverables (e.g., define.xml, SDRG and ADRG) according to CDISC standards in all our clinical trials. We established a global programming process to ensure compliance with CDISC standards. Some rules for e-data submission differ between FDA and PMDA. Therefore, we currently prepare separate data packages for submission to FDA and PMDA. However, we think that this process is very inefficient. As a global company, we are trying to optimize the process so that we prepare only one package that meets the rules of both authorities, from the viewpoint of the greatest common divisor or least common multiple. Against that background, we would like to share our recent experience in establishing a global process for smooth e-data submissions to both authorities, and the results. In addition, we suggest an efficient way for Japan and U.S. team members to communicate as the project requires.
SS-226 : De-Identification of Data & It's Techniques
Shabbir Bookseller, Quartesian Clinical Research
Tuesday, 5:00 PM - 5:20 PM, Location: Salon A
In the past few years, with the emergence of modern technologies such as big data, privacy concerns have grown widely. Nowadays, sharing data is much easier than saying hello. De-identification is a tool that organizations can use to remove personal information from data that they collect, use, archive, and share with other organizations. For many reasons, not the least of which is patient privacy, any shared data must first be de-identified. De-identification is not a single technique for securing data, but a collection of approaches, algorithms, and tools that can be applied to various kinds of data with differing levels of effectiveness in protecting an individual's privacy. In general, privacy protection improves as more aggressive de-identification techniques are employed, but less utility remains in the resulting dataset. De-identification is especially important for government agencies, businesses, and other organizations, including the healthcare industry, that seek to make data available to outsiders. For example, significant medical research resulting in societal benefit is made possible by the sharing of de-identified patient information under the framework established by the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule, the primary US regulation providing for privacy of medical records. This paper provides an overview of data de-identification, data protection procedures, and de-identification techniques, i.e., k-anonymity, l-diversity and t-closeness.
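As a rough illustration of the k-anonymity idea named in this abstract, and of the privacy versus utility trade-off it describes, the sketch below measures k for a small record set before and after generalizing age into bands. It is an assumption-laden Python example, not the paper's method. The helper names (`k_anonymity`, `generalize_age`), the field names and the patient records are all hypothetical.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group sharing the same quasi-identifier values.

    A dataset is k-anonymous when every such group has at least k records.
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def generalize_age(record, width=10):
    """Replace an exact age with a band such as '30-39' (loses detail)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Hypothetical patient records (illustration only).
patients = [
    {"age": 34, "zip3": "191", "dx": "HTN"},
    {"age": 37, "zip3": "191", "dx": "DM"},
    {"age": 52, "zip3": "191", "dx": "HTN"},
    {"age": 58, "zip3": "191", "dx": "CHF"},
]

quasi = ["age", "zip3"]
k_raw = k_anonymity(patients, quasi)
generalized = [generalize_age(p) for p in patients]
k_generalized = k_anonymity(generalized, quasi)
print(k_raw, k_generalized)
```

With exact ages every record is unique on the quasi-identifiers, while 10-year bands put each record in a group of two, at the cost of no longer knowing the exact age.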
SS-240 : Sponsor Considerations for Building a Reviewer's Guide to Facilitate BIMO (Bioresearch Monitoring) Review
Kiran Kundarapu, Merck & Co., Inc.
Janet Low, Merck & Co., Inc.
Majdoub Haloui, Merck & Co., Inc.
Monday, 10:30 AM - 10:50 AM, Location: Salon A
CDER's Bioresearch Monitoring (BIMO) team has specific responsibility for verifying the integrity of clinical data submitted in regulatory applications and supplements, and for determining compliance of trial conduct in accordance with FDA regulations and statutory requirements. In the FDA Draft Guidance for Industry, CDER's BIMO inspectors and Office of Regulatory Affairs (ORA) identify sites of interest from all major pivotal studies within the submission. BIMO released a Technical Conformance Guide (TCG) in 2018 to facilitate site selection and review, but gave limited information for sponsors to consider when building a BIMO Reviewer's Guide. There is no available reviewer's guide template in industry. In addition, there is insufficient guidance on the types of information that should be included in a reviewer's guide. This paper will share a suggested structure and considerations for authoring a BIMO Reviewer's Guide.
SS-270 : Designing Flexible Data Standards Models
Melissa Martinez, SAS
Monday, 8:30 AM - 8:50 AM, Location: Salon A
When we think of CDISC submission data standards, we often focus on the tables and variables that the standards describe. We build products and business processes around creating these tables, but isn't there more to the story than just the submission data sets? This paper explores ways to design data standard models with data processing in mind. It also shows how to add to industry data standard models to define and standardize additional attributes, such as de-identification algorithms, variable calculations, and even blocks of SAS code.
SS-273 : The Need for Therapeutic Area User Guide Implementation
Michael Beers, Pinnacle 21
Tuesday, 10:00 AM - 10:20 AM, Location: Salon A
As regulatory agencies, specifically PMDA at the moment, begin to do cross-product analysis of their accumulated study data, the need for standardization increases. CDISC standards cover much of the data common to clinical trials, but gaps exist. Therapeutic Area User Guides (TAUGs) are created to fill some of these gaps. However, consistent implementation of these provisional guides is lacking, and this will impact the ability of regulatory agencies to analyze data across products. This paper will discuss some possible reasons for the slow adoption of TAUGs, attempt to show why it is increasingly important that the industry implement the TAUGs, and show how the implementation of the TAUGs could be enforced.
SS-291 : Information Requests During An FDA Review
Hong Qi, Merck & Co., Inc.
Lei Xu, Merck & Co., Inc.
Mary N. Varughese, Merck & Co., Inc.
Tuesday, 2:00 PM - 2:20 PM, Location: Salon A
Filing a marketing application is a pivotal and exciting milestone for the long-term effort of drug development. Even though there are industry standards to follow, submissions vary among drugs, indications, and sponsoring companies. Regardless of the extraordinary efforts in preparing the submission package, it is common for regulatory agencies to issue information requests (IRs) from the pre-supplemental biologics license application (BLA) meeting, during the approval review, and labeling process. IRs may arise from different aspects including the target indication, patient population, the drug safety profile, the reviewers' scientific interest on getting further information on the potential benefits of the medicine, additional case-report forms, and even the collected previous therapies not included in the ADaM datasets. This paper will discuss the data preparation and submission pertaining to IRs received from the pre-sBLA and during the sBLA review, the approaches we utilize, and the thoughts on future strategies.
SS-306 : Making Lab Toxicity Tables Less Toxic on Your Brain
Lindsey Xie, Kite Pharma, a Gilead Company
Jinlin Wang, Kite Pharma, a Gilead company
Jennifer Sun, Kite Pharma, a Gilead company
Rita Lai, Kite Pharma, a Gilead company
Richann Watson, DataRich Consulting
Tuesday, 4:30 PM - 5:20 PM, Location: Salon B
Processing and presenting lab data is always challenging, especially when lab limits are assessed in two directions. The lab data process becomes even more complicated when multiple baselines are required due to different analysis criteria or are inherent in the study design. This paper discusses an approach to creating lab toxicity grade variables in ADLB for bi-directional lab toxicity reports. They are a mix of variables defined by the Clinical Data Interchange Standards Consortium (CDISC) Analysis Data Model Implementation Guide (ADaMIG) v1.1 and draft ADaMIG v1.2 and variables defined by the sponsor, chosen to make ADLB easy to interpret and the related summary tables easy to produce. This paper is based on the lab Common Terminology Criteria for Adverse Events (CTCAE) toxicity grade summary, taking into account lab tests with abnormal assessments in either the increased or the decreased direction. In this paper, the authors explain and provide examples showing how ADaMIG v1.1 variables ATOXGR, BTOXGR, SHIFTy, ANLzzFL, MCRITy, and BASETYPE, draft ADaMIG v1.2 new variables ATOXGRH(L) and BTOXGRH(L), and sponsor-defined variables ATOXDIR and WAYSHIFT can be utilized and implemented appropriately. In addition, this paper explains how to handle baseline toxicity grade for analysis sets with more than one baseline in ADLB.
SS-309 : Expediting Drug Approval: Real Time Oncology Review Pilot Program
Laxmi Samhitha Bontha, Lamar University
Tuesday, 1:30 PM - 1:50 PM, Location: Salon A
The health care industry aims to provide the right treatment and immediate care for a patient. New drug approval in the United States takes an average of 12 years from pre-clinical testing to approval, with the approval process alone averaging around two and a half years. It is important to provide patients with new, potentially lifesaving therapies as early as possible. To achieve this, FDA launched the Real Time Oncology Review (RTOR) pilot program, which allows FDA to review data earlier, before the applicant formally submits the complete application. For a drug to be selected for evaluation under RTOR, it should meet criteria such as easily interpreted endpoints, a straightforward study design, substantial improvement over available therapy, and a previously granted breakthrough designation. The RTOR process is designed to take about 20 weeks. Currently, the RTOR pilot program is being used for supplemental applications for already-approved cancer drugs. FDA could later expand the pilot to new drug applications and original biologic license applications for cancer drugs. RTOR may encourage faster data publication and greater clarity of analysis. Patients, manufacturers and FDA all benefit from the RTOR scheme. This paper discusses how RTOR is carried out, the challenges faced in an RTOR, and why it is being used in the oncology therapeutic area.
SS-317 : Non-Clinical (SEND) Reference Guide for Clinical (SDTM) Programmers
Dharmendra Tirumalasetti, Vita Data Sciences
Bhavin Busa, Vita Data Sciences
Tuesday, 8:30 AM - 8:50 AM, Location: Salon A
The U.S. FDA now requires the use of standardized data submission, SEND (Standard for Exchange of Nonclinical Data), for non-clinical data. Many Sponsor companies have started preparing SEND datasets for their upcoming submissions, although they still lack the much-needed expertise to get their data submission-ready. In our experience, one of the reasons could be a lack of available resources/subject matter experts in the Non-clinical team within an organization. One solution to overcome the resourcing challenges is to utilize the existing pool of Clinical (SDTM) Programmers. In this paper, our intent is to provide a quick reference guide for Clinical (SDTM) Programmers to develop SEND domains for non-clinical studies. We will present commonalities and differences between SDTM and SEND domains. In addition, we will summarize our experience and lessons learned in mapping and standardizing non-clinical legacy studies to make them submission-ready.
SS-318 : How to use SUPPQUAL for specifying natural key variables in define.xml?
Sergiy Sirichenko, Pinnacle 21
Tuesday, 9:30 AM - 9:50 AM, Location: Salon A
Define.xml must identify natural keys for each dataset to specify uniqueness for records and sort order. Sometimes standard SDTM/SEND variables are not enough to completely describe the structure of collected study data. In this presentation, we will show examples and provide recommendations on when it is appropriate to use SUPPQUAL variables in the natural key and when to use other common alternatives. We will also provide guidance on how to document SUPPQUAL natural keys in define.xml and the Reviewer's Guide.
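The uniqueness requirement described in this abstract can be checked mechanically. The following is a minimal Python sketch, not the presenter's tooling, that reports key combinations occurring on more than one record. The function name `duplicate_keys`, the sample records and the supplemental qualifier `REPNUM` are hypothetical; it assumes the supplemental qualifier has already been merged onto the parent records.

```python
from collections import Counter


def duplicate_keys(rows, key_vars):
    """Return the key tuples that identify more than one record."""
    counts = Counter(tuple(row.get(v) for v in key_vars) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical lab records; REPNUM stands in for a supplemental qualifier
# already merged onto the parent domain (illustration only).
lab_rows = [
    {"USUBJID": "001", "LBTESTCD": "ALT", "LBDTC": "2019-01-05", "REPNUM": "1"},
    {"USUBJID": "001", "LBTESTCD": "ALT", "LBDTC": "2019-01-05", "REPNUM": "2"},
    {"USUBJID": "002", "LBTESTCD": "ALT", "LBDTC": "2019-01-06", "REPNUM": "1"},
]

standard_keys = ["USUBJID", "LBTESTCD", "LBDTC"]
dups_standard = duplicate_keys(lab_rows, standard_keys)
dups_with_supp = duplicate_keys(lab_rows, standard_keys + ["REPNUM"])
print(dups_standard)   # key combinations that are not unique
print(dups_with_supp)  # empty when the extended key is unique
```

In this made-up case the standard variables alone leave two records indistinguishable, and adding the supplemental qualifier to the key restores uniqueness.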
SS-328 : Framework for German Dossier Submissions
Kriss Harris, SAS Specialists Ltd.
Monday, 11:00 AM - 11:20 AM, Location: Salon A
When submitting your drug benefit assessments to the German Authority or other regulatory agencies, you need to provide your reports in a specific format. These reports are usually in a non-English language, and the characters will be different from the English characters that you are used to. The characters will contain accents and other special characters, and the numerical results will have commas where you expect decimal points to be, and vice versa. Usually, to provide this report, a medical writer will use Microsoft Word to copy the results from a Clinical Study Report (CSR) into the document and apply the appropriate formatting and translations. The copying has to be done very carefully, and the process is error-prone and exhausting. The German Authorities may also have follow-up questions, which makes this manual process inefficient. There is a way to make the process more automated, and this paper will demonstrate those methods. First, this paper will show you a framework you can use for submitting a German Dossier, including the time-to-event macros, subgroup analyses, and the processes that you can use to get your data in the right format. Second, this paper will show you how you can output the results in the exact format needed for German Dossiers. You will be shown how to use encoding to read and write special characters, and how to use Proc Report along with the style attributes to get your outputs in the correct format.
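The two formatting concerns described above, swapped decimal and thousands separators and special characters that depend on encoding, can be sketched briefly. The paper itself uses SAS and Proc Report; the Python below is only an illustrative stand-in, and the function name `format_german` and the sample values are assumptions made for this example.

```python
def format_german(value, decimals=2):
    """Format a number with '.' as thousands separator and ',' as decimal mark."""
    english = f"{value:,.{decimals}f}"  # e.g. 1,234.50
    # Swap the two separators in a single pass.
    return english.translate(str.maketrans(",.", ".,"))

hazard_ratio = format_german(0.05, 3)
patient_days = format_german(1234.5, 1)
negative = format_german(-1234.567, 2)

# Special characters survive only if text is written and read with the
# same encoding.
label = "Gesamtüberleben"
utf8_bytes = label.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# Reading UTF-8 bytes as Latin-1 garbles the umlaut.
garbled = "ü".encode("utf-8").decode("latin-1")
```

Doing the separator swap in one translation step avoids the mistake of replacing commas with periods first and then being unable to tell the two apart.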
SS-331 : A Standardized Data Sample: Key to Improving the Submission Strategy
Prafulla Girase, Biogen
Joanna Koft, Biogen
Tuesday, 4:30 PM - 4:50 PM, Location: Salon A
As mentioned in the FDA's Study Data Technical Conformance Guide, the agency offers a process for submitting sample standardized datasets for validation. Although sample submissions are tests only and not considered official submissions, they can be of great value to sponsors in various ways. This paper is based on the sponsor's practical experience of submitting sample submissions for five different compounds that are now approved therapies on the market. The paper will walk through the key parts of a sample submission as well as how to plan and implement one. The paper will also discuss excerpts from the regulatory feedback received on sample submissions and how that feedback helped in the continuous improvement of the sponsor's submission strategy.
SS-334 : The Anatomy of Clinical Trials Data: A Beginner's Guide
Venky Chakravarthy, BioPharma Data Services
Monday, 9:00 AM - 9:50 AM, Location: Salon A
We are so preoccupied with our own little programming world that we often forget that we are part of a long, complex process in the discovery and development of drugs. In this gentle introduction to the pharmaceutical world, you will learn about the various stages of drug development. You will also get to know the complications in drug development and the chances of a drug receiving Food and Drug Administration (FDA) approval. The FDA is a US-specific regulatory agency, but the process is similar for other agencies such as the European Medicines Agency (EMA) or the Pharmaceuticals and Medical Devices Agency (PMDA) in Japan. You will then learn about the Human Trials stage, where the bulk of SAS programming occurs. At this stage, you will get to know the types of documents that a programmer needs to be familiar with. Next comes SAS data, followed by a review of the evolution of standards and the current data standards. You will gain an appreciation of the role these data standards play in analyzing data and generating SAS reports.
SS-337 : Do-It-Yourself CDISC! A Case Study of Westat's Successful Implementation of CDISC Standards on a Fixed Budget
Rick Mitchell, Westat
Rachel Brown, Westat
Jennifer Fulton, Westat
Marie Alexander, Westat
Stephen Black, Westat
Tuesday, 4:00 PM - 4:20 PM, Location: Salon A
Do you find the prospect of implementing CDISC within the confines of a tight timeline and a fixed budget to be overwhelming? Westat, a premier research organization, offers a continuum of support services for clinical research studies, such as data management, data analysis, regulatory affairs, and site monitoring. Clinical research is dynamic, and it is imperative to comply with ever-changing federal guidance and regulations. This paper describes how Westat's statisticians, programmers, and data managers approached the implementation of CDISC standards on a fixed budget. It discusses how we overcame challenges by creating tools and processes to improve efficiency, developing training programs, and reimagining how project teams collaborate to ensure that data sets submitted to the FDA are compliant.
SS-343 : Panel Discussion: Recent Submission Experiences around the World
Tuesday, 3:00 PM - 3:50 PM, Location: Salon A
Moderator(s): Carey Smoak, S-Cubed and Heather Archambault, Urovant Sciences, Inc.
- Karin LaPann, Takeda
- Kriss Harris, SAS Specialists Ltd.
- Marianne Caramés, Novo Nordisk
- Yi Yang, Novartis