Training Seminars

Enhance your PharmaSUG experience by attending optional pre- and post-conference training seminars taught by seasoned experts. Half-day courses are only $250 with a conference registration, or $350 without a conference registration. Space is limited, so don’t delay!

If you are registering for a training seminar only, and not attending the conference, you can register for the seminar here.
Otherwise, please proceed to the Registration page to register for both the conference and training seminars.

Schedule

Sunday, June 1, 2025

Course | Title | Instructor(s) | Time
#1-1 | Introduction to End-to-End Submissions in R | Phil Bowsher, Rammprasad Ganapathy & Shiyu Chen | 8:00 AM - 12:00 PM
#1-2 | FDA, PMDA & NMPA Submission Data Requirements | David Izard | 8:00 AM - 12:00 PM
#1-3 | Navigating SDTM: The Road Ahead | Soumya Rajesh & Karl Miller | 8:00 AM - 12:00 PM
#1-4 | Gen AI Training for Biometrics: From Unlocking Data Insight to Practical Application | Kevin Lee | 8:00 AM - 12:00 PM
#2-1 | End-to-End Electronic Submission Components for Regulatory Submission of Clinical Study Data | Prafulla Girase | 1:00 PM - 5:00 PM, Aqua AB
#2-2 | SAS® Macro and PROC FCMP User-Defined Functions for Clinical and Pharmaceutical Application | Troy Martin Hughes | 1:00 PM - 5:00 PM, Aqua C
#2-3 | Understanding and Using the Dataset-JSON v1.1 Standard | Lex Jansen | 1:00 PM - 5:00 PM, Aqua D
#2-4 | Bee-yond the Basics: Harnessing SAS, SQL, and Python for Data Analytics in Pharma | Charu Shankar | 1:00 PM - 5:00 PM, Aqua EF

Wednesday, June 4, 2025

Course | Title | Instructor(s) | Time
#3-1 | ODS Graphics: Mastering Custom Graphs with Graph Template Language (GTL) | Richann Watson | 1:00 PM - 5:00 PM
#3-2 | A Step-by-Step Introduction to Data Cleaning Using Excel, Python, R, and SAS® | Kirk Paul Lafler | 1:00 PM - 5:00 PM
#3-3 | Expanding My SAS Vocabulary with Hash Tables Syntax | Bart Jablonski | 1:00 PM - 5:00 PM
#3-4 | Considerations When Choosing the Optimum ADaM Structure for a Specific Statistical Analysis | Mario Widel & Veronica Gonzalez | 1:00 PM - 5:00 PM

Course Descriptions

Introduction to End-to-End Submissions in R

Phil Bowsher, Rammprasad Ganapathy and Shiyu Chen
Sunday, June 1, 2025, 8:00 AM – 12:00 PM

Posit/RStudio will present an overview of Pharmaverse, gt, Quarto and WebAssembly for the R user community at PharmaSUG. This is a great opportunity to learn about new capabilities for working with clinical trials data and generating TLGs (Tables, Listings and Graphs) for inclusion in Clinical Study Reports. The use of R in pharma, especially in clinical trials, has increased rapidly over the last three years. This workshop will review several public new drug applications and discuss how open source is used across various areas of those submissions, offering a window into current trends in open-source submissions.

This hands-on workshop will introduce the current landscape for SDTM, preparing ADaM data, and producing TLGs with gt, Quarto and Shinylive. We will review and reproduce a subset of common table outputs used in clinical reporting containing descriptive statistics, counts, and/or percentages. The workshop will introduce TFL-producing R programs and include an overview of the gt and gtsummary R packages, with applications in drug development such as safety analysis and Adverse Events.

Pharmas have used Shiny for Exploratory Data Analysis for many years. Now there is much interest in creating interactive reports with Shiny, Teal and WebAssembly to deliver results more efficiently, save reviewers time, and streamline the review process. These Shiny apps support data review and exploration for clinical outputs. There has also been a public pilot to include Shiny in an FDA clinical trial submission, as an opportunity to process more effectively the kinds of data and analyses that appear in clinical trial submissions. Moreover, such apps can help regulatory bodies more quickly and accurately assess the safety and efficacy of new medical products.
This talk will discuss the new horizon for new drug submissions with tools like Teal, WebR, Quarto and Shinylive. Posit will also feature ways that you and your pharma can get involved in this exciting new effort that is bringing pharma together!

FDA, PMDA & NMPA Submission Data Requirements

David Izard
Sunday, June 1, 2025, 8:00 AM – 12:00 PM

The US FDA introduced binding guidance in 2014 establishing requirements for the provision of clinical and non-clinical data and related assets in support of drug & biologic filings seeking marketing approval. PMDA (Japanese regulator) and NMPA (Chinese regulator) followed suit shortly afterwards, establishing similar expectations.

This seminar will engage Programmers, Statisticians, Data Managers, and others who directly or indirectly support the generation of these regulated deliverables by:

  • Reviewing the foundations of FDA, PMDA and NMPA submission data requirements and how they have evolved over time.
  • Participating in a detailed walkthrough of regulatory guidance, highlighting the specific requirements for each deliverable as dictated by documents such as the US FDA Study Data Technical Conformance Guide, the PMDA Technical Conformance Guide on Electronic Study Data Submissions and the NMPA Guidelines for the Submission of Clinical Trial Data for Drugs.
  • Focusing on work practices that will support meeting immediate study execution needs while simultaneously preserving assets for future use in regulatory submissions.

We will wrap up with a focus on what is likely to come: imminent acceptance of machine-readable data by EMA, other regulators worldwide that consume machine-readable data in alternative ways, and expansion of the US FDA Technical Rejection Criteria, among other topics.

Navigating SDTM: The Road Ahead

Soumya Rajesh & Karl Miller
Sunday, June 1, 2025, 8:00 AM – 12:00 PM

The Study Data Tabulation Model (SDTM) and its Implementation Guide (SDTMIG) are poised for yet another major update in the coming year. SDTM was originally designed to be the industry's consistent tabulation standard, but as it continues to evolve, each new version brings impactful changes to its use. Implementation teams and clients therefore need to keep up with these changes and their impact on the submission of study data. With the submission requirement date for the most recent SDTMIG (v3.4) beginning in March 2025, clients and study teams need to familiarize themselves with its nuances, expected changes, and their impacts on SDTM implementation. The good news is that this fundamental standard is backwards compatible.

This seminar is tailored to refresh you on some basics of SDTM, and then dive deeper into some of the more challenging topics that have been pitfalls to sponsors and programming teams alike. It includes topics such as:

  • New domains and variables introduced in SDTMIG v3.4, such as the Specimen-based Findings domains
  • How to include them in submissions to SDTMIG v3.3 per FDA request
  • Trial Design Domains
  • Visit occurrences
  • Collected versus derived Exposure data, etc.

Gen AI Training for Biometrics: From Unlocking Data Insight to Practical Application

Kevin Lee
Sunday, June 1, 2025, 8:00 AM – 12:00 PM

Gen AI, best known through ChatGPT, is at the forefront of the next revolution, and in this seminar we will embark on a journey to demystify this remarkable technology. Imagine a virtual assistant that can do the following:

  • Enhance data exploration and analysis
  • Comprehend and generate human-like text from data
  • Generate code in SAS, R and Python
  • Convert code from SAS to R and Python
  • Answer questions about specific data
  • Visualize data
  • Assist in content creation such as research reports
  • Assist with literature reviews

To truly harness its potential, we need to understand how to use Gen AI in the art of prompt engineering, application development using Gen AI API and fine-tuning Gen AI with our own data. In the seminar, we will embark on an exploration of Gen AI that will equip you with the knowledge and skills to leverage its capabilities effectively while ensuring ethical and responsible use.

Below are tentative agenda topics.

  • Introduction of Gen AI and ChatGPT
  • Gen AI Use Cases in Biometrics
  • Prompt it, not Google it
  • Prompt Engineering (e.g. Zero Shot, Few Shot, Chain of Thought)
  • Introduction of Gen AI tools (e.g., ChatGPT, Copilot, Gemini, Sonnet)
  • How to use Gen AI tools (free versions)
  • Gen AI API-driven application development
  • Gen AI Frameworks: LangChain, RAG
  • Gen AI (e.g., ChatGPT) Risks and Concerns (e.g., Data Privacy, Security, Ethics, Compliance)
  • Gen AI Risk Mitigation Plan
  • Gen AI (e.g., ChatGPT) Implementation Roadmap
  • Future of Gen AI/ChatGPT

End-to-End Electronic Submission Components for Regulatory Submission of Clinical Study Data

Prafulla Girase
Sunday, June 1, 2025, 1:00 PM – 5:00 PM

A regulatory submission of clinical study data must be accompanied by various other electronic submission (eSUB) components, such as Define-XML, the annotated CRF, the study data reviewer’s guide, and the analysis data reviewer’s guide. This seminar will take a deep dive into each of these components, educating attendees about requirements, best practices, and consistency checks. It will also cover key considerations for preparing the complete eSUB package, including folder structure, PDF validation practices, the final package checklist, and regulatory hand-off.

Prerequisite: Very basic knowledge of eSUB components.

SAS® Macro and PROC FCMP User-Defined Functions for Clinical and Pharmaceutical Application

Troy Martin Hughes
Sunday, June 1, 2025, 1:00 PM – 5:00 PM

Attend and receive a FREE copy of the instructor’s 550-page book, SAS® Data-Driven Development: From Abstract Design to Dynamic Functionality, Second Edition, released in 2022! Students will receive the physical book at the training. This hands-on course provides a gentle introduction to SAS user-defined functions, which enable SAS practitioners to build reusable chunks of code that can be shared among coworkers and teams, and which improve the efficiency and quality of SAS software development. All examples demonstrate real-world application in clinical, pharmaceutical, and life science environments.

The first half of the course introduces and demonstrates SAS macro functions, a subset of SAS macros that can be called inside of the DATA step or PROC SQL. A basic prior knowledge of the SAS macro language is beneficial for these more advanced macro concepts. Too often, however, SAS users design and implement a user-defined macro function when a user-defined PROC FCMP function would have been a cleaner, more appropriate solution. Thus, the second half of the course introduces PROC FCMP, and demonstrates FCMP user-defined functions that can replace SAS macro functions, as well as FCMP user-defined functions that can perform data acrobatics not possible in the SAS macro language. No prior knowledge of PROC FCMP is required.

Understanding and Using the Dataset-JSON v1.1 Standard

Lex Jansen
Sunday, June 1, 2025, 1:00 PM – 5:00 PM

Dataset-JSON is a new CDISC data exchange standard for sharing tabular data using JSON. It is designed to meet a wide range of data exchange scenarios, including regulatory submissions and API-based data exchange. Dataset-JSON is based on the JSON standard, which is simple to implement, very stable, and widely supported. Dataset-JSON is also designed to address the limitations of legacy formats, such as the SAS v5 XPT Transport format, and is extensible to support new metadata and new use cases.
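To make the idea concrete, here is a minimal sketch of the Dataset-JSON concept: dataset metadata, column definitions, and row data carried in one JSON document. The field names and values below are simplified illustrations, not the normative schema; consult the published CDISC Dataset-JSON v1.1 specification for the authoritative structure.

```python
import json

# Illustrative sketch only -- field names are simplified, not the
# normative Dataset-JSON v1.1 schema.
dataset = {
    "datasetJSONVersion": "1.1",  # illustrative version tag
    "name": "DM",
    "label": "Demographics",
    "columns": [
        {"name": "USUBJID", "label": "Unique Subject Identifier", "dataType": "string"},
        {"name": "AGE", "label": "Age", "dataType": "integer"},
    ],
    "rows": [
        ["STUDY01-001", 34],
        ["STUDY01-002", 51],
    ],
}

# Serialize to JSON text and read it back, as a consumer would.
text = json.dumps(dataset)
parsed = json.loads(text)

# Reassemble rows into name -> value records using the column metadata.
names = [c["name"] for c in parsed["columns"]]
records = [dict(zip(names, row)) for row in parsed["rows"]]
print(records)
```

Because metadata travels with the data, a consumer needs nothing beyond a standard JSON parser to reconstruct labeled records, which is part of what makes the format attractive compared to the binary XPT transport file.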

Join us for the first PharmaSUG Dataset-JSON training seminar, covering the new Dataset-JSON v1.1 standard. In this half-day seminar delivered by a leading expert, you will learn everything you need to know about the newly published version of Dataset-JSON, building new skills through demonstrations that help you understand the standard.

In this training, you will:

  • Learn the history and limitations of current clinical data exchange standards
  • Understand the Dataset-JSON use cases and business cases
  • Understand the JSON standard
  • Get a deep understanding of the Dataset-JSON v1.1 standard
  • Learn about the different serializations of Dataset-JSON: JSON and NDJSON
  • Deep dive into the key features of Dataset-JSON, such as character encoding, precision and rounding, representing numeric dates, and how to apply them in data exchange scenarios
  • Understand the relationship between Dataset-JSON and Define-XML
  • Get an overview and demonstration of how SAS and open-source software (R, Python) work with JSON, and specifically Dataset-JSON (viewing, creating, reading, validating Dataset-JSON)
  • Learn how to avoid pitfalls when working with Dataset-JSON

Intended Audience: Programmers, Statistical Programmers, Clinical Data Scientists, Data Managers

Note: this seminar is not the official CDISC Dataset-JSON Hands-On Implementation training.

Bee-yond the Basics: Harnessing SAS, SQL, and Python for Data Analytics in Pharma

Charu Shankar
Sunday, June 1, 2025, 1:00 PM – 5:00 PM

This seminar outlines a structured, five-step approach to data processing—Access, Discovery, Manipulation, Analysis, and Reporting—applied to bumblebee data as a creative parallel to pharmaceutical analytics.

This approach showcases the complementary strengths of SAS, SQL, and Python in handling diverse analytical tasks. SAS excels in managing large datasets and creating polished reports with its robust data integration and visualization capabilities. SQL demonstrates its power in querying, aggregating, and organizing relational data efficiently, making it indispensable for structured data exploration and filtering. Python shines in its flexibility and scalability, offering advanced analytics, machine learning capabilities, and dynamic visualizations. By leveraging these tools together in SAS Viya Workbench, analysts can create an efficient, end-to-end pipeline tailored for both exploratory and production workflows, bridging the gap between traditional business intelligence and modern data science techniques.
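The SQL-then-Python hand-off described above can be sketched with Python's built-in sqlite3 module. The bumblebee sighting data and column names here are hypothetical, and the sketch stands in for the richer SAS Viya Workbench pipeline the seminar demonstrates: SQL aggregates the relational data, then Python computes derived statistics and formats a small report.

```python
import sqlite3
from statistics import mean

# Hypothetical bumblebee observations standing in for study records.
observations = [
    ("B. terrestris", "meadow", 42),
    ("B. terrestris", "garden", 35),
    ("B. lapidarius", "meadow", 18),
    ("B. lapidarius", "garden", 27),
]

# Access: load the raw records into a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sightings (species TEXT, habitat TEXT, count INTEGER)")
conn.executemany("INSERT INTO sightings VALUES (?, ?, ?)", observations)

# Discovery/Manipulation: SQL aggregates the structured data.
rows = conn.execute(
    "SELECT species, SUM(count) FROM sightings GROUP BY species ORDER BY species"
).fetchall()

# Analysis: Python computes derived statistics on the query result.
totals = {species: total for species, total in rows}
avg = mean(totals.values())

# Reporting: a simple formatted summary.
for species, total in totals.items():
    print(f"{species}: {total} sightings")
print(f"average per species: {avg:.1f}")
```

The same division of labor scales up directly: SQL handles set-based filtering and aggregation close to the data, while Python takes over where statistical or machine-learning logic begins.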

ODS Graphics: Mastering Custom Graphs with Graph Template Language (GTL)

Richann Watson
Wednesday, June 4, 2025, 1:00 PM – 5:00 PM

Anyone who has produced a graph using ODS Graphics has unknowingly used the Graph Template Language (GTL). ODS graphics produced by SAS® procedures such as the Statistical Graphics (SG) procedures actually rely on pre-defined templates built with GTL. GTL generates graphs using a template definition that provides extensive control over its format and appearance. Although most of the graphs produced within a procedure are adequate for most situations, they sometimes lack those one or two extra features you need to really make your graphs stand out and impress your clients or customers. GTL, as an extension of ODS Graphics, allows users much greater flexibility in creating specialized graphs. In this class, you’ll learn how to use GTL to create complex, highly customized graphics that you could only dream about before. We’ll cover the different types of layouts provided by GTL as well as various types of plots. The focus of the course will be on three specific layouts (i.e., OVERLAY, GRIDDED and LATTICE) and the more common types of plots (e.g., BARCHART, BLOCKPLOT, BOXPLOT, HIGHLOWPLOT, SCATTERPLOT and SERIESPLOT). Through the use of detailed examples, you will learn how to build your own template to make customized graphs and how to create that one highly desired, unique graph that at first glance seems impossible.

A Step-by-Step Introduction to Data Cleaning Using Excel, Python, R, and SAS®

Kirk Paul Lafler
Wednesday, June 4, 2025, 1:00 PM – 5:00 PM

If you are spending too much time and money dealing with data quality issues, then this seminar is for you. SAS® users often turn to off-the-shelf or user-built tools to handle messy data issues. Unfortunately, and all too often, many tools in use today fall short and/or have steep learning curves to master. This seminar explores the problems found in data, the types of data quality issues, and the various programming techniques users can learn and use to clean their data, once and for all. Attendees learn how to check and clean character and numeric data issues; handle missing data; remove duplicate data based on the row’s values and/or keys; read and write date/time variables; apply data integrity rules to prevent messy data from continuing to creep into a spreadsheet, dataframe, and/or data set (or table); and automate the data cleaning process to identify and fix errors in data while improving scale.

Topics

The following topics are introduced in this seminar:

  • Data importation techniques to access, identify, and parse data in various data formats including CSV, XLSX, JSON, and XML.
  • Data preparation techniques for data discovery, data cleaning, and data transformation.
  • Automating the data cleaning process to reduce workload, save time, improve productivity, enhance the accuracy and consistency of cleaning, and free up time for data analysis and interpretation.
  • Excel data cleaning topics include the application of data importation, removing duplicates, standardizing formats, streamlining case, removing extraneous spaces, splitting delimited data, finding and replacing data values, extracting prefixes and suffixes, checking for spelling and typos, and imputing missing values.
  • Python data cleaning topics include the application of data importation, removing duplicates, standardizing formats, streamlining case, removing extraneous spaces, splitting delimited data, finding and replacing data values, extracting prefixes and suffixes, checking for spelling and typos, and imputing missing values.
  • R data cleaning topics include the application of data importation, removing duplicates, standardizing formats, streamlining case, removing extraneous spaces, splitting delimited data, finding and replacing data values, extracting prefixes and suffixes, checking for spelling and typos, and imputing missing values.
  • SAS data cleaning topics include the application of importing data with PROC IMPORT; examining PROC CONTENTS and Metadata; performing exploratory data analysis (EDA) with PROC FREQ, PROC MEANS, PROC PRINT, PROC SORT, PROC SQL, PROC SUMMARY, and DATA step logic and programming techniques including BY-group and FIRST. and LAST. processing; numerous SAS functions to clean data issues and anomalies; and macro language techniques.
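Several of the cleaning operations listed above can be illustrated in a few lines of plain Python. This sketch uses hypothetical records and shows trimming extraneous spaces, standardizing case, splitting delimited data, parsing inconsistent date formats, and removing duplicates by key; it is a simplified stand-in for the fuller techniques taught in the seminar.

```python
import datetime

# Hypothetical messy records: duplicates, stray spaces, mixed case,
# delimited fields, and inconsistent date strings.
raw = [
    "  alice ,F,2024-01-15",
    "BOB,m,15/01/2024",
    "  alice ,F,2024-01-15",   # duplicate row after trimming
]

def parse_date(value):
    """Try a couple of common date layouts; fall back to None."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.datetime.strptime(value, fmt).date()
        except ValueError:
            pass
    return None

cleaned, seen = [], set()
for line in raw:
    name, sex, visit = line.split(",")      # split delimited data
    name = name.strip().title()             # trim spaces, standardize case
    sex = sex.strip().upper()
    key = (name, sex, visit.strip())
    if key in seen:                         # remove duplicates by key
        continue
    seen.add(key)
    cleaned.append({"name": name, "sex": sex, "visit": parse_date(visit.strip())})

print(cleaned)
```

Note that both date layouts normalize to the same `date` object, so downstream comparisons and sorting behave consistently regardless of how the source recorded the value.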

Expanding My SAS Vocabulary with Hash Tables Syntax

Bart Jablonski
Wednesday, June 4, 2025, 1:00 PM – 5:00 PM

The syntax of the hash table object, unusual for the “good old SAS language”, can feel uncomfortable and hard to get used to for some programmers. The aim of this workshop is to convince them that the advantages of the hash table facility outweigh those vague disadvantages by three orders of magnitude.

The tutorial will present:

  1. a general overview and analogies with SAS arrays,
  2. the syntax of the hash object, and
  3. a bunch of use cases.
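For readers coming from other languages, the core pattern the SAS hash object supports (load a lookup table into memory, then probe it by key while streaming through data) maps naturally onto a dictionary. This is a cross-language analogy with hypothetical site data, not SAS syntax; the workshop itself teaches the actual DATA step hash object.

```python
# Small lookup table and a stream of records to enrich (hypothetical data).
sites = [("001", "Boston"), ("002", "Madrid")]
subjects = [("A-01", "001"), ("A-02", "002"), ("A-03", "999")]

# "Declare and load the hash": build the in-memory lookup once.
lookup = dict(sites)

# "FIND" each key; unmatched keys fall back to a default, much like
# checking the hash object's return code in a DATA step.
merged = [(subj, lookup.get(site, "UNKNOWN")) for subj, site in subjects]
print(merged)
```

The appeal in SAS is the same as here: a single pass over the large table with constant-time lookups, instead of sorting both inputs for a merge.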

Considerations When Choosing the Optimum ADaM Structure for a Specific Statistical Analysis

Mario Widel & Veronica Gonzalez
Wednesday, June 4, 2025, 1:00 PM – 5:00 PM

ADaM datasets must be designed with the fundamental principles of traceability and analysis readiness in mind. Traceability provides linkages and context as to how the data flows from SDTM to ADaM via datapoint or metadata. Analysis readiness ensures that datasets contain all pertinent information and are structured to perform the statistical analysis with very little additional programming.

Most of the time, analysis requirements lead to a specific and unambiguous ADaM approach. Statistical analyses involving change from baseline can only be done in an ADaM Basic Data Structure (BDS) dataset, e.g., Lab or Vital Signs. Analyses that involve counts/frequencies of hierarchical data, like dictionary-coded data, are best completed in an ADaM Occurrence Data Structure (OCCDS) dataset, e.g., Adverse Events, Concomitant Medications, and Medical History.
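The reason change-from-baseline analysis fits BDS so naturally can be shown with a toy example. This sketch uses hypothetical data and much-simplified logic: one row per subject, parameter, and visit, with the baseline value carried onto every row so that change from baseline becomes a trivial derivation. Variable names follow ADaM conventions, but this is an illustration of the structure, not production derivation code.

```python
# One row per subject/parameter/visit -- the essence of BDS (toy data).
rows = [
    {"USUBJID": "001", "PARAMCD": "SYSBP", "AVISIT": "Baseline", "AVAL": 120},
    {"USUBJID": "001", "PARAMCD": "SYSBP", "AVISIT": "Week 4",  "AVAL": 112},
    {"USUBJID": "001", "PARAMCD": "SYSBP", "AVISIT": "Week 8",  "AVAL": 108},
]

# Carry the baseline value onto every row of the subject/parameter group...
base = {}
for r in rows:
    if r["AVISIT"] == "Baseline":
        base[(r["USUBJID"], r["PARAMCD"])] = r["AVAL"]

# ...then change from baseline is a simple column-wise derivation.
for r in rows:
    r["BASE"] = base[(r["USUBJID"], r["PARAMCD"])]
    r["CHG"] = r["AVAL"] - r["BASE"]

print([(r["AVISIT"], r["CHG"]) for r in rows])
```

Because BASE and CHG live on every row, the downstream statistical analysis reads the values directly with essentially no additional programming, which is exactly the analysis-readiness principle described above.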

However, there are circumstances where perfectly clear analysis requirements do not lead to an obvious, optimum approach. This seminar will present analysis situations where the best approach may be unclear. Cases will be explored where more than one candidate data structure may be appropriate, including, but not limited to:

    • Some SDTM interventions and events which could be analyzed using either BDS or OCCDS such as protocol deviations and exposure
    • Baseline characteristics in ADSL or a BDS dataset (ADBASE)
    • Analyses involving more than one dataset structure

Advantages and disadvantages will be discussed per case, while highlighting the preservation of the ADaM fundamental principles and ADaM rules, as feasible. To enhance the learning experience, examples will be scrutinized, and exercises will be provided to deepen the participants’ understanding.

Instructors

Phil Bowsher
Director of Healthcare and Life Sciences, Posit/RStudio

Shiyu Chen
Data Solutions Engineer, Atorus Research

Rammprasad Ganapathy
Principal Data Scientist, Roche/Genentech

Prafulla Girase
Senior Director, Alexion AstraZeneca Rare Disease

Veronica Gonzalez
Senior Principal Analyst, Biogen

Troy Martin Hughes
SAS Author, Data Llama Analytics

David Izard
Principal Submission Consultant, Merck

Bart Jablonski
SAS Consultant

Lex Jansen
Senior Director, Data Science Development, CDISC

Kirk Paul Lafler
Consultant, Developer, Programmer, Data Scientist, Educator, and Author

Kevin Lee
Senior Director, Biometrics & Data Science, Clinvia

Karl Miller
Senior Data Standards Engineer, IQVIA

Soumya Rajesh
Sr. Standards Engineer, IQVIA

Charu Shankar
Senior Technical Training Consultant, SAS

Richann Watson
Statistical Programmer and CDISC Consultant, DataRich Consulting

Mario Widel
Statistical Programmer, CDISC