PharmaSUG SDE Boston 2025

PharmaSUG Single-Day Event

Open Source Advancements in Clinical Reporting

Wednesday, October 15, 2025

Microsoft New England
Horace Mann & Abigail Adams Conference Room
1 Memorial Drive
Cambridge, MA 02142

Many thanks to all of our presenters and attendees for making the Boston SDE a success! Join us as we return to Boston for PharmaSUG 2026!

Richard Allen
Peak Statistical Services
Single-Day Event Co-Chair

Natalie Martinez
Eli Lilly & Company
Single-Day Event Co-Chair

Conference Committee:
Richard Allen (Peak Statistical Services), Natalie Martinez (Eli Lilly & Company), Eric Larson (IQVIA)

Social Media:
Richann Watson

Questions? Contact us!

Registration and Rates

Registration fee is $99.  The registration deadline has been extended to Friday, October 10, 2025!

Please note that you must register for the event online by 11:45 PM ET on Friday, October 10, 2025. 

Onsite registrations are not allowed.

Cancellation Policy

Cancellations can be requested by emailing the registrar at bostonsde@pharmasug.org. Cancellations requested on or before October 15, 2025, will be refunded minus a $25 fee. Refunds will be issued to the original form of payment. No refunds will be available after October 15, 2025.

Host

Sponsors TBA

Download our Sponsorship Opportunities Flyer and sign up as a sponsor!  You can also apply for sponsorship by downloading and returning this PDF form.

Event Schedule

Wednesday, October 15, 2025 | Single-Day Event Presentations

Presentation Title | Speaker
Posit Updates for Pharma: Audited Workbench Jobs & More | Sean Sinnott, Posit
Validating R for Pharma – Streamlining the Validation of Open-Source R Packages within Highly Regulated Pharmaceutical Work | Anuja Das, Biogen
siera - Redefining TFL Automation | Shivani Gupta & Anna Yaggi, Clymb Clinical
From Pilot to Production: The Generic Safety App as a Validated Platform for Clinical Trial Safety Review | Bo Wang, Novartis
Interactive and Automated Generation of Clinical Study Reports (CSRs) Using {quarto} and {shiny} | Peng Zhang, CIMS Global
From SAS to Cursor: Vibe-Coding into SAS, R & Python in Data Analysis | Kevin Lee, Clinvia
PROC RCode: Coming Soon to a Theater Near You - This Movie is Rated R for SASsy Language | Joe Madden, SAS
AI and ML in Clinical Trials: Enhancing Efficiency and Quality | Saurabh Das, Tata Consultancy
Gen AI in Biometrics with Open Source Programming: Transforming Clinical Trials with Supercharged Efficiency and Innovation | Kevin Lee, Clinvia
Utilizing Open-Source Technologies and Quarto Dashboards to Enhance Study Oversight and Accelerate Critical Deliverables | Margaret Wishart, BMS
Charting Your AI Journey: A Roadmap for Supervised, Unsupervised, and Generative Learning Through Machine Learning and Deep Learning | Ryan Lafler & Miguel Angel Bravo, Premier Analytics Consulting LLC
BIMO Subject-level Data Line Listings Auto Generation with R | Weishan Song, Vertex
AI Agents for Accelerating Data Analysis in Clinical Development | Xing Chen & Weijie Zhang, Moderna

Presentation Descriptions

From SAS to Cursor: Vibe-Coding into SAS, R & Python in Data Analysis

Kevin Lee, Clinvia

The rapid evolution of generative AI is reshaping how programming and data analysis are performed across industries, and Biometrics programming is no exception. Vibe coding, a novel paradigm that uses conversational AI tools for code generation, debugging, and automation, is emerging as a transformative force in the programming workflow. For example, Microsoft CEO Satya Nadella estimated in April 2025 that 20% to 30% of Microsoft’s code was generated by AI.

This presentation introduces the concept of vibe coding and explores its application in the Biometrics Department, where traditional tools like SAS, R, and Python remain foundational. It will discuss how vibe coding can be integrated with established statistical coding environments for data analysis and visualization. First, the presentation will demonstrate how prompt engineering can be applied in popular AI web platforms such as ChatGPT, Gemini, and Claude. Second, it will examine AI coding agents such as Cursor, Windsurf, and GitHub Copilot, which offer inline code suggestions, code completion and integration, and real-time collaborative coding with AI. The presentation also introduces cases where customized AI agents are developed to support data analysis, review, and visualization, and even to suggest code validation steps across SAS, R, and Python environments. It also examines the benefits, including enhanced productivity, democratization of complex tasks, and reduced coding overhead, as well as potential risks such as hallucinated outputs, compliance concerns, and the need for human oversight. The presentation concludes with a forward-looking view of the evolving role of statistical programmers in the era of vibe coding, offering a vision in which vibe coding augments, not replaces, human expertise, empowering Biometrics departments to focus more on scientific insight, data-driven strategy, and cross-functional collaboration.

Charting Your AI Journey: A Roadmap for Supervised, Unsupervised, and Generative Learning Through Machine Learning and Deep Learning

Ryan Lafler and Miguel Angel Bravo, Premier Analytics Consulting LLC

Machine learning (ML) continues to reshape business, technology, science, and research across all industries, with its adoption enabling systems to learn from data, automate decisions, and generate insights. This paper presents a structured roadmap through three core domains of machine learning that are increasingly adopted by organizations: supervised, unsupervised, and generative learning. Along this roadmap, readers will identify key algorithms and architectures within each domain and understand the role of parameters and hyperparameters in mitigating overfitting and underfitting. The discussion includes examples of predictive modeling on labeled data using supervised algorithms, knowledge discovery from unlabeled data using unsupervised algorithms, and the extension of these capabilities through generative learning, which enables systems to extract insights and produce new content or data representations. The paper concludes by introducing three generative architectures that define the state of AI in 2025: encoder models (BERT), decoder models (LLMs), and encoder-decoder models (T5), and describes how each supports advanced AI tasks including representation learning, language generation, natural language processing (NLP), text summarization, and translation.

AI and ML in Clinical Trials: Enhancing Efficiency and Quality

Saurabh Das, Tata Consultancy

Clinical trials have increased in complexity and duration. Additional data sources, including wearables, electronic health records, mobile health applications, genetic sequencing, and diagnostic imaging, expand the breadth of information collected. Maintaining data quality and efficiency, and deriving actionable insights, remain ongoing challenges. Artificial intelligence (AI) and machine learning (ML), along with Generative AI, are increasingly being used to address these challenges, contributing to faster and more efficient clinical trials. These technologies are expected to have a notable impact on research and development within life sciences and healthcare. For instance, a recent McKinsey study reported that approximately 270 companies are involved in AI-driven drug discovery, while others focus on areas such as clinical trial management and reporting.

This paper outlines practical strategies for organizations across global industries to implement open-source AI/ML software, tools, and frameworks in their products and platforms. Doing so can support scalability and potentially improve accuracy, efficiency, safety, and speed in clinical trials. Key use cases in clinical research and development include:

  • Monitoring Visit Report Insights and Analytics: Natural language processing (NLP) models organize insights from unstructured reports, assisting in the identification of compliance risks and operational inefficiencies.
  • Anomaly Detection: Machine learning algorithms identify outliers and protocol deviations, supporting data integrity and patient safety.
  • Patient Profile Scoring: Predictive models assess patients based on adherence, risk factors, and medical history, helping facilitate targeted interventions and analysis of dropout rates and efficacy.
  • Site Risk Profiling: Dashboards utilizing aggregate site metrics generate dynamic risk profiles to inform monitoring and resource allocation, increasing efficiency by 30%.
  • One-Touch Statistical Reporting: Automated processes enable timely statistical report generation, reducing effort from 6 weeks to 1 day.

In conclusion, the integration of AI and ML into clinical trial operations marks a transformative period for the industry, promising significant advances in speed, accuracy, and safety. Looking ahead, Generative AI, capable of synthesizing complex datasets, generating clinical documentation, and even simulating patient populations, offers new horizons for protocol design, data analysis, and regulatory submissions. Meanwhile, Agentic AI systems, distinguished by autonomous decision-making and adaptive learning, have the potential to orchestrate end-to-end trial activities, proactively identify risks, and optimize resource allocation without constant human intervention. Together, these innovations stand to redefine the boundaries of clinical research, enabling more personalized, efficient, and reliable trials. As open-source platforms and collaborative frameworks mature, the adoption of Generative AI and Agentic AI will not only enhance data-driven insights but also foster a more agile and responsive clinical trial ecosystem. Embracing these technologies will be essential for organizations seeking to stay at the forefront of life sciences and deliver transformative benefits to global healthcare.

siera - Redefining TFL Automation

Shivani Gupta and Anna Yaggi, Clymb Clinical

The industry is steadily advancing toward TFL automation, yet automating analysis results has remained a persistent challenge. With the release of the Analysis Results Standard (ARS) by CDISC in April 2024, we now have an opportunity to build automation tools on a common standard. ARS metadata is structured and machine-readable, making it highly compatible with Gen-AI. By combining ARS metadata with study documents such as the Statistical Analysis Plan (SAP) and TFL mock shells, Gen-AI can generate complete programming code in R and SAS, ready for integration and customization with your company standards.

This presentation introduces siera, an open-source R package that ingests ARS metadata and generates ready-to-run, transparent, and modifiable R code to create Analysis Results Datasets (ARDs). Unlike black-box solutions, siera produces fully inspectable code, giving users both flexibility and control. Together, ARS, siera, and Gen-AI can bring us closer to true end-to-end TFL automation: powerful, efficient, and standards-driven.

BIMO Subject-level Data Line Listings Auto Generation with R

Weishan Song, Vertex

The U.S. Food and Drug Administration (FDA) requires a Bioresearch Monitoring (BIMO) package for pivotal studies to support the planning and conduct of clinical site inspections. A key component is the subject-level data line listings by site, which provide site-specific subject data underpinning safety and efficacy assessments. These listings are often produced by manually programming outputs from SDTM/ADaM datasets for each site and each listing, a resource-intensive process that can introduce variability and misalignment with the clinical study report (CSR).

We developed an R tool that automatically generates the subject-level data line listings by site using an output-to-output approach. The tool ingests finalized CSR listings, extracts content, page layout, and formatting, and reproduces the required BIMO listings for every site without reprogramming listing specifications. This preserves the CSR’s structure, maintains consistency between CSR and BIMO outputs, and standardizes presentation.

By automating and formalizing listing generation, the tool meets regulatory requirements and standards, reduces manual effort, simplifies quality control, and improves traceability from CSR to BIMO deliverables.

Gen AI in Biometrics with Open Source Programming: Transforming Clinical Trials with Supercharged Efficiency and Innovation

Kevin Lee, Clinvia

The pharmaceutical industry stands at the beginning of a transformative era, with Generative AI (e.g., ChatGPT) revolutionizing clinical trial development. This presentation explores the integration of Generative AI in Biometrics, highlighting its potential to streamline workflows, redefine clinical trial development, and lead innovation.

The presentation will start with an introduction to Generative AI and its wide-ranging applications in Biometrics, such as information queries, code generation (e.g., SAS, R & Python), code conversion (e.g., SAS to R/Python), document generation (e.g., SAP, CSR), data analysis, data visualization, patient profiling, and more. It will also explore the tools, systems, processes, and people that are reshaping clinical trials with Gen AI integration. Looking toward the future, the presentation will evaluate the lasting impact of Generative AI on Biometrics departments. By strategically adopting these cutting-edge technologies, Biometrics teams can dramatically enhance operational efficiency, optimize trial outcomes, and expedite regulatory approval processes. The presentation will culminate in a forward-looking exploration of how Biometrics teams can evolve into “super Biometrics teams,” leveraging Generative AI to achieve unprecedented levels of innovation, precision, and effectiveness in clinical trial development.

Interactive and Automated Generation of Clinical Study Reports (CSRs) Using {quarto} and {shiny}

Peng Zhang, CIMS Global

The preparation of Clinical Study Reports (CSRs) for regulatory submission often requires medical writers to manually integrate tables, figures, and listings (TFLs) into narrative documents, a process that can be time-consuming and prone to human error. To address this challenge, we use open-source tools, including {quarto} and {shiny}, that let medical writers generate the report themselves. By interacting with the study outputs, medical writers can efficiently produce customizable content in the desired format. In this presentation, we will demonstrate the workflow, illustrate its feasibility through practical examples, and discuss key development practices.

PROC RCode: Coming Soon to a Theater Near You - This Movie is Rated R for SASsy Language

Joe Madden, SAS

In the world of data analysis, two mighty warriors stand tall: R, the shining star for statistical open-source freedom, and SAS, the seasoned veteran. Join us as we dive into the epic showdown between these two data titans. Will R’s quirky charm and endless packages win the day, or will SAS’s polished prowess and enterprise support reign supreme? Expect a rollercoaster of code, a sprinkle of statistical magic, and a whole lot of nerdiness as we explore the strengths, quirks, and the things we obsess over for these beloved tools. Whether you’re an R enthusiast, a SAS devotee, or just here for the data drama, this presentation promises to be a data duel for the ages!

Validating R for Pharma – Streamlining the Validation of Open-Source R Packages within Highly Regulated Pharmaceutical Work

Anuja Das, Biogen

The R Validation Hub is a cross-industry collaboration that supports the adoption of R in biopharmaceutical regulatory settings through tools and resources that leverage the open-source, collaborative nature of the language. Using R in submissions to healthcare regulators often requires documentation showing that the quality of the packages used was adequately assessed. This can be challenging for R, where many commonly used packages are open source. In this presentation, we will highlight the R Validation Hub’s risk assessment framework for R packages, which has been used by key pharma companies across the industry. We will also showcase products our working groups have developed, including the {riskmetric} R package, which evaluates the risk of an R package using a specified set of metrics and validation criteria, and the {riskassessment} app, which extends the utility of {riskmetric} with a Shiny front end. Lastly, we will present a prototype technical framework for maintaining a ‘repository’ of R packages with accompanying evidence of their quality and the assessment criteria. All of our work is designed to facilitate the use of R in a highly regulated space and ease the burden of using R packages within a validated environment.

AI Agents for Accelerating Data Analysis in Clinical Development

Xing Chen and Weijie Zhang, Moderna

Artificial intelligence (AI) is transforming drug development by accelerating how clinical and statistical teams interact with data. This talk presents an AI agent framework designed to deliver rapid, flexible analysis of clinical data. The agent uses a multi-LLM architecture to interpret natural-language questions and translate them into actionable workflows. Statistical modeling and visualization are executed natively in R, ensuring alignment with common toolsets for clinical data analysis while maintaining traceability and reproducibility for high-quality outputs. This approach enables automatic construction of graph databases, dynamic retrieval of records, generation of reproducible statistical models, and automated production of publication-quality plots. By bridging AI-driven reasoning with R-based analytics, the system empowers clinical teams to obtain insights in minutes rather than days, enhancing decision-making in drug development pipelines.

From Pilot to Production: The Generic Safety App as a Validated Platform for Clinical Trial Safety Review

Bo Wang, Novartis

The Generic Safety App is a customizable tool that streamlines interactive safety data review in clinical trials. Initially introduced as a pilot, it has now completed full validation and is available as a scalable, production-ready solution.

Trial teams can quickly configure trial-specific instances without coding, supported by training and user guides. This ease of setup reduces deployment time while ensuring consistent, transparent safety analyses. Validation under the Scientific Software Validation Working Practice ensures compliance, and subsequent use in new trials requires no additional validation. During its pilot phase, the Generic Safety App supported over 20 Data Monitoring Committee (DMC) meetings and, after successful validation in Q4 2024, has been implemented in more than 10 standardized deployments. These deployments span multiple Disease Units, including Immunology, Oncology, and Cardio-Renal-Metabolism, with expansions to additional areas such as Neuroscience. By advancing from pilot to fully validated platform, the Generic Safety App illustrates how standardized, scalable tools can enhance collaboration, support external partners, and promote innovative approaches to safety data review, enabling faster, more informed decision-making in clinical development.

Posit Updates for Pharma: Audited Workbench Jobs & More

Sean Sinnott, Posit

This talk will provide an overview of recent pharma-related updates from Posit, focusing on Posit Workbench and a new feature: Audited Workbench Jobs. Audited Jobs run R and Python code with cryptographically signed outputs and captured environment details, providing verifiable, reproducible results that are crucial for pharma use cases. The talk will show how Workbench computes a digital signature of a job’s output, verifying the integrity and authenticity of the results the job produces. It will also discuss how Audited Workbench Jobs capture information about the environment used to run the job, including R or Python versions, and how to customize auditing by including user-defined information in the audit record. Other recent Posit updates will also be highlighted.

Presenters

Miguel Angel Bravo
Consultant, Premier Analytics Consulting LLC

Xing Chen
Moderna

Anuja Das
Technology Product Specialist, Biogen

Saurabh Das
Senior Consultant, Tata Consultancy Services

Shivani Gupta
Director of Stats Programming, Clymb Clinical

Ryan Lafler
Founder, C.E.O., Premier Analytics Consulting, LLC

Kevin Lee
Senior Director, Biometrics & Data Science, Clinvia

Joe Madden
Senior Product Manager, SAS

Sean Sinnott
Application Developer, Posit

Weishan Song
Senior Statistical Programmer II, Vertex

Bo Wang
Data Scientist, Novartis

Anna Yaggi
Product Manager, Clymb Clinical

Peng Zhang
Associate Director, Innovative Data Sciences, CIMS Global

Weijie Zhang
Moderna