
What we’re about
🖖 This virtual group is for data scientists, machine learning engineers, and open source enthusiasts.
Every month we’ll bring you diverse speakers working at the cutting edge of AI, machine learning, and computer vision.
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more, visit the FiftyOne project page on GitHub.
Upcoming events (4+)
August 7 - Understanding Visual Agents
Join us for a virtual event to hear talks from experts on the current state of visual agents.
When
Aug 7, 2025 at 9 AM Pacific
Where
Virtual. Register for the Zoom.
Foundational capabilities and models for generalist agents for computers
As we move toward a future where language agents can operate software, browse the web, and automate tasks across digital environments, a pressing challenge emerges: how do we build foundational models that can act as generalist agents for computers? In this talk, we explore the design of such agents—ones that combine vision, language, and action to understand complex interfaces and carry out user intent accurately.
We present OmniACT as a case study, a benchmark that grounds this vision by pairing natural language prompts with UI screenshots and executable scripts for both desktop and web environments. Through OmniACT, we examine the performance of today’s top language and multimodal models, highlight the limitations in current agent behavior, and discuss research directions needed to close the gap toward truly capable, general-purpose digital agents.
About the Speaker
Raghav Kapoor is a machine learning engineer at Adobe, where he works on the Brand Services team, contributing to cutting-edge projects in brand intelligence. His work blends research with applied machine learning, reflecting his deep expertise in both areas. Prior to joining Adobe, Raghav earned his Master's degree from Carnegie Mellon University, where his research focused on multimodal machine learning and web-based agents. He also brings industry experience from his time as a strategist at Goldman Sachs India.
BEARCUBS: Evaluating Web Agents' Real-World Information-Seeking Abilities
The talk focuses on the challenges of evaluating AI agents in dynamic web settings, the design and implementation of the BEARCUBS benchmark, and insights gained from human and agent performance comparisons. In the talk, we will discuss the significant performance gap between human users and current state-of-the-art agents, highlighting areas for future improvement in AI web navigation and information retrieval capabilities.
About the Speaker
Yixiao Song is a Ph.D. candidate in Computer Science at the University of Massachusetts Amherst. Her research focuses on enhancing the evaluation of natural language processing systems, particularly in assessing factuality and reliability in AI-generated content. Her work encompasses the development of tools and benchmarks such as VeriScore, an automatic metric for evaluating the factuality of long-form text generation, and BEARCUBS, a benchmark for assessing AI agents' ability to identify factual information from web content.
Visual Agents: What it takes to build an agent that can navigate GUIs like humans
We’ll examine conceptual frameworks, potential applications, and future directions of technologies that can “see” and “act” with increasing independence. The discussion will touch on both current limitations and promising horizons in this evolving field.
About the Speaker
Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He has a deep interest in RAG, agents, and multimodal AI.
Implementing a Practical Vision-Based Android AI Agent
In this talk I will share practical details of designing and implementing Android AI agents using deki.
From theory we will move to practice and the use of these agents in industry and production. For end users, this means remote use of Android phones or automation of standard tasks, such as:
- "Write my friend 'some_name' in WhatsApp that I'll be 15 minutes late"
- "Open Twitter in the browser and write a post about 'something'"
- "Read my latest notifications and say if there are any important ones"
- "Write a LinkedIn post about 'something'"
And for professionals - to enable agentic testing, a new type of test that only became possible because of the popularization of LLMs and AI agents that use them as a reasoning core.
About the Speaker
Rasul Osmanbayli is a senior Android developer at Kapital Bank in Baku, Azerbaijan, the largest private bank in the country. He created deki, an image description model that served as the foundation for an Android AI agent that achieved high results on two different benchmarks: Android World and Android Control.
He previously worked as an Android and backend developer for various companies in Istanbul, Türkiye. He is also an MS student at Istanbul Aydin University in Istanbul, Türkiye.
Aug 15 - Visual Agent Workshop Part 1: Navigating the GUI Agent Landscape
Welcome to the three-part Visual Agents Workshop virtual series, your hands-on opportunity to learn about visual agents: how they work, how to develop them, and how to fine-tune them.
Date and Time
Aug 15, 2025 at 9 AM Pacific
Part 1: Navigating the GUI Agent Landscape
Understanding the Foundation Before Building
The GUI agent field is evolving rapidly, but success requires an understanding of what came before. In this opening session, we'll map the terrain of GUI agent research—from the early days of MiniWoB's simplified environments to today's complex, multimodal systems tackling real-world applications. You'll discover why standard vision models fail catastrophically on GUI tasks, explore the annotation bottlenecks that make GUI datasets so expensive to create, and understand the platform fragmentation that makes "click a button" mean twenty different things across datasets.
We'll dissect the most influential datasets (Mind2Web, AITW, Rico) and models that have shaped the field, examining their strengths, limitations, and the research gaps they reveal. By the end, you'll have a clear picture of where GUI agents excel, where they struggle, and, most importantly, where the opportunities lie for your own contributions.
About the Instructor
Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He has a deep interest in RAG, agents, and multimodal AI.
- Aug 20 - In-Person Raleigh AI, ML and Computer Vision Meetup (TEKsystems, Raleigh, NC)
Join us for an evening of talks from experts on AI, ML, and computer vision, co-presented by Voxel51 and TEKsystems.
Date and Time
- Aug 20, 2025
- 5:30 PM - 8:30 PM
Location
TEKsystems
4300 Edwards Mill Rd
Raleigh, NC
Adapting Vision Foundation Models to Medical Imaging: Strategies and Clinical Applications
Foundation models like SAM and DINO-v2 have shown strong performance on natural image tasks. However, when applied directly to medical imaging, they often underperform due to domain shifts, limited labeled data, and modality-specific challenges. This raises an important question: how can we adapt foundation models to work reliably and meaningfully in medical images?
In this talk, I will share our research efforts toward answering that question. I will begin by exploring several fine-tuning strategies for different data scenarios, ranging from few-shot labeled examples to large collections of unlabeled scans. These strategies aim to help identify the optimal adaptation framework under various data availability settings. I will then introduce a series of models we developed based on these insights. SegmentAnyBone and SegmentAnyMuscle are two SAM-based models designed for accurate bone and muscle segmentation across all body locations and a wide range of MRI sequences. MRI-Core is a self-supervised model that learns general-purpose MRI features from unlabeled data and can be easily adapted to multiple downstream tasks.
Finally, I will present a clinical application where one of these models is used to support abdominal surgical risk prediction. This example shows how I have explored using these models to contribute to real-world clinical decision-making. I hope this talk can share some of my experiences in building foundation models that are both practical for research and adaptable to clinical settings and to spark new insights and discussions in this field!
About the Speaker
Hanxue Gu is a 5th-year Ph.D. student in Electrical and Computer Engineering at Duke University, advised by Prof. Maciej A. Mazurowski under the Duke Spark Initiative. Her research sits at the intersection of machine learning and healthcare, with a focus on developing and adapting deep learning methods for medical image analysis, from application-oriented tools to foundational advancements.
Bias & Batch Effects in Medical Imaging
Medical AI models can exhibit concerning biases, such as the ability to predict race from radiology images, which is impossible for human experts. This talk will examine bias and batch effects in medical imaging, beginning with a histopathology case study to illustrate the origins of some of these biases. I'll cover detection methods, such as exploratory data analysis, and mitigation strategies, including careful cross-validation and model-level interventions. While research has shown that foundation models reduce some biases, they don't eliminate the problem entirely. Bias represents a fundamental challenge in medical AI requiring early detection, careful validation, and tailored mitigation approaches.
About the Speaker
Heather D. Couture is a consultant and founder of Pixel Scientia Labs, where she partners with mission-driven founders and R&D teams to support applications of computer vision for people and planetary health. She has a PhD in Computer Science and has published in top-tier computer vision and medical imaging venues. She hosts the Impact AI Podcast and writes regularly on LinkedIn, for her newsletter Computer Vision Insights, and for a variety of other publications.
Managing Medical Imaging Datasets: From Curation to Evaluation
High-quality data is the cornerstone of effective machine learning in healthcare. This talk presents practical strategies and emerging techniques for managing medical imaging datasets, from synthetic data generation and curation to evaluation and deployment.
We’ll begin by highlighting real-world case studies from leading researchers and practitioners who are reshaping medical imaging workflows through data-centric practices. The session will then transition into a hands-on tutorial using FiftyOne, the open-source platform for visual dataset inspection and model evaluation. Attendees will learn how to load, visualize, curate, and evaluate medical datasets across various imaging modalities.
Whether you're a researcher, clinician, or ML engineer, this talk will equip you with practical tools and insights to improve dataset quality, model reliability, and clinical impact.
About the Speaker
Paula Ramos has a PhD in computer vision and machine learning, with more than 20 years of experience in the technology field. Since the early 2000s in Colombia, she has been developing novel integrated engineering technologies, mainly in computer vision, robotics, and machine learning applied to agriculture. During her PhD and postdoctoral research, she deployed multiple low-cost smart edge and IoT computing technologies that users without expertise in computer vision systems, such as farmers, can operate. The central objective of Paula's research has been to develop intelligent systems and machines that can understand and recreate the visual world around us to solve real-world needs, such as those of the agricultural industry.
Learning with Small Datasets in Real-World Medical Imaging Applications
The talk will explore approaches that are effective when developing computer vision models for real-world medical imaging applications in situations where available datasets are limited in size. This setting is of special interest because the most challenging prediction problems in the medical domain have usually low incidence rates, thus resulting in (relatively) small datasets. Specifically, it will consider data-efficient architectures, multi-task learning and data augmentation through pseudo interventions. For illustration, a use case in which volumetric ophthalmology images are used to predict geographic atrophy conversion will be discussed.
About the Speaker
Ricardo Henao is a quantitative scientist and Associate Professor in the Departments of Biostatistics and Bioinformatics, Electrical and Computer Engineering (ECE), and Surgery at Duke University, and a member of the Information Initiative at Duke (iiD), Duke AI Health, and the Duke Clinical Research Institute (DCRI). He also serves as the Associate Director of Clinical Trials AI at DCRI. His recent work focuses on the development of machine learning models, predominantly deep learning and representation learning approaches, for the analysis and interpretation of clinical and biological data, with applications to predictive modeling for diverse clinical outcomes.
Aug 21 - AI, ML and Computer Vision Meetup en Español
Hear talks from experts on cutting-edge topics in AI, ML, and computer vision, presented in Spanish.
Date and Time
Aug 21 at 9 AM Pacific
Location
Virtual. Register for the Zoom.
I Want to Be Part of the AI World: How Do I Get There?
In this talk, I will share my personal journey into the world of artificial intelligence (AI), starting with my training as an electronics engineer and my PhD in neuroinformatics. I will highlight how my award-winning thesis on realistic volumetric models for accurate EEG source localization opened doors to opportunities in digital signal processing and 3D vision. With teaching experience at the Universidad Nacional de Colombia and certifications in machine learning and deep learning, I will discuss how these milestones led me to work as a curriculum developer for DeepLearning.AI, offering valuable lessons for anyone who wants to follow a similar path.
About the Speaker
Ernesto Cuartas is an electronics engineer with a PhD in neuroinformatics. His award-winning PhD thesis was "Forward volumetric modeling framework for realistic head models towards accurate EEG source localization". He is an associate professor at the Universidad Nacional de Colombia and an expert in implementing and developing projects in digital signal processing, digital image processing, 3D vision, computer graphics, computational geometry, photogrammetry, and artificial intelligence, with professional certifications in machine learning, deep learning, and data engineering. He currently works as a curriculum developer/engineer for DeepLearning.AI.
Master Your Medical Data: From Curation to Clinical Impact
High-quality data is the cornerstone of effective machine learning in healthcare. This talk presents practical strategies and emerging techniques for managing medical imaging datasets, from synthetic data generation and curation to evaluation and deployment.
We will begin with real-world case studies from researchers and practitioners who are transforming their medical imaging workflows through data-centric practices. We will then move to a hands-on tutorial using FiftyOne, the open-source platform for visual dataset inspection and model evaluation. Attendees will learn how to load, visualize, curate, and evaluate medical datasets across different imaging modalities.
Whether you are a researcher, clinician, or ML engineer, this talk will give you practical tools and ideas to improve the quality of your data, the reliability of your models, and their clinical impact.
About the Speaker
Paula Ramos has a PhD in computer vision and machine learning, with more than 20 years of experience in the technology field. Since the early 2000s in Colombia, she has developed novel integrated engineering technologies, mainly in computer vision, robotics, and machine learning applied to agriculture.
Multi-Source and Embedded AI Agents
I will demonstrate how to build contextually aware AI agents capable of answering questions and taking actions across multiple private systems, and how to implement semantic RAG across disparate data sources, embedded in existing systems, all without the need for complex MLOps infrastructure.
About the Speaker
Kevin Blanco is a Senior DevRel Advocate and international speaker with more than 15 years in technology leadership. He has designed AI strategies at IBM Watson and developed solutions for Google, Microsoft, and Nintendo.
Beyond the Model: Methodology and Best Practices for Leading Successful AI Projects with CPMAI
The success of AI projects depends not only on the model or the data, but on how they are managed from the start. In this talk we will explore the CPMAI (Cognitive Project Management for AI) methodology, endorsed by the Project Management Institute (PMI): a structured framework that lets AI teams align their initiatives with business objectives, manage ethical risks, and improve outcomes. We will share best practices that technical professionals can adapt to improve value delivery at every phase of a project and to implement ethical and responsible AI solutions.
About the Speaker
Ivonne Mejía B. is a specialist in technology project management with more than 20 years of international experience in the private and academic sectors in Mexico, Canada, and the United States. She is certified in CPMAI™, PMP®, and Prosci®, and holds a diploma in Technology Leadership from UC Berkeley. She enjoys collaborating, learning in community, and sharing her experience to help organizations define AI transformation strategies and lead ethical and responsible solutions.
Past events (49)
July 24 - Women in AI (this event has passed)