Privacy and Security in the Age of Generative AI

Privacy and Security in the Age of Generative AI is a talk that I gave at ODSC West 2024 in Burlingame, California. The slides of the talk are below. Abstract: From sensitive data leakage to prompt injection and zero-click worms, LLMs and generative models are the new cyber battleground for hackers. As more AI models are deployed in production, data scientists and ML engineers can’t ignore these problems. The good news is that we can influence privacy and security in the machine learning lifecycle using data-specific techniques. In this talk, we’ll review some of the newest security concerns affecting LLMs and deep learning models and learn how to embed privacy into model training with ACLs and differential privacy, secure text generation and function-calling interfaces, and even leverage models to defend other models. ...

October 30, 2024 · 3 min · 531 words · Benjamin Bengfort

Smart Global Replication Using Reinforcement Learning

Smart Global Replication using Reinforcement Learning is a talk that I gave at KubeCon + CloudNativeCon North America 2023 in Chicago, IL. The video of the talk is below. Description: There are many great reasons to replicate data across Kubernetes clusters in different geographic regions, e.g. for disaster recovery and to ensure the best possible user experiences. Unfortunately, global replication is not easy, not just because of the difficulty in consistency reasoning that it introduces, but also due to the increased cost of provisioning multiple volumes that exponentially duplicate ingress and egress. Wouldn’t it be great if our systems could learn the optimal placement of storage blocks so that total replication was not necessary? Wouldn’t it be even better if our replication messaging was reduced, ensuring communication only between the minimally necessary set of storage nodes? We show a system that uses multi-armed bandits to perform such an optimization, dynamically adjusting how data is replicated based on usage. We demonstrate the savings achieved and system performance using a real-world system: the TRISA Global Travel Rule Compliance Directory. ...

November 7, 2023 · 1 min · 178 words · Benjamin Bengfort
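
As a rough illustration of the multi-armed bandit idea mentioned in the abstract above, here is a minimal sketch of an epsilon-greedy bandit that learns which region is the most useful place to keep a replica. It is not the system from the talk: the class name, the region names, the access probabilities, and the reward definition (1 when a request is served from the chosen region) are all hypothetical, and epsilon-greedy is just one simple bandit strategy picked for brevity.

```python
import random


class EpsilonGreedyPlacement:
    """Epsilon-greedy bandit where each arm is a candidate region for a replica.

    Hypothetical illustration only; not the implementation shown in the talk.
    """

    def __init__(self, regions, epsilon=0.1, seed=0):
        self.regions = list(regions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {r: 0 for r in self.regions}    # pulls per region
        self.values = {r: 0.0 for r in self.regions}  # running mean reward

    def select(self):
        # Explore a random region with probability epsilon, otherwise exploit
        # the region with the highest estimated reward so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.regions)
        return max(self.regions, key=lambda r: self.values[r])

    def update(self, region, reward):
        # Incremental update of the mean reward observed for this region.
        self.counts[region] += 1
        n = self.counts[region]
        self.values[region] += (reward - self.values[region]) / n


def simulate(access_prob, rounds=5000, seed=0):
    """Run the bandit against an assumed workload.

    access_prob maps region -> probability that a request is served locally
    when the replica lives in that region (an assumption for this sketch).
    """
    bandit = EpsilonGreedyPlacement(access_prob.keys(), seed=seed)
    env = random.Random(seed + 1)
    for _ in range(rounds):
        region = bandit.select()
        reward = 1.0 if env.random() < access_prob[region] else 0.0
        bandit.update(region, reward)
    return bandit


if __name__ == "__main__":
    workload = {"us-east": 0.2, "eu-west": 0.7, "ap-south": 0.4}
    bandit = simulate(workload)
    for region in bandit.regions:
        print(region, bandit.counts[region], round(bandit.values[region], 3))
```

Running the script prints how often each region was chosen and its estimated reward. In a real deployment the reward would come from observed usage rather than a simulated probability, and the set of arms would likely be richer than a single region per object.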

DIY Consensus: Crafting Your Own Distributed Code (with Benjamin Bengfort)

Description: How do distributed systems work? If you’ve got a database spread over three servers, how do they elect a leader? How does that change when we spread those machines out across data centers, situated around the globe? Do we even need to understand how it works, or can we relegate those problems to an off-the-shelf tool like Zookeeper? Joining me this week is Distributed Systems Doctor Benjamin Bengfort for a deep dive into consensus algorithms. We start off by discussing how much of “the clustering problem” is your problem, and how much can be handled by a library. We go through many of the constraints and tradeoffs that you need to understand either way. And we eventually reach Benjamin’s surprising message: maybe the time is ripe to roll your own. Should we be writing our own bespoke Raft implementations? And if so, how hard would that be? What guidance can he offer us? Somewhere in the recording of this episode, I decided I want to sit down and try to implement a leader election protocol. Maybe you will too. And if not, you’ll at least have a better appreciation for what it takes. Distributed systems used to be rocket science, but they’re becoming deployment as usual. This episode should help us all to keep up! ...

August 30, 2023 · 2 min · 227 words · Benjamin Bengfort

Visual Diagnostics for More Effective Machine Learning

Description: Modeling is often treated as a search activity: find some combination of features, algorithm, and hyperparameters that yields the best score after cross-validation. In this talk, we will explore how to steer the model selection process with visual diagnostics and the Yellowbrick library, leading to more effective and more interpretable results and faster experimental workflows. ...

January 10, 2019 · 1 min · 63 words · Benjamin Bengfort

Understanding Machine Learning Through Visualizations with Benjamin Bengfort and Rebecca Bilbro - Episode 166

Description: Machine learning models are often inscrutable, and it can be difficult to know whether you are making progress. To improve feedback and speed up iteration ...

June 17, 2018 · 1 min · 40 words · Benjamin Bengfort

Visual Pipelines for Text Analysis

Description: Employing machine learning in practice is half search, half expertise, and half blind luck. In this talk, we will explore how to make the luck half less blind by using visual pipelines to steer model selection from raw input to operational prediction. We will look specifically at extending transformer pipelines with visualizers for sentiment analysis and topic modeling of text corpora.

June 24, 2017 · 1 min · 66 words · Benjamin Bengfort

Data Product Architectures: O'Reilly Webinar

Description: Data products derive their value from data and generate new data in return. As a result, machine-learning techniques must be applied to their architecture and development. Machine learning fits models to make predictions on unknown inputs and must be generalizable and adaptable. As such, fitted models cannot exist in isolation; they must be operationalized and user facing so that applications can benefit from the new data, respond to it, and feed it back into the data product. ...

December 7, 2016 · 1 min · 175 words · Benjamin Bengfort