Posts

Implementing Function Calling LLMs without Fear

Implementing Function Calling LLMs without Fear is a talk that I gave at a C4AI/RealmOne Happy Hour Tech Meetup in Columbia, Maryland. The slides of the talk are below: Abstract Description: For an AI system to be an agent rather than a simple chatbot, it needs to be able to do work on behalf of its users, often accomplished through the use of Function Calling LLMs. Instruction-based models can identify external functions to call for additional input or context before creating a final response without the need for any additional training. However, giving an AI system access to databases, APIs, or even tools like our calendars is fraught with security concerns and task validation nightmares. In this talk, we’ll discuss the basics of how Function Calling works and think through the best practices and techniques to ensure that your agents work for you, not against you! ...

Privacy and Security in the Age of Generative AI

Privacy and Security in the Age of Generative AI is a talk that I gave at ODSC West 2024 in Burlingame, California. The slides of the talk are below: An updated presentation that I gave at C4AI on April 15, 2025 in Columbia, Maryland is below: Abstract From sensitive data leakage to prompt injection and zero-click worms, LLMs and generative models are the new cyber battleground for hackers. As more AI models are deployed in production, data scientists and ML engineers can’t ignore these problems. The good news is that we can influence privacy and security in the machine learning lifecycle using data specific techniques. In this talk, we’ll review some of the newest security concerns affecting LLMs and deep learning models and learn how to embed privacy into model training with ACLs and differential privacy, secure text generation and function-calling interfaces, and even leverage models to defend other models. ...

Smart Global Replication Using Reinforcement Learning

Smart Global Replication using Reinforcement Learning is a talk that I gave at KubeCon + CloudNative North America 2023 in Chicago, IL. The video of the talk is below: Description There are many great reasons to replicate data across Kubernetes clusters in different geographic regions: e.g. for disaster recovery and to ensure the best possible user experiences. Unfortunately, global replication is not easy; not just because of the difficulty in consistency reasoning that it introduces, but also due to the increased cost of provisioning multiple volumes that exponentially duplicate ingress and egress. Wouldn’t it be great if our systems could learn the optimal placement of storage blocks so that total replication was not necessary? Wouldn’t it be even better if our replication messaging was reduced ensuring communication only between the minimally necessary set of storage nodes? We show a system that uses multi-armed bandits to perform such an optimization; dynamically adjusting how data is replicated based on usage. We demonstrate the savings achieved and system performance using a real world system: the TRISA Global Travel Rule Compliance Directory. ...

DIY Consensus: Crafting Your Own Distributed Code (with Benjamin Bengfort)

DIY Consensus: Crafting Your Own Distributed Code (with Benjamin Bengfort) Description How do distributed systems work? If you’ve got a database spread over three servers, how do they elect a leader? How does that change when we spread those machines out across data centers, situated around the globe? Do we even need to understand how it works, or can we relegate those problems to an off the shelf tool like Zookeeper? Joining me this week is Distributed Systems Doctor—Benjamin Bengfort—for a deep dive into consensus algorithms. We start off by discussing how much of “the clustering problem” is your problem, and how much can be handled by a library. We go through many of the constraints and tradeoffs that you need to understand either way. And we eventually reach Benjamin’s surprising message - maybe the time is ripe to roll your own. Should we be writing our own bespoke Raft implementations? And if so, how hard would that be? What guidance can he offer us? Somewhere in the recording of this episode, I decided I want to sit down and try to implement a leader election protocol. Maybe you will too. And if not, you’ll at least have a better appreciation for what it takes. Distributed systems used to be rocket science, but they’re becoming deployment as usual. This episode should help us all to keep up! ...

Faster Protocol Buffer Serialization

Performance is key when building streaming gRPC services. When you’re trying to maximize throughput (e.g. messages per second) benchmarking is essential to understanding where the bottlenecks in your application are. However, as a start, you can pretty much guarantee that one bottleneck is going to be the serialization (marshaling) and deserialization (unmarshaling) of protocol buffer messages. We have a use case where the server does not need all of the information in the message in order to process the message. E.g. we have header information such as IDs and client information that the server does need to update as part of processing. The other part of the message is data that needs to be saved to disk and does not have to be unmarshaled until it’s read. However, our protocol buffer schema right now is “flat” — meaning that all fields whether they are required for processing or not are defined by a single protocol buffer message. ...

Atomic vs Mutex

When implementing Go code, I find myself chasing increased concurrency performance by trying to reduce the number of locks in my code. Often I wonder if using the sync/atomic package is a better choice because I know (as proved by this blog post) that atomics have far more performance than mutexes. The issue is that reading on the internet, including the package documentation itself strongly recommends relying on channels, then mutexes, and finally atomics only if you know what you’re doing. ...

Nonlinear Workflow for Planning Software Projects

Good software development achieves complexity by describing the interactions between simpler components. Although we tend to think of software processes as step-by-step “wizards”, design and decoupling of components often means that the interactions are non-linear. So why should our software project planning be defined in a linear progression of steps with time estimates? Can we plan projects using a non-linear workflow that mirrors how we think about component design? ...