Home | Bernardo de Lemos
Image embedding is a powerful tool in machine learning that allows us to convert images into numerical vectors, making it easier to analyze, compare, and use them in downstream tasks like classification, clustering, or recommendation systems. This post goes over the Image Embedding Inference (IEI) project, which provides a REST API for generating embeddings using pretrained models from Hugging Face. This project is part of a larger one I’m building which focuses on image retrieval and multi-modal knowledge building.
Intent routing is essential for many Generative AI (GenAI) applications, enabling systems to accurately interpret user queries and route them to the appropriate actions. With the rise of Large Language Models (LLMs), their flexibility and contextual understanding have made them a go-to choice for intent classification tasks. However, embedding-based rerankers offer a compelling alternative, delivering high accuracy with significantly lower computational costs and latency.
This analysis compares the performance of a reranker model (BAAI/bge-reranker-large) versus a state-of-the-art LLM (claude-3-haiku-20240307) for intent routing, focusing on their trade-offs in efficiency, accuracy, and suitability for real-world use cases.
Building a service to serve machine learning models, such as a BERT-based embedding generator, requires careful consideration of factors like performance, ease of development, and maintainability. This article explores two implementations of such a service—one in Rust and the other in Python—and highlights their design choices, strengths, and trade-offs.
To explore the efficiency of serving machine learning models, I conducted a benchmark comparison between Python and Rust implementations. My initial hypothesis was that Rust, with its reputation for high performance and low-level control, would significantly outperform Python.
Additionally, I aimed to investigate how different concurrency mechanisms, such as RwLock and Mutex, and the choice between sharing or not sharing the model state among workers would influence performance.
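As a rough illustration of the two sharing strategies, here is a minimal sketch using only the standard library. The Model type, its weights, and the embed method are hypothetical stand-ins, not the benchmarked service: a Mutex serializes every call, while an RwLock lets read-only inference calls proceed in parallel.

```rust
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

/// Hypothetical stand-in for a loaded model (assumption, not the real service).
struct Model {
    weights: Vec<f32>,
}

impl Model {
    /// Toy "inference": a weighted sum, read-only on the model state.
    fn embed(&self, x: f32) -> f32 {
        self.weights.iter().map(|w| w * x).sum()
    }
}

/// Workers share one model behind a Mutex: only one call runs at a time.
fn run_with_mutex(model: Model, workers: usize) -> Vec<f32> {
    let shared = Arc::new(Mutex::new(model));
    let handles: Vec<_> = (0..workers)
        .map(|i| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                let guard = shared.lock().unwrap();
                guard.embed(i as f32)
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

/// Workers share one model behind an RwLock: readers may run concurrently.
fn run_with_rwlock(model: Model, workers: usize) -> Vec<f32> {
    let shared = Arc::new(RwLock::new(model));
    let handles: Vec<_> = (0..workers)
        .map(|i| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                let guard = shared.read().unwrap();
                guard.embed(i as f32)
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let a = run_with_mutex(Model { weights: vec![0.5, 1.5] }, 3);
    let b = run_with_rwlock(Model { weights: vec![0.5, 1.5] }, 3);
    println!("mutex: {:?}, rwlock: {:?}", a, b);
}
```

The third option mentioned above, not sharing the model at all, would correspond to constructing one Model per worker instead of wrapping a single one in an Arc.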
In this blog post, we tinker with the actor model in Rust. It’s a very interesting exercise given Rust’s unique features.
Rust’s strengths in memory safety and concurrency make it a great choice for building robust, concurrent systems. In this post, we’ll explore a program that implements an actor model in Rust using the asynchronous runtime Tokio. This example illustrates message-passing, state management, and graceful shutdown in a concurrent environment.
In this blog post, we’ll explore how to implement a Singleton pattern in Rust. The Singleton design pattern ensures a class has only one instance while providing a global access point to that instance. Rust’s ownership model and thread safety make this implementation an interesting challenge.
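One common way to get a thread-safe, lazily initialized singleton is the standard library's OnceLock. The sketch below is an assumption about how this could look, not necessarily the approach taken in the post; the Config type and its fields are hypothetical.

```rust
use std::sync::{Mutex, OnceLock};

/// Hypothetical application-wide state, used only for illustration.
struct Config {
    name: String,
    hits: Mutex<u64>,
}

/// Global access point: returns the single shared instance,
/// creating it on first use. Initialization runs at most once,
/// even if several threads call this at the same time.
fn config() -> &'static Config {
    static INSTANCE: OnceLock<Config> = OnceLock::new();
    INSTANCE.get_or_init(|| Config {
        name: String::from("demo"),
        hits: Mutex::new(0),
    })
}

/// Mutates the shared state through interior mutability
/// and returns the updated counter.
fn record_hit() -> u64 {
    let mut hits = config().hits.lock().unwrap();
    *hits += 1;
    *hits
}

fn main() {
    record_hit();
    record_hit();
    // Every call hands back a reference to the same instance.
    assert!(std::ptr::eq(config(), config()));
    let hits = *config().hits.lock().unwrap();
    println!("{} has been hit {} times", config().name, hits);
}
```

Because the instance is only ever reachable through a shared reference, any mutable state has to sit behind a Mutex or similar, which is where Rust's ownership rules make the pattern more explicit than in most languages.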
This blog post provides insights into Golang’s concurrent programming features. It delves into the implementation of actors, independent entities communicating through messages. We walk through a simple actor system implementation in Go, which showcases actor creation, message sending, and concurrent message processing, highlighting the principles of the actor model.
While candle provides a low-level API for building neural network models, some users (including myself) may prefer a more intuitive and user-friendly way to define, compile, and train their models. That’s why I propose the addition of a high-level Keras-like API to candle. This API would allow users to define models in a sequential manner by adding layers one after the other. It would also provide methods for model compilation, training, and evaluation.
A first look at Mojo. In this post I scratch the surface of Mojo’s syntax and compare its borrow semantics to Rust’s.
In this blog post, we’ll explore a Rust program that exposes an existing command-line tool via a REST API. This program leverages the Actix-web framework to create a simple HTTP server, handles HTTP requests, and interacts with an external CLI tool.
Rust provides several traits that are fundamental for comparing and ordering values. These traits include Ord, Eq, PartialOrd, and PartialEq. In this blog post, we’ll explore what these traits are, when they can and can’t be derived, and how they can be used to sort a vector.
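A small sketch of the idea, with hypothetical Version and Reading types chosen for illustration: a struct made only of integers can derive all four traits and be sorted with sort(), while a struct holding an f64 can derive only the partial traits (floats have NaN, so they implement neither Eq nor Ord) and needs sort_by with an explicit comparison.

```rust
use std::cmp::Ordering;

/// All fields implement Ord, so the full set of traits can be derived.
/// Derived ordering compares fields in declaration order.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u32,
    minor: u32,
}

/// f64 implements only PartialEq/PartialOrd, so Eq and Ord
/// cannot be derived for this struct.
#[derive(Debug, Clone, PartialEq, PartialOrd)]
struct Reading {
    value: f64,
}

/// Vec::sort requires Ord.
fn sorted_versions(mut versions: Vec<Version>) -> Vec<Version> {
    versions.sort();
    versions
}

/// Without Ord, fall back to sort_by and decide what to do when
/// partial_cmp returns None (here: treat the pair as equal).
fn sorted_readings(mut readings: Vec<Reading>) -> Vec<Reading> {
    readings.sort_by(|a, b| a.partial_cmp(b).unwrap_or(Ordering::Equal));
    readings
}

fn main() {
    let versions = sorted_versions(vec![
        Version { major: 1, minor: 2 },
        Version { major: 0, minor: 9 },
        Version { major: 1, minor: 0 },
    ]);
    let readings = sorted_readings(vec![
        Reading { value: 2.5 },
        Reading { value: -1.0 },
    ]);
    println!("{:?}\n{:?}", versions, readings);
}
```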
This is the first blog post of a series that will focus on time series analysis and forecasting of stock prices. This post shows how to get data from Yahoo Finance and parse JSON responses in Rust.
In this blog post I show the main differences between dynamic dispatch (dyn) and generics (impl / <T>), using a simple program that creates different database connectors.
NB: this blog post assumes some familiarity with Rust.
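To make the contrast concrete, here is a minimal sketch with a hypothetical Connector trait and two made-up connector types (none of these names come from the post). The generic functions are resolved at compile time, one copy per concrete type, while the dyn version goes through a vtable and allows the concrete type to be chosen at runtime.

```rust
/// Hypothetical trait for a database connector (illustrative only).
trait Connector {
    fn connect(&self) -> String;
}

struct Postgres;
struct Sqlite;

impl Connector for Postgres {
    fn connect(&self) -> String {
        String::from("connected to postgres")
    }
}

impl Connector for Sqlite {
    fn connect(&self) -> String {
        String::from("connected to sqlite")
    }
}

/// Static dispatch with impl Trait in argument position.
fn connect_impl(connector: &impl Connector) -> String {
    connector.connect()
}

/// Static dispatch with an explicit type parameter; equivalent to the above.
fn connect_generic<T: Connector>(connector: &T) -> String {
    connector.connect()
}

/// Dynamic dispatch: the method is looked up through a vtable at runtime.
fn connect_dyn(connector: &dyn Connector) -> String {
    connector.connect()
}

/// Only a trait object lets one function return different concrete
/// types depending on a runtime value.
fn connector_from_name(name: &str) -> Option<Box<dyn Connector>> {
    match name {
        "postgres" => Some(Box::new(Postgres)),
        "sqlite" => Some(Box::new(Sqlite)),
        _ => None,
    }
}

fn main() {
    println!("{}", connect_impl(&Postgres));
    println!("{}", connect_generic(&Sqlite));
    if let Some(connector) = connector_from_name("sqlite") {
        println!("{}", connect_dyn(connector.as_ref()));
    }
}
```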