Value Matching Methods

This page provides an overview of all value matching methods available in the BDI-Kit library. Some methods reuse implementations from other libraries such as PolyFuzz (e.g., `embedding` and `tfidf`), while others were implemented specifically for bdikit (e.g., `llm`). For usage details, see the documentation of `match_values()` in the `api` module.

bdikit methods
| Method | Class | Description |
|---|---|---|
| `llm` | `LLM` | Leverages LLMs to identify and select the most accurate value matches. Supports multiple models, with gpt-4o-mini used as the default. |
| `llm_numeric` | `LLMNumeric` | Employs LLMs to perform numeric value transformations, such as converting ages from years to months. Supports multiple models, with gpt-4o-mini used as the default. |
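The LLM-based methods above delegate the matching decision to a model. The sketch below shows one general way such a step can be structured: list the candidate target values in a prompt, ask the model to pick one, and accept the reply only if it is one of the candidates. It is not bdikit's implementation. `build_prompt`, `llm_match`, and the `ask_llm` callable are hypothetical names, and a stub function stands in for a real model such as gpt-4o-mini.

```python
from typing import Callable, Optional, Sequence


def build_prompt(source_value: str, candidates: Sequence[str]) -> str:
    # Hypothetical prompt: list every candidate and ask for exactly one of them (or "none").
    options = "\n".join(f"- {c}" for c in candidates)
    return (
        f"Source value: {source_value}\n"
        f"Candidate target values:\n{options}\n"
        "Reply with the one candidate that best matches the source value, "
        "copied exactly, or 'none' if no candidate matches."
    )


def llm_match(
    source_value: str,
    candidates: Sequence[str],
    ask_llm: Callable[[str], str],  # hypothetical: sends a prompt to a model, returns its reply text
) -> Optional[str]:
    reply = ask_llm(build_prompt(source_value, candidates)).strip()
    # Accept the reply only if it names an actual candidate (case-insensitive);
    # anything else, including "none", is treated as no match.
    by_lower = {c.lower(): c for c in candidates}
    return by_lower.get(reply.lower())


# Stub standing in for a real model such as gpt-4o-mini.
def fake_llm(prompt: str) -> str:
    return "Female"


print(llm_match("F", ["Male", "Female", "Unknown"], fake_llm))  # prints "Female"
```

Validating the reply against the candidate list keeps a free-form model answer from introducing values that do not exist in the target vocabulary.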

Methods from other libraries
| Method | Class | Description |
|---|---|---|
| `tfidf` | `TFIDF` | Approximates edit distance with character n-gram TF-IDF: each string is represented by its n-grams, weighted by term frequency-inverse document frequency (TF-IDF), and strings are compared by the n-gram features they share. |
| `edit_distance` | `EditDistance` | Computes the edit distance between lists of strings using a customizable scorer that supports various distance and similarity metrics. |
| `embedding` | `Embeddings` | Matches values by the cosine similarity of their embeddings. By default, it uses the bert-base-multilingual-cased model to generate contextualized embeddings, which enables multilingual matching. |
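To make the string-based techniques concrete, here is a minimal pure-Python sketch of character n-gram TF-IDF matching and normalized edit-distance matching. It assumes trigrams, lowercase comparison, and the smoothed IDF formula that scikit-learn uses by default. It is an illustration, not the PolyFuzz code that bdikit wraps, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def char_ngrams(text: str, n: int = 3) -> List[str]:
    # Lowercase and pad with spaces so word boundaries also produce n-grams.
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def tfidf_vectors(docs: Sequence[str], n: int = 3) -> List[Dict[str, float]]:
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    df = Counter(g for c in counts for g in c)  # number of strings containing each n-gram
    num_docs = len(docs)
    vectors = []
    for c in counts:
        # Smoothed IDF: n-grams shared by many strings get lower weights.
        vec = {g: tf * (math.log((1 + num_docs) / (1 + df[g])) + 1) for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({g: w / norm for g, w in vec.items()})  # L2-normalize
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    # Vectors are already L2-normalized, so the dot product is the cosine similarity.
    return sum(w * b.get(g, 0.0) for g, w in a.items())


def tfidf_match(sources: Sequence[str], targets: Sequence[str], n: int = 3) -> List[Tuple[str, str, float]]:
    # Compute IDF over both lists so shared n-grams are weighted consistently.
    vecs = tfidf_vectors(list(sources) + list(targets), n)
    src_vecs, tgt_vecs = vecs[:len(sources)], vecs[len(sources):]
    results = []
    for s, sv in zip(sources, src_vecs):
        scores = [cosine(sv, tv) for tv in tgt_vecs]
        best = max(range(len(targets)), key=lambda i: scores[i])
        results.append((s, targets[best], scores[best]))
    return results


def levenshtein(a: str, b: str) -> int:
    # Classic dynamic programming, keeping only the previous row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    # Normalize to [0, 1]; 1.0 means the strings are identical.
    return 1.0 - levenshtein(a, b) / (max(len(a), len(b)) or 1)


def edit_distance_match(sources: Sequence[str], targets: Sequence[str]) -> List[Tuple[str, str, float]]:
    results = []
    for s in sources:
        scores = [edit_similarity(s.lower(), t.lower()) for t in targets]
        best = max(range(len(targets)), key=lambda i: scores[i])
        results.append((s, targets[best], scores[best]))
    return results


sources = ["Male", "Femal", "unknwn"]
targets = ["male", "female", "unknown"]
for s, t, score in tfidf_match(sources, targets):
    print(f"tfidf:         {s} -> {t} ({score:.2f})")
for s, t, score in edit_distance_match(sources, targets):
    print(f"edit_distance: {s} -> {t} ({score:.2f})")
```

Both approaches pick `male`, `female`, and `unknown` here, but for different reasons: TF-IDF rewards shared n-grams weighted by how rare they are across all values, while edit distance counts the character insertions, deletions, and substitutions needed to turn one string into the other.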