Value Matching Methods
This page provides an overview of all value matching methods available in the bdikit library.
Some methods reuse implementations from other libraries such as PolyFuzz (e.g., embedding and tfidf), while others are implemented originally for bdikit (e.g., gpt).
To see how to use these methods, please refer to the documentation of match_values() in the api module.
| Method | Class | Description |
|---|---|---|
|  |  | Leverages a large language model (GPT-4) to identify and select the most accurate value matches. |
| Method | Class | Description |
|---|---|---|
|  |  | Employs a character-based n-gram TF-IDF approach to approximate edit distance by capturing the frequency and contextual importance of n-gram patterns within strings. This method leverages the Term Frequency-Inverse Document Frequency (TF-IDF) weighting to quantify the similarity between strings based on their shared n-gram features. |
|  |  | Computes the edit distance between lists of strings with a customizable scorer that supports various distance and similarity metrics. |
|  |  | A value-matching algorithm that leverages the cosine similarity of value embeddings for precise comparisons. By default, it utilizes the bert-base-multilingual-cased model to generate contextualized embeddings, enabling effective multilingual matching. |
|  |  | Uses the cosine similarity of FastText embeddings to accurately compare and align values, capturing both semantic and subword-level similarities. |
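To make the character n-gram TF-IDF idea described above more concrete, the following is a minimal, standard-library sketch of the general technique. It is not bdikit's or PolyFuzz's implementation: the function names (`char_ngrams`, `tfidf_vectors`, `match_values_tfidf`), the trigram size, the space padding, and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def tfidf_vectors(strings, n=3):
    """Build one L2-normalized TF-IDF vector (as a dict) per input string."""
    counts = [Counter(char_ngrams(s, n)) for s in strings]
    # Document frequency: in how many strings each n-gram appears.
    doc_freq = Counter(gram for c in counts for gram in c)
    total = len(strings)
    vectors = []
    for c in counts:
        # Smoothed IDF (an assumption): common n-grams keep a small weight.
        vec = {
            g: tf * (math.log((1 + total) / (1 + doc_freq[g])) + 1)
            for g, tf in c.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({g: w / norm for g, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalized sparse vectors."""
    return sum(w * b.get(g, 0.0) for g, w in a.items())


def match_values_tfidf(source_values, target_values, n=3, threshold=0.0):
    """Map each source value to its most similar target value and score."""
    if not target_values:
        return {}
    vectors = tfidf_vectors(list(source_values) + list(target_values), n)
    src_vecs = vectors[:len(source_values)]
    tgt_vecs = vectors[len(source_values):]
    matches = {}
    for value, vec in zip(source_values, src_vecs):
        scores = [cosine(vec, t) for t in tgt_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        # Keep only matches whose similarity exceeds the threshold.
        if scores[best] > threshold:
            matches[value] = (target_values[best], scores[best])
    return matches


if __name__ == "__main__":
    source = ["New York", "Los Angles"]  # second value has a typo
    target = ["new york city", "los angeles", "chicago"]
    for value, (match, score) in match_values_tfidf(source, target).items():
        print(f"{value!r} -> {match!r} (similarity {score:.2f})")
```

Because a misspelled string still shares most of its trigrams with the correct spelling, the cosine similarity stays high for small edits, which is the sense in which this approach approximates edit distance without computing it for every pair.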