module metrics
Built-in evaluation suite for measuring the quality, utility, and privacy of synthetic tabular data generated by GReaT. All metrics inherit from BaseMetric and share a common interface:
result = SomeMetric().compute(real_data, synthetic_data)
Column types (numerical / categorical) are auto-detected but can be passed explicitly via num_cols and cat_cols.
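For example, to bypass auto-detection and declare column types explicitly (the column names below are placeholders for your own schema):

from be_great.metrics import ColumnShapes

result = ColumnShapes().compute(
    real_data,
    synthetic_data,
    num_cols=["age", "income"],        # numerical columns in your dataset
    cat_cols=["gender", "education"],  # categorical columns in your dataset
)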
class BaseMetric
Abstract base class for all GReaT evaluation metrics.
Subclasses must implement name(), direction(), and compute().
Methods:
- name() → str: Human-readable metric name
- direction() → str: "maximize" if higher is better, "minimize" if lower is better
- compute(real_data, synthetic_data, **kwargs) → dict: Compute the metric
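As a minimal sketch of a custom metric, assuming the abstract interface above (the import path of BaseMetric is an assumption, and RowCountRatio is a toy example, not part of the library):

from be_great.metrics import BaseMetric

class RowCountRatio(BaseMetric):
    """Toy metric: ratio of synthetic to real row counts."""

    def name(self) -> str:
        return "Row Count Ratio"

    def direction(self) -> str:
        return "maximize"

    def compute(self, real_data, synthetic_data, **kwargs) -> dict:
        # Guard against an empty real dataset to avoid division by zero
        return {"row_count_ratio": len(synthetic_data) / max(len(real_data), 1)}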
Statistical Metrics
class ColumnShapes
Per-column distribution similarity.
Uses the Kolmogorov-Smirnov test for numerical columns and Total Variation Distance for categorical columns. Returns a score in [0, 1] per column — 1.0 means identical distributions.
from be_great.metrics import ColumnShapes
result = ColumnShapes().compute(real_data, synthetic_data)
# result["column_shapes_mean"] -> average similarity across all columns
# result["column_shapes_std"] -> standard deviation
# result["column_shapes_detail"] -> per-column scores dict
Args (compute):
- real_data (DataFrame): Original dataset
- synthetic_data (DataFrame): Generated dataset
- num_cols (list, optional): Numerical column names
- cat_cols (list, optional): Categorical column names
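The per-column scores follow this recipe in spirit; the sketch below is illustrative only and not the library's internal code:

import pandas as pd
from scipy.stats import ks_2samp

def ks_similarity(real_col: pd.Series, synth_col: pd.Series) -> float:
    # 1 - KS statistic: 1.0 means the two empirical distributions coincide
    statistic, _ = ks_2samp(real_col.dropna(), synth_col.dropna())
    return 1.0 - statistic

def tvd_similarity(real_col: pd.Series, synth_col: pd.Series) -> float:
    # 1 - total variation distance between the category frequency tables
    real_freq = real_col.value_counts(normalize=True)
    synth_freq = synth_col.value_counts(normalize=True)
    categories = real_freq.index.union(synth_freq.index)
    tvd = 0.5 * sum(abs(real_freq.get(c, 0.0) - synth_freq.get(c, 0.0)) for c in categories)
    return 1.0 - tvd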
class ColumnPairTrends
Pairwise correlation preservation.
Compares Pearson correlations for numerical pairs and Cramer's V for categorical pairs between real and synthetic data. Returns a score in [0, 1] — 1.0 means identical pairwise relationships.
from be_great.metrics import ColumnPairTrends
result = ColumnPairTrends().compute(real_data, synthetic_data)
# result["column_pair_trends_mean"] -> overall similarity
# result["column_pair_trends_numerical"] -> numerical pair similarity
# result["column_pair_trends_categorical"] -> categorical pair similarity
Args (compute):
- real_data (DataFrame): Original dataset
- synthetic_data (DataFrame): Generated dataset
- num_cols (list, optional): Numerical column names
- cat_cols (list, optional): Categorical column names
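For reference, Cramér's V for a pair of categorical columns can be derived from a contingency table as in this sketch (illustrative only, not the library's internal implementation):

import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(col_a: pd.Series, col_b: pd.Series) -> float:
    # Cramér's V in [0, 1], computed from the chi-squared statistic of the contingency table
    contingency = pd.crosstab(col_a, col_b)
    chi2 = chi2_contingency(contingency)[0]
    n = contingency.to_numpy().sum()
    min_dim = min(contingency.shape) - 1
    return (chi2 / (n * min_dim)) ** 0.5 if min_dim > 0 else 0.0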
class BasicStatistics
Summary statistics comparison.
Compares mean, standard deviation, and median for numerical columns, and category frequency distributions for categorical columns.
from be_great.metrics import BasicStatistics
result = BasicStatistics().compute(real_data, synthetic_data)
# result["basic_statistics"]["col_name"]["real_mean"]
# result["basic_statistics"]["col_name"]["synth_mean"]
# result["basic_statistics"]["col_name"]["mean_diff_pct"]
Args (compute):
- real_data (DataFrame): Original dataset
- synthetic_data (DataFrame): Generated dataset
- num_cols (list, optional): Numerical column names
- cat_cols (list, optional): Categorical column names
Fidelity & Utility Metrics
class DiscriminatorMetric
Trains a Random Forest classifier to distinguish real from synthetic data.
A score close to 0.5 means the synthetic data is indistinguishable from real data. A score close to 1.0 means the classifier easily tells them apart. Uses cross-validated hyperparameter tuning and reports mean/std over multiple random seeds.
from be_great.metrics import DiscriminatorMetric
result = DiscriminatorMetric(n_runs=10).compute(real_data, synthetic_data)
# result["discriminator_mean"] -> mean accuracy (0.5 = best)
# result["discriminator_std"] -> standard deviation
Args (__init__):
- metric (callable): Scoring function. Default: accuracy_score
- n_runs (int): Number of evaluation runs. Default: 10
- encoder (type): Encoder for categorical features. Default: OrdinalEncoder
- encoder_params (dict, optional): Encoder parameters
Args (compute):
- real_data (DataFrame): Original dataset
- synthetic_data (DataFrame): Generated dataset
- cat_cols (list, optional): Categorical column names
- test_ratio (float): Fraction used for testing. Default: 0.2
- cv (int): Cross-validation folds. Default: 5
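Conceptually, the metric does something like the following simplified sketch (without the hyperparameter tuning, cross-validation, and repeated runs described above; categorical columns would need encoding first):

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def discriminator_accuracy(real_num: pd.DataFrame, synth_num: pd.DataFrame, seed: int = 0) -> float:
    # Label real rows 0 and synthetic rows 1, then measure how well a classifier separates them
    X = pd.concat([real_num, synth_num], ignore_index=True)
    y = np.concatenate([np.zeros(len(real_num)), np.ones(len(synth_num))])
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    clf = RandomForestClassifier(random_state=seed).fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))  # ~0.5 is the ideal outcome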
class MLEfficiency
Machine learning efficiency — train on synthetic, test on real.
Measures the downstream utility of synthetic data. A model is trained entirely on the synthetic dataset and evaluated on a held-out real test set. The closer the score is to the performance achieved when training on real data, the higher the utility.
from be_great.metrics import MLEfficiency
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
result = MLEfficiency(
model=RandomForestClassifier,
metric=accuracy_score,
model_params={"n_estimators": 100},
).compute(real_data, synthetic_data, label_col="target")
# result["mle_mean"] -> mean score across seeds
# result["mle_std"] -> standard deviation
# result["mle_scores"] -> per-seed scores list
Args (__init__):
- model (type): Sklearn-compatible model class
- metric (callable): Scoring function
- model_params (dict, optional): Model constructor parameters
- encoder (type): Encoder for categorical features. Default: OrdinalEncoder
- encoder_params (dict, optional): Encoder parameters
- normalize (bool): Standard-scale continuous features. Default: False
- use_proba (bool): Use predict_proba instead of predict. Default: False
- metric_params (dict, optional): Extra kwargs for the scoring function
Args (compute):
- real_data (DataFrame): Original training dataset
- synthetic_data (DataFrame): Generated dataset (used for training)
- label_col (str): Target column name
- cat_cols (list, optional): Categorical column names
- num_cols (list, optional): Numerical column names
- real_test_data (DataFrame, optional): Separate real test set
- test_ratio (float): Split ratio if no separate test set. Default: 0.2
- random_seeds (list[int], optional): Seeds for multiple runs. Default: [512, 13, 23, 28, 21]
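To put the score into context, it can help to compute the same metric when training on real data. The sketch below is one way to obtain such a baseline; it is not part of the metric itself, assumes numerical features, and reuses the "target" label column from the example above:

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Train-on-real baseline for comparison
train_real, test_real = train_test_split(real_data, test_size=0.2, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(train_real.drop(columns=["target"]), train_real["target"])
baseline = accuracy_score(test_real["target"], clf.predict(test_real.drop(columns=["target"])))
# The closer result["mle_mean"] is to this baseline, the higher the utility of the synthetic data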
Privacy Metrics
class DistanceToClosestRecord
Distance to Closest Record (DCR).
For each synthetic record, computes the distance to the closest record in the real dataset. Uses L1 (Manhattan) distance for numerical features and Hamming distance for categorical features. Records with distance 0 are exact copies.
from be_great.metrics import DistanceToClosestRecord
result = DistanceToClosestRecord().compute(real_data, synthetic_data)
# result["dcr_mean"] -> mean minimum distance
# result["dcr_std"] -> standard deviation
# result["n_copies"] -> number of exact copies
# result["ratio_copies"] -> fraction of exact copies
Args (__init__):
- n_samples (int): Number of synthetic samples to evaluate. 0 = use all. Default: 0
- use_euclidean (bool): Use L2 norm instead of L1 for numerical features. Default: False
Args (compute):
- real_data (DataFrame): Original dataset
- synthetic_data (DataFrame): Generated dataset
- num_cols (list, optional): Numerical column names
- cat_cols (list, optional): Categorical column names
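For the numerical features, the nearest-neighbor search resembles this sketch (illustrative only; the metric additionally handles categorical features via Hamming distance):

import numpy as np
from sklearn.neighbors import NearestNeighbors

def dcr_numerical(real_num, synth_num):
    # Manhattan (L1) distance from each synthetic row to its closest real row
    nn = NearestNeighbors(n_neighbors=1, metric="manhattan").fit(real_num)
    distances, _ = nn.kneighbors(synth_num)
    distances = distances.ravel()
    return {
        "dcr_mean": float(distances.mean()),
        "n_copies": int(np.sum(distances == 0)),  # synthetic rows that exactly match a real row
    }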
class kAnonymization
k-Anonymization metric.
Evaluates the k-anonymity of a dataset using KMeans clustering. Each record should be similar to at least k-1 other records on the quasi-identifying variables. Reports the ratio k_synthetic / k_real — a ratio >= 1 means the synthetic data has at least as much k-anonymity as the real data.
from be_great.metrics import kAnonymization
result = kAnonymization().compute(real_data, synthetic_data)
# result["k_real"] -> k value for original data
# result["k_synthetic"] -> k value for synthetic data
# result["k_ratio"] -> syn / real ratio
Args (__init__):
- n_clusters_list (list[int], optional): Cluster counts to evaluate. Default: [2, 5, 10, 15]
Args (compute):
- real_data (DataFrame): Original dataset
- synthetic_data (DataFrame): Generated dataset
- sensitive_cols (list, optional): Columns to exclude from quasi-identifiers
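The k value is derived from cluster sizes, roughly as in this simplified sketch (the metric itself averages over several cluster counts and drops sensitive columns from the quasi-identifiers):

import numpy as np
from sklearn.cluster import KMeans

def k_from_clusters(features, n_clusters=10, seed=0):
    # k is the size of the smallest cluster: every record then shares its
    # cluster with at least k - 1 other records
    labels = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(features)
    _, counts = np.unique(labels, return_counts=True)
    return int(counts.min())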
class lDiversity
l-Diversity metric.
Measures the diversity of sensitive attribute values within each equivalence class. Uses KMeans to form groups and checks how many distinct sensitive values exist in the smallest group. Higher l-diversity means better protection against attribute inference.
from be_great.metrics import lDiversity
result = lDiversity(sensitive_col="diagnosis").compute(real_data, synthetic_data)
# result["l_real"] -> l value for original data
# result["l_synthetic"] -> l value for synthetic data
# result["l_ratio"] -> syn / real ratio
Args (__init__):
- sensitive_col (str): Name of the sensitive attribute column
- n_clusters_list (list[int], optional): Cluster counts to evaluate. Default: [2, 5, 10, 15]
Args (compute):
- real_data (DataFrame): Original dataset
- synthetic_data (DataFrame): Generated dataset
class IdentifiabilityScore
Identifiability score.
Measures the risk that a synthetic record can be linked back to a specific real record. Uses k-nearest neighbors and checks whether the closest real neighbor is significantly closer than the second closest (distance ratio below threshold).
from be_great.metrics import IdentifiabilityScore
result = IdentifiabilityScore().compute(real_data, synthetic_data)
# result["identifiability_score"] -> fraction of identifiable records
# result["mean_distance_ratio"] -> average d1/d2 ratio
Args (__init__):
- n_neighbors (int): Number of nearest neighbors. Default: 5
- threshold_ratio (float): Identifiability threshold. Default: 0.5
Args (compute):
- real_data (DataFrame): Original dataset
- synthetic_data (DataFrame): Generated dataset
- num_cols (list, optional): Numerical column names
- cat_cols (list, optional): Categorical column names
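The core computation resembles this sketch over numerical features (simplified to two neighbors; the metric also handles categorical features and uses the n_neighbors setting above):

import numpy as np
from sklearn.neighbors import NearestNeighbors

def identifiability(real_num, synth_num, threshold_ratio=0.5):
    # For each synthetic row, compare the distance to its closest real row (d1)
    # with the distance to the second closest (d2); a small d1/d2 ratio suggests
    # the row points at one specific real record
    nn = NearestNeighbors(n_neighbors=2).fit(real_num)
    distances, _ = nn.kneighbors(synth_num)
    d1, d2 = distances[:, 0], distances[:, 1]
    ratios = d1 / np.maximum(d2, 1e-12)  # guard against division by zero
    return float(np.mean(ratios < threshold_ratio))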
class DeltaPresence
Delta-presence metric.
Measures how much the presence of an individual in the dataset can be inferred from the synthetic data. Computes the fraction of real records that have a near-exact match in the synthetic dataset within a distance threshold.
from be_great.metrics import DeltaPresence
result = DeltaPresence(threshold=0.5).compute(real_data, synthetic_data)
# result["delta_presence"] -> fraction of real records with a match
# result["mean_nearest_distance"] -> average nearest distance
Args (__init__):
- threshold (float): Distance threshold. 0.0 = exact match only. Default: 0.0
Args (compute):
- real_data (DataFrame): Original dataset
- synthetic_data (DataFrame): Generated dataset
- num_cols (list, optional): Numerical column names
- cat_cols (list, optional): Categorical column names
class MembershipInference
Membership inference risk.
Simulates a membership inference attack: given a record, can an attacker determine whether it was in the training set? Compares distances from known-member records (train) and known-non-member records (holdout) to their nearest synthetic neighbors.
A score close to 0.5 means the attacker cannot distinguish members from non-members (good privacy). A score close to 1.0 means high membership inference risk.
from be_great.metrics import MembershipInference
result = MembershipInference().compute(real_data, synthetic_data)
# result["membership_inference_score"] -> attacker accuracy
# result["mean_member_distance"] -> avg distance for members
# result["mean_non_member_distance"] -> avg distance for non-members
Args (__init__):
- n_neighbors (int): Number of nearest neighbors. Default: 1
Args (compute):
- real_data (DataFrame): Original training dataset (members)
- synthetic_data (DataFrame): Generated dataset
- holdout_data (DataFrame, optional): Non-member data. If None, real_data is split.
- num_cols (list, optional): Numerical column names
- cat_cols (list, optional): Categorical column names
- holdout_ratio (float): Split ratio if no holdout provided. Default: 0.5
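If records that were never used to train the generator are available, pass them as holdout_data for a more faithful attack simulation (real_train_data and real_holdout_data below are hypothetical DataFrames standing in for such a split):

from be_great.metrics import MembershipInference

result = MembershipInference(n_neighbors=1).compute(
    real_train_data,                  # hypothetical: records the generator was trained on (members)
    synthetic_data,
    holdout_data=real_holdout_data,   # hypothetical: records the generator never saw (non-members)
)
# result["membership_inference_score"] near 0.5 means members and non-members
# are equally close to the synthetic data (good privacy)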
This file was manually authored following the lazydocs convention.