module metrics

Built-in evaluation suite for measuring the quality, utility, and privacy of synthetic tabular data generated by GReaT. All metrics inherit from BaseMetric and share a common interface:

result = SomeMetric().compute(real_data, synthetic_data)

Column types (numerical / categorical) are auto-detected, but they can also be passed explicitly via the num_cols and cat_cols arguments, as in the example below.
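
For example, to bypass auto-detection (the column names here are placeholders for your own data):

from be_great.metrics import ColumnShapes

# real_data and synthetic_data are pandas DataFrames with the same columns;
# "age", "income", and "gender" are hypothetical column names.
result = ColumnShapes().compute(
    real_data,
    synthetic_data,
    num_cols=["age", "income"],
    cat_cols=["gender"],
)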


class BaseMetric

Abstract base class for all GReaT evaluation metrics.

Subclasses must implement name(), direction(), and compute(); a minimal subclass sketch follows the method list below.

Methods:

  • name() → str: Human-readable metric name
  • direction() → str: "maximize" if higher is better, "minimize" if lower is better
  • compute(real_data, synthetic_data, **kwargs) → dict: Compute the metric
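
As a rough sketch of the interface (not part of the library), a custom metric could look like this, assuming BaseMetric is importable from be_great.metrics like the concrete metrics below:

import pandas as pd
from be_great.metrics import BaseMetric  # assumed import path

class RowCountRatio(BaseMetric):
    """Toy metric: ratio of synthetic to real row counts."""

    def name(self) -> str:
        return "Row Count Ratio"

    def direction(self) -> str:
        return "maximize"

    def compute(self, real_data: pd.DataFrame, synthetic_data: pd.DataFrame, **kwargs) -> dict:
        # Return a dict of named results, as the built-in metrics do.
        return {"row_count_ratio": len(synthetic_data) / max(len(real_data), 1)}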

Statistical Metrics

class ColumnShapes

Per-column distribution similarity.

Uses the Kolmogorov-Smirnov test for numerical columns and Total Variation Distance for categorical columns. Returns a score in [0, 1] per column — 1.0 means identical distributions.

from be_great.metrics import ColumnShapes

result = ColumnShapes().compute(real_data, synthetic_data)
# result["column_shapes_mean"]   -> average similarity across all columns
# result["column_shapes_std"]    -> standard deviation
# result["column_shapes_detail"] -> per-column scores dict

Args (compute):

  • real_data (DataFrame): Original dataset
  • synthetic_data (DataFrame): Generated dataset
  • num_cols (list, optional): Numerical column names
  • cat_cols (list, optional): Categorical column names
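
For orientation, the per-column scores can be approximated as sketched below; treating the score as 1 minus the KS statistic (numerical) or 1 minus the total variation distance (categorical) is an assumption and may differ in detail from the library's implementation:

from scipy.stats import ks_2samp

def numeric_shape_score(real_col, synth_col):
    # Assumed similarity: 1 - KS statistic (1.0 = identical distributions).
    stat, _ = ks_2samp(real_col.dropna(), synth_col.dropna())
    return 1.0 - stat

def categorical_shape_score(real_col, synth_col):
    # Assumed similarity: 1 - total variation distance between category frequencies.
    real_freq = real_col.value_counts(normalize=True)
    synth_freq = synth_col.value_counts(normalize=True)
    categories = real_freq.index.union(synth_freq.index)
    tvd = 0.5 * sum(abs(real_freq.get(c, 0.0) - synth_freq.get(c, 0.0)) for c in categories)
    return 1.0 - tvd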

class ColumnPairTrends

Pairwise correlation preservation.

Compares Pearson correlations for numerical pairs and Cramér's V for categorical pairs between real and synthetic data. Returns a score in [0, 1], where 1.0 means identical pairwise relationships.

from be_great.metrics import ColumnPairTrends

result = ColumnPairTrends().compute(real_data, synthetic_data)
# result["column_pair_trends_mean"]        -> overall similarity
# result["column_pair_trends_numerical"]   -> numerical pair similarity
# result["column_pair_trends_categorical"] -> categorical pair similarity

Args (compute):

  • real_data (DataFrame): Original dataset
  • synthetic_data (DataFrame): Generated dataset
  • num_cols (list, optional): Numerical column names
  • cat_cols (list, optional): Categorical column names
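
For the categorical side, Cramér's V can be computed as sketched below; aggregating real and synthetic values as 1 minus their absolute difference is an assumption about the exact scoring:

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(col_a, col_b):
    # Cramér's V association between two categorical columns.
    table = pd.crosstab(col_a, col_b)
    chi2 = chi2_contingency(table)[0]
    n = table.values.sum()
    min_dim = min(table.shape) - 1
    return np.sqrt(chi2 / (n * min_dim)) if min_dim > 0 else 0.0

def pair_trend_score(real_df, synth_df, col_a, col_b):
    # Assumed aggregation: 1 - |V_real - V_synth|, clipped to [0, 1].
    diff = abs(cramers_v(real_df[col_a], real_df[col_b]) - cramers_v(synth_df[col_a], synth_df[col_b]))
    return max(0.0, 1.0 - diff)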

class BasicStatistics

Summary statistics comparison.

Compares mean, standard deviation, and median for numerical columns, and category frequency distributions for categorical columns.

from be_great.metrics import BasicStatistics

result = BasicStatistics().compute(real_data, synthetic_data)
# result["basic_statistics"]["col_name"]["real_mean"]
# result["basic_statistics"]["col_name"]["synth_mean"]
# result["basic_statistics"]["col_name"]["mean_diff_pct"]

Args (compute):

  • real_data (DataFrame): Original dataset
  • synthetic_data (DataFrame): Generated dataset
  • num_cols (list, optional): Numerical column names
  • cat_cols (list, optional): Categorical column names
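
The reported mean_diff_pct can be read roughly as the following relative difference (the exact formula is an assumption; "age" is a placeholder column name):

# Assumed definition of the percentage difference between real and synthetic means.
real_mean = real_data["age"].mean()
synth_mean = synthetic_data["age"].mean()
mean_diff_pct = abs(real_mean - synth_mean) / (abs(real_mean) + 1e-12) * 100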

Fidelity & Utility Metrics

class DiscriminatorMetric

Trains a Random Forest classifier to distinguish real from synthetic data.

A score close to 0.5 means the synthetic data is indistinguishable from real data. A score close to 1.0 means the classifier easily tells them apart. Uses cross-validated hyperparameter tuning and reports mean/std over multiple random seeds.

from be_great.metrics import DiscriminatorMetric

result = DiscriminatorMetric(n_runs=10).compute(real_data, synthetic_data)
# result["discriminator_mean"] -> mean accuracy (0.5 = best)
# result["discriminator_std"]  -> standard deviation

Args (__init__):

  • metric (callable): Scoring function. Default: accuracy_score
  • n_runs (int): Number of evaluation runs. Default: 10
  • encoder (type): Encoder for categorical features. Default: OrdinalEncoder
  • encoder_params (dict, optional): Encoder parameters

Args (compute):

  • real_data (DataFrame): Original dataset
  • synthetic_data (DataFrame): Generated dataset
  • cat_cols (list, optional): Categorical column names
  • test_ratio (float): Fraction used for testing. Default: 0.2
  • cv (int): Cross-validation folds. Default: 5
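
Conceptually, the discriminator test amounts to the scikit-learn sketch below, which omits the hyperparameter tuning and repeated seeds that the metric performs:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder

# Label real rows 1 and synthetic rows 0, then train a classifier to separate them.
X = pd.concat([real_data, synthetic_data], ignore_index=True)
y = [1] * len(real_data) + [0] * len(synthetic_data)

# Encode categorical columns (cat_cols is a placeholder list of column names).
X[cat_cols] = OrdinalEncoder().fit_transform(X[cat_cols].astype(str))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))  # ~0.5 = indistinguishable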

class MLEfficiency

Machine learning efficiency — train on synthetic, test on real.

Measures the downstream utility of synthetic data. A model is trained entirely on the synthetic dataset and evaluated on a held-out real test set. The closer the score is to the performance achieved when training on real data, the higher the utility.

from be_great.metrics import MLEfficiency
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

result = MLEfficiency(
    model=RandomForestClassifier,
    metric=accuracy_score,
    model_params={"n_estimators": 100},
).compute(real_data, synthetic_data, label_col="target")
# result["mle_mean"]   -> mean score across seeds
# result["mle_std"]    -> standard deviation
# result["mle_scores"] -> per-seed scores list

Args (__init__):

  • model (type): Sklearn-compatible model class
  • metric (callable): Scoring function
  • model_params (dict, optional): Model constructor parameters
  • encoder (type): Encoder for categorical features. Default: OrdinalEncoder
  • encoder_params (dict, optional): Encoder parameters
  • normalize (bool): Standard-scale continuous features. Default: False
  • use_proba (bool): Use predict_proba instead of predict. Default: False
  • metric_params (dict, optional): Extra kwargs for the scoring function

Args (compute):

  • real_data (DataFrame): Original training dataset
  • synthetic_data (DataFrame): Generated dataset (used for training)
  • label_col (str): Target column name
  • cat_cols (list, optional): Categorical column names
  • num_cols (list, optional): Numerical column names
  • real_test_data (DataFrame, optional): Separate real test set
  • test_ratio (float): Split ratio if no separate test set. Default: 0.2
  • random_seeds (list[int], optional): Seeds for multiple runs. Default: [512, 13, 23, 28, 21]
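
The train-on-synthetic, test-on-real protocol reduces to the following single-seed sketch without categorical encoding; the metric automates the encoding and multi-seed averaging ("target" is a placeholder label column):

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hold out a slice of real data for testing.
real_train, real_test = train_test_split(real_data, test_size=0.2, random_state=0)

# Train one model on synthetic data and a baseline on real data.
synth_model = RandomForestClassifier(random_state=0).fit(
    synthetic_data.drop(columns=["target"]), synthetic_data["target"])
real_model = RandomForestClassifier(random_state=0).fit(
    real_train.drop(columns=["target"]), real_train["target"])

X_test, y_test = real_test.drop(columns=["target"]), real_test["target"]
print("synthetic-trained:", accuracy_score(y_test, synth_model.predict(X_test)))
print("real-trained:     ", accuracy_score(y_test, real_model.predict(X_test)))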

Privacy Metrics

class DistanceToClosestRecord

Distance to Closest Record (DCR).

For each synthetic record, computes the distance to the closest record in the real dataset. Uses L1 (Manhattan) distance for numerical features and Hamming distance for categorical features. Records with distance 0 are exact copies.

from be_great.metrics import DistanceToClosestRecord

result = DistanceToClosestRecord().compute(real_data, synthetic_data)
# result["dcr_mean"]      -> mean minimum distance
# result["dcr_std"]       -> standard deviation
# result["n_copies"]      -> number of exact copies
# result["ratio_copies"]  -> fraction of exact copies

Args (__init__):

  • n_samples (int): Number of synthetic samples to evaluate. 0 = use all. Default: 0
  • use_euclidean (bool): Use L2 norm instead of L1 for numerical features. Default: False

Args (compute):

  • real_data (DataFrame): Original dataset
  • synthetic_data (DataFrame): Generated dataset
  • num_cols (list, optional): Numerical column names
  • cat_cols (list, optional): Categorical column names
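
A simplified DCR computation for purely numerical columns is sketched below; the metric itself additionally handles categorical columns via Hamming distance (num_cols is a placeholder list of column names):

from sklearn.metrics import pairwise_distances
from sklearn.preprocessing import MinMaxScaler

# Scale numerical features, then take the L1 distance to the closest real record.
scaler = MinMaxScaler().fit(real_data[num_cols])
real_X = scaler.transform(real_data[num_cols])
synth_X = scaler.transform(synthetic_data[num_cols])

dists = pairwise_distances(synth_X, real_X, metric="manhattan")
min_dists = dists.min(axis=1)
print("dcr_mean:", min_dists.mean())
print("n_copies:", int((min_dists == 0).sum()))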

class kAnonymization

k-Anonymization metric.

Evaluates the k-anonymity of a dataset using KMeans clustering. Each record should be similar to at least k-1 other records on the quasi-identifying variables. Reports the ratio k_synthetic / k_real — a ratio >= 1 means the synthetic data has at least as much k-anonymity as the real data.

from be_great.metrics import kAnonymization

result = kAnonymization().compute(real_data, synthetic_data)
# result["k_real"]      -> k value for original data
# result["k_synthetic"] -> k value for synthetic data
# result["k_ratio"]     -> syn / real ratio

Args (__init__):

  • n_clusters_list (list[int], optional): Cluster counts to evaluate. Default: [2, 5, 10, 15]

Args (compute):

  • real_data (DataFrame): Original dataset
  • synthetic_data (DataFrame): Generated dataset
  • sensitive_cols (list, optional): Columns to exclude from quasi-identifiers
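
A rough equivalent of the clustering-based k estimate for a single dataset, using numerical features only, is sketched below; taking the smallest cluster size over the candidate cluster counts is an assumption consistent with the description above:

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def cluster_k_anonymity(df, n_clusters_list=(2, 5, 10, 15)):
    # k = size of the smallest cluster, minimized over the candidate cluster counts.
    X = StandardScaler().fit_transform(df.select_dtypes("number"))
    k_values = []
    for n in n_clusters_list:
        labels = KMeans(n_clusters=n, n_init=10, random_state=0).fit_predict(X)
        counts = [int((labels == c).sum()) for c in set(labels)]
        k_values.append(min(counts))
    return min(k_values)

print(cluster_k_anonymity(real_data), cluster_k_anonymity(synthetic_data))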

class lDiversity

l-Diversity metric.

Measures the diversity of sensitive attribute values within each equivalence class. Uses KMeans to form groups and checks how many distinct sensitive values exist in the smallest group. Higher l-diversity means better protection against attribute inference.

from be_great.metrics import lDiversity

result = lDiversity(sensitive_col="diagnosis").compute(real_data, synthetic_data)
# result["l_real"]      -> l value for original data
# result["l_synthetic"] -> l value for synthetic data
# result["l_ratio"]     -> syn / real ratio

Args (__init__):

  • sensitive_col (str): Name of the sensitive attribute column
  • n_clusters_list (list[int], optional): Cluster counts to evaluate. Default: [2, 5, 10, 15]

Args (compute):

  • real_data (DataFrame): Original dataset
  • synthetic_data (DataFrame): Generated dataset
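
Analogously to the k-anonymity sketch above, l can be approximated as the number of distinct sensitive values in the least diverse cluster (the exact aggregation used by the metric is an assumption here):

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def cluster_l_diversity(df, sensitive_col, n_clusters=5):
    # l = fewest distinct sensitive values found within any cluster.
    X = StandardScaler().fit_transform(df.drop(columns=[sensitive_col]).select_dtypes("number"))
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    return min(df[sensitive_col][labels == c].nunique() for c in set(labels))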

class IdentifiabilityScore

Identifiability score.

Measures the risk that a synthetic record can be linked back to a specific real record. Uses k-nearest neighbors and checks whether the closest real neighbor is significantly closer than the second closest (distance ratio below threshold).

from be_great.metrics import IdentifiabilityScore

result = IdentifiabilityScore().compute(real_data, synthetic_data)
# result["identifiability_score"] -> fraction of identifiable records
# result["mean_distance_ratio"]   -> average d1/d2 ratio

Args (__init__):

  • n_neighbors (int): Number of nearest neighbors. Default: 5
  • threshold_ratio (float): Identifiability threshold. Default: 0.5

Args (compute):

  • real_data (DataFrame): Original dataset
  • synthetic_data (DataFrame): Generated dataset
  • num_cols (list, optional): Numerical column names
  • cat_cols (list, optional): Categorical column names
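
The distance-ratio test can be approximated as follows for numerical features; the thresholding of d1/d2 mirrors the description above, but the implementation details are assumptions (num_cols is a placeholder):

from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler().fit(real_data[num_cols])
nn = NearestNeighbors(n_neighbors=2).fit(scaler.transform(real_data[num_cols]))

# d1 = distance to the closest real record, d2 = distance to the second closest.
dists, _ = nn.kneighbors(scaler.transform(synthetic_data[num_cols]))
ratios = dists[:, 0] / (dists[:, 1] + 1e-12)

threshold_ratio = 0.5
print("identifiability_score:", float((ratios < threshold_ratio).mean()))
print("mean_distance_ratio:  ", float(ratios.mean()))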

class DeltaPresence

Delta-presence metric.

Measures how much the presence of an individual in the dataset can be inferred from the synthetic data. Computes the fraction of real records that have a near-exact match in the synthetic dataset within a distance threshold.

from be_great.metrics import DeltaPresence

result = DeltaPresence(threshold=0.5).compute(real_data, synthetic_data)
# result["delta_presence"]        -> fraction of real records with a match
# result["mean_nearest_distance"] -> average nearest distance

Args (__init__):

  • threshold (float): Distance threshold. 0.0 = exact match only. Default: 0.0

Args (compute):

  • real_data (DataFrame): Original dataset
  • synthetic_data (DataFrame): Generated dataset
  • num_cols (list, optional): Numerical column names
  • cat_cols (list, optional): Categorical column names
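
In its simplest form, this checks for each real record whether some synthetic record lies within the threshold distance; the sketch below covers numerical features only (num_cols is a placeholder):

from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler().fit(real_data[num_cols])
nn = NearestNeighbors(n_neighbors=1).fit(scaler.transform(synthetic_data[num_cols]))

# Distance from each real record to its nearest synthetic neighbor.
dists, _ = nn.kneighbors(scaler.transform(real_data[num_cols]))
threshold = 0.5
print("delta_presence:", float((dists[:, 0] <= threshold).mean()))
print("mean_nearest_distance:", float(dists[:, 0].mean()))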

class MembershipInference

Membership inference risk.

Simulates a membership inference attack: given a record, can an attacker determine whether it was in the training set? Compares distances from known-member records (train) and known-non-member records (holdout) to their nearest synthetic neighbors.

A score close to 0.5 means the attacker cannot distinguish members from non-members (good privacy). A score close to 1.0 means high membership inference risk.

from be_great.metrics import MembershipInference

result = MembershipInference().compute(real_data, synthetic_data)
# result["membership_inference_score"] -> attacker accuracy
# result["mean_member_distance"]       -> avg distance for members
# result["mean_non_member_distance"]   -> avg distance for non-members

Args (__init__):

  • n_neighbors (int): Number of nearest neighbors. Default: 1

Args (compute):

  • real_data (DataFrame): Original training dataset (members)
  • synthetic_data (DataFrame): Generated dataset
  • holdout_data (DataFrame, optional): Non-member data. If None, real_data is split.
  • num_cols (list, optional): Numerical column names
  • cat_cols (list, optional): Categorical column names
  • holdout_ratio (float): Split ratio if no holdout provided. Default: 0.5
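
A simplified version of the attack is sketched below, assuming the attacker compares nearest-synthetic-neighbor distances against a pooled median cutoff; the actual decision rule used by the metric is an assumption (num_cols is a placeholder):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import MinMaxScaler

# Split real data into known members and non-members (numerical features only here).
members, non_members = train_test_split(real_data[num_cols], test_size=0.5, random_state=0)

scaler = MinMaxScaler().fit(real_data[num_cols])
nn = NearestNeighbors(n_neighbors=1).fit(scaler.transform(synthetic_data[num_cols]))

member_d = nn.kneighbors(scaler.transform(members))[0][:, 0]
non_member_d = nn.kneighbors(scaler.transform(non_members))[0][:, 0]

# Attacker predicts "member" when the distance falls below the pooled median.
cutoff = np.median(np.concatenate([member_d, non_member_d]))
accuracy = 0.5 * ((member_d < cutoff).mean() + (non_member_d >= cutoff).mean())
print("membership_inference_score:", float(accuracy))  # ~0.5 = good privacy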

This file was manually authored following the lazydocs convention.