
Build and Enhance Custom Datasets for your Use Case

AI-Powered Tabular Data Augmentation, Generation, Labeling, and Anonymization

augini is a versatile Python framework that leverages AI for comprehensive data manipulation. It uses large language models to augment, generate, and anonymize tabular data, creating realistic and privacy-preserving datasets.

Data Augmentation:

  • Enhance existing datasets with AI-generated features
  • Add contextual information based on current data
  • Infuse domain knowledge from LLMs

Synthetic Data Generation and Extension:

  • Create entirely new, realistic datasets
  • Maintain statistical properties of original data
  • Generate diverse, coherent synthetic profiles
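
One rough way to check the "maintain statistical properties" goal is to compare summary statistics of an original numeric column with its synthetic counterpart. The sketch below is a generic, standard-library illustration and not part of augini's API; the helper name `summary_gap` and the sample values are hypothetical.

```python
import statistics

def summary_gap(original, synthetic):
    """Absolute gap in mean and sample standard deviation between two numeric columns."""
    return {
        "mean_gap": abs(statistics.mean(original) - statistics.mean(synthetic)),
        "stdev_gap": abs(statistics.stdev(original) - statistics.stdev(synthetic)),
    }

# Hypothetical example values: an original 'Age' column and a synthetic one.
original_ages = [30, 25, 40]
synthetic_ages = [28, 33, 34]

gap = summary_gap(original_ages, synthetic_ages)
print(gap)
```

Small gaps suggest the synthetic column tracks the original on these two statistics only; a fuller comparison would also look at distributions and correlations between columns.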

Data Anonymization:

  • Implement k-anonymity and l-diversity
  • Generate synthetic identifiers
  • Balance privacy and data utility
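
To make the two privacy notions concrete: a table is k-anonymous when every combination of quasi-identifier values is shared by at least k rows, and l-diverse when every such group contains at least l distinct sensitive values. The sketch below measures both for a small list of records. It is a generic standard-library illustration, not augini's implementation; the function names and sample records are hypothetical.

```python
from collections import Counter, defaultdict

def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values(), default=0)

def l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any quasi-identifier group."""
    values = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        values[key].add(row[sensitive])
    return min((len(v) for v in values.values()), default=0)

# Hypothetical records: age band and city act as quasi-identifiers.
rows = [
    {"age_band": "30-39", "city": "London", "diagnosis": "flu"},
    {"age_band": "30-39", "city": "London", "diagnosis": "asthma"},
    {"age_band": "20-29", "city": "Tokyo", "diagnosis": "flu"},
    {"age_band": "20-29", "city": "Tokyo", "diagnosis": "flu"},
]

k = k_anonymity(rows, ["age_band", "city"])
l = l_diversity(rows, ["age_band", "city"], "diagnosis")
print(f"k = {k}, l = {l}")
```

Here every group has two rows, so the table is 2-anonymous, but the Tokyo group holds a single diagnosis, so it is only 1-diverse. This is the kind of privacy and utility trade-off the list above refers to.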

Use Cases

  • Augment ML training datasets
  • Generate privacy-safe data for sharing
  • Label data automatically using state-of-the-art AI models
  • Create synthetic data for software testing
  • Develop realistic scenarios for business planning
  • Produce diverse datasets for research and education

Installation

Install augini with pip:

pip install augini

Quick Start

Here's a simple example of how to use augini:

from augini import Augini
import pandas as pd

api_key = "OpenAI or OpenRouter token"

# Option 1: OpenAI
augini = Augini(api_key=api_key, model='gpt-4-turbo', use_openrouter=False)

# Option 2: OpenRouter (use this instead of Option 1; if both run, only this client is kept)
augini = Augini(api_key=api_key, use_openrouter=True, model='meta-llama/llama-3-8b-instruct')

# Create a sample DataFrame
data = {
    'Place of Birth': ['New York', 'London', 'Tokyo'],
    'Age': [30, 25, 40],
    'Gender': ['Male', 'Female', 'Male']
}
df = pd.DataFrame(data)

# Add synthetic features
result_df = augini.augment_columns(df, ['NAME', 'OCCUPATION', 'FAVORITE_DRINK'])

print(result_df)
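
The example above places the token in a string literal. A common alternative is to read it from an environment variable so it stays out of source control. The sketch below assumes a variable named `AUGINI_API_KEY`; that name and the `load_api_key` helper are hypothetical and not something augini defines.

```python
import os

def load_api_key(var_name="AUGINI_API_KEY"):
    """Read the API token from an environment variable (the variable name is an assumption)."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API token")
    return key

# Usage, with the same constructor arguments as the Quick Start:
# api_key = load_api_key()
# augini = Augini(api_key=api_key, model='gpt-4-turbo', use_openrouter=False)
```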
