module great_start
class GReaTStart
Abstract super class GReaT Start
GReaT Start creates tokens to start the generation process.
Attributes:
tokenizer(AutoTokenizer): Tokenizer, automatically downloaded from llm-checkpoint
method GReaTStart.__init__
__init__(tokenizer)
Initializes the super class.
Args:
tokenizer: Tokenizer from the HuggingFace library
method GReaTStart.get_start_tokens
get_start_tokens(n_samples: int) → List[List[int]]
Get Start Tokens
Creates starting points for the generation process
Args:
n_samples: Number of start prompts to create
Returns: A list of n_samples token lists, one per start prompt
class CategoricalStart
Categorical Starting Feature
A categorical column and its categories are used as the starting point.
Attributes:
start_col(str): Name of the categorical column
population(list[str]): Possible values the column can take
weights(list[float]): Probabilities for the individual categories
method CategoricalStart.__init__
__init__(tokenizer, start_col: str, start_col_dist: dict)
Initializes the Categorical Start
Args:
tokenizer: Tokenizer from the HuggingFace library
start_col: Name of the categorical column
start_col_dist: Distribution of the categorical column (dict of form {"Cat A": 0.8, "Cat B": 0.2})
method CategoricalStart.get_start_tokens
get_start_tokens(n_samples)
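The sampling step that CategoricalStart describes can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper name sample_categorical_prompts and the prompt format "<column> is <value>," are assumptions, and the tokenization step is left out (in the class, the prompts would presumably be passed through the tokenizer to obtain token ids).

```python
import random
from typing import Dict, List


def sample_categorical_prompts(
    start_col: str, start_col_dist: Dict[str, float], n_samples: int
) -> List[str]:
    """Draw n_samples start prompts from a categorical distribution.

    Hypothetical helper; the prompt format is an assumption.
    """
    # Split the dict into the documented `population` and `weights` attributes.
    population = list(start_col_dist.keys())
    weights = list(start_col_dist.values())
    # Weighted sampling with replacement, one category per requested sample.
    values = random.choices(population, weights=weights, k=n_samples)
    # Assumed prompt format: "<column> is <value>,"
    return [f"{start_col} is {value}," for value in values]


prompts = sample_categorical_prompts("sex", {"Male": 0.8, "Female": 0.2}, 5)
print(prompts)
```

Because the draw is weighted, frequent categories in the training data appear proportionally more often among the start prompts.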
class ContinuousStart
Continuous Starting Feature
A continuous column with some added noise is used as the starting point.
Attributes:
start_col(str): Name of the continuous column
start_col_dist(list[float]): The continuous column from the train data set
noise(float): Size of noise that is added to each value
decimal_places(int): Number of decimal places the continuous values have
method ContinuousStart.__init__
__init__(
tokenizer,
start_col: str,
start_col_dist: List[float],
noise: float = 0.01,
decimal_places: int = 5
)
Initializes the Continuous Start
Args:
tokenizer: Tokenizer from the HuggingFace library
start_col: Name of the continuous column
start_col_dist: The continuous column from the train data set
noise: Size of noise that is added to each value
decimal_places: Number of decimal places the continuous values have
method ContinuousStart.get_start_tokens
get_start_tokens(n_samples)
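A rough sketch of how the documented parameters could interact: observed training values are resampled, perturbed by noise, and formatted with a fixed number of decimal places. The helper name sample_continuous_prompts, the uniform noise in [-noise, +noise], and the prompt format "<column> is <value>," are all assumptions made for illustration; the library may apply noise differently or not at all.

```python
import random
from typing import List


def sample_continuous_prompts(
    start_col: str,
    start_col_dist: List[float],
    n_samples: int,
    noise: float = 0.01,
    decimal_places: int = 5,
) -> List[str]:
    """Draw n_samples start prompts from observed continuous values.

    Hypothetical helper; noise model and prompt format are assumptions.
    """
    # Resample observed training values with replacement.
    values = random.choices(start_col_dist, k=n_samples)
    # Assumption: additive noise drawn uniformly from [-noise, +noise].
    noisy = [value + random.uniform(-noise, noise) for value in values]
    # Format each value with the requested number of decimal places.
    return [f"{start_col} is {value:.{decimal_places}f}," for value in noisy]


ages = [23.0, 41.5, 37.25]
print(sample_continuous_prompts("age", ages, n_samples=4, decimal_places=2))
```

Setting noise to 0.0 in this sketch reproduces the training values exactly, which makes the effect of the noise parameter easy to compare.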
class RandomStart
Random Starting Features
Random column names are used as the starting point. Use this when the distribution of no column is known.
Attributes:
all_columns(List[str]): Names of all columns
method RandomStart.__init__
__init__(tokenizer, all_columns: List[str])
Initializes the Random Start
Args:
tokenizer: Tokenizer from the HuggingFace library
all_columns: Names of all columns
method RandomStart.get_start_tokens
get_start_tokens(n_samples)
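For comparison, a random start needs only the column names. The sketch below picks columns uniformly at random and leaves the value for the model to generate; the helper name sample_random_prompts and the prompt format "<column> is " are assumptions, and tokenization is again omitted.

```python
import random
from typing import List


def sample_random_prompts(all_columns: List[str], n_samples: int) -> List[str]:
    """Draw n_samples start prompts that name a random column.

    Hypothetical helper; the prompt format is an assumption.
    """
    # Uniform sampling with replacement over the column names.
    columns = random.choices(all_columns, k=n_samples)
    # Assumed prompt format: "<column> is ", with the value left open.
    return [f"{column} is " for column in columns]


print(sample_random_prompts(["age", "sex", "income"], n_samples=3))
```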
This file was automatically generated via lazydocs.