folktexts packageο
Subpackagesο
- folktexts.acs package
- folktexts.classifier package
- Submodules
- folktexts.classifier.base module
LLMClassifierLLMClassifier.DEFAULT_INFERENCE_KWARGSLLMClassifier.compute_risk_estimates_for_dataframe()LLMClassifier.compute_risk_estimates_for_dataset()LLMClassifier.correct_order_biasLLMClassifier.encode_rowLLMClassifier.fit()LLMClassifier.inference_kwargsLLMClassifier.model_nameLLMClassifier.predict()LLMClassifier.predict_proba()LLMClassifier.prompt_configLLMClassifier.seedLLMClassifier.set_fit_request()LLMClassifier.set_inference_kwargs()LLMClassifier.set_predict_proba_request()LLMClassifier.set_predict_request()LLMClassifier.set_score_request()LLMClassifier.taskLLMClassifier.threshold
- folktexts.classifier.transformers_classifier module
- folktexts.classifier.vllm_classifier module
- folktexts.classifier.web_api_classifier module
- Module contents
- folktexts.cli package
Submodulesο
folktexts.benchmark moduleο
A benchmark class for measuring and evaluating LLM calibration.
- class folktexts.benchmark.Benchmark(llm_clf, dataset, config=BenchmarkConfig(numeric_risk_prompting=False, cot_prompting=False, enable_thinking=False, few_shot_config=None, use_chat_template=False, chat_prompt=<object object>, system_prompt=<object object>, batch_size=None, context_size=None, correct_order_bias=True, feature_subset=None, population_filter=None, seed=42, prompt_variation=None))[source]ο
Bases:
objectMeasures and evaluates risk scores produced by an LLM.
A benchmark object to measure and evaluate risk scores produced by an LLM.
- Parameters:
llm_clf (LLMClassifier) β A language model classifier object (can be local or web-hosted).
dataset (Dataset) β The dataset object to use for the benchmark.Γ·
config (BenchmarkConfig, optional) β The configuration object used to create the benchmark parameters. NOTE: This is used to uniquely identify the benchmark object for reproducibility; it will not be used to change the benchmark behavior. To configure the benchmark, pass a configuration object to the Benchmark.make_benchmark method.
-
ACS_DATASET_CONFIGS:
dict[str,Any] = {'horizon': '1-Year', 'seed': 42, 'subsampling': None, 'survey': 'person', 'survey_year': '2018', 'test_size': 0.1, 'val_size': 0.1}ο
- property configs_dict: dictο
- classmethod make_acs_benchmark(task_name, *, model, tokenizer=None, data_dir=None, max_api_rpm=None, config=BenchmarkConfig(numeric_risk_prompting=False, cot_prompting=False, enable_thinking=False, few_shot_config=None, use_chat_template=False, chat_prompt=<object object>, system_prompt=<object object>, batch_size=None, context_size=None, correct_order_bias=True, feature_subset=None, population_filter=None, seed=42, prompt_variation=None), backend=None, model_name_or_path=None, **kwargs)[source]ο
Create a standardized calibration benchmark on ACS data.
- Parameters:
task_name (str) β The name of the ACS task to use.
model (AutoModelForCausalLM | str) β The transformers language model to use, or the model ID for a webAPI hosted model (e.g., βopenai/gpt-4o-miniβ).
tokenizer (AutoTokenizer, optional) β The tokenizer used to train the model (if using a transformers model). Not required for webAPI models.
data_dir (str | Path, optional) β Path to the directory to load data from and save data in.
max_api_rpm (int, optional) β The maximum number of API requests per minute for webAPI models.
config (BenchmarkConfig, optional) β Extra benchmark configurations, by default will use BenchmarkConfig.default_config().
**kwargs β Additional arguments passed to ACSDataset and BenchmarkConfig. By default will use a set of standardized configurations for reproducibility.
- Returns:
bench β The ACS calibration benchmark object.
- Return type:
- classmethod make_benchmark(*, task, dataset, model, tokenizer=None, max_api_rpm=None, config=BenchmarkConfig(numeric_risk_prompting=False, cot_prompting=False, enable_thinking=False, few_shot_config=None, use_chat_template=False, chat_prompt=<object object>, system_prompt=<object object>, batch_size=None, context_size=None, correct_order_bias=True, feature_subset=None, population_filter=None, seed=42, prompt_variation=None), backend=None, model_name_or_path=None, **kwargs)[source]ο
Create a calibration benchmark from a given configuration.
- Parameters:
task (TaskMetadata | str) β The task metadata object or name of the task to use.
dataset (Dataset) β The dataset to use for the benchmark.
model (AutoModelForCausalLM | str) β The transformers language model to use, or the model ID for a webAPI hosted model (e.g., βopenai/gpt-4o-miniβ).
tokenizer (AutoTokenizer, optional) β The tokenizer used to train the model (if using a transformers model). Not required for webAPI models.
max_api_rpm (int, optional) β The maximum number of API requests per minute for webAPI models.
config (BenchmarkConfig, optional) β Extra benchmark configurations, by default will use BenchmarkConfig.default_config().
**kwargs β Additional arguments for easier configuration of the benchmark. Will simply use these values to update the config object.
- Returns:
bench β The calibration benchmark object.
- Return type:
- property model_nameο
- plot_results(*, show_plots=True)[source]ο
Render evaluation plots and save to disk.
- Parameters:
show_plots (bool, optional) β Whether to show plots, by default True.
- Returns:
plots_paths β The paths to the saved plots.
- Return type:
dict[str, str]
- property resultsο
- property results_dir: Pathο
Get the results directory for this benchmark.
- property results_root_dir: Pathο
- run(results_root_dir, fit_threshold=0)[source]ο
Run the calibration benchmark experiment.
- Parameters:
results_root_dir (str | Path) β Path to root directory under which results will be saved.
fit_threshold (int | bool, optional) β Whether to fit the binarization threshold on a given number of training samples, by default 0 (will not fit the threshold).
- Returns:
Dictionary of evaluation results.
- Return type:
dict
- save_results(results_root_dir=None)[source]ο
Save the benchmark results to disk.
- Parameters:
results_root_dir (str | Path, optional) β Path to root directory under which results will be saved. By default will use self.results_root_dir.
- property taskο
- class folktexts.benchmark.BenchmarkConfig(numeric_risk_prompting=False, cot_prompting=False, enable_thinking=False, few_shot_config=None, use_chat_template=False, chat_prompt=<object object>, system_prompt=<object object>, batch_size=None, context_size=None, correct_order_bias=True, feature_subset=None, population_filter=None, seed=42, prompt_variation=None)[source]ο
Bases:
objectA dataclass to hold the configuration for risk-score benchmark.
- numeric_risk_promptingο
Whether to prompt for numeric risk-estimates instead of multiple-choice Q&A, by default False.
- Type:
bool, optional
- cot_promptingο
Whether to use chain-of-thought prompting: the model generates free-form reasoning text and ends with a Probability: X% line that is recovered via regex. Works on any model regardless of chat template. By default False.
- Type:
bool, optional
- enable_thinkingο
Whether to enable thinking mode for tokenizers that support it (e.g., Qwen3). Only applies when cot_prompting=True. When enabled, calls apply_chat_template(β¦, enable_thinking=True) and the resulting <think>β¦</think> block is stripped before regex extraction. Default is False.
- Type:
bool, optional
- few_shot_configο
Few-shot prompting configuration (number of shots, composition, example order, reuse).
Nonemeans zero-shot prompting.- Type:
FewShotConfig | None, optional
- use_chat_templateο
Whether to format prompts using the tokenizerβs chat template, by default False. Only supported for local transformers models.
- Type:
bool, optional
- chat_promptο
The assistant prefill text to use with chat templates. Defaults to
PROMPT_DEFAULT, which selects the appropriate default from the QA subclass (ANTHROPIC_CHAT_PROMPTfor MC,NUMERIC_CHAT_PROMPTfor numeric,Nonefor CoT). PassNoneexplicitly to disable the assistant prefill entirely.- Type:
str | None, optional
- system_promptο
System prompt text to use with chat templates. Defaults to
PROMPT_DEFAULT, which selects the appropriate default from the QA subclass (SYSTEM_PROMPTfor MC,NUMERIC_SYSTEM_PROMPTfor numeric,Nonefor CoT). PassNoneexplicitly to disable the system role (e.g. for Gemma-style tokenizers that reject it).- Type:
str | None, optional
- batch_sizeο
The batch size to use for inference.
- Type:
int | None, optional
- context_sizeο
The maximum context size when prompting the LLM.
- Type:
int | None, optional
- correct_order_biasο
Whether to correct the ordering bias in multiple-choice Q&A when prompting the LLM, by default True.
- Type:
bool, optional
- feature_subsetο
Whether to use a subset of the standard feature set for the task. The list should contain the names of the columns of features to use.
- Type:
list[str] | None, optional
- population_filterο
Optional population filter for this benchmark; must follow the format {βcolumn_nameβ: βvalueβ}.
- Type:
dict | None, optional
- seedο
Random seed β to set for reproducibility.
- Type:
int, optional
- prompt_variationο
Dictionary of prompt style overrides (e.g.
{"format": "bullet", "connector": "is"}).Nonemeans no variation is applied.- Type:
dict | None, optional
-
batch_size:
int|None= Noneο
-
chat_prompt:
str|None= <object object>ο
-
context_size:
int|None= Noneο
-
correct_order_bias:
bool= Trueο
-
cot_prompting:
bool= Falseο
- classmethod default_config(**changes)[source]ο
Returns the default configuration with optional changes.
-
enable_thinking:
bool= Falseο
-
feature_subset:
list[str] |None= Noneο
-
few_shot_config:
FewShotConfig|None= Noneο
- classmethod load_from_disk(path)[source]ο
Load the configuration from disk (tolerant of pre-refactor JSON).
-
numeric_risk_prompting:
bool= Falseο
-
population_filter:
dict|None= Noneο
-
prompt_variation:
dict|None= Noneο
-
seed:
int= 42ο
-
system_prompt:
str|None= <object object>ο
-
use_chat_template:
bool= Falseο
folktexts.col_to_text moduleο
- class folktexts.col_to_text.ColumnToText(name, short_description, value_map=None, question=None, connector_verb='is:', missing_value_fill='N/A', use_value_map_only=False)[source]ο
Bases:
objectMaps a single columnβs values to natural text.
Constructs a ColumnToText object.
- Parameters:
name (str) β The columnβs name.
short_description (str) β A short description of the column to be used before different values. For example, short_description=βyearly incomeβ will result in βThe yearly income is [β¦]β.
value_map (dict[int | str, str] | Callable, optional) β A map between column values and their textual meaning. If not provided, will try to infer a mapping from the question.
question (QAInterface, optional) β A question associated with the column. If not provided, will try to infer a multiple-choice question from the value_map.
connector_verb (str, optional) β Which verb to use when connecting the columnβs description to its value; by default βisβ.
missing_value_fill (str, optional) β The value to use when the columnβs value is not found in the value_map, by default βN/Aβ.
use_value_map_only (bool, optional) β Whether to only use the value_map for mapping values to text, or whether natural language representation should be generated using the connector_verb and short_description as well. By default (False) will construct a natural language representation of the form: βThe [short_description] [connector_verb] [value_map.get(val)]β.
- get_text(value)[source]ο
Returns the natural text representation of the given data value.
- Return type:
str
- property name: strο
- property question: QAInterfaceο
- property short_description: strο
- property value_map: Callableο
Returns the value map function for this column.
folktexts.dataset moduleο
General Dataset functionality for text-based datasets.
- class folktexts.dataset.Dataset(data, task, test_size=0.1, val_size=0.1, subsampling=None, seed=42)[source]ο
Bases:
objectConstruct a Dataset object.
- Parameters:
data (pd.DataFrame) β The datasetβs data in pandas DataFrame format.
task (TaskMetadata) β The metadata for the prediction task.
test_size (float, optional) β The size of the test set, as a fraction of the total dataset size, by default 0.1.
val_size (float, optional) β The size of the validation set, as a fraction of the total dataset size, by default 0.1.
subsampling (float, optional) β Whether to use sub-sampling, and which fraction of the data to keep. By default will not use sub-sampling (subsampling=None).
seed (int, optional) β The random state seed, by default 42.
- property data: DataFrameο
- property name: strο
A unique name for this dataset.
- sample_n_train_examples(n, reuse_examples=False, composition='random')[source]ο
Return a set of samples from the training set.
- Parameters:
n (int) β The number of example rows to return.
reuse_examples (bool, optional) β Whether to reuse the same examples for consistency. By default will sample new examples each time (reuse_examples=False).
composition (str or list, optional) β βrandomβ (default) samples uniformly. βbalancedβ draws equal counts per class. A list of ints specifies exact per-class counts in label order and must sum to n.
- Returns:
X, y β The features and target data for the sampled examples.
- Return type:
tuple[pd.DataFrame, pd.Series]
- property seed: intο
- property subsampling: floatο
- property task: TaskMetadataο
- property test_size: floatο
- property train_size: floatο
- property val_size: floatο
folktexts.evaluation moduleο
Module to map risk-estimates to a variety of evaluation metrics.
Notes
Code based on the error_parity.evaluation module, at: https://github.com/socialfoundations/error-parity/blob/main/error_parity/evaluation.py
- folktexts.evaluation.bootstrap_estimate(eval_func, *, y_true, y_pred_scores, sensitive_attribute=None, k=200, confidence_pct=95, seed=42)[source]ο
Computes bootstrap estimates of the given evaluation function.
- Parameters:
eval_func (Callable[[np.ndarray, np.ndarray, np.ndarray], dict[str, float]]) β The evaluation function to run for each bootstrap sample. Must follow the signature eval_func(y_true, y_pred_scores, sensitive_attribute).
y_true (np.ndarray) β The true labels.
y_pred_scores (np.ndarray) β The predicted scores.
sensitive_attribute (np.ndarray, optional) β Optionally, provide the sensitive attribute data to compute fairness metrics, by default None.
k (int, optional) β How many bootstrap samples to draw, by default 200.
confidence_pct (float, optional) β The confidence interval to use, in percentage, by default 95.
seed (int, optional) β The random seed, by default 42.
- Returns:
results β A dictionary containing bootstrap estimates for a variety of metrics.
- Return type:
dict[str, float]
- folktexts.evaluation.compute_best_threshold(y_true, y_pred_scores, *, false_pos_cost=1.0, false_neg_cost=1.0)[source]ο
Computes the binarization threshold that maximizes accuracy.
- Parameters:
y_true (np.ndarray) β The true class labels.
y_pred_scores (np.ndarray) β The predicted risk scores.
false_pos_cost (float, optional) β The cost of a false positive error, by default 1.0
false_neg_cost (float, optional) β The cost of a false negative error, by default 1.0
- Returns:
best_threshold β The threshold value that maximizes accuracy for the given predictions.
- Return type:
float
- folktexts.evaluation.evaluate_binary_predictions(y_true, y_pred)[source]ο
Evaluates the provided binary predictions on common performance metrics.
- Parameters:
y_true (np.ndarray) β The true class labels.
y_pred (np.ndarray) β The binary predictions.
- Returns:
A dictionary with key-value pairs of (metric name, metric value).
- Return type:
dict
- folktexts.evaluation.evaluate_binary_predictions_fairness(y_true, y_pred, sensitive_attribute, return_groupwise_metrics=False, min_group_size=0.04)[source]ο
Evaluates fairness of the given predictions.
Fairness metrics are computed as the ratios between group-wise performance metrics.
- Parameters:
y_true (np.ndarray) β The true class labels.
y_pred (np.ndarray) β The discretized predictions.
sensitive_attribute (np.ndarray) β The sensitive attribute (protected group membership).
return_groupwise_metrics (bool, optional) β Whether to return group-wise performance metrics (bool: True) or only the ratios between these metrics (bool: False), by default False.
min_group_size (float, optional) β The minimum fraction of samples (as a fraction of the total number of samples) that a group must have to be considered for fairness evaluation, by default 0.04. This is meant to avoid evaluating metrics on very small groups which leads to noisy and inconsistent results.
- Returns:
A dictionary with key-value pairs of (metric name, metric value).
- Return type:
dict
- folktexts.evaluation.evaluate_predictions(y_true, y_pred_scores, *, sensitive_attribute=None, threshold='best', model_name=None)[source]ο
Evaluates predictions on common performance and fairness metrics.
- Parameters:
y_true (np.ndarray) β The true class labels.
y_pred_scores (np.ndarray) β The predicted scores.
sensitive_attribute (np.ndarray, optional) β The sensitive attribute data. Will compute fairness metrics if provided.
threshold (float | str, optional) β The threshold to use for binarizing the predictions, or βbestβ to infer which threshold maximizes accuracy.
model_name (str, optional) β The name of the model to be used on the plots, by default None.
- Returns:
results β A dictionary with key-value pairs of (metric name, metric value).
- Return type:
dict
- folktexts.evaluation.evaluate_predictions_bootstrap(y_true, y_pred_scores, *, sensitive_attribute=None, threshold='best', k=200, confidence_pct=95, seed=42)[source]ο
Computes bootstrap estimates of classification metrics for the given predictions.
- Parameters:
y_true (np.ndarray) β The true labels.
y_pred_scores (np.ndarray) β The score predictions.
sensitive_attribute (np.ndarray, optional) β The sensitive attribute data. Will compute fairness metrics if provided.
threshold (float | str, optional) β The threshold to use for binarizing the predictions, or βbestβ to infer which threshold maximizes accuracy, by default βbestβ.
k (int, optional) β How many bootstrap samples to draw, by default 200.
confidence_pct (float, optional) β How large of a confidence interval to use when reporting lower and upper bounds, by default 95 (i.e., 2.5 to 97.5 percentile of results).
seed (int, optional) β The random seed, by default 42.
- Returns:
results β A dictionary containing bootstrap estimates for a variety of metrics.
- Return type:
dict[str, float]
folktexts.llm_utils moduleο
Common functions to use with transformer LLMs.
- folktexts.llm_utils.add_pad_token(tokenizer)[source]ο
Add a pad token to the model and tokenizer if it doesnβt already exist.
Here weβre using the end-of-sentence token as the pad token. Both the model weights and tokenizer vocabulary are untouched.
Another possible way would be to add a new token [PAD] to the tokenizer and update the tokenizer vocabulary and model weight embeddings accordingly. The embedding for the new pad token would be the average of all other embeddings.
- folktexts.llm_utils.decode_topk_logprobs_to_risk_estimate(per_pass_topk, *, tokenizer_vocab, vocab_dim, question)[source]ο
Convert top-K log-probabilities into a single risk-estimate float.
- Parameters:
per_pass_topk (list[dict[int, float]]) β One dict per generated token position, mapping token_id -> log-prob. The token_ids must match the values in tokenizer_vocab. Tokens absent from the top-K are assumed to have probability ~0.
tokenizer_vocab (dict[str, int]) β Token string -> token_id map used by the QA decoder for prefix-variant lookup (MultipleChoiceQA) or digit/decimal lookup (DirectNumericQA).
vocab_dim (int) β Size of the linear-probability arrayβs vocab axis. For local backends this is model.config.vocab_size (the logits axis); for the synthetic WebAPI path it is the size of the synthesised vocab.
question (MultipleChoiceQA | DirectNumericQA) β The QA interface used to interpret the probabilities.
- Returns:
risk_estimate β Risk score in [0, 1] from question.get_answer_from_model_output.
- Return type:
float
Notes
Both the WebAPI backend (top_logprobs=20 from OpenAI-style responses) and the vLLM backend (top-K logprobs from SamplingParams(logprobs=K)) call this helper. The transformers backend reads the full softmax directly and bypasses this path; see query_model_batch_multiple_passes.
- folktexts.llm_utils.generate_text_batch(text_inputs, model, tokenizer, max_new_tokens=1024, context_size=None, enable_thinking=None, system_prompt=None)[source]ο
Generate text completions for a batch of prompts.
Uses the modelβs generate() method for autoregressive text generation, suitable for chain-of-thought Q&A where the model needs to produce free-form text before outputting a probability estimate. Generation is greedy (do_sample=False) so runs are reproducible β matches the web-API pathβs temperature=0 contract.
- Parameters:
text_inputs (list[str]) β The input prompts as a list of strings.
model (AutoModelForCausalLM) β The model to use for generation.
tokenizer (AutoTokenizer) β The tokenizer used to encode/decode text.
max_new_tokens (int, optional) β Maximum number of new tokens to generate, by default 1024.
context_size (int, optional) β The maximum context size for input tokens. If None, no truncation is applied to inputs.
enable_thinking (bool, optional) β
Controls chat template application and thinking mode: - None: Do not apply chat template (use raw prompts, for base models) - False: Apply chat template WITHOUT thinking mode (for instruction-tuned models) - True: Apply chat template WITH thinking mode, and extract response
content after </think> marker (for thinking models like Qwen3)
system_prompt (str | None, optional) β System prompt to inject as a system role message when applying the chat template. Ignored when
enable_thinkingis None.
- Returns:
generated_texts β The generated text completions for each input prompt. Only the newly generated tokens are returned (not the input prompt).
- Return type:
list[str]
- folktexts.llm_utils.get_model_folder_path(model_name, root_dir='/tmp')[source]ο
Returns the folder where the model is saved.
- Return type:
str
- folktexts.llm_utils.get_model_size_B(model_name, default=None)[source]ο
Get the model size from the model name, in Billions of parameters.
- Return type:
int
- folktexts.llm_utils.is_bf16_compatible()[source]ο
Checks if the current environment is bfloat16 compatible.
- Return type:
bool
- folktexts.llm_utils.load_model_tokenizer(model_name_or_path, **kwargs)[source]ο
Load a model and tokenizer from the given local path (or using the model name).
- Parameters:
model_name_or_path (str | Path) β Model name or local path to the model folder.
kwargs (dict) β Additional keyword arguments to pass to the model from_pretrained call.
- Returns:
The loaded model and tokenizer, respectively.
- Return type:
tuple[AutoModelForCausalLM, AutoTokenizer]
- folktexts.llm_utils.load_vllm_model(model_name_or_path, *, dtype='auto', gpu_memory_utilization=0.85, max_model_len=None, tensor_parallel_size=1, trust_remote_code=True, seed=42, max_logprobs=50, **kwargs)[source]ο
Load a vLLM LLM engine and its tokenizer.
Mirrors load_model_tokenizer for the vLLM backend. vLLM allocates the KV cache statically at startup based on gpu_memory_utilization and max_model_len; tune these per-GPU. vllm is an optional install β if it is not importable, this function raises a pointed error.
- Parameters:
model_name_or_path (str | Path) β Model name or local path to the model folder. Pre-cached snapshots under /fast/groups/sf/huggingface-models/ work without download.
dtype (str, optional) β Compute dtype:
"auto"(default; vLLM picks bf16/fp16 from the config),"bfloat16","float16", or"float32".gpu_memory_utilization (float, optional) β Fraction of GPU VRAM vLLM may use for weights + KV cache. Default 0.85 (vLLMβs own default is 0.9, which is aggressive on shared cluster nodes). vLLM fails fast at startup if this isnβt enough β bump down if you hit OOM at LLM().
max_model_len (int, optional) β Maximum number of tokens (input + output) per request. If
None, vLLM reads it from the model config β which on some Llama checkpoints is 131072 and will allocate enormous KV cache. Pass an explicit value sized ascontext_size + max_new_tokens + bufferfor the workload.tensor_parallel_size (int, optional) β Number of GPUs to shard the model across; default 1. Set higher when the cluster job grants multiple GPUs and the model fits with tensor-parallel sharding.
trust_remote_code (bool, optional) β Forwarded to vLLM (mirrors load_model_tokenizer).
seed (int, optional) β Random seed for vLLM. Doesnβt affect greedy (temperature=0) decoding but pinned for safety.
max_logprobs (int, optional) β Engine-level cap on top-K logprobs SamplingParams may request. Default 50 β must be β₯
VLLMClassifier._TOPK_LOGPROBSor the engine rejects the request at predict time (VLLMValidationError: Requested sample logprobs of K, which is greater than max allowed).**kwargs β Additional keyword arguments forwarded verbatim to
vllm.LLM(...).
- Returns:
Loaded engine and its tokenizer. The tokenizer has had add_pad_token applied so it matches the transformers pathβs tokenizer state.
- Return type:
tuple[vllm.LLM, AutoTokenizer]
- folktexts.llm_utils.query_model_batch(text_inputs, model, tokenizer, context_size)[source]ο
Queries the model with a batch of text inputs.
- Parameters:
text_inputs (list[str]) β The inputs to the model as a list of strings.
model (AutoModelForCausalLM) β The model to query.
tokenizer (AutoTokenizer) β The tokenizer used to encode the text inputs.
context_size (int) β The maximum context size to consider for each input (in tokens).
- Returns:
last_token_probs β Modelβs last token linear probabilities for each input as an np.array of shape (batch_size, vocab_size).
- Return type:
np.array
- folktexts.llm_utils.query_model_batch_multiple_passes(text_inputs, model, tokenizer, context_size, n_passes, digits_only=False)[source]ο
Queries an LM for multiple forward passes.
Greedy token search over multiple forward passes: Each forward pass takes the highest likelihood token from the previous pass.
NOTE: could use model.generate in the future!
- Parameters:
text_inputs (list[str]) β The batch inputs to the model as a list of strings.
model (AutoModelForCausalLM) β The model to query.
tokenizer (AutoTokenizer) β The tokenizer used to encode the text inputs.
context_size (int) β The maximum context size to consider for each input (in tokens).
n_passes (int, optional) β The number of forward passes to run.
digits_only (bool, optional) β Whether to only sample for digit tokens.
- Returns:
last_token_probs β Last token linear probabilities for each forward pass, for each text in the input batch. The output has shape (batch_size, n_passes, vocab_size).
- Return type:
np.array
folktexts.plotting moduleο
Module to plot evaluation results.
- folktexts.plotting.render_evaluation_plots(y_true, y_pred_scores, *, eval_results={}, model_name=None, imgs_dir=None, show_plots=False)[source]ο
Renders evaluation plots for the given predictions.
- Return type:
dict
folktexts.prompting moduleο
Prompt construction utilities for risk-estimation tasks.
This module maps risk-estimation questions to different prompting techniques and supports systematic prompt variations for benchmarking and evaluation.
Each prompt (corresponding to a tabular data row) is represented as composition of three parts:
- [PREFIX] Shared task description and/or system context.
This section is constant across all rows.
[INFO] Row-specific serialized feature-value pairs. [SUFFIX] Question text defining the prediction task.
Within INFO the prompt variation pipeline is fixed by semantics: VaryValueMap β VaryOrder β VaryConnector β VaryFormat Return types enforce the order: per-item stages share listβlist; VaryFormat collapses the list to str, making it impossible to apply a per-item stage after it.
The module implements multiple prompting strategies, including:
Multiple-choice Q&A vs direct numeric Q&A
Zero-shot prompting
Few-shot prompting
Chain-of-thought (CoT) prompting
- folktexts.prompting.DEFAULT_PROMPT_STYLE: dict[str, Any] = {'connector': 'is:', 'custom_prompt_prefix': None, 'custom_prompt_suffix': None, 'format': 'textbullet', 'granularity': 'original', 'order': None, 'show_question': True}ο
Default values for the seven prompt-variation keys;
PromptConfig.from_dictvalidates overrides against them.
- class folktexts.prompting.FeatureItem(col, label, raw_value, text_value='', connected='')[source]ο
Bases:
objectOne feature mid-pipeline:
text_valueis set byVaryValueMap,connectedbyVaryConnector.-
col:
strο
-
connected:
str= ''ο
-
label:
strο
-
raw_value:
Anyο
-
text_value:
str= ''ο
-
col:
- class folktexts.prompting.FewShotConfig(n_shots, example_order=None, compose='random', reuse_examples=False, show_question_in_examples=True)[source]ο
Bases:
objectConfiguration for few-shot prompting.
- Parameters:
n_shots (int) β Number of example questions and answers to prepend.
example_order (tuple[int, ...] | str | None, optional) β Integer permutation to reorder examples (e.g.
(2, 0, 1)or"2,0,1").Nonekeeps the sampled order.compose (str | list, optional) β How to select few-shot samples:
"random"(default),"balanced"(equal draws per class), or a list of per-class counts summing ton_shots.reuse_examples (bool, optional) β Whether to reuse the same examples across calls, by default False.
show_question_in_examples (bool, optional) β Whether each in-context example repeats the question (default, matches main) or shows only the answer. Set False for the compact answer-only format. Default is True.
-
compose:
str|list= 'random'ο
-
example_order:
tuple[int,...] |str|None= Noneο
-
n_shots:
intο
-
reuse_examples:
bool= Falseο
-
show_question_in_examples:
bool= Trueο
- folktexts.prompting.PROMPT_DEFAULT = <object object>ο
βuse the question typeβs default system / chat promptβ β as opposed to
None, which disables the role.- Type:
Sentinel
- class folktexts.prompting.PromptBuilder(task)[source]ο
Bases:
object
- class folktexts.prompting.PromptConfig(prefix, value_map, order, connector, format, suffix, system_prompt=None)[source]ο
Bases:
objectHow one row is rendered into a prompt β one instance of each variation stage.
A prompt is a task
prefix, a feature[INFO]block (thevalue_map β order β connector β formatpipeline), and a questionsuffix, plus an optionalsystem_promptfor the chat path. Build one withfrom_dict()(ordefault()) rather than instantiating the stages directly. Frozen and hashable, so each distinct configuration gets its ownresults.bench-{hash}.json.-
connector:
VaryConnectorο
-
format:
VaryFormatο
- classmethod from_dict(pv, task, question=None, add_task_description=True, system_prompt=<object object>)[source]ο
Build a PromptConfig from a prompt-variation dict and a task.
- Parameters:
pv (dict) β Prompt style overrides; see
DEFAULT_PROMPT_STYLEfor valid keys.task (TaskMetadata) β The task that defines features, column mappings, and the question.
question (QAInterface, optional) β Override the taskβs default question interface.
add_task_description (bool, optional) β Whether to include the task description in the prefix.
system_prompt (str | None, optional) β System prompt string; wrapped in
VarySystemPromptwhen provided. Defaults toquestion.default_system_prompt(set per QA subclass). PassNoneexplicitly to disable the system role (e.g. for Gemma-style templates).
- Return type:
-
prefix:
VaryPrefixο
-
suffix:
VarySuffixο
-
system_prompt:
VarySystemPrompt|None= Noneο
-
value_map:
VaryValueMapο
-
connector:
- class folktexts.prompting.VaryConnector(connector='is:')[source]ο
Bases:
objectJoins each feature label to its value with
connector(default"is:"->"Age is: 30"; e.g."is",":").-
connector:
str= 'is:'ο
-
connector:
- class folktexts.prompting.VaryFormat(format='textbullet')[source]ο
Bases:
objectCollapses the feature list into the final layout:
"textbullet"(default),"bullet","comma", or"text".-
format:
str= 'textbullet'ο
-
format:
- class folktexts.prompting.VaryOrder(order=None)[source]ο
Bases:
objectReorders feature items by
order(named columns first, the rest appended);Nonekeeps the original order.-
order:
tuple|list|str|None= Noneο
-
order:
- class folktexts.prompting.VaryPrefix(task_description, add_task_description=True, custom_prefix=None)[source]ο
Bases:
objectBuilds the prompt
[PREFIX]: the task description plus an optional custom prefix.-
add_task_description:
bool= Trueο
-
custom_prefix:
str|None= Noneο
-
task_description:
strο
-
add_task_description:
- class folktexts.prompting.VarySuffix(question, show_question=True, with_answer_prefill=True, show_label=False, label=None, custom_suffix=None)[source]ο
Bases:
objectBuilds the prompt
[SUFFIX]: the question text and answer prefill (or just the prefill whenshow_question=False).-
custom_suffix:
str|None= Noneο
-
label:
Any= Noneο
-
question:
QAInterfaceο
-
show_label:
bool= Falseο
-
show_question:
bool= Trueο
-
with_answer_prefill:
bool= Trueο
-
custom_suffix:
- class folktexts.prompting.VarySystemPrompt(system_prompt)[source]ο
Bases:
objectHolds the optional system-role string for the chat path.
-
system_prompt:
strο
-
system_prompt:
- class folktexts.prompting.VaryValueMap(cols_to_text, granularity='original')[source]ο
Bases:
objectMaps raw feature values to human-readable text;
granularity("original"/"low") selects the value-map variant.-
cols_to_text:
dictο
-
granularity:
str= 'original'ο
-
cols_to_text:
- folktexts.prompting.apply_chat_template(tokenizer, user_prompt, system_prompt=None, chat_prompt=None, **kwargs)[source]ο
Apply the tokenizerβs chat template to assemble a single prompt string.
- Return type:
str
Notes
system_prompt is treated as βincludeβ iff it is not None. This means an empty string ββ will inject an empty system message rather than be treated as βno system roleβ β pass None (or omit the argument) to skip the system role entirely.
chat_prompt is the assistant prefill. When provided, the returned prompt is trimmed so it ends exactly with chat_prompt, preserving the last-token scoring contract relied on by LLMClassifier. If the chat template mutates or strips the prefill (so it cannot be located verbatim in the rendered output), a ValueError is raised rather than silently returning a corrupted prompt.
When chat_prompt is None, add_generation_prompt=True is used and the model is left to generate freely; this is not appropriate for the benchmark scoring path (the last token will be a template-emitted role header, not the prefill).
- folktexts.prompting.encode_row_prompt(row, task, *, question=None, prompt_config=None)[source]ο
Encode a question regarding a given row into a natural-language prompt.
- Parameters:
row (pd.Series) β The data row to encode.
task (TaskMetadata) β The task that defines features, column mappings, and the question.
question (QAInterface, optional) β Override the question interface. Defaults to
task.question. Whenprompt_configis provided, only the suffix question is replaced (used for order-bias correction). Otherwise a default config is built with this question.prompt_config (PromptConfig, optional) β A pre-built config; all style parameters are taken from it. Build once at classifier/benchmark init and pass here to avoid rebuilding on every row.
The (with_answer_prefill is forwarded to question.get_question_prompt.)
separate (chat-template path passes False so the prefill is supplied as a)
message. (assistant turn rather than baked into the user)
- Returns:
The fully formatted prompt string.
- Return type:
str
- folktexts.prompting.encode_row_prompt_chat(row, task, tokenizer, system_prompt=<object object>, chat_prompt=<object object>, question=None, prompt_config=None)[source]ο
Encode a row prompt using the tokenizerβs chat template.
- Parameters:
row (pd.Series) β The row that the question will be about.
task (TaskMetadata) β The task metadata object.
tokenizer (AutoTokenizer) β The tokenizer whose chat template will be applied.
system_prompt (str | None, optional) β System prompt text. Only used when
prompt_configis not provided; passed straight toPromptConfig.from_dictwhich selects the mode-appropriate default when omitted. PassNoneexplicitly to disable the system role (e.g. for Gemma-style templates that reject it). Whenprompt_configis provided, system_prompt is ignored β patch the config directly instead.chat_prompt (str | None, optional) β Assistant prefill text. If omitted, the mode-appropriate default is selected from the question type. Pass
Noneexplicitly to skip the assistant prefill β note that this routes inference throughadd_generation_prompt=Trueand breaks the last-token scoring assumption used byLLMClassifier, so it is not appropriate for the benchmark path.question (QAInterface, optional) β The question interface to use. When
prompt_configis provided this overrides only the suffix question (used for order-bias correction).prompt_config (PromptConfig, optional) β A pre-built config object. When provided, all style parameters are ignored.
questionstill overrides the suffix question when given.
- Returns:
The fully formatted chat-template prompt.
- Return type:
str
- folktexts.prompting.encode_row_prompt_few_shot(row, task, dataset, *, n_shots=None, question=None, reuse_examples=False, compose_few_shot_examples='random', example_order=None, prompt_config=None, few_shot_config=None)[source]ο
Encode a question regarding a given row using few-shot prompting.
- Parameters:
row (pd.Series) β The row that the question will be about.
task (TaskMetadata) β The task that the row belongs to.
dataset (Dataset) β The dataset to draw few-shot examples from (sampled from the train split).
n_shots (int, optional) β The number of example questions and answers to prepend. Ignored when
few_shot_configis provided.question (QAInterface, optional) β The question interface to use; defaults to
task.question.reuse_examples (bool, optional) β Whether to reuse the same examples for consistency. By default will resample new examples each time (reuse_examples=False). Ignored when
few_shot_configis provided.compose_few_shot_examples (str or list, optional) β How to select few-shot samples:
"random"(default),"balanced"(equal draws per class), or a list of per-class counts summing ton_shots. Ignored whenfew_shot_configis provided.example_order (tuple[int, ...] | str | None, optional) β Integer permutation to reorder examples before building the prompt (e.g.
[2, 0, 1]for 3 shots).Nonekeeps the sampled order. Ignored whenfew_shot_configis provided.prompt_config (PromptConfig, optional) β A pre-built config object. When provided, all other style parameters are ignored.
few_shot_config (FewShotConfig, optional) β Typed few-shot configuration. When provided, the individual
n_shots,reuse_examples,compose_few_shot_examples, andexample_orderparams are ignored.
- Returns:
prompt β The encoded few-shot prompt.
- Return type:
str
- folktexts.prompting.resolve_chat_defaults(question, system_prompt=<object object>, chat_prompt=<object object>)[source]ο
Resolve default system_prompt / chat_prompt for chat-template prompting.
Defaults are read from
question.default_system_promptandquestion.default_chat_prompt(ClassVar``s defined on each ``QAInterfacesubclass). PassPROMPT_DEFAULT(or omit the argument) to use the questionβs ClassVar default. PassNoneexplicitly to disable a role entirely (e.g. for Gemma-style tokenizers that reject the system role).- Return type:
tuple[str|None,str|None]
- folktexts.prompting.tokenizer_supports_system_prompt(tokenizer)[source]ο
Check whether the tokenizerβs chat template supports system messages.
Some models (e.g. Gemma) raise a TemplateError when a system role is used. Other templates surface this with different exception types depending on transformers / Jinja versions (e.g. RuntimeError, KeyError, or a template-defined exception macro), so we treat any failure of the probe as βsystem role not supportedβ rather than letting it propagate and crash the benchmark.
- Return type:
bool
folktexts.qa_interface moduleο
Interface for question-answering with LLMs.
Create different types of questions (direct numeric, multiple-choice, chain-of-thought).
Encode questions and decode model outputs.
Compute risk-estimate from model outputs.
- class folktexts.qa_interface.ChainOfThoughtQA(column, text, num_forward_passes=-1, max_new_tokens=8000, enable_thinking=False)[source]ο
Bases:
QAInterfaceA chain-of-thought (CoT) question interface.
The model is instructed to reason step-by-step in free-form text and end with an explicit Probability: X% line; the probability is recovered via regex. This works on any model regardless of chat template.
Orthogonal to the tokenizerβs enable_thinking chat-template kwarg: CoT prompting always uses free-form generation, and enable_thinking=True additionally activates the <think>β¦</think> block on tokenizers that support it (e.g., Qwen3-Thinking) β the block is stripped before regex extraction.
Notes
Unlike DirectNumericQA and MultipleChoiceQA which use token probabilities, this interface uses full text generation. The num_forward_passes is set to -1 to signal text-generation mode instead of token-probability extraction.
The regex extraction is flexible and accepts multiple formats: - βProbability: 80%β -> 0.80 - βProbability: 0.80β -> 0.80 - βProbability: 80 percentβ -> 0.80 - ββ¦ 75%β (at end of text) -> 0.75
- enable_thinkingο
Whether to enable thinking mode for tokenizers that support it (e.g., Qwen3). When True, the tokenizerβs apply_chat_template is called with enable_thinking=True. Default is False.
- Type:
bool
-
default_chat_prompt:
ClassVar[str|None] = Noneο
-
default_system_prompt:
ClassVar[str|None] = Noneο
-
enable_thinking:
bool= Falseο
- static extract_probability_from_text(generated_text)[source]ο
Extract a probability value from generated text using regex patterns.
The extraction prioritizes (in order): the explicit βProbability: X[%]β anchor, last loose percentage, βX percentβ, then a bare 0.XX decimal. Returns a float in [0, 1] or None if nothing matched.
- Return type:
float|None
- get_answer_from_model_output(generated_text, tokenizer_vocab=None)[source]ο
Extract the probability answer from the modelβs generated text.
- Parameters:
generated_text (str) β The full text generated by the model, including reasoning and the final probability estimate.
tokenizer_vocab (dict[str, int], optional) β The tokenizerβs vocabulary. Not used for ChainOfThoughtQA but included for interface compatibility.
- Returns:
answer β The extracted probability as a float between 0 and 1.
- Return type:
float
- Raises:
ValueError β If no valid probability could be extracted from the generated text.
- get_question_prompt(with_answer_prefill=True)[source]ο
Returns the CoT question prompt.
The with_answer_prefill parameter is accepted for interface compatibility with QAInterface but has no effect: CoT prompts produce free-form text and have no answer prefill to strip.
- Return type:
str
-
max_new_tokens:
int= 8000ο
-
num_forward_passes:
int= -1ο
- class folktexts.qa_interface.Choice(text, data_value, numeric_value=None)[source]ο
Bases:
objectRepresents a choice in multiple-choice Q&A.
- textο
The text of the choice. E.g., β25-34 years oldβ.
- Type:
str
- data_valueο
The categorical value corresponding to this choice in the data.
- Type:
object
- numeric_valueο
A meaningful numeric value for the choice. E.g., if the choice is β25-34 years oldβ, the numeric value could be 30. The choice with the highest numeric value can be used as a proxy for the positive class. If not provided, will try to use the choice.value.
- Type:
float, optional
-
data_value:
objectο
-
numeric_value:
float= Noneο
-
text:
strο
- class folktexts.qa_interface.DirectNumericQA(column, text, num_forward_passes=2, answer_probability=True)[source]ο
Bases:
QAInterfaceRepresents a direct numeric question.
Notes
For example, the prompt could be β Q: What is 2 + 2? A: β With the expected answer being β4β.
If looking for a direct numeric probability, the answer prompt will be framed as so: β Q: What is the probability, between 0 and 1, of getting heads on a coin flip? A: 0.β So that we can extract a numeric answer with at most 2 forward passes. This is done automatically by passing the kwarg answer_probability=True.
Note that some models have multi-digit tokens in their vocabulary, so we need to correctly assess which tokens in the vocabulary correspond to valid numeric answers.
-
answer_probability:
bool= Trueο
-
default_chat_prompt:
ClassVar[str] = 'Answer (between 0 and 1): 0.'ο
-
default_system_prompt:
ClassVar[str] = 'You are a helpful assistant. You provide numeric probability estimates based on the information provided.\n'ο
- get_answer_from_model_output(last_token_probs, tokenizer_vocab)[source]ο
Outputs a numeric answer inferred from the modelβs output.
- Parameters:
last_token_probs (np.ndarray) β The last token probabilities of the model for the question. The first dimension must correspond to the number for forward passes as specified by num_forward_passes.
tokenizer_vocab (dict[str, int],) β The tokenizerβs vocabulary.
- Returns:
answer β The numeric answer to the question.
- Return type:
float | int
Notes
Eventually we could run a search algorithm to find the most likely answer over multiple forward passes, but for now weβll just take the argmax on each forward pass.
- get_answer_prefix()[source]ο
Returns the answer label that follows the question (e.g. βAnswer:β).
- Return type:
str
- get_question_prompt(with_answer_prefill=True)[source]ο
Returns the question text.
with_answer_prefill=True (the default) bakes the answer prefill into the returned string β required by the zero-shot / few-shot last-token scoring path, which reads probabilities from the very next token after the prefill. Set to False for chat-template prompting, where the prefill is supplied separately as the assistant turn (otherwise the same string ends up emitted twice and silently degrades scoring).
- Return type:
str
-
num_forward_passes:
int= 2ο
-
answer_probability:
- class folktexts.qa_interface.MultipleChoiceQA(column, text, num_forward_passes=1, choices=<factory>, _answer_keys_source=<factory>)[source]ο
Bases:
QAInterfaceRepresents a multiple-choice question and its answer keys.
- property answer_keys: tuple[str, ...]ο
- classmethod create_answer_keys_permutations(question)[source]ο
Yield questions with all permutations of answer keys.
- Parameters:
question (Question) β The template question whose answer keys will be permuted.
- Returns:
permutations β A generator of questions with all permutations of answer keys.
- Return type:
Iterator[Question]
- classmethod create_question_from_value_map(column, value_map, attribute, **kwargs)[source]ο
Constructs a question from a value map.
- Return type:
- get_answer_from_model_output(last_token_probs, tokenizer_vocab)[source]ο
Decodes the modelβs output into an answer for the given question.
- Parameters:
last_token_probs (np.ndarray) β The modelβs last token probabilities for the question. The first dimension corresponds to the number of forward passes as specified by self.num_forward_passes.
tokenizer_vocab (dict[str, int],) β The tokenizerβs vocabulary.
- Returns:
answer β The answer to the question.
- Return type:
float
- get_answer_key_from_value(value)[source]ο
Returns the answer key corresponding to the given data value.
- Return type:
str
- get_answer_prefix()[source]ο
Returns the answer label that follows the question (e.g. βAnswer:β).
- Return type:
str
- get_question_prompt(with_answer_prefill=True)[source]ο
Returns the question text.
with_answer_prefill=True (the default) bakes the answer prefill into the returned string β required by the zero-shot / few-shot last-token scoring path, which reads probabilities from the very next token after the prefill. Set to False for chat-template prompting, where the prefill is supplied separately as the assistant turn (otherwise the same string ends up emitted twice and silently degrades scoring).
- Return type:
str
- get_value_to_text_map()[source]ο
Returns the map from choice data value to choice textual representation.
- Return type:
dict[object,str]
-
num_forward_passes:
int= 1ο
- class folktexts.qa_interface.QAInterface(column, text, num_forward_passes)[source]ο
Bases:
ABCAn interface for a question-answering system.
-
column:
strο
-
default_chat_prompt:
ClassVar[str|None] = 'If had to select one of the options, my answer would be'ο
-
default_system_prompt:
ClassVar[str|None] = 'You are a helpful assistant. You answer multiple-choice questions based on the information provided. Respond with a single answer choice.\n'ο
- get_answer_from_model_output(last_token_probs, tokenizer_vocab)[source]ο
Decodes the modelβs output into an answer for the given question.
- Parameters:
last_token_probs (np.ndarray) β The modelβs last token probabilities for the question. The first dimension corresponds to the number of forward passes as specified by self.num_forward_passes.
tokenizer (dict[str, int]) β The tokenizerβs vocabulary.
- Returns:
answer β The answer to the question.
- Return type:
float
- get_answer_prefix()[source]ο
Returns the answer label that follows the question (e.g. βAnswer:β).
- Return type:
str
- get_question_prompt(with_answer_prefill=True)[source]ο
Returns the question text.
with_answer_prefill=True (the default) bakes the answer prefill into the returned string β required by the zero-shot / few-shot last-token scoring path, which reads probabilities from the very next token after the prefill. Set to False for chat-template prompting, where the prefill is supplied separately as the assistant turn (otherwise the same string ends up emitted twice and silently degrades scoring).
- Return type:
str
-
num_forward_passes:
intο
-
text:
strο
-
column:
folktexts.task moduleο
Definition of a generic TaskMetadata class.
- class folktexts.task.TaskMetadata(name, features, target, cols_to_text, sensitive_attribute=None, target_threshold=None, multiple_choice_qa=None, direct_numeric_qa=None, cot_qa=None, description=None, _use_numeric_qa=False, _use_cot_qa=False)[source]ο
Bases:
objectA base class to hold information on a prediction task.
- check_task_columns_are_available(available_cols, raise_=True)[source]ο
Checks if all columns required by this task are available.
- Parameters:
available_cols (list[str]) β The list of column names available in the dataset.
raise (bool, optional) β Whether to raise an error if some columns are missing, by default True.
- Returns:
all_available β True if all required columns are present in the given list of available columns, False otherwise.
- Return type:
bool
-
cols_to_text:
dict[str,ColumnToText]ο A mapping between column names and their textual descriptions.
-
cot_qa:
ChainOfThoughtQA= Noneο The chain-of-thought (CoT) question and answer interface for this task.
- create_task_with_feature_subset(feature_subset)[source]ο
Creates a new task with a subset of the original features.
-
description:
str= Noneο A description of the task, including the population to which the task pertains to.
-
direct_numeric_qa:
DirectNumericQA= Noneο The direct numeric question and answer interface for this task.
-
features:
list[str]ο The names of the features used in the task.
- get_row_description(row)[source]ο
Encode a description of a given data row in textual form.
- Return type:
str
- get_target()[source]ο
Resolves the name of the target column depending on self.target_threshold.
- Return type:
str
- classmethod get_task(name, use_numeric_qa=False)[source]ο
Fetches a previously created task by its name.
- Parameters:
name (str) β The name of the task to fetch.
use_numeric_qa (bool, optional) β Whether to set the retrieved task to use verbalized numeric Q&A instead of the default multiple-choice Q&A prompts. Default is False.
- Returns:
task β The task object with the given name.
- Return type:
- Raises:
ValueError β Raised if the task with the given name has not been created yet.
-
multiple_choice_qa:
MultipleChoiceQA= Noneο The multiple-choice question and answer interface for this task.
-
name:
strο The name of the task.
- property question: QAInterfaceο
Getter for the Q&A interface for this task.
-
sensitive_attribute:
str= Noneο The name of the column used as the sensitive attribute data (if provided).
- sensitive_attribute_value_map()[source]ο
Returns a mapping between sensitive attribute values and their descriptions.
- Return type:
Callable
-
target:
strο The name of the target column.
-
target_threshold:
Threshold= Noneο The threshold used to binarize the target column (if provided).
- property use_cot_qa: boolο
Getter for whether to use chain-of-thought (CoT) Q&A prompts.
- property use_numeric_qa: boolο
Getter for whether to use numeric Q&A instead of multiple-choice Q&A prompts.
folktexts.threshold moduleο
Helper function for defining binarization thresholds.
- class folktexts.threshold.Threshold(value, op)[source]ο
Bases:
objectA class to represent a threshold value and its comparison operator.
- valueο
The threshold value to compare against.
- Type:
float | int
- opο
The comparison operator to use. One of β>β, β<β, β>=β, β<=β, β==β, β!=β.
- Type:
str
- apply_to_column_data(data)[source]ο
Applies the threshold operation to a pandas Series or scalar value.
- Return type:
int|Series
- apply_to_column_name(column_name)[source]ο
Standardizes naming of thresholded columns.
- Return type:
str
-
op:
strο
-
valid_ops:
ClassVar[dict] = {'!=': <built-in function ne>, '<': <built-in function lt>, '<=': <built-in function le>, '==': <built-in function eq>, '>': <built-in function gt>, '>=': <built-in function ge>}ο
-
value:
float|intο