Welcome to folktexts’ documentation!

The folktexts package enables you to benchmark and evaluate LLM-generated risk scores.

We encode unrealizable tabular prediction tasks as natural language text tasks and prompt LLMs for the probability that a target variable is true. Correct solutions to each task often require expressing uncertainty, since the target variable is not uniquely determined by the input features.
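The idea can be illustrated with a minimal, library-free sketch. It is not folktexts' actual API. `row_to_prompt` and `risk_score` are hypothetical helpers, and the prompt template is an assumption. A tabular row is serialized into a multiple-choice question. Given the log-probabilities an LLM assigns to the two answer options, the risk score is the probability of the "Yes" answer after renormalizing over the two options. Obtaining those log-probabilities is model-specific and is omitted here.

```python
import math


def row_to_prompt(row: dict[str, str], question: str) -> str:
    """Serialize one tabular row as a natural-language prompt.

    Hypothetical template, for illustration only.
    """
    lines = [f"- {name}: {value}." for name, value in row.items()]
    return (
        "Information about a person:\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}\nA. Yes\nB. No\nAnswer:"
    )


def risk_score(logprob_yes: float, logprob_no: float) -> float:
    """Renormalize two answer log-probabilities into P(target = True)."""
    # Subtract the max before exponentiating, for numerical stability.
    m = max(logprob_yes, logprob_no)
    p_yes = math.exp(logprob_yes - m)
    p_no = math.exp(logprob_no - m)
    return p_yes / (p_yes + p_no)


if __name__ == "__main__":
    row = {"Age": "34", "Occupation": "Software developer", "Usual hours worked per week": "40"}
    print(row_to_prompt(row, "Is this person's yearly income above $50,000?"))
    # Suppose the model puts 0.3 mass on "A" and 0.1 on "B": the risk score is 0.3 / 0.4.
    print(round(risk_score(math.log(0.3), math.log(0.1)), 4))  # 0.75
```

Renormalizing over the answer options discards probability mass the model places on unrelated tokens. The result is therefore a proper probability over the two outcomes.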

Folktexts is compatible with any Hugging Face transformers model, as well as with models available through web APIs (e.g., the OpenAI API).

Five tabular data tasks are provided out of the box, using the American Community Survey as a data source: ACSIncome, ACSMobility, ACSTravelTime, ACSEmployment, and ACSPublicCoverage. These tasks use the same names, feature columns, and target columns as those introduced by Ding et al. (2021) in the folktables Python package.

The full code is available in the GitHub repository, including several Jupyter notebook examples.


Citing

The folktexts package is the basis for the following publication:

@inproceedings{cruz2024evaluating,
   title={Evaluating language models as risk scores},
   author={Andr\'{e} F. Cruz and Moritz Hardt and Celestine Mendler-D\"{u}nner},
   booktitle={The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
   year={2024},
   url={https://openreview.net/forum?id=qrZxL3Bto9}
}

All supplementary materials are available in the GitHub repository.
