github.com/wbrown/gpt_bpe@v0.0.0-20250709161131-1571a6e8ad2d/resources/data/pile-tokenizer

encoder.json
specials.txt
unitrim.json
vocab.bpe