github.com/wbrown/gpt_bpe@v0.0.0-20250709161131-1571a6e8ad2d/resources/data/pile-tokenizer
encoder.json
specials.txt
unitrim.json
vocab.bpe