Sync with data tooling repo, using edugp/kenlm models, updating viz to use quantiles for coloring and ad-hoc viz for the registry dataset 3c30fa3 edugp committed on Dec 9, 2021
Replicate default cc_net preprocessing at inference time on KenlmModel.get_perplexity 0def03f edugp committed on Nov 11, 2021
Add tests and fix sentence splitting so the sample takes the minimum of total sentences and sample size, rather than of total original documents and sample size d131aa3 edugp committed on Nov 9, 2021
Support visualizing both sentences and whole documents. Smooth down color assignment in visualization. a86046b edugp committed on Nov 4, 2021