
Companion to Born et al. 2019. Includes our cleaned dataset and code samples that replicate the published results, including snippets for computing the frequency and pointwise mutual information (PMI) of proto-Elamite n-grams, hierarchical clusterings of signs based on their distributional properties, and visualizations of an LDA topic model trained on the proto-Elamite corpus.
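The n-gram frequency and PMI computations mentioned above can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes each text is already a list of sign transliterations, that n-grams do not cross text boundaries, and it uses the standard definition PMI(a, b) = log2(P(a, b) / (P(a) P(b))), which may differ in detail from the variant used in Born et al. 2019. The function names and the toy tokens "A", "B", "C" are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams within a single text."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(texts: List[List[str]], n: int) -> Counter:
    """Count n-grams across a corpus without crossing text boundaries."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(ngrams(text, n))
    return counts

def bigram_pmi(texts: List[List[str]]) -> Dict[Tuple[str, str], float]:
    """PMI (in bits) for every observed bigram, using relative frequencies."""
    unigrams = ngram_counts(texts, 1)
    bigrams = ngram_counts(texts, 2)
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    pmi: Dict[Tuple[str, str], float] = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[(a,)] / n_uni
        p_b = unigrams[(b,)] / n_uni
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi

# Hypothetical toy corpus of two short texts.
texts = [["A", "B", "C"], ["A", "B"]]
bigram_freqs = ngram_counts(texts, 2)
pmi = bigram_pmi(texts)
```

On a real corpus, rare bigrams tend to receive inflated PMI scores, so a minimum-count threshold is usually applied before ranking sign pairs.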


Repository hosting raw data extracted from the proto-Elamite and proto-cuneiform corpora. Its contents are currently limited to n-gram frequencies; pending updates include entropy values and pre-trained word embeddings.

n-Gram viewer

Online interface for exploring the data hosted in the pe-pc-datasets repository.