Software
At Aarhus NLP we regularly engage in developing software, primarily for research or educational purposes. The following Python packages were either partly or entirely developed by our group.
| MTEB | Evaluation toolkit for text and image embeddings, including model implementations, datasets, and various benchmarks. | |
| Danish Dynaword | Danish Dynaword is an ever-expanding collection of permissively licensed Danish free-form text datasets from various domains. | |
| Scandinavian Embedding Benchmark | A Scandinavian Benchmark for evaluating document embeddings | |
| EuroEval | An evaluation benchmark for Scandinavian and Germanic language models, covering natural language understanding and generation. | |
| Turftopic | A unified framework for topic modelling with transformer models. | |
| stormtrooper | Zero- and few-shot learning with large language models. | |
| topicwizard | Model-agnostic, interactive topic model interpretation framework. | |
| DaCy | A state-of-the-art Danish NLP pipeline for spaCy. | |
| OdyCy | General-purpose NLP pipelines for Ancient Greek. | |
| TextDescriptives | A Python library for calculating a large variety of metrics from text | |
| embedding-explorer | Interactively explore your embeddings with semantic graphs and clustering. | |
| neofuzz | Blazing fast fuzzy and semantic text search with the power of machine learning. | |
| Augmenty | A structured augmentation library for augmenting both texts and their annotations. | |
| Asent | An educational library for performing transparent sentiment analysis | |
| tweetopic | Blazing fast implementations of short-text topic models. | |
| glovpy | The fastest and lightest Python package for training GloVe word embeddings |
