Software

At Aarhus NLP we regularly develop software, primarily for research or educational purposes. The following Python packages were developed either partly or entirely by our group.


	MTEB	Evaluation toolkit for text and image embeddings, including model implementations, datasets and various benchmarks.
	Danish Dynaword	Danish Dynaword is an ever-expanding collection of permissively licensed Danish free-form text datasets from various domains.
	Scandinavian Embedding Benchmark	A Scandinavian benchmark for evaluating document embeddings.
	EuroEval	An evaluation benchmark for Scandinavian and Germanic language models, covering both natural language understanding and generation.
	Turftopic	A unified framework for topic modelling with transformer models.
	stormtrooper	Zero- and few-shot learning with large language models.
	topicwizard	Model-agnostic, interactive topic model interpretation framework.
	DaCy	The state-of-the-art Danish NLP pipeline for spaCy.
	OdyCy	General-purpose NLP pipelines for Ancient Greek.
	TextDescriptives	A Python library for calculating a wide variety of metrics from text.
	embedding-explorer	Interactively explore your embeddings with semantic graphs and clustering.
	neofuzz	Blazing-fast fuzzy and semantic text search powered by machine learning.
	Augmenty	A structured augmentation library for augmenting both texts and their annotations.
	Asent	An educational library for performing transparent sentiment analysis.
	tweetopic	Blazing-fast implementations of short-text topic models.
	glovpy	The fastest and lightest Python package for training GloVe word embeddings.