CV
Employment
2025–present
Postdoc, Aarhus University
Research on continuous development and evaluation of language models
Education
2021–2024
PhD, Aarhus University
Center for Humanities Computing, in collaboration with Quantitative Genomics Group and Aarhus University Hospital
Main supervisor: Kristoffer Nielbo · Co-supervisors: Doug Speed, Andreas Danielsen
Research stays: UCLA (2023, Prof. Vwani Roychowdhury), UC Berkeley (2023, Prof. Tim Tangherlini)
2016–2022
BSc & MSc Cognitive Science, Aarhus University
Elective: Mathematics · GPA: 11.67/12.00
Professional Experience
2024–2025
Research Assistant, Aarhus University
Teaching and research in Natural Language Processing at Cognitive Science
2018–2022
Instructor, Aarhus University
Natural Language Processing, Computational Modelling, and Experimental Methods at Cognitive Science
Topics: GLM, GLMM, Bayesian modelling, R, Python, HPC, NLP, cognitive modelling
2018–2021
Student Developer, Center for Humanities Computing Aarhus
HPC, NLP, and information extraction
2017–2020
Junior Consultant, JHN Processor
Data management, data collection, economics, and user experience
Funding
2022–2025
Multiple Grants, Danish E-Infrastructure Cooperation
>300,000 GPU core hours and >1,000,000 CPU core hours
Case numbers: DeiC-AU-N5-2024079, DeiC-AU-N1-2025144, DeiC-KU-N5-2025117, H2-2023-15, H2-2023-16, 2022-H2-11
Counseling
2023–2024
The Danish Agency for Digital Governance (Digitaliseringsstyrelsen)
Invited presentations and counselling on current limitations and opportunities of language technology for Danish
Supervision
2025–present
Jakob Grøhn Damgaard
PhD Co-supervisor
2025
Anton Drasbæk
Master's thesis supervisor
2025
Jørgen Højlund Wibe
Master's thesis supervisor
2023
Emil Jessen
Master's thesis supervisor
Open-source Projects
Selected open-source projects
2025–present
Danish Dynaword
The largest corpus of open-source Danish text data · Core developer and maintainer
2024–present
Massive Multilingual Embedding Benchmark (MTEB)
The de facto Python package and benchmark for evaluating text and image embedding models across languages and use cases · Core developer and maintainer
2024–2025
Scandinavian Embedding Benchmark
The de facto benchmark for estimating the quality of Scandinavian embedding models. Later merged into MTEB · Core developer and maintainer
2023–present
Augmenty
A Python package for text augmentation with use cases in bias detection, evaluating model robustness, and improving model performance · Core developer and maintainer
2023–present
timeseriesflattener
A package for converting irregularly spaced time series, such as electronic health records, into statically shaped data frames · Initial developer, maintained by others
2022–present
TextDescriptives
A package for extracting text features such as dependency dynamics and metrics of text quality · Co-developer and maintainer
2022–present
Tomsup 👍
Theory of Mind Simulation using Python · Agent-based simulation implementing variational recursive k-ToM · Core developer and maintainer
2022–present
UD_Danish-DDT
The Danish Universal Dependencies Treebank, a high-quality linguistic resource · Maintainer
2021–present
DaCy
State-of-the-art Danish NLP · POS tagging (98.37 acc), NER (84.39 F1), dependency parsing (88.44 LAS) on DDT and DaNE · Core developer and maintainer
2021–present
DANSK
DANSK: Danish Annotations for NLP Specific TasKs is a dataset consisting of texts from diverse domains annotated for 18 entities. Actively used in EuroEval · Language resource
Open-source Contributions
Selected contributions
2025
spacy-lookup-data, Explosion
Added Danish lexeme probabilities
2024
datasets, Huggingface
Fixed a compatibility issue with NumPy >=2.0.0
2024
curated-transformers, Explosion
Added support for ELECTRA models
2024
spacy-curated-transformers, Explosion
Added support for ELECTRA tokenizers
2023
confection, Explosion
Fixed an issue where the config could not be filled
2023
curated-transformers, Explosion
Added support for ELECTRA models
2022
transformers, Huggingface
Bug fixes for training masked language models using Flax
2021
spacy-transformers, Explosion
Allow passing arguments to the transformer backend to obtain attention weights