Dutch Medical NER with BERT - domain-specific difficulties

The subject of my thesis is 'Dutch named entity recognition using BERT'. This means that I will have to do entity extraction on Dutch clinical notes using BERT. The main obstacle I see is that there are only 2 Dutch BERT models, and both were pre-trained on a Dutch corpus of books/news text, so my guess is that they will perform rather poorly on Dutch clinical notes. These notes are full of medical jargon, acronyms, shorthand notation, misspellings and sentence fragments, and they show high terminological variation. I also don't know how this will play out with the WordPiece embeddings. Comments/remarks are much appreciated!
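To make the WordPiece concern a bit more concrete, here is a minimal sketch of the greedy longest-match-first splitting that WordPiece tokenizers apply to a single word. The vocabulary `TOY_VOCAB`, the example words and the helper `pieces_per_word` are made up for illustration; this is not the vocabulary of any real Dutch BERT model, and a real tokenizer also does lowercasing, punctuation splitting and so on, which I left out.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into sub-word pieces, greedy longest match first.

    Pieces that do not start the word carry the '##' continuation prefix.
    If some part of the word cannot be matched, the whole word becomes unk_token.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


def pieces_per_word(words, vocab):
    """Average number of pieces per word: a rough fragmentation measure."""
    total = sum(len(wordpiece_tokenize(w, vocab)) for w in words)
    return total / len(words)


# Hypothetical toy vocabulary, as if learned from general-domain text.
TOY_VOCAB = {"patiënt", "hart", "hyper", "##tens", "##ie", "co", "##p", "##d"}

if __name__ == "__main__":
    for word in ["patiënt", "hypertensie", "copd", "xyz"]:
        print(word, wordpiece_tokenize(word, TOY_VOCAB))
```

With this toy vocabulary an everyday word like "patiënt" stays whole, while the medical term and the acronym are split into several pieces and a string with no matching pieces collapses to `[UNK]`. Running the same kind of count over a sample of real clinical notes with the actual vocabulary of a pre-trained model would give a rough idea of how badly the notes fragment, before any fine-tuning is done.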


