Digging the Digital

Right now I am testing some LLMs that were trained specifically on the Dutch language. I can run them offline, on my own machine, in the terminal, and it's remarkably easy to try out these models. After some digging, I also found the dataset they are based on: the Gigacorpus, a collection of Dutch forum posts, books, legal texts, Wikipedia and more. It's fascinating to see how many researchers and enthusiasts are working on AI models that are private, local and open source. What a contrast with the ongoing and growing hype around OpenAI and Californian Big Tech…

Terminal with a black background displaying a command prompt