AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models
We introduce AtlasOCR, the first open-source Darija OCR model.
How we built a Ghibli-style text-to-image model that authentically represents Moroccan culture, using LoRA fine-tuning and a curated dataset.
We present a comprehensive dataset for Moroccan Darija, addressing the lack of resources for this widely spoken dialect. We detail our collection methodology, provide a thorough data analysis, and demonstrate performance improvements in both masked and causal language models after training on this dataset.
We introduce Darija Chatbot Arena, a platform for comparing the responses of various Large Language Models (LLMs) on a diverse set of prompts in Darija, the Moroccan Arabic dialect.
We introduce TerjamaBench, an evaluation benchmark for English-Darija machine translation.