AI for Northeast India’s Languages

We build tools that understand Northeast India’s regional languages, in text and voice, for education, governance, and everyday use.

Rooted in Context, Built for the World

At MWirelabs, we advance open research in AI with a focus on inclusivity, safety, and cultural context. Our work begins with low-resource languages of Northeast India and scales toward universal accessibility. We build the complete AI stack: language models, speech, vision, and deployment-ready tools, ensuring that technology serves every community in their own voice.

Languages First

From Khasi to Garo to Mizo and Assamese, our models are trained with deep linguistic and cultural sensitivity, not adapted from English-first systems. We preserve the grammatical structures, cultural context, and nuances that make each language unique.

Efficient and Accessible

Optimized for real-world use: responsive, lightweight, and capable of running in resource-constrained environments across Northeast India. Our models work where connectivity is limited and computing resources are precious.

Research-Driven and Transparent

Every release is grounded in rigorous testing, published research, and open-source transparency. We contribute to AIkosh, share datasets on Kaggle and HuggingFace, and publish our findings so the entire AI community can build on our work.

Our Foundation Models

We create and open-source multilingual foundational models designed to support research, education, and practical applications across Northeast India.

Kren-M

Generative Language Model

Kren-M is a bilingual (Khasi–English) language model developed through extensive continued pre-training and supervised fine-tuning of Gemma 2 (2B). It is designed specifically for Khasi, a low-resource Austroasiatic language spoken in Meghalaya, Northeast India, while retaining the English fluency of its base model.

~2.6B params

NE-BERT

Multilingual Foundation Model

NE-BERT is a regional state-of-the-art open-source model for 9 Northeast Indian languages, built on ModernBERT for superior speed and accuracy in low-resource NLP.

~149M params

NE-OCR

Text Recognition Model

NE-OCR is a unified OCR model for 10 Northeast Indian languages across 12 language-script pairs and 4 scripts, along with Hindi and English anchors. Developed on the ViTSTR-Base backbone (86M parameters) using 1.34 million text-image pairs from native corpora, it delivers 94.99% mean Character Accuracy and the fastest inference speed of 17.2 ms/image.

~86M params
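
For readers unfamiliar with the metric, character accuracy is commonly computed as one minus the character error rate, where the error rate is the edit distance between the predicted and reference text divided by the reference length. The sketch below assumes that common definition; it is not confirmed to be the exact evaluation procedure behind the NE-OCR figure, and the function names are illustrative only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_accuracy(prediction: str, reference: str) -> float:
    """Assumed definition: 1 - (edit distance / reference length), floored at 0."""
    if not reference:
        return 1.0 if not prediction else 0.0
    cer = edit_distance(prediction, reference) / len(reference)
    return max(0.0, 1.0 - cer)


def mean_character_accuracy(pairs) -> float:
    """Average per-image character accuracy over (prediction, reference) pairs."""
    scores = [character_accuracy(p, r) for p, r in pairs]
    return sum(scores) / len(scores)
```

Note that Python strings are counted in Unicode code points, so a combining vowel sign in a script such as Bengali-Assamese counts as its own character here. An evaluation that counts grapheme clusters instead would give slightly different numbers.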

View all models

From Research to Real-World Impact

Our models power real AI solutions across government departments, NGOs, and enterprises in Northeast India, from multilingual chatbots to document processing systems that serve communities in their native languages.

Government Automation

AI chatbots and workflow automation for citizen services in local languages. Deployed across multiple departments for seamless multilingual communication.

Document Intelligence

Multilingual form extraction, classification, and processing for government departments and enterprises. Understands cultural context and regional document formats.

Offline AI Solutions

On-premise language models (Kren-M 7B/9B) for sensitive data and offline deployments. Complete control, zero cloud dependency.

Be Part of Northeast India's AI Revolution

Research Collaboration

Access our models, datasets, and technical documentation. Collaborate on cutting-edge NLP research for low-resource languages.

Startup Support

Build AI-driven solutions with our open-source models. Get technical guidance and integration support for your applications.

Government & NGO Partnerships

Deploy proven AI solutions for citizen services, automation, and multilingual communication at scale.

Developer Ecosystem

Integrate Kren-M and NE-BERT APIs into your applications. Commercial licensing and custom development available.

Built in the Open, For Everyone

All our foundational models are open-source on HuggingFace. We believe language AI should be accessible to researchers, developers, and communities across Northeast India and beyond. Our datasets are published on Kaggle, our research is transparent, and we actively contribute to AIkosh, India’s National AI Repository.

Our team works across Northeast India, building AI from authentic community data.

Join Us in Building Inclusive AI

MWirelabs invites researchers, educators, and developers to collaborate in shaping technology that reflects the world’s diversity.