The model behind the BgGPT chat is now published

March 3, 2024

At INSAIT we are delighted to release BgGPT-7B-Instruct-v0.2, the model behind the BgGPT chat app: https://chat.bggpt.ai. This model, part of the BgGPT series, is an improved version of the one we released a couple of weeks ago. BgGPT-7B-Instruct-v0.2 is still a 7B model, so it is fast at text generation and can run on most recent personal computers. It also comes with a permissive, commercial-friendly Apache 2.0 license. The model is based on Mistral-7B but was trained on significant amounts of additional data; combined with other advances (to be published at research conferences), it can outperform much larger models on Bulgarian tasks. The training costs of BgGPT-7B-Instruct-v0.2 were covered entirely by private funds and donations. Please also see the blog post for BgGPT-7B-Instruct-v0.1, which we released earlier.
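Because the model is small enough to run locally, it can be loaded with the Hugging Face Transformers library. Below is a minimal sketch; the repository id INSAIT-Institute/BgGPT-7B-Instruct-v0.2 and the availability of a chat template in the repository are assumptions, so check the official model card for the exact details.

```python
# A minimal sketch of local text generation with Hugging Face Transformers.
# The repository id below is an assumption; consult the official model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "INSAIT-Institute/BgGPT-7B-Instruct-v0.2"  # assumed Hugging Face id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision fits a 7B model in ~16 GB
    device_map="auto",           # use a GPU if one is available
)

# Build a chat-style prompt with the tokenizer's chat template (assumed to be
# configured in the repository, as is typical for Mistral-based chat models).
messages = [{"role": "user", "content": "Разкажи ми накратко за Рилския манастир."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```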

BgGPT Success Story

In only two weeks, BgGPT-7B-Instruct-v0.1 has already been adopted by various companies, who remarked that with only a few hours of work and modest computational and financial resources for fine-tuning, it can reach the performance of GPT-4 on a particular task in Bulgarian.
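The post does not describe how those companies fine-tuned the model, but one common low-cost recipe consistent with "a few hours of work" is parameter-efficient fine-tuning with LoRA adapters. Here is a minimal sketch using the PEFT library; the hyperparameters and target modules are illustrative assumptions, not taken from the post.

```python
# A hedged sketch of low-cost fine-tuning via LoRA adapters (PEFT library).
# Hyperparameters and target modules are illustrative assumptions; Mistral-style
# models expose q_proj/v_proj attention projections.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "INSAIT-Institute/BgGPT-7B-Instruct-v0.2",  # assumed Hugging Face id
    torch_dtype=torch.bfloat16,
)

config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling of the adapter output
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)

# Only the adapter weights train, usually well under 1% of the 7B parameters,
# which is what keeps the compute and cost low.
model.print_trainable_parameters()
# From here, train with a standard transformers Trainer on task-specific data.
```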

Evaluation & Benchmarks

As with many other language models, we evaluate on a set of standard benchmarks translated to Bulgarian, as well as on their English originals: the Winogrande challenge [1], HellaSwag [2], ARC [3], MMLU [4], MathQA [5], GSM8K [6], and TriviaQA [7], together with the Bulgarian-specific bgGLUE benchmark [8].

These benchmarks test the model's logical reasoning, mathematics, knowledge, language understanding, and other skills.
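The post does not say which evaluation harness produced the numbers below. A common way to run the English versions of these benchmarks is EleutherAI's lm-evaluation-harness; a minimal sketch follows, with the caveat that the Bulgarian translations of the tasks are not part of the stock harness and the model id is assumed.

```python
# A sketch of reproducing such benchmark numbers with EleutherAI's
# lm-evaluation-harness (pip install lm-eval). Which harness INSAIT actually
# used is not stated in the post; this runs the English task versions only.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=INSAIT-Institute/BgGPT-7B-Instruct-v0.2",  # assumed id
    tasks=["winogrande", "hellaswag", "arc_challenge", "mmlu", "gsm8k", "triviaqa"],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```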

Evaluation Results

The following graphs show the performance of BgGPT-7B-Instruct-v0.2. It outperforms same-sized models on Bulgarian benchmarks, improving upon the previous version, BgGPT-7B-Instruct-v0.1, and it also outperforms the much larger Mixtral-8x7B-Instruct-v0.1 on Bulgarian benchmarks. At the same time, it does not lose its English skills: on some English benchmarks it is comparable to or better than models such as Google's Gemma-7B, Mistral-7B, Llama-7B, and others.

Outlook

Note that while the model is quite competitive with free open-source models, especially for its size, it is not yet at the level of paid commercial offerings. Even at its current level, however, it can be useful for many applications.

References

  1. Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. WinoGrande: An adversarial Winograd schema challenge at scale. Communications of the ACM, 64(9):99–106, 2021.
  2. Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. HellaSwag: Can a machine really finish your sentence? https://arxiv.org/abs/1905.07830
  3. Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. Think you have solved question answering? Try ARC, the AI2 reasoning challenge. https://arxiv.org/abs/1803.05457
  4. Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. https://arxiv.org/abs/2009.03300
  5. Aida Amini, Saadia Gabriel, Shanchuan Lin, Rik Koncel-Kedziorski, Yejin Choi, and Hannaneh Hajishirzi. MathQA: Towards interpretable math word problem solving with operation-based formalisms. https://arxiv.org/abs/1905.13319
  6. Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. https://arxiv.org/abs/2110.14168
  7. Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. https://arxiv.org/abs/1705.03551
  8. Momchil Hardalov, Pepa Atanasova, Todor Mihaylov, Galia Angelova, Kiril Simov, Petya Osenova, Veselin Stoyanov, Ivan Koychev, Preslav Nakov, and Dragomir Radev. bgGLUE: A Bulgarian general language understanding evaluation benchmark. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8733–8759. https://bgglue.github.io/