Towards Personalized Bangla Book Recommendation: A Large-Scale Heterogeneous Book Graph Dataset

arXiv:2602.12129v2. Abstract: Personalized book recommendation in Bangla literature has been constrained by the lack of structured, large-scale, and publicly available datasets. This work introduces RokomariBG, a large-scale heterogeneous book graph dataset designed to support research on personalized recommendation in a low-resource language setting. The dataset comprises 127,302 books, 63,723 users, 16,601 authors, 1,515 categories, 2,757 publishers, and 209,602 reviews, connected through several relation types and organized as a comprehensive knowledge graph. To
The proliferation of AI models demands high-quality, domain-specific datasets, particularly for low-resource languages, making this dataset's introduction timely for expanding AI accessibility.
This development addresses a critical gap in data availability for personalized AI in a low-resource language, potentially fostering local AI innovation and reducing reliance on global AI stacks trained on dominant languages.
This large-scale Bangla book dataset enables more accurate and culturally relevant recommendation systems and AI applications for Bangla speakers, work that was previously hindered by a lack of structured data.
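To make the structure described in the abstract concrete, here is a minimal Python sketch of a heterogeneous graph with typed nodes and typed edges. The node counts are the ones quoted in the abstract. Everything else is an assumption for illustration: the `HeteroGraph` class, the relation names `written_by` and `reviewed`, the node IDs, and the choice to treat reviews as a node type are hypothetical and are not taken from the RokomariBG release.

```python
from collections import defaultdict

# Entity counts as quoted in the abstract. Treating every listed
# entity (including reviews) as a node type is an assumption.
NODE_COUNTS = {
    "book": 127_302,
    "user": 63_723,
    "author": 16_601,
    "category": 1_515,
    "publisher": 2_757,
    "review": 209_602,
}


class HeteroGraph:
    """Tiny typed-edge store (hypothetical, for illustration only).

    Edges are grouped by (src_type, relation, dst_type); each group maps
    a source node to the set of destination nodes it points to.
    """

    def __init__(self):
        self.adj = defaultdict(lambda: defaultdict(set))

    def add_edge(self, src, relation, dst):
        # src and dst are (node_type, node_id) pairs.
        for node_type, _ in (src, dst):
            if node_type not in NODE_COUNTS:
                raise ValueError(f"unknown node type: {node_type}")
        self.adj[(src[0], relation, dst[0])][src].add(dst)

    def neighbors(self, src, relation, dst_type):
        # Read-only lookup: returns an empty set for unseen nodes or relations.
        return self.adj.get((src[0], relation, dst_type), {}).get(src, set())


# Relation names and IDs below are made up for the example.
g = HeteroGraph()
g.add_edge(("book", "b1"), "written_by", ("author", "a1"))
g.add_edge(("user", "u1"), "reviewed", ("book", "b1"))

print(g.neighbors(("book", "b1"), "written_by", "author"))
```

Keying edges by the full (source type, relation, destination type) triple keeps each relation in its own adjacency map, which is a common way to let a recommender treat, say, user-to-book and book-to-author links differently.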
- Bangladeshi AI developers
- Bangla-speaking consumers
- NLP researchers in low-resource languages
- Global tech companies without localized data strategies
- Generic recommendation systems
Increased research and development in personalized AI for Bangla.
Emergence of more culturally nuanced AI services and applications tailored for the Bangla-speaking population.
Potential for sovereign AI initiatives in Bangladesh, building on localized data infrastructure and models.
Read at arXiv cs.LG