
Optimizing Multilingual LMS Search with Elasticsearch

Introduction

A multilingual LMS (Learning Management System) lets users access content in different languages, which is a significant advantage for platforms with international members. Letting users work in their own language improves the user experience, makes the platform more inclusive, and supports content localization. While this feature greatly advances a project, it also introduces complex data structures: the amount of data to be stored grows in proportion to the number of supported languages. This post explains how Elasticsearch can be used in large-scale, multilingual EdTech projects.

Elasticsearch

Elasticsearch is a search engine written in Java. It exposes a REST API that exchanges data as JSON documents, so it can be used from virtually any programming language. Running search operations on large datasets with traditional SQL queries can hurt performance, and good search often requires stemming (reducing words to their root form), which classic SQL queries do not provide. For example, a user searching for the word “health” might also expect results that include “healthfully.” In multilingual projects, the requirements are even more complex: when users search for the English word “health,” they might also expect results containing the Turkish word “sağlık.” For projects involving large multilingual datasets, Elasticsearch has become a necessity.

Challenges in Multilingual LMS

In a multilingual EdTech project, all content is multilingual: lessons, quizzes, and courses as well as interface menus. When a lesson is created, details such as its name, the on-screen texts, audio files, questions, answers, and feedback all exist in several languages. Saving all this content in a database does not seem like a big problem at first, but read and write performance problems appear as the data grows. For this reason, the design of the core architecture is very important.
Even once storage is solved, searching lesson and course content is a real challenge. When users search for the same word in different languages, they should always reach the same lesson or course. Doing this efficiently with traditional SQL queries is almost impossible, and for searches with more than one word, pattern matching with LIKE/ILIKE is completely insufficient.
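One common way to store such content is to keep every translatable field as an object keyed by language code. The sketch below assumes that layout; it is not necessarily the schema used in this project, and the field names (`title`, `description`), the language codes, and the `localize`/`countTranslations` helpers are hypothetical.

```javascript
// Hypothetical shape of a multilingual lesson: every translatable field
// is an object keyed by language code, so adding a language adds one key
// per field instead of a new table or column.
const lesson = {
  id: "lesson-101",
  title: { en: "Healthy Eating", tr: "Sağlıklı Beslenme" },
  description: {
    en: "Learn the basics of a balanced diet.",
    tr: "Dengeli beslenmenin temellerini öğrenin.",
  },
};

// Return a translatable field in the requested language, falling back to
// a default language when that translation is missing.
function localize(field, lang, fallback = "en") {
  if (field[lang] !== undefined) return field[lang];
  return field[fallback];
}

// Count stored strings: (number of fields) x (number of languages),
// which is why storage grows with every supported language.
function countTranslations(doc, fields) {
  return fields.reduce((sum, f) => sum + Object.keys(doc[f]).length, 0);
}

console.log(localize(lesson.title, "tr")); // Sağlıklı Beslenme
console.log(localize(lesson.title, "de")); // Healthy Eating (fallback to "en")
console.log(countTranslations(lesson, ["title", "description"])); // 4
```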
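If each lesson is stored once with all of its translations, a single full-text query can search every language variant of a field at the same time, so a query in English or in Turkish points at the same lesson document. The sketch below only builds the request body for Elasticsearch's `multi_match` query; the per-language field names (`title.en`, `title.tr`, ...) and the `buildMultilingualQuery` helper are hypothetical, and no server is contacted.

```javascript
// Build a search body that runs one full-text query against every
// language variant of the searchable fields (hypothetical field names).
function buildMultilingualQuery(text, languages, fields = ["title", "description"]) {
  const targetFields = [];
  for (const field of fields) {
    for (const lang of languages) {
      targetFields.push(`${field}.${lang}`);
    }
  }
  return {
    query: {
      multi_match: {
        query: text,
        fields: targetFields,
        // "best_fields" scores each document by its best-matching field,
        // which suits "the same text in one of several languages".
        type: "best_fields",
      },
    },
  };
}

const body = buildMultilingualQuery("sağlık", ["en", "tr"]);
console.log(JSON.stringify(body.query.multi_match.fields));
// ["title.en","title.tr","description.en","description.tr"]
```

With the v8 JavaScript client, the same object could be passed as `client.search({ index: "lessons", ...body })` (index name assumed). Multi-word input is then split and normalized by each field's analyzer rather than by hand-written LIKE patterns.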

Capabilities and Features of Elasticsearch

Elasticsearch’s capabilities include full-text search, a distributed architecture, scalability, and communication via RESTful APIs. To enable full-text search, Elasticsearch runs text through an analyzer when documents are indexed (and normally applies the same analysis to the search text): it splits a sentence into words (tokens), converts them to lowercase, filters out stop words such as “and” or “a,” and reduces each word to its root form (stemming, e.g. “running” to “run”). Since each language has its own grammar and rules, language-specific analyzers must be used.

As the number of users and the volume of data grow, performance issues may arise. Elasticsearch addresses this with its distributed, scalable design: an index is split into shards, and when traffic increases, additional nodes (servers) can be added to the cluster, which then redistributes the shards across them. These independent nodes communicate with each other and function as a single system.
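The analysis chain described above (tokenize, lowercase, drop stop words, stem) can be imitated in a few lines to make each step concrete. This is a deliberately simplified illustration: the stop-word list and the suffix-stripping rules are toy assumptions, while Elasticsearch’s real analyzers use proper tokenizers and dedicated stemming algorithms per language.

```javascript
// Toy stop-word list; real analyzers use much longer per-language lists.
const STOP_WORDS = new Set(["a", "an", "and", "the", "of", "is"]);

// Step 1: split text into word tokens (runs of Unicode letters).
function tokenize(text) {
  return text.match(/\p{L}+/gu) ?? [];
}

// Step 4: naive suffix stripping, NOT a real stemming algorithm.
function naiveStem(word) {
  for (const suffix of ["ing", "ed", "s"]) {
    if (word.length > suffix.length + 2 && word.endsWith(suffix)) {
      let stem = word.slice(0, -suffix.length);
      // After -ing/-ed, undo a doubled final consonant ("runn" -> "run").
      if (suffix !== "s" && /([^aeiouslz])\1$/.test(stem)) {
        stem = stem.slice(0, -1);
      }
      return stem;
    }
  }
  return word;
}

function analyze(text) {
  return tokenize(text)
    .map((token) => token.toLowerCase()) // step 2: lowercase
    .filter((token) => !STOP_WORDS.has(token)) // step 3: drop stop words
    .map(naiveStem); // step 4: reduce to a root form
}

console.log(analyze("The student is Running and the teachers walked"));
// [ 'student', 'run', 'teacher', 'walk' ]
```

Even the lowercasing step is language-dependent: in JavaScript, `"I".toLowerCase()` returns `"i"`, whereas Turkish expects `"ı"` (as `"I".toLocaleLowerCase("tr")` returns). This is one concrete reason language-specific analyzers are needed.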

Implementing Elasticsearch in an LMS

To manage scalability more easily, run Elasticsearch in Docker with Docker Compose. Install Docker Engine and Docker Compose by following the instructions at the link below:
https://docs.docker.com/engine/install/ubuntu/

Create a Docker Compose YAML file with the necessary configuration settings for the Elasticsearch container.
Run the command docker-compose up -d to start the container in the background.
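Because the container needs some time to start after docker-compose up -d, it can help to wait for the cluster before creating indices. A minimal sketch, assuming a local single-node setup reachable at `http://localhost:9200` with security disabled (Elasticsearch 8 enables TLS and authentication by default, so a real setup may need credentials). `waitForCluster` is a hypothetical helper that polls the `GET /_cluster/health` endpoint; the fetch function is injectable so the logic can be tested without a running server.

```javascript
// Poll GET /_cluster/health until the cluster reports "yellow" or "green".
// On a single node, indices with replicas stay "yellow" because replica
// shards cannot be placed on the same node as their primary, so "yellow"
// is accepted as ready here.
async function waitForCluster({
  baseUrl = "http://localhost:9200",
  retries = 30,
  delayMs = 2000,
  fetchFn = globalThis.fetch,
} = {}) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const res = await fetchFn(`${baseUrl}/_cluster/health`);
      if (res.ok) {
        const health = await res.json();
        if (health.status === "green" || health.status === "yellow") {
          return health.status;
        }
      }
    } catch {
      // Connection refused while the container is still starting: retry.
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`Elasticsearch not ready after ${retries} attempts`);
}
```

It would be called as `await waitForCluster()` before the index-creation step.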

As mentioned earlier, multilingual search needs language-specific analyzers and stemmers. These are configured in the index settings, which are written in a JSON file.
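As a sketch of the analysis part of that JSON file, the settings below define one custom analyzer per language from Elasticsearch’s built-in `standard` tokenizer and `lowercase`, `stop`, `stemmer`, and `apostrophe` token filters. The analyzer and filter names (`en_text`, `tr_text`, `en_stop`, ...) and the choice of English and Turkish are illustrative assumptions; Elasticsearch also ships ready-made `english` and `turkish` analyzers that could be used directly.

```javascript
// Analysis settings written as a JS object so they can be serialized with
// JSON.stringify. Names prefixed en_/tr_ are our own (hypothetical).
const analysisSettings = {
  analysis: {
    filter: {
      en_stop: { type: "stop", stopwords: "_english_" },
      en_stemmer: { type: "stemmer", language: "english" },
      // Turkish needs its own lowercasing rules (I -> ı, İ -> i).
      tr_lowercase: { type: "lowercase", language: "turkish" },
      tr_stop: { type: "stop", stopwords: "_turkish_" },
      tr_stemmer: { type: "stemmer", language: "turkish" },
    },
    analyzer: {
      en_text: {
        type: "custom",
        tokenizer: "standard",
        filter: ["lowercase", "en_stop", "en_stemmer"],
      },
      tr_text: {
        type: "custom",
        tokenizer: "standard",
        // "apostrophe" strips suffixes after an apostrophe (Türkiye'de -> Türkiye).
        filter: ["apostrophe", "tr_lowercase", "tr_stop", "tr_stemmer"],
      },
    },
  },
};

// Sanity check: list filters that an analyzer references but that are
// neither defined above nor one of the built-in filters used here.
const BUILT_IN = new Set(["lowercase", "apostrophe"]);
function undefinedFilters(settings) {
  const defined = new Set(Object.keys(settings.analysis.filter));
  return Object.values(settings.analysis.analyzer)
    .flatMap((analyzer) => analyzer.filter)
    .filter((name) => !defined.has(name) && !BUILT_IN.has(name));
}

console.log(undefinedFilters(analysisSettings)); // []
```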

The schema (mappings) of the index should be defined in the same JSON file. Once the JSON file is finalized, the index is created by sending it in a PUT request to the index endpoint (PUT /<index-name>).
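A minimal sketch of the schema part and of the PUT request that creates the index. It assumes a hypothetical `lessons_v1` index, a local node at `http://localhost:9200` with security disabled, and per-language subfields that use Elasticsearch’s built-in `english` and `turkish` analyzers (custom analyzers defined under `settings.analysis` could be referenced by name instead). `createIndex` is a hypothetical helper, and `fetchFn` is injectable so the request can be inspected without a server.

```javascript
// Per-language subfields, each analyzed with the built-in language
// analyzer for that language.
const perLanguageText = {
  properties: {
    en: { type: "text", analyzer: "english" },
    tr: { type: "text", analyzer: "turkish" },
  },
};

// Index definition: settings and mappings sent together as one JSON body.
const indexDefinition = {
  settings: { number_of_shards: 1, number_of_replicas: 0 }, // single-node dev setup
  mappings: {
    properties: {
      id: { type: "keyword" },
      title: perLanguageText,
      description: perLanguageText,
    },
  },
};

// Create the index with PUT /<index>. Elasticsearch answers with status 400
// (resource_already_exists_exception) if the index already exists.
async function createIndex(index, definition, {
  baseUrl = "http://localhost:9200",
  fetchFn = globalThis.fetch,
} = {}) {
  const res = await fetchFn(`${baseUrl}/${index}`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(definition),
  });
  const result = await res.json();
  if (!res.ok) {
    throw new Error(`Index creation failed: ${JSON.stringify(result.error)}`);
  }
  return result; // includes acknowledged: true on success
}
```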

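This JSON file tells Elasticsearch how to analyze text and what the data structure should look like. Note that editing the JSON file and sending the PUT request again does not update an existing index: Elasticsearch rejects the request because the index already exists. New fields can be added with the mapping API (PUT /<index-name>/_mapping), but changing the analyzer or type of an existing field requires creating a new index and reindexing the data into it.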
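Because an existing index cannot simply be redefined, a common pattern (an assumption here, not necessarily the exact workflow used in this project) is to let the application query an alias, build the changed definition under a new versioned index name, copy the documents with `_reindex`, and then switch the alias atomically. The hypothetical `planReindex` helper below only lists the REST calls in order; the index and alias names are illustrative.

```javascript
// Plan the REST calls that move an alias from oldIndex to newIndex after
// the index definition (analyzers, mappings) has changed.
function planReindex({ alias, oldIndex, newIndex, definition }) {
  return [
    // 1. Create the new index with the updated settings and mappings.
    { method: "PUT", path: `/${newIndex}`, body: definition },
    // 2. Copy every document; each one is re-analyzed by the new analyzers.
    {
      method: "POST",
      path: "/_reindex",
      body: { source: { index: oldIndex }, dest: { index: newIndex } },
    },
    // 3. Switch the alias; all actions in one request are applied atomically.
    {
      method: "POST",
      path: "/_aliases",
      body: {
        actions: [
          { remove: { index: oldIndex, alias } },
          { add: { index: newIndex, alias } },
        ],
      },
    },
  ];
}

const steps = planReindex({
  alias: "lessons",
  oldIndex: "lessons_v1",
  newIndex: "lessons_v2",
  definition: { mappings: { properties: {} } },
});
for (const step of steps) console.log(step.method, step.path);
// PUT /lessons_v2
// POST /_reindex
// POST /_aliases
```

Documents written to the old index while `_reindex` is running are not copied automatically, so writes may need to be paused or replayed around the switch.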

In the next step, the Elasticsearch client adds each new document to the index with the client.create method. The _reindex API (POST _reindex) is then used to copy this data into the index configured with the settings above.
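With the official JavaScript client (`@elastic/elasticsearch`, v8 API style assumed), these two steps look roughly as follows. `client` is assumed to be a configured `Client` instance; the index names and the `addLesson`/`copyIndex` helpers are hypothetical. Note that `client.create` fails with a 409 conflict if a document with the same `id` already exists, whereas `client.index` would overwrite it.

```javascript
// Assumed setup (not executed here):
// const { Client } = require("@elastic/elasticsearch");
// const client = new Client({ node: "http://localhost:9200" });

// Add a new lesson document. create() needs an explicit id and refuses to
// overwrite an existing document with the same id.
async function addLesson(client, index, lesson) {
  return client.create({
    index,
    id: lesson.id,
    document: lesson,
  });
}

// Copy all documents from one index into another with the _reindex API.
async function copyIndex(client, sourceIndex, destIndex) {
  return client.reindex({
    source: { index: sourceIndex },
    dest: { index: destIndex },
    wait_for_completion: true, // return only after the copy has finished
  });
}
```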

When modifying or deleting data, the system uses the client.update and client.delete methods.
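A matching sketch for edits and removals, under the same assumptions (v8 client, hypothetical helper and index names). `client.update` applies a partial document, and Elasticsearch merges objects in the partial document into the stored one, so only the changed translation needs to be sent.

```javascript
// Update one translation of a lesson title without resending the whole
// document; the other languages under "title" are kept as they are.
async function updateTitle(client, index, id, lang, newTitle) {
  return client.update({
    index,
    id,
    doc: { title: { [lang]: newTitle } },
  });
}

// Remove a lesson from the search index, e.g. when it is deleted in the LMS.
async function removeLesson(client, index, id) {
  return client.delete({ index, id });
}
```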

For more detailed information on Elasticsearch’s methods and endpoints, refer to the following resource:
https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/getting-started-js.html

Conclusion

At Mysoly E-Learn, Elasticsearch has proven an effective way to enhance the user experience and enable post-search tagging in our multilingual, all-in-one digital school project. Users can now find the content they need more quickly and in their preferred language.

Mysoly | Your partner in digital!

Emir C.
Senior Backend Developer