
Recommender Systems — A Complete Guide to Machine Learning Models

Leveraging data to help users discover new content.

Francesco Casalegno · Towards Data Science

Recommender Systems: Why And How?

Recommender systems are algorithms that provide personalized suggestions for the items most relevant to each user. With the massive growth of available online content, users are inundated with choices. It is therefore crucial for web platforms to recommend items to each user, in order to increase user satisfaction and engagement.

The following list shows examples of well-known web platforms with a huge amount of available content, all of which need efficient recommender systems to keep users interested.

  • YouTube. Every minute, people upload 500 hours of video: at one hour of viewing per day, it would take a user 82 years to watch all the videos uploaded just in the last hour.
  • Spotify. Users can listen to more than 80 million songs and podcasts.
  • Amazon. Users can buy more than 350 million different products.

All these platforms use powerful machine learning models in order to generate relevant recommendations for each user.

Explicit Feedback vs. Implicit Feedback

In recommender systems, machine learning models are used to predict the rating rᵤᵢ of a user u on an item i. At inference time, we recommend to each user u the items i with the highest predicted ratings rᵤᵢ.

We therefore need to collect user feedback, so that we can have a ground truth for training and evaluating our models. An important distinction has to be made here between explicit feedback and implicit feedback .

Explicit feedback is a rating explicitly given by the user to express their satisfaction with an item. Examples are: the number of stars on a scale from 1 to 5 given after buying a product, a thumbs up/down given after watching a video, etc. This feedback provides detailed information on how much a user liked an item, but it is hard to collect, as most users typically don’t write reviews or give explicit ratings for each item they purchase.

Implicit feedback, on the other hand, assumes that user-item interactions are an indication of preferences. Examples are: the purchase/browsing history of a user, the list of songs played by a user, etc. This feedback is extremely abundant, but at the same time it is less detailed and noisier (e.g. someone may buy a product as a present for someone else). However, this noise becomes negligible compared to the sheer size of available data of this kind, and most modern recommender systems tend to rely on implicit feedback.

Once we have collected explicit or implicit feedback, we can create the user-item rating matrix rᵤᵢ. For explicit feedback, each entry in rᵤᵢ is a numerical value—e.g. rᵤᵢ = “stars given by u to movie i”—or “?” if user u did not rate item i. For implicit feedback, the values in rᵤᵢ are boolean values representing presence or absence of interaction—e.g. rᵤᵢ = “did user u watch movie i?”. Notice that the matrix rᵤᵢ is very sparse, as users interact with few items among all available content, and they review even fewer items!
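
As a concrete illustration, here is a minimal sketch (with made-up interaction pairs) of building such a sparse implicit-feedback matrix with SciPy:

```python
# Minimal sketch: a sparse implicit-feedback matrix from (user, item) pairs.
import numpy as np
from scipy.sparse import csr_matrix

interactions = [(0, 2), (0, 5), (1, 2), (3, 0)]   # hypothetical (u, i) pairs
rows, cols = zip(*interactions)
data = np.ones(len(interactions), dtype=np.int8)  # r_ui = 1: u interacted with i

R = csr_matrix((data, (rows, cols)), shape=(4, 6))  # 4 users x 6 items
print(R.toarray())  # mostly zeros: the matrix is very sparse
```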

Content-Based vs. Collaborative Filtering Approaches

Recommender systems can be classified, according to the kind of information used to predict user preferences, as content-based or collaborative filtering.

Content-Based Approach

Content-based methods describe users and items by their known metadata. Each item i is represented by a set of relevant tags—e.g. movies on the IMDb platform can be tagged as “action”, “comedy”, etc. Each user u is represented by a user profile, which can be created from known user information—e.g. sex and age—or from the user’s past activity.

To train a machine learning model with this approach we can use a k-NN model: for instance, if we know that user u bought an item i, we can recommend to u the available items with features most similar to those of i.
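
As a minimal sketch of this idea (with made-up binary tag features), scikit-learn’s NearestNeighbors can retrieve the most similar items:

```python
# Minimal sketch of content-based k-NN retrieval with scikit-learn.
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Rows: items; columns: tags, e.g. ["action", "comedy", "drama"].
item_features = np.array([
    [1, 0, 0],  # item 0: action
    [1, 0, 1],  # item 1: action + drama
    [0, 1, 0],  # item 2: comedy
    [1, 0, 1],  # item 3: action + drama
    [0, 1, 1],  # item 4: comedy + drama
])

knn = NearestNeighbors(n_neighbors=3, metric="cosine").fit(item_features)

# User u bought item 1: retrieve the items most similar to it.
distances, neighbors = knn.kneighbors(item_features[[1]])
print(neighbors[0][1:])  # drop item 1 itself, recommend its nearest neighbors
```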

The advantage of this approach is that item metadata are known in advance, so we can also apply it to cold-start scenarios, where a new item or user is added to the platform and we have no user-item interactions to train our model. The disadvantages are that we don’t use the full set of known user-item interactions (each user is treated independently), and that we need to know the metadata of each item and user.

Collaborative Filtering Approach

Collaborative filtering methods do not use item or user metadata. Instead, they leverage the feedback or activity history of all users, predicting the rating of a user on a given item by inferring interdependencies between users and items from the observed activities.

To train a machine learning model with this approach we typically try to cluster or factorize the rating matrix rᵤᵢ in order to make predictions on the unobserved pairs (u, i), i.e. where rᵤᵢ = “?”. In the remainder of this article we present the matrix factorization algorithm, which is the most popular method of this class.

The advantage of this approach is that the whole set of user-item interactions (i.e. the matrix rᵤᵢ) is used, which typically yields higher accuracy than content-based models. The disadvantage is that it requires some existing user-item interactions before the model can be fitted.

Hybrid Approaches

Finally, there are also hybrid methods that use both the known metadata and the set of observed user-item interactions. This approach combines the advantages of content-based and collaborative filtering methods, and often obtains the best results. Later in this article we present LightFM, which is the most popular algorithm of this class of methods.

Collaborative Filtering: Matrix Factorization

Matrix factorization algorithms are probably the most popular and effective collaborative filtering methods for recommender systems. Matrix factorization is a latent factor model assuming that for each user u and item i there are latent vector representations pᵤ, qᵢ ∈ Rᶠ such that rᵤᵢ can be expressed—i.e. “factorized”—in terms of pᵤ and qᵢ. The Python library Surprise provides excellent implementations of these methods.
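
In its simplest form, this means modeling each predicted rating as a dot product of the two latent vectors:

r̂ᵤᵢ = pᵤᵀ qᵢ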

Matrix Factorization for Explicit Feedback

The simplest idea is to model user-item interactions through a linear model. To learn the values of pᵤ and qᵢ, we can minimize a regularized MSE loss over the set K of pairs (u, i) for which rᵤᵢ is known. The resulting algorithm is called probabilistic matrix factorization (PMF).
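
Written out (in its commonly used form, with λ controlling the regularization strength), the minimized objective is:

min over all pᵤ, qᵢ of   Σ₍ᵤ,ᵢ₎∈K ( rᵤᵢ − pᵤᵀqᵢ )² + λ ( ‖pᵤ‖² + ‖qᵢ‖² )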

The loss function can be minimized in two different ways. The first approach is to use stochastic gradient descent (SGD). SGD is easy to implement, but it may have some issues because pᵤ and qᵢ are both unknown and therefore the loss function is not convex. To solve this issue, we can alternately fix the values of pᵤ and qᵢ, obtaining at each step a convex linear regression problem that can be easily solved with ordinary least squares (OLS). This second method is known as alternating least squares (ALS) and allows significant parallelization and speedup.
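
To make the alternation concrete, here is a toy dense ALS sketch in NumPy (illustrative only—real implementations exploit sparsity and parallelism):

```python
# Toy ALS sketch: factorize a rating matrix R, with boolean mask M marking
# the observed entries (this dense version ignores sparsity optimizations).
import numpy as np

def als(R, M, f=10, lam=0.1, n_iters=20, seed=0):
    """Factorize R (n_users x n_items) into P (n_users x f) and Q (n_items x f)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, f))
    Q = rng.normal(scale=0.1, size=(n_items, f))
    I = lam * np.eye(f)
    for _ in range(n_iters):
        for u in range(n_users):      # fix Q, solve an OLS problem per user
            Qu = Q[M[u]]              # latent vectors of items rated by u
            P[u] = np.linalg.solve(Qu.T @ Qu + I, Qu.T @ R[u, M[u]])
        for i in range(n_items):      # fix P, solve an OLS problem per item
            Pi = P[M[:, i]]           # latent vectors of users who rated i
            Q[i] = np.linalg.solve(Pi.T @ Pi + I, Pi.T @ R[M[:, i], i])
    return P, Q

# Example: 3 users x 4 items, 0 marks a missing rating.
R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 5., 4.]])
P, Q = als(R, M=(R > 0), f=2)
print(P @ Q.T)  # predicted ratings, including the unobserved entries
```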

The PMF algorithm was later generalized by the singular value decomposition (SVD) algorithm, which introduced bias terms in the model. More specifically, bᵤ and bᵢ measure observed rating deviations of user u and item i, respectively, while μ is the overall average rating. These terms often explain most of the observed ratings rᵤᵢ, as some items systematically receive better/worse ratings, and some users are consistently more/less generous with their ratings.
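
With the bias terms, the predicted rating becomes:

r̂ᵤᵢ = μ + bᵤ + bᵢ + pᵤᵀ qᵢ

As a usage sketch, the SVD class of the Surprise library mentioned above implements this biased model (the built-in MovieLens dataset and the hyper-parameters here are purely illustrative):

```python
# Minimal sketch: biased matrix factorization with the Surprise library.
from surprise import SVD, Dataset, accuracy
from surprise.model_selection import train_test_split

# Built-in MovieLens 100k dataset: explicit 1-5 star ratings.
data = Dataset.load_builtin("ml-100k")
trainset, testset = train_test_split(data, test_size=0.25, random_state=42)

algo = SVD(n_factors=100, n_epochs=20, random_state=42)  # r = mu + bu + bi + p.q
algo.fit(trainset)

accuracy.rmse(algo.test(testset))              # evaluate on held-out ratings
print(algo.predict(uid="196", iid="302").est)  # rating of user 196 on item 302
```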

Matrix Factorization for Implicit Feedback

The SVD method can be adapted to implicit feedback datasets. The idea is to look at implicit feedback as an indirect measure of confidence. Let’s assume that the implicit feedback tᵤᵢ measures the percentage of movie i that user u has watched—e.g. tᵤᵢ = 0 means that u never watched i, tᵤᵢ = 0.1 means that they watched only 10% of it, tᵤᵢ = 2 means that they watched it twice. Intuitively, a user is more likely to be interested in a movie they watched twice than in a movie they never watched. We therefore define a confidence matrix cᵤᵢ and a rating matrix rᵤᵢ as follows.
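
Following the formulation of (Y. Hu 2008), these can be written as:

rᵤᵢ = 1 if tᵤᵢ > 0, else rᵤᵢ = 0
cᵤᵢ = 1 + α tᵤᵢ  (with α a tunable hyper-parameter, e.g. α = 40)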

Then, we can model the observed rᵤᵢ using the same linear model used for SVD, but with a slightly different loss function. First, we compute the loss over all (u, i) pairs—unlike the explicit case, if user u never interacted with i we have rᵤᵢ = 0 instead of rᵤᵢ = “?”. Second, we weight each loss term by the confidence cᵤᵢ that u likes i.
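
The resulting weighted objective, in the same notation as before, is:

min over all pᵤ, qᵢ of   Σᵤ,ᵢ cᵤᵢ ( rᵤᵢ − pᵤᵀqᵢ )² + λ ( Σᵤ ‖pᵤ‖² + Σᵢ ‖qᵢ‖² )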

Finally, the SVD++ algorithm can be used when we have access to both explicit and implicit feedback. This can be very useful, because typically users interact with many items (implicit feedback) but rate only a small subset of them (explicit feedback). Let’s denote by N(u), for each user u, the set of items that u has interacted with. Then, we assume that an implicit interaction with an item j is associated with a new latent vector zⱼ ∈ Rᶠ. The SVD++ algorithm modifies the linear model of SVD by including in the user representation a weighted sum of these latent factors zⱼ.
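
Concretely, the SVD++ prediction reads:

r̂ᵤᵢ = μ + bᵤ + bᵢ + qᵢᵀ ( pᵤ + |N(u)|^(−1/2) Σⱼ∈N(u) zⱼ )

where the |N(u)|^(−1/2) factor normalizes for users with many implicit interactions.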

Hybrid Approach: LightFM

Collaborative filtering methods based on matrix factorization often produce excellent results, but in cold-start scenarios —where little to no interaction data is available for new items and users—they cannot make good predictions because they lack data to estimate the latent factors. Hybrid approaches solve this issue by leveraging known item or user metadata in order to improve the matrix factorization model. The Python library LightFM implements one of the most popular hybrid algorithms.

In LightFM, we assume that for each user u we have collected a set of tag annotations Aᵁ(u)—e.g. “male”, “age < 30”, …—and similarly each item i has a set of annotations Aᴵ(i)—e.g. “price > 100 $”, “book”, … Then we model each user tag by a latent factor xᵁₐ ∈ Rᶠ and by a bias term bᵁₐ ∈ R, and we assume that the user vector representation pᵤ and its associated bias bᵤ can be expressed simply as the sum of these terms xᵁₐ and bᵁₐ, respectively. We take the same approach for item tags, using latent factors xᴵₐ ∈ Rᶠ and bias terms bᴵₐ ∈ R. Once we have defined pᵤ, qᵢ, bᵤ, bᵢ with these formulas, we can use the same linear model of SVD to describe the relationship between these terms and rᵤᵢ.
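
In formulas: pᵤ = Σₐ∈Aᵁ₍ᵤ₎ xᵁₐ and bᵤ = Σₐ∈Aᵁ₍ᵤ₎ bᵁₐ, and analogously qᵢ and bᵢ from the item tags. As a usage sketch with the LightFM library (the ids, tags, and hyper-parameters are made up; by default LightFM’s Dataset also adds the per-user/per-item indicator annotations discussed below):

```python
# Minimal sketch: a hybrid model with LightFM, using metadata tags.
import numpy as np
from lightfm import LightFM
from lightfm.data import Dataset

users, items = ["u1", "u2"], ["i1", "i2", "i3"]

ds = Dataset()  # by default also creates one indicator feature per user/item
ds.fit(users, items,
       user_features=["male", "age<30"],
       item_features=["book", "price>100$"])

interactions, _ = ds.build_interactions([("u1", "i1"), ("u2", "i3")])
uf = ds.build_user_features([("u1", ["male"]), ("u2", ["age<30"])])
itf = ds.build_item_features([("i1", ["book"]), ("i3", ["price>100$"])])

model = LightFM(no_components=16, loss="warp")  # WARP suits implicit feedback
model.fit(interactions, user_features=uf, item_features=itf, epochs=10)

# Score all items for user "u1" (internal id 0) and rank them.
scores = model.predict(0, np.arange(len(items)),
                       user_features=uf, item_features=itf)
print(np.argsort(-scores))
```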

Notice that there are three interesting cases of LightFM’s hybrid approach.

  • Cold start. If we have a new item i with known tags Aᴵ(i), then we can use the latent vectors xᴵₐ (obtained by fitting our model on the previous data) to compute its embedding qᵢ, and therefore estimate, for any user u, the rating rᵤᵢ.
  • No available tags. If we don’t have any known metadata for items or users, the only annotation we can use is an indicator function, i.e. a different annotation a for each user and each item. Then, user and item feature matrices are identity matrices, and LightFM reduces to a classical collaborative filtering method such as SVD.
  • Content-based vs. Hybrid. If we only used user or item tags without indicator annotations, LightFM would essentially be a content-based model. So in practice, to also leverage user-item interactions, we add to the known tags an indicator annotation a that is different for each user and item.

TL;DR – Conclusions

  • Recommender systems leverage machine learning algorithms to help users inundated with choices discover relevant content.
  • Explicit vs. implicit feedback: the former is easier to leverage, but the latter is far more abundant.
  • Content-based models work well in cold-start scenarios, but require user and item metadata to be known.
  • Collaborative filtering models typically use matrix factorization: PMF, SVD, SVD for implicit feedback, SVD++.
  • Hybrid models take the best of content-based and collaborative filtering. LightFM is a great example of this approach.
References

  • Wikipedia, Recommender System.
  • Surprise, Python package documentation.
  • (S. Funk 2006), Netflix Update: Try This at Home.
  • (R. Salakhutdinov 2007), Probabilistic Matrix Factorization.
  • (Y. Hu 2008), Collaborative Filtering for Implicit Feedback Datasets.
  • (Y. Koren 2009), Matrix Factorization Techniques for Recommender Systems.
  • (Y. Koren 2008), Factorization Meets the Neighborhood: A Multifaceted Collaborative Filtering Model.
  • (M. Kula 2015), Metadata Embeddings for User and Item Cold-start Recommendations.


A Survey on Modern Recommendation System based on Big Data

This survey provides an exhaustive exploration of the evolution and current state of recommendation systems, which have seen widespread integration in various web applications. It focuses on the advancement of personalized recommendation strategies for online products or services. We categorize recommendation techniques into four primary types: content-based, collaborative filtering-based, knowledge-based, and hybrid-based, each addressing unique scenarios. The survey offers a detailed examination of the historical context and the latest innovative approaches in recommendation systems, particularly those employing big data. Additionally, it identifies and discusses key challenges faced by modern recommendation systems, such as data sparsity, scalability issues, and the need for diversity in recommendations. The survey concludes by highlighting these challenges as potential areas for fruitful future research in the field.

1 Introduction

In this survey, we examine the escalating popularity and diverse application of recommendation systems in web applications, a topic extensively covered by Zhou et al. [ 1 ] . These systems, a specialized category of information filtering systems, are designed to predict user preferences for various items. They play a crucial role in guiding decision-making processes, such as purchasing decisions and music selections, as Wang et al. discuss [ 2 ] . A prime example of this application is Amazon’s personalized recommendation engine, which tailors each user’s homepage. Major companies like Amazon, YouTube, and Netflix employ these systems to enhance user experience and generate significant revenue, as noted by Adomavicius et al. and Omura et al. [ 3 , 4 ] . Figure 1 from Entezari et al. [ 5 ] illustrates a modern recommendation system. Additionally, these systems are increasingly relevant in the field of human-computer interaction (HCI), where they enhance interaction efficiency through feedback mechanisms, a topic explored in several studies [ 6 , 7 , 8 , 9 ] .

Recommendation systems are particularly crucial for certain companies, as their efficiency can lead to substantial revenue generation and competitive advantage, as evidenced in the research by Rismanto et al. and Cui et al. [ 10 , 11 ] . For instance, Netflix’s “Netflix Prize” challenge aimed to develop a recommender system surpassing their existing algorithm, with a substantial prize to incentivize innovation.


Furthermore, in the domain of big data, recommendation systems are highly prevalent, as detailed by Li et al. [ 12 , 13 ] . These systems predict user interests in purchasing based on extensive data analysis, including purchase history, ratings, and reviews. There are four widely recognized types of recommendation systems, as identified by Numnonda [ 14 ] : content-based, collaborative filtering-based, knowledge-based, and hybrid-based, each with distinct advantages and drawbacks, as Xiao et al. elucidate [ 15 ] . For example, collaborative filtering-based systems may face issues such as data sparsity and scalability, as Huang et al. mention [ 16 ] , and cold-start problems, while content-based systems might struggle to diversify user interests, as noted by Zhang et al. and Benouaret et al. [ 17 , 18 ] .

This paper is organized as follows: Section II provides a comprehensive review of both historical and modern state-of-the-art approaches in recommendation systems, coupled with an in-depth analysis of the latest advancements in the field. Section III discusses the challenges in big data-based recommendation systems, including sparsity, scalability, and diversity, and explores solutions for these challenges. The paper concludes with a summary in Section IV.

2 Recommendation Systems

Recommendation systems aim to predict users’ preferences for a certain item and provide personalized services [ 19 ] . This section will discuss several commonly used recommender methods: content-based, collaborative filtering-based, knowledge-based, and hybrid-based methods.

2.1 Content-based Recommendation Systems

The main idea of content-based recommenders is to recommend items based on the similarity between different users or items [ 20 ] . This algorithm determines and differentiates the main common attributes of a particular user’s favorite items by analyzing the descriptions of those items. Then, these preferences are stored in this user’s profile. The algorithm then recommends items with a higher degree of similarity with the user’s profile. Moreover, content-based recommendation systems can capture the specific interests of the user and can recommend rare items that are of little interest to other users. However, since the feature representations of items are designed manually to a certain extent, this method requires a lot of domain knowledge. In addition, content-based recommendation systems can only recommend based on users’ existing interests, so the ability to expand users’ existing interests is limited.


2.2 Collaborative Filtering-based Recommendation Systems

Collaborative Filtering-based (CF) methods are primarily used in big data processing platforms due to their parallelization characteristics [ 21 ] . The basic principle of the recommendation system based on collaborative filtering is shown in Fig. 2 [ 22 ] . CF recommendation systems use the behavior of a group of users to recommend to other users [ 23 ] . There are mainly two types of collaborative filtering techniques, which are user-based and item-based.

User-based CF: In the user-based CF recommendation system, users receive recommendations of products that similar users like [ 24 ] . Many similarity metrics can be used to calculate the similarity between users or items, such as the Constrained Pearson Correlation coefficient (CPC), cosine similarity, adjusted cosine similarity, etc. For example, cosine similarity is a measure of similarity between two vectors: letting x and y denote two rating vectors, the cosine similarity between x and y can be represented by

cos(x, y) = (x · y) / (‖x‖ ‖y‖)    (1)

Item-based CF: Item-based CF algorithm predicts user ratings for items based on item similarity. Generally, item-based CF yields better results than user-based CF because user-based CF suffers from sparsity and scalability issues. However, both user-based CF and item-based CF may suffer from cold-start problems [ 25 ] .
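
As a toy illustration of user-based CF, the following scores items for a target user with a cosine-similarity-weighted average of the other users’ ratings (the 4×4 rating matrix is made up; 0 marks an unrated item):

```python
# Toy user-based CF: rank items for a target user by the similarity-weighted
# ratings of all other users.
import numpy as np

def cosine_sim(x, y):
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12)

R = np.array([[5, 3, 0, 1],   # rows: users, columns: items
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

u = 0  # target user
others = [v for v in range(len(R)) if v != u]
sims = np.array([cosine_sim(R[u], R[v]) for v in others])

# Predicted scores: similarity-weighted average of the other users' ratings.
scores = sims @ R[others] / (sims.sum() + 1e-12)
print(scores)  # recommend the unrated item with the highest score
```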

[Figure: taxonomy of recommendation systems]

  • Content-based: recommend items based on the similarity between different items (Musto et al., Volkovs et al., Mittal et al., Pérez-Almaguer et al.).
  • Collaborative filtering-based: recommend items to some users based on the other users’ behavior (Zhang et al., Bobadilla et al., Rezaimehr et al.).
  • Knowledge-based: recommend items to users based on basic knowledge of users, items, and relationships between items (Dong et al., Gazdar et al., Alamdari et al., Cena et al.).
  • Hybrid-based: recommend items to users based on more than one filtering approach (Hrnjica et al., Shokeen et al., Zagranovskaia et al., Ibrahim et al.).

2.3 Knowledge-based Recommendation Systems

The main idea of knowledge-based recommendation systems is to recommend items to users based on basic knowledge of users, items, and relationships between items [ 41 , 42 ] . Since knowledge-based recommendation systems do not require user ratings or purchase history, there is no cold start problem for this type of recommendation [ 43 ] . Knowledge-based recommendation systems are well suited for complex domains where items are not frequently purchased, such as cars and apartments [ 44 ] . But the acquisition of required domain knowledge can become a bottleneck for this recommendation technique [ 33 ] .

2.4 Hybrid-based Recommendation Systems

Hybrid-based recommendation systems combine the advantages of multiple recommendation techniques and aim to overcome the potential weaknesses in traditional recommendation systems [ 45 ] . There are seven basic hybrid recommendation techniques [ 40 ] : weighted, mixed, switching, feature combination, feature augmentation, cascade, and meta-level methods [ 46 , 47 ] . Among all of these methods, the most commonly used is the combination of the CF recommendation methods with other recommendation methods (such as content-based or knowledge-based) to avoid sparsity, scalability, and cold-start problems [ 37 , 39 , 48 ] .

2.5 Challenges in Modern Recommendation Systems

Sparsity. As we know, the usage of recommendation systems is growing rapidly. Many commercial recommendation systems use large datasets, and the user-item matrix used for filtering may be very large and sparse. Therefore, the performance of the recommendation process may be degraded due to the cold start problems caused by data sparsity [ 49 ] .

Scalability. Traditional algorithms will face scalability issues as the number of users and items increases. Assuming there are millions of customers and millions of items, the algorithm’s complexity will be too large. However, recommendation systems must respond to the user’s needs immediately, regardless of the user’s rating history and purchase situation, which requires high scalability. For example, Twitter is a large web company that uses clusters of machines to scale recommendations for its millions of users [ 38 ] .

Diversity. Recommendation systems also need to increase diversity to help users discover new items. Unfortunately, some traditional algorithms may accidentally do the opposite because they always recommend popular and highly-rated items that some specific users love. Therefore, new hybrid methods need to be developed to improve the performance of the recommendation systems [ 50 ] .

3 Recommendation System based on Big Data

Big data refers to massive, rapidly growing, and diversified information [ 51 , 52 ] . It requires new processing models with stronger decision-making and process-optimization capabilities [ 53 ] . Big data has unique “4V” characteristics, as shown in Fig. 3 [ 54 ] : Volume, Variety, Velocity, and Veracity.


3.1 Big Data Processing Flow

Big data comes from many sources, and there are many methods to process it [ 55 ] . The primary processing of big data can be divided into four steps [ 56 ] , and Fig. 4 presents the basic flow of big data processing.

  • Data Collection.
  • Data Processing and Integration. The collection terminal itself already has a data repository, but it cannot accurately analyze the data; the received information needs to be pre-processed [ 57 ] .
  • Data Analysis. In this process, the initial data are deeply analyzed using cloud computing technology [ 58 ] .
  • Data Interpretation.


3.2 Modern Recommendation Systems based on the Big Data

The shortcomings of traditional recommendation systems mainly lie in insufficient scalability and parallelism [ 59 ] . For small-scale recommendation tasks, a single desktop computer is sufficient for data mining goals, and many techniques are designed for this type of problem [ 60 ] .


However, the rating data is usually so large for medium-scale recommendation systems that it is impossible to load all the data into memory at once [ 61 ] . Common solutions are based on parallel computing or collective mining, sampling and aggregating data from different sources, and using parallel computing programming to perform the mining process [ 62 ] . The big data processing framework will rely on cluster computers with high-performance computing platforms [ 63 ] . At the same time, data mining tasks will be deployed on a large number of computing nodes (i.e., clusters) by running some parallel programming tools [ 64 ] , such as MapReduce [ 52 , 65 ] . For example, Fig. 5 shows MapReduce in a recommendation system.

In recent years, various big data platforms have emerged [ 66 ] . For example, Hadoop and Spark [ 52 ] , both developed by the Apache Software Foundation, are widely used open-source frameworks for big data architectures [ 52 , 67 ] . Each framework contains an extensive ecosystem of open-source technologies that prepare, process, manage and analyze big data sets [ 68 ] . For example, Fig. 6 shows the ecosystem of Apache Hadoop [ 69 ] .


Hadoop allows users to manage big data sets by enabling a network of computers (or “nodes”) to solve vast and intricate data problems. It is a highly scalable, cost-effective solution that stores and processes structured, semi-structured and unstructured data.

Spark is a data processing engine for big data sets. Like Hadoop, Spark splits up large tasks across different nodes. However, it tends to perform faster than Hadoop, and it uses random access memory (RAM) to cache and process data instead of a file system. This enables Spark to handle use cases that Hadoop cannot. The following are some benefits of the Spark framework:

It is a unified engine that supports SQL queries, streaming data, machine learning (ML), and graph processing.

It can be 100x faster than Hadoop for smaller workloads thanks to in-memory processing and other optimizations.

It has APIs designed for ease of use when manipulating semi-structured data and transforming data.


Furthermore, Spark is fully compatible with the Hadoop ecosystem and works smoothly with the Hadoop Distributed File System (HDFS), Apache Hive, and others. Thus, when the data size is too big for Spark to handle in memory, Hadoop can help overcome that hurdle via its HDFS functionality. Fig. 7 is a visual example of how Spark and Hadoop can work together, and Fig. 8 shows the architecture of a modern recommendation system based on Spark.
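
As a minimal sketch of such an architecture, the following uses the ALS implementation from Spark MLlib (the HDFS path and column names are hypothetical):

```python
# Minimal sketch: distributed collaborative filtering with Spark MLlib's ALS.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("RecSysOnBigData").getOrCreate()

# Hypothetical ratings stored on HDFS as CSV: userId, itemId, rating.
ratings = spark.read.csv("hdfs:///data/ratings.csv",
                         header=True, inferSchema=True)

als = ALS(userCol="userId", itemCol="itemId", ratingCol="rating",
          rank=10, maxIter=10, regParam=0.1, coldStartStrategy="drop")
model = als.fit(ratings)  # the factorization runs in parallel on the cluster

# Top-10 recommendations for every user.
model.recommendForAllUsers(10).show(truncate=False)
```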


4 Conclusion

Recommendation systems have become very popular in recent years and are used in various web applications. Modern recommendation systems aim at providing users with personalized recommendations of online products or services. Various recommendation techniques, such as content-based, collaborative filtering-based, knowledge-based, and hybrid-based recommendation systems, have been developed to fulfill the needs in different scenarios.

This paper presents a comprehensive review of historical and recent state-of-the-art recommendation approaches, followed by an in-depth analysis of groundbreaking advances in modern recommendation systems based on big data. Furthermore, this paper reviews the issues faced in modern recommendation systems such as sparsity, scalability, and diversity and illustrates how these challenges can be transformed into prolific future research avenues.

  • [1] F. Zhou, B. Luo, T. Hu, Z. Chen, and Y. Wen, “A combinatorial recommendation system framework based on deep reinforcement learning,” in 2021 IEEE International Conference on Big Data (Big Data) .   IEEE, 2021, pp. 5733–5740.
  • [2] H. Wang, N. Lou, and Z. Chao, “A personalized movie recommendation system based on lstm-cnn,” in 2020 2nd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI) .   IEEE, 2020, pp. 485–490.
  • [3] G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions,” IEEE transactions on knowledge and data engineering , vol. 17, no. 6, pp. 734–749, 2005.
  • [4] T. Omura, K. Suzuki, P. Siriaraya, M. Mittal, Y. Kawai, and S. Nakajima, “Ad recommendation utilizing user behavior in the physical space to represent their latent interest,” in 2020 IEEE International Conference on Big Data (Big Data) .   IEEE, 2020, pp. 3143–3146.
  • [5] N. Entezari, E. E. Papalexakis, H. Wang, S. Rao, and S. K. Prasad, “Tensor-based complementary product recommendation,” in 2021 IEEE International Conference on Big Data (Big Data) .   IEEE, 2021, pp. 409–415.
  • [6] F. Ali, D. Kwak, P. Khan, S. H. A. Ei-Sappagh, S. M. R. Islam, D. Park, and K.-S. Kwak, “Merged ontology and svm-based information extraction and recommendation system for social robots,” IEEE Access , vol. 5, pp. 12 364–12 379, 2017.
  • [7] Y. Peng, W. Han, and Y. Ou, “Semantic segmentation model for road scene based on encoder-decoder structure,” in 2019 IEEE International Conference on Robotics and Biomimetics (ROBIO) , 2019, pp. 1927–1932.
  • [8] X. Ma, G. Jiang, Y. Peng, T. Ma, C. Liu, and Y.-s. Ou, “An intelligent speed-suggestion planner for coverage path with multiple constraints,” in 2021 IEEE International Conference on Real-time Computing and Robotics (RCAR) , 2021, pp. 1213–1218.
  • [9] Y. Peng, Y. Ou, and W. Feng, “Learning stable control for a wheeled inverted pendulum with fast adaptive neural network,” in 2020 IEEE International Conference on Real-time Computing and Robotics (RCAR) , 2020, pp. 227–232.
  • [10] R. Rismanto, A. R. Syulistyo, and B. P. C. Agusta, “Research supervisor recommendation system based on topic conformity.” International Journal of Modern Education & Computer Science , vol. 12, no. 1, 2020.
  • [11] Z. Cui, X. Xu, X. Fei, X. Cai, Y. Cao, W. Zhang, and J. Chen, “Personalized recommendation system based on collaborative filtering for iot scenarios,” IEEE Transactions on Services Computing , vol. 13, no. 4, pp. 685–695, 2020.
  • [12] B. Li, A. Maalla, and M. Liang, “Research on recommendation algorithm based on e-commerce user behavior sequence,” in 2021 IEEE 2nd International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA) , vol. 2.   IEEE, 2021, pp. 914–918.
  • [13] X. Li and F. Sun, “Sports training recommendation method under the background of data analysis,” in 2021 International Conference on High Performance Big Data and Intelligent Systems (HPBD&IS) .   IEEE, 2021, pp. 12–16.
  • [14] T. Numnonda, “A real-time recommendation engine using lambda architecture,” Artificial Life and Robotics , vol. 23, no. 2, pp. 249–254, 2018.
  • [15] J. Xiao, M. Wang, B. Jiang, and J. Li, “A personalized recommendation system with combinational algorithm for online learning,” Journal of Ambient Intelligence and Humanized Computing , vol. 9, no. 3, pp. 667–677, 2018.
  • [16] Z. Huang, X. Xu, J. Ni, H. Zhu, and C. Wang, “Multimodal representation learning for recommendation in internet of things,” IEEE Internet of Things Journal , vol. 6, no. 6, pp. 10 675–10 685, 2019.
  • [17] H. Zhang, T. Huang, Z. Lv, S. Liu, and Z. Zhou, “Mcrs: A course recommendation system for moocs,” Multimedia Tools and Applications , vol. 77, no. 6, pp. 7051–7069, 2018.
  • [18] I. Benouaret and S. Amer-Yahia, “A comparative evaluation of top-n recommendation algorithms: Case study with total customers,” in 2020 IEEE International Conference on Big Data (Big Data) .   IEEE, 2020, pp. 4499–4508.
  • [19] B. Yi, X. Shen, H. Liu, Z. Zhang, W. Zhang, S. Liu, and N. Xiong, “Deep matrix factorization with implicit feedback embedding for recommendation system,” IEEE Transactions on Industrial Informatics , vol. 15, no. 8, pp. 4591–4601, 2019.
  • [20] P. Lops, M. d. Gemmis, and G. Semeraro, “Content-based recommender systems: State of the art and trends,” Recommender systems handbook , pp. 73–105, 2011.
  • [21] M. Elahi, F. Ricci, and N. Rubens, “A survey of active learning in collaborative filtering recommender systems,” Computer Science Review , vol. 20, pp. 29–50, 2016.
  • [22] B. Alhijawi and Y. Kilani, “Using genetic algorithms for measuring the similarity values between users in collaborative filtering recommender systems,” in 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS) .   IEEE, 2016, pp. 1–6.
  • [23] L. Al Hassanieh, C. Abou Jaoudeh, J. B. Abdo, and J. Demerjian, “Similarity measures for collaborative filtering recommender systems,” in 2018 IEEE Middle East and North Africa Communications Conference (MENACOMM) .   IEEE, 2018, pp. 1–5.
  • [24] F. Rezaimehr and C. Dadkhah, “A survey of attack detection approaches in collaborative filtering recommender systems,” Artificial Intelligence Review , vol. 54, no. 3, pp. 2011–2066, 2021.
  • [25] F. Zhang, T. Gong, V. E. Lee, G. Zhao, C. Rong, and G. Qu, “Fast algorithms to evaluate collaborative filtering recommender systems,” Knowledge-Based Systems , vol. 96, pp. 96–103, 2016.
  • [26] C. Musto, G. Semeraro, M. d. Gemmis, and P. Lops, “Learning word embeddings from wikipedia for content-based recommender systems,” in European conference on information retrieval .   Springer, 2016, pp. 729–734.
  • [27] M. Volkovs, G. W. Yu, and T. Poutanen, “Content-based neighbor models for cold start in recommender systems,” in Proceedings of the Recommender Systems Challenge 2017 , 2017, pp. 1–6.
  • [28] D. Mittal, S. Shandilya, D. Khirwar, and A. Bhise, “Smart billing using content-based recommender systems based on fingerprint,” in ICT Analysis and Applications .   Springer, 2020, pp. 85–93.
  • [29] Y. Pérez-Almaguer, R. Yera, A. A. Alzahrani, and L. Martínez, “Content-based group recommender systems: A general taxonomy and further improvements,” Expert Systems with Applications , vol. 184, p. 115444, 2021.
  • [30] F. Zhang, V. E. Lee, R. Jin, S. Garg, K.-K. R. Choo, M. Maasberg, L. Dong, and C. Cheng, “Privacy-aware smart city: A case study in collaborative filtering recommender systems,” Journal of Parallel and Distributed Computing , vol. 127, pp. 145–159, 2019.
  • [31] J. Bobadilla, S. Alonso, and A. Hernando, “Deep learning architecture for collaborative filtering recommender systems,” Applied Sciences , vol. 10, no. 7, p. 2441, 2020.
  • [32] J. Bobadilla, F. Ortega, A. Gutiérrez, and S. Alonso, “Classification-based deep neural network architecture for collaborative filtering recommender systems.” International Journal of Interactive Multimedia & Artificial Intelligence , vol. 6, no. 1, 2020.
  • [33] M. Dong, X. Zeng, L. Koehl, and J. Zhang, “An interactive knowledge-based recommender system for fashion product design in the big data environment,” Information Sciences , vol. 540, pp. 469–488, 2020.
  • [34] A. Gazdar and L. Hidri, “A new similarity measure for collaborative filtering based recommender systems,” Knowledge-Based Systems , vol. 188, p. 105058, 2020.
  • [35] P. M. Alamdari, N. J. Navimipour, M. Hosseinzadeh, A. A. Safaei, and A. Darwesh, “A systematic study on the recommender systems in the e-commerce,” IEEE Access , vol. 8, pp. 115 694–115 716, 2020.
  • [36] F. Cena, L. Console, and F. Vernero, “Logical foundations of knowledge-based recommender systems: A unifying spectrum of alternatives,” Information Sciences , vol. 546, pp. 60–73, 2021.
  • [37] B. Hrnjica, D. Music, and S. Softic, “Model-based recommender systems,” Trends in Cloud-based IoT , pp. 125–146, 2020.
  • [38] J. Shokeen and C. Rana, “A study on features of social recommender systems,” Artificial Intelligence Review , vol. 53, no. 2, pp. 965–988, 2020.
  • [39] A. Zagranovskaia and D. Mitura, “Designing hybrid recommender systems,” in IV International Scientific and Practical Conference , 2021, pp. 1–5.
  • [40] A. J. Ibrahim, P. Zira, and N. Abdulganiyyi, “Hybrid recommender for research papers and articles,” International Journal of Intelligent Information Systems , vol. 10, no. 2, p. 9, 2021.
  • [41] S. Shishehchi, S. Y. Banihashem, N. A. M. Zin, S. A. M. Noah, and K. Malaysia, “Ontological approach in knowledge based recommender system to develop the quality of e-learning system,” Australian Journal of Basic and Applied Sciences , vol. 6, no. 2, pp. 115–123, 2012.
  • [42] C. C. Aggarwal, “Knowledge-based recommender systems,” in Recommender systems .   Springer, 2016, pp. 167–197.
  • [43] R. Cabezas, J. G. Ruizº, and M. Leyva, “A knowledge-based recommendation framework using svn,” Neutrosophic Sets and Systems , vol. 16, p. 24, 2017.
  • [44] J. K. Tarus, Z. Niu, and G. Mustafa, “Knowledge-based recommendation: a review of ontology-based recommender systems for e-learning,” Artificial intelligence review , vol. 50, no. 1, pp. 21–48, 2018.
  • [45] M. T. Ribeiro, A. Lacerda, A. Veloso, and N. Ziviani, “Pareto-efficient hybridization for multi-objective recommender systems,” in Proceedings of the sixth ACM conference on Recommender systems , 2012, pp. 19–26.
  • [46] M. Hassan and M. Hamada, “Enhancing learning objects recommendation using multi-criteria recommender systems,” in 2016 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE) .   IEEE, 2016, pp. 62–64.
  • [47] Y. Zhang, X. Liu, W. Liu, and C. Zhu, “Hybrid recommender system using semi-supervised clustering based on gaussian mixture model,” in 2016 international conference on cyberworlds (CW) .   IEEE, 2016, pp. 155–158.
  • [48] G. George and A. M. Lal, “Review of ontology-based recommender systems in e-learning,” Computers & Education , vol. 142, p. 103642, 2019.
  • [49] J. D. West, I. Wesley-Smith, and C. T. Bergstrom, “A recommendation system based on hierarchical clustering of an article-level citation network,” IEEE Transactions on Big Data , vol. 2, no. 2, pp. 113–123, 2016.
  • [50] X. He and X. Ke, “Research summary of recommendation system based on knowledge graph,” in The 2021 3rd International Conference on Big Data Engineering , 2021, pp. 104–109.
  • [51] H. Chen, “A dqn-based recommender system for item-list recommendation,” in 2021 IEEE International Conference on Big Data (Big Data) .   IEEE, 2021, pp. 5699–5702.
  • [52] S. D. Kadam, D. Motwani, and S. A. Vaidya, “Big data analytics-recommendation system with hadoop framework,” in 2016 International Conference on Inventive Computation Technologies (ICICT) , vol. 3.   IEEE, 2016, pp. 1–5.
  • [53] D. P. Acharjya and K. Ahmed, “A survey on big data analytics: challenges, open research issues and tools,” International Journal of Advanced Computer Science and Applications , vol. 7, no. 2, pp. 511–518, 2016.
  • [54] X. Zhou, W. Liang, I. Kevin, K. Wang, R. Huang, and Q. Jin, “Academic influence aware and multidimensional network analysis for research collaboration navigation based on scholarly big data,” IEEE Transactions on Emerging Topics in Computing , vol. 9, no. 1, pp. 246–257, 2018.
  • [55] P. Ram Mohan Rao, S. Murali Krishna, and A. Siva Kumar, “Privacy preservation techniques in big data analytics: a survey,” Journal of Big Data , vol. 5, no. 1, pp. 1–12, 2018.
  • [56] X. Wu, X. Zhu, G.-Q. Wu, and W. Ding, “Data mining with big data,” IEEE transactions on knowledge and data engineering , vol. 26, no. 1, pp. 97–107, 2013.
  • [57] C. K. Emani, N. Cullot, and C. Nicolle, “Understandable big data: a survey,” Computer science review , vol. 17, pp. 70–81, 2015.
  • [58] H.-Y. Lin and S.-Y. Yang, “A cloud-based energy data mining information agent system based on big data analysis technology,” Microelectronics Reliability , vol. 97, pp. 66–78, 2019.
  • [59] Y. Cheng and X. Bu, “Research on key technologies of personalized education resource recommendation system based on big data environment,” in Journal of Physics: Conference Series , vol. 1437, no. 1.   IOP Publishing, 2020, p. 012024.
  • [60] K. Al Fararni, F. Nafis, B. Aghoutane, A. Yahyaouy, J. Riffi, and A. Sabri, “Hybrid recommender system for tourism based on big data and ai: A conceptual framework,” Big Data Mining and Analytics , vol. 4, no. 1, pp. 47–55, 2021.
  • [61] A. V. Dev and A. Mohan, “Recommendation system for big data applications based on set similarity of user preferences,” in 2016 International Conference on Next Generation Intelligent Systems (ICNGIS) .   IEEE, 2016, pp. 1–6.
  • [62] J. Chen, K. Li, H. Rong, K. Bilal, N. Yang, and K. Li, “A disease diagnosis and treatment recommendation system based on big data mining and cloud computing,” Information Sciences , vol. 435, pp. 124–149, 2018.
  • [63] Z. Wan, “Research on e-commerce recommendation system based on big data technology,” in Journal of Physics: Conference Series , vol. 1883, no. 1.   IOP Publishing, 2021, p. 012159.
  • [64] B. Asiya Banu and S. Banu, “Keyword based movie recommendation service using mapreduce.”
  • [65] J. P. Verma, B. Patel, and A. Patel, “Big data analysis: recommendation system with hadoop framework,” in 2015 IEEE International Conference on Computational Intelligence & Communication Technology .   IEEE, 2015, pp. 92–97.
  • [66] M. Uzun-Per, A. B. Can, A. V. Gürel, and M. S. Aktaş, “Big data testing framework for recommendation systems in e-science and e-commerce domains,” in 2021 IEEE International Conference on Big Data (Big Data) .   IEEE, 2021, pp. 2353–2361.
  • [67] Y.-w. Zhang, Y.-y. Zhou, F.-t. Wang, Z. Sun, and Q. He, “Service recommendation based on quotient space granularity analysis and covering algorithm on spark,” Knowledge-Based Systems , vol. 147, pp. 25–35, 2018.
  • [68] G. Chaithra et al. , “User preferences based recommendation system for services using mapreduce approach,” 2015.
  • [69] B. Ait Hammou, A. Ait Lahcen, and S. Mouline, “A distributed group recommendation system based on extreme gradient boosting and big data technologies,” Applied Intelligence , vol. 49, no. 12, pp. 4128–4149, 2019.

Deep Learning for Recommender Systems: A Netflix Case Study

  • Harald Steck Netflix
  • Linas Baltrunas Netflix
  • Ehtsham Elahi Netflix
  • Dawen Liang Netflix
  • Yves Raimond Netflix
  • Justin Basilico Netflix

Deep learning has profoundly impacted many areas of machine learning. However, it took a while for its impact to be felt in the field of recommender systems. In this article, we outline some of the challenges encountered and lessons learned in using deep learning for recommender systems at Netflix. We first provide an overview of the various recommendation tasks on the Netflix service. We found that different model architectures excel at different tasks. Even though many deep-learning models can be understood as extensions of existing (simple) recommendation algorithms, we initially did not observe significant improvements in performance over well-tuned non-deep-learning approaches. Only when we added numerous features of heterogeneous types to the input data did deep-learning models start to shine in our setting. We also observed that deep-learning methods can exacerbate the problem of offline–online metric (mis-)alignment. After addressing these challenges, deep learning has ultimately resulted in large improvements to our recommendations as measured by both offline and online metrics. On the practical side, integrating deep-learning toolboxes in our system has made it faster and easier to implement and experiment with both deep-learning and non-deep-learning approaches for various recommendation tasks. We conclude this article by summarizing our takeaways that may generalize to other applications beyond Netflix.



Systematic Review of Recommendation Systems for Course Selection


1. Introduction

2. Motivation and Rationale of the Research

3. Research Questions

3.1. Questions about the Used Algorithms

  • What preprocessing methods were applied?
  • What recommendation system algorithms were used in the paper?
  • What are the applied evaluation metrics?
  • What are the performance results of applied evaluation metrics?

3.2. Questions about the Used Dataset

  • Is the dataset published or accessible?
  • How many records are there in the dataset?
  • How many unique student records are there in the dataset?
  • How many unique course records are there in the dataset?
  • How many features are there in the dataset?
  • How many features are used from the existing features?
  • How many unique majors are there in the dataset?
  • How did the authors split the training and testing set?

3.3. Questions about the Research

  • What type of comparison is produced in the study (algorithm level, preprocessing level, or data level)?
  • What is the main aim of the study?
  • What are the strong points of the research?
  • What are the weak points of the research?

4. Research Methodology

4.1. Title-Level Screening Stage

  • The study addresses recommendation systems in the Education sector.
  • The study must be primary.

4.2. Abstract-Level Screening Stage

4.3. Full-Text Article Scanning Stage

  • The study was written in the English language.
  • The study implies empirical experiments and provides the experiment’s results.

4.4. Full-Text Article Screening Stage

  • Q1: Did the study conduct experiments on course selection and course recommendation systems?
  • Q2: Is there a comparison with other approaches in the conducted study?
  • Q3: Were the performance measures fully defined?
  • Q4: Was the method used in the study clearly described?
  • Q5: Were the dataset and the number of training and testing samples identified?

4.5. Data Extraction Stage

5. Research Results

5.1. The Studies Included in the SLR

5.1.1. Collaborative Filtering Studies

5.1.2. Content-Based Filtering Studies

5.1.3. Hybrid Recommender System Studies

5.1.4. Studies Based on Machine Learning

5.1.5. Similarity-Based Study

6. Key Studies Analysis

6.1. Discussion of Aims and Contributions of the Existing Research Works

6.1.1. Aim of Studies That Used Collaborative Filtering

6.1.2. Aim of Studies That Used Content-Based Filtering

6.1.3. Aim of Studies That Used Hybrid Recommender Systems

6.1.4. Aim of Studies That Used Novel Approaches

6.1.5. Aim of Studies That Used Similarity-Based Filtering

6.2. Description of Datasets Used in the Studies

6.2.1. Dataset Description of Studies That Used Collaborative Filtering

6.2.2. Dataset Description of Studies That Used Content-Based Filtering

6.2.3. Dataset Description of Studies That Used Hybrid Recommender Systems

6.2.4. Dataset Description of Studies That Used Novel Approaches

  • Train-test split.
  • K-fold cross-validation.
  • Nested time series splits.

6.2.5. Dataset Description of the Study That Used Similarity-Based Filtering

6.3. Research Evaluation

6.3.1. Research Evaluation for Studies That Used Collaborative Filtering

6.3.2. Research Evaluation for Studies That Used Content-Based Filtering

6.3.3. Research Evaluation for Studies That Used Hybrid Recommender Systems

6.3.4. Research Evaluation for Studies That Used Novel Approaches

6.3.5. Research Evaluation for the Study That Used Similarity-Based Filtering

7. Discussion of Findings

8. Gaps, Challenges, Future Directions and Conclusions for Course Recommendation Systems (CRS) Selection

8.1. Gaps

8.2. Challenges

8.3. Future Directions

9. Conclusions

  • Making precise course recommendations that are tailored to each student’s interests, abilities, and long-term professional goals.
  • Addressing the issue of “cold starts,” wherein brand-new students without prior course experience might not obtain useful, reliable, and precise advice.
  • Ensuring that the system is flexible enough to accommodate various educational contexts, data accessibility, and the unique objectives of the advising system.
  • Increasing suggestion recall and precision rates.
  • Using preprocessing and data-splitting methods to improve both the overall performance of the CRS and the measured quality of its recommendations.

Author Contributions

Data Availability Statement

Conflicts of Interest

  • Iatrellis, O.; Kameas, A.; Fitsilis, P. Academic advising systems: A systematic literature review of empirical evidence. Educ. Sci. 2017, 7, 90.
  • Chang, P.C.; Lin, C.H.; Chen, M.H. A hybrid course recommendation system by integrating collaborative filtering and artificial immune systems. Algorithms 2016, 9, 47.
  • Xu, J.; Xing, T.; Van Der Schaar, M. Personalized course sequence recommendations. IEEE Trans. Signal Process. 2016, 64, 5340–5352.
  • Noaman, A.Y.; Ahmed, F.F. A new framework for e-academic advising. Procedia Comput. Sci. 2015, 65, 358–367.
  • Pizzolato, J.E. Complex partnerships: Self-authorship and provocative academic-advising practices. NACADA J. 2006, 26, 32–45.
  • Unelsrød, H.F. Design and Evaluation of a Recommender System for Course Selection. Master’s Thesis, Institutt for Datateknikk og Informasjonsvitenskap, Trondheim, Norway, 2011.
  • Kuh, G.D.; Kinzie, J.; Schuh, J.H.; Whitt, E.J. Student Success in College: Creating Conditions That Matter; John Wiley & Sons: New York, NY, USA, 2011.
  • Mostafa, L.; Oately, G.; Khalifa, N.; Rabie, W. A case based reasoning system for academic advising in Egyptian educational institutions. In Proceedings of the 2nd International Conference on Research in Science, Engineering and Technology (ICRSET’2014), Dubai, United Arab Emirates, 21–22 March 2014; pp. 21–22.
  • Obeidat, R.; Duwairi, R.; Al-Aiad, A. A collaborative recommendation system for online courses recommendations. In Proceedings of the 2019 International Conference on Deep Learning and Machine Learning in Emerging Applications (Deep-ML), Istanbul, Turkey, 26–28 August 2019; pp. 49–54.
  • Feng, J.; Xia, Z.; Feng, X.; Peng, J. RBPR: A hybrid model for the new user cold start problem in recommender systems. Knowl.-Based Syst. 2021, 214, 106732.
  • Kohl, C.; McIntosh, E.J.; Unger, S.; Haddaway, N.R.; Kecke, S.; Schiemann, J.; Wilhelm, R. Online tools supporting the conduct and reporting of systematic reviews and systematic maps: A case study on CADIMA and review of existing tools. Environ. Evid. 2018, 7, 8.
  • Shminan, A.S.; Choi, L.J.; Barawi, M.H.; Hashim, W.N.W.; Andy, H. InVesa 1.0: The Conceptual Framework of Interactive Virtual Academic Advisor System based on Psychological Profiles. In Proceedings of the 2021 13th International Conference on Information & Communication Technology and System (ICTS), Surabaya, Indonesia, 20–21 October 2021; pp. 112–117.
  • Wang, H.; Wei, Z. Research on Personalized Learning Route Model Based on Improved Collaborative Filtering Algorithm. In Proceedings of the 2021 2nd International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE), Zhuhai, China, 24–26 September 2021; pp. 120–123.
  • Shaptala, R.; Kyselova, A.; Kyselov, G. Exploring the vector space model for online courses. In Proceedings of the 2017 IEEE First Ukraine Conference on Electrical and Computer Engineering (UKRCON), Kyiv, Ukraine, 29 May–2 June 2017; pp. 861–864.
  • Zhao, X.; Liu, B. Application of personalized recommendation technology in MOOC system. In Proceedings of the 2020 International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS), Vientiane, Laos, 11–12 January 2020; pp. 720–723.
  • Wahyono, I.D.; Asfani, K.; Mohamad, M.M.; Saryono, D.; Putranto, H.; Haruzuan, M.N. Matching User in Online Learning using Artificial Intelligence for Recommendation of Competition. In Proceedings of the 2021 Fourth International Conference on Vocational Education and Electrical Engineering (ICVEE), Surabaya, Indonesia, 2–3 October 2021; pp. 1–4.
  • Elghomary, K.; Bouzidi, D. Dynamic peer recommendation system based on trust model for sustainable social tutoring in MOOCs. In Proceedings of the 2019 1st International Conference on Smart Systems and Data Science (ICSSD), Rabat, Morocco, 3–4 October 2019; pp. 1–9.
  • Mufizar, T.; Mulyani, E.D.S.; Wiyono, R.A.; Arifiana, W. A combination of Multi Factor Evaluation Process (MFEP) and the Distance to the Ideal Alternative (DIA) methods for majors selection and scholarship recipients in SMAN 2 Tasikmalaya. In Proceedings of the 2018 6th International Conference on Cyber and IT Service Management (CITSM), Parapat, Indonesia, 7–9 August 2018; pp. 1–7.
  • Sutrisno, M.; Budiyanto, U. Intelligent System for Recommending Study Level in English Language Course Using CBR Method. In Proceedings of the 2019 6th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), Bandung, Indonesia, 18–20 September 2019; pp. 153–158.
  • Gan, B.; Zhang, C. Research on the Application of Curriculum Knowledge Point Recommendation Algorithm Based on Learning Diagnosis Model. In Proceedings of the 2020 5th International Conference on Electromechanical Control Technology and Transportation (ICECTT), Nanchang, China, 5–17 May 2020; pp. 188–192.
  • Ivanov, D.A.; Ivanova, I.V. Computer Self-Testing of Students as an Element of Distance Learning Technologies that Increase Interest in the Study of General Physics Course. In Proceedings of the 2018 IV International Conference on Information Technologies in Engineering Education (Inforino), Moscow, Russia, 22–26 October 2018; pp. 1–4.
  • Anupama, V.; Elayidom, M.S. Course Recommendation System: Collaborative Filtering, Machine Learning and Topic Modelling. In Proceedings of the 2022 8th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 25–26 March 2022; Volume 1, pp. 1459–1462.
  • Sabnis, V.; Tejaswini, P.D.; Sharvani, G.S. Course recommendations in MOOCs: Techniques and evaluation. In Proceedings of the 2018 3rd International Conference on Computational Systems and Information Technology for Sustainable Solutions (CSITSS), Bengaluru, India, 20–22 December 2018; pp. 59–66.
  • Britto, J.; Prabhu, S.; Gawali, A.; Jadhav, Y. A Machine Learning Based Approach for Recommending Courses at Graduate Level. In Proceedings of the 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India, 27–29 November 2019; pp. 117–121.
  • Peng, Y. A Survey on Modern Recommendation System based on Big Data. arXiv 2022, arXiv:2206.02631.
  • Bozyiğit, A.; Bozyiğit, F.; Kilinç, D.; Nasiboğlu, E. Collaborative filtering based course recommender using OWA operators. In Proceedings of the 2018 International Symposium on Computers in Education (SIIE), Jerez, Spain, 19–21 September 2018; pp. 1–5.
  • Mondal, B.; Patra, O.; Mishra, S.; Patra, P. A course recommendation system based on grades. In Proceedings of the 2020 International Conference on Computer Science, Engineering and Applications (ICCSEA), Gunupur, India, 13–14 March 2020; pp. 1–5.
  • Lee, E.L.; Kuo, T.T.; Lin, S.D. A collaborative filtering-based two stage model with item dependency for course recommendation. In Proceedings of the 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Tokyo, Japan, 19–21 October 2017; pp. 496–503.
  • Malhotra, I.; Chandra, P.; Lavanya, R. Course Recommendation using Domain-based Cluster Knowledge and Matrix Factorization. In Proceedings of the 2022 9th International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 23–25 March 2022; pp. 12–18.
  • Huang, L.; Wang, C.D.; Chao, H.Y.; Lai, J.H.; Philip, S.Y. A score prediction approach for optional course recommendation via cross-user-domain collaborative filtering. IEEE Access 2019, 7, 19550–19563.
  • Zhao, L.; Pan, Z. Research on online course recommendation model based on improved collaborative filtering algorithm. In Proceedings of the 2021 IEEE 6th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA), Chengdu, China, 24–26 April 2021; pp. 437–440.
  • Ceyhan, M.; Okyay, S.; Kartal, Y.; Adar, N. The Prediction of Student Grades Using Collaborative Filtering in a Course Recommender System. In Proceedings of the 2021 5th International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Ankara, Turkey, 21–23 October 2021; pp. 177–181.
  • Dwivedi, S.; Roshni, V.K. Recommender system for big data in education. In Proceedings of the 2017 5th National Conference on E-Learning & E-Learning Technologies (ELELTECH), Hyderabad, India, 3–4 August 2017; pp. 1–4.
  • Zhong, S.T.; Huang, L.; Wang, C.D.; Lai, J.H. Constrained matrix factorization for course score prediction. In Proceedings of the 2019 IEEE International Conference on Data Mining (ICDM), Beijing, China, 8–11 November 2019; pp. 1510–1515.
  • Chen, Z.; Song, W.; Liu, L. The application of association rules and interestingness in course selection system. In Proceedings of the 2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA), Beijing, China, 10–12 March 2017; pp. 612–616.
  • Chen, Z.; Liu, X.; Shang, L. Improved course recommendation algorithm based on collaborative filtering. In Proceedings of the 2020 International Conference on Big Data and Informatization Education (ICBDIE), Zhangjiajie, China, 23–25 April 2020; pp. 466–469.
  • Ren, Z.; Ning, X.; Lan, A.S.; Rangwala, H. Grade prediction with neural collaborative filtering. In Proceedings of the 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Washington, DC, USA, 5–8 October 2019; pp. 1–10.
  • Fernández-García, A.J.; Rodríguez-Echeverría, R.; Preciado, J.C.; Manzano, J.M.C.; Sánchez-Figueroa, F. Creating a recommender system to support higher education students in the subject enrollment decision. IEEE Access 2020, 8, 189069–189088.
  • Adilaksa, Y.; Musdholifah, A. Recommendation System for Elective Courses using Content-based Filtering and Weighted Cosine Similarity. In Proceedings of the 2021 4th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), Yogyakarta, Indonesia, 16–17 December 2021; pp. 51–55.
  • Esteban, A.; Zafra, A.; Romero, C. Helping university students to choose elective courses by using a hybrid multi-criteria recommendation system with genetic optimization. Knowl.-Based Syst. 2020, 194, 105385.
  • Emon, M.I.; Shahiduzzaman, M.; Rakib, M.R.H.; Shathee, M.S.A.; Saha, S.; Kamran, M.N.; Fahim, J.H. Profile Based Course Recommendation System Using Association Rule Mining and Collaborative Filtering. In Proceedings of the 2021 International Conference on Science & Contemporary Technologies (ICSCT), Dhaka, Bangladesh, 5–7 August 2021; pp. 1–5.
  • Alghamdi, S.; Sheta, O.; Adrees, M. A Framework of Prompting Intelligent System for Academic Advising Using Recommendation System Based on Association Rules. In Proceedings of the 2022 9th International Conference on Electrical and Electronics Engineering (ICEEE), Alanya, Turkey, 29–31 March 2022; pp. 392–398.
  • Bharath, G.M.; Indumathy, M. Course Recommendation System in Social Learning Network (SLN) Using Hybrid Filtering. In Proceedings of the 2021 5th International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 2–4 December 2021; pp. 1078–1083.
  • Nafea, S.M.; Siewe, F.; He, Y. On recommendation of learning objects using Felder-Silverman learning style model. IEEE Access 2019, 7, 163034–163048.
  • Huang, X.; Tang, Y.; Qu, R.; Li, C.; Yuan, C.; Sun, S.; Xu, B. Course recommendation model in academic social networks based on association rules and multi-similarity. In Proceedings of the 2018 IEEE 22nd International Conference on Computer Supported Cooperative Work in Design (CSCWD), Nanjing, China, 9–11 May 2018; pp. 277–282.
  • Baskota, A.; Ng, Y.K. A graduate school recommendation system using the multi-class support vector machine and KNN approaches. In Proceedings of the 2018 IEEE International Conference on Information Reuse and Integration (IRI), Salt Lake City, UT, USA, 6–9 July 2018; pp. 277–284.
  • Jiang, W.; Pardos, Z.A.; Wei, Q. Goal-based course recommendation. In Proceedings of the 9th International Conference on Learning Analytics & Knowledge, Tempe, AZ, USA, 4–8 March 2019; pp. 36–45.
  • Liang, Y.; Duan, X.; Ding, Y.; Kou, X.; Huang, J. Data Mining of Students’ Course Selection Based on Currency Rules and Decision Tree. In Proceedings of the 2019 4th International Conference on Big Data and Computing, Guangzhou, China, 10–12 May 2019; pp. 247–252.
  • Isma’il, M.; Haruna, U.; Aliyu, G.; Abdulmumin, I.; Adamu, S. An autonomous courses recommender system for undergraduate using machine learning techniques. In Proceedings of the 2020 International Conference in Mathematics, Computer Engineering and Computer Science (ICMCECS), Ayobo, Nigeria, 18–21 March 2020; pp. 1–6.
  • Revathy, M.; Kamalakkannan, S.; Kavitha, P. Machine Learning based Prediction of Dropout Students from the Education University using SMOTE. In Proceedings of the 2022 4th International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India, 20–22 January 2022; pp. 1750–1758.
  • Oreshin, S.; Filchenkov, A.; Petrusha, P.; Krasheninnikov, E.; Panfilov, A.; Glukhov, I.; Kaliberda, Y.; Masalskiy, D.; Serdyukov, A.; Kazakovtsev, V.; et al. Implementing a Machine Learning Approach to Predicting Students’ Academic Outcomes. In Proceedings of the 2020 International Conference on Control, Robotics and Intelligent System, Xiamen, China, 27–29 October 2020; pp. 78–83.
  • Verma, R. Applying Predictive Analytics in Elective Course Recommender System while preserving Student Course Preferences. In Proceedings of the 2018 IEEE 6th International Conference on MOOCs, Innovation and Technology in Education (MITE), Hyderabad, India, 29–30 November 2018; pp. 52–59.
  • Bujang, S.D.A.; Selamat, A.; Ibrahim, R.; Krejcar, O.; Herrera-Viedma, E.; Fujita, H.; Ghani, N.A.M. Multiclass prediction model for student grade prediction using machine learning. IEEE Access 2021, 9, 95608–95621.
  • Srivastava, S.; Karigar, S.; Khanna, R.; Agarwal, R. Educational data mining: Classifier comparison for the course selection process. In Proceedings of the 2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE), Shah Alam, Malaysia, 11–12 July 2018; pp. 1–5.
  • Abed, T.; Ajoodha, R.; Jadhav, A. A prediction model to improve student placement at a South African higher education institution. In Proceedings of the 2020 International SAUPEC/RobMech/PRASA Conference, Cape Town, South Africa, 29–31 January 2020; pp. 1–6.
  • Uskov, V.L.; Bakken, J.P.; Byerly, A.; Shah, A. Machine learning-based predictive analytics of student academic performance in STEM education. In Proceedings of the 2019 IEEE Global Engineering Education Conference (EDUCON), Dubai, United Arab Emirates, 8–11 April 2019; pp. 1370–1376.
  • Sankhe, V.; Shah, J.; Paranjape, T.; Shankarmani, R. Skill Based Course Recommendation System. In Proceedings of the 2020 IEEE International Conference on Computing, Power and Communication Technologies (GUCON), Greater Noida, India, 2–4 October 2020; pp. 573–576.
  • Kamila, V.Z.; Subastian, E. KNN and Naive Bayes for Optional Advanced Courses Recommendation. In Proceedings of the 2019 International Conference on Electrical, Electronics and Information Engineering (ICEEIE), Denpasar, Indonesia, 3–4 October 2019; Volume 6, pp. 306–309.
  • Shah, D.; Shah, P.; Banerjee, A. Similarity based regularization for online matrix-factorization problem: An application to course recommender systems. In Proceedings of the TENCON 2017—2017 IEEE Region 10 Conference, Penang, Malaysia, 5–8 November 2017; pp. 1874–1879.


Author | The Study Addresses Recommendation Systems in the Education Sector | Primary Study
Shminan et al. [ ] | No | Yes
Wang et al. [ ] | No | Yes
Shaptala et al. [ ] | No | Yes
Zhao et al. [ ] | No | Yes
I. D. Wahyono et al. [ ] | No | Yes
Elghomary et al. [ ] | No | Yes
Mufizar et al. [ ] | No | Yes
Sutrisno et al. [ ] | No | Yes
Gan et al. [ ] | No | Yes
Ivanov et al. [ ] | No | Yes
Author | Reason for Exclusion
Anupama et al. [ ] | Did not involve empirical experiments and did not provide experimental results
Sabnis et al. [ ] | The full text is not accessible
Author | Q1 | Q2 | Q3 | Q4 | Q5 | Total Score | Included
Britto et al. [ ] | 1 | 0 | 0.5 | 0.5 | 0.5 | 2.5 | No
Obeidat et al. [ ] | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 3 | Yes
Authors and Year | Algorithms Used | Comparative Type
A. Bozyiğit et al., 2018 [ ] | Collaborative filtering; OWA (ordered weighted average) | Algorithm level
B. Mondal et al., 2020 [ ] | Collaborative filtering | Algorithm level
E. L. Lee et al., 2017 [ ] | Two-stage collaborative filtering; Bayesian Personalized Ranking Matrix Factorization (BPR-MF); course dependency regularization; personalized PageRank; linear RankSVM | Algorithm level
I. Malhotra et al., 2022 [ ] | Collaborative filtering; domain-based cluster knowledge; cosine pairwise similarity evaluation; singular value decomposition (SVD++); matrix factorization | Algorithm level
L. Huang et al., 2019 [ ] | Cross-user-domain collaborative filtering | Algorithm level
L. Zhao et al., 2021 [ ] | Improved collaborative filtering; historical preference fusion similarity | Algorithm level
M. Ceyhan et al., 2021 [ ] | Collaborative filtering; correlation-based similarities (Pearson correlation coefficient, median-based robust correlation coefficient); distance-based similarities (Manhattan and Euclidean distances) | Algorithm level
R. Obeidat et al., 2019 [ ] | Collaborative filtering; K-means clustering; association rules (Apriori; sequential pattern discovery using equivalence classes, SPADE) | Algorithm level
S. Dwivedi et al., 2017 [ ] | Collaborative filtering; log-likelihood similarity | Algorithm level
S.-T. Zhong et al., 2019 [ ] | Collaborative filtering; constrained matrix factorization | Algorithm level
Z. Chen et al., 2017 [ ] | Collaborative filtering; association rules (Apriori) | Algorithm level
Z. Chen et al., 2020 [ ] | Collaborative filtering; improved cosine similarity; TF-IDF (term frequency-inverse document frequency) | Algorithm level
Z. Ren et al., 2019 [ ] | Neural collaborative filtering (NCF) | Algorithm level
Authors and Year | Algorithms Used | Comparative Type
A. J. Fernández-García et al., 2020 [ ] | Content-based filtering | Preprocessing level
Y. Adilaksa et al., 2021 [ ] | Content-based filtering; weighted cosine similarity; TF-IDF | Algorithm level
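
For the content-based route in this table, a typical pipeline vectorizes course descriptions with TF-IDF and ranks courses by cosine similarity to a profile built from the student's past courses. A minimal sketch follows; the course texts are invented, and the exact weighting scheme of Adilaksa et al. is not reproduced here.

```python
# Content-based filtering sketch: TF-IDF course vectors, cosine ranking.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented course descriptions standing in for real syllabus text.
courses = {
    "Machine Learning": "supervised learning regression classification neural networks",
    "Databases": "relational model sql transactions indexing storage",
    "Data Mining": "clustering association rules classification pattern discovery",
}

vectorizer = TfidfVectorizer()
course_vecs = vectorizer.fit_transform(courses.values())

# Student profile built from the courses already taken or liked.
profile = vectorizer.transform(["classification clustering neural networks"])

scores = cosine_similarity(profile, course_vecs).ravel()
for name, score in sorted(zip(courses, scores), key=lambda x: -x[1]):
    print(f"{name}: {score:.3f}")
```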
Authors and Year | Algorithms Used | Comparative Type
Esteban, A. et al., 2020 [ ] | Hybrid recommender system; collaborative filtering; content-based filtering; genetic algorithm | Algorithm level
M. I. Emon et al., 2021 [ ] | Hybrid recommender system; collaborative filtering; association rules (Apriori algorithm) | Algorithm level
S. Alghamdi et al., 2022 [ ] | Hybrid recommender system; content-based filtering; association rules (Apriori algorithm); Jaccard coefficient | Algorithm level
S. G. G et al., 2021 [ ] | Hybrid recommender system; collaborative filtering; content-based filtering; lasso; KNN; weighted average | Algorithm level
S. M. Nafea et al., 2019 [ ] | Hybrid recommender system; Felder-Silverman learning styles model; K-means clustering | Algorithm level
X. Huang et al., 2018 [ ] | Hybrid recommender system; association rules; improved multi-similarity | Algorithm level
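
Several of these hybrids pair a filtering component with association rules mined from past enrollments ("students who took X also took Y"). A small sketch of that rule-mining step, assuming the mlxtend library and an invented enrollment log:

```python
# Association-rule mining sketch with Apriori; enrollment data is made up.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

enrollments = [
    ["Calculus", "Linear Algebra", "Intro Programming"],
    ["Calculus", "Linear Algebra", "Statistics"],
    ["Intro Programming", "Data Structures"],
    ["Calculus", "Statistics", "Data Structures"],
]

# One-hot encode the transaction log into a boolean course matrix.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(enrollments).transform(enrollments), columns=te.columns_)

# Frequent itemsets, then rules filtered by confidence.
frequent = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```

In a hybrid system, the resulting rules would typically re-rank or filter the candidate list produced by the collaborative or content-based component.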
Authors and Year | Algorithms Used | Comparative Type
A. Baskota et al., 2018 [ ] | Forward feature selection; K-nearest neighbors (KNN); multi-class support vector machines (MC-SVM) | Algorithm level
Jiang, Weijie et al., 2019 [ ] | Goal-based filtering; LSTM recurrent neural network | Algorithm level
Liang, Yu et al., 2019 [ ] | Currency rules; C4.5 decision tree | Preprocessing level
M. Isma’il et al., 2020 [ ] | Support vector machine (SVM) | Algorithm level
M. Revathy et al., 2022 [ ] | KNN-SMOTE | Algorithm level
Oreshin et al., 2020 [ ] | Latent Dirichlet allocation; FastTextSocialNetworkModel; CatBoost | Algorithm level
R. Verma et al., 2018 [ ] | Support vector machines; artificial neural networks (ANN) | Algorithm level
S. D. A. Bujang et al., 2021 [ ] | Random forests | Algorithm level; preprocessing level
S. Srivastava et al., 2018 [ ] | Support vector machines with radial basis kernel; KNN | Algorithm level
T. Abed et al., 2020 [ ] | Naive Bayes | Algorithm level
V. L. Uskov et al., 2019 [ ] | Linear regression | Algorithm level
V. Sankhe et al., 2020 [ ] | Skill-based filtering; C-means fuzzy clustering; weighted mode | Algorithm level
V. Z. Kamila et al., 2019 [ ] | KNN; naive Bayes | Algorithm level
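
The supervised studies in this table typically frame recommendation as grade or outcome prediction: train a classifier on student features, then recommend the courses with the highest predicted grades. A compact sketch with scikit-learn's random forest and the ten-fold cross-validation most rows report; the features and labels below are synthetic placeholders, not any study's actual data.

```python
# Grade-prediction sketch: multiclass classifier + 10-fold cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))      # stand-ins for prior GPA, attendance, quiz scores...
y = rng.integers(0, 5, size=200)   # five grade classes, A through F

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.3f}")
```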
Authors and Year | Algorithms Used | Comparative Type
D. Shah et al., 2017 [ ] | Similarity-based regularization; matrix factorization | Algorithm level
Authors and Year | Public | Records | Students | Courses | Majors | Features | Used Features | Preprocessing Steps | Data-Splitting Method
A. Bozyiğit et al., 2018 [ ] | No | N/A | 2217 | 6 | N/A | N/A | N/A | N/A | Ten-fold cross-validation
B. Mondal et al., 2020 [ ] | No | 300 | 300 | N/A | N/A | 48 | 12 | Data cleaning: lowercase conversion, punctuation removal, white-space stripping | N/A
E. L. Lee et al., 2017 [ ] | No | 896,616 | 13,977 | N/A | N/A | N/A | N/A | Ignored students whose 4-year registration records are incomplete | Nested time-series split cross-validation (classes of 2008 and 2009 as the training set, class of 2010 as the testing set)
I. Malhotra et al., 2022 [ ] | No | N/A | 1780 | N/A | 9 | N/A | N/A | N/A | N/A
L. Huang et al., 2019 [ ] | No | 52,311 | 1166 | N/A | 8 | N/A | N/A | N/A | N/A
L. Zhao et al., 2021 [ ] | No | N/A | 43,916 | 240 | N/A | N/A | N/A | Grouped data by interest data points; filtered out data noise constrained in [0, 1]; normalized all numerical features | Five-fold cross-validation
M. Ceyhan et al., 2021 [ ] | No | N/A | 1506 | 1460 | N/A | N/A | N/A | The updated grade is used if a student retakes a course | Nested time-series split cross-validation (train = 91.7%, 2010/11-F to 2019/20-S; test = 8.3%, the whole 2020/21-F)
R. Obeidat et al., 2019 [ ] | Yes | 22,144 | 10,000 | 16 | N/A | N/A | N/A | Removed incomplete records; calculated the order of course-sequence events per student; converted grades to a new grade scale; clustered students | N/A
S. Dwivedi et al., 2017 [ ] | No | N/A | N/A | N/A | N/A | N/A | N/A | Data cleaning; data discretization (converting low-level concepts to high-level concepts) | N/A
S.-T. Zhong et al., 2019 [ ] | No | N/A | N/A | N/A | 8 | N/A | N/A | N/A | N/A
Z. Chen et al., 2017 [ ] | No | N/A | N/A | N/A | N/A | N/A | N/A | Students’ score categorization (A, B, C) | N/A
Z. Chen et al., 2020 [ ] | No | 18,457 | 2022 | 309 | N/A | N/A | N/A | N/A | K-fold cross-validation
Z. Ren et al., 2019 [ ] | No | N/A | 43,099 | N/A | 151 | N/A | N/A | Different embedding dimensions for students, courses, and course instructors across majors | Nested time-series split cross-validation (Fall 2009 to Fall 2015 as the training set, Spring 2016 as the testing set)
Authors and Year | Public | Records | Students | Courses | Majors | Features | Used Features | Preprocessing Steps | Data-Splitting Method
A. J. Fernández-García et al., 2020 [ ] | No | 6948 | 323 | N/A | N/A | 10 | 10 | Feature deletion; class reduction; one-hot encoding; creation of new features; data scaling (MinMax, Standard, Robust, and Normalizer scalers); data resampling (upsampling, downsampling, SMOTE) | Train size = 80%, test size = 20%
Y. Adilaksa et al., 2021 [ ] | No | N/A | N/A | N/A | N/A | N/A | N/A | Case folding; word tokenization; punctuation removal; stop-word removal | N/A
Authors and Year | Public | Records | Students | Courses | Majors | Features | Used Features | Preprocessing Steps | Data-Splitting Method
Esteban, A. et al., 2020 [ ] | No | 2500 | 95 | 63 | N/A | N/A | N/A | N/A | Five-fold cross-validation
M. I. Emon et al., 2021 [ ] | No | N/A | 250+ | 250+ | 20+ | N/A | N/A | Feature extraction | N/A
S. Alghamdi et al., 2022 [ ] | No | 1820 | 384 | 8 | N/A | N/A | 7 | Cluster sets for academic-transcript datasets | Five-fold cross-validation
S. G. G et al., 2021 [ ] | No | N/A | ~6000 | ~4000 | 18 | N/A | N/A | N/A | N/A
S. M. Nafea et al., 2019 [ ] | No | N/A | 80 | N/A | N/A | N/A | N/A | N/A | Student dataset split into cold-start students, cold-start learning objects, and all students
X. Huang et al., 2018 [ ] | Yes | N/A | 56,600 | 860 | N/A | N/A | N/A | N/A | Train size = 80%, test size = 20%
Authors and Year | Public | Records | Students | Courses | Majors | Features | Used Features | Preprocessing Steps | Data-Splitting Method
A. Baskota et al., 2018 [ ] | No | 16,000 | N/A | N/A | N/A | N/A | N/A | Data cleaning; data scaling | Train size = 14,000, test size = 2000
Jiang, Weijie et al., 2019 [ ] | No | 4,800,000 | 164,196 | 10,430 | 17 | N/A | N/A | N/A | Nested time-series split cross-validation (F’08 to F’15 as the training set, Sp’16 as the validation set, Sp’17 as the test set)
Liang, Yu et al., 2019 [ ] | No | 35,000 | N/A | N/A | N/A | N/A | N/A | Data cleaning | N/A
M. Isma’il et al., 2020 [ ] | No | 8700 | N/A | 9 | N/A | N/A | 4 | Data cleaning; data encoding | N/A
M. Revathy et al., 2022 [ ] | No | N/A | 1243 | N/A | N/A | N/A | 33 | One-hot encoding for categorical features; principal component analysis (PCA) | Train size = 804, test size = 359
Oreshin et al., 2020 [ ] | No | N/A | >20,000 | N/A | N/A | N/A | 112 | One-hot encoding; removal of samples with unknown values | Nested time-series split cross-validation
R. Verma et al., 2018 [ ] | No | 658 | 658 | N/A | N/A | 13 | 11 | Data categorization | Ten-fold cross-validation
S. D. A. Bujang et al., 2021 [ ] | No | 1282 | 641 | 2 | N/A | 13 | N/A | Ranked and grouped students into five grade categories; applied SMOTE (Synthetic Minority Over-sampling Technique) oversampling; applied two feature-selection methods (wrapper- and filter-based) | Ten-fold cross-validation
S. Srivastava et al., 2018 [ ] | No | 1988 | 2890 | N/A | N/A | N/A | 14 | Registration-number transformation | Train = 1312, test = 676
T. Abed et al., 2020 [ ] | No | N/A | N/A | N/A | N/A | N/A | 18 | Balanced the dataset using undersampling | Ten-fold cross-validation
V. L. Uskov et al., 2019 [ ] | No | 90+ | N/A | N/A | N/A | 16 | N/A | Data cleaning | Train = 80%, test = 20%
V. Sankhe et al., 2020 [ ] | No | N/A | 2000 | 157 | N/A | N/A | N/A | N/A | N/A
V. Z. Kamila et al., 2019 [ ] | No | N/A | N/A | N/A | N/A | N/A | N/A | N/A | Train size = 75%, test size = 25%
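
The preprocessing steps that recur in this table (one-hot encoding, scaling, SMOTE oversampling) are commonly chained into a single pipeline so that resampling only ever sees training folds. The sketch below is a generic illustration using scikit-learn and imbalanced-learn, with invented column names and data, not any one study's exact setup.

```python
# Preprocessing + resampling pipeline sketch: encode, scale, oversample, fit.
import pandas as pd
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder, StandardScaler

X = pd.DataFrame({
    "major": ["science", "arts", "science", "arts", "science", "arts", "science", "arts"],
    "gpa":   [3.1, 2.5, 3.8, 2.0, 3.5, 2.2, 3.9, 2.8],
})
y = [1, 0, 1, 0, 1, 0, 1, 1]  # imbalanced pass/fail labels

prep = ColumnTransformer([
    ("major", OneHotEncoder(), ["major"]),  # one-hot encode the categorical feature
    ("gpa", StandardScaler(), ["gpa"]),     # scale the numeric feature
])

pipe = Pipeline([
    ("prep", prep),
    ("smote", SMOTE(k_neighbors=2, random_state=0)),  # oversample the minority class
    ("clf", LogisticRegression()),
])
pipe.fit(X, y)
print(pipe.predict(X))
```

Using imbalanced-learn's Pipeline (rather than scikit-learn's) is what keeps SMOTE inside the cross-validation loop, avoiding the leakage that would come from oversampling before the split.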
Authors and Year | Public | Records | Students | Courses | Majors | Features | Used Features | Preprocessing Steps | Data-Splitting Method
D. Shah et al., 2017 [ ] | No | N/A | Dataset 1 = 300, dataset 2 = 84 | Dataset 1 = 10, dataset 2 = 26 | N/A | N/A | Student features = 3, course features = 30 | N/A | Train size = 90%, test size = 10%
Authors and Year | Evaluation Metrics and Values | Strengths | Weaknesses
A. Bozyiğit et al., 2018 [ ] | MAE = 0.063 | Compared the proposed OWA approach against other popular approaches. | Feature counts for the dataset not provided; dataset description not detailed; did not use RMSE, which is considered the standard as it is more accurate; mentioned that some preprocessing was carried out but gave no details.
B. Mondal et al., 2020 [ ] | MSE = 3.609; MAE = 1.133; RMSE = 1.8998; precision; recall | Used many evaluation metrics; implementation of the algorithms comprehensively explained. | Did not state whether data was split for testing or the training data was reused; exact precision and recall values not provided.
E. L. Lee et al., 2017 [ ] | AUC = 0.9709 | Compared the proposed approach against other approaches; used a very large dataset; achieved a very high AUC; implementation comprehensively explained. | Train-test split percentage not provided; number of courses in the dataset not given (only course registration records).
I. Malhotra et al., 2022 [ ] | MAE = 0.468; RMSE = 0.781 | Implementation comprehensively explained with examples; used both RMSE and MAE. | Dataset description not detailed; train-test split method not provided; unclear whether any preprocessing was applied; no comparison with other approaches in the evaluation.
L. Huang et al., 2019 [ ] | AverHitRate between 0.6538 and 1; AverACC between 0.8347 and 1 | Literature meticulously discussed; implementation comprehensively explained in detail. | Train-test split method not provided; unclear whether any preprocessing was applied.
L. Zhao et al., 2021 [ ] | Precision; recall | Implementation comprehensively explained. | Exact values for the evaluation metrics not provided; feature counts for the dataset not provided.
M. Ceyhan et al., 2021 [ ] | Coverage; F1-measure; precision; sensitivity; specificity; MAE; RMSE; binary MAE; binary RMSE | Used many evaluation metrics. | Explanations of the implemented algorithm and similarities very brief.
R. Obeidat et al., 2019 [ ] | Coverage (SPADE, with clustering) = 0.376, 0.28, 0.594, 0.546; coverage (Apriori, with clustering) = 0.46, 0.348, 0.582, 0.534 | Confirmed experimentally that clustering significantly improves rule generation and coverage for both SPADE and Apriori. | Dataset description not detailed; train-test split method not provided; implementation not discussed in detail.
S. Dwivedi et al., 2017 [ ] | RMSE = 0.46 | The proposed system proved to work well with big data; implementation comprehensively explained. | No information about the dataset; very brief literature review.
S.-T. Zhong et al., 2019 [ ] | MAE (CS major) = 6.6764 ± 0.0029; RMSE (CS major) = 4.5320 ± 0.0022 | Used eight datasets for model training and evaluation; detailed dataset description; compared against other popular approaches. | Train-test split percentage not consistent across the eight datasets.
Z. Chen et al., 2017 [ ] | Confidence; support | Implementation comprehensively explained with examples. | No information about the dataset; no information about preprocessing; did not provide useful evaluation metrics; no comparison with similar approaches.
Z. Chen et al., 2020 [ ] | Precision; recall; F1-score | Compared against other popular approaches: cosine similarity and improved cosine similarity. | Exact values for the evaluation metrics not provided; feature counts for the dataset not provided.
Z. Ren et al., 2019 [ ] | PTA; MAE | Compared against other approaches; implementation comprehensively explained; large number of students in the dataset. | Dataset description not detailed.
Authors and Year | Evaluation Metrics and Values | Strengths | Weaknesses
A. J. Fernández-García et al., 2020 [ ] | Accuracy; precision; recall; F1-score | Included a section containing the implementation code; literature meticulously discussed and summarized in a table; compared the effect of various preprocessing steps on the final measures of different machine-learning approaches, with full details; each preprocessing step explained in detail. | N/A
Y. Adilaksa et al., 2021 [ ] | Recommendation diversity = 81.67%; accuracy = 64% | Preprocessing steps discussed in detail; implementation comprehensively explained; confirmed experimentally that weighted cosine similarity significantly increases recommendation accuracy over traditional cosine similarity. | No information about the dataset; train-test split method not provided; the accuracy measurement not specified.
Authors and Year | Evaluation Metrics and Values | Strengths | Weaknesses
Esteban, A. et al., 2020 [ ] | RMSE = 0.971; normalized discounted cumulative gain (nDCG) = 0.682; reach = 100%; time = 3.022 s | Literature meticulously discussed and summarized in a table; implementation explained with examples; compared the proposed hybrid against similar approaches; used many useful metrics. | Mentioned that some preprocessing was carried out but gave no details; relatively few students in the dataset.
M. I. Emon et al., 2021 [ ] | Accuracy; precision; recall; F1-score | Compared the proposed hybrid against its standalone component algorithms. | Exact values for the evaluation metrics not provided; dataset description not detailed; train-test split method not provided.
S. Alghamdi et al., 2022 [ ] | MAE = 0.772; RMSE = 1.215 | Detailed dataset description; implementation clearly explained. | Similar approaches not covered in the literature review; relatively few students in the dataset.
S. G. G et al., 2021 [ ] | RMSE = 0.931 | EDA of the dataset included; compared different approaches against the proposed one; implementation comprehensively discussed. | Dataset description not detailed; train-test split method not provided; similar approaches not covered in the literature; unclear whether any preprocessing was applied.
S. M. Nafea et al., 2019 [ ] | MAE (cold-start students) = 0.162; RMSE (cold-start students) = 0.26; MAE (cold-start learning objects, LOs) = 0.162; RMSE (cold-start LOs) = 0.3 | Achieved higher accuracy than the standalone collaborative filtering and content-based baselines discussed in the paper; implementation comprehensively explained with examples. | Mentioned that some preprocessing was carried out but gave no details; dataset description not detailed; relatively few students in the dataset.
X. Huang et al., 2018 [ ] | Precision; recall; F1-score | Implementation comprehensively explained with examples; compared the proposed hybrid against similar approaches through testing. | Dataset description not detailed; unclear whether any preprocessing was applied; exact values for the evaluation metrics not provided.
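
One metric in this table worth unpacking is nDCG, reported by Esteban et al.: it scores how well a ranked recommendation list orders items by relevance, discounting gains logarithmically by position. A small sketch with invented relevance grades:

```python
# nDCG sketch for one ranked list of graded relevances (toy values).
import numpy as np

def ndcg(relevances):
    """DCG of the given order divided by DCG of the ideal (sorted) order."""
    rel = np.asarray(relevances, dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))  # positions 1..n
    dcg = float(rel @ discounts)
    ideal = float(np.sort(rel)[::-1] @ discounts)
    return dcg / ideal if ideal else 0.0

print(round(ndcg([3, 2, 3, 0, 1]), 3))  # 1.0 would mean a perfectly ordered list
```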
Authors and Year | Evaluation Metrics and Values | Strengths | Weaknesses
A. Baskota et al., 2018 [ ] | Accuracy = 61.6%; precision = 61.2%; recall = 62.6%; F1-score = 61.5% | Compared against other popular approaches; used many evaluation metrics and reported exact values for each. | Dataset description not detailed.
Jiang, Weijie et al., 2019 [ ] | Model A: accuracy = 75.23%, F-score = 60.24%; model B: accuracy = 88.05%, F-score = 42.01% | Implementation comprehensively explained with examples; tested extensively across various hyperparameter sets. | Unclear whether any preprocessing was applied; number of dataset features not given; no comparison with similar approaches; exact data-split percentages not given.
Liang, Yu et al., 2019 [ ] | Support rate | Implementation comprehensively explained. | Dataset description not detailed; no literature review; no comparison with similar approaches; few useful evaluation metrics, attributed to the large number of datasets selected for the experiment.
M. Isma’il et al., 2020 [ ] | Accuracy = 99.94% | Compared the proposed machine-learning algorithm against other algorithms through testing. | Training and test set sizes not given; the machine-learning algorithms used not explained; accuracy the only evaluation metric; dataset description not detailed.
M. Revathy et al., 2022 [ ] | Accuracy = 97.59%; precision = 97.52%; recall = 98.74%; sensitivity = 98.74%; specificity = 95.56% | Used many evaluation metrics and reported exact values for each; detailed information about the preprocessing steps; compared against other approaches. | N/A
Oreshin et al., 2020 [ ] | Accuracy = 0.91 ± 0.02; ROC-AUC = 0.97 ± 0.01; recall = 0.83 ± 0.02; precision = 0.86 ± 0.03 | Used many evaluation metrics and reported exact values for each; detailed information about the preprocessing steps. | Many English grammar and vocabulary errors; dataset description not detailed; the machine-learning algorithms used not explained; nested time-series split parameters not specified.
R. Verma et al., 2018 [ ] | Accuracy (SVM) = 88.5%; precision; recall; F1-score | Implementation comprehensively explained; compared several machine-learning algorithms through testing, concluding that SVM and ANN performed best. | Exact metric values not provided except the SVM accuracy.
S. D. A. Bujang et al., 2021 [ ] | Accuracy = 99.5%; precision = 99.5%; recall = 99.5%; F1-score = 99.5% | Reported exact values for all evaluation metrics; compared six machine-learning algorithms and concluded random forests performed best; EDA of the dataset included; literature meticulously discussed and summarized in a table; detailed information about the dataset. | Very few courses (only 2).
S. Srivastava et al., 2018 [ ] | Accuracy (from 1 cluster to 100) = 99.40% down to 87.72% | Compared against other popular approaches; provided confusion matrices for all the approaches used. | Accuracy the only evaluation metric; dataset description not detailed.
T. Abed et al., 2020 [ ] | Accuracy = 69.18% | Compared against other popular approaches: random forest, J48, naive Bayes, logistic regression, sequential minimal optimization, and a multilayer perceptron. | Dataset description not detailed; accuracy the only evaluation metric; no explanation of the implemented algorithms or why they were chosen.
V. L. Uskov et al., 2019 [ ] | Average error = 3.70% | Concluded through extensive testing of various ML algorithms that linear regression fit the problem best, as the data was linear; implementation comprehensively explained. | Dataset description not detailed; only an accuracy-style measure used; did not use RMSE to evaluate the linear regression.
V. Sankhe et al., 2020 [ ] | Accuracy = 81.3% | Implementation comprehensively explained. | Dataset description not detailed; train-test split method not provided; unclear whether any preprocessing was applied.
V. Z. Kamila et al., 2019 [ ] | Accuracy (KNN, K = 1) = 100.00%; accuracy (naive Bayes) = 100.00% | Reported exact values for each evaluation metric. | Algorithm explanations very brief; no comparison with similar approaches; no information about the dataset; unclear whether any preprocessing was applied.
Authors and Year | Evaluation Metrics and Values | Strengths | Weaknesses
D. Shah et al., 2017 [ ] | Normalized mean absolute error (NMAE) = 0.0023; computational-time comparison | Implementations of the two compared algorithms comprehensively explained; compared both the accuracy of the recommendations and the speed. | Unclear whether any preprocessing was applied; similar approaches not covered in a very brief literature review; did not use RMSE, which is considered the standard as it is more accurate.
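
For reference, the MAE and RMSE figures quoted throughout these evaluation tables are computed as below; RMSE penalizes large errors more heavily, which is why several rows fault studies for reporting MAE alone. The toy grades are invented:

```python
# MAE and RMSE sketch on a toy set of predicted vs. actual grades.
import numpy as np

actual = np.array([3.0, 4.0, 2.0, 5.0])
predicted = np.array([2.5, 4.5, 2.0, 4.0])

mae = np.mean(np.abs(actual - predicted))           # mean absolute error
rmse = np.sqrt(np.mean((actual - predicted) ** 2))  # root mean squared error
print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```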

Share and Cite

Algarni, S.; Sheldon, F. Systematic Review of Recommendation Systems for Course Selection. Mach. Learn. Knowl. Extr. 2023, 5, 560–596. https://doi.org/10.3390/make5020033


A Novel Approach to Recommendation System Business Workflows: A Case Study for Book E-Commerce Websites

  • Conference paper
  • First Online: 26 July 2022


  • Mounes Zaval,
  • Said Orfan Haidari,
  • Pinar Kosan &
  • Mehmet S. Aktas

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13381)

Included in the following conference series:

  • International Conference on Computational Science and Its Applications


Have you ever wondered why a song, a book, or a movie becomes so popular that everyone everywhere starts talking about it? Before technology, we would have said that people who love something recommend it to their friends and families. We live in an age where algorithms can discover the patterns of human interaction and make an excellent guess about someone’s opinion of something. These algorithms are the building blocks of digital streaming services and E-Commerce websites, which need recommendation systems that are as accurate as possible in order to function. While many businesses prefer one type of recommendation algorithm or another, in this study we developed a hybrid recommendation system for a book E-Commerce website by integrating several popular classical and deep-neural-network-based recommendation algorithms. Since explicit feedback is unavailable most of the time, all our implementations work on implicit binary feedback. The four algorithms considered in this study were the well-known collaborative filtering algorithms (item-based CF and user-based CF), ALS matrix factorization, and a deep-neural-network-based approach. Comparing their performance and accuracy, it was not surprising that the deep neural network approach was the most accurate recommender for our E-Commerce website.
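
As a rough illustration of one of the four approaches the paper compares, the sketch below fits ALS matrix factorization to implicit binary feedback using the `implicit` library, which implements the Hu et al. confidence-weighted formulation. The interaction matrix is invented, the hyperparameters are arbitrary, and the API conventions follow recent versions of the library (where `fit` takes a user-item matrix), so treat this as a sketch rather than the authors' implementation.

```python
# ALS on implicit binary feedback: a minimal, assumption-laden sketch.
import numpy as np
import scipy.sparse as sp
from implicit.als import AlternatingLeastSquares

# Toy binary user-item matrix: 1 = the user interacted with the book.
interactions = sp.csr_matrix(np.array([
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
], dtype=np.float32))

model = AlternatingLeastSquares(factors=8, regularization=0.01, iterations=15)
model.fit(interactions)

# Top-2 recommendations for user 0, excluding items already interacted with.
ids, scores = model.recommend(userid=0, user_items=interactions[0], N=2)
print(ids, scores)
```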


Similar content being viewed by others

  • A Survey of Artificial Intelligence-Based E-Commerce Recommendation System
  • E-commerce Personalized Recommendations: a Deep Neural Collaborative Filtering Approach
  • A deep neural network-based hybrid recommender system with user-user networks


Acknowledgement

We would like to thank TekhneLogos Company for helping with the dataset and the computational environment.

Author information

Authors and Affiliations

Computer Engineering Department, Yildiz Technical University, Istanbul, Turkey

Mounes Zaval, Said Orfan Haidari, Pinar Kosan & Mehmet S. Aktas


Corresponding author

Correspondence to Mounes Zaval .

Editor information

Editors and Affiliations

University of Perugia, Perugia, Italy

Osvaldo Gervasi

University of Basilicata, Potenza, Potenza, Italy

Beniamino Murgante

Østfold University College, Halden, Norway

Sanjay Misra

University of Minho, Braga, Portugal

Ana Maria A. C. Rocha

University of Cagliari, Cagliari, Italy

Chiara Garau


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper.

Zaval, M., Haidari, S.O., Kosan, P., Aktas, M.S. (2022). A Novel Approach to Recommendation System Business Workflows: A Case Study for Book E-Commerce Websites. In: Gervasi, O., Murgante, B., Misra, S., Rocha, A.M.A.C., Garau, C. (eds) Computational Science and Its Applications – ICCSA 2022 Workshops. ICCSA 2022. Lecture Notes in Computer Science, vol 13381. Springer, Cham. https://doi.org/10.1007/978-3-031-10548-7_50


DOI : https://doi.org/10.1007/978-3-031-10548-7_50

Published : 26 July 2022

Publisher Name : Springer, Cham

Print ISBN : 978-3-031-10547-0

Online ISBN : 978-3-031-10548-7

eBook Packages : Computer Science Computer Science (R0)


  • DOI: 10.5120/ijca2016908149
  • Corpus ID: 1655012

A Case Study on Various Recommendation Systems

  • A. R. Krishnan
  • Published 15 January 2016
  • Computer Science
  • International Journal of Computer Applications

One Citation

  • Adaptive user interface design: a case study of web recommendation system

12 References

  • Amazon.com recommendations: item-to-item collaborative filtering
  • Examining collaborative query reformulation: a case of travel information searching
  • User interactions with everyday applications as context for just-in-time information access
  • Towards a model of collaborative information retrieval in tourism
  • Visualizing implicit queries for information management and retrieval
  • Implicit queries (IQ) for contextualized search
  • Just-in-time information retrieval agents
  • Remembrance Agent: a continuously running automated information retrieval system
  • Ada and Grace: direct interaction with museum visitors



5 Use Case Scenarios for Recommendation Systems and How They Help


The market for recommendation engines is projected to grow from USD 1.14B in 2018 to USD 12.03B by 2025, a CAGR of 32.39% over the forecast period. These figures reflect the growing emphasis on customer experience as well as the widespread proliferation of data.

On that note, here are five practical use cases of recommendation systems across different industry verticals. Use these to understand the multiple ways in which recommendation systems can add value to your business.

Five Practical Use Cases of Recommendation Systems

1. eCommerce Recommendations

Personalized eCommerce recommendations

eCommerce is by far the most common and most visible use case of recommendation systems in action.

Amazon was a pioneer in introducing this change back in 2012 by making use of item-item collaborative filtering to recommend products to the buyers. The result? A resounding 29% uplift in sales in comparison to the performance in the previous quarter! Soon enough, the recommendation engine contributed to 35% of purchases made on the platform, which was bound to impact the bottom line of the eCommerce giant.

To this day, Amazon remains a market leader by virtue of its helpful and user-friendly recommender engine, which now also extends to its streaming platform, Amazon Prime (more on this later). The recommendation system is designed to intuitively understand and predict account interests and behaviors in order to drive purchases, boost engagement, increase cart volume, up-sell and cross-sell, and prevent cart abandonment.

Other retailers like ASOS, Pandora, and H&M utilize recommender systems to achieve a gamut of favorable results.

2. Media Recommendations

Personalized OTT recommendations

If Amazon is a frontrunner in the recommendation engine race, platforms like Netflix, Spotify, Prime Video, YouTube, and Disney+ are consolidating the role of recommendations in media, entertainment, and publishing. Such channels have successfully normalized recommendation systems in the real world.

Typically, most media streaming service providers employ a relational understanding of the type of content consumed by the user to suggest fresh content accordingly. Additionally, the self-learning and self-training aspect of AI in recommendation engines improves relevancy to maintain high levels of engagement while preventing customer churn.

Consider Netflix, for example. About 75% of what users watch on Netflix is a result of its recommendation algorithm. It is therefore unsurprising that the platform has pegged the value of its personalized recommendation engine at a whopping USD 1B per year, as it sustains subscription rates and delivers an impressive ROI that the company can redirect into fresh content creation.

3. Video Games and Stores

personalized video recommendations

Video games are a treasure trove of user-generated data, containing everything from the games players pick to the choices they make within them. This stored repository of action, reaction, and behavior translates into usable data that allows developers to curate the experience to maximize revenue without coming off as pushy or annoying.

Gaming platforms like Steam, the Xbox Games Store, and the PlayStation Store are already well known for their excellent recommender engines, which suggest games based on the player’s gaming history, browsing history, and purchase history. As such, someone with an interest in battle-royale games like Fortnite will be recommended games like PUBG, Apex Legends, and CoD rather than MMORPGs like WoW.

Similarly, video games can deploy recommender systems to nudge players towards the top of the micro-transactions funnel to make the gaming experience easier or more rewarding. Apart from boosting engagement and in-game purchases, AI-based recommender algorithms can unlock cross-selling and up-selling opportunities.

4. Location-Based Recommendations


Geographic location is a demographic factor that acts as a glue between the online and offline customer experience. It can augment marketing, advertising, and sales efforts to improve overall profitability. As a result, businesses have been working for quite a while on reliable location-based recommender systems (LBRS), also called location-aware recommender systems (LARS), and have registered successful results.

Sephora, for instance, issues geo-triggered app notifications alerting customers to existing promos and offers when they are in the vicinity of a physical store. The Starbucks app follows a similar system for recommending happy hours and store locations. An extension of this feature is seen in Foursquare, a local search-and-discovery mobile app, which matches users with establishments like local eateries, breweries, or activity centers based on customer location and preferences. In the process, it maintains high engagement levels and promotes businesses in the same breath.

5. Health and Fitness


Health and fitness is one of the newest entrants to the recommender space but enjoys immense potential precisely because of this lag. Applications can capture user inputs, such as dietary preferences, activity levels, fitness goals, height and weight, and BMI, to suggest customized diet plans, recipes, or workout routines matched to the user's fitness goals.

In addition to manually logged data, integration with wearable devices can streamline an app's ability to make more accurate and valuable suggestions, such as recommending meditation or mindfulness exercises to high-risk groups upon registering elevated vitals.

Most importantly, such systems can capture user feedback on the fitness journey in the form of ratings to fine-tune the plan and make recommendations smarter and more personalized. Say someone is dissatisfied with the difficulty of an exercise regimen; the app can then recalibrate it to suit their abilities.

Final Words

Recommendations have become an implicit customer expectation: users no longer wish to sift through stores and websites or encounter things they do not like. As a result, AI-based content recommendation engines, such as the one developed by Argoid, are no longer a "nice-to-have" feature but the lifeblood of your business. Talk to us to know more!

FAQs on Recommendation System Use Cases

What is a recommendation system?

A recommendation system uses information filtering to predict the products a user will like and to rank products according to users' preferences. A recommender system highlights the most relevant products to users and ensures faster conversion.

Why are recommendation systems essential for eCommerce stores?

A recommendation system makes the shopping journey simpler and more enjoyable. With a recommendation system, users get product recommendations with minimal search effort, which creates long-term engagement.

Which recommendation system should you use?

Argoid is one such recommendation system. It delivers 1:1 personalized product recommendations for your eCommerce store and ensures faster conversion and retention.

Similar blogs recommended for you


Revolutionizing TV Content Scheduling with AI: Introducing Argoid’s FAST Channel AI Co-Planner

Learn about Argoid's FAST channel AI Co-planner


The Rise of FAST: How Ad-Supported Streaming is Changing TV

A brief on what FAST is and how it is changing television


Framing the Future: How Streaming Media Recommendation Engines Can Transform Your Content Strategy

Learn how you can leverage streaming media recommendation engines to transform your content strategy


Facebook Recommendation System Case Study

Shubham D

Analytics Vidhya

In daily life we use many social media applications like FB, IG, etc., but have you ever wondered how these platforms generate friend recommendation alerts? The suggestions for each user are unique and depend on his or her friend circle, office circle, etc. In this case study we take a deep look at how all of this works from the perspective of machine learning models.

Table of Contents:

A. Introduction

B. ML Formulation

C. Business Objectives and Constraints

D. Data Set Analysis

E. First Cut Approach

F. Performance Metrics

H. Featurization

I. Modeling

J. Summary

K. Future Work

L. Profile

M. References

A. Introduction:

In this case study we focus on a friend recommendation system for Facebook users (i.e., recommending the people who have the highest possibility of becoming friends).

B. ML formulation:

In this case study we can use a supervised learning model with binary classification, as we have to identify whether the link between a source node and a destination node is good (should be recommended) or bad (should not).

C. Business Objectives and constraints:

a) Objectives:

• The main objective is to identify possible missing links by using the given data.

• The probability output of the classifier is important, because using it we can select a specific number of recommendations ranked by probability.

b) Constraints:

• The probability of the prediction is useful so that we can recommend the highest-probability links first.

• There is no strict latency constraint, as the objective is more about making the right decision than a quick one. It is acceptable if the model takes a few seconds to make a prediction.

D. Data set analysis:

The data is taken from Facebook's recruiting challenge on Kaggle: https://www.kaggle.com/c/FacebookRecruiting. The data contains two columns, source and destination; each row is an edge in the graph.

Data columns (total 2 columns):

  • Source node (int64)
  • Destination node (int64)

E. First Cut Approach:

1) The dataset contains nodes and edges. So, first we will do EDA on the dataset to get clear insights into it.

2) After EDA, we have to do featurization, because we cannot give the dataset with raw nodes and edges directly to the model. Instead, we will create features from the nodes and edges.

3) Once the features are finalized, we will build a feature-based dataset and use it for modeling.

4) In the modeling section we will try different ML models and check their performance.

5) Lastly, we will select the model that gives the best performance.

F. Performance Metrics:

1) The predicted missing links should be correct, so we require a high recall value.

2) As we have to cover the maximum number of people, the precision value should also be high.

3) So the F1 score is a very useful performance metric.

4) Confusion matrix

Importing the important libraries:

Note: We will use the networkx library, as it is very helpful for graph-related features.
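The original imports are shown as a screenshot; a minimal set that supports the snippets in the rest of this case study (the exact set is an assumption) could be:

```python
# Assumed imports: pandas/matplotlib for EDA, networkx for graph
# features, numpy for numeric analysis.
import math
import random

import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pandas as pd
```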

Now we will read the graph. In-degree and out-degree are explained as follows: let there be a person A who follows 3 people and has 5 followers; then in-degree = 5 and out-degree = 3.
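A minimal re-creation of the loading code (the file name train.csv and its header-less "source,destination" format are assumptions):

```python
# Load the Kaggle edge list into a directed graph.
g = nx.read_edgelist('train.csv', delimiter=',',
                     create_using=nx.DiGraph(), nodetype=int)

print('graph type:', type(g).__name__)          # DiGraph
print('nodes:', g.number_of_nodes())            # 1862220
print('edges:', g.number_of_edges())            # 9437519
print('avg degree:', g.number_of_edges() / g.number_of_nodes())  # ~5.0679
```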

The output of the above code is as follows:

The graph type is DiGraph (directed graph).

Number of nodes (source + destination) in the dataset: 1,862,220

Number of edges in the dataset: 9,437,519

Average in-degree and average out-degree of the nodes: 5.0679

The dataset contains nodes and the edges between them (source node and destination node). So, we will analyze the data in terms of followers, following, and followers + following.

1) Followers (in-degree of each person):

· First, we plot the number of followers vs. index number. From this we learn that very few people have more than 50 followers.

· To analyze further, we check the number of followers for the first 1.5M people using the same plot. From this we learn that most people have at most 7 followers.

· We tried a box plot of the in-degree, but it does not give a clear picture.

· Plotting the 90th to 100th percentile values of the in-degree shows that the 99th percentile value is 40 but the 100th percentile value is 552.

· To analyze the tail further, we plotted the percentile values between 99.1 and 100. From this we learn that the 99.9th percentile value is 112 but the 100th percentile value is 552.

· We also plotted the PDF of the in-degree data.
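A sketch of the percentile analysis described above; the same pattern applies to the out-degree and followers + following analyses below:

```python
# Tail analysis of the in-degree distribution.
indegree = sorted(d for _, d in g.in_degree())

for p in range(90, 101):
    print(p, 'percentile value is', np.percentile(indegree, p))

# Zoom into the extreme tail between the 99.1st and 100th percentiles.
for p in np.arange(99.1, 100.001, 0.1):
    print(round(p, 1), 'percentile value is', np.percentile(indegree, p))
```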

2) Following (out-degree of each person):

· First, we plot the number of accounts followed vs. index number. From this we learn that very few people follow more than 50 accounts.

· Checking the first 1.5M people with the same plot shows that most people follow at most 7 accounts.

· We tried a box plot of the out-degree, but it does not give a clear picture.

· Plotting the 90th to 100th percentile values of the out-degree shows that the 99th percentile value is 40 but the 100th percentile value is 1566.

· Plotting the percentile values between 99.1 and 100 shows that the 99.9th percentile value is 112 but the 100th percentile value is 1566.

· The minimum number of accounts followed, and the minimum number of followers, is 0.

· The number of people not following anyone is 274,512 (14.74%).

· The number of people with zero followers is 188,043 (10.10%).

3) Followers + Following (in-degree plus out-degree of each person):

· First, we plot the number of followers + following vs. index number. Very few people have more than 80.

· Checking the first 1.5M people shows that most have at most 14 followers + following.

· Plotting the 90th to 100th percentile values of in-degree + out-degree shows that the 99th percentile value is 79 but the 100th percentile value is 1579.

· Plotting the percentile values between 99.1 and 100 shows that the 99.9th percentile value is 221 but the 100th percentile value is 1579.

· The number of people with followers + following less than 10 is 1,320,326.

· The minimum number of followers + following is 1, shared by 334,291 people.

· The maximum number of followers + following is 1579, held by 1 person.

· The number of weakly connected components is 45,558, of which 32,195 have only 2 nodes.

Currently the dataset contains only good edges (y=1), so it is imbalanced. To balance it we have to add some bad edges (y=0). We generate bad links as node pairs that are not edges in the graph and whose shortest path length is greater than 2. For example, if len(U1, U4) = 3 and len(U1, U3) = 2, we reject the (U1, U3) pair and accept the (U1, U4) pair. The total number of bad links (missing_edges) generated is 9,437,519, which we add to the current dataset.
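A sketch of the bad-edge sampling under these constraints (function and variable names are my own; on a graph this large the shortest-path check is the slow part):

```python
def sample_bad_edges(g, n_samples, seed=42):
    """Sample node pairs that are not edges and are more than 2 hops
    apart (or disconnected); these become the y=0 class."""
    random.seed(seed)
    nodes = list(g.nodes())
    bad_edges = set()
    while len(bad_edges) < n_samples:
        u, v = random.sample(nodes, 2)
        if g.has_edge(u, v) or (u, v) in bad_edges:
            continue
        try:
            if nx.shortest_path_length(g, source=u, target=v) > 2:
                bad_edges.add((u, v))
        except nx.NetworkXNoPath:
            bad_edges.add((u, v))  # disconnected pairs also qualify
    return bad_edges

missing_edges = sample_bad_edges(g, g.number_of_edges())
```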

So, now we have a balanced dataset with good links (y=1) and bad links (y=0).

Now we split the data into train (80%) and test (20%) sets randomly, as sketched below:
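A sketch with scikit-learn, where X holds the (source, destination) pairs and y their labels (both assumed to have been built above):

```python
from sklearn.model_selection import train_test_split

# X: (source_node, destination_node) pairs; y: 1 for good edges,
# 0 for sampled bad edges.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=9)
```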

Now we make some observations about the train and test data.

As shown in the diagram, 81K nodes are present in the test set but absent from the train set, and 717K nodes are present in the train set but not in the test set. We have no information about such nodes; this is called the partial cold-start problem.

From the output of the above code, we observe that:

  • Number of nodes in the train and test datasets: 1,780,722 and 144,623 respectively.
  • Number of edges in the train and test datasets: 7,550,015 and 1,887,504 respectively.
  • Average in-degree in the train and test datasets: 4.2399 and 1.6490 respectively.
  • Average out-degree in the train and test datasets: 4.2399 and 1.6490 respectively.
  • Number of nodes common to the train and test datasets: 1,063,125
  • Number of nodes present in train but not in test: 717,597
  • Number of nodes present in test but not in train: 81,498
  • Percentage of test nodes not present in train, relative to the total test dataset: 7.12%

H. Featurization:

Currently we have a dataset with the source node, the destination node, and the label Yi, but we cannot feed this to a model directly; we first need to featurize the data. Suppose we have features f1, f2, f3, etc. If we input the source node and destination node into feature f1, we get the corresponding feature value, which we add to the new feature-based dataset.

1) Jaccard distance

2) Cosine distance

3) Ranking measures

3.1) PageRank

4) Graph features

4.1) Shortest path

4.2) Checking for same community

4.3) Adamic/Adar index

4.4) Follow back

4.5) Katz centrality

4.6) HITS score

4.7) SVD (decomposition)

4.8) Weight features

4.9) SVD dot

4.10) Preferential attachment

1) Jaccard Distance:

It is calculated using the formula

J(X, Y) = |X ∩ Y| / |X ∪ Y|

We can calculate the Jaccard index for both followers and following. It measures the similarity between two users: the higher the Jaccard index, the larger the probability that an edge exists between them. For example, let X = {u1, u2, u3} and Y = {u2, u3, u4}; then X ∩ Y = {u2, u3}, so |X ∩ Y| = 2, and X ∪ Y = {u1, u2, u3, u4}, so |X ∪ Y| = 4, i.e., J = 2/4 = 0.5.

Code snippet:
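The original snippet is a screenshot; a minimal re-creation covering both the follower and followee variants:

```python
def jaccard_for_followers(g, a, b):
    """Jaccard index between the follower sets of nodes a and b."""
    fa, fb = set(g.predecessors(a)), set(g.predecessors(b))
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)

def jaccard_for_followees(g, a, b):
    """Jaccard index between the sets of accounts a and b follow."""
    fa, fb = set(g.successors(a)), set(g.successors(b))
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```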

2) Cosine Distance:

Cosine distance is a set similarity measure like the Jaccard distance; the only difference is in the denominator. It is calculated using the formula

cos(X, Y) = |X ∩ Y| / √(|X| · |Y|)

Code Snippet:
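Again a minimal re-creation, assuming the set-based (Otsuka–Ochiai) form of the formula above:

```python
def cosine_for_followers(g, a, b):
    """Set cosine similarity between the follower sets of a and b."""
    fa, fb = set(g.predecessors(a)), set(g.predecessors(b))
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / math.sqrt(len(fa) * len(fb))
```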

3.1) PageRank:

When the dataset is a directed graph, PageRank is a popular feature. Consider each vertex as a web page; a hyperlink from one page to another is then a directed edge. The PageRank score of a page increases if a large number of important pages point toward it.

In the image referenced below, B has the highest PageRank score because many web pages point to it, some of them important ones. C has a higher PageRank than E even though fewer pages point to it, because B, itself an important page, points to C. So PageRank depends on both the quality and the quantity of the pages pointing to a page.

PageRank is mostly used for internet/web search tasks, but we can use it on our dataset as well. Consider Bill Gates, who is followed by Mark Zuckerberg and Warren Buffett (both celebrities and important people, like Bill Gates himself) as well as by many non-celebrities like us. All of this increases his score, so we can recommend Bill Gates to a random person, as there is a good chance that the person will follow him.

networkx already provides PageRank. We can compute the minimum, maximum, and mean PageRank values; the value can be read as the probability that a person lands on this page via random browsing. Earlier we saw that some nodes are present in train but not in test and vice versa; for such source and destination nodes we will use the mean PageRank value (mean_pr).
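A sketch of the computation with networkx:

```python
# PageRank over the directed train graph; the mean value is later used
# to impute scores for nodes unseen at training time.
pr = nx.pagerank(g, alpha=0.85)

mean_pr = sum(pr.values()) / len(pr)
print('min:', min(pr.values()))
print('max:', max(pr.values()))
print('mean:', mean_pr)
```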

From the output we observe that:

  • Minimum PageRank score: 1.6556497245737814e-07
  • Maximum PageRank score: 2.7098251341935827e-05
  • Mean PageRank score: 5.615699699389075e-07. We can use the mean value to impute scores for nodes that are not present in the train data.

4) GRAPH FEATURES:

4.1) SHORTEST PATH:

For this feature we calculate the shortest path between two nodes. If there is no path between two nodes, we use -1. If two nodes are directly connected, we first remove the direct edge and then find the shortest path between them.
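A sketch of this feature, temporarily dropping the direct edge as described:

```python
def compute_shortest_path(g, a, b):
    """Shortest path length from a to b, ignoring any direct edge;
    returns -1 if no path exists."""
    had_edge = g.has_edge(a, b)
    if had_edge:
        g.remove_edge(a, b)
    try:
        length = nx.shortest_path_length(g, source=a, target=b)
    except nx.NetworkXNoPath:
        length = -1
    finally:
        if had_edge:
            g.add_edge(a, b)  # restore the graph
    return length
```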

The shortest path between the source node 77697 and destination node 826021 is 10.

4.2) CHECKING FOR SAME COMMUNITY:

Before describing this feature, we first have to understand the concepts of weakly connected and strongly connected components.

In the image referenced below, the component in section S1 is a strongly connected component, because from nodes 0, 1, and 2 we can travel to any node in the component by following the edge directions; however, we cannot travel to nodes 0, 1, and 2 from nodes 3 and 4. So the component in section S2 is a weakly connected component.

If we combine S1 and S2, there is no path from vertex 4 to vertex 0, so we cannot call the combined subgraph of S1 and S2 strongly connected.

Now suppose a person has a set of college friends in one weakly connected subgraph, also called a community, and a set of office friends in another community. The two sets are very different from each other. If there is any link between the two communities, they can be merged into one.

We can find the weakly connected components using the networkx library and check whether two input nodes belong to the same one.
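A simple sketch of the check (the original helper is more elaborate; this version scans the components directly):

```python
def belongs_to_same_wcc(g, a, b):
    """1 if nodes a and b lie in the same weakly connected component."""
    for wcc in nx.weakly_connected_components(g):
        if a in wcc:
            return 1 if b in wcc else 0
    return 0
```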

We check whether source node 861 and destination node 1659750 belong to the same community. The output is 0, which shows that they do not.

4.3) ADAMIC/ADAR INDEX:

The Adamic/Adar index is defined as

A(x, y) = Σ_{u ∈ N(x) ∩ N(y)} 1 / log(|N(u)|)

where N(x) is the set of inward and outward connections of node x, and N(y) the same for node y.

We can compute and interpret the Adar index as follows:

1) To find the Adar index for node x and node y, we first find their common neighborhood, i.e., N(x) ∩ N(y).

2) A common neighbor u of node x and node y can be one of two types:

1) Celebrity node: it has a huge number of neighbors, and node x and node y are just two of them, so the possibility of an edge between x and y is very small. In this case log(|N(u)|) is large, so 1/log(|N(u)|) is small. As a result, the Adar index decreases.

2) Normal node: it has a limited number of neighbors; u may be a common college friend of node x and node y, so there is a large possibility of an edge between x and y. In this case log(|N(u)|) is small, so 1/log(|N(u)|) is large. As a result, the Adar index increases.
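A sketch of the computation, treating N(u) as all in- and out-neighbors of u:

```python
def adamic_adar(g, a, b):
    """Adamic/Adar index over the common neighbors of a and b."""
    n_a = set(g.successors(a)) | set(g.predecessors(a))
    n_b = set(g.successors(b)) | set(g.predecessors(b))
    score = 0.0
    for u in n_a & n_b:
        deg = len(set(g.successors(u)) | set(g.predecessors(u)))
        if deg > 1:  # log(1) = 0 would divide by zero
            score += 1.0 / math.log(deg)
    return score
```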

Calculating the Adar index between source node 1 and destination node 189226 gives 0.

4.4) FOLLOW BACK:

If there is an edge (a, b), there is a high probability that an edge (b, a) exists as well.
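The feature itself reduces to a one-line check:

```python
def follows_back(g, a, b):
    """1 if the destination already follows the source, else 0."""
    return 1 if g.has_edge(b, a) else 0
```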

For source node 1 and destination node 189226 the output is 1, so there is a possibility that the source will be followed back.

4.5) KATZ CENTRALITY:

The Katz centrality of a node is a measure of centrality in a network. It is similar to PageRank but is a much older technique, so it is less popular today. The Katz centrality xᵢ of node i is

xᵢ = α Σⱼ Aᵢⱼ xⱼ + β

where A is the adjacency matrix of the graph with largest eigenvalue λmax, β controls the initial centrality, and α < 1/λmax.
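A sketch of the computation with networkx (the alpha value here is an assumption and must stay below 1/λmax for convergence):

```python
katz = nx.katz_centrality(g, alpha=0.005, beta=1.0)

mean_katz = sum(katz.values()) / len(katz)
print('min:', min(katz.values()))
print('max:', max(katz.values()))
print('mean:', mean_katz)
```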

  • Minimum Katz centrality score: 0.0007313532484065916
  • Maximum Katz centrality score: 0.003394554981699122
  • Mean Katz centrality score: 0.0007483800935562018. We can use the mean value to impute scores for nodes that are not present in the train data.

4.6) HITS SCORE:

Kindly refer to https://en.wikipedia.org/wiki/HITS_algorithm for more details.
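A sketch of computing the scores with networkx (hub scores shown; authority scores are analogous):

```python
# HITS returns hub and authority scores for every node.
hubs, authorities = nx.hits(g, max_iter=100)

print('min:', min(hubs.values()))
print('max:', max(hubs.values()))
print('mean:', sum(hubs.values()) / len(hubs))
```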

  • Minimum HITS score: 0.0
  • Maximum HITS score: 0.004868653378780953
  • Mean HITS score: 5.615699699344123e-07

4.7) SVD (Decomposition):

Let A be the adjacency matrix of G, whose size is the number of nodes in the dataset, i.e., 1.78M × 1.78M. It is a large sparse matrix: the entry Aij is 1 if there is an edge between ui and uj and 0 otherwise, which makes the matrix sparse.

In the code referenced below, we decompose the matrix with rank 6. Applying SVD to the adjacency matrix yields three factors, U, S (also referred to as sigma), and Vᵀ. In U each node i is described by 6 values, and in Vᵀ the same node i is described by another 6 values. So for a pair of vertices we get 6 features from U and 6 from Vᵀ per vertex, for a total of 24 real-valued features per pair.

The shapes of the matrices after SVD decomposition are:
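A sketch of the decomposition with SciPy:

```python
from scipy.sparse.linalg import svds

# Rank-6 truncated SVD of the (sparse) adjacency matrix.
A = nx.adjacency_matrix(g).astype(float)
U, s, Vt = svds(A, k=6)

print(U.shape, s.shape, Vt.shape)  # (n, 6), (6,), (6, n)
```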

4.8) WEIGHT FEATURES:

In order to determine the similarity of nodes, an edge weight value is calculated between nodes. The edge weight decreases as the neighbor count goes up. Intuitively, if one million people follow a celebrity on a social network, chances are most of them never met each other or the celebrity. On the other hand, if a user has 30 contacts in their social network, the chances are higher that many of them know each other.

We have created the following features for both train and test (a sketch of the weight computation follows the list):

· weight of incoming edges

· weight of outgoing edges

· weight of incoming edges + weight of outgoing edges

· weight of incoming edges * weight of outgoing edges

· 2*weight of incoming edges + weight of outgoing edges

· weight of incoming edges + 2*weight of outgoing edges
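The cited weight-features reference (Cukierski et al) uses a weight of the form W = 1/√(1+|X|), where |X| is the neighbor count; a sketch under that assumption:

```python
def edge_weight(neighbor_count):
    # W = 1 / sqrt(1 + |X|); the weight shrinks as the count grows.
    return 1.0 / math.sqrt(1.0 + neighbor_count)

# Assumed usage, per node:
# w_in  = edge_weight(len(set(g.predecessors(node))))
# w_out = edge_weight(len(set(g.successors(node))))
# Derived features: w_in + w_out, w_in * w_out,
#                   2*w_in + w_out, w_in + 2*w_out
```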

4.9) SVD DOT:

The dot product between the source node's SVD features and the destination node's SVD features, as sketched below.
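A minimal sketch (u_source and u_dest are assumed to be the 6-dimensional rows of U for the two nodes):

```python
def svd_dot(u_source, u_dest):
    """Dot product of the SVD feature vectors of two nodes."""
    return float(np.dot(u_source, u_dest))
```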

4.10) PREFERENTIAL ATTACHMENT:

One well-known concept in social networks is that users with many friends tend to create more connections in the future. This is due to the fact that in some social networks, as in finance, the rich get richer. We estimate how "rich" our two vertices are by multiplying the numbers of friends (|Γ(x)|) or followers each vertex has. Note that this similarity index does not require any neighbor-of-neighbor information, so it has the lowest computational complexity.
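A sketch of the feature, using all in- and out-neighbors as Γ:

```python
def preferential_attachment(g, a, b):
    """|Γ(a)| * |Γ(b)| over each node's combined neighbor set."""
    deg_a = len(set(g.predecessors(a)) | set(g.successors(a)))
    deg_b = len(set(g.predecessors(b)) | set(g.successors(b)))
    return deg_a * deg_b
```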

I. Modeling:

We are done with featurization, so now we proceed with modeling in the following steps:

· Parameter tuning

· Train the model with the best parameters

· Check the performance metrics

· Feature importance

The table below summarizes the performance of the different models:
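The comparison table from the original post is an image and is not reproduced here. As an illustration, a typical tuning-and-evaluation loop for the best-performing GBDT (the parameter grid and variable names are assumptions):

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import RandomizedSearchCV

param_dist = {'n_estimators': [10, 50, 100, 250],
              'max_depth': [3, 5, 9, 11, 15]}

search = RandomizedSearchCV(GradientBoostingClassifier(random_state=25),
                            param_distributions=param_dist,
                            n_iter=5, cv=3, scoring='f1', random_state=25)
search.fit(X_train, y_train)

print('best params:', search.best_params_)
print('test F1:', f1_score(y_test, search.best_estimator_.predict(X_test)))
```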

J. Summary:

Based on the performance observed in the Modeling section, GBDT is the best model for the FB recommendation system on this graph-based dataset.

K. Future work:

1. So far we have implemented the important classical ML algorithms, but we can also check the performance of deep learning methods such as: 1) MLP 2) LSTM 3) GRU

2. We can also introduce new featurization methods to improve the dataset; as the rule of thumb in ML goes, the more data, the better the performance.

L. Profile:

Thanks for reading! Do show your appreciation for my hard work by clapping. If you want to see how I implemented this analysis, here is the project's GitHub repo. I'm always open to constructive feedback; if you have follow-up ideas for this analysis, comment below or reach out via LinkedIn.

M. References:

1) Special thanks to: https://www.appliedaicourse.com/

2) https://en.wikipedia.org/wiki/HITS_algorithm

3) Weight features reference: Graph-based Features for Supervised Link Prediction William Cukierski, Benjamin Hamner, Bo Yang

4) Preferential attachment reference: https://medium.com/@cynosuremishra01/different-featurization-techniques-for-graph-related-problems-in-machine-learning-9c9d60caae60

5) https://www.kaggle.com/c/FacebookRecruiting


Written by Shubham D

Data Science and Machine Learning Enthusiast

J Med Internet Res, 23(6), June 2021

Health Recommender Systems: Systematic Review

Robin De Croon, Leen Van Houdt, Nyi Nyi Htun, Gregor Štiglic, Vero Vanden Abeele, Katrien Verbert

1 Department of Computer Science, KU Leuven, Leuven, Belgium

2 Faculty of Health Sciences, University of Maribor, Maribor, Slovenia

Associated Data

  • Multimedia Appendix 1: Coded data set of all included papers.
  • Multimedia Appendix 2: Overview of items recommended by the 73 studies.
  • Multimedia Appendix 3: Overview of evaluation approaches.

Background

Health recommender systems (HRSs) offer the potential to motivate and engage users to change their behavior by sharing better choices and actionable knowledge based on observed user behavior.

Objective

We aim to review HRSs targeting nonmedical professionals (laypersons) to better understand the current state of the art and identify both the main trends and the gaps with respect to current implementations.

Methods

We conducted a systematic literature review according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines and synthesized the results. A total of 73 published studies that reported both an implementation and evaluation of an HRS targeted to laypersons were included and analyzed in this review.

Results

Recommended items were classified into four major categories: lifestyle, nutrition, general health care information, and specific health conditions. The majority of HRSs use hybrid recommendation algorithms. Evaluations of HRSs vary greatly; half of the studies only evaluated the algorithm with various metrics, whereas others performed full-scale randomized controlled trials or conducted in-the-wild studies to evaluate the impact of HRSs, thereby showing that the field is slowly maturing. On the basis of our review, we derived five reporting guidelines that can serve as a reference frame for future HRS studies. HRS studies should clarify who the target user is and to whom the recommendations apply, what is recommended and how the recommendations are presented to the user, where the data set can be found, what algorithms were used to calculate the recommendations, and what evaluation protocol was used.

Conclusions

There is significant opportunity for an HRS to inform and guide health actions. Through this review, we promote the discussion of ways to augment HRS research by recommending a reference frame with five design guidelines.

Introduction

Research Goals

Current health challenges are often related to our modern way of living. High blood pressure, high glucose levels, and physical inactivity are all linked to a modern lifestyle characterized by sedentary living, chronic stress, or a high intake of energy-dense foods and recreational drugs [ 1 ]. Moreover, people usually make poor decisions related to their health for distinct reasons, for example, busy lifestyles, abundant options, and a lack of knowledge [ 2 ]. Practically all modern lifestyle health risks are directly affected by people’s health decisions [ 3 ], such as an unhealthy diet or physical inactivity, which can contribute up to three-fourths of all health care costs in the United States [ 4 ]. Most risks can be minimized, prevented, or sometimes even reversed with small lifestyle changes. Eating healthily, increasing daily activities, and knowing where to find validated health information could lead to improved health status [ 5 ].

Health recommender systems (HRSs) offer the potential to motivate and engage users to change their behavior [ 6 ] and provide people with better choices and actionable knowledge based on observed behavior [ 7 - 9 ]. The overall objective of the HRS is to empower people to monitor and improve their health through technology-assisted, personalized recommendations. As one approach of modern health care is to involve patients in the cocreation of their own health, rather than just leaving it in the hands of medical experts [ 10 ], we limit the scope of this paper to HRSs that focus on laypersons, for example, nonhealth care professionals. These HRSs are different from clinical decision support systems that provide recommendations for health care professionals. However, laypersons also need to understand the rationale of recommendations, as echoed by many researchers and practitioners [ 11 ]. This paper also studies the role of a graphical user interface. To guide this study, we define our research questions (RQs) as follows:

RQ1: What are the main applications of the recent HRS, and what do these HRSs recommend?

RQ2: Which recommender techniques are being used across different HRSs?

RQ3: How are the HRSs evaluated, and are end users involved in their evaluation?

RQ4: Is a graphical user interface designed, and how is it used to communicate the recommended items to the user?

Recommender Systems and Techniques

Recommender techniques are traditionally divided into different categories [ 12 , 13 ] and are discussed in several state-of-the-art surveys [ 14 ]. Collaborative filtering is the most used and mature technique that compares the actions of multiple users to generate personalized suggestions. An example of this technique can typically be found on e-commerce sites, such as “Customers who bought this item also bought...” Content-based filtering is another technique that recommends items that are similar to other items preferred by the specific user. They rely on the characteristics of the objects themselves and are likely to be highly relevant to a user’s interests. This makes content-based filtering especially valuable for application domains with large libraries of a single type of content, such as MedlinePlus’ curated consumer health information [ 15 ]. Knowledge-based filtering is another technique that incorporates knowledge by logic inferences. This type of filtering uses explicit knowledge about an item, user preferences, and other recommendation criteria. However, knowledge acquisition can also be dynamic and relies on user feedback. For example, a camera recommender system might ask users about their preferences, fixed or changeable lenses, and budget and then suggest a relevant camera. Hybrid recommender systems combine multiple filtering techniques to increase the accuracy of recommendation systems. For example, the “Companies you may want to follow” feature in LinkedIn uses both content and collaborative filtering information [ 16 ]: collaborative filtering information is included to determine whether a company is similar to the ones a user already followed, whereas content information ensures whether the industry or location matches the interests of the user. Finally, recommender techniques are often augmented with additional methods to incorporate contextual information in the recommendation process [ 17 ], including recommendations via contextual prefiltering, contextual postfiltering, and contextual modeling [ 18 ].
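To make the collaborative filtering idea concrete, here is a minimal item-item sketch in Python (toy ratings and all names are assumed; it is not from any of the reviewed studies):

```python
import numpy as np

# Toy user-item rating matrix (rows: users, cols: items); 0 = unrated.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 4, 4]], dtype=float)

def cosine_sim(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

# Item-item similarities over the rating columns.
n_items = R.shape[1]
S = np.array([[cosine_sim(R[:, i], R[:, j]) for j in range(n_items)]
              for i in range(n_items)])

# Predict user 1's score for item 2 as a similarity-weighted average of
# that user's known ratings ("customers who bought ... also bought").
u, target = 1, 2
rated = R[u] > 0
pred = S[target, rated] @ R[u, rated] / (S[target, rated].sum() or 1.0)
print(round(pred, 2))
```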

HRSs for Laypersons

Ricci et al [ 12 ] define recommender systems as:

Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user [ 13 , 19 , 20 ]. The suggestions relate to various decision-making processes, such as what items to buy, what music to listen to, or what online news to read.

In this paper, we analyze how recommender systems have been used in health applications, with a focus on laypersons. Wiesner and Pfeifer [ 21 ] broadly define an HRS as:

a specialization of an RS [recommender system] as defined by Ricci et al [ 12 ]. In the context of an HRS, a recommendable item of interest is a piece of nonconfidential, scientifically proven or at least generally accepted medical information.

Researchers have sought to consolidate the vast body of literature on HRSs by publishing several surveys, literature reviews, and state-of-the-art overviews. Table 1 provides an overview of existing summative studies on HRSs that identify existing research and shows the number of studies included, the method used to analyze the studies, the scope of the paper, and their contribution.

An overview of the existing health recommender system overview papers.

| Review | Papers, n | Method | Scope | Contribution |
| --- | --- | --- | --- | --- |
| Sezgin and Özkan (2013) | 8 | Systematic review | Provides an overview of the literature in 2013 | Identifying challenges (eg, cyber-attacks, difficult integration, and data mining can cause ethical issues) and opportunities (eg, integration with personal health data, gathering user preferences, and increased consistency) |
| Calero Valdez et al (2016) | 17 | Survey | Stresses the importance of the interface and HCI of an HRS | Providing a framework to incorporate domain understanding, evaluation, and specific methodology into the development process |
| Kamran and Javed (2015) | 7 | Systematic review | Provides an overview of existing recommender systems with more focus on health care systems | Proposing a hybrid HRS |
| Afolabi et al (2015) | 22 | Systematic review | Research empirical results and practical implementations of HRSs | Presenting a novel proposal for the integration of a recommender system into smart home care |
| Ferretto et al (2017) | 8 | Systematic review | Identifies and analyzes HRSs available in mobile apps | Identifying HRSs that do not have many mobile health care apps |
| Hors-Fraile et al (2018) | 19 | Systematic review | Identifies, categorizes, and analyzes existing knowledge on the use of HRSs for patient interventions | Proposing a multidisciplinary taxonomy, including integration with electronic health records and the incorporation of health promotion theoretical factors and behavior change theories |
| Schäfer et al (2017) | 24 | Survey | Discusses HRSs to find personalized, complex medical interventions or support users with preventive health care measures | Identifying challenges subdivided into patient and user challenges, recommender challenges, and evaluation challenges |
| Sadasivam et al (2016) | 15 | Systematic review | Research limitations of current CTHC systems | Identifying challenges of incorporating recommender systems into CTHC. Proposing a future research agenda for CTHC systems |
| Wiesner and Pfeifer (2014) | Not reported | Survey | Introduces HRSs and explains their usefulness to personal health record systems | Outlining an evaluation approach and discussing challenges and open issues |
| Cappella et al (2015) | Not reported | Survey | Explores approaches to the development of a recommender system for archives of public health messages | Reflecting on theory development and applications |

a HCI: human-computer interaction.

b HRS: health recommender system.

c CTHC: computer-tailored health communication.

As can be seen in Table 1 , the scope of the existing literature varies greatly. For example, Ferretto et al [ 26 ] focused solely on HRSs in mobile apps. A total of 3 review studies focused specifically on the patient side of the HRS: (1) Calero Valdez et al [ 23 ] analyzed the existing literature from a human-computer interaction perspective and stressed the importance of a good HRS graphical user interface; (2) Schäfer et al [ 28 ] focused on tailoring recommendations to end users based on health context, history, and goals; and (3) Hors-Fraile et al [ 27 ] focused on the individual user by analyzing how HRSs can target behavior change strategies. The most extensive study was conducted by Sadasivam et al [ 29 ]. In their study, most HRSs used knowledge-based recommender techniques, which might limit individual relevance and the ability to adapt in real time. However, they also reported that the HRS has the opportunity to use a near-infinite number of variables, which enables tailoring beyond designer-written rules based on data. The most important challenges reported were the cold start [ 31 ] where limited data are available at the start of the intervention, limited sample size, adherence, and potential unintended consequences [ 29 ]. Finally, we observed that these existing summative studies were often restrictive in their final set of papers.

Our contributions to the community are four-fold. First, we analyze a broader set of research studies to gain insights into the current state of the art. We do not limit the included studies to specific devices or patients in a clinical setting but focus on laypersons in general. Second, through a comprehensive analysis, we aim to identify the applications of recent HRS apps and gain insights into actionable knowledge that HRSs can provide to users (RQ1), to identify which recommender techniques have been used successfully in the domain (RQ2), how HRSs have been evaluated (RQ3), and the role of the user interface in communicating recommendations to users (RQ4). Third, based on our extensive literature review, we derive a reference frame with five reporting guidelines for future layperson HRS research. Finally, we collected and coded a unique data set of 73 papers, which is publicly available in Multimedia Appendix 1 [ 7 - 9 , 15 , 32 - 100 ] for other researchers.

Search Strategy

This study was conducted according to the key steps required for systematic reviews according to PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [ 101 ]. A literature search was conducted using the ACM Digital Library (n=2023), IEEExplore (n=277), and PubMed (n=93) databases. As mentioned earlier, in this systematic review we focused solely on HRSs aimed at laypersons. However, many types of systems, algorithms, and devices can be considered an HRS. For example, push notifications in a mobile health app or health tips prompted by web services can also be considered health-related recommendations. To outline the scope, we limited the search terms to include a recommender or recommendation, as reported by the authors. The search keywords were as follows, using an inclusive OR: ( recommender OR recommendation systems OR recommendation system ) AND (health OR healthcare OR patient OR patients ).

In addition, a backward search was performed by examining the bibliographies of the survey and review papers discussed in the Introduction section and the reference list of included studies to identify any additional studies. A forward search was performed to search for articles that cited the work summarized in Table 1 .

Study Inclusion and Exclusion Criteria

As existing work did not include many studies ( Table 1 ) and focused on a specific medical domain or device, such as mobile phones, this literature review used nonrestrictive inclusion criteria. Studies that met all the following criteria were included in the review: described an HRS whose primary focus was to improve health (eg, food recommenders solely based on user preferences [ 102 ] were not included); targeted laypersons (eg, activity recommendations targeted on a proxy user such as a coach [ 103 ] were not included); implemented the HRS (eg, papers describing an HRS concept are not included); reported an evaluation, either web-based or offline evaluation; peer-reviewed and published papers; published in English.

Papers were excluded when one of the following was true: the recommendations of HRSs were unclear; the full text was unavailable; or a newer version was already included.

Finally, when multiple papers described the same HRS, only the latest, relevant full paper was included.

Classification

To address our RQs, all included studies were coded for five distinct coding categories.

Study Details

To contextualize new insights, the publication year and publication venue were analyzed.

Recommended Items

HRSs are used across different health domains. To provide details on what is recommended, all papers were coded according to their respective health domains. To not limit the scope of potential items, no predefined coding table was used. Instead, all papers were initially coded by the first author. These resulting recommendations were then clustered together in collaboration with the coauthors into four categories, as shown in Multimedia Appendix 2 .

Recommender Techniques

This category encodes the recommender techniques that were used: collaborative filtering [ 104 ], content-based filtering [ 105 ], knowledge-based filtering [ 106 ], and their hybridizations [ 107 ]. Some studies did not specify any algorithmic details or compared multiple techniques. Finally, when an HRS used contextual information, it was coded whether they used pre- or postfiltering or contextual modeling.

Evaluation Approach

This category encodes which evaluation protocols were used to measure the effect of HRSs. We coded whether the HRSs were evaluated through offline evaluations (no users involved), surveys, heuristic feedback from expert users, controlled user studies, deployments in the wild , and randomized controlled trials (RCTs). We also coded sample size and study duration and whether ethical approval was gathered and needed.

Interface and Transparency

Recommender systems are often perceived as a black box , as the rationale for recommendations is often not explained to end users. Recent research increasingly focuses on providing transparency to the inner logic of the system [ 11 ]. We encoded whether explanations are provided and, in this case, how such transparency is supported in the user interface. Furthermore, we also classified whether the user interface was designed for a specific platform, categorized as mobile , web , or other.

Data Extraction, Intercoder Reliability, and Quality Assessment

The required information for all included technologies and studies was coded by the first author using a data extraction form. Owing to the large variety of study designs, the included studies were assessed for quality (detailed scores given in Multimedia Appendix 1 ) using the tool by Hawker et al [ 108 ]. Using this tool, the abstract and title , introduction and aims , method and data , sample size (if applicable), data analysis , ethics and bias , results , transferability or generalizability , and implications and usefulness were allocated a score between 1 and 4, with higher scoring studies indicating higher quality. A random selection with 14% (10/73) of the papers was listed in a spreadsheet and coded by a second researcher following the defined coding categories and subcategories. The decisions made by the second researcher were compared with the first. With the recommended items ( Multimedia Appendix 2 ), there was only one small disagreement between physical activity and leisure activity [ 32 ], but all other recommended items were rated exactly the same; the recommender techniques had a Cohen κ value of 0.71 ( P <.001) and the evaluation approach scored a Cohen κ value of 0.81 ( P <.001). There was moderate agreement (Cohen κ=0.568; P <.001) between the researchers concerning the quality of the papers. The interfaces used were in perfect agreement. Finally, the coding data are available in Multimedia Appendix 1 .

The literature in three databases yielded 2340 studies, of which only 23 were duplicates and 53 were full proceedings, leaving 2324 studies to be screened for eligibility. A total of 2161 studies were excluded upon title or abstract screening because they were unrelated to health or targeted at medical professionals or because the papers did not report an evaluation. Thus, the remaining 163 full-text studies were assessed for eligibility. After the removal of 90 studies that failed the inclusion criteria or met the exclusion criteria, 73 published studies remained. The search process is illustrated in Figure 1 .

Figure 1. Flow diagram according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. EC: exclusion criteria; IC: inclusion criteria.

All included papers were published in 2009 or later, following an upward trend of increased popularity. The publication venues of HRSs are diverse. Only the PervasiveHealth [ 33 - 35 ], RecSys [ 36 , 37 , 109 ], and WI-IAT [ 38 - 40 ] conferences published 3 papers each that were included in this study. The Journal of Medical Internet Research was the only journal that occurred more frequently in our data set; 5 papers were published by Journal of Medical Internet Research [ 41 - 45 ]. The papers were first rated using the Hawker tool [ 108 ]. Owing to a large number of offline evaluations, we did not include the sample score to enable a comparison between all included studies. The papers received an average score of 24.32 (SD 4.55, max 32; data set presented in Multimedia Appendix 1 ). Most studies scored very poorly on reporting ethics and potential biases, as illustrated in Figure 2 . However, there is an upward trend over the years in more adequate reporting of ethical issues and potential biases. The authors also limited themselves to their specific case studies and did not make any recommendations for policy (last box plot is presented in Figure 2 ). All 73 studies reported the use of different data sets. Although all recommended items were health related, only Asthana et al [ 46 ] explicitly mentioned using electronic health record data. Only 14% (10/73) [ 7 , 47 - 55 ] explicitly reported that they addressed the cold-start problem.

Figure 2. Distribution of the quality assessment using the Hawker tool.

Most HRSs operated in different domains and thus recommended different items. In this study, four nonmutually exclusive categories of recommended items were identified: lifestyle 33% (24/73), nutrition 36% (26/73), general health information 32% (23/73), and specific health condition–related recommendations 12% (9/73). The only significant trend we found is the increasing popularity of nutrition advice. Multimedia Appendix 2 shows the distribution of these recommended items.

Many HRSs, 33% (24/73) of the included studies, suggest lifestyle-related items, but they differ greatly in their exact recommendations. Physical activity is often recommended. Physical activities are often personalized according to personal interests [ 56 ] or the context of the user [ 35 ]. In addition to physical activities, Kumar et al [ 32 ] recommend eating, shopping, and socializing activities. One study analyzes the data and measurements to be tracked for an individual and then recommends the appropriate wearable technologies to stimulate proactive health [ 46 ]. A total of 7 studies [ 7 , 9 , 42 , 53 , 57 - 59 ] more directly try to convince users to alter their behavior by recommending them to change, or alter their behavior: for example, Rabbi et al [ 7 ] learn “a user’s physical activity and dietary behavior and strategically suggests changes to those behaviors for a healthier lifestyle . ” In another example, both Marlin et al [ 59 ] and Sadasivam et al [ 42 ] motivate users to stop smoking by providing them with tailored messages, such as “Keep in mind that cravings are temporary and will pass.” Messages could reflect the theoretical determinants of quitting, such as positive outcome expectations and self-efficacy enhancing small goals [ 42 ].

The influence of food on health is also clear from the large subset of HRSs dealing with nutrition recommendations. A total of 36% (26/73) of the studies recommend nutrition-related information, such as recipes [ 50 ], meal plans [ 36 ], restaurants [ 60 ], or even help with choosing healthy items from a restaurant menu [ 61 ]. Wayman and Madhvanath [ 37 ] provide automated, personalized, and goal-driven dietary guidance to users based on grocery receipt data. Trattner and Elsweiler [ 62 ] use postfiltering to focus on healthy recipes only and extended them with nutrition advice, whereas Ge et al [ 48 ] require users to first enter their preferences for better recommendations. Moreover, Gutiérrez et al [ 63 ] propose healthier alternatives through augmented reality when the users are shopping. A total of 7 studies specifically recommend healthy recipes [ 47 , 48 , 50 , 62 , 64 - 66 ]. Most HRSs consider the health condition of the user, such as the DIETOS system [ 67 ]. Other systems recommend recipes that are synthesized based on existing recipes and recommend new recipes [ 64 ], assist parents in making appropriate food for their toddlers [ 47 ], or help users to choose allergy-safe recipes [ 65 ].

General Health Information

According to 32% (23/73) of the included studies, providing access to trustworthy health care information is another common objective. A total of 5 studies focused on personalized, trustworthy information per se [ 15 , 55 , 68 - 70 ], whereas 5 others focused on guiding users through health care forums [ 52 , 71 - 74 ]. In total, 3 studies [ 55 , 68 , 69 ] provided personalized access to general health information. For example, Sanchez Bocanegra et al [ 15 ] targeted health-related videos and augmented them with trustworthy information from the United States National Library of Medicine (MedlinePlus) [ 110 ]. A total of 3 studies [ 52 , 72 , 74 ] related to health care forums focused on finding relevant threads. Cho et al [ 72 ] built “an autonomous agent that automatically responds to an unresolved user query by posting an automated response containing links to threads discussing similar medical problems.” However, 2 studies [ 71 , 73 ] helped patients to find similar patients. Jiang and Yang [ 71 ] investigated approaches for measuring user similarity in web-based health social websites, and Lima-Medina et al [ 73 ] built a virtual environment that facilitates contact among patients with cardiovascular problems. Both studies aim to help users seek informational and emotional support in a more efficient way. A total of 4 studies [ 41 , 75 - 77 ] helped patients to find appropriate doctors for a specific health problem, and 4 other studies [ 51 , 78 - 80 ] focused on finding nearby hospitals. A total of 2 studies [ 78 , 79 ] simply focused on the clinical preferences of the patients, whereas Krishnan et al [ 111 ] “provide health care recommendations that include Blood Donor recommendations and Hospital Specialization.” Finally, Tabrizi et al [ 80 ] considered patient satisfaction as the primary feature of recommending hospitals to the user.

Specific Health Conditions

The last group of studies (9/73, 12%) focused on specific health conditions. However, the recommended items vary significantly. Torrent-Fontbona and Lopez Ibanez [ 81 ] have built a knowledge-based recommender system to assist diabetes patients in numerous cases, such as the estimated carbohydrate intake and past and future physical activity. Pustozerov et al [ 43 ] try to “reduce the carbohydrate content of the desired meal by reducing the amount of carbohydrate-rich products or by suggesting variants of products for replacement.” Li and Kong [ 82 ] provided diabetes-related information, such as the need for a low-sodium lunch, targeted on American Indians through a mobile app. Other health conditions supported by recommender systems include depression and anxiety [ 83 ], mental disorders [ 45 ], and stress [ 34 , 54 , 84 , 85 ]. Both the mental disorder [ 45 ] and the depression and anxiety [ 83 ] HRSs recommend mobile apps. For example, the app MoveMe suggests exercises tailored to the user’s mood. The HRS to alleviate stress includes recommending books to read [ 54 ] and meditative audios [ 85 ].

The recommender techniques used varied greatly. Table 2 shows the distributions of these recommender techniques.

Overview of the different recommender techniques used in the studies.

| Main technique | Study | Total studies, n (%) |
| --- | --- | --- |
| Collaborative filtering | [ , , ] | 3 (4) |
| Content-based filtering | [ , , , , , , ] | 7 (10) |
| Knowledge-based filtering | [ , , , , , , , , , , , , - ] | 16 (22) |
| Hybrid | [ , , , , , - , , - , , , , , , , , , , , , , , , - , ] | 32 (44) |
| Context-based techniques | [ , , , ] | 4 (5) |
| Not specified | [ , , ] | 3 (4) |
| Comparison between techniques | [ , , , , , , , ] | 8 (11) |

a The papers are classified based on how the authors reported their techniques.

Recommender Techniques in Practice

The majority of HRSs (49/73, 67%) rely on knowledge-based techniques, either directly (17/49, 35%) or in a hybrid approach (32/49, 65%). Knowledge-based techniques are often used to incorporate additional information of patients into the recommendation process [ 112 ] and have been shown to improve the quality of recommendations while alleviating other drawbacks such as cold-start and sparsity issues [ 14 ]. Some studies use straightforward approaches, such as if-else reasoning based on domain knowledge [ 9 , 79 , 81 , 82 , 88 , 90 , 100 ]. Other studies use more complex algorithms such as particle swarm optimization [ 57 ], fuzzy logic [ 68 ], or reinforcement algorithms [ 44 , 84 ].

In total, 32 studies reported using a combination of recommender techniques and are classified as hybrid recommender systems . Different knowledge-based techniques are often combined. For example, Ali et al [ 56 ] used a combination of rule-based reasoning, case-based reasoning, and preference-based reasoning to recommend personalized physical activities according to the user’s specific needs and personal interests. Asthana et al [ 46 ] combined the knowledge of a decision tree and demographic information to identify the health conditions. When health conditions are known, the system knows which measurements need to be monitored. A total of 7 studies used a content-based technique to recommend educational content [ 15 , 72 , 87 ], activities [ 32 , 86 ], reading materials [ 54 ], or nutritional advice [ 63 ].

Although collaborative filtering is a popular technique [ 113 ], it is not used frequently in the HRS domain. Marlin et al [ 59 ] used collaborative filtering to personalize future smoking cessation messages based on explicit feedback on past messages. This approach is used more often in combination with other techniques. A total of 2 studies [ 38 , 92 ] combined content-based techniques with collaborative filtering. Esteban et al [ 92 ], for instance, switched between content-based and collaborative approaches. The former approach is used for new physiotherapy exercises and the latter, when a new patient is registered or when previous recommendations to a patient are updated.

Context-Based Recommender Techniques

From an HRS perspective, context is described as an aggregate of various information that describes the setting in which an HRS is deployed, such as the location, the current activity, and the available time of the user. A total of 5 studies use contextual information to improve their recommendations but use a different technique; a prefilter uses contextual information to select or construct the most relevant data for generating recommendations. For example, in Narducci et al [ 75 ], the set of potentially similar patients was restricted to consultation requests in a specific medical area. Rist et al [ 33 ] applied a rule-based contextual prefiltering approach [ 114 ] to filter out inadequate recommendations, for example, “if it is dark outside, all outdoor activities, such as ‘take a walk,’ are filtered out” [ 33 ] before they are fed to the recommendation algorithm. However, a postfilter removes the recommended items after they are generated, such as filtering outdoor activities while it is raining. Casino et al [ 97 ] used a postfiltering technique by running the recommended items through a real-time constraint checker . Finally, contextual modeling, which was used by 2 studies [ 35 , 58 ], uses contextual information directly in the recommendation function as an explicit predictor of a user’s rating for an item [ 114 ].

Location, agenda, and weather are examples of contextual information used by Lin et al [ 35 ] to promote the adoption of a healthy and active lifestyle. Cerón-Rios et al [ 58 ] used a decision tree to analyze user needs, health information, interests, time, location, and lifestyle to promote healthy habits. Casino et al [ 97 ] gathered contextual information through smart city sensor data to recommend healthier routes. Similarly, contextual information was acquired by Rist et al [ 33 ] using sensors embedded in the user’s environment.

Comparisons

A total of 8 papers compared different recommender techniques to find the optimal algorithm for a specific data set, end users, domain, and goal. Halder et al [ 52 ] used two well-known health forum data sets (PatientsLikeMe [ 115 ] and HealthBoards [ 116 ]) to compare 7 recommender techniques (including collaborative filtering and content-based filtering) and found that a hybrid approach scored best [ 52 ]. Another example is the study by Narducci et al [ 75 ], who compared four recommendation algorithms: cosine similarity as a baseline, collaborative filtering, their own HealthNet algorithm, and a hybrid of HealthNet and cosine similarity. They concluded that prefiltering for similar patients in a specific medical area can drastically improve recommendation accuracy [ 75 ]. Li et al [ 60 ] compared the average and SD of the resulting ratings of two collaborative techniques with random recommendations, showing that a hybrid approach, a collaborative filter augmented with the calculated health level of the user, performs better. In their nutrition-based meal recommender system, Yang et al [ 49 ] used item-wise and pairwise image comparisons in a two-step process. In conclusion, the 8 studies showed that recommendations can be improved when the benefits of multiple recommender techniques are combined in a hybrid solution [ 60 ] or when contextual filters are applied [ 75 ].

HRSs can be evaluated in multiple ways. In this study, we found two categories of HRS evaluations: (1) offline evaluations that use computational approaches to evaluate the HRS and (2) evaluations in which an end user is involved. Some studies used both, as shown in Multimedia Appendix 3 .

Offline Evaluations

Of the total studies, 47% (34/73) do not involve users directly in their method of evaluation. The evaluation metrics also vary greatly, as many distinct metrics are reported in the included papers ( Multimedia Appendix 3 ). Precision (18/34, 53%), accuracy (13/34, 38%), performance (12/34, 35%), and recall (11/34, 32%) were the most commonly used offline evaluation metrics. Recall has been used significantly more often in recent papers, and accuracy also follows an upward trend. Moreover, performance was defined differently across studies. Torrent-Fontbona and Lopez Ibanez [ 81 ] reported the “amount of time in the glycaemic target range by reducing the time below the target” as performance. Cho et al [ 72 ] reported precision and recall as performance. Clarke et al [ 84 ] calculated their own reward function to compare different approaches, and Lin et al [ 35 ] measured system performance as the number of messages sent in their in-the-wild study. Finally, Marlin et al [ 59 ] tested predictive performance using a triple cross-validation procedure.

Other popular offline evaluation metrics are accuracy-related measurements, such as mean absolute (percentage) error (6/34, 18%); normalized discounted cumulative gain (nDCG; 6/34, 18%); F1 score (5/34, 15%); and root mean square error (5/34, 15%). The remaining metrics were measured inconsistently. For example, Casino et al [ 97 ] reported measuring robustness but did not outline what they counted as robustness; in practice, they measured the mean absolute error. Torrent-Fontbona and Lopez Ibanez [ 81 ] defined robustness as the capability of the system to handle missing values. Effectiveness is also measured with different parameters, such as the ability to make the right classification decisions [ 75 ] or in terms of key opinion leaders’ identification [ 41 ]. Finally, Li and Zaman [ 68 ] measured trust with a proxy: “evaluate the trustworthiness of a particular user in a health care social network based on factors such as role and reputation of the user in the social community” [ 68 ].
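For readers unfamiliar with these metrics, the following self-contained sketch computes mean absolute error, root mean square error, F1, and nDCG from scratch; the formulas are standard, and the function names and inputs are ours.

```python
# Common offline evaluation metrics mentioned above, from scratch.
import math

def mae(y_true, y_pred):
    # Mean absolute error between true and predicted ratings.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root mean square error: penalizes large deviations more than MAE.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def f1(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

def ndcg_at_k(relevances, k):
    """relevances: graded relevance of the recommended list, in ranked order."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# Illustrative inputs only:
print(mae([4, 3, 5], [3.5, 3, 4]), rmse([4, 3, 5], [3.5, 3, 4]))
print(f1(0.8, 0.6), ndcg_at_k([3, 2, 0, 1], k=4))
```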

User Evaluations

Of the total papers, 53% (39/73) included participants in their HRS evaluation, with an average sample size of 59 (SD 84) participants (excluding the outlier of 8057 participants recruited in the study by Cheung et al [ 83 ]). On average, studies ran for more than 2 months (mean 68, SD 56 days) and included all age ranges. There is a trend of increasing sample size and study duration over the years; however, only 17 studies reported the study duration, so these trends were not significant. Surveys (12/39, 31%), user studies (10/39, 26%), and deployments in the wild (10/39, 26%) were the most used user evaluations. Only 6 studies used a randomized controlled trial (RCT) to evaluate their HRS. Finally, although all the included studies focused on HRSs and were dealing with sensitive data, only 12% (9/73) [ 9 , 34 , 42 - 45 , 73 , 83 , 95 ] reported ethical approval by a review board.

No universal survey was found, as all the studies deployed a distinct survey. Ge et al [ 48 ] used the System Usability Scale and the framework of Knijnenburg et al [ 117 ] to explain the user experience of recommender systems. Esteban et al [ 95 ] designed their own survey with 10 questions to inquire about user experience. Cerón-Rios et al [ 58 ] relied on the ISO/IEC (International Organization for Standardization/International Electrotechnical Commission) 25000 standard to select 7 usability metrics for evaluating usability. Although most studies did not explicitly report the surveys used, user experience was a popular evaluation metric, as in the study by Wang et al [ 69 ]. Other metrics include user satisfaction [ 69 , 99 ] and perceived prediction accuracy [ 59 ] (measured with 4 self-composed questions). Nurbakova et al [ 98 ] combined data analytics with surveys to map their participants’ psychological background, including orientations to happiness measured using the Peterson scale [ 118 ], personality traits using the Mini-International Personality Item Pool [ 119 ], and Fear of Missing Out based on the Przybylski scale [ 120 ].

Single-Session Evaluations (User Studies)

A total of 10 studies recruited users and asked them to perform certain tasks in a single session. Yang et al [ 49 ] performed a 60-person user study to assess their system’s feasibility and effectiveness; each participant was asked to rate meal recommendations relative to those made using a traditional survey-based approach. In a study by Gutiérrez et al [ 63 ], 15 users were asked to use the health augmented reality assistant to measure the qualities of the recommender system, users’ behavioral intentions, perceived usefulness, and perceived ease of use. Jiang and Xu [ 77 ] performed 30 consultations and invited 10 evaluators majoring in medicine and information systems to obtain an average rating score and nDCG. Radha et al [ 8 ] used comparative questions to evaluate feasibility. Moreover, Cheng et al [ 89 ] used 2 user studies to rank two degrees of compromise (DOC): a low DOC assigns more weight to the algorithm, and a high DOC assigns more weight to the user’s health perspective. Recommendations with a lower DOC are more efficient for the user’s health, but recommendations with a higher DOC could convince users that the recommended action is worth doing. Other approaches used are structured interviews [ 58 ], ranking [ 86 , 89 ], unstructured feedback [ 40 , 88 ], and focus group discussions [ 87 ]. Finally, 3 studies [ 15 , 75 , 90 ] evaluated their system through a heuristic evaluation with expert users.

In the Wild

Only 2 of the studies that tested their HRS in the wild recruited patients (people with a diagnosed health condition) in their evaluation. Yom-Tov et al [ 44 ] provided 27 sedentary patients with type 2 diabetes with a smartphone-based pedometer and a personal plan for physical activity; they assessed effectiveness by calculating the amount of activity that the patient performed after the last message was sent. Lima-Medina et al [ 73 ] interviewed 45 patients with cardiovascular problems after a 6-month study period to measure (1) social management results, (2) health care plan results, and (3) recommendation results. Rist et al [ 33 ] performed an in-situ evaluation in the apartment of an older couple and used the data logs to describe usage, augmented with a structured interview.

Yang et al [ 49 ] conducted a field study of 227 anonymous users that consisted of a training phase and a testing phase to assess the prediction accuracy. Buhl et al [ 99 ] created three user groups according to the recommender technique used and analyzed log data to compare the response rate, open email rate, and consecutive log-in rate. Similarly, Huang et al [ 76 ] compared the ratio of recommended doctors chosen and reserved by patients with the recommended doctors. Lin et al [ 35 ] asked 6 participants to use their HRSs for 5 weeks, measured system performance, studied user feedback to the recommendations, and concluded with an open-user interview. Finally, Ali et al [ 56 ] asked 10 volunteers to use their weight management system for a couple of weeks. However, they did not focus on user-centric evaluation, as “only a prototype of the [...] platform is implemented.”

Rabbi et al [ 7 ] followed a single-case, multiple-baseline design [ 121 ]. Single-case experiments achieve sufficient statistical power through a large number of repeated samples from a single individual, and Rabbi et al [ 7 ] argued that HRSs suit this requirement “since enough repeated samples can be collected with automated sensing or daily manual logging [ 121 ].” Participants were exposed to 2, 3, or 4 weeks of the control condition, and the study ran for 7-9 weeks to compensate for novelty effects. Food and exercise log data were used to measure changes in food calorie intake and calorie loss during exercise.

Randomized Controlled Trials

Only 6 studies followed an RCT approach. In the RCT by Bidargaddi et al [ 45 ], an intervention group (n=192) and a control group (n=195) were asked to use a web-based recommendation service for 4 weeks that recommended mental health and well-being mobile apps. Changes in well-being were measured using the Mental Health Continuum-Short Form [ 122 ]. The RCT by Sadasivam et al [ 42 ] enrolled 120 current smokers, split into an intervention group (n=74) and a control group (n=46), as a follow-up to a previous RCT [ 123 ] that evaluated their portal, in order to specifically evaluate the HRS algorithm. Message ratings were compared between the intervention and control groups.

Cheung et al [ 83 ] measured app loyalty through the number of weekly app sessions over a period of 16 weeks with 8057 users. In the study by Paredes et al [ 34 ], 120 participants had to use the HRS for at least 26 days; self-reported stress assessment was performed before and after the intervention. Agapito et al [ 67 ] used an RCT with 40 participants to validate the sensitivity (true positives/[true positives + false negatives]) and specificity (true negatives/[true negatives + false positives]) of the DIETOS HRS. Finally, Luo et al [ 93 ] performed a small clinical trial over more than 3 months (but did not report the number of participants). Their primary outcome measures included two standard clinical blood tests, fasting blood glucose and laboratory-measured glycated hemoglobin, before and after the intervention.
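Sensitivity and specificity, as defined above, reduce to two one-line functions over a binary confusion matrix; the counts in the example below are invented for illustration.

```python
# Sensitivity and specificity from a binary confusion matrix,
# matching the definitions cited for Agapito et al above.

def sensitivity(tp: int, fn: int) -> float:
    # True positive rate: share of actual positives correctly identified.
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    # True negative rate: share of actual negatives correctly identified.
    return tn / (tn + fp)

# Hypothetical counts for illustration only:
print(sensitivity(tp=30, fn=5))   # 0.857...
print(specificity(tn=28, fp=7))   # 0.8
```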

Only 47% (34/73) of the studies reported implementing a graphical user interface to communicate the recommended health items to the user. As illustrated in Table 3 , 53% (18/34) use a mobile interface, usually through a mobile (web) app, whereas 41% (14/34) use a web interface to show the recommended items. Rist et al [ 33 ] built a kiosk into older adults’ homes, as illustrated in Figure 3 . Gutiérrez et al [ 63 ] used Microsoft HoloLens to project healthy food alternatives in augmented reality around a physical object that the user holds, as shown in Figure 4 .

Table 3. Distribution of the interfaces used among the different health recommender systems (n=34).

Interface | Total studies, n (%)
Mobile | 18 (53)
Web | 14 (41)
Kiosk | 1 (3)
HoloLens | 1 (3)

Figure 3. Rist et al installed a kiosk in the home of older adults as a direct interface to their health recommender system.

Figure 4. An example of the recommended healthy alternatives by Gutiérrez et al.

Visualization

A total of 7 studies [ 33 , 34 , 37 , 63 , 79 , 88 , 97 ], or approximately one-fourth of the studies with an interface, included visualizations. However, the approach used was different in every study, as shown in Table 4 . Star ratings showing the relevance of a recommended item are only used by Casino et al [ 97 ] and Gutiérrez et al [ 63 ]. Wayman and Madhvanath [ 37 ] also used bar charts to visualize progress toward a health goal, visualizing the healthy proportions, that is, what the user should eat. Somewhat more complex visualizations are used by Ho and Chen [ 88 ], who visualized the user’s ECG zones. Paredes et al [ 34 ] presented an emotion graph as an input screen. Rist et al [ 33 ] visualized an example of how to perform the recommended activity.

Table 4. Distribution of the visualizations used among the different health recommender systems (n=7).

Visualization technique | Study | Total studies, n (%)
Bar charts | Wayman and Madhvanath [ 37 ] and Gutiérrez et al [ 63 ] | 2 (29)
Heatmap | Ho and Chen [ 88 ] | 1 (14)
Emotion graph | Paredes et al [ 34 ] | 1 (14)
Visual example of action | Rist et al [ 33 ] | 1 (14)
Map | Avila-Vazquez et al [ 79 ] | 1 (14)
Star rating | Casino et al [ 97 ] | 1 (14)

Transparency

In the study by Lage et al [ 87 ], participants expressed that:

they would like to have more control over recommendations received. In that sense, they suggested more information regarding the reasons why the recommendations are generated and more options to assess them.

A total of 7 studies [ 7 , 37 , 41 , 45 , 63 , 66 , 82 ] explained the reasoning behind recommendations to end users in the user interface. Gutiérrez et al [ 63 ] provided recommendations for healthier food products and mentioned that the items ( Figure 4 ) are based on the user’s profile. Ueta et al [ 66 ] explained the relationship between the recommended dishes and a person’s health conditions. For example, a person with acne can see the following text: “15 dishes that contained Pantothenic acid thought to be effective in acne a lot became a hit” [ 66 ]. Li and Kong [ 82 ] showed personalized recommended health actions in a message center, using color codes to differentiate between reminders, missed warnings, and recommendations. Rabbi et al [ 7 ] showed tailored motivational messages to explain why activities are recommended. For example, when the activity walk near East Ave is recommended, the app shows the additional message:

1082 walks in 240 days, 20 mins of walk everyday. Each walk nearly 4 min. Let us get 20 mins or more walk here today [ 7 ]

Wayman and Madhvanath [ 37 ] first visualized the user’s personal nutrition profile and used the lower part of the interface to explain why the item was recommended, providing an illustrative example of spaghetti squash. The explanation reads:

This product is high in Dietary_fiber, which you could consume more of. Try to get 3 servings a week [ 37 ]

Guo et al [ 41 ] recommended doctors and showed a horizontal bar chart to visualize the user’s values compared with the average values. Finally, Bidargaddi et al [ 45 ] visualized how the recommended app overlaps with the goal set by the users, as illustrated in Figure 5 .

Figure 5. A screenshot from the health recommender system of Bidargaddi et al. Note the blue tags illustrating how each recommended app matches the users’ goals.

Principal Findings

HRSs cover a multitude of subdomains, recommended items, implementation techniques, evaluation designs, and means of communicating the recommended items to the target user. In this systematic review, we clustered the recommended items into four groups: lifestyle, nutrition, general health care information, and specific health conditions. There is a clear trend toward HRSs that provide well-being recommendations but do not directly intervene in the user’s medical status: almost 70% (50/73; lifestyle and nutrition) focused on recommendations that are not strictly medical. In the lifestyle group, physical activities (10/24, 42%) and advice on how to potentially change behavior (7/24, 29%) were recommended most often. In the nutrition group, recommendations focused on nutritional advice (8/26, 31%), diets (7/26, 27%), and recipes (7/26, 27%). A similar trend was observed in the health care information group, where HRSs focused on guiding users to the appropriate environments, such as hospitals (5/23, 22%) and medical professionals (4/23, 17%), or on helping users find qualitative information (5/23, 22%) from validated sources or from the experiences of similar users and patients on health care forums (3/23, 13%). Thus, they only provide general information and do not intervene by recommending, for example, a change of medication. Finally, when HRSs targeted specific health conditions, they recommended nonintervening actions, such as meditation sessions [ 84 ] or books to read [ 54 ].

Although collaborative filtering is commonly the most used technique in other domains [ 124 ], only 3 included studies reported the use of a collaborative filtering approach. Moreover, 43% (32/73) of the studies applied a hybrid approach, showing that HRS data sets might need special attention, which might also be the reason why all 73 studies used distinct data sets. In addition, the HRS evaluations varied greatly and were split between evaluations in which the end user was involved and evaluations that did not involve users (offline evaluations). Only 47% (34/73) of the studies reported implementing a user interface to communicate recommendations to the user, despite the need to show the rationale of recommendations, as echoed by many researchers and practitioners [ 11 ]. Moreover, only 15% (7/47) included a (basic) visualization.

Unfortunately, this general lack of agreement on how to report HRSs might introduce researcher bias, as a researcher is currently completely unconstrained in defining what and how to measure the added value of an HRS. Therefore, further debate in the health recommender community is needed on how to define and measure the impact of HRSs. On the basis of our review and contribution to this discussion, we put forward a set of essential information that researchers should report in their studies.

Considerations for Practice

The previously discussed results have direct implications in practice and provide suggestions for future research. Figure 6 shows a reference frame of these requirements that can be used in future studies as a quality assessment tool.

An external file that holds a picture, illustration, etc.
Object name is jmir_v23i6e18035_fig6.jpg

A reference frame to report health recommender system studies. On the basis of the results of this study, we suggest that it should be clear what and how items are recommended (A), who the target user is (B), which data are used (C), and which recommender techniques are applied (D). Finally, the evaluation design should be reported in detail (E).

Define the Target User

As shown in this review, HRSs are used in a plethora of subdomains and each domain has its own experts. For example, in nutrition, the expert is most likely a dietician. However, the user of an HRS is usually a layperson without the knowledge of these domain experts, who often have different viewing preferences [ 125 ]. Furthermore, each user is unique. All individuals have idiosyncratic reasons for why they act, think, behave, and feel in a certain way at a specific stage of their life [ 126 ]. Not everybody is motivated by the same elements. Therefore, it is important to know the target user of the HRS. What is their previous knowledge, what are their goals, and what motivates them to act on a recommended item?

Show What Is Recommended (and How)

Researchers have become aware that accuracy is not sufficient to increase the effectiveness of a recommender system [ 127 ]. In recent years, research on human factors has gained attention. For example, He et al [ 11 ] surveyed 24 existing interactive recommender systems and compared their transparency, justification, controllability, and diversity. However, none of these 24 papers discussed HRSs, which indicates the gap between HRSs and recommender systems in other fields. Human factors have gained interest in the recommender community by “combining interactive visualization techniques with recommendation techniques to support transparency and controllability of the recommendation process” [ 11 ]. In this study, however, only 10% (7/73) of the studies explained the rationale of recommendations and only 10% (7/73) included a visualization to communicate the recommendations to the user. We do not argue that all HRSs should include a visualization or an explanation. However, researchers should pay attention to the delivery of these recommendations: users need to understand, believe, and trust the recommended items before they can act on them.

To compare and assess HRSs, researchers should unambiguously report what the HRS is recommending. After all, typical recommender systems act like a black box , that is, they show suggestions without explaining the provenance of these recommendations [ 11 ]. Although this approach is suitable for typical e-commerce applications that involve little risk, transparency is a core requirement in higher risk application domains such as health [ 128 ]. Users need to understand why a recommendation is made, to assess its value and importance [ 12 ]. Moreover, health information can be cumbersome and not always easy to understand or situate within a specific health condition [ 129 ]. Users need to know whether the recommended item or action is based on a trusted source, tailored to their needs, and actionable [ 130 ].

Report the Data Set Used

All 73 studies used a distinct data set. Furthermore, some studies combine data from multiple databases, making it even more difficult to judge the quality of the data [ 35 ]. Moreover, most studies use self-generated data sets, which makes it difficult to compare and externally validate HRSs. Therefore, we argue that researchers should clarify the data used and indicate whether these data are publicly available. However, in health care, data are often highly privacy sensitive and cannot be shared among researchers.

Outline the Recommender Techniques

The results show that there is no panacea regarding which recommender technique to use. The included studies range from logic filters to traditional recommender techniques, such as collaborative filtering and content-based filtering, to hybrid solutions and self-developed algorithms. However, at 44% (32/73), there is a strong trend toward the use of hybrid recommender techniques. The low number of collaborative filtering techniques might be related to the fact that the evaluation sample sizes were also relatively low. Unfortunately, some studies did not fully disclose the techniques used and only reported on the main algorithm. It is remarkable that studies published in high-impact journals, such as those by Bidargaddi et al [ 45 ] and Cheung et al [ 83 ], did not provide information on the recommender technique used. Nonetheless, disclosing the recommender technique allows other researchers not only to build on empirically tested technologies but also to verify whether key variables are included [ 29 ]. User data and behavior data can be identified to augment theory-based studies [ 29 ]. Researchers should demonstrate that the algorithm is capable of making valid and trustworthy recommendations to the user based on the available data set.

Elaborate on the Evaluation Protocols

HRSs can be evaluated using different evaluation protocols. However, the protocol should be determined mainly by the research goals of the authors. On the basis of the papers included in this study, we differentiate between two approaches. In the first approach, the authors aim to influence their users’ health, for example, by providing personalized diabetes guidelines [ 81 ] or prevention exercises for users with low back pain [ 95 ]. In this case, the end user should always be involved in both the design and evaluation processes. However, only 8% (6/73) performed an RCT and 14% (10/73) deployed their HRS in the wild. This lack of user involvement has been noted previously by researchers and has been identified as a major challenge in the field [ 27 , 28 ]. Nonetheless, in other domains, such as job recommenders [ 131 ] or agriculture [ 132 ], user-centered design has been proposed as an important methodology in the design and development of tools used by end users, with the purpose of gaining trust and promoting technology acceptance, thereby increasing adoption among end users. Therefore, we recommend that researchers evaluate their HRSs with actual users. A potential model for a user-centric approach to recommender system evaluation is the framework proposed by Knijnenburg et al [ 117 ].

Research protocols need to be elaborated and approved by an ethical review board to prevent any impact on users. Authors should report how they informed their users and how they safeguarded the privacy of the users. This is in line with modern journal and conference guidelines. For example, editorial policies of the Journal of Medical Internet Research state that “when reporting experiments on human subjects, authors should indicate IRB (Institutional Rese[a]rch Board, also known as REB) approval/exemption and whether the procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation” [ 133 ]. However, only 12% (9/73) reported their approval by an ethical review board. Acquiring review board approval will help the field mature and transition from small incremental studies to larger studies with representative users to make more reliable and valid findings.

In the second approach, the authors aim to design a better algorithm, where better is again defined by the authors; for example, the algorithm might be faster, more accurate, or more efficient in computing power. Although the F1 score, the mean absolute error, and nDCG are well defined and known within the recommender domain, other parameters are more ambiguous. For example, performance or effectiveness can be assessed using different measurements: it could be a monitored health parameter, such as the duration that a user remains within healthy ranges [ 81 ], or a predictive parameter, such as improved precision and recall as a proxy for performance [ 72 ]. Unfortunately, this difference makes it difficult to compare health recommendation algorithms, and this inconsistency in measurement variables makes it infeasible to report in this systematic review which recommender techniques to use. Therefore, we argue that HRS algorithm evaluations should always be reported in detail so that other researchers can validate the results, if needed.

Limitations

This study has some limitations that affect its contribution. Although an extensive scope search was conducted in scientific databases and the most relevant health care informatics journals, some relevant literature in other domains might have been excluded. The keywords used in the search string could also have impacted the results. First, we did not include domain-specific constructs of health, such as asthma, pregnancy, and iron deficiency. Many studies may implicitly report healthy computer-generated recommendations when they research the impact of a new intervention; in these studies, however, building an HRS is often not the goal, and such studies were therefore excluded. Second, we searched for papers that reported studying an HRS; nonincluded studies might have built an HRS but did not report it as such. Considering our RQs, we deemed it important that authors explicitly reported their work as a recommender system. To conclude, in this study, we provide a large cross-domain overview of health recommender techniques targeted at laypersons and deliver a set of recommendations that could help the field of HRS mature.

This study presents a comprehensive report on the use of HRSs across domains. We have discussed the different subdomains in which HRSs are applied, the different recommender techniques used, the different manners in which they are evaluated, and finally, how they present recommendations to the user. On the basis of this analysis, we have provided research guidelines toward a consistent reporting of HRSs. We found that although most applications are intended to improve users’ well-being, there is a significant opportunity for HRSs to inform and guide users’ health actions. Although many of the studies lack a user-centered evaluation approach, some studies performed full-scale RCT evaluations or elaborate in-the-wild studies to validate their HRS, showing that the field of HRS is slowly maturing. On the basis of this study, we argue that it should always be clear what the HRS is recommending and for whom these recommendations are intended. Graphical assets should be added to show how recommendations are presented to users. Authors should also report which data sets and algorithms were used to calculate the recommendations. Finally, detailed evaluation protocols should be reported.

We conclude that the results motivate the creation of richer applications in future design and development of HRSs. The field is maturing, and interesting opportunities are being created to inform and guide health actions.

Acknowledgments

This work was part of the research project PANACEA Gaming Platform with project HBC.2016.0177, which was financed by Flanders Innovation & Entrepreneurship and research project IMPERIUM with research grant G0A3319N from the Research Foundation-Flanders (FWO) and the Slovenian Research Agency grant ARRS-N2-0101. Project partners were BeWell Innovations and the University Hospital of Antwerp.

Abbreviations

DOC: degrees of compromise
HRS: health recommender system
ISO/IEC: International Organization for Standardization/International Electrotechnical Commission
nDCG: normalized discounted cumulative gain
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
RCT: randomized controlled trial
RQ: research question

Multimedia Appendix 1
Multimedia Appendix 2
Multimedia Appendix 3

Conflicts of Interest: None declared.


Movie Recommendation Systems Based on Collaborative Filtering: A Case Study on Netflix

Ecem Kaya

Related Papers

International Journal of Electrical and Computer Engineering (IJECE)

Nowadays, recommendation systems are used successfully to provide items (for example: movies, music, books, news, images) tailored to user preferences. Among the approaches that exist to recommend adequate content, we use the collaborative filtering approach of finding the information that satisfies the user by using the reviews of other users. These reviews are stored in matrices whose sizes grow exponentially, and they are used to predict whether an item is relevant or not. The evaluation shows that such systems provide unsatisfactory recommendations because of what we call the cold start factor. Our objective is to apply a hybrid approach to improve the quality of our recommendation system. The benefit of this approach is that it does not require a new algorithm for calculating the predictions. We apply two algorithms: k-nearest neighbours (KNN) and the matrix factorization algorithm of collaborative filtering, which are based on singular value decomposition (SVD). Our combined model has a very high precision, and the experiments show that our method can achieve better results.


Expressing reviews in the form of sentiments or ratings for an item used or a movie seen is part of human habit. These reviews are easily available on different social websites. Based on the interest pattern of a user, it is important to recommend items to them. Recommendation systems are playing a vital role in everyone's life, as the demand for recommendations matching users' interests increases day by day. Movie recommendation based on available ratings for a movie has become an interesting problem for new users. To date, many recommendation systems have been designed using several machine learning algorithms. Still, sparsity, cold start, scalability, and grey sheep problems are hurdles for recommendation systems that must be resolved using hybrid algorithms. In this paper, we propose a movie rating system using a k-nearest neighbor (KNN-based) collaborative filtering (CF) approach. We compared users' ratings for different movies to get the top K users and then used this top-K set to find missing ratings by a user for a movie using CF. When evaluated on various criteria, our proposed system shows promising results for movie recommendations compared with existing systems.
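A minimal sketch of the user-based KNN collaborative filtering idea this abstract describes might look as follows; the tiny rating dictionary is fabricated, and this is an illustration of the general technique, not the authors' implementation.

```python
# Sketch of user-based KNN collaborative filtering: find the k most
# similar users by rating overlap, then predict a missing rating as a
# similarity-weighted average of their ratings. Data are made up.
import math

ratings = {  # user -> {movie: rating}
    "u1": {"m1": 5, "m2": 3, "m3": 4},
    "u2": {"m1": 4, "m2": 2, "m3": 4, "m4": 4},
    "u3": {"m1": 1, "m2": 5, "m3": 2, "m4": 1},
}

def cosine(a: dict, b: dict) -> float:
    # Cosine similarity over the movies both users rated.
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def predict(user: str, movie: str, k: int = 2) -> float:
    # Neighbours: the k users most similar to `user` who rated `movie`.
    neighbours = sorted(
        (u for u in ratings if u != user and movie in ratings[u]),
        key=lambda u: cosine(ratings[user], ratings[u]),
        reverse=True,
    )[:k]
    num = sum(cosine(ratings[user], ratings[u]) * ratings[u][movie] for u in neighbours)
    den = sum(cosine(ratings[user], ratings[u]) for u in neighbours)
    return num / den if den else 0.0

print(predict("u1", "m4"))  # u1's predicted rating for the unseen movie m4
```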

International Journal of Electrical and Computer Engineering (IJECE), Dajana Conte

Due to modern information and communication technologies (ICT), it is increasingly easy to exchange data and access new services through the internet. However, the amount of data and services available increases the difficulty of finding what one needs. In this context, recommender systems represent the most promising solution to overcome the problem of so-called information overload by analyzing users' needs and preferences. Recommender systems (RS) are applied in different sectors with the same goal: to help people make choices based on an analysis of their behavior or of similar users' characteristics or interests. This work presents a different approach for predicting ratings within model-based collaborative filtering, which exploits singular value factorization. In particular, rating forecasts were generated from the characteristics of users and items without the support of available ratings. The proposed method is evaluated on the MovieLens100K dataset, achieving an accuracy of 0.766 and 0.951 in terms of mean absolute error and root-mean-square error, respectively.

International Journal of Emerging Technology and Advanced Engineering

Vishal Paranjape

The primary aim of a recommender system is to predict items which are of most interest to users, and today recommender systems play a vital role in boosting sales on any e-commerce platform. The present paper proposes an approach for recommending movies to users on the basis of their choices. A novel technique for evaluating collaborative filtering using SVD, with hit ratio as a metric, is adopted in our proposed approach. We attempted to build a model-based collaborative filtering technique. The proposed paper makes use of matrix factorization techniques like SVD and SVD++ for a movie recommendation system based on latent features. It makes better recommendations based on the user's choices because it captures the underlying features driving the raw data. In this paper we propose a hybrid recommender system, a fusion of content-based filtering and SVD. Our proposed model gives an RMSE of 0.87 for the SVD model and an RMSE of 0.938 for SVD...
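A hedged sketch of the SVD-based matrix factorization idea underlying these papers is shown below: missing ratings are mean-filled, the matrix is factorized, and a low-rank reconstruction provides predicted scores. The tiny matrix and the chosen rank are illustrative, not the authors' setup.

```python
# Sketch of SVD-based matrix factorization for rating prediction.
import numpy as np

R = np.array([          # rows = users, cols = movies; 0 = unrated
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 4, 4],
], dtype=float)

# Fill missing entries with each user's mean rating before factorizing.
filled = R.copy()
for i, row in enumerate(filled):
    mean = row[row > 0].mean()
    filled[i, row == 0] = mean

U, s, Vt = np.linalg.svd(filled, full_matrices=False)
k = 2                                   # keep the top-k latent features
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Predicted score for user 1 (row 1) on movie 2 (col 2), originally unrated:
print(round(approx[1, 2], 2))
```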

Nicola Barbieri

A New Similarity Measure for User-based Collaborative Filtering in Recommender Systems

srikanth T, Shashi Mogalla

ABSTRACT Collaborative filtering is a popular approach in recommender systems that helps users identify the items they may like in a large collection of items. Finding similarity among users from the available item ratings, so as to predict ratings for unseen items based on the preferences of like-minded users, is a challenging problem. Traditional measures like cosine similarity and Pearson correlation exhibit some drawbacks in similarity calculation. This paper presents a new similarity measure which improves the performance of the recommender system. Experimental results on the MovieLens dataset show that our proposed distance measure improves the quality of prediction. We present clustering results as an extension to validate the effectiveness of our proposed method. KEYWORDS: Recommender Systems; Collaborative Filtering; Similarity Measure; Cosine Similarity; Pearson Correlation; Clustering; User-based Collaborative Filtering; Cluster Purity; Similarity

International Journal of Engineering Applied Sciences and Technology

Sonali Suryawanshi

Journal of Computer Science IJCSIS

Nowadays, huge volumes of data are produced in the world by organizations, banks, military centers, hospitals, etc. Recommender systems were created to deal with the problems posed by these huge volumes of data. These systems help users in many different fields to make the right decision among a massive volume of information. A recommender system analyzes user behavior to suggest appropriate services to users, for example in electronic stores. The Internet provides a vast amount of data to users, but if effective management of the aggregate data is not available, these data will be a barrier to progress. Therefore, in this article we offer a new approach using a hybrid method. To evaluate the proposed approach, we used the standard MovieLens dataset and two techniques: collaborative filtering and content-based filtering. For the proposed approach we offer four models. Finally, these models are compared by prediction accuracy and classification error. https://sites.google.com/site/ijcsis/

Ziad Al-Sharif

With the advent and explosive growth of the Web over the past decade, recommender systems have become central to the business strategies of e-commerce and Internet-based companies such as Google, YouTube, Facebook, Netflix, LinkedIn, Amazon, etc. Hence, collaborative filtering recommendation algorithms are highly valuable and play a vital role in the success of such businesses in reaching out to new users and promoting their services and products. With the aim of improving recommendation performance, this paper proposes a new collaborative filtering recommendation algorithm based on dimensionality reduction and clustering techniques. The k-means algorithm and Singular Value Decomposition (SVD) are both used to cluster similar users and reduce the dimensionality. The paper proposes and evaluates an effective two-stage recommender system that can generate accurate and highly efficient recommendations. The experimental results show that this new method significantly improves the performance of recommendation systems.

International Journal of Information Technology and Applied Sciences (IJITAS)

muhammad sanwal

In the current era, a rapid increase in data volume produces redundant information on the internet, which makes predicting appropriate items for users a great challenge in information systems. As a result, recommender systems have emerged in this decade to resolve such problems. E-commerce platforms such as Amazon and Netflix rely on such systems to recommend items to their users. In the literature, methods such as matrix factorization and collaborative filtering have existed and been implemented for a long time; however, recent studies show that other approaches, especially those using artificial neural networks, bring promising improvements to this area of research. In this research, we propose a new hybrid recommender system that results in better performance. In the proposed system, users are divided into two main categories, namely average users and non-average users. Then, various machine learning and deep learning methods are applied within these categorie...





Recommendation engine

Recostream is an AI/ML personalized recommendation engine offered as a SaaS service for online stores of any size. The software is built on the foundation of Openkoda, utilizing a significant number of pre-existing features within the platform.


Recostream is a product recommendation engine built with Openkoda

  • Real-time data
  • AI-driven recommendations
  • Multi-tenant architecture
  • Analytical dashboard
  • Advanced integrations


Key features of the application

  • Real-time, low-latency calculation of product recommendations based on shopping behavior and user interactions
  • Wide range of recommendation models
  • State-of-the-art machine learning algorithms, including recommendations that adapt to the textual content of blog posts
  • Advanced, transparent analytics available through direct integration with Google Analytics
  • Easy to install, reduced to inserting a single line of JS into your website

The ambitious scope of the product and the fast-paced nature of the ecommerce industry required the team to:

  • quickly launch a working MVP in a competitive ecommerce marketing automation space;
  • build the product on a solid technology foundation that supports low-latency, real-time and high-volume requirements;
  • deliver an enterprise-grade SaaS product with an advanced analytics dashboard, multiple authentication options, subscription payments, integration with Stripe, integration with Slack, full audit trail, etc.

Ideally, the implementation should focus on research and development of the recommendation engine’s machine learning algorithms, with the rest of the system delivered with minimal development effort, if possible.


Recostream’s engineering team decided to use Openkoda as the foundation for the recommendation service, as a significant number of ready-to-use Openkoda features were already available in the platform.

The team started with a smaller PoC focused on building the prototype of the recommendation engine only, exploring and testing different machine learning algorithms and scaling the research prototypes to handle real, massive amounts of collected data describing shopping behavior and user interactions.

Once the system was validated and promoted from the experimental phase to production, the same team of developers was able to take over Openkoda and build a full enterprise SaaS service around the engine, reusing a large number of features, components, and integrations that already existed in Openkoda.

The engineering team of only 4 developers was able to evolve the bare prototype of the recommendation engine into a working product prototype and start the first implementations within two months after the PoC phase was completed. 

Such a short delivery time was made possible by adding their highly specialized IP behind the developed AI/ML recommendation engine to an enterprise-ready SaaS platform provided by Openkoda.

Since the Openkoda platform is freely available as an  open-source system under the MIT license , there were no licensing costs associated with using Openkoda in the project.


Recostream quickly became an industry-recognized product recommendation service for ecommerce stores of all sizes and was acquired two years later by GetResponse, a global leader in marketing automation. Read more about the product history and the acquisition. The Recostream product is now part of the larger offering from GetResponse.

What is Openkoda?

Openkoda enables faster application development. This AI-powered open source platform provides pre-built application templates and customizable solutions to help you quickly build any functionality you need to run your business. With Openkoda, you can achieve up to 60% faster development and significant cost savings. Start streamlining your development process with Openkoda today.



A content-based dataset recommendation system for researchers—a case study on Gene Expression Omnibus (GEO) repository


Braja Gopal Patra, Kirk Roberts, Hulin Wu, A content-based dataset recommendation system for researchers—a case study on Gene Expression Omnibus (GEO) repository, Database, Volume 2020, 2020, baaa064, https://doi.org/10.1093/database/baaa064


It is a growing trend among researchers to make their data publicly available for experimental reproducibility and data reusability. Sharing data with fellow researchers helps in increasing the visibility of the work. On the other hand, there are researchers who are inhibited by the lack of data resources. To overcome this challenge, many repositories and knowledge bases have been established to date to ease data sharing. Further, in the past two decades, there has been an exponential increase in the number of datasets added to these dataset repositories. However, most of these repositories are domain-specific, and none of them can recommend datasets to researchers/users. Naturally, it is challenging for a researcher to keep track of all the relevant repositories for potential use. Thus, a dataset recommender system that recommends datasets to a researcher based on previous publications can enhance their productivity and expedite further research. This work adopts an information retrieval (IR) paradigm for dataset recommendation. We hypothesize that two fundamental differences exist between dataset recommendation and PubMed-style biomedical IR beyond the corpus. First, instead of keywords, the query is the researcher, embodied by his or her publications. Second, to filter the relevant datasets from non-relevant ones, researchers are better represented by a set of interests, as opposed to the entire body of their research. This second approach is implemented using a non-parametric clustering technique. These clusters are used to recommend datasets for each researcher using the cosine similarity between the vector representations of publication clusters and datasets. The maximum normalized discounted cumulative gain at 10 (NDCG@10), precision at 10 (p@10) partial and p@10 strict of 0.89, 0.78 and 0.61, respectively, were obtained using the proposed method after manual evaluation by five researchers. To the best of our knowledge, this is the first study of its kind on content-based dataset recommendation. We hope that this system will further promote data sharing, offset the researchers’ workload in identifying the right dataset and increase the reusability of biomedical datasets.

Database URL : http://genestudy.org/recommends/#/

In the Big Data era, extensive amounts of data have been generated for scientific discoveries. However, storing, accessing, analyzing and sharing a vast amount of data are becoming major bottlenecks for scientific research. Furthermore, making a large number of public scientific data findable, accessible, interoperable and reusable is a challenging task.

The research community has devoted substantial effort to enable data sharing. Promoting existing datasets for reuse is a major initiative that gained momentum in the past decade ( 1 ). Many repositories and knowledge bases have been established for specific types of data and domains. Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/), UKBioBank (https://www.ukbiobank.ac.uk/), ImmPort (https://www.immport.org/shared/home) and TCGA (https://portal.gdc.cancer.gov/) are some examples of repositories for biomedical datasets. DATA.GOV archives the U.S. Government’s open data related to agriculture, climate, education, etc. for research use. However, a researcher looking for previous datasets on a topic still has to painstakingly visit all the individual repositories to find relevant datasets. This is a tedious and time-consuming process.

An initiative was taken by the developers of DataMed (https://datamed.org) to solve the aforementioned issues for the biomedical community by combining biomedical repositories and enhancing query searching with advanced natural language processing (NLP) techniques ( 1 , 2 ). DataMed indexes diverse categories of biomedical datasets and provides the functionality to search them ( 1 ). The research focus of that work was retrieving datasets using a focused query. In addition, the biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) dataset retrieval challenge was organized in 2016 to evaluate the effectiveness of information retrieval (IR) techniques in identifying relevant biomedical datasets in DataMed ( 3 ). Among the teams that participated in this shared task, the prevalent approaches to searching datasets with a query were probabilistic or machine learning based IR ( 4 ), medical subject headings (MeSH) term based query expansion ( 5 ), word embeddings and named entity identification ( 6 ), and re-ranking ( 7 ). Similarly, a specialized search engine named Omicseq was developed for retrieving omics data ( 8 ).

Google Dataset Search (https://toolbox.google.com/datasetsearch) provides the facility to search datasets on the web, similar to DataMed. While DataMed indexes only biomedical domain data, indexing in Google Dataset Search covers data across several domains. Datasets are created and added to repositories frequently, which makes it difficult for a researcher to know and keep track of all datasets. Further, search engines such as DataMed or Google Dataset Search are helpful when the user knows what type of dataset to search for, but determining the user intent in web searches is a difficult problem due to the sparse data available concerning the searcher ( 9 ). To overcome the aforementioned problems and make dataset search more user-friendly, a dataset recommendation system based on a researcher’s profile is proposed here. The publications of researchers indicate their academic interest, and this information can be used to recommend datasets. Recommending a dataset to an appropriate researcher is a new field of research. There are many datasets available that may be useful to certain researchers for further exploration, and this important aspect of dataset recommendation has not been explored earlier.

Recommendation systems, or recommenders, are information filtering systems that deploy data mining and analytics of users’ behaviors, including preferences and activities, to predict users’ interests in information, products or services. Research publications on recommendation systems can be broadly grouped into content-based and collaborative filtering recommendation systems ( 10 ). This article describes the development of a recommendation system for scholarly use. In general, developing a scholarly recommendation system is both challenging and unique because semantic information plays an important role in this context, as inputs such as title, abstract and keywords need to be considered ( 11 ). The usefulness of similar research article recommendation systems has been established by the acceptance of applications such as Google Scholar (https://scholar.google.com/), Academia.edu (https://www.academia.edu/), ResearchGate (https://www.researchgate.net/), Semantic Scholar (https://www.semanticscholar.org/) and PubMed (https://www.ncbi.nlm.nih.gov/pubmed/) by the research community.

Dataset recommendation is a challenging task for the following reasons. First, while standardized formats for dataset metadata exist ( 12 ), no such standard has achieved universal adoption, and researchers use their own conventions to describe their datasets. Further, many datasets do not have proper metadata, which makes them difficult to reuse or recommend. Second, there are many dataset repositories with the same dataset in different formats, which further complicates recommendation. Additionally, a dataset recommendation system should be scalable to the increasing number of online datasets. We cast the problem of recommending datasets to researchers as a ranking problem of datasets matched against the researcher’s individual publication(s). The recommendation system can then be viewed as an IR system where the most similar datasets are retrieved for a researcher using his/her publications.

Data linking, or identifying and clustering similar datasets, has received relatively little attention in research on recommendation systems. Previous work on this topic includes ( 13–15 ). Reference ( 13 ) defined dataset recommendation as the problem of computing a rank score for each of a set of target datasets ( D T ) so that the rank score indicates the relatedness of D T to a given source dataset ( D S ). The rank scores provide information on the likelihood of a D T containing linking candidates for D S . Reference ( 15 ) proposed a dataset recommendation system that first creates similarity-based dataset networks and then recommends connected datasets to users for each dataset searched. Despite promising results, this approach suffers from the cold start problem. Here, the cold start problem refers to the user’s initial dataset selection, where the user has no idea what dataset to select or search for. If a user chooses a wrong dataset initially, the system will always recommend wrong datasets to the user.

Some experiments have been performed to identify datasets shared in the biomedical literature ( 16–18 ). Reference ( 17 ) identified data shared in biomedical articles using regular expression patterns and machine learning algorithms. Reference ( 16 ) identified datasets in social sciences papers using a semi-automatic method; this system reportedly performed well (F-measure of 0.83) in finding datasets in the da|ra dataset registry. Different deep learning methods were used to extract dataset mentions from publications and link each mention to a particular dataset in a knowledge base ( 18 ). Further, a content-based recommendation system for recommending literature for datasets was developed in ( 11 ), a first step toward a literature recommendation tool that suggests relevant literature for datasets.

This article proposes a dataset recommender that recommends datasets to researchers based on their publications. We collected dataset metadata (title and summary) from GEO and researchers' publications (title, abstract and year of publication) from PubMed, using each researcher's name and curriculum vitae (CV), to develop the recommendation system. A vector space model (VSM) is used to compare publications and datasets. We propose two novel ideas:

A method for representing researchers with multiple vectors reflecting each researcher’s diverse interests.

A system for recommending datasets to researchers based on their research vectors.

For the datasets, we focus on GEO (https://www.ncbi.nlm.nih.gov/geo/), a public repository for high-throughput microarray and next-generation sequencing functional genomics data. We found that, on average, 21 datasets were added daily over the last 6 years (2014–19). This gives a glimpse of the increasing number of datasets being made available online, considering that there are many other online data repositories as well. Many of these datasets were collected at significant expense, and most were used only once. We believe that the reusability of these datasets can be improved by recommending them to appropriate researchers.

Efforts to restructure GEO have been made by curating the available metadata. In reference ( 19 ), the authors identified the important keywords present in dataset descriptions and used them to search for similar datasets. In another restructuring effort, ReGEO (http://regeo.org/) was developed by ( 20 ), who identified important metadata such as time points and cell lines for datasets using automated NLP techniques.

We developed this dataset recommendation system as part of a dataset reusability platform for GEO, the GETc Research Platform (http://genestudy.org/), developed at the University of Texas Health Science Center at Houston. The website recommends datasets to users based on their publications.

The rest of the article is organized as follows. Section 2 provides an overview of GEO datasets and researcher publications. The methods used for developing the recommendation system and the evaluation techniques used in this experiment are described in Section 3. Section 4 describes the results, and Section 5 provides a discussion. Finally, conclusions and future directions are discussed in Section 6.

The proposed dataset recommendation system requires both dataset metadata and the user profile for which datasets will be recommended. We collected metadata of datasets from the GEO repository, and researcher publications from PubMed using their names and CVs. The data collection methods and summaries of data are discussed next.

GEO Datasets

GEO is one of the most popular public repositories for functional genomics data. As of December 18, 2019, there were 122 222 series of datasets available in GEO. Histograms of datasets submitted to GEO per day and per year, presented in Figure 1, show an increasing trend of dataset submissions, which justified our selection of this repository for developing the recommendation system.

Figure 1. Histogram of datasets submitted to GEO, based on datasets collected on December 18, 2019

Figure 2. Overview of the dataset indexing pipeline

Table 1. Statistics of datasets collected from GEO

Total datasets | Datasets with associated articles | Associated articles
122 222 | 89 533 | 92 884 (Mean: 0.76, Max: 10, Unique: 61 228)

For the present experiment, metadata such as title, summary, submission date and name of dataset creator(s) were collected from GEO and indexed in a database, as shown in Figure 2. We also collected the PMIDs of articles associated with each dataset; however, many datasets did not have associated articles. Detailed information on the collected datasets is presented in Table 1. Out of a total of 122 222 GEO datasets, 89 533 had 92 884 associated articles, of which 61 228 were unique. The maximum number of articles associated with a dataset ('GSE15907' and 'GSE31312') was 10. These articles were used to remove publications that were not related to GEO. Further, we used the GEO-related publications to build word embeddings for the subsequent text normalization, as outlined in Section 3.

Researcher publications

A researcher's academic interests can be extracted from publications, grants, talks, seminars and more. All this information is typically available in the CV, but it is presented as titles or other short texts, which carry limited information. Further, the lack of standardization in CV formats makes CVs challenging to parse. In this work, an alternative approach was undertaken, as outlined next.

The titles and years of a researcher's publications are present in the CV; however, we required the title, abstract and year of publication for our experiment. A researcher's list of publications (titles and abstracts) is easier to obtain from web sources such as Google Scholar, PubMed and Semantic Scholar. Unfortunately, the full texts of most scientific articles are not publicly available. Thus, for the present experiment, we used only the title and abstract of publications to identify a researcher's areas of research.

Figure 3. Overview of the researcher's publication extraction system used to resolve author ambiguity

Given a researcher, we searched the researcher's name in PubMed using the Entrez API (https://www.ncbi.nlm.nih.gov/books/NBK25501/) and collected all the publications. Multiple researchers with the exact same name may exist; thus, querying the name in PubMed may sometimes return publications from other researchers as well. This is the classic challenge of author disambiguation. A few attempts have been made to resolve this issue, one of them being ORCID (https://orcid.org); however, a researcher needs to provide an ORCID iD for us to access his/her ORCID details, and, to the best of our knowledge, many researchers in the biomedical domain do not have an associated ORCID account. Thus, we used a simple method to disambiguate authors using their CVs. Initially, the recommendation system prompts a researcher to provide his/her name and a CV (or list of publications). Next, we collected the publications (titles, names, MeSH terms and year of publication) for the researcher from PubMed by searching his/her name. To remove the publications of other authors with the same name, the titles of all publications collected from PubMed were matched against the titles present in the CV; in the case of a match, the publication was kept for further processing. An overview of the technique used for collecting a researcher's publications is provided in Figure 3.
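A rough sketch of this collection step is given below, using Biopython's Entrez client (our choice for illustration; the paper does not name a specific library) and a simplified title-matching rule. The placeholder e-mail address and the normalize helper are assumptions.

```python
# Sketch: search PubMed by author name, fetch records, and keep only
# the publications whose titles also appear in the researcher's CV.
from Bio import Entrez

Entrez.email = "you@example.org"  # required by NCBI; placeholder address


def normalize(title):
    """Lowercase and strip non-alphanumerics so small formatting
    differences do not break the title match."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def collect_publications(author_name, cv_titles):
    # Search PubMed for the author's name.
    handle = Entrez.esearch(db="pubmed", term=f"{author_name}[Author]", retmax=500)
    pmids = Entrez.read(handle)["IdList"]
    handle.close()

    # Fetch the records and keep title/abstract/year.
    handle = Entrez.efetch(db="pubmed", id=",".join(pmids), retmode="xml")
    records = Entrez.read(handle)
    handle.close()

    cv_index = {normalize(t) for t in cv_titles}
    kept = []
    for rec in records["PubmedArticle"]:
        art = rec["MedlineCitation"]["Article"]
        title = str(art["ArticleTitle"])
        if normalize(title) in cv_index:  # disambiguation by CV match
            abstract = " ".join(
                str(p) for p in art.get("Abstract", {}).get("AbstractText", [])
            )
            year = art["Journal"]["JournalIssue"]["PubDate"].get("Year", "")
            kept.append({"title": title, "abstract": abstract, "year": year})
    return kept
```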

One limitation of the above publication collection method is that publications not listed in PubMed cannot be collected. However, the datasets used in the present experiments are from the biomedical domain, and publications not listed in PubMed are less pertinent to biomedical datasets: someone's biomedical publications (in PubMed) are likely more reliable markers of interest in biomedical datasets than a theoretical computer science or statistics paper. Another downside is that the researcher's CV may not be fully up to date.

This section describes how the two main objects of interest (datasets and publications of researchers) were embedded in a vector space, and then how these vectors were compared in order to make recommendations. First, both datasets and papers were treated as text objects: the text of a dataset comprises its title and summary, while the text of a paper comprises its title and abstract. Pre-processing was performed on both researchers' publications and datasets by removing low-value stopwords, links, punctuation and junk words. Further, the nltk WordNet lemmatizer (https://www.nltk.org/_modules/nltk/stem/wordnet.html) was used to get the root forms of the words. Next, we describe the methods used for converting datasets and researchers into vectors.
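A minimal sketch of this pre-processing, assuming nltk's English stopword list and a simple regex for links and junk characters (the exact junk-word rules are not reported):

```python
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOPWORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()


def preprocess(text):
    text = re.sub(r"https?://\S+", " ", text)   # remove links
    text = re.sub(r"[^a-zA-Z\s]", " ", text)    # remove punctuation/junk
    tokens = [t.lower() for t in text.split()]
    tokens = [t for t in tokens if t not in STOPWORDS and len(t) > 2]
    return [LEMMATIZER.lemmatize(t) for t in tokens]  # root forms
```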

Dataset vector generation

VSMs can be built from text in a variety of ways, each of which has its distinct advantages and thus merits experimentation. For the present experiment, we used TF-IDF because it achieved better results for related-literature recommendation for datasets in ( 11 ).

TF-IDF : For vocabulary W , each unique word w  ∈  W is assigned a score proportional to its frequency in the text (term frequency, TF) and its inverse frequency in the full collection (inverse document frequency, IDF). We tuned parameters such as minimum document frequency (min-df) and maximum n-gram size. For the present study, we kept the maximum n-gram size at 2 (i.e. unigrams and bigrams), as including higher n-grams increases sparsity as well as computational complexity.
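A minimal sketch of this vectorization with scikit-learn's TfidfVectorizer (an assumed implementation choice; the toy texts and the min_df value are illustrative, since the tuned min-df is not reported):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for dataset texts (title + summary).
dataset_texts = [
    "expression profiling of mouse liver after influenza infection",
    "rna-seq of human t cells during hiv infection",
]

# Unigrams and bigrams, as in the paper; min_df=1 only so the toy runs.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=1)
dataset_vectors = vectorizer.fit_transform(dataset_texts)

# A researcher's papers are mapped into the same vector space.
paper_vectors = vectorizer.transform(["hiv vaccine response in t cells"])
```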

We converted each dataset into a vector using TF-IDF: for each dataset, the title and summary were preprocessed, normalized and then converted into a single vector. Finally, each publication vector (or publication cluster vector) is compared with the dataset vectors to generate a recommendation score. Different methods for representing a researcher's papers as vectors are discussed next.

Researcher vector generation

Baseline method.

For the baseline method, we combined multiple text-derived paper vectors into a single researcher vector ( v r ) in the same vector space using Equation ( 1 ):

|$v_r = \frac{1}{N_r} \sum_{p \in P_r} \lambda_p\, v_p \quad (1)$|

where P r is the set of papers of a researcher r ; N r is the total number of papers of that researcher, acting as a normalization term; v p is the TF-IDF vector of a single paper p ; and λ p is a recency penalty favoring more recent papers (thus better reflecting the researcher's current interests).

It is evident that a researcher will be more interested in datasets relevant to his/her current work than to work performed a few years back. Thus, we penalized each paper vector according to its year of publication, as stated in Equation ( 2 ):

|$\lambda_p = e^{-kt} \quad (2)$|

where t is the difference between the current year and the year of publication, and k is the decay constant that decreases the weight at a rate proportional to its current value; for the present study, we kept k  = 0.05.
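A sketch of Equations (1) and (2), assuming dense paper vectors and the exponential form of the recency penalty; the current_year default follows the data collection date and is an assumption:

```python
import numpy as np


def recency_weight(pub_year, current_year=2019, k=0.05):
    """Equation (2): exponential decay e^(-k t), with t the age of the
    paper in years and k = 0.05 as in the paper."""
    t = current_year - pub_year
    return np.exp(-k * t)


def researcher_vector(paper_vectors, pub_years):
    """Equation (1): recency-weighted average of a researcher's paper
    vectors (paper_vectors: one dense row per paper)."""
    weights = np.array([recency_weight(y) for y in pub_years])
    weighted = paper_vectors * weights[:, None]   # scale each paper's row
    return weighted.sum(axis=0) / len(pub_years)  # normalize by N_r
```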

Multi-interest dataset recommendation (MIDR)

The baseline method for creating a researcher vector may be adequate for new researchers without many publications, whereas an established researcher may have multiple areas of expertise, with multiple papers in each. If the number of papers is imbalanced across areas, the baseline method biases dataset recommendation toward the dominant interest. For a more balanced set of highly dispersed interests, the mixture instead collapses to the 'centroid' of those interests, which can be quite distinct from any individual interest. Both cases are undesirable: the centroid of a researcher's interests may not be of much interest to them (e.g. a researcher interested in mouse genomics and HIV vaccines may not be interested in mouse vaccines ).

Indeed, initial experiments on Researcher 1 (introduced in Section 4) showed that the recommended datasets were biased toward the single research area with the largest number of publications: Researcher 1 has a dominant number of publications on HIV , and the baseline system recommended only HIV datasets, even though Researcher 1 has multiple research areas.

A critical limitation of the baseline approach, then, is that researchers can have multiple areas of expertise. We can easily build multiple vectors, each corresponding to a different area of expertise, if we know how to properly group/cluster a researcher's papers by expertise or topic. However, parametric methods such as k-means clustering and latent Dirichlet allocation require specifying a priori how many clusters/topics to use, and no single number of clusters generalizes across researchers with varying numbers of publications. Instead, our insight is that the more publications a researcher has, the more interests or areas of expertise he/she likely has as well, but this should be modeled as a 'soft' constraint rather than a 'hard' one. We propose to employ the non-parametric Dirichlet Process Mixture Model (DPMM) ( 21 ) to cluster papers into several groups of expertise.

Figure 4. High-level architecture of the proposed dataset recommendation system

DPMM : We employed Gibbs sampling-based Dirichlet process mixture modeling for text clustering. DPMM offers the following advantages over its traditional counterparts: first, the number of text clusters need not be specified; second, it is relatively scalable; and third, it is robust to outliers ( 22 ). The technique employs a collapsed Gibbs sampling mechanism for Dirichlet process models, wherein clusters are added or removed based on the probability of a cluster being associated with a given document. The scalability of the technique stems from the fact that word frequencies are used for text clustering; this reduces the computational burden significantly, considering the large number of samples associated with text processing problems. Further, the optimal number of clusters is likely to be chosen, as clusters with low association probability with documents are eliminated, and new clusters are created for documents that do not belong to any selected cluster with high probability. For example, if a cluster c 1 contains five documents, each with low association probability, then c 1 is eliminated and new clusters are initialized. In DPMM, the decision to create a new cluster is based on the number of papers to be clustered and the similarity of a given paper to previously clustered papers. Thus, a researcher with many papers but few interests can still end up with fewer clusters than a researcher with fewer papers but more interests. For example, our evaluation includes two researchers, one with 53 papers and one with 32; the DPMM resulted in five and six clusters, respectively. After clustering, we created a pseudo-researcher for each cluster using Equation ( 1 ), one that can still be tied back to the original researcher. The recommendation system uses these pseudo-researchers in its similarity calculations along the same lines as described above. Further, the α parameter was tuned to control the number of clusters ( 22 ); we describe the tuning of α in Section 3.4.
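The paper's clustering is a collapsed Gibbs sampler for the DPMM ( 22 ). As a runnable stand-in rather than a reimplementation of that sampler, the sketch below approximates the idea with scikit-learn's truncated Dirichlet-process mixture over SVD-reduced TF-IDF vectors; the truncation level, the SVD dimensionality and the two-paper cluster filter are our assumptions.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.mixture import BayesianGaussianMixture


def cluster_papers(paper_vectors, alpha):
    """Group a researcher's papers into interest clusters with a
    truncated DP mixture; low-weight components end up empty, playing
    the role of eliminated clusters."""
    n_papers = paper_vectors.shape[0]
    # Dense, low-dimensional representation of the TF-IDF rows.
    dense = TruncatedSVD(
        n_components=max(2, min(20, n_papers - 1))
    ).fit_transform(paper_vectors)
    dpmm = BayesianGaussianMixture(
        n_components=min(15, n_papers),            # truncation level, not a fixed k
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=alpha,          # the tunable alpha of Section 3.4
        random_state=0,
    )
    labels = dpmm.fit_predict(dense)
    # Keep only clusters with two or more papers, as in the paper.
    sizes = np.bincount(labels)
    return [np.where(labels == c)[0] for c in range(len(sizes)) if sizes[c] >= 2]
```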

Text normalization : Text normalization plays an important role in improving the performance of any NLP system, and we implemented text normalization to improve the efficiency of the proposed clustering algorithm. We normalized similar words by grouping them together and replacing each with the most frequent word in its group; for example, HIV, HIV-1, HIV/AIDS and AIDS were all replaced with the most frequent word, HIV . To identify similar words, we trained a word2vec model using Gensim (https://radimrehurek.com/gensim/) on articles from PubMed. Because the datasets relate to gene expression, while PubMed articles span a wide variety of topics in biomedicine and the life sciences (some highly unrelated to the type of information in GEO), not all articles are suitable for building the word embedding for this study. Articles from before 1998 were removed, as research on microarray data started around that year ( 23 ). Publications related to GEO were filtered using MeSH terms, and we developed a MeSH term classification system for publications without MeSH terms. More details on filtering GEO-related publications can be found in ( 11 ).

Similar words were identified with word2vec's most_similar function, considering only the top five similar words for each word. The normalized text was used for clustering. Initial experiments showed that text normalization improved clustering and reduced the number of clusters produced by DPMM.
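A sketch of this normalization step with Gensim's Word2Vec; the toy sentences and the exact grouping rule are assumptions (note that model.wv.index_to_key is already sorted by corpus frequency, which gives the "most frequent word wins" behavior):

```python
from gensim.models import Word2Vec

# Toy tokenized sentences standing in for the GEO-related PubMed corpus.
sentences = [["hiv", "infection", "t", "cell"], ["hiv-1", "viral", "load"]]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)


def normalization_map(model):
    """Map each word's top-5 neighbours to the most frequent word of the
    group, e.g. hiv-1 -> hiv. Grouping details are a simplification."""
    mapping = {}
    for word in model.wv.index_to_key:             # most frequent first
        if word in mapping:
            continue
        for neighbour, _ in model.wv.most_similar(word, topn=5):
            mapping.setdefault(neighbour, word)    # replace with frequent form
    return mapping
```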

Dataset recommendation

The most similar datasets can be recommended to a researcher simply by comparing the cosine similarity of the researcher and dataset vectors using Equation ( 3 ):

|$\mathrm{score}(r, d) = \cos(v_r, v_d), \quad \forall\, d \in D \quad (3)$|

where D is the set of all datasets that can be recommended to researcher r , and |$\cos(v_r, v_d)$| is the cosine similarity between the researcher vector ( v r ) and a dataset vector ( v d ).
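A minimal sketch of Equation (3) as a top-n ranking, assuming a dense researcher (or cluster) vector and a sparse or dense dataset matrix:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def recommend(researcher_vec, dataset_vectors, dataset_ids, top_n=10):
    """Equation (3): rank all datasets by cosine similarity to the
    researcher (or cluster) vector and return the top-n."""
    scores = cosine_similarity(
        np.asarray(researcher_vec).reshape(1, -1), dataset_vectors
    ).ravel()
    order = np.argsort(scores)[::-1][:top_n]
    return [(dataset_ids[i], float(scores[i])) for i in order]
```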

The high-level architecture of the dataset recommendation system is shown in Figure 4. The system is initiated by a researcher (user) submitting his/her name and CV (or list of publications). The name is searched in PubMed for publication details, and the titles of publications from PubMed are matched with the publication titles in the CV. The matched publications are then clustered using DPMM to identify the researcher's research fields. Finally, the top similar datasets are recommended using the cosine similarity between the researcher vector (or researcher's cluster vector) and the dataset vectors, where the researcher vector (or cluster vector) is calculated using Equation ( 1 ).

Three dataset recommendation systems are evaluated in this article: a baseline method using the single-vector researcher representation, and two proposed methods using the cluster-based researcher representation.

Baseline system

The baseline system builds the researcher's vector using Equation ( 1 ), as described in Section 3.2.1. The top datasets are recommended after calculating the cosine similarity between the researcher's vector and the dataset vectors. This system reflects only one research field for each researcher.

MIDR System

The cluster vectors are generated using a modified Equation ( 1 ): cluster-specific research-area vectors are created for each researcher, instead of a single vector per researcher as in the baseline system. The papers in a single cluster are multiplied by their recency factors and summed, and the sum is then divided by the number of papers in that cluster.

This system uses multiple pseudo vectors for multiple clusters of a researcher ( ⁠|$v_{c_i}$| for i th cluster), indicating different research fields that a researcher might have, as mentioned in Section 3.2.2 .

This system compares each cluster vector with the dataset vectors and recommends the top datasets by cosine similarity. Finally, it merges the recommended datasets of all clusters in a round-robin fashion, so that the researcher sees datasets related to different research fields together.
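A minimal sketch of such a round-robin merge, assuming each per-cluster list holds (dataset_id, score) pairs already sorted by score:

```python
from itertools import zip_longest


def round_robin_merge(per_cluster_lists):
    """Interleave the top datasets recommended for each cluster so that
    every research field is represented near the top of the merged list."""
    merged, seen = [], set()
    for tier in zip_longest(*per_cluster_lists):   # one item per cluster, per round
        for item in tier:
            if item is not None and item[0] not in seen:  # de-duplicate by id
                seen.add(item[0])
                merged.append(item)
    return merged
```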

MIDR System (Separate)

This system is an extension of the proposed MIDR system. Some researchers liked the way the recommended datasets were merged; others, however, wanted dataset recommendations for each cluster separately. For this reason, another variant was developed in which the recommended datasets are shown separately for each research cluster, allowing researchers to obtain different recommended datasets for different research interests.

Table 2. Number of clusters for varying α values and for the proposed α, based on our initial evaluation. Abbreviations: P: proposed; a: total number of clusters; b: number of clusters containing more than one paper; c: number of clusters containing only one paper

Each cell gives the number of clusters (a, b, c); the proposed α value is shown in parentheses in the last column.

Researcher | α = 0.3 | α = 1.0 | α = 2.0 | α = 3.0 | α = 10.0 | α = P
Researcher 1 | 16, 9, 7 | 8, 7, 1 | 6, 3, 3 | 5, 2, 3 | 3, 2, 1 | 8, 5, 3 (1.37)
Researcher 2 | 15, 10, 5 | 10, 8, 2 | 8, 6, 2 | 7, 5, 2 | 4, 2, 2 | 8, 6, 2 (1.77)
Researcher 3 | 15, 8, 7 | 9, 4, 5 | 3, 3, 0 | 6, 2, 4 | 2, 1, 1 | 6, 3, 3 (1.44)
Researcher 4 | 15, 4, 11 | 10, 5, 5 | 6, 4, 2 | 5, 3, 2 | 4, 1, 3 | 7, 4, 3 (2.13)
Researcher 5 | 5, 2, 3 | 5, 2, 3 | 4, 3, 1 | 5, 2, 3 | 3, 3, 0 | 5, 2, 3 (3.8)
Researcher 6 | 11, 10, 1 | 6, 5, 1 | 3, 3, 0 | 2, 2, 0 | 2, 1, 1 | 5, 5, 0 (1.14)
Researcher 7 | 7, 3, 4 | 5, 4, 1 | 3, 3, 0 | 2, 2, 0 | 2, 2, 0 | 2, 2, 0 (2.77)
Researcher 8 | 16, 15, 1 | 6, 5, 1 | 8, 4, 4 | 5, 2, 3 | 4, 1, 3 | 7, 6, 1 (0.72)
Researcher 9 | 23, 23, 0 | 9, 8, 1 | 6, 6, 0 | 6, 3, 3 | 3, 1, 2 | 14, 14, 0 (0.59)
Researcher 10 | 20, 11, 9 | 12, 10, 2 | 9, 8, 1 | 6, 6, 0 | 2, 2, 0 | 9, 8, 1 (1.36)

Tuning the α parameter

A researcher with a higher number of publications is more likely to have more research interests. In this paper, research interests are represented as clusters, expressed as vectors. A Dirichlet process is non-parametric because, in theory, there can be an infinite number of clusters, and by changing the α parameter, DPMM can vary the number of clusters it produces. In our setting, the α value is inversely related to the number of clusters, i.e. decreasing α may increase the number of output clusters. We therefore propose an α value that is also inversely related to the number of research publications. Further, the α value must stabilize after a certain threshold to avoid the formation of too many clusters, and it must generalize across different numbers of publications. To this end, α is calculated as follows:

where N is the total number of papers for a researcher. The proposed α is based on manually observing the clusters and collecting feedback from different researchers. Apart from satisfying the inherent requirements for setting α , Equation ( 4 ) maintains a reasonable number of clusters, which most of the evaluators found useful.

Different α values and their corresponding numbers of clusters are provided in Table 2. The number of clusters is reported in three categories: (a) the total number of clusters, (b) the number of clusters containing more than one paper and (c) the number of clusters containing only one paper. We removed the clusters with one paper and used the clusters with two or more papers for recommending datasets. We observed that the number of clusters did not depend solely on the number of papers a researcher had; rather, it largely reflected the number of research fields in which the researcher participated. For example, Researcher 2 had fewer publications than Researcher 1 and Researcher 3, but more clusters than either. This suggests that non-parametric clustering is a good technique for segmenting research areas.

No labeled publication-cluster dataset exists for automatic evaluation, and manually evaluating the clusters is a time- and resource-consuming task that may also be biased, as it depends on different judgments for different researchers. Thus, we implemented K-Means as a point of comparison for the proposed DPMM. The automatic cluster comparison was performed using intra-cluster cosine similarity (IACCS) and inter-cluster cosine similarity (ICCS), computed separately over the words and the MeSH terms of the publications. IACCS is the mean cosine similarity of the words or MeSH terms over each pair of papers in a given cluster. Considering a cluster of size n ( ⁠|$X=\{x_1, x_2, \dots x_n\}$|⁠ ), the IACCS can be formulated using Equation ( 5 ):

|$\mathrm{IACCS}(X) = \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j=i+1}^{n} \cos(x_i, x_j) \quad (5)$|

where x i and x j are the lists of MeSH terms or words of the i th and j th papers, respectively, and |$\cos(x_i, x_j)$| is the cosine similarity between them. Finally, the mean IACCS was calculated from the IACCS values of the individual clusters.

To calculate the inter-cluster cosine similarity (ICCS), we computed the mean cosine similarity between the words or MeSH terms of papers in different clusters. Considering n clusters ( ⁠|$c_1, c_2, \dots c_n$|⁠ ), ICCS can be formulated using Equation ( 6 ):

|$\mathrm{ICCS} = \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j=i+1}^{n} \cos(c_i, c_j) \quad (6)$|

where c i and c j are the lists of MeSH terms or words of all the papers in the i th and j th clusters, respectively, and |$\cos(c_i, c_j)$| is the cosine similarity between them.
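A sketch of Equations (5) and (6), assuming each paper, and each whole cluster, is represented as a 1 × d bag-of-words or MeSH-term row vector (e.g. a TF-IDF row):

```python
from itertools import combinations

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def mean_pairwise_cos(vectors):
    """Mean cosine similarity over all pairs in a list of 1 x d vectors."""
    pairs = list(combinations(range(len(vectors)), 2))
    if not pairs:
        return 0.0
    return float(np.mean(
        [cosine_similarity(vectors[i], vectors[j])[0, 0] for i, j in pairs]
    ))


def mean_iaccs(clusters):
    """Equation (5): intra-cluster similarity, averaged over clusters."""
    return float(np.mean([mean_pairwise_cos(c) for c in clusters]))


def iccs(cluster_vectors):
    """Equation (6): mean similarity between whole-cluster term vectors."""
    return mean_pairwise_cos(cluster_vectors)
```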

For the baseline comparison, publication vectors were created using TF-IDF, and K-Means was used to compute the publication clusters. K-Means is a parametric unsupervised clustering algorithm; we ran it with two and five clusters separately for comparison purposes. In contrast, the tuning parameter proposed for DPMM resulted in a variable number of clusters for different researchers, and these clusters were used for comparison.

Recommendation system

Table 3. Mean IACCS and ICCS for K-Means and DPMM (with the cluster sizes given in Table 2)

Each cell gives IACCS, ICCS.

Researcher | K-Means (k = 2), Words | K-Means (k = 2), MeSH terms | K-Means (k = 5), Words | K-Means (k = 5), MeSH terms | DPMM, Words | DPMM, MeSH terms
Researcher 1 | 0.14, 0.37 | 0.12, 0.56 | 0.20, 0.26 | 0.12, 0.27 | 0.22, 0.19 | 0.20, 0.32
Researcher 2 | 0.09, 0.37 | 0.09, 0.42 | 0.14, 0.16 | 0.14, 0.23 | 0.22, 0.11 | 0.11, 0.12
Researcher 3 | 0.16, 0.43 | 0.16, 0.56 | 0.16, 0.25 | 0.16, 0.29 | 0.24, 0.22 | 0.17, 0.37
Researcher 4 | 0.12, 0.24 | 0.09, 0.51 | 0.21, 0.11 | 0.15, 0.20 | 0.20, 0.10 | 0.16, 0.15
Researcher 5 | 0.16, 0.13 | 0.14, 0.19 | 0.18, 0.04 | 0.11, 0.08 | 0.45, 0.08 | 0.28, 0.13
Researcher 6 | 0.17, 0.54 | 0.19, 0.64 | 0.20, 0.31 | 0.22, 0.36 | 0.20, 0.18 | 0.23, 0.19
Researcher 7 | 0.33, 0.14 | 0.47, 0.30 | 0.32, 0.17 | 0.45, 0.30 | 0.34, 0.14 | 0.47, 0.30
Researcher 8 | 0.10, 0.61 | 0.16, 0.76 | 0.17, 0.31 | 0.21, 0.60 | 0.17, 0.25 | 0.21, 0.43
Researcher 9 | 0.07, 0.55 | 0.09, 0.64 | 0.09, 0.40 | 0.11, 0.54 | 0.23, 0.13 | 0.14, 0.19
Researcher 10 | 0.07, 0.31 | 0.15, 0.57 | 0.11, 0.19 | 0.23, 0.33 | 0.17, 0.10 | 0.20, 0.24

As publication-driven dataset recommendation is a novel task, no prior ground-truth annotations exist. We therefore performed a manual evaluation of each dataset recommendation system, asking researchers to rate each retrieved dataset against their publications or publication clusters. The researchers included in this study had already worked on GEO datasets and published papers on them. The rating criterion was how likely they would want to work with the retrieved dataset, rated from one to three 'stars', with three stars being the highest score. Normalized discounted cumulative gain at 10 (NDCG@10) and precision at 10 (P@10) were then calculated to evaluate the different systems. The ratings are:

1 star [not relevant] : This dataset is not useful at all.

2 stars [partially relevant] : This dataset is partially relevant to the publication cluster. The researcher has already used this dataset or may work on it in the future.

3 stars [most relevant] : This dataset is most relevant to the publication cluster, and the researcher would want to work on it as soon as possible.

The primary evaluation metric used in this work is NDCG, a family of ranking measures widely used in IR applications. It has two advantages over many other measures. First, NDCG allows each retrieved document to have a graded relevance, while most traditional ranking measures only allow binary relevance (each document is either relevant or not); this lets our three-point scale be incorporated directly into the metric. Second, NDCG applies a discount function over the rank, while many other measures weight all positions uniformly; this is particularly important for search engines, as users care about top-ranked documents much more than others ( 24 ). NDCG is calculated as follows:

|$\mathrm{DCG@}p = \sum_{i=1}^{p} \frac{\mathrm{rating}(i)}{\log_2(i+1)}, \qquad \mathrm{NDCG@}p = \frac{\mathrm{DCG@}p}{\mathrm{IDCG@}p}$|

where rating ( i ) is the user-provided rating of the i th recommended dataset and IDCG@ p is the DCG@ p of the ideal (best possible) ordering. For the present study, we set p  = 10 for the simplicity of manual annotation.

The NDCG@10 for the baseline and MIDR systems is calculated from the ratings of the top ten retrieved datasets. For the MIDR (separate) system, there are multiple publication clusters per user, with datasets recommended separately for each cluster; NDCG@10 was calculated per cluster from its top ten datasets and then averaged to get a final NDCG@10 for the researcher. For the NDCG@10 calculation, 1-star, 2-star and 3-star ratings were converted to 0, 1 and 2, respectively. We also calculated P@10 (strict and partial) for the baseline and proposed systems: strict counts only 3-star results, while partial counts both 2- and 3-star results. The results presented in this study were evaluated by a total of five researchers (with an average of 32 publications) who had already worked on GEO datasets. This is admittedly a small sample size, but it is large enough to draw coarse comparisons on this novel task.
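A sketch of the two metrics; the log2 discount is the common NDCG form, which we assume matches the paper's formulation, and the all-zero convention mirrors the behavior discussed in the error analysis below:

```python
import numpy as np


def ndcg_at_k(gains, k=10):
    """NDCG@k over star ratings already mapped to gains 0/1/2."""
    g = np.asarray(gains[:k], dtype=float)
    discounts = np.log2(np.arange(2, g.size + 2))  # log2(i+1) for i = 1..k
    dcg = float((g / discounts).sum())
    idcg = float((np.sort(g)[::-1] / discounts).sum())
    return dcg / idcg if idcg > 0 else 1.0  # all-zero lists score 1 by convention


def precision_at_k(stars, k=10, strict=True):
    """P@k: strict counts only 3-star results; partial counts 2- and 3-star."""
    top = stars[:k]
    hits = [s for s in top if (s == 3 if strict else s >= 2)]
    return len(hits) / len(top)
```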

We compared DPMM clustering with K-Means as described in Section 3.5; the mean IACCS and ICCS values for the different clustering methods are presented in Table 3. In general, a higher mean IACCS and a lower ICCS indicate better clustering. However, this is not always the case, especially when the number of clusters is small and each cluster contains many publications: the IACCS of an individual cluster decreases as it is divided by the larger number of publication pairs in that cluster. Furthermore, DPMM and K-Means were comparable when they produced similar numbers of clusters. In all cases, DPMM had a higher mean IACCS and a lower ICCS than K-Means on words, which suggests that DPMM is well suited to clustering a researcher's publications into multiple research fields.

Researcher-specific results of the dataset recommendation systems are shown in Table 4, which lists the results of all systems for individual researchers along with metric-specific averages. The baseline system did not have any publication clusters, and all publications were vectorized together using Equation ( 1 ); its top ten most similar datasets were used for evaluation. The baseline obtained average NDCG@10, P@10 (P) and P@10 (S) of 0.80, 0.69 and 0.45, respectively.

The proposed MIDR system obtained average NDCG@10, P@10 (P) and P@10 (S) of 0.89, 0.78 and 0.61, respectively. The proposed MIDR (separate) system obtained average NDCG@10, P@10 (P) and P@10 (S) of 0.62, 0.45 and 0.31, respectively. For the MIDR (separate) system, NDCG@10 and P@10 were calculated for each cluster individually and then averaged over the total number of clusters.

Table 4. NDCG@10 and partial and strict P@10 values of the different dataset recommendation systems, based on the five evaluators. Abbreviations: P: partial; S: strict

Each cell gives NDCG@10, P@10 (P), P@10 (S).

Researcher | Baseline | MIDR | MIDR (separate)
Researcher 1 | 0.82, 0.80, 0.30 | 0.92, 0.90, 0.40 | 0.74, 0.56, 0.32
Researcher 2 | 0.81, 0.76, 0.50 | 0.95, 0.80, 0.70 | 0.52, 0.38, 0.20
Researcher 3 | 0.76, 0.60, 0.28 | 0.80, 0.60, 0.34 | 0.60, 0.25, 0.13
Researcher 4 | 0.78, 0.48, 0.36 | 0.80, 0.60, 0.60 | 0.48, 0.34, 0.22
Researcher 5 | 0.85, 0.80, 0.80 | 1.00, 1.00, 1.00 | 0.78, 0.70, 0.70
Average | 0.80, 0.69, 0.45 | 0.89, 0.78, 0.61 | 0.62, 0.45, 0.31

The proposed MIDR system performed better than the baseline system. The MIDR system recommended a variety of datasets spanning multiple clusters/research fields, whereas the baseline system recommended datasets only from the single research field with the maximum number of publications.

The performances of the baseline and the proposed MIDR (separate) systems cannot be directly compared: evaluation of the MIDR (separate) system was performed over multiple clusters, with ten datasets recommended per cluster, while evaluation of the baseline system was performed on only 10 datasets. For Researcher 1 in Table 4, for example, the baseline and MIDR (separate) systems were evaluated on 10 and 50 datasets, respectively. The MIDR (separate) system also offers advantages over the baseline irrespective of the baseline's higher NDCG@10: the baseline's bias toward a specific research field is eliminated in the MIDR (separate) system. For Researchers 1 and 2 in Table 4, the datasets recommended by the baseline were found within the results of two clusters/research fields (those with the most publications) in the MIDR (separate) system; for Researcher 3, the baseline's recommended datasets were found within the results of only one research field (the one with the most publications).

For Researcher 1 in Table 4, there were 31 papers with HIV keywords, none of them published recently. We penalized papers by year of publication in all methods, yet the top datasets for the baseline method still contained 'HIV' or related keywords; we manually checked the top 100 results and found they were all relevant to HIV . In contrast, the proposed MIDR system clustered the publications into different groups (such as HIV , Flu/Influenza and others), which produced recommendations for different research fields. Researcher 1 therefore had the flexibility to choose datasets after looking at the preferred clusters in the proposed MIDR or MIDR (separate) system.

Similarly, the results of the MIDR and MIDR (separate) systems cannot be directly compared. Evaluation of the MIDR system was based on 10 datasets recommended per researcher, whereas evaluation of the MIDR (separate) system was based on 10 recommended datasets per research field (cluster), which can amount to more than 10 datasets when a researcher has more than one research field. Hence, the NDCG@10 and P@10 scores of the MIDR (separate) system were lower than those of the MIDR system.

For researchers looking for specific types of datasets, a keyword-based IR system may be more useful. Researchers who generally want to find datasets related to their interests, but do not have a particular dataset in mind, can benefit from our system; for instance, if a researcher wants regular updates on datasets relevant to their interests, our method is better suited. However, the proposed system may not be useful to early-stage researchers with few publications; they can instead take advantage of available dataset retrieval systems such as DataMed, Omicseq and Google Dataset Search, or the text-based dataset search that we provide on the website.

Error analysis

For some clusters, evaluators rated all recommended datasets as one star. In most of these cases, we observed that the research field of that cluster was out of the scope of GEO. In such cases, the NDCG@10 score was close to 1 while the P@10 score was 0, which may be one reason why NDCG@10 scores were much higher than P@10 scores overall.

Figure 5. Screenshots of the dataset recommendation system for Researcher 1 (top) and Researcher 2 (bottom)

Initially, we did not identify whether or not a cluster was related to GEO, and we recommended datasets even for unrelated clusters. For example, Researcher 2 in Table 4 had a paper cluster related to statistical image analysis; for this cluster, Researcher 2 rated all the recommended datasets as one star, which reduced the scores of the systems.

Later, we identified a threshold by averaging the similarity scores of publications and datasets for each cluster, which allowed us to remove the clusters that were not related to GEO. The threshold was set to 0.05 for the present study, i.e. a cluster was not considered for evaluation or recommendation if the average similarity score of its top 10 datasets was less than or equal to 0.05. This thresholding improved the results of the proposed systems by 3% for Researcher 2. However, a thorough investigation of the threshold involving datasets from different biomedical domains is left for future work.
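A minimal sketch of this threshold filter; the data layout (a list of (cluster_id, recommendations) pairs) is an assumption:

```python
def keep_geo_related(clusters_with_recs, threshold=0.05):
    """Drop clusters whose top-10 recommendations have a mean similarity
    score <= threshold, treating them as unrelated to GEO."""
    kept = []
    for cluster_id, recs in clusters_with_recs:  # recs: [(dataset_id, score), ...]
        top10 = [score for _, score in recs[:10]]
        if top10 and sum(top10) / len(top10) > threshold:
            kept.append((cluster_id, recs))
    return kept
```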

Further, a dislike button may be provided for each cluster, which users can press when the cluster is not related to GEO datasets. This information can later be used to build a machine learning-based system that identifies and removes such clusters from further processing, improving the usefulness and reducing the time complexity of the proposed recommendation system.

Limitations

Researchers' names are searched in PubMed to collect their publications, but many recent conference/journal publications are not indexed in PubMed. Further, if most of a researcher's publications do not belong to the biomedical domain, there is little chance of finding those papers in PubMed, which makes the dataset recommendation task harder. Authors might later be able to include a subset of their non-PubMed articles for consideration in dataset recommendation (e.g. bioRxiv preprints), but this work is currently limited to PubMed publications.

We used a PubMed name search to find the titles of a researcher's papers, and the titles were then matched against the text of the CV to obtain the publications. If there is a typo in the CV, the corresponding publication is rejected from further processing: since we do not fully parse the CV, but just perform string matching to find publications, there is a high chance of rejecting publications with small typos.

The manual evaluation was performed by only five researchers. With 10 datasets recommended per cluster, each researcher had to evaluate an average of 40 datasets, which was time-consuming. The manual evaluation also required human judges with expertise on GEO datasets, who were challenging to find. Further research will entail scaling up this evaluation process.

GETc Platform

We developed the GETc research platform, which recommends datasets to researchers using the proposed methods. A researcher needs to provide his/her name (as it appears in PubMed) and CV (or list of publications) on the website; after processing the publications collected from PubMed, the system recommends datasets from GEO. Researchers can provide feedback on the recommended datasets based on the evaluation criteria mentioned in Section 3.5. A screenshot of the dataset recommendation system is shown in Figure 5. The platform also recommends datasets for free text/documents, where the cosine similarity between the text and the datasets is calculated and high-scoring datasets are recommended. Apart from dataset recommendation, it can also recommend literature and collaborators for each dataset, and it analyzes time-course datasets using a specialized analysis pipeline (http://genestudy.org/pipeline) ( 25 ). We believe these functions, implemented in the GETc platform, will significantly improve the reusability of datasets.

This work is a first step toward a dataset recommendation tool that connects researchers to relevant datasets they may not otherwise be aware of. The best NDCG@10, P@10 (P) and P@10 (S) scores of 0.89, 0.78 and 0.61 were achieved by the proposed MIDR method, based on five evaluators. This recommendation system will hopefully lead to greater biomedical data reuse and improved scientific productivity. Similar dataset recommendation systems can be developed for other repositories, in both the biomedical and other domains.

The next goal is to identify the clusters that are not related to GEO datasets but were still used for recommendations in the present article, so that they can be removed from further experiments. Later, we plan to implement other embedding methods and to test the dataset recommendation system on a large number of users. A user-specific, feedback-based system can be developed to remove datasets from the recommendations. Several additional dataset repositories can be added in the future, and other APIs can be integrated to retrieve a more complete representation of a researcher's publication history.

Availability:   http://genestudy.org/recommends/#/

We thank Drs. H.M., J.T.C., A.G. and W.J.Z. for their help in evaluating the results and for comments on the design that greatly improved the GETc research platform.

This project is mainly supported by the Center for Big Data in Health Sciences (CBD-HS) at the School of Public Health, University of Texas Health Science Center at Houston (UTHealth) and partially supported by the Cancer Prevention and Research Institute of Texas (CPRIT) project RP170668 (K.R., H.W.) as well as the National Institutes of Health (NIH) (grant R00LM012104) (K.R.).

None declared.

1. Chen X. et al. (2018) DataMed — an open source discovery index for finding biomedical datasets. Journal of the American Medical Informatics Association, 25, 300–308. doi:10.1093/jamia/ocx121

2. Roberts K. et al. (2017) Information retrieval for biomedical datasets: the 2016 bioCADDIE dataset retrieval challenge. Database, 2017, 1–9. doi:10.1093/database/bax068

3. Cohen T. et al. (2017) A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge. Database, 2017, 1–10. doi:10.1093/database/bax061

4. Karisani P., Qin Z.S. and Agichtein E. (2018) Probabilistic and machine learning-based retrieval approaches for biomedical dataset retrieval. Database, 2018, 1–12. doi:10.1093/database/bax104

5. Wright T.B., Ball D. and Hersh W. (2017) Query expansion using MeSH terms for dataset retrieval: OHSU at the bioCADDIE 2016 dataset retrieval challenge. Database, 2017, 1–9. doi:10.1093/database/bax065

6. Scerri A. et al. (2017) Elsevier's approach to the bioCADDIE 2016 dataset retrieval challenge. Database, 2017, 1–12. doi:10.1093/database/bax056

7. Wei W. et al. (2018) Finding relevant biomedical datasets: the UC San Diego solution for the bioCADDIE retrieval challenge. Database, 2018, 1–10. doi:10.1093/database/bay017

8. Sun X. et al. (2017) Omicseq: a web-based search engine for exploring omics datasets. Nucleic Acids Research, 45, W445–W452. doi:10.1093/nar/gkx258

9. Jansen B.J. et al. (2007) Determining the user intent of web search engine queries. In: Proceedings of the 16th International Conference on World Wide Web. ACM, Banff, Alberta, Canada, pp. 1149–1150.

10. Achakulvisut T. et al. (2016) Science Concierge: a fast content-based recommendation system for scientific publications. PLoS ONE, 11, e0158423. doi:10.1371/journal.pone.0158423

11. Patra B.G. et al. (2020) A content-based literature recommendation system for datasets to improve data reusability: a case study on Gene Expression Omnibus (GEO) datasets. Journal of Biomedical Informatics, 104, 103399. doi:10.1016/j.jbi.2020.103399

12. Sansone S.-A. et al. (2017) DATS, the data tag suite to enable discoverability of datasets. Scientific Data, 4, 170059. doi:10.1038/sdata.2017.59

13. Ellefi M.B. et al. (2016) Dataset recommendation for data linking: an intensional approach. In: European Semantic Web Conference. Springer, Heraklion, Crete, Greece, pp. 36–51.

14. Nunes B.P. et al. (2013) Combining a co-occurrence-based and a semantic measure for entity linking. In: Extended Semantic Web Conference. Springer, Montpellier, France, pp. 548–562.

15. Srivastava K.S. (2018) Predicting and recommending relevant datasets in complex environments. US Patent App. 15/721,122.

16. Ghavimi B. et al. (2016) Identifying and improving dataset references in social sciences full texts. In: Positioning and Power in Academic Publishing: Players, Agents and Agendas. IOS Press, pp. 105–114. doi:10.3233/978-1-61499-649-1-105 (arXiv preprint arXiv:1603.01774)

17. Piwowar H.A. and Chapman W.W. (2008) Identifying data sharing in biomedical literature. In: AMIA Annual Symposium Proceedings, Vol. 2008. American Medical Informatics Association, Washington, D.C., USA, p. 596.

18. Prasad A. et al. (2019) Dataset mention extraction and classification. In: Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications. Association for Computational Linguistics, Minneapolis, Minnesota, USA, pp. 31–36.

19. Li Z., Li J. and Yu P. (2018) GEOMetaCuration: a web-based application for accurate manual curation of Gene Expression Omnibus metadata. Database, 2018, 1–8. doi:10.1093/database/bay019

20. Chen G. et al. (2019) Restructured GEO: restructuring Gene Expression Omnibus metadata for genome dynamics analysis. Database, 2019, 1–8. doi:10.1093/database/bay145

21. Neal R.M. (2000) Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9, 249–265.

22. Yin J. and Wang J. (2016) A model-based approach for text clustering with outlier detection. In: Proceedings of the 2016 IEEE 32nd International Conference on Data Engineering (ICDE). IEEE, Helsinki, Finland, pp. 625–636.

23. Lenoir T. and Giannella E. (2006) The emergence and diffusion of DNA microarray technology. Journal of Biomedical Discovery and Collaboration, 1, 11. doi:10.1186/1747-5333-1-11

24. Wang Y. et al. (2013) A theoretical analysis of NDCG ranking measures. In: 26th Annual Conference on Learning Theory (COLT 2013), Vol. 8. PMLR, Princeton, NJ, USA.

25. Carey M. et al. (2018) A big data pipeline: identifying dynamic gene regulatory networks from time-course Gene Expression Omnibus data with applications to influenza infection. Statistical Methods in Medical Research, 27, 1930–1955. doi:10.1177/0962280217746719


Citation details: Patra, B.G., Roberts, K. and Wu, H. A content-based dataset recommendation system for researchers — a case study on Gene Expression Omnibus (GEO) repository. Database (2020) Vol. 00: article ID baaa064; doi:10.1093/database/baaa064



How To Make Recommendation in Case Study (With Examples)

After analyzing your case study’s problem and suggesting possible courses of action , you’re now ready to conclude it on a high note. 

But first, you need to write your recommendation to address the problem. In this article, we will guide you on how to make a recommendation in a case study. 

Table of Contents

  • What Is Recommendation in Case Study?
  • What Is the Purpose of Recommendation in the Case Study?
  • How To Write Recommendation in Case Study
    1. Review Your Case Study's Problem
    2. Assess Your Case Study's Alternative Courses of Action
    3. Pick Your Case Study's Best Alternative Course of Action
    4. Explain in Detail Why You Recommend Your Preferred Course of Action
  • Examples of Recommendations in Case Study
  • Tips and Warnings

What Is Recommendation in Case Study?

The Recommendation details your most preferred solution for your case study’s problem.

After identifying and analyzing the problem, your next step is to suggest potential solutions. You did this in the Alternative Courses of Action (ACA) section. Once you're done writing your ACAs, you need to pick the best among them. The chosen course of action is the one you will present in the Recommendation section.

The Recommendation portion also provides a thorough justification for selecting your most preferred solution. 

Notice how a recommendation in a case study differs from a recommendation in a research paper . In the latter, the recommendation tells your reader some potential studies that can be performed in the future to support your findings or to explore factors that you’re unable to cover. 

What Is the Purpose of Recommendation in the Case Study?

Your main goal in writing a case study is not only to understand the case at hand but also to think of a feasible solution. However, there are multiple ways to approach an issue. Since it’s impossible to implement all these solutions at once, you only need to pick the best one. 

The Recommendation portion tells the readers which among the potential solutions is best to implement given the constraints of an organization or business. This section allows you to introduce, defend, and explain this optimal solution. 

How To Write Recommendation in Case Study

1. Review Your Case Study's Problem

You cannot recommend a solution if you are unable to grasp your case study’s issue. Make sure that you’re aware of the problem as well as the viewpoint from which you want to analyze it . 

2. Assess Your Case Study's Alternative Courses of Action

Once you’ve fully grasped your case study’s problem, it’s time to suggest some feasible solutions to address it. A separate section of your manuscript called the Alternative Courses of Action (ACA) is dedicated to discussing these potential solutions. 

Afterward, you need to evaluate each ACA by identifying its respective advantages and disadvantages. 

3. Pick Your Case Study's Best Alternative Course of Action

After evaluating each proposed ACA, pick the one you’ll recommend to address the problem. All alternatives have their pros and cons so you must use your discretion in picking the best among these ACAs.

To help you decide which ACA to pick, here are some factors to consider:

  • Realistic : The organization must have sufficient knowledge, expertise, resources, and manpower to execute the recommended solution. 
  • Economical: The recommended solution must be cost-effective.
  • Legal: The recommended solution must adhere to applicable laws.
  • Ethical: The recommended solution must not have moral repercussions. 
  • Timely: The recommended solution can be executed within the expected timeframe. 

You may also use a decision matrix to assist you in picking the best ACA 1 .  This matrix allows you to rank the ACAs based on your criteria. Please refer to our examples in the next section for an example of a Recommendation formed using a decision matrix. 

4. Explain in Detail Why You Recommend Your Preferred Course of Action

Provide your justifications for why you recommend your preferred solution. You can also explain why other alternatives are not chosen 2 .  

Examples of Recommendations in Case Study

To help you understand how to make recommendations in a case study, let’s take a look at some examples below.

Example 1

Case Study Problem: Lemongate Hotel is facing an overwhelming increase in the number of reservations due to a sudden implementation of a Local Government policy that boosts the city's tourism. Although Lemongate Hotel has sufficient space to accommodate the influx of tourists, the management is wary of a potential decline in the hotel's quality of service while striving to meet the sudden increase in reservations.

Alternative Courses of Action:

  • ACA 1: Relax hiring qualifications to employ more hotel employees to ensure that sufficient human resources can provide quality hotel service
  • ACA 2: Increase hotel reservation fees and other costs as a response to the influx of tourists demanding hotel accommodation
  • ACA 3: Reduce privileges and hotel services enjoyed by each customer so that hotel employees will not be overwhelmed by the increase in accommodations.

Recommendation: 

Upon analysis of the problem, it is recommended to implement ACA 1. Among all suggested ACAs, this option is the easiest to execute with the minimal cost required. It will not also impact potential profits and customers’ satisfaction with hotel service.

Meanwhile, implementing ACA 2 might discourage customers from making reservations due to higher fees and look for other hotels as substitutes. It is also not recommended to do ACA 3 because reducing hotel services and privileges offered to customers might harm the hotel’s public reputation in the long run. 

The first paragraph of our sample recommendation specifies what ACA is best to implement and why.

Meanwhile, the succeeding paragraphs explain that ACA 2 and ACA 3 are not optimal solutions due to some of their limitations and potential negative impacts on the organization. 

Example 2 (with Decision Matrix)

Case Study: Last week, Pristine Footwear released its newest sneakers model for women – “Flightless.” However, the management noticed that “Flightless” had a mediocre sales performance in the previous week. For this reason, “Flightless” might be pulled out in the next few months.  The management must decide on the fate of “Flightless” with Pristine Footwear’s financial performance in mind. 

  • ACA 1: Revamp “Flightless” marketing by hiring celebrities/social media influencers to promote the product
  • ACA 2: Improve the “Flightless” current model by tweaking some features to fit current style trends
  • ACA 3: Sell “Flightless” at a lower price to encourage more customers
  • ACA 4: Stop production of “Flightless” after a couple of weeks to cut losses

Decision Matrix

Recommendation

Based on the decision matrix above 3 , the best course of action for Pristine Footwear is ACA 3, selling "Flightless" shoes at lower prices to encourage more customers. This solution can be implemented immediately without the need for excessive financial resources. Since lower prices entice customers to purchase more, "Flightless" sales might perform better given a reduction in its price.

In this example, the recommendation was formed with the help of a decision matrix. Each ACA was given a score between 1 and 4 for each criterion. Note that the criteria used depend on the priorities of the organization, so there is no standardized way to build this matrix.
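To make the mechanics concrete, here is a minimal decision-matrix sketch in Python. The criteria, weights, and scores are assumed for illustration only; the article's actual matrix and its scores are not reproduced above.

```python
# Minimal decision-matrix sketch. All criteria, weights, and scores below are
# hypothetical -- the article's actual matrix is not reproduced here.
criteria = ["cost", "ease of implementation", "impact on sales"]
weights = [0.4, 0.3, 0.3]  # assumed priorities; these depend on the organization

# Scores from 1 (worst) to 4 (best) per criterion, one row per ACA.
scores = {
    "ACA 1: influencer marketing":   [1, 2, 3],
    "ACA 2: tweak product features": [2, 2, 3],
    "ACA 3: lower the price":        [3, 4, 3],
    "ACA 4: stop production":        [4, 3, 1],
}

# Weighted total per ACA; the highest total is the recommended course of action.
totals = {aca: sum(w * s for w, s in zip(weights, row)) for aca, row in scores.items()}
for aca, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{aca}: {total:.2f}")
print("Recommended:", max(totals, key=totals.get))
```

With these assumed weights and scores, ACA 3 comes out on top, matching the recommendation above; changing the weights to reflect different organizational priorities can change the winner.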

Meanwhile, the recommendation we've made here consists of only one paragraph. Although the matrix already shows that ACA 3 tops the ranking, we still provide a clear explanation of why it is the best choice.

  • Recommend with persuasion [4]. You may use data and statistics to back up your claim. Another option is to show that your preferred solution fits your theoretical knowledge about the case. For instance, if your recommendation involves reducing prices to entice customers to buy higher quantities of your products, you may invoke the "law of demand" [5] as the theoretical foundation of your recommendation (see the sketch after this list).
  • Be prepared to make an implementation plan. Some case study formats require an implementation plan integrated with your recommendation. The implementation plan provides a thorough guide on how to execute your chosen solution (e.g., a step-by-step plan with a schedule).
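As a toy illustration of the law-of-demand reasoning in the first tip above, the sketch below assumes a hypothetical linear demand curve Q = a − b·P; the parameters and prices are invented for illustration and are not taken from any case.

```python
# Law-of-demand illustration with a hypothetical linear demand curve Q = a - b*P.
# All parameters below are assumed for illustration only.
a, b = 500.0, 10.0  # assumed demand-curve intercept and slope

def quantity(price: float) -> float:
    """Quantity demanded at a given price (law of demand: higher price, lower quantity)."""
    return max(a - b * price, 0.0)

# Compare the current price with a discounted price: if demand is elastic enough,
# the price cut raises both the quantity sold and total revenue.
for price in (30.0, 25.0):
    q = quantity(price)
    print(f"price={price:5.2f}  quantity={q:6.1f}  revenue={price * q:8.2f}")
```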
References:

1. Manalili, K. (2021–2022). Selection of Best Applicant (Unpublished master's thesis). Bulacan Agricultural State College. Retrieved September 23, 2022, from https://www.studocu.com/ph/document/bulacan-agricultural-state-college/business-administration/case-study-human-rights/19062233
2. How to Analyze a Case Study. (n.d.). Retrieved September 23, 2022, from https://wps.prenhall.com/bp_laudon_essbus_7/48/12303/3149605.cw/content/index.html
3. Nguyen, C. (2022, April 13). How to Use a Decision Matrix to Assist Business Decision Making. Retrieved September 23, 2022, from https://venngage.com/blog/decision-matrix/
4. Case Study Analysis: Examples + How-to Guide & Writing Tips. (n.d.). Retrieved September 23, 2022, from https://custom-writing.org/blog/great-case-study-analysis
5. Hayes, A. (2022, January 08). Law of Demand. Retrieved September 23, 2022, from https://www.investopedia.com/terms/l/lawofdemand.asp

Written by Jewel Kyle Fabula

in Career and Education, Juander How


Jewel Kyle Fabula is a Bachelor of Science in Economics student at the University of the Philippines Diliman. His passion for learning mathematics developed as he competed in some mathematics competitions during his Junior High School years. He loves cats, playing video games, and listening to music.



Using Recommendation Systems in Disaster Management: A Systematic Literature Review


Recommendations (related papers):

  • Design and Evaluation of a Role Improvisation Exercise for Crisis and Disaster Response Teams. This paper reports a case study which aimed to investigate what role improvisation plays in crisis and disaster response teams and how the improvisation performance of such teams can be simulated in representative contexts and constraints. The case study ...

  • Dynamic and Time Critical Emergency Management for Level Three Disaster: A Case Study Analysis of Kerala Floods 2018. Extreme variability of weather conditions has contributed to the increase in multi-hazard events in Kerala, India. This work analyses the processes, systems and solutions used for the management of a 24-hour call center for an effective emergency response ...

  • A Fuzzy Integer Programming Model to Locate Temporary Medical Facilities as Part of Pre-Disaster Management. The number and the scale of natural disasters have drastically increased over the last decades. One of the most vital stages of disaster preparedness is disaster response planning, and it plays an important role in limiting material and immaterial ...

Information

Publisher: Elsevier Science Publishers B.V., Netherlands

Author tags: disaster management; emergency management; recommendation systems; systematic literature review (Research-article)




SOURCES

  1. Netflix Recommender System

    The study of recommendation systems is a branch of information filtering systems (Recommender system, 2020). Information filtering systems deal with removing unnecessary information from the data stream before it reaches a human. Recommendation systems deal with recommending a product or assigning a rating to an item.

  2. A Complete Study of Amazon's Recommendation System

    Amazon is the largest e-commerce brand in the world in terms of revenue and market share (Statista). In 2021, Amazon's net revenue from e-commerce sales was US$470 billion, and about 35 percent of all sales on Amazon happen via recommendations. This clearly elucidates the power of recommendations. In this case study, we look at how Amazon is ...

  3. Deep learning for recommender systems: A Netflix case study

    On the practical side, integrating deep-learning toolboxes in our system has made it faster and easier to implement and experiment with both deep-learning and non-deep-learning approaches for various recommendation tasks. We conclude this article by summarizing our take-aways that may generalize to other applications beyond Netflix.

  4. (PDF) A Case Study on Recommendation Systems Based on Big Data

    A Case Study on Recommendation Systems Based on Big Data: Proceedings of the Second International Conference on SCI 2018, Volume 2. January 2019. DOI: 10.1007/978-981-13-1927-3_44

  5. Recommender Systems

    Content-Based vs. Collaborative Filtering approaches for recommender systems. (Image by author) Content-Based Approach. Content-based methods describe users and items by their known metadata. Each item i is represented by a set of relevant tags, e.g. movies of the IMDb platform can be tagged as "action", "comedy", etc. Each user u is represented by a user profile, which can be created from ... (a minimal sketch of this approach appears after this list)

  6. Contemporary Recommendation Systems on Big Data and Their Applications

    As recommendation system algorithms, neural networks, and big data continue to evolve, their integration into various aspects of our lives is increasingly evident. ... S. Amer-Yahia, A comparative evaluation of top-n recommendation algorithms: Case study with total customers, in: 2020 IEEE International Conference on Big Data (Big Data), IEEE, ...

  7. A Survey on Modern Recommendation System based on Big Data

    Abstract. This survey provides an exhaustive exploration of the evolution and current state of recommendation systems, which have seen widespread integration in various web applications. It focuses on the advancement of personalized recommendation strategies for online products or services. We categorize recommendation techniques into four ...

  8. Deep Learning for Recommender Systems: A Netflix Case Study

    In this article, we outline some of the challenges encountered and lessons learned in using deep learning for recommender systems at Netflix. We first provide an overview of the various recommendation tasks on the Netflix service. We found that different model architectures excel at different tasks. Even though many deep-learning models can be ...

  9. Recommender Systems in Industry: A Netflix Case Study

    The goal of this chapter is to give an up-to-date overview of recommender systems techniques used in an industrial setting. We will give a high-level description of the practical use of recommendation and personalization techniques. We will highlight some of the main lessons learned from the Netflix Prize. We will then use Netflix personalization ...

  10. Recommender systems: Trends and frontiers

    The six papers in this special issue push the current frontiers in recommender systems and address several of the challenges and open questions outlined above. In their article, Jannach and Chen (2022) elaborate why building a conversational recommender system is difficult, and consider such systems a "Grand AI Challenge".

  11. Systematic Review of Recommendation Systems for Course Selection

    We examined case studies conducted over the previous six years (2017-2022), with a focus on 35 key studies selected from 1938 academic papers found using the CADIMA tool. ... Baskota, A.; Ng, Y.K. A graduate school recommendation system using the multi-class support vector machine and KNN approaches. In Proceedings of the 2018 IEEE ...

  12. A Case Study on Recommendation Systems Based on Big Data

    7 Case Study: Drug Recommendation System (Medicines). Due to the rapid growth of e-commerce, most users prefer to buy medicine online for the sake of convenience. Moreover, users may not realize the issues involved in purchasing drugs without proper knowledge and guidance. A major problem while purchasing medicines from online sites, namely ...

  13. A Novel Approach to Recommendation System Business Workflows: A Case

    We observe a number of studies in recommendation systems that utilize a variety of approaches, such as rule-based recommendation systems, case-based reasoning-based recommendation systems, and hybrid recommendation systems using collaborative filtering-based algorithms [18,19,20,21,22,23,24]. However, in this particular study, we propose a ...

  14. A Case Study on Various Recommendation Systems

    A detailed review of various recommendation systems is presented; typically, recommender systems are based on keyword search, which allows efficient scanning of very large document collections. The goal of a recommender system is to generate relevant recommendations for users. It is an information filtering technique that assists users by filtering redundant and unwanted data from ...

  15. Recommendation systems: Principles, methods and evaluation

    A recommender system has the ability to predict whether a particular user would prefer an item or not based on the user's profile. Recommender systems are beneficial to both service providers and users [3]. They reduce the transaction costs of finding and selecting items in an online shopping environment [4].

  16. 5 Use Case Scenarios and how Recommendation Systems can Help

    5 Use Case Scenarios for Recommendation Systems and How They Help. The market for recommendation engines is projected to grow from USD 1.14B in 2018 to USD 12.03B by 2025, a CAGR of 32.39% over the forecast period. These figures indicate the growing emphasis on customer experience while also being a byproduct of the widespread ...

  17. Facebook Recommendation System Case Study

    May 29, 2021. In daily life we use many social media applications like FB, IG, etc., but have you ever wondered how these platforms generate friend recommendations? Also, the alert for the ...

  18. Health Recommender Systems: Systematic Review

    The search keywords were as follows, using an inclusive OR: (recommender OR recommendation systems OR recommendation system) ... The authors also limited themselves to their specific case studies and did not make any recommendations for policy (last box plot is presented in Figure 2). All 73 studies reported the use of different data sets.

  19. Movie Recommendation Systems Based on Collaborative Filtering: A Case

    Recommendation systems play a vital role in everyone's life, as the demand for recommendations tailored to users' interests increases day by day. ... 2021. Movie Recommendation Systems Based on Collaborative Filtering: A Case Study on Netflix. Muhammed Sütçü, Ecem Kaya, Oğuzkan Erdem. Abdullah Gül University, Faculty of Engineering ...

  20. Case study: Recommendation System

    Read how a recommendation engine was built on the foundation of the Openkoda platform, utilizing a significant number of pre-existing features within the platform. ... Case studies: Recostream. Recommendation engine. August 2020. ... Once the system was validated and promoted from the experimental phase to production, the same team of developers was ...

  21. content-based dataset recommendation system for researchers—a case

    Braja Gopal Patra, Kirk Roberts, Hulin Wu, A content-based dataset recommendation system for researchers—a case study on Gene Expression Omnibus (GEO) repository, Database, Volume 2020, 2020, ... The recommendation system can be viewed as an IR system where the most similar datasets can be retrieved for a researcher using his/her publications.

  22. How To Make Recommendation in Case Study (With Examples)

    How To Write Recommendation in Case Study. 1. Review Your Case Study's Problem. 2. Assess Your Case Study's Alternative Courses of Action. 3. Pick Your Case Study's Best Alternative Course of Action. 4. Explain in Detail Why You Recommend Your Preferred Course of Action.

  23. Using Recommendation Systems in Disaster Management: A Systematic

    Selected studies were evaluated to answer five different research questions. We considered the most common recommendation approaches adopted in these studies, the input and output of the recommendation systems, the disaster management phase involved, and the contribution of each study to the field.
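Several of the sources above (items 1, 5, and 15 in particular) describe the same core mechanics: represent items by tags or metadata, build a user profile from past interactions, and rank unseen items by similarity. The sketch below is a minimal, self-contained illustration of that content-based approach; all item names and tags are made up for this example.

```python
# Minimal content-based recommendation sketch (toy data; all names and tags
# below are hypothetical). Each item is a set of tags; the user profile is the
# union of tags of items the user already interacted with; candidate items are
# ranked by Jaccard similarity between the profile and their tags.

item_tags = {
    "movie_a": {"action", "thriller"},
    "movie_b": {"comedy", "romance"},
    "movie_c": {"action", "comedy"},
}

def jaccard(s: set, t: set) -> float:
    """Jaccard similarity between two tag sets."""
    return len(s & t) / len(s | t) if s | t else 0.0

def recommend(user_history: set, k: int = 2) -> list:
    # Build the user profile from the tags of previously seen items.
    profile = set().union(*(item_tags[i] for i in user_history))
    # Score every unseen item by its tag overlap with the profile.
    scores = {i: jaccard(profile, tags)
              for i, tags in item_tags.items() if i not in user_history}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend({"movie_a"}))  # -> ['movie_c', 'movie_b']
```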
