dc.description.abstract | The widespread adoption of e-commerce over the years has led to dramatic growth in the numbers of customers and products and has posed significant hurdles for recommender systems. New recommender system technologies are required that can swiftly generate high-quality
recommendations, especially for very large-scale problems. Over the past few years, there has
been significant progress in recommendation methodologies, encompassing conventional
recommendation methods like collaborative filtering, content-based recommendation, and
matrix factorization, as well as deep learning-based technologies. Due to its capacity to
recognize non-linear user-item relationships and work with several data sources, including
images and text, deep learning in particular offers substantial advantages in tackling challenging tasks and handling complex data. As a result, it is being used in recommender systems more often. To maximize customer transactions, reinforcement learning can employ reward signals from the environment to learn and optimize recommendation decisions.
However, training a recommender system in this way is often difficult because it requires exposing users to irrelevant recommendations. Learning the policy from logged implicit feedback is therefore vital, yet this is challenging owing to the off-policy setting and the lack of negative rewards
(feedback). This thesis is based on the fundamental concept of self-supervised reinforcement
learning in the context of sequential recommendation tasks. The objective of our research is to
investigate the utilization of reinforcement learning in recommendation systems, drawing upon
the methodologies employed by prominent companies such as JD, Microsoft, ByteDance,
Alibaba, and Google in their respective systems. We implement the Top-K method, one of Google's well-known techniques for recommending videos on the YouTube platform. Initially, the algorithm is reproduced on an e-commerce dataset to understand its functionality as well as its inputs and outputs. Subsequently, we apply the same approach to the MovieLens dataset to evaluate its outcomes.
Ultimately, conclusions can be drawn regarding a recommender system that utilizes reinforcement learning. | en_US |