Abstract
The aim of this article is to develop a method for estimating the importance of web pages without using web browser data or invading the privacy of users. Instead, the method relies on the structure of a website. To achieve this goal, we propose a novel method that takes web page content as input and automatically produces a score for each page. First, we extract content from a web page in real time. Then, we consider two important factors based on the website structure: (1) “What is the minimum number of clicks needed to access each web page in a website?” and (2) “How is a web page linked with other web pages in a website?” We train our model with a learning method, using the “web page views” results generated by “Google Analytics” and “Similar Web”. Experiments and case studies on the world’s most popular websites show that our method can produce effective results in real time.
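As an illustration of the two structural factors, the following minimal sketch computes, for a small hypothetical site, the minimum number of clicks from the home page (via breadth-first search) and the number of distinct pages linking to each page. The function names `click_depths` and `inlink_counts`, the example link graph, and the choice of in-link counts as the linkage measure are assumptions made for illustration only; they are not the features or the scoring model used in this article.

```python
from collections import deque


def click_depths(links, home):
    """Minimum number of clicks from `home` to each reachable page (BFS)."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def inlink_counts(links):
    """Number of distinct other pages that link to each page."""
    counts = {page: 0 for page in links}
    for page, targets in links.items():
        for target in set(targets):  # count each linking page once
            if target != page:       # ignore self-links
                counts[target] = counts.get(target, 0) + 1
    return counts


# Hypothetical link graph: page -> pages it links to.
site = {
    "/": ["/products", "/about"],
    "/products": ["/products/a", "/"],
    "/about": ["/"],
    "/products/a": ["/", "/products"],
}

depths = click_depths(site, "/")
inlinks = inlink_counts(site)
for page in site:
    print(page, depths.get(page), inlinks[page])
```

In a setting like the one described above, such per-page values could serve as input features to a learned scoring model; how they are combined and weighted is left open here.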
Keywords