An Automated Web Structure-based Method for Predicting the Importance of a Webpage

Article Fingerprint
Research ID VI4DB

Abstract

This article develops a method for estimating the importance of web pages without using web browser data or invading users' privacy. Instead, the method relies on the structure of a website. We propose a novel method that takes webpage content as input and automatically produces a score for each page. First, we extract content from a web page in real time. We then consider two factors based on the website structure: (1) “What is the minimum number of clicks needed to reach a web page within a website?” and (2) “How is a web page linked to the other web pages in a website?” We train our model with a learning method, using the “web page views” results reported by “Google Analytics” and “Similar Web”. Experiments and case studies on the world’s most popular websites show that our method produces effective results in real time.
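The two structural factors named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `click_depths` and `inlink_counts`, the toy `site` graph, and the choice of breadth-first search and simple in-link counting are all assumptions made here for illustration. The abstract does not say how the factors are computed or combined into a score, so no scoring step is shown.

```python
from collections import deque


def click_depths(links, home):
    """Minimum number of clicks from `home` to each reachable page (BFS)."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def inlink_counts(links):
    """Number of distinct other pages that link to each page."""
    counts = {page: 0 for page in links}
    for page, targets in links.items():
        for target in set(targets):  # ignore duplicate links on one page
            if target != page:       # ignore self-links
                counts[target] = counts.get(target, 0) + 1
    return counts


# Hypothetical toy website: page -> pages it links to (assumed data).
site = {
    "/": ["/about", "/products"],
    "/about": ["/", "/contact"],
    "/products": ["/", "/products/a", "/contact"],
    "/products/a": ["/products"],
    "/contact": ["/"],
}

depths = click_depths(site, "/")
inlinks = inlink_counts(site)

for page in site:
    print(page, "clicks:", depths.get(page), "in-links:", inlinks[page])
```

In a sketch like this, pages with a small click depth and many in-links would be candidates for a high score. How the two signals are weighted would have to come from the learning step trained on page-view data, which is outside the scope of this example.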

Conflict of Interest

The authors declare no conflict of interest.

Ethical Approval

Not applicable

Data Availability

The datasets used in this study are openly available at [repository link] and the source code is available on GitHub at [GitHub link].

Funding

This work did not receive any external funding.

Related Research

  • Classification

    DDC Code: 005.2762 LCC Code: QA76.73.J39

  • Version of record

    v1.0

  • Issue date

    21 September 2022

  • Language

    en

Open Access
Research Article
CC-BY-NC 4.0