dc.description.abstract | The aim of this thesis is to apply the Vector Space Model and web mining algorithms to extract meaningful information from a given web page. The first step is to crawl the page and remove redundant content and noise. The algorithm then extracts the main text of the page. Next, Latent Semantic Indexing (LSI) is applied to verify that the extracted text is actually related to the title of the page. LSI first converts the words and sentences of the text into a vector space representation, a TF-IDF (term frequency-inverse document frequency) matrix, in which each element of a vector is a term weight. This matrix is then factorized with Singular Value Decomposition (SVD) and truncated to a lower-dimensional subspace that retains only the most meaningful semantic structure of the text. In this subspace, the similarity between the text and the title is measured with cosine similarity. Data pre-processing, an essential step before the data can be handled, is also studied in detail. Finally, the other resources used in this thesis are presented later in the document. | en_US |