dc.description.abstract | A patent is a detailed description of a technology held by its rightful owner, and the information it contains can carry high economic value. The number of registered inventions has increased dramatically with the development and advent of new technologies. Patent search and classification have therefore evolved over time to meet the demand for exploiting and protecting the exclusive rights of patent holders.
Current search approaches simply use the SQL “LIKE” operator to query data as text in a relational database. However, “LIKE” queries have several limitations: they search only within predefined fields, return noisy results, provide no ranking of results, and perform poorly.
The aim of this thesis is to study data processing, information retrieval, graph-based search methods, and full-text search methods for keyword search over the set of inventions documented in XML-based patents released by the United States Patent and Trademark Office (USPTO), the federal agency for granting U.S. patents and registering trademarks [1].
Our contribution is twofold. First, this thesis applies the Neo4j graph database to manage classifications and patents, reducing noise and improving performance. Second, this thesis integrates Lucene indexing and searching for full-text search to provide search ranking. | en_US |