A data mining approach for determination of hindered internal rotation parameters for complex chemical systems
Abstract
A rigorous Hindered Internal Rotation (HIR) treatment is essential for obtaining correct thermodynamic properties of chemical species. Such an approach requires detailed information about each rotation (i.e., the rotational axis, group, frequency, symmetry, and hindrance potential) [1]. However, these input parameters are numerous and tedious for chemists to prepare. The rotational frequency in particular is considered the most difficult parameter to determine because of the complex molecular structures and mixed vibrational modes of chemical species. A recent study sought to help chemists with this arduous process [2]. To generate the rotational frequency, that study adopted a pre-defined-rule approach, which limits its ability to cover more complex cases. Therefore, this thesis proposes a data mining approach to better predict the HIR parameters of chemical species. Within this framework, patterns in the HIR are identified using features extracted from existing chemical data provided by domain experts. More importantly, machine learning models were implemented to quantify the effect of each component of the internal rotation, giving chemists a deeper understanding of the HIR itself. In the conducted experiments, the proposed approach produced more accurate and more complete results than the previous study. It also gives meaningful insights into the domain problem by expressing the contributions of the features in terms of weights. Finally, the machine learning models developed in this research were integrated into our state-of-the-art tool MSMC-GUI (https://sites.google.com/site/msmccode/manual/gui-1), giving users a convenient and powerful way to prepare the data needed for their thermodynamic computations.