Identifying organic compounds with visible light

by Justin Jackson, Phys.org
Graphical abstract. Credit: Journal of Physical Chemistry A (2023). DOI: 10.1021/acs.jpca.2c07955
Researchers from the Universidad de Santiago de Chile and the University of Notre Dame, working with machine learning, have devised a method to identify organic compounds based on the refractive index at a single optical wavelength. The technique could have research and industrial applications for automated chemical analysis that is cheaper, safer and requires less expertise to operate.
In the paper, “Machine learning identification of organic compounds using visible light,” published in The Journal of Physical Chemistry A, the researchers document the creative and novel way in which they acquired a unique data set and the steps they used to construct a test of the organic chemistry detector concept.
The machine learning model was trained on a publicly available database of past optical experiments, with data published in the scientific literature dating back to 1940. In this database, the researchers found all the parameters needed to compile identification profiles for 61 organic molecules: group velocity and group velocity dispersion, measurement wavelength range and state of matter of the samples, and refractive indices and extinction coefficients over a wide wavelength range. In total, 194,816 spectral records of refractive index and extinction curves for 61 organic compounds and polymers were used.
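For readers curious how group velocity and group velocity dispersion relate to a refractive index curve, here is a minimal sketch using the standard textbook relations n_g = n - λ·dn/dλ and GVD = λ³/(2πc²)·d²n/dλ². It is not the researchers' code. The function name `group_quantities` and the choice of numerical differentiation are assumptions made for illustration.

```python
import numpy as np

C = 299_792_458.0  # speed of light in vacuum, m/s


def group_quantities(wavelength_m, n):
    """Estimate group velocity (m/s) and GVD (s^2/m) from sampled n(lambda).

    Hypothetical helper: derivatives are taken numerically, so the result
    is only as good as the sampling of the refractive index curve.
    """
    wavelength_m = np.asarray(wavelength_m, dtype=float)
    n = np.asarray(n, dtype=float)

    # First and second derivatives of n with respect to wavelength.
    dn = np.gradient(n, wavelength_m)
    d2n = np.gradient(dn, wavelength_m)

    # Group index, then group velocity.
    n_group = n - wavelength_m * dn
    v_group = C / n_group

    # Group velocity dispersion.
    gvd = wavelength_m**3 / (2.0 * np.pi * C**2) * d2n
    return v_group, gvd
```

Calling `group_quantities` with wavelengths in meters and the matching refractive indices returns two arrays of the same length, one value per sampled wavelength.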
In a typical infrared (IR) molecular detector, the identity of a molecule is confirmed by its absorption and Raman scattering peaks, which together form a fingerprint that is matched against a database. The static refractive index of an organic compound is a single-valued property that does not carry the same encoded information. The same applies to refractive index databases at single wavelengths away from the ultraviolet and infrared absorption resonances, which is probably why visible light has not been used to classify organic molecules.
Initial testing with raw data reached 80% accuracy, and the researchers worked to improve on that. The original database was not designed for machine learning, as much of it came from research conducted before the first home computer was invented. The AI was being trained on a disproportionately large amount of data at UV and IR wavelengths, so the researchers decided to take a more focused approach.
Several data preprocessing strategies were used to create a more idealized learning environment for the AI. The goal was to produce a balanced data set so that the AI did not prioritize certain features over others simply because of the volume of information. Oversampling, undersampling and physical data-based augmentation techniques were used to substantially reduce the impact of IR wavelengths on the overall data set. By training with balanced, preprocessed data, the researchers achieved a molecular classification test accuracy in the visible region of better than 98%.
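The balancing idea can be sketched in a few lines of Python. This is an illustration of plain random oversampling and undersampling across spectral regions, not the researchers' pipeline, and it leaves out their physics-based augmentation. The helper names `spectral_region` and `balanced_indices`, and the 0.38 and 0.75 micrometer limits taken here as the edges of the visible range, are assumptions.

```python
import numpy as np


def spectral_region(wavelength_um, visible=(0.38, 0.75)):
    """Label each record 0 (UV), 1 (visible) or 2 (IR).

    The visible-range limits are an assumed convention, in micrometers.
    """
    return np.digitize(np.asarray(wavelength_um, dtype=float), visible)


def balanced_indices(groups, per_group, seed=0):
    """Pick `per_group` record indices from every group.

    Large groups are undersampled (drawn without replacement); small
    groups are oversampled (drawn with replacement, so records repeat).
    """
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    picked = []
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        oversample = idx.size < per_group
        picked.append(rng.choice(idx, size=per_group, replace=oversample))
    picked = np.concatenate(picked)
    rng.shuffle(picked)  # avoid feeding the model one region at a time
    return picked
```

Given an array of measurement wavelengths, `balanced_indices(spectral_region(wavelengths), per_group)` returns row indices that can be used to select an equal number of UV, visible and IR records, so that no region dominates training purely by volume.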
The researchers state that additional work is needed to extend and generalize the classifier to identify structural and other chemical features of molecules that are present in the Refractive Index database. In summary, they write that the work is a good starting point for the development of remote chemical sensors.
More information: Thulasi Bikku et al, Machine Learning Identification of Organic Compounds Using Visible Light, The Journal of Physical Chemistry A (2023). DOI: 10.1021/acs.jpca.2c07955
Journal information: Journal of Physical Chemistry A
© 2023 Science X Network