# WATER-TOLUENE (ΔG_toluene - ΔG_water) TRANSFER FREE ENERGY PREDICTIONS
#
# This file will be automatically parsed. It must contain the following four elements:
# predictions, name of method, software listing, and method description.
# These elements must be provided in the order shown with their respective headers.
#
# Any line that begins with a # is considered a comment and will be ignored when parsing.
#
# 
# PREDICTION SECTION
#
# It is mandatory to submit water to toluene (ΔG_toluene - ΔG_water) transfer free energy (TFE) predictions for all 22 molecules.
# Incomplete submissions will not be accepted.
# The energy units must be in kcal/mol.

# Please report the general molecule `ID tag` in the form of `SAMPL9-XX` (e.g. SAMPL9-1, SAMPL9-2, etc).
# Please report TFE standard error of the mean (SEM) and TFE model uncertainty.
#
# The data in each prediction line should be structured as follows:
# ID tag, TFE, TFE SEM, TFE model uncertainty
#
# If you use a microstate other than the challenge provided microstate, please note SMILES strings of microstates you used in your submission, such as in the methods section.
#
# The list of predictions must begin with the 'Predictions:' keyword as illustrated here.
Predictions:
SAMPL9-1,3.36,0.42,0.42
SAMPL9-2,1.75,0.31,0.31
SAMPL9-3,7.03,0.74,0.74
SAMPL9-4,5.97,0.92,0.92
SAMPL9-5,3.99,1.18,1.18
SAMPL9-6,-4.01,1.08,1.08
SAMPL9-7,2.39,1.10,1.10
SAMPL9-8,-2.22,1.91,1.91
SAMPL9-9,6.16,0.99,0.99
SAMPL9-10,3.04,0.96,0.96
SAMPL9-11,6.63,0.43,0.43
SAMPL9-12,-2.10,0.82,0.82
SAMPL9-13,1.12,1.16,1.16
SAMPL9-14,0.52,0.73,0.73
SAMPL9-15,9.62,0.93,0.93
SAMPL9-16,0.99,1.90,1.90
#
#
#
# Please list your name, using only UTF-8 characters as described above. The "Participant name:" entry is required.
Participant name:
Prajay Patel

#
#
# Please list your organization/affiliation, using only UTF-8 characters as described above.
Participant organization:
University of Dallas

#
#
# NAME SECTION
#
# Please provide an informal but informative name of the method used.
# The name must not exceed 40 characters.
# The 'Name:' keyword is required as shown here.
Name:
RI-B3LYP/def2-TZVP with MD clusters

#
#
# COMPUTE TIME SECTION
#
# Please provide the average compute time across all of the molecules.
# For physical methods, report the GPU and/or CPU compute time in hours.
# For empirical methods, report the query time in hours.
# Create a new line for each processor type.
# The 'Compute time:' keyword is required as shown here.
Compute time:
CPU + GPU compute time - 792 hours

#
# COMPUTING AND HARDWARE SECTION
#
# Please provide details of the computing resources that were used to train models and make predictions.
# Please specify compute time for training models and querying separately for empirical prediction methods.
# Provide a detailed description of the hardware used to run the simulations.
# The 'Computing and hardware:' keyword is required as shown here.
Computing and hardware:
Dell Precision 5820 Tower Workstation, OS: Windows 11 Pro, RAM: 64.0 GB, CPU: Intel(R) Xeon(R) W-2275 CPU @ 3.30GHz 3.31 GHz, 14 cores
Dell Precision 5820 Tower Workstation, OS: Windows 11 Pro, RAM: 64.0 GB, CPU: Intel(R) Xeon(R) W-2295 CPU @ 3.30GHz 3.00 GHz, 18 cores

# SOFTWARE SECTION
#
# List all major software packages used and their versions.
# Create a new line for each software.
# The 'Software:' keyword is required.
Software:
ORCA 5.0.1
GROMACS 2021.2
OpenMM
Python 3.9

# METHOD CATEGORY SECTION
#
# State which method category your prediction method is better described as:
# `Physical (MM)`, `Physical (QM)`, `Empirical`, or `Mixed`.
# Pick only one category label.
# The `Category:` keyword is required.
Category:
Mixed

# METHOD DESCRIPTION SECTION
#
# Methodology and computational details.
# Level of details should be roughly equivalent to that used in a publication.
# Please include the values of key parameters with units.
# Please explain how statistical uncertainties were estimated.
#
# If you have evaluated additional microstates, please report their SMILES strings and populations of all the microstates in this section.
# If you used a microstate other than the challenge provided microstate (`SMXX_micro000`), please list your chosen `Molecule ID` (in the form of `SMXX_extra001`) along with the SMILES string in your methods description.
#
# Use as many lines of text as you need.
# All text following the 'Method:' keyword will be regarded as part of your free text methods description.
Method:
3D structures generated from the provided SMILES strings were optimized with DFT (RI-B3LYP/def2-SVP) and used as inputs to generate the topology files for the challenge molecules and toluene with CGenFF and CHARMM-GUI.
The TIP4P potential for water was used for all aqueous simulations.
Each challenge molecule was placed in a 3 x 3 x 3 nm box and then solvated. 
For toluene solutions, approximately 152 toluene molecules were placed in the box and equilibrated to reproduce the density of neat toluene in the 27 cubic nanometer box.
For aqueous solutions, approximately 900 water molecules were placed in the box to represent the density of water.
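As a rough check on these molecule counts, the number of molecules needed to fill a 27 cubic nanometer box at bulk density can be estimated from the density and molar mass. The sketch below is illustrative only: the densities (0.867 g/cm^3 for toluene and 0.997 g/cm^3 for water, near room temperature) and the function name are assumptions, not values taken from the simulation setup.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_in_box(density_g_cm3, molar_mass_g_mol, edge_nm=3.0):
    """Estimate how many molecules fill a cubic box at a given bulk density."""
    volume_cm3 = (edge_nm * 1.0e-7) ** 3  # 1 nm = 1e-7 cm
    moles = density_g_cm3 * volume_cm3 / molar_mass_g_mol
    return moles * AVOGADRO


n_toluene = molecules_in_box(0.867, 92.14)  # assumed density of neat toluene
n_water = molecules_in_box(0.997, 18.015)   # assumed density of liquid water
print(round(n_toluene), round(n_water))
```

With these assumed densities the estimate lands close to the approximately 152 toluene and 900 water molecules quoted above.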
Once the boxes were created, 499 frames were extracted from a 4 ns NVT simulation.
For each of the 499 frames, the solvation free energy of the challenge molecule was computed in ORCA with the B97-3c method and an implicit solvent model for toluene and for water.
These solvation free energies were then combined in Python to generate a grid of 249001 possible logP values.
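The grid size is consistent with pairing every water frame with every toluene frame (499 x 499 = 249001). A minimal sketch of such a pairing follows, assuming the solvation free energies are in kcal/mol and that logP = -(ΔG_toluene - ΔG_water) / (RT ln 10). The function name, array names, and synthetic input values are hypothetical and only stand in for the ORCA results.

```python
import numpy as np

R_KCAL = 1.98720425864e-3  # gas constant in kcal/(mol K)


def logp_grid(dg_water, dg_toluene, temperature=298.15):
    """Return logP for every (water frame, toluene frame) pair."""
    dg_water = np.asarray(dg_water, dtype=float)
    dg_toluene = np.asarray(dg_toluene, dtype=float)
    # Broadcasting gives transfer[i, j] = dg_toluene[j] - dg_water[i].
    transfer = dg_toluene[np.newaxis, :] - dg_water[:, np.newaxis]
    return -transfer / (R_KCAL * temperature * np.log(10.0))


rng = np.random.default_rng(0)
dg_w = rng.normal(-8.0, 1.0, size=499)   # placeholder values, not real data
dg_t = rng.normal(-10.0, 1.0, size=499)  # placeholder values, not real data
grid = logp_grid(dg_w, dg_t)
print(grid.shape, grid.size)
```

Flattening the grid with `grid.ravel()` gives the values that would go into a histogram of the kind described in the next step.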
Where the histogram of logP values computed at the B97-3c level was multimodal, each distribution was weighted by its population.
Exploratory data analysis was done in Python using the RDKit and scikit-learn packages to identify and confirm potential clusters of structures based on 3D descriptors.
For each cluster/distribution of logP values, five frames in water and five frames in toluene that yielded a logP value near the mean of the distribution were selected.
For the selected frames, gas-phase frequency calculations were performed at the B3LYP/def2-TZVP level with the RIJCOSX approximation and the def2-TZVP/C auxiliary basis set.
The structures were not optimized to a local minimum since the frames were selected from MD simulations.
Thermal free energy corrections at 298.15 K were obtained with harmonic frequencies scaled by 1.0044 to account for anharmonicity.
Solvation free energy calculations were performed at the B3LYP/def2-TZVP level of theory with the RIJCOSX approximation and the corresponding auxiliary basis set (def2-TZVP/C).
Solvated energies were computed, and the logP was calculated with a vertical solvation approach from the solvation free energies of the challenge molecules in toluene and in water.
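Because the prediction section asks for transfer free energies in kcal/mol while the method is described in terms of logP, a conversion between the two is needed. The sketch below shows one way this could be done, assuming T = 298.15 K and the sign convention TFE = ΔG_toluene - ΔG_water from the file header. The function name and the input numbers are hypothetical.

```python
import math

R_KCAL = 1.98720425864e-3  # gas constant in kcal/(mol K)


def logp_to_tfe(logp, logp_sem, temperature=298.15):
    """Convert logP and its standard error to a transfer free energy in kcal/mol."""
    factor = R_KCAL * temperature * math.log(10.0)  # about 1.364 kcal/mol at 298.15 K
    tfe = -factor * logp         # TFE = dG_toluene - dG_water
    tfe_sem = factor * logp_sem  # a constant scale factor carries over to the error
    return tfe, tfe_sem


tfe, sem = logp_to_tfe(-2.0, 0.3)  # hypothetical logP and SEM
print(f"{tfe:.2f} {sem:.2f}")
```

A negative logP (preference for water) maps to a positive transfer free energy under this convention, and the uncertainty is scaled by the same factor without a sign change.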
For SAMPL9-11, SAMPL9-14, and SAMPL9-15, multimodal distributions were observed, so the logP mean and standard error were reported as a weighted average and an inverse-variance-weighted standard error.
For all other challenge molecules, the mean and standard error of the logP values of the five frames were used to calculate the logP.
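The two averaging schemes can be sketched as follows. This is one plausible reading of the description, not the exact script used: each cluster is assumed to contribute a mean, a standard error, and a population weight, and all function names and input numbers are hypothetical.

```python
import numpy as np


def mean_and_sem(values):
    """Mean and standard error of the mean for a single set of frames."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1) / np.sqrt(values.size)


def combine_clusters(means, sems, populations):
    """Population-weighted mean and inverse-variance-weighted standard error."""
    means = np.asarray(means, dtype=float)
    sems = np.asarray(sems, dtype=float)
    weights = np.asarray(populations, dtype=float)
    weights = weights / weights.sum()  # normalize populations to sum to 1
    weighted_mean = np.sum(weights * means)
    combined_sem = np.sqrt(1.0 / np.sum(1.0 / sems**2))
    return weighted_mean, combined_sem


m, s = mean_and_sem([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical logP values
wm, ws = combine_clusters([1.0, 3.0], [0.2, 0.2], [0.75, 0.25])
print(m, s, wm, ws)
```

The single-distribution case uses the sample standard deviation (`ddof=1`) divided by the square root of the number of frames, which is the usual definition of the standard error of the mean.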
This work was carried out with major contributions from Jonathan Dannatt (University of Dallas) and Asa Waterman (University of Dallas).

#
#
# All submissions must either be ranked or non-ranked.
# Only one ranked submission per participant is allowed.
# Multiple ranked submissions from the same participant will not be judged.
# Non-ranked submissions are accepted so we can verify that they were made before the deadline.
# The "Ranked:" keyword is required, and expects a Boolean value (True/False)
Ranked:
True