Dataset Cleaning Methods
Processing raw metabolomics data requires careful cleaning to handle missing values and minimize noise.
Missing Value Imputation
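Replaces missing (unrecorded) concentrations with statistical estimates. Options include KNN (K-Nearest
Neighbors), which borrows values from the most similar samples; median replacement; and half-LOD (Limit of
Detection) substitution for low-abundance metabolites whose values likely fell below the detection limit.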
KNN
Median
1/2 LOD
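The three imputation options can be sketched in a few lines of numpy. This is a minimal illustration, not the tool's own implementation. It assumes a samples-by-metabolites matrix with missing values encoded as NaN. Because the true LOD per metabolite is usually not supplied, the half-LOD option here uses half of each metabolite's smallest observed value as a stand-in (an assumption). The function names and the `k` default are hypothetical.

```python
import numpy as np

def impute_median(X):
    """Replace NaNs in each column (metabolite) with that column's median."""
    X = np.array(X, dtype=float)
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X

def impute_half_min(X):
    """Half-LOD proxy (assumption): replace NaNs with half the smallest observed value per metabolite."""
    X = np.array(X, dtype=float)
    half_min = np.nanmin(X, axis=0) / 2.0
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = half_min[cols]
    return X

def impute_knn(X, k=3):
    """Fill each sample's NaNs with the mean of its k nearest samples that observed that metabolite."""
    X = np.array(X, dtype=float)
    observed = ~np.isnan(X)
    filled = X.copy()
    n = X.shape[0]
    for i in range(n):
        missing_cols = np.where(~observed[i])[0]
        if missing_cols.size == 0:
            continue
        # RMS distance to every other sample, using only metabolites both samples observed
        dists = np.full(n, np.inf)
        for j in range(n):
            shared = observed[i] & observed[j]
            if j != i and shared.any():
                diff = X[i, shared] - X[j, shared]
                dists[j] = np.sqrt(np.mean(diff ** 2))
        for c in missing_cols:
            # nearest donors that actually have a value for metabolite c
            donors = [j for j in np.argsort(dists)
                      if observed[j, c] and np.isfinite(dists[j])][:k]
            if donors:
                filled[i, c] = X[donors, c].mean()
    return filled
```

Median and half-minimum replacement are cheap and deterministic. KNN preserves more between-sample structure, but it is quadratic in the number of samples in this naive form.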
Data Transformation
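Applies mathematical functions to stabilize variance and bring skewed distributions closer to normal. Log and
cube root transformations are commonly used because metabolite concentrations are typically right-skewed and
span several orders of magnitude; the generalized log (glog) behaves like a log but remains defined at zero.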
Log10
Cube Root
Generalized Log
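Here is a minimal numpy sketch of the three transformations, assuming non-transformed concentrations as input. The `offset` added before log10 and the glog parameter `lam` are illustrative assumptions. The tool's actual defaults are not stated here. The glog form shown, log10((x + sqrt(x² + λ²)) / 2), is one common variant. It approaches log10(x) when x is much larger than λ.

```python
import numpy as np

def log10_transform(X, offset=1.0):
    """log10(x + offset); the offset (assumed default) keeps zero concentrations finite."""
    return np.log10(np.asarray(X, dtype=float) + offset)

def cube_root_transform(X):
    """Cube root; defined for zero and negative values without any offset."""
    return np.cbrt(np.asarray(X, dtype=float))

def glog_transform(X, lam=1.0):
    """Generalized log: log10((x + sqrt(x^2 + lam^2)) / 2), roughly log10(x) when x >> lam."""
    X = np.asarray(X, dtype=float)
    return np.log10((X + np.sqrt(X ** 2 + lam ** 2)) / 2.0)
```

The glog avoids the arbitrary offset of a plain log. λ controls how strongly near-zero values are compressed.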
Scaling Methods
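Puts all metabolites on a comparable scale so that high-concentration metabolites do not dominate the
statistical model. After mean-centering, auto-scaling (unit-variance, or UV, scaling) divides each metabolite
by its standard deviation, Pareto scaling by the square root of its standard deviation, and range scaling by
its range. Pareto scaling and auto-scaling are standard choices.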
Pareto
Auto-scaling
Range Scaling
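The three scaling options differ only in the divisor applied after mean-centering each metabolite (column). This numpy sketch assumes a samples-by-metabolites matrix with no missing values, so imputation has already run. It uses the sample standard deviation (`ddof=1`). The function names are hypothetical.

```python
import numpy as np

def _center(X):
    """Subtract each metabolite's mean."""
    X = np.asarray(X, dtype=float)
    return X - X.mean(axis=0)

def auto_scale(X):
    """Unit-variance (UV) scaling: divide centered data by the standard deviation."""
    X = np.asarray(X, dtype=float)
    return _center(X) / X.std(axis=0, ddof=1)

def pareto_scale(X):
    """Pareto scaling: divide centered data by the square root of the standard deviation."""
    X = np.asarray(X, dtype=float)
    return _center(X) / np.sqrt(X.std(axis=0, ddof=1))

def range_scale(X):
    """Range scaling: divide centered data by (max - min)."""
    X = np.asarray(X, dtype=float)
    return _center(X) / (X.max(axis=0) - X.min(axis=0))
```

Auto-scaling gives every metabolite equal weight, which can also amplify noisy low-abundance features. Pareto scaling is a compromise that keeps some of the original magnitude information.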
Feature Filtering
Removes unreliable metabolites based on user-defined thresholds. Standard filters include a maximum
Coefficient of Variation (CV), often computed across pooled quality-control (QC) samples, and a maximum
allowable percentage of missing values.
CV Filter
Missingness Filter
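Both filters reduce to a per-metabolite statistic compared against a threshold. This numpy sketch assumes NaN-encoded missing values and, for the CV filter, a separate matrix of QC injections. The thresholds `max_missing=0.5` and `max_cv=0.3` are illustrative defaults only, not the tool's settings, and the function names are hypothetical. Each function returns a boolean mask of metabolites to keep.

```python
import numpy as np

def missingness_filter(X, max_missing=0.5):
    """Keep metabolites whose fraction of missing (NaN) values is at most max_missing."""
    X = np.asarray(X, dtype=float)
    return np.mean(np.isnan(X), axis=0) <= max_missing

def cv_filter(X_qc, max_cv=0.3):
    """Keep metabolites whose CV (sd / mean) across QC samples is at most max_cv."""
    X_qc = np.asarray(X_qc, dtype=float)
    cv = np.nanstd(X_qc, axis=0, ddof=1) / np.nanmean(X_qc, axis=0)
    return cv <= max_cv
```

Usage: combine the masks and index the columns, e.g. `keep = missingness_filter(X) & cv_filter(X_qc)` followed by `X[:, keep]`, where `X` and `X_qc` share the same metabolite columns.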
Ready to Process?
Clean your data by applying these methods in the "Analysis Options" section during your next upload.