Dataset Cleaning Methods

Processing raw metabolomics data requires careful cleaning to handle missing values and minimize noise.

🩹 Missing Value Imputation
Replaces missing concentration values with statistical estimates. Options include KNN (K-Nearest Neighbors) imputation, median replacement, and half-LOD (Limit of Detection) substitution, which suits low-abundance metabolites whose values are missing because they fell below the detection limit.
Options: KNN, Median, 1/2 LOD
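
As a rough illustration of two of these options, the sketch below imputes missing values (encoded as NaN) in a samples-by-metabolites matrix using NumPy. The function names and the `lod` parameter are hypothetical, and using each metabolite's smallest observed value as a stand-in for the LOD is an assumption made here, not necessarily how this tool works. KNN imputation is not shown.

```python
import numpy as np

def impute_median(X):
    """Replace NaNs in each metabolite column with that column's median."""
    X = np.asarray(X, dtype=float).copy()
    medians = np.nanmedian(X, axis=0)        # per-column median, ignoring NaNs
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X

def impute_half_lod(X, lod=None):
    """Replace NaNs with half the limit of detection (LOD).

    If `lod` is None, the smallest observed value in each column is used
    as a proxy for the LOD (an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float).copy()
    if lod is None:
        lod = np.nanmin(X, axis=0)
    lod = np.broadcast_to(np.asarray(lod, dtype=float), (X.shape[1],))
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = 0.5 * lod[cols]
    return X

# Rows are samples, columns are metabolites.
data = np.array([[1.0, 10.0],
                 [np.nan, 30.0],
                 [3.0, np.nan],
                 [5.0, 20.0]])

median_filled = impute_median(data)
half_lod_filled = impute_half_lod(data)
```

Both functions return a copy, so the original matrix keeps its NaN entries and the two strategies can be compared side by side.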
🔄 Data Transformation
Applies mathematical functions to stabilize variance and make distributions more nearly normal. Log, cube root, and generalized log transformations are commonly used because metabolite concentrations are typically right-skewed and span several orders of magnitude.
Options: Log10, Cube Root, Generalized Log
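
The three transformations can be sketched with NumPy as follows. The function names are hypothetical, and the generalized log shown is one common form, log10((x + sqrt(x^2 + lam)) / 2); the exact formula and the constant `lam` used by this tool are not stated in the source, so treat both as assumptions.

```python
import numpy as np

def log10_transform(X):
    """Base-10 log; only defined for strictly positive values."""
    X = np.asarray(X, dtype=float)
    if np.any(X <= 0):
        raise ValueError("log10 requires strictly positive values")
    return np.log10(X)

def cube_root_transform(X):
    """Cube root; defined for zero and negative values as well."""
    return np.cbrt(np.asarray(X, dtype=float))

def glog_transform(X, lam=1.0):
    """One common generalized log: log10((x + sqrt(x^2 + lam)) / 2).

    `lam` is a hypothetical tuning constant; with lam > 0 the result
    stays finite at x = 0, unlike a plain log.
    """
    X = np.asarray(X, dtype=float)
    return np.log10((X + np.sqrt(X**2 + lam)) / 2.0)

concentrations = np.array([1.0, 10.0, 100.0, 1000.0])
logged = log10_transform(concentrations)
rooted = cube_root_transform(concentrations)
glogged = glog_transform(concentrations, lam=1.0)
```

With `lam=0` and positive inputs the generalized log reduces to a plain log10, which is why it is often described as a log transform that tolerates zeros.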
⚖️ Scaling Methods
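Puts all metabolites on a comparable scale, preventing high-concentration metabolites from dominating the statistical model. Pareto scaling (mean-centering, then dividing by the square root of the standard deviation) and auto-scaling (unit-variance or UV scaling: mean-centering, then dividing by the standard deviation) are standard choices; range scaling divides by the difference between the maximum and minimum instead.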
Options: Pareto, Auto-scaling, Range Scaling
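
A minimal NumPy sketch of the three scaling methods, applied column-wise to a samples-by-metabolites matrix. The function names are hypothetical, the sample standard deviation (`ddof=1`) is an assumption, and the sketch assumes no missing values and no constant columns (which would cause division by zero).

```python
import numpy as np

def auto_scale(X):
    """Mean-center each column and divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pareto_scale(X):
    """Mean-center each column and divide by the square root of its SD."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / np.sqrt(X.std(axis=0, ddof=1))

def range_scale(X):
    """Mean-center each column and divide by its range (max - min)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Two metabolites whose concentrations differ by two orders of magnitude.
data = np.array([[1.0, 100.0],
                 [2.0, 200.0],
                 [3.0, 300.0]])

auto = auto_scale(data)
pareto = pareto_scale(data)
ranged = range_scale(data)
```

After auto-scaling and range scaling the two columns are identical, while Pareto scaling only partly shrinks the high-concentration metabolite, so it retains more of the original variance structure.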
🧹 Feature Filtering
Removes unreliable metabolites based on user-defined thresholds. Standard filters include a maximum Coefficient of Variation (CV) and a maximum allowable percentage of missing values per metabolite.
Options: CV Filter, Missingness Filter
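
The two filters can be sketched as boolean masks over metabolite columns. The function names and the threshold values (`max_missing=0.5`, `max_cv=0.3`) are hypothetical placeholders rather than this tool's defaults, and computing the CV across all samples (instead of, say, only quality-control samples) is an assumption of this sketch.

```python
import numpy as np

def missingness_mask(X, max_missing=0.5):
    """True for columns whose fraction of NaNs is at most `max_missing`."""
    X = np.asarray(X, dtype=float)
    return np.isnan(X).mean(axis=0) <= max_missing

def cv_mask(X, max_cv=0.3):
    """True for columns whose coefficient of variation is at most `max_cv`.

    CV = standard deviation / |mean|, computed while ignoring NaNs.
    """
    X = np.asarray(X, dtype=float)
    mean = np.nanmean(X, axis=0)
    sd = np.nanstd(X, axis=0, ddof=1)
    return sd / np.abs(mean) <= max_cv

# Rows are samples, columns are metabolites.
data = np.array([[10.0,  1.0, 5.0],
                 [11.0, 10.0, 5.5],
                 [ 9.0,  2.0, np.nan],
                 [10.0, 20.0, np.nan],
                 [10.0,  7.0, np.nan]])

keep_missing = missingness_mask(data, max_missing=0.5)
keep_cv = cv_mask(data, max_cv=0.3)
keep = keep_missing & keep_cv        # a metabolite must pass both filters
filtered = data[:, keep]
```

Here the second metabolite is dropped for high variability and the third for having 60% missing values, leaving only the first column.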

Ready to Process?

To clean your data, select these methods in the "Analysis Options" section during your next upload.
