Feature Selection, ML & Stats

Scientific Research Project | 2015

Summary

In this project I implemented a pipeline in R to rank and select proteins using three Machine Learning ranking approaches: multivariate (SVM-RFE), univariate (Beta-Binomial) and semi-multivariate (NSC). The project was developed in collaboration with the Brazilian Biosciences National Laboratory and experts from different fields.

The dataset came from a Discovery Proteomics quantification, which yields a large number of variables (proteins) and a small number of samples. Because this setting makes overfitting likely, we ran the multivariate and semi-multivariate classifiers/rankers inside a Double Cross Validation scheme.
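
To make the idea concrete, here is a minimal sketch of how the sample indices can be split in a double (nested) cross-validation. It is written in Python for illustration only; the original pipeline was in R, and the function names, fold counts and seed handling below are assumptions, not the project's actual code. The point it shows is that ranking and tuning only ever see the inner splits, so the outer test samples stay untouched until the final evaluation.

```python
import random


def k_fold_indices(indices, k, seed=0):
    """Shuffle the given indices and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def double_cv_splits(n_samples, outer_k=5, inner_k=3, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested CV.

    inner_splits is a list of (inner_train, inner_valid) pairs drawn only
    from outer_train, so the outer test samples never influence feature
    ranking or parameter tuning. All names here are hypothetical.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        # Everything outside the current outer fold is available for training.
        outer_train = [
            j for f, fold in enumerate(outer_folds) if f != i for j in fold
        ]
        # Re-split the outer training set for the inner loop.
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for v, inner_valid in enumerate(inner_folds):
            inner_train = [
                j for f, fold in enumerate(inner_folds) if f != v for j in fold
            ]
            inner_splits.append((inner_train, inner_valid))
        yield outer_train, outer_test, inner_splits
```

In use, a ranker would be fitted on each `inner_train`, scored on `inner_valid` to choose how many features to keep, and the resulting model evaluated once on `outer_test`.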

Multidimensional projections and tree visualizations helped us understand the data and the results.

Category
Machine Learning
Feature Selection
Statistics
Visualization

My job
Design
Develop
Research
Report

Technology
R
Visualization
Classifiers
Statistical tests
Multidimensional Projection

Publication

R., Meirelles, G. V., Heberle, H., Domingues, R. R., Granato, D. C., Yokoo, S., … Leme, A. F. P. (2015).
Integrative analysis to select cancer candidate biomarkers to targeted validation.
Oncotarget, 6(41), 43635–43652.

Gallery

Data and Results

(A, B, E) Intensities of proteins in each sample; (B, E) after feature selection. (C) Venn diagram and (D) set similarity compare the three ranking approaches.
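
A comparison like the one in panels (C) and (D) can be sketched as follows. This is an illustrative Python example, not the project's R code: it assumes the set similarity is the Jaccard index (the page does not say which measure was used), and the protein identifiers are made-up placeholders rather than results from the study.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_similarity(selections):
    """Return the Jaccard index for every pair of methods."""
    return {
        (m1, m2): jaccard(selections[m1], selections[m2])
        for m1, m2 in combinations(sorted(selections), 2)
    }


# Hypothetical selections; P1..P6 are placeholder protein identifiers.
selections = {
    "SVM-RFE": {"P1", "P2", "P3", "P4"},
    "Beta-Binomial": {"P2", "P3", "P5"},
    "NSC": {"P3", "P4", "P5", "P6"},
}

similarity = pairwise_similarity(selections)

# Proteins chosen by all three approaches (the centre of the Venn diagram).
consensus = set.intersection(*selections.values())
```

Here `similarity` maps each pair of methods to a value between 0 and 1, and `consensus` holds the proteins that all three rankers agree on.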

Sample similarities

Considering all features (A) and selected features (B, C, D).
