|
General Details |
| Title: |
Bioliterature Mining (reposted) (#3505) |
| Category: |
Computer Technicians, Other, Website Developers |
| Inquirer: |
kenny8684 |
|
Initial Price: |
N/A |
| Description: |
Apply various classification techniques to the following two data sets:
(a) Positive data (all_pos_data1.txt) and negative data (all_neg_data1.txt).
(b) Positive data (all_pos_data2.txt) and negative data (all_neg_data2.txt).
Compare and explain the classification results, using k-fold cross-validation.
Data Format:
(a) Each row is a document.
(b) The first column is the PubMed ID (i.e., the document ID). The last column is the class label. The remaining columns are features.
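The row layout described above could be parsed with something like the following sketch. The posting does not say how columns are delimited or whether features are numeric, so whitespace-separated numeric features are an assumption here, and `parse_row` and `load_rows` are hypothetical helper names.

```python
from typing import List, Tuple


def parse_row(line: str) -> Tuple[str, List[float], str]:
    """Split one document row into (pubmed_id, features, label)."""
    # Assumption: columns are separated by whitespace; change the
    # delimiter if the real files use tabs or commas.
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("expected an ID, at least one feature, and a label")
    # First column is the PubMed ID, last column is the class label,
    # everything in between is a feature.
    pubmed_id, *features, label = fields
    return pubmed_id, [float(value) for value in features], label


def load_rows(path: str) -> List[Tuple[str, List[float], str]]:
    """Read every non-empty line of a data file as one document."""
    with open(path) as handle:
        return [parse_row(line) for line in handle if line.strip()]
```

For example, `load_rows("all_pos_data1.txt")` would return one tuple per document, provided the file really follows the assumed layout.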
See the details below. I have four test files. You must analyze each method on the methods list (Bayesian, LDA, etc.). Some methods can be applied to the test files and some cannot. For each method, the report must state whether the test files can be used with it and explain why or why not. For each method that can be run on the test files, compare and explain the classification results (k-fold cross-validation).
Please analyze each method separately; do not mix them in the report. I will provide my notes.
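As a rough illustration of the k-fold cross-validation asked for here, the sketch below splits the documents into k folds, trains on k-1 of them, and scores the held-out fold. The nearest-centroid classifier is only a simple stand-in and is not one of the listed methods; any of them could be swapped in through the `fit` argument. All function names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def nearest_centroid_fit(X: List[Vector], y: List) -> Callable[[Vector], object]:
    """Stand-in classifier: assign each vector to the closest class mean."""
    groups: Dict[object, List[Vector]] = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }

    def predict(x: Vector):
        # Pick the class whose centroid has the smallest squared distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


def cross_validate(X, y, k: int = 5, seed: int = 0, fit=nearest_centroid_fit) -> List[float]:
    """Return the accuracy on each of the k held-out folds."""
    accuracies = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies
```

Reporting the per-fold accuracies, together with their mean, for each method on each data set gives a like-for-like basis for the comparison.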
Methods list:
Introduction to Statistical Machine Learning (PDF)
Review of Probability and Statistics (PDF)
Bayesian Decision Theory (PDF)
Principal Component Analysis (PDF)
Linear Discriminant Analysis (PDF)
Maximum Likelihood Parameter Estimation (PDF1) (PDF2)
Bayesian Parameter Estimation (PDF)
MCMC (PDF)
Support Vector Machines (PDF)
Hidden Markov Models (PDF)
Bayesian Networks
Boosting
Bootstrap Inference |
| Service Request Expires: |
Question closed |
| End Time: |
2006-05-23 05:29:48 |
| Started: |
2006-05-09 05:29:48
|
| Geography Constraints: |
No Location Requirement |