## Description

Project: Solve a Real Data Mining Problem

In this project, you will apply what you have learned in class to solve a real-world data mining problem. You may choose any problem that interests you, as long as it can be formulated as a data mining task. This is a team project; each team may have at most two members.

Complete the following tasks:

1. Pick a real-world application where data mining may help.

2. Formulate it as a data mining problem (clustering, classification, pattern mining, anomaly detection, recommendation, or a combination of these tasks).

3. Collect relevant datasets. Some possible sources:

• https://archive.ics.uci.edu/ml/datasets.html

• https://kdd.ics.uci.edu/

• https://www.data.gov/

• http://www.kdnuggets.com/datasets/index.html

4. If necessary, preprocess the datasets into a format that data mining algorithms can use.

5. Apply your own implementation or any existing package to solve the proposed problem.

6. Evaluate the data mining results you obtain and discuss them.

7. Prepare a short report covering the key points of your project. Name it project.pdf, project.doc, or project.docx.

8. Log in to any CSE department server and submit your report as follows:

submit_cse469 project.pdf
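
Steps 4 to 6 can be sketched end to end on a toy example. The sketch below is only an illustration under stated assumptions: the six-row dataset is made up, and the helper names (`min_max_scale`, `predict_1nn`, `accuracy`) are hypothetical and not part of any course-provided package. It rescales the features, classifies with 1-nearest-neighbor, and measures accuracy on held-out rows.

```python
import math

# Toy dataset (made up for illustration): (feature vector, class label).
rows = [
    ([1.0, 200.0], "a"),
    ([1.2, 220.0], "a"),
    ([0.9, 210.0], "a"),
    ([3.0, 800.0], "b"),
    ([3.2, 780.0], "b"),
    ([2.9, 820.0], "b"),
]

def min_max_scale(vectors):
    """Step 4 (preprocess): rescale every feature column to [0, 1]."""
    columns = list(zip(*vectors))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(vec, lows, highs)]
        for vec in vectors
    ]

def predict_1nn(train, query):
    """Step 5 (apply): return the label of the closest training point."""
    nearest = min(train, key=lambda item: math.dist(item[0], query))
    return nearest[1]

def accuracy(train, test):
    """Step 6 (evaluate): fraction of test rows labeled correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)

scaled = min_max_scale([x for x, _ in rows])
data = list(zip(scaled, [y for _, y in rows]))

# Hold out one row per class for testing; train on the rest.
test = [data[2], data[5]]
train = [data[0], data[1], data[3], data[4]]

print(f"accuracy = {accuracy(train, test):.2f}")
```

For brevity, the sketch computes the scaling ranges over all rows. In an actual project it is usually safer to compute them from the training rows only, so that no information from the test rows leaks into preprocessing.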

Your report should include the following components.

• Introduction: What data mining problem are you trying to solve? What impact will solving it have?

• Formulation: Which data mining task can the problem be formulated as? What are the input and the expected output?

• Datasets: Where do you get the datasets? Give some statistics about the data. How do you preprocess the data?

• Algorithm: Which data mining algorithm do you apply?

• Experiments: Evaluate the output using an appropriate evaluation metric. Show the results you get and discuss whether they are meaningful.

• (Optional) Challenges: What challenges do you find in the data? How do you tackle these challenges?
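
For the Experiments component, the choice of metric depends on the task. For a classification task with imbalanced classes, accuracy alone can be misleading, so precision, recall, and F1 are common alternatives. The sketch below is one possible way to compute them by hand. The function name `precision_recall_f1` and the label lists are made up for illustration, and an existing package may already provide equivalent functionality.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Made-up labels for illustration only.
y_true = ["spam", "spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "spam", "spam", "ham"]

p, r, f = precision_recall_f1(y_true, y_pred, positive="spam")
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Reporting these values per class, alongside the class distribution from the Datasets component, makes it easier to judge whether the results are meaningful.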