Welcome to the Diamond Price Prediction project, which predicts diamond prices from a dataset of gemstone characteristics. Accurate price predictions can be valuable to both buyers and sellers in the jewelry market. The project uses machine learning to build a predictive model and provides a web application that lets users make price predictions.
- Data
- Exploratory Data Analysis (EDA)
- Model Training
- Web Application
- Project Structure
- Setup
- Usage
- Contributing
- License
This project uses several datasets of gemstone characteristics:
- gemstone.csv: The primary raw dataset, containing a comprehensive set of gemstone characteristics, including carat weight, cut, color, clarity, depth, table, price, and more.
- raw.csv: A slightly modified version of gemstone.csv that can be used for reference or alternative analysis.
- test.csv: A dataset for testing the trained prediction model: its gemstone features are fed to the model to obtain price predictions.
- train.csv: The dataset used to train the machine learning model. It contains a subset of the gemstone data with the corresponding price labels.
The gemstone data describes diamonds, including their price, weight, cut quality, color, clarity, and physical dimensions. Here's a breakdown of the columns:
- price: Price of the diamond in US dollars ($326 - $18,823).
- carat: Weight of the diamond in carats (0.2 - 5.01).
- cut: Quality of the cut, from worst to best: Fair, Good, Very Good, Premium, Ideal.
- color: Diamond color, graded from J (worst) to D (best).
- clarity: How clear the diamond is, from worst to best: I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF.
- x: Length of the diamond in millimeters (0 - 10.74).
- y: Width of the diamond in millimeters (0 - 58.9).
- z: Depth of the diamond in millimeters (0 - 31.8).
- depth: Total depth percentage, calculated as z / mean(x, y) = 2 * z / (x + y) and expressed as a percentage (43 - 79).
- table: Width of the top of the diamond relative to its widest point (43 - 95).
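The ordinal grades and the depth formula above translate directly into code. The following is a minimal sketch, not the project's preprocessing code: the ranking lists are read off the column descriptions (index 0 is the worst grade), and the helper names encode_ordinal and depth_percentage are hypothetical.

```python
# Ordinal rankings read off the column descriptions (index 0 = worst grade).
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]


def encode_ordinal(value: str, order: list) -> int:
    """Map a categorical grade to its rank (0 = worst); raises ValueError if unknown."""
    return order.index(value)


def depth_percentage(x: float, y: float, z: float) -> float:
    """Total depth percentage: z / mean(x, y) * 100, i.e. 2 * z / (x + y) * 100."""
    if x + y <= 0:
        # The x and y ranges start at 0, and such rows have no defined depth.
        raise ValueError("x + y must be positive to compute depth")
    return 2 * z / (x + y) * 100


print(encode_ordinal("Premium", CUT_ORDER))       # 3
print(encode_ordinal("D", COLOR_ORDER))           # 6
print(round(depth_percentage(4.0, 4.0, 2.4), 2))  # 60.0
```

Ranking the grades explicitly, rather than letting an encoder sort them alphabetically, keeps the numeric codes in the same order as the quality scale.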
Explore and analyze the gemstone dataset in EDA.ipynb. This Jupyter Notebook provides insights into the data, including visualizations, summary statistics, and preprocessing steps. It helps prepare the dataset for model training.
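For a quick look at the data outside the notebook, summary statistics like those in EDA.ipynb can be approximated with the standard library alone. This is a minimal sketch, assuming the CSV has a header row with the column names listed above. The function name summarize_price_by_cut is hypothetical, the sample rows are made up for illustration, and the notebook itself may well use pandas instead.

```python
import csv
import io
import statistics
from collections import defaultdict


def summarize_price_by_cut(csv_file) -> dict:
    """Group rows by the 'cut' column and report count, mean and median price."""
    prices = defaultdict(list)
    for row in csv.DictReader(csv_file):
        prices[row["cut"]].append(float(row["price"]))
    return {
        cut: {
            "count": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
        }
        for cut, values in prices.items()
    }


# Made-up sample rows; for the real data pass open(<path to gemstone.csv>, newline="").
sample = io.StringIO(
    "carat,cut,color,clarity,price\n"
    "0.3,Ideal,E,VS1,500\n"
    "0.5,Ideal,G,SI1,1500\n"
    "1.0,Fair,J,I1,2000\n"
)
print(summarize_price_by_cut(sample))
```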
In Model Training.ipynb, you'll find the detailed process of training the diamond price prediction model. This Jupyter Notebook covers various steps, such as data preprocessing, feature engineering, model selection, and evaluation. The model is saved as model.pkl, and the data preprocessing pipeline is saved as preprocessor.pkl within the artifacts/ directory.
- model.pkl: The trained prediction model, used to make predictions on new data.
- preprocessor.pkl: The data preprocessing pipeline that transforms input data before it is passed to the model.
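Using the two artifacts on new data is a two-step call: transform the input with the preprocessor, then pass the result to the model. This is a minimal sketch, assuming both files were written with Python's pickle module and that the objects follow the scikit-learn convention of transform() and predict() methods. If the notebook saved them with joblib, load them with joblib.load instead. The function names are hypothetical.

```python
import pickle
from pathlib import Path


def load_artifacts(artifacts_dir="artifacts"):
    """Load the preprocessing pipeline and the trained model from the artifacts directory."""
    artifacts = Path(artifacts_dir)
    # Only unpickle files you trust: pickle.load can execute arbitrary code.
    with open(artifacts / "preprocessor.pkl", "rb") as f:
        preprocessor = pickle.load(f)
    with open(artifacts / "model.pkl", "rb") as f:
        model = pickle.load(f)
    return preprocessor, model


def predict_prices(features, preprocessor, model):
    """Transform raw features with the fitted preprocessor, then predict prices.

    Assumption: for a scikit-learn pipeline, features would typically be a pandas
    DataFrame with the same columns the preprocessor was fitted on.
    """
    return model.predict(preprocessor.transform(features))


# Usage (assumed): load once at startup, then reuse for every prediction.
# preprocessor, model = load_artifacts("artifacts")
# prices = predict_prices(new_gemstones_df, preprocessor, model)
```

Keeping loading separate from prediction means a web server can read the pickles once instead of on every request.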
The Diamond Price Prediction project includes a web application that allows users to interact with the trained model. The application is implemented in Python using Flask and incorporates machine learning pipelines for real-time predictions.
- src/ contains the source code for the web application:
  - components/: Custom components used in the web app.
  - pipeline/: The data processing and prediction pipeline used by the web application.
  - templates/: HTML templates used to render the web app's user interface.
- application.py is the main Python file for running the web application. To use the web app, run this script.
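The form-handling step that the pipeline/ directory takes care of can be illustrated with a small parser that turns submitted form strings into typed features. This is a hypothetical sketch, not taken from the project's source. The field names are assumed to match the dataset columns, and parse_gemstone_form is an illustrative name. It checks the categorical grades against the values listed for the dataset, so a bad value is rejected before it reaches the model.

```python
NUMERIC_FIELDS = ["carat", "depth", "table", "x", "y", "z"]
CATEGORICAL_FIELDS = {
    "cut": {"Fair", "Good", "Very Good", "Premium", "Ideal"},
    "color": {"D", "E", "F", "G", "H", "I", "J"},
    "clarity": {"I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"},
}


def parse_gemstone_form(form):
    """Convert raw form values (strings) into a dict of typed gemstone features.

    form: any mapping of field name -> string with a .get() method,
    e.g. Flask's request.form. Raises ValueError on a missing field,
    a non-numeric number or an unknown grade.
    """
    features = {}
    for name in NUMERIC_FIELDS:
        raw = form.get(name)
        if raw is None or raw.strip() == "":
            raise ValueError(f"missing field: {name}")
        features[name] = float(raw)  # float() raises ValueError for non-numeric input
    for name, allowed in CATEGORICAL_FIELDS.items():
        value = (form.get(name) or "").strip()
        if value not in allowed:
            raise ValueError(f"invalid {name}: {value!r}")
        features[name] = value
    return features
```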
The project directory structure is as follows:
- src/: Contains the source code for the project.
- Document/: Documentation related to the project.
- .gitignore: Specifies which files should be ignored by Git.
- README.md: You are currently reading this file.
- git/: Contains Git-related files, such as hooks or configuration.
- requirements.txt: Lists the Python dependencies required to run the project. You can install them using pip install -r requirements.txt.
- setup.py: Handles project setup and packaging configuration.
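A setup.py often reads its install_requires list from requirements.txt, so dependencies are declared in one place. The following is a hypothetical sketch of such a helper, not this project's actual setup.py. It skips blank lines, comments, and an editable self-install line such as "-e .", in case requirements.txt contains one, because that line is not a valid install_requires entry.

```python
from pathlib import Path

EDITABLE_INSTALL = "-e ."


def get_requirements(path="requirements.txt"):
    """Return the requirement strings from a requirements file for install_requires."""
    requirements = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        # Skip blank lines, comments and the editable self-install line.
        if not line or line.startswith("#") or line == EDITABLE_INSTALL:
            continue
        requirements.append(line)
    return requirements


# In setup.py this would feed setuptools (package name is hypothetical):
# from setuptools import setup, find_packages
# setup(name="diamond_price_prediction", packages=find_packages(),
#       install_requires=get_requirements())
```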
To set up and run the Diamond Price Prediction project, follow these steps:
- Clone the repository:
  $ git clone https://github.com/yourusername/Diamond-Price-Prediction.git
- Navigate to the project directory:
  $ cd Diamond-Price-Prediction
- Install project dependencies:
  $ pip install -r requirements.txt
- Run the web application:
  $ python application.py
The web application should be accessible at http://localhost:5000 in your web browser.
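To confirm the server is up without opening a browser, a short standard-library check can request the page. This is a minimal sketch, assuming the app listens on the address above and serves a page at the root URL. is_app_running is a hypothetical helper name.

```python
import urllib.request


def is_app_running(url="http://localhost:5000", timeout=3.0):
    """Return True if the URL answers with an HTTP 2xx status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError (connection refused, DNS failure), its subclass HTTPError
        # (4xx/5xx responses) and socket timeouts are all OSError subclasses.
        return False
```

Calling is_app_running() after starting python application.py should return True once the server has finished starting.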
To use the Diamond Price Prediction web application:
- Start the web application by running python application.py, as described in the setup section, and open it in your browser.
- Fill out the gemstone characteristics in the web form.
- Click the "Predict" button to obtain a diamond price prediction from the trained model.
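The same prediction can in principle be requested without a browser by posting the form fields over HTTP. This is a hypothetical sketch: the /predict route and the field names are assumptions, so check them against application.py and the HTML template before relying on it. The function only builds the request; passing it to urllib.request.urlopen sends it.

```python
import urllib.parse
import urllib.request


def build_prediction_request(features, base_url="http://localhost:5000", route="/predict"):
    """Build a POST request that submits gemstone features as URL-encoded form data."""
    body = urllib.parse.urlencode(features).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + route,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_prediction_request(
    {"carat": 0.5, "cut": "Ideal", "color": "E", "clarity": "VS1",
     "depth": 61.5, "table": 55, "x": 5.1, "y": 5.2, "z": 3.2}
)
print(request.full_url, request.get_method())  # http://localhost:5000/predict POST
# With the app running, urllib.request.urlopen(request).read() returns the response body.
```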
Feel free to explore the Jupyter notebooks for EDA and model training for a deeper understanding of the project.
We welcome contributions to this project. If you'd like to contribute, please follow these guidelines:
- Submit bug reports or feature requests through the GitHub issue tracker.
- If you'd like to contribute code, fork the repository, make your changes, and submit a pull request.
- Please adhere to our code of conduct when participating in discussions or contributing to the project.
This project is licensed under the MIT License. For details, please refer to the LICENSE.md file.


