Get started with ML using Python and scikit-learn.
Learn the fundamentals of machine learning with Python and scikit-learn.
Machine learning (ML) enables systems to learn patterns from data instead of following hand-written rules. This tutorial uses scikit-learn, a popular Python library, to build a simple classifier for the iris dataset.
Install scikit-learn and dependencies:
pip install scikit-learn numpy pandas
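To confirm the installation worked, one option is to import each package and print its version. This is a minimal sketch; the `versions` dictionary is just an illustrative name.

```python
import numpy
import pandas
import sklearn

# Collect the installed version of each package the tutorial relies on
versions = {
    "scikit-learn": sklearn.__version__,
    "numpy": numpy.__version__,
    "pandas": pandas.__version__,
}

for name, version in versions.items():
    print(f"{name}: {version}")
```

If any import fails, the corresponding package is not installed in the active Python environment.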
ML models require clean data. The iris dataset ships with scikit-learn and needs no cleaning, so you can load it directly; pandas becomes useful when you load and preprocess your own files:
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
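To see what was loaded, it can help to wrap the arrays in a pandas DataFrame. The sketch below is one way to do that; the variable `df` and the column name `species` are illustrative choices, not part of the steps above.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# One column per feature; the column names come from the dataset itself
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Translate integer targets (0, 1, 2) into species names.
# "species" is a hypothetical column name chosen for this example.
df["species"] = iris.target_names[iris.target]

print(df.shape)                      # (150, 5): 4 features plus the species column
print(df.isna().sum().sum())         # 0 means there are no missing values
print(df["species"].value_counts())  # number of samples per species
```

A check like this is a cheap way to catch missing values or unbalanced classes before training.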
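Split the data into training and testing sets. Here 20% of the samples are held out for testing, so accuracy is measured on data the model has not seen, and random_state fixes the shuffle so the split is reproducible: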
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
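With 150 samples and test_size=0.2, the split should leave 120 rows for training and 30 for testing. The sketch below checks that. It also passes `stratify=y`, which is an addition beyond the call above and keeps the three species equally represented in both sets.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y is an optional extra: it preserves class proportions in each split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
print(np.bincount(y_test))          # [10 10 10]
```

Without `stratify`, a random split of a small dataset can leave one class under-represented in the test set.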
Use a decision tree classifier to predict iris species:
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
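One advantage of a decision tree is that the fitted model can be read directly. The sketch below prints the learned rules with `export_text` and lists the feature importances. Setting `random_state=42` on the classifier is an assumption added here so that repeated runs build the same tree.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# random_state on the model is an addition for reproducibility
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Print the tree as nested if/else rules over the feature values
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)

# Relative contribution of each feature to the splits; the values sum to 1
for name, importance in zip(iris.feature_names, model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```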
Check the model’s accuracy on the held-out test set:
from sklearn.metrics import accuracy_score
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
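A single test set of 30 samples gives a fairly noisy estimate. As a complement, cross-validation averages the score over several splits. This is a sketch using `cross_val_score` with 5 folds; the fold count and the fixed `random_state` are choices made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A fresh, unfitted model; cross_val_score fits a clone on each fold
model = DecisionTreeClassifier(random_state=42)

# cv=5 trains on 4/5 of the data and scores on the remaining 1/5, five times
scores = cross_val_score(model, X, y, cv=5)

print(f"Fold accuracies: {scores}")
print(f"Mean accuracy: {scores.mean():.2f} (std {scores.std():.2f})")
```

The spread between folds gives a rough sense of how much the accuracy depends on which samples end up in the test set.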
Combine the steps into a complete script:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Load data
iris = load_iris()
X = iris.data
y = iris.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
This script loads the iris dataset, trains a decision tree model on 80% of the samples, and evaluates its accuracy on the remaining 20%.
Next, try other algorithms such as logistic regression (LogisticRegression) or support vector machines (SVC), and experiment with datasets from Kaggle.
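Because scikit-learn estimators share the same `fit` and `predict` interface, swapping algorithms takes only a few lines. The sketch below compares three classifiers on the same split. The `candidates` and `results` dictionaries are illustrative names, and `max_iter=1000` is an assumption made so the logistic regression solver has room to converge.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Each estimator exposes the same fit/predict methods
candidates = {
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
}

results = {}
for name, clf in candidates.items():
    clf.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: {results[name]:.2f}")
```

On a dataset as small and clean as iris the scores tend to be close, so differences between algorithms show up more clearly on larger or noisier data.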