Machine Learning Basics

Get started with ML using Python and scikit-learn.

Introduction to Machine Learning

Machine learning (ML) enables systems to learn patterns from data instead of following hand-written rules. This tutorial uses scikit-learn, a popular Python library, to build a simple classification model.

Installing Dependencies

Install scikit-learn, NumPy, and pandas with pip:

pip install scikit-learn numpy pandas
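
To confirm the installation worked, a quick check along these lines can be run. It is only a sketch: it imports the three packages and prints whatever versions happen to be installed; the `versions` dictionary is an illustrative name.

```python
# Sanity check: import each package and report its installed version.
import numpy
import pandas
import sklearn

versions = {
    "scikit-learn": sklearn.__version__,
    "numpy": numpy.__version__,
    "pandas": pandas.__version__,
}

for name, version in versions.items():
    print(f"{name}: {version}")
```

If any import fails with ModuleNotFoundError, the package was probably installed into a different Python environment than the one running the script.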

Data Preprocessing

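ML models require clean data. The iris dataset ships with scikit-learn and is already clean, so it can be loaded directly (pandas becomes useful when you load and clean your own data):

from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # feature matrix: 150 samples x 4 measurements
y = iris.target  # class labels: 0, 1, or 2 (one per species)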
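
Before modeling, it can help to look at the data in tabular form. The following is a minimal sketch, assuming pandas is installed; the `df` name and the `species` column are illustrative choices and not part of the tutorial's pipeline.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Wrap the feature matrix in a DataFrame so columns carry readable names.
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Map the integer labels (0, 1, 2) to species names for readability.
df["species"] = iris.target_names[iris.target]

print(df.head())                     # first five rows
print(df.isna().sum())               # missing values per column
print(df["species"].value_counts())  # number of samples per species
```

For iris, the missing-value counts are all zero and each species has 50 samples, which is why no cleaning step is needed here.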

Split the data into training and testing sets, holding out 20% of the samples for testing:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
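
One way to verify the split is to check the resulting shapes and class counts. The sketch below also passes `stratify=y`, which is an addition not used in the tutorial's split; it asks scikit-learn to keep the class proportions the same in both sets.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y (an assumption, not in the tutorial) preserves class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
print(np.bincount(y_train))         # [40 40 40]
print(np.bincount(y_test))          # [10 10 10]
```

Without `stratify`, the test set still has 30 samples, but the per-class counts may be uneven.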

Training a Model

Use a decision tree classifier to predict iris species:

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(random_state=42)  # fixed seed so the fitted tree is reproducible
model.fit(X_train, y_train)
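
A fitted decision tree can be printed as a set of if/else rules, which makes it easier to see what the model learned. This is a sketch under stated assumptions: `max_depth=2` is an illustrative setting chosen only to keep the output short, and `rules` is a hypothetical variable name.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# max_depth=2 is an illustrative choice to keep the printed tree small.
model = DecisionTreeClassifier(max_depth=2, random_state=42)
model.fit(X_train, y_train)

# Render the learned splits as indented text rules.
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)
print("Depth:", model.get_depth())
```

Each printed line is a threshold test on one measurement, and each leaf shows the class the tree predicts for samples that reach it.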

Evaluating the Model

Measure the model’s accuracy on the held-out test set:

from sklearn.metrics import accuracy_score

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
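
Accuracy is a single number and does not show which species get confused with each other. A confusion matrix and a per-class report give more detail. The sketch below rebuilds the same split and model so it runs on its own; `cm` is an illustrative name.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)

# Precision, recall, and F1 score for each species.
print(classification_report(
    y_test, y_pred, labels=[0, 1, 2], target_names=iris.target_names
))
```

Off-diagonal entries in the matrix count misclassified test samples; a perfect model would have zeros everywhere except the diagonal.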

Practical Example: Predicting Iris Species

Combine the steps into a complete script:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X = iris.data
y = iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = DecisionTreeClassifier(random_state=42)  # fixed seed for reproducible results
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

This script loads the iris dataset, splits it into training and test sets, trains a decision tree model, and evaluates its accuracy on the test set.
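
Once trained, the model can classify measurements it has not seen. The following sketch uses made-up measurements (`new_flower` is a hypothetical sample, not data from the tutorial) and maps the predicted label back to a species name.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Hypothetical flower, in cm: sepal length, sepal width, petal length, petal width.
new_flower = np.array([[5.0, 3.5, 1.4, 0.2]])

# predict() expects a 2D array (one row per sample) and returns one label per row.
predicted_class = model.predict(new_flower)[0]
predicted_name = iris.target_names[predicted_class]
print(f"Predicted species: {predicted_name}")
```

The input must have the same four features, in the same order, as the training data.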

Next Steps

Try other algorithms like logistic regression or SVM, and experiment with datasets from Kaggle.
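
Because scikit-learn estimators share the same fit/predict interface, swapping algorithms takes only a few lines. The sketch below compares the three models mentioned above on the same split; `max_iter=1000` for logistic regression is an assumption chosen to give the solver room to converge, and the default `SVC` settings are used as-is.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Candidate models; all expose the same fit() and predict() methods.
models = {
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
}

# Train each model on the same data and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.2f}")
```

With only 30 test samples, small differences between the scores are not meaningful; a larger dataset or cross-validation would give a fairer comparison.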