
# ML: K-Nearest Neighbors (KNN)

K-Nearest Neighbors (KNN) is a simple and widely used algorithm for both classification and regression. It is a supervised learning method. Its core idea is to compute the distance between the sample to be predicted and every sample in the training set, find the K closest samples, and then predict the sample's class or value from the classes or values of those K neighbors.

### KNN Basic Principles

The KNN algorithm can be summarized in the following steps:

1. **Calculate Distances**: Compute the distance between the sample to be classified and each sample in the training set. Common distance measures include Euclidean distance (the square root of the sum of squared coordinate differences) and Manhattan distance (the sum of absolute coordinate differences).
2. **Select the K Nearest Neighbors**: Based on these distances, select the K samples with the smallest distances.
3. **Vote or Average**: For classification, the class that appears most often among the K nearest neighbors is assigned to the sample (majority vote). For regression, the prediction is the average of the K neighbors' target values.

### KNN Characteristics

* **Simple and Easy to Understand**: The principle behind KNN is very simple, and the algorithm is easy to implement.
* **No Explicit Training Phase**: KNN is a "lazy learning" algorithm. Training only stores the data; all distance computations are performed at prediction time.
* **No Assumptions About Data Distribution**: KNN is non-parametric and makes no assumptions about how the data are distributed, although it does require a distance measure that is meaningful for the features.
* **High Prediction Cost**: Because a brute-force implementation computes the distance to every training sample for each prediction, prediction can be slow when the dataset is large.

### KNN Algorithm Advantages and Disadvantages

**Advantages**

* **Simple and Easy to Use**: The algorithm is easy to understand and implement.
* **No Explicit Training Required**: Training only stores the data; all computation happens at prediction time.
* **Naturally Handles Multi-class Problems**: Majority voting works the same way for any number of classes.

**Disadvantages**

* **High Computational Cost**: Brute-force prediction computes the distance from each query to every training sample, roughly O(n·d) work per query for n training samples with d features, which becomes expensive on large datasets.
* **Sensitive to Noise**: Noisy or mislabeled samples can sway the vote and degrade predictions, especially when K is small.
* **Need to Choose an Appropriate K**: The value of K has a significant impact on performance. A small K tends to follow noise in the training data, while a large K smooths over class boundaries, so choosing K well is a challenge.

* * *

## KNN Algorithm Implementation Steps

### 1. Import Necessary Libraries

First, import a few commonly used Python libraries: `numpy` for numerical computation, `matplotlib` for plotting, and `sklearn` (scikit-learn) for loading the dataset, building the model, and evaluating it.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
```

### 2. Load Dataset

Use the `load_iris` function from `sklearn` to load the classic Iris dataset. It contains 150 samples, each with 4 features, and the goal is to classify each sample into one of 3 categories. To make the results easy to visualize later, only the first two features (sepal length and sepal width) are kept.

```python
# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # Keep only the first two features for visualization
y = iris.target
```

### 3. Data Preprocessing

Because KNN relies on distances, features are usually standardized before applying it so that each feature contributes comparably to the distance calculation. The two features selected here are both measured in centimeters, so this example skips scaling and only splits the data, holding out 30% of the samples as a test set.

```python
# Split the dataset into a training set (70%) and a test set (30%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```

### 4. Train KNN Model

Next, use `KNeighborsClassifier` from `sklearn` to build the KNN model. Here we choose K=3, so each prediction is based on the 3 nearest neighbors. Calling `fit` does not learn any weights: it stores the training data (and, depending on the `algorithm` setting, may build a search structure such as a KD-tree) for use at prediction time.

```python
# Create a KNN model with K = 3
knn = KNeighborsClassifier(n_neighbors=3)
# "Train" the model (stores the training samples)
knn.fit(X_train, y_train)
```

### 5. Prediction and Evaluation

Use the fitted model to make predictions on the test set and compute the model's accuracy.

```python
# Predict labels for the test set
y_pred = knn.predict(X_test)
# Compute accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"KNN Model Accuracy: {accuracy:.4f}")
```

The output is as follows:

```
KNN Model Accuracy: 0.7556
```

### 6. Visualize KNN Classification Results

To see the classification behavior more intuitively, plot the data points together with the decision boundaries. Because the model uses only the first two features of the dataset, its decision regions can be drawn in the plane. The complete script is:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # Keep only the first two features for visualization
y = iris.target

# Split the dataset into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create a KNN model with K = 3 and fit it
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Predict the test set and compute accuracy
y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"KNN Model Accuracy: {accuracy:.4f}")

# Plot decision boundaries and data points
h = 0.02  # Grid step size
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
# Build a 2-D grid covering the feature space
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
# Use the KNN model to predict the class of every grid point
Z = knn.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot the decision regions
plt.contourf(xx, yy, Z, alpha=0.8)
# Plot all data points, colored by their true class
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o', s=50)
plt.title("KNN Demo")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
```

The resulting plot is:

*[Figure: "KNN Demo", decision regions of the K=3 model over Feature 1 and Feature 2, with the Iris samples overlaid]*

### 7. Adjust K Value

The choice of K has an important impact on model performance. In practice, the best K is usually selected with cross-validation or by plotting a validation metric against K. The example below plots test-set accuracy for K = 1 to 20. Choosing K by test-set accuracy makes that accuracy an optimistic estimate, so cross-validation on the training set is the safer way to select K.

```python
# Try different K values and record the test accuracy for each
k_range = range(1, 21)
accuracies = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    accuracies.append(accuracy_score(y_test, y_pred))

# Plot accuracy as a function of K
plt.plot(k_range, accuracies, marker='o')
plt.title("Relationship Between K Value and Accuracy")
plt.xlabel("K Value")
plt.ylabel("Accuracy")
plt.show()
```

### 8. Use KNN for Regression Tasks

KNN can also be used for regression (KNN regression). In regression tasks, KNN predicts the output by averaging the target values of the K nearest neighbors.
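The averaging rule can be written out directly, following the three steps from the basic principles (compute distances, pick the K nearest, average their targets). The sketch below is a brute-force NumPy version, assuming Euclidean distance and an unweighted mean; these correspond to `KNeighborsRegressor`'s defaults (`weights='uniform'`, Minkowski metric with `p=2`). The function name `knn_regress` and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np


def knn_regress(X_train, y_train, X_query, k=5):
    """Hypothetical helper: predict each query point as the mean target of
    its k nearest training points (Euclidean distance, unweighted mean)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)

    # Step 1: pairwise Euclidean distances, shape (n_query, n_train)
    diffs = X_query[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))

    # Step 2: indices of the k smallest distances in each row
    # (stable sort so ties resolve to the earlier training sample)
    nearest = np.argsort(dists, axis=1, kind="stable")[:, :k]

    # Step 3: average the neighbors' target values
    return y_train[nearest].mean(axis=1)


# Toy 1-D data where the target is y = 2x
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
y_train = np.array([0.0, 2.0, 4.0, 6.0, 20.0])
X_query = np.array([[1.1], [9.0]])
print(knn_regress(X_train, y_train, X_query, k=3))  # prints [ 2. 10.]
```

For the query at 9.0, the three nearest training points are 10, 3 and 2, so the prediction is (20 + 6 + 4) / 3 = 10, well below the "true" 18: with sparse data, KNN regression simply averages whatever neighbors exist. Replacing the final mean with a majority vote over integer labels (for example via `np.bincount`) turns the same sketch into a classifier.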
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Generate noisy samples of y = sin(x) on [0, 10)
rng = np.random.default_rng(42)  # Fixed seed for reproducible data
X = rng.random((100, 1)) * 10
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(100)

# Split into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create a KNN regression model with K = 5 and fit it
knn_reg = KNeighborsRegressor(n_neighbors=5)
knn_reg.fit(X_train, y_train)

# Predict the test set
y_pred = knn_reg.predict(X_test)

# Visualize regression results
plt.scatter(X_test, y_test, color='red', label='True Values')
plt.scatter(X_test, y_pred, color='blue', label='Predicted Values')
plt.title("KNN Regression")
plt.xlabel("Feature")
plt.ylabel("Target")
plt.legend()
plt.show()
```

Red points are the true values and blue points are the predicted values:

*[Figure: "KNN Regression", true (red) and predicted (blue) test-set targets plotted against the feature]*
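Returning to classification: the tutorial recommends standardizing features (step 3) and selecting K by cross-validation (step 7), but the classification code does neither. A minimal sketch combining both, assuming scikit-learn's `Pipeline`, `StandardScaler`, and `GridSearchCV` (none of which appear in the original code) and the same two-feature Iris split as before:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same data and split as the classification example
iris = load_iris()
X = iris.data[:, :2]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scaler + KNN in one pipeline: the scaler is re-fit on each CV training
# fold, so validation folds never influence the mean/std used for scaling
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# 5-fold cross-validation over K = 1..20, using only the training set
param_grid = {"knn__n_neighbors": list(range(1, 21))}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
# GridSearchCV refits the best pipeline on the whole training set by default;
# the test set is touched only once, after K has been chosen
test_accuracy = search.score(X_test, y_test)
print(f"Best K by 5-fold CV: {best_k}")
print(f"Test accuracy with best K: {test_accuracy:.4f}")
```

The selected K and the resulting test accuracy depend on the split and are not claimed here. The design point is that the test set plays no role in choosing K, so its accuracy remains an honest estimate.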