1. Reading CSV data for DBSCAN analysis
Read the relevant columns from the CSV file, transform and process them into the format the algorithm requires, and then run DBSCAN on them. There is quite a lot of public code for this; the code in this article was adapted from a public implementation. The specific code is as follows:
from sklearn import datasets
import numpy as np
import random
import matplotlib.pyplot as plt
import time
import copy
import pandas as pd
# from sklearn.datasets import load_iris


def find_neighbor(j, x, eps):
    N = list()
    for i in range(x.shape[0]):
        temp = np.sqrt(np.sum(np.square(x[j] - x[i])))  # Calculate the Euclidean distance
        if temp <= eps:
            N.append(i)
    return set(N)


def DBSCAN(X, eps, min_Pts):
    k = -1
    neighbor_list = []  # Used to keep the neighborhood of each data point
    omega_list = []  # Core object set
    gama = set([x for x in range(len(X))])  # Initially mark all points as unvisited
    cluster = [-1 for _ in range(len(X))]  # Cluster labels
    for i in range(len(X)):
        neighbor_list.append(find_neighbor(i, X, eps))
        if len(neighbor_list[-1]) >= min_Pts:
            omega_list.append(i)  # Add the sample to the core object set
    omega_list = set(omega_list)  # Converted to a set for ease of manipulation
    while len(omega_list) > 0:
        gama_old = copy.deepcopy(gama)
        j = random.choice(list(omega_list))  # A randomly selected core object
        k = k + 1
        Q = list()
        Q.append(j)
        gama.remove(j)
        while len(Q) > 0:
            q = Q[0]
            Q.remove(q)
            if len(neighbor_list[q]) >= min_Pts:
                delta = neighbor_list[q] & gama
                deltalist = list(delta)
                for i in range(len(delta)):
                    Q.append(deltalist[i])
                gama = gama - delta
        Ck = gama_old - gama
        Cklist = list(Ck)
        for i in range(len(Ck)):
            cluster[Cklist[i]] = k
        omega_list = omega_list - Ck
    return cluster


# X = load_iris().data
data = pd.read_csv("")  # CSV path left blank in the original
x, y = data['Time (sec)'], data['Height (m HAE)']
print(type(x))
n = len(x)
x = np.array(x)
x = x.reshape(n, 1)
y = np.array(y)
y = y.reshape(n, 1)
X = np.hstack((x, y))
# Leftover fragment from the public demo this was adapted from, likely:
# X, _ = datasets.make_blobs(n_samples=400, n_features=2, centers=[[1.2, 1.2]],
#                            cluster_std=[[.1]], random_state=9)
eps = 0.08
min_Pts = 5
begin = time.time()
C = DBSCAN(X, eps, min_Pts)
end = time.time()
print(end - begin)
plt.scatter(X[:, 0], X[:, 1], c=C)
plt.show()
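The CSV path is left blank above; the script only needs a file containing the two columns that are read, 'Time (sec)' and 'Height (m HAE)'. To try the code without real data, a small synthetic file can be generated first. This is a minimal sketch: the file name sample.csv and the value ranges are invented for illustration, not part of the original article.

import numpy as np
import pandas as pd

# Write a small test CSV with the two columns the script expects (made-up values)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Time (sec)': rng.uniform(0, 10, 200),
    'Height (m HAE)': rng.uniform(0, 5, 200),
})
df.to_csv("sample.csv", index=False)  # then pass "sample.csv" to pd.read_csv above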
2. Displaying the output results
Modify the parameters and display the result again:
eps = 0.8
min_Pts = 5
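Because eps controls the neighborhood radius, even a modest change (0.08 versus 0.8 here) can merge or split clusters. One way to see this is to run the DBSCAN function defined above for a few candidate values and plot the results side by side. A minimal sketch follows; the eps values are illustrative, not tuned:

import matplotlib.pyplot as plt

# Compare clusterings for several candidate eps values (illustrative choices)
for i, eps in enumerate([0.08, 0.4, 0.8]):
    C = DBSCAN(X, eps, min_Pts=5)
    plt.subplot(1, 3, i + 1)
    plt.scatter(X[:, 0], X[:, 1], c=C, s=5)
    plt.title("eps = %.2f" % eps)
plt.show()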
3. Computational efficiency
With a small amount of data the efficiency problem is not obvious, but as the data volume grows, computational efficiency becomes a serious bottleneck, and the code struggles to meet the needs of large-scale calculations. Later I will try to optimize the calculation method or rewrite the hot path in C++.
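Before rewriting anything in C++, a cheaper first step is to eliminate the O(n^2) pure-Python loop in find_neighbor: scikit-learn's built-in DBSCAN uses a tree-based neighbor index internally and runs in compiled code. A minimal sketch of this swap, not the article's own implementation; note that sklearn names the parameter min_samples rather than min_Pts:

from sklearn.cluster import DBSCAN as SklearnDBSCAN

# scikit-learn builds a KD-tree/ball-tree neighbor index instead of
# computing all pairwise distances in pure Python
model = SklearnDBSCAN(eps=0.08, min_samples=5)
labels = model.fit_predict(X)  # array of cluster labels; -1 marks noise points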
This concludes the article on using Python to read a CSV file and run DBSCAN analysis. For more on DBSCAN analysis in Python, please search my previous articles or continue to browse the related articles below. I hope you will support me in the future!