SoFunction
Updated on 2024-11-21

Reading a CSV file and running DBSCAN analysis in Python

1. Reading CSV data for DBSCAN analysis

Read the relevant columns from the CSV file, transform them into the format the algorithm requires, and then run DBSCAN. There is already plenty of public code for this; this article is a modification of one of those public implementations.

The specific code is as follows:

from sklearn import datasets
import numpy as np
import random
import matplotlib.pyplot as plt
import time
import copy
import pandas as pd
# from sklearn.datasets import load_iris
 
def find_neighbor(j, x, eps):
    N = list()
    for i in range(x.shape[0]):
        temp = np.sqrt(np.sum(np.square(x[j] - x[i])))  # Calculate the Euclidean distance
        if temp <= eps:
            N.append(i)
    return set(N)
 
 
def DBSCAN(X, eps, min_Pts):
    k = -1
    neighbor_list = []  # Used to keep the neighborhood of each data
    omega_list = []  # Core set of objects
    gama = set([x for x in range(len(X))])  # Initially mark all points as unvisited
    cluster = [-1 for _ in range(len(X))]  # Clustering
    for i in range(len(X)):
        neighbor_list.append(find_neighbor(i, X, eps))
        if len(neighbor_list[-1]) >= min_Pts:
            omega_list.append(i)  # Add samples to the core object set
    omega_list = set(omega_list)  # Converted to collections for ease of manipulation
    while len(omega_list) > 0:
        gama_old = copy.deepcopy(gama)
        j = random.choice(list(omega_list))  # A randomly selected core object
        k = k + 1
        Q = list()
        Q.append(j)
        gama.remove(j)
        while len(Q) > 0:
            q = Q[0]
            Q.remove(q)
            if len(neighbor_list[q]) >= min_Pts:
                delta = neighbor_list[q] & gama
                deltalist = list(delta)
                for i in range(len(delta)):
                    Q.append(deltalist[i])
                    gama = gama - delta
        Ck = gama_old - gama
        Cklist = list(Ck)
        for i in range(len(Ck)):
            cluster[Cklist[i]] = k
        omega_list = omega_list - Ck
    return cluster
 
# X = load_iris().data
data = pd.read_csv("")  # CSV path left blank in the original
x, y = data['Time (sec)'], data['Height (m HAE)']
print(type(x))
n = len(x)
x = np.array(x)
x = x.reshape(n, 1)
y = np.array(y)
y = y.reshape(n, 1)
X = np.hstack((x, y))  # Combine the two columns into an (n, 2) array
 
eps = 0.08
min_Pts = 5
begin = time.time()
C = DBSCAN(X, eps, min_Pts)
end = time.time()
plt.figure()
plt.scatter(X[:, 0], X[:, 1], c=C)
plt.show()
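As a sanity check, the same kind of clustering can be reproduced with scikit-learn's built-in DBSCAN. This is only a sketch: since the article's CSV file is not included, synthetic blobs from make_blobs stand in for the 'Time (sec)' and 'Height (m HAE)' columns, and eps/min_samples are chosen to suit that synthetic data.

```python
import numpy as np
from sklearn.cluster import DBSCAN as SklearnDBSCAN
from sklearn.datasets import make_blobs

# Synthetic stand-in for the CSV columns: three tight, well-separated blobs.
X, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.1, random_state=9)

labels = SklearnDBSCAN(eps=0.3, min_samples=5).fit_predict(X)
# scikit-learn labels noise points -1 and clusters 0, 1, 2, ...
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters)
```

The library version uses the same eps/min_samples parameterisation as the hand-written code above, so the two can be swapped once the hand-written version has been understood.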

2. Displaying the output

Results shown with modified parameters:

eps = 0.8
min_Pts = 5
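The effect of changing eps can be seen by sweeping it while holding min_Pts fixed and counting the resulting clusters. A hedged sketch, again on synthetic stand-in data rather than the article's CSV, using scikit-learn's DBSCAN for brevity:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Three well-separated blobs as a stand-in for the CSV data.
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [3, 3], [-3, 3]],
                  cluster_std=0.4, random_state=9)

counts = {}
for eps in (0.08, 0.8):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    counts[eps] = len(set(labels)) - (1 if -1 in labels else 0)
print(counts)
```

With a tiny eps most points fail the min_Pts density test and are marked noise; with the larger eps each blob is recovered as one cluster. On real data, eps must be tuned to the scale of the measured columns.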

3. Computational efficiency

With small datasets the efficiency problem is not noticeable, but as the data volume grows it becomes pronounced, and this pure-Python implementation struggles to handle large datasets. A later article will try to optimise the computation or call into C++ code.
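One optimisation that is available before reaching for C++ is to vectorise the neighbourhood search: the per-pair Python loop in find_neighbor can be replaced with a single broadcasted NumPy distance matrix. This is a sketch under the assumption that the full n×n matrix fits in memory, so it suits moderate n only:

```python
import numpy as np

def find_neighbors_vectorized(X, eps):
    """Return, for every point, the set of indices within eps (self included).

    Equivalent to calling find_neighbor(j, X, eps) for every j, but computed
    with one broadcasted pairwise-distance matrix instead of n Python loops.
    Memory cost is O(n^2), so this suits moderate n only.
    """
    diff = X[:, None, :] - X[None, :, :]        # shape (n, n, d)
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))  # pairwise Euclidean distances
    return [set(np.flatnonzero(row <= eps)) for row in dist]
```

The list returned here can replace neighbor_list in the DBSCAN function above, removing the O(n) calls to find_neighbor. For genuinely large datasets, a spatial index (e.g. a k-d tree, as used internally by scikit-learn's DBSCAN) avoids the quadratic memory cost entirely.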

This concludes the article on reading a CSV file and performing DBSCAN analysis in Python. For more on DBSCAN analysis in Python, please search my previous articles or continue browsing the related articles below. I hope you will support me in the future!