AI Gradient Descent Boosting Classification (feat. a CSV file)

공기반 코딩반 2022. 6. 8. 01:30

Q. Using the Universal Bank data, decide whether a loan should be approved for a given customer.

First, import the required libraries.

import os

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

%matplotlib inline

Load the UniversalBank data

dataBank = pd.read_csv("C:/AI_2022/Untitled Folder/UniversalBank.csv", encoding = 'utf-8')

dataBank

--> We can see that 5,000 rows were loaded.
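A quick way to confirm the row count without scrolling through the printed table is to check `shape`. The sketch below uses a tiny inline CSV as a stand-in; the `csv_text` values and the `sample` name are made up for illustration and are not taken from UniversalBank.csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for UniversalBank.csv; values are invented for illustration.
csv_text = """ID,Age,Income,Personal Loan
1,25,49,0
2,45,34,0
3,39,11,1
"""

sample = pd.read_csv(io.StringIO(csv_text))

n_rows, n_cols = sample.shape   # (number of rows, number of columns)
print(n_rows, n_cols)
print(sample.dtypes)            # check how each column was parsed
```

On the real file, `dataBank.shape[0]` should report the 5,000 rows mentioned above.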

# 1. Remove variables judged unnecessary

dataBank = dataBank.drop(['ID', 'ZIP Code', 'Online'], axis = 1)

dataBank

--> The ID, ZIP Code, and Online variables were dropped (they were judged unnecessary as criteria for deciding on a loan).
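The drop step can be tried on a small frame first. This is a minimal sketch on invented values; `df`, `to_drop`, and `reduced` are illustrative names, and `drop(columns=...)` is used as an equivalent spelling of `drop(..., axis = 1)`.

```python
import pandas as pd

# Hypothetical mini version of the table; values are invented.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "ZIP Code": [91107, 90089, 94720],
    "Online": [0, 1, 0],
    "Income": [49, 34, 11],
    "Personal Loan": [0, 0, 1],
})

to_drop = ["ID", "ZIP Code", "Online"]
# drop returns a new DataFrame; the original is left untouched unless reassigned.
reduced = df.drop(columns=to_drop)

print(list(reduced.columns))
```

Because `drop` does not modify the frame in place, the post reassigns the result to `dataBank`.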

# 2. Dummy variables: separate Education levels 1, 2, and 3

Dummy variables ==> turn a categorical variable into numeric indicator columns (expressed as 0 and 1)

# Displays the encoded table only; the result is not assigned back to dataBank
pd.get_dummies(dataBank, columns = ['Education'])

1, 2, and 3 stand for college graduate, graduate-school graduate, and professional, respectively.

They are encoded as 1 0 0 / 0 1 0 / 0 0 1, respectively.
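To carry the encoded columns into later steps, the result of `pd.get_dummies` has to be assigned to a variable. A minimal sketch on a made-up frame (the `Income` values and the `encoded` name are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Income": [49, 34, 11], "Education": [1, 2, 3]})

# get_dummies returns a new DataFrame, so assign the result to keep it.
# dtype=int forces 0/1 output (newer pandas versions default to True/False).
encoded = pd.get_dummies(df, columns=["Education"], dtype=int)

print(encoded)
```

Each Education level becomes its own column (`Education_1`, `Education_2`, `Education_3`), and exactly one of them is 1 in each row.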

# 3. Visualize the y values

# Drop the Personal Loan column from X

X = dataBank.drop(['Personal Loan'], axis=1)

Y = dataBank['Personal Loan']

np.unique(Y,return_counts=True)

sns.countplot(x=Y)
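Alongside the count plot, the class balance can be read off as numbers. The sketch below uses invented labels in place of `Y`; `shares` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Hypothetical labels; the real Y comes from dataBank['Personal Loan'].
y = pd.Series([0, 0, 0, 0, 1, 0, 0, 1, 0, 0], name="Personal Loan")

values, counts = np.unique(y, return_counts=True)      # same call as in the post
shares = y.value_counts(normalize=True).sort_index()   # fraction of rows per class

print(dict(zip(values.tolist(), counts.tolist())))
print(shares)
```

Knowing the share of each class matters later, because accuracy alone can look high when one class dominates.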

# 5. Select Gradient Boosting (GradientBoostingClassifier, aliased as GDB)

from sklearn.ensemble import GradientBoostingClassifier as GDB
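For intuition about what the imported model does, here is a hand-rolled sketch of the boosting idea: each new small tree is fitted to the errors left by the trees before it. This is a simplified regression version with squared loss on synthetic data, not the internals of `GradientBoostingClassifier` (which optimizes a classification loss); the data, `learning_rate`, and the tree depth are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression data (not the bank data).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())    # start from a constant model
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(50):
    residual = y - prediction             # negative gradient of 0.5 * (y - f)^2
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residual)                 # the next tree learns the remaining error
    prediction += learning_rate * tree.predict(X)
    mse_history.append(np.mean((y - prediction) ** 2))

print(mse_history[0], mse_history[-1])
```

The training error shrinks as trees are added, which is the behavior the library model builds on.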

# 6. Train-test split with test_size = 0.25

from sklearn.model_selection import train_test_split

tr_x,ts_x, tr_y, ts_y = train_test_split(X,Y, test_size = 0.25 ,random_state = 48)

gdb = GDB(random_state=1)

gdb.fit(tr_x, tr_y)
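When the classes are imbalanced, one option is to pass `stratify` to `train_test_split` so both splits keep the same class ratio. The post does not do this; the sketch below is a variation on synthetic data from `make_classification`, and the `*_demo`, `X_tr`, and `model` names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (roughly 10% positives); not the bank data.
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=8, weights=[0.9, 0.1], random_state=0
)

# stratify=y_demo keeps the class ratio the same in both splits.
X_tr, X_ts, y_tr, y_ts = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=48
)

model = GradientBoostingClassifier(random_state=1)
model.fit(X_tr, y_tr)

print(len(X_tr), len(X_ts))          # 750 training rows, 250 test rows
print(y_tr.mean(), y_ts.mean())      # share of positives in each split
```

With `test_size = 0.25`, the 5,000-row bank data yields 1,250 test rows, which matches the total of the confusion matrix later in the post.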

- Compare ts_y and pred_y

ts_y

pred_y = gdb.predict(ts_x)
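Printing `ts_y` and `pred_y` separately makes them hard to compare by eye. One way to line them up is to put both in a single DataFrame and filter the rows where they disagree. The labels below are invented stand-ins for `ts_y` and `pred_y`.

```python
import pandas as pd

# Hypothetical labels standing in for ts_y and pred_y.
actual = pd.Series([0, 0, 1, 1, 0, 1], index=[10, 11, 12, 13, 14, 15], name="actual")
predicted = [0, 1, 1, 0, 0, 1]   # predict() returns a plain array without an index

# Use positional values so rows pair up by order, then reattach the original index.
comparison = pd.DataFrame(
    {"actual": actual.to_numpy(), "predicted": predicted},
    index=actual.index,
)
mismatches = comparison[comparison["actual"] != comparison["predicted"]]

print(mismatches)
print(pd.crosstab(comparison["actual"], comparison["predicted"]))
```

The filtered rows keep their original index, so misclassified customers can be looked up in the source table.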

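# 7. Validation: using the F1, recall, and accuracy metrics

from sklearn.metrics import accuracy_score, recall_score, f1_score, confusion_matrix

acc = accuracy_score(y_true=ts_y, y_pred=pred_y)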

recall = recall_score(y_true=ts_y, y_pred=pred_y,pos_label=0)

f1 = f1_score(y_true=ts_y, y_pred=pred_y,pos_label=0)

Result

print('acc={:.3f}, recall={:.3f}, f1={:.3f}'.format(acc,recall,f1))

acc=0.987, recall=0.995, f1=0.993

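==> Accuracy is quite high. Note that recall and F1 here use pos_label=0, so they describe the majority class (0: loan denied), not class 1.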

- confusion matrix

print(confusion_matrix(y_true=ts_y, y_pred=pred_y,labels=[0,1]))

[[1121    6]
 [  10  113]]

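(predicted 0, actual 0): 1121 / (predicted 1, actual 1): 113 / (predicted 1, actual 0): 6 / (predicted 0, actual 1): 10

0: loan denied, 1: loan approved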
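The reported scores can be re-derived from the confusion matrix by hand, which also shows what the same metrics look like for class 1. This sketch only uses the four counts printed above; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix reported above: rows = actual class, columns = predicted class.
cm = np.array([[1121, 6],
               [10, 113]])

accuracy = np.trace(cm) / cm.sum()

recall_0 = cm[0, 0] / cm[0].sum()          # recall with pos_label=0
precision_0 = cm[0, 0] / cm[:, 0].sum()
f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)

recall_1 = cm[1, 1] / cm[1].sum()          # recall with pos_label=1
precision_1 = cm[1, 1] / cm[:, 1].sum()
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print('acc={:.3f}, recall_0={:.3f}, f1_0={:.3f}'.format(accuracy, recall_0, f1_0))
print('recall_1={:.3f}, f1_1={:.3f}'.format(recall_1, f1_1))
```

The first line reproduces acc=0.987, recall=0.995, f1=0.993. For class 1 the same matrix gives recall 0.919 and F1 0.934, so the model misses 10 of the 123 actual class-1 customers in the test set.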
