[Pandas] Apply, Map Practice

Linuxias 2023. 4. 2. 11:46

This post works through the exercises at https://www.datamanim.com/dataset/99_pandas/pandasMain.html#apply-map.

Import library

import pandas as pd

Load Data

df = pd.read_csv('https://raw.githubusercontent.com/Datamanim/pandas/main/BankChurnersUp.csv',index_col=0)

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10127 entries, 0 to 10126
Data columns (total 18 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   CLIENTNUM                 10127 non-null  int64  
 1   Attrition_Flag            10127 non-null  object 
 2   Customer_Age              10127 non-null  int64  
 3   Gender                    10127 non-null  object 
 4   Dependent_count           10127 non-null  int64  
 5   Education_Level           10127 non-null  object 
 6   Marital_Status            10127 non-null  object 
 7   Income_Category           10127 non-null  object 
 8   Card_Category             10127 non-null  object 
 9   Months_on_book            10127 non-null  int64  
 10  Total_Relationship_Count  10127 non-null  int64  
 11  Months_Inactive_12_mon    10127 non-null  int64  
 12  Contacts_Count_12_mon     10127 non-null  int64  
 13  Credit_Limit              10127 non-null  float64
 14  Total_Revolving_Bal       10127 non-null  int64  
 15  Avg_Open_To_Buy           10127 non-null  float64
 16  Total_Amt_Chng_Q4_Q1      10127 non-null  float64
 17  Total_Trans_Amt           10127 non-null  int64  
dtypes: float64(3), int64(9), object(6)
memory usage: 1.5+ MB

df.describe()

	CLIENTNUM	Customer_Age	Dependent_count	Months_on_book	Total_Relationship_Count	Months_Inactive_12_mon	Contacts_Count_12_mon	Credit_Limit	Total_Revolving_Bal	Avg_Open_To_Buy	Total_Amt_Chng_Q4_Q1	Total_Trans_Amt
count	1.012700e+04	10127.000000	10127.000000	10127.000000	10127.000000	10127.000000	10127.000000	10127.000000	10127.000000	10127.000000	10127.000000	10127.000000
mean	7.391776e+08	46.325960	2.346203	35.928409	3.812580	2.341167	2.455317	8631.953698	1162.814061	7469.139637	0.759941	4404.086304
std	3.690378e+07	8.016814	1.298908	7.986416	1.554408	1.010622	1.106225	9088.776650	814.987335	9090.685324	0.219207	3397.129254
min	7.080821e+08	26.000000	0.000000	13.000000	1.000000	0.000000	0.000000	1438.300000	0.000000	3.000000	0.000000	510.000000
25%	7.130368e+08	41.000000	1.000000	31.000000	3.000000	2.000000	2.000000	2555.000000	359.000000	1324.500000	0.631000	2155.500000
50%	7.179264e+08	46.000000	2.000000	36.000000	4.000000	2.000000	2.000000	4549.000000	1276.000000	3474.000000	0.736000	3899.000000
75%	7.731435e+08	52.000000	3.000000	40.000000	5.000000	3.000000	3.000000	11067.500000	1784.000000	9859.000000	0.859000	4741.000000
max	8.283431e+08	73.000000	5.000000	56.000000	6.000000	6.000000	6.000000	34516.000000	2517.000000	34516.000000	3.397000	18484.000000

Q) Use the map function to recode the categories of Income_Category as follows, and store the result in a newIncome column.

Unknown : N
Less than $40K : a
$40K - $60K : b
$60K - $80K : c
$80K - $120K : d
$120K + : e

df.Income_Category.unique()

array(['$60K - $80K', 'Less than $40K', '$80K - $120K', '$40K - $60K',
       '$120K +', 'Unknown'], dtype=object)

df['newIncome'] = df.Income_Category.map({
    'Unknown'        : 'N',
    'Less than $40K' : 'a',
    '$40K - $60K'    : 'b',
    '$60K - $80K'    : 'c',
    '$80K - $120K'   : 'd',
    '$120K +'        : 'e'  })

df.newIncome.unique()

array(['c', 'a', 'd', 'b', 'e', 'N'], dtype=object)
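One detail worth keeping in mind with the dictionary form of map: a value that has no matching key becomes NaN rather than being left as it was. A minimal sketch on a made-up Series (the `sample` data and the 'Other' label are illustrative assumptions, not part of the dataset):

```python
import pandas as pd

# Hypothetical sample; 'Other' is deliberately missing from the mapping.
sample = pd.Series(['Unknown', '$120K +', 'Other'])
mapping = {'Unknown': 'N', '$120K +': 'e'}

mapped = sample.map(mapping)

# Values without a key in the dict come back as NaN.
print(mapped.isna().tolist())  # [False, False, True]
```

Since df.Income_Category.unique() lists exactly the six keys used above, no NaN appears in newIncome here.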

Q) Use the apply function to recode the categories of Income_Category as follows, and store the result in a newIncome column.

Unknown : N
Less than $40K : a
$40K - $60K : b
$60K - $80K : c
$80K - $120K : d
$120K + : e

def changeCategory(x):
    if x == 'Unknown':
        return 'N'
    elif x == 'Less than $40K':
        return 'a'
    elif x == '$40K - $60K':
        return 'b'
    elif x == '$60K - $80K':
        return 'c'
    elif x == '$80K - $120K':
        return 'd'
    else:
        # '$120K +' (and any other unexpected value) ends up here
        return 'e'

df['newIncome'] = df.Income_Category.apply(changeCategory)
df.newIncome.unique()

array(['c', 'a', 'd', 'b', 'e', 'N'], dtype=object)
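The if/elif chain can also be written as a dictionary lookup inside the function passed to apply. The sketch below assumes a hypothetical helper named `change_category_lookup` and a small made-up Series. Unlike the else branch of changeCategory, it passes unexpected values through unchanged instead of labeling them 'e'.

```python
import pandas as pd

# Same codes as in the exercise, kept in one place.
INCOME_CODES = {
    'Unknown': 'N',
    'Less than $40K': 'a',
    '$40K - $60K': 'b',
    '$60K - $80K': 'c',
    '$80K - $120K': 'd',
    '$120K +': 'e',
}

def change_category_lookup(x):
    # Return the code, or the original value if it is not a known category.
    return INCOME_CODES.get(x, x)

# Hypothetical sample; 'typo value' is not a real category.
sample = pd.Series(['$60K - $80K', 'Unknown', 'typo value'])
result = sample.apply(change_category_lookup)
print(result.tolist())  # ['c', 'N', 'typo value']
```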

Q) Using the values of Customer_Age, define the age bracket in a new AgeState column.

(0~9 : 0, 10~19 : 10, 20~29 : 20, ...) Print the frequency of each bracket.

df['AgeState'] = df.Customer_Age.apply(lambda x : x // 10 * 10)
df.AgeState.value_counts()

40    4561
50    2998
30    1841
60     530
20     195
70       2
Name: AgeState, dtype: int64
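Because `//` and `*` work element-wise on a Series, the same brackets can probably be computed without apply. A small sketch on made-up ages (the `ages` values are illustrative, not taken from the dataset):

```python
import pandas as pd

# Hypothetical ages for illustration.
ages = pd.Series([26, 39, 40, 45, 73])

# Floor-divide by 10, then multiply back to get the lower bound of the bracket.
age_state = ages // 10 * 10
print(age_state.tolist())  # [20, 30, 40, 40, 70]

counts = age_state.value_counts()
```

On a real column this should give the same result as the lambda version, and it avoids calling a Python function once per row.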

Q) Define a newEduLevel column that is 1 when the Education_Level value contains the word Graduate and 0 otherwise, then print the frequency of each value.

df['newEduLevel'] = df.Education_Level.map(lambda x : 1 if 'Graduate' in x else 0)
df.newEduLevel.value_counts()

0    6483
1    3644
Name: newEduLevel, dtype: int64
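The substring test can also be expressed with the vectorized string accessor. The sketch below uses a made-up Series of education levels. Note that 'Post-Graduate' also contains 'Graduate', so it is counted as 1, just as with the `in` check.

```python
import pandas as pd

# Hypothetical sample of education levels.
levels = pd.Series(['Graduate', 'Post-Graduate', 'High School', 'Unknown'])

# regex=False treats 'Graduate' as a plain substring, not a pattern.
flags = levels.str.contains('Graduate', regex=False).astype(int)
print(flags.tolist())  # [1, 1, 0, 0]
```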

Q) Define a newLimit column that is 1 when Credit_Limit is 4500 or more and 0 otherwise. Print the frequency of each newLimit value.

df['newLimit'] = df.Credit_Limit.map(lambda x : 1 if x >= 4500 else 0)
df.newLimit.value_counts()

1    5096
0    5031
Name: newLimit, dtype: int64
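A threshold like this can also be written as a comparison on the whole Series, followed by a cast from bool to int. A minimal sketch with made-up limits (the `limits` values are illustrative):

```python
import pandas as pd

# Hypothetical credit limits around the 4500 threshold.
limits = pd.Series([1438.3, 4500.0, 4499.99, 34516.0])

# The comparison yields a boolean Series; astype(int) turns True/False into 1/0.
new_limit = (limits >= 4500).astype(int)
print(new_limit.tolist())  # [0, 1, 0, 1]
```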

Q) Define a newState column that is 1 when Marital_Status is Married and Card_Category is Platinum, and 0 otherwise. Print the frequency of each newState value.

def checkCondition(x):
    if x.Marital_Status == 'Married' and x.Card_Category == 'Platinum':
        return 1
    return 0

df['newState'] = df.apply(checkCondition, axis = 1)
df.newState.value_counts()

0    10120
1        7
Name: newState, dtype: int64

df['newState'] = df.apply(lambda x : 1 if x.Marital_Status == 'Married' and x.Card_Category == 'Platinum' else 0, axis=1)
df.newState.value_counts()

0    10120
1        7
Name: newState, dtype: int64
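Row-wise apply with axis=1 is easy to read but calls a Python function for every row. As an alternative sketch, the two conditions can be combined with `&` on whole columns. The `people` frame below is made up for illustration and only has the two columns the condition needs.

```python
import pandas as pd

# Hypothetical rows covering the match and both non-match cases.
people = pd.DataFrame({
    'Marital_Status': ['Married', 'Single', 'Married'],
    'Card_Category': ['Platinum', 'Platinum', 'Blue'],
})

# Each comparison needs its own parentheses because & binds tighter than ==.
mask = (people['Marital_Status'] == 'Married') & (people['Card_Category'] == 'Platinum')
new_state = mask.astype(int)
print(new_state.tolist())  # [1, 0, 0]
```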

Q) Redefine the Gender column so that M becomes male and F becomes female. Print the frequency of each value.

df.Gender.value_counts()

F    5358
M    4769
Name: Gender, dtype: int64

df['Gender'] = df.Gender.replace({'M':'male', 'F':'female'})

df.Gender.value_counts()

female    5358
male      4769
Name: Gender, dtype: int64
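replace and map behave differently when a value is missing from the dictionary: replace keeps the original value, while map returns NaN. A small sketch with a made-up Series (the 'X' value is a hypothetical label that does not occur in the dataset):

```python
import pandas as pd

# Hypothetical sample; 'X' has no entry in the dictionary.
gender = pd.Series(['M', 'F', 'X'])
codes = {'M': 'male', 'F': 'female'}

replaced = gender.replace(codes)  # unmatched values are left as they are
mapped = gender.map(codes)        # unmatched values become NaN

print(replaced.tolist())  # ['male', 'female', 'X']
print(mapped.isna().tolist())  # [False, False, True]
```

Here Gender only contains M and F, so either method would give the same counts.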

License: Attribution, NonCommercial