Operations
import pandas as pd
df = pd.DataFrame({'col1':[1,2,3,4],'col2':[444,555,666,444],'col3':['abc','def','ghi','xyz']})
df.head()
'''
   col1  col2 col3
0     1   444  abc
1     2   555  def
2     3   666  ghi
3     4   444  xyz
'''
Info on Unique Values
df['col2'].unique()  # find the unique values (duplicates removed)
# array([444, 555, 666])
df['col2'].nunique()  # number of unique values; same as len(df['col2'].unique())
# 3
df['col2'].value_counts()  # count how often each unique value occurs
'''
444    2
555    1
666    1
Name: col2, dtype: int64
'''
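To check the three calls against each other, here is a small self-contained sketch that rebuilds the same frame. The variable names (`uniques`, `counts`, `shares`) are only illustrative, and `normalize=True` is an extra option not used in the original post; it returns relative frequencies instead of raw counts.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

uniques = df['col2'].unique()          # distinct values, in order of first appearance
n_unique = df['col2'].nunique()        # how many distinct values there are
counts = df['col2'].value_counts()     # occurrences of each distinct value
shares = df['col2'].value_counts(normalize=True)  # same, as fractions of the total

print(len(uniques) == n_unique)
print(counts[444], shares[444])
```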
Selecting Data
# Select rows from a DataFrame using criteria from multiple columns
newdf = df[(df['col1']>2) & (df['col2']==444)]
'''
   col1  col2 col3
3     4   444  xyz
'''
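Each comparison produces a boolean Series, and `&` combines them row by row. The parentheses are required because `&` binds more tightly than `>` and `==` in Python. A minimal sketch on the same data, also showing `|` (or) and `~` (not); the names `mask_and`, `mask_or`, `both`, `either`, and `neither` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# Boolean masks: one True/False value per row
mask_and = (df['col1'] > 2) & (df['col2'] == 444)   # both conditions hold
mask_or = (df['col1'] > 2) | (df['col2'] == 444)    # at least one condition holds

both = df[mask_and]       # rows where both conditions hold
either = df[mask_or]      # rows where at least one holds
neither = df[~mask_or]    # rows where neither holds

print(both)
print(either)
print(neither)
```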
Applying Functions
def times2(x):
    return x * 2
df['col1'].apply(times2)
df['col3'].apply(len)
df['col1'].sum()
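`apply` calls the function once per element and returns a new Series, leaving the original column untouched. The sketch below rebuilds the frame and compares the named function with an equivalent lambda and with plain vectorized arithmetic. The result names (`doubled`, `doubled_lambda`, `doubled_vectorized`, `lengths`, `total`) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

def times2(x):
    return x * 2

doubled = df['col1'].apply(times2)                  # named function
doubled_lambda = df['col1'].apply(lambda x: x * 2)  # same thing as a lambda
doubled_vectorized = df['col1'] * 2                 # usually simpler for plain arithmetic

lengths = df['col3'].apply(len)   # length of each string
total = df['col1'].sum()          # sum of the column

print(doubled.tolist(), lengths.tolist(), total)
```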
Deleting a Column: del df[ ]
del df['col1']
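Note that `del` changes the DataFrame in place. As an alternative not shown in the original post, `drop(columns=...)` returns a new frame and leaves the original alone. A short sketch of the difference; `trimmed`, `still_there`, and `gone` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

trimmed = df.drop(columns='col1')      # new frame without col1
still_there = 'col1' in df.columns     # df itself still has col1 at this point

del df['col1']                         # removes col1 from df in place
gone = 'col1' not in df.columns

print(list(trimmed.columns), still_there, gone)
```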
Get column and index names: .columns .index
df.columns # Index(['col2', 'col3'], dtype='object')
df.index # RangeIndex(start=0, stop=4, step=1)
Sorting and Ordering a DataFrame: .sort_values()
df.sort_values(by='col2')  # returns a sorted copy; inplace=False by default
'''
   col2 col3
0   444  abc
3   444  xyz
1   555  def
2   666  ghi
'''
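`sort_values` also accepts several columns and a per-column sort direction. The following sketch, assuming the same example frame, sorts by `col2` ascending and breaks ties by `col3` descending, then renumbers the rows. The names `sorted_multi` and `renumbered` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# col2 ascending; rows with equal col2 are ordered by col3 descending
sorted_multi = df.sort_values(by=['col2', 'col3'], ascending=[True, False])

# The original index labels travel with the rows; reset them if a fresh 0..n-1 index is wanted
renumbered = sorted_multi.reset_index(drop=True)

print(sorted_multi)
print(renumbered)
```

Because `inplace` is left at its default, `df` itself keeps its original row order.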
Find Null Values or Check for Null Values:
df.isnull()
# Drop rows with NaN Values
df.dropna()
Filling in NaN values with something else:
import numpy as np
df = pd.DataFrame({'col1': [1, 2, 3, np.nan],
                   'col2': [np.nan, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})
df.head()  # show the first 5 rows of the data
df.fillna('FILL')  # replace every NaN with the given value ('FILL')
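To see what the null-handling calls actually do, it helps to run them on a frame that contains NaN values. A minimal sketch with the same data; filling with the column mean is just one possible choice and is not from the original post, and the names `null_counts`, `complete_rows`, and `filled_col1` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, np.nan],
                   'col2': [np.nan, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

null_counts = df.isnull().sum()    # number of missing values per column
complete_rows = df.dropna()        # keep only rows with no missing values

# Fill the gaps in one column with that column's mean (NaN is ignored when averaging)
filled_col1 = df['col1'].fillna(df['col1'].mean())

print(null_counts)
print(complete_rows)
print(filled_col1.tolist())
```

None of these calls modify `df`; each one returns a new object.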
Pivot Tables: .pivot_table()
data = {'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
        'B': ['one', 'one', 'two', 'two', 'one', 'one'],
        'C': ['x', 'y', 'x', 'y', 'x', 'y'],
        'D': [1, 3, 2, 5, 4, 1]}
df = pd.DataFrame(data)
df.pivot_table(values='D', index=['A', 'B'], columns=['C'])  # pivot table; cells are averaged by default (aggfunc='mean')
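The pivot table puts each (A, B) pair on a row and each value of C in a column; combinations that never occur in the data come out as NaN. The sketch below rebuilds the frame so it runs on its own. The `fill_value=0` and `aggfunc='sum'` arguments are additions for illustration, and `pivot` and `pivot_filled` are hypothetical names.

```python
import pandas as pd

data = {'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
        'B': ['one', 'one', 'two', 'two', 'one', 'one'],
        'C': ['x', 'y', 'x', 'y', 'x', 'y'],
        'D': [1, 3, 2, 5, 4, 1]}
df = pd.DataFrame(data)

# Default aggregation is the mean of D within each (A, B, C) group
pivot = df.pivot_table(values='D', index=['A', 'B'], columns=['C'])

# Sum instead of mean, with missing combinations shown as 0
pivot_filled = df.pivot_table(values='D', index=['A', 'B'], columns=['C'],
                              aggfunc='sum', fill_value=0)

print(pivot)
print(pivot_filled)
```

Here ('bar', 'two') has no 'x' row and ('foo', 'two') has no 'y' row, so those two cells are NaN in `pivot` and 0 in `pivot_filled`.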
Great Job!