Pandas Merging (Merge)

pandas provides a variety of facilities for easily combining Series and DataFrame objects.

◼︎ Table of Contents

1) Creating a DataFrame
2) Concatenating
3) Joining
4) Appending

1) Creating a DataFrame

Example

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(np.random.randn(10, 4))

print(df)

          0         1         2         3
0  1.764052  0.400157  0.978738  2.240893
1  1.867558 -0.977278  0.950088 -0.151357
2 -0.103219  0.410599  0.144044  1.454274
3  0.761038  0.121675  0.443863  0.333674
4  1.494079 -0.205158  0.313068 -0.854096
5 -2.552990  0.653619  0.864436 -0.742165
6  2.269755 -1.454366  0.045759 -0.187184
7  1.532779  1.469359  0.154947  0.378163
8 -0.887786 -1.980796 -0.347912  0.156349
9  1.230291  1.202380 -0.387327 -0.302303

This creates a simple DataFrame object with 10 rows and 4 columns of random values.
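As a small variation, the same data can be given explicit column labels at construction time. This is only a sketch: the variable name `df_named` and the labels `A` through `D` are illustrative assumptions, not part of the example above.

```python
import numpy as np
import pandas as pd

np.random.seed(0)

# Same random data as above, but with explicit (hypothetical) column labels
df_named = pd.DataFrame(np.random.randn(10, 4), columns=["A", "B", "C", "D"])

print(df_named.shape)    # (number of rows, number of columns)
print(df_named.head(3))  # first three rows
```

Labeled columns tend to make later merges easier to read, because the `on=` argument of merge() refers to columns by name.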

2) Concatenating

Example

pieces = [df[:3], df[3:7], df[7:]]

print(pd.concat(pieces))

          0         1         2         3
0  1.764052  0.400157  0.978738  2.240893
1  1.867558 -0.977278  0.950088 -0.151357
2 -0.103219  0.410599  0.144044  1.454274
3  0.761038  0.121675  0.443863  0.333674
4  1.494079 -0.205158  0.313068 -0.854096
5 -2.552990  0.653619  0.864436 -0.742165
6  2.269755 -1.454366  0.045759 -0.187184
7  1.532779  1.469359  0.154947  0.378163
8 -0.887786 -1.980796 -0.347912  0.156349
9  1.230291  1.202380 -0.387327 -0.302303

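concat() concatenates the individual pandas objects. Here the three row slices are joined back together in order, which reproduces the original DataFrame.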
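The round trip described above can be checked directly. The sketch below rebuilds the same `df`, confirms that the concatenated pieces equal the original, and then shows the `ignore_index=True` option; the names `rebuilt` and `reordered` are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(10, 4))

# Split into three row slices, then glue them back together
pieces = [df[:3], df[3:7], df[7:]]
rebuilt = pd.concat(pieces)

# The round trip preserves both the values and the original index labels
print(rebuilt.equals(df))

# ignore_index=True discards the original labels and renumbers from 0
reordered = pd.concat([df[7:], df[:3]], ignore_index=True)
print(reordered.index.tolist())
```

Without `ignore_index=True`, the second result would keep the labels 7, 8, 9, 0, 1, 2 from the source rows.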

3) Joining

Example 1

df_left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
df_right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})

print(df_left)
print(df_right)
print(pd.merge(df_left, df_right, on='key'))

   key  lval
0  foo     1
1  foo     2
   key  rval
0  foo     4
1  foo     5
   key  lval  rval
0  foo     1     4
1  foo     1     5
2  foo     2     4
3  foo     2     5

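merge() joins the two DataFrames on the 'key' column. Because the key 'foo' appears twice on each side, every left row is paired with every right row, giving 2 x 2 = 4 rows.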
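When duplicate keys are not expected, the row multiplication seen here can be caught early. The following is a minimal sketch using the `validate` parameter of merge(); the flag name `duplicate_keys_detected` is an assumption introduced for illustration.

```python
import pandas as pd

df_left = pd.DataFrame({"key": ["foo", "foo"], "lval": [1, 2]})
df_right = pd.DataFrame({"key": ["foo", "foo"], "rval": [4, 5]})

# Plain merge: 2 left rows x 2 right rows with the same key
merged = pd.merge(df_left, df_right, on="key")
print(len(merged))

# validate= makes merge() raise instead of silently multiplying rows
try:
    pd.merge(df_left, df_right, on="key", validate="one_to_one")
    duplicate_keys_detected = False
except pd.errors.MergeError:
    duplicate_keys_detected = True

print(duplicate_keys_detected)
```

Other accepted values include `"one_to_many"` and `"many_to_one"`, which require unique keys on only one side.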

Example 2

df_left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})
df_right = pd.DataFrame({'key': ['foo', 'bar'], 'rval': [4, 5]})

print(df_left)
print(df_right)
print(pd.merge(df_left, df_right, on='key'))

   key  lval
0  foo     1
1  bar     2
   key  rval
0  foo     4
1  bar     5
   key  lval  rval
0  foo     1     4
1  bar     2     5

When each key is unique on both sides, every left row matches exactly one right row. (See the Database style joining section.)
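Both examples above use keys that exist on both sides. As a hedged sketch of what happens otherwise, the `how` parameter of merge() selects the join type, much like SQL joins; the key `'baz'` and the variable names below are assumptions added for illustration.

```python
import pandas as pd

df_left = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
# 'baz' is a hypothetical key with no partner on the left side
df_right = pd.DataFrame({"key": ["foo", "baz"], "rval": [4, 5]})

# inner (the default): only keys present on both sides
inner = pd.merge(df_left, df_right, on="key", how="inner")
# left: every left row is kept; missing right values become NaN
left = pd.merge(df_left, df_right, on="key", how="left")
# outer: union of keys; indicator=True adds a _merge column with each row's origin
outer = pd.merge(df_left, df_right, on="key", how="outer", indicator=True)

print(inner)
print(left)
print(outer)
```

The `_merge` column takes the values `left_only`, `right_only`, or `both`, which is handy for spotting unmatched keys.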

4) Appending

Example

df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
print(df)

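s = df.iloc[3]
# DataFrame.append() was removed in pandas 2.0; pd.concat() gives the same result
print(pd.concat([df, s.to_frame().T], ignore_index=True))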

          A         B         C         D
0  0.813101 -0.229251  2.161717 -0.956931
1  0.067311  0.206499 -0.456881 -1.059976
2  0.614957  1.429661 -0.211952 -0.080337
3  0.405398  0.118607  1.254414  1.419102
4 -0.743856 -2.517437 -1.507096  1.149076
5 -1.193578  1.141042  1.509445  1.067775
6 -0.686589  0.014873 -0.375666 -0.038224
7  0.367974 -0.044724 -0.302375 -2.224404
          A         B         C         D
0  0.813101 -0.229251  2.161717 -0.956931
1  0.067311  0.206499 -0.456881 -1.059976
2  0.614957  1.429661 -0.211952 -0.080337
3  0.405398  0.118607  1.254414  1.419102
4 -0.743856 -2.517437 -1.507096  1.149076
5 -1.193578  1.141042  1.509445  1.067775
6 -0.686589  0.014873 -0.375666 -0.038224
7  0.367974 -0.044724 -0.302375 -2.224404
8  0.405398  0.118607  1.254414  1.419102

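append() adds rows to a DataFrame. DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, pd.concat() does the same job.

The fourth row (index 3) is added once more at the bottom of the DataFrame.

(See the Appending section.)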
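Because pd.concat() accepts a list of objects, several rows can be added in one call rather than one at a time. The sketch below assumes rows 3 and 5 are the ones to duplicate; the names `extra_rows` and `result` are illustrative, and the seed is set only to make the sketch reproducible.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(8, 4), columns=["A", "B", "C", "D"])

# Collect the rows to add first (here: copies of rows 3 and 5) ...
extra_rows = df.iloc[[3, 5]]

# ... then concatenate once, renumbering the index from 0
result = pd.concat([df, extra_rows], ignore_index=True)

print(result.shape)
print(result.tail(2))
```

Concatenating once is generally preferable to growing a DataFrame inside a loop, since each concatenation copies the data into a new object.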
