Pandas DataFrame Data Types

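Pandas is a powerful data analysis library. Its core data structure, the DataFrame, stores and processes two-dimensional tabular data. Like an Excel sheet or a SQL table, a DataFrame is made up of rows and columns, and each column can hold a different type of data. This article walks through the data types a Pandas DataFrame can hold.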


1. Basic data types

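A Pandas DataFrame supports the following basic data types:

int: integers (typically stored as int64)

float: floating-point numbers (typically stored as float64)

bool: Boolean values

datetime64: dates and times (traditionally stored at nanosecond resolution, datetime64[ns])

timedelta64: time intervals (traditionally stored at nanosecond resolution, timedelta64[ns])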

The following example creates a DataFrame with integer, float, and Boolean columns:

import pandas as pd
data = {'A': [1, 2, 3], 'B': [1.1, 2.2, 3.3], 'C': [True, False, True]}
df = pd.DataFrame(data)
print(df)

Output:

   A    B      C
0  1  1.1   True
1  2  2.2  False
2  3  3.3   True
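The example above covers only int, float, and bool. As a minimal sketch of all five basic types together, the snippet below builds a small DataFrame and checks each column with the helpers in pd.api.types. The column names and sample values are hypothetical and chosen only for illustration. The exact dtype strings it prints (for example the time resolution) can differ between pandas versions.

```python
import pandas as pd

# Hypothetical sample data: one column per basic dtype.
df = pd.DataFrame({
    "count": [1, 2, 3],                      # integers
    "price": [1.1, 2.2, 3.3],                # floats
    "in_stock": [True, False, True],         # booleans
    "created": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
})

# Subtracting a datetime column from a Timestamp yields a timedelta column.
df["age"] = pd.Timestamp("2020-01-10") - df["created"]

# Map each column name to the name of its dtype.
dtype_names = df.dtypes.astype(str).to_dict()
print(dtype_names)

# Type checks that do not depend on the exact dtype string.
print(pd.api.types.is_datetime64_any_dtype(df["created"]))
print(pd.api.types.is_timedelta64_dtype(df["age"]))
```

Checking with the pd.api.types helpers rather than comparing dtype strings keeps the check independent of the storage resolution.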

2. String data types

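Text and binary data in a Pandas DataFrame are stored using the following types:

object: a general-purpose dtype that can hold any Python object, including strings. Historically it has been the default, and therefore the most common, dtype for text columns.

string: a dedicated text dtype (StringDtype) that accepts only strings and missing values. Unlike object, it guarantees that the column holds text, and it supports vectorized operations through the .str accessor.

bytes: Python byte strings, used for binary data. Pandas has no dedicated bytes dtype, so these values are stored in an object column.

bytearray: mutable byte sequences, used for binary data that changes in place. These values are also stored in an object column.

category: a dtype for columns that draw their values from a limited set, such as repeated string labels. Category-specific operations are available through the .cat accessor.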

The following example creates a DataFrame with two text columns and one bytes column:

import pandas as pd

# Build a dictionary with two text columns and one column of raw bytes
data = {'A': ['apple', 'banana', 'cherry'], 'B': ['dog', 'cat', 'bird'], 'C': [b'\x01\x02\x03', b'\x04\x05\x06', b'\x07\x08\x09']}
# Pass the dictionary straight to the DataFrame constructor
df = pd.DataFrame(data)
print(df)

Output:

        A     B                C
0   apple   dog  b'\x01\x02\x03'
1  banana   cat  b'\x04\x05\x06'
2  cherry  bird    b'\x07\x08\t'

The last byte in row 2, \x09, is the tab character, so Python displays it as \t.
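To make the difference between the text dtypes concrete, here is a small sketch that converts one column to the string and category dtypes and uses the .str and .cat accessors. The sample values and variable names are hypothetical. The dtype pandas picks for the unconverted column depends on the pandas version, so the sketch prints it rather than assuming it.

```python
import pandas as pd

# Hypothetical sample values with one repeated label.
fruit = pd.Series(["apple", "banana", "apple"])

as_string = fruit.astype("string")      # dedicated StringDtype
as_category = fruit.astype("category")  # CategoricalDtype

# Vectorized string methods are available through the .str accessor.
upper = as_string.str.upper()

# The .cat accessor exposes category-specific attributes.
categories = list(as_category.cat.categories)

print(fruit.dtype, as_string.dtype, as_category.dtype)
print(upper.tolist())
print(categories)
```

A category column stores each distinct label once and refers to it by an integer code, which is why it suits columns with many repeated values.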

3. Missing data types

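Pandas uses the following markers for missing data:

NaT: "Not a Time", the missing-value marker for time data. A missing entry in a datetime column is set to NaT. NaT works with datetime64 columns, both timezone-naive and timezone-aware, and with timedelta64 columns.

None/NaN: None is Python's null object, and NaN is the floating-point "not a number" value. A missing entry in a numeric column is stored as NaN, which turns an integer column into a float column. In an object column, None is kept as is. Use isna(), notna(), and fillna() to detect and handle missing data.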

The following example creates a DataFrame with missing data:

import pandas as pd
import numpy as np
from datetime import datetime

# Build a dictionary that contains missing values
data = {'A': [1, np.nan, 3], 'B': [np.nan, np.nan, np.nan], 'C': [datetime(2020, 1, 1), None, datetime(2020, 1, 3)]}
df = pd.DataFrame(data)
print(df)

Output:

     A   B          C
0  1.0 NaN 2020-01-01
1  NaN NaN        NaT
2  3.0 NaN 2020-01-03
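The isna(), notna(), and fillna() methods mentioned in this section can be sketched as follows. The DataFrame is a reduced, hypothetical variant of the example data, and the choice of 0 as the fill value is only an illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one numeric column and one datetime column, each with a gap.
df = pd.DataFrame({
    "A": [1, np.nan, 3],
    "C": pd.to_datetime(["2020-01-01", None, "2020-01-03"]),
})

# isna() marks NaN and NaT alike; summing counts the gaps per column.
missing_per_column = df.isna().sum()

# fillna() replaces missing values; here NaN in column A becomes 0.
filled_a = df["A"].fillna(0)

# notna() is the inverse of isna(); keep only rows without any gap.
complete_rows = df[df.notna().all(axis=1)]

print(missing_per_column)
print(filled_a.tolist())
print(complete_rows.index.tolist())
```

Because isna() treats NaN and NaT the same way, the same code handles numeric and time columns without special cases.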

4. Mixed data types

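A Pandas DataFrame column can also hold values of several different types at once. Such a column has dtype object: pandas falls back to it when the values do not share a common type, and you can request it explicitly by passing dtype=object when creating the DataFrame. When reading data from a file, the dtype parameter of functions such as pd.read_csv() sets the type of each column. The following example creates a DataFrame with a mixed-type column: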

import pandas as pd

# Minimal illustration: one column mixing an int, a str, a float and None
df = pd.DataFrame({'mixed': [1, 'two', 3.0, None]}, dtype=object)
print(df.dtypes)
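Since an object column can hide a mix of types, it often helps to inspect what each cell actually contains. The sketch below is one possible approach, not the article's own method: it uses Series.map(type) to list the Python type of every element, then pd.to_numeric with errors="coerce" to convert the column to numbers. The variable names and sample values are hypothetical.

```python
import pandas as pd

# Hypothetical mixed column: an int, a str, and a float stored as objects.
mixed = pd.Series([1, "two", 3.0], dtype=object)

# Show the Python type of every element.
element_types = mixed.map(type)

# Convert to numbers; values that cannot be parsed become NaN.
numeric = pd.to_numeric(mixed, errors="coerce")

print(element_types.tolist())
print(numeric.tolist())
print(numeric.dtype)
```

Coercing discards the unparseable text, so this conversion is only appropriate when losing those values is acceptable.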

Original article by 未希. If you repost it, please credit the source: https://www.kdun.com/ask/476174.html
