
Using describe in Python

admin
2024-03-01

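In Python, describe is not a built-in function or keyword. Some libraries, such as pandas, provide a describe method that quickly summarizes the statistics of a dataset. This article focuses on the describe method in pandas.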

First, we need to install the pandas library, which can be done with the following command:

pip install pandas

Next, we walk through how to use describe in pandas.

1. Import the pandas library

Before calling describe, we need to import pandas and create a DataFrame object.

import pandas as pd
data = {
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
}
df = pd.DataFrame(data)

2. Use the describe function

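Called on a DataFrame, describe computes summary statistics for each numeric column and returns a new DataFrame containing the following rows:

count: the number of non-null values

mean: the arithmetic mean

std: the sample standard deviation (computed with ddof=1)

min: the minimum value

25%: the first quartile (25th percentile)

50%: the median (50th percentile)

75%: the third quartile (75th percentile)

max: the maximum value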

statistics = df.describe()
print(statistics)

The output is as follows:

              A          B           C
count  5.000000   5.000000    5.000000
mean   3.000000  30.000000  300.000000
std    1.581139  15.811388  158.113883
min    1.000000  10.000000  100.000000
25%    2.000000  20.000000  200.000000
50%    3.000000  30.000000  300.000000
75%    4.000000  40.000000  400.000000
max    5.000000  50.000000  500.000000
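Because the result is an ordinary DataFrame, individual statistics can be read back by label. The following is a minimal sketch using the same sample data; the variable names (mean_a, median_b, mean_std) are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500],
})

statistics = df.describe()

# Rows are labeled by statistic name, columns by the original column name.
mean_a = statistics.loc['mean', 'A']
median_b = statistics.loc['50%', 'B']

# describe reports the sample standard deviation, the same value as Series.std().
std_matches = abs(statistics.loc['std', 'C'] - df['C'].std()) < 1e-9

# Selecting a list of row labels keeps only those statistics.
mean_std = statistics.loc[['mean', 'std']]

print(mean_a, median_b, std_matches)
print(mean_std)
```

Selecting rows with .loc is one way to keep only a few statistics, since describe itself always reports count, mean, std, min and max for numeric columns.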

3. Customize the statistics reported by describe

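describe also lets us adjust what is reported. The percentiles parameter chooses which percentiles are computed, and include chooses which column types are summarized. The other rows (count, mean, std, min and max) are always present for numeric columns. For example, to report only the 50th and 75th percentiles:

statistics = df.describe(percentiles=[.5, .75], include='all')
print(statistics)

The output is as follows:

              A          B           C
count  5.000000   5.000000    5.000000
mean   3.000000  30.000000  300.000000
std    1.581139  15.811388  158.113883
min    1.000000  10.000000  100.000000
50%    3.000000  30.000000  300.000000
75%    4.000000  40.000000  400.000000
max    5.000000  50.000000  500.000000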
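On the all-numeric sample data, include='all' makes no visible difference. Its effect shows up once a DataFrame mixes column types. The sketch below assumes a small hypothetical DataFrame (the names mixed, score and group are not from the example above) to illustrate this.

```python
import pandas as pd

# Hypothetical data: one numeric column and one string column.
mixed = pd.DataFrame({
    'score': [1, 2, 3, 4, 5],
    'group': ['x', 'y', 'x', 'x', 'y'],
})

# Default: when numeric columns exist, only they are summarized.
numeric_only = mixed.describe()

# include='all': every column is summarized. Statistics that do not apply
# to a column's type (for example mean for strings) are filled with NaN.
everything = mixed.describe(include='all')

print(numeric_only.columns.tolist())
print(everything.index.tolist())
print(everything.loc['top', 'group'], everything.loc['freq', 'group'])
```

For non-numeric columns the summary uses unique, top (the most frequent value) and freq (how often it occurs) instead of mean and percentiles.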

4. Apply describe to specific columns

To apply describe to only some columns of a DataFrame, select those columns first and then call describe on the selection:

statistics = df[['A', 'B']].describe()
print(statistics)

The output is as follows:

              A          B
count  5.000000   5.000000
mean   3.000000  30.000000
std    1.581139  15.811388
min    1.000000  10.000000
25%    2.000000  20.000000
50%    3.000000  30.000000
75%    4.000000  40.000000
max    5.000000  50.000000
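The column selection above uses double brackets, which yields a DataFrame. Selecting a single column with single brackets yields a Series, and describe then returns a Series as well. A short sketch of the difference, with illustrative variable names as_frame and as_series:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500],
})

# Double brackets select a DataFrame, so describe returns a DataFrame.
as_frame = df[['A']].describe()

# Single brackets select a Series, so describe returns a Series
# indexed by statistic name.
as_series = df['A'].describe()

print(type(as_frame).__name__, type(as_series).__name__)
print(as_series['max'])
```

The Series form is convenient when a single statistic of one column is needed, because it can be indexed directly by the statistic name.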

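In summary, describe is a very practical pandas method for quickly obtaining summary statistics of a dataset. This article has covered its basic usage: calling it on a whole DataFrame, adjusting the reported percentiles, and applying it to selected columns.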