Overview

Dataset statistics

Number of variables: 23
Number of observations: 5000
Missing cells: 12105
Missing cells (%): 10.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 898.6 KiB
Average record size in memory: 184.0 B
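
The derived figures in this table follow from the raw counts. Missing cells (%) divides missing cells by variables × observations, and the average record size divides total memory (KiB × 1024 bytes) by the number of observations. A minimal check that uses only the numbers reported here (the reported 898.6 KiB is itself rounded):

```python
# Reported dataset statistics, copied from the table above
n_variables = 23
n_observations = 5000
missing_cells = 12105
total_size_kib = 898.6

# Missing cells as a share of all cells in the table
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average bytes per row (1 KiB = 1024 bytes)
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```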

Variable types

Numeric: 9
Categorical: 14

Alerts

generation-units has constant value "megawatthours" [Constant]
gross-generation-units has constant value "megawatthours" [Constant]
total-consumption-btu-units has constant value "MMBtu" [Constant]
consumption-for-eg-btu-units has constant value "MMBtu" [Constant]
plantName has a high cardinality: 1196 distinct values [High cardinality]
Unnamed: 0 is highly correlated with period and 2 other fields [High correlation]
plantCode is highly correlated with period and 2 other fields [High correlation]
generation is highly correlated with fuel2002 and 5 other fields [High correlation]
gross-generation is highly correlated with generation and 4 other fields [High correlation]
total-consumption is highly correlated with fuel2002 and 4 other fields [High correlation]
total-consumption-btu is highly correlated with generation and 4 other fields [High correlation]
consumption-for-eg is highly correlated with fuel2002 and 5 other fields [High correlation]
consumption-for-eg-btu is highly correlated with generation and 3 other fields [High correlation]
average-heat-content is highly correlated with fuel2002 and 6 other fields [High correlation]
period is highly correlated with Unnamed: 0 and 3 other fields [High correlation]
fuel2002 is highly correlated with fuelTypeDescription and 9 other fields [High correlation]
fuelTypeDescription is highly correlated with fuel2002 and 6 other fields [High correlation]
state is highly correlated with Unnamed: 0 and 9 other fields [High correlation]
stateDescription is highly correlated with Unnamed: 0 and 9 other fields [High correlation]
primeMover is highly correlated with consumption-for-eg-btu-units and 3 other fields [High correlation]
total-consumption-units is highly correlated with fuel2002 and 6 other fields [High correlation]
consumption-for-eg-units is highly correlated with fuel2002 and 6 other fields [High correlation]
average-heat-content-units is highly correlated with fuel2002 and 6 other fields [High correlation]
generation-units is highly correlated with consumption-for-eg-btu-units and 11 other fields [High correlation]
gross-generation-units is highly correlated with consumption-for-eg-btu-units and 11 other fields [High correlation]
total-consumption-btu-units is highly correlated with consumption-for-eg-btu-units and 11 other fields [High correlation]
consumption-for-eg-btu-units is highly correlated with total-consumption-units and 11 other fields [High correlation]
total-consumption has 1269 (25.4%) missing values [Missing]
total-consumption-units has 2326 (46.5%) missing values [Missing]
consumption-for-eg has 1269 (25.4%) missing values [Missing]
consumption-for-eg-units has 2326 (46.5%) missing values [Missing]
average-heat-content has 2589 (51.8%) missing values [Missing]
average-heat-content-units has 2326 (46.5%) missing values [Missing]
Unnamed: 0 is uniformly distributed [Uniform]
Unnamed: 0 has unique values [Unique]
generation has 594 (11.9%) zeros [Zeros]
gross-generation has 606 (12.1%) zeros [Zeros]
total-consumption has 1320 (26.4%) zeros [Zeros]
total-consumption-btu has 597 (11.9%) zeros [Zeros]
consumption-for-eg has 1343 (26.9%) zeros [Zeros]
consumption-for-eg-btu has 596 (11.9%) zeros [Zeros]
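
The Missing, Zeros, Constant and Unique alerts above are simple per-column counts. The sketch below shows one way to derive such tags for a single column in plain Python. Several details are illustrative assumptions and are not pandas-profiling's actual rules or defaults: the helper name `column_alerts`, the use of `None` for a missing cell, and the 10% zeros threshold.

```python
from __future__ import annotations

from typing import Any, Optional, Sequence


def column_alerts(values: Sequence[Optional[Any]], zeros_threshold: float = 0.10) -> list[str]:
    """Return alert tags for one column (hypothetical helper; thresholds are illustrative)."""
    n = len(values)
    if n == 0:
        return []
    present = [v for v in values if v is not None]  # None stands in for a missing cell
    alerts = []

    # Missing: share of all rows that have no value
    missing = n - len(present)
    if missing:
        alerts.append(f"Missing: {missing} ({100 * missing / n:.1f}%)")

    # Constant / Unique: based on the distinct non-missing values
    distinct = set(present)
    if len(distinct) == 1:
        alerts.append(f"Constant: {next(iter(distinct))!r}")
    if missing == 0 and len(distinct) == n:
        alerts.append("Unique")

    # Zeros: numeric zeros as a share of all rows, flagged above the threshold
    zeros = sum(1 for v in present if isinstance(v, (int, float)) and v == 0)
    if zeros / n > zeros_threshold:
        alerts.append(f"Zeros: {zeros} ({100 * zeros / n:.1f}%)")
    return alerts


print(column_alerts(["megawatthours"] * 4))
print(column_alerts([0, 0, 5, None]))
```

As in the report, both the Missing and the Zeros percentages here are computed over all rows, not only over non-missing ones.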

Reproduction

Analysis started: 2022-11-17 22:34:34.934056
Analysis finished: 2022-11-17 22:34:52.566320
Duration: 17.63 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json
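
The reported duration is the difference between the two timestamps. A small standard-library check, using the timestamps exactly as listed:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section above
started = datetime.fromisoformat("2022-11-17 22:34:34.934056")
finished = datetime.fromisoformat("2022-11-17 22:34:52.566320")

duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")  # Duration: 17.63 seconds
```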

Variables

Unnamed: 0
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 5000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2499.5
Minimum: 0
Maximum: 4999
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 39.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 249.95
Q1: 1249.75
median: 2499.5
Q3: 3749.25
95-th percentile: 4749.05
Maximum: 4999
Range: 4999
Interquartile range (IQR): 2499.5

Descriptive statistics

Standard deviation: 1443.520003
Coefficient of variation (CV): 0.577523506
Kurtosis: -1.2
Mean: 2499.5
Median Absolute Deviation (MAD): 1250
Skewness: 0
Sum: 12497500
Variance: 2083750
Monotonicity: Strictly increasing
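
Unnamed: 0 is the row index 0 to 4999, so most of its statistics can be reproduced exactly with Python's `statistics` module. The sketch assumes sample (n - 1) variance and linear-interpolation quartiles (`method="inclusive"`), and these assumptions match the reported values here. Kurtosis and skewness are left out.

```python
import statistics

# Unnamed: 0 is the row index 0..4999 (strictly increasing, all unique)
values = list(range(5000))

mean = statistics.mean(values)
std = statistics.stdev(values)            # sample standard deviation (n - 1 denominator)
variance = statistics.variance(values)
cv = std / mean                           # coefficient of variation
# Quartiles with linear interpolation between neighbouring points
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
# Median absolute deviation around the median
med = statistics.median(values)
mad = statistics.median(abs(v - med) for v in values)

print(mean, std, variance, cv)
print(q1, median, q3, q3 - q1, mad)
```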

[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
0 | 1 | < 0.1%
3330 | 1 | < 0.1%
3337 | 1 | < 0.1%
3336 | 1 | < 0.1%
3335 | 1 | < 0.1%
3334 | 1 | < 0.1%
3333 | 1 | < 0.1%
3332 | 1 | < 0.1%
3331 | 1 | < 0.1%
3329 | 1 | < 0.1%
Other values (4990) | 4990 | 99.8%

Minimum 10 values
Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
4999 | 1 | < 0.1%
4998 | 1 | < 0.1%
4997 | 1 | < 0.1%
4996 | 1 | < 0.1%
4995 | 1 | < 0.1%
4994 | 1 | < 0.1%
4993 | 1 | < 0.1%
4992 | 1 | < 0.1%
4991 | 1 | < 0.1%
4990 | 1 | < 0.1%

period
Categorical

HIGH CORRELATION

Distinct: 10
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 39.2 KiB

2001-04 | 2185
2001-12 | 1211
2001-05 | 401
2002-04 | 353
2002-05 | 325
Other values (5) | 525

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 35000
Distinct characters: 9
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
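
The category names used in these reports (Decimal Number, Dash Punctuation, Uppercase Letter and so on) are Unicode general categories. Python exposes them through `unicodedata.category` as two-letter codes such as `Nd` and `Pd`. The sketch below counts characters per category and assumes that every character of every non-missing value is counted. Its code-to-name mapping covers only the categories that appear in this report.

```python
import unicodedata
from collections import Counter

# Two-letter Unicode general-category codes mapped to the names shown in this report
# (only categories that occur in the report are listed; anything else keeps its raw code)
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}


def category_counts(values):
    """Count the characters of all strings by Unicode general category."""
    counts = Counter(unicodedata.category(ch) for value in values for ch in value)
    return {CATEGORY_NAMES.get(code, code): n for code, n in counts.items()}


print(category_counts(["2001-04", "2001-12", "2002-05"]))
# {'Decimal Number': 18, 'Dash Punctuation': 3}
```

Applied to all 5000 period values, the same counting gives 30000 decimal digits and 5000 dashes, which matches the Most occurring categories table for period.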

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2001-04
2nd row: 2001-04
3rd row: 2001-04
4th row: 2001-04
5th row: 2001-04

Common Values

Value | Count | Frequency (%)
2001-04 | 2185 | 43.7%
2001-12 | 1211 | 24.2%
2001-05 | 401 | 8.0%
2002-04 | 353 | 7.1%
2002-05 | 325 | 6.5%
2002-11 | 310 | 6.2%
2001-06 | 83 | 1.7%
2003-02 | 83 | 1.7%
2001-09 | 44 | 0.9%
2001-10 | 5 | 0.1%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Most occurring characters

Value | Count | Frequency (%)
0 | 13479 | 38.5%
2 | 7282 | 20.8%
1 | 5765 | 16.5%
- | 5000 | 14.3%
4 | 2538 | 7.3%
5 | 726 | 2.1%
6 | 83 | 0.2%
3 | 83 | 0.2%
9 | 44 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 30000 | 85.7%
Dash Punctuation | 5000 | 14.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 13479 | 44.9%
2 | 7282 | 24.3%
1 | 5765 | 19.2%
4 | 2538 | 8.5%
5 | 726 | 2.4%
6 | 83 | 0.3%
3 | 83 | 0.3%
9 | 44 | 0.1%
Dash Punctuation
Value | Count | Frequency (%)
- | 5000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 35000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 35000 | 100.0%

plantCode
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1173
Distinct (%): 23.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 31046.0786
Minimum: 2
Maximum: 55564
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 39.2 KiB

Quantile statistics

Minimum: 2
5-th percentile: 1048
Q1: 7183
median: 50095
Q3: 54085
95-th percentile: 55037
Maximum: 55564
Range: 55562
Interquartile range (IQR): 46902

Descriptive statistics

Standard deviation: 23186.19182
Coefficient of variation (CV): 0.7468315764
Kurtosis: -1.912303771
Mean: 31046.0786
Median Absolute Deviation (MAD): 5258
Skewness: -0.1562502582
Sum: 155230393
Variance: 537599491.2
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
54428 | 21 | 0.4%
54464 | 19 | 0.4%
50366 | 18 | 0.4%
54090 | 15 | 0.3%
54087 | 15 | 0.3%
50398 | 15 | 0.3%
52152 | 15 | 0.3%
50395 | 15 | 0.3%
54091 | 15 | 0.3%
50479 | 14 | 0.3%
Other values (1163) | 4838 | 96.8%

Minimum 10 values
Value | Count | Frequency (%)
2 | 3 | 0.1%
3 | 2 | < 0.1%
170 | 3 | 0.1%
171 | 5 | 0.1%
173 | 7 | 0.1%
174 | 3 | 0.1%
180 | 3 | 0.1%
182 | 3 | 0.1%
187 | 3 | 0.1%
188 | 3 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
55564 | 3 | 0.1%
55563 | 3 | 0.1%
55562 | 3 | 0.1%
55561 | 3 | 0.1%
55560 | 3 | 0.1%
55557 | 3 | 0.1%
55554 | 3 | 0.1%
55546 | 3 | 0.1%
55545 | 3 | 0.1%
55542 | 5 | 0.1%

plantName
Categorical

HIGH CARDINALITY

Distinct: 1196
Distinct (%): 23.9%
Missing: 0
Missing (%): 0.0%
Memory size: 39.2 KiB

Oil Storage | 30
University of Notre Dame | 18
Louisiana Mill | 15
Mansfield Mill | 15
Nekoosa Mill | 15
Other values (1191) | 4907

Length

Max length: 39
Median length: 28
Mean length: 16.3592
Min length: 2

Characters and Unicode

Total characters: 81796
Distinct characters: 69
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 16
Unique (%): 0.3%

Sample

1st row: Chevak
2nd row: Chevak
3rd row: EEK
4th row: EEK
5th row: EEK

Common Values

Value | Count | Frequency (%)
Oil Storage | 30 | 0.6%
University of Notre Dame | 18 | 0.4%
Louisiana Mill | 15 | 0.3%
Mansfield Mill | 15 | 0.3%
Nekoosa Mill | 15 | 0.3%
Printing & Communication Paper | 15 | 0.3%
Georgetown Mill | 15 | 0.3%
International Paper Co Savanna | 15 | 0.3%
Eagle Point Cogen | 14 | 0.3%
General Electric Erie PA Power | 14 | 0.3%
Other values (1186) | 4834 | 96.7%

Length

[Figure: Histogram of lengths of the category]

Words (lowercased)
Value | Count | Frequency (%)
cogen | 252 | 2.0%
hydro | 250 | 1.9%
co | 226 | 1.8%
energy | 213 | 1.7%
power | 200 | 1.6%
mill | 193 | 1.5%
inc | 185 | 1.4%
plant | 136 | 1.1%
recovery | 130 | 1.0%
corp | 118 | 0.9%
Other values (1510) | 10974 | 85.2%
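
The table above counts lowercased words across all plant names. A rough approximation is a whitespace split counted with `collections.Counter`. This tokenizer is an assumption made for illustration: pandas-profiling's own word extraction may also strip punctuation or filter stop words, so counts from this sketch need not match the table exactly.

```python
from collections import Counter


def word_counts(values, top=10):
    """Lowercase each value, split on whitespace and return the most common words."""
    counter = Counter(word for value in values for word in value.lower().split())
    return counter.most_common(top)


# A few plant names taken from the Common Values table above
names = ["Eagle Point Cogen", "Oil Storage", "Louisiana Mill", "Mansfield Mill"]
print(word_counts(names, top=3))
```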

Most occurring characters

Value | Count | Frequency (%)
(space) | 7911 | 9.7%
e | 7512 | 9.2%
o | 5601 | 6.8%
a | 5533 | 6.8%
r | 5387 | 6.6%
n | 5039 | 6.2%
i | 4249 | 5.2%
t | 3970 | 4.9%
l | 3826 | 4.7%
s | 2977 | 3.6%
Other values (59) | 29791 | 36.4%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 58499 | 71.5%
Uppercase Letter | 14885 | 18.2%
Space Separator | 7911 | 9.7%
Decimal Number | 368 | 0.4%
Other Punctuation | 65 | 0.1%
Dash Punctuation | 40 | < 0.1%
Open Punctuation | 14 | < 0.1%
Close Punctuation | 14 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 7512 | 12.8%
o | 5601 | 9.6%
a | 5533 | 9.5%
r | 5387 | 9.2%
n | 5039 | 8.6%
i | 4249 | 7.3%
t | 3970 | 6.8%
l | 3826 | 6.5%
s | 2977 | 5.1%
c | 1866 | 3.2%
Other values (16) | 12539 | 21.4%
Uppercase Letter
Value | Count | Frequency (%)
C | 1908 | 12.8%
P | 1495 | 10.0%
S | 1183 | 7.9%
M | 878 | 5.9%
R | 849 | 5.7%
L | 844 | 5.7%
H | 775 | 5.2%
A | 665 | 4.5%
G | 649 | 4.4%
I | 642 | 4.3%
Other values (16) | 4997 | 33.6%
Decimal Number
Value | Count | Frequency (%)
1 | 68 | 18.5%
2 | 63 | 17.1%
3 | 51 | 13.9%
5 | 44 | 12.0%
9 | 38 | 10.3%
4 | 35 | 9.5%
8 | 26 | 7.1%
6 | 18 | 4.9%
0 | 16 | 4.3%
7 | 9 | 2.4%
Other Punctuation
Value | Count | Frequency (%)
& | 48 | 73.8%
# | 11 | 16.9%
' | 6 | 9.2%
Space Separator
Value | Count | Frequency (%)
(space) | 7911 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 40 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 14 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 14 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 73384 | 89.7%
Common | 8412 | 10.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 7512 | 10.2%
o | 5601 | 7.6%
a | 5533 | 7.5%
r | 5387 | 7.3%
n | 5039 | 6.9%
i | 4249 | 5.8%
t | 3970 | 5.4%
l | 3826 | 5.2%
s | 2977 | 4.1%
C | 1908 | 2.6%
Other values (42) | 27382 | 37.3%
Common
Value | Count | Frequency (%)
(space) | 7911 | 94.0%
1 | 68 | 0.8%
2 | 63 | 0.7%
3 | 51 | 0.6%
& | 48 | 0.6%
5 | 44 | 0.5%
- | 40 | 0.5%
9 | 38 | 0.5%
4 | 35 | 0.4%
8 | 26 | 0.3%
Other values (7) | 88 | 1.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 81796 | 100.0%

fuel2002
Categorical

HIGH CORRELATION

Distinct: 33
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 39.2 KiB

ALL | 1269
NG | 957
WAT | 716
DFO | 684
BIT | 213
Other values (28) | 1161

Length

Max length: 3
Median length: 3
Mean length: 2.7798
Min length: 2

Characters and Unicode

Total characters: 13899
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ALL
2nd row: DFO
3rd row: DFO
4th row: ALL
5th row: DFO

Common Values

Value | Count | Frequency (%)
ALL | 1269 | 25.4%
NG | 957 | 19.1%
WAT | 716 | 14.3%
DFO | 684 | 13.7%
BIT | 213 | 4.3%
WDS | 180 | 3.6%
RFO | 177 | 3.5%
LFG | 130 | 2.6%
WND | 92 | 1.8%
BLQ | 73 | 1.5%
Other values (23) | 509 | 10.2%

Length

[Figure: Histogram of lengths of the category]

Words (lowercased)
Value | Count | Frequency (%)
all | 1269 | 25.4%
ng | 957 | 19.1%
wat | 716 | 14.3%
dfo | 684 | 13.7%
bit | 213 | 4.3%
wds | 180 | 3.6%
rfo | 177 | 3.5%
lfg | 130 | 2.6%
wnd | 92 | 1.8%
blq | 73 | 1.5%
Other values (23) | 509 | 10.2%

Most occurring characters

Value | Count | Frequency (%)
L | 2781 | 20.0%
A | 2001 | 14.4%
G | 1229 | 8.8%
N | 1126 | 8.1%
W | 1054 | 7.6%
O | 1044 | 7.5%
F | 1027 | 7.4%
D | 998 | 7.2%
T | 976 | 7.0%
B | 473 | 3.4%
Other values (11) | 1190 | 8.6%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 13899 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 13899 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 13899 | 100.0%

fuelTypeDescription
Categorical

HIGH CORRELATION

Distinct: 19
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 39.2 KiB

Total | 1269
Natural Gas | 957
Hydroelectric Conventional | 708
Distillate Fuel Oil | 684
Wood Waste Solids | 269
Other values (14) | 1113

Length

Max length: 28
Median length: 24
Mean length: 13.2
Min length: 4

Characters and Unicode

Total characters: 66000
Distinct characters: 37
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Total
2nd row: Distillate Fuel Oil
3rd row: Distillate Fuel Oil
4th row: Total
5th row: Distillate Fuel Oil

Common Values

Value | Count | Frequency (%)
Total | 1269 | 25.4%
Natural Gas | 957 | 19.1%
Hydroelectric Conventional | 708 | 14.2%
Distillate Fuel Oil | 684 | 13.7%
Wood Waste Solids | 269 | 5.4%
Coal | 266 | 5.3%
Municiapl Landfill Gas | 196 | 3.9%
Residual Fuel Oil | 177 | 3.5%
Other | 112 | 2.2%
Wind | 92 | 1.8%
Other values (9) | 270 | 5.4%

Length

[Figure: Histogram of lengths of the category]

Words (lowercased)
Value | Count | Frequency (%)
total | 1269 | 13.2%
gas | 1153 | 12.0%
natural | 957 | 9.9%
oil | 887 | 9.2%
fuel | 861 | 8.9%
hydroelectric | 716 | 7.4%
conventional | 708 | 7.4%
distillate | 684 | 7.1%
waste | 311 | 3.2%
other | 282 | 2.9%
Other values (18) | 1803 | 18.7%

Most occurring characters

Value | Count | Frequency (%)
l | 8262 | 12.5%
a | 7110 | 10.8%
t | 5683 | 8.6%
e | 4943 | 7.5%
i | 4831 | 7.3%
o | 4676 | 7.1%
(space) | 4631 | 7.0%
r | 2833 | 4.3%
s | 2830 | 4.3%
n | 2712 | 4.1%
Other values (27) | 17489 | 26.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 51920 | 78.7%
Uppercase Letter | 9449 | 14.3%
Space Separator | 4631 | 7.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
l | 8262 | 15.9%
a | 7110 | 13.7%
t | 5683 | 10.9%
e | 4943 | 9.5%
i | 4831 | 9.3%
o | 4676 | 9.0%
r | 2833 | 5.5%
s | 2830 | 5.5%
n | 2712 | 5.2%
u | 2243 | 4.3%
Other values (12) | 5797 | 11.2%
Uppercase Letter
Value | Count | Frequency (%)
T | 1269 | 13.4%
G | 1249 | 13.2%
O | 1117 | 11.8%
C | 1024 | 10.8%
N | 967 | 10.2%
F | 861 | 9.1%
H | 716 | 7.6%
D | 684 | 7.2%
W | 672 | 7.1%
S | 279 | 3.0%
Other values (4) | 611 | 6.5%
Space Separator
Value | Count | Frequency (%)
(space) | 4631 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 61369 | 93.0%
Common | 4631 | 7.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
l | 8262 | 13.5%
a | 7110 | 11.6%
t | 5683 | 9.3%
e | 4943 | 8.1%
i | 4831 | 7.9%
o | 4676 | 7.6%
r | 2833 | 4.6%
s | 2830 | 4.6%
n | 2712 | 4.4%
u | 2243 | 3.7%
Other values (26) | 15246 | 24.8%
Common
Value | Count | Frequency (%)
(space) | 4631 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 66000 | 100.0%

state
Categorical

HIGH CORRELATION

Distinct: 50
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 39.2 KiB

CA | 674
TX | 300
NY | 294
TN | 274
AK | 248
Other values (45) | 3210

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 10000
Distinct characters: 24
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: AK
2nd row: AK
3rd row: AK
4th row: AK
5th row: AK

Common Values

Value | Count | Frequency (%)
CA | 674 | 13.5%
TX | 300 | 6.0%
NY | 294 | 5.9%
TN | 274 | 5.5%
AK | 248 | 5.0%
FL | 247 | 4.9%
PA | 177 | 3.5%
GA | 167 | 3.3%
MA | 165 | 3.3%
LA | 139 | 2.8%
Other values (40) | 2315 | 46.3%

Length

[Figure: Histogram of lengths of the category]

Words (lowercased)
Value | Count | Frequency (%)
ca | 674 | 13.5%
tx | 300 | 6.0%
ny | 294 | 5.9%
tn | 274 | 5.5%
ak | 248 | 5.0%
fl | 247 | 4.9%
pa | 177 | 3.5%
ga | 167 | 3.3%
ma | 165 | 3.3%
la | 139 | 2.8%
Other values (40) | 2315 | 46.3%

Most occurring characters

Value | Count | Frequency (%)
A | 2068 | 20.7%
N | 1221 | 12.2%
C | 964 | 9.6%
I | 685 | 6.9%
T | 671 | 6.7%
M | 633 | 6.3%
L | 593 | 5.9%
Y | 320 | 3.2%
K | 302 | 3.0%
X | 300 | 3.0%
Other values (14) | 2243 | 22.4%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 10000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 10000 | 100.0%

stateDescription
Categorical

HIGH CORRELATION

Distinct: 50
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 39.2 KiB

California | 674
Texas | 300
New York | 294
Tennessee | 274
Alaska | 248
Other values (45) | 3210

Length

Max length: 14
Median length: 11
Mean length: 8.5858
Min length: 4

Characters and Unicode

Total characters: 42929
Distinct characters: 46
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Alaska
2nd row: Alaska
3rd row: Alaska
4th row: Alaska
5th row: Alaska

Common Values

Value | Count | Frequency (%)
California | 674 | 13.5%
Texas | 300 | 6.0%
New York | 294 | 5.9%
Tennessee | 274 | 5.5%
Alaska | 248 | 5.0%
Florida | 247 | 4.9%
Pennsylvania | 177 | 3.5%
Georgia | 167 | 3.3%
Massachusetts | 165 | 3.3%
Louisiana | 139 | 2.8%
Other values (40) | 2315 | 46.3%
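
state and stateDescription have the same number of distinct values (50). Every value listed also has a matching count (CA and California 674, TX and Texas 300, and so on). This is consistent with the two columns encoding the same information, and with the high-correlation alert between them. Matching counts alone do not prove the mapping, though. The sketch below is a hypothetical helper for confirming that two columns are in one-to-one correspondence, given (state, stateDescription) pairs:

```python
def is_one_to_one(pairs):
    """Return True if each left value maps to exactly one right value and vice versa."""
    forward, backward = {}, {}
    for left, right in pairs:
        # setdefault stores the first partner seen and returns it on later lookups
        if forward.setdefault(left, right) != right:
            return False
        if backward.setdefault(right, left) != left:
            return False
    return True


rows = [("AK", "Alaska"), ("CA", "California"), ("AK", "Alaska"), ("TX", "Texas")]
print(is_one_to_one(rows))                         # True
print(is_one_to_one(rows + [("AK", "Arkansas")]))  # False
```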

Length

[Figure: Histogram of lengths of the category]

Words (lowercased)
Value | Count | Frequency (%)
california | 674 | 11.4%
new | 569 | 9.7%
texas | 300 | 5.1%
york | 294 | 5.0%
tennessee | 274 | 4.7%
alaska | 248 | 4.2%
florida | 247 | 4.2%
carolina | 208 | 3.5%
pennsylvania | 177 | 3.0%
north | 177 | 3.0%
Other values (42) | 2719 | 46.2%

Most occurring characters

Value | Count | Frequency (%)
a | 5871 | 13.7%
i | 4547 | 10.6%
n | 3791 | 8.8%
s | 3346 | 7.8%
e | 3304 | 7.7%
o | 3151 | 7.3%
r | 2359 | 5.5%
l | 1995 | 4.6%
t | 1071 | 2.5%
C | 964 | 2.2%
Other values (36) | 12530 | 29.2%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 36155 | 84.2%
Uppercase Letter | 5887 | 13.7%
Space Separator | 887 | 2.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 5871 | 16.2%
i | 4547 | 12.6%
n | 3791 | 10.5%
s | 3346 | 9.3%
e | 3304 | 9.1%
o | 3151 | 8.7%
r | 2359 | 6.5%
l | 1995 | 5.5%
t | 1071 | 3.0%
h | 954 | 2.6%
Other values (14) | 5766 | 15.9%
Uppercase Letter
Value | Count | Frequency (%)
C | 964 | 16.4%
N | 797 | 13.5%
M | 633 | 10.8%
T | 574 | 9.8%
A | 416 | 7.1%
I | 405 | 6.9%
W | 300 | 5.1%
Y | 294 | 5.0%
F | 247 | 4.2%
P | 177 | 3.0%
Other values (11) | 1080 | 18.3%
Space Separator
Value | Count | Frequency (%)
(space) | 887 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 42042 | 97.9%
Common | 887 | 2.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 5871 | 14.0%
i | 4547 | 10.8%
n | 3791 | 9.0%
s | 3346 | 8.0%
e | 3304 | 7.9%
o | 3151 | 7.5%
r | 2359 | 5.6%
l | 1995 | 4.7%
t | 1071 | 2.5%
C | 964 | 2.3%
Other values (35) | 11643 | 27.7%
Common
Value | Count | Frequency (%)
(space) | 887 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 42929 | 100.0%

primeMover
Categorical

HIGH CORRELATION

Distinct: 8
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 39.2 KiB

ALL | 3141
" " (single space) | 1825
ST | 19
GT | 9
HY | 2
Other values (3) | 4

Length

Max length: 3
Median length: 3
Mean length: 2.2632
Min length: 1

Characters and Unicode

Total characters: 11316
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: ALL
2nd row: ALL
3rd row: " " (single space)
4th row: ALL
5th row: ALL

Common Values

Value | Count | Frequency (%)
ALL | 3141 | 62.8%
" " (single space) | 1825 | 36.5%
ST | 19 | 0.4%
GT | 9 | 0.2%
HY | 2 | < 0.1%
IC | 2 | < 0.1%
CA | 1 | < 0.1%
CT | 1 | < 0.1%
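
The single-space value explains why Min length is 1 and why the Space Separator category accounts for 1825 characters. The mean length can be reproduced as a count-weighted average of value lengths over the eight values in the table above:

```python
# Value counts copied from the primeMover "Common Values" table; " " is the single-space value
counts = {"ALL": 3141, " ": 1825, "ST": 19, "GT": 9, "HY": 2, "IC": 2, "CA": 1, "CT": 1}

total_rows = sum(counts.values())
total_chars = sum(len(value) * n for value, n in counts.items())
mean_length = total_chars / total_rows

# Rows whose value is empty after stripping whitespace
blank_rows = sum(n for value, n in counts.items() if not value.strip())

print(total_rows, total_chars, mean_length, blank_rows)  # 5000 11316 2.2632 1825
```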

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Words (lowercased)
Value | Count | Frequency (%)
all | 3141 | 98.9%
st | 19 | 0.6%
gt | 9 | 0.3%
hy | 2 | 0.1%
ic | 2 | 0.1%
ca | 1 | < 0.1%
ct | 1 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
L | 6282 | 55.5%
A | 3142 | 27.8%
(space) | 1825 | 16.1%
T | 29 | 0.3%
S | 19 | 0.2%
G | 9 | 0.1%
C | 4 | < 0.1%
H | 2 | < 0.1%
Y | 2 | < 0.1%
I | 2 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 9491 | 83.9%
Space Separator | 1825 | 16.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
L | 6282 | 66.2%
A | 3142 | 33.1%
T | 29 | 0.3%
S | 19 | 0.2%
G | 9 | 0.1%
C | 4 | < 0.1%
H | 2 | < 0.1%
Y | 2 | < 0.1%
I | 2 | < 0.1%
Space Separator
Value | Count | Frequency (%)
(space) | 1825 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 9491 | 83.9%
Common | 1825 | 16.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 11316 | 100.0%

generation
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 1903
Distinct (%): 38.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 25561.98483
Minimum: -41941
Maximum: 1695623
Zeros: 594
Zeros (%): 11.9%
Negative: 47
Negative (%): 0.9%
Memory size: 39.2 KiB

Quantile statistics

Minimum: -41941
5-th percentile: 0
Q1: 115.9175
median: 1548
Q3: 9280
95-th percentile: 100456
Maximum: 1695623
Range: 1737564
Interquartile range (IQR): 9164.0825

Descriptive statistics

Standard deviation: 107424.6026
Coefficient of variation (CV): 4.202514136
Kurtosis: 94.12484749
Mean: 25561.98483
Median Absolute Deviation (MAD): 1548
Skewness: 8.526353334
Sum: 127809924.1
Variance: 1.154004524 × 10^10
Monotonicity: Not monotonic
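
Several of the statistics above are derived from the others. CV is the standard deviation divided by the mean, variance is the squared standard deviation, Range is maximum minus minimum, and IQR is Q3 minus Q1. A quick arithmetic check on the generation figures, using only the reported (rounded) values:

```python
# Figures copied from the generation statistics above
mean = 25561.98483
std = 107424.6026
minimum, maximum = -41941, 1695623
q1, q3 = 115.9175, 9280

cv = std / mean                  # coefficient of variation
variance = std ** 2              # reported as 1.154004524 × 10^10
value_range = maximum - minimum  # reported Range
iqr = q3 - q1                    # reported Interquartile range (IQR)

print(f"CV={cv:.9f} variance={variance:.9e} range={value_range} IQR={iqr:.4f}")
```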

[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
0 | 594 | 11.9%
34 | 10 | 0.2%
52 | 9 | 0.2%
6 | 9 | 0.2%
1 | 9 | 0.2%
0.35 | 8 | 0.2%
8 | 8 | 0.2%
3 | 8 | 0.2%
13 | 7 | 0.1%
50 | 7 | 0.1%
Other values (1893) | 4331 | 86.6%

Minimum 10 values
Value | Count | Frequency (%)
-41941 | 3 | 0.1%
-16523 | 3 | 0.1%
-15983 | 2 | < 0.1%
-7238 | 1 | < 0.1%
-190 | 6 | 0.1%
-167 | 3 | 0.1%
-120 | 3 | 0.1%
-92 | 3 | 0.1%
-72 | 3 | 0.1%
-63 | 3 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
1695623 | 1 | < 0.1%
1694082 | 2 | < 0.1%
1653345 | 1 | < 0.1%
1649251 | 2 | < 0.1%
1219209 | 1 | < 0.1%
1217532 | 2 | < 0.1%
1072678 | 1 | < 0.1%
913414 | 1 | < 0.1%
878063 | 3 | 0.1%
850228 | 1 | < 0.1%

gross-generation
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 1923
Distinct (%): 38.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26500.2457
Minimum: -43238.14
Maximum: 1748064.95
Zeros: 606
Zeros (%): 12.1%
Negative: 30
Negative (%): 0.6%
Memory size: 39.2 KiB

Quantile statistics

Minimum: -43238.14
5-th percentile: 0
Q1: 120.3925
median: 1584.28
Q3: 9981.265
95-th percentile: 103009.28
Maximum: 1748064.95
Range: 1791303.09
Interquartile range (IQR): 9860.8725

Descriptive statistics

Standard deviation: 111413.2384
Coefficient of variation (CV): 4.204234168
Kurtosis: 93.81610436
Mean: 26500.2457
Median Absolute Deviation (MAD): 1584.28
Skewness: 8.527532021
Sum: 132501228.5
Variance: 1.241290969 × 10^10
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
0 | 606 | 12.1%
6.12 | 9 | 0.2%
34.69 | 7 | 0.1%
3.06 | 7 | 0.1%
61.71 | 6 | 0.1%
0.5 | 6 | 0.1%
14753.54 | 6 | 0.1%
1288.26 | 6 | 0.1%
104.31 | 6 | 0.1%
0.36 | 6 | 0.1%
Other values (1913) | 4335 | 86.7%

Minimum 10 values
Value | Count | Frequency (%)
-43238.14 | 3 | 0.1%
-17034.02 | 3 | 0.1%
-195.88 | 6 | 0.1%
-123.71 | 3 | 0.1%
-64.29 | 3 | 0.1%
-57.14 | 3 | 0.1%
-21.43 | 3 | 0.1%
-17.35 | 3 | 0.1%
-6.12 | 3 | 0.1%
0 | 606 | 12.1%

Maximum 10 values
Value | Count | Frequency (%)
1748064.95 | 1 | < 0.1%
1746476.29 | 2 | < 0.1%
1704479.38 | 1 | < 0.1%
1700258.76 | 2 | < 0.1%
1308878.99 | 1 | < 0.1%
1307078.65 | 2 | < 0.1%
1105853.61 | 1 | < 0.1%
941663.92 | 1 | < 0.1%
905219.59 | 3 | 0.1%
876523.71 | 1 | < 0.1%
total-consumption
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
ZEROS

Distinct: 1197
Distinct (%): 32.1%
Missing: 1269
Missing (%): 25.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 105171.0708
Minimum: 0
Maximum: 8920563
Zeros: 1320
Zeros (%): 26.4%
Negative: 0
Negative (%): 0.0%
Memory size: 39.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 959
Q3: 28750.26
95-th percentile: 450905
Maximum: 8920563
Range: 8920563
Interquartile range (IQR): 28750.26

Descriptive statistics

Standard deviation: 491967.9015
Coefficient of variation (CV): 4.67778732
Kurtosis: 154.1507362
Mean: 105171.0708
Median Absolute Deviation (MAD): 959
Skewness: 10.92355893
Sum: 392393265.2
Variance: 2.420324161 × 10^11
Monotonicity: Not monotonic
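
The percentages for this variable use different denominators. Distinct (%) (32.1%) and the mean are computed over the 3731 non-missing values, while Zeros (%) (26.4%) and Missing (%) are computed over all 5000 rows. The arithmetic below reproduces this from the reported figures alone:

```python
# Counts copied from the total-consumption statistics above
n_rows = 5000
missing = 1269
distinct = 1197
zeros = 1320
mean = 105171.0708

non_missing = n_rows - missing               # 3731 observed values

distinct_pct = 100 * distinct / non_missing  # Distinct (%) uses observed values only
zeros_pct = 100 * zeros / n_rows             # Zeros (%) uses all rows
implied_sum = mean * non_missing             # should match the reported Sum

print(f"{distinct_pct:.1f}% {zeros_pct:.1f}% {implied_sum:.1f}")
```

Dividing by all 5000 rows instead would give a Distinct (%) of about 23.9%, which does not match the report.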

[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
0 | 1320 | 26.4%
2 | 9 | 0.2%
38 | 8 | 0.2%
1 | 8 | 0.2%
10 | 6 | 0.1%
12 | 6 | 0.1%
4 | 6 | 0.1%
203 | 6 | 0.1%
5 | 6 | 0.1%
6 | 6 | 0.1%
Other values (1187) | 2350 | 47.0%
(Missing) | 1269 | 25.4%

Minimum 10 values
Value | Count | Frequency (%)
0 | 1320 | 26.4%
1 | 8 | 0.2%
2 | 9 | 0.2%
2.71 | 2 | < 0.1%
3 | 4 | 0.1%
4 | 6 | 0.1%
5 | 6 | 0.1%
6 | 6 | 0.1%
7 | 2 | < 0.1%
8 | 2 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
8920563 | 2 | < 0.1%
8640100 | 2 | < 0.1%
6872410 | 1 | < 0.1%
6674931 | 2 | < 0.1%
6613203 | 2 | < 0.1%
4058493 | 2 | < 0.1%
3582772 | 1 | < 0.1%
3103938 | 2 | < 0.1%
3092731 | 2 | < 0.1%
3071941 | 2 | < 0.1%

total-consumption-units
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): 0.1%
Missing: 2326
Missing (%): 46.5%
Memory size: 39.2 KiB

MMBtu per Mcf | 1185
MMBtu per barrels | 909
MMBtu per short tons | 580

Length

Max length: 20
Median length: 17
Mean length: 15.87808527
Min length: 13

Characters and Unicode

Total characters: 42458
Distinct characters: 17
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: MMBtu per barrels
2nd row: MMBtu per barrels
3rd row: MMBtu per barrels
4th row: MMBtu per barrels
5th row: MMBtu per barrels

Common Values

Value | Count | Frequency (%)
MMBtu per Mcf | 1185 | 23.7%
MMBtu per barrels | 909 | 18.2%
MMBtu per short tons | 580 | 11.6%
(Missing) | 2326 | 46.5%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Words (lowercased)
Value | Count | Frequency (%)
mmbtu | 2674 | 31.1%
per | 2674 | 31.1%
mcf | 1185 | 13.8%
barrels | 909 | 10.6%
short | 580 | 6.7%
tons | 580 | 6.7%

Most occurring characters

Value | Count | Frequency (%)
M | 6533 | 15.4%
(space) | 5928 | 14.0%
r | 5072 | 11.9%
t | 3834 | 9.0%
e | 3583 | 8.4%
u | 2674 | 6.3%
p | 2674 | 6.3%
B | 2674 | 6.3%
s | 2069 | 4.9%
c | 1185 | 2.8%
Other values (7) | 6232 | 14.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 27323 | 64.4%
Uppercase Letter | 9207 | 21.7%
Space Separator | 5928 | 14.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
r | 5072 | 18.6%
t | 3834 | 14.0%
e | 3583 | 13.1%
u | 2674 | 9.8%
p | 2674 | 9.8%
s | 2069 | 7.6%
c | 1185 | 4.3%
f | 1185 | 4.3%
o | 1160 | 4.2%
b | 909 | 3.3%
Other values (4) | 2978 | 10.9%
Uppercase Letter
Value | Count | Frequency (%)
M | 6533 | 71.0%
B | 2674 | 29.0%
Space Separator
Value | Count | Frequency (%)
(space) | 5928 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 36530 | 86.0%
Common | 5928 | 14.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
M | 6533 | 17.9%
r | 5072 | 13.9%
t | 3834 | 10.5%
e | 3583 | 9.8%
u | 2674 | 7.3%
p | 2674 | 7.3%
B | 2674 | 7.3%
s | 2069 | 5.7%
c | 1185 | 3.2%
f | 1185 | 3.2%
Other values (6) | 5047 | 13.8%
Common
Value | Count | Frequency (%)
(space) | 5928 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 42458 | 100.0%

total-consumption-btu
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 1933
Distinct (%): 38.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 317364.4852
Minimum: -10.17
Maximum: 16319726
Zeros: 597
Zeros (%): 11.9%
Negative: 3
Negative (%): 0.1%
Memory size: 39.2 KiB

Quantile statistics

Minimum: -10.17
5-th percentile: 0
Q1: 1608.5
median: 25266.1
Q3: 179287.88
95-th percentile: 1235527
Maximum: 16319726
Range: 16319736.17
Interquartile range (IQR): 177679.38

Descriptive statistics

Standard deviation: 1116123.194
Coefficient of variation (CV): 3.516849698
Kurtosis: 75.42235507
Mean: 317364.4852
Median Absolute Deviation (MAD): 25266.1
Skewness: 7.60439878
Sum: 1586822426
Variance: 1.245730984 × 10^12
Monotonicity: Not monotonic
2022-11-17T14:34:56.065034image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0597
 
11.9%
109
 
0.2%
68
 
0.2%
1.466
 
0.1%
126
 
0.1%
6646
 
0.1%
7606
 
0.1%
5.086
 
0.1%
6086.146
 
0.1%
155
 
0.1%
Other values (1923)4345
86.9%
Minimum 10 values
Value | Count | Frequency (%)
-10.17 | 3 | 0.1%
0 | 597 | 11.9%
1.46 | 6 | 0.1%
2 | 3 | 0.1%
2.31 | 3 | 0.1%
3.63 | 3 | 0.1%
4 | 3 | 0.1%
4.01 | 3 | 0.1%
5.08 | 6 | 0.1%
6 | 8 | 0.2%

Maximum 10 values
Value | Count | Frequency (%)
16319726 | 1 | < 0.1%
16285564 | 2 | < 0.1%
16198257 | 1 | < 0.1%
16185812 | 2 | < 0.1%
12092462 | 1 | < 0.1%
12068953 | 2 | < 0.1%
10368496 | 1 | < 0.1%
9168734 | 3 | 0.1%
8885167 | 1 | < 0.1%
8845523 | 1 | < 0.1%
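The "Minimum 10 values" and "Maximum 10 values" tables list distinct values with their counts, not individual rows. That is why 0 appears once in the minimum table, carrying its full count of 597. Here is a minimal sketch. The helper name `extreme_values` and the use of `None` for missing entries are assumptions.

```python
from collections import Counter


def extreme_values(values, k=10):
    """Hypothetical helper: k smallest and k largest distinct values with their counts."""
    counts = Counter(v for v in values if v is not None)
    distinct = sorted(counts)
    smallest = [(v, counts[v]) for v in distinct[:k]]
    largest = [(v, counts[v]) for v in reversed(distinct[-k:])]
    return smallest, largest


low, high = extreme_values([3, 1, 1, 2, 9, 9, 9, None, 5], k=2)
print(low)   # [(1, 2), (2, 1)]
print(high)  # [(9, 3), (5, 1)]
```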

consumption-for-eg
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
ZEROS

Distinct: 1206
Distinct (%): 32.3%
Missing: 1269
Missing (%): 25.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 76556.90803
Minimum: 0
Maximum: 6872410
Zeros: 1343
Zeros (%): 26.9%
Negative: 0
Negative (%): 0.0%
Memory size: 39.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 440
Q3: 15564.36
95th percentile: 332846
Maximum: 6872410
Range: 6872410
Interquartile range (IQR): 15564.36

Descriptive statistics

Standard deviation: 366391.4347
Coefficient of variation (CV): 4.785870329
Kurtosis: 159.5527835
Mean: 76556.90803
Median Absolute Deviation (MAD): 440
Skewness: 10.95510124
Sum: 285633823.9
Variance: 1.342426834 × 10¹¹
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values
Value | Count | Frequency (%)
0 | 1343 | 26.9%
2 | 9 | 0.2%
1 | 8 | 0.2%
38 | 8 | 0.2%
10 | 7 | 0.1%
12 | 6 | 0.1%
4 | 6 | 0.1%
162 | 5 | 0.1%
107 | 4 | 0.1%
24 | 4 | 0.1%
Other values (1196) | 2331 | 46.6%
(Missing) | 1269 | 25.4%
Minimum 10 values
Value | Count | Frequency (%)
0 | 1343 | 26.9%
0.24 | 2 | < 0.1%
0.38 | 2 | < 0.1%
1 | 8 | 0.2%
1.04 | 2 | < 0.1%
1.13 | 2 | < 0.1%
1.93 | 2 | < 0.1%
2 | 9 | 0.2%
2.71 | 2 | < 0.1%
3 | 2 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
6872410 | 1 | < 0.1%
6674931 | 2 | < 0.1%
6484218.56 | 2 | < 0.1%
4045125 | 2 | < 0.1%
3582772 | 1 | < 0.1%
3399147.44 | 2 | < 0.1%
2770564.33 | 2 | < 0.1%
2526577 | 1 | < 0.1%
2428558.65 | 2 | < 0.1%
2422422 | 2 | < 0.1%

consumption-for-eg-units
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): 0.1%
Missing: 2326
Missing (%): 46.5%
Memory size: 39.2 KiB
Mcf | 1185
barrels | 909
short tons | 580

Length

Max length: 10
Median length: 7
Mean length: 5.878085266
Min length: 3
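The length statistics cover non-missing values only. The sketch below rebuilds the column from the category counts reported for this variable (1185 × "Mcf", 909 × "barrels", 580 × "short tons", 2326 missing) and recovers the figures above. The helper name `length_stats` and the use of `None` for missing entries are assumptions.

```python
import statistics


def length_stats(values):
    """Hypothetical helper: string-length summary over non-missing values."""
    lengths = [len(v) for v in values if v is not None]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
        "total_characters": sum(lengths),
    }


# Rebuild the column from the counts in this report.
column = ["Mcf"] * 1185 + ["barrels"] * 909 + ["short tons"] * 580 + [None] * 2326
print(length_stats(column))
```

The mean is 15718 / 2674 ≈ 5.878085266, where 15718 is also the "Total characters" figure reported for this variable.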

Characters and Unicode

Total characters: 15718
Distinct characters: 14
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
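Python's standard `unicodedata` module returns the general category of each character (for example `Ll` for a lowercase letter), which is enough to rebuild the "Most occurring categories" counts. It does not expose the Unicode script or block, so the script and block tables would need another data source. Here is a minimal sketch. The helper `character_categories` is hypothetical, and it maps only the category codes that occur in this report to the names the report uses.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {"Ll": "Lowercase Letter", "Lu": "Uppercase Letter", "Zs": "Space Separator"}


def character_categories(values):
    """Hypothetical helper: count characters by Unicode general category."""
    counts = Counter()
    for value in values:
        if value is None:  # missing entries are skipped
            continue
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


column = ["Mcf"] * 1185 + ["barrels"] * 909 + ["short tons"] * 580 + [None] * 2326
print(character_categories(column))
```

On the rebuilt column this yields 13953 lowercase letters, 1185 uppercase letters and 580 space separators, the same counts as the "Most occurring categories" table for this variable.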

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: barrels
2nd row: barrels
3rd row: barrels
4th row: barrels
5th row: barrels

Common Values

Value | Count | Frequency (%)
Mcf | 1185 | 23.7%
barrels | 909 | 18.2%
short tons | 580 | 11.6%
(Missing) | 2326 | 46.5%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
mcf | 1185 | 36.4%
barrels | 909 | 27.9%
short | 580 | 17.8%
tons | 580 | 17.8%
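The lowercase table that follows the category frequency plot counts words, not whole categories: "short tons" contributes one "short" and one "tons". The figures match what you get by lowercasing each value and splitting on whitespace, as in the sketch below. The helper name `word_counts` is hypothetical, and the profiler's actual tokenizer may differ.

```python
from collections import Counter


def word_counts(values):
    """Hypothetical helper: lowercase each value and count whitespace-separated words."""
    counts = Counter()
    for value in values:
        if value is not None:
            counts.update(value.lower().split())
    return counts


column = ["Mcf"] * 1185 + ["barrels"] * 909 + ["short tons"] * 580 + [None] * 2326
words = word_counts(column)
total = sum(words.values())  # 3254 words in all
for word, count in words.most_common():
    print(f"{word} | {count} | {100 * count / total:.1f}%")
```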

Most occurring characters

Value | Count | Frequency (%)
r | 2398 | 15.3%
s | 2069 | 13.2%
M | 1185 | 7.5%
c | 1185 | 7.5%
f | 1185 | 7.5%
o | 1160 | 7.4%
t | 1160 | 7.4%
b | 909 | 5.8%
a | 909 | 5.8%
e | 909 | 5.8%
Other values (4) | 2649 | 16.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 13953 | 88.8%
Uppercase Letter | 1185 | 7.5%
Space Separator | 580 | 3.7%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
r | 2398 | 17.2%
s | 2069 | 14.8%
c | 1185 | 8.5%
f | 1185 | 8.5%
o | 1160 | 8.3%
t | 1160 | 8.3%
b | 909 | 6.5%
a | 909 | 6.5%
e | 909 | 6.5%
l | 909 | 6.5%
Other values (2) | 1160 | 8.3%
Uppercase Letter
Value | Count | Frequency (%)
M | 1185 | 100.0%
Space Separator
Value | Count | Frequency (%)
(space) | 580 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 15138 | 96.3%
Common | 580 | 3.7%

Most frequent character per script

Latin
Value | Count | Frequency (%)
r | 2398 | 15.8%
s | 2069 | 13.7%
M | 1185 | 7.8%
c | 1185 | 7.8%
f | 1185 | 7.8%
o | 1160 | 7.7%
t | 1160 | 7.7%
b | 909 | 6.0%
a | 909 | 6.0%
e | 909 | 6.0%
Other values (3) | 2069 | 13.7%
Common
Value | Count | Frequency (%)
(space) | 580 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 15718 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
r | 2398 | 15.3%
s | 2069 | 13.2%
M | 1185 | 7.5%
c | 1185 | 7.5%
f | 1185 | 7.5%
o | 1160 | 7.4%
t | 1160 | 7.4%
b | 909 | 5.8%
a | 909 | 5.8%
e | 909 | 5.8%
Other values (4) | 2649 | 16.9%

consumption-for-eg-btu
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 1923
Distinct (%): 38.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 265433.5286
Minimum: -66828
Maximum: 16319726
Zeros: 596
Zeros (%): 11.9%
Negative: 25
Negative (%): 0.5%
Memory size: 39.2 KiB

Quantile statistics

Minimum: -66828
5th percentile: 0
Q1: 1194
Median: 15755
Q3: 112889.46
95th percentile: 896264.3885
Maximum: 16319726
Range: 16386554
Interquartile range (IQR): 111695.46

Descriptive statistics

Standard deviation: 1090511.728
Coefficient of variation (CV): 4.108417404
Kurtosis: 84.3254756
Mean: 265433.5286
Median Absolute Deviation (MAD): 15755
Skewness: 8.185109812
Sum: 1327167643
Variance: 1.18921583 × 10¹²
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values
Value | Count | Frequency (%)
0 | 596 | 11.9%
10 | 10 | 0.2%
6 | 8 | 0.2%
1384 | 7 | 0.1%
664 | 6 | 0.1%
5.08 | 6 | 0.1%
12 | 6 | 0.1%
6086.14 | 6 | 0.1%
63 | 6 | 0.1%
760 | 6 | 0.1%
Other values (1913) | 4343 | 86.9%
Minimum 10 values
Value | Count | Frequency (%)
-66828 | 1 | < 0.1%
-63071 | 2 | < 0.1%
-37007 | 2 | < 0.1%
-34717 | 3 | 0.1%
-25906 | 3 | 0.1%
-3775 | 3 | 0.1%
-3446 | 2 | < 0.1%
-2612 | 2 | < 0.1%
-815 | 2 | < 0.1%
-330 | 2 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
16319726 | 1 | < 0.1%
16285564 | 2 | < 0.1%
16198257 | 1 | < 0.1%
16185812 | 2 | < 0.1%
12092462 | 1 | < 0.1%
12068953 | 2 | < 0.1%
10368496 | 1 | < 0.1%
9168734 | 3 | 0.1%
8885167 | 1 | < 0.1%
8845523 | 1 | < 0.1%

average-heat-content
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 506
Distinct (%): 21.0%
Missing: 2589
Missing (%): 51.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.144659477
Minimum: 0.06
Maximum: 33.92
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 39.2 KiB

Quantile statistics

Minimum: 0.06
5th percentile: 0.56
Q1: 1.02
Median: 5.802
Q3: 8.4
95th percentile: 25.5
Maximum: 33.92
Range: 33.86
Interquartile range (IQR): 7.38

Descriptive statistics

Standard deviation: 8.062987251
Coefficient of variation (CV): 1.128533456
Kurtosis: 0.9500839915
Mean: 7.144659477
Median Absolute Deviation (MAD): 4.775
Skewness: 1.415656485
Sum: 17225.774
Variance: 65.01176342
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values
Value | Count | Frequency (%)
1 | 162 | 3.2%
1.03 | 154 | 3.1%
1.02 | 119 | 2.4%
1.01 | 80 | 1.6%
1.04 | 66 | 1.3%
1.027 | 55 | 1.1%
6.3 | 42 | 0.8%
1.05 | 37 | 0.7%
5.838 | 35 | 0.7%
5.8 | 29 | 0.6%
Other values (496) | 1632 | 32.6%
(Missing) | 2589 | 51.8%
Minimum 10 values
Value | Count | Frequency (%)
0.06 | 2 | < 0.1%
0.08 | 2 | < 0.1%
0.09 | 4 | 0.1%
0.092 | 2 | < 0.1%
0.1 | 2 | < 0.1%
0.27 | 4 | 0.1%
0.3 | 2 | < 0.1%
0.36 | 4 | 0.1%
0.369 | 2 | < 0.1%
0.37 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
33.92 | 2 | < 0.1%
33 | 2 | < 0.1%
32.667 | 2 | < 0.1%
32 | 2 | < 0.1%
31.599 | 1 | < 0.1%
31 | 6 | 0.1%
30.999 | 2 | < 0.1%
30.9 | 2 | < 0.1%
30.593 | 2 | < 0.1%
30.4 | 2 | < 0.1%

average-heat-content-units
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): 0.1%
Missing: 2326
Missing (%): 46.5%
Memory size: 39.2 KiB
MMBtu per Mcf | 1185
MMBtu per barrels | 909
MMBtu per short tons | 580

Length

Max length: 20
Median length: 17
Mean length: 15.87808527
Min length: 13

Characters and Unicode

Total characters: 42458
Distinct characters: 17
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: MMBtu per barrels
2nd row: MMBtu per barrels
3rd row: MMBtu per barrels
4th row: MMBtu per barrels
5th row: MMBtu per barrels

Common Values

Value | Count | Frequency (%)
MMBtu per Mcf | 1185 | 23.7%
MMBtu per barrels | 909 | 18.2%
MMBtu per short tons | 580 | 11.6%
(Missing) | 2326 | 46.5%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
mmbtu | 2674 | 31.1%
per | 2674 | 31.1%
mcf | 1185 | 13.8%
barrels | 909 | 10.6%
short | 580 | 6.7%
tons | 580 | 6.7%

Most occurring characters

Value | Count | Frequency (%)
M | 6533 | 15.4%
(space) | 5928 | 14.0%
r | 5072 | 11.9%
t | 3834 | 9.0%
e | 3583 | 8.4%
u | 2674 | 6.3%
p | 2674 | 6.3%
B | 2674 | 6.3%
s | 2069 | 4.9%
c | 1185 | 2.8%
Other values (7) | 6232 | 14.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 27323 | 64.4%
Uppercase Letter | 9207 | 21.7%
Space Separator | 5928 | 14.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
r | 5072 | 18.6%
t | 3834 | 14.0%
e | 3583 | 13.1%
u | 2674 | 9.8%
p | 2674 | 9.8%
s | 2069 | 7.6%
c | 1185 | 4.3%
f | 1185 | 4.3%
o | 1160 | 4.2%
b | 909 | 3.3%
Other values (4) | 2978 | 10.9%
Uppercase Letter
Value | Count | Frequency (%)
M | 6533 | 71.0%
B | 2674 | 29.0%
Space Separator
Value | Count | Frequency (%)
(space) | 5928 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 36530 | 86.0%
Common | 5928 | 14.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
M | 6533 | 17.9%
r | 5072 | 13.9%
t | 3834 | 10.5%
e | 3583 | 9.8%
u | 2674 | 7.3%
p | 2674 | 7.3%
B | 2674 | 7.3%
s | 2069 | 5.7%
c | 1185 | 3.2%
f | 1185 | 3.2%
Other values (6) | 5047 | 13.8%
Common
Value | Count | Frequency (%)
(space) | 5928 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 42458 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
M | 6533 | 15.4%
(space) | 5928 | 14.0%
r | 5072 | 11.9%
t | 3834 | 9.0%
e | 3583 | 8.4%
u | 2674 | 6.3%
p | 2674 | 6.3%
B | 2674 | 6.3%
s | 2069 | 4.9%
c | 1185 | 2.8%
Other values (7) | 6232 | 14.7%

generation-units
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 39.2 KiB
megawatthours | 5000
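The CONSTANT alert fires when a column holds a single distinct value. Such a column carries no information, and the report also marks it REJECTED. The check itself is simple, as the sketch below shows over a list of row dictionaries. The helper name, the row format, and the convention that missing values are `None` (and ignored) are all assumptions.

```python
def constant_columns(rows):
    """Hypothetical helper: names of columns with exactly one distinct non-missing value."""
    distinct = {}
    for row in rows:
        for name, value in row.items():
            if value is not None:
                distinct.setdefault(name, set()).add(value)
    return sorted(name for name, seen in distinct.items() if len(seen) == 1)


rows = [
    {"generation": 135.88, "generation-units": "megawatthours"},
    {"generation": 50.89, "generation-units": "megawatthours"},
]
print(constant_columns(rows))  # ['generation-units']
```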

Length

Max length: 13
Median length: 13
Mean length: 13
Min length: 13

Characters and Unicode

Total characters: 65000
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: megawatthours
2nd row: megawatthours
3rd row: megawatthours
4th row: megawatthours
5th row: megawatthours

Common Values

Value | Count | Frequency (%)
megawatthours | 5000 | 100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
megawatthours | 5000 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
a | 10000 | 15.4%
t | 10000 | 15.4%
m | 5000 | 7.7%
e | 5000 | 7.7%
g | 5000 | 7.7%
w | 5000 | 7.7%
h | 5000 | 7.7%
o | 5000 | 7.7%
u | 5000 | 7.7%
r | 5000 | 7.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 65000 | 100.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 10000 | 15.4%
t | 10000 | 15.4%
m | 5000 | 7.7%
e | 5000 | 7.7%
g | 5000 | 7.7%
w | 5000 | 7.7%
h | 5000 | 7.7%
o | 5000 | 7.7%
u | 5000 | 7.7%
r | 5000 | 7.7%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 65000 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 10000 | 15.4%
t | 10000 | 15.4%
m | 5000 | 7.7%
e | 5000 | 7.7%
g | 5000 | 7.7%
w | 5000 | 7.7%
h | 5000 | 7.7%
o | 5000 | 7.7%
u | 5000 | 7.7%
r | 5000 | 7.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 65000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 10000 | 15.4%
t | 10000 | 15.4%
m | 5000 | 7.7%
e | 5000 | 7.7%
g | 5000 | 7.7%
w | 5000 | 7.7%
h | 5000 | 7.7%
o | 5000 | 7.7%
u | 5000 | 7.7%
r | 5000 | 7.7%

gross-generation-units
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 39.2 KiB
megawatthours | 5000

Length

Max length: 13
Median length: 13
Mean length: 13
Min length: 13

Characters and Unicode

Total characters: 65000
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: megawatthours
2nd row: megawatthours
3rd row: megawatthours
4th row: megawatthours
5th row: megawatthours

Common Values

Value | Count | Frequency (%)
megawatthours | 5000 | 100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
megawatthours | 5000 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
a | 10000 | 15.4%
t | 10000 | 15.4%
m | 5000 | 7.7%
e | 5000 | 7.7%
g | 5000 | 7.7%
w | 5000 | 7.7%
h | 5000 | 7.7%
o | 5000 | 7.7%
u | 5000 | 7.7%
r | 5000 | 7.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 65000 | 100.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 10000 | 15.4%
t | 10000 | 15.4%
m | 5000 | 7.7%
e | 5000 | 7.7%
g | 5000 | 7.7%
w | 5000 | 7.7%
h | 5000 | 7.7%
o | 5000 | 7.7%
u | 5000 | 7.7%
r | 5000 | 7.7%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 65000 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 10000 | 15.4%
t | 10000 | 15.4%
m | 5000 | 7.7%
e | 5000 | 7.7%
g | 5000 | 7.7%
w | 5000 | 7.7%
h | 5000 | 7.7%
o | 5000 | 7.7%
u | 5000 | 7.7%
r | 5000 | 7.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 65000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 10000 | 15.4%
t | 10000 | 15.4%
m | 5000 | 7.7%
e | 5000 | 7.7%
g | 5000 | 7.7%
w | 5000 | 7.7%
h | 5000 | 7.7%
o | 5000 | 7.7%
u | 5000 | 7.7%
r | 5000 | 7.7%

total-consumption-btu-units
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 39.2 KiB
MMBtu | 5000

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 25000
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: MMBtu
2nd row: MMBtu
3rd row: MMBtu
4th row: MMBtu
5th row: MMBtu

Common Values

Value | Count | Frequency (%)
MMBtu | 5000 | 100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
mmbtu | 5000 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
M | 10000 | 40.0%
B | 5000 | 20.0%
t | 5000 | 20.0%
u | 5000 | 20.0%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 15000 | 60.0%
Lowercase Letter | 10000 | 40.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
M | 10000 | 66.7%
B | 5000 | 33.3%
Lowercase Letter
Value | Count | Frequency (%)
t | 5000 | 50.0%
u | 5000 | 50.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 25000 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
M | 10000 | 40.0%
B | 5000 | 20.0%
t | 5000 | 20.0%
u | 5000 | 20.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 25000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
M | 10000 | 40.0%
B | 5000 | 20.0%
t | 5000 | 20.0%
u | 5000 | 20.0%

consumption-for-eg-btu-units
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 39.2 KiB
MMBtu | 5000

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 25000
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: MMBtu
2nd row: MMBtu
3rd row: MMBtu
4th row: MMBtu
5th row: MMBtu

Common Values

Value | Count | Frequency (%)
MMBtu | 5000 | 100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
mmbtu | 5000 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
M | 10000 | 40.0%
B | 5000 | 20.0%
t | 5000 | 20.0%
u | 5000 | 20.0%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 15000 | 60.0%
Lowercase Letter | 10000 | 40.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
M | 10000 | 66.7%
B | 5000 | 33.3%
Lowercase Letter
Value | Count | Frequency (%)
t | 5000 | 50.0%
u | 5000 | 50.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 25000 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
M | 10000 | 40.0%
B | 5000 | 20.0%
t | 5000 | 20.0%
u | 5000 | 20.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 25000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
M | 10000 | 40.0%
B | 5000 | 20.0%
t | 5000 | 20.0%
u | 5000 | 20.0%

Interactions

Pairwise interaction plots of the 9 numeric variables (9 × 9 grid; plots not reproduced here).

Correlations


Auto

The auto setting chooses an easily interpretable pairwise metric for each pair of columns, according to their variable types: categorical-categorical pairs use Cramér's V, numerical-categorical pairs use Cramér's V (after discretizing the numerical column), and numerical-numerical pairs use Spearman's ρ. This configuration therefore applies the most suitable method to each pair of columns.
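The auto mapping fits in a small dispatch function. The sketch below encodes the rule as stated above; it is not the profiler's code. The type labels follow the report's "Numeric" and "Categorical" variable types, and the function name is hypothetical.

```python
def auto_method(type_a, type_b):
    """Hypothetical helper: pick a correlation method for a pair of column types."""
    kinds = {type_a, type_b}
    if kinds == {"Numeric"}:
        return "spearman"
    if kinds <= {"Numeric", "Categorical"}:
        # categorical-categorical, or numerical-categorical after discretizing the numeric side
        return "cramers_v"
    raise ValueError(f"no auto mapping for {type_a}-{type_b}")


print(auto_method("Numeric", "Categorical"))  # cramers_v
```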

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it captures nonlinear monotonic relationships better than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations.
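The definition translates directly into code: rank both variables, then take the Pearson correlation of the ranks. In the sketch below, tied values share the average of the positions they occupy, which is the usual convention. The helper names are hypothetical. The example shows a monotonic but nonlinear relationship, where ρ is 1 while Pearson's r is only about 0.943.

```python
import math


def average_ranks(xs):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with xs[order[i]]
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Pearson correlation computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(spearman(x, y))  # 1.0, up to floating-point rounding
print(pearson(x, y))   # about 0.943
```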

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate shifts and positive rescalings of the two variables. Consequently, for an exact linear relationship with nonzero slope, the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
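Here is a minimal sketch of that calculation, with a check of the invariance property. Shifting y and rescaling it by a positive factor leaves r unchanged, while a negative factor flips its sign. The sample values are arbitrary illustrations, not data from this report.

```python
import math


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # The 1/n (or 1/(n-1)) normalisation factors cancel, so they are omitted.
    return cov / (sx * sy)


x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 1.0, 5.0, 6.0]
r = pearson(x, y)
print(r)
print(pearson(x, [3 * v + 10 for v in y]))  # same value: shift plus positive rescaling
print(pearson(x, [-2 * v for v in y]))      # same magnitude, opposite sign
```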

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2. This basic form (τ_a) assumes there are no ties.
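Here is the pair-counting definition as a short sketch. It computes the basic τ_a, in which tied pairs add nothing to the numerator; libraries such as SciPy default to the tie-adjusted τ_b instead. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Hypothetical helper: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # 2 concordant, 1 discordant out of 3 pairs
```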

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The naive empirical estimator of Cramér's V has been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma (2013).
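Below is a stdlib sketch of the bias-corrected estimator, following the formulas usually quoted from Bergsma's paper. It computes the chi-squared statistic, then corrects φ² and the table dimensions before normalising. The function name is hypothetical, and the profiler's own implementation may differ in details such as its handling of missing values.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Hypothetical helper: bias-corrected Cramér's V for two equal-length label sequences."""
    n = len(x)
    row_totals, col_totals = Counter(x), Counter(y)
    joint = Counter(zip(x, y))
    r, k = len(row_totals), len(col_totals)
    # Pearson's chi-squared statistic of the r x k contingency table.
    chi2 = 0.0
    for a, ra in row_totals.items():
        for b, cb in col_totals.items():
            expected = ra * cb / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    phi2 = chi2 / n
    # Bias corrections for phi^2 and for the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table, e.g. only one category on one side
    return math.sqrt(phi2_corr / denom)


print(cramers_v_corrected(["a"] * 50 + ["b"] * 50, ["u"] * 50 + ["v"] * 50))  # ~1.0
print(cramers_v_corrected(["a", "a", "b", "b"], ["u", "v", "u", "v"]))        # 0.0
```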

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables. It captures non-linear dependency and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for it.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
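Nullity correlation can be read as an ordinary Pearson correlation between missingness indicators (1 where a value is missing, 0 otherwise). The sketch below works in that spirit, with `None` marking missing entries and a hypothetical helper name. A column that is never (or always) missing has no defined nullity correlation, so the sketch returns NaN in that case.

```python
import math


def nullity_correlation(col_a, col_b):
    """Hypothetical helper: Pearson correlation between the missing-value indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0 or vb == 0:
        return float("nan")  # constant missingness pattern: correlation undefined
    return cov / math.sqrt(va * vb)


print(nullity_correlation([1, None, 3, None], ["x", None, "z", None]))  # 1.0: missing together
print(nullity_correlation([1, None, 3, None], [None, "y", None, "w"]))  # -1.0: never missing together
```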

Sample

First rows

(index) | Unnamed: 0 | period | plantCode | plantName | fuel2002 | fuelTypeDescription | state | stateDescription | primeMover | generation | gross-generation | total-consumption | total-consumption-units | total-consumption-btu | consumption-for-eg | consumption-for-eg-units | consumption-for-eg-btu | average-heat-content | average-heat-content-units | generation-units | gross-generation-units | total-consumption-btu-units | consumption-for-eg-btu-units
0 | 0 | 2001-04 | 6311 | Chevak | ALL | Total | AK | Alaska | ALL | 135.88 | 138.65 | NaN | NaN | 1457.0 | NaN | NaN | 1457.0 | NaN | NaN | megawatthours | megawatthours | MMBtu | MMBtu
1 | 1 | 2001-04 | 6311 | Chevak | DFO | Distillate Fuel Oil | AK | Alaska | ALL | 135.88 | 138.65 | 250.0 | MMBtu per barrels | 1457.0 | 250.0 | barrels | 1457.0 | 5.828 | MMBtu per barrels | megawatthours | megawatthours | MMBtu | MMBtu
2 | 2 | 2001-04 | 6312 | EEK | DFO | Distillate Fuel Oil | AK | Alaska |  | 50.89 | 51.93 | 100.0 | MMBtu per barrels | 583.0 | 100.0 | barrels | 583.0 | 5.830 | MMBtu per barrels | megawatthours | megawatthours | MMBtu | MMBtu
3 | 3 | 2001-04 | 6312 | EEK | ALL | Total | AK | Alaska | ALL | 50.89 | 51.93 | NaN | NaN | 583.0 | NaN | NaN | 583.0 | NaN | NaN | megawatthours | megawatthours | MMBtu | MMBtu
4 | 4 | 2001-04 | 6312 | EEK | DFO | Distillate Fuel Oil | AK | Alaska | ALL | 50.89 | 51.93 | 100.0 | MMBtu per barrels | 583.0 | 100.0 | barrels | 583.0 | 5.830 | MMBtu per barrels | megawatthours | megawatthours | MMBtu | MMBtu
5 | 5 | 2001-04 | 6313 | ELIM | DFO | Distillate Fuel Oil | AK | Alaska |  | 76.00 | 77.55 | 133.0 | MMBtu per barrels | 775.0 | 133.0 | barrels | 775.0 | 5.827 | MMBtu per barrels | megawatthours | megawatthours | MMBtu | MMBtu
6 | 6 | 2001-04 | 6313 | ELIM | ALL | Total | AK | Alaska | ALL | 76.00 | 77.55 | NaN | NaN | 775.0 | NaN | NaN | 775.0 | NaN | NaN | megawatthours | megawatthours | MMBtu | MMBtu
7 | 7 | 2001-04 | 6313 | ELIM | DFO | Distillate Fuel Oil | AK | Alaska | ALL | 76.00 | 77.55 | 133.0 | MMBtu per barrels | 775.0 | 133.0 | barrels | 775.0 | 5.827 | MMBtu per barrels | megawatthours | megawatthours | MMBtu | MMBtu
8 | 8 | 2001-04 | 6314 | Emmonak | DFO | Distillate Fuel Oil | AK | Alaska |  | 177.04 | 180.65 | 340.0 | MMBtu per barrels | 1982.0 | 340.0 | barrels | 1982.0 | 5.829 | MMBtu per barrels | megawatthours | megawatthours | MMBtu | MMBtu
9 | 9 | 2001-04 | 6314 | Emmonak | ALL | Total | AK | Alaska | ALL | 177.04 | 180.65 | NaN | NaN | 1982.0 | NaN | NaN | 1982.0 | NaN | NaN | megawatthours | megawatthours | MMBtu | MMBtu

Last rows

(index) | Unnamed: 0 | period | plantCode | plantName | fuel2002 | fuelTypeDescription | state | stateDescription | primeMover | generation | gross-generation | total-consumption | total-consumption-units | total-consumption-btu | consumption-for-eg | consumption-for-eg-units | consumption-for-eg-btu | average-heat-content | average-heat-content-units | generation-units | gross-generation-units | total-consumption-btu-units | consumption-for-eg-btu-units
4990 | 4990 | 2001-04 | 54526 | Lyonsdale Power Co LLC | WDS | Wood Waste Solids | NY | New York | ALL | 8645.0 | 9188.01 | 13549.0 | NaN | 123702.0 | 13549.0 | NaN | 123702.0 | 9.13 | NaN | megawatthours | megawatthours | MMBtu | MMBtu
4991 | 4991 | 2001-04 | 54529 | Ridge | LFG | Municiapl Landfill Gas | FL | Florida |  | 296.0 | 317.77 | 18040.0 | MMBtu per Mcf | 10102.0 | 18040.0 | Mcf | 10102.0 | 0.56 | MMBtu per Mcf | megawatthours | megawatthours | MMBtu | MMBtu
4992 | 4992 | 2001-04 | 54529 | Ridge | TDF | Other | FL | Florida |  | 6376.0 | 6844.94 | 7152.0 | MMBtu per short tons | 194534.0 | 7152.0 | short tons | 194534.0 | 27.20 | MMBtu per short tons | megawatthours | megawatthours | MMBtu | MMBtu
4993 | 4993 | 2001-04 | 54529 | Ridge | WDS | Wood Waste Solids | FL | Florida |  | 9289.0 | 9972.18 | 16671.0 | NaN | 133368.0 | 16671.0 | NaN | 133368.0 | 8.00 | NaN | megawatthours | megawatthours | MMBtu | MMBtu
4994 | 4994 | 2001-04 | 54529 | Ridge | ALL | Total | FL | Florida | ALL | 15961.0 | 17134.89 | NaN | NaN | 338004.0 | NaN | NaN | 338004.0 | NaN | NaN | megawatthours | megawatthours | MMBtu | MMBtu
4995 | 4995 | 2001-04 | 54529 | Ridge | LFG | Municiapl Landfill Gas | FL | Florida | ALL | 296.0 | 317.77 | 18040.0 | MMBtu per Mcf | 10102.0 | 18040.0 | Mcf | 10102.0 | 0.56 | MMBtu per Mcf | megawatthours | megawatthours | MMBtu | MMBtu
4996 | 4996 | 2001-04 | 54529 | Ridge | TDF | Other | FL | Florida | ALL | 6376.0 | 6844.94 | 7152.0 | MMBtu per short tons | 194534.0 | 7152.0 | short tons | 194534.0 | 27.20 | MMBtu per short tons | megawatthours | megawatthours | MMBtu | MMBtu
4997 | 4997 | 2001-12 | 55323 | City of Tacoma Steam Plant | MSN | Other | WA | Washington |  | 0.0 | 0.00 | 0.0 | MMBtu per short tons | 0.0 | 0.0 | short tons | 0.0 | NaN | MMBtu per short tons | megawatthours | megawatthours | MMBtu | MMBtu
4998 | 4998 | 2001-12 | 55323 | City of Tacoma Steam Plant | NG | Natural Gas | WA | Washington |  | 0.0 | 0.00 | 0.0 | MMBtu per Mcf | 0.0 | 0.0 | Mcf | 0.0 | NaN | MMBtu per Mcf | megawatthours | megawatthours | MMBtu | MMBtu
4999 | 4999 | 2001-12 | 55323 | City of Tacoma Steam Plant | OBS | other renewables | WA | Washington |  | 0.0 | 0.00 | 0.0 | MMBtu per short tons | 0.0 | 0.0 | short tons | 0.0 | NaN | MMBtu per short tons | megawatthours | megawatthours | MMBtu | MMBtu