Overview

Dataset statistics

Number of variables: 9
Number of observations: 50009
Missing cells: 9
Missing cells (%): < 0.1%
Duplicate rows: 1
Duplicate rows (%): < 0.1%
Total size in memory: 3.4 MiB
Average record size in memory: 72.0 B

Variable types

Numeric: 1
Categorical: 8

Alerts

Dataset has 1 (< 0.1%) duplicate row (Duplicates)
fromba has a high cardinality: 77 distinct values (High cardinality)
fromba-name has a high cardinality: 77 distinct values (High cardinality)
toba has a high cardinality: 86 distinct values (High cardinality)
toba-name has a high cardinality: 86 distinct values (High cardinality)
value has a high cardinality: 21249 distinct values (High cardinality)
Unnamed: 0 is highly correlated with period (High correlation)
period is highly correlated with Unnamed: 0 and 6 other fields (High correlation)
fromba is highly correlated with period and 5 other fields (High correlation)
fromba-name is highly correlated with period and 5 other fields (High correlation)
toba is highly correlated with period and 5 other fields (High correlation)
toba-name is highly correlated with period and 5 other fields (High correlation)
timezone is highly correlated with period and 5 other fields (High correlation)
value-units is highly correlated with period and 5 other fields (High correlation)
Unnamed: 0 is uniformly distributed (Uniform)
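The duplicate figures measure two different things. The overview and the alert above count 1 duplicate row. The Duplicate rows table at the end of the report shows that one row with "# duplicates" equal to 9. The numbers are consistent with the overview counting distinct row values that occur more than once, though the report does not say so. The standard-library sketch below illustrates that distinction under this assumption. The function name and the rows are hypothetical stand-ins, not records from this dataset:

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (number of distinct duplicated rows, total occurrences of those rows).

    Each row must be hashable, e.g. a tuple of cell values.
    """
    counts = Counter(rows)
    # Keep only row values that occur more than once.
    duplicated = {row: n for row, n in counts.items() if n > 1}
    return len(duplicated), sum(duplicated.values())


# Hypothetical rows: one ordinary record plus one header-like row repeated 9 times.
header_like = (None, "period", "fromba", "value")
rows = [(0.0, "2022-11-12", "PACE", "-1224")] + [header_like] * 9

print(duplicate_summary(rows))  # (1, 9)
```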

Reproduction

Analysis started: 2022-11-17 22:36:01.257580
Analysis finished: 2022-11-17 22:36:06.082039
Duration: 4.82 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json

Variables

Unnamed: 0
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM

Distinct: 5000
Distinct (%): 10.0%
Missing: 9
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 2499.5
Minimum: 0
Maximum: 4999
Zeros: 10
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 390.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 249.95
Q1: 1249.75
Median: 2499.5
Q3: 3749.25
95th percentile: 4749.05
Maximum: 4999
Range: 4999
Interquartile range (IQR): 2499.5
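These quantiles are consistent with linear interpolation between neighbouring order statistics. The sketch below uses the standard library's `statistics.quantiles(..., method="inclusive")` and reproduces them. It assumes the 50,000 non-missing values are each integer from 0 to 4999 exactly ten times, as the value counts for Unnamed: 0 suggest, and that the 9 missing cells are dropped:

```python
import statistics

# Assumption: each integer 0..4999 occurs exactly 10 times (50000 non-missing values).
data = [v for v in range(5000) for _ in range(10)]

# method="inclusive" interpolates linearly between neighbouring order statistics.
twentieths = statistics.quantiles(data, n=20, method="inclusive")
p5, p95 = twentieths[0], twentieths[-1]
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(p5, q1, median, q3, p95)  # 249.95 1249.75 2499.5 3749.25 4749.05
print(q3 - q1)                  # 2499.5
```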

Descriptive statistics

Standard deviation: 1443.390078
Coefficient of variation (CV): 0.5774715255
Kurtosis: -1.200000095
Mean: 2499.5
Median Absolute Deviation (MAD): 1250
Skewness: 0
Sum: 124975000
Variance: 2083374.917
Monotonicity: Not monotonic
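The reported variance matches the sample (n - 1) convention. Dividing by n would give 2083333.25 for this data. Under the same assumption as the quantile sketch (each integer 0 to 4999 occurring ten times), the standard library reproduces the standard deviation, CV, variance, MAD and sum. Kurtosis and skewness are left out of this sketch:

```python
import statistics

# Assumption: each integer 0..4999 occurs exactly 10 times, as in the quantile sketch.
data = [v for v in range(5000) for _ in range(10)]

mean = statistics.fmean(data)
variance = statistics.variance(data)  # sample variance, divides by n - 1
std = statistics.stdev(data)          # square root of the sample variance
cv = std / mean                       # coefficient of variation
center = statistics.median(data)
mad = statistics.median(abs(v - center) for v in data)  # median absolute deviation

print(f"{std:.6f} {cv:.10f} {variance:.3f} {mad} {sum(data)}")
# 1443.390078 0.5774715255 2083374.917 1250.0 124975000
```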
Histogram with fixed size bins (bins=50)
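A fixed number of bins means equal-width bins. The sketch below assumes numpy-style semantics: bins span the observed minimum to maximum, each bin is half-open, and the last bin also includes the maximum. It uses the same hypothetical 0 to 4999, ten-times-each data as above. The function name is illustrative only:

```python
def equal_width_histogram(values, bins=50):
    """Count values into `bins` equal-width bins over [min, max].

    Bins are half-open [left, right) except the last, which also
    includes the maximum (the convention numpy.histogram uses).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that v == hi falls into the last bin instead of past it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


data = [v for v in range(5000) for _ in range(10)]
counts = equal_width_histogram(data)
print(len(counts), counts[0], counts[-1], sum(counts))  # 50 1000 1000 50000
```

With a width of 99.98, each bin here holds exactly 100 distinct integers. This is why a uniformly distributed index produces a flat histogram.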
Common values
Value | Count | Frequency (%)
0 | 10 | < 0.1%
3331 | 10 | < 0.1%
3338 | 10 | < 0.1%
3337 | 10 | < 0.1%
3336 | 10 | < 0.1%
3335 | 10 | < 0.1%
3334 | 10 | < 0.1%
3333 | 10 | < 0.1%
3332 | 10 | < 0.1%
3330 | 10 | < 0.1%
Other values (4990) | 49900 | 99.8%

Minimum 10 values
Value | Count | Frequency (%)
0 | 10 | < 0.1%
1 | 10 | < 0.1%
2 | 10 | < 0.1%
3 | 10 | < 0.1%
4 | 10 | < 0.1%
5 | 10 | < 0.1%
6 | 10 | < 0.1%
7 | 10 | < 0.1%
8 | 10 | < 0.1%
9 | 10 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
4999 | 10 | < 0.1%
4998 | 10 | < 0.1%
4997 | 10 | < 0.1%
4996 | 10 | < 0.1%
4995 | 10 | < 0.1%
4994 | 10 | < 0.1%
4993 | 10 | < 0.1%
4992 | 10 | < 0.1%
4991 | 10 | < 0.1%
4990 | 10 | < 0.1%

period
Categorical

HIGH CORRELATION

Distinct: 32
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 390.8 KiB
Value | Count
2022-10-28 | 1700
2022-10-15 | 1700
2022-10-17 | 1700
2022-10-18 | 1700
2022-10-19 | 1700
Other values (27) | 41509

Length

Max length: 10
Median length: 10
Mean length: 9.99928013
Min length: 6
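The mean length is consistent with 50,000 ten-character dates plus 9 six-character strings "period". The lowercase-letter counts further down point to those strings: p, e, r, i, o and d each occur 9 times. A quick arithmetic check, assuming exactly those two kinds of values:

```python
# Assumption: 50000 ten-character dates such as "2022-11-12" plus 9 literal "period" strings.
values = ["2022-11-12"] * 50000 + ["period"] * 9

total_chars = sum(len(v) for v in values)
mean_length = total_chars / len(values)
print(total_chars, f"{mean_length:.8f}")  # 500054 9.99928013
```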

Characters and Unicode

Total characters: 500054
Distinct characters: 17
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
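Python's standard `unicodedata` module exposes the same per-code-point general categories, so the category tallies can be reproduced directly. The sketch below also checks membership in the Basic Latin (ASCII) block by code-point range. The two-letter-code-to-label mapping is written out by hand and covers only the categories that appear in this report. The standard library has no lookup for the Unicode script property (Latin, Common), so scripts are not shown:

```python
import unicodedata
from collections import Counter

# Hand-written labels for the general-category codes that appear in this report.
LABELS = {
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
}


def category_counts(text):
    """Count the characters of `text` by Unicode general category."""
    codes = (unicodedata.category(ch) for ch in text)
    return Counter(LABELS.get(code, code) for code in codes)


def in_ascii_block(text):
    """True if every code point lies in the Basic Latin (ASCII) block, U+0000 to U+007F."""
    return all(ord(ch) <= 0x7F for ch in text)


print(category_counts("2022-11-12"))  # Counter({'Decimal Number': 8, 'Dash Punctuation': 2})
print(in_ascii_block("Florida Power & Light Co."))  # True
```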

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022-11-12
2nd row: 2022-11-12
3rd row: 2022-11-12
4th row: 2022-11-12
5th row: 2022-11-12

Common Values

Value | Count | Frequency (%)
2022-10-28 | 1700 | 3.4%
2022-10-15 | 1700 | 3.4%
2022-10-17 | 1700 | 3.4%
2022-10-18 | 1700 | 3.4%
2022-10-19 | 1700 | 3.4%
2022-10-20 | 1700 | 3.4%
2022-10-21 | 1700 | 3.4%
2022-10-22 | 1700 | 3.4%
2022-10-23 | 1700 | 3.4%
2022-10-24 | 1700 | 3.4%
Other values (22) | 33009 | 66.0%

Length

Histogram of lengths of the category
Words
Value | Count | Frequency (%)
2022-10-28 | 1700 | 3.4%
2022-10-26 | 1700 | 3.4%
2022-11-09 | 1700 | 3.4%
2022-10-14 | 1700 | 3.4%
2022-11-03 | 1700 | 3.4%
2022-11-02 | 1700 | 3.4%
2022-10-15 | 1700 | 3.4%
2022-10-31 | 1700 | 3.4%
2022-10-30 | 1700 | 3.4%
2022-10-29 | 1700 | 3.4%
Other values (22) | 33009 | 66.0%

Most occurring characters

Value | Count | Frequency (%)
2 | 171353 | 34.3%
0 | 100862 | 20.2%
- | 100000 | 20.0%
1 | 90468 | 18.1%
3 | 7021 | 1.4%
9 | 5100 | 1.0%
8 | 5081 | 1.0%
4 | 5041 | 1.0%
6 | 5032 | 1.0%
7 | 5029 | 1.0%
Other values (7) | 5067 | 1.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 400000 | 80.0%
Dash Punctuation | 100000 | 20.0%
Lowercase Letter | 54 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 171353 | 42.8%
0 | 100862 | 25.2%
1 | 90468 | 22.6%
3 | 7021 | 1.8%
9 | 5100 | 1.3%
8 | 5081 | 1.3%
4 | 5041 | 1.3%
6 | 5032 | 1.3%
7 | 5029 | 1.3%
5 | 5013 | 1.3%
Lowercase Letter
Value | Count | Frequency (%)
p | 9 | 16.7%
e | 9 | 16.7%
r | 9 | 16.7%
i | 9 | 16.7%
o | 9 | 16.7%
d | 9 | 16.7%
Dash Punctuation
Value | Count | Frequency (%)
- | 100000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 500000 | > 99.9%
Latin | 54 | < 0.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
2 | 171353 | 34.3%
0 | 100862 | 20.2%
- | 100000 | 20.0%
1 | 90468 | 18.1%
3 | 7021 | 1.4%
9 | 5100 | 1.0%
8 | 5081 | 1.0%
4 | 5041 | 1.0%
6 | 5032 | 1.0%
7 | 5029 | 1.0%
Latin
Value | Count | Frequency (%)
p | 9 | 16.7%
e | 9 | 16.7%
r | 9 | 16.7%
i | 9 | 16.7%
o | 9 | 16.7%
d | 9 | 16.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 500054 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 171353 | 34.3%
0 | 100862 | 20.2%
- | 100000 | 20.0%
1 | 90468 | 18.1%
3 | 7021 | 1.4%
9 | 5100 | 1.0%
8 | 5081 | 1.0%
4 | 5041 | 1.0%
6 | 5032 | 1.0%
7 | 5029 | 1.0%
Other values (7) | 5067 | 1.0%

fromba
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 77
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 390.8 KiB
Value | Count
BPAT | 2856
CISO | 1737
AZPS | 1390
SWPP | 1386
WALC | 1362
Other values (72) | 41278

Length

Max length: 6
Median length: 4
Mean length: 3.577436062
Min length: 2

Characters and Unicode

Total characters: 178904
Distinct characters: 33
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PACE
2nd row: SPA
3rd row: BPAT
4th row: CISO
5th row: TEN

Common Values

Value | Count | Frequency (%)
BPAT | 2856 | 5.7%
CISO | 1737 | 3.5%
AZPS | 1390 | 2.8%
SWPP | 1386 | 2.8%
WALC | 1362 | 2.7%
SOCO | 1274 | 2.5%
DUK | 1245 | 2.5%
NWMT | 1240 | 2.5%
FPL | 1107 | 2.2%
PACW | 1098 | 2.2%
Other values (67) | 35314 | 70.6%

Length

Histogram of lengths of the category
Words
Value | Count | Frequency (%)
bpat | 2856 | 5.7%
ciso | 1737 | 3.5%
azps | 1390 | 2.8%
swpp | 1386 | 2.8%
walc | 1362 | 2.7%
soco | 1274 | 2.5%
duk | 1245 | 2.5%
nwmt | 1240 | 2.5%
fpl | 1107 | 2.2%
pacw | 1098 | 2.2%
Other values (67) | 35314 | 70.6%

Most occurring characters

Value | Count | Frequency (%)
P | 24872 | 13.9%
C | 19050 | 10.6%
A | 17638 | 9.9%
S | 14879 | 8.3%
E | 12918 | 7.2%
W | 10857 | 6.1%
I | 9144 | 5.1%
T | 9046 | 5.1%
N | 7575 | 4.2%
O | 7295 | 4.1%
Other values (23) | 45630 | 25.5%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 178286 | 99.7%
Decimal Number | 564 | 0.3%
Lowercase Letter | 54 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
P | 24872 | 14.0%
C | 19050 | 10.7%
A | 17638 | 9.9%
S | 14879 | 8.3%
E | 12918 | 7.2%
W | 10857 | 6.1%
I | 9144 | 5.1%
T | 9046 | 5.1%
N | 7575 | 4.2%
O | 7295 | 4.1%
Other values (15) | 45012 | 25.2%
Lowercase Letter
Value | Count | Frequency (%)
f | 9 | 16.7%
r | 9 | 16.7%
o | 9 | 16.7%
m | 9 | 16.7%
b | 9 | 16.7%
a | 9 | 16.7%
Decimal Number
Value | Count | Frequency (%)
4 | 282 | 50.0%
8 | 282 | 50.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 178340 | 99.7%
Common | 564 | 0.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
P | 24872 | 13.9%
C | 19050 | 10.7%
A | 17638 | 9.9%
S | 14879 | 8.3%
E | 12918 | 7.2%
W | 10857 | 6.1%
I | 9144 | 5.1%
T | 9046 | 5.1%
N | 7575 | 4.2%
O | 7295 | 4.1%
Other values (21) | 45066 | 25.3%
Common
Value | Count | Frequency (%)
4 | 282 | 50.0%
8 | 282 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 178904 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
P | 24872 | 13.9%
C | 19050 | 10.6%
A | 17638 | 9.9%
S | 14879 | 8.3%
E | 12918 | 7.2%
W | 10857 | 6.1%
I | 9144 | 5.1%
T | 9046 | 5.1%
N | 7575 | 4.2%
O | 7295 | 4.1%
Other values (23) | 45630 | 25.5%

fromba-name
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 77
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 390.8 KiB
Value | Count
Bonneville Power Administration | 2856
California Independent System Operator | 1737
Arizona Public Service Company | 1390
Southwest Power Pool | 1386
Western Area Power Administration - Desert Southwest Region | 1362
Other values (72) | 41278

Length

Max length: 66
Median length: 38
Mean length: 29.22353976
Min length: 3

Characters and Unicode

Total characters: 1461440
Distinct characters: 58
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PacifiCorp East
2nd row: Southwestern Power Administration
3rd row: Bonneville Power Administration
4th row: California Independent System Operator
5th row: Tennessee

Common Values

Value | Count | Frequency (%)
Bonneville Power Administration | 2856 | 5.7%
California Independent System Operator | 1737 | 3.5%
Arizona Public Service Company | 1390 | 2.8%
Southwest Power Pool | 1386 | 2.8%
Western Area Power Administration - Desert Southwest Region | 1362 | 2.7%
Southern Company Services, Inc. - Trans | 1274 | 2.5%
Duke Energy Carolinas | 1245 | 2.5%
NorthWestern Corporation | 1240 | 2.5%
Florida Power & Light Co. | 1107 | 2.2%
PacifiCorp West | 1098 | 2.2%
Other values (67) | 35314 | 70.6%

Length

Histogram of lengths of the category
Words
Value | Count | Frequency (%)
power | 14899 | 7.5%
company | 8437 | 4.3%
administration | 6752 | 3.4%
inc | 5924 | 3.0%
of | 5737 | 2.9%
(empty) | 5544 | 2.8%
energy | 5446 | 2.8%
public | 5169 | 2.6%
electric | 3995 | 2.0%
service | 3611 | 1.8%
Other values (129) | 132369 | 66.9%

Most occurring characters

Value | Count | Frequency (%)
(space) | 147874 | 10.1%
e | 130337 | 8.9%
o | 108872 | 7.4%
n | 106320 | 7.3%
r | 100381 | 6.9%
i | 100358 | 6.9%
t | 98916 | 6.8%
a | 78317 | 5.4%
s | 49065 | 3.4%
l | 45560 | 3.1%
Other values (48) | 495440 | 33.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1092175 | 74.7%
Uppercase Letter | 195003 | 13.3%
Space Separator | 147874 | 10.1%
Other Punctuation | 19303 | 1.3%
Dash Punctuation | 4977 | 0.3%
Decimal Number | 2108 | 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 130337 | 11.9%
o | 108872 | 10.0%
n | 106320 | 9.7%
r | 100381 | 9.2%
i | 100358 | 9.2%
t | 98916 | 9.1%
a | 78317 | 7.2%
s | 49065 | 4.5%
l | 45560 | 4.2%
c | 41611 | 3.8%
Other values (16) | 232438 | 21.3%
Uppercase Letter
Value | Count | Frequency (%)
P | 29427 | 15.1%
C | 27711 | 14.2%
S | 19150 | 9.8%
A | 18380 | 9.4%
I | 13902 | 7.1%
E | 12920 | 6.6%
D | 10888 | 5.6%
L | 8316 | 4.3%
W | 8067 | 4.1%
N | 7979 | 4.1%
Other values (13) | 38263 | 19.6%
Decimal Number
Value | Count | Frequency (%)
1 | 918 | 43.5%
2 | 626 | 29.7%
4 | 282 | 13.4%
8 | 282 | 13.4%
Other Punctuation
Value | Count | Frequency (%)
, | 9621 | 49.8%
. | 8575 | 44.4%
& | 1107 | 5.7%
Space Separator
Value | Count | Frequency (%)
(space) | 147874 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 4977 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1287178 | 88.1%
Common | 174262 | 11.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 130337 | 10.1%
o | 108872 | 8.5%
n | 106320 | 8.3%
r | 100381 | 7.8%
i | 100358 | 7.8%
t | 98916 | 7.7%
a | 78317 | 6.1%
s | 49065 | 3.8%
l | 45560 | 3.5%
c | 41611 | 3.2%
Other values (39) | 427441 | 33.2%
Common
Value | Count | Frequency (%)
(space) | 147874 | 84.9%
, | 9621 | 5.5%
. | 8575 | 4.9%
- | 4977 | 2.9%
& | 1107 | 0.6%
1 | 918 | 0.5%
2 | 626 | 0.4%
4 | 282 | 0.2%
8 | 282 | 0.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1461440 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 147874 | 10.1%
e | 130337 | 8.9%
o | 108872 | 7.4%
n | 106320 | 7.3%
r | 100381 | 6.9%
i | 100358 | 6.9%
t | 98916 | 6.8%
a | 78317 | 5.4%
s | 49065 | 3.4%
l | 45560 | 3.1%
Other values (48) | 495440 | 33.9%

toba
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 86
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 390.8 KiB
Value | Count
BPAT | 2653
CISO | 1556
AZPS | 1397
WALC | 1393
SWPP | 1303
Other values (81) | 41707

Length

Max length: 4
Median length: 4
Mean length: 3.565078286
Min length: 2

Characters and Unicode

Total characters: 178286
Distinct characters: 30
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: LDWP
2nd row: MISO
3rd row: AVRN
4th row: LDWP
5th row: CAR

Common Values

Value | Count | Frequency (%)
BPAT | 2653 | 5.3%
CISO | 1556 | 3.1%
AZPS | 1397 | 2.8%
WALC | 1393 | 2.8%
SWPP | 1303 | 2.6%
DUK | 1247 | 2.5%
SOCO | 1243 | 2.5%
FPL | 1116 | 2.2%
PACW | 1104 | 2.2%
SRP | 1077 | 2.2%
Other values (76) | 35920 | 71.8%

Length

Histogram of lengths of the category
Words
Value | Count | Frequency (%)
bpat | 2653 | 5.3%
ciso | 1556 | 3.1%
azps | 1397 | 2.8%
walc | 1393 | 2.8%
swpp | 1303 | 2.6%
duk | 1247 | 2.5%
soco | 1243 | 2.5%
fpl | 1116 | 2.2%
pacw | 1104 | 2.2%
srp | 1077 | 2.2%
Other values (76) | 35920 | 71.8%

Most occurring characters

Value | Count | Frequency (%)
P | 24658 | 13.8%
C | 19790 | 11.1%
A | 18423 | 10.3%
S | 14227 | 8.0%
E | 13126 | 7.4%
W | 10434 | 5.9%
T | 8613 | 4.8%
I | 8407 | 4.7%
N | 7557 | 4.2%
M | 7517 | 4.2%
Other values (20) | 45534 | 25.5%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 178250 | > 99.9%
Lowercase Letter | 36 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
P | 24658 | 13.8%
C | 19790 | 11.1%
A | 18423 | 10.3%
S | 14227 | 8.0%
E | 13126 | 7.4%
W | 10434 | 5.9%
T | 8613 | 4.8%
I | 8407 | 4.7%
N | 7557 | 4.2%
M | 7517 | 4.2%
Other values (16) | 45498 | 25.5%
Lowercase Letter
Value | Count | Frequency (%)
t | 9 | 25.0%
o | 9 | 25.0%
b | 9 | 25.0%
a | 9 | 25.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 178286 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
P | 24658 | 13.8%
C | 19790 | 11.1%
A | 18423 | 10.3%
S | 14227 | 8.0%
E | 13126 | 7.4%
W | 10434 | 5.9%
T | 8613 | 4.8%
I | 8407 | 4.7%
N | 7557 | 4.2%
M | 7517 | 4.2%
Other values (20) | 45534 | 25.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 178286 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
P | 24658 | 13.8%
C | 19790 | 11.1%
A | 18423 | 10.3%
S | 14227 | 8.0%
E | 13126 | 7.4%
W | 10434 | 5.9%
T | 8613 | 4.8%
I | 8407 | 4.7%
N | 7557 | 4.2%
M | 7517 | 4.2%
Other values (20) | 45534 | 25.5%

toba-name
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 86
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 390.8 KiB
Value | Count
Bonneville Power Administration | 2653
California Independent System Operator | 1556
Arizona Public Service Company | 1397
Western Area Power Administration - Desert Southwest Region | 1393
Southwest Power Pool | 1303
Other values (81) | 41707

Length

Max length: 66
Median length: 37
Mean length: 28.98570257
Min length: 3

Characters and Unicode

Total characters: 1449546
Distinct characters: 57
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Los Angeles Department of Water and Power
2nd row: Midcontinent Independent System Operator, Inc.
3rd row: Avangrid Renewables, LLC
4th row: Los Angeles Department of Water and Power
5th row: Carolinas

Common Values

Value | Count | Frequency (%)
Bonneville Power Administration | 2653 | 5.3%
California Independent System Operator | 1556 | 3.1%
Arizona Public Service Company | 1397 | 2.8%
Western Area Power Administration - Desert Southwest Region | 1393 | 2.8%
Southwest Power Pool | 1303 | 2.6%
Duke Energy Carolinas | 1247 | 2.5%
Southern Company Services, Inc. - Trans | 1243 | 2.5%
Florida Power & Light Co. | 1116 | 2.2%
PacifiCorp West | 1104 | 2.2%
Salt River Project Agricultural Improvement and Power District | 1077 | 2.2%
Other values (76) | 35920 | 71.8%

Length

Histogram of lengths of the category
Words
Value | Count | Frequency (%)
power | 14820 | 7.5%
company | 8305 | 4.2%
administration | 6472 | 3.3%
of | 5704 | 2.9%
inc | 5639 | 2.9%
(empty) | 5506 | 2.8%
energy | 5428 | 2.8%
public | 5181 | 2.6%
electric | 3886 | 2.0%
service | 3613 | 1.8%
Other values (142) | 131964 | 67.2%

Most occurring characters

Value | Count | Frequency (%)
(space) | 146509 | 10.1%
e | 127352 | 8.8%
o | 108828 | 7.5%
n | 104556 | 7.2%
r | 100542 | 6.9%
i | 99848 | 6.9%
t | 96586 | 6.7%
a | 80742 | 5.6%
s | 47864 | 3.3%
l | 44574 | 3.1%
Other values (47) | 492145 | 34.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1083197 | 74.7%
Uppercase Letter | 194159 | 13.4%
Space Separator | 146509 | 10.1%
Other Punctuation | 18861 | 1.3%
Dash Punctuation | 5251 | 0.4%
Decimal Number | 1569 | 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 127352 | 11.8%
o | 108828 | 10.0%
n | 104556 | 9.7%
r | 100542 | 9.3%
i | 99848 | 9.2%
t | 96586 | 8.9%
a | 80742 | 7.5%
s | 47864 | 4.4%
l | 44574 | 4.1%
c | 42165 | 3.9%
Other values (16) | 230140 | 21.2%
Uppercase Letter
Value | Count | Frequency (%)
P | 29367 | 15.1%
C | 28654 | 14.8%
S | 18484 | 9.5%
A | 18428 | 9.5%
E | 13248 | 6.8%
I | 13024 | 6.7%
D | 11018 | 5.7%
L | 8097 | 4.2%
W | 7974 | 4.1%
N | 7312 | 3.8%
Other values (14) | 38553 | 19.9%
Other Punctuation
Value | Count | Frequency (%)
, | 9421 | 49.9%
. | 8324 | 44.1%
& | 1116 | 5.9%
Decimal Number
Value | Count | Frequency (%)
1 | 935 | 59.6%
2 | 634 | 40.4%
Space Separator
Value | Count | Frequency (%)
(space) | 146509 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 5251 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1277356 | 88.1%
Common | 172190 | 11.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 127352 | 10.0%
o | 108828 | 8.5%
n | 104556 | 8.2%
r | 100542 | 7.9%
i | 99848 | 7.8%
t | 96586 | 7.6%
a | 80742 | 6.3%
s | 47864 | 3.7%
l | 44574 | 3.5%
c | 42165 | 3.3%
Other values (40) | 424299 | 33.2%
Common
Value | Count | Frequency (%)
(space) | 146509 | 85.1%
, | 9421 | 5.5%
. | 8324 | 4.8%
- | 5251 | 3.0%
& | 1116 | 0.6%
1 | 935 | 0.5%
2 | 634 | 0.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1449546 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 146509 | 10.1%
e | 127352 | 8.8%
o | 108828 | 7.5%
n | 104556 | 7.2%
r | 100542 | 6.9%
i | 99848 | 6.9%
t | 96586 | 6.7%
a | 80742 | 5.6%
s | 47864 | 3.3%
l | 44574 | 3.1%
Other values (47) | 492145 | 34.0%

timezone
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 390.8 KiB
Value | Count
Eastern | 10534
Central | 10142
Mountain | 9893
Arizona | 9819
Pacific | 9612

Length

Max length: 8
Median length: 7
Mean length: 7.198004359
Min length: 7

Characters and Unicode

Total characters: 359965
Distinct characters: 19
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Mountain
2nd row: Eastern
3rd row: Eastern
4th row: Eastern
5th row: Central

Common Values

Value | Count | Frequency (%)
Eastern | 10534 | 21.1%
Central | 10142 | 20.3%
Mountain | 9893 | 19.8%
Arizona | 9819 | 19.6%
Pacific | 9612 | 19.2%
timezone | 9 | < 0.1%
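Each Frequency (%) is the count divided by the total number of rows, 50009 here. That total includes the 9 rows whose value is the literal string "timezone". Small shares are displayed as "< 0.1%" only when they would otherwise round to 0.0%: 28 of 50009 appears as 0.1% in the value table, while 25 of 50009 appears as "< 0.1%". Likewise, shares that would round to 100.0% without being complete appear as "> 99.9%". The sketch below reproduces that display rule as inferred from the tables, not taken from pandas-profiling's source. The function name is hypothetical:

```python
def frequency_table(counts):
    """Return (value, count, frequency label) rows sorted by descending count."""
    n_rows = sum(counts.values())
    rows = []
    for value, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        share = count / n_rows
        if share > 0 and round(share, 3) == 0:
            label = "< 0.1%"   # would otherwise display as 0.0%
        elif share < 1 and round(share, 3) == 1:
            label = "> 99.9%"  # would otherwise display as 100.0%
        else:
            label = f"{share * 100:.1f}%"
        rows.append((value, count, label))
    return rows


# Counts copied from the timezone Common Values table above (50009 rows in total).
timezone_counts = {
    "Eastern": 10534, "Central": 10142, "Mountain": 9893,
    "Arizona": 9819, "Pacific": 9612, "timezone": 9,
}

for value, count, label in frequency_table(timezone_counts):
    print(f"{value} | {count} | {label}")
```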

Length

Histogram of lengths of the category

Category Frequency Plot

Words
Value | Count | Frequency (%)
eastern | 10534 | 21.1%
central | 10142 | 20.3%
mountain | 9893 | 19.8%
arizona | 9819 | 19.6%
pacific | 9612 | 19.2%
timezone | 9 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
n | 50290 | 14.0%
a | 50000 | 13.9%
i | 38945 | 10.8%
t | 30578 | 8.5%
r | 30495 | 8.5%
e | 20694 | 5.7%
o | 19721 | 5.5%
c | 19224 | 5.3%
E | 10534 | 2.9%
s | 10534 | 2.9%
Other values (9) | 78950 | 21.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 309965 | 86.1%
Uppercase Letter | 50000 | 13.9%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
n | 50290 | 16.2%
a | 50000 | 16.1%
i | 38945 | 12.6%
t | 30578 | 9.9%
r | 30495 | 9.8%
e | 20694 | 6.7%
o | 19721 | 6.4%
c | 19224 | 6.2%
s | 10534 | 3.4%
l | 10142 | 3.3%
Other values (4) | 29342 | 9.5%
Uppercase Letter
Value | Count | Frequency (%)
E | 10534 | 21.1%
C | 10142 | 20.3%
M | 9893 | 19.8%
A | 9819 | 19.6%
P | 9612 | 19.2%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 359965 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
n | 50290 | 14.0%
a | 50000 | 13.9%
i | 38945 | 10.8%
t | 30578 | 8.5%
r | 30495 | 8.5%
e | 20694 | 5.7%
o | 19721 | 5.5%
c | 19224 | 5.3%
E | 10534 | 2.9%
s | 10534 | 2.9%
Other values (9) | 78950 | 21.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 359965 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
n | 50290 | 14.0%
a | 50000 | 13.9%
i | 38945 | 10.8%
t | 30578 | 8.5%
r | 30495 | 8.5%
e | 20694 | 5.7%
o | 19721 | 5.5%
c | 19224 | 5.3%
E | 10534 | 2.9%
s | 10534 | 2.9%
Other values (9) | 78950 | 21.9%

value
Categorical

HIGH CARDINALITY

Distinct: 21249
Distinct (%): 42.5%
Missing: 0
Missing (%): 0.0%
Memory size: 390.8 KiB
Value | Count
0 | 994
75 | 28
-75 | 25
-54 | 23
1178 | 21
Other values (21244) | 48918

Length

Max length: 7
Median length: 6
Mean length: 4.502089624
Min length: 1

Characters and Unicode

Total characters: 225145
Distinct characters: 16
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9270
Unique (%): 18.5%

Sample

1st row: -1224
2nd row: 6259
3rd row: -143
4th row: 826
5th row: 538

Common Values

Value | Count | Frequency (%)
0 | 994 | 2.0%
75 | 28 | 0.1%
-75 | 25 | < 0.1%
-54 | 23 | < 0.1%
1178 | 21 | < 0.1%
252 | 21 | < 0.1%
-388 | 21 | < 0.1%
-376 | 21 | < 0.1%
73 | 21 | < 0.1%
54 | 20 | < 0.1%
Other values (21239) | 48814 | 97.6%

Length

Histogram of lengths of the category
Words
Value | Count | Frequency (%)
0 | 994 | 2.0%
75 | 53 | 0.1%
54 | 43 | 0.1%
73 | 40 | 0.1%
258 | 39 | 0.1%
252 | 37 | 0.1%
74 | 36 | 0.1%
317 | 35 | 0.1%
376 | 32 | 0.1%
388 | 29 | 0.1%
Other values (14244) | 48671 | 97.3%
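The word counts differ from the value counts. The word 75 occurs 53 times against 28 as a value, which equals 28 + 25 once the 25 occurrences of -75 are folded in. That pattern fits lowercasing each value, splitting on whitespace and stripping leading and trailing punctuation from each token. The sketch below follows that assumption, which is inferred from the counts rather than taken from pandas-profiling's documented behavior. The function name is hypothetical:

```python
import string
from collections import Counter


def word_counts(values):
    """Lowercase each value, split on whitespace, strip surrounding punctuation, count."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            counts[token.strip(string.punctuation)] += 1
    return counts


# Hypothetical input mirroring two rows of the Common Values table: 75 (x28) and -75 (x25).
words = word_counts(["75"] * 28 + ["-75"] * 25)
print(words["75"])  # 53

# Internal hyphens survive, but a standalone "-" or "&" token is stripped to "".
print(word_counts(["Western Area Power Administration - Desert Southwest Region"])[""])  # 1
```

Under the same assumption, standalone "-" and "&" tokens become empty strings. That would account for the rows with an empty word in the fromba-name and toba-name word tables.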

Most occurring characters

Value | Count | Frequency (%)
1 | 31335 | 13.9%
2 | 25694 | 11.4%
- | 24941 | 11.1%
3 | 21555 | 9.6%
4 | 19380 | 8.6%
5 | 18309 | 8.1%
6 | 17448 | 7.7%
0 | 17150 | 7.6%
7 | 16984 | 7.5%
8 | 16613 | 7.4%
Other values (6) | 15736 | 7.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 200159 | 88.9%
Dash Punctuation | 24941 | 11.1%
Lowercase Letter | 45 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 31335 | 15.7%
2 | 25694 | 12.8%
3 | 21555 | 10.8%
4 | 19380 | 9.7%
5 | 18309 | 9.1%
6 | 17448 | 8.7%
0 | 17150 | 8.6%
7 | 16984 | 8.5%
8 | 16613 | 8.3%
9 | 15691 | 7.8%
Lowercase Letter
Value | Count | Frequency (%)
v | 9 | 20.0%
a | 9 | 20.0%
l | 9 | 20.0%
u | 9 | 20.0%
e | 9 | 20.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 24941 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 225100 | > 99.9%
Latin | 45 | < 0.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 31335 | 13.9%
2 | 25694 | 11.4%
- | 24941 | 11.1%
3 | 21555 | 9.6%
4 | 19380 | 8.6%
5 | 18309 | 8.1%
6 | 17448 | 7.8%
0 | 17150 | 7.6%
7 | 16984 | 7.5%
8 | 16613 | 7.4%
Latin
Value | Count | Frequency (%)
v | 9 | 20.0%
a | 9 | 20.0%
l | 9 | 20.0%
u | 9 | 20.0%
e | 9 | 20.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 225145 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 31335 | 13.9%
2 | 25694 | 11.4%
- | 24941 | 11.1%
3 | 21555 | 9.6%
4 | 19380 | 8.6%
5 | 18309 | 8.1%
6 | 17448 | 7.7%
0 | 17150 | 7.6%
7 | 16984 | 7.5%
8 | 16613 | 7.4%
Other values (6) | 15736 | 7.0%

value-units
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 390.8 KiB
Value | Count
megawatthours | 50000
value-units | 9

Length

Max length: 13
Median length: 13
Mean length: 12.99964006
Min length: 11

Characters and Unicode

Total characters: 650099
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: megawatthours
2nd row: megawatthours
3rd row: megawatthours
4th row: megawatthours
5th row: megawatthours

Common Values

Value | Count | Frequency (%)
megawatthours | 50000 | > 99.9%
value-units | 9 | < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Words
Value | Count | Frequency (%)
megawatthours | 50000 | > 99.9%
value-units | 9 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
a | 100009 | 15.4%
t | 100009 | 15.4%
u | 50018 | 7.7%
e | 50009 | 7.7%
s | 50009 | 7.7%
m | 50000 | 7.7%
g | 50000 | 7.7%
w | 50000 | 7.7%
h | 50000 | 7.7%
o | 50000 | 7.7%
Other values (6) | 50045 | 7.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 650090 | > 99.9%
Dash Punctuation | 9 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 100009 | 15.4%
t | 100009 | 15.4%
u | 50018 | 7.7%
e | 50009 | 7.7%
s | 50009 | 7.7%
m | 50000 | 7.7%
g | 50000 | 7.7%
w | 50000 | 7.7%
h | 50000 | 7.7%
o | 50000 | 7.7%
Other values (5) | 50036 | 7.7%
Dash Punctuation
Value | Count | Frequency (%)
- | 9 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 650090 | > 99.9%
Common | 9 | < 0.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 100009 | 15.4%
t | 100009 | 15.4%
u | 50018 | 7.7%
e | 50009 | 7.7%
s | 50009 | 7.7%
m | 50000 | 7.7%
g | 50000 | 7.7%
w | 50000 | 7.7%
h | 50000 | 7.7%
o | 50000 | 7.7%
Other values (5) | 50036 | 7.7%
Common
Value | Count | Frequency (%)
- | 9 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 650099 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 100009 | 15.4%
t | 100009 | 15.4%
u | 50018 | 7.7%
e | 50009 | 7.7%
s | 50009 | 7.7%
m | 50000 | 7.7%
g | 50000 | 7.7%
w | 50000 | 7.7%
h | 50000 | 7.7%
o | 50000 | 7.7%
Other values (6) | 50045 | 7.7%

Interactions


Correlations


Auto

The Auto setting picks an easily interpretable pairwise metric for each pair of columns based on their variable types: categorical-categorical pairs use Cramér's V, numerical-categorical pairs use Cramér's V on a discretized version of the numerical column, and numerical-numerical pairs use Spearman's ρ. In this way each pair of columns is assessed with the most suitable method.

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
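A minimal standard-library sketch of that definition follows. It assigns ranks, with tied values sharing their average rank, then divides the covariance of the ranks by the product of their standard deviations. The function names are illustrative and not part of pandas-profiling:

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Advance `end` to the last position of a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the ranks divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)  # the 1/n normalising factors cancel


x = [1, 2, 3, 4, 5]
print(spearman_rho(x, [v ** 3 for v in x]))  # 1.0 (nonlinear but monotonic)
print(spearman_rho(x, [-v for v in x]))      # -1.0
print(average_ranks([10, 20, 20, 30]))       # [1.0, 2.5, 2.5, 4.0]
```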

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (a negative scale factor only flips its sign), so for an exact linear relationship the steepness of the line does not affect r; only the sign of the slope matters.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
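A minimal sketch of that formula follows. It also checks the invariance under separate location and scale changes described above. The function name and sample numbers are made up for illustration:

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # The 1/n (or 1/(n - 1)) factors cancel, so plain sums suffice.
    return sxy / math.sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]  # illustrative, roughly linear data

r = pearson_r(x, y)
r_moved = pearson_r([10 * a - 3 for a in x], [0.5 * b + 7 for b in y])  # shift + positive scale
r_flipped = pearson_r([-a for a in x], y)                                # negative scale
print(r, r_moved, r_flipped)
print(pearson_r(x, [3 * a + 1 for a in x]))  # 1.0: exact line, slope irrelevant
```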

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
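A direct sketch of that pair-counting definition follows. This is the tau-a form. Libraries such as SciPy default to tau-b, which adjusts the denominator for ties. The two agree when there are no ties. The function name is illustrative:

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, i.e. Kendall's tau-a (no tie correction)."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1   # both variables move in the same direction
        elif s < 0:
            discordant += 1   # they move in opposite directions
        # s == 0 means a tie in x or y; such pairs count as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.6666666666666666 (5 concordant, 1 discordant)
```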

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The plain empirical estimator of Cramér's V has been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma in 2013.
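The sketch below computes a bias-corrected Cramér's V and assumes the commonly cited form of Bergsma's correction. The corrected φ² is max(0, φ² - (k-1)(r-1)/(n-1)). The table dimensions shrink to r - (r-1)²/(n-1) and k - (k-1)²/(n-1), and V is the square root of the corrected φ² divided by the smaller shrunken dimension minus one. pandas-profiling's own implementation may differ in details such as missing-value handling. The function name is illustrative:

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equal-length sequences of category labels."""
    n = len(x)
    joint = Counter(zip(x, y))  # contingency table cells
    row_tot = Counter(x)        # row margins
    col_tot = Counter(y)        # column margins
    r, k = len(row_tot), len(col_tot)

    # Pearson chi-squared statistic of the r x k contingency table.
    chi2 = 0.0
    for a, n_a in row_tot.items():
        for b, n_b in col_tot.items():
            expected = n_a * n_b / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected

    # Bias corrections for phi^2 and for the table dimensions.
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


same = cramers_v_corrected(["a", "b"] * 50, ["p", "q"] * 50)             # perfect association
indep = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["p", "q"] * 50)  # independent labels
print(round(same, 6), indep)  # 1.0 0.0
```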

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation for Phik is available.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

(index) | Unnamed: 0 | period | fromba | fromba-name | toba | toba-name | timezone | value | value-units
0 | 0.0 | 2022-11-12 | PACE | PacifiCorp East | LDWP | Los Angeles Department of Water and Power | Mountain | -1224 | megawatthours
1 | 1.0 | 2022-11-12 | SPA | Southwestern Power Administration | MISO | Midcontinent Independent System Operator, Inc. | Eastern | 6259 | megawatthours
2 | 2.0 | 2022-11-12 | BPAT | Bonneville Power Administration | AVRN | Avangrid Renewables, LLC | Eastern | -143 | megawatthours
3 | 3.0 | 2022-11-12 | CISO | California Independent System Operator | LDWP | Los Angeles Department of Water and Power | Eastern | 826 | megawatthours
4 | 4.0 | 2022-11-12 | TEN | Tennessee | CAR | Carolinas | Central | 538 | megawatthours
5 | 5.0 | 2022-11-12 | PSEI | Puget Sound Energy, Inc. | TPWR | City of Tacoma, Department of Public Utilities, Light Division | Arizona | -1313 | megawatthours
6 | 6.0 | 2022-11-12 | TEC | Tampa Electric Company | SEC | Seminole Electric Cooperative | Eastern | -3341 | megawatthours
7 | 7.0 | 2022-11-12 | IID | Imperial Irrigation District | WALC | Western Area Power Administration - Desert Southwest Region | Central | 423 | megawatthours
8 | 8.0 | 2022-11-12 | PACE | PacifiCorp East | NWMT | NorthWestern Corporation | Arizona | 5824 | megawatthours
9 | 9.0 | 2022-11-12 | TIDC | Turlock Irrigation District | CISO | California Independent System Operator | Arizona | -7262 | megawatthours

Last rows

(index) | Unnamed: 0 | period | fromba | fromba-name | toba | toba-name | timezone | value | value-units
49999 | 4990.0 | 2022-10-13 | SC | South Carolina Public Service Authority | SOCO | Southern Company Services, Inc. - Trans | Arizona | -2615 | megawatthours
50000 | 4991.0 | 2022-10-13 | TEC | Tampa Electric Company | FPC | Duke Energy Florida, Inc. | Arizona | -9362 | megawatthours
50001 | 4992.0 | 2022-10-13 | FMPP | Florida Municipal Power Pool | FPC | Duke Energy Florida, Inc. | Arizona | 10975 | megawatthours
50002 | 4993.0 | 2022-10-13 | SOCO | Southern Company Services, Inc. - Trans | FPC | Duke Energy Florida, Inc. | Arizona | 61 | megawatthours
50003 | 4994.0 | 2022-10-13 | SWPP | Southwest Power Pool | SPA | Southwestern Power Administration | Pacific | 250 | megawatthours
50004 | 4995.0 | 2022-10-13 | MIDA | Mid-Atlantic | MIDW | Midwest | Arizona | 84871 | megawatthours
50005 | 4996.0 | 2022-10-13 | FPL | Florida Power & Light Co. | TEC | Tampa Electric Company | Pacific | -5727 | megawatthours
50006 | 4997.0 | 2022-10-13 | DUK | Duke Energy Carolinas | CPLW | Duke Energy Progress West | Pacific | 3278 | megawatthours
50007 | 4998.0 | 2022-10-13 | NY | New York | NE | New England | Pacific | 9168 | megawatthours
50008 | 4999.0 | 2022-10-13 | FPL | Florida Power & Light Co. | TEC | Tampa Electric Company | Central | -5857 | megawatthours

Duplicate rows

Most frequently occurring

(index) | Unnamed: 0 | period | fromba | fromba-name | toba | toba-name | timezone | value | value-units | # duplicates
0 | NaN | period | fromba | fromba-name | toba | toba-name | timezone | value | value-units | 9
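This duplicated row has a missing Unnamed: 0, every other cell equal to its own column name, and 9 occurrences. That is the typical signature of CSV header lines repeated inside the data, for example when several exported files are concatenated. The same 9 rows plausibly account for several other figures in this report: the 9 missing cells, the literal "period", "timezone" and "value-units" strings counted in the variable summaries, and the value column being profiled as categorical rather than numeric. The standard-library sketch below drops such rows and assumes a CSV whose first line is the header. The sample text is a shortened, hypothetical stand-in for the real file:

```python
import csv
import io

# Hypothetical, shortened CSV text: a leading index column (blank header cell)
# and one header line repeated inside the data.
raw = """,period,fromba,value,value-units
0,2022-11-12,PACE,-1224,megawatthours
1,2022-11-12,SPA,6259,megawatthours
,period,fromba,value,value-units
0,2022-10-13,SC,-2615,megawatthours
"""

reader = csv.reader(io.StringIO(raw))
header = next(reader)
rows = list(reader)

# Drop any data row that is an exact copy of the header line.
clean = [row for row in rows if row != header]
print(len(rows) - len(clean))  # 1

# With the stray header rows gone, the value column parses as integers.
values = [int(row[header.index("value")]) for row in clean]
print(values)  # [-1224, 6259, -2615]
```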