Overview

Dataset statistics

Number of variables: 10
Number of observations: 50009
Missing cells: 9
Missing cells (%): < 0.1%
Duplicate rows: 1
Duplicate rows (%): < 0.1%
Total size in memory: 3.8 MiB
Average record size in memory: 80.0 B

Variable types

Numeric: 1
Categorical: 9

Alerts

Dataset has 1 (< 0.1%) duplicate rows [Duplicates]
respondent has a high cardinality: 77 distinct values [High cardinality]
respondent-name has a high cardinality: 77 distinct values [High cardinality]
value has a high cardinality: 19112 distinct values [High cardinality]
Unnamed: 0 is highly correlated with period [High correlation]
period is highly correlated with Unnamed: 0 and 7 other fields [High correlation]
respondent is highly correlated with period and 6 other fields [High correlation]
respondent-name is highly correlated with period and 6 other fields [High correlation]
fueltype is highly correlated with period and 6 other fields [High correlation]
type-name is highly correlated with period and 6 other fields [High correlation]
timezone is highly correlated with period and 6 other fields [High correlation]
timezone-description is highly correlated with period and 6 other fields [High correlation]
value-units is highly correlated with period and 6 other fields [High correlation]
Unnamed: 0 is uniformly distributed [Uniform]

Reproduction

Analysis started: 2022-11-17 22:35:52.894493
Analysis finished: 2022-11-17 22:35:57.080879
Duration: 4.19 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json

Variables

Unnamed: 0
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM

Distinct: 5000
Distinct (%): 10.0%
Missing: 9
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 2499.5
Minimum: 0
Maximum: 4999
Zeros: 10
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 390.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 249.95
Q1: 1249.75
median: 2499.5
Q3: 3749.25
95-th percentile: 4749.05
Maximum: 4999
Range: 4999
Interquartile range (IQR): 2499.5

Descriptive statistics

Standard deviation: 1443.390078
Coefficient of variation (CV): 0.5774715255
Kurtosis: -1.200000095
Mean: 2499.5
Median Absolute Deviation (MAD): 1250
Skewness: 0
Sum: 124975000
Variance: 2083374.917
Monotonicity: Not monotonic
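
The figures above are consistent with a column in which every integer from 0 to 4999 occurs exactly ten times. The following is a minimal sketch that rebuilds such a column and recomputes the main statistics with Python's standard library. The reconstruction of `values` is an assumption inferred from the value counts in this report, not the original data, and the 9 missing cells are simply left out.

```python
import statistics

# Assumption: the non-missing values are 0..4999, each repeated 10 times,
# as the value counts in this section suggest.
values = [v for v in range(5000) for _ in range(10)]

mean = statistics.fmean(values)
median = statistics.median(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(values)      # square root of the sample variance
cv = std_dev / mean                     # coefficient of variation

print("sum:", sum(values))
print("mean:", mean, "median:", median)
print("variance:", round(variance, 3))
print("standard deviation:", round(std_dev, 6))
print("coefficient of variation:", round(cv, 10))
```

The reported variance matches the sample (n - 1) definition rather than the population one, which is why `statistics.variance` is used here instead of `statistics.pvariance`.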
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 10 | < 0.1%
3331 | 10 | < 0.1%
3338 | 10 | < 0.1%
3337 | 10 | < 0.1%
3336 | 10 | < 0.1%
3335 | 10 | < 0.1%
3334 | 10 | < 0.1%
3333 | 10 | < 0.1%
3332 | 10 | < 0.1%
3330 | 10 | < 0.1%
Other values (4990) | 49900 | 99.8%

Value | Count | Frequency (%)
0 | 10 | < 0.1%
1 | 10 | < 0.1%
2 | 10 | < 0.1%
3 | 10 | < 0.1%
4 | 10 | < 0.1%
5 | 10 | < 0.1%
6 | 10 | < 0.1%
7 | 10 | < 0.1%
8 | 10 | < 0.1%
9 | 10 | < 0.1%

Value | Count | Frequency (%)
4999 | 10 | < 0.1%
4998 | 10 | < 0.1%
4997 | 10 | < 0.1%
4996 | 10 | < 0.1%
4995 | 10 | < 0.1%
4994 | 10 | < 0.1%
4993 | 10 | < 0.1%
4992 | 10 | < 0.1%
4991 | 10 | < 0.1%
4990 | 10 | < 0.1%

period
Categorical

HIGH CORRELATION

Distinct: 28
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 390.8 KiB

2022-10-31: 1945
2022-10-22: 1945
2022-10-30: 1944
2022-10-21: 1944
2022-10-23: 1941
Other values (23): 40290

Length

Max length: 10
Median length: 10
Mean length: 9.99928013
Min length: 6

Characters and Unicode

Total characters: 500054
Distinct characters: 17
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022-11-13
2nd row: 2022-11-13
3rd row: 2022-11-13
4th row: 2022-11-13
5th row: 2022-11-13

Common Values

Value | Count | Frequency (%)
2022-10-31 | 1945 | 3.9%
2022-10-22 | 1945 | 3.9%
2022-10-30 | 1944 | 3.9%
2022-10-21 | 1944 | 3.9%
2022-10-23 | 1941 | 3.9%
2022-10-19 | 1940 | 3.9%
2022-11-10 | 1940 | 3.9%
2022-11-11 | 1940 | 3.9%
2022-10-24 | 1940 | 3.9%
2022-10-25 | 1940 | 3.9%
Other values (18) | 30590 | 61.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2022-10-311945
 
3.9%
2022-10-221945
 
3.9%
2022-10-301944
 
3.9%
2022-10-211944
 
3.9%
2022-10-231941
 
3.9%
2022-10-291940
 
3.9%
2022-11-081940
 
3.9%
2022-11-071940
 
3.9%
2022-11-051940
 
3.9%
2022-11-041940
 
3.9%
Other values (18)30590
61.2%

Most occurring characters

ValueCountFrequency (%)
2175165
35.0%
-100000
20.0%
099065
19.8%
191229
18.2%
38779
 
1.8%
95797
 
1.2%
84447
 
0.9%
43880
 
0.8%
53880
 
0.8%
73880
 
0.8%
Other values (7)3932
 
0.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number400000
80.0%
Dash Punctuation100000
 
20.0%
Lowercase Letter54
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2175165
43.8%
099065
24.8%
191229
22.8%
38779
 
2.2%
95797
 
1.4%
84447
 
1.1%
43880
 
1.0%
53880
 
1.0%
73880
 
1.0%
63878
 
1.0%
Lowercase Letter
ValueCountFrequency (%)
p9
16.7%
e9
16.7%
r9
16.7%
i9
16.7%
o9
16.7%
d9
16.7%
Dash Punctuation
ValueCountFrequency (%)
-100000
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common500000
> 99.9%
Latin54
 
< 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
2175165
35.0%
-100000
20.0%
099065
19.8%
191229
18.2%
38779
 
1.8%
95797
 
1.2%
84447
 
0.9%
43880
 
0.8%
53880
 
0.8%
73880
 
0.8%
Latin
ValueCountFrequency (%)
p9
16.7%
e9
16.7%
r9
16.7%
i9
16.7%
o9
16.7%
d9
16.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII500054
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2175165
35.0%
-100000
20.0%
099065
19.8%
191229
18.2%
38779
 
1.8%
95797
 
1.2%
84447
 
0.9%
43880
 
0.8%
53880
 
0.8%
73880
 
0.8%
Other values (7)3932
 
0.8%

respondent
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 77
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 390.8 KiB

ISNE: 1055
NY: 1050
NE: 1048
SOCO: 1045
US48: 1041
Other values (72): 44770

Length

Max length: 10
Median length: 4
Mean length: 3.443020256
Min length: 2

Characters and Unicode

Total characters: 172182
Distinct characters: 35
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PACE
2nd row: PACE
3rd row: IID
4th row: IPCO
5th row: IID

Common Values

Value | Count | Frequency (%)
ISNE | 1055 | 2.1%
NY | 1050 | 2.1%
NE | 1048 | 2.1%
SOCO | 1045 | 2.1%
US48 | 1041 | 2.1%
SWPP | 1039 | 2.1%
TEN | 1039 | 2.1%
SW | 1038 | 2.1%
MIDA | 1037 | 2.1%
NEVP | 1037 | 2.1%
Other values (67) | 39580 | 79.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
isne1055
 
2.1%
ny1050
 
2.1%
ne1048
 
2.1%
soco1045
 
2.1%
us481041
 
2.1%
swpp1039
 
2.1%
ten1039
 
2.1%
sw1038
 
2.1%
tva1037
 
2.1%
nevp1037
 
2.1%
Other values (67)39580
79.1%

Most occurring characters

ValueCountFrequency (%)
P19309
11.2%
C17280
 
10.0%
E16601
 
9.6%
S15346
 
8.9%
A14152
 
8.2%
N10870
 
6.3%
I9880
 
5.7%
W8691
 
5.0%
T7588
 
4.4%
M6901
 
4.0%
Other values (25)45564
26.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter170010
98.7%
Decimal Number2082
 
1.2%
Lowercase Letter90
 
0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
P19309
11.4%
C17280
 
10.2%
E16601
 
9.8%
S15346
 
9.0%
A14152
 
8.3%
N10870
 
6.4%
I9880
 
5.8%
W8691
 
5.1%
T7588
 
4.5%
M6901
 
4.1%
Other values (15)43392
25.5%
Lowercase Letter
ValueCountFrequency (%)
e18
20.0%
n18
20.0%
r9
10.0%
s9
10.0%
p9
10.0%
o9
10.0%
d9
10.0%
t9
10.0%
Decimal Number
ValueCountFrequency (%)
81041
50.0%
41041
50.0%

Most occurring scripts

ValueCountFrequency (%)
Latin170100
98.8%
Common2082
 
1.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
P19309
11.4%
C17280
 
10.2%
E16601
 
9.8%
S15346
 
9.0%
A14152
 
8.3%
N10870
 
6.4%
I9880
 
5.8%
W8691
 
5.1%
T7588
 
4.5%
M6901
 
4.1%
Other values (23)43482
25.6%
Common
ValueCountFrequency (%)
81041
50.0%
41041
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII172182
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
P19309
11.2%
C17280
 
10.0%
E16601
 
9.6%
S15346
 
8.9%
A14152
 
8.2%
N10870
 
6.3%
I9880
 
5.7%
W8691
 
5.0%
T7588
 
4.4%
M6901
 
4.0%
Other values (25)45564
26.5%

respondent-name
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 77
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 390.8 KiB

ISO New England: 1055
New York: 1050
New England: 1048
Southern Company Services, Inc. - Trans: 1045
United States Lower 48: 1041
Other values (72): 44770

Length

Max length: 66
Median length: 39
Mean length: 24.83426983
Min length: 3

Characters and Unicode

Total characters: 1241937
Distinct characters: 58
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PacifiCorp East
2nd row: PacifiCorp East
3rd row: Imperial Irrigation District
4th row: Idaho Power Company
5th row: Imperial Irrigation District

Common Values

Value | Count | Frequency (%)
ISO New England | 1055 | 2.1%
New York | 1050 | 2.1%
New England | 1048 | 2.1%
Southern Company Services, Inc. - Trans | 1045 | 2.1%
United States Lower 48 | 1041 | 2.1%
Southwest Power Pool | 1039 | 2.1%
Tennessee | 1039 | 2.1%
Southwest | 1038 | 2.1%
Mid-Atlantic | 1037 | 2.1%
Nevada Power Company | 1037 | 2.1%
Other values (67) | 39580 | 79.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
power9222
 
5.4%
company8145
 
4.8%
inc5764
 
3.4%
energy5458
 
3.2%
new5081
 
3.0%
of4810
 
2.8%
electric4412
 
2.6%
public3852
 
2.2%
service3334
 
1.9%
duke3134
 
1.8%
Other values (129)118059
68.9%

Most occurring characters

ValueCountFrequency (%)
121262
 
9.8%
e113206
 
9.1%
n88026
 
7.1%
o87515
 
7.0%
r82035
 
6.6%
t78090
 
6.3%
i77031
 
6.2%
a71405
 
5.7%
l43626
 
3.5%
s40955
 
3.3%
Other values (48)438786
35.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter924372
74.4%
Uppercase Letter173763
 
14.0%
Space Separator121262
 
9.8%
Other Punctuation16439
 
1.3%
Dash Punctuation3504
 
0.3%
Decimal Number2597
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e113206
12.2%
n88026
9.5%
o87515
9.5%
r82035
8.9%
t78090
 
8.4%
i77031
 
8.3%
a71405
 
7.7%
l43626
 
4.7%
s40955
 
4.4%
c38437
 
4.2%
Other values (16)204046
22.1%
Uppercase Letter
ValueCountFrequency (%)
C25006
14.4%
P21879
12.6%
S18315
10.5%
E14832
8.5%
I13711
 
7.9%
A12179
 
7.0%
N9180
 
5.3%
L8607
 
5.0%
D7825
 
4.5%
T6842
 
3.9%
Other values (13)35387
20.4%
Decimal Number
ValueCountFrequency (%)
81041
40.1%
41041
40.1%
1387
 
14.9%
2128
 
4.9%
Other Punctuation
ValueCountFrequency (%)
,8860
53.9%
.6929
42.1%
&650
 
4.0%
Space Separator
ValueCountFrequency (%)
121262
100.0%
Dash Punctuation
ValueCountFrequency (%)
-3504
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1098135
88.4%
Common143802
 
11.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e113206
 
10.3%
n88026
 
8.0%
o87515
 
8.0%
r82035
 
7.5%
t78090
 
7.1%
i77031
 
7.0%
a71405
 
6.5%
l43626
 
4.0%
s40955
 
3.7%
c38437
 
3.5%
Other values (39)377809
34.4%
Common
ValueCountFrequency (%)
121262
84.3%
,8860
 
6.2%
.6929
 
4.8%
-3504
 
2.4%
81041
 
0.7%
41041
 
0.7%
&650
 
0.5%
1387
 
0.3%
2128
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1241937
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
121262
 
9.8%
e113206
 
9.1%
n88026
 
7.1%
o87515
 
7.0%
r82035
 
6.6%
t78090
 
6.3%
i77031
 
6.2%
a71405
 
5.7%
l43626
 
3.5%
s40955
 
3.3%
Other values (48)438786
35.3%

fueltype
Categorical

HIGH CORRELATION

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 390.8 KiB

NG: 8356
WAT: 7560
SUN: 7356
OTH: 6452
COL: 6318
Other values (4): 13967

Length

Max length: 8
Median length: 3
Mean length: 2.833809914
Min length: 2

Characters and Unicode

Total characters: 141716
Distinct characters: 20
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NG
2nd row: NG
3rd row: NG
4th row: WAT
5th row: NG

Common Values

Value | Count | Frequency (%)
NG | 8356 | 16.7%
WAT | 7560 | 15.1%
SUN | 7356 | 14.7%
OTH | 6452 | 12.9%
COL | 6318 | 12.6%
WND | 5665 | 11.3%
OIL | 4159 | 8.3%
NUC | 4134 | 8.3%
fueltype | 9 | < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
ng8356
16.7%
wat7560
15.1%
sun7356
14.7%
oth6452
12.9%
col6318
12.6%
wnd5665
11.3%
oil4159
8.3%
nuc4134
8.3%
fueltype9
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
N25511
18.0%
O16929
11.9%
T14012
9.9%
W13225
9.3%
U11490
8.1%
L10477
7.4%
C10452
7.4%
G8356
 
5.9%
A7560
 
5.3%
S7356
 
5.2%
Other values (10)16348
11.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter141644
99.9%
Lowercase Letter72
 
0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N25511
18.0%
O16929
12.0%
T14012
9.9%
W13225
9.3%
U11490
8.1%
L10477
7.4%
C10452
7.4%
G8356
 
5.9%
A7560
 
5.3%
S7356
 
5.2%
Other values (3)16276
11.5%
Lowercase Letter
ValueCountFrequency (%)
e18
25.0%
f9
12.5%
u9
12.5%
l9
12.5%
t9
12.5%
y9
12.5%
p9
12.5%

Most occurring scripts

ValueCountFrequency (%)
Latin141716
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
N25511
18.0%
O16929
11.9%
T14012
9.9%
W13225
9.3%
U11490
8.1%
L10477
7.4%
C10452
7.4%
G8356
 
5.9%
A7560
 
5.3%
S7356
 
5.2%
Other values (10)16348
11.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII141716
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N25511
18.0%
O16929
11.9%
T14012
9.9%
W13225
9.3%
U11490
8.1%
L10477
7.4%
C10452
7.4%
G8356
 
5.9%
A7560
 
5.3%
S7356
 
5.2%
Other values (10)16348
11.5%

type-name
Categorical

HIGH CORRELATION

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 390.8 KiB

Natural gas: 8356
Hydro: 7560
Solar: 7356
Other: 6452
Coal: 6318
Other values (4): 13967

Length

Max length: 11
Median length: 9
Mean length: 6.261632906
Min length: 4

Characters and Unicode

Total characters: 313138
Distinct characters: 26
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Natural gas
2nd row: Natural gas
3rd row: Natural gas
4th row: Hydro
5th row: Natural gas

Common Values

Value | Count | Frequency (%)
Natural gas | 8356 | 16.7%
Hydro | 7560 | 15.1%
Solar | 7356 | 14.7%
Other | 6452 | 12.9%
Coal | 6318 | 12.6%
Wind | 5665 | 11.3%
Petroleum | 4159 | 8.3%
Nuclear | 4134 | 8.3%
type-name | 9 | < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
natural8356
14.3%
gas8356
14.3%
hydro7560
13.0%
solar7356
12.6%
other6452
11.1%
coal6318
10.8%
wind5665
9.7%
petroleum4159
7.1%
nuclear4134
7.1%
type-name9
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
a42885
13.7%
r38017
12.1%
l30323
 
9.7%
o25393
 
8.1%
t18976
 
6.1%
e18922
 
6.0%
u16649
 
5.3%
d13225
 
4.2%
N12490
 
4.0%
8356
 
2.7%
Other values (16)87902
28.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter254773
81.4%
Uppercase Letter50000
 
16.0%
Space Separator8356
 
2.7%
Dash Punctuation9
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a42885
16.8%
r38017
14.9%
l30323
11.9%
o25393
10.0%
t18976
7.4%
e18922
7.4%
u16649
 
6.5%
d13225
 
5.2%
g8356
 
3.3%
s8356
 
3.3%
Other values (7)33671
13.2%
Uppercase Letter
ValueCountFrequency (%)
N12490
25.0%
H7560
15.1%
S7356
14.7%
O6452
12.9%
C6318
12.6%
W5665
11.3%
P4159
 
8.3%
Space Separator
ValueCountFrequency (%)
8356
100.0%
Dash Punctuation
ValueCountFrequency (%)
-9
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin304773
97.3%
Common8365
 
2.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
a42885
14.1%
r38017
12.5%
l30323
 
9.9%
o25393
 
8.3%
t18976
 
6.2%
e18922
 
6.2%
u16649
 
5.5%
d13225
 
4.3%
N12490
 
4.1%
g8356
 
2.7%
Other values (14)79537
26.1%
Common
ValueCountFrequency (%)
8356
99.9%
-9
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII313138
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a42885
13.7%
r38017
12.1%
l30323
 
9.7%
o25393
 
8.1%
t18976
 
6.1%
e18922
 
6.0%
u16649
 
5.3%
d13225
 
4.2%
N12490
 
4.0%
8356
 
2.7%
Other values (16)87902
28.1%

timezone
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 390.8 KiB

Pacific: 10064
Mountain: 10020
Arizona: 10015
Eastern: 9953
Central: 9948

Length

Max length: 8
Median length: 7
Mean length: 7.200543902
Min length: 7

Characters and Unicode

Total characters: 360092
Distinct characters: 19
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Mountain
2nd row: Eastern
3rd row: Mountain
4th row: Pacific
5th row: Central

Common Values

Value | Count | Frequency (%)
Pacific | 10064 | 20.1%
Mountain | 10020 | 20.0%
Arizona | 10015 | 20.0%
Eastern | 9953 | 19.9%
Central | 9948 | 19.9%
timezone | 9 | < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
pacific10064
20.1%
mountain10020
20.0%
arizona10015
20.0%
eastern9953
19.9%
central9948
19.9%
timezone9
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
a50000
13.9%
n49965
13.9%
i40172
11.2%
t29930
 
8.3%
r29916
 
8.3%
c20128
 
5.6%
o20044
 
5.6%
e19919
 
5.5%
P10064
 
2.8%
f10064
 
2.8%
Other values (9)79890
22.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter310092
86.1%
Uppercase Letter50000
 
13.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a50000
16.1%
n49965
16.1%
i40172
13.0%
t29930
9.7%
r29916
9.6%
c20128
6.5%
o20044
6.5%
e19919
 
6.4%
f10064
 
3.2%
z10024
 
3.2%
Other values (4)29930
9.7%
Uppercase Letter
ValueCountFrequency (%)
P10064
20.1%
M10020
20.0%
A10015
20.0%
E9953
19.9%
C9948
19.9%

Most occurring scripts

ValueCountFrequency (%)
Latin360092
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a50000
13.9%
n49965
13.9%
i40172
11.2%
t29930
 
8.3%
r29916
 
8.3%
c20128
 
5.6%
o20044
 
5.6%
e19919
 
5.5%
P10064
 
2.8%
f10064
 
2.8%
Other values (9)79890
22.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII360092
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a50000
13.9%
n49965
13.9%
i40172
11.2%
t29930
 
8.3%
r29916
 
8.3%
c20128
 
5.6%
o20044
 
5.6%
e19919
 
5.5%
P10064
 
2.8%
f10064
 
2.8%
Other values (9)79890
22.2%

timezone-description
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 390.8 KiB

Pacific: 10064
Mountain: 10020
Arizona: 10015
Eastern: 9953
Central: 9948

Length

Max length: 20
Median length: 7
Mean length: 7.202703513
Min length: 7

Characters and Unicode

Total characters: 360200
Distinct characters: 22
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Mountain
2nd row: Eastern
3rd row: Mountain
4th row: Pacific
5th row: Central

Common Values

Value | Count | Frequency (%)
Pacific | 10064 | 20.1%
Mountain | 10020 | 20.0%
Arizona | 10015 | 20.0%
Eastern | 9953 | 19.9%
Central | 9948 | 19.9%
timezone-description | 9 | < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
pacific10064
20.1%
mountain10020
20.0%
arizona10015
20.0%
eastern9953
19.9%
central9948
19.9%
timezone-description9
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
a50000
13.9%
n49974
13.9%
i40190
11.2%
t29939
 
8.3%
r29925
 
8.3%
c20137
 
5.6%
o20053
 
5.6%
e19928
 
5.5%
P10064
 
2.8%
f10064
 
2.8%
Other values (12)79926
22.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter310191
86.1%
Uppercase Letter50000
 
13.9%
Dash Punctuation9
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a50000
16.1%
n49974
16.1%
i40190
13.0%
t29939
9.7%
r29925
9.6%
c20137
6.5%
o20053
6.5%
e19928
 
6.4%
f10064
 
3.2%
z10024
 
3.2%
Other values (6)29957
9.7%
Uppercase Letter
ValueCountFrequency (%)
P10064
20.1%
M10020
20.0%
A10015
20.0%
E9953
19.9%
C9948
19.9%
Dash Punctuation
ValueCountFrequency (%)
-9
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin360191
> 99.9%
Common9
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a50000
13.9%
n49974
13.9%
i40190
11.2%
t29939
 
8.3%
r29925
 
8.3%
c20137
 
5.6%
o20053
 
5.6%
e19928
 
5.5%
P10064
 
2.8%
f10064
 
2.8%
Other values (11)79917
22.2%
Common
ValueCountFrequency (%)
-9
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII360200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a50000
13.9%
n49974
13.9%
i40190
11.2%
t29939
 
8.3%
r29925
 
8.3%
c20137
 
5.6%
o20053
 
5.6%
e19928
 
5.5%
P10064
 
2.8%
f10064
 
2.8%
Other values (12)79926
22.2%

value
Categorical

HIGH CARDINALITY

Distinct: 19112
Distinct (%): 38.2%
Missing: 0
Missing (%): 0.0%
Memory size: 390.8 KiB

0: 4971
-3: 231
24: 112
-48: 108
48: 94
Other values (19107): 44493

Length

Max length: 7
Median length: 6
Mean length: 4.114399408
Min length: 1

Characters and Unicode

Total characters: 205757
Distinct characters: 16
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8656
Unique (%): 17.3%

Sample

1st row: 35238
2nd row: 35360
3rd row: 2508
4th row: 11272
5th row: 2476

Common Values

Value | Count | Frequency (%)
0 | 4971 | 9.9%
-3 | 231 | 0.5%
24 | 112 | 0.2%
-48 | 108 | 0.2%
48 | 94 | 0.2%
-2 | 84 | 0.2%
-11 | 76 | 0.2%
1 | 74 | 0.1%
4 | 71 | 0.1%
-10 | 65 | 0.1%
Other values (19102) | 44123 | 88.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
04971
 
9.9%
3272
 
0.5%
48202
 
0.4%
2132
 
0.3%
1121
 
0.2%
24112
 
0.2%
4104
 
0.2%
10104
 
0.2%
14104
 
0.2%
11103
 
0.2%
Other values (18928)43784
87.6%

Most occurring characters

ValueCountFrequency (%)
130006
14.6%
224380
11.8%
021525
10.5%
320895
10.2%
419862
9.7%
518399
8.9%
618174
8.8%
718002
8.7%
816588
8.1%
916487
8.0%
Other values (6)1439
 
0.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number204318
99.3%
Dash Punctuation1394
 
0.7%
Lowercase Letter45
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
130006
14.7%
224380
11.9%
021525
10.5%
320895
10.2%
419862
9.7%
518399
9.0%
618174
8.9%
718002
8.8%
816588
8.1%
916487
8.1%
Lowercase Letter
ValueCountFrequency (%)
v9
20.0%
a9
20.0%
l9
20.0%
u9
20.0%
e9
20.0%
Dash Punctuation
ValueCountFrequency (%)
-1394
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common205712
> 99.9%
Latin45
 
< 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
130006
14.6%
224380
11.9%
021525
10.5%
320895
10.2%
419862
9.7%
518399
8.9%
618174
8.8%
718002
8.8%
816588
8.1%
916487
8.0%
Latin
ValueCountFrequency (%)
v9
20.0%
a9
20.0%
l9
20.0%
u9
20.0%
e9
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII205757
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
130006
14.6%
224380
11.8%
021525
10.5%
320895
10.2%
419862
9.7%
518399
8.9%
618174
8.8%
718002
8.7%
816588
8.1%
916487
8.0%
Other values (6)1439
 
0.7%

value-units
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 390.8 KiB

megawatthours: 50000
value-units: 9

Length

Max length: 13
Median length: 13
Mean length: 12.99964006
Min length: 11

Characters and Unicode

Total characters: 650099
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: megawatthours
2nd row: megawatthours
3rd row: megawatthours
4th row: megawatthours
5th row: megawatthours

Common Values

Value | Count | Frequency (%)
megawatthours | 50000 | > 99.9%
value-units | 9 | < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
megawatthours50000
> 99.9%
value-units9
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
a100009
15.4%
t100009
15.4%
u50018
7.7%
e50009
7.7%
s50009
7.7%
m50000
7.7%
g50000
7.7%
w50000
7.7%
h50000
7.7%
o50000
7.7%
Other values (6)50045
7.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter650090
> 99.9%
Dash Punctuation9
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a100009
15.4%
t100009
15.4%
u50018
7.7%
e50009
7.7%
s50009
7.7%
m50000
7.7%
g50000
7.7%
w50000
7.7%
h50000
7.7%
o50000
7.7%
Other values (5)50036
7.7%
Dash Punctuation
ValueCountFrequency (%)
-9
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin650090
> 99.9%
Common9
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a100009
15.4%
t100009
15.4%
u50018
7.7%
e50009
7.7%
s50009
7.7%
m50000
7.7%
g50000
7.7%
w50000
7.7%
h50000
7.7%
o50000
7.7%
Other values (5)50036
7.7%
Common
ValueCountFrequency (%)
-9
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII650099
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a100009
15.4%
t100009
15.4%
u50018
7.7%
e50009
7.7%
s50009
7.7%
m50000
7.7%
g50000
7.7%
w50000
7.7%
h50000
7.7%
o50000
7.7%
Other values (6)50045
7.7%

Interactions


Correlations


Auto

The auto setting computes an easily interpretable pairwise metric, choosing the method from the types of the two columns (vartype-vartype: method):
categorical-categorical: Cramér's V
numerical-categorical: Cramér's V (using a discretized numerical column)
numerical-numerical: Spearman's ρ
This configuration uses the most suitable method for each pair of columns.
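
The mapping above can be sketched as a small lookup function. This is only an illustration of the rule as described here: the function name and the type labels `"numerical"` and `"categorical"` are hypothetical and are not part of the pandas-profiling API.

```python
def auto_correlation_method(type_a, type_b):
    """Pick a correlation method for a pair of column types.

    Hypothetical helper mirroring the mapping described in the text.
    """
    kinds = {type_a, type_b}
    if not kinds <= {"numerical", "categorical"}:
        raise ValueError(f"unsupported column types: {type_a!r}, {type_b!r}")
    if kinds == {"numerical"}:
        # Both columns are numerical.
        return "spearman"
    # At least one column is categorical; a numerical partner would be
    # discretized before computing Cramér's V.
    return "cramers_v"


print(auto_correlation_method("numerical", "numerical"))
print(auto_correlation_method("numerical", "categorical"))
print(auto_correlation_method("categorical", "categorical"))
```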

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
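
As a rough illustration of that recipe, the sketch below ranks both variables (tied values share the mean of their positions) and then applies the covariance-over-standard-deviations formula to the ranks. It uses only the standard library. The function names and sample data are made up for the example, and this is not the code the report itself runs.

```python
from math import sqrt


def average_ranks(xs):
    """Return 1-based ranks; tied values get the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # monotonic in x, but not linear

print("Spearman:", spearman_rho(x, y))
print("Pearson: ", pearson(x, y))
```

Because `y` increases whenever `x` does, the ranks agree perfectly and ρ comes out at (numerically) 1, while Pearson's r on the raw values stays below 1.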

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
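
A minimal sketch of that calculation, using only the standard library, also checks the invariance property by shifting and rescaling both variables with positive factors. The function name and the sample numbers are illustrative assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample covariance divided by the product of sample standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (std_x * std_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]  # made-up, roughly linear in x

r = pearson_r(x, y)

# Change location and scale of each variable separately (positive scale factors).
x_moved = [10 * v + 3 for v in x]
y_moved = [0.5 * v - 7 for v in y]
r_moved = pearson_r(x_moved, y_moved)

print("r on original data:   ", r)
print("r on transformed data:", r_moved)
```

The two printed values agree up to floating-point rounding. A negative scale factor on one variable would flip the sign of r.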

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
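
The pair-counting definition can be written out directly. The sketch below implements exactly the formula stated here (often called tau-a) with the standard library. Library implementations commonly use a tie-corrected variant instead, so results may differ when the data contain ties. The function name and data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts toward neither
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]

# Of the 10 pairs, 7 are concordant and 3 are discordant.
print(kendall_tau_a(x, y))
```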

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
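
For orientation, here is a standard-library sketch of a bias-corrected Cramér's V in the form the Bergsma correction is usually stated: the chi-squared statistic is divided by n, a bias term (k-1)(r-1)/(n-1) is subtracted, and the table dimensions are shrunk correspondingly. The function name and sample data are assumptions for illustration. This is not taken from the pandas-profiling source and may differ from it in detail.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, n_a in row_totals.items():
        for b, n_b in col_totals.items():
            expected = n_a * n_b / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0  # a constant column carries no association
    return sqrt(phi2_corr / denom)


# Codes and names that map one-to-one, as fueltype and type-name do in this report.
codes = ["NG", "WAT", "SUN"] * 100
names = [{"NG": "Natural gas", "WAT": "Hydro", "SUN": "Solar"}[c] for c in codes]
print(cramers_v_corrected(codes, names))

# Two labels that vary independently of each other.
left = ["a", "a", "b", "b"] * 25
right = ["x", "y", "x", "y"] * 25
print(cramers_v_corrected(left, right))
```

A one-to-one mapping such as code to name gives a value of 1 (up to rounding), which is one reason paired code/description columns show up as highly correlated in the alerts.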

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

Index | Unnamed: 0 | period | respondent | respondent-name | fueltype | type-name | timezone | timezone-description | value | value-units
0 | 0.0 | 2022-11-13 | PACE | PacifiCorp East | NG | Natural gas | Mountain | Mountain | 35238 | megawatthours
1 | 1.0 | 2022-11-13 | PACE | PacifiCorp East | NG | Natural gas | Eastern | Eastern | 35360 | megawatthours
2 | 2.0 | 2022-11-13 | IID | Imperial Irrigation District | NG | Natural gas | Mountain | Mountain | 2508 | megawatthours
3 | 3.0 | 2022-11-13 | IPCO | Idaho Power Company | WAT | Hydro | Pacific | Pacific | 11272 | megawatthours
4 | 4.0 | 2022-11-13 | IID | Imperial Irrigation District | NG | Natural gas | Central | Central | 2476 | megawatthours
5 | 5.0 | 2022-11-13 | CPLW | Duke Energy Progress West | COL | Coal | Eastern | Eastern | 0 | megawatthours
6 | 6.0 | 2022-11-13 | SCL | Seattle City Light | WAT | Hydro | Arizona | Arizona | 12100 | megawatthours
7 | 7.0 | 2022-11-13 | BPAT | Bonneville Power Administration | WND | Wind | Pacific | Pacific | 1263 | megawatthours
8 | 8.0 | 2022-11-13 | CISO | California Independent System Operator | WAT | Hydro | Mountain | Mountain | 19652 | megawatthours
9 | 9.0 | 2022-11-13 | SC | South Carolina Public Service Authority | OIL | Petroleum | Eastern | Eastern | -2 | megawatthours

Last rows

Index | Unnamed: 0 | period | respondent | respondent-name | fueltype | type-name | timezone | timezone-description | value | value-units
49999 | 4990.0 | 2022-10-18 | JEA | JEA | SUN | Solar | Eastern | Eastern | 247 | megawatthours
50000 | 4991.0 | 2022-10-18 | AECI | Associated Electric Cooperative, Inc. | COL | Coal | Central | Central | 33982 | megawatthours
50001 | 4992.0 | 2022-10-18 | ISNE | ISO New England | OIL | Petroleum | Eastern | Eastern | 573 | megawatthours
50002 | 4993.0 | 2022-10-18 | FPL | Florida Power & Light Co. | SUN | Solar | Eastern | Eastern | 18041 | megawatthours
50003 | 4994.0 | 2022-10-18 | AZPS | Arizona Public Service Company | WND | Wind | Arizona | Arizona | 330 | megawatthours
50004 | 4995.0 | 2022-10-18 | TEN | Tennessee | OTH | Other | Central | Central | 162 | megawatthours
50005 | 4996.0 | 2022-10-18 | AZPS | Arizona Public Service Company | WAT | Hydro | Central | Central | 0 | megawatthours
50006 | 4997.0 | 2022-10-18 | NEVP | Nevada Power Company | SUN | Solar | Mountain | Mountain | 13370 | megawatthours
50007 | 4998.0 | 2022-10-18 | PACW | PacifiCorp West | WND | Wind | Eastern | Eastern | -12 | megawatthours
50008 | 4999.0 | 2022-10-18 | PSCO | Public Service Company of Colorado | NG | Natural gas | Mountain | Mountain | 43363 | megawatthours

Duplicate rows

Most frequently occurring

Index | Unnamed: 0 | period | respondent | respondent-name | fueltype | type-name | timezone | timezone-description | value | value-units | # duplicates
0 | NaN | period | respondent | respondent-name | fueltype | type-name | timezone | timezone-description | value | value-units | 9
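
The duplicated row holds the column names themselves, with NaN in Unnamed: 0. Together with the 9 missing cells and the stray categories such as "fueltype" and "value-units" that each occur 9 times, this suggests that several CSV exports were concatenated with their header lines left in. That is an inference from the report, not something it states. Under that assumption, a minimal standard-library sketch for dropping the repeated header lines before profiling could look like this. The function name and the toy input are made up for illustration.

```python
import csv
import io


def drop_repeated_headers(text):
    """Return (header, rows), skipping any later line identical to the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [row for row in reader if row and row != header]
    return header, rows


# Toy input: two chunks concatenated, each carrying its own header line.
raw = (
    ",period,respondent,fueltype,value\n"
    "0,2022-11-13,PACE,NG,35238\n"
    "1,2022-11-13,IID,NG,2508\n"
    ",period,respondent,fueltype,value\n"
    "0,2022-10-18,JEA,SUN,247\n"
)

header, rows = drop_repeated_headers(raw)
values = [int(row[4]) for row in rows]  # numeric once the header text is gone

print(header)
print(values)
```

With the header strings removed, a column like value no longer mixes text with numbers, so a profiler would likely treat it as numeric rather than categorical.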