EDA report

Mode: Single (one dataset)

Overview

Dataset Summary

Composition

Counts

Rows 1309
Columns 24
Duplicate rows 0
Memory 302.5K

Type counts

Numeric 8
Categorical 6
Text 4
Datetime 2
Boolean 1
Struct 1
List 2

Variables

Profiles

pclass

numeric
Values 1309
Missing 0 (0.0000)
Distinct 3
Zeroes 0
Stats
Min 1
Max 3
Mean 2.2949
Median 3
Std 0.8375
Variance 0.7014
IQR 1
Outlier rate 0
Quantiles
0.25: 2
0.50: 3
0.75: 3
Most frequent values (value: count)
3: 709
1: 323
2: 277
Smallest values (value: count)
1: 323
2: 277
3: 709
Largest values (value: count)
3: 709
2: 277
1: 323
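The pclass Stats can be checked against its value counts. The Python sketch below rebuilds the column from the counts listed above and recomputes the summary; expand and summarize are hypothetical helper names, not part of the reporting tool. It assumes population (ddof=0) variance, which reproduces the listed Variance 0.7014 and Std 0.8375; sample variance would give about 0.7020.

```python
from statistics import fmean, median, pstdev, pvariance

# pclass value counts, copied from the "Most frequent values" list above.
PCLASS_COUNTS = {3: 709, 1: 323, 2: 277}


def expand(counts):
    """Rebuild a flat list of observations from a {value: count} mapping."""
    return [value for value, n in sorted(counts.items()) for _ in range(n)]


def summarize(counts):
    """Recompute the Stats fields for a column given only its value counts."""
    data = expand(counts)
    return {
        "rows": len(data),
        "mean": fmean(data),
        "median": median(data),
        # Population (ddof=0) variance and standard deviation: an assumption
        # that matches the report's Variance 0.7014 and Std 0.8375.
        "variance": pvariance(data),
        "std": pstdev(data),
    }


if __name__ == "__main__":
    for field, value in summarize(PCLASS_COUNTS).items():
        print(f"{field}: {value:.4f}")
```

The same helpers can be pointed at the sibsp or parch counts; their listed variances (1.0842 and 0.7486) also match the population formula.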

survived

boolean
Values 1309
Missing 0 (0.0000)
Distinct 2
Zeroes 0
Top categories (category: count)
False: 809
True: 500

name

text
Values 1309
Missing 0 (0.0000)
Distinct 1307
Zeroes 0
Most frequent values
Connolly, Miss. Kate
Count: 2
Kelly, Mr. James
Count: 2
Abbing, Mr. Anthony
Count: 1
Abbott, Master. Eugene Joseph
Count: 1
Abbott, Mr. Rossmore Edward
Count: 1
Abbott, Mrs. Stanton (Rosa Hunt)
Count: 1
Abelseth, Miss. Karen Marie
Count: 1
Abelseth, Mr. Olaus Jorgensen
Count: 1
Abelson, Mr. Samuel
Count: 1
Abelson, Mrs. Samuel (Hannah Wizosky)
Count: 1
Text length stats
Mean 27.131
Median 25
Min 12
Max 82

sex

categorical
Values 1309
Missing 0 (0.0000)
Distinct 2
Zeroes 0
Top categories (category: count)
male: 843
female: 466

age

numeric
Values 1046
Missing 263 (0.2009)
Distinct 99
Zeroes 0
Stats
Min 0.1700
Max 80.000
Mean 29.881
Median 28.000
Std 14.407
Variance 207.55
IQR 18.000
Outlier rate 0.0086
Quantiles
0.25: 21.000
0.50: 28.000
0.75: 39.000
Most frequent values (value: count)
24.000: 47
22.000: 43
21.000: 41
30.000: 40
18.000: 39
25.000: 34
28.000: 32
36.000: 31
26.000: 30
27.000: 30
Smallest values (value: count)
0.1700: 1
0.3300: 1
0.4200: 1
0.6700: 1
0.7500: 3
Largest values (value: count)
80.000: 1
76.000: 1
74.000: 1
71.000: 2
70.500: 1

sibsp

numeric
Values 1309
Missing 0 (0.0000)
Distinct 7
Zeroes 891
Stats
Min 0
Max 8
Mean 0.4989
Median 0
Std 1.0413
Variance 1.0842
IQR 1
Outlier rate 0.0435
Quantiles
0.25: 0
0.50: 0
0.75: 1
Most frequent values (value: count)
0: 891
1: 319
2: 42
4: 22
3: 20
8: 9
5: 6
Smallest values (value: count)
0: 891
1: 319
2: 42
3: 20
4: 22
Largest values (value: count)
8: 9
5: 6
4: 22
3: 20
2: 42
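The report does not say how Outlier rate is defined. A common choice is Tukey's rule: flag values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. The Python sketch below applies that rule, with linearly interpolated quartiles, to the sibsp counts above and reproduces the listed 0.0435 (57 of 1309 rows: the 3s, 4s, 5s and 8s). Treat the rule as an assumption: applied to parch, where Q1 = Q3 = 0, it would flag all 307 non-zero rows, yet the report lists 0 there, so the tool may handle a zero IQR differently. expand and iqr_outlier_rate are hypothetical helper names.

```python
from statistics import quantiles

# sibsp value counts, copied from the profile above.
SIBSP_COUNTS = {0: 891, 1: 319, 2: 42, 3: 20, 4: 22, 5: 6, 8: 9}


def expand(counts):
    """Rebuild a flat list of observations from a {value: count} mapping."""
    return [value for value, n in sorted(counts.items()) for _ in range(n)]


def iqr_outlier_rate(data, k=1.5):
    """Share of values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].

    method="inclusive" interpolates linearly between order statistics,
    the same convention as NumPy's default percentile method.
    """
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = sum(1 for x in data if x < low or x > high)
    return q1, q3, outliers / len(data)


if __name__ == "__main__":
    q1, q3, rate = iqr_outlier_rate(expand(SIBSP_COUNTS))
    print(f"Q1={q1} Q3={q3} outlier rate={rate:.4f}")
```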

parch

numeric
Values 1309
Missing 0 (0.0000)
Distinct 8
Zeroes 1002
Stats
Min 0
Max 9
Mean 0.3850
Median 0
Std 0.8652
Variance 0.7486
IQR 0
Outlier rate 0
Quantiles
0.25: 0
0.50: 0
0.75: 0
Most frequent values (value: count)
0: 1002
1: 170
2: 113
3: 8
4: 6
5: 6
6: 2
9: 2
Smallest values (value: count)
0: 1002
1: 170
2: 113
3: 8
4: 6
Largest values (value: count)
9: 2
6: 2
5: 6
4: 6
3: 8

ticket

text
Values 1309
Missing 0 (0.0000)
Distinct 929
Zeroes 0
Most frequent values
CA. 2343
Count: 11
1601
Count: 8
CA 2144
Count: 8
3101295
Count: 7
347077
Count: 7
347082
Count: 7
PC 17608
Count: 7
S.O.C. 14879
Count: 7
113781
Count: 6
19950
Count: 6
Text length stats
Mean 6.7907
Median 6
Min 3
Max 18

fare

numeric
Values 1308
Missing 1 (0.0008)
Distinct 282
Zeroes 17
Stats
Min 0.0000
Max 512.33
Mean 33.295
Median 14.454
Std 51.739
Variance 2676.9
IQR 23.379
Outlier rate 0.1307
Quantiles
0.25: 7.8958
0.50: 14.454
0.75: 31.275
Most frequent values (value: count)
8.0500: 60
13.000: 59
7.7500: 55
26.000: 50
7.8958: 49
10.500: 35
7.7750: 26
7.2292: 24
7.9250: 23
26.550: 22
Smallest values (value: count)
0.0000: 17
3.1708: 1
4.0125: 1
5.0000: 1
6.2375: 1
Largest values (value: count)
512.33: 4
263.00: 6
262.38: 7
247.52: 3
227.53: 5

cabin

text
Values 295
Missing 1014 (0.7746)
Distinct 187
Zeroes 0
Most frequent values
C23 C25 C27
Count: 6
B57 B59 B63 B66
Count: 5
G6
Count: 5
B96 B98
Count: 4
C22 C26
Count: 4
C78
Count: 4
D
Count: 4
F2
Count: 4
F33
Count: 4
F4
Count: 4
Text length stats
Mean 3.7390
Median 3
Min 1
Max 15

embarked

categorical
Values 1307
Missing 2 (0.0015)
Distinct 4
Zeroes 0
Top categories (category: count)
S: 914
C: 270
Q: 123

boat

categorical
Values 486
Missing 823 (0.6287)
Distinct 28
Zeroes 0
Top categories (category: count)
13: 39
C: 38
15: 37
14: 33
4: 31
10: 29
5: 27
3: 26
11: 25
9: 25

body

numeric
Values 121
Missing 1188 (0.9076)
Distinct 122 (one more than Values; missing appears to be counted as a distinct value)
Zeroes 0
Stats
Min 1
Max 328
Mean 160.81
Median 155
Std 97.292
Variance 9465.8
IQR 184
Outlier rate 0
Quantiles
0.25: 72
0.50: 155
0.75: 256
Most frequent values (value: count)
1: 1
4: 1
7: 1
9: 1
14: 1
15: 1
16: 1
17: 1
18: 1
19: 1
Smallest values (value: count)
1: 1
4: 1
7: 1
9: 1
14: 1
Largest values (value: count)
328: 1
327: 1
322: 1
314: 1
312: 1

home.dest

text
Values 745
Missing 564 (0.4309)
Distinct 370
Zeroes 0
Most frequent values
New York, NY
Count: 64
London
Count: 14
Montreal, PQ
Count: 10
Cornwall / Akron, OH
Count: 9
Paris, France
Count: 9
Philadelphia, PA
Count: 8
Wiltshire, England Niagara Falls, NY
Count: 8
Winnipeg, MB
Count: 8
Belfast
Count: 7
Brooklyn, NY
Count: 7
Text length stats
Mean 19.165
Median 17
Min 5
Max 50

noon_time

datetime
Values 1309
Missing 0 (0.0000)
Distinct 1
Zeroes 0
Datetime range
Min 12:00:00
Max 12:00:00
Most frequent values (value: count)
12:00:00: 1309

always_null

categorical
Values 0
Missing 1309 (1.0000)
Distinct 1
Zeroes 0
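always_null shows Values 0 but Distinct 1, body shows 121 values but 122 distinct, and embarked shows Distinct 4 for three categories, so Distinct appears to count missing as one extra value. The Python sketch below recomputes the header fields (Values, Missing with its fraction, Distinct, Zeroes) under that assumption. profile_header is a hypothetical helper, not the reporting tool's API; excluding booleans from Zeroes is also an assumption, chosen because survived lists 809 False values but Zeroes 0. It expects hashable values, so struct or list columns would need converting first.

```python
import math


def is_missing(x):
    """Treat None and float NaN as missing (assumed convention)."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def profile_header(column):
    rows = len(column)
    present = [x for x in column if not is_missing(x)]
    missing = rows - len(present)
    return {
        "values": len(present),
        "missing": missing,
        # Fraction shown in parentheses after the Missing count.
        "missing_fraction": round(missing / rows, 4) if rows else 0.0,
        # Missing entries add one extra "distinct" value, matching the report.
        "distinct": len(set(present)) + (1 if missing else 0),
        # bool is a subclass of int in Python, so exclude it explicitly.
        "zeroes": sum(
            1 for x in present
            if isinstance(x, (int, float)) and not isinstance(x, bool) and x == 0
        ),
    }


if __name__ == "__main__":
    print(profile_header([None] * 1309))
```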

birthdate

datetime
Values 1046
Missing 263 (0.2009)
Distinct 99
Zeroes 0
Datetime range
Min 1832-04-13
Max 1912-02-11
Most frequent values (value: count)
1888-04-13: 47
1890-04-13: 43
1891-04-13: 41
1882-04-13: 40
1894-04-13: 39
1887-04-13: 34
1884-04-13: 32
1876-04-13: 31
1883-04-13: 30
1885-04-13: 30

age_duration

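numeric
Values 1309
Missing 0 (0.0000)
Distinct 73
Zeroes 275
Note: age has 263 missing values while age_duration has none; those rows appear to be stored as 0 here, which would account for most of the 275 zeroes.
Stats
Min 0.0000
Max 6912.0
Mean 2061.4
Median 2073.6
Std 1519.5
Variance 2308805
IQR 2419.2
Outlier rate 0.0008
Quantiles
0.25: 604.80
0.50: 2073.6
0.75: 3024.0
Most frequent values (value: count)
0.0000: 275
2073.6: 48
1900.8: 44
1555.2: 42
2592.0: 42
1814.4: 41
2419.2: 35
2160.0: 34
3110.4: 33
2246.4: 31
Smallest values (value: count)
0.0000: 275
86.400: 10
172.80: 12
259.20: 7
345.60: 10
Largest values (value: count)
6912.0: 1
6566.4: 1
6393.6: 1
6134.4: 2
6048.0: 3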

name_binary

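numeric
Values 1309
Missing 0 (0.0000)
Distinct 1307
Zeroes 0
Note: Min, Max, Mean and Median match the name text-length stats, so these figures appear to describe value length; Distinct 1307 matches the number of distinct names.
Stats
Min 12
Max 82
Mean 27.131
Median 25
Std 9.5029
Variance 90.305
IQR 10
Outlier rate 0.0672
Quantiles
0.25: 20
0.50: 25
0.75: 30
Most frequent values (value: count)
25: 83
19: 82
18: 75
26: 73
27: 70
24: 69
20: 66
28: 65
17: 63
21: 58
Smallest values (value: count)
12: 2
13: 4
14: 4
15: 23
16: 39
Largest values (value: count)
82: 1
67: 1
65: 1
63: 2
62: 1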

sex_enum

categorical
Values 1309
Missing 0 (0.0000)
Distinct 2
Zeroes 0
Top categories (category: count)
m: 843
f: 466

embarked_category

categorical
Values 1307
Missing 2 (0.0015)
Distinct 4
Zeroes 0
Top categories (category: count)
S: 914
C: 270
Q: 123

family_struct

struct
Values 1309
Missing 0 (0.0000)
Distinct 428
Zeroes 0
Sample values
  • {"siblings_spouses": 0, "parents_children": 0, "fare": 211.3375}
  • {"siblings_spouses": 1, "parents_children": 2, "fare": 151.55}
  • {"siblings_spouses": 1, "parents_children": 2, "fare": 151.55}
  • {"siblings_spouses": 1, "parents_children": 2, "fare": 151.55}
  • {"siblings_spouses": 1, "parents_children": 2, "fare": 151.55}

voyage_notes

list
Values 1309
Missing 0 (0.0000)
Distinct 41
Zeroes 0
List length
Min 3
Max 3
Mean 3.0000
Median 3.0000
Sample values
  • [0, 0, 1]
  • [1, 2, 1]
  • [1, 2, 0]
  • [1, 2, 0]
  • [1, 2, 0]

voyage_array

list
Values 1309
Missing 0 (0.0000)
Distinct 41
Zeroes 0
List length
Min 3
Max 3
Mean 3.0000
Median 3.0000
Sample values
  • [0, 0, 1]
  • [1, 2, 1]
  • [1, 2, 0]
  • [1, 2, 0]
  • [1, 2, 0]

Associations

Relationships

Numerical associations

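Metric: Pearson r (linear correlation, -1 to 1). Pairs are listed from highest to lowest r; 20 of the 28 possible pairs among the 8 numeric columns appear.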

age × age_duration 1.0000
sibsp × parch 0.3736
parch × name_binary 0.2308
parch × fare 0.2215
fare × age_duration 0.2150
fare × name_binary 0.1794
age × fare 0.1787
age_duration × name_binary 0.1719
sibsp × fare 0.1602
sibsp × name_binary 0.1440
age × name_binary 0.0924
pclass × sibsp 0.0608
body × age_duration 0.0593
age × body 0.0588
parch × body 0.0511
pclass × parch 0.0183
pclass × body -0.0346
parch × age_duration -0.0402
fare × body -0.0431
sibsp × body -0.1000
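For reference, a minimal Python sketch of Pearson r follows. Several numeric columns here have missing values (age, fare, body), so it assumes pairwise deletion: a row is used only when both values are present, which is how pandas DataFrame.corr treats nulls. Whether this report does the same is an assumption. An r of 1.0000, as for age × age_duration, means the points lie on (or extremely close to) an increasing straight line. pearson_r is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation using only rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Illustrative values only; the row with None is skipped.
    xs = [24.0, 22.0, None, 30.0, 18.0]
    ys = [2073.6, 1900.8, 0.0, 2592.0, 1555.2]
    print(round(pearson_r(xs, ys), 4))
```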

Categorical associations

Metric: Cramér's V (association strength, 0 to 1)

embarked × embarked_category 1.0000
sex × sex_enum 1.0000
survived × sex 0.5287
survived × sex_enum 0.5287
boat × embarked_category 0.5161
embarked × boat 0.5161
boat × sex_enum 0.4589
sex × boat 0.4589
survived × boat 0.4200
survived × embarked 0.1840
survived × embarked_category 0.1840
embarked × sex_enum 0.1224
sex × embarked 0.1224
sex × embarked_category 0.1224
sex_enum × embarked_category 0.1224
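Cramér's V is computed from a contingency table as V = sqrt(chi2 / (n × (min(r, c) - 1))), where chi2 is Pearson's chi-square statistic. The Python sketch below uses the uncorrected statistic (no Yates continuity or small-sample bias correction); the report does not say which variant it uses. The example table pairs sex with sex_enum using the counts from their profiles (male/m 843, female/f 466), assuming each label maps to exactly one code, as the matching counts and the listed V of 1.0000 suggest. cramers_v is a hypothetical helper name.

```python
import math


def cramers_v(table):
    """Cramér's V for an r x c table of counts (list of rows).

    Uses the uncorrected chi-square statistic. Assumes at least 2 rows and
    2 columns, and that every row and column total is greater than zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))


if __name__ == "__main__":
    # Rows: sex (male, female); columns: sex_enum (m, f).
    sex_by_enum = [[843, 0], [0, 466]]
    print(round(cramers_v(sex_by_enum), 4))
```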

Mixed associations

Metric: Correlation ratio η (numeric vs. categorical, 0 to 1)

pclass × boat 0.8092
fare × boat 0.5806
name_binary × sex 0.4614
name_binary × sex_enum 0.4614
age_duration × boat 0.4134
age × boat 0.4074
parch × boat 0.3547
name_binary × survived 0.3525
sibsp × boat 0.3433
pclass × embarked_category 0.3302
pclass × embarked 0.3302
name_binary × boat 0.3195
pclass × survived 0.3125
fare × embarked 0.2991
fare × embarked_category 0.2991
fare × survived 0.2443
age_duration × embarked 0.2240
age_duration × embarked_category 0.2240
parch × sex 0.2131
parch × sex_enum 0.2131
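The correlation ratio compares the means of a numeric column across the levels of a categorical one: η² is the between-group sum of squares divided by the total sum of squares, so η is 0 when every group has the same mean and 1 when all variation lies between groups. A minimal Python sketch follows; skipping rows with a missing category or value, and returning 0 for a constant numeric column, are assumptions rather than documented behavior of this report. correlation_ratio is a hypothetical helper name.

```python
import math
from collections import defaultdict


def correlation_ratio(categories, values):
    """Correlation ratio (eta) of a numeric column grouped by a categorical one.

    eta**2 = between-group sum of squares / total sum of squares.
    Rows with a missing category or value are skipped (an assumption).
    """
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        if category is not None and value is not None:
            groups[category].append(value)
    all_values = [v for group in groups.values() for v in group]
    if not all_values:
        raise ValueError("no complete rows")
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    if ss_total == 0:
        return 0.0  # constant numeric column: report 0 by convention (assumed)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups.values()
    )
    return math.sqrt(ss_between / ss_total)


if __name__ == "__main__":
    # Illustrative toy data, not taken from the report.
    print(correlation_ratio(["a", "a", "b", "b"], [1, 1, 3, 3]))
```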