Facies Labelling
Problem
Exercise: cluster industry-related data using the K-means approach
Input: dataset of wireline log measurements:
GR = Gamma ray
ILD_log10 = Resistivity logging (base-10 logarithm of the deep induction resistivity)
DeltaPHI = Neutron-density porosity difference
PHIND = Average neutron-density porosity
PE = Photoelectric effect
Challenge: assign each input data record to a rock facies
Imports
Data Extraction
| | Facies | Depth | GR | ILD_log10 | DeltaPHI | PHIND | PE | WellName | FaciesLabel | FaciesDescription |
---|---|---|---|---|---|---|---|---|---|---|
0 | 3 | 2793.0 | 77.45 | 0.664 | 9.9 | 11.915 | 4.6 | SHRIMPLIN | FSiS | Nonmarine fine siltstone |
1 | 3 | 2793.5 | 78.26 | 0.661 | 14.2 | 12.565 | 4.1 | SHRIMPLIN | FSiS | Nonmarine fine siltstone |
2 | 3 | 2794.0 | 79.05 | 0.658 | 14.8 | 13.050 | 3.6 | SHRIMPLIN | FSiS | Nonmarine fine siltstone |
3 | 3 | 2794.5 | 86.10 | 0.655 | 13.9 | 13.115 | 3.5 | SHRIMPLIN | FSiS | Nonmarine fine siltstone |
4 | 3 | 2795.0 | 74.58 | 0.647 | 13.5 | 13.300 | 3.4 | SHRIMPLIN | FSiS | Nonmarine fine siltstone |
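The loading cell is not shown in this export. As a minimal sketch, a frame like the one above can be rebuilt by hand from the five displayed rows, with `FaciesLabel` derived from the numeric `Facies` code. The `FACIES_LABELS` dictionary is a hypothetical helper name; its code-to-label pairs are the ones listed in the Data Exploration table of this notebook. With the real data, the hand-typed frame would be replaced by a file read, whose path is not given here.

```python
import pandas as pd

# Hypothetical helper: facies code -> short label, as listed in this notebook.
FACIES_LABELS = {
    1: "SS", 2: "CSiS", 3: "FSiS", 4: "SiSh", 5: "MS",
    6: "WS", 7: "D", 8: "PS", 9: "BS",
}

# The five rows displayed above, typed in so the sketch runs without a file.
df = pd.DataFrame({
    "Facies": [3, 3, 3, 3, 3],
    "Depth": [2793.0, 2793.5, 2794.0, 2794.5, 2795.0],
    "GR": [77.45, 78.26, 79.05, 86.10, 74.58],
    "ILD_log10": [0.664, 0.661, 0.658, 0.655, 0.647],
    "DeltaPHI": [9.9, 14.2, 14.8, 13.9, 13.5],
    "PHIND": [11.915, 12.565, 13.050, 13.115, 13.300],
    "PE": [4.6, 4.1, 3.6, 3.5, 3.4],
    "WellName": ["SHRIMPLIN"] * 5,
})

# Attach the readable label next to the numeric code.
df["FaciesLabel"] = df["Facies"].map(FACIES_LABELS)
print(df.head())
```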
Data Exploration
| | Facies | FaciesLabel |
---|---|---|
0 | 3 | FSiS |
1 | 2 | CSiS |
2 | 8 | PS |
3 | 6 | WS |
4 | 7 | D |
5 | 4 | SiSh |
6 | 5 | MS |
7 | 9 | BS |
8 | 1 | SS |
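A lookup of distinct code/label pairs like the one above can be obtained by dropping duplicate rows. This is a sketch on a small toy frame (illustrative values, not the real log data), and it may not be the exact call used in the original cell.

```python
import pandas as pd

# Toy frame with repeated facies codes, for illustration only.
toy = pd.DataFrame({
    "Facies": [3, 3, 2, 8, 2, 3],
    "FaciesLabel": ["FSiS", "FSiS", "CSiS", "PS", "CSiS", "FSiS"],
})

# Keep the first occurrence of each (code, label) pair and renumber the rows.
pairs = (
    toy[["Facies", "FaciesLabel"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
print(pairs)
```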
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4149 entries, 0 to 4148
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Facies 4149 non-null int64
1 Depth 4149 non-null float64
2 GR 4149 non-null float64
3 ILD_log10 4149 non-null float64
4 DeltaPHI 4149 non-null float64
5 PHIND 4149 non-null float64
6 PE 3232 non-null float64
7 WellName 4149 non-null object
8 FaciesLabel 4149 non-null object
9 FaciesDescription 4149 non-null object
dtypes: float64(6), int64(1), object(3)
memory usage: 324.3+ KB
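The summary above reports 3232 non-null PE values out of 4149 rows, so 917 PE readings are missing while every other column is complete. A quick per-column count of missing values makes this explicit; the sketch below uses a toy frame with made-up gaps rather than the real data.

```python
import numpy as np
import pandas as pd

# Toy frame: GR is complete, PE has two gaps (illustrative only).
toy = pd.DataFrame({
    "GR": [77.45, 78.26, 79.05, 86.10],
    "PE": [4.6, np.nan, 3.6, np.nan],
})

# Number of missing entries per column.
missing = toy.isna().sum()
print(missing)
```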
Preprocessing
Select Columns of interest
| | Depth | GR | ILD_log10 | DeltaPHI | PHIND | PE |
---|---|---|---|---|---|---|
0 | 2793.0 | 77.450 | 0.664 | 9.900 | 11.915 | 4.600 |
1 | 2793.5 | 78.260 | 0.661 | 14.200 | 12.565 | 4.100 |
2 | 2794.0 | 79.050 | 0.658 | 14.800 | 13.050 | 3.600 |
3 | 2794.5 | 86.100 | 0.655 | 13.900 | 13.115 | 3.500 |
4 | 2795.0 | 74.580 | 0.647 | 13.500 | 13.300 | 3.400 |
... | ... | ... | ... | ... | ... | ... |
4144 | 3120.5 | 46.719 | 0.947 | 1.828 | 7.254 | 3.617 |
4145 | 3121.0 | 44.563 | 0.953 | 2.241 | 8.013 | 3.344 |
4146 | 3121.5 | 49.719 | 0.964 | 2.925 | 8.013 | 3.190 |
4147 | 3122.0 | 51.469 | 0.965 | 3.083 | 7.708 | 3.152 |
4148 | 3122.5 | 50.031 | 0.970 | 2.609 | 6.668 | 3.295 |
4149 rows × 6 columns
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4149 entries, 0 to 4148
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 GR 4149 non-null float64
1 ILD_log10 4149 non-null float64
2 DeltaPHI 4149 non-null float64
3 PHIND 4149 non-null float64
4 PE 3232 non-null float64
dtypes: float64(5)
memory usage: 162.2 KB
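The feature frame above still contains the missing PE values, and scikit-learn's K-means does not accept NaN input, so they have to be handled before clustering. The export does not show how this was done. The sketch below selects the five log columns and shows two common options, dropping incomplete rows or filling with the column median; both are assumptions, not the notebook's confirmed method, and the toy frame has one PE value blanked out for illustration.

```python
import numpy as np
import pandas as pd

FEATURES = ["GR", "ILD_log10", "DeltaPHI", "PHIND", "PE"]

toy = pd.DataFrame({
    "Depth": [2793.0, 2793.5, 2794.0, 2794.5],
    "GR": [77.45, 78.26, 79.05, 86.10],
    "ILD_log10": [0.664, 0.661, 0.658, 0.655],
    "DeltaPHI": [9.9, 14.2, 14.8, 13.9],
    "PHIND": [11.915, 12.565, 13.050, 13.115],
    "PE": [4.6, np.nan, 3.6, 3.5],  # one value removed for illustration
})

# Keep only the log measurements; Depth is not used as a clustering feature.
X = toy[FEATURES]

# Option A (assumption): drop rows with any missing feature.
X_dropped = X.dropna()

# Option B (assumption): fill gaps with the per-column median.
X_filled = X.fillna(X.median())
```

Dropping rows shrinks the dataset, while filling keeps every depth sample at the cost of introducing synthetic PE values.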
Normalize
array([[0.3386812 , 0.31230835, 0.33234149, ..., 0.21724934, 0.27155135,
0.18648717],
[0.40458037, 0.33378014, 0.37788947, ..., 0.26987168, 0.37232316,
0.24162093],
[0.4251983 , 0.32234207, 0.39128932, ..., 0.26971391, 0.40303073,
0.25479664],
...,
[0.16679157, 0.38264676, 0.39475868, ..., 0.22015877, 0.21032196,
0.12523228],
[0.17061741, 0.38297956, 0.39137533, ..., 0.22070974, 0.21661977,
0.12612891],
[0.15224809, 0.39269409, 0.39269105, ..., 0.22791892, 0.19900377,
0.12644818]])
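The scaling method behind the array above is not shown in this export. Because K-means relies on Euclidean distances, features on very different scales (GR in the tens, ILD_log10 below 1) are usually rescaled first. Min-max scaling is one common choice and is assumed here purely for illustration.

```python
import numpy as np

# Three samples of (GR, ILD_log10, DeltaPHI) taken from the rows shown earlier.
X = np.array([
    [77.45, 0.664, 9.9],
    [78.26, 0.661, 14.2],
    [86.10, 0.655, 13.9],
])

# Min-max scaling: map each column linearly onto [0, 1].
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_scaled = (X - col_min) / (col_max - col_min)
print(X_scaled)
```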
PredictedLabels | FaciesLabel | Frequency |
---|---|---|
0 | BS | 4.359673 |
0 | FSiS | 0.272480 |
0 | MS | 11.989101 |
0 | PS | 32.697548 |
0 | SiSh | 2.997275 |
... | ... | ... |
8 | MS | 18.791946 |
8 | PS | 15.771812 |
8 | SS | 2.013423 |
8 | SiSh | 12.751678 |
8 | WS | 28.523490 |
73 rows × 1 columns
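The table above pairs each K-means cluster (`PredictedLabels` 0 to 8, so nine clusters, one per facies) with the true facies labels found inside it. The clustering cell itself is not shown, so the following is only a sketch: it assumes scikit-learn's `KMeans` and assumes `Frequency` is the percentage of each facies within a predicted cluster. It uses synthetic data with three groups to stay short.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic two-feature data with three well-separated groups (illustrative only).
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])
labels_true = np.repeat(["SS", "MS", "WS"], 30)

# Fit K-means and record the cluster index assigned to every sample.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
result = pd.DataFrame({
    "PredictedLabels": kmeans.fit_predict(X),
    "FaciesLabel": labels_true,
})

# Percentage of each true facies label within each predicted cluster.
freq = (
    result.groupby("PredictedLabels")["FaciesLabel"]
    .value_counts(normalize=True)
    .mul(100)
    .rename("Frequency")
    .to_frame()
    .sort_index()
)
print(freq)
```

Cluster indices are arbitrary, so a cluster numbered 0 has no built-in link to any facies code. A table like this is what lets each cluster be matched to its dominant facies.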