3  Erste Schritte in ›Data Science‹ [43:12]

In den vergangenen Jahren werden immer mehr Daten publiziert, deren Menge zukünftig noch schneller anwachsen wird. Es ist nicht leicht, all den Publikationen dieser erfreulich großen Mengen immer neuer Informationen zu folgen. Jupyter Notebooks wurden gerade auch dafür entwickelt, auf große Mengen an Daten zuzugreifen, diese zu visualisieren, analysieren und auf Problemstellungen anzuwenden. Natürlich ist das nur möglich, wenn die Daten in entsprechenden Formaten vorliegen. Oftmals ist daher im ersten Schritt ein ›Daten clean-up‹ notwendig. Seit 2020 gibt es nun endlich mehrere Initiativen, darunter so große wie die Nationale Forschungsdatenbank Initiative (NFDI) der DFG, GWK und anderer, die versuchen gemeinsam mit internationalen Partnern Daten besser verfügbar zu machen. Schon bestehende Datenbanken wie GeoROC, EarthChem oder auch MetBase erlauben schon seit langer Zeit solchen Zugang, und damit Data Science in z.B. der Geo- und Kosmochemie. In dieser Einheit steigen wir in das neue Feld der Data Science in der Mineralogie ein.

3.1 Was Datensätze sind und wie diese aufgebaut sein müssen [06:09]

Zunächst müssen wir uns etwas anschauen, wie Datensätze und eine Datenbank überhaupt aufgebaut sind. Wir sind zwar sehr damit vertraut, Tabellen zu erstellen, als Datenbank taugen die meisten jedoch selten etwas. Datenbanken, bzw. einzelne Datensätze erstellt man mit einigen wenigen, einfachen Regeln, die wir nun kennen lernen wollen – danach sehen auch die meisten Tabellen besser aus. Außerdem lernen wir noch ein wenig hilfreiches, und Datenbank bezogenen Vokabular kennen.

3.2 Basic- Introduction to working with data using pandas [11:24]

pandas: shape(), info, columns, set_option(›display.max_rows’, 20), python: type() set_option(‘display.max_rows’, 20), drop_duplicates(), df[›category name‹], >, <, &
Available Notebooks: Lecture, Exercise, Solution

Python selbst stellt nur einen sehr begrenzten Umfang an Befehlen zur Daten-Manipulation zur verfügung. Allerdings gibt es sehr viele zusätzliche Befehlspakete – genannt: libraries oder packages – für alles mögliche, auch eines zur Daten-Manipulation. Diese heißt (warum auch immer) ›pandas‹, und dessen Grundzüge lernen wir nun kennen.

pip install mag4
Requirement already satisfied: mag4 in /Users/dominik/anaconda3/lib/python3.11/site-packages (0.0.214)
Note: you may need to restart the kernel to use updated packages.

Import pandas

import mag4 as mg
import pandas as pd

Display the data

df = mg.get_data('Mineral Data Example Dataset')

pd.set_option('display.max_rows', df.shape[0])
pd.set_option('display.max_columns', df.shape[1])

print(df.shape)

print(df.columns)
(25, 28)
Index(['mineral', 'oxygens', 'sample', 'no.', 'SiO2', 'TiO2', 'Al2O3', 'Cr2O3',
       'FeO', 'MnO', 'NiO', 'MgO', 'CaO', 'Na2O', 'K2O', 'total oxides', 'Si',
       'Ti', 'Al', 'Cr', 'Fe', 'Mn', 'Ni', 'Mg', 'Ca', 'Na', 'K',
       'total cations'],
      dtype='object')

Select the data of a single category

df['SiO2']
0     42.51
1     42.16
2     52.33
3     51.54
4     57.09
5     59.19
6     58.33
7     53.87
8     47.10
9     47.72
10    42.72
11    42.83
12    40.43
13    42.46
14    45.09
15    46.72
16    46.36
17    47.78
18     0.59
19    35.13
20    35.52
21     3.24
22     0.16
23    51.72
24    32.79
Name: SiO2, dtype: float64
df['mineral'].drop_duplicates()
0          ol
2         cpx
4         opx
8          fs
16    fsp/mes
18         sp
19      ol/sp
Name: mineral, dtype: object

Using a filter

filter = (df['MgO'] > 50) & (df['MgO'] > 55)

df[filter]
mineral oxygens sample no. SiO2 TiO2 Al2O3 Cr2O3 FeO MnO NiO MgO CaO Na2O K2O total oxides Si Ti Al Cr Fe Mn Ni Mg Ca Na K total cations
0 ol 4 chd#2 1 42.51 0.04 0.07 0.04 0.18 0.00 0.00 55.26 1.58 0.0 0.0 99.68 1.003 0.001 0.002 0.001 0.004 0.000 0.0 1.944 0.040 0.0 0.0 2.995
11 ol 4 chd#3 17 42.83 0.03 0.11 0.16 0.45 0.00 0.01 56.10 0.46 0.0 0.0 100.15 1.004 0.001 0.003 0.003 0.009 0.000 0.0 1.961 0.012 0.0 0.0 2.992
13 ol 4 chd#3 21 42.46 0.05 0.27 0.10 0.32 0.03 0.01 55.26 0.57 0.0 0.0 99.07 1.006 0.001 0.008 0.002 0.006 0.001 0.0 1.951 0.014 0.0 0.0 2.989
type(df)
pandas.core.frame.DataFrame

This is the Pandas DataFrame dataset (=row) numbering, starting with the mathematical 0.

Yes, the first row contains the column category names, sometimes also called attributes, not to be confused with attributes as part of a code.

&

fil = (df['SiO2'] > 50) & (df['CaO'] < 10)
df[fil]
mineral oxygens sample no. SiO2 TiO2 Al2O3 Cr2O3 FeO MnO NiO MgO CaO Na2O K2O total oxides Si Ti Al Cr Fe Mn Ni Mg Ca Na K total cations
4 opx 6 chd#2 8 57.09 0.22 0.76 0.48 1.92 0.08 0.03 36.87 0.67 0.01 0.0 98.13 1.971 0.006 0.031 0.013 0.055 0.002 0.001 1.897 0.025 0.001 0.0 4.002
5 opx 6 chd#2 10 59.19 0.19 0.63 0.51 0.67 0.20 0.03 38.06 0.67 0.01 0.0 100.16 1.988 0.005 0.025 0.014 0.019 0.006 0.001 1.906 0.024 0.001 0.0 3.988
6 cpx 6 chd#2 11 58.33 0.30 0.98 0.53 0.33 0.14 0.00 34.70 5.02 0.01 0.0 100.34 1.977 0.008 0.039 0.014 0.009 0.004 0.000 1.754 0.182 0.001 0.0 3.989

Only those datasets are shown for which the SiO2 values are larger than 50 wt% and the CaO values are larger than 10 wt%.

|

fil = (df['SiO2'] > 50) | (df['CaO'] < 10)
df[fil]
mineral oxygens sample no. SiO2 TiO2 Al2O3 Cr2O3 FeO MnO NiO MgO CaO Na2O K2O total oxides Si Ti Al Cr Fe Mn Ni Mg Ca Na K total cations
0 ol 4 chd#2 1 42.51 0.04 0.07 0.04 0.18 0.00 0.00 55.26 1.58 0.00 0.00 99.68 1.003 0.001 0.002 0.001 0.004 0.000 0.000 1.944 0.040 0.000 0.000 2.995
1 ol 4 chd#2 9 42.16 0.10 0.06 0.20 3.14 0.07 0.03 53.81 0.24 0.00 0.00 99.81 1.004 0.002 0.002 0.004 0.063 0.001 0.001 1.910 0.006 0.000 0.000 2.992
2 cpx 6 chd#2 4 52.33 2.00 4.09 0.81 0.32 0.10 0.00 19.44 21.00 0.01 0.01 100.11 1.872 0.054 0.172 0.023 0.010 0.003 0.000 1.037 0.805 0.001 0.000 3.977
3 cpx 6 chd#2 6 51.54 1.74 5.72 0.56 0.67 0.09 0.00 19.12 20.75 0.04 0.00 100.23 1.842 0.047 0.241 0.016 0.020 0.003 0.000 1.019 0.795 0.003 0.000 3.984
4 opx 6 chd#2 8 57.09 0.22 0.76 0.48 1.92 0.08 0.03 36.87 0.67 0.01 0.00 98.13 1.971 0.006 0.031 0.013 0.055 0.002 0.001 1.897 0.025 0.001 0.000 4.002
5 opx 6 chd#2 10 59.19 0.19 0.63 0.51 0.67 0.20 0.03 38.06 0.67 0.01 0.00 100.16 1.988 0.005 0.025 0.014 0.019 0.006 0.001 1.906 0.024 0.001 0.000 3.988
6 cpx 6 chd#2 11 58.33 0.30 0.98 0.53 0.33 0.14 0.00 34.70 5.02 0.01 0.00 100.34 1.977 0.008 0.039 0.014 0.009 0.004 0.000 1.754 0.182 0.001 0.000 3.989
7 cpx 6 chd#2 12 53.87 1.10 3.09 0.51 0.21 0.08 0.00 22.00 19.24 0.00 0.00 100.10 1.911 0.029 0.129 0.014 0.006 0.002 0.000 1.164 0.731 0.000 0.000 3.988
10 ol 4 chd#3 15 42.72 0.05 1.08 0.11 0.37 0.01 0.00 54.32 0.68 0.53 0.00 99.87 1.005 0.001 0.030 0.002 0.007 0.000 0.000 1.904 0.017 0.024 0.000 2.991
11 ol 4 chd#3 17 42.83 0.03 0.11 0.16 0.45 0.00 0.01 56.10 0.46 0.00 0.00 100.15 1.004 0.001 0.003 0.003 0.009 0.000 0.000 1.961 0.012 0.000 0.000 2.992
12 ol 4 chd#3 18 40.43 0.02 0.03 0.11 11.16 0.00 0.01 46.57 0.33 0.01 0.00 98.67 1.009 0.000 0.001 0.002 0.233 0.000 0.000 1.733 0.009 0.000 0.000 2.989
13 ol 4 chd#3 21 42.46 0.05 0.27 0.10 0.32 0.03 0.01 55.26 0.57 0.00 0.00 99.07 1.006 0.001 0.008 0.002 0.006 0.001 0.000 1.951 0.014 0.000 0.000 2.989
18 sp 4 #24 - CAI 74 0.59 0.27 67.71 0.63 5.54 0.03 0.00 23.43 0.14 0.07 0.00 98.41 0.015 0.005 1.972 0.012 0.115 0.001 0.000 0.863 0.004 0.003 0.000 2.990
19 ol/sp 4 #24 - CAI 76 35.13 0.58 14.39 0.30 8.06 0.05 0.00 33.12 4.09 1.04 0.13 96.89 0.886 0.011 0.428 0.006 0.170 0.001 0.000 1.246 0.111 0.051 0.004 2.913
21 sp 4 #24 - CAI 84 3.24 1.36 65.06 0.08 3.05 0.01 0.01 27.71 0.39 0.02 0.00 100.93 0.077 0.024 1.826 0.002 0.061 0.000 0.000 0.984 0.010 0.001 0.000 2.985
22 sp 4 #24 - CAI 85 0.16 0.51 69.36 0.26 2.41 0.00 0.00 25.71 0.08 0.01 0.00 98.50 0.004 0.009 1.988 0.005 0.049 0.000 0.000 0.932 0.002 0.000 0.000 2.990
23 cpx 6 #24 - CAI 75 51.72 0.20 3.27 0.05 0.27 0.02 0.00 17.69 24.25 0.43 0.05 97.95 1.908 0.006 0.142 0.001 0.008 0.001 0.000 0.973 0.959 0.031 0.002 4.031

All datasets with SiO2 values larger than 50 wt% or with CaO values are larger than 10 wt% are shown.
Alternatively stated: All datasets with SiO2 values larger than 50 wt% are shown and all datasets with CaO values larger than 10 wt% are shown.

With a filter

With the .sel() method

With the sel() function

Start with importing the pandas library.

Then we want to get some information about the dataset: First we need to read in some data. Use the ‘Mineral Data Example Dataset’ exercise file.
Store the data in the variable df.
Print some information about this file, such as its number of rows, columns.
In addition, use the .info() method to display the data types of the categories. What data type, for example, is MgO?
Then use the set_option('display.max_xxx') with xxx being rows/columns attribute to automatically display all rows and columns of the data.
Finally, display the category names.

import mag4 as mg
import pandas as pd
df = mg.get_data('Mineral Data Example Dataset')

print(df.shape)
print(df.info())

pd.set_option('display.max_rows', df.shape[0])
pd.set_option('display.max_columns', df.shape[1])

print(df.columns)
(25, 28)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25 entries, 0 to 24
Data columns (total 28 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   mineral        25 non-null     object 
 1   oxygens        25 non-null     int64  
 2   sample         25 non-null     object 
 3   no.            25 non-null     int64  
 4   SiO2           25 non-null     float64
 5   TiO2           25 non-null     float64
 6   Al2O3          25 non-null     float64
 7   Cr2O3          25 non-null     float64
 8   FeO            25 non-null     float64
 9   MnO            25 non-null     float64
 10  NiO            25 non-null     float64
 11  MgO            25 non-null     float64
 12  CaO            25 non-null     float64
 13  Na2O           25 non-null     float64
 14  K2O            25 non-null     float64
 15  total oxides   25 non-null     float64
 16  Si             25 non-null     float64
 17  Ti             25 non-null     float64
 18  Al             25 non-null     float64
 19  Cr             25 non-null     float64
 20  Fe             25 non-null     float64
 21  Mn             25 non-null     float64
 22  Ni             25 non-null     float64
 23  Mg             25 non-null     float64
 24  Ca             25 non-null     float64
 25  Na             25 non-null     float64
 26  K              25 non-null     float64
 27  total cations  25 non-null     float64
dtypes: float64(24), int64(2), object(2)
memory usage: 5.6+ KB
None
Index(['mineral', 'oxygens', 'sample', 'no.', 'SiO2', 'TiO2', 'Al2O3', 'Cr2O3',
       'FeO', 'MnO', 'NiO', 'MgO', 'CaO', 'Na2O', 'K2O', 'total oxides', 'Si',
       'Ti', 'Al', 'Cr', 'Fe', 'Mn', 'Ni', 'Mg', 'Ca', 'Na', 'K',
       'total cations'],
      dtype='object')

Extracting data
Extract a subset of from the df data imported above, e.g., all MnO values. Now extract all MnO values larger the detection limit of 0.02 wt%.
Finally, we would like to know which samples are in the dataset. Therefore display all sample names in the dataset, but each no more than once.

df['MnO']
0     0.00
1     0.07
2     0.10
3     0.09
4     0.08
5     0.20
6     0.14
7     0.08
8     0.00
9     0.03
10    0.01
11    0.00
12    0.00
13    0.03
14    0.00
15    0.00
16    0.00
17    0.00
18    0.03
19    0.05
20    0.02
21    0.01
22    0.00
23    0.02
24    0.03
Name: MnO, dtype: float64
fil = df['MnO'] > 0.02

df[fil]
mineral oxygens sample no. SiO2 TiO2 Al2O3 Cr2O3 FeO MnO NiO MgO CaO Na2O K2O total oxides Si Ti Al Cr Fe Mn Ni Mg Ca Na K total cations
1 ol 4 chd#2 9 42.16 0.10 0.06 0.20 3.14 0.07 0.03 53.81 0.24 0.00 0.00 99.81 1.004 0.002 0.002 0.004 0.063 0.001 0.001 1.910 0.006 0.000 0.000 2.992
2 cpx 6 chd#2 4 52.33 2.00 4.09 0.81 0.32 0.10 0.00 19.44 21.00 0.01 0.01 100.11 1.872 0.054 0.172 0.023 0.010 0.003 0.000 1.037 0.805 0.001 0.000 3.977
3 cpx 6 chd#2 6 51.54 1.74 5.72 0.56 0.67 0.09 0.00 19.12 20.75 0.04 0.00 100.23 1.842 0.047 0.241 0.016 0.020 0.003 0.000 1.019 0.795 0.003 0.000 3.984
4 opx 6 chd#2 8 57.09 0.22 0.76 0.48 1.92 0.08 0.03 36.87 0.67 0.01 0.00 98.13 1.971 0.006 0.031 0.013 0.055 0.002 0.001 1.897 0.025 0.001 0.000 4.002
5 opx 6 chd#2 10 59.19 0.19 0.63 0.51 0.67 0.20 0.03 38.06 0.67 0.01 0.00 100.16 1.988 0.005 0.025 0.014 0.019 0.006 0.001 1.906 0.024 0.001 0.000 3.988
6 cpx 6 chd#2 11 58.33 0.30 0.98 0.53 0.33 0.14 0.00 34.70 5.02 0.01 0.00 100.34 1.977 0.008 0.039 0.014 0.009 0.004 0.000 1.754 0.182 0.001 0.000 3.989
7 cpx 6 chd#2 12 53.87 1.10 3.09 0.51 0.21 0.08 0.00 22.00 19.24 0.00 0.00 100.10 1.911 0.029 0.129 0.014 0.006 0.002 0.000 1.164 0.731 0.000 0.000 3.988
9 fs 8 chd#2 5 47.72 0.06 31.87 0.01 0.71 0.03 0.00 0.58 16.88 2.12 0.00 99.98 2.200 0.002 1.732 0.000 0.027 0.001 0.000 0.040 0.834 0.190 0.000 5.026
13 ol 4 chd#3 21 42.46 0.05 0.27 0.10 0.32 0.03 0.01 55.26 0.57 0.00 0.00 99.07 1.006 0.001 0.008 0.002 0.006 0.001 0.000 1.951 0.014 0.000 0.000 2.989
18 sp 4 #24 - CAI 74 0.59 0.27 67.71 0.63 5.54 0.03 0.00 23.43 0.14 0.07 0.00 98.41 0.015 0.005 1.972 0.012 0.115 0.001 0.000 0.863 0.004 0.003 0.000 2.990
19 ol/sp 4 #24 - CAI 76 35.13 0.58 14.39 0.30 8.06 0.05 0.00 33.12 4.09 1.04 0.13 96.89 0.886 0.011 0.428 0.006 0.170 0.001 0.000 1.246 0.111 0.051 0.004 2.913
24 cpx 6 #24 - CAI 81 32.79 1.35 26.06 0.46 2.53 0.03 0.00 17.46 15.75 0.80 0.20 97.43 1.228 0.038 1.150 0.014 0.079 0.001 0.000 0.975 0.632 0.058 0.010 4.185
df['sample'].drop_duplicates()
0         chd#2
10        chd#3
18    #24 - CAI
Name: sample, dtype: object

3.3 Basics– Introduction to making a plot using matplotlib [10:49]

read_csv(), maptplotlib.pyplot, plt, plot(), scatter(), show(), xlabel(), legend(), xlim()
Available Notebooks: Lecture, Exercise, Solution

Eine weitere library ist die ›matplotlib‹, die praktische alle nur denkbaren Plots und deren noch so komplizierten Darstellungen ermöglicht. In diese nehmen wir nun einen ersten, kleinen Einblick.

import mag4 as mg
import pandas as pd
import matplotlib.pyplot as plt
df_minData = mg.get_data('Mineral Data Example Dataset')
df_mydata = mg.get_data('Mineral Data Example Dataset 2')

A simple line plot

el = 'MnO'

plt.plot(df_minData[el])

plt.show()

A simple scatter plot

el1 = 'MgO'
el2 = 'SiO2'

plt.scatter(df_minData[el1], df_minData[el2])

plt.show()

Formating elements

el1 = 'MnO'
el2 = 'SiO2'

plt.scatter(df_minData[el1], df_minData[el2], label = 'Mineral Example Data')
plt.scatter(df_mydata[el1], df_mydata[el2], label = 'Mineral Example Data 2')
plt.xlabel(el1)
plt.ylabel(el2)
plt.legend(loc = 'lower right')
plt.xlim(0, 0.22)
plt.ylim(0, 60)

plt.show()

This is the name in pandas for a table with pandas specific formatting, indices, structure, etc.

A line plot has categories on the x-axis and a corresponding value on the y-axis. A category could simply be a number, but also chemical elements, e.g., the REE. The values on the y-axis would then be e.g., their concentrations.
A scatter plot has values on the x- and y-axis, e.g., Mg- and Si- values of a rock or mineral on the x- and y-axes.

import plt

import matplotlib.pyplot as plt

import matplotlib as plt

import pyplot as plt

True

False

Start with importing the pandas and matplotlib libraries.

Then read the example file Mineral Data Example Dataset 3, and store it a variable.
Display the categories and the data table. Use .tolist() to display the categories in a compact form.
Then plot the FeO data on the y-axis and label it with FeO.
Make the code flexibel, so that FeO can be quickly replaced by another oxide.
Then plot two oxides against each other, again flexible, and add a legend.
Finally, add data from the file Mineral Data Example Dataset to the latter plot.

import mag4 as mg
import pandas as pd
import matplotlib.pyplot as plt
df = mg.get_data('Mineral Data Example Dataset 3')

print(df.columns.tolist())

df
['No.', 'SiO2', 'TiO2', 'Al2O3', 'FeO', 'MnO', 'MgO', 'CaO', 'Na2O', 'K2O', 'P2O5', 'Cr2O3', 'NiO', 'Total', 'Comment']
No. SiO2 TiO2 Al2O3 FeO MnO MgO CaO Na2O K2O P2O5 Cr2O3 NiO Total Comment
0 1 57.2700 1.6275 17.9100 6.4200 0.1614 2.2800 3.46 4.4600 5.4500 0.8939 0.0026 0.0000 99.9354 GProbe Mitte Grid 010x011 Grid 020x021
1 2 57.2800 1.5939 18.0100 6.4400 0.1372 2.3100 3.49 4.4300 5.4300 0.8622 0.0000 0.0070 99.9903 GProbe Mitte Grid 010x011 Grid 020x021
2 4 57.1600 1.6335 18.0300 6.4500 0.1535 2.3000 3.48 4.4700 5.4300 0.9481 0.0109 0.0163 100.0823 GProbe Mitte Grid 010x011 Grid 020x021
3 5 57.3100 1.6235 17.9800 6.4300 0.1502 2.2700 3.48 4.3800 5.4300 0.9043 0.0000 0.0000 99.9580 GProbe Mitte Grid 010x011 Grid 020x021
4 7 57.3900 1.6314 18.0800 6.5100 0.1466 2.2900 3.51 4.4000 5.4400 0.8663 0.0161 0.0000 100.2804 GProbe Mitte Grid 010x011 Grid 020x021
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
192 287 1.3845 0.0000 0.0045 0.0010 0.0000 1.3767 49.92 0.0713 0.0000 0.0104 0.0070 0.0000 52.7754 GProbe Mitte Grid 010x011 Grid 020x021
193 289 0.7298 0.0082 0.0000 0.0333 0.0000 1.4190 49.99 0.0640 0.0000 0.0128 0.0000 0.0000 52.2571 GProbe Mitte Grid 010x011 Grid 020x021
194 290 0.6210 0.0072 0.0006 0.0060 0.0000 1.3827 50.26 0.0655 0.0000 0.0000 0.0046 0.0000 52.3476 GProbe Mitte Grid 010x011 Grid 020x021
195 292 0.5870 0.0000 0.0000 0.0072 0.0000 1.3888 50.19 0.0542 0.0026 0.0000 0.0000 0.0000 52.2299 GProbe Mitte Grid 010x011 Grid 020x021
196 293 0.6773 0.0000 0.0084 0.0000 0.0000 1.4192 50.44 0.0713 0.0000 0.0002 0.0216 0.0143 52.6523 GProbe Mitte Grid 010x011 Grid 020x021

197 rows × 15 columns

el = 'FeO'

plt.plot(df[el])
plt.ylabel(el)

plt.show()

el1 = 'MnO'
el2 = 'FeO'

plt.scatter(df[el1], df[el2], label = 'Mineral Example Dataset 3')
plt.xlabel(el1)
plt.ylabel(el2)
plt.legend(loc = 'lower right')

plt.show()

df_minData = mg.get_data('Mineral Data Example Dataset')
el1 = 'MnO'
el2 = 'FeO'

plt.scatter(df[el1], df[el2], label = 'Mineral Example Dataset 3')
plt.scatter(df_minData[el1], df_minData[el2], label = 'Mineral Data')
plt.xlabel(el1)
plt.ylabel(el2)
plt.legend(loc = 'upper left')

plt.show()

3.4 Basics- Selecting data using locations and indices [12:36]

.iloc(), .loc(), .set_index()
Available Notebooks: Lecture, Exercise, Solution

Nun wollen wir lernen, wie leicht mit Pandas jegliche gewünschten Daten aus einer Datenbank, also z.B. einer einfachen Excel-Tabelle extrahiert werden können. Das ist besonders wichtig und hilfreich, wenn die Datenbank sehr groß ist, oder wenn man immer andere Datenbanken verwendet, die aber immer anders aufgebaut sind. Anders aufgebaut meint, dass z.B. die Kategorienamen in anderer Reihenfolge stehen, oder evtl. auch etwas anders lauten. Dann müsste bei einem Code mit Pandas nur der entsprechende Kategoriename geändert werden, der Code selbst bliebe gleich.

import mag4 as mg
import pandas as pd
df = mg.get_data("Banda Arc")  #could add: , index_col = 'xxx'

#pd.set_option('display.max_rows', df.shape[0])
#pd.set_option('display.max_columns', df.shape[1])

print(df.columns.tolist())

df
['Citations', 'Tectonic Setting', 'Location', 'Location Comment', 'Latitude (Min)', 'Latitude (Max)', 'Longitude (Min)', 'Longitude (Max)', 'Land or Sea', 'Elevation (Min)', 'Elevation (Max)', 'Sample Name', 'Rock Name', 'Age (a, Min)', 'Age (a, Max)', 'Geol', 'Age', 'Eruption Day', 'Eruption Month', 'Eruption Year', 'Rock Texture', 'Rock Type', 'Drill Depth (Min)', 'Drill Depth (Max)', 'Alteration', 'Mineral', 'Material', 'Si', 'Ti', 'B', 'Al', 'Cr', 'Fe3+', 'Fe2+', 'Fetot(2+)', 'Ca', 'Mg', 'Mn', 'Ni', 'K', 'Na', 'P', 'H2O', 'H2OP', 'H2OM', 'H2Otot', 'CO2', 'CO', 'F', 'Cl', 'Cl2', 'OH', 'CH4', 'SO2', 'SO3', 'SO4', 'S', 'LOI', 'Volatiles', 'O', 'Others', 'HE(CCM/G)', 'HE(CCMSTP/G)', 'HE3(CCMSTP/G)', 'HE3(AT/G)', 'HE4(CCM/G)', 'HE4(CCMSTP/G)', 'HE4(AT/G)', 'HE4(MOLE/G)', 'HE4(NCC/G)', 'HE(NCC/G)', 'Li', 'Be', 'B.1', 'C', 'CO2.1', 'F.1', 'Na.1', 'Mg.1', 'Al.1', 'P.1', 'S.1', 'Cl.1', 'K.1', 'Ca.1', 'Sc', 'Ti.1', 'V', 'Cr.1', 'Mn.1', 'Fe', 'Co', 'Ni.1', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'I', 'Cs', 'Ba', 'La', 'Ce', 'Pr', 'Nd', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu', 'Hf', 'Ta', 'W', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'Th', 'U', 'Nd143/Nd144', 'Nd143/Nd144 initial', 'e Nd', 'Sr87/Sr86', 'SR87_SR86_INI', 'PB206_PB204', 'PB206_PB204_INI', 'PB207_PB204', 'PB207_PB204_INI', 'PB208_PB204', 'PB208_PB204_INI', 'OS184_OS188', 'OS186_OS188', 'OS187_OS186', 'OS187_OS188', 'RE187_OS186', 'RE187_OS188', 'HF176_HF177', 'HE3_HE4', 'HE3_HE4(R/R(A))', 'HE4_HE3', 'HE4_HE3(R/R(A))', 'K40_AR40', 'AR40_K40', 'Unique Id', 'Unnamed: 171']
Citations Tectonic Setting Location Location Comment Latitude (Min) Latitude (Max) Longitude (Min) Longitude (Max) Land or Sea Elevation (Min) Elevation (Max) Sample Name Rock Name Age (a, Min) ... OS186_OS188 OS187_OS186 OS187_OS188 RE187_OS186 RE187_OS188 HF176_HF177 HE3_HE4 HE3_HE4(R/R(A)) HE4_HE3 HE4_HE3(R/R(A)) K40_AR40 AR40_K40 Unique Id Unnamed: 171
0 [3085] CONVERGENT MARGIN BANDA ARC / BANDA ARC / KISU / KISU / INDIAN O... NaN -8.38 -8.38 124.41 124.41 SUBAERIAL NaN NaN s BUA [3085] NOT GIVEN [3085] NaN ... NaN NaN NaN NaN NaN NaN NaN 3.57 NaN NaN NaN NaN 104872 NaN
1 [3085] CONVERGENT MARGIN BANDA ARC / BANDA ARC / ROMANG / PACIFIC OCEAN NaN -7.55 -7.55 127.42 127.42 SUBAERIAL NaN NaN s RO7B [3085] NOT GIVEN [3085] NaN ... NaN NaN NaN NaN NaN NaN NaN 0.99 NaN NaN NaN NaN 104873 NaN
2 [3085] CONVERGENT MARGIN BANDA ARC / BANDA ARC / BANDA / PACIFIC OCEAN NaN -4.52 -4.52 129.87 129.87 SUBAERIAL NaN NaN s BA113A [3085] NOT GIVEN [3085] NaN ... NaN NaN NaN NaN NaN NaN NaN 3.01 NaN NaN NaN NaN 104874 NaN
3 [3910] CONVERGENT MARGIN BANDA ARC / BANDA ARC / TERNATE / TERNATE / PA... NaN 0.80 0.80 127.33 127.33 SUBAERIAL NaN NaN s T7 [3910] NOT GIVEN [3910] NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 11648-T7 NaN
4 [5178] CONVERGENT MARGIN BANDA ARC / BANDA ARC / TREWEG / PACIFIC OCEAN NaN -8.00 -8.00 124.00 124.00 SUBAERIAL NaN NaN s TR1.2.1 [5178] ANDESITE, BASALTIC [5178] NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 120443 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
465 [2708] CONVERGENT MARGIN BANDA ARC / BANDA ARC / AMBON / BANDA SEA / PA... NaN -3.61 -3.61 128.11 128.11 SUBAERIAL NaN NaN s A-4 [2708] BASALT [2708] NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8829-A-4 NaN
466 [2708] CONVERGENT MARGIN BANDA ARC / BANDA ARC / AMBON / BANDA SEA / PA... NaN -3.61 -3.61 128.11 128.11 SUBAERIAL NaN NaN s A-5 [2708] BASALT [2708] NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8829-A-5 NaN
467 [3147] CONVERGENT MARGIN BANDA ARC / BANDA ARC / AMBON / BANDA SEA / PA... NaN -3.61 -3.61 128.11 128.11 SUBAERIAL NaN NaN s AM104A1 [3147] ANDESITE [3147] NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8829-AM104A1 NaN
468 [3140][2708] CONVERGENT MARGIN BANDA ARC / BANDA ARC / AMBON / BANDA SEA / PA... NaN -3.61 -3.61 128.11 128.11 SUBAERIAL NaN NaN s AS-2 [3140] ANDESITE, BASALTIC [3140] / BASALT [2708] NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8829-AS-2 NaN
469 [2708] CONVERGENT MARGIN BANDA ARC / BANDA ARC / AMBON / BANDA SEA / PA... NaN -3.61 -3.61 128.11 128.11 SUBAERIAL NaN NaN s AS-9 [2708] BASALT [2708] NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8829-AS-9 NaN

470 rows × 172 columns

Using .iloc[] (-> integer location)

# General Syntax:  iloc[row_position, column_position]


# just '.tolist()' everything to obtain the desired values only.

# Select a single dataset/row by its index
df.iloc[5]

# Select a sequence of datasets/row by their indices
df.iloc[4:9]   

# Select a single series by its index
df.iloc[:, 5]   

# Select multiple datasets/rows by their index
df.iloc[[4, 8]]

# Select a sequence of datasets/rows by their index
df.iloc[4:8]
df.iloc[4:]
df.iloc[:8]

# Select only values from one sereies, but of only specific rows by their index
df.iloc[[4, 8], 4]

# Select values from multiple datasets/rows and multiple series by their index
df.iloc[[4, 8], [4, 6]]

# Select values from multiple series and a sequence of datasets/rows by their index
df.iloc[4:8, [4, 6]]

# Select values from a sequence of series and a sequence of datasets/rows by their index
df.iloc[4:8, 4:7]

# Select a single value using its index
df.iloc[120, 100]
3.0

Using .loc[] (-> location)

# General Syntax:  loc[row_label, column_label]


# Single value using index and location
df.loc[3, 'Mg']

# The first part only selects datasets indexed 3 and 5, the latter selects the desired series
df.loc[[3, 5], 'Mg']

# The first part only selects datasets indexed 3 and 5, the latter selects the desired series
df.loc[[3, 5], ['Mg', 'Si']]

# The first part selects a sequence of datasets, the latter selects the desired series
df.loc[3:5, ['Mg', 'Si']]
Mg Si
3 27378.016 251106.082
4 20322.448 260174.321
5 21106.400 262885.444

Making a series indices to use it in .loc[]

df['Rock Type'].drop_duplicates()
0         VOLCANIC ROCK
47        PLUTONIC ROCK
50     METAMORPHIC ROCK
358     MANTLE XENOLITH
Name: Rock Type, dtype: object
df.set_index('Rock Type', inplace = True)   # this will move this index series to the left
df
Citations Tectonic Setting Location Location Comment Latitude (Min) Latitude (Max) Longitude (Min) Longitude (Max) Land or Sea Elevation (Min) Elevation (Max) Sample Name Rock Name Age (a, Min) ... OS186_OS188 OS187_OS186 OS187_OS188 RE187_OS186 RE187_OS188 HF176_HF177 HE3_HE4 HE3_HE4(R/R(A)) HE4_HE3 HE4_HE3(R/R(A)) K40_AR40 AR40_K40 Unique Id Unnamed: 171
Rock Type
VOLCANIC ROCK [3085] CONVERGENT MARGIN BANDA ARC / BANDA ARC / KISU / KISU / INDIAN O... NaN -8.38 -8.38 124.41 124.41 SUBAERIAL NaN NaN s BUA [3085] NOT GIVEN [3085] NaN ... NaN NaN NaN NaN NaN NaN NaN 3.57 NaN NaN NaN NaN 104872 NaN
VOLCANIC ROCK [3085] CONVERGENT MARGIN BANDA ARC / BANDA ARC / ROMANG / PACIFIC OCEAN NaN -7.55 -7.55 127.42 127.42 SUBAERIAL NaN NaN s RO7B [3085] NOT GIVEN [3085] NaN ... NaN NaN NaN NaN NaN NaN NaN 0.99 NaN NaN NaN NaN 104873 NaN
VOLCANIC ROCK [3085] CONVERGENT MARGIN BANDA ARC / BANDA ARC / BANDA / PACIFIC OCEAN NaN -4.52 -4.52 129.87 129.87 SUBAERIAL NaN NaN s BA113A [3085] NOT GIVEN [3085] NaN ... NaN NaN NaN NaN NaN NaN NaN 3.01 NaN NaN NaN NaN 104874 NaN
VOLCANIC ROCK [3910] CONVERGENT MARGIN BANDA ARC / BANDA ARC / TERNATE / TERNATE / PA... NaN 0.80 0.80 127.33 127.33 SUBAERIAL NaN NaN s T7 [3910] NOT GIVEN [3910] NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 11648-T7 NaN
VOLCANIC ROCK [5178] CONVERGENT MARGIN BANDA ARC / BANDA ARC / TREWEG / PACIFIC OCEAN NaN -8.00 -8.00 124.00 124.00 SUBAERIAL NaN NaN s TR1.2.1 [5178] ANDESITE, BASALTIC [5178] NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 120443 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
VOLCANIC ROCK [2708] CONVERGENT MARGIN BANDA ARC / BANDA ARC / AMBON / BANDA SEA / PA... NaN -3.61 -3.61 128.11 128.11 SUBAERIAL NaN NaN s A-4 [2708] BASALT [2708] NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8829-A-4 NaN
VOLCANIC ROCK [2708] CONVERGENT MARGIN BANDA ARC / BANDA ARC / AMBON / BANDA SEA / PA... NaN -3.61 -3.61 128.11 128.11 SUBAERIAL NaN NaN s A-5 [2708] BASALT [2708] NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8829-A-5 NaN
VOLCANIC ROCK [3147] CONVERGENT MARGIN BANDA ARC / BANDA ARC / AMBON / BANDA SEA / PA... NaN -3.61 -3.61 128.11 128.11 SUBAERIAL NaN NaN s AM104A1 [3147] ANDESITE [3147] NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8829-AM104A1 NaN
VOLCANIC ROCK [3140][2708] CONVERGENT MARGIN BANDA ARC / BANDA ARC / AMBON / BANDA SEA / PA... NaN -3.61 -3.61 128.11 128.11 SUBAERIAL NaN NaN s AS-2 [3140] ANDESITE, BASALTIC [3140] / BASALT [2708] NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8829-AS-2 NaN
VOLCANIC ROCK [2708] CONVERGENT MARGIN BANDA ARC / BANDA ARC / AMBON / BANDA SEA / PA... NaN -3.61 -3.61 128.11 128.11 SUBAERIAL NaN NaN s AS-9 [2708] BASALT [2708] NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8829-AS-9 NaN

470 rows × 171 columns

# Select all datasets with a specific characteristic in the index series

df.loc["MANTLE XENOLITH"]

df.loc["MANTLE XENOLITH", ['Mg']]

df.loc[["MANTLE XENOLITH"], ['Mg', 'Si']]
Mg Si
Rock Type
MANTLE XENOLITH 275589.280 210392.4935
MANTLE XENOLITH 264372.736 212028.5160
MANTLE XENOLITH 248693.696 206372.5525
MANTLE XENOLITH 252734.064 204923.5040
MANTLE XENOLITH 240854.176 207961.8315
MANTLE XENOLITH 241216.000 209130.4190
MANTLE XENOLITH 236572.592 204456.0690
MANTLE XENOLITH 236813.808 207167.1920
MANTLE XENOLITH 222702.672 213290.5905
MANTLE XENOLITH 212752.512 195855.2650
MANTLE XENOLITH 220411.120 201978.6635
df.reset_index(inplace = True)
df
Rock Type Citations Tectonic Setting Location Location Comment Latitude (Min) Latitude (Max) Longitude (Min) Longitude (Max) Land or Sea Elevation (Min) Elevation (Max) Sample Name Rock Name ... OS186_OS188 OS187_OS186 OS187_OS188 RE187_OS186 RE187_OS188 HF176_HF177 HE3_HE4 HE3_HE4(R/R(A)) HE4_HE3 HE4_HE3(R/R(A)) K40_AR40 AR40_K40 Unique Id Unnamed: 171
0 VOLCANIC ROCK [3085] CONVERGENT MARGIN BANDA ARC / BANDA ARC / KISU / KISU / INDIAN O... NaN -8.38 -8.38 124.41 124.41 SUBAERIAL NaN NaN s BUA [3085] NOT GIVEN [3085] ... NaN NaN NaN NaN NaN NaN NaN 3.57 NaN NaN NaN NaN 104872 NaN
1 VOLCANIC ROCK [3085] CONVERGENT MARGIN BANDA ARC / BANDA ARC / ROMANG / PACIFIC OCEAN NaN -7.55 -7.55 127.42 127.42 SUBAERIAL NaN NaN s RO7B [3085] NOT GIVEN [3085] ... NaN NaN NaN NaN NaN NaN NaN 0.99 NaN NaN NaN NaN 104873 NaN
2 VOLCANIC ROCK [3085] CONVERGENT MARGIN BANDA ARC / BANDA ARC / BANDA / PACIFIC OCEAN NaN -4.52 -4.52 129.87 129.87 SUBAERIAL NaN NaN s BA113A [3085] NOT GIVEN [3085] ... NaN NaN NaN NaN NaN NaN NaN 3.01 NaN NaN NaN NaN 104874 NaN
3 VOLCANIC ROCK [3910] CONVERGENT MARGIN BANDA ARC / BANDA ARC / TERNATE / TERNATE / PA... NaN 0.80 0.80 127.33 127.33 SUBAERIAL NaN NaN s T7 [3910] NOT GIVEN [3910] ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 11648-T7 NaN
4 VOLCANIC ROCK [5178] CONVERGENT MARGIN BANDA ARC / BANDA ARC / TREWEG / PACIFIC OCEAN NaN -8.00 -8.00 124.00 124.00 SUBAERIAL NaN NaN s TR1.2.1 [5178] ANDESITE, BASALTIC [5178] ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 120443 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
465 VOLCANIC ROCK [2708] CONVERGENT MARGIN BANDA ARC / BANDA ARC / AMBON / BANDA SEA / PA... NaN -3.61 -3.61 128.11 128.11 SUBAERIAL NaN NaN s A-4 [2708] BASALT [2708] ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8829-A-4 NaN
466 VOLCANIC ROCK [2708] CONVERGENT MARGIN BANDA ARC / BANDA ARC / AMBON / BANDA SEA / PA... NaN -3.61 -3.61 128.11 128.11 SUBAERIAL NaN NaN s A-5 [2708] BASALT [2708] ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8829-A-5 NaN
467 VOLCANIC ROCK [3147] CONVERGENT MARGIN BANDA ARC / BANDA ARC / AMBON / BANDA SEA / PA... NaN -3.61 -3.61 128.11 128.11 SUBAERIAL NaN NaN s AM104A1 [3147] ANDESITE [3147] ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8829-AM104A1 NaN
468 VOLCANIC ROCK [3140][2708] CONVERGENT MARGIN BANDA ARC / BANDA ARC / AMBON / BANDA SEA / PA... NaN -3.61 -3.61 128.11 128.11 SUBAERIAL NaN NaN s AS-2 [3140] ANDESITE, BASALTIC [3140] / BASALT [2708] ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8829-AS-2 NaN
469 VOLCANIC ROCK [2708] CONVERGENT MARGIN BANDA ARC / BANDA ARC / AMBON / BANDA SEA / PA... NaN -3.61 -3.61 128.11 128.11 SUBAERIAL NaN NaN s AS-9 [2708] BASALT [2708] ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8829-AS-9 NaN

470 rows × 172 columns

Index is an integer number that can be used to refer to a row or column. A series is a pandas object, more specific, a single column with still pandas formating. More than two series togehter then constitute the pandas dataframe.

.iloc[] and .loc[]

.integerloc[] and .location[]

.intergerlocation[] and .location[]

True

False

Start with reading the file West African Craton into a variable and displaying the categories in a compact form.
Display the table, but with only 5 rows and columns.
Select the first 4 dataset.
Select the series number 4, 17, and 99.
What types of rock (-> ‘Rock Type’) are represented in the database?
Select all metamorphick rocks. (This is an already tricky one.)
Again, select all metamorphic rocks, but only display the series ‘Mg’ and ‘Ti’.
Reset the indices first, then select datasets 10 throughout and including 20, but only with the columns MgO and Cr2O3.

import mag4 as mg
import pandas as pd
df = mg.get_data('West African Craton')

pd.set_option('display.max_rows', 5)
pd.set_option('display.max_columns', 5)

print(df.columns.tolist())
df
['Citations', 'Tectonic Setting', 'Location', 'Location Comment', 'Latitude (Min)', 'Latitude (Max)', 'Longitude (Min)', 'Longitude (Max)', 'Land or Sea', 'Elevation (Min)', 'Elevation (Max)', 'Sample Name', 'Rock Name', 'Age (a, Min)', 'Age (a, Max)', 'Geol', 'Age', 'Eruption Day', 'Eruption Month', 'Eruption Year', 'Rock Texture', 'Rock Type', 'Drill Depth (Min)', 'Drill Depth (Max)', 'Alteration', 'Mineral', 'Material', 'Si', 'Ti', 'B', 'Al', 'Cr', 'Fe3+', 'Fe2+', 'Fetot(2+)', 'Ca', 'Mg', 'Mn', 'Ni', 'K', 'Na', 'P', 'H2O', 'H2OP', 'H2OM', 'H2Otot', 'CO2', 'CO', 'F', 'Cl', 'Cl2', 'OH', 'CH4', 'SO2', 'SO3', 'SO4', 'S', 'LOI', 'Volatiles', 'O', 'Others', 'HE(CCM/G)', 'HE(CCMSTP/G)', 'HE3(CCMSTP/G)', 'HE3(AT/G)', 'HE4(CCM/G)', 'HE4(CCMSTP/G)', 'HE4(AT/G)', 'HE4(MOLE/G)', 'HE4(NCC/G)', 'HE(NCC/G)', 'Li', 'Be', 'B.1', 'C', 'CO2.1', 'F.1', 'Na.1', 'Al.1', 'P.1', 'S.1', 'Cl.1', 'K.1', 'Ca.1', 'Sc', 'Ti.1', 'V', 'Cr.1', 'Mn.1', 'Fe', 'Co', 'Ni.1', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'I', 'Cs', 'Ba', 'La', 'Ce', 'Pr', 'Nd', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu', 'Hf', 'Ta', 'W', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'Th', 'U', 'Nd143/Nd144', 'Nd143/Nd144 initial', 'e Nd', 'Sr87/Sr86', 'SR87_SR86_INI', 'PB206_PB204', 'PB206_PB204_INI', 'PB207_PB204', 'PB207_PB204_INI', 'PB208_PB204', 'PB208_PB204_INI', 'OS184_OS188', 'OS186_OS188', 'OS187_OS186', 'OS187_OS188', 'RE187_OS186', 'RE187_OS188', 'HF176_HF177', 'HE3_HE4', 'HE3_HE4(R/R(A))', 'HE4_HE3', 'HE4_HE3(R/R(A))', 'K40_AR40', 'AR40_K40', 'Unique Id']
Citations Tectonic Setting ... AR40_K40 Unique Id
0 [5205] ARCHEAN CRATON (INCLUDING GREENSTONE BELTS) ... NaN 184947
1 [5205] ARCHEAN CRATON (INCLUDING GREENSTONE BELTS) ... NaN 184948
... ... ... ... ... ...
41 [8624] ARCHEAN CRATON (INCLUDING GREENSTONE BELTS) ... NaN 247031
42 [8624] ARCHEAN CRATON (INCLUDING GREENSTONE BELTS) ... NaN 247032

43 rows × 170 columns

df.iloc[:4]
Citations Tectonic Setting ... AR40_K40 Unique Id
0 [5205] ARCHEAN CRATON (INCLUDING GREENSTONE BELTS) ... NaN 184947
1 [5205] ARCHEAN CRATON (INCLUDING GREENSTONE BELTS) ... NaN 184948
2 [5205] ARCHEAN CRATON (INCLUDING GREENSTONE BELTS) ... NaN 184949
3 [5205] ARCHEAN CRATON (INCLUDING GREENSTONE BELTS) ... NaN 184950

4 rows × 170 columns

df.iloc[:, [4, 17, 99]]
Latitude (Min) Eruption Day Rb
0 8.5 NaN 0.0
1 8.5 NaN 0.0
... ... ... ...
41 9.5 NaN 4.0
42 9.5 NaN 39.0

43 rows × 3 columns

df['Rock Type'].drop_duplicates()
0        VOLCANIC ROCK
23    METAMORPHIC ROCK
38       PLUTONIC ROCK
Name: Rock Type, dtype: object
df.set_index('Rock Type', inplace = True)
# plus the next line
df.loc['METAMORPHIC ROCK']
Citations Tectonic Setting ... AR40_K40 Unique Id
Rock Type
METAMORPHIC ROCK [8624] ARCHEAN CRATON (INCLUDING GREENSTONE BELTS) ... NaN 247013
METAMORPHIC ROCK [8624] ARCHEAN CRATON (INCLUDING GREENSTONE BELTS) ... NaN 247014
... ... ... ... ... ...
METAMORPHIC ROCK [8624] ARCHEAN CRATON (INCLUDING GREENSTONE BELTS) ... NaN 247031
METAMORPHIC ROCK [8624] ARCHEAN CRATON (INCLUDING GREENSTONE BELTS) ... NaN 247032

19 rows × 169 columns

df.loc['METAMORPHIC ROCK', ['Mg', 'Ti']]
Mg Ti
Rock Type
METAMORPHIC ROCK 48243.200 6592.740
METAMORPHIC ROCK 34373.280 10788.120
... ... ...
METAMORPHIC ROCK 151845.472 659.274
METAMORPHIC ROCK 246643.360 1618.218

19 rows × 2 columns

df.reset_index(inplace = True)
df
# plus the next one
Rock Type Citations ... AR40_K40 Unique Id
0 VOLCANIC ROCK [5205] ... NaN 184947
1 VOLCANIC ROCK [5205] ... NaN 184948
... ... ... ... ... ...
41 METAMORPHIC ROCK [8624] ... NaN 247031
42 METAMORPHIC ROCK [8624] ... NaN 247032

43 rows × 170 columns

df.loc[10:21, ['Si', 'Fe', 'Mn']]
Si Fe Mn
10 226705.9750 NaN 1626.3618
11 234699.1135 NaN 1394.0244
... ... ... ...
20 230866.1465 NaN 1626.3618
21 249703.7770 NaN 1936.1450

12 rows × 3 columns

3.5 Basics- Plotting selected data [03:03]

Available Notebooks: Lecture, Exercise, Solution

Zum Schluss dieser Einheit wollen wir die geballte Kraft von Pandas mit der von maplotlib kombinieren. D.h., wir selektieren zunächst mit Pandas die gewünschten Daten aus einer Datenbank, und visualisieren diese umgehend und einfach mit Hilfe der matplotlib Bibliothek.

import mag4 as mg
import pandas as pd
import matplotlib.pyplot as plt
df = mg.get_data('Banda Arc')

data = df.loc[3:9, ['Mg', 'Si', 'Mn']]

el1 = 'Mg'

plt.scatter(data[el1], data['Mn'])
plt.xlabel(el1)
Text(0.5, 0, 'Mg')

pandas and matplotlib.pyplot

True

False

Import the 2 databases we require for plotting, plus the one to load data.
Then, we need a database, which shall be West African Craton, and as usual we want to display some sensible information.
Then display datasets 5 through 25, and flexibly plot two elements against each other that are also shown as lables on the respective axes, as well as a legend indicating what data are shown.
The data are in wt-ppm. Change this to wt%.
If we want to concatenate strings, we can simply do this using + between two strings, e.g.: 'a' + 'b', which will result in 'ab'. Try using this to add units to the axes labelling.

import mag4 as mg
import pandas as pd
import matplotlib.pyplot as plt
df = mg.get_data('West African Craton')

print(df.columns.tolist())

df
['Citations', 'Tectonic Setting', 'Location', 'Location Comment', 'Latitude (Min)', 'Latitude (Max)', 'Longitude (Min)', 'Longitude (Max)', 'Land or Sea', 'Elevation (Min)', 'Elevation (Max)', 'Sample Name', 'Rock Name', 'Age (a, Min)', 'Age (a, Max)', 'Geol', 'Age', 'Eruption Day', 'Eruption Month', 'Eruption Year', 'Rock Texture', 'Rock Type', 'Drill Depth (Min)', 'Drill Depth (Max)', 'Alteration', 'Mineral', 'Material', 'Si', 'Ti', 'B', 'Al', 'Cr', 'Fe3+', 'Fe2+', 'Fetot(2+)', 'Ca', 'Mg', 'Mn', 'Ni', 'K', 'Na', 'P', 'H2O', 'H2OP', 'H2OM', 'H2Otot', 'CO2', 'CO', 'F', 'Cl', 'Cl2', 'OH', 'CH4', 'SO2', 'SO3', 'SO4', 'S', 'LOI', 'Volatiles', 'O', 'Others', 'HE(CCM/G)', 'HE(CCMSTP/G)', 'HE3(CCMSTP/G)', 'HE3(AT/G)', 'HE4(CCM/G)', 'HE4(CCMSTP/G)', 'HE4(AT/G)', 'HE4(MOLE/G)', 'HE4(NCC/G)', 'HE(NCC/G)', 'Li', 'Be', 'B.1', 'C', 'CO2.1', 'F.1', 'Na.1', 'Al.1', 'P.1', 'S.1', 'Cl.1', 'K.1', 'Ca.1', 'Sc', 'Ti.1', 'V', 'Cr.1', 'Mn.1', 'Fe', 'Co', 'Ni.1', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'I', 'Cs', 'Ba', 'La', 'Ce', 'Pr', 'Nd', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu', 'Hf', 'Ta', 'W', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'Th', 'U', 'Nd143/Nd144', 'Nd143/Nd144 initial', 'e Nd', 'Sr87/Sr86', 'SR87_SR86_INI', 'PB206_PB204', 'PB206_PB204_INI', 'PB207_PB204', 'PB207_PB204_INI', 'PB208_PB204', 'PB208_PB204_INI', 'OS184_OS188', 'OS186_OS188', 'OS187_OS186', 'OS187_OS188', 'RE187_OS186', 'RE187_OS188', 'HF176_HF177', 'HE3_HE4', 'HE3_HE4(R/R(A))', 'HE4_HE3', 'HE4_HE3(R/R(A))', 'K40_AR40', 'AR40_K40', 'Unique Id']
Citations Tectonic Setting ... AR40_K40 Unique Id
0 [5205] ARCHEAN CRATON (INCLUDING GREENSTONE BELTS) ... NaN 184947
1 [5205] ARCHEAN CRATON (INCLUDING GREENSTONE BELTS) ... NaN 184948
... ... ... ... ... ...
41 [8624] ARCHEAN CRATON (INCLUDING GREENSTONE BELTS) ... NaN 247031
42 [8624] ARCHEAN CRATON (INCLUDING GREENSTONE BELTS) ... NaN 247032

43 rows × 170 columns

el1 = 'Ti'
el2 = 'Al'

xData = df.loc[5:26, el1]
yData = df.loc[5:26, el2]

plt.scatter(xData, yData, label = 'W-Africa Craton')
plt.xlabel(el1)
plt.ylabel(el2)
plt.legend(loc = 'lower right')

plt.show()

el1 = 'Ti'
el2 = 'Al'

xData = df.loc[5:26, el1] / 10000
yData = df.loc[5:26, el2] / 10000

plt.scatter(xData, yData, label = 'W-Africa Craton')
plt.xlabel(el1 + ' (wt%)')
plt.ylabel(el2 + ' (wt%)')
plt.legend(loc = 'lower right')

plt.show()