7 API Requests und Auswertung [1:03:22]

“Eine besonders effektive und interessante Art auf Daten zuzugreifen ist über ›Application Programming Interfaces‹ – kurz: API. Der Begriff API ist sehr weit verbreitet, und sollte man sich in jedem Fall merken. In einer idealen Welt hätte man gar keine eigenen Datenbanken mehr, sondern würde nur noch über APIs auf Datenbanken zugreifen, sich die gewünschten Daten herunter laden, und diese dann auf dem eigenen Computer auswerten. Ein Großteil des Internets, und viele der erfolgreichsten Apps auf dem Smartphone funktionieren genau so: über ein API werden die gewünschten Daten geladen, und die App macht letztlich nicht mehr als ein Interface zur verfügung zu stellen, um diese Daten entsprechend auszuwählen und darzustellen. Das Interface ist dabei ein Graphical User Interface – kurz GUI. Ebenfalls ein Begriff, den man kennen sollte. Eine GUI ist letztlich nichts anders als das, was wir mit ›Interact‹ kennen gelernt haben. D.h., mit einer API und einer GUI machen wir exakt das, was fast jede unserer App auf dem Smartphone macht. Das hört sich alles großartig an – und ist es auch! Wäre da nicht das eine Problem: viele geowissenschaftliche Datenbanken haben – noch – keine vernünftige API. Glücklicherweise ändert sich daran gerade viel. Fun Fact Wir hier in Frankfurt sind an dieser Änderung recht zentral beteiligt.”

7.1 Basics- Welche online-Datenbanken u.ä. gibt es? [04:39]

Um 2000 herum begannen ein paar erste Initiativen damit mineralogische Datenbanken aufzubauen. Zu den Pionieren gehören sicherlich die GeoROC Datenbank am Max-Planck Institut für Geochemie in Mainz, die EarthChem Datenbank am Lamont-Doherty Earth Observatory, sowie die MetBase, welche von einem privaten Enthusiasten aufgebaut wurde. Die ersten beiden Datenbanken enthalten primär Gesteinsdaten, sind also geochemisch orientiert, wohingegen die MetBase eine kosmoschemische Datenbank mit Meteoriten-Daten ist. Zusätzlich gibt es Konsortien (NFDI4Earth, Research Data Alliance, European Open Science Cloud (EOSC), die selbst keine Datenbanken betreiben, sondern Standards, Schnittstellen, etc. definieren, und einen zentralen Zugangspunkt zu den verschiedenen Datenbanken anbieten. Das gibt es außerdem nicht nur für Daten, sondern auch für Labore (EPOS) oder die Lehre.

7.2 Basics- Dictionary – another type of list [12:36]

{ } (-> Dictionary brackets), pd.DataFrame()
Available Notebooks: Lecture, Exercise, Solution

Daten, die man sich aus online-Datenbanken zieht sind häufig in Listen des Typs Dictionary (z.B. json Formate). Dictionary sind schnell zugänglich und leicht verständlich nachdem wir nun Pandas DataFrames kenne – und können auch direkt in diese umformatiert werden.

Video nb

[1, 2, 3, 4, 5]

[1, 2, 3, 4, 5]

dict1 = {
    'Name': ['Dominik', 'Claudia', 'Chantal', 'Mark'],
    'Height': [190, 192, 176, 181]  # in cm
}

dict1

{'Name': ['Dominik', 'Claudia', 'Chantal', 'Mark'],
 'Height': [190, 192, 176, 181]}

import pandas as pd

pd.DataFrame(dict1)

	Name	Height
0	Dominik	190
1	Claudia	192
2	Chantal	176
3	Mark	181

dict2 = {
    'Name': ['Dominik', 'Claudia', 'Chantal', 'Mark'],
    'body parameters': {'Height': [190, 192, 176, 181], 'Weight': [94, 92, 71, 105], 'eye colour': ['blue', 'green', 'brown']}
}

dict2['body parameters']['eye colour'][0]

'blue'

dict3 = {
    'animal': 'cat',
    'name': 'furbal',
    vaccinated: 'yes',
    'born': 2018
    'legs': 4
}

Name at least two differences between a dictionary and a ‘normal’ list.

curly bracktes – square brackets
have keys – no keys
elements can be accessed using keys – apparently not, as no keys in ‘normal’ lists

Each element of a list consist of two part – what are these called?

Key – Value pairs
The key is before the colon, basically a category name, and the values are single values or a list of values after the colon

Individual ‘normal’ lists in a dictionary need to be of the same length.

✗True

✓False

Each key name in a dictionary needs to be unique.

✗True

✓False

Keys can be a variable or string.

✗True

✓False

Practise

Produce a dictionary with 3 keys and variable lengths of the values
Convert this dictionary into a pandas dataframe
Produce a dictionary with an additional dictionary nested within this dictionary
Extract various elements from this dictionray
Try and convert this nested dictionary into a DataFrame. Try and understand what happened.

Solution

dict = {
    'Name': ['Dominik', 'Claudia', 'Chantal', 'Mark'],
    'Height': [190, 192, 176, 181]
}

# Selecting all Georoc files from mag4
import pandas as pd

pd.DataFrame(dict)

	Name	Height
0	Dominik	190
1	Claudia	192
2	Chantal	176
3	Mark	181

dict2 = {
    'Name': ['Dominik', 'Claudia', 'Chantal', 'Mark'],
    'body parameters': {'Height': [190, 192, 176, 181], 'Weight': [94, 92, 71, 105], 'eye colour': ['blue', 'green', 'brown']}
}

print(dict2['Name'][1])

print(dict2['body parameters']['eye colour'][2])

print(dict2['body parameters']['Weight'][0])

Claudia
brown
94

pd.DataFrame(dict2)

What’s wrong with this code?

dict = {
    'brand': 'Apple',
    'OS': 'macOS',
    memory: '64 GB',
    'year build': 2014
}

Solution

memory needs to be a string

dict = {
    'city': ['Swakopmund', 'Timbuktu', 'Ulan Bator']
    'population': [45000, 54000, 1452000]
    'hair colour': ['black', 'brown', 'red', 'blond', 'grey', 'none']
}

Solution

commas after each entry are missing

for i in [1, 2, 3]
    print(i ** 3)

Solution

colon at the end of ‘for …’ missing

7.3 Command- API requests [20:40]

requests, .get(), json, str()
Available Notebooks: Lecture, Exercise, Solution

Was ist eine API? Ein Application Programming Interface. Das sagt kaum mehr. Es ist eine Schnittstelle. Hilft vielleicht etwas. Angenommen, wir wollen die aktuellen Wetterdaten einer Stadt wissen. Dann können wir eine Anfrage (request) an einen Server stellen, und bekommen die Wetterdaten geliefert. Hört sich einfach an – und ist es tatsächlich auch. In dieser Einheit lernen wir also, wie wir sehr einfach Daten von einem Server in unser Jupyter Notebook laden, und dort darstellen, weiter verarbeiten, visualisieren, etc. können.

Video nb

Be aware that no example results are shown, as the code is not executed. It would require an API key, which I cannot publicly provide here.

import requests, json

city = 'Frankfurt'

url = 'http://api.openweathermap.org/data/2.5/weather?q=' + city + '&appid={apikey}'   #you need to get your own api!

r = requests.get(url)
r

print(r.text)

res = r.json()
res

res['main']['temp'] - 273.15

print('The weather in ' + city + ' is:')
print('Temperature: ' + str(round(res['main']['temp'] - 273.15, 1)))
print('Wind Speed: ' + str(res['wind']['speed']) + ' m/s')
print('The sky looks: ' + str(res['weather'][0]['description']) + '...')

nr = 109359

url = 'https://api.geosamples.org/v1/sample/id/' + str(1000 + nr)

r = requests.get(url)

res = r.json()
res

#nr = 109359

for nr in range(10):
    url = 'https://api.geosamples.org/v1/sample/id/' + str(1000 + nr)

    r = requests.get(url)

    print(r.json())

What is an API?

It is a web-address to connect to a database server to e.g., request data for download. API means application interface.

Write down the general strucutre of an API.

https://api.mineralogydata.org/version1/minerals?q=tourmaline&appid=mystrongpassword123

Database results are best converted into a(n) ______

✗excel file

✗list

✓dictionary

An API is basically a web address.

✓True

✗False

Practise

It seems the geosamples API currently dos no longer works as intended. I’ll try and find a new example soon, e.g., using Wikidata
Find a free/open API, e.g., on: https://www.geosamples.org. If you chose this, use SampleType, and from the list choose the last ‘sample’
Import the required commands
Make the API flexible, i.e., with the above example make the sample name flexible
Use the api url to make a request. Check whether you get a correct response
Select one sample and convert the data in json
Display a sensible selection of the result
Extract longitude and latitude from the 10 first samples into a list. Then from all samples

Now for a challenge: Try and extract longitude and latitude from all samples into a list
You will get an error. Find out what the problem is, solve it – and get the list
Hint You will need the – in a general syntax – ‘if key in dict:’

Solution

import requests, json

sampleType = 'basalt'
url = 'https://api.geosamples.org/v1/samples?name=' + sampleType

r = requests.get(url)
print(r.text)

res = r.json()
res[10]

print(res[10]['description'])
print(res[10]['primaryLocationType'])
print(res[10]['launchPlatformName'])
print(res[10]['sampleComment'])

latLong = []

for i in range(10):
    latLong.append([res[i]['latitude'], res[i]['longitude']])
    
latLong

latLong = []

for i in range(len(res)):
    if 'latitude' in res[i]:
        latLong.append([res[i]['latitude'], res[i]['longitude']])
    
latLong

What’s wrong with this code?

def addTwo(number):
    result = number + 2

result

Solution

res is a local variable. No result will be displayed. Add a line in def ‘return result’ and execute the command name -> ‘addTwo()’ with a value.

import pandas as pd

df = pd.DataFrame{
    'student': ['Miriam', 'Jens', 'Jericho'],
    'city': ['Flörsheim', 'Bad Soden', 'Eschborn'],
    'age': [23, 21, 25]
    }

Solution

Brackets are missing after DataFrame and around the dictionary list -> pd.DataFrame({...})

After you solved the previous, what is the problem with:

import pandas as pd

df = pd.DataFrame({
    'student': ['Miriam', 'Jens', 'Jericho'],
    'city': ['Flörsheim', 'Bad Soden', 'Eschborn'],
    'age': [23, 21, 25]
    })

df.iloc[[0, 2], 'age']

Solution

Must be .loc, not iloc when ‘age’ is used – which is a string.

And here?

df.loc[[0, 3], 'age']

Solution

Only 3 series, not 4, and the count starts with 0, i.e, the third column is 2.

7.4 Basics- Manipulating row, columns, filters (again), etc. using pandas [13:55]

~filter, .isin(), .str.contains(), .index, .value_count(), .rename, export, .to_csv, .to_excel

In dieser zweiten Einheit zu Pandas lernen wir weitere Möglichkeiten um Daten in pandas dataframes zu manipulieren, zählen, exportieren, etc. Es macht jedoch keinen Sinn, in diesem Kurs den vollständigen Umfang von Pandas – oder jeglicher anderer Python-Bibliothek – darzustellen. Vielmehr sucht man sich je nach zu lösender Fragestellung die nötigen Befehle aus dem persönlichen Fundus plus der entsprechenden Recherche in den einschlägigen Ressourcen aus dem Internet zusammen.

Video nb

pip install mag4

Requirement already satisfied: mag4 in /Users/dominik/anaconda3/lib/python3.11/site-packages (0.0.214)
Note: you may need to restart the kernel to use updated packages.

import pandas as pd
import mag4 as mg

df = mg.get_data('Mineral Data Example Dataset')
df

	mineral	oxygens	sample	no.	SiO2	TiO2	Al2O3	Cr2O3	FeO	MnO	...	Al	Cr	Fe	Mn	Ni	Mg	Ca	Na	K	total cations
0	ol	4	chd#2	1	42.51	0.04	0.07	0.04	0.18	0.00	...	0.002	0.001	0.004	0.000	0.000	1.944	0.040	0.000	0.000	2.995
1	ol	4	chd#2	9	42.16	0.10	0.06	0.20	3.14	0.07	...	0.002	0.004	0.063	0.001	0.001	1.910	0.006	0.000	0.000	2.992
2	cpx	6	chd#2	4	52.33	2.00	4.09	0.81	0.32	0.10	...	0.172	0.023	0.010	0.003	0.000	1.037	0.805	0.001	0.000	3.977
3	cpx	6	chd#2	6	51.54	1.74	5.72	0.56	0.67	0.09	...	0.241	0.016	0.020	0.003	0.000	1.019	0.795	0.003	0.000	3.984
4	opx	6	chd#2	8	57.09	0.22	0.76	0.48	1.92	0.08	...	0.031	0.013	0.055	0.002	0.001	1.897	0.025	0.001	0.000	4.002
5	opx	6	chd#2	10	59.19	0.19	0.63	0.51	0.67	0.20	...	0.025	0.014	0.019	0.006	0.001	1.906	0.024	0.001	0.000	3.988
6	cpx	6	chd#2	11	58.33	0.30	0.98	0.53	0.33	0.14	...	0.039	0.014	0.009	0.004	0.000	1.754	0.182	0.001	0.000	3.989
7	cpx	6	chd#2	12	53.87	1.10	3.09	0.51	0.21	0.08	...	0.129	0.014	0.006	0.002	0.000	1.164	0.731	0.000	0.000	3.988
8	fs	8	chd#2	3	47.10	0.06	32.39	0.01	0.21	0.00	...	1.754	0.000	0.008	0.000	0.000	0.073	0.885	0.143	0.000	5.029
9	fs	8	chd#2	5	47.72	0.06	31.87	0.01	0.71	0.03	...	1.732	0.000	0.027	0.001	0.000	0.040	0.834	0.190	0.000	5.026
10	ol	4	chd#3	15	42.72	0.05	1.08	0.11	0.37	0.01	...	0.030	0.002	0.007	0.000	0.000	1.904	0.017	0.024	0.000	2.991
11	ol	4	chd#3	17	42.83	0.03	0.11	0.16	0.45	0.00	...	0.003	0.003	0.009	0.000	0.000	1.961	0.012	0.000	0.000	2.992
12	ol	4	chd#3	18	40.43	0.02	0.03	0.11	11.16	0.00	...	0.001	0.002	0.233	0.000	0.000	1.733	0.009	0.000	0.000	2.989
13	ol	4	chd#3	21	42.46	0.05	0.27	0.10	0.32	0.03	...	0.008	0.002	0.006	0.001	0.000	1.951	0.014	0.000	0.000	2.989
14	cpx	6	chd#3	16	45.09	0.56	22.45	0.36	3.62	0.00	...	0.928	0.010	0.106	0.000	0.000	0.720	0.560	0.033	0.000	3.952
15	cpx	6	chd#3	19	46.72	1.91	22.05	0.76	0.64	0.00	...	0.922	0.021	0.019	0.000	0.001	0.388	0.742	0.034	0.000	3.837
16	fsp/mes	8	chd#3	20	46.36	0.67	27.52	0.26	0.53	0.00	...	1.516	0.010	0.021	0.000	0.005	0.330	0.947	0.053	0.000	5.073
17	fsp/mes	8	chd#3	25	47.78	1.23	22.21	0.69	1.17	0.00	...	1.218	0.025	0.046	0.000	0.000	0.611	0.892	0.106	0.000	5.164
18	sp	4	#24 - CAI	74	0.59	0.27	67.71	0.63	5.54	0.03	...	1.972	0.012	0.115	0.001	0.000	0.863	0.004	0.003	0.000	2.990
19	ol/sp	4	#24 - CAI	76	35.13	0.58	14.39	0.30	8.06	0.05	...	0.428	0.006	0.170	0.001	0.000	1.246	0.111	0.051	0.004	2.913
20	ol/sp	4	#24 - CAI	77	35.52	0.49	25.58	0.29	4.88	0.02	...	0.729	0.006	0.099	0.000	0.000	0.631	0.406	0.057	0.001	2.794
21	sp	4	#24 - CAI	84	3.24	1.36	65.06	0.08	3.05	0.01	...	1.826	0.002	0.061	0.000	0.000	0.984	0.010	0.001	0.000	2.985
22	sp	4	#24 - CAI	85	0.16	0.51	69.36	0.26	2.41	0.00	...	1.988	0.005	0.049	0.000	0.000	0.932	0.002	0.000	0.000	2.990
23	cpx	6	#24 - CAI	75	51.72	0.20	3.27	0.05	0.27	0.02	...	0.142	0.001	0.008	0.001	0.000	0.973	0.959	0.031	0.002	4.031
24	cpx	6	#24 - CAI	81	32.79	1.35	26.06	0.46	2.53	0.03	...	1.150	0.014	0.079	0.001	0.000	0.975	0.632	0.058	0.010	4.185

25 rows × 28 columns

A complementary look into filters
Simple filter

fil = (df['mineral'] == 'ol') | (df['mineral'] == 'sp')
df[fil]

	mineral	oxygens	sample	no.	SiO2	TiO2	Al2O3	Cr2O3	FeO	MnO	...	Al	Cr	Fe	Mn	Ni	Mg	Ca	Na	total cations
0	ol	4	chd#2	1	42.51	0.04	0.07	0.04	0.18	0.00	...	0.002	0.001	0.004	0.000	0.000	1.944	0.040	0.000	2.995
1	ol	4	chd#2	9	42.16	0.10	0.06	0.20	3.14	0.07	...	0.002	0.004	0.063	0.001	0.001	1.910	0.006	0.000	2.992
10	ol	4	chd#3	15	42.72	0.05	1.08	0.11	0.37	0.01	...	0.030	0.002	0.007	0.000	0.000	1.904	0.017	0.024	2.991
11	ol	4	chd#3	17	42.83	0.03	0.11	0.16	0.45	0.00	...	0.003	0.003	0.009	0.000	0.000	1.961	0.012	0.000	2.992
12	ol	4	chd#3	18	40.43	0.02	0.03	0.11	11.16	0.00	...	0.001	0.002	0.233	0.000	0.000	1.733	0.009	0.000	2.989
13	ol	4	chd#3	21	42.46	0.05	0.27	0.10	0.32	0.03	...	0.008	0.002	0.006	0.001	0.000	1.951	0.014	0.000	2.989
18	sp	4	#24 - CAI	74	0.59	0.27	67.71	0.63	5.54	0.03	...	1.972	0.012	0.115	0.001	0.000	0.863	0.004	0.003	2.990
21	sp	4	#24 - CAI	84	3.24	1.36	65.06	0.08	3.05	0.01	...	1.826	0.002	0.061	0.000	0.000	0.984	0.010	0.001	2.985
22	sp	4	#24 - CAI	85	0.16	0.51	69.36	0.26	2.41	0.00	...	1.988	0.005	0.049	0.000	0.000	0.932	0.002	0.000	2.990

9 rows × 28 columns

Get the complementary data from the filter

df[~fil]

	mineral	oxygens	sample	no.	SiO2	TiO2	Al2O3	Cr2O3	FeO	MnO	...	Al	Cr	Fe	Mn	Ni	Mg	Ca	Na	K	total cations
2	cpx	6	chd#2	4	52.33	2.00	4.09	0.81	0.32	0.10	...	0.172	0.023	0.010	0.003	0.000	1.037	0.805	0.001	0.000	3.977
3	cpx	6	chd#2	6	51.54	1.74	5.72	0.56	0.67	0.09	...	0.241	0.016	0.020	0.003	0.000	1.019	0.795	0.003	0.000	3.984
4	opx	6	chd#2	8	57.09	0.22	0.76	0.48	1.92	0.08	...	0.031	0.013	0.055	0.002	0.001	1.897	0.025	0.001	0.000	4.002
5	opx	6	chd#2	10	59.19	0.19	0.63	0.51	0.67	0.20	...	0.025	0.014	0.019	0.006	0.001	1.906	0.024	0.001	0.000	3.988
6	cpx	6	chd#2	11	58.33	0.30	0.98	0.53	0.33	0.14	...	0.039	0.014	0.009	0.004	0.000	1.754	0.182	0.001	0.000	3.989
7	cpx	6	chd#2	12	53.87	1.10	3.09	0.51	0.21	0.08	...	0.129	0.014	0.006	0.002	0.000	1.164	0.731	0.000	0.000	3.988
8	fs	8	chd#2	3	47.10	0.06	32.39	0.01	0.21	0.00	...	1.754	0.000	0.008	0.000	0.000	0.073	0.885	0.143	0.000	5.029
9	fs	8	chd#2	5	47.72	0.06	31.87	0.01	0.71	0.03	...	1.732	0.000	0.027	0.001	0.000	0.040	0.834	0.190	0.000	5.026
14	cpx	6	chd#3	16	45.09	0.56	22.45	0.36	3.62	0.00	...	0.928	0.010	0.106	0.000	0.000	0.720	0.560	0.033	0.000	3.952
15	cpx	6	chd#3	19	46.72	1.91	22.05	0.76	0.64	0.00	...	0.922	0.021	0.019	0.000	0.001	0.388	0.742	0.034	0.000	3.837
16	fsp/mes	8	chd#3	20	46.36	0.67	27.52	0.26	0.53	0.00	...	1.516	0.010	0.021	0.000	0.005	0.330	0.947	0.053	0.000	5.073
17	fsp/mes	8	chd#3	25	47.78	1.23	22.21	0.69	1.17	0.00	...	1.218	0.025	0.046	0.000	0.000	0.611	0.892	0.106	0.000	5.164
19	ol/sp	4	#24 - CAI	76	35.13	0.58	14.39	0.30	8.06	0.05	...	0.428	0.006	0.170	0.001	0.000	1.246	0.111	0.051	0.004	2.913
20	ol/sp	4	#24 - CAI	77	35.52	0.49	25.58	0.29	4.88	0.02	...	0.729	0.006	0.099	0.000	0.000	0.631	0.406	0.057	0.001	2.794
23	cpx	6	#24 - CAI	75	51.72	0.20	3.27	0.05	0.27	0.02	...	0.142	0.001	0.008	0.001	0.000	0.973	0.959	0.031	0.002	4.031
24	cpx	6	#24 - CAI	81	32.79	1.35	26.06	0.46	2.53	0.03	...	1.150	0.014	0.079	0.001	0.000	0.975	0.632	0.058	0.010	4.185

16 rows × 28 columns

If only part of a name is known
.isin()

fil = df['mineral'].isin(['ol','sp','fs'])
df.loc[fil]

	mineral	oxygens	sample	no.	SiO2	TiO2	Al2O3	Cr2O3	FeO	MnO	...	Al	Cr	Fe	Mn	Ni	Mg	Ca	Na	total cations
0	ol	4	chd#2	1	42.51	0.04	0.07	0.04	0.18	0.00	...	0.002	0.001	0.004	0.000	0.000	1.944	0.040	0.000	2.995
1	ol	4	chd#2	9	42.16	0.10	0.06	0.20	3.14	0.07	...	0.002	0.004	0.063	0.001	0.001	1.910	0.006	0.000	2.992
8	fs	8	chd#2	3	47.10	0.06	32.39	0.01	0.21	0.00	...	1.754	0.000	0.008	0.000	0.000	0.073	0.885	0.143	5.029
9	fs	8	chd#2	5	47.72	0.06	31.87	0.01	0.71	0.03	...	1.732	0.000	0.027	0.001	0.000	0.040	0.834	0.190	5.026
10	ol	4	chd#3	15	42.72	0.05	1.08	0.11	0.37	0.01	...	0.030	0.002	0.007	0.000	0.000	1.904	0.017	0.024	2.991
11	ol	4	chd#3	17	42.83	0.03	0.11	0.16	0.45	0.00	...	0.003	0.003	0.009	0.000	0.000	1.961	0.012	0.000	2.992
12	ol	4	chd#3	18	40.43	0.02	0.03	0.11	11.16	0.00	...	0.001	0.002	0.233	0.000	0.000	1.733	0.009	0.000	2.989
13	ol	4	chd#3	21	42.46	0.05	0.27	0.10	0.32	0.03	...	0.008	0.002	0.006	0.001	0.000	1.951	0.014	0.000	2.989
18	sp	4	#24 - CAI	74	0.59	0.27	67.71	0.63	5.54	0.03	...	1.972	0.012	0.115	0.001	0.000	0.863	0.004	0.003	2.990
21	sp	4	#24 - CAI	84	3.24	1.36	65.06	0.08	3.05	0.01	...	1.826	0.002	0.061	0.000	0.000	0.984	0.010	0.001	2.985
22	sp	4	#24 - CAI	85	0.16	0.51	69.36	0.26	2.41	0.00	...	1.988	0.005	0.049	0.000	0.000	0.932	0.002	0.000	2.990

11 rows × 28 columns

Select datasets that contain a specific string in a series

fil = df['mineral'].str.contains('fs')
df[fil]

	mineral	oxygens	sample	no.	SiO2	TiO2	Al2O3	Cr2O3	FeO	MnO	...	Al	Cr	Fe	Mn	Ni	Mg	Ca	Na	total cations
8	fs	8	chd#2	3	47.10	0.06	32.39	0.01	0.21	0.00	...	1.754	0.000	0.008	0.000	0.000	0.073	0.885	0.143	5.029
9	fs	8	chd#2	5	47.72	0.06	31.87	0.01	0.71	0.03	...	1.732	0.000	0.027	0.001	0.000	0.040	0.834	0.190	5.026
16	fsp/mes	8	chd#3	20	46.36	0.67	27.52	0.26	0.53	0.00	...	1.516	0.010	0.021	0.000	0.005	0.330	0.947	0.053	5.073
17	fsp/mes	8	chd#3	25	47.78	1.23	22.21	0.69	1.17	0.00	...	1.218	0.025	0.046	0.000	0.000	0.611	0.892	0.106	5.164

4 rows × 28 columns

An additional selection of selection options
Extract specific data (extractData())

cIdf = mg.get_data('Chondrite Element Abundances')
cIdf

	z	symbol	unit	CI	solar abundance	CM	CV	CO	CK	CR	CH	H	L	LL	EH	EL	R	K
0	1	H	wt%	2.02	2.880000e+10	1.40	0.28	0.07	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	2	He	NaN	NaN	2.290000e+09	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	3	Li	ppm	1.50	5.547000e+01	1.50	1.70	1.80	1.4	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	4	Be	ppm	0.03	7.370000e-01	0.04	0.05	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	5	B	ppm	0.87	1.732000e+01	0.48	0.30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
82	83	Bi	ppb	110.00	1.388000e-01	71.00	54.00	35.00	20.0	40.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
83	89	Ac	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
84	90	Th	ppb	29.00	NaN	41.00	58.00	80.00	58.0	42.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
85	91	Pa	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
86	92	U	ppb	8.00	NaN	12.00	17.00	18.00	15.0	13.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

87 rows × 18 columns

cIdf.loc[84, 'CM']

41.0

Be aware that in the video nb it is called 'Element', not 'symbol', as no required

indexEl = cIdf.index[cIdf['symbol'] == 'Th'][0]
cIdf.loc[indexEl, 'CM']

41.0

def extractData(catCol, elCol, catRow):
    indexEl = cIdf.index[cIdf[catCol] == elCol][0]
    res = cIdf.loc[indexEl, catRow]
    return res

extractData('symbol', 'Al', 'CO')

1.4

List Operations
Count the number of identical elements

df['mineral'].value_counts()

mineral
cpx        8
ol         6
sp         3
opx        2
fs         2
fsp/mes    2
ol/sp      2
Name: count, dtype: int64

Rename Columns

newNames = {'SiO2':'SiO2 (wt%)', 'TiO2':'TiO2 (wt%)'}
df.rename(columns = newNames, inplace = True)
df

	mineral	oxygens	sample	no.	SiO2 (wt%)	TiO2 (wt%)	Al2O3	Cr2O3	FeO	MnO	...	Al	Cr	Fe	Mn	Ni	Mg	Ca	Na	K	total cations
0	ol	4	chd#2	1	42.51	0.04	0.07	0.04	0.18	0.00	...	0.002	0.001	0.004	0.000	0.000	1.944	0.040	0.000	0.000	2.995
1	ol	4	chd#2	9	42.16	0.10	0.06	0.20	3.14	0.07	...	0.002	0.004	0.063	0.001	0.001	1.910	0.006	0.000	0.000	2.992
2	cpx	6	chd#2	4	52.33	2.00	4.09	0.81	0.32	0.10	...	0.172	0.023	0.010	0.003	0.000	1.037	0.805	0.001	0.000	3.977
3	cpx	6	chd#2	6	51.54	1.74	5.72	0.56	0.67	0.09	...	0.241	0.016	0.020	0.003	0.000	1.019	0.795	0.003	0.000	3.984
4	opx	6	chd#2	8	57.09	0.22	0.76	0.48	1.92	0.08	...	0.031	0.013	0.055	0.002	0.001	1.897	0.025	0.001	0.000	4.002
5	opx	6	chd#2	10	59.19	0.19	0.63	0.51	0.67	0.20	...	0.025	0.014	0.019	0.006	0.001	1.906	0.024	0.001	0.000	3.988
6	cpx	6	chd#2	11	58.33	0.30	0.98	0.53	0.33	0.14	...	0.039	0.014	0.009	0.004	0.000	1.754	0.182	0.001	0.000	3.989
7	cpx	6	chd#2	12	53.87	1.10	3.09	0.51	0.21	0.08	...	0.129	0.014	0.006	0.002	0.000	1.164	0.731	0.000	0.000	3.988
8	fs	8	chd#2	3	47.10	0.06	32.39	0.01	0.21	0.00	...	1.754	0.000	0.008	0.000	0.000	0.073	0.885	0.143	0.000	5.029
9	fs	8	chd#2	5	47.72	0.06	31.87	0.01	0.71	0.03	...	1.732	0.000	0.027	0.001	0.000	0.040	0.834	0.190	0.000	5.026
10	ol	4	chd#3	15	42.72	0.05	1.08	0.11	0.37	0.01	...	0.030	0.002	0.007	0.000	0.000	1.904	0.017	0.024	0.000	2.991
11	ol	4	chd#3	17	42.83	0.03	0.11	0.16	0.45	0.00	...	0.003	0.003	0.009	0.000	0.000	1.961	0.012	0.000	0.000	2.992
12	ol	4	chd#3	18	40.43	0.02	0.03	0.11	11.16	0.00	...	0.001	0.002	0.233	0.000	0.000	1.733	0.009	0.000	0.000	2.989
13	ol	4	chd#3	21	42.46	0.05	0.27	0.10	0.32	0.03	...	0.008	0.002	0.006	0.001	0.000	1.951	0.014	0.000	0.000	2.989
14	cpx	6	chd#3	16	45.09	0.56	22.45	0.36	3.62	0.00	...	0.928	0.010	0.106	0.000	0.000	0.720	0.560	0.033	0.000	3.952
15	cpx	6	chd#3	19	46.72	1.91	22.05	0.76	0.64	0.00	...	0.922	0.021	0.019	0.000	0.001	0.388	0.742	0.034	0.000	3.837
16	fsp/mes	8	chd#3	20	46.36	0.67	27.52	0.26	0.53	0.00	...	1.516	0.010	0.021	0.000	0.005	0.330	0.947	0.053	0.000	5.073
17	fsp/mes	8	chd#3	25	47.78	1.23	22.21	0.69	1.17	0.00	...	1.218	0.025	0.046	0.000	0.000	0.611	0.892	0.106	0.000	5.164
18	sp	4	#24 - CAI	74	0.59	0.27	67.71	0.63	5.54	0.03	...	1.972	0.012	0.115	0.001	0.000	0.863	0.004	0.003	0.000	2.990
19	ol/sp	4	#24 - CAI	76	35.13	0.58	14.39	0.30	8.06	0.05	...	0.428	0.006	0.170	0.001	0.000	1.246	0.111	0.051	0.004	2.913
20	ol/sp	4	#24 - CAI	77	35.52	0.49	25.58	0.29	4.88	0.02	...	0.729	0.006	0.099	0.000	0.000	0.631	0.406	0.057	0.001	2.794
21	sp	4	#24 - CAI	84	3.24	1.36	65.06	0.08	3.05	0.01	...	1.826	0.002	0.061	0.000	0.000	0.984	0.010	0.001	0.000	2.985
22	sp	4	#24 - CAI	85	0.16	0.51	69.36	0.26	2.41	0.00	...	1.988	0.005	0.049	0.000	0.000	0.932	0.002	0.000	0.000	2.990
23	cpx	6	#24 - CAI	75	51.72	0.20	3.27	0.05	0.27	0.02	...	0.142	0.001	0.008	0.001	0.000	0.973	0.959	0.031	0.002	4.031
24	cpx	6	#24 - CAI	81	32.79	1.35	26.06	0.46	2.53	0.03	...	1.150	0.014	0.079	0.001	0.000	0.975	0.632	0.058	0.010	4.185

25 rows × 28 columns

Take everything with you
Export a dataframe

df.to_csv('output/test.csv', index = False)
df.to_excel('output/test.xlsx', index = False)

pd.read_excel('output/test.xlsx')

Exercises are in preparation

7.5 Program- Analysing the Eu-anomaly [16:40]

Available Notebooks: Lecture, Exercise, Solution

Zum Abschluss der Einheit wollen wir ein konkretes Beispiel für Data Science anschauen. Die Eu- Anomalie kennen vermutlich alle – sonst schnell mal nachschauen, was das ist. Wie sie berechnet wird, wird jedoch im Video erklärt. Am Ende werden wir ein sehr kurzes Programm haben, mit dem sich (i) die Eu-Anomalie analysieren lässt, und das (ii) sehr flexibel ist, d.h. mit dem wir nun prinzipiell die Eu-Anomalie jeder beliebigen Datenbank analysieren können.

Video nb

Import libraries and read the file

import pandas as pd
import matplotlib.pyplot as plt
import mag4 as mg

df = mg.get_data('Tanzania Craton Archean')

Data selection and clean-up
*Be aware that the syntax for the dropna() argument changed. Instead of dropna(0) it is now dropna(axis=0).

cISmEuGd = [.15, .057, .2]

smEuGdData = df.loc[:, ['Sm', 'Eu', 'Gd']].dropna(axis=0) / cISmEuGd

smEuGdData

	Sm	Eu	Gd
11	34.600000	25.789474	21.10
12	28.066667	19.649123	19.80
13	41.933333	29.473684	27.35
14	49.266667	32.280702	30.60
15	26.800000	20.526316	18.50
...	...	...	...
230	49.200000	33.508772	28.60
231	54.200000	37.017544	28.55
232	52.400000	34.035088	29.15
233	47.800000	30.877193	27.25
234	45.333333	28.947368	25.55

206 rows × 3 columns

len(smEuGdData)

Calculate the anomaly and convert to a pandas dataframe

res = []

for n in range(len(smEuGdData)):
    l = smEuGdData.iloc[n].tolist()
    euAn = ((l[1] / ((l[0] + l[2]) / 2)) - 1) * 100
    res.append(round(euAn))


res

[-7,
 -18,
 -15,
 -19,
 -9,
 -9,
 -17,
 -11,
 -12,
 -4,
 -10,
 -3,
 -15,
 -9,
 -8,
 -15,
 -12,
 -10,
 -17,
 -11,
 -14,
 -10,
 0,
 -11,
 -15,
 -10,
 -17,
 -21,
 -7,
 -15,
 -12,
 -8,
 -13,
 -12,
 -10,
 -18,
 -19,
 -14,
 -10,
 -22,
 -22,
 -17,
 1,
 1,
 -60,
 0,
 -3,
 5,
 -4,
 -9,
 1,
 1,
 0,
 6,
 0,
 7,
 11,
 -14,
 -3,
 -3,
 0,
 -1,
 5,
 -7,
 -5,
 13,
 1,
 -6,
 -3,
 1,
 14,
 -20,
 14,
 42,
 0,
 7,
 -19,
 58,
 8,
 6,
 7,
 3,
 -8,
 8,
 -16,
 7,
 22,
 -2,
 0,
 35,
 3,
 10,
 -31,
 -29,
 -15,
 -11,
 -7,
 -5,
 -3,
 -7,
 13,
 12,
 -6,
 -7,
 -2,
 3,
 1,
 3,
 0,
 -3,
 -25,
 -36,
 -4,
 -9,
 -38,
 -33,
 -38,
 -14,
 -41,
 -29,
 -24,
 -36,
 -26,
 -21,
 -9,
 -32,
 -48,
 -22,
 -15,
 -35,
 -43,
 -20,
 22,
 -19,
 -27,
 -10,
 -11,
 -1,
 -92,
 -98,
 -98,
 -83,
 6,
 -2,
 6,
 -2,
 17,
 7,
 0,
 8,
 8,
 -1,
 12,
 -2,
 3,
 3,
 -1,
 -12,
 -4,
 23,
 9,
 7,
 12,
 -3,
 18,
 19,
 20,
 4,
 6,
 1,
 3,
 13,
 -10,
 -5,
 14,
 -25,
 -34,
 -56,
 -44,
 9,
 -73,
 -70,
 -52,
 -38,
 -25,
 -31,
 -54,
 -33,
 -42,
 -68,
 -11,
 -48,
 -12,
 -20,
 -33,
 -48,
 -54,
 -52,
 -59,
 -13,
 -8,
 -14,
 -11,
 -17,
 -18,
 -18]

Visualise the data

plt.plot(res)

All in one (or at least the last 3 parts in one)

cISmEuGd = [.15, .057, .2]
smEuGdData = df.loc[:, ['Sm', 'Eu', 'Gd']].dropna(axis=0) / cISmEuGd

res = []

for n in range(len(smEuGdData)):
    l = smEuGdData.iloc[n].tolist()
    euAn = ((l[1] / ((l[0] + l[2]) / 2)) - 1) * 100
    res.append(round(euAn))
    
plt.plot(res)
plt.show()

Your opinion is required: With regard to a program like analysing the Eu-anomaly: how much time and skill do you think is requried to preciscly define the calculation and how it is executed? and how much time and skill do you think is required to write the code?

I think that understanding and preparing the actual calculation, i.e., everything before coding requires great attention and time. The code might still also take some time. But without concentrated preparation, coding will quickly become futile.

Q1. In the .dropna() method: how do you delete all datasets where all values are missing? -> This is a bit tricky, as there was a recent change in how the syntax needs to be used.

✓.dropna(axis=0, how=‘any’)

✗.dropna()

Practise

We now want to calculate the anomaly of every REE element.
The element of interest shall be selectable from a drop-down menu.
Start with importing the required libraries and/or commands. Below we will ned the REE CI concentrations. Start with extracting these.
To do this, read in the CI data from 'Chondrite Element Abundances' using mag4 into the variable ‘cIdf’.
Define a command to find the index of a specific element.
(Maybe store this handy posRow() command in your 'mineralogyModule.py'.)
Use this command to select only the datasates with the SEE.
Extract the CI REE (-> from La to Lu) concentrations as a list from the imported chondrite element abundances file. Make sure you extracted the correct values.
Delete Pm from this list.
Below follows the program from the corresponding unit. Convert this program into a command called 'calcREEAnomaly'. Input to the command shall only be the REE element for which the anomlay shall be calculated, and the dataset, for which the anomaly shall be calculated.
As this is not trivial, you might want to have a peak at the solution.
As we only have the REE for which the anomaly shall be calculated, we first need modify the program to accomodate for this.
First, produce a list of the names of the REE without Pm.
Using the input REE, produce the list of the 3 REE called ‘threeREE’ required for calculating the anomaly.
You can then do the same with the cIREE list, i.e., selcet the REE values for these elements.

Then modify the program above:
(i) so that the 3 REE are now automaticall chosen depending on the single REE input.
(ii) so that the 3 REE CI concentrations are now automaticall chosen depending on the single REE input.

Finally you can convert this program into the command 'calcREEAnomaly' that calculates the anomaly of every REE (except for two, do you know which?), depending on user input.
Read in the 'West African Cratons' dataset as df.
Finally test your freshly defined command with e.g., calcREEAnomaly('Eu').
Expand your command so it displays a graph rather than a list of values.
Test wheter it works.
‘mineralogyModule’. (Maybe add this command to your 'mineralogyModule.py'.)
And for easier usage, make it interactive, where you can choose the element from a drop-down menu – that displays only thos elements, for which anomalies can be calculated.
(You might occassionally get an error, don’t worry, just try another element. Most should work. The error is most likely because of some missing or zero values. These could be avoided with accoriding ‘if’ statements and some kind of informative message.)

Solution

import pandas as pd

Be aware that the key is no longer 'Element', but 'symbol'

cIdf = mg.get_data('Chondrite Element Abundances')

def posRow(name):
    return cIdf.index[cIdf['symbol'] == name][0]

cIdf.loc[posRow('La'):posRow('Lu')]

	z	symbol	unit	CI	solar abundance	CM	CV	CO	CK	CR	CH	H	L	LL	EH	EL	R	K
56	57	La	ppm	0.235	0.44200	0.320	0.469	0.380	0.460	0.310	0.290	0.301	0.318	0.330	0.240	0.196	0.310	0.320
57	58	Ce	ppm	0.620	1.19100	0.940	1.190	1.140	1.270	0.750	0.870	0.763	0.970	0.880	0.650	0.580	0.830	NaN
58	59	Pr	ppm	0.094	0.17310	0.137	0.174	0.140	NaN	NaN	NaN	0.120	0.140	0.130	0.100	0.070	NaN	NaN
59	60	Nd	ppm	0.460	0.83550	0.626	0.919	0.850	0.990	0.790	NaN	0.581	0.700	0.650	0.440	0.370	NaN	NaN
60	61	Pm	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
61	62	Sm	ppm	0.150	0.25420	0.204	0.294	0.250	0.290	0.230	0.185	0.194	0.203	0.205	0.140	0.149	0.180	0.200
62	63	Eu	ppm	0.057	0.09599	0.078	0.105	0.096	0.110	0.080	0.076	0.074	0.080	0.078	0.052	0.054	0.072	0.080
63	64	Gd	ppm	0.200	0.33880	0.290	0.405	0.390	0.440	0.320	0.290	0.275	0.317	0.290	0.210	0.196	NaN	NaN
64	65	Tb	ppm	0.037	0.05701	0.051	0.071	0.060	NaN	0.050	0.050	0.049	0.059	0.054	0.034	0.032	NaN	NaN
65	66	Dy	ppm	0.250	0.39220	0.332	0.454	0.420	0.490	0.280	0.310	0.305	0.372	0.360	0.230	0.245	0.029	NaN
66	67	Ho	ppm	0.056	0.08666	0.077	0.097	0.096	0.100	0.100	0.070	0.074	0.089	0.082	0.050	0.051	0.059	NaN
67	68	Er	ppm	0.160	0.25040	0.221	0.277	0.305	0.350	NaN	NaN	0.213	0.252	0.240	0.160	0.160	NaN	NaN
68	69	Tm	ppm	0.025	0.03700	0.035	0.048	0.040	NaN	NaN	0.040	0.033	0.038	0.035	0.024	0.023	NaN	NaN
69	70	Yb	ppm	0.160	0.24840	0.215	0.312	0.270	0.320	0.220	0.210	0.203	0.226	0.230	0.154	0.157	0.216	0.215
70	71	Lu	ppm	0.025	0.03738	0.033	0.046	0.039	0.046	0.032	0.030	0.033	0.034	0.034	0.025	0.025	0.032	0.033

cIREE = cIdf.loc[posRow('La'):posRow('Lu'), 'CI'].tolist()

cIREE.pop(4)

cIREE

[0.235,
 0.62,
 0.094,
 0.46,
 0.15,
 0.057,
 0.2,
 0.037,
 0.25,
 0.056,
 0.16,
 0.025,
 0.16,
 0.025]

cISmEuGd = [.15, .057, .2]
smEuGdData = df.loc[:, ['Sm', 'Eu', 'Gd']].dropna(axis=0) / cISmEuGd

res = []

for n in range(len(smEuGdData)):
    l = smEuGdData.iloc[n].tolist()
    euAn = ((l[1] / ((l[0] + l[2]) / 2)) - 1) * 100
    res.append(round(euAn))

Be aware that the key is no longer 'Element', but 'symbol'

ree = cIdf.loc[posRow('La'):posRow('Lu'), 'symbol'].tolist()

ree.pop(4)
ree

['La',
 'Ce',
 'Pr',
 'Nd',
 'Sm',
 'Eu',
 'Gd',
 'Tb',
 'Dy',
 'Ho',
 'Er',
 'Tm',
 'Yb',
 'Lu']

# input REE
el = 'Eu'

# either
leftREE = ree[ree.index(el) - 1]
rightREE =ree[ree.index(el) + 1]

threeREE = [leftREE, el, rightREE]
print(threeREE)

# or
threeREE = [ree[ree.index(el) - 1], el, ree[ree.index(el) + 1]]

print(threeREE)

['Sm', 'Eu', 'Gd']
['Sm', 'Eu', 'Gd']

el = 'Eu'

cISmEuGd = [cIREE[ree.index(el) - 1], cIREE[ree.index(el)], cIREE[ree.index(el) + 1]]
threeREE = [ree[ree.index(el) - 1], el, ree[ree.index(el) + 1]]

smEuGdData = df.loc[:, threeREE].dropna(axis=0) / cISmEuGd

res = []

for n in range(len(smEuGdData)):
    l = smEuGdData.iloc[n].tolist()
    euAn = ((l[1] / ((l[0] + l[2]) / 2)) - 1) * 100
    res.append(round(euAn))

def calcREEAnomaly(el):
    cISmEuGd = [cIREE[ree.index(el) - 1], cIREE[ree.index(el)], cIREE[ree.index(el) + 1]]
    threeREE = [ree[ree.index(el) - 1], el, ree[ree.index(el) + 1]]

    smEuGdData = df.loc[:, threeREE].dropna(axis=0) / cISmEuGd

    res = []

    for n in range(len(smEuGdData)):
        l = smEuGdData.iloc[n].tolist()
        euAn = ((l[1] / ((l[0] + l[2]) / 2)) - 1) * 100
        res.append(round(euAn))
    
    return res

df = mg.get_data('West African Craton')

calcREEAnomaly('Eu')

[-13,
 2,
 6,
 21,
 -65,
 -11,
 -23,
 15,
 -64,
 -46,
 12,
 -5,
 13,
 12,
 12,
 10,
 0,
 7,
 -2,
 -5,
 12,
 -3,
 -1]

import matplotlib.pyplot as plt

def calcREEAnomaly(el):
    cISmEuGd = [cIREE[ree.index(el) - 1], cIREE[ree.index(el)], cIREE[ree.index(el) + 1]]
    threeREE = [ree[ree.index(el) - 1], el, ree[ree.index(el) + 1]]

    smEuGdData = df.loc[:, threeREE].dropna(axis=0) / cISmEuGd

    res = []

    for n in range(len(smEuGdData)):
        l = smEuGdData.iloc[n].tolist()
        euAn = ((l[1] / ((l[0] + l[2]) / 2)) - 1) * 100
        res.append(round(euAn))
    
    plt.plot(res)
    
    return plt.show()

calcREEAnomaly('Sm')

from ipywidgets import interact

interact(calcREEAnomaly, el = ['Ce', 'Pr', 'Nd', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb'])

<function __main__.calcREEAnomaly(el)>

What’s wrong with this code?

import matplotlib.pyplot as plt

xData = [2, 5, 3, 6, 9]
yData = [6, 2, 7, 4, 1]

plot(xData, yData)

show()

Solution

plt.’ is missing before ‘plot(…)’ and ‘show()’

l = [4, 5, 7, 2, 1]

for i in range[len[l]]:
    print(i + 4)

Solution

range and len need round brackets -> range(len(l))