6 Statistik II: Regression & Darstellung von Fehlerbalken [40:29]

stabw, stderr, gausskurve,

6.1 Basics- Function plotting with the numpy package [09:45]

numpy, .linspace(), .sin(), .log(), .power(), warnings
Available Notebooks: Lecture, Exercise, Solution

Bislang haben wir Daten aus Datenbanken oder Datensätzen in Diagrammen dargestellt und ausgewertet. Letztlich sehen wir bei einer solchen Daten-Darstellung immer den letzten Zustand eines bestimmten Prozesses – oder, noch präziser formuliert: ein Prozess wird zu einem bestimmten Zeitpunkt eingefroren, und die Daten beschreiben diesen eingefrorenen Zustand. Ganz direkt trifft das bei einem Magmatit zu, der ja tatsächlich nichts anderes als gefrorenes Magma ist. Unser Ziel ist es, anhand der Daten des eingefrorenen Prozesses den Prozess des Magmas davor zu rekonstruieren. Das machen wir anhand von Modellen wie z.B. fraktionierte Kristallisation, aber auch Datierung, mit Hilfe einer eingefrorenen Isotopenzusammensetzung. Ein Prozess wird mit einer Gleichung beschrieben, welche über eine Funktion in einem Diagramm dargestellt wird. Dazu schauen wir uns an, wie man eine Funktion in Jupyter Notebooks darstellt.

Video nb

import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact
import warnings
warnings.filterwarnings('ignore')
#warnings.filterwarnings(action = 'once')

x = np.linspace(0, 10, 200)
y1 = np.sin(x)
y2 = np.log(x)
y3 = np.power(x, 0.6)

plt.plot(x, y1)
plt.plot(x, y2)
plt.plot(x, y3)

plt.show()

def complexModel(a, b, c):
    x = np.linspace(0, 10, 200)
    h = a * 9/5 - b
    y = h * np.sin(.3 * np.power(x, 1.2) - c * np.log(x))
    plt.plot(x, y)
    return plt.show()

interact(complexModel, a = (0, 5), b = (2, 7, .2), c = (1, 5, .1))

<function __main__.complexModel(a, b, c)>

Describe the principle how functions are plotted.

First, x-axis values are defined, e.g., using the numpy .linspace() method. Second, the y-axis values are calculated using the function, applied to the just defined the x-axis vlaues.

The ‘warnings’ package allows to suppres calculations such as dividing by 0.

✗True

✓False

A function is plotted on the interval defined by plt.xlim(xmin, xmax)

✗True

✓False

Practise

Search for the equation for the radioactive decay and plot this function.
Add x- and y-labels to the plot.

Solution

import numpy as np
import matplotlib.pyplot as plt

Rb87HalfLife = 48.8  #in Ga
Rb87lambda = np.log(2)/Rb87HalfLife #log in numpy is Ln!! 

#Euler's number in numpy is 'exp' – as can be seen with this example:
print(np.exp(1))

t = np.linspace(0, 100, 200)
y = np.exp(-Rb87lambda * t)

plt.plot(t, y)
plt.xlabel('t (Ga)')
plt.ylabel('Remaining $^{87}$Rb')

plt.show()

2.718281828459045

What’s wrong with this code?

The goal is to plot the batch melting function, with the melt degree F along the x-axis. The equation for batch melting is:

\[ c_l = \frac{c_0}{F + D * (1 - F)}\]

import numpy as np
import matplotlib.pyplot as plt

c0 = 23.86
D = 3.2

x = linspace(0, 1, 75)
y = c0 / (F + D * (1 - F))

plt.plot(x, y)

show(plt)

Solution

coming

6.2 Program- Rayleigh isotope fractionation [15:08]

.axhline()
Available Notebooks: Lecture, Exercise, Solution

Rayleigh fractionation describes the isotope composition of the various phases during evaporation and condensation, with the following parameters:
f (0…1): Fraction of the gas (condensation) or liquid (evaporation)
\(\alpha\): Fraction factor
R_l,0: Initial bulk liquid composition
R_v,0: Initial bulk gas composition
R_v: Composition in the remining gas
R_l: Composition of the current liquid
R_v,0: Initial bulk gas composition
R_l: Composition of the entire, condensed liquid

Evaporation Equations
Isotope ratio of the remaining liquid: \[ R_l = R_{l_0} f^{\frac{1}{\alpha} -1}\]
Isotope ratio of the currently evaporating gas: \[ R_v = R_{l_0} \frac{1}{\alpha} f^{\frac{1}{\alpha} -1}\]
Isotope ratio of the evaporated gas so far: \[ \overline{R_v} = R_{l_0} \frac{1 - f^{\frac{1} {\alpha}}} {1 - f} \]

Condensation Equations
Isotope ratio of the remaining gas phase: \[ R_v = R_{v_0} f^{\alpha -1}\]
Isotope ratio of the currently evaporating liquid: \[ R_l = R_{v_0} \alpha f^{\alpha -1}\]
Isotope ratio of the evaporated liquid so far: \[ \overline{R_l} = R_{v_0} \frac{1 - f^\alpha} {1 - f} \]

Video nb

import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact

import warnings
warnings.filterwarnings('ignore')
#warnings.filterwarnings(action='once')

def pEvapCond(alpha, Rl0):
    f = np.linspace(0, 1, 2000)

    yEvap1 = Rl0 * np.power(f, (1/alpha - 1))
    yEvap2 = Rl0 * (1/alpha) *np.power(f, (1/alpha - 1))
    yEvap3 = Rl0 * (1 - np.power(f, 1/alpha))/(1 - f)
    yCond1 = Rl0 * np.power(f, alpha - 1)
    yCond2 = Rl0 * alpha * np.power(f, alpha - 1)
    yCond3 = Rl0 * (1 - np.power(f, alpha))/(1 - f)

    fig, (ax1, ax2) = plt.subplots(1, 2, sharey = True, figsize = (9, 3))

    ax1.plot(f, yEvap1, f, yEvap2, f, yEvap3)
    ax2.plot(f, yCond1, f, yCond2, f, yCond3)

    ax1.set_xlim(0, 1)
    ax2.set_xlim(0, 1)
    ax1.set_ylim(0, 2)
    ax2.set_ylim(0, 2)

    ax1.set_xlabel('f (fraction liquid)')
    ax2.set_xlabel('f (fraction gas)')
    ax1.set_ylabel('isotope ratio')
    ax1.set_title('Evaporation', color = 'b')
    ax2.set_title('Condensation', color = 'b')

    ax1.legend(['remaining liquid', 'currently evap. gas', 'bulk evap. gas'])
    ax2.legend(['remaining gas', 'currently cond. liquid', 'bulk cond. liquid'])

    ax1.axhline(y = 1, color = 'grey', linestyle = '--', lw = 1)
    ax2.axhline(y = 1, color = 'grey', linestyle = '--', lw = 1)

    return plt.show()

interact(pEvapCond, alpha = (.1, 2), Rl0 = (0, 5,.01))

<function __main__.pEvapCond(alpha, Rl0)>

How can multiple functions be plotted in a single plt.plot(…) command?

The value 3.141 is assigened to the variable pi, the value 31.432 is assigened to the variable val, and the string ‘imaginative’ is assigened to the variable param.

A function (e.g., R0 * np.power(f, alpha - 1)) can be directly placed into the plt.plot(…) command.

✓True

✗False

Variables from interactive controls can be used for multiple plots.

✓True

✗False

Practise

Show an interactive decay plot with the half-life as variable. In a plot next to this, shwo the increase of the daughter isotope.
The x-axis shall go from 0 to the age of the Earth.
Suppress the appearing warnings.

Solution

import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact
import warnings
warnings.filterwarnings('ignore')

def radDecay(halfLife):
    t = np.linspace(0, 4.56)  #t in Ga

    decay = np.exp(-np.log(2)/halfLife * t)
    growth = 1 - decay

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize = (9, 3))

    ax1.plot(t, decay)
    ax2.plot(t, growth)
    ax1.set_xlabel('t (Ga)')
    ax2.set_xlabel('t (Ga)')
    ax1.set_ylabel('decay')
    ax2.set_ylabel('growth')
    ax1.set_xlim(0, 4.56)
    ax2.set_xlim(0, 4.56)
    ax1.set_ylim(0, 1.1)
    ax2.set_ylim(0, 1.1)

    return plt.show()

interact(radDecay, halfLife = (0, 5, .1))

<function __main__.radDecay(halfLife)>

What’s wrong with this code?

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10)
y1 = np.sin(x)
y2 = np.cos(x)

fig, (ax1, ax2) = plt.subplot(1, 2, figsize = (9, 6))

ax1 = plt.subplots(2, 2, 1)
ax2 = plt.subplots(2, 2, 1)

plt.ax1(x, y1)
plt.ax2(x, y2)

plt.show()

Solution

coming

6.3 Command- Linear Regression [09:02]

scipy, .stats, linregress(),
Available Notebooks: Lecture, Exercise, Solution

Wir haben die Daten-Darstellung kennen gelernt, die Modellierung, auch Einblicke in die Daten- Analyse erhalten – und natürlich gibt es ausgiebig Packages zu etwas statistischen Analysen. Beispielhaft möchte ich die lineare Regression zeigen, die wir sehr häufig in der analytischen Mineralogie verwenden. Von dort ausgehend lassen sich leicht Packages und Methoden für die eigenen Problemstellungen finden. Und letztlich sind Methoden wie die lineare Regression nichts anderes als Funktionen, welche nicht mehr programmiert werden müssen, sondern schon vorgefertigt in Packages heraus verwendet werden können.

Video nb

import mag4 as mg
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

df = mg.get_data('chdfeisoexdat')
df.columns.tolist()

['Sample', 'n', 'δ56/54Fe', '2 sd (56)', 'δ57/54Fe', '2 sd (57)']

plt.scatter(df['δ56/54Fe'], df['δ57/54Fe'])

slope, intercept, rvalue, pvalue, stderr = stats.linregress(df['δ56/54Fe'], df['δ57/54Fe'])
slope, intercept, rvalue, pvalue, stderr

(1.5028159607866804,
 -0.006918546227736001,
 0.9907863083403945,
 2.350657702586129e-36,
 0.03248066905659502)

xReg = np.linspace(-.8, .4)
yReg = slope * xReg + intercept

plt.plot(xReg, yReg, '--', color = 'grey', lw = 2, zorder = 1)
plt.scatter(df['δ56/54Fe'], df['δ57/54Fe'], zorder = 2)

What is the result of: pi, val, param = [3.141, 31.432, ‘imaginative’]

The value 3.141 is assigened to the variable pi, the value 31.432 is assigened to the variable val, and the string ‘imaginative’ is assigened to the variable param.

stats is a Python Package

✗True

✓False

What is the argument called to define the layer on which data are drawn?

✗order

✗xorder

✗yorder

✓zorder

Practise

A challenge
Make an interactive plot in wich you can select a csv-file out of all csv-files from a folder using a drop-dwon menu.
Make drop-downs for the REE that you extract from one file.
Also plot a regression through the data.

Solution

import mag4 as mg
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact, widgets
from scipy import stats

# Selecting all Georoc files from mag4
df_mag4 = mg.available_datasets()
fil = df_mag4['Source'] == 'Georoc'
df_mag4_georoc = df_mag4[fil][['Source', 'Short Title', 'Title']]
df_mag4_georoc

	Source	Short Title	Title
0	Georoc	eastsearid	Easter Seamount Chain Salas Y Gomez Ridge
4	Georoc	bastcrat	Bastar Craton
6	Georoc	ninetyrid	Ninetyeast Ridge
9	Georoc	emei	Emeishan
10	Georoc	mcdis	McDonald Islands
11	Georoc	namcord	North American Cordillera - Paleozoic
15	Georoc	karaf	Karoo Province - Africa
18	Georoc	galis	Galapagos Islands
20	Georoc	hybibplat	Hyblean or Iblean Plateau, Sicily
27	Georoc	baizone	Baical Rift Zone
28	Georoc	westafcrat	West African Craton
29	Georoc	bandarc	Banda Arc
30	Georoc	tanzcratarch	Tanzania Craton Archean

filesList = df_mag4_georoc['Title'].tolist()

elList = mg.get_data(filesList[0]).columns.tolist()[118:132]

filesList, elList

(['Easter Seamount Chain Salas Y Gomez Ridge',
  'Bastar Craton',
  'Ninetyeast Ridge',
  'Emeishan',
  'McDonald Islands',
  'North American Cordillera - Paleozoic',
  'Karoo Province - Africa',
  'Galapagos Islands',
  'Hyblean or Iblean Plateau, Sicily',
  'Baical Rift Zone',
  'West African Craton',
  'Banda Arc',
  'Tanzania Craton Archean'],
 ['La',
  'Ce',
  'Pr',
  'Nd',
  'Sm',
  'Eu',
  'Gd',
  'Tb',
  'Dy',
  'Ho',
  'Er',
  'Tm',
  'Yb',
  'Lu'])

We usually use interact(sc_plot, xEl=('Si', elements), yEl=elements). I modified this slightly using widgets (a package I import above together with interact), so an initial value for the dropdown menues can be preselected. There is no error in this modified, new part.

def dataWithRegression(file, el1, el2):
    df = mg.get_data(file)

    #dropna is critical, as regression does not work, should there be cells with no data
    data = df.loc[:, [el1, el2]].dropna()

    xAxis = data[el1]
    yAxis = data[el2]

    #Using these min and max methods makes the code even more flexible
    xAxisValues = np.linspace(min(xAxis), max(xAxis))

    slope, intercept, rvalue, pvalue, stderr = stats.linregress(xAxis, yAxis)

    plt.scatter(xAxis, yAxis, marker = 6, s = 70, color = 'teal', zorder = 2)
    plt.plot(xAxisValues, slope * xAxisValues + intercept, '--', color = 'grey', zorder = 1)
    plt.xlabel(el1)
    plt.ylabel(el2)
    return plt.show()

interact(dataWithRegression, file = filesList, el1=widgets.Dropdown(options=elList, value='La'), el2=widgets.Dropdown(options=elList, value='Ce'))

<function __main__.dataWithRegression(file, el1, el2)>

What’s wrong with this code?

import pandas as pd
import matplotlib.pyplot as plt

df = mg.get_data('Hyblean or Iblean Plateau, Sicily')

df.iloc[:, ('Mn', 'Al')].dropna

Solution

coming

a, b, c = [3, 1, 7]

if a == 3:
    print(b)
else c = 4:
    print(b)
else b > a
    print('what a stupid code')

Solution

coming

for i in range.4:
    def addTwo(i)
        i + 2
    return i

Solution

coming

6.4 Command- Plot data points with error bars [06:34]

.errorbar()
Available Notebooks: Lecture, Exercise, Solution

Einer der wichtigsten Parameter der Analytik sind deren Fehler und Standardabweichung in allen Variation (intern, extern, long-term, etc.). Natürlich lassen sich Fehler einfach und mit hoher Granularität in Abbildungen darstellen. Auch Fehler-Umhüllende sind einfach darstellbar, auf die ich einmal nicht eingehen werde, sondern nur darauf, Fehlerbalken den Datenpunkten einfach hinzuzufügen.

Video nb

import mag4 as mg
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

df = mg.get_data('chdfeisoexdat')
df.columns.tolist()

['Sample', 'n', 'δ56/54Fe', '2 sd (56)', 'δ57/54Fe', '2 sd (57)']

slope, intercept, rvalue, pvalue, stderr = stats.linregress(df['δ56/54Fe'], df['δ57/54Fe'])
xReg = np.linspace(-.8, .4)
yReg = slope * xReg + intercept

xData = 'δ56/54Fe'
yData = 'δ57/54Fe'

plt.plot(xReg, yReg, '--', color = 'grey', lw = 2, zorder = 1)
plt.errorbar(df[xData], df[yData]
             ,xerr = .05 * abs(df['δ56/54Fe'])
             ,yerr = abs(df['2 sd (57)'])
             ,fmt = 'o'
             ,color = 'black'
             ,elinewidth = .6
             ,capsize = 2
             ,zorder = 2)
plt.xlabel(xData)
plt.ylabel(yData)

Text(0, 0.5, 'δ57/54Fe')

How do .scatter() and .errorbar() differe from each other?

The basic structure is actually very similar. In fact, replacing .scatter() by .errorbar() does not change anything. The difference is that .errobar() takes additional arguments such as xerr, yerr and styling arguments for the error bars.

Error bars can only be absolute or relative.

✓True

✗False

The method ‘errorbar’ requires the import of scipy.

✗True

✓False

Practise

Make an interactive plot in wich you can select a csv-file out of all csv-files from a folder using a drop-dwon menu.
Make drop-downs for the REE that you extract from one file.
Plot the data with errorbars. The error shall be selectable with a slider. The error shall be a relative error, between 0 and 20% of the data value.

Solution

import mag4 as mg
import pandas as pd
import matplotlib.pyplot as plt
from ipywidgets import interact, widgets

# Selecting all Georoc files from mag4
df_mag4 = mg.available_datasets()
fil = df_mag4['Source'] == 'Georoc'
df_mag4_georoc = df_mag4[fil][['Source', 'Short Title', 'Title']]
df_mag4_georoc

	Source	Short Title	Title
0	Georoc	eastsearid	Easter Seamount Chain Salas Y Gomez Ridge
4	Georoc	bastcrat	Bastar Craton
6	Georoc	ninetyrid	Ninetyeast Ridge
9	Georoc	emei	Emeishan
10	Georoc	mcdis	McDonald Islands
11	Georoc	namcord	North American Cordillera - Paleozoic
15	Georoc	karaf	Karoo Province - Africa
18	Georoc	galis	Galapagos Islands
20	Georoc	hybibplat	Hyblean or Iblean Plateau, Sicily
27	Georoc	baizone	Baical Rift Zone
28	Georoc	westafcrat	West African Craton
29	Georoc	bandarc	Banda Arc
30	Georoc	tanzcratarch	Tanzania Craton Archean

filesList = df_mag4_georoc['Title'].tolist()

elList = mg.get_data(filesList[0]).columns.tolist()[118:132]

filesList, elList

(['Easter Seamount Chain Salas Y Gomez Ridge',
  'Bastar Craton',
  'Ninetyeast Ridge',
  'Emeishan',
  'McDonald Islands',
  'North American Cordillera - Paleozoic',
  'Karoo Province - Africa',
  'Galapagos Islands',
  'Hyblean or Iblean Plateau, Sicily',
  'Baical Rift Zone',
  'West African Craton',
  'Banda Arc',
  'Tanzania Craton Archean'],
 ['La',
  'Ce',
  'Pr',
  'Nd',
  'Sm',
  'Eu',
  'Gd',
  'Tb',
  'Dy',
  'Ho',
  'Er',
  'Tm',
  'Yb',
  'Lu'])

def dataWithRegression(file, el1, el2, err):
    df = mg.get_data(file)

    plt.errorbar(df[el1], df[el1]
             ,xerr = err * df[el1]
             ,yerr = err * df[el1]
             ,fmt = 'o'
             ,color = 'black'
             ,elinewidth = .6
             ,capsize = 2
             , zorder = 2)

    plt.xlabel(el1)
    plt.ylabel(el2)
    return plt.show()

interact(dataWithRegression, file = filesList, el1=widgets.Dropdown(options=elList, value='La'), el2=widgets.Dropdown(options=elList, value='Ce'), err = (0, .2, .01))

<function __main__.dataWithRegression(file, el1, el2, err)>

What’s wrong with this code?

df = mg.get_data(file)

plt.scatter(df[el1], df[el1]
          ,xerr = err / df[el1]
          ,yerr = err / df[el1]
          ,fmt = 'o'
          ,color = 'black'
          ,elinewidth = .6
          ,capsize = 2
          , zorder = 2)

plt.xlabel(el1)
plt.ylabel(el2)
return plt.show()

Solution

file, el1, el2, and err need to be defined
must be plt.errorbar(), not plt.scatter should be err * df[el1], not err / df[el1] return before plt.show() must be removed, it is not a function definition