6  Statistik II: Regression & Darstellung von Fehlerbalken [40:29]

stabw, stderr, gausskurve,

6.1 Basics- Function plotting with the numpy package  [09:45]

numpy, .linspace(), .sin(), .log(), .power(), warnings 
Available Notebooks: Lecture, Exercise, Solution

Bislang haben wir Daten aus Datenbanken oder Datensätzen in Diagrammen dargestellt und ausgewertet. Letztlich sehen wir bei einer solchen Daten-Darstellung immer den letzten Zustand eines bestimmten Prozesses – oder, noch präziser formuliert: ein Prozess wird zu einem bestimmten Zeitpunkt eingefroren, und die Daten beschreiben diesen eingefrorenen Zustand. Ganz direkt trifft das bei einem Magmatit zu, der ja tatsächlich nichts anderes als gefrorenes Magma ist. Unser Ziel ist es, anhand der Daten des eingefrorenen Prozesses den Prozess des Magmas davor zu rekonstruieren. Das machen wir anhand von Modellen wie z.B. fraktionierte Kristallisation, aber auch Datierung, mit Hilfe einer eingefrorenen Isotopenzusammensetzung. Ein Prozess wird mit einer Gleichung beschrieben, welche über eine Funktion in einem Diagramm dargestellt wird. Dazu schauen wir uns an, wie man eine Funktion in Jupyter Notebooks darstellt. 

import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact
import warnings
warnings.filterwarnings('ignore')
#warnings.filterwarnings(action = 'once')
x = np.linspace(0, 10, 200)
y1 = np.sin(x)
y2 = np.log(x)
y3 = np.power(x, 0.6)

plt.plot(x, y1)
plt.plot(x, y2)
plt.plot(x, y3)

plt.show()

def complexModel(a, b, c):
    x = np.linspace(0, 10, 200)
    h = a * 9/5 - b
    y = h * np.sin(.3 * np.power(x, 1.2) - c * np.log(x))
    plt.plot(x, y)
    return plt.show()
interact(complexModel, a = (0, 5), b = (2, 7, .2), c = (1, 5, .1))
<function __main__.complexModel(a, b, c)>

First, x-axis values are defined, e.g., using the numpy .linspace() method. Second, the y-axis values are calculated using the function, applied to the just defined the x-axis vlaues.

True

False

True

False

Search for the equation for the radioactive decay and plot this function.
Add x- and y-labels to the plot.

import numpy as np
import matplotlib.pyplot as plt
Rb87HalfLife = 48.8  #in Ga
Rb87lambda = np.log(2)/Rb87HalfLife #log in numpy is Ln!! 

#Euler's number in numpy is 'exp' – as can be seen with this example:
print(np.exp(1))

t = np.linspace(0, 100, 200)
y = np.exp(-Rb87lambda * t)

plt.plot(t, y)
plt.xlabel('t (Ga)')
plt.ylabel('Remaining $^{87}$Rb')

plt.show()
2.718281828459045

The goal is to plot the batch melting function, with the melt degree F along the x-axis. The equation for batch melting is:

\[ c_l = \frac{c_0}{F + D * (1 - F)}\]

import numpy as np
import matplotlib.pyplot as plt

c0 = 23.86
D = 3.2

x = linspace(0, 1, 75)
y = c0 / (F + D * (1 - F))

plt.plot(x, y)

show(plt)

coming

6.2 Program- Rayleigh isotope fractionation  [15:08]

.axhline() 
Available Notebooks: Lecture, Exercise, Solution

Rayleigh fractionation describes the isotope composition of the various phases during evaporation and condensation, with the following parameters:
f (0…1): Fraction of the gas (condensation) or liquid (evaporation)
\(\alpha\): Fraction factor
Rl,0: Initial bulk liquid composition
Rv,0: Initial bulk gas composition
Rv: Composition in the remining gas
Rl: Composition of the current liquid
Rv,0: Initial bulk gas composition
Rl: Composition of the entire, condensed liquid

Evaporation Equations
Isotope ratio of the remaining liquid: \[ R_l = R_{l_0} f^{\frac{1}{\alpha} -1}\]
Isotope ratio of the currently evaporating gas: \[ R_v = R_{l_0} \frac{1}{\alpha} f^{\frac{1}{\alpha} -1}\]
Isotope ratio of the evaporated gas so far: \[ \overline{R_v} = R_{l_0} \frac{1 - f^{\frac{1} {\alpha}}} {1 - f} \]

Condensation Equations
Isotope ratio of the remaining gas phase: \[ R_v = R_{v_0} f^{\alpha -1}\]
Isotope ratio of the currently evaporating liquid: \[ R_l = R_{v_0} \alpha f^{\alpha -1}\]
Isotope ratio of the evaporated liquid so far: \[ \overline{R_l} = R_{v_0} \frac{1 - f^\alpha} {1 - f} \]

import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact

import warnings
warnings.filterwarnings('ignore')
#warnings.filterwarnings(action='once')
def pEvapCond(alpha, Rl0):
    f = np.linspace(0, 1, 2000)

    yEvap1 = Rl0 * np.power(f, (1/alpha - 1))
    yEvap2 = Rl0 * (1/alpha) *np.power(f, (1/alpha - 1))
    yEvap3 = Rl0 * (1 - np.power(f, 1/alpha))/(1 - f)
    yCond1 = Rl0 * np.power(f, alpha - 1)
    yCond2 = Rl0 * alpha * np.power(f, alpha - 1)
    yCond3 = Rl0 * (1 - np.power(f, alpha))/(1 - f)

    fig, (ax1, ax2) = plt.subplots(1, 2, sharey = True, figsize = (9, 3))

    ax1.plot(f, yEvap1, f, yEvap2, f, yEvap3)
    ax2.plot(f, yCond1, f, yCond2, f, yCond3)

    ax1.set_xlim(0, 1)
    ax2.set_xlim(0, 1)
    ax1.set_ylim(0, 2)
    ax2.set_ylim(0, 2)

    ax1.set_xlabel('f (fraction liquid)')
    ax2.set_xlabel('f (fraction gas)')
    ax1.set_ylabel('isotope ratio')
    ax1.set_title('Evaporation', color = 'b')
    ax2.set_title('Condensation', color = 'b')

    ax1.legend(['remaining liquid', 'currently evap. gas', 'bulk evap. gas'])
    ax2.legend(['remaining gas', 'currently cond. liquid', 'bulk cond. liquid'])

    ax1.axhline(y = 1, color = 'grey', linestyle = '--', lw = 1)
    ax2.axhline(y = 1, color = 'grey', linestyle = '--', lw = 1)

    return plt.show()

interact(pEvapCond, alpha = (.1, 2), Rl0 = (0, 5,.01))
<function __main__.pEvapCond(alpha, Rl0)>

The value 3.141 is assigened to the variable pi, the value 31.432 is assigened to the variable val, and the string ‘imaginative’ is assigened to the variable param.

True

False

True

False

Show an interactive decay plot with the half-life as variable. In a plot next to this, shwo the increase of the daughter isotope.
The x-axis shall go from 0 to the age of the Earth.
Suppress the appearing warnings.

import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact
import warnings
warnings.filterwarnings('ignore')
def radDecay(halfLife):
    t = np.linspace(0, 4.56)  #t in Ga

    decay = np.exp(-np.log(2)/halfLife * t)
    growth = 1 - decay

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize = (9, 3))

    ax1.plot(t, decay)
    ax2.plot(t, growth)
    ax1.set_xlabel('t (Ga)')
    ax2.set_xlabel('t (Ga)')
    ax1.set_ylabel('decay')
    ax2.set_ylabel('growth')
    ax1.set_xlim(0, 4.56)
    ax2.set_xlim(0, 4.56)
    ax1.set_ylim(0, 1.1)
    ax2.set_ylim(0, 1.1)

    return plt.show()

interact(radDecay, halfLife = (0, 5, .1))
<function __main__.radDecay(halfLife)>
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10)
y1 = np.sin(x)
y2 = np.cos(x)

fig, (ax1, ax2) = plt.subplot(1, 2, figsize = (9, 6))

ax1 = plt.subplots(2, 2, 1)
ax2 = plt.subplots(2, 2, 1)

plt.ax1(x, y1)
plt.ax2(x, y2)

plt.show()

coming

6.3 Command- Linear Regression  [09:02]

scipy, .stats, linregress(), 
Available Notebooks: Lecture, Exercise, Solution

Wir haben die Daten-Darstellung kennen gelernt, die Modellierung, auch Einblicke in die Daten- Analyse erhalten – und natürlich gibt es ausgiebig Packages zu etwas statistischen Analysen. Beispielhaft möchte ich die lineare Regression zeigen, die wir sehr häufig in der analytischen Mineralogie verwenden. Von dort ausgehend lassen sich leicht Packages und Methoden für die eigenen Problemstellungen finden. Und letztlich sind Methoden wie die lineare Regression nichts anderes als Funktionen, welche nicht mehr programmiert werden müssen, sondern schon vorgefertigt in Packages heraus verwendet werden können. 

import mag4 as mg
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
df = mg.get_data('chdfeisoexdat')
df.columns.tolist()
['Sample', 'n', 'δ56/54Fe', '2 sd (56)', 'δ57/54Fe', '2 sd (57)']
plt.scatter(df['δ56/54Fe'], df['δ57/54Fe'])

slope, intercept, rvalue, pvalue, stderr = stats.linregress(df['δ56/54Fe'], df['δ57/54Fe'])
slope, intercept, rvalue, pvalue, stderr
(1.5028159607866804,
 -0.006918546227736001,
 0.9907863083403945,
 2.350657702586129e-36,
 0.03248066905659502)
xReg = np.linspace(-.8, .4)
yReg = slope * xReg + intercept

plt.plot(xReg, yReg, '--', color = 'grey', lw = 2, zorder = 1)
plt.scatter(df['δ56/54Fe'], df['δ57/54Fe'], zorder = 2)

The value 3.141 is assigened to the variable pi, the value 31.432 is assigened to the variable val, and the string ‘imaginative’ is assigened to the variable param.

True

False

order

xorder

yorder

zorder

A challenge
Make an interactive plot in wich you can select a csv-file out of all csv-files from a folder using a drop-dwon menu.
Make drop-downs for the REE that you extract from one file.
Also plot a regression through the data.

import mag4 as mg
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact, widgets
from scipy import stats
# Selecting all Georoc files from mag4
df_mag4 = mg.available_datasets()
fil = df_mag4['Source'] == 'Georoc'
df_mag4_georoc = df_mag4[fil][['Source', 'Short Title', 'Title']]
df_mag4_georoc
Source Short Title Title
0 Georoc eastsearid Easter Seamount Chain Salas Y Gomez Ridge
4 Georoc bastcrat Bastar Craton
6 Georoc ninetyrid Ninetyeast Ridge
9 Georoc emei Emeishan
10 Georoc mcdis McDonald Islands
11 Georoc namcord North American Cordillera - Paleozoic
15 Georoc karaf Karoo Province - Africa
18 Georoc galis Galapagos Islands
20 Georoc hybibplat Hyblean or Iblean Plateau, Sicily
27 Georoc baizone Baical Rift Zone
28 Georoc westafcrat West African Craton
29 Georoc bandarc Banda Arc
30 Georoc tanzcratarch Tanzania Craton Archean
filesList = df_mag4_georoc['Title'].tolist()

elList = mg.get_data(filesList[0]).columns.tolist()[118:132]

filesList, elList
(['Easter Seamount Chain Salas Y Gomez Ridge',
  'Bastar Craton',
  'Ninetyeast Ridge',
  'Emeishan',
  'McDonald Islands',
  'North American Cordillera - Paleozoic',
  'Karoo Province - Africa',
  'Galapagos Islands',
  'Hyblean or Iblean Plateau, Sicily',
  'Baical Rift Zone',
  'West African Craton',
  'Banda Arc',
  'Tanzania Craton Archean'],
 ['La',
  'Ce',
  'Pr',
  'Nd',
  'Sm',
  'Eu',
  'Gd',
  'Tb',
  'Dy',
  'Ho',
  'Er',
  'Tm',
  'Yb',
  'Lu'])

We usually use interact(sc_plot, xEl=('Si', elements), yEl=elements). I modified this slightly using widgets (a package I import above together with interact), so an initial value for the dropdown menues can be preselected. There is no error in this modified, new part.

def dataWithRegression(file, el1, el2):
    df = mg.get_data(file)

    #dropna is critical, as regression does not work, should there be cells with no data
    data = df.loc[:, [el1, el2]].dropna()

    xAxis = data[el1]
    yAxis = data[el2]

    #Using these min and max methods makes the code even more flexible
    xAxisValues = np.linspace(min(xAxis), max(xAxis))

    slope, intercept, rvalue, pvalue, stderr = stats.linregress(xAxis, yAxis)

    plt.scatter(xAxis, yAxis, marker = 6, s = 70, color = 'teal', zorder = 2)
    plt.plot(xAxisValues, slope * xAxisValues + intercept, '--', color = 'grey', zorder = 1)
    plt.xlabel(el1)
    plt.ylabel(el2)
    return plt.show()

interact(dataWithRegression, file = filesList, el1=widgets.Dropdown(options=elList, value='La'), el2=widgets.Dropdown(options=elList, value='Ce'))
<function __main__.dataWithRegression(file, el1, el2)>
import pandas as pd
import matplotlib.pyplot as plt

df = mg.get_data('Hyblean or Iblean Plateau, Sicily')

df.iloc[:, ('Mn', 'Al')].dropna

coming

a, b, c = [3, 1, 7]

if a == 3:
    print(b)
else c = 4:
    print(b)
else b > a
    print('what a stupid code')

coming

for i in range.4:
    def addTwo(i)
        i + 2
    return i

coming

6.4 Command- Plot data points with error bars  [06:34]

.errorbar() 
Available Notebooks: Lecture, Exercise, Solution

Einer der wichtigsten Parameter der Analytik sind deren Fehler und Standardabweichung in allen Variation (intern, extern, long-term, etc.). Natürlich lassen sich Fehler einfach und mit hoher Granularität in Abbildungen darstellen. Auch Fehler-Umhüllende sind einfach darstellbar, auf die ich einmal nicht eingehen werde, sondern nur darauf, Fehlerbalken den Datenpunkten einfach hinzuzufügen. 

import mag4 as mg
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
df = mg.get_data('chdfeisoexdat')
df.columns.tolist()
['Sample', 'n', 'δ56/54Fe', '2 sd (56)', 'δ57/54Fe', '2 sd (57)']
slope, intercept, rvalue, pvalue, stderr = stats.linregress(df['δ56/54Fe'], df['δ57/54Fe'])
xReg = np.linspace(-.8, .4)
yReg = slope * xReg + intercept

xData = 'δ56/54Fe'
yData = 'δ57/54Fe'

plt.plot(xReg, yReg, '--', color = 'grey', lw = 2, zorder = 1)
plt.errorbar(df[xData], df[yData]
             ,xerr = .05 * abs(df['δ56/54Fe'])
             ,yerr = abs(df['2 sd (57)'])
             ,fmt = 'o'
             ,color = 'black'
             ,elinewidth = .6
             ,capsize = 2
             ,zorder = 2)
plt.xlabel(xData)
plt.ylabel(yData)
Text(0, 0.5, 'δ57/54Fe')

The basic structure is actually very similar. In fact, replacing .scatter() by .errorbar() does not change anything. The difference is that .errobar() takes additional arguments such as xerr, yerr and styling arguments for the error bars.

True

False

True

False

Make an interactive plot in wich you can select a csv-file out of all csv-files from a folder using a drop-dwon menu.
Make drop-downs for the REE that you extract from one file.
Plot the data with errorbars. The error shall be selectable with a slider. The error shall be a relative error, between 0 and 20% of the data value.

import mag4 as mg
import pandas as pd
import matplotlib.pyplot as plt
from ipywidgets import interact, widgets
# Selecting all Georoc files from mag4
df_mag4 = mg.available_datasets()
fil = df_mag4['Source'] == 'Georoc'
df_mag4_georoc = df_mag4[fil][['Source', 'Short Title', 'Title']]
df_mag4_georoc
Source Short Title Title
0 Georoc eastsearid Easter Seamount Chain Salas Y Gomez Ridge
4 Georoc bastcrat Bastar Craton
6 Georoc ninetyrid Ninetyeast Ridge
9 Georoc emei Emeishan
10 Georoc mcdis McDonald Islands
11 Georoc namcord North American Cordillera - Paleozoic
15 Georoc karaf Karoo Province - Africa
18 Georoc galis Galapagos Islands
20 Georoc hybibplat Hyblean or Iblean Plateau, Sicily
27 Georoc baizone Baical Rift Zone
28 Georoc westafcrat West African Craton
29 Georoc bandarc Banda Arc
30 Georoc tanzcratarch Tanzania Craton Archean
filesList = df_mag4_georoc['Title'].tolist()

elList = mg.get_data(filesList[0]).columns.tolist()[118:132]

filesList, elList
(['Easter Seamount Chain Salas Y Gomez Ridge',
  'Bastar Craton',
  'Ninetyeast Ridge',
  'Emeishan',
  'McDonald Islands',
  'North American Cordillera - Paleozoic',
  'Karoo Province - Africa',
  'Galapagos Islands',
  'Hyblean or Iblean Plateau, Sicily',
  'Baical Rift Zone',
  'West African Craton',
  'Banda Arc',
  'Tanzania Craton Archean'],
 ['La',
  'Ce',
  'Pr',
  'Nd',
  'Sm',
  'Eu',
  'Gd',
  'Tb',
  'Dy',
  'Ho',
  'Er',
  'Tm',
  'Yb',
  'Lu'])

We usually use interact(sc_plot, xEl=('Si', elements), yEl=elements). I modified this slightly using widgets (a package I import above together with interact), so an initial value for the dropdown menues can be preselected. There is no error in this modified, new part.

def dataWithRegression(file, el1, el2, err):
    df = mg.get_data(file)

    plt.errorbar(df[el1], df[el1]
             ,xerr = err * df[el1]
             ,yerr = err * df[el1]
             ,fmt = 'o'
             ,color = 'black'
             ,elinewidth = .6
             ,capsize = 2
             , zorder = 2)

    plt.xlabel(el1)
    plt.ylabel(el2)
    return plt.show()

interact(dataWithRegression, file = filesList, el1=widgets.Dropdown(options=elList, value='La'), el2=widgets.Dropdown(options=elList, value='Ce'), err = (0, .2, .01))
<function __main__.dataWithRegression(file, el1, el2, err)>
df = mg.get_data(file)

plt.scatter(df[el1], df[el1]
          ,xerr = err / df[el1]
          ,yerr = err / df[el1]
          ,fmt = 'o'
          ,color = 'black'
          ,elinewidth = .6
          ,capsize = 2
          , zorder = 2)

plt.xlabel(el1)
plt.ylabel(el2)
return plt.show()

file, el1, el2, and err need to be defined
must be plt.errorbar(), not plt.scatter should be err * df[el1], not err / df[el1] return before plt.show() must be removed, it is not a function definition