Documentation

Author

Jie Xu & Dominik C. Hezel

Published

January 25, 2023

1 Boron LA-ICP-MS Data Reduction Application

1.1 Introduction

The program can:

Read multiple .exp data files
Read additional .csv files (laser log file and trace element data)
Reject outliers
Correct for the background
Correct for intra-sequence instrumental drift
Correct the ablation-volume-dependent offset of the B concentration
Combine the calculation results, laser parameters and trace element results
Produce a ready-to-use final data table

1.2 How to Use the Program

1. Upload the data files from the Neptune: click the ‘Browse files’ button, then use ‘Command-A’ (‘Ctrl-A’ on Windows) to select all ‘.exp’ data files of the sequence.
2. Set up the parameters for the isotope data: (1) Drag the sliders to choose the background and signal areas; the orange zone represents the background and the blue zone the signal area. (2) Set your outlier factor: the smaller the number, the more data points are rejected as outliers; rejected points are shown as red dots. (3) Set the bulge factor for 11B: it scales the bulge correction, derived from the 9.9 and 10.2 signals around 10B, when it is applied to 11B; the default factor of 0.6 was defined by Dr. Axel Gerdes. (4) Choose your standard for the intra-sequence instrumental drift correction: its name, its label (‘A/B/C/D’) in the file name, and the regression level.
3. Upload the log file from the laser: click the ‘Browse files’ button and choose your laser file. Check that it matches the isotope data.
4. Set up the parameters for correcting the boron concentration from the signal intensity: (1) the regression level for the boron concentration correction; (2) the ablation depth of the selected reference and, if different, of the other samples (otherwise keep the defaults); (3) the shape of your spots: circle or square; (4) whether or not you used split stream.
5. (Optional) Upload your trace element file if you used split stream.
6. Download your final results as a .csv file.

1.3 Input File Requirements

Input data files from Neptune: (1) The Neptune produces four types of data files for each measurement: ‘.dat’, ‘.exp’, ‘.log’ and ‘.TDT’. Only the ‘.exp’ files can be read by this program; they can also be opened in Excel. (2) The data files are named in the format ‘num-A/B/C/D/U’ (e.g. ‘001-A’, ‘002-B’), where num is the sequence number, A/B/C/D are the labels of four standards and U is the label for unknown samples. Attention: all data files of one sequence must be uploaded at once! (3) Inside each ‘.exp’ data file, the data start in row 23, and the columns ‘9.9’, ‘10B’, ‘10.2’ and ‘11B’ are required by our method.
Input laser file: a .csv log file written by the laser during ablation. Please check all recorded information: the entries must be in the same order as the Neptune data files, otherwise errors may occur at this step.
Input trace element data file: trace element data reduced with Ladr. The raw data must first be processed in Ladr and are then appended here.
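The naming convention above can be checked before uploading. The sketch below is an illustration only: `parse_name` and `missing_numbers` are hypothetical helpers that are not part of the app, and the pattern accepts both `-` (as in the examples above) and `_` as separator, an assumption made to be lenient.

```python
import re

# Hypothetical helper: '001-A.exp' -> (1, 'A'); A/B/C/D = standards, U = unknowns
NAME_PATTERN = re.compile(r'^(\d+)[-_]([ABCDU])')


def parse_name(filename):
    """Split a data file name into its sequence number and label."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return int(match.group(1)), match.group(2)


def missing_numbers(filenames):
    """Sequence numbers absent between the first and the last uploaded file."""
    numbers = sorted(parse_name(f)[0] for f in filenames)
    return sorted(set(range(numbers[0], numbers[-1] + 1)) - set(numbers))


files = ['001-A.exp', '002-B.exp', '003-U.exp', '005-A.exp']
print([parse_name(f) for f in files])  # [(1, 'A'), (2, 'B'), (3, 'U'), (5, 'A')]
print(missing_numbers(files))          # [4]
```

A non-empty result from `missing_numbers` would indicate that not all files of the sequence were selected.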

1.4 Description of the Python Code

Required packages


import streamlit as st
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import statistics as stt
from scipy import stats
from scipy.optimize import curve_fit

Uploading files: several files can be uploaded at once, and only .exp files are accepted.


if st.button('clear uploaded data'):
    st.session_state.uploaded_files = []

if 'uploaded_files' in st.session_state and len(st.session_state.uploaded_files) != 0:
    uploaded_files = st.session_state.uploaded_files

else:
    st.session_state.uploaded_files = st.file_uploader('upload files', type=['exp'], accept_multiple_files=True)

1.4.1 Explaining the Functions

To do: explain here what each function does to the data, e.g. why a regression is used and why higher polynomial orders are chosen, and why the backgrounds are subtracted, including which two backgrounds exist: the ‘normal’ one and one from an unknown source.

def selSmpType(dataFiles)

Extracts the sequence number of each data file from its file name. The sequence number identifies each analysis within the sequence and is used later to merge the isotope results with the laser and trace element data.


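import re

def selSmpType(dataFiles):
    # Sequence number = leading digits of the file name, e.g. '001-A.exp' -> 1.0
    # ('-' and '_' are both accepted as separator)
    l = []
    for file in dataFiles:
        l.append(float(re.split(r'[-_]', file)[0]))
    return l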

def outlierCorrection(data, factorSD)

Outlier rejection: values lying more than factorSD standard deviations from the mean are treated as outliers. The first function returns a boolean mask, used for plotting; the second returns the retained values, used for calculation.


def outlierCorrection_plot(data, factorSD):
    element_signal = np.array(data)
    mean = np.mean(element_signal, axis=0)
    sd = np.std(element_signal, axis=0)
    fil = (data < mean + factorSD * sd) & (data > mean - factorSD * sd)
    return fil


def outlierCorrection(data, factorSD):
    element_signal = np.array(data)
    mean = np.mean(element_signal, axis=0)
    sd = np.std(element_signal, axis=0)

    return [x for x in data if (x > mean - factorSD * sd) and (x < mean + factorSD * sd)]
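The same single-pass filter can be tried on a short list without Streamlit or NumPy. This is a sketch that mirrors `outlierCorrection`; `np.std` defaults to the population standard deviation (`ddof=0`), so `statistics.pstdev` is used, and `sd_filter` is a hypothetical name.

```python
import statistics


def sd_filter(data, factor_sd):
    """Keep values within factor_sd population SDs of the mean (like outlierCorrection)."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)  # population SD, matching np.std's default ddof=0
    return [x for x in data if mean - factor_sd * sd < x < mean + factor_sd * sd]


ratios = [1.0, 1.1, 0.9, 1.0, 5.0]
print(sd_filter(ratios, 1.5))  # [1.0, 1.1, 0.9, 1.0]
print(sd_filter(ratios, 0.5))  # [1.0, 1.1, 1.0]
```

Because mean and SD are computed once, including the outliers, a single extreme value widens the acceptance window; a smaller factor rejects more points, as described in section 1.2.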

def parseBoronTable(file)

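Extracts the data table from an uploaded ‘.exp’ data file: everything between the header row starting with ‘Cycle\tTime’ and the footer starting with ‘***\tCup’.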


import io

def parseBoronTable(file):
    # Decode the uploaded file (a Streamlit UploadedFile) to text
    content = file.getvalue().decode("utf-8")
    fname = file.name
    # The data body starts at the header row "Cycle\tTime" and ends before the "***\tCup" footer
    _start = content.find("Cycle\tTime")
    _end = content.find("***\tCup")
    myTable = content[_start:_end-1]

    # Parse the tab-separated table in memory, so no temporary file (and no 'temp/' folder) is needed
    df = pd.read_csv(io.StringIO(myTable), sep='\t')

    return df, fname
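The marker-based extraction can be illustrated with the standard library alone. This is a sketch: `extract_table` is a hypothetical name, and the `sample` string is a made-up, shortened stand-in for a real ‘.exp’ file, whose header and footer contain more lines.

```python
import csv
import io


def extract_table(content):
    """Header and rows between 'Cycle\tTime' and the '***\tCup' footer."""
    start = content.find("Cycle\tTime")
    end = content.find("***\tCup")
    if start == -1 or end == -1:
        raise ValueError("table markers not found")
    reader = csv.reader(io.StringIO(content[start:end].strip()), delimiter="\t")
    header, *rows = list(reader)
    return header, rows


# Made-up, shortened example of an .exp export
sample = (
    "Neptune export header\n"
    "Cycle\tTime\t9.9\t10B\t10.2\t11B\n"
    "1\t00:00:01\t0.01\t1.20\t0.01\t4.90\n"
    "2\t00:00:02\t0.01\t1.21\t0.01\t4.92\n"
    "***\tCup\tconfiguration\n"
)
header, rows = extract_table(sample)
print(header)     # ['Cycle', 'Time', '9.9', '10B', '10.2', '11B']
print(len(rows))  # 2
```

Unlike `parseBoronTable`, this sketch raises an error when a marker is missing instead of silently slicing from position -1.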

def sig_selection()

Plots the 11B and 10B signals of the selected sample together with the selected background (orange) and signal (blue) zones.


# Select the sample whose signals and outliers are plotted
st.session_state.sample_plot = st.selectbox(
    'Which is your sample to plot?',
    (st.session_state.uploaded_files))

def sig_selection():
    df_data, filename = parseBoronTable(st.session_state.sample_plot)
    df_data = df_data[['Cycle', '9.9', '10B', '10.2', '11B']].astype(float)

    fig, ax = plt.subplots()
    ax.plot(df_data['11B'], label='11B', c='green')
    ax.plot(df_data['10B'], label='10B', c='firebrick')
    ax.set_ylabel('signal intensity')
    ax.set_xlabel('cycle')
    x = df_data['11B'].index.to_numpy()
    # Blue zone: signal area; orange zone: background area
    ax.fill_between(x, max(df_data['11B']), where=(
        x < st.session_state.sig_end) & (x > st.session_state.sig_str), color='tab:blue', alpha=0.5)
    ax.fill_between(x, max(df_data['11B']), where=(
        x < st.session_state.bac_end) & (x > st.session_state.bac_str), color='tab:orange', alpha=0.5)

    ax.legend()
    return fig

def bacground_sub(factorSD, factor_B11)

Background subtraction. The columns ‘9.9’, ‘10B’, ‘10.2’ and ‘11B’ are used. (1) The mean of the background cycles is subtracted from each cup. (2) The bulge is estimated from the ‘9.9’ and ‘10.2’ signals: their average is subtracted from 10B, and the average multiplied by the bulge factor (0.6 by default) is subtracted from 11B. (3) For the sample selected for plotting, the 11B/10B ratios are plotted with the outliers shown in red. (4) For each data file, the file name, the mean 11B signal, the mean 11B/10B ratio and its standard error are returned.


def bacground_sub(factorSD, factor_B11):
    average_B = []
    for i in st.session_state.uploaded_files:
        df_data, filename = parseBoronTable(i)
        df_data = df_data[['Cycle', '9.9', '10B', '10.2', '11B']].astype(float)

        df_bacground_mean = df_data[st.session_state.bac_str:st.session_state.bac_end].mean()
        df_signal = df_data[st.session_state.sig_str:st.session_state.sig_end]

        df_bacground_sub = df_signal - df_bacground_mean
        df_bacground_sub['10B_bulc_sub'] = df_bacground_sub['10B'] - \
            (df_bacground_sub['9.9']+df_bacground_sub['10.2'])/2
        df_bacground_sub['11B_bulc_sub'] = df_bacground_sub['11B'] - \
            factor_B11*(df_bacground_sub['9.9']+df_bacground_sub['10.2'])/2
        df_bacground_sub['11B/10B'] = df_bacground_sub['11B_bulc_sub'] / \
            df_bacground_sub['10B_bulc_sub']
        fil = outlierCorrection_plot(df_bacground_sub['11B/10B'], factorSD)
        res_iso = df_bacground_sub['11B/10B'][fil]
        res_iso_outlier = df_bacground_sub['11B/10B'][~fil]
        res_11B = outlierCorrection(df_bacground_sub['11B'], factorSD)
        if i == st.session_state.sample_plot:
            fig1, ax = plt.subplots()
            ax.plot(df_bacground_sub['11B/10B'], 'ko')
            ax.plot(res_iso_outlier, 'ro', label='outliers')
            ax.set_ylabel('$^{11}B$/$^{10}B$')
            ax.legend()
            st.pyplot(fig1)
        average_B.append({'filename': filename, '11B': np.mean(
            res_11B), '11B/10B_row': np.mean(res_iso), 'se': np.std(res_iso)/np.sqrt(len(res_iso))})

    df = pd.DataFrame(average_B)
    st.session_state.average_B = df

    return df
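The per-cycle arithmetic of `bacground_sub` can be worked through by hand. The sketch below uses plain dictionaries instead of DataFrame columns; `bulge_corrected_ratio` and `mean_and_se` are hypothetical names, and the signal and background values are made up for illustration.

```python
import math
import statistics


def bulge_corrected_ratio(signal, background, factor_b11=0.6):
    """11B/10B of one cycle after background and bulge subtraction.

    signal, background: dicts keyed by '9.9', '10B', '10.2', '11B';
    background holds the mean of the background cycles.
    """
    sub = {k: signal[k] - background[k] for k in signal}
    bulge = (sub['9.9'] + sub['10.2']) / 2   # bulge interpolated under 10B
    b10 = sub['10B'] - bulge
    b11 = sub['11B'] - factor_b11 * bulge    # scaled bulge for 11B
    return b11 / b10


def mean_and_se(values):
    """Mean and standard error, with the population SD as np.std uses by default."""
    return statistics.fmean(values), statistics.pstdev(values) / math.sqrt(len(values))


sig = {'9.9': 0.012, '10B': 1.20, '10.2': 0.014, '11B': 4.90}
bkg = {'9.9': 0.002, '10B': 0.01, '10.2': 0.004, '11B': 0.02}
print(round(bulge_corrected_ratio(sig, bkg), 4))  # 4.1305
```

Here the bulge is 0.010, so 10B becomes 1.19 - 0.010 = 1.18 and 11B becomes 4.88 - 0.6 × 0.010 = 4.874.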

def polynomFit(inp, *args)

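Polynomial model used by the regression: for coefficients args = (a0, a1, …, an) it returns a0 + a1·x + … + an·x^n, so the number of coefficients passed in sets the polynomial order.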


def polynomFit(inp, *args):
    x=inp
    res=0
    for order in range(len(args)):
        res+=args[order] * x**order
    return res
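Written out, the model evaluated by `polynomFit` and the correction factor that `regression` derives from it for analysis $i$ at sequence position $x_i$ are given below. $a_k$ stands for the fitted coefficients `popt` and $R_{\mathrm{ref}}$ for the `ref_stand` value; the symbols are introduced here only for notation.

```latex
f(x) = \sum_{k=0}^{n} a_k\, x^{k},
\qquad
F_i = \frac{R_{\mathrm{ref}}}{f(x_i)}
```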

def regression(x, y, ref_stand, order, listname)

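Fits the drift of the selected standard with a polynomial of the given order, using the row position in the sequence as x, and returns for every analysis a correction factor: the reference value of the standard divided by the fitted value at that position. At least order + 1 standard measurements are needed for the fit.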


def regression(x, y, ref_stand, order, listname):
    x_use = np.array(x)
    popt, pcov = curve_fit(polynomFit, xdata=x_use, ydata=y , p0=[0]*(order+1))
    fitData=polynomFit(x_use,*popt)
    
    res = []
    for unknown in listname:
        y_unknown = ref_stand / polynomFit(unknown,*popt)
        res.append({'factor': y_unknown})
    return(pd.DataFrame(res))
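For a polynomial model, `curve_fit` solves the same linear least-squares problem as `numpy.polyfit`, so the drift factors can be sketched with `polyfit`/`polyval` instead. This is a swapped-in technique for illustration, not the app's code; `drift_factors` is a hypothetical name and the standard positions and ratios are synthetic (a linear drift).

```python
import numpy as np


def drift_factors(x_std, y_std, ref_value, order, x_all):
    """Correction factors ref_value / f(x) from a polynomial drift fit.

    np.polyfit returns coefficients with the highest power first,
    which is the order np.polyval expects.
    """
    coeffs = np.polyfit(x_std, y_std, order)
    return ref_value / np.polyval(coeffs, x_all)


x_std = np.array([0, 5, 10, 15, 20])   # positions of the standards in the sequence
y_std = 4.0 + 0.001 * x_std            # synthetic measured 11B/10B with linear drift
factors = drift_factors(x_std, y_std, 4.05015, 1, np.arange(21))
print(factors[7])                      # equals 4.05015 / 4.007 up to rounding
```

Multiplying each measured ratio by its factor maps the fitted drift curve onto the reference value, which is what `processData` does with `factor_iso`.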

def regression_plot(x, y, ref_stand, order, listname)

Returns a figure showing the measured standard values and the fitted regression curve.


def regression_plot(x, y, ref_stand, order, listname):
    fig, ax = plt.subplots()
    ax.plot(x, y, label='measured', marker='o', linestyle='none')
    x_use = np.array(x)
    popt, pcov = curve_fit(polynomFit, xdata=x_use, ydata=y , p0=[0]*(order+1))
    fitData=polynomFit(x_use,*popt)
    ax.plot(x_use, fitData, label='polyn. fit, order '+str(order), linestyle='--' )
    ax.legend(loc='upper left', bbox_to_anchor=(1.05, 1))
    
    return fig

def prepare_trace(datafile)

Prepares the trace element data file exported from Ladr: removes the 44Ca and 26Mg columns, strips the mass numbers and the ‘(LR)’ suffix from the column titles, and converts the data from strings to floats.


def prepare_trace(datafile):
    # Drop the 44Ca and 26Mg columns (named with an '(LR)' suffix in low-resolution exports)
    if 'LR' in datafile.columns[14]:
        del datafile['44Ca(LR)']
        del datafile['26Mg(LR)']
    else:
        del datafile['44Ca']
        del datafile['26Mg']

    # Strip mass numbers and the '(LR)' suffix from the column titles, e.g. '11B(LR)' -> 'B'.
    # regex=True is required: pandas >= 2.0 treats the pattern literally by default.
    datafile.columns = datafile.columns.str.replace(r'\d+', '', regex=True)
    datafile.columns = datafile.columns.str.replace(r'\(LR\)', '', regex=True)

    # Element concentrations start at column 13. Values below the detection limit ('<...')
    # and Ladr error messages cannot be converted and become NaN.
    res = datafile.iloc[:, 13:].apply(pd.to_numeric, errors='coerce')
    res[' Sequence Number'] = datafile['LB#']
    return res
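The two cleaning rules can be checked on single column titles and cells. This sketch uses only the standard library; `clean_column` and `to_float` are hypothetical helpers that apply the same rules as `prepare_trace` to one value at a time, and the example cells are made up apart from the Ladr error message quoted in the code above.

```python
import math
import re


def clean_column(name):
    """Drop mass numbers, then the '(LR)' tag: '11B(LR)' -> 'B', '238U' -> 'U'."""
    return re.sub(r'\(LR\)', '', re.sub(r'\d+', '', name))


def to_float(value):
    """Convert one Ladr cell to float; '<LOD' entries and error messages become NaN."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


print([clean_column(c) for c in ['7Li(LR)', '11B(LR)', '238U']])  # ['Li', 'B', 'U']
print([to_float(v) for v in ['12.5', '<0.03',
                             'ERROR: Error (#1002): Internal standard composition can not be 0']])
# [12.5, nan, nan]
```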

def processData()

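Runs the isotope part of the workflow: selection of the background and signal areas, the outlier and bulge factors, the background subtraction, and the intra-sequence instrumental drift correction that yields δ11B and its standard error.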


def processData():
    st.set_option('deprecation.showPyplotGlobalUse', False)
    st.subheader('1.1 select your background and signal area')
    st.session_state.bac_str, st.session_state.bac_end = st.slider('Select background', 0, 200, (5, 70))
    st.session_state.sig_str, st.session_state.sig_end = st.slider('Select signal', 0, 200, (95, 175))
    st.pyplot(sig_selection())

    st.subheader('1.2 Please set your outlier and bulge factor')
    outlier_factor = st.number_input('insert your outlier factor (data more than outlier_factor × SD from the mean are rejected)',
                                     value=1.5)
    bulc_factor = st.number_input(
        'insert your bulge factor for 11B correction', value=0.6)



    # Recompute on every rerun so that changes to the sliders and factors take effect
    df_data = bacground_sub(outlier_factor, bulc_factor)

    st.subheader(
        '1.3 Please choose your standard for boron isotopes correction')

    standard = st.selectbox(
        'NIST 612 or B5 for correction?',
        ('NIST SRM 612', 'B5'))
    # Reference values must stay floats: int() would truncate e.g. 4.0332057 to 4
    if standard == 'B5':
        number_iso = 4.0332057
        number_trace = 8.42
        SRM951_value = 4.0492

    if standard == 'NIST SRM 612':
        number_iso = 4.05015
        number_trace = 35.0
        SRM951_value = 4.0545

    st.session_state.standard_values = {
        "number_iso" : number_iso,
        "number_trace" : number_trace,
        "SRM951_value" : SRM951_value

    }

    st.session_state.sample_correction = st.selectbox(
        'Which label (A/B/C/D) does your chosen standard have?',
        ('A', 'B', 'C', 'D'))

    st.session_state.default_reg_level = 4
    st.session_state.regress_level = st.number_input('insert your regression level (4 is recommended)', step=1,
                                                     value=st.session_state.default_reg_level, format='%d')
    fil = df_data['filename'].str.contains(st.session_state.sample_correction)
    df_data_B = df_data[fil]
    df_data[' Sequence Number'] = selSmpType(df_data['filename'])

    y_isotope = df_data_B['11B/10B_row']
    y_11B = df_data_B['11B']
    x = df_data_B.index.to_numpy()

    factor_iso = regression(x, y_isotope,
                            number_iso,
                            st.session_state.regress_level if "regress_level" in st.session_state else st.session_state.default_reg_level,
                            df_data.index.to_numpy()
                            )

    df_data['factor_iso'] = factor_iso

    df_data['11B/10B_corrected'] = df_data['factor_iso']*df_data['11B/10B_row']
    df_data['δ11B'] = ((df_data['11B/10B_corrected']/SRM951_value)-1)*1000
    df_data['δ11B_se'] = (df_data['se']*df_data['factor_iso']/SRM951_value)*1000

    st.session_state.df_data = df_data
    st.session_state.df_data_B = df_data_B
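The last three formulas in `processData` can be checked with plain numbers. This is a sketch: `delta11b` is a hypothetical name, the measured ratio, correction factor and standard error are made-up inputs, and 4.0545 is the SRM 951 value used above for the NIST SRM 612 option.

```python
def delta11b(ratio_raw, factor_iso, srm951, se_raw=0.0):
    """Drift-corrected delta11B (per mil) and its standard error, as in processData."""
    ratio_corr = factor_iso * ratio_raw            # '11B/10B_corrected'
    delta = (ratio_corr / srm951 - 1) * 1000       # 'δ11B'
    delta_se = se_raw * factor_iso / srm951 * 1000  # 'δ11B_se'
    return delta, delta_se


d, d_se = delta11b(4.0, 1.0125, 4.0545, se_raw=0.002)
print(d, d_se)
```

A corrected ratio equal to the SRM 951 value gives δ11B = 0 by construction.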

def processLaser()

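Reads the laser log file, combines the laser parameters with the isotope results by sequence number, and calculates boron concentrations corrected for drift and for the ablated volume (spot size and ablation depth).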


def processLaser():
    if "df_data" in st.session_state:
        st.header('2. Please upload your log file from Laser')
        st.session_state.uploaded_laser_file = st.file_uploader("Choose a laser file", type='csv')
        if st.session_state.uploaded_laser_file is not None:
            st.session_state.df_Laser = pd.read_csv(st.session_state.uploaded_laser_file)

            st.session_state.df_Laser_part1 = st.session_state.df_Laser[st.session_state.df_Laser[' Laser State']
                                    == 'On'].iloc[:, [13, 20, 21]]
            st.session_state.df_Laser_part2 = st.session_state.df_Laser[st.session_state.df_Laser[' Sequence Number'].notnull()].iloc[:, [
                    1, 4]]

            st.session_state.df_Laser_res = pd.concat([st.session_state.df_Laser_part2.reset_index(
                    drop=True), st.session_state.df_Laser_part1.reset_index(drop=True)], axis=1)
                    
            st.session_state.df_map1 = st.session_state.df_Laser_res.merge(st.session_state.df_data, on=' Sequence Number')

            st.subheader('2.1 B concentration correction')
            st.session_state.regress_level_B = st.number_input('insert your regression level for [B] (4 is recommended)',
                                                               step=1,
                                                               value=st.session_state.default_reg_level,
                                                               format='%d')

            # Drift correction of the 11B signal, fitted on the selected reference standard
            y_11B = st.session_state.df_data_B['11B']
            x = st.session_state.df_data_B.index.to_numpy()
            factor_B = regression(x, y_11B, st.session_state.standard_values["number_trace"],
                                  st.session_state.regress_level_B,
                                  st.session_state.df_data.index.to_numpy()
                                  )
            st.session_state.df_map1['factor_B'] = factor_B
            

            depth_ref = st.number_input('insert the ablation depth of the selected reference / µm', value=30.0)
            depth_sample = st.number_input('insert the ablation depth of the other samples / µm', value=30.0)

            # Analyses of the selected reference standard keep a depth ratio of 1
            is_ref = st.session_state.df_map1['filename'].str.contains(st.session_state.sample_correction)
            depth_ratios = [1.0 if i else depth_sample / depth_ref for i in is_ref]
            st.session_state.df_map1['depth_correction'] = depth_ratios

            spot_shape = st.selectbox(
                'What is the shape of your spots?',
                ('circle', 'square'))
            if spot_shape == 'circle':
                st.session_state.df_map1[' Spot Size (um)'] = st.session_state.df_Laser_res[' Spot Size (um)']
                # Ablated area of a circular spot scales with (diameter / 2)**2
                ref = ((st.session_state.df_map1[is_ref][' Spot Size (um)'] / 2)**2).mean()
                st.session_state.df_map1['[B]_corrected'] = st.session_state.df_map1['11B'] * st.session_state.df_map1['factor_B'] * (ref / ((st.session_state.df_map1[' Spot Size (um)'] / 2)**2) / depth_ratios)

            if spot_shape == 'square':
                # Spot sizes of square spots are strings; keep the first number (edge length)
                dia = st.session_state.df_map1[' Spot Size (um)']
                spotsize = dia.str.split(' ').str[0].astype(float)
                st.session_state.df_map1[' Spot Size (um)'] = spotsize
                # Ablated area of a square spot scales with edge length**2
                ref = ((st.session_state.df_map1[is_ref][' Spot Size (um)'])**2).mean()
                st.session_state.df_map1['[B]_corrected'] = st.session_state.df_map1['11B'] * st.session_state.df_map1['factor_B'] * (ref / ((st.session_state.df_map1[' Spot Size (um)'])**2) / depth_ratios)
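The volume correction in `processLaser` multiplies `11B × factor_B` by the ratio of reference to sample spot area, divided by the depth ratio. The sketch below isolates that factor for one reference spot size; `volume_factor` is a hypothetical name, and it simplifies the app, which averages the area over all reference analyses.

```python
def volume_factor(ref_size, size, shape='circle', depth_ref=30.0, depth_sample=30.0):
    """Area ratio (reference / sample) divided by the depth ratio (sample / reference).

    Circle: area scales with (d / 2)**2 (pi cancels); square: with edge length**2.
    """
    if shape == 'circle':
        area_ref, area = (ref_size / 2) ** 2, (size / 2) ** 2
    elif shape == 'square':
        area_ref, area = ref_size ** 2, size ** 2
    else:
        raise ValueError("shape must be 'circle' or 'square'")
    return (area_ref / area) / (depth_sample / depth_ref)


print(volume_factor(50, 25))                     # 4.0
print(volume_factor(50, 50, depth_sample=60.0))  # 0.5
```

A spot with half the diameter ablates a quarter of the area, so its 11B signal is scaled up by 4; a pit twice as deep halves the factor.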

def maping()

Uploads the trace element data file (if split stream was used) and merges the laser parameters, isotope results and trace element compositions into one table by sequence number.


def maping():
    if "df_map1" in st.session_state:
        st.subheader('2.2 export results or append your trace elements')

        trace_file = st.selectbox(
            'split stream or not?',
            ('Split stream', 'No'))

        if trace_file == 'No':
            st.session_state.df_all = st.session_state.df_map1


        elif trace_file == 'Split stream':
            st.header('3. Please upload your trace element data processed from Ladr')

            st.session_state.trace = st.file_uploader("Choose a file", type='csv', accept_multiple_files=True)
            if "trace" in  st.session_state and len(st.session_state.trace) > 0:

                trace_file = pd.read_csv(st.session_state.trace[0])

                #trace_file = pd.read_csv('2022-11-28-Si corrected-B5.csv')

                df_trace = prepare_trace(trace_file)

                st.session_state.df_all = st.session_state.df_map1.merge(df_trace, on=' Sequence Number')
                # fig4, ax = plt.subplots()
                # ax.plot([0,1],[0,1], transform=ax.transAxes, c = 'red')
                # ax.scatter(st.session_state.df_all['[B]_corrected'], st.session_state.df_all['B'], s =70, c = 'darkorange', edgecolors = 'black')
                # ax.set_ylabel('[B]_measured by Element')
                # ax.set_xlabel('[B]_corrected by Neptune')
                # st.pyplot(fig4)


        if "df_all" in st.session_state:
            st.session_state.df_all.to_csv('final.csv')
            st.write(st.session_state.df_all)
            result_csv = st.session_state.df_all.to_csv().encode('utf-8')
            st.download_button(
                label='download results as .csv',
                data=result_csv,
                file_name='boron results.csv',
                mime='text/csv',
            )
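`merge` with its default `how='inner'` (as used in `processLaser` and `maping`) silently drops analyses whose sequence number is missing from one of the tables. The sketch below, with made-up tables, shows how a left join with `indicator=True` would flag such rows instead; it is an illustration, not a change the app makes.

```python
import pandas as pd

# Made-up tables keyed by the same column name as in the app
laser = pd.DataFrame({' Sequence Number': [1, 2, 3], ' Spot Size (um)': [50, 50, 35]})
isotopes = pd.DataFrame({' Sequence Number': [1, 2, 3], 'δ11B': [-1.2, 0.4, 12.3]})
trace = pd.DataFrame({' Sequence Number': [1, 3], 'B': [35.0, 8.4]})

# Default inner joins keep only sequence numbers present in every table
inner = laser.merge(isotopes, on=' Sequence Number').merge(trace, on=' Sequence Number')

# A left join keeps every analysis; '_merge' marks rows without trace element data
left = laser.merge(isotopes, on=' Sequence Number').merge(
    trace, on=' Sequence Number', how='left', indicator=True)
missing = left.loc[left['_merge'] == 'left_only', ' Sequence Number'].tolist()
print(len(inner), len(left), missing)  # 2 3 [2]
```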

1.4.2 Explaining the Main Body of the Code

The main body of the app runs the functions in this order:


# Run the isotope workflow only once data files have been uploaded
if st.session_state.uploaded_files:
    processData()

processLaser()
maping()

1.5 Explain the Output

To do: describe exactly what the output is, ideally with screenshots.

The columns of the final table are:

‘Sequence Number’: the number of the data file within the whole sequence.
‘Comment’: the sample name, entered by you during the measurement.
‘Spot Size (um)’, ‘Laser HV (kV)’, ‘Laser Energy (mJ)’: relevant laser parameters taken from the laser log file.
‘filename’: the name of the data file.
‘11B’ to ‘factor_iso’: all results from the Neptune. ‘[B]_corrected’ is the B concentration calculated from 11B; ‘δ11B’ and ‘δ11B_se’ are the calculated isotope results and their standard errors.
‘Li’, ‘B’ to ‘U’: trace element results from the Element XR.

Notes (originally from a separate text file): 1. The .csv files are converted from the original .exp files. 2. The data files produced automatically by the instrument can be found in ‘data/original data type’.

1.6 Testing

For a demonstration of a line plot on a polar axis, see Figure 1.


import numpy as np
import matplotlib.pyplot as plt

r = np.arange(0, 2, 0.01)
theta = 2 * np.pi * r
fig, ax = plt.subplots(
  subplot_kw = {'projection': 'polar'} 
)
ax.plot(theta, r)
ax.set_rticks([0.5, 1, 1.5, 2])
ax.grid(True)
plt.show()

Figure 1: A line plot on a polar axis.