mibiscreen.analysis API reference

mibiscreen module for data analysis.

reduction

mibiscreen module for data analysis reducing sample data.

ordination

Routines for performing ordination statistics on sample data.

@author: Alraune Zech, Jorrit Bakker

cca(data_frame, independent_variables, dependent_variables, n_comp=2, verbose=False)

Function that performs Canonical Correspondence Analysis.

Function makes use of skbio.stats.ordination.CCA on the input data and gives the site scores and loadings.

Input
data_frame : pd.dataframe
    Tabular data containing variables to be evaluated with standard
    column names and rows of sample data.
independent_variables : list of strings
    list of column names to use as the independent variables (= environment)
dependent_variables : list of strings
    list of column names to use as the dependent variables (= species)
n_comp : int, default is 2
    number of dimensions to return
verbose : Boolean, The default is False.
    Set to True to get messages in the Console about the status of the run code.
Output
results : Dictionary
    * method: name of ordination method (str)
    * loadings_independent: loadings of independent variables (np.ndarray)
    * loadings_dependent: loadings of dependent variables (np.ndarray)
    * names_independent: names of independent variables (list of str)
    * names_dependent: names of dependent variables (list of str)
    * scores: scores (np.ndarray)
    * sample_index: names of samples (list of str)
Source code in mibiscreen/analysis/reduction/ordination.py
def cca(data_frame,
        independent_variables,
        dependent_variables,
        n_comp = 2,
        verbose = False,
        ):
    """Function that performs Canonical Correspondence Analysis.

    Function makes use of skbio.stats.ordination.CCA on the input data and gives
    the site scores and loadings.

    Input
    -----
        data_frame : pd.dataframe
            Tabular data containing variables to be evaluated with standard
            column names and rows of sample data.
        independent_variables : list of strings
            list of column names to use as the independent variables (= environment)
        dependent_variables : list of strings
            list of column names to use as the dependent variables (= species)
        n_comp : int, default is 2
            number of dimensions to return
        verbose : Boolean, The default is False.
            Set to True to get messages in the Console about the status of the run code.

    Output
    ------
        results : Dictionary
            * method: name of ordination method (str)
            * loadings_independent: loadings of independent variables (np.ndarray)
            * loadings_dependent: loadings of dependent variables (np.ndarray)
            * names_independent: names of independent variables (list of str)
            * names_dependent: names of dependent variables (list of str)
            * scores: scores (np.ndarray)
            * sample_index: names of samples (list of str)
    """
    if verbose:
        print('==============================================================')
        print(" Running function 'cca()' on data")
        print('==============================================================')

    results = constrained_ordination(data_frame,
                           independent_variables,
                           dependent_variables,
                           method = 'cca',
                           n_comp = n_comp,
                           )
    return results

constrained_ordination(data_frame, independent_variables, dependent_variables, method='cca', n_comp=2)

Function that performs constrained ordination.

Function makes use of skbio.stats.ordination on the input data and gives the scores and loadings.

Input
data_frame : pd.DataFrame
    Tabular data containing variables to be evaluated with standard
    column names and rows of sample data.
independent_variables : list of strings
    list of column names to use as the independent variables (= environment)
dependent_variables : list of strings
    list of column names to use as the dependent variables (= species)
method : string, default is cca
    specification of ordination method of choice. Options 'cca' & 'rda'
n_comp : int, default is 2
    number of dimensions to return
Output
results : Dictionary
    * method: name of ordination method (str)
    * loadings_independent: loadings of independent variables (np.ndarray)
    * loadings_dependent: loadings of dependent variables (np.ndarray)
    * names_independent: names of independent variables (list of str)
    * names_dependent: names of dependent variables (list of str)
    * scores: scores (np.ndarray)
    * sample_index: names of samples (list of str)
Source code in mibiscreen/analysis/reduction/ordination.py
def constrained_ordination(data_frame,
                           independent_variables,
                           dependent_variables,
                           method = 'cca',
                           n_comp = 2,
        ):
    """Function that performs constrained ordination.

    Function makes use of skbio.stats.ordination on the input data and gives
    the scores and loadings.

    Input
    -----
        data_frame : pd.DataFrame
            Tabular data containing variables to be evaluated with standard
            column names and rows of sample data.
        independent_variables : list of strings
            list of column names to use as the independent variables (= environment)
        dependent_variables : list of strings
            list of column names to use as the dependent variables (= species)
        method : string, default is cca
            specification of ordination method of choice. Options 'cca' & 'rda'
        n_comp : int, default is 2
            number of dimensions to return

    Output
    ------
        results : Dictionary
            * method: name of ordination method (str)
            * loadings_independent: loadings of independent variables (np.ndarray)
            * loadings_dependent: loadings of dependent variables (np.ndarray)
            * names_independent: names of independent variables (list of str)
            * names_dependent: names of dependent variables (list of str)
            * scores: scores (np.ndarray)
            * sample_index: names of samples (list of str)
    """
    data,cols= check_data_frame(data_frame)

    intersection = extract_variables(cols,
                          independent_variables,
                          name_variables = 'independent variables'
                          )
    data_independent_variables = data[intersection]

    intersection = extract_variables(cols,
                          dependent_variables,
                          name_variables = 'dependent variables'
                          )
    data_dependent_variables = data[intersection]

    # Checking if the dimensions of the dataframe allow for CCA
    if (data_dependent_variables.shape[0] < data_dependent_variables.shape[1]) or \
        (data_independent_variables.shape[0] < data_independent_variables.shape[1]):
        raise ValueError("Ordination method {} not possible with more variables than samples.".format(method))

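    # Performing constrained ordination using function from scikit-bio.
    # NOTE: in scikit-bio, `scaling` selects the scaling type (1 or 2) rather
    # than the number of components; here it is set from n_comp.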
    if method == 'cca':
        try:
            sci_ordination = sciord.cca(data_dependent_variables, data_independent_variables, scaling = n_comp)
        except(TypeError,ValueError):
            raise TypeError("Not all column values are numeric values. Consider standardizing data first.")
    elif method == 'rda':
        try:
            sci_ordination = sciord.rda(data_dependent_variables, data_independent_variables, scaling = n_comp)
        except(TypeError,ValueError):
            raise TypeError("Not all column values are numeric values. Consider standardizing data first.")
    else:
        raise ValueError("Ordination method {} not a valid option.".format(method))

    loadings_independent = sci_ordination.biplot_scores.to_numpy()[:,0:n_comp]
    loadings_dependent = sci_ordination.features.to_numpy()[:,0:n_comp]
    scores = sci_ordination.samples.to_numpy()[:,0:n_comp]

    if loadings_independent.shape[1]<n_comp:
        raise ValueError("Number of dependent variables too small.")

    results = {"method": method,
               "loadings_dependent": loadings_dependent,
               "loadings_independent": loadings_independent,
               "names_independent" : data_independent_variables.columns.to_list(),
               "names_dependent" : data_dependent_variables.columns.to_list(),
               "scores": scores,
               "sample_index" : list(data.index),
               }

    return results

extract_variables(columns, variables, name_variables='variables')

Checking overlap of two given lists.

Function is used for checking if a list of variables is present in the column names of a given dataframe (of quantities for data analysis).

Input
columns: list of strings
    given extensive list (usually column names of a pd.DataFrame)
variables: list of strings
    list of names to extract/check overlap with strings in list 'columns'
name_variables: str, default is 'variables'
    name of type of variables given in list 'variables'
Output
intersection: list
    list of strings present in both lists 'columns' and 'variables'
Source code in mibiscreen/analysis/reduction/ordination.py
def extract_variables(columns,
                      variables,
                      name_variables = 'variables'
                      ):
    """Checking overlap of two given lists.

    Function is used for checking if a list of variables is present in
    the column names of a given dataframe (of quantities for data analysis).

    Input
    -----
        columns: list of strings
            given extensive list (usually column names of a pd.DataFrame)
        variables: list of strings
            list of names to extract/check overlap with strings in list 'columns'
        name_variables: str, default is 'variables'
            name of type of variables given in list 'variables'

    Output
    ------
        intersection: list
            list of strings present in both lists 'columns' and 'variables'

    """
    if isinstance(variables,list):
        intersection = list(set(columns) & set(variables))
        remainder = list(set(variables) - set(columns))
        if len(intersection) == 0:
            raise ValueError("No column names for '{}' identified in columns of dataframe.".format(name_variables))
        elif len(intersection) < len(variables):
            print("WARNING: not all column names for '{}' are found in dataframe.".format(name_variables))
            print('----------------------------------------------------------------')
            print("Columns used in analysis:", intersection)
            print("Column names not identified in data:", remainder)
            print('________________________________________________________________')
    else:
        raise ValueError("List of column names for '{}' empty or in wrong format.".format(name_variables))

    return intersection
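
The set intersection used above does not preserve the order in which the names were requested. Where a stable column order matters, an order-preserving variant could look like the sketch below. The helper name `ordered_intersection` and the column names are hypothetical and not part of mibiscreen.

```python
def ordered_intersection(columns, variables):
    """Return the names in `variables` that also occur in `columns`.

    Hypothetical helper (not part of mibiscreen): unlike a plain set
    intersection, the result keeps the order given in `variables`.
    Also returns the requested names that were not found.
    """
    available = set(columns)  # set gives fast membership tests
    found = [name for name in variables if name in available]
    missing = [name for name in variables if name not in available]
    return found, missing


# Made-up column names for illustration.
columns = ["pH", "oxygen", "nitrate", "benzene", "toluene"]
requested = ["toluene", "benzene", "xylene"]

found, missing = ordered_intersection(columns, requested)
print("Columns used in analysis:", found)
print("Column names not identified in data:", missing)
```

The returned `found` list can be used directly for column selection, and `missing` corresponds to the names reported in the warning message of the function above.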

pca(data_frame, independent_variables=False, dependent_variables=False, n_comp=2, verbose=False)

Function that performs Principal Component Analysis.

Makes use of routine sklearn.decomposition.PCA on the input data and gives the site scores and loadings.

Principal component analysis (PCA) is a linear dimensionality reduction technique with applications in exploratory data analysis, visualization and data preprocessing. The data is linearly transformed onto a new coordinate system such that the directions (principal components) capturing the largest variation in the data can be easily identified.

Input
data_frame : pd.dataframe
    Tabular data containing variables to be evaluated with standard
    column names and rows of sample data.
independent_variables : Boolean or list of strings; default False
    list with column names to select from data_frame
    being characterized as independent variables (= environment)
dependent_variables : Boolean or list of strings; default is False
    list with column names to select from data_frame
    being characterized as dependent variables (= species)
n_comp : int, default is 2
    Number of components to report
verbose : Boolean, The default is False.
   Set to True to get messages in the Console about the status of the run code.
Output
results : Dictionary
    containing the scores and loadings of the PCA,
    the percentage of the variation explained by the first principal components,
    the correlation coefficient between the first two PCs,
    names of columns (same length as loadings)
    names of indices (same length as scores)
Source code in mibiscreen/analysis/reduction/ordination.py
def pca(data_frame,
        independent_variables = False,
        dependent_variables = False,
        n_comp = 2,
        verbose = False,
        ):
    """Function that performs Principal Component Analysis.

    Makes use of routine sklearn.decomposition.PCA on the input data and gives
    the site scores and loadings.

    Principal component analysis (PCA) is a linear dimensionality reduction
    technique with applications in exploratory data analysis, visualization
    and data preprocessing. The data is linearly transformed onto a new
    coordinate system such that the directions (principal components) capturing
    the largest variation in the data can be easily identified.

    Input
    -----
        data_frame : pd.dataframe
            Tabular data containing variables to be evaluated with standard
            column names and rows of sample data.
        independent_variables : Boolean or list of strings; default False
            list with column names to select from data_frame
            being characterized as independent variables (= environment)
        dependent_variables : Boolean or list of strings; default is False
            list with column names to select from data_frame
            being characterized as dependent variables (= species)
        n_comp : int, default is 2
            Number of components to report
        verbose : Boolean, The default is False.
           Set to True to get messages in the Console about the status of the run code.

    Output
    ------
        results : Dictionary
            containing the scores and loadings of the PCA,
            the percentage of the variation explained by the first principal components,
            the correlation coefficient between the first two PCs,
            names of columns (same length as loadings)
            names of indices (same length as scores)
    """
    if verbose:
        print('==============================================================')
        print(" Running function 'pca()' on data")
        print('==============================================================')

    data,cols= check_data_frame(data_frame)

    if independent_variables is False and dependent_variables is False:
        data_pca = data
        names_independent = cols
        names_dependent = []

    elif independent_variables is not False and dependent_variables is False:
        names_independent = extract_variables(cols,
                              independent_variables,
                              name_variables = 'independent variables'
                              )
        names_dependent = []
        data_pca = data[names_independent]
    elif independent_variables is False and dependent_variables is not False:
        names_dependent = extract_variables(cols,
                              dependent_variables,
                              name_variables = 'dependent variables'
                              )
        names_independent = []
        data_pca = data[names_dependent]

    else:
        names_independent = extract_variables(cols,
                              independent_variables,
                              name_variables = 'independent variables'
                              )
        names_dependent = extract_variables(cols,
                              dependent_variables,
                              name_variables = 'dependent variables'
                              )
        data_pca = data[names_independent + names_dependent]

    # Checking if the dimensions of the dataframe allow for PCA
    if data_pca.shape[0] < data_pca.shape[1]:
        raise ValueError("PCA not possible with more variables than samples.")

    try:
        # Using sklearn.decomposition.PCA with a number of components equal
        # to the number of variables, then getting the loadings, scores and explained variance ratio.
        pca = decomposition.PCA(n_components=len(data_pca.columns))
        pca.fit(data_pca)
        loadings = pca.components_.T
        PCAscores = pca.transform(data_pca)
        variances = pca.explained_variance_ratio_
    except(ValueError,TypeError):
        raise TypeError("Not all column values are numeric values (or NaN). Consider standardizing data first.")

    # Taking the first n_comp PCs for plotting
    if dependent_variables is False:
        loadings_independent = loadings[:, 0:n_comp]
        loadings_dependent = np.array([[],[]]).T
    else:
        loadings_independent = loadings[:-len(names_dependent), 0:n_comp]
        loadings_dependent = loadings[-len(names_dependent):, 0:n_comp]
    scores = PCAscores[:, 0:n_comp]
    percent_explained = np.around(100*variances/np.sum(variances), decimals=2)
    coef = np.corrcoef(scores[:,0], scores[:,1])[0,1]

    if verbose:
        print("Information about the success of the PCA:")
        print('----------------------------------------------------------------')
        for i in range(len(percent_explained)):
            print('Principal component {} explains {}% of the total variance.'.format(i+1,percent_explained[i]))
        print('\nThe correlation coefficient between PC1 and PC2 is {:.2e}.'.format(coef))
        print('----------------------------------------------------------------')

    results = {"method": 'pca',
               "loadings_dependent": loadings_dependent,
               "loadings_independent": loadings_independent,
               "names_independent" : names_independent,
               "names_dependent" : names_dependent,
               "scores": scores,
               "sample_index" : list(data_pca.index),
               "percent_explained": percent_explained,
               "corr_PC1_PC2": coef,
               }

    return results
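
To make the `percent_explained` entry concrete, the sketch below computes it by hand for the special case of two variables, using the closed-form eigenvalues of the 2x2 sample covariance matrix. It is a plain-Python illustration under that assumption, not the scikit-learn routine used above; the helper name `explained_percent_2d` and the data are hypothetical.

```python
import math


def explained_percent_2d(x, y):
    """Percentage of variance explained by the two principal components.

    Illustrative helper for exactly two variables; not part of mibiscreen.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Entries of the sample covariance matrix [[a, b], [b, c]].
    a = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    c = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    centre = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    # Clamp the smaller eigenvalue to guard against tiny negative round-off.
    eigenvalues = [centre + radius, max(centre - radius, 0.0)]
    total = sum(eigenvalues)
    return [round(100 * ev / total, 2) for ev in eigenvalues]


# Two perfectly correlated variables: one component carries all the variance.
correlated = explained_percent_2d([1, 2, 3, 4], [2, 4, 6, 8])
# Two uncorrelated variables with equal variance: the variance splits evenly.
uncorrelated = explained_percent_2d([1, -1, 1, -1], [1, 1, -1, -1])
print(correlated, uncorrelated)
```

The two cases bracket what the percentages can look like: strongly correlated input variables concentrate the variance in the first component, while uncorrelated variables of equal variance spread it evenly.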

rda(data_frame, independent_variables, dependent_variables, n_comp=2, verbose=False)

Function that performs Redundancy Analysis.

Function makes use of skbio.stats.ordination.RDA on the input data and gives the site scores and loadings.

Input
data_frame : pd.dataframe
    Tabular data containing variables to be evaluated with standard
    column names and rows of sample data.
independent_variables : list of strings
    list of column names to use as the independent variables (= environment)
dependent_variables : list of strings
    list of column names to use as the dependent variables (= species)
n_comp : int, default is 2
    number of dimensions to return
verbose : Boolean, The default is False.
    Set to True to get messages in the Console about the status of the run code.
Output
results : Dictionary
    * method: name of ordination method (str)
    * loadings_independent: loadings of independent variables (np.ndarray)
    * loadings_dependent: loadings of dependent variables (np.ndarray)
    * names_independent: names of independent variables (list of str)
    * names_dependent: names of dependent variables (list of str)
    * scores: scores (np.ndarray)
    * sample_index: names of samples (list of str)
Source code in mibiscreen/analysis/reduction/ordination.py
def rda(data_frame,
        independent_variables,
        dependent_variables,
        n_comp = 2,
        verbose = False,
        ):
    """Function that performs Redundancy Analysis.

    Function makes use of skbio.stats.ordination.RDA on the input data and gives
    the site scores and loadings.

    Input
    -----
        data_frame : pd.dataframe
            Tabular data containing variables to be evaluated with standard
            column names and rows of sample data.
        independent_variables : list of strings
            list of column names to use as the independent variables (= environment)
        dependent_variables : list of strings
            list of column names to use as the dependent variables (= species)
        n_comp : int, default is 2
            number of dimensions to return
        verbose : Boolean, The default is False.
            Set to True to get messages in the Console about the status of the run code.

    Output
    ------
        results : Dictionary
            * method: name of ordination method (str)
            * loadings_independent: loadings of independent variables (np.ndarray)
            * loadings_dependent: loadings of dependent variables (np.ndarray)
            * names_independent: names of independent variables (list of str)
            * names_dependent: names of dependent variables (list of str)
            * scores: scores (np.ndarray)
            * sample_index: names of samples (list of str)
    """
    if verbose:
        print('==============================================================')
        print(" Running function 'rda()' on data")
        print('==============================================================')

    results = constrained_ordination(data_frame,
                           independent_variables,
                           dependent_variables,
                           method = 'rda',
                           n_comp = n_comp,
                           )
    return results

stable_isotope_regression

Routines for performing linear regression on isotope data.

@author: Alraune Zech

Keeling_regression(concentration, delta_mix=None, relative_abundance=None, validate_indices=True, verbose=False, **kwargs)

Performing a linear regression linked to the Keeling plot.

A Keeling fit/plot is an approach to identify the isotopic composition of a contaminating source from measured concentrations and isotopic composition (delta) of a target species in the mix of the source and a pool.

It is based on the linear relationship between the inverse concentration and the delta-values (or alternatively the relative abundance x), which are measured over time or across a spatial interval, according to

delta_mix = delta_source + m * 1/c_mix

where m is the slope, which depends on the concentration and isotopic composition of the pool (which mixes with the source): m = (delta_pool - delta_source)*c_pool.
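
This relation can be derived from a two-component mass balance. The sketch below assumes that the mixture consists only of the pool and the source and that delta-values mix in proportion to concentration. The symbol c_source (the concentration contributed by the source) is introduced here for illustration and does not appear in the function.

```latex
\begin{align}
c_{mix} &= c_{pool} + c_{source} \\
\delta_{mix}\, c_{mix} &= \delta_{pool}\, c_{pool} + \delta_{source}\, c_{source}
  = \delta_{source}\, c_{mix} + \left(\delta_{pool} - \delta_{source}\right) c_{pool} \\
\delta_{mix} &= \delta_{source} + \left(\delta_{pool} - \delta_{source}\right) c_{pool} \cdot \frac{1}{c_{mix}}
\end{align}
```

Under these assumptions the intercept at 1/c_mix = 0 is delta_source, and the slope is (delta_pool - delta_source)*c_pool.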

The analysis is based on a linear regression of the delta (or x) values against the inverse concentration data. The parameter of interest, the delta (or relative_abundance, respectively) of the source quantity, is the intercept of the linear fit with the y-axis, in other words the constant term of the linear fit function.

A plot of the results with data and linear trendline can be generated with the method Keeling_plot() [in the module visualize].

Note that the approach is only applicable if (i) the isotopic composition of the unknown source is constant and (ii) the concentration and isotopic composition of the target compound in the pool are constant over time or across space (i.e. in the absence of contamination from the unknown source).

Input
concentration : np.array, pd.dataframe
    total molecular mass/molar concentration of target substance
    at different locations (at a time) or at different times (at one location)
delta_mix : np.array, pd.dataframe (same length as concentration), default None
    relative isotope ratio (delta-value) of target substance
relative_abundance : None or np.array, pd.dataframe (same length as concentration), default None
    relative abundance of target substance; used in place of delta_mix
    (in the inverse estimation and plotting) only if delta_mix is None
validate_indices: boolean, default True
    flag to run index validation (i.e. removal of NaN, infinite and zero values)
verbose : Boolean, The default is False.
    Set to True to get messages in the Console about the status of the run code.
**kwargs : dict
    dictionary of keyword arguments, e.g. for passing keywords on to
    valid_indices()
Returns
results : dict
    results of fitting, including:
        * coefficients : array/list of length 2, where coefficients[0]
            is the slope of the linear fit and coefficients[1] is the
            intercept of the linear fit with the y-axis, reflecting delta
            (or relative_abundance, respectively) of the source quantity
        * concentration: np.array with the concentration values used for
            fitting (after index validation, if enabled)
        * delta: np.array with the delta (or relative abundance) values used
            for fitting (after index validation, if enabled)
Source code in mibiscreen/analysis/reduction/stable_isotope_regression.py
def Keeling_regression(concentration,
                       delta_mix = None,
                       relative_abundance = None,
                       validate_indices = True,
                       verbose = False,
                       **kwargs,
                       ):
    """Performing a linear regression linked to the Keeling plot.

    A Keeling fit/plot is an approach to identify the isotopic composition of a
    contaminating source from measured concentrations and isotopic composition
    (delta) of a target species in the mix of the source and a pool.

    It is based on the linear relationship between the inverse concentration
    and the delta-values (or alternatively the relative abundance x), which are
    measured over time or across a spatial interval, according to

        delta_mix = delta_source + m * 1/c_mix

    where m is the slope, which depends on the concentration and isotopic
    composition of the pool (which mixes with the source):
    m = (delta_pool - delta_source)*c_pool.

    The analysis is based on a linear regression of the delta (or x) values
    against the inverse concentration data. The parameter of interest, the
    delta (or relative_abundance, respectively) of the source quantity, is the
    intercept of the linear fit with the y-axis, in other words the constant
    term of the linear fit function.

    A plot of the results with data and linear trendline can be generated with
    the method Keeling_plot() [in the module visualize].

    Note that the approach is only applicable if
        (i)  the isotopic composition of the unknown source is constant
        (ii) the concentration and isotopic composition of the target compound
            in the pool are constant (over time or across space)
            (i.e. in the absence of contamination from the unknown source)

    Input
    -----
        concentration : np.array, pd.dataframe
            total molecular mass/molar concentration of target substance
            at different locations (at a time) or at different times (at one location)
        delta_mix : np.array, pd.dataframe (same length as concentration), default None
            relative isotope ratio (delta-value) of target substance
        relative_abundance : None or np.array, pd.dataframe (same length as concentration), default None
            relative abundance of target substance; used in place of delta_mix
            (in the inverse estimation and plotting) only if delta_mix is None
        validate_indices: boolean, default True
            flag to run index validation (i.e. removal of NaN, infinite and zero values)
        verbose : Boolean, The default is False.
            Set to True to get messages in the Console about the status of the run code.
        **kwargs : dict
            dictionary of keyword arguments, e.g. for passing keywords on to
            valid_indices()

    Returns
    -------
        results : dict
            results of fitting, including:
                * coefficients : array/list of length 2, where coefficients[0]
                    is the slope of the linear fit and coefficients[1] is the
                    intercept of the linear fit with the y-axis, reflecting delta
                    (or relative_abundance, respectively) of the source quantity
                * concentration: np.array with the concentration values used for
                    fitting (after index validation, if enabled)
                * delta: np.array with the delta (or relative abundance) values
                    used for fitting (after index validation, if enabled)
    """
    if verbose:
        print('==============================================================')
        print(" Running function 'Keeling_regression()' on data")
        print('==============================================================')

    if delta_mix is not None:
        y = delta_mix
        text = 'delta'
    elif relative_abundance is not None:
        y = relative_abundance
        text = 'relative abundance'
    else:
        raise ValueError("One of the quantities 'delta_mix' or 'relative_abundance' must be provided")

    ### ---------------------------------------------------------------------------
    ### check length of data arrays and remove non-valid values (NaN, inf & zero)

    if validate_indices:
        data1, data2 = valid_indices(concentration,
                                 y,
                                 remove_nan = True,
                                 remove_infinity = True,
                                 remove_zero = True,
                                 **kwargs,
                                 )
    else:
        data1, data2 = concentration,y

    ### ---------------------------------------------------------------------------
    ### perform linear regression

    coefficients = np.polyfit(1./data1, data2, 1)

    if verbose:
        print("The {} of the source quantity, being the intercept".format(text))
        print("of the linear fit, is identified with {:.2f}".format(coefficients[1]))
        print('______________________________________________________________')

    results = dict(
        concentration = data1,
        delta = data2,
        coefficients = coefficients,
        )

    return results
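
The regression step can be illustrated without any dependencies. The sketch below is a plain-Python stand-in for the `np.polyfit(1./data1, data2, 1)` call above, applied to synthetic data. The helper name `keeling_fit` and all numerical values are assumptions made for illustration, and the simple filter only approximates what `valid_indices()` does.

```python
import math


def keeling_fit(concentration, delta_mix):
    """Least-squares fit of delta_mix against 1/concentration.

    Illustrative stand-in for np.polyfit(1/c, delta, 1); not part of
    mibiscreen. Returns (slope, intercept).
    """
    # Keep only pairs that are finite and have a non-zero concentration,
    # since the regression uses the inverse concentration.
    pairs = [(1.0 / c, d) for c, d in zip(concentration, delta_mix)
             if math.isfinite(c) and math.isfinite(d) and c != 0]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic data (assumed values): source delta -30, pool delta -10, c_pool 2.
delta_source, delta_pool, c_pool = -30.0, -10.0, 2.0
c_mix = [2.5, 4.0, 5.0, 8.0, 10.0, float("nan")]  # last sample is invalid
delta_mix = [delta_source + (delta_pool - delta_source) * c_pool / c for c in c_mix]

slope, intercept = keeling_fit(c_mix, delta_mix)
print(f"slope = {slope:.2f}, intercept (delta of source) = {intercept:.2f}")
```

Because the synthetic data follow the mixing relation exactly, the fitted intercept recovers the assumed source delta and the slope equals (delta_pool - delta_source)*c_pool. With measured data the fit would only approximate these values.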

Lambda_regression(delta_C, delta_H, validate_indices=True, verbose=False, **kwargs)

Performing linear regression to obtain the Lambda value.

The Lambda value relates the δ13C and δ2H signatures of a chemical compound. Relative changes in the ratio can indicate the occurrence of specific enzymatic degradation reactions.

The analysis is based on a linear regression of the hydrogen versus carbon isotope signatures. The parameter of interest, the Lambda value, is the slope of the linear trend line.

A plot of the results with data and linear trendline can be generated with the method Lambda_plot() [in the module visualize].

Input
delta_C : np.array, pd.Series
    relative isotope ratio (delta-value) of carbon of target molecule
delta_H : np.array, pd.Series (same length as delta_C)
    relative isotope ratio (delta-value) of hydrogen of target molecule
validate_indices: boolean, default True
    flag to run index validation (i.e. removal of nan and infinity values)
verbose : Boolean, The default is False.
    Set to True to get messages in the Console about the status of the run code.
**kwargs : dict
    dictionary of keyword arguments, e.g. for passing keywords on to
    valid_indices()
Returns
results : dict
    results of fitting, including:
        * coefficients : array/list of length 2, where coefficients[0]
            is the slope of the linear fit, reflecting the Lambda value,
            and coefficients[1] is the constant term (intercept) of the linear function
        * delta_C: np.array with isotope used for fitting - all samples
            where non-zero values are available for delta_C and delta_H
        * delta_H: np.array with isotope used for fitting - all samples
            where non-zero values are available for delta_C and delta_H
Source code in mibiscreen/analysis/reduction/stable_isotope_regression.py
def Lambda_regression(delta_C,
                      delta_H,
                      validate_indices = True,
                      verbose = False,
                      **kwargs,
                      ):
    """Performing linear regression to achieve Lambda value.

    The Lambda value relates the δ13C to the δ2H signatures of a chemical
    compound. Relative changes in the ratio can indicate the occurrence of
    specific enzymatic degradation reactions.

    The analysis is based on a linear regression of the hydrogen versus
    carbon isotope signatures. The parameter of interest, the Lambda value,
    is the slope of the linear trend line.

    A plot of the results with data and linear trend line can be generated with the
    method Lambda_plot() [in the module visualize].

    Input
    -----
        delta_C : np.array, pd.series
            relative isotope ratio (delta-value) of carbon of target molecule
        delta_H : np.array, pd.series (same length as delta_C)
            relative isotope ratio (delta-value) of hydrogen of target molecule
        validate_indices: boolean, default True
            flag to run index validation (i.e. removal of nan, infinity and zero values)
        verbose : Boolean, The default is False.
           Set to True to get messages in the Console about the status of the run code.
        **kwargs : dict
            keyword arguments dictionary, e.g. for passing forward keywords to
            valid_indices()

    Returns
    -------
        results : dict
            results of fitting, including:
                * coefficients : array/list of length 2, where coefficients[0]
                    is the slope of the linear fit, reflecting the Lambda value
                    and coefficients[1] is the intercept of the linear function
                * delta_C: np.array with isotope used for fitting - all samples
                    where non-zero values are available for delta_C and delta_H
                * delta_H: np.array with isotope used for fitting - all samples
                    where non-zero values are available for delta_C and delta_H
    """
    ### ---------------------------------------------------------------------------
    ### check length of data arrays and remove non-valid values (NaN, inf & zero)

    if verbose:
        print('==============================================================')
        print(" Running function 'Lambda_regression()' on data")
        print('==============================================================')

    if validate_indices:
        data1, data2 = valid_indices(delta_C,
                                     delta_H,
                                     remove_nan = True,
                                     remove_infinity = True,
                                     remove_zero=True,
                                     **kwargs,
                                     )
    else:
        data1, data2 = delta_C,delta_H

    ### ---------------------------------------------------------------------------
    ### perform linear regression

    coefficients = np.polyfit(data1, data2, 1)

    if verbose:
        print("Lambda value, being the slope of the linear fit is \n identified with {:.2f}".format(coefficients[0]))
        print('______________________________________________________________')

    results = dict(
        delta_C = data1,
        delta_H = data2,
        coefficients = coefficients,
        )

    return results
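
The regression step of Lambda_regression() can be tried out in isolation. The following is a minimal sketch using only NumPy; the delta values are synthetic numbers constructed for illustration (not measurements), and the variable names `lambda_value` and `intercept` are introduced here for readability.

```python
import numpy as np

# Synthetic delta values (per mil), constructed to lie exactly on the line
# delta_H = 12 * delta_C + 250. These numbers are illustrative only.
delta_C = np.array([-28.0, -27.5, -27.0, -26.0, -25.0])
delta_H = 12.0 * delta_C + 250.0

# Same call as in the listing above: first-degree polynomial fit of
# delta_H (y) on delta_C (x). np.polyfit returns the highest power first,
# so coefficients[0] is the slope (Lambda) and coefficients[1] the intercept.
coefficients = np.polyfit(delta_C, delta_H, 1)
lambda_value, intercept = coefficients

print("Lambda = {:.2f}, intercept = {:.2f}".format(lambda_value, intercept))
```

With real data the points scatter around the trend line, so the fitted slope is an estimate rather than an exact recovery as in this constructed case.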

Rayleigh_fractionation(concentration, delta, validate_indices=True, verbose=False, **kwargs)

Performing Rayleigh fractionation analysis.

Rayleigh fractionation is a common application to characterize the removal of a substance from a finite pool using stable isotopes. It is based on the change in the isotopic composition of the pool due to different kinetics of the change in lighter and heavier isotopes.

We follow the simplest approach, assuming that the substance removal follows first-order kinetics, where the rate coefficients for the lighter and heavier isotopes of the substance differ due to kinetic isotope fractionation effects. The isotopic composition of the remaining substance in the pool will change over time, leading to the so-called Rayleigh fractionation.

The analysis is based on a linear regression of the delta-values against the log-transformed concentration data. The parameter of interest, the kinetic fractionation factor (epsilon or alpha -1) of the removal process, is the slope of the linear trend line.

A plot of the results with data and linear trend line can be generated with the method Rayleigh_fractionation_plot() [in the module visualize].

Input
concentration : np.array, pd.dataframe
    total molecular mass/molar concentration of target substance
    at different locations (at a time) or at different times (at one location)
delta : np.array, pd.dataframe (same length as concentration)
    relative isotope ratio (delta-value) of target substance
validate_indices: boolean, default True
    flag to run index validation (i.e. removal of nan, infinity and zero values)
verbose : Boolean, The default is False.
   Set to True to get messages in the Console about the status of the run code.
**kwargs : dict
    keyword arguments dictionary, e.g. for passing forward keywords to
    valid_indices()
Returns
results : dict
    results of fitting, including:
        * coefficients : array/list of length 2, where coefficients[0]
            is the slope of the linear fit, reflecting the kinetic
            fractionation factor (epsilon or alpha -1) of the removal process
            and coefficients[1] is the intercept of the linear function
        * concentration: np.array with concentration data used for fitting - all
            samples where non-zero values are available for concentration and delta
        * delta: np.array with delta-values used for fitting - all samples
            where non-zero values are available for concentration and delta
Source code in mibiscreen/analysis/reduction/stable_isotope_regression.py
def Rayleigh_fractionation(concentration,
                           delta,
                           validate_indices = True,
                           verbose = False,
                           **kwargs,
                           ):
    """Performing Rayleigh fractionation analysis.

    Rayleigh fractionation is a common application to characterize the removal
    of a substance from a finite pool using stable isotopes. It is based on the
    change in the isotopic composition of the pool due to different kinetics of
    the change in lighter and heavier isotopes.

    We follow the simplest approach, assuming that the substance removal follows
    first-order kinetics, where the rate coefficients for the lighter and heavier
    isotopes of the substance differ due to kinetic isotope fractionation effects.
    The isotopic composition of the remaining substance in the pool will change
    over time, leading to the so-called Rayleigh fractionation.

    The analysis is based on a linear regression of the delta-values against the
    log-transformed concentration data. The parameter of interest, the kinetic
    fractionation factor (epsilon or alpha -1) of the removal process, is the slope
    of the linear trend line.

    A plot of the results with data and linear trend line can be generated with the
    method Rayleigh_fractionation_plot() [in the module visualize].

    Input
    -----
        concentration : np.array, pd.dataframe
            total molecular mass/molar concentration of target substance
            at different locations (at a time) or at different times (at one location)
        delta : np.array, pd.dataframe (same length as concentration)
            relative isotope ratio (delta-value) of target substance
        validate_indices: boolean, default True
            flag to run index validation (i.e. removal of nan, infinity and zero values)
        verbose : Boolean, The default is False.
           Set to True to get messages in the Console about the status of the run code.
        **kwargs : dict
            keyword arguments dictionary, e.g. for passing forward keywords to
            valid_indices()

    Returns
    -------
        results : dict
            results of fitting, including:
                * coefficients : array/list of length 2, where coefficients[0]
                    is the slope of the linear fit, reflecting the kinetic
                    fractionation factor (epsilon or alpha -1) of the removal process
                    and coefficients[1] is the intercept of the linear function
                * concentration: np.array with concentration data used for fitting - all
                    samples where non-zero values are available for concentration and delta
                * delta: np.array with delta-values used for fitting - all samples
                    where non-zero values are available for concentration and delta
    """
    ### ---------------------------------------------------------------------------
    ### check length of data arrays and remove non-valid values (NaN, inf & zero)
    if verbose:
        print('==============================================================')
        print(" Running function 'Rayleigh_fractionation()' on data")
        print('==============================================================')

    if validate_indices:
        data1, data2 = valid_indices(concentration,
                                 delta,
                                 remove_nan = True,
                                 remove_infinity = True,
                                 remove_zero = True,
                                 **kwargs,
                                 )
    else:
        data1, data2 = concentration,delta

    ### ---------------------------------------------------------------------------
    ### perform linear regression
    if np.any(data1<=0):
        raise ValueError("Concentration data contains zero or negative values, but has to be strictly positive.")

    coefficients = np.polyfit(np.log(data1), data2, 1)

    if verbose:
        print("The kinetic fractionation factor ('epsilon' or 'alpha-1') of")
        print("the removal process, being the slope of the linear fit, is ")
        print("identified with {:.2f}".format(coefficients[0]))
        print('______________________________________________________________')

    results = dict(
        concentration = data1,
        delta = data2,
        coefficients = coefficients,
        )

    return results
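
To see why the slope of this fit corresponds to the fractionation factor, the sketch below generates synthetic data from the commonly used linearised Rayleigh relation, delta = delta_0 + epsilon * ln(C / C_0), and then applies the same np.polyfit call as the listing above. All values (`c_0`, `delta_0`, `epsilon`) are assumptions chosen for illustration, not data from the package.

```python
import numpy as np

# Assumed illustrative parameters: initial concentration, initial delta value
# (per mil) and fractionation factor epsilon (per mil).
c_0 = 100.0
delta_0 = -28.0
epsilon = -2.0

# Concentrations of the remaining substance and the delta values implied by
# the linearised Rayleigh relation delta = delta_0 + epsilon * ln(C / C_0).
concentration = np.array([100.0, 50.0, 25.0, 10.0, 5.0])
delta = delta_0 + epsilon * np.log(concentration / c_0)

# Same regression as in Rayleigh_fractionation(): delta against ln(C).
# The slope recovers epsilon; the intercept equals delta_0 - epsilon * ln(C_0).
coefficients = np.polyfit(np.log(concentration), delta, 1)
epsilon_fit, intercept = coefficients

print("epsilon = {:.2f}".format(epsilon_fit))
```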

extract_isotope_data(df, molecule, name_13C='delta_13C', name_2H='delta_2H')

Extracts isotope data from standardised input-dataframe.

Parameters

df : pd.dataframe
    numeric (observational) data
molecule : str
    name of contaminant molecule to extract isotope data for
name_13C : str, default 'delta_13C' (standard name)
    name of C13 isotope to extract data for
name_2H : str, default 'delta_2H' (standard name)
    name of deuterium isotope to extract data for

Returns

C_data : np.array
    numeric isotope data
H_data : np.array
    numeric isotope data

Source code in mibiscreen/analysis/reduction/stable_isotope_regression.py
def extract_isotope_data(df,
                         molecule,
                         name_13C = 'delta_13C',
                         name_2H = 'delta_2H',
                         ):
    """Extracts isotope data from standardised input-dataframe.

    Parameters
    ----------
    df : pd.dataframe
        numeric (observational) data
    molecule : str
        name of contaminant molecule to extract isotope data for
    name_13C : str, default 'delta_13C' (standard name)
        name of C13 isotope to extract data for
    name_2H : str, default 'delta_2H' (standard name)
        name of deuterium isotope to extract data for

    Returns
    -------
    C_data : np.array
        numeric isotope data
    H_data : np.array
        numeric isotope data

    """
    molecule_standard = names_contaminants.get(molecule.lower(), False)
    isotope_13C = names_isotopes.get(name_13C.lower(), False)
    isotope_2H = names_isotopes.get(name_2H.lower(), False)

    if molecule_standard is False:
        raise ValueError("Contaminant (name) unknown: {}".format(molecule))
    if isotope_13C is False:
        raise ValueError("Isotope (name) unknown: {}".format(name_13C))
    if isotope_2H is False:
        raise ValueError("Isotope (name) unknown: {}".format(name_2H))

    name_C = '{}-{}'.format(isotope_13C,molecule_standard)
    name_H = '{}-{}'.format(isotope_2H,molecule_standard)

    if name_C not in df.columns.to_list():
        raise ValueError("No isotope data available for : {}".format(name_C))
    if name_H not in df.columns.to_list():
        raise ValueError("No isotope data available for : {}".format(name_H))

    C_data = df[name_C].values
    H_data = df[name_H].values

    return C_data, H_data
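
The column lookup above relies on names of the form isotope-molecule. The sketch below mimics that lookup with pandas alone. The two dictionaries are hypothetical stand-ins for mibiscreen's `names_contaminants` and `names_isotopes`; their real entries and standard names may differ, and the data values are invented.

```python
import pandas as pd

# Hypothetical stand-ins for the package's name dictionaries (assumption:
# keys are lower-case aliases, values are the standard names).
names_contaminants = {"benzene": "benzene"}
names_isotopes = {"delta_13c": "delta_13C", "delta_2h": "delta_2H"}

# Invented example data with columns named '<isotope>-<molecule>'.
df = pd.DataFrame({
    "delta_13C-benzene": [-27.1, -26.4, -25.9],
    "delta_2H-benzene": [-95.0, -88.0, -81.5],
})

# Build the column names the same way as extract_isotope_data() does.
molecule_standard = names_contaminants["Benzene".lower()]
name_C = "{}-{}".format(names_isotopes["delta_13C".lower()], molecule_standard)
name_H = "{}-{}".format(names_isotopes["delta_2H".lower()], molecule_standard)

# Pull the two columns out as plain NumPy arrays.
C_data = df[name_C].values
H_data = df[name_H].values
```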

valid_indices(data1, data2, remove_nan=True, remove_infinity=True, remove_zero=False, **kwargs)

Identifies valid indices in two equally long arrays and compresses both.

Values that can optionally be removed from the arrays are: nan, infinity and zero values.

Parameters

data1 : np.array or pd.series
    numeric data
data2 : np.array or pd.series (same len/shape as data1)
    numeric data
remove_nan : boolean, default True
    flag to remove nan-values
remove_infinity : boolean, default True
    flag to remove infinity values
remove_zero : boolean, default False
    flag to remove zero values
**kwargs : dict
    keyword arguments dictionary

Returns

data1 : np.array or pd.series
    numeric data of reduced length, containing only the data at valid indices
data2 : np.array or pd.series
    numeric data of reduced length, containing only the data at valid indices

Source code in mibiscreen/analysis/reduction/stable_isotope_regression.py
def valid_indices(data1,
                  data2,
                  remove_nan = True,
                  remove_infinity = True,
                  remove_zero = False,
                  **kwargs,
                  ):
    """Identifies valid indices in two equally long arrays and compresses both.

    Values that can optionally be removed from the arrays are: nan, infinity and zero values.

    Parameters
    ----------
    data1 : np.array or pd.series
        numeric data
    data2 : np.array or pd.series (same len/shape as data1)
        numeric data
    remove_nan : boolean, default True
        flag to remove nan-values
    remove_infinity : boolean, default True
        flag to remove infinity values
    remove_zero : boolean, default False
        flag to remove zero values
    **kwargs : dict
        keyword arguments dictionary

    Returns
    -------
    data1 : np.array or pd.series
        numeric data of reduced length, containing only the data at valid indices
    data2 : np.array or pd.series
        numeric data of reduced length, containing only the data at valid indices

    """
    if data1.shape != data2.shape:
        raise ValueError("Shape of provided data must be identical.")

    valid_indices = np.full(data1.shape, True, dtype=bool)

    # each flag checks both arrays, so the flags act independently of each other
    if remove_nan:
        valid_indices &= ~np.isnan(data1) & ~np.isnan(data2)
    if remove_infinity:
        valid_indices &= ~np.isinf(data1) & ~np.isinf(data2)
    if remove_zero:
        valid_indices *= (data1 != 0) & (data2 != 0)

    return data1[valid_indices],data2[valid_indices]
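
The masking technique can be illustrated on a small pair of arrays. This is a sketch with NumPy only, in which every condition is checked on both arrays; the input values are invented, and the names `mask`, `clean1` and `clean2` are introduced here.

```python
import numpy as np

# Invented input: one NaN, one infinity and one zero spread over both arrays.
data1 = np.array([1.0, np.nan, 3.0, 0.0, 5.0, 6.0])
data2 = np.array([10.0, 20.0, np.inf, 40.0, 50.0, 60.0])

# Start with all positions valid, then switch off every position that
# fails a check in either of the two arrays.
mask = np.full(data1.shape, True, dtype=bool)
mask &= ~np.isnan(data1) & ~np.isnan(data2)   # drop NaN
mask &= ~np.isinf(data1) & ~np.isinf(data2)   # drop +/- infinity
mask &= (data1 != 0) & (data2 != 0)           # drop zeros

# Boolean indexing with the same mask keeps both arrays aligned.
clean1, clean2 = data1[mask], data2[mask]
```

Because one shared mask is applied to both arrays, a sample is dropped from both as soon as either value is unusable, which keeps the pairs consistent for a subsequent regression.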

transformation

Routines for performing ordination statistics on sample data.

@author: Alraune Zech, Jorrit Bakker

filter_values(data_frame, replace_NaN='remove', drop_rows=[], inplace=False, verbose=False)

Filtering values of dataframes for ordination to assure all are numeric.

Ordination methods require all cells to be filled. This method checks the provided data frame if values are missing/NaN or not numeric and handles missing/NaN values accordingly.

It then removes select rows and mutates the cells containing NULL values based on the input parameters.

Input
data_frame : pd.dataframe
    Tabular data containing variables to be evaluated with standard
    column names and rows of sample data.
replace_NaN : string or float, default "remove"
    Keyword specifying how to handle missing/NaN/non-numeric values, options:
        - remove: remove rows with missing values
        - zero: replace values with 0.0
        - average: replace the missing values with the average of the variable
                    (using all other available samples)
        - median: replace the missing values with the median of the variable
                                (using all other available samples)
        - float-value: replace all empty cells with that numeric value
drop_rows : List, default [] (empty list)
    List of rows that should be removed from dataframe.
inplace: bool, default False
    If False, return a copy. Otherwise, do operation in place.
verbose : Boolean, The default is False.
   Set to True to get messages in the Console about the status of the run code.
Output
data_filtered : pd.dataframe
    Tabular data containing filtered data.
Source code in mibiscreen/analysis/reduction/transformation.py
def filter_values(data_frame,
                  replace_NaN = 'remove',
                  drop_rows = [],
                  inplace = False,
                  verbose = False):
    """Filtering values of dataframes for ordination to assure all are numeric.

    Ordination methods require all cells to be filled. This method checks the
    provided data frame if values are missing/NaN or not numeric and handles
    missing/NaN values accordingly.

    It then removes select rows and mutates the cells containing NULL values based
    on the input parameters.

    Input
    -----
        data_frame : pd.dataframe
            Tabular data containing variables to be evaluated with standard
            column names and rows of sample data.
        replace_NaN : string or float, default "remove"
            Keyword specifying how to handle missing/NaN/non-numeric values, options:
                - remove: remove rows with missing values
                - zero: replace values with 0.0
                - average: replace the missing values with the average of the variable
                            (using all other available samples)
                - median: replace the missing values with the median of the variable
                                        (using all other available samples)
                - float-value: replace all empty cells with that numeric value
        drop_rows : List, default [] (empty list)
            List of rows that should be removed from dataframe.
        inplace: bool, default False
            If False, return a copy. Otherwise, do operation in place.
        verbose : Boolean, The default is False.
           Set to True to get messages in the Console about the status of the run code.

    Output
    ------
        data_filtered : pd.dataframe
            Tabular data containing filtered data.
    """
    data,cols= check_data_frame(data_frame,inplace = inplace)

    if verbose:
        print("==============================================================================")
        print('Perform filtering of values since ordination requires all values to be numeric.')

    if len(drop_rows)>0:
        data.drop(drop_rows, inplace = True)
        if verbose:
            print('The samples of rows {} have been removed'.format(drop_rows))

    # Identifying which rows and columns contain any amount of NULL cells and putting them in a list.
    NaN_rows = data[data.isna().any(axis=1)].index.tolist()
    NaN_cols = data.columns[data.isna().any()].tolist()

    # If there are any rows containing NULL cells, the NULL values will be filtered
    if len(NaN_rows)>0:
        if replace_NaN == 'remove':
            data.drop(NaN_rows, inplace = True)
            text = 'The sample row(s) have been removed since they contain NaN values: {}'.format(NaN_rows)
        elif replace_NaN == 'zero':
            set_NaN = 0.0
            data.fillna(set_NaN, inplace = True)
            text = 'The values of the empty cells have been set to zero (0.0)'
        elif isinstance(replace_NaN, (float, int)):
            set_NaN = float(replace_NaN)
            data.fillna(set_NaN, inplace = True)
            text = 'The values of the empty cells have been set to the value of {}'.format(set_NaN)
        elif replace_NaN == "average":
            for var in NaN_cols:
                data[var] = data[var].fillna(data[var].mean(skipna = True))
            # implicit string concatenation avoids a run of spaces inside the message
            text = ('The values of the empty cells have been replaced by the average of '
                    'the corresponding variables (using all other available samples).')
        elif replace_NaN == "median":
            for var in NaN_cols:
                data[var] = data[var].fillna(data[var].median(skipna = True))
            text = ('The values of the empty cells have been replaced by the median of '
                    'the corresponding variables (using all other available samples).')
        else:
            raise ValueError("Value of 'replace_NaN' unknown: {}".format(replace_NaN))
    else:
        text = 'No data to be filtered out.'

    if verbose:
        print(text)

    return data
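
The effect of the different replace_NaN options can be compared with plain pandas calls. The sketch below does not call filter_values() itself; it reproduces the underlying operations on an invented data frame, and the result names (`removed`, `zero_filled`, and so on) are introduced here.

```python
import numpy as np
import pandas as pd

# Invented sample data with one missing value per column.
data = pd.DataFrame({
    "benzene": [1.0, np.nan, 3.0, 8.0],
    "toluene": [2.0, 4.0, np.nan, 6.0],
})

# 'remove': drop every row that holds at least one NaN.
nan_rows = data[data.isna().any(axis=1)].index.tolist()
removed = data.drop(nan_rows)

# 'zero': replace missing values with 0.0.
zero_filled = data.fillna(0.0)

# 'average' / 'median': replace missing values column by column with the
# statistic computed from the available samples of that column.
mean_filled = data.fillna(data.mean(skipna=True))
median_filled = data.fillna(data.median(skipna=True))
```

Removing rows keeps only fully observed samples, whereas the fill options keep the sample count at the cost of introducing imputed values.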

transform_values(data_frame, name_list='all', how='log_scale', log_scale_A=1, log_scale_B=1, inplace=False, verbose=False)

Transforming data of specified variables in a dataframe.

Args:
data_frame: pandas.DataFrame
    dataframe with the measurements
name_list: string or list of strings, default 'all'
    list of quantities (column names) to perform transformation on
how: string, default 'log_scale'
    Type of transformation:
        * standardize
        * log_scale
        * center
log_scale_A : Integer or float, default 1
    Log transformation parameter A: log10(Ax+B).
log_scale_B : Integer or float, default 1
    Log transformation parameter B: log10(Ax+B).
inplace: bool, default False
    If False, operate on a copy. Otherwise, do operation in place.
    The transformed dataframe is returned.
verbose : Boolean, The default is False.
   Set to True to get messages in the Console about the status of the run code.

Returns:
data: pd.DataFrame
    dataframe with the transformed measurements
Raises:

None (yet).

Example:

To be added.

Source code in mibiscreen/analysis/reduction/transformation.py
def transform_values(data_frame,
                     name_list = 'all',
                     how = 'log_scale',
                     log_scale_A = 1,
                     log_scale_B = 1,
                     inplace = False,
                     verbose = False,
                     ):
    """Transforming data of specified variables in a dataframe.

    Args:
    -------
        data_frame: pandas.DataFrame
            dataframe with the measurements
        name_list: string or list of strings, default 'all'
            list of quantities (column names) to perform transformation on
        how: string, default 'log_scale'
            Type of transformation:
                * standardize
                * log_scale
                * center
        log_scale_A : Integer or float, default 1
            Log transformation parameter A: log10(Ax+B).
        log_scale_B : Integer or float, default 1
            Log transformation parameter B: log10(Ax+B).
        inplace: bool, default False
            If False, operate on a copy. Otherwise, do operation in place.
            The transformed dataframe is returned.
        verbose : Boolean, The default is False.
           Set to True to get messages in the Console about the status of the run code.

    Returns:
    -------
        data: pd.DataFrame
            dataframe with the measurements

    Raises:
    -------
    None (yet).

    Example:
    -------
    To be added.
    """
    data,cols= check_data_frame(data_frame,inplace = inplace)

    if verbose:
        print('==============================================================')
        print(" Running function 'transform_values()' on data")
        print('==============================================================')

    if name_list == 'all':
        intersection = list(set(cols) - set(setting_data))

    else:
        if isinstance(name_list, str):
            name_list = [name_list]
        intersection,remainder_list1,remainder_list2 = compare_lists(cols,name_list)
        if len(intersection) < len(name_list):
            print("WARNING: not all variables in name_list are found in dataframe.")
            print('----------------------------------------------------------------')
            print("Column names identified:", intersection)
            print("Column names not identified in data:", remainder_list2)
            print('________________________________________________________________')

    for quantity in intersection:
        if how == 'log_scale':
            data[quantity] = np.log10(log_scale_A * data[quantity] + log_scale_B)
        elif how == 'center':
            data[quantity] =  data[quantity]-data[quantity].mean()
        elif how == 'standardize':
            data[quantity] = zscore(data[quantity].values)
        else:
            raise ValueError("Value of 'how' unknown: {}".format(how))

    return data
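
The three transformation types can be compared on a single column. The sketch below reproduces the formulas with NumPy and pandas only. The standardisation is written out by hand under the assumption that `zscore` in the listing is scipy.stats.zscore with its default population standard deviation (ddof=0); the data values are invented.

```python
import numpy as np
import pandas as pd

# Invented concentrations spanning several orders of magnitude.
data = pd.DataFrame({"benzene": [0.0, 9.0, 99.0, 999.0]})
log_scale_A, log_scale_B = 1, 1

# 'log_scale': log10(A*x + B); B = 1 keeps zero concentrations finite.
log_scaled = np.log10(log_scale_A * data["benzene"] + log_scale_B)

# 'center': subtract the column mean, so the result has zero mean.
centered = data["benzene"] - data["benzene"].mean()

# 'standardize': z-score, written out by hand (assumption: ddof=0, which
# is the default of scipy.stats.zscore and of np.std).
values = data["benzene"].values
standardized = (values - values.mean()) / values.std()
```

Log scaling compresses the range of strongly skewed concentration data, while centering and standardising only shift and rescale it.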

sample

mibiscreen module for data analysis performed on each sample.

concentrations

Routines for calculating total concentrations and counts for samples.

@author: Alraune Zech

total_concentration(data_frame, name_list='all', verbose=False, include=False, **kwargs)

Calculate total concentration of given list of quantities.

Input
data_frame: pd.DataFrame
    Contaminant concentrations in [ug/l], i.e. microgram per liter
name_list: str or list, default is 'all'
    either short name for group of quantities to use, such as:
            - 'all' (all quantities given in data frame except settings)
            - 'BTEX' (for benzene, toluene, ethylbenzene, xylene)
            - 'BTEXIIN' (for benzene, toluene, ethylbenzene, xylene,
                          indene, indane and naphthalene)
    or list of strings with names of quantities to use
verbose: Boolean
    verbose flag (default False)
include: bool, default False
    whether to add the calculated values as a column to the DataFrame
Output
tot_conc: pd.Series
    Total concentration of contaminants in [ug/l]
Source code in mibiscreen/analysis/sample/concentrations.py
def total_concentration(
        data_frame,
        name_list = "all",
        verbose = False,
        include = False,
        **kwargs,
        ):
    """Calculate total concentration of given list of quantities.

    Input
    -----
        data_frame: pd.DataFrame
            Contaminant concentrations in [ug/l], i.e. microgram per liter
        name_list: str or list, default is 'all'
            either short name for group of quantities to use, such as:
                    - 'all' (all quantities given in data frame except settings)
                    - 'BTEX' (for benzene, toluene, ethylbenzene, xylene)
                    - 'BTEXIIN' (for benzene, toluene, ethylbenzene, xylene,
                                  indene, indane and naphthalene)
            or list of strings with names of quantities to use
        verbose: Boolean
            verbose flag (default False)
        include: bool, default False
            whether to add the calculated values as a column to the DataFrame


    Output
    ------
        tot_conc: pd.Series
            Total concentration of contaminants in [ug/l]

    """
    if verbose:
        print('==============================================================')
        print(" Running function 'total_concentration()' on data")
        print('==============================================================')

    ### check on correct data input format and extracting column names as list
    data,cols= check_data_frame(data_frame,inplace = include)

    ### sorting out which columns in data to use for summation of concentrations
    quantities = determine_quantities(cols,name_list = name_list, verbose = verbose)

    ### actually performing summation
    tot_conc = data[quantities].sum(axis = 1)

    if isinstance(name_list, str):
        name_column = 'total concentration {}'.format(name_list)
    elif isinstance(name_list, list):
        name_column = 'total concentration selection'

    tot_conc.rename(name_column,inplace = True)
    if verbose:
        print('________________________________________________________________')
        print("{} in [ug/l] is:\n{}".format(name_column,tot_conc))
        print('--------------------------------------------------')

    ### adding series to data frame
    if include:
        data[name_column] = tot_conc

    return tot_conc
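
The summation itself is a row-wise sum over the selected columns. A minimal sketch with pandas, using invented concentrations and an explicit list of column names in place of the package's determine_quantities() helper:

```python
import pandas as pd

# Invented sample data in ug/l; 'sample_nr' plays the role of a settings
# column that must not enter the sum.
data = pd.DataFrame({
    "sample_nr": ["S1", "S2", "S3"],
    "benzene": [10.0, 0.0, 5.0],
    "toluene": [2.0, 0.5, 0.0],
    "ethylbenzene": [1.0, 0.0, 0.0],
})

# Explicit selection of the quantities to add up (assumption: stands in
# for the list that determine_quantities() would return).
quantities = ["benzene", "toluene", "ethylbenzene"]

# Row-wise sum gives one total per sample; the Series is then named in the
# same way as in the listing above for a list-type selection.
tot_conc = data[quantities].sum(axis=1).rename("total concentration selection")
```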

total_count(data_frame, name_list='all', threshold=0.0, verbose=False, include=False, **kwargs)

Calculate total number of quantities with concentration exceeding threshold value.

Input
data_frame: pd.DataFrame
    Contaminant concentrations in [ug/l], i.e. microgram per liter
name_list: str or list, default is 'all'
    either short name for group of quantities to use, such as:
            - 'all' (all quantities given in data frame except settings)
            - 'BTEX' (for benzene, toluene, ethylbenzene, xylene)
            - 'BTEXIIN' (for benzene, toluene, ethylbenzene, xylene,
                          indene, indane and naphthalene)
    or list of strings with names of quantities to use
threshold: float, default 0
    threshold concentration value in [ug/l] to test for exceedance
verbose: Boolean
    verbose flag (default False)
include: bool, default False
    whether to add the calculated values as a column to the DataFrame
Output
tot_count: pd.Series
    Total number of quantities with concentration exceeding threshold value
Source code in mibiscreen/analysis/sample/concentrations.py
def total_count(
        data_frame,
        name_list = "all",
        threshold = 0.,
        verbose = False,
        include = False,
        **kwargs,
        ):
    """Calculate total number of quantities with concentration exceeding threshold value.

    Input
    -----
        data_frame: pd.DataFrame
            Contaminant concentrations in [ug/l], i.e. microgram per liter
        name_list: str or list, default is 'all'
            either short name for group of quantities to use, such as:
                    - 'all' (all quantities given in data frame except settings)
                    - 'BTEX' (for benzene, toluene, ethylbenzene, xylene)
                    - 'BTEXIIN' (for benzene, toluene, ethylbenzene, xylene,
                                  indene, indane and naphthalene)
            or list of strings with names of quantities to use
        threshold: float, default 0
            threshold concentration value in [ug/l] to test for exceedance
        verbose: Boolean
            verbose flag (default False)
        include: bool, default False
            whether to add the calculated values as a column to the DataFrame

    Output
    ------
        tot_count: pd.Series
            Total number of quantities with concentration exceeding threshold value

    """
    if verbose:
        print('==============================================================')
        print(" Running function 'total_count()' on data")
        print('==============================================================')

    threshold = float(threshold)
    if threshold<0:
        raise ValueError("Threshold value '{}' not valid.".format(threshold))

    ### check on correct data input format and extracting column names as list
    data,cols= check_data_frame(data_frame,inplace = include)

    ### sorting out which column in data to use for summation of concentrations
    quantities = determine_quantities(cols,name_list = name_list, verbose = verbose)

    ### actually performing count of values above threshold:
    total_count = (data[quantities]>threshold).sum(axis = 1)

    if isinstance(name_list, str):
        name_column = 'total count {}'.format(name_list)
    elif isinstance(name_list, list):
        name_column = 'total count selection'
    total_count.rename(name_column,inplace = True)

    if verbose:
        print('________________________________________________________________')
        print("Number of quantities out of {} exceeding "
              "concentration of {:.2f} ug/l :\n{}".format(len(quantities),threshold,total_count))
        print('--------------------------------------------------')

    if include:
        data[name_column] = total_count

    return total_count
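
As a minimal sketch of the counting step above (not the package's own code), the following standalone pandas example builds a boolean exceedance mask and sums it per row. The column names, sample labels and the threshold of 4 ug/l are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: concentrations in ug/l, one row per sample.
samples = pd.DataFrame(
    {
        "benzene": [0.0, 12.5, 3.0],
        "toluene": [5.0, 0.0, 40.0],
        "ethylbenzene": [0.0, 0.0, 7.5],
    },
    index=["s1", "s2", "s3"],
)

threshold = 4.0  # ug/l, illustrative value only

# The comparison gives a boolean mask; summing along axis=1 counts
# the number of quantities above the threshold for each sample.
exceed_count = (samples > threshold).sum(axis=1)
exceed_count = exceed_count.rename("total count all")

print(exceed_count.to_dict())
```

Because the comparison is strict (`>`), the default threshold of 0 counts every quantity with a non-zero concentration.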

properties

Properties for Natural Attenuation Screening.

File containing name specifications of quantities and parameters measured in groundwater samples useful for biodegradation and bioremediation analysis.

@author: A. Zech

screening_NA

Routines for calculating natural attenuation potential.

@author: Alraune Zech

NA_traffic(data, inplace=False, verbose=False, **kwargs)

Function evaluating if natural attenuation (NA) is ongoing.

Function to calculate electron balance, based on electron availability calculated from concentrations of contaminant and electron acceptors.

Input
data: pd.DataFrame
    Ratio of electron availability
inplace: bool, default False
    Whether to modify the DataFrame rather than creating a new one.
verbose: Boolean
    verbose flag (default False)
Output
traffic : pd.Series
    Traffic light (decision) based on ratio of electron availability
Source code in mibiscreen/analysis/sample/screening_NA.py
def NA_traffic(
        data,
        inplace = False,
        verbose = False,
        **kwargs,
        ):
    """Function evaluating if natural attenuation (NA) is ongoing.

    Function to calculate electron balance, based on electron availability
    calculated from concentrations of contaminant and electron acceptors.

    Input
    -----
        data: pd.DataFrame
            Ratio of electron availability
        inplace: bool, default False
            Whether to modify the DataFrame rather than creating a new one.
        verbose: Boolean
            verbose flag (default False)

    Output
    ------
        traffic : pd.Series
            Traffic light (decision) based on ratio of electron availability

    """
    if verbose:
        print('==============================================================')
        print(" Running function 'NA_traffic()' on data")
        print('==============================================================')

    cols = check_data(data)

    if names.name_e_balance in cols:
        e_balance = data[names.name_e_balance]
    else:
        e_balance = electron_balance(data,**kwargs)
        # raise ValueError("Electron balance not given in data.")

    e_bal = e_balance.values
    traffic = np.where(e_bal<1,"red","green")
    traffic[np.isnan(e_bal)] = 'y'

    NA_traffic = pd.Series(name =names.name_na_traffic_light,data = traffic,index = e_balance.index)

    if inplace:
        data[names.name_na_traffic_light] = NA_traffic

    if verbose:
        print("Evaluation if natural attenuation (NA) is ongoing:")#" for {}".format(contaminant_group))
        print('--------------------------------------------------')
        print("Red light: Reduction is limited at {} out of {} locations".format(
            np.sum(traffic == "red"),len(e_bal)))
        print("Green light: Reduction is not limited at {} out of {} locations".format(
            np.sum(traffic == "green"),len(e_bal)))
        print("Yellow light: No decision possible at {} out of {} locations".format(
            np.sum(np.isnan(e_bal)),len(e_bal)))
        print('________________________________________________________________')

    return NA_traffic
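
The decision rule used in NA_traffic() can be reproduced in isolation, as sketched below. The electron balance values, well labels and the series name `na_traffic_light` are hypothetical; in the package the name comes from `names.name_na_traffic_light`.

```python
import numpy as np
import pandas as pd

# Hypothetical electron balance per well (ratio, dimensionless).
e_balance = pd.Series(
    [0.4, 2.5, np.nan, 1.0],
    index=["w1", "w2", "w3", "w4"],
    name="e_balance",
)

e_bal = e_balance.values
# A ratio below 1 is flagged red, everything else green.
traffic = np.where(e_bal < 1, "red", "green")
# Missing values cannot be decided and are overwritten with 'y' (yellow).
traffic[np.isnan(e_bal)] = "y"

traffic_light = pd.Series(traffic, index=e_balance.index, name="na_traffic_light")
print(traffic_light.to_dict())
```

A ratio of exactly 1 is classified as green, since only values strictly below 1 are marked red.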

available_NP(data, inplace=False, verbose=False, **kwargs)

Function calculating available nutrients.

Approximates the amount of hydrocarbons that can be degraded based on the amount of available nutrients (nitrogen and phosphate).

Input
data: pd.DataFrame
    nitrate, nitrite and phosphate concentrations in [mg/l]
inplace: bool, default False
    Whether to modify the DataFrame rather than creating a new one.
verbose: Boolean
    verbose flag (default False)

Output
NP_avail: pd.Series
    The amount of nutrients for degrading contaminants
Source code in mibiscreen/analysis/sample/screening_NA.py
def available_NP(
        data,
        inplace = False,
        verbose = False,
        **kwargs,
        ):
    """Function calculating available nutrients.

    Approximates the amount of hydrocarbons that can be degraded based
    on the amount of available nutrients (nitrogen and phosphate).

    Input
    -----
        data: pd.DataFrame
            nitrate, nitrite and phosphate concentrations in [mg/l]
        inplace: bool, default False
            Whether to modify the DataFrame rather than creating a new one.
        verbose: Boolean
            verbose flag (default False)

    Output
    ------
        NP_avail: pd.Series
            The amount of nutrients for degrading contaminants

    """
    if verbose:
        print('==============================================================')
        print(" Running function 'available_NP()' on data")
        print('==============================================================')

    cols = check_data(data)

    nutrient_list = [names.name_nitrate, names.name_nitrite, names.name_phosphate]
    list_nut_miss = []

    for nut in nutrient_list:
        if nut not in cols:
            list_nut_miss.append(nut)
    if len(list_nut_miss)>0:
        raise ValueError("Concentrations of nutrient(s) missing: {}".format(list_nut_miss))

    CNs = (data[names.name_nitrate] + data[names.name_nitrite]) * (39. / 4.5)
    CPs = data[names.name_phosphate] * (39. / 1.)
    NP_avail =CNs.combine(CPs, min, 0)
    NP_avail.name = names.name_NP_avail

    if inplace:
        data[names.name_NP_avail] = NP_avail

    if verbose:
        print("Total NP available is:\n{}".format(NP_avail))
        print('----------------------')

    return NP_avail
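
The limiting-nutrient step above can be sketched with plain pandas as follows. The conversion factors 39/4.5 and 39/1 are copied from the source without further interpretation; the column names and concentrations are hypothetical.

```python
import pandas as pd

# Hypothetical nutrient concentrations in mg/l.
nutrients = pd.DataFrame(
    {
        "nitrate": [4.5, 0.9],
        "nitrite": [0.0, 0.0],
        "phosphate": [0.5, 2.0],
    }
)

# Degradable amount supported by nitrogen and by phosphate, respectively.
c_n = (nutrients["nitrate"] + nutrients["nitrite"]) * (39.0 / 4.5)
c_p = nutrients["phosphate"] * (39.0 / 1.0)

# Element-wise minimum: the scarcer nutrient limits degradation.
# The third argument is the fill value used where one series has no entry.
np_avail = c_n.combine(c_p, min, 0)
print(np_avail.tolist())
```

In the first row phosphate is limiting, in the second row nitrogen.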

check_data(data)

Checking data on correct format.

Input
data: pd.DataFrame
    concentration values of quantities
Output
cols: list
    List of column names
Source code in mibiscreen/analysis/sample/screening_NA.py
def check_data(data):
    """Checking data on correct format.

    Input
    -----
        data: pd.DataFrame
            concentration values of quantities

    Output
    ------
        cols: list
            List of column names
    """
    if isinstance(data, pd.DataFrame):
        cols = data.columns.to_list()
    elif isinstance(data, pd.Series):
        cols = [data.name]
    else:
        raise ValueError("Calculation not possible with given data. "
                         "Data has to be a pandas DataFrame or Series "
                         "but is given as type {}".format(type(data)))

    return cols

electron_balance(data, inplace=False, verbose=False, **kwargs)

Decision if natural attenuation is taking place.

Function to calculate electron balance, based on electron availability calculated from concentrations of contaminant and electron acceptors

Input
data: pd.DataFrame
    tabular data containing "total_reductors" and "total_oxidators"
        -total amount of electrons available for reduction [mmol e-/l]
        -total amount of electrons needed for oxidation [mmol e-/l]
inplace: bool, default False
    Whether to modify the DataFrame rather than creating a new one.
verbose: Boolean
    verbose flag (default False)
Output
e_bal : pd.Series
    Ratio of electron availability: electrons available for reduction
    divided by electrons needed for oxidation
Source code in mibiscreen/analysis/sample/screening_NA.py
def electron_balance(
        data,
        inplace = False,
        verbose = False,
        **kwargs,
        ):
    """Decision if natural attenuation is taking place.

    Function to calculate electron balance, based on electron availability
    calculated from concentrations of contaminant and electron acceptors

    Input
    -----
        data: pd.DataFrame
            tabular data containing "total_reductors" and "total_oxidators"
                -total amount of electrons available for reduction [mmol e-/l]
                -total amount of electrons needed for oxidation [mmol e-/l]
        inplace: bool, default False
            Whether to modify the DataFrame rather than creating a new one.
        verbose: Boolean
            verbose flag (default False)

    Output
    ------
        e_bal : pd.Series
            Ratio of electron availability: electrons available for reduction
            divided by electrons needed for oxidation

    """
    if verbose:
        print('==============================================================')
        print(" Running function 'electron_balance()' on data")
        print('==============================================================')

    cols = check_data(data)

    if names.name_total_reductors in cols:
        tot_reduct = data[names.name_total_reductors]
    else:
        tot_reduct = reductors(data,**kwargs)
        # raise ValueError("Total amount of oxidators not given in data.")

    if names.name_total_oxidators in cols:
        tot_oxi = data[names.name_total_oxidators]
    else:
        tot_oxi = oxidators(data,**kwargs)
        # raise ValueError("Total amount of reductors not given in data.")

    e_bal = tot_reduct.div(tot_oxi, axis=0)
    e_bal.name = names.name_e_balance

    if inplace:
        data[names.name_e_balance] = e_bal

    if verbose:
        print("Electron balance e_red/e_cont is:\n{}".format(e_bal))
        print('---------------------------------')

    return e_bal #,decision
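
A minimal sketch of the ratio computed above, using the column names "total_reductors" and "total_oxidators" mentioned in the docstring; the values and the series name `e_balance` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical electron totals per sample in mmol e-/l.
data = pd.DataFrame(
    {
        "total_reductors": [1.2, 0.3, 0.8],
        "total_oxidators": [0.6, 0.6, 0.0],
    }
)

# Element-wise division, aligned on the index.
e_bal = data["total_reductors"].div(data["total_oxidators"], axis=0)
e_bal.name = "e_balance"

print(e_bal.tolist())
```

Note that a zero in the denominator yields `inf` rather than NaN in pandas, so a check of the form `e_bal < 1` followed by `np.isnan` (as in NA_traffic()) would classify such a sample as green.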

oxidators(data, contaminant_group='BTEXIIN', nutrient=False, inplace=False, verbose=False, **kwargs)

Calculate the amount of electron oxidators [mmol e-/l].

Calculate the amount of electron oxidators in [mmol e-/l] based on concentrations of contaminants, stoichiometric ratios of reactions, and contaminant properties (e.g. molecular masses in [mg/mmol]).

alternatively: based on nitrogen and phosphate availability

Input
data: pd.DataFrame
    Contaminant concentrations in [ug/l], i.e. microgram per liter
    if nutrient is True, data also needs to contain concentrations
    of Nitrate, Nitrite and Phosphate
contaminant_group: str
    Short name for group of contaminants to use
    default is 'BTEXIIN' (for benzene, toluene, ethylbenzene, xylene,
                          indene, indane and naphthalene)
nutrient: Boolean
    flag to include oxidator availability based on nutrient supply
    calls internally routine "available_NP()" with data
inplace: bool, default False
    Whether to modify the DataFrame rather than creating a new one.
verbose: Boolean
    verbose flag (default False)
Output
tot_oxi: pd.Series
    Total amount of electron oxidators in [mmol e-/l]
Source code in mibiscreen/analysis/sample/screening_NA.py
def oxidators(
    data,
    contaminant_group = "BTEXIIN",
    nutrient = False,
    inplace = False,
    verbose = False,
    **kwargs,
    ):
    """Calculate the amount of electron oxidators [mmol e-/l].

    Calculate the amount of electron oxidators in [mmol e-/l]
    based on concentrations of contaminants, stoichiometric ratios of reactions,
    contaminant properties (e.g. molecular masses in [mg/mmol])

    alternatively: based on nitrogen and phosphate availability

    Input
    -----
        data: pd.DataFrame
            Contaminant concentrations in [ug/l], i.e. microgram per liter
            if nutrient is True, data also needs to contain concentrations
            of Nitrate, Nitrite and Phosphate
        contaminant_group: str
            Short name for group of contaminants to use
            default is 'BTEXIIN' (for benzene, toluene, ethylbenzene, xylene,
                                  indene, indane and naphthalene)
        nutrient: Boolean
            flag to include oxidator availability based on nutrient supply
            calls internally routine "available_NP()" with data
        inplace: bool, default False
            Whether to modify the DataFrame rather than creating a new one.
        verbose: Boolean
            verbose flag (default False)

    Output
    ------
        tot_oxi: pd.Series
            Total amount of electron oxidators in [mmol e-/l]
    """
    if verbose:
        print('==============================================================')
        print(" Running function 'oxidators()' on data")
        print('==============================================================')

    tot_oxi = 0.
    cols = check_data(data)


    if nutrient:
        NP_avail = available_NP(data)

    try:
        eas = names.contaminants[contaminant_group].copy()
        if (names.name_o_xylene in cols) and (names.name_pm_xylene in cols): # and (names.name_xylene in cols):
            eas.remove(names.name_xylene)

        for cont in eas:
            if cont in cols:
                # tot_oxi += data[cont]*0.001/properties[cont]['molecular_mass']*
                    #properties[cont]['factor_stoichiometry']
                if nutrient:
                    nut_avail = 1000.*NP_avail*properties[cont]['molecular_mass']/(properties[cont]['cs']*12.)
                    c_min = nut_avail.combine(data[cont], min, 0) # mass concentration in ug/l
                else:
                    c_min = data[cont]

                cm_cont = c_min* 0.001/properties[cont]['molecular_mass'] # molar concentration in mmol/l

                tot_oxi += cm_cont *  properties[cont]['factor_stoichiometry']
            else:
                print("WARNING: No data on {} given, zero concentration assumed.".format(cont))
                print('________________________________________________________________')
    except KeyError:
        raise ValueError("group of contaminant ('contaminant_group') not defined: '{}'".format(contaminant_group))
    except TypeError:
        raise ValueError("Data not in standardized format. Run 'standardize()' first.")

    # if isinstance(tot_oxi, float):
    #     print("\nWARNING: No data on contaminant concentrations given.")
    #     print('________________________________________________________________')
    #     tot_oxi = False
    if isinstance(tot_oxi, pd.Series):
        tot_oxi.rename(names.name_total_oxidators,inplace = True)
        if verbose:
            print("Total amount of oxidators per well in [mmol e-/l] is:\n{}".format(tot_oxi))
            print('-----------------------------------------------------')
    else:
        raise ValueError("No data on oxidators or only zero concentrations given.")

    if inplace:
        data[names.name_total_oxidators] = tot_oxi

    return tot_oxi
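
The unit conversion in the non-nutrient branch above (ug/l to mmol/l to mmol e-/l) can be sketched as follows. The `properties` dictionary here is a hypothetical stand-in for the package's own table: the molecular masses are standard values, and the stoichiometric factors assume complete oxidation to CO2 (30 electrons per benzene, 36 per toluene); the package may use different numbers.

```python
import pandas as pd

# Assumed, illustrative properties (not read from mibiscreen).
properties = {
    "benzene": {"molecular_mass": 78.11, "factor_stoichiometry": 30},
    "toluene": {"molecular_mass": 92.14, "factor_stoichiometry": 36},
}

# Hypothetical contaminant concentrations in ug/l.
data = pd.DataFrame({"benzene": [781.1, 0.0], "toluene": [0.0, 921.4]})

tot_oxi = 0.0
for cont, prop in properties.items():
    # ug/l -> mg/l (factor 0.001), then mg/l -> mmol/l via molecular mass.
    molar = data[cont] * 0.001 / prop["molecular_mass"]
    # mmol/l -> mmol e-/l via electrons transferred per molecule.
    tot_oxi = tot_oxi + molar * prop["factor_stoichiometry"]

print(tot_oxi.round(6).tolist())
```

Starting the sum from the float `0.` and adding Series objects is what lets the source detect the "no data" case: if no contaminant column is found, the total is still a float and not a pd.Series.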

reductors(data, ea_group='ONS', inplace=False, verbose=False, **kwargs)

Calculate the amount of electron reductors [mmol e-/l].

making use of imported molecular mass values for quantities in [mg/mmol]

Input
data: pd.DataFrame
    concentration values of electron acceptors in [mg/l]
ea_group: str
    Short name for group of electron acceptors to use
    default is 'ONS' (for oxygen, nitrate, and sulfate)
inplace: bool, default False
    Whether to modify the DataFrame rather than creating a new one.
verbose: Boolean
    verbose flag (default False)
Output
tot_reduct: pd.Series
    Total amount of electrons needed for reduction in [mmol e-/l]
Source code in mibiscreen/analysis/sample/screening_NA.py
def reductors(
    data,
    ea_group = 'ONS',
    inplace = False,
    verbose = False,
    **kwargs,
    ):
    """Calculate the amount of electron reductors [mmol e-/l].

    making use of imported molecular mass values for quantities in [mg/mmol]

    Input
    -----
        data: pd.DataFrame
            concentration values of electron acceptors in [mg/l]
        ea_group: str
            Short name for group of electron acceptors to use
            default is 'ONS' (for oxygen, nitrate, and sulfate)
        inplace: bool, default False
            Whether to modify the DataFrame rather than creating a new one.
        verbose: Boolean
            verbose flag (default False)

    Output
    ------
        tot_reduct: pd.Series
            Total amount of electrons needed for reduction in [mmol e-/l]
    """
    if verbose:
        print('==============================================================')
        print(" Running function 'reductors()' on data")
        print('==============================================================')

    tot_reduct = 0.
    cols= check_data(data)

    try:
        for ea in names.electron_acceptors[ea_group]:
            if ea in cols:
                tot_reduct += properties[ea]['factor_stoichiometry']* data[ea]/properties[ea]['molecular_mass']
                #     pd.to_numeric(data[ea]) / properties[ea]['molecular_mass']
            else:
                print("WARNING: No data on {} given, zero concentration assumed.".format(ea))
                print('________________________________________________________________')
    except KeyError:
        raise ValueError("Group of electron acceptors ('ea_group') not defined: '{}'".format(ea_group))
    except TypeError:
        raise ValueError("Data not in standardized format. Run 'standardize()' first.")

    if isinstance(tot_reduct, pd.Series):
        tot_reduct.rename(names.name_total_reductors,inplace = True)
        if verbose:
            print("Total amount of electron reductors per well in [mmol e-/l] is:\n{}".format(tot_reduct))
            print('----------------------------------------------------------------')
    else:
        raise ValueError("No data on electron acceptors or only zero concentrations given.")
    # if isinstance(tot_reduct, float) and tot_reduct <= 0.:
    #     print("\nWARNING: No data on electron acceptor concentrations given.")
    #     tot_reduct = False

    if inplace:
        data[names.name_total_reductors] = tot_reduct

    return tot_reduct
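
The summation above converts each electron acceptor from mg/l to mmol e-/l. A minimal sketch under stated assumptions: the `properties` values below are illustrative (standard molar masses; 4, 5 and 8 electrons accepted per O2, nitrate and sulfate, respectively) and are not read from the package.

```python
import pandas as pd

# Assumed, illustrative properties (not read from mibiscreen).
properties = {
    "oxygen": {"molecular_mass": 32.0, "factor_stoichiometry": 4},
    "nitrate": {"molecular_mass": 62.0, "factor_stoichiometry": 5},
    "sulfate": {"molecular_mass": 96.06, "factor_stoichiometry": 8},
}

# Hypothetical electron acceptor concentrations in mg/l.
data = pd.DataFrame(
    {
        "oxygen": [8.0, 0.0],
        "nitrate": [6.2, 0.0],
        "sulfate": [0.0, 96.06],
    }
)

tot_reduct = 0.0
for ea, prop in properties.items():
    if ea in data.columns:
        # mg/l divided by mg/mmol gives mmol/l; the factor gives mmol e-/l.
        tot_reduct = tot_reduct + prop["factor_stoichiometry"] * data[ea] / prop["molecular_mass"]

print(tot_reduct.round(6).tolist())
```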

screening_NA(data, ea_group='ONS', contaminant_group='BTEXIIN', nutrient=False, inplace=False, verbose=False, **kwargs)

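Run the complete natural attenuation (NA) screening on sample data.

Calls in sequence reductors(), oxidators(), electron_balance(), NA_traffic(), total_contaminant_concentration() and thresholds_for_intervention(), making use of imported molecular mass values for quantities in [mg/mmol].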

Input
data: pd.DataFrame
    Concentration values of
        - electron acceptors in [mg/l]
        - contaminants in [ug/l]
        - nutrients (Nitrate, Nitrite and Phosphate) if nutrient is True
ea_group: str, default 'ONS'
    Short name for group of electron acceptors to use
    'ONS' stands for oxygen, nitrate and sulfate
contaminant_group: str, default 'BTEXIIN'
    Short name for group of contaminants to use
    'BTEXIIN' stands for benzene, toluene, ethylbenzene, xylene,
                           indene, indane and naphthalene
nutrient: Boolean, default False
    flag to include oxidator availability based on nutrient supply
    calls internally routine "available_NP()" with data
inplace: bool, default False
    Whether to modify the DataFrame rather than creating a new one.
verbose: Boolean, default False
    verbose flag
Output
na_data: pd.DataFrame
    Tabular data with all quantities of NA screening listed per sample
Source code in mibiscreen/analysis/sample/screening_NA.py
def screening_NA(
    data,
    ea_group = 'ONS',
    contaminant_group = "BTEXIIN",
    nutrient = False,
    inplace = False,
    verbose = False,
    **kwargs,
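    ):
    """Run the complete natural attenuation (NA) screening on sample data.

    Calls in sequence reductors(), oxidators(), electron_balance(), NA_traffic(),
    total_contaminant_concentration() and thresholds_for_intervention(),
    making use of imported molecular mass values for quantities in [mg/mmol].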

    Input
    -----
        data: pd.DataFrame
            Concentration values of
                - electron acceptors in [mg/l]
                - contaminants in [ug/l]
                - nutrients (Nitrate, Nitrite and Phosphate) if nutrient is True
        ea_group: str, default 'ONS'
            Short name for group of electron acceptors to use
            'ONS' stands for oxygen, nitrate and sulfate
        contaminant_group: str, default 'BTEXIIN'
            Short name for group of contaminants to use
            'BTEXIIN' stands for benzene, toluene, ethylbenzene, xylene,
                                   indene, indane and naphthalene
        nutrient: Boolean, default False
            flag to include oxidator availability based on nutrient supply
            calls internally routine "available_NP()" with data
        inplace: bool, default False
            Whether to modify the DataFrame rather than creating a new one.
        verbose: Boolean, default False
            verbose flag

    Output
    ------
        na_data: pd.DataFrame
            Tabular data with all quantities of NA screening listed per sample
    """
    if verbose:
        print('==============================================================')
        print(" Running function 'screening_NA()' on data")
        print(" Runs all checks on data: column names, units and values")
        print('==============================================================')

    check_data(data)

    tot_reduct = reductors(data,
                            ea_group = ea_group,
                            inplace = inplace,
                            verbose = verbose)
    tot_oxi = oxidators(data,
                        contaminant_group = contaminant_group,
                        nutrient = nutrient,
                        inplace = inplace,
                        verbose = verbose)
    e_bal = electron_balance(data,
                             inplace = inplace,
                             verbose = verbose)
    na_traffic = NA_traffic(data,
                            contaminant_group = contaminant_group,
                            inplace = inplace,
                            verbose = verbose)
    tot_cont = total_contaminant_concentration(data,
                                               contaminant_group = contaminant_group,
                                               inplace = inplace,
                                               verbose = verbose)
    na_data = thresholds_for_intervention(data,
                                          contaminant_group = contaminant_group,
                                          inplace = inplace,
                                          verbose = verbose)

    if inplace is False:
        for add in [tot_cont,na_traffic,e_bal,tot_oxi,tot_reduct]:
            na_data.insert(2, add.name, add)

        if nutrient:
            NP_avail = available_NP(data,verbose = verbose)
            na_data.insert(4, NP_avail.name, NP_avail)

    return na_data
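
When `inplace` is False, the loop above inserts every result at the fixed position 2, so the columns end up in the reverse order of the list. The sketch below illustrates this behavior of `DataFrame.insert` with hypothetical column names and values.

```python
import pandas as pd

# Hypothetical result frame: two identifier columns plus one result column.
na_data = pd.DataFrame(
    {"sample": ["s1"], "obs_well": ["w1"], "intervention": ["green"]}
)

extras = [
    pd.Series([10.0], name="tot_cont"),
    pd.Series(["red"], name="traffic"),
    pd.Series([0.5], name="e_bal"),
]

# Each insert at position 2 pushes the previously inserted column to the right.
for add in extras:
    na_data.insert(2, add.name, add)

print(list(na_data.columns))
```

The last series in the list therefore appears first after the identifier columns.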

thresholds_for_intervention(data, contaminant_group='BTEXIIN', inplace=False, verbose=False, **kwargs)

Function to evaluate intervention threshold exceedance.

Determines which contaminants exceed concentration thresholds set by
the Dutch government for intervention.
Input
data: pd.DataFrame
    Contaminant concentrations in [ug/l], i.e. microgram per liter
contaminant_group: str
    Short name for group of contaminants to use
    default is 'BTEXIIN' (for benzene, toluene, ethylbenzene, xylene,
                          indene, indane and naphthalene)
inplace: bool, default False
    Whether to modify the DataFrame rather than creating a new one.
verbose: Boolean, default False
    verbose flag
Output
intervention: pd.DataFrame
    DataFrame of similar format as input data with well specification and
    three columns on intervention threshold exceedance analysis:
        - traffic light if well requires intervention
        - number of contaminants exceeding the intervention value
        - list of contaminants above the threshold of intervention
Source code in mibiscreen/analysis/sample/screening_NA.py
def thresholds_for_intervention(
        data,
        contaminant_group = "BTEXIIN",
        inplace = False,
        verbose = False,
        **kwargs,
        ):
    """Function to evaluate intervention threshold exceedance.

    Determines which contaminants exceed concentration thresholds set by
    the Dutch government for intervention.

    Input
    -----
        data: pd.DataFrame
            Contaminant concentrations in [ug/l], i.e. microgram per liter
        contaminant_group: str
            Short name for group of contaminants to use
            default is 'BTEXIIN' (for benzene, toluene, ethylbenzene, xylene,
                                  indene, indane and naphthalene)
        inplace: bool, default False
            Whether to modify the DataFrame rather than creating a new one.
        verbose: Boolean, default False
            verbose flag

    Output
    ------
        intervention: pd.DataFrame
            DataFrame of similar format as input data with well specification and
            three columns on intervention threshold exceedance analysis:
                - traffic light if well requires intervention
                - number of contaminants exceeding the intervention value
                - list of contaminants above the threshold of intervention
    """
    if verbose:
        print('==============================================================')
        print(" Running function 'thresholds_for_intervention()' on data")
        print('==============================================================')

    cols= check_data(data)

    if inplace:
        na_intervention = data
    else:
        na_intervention= pd.DataFrame(data, columns=[names.name_sample,names.name_observation_well])
    traffic = np.zeros(data.shape[0],dtype=int)
    intervention = [[] for i in range(data.shape[0])]

    try:
        eas = names.contaminants[contaminant_group].copy()
        if (names.name_o_xylene in cols) and (names.name_pm_xylene in cols): # and (names.name_xylene in cols):
            eas.remove(names.name_xylene)
        for cont in eas:
            if cont in cols:
                th_value = properties[cont]['thresholds_for_intervention_NL']
                traffic += (data[cont].values > th_value)
                for i in range(data.shape[0]):
                    if data[cont].values[i] > th_value:
                        intervention[i].append(cont)
            else:
                print("WARNING: No data on {} given, zero concentration assumed.".format(cont))
                print('________________________________________________________________')

        traffic_light = np.where(traffic>0,"red","green")
        traffic_light[np.isnan(traffic)] = 'y'
        na_intervention[names.name_intervention_traffic] = traffic_light
        na_intervention[names.name_intervention_number] = traffic
        na_intervention[names.name_intervention_contaminants] = intervention

        if verbose:
            print("Evaluation of contaminant concentrations exceeding intervention values for {}:".format(
                contaminant_group))
            print('------------------------------------------------------------------------------------')
            print("Red light: Intervention values exceeded for {} out of {} locations".format(
                np.sum(traffic >0),data.shape[0]))
            print("Green light: Concentrations below intervention values at {} out of {} locations".format(
                np.sum(traffic == 0),data.shape[0]))
            print("Yellow light: No decision possible at {} out of {} locations".format(
                np.sum(np.isnan(traffic)),data.shape[0]))
            print('________________________________________________________________')
    except KeyError:
        raise ValueError("Group of contaminant ('contaminant_group') not defined: '{}'".format(contaminant_group))
    except TypeError:
        raise ValueError("Data not in standardized format. Run 'standardize()' first.")

    return na_intervention
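
The exceedance bookkeeping above (a count per sample, a list of offending contaminants, and a traffic light) can be sketched in isolation. The threshold values and column names below are illustrative placeholders, not the values stored under `thresholds_for_intervention_NL` in the package.

```python
import numpy as np
import pandas as pd

# Illustrative thresholds in ug/l (assumed for this sketch).
thresholds = {"benzene": 30.0, "toluene": 1000.0}

# Hypothetical contaminant concentrations in ug/l.
data = pd.DataFrame(
    {"benzene": [5.0, 45.0, 60.0], "toluene": [200.0, 1500.0, 10.0]}
)

count = np.zeros(len(data), dtype=int)
exceeding = [[] for _ in range(len(data))]

for cont, th_value in thresholds.items():
    above = data[cont].values > th_value
    count += above  # booleans add as 0/1
    for i in np.flatnonzero(above):
        exceeding[i].append(cont)

# Any exceedance turns the light red.
light = np.where(count > 0, "red", "green")
print(count.tolist(), exceeding, light.tolist())
```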

total_contaminant_concentration(data, contaminant_group='BTEXIIN', inplace=False, verbose=False, **kwargs)

Function to calculate total concentration of contaminants.

Input
data: pd.DataFrame
    Contaminant concentrations in [ug/l], i.e. microgram per liter
contaminant_group: str
    Short name for group of contaminants to use
    default is 'BTEXIIN' (for benzene, toluene, ethylbenzene, xylene,
                          indene, indane and naphthalene)
inplace: bool, default False
    Whether to modify the DataFrame rather than creating a new one.
verbose: Boolean
    verbose flag (default False)
Output
tot_conc: pd.Series
    Total concentration of contaminants in [ug/l]
Source code in mibiscreen/analysis/sample/screening_NA.py
def total_contaminant_concentration(
        data,
        contaminant_group = "BTEXIIN",
        inplace = False,
        verbose = False,
        **kwargs,
        ):
    """Function to calculate total concentration of contaminants.

    Input
    -----
        data: pd.DataFrame
            Contaminant concentrations in [ug/l], i.e. microgram per liter
        contaminant_group: str
            Short name for group of contaminants to use
            default is 'BTEXIIN' (for benzene, toluene, ethylbenzene, xylene,
                                  indene, indane and naphthalene)
        inplace: bool, default False
            Whether to modify the DataFrame rather than creating a new one.
        verbose: Boolean
            verbose flag (default False)

    Output
    ------
        tot_conc: pd.Series
            Total concentration of contaminants in [ug/l]

    """
    if verbose:
        print('==============================================================')
        print(" Running function 'total_contaminant_concentration()' on data")
        print('==============================================================')

    tot_conc = 0.
    cols = check_data(data)
    try:
        eas = names.contaminants[contaminant_group].copy()
        if (names.name_o_xylene in cols) and (names.name_pm_xylene in cols): # and (names.name_xylene in cols):
            eas.remove(names.name_xylene)
        for cont in eas:
            if cont in cols:
                tot_conc += data[cont] # mass concentration in ug/l
            else:
                print("WARNING: No data on {} given, zero concentration assumed.".format(cont))
                print('________________________________________________________________')
    except KeyError:
        raise ValueError("Group of contaminant ('contaminant_group') not defined: '{}'".format(contaminant_group))
    except TypeError:
        raise ValueError("Data not in standardized format. Run 'standardize()' first.")

    # if isinstance(tot_conc, float):
    #     print("\nWARNING: No data on contaminant concentrations given.")
    #     print('________________________________________________________________')
    #     tot_conc = False
    if isinstance(tot_conc, pd.Series):
        tot_conc.rename(names.name_total_contaminants,inplace = True)
        if verbose:
            print("Total concentration of {} in [ug/l] is:\n{}".format(contaminant_group,tot_conc))
            print('--------------------------------------------------')
    else:
        raise ValueError("No data on contaminants or only zero concentrations given.")

    if inplace:
        data[names.name_total_contaminants] = tot_conc

    return tot_conc
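
The xylene handling above avoids double counting: when both isomer columns are present, the summed xylene column is dropped from the selection before adding up. A minimal sketch, assuming a hypothetical group list and column names (the real ones come from `names.contaminants` and the `names.name_*` constants):

```python
import pandas as pd

# Hypothetical contaminant group including total xylene and its isomers.
group = ["benzene", "toluene", "xylene", "o_xylene", "pm_xylene"]

# Hypothetical concentrations in ug/l; toluene is not measured here.
data = pd.DataFrame(
    {
        "benzene": [1.0, 2.0],
        "xylene": [5.0, 7.0],
        "o_xylene": [2.0, 3.0],
        "pm_xylene": [3.0, 4.0],
    }
)

selected = list(group)
# Drop total xylene when both isomers are given, to avoid counting it twice.
if "o_xylene" in data.columns and "pm_xylene" in data.columns:
    selected.remove("xylene")

tot_conc = 0.0
for cont in selected:
    if cont in data.columns:
        tot_conc = tot_conc + data[cont]  # mass concentration in ug/l

tot_conc = tot_conc.rename("total_contaminants")
print(tot_conc.tolist())
```

Quantities missing from the data, such as toluene in this sketch, simply do not contribute, which matches the "zero concentration assumed" warning in the source.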