soilpulsecore package

Submodules

soilpulsecore.data_publishers module

class soilpulsecore.data_publishers.ZenodoPublisher(zenodo_id)

Bases: Publisher

getFileInfo()

Collects information about the resource files of a Zenodo record

Returns:

list of SourceFile instances created from Zenodo API response
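The sketch below is a minimal, hedged illustration of how file information could be pulled out of a Zenodo record. It assumes the public endpoint https://zenodo.org/api/records/<id> and the current response layout, in which each entry of "files" carries "key", "size", "checksum" and "links"/"self". FileInfo is a hypothetical stand-in for SourceFile, whose constructor is not documented here.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Any, Dict, List

# Public REST endpoint of a single Zenodo record
ZENODO_RECORD_URL = "https://zenodo.org/api/records/{record_id}"


@dataclass
class FileInfo:
    """Hypothetical stand-in for SourceFile."""
    filename: str
    size: int
    checksum: str
    download_url: str


def record_url(record_id: str) -> str:
    return ZENODO_RECORD_URL.format(record_id=record_id)


def parse_file_info(record: Dict[str, Any]) -> List[FileInfo]:
    """Convert the 'files' list of a record JSON into FileInfo objects (assumed field names)."""
    return [
        FileInfo(
            filename=entry["key"],
            size=int(entry.get("size", 0)),
            checksum=entry.get("checksum", ""),
            download_url=entry.get("links", {}).get("self", ""),
        )
        for entry in record.get("files", [])
    ]


def fetch_file_info(record_id: str) -> List[FileInfo]:
    # Requires network access; json.load reads the response body directly
    with urllib.request.urlopen(record_url(record_id), timeout=30) as response:
        return parse_file_info(json.load(response))
```

Keeping the parser separate from the HTTP call lets it be checked against a saved response without network access.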

getMetadata()

Collects the metadata package from a Zenodo record

Parameters:

zenodo_id – Zenodo record identifier

Returns:

response as JSON

key = 'Zenodo'
name = 'Zenodo'

soilpulsecore.db_access module

@author: Jan Devátý, Jonas Lenz

class soilpulsecore.db_access.DBconnector

Bases: object

concepts_translations_filename = '_concepts_translations.json'
concepts_vocabulary_filenames = {'AGROVOC': 'agrovoc_excerpt.json'}
create_project_directory(project_id)
datasets_directory_name = 'datasets'
deleteDatasetRecord(dataset)
deleteProject(project, delete_dir=True)
dirname_prefix = None
establishProjectRecord(user_id, project, unique_names=True)
getDatasetsOfProject(project_id)
getNewProjectID()

Finds the ID that should be assigned to the next new project.

getProjectsOfUser(user_id)
getUserNameByID(id)
classmethod get_connector()

Returns a DBconnector subclass instance linked to the storage where project structural information is kept. If a running MySQL server with the soilpulse DB is found, a MySQLConnector is returned; otherwise a NullConnector is returned.

loadConceptsVocabularies()

Loads string-concepts translations JSON file

loadMethodsVocabularies()

Loads string-method translations JSON file

loadProject(project)
loadTranslationsOfProject(project)
loadUnitsVocabularies()

Loads string-unit translations JSON file

loadVocabularyFromFile(input_file)
loadVocabularyFromResource(filename)

Loads string-* translations JSON file

Parameters:

filename – name of vocabulary file inside the package ‘vocabularies’ folder

methods_translations_filename = '_methods_translations.json'
methods_vocabulary_filenames = {}
printUserInfo(user_id)
project_files_dir_name = 'project_files'
project_files_root = PosixPath('/home/docs/SoilPulse/project_files')
soilpulse_root_dir_name = 'SoilPulse'
units_translations_filename = '_units_translations.json'
units_vocabulary_filenames = {}
updateContainerRecord(container, cascade=False)
updateProjectRecord(project, cascade=False)
updateTranslationDictionaries(project)
vocabularies_dir_name = 'vocabularies'
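The class attributes suggest how the project storage path is composed: the documented project_files_root value equals <home>/SoilPulse/project_files, i.e. soilpulse_root_dir_name and project_files_dir_name joined under the home directory of the documentation build. The sketch below assumes that layout and further assumes, hypothetically, that a project directory is named dirname_prefix followed by the project ID and contains a datasets subfolder. make_project_directory is an illustration, not the package's create_project_directory.

```python
from pathlib import Path
from typing import Optional

SOILPULSE_ROOT_DIR_NAME = "SoilPulse"
PROJECT_FILES_DIR_NAME = "project_files"
DATASETS_DIRECTORY_NAME = "datasets"


def project_files_root(home: Optional[Path] = None) -> Path:
    # Assumed layout: <home>/SoilPulse/project_files
    base = home if home is not None else Path.home()
    return base / SOILPULSE_ROOT_DIR_NAME / PROJECT_FILES_DIR_NAME


def make_project_directory(project_id: int, dirname_prefix: str = "",
                           home: Optional[Path] = None) -> Path:
    # Hypothetical naming: '' + 7 -> "7" (MySQLConnector), 'temp_' + 7 -> "temp_7" (NullConnector)
    project_dir = project_files_root(home) / f"{dirname_prefix}{project_id}"
    # parents=True creates the whole chain; exist_ok=True makes the call idempotent
    (project_dir / DATASETS_DIRECTORY_NAME).mkdir(parents=True, exist_ok=True)
    return project_dir
```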
class soilpulsecore.db_access.EntityKeywordsDB

Bases: object

Provides methods to access and manipulate the SoilPulse metadata entity keywords.

DBs = {}
classmethod connect(dbpath)
dbDir = 'soilpulse\\databases'
classmethod loadKeywords(entityClass)

Loads the entity’s keywords from the DB and translates them into regular-expression search patterns. Translation of the keywords by a thesaurus could also be applied at this step.

Parameters:

entityClass – MetadataEntity subclass

Returns:

dictionary of regular expression patterns with group names {unique group name: search pattern, …}
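One way to turn plain keywords into the documented {unique group name: search pattern} dictionary, sketched with the standard re module. The group naming scheme <entity key>_<index> and the whole-word matching are assumptions; keywords_to_patterns is a hypothetical helper, and the real patterns come from the keywords DB.

```python
import re
from typing import Dict, List


def keywords_to_patterns(entity_key: str, keywords: List[str]) -> Dict[str, str]:
    """Build {unique group name: search pattern} from plain keywords."""
    patterns = {}
    for index, keyword in enumerate(keywords):
        group_name = f"{entity_key}_{index}"  # hypothetical naming scheme
        # re.escape makes the keyword match literally; \b restricts it to whole words
        patterns[group_name] = rf"(?P<{group_name}>\b{re.escape(keyword)}\b)"
    return patterns
```

The patterns can be joined with "|" into one compiled expression; match.lastgroup then names the keyword group that matched.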

classmethod registerKeywordsDB(dbType, dbFilename)
class soilpulsecore.db_access.EntitySearchPatternsDB

Bases: object

Provides methods to access and manipulate the SoilPulse metadata entity types and their properties stored in the DB “entities”

classmethod connect()
dbpath = 'soilpulse\\databases\\entity_search_patterns'
classmethod loadSearchPatterns(entityClass)

Loads the search patterns stored in the DB for the given entity, identified by its string ID from the metadata scheme definition.

Parameters:

entityClass – MetadataEntity subclass

Returns:

dictionary of regular expression patterns with group names {unique group name: search pattern, …}

class soilpulsecore.db_access.MySQLConnector

Bases: DBconnector

Provides methods to access and manipulate the SoilPulse database storage of MetadataMappings and possibly the data storage as well

checkContainersTableStructure(needed_fields)
checkoutUser(user_id)
conceptDictionaryTableName = '`concepts_dictionary`'
conceptsContainersTableName = '`container_concepts`'
containersTableName = '`containers`'
datasetContainerRecordExists(dataset, container)

Checks if the dataset-container link already has a database entry

Parameters:
  • dataset – the dataset instance

  • container – the container instance

datasetsContainersTableName = '`datasets_containers`'
datasetsTableName = '`datasets`'
db_name = 'soilpulse'
deleteDatasetRecord(dataset)
deleteOrphannedConceptTranslations(project_id, current_vocab)

Deletes string-concept translations from the database that are no longer in use for the given project.

Parameters:
  • project_id – The ID of the project for which to clean up translations.

  • current_vocab – A dictionary of currently used translations (structured as {string: [{vocabulary, uri}, …]}).

deleteOrphannedMethodTranslations(project_id, current_vocab)

Deletes string-method translations from the database that are no longer in use for given project.

Parameters:
  • project_id – The ID of the project for which to clean up translations.

  • current_vocab – A dictionary of currently used translations (structured as {string: [{vocabulary, uri}, …]}).

deleteOrphannedUnitTranslations(project_id, current_vocab)

Deletes string-unit translations from the database that are no longer in use for given project.

Parameters:
  • project_id – The ID of the project for which to clean up translations.

  • current_vocab – A dictionary of currently used translations (structured as {string: [{vocabulary, uri}, …]}).
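The three deleteOrphanned*Translations methods share one idea: every stored translation of the project that no longer appears in current_vocab is removed. A minimal sketch of selecting those records, assuming the stored rows are (translation_id, string, vocabulary, uri) tuples; the actual table layout is not documented here and find_orphaned_ids is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, str, str, str]  # (translation_id, string, vocabulary, uri), assumed layout


def find_orphaned_ids(stored_rows: Iterable[Row],
                      current_vocab: Dict[str, List[Dict[str, str]]]) -> List[int]:
    # Every (string, vocabulary, uri) triple still referenced by the project
    in_use = {
        (string, item["vocabulary"], item["uri"])
        for string, items in current_vocab.items()
        for item in items
    }
    # Stored rows whose triple is no longer referenced are orphaned
    return [row_id for row_id, string, vocabulary, uri in stored_rows
            if (string, vocabulary, uri) not in in_use]
```

The resulting IDs could then be removed with a single parameterized DELETE ... WHERE id IN (...) statement.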

deleteProject(project, delete_dir=True)
deleteRemovedContainers()
dirname_prefix = ''
establishProjectRecord(user_id, project, unique_names=True)
getConceptTranslationID(string, vocabulary, uri, project_id)

Returns the ID of the string-concept translation in the DB, or None if no such translation exists. The returned ID is used in the container-translation relation.

getContainerGlobalID(container)

Gets the global ID (SoilPulse DB scope) of a container

Parameters:

container – the container

Returns:

global ID of the container in SoilPulse DB

getDatasetGlobalID(dataset)

Checks whether a dataset with the provided name and project ID already has a database entry

Parameters:

dataset – dataset object instance

getDatasetsOfProject(project_id)

Loads Dataset info of a given project ID from the SoilPulse database. Call this function to obtain the list of dataset names within a project.

Parameters:

project_id – ID of the Project whose Datasets should be loaded

Returns:

list of dataset names [ID, … ]

getMethodTranslationID(string, vocabulary, uri, project_id)

Returns the ID of the string-method translation in the DB, or None if no such translation exists. The returned ID is used in the container-translation relation.

getNewProjectID()

Finds the ID that should be assigned to the next new project.

getProjectsOfUser(user_id)

Loads Projects info of a given user from the SoilPulse database. Call this function to obtain a dictionary of the names and IDs of the Projects owned by the user with the given user_id.

Parameters:

user_id – ID of the user whose Projects should be loaded

Returns:

dictionary of ProjectManagers info {Project id: Project name, …}

getUnitTranslationID(string, vocabulary, uri, project_id)

Returns the ID of the string-unit translation in the DB, or None if no such translation exists. The returned ID is used in the container-translation relation.

getUserNameByID(id)
insertStringConceptTranslation(string, term, vocabulary, uri, project_id)

Inserts a DB entry of a string-concept definition into the database table and returns the ID of the translation

Parameters:
  • string – the string being translated

  • term – the term of the uri

  • vocabulary – vocabulary of the concept definition

  • uri – unique identifier of the term within specified vocabulary

  • project_id – ID of project to which the translation belongs

Returns:

ID of the newly created translation DB entry

insertStringMethodTranslation(string, term, vocabulary, uri, project_id)

Inserts a DB entry of a string-method definition into the database table and returns the ID of the translation

Parameters:
  • string – the string being translated

  • term – the term of the uri

  • vocabulary – vocabulary of the method definition

  • uri – unique identifier of the term within specified vocabulary

  • project_id – ID of project to which the translation belongs

Returns:

ID of the newly created translation DB entry

insertStringUnitTranslation(string, term, vocabulary, uri, project_id)

Inserts a DB entry of a string-unit definition into the database table and returns the ID of the translation

Parameters:
  • string – the string being translated

  • term – the term of the uri

  • vocabulary – vocabulary of the unit definition

  • uri – unique identifier of the term within specified vocabulary

  • project_id – ID of project to which the translation belongs

Returns:

ID of the newly created translation DB entry
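The get*TranslationID and insertString*Translation pairs suggest a get-or-insert pattern: reuse the ID of an existing translation, otherwise insert a new row and return its ID. The sketch uses the standard-library sqlite3 module in place of MySQL so that it runs anywhere; the column names of concepts_dictionary and the helper names are assumptions. MySQL drivers such as mysql-connector-python use %s placeholders instead of ?.

```python
import sqlite3
from typing import Optional

# Hypothetical column layout of the concepts dictionary table
SCHEMA = """
CREATE TABLE IF NOT EXISTS concepts_dictionary (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    string TEXT, term TEXT, vocabulary TEXT, uri TEXT, project_id INTEGER
)
"""


def get_translation_id(conn: sqlite3.Connection, string: str, vocabulary: str,
                       uri: str, project_id: int) -> Optional[int]:
    row = conn.execute(
        "SELECT id FROM concepts_dictionary "
        "WHERE string = ? AND vocabulary = ? AND uri = ? AND project_id = ?",
        (string, vocabulary, uri, project_id),
    ).fetchone()
    return row[0] if row else None


def insert_translation(conn: sqlite3.Connection, string: str, term: str,
                       vocabulary: str, uri: str, project_id: int) -> int:
    cur = conn.execute(
        "INSERT INTO concepts_dictionary (string, term, vocabulary, uri, project_id) "
        "VALUES (?, ?, ?, ?, ?)",
        (string, term, vocabulary, uri, project_id),
    )
    conn.commit()
    return cur.lastrowid  # ID of the newly created row


def get_or_insert(conn: sqlite3.Connection, string: str, term: str,
                  vocabulary: str, uri: str, project_id: int) -> int:
    existing = get_translation_id(conn, string, vocabulary, uri, project_id)
    if existing is not None:
        return existing
    return insert_translation(conn, string, term, vocabulary, uri, project_id)
```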

loadChildContainers(project, parent_container=None)
loadConceptsOfContainer(container)

Collects all concept records from DB belonging to a given container

Parameters:

container – the container instance

Returns:

loadDatasetsOfProject(project)
loadMethodsOfContainer(container)

Collects all method records from DB belonging to a given container

Parameters:

container – the container instance

Returns:

loadProject(project, cascade=True)

Loads the database record of a Project, and all of its contents if cascade is True

Parameters:
  • project – the Project instance to be loaded from DB

  • cascade – whether to load all contents

Returns:

input Project instance with filled attributes

loadSearchPatterns(entity)

Loads the search patterns stored in the DB for the given entity, identified by its string ID from the metadata scheme definition.

Parameters:

entity – MetadataEntity subclass

Returns:

dictionary of regular expression patterns with group names {unique group name: search pattern, …}

loadUnitsOfContainer(container)

Collects all unit records from DB belonging to a given container

Parameters:

container – the container instance

Returns:

methodsContainersTableName = '`container_methods`'
methodsDictionaryTableName = '`methods_dictionary`'
projectsTableName = '`projects`'
pwd = 'NFDI4earth'
server = 'localhost'
unitsContainersTableName = '`container_units`'
unitsDictionaryTableName = '`units_dictionary`'
updateConceptsOfContainer(container)

Updates DB record of all string-concept translations of a container

updateContainerRecord(container, cascade=False, cont_updated=[])

Updates the DB entry of the container and its related entities (concepts, units, methods). With cascade=True, recursively invokes the update on all sub-containers. Records the IDs of updated containers in cont_updated.

Parameters:
  • container – the container instance to be updated

  • cascade – whether or not to update the sub-containers’ records too

  • cont_updated – list of IDs of updated containers
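A sketch of the cascade logic described for updateContainerRecord, with the DB write itself left out. Container is a hypothetical minimal stand-in with an ID and a list of sub-containers. The sketch defaults cont_updated to None rather than []: Python evaluates a default value only once, so a list default that is appended to in place would keep growing across calls made without an explicit argument.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Container:
    """Hypothetical minimal container."""
    id: int
    containers: List["Container"] = field(default_factory=list)


def update_container_record(container: Container, cascade: bool = False,
                            cont_updated: Optional[List[int]] = None) -> List[int]:
    # A fresh list per top-level call avoids the shared mutable default
    if cont_updated is None:
        cont_updated = []
    # ... the container's own row and its concepts/units/methods would be written here ...
    cont_updated.append(container.id)
    if cascade:
        for child in container.containers:
            # Pass the same list down so the whole subtree is recorded in one place
            update_container_record(child, cascade=True, cont_updated=cont_updated)
    return cont_updated
```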

updateDatasetRecord(dataset)

Saves dataset state into DB

updateMethodsOfContainer(container)

Updates DB record of all string-method translations of a container

updateProjectRecord(project, cascade=False)

Updates the database record of a Project and, if cascade is True, of all of its contents

Parameters:
  • project – the Project instance reference to be saved

  • cascade – whether to update the containers too

updateUnitsOfContainer(container)

Updates DB record of all string-units translations of a container

userProjectsTableName = '`user_projects`'
userTableName = '`users`'
username = 'soilpulse'
class soilpulsecore.db_access.NullConnector

Bases: DBconnector

checkoutUser(user_id)
containers_attr_filename = '_containers.json'
datasets_attr_filename = '_datasets.json'
deleteProject(project, delete_dir=True)
dirname_prefix = 'temp_'
establishProjectRecord(user_id, project, unique_names=True)
getAllTempProjectIDs()
getNewProjectID()

Finds the ID that should be assigned to the next new project.

getProjectsOfUser(user_id)
getUserNameByID(user_id)
loadChildContainers(project, containers_serialized, parent_container=None)
loadDatasets(project, datasets_serialized)
loadProject(project, cascade=True)
printUserInfo(user_id)
project_attr_filename = '_project.json'
updateContainerRecord(container, cascade=False)
updateDatasetRecord(dataset)
updateProjectRecord(project, cascade=False)
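NullConnector keeps project state in JSON files named by project_attr_filename, containers_attr_filename and datasets_attr_filename. A minimal sketch of such a save/load round trip with the json module; the function names and the content of each file are assumptions, and only the file names come from the attributes above.

```python
import json
from pathlib import Path
from typing import Tuple

PROJECT_ATTR_FILENAME = "_project.json"
CONTAINERS_ATTR_FILENAME = "_containers.json"
DATASETS_ATTR_FILENAME = "_datasets.json"


def save_project_state(project_dir: Path, project_attrs: dict,
                       containers: list, datasets: list) -> None:
    project_dir.mkdir(parents=True, exist_ok=True)
    for filename, payload in (
        (PROJECT_ATTR_FILENAME, project_attrs),
        (CONTAINERS_ATTR_FILENAME, containers),
        (DATASETS_ATTR_FILENAME, datasets),
    ):
        # ensure_ascii=False keeps non-ASCII names readable in the files
        with open(project_dir / filename, "w", encoding="utf-8") as f:
            json.dump(payload, f, ensure_ascii=False, indent=2)


def load_project_state(project_dir: Path) -> Tuple[dict, list, list]:
    def read(filename: str):
        with open(project_dir / filename, encoding="utf-8") as f:
            return json.load(f)

    return (read(PROJECT_ATTR_FILENAME),
            read(CONTAINERS_ATTR_FILENAME),
            read(DATASETS_ATTR_FILENAME))
```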
soilpulsecore.db_access.generate_project_unique_name(existing_names, name)

Generates a unique name for a user’s project by appending a number in parentheses if the name already exists.

Parameters:
  • existing_names – list of currently existing names in database

  • name – the original name to check and modify if necessary.

Returns:

a unique name
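A minimal sketch of the described behaviour. The exact suffix format, a space followed by (1), (2) and so on, is an assumption; make_unique_name is an illustrative name, not the package function.

```python
from typing import Iterable


def make_unique_name(existing_names: Iterable[str], name: str) -> str:
    """Return name unchanged if it is free, otherwise the first free 'name (n)'."""
    taken = set(existing_names)  # set membership is O(1) per check
    if name not in taken:
        return name
    counter = 1
    while f"{name} ({counter})" in taken:
        counter += 1
    return f"{name} ({counter})"
```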

soilpulsecore.exceptions module

@author: Jan Devátý

exception soilpulsecore.exceptions.ContainerStructureError(message)

Bases: Exception

exception soilpulsecore.exceptions.DOIdataRetrievalException(message)

Bases: Exception

This exception is raised whenever something goes wrong with DOI data retrieval or manipulation

exception soilpulsecore.exceptions.DatabaseEntryError(message)

Bases: Exception

exception soilpulsecore.exceptions.DatabaseFetchError(message)

Bases: Exception

This exception is raised when there’s something wrong with data retrieval from SoilPulse database

exception soilpulsecore.exceptions.DeserializationError(message)

Bases: Exception

This exception is raised when there’s something wrong with data being read from a serialization

exception soilpulsecore.exceptions.LocalFileManipulationError(message)

Bases: Exception

This exception is raised when there’s something wrong with local files manipulation

exception soilpulsecore.exceptions.MetadataSchemeException(message)

Bases: Exception

This exception is raised when there’s some inconsistency in the resource’s metadata structure

exception soilpulsecore.exceptions.NameNotUniqueError(message)

Bases: Exception

exception soilpulsecore.exceptions.ValueNotInDomainError(message)

Bases: Exception

This exception is raised when an instance is initialized with a value not present in the class’s list of allowed values

soilpulsecore.metadata_scheme module

@author: Jan Devátý

class soilpulsecore.metadata_scheme.AlternateTitle(value, language, encoding)

Bases: TextMetadataEntity

ID = '2.1'
description = 'A short name by which the dataset is also known.'
key = 'alternate_title'
maxMultiplicity = 1
minMultiplicity = 0
name = 'Alternate title'
class soilpulsecore.metadata_scheme.DateAccapted(value)

Bases: DateMetadataEntity

ID = '5.1'
description = 'The date that the publisher accepted the resource into their system.'
key = 'date_accepted'
maxMultiplicity = 1
minMultiplicity = 0
name = 'Date accepted'
class soilpulsecore.metadata_scheme.DateAvailable(value)

Bases: DateMetadataEntity

ID = '5.2'
description = 'The date the resource was or will be made publicly available.'
key = 'date_available'
maxMultiplicity = 1
minMultiplicity = 1
name = 'Date available'
class soilpulsecore.metadata_scheme.DateCollected(value)

Bases: DateMetadataEntity

ID = '5.3'
description = 'The date or date range in which the dataset content was collected.'
key = 'date_collected'
maxMultiplicity = 2
minMultiplicity = 0
name = 'Date collected'
class soilpulsecore.metadata_scheme.DateCopyrighted(value)

Bases: DateMetadataEntity

ID = '5.4'
description = 'The specific, documented date at which the dataset receives a copyrighted status, if applicable.'
key = 'date_copyrighted'
maxMultiplicity = 1
minMultiplicity = 0
name = 'Date copyrighted'
class soilpulsecore.metadata_scheme.DateCreated(value)

Bases: DateMetadataEntity

ID = '5.5'
description = 'The date the dataset itself was put together; a single date for a final component (e.g. the finalised file with all of the data).'
key = 'date_created'
maxMultiplicity = 1
minMultiplicity = 1
name = 'Date created'
class soilpulsecore.metadata_scheme.DateIssued(value)

Bases: DateMetadataEntity

ID = '5.6'
key = 'date_issued'
maxMultiplicity = 1
minMultiplicity = 1
name = 'Date issued'
class soilpulsecore.metadata_scheme.DateMetadataEntity(value)

Bases: MetadataEntity

Abstract interface class of metadata element with date value

class soilpulsecore.metadata_scheme.DateSubmitted(value)

Bases: DateMetadataEntity

ID = '5.7'
description = 'The date the author submits the resource to the publisher. This could be different from “Accepted” if the publisher then applies a selection process.'
key = 'date_submitted'
maxMultiplicity = 1
minMultiplicity = 0
name = 'Date submitted'
class soilpulsecore.metadata_scheme.DateUpdated(value)

Bases: DateMetadataEntity

ID = '5.8'
description = 'The date of the last update (last revision) to the dataset, when the dataset is being added to.'
key = 'date_updated'
maxMultiplicity = 1
minMultiplicity = 1
name = 'Date updated'
class soilpulsecore.metadata_scheme.DateValid(value)

Bases: DateMetadataEntity

ID = '5.9'
description = 'The date or date range during which the dataset or resource is accurate.'
key = 'date_valid'
maxMultiplicity = 1
minMultiplicity = 0
name = 'Date valid'
class soilpulsecore.metadata_scheme.EntityManager

Bases: object

Manages the entity type classes, provides access to entity instances, and takes care of limiting the number of instances of a particular type

classmethod checkMaxCounts()
classmethod checkMinCounts()
classmethod createEntityInstance(entityType, *args)
currentCount = {'alternate_title': 0, 'bounding_box': 0, 'date_accepted': 0, 'date_available': 0, 'date_collected': 0, 'date_copyrighted': 0, 'date_created': 0, 'date_issued': 0, 'date_submitted': 0, 'date_updated': 0, 'date_valid': 0, 'funding_reference': 0, 'graphic_overview': 0, 'other_alternate_title': 0, 'responsible_organization': 0, 'responsible_person': 0, 'subtitle': 0, 'summary': 0, 'temporal_extent': 0, 'title': 0, 'translated_title': 0}
keywordDatabases = {}
keywordPatterns = {'alternate_title': {}, 'bounding_box': {}, 'date_accepted': {}, 'date_available': {}, 'date_collected': {}, 'date_copyrighted': {}, 'date_created': {}, 'date_issued': {}, 'date_submitted': {}, 'date_updated': {}, 'date_valid': {}, 'funding_reference': {}, 'graphic_overview': {}, 'other_alternate_title': {}, 'responsible_organization': {}, 'responsible_person': {}, 'subtitle': {}, 'summary': {}, 'temporal_extent': {}, 'title': {}, 'translated_title': {}}
maxCounts = {'alternate_title': 1, 'bounding_box': 1, 'date_accepted': 1, 'date_available': 1, 'date_collected': 2, 'date_copyrighted': 1, 'date_created': 1, 'date_issued': 1, 'date_submitted': 1, 'date_updated': 1, 'date_valid': 1, 'funding_reference': None, 'graphic_overview': None, 'other_alternate_title': 1, 'responsible_organization': None, 'responsible_person': None, 'subtitle': 1, 'summary': None, 'temporal_extent': 1, 'title': 1, 'translated_title': 1}
metadataEntities = {'alternate_title': <class 'soilpulsecore.metadata_scheme.AlternateTitle'>, 'bounding_box': <class 'soilpulsecore.metadata_scheme.GeographicalBoundingBox'>, 'date_accepted': <class 'soilpulsecore.metadata_scheme.DateAccapted'>, 'date_available': <class 'soilpulsecore.metadata_scheme.DateAvailable'>, 'date_collected': <class 'soilpulsecore.metadata_scheme.DateCollected'>, 'date_copyrighted': <class 'soilpulsecore.metadata_scheme.DateCopyrighted'>, 'date_created': <class 'soilpulsecore.metadata_scheme.DateCreated'>, 'date_issued': <class 'soilpulsecore.metadata_scheme.DateIssued'>, 'date_submitted': <class 'soilpulsecore.metadata_scheme.DateSubmitted'>, 'date_updated': <class 'soilpulsecore.metadata_scheme.DateUpdated'>, 'date_valid': <class 'soilpulsecore.metadata_scheme.DateValid'>, 'funding_reference': <class 'soilpulsecore.metadata_scheme.FundingReference'>, 'graphic_overview': <class 'soilpulsecore.metadata_scheme.GraphicOverview'>, 'other_alternate_title': <class 'soilpulsecore.metadata_scheme.OtherAlternateTitle'>, 'responsible_organization': <class 'soilpulsecore.metadata_scheme.ResponsibleOrganization'>, 'responsible_person': <class 'soilpulsecore.metadata_scheme.ResponsiblePerson'>, 'subtitle': <class 'soilpulsecore.metadata_scheme.Subtitle'>, 'summary': <class 'soilpulsecore.metadata_scheme.Summary'>, 'temporal_extent': <class 'soilpulsecore.metadata_scheme.TemporalExtent'>, 'title': <class 'soilpulsecore.metadata_scheme.Title'>, 'translated_title': <class 'soilpulsecore.metadata_scheme.TranslatedTitle'>}
minCounts = {'alternate_title': 0, 'bounding_box': 1, 'date_accepted': 0, 'date_available': 1, 'date_collected': 0, 'date_copyrighted': 0, 'date_created': 1, 'date_issued': 1, 'date_submitted': 0, 'date_updated': 1, 'date_valid': 0, 'funding_reference': 1, 'graphic_overview': 0, 'other_alternate_title': 0, 'responsible_organization': None, 'responsible_person': 2, 'subtitle': 0, 'summary': 1, 'temporal_extent': 0, 'title': 1, 'translated_title': 0}
classmethod registerMetadataEntityType(entityClass)
searchPatterns = {'alternate_title': {}, 'bounding_box': {}, 'date_accepted': {}, 'date_available': {}, 'date_collected': {}, 'date_copyrighted': {}, 'date_created': {}, 'date_issued': {}, 'date_submitted': {}, 'date_updated': {}, 'date_valid': {}, 'funding_reference': {}, 'graphic_overview': {}, 'other_alternate_title': {}, 'responsible_organization': {}, 'responsible_person': {}, 'subtitle': {}, 'summary': {}, 'temporal_extent': {}, 'title': {}, 'translated_title': {}}
classmethod showEntityCount()
classmethod showSearchExpressions()
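checkMinCounts and checkMaxCounts presumably compare currentCount against minCounts and maxCounts. A sketch of such a check on excerpts of those dictionaries, assuming None means "no limit" on either side (as for funding_reference in maxCounts and responsible_organization in minCounts); check_counts is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def check_counts(current: Dict[str, int],
                 min_counts: Dict[str, Optional[int]],
                 max_counts: Dict[str, Optional[int]]) -> Tuple[List[str], List[str]]:
    """Return (keys below their minimum, keys above their maximum); None means no limit."""
    too_few = [key for key, minimum in min_counts.items()
               if minimum is not None and current.get(key, 0) < minimum]
    too_many = [key for key, maximum in max_counts.items()
                if maximum is not None and current.get(key, 0) > maximum]
    return too_few, too_many
```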
class soilpulsecore.metadata_scheme.FundingReference(sourceString, value=None)

Bases: MetadataEntity

ID = '7'
description = 'Information about financial support (funding) for the dataset being registered.'
key = 'funding_reference'
maxMultiplicity = None
minMultiplicity = 1
name = 'Funding reference'
class soilpulsecore.metadata_scheme.GeographicalBoundingBox(northLat, southLat, westLong, eastLong, coordinateSystem, epsg=None)

Bases: GeographicalMetadataEntity

ID = '9'
description = 'The spatial limits of a box. A box is defined by two geographic points.    Lower left corner and upper right corner. Each point is defined by its longitude and latitude value.'
key = 'bounding_box'
maxMultiplicity = 1
minMultiplicity = 1
name = 'Geographical bounding box'
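The description defines the box by its lower left and upper right corners, while the constructor takes four coordinates. A sketch mapping one onto the other; Point and bbox_corners are hypothetical helpers. Only the latitude order is validated, since a west longitude greater than the east longitude is legitimate for a box crossing the antimeridian.

```python
from typing import NamedTuple, Tuple


class Point(NamedTuple):
    longitude: float
    latitude: float


def bbox_corners(north_lat: float, south_lat: float,
                 west_long: float, east_long: float) -> Tuple[Point, Point]:
    if south_lat > north_lat:
        raise ValueError("southLat must not exceed northLat")
    lower_left = Point(west_long, south_lat)
    upper_right = Point(east_long, north_lat)
    return lower_left, upper_right
```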
class soilpulsecore.metadata_scheme.GeographicalMetadataEntity(coordinateSystem, epsg=None)

Bases: MetadataEntity

Abstract interface of metadata element with geographical value

class soilpulsecore.metadata_scheme.GraphicOverview(sourceString, value=None)

Bases: MetadataEntity

ID = '4'
description = 'Graphic that provides an illustration of the dataset.'
key = 'graphic_overview'
maxMultiplicity = None
minMultiplicity = 0
name = 'Graphic overview'
class soilpulsecore.metadata_scheme.MetadataEntity(sourceString, value=None)

Bases: object

Top-level abstract class of the metadata entity. Defines the interface of metadata elements

ID = None
dataType = None
description = None
domain = None
getMySQLrepresenatation()

Creates the string of the element’s MySQL snippet

Returns:

MySQL query string

getXMLrepresenatation()

Creates the string of the element’s XML representation

Returns:

XML string

key = None
keywords = {}
maxMultiplicity = None
minMultiplicity = 0
name = None
searchPatterns = {}
classmethod showSearchPhrases()
subtypeOf = None
class soilpulsecore.metadata_scheme.MetadataStructureMap

Bases: object

Realisation of a metadata element set and relationships describing a particular dataset

addEntity(entity, pointer)

Adds an entity-pointer pair to elements list

Parameters:
  • entity – MetadataEntity instance

  • pointer – Pointer instance

checkConsistency()

Checks the number of appearances of entity types

mergeEntities()

Merges two entities into one.

removeEntity(index)

Removes an entity-pointer pair from elements list by index

Parameters:

index – list index of the entity-pointer pair to be removed

saveToDatabase()

Saves the structure map to a database

splitEntity()

Splits one entity into two

class soilpulsecore.metadata_scheme.OtherAlternateTitle(value, language, encoding)

Bases: TextMetadataEntity

ID = '2.4'
description = ''
key = 'other_alternate_title'
maxMultiplicity = 1
minMultiplicity = 0
name = 'Other alternate title'
class soilpulsecore.metadata_scheme.ResponsibleOrganization(roleType)

Bases: SubjectMetadataEntity

ID = '6.2'
description = 'Institution involved in producing (collecting, managing, distributing, or otherwise            contributing to the development of the dataset) the data, or having a relation to the authors of the publication,             in priority order.'
key = 'responsible_organization'
maxMultiplicity = None
minMultiplicity = None
name = 'Responsible organization'
roleTypes = {'Distributor': 0, 'Hosting institution': 0, 'Owner': 0, 'Point of contact': 0, 'Processor': 0, 'Publisher': 0, 'Registration agency': 0, 'Registration authority': 0, 'Research group': 0, 'Resource provider': 0, 'Rights Holder': 0, 'Sponsor': 0}
setRoleType(roleType)

Setter for private __roleType attribute

class soilpulsecore.metadata_scheme.ResponsiblePerson(roleType)

Bases: SubjectMetadataEntity

ID = '6.1'
description = 'Person involved in producing (collecting, managing, distributing, or otherwise            contributing to the development of the dataset) the data, or the authors of the publication,             in priority order. Will be cited if Author is used as contact type.'
key = 'responsible_person'
maxMultiplicity = None
minMultiplicity = 2
name = 'Responsible person'
roleTypes = {'Author': 1, 'Custodian': 0, 'Data Collector': 0, 'Data Curator': 0, 'Editor': 0, 'Originator': 0, 'Owner': 0, 'Point of contact': 0, 'Principal investigator': 0, 'Processor': 0, 'Producer': 0, 'Project leader': 1, 'Project manager': 0, 'Project member': 0, 'Related person': 0, 'Researcher': 0, 'Rights Holder': 0, 'Sponsor': 0, 'Supervisor': 0, 'User': 0, 'Work package leader': 0}
setRoleType(roleType)

Setter for private __roleType attribute
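setRoleType presumably restricts the role to the keys of roleTypes, and ValueNotInDomainError is documented for exactly this kind of out-of-domain value. A sketch under that assumption, with a local copy of the exception and a shortened roleTypes excerpt; the meaning of the 0/1 values in roleTypes is not documented and is not used here. The double-underscore attribute also shows why the docstring calls __roleType private: Python name-mangles it to _RoleTypedEntity__roleType.

```python
class ValueNotInDomainError(Exception):
    """Local stand-in for soilpulsecore.exceptions.ValueNotInDomainError."""

    def __init__(self, message):
        super().__init__(message)
        self.message = message


class RoleTypedEntity:
    """Hypothetical entity with a restricted role type."""
    # Shortened excerpt of ResponsiblePerson.roleTypes
    roleTypes = {"Author": 1, "Editor": 0, "Project leader": 1}

    def __init__(self, roleType):
        self.setRoleType(roleType)

    def setRoleType(self, roleType):
        if roleType not in self.roleTypes:
            allowed = ", ".join(sorted(self.roleTypes))
            raise ValueNotInDomainError(f"{roleType!r} is not one of: {allowed}")
        # Stored as _RoleTypedEntity__roleType because of name mangling
        self.__roleType = roleType

    def getRoleType(self):
        return self.__roleType
```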

class soilpulsecore.metadata_scheme.SubjectMetadataEntity(value)

Bases: MetadataEntity

Abstract interface class of metadata element that represents a person or an institution that is responsible for producing (collecting, managing, distributing, or otherwise contributing to the development of the dataset) the data, or has relation to authors of the publication

roleTypes = {}
class soilpulsecore.metadata_scheme.Subtitle(value, language, encoding)

Bases: TextMetadataEntity

ID = '2.2'
description = ''
key = 'subtitle'
maxMultiplicity = 1
minMultiplicity = 0
name = 'Subtitle'
class soilpulsecore.metadata_scheme.Summary(value, language, encoding)

Bases: TextMetadataEntity

ID = '3'
description = 'Brief narrative summary of the content of the dataset.'
key = 'summary'
maxMultiplicity = None
minMultiplicity = 1
name = 'Summary'
class soilpulsecore.metadata_scheme.TemporalExtent

Bases: DateMetadataEntity

ID = '11'
description = 'The time period in which the resource content was collected (e.g. From 2008-01-01 to 2008-12-31)'
key = 'temporal_extent'
maxMultiplicity = 1
minMultiplicity = 0
name = 'Temporal extent'
class soilpulsecore.metadata_scheme.TextMetadataEntity(value, language, encoding)

Bases: MetadataEntity

Abstract interface class of metadata element with textual value

class soilpulsecore.metadata_scheme.Title(value, language, encoding)

Bases: TextMetadataEntity

ID = '1'
description = 'A characteristic, unique name by which the dataset is known.'
key = 'title'
maxMultiplicity = 1
minMultiplicity = 1
name = 'Title'
class soilpulsecore.metadata_scheme.TranslatedTitle(value, language, encoding)

Bases: TextMetadataEntity

ID = '2.3'
description = ''
key = 'translated_title'
maxMultiplicity = 1
minMultiplicity = 0
name = 'Subtitle'

soilpulsecore.project_management module

class soilpulsecore.project_management.ContainerHandler(project, parent_container, **kwargs)

Bases: object

Represents an enclosed data structure. It can be a file, a string, or any other data structure that can be manipulated and analyzed

DBfields = {}
addStringConcept(string, concept)

Adds a string-concept translation to the container while checking for duplicates among the concepts already present

Parameters:
  • string – string to be assigned the concept translation

  • concept – concept to be added

Returns:

None

addStringMethod(string, method)

Adds a string-method translation to the container while checking for duplicates among the methods already present

Parameters:
  • string – string to be assigned the method translation

  • method – method to be added

Returns:

None

addStringUnit(string, unit)

Adds a string-unit translation to the container while checking for duplicates among the units already present

Parameters:
  • string – string to be assigned the unit translation

  • unit – unit to be added

Returns:

None

assignCrawler(crawler)
collectConcepts(collection={}, cascade=True)

Collects assigned string-concepts translations from the container and recursively from all sub-containers if desired

Parameters:
  • collection – the collection of string-concepts translations that will be returned

  • cascade – whether to include translations from sub-containers

Returns:

collection of string-concepts translations {the string: [{‘vocabulary’: vocabulary string, ‘uri’: uri string}, {…}]}
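A sketch of the recursive collection for the documented structure {string: [{'vocabulary': …, 'uri': …}, …]}, merging entries from sub-containers and skipping duplicate vocabulary/URI pairs; collectMethods and collectUnits follow the same shape. Node and collect_concepts are hypothetical stand-ins. collection defaults to None here instead of {}, because a dictionary default is created once and would be shared by every call that omits the argument.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Translations = Dict[str, List[Dict[str, str]]]


@dataclass
class Node:
    """Hypothetical container holding its own string-concept translations."""
    concepts: Translations = field(default_factory=dict)
    containers: List["Node"] = field(default_factory=list)


def collect_concepts(container: Node, collection: Optional[Translations] = None,
                     cascade: bool = True) -> Translations:
    if collection is None:
        collection = {}
    for string, items in container.concepts.items():
        bucket = collection.setdefault(string, [])
        for item in items:
            if item not in bucket:  # skip duplicate vocabulary/uri pairs
                bucket.append(item)
    if cascade:
        for child in container.containers:
            collect_concepts(child, collection, cascade=True)
    return collection
```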

collectContainerIDsToList(output=[])

Recursively collects the IDs of all sub-containers (and their sub-containers …)

Parameters:

output – the output list of all sub-container IDs

collectMethods(collection={}, cascade=True)

Collects assigned string-methods translations from the container and recursively from all sub-containers if desired

Parameters:
  • collection – the collection of string-methods translations that will be returned

  • cascade – whether to include translations from sub-containers

Returns:

collection of string-methods translations {the string: [{‘vocabulary’: vocabulary string, ‘uri’: uri string}, {…}]}

collectUnits(collection={}, cascade=True)

Collects assigned string-units translations from the container and recursively from all sub-containers if desired

Parameters:
  • collection – the collection of string-units translations that will be returned

  • cascade – whether to include translations from sub-containers

Returns:

collection of string-units translations {the string: [{‘vocabulary’: vocabulary string, ‘uri’: uri string}, {…}]}

containerFormat = None
containerType = None
createTree(*args)
deleteOwnFiles(failed=[])

Deletes the container’s own file (if it exists) from local storage. Own files of sub-containers are deleted first to prevent errors.

Parameters:

failed – list of unsuccessful attempts and their reasons, each as [undeleted file path, description of error]

Returns:

the same list of undeleted files
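A sketch of the children-first deletion order using os; FileContainer is a hypothetical stand-in holding an optional path and sub-containers. Removing the sub-containers’ files first means a directory is already empty when its own container tries to remove it, and anything that still fails is recorded as [path, error description], matching the documented failed list.

```python
import os
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileContainer:
    """Hypothetical container with an optional file or directory path."""
    path: Optional[str] = None
    containers: List["FileContainer"] = field(default_factory=list)


def delete_own_files(container: FileContainer,
                     failed: Optional[List[List[str]]] = None) -> List[List[str]]:
    if failed is None:
        failed = []
    # Sub-containers first, so directories are empty when their turn comes
    for child in container.containers:
        delete_own_files(child, failed)
    if container.path is not None and os.path.exists(container.path):
        try:
            if os.path.isdir(container.path):
                os.rmdir(container.path)  # fails with OSError if not empty
            else:
                os.remove(container.path)
        except OSError as err:
            failed.append([container.path, str(err)])
    return failed
```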

getAnalyzed(cascade=True, force=False, report=False)

Induces further decomposition of the container into logical sub-elements.

getCrawled(cascade=True, force=False, report=False)

Induces a content search for metadata elements based on the appropriate set of search rules and terms.

getSerializationDictionary(cascade=True)

Creates a JSON-structured serialization of the container and, if cascade is True, of its sub-containers

Parameters:

cascade – whether to recurse through sub-containers

classmethod getSpecializedSubclassType(**kwargs)

This method comes in handy when a ContainerHandler subclass needs to control rules for the creation of its own subclasses. The default is ‘no specialization’, i.e. the same type is returned.

keywordsDBname = None
listOwnFiles(collection)
removeAllConcepts(cascade=False)

Removes all string-concept translations assigned to the container and, recursively if desired, to all of its sub-containers

Returns:

None

removeAllMethods(cascade=False)

Removes all string-method translations assigned to the container and, recursively if desired, to all of its sub-containers

Returns:

None

removeAllUnits(cascade=False)

Removes all string-unit translations assigned to the container and, recursively if desired, to all of its sub-containers

Returns:

None

removeConceptOfString(string, concept_to_remove)

Removes a string-concept translation from the container. If it was the last translation for the given string, the string is removed from the translations as well.

Parameters:
  • string – string that has the concept assigned

  • concept_to_remove – concept to be removed

Returns:

None if the string was not among the container’s string-concept translations; 0 if the string’s translations became empty and the string was removed from the translations; 1 on successful removal of the concept from the container’s list
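The three-valued result of removeConceptOfString (and of its method and unit counterparts) can be sketched on a plain {string: [translation, …]} dictionary. remove_translation is hypothetical; the case of a concept not assigned to an existing string is not covered by the documented return values, so the sketch returns None for it as an assumption.

```python
from typing import Dict, List, Optional


def remove_translation(translations: Dict[str, List[dict]], string: str,
                       item_to_remove: dict) -> Optional[int]:
    if string not in translations:
        return None
    items = translations[string]
    if item_to_remove not in items:
        return None  # assumption: not covered by the documented return values
    items.remove(item_to_remove)
    if not items:
        # Last translation gone: drop the string itself
        del translations[string]
        return 0
    return 1
```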

removeMethodOfString(string, method_to_remove)

Removes a string-method translation from the container. If it was the last translation for the given string, the string is removed from the translations as well.

Parameters:
  • string – string that has the method assigned

  • method_to_remove – method to be removed

Returns:

  • None if the string was not in the container’s string-method translations

  • 0 if the string’s translations were empty and the string was removed from the translations

  • 1 on successful removal of the method from the container’s list

removeUnitOfString(string, unit_to_remove)

Removes a string-unit translation from the container. If it was the last translation for the given string, the string is removed from the translations as well.

Parameters:
  • string – string that has the unit assigned

  • unit_to_remove – unit to be removed

Returns:

  • None if the string was not in the container’s string-unit translations

  • 0 if the string’s translations were empty and the string was removed from the translations

  • 1 on successful removal of the unit
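The None / 0 / 1 return convention shared by removeConceptOfString(), removeMethodOfString() and removeUnitOfString() can be illustrated on a plain dictionary. This is a sketch under assumptions: translations are modelled as a hypothetical dict mapping each string to a list of assigned items, and 0 is read as “the removed item was the string’s last translation”.

```python
from typing import Dict, List, Optional


def remove_item_of_string(translations: Dict[str, List[str]],
                          string: str, item: str) -> Optional[int]:
    """Hypothetical stand-in for remove{Concept,Method,Unit}OfString."""
    if string not in translations:
        return None  # the string has no translations in this container
    # Assumes `item` is assigned to `string`; list.remove raises ValueError otherwise
    translations[string].remove(item)
    if not translations[string]:
        # That was the last translation, so the string itself is dropped
        del translations[string]
        return 0
    return 1
```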

serializationDict = {}
showContents(depth=0, ind='. ', show_concepts=True, show_methods=True, show_units=True)

Prints structured info about the container and invokes showContents on all of its sub-containers

Parameters:
  • depth – current depth of the showContents recursion

  • ind – string of a single level indentation

  • show_concepts – whether to also show the string-concept translations

  • show_methods – whether to also show the string-method translations

  • show_units – whether to also show the string-unit translations
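The depth / ind mechanism of showContents() amounts to prefixing each line with `ind` repeated `depth` times and recursing with `depth + 1`. A minimal sketch, assuming a hypothetical tree of `(name, children)` tuples instead of real ContainerHandler instances, and collecting the lines instead of printing them directly so they can be inspected:

```python
from typing import List, Optional, Tuple

Tree = Tuple[str, list]  # hypothetical: (container name, list of sub-trees)


def show_contents(node: Tree, depth: int = 0, ind: str = ". ",
                  lines: Optional[List[str]] = None) -> List[str]:
    if lines is None:
        lines = []
    name, children = node
    lines.append(ind * depth + name)  # one indentation unit per recursion level
    for child in children:
        show_contents(child, depth + 1, ind, lines)
    return lines


tree = ("dataset.zip", [("data", [("plots.csv", [])]), ("README.txt", [])])
print("\n".join(show_contents(tree)))
```

With the default `ind='. '`, `plots.csv` is printed as `. . plots.csv` because it sits two levels below the root.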

updateDBrecord(db_connection, cascade=True)

Invokes updating of the container’s record in storage

class soilpulsecore.project_management.ContainerHandlerFactory(project)

Bases: object

Factory of ContainerHandler object instances and the only way to create container handlers. Each Project has one, which keeps track of all created instances of the ContainerHandler class and its subclasses.

containerTypes = {}
createHandler(general_type, *args, **kwargs)

Creates and returns an instance of ContainerHandler of the given type. Subclasses can implement further specialization of the type by overriding ContainerHandler.getSpecializedSubclassType()

classmethod getAllNeededDBfields()

Returns a list of the fields needed to store all container types registered in the factory

getContainerByID(cid)

Returns the container with the given ID from the internal dictionary

classmethod registerContainerType(containerTypeClass, key)

Registers a ContainerHandler subclass in the factory under the given key

removeContainerByID(cid)

Removes the container with the given local ID and all of its sub-containers from the project’s container tree
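The interplay between registerContainerType(), createHandler() and ContainerHandler.getSpecializedSubclassType() follows a registry-based factory pattern with a specialization hook. The sketch below only illustrates that pattern: `Handler`, `ArchiveHandler`, `ZipHandler`, `Factory`, the `filename` keyword and the sequential ID scheme are all hypothetical, and it assumes keyword arguments are passed to both the hook and the constructor.

```python
class Handler:
    @classmethod
    def getSpecializedSubclassType(cls, **kwargs):
        # Default: no specialization, return the same type as is
        return cls

    def __init__(self, **kwargs):
        self.attrs = kwargs


class ArchiveHandler(Handler):
    @classmethod
    def getSpecializedSubclassType(cls, **kwargs):
        # Hypothetical rule: choose a more specific subclass by file extension
        if kwargs.get("filename", "").endswith(".zip"):
            return ZipHandler
        return cls


class ZipHandler(ArchiveHandler):
    pass


class Factory:
    containerTypes = {}  # general type key -> registered handler class

    def __init__(self):
        self.containers = {}  # every handler created, keyed by local ID
        self._last_id = 0

    @classmethod
    def registerContainerType(cls, containerTypeClass, key):
        cls.containerTypes[key] = containerTypeClass

    def createHandler(self, general_type, **kwargs):
        general_cls = self.containerTypes[general_type]
        # Let the registered class pick the concrete subclass to instantiate
        handler = general_cls.getSpecializedSubclassType(**kwargs)(**kwargs)
        self._last_id += 1
        self.containers[self._last_id] = handler
        return handler


Factory.registerContainerType(ArchiveHandler, "archive")
```

Routing creation through the factory is what lets it keep the ID-to-instance map that methods such as getContainerByID() and removeContainerByID() rely on.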

class soilpulsecore.project_management.Crawler(container)

Bases: object

Top-level abstract class of the metadata/data crawler

analyze(report=False)

Analyzes the inner structure of the container

Returns:

list of containers (the container tree)

crawl(report=False)

Parses the container content and searches for metadata elements.

crawlerType = 'zero'
find_translations_in_dictionary(dictionary, min_match_length=2, full_match_only=True)

Searches for string matches between selected attributes of the crawler’s container and strings in a dictionary.

Parameters:
  • dictionary – the dictionary with string translations

  • min_match_length – minimum length of a match to be included in the results

  • full_match_only – returns only perfect matches if True

Returns:

list of matches

find_translations_in_vocabulary(vocabulary, min_match_length=3, full_match_only=False)

Searches for string matches between selected attributes of the crawler’s container and terms in a vocabulary.

Parameters:
  • vocabulary – the vocabulary with term meanings

  • min_match_length – minimum length of a match to be included in the results

  • full_match_only – returns only perfect matches if True

Returns:

list of matches
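The interaction of min_match_length and full_match_only can be sketched as follows. This is a hypothetical reading, not the crawler’s actual matching code: comparisons are case-insensitive, a full match means the attribute text equals a term, a partial match means the term occurs inside the text, and terms shorter than min_match_length are ignored. `find_matches` is an illustrative name.

```python
from typing import Iterable, List


def find_matches(text: str, terms: Iterable[str],
                 min_match_length: int = 3,
                 full_match_only: bool = False) -> List[str]:
    haystack = text.lower()
    matches = []
    for term in terms:
        candidate = term.lower()
        if len(candidate) < min_match_length:
            continue  # too short to count as a meaningful match
        if full_match_only:
            if candidate == haystack:  # whole attribute text must equal the term
                matches.append(term)
        elif candidate in haystack:  # term found anywhere inside the text
            matches.append(term)
    return matches
```

The differing defaults (2 and full matches for dictionaries, 3 and partial matches for vocabularies) fit this reading: curated dictionary strings are expected to match exactly, while vocabulary terms are searched for inside longer texts.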

classmethod getFallbackCrawlerType(container, **kwargs)
classmethod getSpecializedCrawlerType(container, **kwargs)
validate()

Validates the suitability of a particular crawler type for the given container

class soilpulsecore.project_management.CrawlerFactory

Bases: object

Factory of Crawler class instances

crawlerExtensions = {}
crawlerTypes = {'zero': <class 'soilpulsecore.project_management.Crawler'>}
classmethod createCrawler(general_type, container, *args, **kwargs)

Creates and returns an instance of a Crawler subclass based on the registered types and their specialization procedures

classmethod registerCrawlerType(crawler_class)
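registerCrawlerType() receives only the class, and the crawlerTypes listing (‘zero’ mapped to Crawler, whose crawlerType is ‘zero’) suggests that the registry key is read from the class’s crawlerType attribute. A minimal sketch under that assumption; `CsvCrawler` and its ‘csv’ key are hypothetical, and specialization and fallback handling are left out.

```python
class Crawler:
    crawlerType = "zero"

    def __init__(self, container):
        self.container = container


class CrawlerFactory:
    crawlerTypes = {}

    @classmethod
    def registerCrawlerType(cls, crawler_class):
        # The class declares its own key, so callers pass only the class
        cls.crawlerTypes[crawler_class.crawlerType] = crawler_class

    @classmethod
    def createCrawler(cls, general_type, container, *args, **kwargs):
        return cls.crawlerTypes[general_type](container, *args, **kwargs)


class CsvCrawler(Crawler):
    crawlerType = "csv"  # hypothetical crawler type


CrawlerFactory.registerCrawlerType(Crawler)
CrawlerFactory.registerCrawlerType(CsvCrawler)
```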
class soilpulsecore.project_management.Dataset(name, project)

Bases: object

Represents a set of data containers that together form a distinct collection of data described by a MetadataStructureMap. Each instance has its own MetadataStructureMap, which is composed during the metadata generation phase.

addContainer(container)

Adds a ContainerHandler instance to the Dataset’s containers list

addContainers(containers)

Wrapper for adding several containers at once (given as a list) to the Dataset’s containers list

checkMetadataStructure()
createDedicatedDirectory()

Creates a directory for the dataset to store its files

getAllContainerIDsList()

Returns a list of IDs of all containers that belong to the dataset, including containers nested within other containers

getAnalyzed(cascade=True, force=False, report=False)

Induces analysis of the dataset’s own containers.

getContainerIDsList()

Returns a list of IDs of the containers that are directly in the dataset
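The difference between getContainerIDsList() (direct members only) and getAllContainerIDsList() (members plus everything nested in them) can be sketched with a hypothetical minimal `Container` type that has an `id` and a `containers` list of sub-containers. The attribute names and function names here are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Container:
    id: int
    containers: List["Container"] = field(default_factory=list)


def direct_ids(dataset_containers: List[Container]) -> List[int]:
    # Only the containers added to the dataset itself
    return [c.id for c in dataset_containers]


def all_ids(dataset_containers: List[Container]) -> List[int]:
    ids = []
    for c in dataset_containers:
        ids.append(c.id)
        ids.extend(all_ids(c.containers))  # depth-first through sub-containers
    return ids


members = [Container(1, [Container(2, [Container(3)])]), Container(4)]
```

For `members`, the direct list is `[1, 4]`, while the full list also picks up the nested containers 2 and 3.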

getCrawled(cascade=True, force=False, report=False)

Induces crawling of the dataset’s own containers

getFrictionlessCompleteTransformation()
getSerializationDictionary()
get_frictionless_package(output_path=None)

Composes a frictionless package from the containers of the dataset. Recursively searches for table containers and then builds a valid Package instance.

Parameters:

output_path – file path to save the package descriptor

load_transformation_steps(path)

Loads a file’s content and tries to evaluate it as a definition of steps for a transformation Pipeline

removeAllConcepts()

Removes all concepts from all containers within dataset

removeAllMethods()

Removes all methods from all containers within dataset

removeAllUnits()

Removes all units from all containers within dataset

removeContainer(containers_to_remove)

Removes one or more ContainerHandler instances from the Dataset’s containers list

showContainerTree(show_concepts=True, show_methods=True, show_units=True)

Induces printing of the contents of the dataset’s container tree

showContents(show_containers=True, show_concepts=True, show_methods=True, show_units=True)
updateDBrecord(db_connection)

Invokes saving/updating of the dataset storage record

class soilpulsecore.project_management.ProjectManager(db_connection, user_id, **kwargs)

Bases: object

Top-level manager of a metadata mining project. Gathers all source files, either from remote sources (downloaded from a URL) or from local sources (uploaded from a local computer).

collectContainerConcepts()

Collects all concepts from containers of the tree

collectContainerMethods()

Collects all methods from containers of the tree

collectContainerUnits()

Collects all units from containers of the tree

createDataset(name, id=None)

Adds a Dataset object instance to the dataset list and creates its directory in the project’s datasets folder

deleteAllProjectFiles()

Deletes files of all containers in the tree

deleteDownloadedFiles()

Deletes all files that are stored in the list of downloaded files

deleteUploadedFiles()

Deletes all files that are stored in the list of uploaded files

downloadFilesFromURL(urls)

Handles all steps needed to download one or more files from the given URL(s) (unpacking archives if necessary) and to create the file structure tree

Parameters:

urls – URL string or list of URL strings

downloadPublishedFiles(list=None)

Downloads the files stored in the self.sourceFiles dictionary

Parameters:
  • list – list of SourceFile indexes to be downloaded, or None if all files are to be downloaded

  • unzip – if the downloaded file is a .zip archive it will be extracted if unzip=True

Returns:

list of local relative paths of all files copied to the local/temporary storage

exportTranslationsDictionaryToFile(dictionary, filepath)

Saves a string translations dictionary to a file, overwriting the file if it exists

Parameters:
  • dictionary – dictionary to dump

  • filepath – path of the file to save

getAllFilesList()

Collects file paths from all containers in the tree

Returns:

list of file paths

getContainerByID(cid)

Returns container instances of the given ID or IDs

Parameters:

cid – single ID or a list of IDs

getContainersByParentID(pid)

Returns container instances whose parent container has the given ID

Parameters:

pid – ID of the parent container

getContainersSerialization()

Collects the serialization JSON structure of all containers in the tree

Returns:

dictionary with all containers’ attributes

static getDOImetadata(doi)

Gets metadata in JSON from the registration agency for the provided DOI

Parameters:

doi – DOI string of the resource

Returns:

metadata as JSON

getDatasetsSerialization()

Collects the serialization JSON structure of all datasets in the project

Returns:

dictionary with all datasets’ attributes

getPublisher(DOI_metadata)

Gets the Publisher class instance based on what is stored in the DOI metadata

static getRegistrationAgencyOfDOI(doi, meta=False)

Gets the registration agency from the doi.org API.

Parameters:
  • doi – the DOI string of a published dataset (10.XXX/XXXX).

  • meta – True to return the whole JSON, False to return only the registration agency name string

Returns:

the complete registration agency JSON if meta=True, otherwise the registration agency name string
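A sketch of the registration agency lookup using only the standard library. It assumes the public doi.org endpoint `https://doi.org/ra/<DOI>`, which answers with a JSON array of objects carrying `DOI` and `RA` keys. `fetch_ra_response` and `parse_ra` are hypothetical helper names, and parsing is kept separate from fetching so it can be checked without network access.

```python
import json
import urllib.parse
import urllib.request


def fetch_ra_response(doi: str, timeout: float = 10.0) -> str:
    # Query the doi.org registration agency lookup service (network call)
    url = "https://doi.org/ra/" + urllib.parse.quote(doi, safe="/")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_ra(response_text: str, meta: bool = False):
    data = json.loads(response_text)
    # Mirrors the meta switch: the whole JSON, or only the agency name
    return data if meta else data[0]["RA"]
```

Usage would be `parse_ra(fetch_ra_response(doi))`. The returned name (for example DataCite) can then drive the choice of the metadata source used by getDOImetadata().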

loadTranslationsFromFile(input_file)

Loads a string-* translations JSON file

Parameters:

input_file – path of vocabulary file to load from

removeAllDatasets()

Removes all datasets from the project, including their directories

removeContainer(container)
removeDataset(dataset)

Removes a Dataset object instance from the project’s datasets. Deletes the dataset’s directory.

Parameters:

dataset – Dataset handler object instance

removeDatasetByID(dataset_id)

Removes a Dataset object instance from the project’s datasets based on its ID. Deletes the dataset’s directory.

Parameters:

dataset_id – Local (project scope) Dataset ID to be removed

removeDatasetByIndex(index)

Removes a Dataset object instance from the dataset list by index. Deletes the dataset’s directory.

Parameters:

index – index of the dataset in the self.datasets list

setDOI(doi)

Changes the DOI of a project and performs all related actions:

  • reads the registration agency and publisher response metadata and assigns them to the ProjectManager

  • checks for files bound to the DOI record

  • removes old files if there were any

Parameters:

doi – DOI to apply

showContainerTree(show_concepts=True, show_methods=True, show_units=True)

Induces printing of the contents of the whole container tree

showDatasetsContents(show_containers=True)

Induces printing of the contents of all datasets in the project

showDictionaries()

Prints the contents of the structured dictionaries to the console

showFilesStructure()

Prints the “file path” to “container ID” mapping for all containers in the project

updateConceptsTranslationsFromContents()

Updates the project’s string-concept translations with translations from its own containers

updateConceptsTranslationsFromFile(input_file)

Adds string-concept translations from the specified file to the project’s dictionary (if not already there)

Parameters:

input_file – path of a file to load

updateDBrecord(cascade=True)

Saves the current state of the project and its contents (if specified) through the current DBconnector object

Parameters:

cascade – whether to save the containers, datasets and other state attributes

updateMethodsTranslationsFromContents()

Updates the project’s string-method translations with translations from its own containers

updateMethodsTranslationsFromFile(input_file)

Adds string-method translations from the specified file to the project’s dictionary (if not already there)

Parameters:

input_file – path of a file to load

updateUnitsTranslationsFromContents()

Updates the project’s string-unit translations with translations from its own containers

updateUnitsTranslationsFromFile(input_file)

Adds string-unit translations from the specified file to the project’s dictionary (if not already there)

Parameters:

input_file – path of a file to load

uploadFilesFromSession(files)

Handles all steps needed to upload files from a session (unpacking archives if necessary) and to create the file structure tree

Parameters:

files – path string or list of path strings

class soilpulsecore.project_management.Publisher

Bases: object

getFileInfo(*args)
getMetadata(*args)
key = None
name = None
class soilpulsecore.project_management.PublisherFactory

Bases: object

Publisher object factory

classmethod createHandler(publisherKey, *args)

Creates and returns an instance of the Publisher registered under the given key

publishers = {'Zenodo': <class 'soilpulsecore.data_publishers.ZenodoPublisher'>}
classmethod registerPublisher(publisherClass)
class soilpulsecore.project_management.SourceFile(id, filename, size=None, source_url=None, checksum=None, checksum_type=None)

Bases: object

soilpulsecore.project_management.get_directory_size(path)

Calculates the total space occupied by a directory and its contents

soilpulsecore.project_management.get_formated_file_size(path)

Returns the file size as a dynamically formatted string.
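One way to produce a dynamically formatted size string is to divide by 1024 until the value drops below the threshold for the next unit. The sketch takes a byte count rather than a path (call `os.path.getsize()` first). The binary base, the unit labels and the one-decimal precision are assumptions and are not necessarily what get_formated_file_size() returns. `format_size` is an illustrative name.

```python
def format_size(num_bytes: float) -> str:
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(num_bytes)
    for unit in units[:-1]:
        if value < 1024:
            # Whole bytes for the smallest unit, one decimal otherwise
            return f"{int(value)} B" if unit == "B" else f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} {units[-1]}"  # anything larger is expressed in TB
```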

soilpulsecore.project_management.updateTranslationsDictionary(target_dict, input_dict)

General function for updating one translations dictionary (concepts/methods/units) with another. Strings from the input dictionary are added to the target dictionary if not already there, and translations from the input dictionary are added to the target vocabulary if not already there.
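The merge rule described above can be sketched under the assumption that a translations dictionary maps each string to a list of translations (concepts, methods or units). If the real entries are richer objects, the equality test used for “not already there” would differ. `update_translations` is an illustrative name, and the target is updated in place.

```python
from typing import Dict, List


def update_translations(target_dict: Dict[str, List],
                        input_dict: Dict[str, List]) -> None:
    for string, translations in input_dict.items():
        # New strings are added with an empty list, then filled below
        existing = target_dict.setdefault(string, [])
        for t in translations:
            if t not in existing:  # add each translation only once
                existing.append(t)
```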

soilpulsecore.relationships module

class soilpulsecore.relationships.Relationship

Bases: object

Defines the very basic relation between entities

Module contents