soilpulsecore package
Submodules
soilpulsecore.data_publishers module
- class soilpulsecore.data_publishers.ZenodoPublisher(zenodo_id)
Bases:
Publisher
- getFileInfo()
Collect resource files information from Zenodo record
- Returns:
list of SourceFile instances created from Zenodo API response
- getMetadata()
Collect metadata package from Zenodo record
- Parameters:
zenodo_id – Zenodo record identifier
- Returns:
response as JSON
- key = 'Zenodo'
- name = 'Zenodo'
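As a rough illustration of what getFileInfo() describes, the sketch below turns a record dictionary into a list of file descriptors. It is not the package's implementation: `FileInfo` is a hypothetical stand-in for SourceFile, and the field names `files`, `key`, `size` and `links.self` are assumptions about the layout of the JSON response.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileInfo:
    """Hypothetical stand-in for the package's SourceFile class."""
    name: str
    size: int
    url: str


def files_from_record(record: dict) -> List[FileInfo]:
    """Build FileInfo objects from the (assumed) 'files' list of a record."""
    return [
        FileInfo(
            name=entry["key"],
            size=entry.get("size", 0),
            url=entry.get("links", {}).get("self", ""),
        )
        for entry in record.get("files", [])
    ]


# Hand-written sample; a real response would come from the Zenodo API.
sample = {
    "files": [
        {"key": "data.csv", "size": 1024,
         "links": {"self": "https://example.org/data.csv"}},
    ]
}
files = files_from_record(sample)
```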
soilpulsecore.db_access module
@author: Jan Devátý, Jonas Lenz
- class soilpulsecore.db_access.DBconnector
Bases:
object
- concepts_translations_filename = '_concepts_translations.json'
- concepts_vocabulary_filenames = {'AGROVOC': 'agrovoc_excerpt.json'}
- create_project_directory(project_id)
- datasets_directory_name = 'datasets'
- deleteDatasetRecord(dataset)
- deleteProject(project, delete_dir=True)
- dirname_prefix = None
- establishProjectRecord(user_id, project, unique_names=True)
- getDatasetsOfProject(project_id)
- getNewProjectID()
Finds a correct ID that should be assigned to next new project.
- getProjectsOfUser(user_id)
- getUserNameByID(id)
- classmethod get_connector()
Returns a DBconnector subclass instance that links to the storage where project structural information will be stored. If a running MySQL server with the soilpulse DB is found, a MySQLConnector is returned; otherwise a NullConnector is returned.
- loadConceptsVocabularies()
Loads string-concepts translations JSON file
- loadMethodsVocabularies()
Loads string-method translations JSON file
- loadProject(project)
- loadTranslationsOfProject(project)
- loadUnitsVocabularies()
Loads string-unit translations JSON file
- loadVocabularyFromFile(input_file)
- loadVocabularyFromResource(filename)
Loads string-* translations JSON file
- Parameters:
filename – name of vocabulary file inside the package ‘vocabularies’ folder
- methods_translations_filename = '_methods_translations.json'
- methods_vocabulary_filenames = {}
- printUserInfo(user_id)
- project_files_dir_name = 'project_files'
- project_files_root = PosixPath('/home/docs/SoilPulse/project_files')
- soilpulse_root_dir_name = 'SoilPulse'
- units_translations_filename = '_units_translations.json'
- units_vocabulary_filenames = {}
- updateContainerRecord(container, cascade=False)
- updateProjectRecord(project, cascade=False)
- updateTranslationDictionaries(project)
- vocabularies_dir_name = 'vocabularies'
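The fallback behaviour described for get_connector() can be pictured with a small sketch. All class names here are hypothetical, and the `server_available` flag replaces a real connection attempt; how the package actually detects the MySQL server is not shown in this reference.

```python
class BaseConnector:
    """Hypothetical base class playing the role of DBconnector."""

    @classmethod
    def get_connector(cls, server_available: bool) -> "BaseConnector":
        # Try the database-backed connector first and fall back to the
        # file-based one when the connection attempt fails.
        try:
            return ServerConnector(server_available)
        except ConnectionError:
            return FileConnector()


class ServerConnector(BaseConnector):
    def __init__(self, server_available: bool):
        # A real connector would open a database connection here.
        if not server_available:
            raise ConnectionError("no database server reachable")


class FileConnector(BaseConnector):
    # Mirrors the idea of a prefix for temporary project directories.
    dirname_prefix = "temp_"


connector = BaseConnector.get_connector(server_available=False)
```

Callers only depend on the base interface, so the rest of the code does not need to know which storage was selected.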
- class soilpulsecore.db_access.EntityKeywordsDB
Bases:
object
Provides methods to access and manipulate the SoilPulse metadata entity keywords.
- DBs = {}
- classmethod connect(dbpath)
- dbDir = 'soilpulse\\databases'
- classmethod loadKeywords(entityClass)
Loads entity’s keywords from DB and translates them into RE search patterns. Here the translation of the keywords by some thesaurus is possible …
- Parameters:
entityClass – MetadataEntity subclass
- Returns:
dictionary of regular expression patterns with group names {unique group name: search pattern, …}
- classmethod registerKeywordsDB(dbType, dbFilename)
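The return value of loadKeywords() is described as a dictionary of regular expression patterns keyed by unique group names. A minimal sketch of such a translation follows; the function name, the group naming scheme and the whole-word matching are assumptions made for illustration, not the package's actual rules.

```python
import re


def keywords_to_patterns(entity_key, keywords):
    """Map each keyword to a named-group regex, keyed by the group name."""
    patterns = {}
    for index, keyword in enumerate(keywords):
        group = f"{entity_key}_{index}"
        # re.escape keeps keyword characters literal; \b limits matches
        # to whole words.
        patterns[group] = rf"(?P<{group}>\b{re.escape(keyword)}\b)"
    return patterns


patterns = keywords_to_patterns("title", ["title", "dataset name"])
match = re.search(patterns["title_1"],
                  "The dataset name is given below.",
                  re.IGNORECASE)
```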
- class soilpulsecore.db_access.EntitySearchPatternsDB
Bases:
object
Provides methods to access and manipulate the SoilPulse metadata entity types and their properties stored in the DB “entities”
- classmethod connect()
- dbpath = 'soilpulse\\databases\\entity_search_patterns'
- classmethod loadSearchPatterns(entityClass)
Loads search patterns stored in the DB for given entityID (string ID from metadata scheme definition). Returns dictionary
- Parameters:
entityClass – MetadataEntity subclass
- Returns:
dictionary of regular expression patterns with group names {unique group name: search pattern, …}
- class soilpulsecore.db_access.MySQLConnector
Bases:
DBconnector
Provides methods to access and manipulate the SoilPulse database storage of MetadataMappings and possibly the data storage as well
- checkContainersTableStructure(needed_fields)
- checkoutUser(user_id)
- conceptDictionaryTableName = '`concepts_dictionary`'
- conceptsContainersTableName = '`container_concepts`'
- containersTableName = '`containers`'
- datasetContainerRecordExists(dataset, container)
Checks whether the dataset-container link already has a database entry
- Parameters:
dataset – local container ID
container – ID of project the container belongs to
- datasetsContainersTableName = '`datasets_containers`'
- datasetsTableName = '`datasets`'
- db_name = 'soilpulse'
- deleteDatasetRecord(dataset)
- deleteOrphannedConceptTranslations(project_id, current_vocab)
Deletes translations from the database that are no longer in use for given project.
- Parameters:
project_id – The ID of the project for which to clean up translations.
current_vocab – A dictionary of currently used translations (structured as {string: [{vocabulary, uri}, …]}).
- deleteOrphannedMethodTranslations(project_id, current_vocab)
Deletes string-method translations from the database that are no longer in use for given project.
- Parameters:
project_id – The ID of the project for which to clean up translations.
current_vocab – A dictionary of currently used translations (structured as {string: [{vocabulary, uri}, …]}).
- deleteOrphannedUnitTranslations(project_id, current_vocab)
Deletes string-unit translations from the database that are no longer in use for given project.
- Parameters:
project_id – The ID of the project for which to clean up translations.
current_vocab – A dictionary of currently used translations (structured as {string: [{vocabulary, uri}, …]}).
- deleteProject(project, delete_dir=True)
- deleteRemovedContainers()
- dirname_prefix = ''
- establishProjectRecord(user_id, project, unique_names=True)
- getConceptTranslationID(string, vocabulary, uri, project_id)
Returns ID of string-concept translation in DB or None if no such translation exists. The returned ID is used in the container-translation relation
- getContainerGlobalID(container)
Get a global ID (SoilPulse DB scope) of a container
- Parameters:
container – the container
- Returns:
global ID of the container in SoilPulse DB
- getDatasetGlobalID(dataset)
Checks if dataset with provided name and Project ID already has database entry
- Parameters:
dataset – dataset object instance
- getDatasetsOfProject(project_id)
Loads Dataset info of a given project ID from SoilPulse database. Call this function to obtain list of dataset names within a project
- Parameters:
project_id – ID of the Project whose Datasets should be loaded
- Returns:
list of dataset names [ID, … ]
- getMethodTranslationID(string, vocabulary, uri, project_id)
Returns ID of string-method translation in DB or None if no such translation exists. The returned ID is used in the container-translation relation
- getNewProjectID()
Finds a correct ID that should be assigned to next new project.
- getProjectsOfUser(user_id)
Loads Projects info of a given user from SoilPulse database. Call this function to obtain dictionary of Project names and IDs that are owned by user with given user_id
- Parameters:
user_id – ID of the user whose Projects should be loaded
- Returns:
dictionary of ProjectManagers info {Project id: Project name, …}
- getUnitTranslationID(string, vocabulary, uri, project_id)
Returns ID of string-unit translation in DB or None if no such translation exists. The returned ID is used in the container-translation relation
- getUserNameByID(id)
- insertStringConceptTranslation(string, term, vocabulary, uri, project_id)
Inserts DB entry of a string-concept definition into database table and returns ID of the translation
- Parameters:
string – the string being translated
term – the term of the uri
vocabulary – vocabulary of the concept definition
uri – unique identifier of the term within specified vocabulary
project_id – ID of project to which the translation belongs
- Returns:
ID of the newly created translation DB entry
- insertStringMethodTranslation(string, term, vocabulary, uri, project_id)
Inserts DB entry of a string-method definition into database table and returns ID of the translation
- Parameters:
string – the string being translated
term – the term of the uri
vocabulary – vocabulary of the method definition
uri – unique identifier of the term within specified vocabulary
project_id – ID of project to which the translation belongs
- Returns:
ID of the newly created translation DB entry
- insertStringUnitTranslation(string, term, vocabulary, uri, project_id)
Inserts DB entry of a string-unit definition into database table and returns ID of the translation
- Parameters:
string – the string being translated
term – the term of the uri
vocabulary – vocabulary of the unit definition
uri – unique identifier of the term within specified vocabulary
project_id – ID of project to which the translation belongs
- Returns:
ID of the newly created translation DB entry
- loadChildContainers(project, parent_container=None)
- loadConceptsOfContainer(container)
Collects all concept records from DB belonging to a given container
- Parameters:
container – the container instance
- Returns:
- loadDatasetsOfProject(project)
- loadMethodsOfContainer(container)
Collects all method records from DB belonging to a given container
- Parameters:
container – the container instance
- Returns:
- loadProject(project, cascade=True)
Loads database record of Project and all of its contents if cascade == True
- Parameters:
project – the Project instance to be loaded from DB
cascade – whether to load all contents
- Returns:
input Project instance with filled attributes
- loadSearchPatterns(entity)
Loads search patterns stored in the DB for given entityID (string ID from metadata scheme definition). Returns dictionary
- Parameters:
entity – MetadataEntity subclass
- Returns:
dictionary of regular expression patterns with group names {unique group name: search pattern, …}
- loadUnitsOfContainer(container)
Collects all unit records from DB belonging to a given container
- Parameters:
container – the container instance
- Returns:
- methodsContainersTableName = '`container_methods`'
- methodsDictionaryTableName = '`methods_dictionary`'
- projectsTableName = '`projects`'
- pwd = 'NFDI4earth'
- server = 'localhost'
- unitsContainersTableName = '`container_units`'
- unitsDictionaryTableName = '`units_dictionary`'
- updateConceptsOfContainer(container)
Updates DB record of all string-concept translations of a container
- updateContainerRecord(container, cascade=False, cont_updated=[])
Updates DB entry of the container and its related entities (concepts, units, methods). On cascade=True recursively invokes update on all sub-containers. Records list of updated container IDs in cont_updated.
- Parameters:
container – the container instance to be updated
cascade – whether or not to update the sub-containers’ records too
cont_updated – list of IDs of updated containers
- updateDatasetRecord(dataset)
Saves dataset state into DB
- updateMethodsOfContainer(container)
Updates DB record of all string-method translations of a container
- updateProjectRecord(project, cascade=False)
Updates database record of a Project and all of its contents
- Parameters:
project – the Project instance reference to be saved
cascade – whether to update the containers too
- updateUnitsOfContainer(container)
Updates DB record of all string-units translations of a container
- userProjectsTableName = '`user_projects`'
- userTableName = '`users`'
- username = 'soilpulse'
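The deleteOrphanned*Translations methods remove stored translations that no longer appear in current_vocab. The selection step can be sketched without a database as a set comparison. The `stored_rows` mapping (row ID to a `(string, vocabulary, uri)` tuple) and the function name are hypothetical; only the `{string: [{vocabulary, uri}, …]}` structure comes from the parameter description above.

```python
def find_orphans(stored_rows, current_vocab):
    """Return IDs of stored translations that are absent from current_vocab."""
    in_use = {
        (string, translation["vocabulary"], translation["uri"])
        for string, translations in current_vocab.items()
        for translation in translations
    }
    return [row_id for row_id, key in stored_rows.items() if key not in in_use]


# Hypothetical stored rows: row ID -> (string, vocabulary, uri)
stored_rows = {
    1: ("clay", "AGROVOC", "http://example.org/c_1"),
    2: ("silt", "AGROVOC", "http://example.org/c_2"),
}
current_vocab = {
    "clay": [{"vocabulary": "AGROVOC", "uri": "http://example.org/c_1"}],
}
orphans = find_orphans(stored_rows, current_vocab)
```

In a database-backed version the returned IDs would feed a DELETE statement restricted to the given project.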
- class soilpulsecore.db_access.NullConnector
Bases:
DBconnector
- checkoutUser(user_id)
- containers_attr_filename = '_containers.json'
- datasets_attr_filename = '_datasets.json'
- deleteProject(project, delete_dir=True)
- dirname_prefix = 'temp_'
- establishProjectRecord(user_id, project, unique_names=True)
- getAllTempProjectIDs()
- getNewProjectID()
Finds a correct ID that should be assigned to next new project.
- getProjectsOfUser(user_id)
- getUserNameByID(user_id)
- loadChildContainers(project, containers_serialized, parent_container=None)
- loadDatasets(project, datasets_serialized)
- loadProject(project, cascade=True)
- printUserInfo(user_id)
- project_attr_filename = '_project.json'
- updateContainerRecord(container, cascade=False)
- updateDatasetRecord(dataset)
- updateProjectRecord(project, cascade=False)
- soilpulsecore.db_access.generate_project_unique_name(existing_names, name)
Generates a unique name for a user’s project by appending a number in parentheses if the name already exists.
- Parameters:
existing_names – list of currently existing names in database
name – the original name to check and modify if necessary.
- Returns:
a unique name
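One way to read the description of generate_project_unique_name() is sketched below. The suffix format (a space followed by the number in parentheses) and the starting number 1 are assumptions; the reference only states that a number in parentheses is appended.

```python
def unique_name(existing_names, name):
    """Return name unchanged if unused, otherwise append ' (n)'."""
    if name not in existing_names:
        return name
    counter = 1
    # Increase the counter until the candidate is not taken.
    while f"{name} ({counter})" in existing_names:
        counter += 1
    return f"{name} ({counter})"


first = unique_name(["soil"], "soil")
second = unique_name(["soil", "soil (1)"], "soil")
```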
soilpulsecore.exceptions module
@author: Jan Devátý
- exception soilpulsecore.exceptions.DOIdataRetrievalException(message)
Bases:
Exception
This exception is raised whenever there’s something wrong with DOI data retrieval and manipulation
- exception soilpulsecore.exceptions.DatabaseFetchError(message)
Bases:
Exception
This exception is raised when there’s something wrong with data retrieval from SoilPulse database
- exception soilpulsecore.exceptions.DeserializationError(message)
Bases:
Exception
This exception is raised when there’s something wrong with data being read from a serialization
- exception soilpulsecore.exceptions.LocalFileManipulationError(message)
Bases:
Exception
This exception is raised when there’s something wrong with local files manipulation
soilpulsecore.metadata_scheme module
@author: Jan Devátý
- class soilpulsecore.metadata_scheme.AlternateTitle(value, language, encoding)
Bases:
TextMetadataEntity
- ID = '2.1'
- description = 'A short name by which the dataset is also known.'
- key = 'alternate_title'
- maxMultiplicity = 1
- minMultiplicity = 0
- name = 'Alternate title'
- class soilpulsecore.metadata_scheme.DateAccapted(value)
Bases:
DateMetadataEntity
- ID = '5.1'
- description = 'The date that the publisher accepted the resource into their system.'
- key = 'date_accepted'
- maxMultiplicity = 1
- minMultiplicity = 0
- name = 'Date accepted'
- class soilpulsecore.metadata_scheme.DateAvailable(value)
Bases:
DateMetadataEntity
- ID = '5.2'
- description = 'The date the resource was or will be made publicly available.'
- key = 'date_available'
- maxMultiplicity = 1
- minMultiplicity = 1
- name = 'Date available'
- class soilpulsecore.metadata_scheme.DateCollected(value)
Bases:
DateMetadataEntity
- ID = '5.3'
- description = 'The date or date range in which the dataset content was collected.'
- key = 'date_collected'
- maxMultiplicity = 2
- minMultiplicity = 0
- name = 'Date collected'
- class soilpulsecore.metadata_scheme.DateCopyrighted(value)
Bases:
DateMetadataEntity
- ID = '5.4'
- description = 'The specific, documented date at which the dataset receives a copyrighted status, if applicable.'
- key = 'date_copyrighted'
- maxMultiplicity = 1
- minMultiplicity = 0
- name = 'Date copyrighted'
- class soilpulsecore.metadata_scheme.DateCreated(value)
Bases:
DateMetadataEntity
- ID = '5.5'
- description = 'The date the dataset itself was put together; a single date for a final component (e.g. the finalised file with all of the data).'
- key = 'date_created'
- maxMultiplicity = 1
- minMultiplicity = 1
- name = 'Date created'
- class soilpulsecore.metadata_scheme.DateIssued(value)
Bases:
DateMetadataEntity
- ID = '5.6'
- key = 'date_issued'
- maxMultiplicity = 1
- minMultiplicity = 1
- name = 'Date issued'
- class soilpulsecore.metadata_scheme.DateMetadataEntity(value)
Bases:
MetadataEntity
Abstract interface class of metadata element with date value
- class soilpulsecore.metadata_scheme.DateSubmitted(value)
Bases:
DateMetadataEntity
- ID = '5.7'
- description = 'The date the author submits the resource to the publisher. This could be different from “Accepted” if the publisher then applies a selection process.'
- key = 'date_submitted'
- maxMultiplicity = 1
- minMultiplicity = 0
- name = 'Date submitted'
- class soilpulsecore.metadata_scheme.DateUpdated(value)
Bases:
DateMetadataEntity
- ID = '5.8'
- description = 'The date of the last update (last revision) to the dataset, when the dataset is being added to.'
- key = 'date_updated'
- maxMultiplicity = 1
- minMultiplicity = 1
- name = 'Date updated'
- class soilpulsecore.metadata_scheme.DateValid(value)
Bases:
DateMetadataEntity
- ID = '5.9'
- description = 'The date or date range during which the dataset or resource is accurate.'
- key = 'date_valid'
- maxMultiplicity = 1
- minMultiplicity = 0
- name = 'Date valid'
- class soilpulsecore.metadata_scheme.EntityManager
Bases:
object
Manages the entity type classes, provides access to entity instances, and takes care of limiting the number of instances of a particular type
- classmethod checkMaxCounts()
- classmethod checkMinCounts()
- classmethod createEntityInstance(entityType, *args)
- currentCount = {'alternate_title': 0, 'bounding_box': 0, 'date_accepted': 0, 'date_available': 0, 'date_collected': 0, 'date_copyrighted': 0, 'date_created': 0, 'date_issued': 0, 'date_submitted': 0, 'date_updated': 0, 'date_valid': 0, 'funding_reference': 0, 'graphic_overview': 0, 'other_alternate_title': 0, 'responsible_organization': 0, 'responsible_person': 0, 'subtitle': 0, 'summary': 0, 'temporal_extent': 0, 'title': 0, 'translated_title': 0}
- keywordDatabases = {}
- keywordPatterns = {'alternate_title': {}, 'bounding_box': {}, 'date_accepted': {}, 'date_available': {}, 'date_collected': {}, 'date_copyrighted': {}, 'date_created': {}, 'date_issued': {}, 'date_submitted': {}, 'date_updated': {}, 'date_valid': {}, 'funding_reference': {}, 'graphic_overview': {}, 'other_alternate_title': {}, 'responsible_organization': {}, 'responsible_person': {}, 'subtitle': {}, 'summary': {}, 'temporal_extent': {}, 'title': {}, 'translated_title': {}}
- maxCounts = {'alternate_title': 1, 'bounding_box': 1, 'date_accepted': 1, 'date_available': 1, 'date_collected': 2, 'date_copyrighted': 1, 'date_created': 1, 'date_issued': 1, 'date_submitted': 1, 'date_updated': 1, 'date_valid': 1, 'funding_reference': None, 'graphic_overview': None, 'other_alternate_title': 1, 'responsible_organization': None, 'responsible_person': None, 'subtitle': 1, 'summary': None, 'temporal_extent': 1, 'title': 1, 'translated_title': 1}
- metadataEntities = {'alternate_title': <class 'soilpulsecore.metadata_scheme.AlternateTitle'>, 'bounding_box': <class 'soilpulsecore.metadata_scheme.GeographicalBoundingBox'>, 'date_accepted': <class 'soilpulsecore.metadata_scheme.DateAccapted'>, 'date_available': <class 'soilpulsecore.metadata_scheme.DateAvailable'>, 'date_collected': <class 'soilpulsecore.metadata_scheme.DateCollected'>, 'date_copyrighted': <class 'soilpulsecore.metadata_scheme.DateCopyrighted'>, 'date_created': <class 'soilpulsecore.metadata_scheme.DateCreated'>, 'date_issued': <class 'soilpulsecore.metadata_scheme.DateIssued'>, 'date_submitted': <class 'soilpulsecore.metadata_scheme.DateSubmitted'>, 'date_updated': <class 'soilpulsecore.metadata_scheme.DateUpdated'>, 'date_valid': <class 'soilpulsecore.metadata_scheme.DateValid'>, 'funding_reference': <class 'soilpulsecore.metadata_scheme.FundingReference'>, 'graphic_overview': <class 'soilpulsecore.metadata_scheme.GraphicOverview'>, 'other_alternate_title': <class 'soilpulsecore.metadata_scheme.OtherAlternateTitle'>, 'responsible_organization': <class 'soilpulsecore.metadata_scheme.ResponsibleOrganization'>, 'responsible_person': <class 'soilpulsecore.metadata_scheme.ResponsiblePerson'>, 'subtitle': <class 'soilpulsecore.metadata_scheme.Subtitle'>, 'summary': <class 'soilpulsecore.metadata_scheme.Summary'>, 'temporal_extent': <class 'soilpulsecore.metadata_scheme.TemporalExtent'>, 'title': <class 'soilpulsecore.metadata_scheme.Title'>, 'translated_title': <class 'soilpulsecore.metadata_scheme.TranslatedTitle'>}
- minCounts = {'alternate_title': 0, 'bounding_box': 1, 'date_accepted': 0, 'date_available': 1, 'date_collected': 0, 'date_copyrighted': 0, 'date_created': 1, 'date_issued': 1, 'date_submitted': 0, 'date_updated': 1, 'date_valid': 0, 'funding_reference': 1, 'graphic_overview': 0, 'other_alternate_title': 0, 'responsible_organization': None, 'responsible_person': 2, 'subtitle': 0, 'summary': 1, 'temporal_extent': 0, 'title': 1, 'translated_title': 0}
- classmethod registerMetadataEntityType(entityClass)
- searchPatterns = {'alternate_title': {}, 'bounding_box': {}, 'date_accepted': {}, 'date_available': {}, 'date_collected': {}, 'date_copyrighted': {}, 'date_created': {}, 'date_issued': {}, 'date_submitted': {}, 'date_updated': {}, 'date_valid': {}, 'funding_reference': {}, 'graphic_overview': {}, 'other_alternate_title': {}, 'responsible_organization': {}, 'responsible_person': {}, 'subtitle': {}, 'summary': {}, 'temporal_extent': {}, 'title': {}, 'translated_title': {}}
- classmethod showEntityCount()
- classmethod showSearchExpressions()
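The minCounts, maxCounts and currentCount dictionaries suggest how checkMinCounts() and checkMaxCounts() could work. The sketch below compares counts against both limits; the function name is hypothetical, and treating None as "no limit" is an assumption based on the listed values.

```python
def check_counts(current, minimum, maximum):
    """Return (keys below their minimum, keys above their maximum)."""
    too_few = [
        key for key, count in current.items()
        if minimum.get(key) is not None and count < minimum[key]
    ]
    too_many = [
        key for key, count in current.items()
        if maximum.get(key) is not None and count > maximum[key]
    ]
    return too_few, too_many


# Limits copied from the listing above for three entity types.
min_counts = {"title": 1, "responsible_person": 2, "summary": 1}
max_counts = {"title": 1, "responsible_person": None, "summary": None}
# Hypothetical current state of a metadata set.
current = {"title": 2, "responsible_person": 1, "summary": 1}

too_few, too_many = check_counts(current, min_counts, max_counts)
```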
- class soilpulsecore.metadata_scheme.FundingReference(sourceString, value=None)
Bases:
MetadataEntity
- ID = '7'
- description = 'Information about financial support (funding) for the dataset being registered.'
- key = 'funding_reference'
- maxMultiplicity = None
- minMultiplicity = 1
- name = 'Funding reference'
- class soilpulsecore.metadata_scheme.GeographicalBoundingBox(northLat, southLat, westLong, eastLong, coordinateSystem, epsg=None)
Bases:
GeographicalMetadataEntity
- ID = '9'
- description = 'The spatial limits of a box. A box is defined by two geographic points. Lower left corner and upper right corner. Each point is defined by its longitude and latitude value.'
- key = 'bounding_box'
- maxMultiplicity = 1
- minMultiplicity = 1
- name = 'Geographical bounding box'
- class soilpulsecore.metadata_scheme.GeographicalMetadataEntity(coordinateSystem, epsg=None)
Bases:
MetadataEntity
Abstract interface of metadata element with geographical value
- class soilpulsecore.metadata_scheme.GraphicOverview(sourceString, value=None)
Bases:
MetadataEntity
- ID = '4'
- description = 'Graphic that provides an illustration of the dataset.'
- key = 'graphic_overview'
- maxMultiplicity = None
- minMultiplicity = 0
- name = 'Graphic overview'
- class soilpulsecore.metadata_scheme.MetadataEntity(sourceString, value=None)
Bases:
object
Top level abstract class of the metadata entity. Defines metadata elements interface
- ID = None
- dataType = None
- description = None
- domain = None
- getMySQLrepresenatation()
Creates the string of element’s MySQL snippet
- Returns:
MySQL query string
- getXMLrepresenatation()
Creates the string of element’s XML representation
- Returns:
XML string
- key = None
- keywords = {}
- maxMultiplicity = None
- minMultiplicity = 0
- name = None
- searchPatterns = {}
- classmethod showSearchPhrases()
- subtypeOf = None
- class soilpulsecore.metadata_scheme.MetadataStructureMap
Bases:
object
Realisation of a metadata element set and relationships describing particular dataset
- addEntity(entity, pointer)
Adds an entity-pointer pair to elements list
- Parameters:
entity – MetadataEntity instance
pointer – Pointer instance
- checkConsistency()
Checks the number of appearances of entity types
- mergeEntities()
Merges two entities into one.
- removeEntity(index)
Removes an entity-pointer pair from elements list by index
- Parameters:
index – list index of the entity-pointer pair to be removed
- saveToDatabase()
Saves the structure map to a database
- splitEntity()
Splits one entity into two
- class soilpulsecore.metadata_scheme.OtherAlternateTitle(value, language, encoding)
Bases:
TextMetadataEntity
- ID = '2.4'
- description = ''
- key = 'other_alternate_title'
- maxMultiplicity = 1
- minMultiplicity = 0
- name = 'Other alternate title'
- class soilpulsecore.metadata_scheme.ResponsibleOrganization(roleType)
Bases:
SubjectMetadataEntity
- ID = '6.2'
- description = 'Institution involved in producing (collecting, managing, distributing, or otherwise contributing to the development of the dataset) the data, or having a relation to the authors of the publication, in priority order.'
- key = 'responsible_organization'
- maxMultiplicity = None
- minMultiplicity = None
- name = 'Responsible organization'
- roleTypes = {'Distributor': 0, 'Hosting institution': 0, 'Owner': 0, 'Point of contact': 0, 'Processor': 0, 'Publisher': 0, 'Registration agency': 0, 'Registration authority': 0, 'Research group': 0, 'Resource provider': 0, 'Rights Holder': 0, 'Sponsor': 0}
- setRoleType(roleType)
Setter for private __roleType attribute
- class soilpulsecore.metadata_scheme.ResponsiblePerson(roleType)
Bases:
SubjectMetadataEntity
- ID = '6.1'
- description = 'Person involved in producing (collecting, managing, distributing, or otherwise contributing to the development of the dataset) the data, or the authors of the publication, in priority order. Will be cited if Author is used as contact type.'
- key = 'responsible_person'
- maxMultiplicity = None
- minMultiplicity = 2
- name = 'Responsible person'
- roleTypes = {'Author': 1, 'Custodian': 0, 'Data Collector': 0, 'Data Curator': 0, 'Editor': 0, 'Originator': 0, 'Owner': 0, 'Point of contact': 0, 'Principal investigator': 0, 'Processor': 0, 'Producer': 0, 'Project leader': 1, 'Project manager': 0, 'Project member': 0, 'Related person': 0, 'Researcher': 0, 'Rights Holder': 0, 'Sponsor': 0, 'Supervisor': 0, 'User': 0, 'Work package leader': 0}
- setRoleType(roleType)
Setter for private __roleType attribute
- class soilpulsecore.metadata_scheme.SubjectMetadataEntity(value)
Bases:
MetadataEntity
Abstract interface class of metadata element that represents a person or an institution that is responsible for producing (collecting, managing, distributing, or otherwise contributing to the development of the dataset) the data, or has relation to authors of the publication
- roleTypes = {}
- class soilpulsecore.metadata_scheme.Subtitle(value, language, encoding)
Bases:
TextMetadataEntity
- ID = '2.2'
- description = ''
- key = 'subtitle'
- maxMultiplicity = 1
- minMultiplicity = 0
- name = 'Subtitle'
- class soilpulsecore.metadata_scheme.Summary(value, language, encoding)
Bases:
TextMetadataEntity
- ID = '3'
- description = 'Brief narrative summary of the content of the dataset.'
- key = 'summary'
- maxMultiplicity = None
- minMultiplicity = 1
- name = 'Summary'
- class soilpulsecore.metadata_scheme.TemporalExtent
Bases:
DateMetadataEntity
- ID = '11'
- description = 'The time period in which the resource content was collected (e.g. From 2008-01-01 to 2008-12-31)'
- key = 'temporal_extent'
- maxMultiplicity = 1
- minMultiplicity = 0
- name = 'Temporal extent'
- class soilpulsecore.metadata_scheme.TextMetadataEntity(value, language, encoding)
Bases:
MetadataEntity
Abstract interface class of metadata element with textual value
soilpulsecore.project_management module
- class soilpulsecore.project_management.ContainerHandler(project, parent_container, **kwargs)
Bases:
object
Represents an enclosed data structure. It can be either a file or string or other data structure that can be manipulated and analyzed
- DBfields = {}
- addStringConcept(string, concept)
Adds a string-to-concept translation to the container while checking for duplicates among already present concepts
- Parameters:
string – string to be assigned the concept translation
concept – concept to be added
- Returns:
None
- addStringMethod(string, method)
Adds a string-to-method translation to the container while checking for duplicates among already present methods
- Parameters:
string – string to be assigned the method translation
method – method to be added
- Returns:
None
- addStringUnit(string, unit)
Adds a string-to-unit translation to the container while checking for duplicates among already present units
- Parameters:
string – string to be assigned the unit translation
unit – unit to be added
- Returns:
None
- assignCrawler(crawler)
- collectConcepts(collection={}, cascade=True)
Collects assigned string-concepts translations from the container and recursively from all sub-containers if desired
- Parameters:
collection – the collection of string-concepts translations that will be returned
cascade – whether to include translations from sub-containers
- Returns:
collection of string-concepts translations {the string: [{‘vocabulary’: vocabulary string, ‘uri’: uri string}, {…}]}
- collectContainerIDsToList(output=[])
Collects recursively IDs of all sub-containers (and their sub-containers …)
- Parameters:
output – the output list of all sub-container IDs
- collectMethods(collection={}, cascade=True)
Collects assigned string-methods translations from the container and recursively from all sub-containers if desired
- Parameters:
collection – the collection of string-methods translations that will be returned
cascade – whether to include translations from sub-containers
- Returns:
collection of string-methods translations {the string: [{‘vocabulary’: vocabulary string, ‘uri’: uri string}, {…}]}
- collectUnits(collection={}, cascade=True)
Collects assigned string-units translations from the container and recursively from all sub-containers if desired
- Parameters:
collection – the collection of string-units translations that will be returned
cascade – whether to include translations from sub-containers
- Returns:
collection of string-units translations {the string: [{‘vocabulary’: vocabulary string, ‘uri’: uri string}, {…}]}
- containerFormat = None
- containerType = None
- createTree(*args)
- deleteOwnFiles(failed=[])
Deletes container’s own file (if it exists) from local storage. First induces deleting own files of sub-containers to prevent errors.
- Parameters:
failed – list of unsuccessful attempts and reason for that [undeleted file path, description of error]
- Returns:
the same list of undeleted files
- getAnalyzed(cascade=True, force=False, report=False)
Induces further decomposition of the container into logical sub-elements.
- getCrawled(cascade=True, force=False, report=False)
Induces content search for metadata elements based on appropriate set of search rules and terms.
- getSerializationDictionary(cascade=True)
Creates JSON structured string with serialization of the container and its sub-containers
- Parameters:
cascade – whether to recurse through sub-containers
- classmethod getSpecializedSubclassType(**kwargs)
This method comes in handy when a ContainerHandler subclass needs to control the rules for creating its own subclasses. The default is ‘no specialization’, i.e. it returns the same type as is.
- keywordsDBname = None
- listOwnFiles(collection)
- removeAllConcepts(cascade=False)
Removes all string-concept translations assigned to container and all sub-containers recursively if desired
- Returns:
None
- removeAllMethods(cascade=False)
Removes all string-method translations assigned to container and all sub-containers recursively if desired
- Returns:
None
- removeAllUnits(cascade=False)
Removes all string-unit translations assigned to container and all sub-containers recursively if desired
- Returns:
None
- removeConceptOfString(string, concept_to_remove)
Remove string-concept translation from container. If the translation was last for given string, the string gets removed from the translations as well.
- Parameters:
string – string that has the concept assigned
concept_to_remove – concept to be removed
- Returns:
None if the string was not in the container’s string-concept translations; 0 if the string’s translations were empty and the string was removed from the translations; 1 on successful removal of the concept from container’s list
- removeMethodOfString(string, method_to_remove)
Remove string-method translation from container. If the translation was last for given string, the string gets removed from the translations as well.
- Parameters:
string – string that has the method assigned
method_to_remove – method to be removed
- Returns:
None if the string was not in the container’s string-method translations; 0 if the string’s translations were empty and the string was removed from the translations; 1 on successful removal of the method from container’s list
- removeUnitOfString(string, unit_to_remove)
Remove string-unit translation from container. If the translation was last for given string, the string gets removed from the translations as well.
- Parameters:
string – string that has the unit assigned
unit_to_remove – unit to be removed
- Returns:
None if the string was not in the container’s string-unit translations; 0 if the string’s translations were empty and the string was removed from the translations; 1 on successful removal of the unit
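The return values shared by removeConceptOfString, removeMethodOfString and removeUnitOfString can be illustrated with a minimal sketch. It assumes the translations are kept in a plain dictionary that maps each string to a list of assigned items. The function name `remove_translation`, that storage layout, and the handling of an item that is not assigned to the string are assumptions made for illustration, not the package’s implementation.

```python
def remove_translation(translations, string, item):
    """Remove `item` from the list stored under `string` (hypothetical helper)."""
    if string not in translations:
        return None  # the string has no translations at all
    items = translations[string]
    if not items:
        # the string was present but had an empty list: drop the string itself
        del translations[string]
        return 0
    if item not in items:
        # assumption: this case is not covered by the documented return values
        return None
    items.remove(item)
    if not items:
        # the removed translation was the last one for this string
        del translations[string]
    return 1


translations = {"pH": ["concept_a", "concept_b"], "depth": []}
print(remove_translation(translations, "missing", "concept_a"))
print(remove_translation(translations, "depth", "concept_a"))
print(remove_translation(translations, "pH", "concept_a"))
```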
- serializationDict = {}
- showContents(depth=0, ind='. ', show_concepts=True, show_methods=True, show_units=True)
Prints structured info about the container and invokes showContents on all of its containers
- Parameters:
depth – current depth of showContents recursion
ind – string of a single level indentation
show_concepts – whether to show also the string-concepts translations
show_methods – whether to show also the string-methods translations
show_units – whether to show also the string-units translations
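The role of the depth and ind parameters can be sketched as a recursive traversal in which every level repeats the indentation string once more. The `Node` class and its `render` method are hypothetical stand-ins, and the sketch returns lines instead of printing them so that the result is easy to inspect.

```python
class Node:
    """Hypothetical stand-in for a container that holds sub-containers."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def render(self, depth=0, ind=". "):
        # one line for this node, indented by `depth` copies of `ind`
        lines = [ind * depth + self.name]
        # recurse into sub-containers one level deeper
        for child in self.children:
            lines.extend(child.render(depth + 1, ind))
        return lines


tree = Node("archive.zip", [Node("data", [Node("table.csv")]), Node("readme.txt")])
print("\n".join(tree.render()))
```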
- updateDBrecord(db_connection, cascade=True)
Invokes updating of the container’s record in storage
- class soilpulsecore.project_management.ContainerHandlerFactory(project)
Bases:
object
Factory of ContainerHandler object instances, the only way to create container handlers. Each Project has one to keep track of all created instances of the ContainerHandler class and its subclasses.
- containerTypes = {}
- createHandler(general_type, *args, **kwargs)
Creates and returns an instance of ContainerHandler of the given type. Subclasses can implement further specialization of the type by overriding ContainerHandler.getSpecializedSubclassType()
- classmethod getAllNeededDBfields()
Returns list of needed fields for storing all container types that are registered in factory
- getContainerByID(cid)
Returns container of given ID from inner dictionary
- classmethod registerContainerType(containerTypeClass, key)
Registers ContainerHandler subclasses in the factory
- removeContainerByID(cid)
Removes the container with the given local ID and all of its sub-containers from the project’s container tree
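The interplay of registerContainerType, createHandler and getSpecializedSubclassType can be sketched as a registry-based factory. The classes below are simplified stand-ins written for illustration: the constructor arguments, the `extension` keyword, the `FileContainer` and `CSVContainer` names and the ID counter are assumptions, and the real factory additionally takes a project.

```python
class ContainerHandler:
    """Simplified stand-in for the base container handler."""

    def __init__(self, cid, name):
        self.id = cid
        self.name = name

    @classmethod
    def getSpecializedSubclassType(cls, **kwargs):
        # default: no specialization, the same type is returned
        return cls


class ContainerHandlerFactory:
    containerTypes = {}  # registry shared by all factories: key -> class

    def __init__(self):
        self.containers = {}  # container ID -> instance
        self.next_id = 1

    @classmethod
    def registerContainerType(cls, containerTypeClass, key):
        cls.containerTypes[key] = containerTypeClass

    def createHandler(self, general_type, *args, **kwargs):
        base_type = self.containerTypes[general_type]
        # let the registered class pick a more specific subclass
        final_type = base_type.getSpecializedSubclassType(**kwargs)
        handler = final_type(self.next_id, *args)
        self.containers[handler.id] = handler
        self.next_id += 1
        return handler

    def getContainerByID(self, cid):
        return self.containers.get(cid)


class FileContainer(ContainerHandler):
    @classmethod
    def getSpecializedSubclassType(cls, **kwargs):
        # hypothetical rule: CSV files get their own handler type
        if kwargs.get("extension") == "csv":
            return CSVContainer
        return cls


class CSVContainer(FileContainer):
    pass


ContainerHandlerFactory.registerContainerType(FileContainer, "file")
factory = ContainerHandlerFactory()
table = factory.createHandler("file", "data.csv", extension="csv")
other = factory.createHandler("file", "readme.txt", extension="txt")
print(type(table).__name__, type(other).__name__)
```

Keeping the specialization rule on the handler class means the factory does not need to know about individual file formats.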
- class soilpulsecore.project_management.Crawler(container)
Bases:
object
Top-level abstract class of the metadata/data crawler
- analyze(report=False)
Analyzes inner structure of the container
- Returns:
list of containers (the container tree)
- crawl(report=False)
Parses the container content and searches for metadata elements.
- crawlerType = 'zero'
- find_translations_in_dictionary(dictionary, min_match_length=2, full_match_only=True)
Searches for string matches between selected attributes of the crawler’s container and strings in a dictionary.
- Parameters:
dictionary – the dictionary with string translations
min_match_length – minimum length of a match to be included in results
full_match_only – returns only perfect matches if True
- Returns:
list of matches
- find_translations_in_vocabulary(vocabulary, min_match_length=3, full_match_only=False)
Searches for string matches between selected attributes of the crawler’s container and terms in a vocabulary.
- Parameters:
vocabulary – the vocabulary with term meanings
min_match_length – minimum length of a match to be included in results
full_match_only – returns only perfect matches if True
- Returns:
list of matches
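How min_match_length and full_match_only might shape the result can be sketched with a small standalone function. The name `find_matches`, the case-insensitive comparison, the substring rule used for partial matches and the tuple layout of a match are all assumptions for illustration; the crawler’s actual matching logic may differ.

```python
def find_matches(candidates, dictionary, min_match_length=2, full_match_only=True):
    """Return (candidate, dictionary key, translations) tuples for matching strings."""
    matches = []
    for candidate in candidates:
        needle = candidate.strip().lower()
        if len(needle) < min_match_length:
            continue  # too short to be a meaningful match
        for key, translations in dictionary.items():
            target = key.lower()
            if needle == target:
                matches.append((candidate, key, translations))
            elif not full_match_only and (needle in target or target in needle):
                # partial match: the overlap is as long as the shorter string
                if min(len(needle), len(target)) >= min_match_length:
                    matches.append((candidate, key, translations))
    return matches


dictionary = {"soil moisture": ["concept_moisture"], "pH": ["concept_ph"]}
print(find_matches(["Soil Moisture", "ph", "x"], dictionary))
print(find_matches(["moisture"], dictionary, min_match_length=3, full_match_only=False))
```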
- classmethod getFallbackCrawlerType(container, **kwargs)
- classmethod getSpecializedCrawlerType(container, **kwargs)
- validate()
Validates suitability of particular crawler type for given container
- class soilpulsecore.project_management.CrawlerFactory
Bases:
object
Factory of Crawler class instances
- crawlerExtensions = {}
- crawlerTypes = {'zero': <class 'soilpulsecore.project_management.Crawler'>}
- classmethod createCrawler(general_type, container, *args, **kwargs)
Creates and returns instance of Crawler subclass based on registered types and their specialization procedures
- classmethod registerCrawlerType(crawler_class)
- class soilpulsecore.project_management.Dataset(name, project)
Bases:
object
Represents a set of data containers that together form a distinct collection of data represented by a MetadataStructureMap. The instance has its own MetadataStructureMap that is composed during the metadata generation phase
- addContainer(container)
Adds a ContainerHandler instance to the Dataset’s containers list
- addContainers(containers)
Wrapper for adding several containers, passed as a list, to the Dataset’s containers list at once
- checkMetadataStructure()
- createDedicatedDirectory()
Creates directory for a dataset to store its files
- getAllContainerIDsList()
Return list of IDs of all containers that belong to dataset (all containers within containers)
- getAnalyzed(cascade=True, force=False, report=False)
Induces analysis of own containers.
- getContainerIDsList()
Return list of container IDs that are directly in the dataset
- getCrawled(cascade=True, force=False, report=False)
Induces crawling of own containers
- getFrictionlessCompleteTransformation()
- getSerializationDictionary()
- get_frictionless_package(output_path=None)
Composes a frictionless package from the containers of the dataset. Recursively searches for table containers and then builds a valid Package instance
- Parameters:
output_path – file path to save the package descriptor
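The recursive search for table containers can be sketched without the frictionless library by assembling a plain descriptor dictionary with the `name`, `resources`, `path` and `schema.fields` keys of a Data Package. The `Container` class, its attributes and both function names are hypothetical; the method itself is documented to build a Package instance, which this sketch does not do.

```python
import json


class Container:
    """Hypothetical container: a table if `fields` is set, otherwise a plain node."""

    def __init__(self, name, path=None, fields=None, children=None):
        self.name = name
        self.path = path
        self.fields = fields
        self.children = children or []


def collect_tables(containers):
    """Depth-first search for containers that describe a table."""
    tables = []
    for container in containers:
        if container.fields is not None:
            tables.append(container)
        tables.extend(collect_tables(container.children))
    return tables


def build_package_descriptor(name, containers, output_path=None):
    descriptor = {
        "name": name,
        "resources": [
            {
                "name": table.name,
                "path": table.path,
                "schema": {"fields": [{"name": n, "type": t} for n, t in table.fields]},
            }
            for table in collect_tables(containers)
        ],
    }
    if output_path is not None:
        # save the descriptor only when a path was given
        with open(output_path, "w", encoding="utf-8") as handle:
            json.dump(descriptor, handle, indent=2)
    return descriptor


archive = Container("archive", children=[
    Container("measurements", "data/measurements.csv", [("depth", "number"), ("site", "string")]),
    Container("notes"),
])
print(json.dumps(build_package_descriptor("example-dataset", [archive]), indent=2))
```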
- load_transformation_steps(path)
Loads a file content and tries to evaluate it as a steps definition for transformation Pipeline
- removeAllConcepts()
Removes all concepts from all containers within dataset
- removeAllMethods()
Removes all methods from all containers within dataset
- removeAllUnits()
Removes all units from all containers within dataset
- removeContainer(containers_to_remove)
Removes one or more ContainerHandler instances from Dataset’s containers list
- showContainerTree(show_concepts=True, show_methods=True, show_units=True)
Induces printing contents of the dataset’s container tree
- showContents(show_containers=True, show_concepts=True, show_methods=True, show_units=True)
- updateDBrecord(db_connection)
Invokes saving/updating of the dataset storage record
- class soilpulsecore.project_management.ProjectManager(db_connection, user_id, **kwargs)
Bases:
object
Top-level manager of a metadata mining project. Gathers all source files, either from remote sources (download from URL) or local sources (upload from local computer).
- collectContainerConcepts()
Collects all concepts from containers of the tree
- collectContainerMethods()
Collects all methods from containers of the tree
- collectContainerUnits()
Collects all units from containers of the tree
- createDataset(name, id=None)
Adds Dataset object instance to dataset list, creates directory in project datasets folder
- deleteAllProjectFiles()
Deletes files of all containers in the tree
- deleteDownloadedFiles()
Deletes all files that are stored in the list of downloaded files
- deleteUploadedFiles()
Deletes all files that are stored in the list of uploaded files
- downloadFilesFromURL(urls)
Handles all needed steps to download file/files from a session (unpack archives if necessary) and create the file structure tree
- Parameters:
urls – URL string or list of URL strings
- downloadPublishedFiles(list=None)
Download files that are stored in self.sourceFiles dictionary
- Parameters:
list – list of SourceFile indexes to be downloaded, or None if all files are to be downloaded
unzip – if the downloaded file is a .zip archive it will be extracted if unzip=True
- Returns:
list of local relative paths of all files copied to the local/temporary storage
- exportTranslationsDictionaryToFile(dictionary, filepath)
Saves a string translations dictionary to a file; overwrites the file if it exists
- Parameters:
dictionary – dictionary to dump
filepath – path of a file to save
- getAllFilesList()
Collects file paths from all containers in the tree
- Returns:
list of file paths
- getContainerByID(cid)
Returns container instances of given ID/IDs
- Parameters:
cid – single ID or a list of IDs
- getContainersByParentID(pid)
Return container instances that have parent container with given ID
- Parameters:
pid – ID of the parent container
- getContainersSerialization()
Collects serialization JSON structure of all containers in the tree
- Returns:
dictionary with all containers’ attributes
- static getDOImetadata(doi)
Get metadata in JSON from registration agency for provided DOI
- Parameters:
doi – doi string of the resource
- Returns:
json of metadata
- getDatasetsSerialization()
Collects serialization JSON structure of all datasets in the project
- Returns:
dictionary with all datasets’ attributes
- getPublisher(DOI_metadata)
Gets the Publisher class instance from what’s stored in the DOI metadata
- static getRegistrationAgencyOfDOI(doi, meta=False)
Get registration agency from doi.org API.
- Parameters:
doi – the DOI string of a published dataset (10.XXX/XXXX).
meta – true to return whole json, false to return only a string of registration agency
- Returns:
complete registration agency json if meta = True, else registration agency name string
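A lookup of this kind can be sketched with the standard library only. The sketch assumes the doi.org `doiRA` endpoint and a response that is a JSON list containing one object with `DOI` and `RA` keys; the function names are hypothetical and the DOI in the usage lines is a made-up sample. Building the URL and parsing the payload are kept separate from the network call so that each part can be checked on its own.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def build_ra_url(doi):
    """Compose the lookup URL for a DOI (assumed doi.org endpoint)."""
    return "https://doi.org/doiRA/" + quote(doi, safe="/")


def parse_ra_response(payload, meta=False):
    """Return the whole payload if meta is True, else only the agency name."""
    if meta:
        return payload
    # assumption: one object per queried DOI; "RA" is absent for unknown DOIs
    return payload[0].get("RA")


def get_registration_agency(doi, meta=False):
    """Fetch and parse the registration agency record (performs a network call)."""
    with urlopen(build_ra_url(doi), timeout=10) as response:
        payload = json.load(response)
    return parse_ra_response(payload, meta)


# offline illustration with a made-up DOI and a hand-written sample payload
sample = [{"DOI": "10.5281/zenodo.123", "RA": "DataCite"}]
print(build_ra_url("10.5281/zenodo.123"))
print(parse_ra_response(sample))
```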
- loadTranslationsFromFile(input_file)
Loads string-* translations JSON file
- Parameters:
input_file – path of vocabulary file to load from
- removeAllDatasets()
Removes all datasets from a project including all their directories
- removeContainer(container)
- removeDataset(dataset)
Removes Dataset object instance from project’s datasets. Deletes dataset’s directory.
- Parameters:
dataset – Dataset handler object instance
- removeDatasetByID(dataset_id)
Removes Dataset object instance from project’s datasets based on its ID. Deletes dataset’s directory.
- Parameters:
dataset_id – Local (project scope) Dataset ID to be removed
- removeDatasetByIndex(index)
Removes Dataset object instance from dataset list by index. Deletes dataset’s directory
- Parameters:
index – index of the dataset in the self.datasets list
- setDOI(doi)
Changes the DOI of a project and performs all associated actions: reads the registration agency and publisher response metadata and assigns them to the ProjectManager, checks for files bound to the DOI record, and removes old files if there were any
- Parameters:
doi – DOI to apply
- showContainerTree(show_concepts=True, show_methods=True, show_units=True)
Induces printing contents of the whole container tree
- showDatasetsContents(show_containers=True)
Induces printing contents of all datasets in the project
- showDictionaries()
Prints structured dictionaries contents to console
- showFilesStructure()
Prints “file path” - “container ID” mapping for all containers in the project
- updateConceptsTranslationsFromContents()
Updates project’s string-concept translations by translations from own containers
- updateConceptsTranslationsFromFile(input_file)
Adds string-concept translations from the specified file to the project’s dictionary (if not already there)
- Parameters:
input_file – path of a file to load
- updateDBrecord(cascade=True)
Saves current state of the project and its contents (if specified) through current DBconnector object
- Parameters:
cascade – whether to save the containers, datasets and other state attributes
- updateMethodsTranslationsFromContents()
Updates project’s string-method translations by translations from own containers
- updateMethodsTranslationsFromFile(input_file)
Adds string-method translations from the specified file to the project’s dictionary (if not already there)
- Parameters:
input_file – path of a file to load
- updateUnitsTranslationsFromContents()
Updates project’s string-unit translations by translations from own containers
- updateUnitsTranslationsFromFile(input_file)
Adds string-unit translations from the specified file to the project’s dictionary (if not already there)
- Parameters:
input_file – path of a file to load
- uploadFilesFromSession(files)
Handles all needed steps to upload files from a session (unpack archives if necessary) and create the file structure tree
- Parameters:
files – path string or list of path strings
- class soilpulsecore.project_management.Publisher
Bases:
object
- getFileInfo(*args)
- getMetadata(*args)
- key = None
- name = None
- class soilpulsecore.project_management.PublisherFactory
Bases:
object
Publisher object factory
- classmethod createHandler(publisherKey, *args)
Creates and returns instance of Publisher of given key
- publishers = {'Zenodo': <class 'soilpulsecore.data_publishers.ZenodoPublisher'>}
- classmethod registerPublisher(publisherClass)
- class soilpulsecore.project_management.SourceFile(id, filename, size=None, source_url=None, checksum=None, checksum_type=None)
Bases:
object
- soilpulsecore.project_management.get_directory_size(path)
Calculates total occupied space of a directory and its contents
- soilpulsecore.project_management.get_formated_file_size(path)
Return a string of dynamically formatted file size.
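Both helpers can be approximated with the standard library. This is a sketch under stated assumptions rather than the package’s code: the names `directory_size` and `format_size` are hypothetical, `format_size` takes a byte count instead of a path, symbolic links are skipped, and the binary units with one decimal place are an arbitrary formatting choice.

```python
import os


def directory_size(path):
    """Sum the sizes of all regular files below `path`, in bytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for filename in files:
            file_path = os.path.join(root, filename)
            if not os.path.islink(file_path):  # skip symbolic links
                total += os.path.getsize(file_path)
    return total


def format_size(num_bytes):
    """Format a byte count using the largest unit that keeps the value below 1024."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(num_bytes)
    index = 0
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    return f"{size:.1f} {units[index]}"


print(format_size(directory_size(".")))
```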
- soilpulsecore.project_management.updateTranslationsDictionary(target_dict, input_dict)
General function to update one translations dictionary (concepts/methods/units) by another. Strings from the input dictionary are added to the target dictionary if not there already. Translations from the input dictionary are added to the target dictionary if not there already.
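The merge described above can be sketched as follows, assuming each translations dictionary maps a string to a list of translations. The function name `update_translations` and that list-based layout are assumptions for illustration.

```python
def update_translations(target, source):
    """Merge `source` into `target` without duplicating strings or translations."""
    for string, translations in source.items():
        # add the string with an empty list if the target does not know it yet
        existing = target.setdefault(string, [])
        for translation in translations:
            if translation not in existing:
                existing.append(translation)
    return target


target = {"pH": ["concept_a"]}
source = {"pH": ["concept_a", "concept_b"], "depth": ["concept_c"]}
print(update_translations(target, source))
```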