Get started with SoilPulse

step 1 - ingest data into soilpulse-core

This step shows how to provide your data (and any existing metadata) to create a soilpulsecore project, either by DOI or by URL.

[1]:
# first we import the relevant soilpulsecore functionality

from soilpulsecore.project_management import *
from soilpulsecore.resource_managers.filesystem import *
from soilpulsecore.resource_managers.mysql import *
from soilpulsecore.resource_managers.xml import *
from soilpulsecore.resource_managers.data_structures import *
from soilpulsecore.resource_managers.json import *
from soilpulsecore.data_publishers import *
from soilpulsecore.metadata_scheme import *
from soilpulsecore.db_access import EntityKeywordsDB, NullConnector
> Crawler type 'zero' registered.
Container type 'filesystem' registered
* Keywords database soilpulse\databases\keywords_filesystem registered as 'filesystem'
Container type 'file' registered
Container type 'directory' registered
Container type 'archive' registered
> Crawler type 'filesystem' registered.
> Crawler type 'csv' registered.
> Crawler type 'txt' registered.
Container type 'mysql' registered
* Keywords database soilpulse\databases\keywords_mysql registered as 'mysql'
Container type 'xml' registered
* Keywords database soilpulse\databases\keywords_xml registered as 'xml'
Container type 'table' registered
> Crawler type 'table' registered.
Container type 'column' registered
> Crawler type 'column' registered.
Container type 'json' registered
* Keywords database soilpulse\databases\keywords_json registered as 'json'
> Crawler type 'json' registered.
Publisher 'Zenodo' registered
[2]:
# then we define some example source records (DOI, DOI with URL, URL only, file upload) that can be used
example_doi = {"name": "Soil erosion data of TUBAF rainsimulators in Lenz, 2022",
               "doi": "10.5281/zenodo.6654150"}
example_doi_url = {"name": "Rainfall simulation data Ries et al. 2019",
                   "doi": "10.6094/unifr/151460",
                   "url": "https://freidok.uni-freiburg.de/files/151460/twflMtwtvn01bDCC/Extreme_rainfall_experiment_data_06122019.zip"}
example_url = {"name": "Soil erosion data in Punjab, India, Lenz et. al",
               "url": "https://www.mdpi.com/2076-3263/8/11/396/s1"}
example_file_upload = {"name": "CTU soil erosion data example"}
example_reload_soilpulse_project = {"": ""}
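
The records above differ only in which keys they carry ("doi", "url", or neither). As a minimal sketch of that distinction, the helper below reports which kind of source a record describes. `describe_source` and its return labels are hypothetical and are not part of soilpulsecore; the sketch only inspects the dictionary keys used in this notebook.

```python
# Hypothetical helper (not part of soilpulsecore): classify a source record
# by the keys it carries, mirroring the example dictionaries in this notebook.
def describe_source(record):
    has_doi = bool(record.get("doi"))
    has_url = bool(record.get("url"))
    if has_doi and has_url:
        return "doi+url"
    if has_doi:
        return "doi"
    if has_url:
        return "url"
    # Neither key present: nothing can be fetched remotely (e.g. a file upload).
    return "no remote source"


# Shortened copies of the example records, so the sketch runs on its own.
records = {
    "example_doi": {"name": "Lenz, 2022", "doi": "10.5281/zenodo.6654150"},
    "example_doi_url": {"name": "Ries et al. 2019", "doi": "10.6094/unifr/151460",
                        "url": "https://freidok.uni-freiburg.de/files/151460/twflMtwtvn01bDCC/Extreme_rainfall_experiment_data_06122019.zip"},
    "example_url": {"name": "Punjab", "url": "https://www.mdpi.com/2076-3263/8/11/396/s1"},
    "example_file_upload": {"name": "CTU soil erosion data example"},
}

for label, record in records.items():
    print(f"{label}: {describe_source(record)}")
```

A check like this could be run before creating a project, to know in advance whether DOI metadata can be expected for a given record.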

by DOI

[3]:

# then we establish a new soilpulse core project from the given information
dbcon = NullConnector()
user_id = 1
project_doi = ProjectManager(dbcon, user_id, **example_doi)
project_doi.downloadPublishedFiles()
failed to load concept vocabulary 'AGROVOC' from 'vocabularies\agrovoc.json'
failed to load concept vocabulary 'TestConceptVocabulary' from 'vocabularies\_concepts_vocabulary_1.json'
failed to load method vocabulary 'TestMethodsVocabulary' from 'vocabularies\_methods_vocabulary_1.json'
loaded methods vocabularies:
failed to load units vocabulary 'TestUnitsVocabulary' from 'vocabularies\_units_vocabulary_1.json'
loaded units vocabularies:
doi: '10.5281/zenodo.6654150'

Obtaining metadata from DOI registration agency ...
 ... successful

File 'DOI_metadata.json' successfuly saved.
File 'Publisher_metadata.json' successfuly saved.
downloading remote files to SoilPulse storage ...
        10-toolboxvignette.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        06-lookout.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        index.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
Extracting 'lenz2022.zip' to 'C:\Users\JL\SoilPulse\project_files\temp_1\lenz2022_zip'
        01-intro.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        02-state_know.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        03-database.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        03a-code.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        04-results.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        04a-results_PO.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        04b-results-statis.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        05-discussion.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        05a-reallookout.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        06-lookout.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        07-references.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        08-E3DIssues.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        09-database.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        10-toolboxvignette.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        11-varrain.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        comp_E3D_landlab.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        comp_infil.R - unsupported Crawler subclass special type 'R' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        implicite_GA.R - unsupported Crawler subclass special type 'R' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        GA_comparison.ipynb - unsupported Crawler subclass special type 'ipynb' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
Searching for delimiters in container log.txt failed.
        functions_for_DC.R - unsupported Crawler subclass special type 'R' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        functions_for_visualization.R - unsupported Crawler subclass special type 'R' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        hydraulic_func.R - unsupported Crawler subclass special type 'R' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
Content of container 'Diss_michael_anlage.csv' couldn't be analyzed due to encoding issues.
Searching for delimiters in container remarks_on_AnneRuns.txt failed.
        single_file.R - unsupported Crawler subclass special type 'R' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        Datensatz2.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        prep.R - unsupported Crawler subclass special type 'R' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        index.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        aMC_in_R.R - unsupported Crawler subclass special type 'R' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        aMC_in_R_ewid.R - unsupported Crawler subclass special type 'R' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        avisualization.R - unsupported Crawler subclass special type 'R' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        avisualization2.R - unsupported Crawler subclass special type 'R' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        preamble.tex - unsupported Crawler subclass special type 'tex' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        _output.yml - unsupported Crawler subclass special type 'yml' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        09-database.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        11-varrain.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        08-E3DIssues.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        _output.yml - unsupported Crawler subclass special type 'yml' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        07-references.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        preamble.tex - unsupported Crawler subclass special type 'tex' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
        03a-code.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
 ... successful

[3]:
['C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\10-toolboxvignette.Rmd',
 'C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\06-lookout.Rmd',
 'C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\index.Rmd',
 'C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\lenz2022.zip',
 'C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\09-database.Rmd',
 'C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\11-varrain.Rmd',
 'C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\08-E3DIssues.Rmd',
 'C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\_output.yml',
 'C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\07-references.Rmd',
 'C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\preamble.tex',
 'C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\03a-code.Rmd']
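
The log above reports, file by file, that types such as 'Rmd', 'yml' and 'tex' have no registered crawler and fall back to the 'zero' crawler. The sketch below summarizes this for a list of downloaded paths before reading the full log. It assumes, based only on the log messages, that the crawler type is taken from the file extension as written, and that archives such as .zip are handled separately (the log shows 'lenz2022.zip' being extracted). `unsupported_extensions` and both constants are hypothetical names, not soilpulsecore API.

```python
from collections import Counter
from pathlib import PureWindowsPath

# Crawler types listed as registered in the log output of this notebook.
REGISTERED_CRAWLER_TYPES = {"zero", "filesystem", "csv", "txt", "table", "column", "json"}
# Assumption: archives are extracted rather than crawled by extension.
ARCHIVE_EXTENSIONS = {"zip"}


def unsupported_extensions(paths):
    """Count file extensions that match no registered crawler type."""
    counts = Counter()
    for path in paths:
        # PureWindowsPath parses backslash-separated paths on any OS.
        ext = PureWindowsPath(path).suffix.lstrip(".")
        if not ext or ext in ARCHIVE_EXTENSIONS:
            continue
        if ext not in REGISTERED_CRAWLER_TYPES:
            counts[ext] += 1
    return counts


# A subset of the paths returned by downloadPublishedFiles() above.
downloaded = [
    r"C:\Users\JL\SoilPulse\project_files\temp_1\index.Rmd",
    r"C:\Users\JL\SoilPulse\project_files\temp_1\03a-code.Rmd",
    r"C:\Users\JL\SoilPulse\project_files\temp_1\lenz2022.zip",
    r"C:\Users\JL\SoilPulse\project_files\temp_1\_output.yml",
    r"C:\Users\JL\SoilPulse\project_files\temp_1\preamble.tex",
]

for ext, count in unsupported_extensions(downloaded).items():
    print(f"{ext}: {count} file(s) would fall back to the 'zero' crawler")
```

Files reported this way are still stored in the project, but according to the log no content analysis is run on them.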

by URL

[4]:
dbcon = NullConnector()
user_id = 1
project_url = ProjectManager(dbcon, user_id, **example_url)

project_url.downloadPublishedFiles()

project_url.showContainerTree()
project_url.updateDBrecord()

failed to load concept vocabulary 'AGROVOC' from 'vocabularies\agrovoc.json'
failed to load concept vocabulary 'TestConceptVocabulary' from 'vocabularies\_concepts_vocabulary_1.json'
failed to load method vocabulary 'TestMethodsVocabulary' from 'vocabularies\_methods_vocabulary_1.json'
loaded methods vocabularies:
failed to load units vocabulary 'TestUnitsVocabulary' from 'vocabularies\_units_vocabulary_1.json'
loaded units vocabularies:
doi: 'None'
Empty DOI provided. DOI metadata were not retrieved.
The list of published files is empty.


================================================================================
Soil erosion data in Punjab, India, Lenz et. al
container tree:
--------------------------------------------------------------------------------
================================================================================



Saving project "Soil erosion data in Punjab, India, Lenz et. al" with ID 2 ...
        (no containers to save)
        (no datasets to save)
        concepts vocabulary saved
        methods vocabulary saved
        units vocabulary saved
 ... successful.

step 2 - analyze file system structure
