Upload commands
The upload
subcommands allow you to import data to Arkindex.
Images stored on an S3-compatible bucket (MinIO)
The minio
subcommand generates IIIF image URLs for images stored on a given S3-compatible (AWS, MinIO, Ceph...) bucket. These URLs can then be imported into Arkindex using the IIIF import subcommand.
arkindex upload minio -b $BUCKET_NAME
If there are multiple folders on the target bucket, the subcommand will output one file per folder, containing the URLs for the images in this folder.
Authentication
Before running the minio
subcommand, you need to authenticate yourself with credentials for an account that has access to the bucket you're targeting. This authentication is done through environment variables.
export MINIO_ACCESS_KEY=$YOUR_ACCESS_KEY
export MINIO_SECRET_KEY=$YOUR_SECRET_KEY
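A small pre-flight check can catch a missing credential before the subcommand is run. The sketch below is hypothetical: only the two variable names come from this page, and the function name check_minio_credentials is made up for illustration. It tests whether the variables are set in the current shell, not whether they are exported.

```shell
# Hypothetical helper: verify that both credentials are set and non-empty.
check_minio_credentials() {
    missing=0
    for name in MINIO_ACCESS_KEY MINIO_SECRET_KEY; do
        # Indirect lookup of the variable named by $name (POSIX sh compatible)
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

It could be chained before the real command, for instance check_minio_credentials && arkindex upload minio -b $BUCKET_NAME.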
Required arguments
The only required argument for the minio
upload subcommand is the name of the targeted bucket, provided using -b
or --bucket-name
.
Optional arguments and default parameters
Other arguments are not required, but have default values that are used by the subcommand.
--iiif-server
: the URL of the IIIF server through which the images on the bucket are exposed. By default, the IIIF server used is https://europe-gamma.iiif.teklia.com/iiif/2/
. Example usage:
arkindex upload minio -b $BUCKET_NAME --iiif-server https://some.iiif-server.com/iiif/
--minio-url
: the URL of the server on which the target bucket is located. By default, the server URL is ceph.iiif.teklia.com
. Example usage:
arkindex upload minio -b $BUCKET_NAME --minio-url a-storage-server.domain.com
--out-dir
: the path to the output directory where the lists of IIIF URLs will be created. By default, a iiif_urls_output
folder is created in the current working directory.
arkindex upload minio -b $BUCKET_NAME --out-dir $PATH/TO/FOLDER/
--prefix
: the path to the location of the files, if you do not wish to list all the files inside a bucket. This path does not include the bucket name, as it is already provided by the --bucket-name
argument. In order to list the files located in BUCKET_NAME/folder/subfolder
, you must use the following command:
arkindex upload minio -b $BUCKET_NAME --prefix folder/subfolder
Shortcut command for importing images from multiple files
Before using this command, you might want to read the IIIF import subcommand documentation to be aware of all the available options.
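corpus_id=<CORPUS ID>
cd iiif_urls_output/
# Import each URL list into its own folder, named after the file
for file in *; do
    echo "$file"
    # Strip the .txt extension to get the folder name
    folder_name=$(basename "$file" .txt)
    arkindex -p demo upload iiif-images "$file" --corpus-id "$corpus_id" --import-folder-name "$folder_name"
done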
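Before launching the loop, it can help to check how many URLs each generated file holds. This is a minimal sketch, assuming one URL per line and a .txt extension on the generated files (which the basename call in the loop above implies); count_iiif_urls is a hypothetical helper name, not part of the CLI.

```shell
# Print "<file name without .txt>: <number of lines>" for each list in a directory.
count_iiif_urls() {
    dir=${1:-iiif_urls_output}
    for file in "$dir"/*.txt; do
        # Skip the unexpanded pattern when no file matches
        [ -f "$file" ] || continue
        # grep -c '' counts lines, including a last line without a newline
        printf '%s: %s\n' "$(basename "$file" .txt)" "$(grep -c '' "$file")"
    done
}
```

An unexpectedly empty or very small file is usually easier to investigate at this stage than after the import has started.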
Page XML documents
The pagexml
subcommand allows you to upload elements and transcriptions from Page XML documents to their corresponding Page
elements on Arkindex, as long as the imageFilename
attribute in the Page XML files matches the name of the Arkindex Page
elements.
arkindex upload pagexml --xml-path $PATH/TO/FOLDER --parent $ARKINDEX_ELEMENT_ID
This command takes two arguments:
--xml-path
: either the path to a folder containing the Page XML files, or the path to a file containing a list of the paths to the XML files, one file per line.
arkindex upload pagexml --xml-path $PATH/TO/paths_file.txt --parent $ARKINDEX_ELEMENT_ID
<paths_file.txt>
/PATH/TO/FOLDER/filename_1.xml
/PATH/TO/ANOTHER_FOLDER/filename_2.xml
/PATH/TO/ANOTHER_FOLDER/filename_3.xml
...
--parent
: the ID of a folder-type Arkindex element, which contains the target Arkindex Page
elements onto which the elements and transcriptions will be uploaded.
arkindex upload pagexml --xml-path $PATH/TO/FOLDER --parent $ARKINDEX_ELEMENT_ID
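The paths file accepted by --xml-path can be generated rather than written by hand. The following sketch assumes the Page XML files use the .xml extension and writes absolute paths, as in the sample file shown earlier; build_paths_file is a hypothetical helper name.

```shell
# Write the absolute path of every .xml file under a directory, one per line.
build_paths_file() {
    src_dir=$1
    out_file=$2
    # Resolve to an absolute path so the list works from any working directory
    abs_dir=$(cd "$src_dir" && pwd) || return 1
    find "$abs_dir" -type f -name '*.xml' | sort > "$out_file"
}
```

For example, build_paths_file ./exports paths_file.txt would produce a file that can then be passed to --xml-path (the ./exports folder is only an illustration).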
Alto XML documents
The alto
subcommand allows you to upload images, elements and transcriptions from Alto XML documents to Arkindex, as long as the images are available from a IIIF server and the image filenames on this server match the content of the fileName
node in the corresponding Alto XML files. If the images have already been imported into Arkindex, they will be retrieved and used to create new Page
elements.
arkindex upload alto --iiif-base-url http://some-server.domain.com/iiif/folder-name/ --parent-id $ARKINDEX_ELEMENT_ID --create-types
A limited subset of ALTO 1.4 documents is officially supported:
- The Alto XML documents must have their MeasurementUnit set to pixel.
- Shapes are not supported; only the HPOS, VPOS, WIDTH and HEIGHT attributes are used to build rectangles.
- String elements within other nodes are only imported as transcriptions for these nodes, not as elements; their HPOS, VPOS, WIDTH and HEIGHT attributes are ignored, and only the CONTENT attribute is used.
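The pixel requirement can be checked on a folder of files before running the import. This is a rough sketch based on plain-text matching: it assumes the MeasurementUnit element is written on a single line without a namespace prefix, and both function names are hypothetical.

```shell
# Succeeds when the file declares <MeasurementUnit>pixel</MeasurementUnit>.
uses_pixel_units() {
    grep -Eq '<MeasurementUnit>[[:space:]]*pixel[[:space:]]*</MeasurementUnit>' "$1"
}

# List the .xml files of a folder that do not declare pixel units.
list_non_pixel_files() {
    for file in "$1"/*.xml; do
        # Skip the unexpanded pattern when no file matches
        [ -f "$file" ] || continue
        uses_pixel_units "$file" || printf '%s\n' "$file"
    done
}
```

A proper XML parser would be more robust; the grep version is only meant as a quick screening step.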
Importing Alto XML files describing multiple pages/images is supported, as long as each Page
node has a PHYSICAL_IMG_NR
attribute that can be used to build a IIIF URL just like the fileName
node for single-page documents.
Path to files
The alto
command takes one optional positional argument: the path to the Alto XML files. If no path is specified, this defaults to the current working directory.
arkindex upload alto $PATH/TO/FOLDER/ --iiif-base-url http://some-server.domain.com/iiif/folder/ --parent-id $ARKINDEX_ELEMENT_ID --create-types
Required arguments
The alto
command takes three required arguments.
--iiif-base-url
: the base URL on a IIIF image server from which the image URLs are built ($IIIF_BASE_URL/{imageFilename}). It must include both the IIIF server address and the encoded path to the target images (with each / encoded as %2F).
arkindex upload alto --iiif-base-url https://some-server.domain.com/iiif/public%2Fsomedate%2Ffolder/ --parent-id $ARKINDEX_ELEMENT_ID --create-types
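The encoded form expected by --iiif-base-url can be derived from a plain path. The sketch below only replaces slashes with %2F, which is the single encoding rule stated here; other special characters are left untouched, and build_iiif_base_url is a hypothetical helper name.

```shell
# Join a IIIF server address and an image path, encoding "/" in the path as %2F.
build_iiif_base_url() {
    server=${1%/}                 # drop one trailing slash, if any
    image_path=$2
    image_path=${image_path#/}    # drop one leading slash, if any
    image_path=${image_path%/}    # drop one trailing slash, if any
    encoded=$(printf '%s' "$image_path" | sed 's|/|%2F|g')
    printf '%s/%s/\n' "$server" "$encoded"
}

build_iiif_base_url https://some-server.domain.com/iiif public/somedate/folder
# prints https://some-server.domain.com/iiif/public%2Fsomedate%2Ffolder/
```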
--parent-id
: the ID of an existing folder-type element on Arkindex, into which the pages, elements and transcriptions will be imported.
arkindex upload alto --parent-id $ARKINDEX_ELEMENT_ID --iiif-base-url http://some-server.domain.com/iiif/folder/ --create-types
There are two ways the Alto XML import can deal with the elements found in the Alto XML files. You can choose an option using one of these two (required, mutually exclusive) arguments:
--create-types
: the import will create element types in the target Arkindex corpus for each element type found in the XML files (unless a type with that slug already exists, in which case it will use the existing type). For example, if the Alto XML files contain Page, TextBlock, Paragraph and TextLine nodes, and none of these already exist within the target corpus, then the page, textblock, paragraph and textline element types will be created and used for the import. This is the recommended approach, as it ensures that all the information from the Alto XML files will be imported into Arkindex.
arkindex upload alto $PATH/TO/FOLDER/ --create-types --iiif-base-url http://some-server.domain.com/iiif/folder/ --parent-id $ARKINDEX_ELEMENT_ID
--existing-types
: specify a correspondence between the Alto XML nodes and existing element types in the target Arkindex corpus. Any nodes for which no corresponding element type has been set will be ignored and not imported. The type mapping must use the following format: "alto_type:arkindex_type alto_type_2:arkindex_type_2"
(within double quotation marks, both Alto XML and Arkindex element types in lowercase).
arkindex upload alto --existing-types="alto_type:arkindex_type alto_type_2:arkindex_type_2" --iiif-base-url http://some-server.domain.com/iiif/folder/ --parent-id $ARKINDEX_ELEMENT_ID
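Since nodes without a matching type are silently ignored, it may be worth checking the mapping string before passing it to --existing-types. This sketch only checks the shape described here (space-separated alto_type:arkindex_type pairs, no uppercase letters); it does not know which types actually exist in the corpus, and valid_type_mapping is a hypothetical helper name.

```shell
# Succeeds when every space-separated item looks like alto_type:arkindex_type.
# The body runs in a subshell so that "set -f" does not leak to the caller.
valid_type_mapping() (
    set -f                       # disable globbing while the string is split
    count=0
    for pair in $1; do           # unquoted on purpose: split on whitespace
        case $pair in
            *[[:upper:]]*) exit 1 ;;   # uppercase letters are not expected
            *:*:*)         exit 1 ;;   # more than one colon
            ?*:?*)         ;;          # one colon with text on both sides
            *)             exit 1 ;;   # missing colon or empty side
        esac
        count=$((count + 1))
    done
    [ "$count" -gt 0 ]           # an empty mapping is rejected
)
```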
IIIF images
The iiif-images
subcommand allows you to create elements on Arkindex from a text file containing a list of IIIF image URIs (such as those generated by the MinIO upload subcommand).
arkindex upload iiif-images $PATH/TO/uris_list_file.txt --corpus-id $ARKINDEX_CORPUS_ID
Required arguments
The minimal required arguments for running the iiif-images
command are:
- the path to the file containing the list of IIIF image URIs, which is a positional argument.
- the ID of either an Arkindex corpus or an Arkindex folder-type element into which the images will be imported, specified using one of two (mutually exclusive) arguments:
--corpus-id
--parent-folder
arkindex upload iiif-images $PATH/TO/uris_list_file.txt --parent-folder $ARKINDEX_ELEMENT_ID
Optional arguments
Whether you import your images into a corpus or into a folder-type element, the IIIF import command will create a folder for your imported images. You can specify its name and type using the following arguments:
--import-folder-name
: defaults to "IIIF import".
--import-folder-type
: an existing element type in the target corpus; defaults to folder.
arkindex upload iiif-images $PATH/TO/uris_list_file.txt --import-folder-name $FOLDER_NAME --import-folder-type $FOLDER_TYPE --corpus-id $ARKINDEX_CORPUS_ID
--element-type
: you can use this argument to specify the type (an existing element type in the target corpus) of the elements that will be created from your IIIF images; defaults to page.
arkindex upload iiif-images $PATH/TO/uris_list_file.txt --element-type $ELEMENT_TYPE --corpus-id $ARKINDEX_CORPUS_ID
--image-name-delimiter
: define the delimiter after which the last part of the image URI is taken as the element name on Arkindex; defaults to /. For example, if your image's URI is http://some-server.domain.com/iiif/folder/date%category%filename.jpg and you do not specify a delimiter, the import will use / and the element's name will be date%category%filename.jpg; if you set the delimiter to %, the created element's name will be filename.jpg.
arkindex upload iiif-images $PATH/TO/uris_list_file.txt --image-name-delimiter % --corpus-id $ARKINDEX_CORPUS_ID
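The effect of --image-name-delimiter can be previewed with shell parameter expansion. This sketch reproduces the documented example by keeping everything after the last occurrence of the delimiter; it is an illustration of the described behavior, not the CLI's actual implementation, and element_name is a hypothetical helper name.

```shell
# Print the part of a URI that follows the last occurrence of the delimiter.
element_name() {
    uri=$1
    delimiter=${2:-/}                # same default as the option
    name=${uri##*"$delimiter"}       # strip the longest prefix ending with the delimiter
    printf '%s\n' "$name"
}

uri='http://some-server.domain.com/iiif/folder/date%category%filename.jpg'
element_name "$uri"        # prints date%category%filename.jpg
element_name "$uri" '%'    # prints filename.jpg
```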
Elements hierarchy
If you want to import your images to Arkindex with a given hierarchy, not importing all the images in one folder, you can use the following arguments:
- One of the following mutually exclusive arguments:
--keep-hierarchy
: recreate on Arkindex the hierarchy contained in the IIIF image URIs. For example, if your URIs look like http://some-server.domain.com/iiif/FOLDER1/SUBFOLDER1/SUBFOLDER2/filename.jpg, then the import command will create, inside the import folder, a FOLDER1 element, inside it a SUBFOLDER1 element, inside it a SUBFOLDER2 element, and inside it your image.
--group-prefix-delimiter
: create sub-folders grouping IIIF images by name prefix, splitting file names between group prefix and image names according to the group prefix delimiter. For example, if you have images with URIs that look like http://some-server.domain.com/iiif/folder/subfolder/date%location1%filename1.jpg and others with URIs like http://some-server.domain.com/iiif/folder/subfolder/date%location2%filename2.jpg, you can put filename1.jpg into a date%location1 sub-folder and filename2.jpg into a date%location2 sub-folder with the following command:
arkindex upload iiif-images $PATH/TO/uris_list_file.txt --group-prefix-delimiter % --corpus-id $ARKINDEX_CORPUS_ID
--group-folder-type
: define the type of the sub-folders that will be created to contain your grouped images; defaults to the type set by --import-folder-type.
arkindex upload iiif-images $PATH/TO/uris_list_file.txt --keep-hierarchy --group-folder-type $ELEMENT_TYPE --corpus-id $ARKINDEX_CORPUS_ID
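To preview how --group-prefix-delimiter would sort a list of file names, the split can be imitated in the shell. The sketch assumes the split happens at the last occurrence of the delimiter, which matches the date%location1 example; how the CLI treats names without the delimiter is not stated here, so the function simply fails in that case. split_group_prefix is a hypothetical helper name.

```shell
# Print "<group prefix><TAB><image name>" for a file name and a delimiter.
split_group_prefix() {
    name=$1
    delimiter=$2
    case $name in
        *"$delimiter"*)
            group=${name%"$delimiter"*}     # everything before the last delimiter
            image=${name##*"$delimiter"}    # everything after the last delimiter
            printf '%s\t%s\n' "$group" "$image" ;;
        *)
            return 1 ;;                     # no delimiter: nothing to split
    esac
}
```

Running it on date%location1%filename1.jpg with % as the delimiter yields date%location1 as the group and filename1.jpg as the image name.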
Usage examples
Grouping images by prefix
You want to import the images whose URIs are listed in the following my_iiif_images.txt
file:
http://some-server.domain.com/iiif/folder/subfolder/numgrp10-france%aveyron%page1.jpg
http://some-server.domain.com/iiif/folder/subfolder/numgrp10-france%aveyron%page2.jpg
http://some-server.domain.com/iiif/folder/subfolder/numgrp10-france%aveyron%page3.jpg
http://some-server.domain.com/iiif/folder/subfolder/numgrp10-france%tarn%page1.jpg
http://some-server.domain.com/iiif/folder/subfolder/numgrp10-france%tarn%page2.jpg
You do not care about numgrp10
, which corresponds to a digitization campaign, or about the folder hierarchy before it. You want to import these images inside a folder
element called occitanie
, within a corpus with the ID aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
, and you want the images to be grouped inside subfolders of type departement
(you have created this element type, ticking the "folder" checkbox, in your corpus from the project administration page) based on the france%aveyron
-like prefixes.
You should then run the following command:
arkindex upload iiif-images ./my_iiif_images.txt --corpus-id aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa --import-folder-name occitanie --image-name-delimiter - --group-prefix-delimiter % --group-folder-type departement
This command results in the creation, in the targeted corpus, of an occitanie
folder
element, within which are two france%aveyron
and france%tarn
departement
elements, containing their respective page
elements.
Recreating a folder hierarchy
You want to import the images whose URIs are listed in the following my_iiif_images.txt
file:
http://some-server.domain.com/iiif/cork/cork/file1.jpg
http://some-server.domain.com/iiif/cork/cork/file2.jpg
http://some-server.domain.com/iiif/cork/cork/file3.jpg
http://some-server.domain.com/iiif/cork/mallow/file1.jpg
http://some-server.domain.com/iiif/cork/mallow/file2.jpg
http://some-server.domain.com/iiif/limerick/kilmallock/file1.jpg
http://some-server.domain.com/iiif/limerick/kilmallock/file2.jpg
You want to import these images inside a folder
element called Ireland
, within a corpus with the ID aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
, and you want to reproduce on Arkindex the hierarchy from the URIs. You do not want to import your images as page
elements, but as double_page
elements (you have created this element type in your corpus from the project administration page).
You should then run the following command:
arkindex upload iiif-images ./my_iiif_images.txt --corpus-id aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa --import-folder-name Ireland --element-type double_page --keep-hierarchy
This command results in the creation, in the targeted corpus, of an Ireland
folder
element, within which are two cork
and limerick
folder
elements. The cork
folder contains two more sub-folders, called cork
and mallow
; the limerick
folder contains one kilmallock
folder. Inside those sub-folders, your images have been imported as double_page
elements.