Elements commands¶
The elements subcommands allow you to perform operations on existing Arkindex elements.
Linking elements¶
The link subcommand can be used to create relations between parent and child elements, in order to organize your data on Arkindex.
arkindex elements link --parent $PARENT_ELEMENT_ID --child $CHILD_ELEMENT_ID
You can only link elements that belong to the same corpus.
The link subcommand only creates new relations between elements; it does not destroy or replace existing ones.
Parent element¶
You can use two (required, mutually exclusive) arguments when specifying the parent element.
--parent: the ID of an existing element on Arkindex.
arkindex elements link --parent $PARENT_ELEMENT_ID --child $CHILD_ELEMENT_ID
--create: instead of providing the ID of an Arkindex element, create that element and link the child element(s) to the newly created element. You will be prompted to enter the ID of the corpus in which to create the parent element (the same corpus the child element(s) belong to, for the link to work), an element type (one that exists in the target corpus) and a name for that new element.
arkindex elements link --create --child $CHILD_ELEMENT_ID
Child element(s)¶
You can use four (required, mutually exclusive) arguments as child inputs.
--child: one or more Arkindex element ID(s).
arkindex elements link --parent $PARENT_ELEMENT_ID --child $ID_1 $ID_2 $ID_3
--uuid-list: a path to a text file containing a list of element IDs, with one ID per line.
arkindex elements link --parent $PARENT_ELEMENT_ID --uuid-list $PATH/TO/file.txt
--selection: link the elements in your selection on Arkindex to the parent element.
arkindex elements link --parent $PARENT_ELEMENT_ID --selection
--stray-pages: link all the page elements that have no parent element, and are situated directly at the root of a corpus, to the parent element.
arkindex elements link --parent $PARENT_ELEMENT_ID --stray-pages
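When the child IDs come from a script or an export, it can help to check the file passed to --uuid-list before calling the CLI. The sketch below is only an illustration: the check_uuid_list helper, the children.txt file name and the two sample IDs are hypothetical, and it assumes that element IDs are standard 36-character UUIDs, which this page does not state explicitly.

```shell
# Hypothetical helper (not part of the arkindex CLI): succeed only if every
# line of the given file looks like a UUID. Assumes IDs are standard UUIDs.
check_uuid_list() {
    uuid_re='^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
    # -v selects lines that do NOT match, -q only sets the exit status.
    if grep -Eviq "$uuid_re" "$1"; then
        echo "invalid line(s) in $1" >&2
        return 1
    fi
    return 0
}

# Write one ID per line (placeholder values), then validate the file.
printf '%s\n' \
    '11111111-2222-3333-4444-555555555555' \
    '66666666-7777-8888-9999-000000000000' > children.txt
check_uuid_list children.txt && echo "children.txt looks valid"

# The validated file could then be passed to the link subcommand:
# arkindex elements link --parent "$PARENT_ELEMENT_ID" --uuid-list children.txt
```

The same file format is accepted wherever --uuid-list appears on this page, so a check like this could be reused before unlink, page-copy or reject-classifications.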
Unlinking elements¶
The unlink subcommand can be used to destroy relationships between parent and child elements, in order to organize your data on Arkindex.
arkindex elements unlink --parent $PARENT_ELEMENT_ID --child $CHILD_ELEMENT_ID
You can only unlink elements that belong to the same corpus. (Only elements belonging to the same corpus can be linked in the first place.)
Parent element¶
You have to specify the ID of the element from which you want to unlink the child element(s), using the --parent argument.
arkindex elements unlink --parent $PARENT_ELEMENT_ID --child $CHILD_ELEMENT_ID
Child element(s)¶
You can use three (required, mutually exclusive) arguments as child inputs.
--child: one or more Arkindex element ID(s).
arkindex elements unlink --parent $PARENT_ELEMENT_ID --child $ID_1 $ID_2 $ID_3
--uuid-list: a path to a text file containing a list of element IDs, with one ID per line.
arkindex elements unlink --parent $PARENT_ELEMENT_ID --uuid-list $PATH/TO/file.txt
--selection: unlink the elements in your selection on Arkindex from the parent element.
arkindex elements unlink --parent $PARENT_ELEMENT_ID --selection
Optional arguments¶
The --orphan flag allows you to unlink an element from the parent even if it does not have any other parent elements, which results in the element ending up directly at the root of the corpus.
arkindex elements unlink --parent $PARENT_ELEMENT_ID --child $CHILD_ELEMENT_ID --orphan
Without the --orphan flag, you cannot unlink an element from its parent if it has no other parent element.
Copying page elements¶
The page-copy subcommand can be used to copy Page elements to a folder, within or outside of their corpus of origin. It only copies Page elements, without any of their children. Technically, it creates a new Page element from the same Image, with the same name and type, inside another folder.
arkindex elements page-copy --folder $PARENT_ELEMENT_ID --pages $PAGE_ELEMENT_1 $PAGE_ELEMENT_2 $PAGE_ELEMENT_3
You can only copy pages to a folder-type element, using the --folder argument. It is not possible to create page elements at the root of a corpus using this subcommand.
The copied pages can be specified with three (required, mutually exclusive) arguments:
--pages: one or more page IDs as input.
arkindex elements page-copy --folder $PARENT_ELEMENT_ID --pages $PAGE_ELEMENT_ID
--selection: the pages to be copied are retrieved from your current selection on Arkindex.
arkindex elements page-copy --folder $PARENT_ELEMENT_ID --selection
--uuid-list: path to a text file containing a list of page IDs, one ID per line.
arkindex elements page-copy --folder $PARENT_ELEMENT_ID --uuid-list $PATH/TO/file.txt
Rejecting classifications¶
The reject-classifications subcommand can be used to reject (if the classification was created by a worker) or delete (if the classification was created manually) one or more classification(s) from one or more element(s).
arkindex elements reject-classifications --element $ELEMENT_ID_1 $ELEMENT_ID_2 --classes $ML_CLASS_NAME
Target elements¶
The elements to reject/remove classifications from can be retrieved using three (required, mutually exclusive) arguments:
--element: one or more element IDs.
arkindex elements reject-classifications --element $ELEMENT_ID --classes $ML_CLASS_NAME
--selection: the target elements are retrieved from your selection on Arkindex.
arkindex elements reject-classifications --selection --classes $ML_CLASS_NAME
--uuid-list: the target elements are retrieved from a text file containing a list of element IDs, with one ID per line.
arkindex elements reject-classifications --uuid-list $PATH/TO/file.txt --classes $ML_CLASS_NAME
Target classifications¶
The classes to reject/remove can be specified using two (required, mutually exclusive) arguments:
--all: all the classifications on the target elements will be rejected/deleted.
arkindex elements reject-classifications --element $ELEMENT_ID --all
--classes: specify which classes will be rejected/removed, using their names.
arkindex elements reject-classifications --element $ELEMENT_ID --classes $ML_CLASS_1 $ML_CLASS_2
Creating data splits for machine learning¶
The ml-splits subcommand can be used to organise data in an Arkindex project into splits to train Machine Learning models.
Basic usage¶
arkindex elements ml-splits --project $ARKINDEX_PROJECT_ID
This command will create, in the target project, a “Training dataset” dataset containing three sets: “train”, “dev” and “test”. These sets will contain respectively 80%, 10% and 10% of all the page elements in the target project.
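To get a feel for the default ratios, the arithmetic can be worked through by hand. The sketch below is illustrative only: split_counts is a hypothetical helper, and the rounding rule (integer division, with the last set taking the remainder) is an assumption, since this page does not describe how the command distributes leftover elements.

```shell
# Hypothetical helper: print the size of each set for a given number of
# elements, using the default 80/10/10 ratios. The rounding rule is an
# assumption and may differ from what the CLI actually does.
split_counts() {
    total=$1
    train=$(( total * 80 / 100 ))
    dev=$(( total * 10 / 100 ))
    test_count=$(( total - train - dev ))   # remainder goes to the last set
    echo "train=$train dev=$dev test=$test_count"
}

split_counts 250   # prints "train=200 dev=25 test=25"
```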
Required arguments¶
For the command to run, either the --project or the --folder argument has to be set. If using a project UUID, the “Training dataset” dataset will contain elements from this project, regardless of any parent folders. If using a parent element (folder) UUID, the “Training dataset” dataset will only contain elements that are children of this parent.
Optional arguments¶
The ml-splits command can take a number of optional arguments.
--dataset-name: set the name of the training dataset folder which will be created at the root of the target project and contain the Train, Validation and Test folders. Defaults to Training dataset.
arkindex elements ml-splits --folder $ARKINDEX_ELEMENT_ID --dataset-name $DATASET_NAME
Info
As dataset names must be unique within a project, setting the --dataset-name parameter is recommended if you want to avoid the command failing because of a pre-existing dataset named “Training dataset”.
--element-type: only use elements of one or multiple given type(s) to create your training dataset. Defaults to page.
arkindex elements ml-splits --project $ARKINDEX_PROJECT_ID --element-type $ELEMENT_TYPE_1 $ELEMENT_TYPE_2
--recursive: when using elements from a parent element (with the --folder argument), adding the --recursive option lists those elements recursively, instead of only retrieving those that are direct children of the parent element.
arkindex elements ml-splits --folder $ARKINDEX_ELEMENT_ID --recursive --element-type $ELEMENT_TYPE
--set: defines the sets and the ratio of the data assigned to each of them, as name:ratio pairs. Each ratio has to be greater than 0 and less than 1, and the ratios must sum to 1. Defaults to train:0.8 dev:0.1 test:0.1.
arkindex elements ml-splits --folder $ARKINDEX_ELEMENT_ID --set train:0.7 dev:0.1 test:0.2
--nb-elements: limits the number of elements used to create the training dataset. If not set, the training dataset will contain all the retrieved elements of the given type and from the given parent/project.
arkindex elements ml-splits --folder $ARKINDEX_ELEMENT_ID --recursive --nb-elements 200
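The constraints on --set (each ratio strictly between 0 and 1, ratios summing to 1) can be checked before launching the command. This is a minimal sketch, not the CLI's own validation: check_sets is a hypothetical helper, and the small tolerance on the sum is an assumption made to absorb floating-point rounding.

```shell
# Hypothetical helper: succeed only if every name:ratio pair has a ratio
# strictly between 0 and 1 and the ratios add up to 1.
check_sets() {
    printf '%s\n' "$@" | awk -F: '
        # Flag malformed pairs and out-of-range ratios.
        NF != 2 || $2 !~ /^[0-9]*\.?[0-9]+$/ || $2 + 0 <= 0 || $2 + 0 >= 1 { bad = 1 }
        { sum += $2 }
        END {
            diff = sum - 1
            if (diff < 0) diff = -diff
            # Tolerance (assumed) for floating-point rounding of the sum.
            if (bad || diff > 1e-9) exit 1
        }'
}

check_sets train:0.7 dev:0.1 test:0.2 && echo "ratios are valid"
check_sets train:0.9 dev:0.2 || echo "ratios are invalid"
```

Running such a check first avoids starting a long listing of elements only to have the command reject the ratios.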