Export command

The export subcommands allow you to convert an sqlite export of an Arkindex project to other formats.

There are only two ways to export data from Arkindex:

  • using the sqlite export, which is the most efficient and recommended approach;
  • making API calls to retrieve the data you want.

Arkindex does not provide exports in any other format, which is why the export subcommands exist: they convert sqlite exports into other formats.

PDF export

The pdf subcommand creates PDF files from the sqlite export of an Arkindex project.

Basic usage

arkindex export $PATH/TO/database.sqlite pdf --output $PATH/TO/FOLDER

This exports the entire project as PDF files, one for each folder element found in the SQLite database and named after that folder. Each PDF has one page per page element. The transcription of every text_line element found recursively within a page is added to that page, so that the text becomes searchable.

The only required argument is the path to the sqlite export, which is a positional argument. ⚠️ This path comes before the pdf subcommand and its options. You can specify the path to the output directory using the --output argument; if unspecified, it defaults to the current working directory.
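When scripting exports, the argument order described above is easy to get wrong. The following Python sketch assembles the argument list in that order: database path first, then the subcommand, then its options. The helper name build_export_command is hypothetical and not part of the Arkindex CLI; it only builds the list and does not run anything.

```python
from pathlib import Path


def build_export_command(database, subcommand, output=None, extra_args=()):
    """Return the argument list for an `arkindex export` call.

    Hypothetical helper: it only encodes the documented ordering, i.e.
    the database path, then the subcommand, then the subcommand options.
    """
    command = ["arkindex", "export", str(Path(database)), subcommand]
    if output is not None:
        command += ["--output", str(Path(output))]
    command += list(extra_args)
    return command


if __name__ == "__main__":
    # Print the command rather than running it.
    print(" ".join(build_export_command("database.sqlite", "pdf", output="pdfs")))
```

To actually run the export, pass the returned list to subprocess.run(command, check=True).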

PDF options

You can change the element types used to build the PDF export using the following arguments:

  • --folder-type: specify the type of the elements, containing pages, for which PDF files will be created; defaults to folder.
  • --page-type: specify the type of the elements from which PDF pages will be created; defaults to page.
  • --line-type: specify the type of the elements containing transcriptions; defaults to text_line.
arkindex export $PATH/TO/database.sqlite pdf --folder-type volume --page-type folio

You can restrict the PDF creation to only part of your export/project using the --folder-ids argument; the command will only create PDF files from the folder elements whose IDs were given using this argument, ignoring the others.

arkindex export $PATH/TO/database.sqlite pdf --folder-ids $FOLDER_ID_1 $FOLDER_ID_2 $FOLDER_ID_3
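If you do not know the folder IDs, one option is to look them up in the sqlite export itself. The sketch below assumes that the export stores elements in a table named element with id, name and type columns; this schema is an assumption that this page does not state, so check it against your own database before relying on it. The function name list_folder_ids is hypothetical.

```python
import sqlite3


def list_folder_ids(connection, folder_type="folder"):
    """Return (id, name) pairs for every element of the given type.

    Assumption: the export has an `element` table with `id`, `name`
    and `type` columns.
    """
    rows = connection.execute(
        "SELECT id, name FROM element WHERE type = ? ORDER BY name",
        (folder_type,),
    )
    return rows.fetchall()
```

Call it with a connection opened on your export, for example sqlite3.connect("database.sqlite"), and pass the returned IDs to --folder-ids.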

The --debug flag makes both the transcription text and the bounding boxes visible on the PDF pages, which can be useful for testing the export itself or for troubleshooting transcriptions.

Example PDF

The following command

arkindex export './demo-book-of-hours-20220524-104657.sqlite' pdf --folder-ids 6661cc31-c437-4a35-8fd5-e34a0d3a638e

generated this PDF from this volume in the sqlite export of the Demo | Book of Hours project on demo.arkindex.org (only the first 10 pages were preserved in the PDF).

ALTO XML export

The alto subcommand creates Alto XML files from the sqlite export of an Arkindex project.

Basic usage

arkindex export $PATH/TO/database.sqlite alto --output $PATH/TO/FOLDER

This command exports the entire project into Alto XML files. One directory in the specified output directory is created for each folder, named after the folder's UUID. One file is created for each page in each folder, and is named after the page's UUID. The files include <TextLine> nodes for each transcription found in a text_line element, and use <Processing> nodes to store the worker versions associated with the elements and transcriptions.
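To check the result of an export, you can walk the output directory and count the <TextLine> nodes in each file. The Python sketch below follows the layout described above, with one sub-directory per folder and one file per page. The .xml file extension is an assumption, and the function name count_text_lines is hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def count_text_lines(output_dir):
    """Map each exported page file to its number of <TextLine> nodes.

    Assumption: page files use a `.xml` extension and sit one level
    below the output directory, inside their folder's directory.
    """
    counts = {}
    for xml_path in sorted(Path(output_dir).glob("*/*.xml")):
        # Skip the optional METS file, which is not an Alto XML page.
        if xml_path.name == "mets_entrypoint.xml":
            continue
        root = ET.parse(xml_path).getroot()
        # Compare local names so that the XML namespace does not matter.
        counts[xml_path] = sum(
            1 for node in root.iter() if node.tag.rsplit("}", 1)[-1] == "TextLine"
        )
    return counts
```

A page whose count is zero either has no text_line elements or none with a transcription.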

The only required argument is the path to the sqlite export, which is a positional argument. ⚠️ This path comes before the alto subcommand and its options. You can specify the path to the output directory using the --output argument; if unspecified, it defaults to the current working directory.

Optional METS file

You can generate a METS file alongside your Alto XML files, linking the generated files (and their paths) to their corresponding images (with their IIIF URLs) using the --mets flag. One METS file is generated per exported folder element, and it is saved as mets_entrypoint.xml in the corresponding folder, with the Alto XML files.

arkindex export $PATH/TO/database.sqlite alto --output $PATH/TO/FOLDER --mets

Alto XML options

You can change the element types used to build the Alto XML files using the following arguments:

  • --folder-type: specify the type of the elements, containing pages, for which folders will be created in the output directory; defaults to folder.
  • --page-type: specify the type of the elements from which Alto XML files will be created; defaults to page.
  • --line-type: specify the type of the elements containing transcriptions, from which <TextLine> nodes will be created; defaults to text_line.
  • --layout-tag: an optional argument, which allows you to specify one element type (without transcriptions) to export along with the line-type elements. These elements are exported as GraphicalElement nodes in the Alto XML files.
arkindex export $PATH/TO/database.sqlite alto --line-type print_line --layout-tag barcode

You can restrict the Alto XML conversion to only part of your export/project using the --folder-ids argument; the command will only create Alto XML files from the page elements contained in the folder elements whose IDs were given using this argument, ignoring the others.

arkindex export $PATH/TO/database.sqlite alto --folder-ids $FOLDER_ID_1 $FOLDER_ID_2 $FOLDER_ID_3

CSV export

The csv subcommand creates a CSV file from the sqlite export of an Arkindex project.

Basic usage

arkindex export $PATH/TO/database.sqlite csv --output $PATH/TO/FOLDER/elements.csv

This command creates a CSV file with one line for each element contained in the exported project.

The only required argument is the path to the sqlite export, which is a positional argument. ⚠️ This path comes before the csv subcommand and its options. You can specify the path to the output file using the --output argument; if unspecified, it defaults to an elements.csv file created in the current working directory.

CSV options

The csv export subcommand can take the following arguments:

  • --parent: only export to CSV the elements which are the children of the specified element. If you want to get these elements recursively, and not only the direct children of that parent, you need to use it in conjunction with the --recursive flag.
arkindex export $PATH/TO/database.sqlite csv --output $PATH/TO/FOLDER/elements.csv --parent $ELEMENT_ID
  • --recursive: this flag can only be used in conjunction with the --parent argument. If no parent element ID is specified, the export is recursive by default.
arkindex export $PATH/TO/database.sqlite csv --output $PATH/TO/FOLDER/elements.csv --parent $ELEMENT_ID --recursive
  • --type: restrict the export to elements of a given type.
arkindex export $PATH/TO/database.sqlite csv --output $PATH/TO/FOLDER/elements.csv --type $ELEMENT_TYPE
  • --with_classes: add the exported elements' classifications to the output CSV. In the CSV file, there will be one column per class, filled with the classification confidence on the corresponding line if the class is set on the element, and left empty if it is not.
arkindex export $PATH/TO/database.sqlite csv --output $PATH/TO/FOLDER/elements.csv --with_classes
  • --with_metadata: add the exported elements' metadata to the output CSV. As the same metadata can be set multiple times with different values on one element, there can be more than one column for each metadata in the CSV file; in that case, the metadata columns are named like this: {metadata_name}_1, {metadata_name}_2… On each line the metadata column is filled with the corresponding metadata value if that metadata is set on the element, or left blank.
arkindex export $PATH/TO/database.sqlite csv --output $PATH/TO/FOLDER/elements.csv --with_metadata

CSV output

The output CSV contains the following columns/information:

  • the id, name and type of the element (columns id, name and type);
  • the id and url of the element's image (columns image_id and image_url);
  • the element's polygon (polygon);
  • the id of the worker version that created this element, if it was created by a worker; if the element was not created by a worker this column is left blank (worker_version_id);
  • the element's creation date (created).
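The columns listed above can be read with Python's standard csv module. The sketch below loads the file and keeps the elements that were not created by a worker, that is, the rows whose worker_version_id column is blank. The function names are hypothetical, and UTF-8 encoding is an assumption.

```python
import csv


def load_elements(csv_path):
    """Read the exported CSV into a list of dicts keyed by column name."""
    # Assumption: the file is UTF-8 encoded.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def manual_elements(rows, element_type=None):
    """Keep rows with a blank worker_version_id, optionally of one type."""
    return [
        row
        for row in rows
        if not row["worker_version_id"]
        and (element_type is None or row["type"] == element_type)
    ]
```

For example, manual_elements(load_elements("elements.csv"), "page") returns the page elements that no worker created.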

If using the --with_classes flag, one column per classification that is present on at least one of the exported elements is added. These columns contain the classification confidence for these classes on each element (left blank if the class is not present on the element).
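As an illustration of how the class columns can be used, the Python sketch below picks the class with the highest confidence for one CSV row. It assumes the file was exported with --with_classes only (so every extra column is a class) and that confidences parse as floating-point numbers; the function name best_class is hypothetical.

```python
# Columns that are always present in the CSV export.
BASE_COLUMNS = {
    "id", "name", "type", "image_id", "image_url",
    "polygon", "worker_version_id", "created",
}


def best_class(row):
    """Return (class_name, confidence) for the top class, or None.

    Assumption: every column outside BASE_COLUMNS is a class column
    holding a confidence, left blank when the class is not set.
    """
    scores = {
        column: float(value)
        for column, value in row.items()
        if column not in BASE_COLUMNS and value not in ("", None)
    }
    if not scores:
        return None
    name = max(scores, key=scores.get)
    return name, scores[name]
```

The row argument is a dict keyed by column name, such as the ones produced by csv.DictReader.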

If using the --with_metadata flag, at least one column per metadata that is present on at least one of the exported elements is added. If some metadata are set multiple times with different values on some elements, then there are multiple columns for these metadata, named like this: {metadata_name}_1, {metadata_name}_2… These columns contain the metadata value(s) for these metadata on each element (left blank if the metadata is not present on the element).
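When a metadata is spread over several numbered columns, it is often more convenient to gather its values back into one list per element. The Python sketch below does this for a single CSV row. The function name collect_metadata is hypothetical, and the caller lists the expected metadata names explicitly, because a column name alone cannot tell a numbered column apart from a metadata whose own name ends in a number.

```python
import re
from collections import defaultdict

# Matches column names of the form {metadata_name}_N.
NUMBERED = re.compile(r"^(?P<name>.+)_(?P<index>\d+)$")


def collect_metadata(row, metadata_names):
    """Group the non-blank metadata values of one row by metadata name.

    A column is used when it equals one of `metadata_names` or follows
    the `{name}_N` pattern for one of them; blank cells are skipped.
    """
    values = defaultdict(list)
    for column, value in row.items():
        if not value:
            continue
        match = NUMBERED.match(column)
        if column in metadata_names:
            name = column
        elif match and match["name"] in metadata_names:
            name = match["name"]
        else:
            continue
        values[name].append(value)
    return dict(values)
```

Values are returned in the order in which their columns appear in the row.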