Bootstrapping an Impira collection

The Impira CLI allows you to set up an Impira collection based on results from another service (using the infer-fields command) or from Impira itself (using the snapshot command). Each of these commands outputs a directory containing a file named manifest.json, which holds the information about files, fields, and labels needed to reconstruct the collection.
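Since bootstrap depends on that manifest.json, it can be worth confirming the file is present before you run it. The following is a minimal sketch; check_manifest is a helper name made up for this example, not part of the Impira CLI, and it only checks that the file exists and is non-empty, not what it contains.

```shell
# check_manifest DIR
# Hypothetical helper (not part of the Impira CLI): succeeds only if DIR
# contains a non-empty manifest.json.
check_manifest() {
    manifest="${1%/}/manifest.json"
    if [ -s "$manifest" ]; then
        echo "found $manifest"
    else
        echo "no manifest.json in $1" >&2
        return 1
    fi
}
```

You could then run something like check_manifest /path/to/snapshotted/data/ and only continue to bootstrap if it succeeds.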

To use the bootstrap command, you must know the credentials for your Impira account. See Setting up the Impira API for instructions on how to obtain them.

Running the bootstrap command

To bootstrap a collection, run a command like

$ impira bootstrap -d /path/to/snapshotted/data/ \
    --org-name  YOUR_ORG_NAME \
    --api-token YOUR_API_TOKEN

The command will log its progress as it runs and print a link to the new collection, which is given an automatically generated name. You can rename the collection to whatever you’d like in the Impira app.
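If you run bootstrap often, you may prefer not to type your credentials each time. Here is a rough sketch of a shell wrapper that forwards them from environment variables to the documented --org-name and --api-token flags. The function name and the variables IMPIRA_ORG_NAME, IMPIRA_API_TOKEN, and IMPIRA_BIN are all names chosen for this example; nothing here implies that the Impira CLI reads them itself.

```shell
# Hypothetical wrapper, not part of the Impira CLI.
# IMPIRA_ORG_NAME and IMPIRA_API_TOKEN are variable names chosen for this
# sketch; the wrapper only passes them to --org-name and --api-token.
# Set IMPIRA_BIN=echo to print the arguments instead of running the CLI.
bootstrap_from_env() {
    if [ -z "${IMPIRA_ORG_NAME:-}" ] || [ -z "${IMPIRA_API_TOKEN:-}" ]; then
        echo "set IMPIRA_ORG_NAME and IMPIRA_API_TOKEN first" >&2
        return 1
    fi
    "${IMPIRA_BIN:-impira}" bootstrap -d "$1" \
        --org-name "$IMPIRA_ORG_NAME" \
        --api-token "$IMPIRA_API_TOKEN"
}
```

With both variables exported, bootstrap_from_env /path/to/snapshotted/data/ would run the same command as shown above.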

Reusing existing state

By default, the bootstrap command will create a new collection and then automatically upload the files, set up the fields, and label the files defined in the snapshotted data directory. You can optionally specify a --collection flag to work inside an existing collection and a --skip-upload flag to reuse the files already in that collection instead of uploading them from scratch.

bootstrap will automatically reuse fields that already exist and create any that do not. If there are type conflicts, it will print a warning message and may fail to provide labels.

Common Patterns

From AWS Textract

Impira requires you to set up the fields you wish to extract and provide at least one label. Although this method allows Impira to provide a very high level of accuracy, it can be cumbersome to set up on files with lots of fields, like ACORD forms.

The infer-fields command allows you to automatically guess the fields and labels in a file using AWS Textract. You can then use this to bootstrap a new collection in Impira, and then simply correct any inaccuracies within the Impira UI.

# This will output a directory like data/capture/document.ext-98bf
$ impira infer-fields -d data \
    /path/to/document.ext

# This will output a collection and its id
$ impira bootstrap -d data/capture/document.ext-98bf \
    --org-name YOUR_ORG_NAME \
    --api-token YOUR_API_TOKEN

If you run infer-fields on multiple files of the same document type, you can add more files to the collection too:

# Upload an additional document
$ impira bootstrap -d data/capture/document-2.ext-a81e \
    --org-name YOUR_ORG_NAME \
    --api-token YOUR_API_TOKEN \
    --collection COLLECTION_ID
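When you have several infer-fields output directories to add, repeating the command by hand gets tedious. One way to loop over them is sketched below. bootstrap_all and IMPIRA_BIN are names invented for this example, not part of the Impira CLI; the function only repeats the documented bootstrap invocation with --collection for each directory.

```shell
# Hypothetical helper, not part of the Impira CLI: bootstrap every given
# data directory into one existing collection, stopping at the first failure.
# Usage: bootstrap_all COLLECTION_ID ORG_NAME API_TOKEN DIR...
# Set IMPIRA_BIN=echo to print each command's arguments instead of running it.
bootstrap_all() {
    collection="$1"; org="$2"; token="$3"
    shift 3
    for dir in "$@"; do
        "${IMPIRA_BIN:-impira}" bootstrap -d "$dir" \
            --org-name "$org" \
            --api-token "$token" \
            --collection "$collection" || return 1
    done
}
```

For example, bootstrap_all COLLECTION_ID YOUR_ORG_NAME YOUR_API_TOKEN data/capture/document-2.ext-a81e /path/to/another/output/directory would add both documents to the same collection.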

Warning

Textract does not necessarily produce the same field names between documents of the same type. For example, two ACORD 25 forms might produce fields with slightly different names like INSURER A and INSURER A:. The Impira CLI will do its best to reconcile fields with different names, but you may end up with duplicate fields in Impira if you bootstrap from multiple files.
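To get a feel for which names are likely to collide, you can normalize a list of field names yourself and look for repeats. The sketch below is only an illustration: likely_duplicate_fields is a made-up helper, it is not how the Impira CLI reconciles fields, and it expects a plain list of names (one per line) that you have collected yourself, since the layout of manifest.json is not described here.

```shell
# Hypothetical helper, not part of the Impira CLI and not its reconciliation
# logic. Reads field names on stdin, one per line, and prints each normalized
# name that occurs more than once. Normalizing here means lowercasing,
# trimming leading whitespace, and dropping trailing whitespace and colons.
likely_duplicate_fields() {
    tr '[:upper:]' '[:lower:]' |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]:]*$//' |
        sort | uniq -d
}
```

Feeding it the names INSURER A, INSURER A:, and INSURER B prints insurer a, flagging the first two as probable duplicates to merge or clean up in the Impira UI.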

From another Impira collection

The snapshot command can be used together with the bootstrap command to save and restore Impira collections. This is useful for quality assurance and to experiment with new labeling approaches without disrupting an existing collection.

# This will output a directory like data/snapshot/07b71143a26b7163-98bf
$ impira snapshot -d data \
    --org-name YOUR_ORG_NAME \
    --api-token YOUR_API_TOKEN \
    --collection YOUR_COLLECTION_ID

# This will output a collection and its id
$ impira bootstrap -d data/snapshot/07b71143a26b7163-98bf \
    --org-name YOUR_ORG_NAME \
    --api-token YOUR_API_TOKEN
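If you script the two steps together, you need the name of the directory that snapshot just wrote. One possible approach, assuming each run creates a fresh subdirectory under data/snapshot as in the example above, is to pick the most recently modified one. latest_subdir is a helper name made up for this sketch and is not part of the Impira CLI.

```shell
# Hypothetical helper, not part of the Impira CLI: print the most recently
# modified subdirectory of DIR (with a trailing slash), or nothing if DIR
# has no subdirectories. Assumes directory names contain no newlines.
latest_subdir() {
    ls -td "${1%/}"/*/ 2>/dev/null | head -n 1
}

# Possible usage after running impira snapshot -d data ...:
#   dir=$(latest_subdir data/snapshot)
#   impira bootstrap -d "$dir" --org-name YOUR_ORG_NAME --api-token YOUR_API_TOKEN
```

If other snapshots may be written to the same directory at the same time, copy the path from the snapshot command's output instead.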

Advanced options

  • To use an existing collection rather than creating a new one, pass in the collection’s id using the --collection flag.

  • By default, bootstrap will upload files into the collection. However, if you pre-populate the collection with each of the relevant files (by name), then you can use --skip-upload to use the existing files.

  • The bootstrap command attempts to pick specific types (like number and timestamp) using Impira’s named entity recognition. You can disable this behavior with --skip-type-inference.

For a full list of options, run impira bootstrap --help.