Multimodal Information Extraction

Visually-rich Documents; Transformers; BERT; Few-shot Learning; Meta-Learning

Time:

Summer 2022

Project: Efficient Few-shot Multimodal Information Extraction in Visually-rich Documents

Visually-rich documents combine three modalities: language, image, and the layout structure of their contents. The task is to harness meta-knowledge (see the sketch after this list) to accelerate learning to

  • understand new document types given a pre-trained text-image large language model;
  • localize rarely occurring key information types in out-of-distribution data.
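
The project's actual method is not spelled out here; as a rough illustration of such a meta-learning setup, the sketch below implements a MAML-style inner/outer loop in JAX (one of the tools listed under Accomplishment). The linear model, shapes, and names are hypothetical stand-ins for the multimodal document encoder.

    import jax
    import jax.numpy as jnp

    def loss_fn(params, x, y):
        # Toy linear model standing in for the multimodal document encoder.
        preds = x @ params["w"] + params["b"]
        return jnp.mean((preds - y) ** 2)

    def inner_adapt(params, support_x, support_y, lr=0.1):
        # One gradient step on the support set of a new document type.
        grads = jax.grad(loss_fn)(params, support_x, support_y)
        return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

    def meta_loss(params, support_x, support_y, query_x, query_y):
        # Meta-objective: loss of the adapted parameters on the query set.
        adapted = inner_adapt(params, support_x, support_y)
        return loss_fn(adapted, query_x, query_y)

    # Differentiating through the inner step yields the meta-gradient.
    meta_grad_fn = jax.jit(jax.grad(meta_loss))

    key = jax.random.PRNGKey(0)
    params = {"w": jax.random.normal(key, (4, 1)), "b": jnp.zeros((1,))}
    xs = jax.random.normal(key, (8, 4))  # toy support/query features
    ys = jnp.ones((8, 1))
    grads = meta_grad_fn(params, xs, ys, xs, ys)

Differentiating through the inner adaptation step is what lets the meta-parameters encode knowledge that transfers to a new document type from a handful of labeled examples.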

Accomplishment:

Published two research papers.

  • NLP: Entity retrieval, Multimodal Transformer-based Large Language Models
  • Tools: TensorFlow, JAX, seqeval (entity-level evaluation; see the example after this list)
  • Languages: Python
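
For concreteness, a minimal sketch of entity-level evaluation with seqeval; the BIO tag sequences below are toy examples, not project data.

    from seqeval.metrics import classification_report, f1_score

    # Gold and predicted BIO tag sequences for two toy documents.
    y_true = [["O", "B-DATE", "I-DATE", "O"], ["B-TOTAL", "O"]]
    y_pred = [["O", "B-DATE", "I-DATE", "O"], ["O", "O"]]

    # seqeval scores at the entity level, not the token level:
    # the missed TOTAL entity counts as one false negative.
    print(f1_score(y_true, y_pred))
    print(classification_report(y_true, y_pred))

Entity-level scoring is the natural granularity for key-information extraction, since a partially tagged span is not a usable extraction.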

Collaborators:

  • Mentors: Hanjun Dai (Google DeepMind), Bo Dai (Google DeepMind), Wei Wei (Cloud DocAI & Core ML App)