fnbr.vision

Resources for my research on Multimodal Machine Translation within the framework of Frame Semantics (this is a WIP 🚧)

1. papers 📄

BELCAVELLO, F.; VIRIDIANO, M.; DINIZ DA COSTA, A.; MATOS, E. E.; TORRENT, T. T. (2020). Frame-Based Annotation of Multimodal Corpora: Tracking (A)Synchronies in Meaning Construction. In: Proceedings of the LREC International FrameNet Workshop 2020. Marseille, France: ELRA, p. 23-30. pdf bibtex

BELCAVELLO, F.; DINIZ DA COSTA, A.; ALMEIDA, V.; VIRIDIANO, M.; TORRENT, T. T. (2019). Multimodal Analysis for Building Semantic Representations in the Tourism Domain Using Frames and Qualia. In: Proceedings of the 4th Bremen Conference on Multimodality (BreMM19). Bremen, Germany. pdf

2. books 📚

Study the fundamentals first by reading Speech and Language Processing, 2nd Edition, by Jurafsky and Martin. The 3rd edition is in progress and some chapters are available as pdf.

Also...

  • BENDER, Emily M. Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax. Synthesis Lectures on Human Language Technologies, v. 6, n. 3, p. 1-184, 2013. doi 10.2200/S00493ED1V01Y201303HLT020
  • BENDER, Emily M.; LASCARIDES, Alex. Linguistic Fundamentals for Natural Language Processing II: 100 Essentials from Semantics and Pragmatics. Synthesis Lectures on Human Language Technologies, v. 12, n. 3, p. 1-268, 2019. doi 10.2200/S00935ED1V02Y201907HLT043
  • GOLDBERG, Yoav. Neural Network Methods for Natural Language Processing. Synthesis Lectures on Human Language Technologies, v. 10, n. 1, p. 1-309, 2017. doi 10.2200/S00762ED1V01Y201703HLT037
  • HUTCHINS, William John; SOMERS, Harold L. An Introduction to Machine Translation. London: Academic Press, 1992. [download pdf]
  • MANNING, Christopher D.; SCHÜTZE, Hinrich. Foundations of Statistical Natural Language Processing. MIT Press, 1999. [download pdf]
  • KOEHN, Philipp. Neural Machine Translation. arXiv preprint arXiv:1709.07809, 2017. [download pdf]
  • KOEHN, Philipp. Statistical Machine Translation. Cambridge University Press, 2009. doi 10.1017/CBO9780511815829

The following texts are useful, but not required; all of them can be read freely online. If you have no background in neural networks, they can also help give you that background.

For learning about deep learning for NLP, take the Stanford CS224n online course, or watch the Stanford CS224n lecture collection on NLP with Deep Learning.

3. lectures 💬

Also...

4. repos :octocat:

  • NLP Pandect – a fantastically detailed, curated collection of NLP resources, from general information and frameworks to podcasts and YouTube channels
  • NLP Tutorial – minimal walk-throughs of NLP models, each implemented in fewer than 100 lines of code
  • NLP Roadmap 2019 – a roadmap and keyword list for students interested in learning Natural Language Processing
  • NLP Progress – Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks, by @sebastianruder

5. blog posts 📌

6. tools 🔨

  • SRITagging
  • ImageGraph – Visual Computing made easy. Computer Vision. Image Processing. Data Visualization. All drag-and-drop in the browser.
  • YOLOv3 – Real-Time Object Detection
  • MakeSense.AI – An open-source and free to use annotation tool under GPLv3
  • ScaLabel – A scalable open-source web annotation tool
  • RectLabel – An image annotation tool to label images for bounding box object detection and segmentation
  • labelme – Image Polygonal Annotation with Python
  • LabelImg – A graphical image annotation tool written in Python; it saves annotations as PASCAL VOC XML files (see the parsing sketch after this list)
  • VGG Image Annotator – A standalone image annotator application packaged as a single HTML file (< 400 KB) that runs on most modern web browsers
  • Figure Eight – A data annotation platform: you upload unlabeled data together with your labeling rules, and a distributed network of human annotators combined with machine learning models annotates the data at enterprise scale
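
Several of the annotation tools above, such as LabelImg and RectLabel, can export bounding boxes in PASCAL VOC XML format. Below is a minimal sketch of reading such a file with Python's standard library; the file name and the read_voc_boxes helper are illustrative, and the element names (object, name, bndbox, xmin, ...) follow the usual VOC layout, so adjust them if your export differs.

```python
# Minimal sketch: read bounding boxes from a PASCAL VOC-style XML file
# (the format exported by tools such as LabelImg). The path is a placeholder.
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_path):
    """Return a list of (label, xmin, ymin, xmax, ymax) tuples."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall("object"):
        label = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((
            label,
            int(float(bb.findtext("xmin"))),
            int(float(bb.findtext("ymin"))),
            int(float(bb.findtext("xmax"))),
            int(float(bb.findtext("ymax"))),
        ))
    return boxes

if __name__ == "__main__":
    for label, xmin, ymin, xmax, ymax in read_voc_boxes("example_annotation.xml"):
        print(f"{label}: ({xmin}, {ymin}) to ({xmax}, {ymax})")
```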

7. datasets ☁️

  • Multi 30K [Elliott et al. 2016, arXiv:1605.00459] – Extends the Flickr30K dataset with German translations, created by professional translators, for a subset of the English descriptions (a minimal loading sketch follows this list)
  • Flickr 30K Entities [Plummer et al. 2015, arXiv:1505.04870] – Adds 244k coreference chains and 276k manually annotated bounding boxes for the 31,783 images and 158,915 English captions (five per image) of the original dataset
  • Flickr 30K [Young et al. 2014, doi 10.1162/tacl_a_00166] – Standard benchmark for sentence-based image description
  • MS COCO [Lin et al. 2014, arXiv:1405.0312] – Large-scale object detection, segmentation, and captioning dataset
  • AVA [Roth et al. 2019, arXiv:1901.01342] – Spatio-temporal audiovisual annotations of human actions in movies, suitable for training localized action recognition systems
  • Open Images [Kuznetsova et al. 2018, arXiv:1811.00982] – ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives
  • Google's Conceptual Captions [Sharma et al. 2018, doi 10.18653/v1/P18-1238] – ~3.3M images annotated with captions. In contrast with the curated style of other image caption datasets, the images and their raw descriptions are harvested from the web (from the Alt-text HTML attribute associated with web images) and therefore represent a wider variety of styles; an automatic pipeline extracts, filters, and transforms candidate image/caption pairs to balance cleanliness, informativeness, fluency, and learnability in the resulting captions
  • VCR [Zellers et al. 2019, arXiv:1811.10830] – 290k multiple-choice QA problems derived from 110k movie scenes
  • VisualCOMET [Park et al. 2020, arXiv:2004.10796] – A large-scale repository of Visual Commonsense Graphs: over 1.4 million textual descriptions of visual commonsense inferences annotated over a diverse set of 60,000 images, each paired with short video summaries of before and after, plus person-grounding (co-reference links) between people appearing in the image and people mentioned in the text
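
Multi30K, the dataset most directly relevant to multimodal machine translation, is distributed as line-aligned plain-text caption files per language plus a list of image names for each split. The sketch below pairs English and German captions with their images; the directory layout, the file names (train.en, train.de, train_images.txt), and the load_multi30k_split helper are assumptions about a typical local copy, so adjust them to match the official release.

```python
# Minimal sketch: pair Multi30K English/German captions with their images.
# File names are placeholders; the release stores tokenized sentences as
# one-per-line text files plus an image-name list for each split.
from pathlib import Path

def load_multi30k_split(data_dir, split="train"):
    data_dir = Path(data_dir)
    english = (data_dir / f"{split}.en").read_text(encoding="utf-8").splitlines()
    german = (data_dir / f"{split}.de").read_text(encoding="utf-8").splitlines()
    images = (data_dir / f"{split}_images.txt").read_text(encoding="utf-8").splitlines()
    assert len(english) == len(german) == len(images), "splits must be line-aligned"
    return list(zip(images, english, german))

if __name__ == "__main__":
    for image, en, de in load_multi30k_split("multi30k/data", "val")[:3]:
        print(image)
        print("  EN:", en)
        print("  DE:", de)
```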

Also...

  • The Big Bad NLP Database
  • YouTube BoundingBoxes – Large-scale dataset of video URLs with densely sampled, high-quality single-object bounding box annotations. All video segments were human-annotated with high-precision classifications and bounding boxes at 1 frame per second.
  • What's Cookin' – A list of cooking-related YouTube video IDs, along with time stamps marking the (estimated) start and end of various events.
  • PASCAL VOC – Standardised image datasets for object class recognition, plus a common set of tools for accessing the datasets and annotations
  • PASCAL Context – Indoor and outdoor scenes with 400+ classes
  • MPII Human Pose Dataset – State-of-the-art benchmark for evaluating articulated human pose estimation
  • Cityscapes Dataset – Benchmark suite and evaluation server for pixel-level and instance-level semantic labeling (a label-map inspection sketch follows this list)
  • Mapillary Vistas Dataset – a diverse street-level imagery dataset with pixel‑accurate and instance‑specific human annotations for understanding street scenes around the world
  • ApolloScape Scene Parsing – RGB videos with high resolution image sequences and per pixel annotation, survey-grade dense 3D points with semantic segmentation
  • Stanford Background Dataset – A set of outdoor scenes with at least one foreground object
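
Several of the datasets above (Cityscapes, ApolloScape, Stanford Background) provide per-pixel semantic labels, usually stored as single-channel PNGs in which each pixel value is a class ID. The sketch below computes how much of an image each class covers; the file name and the ID-per-pixel encoding are assumptions, so check the specific dataset's documentation.

```python
# Minimal sketch: count how many pixels each class ID occupies in a
# per-pixel semantic label map stored as a single-channel PNG.
# The file name and the class-ID encoding are assumptions.
import numpy as np
from PIL import Image

def class_histogram(label_png):
    labels = np.array(Image.open(label_png))  # H x W array of class IDs
    ids, counts = np.unique(labels, return_counts=True)
    total = labels.size
    return {int(i): count / total for i, count in zip(ids, counts)}

if __name__ == "__main__":
    for class_id, share in sorted(class_histogram("frankfurt_labelIds.png").items()):
        print(f"class {class_id:3d}: {share:.2%} of pixels")
```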

8. semantic parsers ☑️

  • SEMAFOR – automatically processes English sentences according to the form of semantic analysis in Berkeley FrameNet.
  • Google Sling – Natural language frame semantics parser
  • Open Sesame – Frame-semantic parsing system based on a softmax-margin SegRNN
  • PathLSTM – A neural semantic role labeling (SRL) model (see the FrameNet lookup sketch after this list)
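
When inspecting the output of frame-semantic parsers such as SEMAFOR or Open Sesame, it helps to be able to look up the evoked frames, their frame elements, and their lexical units in Berkeley FrameNet itself. The sketch below uses NLTK's FrameNet corpus reader (it is not tied to any of the parsers above) and assumes NLTK is installed and the framenet_v17 corpus has been downloaded.

```python
# Minimal sketch: inspect a Berkeley FrameNet frame with NLTK's corpus reader.
# Requires: pip install nltk, then a one-time download of the FrameNet data.
import nltk
from nltk.corpus import framenet as fn

nltk.download("framenet_v17", quiet=True)

frame = fn.frame("Motion")                     # look up a frame by name
print(frame.name)
print(frame.definition[:120], "...")

# Core frame elements (roles) defined for this frame
for fe_name, fe in frame.FE.items():
    if fe.coreType == "Core":
        print("core FE:", fe_name)

# Lexical units that can evoke this frame
print("lexical units:", sorted(frame.lexUnit.keys())[:5])
```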
