fnbr.vision

Resources for my research on Multimodal Machine Translation within the framework of Frame Semantics (this is a WIP 🚧)

1. papers 📄

BELCAVELLO, F.; VIRIDIANO, M.; DINIZ DA COSTA, A.; MATOS, E. E.; TORRENT, T. T. (2020). Frame-Based Annotation of Multimodal Corpora: Tracking (A)Synchronies in Meaning Construction. In: Proceedings of the LREC International FrameNet Workshop 2020. Marseille, France: ELRA, p. 23-30. pdf bibtex

BELCAVELLO, F.; DINIZ DA COSTA, A.; ALMEIDA, V.; VIRIDIANO, M.; TORRENT, T. T. (2019). Multimodal Analysis for Building Semantic Representations in the Tourism Domain Using Frames and Qualia. In: Proceedings of the 4th Bremen Conference on Multimodality (BreMM19). Bremen, Germany. pdf

2. books 📚

Study the fundamentals first by reading Speech and Language Processing, 2nd Edition, by Jurafsky and Martin. The 3rd edition is in progress and some chapters are available as pdf.

Also...

  • BENDER, Emily M. Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax. Synthesis Lectures on Human Language Technologies, v. 6, n. 3, p. 1-184, 2013. doi 10.2200/S00493ED1V01Y201303HLT020
  • BENDER, Emily M.; LASCARIDES, Alex. Linguistic Fundamentals for Natural Language Processing II: 100 Essentials from Semantics and Pragmatics. Synthesis Lectures on Human Language Technologies, v. 12, n. 3, p. 1-268, 2019. doi 10.2200/S00935ED1V02Y201907HLT043
  • GOLDBERG, Yoav. Neural Network Methods for Natural Language Processing. Synthesis Lectures on Human Language Technologies, v. 10, n. 1, p. 1-309, 2017. doi 10.2200/S00762ED1V01Y201703HLT037
  • HUTCHINS, William John; SOMERS, Harold L. An Introduction to Machine Translation. London: Academic Press, 1992. [download pdf]
  • MANNING, Christopher D.; SCHÜTZE, Hinrich. Foundations of Statistical Natural Language Processing. MIT Press, 1999. [download pdf]
  • KOEHN, Philipp. Neural Machine Translation. arXiv preprint arXiv:1709.07809, 2017. [download pdf]
  • KOEHN, Philipp. Statistical Machine Translation. Cambridge University Press, 2009. doi 10.1017/CBO9780511815829

The following texts are useful, but not required; all of them can be read freely online. If you have no background in neural networks, they can also help give you that background.

For learning about deep learning for NLP, take the Stanford CS224n online course, or watch the Stanford CS224n lecture collection on NLP with Deep Learning.

3. lectures 💬

Also...

4. repos :octocat:

  • NLP Pandect – a fantastically detailed, curated collection of NLP resources, from general information and frameworks to podcasts and YouTube channels
  • NLP Tutorial – minimal walk-throughs of NLP models, each implemented in fewer than 100 lines of code
  • NLP Roadmap 2019 – a roadmap and keyword list for students interested in learning Natural Language Processing
  • NLP Progress – Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks, by @sebastianruder

5. blog posts 📌

6. tools 🔨

  • SRITagging
  • ImageGraph – Visual Computing made easy. Computer Vision. Image Processing. Data Visualization. All drag-and-drop in the browser.
  • YOLOv3 – Real-Time Object Detection
  • MakeSense.AI – An open-source and free to use annotation tool under GPLv3
  • ScaLabel – A scalable open-source web annotation tool
  • RectLabel – An image annotation tool to label images for bounding box object detection and segmentation
  • labelme – Image Polygonal Annotation with Python
  • LabelImg – A graphical image annotation tool written in Python; it saves annotations as PASCAL VOC XML files (see the parsing sketch after this list)
  • VGG Image Annotator – A standalone image annotator application packaged as a single HTML file (< 400 KB) that runs on most modern web browsers
  • Figure Eight – A data annotation platform: you upload unlabeled data together with your labeling rules, and a distributed network of human annotators combined with machine learning models annotates the data at enterprise scale
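
Several of the annotation tools above, such as LabelImg and RectLabel, can export bounding boxes in PASCAL VOC XML format. Below is a minimal sketch of reading such a file with Python's standard library; the file name and the read_voc_boxes helper are illustrative, and the element names (object, name, bndbox, xmin, ...) follow the usual VOC layout, so adjust them if your export differs.

```python
# Minimal sketch: read bounding boxes from a PASCAL VOC-style XML file
# (the format exported by tools such as LabelImg). The path is a placeholder.
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_path):
    """Return a list of (label, xmin, ymin, xmax, ymax) tuples."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall("object"):
        label = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((
            label,
            int(float(bb.findtext("xmin"))),
            int(float(bb.findtext("ymin"))),
            int(float(bb.findtext("xmax"))),
            int(float(bb.findtext("ymax"))),
        ))
    return boxes

if __name__ == "__main__":
    for label, xmin, ymin, xmax, ymax in read_voc_boxes("example_annotation.xml"):
        print(f"{label}: ({xmin}, {ymin}) to ({xmax}, {ymax})")
```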

7. datasets ☁️

  • Multi 30K [Elliott et al. 2016, arXiv:1605.00459] – Extends the Flickr30K dataset with German translations, created by professional translators, for a subset of the English descriptions (a minimal loading sketch follows this list)
  • Flickr 30K Entities [Plummer et al. 2015, arXiv:1505.04870] – Adds 244k coreference chains and 276k manually annotated bounding boxes for the 31,783 images and 158,915 English captions (five per image) of the original dataset
  • Flickr 30K [Young et al. 2014, doi 10.1162/tacl_a_00166] – Standard benchmark for sentence-based image description
  • MS COCO [Lin et al. 2014, arXiv:1405.0312] – Large-scale object detection, segmentation, and captioning dataset
  • AVA [Roth et al. 2019, arXiv:1901.01342] – Spatio-temporal audiovisual annotations of human actions in movies, suitable for training localized action recognition systems
  • Open Images [Kuznetsova et al. 2018, arXiv:1811.00982] – ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives
  • Google's Conceptual Captions [Sharma et al. 2018, doi 10.18653/v1/P18-1238] – ~3.3M images annotated with captions. In contrast with the curated style of other image caption datasets, the images and their raw descriptions are harvested from the web (from the Alt-text HTML attribute associated with web images) and therefore represent a wider variety of styles; an automatic pipeline extracts, filters, and transforms candidate image/caption pairs to balance cleanliness, informativeness, fluency, and learnability in the resulting captions
  • VCR [Zellers et al. 2019, arXiv:1811.10830] – 290k multiple-choice QA problems derived from 110k movie scenes
  • VisualCOMET [Park et al. 2020, arXiv:2004.10796] – A large-scale repository of Visual Commonsense Graphs: over 1.4 million textual descriptions of visual commonsense inferences annotated over a diverse set of 60,000 images, each paired with short video summaries of before and after, plus person-grounding (co-reference links) between people appearing in the image and people mentioned in the text
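
Multi30K, the dataset most directly relevant to multimodal machine translation, is distributed as line-aligned plain-text caption files per language plus a list of image names for each split. The sketch below pairs English and German captions with their images; the directory layout, the file names (train.en, train.de, train_images.txt), and the load_multi30k_split helper are assumptions about a typical local copy, so adjust them to match the official release.

```python
# Minimal sketch: pair Multi30K English/German captions with their images.
# File names are placeholders; the release stores tokenized sentences as
# one-per-line text files plus an image-name list for each split.
from pathlib import Path

def load_multi30k_split(data_dir, split="train"):
    data_dir = Path(data_dir)
    english = (data_dir / f"{split}.en").read_text(encoding="utf-8").splitlines()
    german = (data_dir / f"{split}.de").read_text(encoding="utf-8").splitlines()
    images = (data_dir / f"{split}_images.txt").read_text(encoding="utf-8").splitlines()
    assert len(english) == len(german) == len(images), "splits must be line-aligned"
    return list(zip(images, english, german))

if __name__ == "__main__":
    for image, en, de in load_multi30k_split("multi30k/data", "val")[:3]:
        print(image)
        print("  EN:", en)
        print("  DE:", de)
```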

Also...

  • The Big Bad NLP Database
  • YouTube BoundingBoxes – Large-scale dataset of video URLs with densely sampled, high-quality single-object bounding box annotations. All video segments were human-annotated with high-precision classifications and bounding boxes at 1 frame per second.
  • What's Cookin' – A list of cooking-related YouTube video IDs, along with time stamps marking the (estimated) start and end of various events.
  • PASCAL VOC – Standardised image datasets for object class recognition, plus a common set of tools for accessing the datasets and annotations
  • PASCAL Context – Indoor and outdoor scenes with 400+ classes
  • MPII Human Pose Dataset – State-of-the-art benchmark for evaluating articulated human pose estimation
  • Cityscapes Dataset – Benchmark suite and evaluation server for pixel-level and instance-level semantic labeling (a label-map inspection sketch follows this list)
  • Mapillary Vistas Dataset – a diverse street-level imagery dataset with pixel‑accurate and instance‑specific human annotations for understanding street scenes around the world
  • ApolloScape Scene Parsing – RGB videos with high resolution image sequences and per pixel annotation, survey-grade dense 3D points with semantic segmentation
  • Stanford Background Dataset – A set of outdoor scenes with at least one foreground object
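
Several of the datasets above (Cityscapes, ApolloScape, Stanford Background) provide per-pixel semantic labels, usually stored as single-channel PNGs in which each pixel value is a class ID. The sketch below computes how much of an image each class covers; the file name and the ID-per-pixel encoding are assumptions, so check the specific dataset's documentation.

```python
# Minimal sketch: count how many pixels each class ID occupies in a
# per-pixel semantic label map stored as a single-channel PNG.
# The file name and the class-ID encoding are assumptions.
import numpy as np
from PIL import Image

def class_histogram(label_png):
    labels = np.array(Image.open(label_png))  # H x W array of class IDs
    ids, counts = np.unique(labels, return_counts=True)
    total = labels.size
    return {int(i): count / total for i, count in zip(ids, counts)}

if __name__ == "__main__":
    for class_id, share in sorted(class_histogram("frankfurt_labelIds.png").items()):
        print(f"class {class_id:3d}: {share:.2%} of pixels")
```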

8. semantic parsers ☑️

  • SEMAFOR – automatically processes English sentences according to the form of semantic analysis in Berkeley FrameNet.
  • Google Sling – Natural language frame semantics parser
  • Open Sesame – Frame-semantic parsing system based on a softmax-margin SegRNN
  • PathLSTM – A neural semantic role labeling (SRL) model (see the FrameNet lookup sketch after this list)
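
When inspecting the output of frame-semantic parsers such as SEMAFOR or Open Sesame, it helps to be able to look up the evoked frames, their frame elements, and their lexical units in Berkeley FrameNet itself. The sketch below uses NLTK's FrameNet corpus reader (it is not tied to any of the parsers above) and assumes NLTK is installed and the framenet_v17 corpus has been downloaded.

```python
# Minimal sketch: inspect a Berkeley FrameNet frame with NLTK's corpus reader.
# Requires: pip install nltk, then a one-time download of the FrameNet data.
import nltk
from nltk.corpus import framenet as fn

nltk.download("framenet_v17", quiet=True)

frame = fn.frame("Motion")                     # look up a frame by name
print(frame.name)
print(frame.definition[:120], "...")

# Core frame elements (roles) defined for this frame
for fe_name, fe in frame.FE.items():
    if fe.coreType == "Core":
        print("core FE:", fe_name)

# Lexical units that can evoke this frame
print("lexical units:", sorted(frame.lexUnit.keys())[:5])
```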
