List all possible fields which can be extracted #5

Open
oparoz opened this issue Jun 1, 2016 · 52 comments

@oparoz
Contributor

oparoz commented Jun 1, 2016

We need to have a full list and prioritise it.

There should be a list for EXIF, IPTC and XMP

@imjalpreet
Member

@oparoz, I have looked at the data fields in all three types of data and decided to go with the following fields:

IPTC Data:

  • File Format
  • Date Created
  • Time Created
  • Digitization Date
  • Digitization Time
  • City
  • Country Code
  • Country Name
  • Image Type
  • Image Orientation
  • Preview Format
  • Preview Version
  • Preview

EXIF Data:

  • FileName
  • FileDateTime
  • FileSize
  • MimeType
  • Height
  • Width
  • IsColor
  • Thumbnail.Height
  • Thumbnail.Width
  • Camera Make*
  • Camera Model*
  • DateTimeOriginal
  • DateTimeDigitized
  • Exif Image Width
  • Exif Image Length

XMP Data:

  • Creator Email
  • Owner Name
  • Creation Date
  • Modification Date
  • Label
  • City
  • State
  • Country
  • Country Code
  • Location
  • Title
  • Description
  • Keywords

We can add or remove fields based on our requirements as the project progresses.

@imjalpreet
Member

In my opinion, we can extract the following fields from the EXIF data (I am using the list here as a reference):

  • Exif.Image.ImageWidth (or Exif.Iop.RelatedImageWidth )
  • Exif.Image.ImageLength (or Exif.Iop.RelatedImageLength)
    I have already written the code for this, though; the value is indirectly taken from the EXIF/IPTC data itself.
  • Exif.Image.DateTime (Last Modified Time)
  • Exif.Image.HostComputer (Maybe we can get the owner info from this, will have to check)
  • Exif.Image.ImageID (I think it may be useful, according to what is given in the description in the link I have provided)
  • Exif.Image.TimeZoneOffset
  • Exif.Image.DateTimeOriginal
  • Exif.Photo.DateTimeOriginal (I will check what is the difference between the two)
  • Exif.Photo.SubSecTime, Exif.Photo.SubSecTimeOriginal (To get the fraction of seconds for the respective times)
  • Exif.Photo.CameraOwnerName (or Exif.Image.HostComputer)
  • Exif.GPSInfo.GPSLatitudeRef (N or S)
  • Exif.GPSInfo.GPSLatitude
  • Exif.GPSInfo.GPSLongitudeRef
  • Exif.GPSInfo.GPSLongitude
    We can also extract altitude if needed.
  • Exif.GPSInfo.GPSAreaInformation (Name of the area)
  • There is also some destination GPS Latitude and Longitude info, will have to check which one we require.
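The GPS tags above come in pairs: GPSLatitude and GPSLongitude hold degrees, minutes and seconds as rationals, while the matching Ref tags carry the hemisphere. A minimal sketch of turning such a pair into one signed decimal value follows. The function name and the sample values are hypothetical, and the sketch assumes the rationals have already been read from the file as (numerator, denominator) pairs with non-zero denominators.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF-style degrees/minutes/seconds rationals to signed decimal degrees.

    dms: three (numerator, denominator) pairs, as stored in GPSLatitude or GPSLongitude.
    ref: "N", "S", "E" or "W", as stored in GPSLatitudeRef or GPSLongitudeRef.
    """
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = degrees + minutes / 60 + seconds / 3600
    if ref in ("S", "W"):
        # Southern and western hemispheres are negative in decimal notation.
        value = -value
    elif ref not in ("N", "E"):
        raise ValueError(f"unexpected reference {ref!r}")
    return float(value)


# Hypothetical sample values, not taken from a real image.
latitude = dms_to_decimal([(18, 1), (58, 1), (30, 1)], "N")
longitude = dms_to_decimal([(72, 1), (49, 1), (30, 1)], "E")
```

Storing the signed result means the Ref tags do not need columns of their own.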

@oparoz What are your views?

@oparoz
Contributor Author

oparoz commented Jun 15, 2016

Thank you for the link.
Overall, it's not easy to pick the essential ones as this app is supposed to be generic and not linked to Gallery, but I did ask you to cross-reference each field to see if they exist in all 3 sets since this would help pick the ones we have to have.
So maybe start with that and then we can look at the details?

Exif.Image.ImageLength (or Exif.Iop.RelatedImageLength) Though, I have already written the code for this, it is indirectly taken from exif/iptc data itself.

I'm not sure as it's GD which is extracting the info, so it's probably using a different methodology since very few formats include metadata

Regarding image dimensions, I think it's safer to stick with the current way of doing things as it's universal. Otherwise we would have to do it twice or pick a method of extraction based on the format which adds complexity. Maybe something for later?

Exif.Image.HostComputer (Maybe we can get the owner info from this, will have to check)

I think this would be considered too invasive, let's put it on the "maybe" pile and once we have a list we could ask people for their opinion.
Same with Exif.Image.HostComputer

Exif.Image.ImageID (I think it may be useful, according to what is given in the description in the link I have provided)

On the maybe pile?

Exif.Photo.SubSecTime, Exif.Photo.SubSecTimeOriginal (To get the fraction of seconds for the respective times)

Probably unnecessary.

Looking at the list, it sounds like date and location are the 2 main areas and those are definitely needed.

But it seems some pretty obvious ones are missing

  • Exif.Image.ImageDescription
  • Exif.Photo.UserComment
  • Exif.Image.Make and Exif.Image.Model
  • Exif.Image.Orientation

@imjalpreet
Member

I'm not sure as it's GD which is extracting the info, so it's probably using a different methodology since very few formats include metadata

Regarding image dimensions, I think it's safer to stick with the current way of doing things as it's universal. Otherwise we would have to do it twice or pick a method of extraction based on the format which adds complexity. Maybe something for later?

Actually, I had checked it out: I looked at the function I was using, getimagesize, and found that it extracts the dimensions from the EXIF/IPTC data itself. (I will let you know the file.) But I too don't think there is any need to change the current method.

I agree with the other points you have made.

But it seems some pretty obvious ones are missing
•Exif.Image.ImageDescription
•Exif.Photo.UserComment
•Exif.Image.Make and Exif.Image.Model
•Exif.Image.Orientation

Actually, I forgot to mention these. I thought first I would finish with size, location and time and then come to these fields. That said, we definitely need to extract the fields mentioned above.

Also, the aim of matching the fields in the three sets is to not have any repetitions, right?

@imjalpreet
Member

Also, regarding the generic nature of the app, we can always add any additional fields we may want at any time in the future.

@oparoz
Contributor Author

oparoz commented Jun 15, 2016

Actually, I had checked it out: I looked at the function I was using, getimagesize, and found that it extracts the dimensions from the EXIF/IPTC data itself.

Looking at your code, it's probably because you don't do any checks on what you send to getimagesize (and the reason tests from Gallery fail when your app is activated). So if you only test with JPEG, then you're going to have EXIF metadata.

I thought first I would finish with size, location and time and then come to these fields

Yes, it's not a problem, we just need to have maybe 3 priorities for the implementation. It should take very little time to add ways to extract new fields if the methods are properly designed.

Also, the aim of matching the fields in the three sets is to not have any repetitions, right?

No, it would be to be able to extract the same information from all 3 formats.

So if we want size, date and location, that would be 3-4 DB fields matching specific implementation by each format.
Does that make sense?
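
The idea of one database field backed by a format-specific tag can be sketched as a lookup table. This is only an illustration: `FIELD_MAP`, `extract` and the choice of columns are hypothetical, and the tag names are the ones mentioned elsewhere in this issue rather than a finalised list.

```python
# Hypothetical mapping: one normalised database column per piece of
# information, with the tag each format would supply it from.
FIELD_MAP = {
    "creation_date": {
        "exif": "Exif.Photo.DateTimeOriginal",
        "iptc": "Iptc.Application2.DateCreated",
        "xmp": "CreateDate",
    },
    "description": {
        "exif": "Exif.Image.ImageDescription",
        "iptc": "Iptc.Application2.Caption",
        "xmp": "description",
    },
}


def extract(column, tags_by_format, order=("exif", "iptc", "xmp")):
    """Return the first non-empty value for `column`, trying formats in `order`.

    tags_by_format maps a format name to a dict of {tag name: raw value}.
    """
    for fmt in order:
        tag = FIELD_MAP[column].get(fmt)
        value = tags_by_format.get(fmt, {}).get(tag)
        if value:
            return value
    return None


# An image that only carries IPTC data still fills the same column.
sample = {"iptc": {"Iptc.Application2.DateCreated": "20160615"}}
creation_date = extract("creation_date", sample)
```

With this shape, adding a new field later means adding one entry to the table rather than touching each extractor.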

@imjalpreet
Member

Looking at your code, it's probably because you don't do any checks on what you send to getimagesize (and the reason tests from Gallery fail when your app is activated). So if you only test with JPEG, then you're going to have EXIF metadata.

I didn't get you.

As getimagesize is a GD function, I was just trying to point out that, where this function is defined, it is mentioned that the dimensions are taken from EXIF/IPTC data. (I think I saw this when I was trying to find out how it is done, though I will check again.)

Also, regarding the tests, as far as I remember, I had checked running the tests by disabling this app but that still gave an error; I think I had told you about this. I am not sure whether the problem was on my local machine or somewhere else.

So if we want size, date and location, that would be 3-4 DB fields matching specific implementation by each format.
Does that make sense?

Yes.

@oparoz
Contributor Author

oparoz commented Jun 15, 2016

As getimagesize is a GD function, I was just trying to point out that, where this function is defined, it is mentioned that the dimensions are taken from EXIF/IPTC data. (I think I saw this when I was trying to find out how it is done, though I will check again.)

OK, so

  1. GD can only be used on image formats it supports. If you send a PDF, your app will crash
  2. A PNG doesn't contain any EXIF or IPTC metadata, so GD can't use that to determine the size of the image. Instead it relies on libraries designed to handle that format

That's the reason you can't use those tags to retrieve the size of an image as we need to be able to get the size of images in many different formats.

Is it clearer?
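
To illustrate both points outside of PHP and GD, here is a minimal Python sketch that reads a PNG's dimensions from the format's own header and returns nothing for input it does not recognise, instead of failing. It is not how GD is implemented; `png_size` is a hypothetical helper and only covers PNG.

```python
import struct

# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Return (width, height) from a PNG's IHDR chunk, or None if `data` is not a PNG.

    Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type ("IHDR"),
    then width and height as big-endian unsigned 32-bit integers.
    """
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", data[16:24])
    return width, height


# A hand-built PNG header (no pixel data) and a non-image payload.
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
size_of_png = png_size(header)
size_of_text = png_size(b"plain text, not an image")
```

The second call is the case the failing test exercises: a file that is not an image should produce an empty result, not a read error.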

Also, regarding the tests, as far as I remember, I had checked running the tests by disabling this app but that still gave an error

This is what you've sent me when putting in place Codeception

[PHPUnit_Framework_Exception]  
 getimagesize(): Read error!

That's the problem I'm talking about.

You can write the 1st test with a JPG and once it passes, you'll need to create another one which involves processing a txt file and then update your code to make it work.
But that's all for another issue as this one focuses on identifying the tags.

@imjalpreet
Member

OK, so

  1. GD can only be used on image formats it supports. If you send a PDF, your app will crash
  2. A PNG doesn't contain any EXIF or IPTC metadata, so GD can't use that to determine the size of the image. Instead it relies on libraries designed to handle that format

That's the reason you can't use those tags to retrieve the size of an image as we need to be able to get the size of images in many different formats.

Is it clearer?

Yeah, I had read about that. Thanks for the reminder.

Thanks, I understood the tests problem.

@imjalpreet
Member

imjalpreet commented Jun 16, 2016

IPTC Tags that represent similar data to EXIF: (Only for Location and Time)

  • Location (GPS)
    • Iptc.Application2.LocationCode
    • Iptc.Application2.LocationName
    • Iptc.Application2.City
    • Iptc.Application2.SubLocation
    • Iptc.Application2.ProvinceState
    • Iptc.Application2.CountryCode
    • Iptc.Application2.CountryName
  • Time
    • Iptc.Application2.DateCreated
    • Iptc.Application2.TimeCreated
    • Iptc.Application2.DigitizationDate
    • Iptc.Application2.DigitizationTime
  • Caption
    • Iptc.Application2.Caption
    • Iptc.Application2.Writer
  • Others
    • Iptc.Application2.ImageOrientation
    • Iptc.Application2.ImageType
  • Preview (If needed)
    • Iptc.Application2.PreviewFormat
    • Iptc.Application2.PreviewVersion
    • Iptc.Application2.Preview
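
Unlike EXIF, IPTC splits a timestamp across a date tag and a time tag, so DateCreated and TimeCreated have to be combined before they can fill a single date field. A minimal sketch, assuming the values follow the IIM layout of CCYYMMDD for the date and HHMMSS±HHMM for the time; the function name is hypothetical, and placeholder or partial values that some writers produce are not handled here.

```python
from datetime import datetime


def iptc_datetime(date_created, time_created=None):
    """Combine IPTC DateCreated (CCYYMMDD) and TimeCreated (HHMMSS±HHMM).

    Returns a naive datetime at midnight when no time is given, otherwise a
    timezone-aware datetime using the offset carried by TimeCreated.
    """
    if time_created is None:
        return datetime.strptime(date_created, "%Y%m%d")
    return datetime.strptime(date_created + time_created, "%Y%m%d%H%M%S%z")


# Hypothetical sample values.
created = iptc_datetime("20160616", "143000+0530")
date_only = iptc_datetime("20160616")
```

The same approach would apply to DigitizationDate and DigitizationTime, which use the same pair of layouts.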

@oparoz Please give your suggestions.

@imjalpreet
Member

Reference Link for IPTC.

@imjalpreet
Member

imjalpreet commented Jun 16, 2016

Similar XMP Tags:

  • Date
    • CreateDate
    • ModifyDate
    • MetadataDate
  • Others
    • Thumbnails (If needed)
    • creator
    • description
    • format

Reference:

The following table lists the XMP properties defined solely by Exif.

The following table lists additional XMP properties defined solely by Exif.

This schema specifies the IPTC Core XMP properties.

This schema specifies the IPTC Extension XMP properties.

@oparoz
Contributor Author

oparoz commented Jun 16, 2016

OK, but now you need to prioritise these groups and then put them in a table.

| Priority | DB | EXIF | IPTC | XMP |
|---|---|---|---|---|
| 1 | creation_date | | | |

@imjalpreet
Member

@oparoz I will put my table together today and post it here, and then we can discuss the changes.

@imjalpreet
Member

imjalpreet commented Jun 17, 2016

@oparoz Can you also explain to me how exactly I should select my priority order? I mean, what factors should I keep in mind?

@oparoz
Contributor Author

oparoz commented Jun 17, 2016

Since your question was answered via email, I'm looking forward to your prioritised table.

@imjalpreet
Member

Another resource for XMP data: http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/XMP.html

@imjalpreet
Member

@oparoz I have made a table of data fields with priority values. You can find it over here.

@oparoz
Contributor Author

oparoz commented Jun 20, 2016

Good job :)

Could you order it, so that people looking at it get the proper order right away?

Looks like Caption and Description are the same thing, no?

I don't think we need the EXIF GPS info as priority 3. We have the coordinates as priority 1 and that should be enough.

Overall, it seems like IPTC is going to be the one holding us back.

Could you please add all the references you've added to this issue to the Wiki?

@imjalpreet
Member

imjalpreet commented Jun 20, 2016

@oparoz,

Could you order it, so that people looking at it get the proper order right away?

Yeah I have done it.

Looks like Caption and Description are the same thing, no?

Yeah, I also had that doubt. I will merge them while coding.

I don't think we need the EXIF GPS info as priority 3. We have the coordinates as priority 1 and that should be enough.

Done.

Could you please add all the references you've added to this issue to the Wiki?

Okay, I will add them soon.

@oparoz
Contributor Author

oparoz commented Jun 21, 2016

Looks like "location" is going to be a problem. City is too vague to be able to put a picture on a map.

@imjalpreet
Member

@oparoz I think we will be able to put the picture on the map using the coordinates alone. I have previously used the Google Maps API for GIS, with which we can mark a position on the map from its coordinates. We can also use the city as a tag for the pictures, so that searching for a city returns all the pictures taken there. What do you think about this?

@oparoz
Contributor Author

oparoz commented Jun 21, 2016

Yes, no problem with coordinates, but with IPTC tags, there is not enough granularity, which means that if you're visiting a city, all the pictures will be piled up in one location instead of being spread out, near their real locations.

@oparoz
Contributor Author

oparoz commented Jun 21, 2016

I still think it's not as precise as GPS coordinates, but a good fallback

@imjalpreet
Member

Yeah, not as good as GPS coordinates but better than just city.

@oparoz
Contributor Author

oparoz commented Jun 21, 2016

Indeed :)

@imjalpreet
Member

Also, there is one more field named sublocation; we can look at that too when I implement it.
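
The fallback order discussed in the last few comments (coordinates first, then progressively vaguer place names) could look like the sketch below. The key names and the function are hypothetical placeholders for whatever the normalised fields end up being called.

```python
def best_location(meta):
    """Pick the most precise location available in `meta` (a plain dict).

    Returns a (kind, value) pair, or None when no location data is present.
    Keys are hypothetical normalised names, not fields of any real library.
    """
    if meta.get("latitude") is not None and meta.get("longitude") is not None:
        return ("coordinates", (meta["latitude"], meta["longitude"]))
    # No coordinates: fall back from the most to the least specific place name.
    for key in ("sublocation", "city", "country"):
        if meta.get(key):
            return (key, meta[key])
    return None


with_gps = best_location({"latitude": 18.975, "longitude": 72.8258, "city": "Mumbai"})
city_only = best_location({"city": "Mumbai"})
```

Returning the kind alongside the value lets a map view tell an exact position apart from a city-level guess.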

@oparoz
Contributor Author

oparoz commented Jun 21, 2016

We'll have to test on a large sample of images to figure out what the best approach is. A good idea would be to get a list of IPTC data generators to see what gets inserted and how.

@imjalpreet
Member

@oparoz I am planning to update the database.xml file with the final fields along with finishing that test. So can you finalize the fields in that Google Sheet? As soon as I complete the database.xml, I will be able to start with the extraction part, and I am planning to complete at least the EXIF extraction by Friday (or Saturday).

@oparoz
Contributor Author

oparoz commented Jun 28, 2016

@imjalpreet - Did you complete this task?

We need one field for the GPS location or maybe 2 if we go with latitude,longitude. The best thing to do is to use best practices (Google, Twitter, etc.) since people are already used to those APIs.

I'd like to avoid having 4 fields for GPS data if not necessary

@oparoz
Contributor Author

oparoz commented Jun 28, 2016

Some other things.

  • This is still not fixed

    Looks like Caption and Description are the same thing, no?

  • Description or orientation is more important than "Creator"

  • Let's add a column with the database fields

@oparoz
Contributor Author

oparoz commented Jun 28, 2016

The goal is to be able to present this to users and get some feedback to see if that's what people want

@imjalpreet
Member

@oparoz I have made all the changes you asked for.

@imjalpreet - Did you complete this task?

We need one field for the GPS location or maybe 2 if we go with latitude,longitude. The best thing to do is to use best practices (Google, Twitter, etc.) since people are already used to those APIs.

What exactly do you want me to do here? Should I find out what practices are used in Google and Twitter for GPS location?

@oparoz
Contributor Author

oparoz commented Jun 28, 2016

Should I find out what practices are used in Google and Twitter for GPS location?

Exactly. Look at their APIs and see what they return to clients asking for a piece of information, like a tweet or Facebook update, etc.

@imjalpreet
Member

Exactly. Look at their APIs and see what they return to clients asking for a piece of information, like a tweet or Facebook update, etc.

Okay, I will have a look at it.

@oparoz
Contributor Author

oparoz commented Jun 28, 2016

Thanks!

@imjalpreet
Member

@oparoz This is the response I get from the Facebook Graph API if, for example, I request the current location of one of my friends:

"current_location": {
    "city": "Mumbai",
    "state": "Maharashtra",
    "country": "India",
    "zip": "",
    "latitude": 18.975,
    "longitude": 72.8258,
    "id": "114759761873412",
    "name": "Mumbai, India"
}

What are your views on this?

@oparoz
Contributor Author

oparoz commented Jun 28, 2016

Try to get a few more, like Twitter, but it seems like latitude and longitude are all we would need.

@imjalpreet
Member

@oparoz I looked into it and found that Twitter also uses only latitude and longitude for the location.
The JSON object returned by the Twitter API is of this form:
"geo": { "type":"Point", "coordinates":[37.78029, -122.39697] }

Resource: Link

So, I think we can go forward with latitude and longitude only.
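
Both responses quoted above reduce to the same two numbers, which is the case for storing just a latitude and a longitude. A minimal sketch of normalising the two shapes follows. The function names are hypothetical, the payloads are trimmed copies of the snippets in this thread, and the assumption that the "geo" array is ordered latitude first is inferred from the sample values rather than checked against the API documentation.

```python
import json


def lat_lon_from_facebook(payload):
    """Read latitude/longitude from a response shaped like the Facebook snippet."""
    location = payload["current_location"]
    return location["latitude"], location["longitude"]


def lat_lon_from_twitter(payload):
    """Read latitude/longitude from a response shaped like the Twitter snippet.

    Assumes the "coordinates" array is ordered [latitude, longitude].
    """
    latitude, longitude = payload["geo"]["coordinates"]
    return latitude, longitude


facebook = json.loads(
    '{"current_location": {"city": "Mumbai", "latitude": 18.975, "longitude": 72.8258}}'
)
twitter = json.loads('{"geo": {"type": "Point", "coordinates": [37.78029, -122.39697]}}')

facebook_position = lat_lon_from_facebook(facebook)
twitter_position = lat_lon_from_twitter(twitter)
```

Either result maps directly onto two numeric database fields.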

@oparoz
Contributor Author

oparoz commented Jun 30, 2016

OK, let's go with that

@imjalpreet
Member

@oparoz Okay, so should I update the database.xml file with the finalized fields?

@imjalpreet
Member

@oparoz Do you want any more changes?

imjalpreet added a commit that referenced this issue Jul 4, 2016
@oparoz
Contributor Author

oparoz commented Jul 5, 2016

There is still nothing in the OP or the wiki. We need to be able to quickly find the information. Please do that ASAP, then we can ask people for their opinion, then we can update the fields.

@imjalpreet
Member

@oparoz Can you briefly explain what I should add to the Wiki?

@oparoz
Contributor Author

oparoz commented Jul 5, 2016

Sure. Right now we need one page with the links to all the descriptions of the fields we can pick from.
In this OP, you can add a link to the wiki and a link to your spreadsheet.

As a general rule, when you find something useful, just add it to the wiki. It can be the code in core that you used as a reference, for example, making it easy for someone else to quickly find some sort of reference document.

@oparoz
Contributor Author

oparoz commented Jul 7, 2016

The spreadsheet needs to be fixed to reflect your latest findings. We only need 2 fields for the GPS coordinates.

@imjalpreet
Member

@oparoz I forgot to tell you that I had updated this.

@oparoz
Contributor Author

oparoz commented Jul 9, 2016

OK. In column F, you need to add the database fields for the GPS coordinates

oparoz added a commit that referenced this issue Jul 24, 2016