Criteria for what items are significant enough to include in the deck? #137

Open
JohnHBrock opened this issue Sep 1, 2019 · 18 comments
Labels
conception Scope of the deck, memorisation, contribution guidelines, etc.

Comments

@JohnHBrock
Contributor

The deck as it stands is impressively comprehensive, but as I think about what other cards could be added, I wonder where to draw the line. For example, I'd love to include Canadian provinces or maybe the individual islands of Hawaii, but if we start adding sub-national items like provinces/states/departments (or parts of states, in the case of the Hawaiian islands) for every country in the world, the deck will grow very large.

Maybe the rule should be that we only include provinces/states/departments/other subregions if they're exclaves/enclaves or semi-exclaves/enclaves. This seems consistent with the current deck, e.g., French Guiana, Ceuta, Melilla, the Caribbean Netherlands, etc. If this were the rule, then, for example, we would have cards for Hawaii and Alaska, but not cards for individual Hawaiian islands or for British Columbia (since British Columbia is part of the Canadian mainland).

I have a similar question for bodies of water. I want to start contributing various lakes and rivers to the deck, but I'm wondering what the criteria should be for what's OK to include. For example, Lake Superior and Lake Victoria seem like obvious candidates. But what about the much smaller Lake Tahoe and Lake Balaton? Should we include Lake Victoria, but not Lake Malawi? Maybe volume or surface area should be criteria? The deck currently includes the relatively small Sea of Galilee and Dead Sea, but maybe small is OK if there's historic or religious significance?

So what thoughts do you all have on guidelines for what to contribute, @axelboc in particular?

@Yorwba

Yorwba commented Sep 1, 2019

Regarding sub-national items: I wrote a Python script to generate decks for exactly that purpose, and this issue prompted me to open-source it as anki-wikidata-geography. I don't include information like capitals or flags yet, but that shouldn't be hard to add if there's demand (except for the fact that those don't always exist at the sub-national level). Bodies of water might also be possible to support, depending on whether that information is easily available from Wikidata.

@axelboc
Collaborator

axelboc commented Sep 2, 2019

Cool stuff @Yorwba!

It's a tough one, @JohnHBrock. Honestly, the current notes of non-sovereign states (Guadeloupe, Java, Sardinia, etc.) are just a legacy of the original Ultimate Geography deck.

I don't think adding internal divisions is plausible. We can't possibly add every country's internal divisions or the deck would become unusable (and which divisions do we add? France has regions and departments, for instance), so we'd have to decide for which countries to add internal divisions, which is way too subjective and biased.

Current inclusions in the deck seem to be based on the following (very, very) rough criteria:

  • size (e.g. Greenland, Corsica, Java, French Guiana, etc.)
  • exhaustiveness (e.g. every island in the Lesser Antilles)
  • self-governance (e.g. Ceuta, Isle of Man, etc.)
  • distance/separation from mainland.

I don't think defining exact, objective criteria is plausible either, so I think it comes down to looking at each case one by one and deciding as a community. Personally, I'd be happy to see Alaska and Hawaii being added, but not Hawaiian islands for instance.

When it comes to water bodies, there are many, many missing. The lack of a good source for maps is a problem, but the criteria for inclusion, to me, should simply be size and length: find a list of all the seas, lakes, gulfs, rivers, etc. ordered by area or length on Wikipedia and take the first <some arbitrary number(s) here> of each. The Dead Sea and the Sea of Galilee are inland seas, which makes them a bit special, but their inclusion is debatable. @aplaice made some nice maps for them, though, so it'd be a shame to remove them. 😄

@axelboc added the conception label (Scope of the deck, memorisation, contribution guidelines, etc.) on Sep 2, 2019
@axelboc changed the title from "RFC: Criteria for what items are significant enough to include in the deck?" to "Criteria for what items are significant enough to include in the deck?" on Sep 15, 2019
@aplaice
Collaborator

aplaice commented Sep 19, 2019

> So what thoughts do you all have on guidelines for what to contribute, @axelboc in particular?

I'm obviously not axelboc, but I am interested in having great anki decks/cards on these topics, though not necessarily in the main Anki Ultimate Geography deck, or even under the Anki Ultimate Geography "aegis", but ideally following the same high standards regarding quality and consistency.

Sub-national regions that are not exclaves etc.

> I don't think adding internal divisions is plausible. We can't possibly add every country's internal divisions or the deck would become unusable (and which divisions do we add? France has regions and departments, for instance), so we'd have to decide for which countries to add internal divisions, which is way too subjective and biased.

Fully agree!

anki-wikidata-geography is amazing! Another existing possible, partial alternative is "Western Geography", which appears to have been modelled on UG, but I haven't investigated it in detail.

Bodies of water

> I have a similar question for bodies of water. I want to start contributing various lakes and rivers to the deck, but I'm wondering what the criteria should be for what's OK to include.

It would be great to have a nice anki deck with rivers and lakes, though as @axelboc noted, a serious issue is the lack of consistent "upstream" maps. I'm agnostic as to whether it should be part of UG (I can see three options:

  1. part of the main deck,

  2. not in the main deck but managed in the anki-ultimate-geography repository, with anki-dm excluding the relevant cards by tag, when building the main deck (though AFAIR anki-dm doesn't have this capability at the moment),

  3. in a separate repository, but (ideally) with a similar structure to `anki-ultimate-geography`.)
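To make the second option concrete, here is a minimal sketch of excluding notes by tag at build time. It does not use or describe anki-dm's actual behaviour (which, as noted above, does not appear to support this); the note structure, the `exclude_by_tag` helper and the tag names are all hypothetical.

```python
def exclude_by_tag(notes, excluded_tags):
    """Keep only the notes that carry none of the excluded tags."""
    excluded = set(excluded_tags)
    return [note for note in notes if excluded.isdisjoint(note["tags"])]


# Hypothetical notes; the tag names are made up for illustration.
all_notes = [
    {"name": "France", "tags": ["UG::Sovereign_State"]},
    {"name": "Lake Victoria", "tags": ["UG::Lakes"]},
    {"name": "Nile", "tags": ["UG::Rivers"]},
]

# The main deck would be built from everything except the water-body tags.
main_deck_notes = exclude_by_tag(all_notes, ["UG::Lakes", "UG::Rivers"])
main_deck_names = [note["name"] for note in main_deck_notes]
```

With this approach the extra notes would live in the same repository, and only the build step would decide which of them end up in the main deck.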

Two existing decks are "Rivers, Lakes, Seas, and Oceans", and "International waters: Ocean, Sea etc." but again I haven't investigated them in detail yet.

Sub-national regions that are "exclaves" etc.

> Current inclusions in the deck seem to be based on the following (very, very) rough criteria:
>
>   • size (e.g. Greenland, Corsica, Java, French Guiana, etc.)
>   • exhaustiveness (e.g. every island in the Lesser Antilles)
>   • self-governance (e.g. Ceuta, Isle of Man, etc.)
>   • distance/separation from mainland.

In this category I'd like to add some cards to the main UG deck, based on these criteria.

Islands

IMO the best criterion here would be based on population, rather than surface area (possibly (?) with some minimal surface area cut-off to exclude tiny islands that are parts of cities, like New York's Long Island or Mumbai's Salsette). I'm basing this judgement on the fact that, looking at the islands with greatest surface area, I don't really care that I wouldn't be able to pinpoint Canada's Baffin Island, Victoria Island or Ellesmere Island (and would find it relatively hard to motivate myself to learn them), while I am ashamed that I can't distinguish between Kyushu and Shikoku with 100 % accuracy and don't even know the names of the Philippine islands other than Luzon and Mindanao.

A further argument in favour of focusing on population rather than area is that, in general, UG's focus has been political, rather than natural, geography.

OTOH that's my biased opinion, and it might still make sense to include the top ~ 20 largest islands, in addition to, say, the top ~ 40 most populous ones. (For comparison, Sicily is 45th by area and 22nd by population, Sardinia is 48th and 42nd, while Corsica is 83rd and 107th.)

Some questions to clarify the criteria:

  1. Do we include non-autonomous islands? (IMO yes, since we already have some of Indonesia's islands, which aren't autonomous.)

  2. Do we include islands that aren't far from the "mainland" (if such even exists)? (IMO yes, again since we already have Indonesia's islands and also islands like Sicily and Sardinia, which aren't really far from Italy.)

  3. Do we include islands shared among several countries, such as Borneo or Hispaniola? (Leaning towards yes.)

  4. Do we include "tiny" islands, such as Salsette (Mumbai, India), Long Island (New York, USA), Pulau Ujong (Singapore)? (Leaning towards no, but it's based on my rather arbitrary feeling.)

  5. How many of the largest and most populous islands do we include? (Arbitrarily, say 20 largest and 40 most populous?) Ideally, we'd use some weighted combination of area and population, and rank based on that, but it's probably overkill...
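As a rough illustration of the "weighted combination of area and population" idea in the last question, the sketch below ranks islands by a weighted sum of their area rank and population rank. The `Island` class, the function names, the 0.7 weight and the sample figures are all assumptions made up for this example, not values proposed in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Island:
    name: str
    area_km2: float
    population: int


def rank_positions(islands, key):
    """Map each island name to its 1-based rank by `key`, largest first."""
    ordered = sorted(islands, key=key, reverse=True)
    return {island.name: pos for pos, island in enumerate(ordered, start=1)}


def weighted_ranking(islands, population_weight=0.7):
    """Order islands by a weighted sum of their population and area ranks.

    A lower combined score is better. The default weight is arbitrary.
    """
    by_area = rank_positions(islands, key=lambda i: i.area_km2)
    by_pop = rank_positions(islands, key=lambda i: i.population)

    def score(island):
        return (population_weight * by_pop[island.name]
                + (1 - population_weight) * by_area[island.name])

    return sorted(islands, key=score)


# Placeholder figures, deliberately not real data.
sample = [
    Island("Large but empty", area_km2=500_000, population=10_000),
    Island("Small but crowded", area_km2=600, population=20_000_000),
    Island("Middling", area_km2=50_000, population=5_000_000),
]
ranked_names = [island.name for island in weighted_ranking(sample)]
area_only_names = [island.name for island in weighted_ranking(sample, population_weight=0.0)]
```

Combining ranks rather than raw values avoids having to put square kilometres and head counts on a common scale, at the cost of ignoring how large the gaps between islands are.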

Lesser Antilles

>   • exhaustiveness (e.g. every island in the Lesser Antilles)

In that case, should we include Saint Martin and Saint Barthélemy? Their populations are 35,107 and 9,625, respectively, compared to 33,609 and 1,991 for Sint Maarten and Saba. From a political point of view, they're Overseas collectivities (COMs), just like French Polynesia (though according to Wikipedia, French Polynesia has a great degree of autonomy, while they don't).

@JohnHBrock
Contributor Author

Thanks for everyone's thoughts!

In general, deciding what to include based on rank seems reasonable.

For bodies of water, I'd prefer them to be part of this repo, but I'm indifferent to whether the main deck includes them or not.

For islands, I agree with @aplaice that we should continue to include non-autonomous islands, islands close to the mainland, and islands shared among several nations. I think it's OK to include tiny islands if they're sufficiently populated. I'm also fine including sparsely populated islands if they're sufficiently large in surface area. For starters, how about we aim to cover the union of top 25 islands by surface area and top 25 islands by population?
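The "union of top 25 by area and top 25 by population" rule can be sketched as follows. The dictionary layout, the helper names and the sample values are assumptions for illustration only; real figures would have to come from a source such as Wikipedia's island lists.

```python
def top_names(islands, field, n):
    """Return the names of the n islands with the largest value of `field`."""
    ordered = sorted(islands, key=lambda island: island[field], reverse=True)
    return {island["name"] for island in ordered[:n]}


def select_islands(islands, n_area=25, n_population=25):
    """Union of the top islands by surface area and the top islands by population."""
    return top_names(islands, "area_km2", n_area) | top_names(islands, "population", n_population)


# Placeholder data, not real figures.
islands = [
    {"name": "A", "area_km2": 100, "population": 1},
    {"name": "B", "area_km2": 90, "population": 50},
    {"name": "C", "area_km2": 10, "population": 500},
    {"name": "D", "area_km2": 5, "population": 2},
]
selected = select_islands(islands, n_area=2, n_population=2)
```

Because islands that appear in both lists are counted once, the union of two top-25 lists would yield between 25 and 50 notes.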

For the Lesser Antilles, the deck is so thorough that I think we might as well add Saint Martin and Saint Barthélemy to be consistent.

@axelboc
Collaborator

axelboc commented Sep 21, 2019

> OTOH that's my biased opinion, and it might still make sense to include the top ~ 20 largest islands, in addition to, say, the top ~ 40 most populous ones. (For comparison, Sicily is 45th by area and 22nd by population, Sardinia is 48th and 42nd, while Corsica is 83rd and 107th.)

Far out! Never thought Corsica would rank so low. I must not know very many islands in the top 20/40...

Honestly, I'm concerned that if we add too many islands, natural regions, bodies of water, etc. the deck will lose its focus on what it's really good at, and what most people are interested in learning: sovereign states. I'm biased, obviously, but I think it's by far (and has been for some time) the best deck out there for learning countries, capitals, flags and locations of the world's sovereign states. It's exhaustive, up to date and starting to be translated into other languages. I think having too many extra notes would scare off users who are mostly interested in sovereign states, especially if they're not familiar with Anki and creating filtered decks.

I also don't see the benefit of using this repository for storing notes for other decks. It would dilute contributions and complicate the file structure. Why not set up Anki DM in a separate repository and manage a deck dedicated to water bodies in there?

All in all, I don't think the deck should grow much more. It should remain focused on sovereign states, with just enough general geography knowledge on the side to make it more fun and make you want to learn more. Here is what I suggest we do, in order:

  1. Let's start by clarifying which types of notes the deck should include so that we can define criteria for each of them.
     • For water bodies, the obvious ones are oceans, seas, rivers and lakes, but there are other potential candidates, like bays -- is it worth including them?
     • Overseas territories, dependent areas, etc. are very vague terms in the sense that some are islands while others are cities, some are inhabited while others are not, and so on. Most importantly, they're political terms. Should we focus on geographical features instead, like islands, archipelagos, peninsulas, continents, etc.? We can't really go for both political and geographical, or we'd have duplicates... I think geographical features are more in line with our current trend of becoming more "objectively ultimate". 😄 I'd argue that many of the dependent territories currently in the deck would be better suited to the Western Geography deck that you mention, @aplaice. Also, water bodies are (mostly) geographical features, so it would be more consistent to have two big types of notes: sovereign states and geographical features.
     • What about ... mountain ranges?
  2. For each note type chosen in the first step, we'll then need to pick a target number of notes to include in the deck. The deck currently has 39 notes for water bodies, and 74 for overseas territories, dependent areas, regions, continents, etc. I think we could go up to at most 150 notes in total (i.e. fewer than sovereign states).
  3. Once this is done, we can define an objective criterion for each note type (e.g. combination of population and area for islands) and select the notes accordingly. Let's not be scared of removing notes. If we decide to include, say, 40 islands, and Corsica, Sardinia, and Sicily don't make the cut, so be it! There's no point in adding 100 impossible-to-remember islands just because some notes were arbitrarily added to the deck a while ago.

Do you think this process makes sense?

@aplaice
Collaborator

aplaice commented Sep 21, 2019

I think that this approach of not extending this deck too much makes sense on the whole, though the exact details might be hard to get right.

One issue with objective criteria is that it's sometimes hard to pick meaningful ones.

Particularly in the case of seas and water bodies, using the most obvious and arguably only easily measurable metric (surface area) might not be ideal. For instance, the surface area of the Arabian Sea is far greater than that of the Persian Gulf or the Red Sea, but IMO the locations of the latter are far more important to know. In the case of straits, surface area is not even well-defined.

Another problem is that if dependent/autonomous regions are to be removed, what happens to self-proclaimed but not internationally recognised states? On the spectrum from "sovereign state" to "just a province of a sovereign state, with no special rights", they're slightly closer to "sovereign state" than an autonomous region would be, but often not by much. Also, what about weird edge cases like the Isle of Man, the constituent countries of the Kingdom of the Netherlands or Hong Kong?

> Should we focus on geographical features instead, like islands, archipelagos, peninsulas, continents, etc.? We can't really go for both political and geographical, or we'd have duplicates...

That might be sensible (though we'll still have scope for duplicates, given the many countries that take up an island, most of an island or have islands named after them).

Additionally, for the geographical features, should we use a different map convention, since the current one is focused on the political boundaries? (Perhaps something approximately like this?)

> What about ... mountain ranges?

and mountain peaks? :) (and deserts(?))

(I'm ambivalent about adding mountain ranges, mountain peaks, rivers, lakes etc. to this deck, since it doesn't really contain any yet. On the one hand, I can see the argument about piquing people's curiosities with a couple of each. On the other hand, adding just a couple might feel incomplete. Perhaps just the very largest/longest per continent?)

@Erim24
Collaborator

Erim24 commented Sep 22, 2019

Overall, I have to say that I mostly agree with @axelboc.
I think the deck should not grow much more. For example, the number of islands (especially in Oceania and the Caribbean) is already a lot to learn.
In that sense, I don't think we should add many more sub-national and non-autonomous islands/territories. In my opinion, the following things should be included:

  • big geographical entities. For example
    • some of the main islands of some countries (e.g. Indonesia, Japan?, Hispaniola)
    • main mountain ranges (Andes, Alps, Himalaya and some others)
    • deserts
    • the main water bodies
  • smaller geographical entities that are important,
    e.g. the Dead Sea, Sea of Galilee, Gulf of Mexico, ...

@axelboc added this to the v3.3 milestone Sep 22, 2019
@axelboc removed this from the v3.3 milestone Nov 17, 2019
@mighty-cthulhu

What about splitting this deck into two? One for political geography and the other one for physical geography.

Political geography deck would contain all the countries, territories, dependencies, and other geopolitical areas that currently constitute the bulk of the Ultimate Geography deck.

Physical geography deck would contain cards about continents, oceans, seas, bays, gulfs, straits, rivers, islands, peninsulas, mountain ranges, deserts, glaciers, etc.

@ohare93
Member

ohare93 commented Nov 23, 2019

> What about splitting this deck into two? One for political geography and the other one for physical geography.
>
> Political geography deck would contain all the countries, territories, dependencies, and other geopolitical areas that currently constitute the bulk of the Ultimate Geography deck.
>
> Physical geography deck would contain cards about continents, oceans, seas, bays, gulfs, straits, rivers, islands, peninsulas, mountain ranges, deserts, glaciers, etc.

I do not like this idea in principle 🙁 I don't see what we'd gain from doing so. Furthermore, what about physical features whose names are disputed between countries, generally for political reasons? The Sea of Japan for instance: https://en.wikipedia.org/wiki/Sea_of_Japan_naming_dispute

@axelboc
Collaborator

axelboc commented Apr 28, 2020

Difficult to find a way forward for this issue. We've discussed a lot of things... If I may, I'm going to try to summarise what we mostly agree on:

  1. Keep both political and physical geography in the same deck (sorry @mighty-cthulhu 😅).
  2. The political geography side of the deck is pretty good as it is. It needs stricter inclusion rules, but it doesn't need to change much overall -- dependent territories stay in; country subdivisions stay out (e.g. US states).
  3. The physical geography side of the deck can be extended, within reason, with more entities (water bodies, islands, mountain ranges, deserts, etc.). Each entity type must come with its own set of inclusion rules that significantly restricts the number of notes the entity type brings to the deck.

From this, I can think of the following actions:

  1. Come up with inclusion rules for political entities other than sovereign states, then add/remove notes as required.
  2. Come up with inclusion rules for water bodies, then remove notes as required.
  3. Come up with inclusion rules for islands, then remove notes as required.
  4. Long term: add notes for islands and water bodies as per new inclusion rules.
  5. Long term: consider including more types of physical entities (deserts, mountain ranges, etc.)

If these actions look alright to all of you, I'll open up issues for the first three and close this issue.

@The-Wap

The-Wap commented May 22, 2020

Hi everyone,

as I just love using the extended German version of this awesome deck (thanks for all the hard work to all the contributors!!), I just wanted to drop my 2 cents into this discussion:

  1. Regarding the sub-country level detail (like all the federal states), I also think it would be overkill to include them in the deck (for most users), and it would somewhat dilute the deck. There are already some great decks of that kind available for Anki for a lot of countries, if needed. And with a few modifications and tweaks (like using the UG card template), you almost get the feeling that they belong to this deck (so far I've done this for decks of Japanese prefectures, German federal states and American states) ;)

  2. Regarding the bodies of water, I am currently using the "Rivers, Lakes, Seas, and Oceans" deck mentioned by @aplaice. I deleted most of the cards that were already present in the UG deck, translated the rest into German and also modified it to match the UG cards (although I also changed the UG design a bit, e.g. to embed the corresponding Wiki articles in an iframe so I can learn more about population etc. if I want to). The "Rivers, Lakes, Seas, and Oceans" deck has really good maps of the water bodies. The deck itself is an export of the ""Rivers, Lakes and Seas" deck by azrael42 on Memrise.
    Most of the location maps were created by azrael42 and are based on the work of the following Wikipedia Commons users:
    Africa: Sting (Eric Gaba), Middle East, Central America and Australia: Виктор В, Europe: Alexrk2, New Zealand, Sulawesi: NordNordWest, Japan: Chumwa, all remaining maps (!): de.wikipedia.org/wiki/Benutzer:Uwe_Dedering
    Please see here for more information on the original deck." (from the description of the anki deck).
    Maybe there is a chance to "just include" this deck in UG, with the appropriate attribution (I have no clue about the licensing)? If so, I can provide my translated and slightly modified private deck, so that the translation work isn't done twice (at least into German). I can export a dual-language deck (with both English and German names included).

  3. Mountain ranges and deserts would be epic as well :)

@gitonthescene

gitonthescene commented Jun 27, 2020

Hello,

I've been playing with the Wikidata SPARQL interface quite a bit lately. The data there isn't nearly as complete as the data on Wikipedia (presumably because someone needs to scrape and organize the data), but we could help! ;-)

In any event, I wrote a query for searching for lakes with an area greater than a threshold.

This makes an argument for including Lake Chad and Lake Superior. (Lake Agassiz is no longer, but I couldn't find a sensible way to filter it out). Conversely if you lower the threshold to include the Dead Sea (currently in the deck), you then have an argument for adding 284 more lakes. If you lower it still to include the Sea of Galilee you're up to nearly 500 lakes of comparable size.

In order to capture all the Great Lakes of the US you'd have 62 lakes of comparable size. I wouldn't mind learning all of those, but perhaps it's better to make a separate Bodies of Water deck?

(As a super quick introduction to the SPARQL language: the database is comprised of "statements", which are triples of (subject, verb, object). In the link you can hover over stuff to find out what it represents. wdt:P31 means "instance of", where wdt: introduces a property (i.e. verb). wd: introduces an "entity". Queries are statements with variables like ?territory, and the results are the values of all the sets of variables which make all the statements true. Lastly, if you start a name like wd:Caspian and hit Ctrl+Space, you get a drop-down menu of possible completions. This much got me functional.)
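The linked query is not reproduced in this thread, so the following is only a guess at its general shape: a small Python helper that builds a SPARQL query for lakes above an area threshold, suitable for pasting into the Wikidata Query Service. It assumes wd:Q23397 ("lake") and wdt:P2046 ("area") are the relevant identifiers; `build_lake_query` and its parameters are made up for this example.

```python
def build_lake_query(min_area, limit=100):
    """Build a SPARQL query for Wikidata lakes whose area exceeds `min_area`.

    Assumption: wdt:P2046 returns the raw stated amount, so values recorded
    in different units are not normalised by this query.
    """
    return f"""
SELECT ?lake ?lakeLabel ?area WHERE {{
  ?lake wdt:P31 wd:Q23397 ;   # instance of: lake
        wdt:P2046 ?area .     # area, as stated on the item
  FILTER(?area > {min_area})
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY DESC(?area)
LIMIT {limit}
""".strip()


query = build_lake_query(5000)
print(query)
```

Each line in the WHERE block is one of the triples described above: `?lake` is the variable, `wdt:P31` and `wdt:P2046` are the verbs, and the FILTER keeps only the results above the threshold. Matching `wdt:P31 wd:Q23397` exactly would miss items tagged with a subclass of lake, which may be one reason results look incomplete.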

@gitonthescene

Similarly, this query makes an argument for including the Gulf of Maine and the Gulf of Saint Lawrence.

@gitonthescene

gitonthescene commented Jun 27, 2020

It's interesting to note that none of the states in this query are tagged as sovereign state, but instead are tagged as state with limited recognition. That designation pulls in a few other states not listed on the wiki for sovereign states including Donetsk and Kurdistan.

I'll do a write-up separately (to not clutter this discussion), but I've got a set of queries which comes within 100 entries of duplicating the current deck; more than two-thirds of the difference is in bodies of water. The nice thing about using the queries is that anyone can run them or tweak them, so they serve as a pretty universal basis for discussion. The not nice thing is that not many people know SPARQL.

@aplaice
Collaborator

aplaice commented Jun 27, 2020

Thanks for this! It's really useful to play around with (though much of the data is currently* missing or wrong)!

* as you point out we could help with this :)

However, in the case of water bodies, I'm not convinced that a global pure surface area criterion would be sufficient.

For instance, the "surface area" of the Bering Strait is tiny, but the Bering Strait should definitely be included. Also, the area of the Persian Gulf is smaller than that of the Arabian Sea, but if I were to have to choose only one for inclusion, I'd pick the Persian Gulf.

I personally wouldn't care too much about the removal of the Dead Sea and not at all about the removal of the Sea of Galilee, but I've seen people explicitly mention them as entities they would like kept (probably due to their cultural importance).

Seas that are near to densely-populated areas might be of more interest than seas off the coast of Antarctica.

In some cases, but not all of them, I can see easy ways of "patching" the criteria:

  1. Include straits between continents.

  2. Have a different criterion for Gulfs than for Seas

    • but what about seas that are, in effect, gulfs without being called such?

One issue regarding the surface area data that's particularly a problem with non-bounded water bodies (seas etc.) is that the limits of the seas aren't well-defined.

> Similarly, this query makes an argument for including the Gulf of Maine and the Gulf of Saint Lawrence.

Unfortunately, the area for the Gulf of Mexico in Wikidata is absolutely incorrect.


All of the above obviously isn't to say that the Wikidata queries aren't useful — they definitely are, so thanks!

@gitonthescene

Agreed. I think it would be tough to consider the queries anything more than a tool given that the data can be sketchy. I.e. I wouldn't declare anything coming from the queries definitive. At the very least it brings things into the discussion, like "why not Lake Chad?"

@carteryott

Just dropping my two cents here... I agree that this deck is large enough, but have you considered making a second deck with geographical features such as deserts, lakes, rivers, mountain ranges, etc.? I have no idea how much effort this would take, but the same layout should be used. I just don't think it would be plausible to expand this deck to incorporate everything. For example, there are a number of historically, culturally, and religiously relevant bodies of water, deserts, mountain ranges, etc. that would never get to be included in this deck because they are insignificant in terms of size. I think the islands should stay in this deck; they are landmasses after all.

@ohare93
Member

ohare93 commented Feb 21, 2021

> Just dropping my two cents here... I agree that this deck is large enough, but have you considered making a second deck with geographical features such as deserts, lakes, rivers, mountain ranges, etc.

We have indeed! 😁 That is the future goal for Brain Brew, the new deck manager of UG: to allow multiple different deck recipes to be combined at the will of the user. In this case you would make a git repo containing only the new lakes, rivers, etc. that you (and others) are interested in, then you could sync that into your own main UG deck while maintaining the ability to keep in sync with the main changes. That's the goal anyways! 😁 It may be a while before we get there. Read ohare93/brain-brew#4 (comment) (the "Federation of UG" section) for more details on this 👍
