This repository describes the BISON framework (Blockchain Interpretable Success prediction for SOcial media NFTs), which leverages linguistic statistics and blockchain-derived features to model and explain the success of blockchain-native articles (e.g., writing NFTs).
The features used in the model are listed below, categorized by type. They feed a "multimodal", explainable ML pipeline that predicts and interprets success on decentralized content platforms.
These features are extracted from the textual content of articles:
Kincaid Grade Level
: U.S. grade level required to understand the text.
Flesch Reading Ease
: Score from 0–100; higher means easier.
Gunning Fog Index
: Years of formal education required.
characters_per_word
: Average characters per word
syll_per_word
: Average syllables per word
words_per_sentence
: Average words per sentence
sentences_per_paragraph
: Average sentences per paragraph
type_token_ratio
: Lexical diversity (% of unique word types over total tokens)
characters
: Total character count
syllables
: Total syllable count
words
: Number of content tokens
wordtypes
: Number of unique content word types
sentences
: Total sentence count
paragraphs
: Total paragraph count
long_words
: Words with more than 6 letters
complex_words
: Polysyllabic and uncommon words
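The three readability scores can be derived from the raw counts listed above. The following is a minimal sketch using the standard published formulas for Flesch-Kincaid Grade Level, Flesch Reading Ease, and Gunning Fog. Whether BISON computes them in exactly this way is an assumption, and the function name `readability_scores` is hypothetical.

```python
def readability_scores(words, sentences, syllables, complex_words):
    """Compute standard readability scores from raw text counts.

    Assumption: these are the textbook formulas, not necessarily the
    exact implementation used in this repository.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")

    words_per_sentence = words / sentences
    syll_per_word = syllables / words

    return {
        # Flesch-Kincaid Grade Level
        "kincaid": 0.39 * words_per_sentence + 11.8 * syll_per_word - 15.59,
        # Flesch Reading Ease (higher means easier)
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syll_per_word,
        # Gunning Fog Index (uses the share of complex words, in percent)
        "gunning_fog": 0.4 * (words_per_sentence + 100.0 * complex_words / words),
    }


# Example with made-up counts: 100 words, 5 sentences, 150 syllables, 10 complex words
scores = readability_scores(words=100, sentences=5, syllables=150, complex_words=10)
```

Note that the classic Gunning Fog definition counts words of three or more syllables as complex, which may differ slightly from the `complex_words` definition used here.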
cleaned_text
: Text after removing noise and irrelevant characters
language
: Detected language of the text
cleaned_body
: Cleaned version of the article body text
cleaned_title
: Cleaned article title
processed_cleaned_text
: Text after further processing like normalization
cleaned_text_tokenized
: Tokenized version of the cleaned text
cleaned_text_lemmatized
: Lemmatized tokens (base forms of words)
cleaned_text_POS
: Part-of-speech tagging of tokens
cleaned_text_sentiment
: Sentiment score derived from text
words_body
: Word count in the article body
words_title
: Word count in the title
words_text
: Total word count combining title and body
normalized_tfidf_sum
: Normalized sum of TF-IDF scores across document
verbs_density
: Density of verbs in text
adjectives_density
: Density of adjectives in text
nouns_density
: Density of nouns in text
Thematic topics extracted from articles:
topic
: (categorical) Represents the main topic of the article. Possible values are: T1: Gaming, Virtual Worlds & Characters; T2: Wallets, Airdrops & Ethereum Tools; T3: Web3, Blockchain & Digital Platforms; T4: DeFi, Market Strategies & Liquidity; T5: Blockchain, Transactions & Smart Contracts; T6: Web3 Launches, Rewards & Creators; T7: Human Thoughts, Emotions & Reflections.
topic_T1: Gaming, Virtual Worlds & Characters
topic_T2: Wallets, Airdrops & Ethereum Tools
topic_T3: Web3, Blockchain & Digital Platforms
topic_T4: DeFi, Market Strategies & Liquidity
topic_T5: Blockchain, Transactions & Smart Contracts
topic_T6: Web3 Launches, Rewards & Creators
topic_T7: Human Thoughts, Emotions & Reflections
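The `topic_T1` to `topic_T7` columns look like one-hot indicators derived from the categorical `topic` feature. Assuming that is the case, the encoding could be sketched as follows. The helper name `one_hot_topic` is hypothetical, and the column names are assumed to be the prefix `topic_` followed by the full topic label.

```python
# Topic labels as listed in this README.
TOPIC_LABELS = [
    "T1: Gaming, Virtual Worlds & Characters",
    "T2: Wallets, Airdrops & Ethereum Tools",
    "T3: Web3, Blockchain & Digital Platforms",
    "T4: DeFi, Market Strategies & Liquidity",
    "T5: Blockchain, Transactions & Smart Contracts",
    "T6: Web3 Launches, Rewards & Creators",
    "T7: Human Thoughts, Emotions & Reflections",
]


def one_hot_topic(topic):
    """Return a dict of 0/1 indicator columns for a single topic label.

    Assumption: each article has exactly one main topic.
    """
    if topic not in TOPIC_LABELS:
        raise ValueError(f"unknown topic: {topic!r}")
    return {f"topic_{label}": int(label == topic) for label in TOPIC_LABELS}


row = one_hot_topic("T3: Web3, Blockchain & Digital Platforms")
```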
For each keyword (nft, web3, community, blockchain, crypto, wallet, chain):
<keyword>
: Indicates the presence (1) or absence (0) of the keyword in the article text.
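A minimal sketch of how such keyword indicators could be built is shown below. It assumes case-insensitive, whole-word matching, so "chain" is not counted inside "blockchain". The actual matching rule used in BISON is not stated here, and the function name `keyword_flags` is hypothetical.

```python
import re

KEYWORDS = ["nft", "web3", "community", "blockchain", "crypto", "wallet", "chain"]


def keyword_flags(text):
    """Return a 0/1 flag per keyword based on whole-word presence in the text."""
    # Lowercase the text and split it into alphanumeric tokens.
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return {kw: int(kw in tokens) for kw in KEYWORDS}


flags = keyword_flags("My NFT lives on the blockchain.")
```

With substring matching instead of whole-word matching, the `chain` flag would also fire for every article that mentions "blockchain", which makes the two features nearly redundant.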
days_since_epoch
: Days elapsed since article publication
publication_date
: Full publication date of the article or NFT in YYYY-MM-DD format.
year_month
: Publication date grouped by year and month in YYYY-MM format, useful for temporal aggregation.
year
: Year of publication
month
: Month of publication (values from 1 to 12)
day
: Day of publication
weekday
: Weekday of publication, encoded as 0=Monday, 1=Tuesday, 2=Wednesday, 3=Thursday, 4=Friday, 5=Saturday, 6=Sunday
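The temporal features above can all be derived from the publication date. The sketch below uses only the Python standard library. Two assumptions are made: the function name `temporal_features` is hypothetical, and `days_since_epoch` is taken, based on its name, to count days from the Unix epoch (1970-01-01) to the publication date. The reference date actually used in BISON may differ.

```python
from datetime import date

# Assumption: the reference epoch is the Unix epoch.
EPOCH = date(1970, 1, 1)


def temporal_features(publication_date):
    """Derive temporal features from a YYYY-MM-DD publication date string."""
    d = date.fromisoformat(publication_date)
    return {
        "days_since_epoch": (d - EPOCH).days,
        "publication_date": d.isoformat(),
        "year_month": d.strftime("%Y-%m"),
        "year": d.year,
        "month": d.month,
        "day": d.day,
        # date.weekday() already uses 0=Monday ... 6=Sunday.
        "weekday": d.weekday(),
    }


features = temporal_features("2023-05-15")
```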
These features capture blockchain and crypto ecosystem signals relevant to each article:
For each token (BTC, TETHER, OPTIMISM, ETH, USDC, DAI) at the publication date:
open_<token>_usd
: Opening price
last_<token>_usd
: Closing price
max_<token>_usd
: Daily maximum
min_<token>_usd
: Daily minimum
vol_<token>
: Trading volume
var%_<token>
: Daily % price change
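As an illustration of the daily percentage change, the sketch below assumes that `var%_<token>` is the change of the closing price relative to the previous day's close. This definition is an assumption (the change could also be measured against the same day's open), and the function name `daily_pct_change` is hypothetical.

```python
def daily_pct_change(closes):
    """Return the day-over-day % change for a list of daily closing prices.

    The first day has no previous close, so its change is None.
    """
    if not closes:
        return []
    changes = [None]
    for prev, cur in zip(closes, closes[1:]):
        changes.append((cur - prev) / prev * 100.0)
    return changes


# Example with made-up closing prices in USD
changes = daily_pct_change([100.0, 110.0, 99.0])
```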
daily_transactions_optimism
: Daily transaction count on Optimism network
eth_active_addresses_total
: Total active Ethereum addresses
eth_active_addresses_sender
: Active sending addresses on Ethereum
eth_active_addresses_receiver
: Active receiving addresses on Ethereum
optimism_active_addresses_total
: Total active addresses on Optimism network
author_address
: Wallet address of the author
author_ether_balance
: ETH balance of author's wallet
author_transactions_number
: Total blockchain transactions by author
authorPostCount
: Number of published articles by author
authorTotalSales
: Number of Writing NFTs sold by author
authorTotalRevenue
: Total ETH revenue from NFT sales by author
Author Homepage
: URL of the author's homepage or profile
writing_nft
: Identifier indicating the article is minted as a writing NFT
Total Sold(ETH)
: Total ETH earned from all sales of the NFT
Total Sold Numbers
: Total quantity of NFTs sold
Total Buyers
: Number of unique buyers
Price(ETH)
: Listing or sale price of the NFT
nft_address
: Blockchain address of the NFT contract
collection
: Name of the NFT collection
fees
: Associated fees (e.g., royalties) on NFT sales
created_date
: Date of NFT or article creation
link
: URL to the article or NFT page
digest
: Unique content hash or digest
transaction_id
: Blockchain transaction ID for mint or sale
body
: Raw text body of the article
timestamp
: Timestamp of article or NFT event
title
: Raw article title
week_google_searches_nft
: Google Trends score for "nft" in publication week
week_google_searches_crypto
: Google Trends score for "crypto"
week_google_searches_bitcoin
: Google Trends score for "bitcoin"
week_google_searches_ethereum
: Google Trends score for "ethereum"
week_google_searches_optimism
: Google Trends score for "optimism"
Success
: Numeric indicator of article success
SuccessBinary
: Binary success label (success/failure)