SocialED.utils

utility

A set of utility functions to support social event detection tasks.

SocialED.utils.utility.construct_graph(df, G=None)[source]

Construct a graph from a DataFrame containing social media data.

Parameters:
  • df (pandas.DataFrame) – DataFrame containing social media data with columns: tweet_id, user_mentions, user_id, entities, sampled_words

  • G (networkx.Graph, optional (default=None)) – Existing graph to add nodes/edges to. If None, creates new graph.

Returns:

G – Graph with nodes for tweets, users, entities and words, and edges between them.

Return type:

networkx.Graph
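
As a rough illustration of the kind of graph described above, the following dependency-free sketch builds node and edge sets from rows that use the documented column names. It does not call construct_graph and does not use networkx or pandas. The node prefixes (t_, u_, e_, w_), the helper name build_edges, and the rule that each tweet is linked to its own users, entities, and words are all assumptions made for this example, not confirmed behaviour of the library.

```python
def build_edges(rows):
    """Hypothetical helper: collect nodes and tweet-centred edges.

    rows is an iterable of dicts keyed by the documented column names.
    """
    nodes, edges = set(), set()
    for row in rows:
        tweet = f"t_{row['tweet_id']}"
        # Assumption: the author and the mentioned users are both user nodes.
        users = [f"u_{u}" for u in [row["user_id"], *row["user_mentions"]]]
        entities = [f"e_{e}" for e in row["entities"]]
        words = [f"w_{w}" for w in row["sampled_words"]]
        neighbours = users + entities + words
        nodes.add(tweet)
        nodes.update(neighbours)
        # Assumption: every edge joins a tweet to one of its own attributes.
        edges.update((tweet, n) for n in neighbours)
    return nodes, edges


# Made-up rows, for illustration only.
rows = [
    {"tweet_id": 1, "user_id": 10, "user_mentions": [11],
     "entities": ["Paris"], "sampled_words": ["flood"]},
    {"tweet_id": 2, "user_id": 11, "user_mentions": [],
     "entities": ["Paris"], "sampled_words": ["flood", "rain"]},
]
nodes, edges = build_edges(rows)
print(len(nodes), len(edges))
```

Under these assumptions, two tweets that share a user, entity, or word become connected through that shared node.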

SocialED.utils.utility.tokenize_text(text, max_length=512)[source]

Tokenize text for social event detection tasks.

Parameters:
  • text (str) – The input text to tokenize.

  • max_length (int, optional (default=512)) – Maximum length of tokenized sequence.

Returns:

tokens – List of tokenized words/subwords.

Return type:

list

SocialED.utils.utility.pprint(params, offset=0, printer=<built-in function repr>)[source]

Pretty print the dictionary ‘params’.

Parameters:
  • params (dict) – The dictionary to pretty print

  • offset (int, optional (default=0)) – The offset at the beginning of each line

  • printer (callable, optional (default=repr)) – The function to convert entries to strings

Returns:

Pretty printed string representation

Return type:

str
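
The roles of offset and printer can be sketched as follows. This is a minimal stand-in, not the library's implementation: the helper name format_params, the one-entry-per-line layout, the key=value form, and the sorting of keys are assumptions chosen for the example.

```python
def format_params(params, offset=0, printer=repr):
    """Hypothetical formatter mirroring the documented parameters."""
    indent = " " * offset  # offset is treated as a number of leading spaces
    lines = [
        f"{indent}{key}={printer(value)}"
        for key, value in sorted(params.items())
    ]
    return ",\n".join(lines)


example = format_params({"b": 2, "a": "x"}, offset=2)
print(example)
```

Passing printer=str instead of the default repr would render string values without quotes.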

SocialED.utils.utility.validate_device(gpu_id)[source]

Check that the given GPU ID is valid in the current environment. If no GPU is available, return ‘cpu’.

Parameters:

gpu_id (int) – GPU ID to check.

Returns:

device – Valid device, e.g., ‘cuda:0’ or ‘cpu’.

Return type:

str
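
The decision described above can be sketched without torch by passing the number of visible GPUs explicitly. The helper name choose_device and its gpu_count parameter are hypothetical, and raising ValueError for an out-of-range ID is an assumption about how an invalid ID might be handled.

```python
def choose_device(gpu_id, gpu_count):
    """Hypothetical sketch: map a GPU ID to a device string.

    gpu_count stands in for whatever the runtime reports as the
    number of available GPUs.
    """
    if gpu_count <= 0:
        # No GPU available: fall back to the CPU.
        return "cpu"
    if not 0 <= gpu_id < gpu_count:
        # Assumption: an out-of-range ID is treated as an error.
        raise ValueError(
            f"gpu_id must be in [0, {gpu_count}), got {gpu_id}"
        )
    return f"cuda:{gpu_id}"


print(choose_device(0, gpu_count=2))  # cuda:0
print(choose_device(0, gpu_count=0))  # cpu
```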

SocialED.utils.utility.check_parameter(value, lower, upper, param_name, include_left=True, include_right=True)[source]

Check if a parameter value is within specified bounds.

Parameters:
  • value (int or float) – The parameter value to check

  • lower (int or float) – Lower bound

  • upper (int or float) – Upper bound

  • param_name (str) – Name of the parameter for error messages

  • include_left (bool, optional (default=True)) – Whether to include lower bound in valid range

  • include_right (bool, optional (default=True)) – Whether to include upper bound in valid range

Returns:

True if the value lies within the specified bounds; otherwise a ValueError is raised.

Return type:

bool
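
To make the effect of include_left and include_right concrete, here is a minimal sketch of an inclusive/exclusive bounds check with the same parameters. The name check_bounds and the wording of the error message are hypothetical; only the return-True-or-raise behaviour is taken from the description above.

```python
def check_bounds(value, lower, upper, param_name,
                 include_left=True, include_right=True):
    """Hypothetical stand-in mirroring the documented signature."""
    left_ok = value >= lower if include_left else value > lower
    right_ok = value <= upper if include_right else value < upper
    if not (left_ok and right_ok):
        # Show the interval with [ ] for inclusive and ( ) for exclusive ends.
        left = "[" if include_left else "("
        right = "]" if include_right else ")"
        raise ValueError(
            f"{param_name} must be in {left}{lower}, {upper}{right}, "
            f"got {value}"
        )
    return True


print(check_bounds(0.5, 0, 1, "dropout"))  # True

try:
    # 0 is rejected once the lower bound is excluded.
    check_bounds(0, 0, 1, "lr", include_left=False)
except ValueError as err:
    print(err)
```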

SocialED.utils.utility.currentTime()[source]

Get current time as formatted string.

Returns:

Current time in format ‘YYYY-MM-DD HH:MM:SS’

Return type:

str
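
The documented format corresponds to the strftime pattern shown below. This sketch uses the standard library only; the helper name current_time_string is hypothetical, and the use of local time rather than UTC is an assumption.

```python
from datetime import datetime


def current_time_string():
    """Hypothetical equivalent producing 'YYYY-MM-DD HH:MM:SS'."""
    # Assumption: local time is used.
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")


stamp = current_time_string()
print(stamp)
```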

SocialED.utils.utility.sim(z1, z2)[source]

Compute cosine similarity between two sets of vectors.

Parameters:
  • z1 (torch.Tensor) – First set of vectors

  • z2 (torch.Tensor) – Second set of vectors

Returns:

Similarity matrix

Return type:

torch.Tensor
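
The computation can be illustrated with plain Python lists in place of tensors. This sketch assumes that entry (i, j) of the result is the cosine similarity between row i of z1 and row j of z2; that layout, and the helper name cosine_similarity_matrix, are assumptions rather than details confirmed by the description above.

```python
import math


def cosine_similarity_matrix(z1, z2):
    """Hypothetical pure-Python sketch of a pairwise cosine similarity."""

    def normalize(vector):
        # Scale to unit length; a zero vector would raise ZeroDivisionError.
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector]

    unit1 = [normalize(v) for v in z1]
    unit2 = [normalize(v) for v in z2]
    # The dot product of unit vectors equals their cosine similarity.
    return [
        [sum(a * b for a, b in zip(u, v)) for v in unit2]
        for u in unit1
    ]


z1 = [[1.0, 0.0], [1.0, 1.0]]
z2 = [[0.0, 2.0], [3.0, 0.0], [1.0, 1.0]]
matrix = cosine_similarity_matrix(z1, z2)
print(len(matrix), len(matrix[0]))  # 2 3
```

With two vectors in z1 and three in z2, the result has two rows and three columns.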

SocialED.utils.utility.pairwise_sample(embeddings, labels=None, model=None)[source]

SocialED.utils.utility.SBERT_embed(s_list, language)[source]

Use Sentence-BERT to embed sentences.

Parameters:
  • s_list (list) – A list of sentences/tokens to be embedded.

  • language (str) – The language of the sentences (‘English’, ‘French’, ‘Arabic’).

Returns:

The embeddings of the sentences/tokens.

SocialED.utils.utility.DS_Combin(alpha, classes)[source]

Parameters:

alpha – All Dirichlet distribution parameters.

Returns:

Combined Dirichlet distribution parameters.

SocialED.utils.utility.graph_statistics(G, save_path)[source]