Max Bartolo
https://www.maxbartolo.com/
Recent content on Max Bartolo. Generated by Hugo (gohugo.io), en-us. Last updated Mon, 17 Jun 2019.

About
https://www.maxbartolo.com/home/about/
Wed, 20 Apr 2016

I am a PhD student at the UCL Machine Reading group under the supervision of Pontus Stenetorp and Sebastian Riedel. I have a Master's degree from the UCL Department of Computer Science and a Bachelor's degree in Mechanical Engineering from the University of Malta. Previously, I worked as a Machine Learning Engineer at Bloomsbury AI, as well as at other start-ups and corporates across the artificial intelligence, automotive, digital and financial industries.

Coming soon
https://www.maxbartolo.com/ml-index-item/coming-soon-coding-basics/
Wed, 06 Mar 2019

Coming soon.

Coming soon
https://www.maxbartolo.com/ml-index-item/coming-soon-ml-basics/
Wed, 06 Mar 2019

Coming soon.

Scalars
https://www.maxbartolo.com/ml-index-item/scalars/
Tue, 26 Feb 2019

A scalar is a single number. Scalars are usually written in italics with lowercase variable names.
$$s = 1.7$$
In Python, a scalar can be represented as a floating point variable.
s = 1.7
print("Scalar {} has type {}".format(s, type(s)))
# Scalar 1.7 has type <class 'float'>

News
https://www.maxbartolo.com/home/news/
Wed, 20 Apr 2016

This is the content of the news section.

Publications
https://www.maxbartolo.com/home/publications/
Wed, 20 Apr 2016

This is the content of the publications section.

Vectors
https://www.maxbartolo.com/ml-index-item/vectors/
Wed, 06 Mar 2019

A vector is an ordered array of scalar numbers, so each individual number can be identified by its index in that ordering. Typically, we give vectors lowercase names in bold typeface, such as $\mathbf{x}$.
Individual elements in the vector can be identified by the name in italics with a subscript indicating the element position. Vectors are conventionally $1$-indexed and are typically assumed to be column vectors.
$\mathbf{x} = \begin{bmatrix}x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$
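As a minimal illustration (not from the original page), a vector can be represented in Python as a list of scalars. Note that Python is $0$-indexed, whereas the mathematical convention above is $1$-indexed.

```python
# A vector as a Python list of scalars. Mathematical notation is
# 1-indexed, so x_1 corresponds to x[0] in 0-indexed Python.
x = [4.0, 7.0, 12.0]

x_1 = x[0]  # first element, written x_1 in mathematical notation
n = len(x)  # number of elements in the vector

print("Vector {} has {} elements; x_1 = {}".format(x, n, x_1))
```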
Matrices
https://www.maxbartolo.com/ml-index-item/matrices/
Wed, 06 Mar 2019

A matrix is a $2$-dimensional (2D) array of numbers. This means that every element in the matrix is identified by two indices, commonly $i$ representing the row index and $j$ representing the column index.
Matrices are usually given uppercase variable names in bold, such as $\mathbf{A}$.
$\mathbf{A} = \begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{bmatrix}$
We use a colon “:” to represent all the elements across an axis. So, $\mathbf{A}_{i,:}$ identifies all the elements in the $i$th row and $\mathbf{A}_{:,j}$ identifies all the elements in the $j$th column.
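As a sketch of this indexing convention, a matrix can be represented in plain Python as a nested list; selecting a row and a column then looks as follows (remember Python is $0$-indexed):

```python
# A 2x3 matrix as a nested list: A[i][j] is the element in row i,
# column j (0-indexed in Python, 1-indexed in mathematical notation).
A = [[1, 2, 3],
     [4, 5, 6]]

row_0 = A[0]                   # A_{1,:} -- all elements of the first row
col_1 = [row[1] for row in A]  # A_{:,2} -- all elements of the second column

print("First row:", row_0)
print("Second column:", col_1)
```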
Tensors
https://www.maxbartolo.com/ml-index-item/tensors/
Wed, 06 Mar 2019

In the context of Machine Learning, it is convenient to think of a tensor as an $n$-dimensional array. Tensor dimensionality is also commonly referred to as its order, degree or rank, which formally is the sum of the tensor's contravariant and covariant indices.
Scalars are $0$th-order tensors. Vectors can be represented as $1$-dimensional arrays and are therefore $1$st-order tensors. In a fixed basis, a standard linear map that maps a vector to a vector is represented by a matrix (a $2$-dimensional array) and is therefore a $2$nd-order tensor.
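To make the "$n$-dimensional array" view concrete, here is an illustrative sketch (the `order` helper is ours, not from the page) that treats the order of a nested-list tensor as the number of indices needed to identify a single element:

```python
# Tensors of increasing order as nested lists: a scalar (order 0),
# a vector (order 1), a matrix (order 2) and an order-3 tensor.
scalar = 1.7
vector = [1.0, 2.0]
matrix = [[1.0, 2.0], [3.0, 4.0]]
tensor3 = [[[1.0, 2.0], [3.0, 4.0]],
           [[5.0, 6.0], [7.0, 8.0]]]  # shape (2, 2, 2)

def order(t):
    """Count nesting depth, i.e. the number of indices needed
    to identify a single element."""
    d = 0
    while isinstance(t, list):
        d += 1
        t = t[0]
    return d

print([order(t) for t in (scalar, vector, matrix, tensor3)])  # [0, 1, 2, 3]
```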
Dot (Scalar) Product
https://www.maxbartolo.com/ml-index-item/dot-scalar-product/
Wed, 06 Mar 2019

The dot product is an algebraic operation which takes two equal-sized vectors and returns a single scalar (which is why it is sometimes referred to as the scalar product). In Euclidean geometry, the dot product between the Cartesian components of two vectors is often referred to as the inner product.
The dot product is represented by a dot operator: $$s = \mathbf{x} \cdot \mathbf{y}$$
It is defined as: $$s = \mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n}x_iy_i = x_1y_1 + x_2y_2 + \dots + x_ny_n$$
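The definition above translates directly into plain Python (a sketch; the example vectors are made up for illustration):

```python
# Dot product of two equal-sized vectors: the sum of element-wise products.
x = [1.0, 3.0, -5.0]
y = [4.0, -2.0, -1.0]

s = sum(x_i * y_i for x_i, y_i in zip(x, y))
print(s)  # 1*4 + 3*(-2) + (-5)*(-1) = 3.0
```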
https://www.maxbartolo.com/news/present-asking-harder-questions/
Mon, 17 Jun 2019

Presented Asking Harder Questions at the UCL NLP Inauguration Event, followed by a poster session on ShARC.
https://www.maxbartolo.com/news/workshop-ucl-som/
Wed, 15 May 2019

Delivered a two-part workshop titled Overview of NLP to this year’s UCL MSc Business Analytics cohort.
https://www.maxbartolo.com/news/present-sharc-senlp/
Mon, 15 Apr 2019

Co-presented Interpretation of Natural Language Rules in Conversational Machine Reading at the South England Natural Language Processing (SENLP) meetup.
https://www.maxbartolo.com/news/workshop-peking-university/
Fri, 12 Apr 2019

Led a workshop titled Introduction to Python and Machine Learning at the Peking University HSBC Business School (PHBS) in Oxford.

CCG Supertagging
https://www.maxbartolo.com/ml-terms-item/ccg-supertagging/
Mon, 25 Mar 2019

Combinatory Categorial Grammar (CCG) Supertagging is a sequence tagging task in NLP. The standard parsing model of Clark and Curran (2007) uses over $400$ lexical categories (or supertags), compared to about $50$ part-of-speech (POS) tags for typical context-free grammar (CFG) parsers (Xu et al., 2017). As an example, take the following sequence of tokens as an input: [I, saw, squirrels, with, nuts], with output: [NP, (S\NP)/NP, NP, (NP\NP)/NP, NP].

Order of Magnitude
https://www.maxbartolo.com/ml-terms-item/order-of-magnitude/
Mon, 25 Mar 2019

In mathematics, if one amount is an order of magnitude larger than another, it is $10$ times larger than the other. If it is two orders of magnitude larger, it is $100$ ($10^2$) times larger.

Ablation
https://www.maxbartolo.com/ml-terms-item/ablation/
Wed, 20 Mar 2019

The removal or reduction of something. Typically used in the context of an Ablation Study in ML, where some element of the data or model is removed and performance is compared to the original model (without ablation). The corresponding performance drop is (usually) taken as insight into the contribution of the element or feature which has been ablated (i.e. removed).

Bag-Of-Words (BOW)
https://www.maxbartolo.com/ml-terms-item/bag-of-words-bow/
Wed, 20 Mar 2019

A collection of counts of how many times each word appears in a document (term frequency, i.e. raw counts). Information about the order or structure of the words in the sequence is lost. CBOW (Continuous BOW) is a dense vector representation extracted from a model trained to predict a centre word from the surrounding context.
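A minimal sketch of the term-frequency counts described above, using Python's standard library (the example sentence is made up for illustration):

```python
from collections import Counter

# Bag-of-words term frequencies: word order and structure are
# discarded, only the counts remain.
doc = "the cat sat on the mat"
bow = Counter(doc.split())
print(bow)  # {'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1}
```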
Cloze Questions
https://www.maxbartolo.com/ml-terms-item/cloze-questions/
Wed, 20 Mar 2019

Fill-in-the-blank questions with multiple choice. Fill-in-the-blank questions without multiple choice are commonly referred to as open cloze questions. E.g. “John went for a ____ in the park.”

Cosine Similarity
https://www.maxbartolo.com/ml-terms-item/cosine-similarity/
Wed, 20 Mar 2019

A measure of similarity between two vectors based on their orientation. Two vectors with the same orientation have a cosine similarity of $1$ while two orthogonal vectors have a cosine similarity of $0$, irrespective of their magnitudes. It is derived from the Euclidean dot product formula and is given as: $\text{similarity} = \cos(\theta) = \frac{\mathbf{A}\cdot\mathbf{B}}{||\mathbf{A}|| \ ||\mathbf{B}||}$. If both vectors $\mathbf{A}$ and $\mathbf{B}$ are normalised, then their magnitudes are both $1$ and the product of their magnitudes ($||\mathbf{A}|| \ ||\mathbf{B}||$) is also $1$, so the cosine similarity reduces to the dot product.
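The formula above can be implemented directly in plain Python (a sketch; the `cosine_similarity` helper name is ours, not from the page):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [2, 0]))  # same orientation -> 1.0
print(cosine_similarity([1, 0], [0, 3]))  # orthogonal -> 0.0
```

Note that the magnitudes of the inputs do not affect the result, matching the definition above.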
End-to-end learning
https://www.maxbartolo.com/ml-terms-item/end-to-end-learning/
Wed, 20 Mar 2019

End-to-end refers to asking the learning algorithm to go directly from the input to the desired output, i.e. the learning algorithm directly connects the input end of the system to the output end.

F Score
https://www.maxbartolo.com/ml-terms-item/f-score/
Wed, 20 Mar 2019

Commonly refers to the $F_1$ score; however, it’s important to note that the F measure can be weighted towards higher recall than precision (e.g. the $F_2$ score) or higher precision (e.g. $F_{0.5}$). The general formula is $F_\beta = (1+\beta^2) \cdot \frac{precision \cdot recall}{(\beta^2 \cdot precision) + recall}$.
F1 Score
https://www.maxbartolo.com/ml-terms-item/f1-score/
Wed, 20 Mar 2019

The harmonic mean of precision and recall. $F_1 = 2 \cdot \frac{precision \cdot recall}{precision + recall}$.

Hamming Distance
https://www.maxbartolo.com/ml-terms-item/hamming-distance/
Wed, 20 Mar 2019

The Hamming distance between two strings of equal length is the number of positions at which they differ, i.e. the minimum number of substitutions required for one string to become the other.
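A one-line sketch in Python (the `hamming_distance` name is ours; "karolin"/"kathrin" is a standard illustrative pair):

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    assert len(a) == len(b), "Hamming distance requires equal lengths"
    return sum(c1 != c2 for c1, c2 in zip(a, b))

print(hamming_distance("karolin", "kathrin"))  # 3
```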
Hashing Trick (Feature Hashing)
https://www.maxbartolo.com/ml-terms-item/hashing-trick-feature-hashing/
Wed, 20 Mar 2019

Allows you to use variable-size feature vectors with standard learning algorithms. Essentially, instead of using a vocabulary to define a BOW one-hot vector representation (so each word added to the vocabulary changes the vector size), define a large vector (say $\text{vector\_size} = 2^{28}$) and a hashing function modulo the vector size ($\% \text{ vector\_size}$). Then, pass each word into the hashing function and increment the resulting index. In this way, the vector size is fixed for any number of words, including new words.
Hierarchical Clustering
https://www.maxbartolo.com/ml-terms-item/hierarchical-clustering/
Wed, 20 Mar 2019

Can be Agglomerative (bottom-up) or Divisive (top-down). The standard algorithm has $O(n^3)$ time complexity and requires $O(n^2)$ memory.

Hierarchical Softmax
https://www.maxbartolo.com/ml-terms-item/hierarchical-softmax/
Wed, 20 Mar 2019

Approximating the softmax function by converting it to a binary tree. Probabilities are normalised because each leaf probability is the product of the branch probabilities along its path, and the leaf probabilities sum to $1$.

Imputing Missing Values
https://www.maxbartolo.com/ml-terms-item/imputing-missing-values/
Wed, 20 Mar 2019

Filling missing values in the data with approximating values. Common choices include the expected value (mean), the most common value for categorical features, or using an ML model to predict the most likely values given some other features in the data.

Latent
https://www.maxbartolo.com/ml-terms-item/latent/
Wed, 20 Mar 2019

[adjective] Something which is hidden or not directly observable.

Levenshtein Distance
https://www.maxbartolo.com/ml-terms-item/levenshtein-distance/
Wed, 20 Mar 2019

A string metric for measuring the difference between two strings, defined as the minimum number of single-character edits (insertions, deletions or substitutions) required to change one string into the other. For example, the Levenshtein distance between “kitten” and “sitting” is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:
kitten → sitten (substitution of “s” for “k”)
sitten → sittin (substitution of “i” for “e”)
sittin → sitting (insertion of “g” at the end)
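The distance is usually computed with dynamic programming. Here is a sketch (the `levenshtein` helper is ours, using the standard Wagner-Fischer recurrence):

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b, computed
    with Wagner-Fischer dynamic programming (one row at a time)."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```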
Low-rank Approximation
https://www.maxbartolo.com/ml-terms-item/low-rank-approximation/
Wed, 20 Mar 2019

In mathematics, low-rank approximation is a minimization problem, in which the cost function measures the fit between a given matrix (the data) and an approximating matrix (the optimization variable), subject to a constraint that the approximating matrix has reduced rank.

Manifold
https://www.maxbartolo.com/ml-terms-item/manifold/
Wed, 20 Mar 2019

A manifold is a space “modeled” on Euclidean space. In 1D, manifolds are lines or circles (but not figures of eight, because they have crossing points). In 2D, they are surfaces, e.g. the sphere.

NP-Complete
https://www.maxbartolo.com/ml-terms-item/np-complete/
Wed, 20 Mar 2019

An NP-complete decision problem is one belonging to both the NP and the NP-hard complexity classes. In this context, NP stands for “nondeterministic polynomial time”.

Perplexity
https://www.maxbartolo.com/ml-terms-item/perplexity/
Wed, 20 Mar 2019

Commonly used as an evaluation metric for language models. Perplexity is a measurement of how well a probability distribution or probability model (e.g. a language model) predicts a sample. A low perplexity indicates that the probability distribution is good at predicting the sample (therefore lower perplexity is better). Where $H(p)$ is the entropy, perplexity is $2^{H(p)} = 2^{-\sum_x{p(x)\log_2p(x)}}$.
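A sketch of the formula above (the `perplexity` helper is ours). A useful intuition: a uniform distribution over $k$ outcomes has perplexity exactly $k$, i.e. the model is as "confused" as a fair $k$-sided die.

```python
import math

def perplexity(p):
    """2 ** entropy of a discrete distribution p (probabilities > 0)."""
    entropy = -sum(p_x * math.log2(p_x) for p_x in p)
    return 2 ** entropy

print(perplexity([0.25, 0.25, 0.25, 0.25]))  # uniform over 4 outcomes -> 4.0
print(perplexity([0.9, 0.1]))                # lower: the model is more confident
```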
Polysemy
https://www.maxbartolo.com/ml-terms-item/polysemy/
Wed, 20 Mar 2019

[noun] The existence of several meanings in a single word. For example, the word play has different meanings in the following sentences: “I like to play football” and “I went to watch a play”.

Precision
https://www.maxbartolo.com/ml-terms-item/precision/
Wed, 20 Mar 2019

Precision (also positive predictive value) is the ratio of True Positives ($TP$) to the total number of predicted positive examples (true positives and false positives, $TP + FP$): $\text{precision} = \frac{TP}{TP + FP}$. A perfect precision score means that every example the model predicts as positive is actually positive (i.e. no false positives).

Recall
https://www.maxbartolo.com/ml-terms-item/recall/
Wed, 20 Mar 2019

Recall (also sensitivity or true positive rate) is the ratio of True Positives ($TP$) to the total number of actual positive examples in the data (true positives and false negatives, $TP + FN$): $\text{recall} = \frac{TP}{TP + FN}$. A perfect recall score means that the model identifies all positive examples (i.e. no false negatives).
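Precision, recall and their harmonic mean ($F_1$) can be computed together from raw counts. A sketch (the helper name and the counts are ours, made up for illustration):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from raw prediction counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. 8 true positives, 2 false positives, 8 false negatives
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(p, r, f1)  # precision 0.8, recall 0.5
```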
Salient
https://www.maxbartolo.com/ml-terms-item/salient/
Wed, 20 Mar 2019

[adjective] Most important or notable. E.g. “He read the salient facts quickly”.

Semantic
https://www.maxbartolo.com/ml-terms-item/semantic/
Wed, 20 Mar 2019

[adjective] Semantic is used to describe things that deal with the meanings of words and sentences. E.g. “He did not want to enter into a semantic debate”.

Semantic Role Labeling
https://www.maxbartolo.com/ml-terms-item/semantic-role-labeling-srl/
Wed, 20 Mar 2019

“Who did what to whom”. A sequence labeling task which aims to model the predicate-argument structure of a sentence.

Seq2seq
https://www.maxbartolo.com/ml-terms-item/seq2seq/
Wed, 20 Mar 2019

“Sequence-to-sequence”. The task involves taking a sequence of tokens (usually characters, bytes, words, word pieces, etc.) as input and outputting a sequence of predictions. Commonly used in an encoder-decoder setup.

Sequence Labeling
https://www.maxbartolo.com/ml-terms-item/sequence-labeling/
Wed, 20 Mar 2019

A task which takes a sequence of tokens as input and predicts a categorical label corresponding to each input token. Examples of sequence labeling tasks are part-of-speech (POS) tagging and semantic role labeling (SRL).

Sharding
https://www.maxbartolo.com/ml-terms-item/sharding/
Wed, 20 Mar 2019

A type of database partitioning that separates very large databases into smaller, faster, more easily managed parts called data shards. The word shard means a small part of a whole.

Soft Cosine Similarity
https://www.maxbartolo.com/ml-terms-item/soft-cosine-similarity/
Wed, 20 Mar 2019

Cosine similarity re-weighted by a feature similarity matrix derived from some other measure, such as Levenshtein distance.

Softmax
https://www.maxbartolo.com/ml-terms-item/softmax/
Wed, 20 Mar 2019

The normalized exponential function ($\text{softmax}(\mathbf{x})_j = \frac{e^{x_j}}{\sum_i e^{x_i}}$). Takes a vector as input and outputs a vector in which the highest input values have been pushed towards $1$ and the lowest input values have been pushed towards $0$. The sum of all the elements in the softmaxed vector is $1$ (i.e. it is normalized).
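A sketch in plain Python (the `softmax` helper is ours). Subtracting the maximum before exponentiating is the standard numerical-stability trick and does not change the result:

```python
import math

def softmax(x):
    """Normalised exponential of a vector of scores."""
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(x_i - m) for x_i in x]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)        # larger inputs get larger probabilities
print(sum(probs))   # normalised: sums to 1 (up to floating point)
```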
Syntactic
https://www.maxbartolo.com/ml-terms-item/syntactic/
Wed, 20 Mar 2019

[adjective] Syntactic means relating to syntax, i.e. the arrangement of and relationships among words, phrases, and clauses forming sentences; sentence structure.

Wasserstein Distance
https://www.maxbartolo.com/ml-terms-item/wasserstein-distance/
Wed, 20 Mar 2019

Also referred to as “earth mover’s” distance. If each distribution is viewed as a unit amount of dirt piled on a metric space $M$, the metric is the minimum cost of turning one pile into the other, which is assumed to be the amount of dirt that needs to be moved multiplied by the distance it has to travel.
https://www.maxbartolo.com/news/started-phd-at-ucl/
Mon, 14 Jan 2019

I have started a PhD at UCL under the guidance of Pontus Stenetorp and Sebastian Riedel.
https://www.maxbartolo.com/news/bloomsbury-emnlp-2018/
Sat, 03 Nov 2018

Presented Interpretation of Natural Language Rules in Conversational Machine Reading at EMNLP together with Patrick Lewis and other co-authors.
https://www.maxbartolo.com/news/bloomsbury-sharc-launch/
Thu, 25 Oct 2018

The ShARC dataset from our EMNLP ‘18 paper is now live!

Interpretation of Natural Language Rules in Conversational Machine Reading
https://www.maxbartolo.com/publication/interpretation-natural-language-rules/
Sat, 01 Sep 2018

Most work in machine reading focuses on question answering problems where the answer is directly expressed in the text to read. However, many real-world question answering problems require the reading of text not because it contains the literal answer, but because it contains a recipe to derive an answer together with the reader’s background knowledge. One example is the task of interpreting regulations to answer “Can I…?” or “Do I have to…?
https://www.maxbartolo.com/news/bloomsbury-ai-joins-facebook/
Fri, 31 Aug 2018

Bloomsbury AI has joined Facebook to strengthen its efforts in natural language processing research.
https://www.maxbartolo.com/news/bloomsbury-cape-triviaqa/
Wed, 22 Aug 2018

Cape (open source) is the new state-of-the-art for open-domain question answering on TriviaQA.
https://www.maxbartolo.com/news/bloomsbury-cape-launch/
Fri, 17 Aug 2018

Our large-scale question answering system, Cape, is now available open source!
https://www.maxbartolo.com/news/bloomsbury-emnlp-2018-accepted/
Fri, 10 Aug 2018

Interpretation of Natural Language Rules in Conversational Machine Reading has been accepted at EMNLP.
https://www.maxbartolo.com/news/bloomsbury-fuse/
Wed, 25 Apr 2018

We’ve been accepted into the Allen & Overy Fuse accelerator programme.

ML Index
https://www.maxbartolo.com/ml-index/
Mon, 01 Jan 2018

ML Index is a collection of machine-learning terms, explained in brief and demonstrated with code wherever possible. Suggestions and discussion are appreciated and encouraged.
All content is available in Jupyter Notebook format on GitHub. If you find it useful, share the love by starring the repo or contributing!

ML Terms
https://www.maxbartolo.com/ml-terms/
Mon, 01 Jan 2018
https://www.maxbartolo.com/news/bloomsbury-meetup-presentation/
Wed, 22 Nov 2017

Invited presentation of the work we’re doing at Bloomsbury AI at the A Common Language for Intelligence meet-up hosted by Grakn AI.
https://www.maxbartolo.com/news/join-bloomsbury/
Fri, 12 May 2017

I have joined NLP-focused startup Bloomsbury AI, working on open-domain question answering.

News Archive
https://www.maxbartolo.com/news/