This simple example:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from modAL.models import ActiveLearner

X = pd.DataFrame([[1], [2], [3]])
y = pd.Series([True, False, False])
my_learner = ActiveLearner(estimator=LogisticRegression(), X_training=X, y_training=y)

df = pd.concat([X] * 2000)
query_idx, _ = my_learner.query(df, n_instances=100)
```

yields:

```
KeyError: "None of [Int64Index([1665, 1662, 5412, 3399, 1758, 4866, 1755, 3402, 1752, 5415, 3405,\n 1749, 1746, 3408, 1743, 5418, 4863, 1740, 3411, 1737, 3414, 1734,\n 5421, 1731, 3417, 1728, 4860, 3420, 1725, 5424, 1722, 3423, 1719,\n 3426, 1716, 5427, 1713, 4857, 3429, 1710, 3432, 1707, 5430, 1704,\n 3435, 1701, 4854, 1698, 5433, 3438, 1695, 3441, 1692, 1689, 5436,\n 3444, 1686, 4851, 1683, 3447, 1680, 5439, 3450, 1677, 1674, 3453,\n 1671, 5442, 4848, 1668, 3456, 1764, 3459, 5469, 1587, 3492, 1608,\n 5463, 3495, 1605, 1602, 3498, 1599, 5466, 4833, 1596, 3501, 1593,\n 3504, 1590, 4836, 1575, 3513, 3519, 4827, 1569, 5475, 1572, 3516,\n 1614],\n dtype='int64')] are in the [columns]"
```

at:

```
/databricks/python/lib/python3.7/site-packages/modAL/uncertainty.py in uncertainty_sampling(classifier, X, n_instances, random_tie_break, **uncertainty_measure_kwargs)
157 query_idx = shuffled_argmax(uncertainty, n_instances=n_instances)
158
--> 159 return query_idx, X[query_idx]
...
```

It works fine with a smaller input, like:

```python
query_idx, _ = my_learner.query(X, n_instances=1)
```

It seems like `query_idx` is an array for the smaller input but a different index representation, `Int64Index`, when the number of instances or the input is large, and that can't be used to index rows in `X`. Is it possible that this needs to be `X.iloc[query_idx]`? I don't know enough pandas to say for sure. Thanks!
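For what it's worth, here is a minimal pandas-only sketch of what line 159 appears to be doing. It assumes modAL returns plain NumPy integer positions, as the `shuffled_argmax` call in the traceback suggests. On a DataFrame, `df[query_idx]` looks those integers up as *column* labels, which produces the same kind of "are in the [columns]" KeyError. `df.iloc[query_idx]` and plain NumPy indexing treat them as *row* positions instead. Note also that `pd.concat([X] * 2000)` keeps the original row labels 0, 1, 2 repeated 2000 times, so label-based lookup would be ambiguous anyway. The three positions are copied from the error message above, and `column_lookup_failed` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Same toy data as above: a single feature column whose label is 0.
X = pd.DataFrame([[1], [2], [3]])
# pd.concat keeps the row labels, so df's index is 0, 1, 2, 0, 1, 2, ...
df = pd.concat([X] * 2000)

# Assumed stand-in for modAL's output: integer row positions (taken from the error).
query_idx = np.array([1665, 1662, 5412])

# df[array] selects COLUMNS by label; only column 0 exists, so this raises KeyError.
try:
    df[query_idx]
    column_lookup_failed = False
except KeyError as err:
    column_lookup_failed = True
    print("KeyError:", err)

# df.iloc[array] selects ROWS by integer position.
rows = df.iloc[query_idx]
print(rows)

# Indexing a NumPy array with the same positions is also row-wise,
# which is what X[query_idx] does when X is an ndarray.
rows_np = df.to_numpy()[query_idx]
print(rows_np)
```

Under the same assumption, a caller-side workaround might be to pass `df.to_numpy()` to `query` and then recover the original rows with `df.iloc[query_idx]`. Whether every modAL query strategy accepts that input is itself an assumption here.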