# -*- coding: utf-8 -*-
"""
Idea:
    what about using the probability of a descriptor match as a score, like in SIFT?
    we can learn that too.

Have:
    * semantic and visual uuids
    * Test that accepts unknown annotations one at a time and
      for each runs query, makes decision about name, and executes decision.
    * As a placeholder for exemplar decisions, an exemplar is added if the
      number of exemplars per name is less than a threshold.
    * vs-one reranking query mode
    * test harness but start with larger test set
    * vs-one score normalizer ~~/ score normalizer for different values of K * / different params~~
      vs-many score normalization doesn't actually matter. We just need the ranking.
    * need to add in the multi-indexer code into the pipeline. Need to
      decide which subindexers to load given a set of daids
    * need to set the query as an exemplar if its vs-one reranking scores
      are below a threshold
    * flip the vsone ratio score so it's < .8 rather than > 1.2 or whatever
      (see the ratio-test sketch at the end of this list)
    * start from nothing and let the system make the first few decisions correctly
    * tell me the correct answer in the automated test
    * turn on multi-indexing (should just work, though there are probably bugs;
      just need to throw the switch)
    * parameter to only add an exemplar if the post-normalized score is above a threshold
    * ensure vsone ratio test is happening correctly
    * normalization gets a cfgstr based on the query
    * need to allow for scores to be un-invalidated post spatial verification,
      e.g. when the first match is initially invalidated through
      spatial verification but the next matches survive.
    * keep distinctiveness weights from vsmany for vsone weighting
      basically involves keeping weights from different filters and not
      aggregating match weights until the end.
    * Put test query mode into the main application and work on the interface for it.
    * add matches to multiple animals (merge)
    * update normalizer (the data structure is already set up to allow for it;
      need to integrate it seamlessly)
    * score normalization update: on add, incorporate the new support data, reapply
      Bayes' rule, and save to the current cache for a given algorithm configuration.
    * spawn background process to reindex chunks of data
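
    As a minimal illustration of the flipped ratio-score convention noted above
    (a hedged sketch only; the function and variable names here are hypothetical,
    not the pipeline's):

        def passes_ratio_test(dist_best, dist_second, ratio_thresh=0.8):
            # Lowe-style ratio of best to second-best descriptor distance.
            # dist_best / dist_second < 0.8 is equivalent to the older form
            # dist_second / dist_best > 1.25.
            return dist_best < ratio_thresh * dist_second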


TODO:
    * Improve vsone scoring.
    * test case where there is a 360-degree view that is linkable from the test cases
    * ~~Remember name_confidence of decisions for manual review~~ Defer

Tasks:

    Algorithm::
        * Incremental query needs to handle
            - test mode and live mode
            - normalizer update
            - use correct distinctiveness score in vsone
            - tested application of distinctiveness, foreground, ratio,
                spatial_verification, vsone verification, and score
                normalization.

        * Mathematically formal description of the space of choices
            - getting the probability of each choice will give us a much better
                confidence measure for our decision. An example of a probability
                partition might be: .2 merge with rank1, .2 merge with rank2,
                .5 merge with rank1 and rank2, .1 others
                (see the probability sketch at the end of this task list).

        * Improved automated exemplar decision mechanism

        * Improved automated name decision mechanism
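
        As a sketch of the probability-partition idea referenced above (the
        choice labels and numbers are purely illustrative, not an actual API):

            choice_probs = {
                'merge_with_rank1': 0.2,
                'merge_with_rank2': 0.2,
                'merge_with_rank1_and_rank2': 0.5,
                'other': 0.1,
            }
            best_choice = max(choice_probs, key=choice_probs.get)
            confidence = choice_probs[best_choice]  # 0.5 in this example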

     SQL::
         * New Image Columns
             - image_posix_timedelta

         * New Name Columns
             - name_temp_flag
             - name_alias_text

             - name_uuid
             - name_visual_uuid
             - name_member_annot_rowids_evalstr
             - name_member_num_annot_rowids

         * New ImageSet Columns
             - imageset_start_time
             - imageset_end_time
             - imageset_lat
             - imageset_lon
             - imageset_processed_flag
             - imageset_shipped_flag

    Decision UIs::
        * Query versus top N results
            - ability to draw an undirected edge between the query and any number of
                results, i.e. create a match to any of the top results
            - a match to more than one result should by default merge the names
                (this involves a name enhancement subtask); trigger a split / merge dialog
        * Is Exemplar
            - allows for user to set the exemplars for a given name
        * Name Progress
            - Shows the current name matching progress
        * Split
            - Allows a user to split off some images from a name into a new name
              or some other name.
        * Merge
            - Allows a user to join two names.


    GUI::
        * NameTree needs to not refresh unless absolutely necessary
        * Time Sync
        * ImageSet metadata sync from the SMART
        * Hide shipped imagesets
            - put flag to turn them on
        * Mark processed imagesets
        * GUI naturally ensures that all annotations in the query belong
           to the same species
        * Garbage collection function that removes all non-exemplar
          information from imagesets that have been shipped.
        * Spawn process that reindexes large chunks of descriptors as the
          database grows.


LONG TERM TASKS:

    Architecture:
        * Pipeline needs
            - DEFER: a move from dict based representation to list based
            - DEFER: spatial verification cyth speedup
            - DEFER: nearest neighbor (based on visual uuid caching) caching

    Controller:
         * LONGTERM: AutogenController
             - register data converters for verts / other eval columns. Make
               several converters standard and we can tag those columns to
               autogenerate their functions.
             - be able to mark a column as determined by the aggregate of other
               columns. Then the data is either generated on the fly, or it is
               cached and the necessary book-keeping functions are
               autogenerated.

    Decision UIs::
        * Is Exemplar
            - LONG TERM: it would be cool if they were visualized by using
              networkx or some Gephi-like program and clustered by match score.

"""
from __future__ import absolute_import, division, print_function, unicode_literals
import utool as ut
import six
from six.moves import input
print, rrr, profile = ut.inject2(__name__, '[autohelp]')


def assert_testdb_annot_consistency(ibs_gt, ibs2, aid_list1, aid_list2):
    """
    just tests uuids

    if anything goes wrong this should fix it:
        from ibeis.other import ibsfuncs
        aid_list1 = ibs_gt.get_valid_aids()
        ibs_gt.update_annot_visual_uuids(aid_list1)
        ibs2.update_annot_visual_uuids(aid_list2)
        ibsfuncs.fix_remove_visual_dupliate_annotations(ibs_gt)
    """
    assert len(aid_list2) == len(aid_list1)
    visualtup1 = ibs_gt.get_annot_visual_uuid_info(aid_list1)
    visualtup2 = ibs2.get_annot_visual_uuid_info(aid_list2)
    _visual_uuid_list1 = [ut.augment_uuid(*tup) for tup in zip(*visualtup1)]
    _visual_uuid_list2 = [ut.augment_uuid(*tup) for tup in zip(*visualtup2)]
    assert ut.hashstr(visualtup1) == ut.hashstr(visualtup2)
    ut.assert_lists_eq(visualtup1[0], visualtup2[0])
    ut.assert_lists_eq(visualtup1[1], visualtup2[1])
    ut.assert_lists_eq(visualtup1[2], visualtup2[2])
    #semantic_uuid_list1 = ibs_gt.get_annot_semantic_uuids(aid_list1)
    #semantic_uuid_list2 = ibs2.get_annot_semantic_uuids(aid_list2)
    visual_uuid_list1 = ibs_gt.get_annot_visual_uuids(aid_list1)
    visual_uuid_list2 = ibs2.get_annot_visual_uuids(aid_list2)
    # make sure visual uuids are still deterministic
    ut.assert_lists_eq(visual_uuid_list1, visual_uuid_list2)
    ut.assert_lists_eq(_visual_uuid_list1, visual_uuid_list1)
    ut.assert_lists_eq(_visual_uuid_list2, visual_uuid_list2)
    if ut.VERBOSE:
        ibs1_dup_annots = ut.debug_duplicate_items(visual_uuid_list1)
        ibs2_dup_annots = ut.debug_duplicate_items(visual_uuid_list2)
    else:
        ibs1_dup_annots = ut.find_duplicate_items(visual_uuid_list1)
        ibs2_dup_annots = ut.find_duplicate_items(visual_uuid_list2)
    # if these fail try ibsfuncs.fix_remove_visual_dupliate_annotations
    assert len(ibs1_dup_annots) == 0
    assert len(ibs2_dup_annots) == 0


@profile
def ensure_testdb_clean_data(ibs_gt, ibs2, aid_list1, aid_list2):
    """ removes previously set names and exemplars """
    # Make sure that there are not any names in this database
    nid_list2 = ibs2.get_annot_name_rowids(aid_list2, distinguish_unknowns=False)
    print('Removing names from the incremental test database')
    if not ut.list_all_eq_to(nid_list2, 0):
        ibs2.set_annot_name_rowids(aid_list2, [ibs2.UNKNOWN_NAME_ROWID] * len(aid_list2))
        ibs2.delete_names(ibs2._get_all_known_name_rowids())
    #exemplarflag_list2 = ibs2.get_annot_exemplar_flags(aid_list2)
    #if not ut.list_all_eq_to(exemplarflag_list2, 0):
    print('Unsetting all exemplars from database')
    ibs2.set_annot_exemplar_flags(aid_list2, [False] * len(aid_list2))
    # this test is for plains
    #assert ut.list_all_eq_to(ibs2.get_annot_species_texts(aid_list2), 'zebra_plains')
    ibs2.delete_empty_nids()


def annot_testdb_consistency_checks(ibs_gt, ibs2, aid_list1, aid_list2):
    try:
        assert_testdb_annot_consistency(ibs_gt, ibs2, aid_list1, aid_list2)
    except Exception as ex:
        # update and try again on failure
        ut.printex(ex, ('warning: consistency check failed. '
                        'updating and trying once more'), iswarning=True)
        ibs_gt.update_annot_visual_uuids(aid_list1)
        ibs2.update_annot_visual_uuids(aid_list2)
        assert_testdb_annot_consistency(ibs_gt, ibs2, aid_list1, aid_list2)


def interactive_commandline_prompt(msg, decisiontype):
    prompt_fmtstr = ut.codeblock(
        '''
        Accept system {decisiontype} decision?
        ==========
        {msg}
        ==========
        * press ENTER to ACCEPT
        * enter {no_phrase} to REJECT
        * enter {embed_phrase} to embed into ipython
        * any other inputs ACCEPT system decision
        * (input is case insensitive)
        '''
    )
    ans_list_embed = ['cmd', 'ipy', 'embed']
    ans_list_no = ['no', 'n']
    #ans_list_yes = ['yes', 'y']
    prompt_str = prompt_fmtstr.format(
        no_phrase=ut.conj_phrase(ans_list_no),
        embed_phrase=ut.conj_phrase(ans_list_embed),
        msg=msg,
        decisiontype=decisiontype,
    )
    prompt_block = ut.msgblock('USER_INPUT', prompt_str)
    ans = input(prompt_block).lower()
    if ans in ans_list_embed:
        ut.embed()
        #print(ibs2.get_dbinfo_str())
        #qreq_ = ut.search_stack_for_localvar('qreq_')
        #qreq_.normalizer
    elif ans in ans_list_no:
        return False
    else:
        return True


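# A hedged usage sketch (the call site below is hypothetical, not part of this
# module): the prompt helper is meant to be invoked from an automated decision
# callback, e.g.
#   accept = interactive_commandline_prompt('merge aid 3 into name 7?', 'name')
#   if not accept:
#       ...fall back to manual review...

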
def make_incremental_test_database(ibs_gt, aid_list1, reset):
    """
    Makes test database. adds image and annotations but does not transfer names.
    if reset is true the new database is guaranteed to be built from a fresh
    start.

    Args:
        ibs_gt (IBEISController):
        aid_list1 (list):
        reset (bool): if True the test database is completely rebuilt

    Returns:
        IBEISController: ibs2
    """
    import ibeis
    print('make_incremental_test_database. reset=%r' % (reset,))
    aids1_hashid = ut.hashstr_arr(aid_list1)
    prefix = '_INCTEST_' + aids1_hashid + '_'
    dbname2 = prefix + ibs_gt.get_dbname()
    ibs2 = ibeis.opendb(dbname2, allow_newdir=True, delete_ibsdir=reset,
                        use_cache=False)
    # reset if flag specified or no data in ibs2
    if reset or len(ibs2.get_valid_gids()) == 0:
        assert len(ibs2.get_valid_aids()) == 0
        assert len(ibs2.get_valid_gids()) == 0
        assert len(ibs2.get_valid_nids()) == 0
        # Get annotations and their images from database 1
        gid_list1 = ibs_gt.get_annot_gids(aid_list1)
        gpath_list1 = ibs_gt.get_image_paths(gid_list1)
        # Add all images from database 1 to database 2
        gid_list2 = ibs2.add_images(gpath_list1, auto_localize=False)
        # Image UUIDS should be consistent between databases
        image_uuid_list1 = ibs_gt.get_image_uuids(gid_list1)
        image_uuid_list2 = ibs2.get_image_uuids(gid_list2)
        assert image_uuid_list1 == image_uuid_list2
        ut.assert_lists_eq(image_uuid_list1, image_uuid_list2)
    return ibs2


@profile
def setup_incremental_test(ibs_gt, clear_names=True, aid_order='shuffle'):
    r"""
    CommandLine:
        python -m ibeis.algo.hots.automated_helpers --test-setup_incremental_test:0
        python dev.py -t custom --cfg codename:vsone_unnorm --db PZ_MTEST --allgt --vf --va
        python dev.py -t custom --cfg codename:vsone_unnorm --db PZ_MTEST --allgt --vf --va --index 0 4 8 --verbose

    Example:
        >>> # DISABLE_DOCTEST
        >>> from ibeis.algo.hots.automated_helpers import *  # NOQA
        >>> import ibeis  # NOQA
        >>> ibs_gt = ibeis.opendb('PZ_MTEST')
        >>> ibs2, aid_list1, aid1_to_aid2 = setup_incremental_test(ibs_gt)

    Example:
        >>> # DISABLE_DOCTEST
        >>> from ibeis.algo.hots.automated_helpers import *  # NOQA
        >>> import ibeis  # NOQA
        >>> ibs_gt = ibeis.opendb('GZ_ALL')
        >>> ibs2, aid_list1, aid1_to_aid2 = setup_incremental_test(ibs_gt)
    """
    print('\n\n---- SETUP INCREMENTAL TEST ---\n\n')
    # Take a known database
    # Create an empty database to test in
    ONLY_GT = True
    if ONLY_GT:
        # use only annotations that will have matches in test
        aid_list1_ = ibs_gt.get_aids_with_groundtruth()
    else:
        # use every annotation in test
        aid_list1_ = ibs_gt.get_valid_aids()

    if ut.get_argflag('--gzdev'):
        # Use a custom selection of gzall
        from ibeis.algo.hots import devcases
        assert ibs_gt.get_dbname() == 'GZ_ALL', 'not gzall'
        vuuid_list, ignore_vuuids = devcases.get_gzall_small_test()
        # TODO: include all names of these annots too
        aid_list = ibs_gt.get_annot_aids_from_visual_uuid(vuuid_list)
        ignore_aid_list = ibs_gt.get_annot_aids_from_visual_uuid(ignore_vuuids)
        ignore_nid_list = ibs_gt.get_annot_nids(ignore_aid_list)
        ut.assert_all_not_None(aid_list)
        other_aids = ut.flatten(ibs_gt.get_annot_groundtruth(aid_list))
        aid_list.extend(other_aids)
        aid_list = sorted(set(aid_list))
        nid_list = ibs_gt.get_annot_nids(aid_list)
        isinvalid_list = [nid in ignore_nid_list for nid in nid_list]
        print('Filtering %r annots specified to ignore' % (sum(isinvalid_list),))
        aid_list = ut.filterfalse_items(aid_list, isinvalid_list)
        #ut.embed()
        aid_list1_ = aid_list
    #ut.embed()

    # Add aids in a random order
    VALID_ORDERS = ['shuffle', 'stagger', 'same']
    #AID_ORDER = 'shuffle'
    aid_order = ut.get_argval('--aid-order', default=aid_order)
    assert VALID_ORDERS.index(aid_order) > -1
    if aid_order == 'shuffle':
        aid_list1 = ut.deterministic_shuffle(aid_list1_[:])
    elif aid_order == 'stagger':
        from six.moves import zip_longest, filter
        aid_groups, unique_nid_list = ibs_gt.group_annots_by_name(aid_list1_)

        def stagger_group(list_):
            return ut.filter_Nones(ut.iflatten(zip_longest(*list_)))
        aid_multiton_group = list(filter(lambda aids: len(aids) > 1, aid_groups))
        aid_list1 = stagger_group(aid_multiton_group)
        pass
    elif aid_order == 'same':
        aid_list1 = aid_list1_

    # If reset is true the test database is started completely from scratch
    reset = ut.get_argflag('--reset')

    aid1_to_aid2 = {}  # annotation mapping

    ibs2 = make_incremental_test_database(ibs_gt, aid_list1, reset)

    # Preadd all annotations to the test database
    aids_chunk1 = aid_list1
    aid_list2 = add_annot_chunk(ibs_gt, ibs2, aids_chunk1, aid1_to_aid2)
    #ut.embed()

    # Assert annotation visual uuids are in agreement
    if ut.DEBUG2:
        annot_testdb_consistency_checks(ibs_gt, ibs2, aid_list1, aid_list2)

    # Remove names and exemplar information from test database
    if clear_names:
        ensure_testdb_clean_data(ibs_gt, ibs2, aid_list1, aid_list2)

    # Preprocess features before testing
    ibs2.ensure_annotation_data(aid_list2, featweights=True)

    return ibs2, aid_list1, aid1_to_aid2


def check_results(ibs_gt, ibs2, aid1_to_aid2, aids_list1_, incinfo):
    """
    Reports how well the incremental query ran when the oracle was calling the
    shots.
    """
    print('--------- CHECKING RESULTS ------------')
    testcases = incinfo.get('testcases')
    if testcases is not None:
        count_dict = ut.count_dict_vals(testcases)
        print('+--')
        #print(ut.dict_str(testcases))
        print('---')
        print(ut.dict_str(count_dict))
        print('L__')
    # TODO: don't include initially added aids in the result reporting
    aid_list1 = aids_list1_  # ibs_gt.get_valid_aids()
    #aid_list1 = ibs_gt.get_aids_with_groundtruth()
    aid_list2 = ibs2.get_valid_aids()

    nid_list1 = ibs_gt.get_annot_nids(aid_list1)
    nid_list2 = ibs2.get_annot_nids(aid_list2)

    # Group annotations from test and gt database by their respective names
    grouped_dict1 = ut.group_items(aid_list1, nid_list1)
    grouped_dict2 = ut.group_items(aid_list2, nid_list2)
    grouped_aids1 = list(six.itervalues(grouped_dict1))
    grouped_aids2 = list(map(tuple, six.itervalues(grouped_dict2)))
    #group_nids1 = list(six.iterkeys(grouped_dict1))
    #group_nids2 = list(six.iterkeys(grouped_dict2))

    # Transform annotation ids from database1 space to database2 space
    grouped_aids1_t = [tuple(ut.dict_take_list(aid1_to_aid2, aids1))
                       for aids1 in grouped_aids1]

    set_grouped_aids1_t = set(grouped_aids1_t)
    set_grouped_aids2 = set(grouped_aids2)

    # Find names we got right. (correct groupings of annotations)
    # these are the annotation groups that are intersecting between
    # the test database and groundtruth database
    perfect_groups = set_grouped_aids2.intersection(set_grouped_aids1_t)

    # Find names we got wrong. (incorrect groupings of annotations)
    # The test database sets that were not perfect
    nonperfect_groups = set_grouped_aids2.difference(perfect_groups)

    # What we should have got
    # The ground truth database sets that were not fully identified
    missed_groups = set_grouped_aids1_t.difference(perfect_groups)

    # Mark non perfect groups by their error type
    false_negative_groups = []  # failed to link enough
    false_positive_groups = []  # linked too much
    for nonperfect_group in nonperfect_groups:
        if ut.is_subset_of_any(nonperfect_group, missed_groups):
            false_negative_groups.append(nonperfect_group)
        else:
            false_positive_groups.append(nonperfect_group)

    # Get some more info on the nonperfect groups
    # find which groups should have been linked
    aid2_to_aid1 = ut.invert_dict(aid1_to_aid2)
    false_negative_groups_t = [tuple(ut.dict_take_list(aid2_to_aid1, aids2))
                               for aids2 in false_negative_groups]
    false_negative_group_nids_t = ibs_gt.unflat_map(ibs_gt.get_annot_nids,
                                                    false_negative_groups_t)
    assert all(map(ut.allsame, false_negative_group_nids_t)), 'inconsistent nids'
    false_negative_group_nid_t = ut.get_list_column(false_negative_group_nids_t, 0)
    # These are the links that should have been made
    missed_links = ut.group_items(false_negative_groups, false_negative_group_nid_t)
    print(ut.dict_str(missed_links))

    print('# Names with failed links (FN) = %r' % len(false_negative_groups))
    print('... should have reduced to %d names.' % (len(missed_links)))
    print('# Names with wrong links (FP) = %r' % len(false_positive_groups))
    print('# Names correct (TP) = %r' % len(perfect_groups))
    #ut.embed()


@profile
def add_annot_chunk(ibs_gt, ibs2, aids_chunk1, aid1_to_aid2):
    """
    adds annotations to the temporary database and prevents duplicate additions.

    aids_chunk1 = aid_list1

    Args:
        ibs_gt (IBEISController):
        ibs2 (IBEISController):
        aids_chunk1 (list):
        aid1_to_aid2 (dict):

    Returns:
        list: aids_chunk2
    """
    # Visual info
    guuids_chunk1 = ibs_gt.get_annot_image_uuids(aids_chunk1)
    verts_chunk1 = ibs_gt.get_annot_verts(aids_chunk1)
    thetas_chunk1 = ibs_gt.get_annot_thetas(aids_chunk1)
    # Non-name semantic info
    species_chunk1 = ibs_gt.get_annot_species_texts(aids_chunk1)
    gids_chunk2 = ibs2.get_image_gids_from_uuid(guuids_chunk1)
    ut.assert_all_not_None(aids_chunk1, 'aids_chunk1')
    ut.assert_all_not_None(guuids_chunk1, 'guuids_chunk1')
    try:
        ut.assert_all_not_None(gids_chunk2, 'gids_chunk2')
    except Exception as ex:
        #index = ut.get_first_None_position(gids_chunk2)
        #set(ibs2.get_valid_gids()).difference(set(gids_chunk2))
        ut.printex(ex, keys=['gids_chunk2'])
        #ut.embed()
        #raise
    # Add this new unseen test case to the database
    aids_chunk2 = ibs2.add_annots(gids_chunk2,
                                  species_list=species_chunk1,
                                  vert_list=verts_chunk1,
                                  theta_list=thetas_chunk1,
                                  prevent_visual_duplicates=True)

    def register_annot_mapping(aids_chunk1, aids_chunk2, aid1_to_aid2):
        """ called by add_annot_chunk """
        # Should be 1 to 1
        for aid1, aid2 in zip(aids_chunk1, aids_chunk2):
            if aid1 in aid1_to_aid2:
                assert aid1_to_aid2[aid1] == aid2
            else:
                aid1_to_aid2[aid1] = aid2
    # Register the mapping from ibs_gt to ibs2
    register_annot_mapping(aids_chunk1, aids_chunk2, aid1_to_aid2)
    print('Added: aids_chunk2=%s' % (ut.truncate_str(repr(aids_chunk2), maxlen=60),))
    return aids_chunk2


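# A minimal end-to-end sketch (assumes the PZ_MTEST groundtruth database used by
# the doctests above is available locally; the chunk size is arbitrary):
#   import ibeis
#   ibs_gt = ibeis.opendb('PZ_MTEST')
#   ibs2, aid_list1, aid1_to_aid2 = setup_incremental_test(ibs_gt)
#   # annotations can also be re-added in chunks; visual duplicates are prevented
#   aids_chunk2 = add_annot_chunk(ibs_gt, ibs2, aid_list1[:8], aid1_to_aid2)

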
if __name__ == '__main__':
    """
    CommandLine:
        python -m ibeis.algo.hots.automated_helpers
        python -m ibeis.algo.hots.automated_helpers --allexamples
        python -m ibeis.algo.hots.automated_helpers --allexamples --noface --nosrc
    """
    import multiprocessing
    multiprocessing.freeze_support()  # for win32
    import utool as ut  # NOQA
    ut.doctest_funcs()